DATA VISUALIZATION

How to assign different colors to different conditions in Excel graphs

This post was originally published on May 12, 2014.

Excel is a great program to use if you don’t require complicated graphs and if you use their default formatting (which you shouldn’t!). However, if you need more advanced formatting, making graphs in Excel can become complicated. It’s not impossible though — there are ways to manipulate Excel to create beautiful, custom charts. 

In this tutorial, I describe how to conditionally format graphs in Excel. Conditional formatting is useful when you have two or more conditions that you want to format differently. For example, if you have data from a patient group and a control group and you want to display their data in different colors, sizes, or shapes. Excel does let you format data points individually, but applying the same format to every data point in each condition can quickly become labor intensive. Here's an example where two groups of participants are labeled with different colors:

formatted_scatter_plot.png

STEP 1: ORGANIZE YOUR DATA

To begin, organize all your data in columns. The first column should contain dummy codes for your different conditions. For example, if you have three conditions, you’ll use the numbers 1,2, and 3 to represent your conditions. The remaining columns should contain your variables that you plan to plot — you should have two columns if you want to make a scatter chart and one column if you are making a bar chart or something similar. In this example, I have two groups: a control group, which I dummy coded with 1, and a patient group, which I dummy coded with 2. Each subject has a response time that is listed in column B:

organized_data.png

 

STEP 2: SEPARATE DATA ACCORDING TO CONDITION TYPE

This step includes the meat of the tutorial. First, create new columns for each condition type and each variable (if you have 2 conditions and 1 variable, you’ll need 2 new columns and if you have 3 conditions and 2 variables, you’ll need 6 new columns). In each column, type an IF statement with the following parameters:

=IF(condition = dummy code, variable, NA() )

where "condition" is the cell containing your dummy code, "dummy code" is the actual number of your dummy coding (this number will change in each column), and "variable" is the cell containing your data. So in our example, we’ll use the following IF statements:

In the first new column:    =IF(A6=1,B6,NA())

In the second new column:    =IF(A6=2,B6,NA())

Finally, apply the formatting to all your data by highlighting the cells containing your IF statements and dragging the blue box in the lower righthand corner all the way down to the end of your data.

separated_data.png

STEP 3: CREATE A GRAPH

At this point, you can create whatever type of graph you need — this conditional formatting technique can be used to create a wide variety of graphs. In this example, I created a simple bar chart to visualize subjects’ response times according to condition type.

To create a bar chart, click on the charts tab in the Excel ribbon. Highlight the data in the new columns you just created (the columns with the #N/A values), click on the Bar icon, and select clustered bar chart. Now when you click on a datapoint in the graph, it will highlight all of the other datapoints that belong to the same condition. You can format the datapoints in each condition separately. 

unformatted_bar_graph.png

To make the spacing of the bars uniform, right click on the data series and select format data series. Change the overlap to 100% and the gap width to 100%. To finish formatting, I removed the gridlines, added a title and and x axis label, changed the typeface of the axes and labels to Avenir, changed the color of the bars to a neutral gray for the controls and a teal for the patients, and removed all shadows and other special effects. I also removed the y-axis labels since the subject number does not matter in this case. Here’s the finished graph:

formatted_graph.png

You can use the same conditional formatting technique to create scatter plots too. Here's an example of how to set up the data:

scatterplot_data.png

And here's what the formatted scatter plot looks like:

formatted_scatter_plot.png

How to make shaded error bands in Excel

This tutorial describes how to create error bands or confidence intervals in line graphs using Excel for Mac. Excel has a built-in capability to add error bars to individual time points in a time series, but doing so can make the graph look messy and can conceal the overall shape or trend of the time series’ error. Shaded error bands provide good estimates of uncertainty without distracting from the time series. Creating error bands in Excel isn’t as straight forward as creating individual error bars — but never fear, this tutorial will show you how.

Here’s an sneak-peek of a two time series and their associated shaded error bands:

before_and_after_error_bands.png

STEP 1: FORMAT YOUR DATA

First, you’ll need to organize your data according to the following format:

formatted_data.png

The rows should contain the averages and uncertainty measurements associated with each condition and the columns should contain measurements over time. In the example above, I use standard error but you could also use a confidence interval, standard deviation, variance, or any other measurement of uncertainty.

Below the rows containing the averages and standard deviations, we will add additional rows for the upper and lower bound of each error band.

STEP 2: FIND THE UPPER AND LOWER BOUNDS OF EACH ERROR BAND

To find the upper bound of an error band, simply add the error to the average. So in our example, I added the average (cell C2) and the standard error (cell c4).

Similarly, subtract the error from the average to find the lower bound. In this example, I subtracted the standard error (cell c4) from the average (cell C2).

Find the upper and lower error bound for every time series in your data. Finally, apply the formatting to the other columns by highlighting the cells containing the upper and lower error bounds, and dragging the blue square on the bottom right of the box across the rest of your data columns.

error_band_calculation.png

STEP 3: GRAPH THE TIME SERIES

Select the time series averages and then click the option key and select the error band data too:

selected_cells.png

 Create a line graph by clicking on the Charts tab in the Excel ribbon, clicking the Line icon under the Insert Chart area, and selecting the Marked Line plot. Switch the rows & columns of the chart by clicking the column button (the icon with the table and highlighted column)  in the data section of the Charts ribbon. Next, remove the time series associated with the standard error by clicking on the lines and pressing delete. Your graph should look similar to this:

initial_graph.png

STEP 4: ADD THE ERROR BANDS

Right click on one of the lines associated with an upper or lower error band bound and select Change series chart type. Click the Area icon under the Change Chart Type section and select the Area plot. Repeat this step for every upper and lower bound. Now your graph should look like this:

graph_area_graphs.png

For every lower bound:

  •  change the fill to white
  • change the line color to white
  • remove the shadow

For every upper bound:

  •  change the line color to “no fill”
  • remove the shadow 
  • set the transparency to 50% (shape fill -> more fill colors... -> move the opacity slider to 50%)
  • change the fill color to a lighter version of your time series color

Now your graph should look like this:

unformatted_error_bands.png

STEP 5: FORMAT THE GRAPH

Finally, remove the legend labels associated with the error bounds by selecting each label and pressing delete. Remove the grid lines by selecting a grid line and pressing delete. Format the graph to your liking. Here is an example of a fully formatted graph: 

formatted_graph.png