Using the Histogram Tool from the Analysis ToolPak in Excel 2007 and Excel 2010
Hi and welcome to the last in my series on the Analysis ToolPak which is an add-in available in both Excel 2007 and Excel 2010. The ToolPak allows you to perform sophisticated statistical and mathematical analysis on your data using a number of powerful tools.
Today we will look at the Histogram Tool. This tool allows you to analyse your data and show in chart form how your data is distributed. In today’s example, we will again look at the results that my (fictional) students obtained in their recent exams. Excel 2007 and Excel 2010 allow you to define the boundary values or bin ranges (in my example, these are grade boundaries). You can also calculate cumulative percentages and order the histogram from the highest value to the lowest. The figure below shows the results expressed as a histogram.
I have a hub that covers adding the Analysis ToolPak to both Excel 2007 and Excel 2010 and also discusses how to make the ToolPak visible should it be already selected but not showing in Excel. That hub can be found here:
Creating the boundary values or bin values to be used in the Histogram in Excel 2007 and Excel 2010
Before we create our histogram we need to consider the boundaries we will use (referred to as bin values in Excel). As I am considering my students results, I will use the following grade boundaries:
As you can see from the above figure, there are six boundaries and only five grades. This is because, an E is 0 – 30, a D is 30 – 50 and so on, with an A being 90 to 100. Now that we have the boundaries defined as well as their labels (contained in the column headed Grades) we can begin to construct our Histogram.
Creating a Histogram in Excel 2007 and Excel 2010
The next step is to now create the Histogram itself. This tool is available via the Data Analysis button which is part of the Analysis group on the Data tab.
- Click the Data Analysis button and select Histogram
- The Histogram dialogue box will open
- The Input Range is the range containing your data (in this case the marks my students received in their test)
- The Bin Range is the boundaries you wish to use to separate your data (if you don’t define boundaries, Excel will separate your data evenly and create them for you)
- If you use labels at the top of your columns, click the Labels option
- Configure your Output Range, or create a New Worksheet or Workbook depending on your preference
- Pareto (or sorted Histogram) will sort your data from most frequent to least frequent
- Should you need to include the cumulative percentage in your histogram, select this option
Finally, and VERY IMPORTANTLY select Chart Output or Excel 2007 / Excel 2010 will not create a chart for you (it will instead just create the table headed Bin and Frequency without the histogram itself)
The above figure shows the resulting histogram. As you can see we need to tidy this up but we have a nice bell curve distribution of results. First:
- Adjust the purple and blue boxes to exclude the first and last row of the results as they do not contain any data, just the bottom and top boundary values
- Next, change the chart title to something more meaningful (click on the title and rename it)
- Click and rename the Horizontal Axis to something more useful
- Select Series1 and delete it as we have only one data series so it is not giving us any useful additional information
- Right click on the chart and choose Select Data
- Click Edit under Horizontal (Category) Axis Labels
- Select the section of the column containing your labels
- Click OK twice to return to your chart
You will now have your completed chart:
If you are require further information regarding creating charts in Excel 2007, I have a hub which goes into much more detail around creating charts and graphs which can be found here:
The Histogram Tool which is part of the Analysis ToolPak in Excel 2007 and Excel 2010 is a very useful tool for illustrating how your data is distributed across configurable boundaries (called bin ranges by Excel).
In my example, I took the results of a recent exam and used the tool to illustrate how the results were distributed across the grades from E to A. The histogram can also show the data with the bin or boundaries arranged from highest to lowest (Pareto) and you can also use cumulative percentages too should your data require it.
I have a number of other hubs illustrating how to use a number of the other statistical and mathematical tools, which include:
Correlation: is a relationship between two variables and is either positive (both variables go up), negative (one goes up, the other goes down) or no relationship (the variables have no relationship). I investigate the correlation between pie sales and daily temperatures in my hub:
Sampling: allows you to take a random sample from a larger population and perform analysis on this sample. My example in this hub is the creation of lottery numbers randomly chosen from the pool of possible numbers:
Regression: utilising this tool, you can visually summarise the relationship between variables. You can then predict variables based on the known results of other variables. I investigate if there is a relationship between Phosphate and Nitrogen concentrations of fish mortality in my hub:
Rank and Percentile: permits the ranking of your data and the assigning of percentiles to that data. In my example, my students sitting an exam are assigned grades based on their rank in the class.
Moving Average: this tool is used to detect trends in your data. I look at my hub traffic over time and look to see if the data is trending upwards or not. The tool creates a trend line which will illustrate a trend if one is present.
Many thanks for reading; I do hope that you have found it both informative and useful. Please feel free to leave any comments you may have below.