How to Analyze a Statistical Survey: Standard Deviation, Outliers, and More
It's Analyzing Time!
Now that you have your data, it's time to put it to use. There are quite literally hundreds of things that can be done with your data in order to interpret it. Statistics can sometimes be fickle because of this. For instance, I could say that the average weight for a baby is 12 pounds. Based on this number, any person having a baby would expect it to weigh approximately this much. However, depending on the standard deviation, or the average difference from the mean, hardly any baby might actually weigh close to 12 pounds. After all, the average of 1 and 23 is also 12. So here's how you can figure it all out!
X Values: 12, 23, 12, 14, 21, 23, 1, 1, 5, 100
Added Total of All X Values = 212

Finding the Arithmetic Mean
The mean is the average value. You probably learned this in grade school, but I'll give a short refresher just in case you've forgotten. In order to find the mean, a person must add together all values and then divide by the total number of values. Here's an example using the x values listed above.
If you count the total number of values added, you'll get ten. Divide the sum of all x values, which is 212, by 10 and you'll have your mean!
212/10=21.2
21.2 is the mean of this number set.
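If you'd rather let a computer do the adding, here's a minimal Python sketch of the same calculation. The variable names (data, total, count, mean) are just my own picks for illustration.

```python
# The ten x values from the example data set.
data = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]

total = sum(data)     # add together all values
count = len(data)     # total number of values
mean = total / count  # divide the sum by the count

print(total, count, mean)  # 212 10 21.2
```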
Now this number can sometimes be a very decent representation of the data. As in the earlier example of babies and their weights, however, it can also be a very poor representation. In order to measure whether it's a decent representation or not, standard deviation can be used.
Standard Deviation
Standard deviation is, roughly speaking, the average distance the numbers lie from the mean. In other words, if the standard deviation is a large number, the mean might not represent the data very well. Standard deviation is in the eyes of the beholder. It could be equal to one and be considered large, or it could be in the millions and still be considered small. The importance of its value depends on what's being measured. For instance, while deciding the reliability of radiometric dating, the standard deviation might be in the millions of years. If the ages being measured are on a scale of billions of years, though, being a few million off wouldn't be such a big deal. On the other hand, if I'm measuring the size of the average television screen and the standard deviation is 32 inches, the mean obviously doesn't represent the data well, because screen sizes don't cover a very wide range to begin with.
x       x - 21.2    (x - 21.2)^2
12      -9.2        84.64
23      1.8         3.24
12      -9.2        84.64
14      -7.2        51.84
21      -0.2        0.04
23      1.8         3.24
1       -20.2       408.04
1       -20.2       408.04
5       -16.2       262.44
100     78.8        6209.44
Sum of (x - 21.2)^2 = 7515.6

Finding Standard Deviation and Variance
The first step to finding standard deviation is to find the difference between the mean and each value of x. This is represented by the second column of the table above. It does not matter whether you subtract the value from the mean or the mean from the value.
This is because the next step is to square all of these terms. To square a number simply means to multiply it by itself. Squaring the terms will make all negatives positive, because any negative times a negative results in a positive. This is represented in column three. At the end of this step, add all squared terms together.
Divide this sum by the total number of values (in this case, ten). The number computed is what's called the variance. The variance is a number sometimes used in higher-level statistical analyses. That's far beyond what this lesson covers, so you can forget about its importance besides its use in finding standard deviation, unless you plan to explore higher levels of statistics.
Variance = 7515.6/10 = 751.56
The standard deviation is the square root of the variance. The square root of a number is merely the value that, when multiplied by itself, results in that number.
Standard deviation = √751.56 ≈ 27.4146
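The steps above can be sketched in a few lines of Python. This is only an illustration of the divide-by-the-number-of-values approach used in this lesson (often called the population form), and the variable names are my own. As far as I know, Python's built-in statistics.pstdev follows the same approach if you want a cross-check.

```python
import math

data = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
mean = sum(data) / len(data)

# Step 1: difference between each value and the mean (column two).
deviations = [x - mean for x in data]

# Step 2: square each difference so negatives become positive (column three).
squared = [d ** 2 for d in deviations]

# Step 3: divide the sum of squares by the number of values to get the variance.
variance = sum(squared) / len(data)

# Step 4: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(sum(squared), 2), round(variance, 2), round(std_dev, 4))
```

Rounded the same way as the hand calculation, this should agree with the 7515.6, 751.56, and 27.4146 worked out above.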
Outliers
An outlier is a number that is basically an oddball when compared to the rest of the number set. It has a value that is nowhere near any of the other numbers. Oftentimes, outliers pose very big problems in statistics. For instance, in the sample problem, the value 100 posed a significant issue. The standard deviation was raised much higher than it would have been without this value being present. This means that this number might have also made the mean misrepresent the data set.
x       n
1       1
1       2
5       3
12      4
12      5    (middle value)
14      6    (middle value)
21      7
23      8
23      9
100     10

Lower half    Upper half    n
1             14            1
1             21            2
5             23            3    (1st quartile = 5, 3rd quartile = 23)
12            23            4
12            100           5

How To Identify Outliers
So how do we know if a number is technically an outlier or not? The first step to determine this is to put all x values in order, like in the first column of the first chart above.
Then the median, or middle number, must be found. This can be done by counting the number of x values and dividing by 2. Then you count that many values in from both ends of the data set and you'll find which number is your median. If there is an even number of values, like in this example, you'll land on a different value from each end. The mean of these two values is the median. Here they are the 5th and 6th values (12 and 14) in column one of the first chart. Column two merely counts out the values. In this example:
10/2 = 5
The value 5 numbers from the top is 12.
The value 5 numbers from the bottom is 14.
12 + 14 = 26; 26/2 = median = 13
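Here's a small Python sketch of that counting procedure. The function name median_by_counting is hypothetical, and the handling of an odd number of values (just taking the single middle value) is my assumption, since this example only covers the even case.

```python
def median_by_counting(values):
    """Find the median by counting in from both ends of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        from_top = ordered[half - 1]  # count in from the smallest value
        from_bottom = ordered[-half]  # count in from the largest value
        return (from_top + from_bottom) / 2
    # Assumption: with an odd count, the single middle value is the median.
    return ordered[half]


data = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
print(median_by_counting(data))  # 13.0
```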
Now that the median has been found, the 1st and 3rd quartiles can be found. These values are obtained by cutting the data set in half at the median. Then, finding the median of each half gives the 1st and 3rd quartiles. The 1st and 3rd quartiles (5 and 23) are the middle values of the two halves shown in the second chart above.
Now it's time to determine the presence of outliers. This is first done by subtracting the 1st quartile from the 3rd. The result is known as the interquartile range, and the span between the two quartiles covers the middle fifty percent of the data.
23 - 5 = 18
Now this number must be multiplied by 1.5. Why 1.5, you might ask? Well, this is just the multiplier that's been agreed on. The resulting number is used to find mild outliers. In order to find extreme outliers, 18 must be multiplied by 3. Either way, the values are as listed below.
18 x 1.5 = 27
18 x 3 = 54
By subtracting these numbers from the bottom quartile and adding them to the top, acceptable values can be found. The two resulting numbers will give the range which excludes outliers.
5 - 27 = -22
23 + 27 = 50
Acceptable range = -22 to 50
In other words, 100 is at least a mild outlier.
5 - 54 = -49
23 + 54 = 77
Acceptable range = -49 to 77
Since 100 is larger than 77, it is considered to be an extreme outlier.
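The whole outlier check can be sketched in Python as well. Take this as a rough illustration of the split-at-the-median method described here, not the only way to compute quartiles (other tools may use slightly different rules and give slightly different numbers). The function names are hypothetical, and leaving the middle value out of both halves when the count is odd is my own assumption. It needs at least two values to work.

```python
from statistics import median


def quartiles_by_halves(values):
    """1st and 3rd quartiles as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the middle value is left out of both halves.
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


def acceptable_range(values, multiplier):
    """Range outside of which a value counts as an outlier."""
    q1, q3 = quartiles_by_halves(values)
    margin = (q3 - q1) * multiplier
    return q1 - margin, q3 + margin


data = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]

mild_low, mild_high = acceptable_range(data, 1.5)   # mild outliers
extreme_low, extreme_high = acceptable_range(data, 3)  # extreme outliers

mild = [x for x in data if x < mild_low or x > mild_high]
extreme = [x for x in data if x < extreme_low or x > extreme_high]

print(mild, extreme)  # [100] [100]
```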
x values after trimming: 1, 5, 12, 12, 14, 21, 23, 23
The sum is 111

What Can Be Done About Outliers?
One way to deal with outliers is to not use the mean at all. Instead, the median can be used to represent a data set. Another option is to use what's known as a trimmed mean.
A trimmed mean is the mean found after cutting an equal portion of values off of both ends of a data set. A trimmed mean of 10% would be the mean of the data set with 10% of all values cut off of each end. I'll use a trimmed mean of 10% for the sample data set, which drops one value from each end (a 1 and the 100). The new mean is:
111/8 = trimmed mean = 13.875
The standard deviation of the trimmed data set is:
468.875/8 = variance ≈ 58.6094
√58.6094 = standard deviation ≈ 7.6557
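To check the trimmed figures yourself, here's a hedged Python sketch. The trim helper is hypothetical, and rounding the number of values cut from each end down to a whole number is my assumption. The variance divides by the number of remaining values, the same way as earlier in this lesson.

```python
import math


def trim(values, proportion):
    """Sort the values and cut the given proportion off of each end."""
    ordered = sorted(values)
    # Assumption: round the number of values to cut down to a whole number.
    cut = int(len(ordered) * proportion)
    return ordered[cut:len(ordered) - cut]


data = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]

trimmed = trim(data, 0.10)  # drops one value from each end
trimmed_mean = sum(trimmed) / len(trimmed)

# Variance and standard deviation of the trimmed set.
variance = sum((x - trimmed_mean) ** 2 for x in trimmed) / len(trimmed)
std_dev = math.sqrt(variance)

print(trimmed)
print(trimmed_mean, variance, round(std_dev, 4))
```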
This value for standard deviation is much more acceptable than the value for the normal mean. Anyone working with this number set might want to consider using the trimmed mean or the median instead of the normal mean.
Conclusion
Now you have some basic tools to evaluate data. If you want to know more about statistics, you might as well take a class. Notice how the normal mean differs from the median and the trimmed mean. This is how statistics can be fickle. If you want to get a point across, using the normal mean could be your ticket to bending statistics to your will. I'll quote Peter Parker as I always do when speaking of statistics: "With great power comes great responsibility."
Comments
I always hated statistics at school, mainly because no teacher could explain the subject properly or in an understandable way. This Hub rectifies this very well, so that I can understand at least the basics of statistics.