# Intro to Statistics (for the Social Sciences) #1

## Intro to Statistics

I completed my BA in Psychology at the U of O. I am making this page to help keep these ideas fresh while I take a break before returning for a Master's degree.

I hope that this information can help you learn these concepts, too.

## 1) Basic Terminology

-Data (plural): measurements or observations (aka scores)

-------------------------------------------------------------------------

-Variable: A characteristic or condition which has different values for different individuals (ex. height, test scores, gender)

---Independent Variable (IV): The variable that is manipulated by the experimenter.

-----Quasi-Independent Variable (Q-IV): A variable that can't be manipulated but is used to define groups (ex. height, hair color, age, gender).

---Dependent Variable (DV): The variable that is observed for changes in response to the IV (it depends on the independent variable).

-----------------------------------------------------------------------

-Statistics: A set of calculations used to organize, summarize, and interpret information.

---Descriptive Statistics: Used to organize, simplify, and summarize data.

---Inferential Statistics: Use sample statistics to make generalizations about the population the sample came from.

-------------------------------------------------------------------------

-Population: ALL of the individuals you wish to study (ex. all students in the US)

---Parameter: A value used to describe a population.

------------------------------------------------------------------------

-Sample: ONLY SOME of the individuals/objects you wish to study from a population (ex. 1000 students from New York)

---Statistic: A value used to describe a sample.

---Sampling Error: The discrepancy between a sample statistic and the corresponding population parameter.
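To make the parameter / statistic / sampling error distinction concrete, here is a minimal sketch. The population is made-up data (simulated test scores), and the names `population`, `sample`, and `sampling_error` are my own labels, not standard terms from any library.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: test scores for 10,000 students (simulated, not real data).
population = [random.gauss(70, 10) for _ in range(10_000)]
mu = statistics.mean(population)   # parameter: a value that describes the population

# A sample of ONLY SOME of those students.
sample = random.sample(population, 100)
M = statistics.mean(sample)        # statistic: a value that describes the sample

sampling_error = M - mu            # discrepancy between the statistic and the parameter
print(f"mu = {mu:.2f}, M = {M:.2f}, sampling error = {sampling_error:.2f}")
```

Drawing a different sample (for example, by changing the seed) would give a different M and therefore a different sampling error, while the parameter stays the same.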

-------------------------------------------------------------------

-Control Condition: Individuals in this condition receive no experimental treatment, or receive a placebo. (This condition provides a baseline for comparison with the experimental condition.)

-Experimental Condition: Individuals in this condition do receive the treatment being tested.

## 2) Basic Symbols

What do all these symbols mean?

Symbols are used in many formulas that are needed to calculate different statistics.

These are some of the basics:

∑ = Sum

X = each individual score

SS = Sum of squared deviations

sqrt = Square root

df = degrees of freedom

Symbols used to describe a Population

--µ = mean

--σ = standard deviation

--σ² = variance

--N = Total number of population scores

Symbols used to describe a Sample

-- M = mean

-- s = Standard Deviation

-- s² = variance

-- n = Total number of sample scores

## 3) Distribution: Tables and Graphs

Frequency Distribution: A list of the scores from a study along with how often each score occurs. This information can be used to construct tables and graphs.

Variability: A quantitative measurement of the degree to which the scores in a distribution are spread out or clustered together.
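As a rough illustration of a frequency distribution, the sketch below counts made-up quiz scores and prints a small table with a text "graph" next to it. The data and the column labels `X` and `f` are assumptions for the example.

```python
from collections import Counter

scores = [3, 5, 4, 3, 2, 5, 3, 4, 1, 3]  # made-up quiz scores

freq = Counter(scores)  # maps each score to the number of times it occurs

print("X  f")
for x in sorted(freq, reverse=True):       # list the highest score first
    print(f"{x}  {freq[x]}  {'*' * freq[x]}")  # one star per occurrence
```

The frequencies always add up to the total number of scores, which is a quick way to check a table made by hand.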

### Normal Distribution

This type of distribution is seen when the scores are clustered around the center, with a gradual, symmetrical decrease on either side.

Many statistical calculations assume that the population distribution is normal. I will discuss this type of distribution later in further detail.

It is also called a Gaussian Curve or Bell Curve.

### Negative Skew

A negative skew is when the scores in a distribution are clustered together, with a few outliers at the low end that stretch the distribution. (The tail of the graph points to the negative end.)

Outliers: Scores that fall outside the normal trend for the distribution.

(ex. Let's say the variable is shoe size and most of the data falls within sizes 7 to 10. If a few individuals had a shoe size of 4, that would skew the distribution negatively.)

tip: greater than 50% of the scores are __above__ the mean

### Positive Skew

A positive skew is when the scores in a distribution are clustered together, with a few outliers at the high end that stretch the distribution. (The tail of the graph points to the positive end.)

(ex. Given the same information as the previous example, the outliers would have a shoe size of 13 instead of 4, making the distribution positively skewed.)

tip: greater than 50% of the scores are __below__ the mean
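The two tips can be checked with a small sketch. The shoe sizes below are made up to match the examples in this section, and the names `negative` and `positive` are just labels for the two data sets. The outliers pull the mean toward the tail, while the median barely moves.

```python
import statistics

# Made-up shoe sizes: most fall between 7 and 10.
typical = [7, 8, 8, 9, 9, 9, 10, 10]

negative = typical + [4, 4]    # a few very small sizes: tail points to the low end
positive = typical + [13, 13]  # a few very large sizes: tail points to the high end

for name, data in [("negative", negative), ("positive", positive)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    above = sum(x > mean for x in data)  # how many scores lie above the mean
    print(f"{name} skew: mean = {mean}, median = {median}, "
          f"{above} of {len(data)} scores above the mean")
```

With these numbers, the negatively skewed set has a mean below its median and most scores above the mean. The positively skewed set shows the reverse.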

## 4) Central Tendency: The Mean, Median and Mode

Central Tendency: A measurement that uses only one score to describe a distribution of scores.

__These are a few ways to measure central tendency:__

-Mean (µ or M): The average (sum of scores/# of scores)

-------Ex. (5,4,3) 5+4+3= 12, 12/3=4, Mean = 4

-Median: The score which divides all scores in half when put into ascending order.

------Ex. (10,4,3,2,1) in ascending order is (1,2,3,4,10), Median = 3

-Mode: The score or scores that occur most often in a set.

------Ex. (5,4,3,3,2) Mode = 3

*For statistics, the mean is most often used in calculations of central tendency.*
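The three examples above can be reproduced with Python's built-in `statistics` module. This is only a sketch for checking hand calculations. `multimode` is used because a set can have more than one mode.

```python
import statistics

mean = statistics.mean([5, 4, 3])             # (5 + 4 + 3) / 3
median = statistics.median([10, 4, 3, 2, 1])  # sorts the scores, then takes the middle one
modes = statistics.multimode([5, 4, 3, 3, 2]) # list of the most frequent score(s)

print("Mean =", mean)
print("Median =", median)
print("Mode(s) =", modes)
```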

## 5) More on Variables and Scales

Variables can be IV, DV, and Q-IV. (see basic terminology above)

__They can also be either: Discrete or Continuous__

-Discrete: No values can exist between pre-determined categories.

----(ex. categories can be Male/Female or ratings on a scale from 1-5)

-Continuous: Variables, usually numerical, that can take an infinite number of possible values between any two points.

----(ex. temperature could be 98.5F or 97.6F. There are continuous values for temp.)

__N O I R Scales for Variables__

N = Nominal: A discrete set of categories with different names.

-----(ex. Pop categories: Coke, Sprite, Dr. Pepper)

O = Ordinal: A set of categories ordered by sequence.

----- (ex. best, better, fair, worse, worst)

I = Interval: Ordered categories with exact, equal distances between them. NO real zero.

----- (ex. Temp in °F with the same spacing: 10, 20, 30, 40, 50...) (0° does not mean "no temperature," so there is NO real zero value)

R = Ratio: A numerical scale with a true zero.

----- (ex. $0.00 really does equal no money, while 0°F is not a true zero temperature)

*Later on this information will be important to determine how to collect data and what type of scale is best for the given situation.*

## 6) Basic Calculations using Basic Symbols

Order of Operations:

Within each step, work from left to right.

1) Calculations within parentheses

2) Exponents (squaring, etc.)

3) Multiply and Divide (* and /)

4) Summation (∑= sum)

5) Any other Addition or Subtraction (+ and -)
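The order matters most when summation is mixed with squaring or adding. Here is a small sketch with made-up scores. The variable names are my own and only describe each expression.

```python
X = [1, 2, 3]  # made-up scores

sum_of_squares = sum(x ** 2 for x in X)    # ∑X²: square each score first, then sum -> 14
square_of_sum = sum(X) ** 2                # (∑X)²: parentheses first, so sum, then square -> 36
sum_plus_one_each = sum(x + 1 for x in X)  # ∑(X+1): add inside the parentheses, then sum -> 9
sum_then_add = sum(X) + 1                  # ∑X + 1: summation comes before the leftover addition -> 7

print(sum_of_squares, square_of_sum, sum_plus_one_each, sum_then_add)
```

Notice that ∑X² and (∑X)² use the same scores but give very different answers.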

Calculating Mean:

M=(∑X)/n

Calculating Sum of squared deviations:

SS = ∑(X-µ)² or

SS = ∑X² - (∑X)²/N

*Both will result in the same answer.*
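A quick way to convince yourself that the two SS formulas agree is to compute both on the same scores. This is a sketch with a made-up population of five scores. `ss_definitional` and `ss_computational` are my own names for the two versions.

```python
X = [1, 2, 3, 4, 5]  # made-up population of scores
N = len(X)
mu = sum(X) / N      # population mean = 3.0

# Definitional formula: sum of squared deviations from the mean.
ss_definitional = sum((x - mu) ** 2 for x in X)

# Computational formula: sum of squared scores minus (sum of scores) squared over N.
ss_computational = sum(x ** 2 for x in X) - sum(X) ** 2 / N

print(ss_definitional, ss_computational)  # both are 10.0
```

The second version is usually easier by hand when the mean is not a whole number.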

Calculating Variance:

σ² = SS/N

s² = SS/(n-1)
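The only difference between the two formulas is the number you divide by. The sketch below treats the same made-up scores first as a whole population and then as a sample, and compares the hand formulas with `statistics.pvariance` and `statistics.variance` from Python's standard library.

```python
import statistics

X = [1, 2, 3, 4, 5]  # made-up scores
mean = sum(X) / len(X)
SS = sum((x - mean) ** 2 for x in X)  # sum of squared deviations = 10.0

population_variance = SS / len(X)     # divide by N     -> 2.0
sample_variance = SS / (len(X) - 1)   # divide by n - 1 -> 2.5

print(population_variance, statistics.pvariance(X))
print(sample_variance, statistics.variance(X))
```

The standard deviation is the square root of the variance in both cases.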

Calculating Standard Error:

σM = σ/(sqrt of n)

sM = s/(sqrt of n)
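Here is a minimal sketch of the estimated standard error, using a made-up sample and with n standing for the number of scores in the sample. `statistics.stdev` gives the sample standard deviation s (the n-1 version).

```python
import math
import statistics

sample = [2, 4, 4, 5, 7, 8]    # made-up sample scores
n = len(sample)

s = statistics.stdev(sample)   # sample standard deviation (divides SS by n - 1)
s_M = s / math.sqrt(n)         # estimated standard error of the mean

print(f"s = {s:.3f}, sM = {s_M:.3f}")
```

Because n is in the denominator, a larger sample gives a smaller standard error.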

## 7) Understanding Variance

Variability: A measurement to show the degree to which the scores or data are spread out or clustered in a distribution.

---Variability is a good way to describe a distribution in terms of distance. (ex. Let's say that most adults are within a foot (12") of 5'5" tall. Variability would capture that distance, and it would represent the heights someone is most likely to have if they are part of that population. Some people are much taller, such as basketball players, and some are much shorter, but they are much rarer and would show up in this distribution as outlying values.)

In certain cases the values may be much closer to the mean or farther from the mean.

---If you look at weight versus height, you may find that the range of values for weight is much larger than the range of values for height. An adult could weigh between *90 lbs and 500 lbs*, while the height of a person is much more limited: *3'5" to 7'6"*.

*These numbers are only used to illustrate a point, not to accurately represent the actual range of weights or heights.*

## 8) Z test

__So what is a Z test?__

A "Z test" is a way to standardize each score in a distribution and then determine a relation between all the scores. (A way to know how a certain score compares to the other scores)

__You can use a Z test when:__

-- estimating a population parameter

-- and there is only one sample group

-- and there is only one score per subject in the group

-- and σ² is given or can be calculated

__Once a Z test is calculated then:__

nearly all scores fall between -4.00 and +4.00

µ = 0

σ = 1

__A Z score:__

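1) replaces the original score; the standardized distribution has a mean of 0 and a variance of 1.

2) does not change the shape of the distribution (skewed data stay skewed; standardizing does not make data normal)

3) is + or - (+ is above the mean and - is below the mean)

4) is a number that represents distance from the mean in standard deviation units (Z score mean = 0)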

__Calculating Z scores:__

Z = (X-µ)/σ
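A short sketch of the Z score formula, assuming the made-up scores below are the whole population (so `statistics.pstdev`, the population standard deviation, plays the role of σ). After standardizing, the scores have a mean of 0 and a standard deviation of 1.

```python
import statistics

X = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up population of scores

mu = statistics.mean(X)        # population mean = 5
sigma = statistics.pstdev(X)   # population standard deviation = 2.0

z = [(x - mu) / sigma for x in X]  # one Z score per original score

print(z)
print("mean of Z =", statistics.mean(z), " SD of Z =", statistics.pstdev(z))
```

A score of 9 becomes Z = +2.0 (two standard deviations above the mean), and a score of 2 becomes Z = -1.5.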

## 9) T Test

__What is a T Test?__

A "T Test" is very much like a "Z test" in that it standardizes scores in a distribution. However, a "T Test" uses the sample variance (s²) instead of the population variance (σ²).

__You can use a T test when:__

-- estimating a population parameter

-- and there is only one sample group

-- and there is only one score per subject in the group

-- and σ is NOT given or CANNOT be calculated

__Once a T test is calculated then:__

µ = 0

σ is close to 1 (a little larger when the sample is small)

__A T score is:__

1) + or - (+ is above the mean and - is below the mean)

2) a number that represents distance from the mean, which is equal to 0 (usually between -4.00 and +4.00)

__Calculating T scores:__

t=(M-µ) / sM

sM = s/(sqrt of n)
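Putting the two formulas together, here is a minimal sketch of a one-sample t calculation. The sample scores and the hypothesized population mean `mu = 4` are made up for illustration, and n is the number of scores in the sample.

```python
import math
import statistics

sample = [2, 4, 4, 5, 7, 8]  # made-up sample scores
mu = 4                       # hypothesized population mean (an assumption for this example)

n = len(sample)
M = statistics.mean(sample)  # sample mean = 5
s = statistics.stdev(sample) # sample standard deviation (n - 1 in the denominator)
s_M = s / math.sqrt(n)       # estimated standard error

t = (M - mu) / s_M           # t = (M - µ) / sM
df = n - 1                   # degrees of freedom

print(f"M = {M}, sM = {s_M:.3f}, t = {t:.3f}, df = {df}")
```

To decide whether this t is large enough to matter, you would compare it to a critical value from a t table for the same df.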

## 10) Statistics Resources

- Wikipedia-Statistics: More info about statistics!

- Statistics Flash Cards: Flash cards to help improve your knowledge of the concepts.

- Intro to Statistics (for the Social Sciences) #2: Coming Soon!!