Statistics Informed Decisions Using Data 4th edition by Michael Sullivan Chapters 16 Notes
Introduction to the Practice of Statistics
 What is statistics? Many people say that statistics is numbers. After all, we are bombarded by numbers that supposedly represent how we feel and who we are.
 For example, we hear on the radio that 50% of first marriages, 67% of second marriages, and 74% of third marriages end in divorce (Forest Institute of Professional Psychology, Springfield, MO).
 Another interesting consideration about the “facts” we hear or read is that two different sources can report two different results.
 For example, a September 10–11, 2010, poll conducted by CBS News and the New York Times indicated that 70% of Americans disapproved of the job that Congress was doing. However, a September 13–16, 2010, Gallup poll indicated that 77% of Americans disapproved of the job that Congress was doing.
 Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions. This definition can be broken into four parts.
 The first part states that statistics involves the collection of information.
 The second refers to the organization and summarization of information.
 The third states that the information is analyzed to draw conclusions or answer specific questions.
 The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
 The information in the definition is data, which the American Heritage Dictionary defines as “a fact or proposition used to draw a conclusion or make a decision.”
 Data can be numerical, as in height, or nonnumeric, as in gender. In either case, data describe characteristics of an individual.
 Data are important in statistics because data are used to draw a conclusion or make a decision.
 Analysis of data can lead to powerful results. Data can be used to offset anecdotal claims, such as the suggestion that cellular telephones cause brain cancer. After carefully collecting, summarizing, and analyzing data regarding this phenomenon, it was determined that there is no link between cell phone usage and brain cancer.
 Anecdotal means that the information being conveyed is based on casual observation, not scientific research.
 Because data are powerful, they can be dangerous when misused. The misuse of data usually occurs when data are incorrectly obtained or analyzed.
 For example, radio or television talk shows regularly ask poll questions for which respondents must call in or use the Internet to supply their vote. Most likely, the individuals who are going to call in are those who have a strong opinion about the topic. This group is not likely to be representative of people in general, so the results of the poll are not meaningful. We need to be mindful of where the data come from.
 Even when data tell us that a relation exists, we need to investigate.
 For example, a study showed that breastfed children have higher IQs than those who were not breastfed. Does this study mean that a mother who breastfeeds her child will increase the child's IQ? Not necessarily. It may be that some factor other than breastfeeding contributes to the IQ of the children.
 In this case, it turns out that mothers who breastfeed generally have higher IQs than those who do not. Therefore, it may be genetics that leads to the higher IQ, not breastfeeding. This illustrates an idea in statistics known as the lurking variable.
 In statistics, we must consider lurking variables, because two variables are often influenced by a third variable. A good statistical study will have a way of dealing with lurking variables.
 A key aspect of data is that they vary. Consider the students in your classroom. Is everyone the same height? No. So, within groups there is variation. Now consider yourself. Do you eat the same amount of food each day? No.
 One goal of statistics is to describe and understand the sources of variation.
 Because of variability, the results that we obtain using data can vary.
 In a mathematics class, if Bob and Jane are asked to solve 3x + 5 = 11, they will both obtain x = 2 as the solution when they use the correct procedures.
 In a statistics class, if Bob and Jane are asked to estimate the average commute time for workers in Dallas, Texas, they will likely get different answers, even though they both use the correct procedure. The different answers occur because they likely surveyed different individuals, and these individuals have different commute times. Bob and Jane would get the same result if they both asked all commuters or the same commuters about their commutes, but how likely is this? In statistics, when a problem is solved, the results do not have 100% certainty. In statistics, we might say that we are 95% confident that the average commute time in Dallas, Texas, is between 20 and 23 minutes.
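 The Bob and Jane situation can be sketched with a small simulation. This is only an illustration under stated assumptions: the population of commute times is invented (a made-up distribution, not real Dallas data), and the sample size of 200 and the seeds are arbitrary choices.

```python
import random
import statistics

# Hypothetical population: commute times (minutes) for 100,000 workers.
# The distribution and its parameters are invented for illustration only.
rng = random.Random(1)
population = [max(1.0, rng.gauss(21.5, 8.0)) for _ in range(100_000)]


def estimate_mean_commute(population, n, seed):
    """Survey n randomly chosen workers and return their average commute."""
    sampler = random.Random(seed)
    sample = sampler.sample(population, n)  # sampling without replacement
    return statistics.mean(sample)


# Bob and Jane use the same correct procedure but survey different people.
bob = estimate_mean_commute(population, n=200, seed=10)
jane = estimate_mean_commute(population, n=200, seed=20)

# Asking every commuter gives a single answer that does not vary.
census_mean = statistics.mean(population)

print(f"Bob's estimate:  {bob:.2f} minutes")
print(f"Jane's estimate: {jane:.2f} minutes")
print(f"Population mean: {census_mean:.2f} minutes")
```

 The two estimates differ from each other and from the population mean even though the procedure is identical, which is why statistical results are reported with a level of confidence rather than as certainties.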
Problem Determine whether the following variables are qualitative or quantitative.
(a) Gender
(b) Temperature
(c) Number of days during the past week that a college student studied
(d) Zip code
Approach Quantitative variables are numerical measures such that meaningful arithmetic operations can be performed on the values of the variable. Qualitative variables describe an attribute or characteristic of the individual that allows researchers to categorize the individual.
Solution
(a) Gender is a qualitative variable because it allows a researcher to categorize the individual as male or female. Notice that arithmetic operations cannot be performed on these attributes.
(b) Temperature is a quantitative variable because it is numeric, and operations such as addition and subtraction provide meaningful results. For example, 70°F is 10°F warmer than 60°F.
(c) Number of days during the past week that a college student studied is a quantitative variable because it is numeric, and operations such as addition and subtraction provide meaningful results.
(d) Zip code is a qualitative variable because it categorizes a location. Notice that, even though they are numeric, adding or subtracting zip codes does not provide meaningful results.
 Scenario: You are walking down the street and notice that a person walking in front of you drops $100. Nobody seems to notice the $100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner? Suppose you wanted to use this scenario as a gauge of the morality of students at your school by determining the percent of students who would return the money. How might you do this?
 You could attempt to present the scenario to every student at the school, but this is likely to be difficult or impossible if the student body is large.
 A second possibility is to present the scenario to 50 students and use the results to make a statement about all the students at the school.
 The entire group to be studied is called the population. An individual is a person or object that is a member of the population being studied. A sample is a subset of the population that is being studied.
 In the $100 study, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.
 Suppose 39 of the 50 students stated that they would return the money to the owner.
 We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%.
 This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population.
 A statistic is a numerical summary of a sample. Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
 So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.
 Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.
 If we extend the results of our sample to the population, we are performing inferential statistics.
 The accuracy of a generalization always contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics always includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to account for the variability in our results.
 A parameter is a numerical summary of a population.
 Suppose the percentage of all students on your campus who own a car is 48.2%. This value represents a parameter because it is a numerical summary of a population. Suppose a sample of 100 students is obtained, and from this sample we find that 46% own a car. This value represents a statistic because it is a numerical summary of a sample.
 Variables are the characteristics of the individuals within the population. For example, recently my son and I planted a tomato plant in our backyard.
 The Process of Statistics
 Example 2 The Process of Statistics: Gun Ownership. The AP-National Constitution Center conducted a poll August 11–16, 2010, to learn whether adult Americans feel that existing gun-control laws infringe on the Second Amendment to the U.S. Constitution. The following statistical process allowed the researchers to conduct their study.
 1. Identify the research objective. The researchers wished to determine the percentage of adult Americans who believe gun-control laws infringe on the public's right to bear arms. Therefore, the population being studied was adult Americans.
 2. Collect the information needed to answer the question posed in (1). It is unreasonable to expect to survey the more than 200 million adult Americans to determine how they feel about gun-control laws. So the researchers surveyed a sample of 1007 adult Americans. Of those surveyed, 514 stated they believe existing gun-control laws infringe on the public's right to bear arms.
 3. Describe the data. Of the 1007 individuals in the survey, 51% (= 514/1007) believe existing gun-control laws infringe on the public's right to bear arms. This is a descriptive statistic because its value is determined from a sample.
 4. Perform inference. The researchers at the AP-National Constitution Center wanted to extend the results of the survey to all adult Americans. Remember, when generalizing results from a sample to a population, the results are uncertain. To account for this uncertainty, researchers reported a 3% margin of error. This means that the researchers feel fairly certain (in fact, 95% certain) that the percentage of all adult Americans who believe existing gun-control laws infringe on the public's right to bear arms is somewhere between 48% (51% − 3%) and 54% (51% + 3%).
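 The arithmetic in steps 3 and 4 can be checked with a short sketch. The notes do not say how the pollsters computed their margin of error; the code below assumes the common normal-approximation formula for a sample proportion with a 95% critical value of 1.96, which treats the 1007 respondents as a simple random sample.

```python
import math

surveyed = 1007  # sample size from the poll
agree = 514      # respondents who believe the laws infringe

# Step 3: descriptive statistic (sample proportion).
p_hat = agree / surveyed

# Step 4: margin of error under the ASSUMED normal-approximation formula,
# z * sqrt(p_hat * (1 - p_hat) / n), with z = 1.96 for 95% confidence.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / surveyed)

lower = p_hat - margin
upper = p_hat + margin

print(f"Sample proportion: {p_hat:.1%}")
print(f"Margin of error:   {margin:.1%}")
print(f"95% interval:      {lower:.1%} to {upper:.1%}")
```

 Under these assumptions the margin works out to roughly 3 percentage points, consistent with the reported interval of about 48% to 54%.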
 Qualitative, or categorical, variables allow for classification of individuals based on some attribute or characteristic.
 Quantitative variables provide numerical measures of individuals. The values of a quantitative variable can be added or subtracted and provide meaningful results.
 Many examples in this text will include a suggested approach, or a way to look at and organize a problem so that it can be solved. The approach will be a suggested method of attack toward solving the problem. This does not mean that the approach given is the only way to solve the problem, because many problems have more than one approach leading to a correct solution.
 Typically, there is more than one correct approach to solving a problem. For example, if you turn the key in your car's ignition and it doesn't start, one approach would be to look under the hood to try to determine what is wrong. (Of course, this approach will work only if you know how to fix cars.) A second, equally valid approach would be to call an automobile mechanic to service the car.
 Example Distinguishing between Qualitative and Quantitative Variables

 1. Identify the research objective. A researcher must determine the question(s) he or she wants answered. The question(s) must be detailed so that it identifies the population that is to be studied.
 2. Collect the data needed to answer the question(s) posed in (1). Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.
 3. Describe the data. Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.
 4. Perform inference. Apply the appropriate techniques to extend the results obtained from the sample to the population and report a level of reliability of the results.
 Caution: Many nonscientific studies are based on convenience samples, such as Internet surveys or phone-in polls. The results of any study performed using these types of sampling methods are not reliable.
Problem: Determine whether the quantitative variables are discrete or continuous.
(a) The number of heads obtained after flipping a coin five times.
(b) The number of cars that arrive at a McDonald's drive-thru between 12:00 P.M. and 1:00 P.M.
(c) The distance a 2011 Toyota Prius can travel in city driving conditions with a full tank of gas.
Approach A variable is discrete if its value results from counting. A variable is continuous if its value is measured.
Solution
(a) The number of heads obtained by flipping a coin five times is a discrete variable because we can count the number of heads obtained. The possible values of this discrete variable are 0, 1, 2, 3, 4, 5.
(b) The number of cars that arrive at a McDonald's drive-thru between 12:00 P.M. and 1:00 P.M. is a discrete variable because we find its value by counting the cars. The possible values of this discrete variable are 0, 1, 2, 3, 4, and so on. Notice that this number has no upper limit.
(c) The distance traveled is a continuous variable because we measure the distance (miles, feet, inches, and so on).
 A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. The term countable means that the values result from counting, such as 0, 1, 2, 3, and so on. A discrete variable cannot take on every possible value between any two possible values.
 A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. A continuous variable may take on every possible value between any two values.
 Example Distinguishing between Discrete and Continuous Variables

 Continuous variables are often rounded.
 For example, if a certain make of car gets 24 miles per gallon (mpg) of gasoline, its miles per gallon must be greater than or equal to 23.5 and less than 24.5, or 23.5 ≤ mpg < 24.5.
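 The rounding idea can be written as a small helper. This is a sketch only: the function names are hypothetical, and it assumes the reported value was rounded to the nearest whole unit with halves rounded up, as the inequality above implies.

```python
def rounding_interval(reported, unit=1.0):
    """Return (low, high) such that low <= true value < high
    for a value reported after rounding to the nearest `unit`."""
    half = unit / 2
    return reported - half, reported + half


def is_consistent(true_value, reported, unit=1.0):
    """Check whether a true value would be reported as `reported`."""
    low, high = rounding_interval(reported, unit)
    return low <= true_value < high


low, high = rounding_interval(24)
print(f"{low} <= mpg < {high}")
print(is_consistent(23.5, 24))  # lower endpoint is included
print(is_consistent(24.5, 24))  # upper endpoint is excluded
```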
 The type of variable (qualitative, discrete, or continuous) dictates the methods that can be used to analyze the data.
 The list of observed values for a variable is data. Gender is a variable; the observations male and female are data. Qualitative data are observations corresponding to a qualitative variable. Quantitative data are observations corresponding to a quantitative variable. Discrete data are observations corresponding to a discrete variable. Continuous data are observations corresponding to a continuous variable.
Problem For each of the following variables, determine the level of measurement.
(a) Gender
(b) Temperature
(c) Number of days during the past week that a college student studied
(d) Letter grade earned in your statistics class
Approach For each variable, we ask the following: Does the variable simply categorize each individual? If so, the variable is nominal. Does the variable categorize and allow ranking of each value of the variable? If so, the variable is ordinal. Do differences in values of the variable have meaning, but a value of zero does not mean the absence of the quantity? If so, the variable is interval. Do ratios of values of the variable have meaning and there is a natural zero starting point? If so, the variable is ratio.
Solution
(a) Gender is a variable measured at the nominal level because it only allows for categorization of male or female. Plus, it is not possible to rank gender classifications.
(b) Temperature is a variable measured at the interval level because differences in the value of the variable make sense. For example, 70°F is 10°F warmer than 60°F. Notice that the ratio of temperatures does not represent a meaningful result. For example, 60°F is not twice as warm as 30°F. In addition, 0°F does not represent the absence of heat.
 A variable is at the nominal level of measurement if the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.
 The word nominal comes from the Latin word nomen, which means to name. When you see the word ordinal, think order.
 A variable is at the ordinal level of measurement if it has the properties of the nominal level of measurement. However, the naming scheme allows for the values of the variable to be arranged in a ranked or specific order.
 A variable is at the interval level of measurement if it has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.
 A variable is at the ratio level of measurement if it has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.
Observational Studies versus Designed Experiments
 Once our research question is developed, we must develop methods for obtaining the data that can be used to answer the questions posed in our research objective. There are two methods for collecting data: observational studies and designed experiments. To see the difference between these two methods, read the following two studies.
 Example 1 Cellular Phones and Brain Tumors: Researcher Elisabeth Cardis and her colleagues wanted “to determine whether mobile phone use increases the risk of [brain] tumors.” To do so, the researchers identified 5117 individuals from 13 countries who were 30–59 years of age who had brain tumors diagnosed between 2000 and 2004 and matched them with 5634 individuals who did not have brain tumors. The matching was based on age, gender, and region of residence. Both the individuals with tumors and the matched individuals were interviewed to learn about past mobile phone use, as well as sociodemographic background, medical history, and smoking status. The researchers found no significant difference in cell phone use between the two groups. The researchers concluded there is “no increased risk of brain tumors observed in association with use of mobile phones.”
 In Example 1 the study was conducted on humans, while the study in Example 2 was conducted on rats. However, there is a bigger difference. In Example 1, no attempt was made to influence the individuals in the study. The researchers simply interviewed people to determine their historical use of cell phones. No attempt was made to influence the value of the explanatory variable, radiofrequency exposure (cell phone use). Because the researchers simply recorded the past behavior of the participants, the study in Example 1 is an observational study.
 Example 2 Cellular Phones and Brain Tumors: Researchers Joseph L. Roti Roti and associates examined “whether chronic exposure to radio frequency (RF) radiation at two common cell phone signals—835.62 megahertz, a frequency used by analogue cell phones, and 847.74 megahertz, a frequency used by digital cell phones—caused brain tumors in rats.” To do so, the researchers randomly divided 480 rats into three groups. The rats in group 1 were exposed to the analogue cell phone frequency; the rats in group 2 were exposed to the digital frequency; the rats in group 3 served as controls and received no radiation. The exposure was done for 4 hours a day, 5 days a week for 2 years. The rats in all three groups were treated the same, except for the RF exposure. After 505 days of exposure, the researchers reported the following after analyzing the data. “We found no statistically significant increases in any tumor type, including brain, liver, lung or kidney, compared to the control group.”
 In both studies, the goal was to determine if radio frequencies from cell phones increase the risk of contracting brain tumors. Whether or not brain cancer was contracted is the response variable. The level of cell phone usage is the explanatory variable. In research, we wish to determine how varying the amount of an explanatory variable affects the value of a response variable.
 An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study. Observational studies do not allow a researcher to claim causation, only association.
 So why ever conduct an observational study if we can't claim causation? Often, it is unethical to conduct an experiment. Consider the link between smoking and lung cancer. In a designed experiment to determine if smoking causes lung cancer in humans, a researcher would divide a group of volunteers into group 1, who would smoke a pack of cigarettes every day for the next 10 years, and group 2, who would not smoke. In addition, eating habits, sleeping habits, and exercise would be controlled so that the only difference between the two groups was smoking. After 10 years the researcher would compare the proportion of participants who contract lung cancer in the smoking group to the proportion in the nonsmoking group. If the two proportions differ significantly, it could be said that smoking causes cancer. This designed experiment is able to control many of the factors that might affect whether one contracts lung cancer and that would not be controlled in an observational study. However, it is a very unethical study.
 If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group, the study is a designed experiment. Designed experiments are used whenever control of certain variables is possible and desirable. This type of research allows the researcher to identify certain cause and effect relationships among the variables in the study.
 Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study. Confounding is potentially a major problem with observational studies. Often, the cause of confounding is a lurking variable.
 A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables considered in the study.
 There are three major categories of observational studies: (1) cross-sectional studies, (2) case-control studies, and (3) cohort studies.
 Cross-sectional Studies These observational studies collect information about individuals at a specific point in time or over a very short period of time.
 For example, a researcher might want to assess the risk associated with smoking by looking at a group of people, determining how many are smokers, and comparing the rate of lung cancer of the smokers to the nonsmokers.
 An advantage of cross-sectional studies is that they are cheap and quick to do. However, they have limitations. For our lung cancer study, individuals might develop cancer after the data are collected, so our study will not give the full picture.
 Case-control Studies These studies are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. In case-control studies, individuals who have a certain characteristic may be matched with those who do not.
 For example, we might match individuals who smoke with those who do not. When we say “match” individuals, we mean that we would like the individuals in the study to be as similar (homogeneous) as possible in terms of demographics and other variables that may affect the response variable. Once homogeneous groups are established, we would ask the individuals in each group how much they smoked over the past 25 years. The rate of lung cancer between the two groups would then be compared.
 A disadvantage to this type of study is that it requires individuals to recall information from the past. It also requires the individuals to be truthful in their responses. An advantage of case-control studies is that they can be done relatively quickly and inexpensively.
 Cohort Studies A cohort study first identifies a group of individuals to participate in the study (the cohort). The cohort is then observed over a long period of time. During this period, characteristics about the individuals are recorded and some individuals will be exposed to certain factors (not intentionally) and others will not. At the end of the study the value of the response variable is recorded for the individuals.
 Typically, cohort studies require many individuals to participate over long periods of time. Because the data are collected over time, cohort studies are prospective. Another problem with cohort studies is that individuals tend to drop out due to the long time frame. This could lead to misleading results. Cohort studies are the most powerful of the observational studies.
 A census is a list of all individuals in a population along with certain characteristics of each individual.
 The United States conducts a census every 10 years to learn the demographic makeup of the United States. Everyone whose usual residence is within the borders of the United States must fill out a questionnaire packet. The cost of obtaining the census in 2010 was approximately $5.4 billion; about 635,000 temporary workers were hired to assist in collecting the data.
 Why is the U.S. Census so important? The results of the census are used to determine the number of representatives in the House of Representatives in each state, congressional districts, distribution of funds for government programs (such as Medicaid), and planning for the construction of schools and roads. The first census of the United States was obtained in 1790 under the direction of Thomas Jefferson. It is a constitutional mandate that a census be conducted every 10 years.

 Random sampling is the process of using chance to select individuals from a population to be included in the sample.
 For the results of a survey to be reliable, the characteristics of the individuals in the sample must be representative of the characteristics of the individuals in the population. The key to obtaining a sample representative of a population is to let chance or randomness play a role in dictating which individuals are in the sample, rather than convenience. If convenience is used to obtain a sample, the results of the survey are meaningless.
 A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample. The number of individuals in the sample is always less than the number of individuals in the population.
 Simple random sampling is like selecting names from a hat.
 Often, however, the size of the population is so large that performing simple random sampling in this fashion is not practical. Typically, each individual in the population is assigned a unique number between 1 and N, where N is the size of the population. Then n distinct random numbers from this list are selected, where n represents the size of the sample. To number the individuals in the population, we need a frame—a list of all the individuals within the population.
 In a sample without replacement, an individual who is selected is removed from the population and cannot be chosen again. In a sample with replacement, a selected individual is placed back into the population and could be chosen a second time. We use sampling without replacement so that we don't select the same client twice.
 Step 1 Table 3 shows the list of clients. We arrange them in alphabetical order, although this is not necessary, and number them from 01 to 30.
 Step 2 A table of random numbers can be used to select the individuals to be in the sample. See Table 4. We pick a starting place in the table by closing our eyes and placing a finger on it. This method accomplishes the goal of being random. Suppose we start in column 4, row 13. Because our data have two digits, we select two-digit numbers from the table using columns 4 and 5. We select numbers between 01 and 30, inclusive, and skip 00, numbers greater than 30, and numbers already selected.
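 The selection rule in Step 2 can be imitated in code. This is a sketch, not the textbook's procedure: a pseudorandom generator stands in for the printed table of random numbers, the function name is hypothetical, and the sample size of 5 is an assumption because these notes do not state how many clients are to be chosen.

```python
import random


def sample_from_two_digit_stream(population_size, sample_size, seed=None):
    """Draw two-digit numbers (00-99) one at a time, keeping only those
    that label an individual (01..population_size) not already chosen."""
    if not 0 < sample_size <= population_size <= 99:
        raise ValueError("need 0 < sample_size <= population_size <= 99")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(100)          # stands in for reading the table
        if number == 0 or number > population_size:
            continue                         # skip 00 and numbers above 30
        if number in chosen:
            continue                         # skip numbers already selected
        chosen.append(number)
    return chosen


# 30 clients numbered 01 to 30; the sample size of 5 is assumed.
selected = sample_from_two_digit_stream(population_size=30, sample_size=5, seed=4)
print([f"{n:02d}" for n in selected])
```

 In practice, Python's random.sample(range(1, 31), 5) gives a simple random sample without replacement directly; the loop above is written out only to mirror the skip rules described for the table.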
Problem A sociologist wants to gather data regarding household income within the city of Boston. Obtain a sample using cluster sampling.
Approach The city of Boston can be set up so that each city block is a cluster. Once the city blocks have been identified, we obtain a simple random sample of the city blocks and survey all households on the blocks selected.
Solution Suppose there are 10,493 city blocks in Boston. First, the sociologist must number the blocks from 1 to 10,493. Suppose the sociologist has enough time and money to survey 20 clusters (city blocks). The sociologist should obtain a simple random sample of 20 numbers between 1 and 10,493 and survey all households from the clusters selected. Cluster sampling is a good choice in this example because it reduces the travel time between households that is likely to occur with both simple random sampling and stratified sampling. In addition, there is no need to obtain a frame of all the households with cluster sampling. The only frame needed is one that provides information regarding city blocks.
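 The cluster sampling procedure in this solution can be sketched as follows. The block counts come from the example, but the household lists are invented placeholders and the function name is hypothetical; a real study would use an actual frame of city blocks.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters by simple random sampling, then include
    every individual in each selected cluster."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    surveyed = [member for cid in chosen_ids for member in clusters[cid]]
    return chosen_ids, surveyed


# Hypothetical frame: blocks numbered 1..10,493, each with 5-40 made-up households.
setup = random.Random(0)
blocks = {
    block: [f"block{block}-house{h}" for h in range(1, setup.randint(5, 40) + 1)]
    for block in range(1, 10_494)
}

chosen_blocks, households = cluster_sample(blocks, n_clusters=20, seed=42)
print("Blocks selected:", sorted(chosen_blocks))
print("Households to survey:", len(households))
```

 Note that randomness enters only when choosing the blocks; within a selected block, every household is surveyed.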
 First, we must determine whether the individuals within the proposed cluster are homogeneous (similar individuals) or heterogeneous (dissimilar individuals). In Example 3, city blocks tend to have similar households. Survey responses from houses on the same city block are likely to be similar. This results in duplicate information. We conclude that if the clusters have homogeneous individuals it is better to have more clusters with fewer individuals in each cluster.
 What if the cluster is heterogeneous? Under this circumstance, the heterogeneity of the cluster likely resembles the heterogeneity of the population. In other words, each cluster is a scaled-down representation of the overall population. For example, a quality-control manager might use shipping boxes that contain 100 light bulbs as a cluster, since the rate of defects within the cluster would resemble the rate of defects in the population, assuming the bulbs are randomly placed in the box. Thus, when each cluster is heterogeneous, fewer clusters with more individuals in each cluster are appropriate.
 Stratified and cluster samples are different. In a stratified sample, we divide the population into two or more homogeneous groups. Then we obtain a simple random sample from each group. In a cluster sample, we divide the population into groups, obtain a simple random sample of some of the groups, and survey all individuals in the selected groups.
 A stratified sample is obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way.
 For example, suppose Congress was considering a bill that abolishes estate taxes. In an effort to determine the opinion of her constituency, a senator asks a pollster to conduct a survey within her state. The pollster may divide the population of registered voters within the state into three strata: Republican, Democrat, and Independent. This grouping makes sense because the members within each of the three parties may have the same opinion regarding estate taxes, but opinions among parties may differ. The main criterion in performing a stratified sample is that each group (stratum) must have a common attribute that results in the individuals being similar within the stratum.
 Stratum is singular, while strata is plural. The word strata means divisions. So a stratified sample is a simple random sample of different divisions of the population.
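A stratified sample like the one in the estate tax example might be drawn as follows. This is a minimal sketch: the voter list is made up for illustration, and the names `stratified_sample` and `stratum_of` are assumptions rather than anything defined in the text.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take a simple random sample from each."""
    rng = random.Random(seed)
    strata = {}
    for individual in population:
        # group individuals by the attribute they share (their stratum)
        strata.setdefault(stratum_of(individual), []).append(individual)
    # draw without replacement inside every stratum
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}

# Hypothetical frame of 30 registered voters as (id, party) pairs
parties = ["Republican", "Democrat", "Independent"]
voters = list(enumerate(parties * 10))

picked = stratified_sample(voters, lambda voter: voter[1], 3, seed=2)
for party, members in picked.items():
    print(party, members)
```

Note that every stratum contributes individuals to the sample, which is the key difference from a cluster sample, where only some groups are selected.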

 A systematic sample is obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
 Because systematic sampling does not require a frame, it is a useful technique when you cannot obtain a list of the individuals in the population.
 To obtain a systematic sample, select a number k, randomly select a number between 1 and k and survey that individual, then survey every kth individual thereafter. For example, we might decide to survey every 8th individual (k = 8). We randomly select a number between 1 and 8, such as 5. This means we survey the 5th, 5 + 8 = 13th, 13 + 8 = 21st, 21 + 8 = 29th, and so on, individuals until we reach the desired sample size.
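The positions surveyed in a systematic sample follow a simple arithmetic pattern, sketched below. The function name `systematic_positions` and its parameters are illustrative assumptions; the starting value is fixed at 5 here only to reproduce the example above.

```python
import random

def systematic_positions(k, sample_size, start=None, seed=None):
    """Positions surveyed: a random start between 1 and k, then every kth individual."""
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both endpoints
    return [start + i * k for i in range(sample_size)]

# k = 8 with a starting value of 5, as in the example
positions = systematic_positions(k=8, sample_size=4, start=5)
print(positions)  # [5, 13, 21, 29]
```

Leaving `start` unset lets the function choose the first individual at random, which is what the procedure calls for in practice.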

 A cluster sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.
 Suppose a school administrator wants to learn the characteristics of students enrolled in online classes. Rather than obtaining a simple random sample based on the frame of all students enrolled in online classes, the administrator could treat each online class as a cluster and then obtain a simple random sample of these clusters. The administrator would then survey all the students in the selected clusters. This reduces the number of classes that get surveyed.
 Imagine a mall parking lot. Each subsection of the lot could be a cluster.
 Have you ever been stopped in the mall by someone holding a clipboard? These folks are responsible for gathering information, but their methods of data collection are inappropriate, and the results of their analysis are suspect because they obtained their data using a convenience sample.
 A convenience sample is a sample in which the individuals are easily obtained and not based on randomness.
 Studies that use convenience sampling generally have results that are suspect. The results should be looked on with extreme skepticism.
 The most popular of the many types of convenience samples are those in which the individuals in the sample are self-selected (the individuals themselves decide to participate in a survey). These are also called voluntary response samples.
 One example of self-selected sampling is phone-in polling; a radio personality will ask his or her listeners to phone the station to submit their opinions.
 Another example is the use of the Internet to conduct surveys. For example, a television news show will present a story regarding a certain topic and ask its viewers to “tell us what you think” by completing a questionnaire online or phoning in an opinion.
 Both of these samples are poor designs because the individuals who decide to be in the sample generally have strong opinions about the topic. A more typical individual in the population will not bother phoning or logging on to a computer to complete a survey. Any inference made regarding the population from this type of sample should be made with extreme caution.
 Convenience samples yield unreliable results because the individuals participating in the survey are not chosen using random sampling. Instead, the interviewer or participant selects who is in the survey. Would an interviewer select an ornery individual? Of course not! Therefore, the sample is likely not to be representative of the population.

 As an example of multistage sampling, consider Nielsen Media Research. Nielsen randomly selects households and monitors the television programs these households are watching through a People Meter. The meter is an electronic box placed on each TV within the household. The People Meter measures what program is being watched and who is watching it. Nielsen selects the households with the use of a two-stage sampling process.
 Stage 1 Using U.S. Census data, Nielsen divides the country into geographic areas (strata). The strata are typically city blocks in urban areas and geographic regions in rural areas. About 6000 strata are randomly selected.
 Stage 2 Nielsen sends representatives to the selected strata and lists the households within the strata. The households are then randomly selected through a simple random sample.
 As another example of multistage sampling, consider the sample used by the Census Bureau for the Current Population Survey. This survey requires five stages of sampling:
 Stage 1 Stratified sample
 Stage 2 Cluster sample
 Stage 3 Stratified sample
 Stage 4 Cluster sample
 Stage 5 Systematic sample
 Throughout our discussion of sampling, we did not mention how to determine the sample size. Determining the sample size is key in the overall statistical process. Researchers need to know how many individuals they must survey to draw conclusions about the population within some predetermined margin of error. They must find a balance between the reliability of the results and the cost of obtaining these results. The bottom line is that time and money determine the level of confidence researchers will place on the conclusions drawn from the sample data. The more time and money researchers have available, the more accurate the results of the statistical inference.
 Problem Lipitor is a cholesterol-lowering drug made by Pfizer. In the Collaborative Atorvastatin Diabetes Study (CARDS), the effect of Lipitor on cardiovascular disease was assessed in 2838 subjects, ages 40 to 75, with type 2 diabetes, without prior history of cardiovascular disease. In this placebo-controlled, double-blind experiment, subjects were randomly allocated to either Lipitor 10 mg daily (1428) or placebo (1410) and were followed for 4 years. The response variable was the occurrence of any major cardiovascular event. Lipitor significantly reduced the rate of major cardiovascular events (83 events in the Lipitor group versus 127 events in the placebo group). There were 61 deaths in the Lipitor group versus 82 deaths in the placebo group.
(a) What does it mean for the experiment to be placebo-controlled?
(b) What does it mean for the experiment to be double-blind?
(c) What is the population for which this study applies? What is the sample?
(d) What are the treatments?
(e) What is the response variable? Is it qualitative or quantitative?
Approach We will apply the definitions just presented.
Solution
(a) The placebo is a medication that looks, smells, and tastes like Lipitor. The placebo control group serves as a baseline against which to compare the results from the group receiving Lipitor. The placebo is also used because people tend to behave differently when they are in a study. By having a placebo control group, the effect of this is neutralized.
(b) Since the experiment is double-blind, the subjects, as well as the individual monitoring the subjects, do not know whether the subjects are receiving Lipitor or the placebo. The experiment is double-blind so that the subjects receiving the medication do not behave differently from those receiving the placebo and so the individual monitoring the subjects does not treat those in the Lipitor group differently from those in the placebo group.
(c) The population is individuals from 40 to 75 years of age with type 2 diabetes without a prior history of cardiovascular disease. The sample is the 2838 subjects in the study.
(d) The treatments are 10 mg of Lipitor or a placebo daily.
(e) The response variable is whether the subject had any major cardiovascular event, such as a stroke, or not. It is a qualitative variable.
Bias in Sampling
 The goal of sampling is to obtain information about a population through a sample.
 If the results of the sample are not representative of the population, then the sample has bias.
 The word bias could mean to give preference to selecting some individuals over others; it could also mean that certain responses are more likely to occur in the sample than in the population.
 There are three sources of bias in sampling: sampling bias, nonresponse bias, and response bias.
 Sampling bias means that the technique used to obtain the sample's individuals tends to favor one part of the population over another. Any convenience sample has sampling bias because the individuals are not chosen through a random sample.
 Sampling bias also results due to undercoverage, which occurs when the proportion of one segment of the population is lower in a sample than it is in the population. Undercoverage can result if the frame used to obtain the sample is incomplete or not representative of the population. Some frames, such as the list of all registered voters, may seem easy to obtain; but even this frame may be incomplete since people who recently registered to vote may not be on the published list of registered voters.
 Sampling bias can lead to incorrect predictions. For example, the magazine Literary Digest predicted that Alfred M. Landon would defeat Franklin D. Roosevelt in the 1936 presidential election. The Literary Digest conducted a poll based on a list of its subscribers, telephone directories, and automobile owners. On the basis of the results, the Literary Digest predicted that Landon would win with 57% of the popular vote. However, Roosevelt won the election with about 62% of the popular vote.
 Nonresponse bias exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. Nonresponse can occur because individuals selected for the sample do not wish to respond or the interviewer was unable to contact them.
 Nonresponse bias can be controlled using callbacks. For example, if a mailed questionnaire was not returned, a callback might mean phoning the individual to conduct the survey. If an individual was not at home, a callback might mean returning to the home at other times in the day.
 Another method to reduce nonresponse is using rewards, such as cash payments for completing a questionnaire, or incentives such as a cover letter that states that the responses to the questionnaire will determine future policy.
 Response bias exists when the answers on a survey do not reflect the true feelings of the respondent. Response bias can occur in a number of ways.
 Interviewer Error A trained interviewer is essential to obtain accurate information from a survey. A skilled interviewer can elicit responses from individuals and make the interviewee feel comfortable enough to give truthful responses. For example, a good interviewer can obtain truthful answers to questions as sensitive as “Have you ever cheated on your taxes?”
 Misrepresented Answers Some survey questions result in responses that misrepresent facts or are flat-out lies. For example, a survey of recent college graduates may find that self-reported salaries are inflated. Also, people may overestimate their abilities. For example, ask people how many push-ups they can do in 1 minute, and then ask them to do the push-ups. How accurate were they?
 The wording of questions can significantly affect the responses and, therefore, the validity of a study.
 Wording of Questions The way a question is worded can lead to response bias in a survey, so questions must always be asked in balanced form. For example, the “yes/no” question
 Type of Question One of the first considerations in designing a question is determining whether the question should be open or closed.
 An open question allows the respondent to choose his or her response.
 A closed question requires the respondent to choose from a list of predetermined responses.
 Data-entry Error Although not technically a result of response bias, data-entry error will lead to results that are not representative of the population. Once data are collected, the results typically must be entered into a computer, which could result in input errors. For example, 39 may be entered as 93. It is imperative that data be checked for accuracy. In this text, we present some suggestions for checking for data error.
 Nonsampling errors result from undercoverage, nonresponse bias, response bias, or data-entry error. Such errors could also be present in a complete census of the population. Sampling error results from using a sample to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.
 We can think of sampling error as error that results from using a subset of the population to describe characteristics of the population. Nonsampling error is error that results from obtaining and recording the information collected.
1.6 The Design of Experiments
Describe the characteristics of an experiment
 An experiment is a controlled study conducted to determine the effect varying one or more explanatory variables or factors has on a response variable. Any combination of the values of the factors is called a treatment.
 In an experiment, the experimental unit is a person, object, or some other well-defined item upon which a treatment is applied. We often refer to the experimental unit as a subject when he or she is a person. The subject is analogous to the individual in a survey.
 The goal in an experiment is to determine the effect various treatments have on the response variable. For example, we might want to determine whether a new treatment is superior to an existing treatment (or no treatment at all). To make this determination, experiments require a control group. A control group serves as a baseline treatment that can be used to compare to other treatments.
 For example, a researcher in education might want to determine if students who do their homework using an online homework system do better on an exam than those who do their homework from the text. The students doing the text homework might serve as the control group (since this is the currently accepted practice). The factor is the type of homework. There are two treatments: online homework and text homework. A second method for defining the control group is through the use of a placebo. A placebo is an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication.
 Blinding refers to nondisclosure of the treatment an experimental unit is receiving. There are two types of blinding: single blinding and double blinding.
 In single-blind experiments, the experimental unit (or subject) does not know which treatment he or she is receiving.
 In double-blind experiments, neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.
 To design an experiment means to describe the overall plan in conducting the experiment. Conducting an experiment requires a series of steps.
 Step 1 Identify the Problem to Be Solved. The statement of the problem should be as explicit as possible and should provide the experimenter with direction. The statement must also identify the response variable and the population to be studied. Often, the statement is referred to as the claim.
 Step 2 Determine the Factors That Affect the Response Variable. The factors are usually identified by an expert in the field of study. In identifying the factors, ask, “What things affect the value of the response variable?” After the factors are identified, determine which factors to fix at some predetermined level, which to manipulate, and which to leave uncontrolled.
 Step 3 Determine the Number of Experimental Units. As a general rule, choose as many experimental units as time and money allow. Techniques (such as those in Sections 9.1 and 9.2) exist for determining sample size, provided certain information is available.
 Step 4 Determine the Level of Each Factor. There are two ways to deal with the factors: control or randomize.
 1. Control: There are two ways to control the factors.
 (a) Set the level of a factor at one value throughout the experiment (if you are not interested in its effect on the response variable).
 (b) Set the level of a factor at various levels (if you are interested in its effect on the response variable). The combinations of the levels of all varied factors constitute the treatments in the experiment.
 2. Randomize: Randomly assign the experimental units to various treatment groups so that the effect of factors whose levels cannot be controlled is minimized. The idea is that randomization averages out the effects of uncontrolled factors (explanatory variables). It is difficult, if not impossible, to identify all factors in an experiment. This is why randomization is so important. It mutes the effect of variation attributable to factors not controlled.
 Step 5 Conduct the Experiment.
 (a) Randomly assign the experimental units to the treatments. Replication occurs when each treatment is applied to more than one experimental unit. Using more than one experimental unit for each treatment ensures the effect of a treatment is not due to some characteristic of a single experimental unit. It is a good idea to assign an equal number of experimental units to each treatment.
 (b) Collect and process the data. Measure the value of the response variable for each replication. Then organize the results. The idea is that the value of the response variable for each treatment group is the same before the experiment because of randomization. Then any difference in the value of the response variable among the different treatment groups is a result of differences in the level of the treatment.
 Step 6 Test the Claim. This is the subject of inferential statistics. Inferential statistics is a process in which generalizations about a population are made on the basis of results obtained from a sample. Provide a statement regarding the level of confidence in the generalization.
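The random assignment in Step 5(a) can be sketched as follows, using the homework example from earlier in this section. The function name `assign_treatments` and the list of 20 students are hypothetical; the sketch shuffles the experimental units and deals them out in turn so that each treatment receives an equal number of units.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign experimental units to treatments in (nearly) equal numbers."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        # deal the shuffled units out to the treatments in rotation
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experimental units for the online versus text homework experiment
students = [f"student_{i}" for i in range(1, 21)]
groups = assign_treatments(students, ["online homework", "text homework"], seed=3)
for treatment, members in groups.items():
    print(treatment, len(members))
```

Because each treatment is applied to 10 students rather than one, the design also satisfies the replication requirement.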
Chapter 1 Summary
We defined statistics as a science in which data are collected, organized, summarized, and analyzed to infer characteristics regarding a population. Statistics also provides a measure of confidence in the conclusions that are drawn. Descriptive statistics consists of organizing and summarizing information, while inferential statistics consists of drawing conclusions about a population based on results obtained from a sample. The population is a collection of individuals about which information is desired and the sample is a subset of the population. Data are the observations of a variable. Data can be either qualitative or quantitative. Quantitative data are either discrete or continuous. Data can be obtained from four sources: a census, an existing source, an observational study, or a designed experiment. A census will list all the individuals in the population, along with certain characteristics. Due to the cost of obtaining a census, most researchers opt for obtaining a sample. In observational studies, the response variable is measured without attempting to influence its value. In addition, the explanatory variable is not manipulated. Designed experiments are used when control of the individuals in the study is desired to isolate the effect of a certain treatment on a response variable. We introduced five sampling methods: simple random sampling, stratified sampling, systematic sampling, cluster sampling, and convenience sampling. All the sampling methods, except for convenience sampling, allow for unbiased statistical inference to be made. Convenience sampling typically leads to an unrepresentative sample and biased results.
CHAPTER 2
1 Organize Qualitative Data in Tables
 A frequency distribution lists each category of data and the number of occurrences for each category of data.
 The relative frequency is the proportion (or percent) of observations within a category and is found using the formula $\text{Relative frequency} = \dfrac{\text{frequency}}{\text{sum of all frequencies}}$
 A relative frequency distribution lists each category of data together with the relative frequency.
 A frequency distribution shows the number of observations that belong in each category. A relative frequency distribution shows the proportion of observations that belong in each category.
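Both distributions can be built from raw qualitative data in a few lines. The sketch below uses made-up category labels, and the function name `frequency_tables` is an assumption for illustration.

```python
from collections import Counter

def frequency_tables(observations):
    """Return (frequency distribution, relative frequency distribution)."""
    freq = Counter(observations)   # number of occurrences of each category
    total = sum(freq.values())     # sum of all frequencies
    rel = {category: count / total for category, count in freq.items()}
    return dict(freq), rel

# Hypothetical raw data with three categories
data = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]
freq, rel = frequency_tables(data)
print(freq)
print(rel)
```

The relative frequencies always sum to 1 (or 100%), which is a useful check on the table.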
2 Construct Bar Graphs
 Once raw data are organized in a table, we can create graphs. Graphs allow us to see the data and help us understand what the data are saying about the individuals in the study.
 A bar graph is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category's frequency or relative frequency.
 Graphs that start the scale at some value other than 0 or have bars with unequal widths, bars with different colors, or three-dimensional bars can misrepresent the data.
 A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
 Suppose we want to know whether more people are finishing college today than in 1990. We could draw a side-by-side bar graph to compare the data for the two different years. Data sets should be compared by using relative frequencies, because different sample or population sizes make comparisons using frequencies difficult or misleading.
 So far we have only looked at bar graphs with vertical bars. However, the bars may also be horizontal. Horizontal bars are preferable when category names are lengthy.
3 Construct Pie Charts
 Pie charts are typically used to present the relative frequency of qualitative data. In most cases the data are nominal, but ordinal data can also be displayed in a pie chart.
 A pie chart is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
 When should a bar graph or a pie chart be used? Pie charts are useful for showing the division of all possible values of a qualitative variable into its parts. However, because angles are often hard to judge in pie charts, they are not as useful in comparing two specific values of the qualitative variable. Instead the emphasis is on comparing the part to the whole. Bar graphs are useful when we want to compare the different parts, not necessarily the parts to the whole. For example, to get the “big picture” regarding educational attainment in 2009, a pie chart is a good visual summary. However, to compare bachelor's degrees to high school diplomas, a bar graph is a good visual summary. Since bars are easier to draw and compare, some practitioners forgo pie charts in favor of Pareto charts when comparing parts to the whole.

2.2 Organizing Quantitative Data: The Popular Displays
 A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.
 Classes are categories into which data are grouped. When a data set consists of a large number of different discrete data values or when a data set consists of continuous data, we must create classes by using intervals of numbers.
 Creating the classes for summarizing continuous data is an art form. There is no such thing as the correct frequency distribution. However, there can be less desirable frequency distributions. The larger the class width, the fewer classes a frequency distribution will have.
 Choosing the Lower Class Limit of the First Class
 Choose the smallest observation in the data set or a convenient number slightly lower than the smallest observation in the data set. For example, in Table 12, the smallest observation is 3.22. A convenient lower class limit of the first class is 3.
 Determining the Class Width
 Decide on the number of classes. Generally, there should be between 5 and 20 classes. The smaller the data set, the fewer classes you should have. For example, we might choose 10 classes for the data in Table 12.
 Determine the class width by computing: $\text{Class width} \approx \dfrac{\text{largest data value} - \text{smallest data value}}{\text{number of classes}}$. Round this value up to a convenient number. For example, using the data in Table 12, we obtain class width $\approx \dfrac{12.03 - 3.22}{10} = 0.881$. We round this up to 1 because this is an easy number to work with. Rounding up may result in fewer classes than were originally intended.
 Rounding up is different from rounding off. For example, 6.2 rounded up would be 7, while 6.2 rounded off would be 6.
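The class width calculation, and the difference between rounding up and rounding off, can be checked with a short sketch. The function name `class_width` is an assumption, and the sketch treats "a convenient number" as the next whole number, which is only one reasonable choice.

```python
import math

def class_width(largest, smallest, num_classes):
    """Return the raw class width and the width rounded up to a whole number."""
    raw = (largest - smallest) / num_classes
    # math.ceil rounds up; assuming here that a whole number is "convenient"
    return raw, math.ceil(raw)

# Values quoted from Table 12 in the text: largest 12.03, smallest 3.22, 10 classes
raw, width = class_width(12.03, 3.22, 10)
print(round(raw, 3), width)

# Rounding up versus rounding off for 6.2
print(math.ceil(6.2), round(6.2))
```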
 A stem-and-leaf plot is another way to represent quantitative data graphically. In a stem-and-leaf plot (or stem plot), we use the digits to the left of the rightmost digit to form the stem. Each rightmost digit forms a leaf. For example, a data value of 147 would have 14 as the stem and 7 as the leaf.

 Construction of a Stem-and-Leaf Plot
 Step 1 The stem of a data value will consist of the digits to the left of the rightmost digit. The leaf of a data value will be the rightmost digit.
 Step 2 Write the stems in a vertical column in increasing order. Draw a vertical line to the right of the stems.
 Step 3 Write each leaf corresponding to the stems to the right of the vertical line.
 Step 4 Within each stem, rearrange the leaves in ascending order, title the plot, and include a legend to indicate what the values represent.
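The four steps above can be sketched in code. This is an illustration that assumes nonnegative whole-number data; the function names `stem_and_leaf` and `render` and the data values are hypothetical.

```python
def stem_and_leaf(values):
    """Map each stem to its leaves in ascending order (nonnegative integers assumed)."""
    plot = {}
    for value in sorted(values):         # sorting first keeps the leaves ordered (Step 4)
        stem, leaf = divmod(value, 10)   # Step 1: leaf is the rightmost digit
        plot.setdefault(stem, []).append(leaf)
    return plot

def render(plot):
    """Steps 2 and 3: stems in a column, a vertical line, then the leaves."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):   # keep stems that have no leaves
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical data values
scores = [147, 152, 139, 147, 160, 151, 138]
plot = stem_and_leaf(scores)
print(render(plot))
print("Legend: 14 | 7 represents 147")
```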
 One more graph! We draw a dot plot by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed. Though limited in usefulness, dot plots provide a quick picture of the data.
 A distribution is uniform when the frequency of each value of the variable is evenly spread out across the values of the variable.
 A distribution is bell-shaped when the highest frequency occurs in the middle and frequencies tail off to the left and right of the middle.
 A distribution is skewed right when the tail to the right of the peak is longer than the tail to the left of the peak.
 A distribution is skewed left when the tail to the left of the peak is longer than the tail to the right of the peak.
2.3 Additional Displays of Quantitative Data
1 Construct Frequency Polygons
 Another way to graphically represent quantitative data sets is through frequency polygons. They provide the same information as histograms.
 A class midpoint is the sum of consecutive lower class limits divided by 2.
 A frequency polygon is a graph that uses points, connected by line segments, to represent the frequencies for the classes. It is constructed by plotting a point above each class midpoint on a horizontal axis at a height equal to the frequency of the class. Next, line segments are drawn connecting consecutive points. Two additional line segments are drawn connecting each end of the graph with the horizontal axis.
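The class midpoints used as the x-coordinates of a frequency polygon can be computed directly from the lower class limits. The sketch below assumes all classes share the same width, so that the last class can borrow the lower limit of the class that would follow it; the function name `class_midpoints` and the limits are illustrative.

```python
def class_midpoints(lower_limits):
    """Midpoint of each class: sum of consecutive lower class limits divided by 2."""
    width = lower_limits[1] - lower_limits[0]   # assumes equal class widths
    # add the lower limit of the class that would come after the last one
    extended = list(lower_limits) + [lower_limits[-1] + width]
    return [(a + b) / 2 for a, b in zip(extended, extended[1:])]

# Hypothetical classes with lower limits 3, 4, 5, 6
midpoints = class_midpoints([3, 4, 5, 6])
print(midpoints)  # [3.5, 4.5, 5.5, 6.5]
```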

2 Create Cumulative Frequency and Relative Frequency Tables
 Since quantitative data can be ordered (that is, written in ascending or descending order), they can be summarized in a cumulative frequency distribution and a cumulative relative frequency distribution.
 A cumulative frequency distribution displays the aggregate frequency of the category. In other words, for discrete data, it displays the total number of observations less than or equal to the category. For continuous data, it displays the total number of observations less than or equal to the upper class limit of a class.
 A cumulative relative frequency distribution displays the proportion (or percentage) of observations less than or equal to the category for discrete data and the proportion (or percentage) of observations less than or equal to the upper class limit for continuous data.
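Both cumulative distributions are running totals of the class frequencies. The sketch below uses made-up frequencies, and the function name `cumulative_tables` is an assumption.

```python
from itertools import accumulate

def cumulative_tables(frequencies):
    """Return (cumulative frequencies, cumulative relative frequencies)."""
    cumulative = list(accumulate(frequencies))   # running total, class by class
    total = cumulative[-1]                       # last entry is the number of observations
    cumulative_relative = [c / total for c in cumulative]
    return cumulative, cumulative_relative

# Hypothetical frequencies for five classes listed in ascending order
class_freqs = [2, 5, 8, 4, 1]
cum_freq, cum_rel = cumulative_tables(class_freqs)
print(cum_freq)
print(cum_rel)
```

The final cumulative frequency equals the number of observations, and the final cumulative relative frequency equals 1.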
3 Construct Frequency and Relative Frequency Ogives
 An ogive (read as “oh-jīve”) is a graph that represents the cumulative frequency or cumulative relative frequency for the class. It is constructed by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative frequencies or cumulative relative frequencies of the class. Then line segments are drawn connecting consecutive points. An additional line segment is drawn connecting the first point to the horizontal axis at a location representing the upper limit of the class that would precede the first class (if it existed).
 We can construct a relative frequency ogive using the data in Table 20 by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative relative frequencies of the classes. We then connect the points with line segments.
4 Draw Time-Series Graphs
 If the value of a variable is measured at different points in time, the data are referred to as time-series data. The closing price of Cisco Systems stock at the end of each month for the past 12 years is an example of time-series data.
 A time-series plot is obtained by plotting the time in which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments are then drawn connecting the points.
2.4 Graphical Misrepresentations of Data
1 Describe What Can Make a Graph Misleading or Deceptive
 Statistics: The only science that enables different experts using the same figures to draw different conclusions.—EVAN ESAR
 Often, statistics gets a bad rap for having the ability to manipulate data to support any position. One method of distorting the truth is through graphics. Since graphics are so powerful, care must be taken in constructing graphics and in interpreting their messages. Graphics may mislead or deceive. We will call graphs misleading if they unintentionally create an incorrect impression. We consider graphs deceptive if they purposely create an incorrect impression. In either case, a reader's incorrect impression can have serious consequences. Therefore, it is important to be able to recognize misleading and deceptive graphs.
 The most common graphical misrepresentations of data involve the scale of the graph, an inconsistent scale, or a misplaced origin. Increments between tick marks should be constant, and scales for comparative graphs should be the same. Also, because readers usually assume that the baseline, or zero point, is at the bottom of the graph, a graph that begins at a higher or lower value can be misleading.
Chapter 2 Summary
 Raw data are first organized into tables. Data are organized by creating classes into which they fall. Qualitative data and discrete data have values that provide clear-cut categories of data. However, with continuous data the categories, called classes, must be created. Typically, the first table created is a frequency distribution, which lists the frequency with which each class of data occurs. Other types of distributions include the relative frequency distribution and the cumulative frequency distribution.
 Once data are organized into a table, graphs are created. For data that are qualitative, we can create bar charts and pie charts. For data that are quantitative, we can create histograms, stem-and-leaf plots, frequency polygons, and ogives.
 In creating graphs, care must be taken not to draw a graph that misleads or deceives the reader. If a graph's vertical axis does not begin at zero, a break symbol (a zigzag that looks like a lightning bolt) should be used to indicate the gap that exists in the graph.
Chapter 3
3.1 Measures of Central Tendency
1 Determine the Arithmetic Mean of a Variable from Raw Data
 In everyday language, the word average often represents the arithmetic mean. To compute the arithmetic mean of a set of data, the data must be quantitative.
 Whenever you hear the word average, be aware that the word may not always be referring to the mean. One average could be used to support one position, while another average could be used to support a different position.
 The arithmetic mean of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations.
 The arithmetic mean is generally referred to as the mean.
 The population arithmetic mean, μ (pronounced “mew”), is computed using all the individuals in a population. The population mean is a parameter.
 The sample arithmetic mean, $\bar{x}$ (pronounced “x-bar”), is computed using sample data. The sample mean is a statistic.
 To find the mean of a set of data, add up all the observations and divide by the number of observations.
 We usually use Greek letters to represent parameters and Roman letters to represent statistics. The formulas for computing population and sample means follow:
 If $x_1, x_2, \ldots, x_N$ are the N observations of a variable from a population, then the population mean, $\mu$, is
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N} \qquad (1)$$
 If $x_1, x_2, \ldots, x_n$ are n observations of a variable from a sample, then the sample mean, $\bar{x}$, is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \qquad (2)$$
 Note that N represents the size of the population, and n represents the size of the sample. The symbol Σ (the Greek letter capital sigma) tells us to add the terms. The subscript i shows that the various values are distinct and does not serve as a mathematical operation. For example, $x_1$ is the first data value, $x_2$ is the second, and so on.
 It helps to think of the mean of a data set as the center of gravity. In other words, the mean is the value such that a histogram of the data is perfectly balanced, with equal weight on each side of the mean.
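 As a quick check of Formulas (1) and (2), the sketch below computes a mean in Python. The data are the ten exam scores used elsewhere in these notes; the function name `arithmetic_mean` is only an illustrative choice and is not part of the text.

```python
def arithmetic_mean(values):
    """Add all observations and divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Exam scores of ten students (the exam-score data used in these notes)
scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
mean_score = arithmetic_mean(scores)
print(mean_score)  # 79.0
```

 The same function serves for a population or a sample; only the interpretation (parameter versus statistic) differs.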
2 Determine the Median of a Variable from Raw Data
 A second measure of central tendency is the median. To compute the median of a set of data, the data must be quantitative.
 The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.
 To help remember the idea behind the median, think of the median of a highway; it divides the highway in half. So the median divides the data in half, with at most half the data below the median and at most half above it.
 Steps in Finding the Median of a Data Set
 Step 1 Arrange the data in ascending order.
 Step 2 Determine the number of observations, n.
 Step 3 Determine the observation in the middle of the data set.
 • If the number of observations is odd, then the median is the data value exactly in the middle of the data set. That is, the median is the observation that lies in the $\frac{n+1}{2}$ position.
 • If the number of observations is even, then the median is the mean of the two middle observations in the data set. That is, the median is the mean of the observations that lie in the $\frac{n}{2}$ position and the $\frac{n}{2} + 1$ position.
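 The three steps above can be sketched in Python as follows. This is a minimal illustration, not the text's own procedure; the function name `median` and the two small data sets are made up for the example.

```python
def median(values):
    """Median by the three-step procedure: sort, count, pick the middle."""
    ordered = sorted(values)   # Step 1: arrange in ascending order
    n = len(ordered)           # Step 2: number of observations
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2               # Step 3: locate the middle
    if n % 2 == 1:
        # odd n: position (n + 1) / 2, which is zero-based index n // 2
        return ordered[mid]
    # even n: mean of the observations in positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```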
 Problem The data in Table 2 represent the length (in seconds) of a random sample of songs released in the 1970s. Find the median length of the songs.
Table 2
Song Name | Length (seconds)
“Sister Golden Hair” | 201
“Black Water” | 257
“Free Bird” | 284
“The Hustle” | 208
“Southern Nights” | 179
“Stayin’ Alive” | 222
“We Are Family” | 217
“Heart of Glass” | 206
“My Sharona” | 240

Approach We will follow the steps listed above.
Solution
 Step 1 Arrange the data in ascending order:
179, 201, 206, 208, 217, 222, 240, 257, 284
 Step 2 There are n = 9 observations.
 Step 3 Since n is odd, the median, M, is the observation exactly in the middle of the data set, 217 seconds (the $\frac{n+1}{2} = \frac{9+1}{2} = 5$th data value). We list the data in ascending order with the median shown in brackets.
179, 201, 206, 208, [217], 222, 240, 257, 284
Notice there are four observations on each side of the median.
 Problem Find the median score of the data in Table 1.
Approach We will follow the steps listed above.
Solution
Step 1 Arrange the data in ascending order:
62, 68, 71, 74, 77, 82, 84, 88, 90, 94
Step 2 There are n = 10 observations.
Step 3 Because n is even, the median is the mean of the two middle observations, the fifth ($\frac{n}{2} = \frac{10}{2} = 5$) and sixth ($\frac{n}{2} + 1 = \frac{10}{2} + 1 = 6$) observations with the data written in ascending order. So the median is the mean of 77 and 82:
$$M = \frac{77 + 82}{2} = 79.5$$
Notice that there are five observations on each side of the median, which falls between 77 and 82:
62, 68, 71, 74, 77, [M = 79.5], 82, 84, 88, 90, 94
We conclude that 50% (or half) of the students scored less than 79.5 and 50% (or half) of the students scored above 79.5.
3 Explain What It Means for a Statistic to Be Resistant
 Problem Yolanda wants to know how much time she typically spends on her cell phone. She goes to her phone's Web site and records the call length (in minutes) for a random sample of 12 calls, shown in Table 3. Find the mean and median length of a cell phone call. Which measure of central tendency better describes the length of a typical phone call?
Table 3
Source: Yolanda Sullivan's cell phone records
1 | 7 | 4 | 1
2 | 4 | 3 | 48
3 | 5 | 3 | 6

Approach We will find the mean and median using MINITAB. To help judge which is the better measure of central tendency, we will also draw a dot plot of the data.
Solution Figure 4 indicates that the mean call length is $\bar{x} = 7.3$ minutes and the median call length is 3.5 minutes. Figure 5 shows a dot plot of the data using MINITAB.
Figure 4 Descriptive Statistics: TalkTime
Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum
TalkTime | 12 | 0 | 7.25 | 3.74 | 12.96 | 1 | 2.25 | 3.5 | 5.75 | 48

Which measure of central tendency do you think better describes the typical call length? Since only one phone call is longer than the mean, we conclude that the mean is not representative of the typical call length. So the median is the better measure of central tendency.
 A numerical summary of data is said to be resistant if extreme values (very large or small) relative to the data do not affect its value substantially.
 So the median is resistant, while the mean is not resistant.
 When data are skewed, there are extreme values in the tail, which tend to pull the mean in the direction of the tail. For example, in skewed-right distributions, there are large observations in the right tail. These observations increase the value of the mean, but have little effect on the median. Similarly, if a distribution is skewed left, the mean tends to be smaller than the median. In symmetric distributions, the mean and the median are close in value.
 A word of caution is in order. The relations among the mean, median, and skewness are guidelines. The guidelines tend to hold up well for continuous data, but when the data are discrete, the rules can be easily violated.
 You may be asking yourself, “Why would I ever compute the mean?” After all, the mean and median are close in value for symmetric data, and the median is the better measure of central tendency for skewed data. The reason we compute the mean is that much of the statistical inference that we perform is based on the mean.
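 To see resistance concretely, the sketch below takes the nine song lengths from Table 2 and replaces the longest one with a far larger value. The replacement value of 2840 seconds is purely hypothetical and chosen only to exaggerate the effect; the calls to `statistics.mean` and `statistics.median` come from Python's standard library.

```python
import statistics

# Song lengths in seconds (Table 2)
song_lengths = [201, 257, 284, 208, 179, 222, 217, 206, 240]

# Hypothetical extreme value: pretend the 284-second song ran 2840 seconds
with_extreme = [2840 if x == 284 else x for x in song_lengths]

mean_before = statistics.mean(song_lengths)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(song_lengths)
median_after = statistics.median(with_extreme)

print(round(mean_before, 1), round(mean_after, 1))  # the mean jumps
print(median_before, median_after)                  # the median does not move
```

 The single extreme value drags the mean far to the right while the median stays at 217 seconds, which is what it means for the median to be resistant and the mean not to be.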
4 Determine the Mode of a Variable from Raw Data
 A third measure of central tendency is the mode, which can be computed for either quantitative or qualitative data.
 The mode of a variable is the most frequent observation of the variable that occurs in the data set.
 To compute the mode, tally the number of observations that occur for each data value. The data value that occurs most often is the mode. A set of data can have no mode, one mode, or more than one mode. If no observation occurs more than once, we say the data have no mode.
 Problem The following data represent the number of O-ring failures on the shuttle Columbia for its 17 flights prior to its fatal flight:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3
Find the mode number of O-ring failures.
Approach We tally the number of times we observe each data value. The data value with the highest frequency is the mode.
Solution The mode is 0 because it occurs most frequently (11 times).
 Problem Find the mode of the exam score data listed in Table 1, which is repeated here:
82, 77, 90, 71, 62, 68, 74, 84, 94, 88
Approach Tally the number of times we observe each data value. The data value with the highest frequency is the mode.
Solution Since each data value occurs only once, there is no mode.
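 The tallying approach used in the two examples above can be sketched in Python with `collections.Counter`. The function name `modes` is an illustrative assumption; it returns a list so that it can report no mode, one mode, or several modes.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)          # tally each data value
    highest = max(counts.values())    # largest frequency observed
    if highest == 1:
        return []                     # no observation occurs more than once
    return sorted(v for v, c in counts.items() if c == highest)


o_ring_failures = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3]
exam_scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]

print(modes(o_ring_failures))  # [0]
print(modes(exam_scores))      # []
```

 Because the function only counts occurrences, it works equally well for qualitative data such as a list of injury locations.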
 Problem The data in Table 6 represent the location of injuries that required rehabilitation by a physical therapist. Determine the mode location of injury.
Table 6
Source: Krystal Catton, student at Joliet Junior College
Back | Back | Hand | Neck | Knee | Knee
Wrist | Back | Groin | Shoulder | Shoulder | Back
Elbow | Back | Back | Back | Back | Back
Back | Shoulder | Shoulder | Knee | Knee | Back
Hip | Knee | Hip | Hand | Back | Wrist

Approach Determine the location of injury that occurs with the highest frequency.
Solution The mode location of injury is the back, with 12 instances.
Measure of Central Tendency | Computation | Interpretation | When to Use
Mean | Population mean: $\mu = \frac{\sum x_i}{N}$; Sample mean: $\bar{x} = \frac{\sum x_i}{n}$ | Center of gravity | When data are quantitative and the frequency distribution is roughly symmetric
Median | Arrange data in ascending order and divide the data set in half | Divides the bottom 50% of the data from the top 50% | When the data are quantitative and the frequency distribution is skewed left or right
Mode | Tally data to determine most frequent observation | Most frequent observation | When the most frequent observation is the desired measure of central tendency or the data are qualitative

3.2 Measures of Dispersion
1 Determine the Range of a Variable from Raw Data
 Dispersion is the degree to which the data are spread out. Example 1 demonstrates why measures of central tendency are not sufficient in describing a distribution.
 The range, R, of a variable is the difference between the largest and the smallest data value. That is, Range = R = largest data value − smallest data value
 Problem The data in Table 8 represent the scores on the first exam of 10 students enrolled in Introductory Statistics. Compute the range.
Table 8
Student | Score
1. Michelle | 82
2. Ryanne | 77
3. Bilal | 90
4. Pam | 71
5. Jennifer | 62
6. Dave | 68
7. Joel | 74
8. Sam | 84
9. Justine | 94
10. Juan | 88

Approach The range is the difference between the largest and smallest data values.
Solution The highest test score is 94 and the lowest test score is 62. The range is
R = 94 − 62 = 32
All the students in the class scored between 62 and 94 on the exam. The difference between the best score and the worst score is 32 points.
 Notice that the range is affected by extreme values in the data set, so the range is not resistant. If Jennifer scored 28, the range becomes R = 94 − 28 = 66. Also, the range is computed using only two values in the data set (the largest and smallest). The standard deviation, on the other hand, uses all the data values in the computations.
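 A short Python sketch of the range, using the Table 8 scores and the altered score of 28 for Jennifer mentioned above. The function name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range = largest data value minus smallest data value."""
    return max(values) - min(values)


scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
print(data_range(scores))   # 32

# Replace Jennifer's 62 with 28 to see how one extreme value changes the range
altered = [28 if s == 62 else s for s in scores]
print(data_range(altered))  # 66
```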
2 Determine the Standard Deviation of a Variable from Raw Data
 Measures of dispersion are meant to describe how spread out data are. In other words, they describe how far, on average, each observation is from the typical data value. Standard deviation is based on the deviation about the mean. For a population, the deviation about the mean for the ith observation is $x_i - \mu$. For a sample, the deviation about the mean for the ith observation is $x_i - \bar{x}$. The further an observation is from the mean, the larger the absolute value of the deviation.
 The sum of all deviations about the mean must equal zero. That is, $\sum (x_i - \mu) = 0$ and $\sum (x_i - \bar{x}) = 0$.
 This result follows from the fact that observations greater than the mean are offset by observations less than the mean. Because this sum is zero, we cannot use the average deviation about the mean as a measure of spread. There are two possible solutions to this “problem.” We could either find the mean of the absolute values of the deviations about the mean, or we could find the mean of the squared deviations because squaring a nonzero number always results in a positive number. The first approach yields a measure of dispersion called the mean absolute deviation (MAD) (see Problem 42). The second approach leads to variance. The problem with variance is that squaring the deviations about the mean leads to squared units of measure, such as dollars squared. It is difficult to have a reasonable interpretation of dollars squared, so we “undo” the squaring process by taking the square root of the sum of squared deviations. We have the following definition for the population standard deviation.
 Recall, $|a| = a$ if $a \ge 0$, and $|a| = -a$ if $a < 0$, so $|3| = 3$ and $|-3| = 3$.
 The population standard deviation of a variable is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the square root of the mean of the squared deviations about the population mean. The population standard deviation is symbolically represented by σ (lowercase Greek sigma).
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \qquad (1)$$
 where $x_1, x_2, \ldots, x_N$ are the N observations in the population and $\mu$ is the population mean.
 A formula that is equivalent to Formula (1), called the computational formula, for determining the population standard deviation is
$$\sigma = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}} \qquad (2)$$
 where $\sum x_i^2$ means to square each observation and then sum these squared values, and $\left(\sum x_i\right)^2$ means to add up all the observations and then square the sum.
 Problem Compute the population standard deviation of the test scores in Table 8.
Approach Using Formula (1)
 • Step 1 Create a table with four columns. Enter the population data in Column 1. In Column 2, enter the population mean.
 • Step 2 Compute the deviation about the mean for each data value, $x_i - \mu$. Enter the result in Column 3.
 • Step 3 In Column 4, enter the squares of the values in Column 3.
 • Step 4 Sum the entries in Column 4, and divide this result by the size of the population, N.
 • Step 5 Determine the square root of the value found in Step 4.
Approach Using Formula (2)
 • Step 1 Create a table with two columns. Enter the population data in Column 1. Square each value in Column 1 and enter the result in Column 2.
 • Step 2 Sum the entries in Column 1. That is, find $\sum x_i$. Sum the entries in Column 2. That is, find $\sum x_i^2$.
 • Step 3 Substitute the values found in Step 2 and the value for N into the computational formula and simplify.
Solution Using Formula (1)
 • Step 1 See Table 9. Column 1 lists the observations in the data set, and Column 2 contains the population mean.
Solution Using Formula (2)
 • Step 1 See Table 10. Column 1 lists the observations in the data set, and Column 2 contains the values in Column 1 squared.

Table 9
Score, $x_i$ | Population Mean, $\mu$ | Deviation about the Mean, $x_i - \mu$ | Squared Deviation about the Mean, $(x_i - \mu)^2$
82 | 79 | 82 − 79 = 3 | 3² = 9
77 | 79 | 77 − 79 = −2 | (−2)² = 4
90 | 79 | 11 | 121
71 | 79 | −8 | 64
62 | 79 | −17 | 289
68 | 79 | −11 | 121
74 | 79 | −5 | 25
84 | 79 | 5 | 25
94 | 79 | 15 | 225
88 | 79 | 9 | 81
Total | | $\sum (x_i - \mu) = 0$ | $\sum (x_i - \mu)^2 = 964$

Table 10
Score, $x_i$ | Score Squared, $x_i^2$
82 | 82² = 6724
77 | 77² = 5929
90 | 8100
71 | 5041
62 | 3844
68 | 4624
74 | 5476
84 | 7056
94 | 8836
88 | 7744
$\sum x_i = 790$ | $\sum x_i^2 = 63{,}374$
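
 The notes stop at Step 1 of each solution, so the sketch below carries out the remaining arithmetic for both formulas in Python. It is only an illustration built from the Table 8 scores; the variable names are arbitrary.

```python
import math

# Population data: the ten exam scores in Table 8
scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
N = len(scores)

# Formula (1), conceptual: square root of the mean squared deviation
mu = sum(scores) / N                               # 79.0
sum_sq_dev = sum((x - mu) ** 2 for x in scores)    # 964.0, the Table 9 total
sigma_conceptual = math.sqrt(sum_sq_dev / N)

# Formula (2), computational: uses only the two column totals of Table 10
sum_x = sum(scores)                                # 790
sum_x_sq = sum(x ** 2 for x in scores)             # 63374
sigma_computational = math.sqrt((sum_x_sq - sum_x ** 2 / N) / N)

print(round(sigma_conceptual, 1), round(sigma_computational, 1))  # 9.8 9.8
```

 Both routes give a population standard deviation of about 9.8 points, as they must, since the two formulas are algebraically equivalent.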

Formula (1) is sometimes referred to as the conceptual formula because it allows us to see how standard deviation measures spread. Look at Table 9 in Example 3. Notice that observations that are “far” from the mean, 79, result in larger squared deviations about the mean. For example, because the second observation, 77, is not “far” from 79, the squared deviation, 4, is not large, whereas the fifth observation, 62, is rather “far” from 79, so the squared deviation, 289, is much larger. So, if a data set has many observations that are “far” from the mean, the sum of the squared deviations will be large, and therefore the standard deviation will be large.
 The sample standard deviation, s, of a variable is the square root of the sum of squared deviations about the sample mean divided by n − 1, where n is the sample size.
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \qquad (3)$$
 where $x_1, x_2, \ldots, x_n$ are the n observations in the sample and $\bar{x}$ is the sample mean.
 A computational formula that is equivalent to Formula (3) for computing the sample standard deviation is
$$s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}} \qquad (4)$$
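 As an illustration of Formulas (3) and (4), the sketch below computes the sample standard deviation both ways and compares the results with `statistics.stdev` from Python's standard library, which also divides by n − 1. The choice of the Table 2 song lengths as the sample and the function names are assumptions made for this example.

```python
import math
import statistics


def sample_sd(values):
    """Formula (3): deviations about the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def sample_sd_computational(values):
    """Formula (4): uses only the sum and the sum of squares."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_x = sum(values)
    sum_x_sq = sum(x * x for x in values)
    return math.sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))


# Random sample of song lengths in seconds (Table 2)
song_lengths = [201, 257, 284, 208, 179, 222, 217, 206, 240]

print(sample_sd(song_lengths))
print(sample_sd_computational(song_lengths))
print(statistics.stdev(song_lengths))
```

 All three printed values should agree up to floating-point rounding.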
 The standard deviation is used along with the mean to numerically describe distributions that are bell shaped and symmetric. The mean measures the center of the distribution, while the standard deviation measures the spread of the distribution. So how does the value of the standard deviation relate to the spread of the distribution?
 If we are comparing two populations, then the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure. The units of measure must be the same so that we are comparing apples with apples. For example, $100 is not the same as 100 Japanese yen, because $1 is equivalent to about 109 yen. This means a standard deviation of $100 is substantially higher than a standard deviation of 100 yen.
3 Determine the Variance of a Variable from Raw Data
 The variance of a variable is the square of the standard deviation. The population variance is $\sigma^2$ and the sample variance is $s^2$.
4 Use the Empirical Rule to Describe Data That Are Bell Shaped
 If data have a distribution that is bell shaped, the Empirical Rule can be used to determine the percentage of data that will lie within k standard deviations of the mean.
 If a distribution is roughly bell shaped, then
 • Approximately 68% of the data will lie within 1 standard deviation of the mean. That is, approximately 68% of the data lie between $\mu - 1\sigma$ and $\mu + 1\sigma$.
 • Approximately 95% of the data will lie within 2 standard deviations of the mean. That is, approximately 95% of the data lie between $\mu - 2\sigma$ and $\mu + 2\sigma$.
 • Approximately 99.7% of the data will lie within 3 standard deviations of the mean. That is, approximately 99.7% of the data lie between $\mu - 3\sigma$ and $\mu + 3\sigma$.
5 Use Chebyshev's Inequality to Describe Any Set of Data
 Chebyshev's Inequality
 For any data set or distribution, at least $\left(1 - \frac{1}{k^2}\right) 100\%$ of the observations lie within k standard deviations of the mean, where k is any number greater than 1. That is, at least $\left(1 - \frac{1}{k^2}\right) 100\%$ of the data lie between $\mu - k\sigma$ and $\mu + k\sigma$ for $k > 1$.
Example 9 Using Chebyshev's Inequality
Problem Use the data from University A in Table 7.
 (a) Determine the minimum percentage of students who have IQ scores within 3 standard deviations of the mean according to Chebyshev's Inequality.
 (b) Determine the minimum percentage of students who have IQ scores between 67.8 and 132.2, according to Chebyshev's Inequality.
 (c) Determine the actual percentage of students who have IQ scores between 67.8 and 132.2.
Approach
 (a) We use Chebyshev's Inequality with k = 3.
 (b) We have to determine the number of standard deviations 67.8 and 132.2 are from the mean of 100.0. We then substitute this value for k in Chebyshev's Inequality.
 (c) We refer to Table 7 and count the number of observations between 67.8 and 132.2. We divide this result by 100, the number of observations in the data set.
Solution
 (a) Using Chebyshev's Inequality with $k = 3$, we find that at least $\left(1 - \frac{1}{3^2}\right) 100\% = 88.9\%$ of all students have IQ scores within 3 standard deviations of the mean. Since the mean of the data set is 100.0 and the standard deviation is 16.1, at least 88.9% of the students have IQ scores between $\bar{x} - ks = 100.0 - 3(16.1) = 51.7$ and $\bar{x} + ks = 100.0 + 3(16.1) = 148.3$.
 (b) Since 67.8 is exactly 2 standard deviations below the mean [$100 - 2(16.1) = 67.8$] and 132.2 is exactly 2 standard deviations above the mean [$100 + 2(16.1) = 132.2$], Chebyshev's Inequality (with $k = 2$) says that at least $\left(1 - \frac{1}{2^2}\right) 100\% = 75\%$ of all IQ scores lie between 67.8 and 132.2.
 (c) Of the 100 IQ scores listed, 96 or 96% are between 67.8 and 132.2. Notice that Chebyshev's Inequality provides a conservative result.
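 The calculations in parts (a) and (b) can be reproduced with a few lines of Python. This is a minimal sketch; the function names `chebyshev_min_percent` and `k_sd_interval` are made up for the illustration, and the mean of 100.0 and standard deviation of 16.1 are the values quoted in the example.

```python
def chebyshev_min_percent(k):
    """Minimum percentage of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality requires k > 1")
    return (1 - 1 / k ** 2) * 100


def k_sd_interval(mean, sd, k):
    """Endpoints mean - k*sd and mean + k*sd."""
    return (mean - k * sd, mean + k * sd)


print(round(chebyshev_min_percent(3), 1))  # 88.9
print(chebyshev_min_percent(2))            # 75.0

low, high = k_sd_interval(100.0, 16.1, 2)
print(round(low, 1), round(high, 1))       # 67.8 132.2
```

 The guaranteed 75% for k = 2 is well below the 96% actually observed, which is the sense in which the inequality is conservative.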
3.3 Measures of Central Tendency and Dispersion from Grouped Data
1 Approximate the Mean of a Variable from Grouped Data
 Since raw data cannot be retrieved from a frequency table, we assume that within each class the mean of the data values is equal to the class midpoint. We then multiply the class midpoint by the frequency. This product is expected to be close to the sum of the data that lie within the class.
 Approximate the Mean of a Variable from a Frequency Distribution
 Population Mean
$$\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n}$$
 Sample Mean
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n} \qquad (1)$$
 where $x_i$ is the midpoint or value of the ith class, $f_i$ is the frequency of the ith class, and n is the number of classes.
2 Compute the Weighted Mean
 Sometimes, certain data values have a higher importance or weight associated with them. In this case, we compute the weighted mean. For example, your grade-point average is a weighted mean, with the weights equal to the number of credit hours in each course. The value of the variable is equal to the grade converted to a point value.
 The weighted mean, $\bar{x}_w$, of a variable is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing this sum by the sum of the weights. It can be expressed using the formula
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} \qquad (2)$$
 where $w_i$ is the weight of the ith observation
 $x_i$ is the value of the ith observation
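 A grade-point average makes a convenient illustration of the weighted mean. In the Python sketch below, the three courses, their grade points, and their credit hours are entirely hypothetical, and `weighted_mean` is just an illustrative name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical semester: an A (4.0), a B (3.0), and a C (2.0)
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 5]   # the weights

gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))  # 2.83
```

 The unweighted mean of the three grades would be 3.0; the weighted mean is lower because the lowest grade carries the most credit hours.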
3 Approximate the Standard Deviation of a Variable from Grouped Data
 Approximate the Standard Deviation of a Variable from a Frequency Distribution
 Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}}$$
 Sample Standard Deviation
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} \qquad (3)$$
 where $x_i$ is the midpoint or value of the ith class
 $f_i$ is the frequency of the ith class
 An algebraically equivalent formula for the population standard deviation is
$$\sigma = \sqrt{\frac{\sum x_i^2 f_i - \frac{\left(\sum x_i f_i\right)^2}{\sum f_i}}{\sum f_i}}$$
 Problem The data in Table 13 represent the 5-year rate of return of a random sample of 40 large-blended mutual funds. Approximate the standard deviation of the 5-year rate of return.
Approach We will use the sample standard deviation Formula (3).
 Step 1 Create a table with the class in column 1, the class midpoint in column 2, the frequency in column 3, and the unrounded mean in column 4.
 Step 2 Compute the deviation about the mean, $x_i - \bar{x}$, for each class, where $x_i$ is the class midpoint of the ith class and $\bar{x}$ is the sample mean. Enter the results in column 5.
 Step 3 Square the deviation about the mean and multiply this result by the frequency to obtain $(x_i - \bar{x})^2 f_i$. Enter the results in column 6.
 Step 4 Add the entries in columns 3 and 6 to obtain $\sum f_i$ and $\sum (x_i - \bar{x})^2 f_i$.
 Step 5 Substitute the values found in Step 4 into Formula (3) to obtain an approximate value for the sample standard deviation.
Solution
 Step 1 We create Table 15. Column 1 contains the classes. Column 2 contains the class midpoint of each class. Column 3 contains the frequency of each class. Column 4 contains the unrounded sample mean obtained in Example 1.
Table 15
Class (5-year rate of return) | Class Midpoint, $x_i$ | Frequency, $f_i$ | $\bar{x}$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2 f_i$
3–3.99 | (3 + 4)/2 = 3.5 | 16 | 5.2 | −1.7 | 46.24
4–4.99 | 4.5 | 13 | 5.2 | −0.7 | 6.37
5–5.99 | 5.5 | 4 | 5.2 | 0.3 | 0.36
6–6.99 | 6.5 | 1 | 5.2 | 1.3 | 1.69
7–7.99 | 7.5 | 0 | 5.2 | 2.3 | 0
8–8.99 | 8.5 | 1 | 5.2 | 3.3 | 10.89
9–9.99 | 9.5 | 0 | 5.2 | 4.3 | 0
10–10.99 | 10.5 | 2 | 5.2 | 5.3 | 56.18
11–11.99 | 11.5 | 2 | 5.2 | 6.3 | 79.38
12–12.99 | 12.5 | 1 | 5.2 | 7.3 | 53.29
Total | | $\sum f_i = 40$ | | | $\sum (x_i - \bar{x})^2 f_i = 254.4$

 Step 2 Column 5 contains the deviation about the mean, $x_i - \bar{x}$, for each class.
 Step 3 Column 6 contains the values of the squared deviation about the mean multiplied by the frequency, $(x_i - \bar{x})^2 f_i$.
 Step 4 Add the entries in columns 3 and 6 to obtain $\sum f_i = 40$ and $\sum (x_i - \bar{x})^2 f_i = 254.4$.
 Step 5 Substitute these values into Formula (3) to obtain an approximate value for the sample standard deviation.
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} = \sqrt{\frac{254.4}{40 - 1}} \approx 2.55$$
The approximate standard deviation of the 5-year rate of return is 2.55%. The standard deviation from the raw data listed in Example 3 from Section 2.2 is 2.64%.
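 The grouped-data formulas for the mean and the sample standard deviation can be checked against Table 15 with the Python sketch below. It uses only the class midpoints and frequencies from the table; the variable names are arbitrary.

```python
import math

# Class midpoints and frequencies from Table 15
midpoints = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
frequencies = [16, 13, 4, 1, 0, 1, 0, 2, 2, 1]

n = sum(frequencies)  # total number of observations

# Approximate mean: each midpoint stands in for every value in its class
grouped_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Sum of squared deviations about the mean, weighted by class frequency
sum_sq = sum((x - grouped_mean) ** 2 * f for x, f in zip(midpoints, frequencies))

# Sample standard deviation from grouped data, Formula (3)
grouped_sd = math.sqrt(sum_sq / (n - 1))

print(n, round(grouped_mean, 1), round(sum_sq, 1), round(grouped_sd, 2))
# 40 5.2 254.4 2.55
```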
3.4 Measures of Position and Outliers
1 Determine and Interpret z-Scores
 The z-score represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation. There is both a population z-score and a sample z-score:
$$\text{Population z-score: } z = \frac{x - \mu}{\sigma} \qquad \text{Sample z-score: } z = \frac{x - \bar{x}}{s} \qquad (1)$$
 The z-score is unitless. It has mean 0 and standard deviation 1.
 If a data value is larger than the mean, the z-score is positive. If a data value is smaller than the mean, the z-score is negative. If the data value equals the mean, the z-score is zero. A z-score measures the number of standard deviations an observation is above or below the mean. For example, a z-score of 1.24 means the data value is 1.24 standard deviations above the mean. A z-score of −2.31 means the data value is 2.31 standard deviations below the mean.
 The z-score provides a way to compare apples to oranges by converting variables with different centers or spreads to variables with the same center (0) and spread (1).
 Problem Determine whether the New York Yankees or the Cincinnati Reds had a relatively better run-producing season. The Yankees scored 859 runs and play in the American League, where the mean number of runs scored was $\mu = 721.2$ and the standard deviation was $\sigma = 93.5$ runs. The Reds scored 790 runs and play in the National League, where the mean number of runs scored was $\mu = 700.7$ and the standard deviation was $\sigma = 58.4$ runs.
Approach To determine which team had the relatively better run-producing season, we compute each team's z-score. The team with the higher z-score had the better season. Because we know the values of the population parameters, we will compute the population z-score.
Solution We compute each team's z-score, rounded to two decimal places.
$$\text{Yankees: } z = \frac{x - \mu}{\sigma} = \frac{859 - 721.2}{93.5} = 1.47$$
$$\text{Reds: } z = \frac{x - \mu}{\sigma} = \frac{790 - 700.7}{58.4} = 1.53$$
So the Yankees had run production 1.47 standard deviations above the mean, while the Reds had run production 1.53 standard deviations above the mean. Therefore, the Reds had a relatively better year at scoring runs than the Yankees.
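 The comparison above is easy to reproduce in Python. The sketch below is illustrative only; `z_score` is an assumed helper name, and the league means and standard deviations are the ones given in the problem.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd


yankees = z_score(859, 721.2, 93.5)   # American League parameters
reds = z_score(790, 700.7, 58.4)      # National League parameters

print(round(yankees, 2), round(reds, 2))  # 1.47 1.53
```

 Although the Yankees scored more runs in absolute terms, the Reds' total sits further above their own league's mean once the league's spread is taken into account.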
2 Interpret Percentiles
 Recall that the median divides the lower 50% of a set of data from the upper 50%. The median is a special case of a general concept called the percentile.
 The kth percentile, denoted $P_k$, of a set of data is a value such that k percent of the observations are less than or equal to the value.
 So percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined. For example, $P_1$ divides the bottom 1% of the observations from the top 99%, $P_2$ divides the bottom 2% of the observations from the top 98%, and so on.
 Problem Jennifer just received the results of her SAT exam. Her SAT Mathematics score of 600 is at the 74th percentile. What does this mean?
Approach The kth percentile of an observation means that k percent of the observations are less than or equal to the observation.
Interpretation A percentile rank of 74% means that 74% of SAT Mathematics scores are less than or equal to 600 and 26% of the scores are greater. So 26% of the students who took the exam scored better than Jennifer.
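 The "less than or equal to" reading of a percentile can be illustrated by counting. The Python sketch below applies it to the ten exam scores used earlier in these notes; the function name `percent_at_or_below` is hypothetical, and with so few observations the result is only a rough illustration of the idea, not how testing agencies compute published percentile ranks.

```python
def percent_at_or_below(data, value):
    """Percentage of observations that are less than or equal to value."""
    if not data:
        raise ValueError("the data set must not be empty")
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)


scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
print(percent_at_or_below(scores, 82))  # 60.0
```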
3 Determine and Interpret Quartiles
 The most common percentiles are quartiles. Quartiles divide data sets into fourths, or four equal parts. The first quartile, denoted $Q_1$, divides the bottom 25% of the data from the top 75%. Therefore, the first quartile is equivalent to the 25th percentile. The second quartile, $Q_2$, divides the bottom 50% of the data from the top 50%; it is equivalent to the 50th percentile or the median. Finally, the third quartile, $Q_3$, divides the bottom 75% of the data from the top 25%; it is equivalent to the 75th percentile.
 The first quartile, $Q_1$, is equivalent to the 25th percentile, $P_{25}$. The second quartile, $Q_2$, is equivalent to the 50th percentile, $P_{50}$, which is equivalent to the median, M. Finally, the third quartile, $Q_3$, is equivalent to the 75th percentile, $P_{75}$.
 Step 1 Arrange the data in ascending order.
 Step 2 Determine the median, M, or second quartile, $Q_2$.
 Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, $Q_1$, is the median of the bottom half and the third quartile, $Q_3$, is the median of the top half.
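 The three steps above can be sketched in Python as follows. This is one reading of the procedure, in which the median itself is left out of both halves when the number of observations is odd; statistical software often uses other conventions and may report slightly different quartiles. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)                 # Step 1
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two observations are required")
    q2 = median_of_sorted(ordered)           # Step 2
    half = n // 2                            # Step 3: split into halves
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(lower), q2, median_of_sorted(upper)


exam_scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
song_lengths = [201, 257, 284, 208, 179, 222, 217, 206, 240]

print(quartiles(exam_scores))   # (71, 79.5, 88)
print(quartiles(song_lengths))  # (203.5, 217, 248.5)
```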
4 Determine and Interpret the Interquartile Range
 Unlike the range and the standard deviation, quartiles are resistant to extreme values. For this reason, we would like to find a measure of dispersion that is based on quartiles.
 The interquartile range, IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula
$$\text{IQR} = Q_3 - Q_1$$
Shape of Distribution | Measure of Central Tendency | Measure of Dispersion
Symmetric | Mean | Standard Deviation
Skewed Left or Skewed Right | Median | Interquartile Range

5 Check a Set of Data for Outliers
 When performing any type of data analysis, we should always check for extreme observations in the data set. Extreme observations are referred to as outliers. Outliers in a data set should be investigated. They can occur by chance, because of error in the measurement of a variable, during data entry, or from errors in sampling. For example, in the 2000 presidential election, a precinct in New Mexico accidentally recorded 610 absentee ballots for Al Gore as 110. Workers in the Gore camp discovered the data-entry error through an analysis of vote totals.
 Outliers do not always occur because of error. Sometimes extreme observations are common within a population. For example, suppose we wanted to estimate the mean price of a European car. We might take a random sample of size 5 from the population of all European automobiles. If our sample included a Ferrari F430 Spider (approximately $175,000), it probably would be an outlier, because this car costs much more than the typical European automobile. The value of this car would be considered unusual because it is not a typical value from the data set.
 Outliers distort both the mean and the standard deviation, because neither is resistant. Because these measures often form the basis for most statistical inference, any conclusions drawn from a set of data that contains outliers can be flawed.
 Checking for Outliers by Using Quartiles
 Problem Check the data that represent the collision coverage claims for outliers.
 Step 1 Determine the first and third quartiles of the data.
 Step 2 Compute the interquartile range.
 Step 3 Determine the fences. Fences serve as cutoff points for determining outliers.
 Lower fence = Q1 − 1.5(IQR)     Upper fence = Q3 + 1.5(IQR)
 Step 4 If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
Approach We follow the preceding steps. Any data value that is less than the lower fence or greater than the upper fence will be considered an outlier.
Solution
Step 1 The quartiles found in Example 3 are Q1 = $735 and Q3 = $4668.
Step 2 The interquartile range, IQR, is IQR = Q3 − Q1 = $4668 − $735 = $3933
Step 3 The lower fence, LF, is LF = Q1 − 1.5(IQR) = $735 − 1.5($3933) = −$5164.5
The upper fence, UF, is UF = Q3 + 1.5(IQR) = $4668 + 1.5($3933) = $10,567.5
Step 4 There are no observations below the lower fence. However, there is an observation above the upper fence. The claim of $21,147 is an outlier.
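 The fence arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function names fences and is_outlier are illustrative assumptions, and the quartiles are simply the values quoted in the solution, since the full list of claims is not reproduced in these notes.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """A value is an outlier if it lies outside the fences."""
    lower, upper = fences(q1, q3)
    return value < lower or value > upper

q1, q3 = 735, 4668                   # quartiles quoted in the solution
print(fences(q1, q3))                # (-5164.5, 10567.5)
print(is_outlier(21147, q1, q3))     # True
```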
3.5 The Five-Number Summary and Boxplots
 We want these summaries so that we can see what the data can tell us. We explore the data to see if they contain interesting information that may be useful in our research. The summaries make this exploration much easier. In fact, because these summaries represent an exploration, a famous statistician named John Tukey called this material exploratory data analysis.
 Tukey defined exploratory data analysis as “detective work numerical detective work—or graphical detective work.” He believed exploration of data is best carried out the way a detective searches for evidence when investigating a crime. Our goal is only to collect and present evidence. Drawing conclusions (or inference) is like the deliberations of the jury. What we have done so far falls under the category of exploratory data analysis. We have only collected information and presented summaries, not reached any conclusions.
1 Compute the five-number summary
 The three measures of dispersion presented in Section 3.2 (range, variance, and standard deviation) are not resistant to extreme values. However, the interquartile range, Q3 − Q1, the difference between the 75th and 25th percentiles, is resistant. It is interpreted as the range of the middle 50% of the data. Still, the median, Q1, and Q3 do not provide information about the extremes of the data, the smallest and largest values in the data set.
 The five-number summary of a set of data consists of the smallest data value, Q1, the median, Q3, and the largest data value. We organize the five-number summary as follows:
 MINIMUM   Q1   M   Q3   MAXIMUM
 Problem The data shown in Table 17 show the finishing times (in minutes) of the men in the 60- to 64-year-old age group in a 5-kilometer race. Determine the five-number summary of the data.
Table 17 (Source: Laura Gillogly, student at Joliet Junior College)
19.95   23.25   23.32   25.55   25.83   26.28   42.47
28.58   28.72   30.18   30.35   30.95   32.13   49.17
33.23   33.53   36.68   37.05   37.43   41.42   54.63

Approach The five-number summary requires that we list the minimum data value, Q1, M (the median), Q3, and the maximum data value. We need to arrange the data in ascending order and then use the procedures introduced in Section 3.4 to obtain Q1, M, and Q3.
Solution The data in ascending order are as follows:
19.95, 23.25, 23.32, 25.55, 25.83, 26.28, 28.58, 28.72, 30.18, 30.35, 30.95, 32.13, 33.23, 33.53, 36.68, 37.05, 37.43, 41.42, 42.47, 49.17, 54.63
The smallest number (the fastest time) in the data set is 19.95. The largest number in the data set is 54.63. The first quartile, Q1, is 26.06. The median, M, is 30.95. The third quartile, Q3, is 37.24. The five-number summary is
19.95   26.06   30.95   37.24   54.63
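The summary can be reproduced with a short Python sketch, assuming the median-of-halves quartile method from Section 3.4. The function name five_number_summary is an illustrative assumption. The computed first quartile is (25.83 + 26.28)/2 = 26.055, which the solution reports rounded as 26.06.

```python
from statistics import median

# Finishing times (in minutes) from Table 17.
times = [19.95, 23.25, 23.32, 25.55, 25.83, 26.28, 42.47,
         28.58, 28.72, 30.18, 30.35, 30.95, 32.13, 49.17,
         33.23, 33.53, 36.68, 37.05, 37.43, 41.42, 54.63]

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd number of observations the median is excluded from both halves.
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

print(five_number_summary(times))
```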
2 Draw and Interpret Boxplots
 The five-number summary can be used to create another graph, called the boxplot.
 Drawing a Boxplot
 Step 1 Determine the lower and upper fences: Lower fence = Q1 − 1.5(IQR) and Upper fence = Q3 + 1.5(IQR), where IQR = Q3 − Q1
 Step 2 Draw a number line long enough to include the maximum and minimum values. Insert vertical lines at Q 1 , M, and Q 3 . Enclose these vertical lines in a box.
 Step 3 Label the lower and upper fences.
 Step 4 Draw a line from Q 1 to the smallest data value that is larger than the lower fence. Draw a line from Q 3 to the largest data value that is smaller than the upper fence. These lines are called whiskers.
 Step 5 Any data values less than the lower fence or greater than the upper fence are outliers and are marked with an asterisk (*).
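 The numbers needed to draw a boxplot (fences, whisker ends, and outliers) can be sketched as follows for the race times in Table 17. This is a hedged illustration rather than a plotting routine; the function name boxplot_parts and the dictionary keys are assumptions made for this example.

```python
from statistics import median

# Finishing times (in minutes) from Table 17.
times = [19.95, 23.25, 23.32, 25.55, 25.83, 26.28, 42.47,
         28.58, 28.72, 30.18, 30.35, 30.95, 32.13, 49.17,
         33.23, 33.53, 36.68, 37.05, 37.43, 41.42, 54.63]

def boxplot_parts(data):
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])
    q3 = median(values[half + 1:] if n % 2 else values[half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr               # Step 1
    upper_fence = q3 + 1.5 * iqr
    # Values exactly on a fence are not outliers, so they stay inside.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),   # Step 4: whisker end points
        "outliers": outliers,                  # Step 5: marked with an asterisk
    }

parts = boxplot_parts(times)
print(parts["whiskers"])   # (19.95, 49.17)
print(parts["outliers"])   # [54.63]
```

 With these data the upper fence is about 54.02, so the slowest time, 54.63, falls outside it and the upper whisker stops at 49.17.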
Chapter 3 Summary
This chapter concentrated on describing distributions numerically. Measures of central tendency are used to indicate the typical value in a distribution. Three measures of central tendency were discussed. The mean measures the center of gravity of the distribution. The data must be quantitative to compute the mean. The median separates the bottom 50% of the data from the top 50%. The data must be at least ordinal to compute the median. The mode measures the most frequent observation. The data can be either quantitative or qualitative to compute the mode. The median is resistant to extreme values, while the mean is not.
Measures of dispersion describe the spread in the data. The range is the difference between the highest and lowest data values. The standard deviation is based on the average squared deviation about the mean. The variance is the square of the standard deviation. The range, standard deviation, and variance are not resistant. The mean and standard deviation are used in many types of statistical inference.
The mean, median, and mode can be approximated from grouped data. The standard deviation can also be approximated from grouped data.
We can determine the relative position of an observation in a data set using z-scores and percentiles. A z-score denotes how many standard deviations an observation is from the mean. Percentiles determine the percent of observations that lie above and below an observation. The interquartile range is a resistant measure of dispersion. The upper and lower fences can be used to identify potential outliers. Any potential outlier must be investigated to determine whether it was the result of a data-entry error or some other error in the data-collection process, or is an unusual value in the data set.
The five-number summary provides an idea about the center and spread of a data set through the median and the interquartile range. The length of the tails in the distribution can be determined from the smallest and largest data values. The five-number summary is used to construct boxplots. Boxplots can be used to describe the shape of the distribution and to visualize outliers.
Chapter 4
4.1 Scatter Diagrams and Correlation
1 Draw and interpret scatter diagrams
 Before we can graphically represent bivariate data, we need to decide which variable we want to use to predict the value of the other variable.
 For example, it seems reasonable to think that as the speed at which a golf club is swung increases, the distance the golf ball travels also increases.
 Therefore, we might use club-head speed to predict distance. We call distance the response (or dependent) variable and club-head speed the explanatory (or predictor or independent) variable.
 The response variable is the variable whose value can be explained by the value of the explanatory or predictor variable.
 We use the term explanatory variable because it helps to explain variability in the response variable.
 A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.
 Two variables that are linearly related are positively associated when above-average values of one variable are associated with above-average values of the other variable and below-average values of one variable are associated with below-average values of the other variable. That is, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.
 Two variables that are linearly related are negatively associated when above-average values of one variable are associated with below-average values of the other variable. That is, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.
 If two variables are positively associated, then as one goes up the other also tends to go up. If two variables are negatively associated, then as one goes up the other tends to go down.
2 Describe the Properties of the Linear Correlation Coefficient
 It is dangerous to use only a scatter diagram to determine if two variables are linearly related.
 The horizontal and vertical scales of a scatter diagram should be set so that the scatter diagram does not mislead a reader.
 Just as we can manipulate the scale of graphs of univariate data, we can also manipulate the scale of the graphs of bivariate data, possibly resulting in incorrect conclusions. Therefore, numerical summaries of bivariate data should be used in addition to graphs to determine any relation that exists between two variables.
 The linear correlation coefficient or Pearson product-moment correlation coefficient is a measure of the strength and direction of the linear relation between two quantitative variables. The Greek letter ρ (rho) represents the population correlation coefficient, and r represents the sample correlation coefficient. We present only the formula for the sample correlation coefficient.
 \[ r = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1} \qquad (1) \]
 where \( \bar{x} \) is the sample mean of the explanatory variable
 \( s_x \) is the sample standard deviation of the explanatory variable
 \( \bar{y} \) is the sample mean of the response variable
 \( s_y \) is the sample standard deviation of the response variable
 n is the number of individuals in the sample
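 Formula (1) translates almost directly into code. The sketch below is a minimal illustration, assuming the club-head speed and distance data that appear in Table 5 of Section 4.2; the function name correlation is an assumption made for this example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample linear correlation coefficient r, following formula (1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)        # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Club-head speed (mph) and distance (yd) from Table 5.
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]
print(round(correlation(speed, distance), 3))   # 0.939
```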
 Properties of the Linear Correlation Coefficient
 1. The linear correlation coefficient is always between − 1 and 1, inclusive. That is, − 1 ≤ r ≤ 1 .
 2. If r = + 1 , then a perfect positive linear relation exists between the two variables. See Figure 4(a).
 3. If r = − 1 , then a perfect negative linear relation exists between the two variables. See Figure 4(d).
 4. The closer r is to + 1 , the stronger is the evidence of positive association between the two variables. See Figures 4(b) and 4(c).
 5. The closer r is to − 1 , the stronger is the evidence of negative association between the two variables. See Figures 4(e) and 4(f).
 6. If r is close to 0, then little or no evidence exists of a linear relation between the two variables. So r close to 0 does not imply no relation, just no linear relation. See Figures 4(g) and 4(h).
 7. The linear correlation coefficient is a unitless measure of association. So the unit of measure for x and y plays no role in the interpretation of r.
 8. The correlation coefficient is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.
 A linear correlation coefficient close to 0 does not imply that there is no relation, just no linear relation.
4 Determine Whether a Linear Relation Exists between Two Variables
 A question you may be asking yourself is “How do I know the correlation between two variables is strong enough for me to conclude that a linear relation exists between them?” Although rigorous tests can answer this question, for now we will be content with a simple comparison test.
 Testing for a Linear Relation
 Step 1 Determine the absolute value of the correlation coefficient.
 Step 2 Find the critical value in Table II from Appendix A for the given sample size.
 Step 3 If the absolute value of the correlation coefficient is greater than the critical value, we say a linear relation exists between the two variables. Otherwise, no linear relation exists.
 We use two vertical bars to denote absolute value, as in |5| or |−4|. Recall, |5| = 5, |−4| = 4, and |0| = 0.
5 Explain the Difference between Correlation and Causation
 A linear correlation coefficient that implies a strong positive or negative association does not imply causation if it was computed using observational data.
 If data used in a study are observational, we cannot conclude the two correlated variables have a causal relationship.
 For example, the correlation between teenage birthrate and homicide rate since 1993 is 0.9987, but we cannot conclude that higher teenage birthrates cause a higher homicide rate because the data are observational.
 In fact, time-series data are often correlated because both variables happen to move in the same (or opposite) direction over time. Both teenage birthrates and homicide rates have been declining since 1993, so they have a high positive correlation.
 Is there another way two variables can be correlated without there being a causal relation? Yes—through a lurking variable.
 A lurking variable is related to both the explanatory variable and response variable.
 For example, as air-conditioning bills increase, so does the crime rate. Does this mean that folks should turn off their air conditioners so that crime rates decrease? Certainly not! In this case, the lurking variable is air temperature. As air temperatures rise, both air-conditioning bills and crime rates rise.
 Confounding means that any relation that may exist between two variables may be due to some other variable not accounted for in the study.
4.2 Least-Squares Regression
1 Find the least-squares regression line and use the line to make predictions
 Once the scatter diagram and linear correlation coefficient show that two variables have a linear relation, we then find a linear equation that describes this relation. One way to find a line that describes the relation is to select two points from the data that appear to provide a good fit and to find the equation of the line through these points.
 Problem The data in Table 5 represent the club-head speed and the distance a golf ball travels for eight swings of the club.
Table 5 (Source: Paul Stephenson, student at Joliet Junior College)
Club-Head Speed (mph), x   Distance (yd), y   (x, y)
100                        257                (100, 257)
102                        264                (102, 264)
103                        274                (103, 274)
101                        266                (101, 266)
105                        277                (105, 277)
100                        263                (100, 263)
99                         258                (99, 258)
105                        275                (105, 275)

 (a) Find a linear equation that relates club-head speed, x (the explanatory variable), and distance, y (the response variable), by selecting two points and finding the equation of the line containing the points.
 (b) Graph the line on the scatter diagram.
 (c) Use the equation to predict the distance a golf ball will travel if the club-head speed is 104 miles per hour.
Approach
 (a) We perform the following steps:
 Step 1 Select two points so that a line drawn through the points appears to give a good fit. Call the points \( (x_1, y_1) \) and \( (x_2, y_2) \). See the scatter diagram in Figure 1.
 Step 2 Find the slope of the line containing the two points using \( m = \frac{y_2 - y_1}{x_2 - x_1} \).
 Step 3 Use the point-slope formula, \( y - y_1 = m(x - x_1) \), to find the equation of the line through the points selected in Step 1. Express the equation in the form \( y = mx + b \), where m is the slope and b is the y-intercept.
 (b) Draw a line through the points selected in Step 1 of part (a).
 (c) Let x = 104 in the equation found in part (a).
Solution
 (a) Step 1 We select \( (x_1, y_1) = (99, 258) \) and \( (x_2, y_2) = (105, 275) \), because a line drawn through these two points seems to give a good fit.
Step 2 \( m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{275 - 258}{105 - 99} = \frac{17}{6} = 2.8333 \)
Step 3 We use the point-slope formula to find the equation of the line.
\[ y - y_1 = m(x - x_1) \]
\[ y - 258 = 2.8333(x - 99) \qquad (m = 2.8333,\ x_1 = 99,\ y_1 = 258) \]
\[ y - 258 = 2.8333x - 280.4967 \]
\[ y = 2.8333x - 22.4967 \qquad (1) \]
The slope of the line is 2.8333, and the y-intercept is −22.4967.
 (b) Figure 10 shows the scatter diagram along with the line drawn through the points (99, 258) and (105, 275).
 (c) We let x = 104 in equation (1) to predict the distance.
\[ y = 2.8333(104) - 22.4967 = 272.2 \text{ yards} \]
We predict that a golf ball will travel 272.2 yards when it is hit with a club-head speed of 104 miles per hour.
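 The two-point calculation can be checked with a short sketch. The function name line_through is an illustrative assumption. Because the code keeps the slope as the unrounded fraction 17/6, it gives an intercept of −22.5 rather than the −22.4967 obtained above from the rounded slope 2.8333; the prediction still rounds to 272.2 yards.

```python
def line_through(p1, p2):
    """Slope and y-intercept of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)    # slope
    b = y1 - m * x1              # from the point-slope form, solved for b
    return m, b

m, b = line_through((99, 258), (105, 275))
predicted = m * 104 + b
print(round(m, 4), round(b, 1), round(predicted, 1))   # 2.8333 -22.5 272.2
```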
 The residual represents how close our prediction comes to the actual observation. The smaller the residual, the better the prediction.
 The least-squares regression line is the line that minimizes the sum of the squared errors (or residuals). This line minimizes the sum of the squared vertical distance between the observed values of y and those predicted by the line, \( \hat{y} \) (read “y-hat”). We represent this as “minimize \( \sum \text{residuals}^2 \)”.
 The Least-Squares Regression Line
 \[ \hat{y} = b_1 x + b_0 \]
 where
 \[ b_1 = r \cdot \frac{s_y}{s_x} \qquad (2) \] is the slope of the least-squares regression line*
 and
 \[ b_0 = \bar{y} - b_1 \bar{x} \qquad (3) \] is the y-intercept of the least-squares regression line
 Note: \( \bar{x} \) is the sample mean and \( s_x \) is the sample standard deviation of the explanatory variable x; \( \bar{y} \) is the sample mean and \( s_y \) is the sample standard deviation of the response variable y.
 * An equivalent formula is \[ b_1 = \frac{s_{xy}}{s_{xx}} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \]
 The notation \( \hat{y} \) is used in the least-squares regression line to remind us that it is a predicted value of y for a given value of x. The least-squares regression line, \( \hat{y} = b_1 x + b_0 \), always contains the point \( (\bar{x}, \bar{y}) \). This property can be useful when drawing the least-squares regression line by hand.
 Since \( s_y \) and \( s_x \) must both be positive, the sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, \( b_1 \), are the same. For example, if r is positive, then \( b_1 \) will also be positive.
 The predicted value of y, \( \hat{y} \), has an interesting interpretation. It is an estimate of the mean value of the response variable for any value of the explanatory variable. For example, suppose a least-squares regression equation is obtained that relates students’ grade point average (GPA) to the number of hours studied each week. If the equation results in a predicted GPA of 3.14 when a student studies 20 hours each week, we would say the mean GPA of all students who study 20 hours each week is 3.14.
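 Formulas (2) and (3) can be applied to the golf data in Table 5 with a short sketch. The function name least_squares is an illustrative assumption, and the code is one possible way to carry out the arithmetic, not the only one.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Return (b1, b0) for the least-squares line y-hat = b1*x + b0."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x           # formula (2)
    b0 = y_bar - b1 * x_bar      # formula (3)
    return b1, b0

# Club-head speed (mph) and distance (yd) from Table 5.
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

b1, b0 = least_squares(speed, distance)
print(round(b1, 4), round(b0, 4))    # 3.1661 -55.7966
print(round(b1 * 104 + b0, 1))       # predicted distance at 104 mph: 273.5
```

 The fitted line passes through the point of means (101.875, 266.75), as the property stated above requires.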
2 Interpret the Slope and the y-Intercept of the Least-Squares Regression Line
 Interpretation of Slope In algebra, we learned that the definition of slope is \( \frac{\text{rise}}{\text{run}} \) or \( \frac{\text{change in } y}{\text{change in } x} \). If a line has slope \( \frac{2}{3} \), then if x increases by 3, y will increase by 2. Or if the slope of a line is \( -4 = \frac{-4}{1} \), then if x increases by 1, y will decrease by 4. Interpreting slope for least-squares regression lines has a minor twist, however. Statistical models such as a least-squares regression equation are probabilistic. This means that any predictions or interpretations made as a result of the model are based on uncertainty. Therefore, when we interpret the slope of a least-squares regression equation, we do not want to imply that there is 100% certainty behind the interpretation.
 Interpretation of the y-Intercept The y-intercept of any line is the point where the graph intersects the vertical axis. It is found by letting x = 0 in an equation and solving for y. To interpret the y-intercept, we must first ask two questions:
 1. Is 0 a reasonable value for the explanatory variable?
 2. Do any observations near x = 0 exist in the data set?
 If the answer to either of these questions is no, we do not interpret the y-intercept. In the regression equation of Example 3, a swing speed of 0 miles per hour does not make sense, so we do not interpret the y-intercept.
 In general, we interpret a y-intercept as being the value of the response variable when the value of the explanatory variable is 0.
 The second condition for interpreting the y-intercept is especially important because we should not use the regression model to make predictions outside the scope of the model, meaning we should not use the regression model to make predictions for values of the explanatory variable that are much larger or much smaller than those observed. This is a dangerous practice because we cannot be certain of the behavior of data for which we have no observations.
 When the correlation coefficient indicates no linear relation between the explanatory and response variables, and the scatter diagram indicates no relation at all between the variables, then we use the mean value of the response variable as the predicted value so that \( \hat{y} = \bar{y} \).
4.3 Diagnostics on the Least-Squares Regression Line
1 Compute and interpret the coefficient of determination
 The coefficient of determination, R², measures the proportion of total variation in the response variable that is explained by the least-squares regression line.
 The coefficient of determination is a measure of how well the least-squares regression line describes the relation between the explanatory and response variables. The closer R² is to 1, the better the line describes how changes in the explanatory variable affect the value of the response variable.
 The coefficient of determination is a number between 0 and 1, inclusive. That is, 0 ≤ R² ≤ 1. If R² = 0, the least-squares regression line has no explanatory value. If R² = 1, the least-squares regression line explains 100% of the variation in the response variable.
 The deviation between the observed and mean values of the response variable is called the total deviation, so total deviation = \( y - \bar{y} \). The deviation between the predicted and mean values of the response variable is called the explained deviation, so explained deviation = \( \hat{y} - \bar{y} \). The deviation between the observed and predicted values of the response variable is called the unexplained deviation, so unexplained deviation = \( y - \hat{y} \). Therefore,
 Total deviation = unexplained deviation + explained deviation
 \[ y - \bar{y} = (y - \hat{y}) + (\hat{y} - \bar{y}) \]
 Total variation = unexplained variation + explained variation
 \[ \sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2 \]
 Unexplained variation is found by summing the squares of the residuals, \( \sum \text{residuals}^2 = \sum (y - \hat{y})^2 \). So the smaller the sum of squared residuals, the smaller the unexplained variation and, therefore, the larger R² will be. Therefore, the closer the observed y's are to the regression line (the predicted y's), the larger R² will be. The coefficient of determination, R², is the square of the linear correlation coefficient for the least-squares regression model \( \hat{y} = b_1 x + b_0 \). Written in symbols, R² = r².
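 The decomposition of variation can be verified numerically on the golf data from Table 5. This is a minimal sketch with assumed variable names; it computes the slope with the equivalent formula b1 = s_xy / s_xx and takes R² as explained variation divided by total variation.

```python
from statistics import mean

# Club-head speed (mph) and distance (yd) from Table 5.
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

x_bar, y_bar = mean(speed), mean(distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, distance))
s_xx = sum((x - x_bar) ** 2 for x in speed)
b1 = s_xy / s_xx                 # slope of the least-squares line
b0 = y_bar - b1 * x_bar          # y-intercept
predicted = [b1 * x + b0 for x in speed]

total = sum((y - y_bar) ** 2 for y in distance)
unexplained = sum((y - p) ** 2 for y, p in zip(distance, predicted))
explained = sum((p - y_bar) ** 2 for p in predicted)

r_squared = explained / total
print(round(r_squared, 3))       # 0.881
```

 Here the total variation is 419.5, and the unexplained and explained parts add back up to it, apart from floating-point rounding.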
2 Perform Residual Analysis on a Regression Model
 Recall that a residual is the difference between the observed value of y and the predicted value, \( \hat{y} \). Residuals play an important role in determining the adequacy of the linear model. We will analyze residuals for the following purposes:
 To determine whether a linear model is appropriate to describe the relation between the explanatory and response variables
 To determine whether the variance of the residuals is constant
 To check for outliers
 In Section 4.1 we learned how to use the correlation coefficient to determine whether a linear relation exists between the explanatory variable and response variable. However, if a correlation coefficient indicates a linear relation exists between two variables, must that mean the relation is linear? No! To determine if a linear model is appropriate, we also need to draw a residual plot, which is a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis.
 If a plot of the residuals against the explanatory variable shows a discernible pattern, such as a curve, then the explanatory and response variable may not be linearly related.
 If a plot of the residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then a strict requirement of the linear model is violated. This requirement is called constant error variance.
3 Identify Influential Observations
 An influential observation is an observation that significantly affects the least-squares regression line's slope and/or y-intercept, or the value of the correlation coefficient. How do we identify influential observations? We first remove the point that is believed to be influential from the data set, and then we recompute the correlation or regression line. If the correlation coefficient, slope, or y-intercept changes significantly, the removed point is influential.
 Draw a boxplot of the explanatory variable. If an outlier appears, the observation corresponding to the outlier may be influential.
 Influential observations typically exist when the point is an outlier relative to the values of the explanatory variable.
 As with outliers, influential observations should be removed only if there is justification to do so. When an influential observation occurs in a data set and its removal is not warranted, two popular courses of action are to (1) collect more data so that additional points near the influential observation are obtained or (2) use techniques that reduce the influence of the influential observation. (These techniques are beyond the scope of this text.) In the case of Example 6, we are justified in removing Bubba Watson from the data set because our experiment called for the same player to swing the club for each trial.
4.4 Contingency Tables and Association
1 Compute the marginal distribution of a variable
 The first step in summarizing data in a contingency table is to determine the distribution of each variable separately. To do so, we create marginal distributions.
 A marginal distribution of a variable is a frequency or relative frequency distribution of either the row or column variable in the contingency table.
 The distributions are called marginal distributions because they appear in the right and the bottom margin of the contingency table.
 A marginal distribution removes the effect of either the row variable or the column variable in the contingency table.
 To create a marginal distribution for a variable, we calculate the row and column totals for each category of the variable. The row totals represent the distribution of the row variable. The column totals represent the distribution of the column variable.
 Problem Find the frequency marginal distributions for employment status and level of education from the data in Table 9.
Approach Find the row total for the category “employed” by adding the number of employed individuals who did not finish high school, who finished high school, and so on. Repeat this process for each category of employment status.
Find the column total for the category “did not finish high school” by adding the number of employed individuals, unemployed individuals, and individuals not in the labor force who did not finish high school. Repeat this process for each level of education.
Solution In Table 10, the blue entries (the Totals column) represent the marginal distribution of the row variable “employment status.” For example, there were 9,993 + 34,130 + 34,067 + 43,992 = 122,182 thousand employed individuals in 2010. The red entries (the Totals row) represent the marginal distribution of the column variable “level of education.”
Table 10 Level of Education (counts in thousands)
Employment Status        Did Not Finish   High School   Some      Bachelor's Degree   Totals
                         High School      Graduate      College   or Higher
Employed                 9,993            34,130        34,067    43,992              122,182
Unemployed               1,806            3,838         3,161     2,149               10,954
Not in the Labor Force   19,969           30,246        18,373    16,290              84,878
Totals                   31,768           68,214        55,601    62,431              218,014

The marginal distribution for employment status removes the effect of level of education; the marginal distribution for level of education removes the effect of employment status. The marginal distribution of level of education shows there were about twice as many Americans with a bachelor's degree or higher as there were Americans who did not finish high school (62,431 thousand versus 31,768 thousand) in 2010. The marginal distribution of employment status shows that 122,182 thousand Americans were employed. The table also indicates that there were 218,014 thousand U.S. residents 25 years old or older.
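The row and column totals in Table 10 can be recomputed with a short sketch. The variable names are illustrative assumptions; the counts (in thousands) are the cell entries from the table.

```python
education = ["Did not finish high school", "High school graduate",
             "Some college", "Bachelor's degree or higher"]

# Cell counts from Table 10, in thousands, one list per employment status.
table = {
    "Employed": [9993, 34130, 34067, 43992],
    "Unemployed": [1806, 3838, 3161, 2149],
    "Not in the labor force": [19969, 30246, 18373, 16290],
}

# Marginal distribution of the row variable (employment status).
row_totals = {status: sum(counts) for status, counts in table.items()}

# Marginal distribution of the column variable (level of education).
column_totals = {level: sum(column)
                 for level, column in zip(education, zip(*table.values()))}

grand_total = sum(row_totals.values())
# Relative frequency version of the row marginal distribution.
row_relative = {status: total / grand_total
                for status, total in row_totals.items()}

print(row_totals)
print(column_totals)
print(grand_total)   # 218014
```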
2 Use the Conditional Distribution to Identify Association among Categorical Data
 A conditional distribution lists the relative frequency of each category of the response variable, given a specific value of the explanatory variable in the contingency table.
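 As a hedged illustration, the sketch below computes the conditional distribution of employment status for two levels of education, using the counts from Table 10. Treating level of education as the explanatory variable is an assumption made for this example, as is the function name conditional_distribution.

```python
def conditional_distribution(counts):
    """Relative frequency of each category within one column of the table."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Columns of Table 10 (counts in thousands).
did_not_finish = {"Employed": 9993, "Unemployed": 1806,
                  "Not in the labor force": 19969}
bachelors = {"Employed": 43992, "Unemployed": 2149,
             "Not in the labor force": 16290}

for label, column in (("Did not finish high school", did_not_finish),
                      ("Bachelor's degree or higher", bachelors)):
    dist = conditional_distribution(column)
    print(label, {k: round(v, 3) for k, v in dist.items()})
```

 About 31.5% of those who did not finish high school were employed, compared with about 70.5% of those with a bachelor's degree or higher. The difference between the two conditional distributions suggests an association between the variables.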
3 Explain Simpson's Paradox
 Simpson's Paradox describes a situation in which an association between two variables inverts or goes away when a third variable is introduced to the analysis.
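 The reversal can be seen in a small numerical sketch. The counts below are hypothetical, invented only to illustrate the paradox; they do not come from the text, and the names counts, rate, and overall_rate are assumptions.

```python
# Hypothetical (successes, total) counts, invented for illustration only.
counts = {
    "Department X": {"Group A": (80, 100), "Group B": (9, 10)},
    "Department Y": {"Group A": (2, 10), "Group B": (30, 100)},
}

def rate(successes, total):
    return successes / total

def overall_rate(group):
    """Success rate for a group after combining both departments."""
    successes = sum(dept[group][0] for dept in counts.values())
    total = sum(dept[group][1] for dept in counts.values())
    return successes / total

for dept, groups in counts.items():
    print(dept, {g: rate(*pair) for g, pair in groups.items()})
print("Overall", {g: round(overall_rate(g), 3)
                  for g in ("Group A", "Group B")})
```

 Within each department Group B has the higher rate, yet Group A has the higher rate overall. The department acts as the third variable whose introduction inverts the association.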
Chapter 4 Summary
In this chapter we looked at describing the relation between two quantitative variables (Sections 4.1 to 4.3) and between two qualitative variables (Section 4.4).
The first step in describing the relation between two quantitative variables is to draw a scatter diagram. The explanatory variable is plotted on the horizontal axis and the corresponding response variable on the vertical axis. The scatter diagram can be used to discover whether the relation between the explanatory and the response variables is linear. In addition, for linear relations, we can judge whether the linear relation shows positive or negative association.
A numerical measure for the strength of linear relation between two quantitative variables is the linear correlation coefficient. It is a number between − 1 and 1, inclusive. Values of the correlation coefficient near − 1 are indicative of a negative linear relation between the two variables. Values of the correlation coefficient near + 1 indicate a positive linear relation between the two variables. If the correlation coefficient is near 0, then little linear relation exists between the two variables.
Be careful! Just because the correlation coefficient between two quantitative variables indicates that the variables are linearly related, it does not mean that a change in one variable causes a change in a second variable. It could be that the correlation is the result of a lurking variable.
Once a linear relation between the two variables has been discovered, we describe the relation by finding the least-squares regression line. This line best describes the linear relation between the explanatory and response variables. We can use the least-squares regression line to predict a value of the response variable for a given value of the explanatory variable.
The coefficient of determination, R², measures the percent of variation in the response variable that is explained by the least-squares regression line. It is a measure between 0 and 1, inclusive. The closer R² is to 1, the more explanatory value the line has. Whenever a least-squares regression line is obtained, certain diagnostics must be performed. These include verifying that the linear model is appropriate, verifying the residuals have constant variance, and checking for outliers and influential observations.
Section 4.4 introduced methods that allow us to describe any association that might exist between two qualitative variables. This is done through contingency tables. Both marginal and conditional distributions allow us to describe the effect one variable might have on the other variable in the study. We also construct bar graphs to see the association between the two variables in the study. Again, just because two qualitative variables are associated does not mean that a change in one variable causes a change in a second variable. We also looked at Simpson's Paradox, which represents situations in which an association between two variables inverts or goes away when a third (lurking) variable is introduced into the analysis.
Chapter 5
5.1 Probability Rules
1 Apply the rules of probabilities
 Probability deals with experiments that yield random short-term results or outcomes yet reveal long-term predictability. The long-term proportion in which a certain outcome is observed is the probability of that outcome.
 In probability, an experiment is any process with uncertain results that can be repeated. The result of any single trial of the experiment is not known ahead of time. However, the results of the experiment over many trials produce regular patterns that enable us to predict with remarkable accuracy.
 For example, an insurance company cannot know whether a particular 16-year-old driver will have an accident over the course of a year. However, based on historical records, the company can be fairly certain that about three out of every ten 16-year-old male drivers will have a traffic accident during the course of a year. Therefore, of 816,000 male 16-year-old drivers (816,000 repetitions of the experiment), the insurance company is fairly confident that about 30%, or 244,800, will have an accident. This prediction helps to establish insurance rates for any particular 16-year-old male driver.
 The sample space, S, of a probability experiment is the collection of all possible outcomes.
 An outcome is the result of one trial of a probability experiment. The sample space is a list of all possible results of a probability experiment.
 An event is any collection of outcomes from a probability experiment. An event consists of one outcome or more than one outcome. We will denote events with one outcome, sometimes called simple events, eᵢ. In general, events are denoted using capital letters such as E.
 Problem A probability experiment consists of rolling a single fair die.
(a) Identify the outcomes of the probability experiment.
(b) Determine the sample space.
(c) Define the event E = "roll an even number."
Approach The outcomes are the possible results of the experiment. The sample space is a list of all possible outcomes.
Solution
(a) The outcomes from rolling a single fair die are e₁ = "rolling a one" = {1}, e₂ = "rolling a two" = {2}, e₃ = "rolling a three" = {3}, e₄ = "rolling a four" = {4}, e₅ = "rolling a five" = {5}, and e₆ = "rolling a six" = {6}.
(b) The set of all possible outcomes forms the sample space, S = {1, 2, 3, 4, 5, 6}. There are 6 outcomes in the sample space.
(c) The event E = "roll an even number" = {2, 4, 6}.
A fair die is one in which each possible outcome is equally likely. For example, rolling a 2 is just as likely as rolling a 5. We contrast this with a loaded die, in which a certain outcome is more likely. For example, if rolling a 1 is more likely than rolling a 2, 3, 4, 5, or 6, the die is loaded.
 Rules of Probabilities
 1. The probability of any event E, P(E), must be greater than or equal to 0 and less than or equal to 1. That is, 0 ≤ P ( E ) ≤ 1.
 2. The sum of the probabilities of all outcomes must equal 1. That is, if the sample space S = {e₁, e₂, ⋯, eₙ}, then
 P(e₁) + P(e₂) + ⋯ + P(eₙ) = 1
 Rule 1 states that probabilities less than 0 or greater than 1 are not possible. Therefore, probabilities such as 1.32 or −0.3 are not possible. Rule 2 states that when the probabilities of all outcomes are added, the sum must be 1.
 A probability model lists the possible outcomes of a probability experiment and each outcome's probability. A probability model must satisfy Rules 1 and 2 of the rules of probabilities.
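 As a rough illustration of Rules 1 and 2, the following Python sketch checks whether a table of outcomes and probabilities qualifies as a probability model. The function name is_probability_model and the second, deliberately invalid table are assumptions made for this example; they are not from the text.

```python
from fractions import Fraction

def is_probability_model(model):
    """Return True if the probabilities satisfy Rule 1 and Rule 2."""
    probabilities = list(model.values())
    rule_1 = all(0 <= p <= 1 for p in probabilities)  # each P is in [0, 1]
    rule_2 = sum(probabilities) == 1                  # the P's sum to 1
    return rule_1 and rule_2

# Fair die: each of the six outcomes has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical table that breaks Rule 2 (seven outcomes at 1/6 sum to 7/6).
not_a_model = {face: Fraction(1, 6) for face in range(1, 8)}

print(is_probability_model(fair_die))     # True
print(is_probability_model(not_a_model))  # False
```

 Exact fractions are used so that the Rule 2 comparison with 1 is not disturbed by floating-point rounding.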
 If an event is impossible, the probability of the event is 0. If an event is a certainty, the probability of the event is 1.
 The closer a probability is to 1, the more likely the event will occur. The closer a probability is to 0, the less likely the event will occur. For example, an event with probability 0.8 is more likely to occur than an event with probability 0.75. An event with probability 0.8 will occur about 80 times out of 100 repetitions of the experiment, whereas an event with probability 0.75 will occur about 75 times out of 100.
 Be careful of this interpretation. An event with a probability of 0.75 does not have to occur 75 times out of 100. Rather, we expect the number of occurrences to be close to 75 in 100 trials. The more repetitions of the probability experiment, the closer the proportion with which the event occurs will be to 0.75 (the Law of Large Numbers).
 An unusual event is an event that has a low probability of occurring.
 An unusual event is an event that is not likely to occur.
 Typically, an event with a probability less than 0.05 (or 5%) is considered unusual, but this cutoff point is not set in stone. The researcher and the context of the problem determine the probability that separates unusual events from not so unusual events.
 For example, suppose that the probability of being wrongly convicted of a capital crime punishable by death is 3%. Even though 3% is below our 5% cutoff point, this probability is too high in light of the consequences (death for the wrongly convicted), so the event is not unusual (unlikely) enough. We would want this probability to be much closer to zero.
 Now suppose that you are planning a picnic on a day having a 3% chance of rain. In this context, you would consider “rain” an unusual (unlikely) event and proceed with the picnic plans.
 The point is this: Selecting a probability that separates unusual events from not so unusual events is subjective and depends on the situation. Statisticians typically use cutoff points of 0.01, 0.05, and 0.10.
2 Compute and Interpret Probabilities Using the Empirical Method
 Approximating Probabilities Using the Empirical Approach
 The probability of an event E is approximately the number of times event E is observed divided by the number of repetitions of the experiment.
 P(E) ≈ relative frequency of E = (frequency of E) / (number of trials of experiment)   (1)
 The probability obtained using the empirical approach is approximate because different runs of the probability experiment lead to different outcomes and, therefore, different estimates of P(E). Consider flipping a coin 20 times and recording the number of heads. Use the results of the experiment to estimate the probability of obtaining a head. Now repeat the experiment. Because the results of the second run of the experiment do not necessarily yield the same results, we cannot say the probability equals the relative frequency; rather we say the probability is approximately the relative frequency. As we increase the number of trials of a probability experiment, our estimate becomes more accurate (again, the Law of Large Numbers).
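 The coin-flipping idea above can be sketched in Python. This is only an illustration: it uses Python's pseudorandom generator as a stand-in for a physical fair coin, and the function name and seed values are arbitrary choices for the example.

```python
import random

def estimate_heads_probability(num_flips, seed):
    """Relative frequency of heads in num_flips simulated fair-coin flips."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# Two runs of 20 flips generally give different estimates of P(head).
print(estimate_heads_probability(20, seed=1))
print(estimate_heads_probability(20, seed=2))

# With more trials, the estimates tend to settle near 0.5.
for n in (200, 20_000, 200_000):
    print(n, estimate_heads_probability(n, seed=1))
```

 Changing the seed plays the role of repeating the experiment; the estimate changes from run to run, which is why the relative frequency is only an approximation of the probability.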
3 Compute and Interpret Probabilities Using the Classical Method
 The empirical method obtains an approximate probability of an event by conducting a probability experiment. The classical method of computing probabilities does not require that a probability experiment actually be performed. Rather, it relies on counting techniques.
 The classical method of computing probabilities requires equally likely outcomes. An experiment has equally likely outcomes when each outcome has the same probability of occurring. For example, in throwing a fair die once, each of the six outcomes in the sample space, { 1 , 2 , 3 , 4 , 5 , 6 } , has an equal chance of occurring. Contrast this situation with a loaded die in which a five or six is twice as likely to occur as a one, two, three, or four.
 Computing Probability Using the Classical Method
 If an experiment has n equally likely outcomes and if the number of ways that an event E can occur is m, then the probability of E, P(E), is
 P(E) = (number of ways that E can occur) / (number of possible outcomes) = m / n   (2)
 So, if S is the sample space of this experiment,
 P(E) = N(E) / N(S)   (3)
 where N(E) is the number of outcomes in E, and N(S) is the number of outcomes in the sample space.
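 A minimal Python sketch of formula (3), applied to the fair-die event "roll an even number" from Section 5.1. The function name classical_probability is an assumption for this example, and the calculation is valid only when the outcomes are equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = N(E) / N(S), assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die
E = {2, 4, 6}            # event "roll an even number"

print(classical_probability(E, S))  # 1/2
```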
4 Use Simulations to Obtain Data Based on Probabilities
 Instead of obtaining data from existing sources, we could also simulate a probability experiment using a graphing calculator or statistical software to replicate the experiment as many times as we like. Simulation is particularly helpful for estimating the probability of more complicated events.
5 Recognize and Interpret Subjective Probabilities
 A subjective probability of an outcome is a probability obtained on the basis of personal judgment.
 It is important to understand that subjective probabilities are perfectly legitimate and are often the only method of assigning likelihood to an outcome. For example, a financial reporter may ask an economist about the likelihood the economy will fall into recession next year. We cannot conduct an experiment n times to obtain a relative frequency. The economist must use her knowledge of the current conditions of the economy and make an educated guess about the likelihood of recession.
5.2 The Addition Rule and Complements
1 Use the Addition Rule for Disjoint Events
 Two events are disjoint if they have no outcomes in common. Another name for disjoint events is mutually exclusive events.
 Two events are disjoint if they cannot occur at the same time.
 We can use Venn diagrams to represent events as circles enclosed in a rectangle.
 Addition Rule for Disjoint Events
 If E and F are disjoint (or mutually exclusive) events, then
 P ( E or F ) = P ( E ) + P ( F )
 The Addition Rule for Disjoint Events states that, if you have two events that have no outcomes in common, the probability that one or the other occurs is the sum of their probabilities.
2 Use the General Addition Rule
 The General Addition Rule
 For any two events E and F,
 P ( E or F ) = P ( E ) + P ( F ) − P ( E and F )
 Problem Suppose a single card is selected from a standard 52-card deck. Compute the probability of the event E = "drawing a king" or F = "drawing a diamond".
Approach The events are not disjoint because the outcome “king of diamonds” is in both events, so use the General Addition Rule.
Solution
P(E or F) = P(E) + P(F) − P(E and F)
P(king or diamond) = P(king) + P(diamond) − P(king and diamond)
 = 4/52 + 13/52 − 1/52 = 16/52 = 4/13
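 The card calculation can be checked by listing the deck and counting. The sketch below is one way to do this in Python; the rank and suit labels and the helper prob are choices made for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely (rank, suit) cards

def prob(cards):
    """Classical probability of drawing one of the given cards."""
    return Fraction(len(cards), len(deck))

kings = {card for card in deck if card[0] == "K"}
diamonds = {card for card in deck if card[1] == "diamonds"}

# General Addition Rule: subtract the overlap (the king of diamonds).
by_rule = prob(kings) + prob(diamonds) - prob(kings & diamonds)
# Direct count of the cards that are a king or a diamond.
by_counting = prob(kings | diamonds)

print(by_rule, by_counting)  # 4/13 4/13
```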
Table 6
Marital Status    Males (in millions)    Females (in millions)
Never married     40.2                   34
Married           62.2                   62
Widowed           3                      11.4
Divorced          10                     13.8
Separated         2.4                    3.2
Source: U.S. Census Bureau, Current Population Reports
 Table 6 is called a contingency table or two-way table, because it relates two categories of data. The row variable is marital status, because each row in the table describes the marital status of an individual. The column variable is gender. Each box inside the table is called a cell. For example, the cell corresponding to married individuals who are male is in the second row, first column. Each cell contains the frequency of the category: There were 62.2 million married males in the United States in 2010. Put another way, in the United States in 2010, there were 62.2 million individuals who were male and married.
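 One possible use of a contingency table is to combine it with the General Addition Rule. The Python sketch below assumes that one individual is selected at random from those counted in Table 6 and computes P(male or married); the choice of these two events is an illustration, not part of the notes above.

```python
# Counts in millions, copied from Table 6.
table_6 = {
    "Never married": {"Males": 40.2, "Females": 34.0},
    "Married":       {"Males": 62.2, "Females": 62.0},
    "Widowed":       {"Males": 3.0,  "Females": 11.4},
    "Divorced":      {"Males": 10.0, "Females": 13.8},
    "Separated":     {"Males": 2.4,  "Females": 3.2},
}

total = sum(sum(row.values()) for row in table_6.values())   # all individuals
males = sum(row["Males"] for row in table_6.values())        # column total
married = sum(table_6["Married"].values())                   # row total
male_and_married = table_6["Married"]["Males"]               # single cell

p_male = males / total
p_married = married / total
p_male_and_married = male_and_married / total

# General Addition Rule: P(E or F) = P(E) + P(F) - P(E and F)
p_male_or_married = p_male + p_married - p_male_and_married

print(round(p_male_or_married, 3))  # 0.742
```

 The cell for married males is subtracted once because it is counted in both the column total for males and the row total for married individuals.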
3 Compute the Probability of an Event Using the Complement Rule
 Complement of an Event
 Let S denote the sample space of a probability experiment and let E denote an event. The complement of E, denoted Eᶜ, is all outcomes in the sample space S that are not outcomes in the event E.
 Because E and Eᶜ are mutually exclusive,
 P(E or Eᶜ) = P(E) + P(Eᶜ) = P(S) = 1
 Subtracting P(E) from both sides, we obtain the following result.
 If E represents any event and Eᶜ represents the complement of E, then
 P(Eᶜ) = 1 − P(E)
 For any event, either the event happens or it doesn't. Use the Complement Rule when you know the probability that some event will occur and you want to know the chance it will not occur.
5.3 Independence and the Multiplication Rule
1 Identify independent events
 We use the Addition Rule for Disjoint Events to compute the probability of observing an outcome in event E or event F. We now describe a probability rule for computing the probability that E and F both occur.
 Two events E and F are independent if the occurrence of event E in a probability experiment does not affect the probability of event F. Two events are dependent if the occurrence of event E in a probability experiment affects the probability of event F.
 In determining whether two events are independent, ask yourself whether the probability of one event is affected by the other event. For example, what is the probability that a 29-year-old male has high cholesterol? What is the probability that a 29-year-old male has high cholesterol, given that he eats fast food four times a week? Does the fact that the individual eats fast food four times a week change the likelihood that he has high cholesterol? If yes, the events are not independent.
 Disjoint Events versus Independent Events: It is important to know that disjoint events and independent events are different concepts. Recall that two events are disjoint if they have no outcomes in common, that is, if knowing that one of the events occurs, we know the other event did not occur. Independence means that one event occurring does not affect the probability of the other event occurring. Therefore, knowing two events are disjoint means that the events are not independent.
 Consider the experiment of rolling a single die. Let E represent the event “roll an even number,” and let F represent the event “roll an odd number.” We can see that E and F are mutually exclusive (disjoint) because they have no outcomes in common. In addition, P(E) = 1/2 and P(F) = 1/2. However, if we are told that the roll of the die is going to be an even number, then what is the probability of event F? Because the outcome will be even, the probability of event F is now 0 (and the probability of event E is now 1).
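 The die example can be checked by counting. The Python sketch below computes the probability of F once E is known to have occurred as N(E and F) / N(E), a counting formula that is stated formally in Section 5.4; the variable names are choices made for this example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
E = {2, 4, 6}            # "roll an even number"
F = {1, 3, 5}            # "roll an odd number"

# Disjoint: the two events share no outcomes.
is_disjoint = len(E & F) == 0

p_f = Fraction(len(F), len(S))               # P(F) = 1/2
p_f_given_e = Fraction(len(E & F), len(E))   # probability of F once E is known = 0

# Independent would mean that knowing E leaves the probability of F unchanged.
is_independent = p_f_given_e == p_f

print(is_disjoint, is_independent)  # True False
```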
2 Use the Multiplication Rule for Independent Events
 Suppose that you flip a fair coin twice. What is the probability that you obtain a head on both flips, that is, a head on the first flip and a head on the second flip? If H represents the outcome “heads” and T represents the outcome “tails,” the sample space of this experiment is
 S = { HH, HT, TH, TT }
 There is one outcome with both heads. Because each outcome is equally likely, we have
 P(heads on Flip 1 and heads on Flip 2) = N(heads on Flip 1 and heads on Flip 2) / N(S) = 1/4
 We may have intuitively figured this out by recognizing P(head) = 1/2 for each flip. So it seems reasonable that
 P(heads on Flip 1 and heads on Flip 2) = P(heads on Flip 1) · P(heads on Flip 2) = 1/2 · 1/2 = 1/4
 Because both approaches result in the same answer, 1/4, we conjecture that P(E and F) = P(E) · P(F). Our conjecture is correct for independent events.
 Multiplication Rule for Independent Events
 If E and F are independent events, then
 P ( E and F ) = P ( E ) ⋅ P ( F )
 Multiplication Rule for n Independent Events
 If events E₁, E₂, E₃, ⋯, Eₙ are independent, then
 P(E₁ and E₂ and E₃ and ⋯ and Eₙ) = P(E₁) · P(E₂) · ⋯ · P(Eₙ)
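 As a sketch of the rule for n independent events, the Python code below compares two ways of finding the probability of heads on every one of n fair-coin flips: counting outcomes in the sample space, and multiplying 1/2 by itself n times. The function name prob_all_heads is an assumption for this example.

```python
from fractions import Fraction
from itertools import product

def prob_all_heads(num_flips):
    """Return (probability by counting, probability by the Multiplication Rule)."""
    sample_space = list(product("HT", repeat=num_flips))  # e.g. ('H', 'T')
    favorable = [o for o in sample_space if all(flip == "H" for flip in o)]
    by_counting = Fraction(len(favorable), len(sample_space))
    by_rule = Fraction(1, 2) ** num_flips   # P(head) multiplied num_flips times
    return by_counting, by_rule

print(prob_all_heads(2))  # (Fraction(1, 4), Fraction(1, 4))
print(prob_all_heads(5))  # (Fraction(1, 32), Fraction(1, 32))
```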
5.4 Conditional Probability and the General Multiplication Rule
1 Compute conditional probabilities
 Conditional Probability
 The notation P(F | E) is read “the probability of event F given event E.” It is the probability that the event F occurs, given that the event E has occurred.
 If E and F are any two events, then
 P(F | E) = P(E and F) / P(E) = N(E and F) / N(E)   (1)
 The probability of event F occurring, given the occurrence of event E, is found by dividing the probability of E and F by the probability of E, or by dividing the number of outcomes in E and F by the number of outcomes in E.
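 Both forms of the conditional probability formula can be compared on a small example. In the Python sketch below, one fair die is rolled, E is "roll an even number," and F is the hypothetical event "roll a number greater than 3" (chosen only for illustration).

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
E = {2, 4, 6}            # given event: "roll an even number"
F = {4, 5, 6}            # hypothetical event: "roll a number greater than 3"

def prob(event):
    """Classical probability with equally likely outcomes."""
    return Fraction(len(event), len(S))

# P(F | E) = P(E and F) / P(E)
by_probabilities = prob(E & F) / prob(E)
# P(F | E) = N(E and F) / N(E)
by_counts = Fraction(len(E & F), len(E))

print(by_probabilities, by_counts)  # 2/3 2/3
```

 Knowing that the roll is even raises the probability of F from 1/2 to 2/3, so in this example E and F are dependent.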
2 Compute Probabilities Using the General Multiplication Rule
 General Multiplication Rule
 The probability that two events E and F both occur is
 P(E and F) = P(E) · P(F | E)
 In words, the probability of E and F is the probability of event E occurring times the probability of event F occurring, given the occurrence of event E.
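 To illustrate the General Multiplication Rule, consider an example that is not from the notes: two cards are drawn without replacement from a standard 52-card deck, and we want the probability that both are kings. The Python sketch below applies the rule and then checks the answer by counting ordered pairs of cards; ranks are coded 1 to 13, with 13 standing for a king.

```python
from fractions import Fraction
from itertools import permutations

# Each card is (rank, suit); rank 13 stands for a king in this sketch.
deck = [(rank, suit) for rank in range(1, 14) for suit in "CDHS"]

# General Multiplication Rule: P(E and F) = P(E) * P(F | E)
p_first_king = Fraction(4, 52)                # 4 kings among 52 cards
p_second_king_given_first = Fraction(3, 51)   # 3 kings left among 51 cards
by_rule = p_first_king * p_second_king_given_first

# Check by counting all ordered two-card draws without replacement.
ordered_draws = list(permutations(deck, 2))
both_kings = [d for d in ordered_draws if d[0][0] == 13 and d[1][0] == 13]
by_counting = Fraction(len(both_kings), len(ordered_draws))

print(by_rule, by_counting)  # 1/221 1/221
```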
 Two events E and F are independent if P(E | F) = P(E) or, equivalently, if P(F | E) = P(F).
 If either condition in our definition is true, the other is as well. In addition, for independent events, P(E and F) = P(E) · P(F).
5.5 Counting Techniques
1 Solve counting problems using the Multiplication Rule
 Counting plays a major role in many diverse areas, including probability. In this section, we look at special types of counting problems and develop general techniques for solving them.
 Problem The fixed-price dinner at Mabenka Restaurant provides the following choices:
 Appetizer: soup or salad
 Entrée: baked chicken, broiled beef patty, baby beef liver, or roast beef au jus
 Dessert: ice cream or cheesecake
 How many different meals can be ordered?
Approach Ordering such a meal requires three separate decisions:
 Choose an Appetizer    Choose an Entrée    Choose a Dessert
 2 choices              4 choices           2 choices
Solution For each choice of appetizer, we have 4 choices of entrée, and for each of these 2 · 4 = 8 choices, there are 2 choices for dessert. A total of 2 · 4 · 2 = 16 different meals can be ordered.
 Multiplication Rule of Counting
 If a task consists of a sequence of choices in which there are p selections for the first choice, q selections for the second choice, r selections for the third choice, and so on, then the task of making these selections can be done in
 p · q · r · ⋯
 different ways.
 Problem The International Airline Transportation Association (IATA) assigns three-letter codes to represent airport locations. For example, the code for Fort Lauderdale International Airport is FLL. How many different airport codes are possible?
Approach We are choosing 3 letters from 26 letters and arranging them in order. Notice that repetition of letters is allowed. Use the Multiplication Rule of Counting, recognizing we have 26 ways to choose the first letter, 26 ways to choose the second letter, and 26 ways to choose the third letter.
Solution By the Multiplication Rule, 26 · 26 · 26 = 26³ = 17,576 different airport codes are possible.
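 The two counting problems above (the Mabenka meals and the airport codes) can be verified by brute force. The Python sketch below lists every possible sequence of choices and counts them; it is a check on the Multiplication Rule of Counting, not a replacement for it.

```python
from itertools import product
from string import ascii_uppercase

appetizers = ["soup", "salad"]
entrees = ["baked chicken", "broiled beef patty",
           "baby beef liver", "roast beef au jus"]
desserts = ["ice cream", "cheesecake"]

# Every meal is one appetizer, one entrée, and one dessert.
meals = list(product(appetizers, entrees, desserts))
print(len(meals))   # 16, which equals 2 * 4 * 2

# Every airport code is three letters, with repetition allowed.
num_codes = sum(1 for _ in product(ascii_uppercase, repeat=3))
print(num_codes)    # 17576, which equals 26 ** 3
```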
 If n ≥ 0 is an integer, the factorial symbol, n!, is defined as follows:
 n! = n·(n − 1)·⋯·3·2·1
 0! = 1
 1! = 1
2 Solve Counting Problems Using Permutations
 A permutation is an ordered arrangement in which r objects are chosen from n distinct (different) objects so that r ≤ n and repetition is not allowed. The symbol ₙPᵣ represents the number of permutations of r objects selected from n objects.
 The formula for ₙPᵣ can be written in factorial notation:
 ₙPᵣ = n·(n − 1)·(n − 2)·⋯·(n − r + 1)
 = [n·(n − 1)·(n − 2)·⋯·(n − r + 1)·(n − r)·⋯·3·2·1] / [(n − r)·⋯·3·2·1]
 = n! / (n − r)!
3 Solve Counting Problems Using Combinations
 In a permutation, order is important. For example, the arrangements ABC and BAC are considered different arrangements of the letters A, B, and C. If order is unimportant, we do not distinguish ABC from BAC. In poker, the order in which the cards are received does not matter. The combination of the cards is what matters.
 A combination is a collection, without regard to order, in which r objects are chosen from n distinct objects with r ≤ n and without repetition. The symbol ₙCᵣ represents the number of combinations of n distinct objects taken r at a time.
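 The difference between permutations and combinations can be seen by listing arrangements of the letters A, B, and C. The Python sketch below uses the factorial formulas from the chapter formula list, n!/(n − r)! for permutations and n!/(r!(n − r)!) for combinations, and compares them with direct enumeration. The function names n_p_r and n_c_r are choices made for this example.

```python
import math
from itertools import combinations, permutations

def n_p_r(n, r):
    """Number of permutations: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)

def n_c_r(n, r):
    """Number of combinations: n! / (r! * (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

letters = "ABC"

# Order matters: AB and BA are counted separately.
print(n_p_r(3, 2), len(list(permutations(letters, 2))))  # 6 6
# Order does not matter: AB and BA are the same selection.
print(n_c_r(3, 2), len(list(combinations(letters, 2))))  # 3 3
```

 Python 3.8 and later also provide math.perm and math.comb, which return the same values.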
4 Solve Counting Problems Involving Permutations with Nondistinct Items
 The number of permutations of n objects of which n₁ are of one kind, n₂ are of a second kind, …, and nₖ are of a kth kind is given by
 n! / (n₁! · n₂! · ⋯ · nₖ!)   (3)
 where n = n₁ + n₂ + ⋯ + nₖ.
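 As an illustration of formula (3), the Python sketch below counts the distinguishable arrangements of the letters in a word. The word LEVEL is a hypothetical example (two L's, two E's, one V), and the function name is an assumption; the result is checked against a brute-force listing.

```python
import math
from collections import Counter
from itertools import permutations

def nondistinct_permutations(items):
    """n! / (n1! * n2! * ... * nk!), where each ni counts one kind of item."""
    counts = Counter(items)   # how many items of each kind
    denominator = math.prod(math.factorial(c) for c in counts.values())
    return math.factorial(len(items)) // denominator

word = "LEVEL"   # hypothetical example: 5 letters, 2 L's, 2 E's, 1 V

print(nondistinct_permutations(word))   # 30, from 5! / (2! * 2! * 1!)
print(len(set(permutations(word))))     # 30, by listing and removing repeats
```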
Combination
 Description: The selection of r objects from a set of n different objects when the order in which the objects are selected does not matter (so AB is the same as BA) and an object cannot be selected more than once (repetition is not allowed)
 Formula: ₙCᵣ = n! / (r! · (n − r)!)

Permutation of Distinct Items with Replacement
 Description: The selection of r objects from a set of n different objects when the order in which the objects are selected matters (so AB is different from BA) and an object may be selected more than once (repetition is allowed)
 Formula: nʳ

Permutation of Distinct Items without Replacement
 Description: The selection of r objects from a set of n different objects when the order in which the objects are selected matters (so AB is different from BA) and an object cannot be selected more than once (repetition is not allowed)
 Formula: ₙPᵣ = n! / (n − r)!

Permutation of Nondistinct Items without Replacement
 Description: The number of ways n objects can be arranged (order matters) in which there are n₁ of one kind, n₂ of a second kind, …, and nₖ of a kth kind, where n = n₁ + n₂ + ⋯ + nₖ
 Formula: n! / (n₁! · n₂! · ⋯ · nₖ!)

5 Compute Probabilities Involving Permutations and Combinations
 The counting techniques presented in this section can be used along with the classical method to compute certain probabilities. Recall that this method stated the probability of an event E is the number of ways event E can occur divided by the number of different possible outcomes of the experiment provided each outcome is equally likely.
 Problem In the Illinois Lottery, an urn contains balls numbered 1 to 52. From this urn, six balls are randomly chosen without replacement. For a $1 bet, a player chooses two sets of six numbers. To win, all six numbers must match those chosen from the urn. The order in which the balls are picked does not matter. What is the probability of winning the lottery?
Approach The probability of winning is given by the number of ways a ticket could win divided by the size of the sample space. Each ticket has two sets of six numbers and therefore two chances of winning. The size of the sample space S is the number of ways 6 objects can be selected from 52 objects without replacement and without regard to order, so N(S) = ₅₂C₆.
Solution The size of the sample space is
 N(S) = ₅₂C₆ = 52! / (6! · (52 − 6)!) = (52 · 51 · 50 · 49 · 48 · 47 · 46!) / (6! · 46!) = 20,358,520
Each ticket has two chances of winning. If E is the event “winning ticket,” then N(E) = 2 and
 P(E) = 2 / 20,358,520 ≈ 0.000000098
There is about a 1 in 10,000,000 chance of winning the Illinois Lottery!
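 The lottery arithmetic can be reproduced in a few lines of Python. This sketch uses math.comb (available in Python 3.8 and later) for the number of combinations; the variable names are choices made for the example.

```python
import math
from fractions import Fraction

n_s = math.comb(52, 6)   # ways to choose 6 balls from 52, order ignored
n_e = 2                  # each $1 ticket carries two sets of six numbers

p_win = Fraction(n_e, n_s)

print(n_s)            # 20358520
print(float(p_win))   # roughly 9.8e-08
print(1 / p_win)      # 10179260, so about 1 chance in 10 million
```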
5.6 Putting It Together: Which Method Do I Use?
1 Determine the appropriate probability rule to use
 Working with probabilities can be challenging because of the number of different probability rules. This chapter provides the basic building blocks of probability theory, but knowing when to use a particular rule takes practice. To aid you in this learning process, consider the flowchart in Figure 16. While not all situations can be handled directly with the formulas provided, they can be combined and expanded to many more situations.
 The first step is to determine whether we are finding the probability of a single event. If we are dealing with a single event, we must decide whether to use the classical method (equally likely outcomes), the empirical method (relative frequencies), or subjective assignment. For experiments involving more than one event, we first decide which type of statement we have. For events involving ‘AND’, we must know if the events are independent. For events involving ‘OR’, we need to know if the events are disjoint (mutually exclusive).
 Problem In the game show Deal or No Deal?, a contestant is presented with 26 suitcases that contain amounts ranging from $0.01 to $1,000,000. The contestant must pick an initial case that is set aside as the game progresses. The amounts are randomly distributed among the suitcases prior to the game as shown in Table 11. What is the probability that the contestant picks a case worth at least $100,000?
Table 11
Prize                    Number of Suitcases
$0.01–$100               8
$200–$1,000              6
$5,000–$50,000           5
$100,000–$1,000,000      7
Approach Follow the flowchart in Figure 16.
Solution There is a single event, so we must decide among the empirical, classical, or subjective approaches to determine the probability. The probability experiment is selecting a suitcase. Each prize amount is randomly assigned to one of the 26 suitcases, so the outcomes are equally likely. Table 11 shows that 7 cases contain at least $100,000. Letting E = "worth at least $100,000," we compute P(E) using the classical approach.
P(E) = N(E) / N(S) = 7/26 ≈ 0.269
The chance the contestant selects a suitcase worth at least $100,000 is 26.9%. In 100 different games, we would expect about 27 games to result in a contestant choosing a suitcase worth at least $100,000.
2 Determine the Appropriate Counting Technique to Use
 To determine the appropriate counting technique to use, we need the ability to distinguish between a sequence of choices and an arrangement of items. We also need to determine whether order matters in the arrangements. See Figure 17. Keep in mind that one problem may require several counting rules.
 We first must decide whether we have a sequence of choices or an arrangement of items. For a sequence of choices, we use the Multiplication Rule of Counting if the number of choices at each stage is independent of previous choices. This may involve the rules for arrangements, since each choice in the sequence could involve arranging items. If the number of choices at each stage is not independent of previous choices, we use a tree diagram. When determining the number of arrangements of items, we want to know whether the order of selection matters. If order matters, we also want to know whether we are arranging all the items available or a subset of the items.
 Problem The Hazelwood city council consists of 5 men and 4 women. How many different subcommittees can be formed that consist of 3 men and 2 women?
Approach Follow the flowchart in Figure 17.
Solution We need to find the number of subcommittees having 3 men and 2 women. So we consider a sequence of events: select the men, then select the women. Since the number of choices at each stage is independent of previous choices (the men chosen will not impact which women are chosen), we use the Multiplication Rule of Counting to obtain
N (subcommittees) = N (ways to pick 3 men) ⋅ N (ways to pick 2 women)
To select the men, we must consider the number of arrangements of 5 men taken 3 at a time. Since the order of selection does not matter, we use the combination formula.
N(ways to pick 3 men) = ₅C₃ = 5! / (3! · 2!) = 10
To select the women, we must consider the number of arrangements of 4 women taken 2 at a time. Since the order of selection does not matter, we use the combination formula again.
N(ways to pick 2 women) = ₄C₂ = 4! / (2! · 2!) = 6
Combining our results, we obtain N(subcommittees) = 10 · 6 = 60. There are 60 possible subcommittees that contain 3 men and 2 women.
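 The subcommittee count can be confirmed by listing every possible subcommittee. In the Python sketch below, the council members are given placeholder labels (M1 to M5 and W1 to W4), which are assumptions made only so the groups can be enumerated.

```python
import math
from itertools import combinations, product

men = ["M1", "M2", "M3", "M4", "M5"]   # placeholder labels for the 5 men
women = ["W1", "W2", "W3", "W4"]       # placeholder labels for the 4 women

# Multiplication Rule of Counting applied to two combination counts.
by_formula = math.comb(5, 3) * math.comb(4, 2)

# Brute force: pair every group of 3 men with every group of 2 women.
subcommittees = list(product(combinations(men, 3), combinations(women, 2)))

print(by_formula, len(subcommittees))  # 60 60
```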
Chapter 5 Summary and Formulas
Chapter 5 Summary
In this chapter, we introduced the concept of probability. Probability is a measure of the likelihood of a random phenomenon or chance behavior. Because we are measuring a random phenomenon, there is short-term uncertainty. However, this short-term uncertainty gives rise to long-term predictability.
Probabilities are numbers between zero and one, inclusive. The closer a probability is to one, the more likely the event is to occur. If an event has probability zero, it is said to be impossible. Events with probability one are said to be certain.
We introduced three methods for computing probabilities: (1) the empirical method, (2) the classical method, and (3) subjective probabilities. Empirical probabilities rely on the relative frequency with which an event happens. Classical probabilities require the outcomes in the experiment to be equally likely. We count the number of ways an event can happen and divide this by the number of possible outcomes of the experiment. Empirical probabilities require that an experiment be performed, whereas classical probability does not. Subjective probabilities are probabilities based on the opinion of the individual providing the probability. They are educated guesses about the likelihood of an event occurring, but still represent a legitimate way of assigning probabilities.
We are also interested in probabilities of multiple outcomes. For example, we might be interested in the probability that either event E or event F happens. The Addition Rule is used to compute the probability of E or F; the Multiplication Rule is used to compute the probability that both E and F occur. Two events are mutually exclusive (or disjoint) if they do not have any outcomes in common. That is, mutually exclusive events cannot happen at the same time. Two events E and F are independent if knowing that one of the events occurs does not affect the probability of the other. The complement of an event E, denoted Eᶜ, is all the outcomes in the sample space that are not in E.
Finally, we introduced counting methods. The Multiplication Rule of Counting is used to count the number of ways a sequence of events can occur. Permutations are used to count the number of ways r items can be arranged from a set of n distinct items. Combinations are used to count the number of ways r items can be selected from a set of n distinct items without replacement and without regard to order. These counting techniques can be used to calculate probabilities using the classical method.
Formulas
Empirical Probability
P(E) ≈ (frequency of E) / (number of trials of experiment)
Classical Probability
P(E) = (number of ways that E can occur) / (number of possible outcomes) = N(E) / N(S)
Addition Rule for Disjoint Events
P(E or F) = P(E) + P(F)
General Addition Rule
P(E or F) = P(E) + P(F) − P(E and F)
Probabilities of Complements
P(Eᶜ) = 1 − P(E)
Multiplication Rule for Independent Events
P(E and F) = P(E) · P(F)
Multiplication Rule for n Independent Events
P(E₁ and E₂ and E₃ and ⋯ and Eₙ) = P(E₁) · P(E₂) · ⋯ · P(Eₙ)
Conditional Probability Rule
P(F | E) = P(E and F) / P(E) = N(E and F) / N(E)
General Multiplication Rule
P(E and F) = P(E) · P(F | E)
Factorial Notation
n! = n · (n − 1) · (n − 2) · ⋯ · 3 · 2 · 1
Combination
ₙCᵣ = n! / (r! · (n − r)!)
Permutation
ₙPᵣ = n! / (n − r)!
Permutations with Nondistinct Items
n! / (n₁! · n₂! · ⋯ · nₖ!)
Chapter 6
1 Distinguish between discrete and continuous random variables
 Suppose we flip a coin two times. The outcomes of the experiment are { HH , HT , TH , TT } . Rather than being interested in a particular outcome, we might be interested in the number of heads. If the outcome of a probability experiment is a numerical result, we say the outcome is a random variable.
 So, in our coin-flipping example, if the random variable X represents the number of heads in two flips of a coin, the possible values of X are x = 0, 1, or 2. Notice that we follow the practice of using a capital letter, such as X, to identify the random variable and a lowercase letter, x, to list the possible values of the random variable, that is, the sample space of the experiment.
 A random variable is a numerical measure of the outcome of a probability experiment, so its value is determined by chance. Random variables are typically denoted using capital letters such as X.
 Discrete random variables typically result from counting (0, 1, 2, 3, and so on). Continuous random variables are variables that result from measurement.
 A discrete random variable has either a finite or countable number of values. The values of a discrete random variable can be plotted on a number line with space between each point. See Figure 1(a).
 A continuous random variable has infinitely many values. The values of a continuous random variable can be plotted on a line in an uninterrupted fashion. See Figure 1(b).
2 Identify Discrete Probability Distributions
 The probability distribution of a discrete random variable X provides the possible values of the random variable and their corresponding probabilities. A probability distribution can be in the form of a table, graph, or mathematical formula.
 Rules for a Discrete Probability Distribution
 Let P(x) denote the probability that the random variable X equals x; then
 1. $\sum P(x) = 1$
 2. $0 \le P(x) \le 1$
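 The two rules can be turned into a simple check. The sketch below is illustrative only: the function name is_valid_distribution, the tolerance, and the example distributions are our own assumptions. The first example uses the number of heads in two flips of a coin, assuming the coin is fair.

```python
import math

def is_valid_distribution(dist, tol=1e-9):
    """Check the two rules for a discrete probability distribution.

    dist maps each possible value x to its probability P(x).
    """
    probs = list(dist.values())
    # Rule 2: every probability lies between 0 and 1, inclusive.
    each_in_range = all(0.0 <= p <= 1.0 for p in probs)
    # Rule 1: the probabilities sum to 1 (within a small rounding tolerance).
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

# Number of heads in two flips of a coin, assuming a fair coin.
heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_valid_distribution(heads))

# A made-up table that sums to 1 but contains a negative "probability".
not_a_distribution = {0: 0.5, 1: 0.7, 2: -0.2}
print(is_valid_distribution(not_a_distribution))
```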
3 Construct Probability Histograms
 In a probability histogram, the horizontal axis corresponds to the value of the random variable and the vertical axis represents the probability of each value of the random variable.
4 Compute and Interpret the Mean of a Discrete Random Variable
 Remember, when we describe the distribution of a variable, we describe its center, spread, and shape. We will use the mean to describe the center and use the standard deviation to describe the spread.
 Let's see where the formula for computing the mean of a discrete random variable comes from. One semester I asked a small statistics class of 10 students to disclose the number of people living in their households. I obtained the following data:
 2, 4, 6, 6, 4, 4, 2, 3, 5, 5
 What is the mean number of people in the 10 households? We could find the mean by adding the observations and dividing by 10, but we will take a different approach. Letting the random variable X represent the number of people in the household, we obtain the probability distribution in Table 2.
Table 2
x     P(x)
2     2/10 = 0.2
3     1/10 = 0.1
4     3/10 = 0.3
5     2/10 = 0.2
6     2/10 = 0.2
 Now we compute the mean as follows:
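$$
\begin{aligned}
\mu &= \frac{\sum x_i}{N} = \frac{2 + 4 + 6 + 6 + 4 + 4 + 2 + 3 + 5 + 5}{10} \\
&= \frac{\overbrace{2 + 2}^{2} + \overbrace{3}^{1} + \overbrace{4 + 4 + 4}^{3} + \overbrace{5 + 5}^{2} + \overbrace{6 + 6}^{2}}{10} \\
&= \frac{2 \cdot 2 + 3 \cdot 1 + 4 \cdot 3 + 5 \cdot 2 + 6 \cdot 2}{10} \\
&= 2 \cdot \frac{2}{10} + 3 \cdot \frac{1}{10} + 4 \cdot \frac{3}{10} + 5 \cdot \frac{2}{10} + 6 \cdot \frac{2}{10} \\
&= 2 \cdot P(2) + 3 \cdot P(3) + 4 \cdot P(4) + 5 \cdot P(5) + 6 \cdot P(6) \\
&= 2(0.2) + 3(0.1) + 4(0.3) + 5(0.2) + 6(0.2) = 4.1
\end{aligned}
$$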
 We conclude that the mean of a discrete random variable is found by multiplying each possible value of the random variable by its corresponding probability and then adding these products.
 To find the mean of a discrete random variable, multiply each value of the random variable by its probability. Then add these products.
 The Mean of a Discrete Random Variable
 The mean of a discrete random variable is given by the formula
 $\mu_X = \sum [x \cdot P(x)]$  (1)
 where x is the value of the random variable and P(x) is the probability of observing the value x.
 The mean of a discrete random variable can be thought of as the mean outcome of the probability experiment if we repeated the experiment many times.
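 As a check on Formula (1), the short Python sketch below computes the mean of the household-size distribution in Table 2 and compares it with the ordinary average of the ten raw observations. The names table_2, households, and mean_discrete are our own; the data come from the example above.

```python
# Probability distribution from Table 2: value x -> P(x)
table_2 = {2: 0.2, 3: 0.1, 4: 0.3, 5: 0.2, 6: 0.2}

def mean_discrete(dist):
    """Mean of a discrete random variable: the sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())

# The ten raw observations the table was built from.
households = [2, 4, 6, 6, 4, 4, 2, 3, 5, 5]

mu = mean_discrete(table_2)
raw_average = sum(households) / len(households)
print(round(mu, 4), round(raw_average, 4))
```

 Both approaches give 4.1, which is the point of the derivation: weighting each value by its probability is the same as averaging the raw data.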
 Interpretation of the Mean of a Discrete Random Variable
 Suppose an experiment is repeated n independent times and the value of the random variable X is recorded. As the number of repetitions of the experiment increases, the mean value of the n trials will approach $\mu_X$, the mean of the random variable X. In other words, let $x_1$ be the value of the random variable X after the first experiment, $x_2$ be the value of the random variable X after the second experiment, and so on. Then
 $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
 The difference between $\bar{x}$ and $\mu_X$ gets closer to 0 as n increases.
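 This long-run interpretation can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a result from the text: it draws values from the Table 2 distribution with Python's random.choices, and the function name sample_mean, the seed, and the sample sizes are our own choices. Exact printed values depend on the seed, so none are quoted here.

```python
import random

# Probability distribution from Table 2: value x -> P(x); its mean is 4.1.
table_2 = {2: 0.2, 3: 0.1, 4: 0.3, 5: 0.2, 6: 0.2}

def sample_mean(dist, n, seed=0):
    """Simulate n independent observations of X and return their average."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = list(dist.keys())
    weights = list(dist.values())
    draws = rng.choices(values, weights=weights, k=n)
    return sum(draws) / n

# The sample mean should tend to settle near 4.1 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(table_2, n))
```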
5 Interpret the Mean of a Discrete Random Variable as an Expected Value
 Because the mean of a random variable represents what we would expect to happen in the long run, it is also called the expected value, E(X). The interpretation of expected value is the same as the interpretation of the mean of a discrete random variable.
 The expected value of a discrete random variable is the mean of the discrete random variable.
6 Compute the Standard Deviation of a Discrete Random Variable
 The standard deviation of a discrete random variable is the square root of a weighted average of the squared deviations for which the weights are the probabilities.
 Standard Deviation of a Discrete Random Variable
 The standard deviation of a discrete random variable X is given by
 $\sigma_X = \sqrt{\sum [(x - \mu_X)^2 \cdot P(x)]}$  (2a)
 $\sigma_X = \sqrt{\sum [x^2 \cdot P(x)] - \mu_X^2}$  (2b)
 where x is the value of the random variable, $\mu_X$ is the mean of the random variable, and P(x) is the probability of observing a value of the random variable.
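 Formulas (2a) and (2b) give the same result, which can be confirmed numerically. The following Python sketch applies both to the household-size distribution from Table 2; the function names are our own, and the code is an illustration rather than part of the original notes.

```python
import math

# Probability distribution from Table 2: value x -> P(x)
table_2 = {2: 0.2, 3: 0.1, 4: 0.3, 5: 0.2, 6: 0.2}

def mean_discrete(dist):
    """Mean: sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def sd_definition(dist):
    """Formula (2a): square root of the sum of (x - mu)^2 * P(x)."""
    mu = mean_discrete(dist)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

def sd_computational(dist):
    """Formula (2b): square root of [sum of x^2 * P(x)] minus mu^2."""
    mu = mean_discrete(dist)
    return math.sqrt(sum(x ** 2 * p for x, p in dist.items()) - mu ** 2)

print(round(sd_definition(table_2), 4), round(sd_computational(table_2), 4))
```

 For Table 2 the variance works out to 1.89 under either formula, so the standard deviation is the square root of 1.89, roughly 1.37 people.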
6.2 The Binomial Probability Distribution
1 Determine whether a probability experiment is a binomial experiment
 The binomial probability distribution is a discrete probability distribution that describes probabilities for experiments in which there are two mutually exclusive (disjoint) outcomes. These two outcomes are generally referred to as success (such as making a free throw) and failure (such as missing a free throw).
 Experiments in which only two outcomes are possible are referred to as binomial experiments, provided that certain criteria are met.
 The prefix bi means “two”. This should help remind you that binomial experiments deal with situations in which there are only two outcomes: success or failure.
 Criteria for a Binomial Probability Experiment
 An experiment is said to be a binomial experiment if
 1. The experiment is performed a fixed number of times. Each repetition of the experiment is called a trial.
 2. The trials are independent. This means that the outcome of one trial will not affect the outcome of the other trials.
 3. For each trial, there are two mutually exclusive (disjoint) outcomes: success or failure.
 4. The probability of success is the same for each trial of the experiment.
 Let the random variable X be the number of successes in n trials of a binomial experiment. Then X is called a binomial random variable. Before introducing the method for computing binomial probabilities, it is worthwhile to introduce some notation.
 Notation Used in the Binomial Probability Distribution
 There are n independent trials of the experiment.
 Let p denote the probability of success for each trial so that 1 − p is the probability of failure for each trial.
 Let X denote the number of successes in n independent trials of the experiment. So $0 \le x \le n$.
2 Compute Probabilities of Binomial Experiments
 We are now prepared to compute probabilities for a binomial random variable X. We present three methods for obtaining binomial probabilities: (1) the binomial probability distribution formula, (2) a table of binomial probabilities, and (3) technology.
 Binomial Probability Distribution Function
 The probability of obtaining x successes in n independent trials of a binomial experiment is given by
 $P(x) = {}_nC_x \, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$  (1)
 where p is the probability of success.
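 The binomial probability distribution function translates directly into code. Below is a minimal Python sketch using math.comb for the combination term; the function name binomial_pmf and the example of n = 4 trials with p = 0.5 are hypothetical choices made for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: n = 4 independent trials, probability of success p = 0.5.
n, p = 4, 0.5
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in distribution.items():
    print(x, prob)

# The probabilities of all possible values must sum to 1.
print(sum(distribution.values()))
```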
Phrase                                                        Math Symbol
“at least” or “no less than” or “greater than or equal to”    ≥
“more than” or “greater than”                                 >
“fewer than” or “less than”                                   <
“no more than” or “at most” or “less than or equal to”        ≤
“exactly” or “equals” or “is”                                 =
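 Each phrase in the table corresponds to a sum of binomial probabilities. The Python sketch below shows one way to express the four inequality phrases; all function names and the example values (n = 4, p = 0.5, cutoff k = 1) are our own assumptions for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """'Exactly x': P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def prob_at_most(k, n, p):
    """'No more than k' or 'at most k': P(X <= k)."""
    return sum(binomial_pmf(x, n, p) for x in range(0, k + 1))

def prob_fewer_than(k, n, p):
    """'Fewer than k' or 'less than k': P(X < k) = P(X <= k - 1)."""
    return prob_at_most(k - 1, n, p)

def prob_at_least(k, n, p):
    """'At least k': P(X >= k) = 1 - P(X < k)."""
    return 1 - prob_fewer_than(k, n, p)

def prob_more_than(k, n, p):
    """'More than k': P(X > k) = 1 - P(X <= k)."""
    return 1 - prob_at_most(k, n, p)

# Hypothetical example: n = 4 trials, p = 0.5, cutoff k = 1.
n, p, k = 4, 0.5, 1
print(prob_at_most(k, n, p), prob_fewer_than(k, n, p))
print(prob_at_least(k, n, p), prob_more_than(k, n, p))
```

 Note that "at least" and "fewer than" are complements of each other, as are "more than" and "at most", so each pair of probabilities sums to 1.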
3 Compute the Mean and Standard Deviation of a Binomial Random Variable
 The mean of a binomial random variable equals the product of the number of trials of the experiment and the probability of success. It can be interpreted as the expected number of successes in n trials of the experiment.
 A binomial experiment with n independent trials and probability of success p has a mean and standard deviation given by the formulas
 $\mu_X = np$ and $\sigma_X = \sqrt{np(1 - p)}$  (2)
 Problem According to CTIA, 25% of all U.S. households are wireless-only households. In a simple random sample of 300 households, determine the mean and standard deviation of the number of wireless-only households.
 Approach This is a binomial experiment with n = 300 and p = 0.25. Use Formula (2) to find the mean and standard deviation, respectively.
 Solution
 $\mu_X = np = 300(0.25) = 75$
 and
 $\sigma_X = \sqrt{np(1 - p)} = \sqrt{300(0.25)(1 - 0.25)} = \sqrt{56.25} = 7.5$
 Interpretation We expect that, in a random sample of 300 households, 75 will be wireless-only.
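 The shortcut formulas for the binomial mean and standard deviation agree with the general formulas for any discrete random variable. The Python sketch below checks this for the wireless-only example (n = 300, p = 0.25); the variable and function names are our own, and the code is an illustration rather than part of the original solution.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 300, 0.25

# Shortcut formulas for a binomial random variable.
mu_shortcut = n * p
sigma_shortcut = math.sqrt(n * p * (1 - p))

# General formulas: weight every possible value 0, 1, ..., n by its probability.
mu_general = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var_general = sum((x - mu_general) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
sigma_general = math.sqrt(var_general)

print(mu_shortcut, sigma_shortcut)
print(round(mu_general, 6), round(sigma_general, 6))
```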
4 Construct Binomial Probability Histograms
 For a fixed p, as the number of trials n in a binomial experiment increases, the probability distribution of the random variable X becomes bell-shaped. As a rule of thumb, if $np(1 - p) \ge 10$, the probability distribution will be approximately bell-shaped.
 Provided that $np(1 - p) \ge 10$, the interval $\mu - 2\sigma$ to $\mu + 2\sigma$ represents the “usual” observations. Observations outside this interval may be considered unusual.
 This result allows us to use the Empirical Rule to identify unusual observations in a binomial experiment. Recall that the Empirical Rule states that in a bell-shaped distribution about 95% of all observations lie within two standard deviations of the mean. That is, about 95% of the observations lie between $\mu - 2\sigma$ and $\mu + 2\sigma$. Any observation that lies outside this interval may be considered unusual because such observations occur less than 5% of the time.
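 The rule of thumb and the two-standard-deviation interval can be combined into a small check. The Python sketch below is illustrative: the function names usual_interval and is_unusual are our own, the values n = 300 and p = 0.25 are taken from the wireless-only example, and the observed counts 80 and 95 are hypothetical.

```python
import math

def usual_interval(n, p):
    """Return (mu - 2*sigma, mu + 2*sigma) for a binomial random variable.

    Raises ValueError when n*p*(1 - p) < 10, since the rule of thumb for an
    approximately bell-shaped distribution is then not satisfied.
    """
    if n * p * (1 - p) < 10:
        raise ValueError("np(1 - p) < 10: distribution may not be bell-shaped")
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - 2 * sigma, mu + 2 * sigma

def is_unusual(x, n, p):
    """An observation is flagged as unusual if it falls outside the usual interval."""
    low, high = usual_interval(n, p)
    return x < low or x > high

# Wireless-only example: n = 300, p = 0.25, so mu = 75 and sigma = 7.5.
print(usual_interval(300, 0.25))
# Hypothetical observed counts of wireless-only households.
print(is_unusual(80, 300, 0.25), is_unusual(95, 300, 0.25))
```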
6.3 The Poisson Probability Distribution
1 Determine If a Probability Experiment Follows a Poisson Process
 Another discrete probability model is the Poisson probability distribution, named after Siméon Denis Poisson. This probability distribution can be used to compute probabilities of experiments in which the random variable X counts the number of occurrences (successes) of a particular event within a specified interval (usually time or space).
 A random variable X, the number of successes in a fixed interval, follows a Poisson process provided the following conditions are met.
 1. The probability of two or more successes in any sufficiently small subinterval is 0.
 2. The probability of success is the same for any two intervals of equal length.
 3. The number of successes in any interval is independent of the number of successes in any other interval provided the intervals are not overlapping.
 Ex.: A McDonald's manager knows from prior experience that cars arrive at the drive-through at an average rate of two cars per minute between the hours of 12:00 noon and 1:00 P.M. The random variable X, the number of cars that arrive between 12:20 and 12:40, follows a Poisson process.
 In the McDonald's example, if we divide the time interval into a sufficiently small length (say, 1 second), it is impossible for more than one car to arrive. This satisfies part 1 of the definition. Part 2 is satisfied because the cars arrive at an average rate of 2 cars per minute over the 1-hour interval. Part 3 is satisfied because the number of cars that arrive in any 1-minute interval (say between 12:23 P.M. and 12:24 P.M.) is independent of the number of cars that arrive in any other 1-minute interval (say between 12:35 P.M. and 12:36 P.M.).
2 Compute Probabilities of a Poisson Random Variable
 Poisson probabilities are used to determine the probability of the number of successes in a fixed interval of time or space.
 If X is the number of successes in an interval of fixed length t, then the probability of obtaining x successes in the interval is
 $P(x) = \dfrac{(\lambda t)^x}{x!} e^{-\lambda t}, \quad x = 0, 1, 2, 3, \ldots$  (1)
 where λ (the Greek letter lambda) represents the average number of occurrences of the event in some interval of length 1 and e ≈ 2.71828.
 An important point is that at-least and more-than probabilities for a Poisson process must be found using the complement. Because the random variable X can be any integer greater than or equal to 0, adding the probabilities directly would require infinitely many terms.
 Problem A McDonald's manager knows that cars arrive at the drive-through at the average rate of two cars per minute between the hours of 12 noon and 1:00 P.M. She needs to determine and interpret the probability of the following events:
 (a) Exactly 6 cars arrive between 12 noon and 12:05 P.M.
 (b) Fewer than 6 cars arrive between 12 noon and 12:05 P.M.
 (c) At least 6 cars arrive between 12 noon and 12:05 P.M.
Approach The manager needs a method to determine the probabilities. The cars arrive at a rate of two per minute over the time interval between 12 noon and 1:00 P.M. We know from Example 1 that the random variable X follows a Poisson process, where x = 0, 1, 2, …. The Poisson probability distribution function requires a value for λ and t. Since the cars arrive at a rate of two per minute, λ = 2. The interval of time we are interested in is 5 minutes, so t = 5.
Solution We use the Poisson probability distribution function (1).
(a) The probability that exactly six cars arrive between 12 noon and 12:05 P.M. is
$P(6) = \dfrac{[2(5)]^6}{6!} e^{-2(5)} = \dfrac{1{,}000{,}000}{720} e^{-10} = 0.0631$
Interpretation On about 6 of every 100 days, exactly 6 cars will arrive between 12:00 noon and 12:05 P.M.
(b) The probability that fewer than 6 cars arrive between 12:00 noon and 12:05 P.M. is
$$
\begin{aligned}
P(X < 6) &= P(X \le 5) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) \\
&= \frac{[2(5)]^0}{0!} e^{-2(5)} + \frac{[2(5)]^1}{1!} e^{-2(5)} + \frac{[2(5)]^2}{2!} e^{-2(5)} + \frac{[2(5)]^3}{3!} e^{-2(5)} + \frac{[2(5)]^4}{4!} e^{-2(5)} + \frac{[2(5)]^5}{5!} e^{-2(5)} \\
&= \frac{1}{1} e^{-10} + \frac{10}{1} e^{-10} + \frac{100}{2} e^{-10} + \frac{1000}{6} e^{-10} + \frac{10{,}000}{24} e^{-10} + \frac{100{,}000}{120} e^{-10} \\
&= 0.0671
\end{aligned}
$$
Interpretation On about 7 of every 100 days, fewer than 6 cars will arrive between 12:00 noon and 12:05 P.M.
(c) The probability that at least 6 cars arrive between 12 noon and 12:05 P.M. is the complement of the probability that fewer than 6 cars arrive during that time. That is,
$P(X \ge 6) = 1 - P(X < 6) = 1 - 0.0671 = 0.9329$
Interpretation On about 93 of every 100 days, at least six cars will arrive between 12:00 noon and 12:05 P.M.
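 The three probabilities in this example can be reproduced with a few lines of code. The Python sketch below implements the Poisson probability distribution function (1) with λ = 2 and t = 5 from the problem; the function and variable names are our own, and the code is an illustration rather than part of the original solution.

```python
import math

def poisson_pmf(x, lam, t):
    """P(x) = (lam * t)^x / x! * e^(-lam * t) for x = 0, 1, 2, ..."""
    mu = lam * t
    return mu ** x / math.factorial(x) * math.exp(-mu)

lam, t = 2, 5  # two cars per minute, 5-minute interval

# (a) exactly 6 cars
p_exactly_6 = poisson_pmf(6, lam, t)
# (b) fewer than 6 cars: add P(0) through P(5)
p_fewer_than_6 = sum(poisson_pmf(x, lam, t) for x in range(6))
# (c) at least 6 cars: use the complement, since X has no upper bound
p_at_least_6 = 1 - p_fewer_than_6

print(round(p_exactly_6, 4), round(p_fewer_than_6, 4), round(p_at_least_6, 4))
```

 Rounded to four decimal places, the results match the hand computations: 0.0631, 0.0671, and 0.9329.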
3 Find the Mean and Standard Deviation of a Poisson Random Variable
 A random variable X that follows a Poisson process with parameter λ has mean (or expected value) and standard deviation given by the formulas
 $\mu_X = \lambda t$ and $\sigma_X = \sqrt{\lambda t} = \sqrt{\mu_X}$
 where t is the length of the interval.
 Because $\mu_X = \lambda t$, we restate the Poisson probability distribution function in terms of its mean.
 If X is the number of successes in an interval of fixed length and X follows a Poisson process with mean μ, the probability distribution function for X is
 $P(x) = \dfrac{\mu^x}{x!} e^{-\mu}, \quad x = 0, 1, 2, 3, \ldots$
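 The Poisson mean and standard deviation formulas can be checked against the general definitions. The Python sketch below uses the drive-through values λ = 2 and t = 5; because X has no largest value, the sums are cut off at x = 99, a truncation point we chose on the assumption that the remaining probability is negligible for a mean of 10. All names are our own.

```python
import math

def poisson_pmf(x, mu):
    """P(x) = mu^x / x! * e^(-mu), the Poisson function stated in terms of its mean."""
    return mu ** x / math.factorial(x) * math.exp(-mu)

lam, t = 2, 5          # two cars per minute, 5-minute interval
mu = lam * t           # mean: lambda * t
sigma = math.sqrt(mu)  # standard deviation: sqrt(lambda * t)

# Numerical check using the general formulas, truncated at x = 99 (an assumption).
xs = range(100)
mu_check = sum(x * poisson_pmf(x, mu) for x in xs)
var_check = sum((x - mu) ** 2 * poisson_pmf(x, mu) for x in xs)

print(mu, round(sigma, 4))
print(round(mu_check, 6), round(math.sqrt(var_check), 4))
```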
Chapter 6 Summary
In this chapter, we discussed discrete probability distributions. A random variable represents the numerical measurement of the outcome from a probability experiment. Discrete random variables have either a finite or a countable number of outcomes. The term countable means that the values result from counting. Probability distributions must satisfy the following two criteria: (1) All probabilities must be between 0 and 1, inclusive, and (2) the sum of all probabilities must equal 1. Discrete probability distributions can be presented by a table, graph, or mathematical formula.
The mean and standard deviation of a random variable describe the center and spread of the distribution. The mean of a random variable is also called its expected value.
We discussed two discrete probability distributions in particular, the binomial and Poisson. A probability experiment is considered a binomial experiment if there is a fixed number, n, of independent trials of the experiment with only two outcomes. The probability of success, p, is the same for each trial of the experiment. Special formulas exist for computing the mean and standard deviation of a binomial random variable.
We also discussed the Poisson probability distribution. A Poisson process is one in which the following conditions are met: (1) The probability of two or more successes in any sufficiently small subinterval is 0. (2) The probability of success is the same for any two intervals of equal length. (3) The number of successes in any interval is independent of the number of successes in any other disjoint interval. Special formulas exist for computing the mean and standard deviation of a random variable that follows a Poisson process.
Formulas
Mean (or Expected Value) of a Discrete Random Variable
$\mu_X = E(X) = \sum x P(x)$
Standard Deviation of a Discrete Random Variable
$\sigma_X = \sqrt{\sum (x - \mu_X)^2 \cdot P(x)} = \sqrt{\sum [x^2 \cdot P(x)] - \mu_X^2}$
Binomial Probability Distribution Function
$P(x) = {}_nC_x \, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$
Mean of a Binomial Random Variable
$\mu_X = np$
Standard Deviation of a Binomial Random Variable
$\sigma_X = \sqrt{np(1 - p)}$
Poisson Probability Distribution Function
$P(x) = \dfrac{(\lambda t)^x}{x!} e^{-\lambda t} = \dfrac{\mu^x}{x!} e^{-\mu}, \quad x = 0, 1, 2, \ldots$
Mean and Standard Deviation of a Poisson Random Variable
$\mu_X = \lambda t \qquad \sigma_X = \sqrt{\lambda t} = \sqrt{\mu_X}$