1.5 An Introduction to Survey Sampling (Optional)

Random sampling is not the only kind of sampling. Methods for obtaining a sample are called sampling designs, and the sample we take is sometimes called a sample survey. In this section we explain three sampling designs that are alternatives to random sampling: stratified random sampling, cluster sampling, and systematic sampling.

One common sampling design involves separately sampling important groups within a population. Then, the samples are combined to form the entire sample. This approach is the idea behind stratified random sampling.

In order to select a stratified random sample, we divide the population into nonoverlapping groups of similar units (people, objects, etc.). These groups are called strata. Then a random sample is selected from each stratum, and these samples are combined to form the full sample.

It is wise to stratify when the population consists of two or more groups that differ with respect to the variable of interest. For instance, consumers could be divided into strata based on gender, age, ethnic group, or income.

As an example, suppose that a department store chain proposes to open a new store in a location that would serve customers who live in a geographical region that consists of (1) an industrial city, (2) a suburban community, and (3) a rural area. In order to assess the potential profitability of the proposed store, the chain wishes to study the incomes of all households in the region. In addition, the chain wishes to estimate the proportion and the total number of households whose members would be likely to shop at the store. The department store chain feels that the industrial city, the suburban community, and the rural area differ with respect to income and the store’s potential desirability. Therefore, it uses these subpopulations as strata and takes a stratified random sample.
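The department store's design can be sketched in a few lines of Python. This is only a minimal illustration: the household identifiers, stratum sizes, and per-stratum sample sizes below are hypothetical, and the function name `stratified_sample` is introduced here purely for the example.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a random sample, without replacement, from each stratum.

    strata: dict mapping stratum name -> list of units (nonoverlapping groups)
    sizes:  dict mapping stratum name -> number of units to draw from it
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical frames: one list of household IDs per stratum.
strata = {
    "city":   [f"C{i}" for i in range(1, 501)],
    "suburb": [f"S{i}" for i in range(1, 301)],
    "rural":  [f"R{i}" for i in range(1, 201)],
}
sizes = {"city": 50, "suburb": 30, "rural": 20}  # assumed allocation

by_stratum = stratified_sample(strata, sizes, seed=1)

# Combine the per-stratum samples to form the full sample.
full_sample = [unit for units in by_stratum.values() for unit in units]
print(len(full_sample))  # 100
```

The sample sizes here happen to be proportional to the stratum sizes; other allocations are possible, and the choice is not specified in the text.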

A stratified sample can be advantageous because it exploits the fact that units in the same stratum are similar to each other. It follows that a stratified sample can provide more accurate information than a random sample of the same size. As a simple example, if all of the units in each stratum were exactly the same, then examining only one unit in each stratum would allow us to describe the entire population. Furthermore, stratification can make a sample easier (or possible) to select. Recall that, in order to take a random sample, we must have a frame, or list, of all of the population units. Although a frame might not exist for the overall population, a frame might exist for each stratum. For example, suppose nearly all the households in the department store’s geographical region have telephones. Although there might not be a telephone directory for the overall geographical region, there might be separate telephone directories for the industrial city, the suburb, and the rural area. Although we do not discuss how to analyze data from a stratified random sample in the main body of this text, we do so in Appendix F (Part 1) on the CD-ROM that accompanies this book. For a more complete discussion of stratified random sampling, see Mendenhall, Schaeffer, and Ott (1986).
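The extreme case mentioned above, in which every unit in a stratum is identical, can be checked with a small calculation. The incomes and stratum sizes below are hypothetical, and weighting each stratum by its size is an assumption of this sketch (it is the usual way to combine stratum results, but the text defers the analysis to Appendix F).

```python
# Hypothetical population: every household in a stratum has the same income.
population = {
    "city":   [40_000] * 500,
    "suburb": [70_000] * 300,
    "rural":  [30_000] * 200,
}

total_units = sum(len(units) for units in population.values())
true_mean = sum(sum(units) for units in population.values()) / total_units

# Examine only one unit per stratum and weight it by the stratum's size.
estimate = sum(len(units) * units[0] for units in population.values()) / total_units

print(true_mean, estimate)  # 47000.0 47000.0
```

With perfectly homogeneous strata, three observations reproduce the population mean exactly. Real strata are only approximately homogeneous, so the gain in accuracy is smaller but works in the same direction.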

Sometimes it is advantageous to select a sample in stages. This is a common practice when selecting a sample from a very large geographical region. In such a case, a frame often does not exist. For instance, there is no single list of all registered voters in the United States. There is also no single list of all households in the United States. In this kind of situation, we can use multistage cluster sampling. To illustrate this procedure, suppose we wish to take a sample of registered voters from all registered voters in the United States. We might proceed as follows:

Stage 1: Randomly select a sample of counties from all of the counties in the United States.
Stage 2: Randomly select a sample of townships from each county selected in Stage 1.
Stage 3: Randomly select a sample of voting precincts from each township selected in Stage 2.
Stage 4: Randomly select a sample of registered voters from each voting precinct selected in Stage 3.

We use the term cluster sampling to describe this type of sampling because at each stage we “cluster” the voters into subpopulations. For instance, in Stage 1 we cluster the voters into counties, and in Stage 2 we cluster the voters in each selected county into townships. Also, notice that the random sampling at each stage can be carried out because there are lists of (1) all counties in the United States, (2) all townships in each county, (3) all voting precincts in each township, and (4) all registered voters in each voting precinct.
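The four stages can be sketched as a recursive procedure over nested lists. Everything in this example is an assumption made for illustration: the tiny frame of 6 counties, 4 townships per county, 3 precincts per township, and 10 voters per precinct, the per-stage sample sizes, and the function name `multistage_sample`.

```python
import random

def multistage_sample(frame, sizes, rng):
    """Sample stage by stage from a nested frame.

    frame: nested dicts (cluster name -> sub-frame) ending in lists of units
    sizes: number of items to draw at each stage, outermost stage first
    """
    if isinstance(frame, list):
        # Final stage: draw units (voters) from the selected cluster.
        return rng.sample(frame, min(sizes[0], len(frame)))
    # Intermediate stage: draw clusters from the list of cluster names.
    chosen = rng.sample(sorted(frame), min(sizes[0], len(frame)))
    selected = []
    for name in chosen:
        selected.extend(multistage_sample(frame[name], sizes[1:], rng))
    return selected

# Hypothetical frame: counties -> townships -> precincts -> voters.
frame = {
    f"county{c}": {
        f"township{t}": {
            f"precinct{p}": [f"voter-{c}-{t}-{p}-{v}" for v in range(10)]
            for p in range(3)
        }
        for t in range(4)
    }
    for c in range(6)
}

rng = random.Random(0)
# 3 counties, 2 townships each, 2 precincts each, 5 voters each.
voters = multistage_sample(frame, [3, 2, 2, 5], rng)
print(len(voters))  # 60
```

Note that only the lists for the selected clusters are ever consulted, which is why the design works when no single list of all voters exists.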

As another example, consider sampling the households in the United States. We might use Stages 1 and 2 above to select counties and townships within the selected counties. Then, if there is a telephone directory of the households in each township, we can randomly sample households from each selected township by using its telephone directory. Because most households today have telephones, and telephone directories are readily available, most national polls are now conducted by telephone.

It is sometimes a good idea to combine stratification with multistage cluster sampling. For example, suppose a national polling organization wants to estimate the proportion of all registered voters who favor a particular presidential candidate. Because the presidential preferences of voters might tend to vary by geographical region, the polling organization might divide the United States into regions (say, Eastern, Midwestern, Southern, and Western regions). The polling organization might then use these regions as strata, and might take a multistage cluster sample from each stratum (region).

The analysis of data produced by multistage cluster sampling can be quite complicated. We explain how to analyze data produced by one- and two-stage cluster sampling in Appendix F (Part 2) on the CD-ROM that accompanies this book. This appendix also includes a discussion of an additional survey sampling technique called ratio estimation. For a more detailed discussion of cluster sampling and ratio estimation, see Mendenhall, Schaeffer, and Ott (1986).

In order to select a random sample, we must number the units in a frame of all the population units. Then we use a random number table (or a random number generator on a computer) to make the selections. However, numbering all the population units can be quite time-consuming. Moreover, random sampling is used in the various stages of many complex sampling designs (requiring the numbering of numerous populations). Therefore, it is useful to have an alternative to random sampling. One such alternative is called systematic sampling. In order to systematically select a sample of n units without replacement from a frame of N units, we divide N by n and round the result down to the nearest whole number. Calling the rounded result ℓ, we then randomly select one unit from the first ℓ units in the frame; this is the first unit in the systematic sample. The remaining units in the sample are obtained by selecting every ℓth unit following the first (randomly selected) unit.

For example, suppose we wish to sample a population of N = 14,327 allergists to investigate how often they have prescribed a particular drug during the last year. A medical society has a directory listing the 14,327 allergists, and we wish to draw a systematic sample of 500 allergists from this frame. Here we compute 14,327/500 = 28.654, which is 28 when rounded down. Therefore, we number the first 28 allergists in the directory from 1 to 28, and we use a random number table to randomly select one of the first 28 allergists. Suppose we select allergist number 19. We interview allergist 19 and every 28th allergist in the frame thereafter, so we choose allergists 19, 47, 75, and so forth until we obtain our sample of 500 allergists. In this scheme, we must number the first 28 allergists, but we do not have to number the rest because we can “count off” every 28th allergist in the directory. Alternatively, we can measure the approximate amount of space in the directory that it takes to list 28 allergists. This measurement can then be used to select every 28th allergist.
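The systematic sampling procedure translates directly into code. The sketch below assumes the frame is held as a Python list; the function name `systematic_sample` and its optional `start` argument (used here only to reproduce the allergist example with starting unit 19) are introduced for illustration.

```python
import random

def systematic_sample(frame, n, start=None, rng=random):
    """Select n units by taking every step-th unit after a random start."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    step = N // n                      # N / n rounded down
    if start is None:
        start = rng.randint(1, step)   # random unit among the first `step` units
    # Positions are 1-based in the text, so subtract 1 for list indexing.
    return [frame[start - 1 + i * step] for i in range(n)]

# The directory of 14,327 allergists, represented by their positions 1..14327.
allergists = list(range(1, 14_328))
sample = systematic_sample(allergists, 500, start=19)
print(sample[:3], len(sample))  # [19, 47, 75] 500
```

Because the interval is rounded down, the last selected position is at most n times the interval, which never exceeds N, so the count-off always stays within the frame.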

In this book we concentrate on showing how to analyze data produced by random sampling. However, if the order of the population units in a frame is random with respect to the characteristic under study, then a systematic sample should be (approximately) a random sample and we can analyze the data produced by the systematic sample by using the same methods employed to analyze random samples. For instance, it would seem reasonable to assume that the alphabetical ordering of the allergists in a medical directory would be random with respect to (that is, have nothing to do with) the number of times the allergists prescribed a particular drug. Similarly, the alphabetically ordered people in a telephone directory would probably be random with respect to many of the people’s characteristics that we might wish to study.

When we employ random sampling, we eliminate bias in the choice of the sample from a frame. However, a proper sampling design does not guarantee that the sample will produce accurate information. One potential problem is undercoverage.

Undercoverage occurs when some population units are excluded from the process of selecting the sample.

This problem occurs when we do not have a complete, accurate list of all the population units. For example, although telephone polls today are common, 7 to 8 percent of the people in the United States do not have telephones. Undercoverage usually causes low-income people to be underrepresented. If underrepresented groups differ from the rest of the population with respect to the characteristic under study, the survey results will be biased. Another potentially serious problem is nonresponse.

Nonresponse occurs when a population unit selected as part of the sample cannot be contacted or refuses to participate.

In some surveys, 35 percent or more of the selected individuals cannot be contacted, even when several callbacks are made. In such a case, other participants are often substituted for the people who cannot be contacted. If the substitute participants differ from the originally selected participants with respect to the characteristic under study, the survey will again be biased. Third, when people are asked potentially embarrassing questions, their responses might not be truthful. We then have what we call response bias. Fourth, the wording of the questions asked can influence the answers received. Slanted questions often evoke biased responses. For example, consider the following question:

Which of the following best describes your views on gun control?

  1. The government should take away our guns, leaving us defenseless against heavily armed criminals.
  2. We have the right to keep and bear arms.

This question is biased toward eliciting a response against gun control.

Exercises for Section 1.5

CONCEPTS

1.24 When is it appropriate to use stratified random sampling? What are strata, and how should strata be selected?
1.25 When is cluster sampling used? Why do we describe this type of sampling by using the term cluster?
1.26 Explain each of the following terms:
  1. Undercoverage
  2. Nonresponse
  3. Response bias
1.27 Explain how to take a systematic sample of 100 companies from the 1,853 companies that are members of an industry trade association.
1.28 Explain how a stratified random sample is selected. Discuss how you might define the strata to survey student opinion on a proposal to charge all students a $100 fee for a new university-run bus system that will provide transportation between off-campus apartments and campus locations.
1.29 Marketing researchers often use city blocks as clusters in cluster sampling. Using this fact, explain how a market researcher might use multistage cluster sampling to select a sample of consumers from all cities having a population of more than 10,000 in a large state having many such cities.









Business Statistics in Practice: Online Learning Center
