Random samples

If the information contained in a sample is to accurately reflect the population under study, the sample should be randomly selected from the population. To intuitively illustrate random sampling, suppose that a small company employs 15 people and wishes to randomly select two of them to attend a convention. To make the random selections, we number the employees from 1 to 15, and we place in a hat 15 identical slips of paper numbered from 1 to 15. We thoroughly mix the slips of paper in the hat and, blindfolded, choose one. The number on the chosen slip of paper identifies the first randomly selected employee. Then, still blindfolded, we choose another slip of paper from the hat. The number on the second slip identifies the second randomly selected employee.

Of course, it is impractical to carry out such a procedure when the population is very large. It is easier to use a random number table (a table containing random digits that is often used to select a random sample). To show how to use such a table, we must more formally define a random sample.[2]

A random sample is selected so that, on each selection from the population, every unit remaining in the population on that selection has the same chance of being chosen.
To understand this definition, first note that we can randomly select a sample with or without replacement. If we sample with replacement, we place the unit chosen on any particular selection back into the population. Thus we give this unit a chance to be chosen on any succeeding selection. In such a case, all of the units in the population remain as candidates to be chosen for each and every selection. Randomly choosing two employees with replacement to attend a convention would make no sense because we wish to send two different employees to the convention.

If we sample without replacement, we do not place the unit chosen on a particular selection back into the population. Thus we do not give this unit a chance to be selected on any succeeding selection. In this case, the units remaining as candidates for a particular selection are all of the units in the population except for those that have previously been selected.

It is best to sample without replacement. Intuitively, because we will use the sample to learn about the population, sampling without replacement will give us the fullest possible look at the population. This is true because choosing the sample without replacement guarantees that all of the units in the sample will be different (and that we are looking at as many different units from the population as possible).
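The difference between the two procedures can be sketched in a few lines of Python. This is only an illustration of the hat-and-slips idea for the 15 employees; the seed value and the variable names are assumptions made for the sketch, not part of the original example.

```python
import random

# Employees are numbered 1 to 15, as in the hat-and-slips illustration.
employees = list(range(1, 16))

# A seeded generator makes the illustration repeatable; the seed is arbitrary.
rng = random.Random(2024)

# Without replacement: a drawn slip stays out of the hat, so the two
# selected employees are guaranteed to be different.
without_replacement = rng.sample(employees, k=2)

# With replacement: each drawn slip goes back into the hat, so the same
# employee can be drawn twice.
with_replacement = rng.choices(employees, k=2)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

Because `sample` never returns the same list position twice, it matches the convention scenario, where two different employees must be chosen.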
In the following example, we illustrate how to use a random number table, or computer-generated random numbers, to select a random sample.

Example 1.1 The Cell Phone Case: Estimating Cell Phone Costs[3]
Businesses and college students have at least two things in common: both find cellular phones to be nearly indispensable because of their convenience and mobility, and both often rack up unpleasantly high cell phone bills. Students' high bills are usually the result of overage: a student uses more minutes than his or her plan allows. Businesses also lose money due to overage, and, in addition, lose money due to underage when some employees do not use all of the (already paid-for) minutes allowed by their plans. Because cellular carriers offer more than 10,000 rate plans, it is nearly impossible for a business to intelligently choose calling plans that will meet its needs at a reasonable cost.

Rising cell phone costs have forced companies having large numbers of cellular users to hire services to manage their cellular and other wireless resources. These cellular management services use sophisticated software and mathematical models to choose cost-efficient cell phone plans for their clients. One such firm, MobileSense Inc. of Westlake Village, California, specializes in automated wireless cost management. According to Doug L. Stevens, Vice President of Sales and Marketing at MobileSense, cell phone carriers count on overage and underage to deliver almost half of their revenues. As a result, a company's typical cost of cell phone use can easily exceed 25 cents per minute. However, Mr. Stevens explains that by using MobileSense automated cost management to select calling plans, this cost can be reduced to 12 cents per minute or less.

In this case we will demonstrate how a bank can use a random sample of cell phone users to study its cellular phone costs. Based on this cost information, the bank will decide whether to hire a cellular management service to choose calling plans for the bank's employees.
While the bank has over 10,000 employees on a variety of calling plans, the cellular management service suggests that by studying the calling patterns of cellular users on 500-minute plans, the bank can accurately assess whether its cell phone costs can be substantially reduced. The bank has 2,136 employees on a 500-minute-per-month plan with a monthly cost of $50. The overage charge is 40 cents per minute, and there are additional charges for long distance and roaming. The bank will estimate its cellular cost per minute for this plan by examining the number of minutes used last month by each of 100 randomly selected employees on this 500-minute plan. According to the cellular management service, if the cellular cost per minute for the random sample of 100 employees is over 18 cents per minute, the bank should benefit from automated cellular management of its calling plans.

In order to randomly select the sample of 100 cell phone users, the bank will make a numbered list of the 2,136 users on the 500-minute plan. This list is called a frame (a list of all of the units in a population, which is needed in order to select a random sample). The bank can then use a random number table, such as Table 1.1(a), to select the needed sample. To see how this is done, notice that any single-digit number in the table is assumed to have been randomly selected from the digits 0 to 9. Any two-digit number in the table is assumed to have been randomly selected from the numbers 00 to 99. Any three-digit number is assumed to have been randomly selected from the numbers 000 to 999, and so forth. Note that the table entries are segmented into groups of five to make the table easier to read. Because the total number of cell phone users on the 500-minute plan (2,136) is a four-digit number, we arbitrarily select any set of four digits in the table (we have circled these digits).
This number, which is 0511, identifies the first randomly selected user. Then, moving in any direction from the 0511 (up, down, right, or left; it does not matter which), we select additional sets of four digits. These succeeding sets of digits identify additional randomly selected users. Here we arbitrarily move down from 0511 in the table. The first seven sets of four digits we obtain are 0511, 7156, 0285, 4461, 3990, 4919, and 1915.
TABLE 1.1 Random Numbers
(See Table 1.1(a); these numbers are enclosed in a rectangle.) Since there are no users numbered 7156, 4461, 3990, or 4919 (remember only 2,136 users are on the 500-minute plan), we ignore these numbers. This implies that the first three randomly selected users are those numbered 0511, 0285, and 1915. Continuing this procedure, we can obtain the entire random sample of 100 users. Notice that, because we are sampling without replacement, we should ignore any set of four digits previously selected from the random number table.

While using a random number table is one way to select a random sample, this approach has a disadvantage that is illustrated by the current situation. Specifically, since most four-digit random numbers are not between 0001 and 2136, obtaining 100 different four-digit random numbers between 0001 and 2136 will require ignoring a large number of random numbers in the random number table, and we will in fact need to use a random number table that is larger than Table 1.1(a). Although larger random number tables are readily available in books of mathematical and statistical tables, a good alternative is to use a computer software package, which can generate random numbers that are between whatever values we specify. For example, Table 1.1(b) gives the MINITAB output of 100 different four-digit random numbers that are between 0001 and 2136 (note that the leading 0s are not included in these four-digit numbers). If used, the random numbers in Table 1.1(b) identify the 100 employees that should form the random sample.

After the random sample of 100 employees is selected, the number of cellular minutes used by each employee during the month (the employee's cellular usage) is found and recorded. The 100 cellular-usage figures are given in Table 1.2. Looking at this table, we can see that there is substantial overage and underage: many employees used far more than 500 minutes, while many others failed to use all of the 500 minutes allowed by their plan.
In Chapter 2 we will use these 100 usage figures to estimate the cellular cost per minute for the 500-minute plan.

TABLE 1.2 A Sample of Cellular Usages (in minutes) for 100 Randomly Selected Employees (data file: CellUse)
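The selection procedure of Example 1.1 can be sketched in Python as follows. This is a minimal illustration, not the MINITAB procedure used for Table 1.1(b): the function name `select_from_table` is hypothetical, the seed is arbitrary, and the seven four-digit groups are the ones named in the example, listed in an assumed reading order.

```python
import random

def select_from_table(groups, population_size, sample_size):
    """Walk through digit groups from a random number table, keeping only
    numbers that identify a unit in the frame and were not chosen before."""
    selected = []
    for group in groups:
        number = int(group)
        in_frame = 1 <= number <= population_size
        if in_frame and number not in selected:  # sampling without replacement
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# The seven four-digit groups discussed in the example (order assumed).
table_groups = ["0511", "7156", "0285", "4461", "3990", "4919", "1915"]
first_users = select_from_table(table_groups, population_size=2136, sample_size=100)
print(first_users)  # [511, 285, 1915]

# Only 2,136 of the 10,000 possible four-digit groups (0000 to 9999) identify
# a user, which is why most table entries must be ignored.
usable_fraction = 2136 / 10000
print(usable_fraction)  # 0.2136

# Computer-generated alternative: 100 different user numbers between 1 and 2136.
rng = random.Random(1)  # arbitrary seed, used only to make the run repeatable
computer_sample = rng.sample(range(1, 2137), k=100)
print(sorted(computer_sample)[:5])
```

Since only about 21 percent of four-digit groups fall in the frame, the table method discards roughly four numbers for every one it keeps, whereas the computer-generated sample wastes none.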
Approximately random samples

In general, to take a random sample we must have a list, or frame, of all the population units. This is needed because we must be able to number the population units in order to make random selections from them (by, for example, using a random number table). In Example 1.1, where we wished to study a population of 2,136 cell phone users who were on the bank's 500-minute cellular plan, we were able to produce a frame (list) of the population units. Therefore, we were able to select a random sample. Sometimes, however, it is not possible to list and thus number all the units in a population. In such a situation we often select a systematic sample, which approximates a random sample. A systematic sample is taken by moving systematically through the population. For instance, we might randomly select one of the first 200 population units and then systematically sample every 200th population unit thereafter.

Example 1.2 The Marketing Research Case: Rating a New Bottle Design[4]
The design of a package or bottle can have an important effect on a company's bottom line. For example, an article in the September 16, 2004, issue of USA Today reported that the introduction of a contoured 1.5-liter bottle for Coke drinks (including the reduced-calorie soft drink Coke C2) played a major role in Coca-Cola's failure to meet third-quarter earnings forecasts in 2004. According to the article, Coke's biggest bottler, Coca-Cola Enterprises, said it would miss expectations because of the 1.5-liter bottle and the absence of common 2-liter and 12-pack sizes for C2 in supermarkets.[5]

In this case a brand group is studying whether changes should be made in the bottle design for a popular soft drink. To research consumer reaction to a new design, the brand group will use the mall intercept method,[6] in which shoppers at a large metropolitan shopping mall are intercepted and asked to participate in a consumer survey. Each shopper will be exposed to the new bottle design and asked to rate the bottle image. Bottle image will be measured by combining consumers' responses to five items, with each response measured using a 7-point Likert scale. The five items and the scale of possible responses are shown in Figure 1.1. Here, since we describe the least favorable response and the most favorable response (and we do not describe the responses between them), we say that the scale is anchored at its ends. Responses to the five items will be summed to obtain a composite score for each respondent. It follows that the minimum composite score possible is 5 and the maximum composite score possible is 35. Furthermore, experience has shown that the smallest acceptable composite score for a successful bottle design is 25.
FIGURE 1.1 The Bottle Design Survey Instrument
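The composite score described above is simply the sum of five responses, each on a scale from 1 to 7. The short Python sketch below shows the calculation and the resulting minimum and maximum; the function name `composite_score` and the sample responses are hypothetical and are not taken from the survey data.

```python
def composite_score(responses, n_items=5, low=1, high=7):
    """Sum the Likert responses after checking that each one is on the scale."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(not (low <= r <= high) for r in responses):
        raise ValueError(f"each response must be between {low} and {high}")
    return sum(responses)

# Extremes of the scale: five least favorable and five most favorable responses.
minimum_score = composite_score([1] * 5)
maximum_score = composite_score([7] * 5)
print(minimum_score, maximum_score)  # 5 35

# A hypothetical respondent (illustrative values only).
example_score = composite_score([6, 5, 7, 4, 6])
print(example_score, example_score >= 25)  # 28 True
```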
In this situation, it is not possible to list and number each and every shopper at the mall while the study is being conducted. Consequently, we cannot use random numbers (as we did in the cell phone case) to obtain a random sample of shoppers. Instead, we can select a systematic sample. To do this, every 100th shopper passing a specified location in the mall will be invited to participate in the survey. Here, selecting every 100th shopper is arbitrary; we could select every 200th, every 300th, and so forth. By selecting every 100th shopper, it is probably reasonable to believe that the responses of the survey participants are not related. Therefore, it is reasonable to assume that the sampled shoppers obtained by the systematic sampling process make up an approximate random sample.

During a Tuesday afternoon and evening, a sample of 60 shoppers is selected by using the systematic sampling process. Each shopper is asked to rate the bottle design by responding to the five items in Figure 1.1, and a composite score is calculated for each shopper. The 60 composite scores obtained are given in Table 1.3. Since these scores range from 20 to 35, we might infer that most of the shoppers at the mall on the Tuesday afternoon and evening of the study would rate the new bottle design between 20 and 35. Furthermore, since 57 of the 60 composite scores are at least 25, we might estimate that the proportion of all shoppers at the mall on the Tuesday afternoon and evening who would give the bottle design a composite score of at least 25 is 57/60 = .95. That is, we estimate that 95 percent of the shoppers would give the bottle design a composite score of at least 25.
TABLE 1.3 A Sample of Bottle Design Ratings (Composite Scores for a Systematic Sample of 60 Shoppers) (data file: Design)
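The systematic sampling step and the proportion estimate in Example 1.2 can be sketched as follows. The function names and the stream of shopper numbers are hypothetical, and the placeholder scores are NOT the Table 1.3 data; they only reproduce the reported count of 57 scores of at least 25 out of 60.

```python
def systematic_sample(arrivals, step, sample_size, start=None):
    """Select the unit at position `start`, then every `step`-th unit after it,
    until `sample_size` units have been chosen."""
    if start is None:
        start = step  # e.g. the 100th, 200th, 300th, ... shopper
    chosen = []
    for position, unit in enumerate(arrivals, start=1):
        if position >= start and (position - start) % step == 0:
            chosen.append(unit)
            if len(chosen) == sample_size:
                break
    return chosen

def proportion_at_least(scores, threshold):
    """Fraction of scores that are greater than or equal to the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)

# Hypothetical stream of shoppers passing the specified location, in order.
shoppers = range(1, 10001)
invited = systematic_sample(shoppers, step=100, sample_size=60)
print(invited[:3], invited[-1])  # [100, 200, 300] 6000

# Placeholder scores: 57 at the threshold of 25 and 3 below it.
placeholder_scores = [25] * 57 + [20] * 3
estimate = proportion_at_least(placeholder_scores, threshold=25)
print(round(estimate, 2))  # 0.95
```

Passing a randomly chosen `start` between 1 and `step` gives the variant in which the first unit is selected at random and every `step`-th unit is taken thereafter.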
In Chapter 2 we will see how to estimate a typical composite score and we will further analyze the composite scores in Table 1.3.

In some situations, we need to decide whether a sample taken from one population can be employed to make statistical inferences about another, related population. Often logical reasoning is used to do this. For instance, we might reason that the bottle design ratings given by shoppers at the mall on the Tuesday afternoon and evening of the research study would be representative of the ratings given by (1) shoppers at the same mall at other times, (2) shoppers at other malls, and (3) consumers in general. However, if we have no data or other information to back up this reasoning, making such generalizations is dangerous. In practice, marketing research firms choose locations and sampling times that data and experience indicate will produce a representative cross-section of consumers. To simplify our presentation, we will assume that this has been done in the bottle design case. Therefore, we will suppose that it is reasonable to use the 60 bottle design ratings in Table 1.3 to make statistical inferences about all consumers.

To conclude this section, we emphasize the importance of taking a random (or approximately random) sample. Statistical theory tells us that, when we select a random (or approximately random) sample, we can use the sample to make valid statistical inferences about the sampled population. However, if the sample is not random, we cannot do this. A classic example occurred prior to the presidential election of 1936, when the Literary Digest predicted that Alf Landon would defeat Franklin D. Roosevelt by a margin of 57 percent to 43 percent. Instead, Roosevelt won the election in a landslide. Literary Digest's error was to sample names from telephone books and club membership rosters.
In 1936 the country had not yet recovered from the Great Depression, and many unemployed and low-income people did not have phones or belong to clubs. The Literary Digest's sampling procedure excluded these people, who overwhelmingly voted for Roosevelt. At this time, George Gallup, founder of the Gallup Poll, was beginning to establish his survey business. He used an approximately random sample to correctly predict Roosevelt's victory.

As another example, today's television and radio stations, as well as newspaper columnists, use voluntary response samples. In such samples, participants self-select; that is, whoever wishes to participate does so (usually expressing some opinion). These samples overrepresent people with strong (usually negative) opinions. For example, the advice columnist Ann Landers once asked her readers, "If you had it to do over again, would you have children?" Of the nearly 10,000 parents who voluntarily responded, 70 percent said that they would not. An approximately random sample taken a few months later found that 91 percent of parents would have children again. We further discuss random sampling in optional Section 1.5.

CONCEPTS

1.1 Define a population. Give an example of a population that you might study when you start your career after graduating from college.

1.2 Define what we mean by a variable, and explain the difference between a quantitative variable and a qualitative (categorical) variable.

1.3 Below we list several variables. Which of these variables are quantitative and which are qualitative? Explain.
- The dollar amount on an accounts receivable invoice.
- The net profit for a company in 2005.
- The stock exchange on which a company's stock is traded.
- The national debt of the United States in 2005.
- The advertising medium (radio, television, or print) used to promote a product.

1.4 Explain the difference between a census and a sample.

1.5 Explain each of the following terms:
- Descriptive statistics.
- Random sample.
- Statistical inference.
- Systematic sample.

1.6 Explain why sampling without replacement is preferred to sampling with replacement.

METHODS AND APPLICATIONS

1.7 The Forbes 2000 is a ranking of the world's biggest companies (measured on a composite of sales, profits, assets and market values) by the editors of Forbes magazine. Below we give the best performing U.S. companies in the food, drink and tobacco industry from the Forbes 2000 as listed on the Forbes magazine website on February 2, 2005. (data file: BestPerf)
Consider the random numbers given in the random number table of Table 1.1(a). Starting in the upper left corner of Table 1.1(a) and moving down the two leftmost columns, we see that the first three two-digit numbers obtained are [image not reproduced]. Starting with these three random numbers, and moving down the two leftmost columns of Table 1.1(a) to find more two-digit random numbers, use Table 1.1 to randomly select five of these companies to be interviewed in detail about their business strategies. Hint: Note that we have numbered the companies in the Forbes list from 1 to 22.

1.8 Table 1.4 gives the most admired company in each of 30 industries as shown in the 2005 list of Global Most Admired Companies on the Fortune magazine website on March 14, 2005. Starting in the upper right corner of the random number table of Table 1.1(a) and moving down the two rightmost columns, we see that the first three two-digit numbers obtained are [image not reproduced]. Starting with these three random numbers, and moving down the two rightmost columns of Table 1.1(a) to find more two-digit random numbers, use Table 1.1 to randomly select four of these industries for further study. (data file: MostAdm)

1.9 THE VIDEO GAME SATISFACTION RATING CASE (data file: VideoGame)
A company that produces and markets video game systems wishes to assess its customers' level of satisfaction with a relatively new model, the XYZ-Box. In the six months since the introduction of the model, the company has received 73,219 warranty registrations from purchasers. The company will randomly select 65 of these registrations and will conduct telephone interviews with the purchasers. Specifically, each purchaser will be asked to state his or her level of agreement with each of the seven statements listed on the survey instrument given in Figure 1.2. Here, the level of agreement for each statement is measured on a 7-point Likert scale. Purchaser satisfaction will be measured by adding the purchaser's responses to the seven statements. It follows that for each consumer the minimum composite score possible is 7 and the maximum is 49. Furthermore, experience has shown that a purchaser of a video game system is very satisfied if his or her composite score is at least 42.

FIGURE 1.2 The Video Game Satisfaction Survey Instrument

- Assume that the warranty registrations are numbered from 1 to 73,219 in a computer. Starting in the upper left corner of Table 1.1(a) and moving down the five leftmost columns, we see that the first three five-digit numbers obtained are [image not reproduced]. Starting with these three random numbers and moving down the five leftmost columns of Table 1.1(a) to find more five-digit random numbers, use Table 1.1 to randomly select the numbers of the first 10 warranty registrations to be included in the sample of 65 registrations.
- Suppose that when the 65 customers are interviewed, their composite scores are obtained and are as given in Table 1.5. Using the data, estimate limits between which most of the 73,219 composite scores would fall. Also, estimate the proportion of the 73,219 composite scores that would be at least 42.

TABLE 1.5 Composite Scores for the Video Game Satisfaction Rating Case (data file: VideoGame)
1.10 THE BANK CUSTOMER WAITING TIME CASE (data file: WaitTime)

A bank manager has developed a new system to reduce the time customers spend waiting to be served by tellers during peak business hours. Typical waiting times during peak business hours under the current system are roughly 9 to 10 minutes. The bank manager hopes that the new system will lower typical waiting times to less than six minutes. A 30-day trial of the new system is conducted. During the trial run, every 150th customer who arrives during peak business hours is selected until a systematic sample of 100 customers is obtained. Each of the sampled customers is observed, and the time spent waiting for teller service is recorded. The 100 waiting times obtained are given in Table 1.6. Moreover, the bank manager feels that this systematic sample is as representative as a random sample of waiting times would be. Using the data, estimate limits between which the waiting times of most of the customers arriving during peak business hours would fall. Also, estimate the proportion of waiting times of customers arriving during peak business hours that are less than six minutes.

TABLE 1.6 Waiting Times (in Minutes) for the Bank Customer Waiting Time Case (data file: WaitTime)
1.11 In an article titled "Turned Off" in the June 24, 1995, issue of USA Weekend, Dan Olmsted and Gigi Anders report on the results of a survey conducted by the magazine. Readers were invited to write in and answer several questions about sex and vulgarity on television. Olmsted and Anders summarized the survey results as follows:

"Nearly all of the 65,000 readers responding to our write-in survey say TV is too vulgar, too violent, and too racy. TV execs call it reality."

Some of the key survey results were as follows:
  96% are very or somewhat concerned about SEX on TV.
  97% are very or somewhat concerned about VULGAR LANGUAGE on TV.
  97% are very or somewhat concerned about VIOLENCE on TV.
  Note: Because participants were not chosen at random, the results of the write-in survey may not be scientific.

- Note the disclaimer at the bottom of the survey results. In a write-in survey, anyone who wishes to participate may respond to the survey questions. Therefore, the sample is not random and we say that the survey is not scientific. What kind of people would be most likely to respond to a survey about TV sex and violence? Do the survey results agree with your answer?
- If a random sample of the general population were taken, do you think that its results would be the same? Why or why not? Similarly, for instance, do you think that 97 percent of the general population is very or somewhat concerned about violence on TV?
- Another result obtained in the write-in survey is as follows: "Should V-chips be installed on TV sets so parents could easily block violent programming?" [image not reproduced] If you planned to start a business manufacturing and marketing such V-chips (at a reasonable price), would you expect 90 percent of the general population to desire a V-chip? Why or why not?
[2] Actually, there are several different kinds of random samples. The type we will define is sometimes called a simple random sample. For brevity's sake, however, we will use the term random sample.

[3] The authors would like to thank Mr. Doug L. Stevens, Vice President of Sales and Marketing at MobileSense Inc., Westlake Village, California, for his help in developing this case.

[4] This case was motivated by an example in the book Essentials of Marketing Research by W. R. Dillon, T. J. Madden, and N. H. Firtle (Burr Ridge, IL: Richard D. Irwin, 1993). The authors also wish to thank Professor L. Unger of the Department of Marketing at Miami University for helpful discussions concerning how this type of marketing study would be carried out.

[5] Source: "Coke says earnings will come up short," by Theresa Howard, USA Today, September 16, 2004, p. 801.

[6] This is a commonly used research design. For example, see the Burke Marketing Research website at http://burke.com/about/inc_background.htm, Burke Marketing Research, March 26, 2005.