How Can A Researcher Draw An Unbiased Sample?

Sampling is the statistical process of selecting a subset (called a "sample") of a population of interest for purposes of making observations and statistical inferences about that population. Social science research is generally about inferring patterns of behaviors within specific populations. We cannot study entire populations because of feasibility and cost constraints, and hence, we must select a representative sample from the population of interest for observation and analysis. It is extremely important to choose a sample that is truly representative of the population so that the inferences derived from the sample can be generalized back to the population of interest. Improper and biased sampling is the primary reason for the often divergent and erroneous inferences reported in opinion polls and exit polls conducted by different polling groups, such as CNN/Gallup Poll, ABC, and CBS, prior to every U.S. Presidential election.

The Sampling Process

Figure 8.1. The sampling process

The sampling process comprises several stages. The first stage is defining the target population. A population can be defined as all people or items (units of analysis) with the characteristics that one wishes to study. The unit of analysis may be a person, group, organization, country, object, or any other entity that you wish to draw scientific inferences about. Sometimes the population is obvious. For example, if a manufacturer wants to determine whether finished goods manufactured at a production line meet certain quality requirements or must be scrapped and reworked, then the population consists of the entire set of finished goods manufactured at that production facility. At other times, the target population may be a little harder to understand. If you wish to identify the primary drivers of academic learning among high school students, then what is your target population: high school students, their teachers, school principals, or parents? The right answer in this case is high school students, because you are interested in their performance, not the performance of their teachers, parents, or schools. Likewise, if you wish to analyze the behavior of roulette wheels to identify biased wheels, your population of interest is not different observations from a single roulette wheel, but different roulette wheels (i.e., their behavior over an infinite set of wheels).

The second step in the sampling process is to choose a sampling frame. This is an accessible section of the target population (usually a list with contact information) from where a sample can be drawn. If your target population is professional employees at work, because you cannot access all professional employees around the world, a more realistic sampling frame will be employee lists of one or two local companies that are willing to participate in your study. If your target population is organizations, then the Fortune 500 list of firms or the Standard & Poor's (S&P) list of firms registered with the New York Stock Exchange may be acceptable sampling frames.

Note that sampling frames may not be entirely representative of the population at large, and if so, inferences derived from such a sample may not be generalizable to the population. For example, if your target population is organizational employees at large (e.g., you wish to study employee self-esteem in this population) and your sampling frame is employees at automotive companies in the American Midwest, findings from such groups may not even be generalizable to the American workforce at large, let alone the global workplace. This is because the American auto industry has been under severe competitive pressures for the last 50 years and has seen numerous episodes of reorganization and downsizing, possibly resulting in low employee morale and self-esteem. Furthermore, the majority of the American workforce is employed in service industries or in small businesses, and not in the automotive industry. Hence, a sample of American auto industry employees is not particularly representative of the American workforce. Likewise, the Fortune 500 list includes the 500 largest American enterprises, which are not representative of all American firms in general, most of which are medium and small-sized firms rather than large firms, and it is therefore a biased sampling frame. In contrast, the S&P list will allow you to select large, medium, and/or small companies, depending on whether you use the S&P large-cap, mid-cap, or small-cap lists, but it includes only publicly traded firms (and not private firms) and is hence still biased. Also note that the population from which a sample is drawn may not necessarily be the same as the population about which we actually want information. For example, if a researcher wants to evaluate the success rate of a new "quit smoking" program, then the target population is the universe of smokers who had access to this program, which may be an unknown population. Hence, the researcher may sample patients arriving at a local medical facility for smoking cessation treatment, some of whom may not have had exposure to this particular "quit smoking" program, in which case the sampling frame does not correspond to the population of interest.

The last step in sampling is choosing a sample from the sampling frame using a well-defined sampling technique. Sampling techniques can be grouped into two broad categories: probability (random) sampling and non-probability sampling. Probability sampling is ideal if generalizability of results is important for your study, but there may be unique circumstances where non-probability sampling can also be justified. These techniques are discussed in the next two sections.

Probability Sampling

Probability sampling is a technique in which every unit in the population has a chance (non-zero probability) of being selected in the sample, and this chance can be accurately determined. Sample statistics thus produced, such as sample mean or standard deviation, are unbiased estimates of population parameters, as long as the sampled units are weighted according to their probability of selection. All probability sampling techniques have two attributes in common: (1) every unit in the population has a known non-zero probability of being sampled, and (2) the sampling procedure involves random selection at some point. The different types of probability sampling techniques include:

Simple random sampling. In this technique, all possible subsets of a population (more accurately, of a sampling frame) are given an equal probability of being selected. The probability of selecting any particular set of n units out of a total of N units in a sampling frame is 1/C(N, n), where C(N, n) is the number of ways of choosing n units from N. Hence, sample statistics are unbiased estimates of population parameters, without any weighting. Simple random sampling involves randomly selecting respondents from a sampling frame, but with large sampling frames, usually a table of random numbers or a computerized random number generator is used. For instance, if you wish to select 200 firms to survey from a list of 1000 firms, and this list is entered into a spreadsheet like Excel, you can use Excel's RAND() function to generate a random number for each of the 1000 firms on that list. Next, you sort the list in increasing order of the corresponding random numbers, and select the first 200 firms on that sorted list. This is the simplest of all probability sampling techniques; however, the simplicity is also the strength of this technique. Because the sampling frame is not subdivided or partitioned, the sample is unbiased and the inferences are the most generalizable among all probability sampling techniques.
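The spreadsheet procedure described above (attach a random number to each unit, sort, take the first 200) can be sketched in a few lines of Python. This is only an illustration: the firm names, the function name simple_random_sample, and the seed are hypothetical and not part of the original text.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Mimic the spreadsheet approach: give each unit a random number,
    sort by that number, and keep the first n units."""
    rng = random.Random(seed)
    keyed = [(rng.random(), unit) for unit in frame]
    keyed.sort(key=lambda pair: pair[0])
    return [unit for _, unit in keyed[:n]]

# Hypothetical sampling frame of 1000 firms
firms = [f"firm_{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(firms, 200, seed=42)
```

In practice, Python's built-in random.sample(firms, 200) draws a simple random sample without replacement directly; the sort-based version is shown only because it mirrors the spreadsheet steps.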

Systematic sampling. In this technique, the sampling frame is ordered according to some criteria and elements are selected at regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every kth element from that point onwards, where k = N/n is the ratio of the sampling frame size N to the desired sample size n, and is formally called the sampling ratio. It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first k elements on the list. In our previous example of selecting 200 firms from a list of 1000 firms, you can sort the 1000 firms in increasing (or decreasing) order of their size (i.e., employee count or annual revenues), randomly select one of the first five firms on the sorted list, and then select every fifth firm on the list. This process will ensure that there is no overrepresentation of large or small firms in your sample, but rather that firms of all sizes are generally uniformly represented, as they are in your sampling frame. In other words, the sample is representative of the population, at least on the basis of the sorting criterion.
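As a rough sketch of the random start and fixed interval described above, the following Python snippet selects every kth firm from a sorted list. The employee counts are made up for illustration, and the function assumes that N is an exact multiple of n, as in the 1000-firm, 200-sample example.

```python
import random

def systematic_sample(ordered_frame, n, seed=None):
    """Pick a random start within the first k units, then every kth unit."""
    N = len(ordered_frame)
    k = N // n                  # sampling ratio; assumes N is a multiple of n
    rng = random.Random(seed)
    start = rng.randrange(k)    # random index among the first k elements
    return ordered_frame[start::k][:n]

# Hypothetical employee counts for 1000 firms, sorted by size
size_rng = random.Random(0)
firm_sizes = sorted(size_rng.randint(1, 5000) for _ in range(1000))
sample = systematic_sample(firm_sizes, 200, seed=1)
```

Because the frame is sorted by size before selection, the chosen firms are spread across the whole size range rather than clumped at one end.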

Stratified sampling. In stratified sampling, the sampling frame is divided into homogeneous and non-overlapping subgroups (called "strata"), and a simple random sample is drawn within each subgroup. In the previous example of selecting 200 firms from a list of 1000 firms, you can start by categorizing the firms based on their size as large (more than 500 employees), medium (between 50 and 500 employees), and small (fewer than 50 employees). You can then randomly select about 67 firms from each subgroup to make up your sample of roughly 200 firms. However, since there are many more small firms in a sampling frame than large firms, having an equal number of small, medium, and large firms will make the sample less representative of the population (i.e., biased in favor of large firms that are fewer in number in the target population). This is called non-proportional stratified sampling because the proportion of the sample within each subgroup does not reflect the proportions in the sampling frame (or the population of interest), and the smaller subgroup (large-sized firms) is over-sampled. An alternative technique is to select subgroup samples in proportion to their size in the population. For example, if there are 100 large firms, 300 mid-sized firms, and 600 small firms, you can sample 20 firms from the "large" group, 60 from the "medium" group, and 120 from the "small" group. In this case, the proportional distribution of firms in the population is retained in the sample, and hence this technique is called proportional stratified sampling. Note that the non-proportional approach is particularly effective in representing small subgroups, such as large-sized firms, and is not necessarily less representative of the population compared to the proportional approach, as long as the findings of the non-proportional approach are weighted in accordance with a subgroup's proportion in the overall population.
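The proportional allocation in the example above (100, 300, and 600 firms giving 20, 60, and 120 sampled firms) can be reproduced with a short Python sketch. The firm labels and helper names are hypothetical. The last few lines show one common way to weight a non-proportional design, using the stratum size divided by the stratum sample size; the 67/67/66 split is an assumption chosen so that the total is exactly 200.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units across strata in proportion to stratum size.
    Rounding may not sum exactly to n for arbitrary inputs."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Hypothetical frame matching the example: 100 large, 300 medium, 600 small firms
strata = {
    "large": [f"L{i}" for i in range(100)],
    "medium": [f"M{i}" for i in range(300)],
    "small": [f"S{i}" for i in range(600)],
}
sizes = {name: len(units) for name, units in strata.items()}
allocation = proportional_allocation(sizes, 200)
sample = stratified_sample(strata, allocation, seed=7)

# Non-proportional (roughly equal) allocation; 67/67/66 is an assumed split.
equal_allocation = {"large": 67, "medium": 67, "small": 66}
# Design weight = firms in the stratum represented by each sampled firm
weights = {name: sizes[name] / equal_allocation[name] for name in sizes}
```

With these weights, each sampled small firm counts for roughly nine firms in the frame while each sampled large firm counts for about one and a half, which restores the population proportions when estimates are computed.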

Cluster sampling. If you have a population dispersed over a wide geographic region, it may not be feasible to conduct a simple random sampling of the entire population. In such a case, it may be reasonable to divide the population into "clusters" (usually along geographic boundaries), randomly sample a few clusters, and measure all units within those clusters. For example, if you wish to sample city governments in the state of New York, rather than travel all over the state to interview key city officials (as you may have to do with a simple random sample), you can cluster these governments based on their counties, randomly select a set of three counties, and then interview officials from every city in those counties. However, depending on between-cluster differences, the variability of sample estimates in a cluster sample will generally be higher than that of a simple random sample, and hence the results are less generalizable to the population than those obtained from simple random samples.
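A minimal sketch of the county example: randomly choose three clusters, then include every unit inside them. The county and city names below are placeholders, not real New York data.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Placeholder clusters: counties mapped to the city governments they contain
counties = {
    "County A": ["City A1", "City A2", "City A3"],
    "County B": ["City B1", "City B2"],
    "County C": ["City C1", "City C2", "City C3", "City C4"],
    "County D": ["City D1"],
    "County E": ["City E1", "City E2"],
}
chosen, units = cluster_sample(counties, 3, seed=11)
```

Note that the number of sampled units depends on which clusters happen to be drawn, which is one reason estimates from cluster samples tend to vary more than those from simple random samples.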

Matched-pairs sampling. Sometimes, researchers may want to compare two subgroups within one population based on a specific criterion. For example, why are some firms consistently more profitable than other firms? To conduct such a study, you would have to categorize a sampling frame of firms into "high profitability" firms and "low profitability" firms based on gross margins, earnings per share, or some other measure of profitability. You would then select a simple random sample of firms in one subgroup, and match each firm in this group with a firm in the second subgroup, based on its size, industry segment, and/or other matching criteria. Now, you have two matched samples of high-profitability and low-profitability firms that you can study in greater detail. Such a matched-pairs sampling technique is often an ideal way of understanding bipolar differences between different subgroups within a given population.

Multi-stage sampling. The probability sampling techniques described previously are all examples of single-stage sampling techniques. Depending on your sampling needs, you may combine these single-stage techniques to conduct multi-stage sampling. For example, you can stratify a list of businesses based on firm size, and then conduct systematic sampling within each stratum. This is a two-stage combination of stratified and systematic sampling. Likewise, you can start with a cluster of school districts in the state of New York, and within each cluster, select a simple random sample of schools; within each school, select a simple random sample of grade levels; and within each grade level, select a simple random sample of students for study. In this case, you have a four-stage sampling process consisting of cluster and simple random sampling.

Non-Probability Sampling

Non-probability sampling is a sampling technique in which some units of the population have zero chance of selection or where the probability of selection cannot be accurately determined. Typically, units are selected based on certain non-random criteria, such as quota or convenience. Because selection is non-random, non-probability sampling does not allow the estimation of sampling errors, and may be subject to sampling bias. Therefore, information from a sample cannot be generalized back to the population. Types of non-probability sampling techniques include:

Convenience sampling. Also called accidental or opportunity sampling, this is a technique in which a sample is drawn from that part of the population that is close to hand, readily available, or convenient. For example, if you stand outside a shopping center and hand out questionnaire surveys to people or interview them as they walk in, the sample of respondents you will obtain will be a convenience sample. This is a non-probability sample because you are systematically excluding all people who shop at other shopping centers. The opinions that you would get from your chosen sample may reflect the unique characteristics of this shopping center, such as the nature of its stores (e.g., high-end stores will attract a more affluent demographic), the demographic profile of its patrons, or its location (e.g., a shopping center close to a university will attract primarily university students with unique purchase habits), and therefore may not be representative of the opinions of the shopper population at large. Hence, the scientific generalizability of such observations will be very limited. Other examples of convenience sampling are sampling students registered in a certain class or sampling patients arriving at a certain medical clinic. This type of sampling is most useful for pilot testing, where the goal is instrument testing or measurement validation rather than obtaining generalizable inferences.

Quota sampling. In this technique, the population is segmented into mutually exclusive subgroups (just as in stratified sampling), and then a non-random set of observations is chosen from each subgroup to meet a predefined quota. In proportional quota sampling, the proportion of respondents in each subgroup should match that of the population. For example, if the American population consists of 70% Caucasians, 15% Hispanic-Americans, and 13% African-Americans, and you wish to understand their voting preferences in a sample of 98 people, you can stand outside a shopping center and ask people their voting preferences. But you will have to stop asking Hispanic-looking people when you have 15 responses from that subgroup (or African-Americans when you have 13 responses) even as you continue sampling other ethnic groups, so that the ethnic composition of your sample matches that of the general American population. Non-proportional quota sampling is less restrictive in that you don't have to achieve a proportional representation, but perhaps meet a minimum size in each subgroup. In this case, you may decide to have 50 respondents from each of the three ethnic subgroups (Caucasians, Hispanic-Americans, and African-Americans), and stop when your quota for each subgroup is reached. Neither type of quota sampling will be representative of the American population, since depending on whether your study was conducted in a shopping center in New York or Kansas, your results may be entirely different. The non-proportional technique is even less representative of the population but may be useful in that it allows capturing the opinions of small and underrepresented groups through oversampling.
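To make the mechanics of proportional quota sampling concrete, here is a small Python simulation of the 98-person example. The stream of passers-by is simulated and entirely hypothetical. The point to notice is that respondents are accepted in order of arrival until each quota fills, so no random selection from a sampling frame ever takes place.

```python
import random

def quota_sample(stream, quotas):
    """Accept respondents in arrival order until every subgroup quota is met."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person_id, group in stream:
        if group in counts and counts[group] < quotas[group]:
            counts[group] += 1
            accepted.append((person_id, group))
        if counts == quotas:
            break  # all quotas filled; stop interviewing
    return accepted, counts

# Quotas from the example: 70 + 15 + 13 = 98 respondents
quotas = {"Caucasian": 70, "Hispanic-American": 15, "African-American": 13}

# Simulated (hypothetical) stream of shoppers arriving at one shopping center
rng = random.Random(0)
groups = list(quotas)
passers_by = [(i, rng.choices(groups, weights=[70, 15, 13])[0]) for i in range(5000)]

accepted, counts = quota_sample(passers_by, quotas)
```

The resulting sample matches the target proportions by construction, yet it still reflects only whoever walked past this one location, which is why the text treats it as non-representative.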

Expert sampling. This is a technique where respondents are chosen in a non-random manner based on their expertise on the phenomenon being studied. For example, in order to understand the impacts of a new governmental policy such as the Sarbanes-Oxley Act, you can sample a group of corporate accountants who are familiar with this act. The advantage of this approach is that since experts tend to be more familiar with the subject matter than non-experts, opinions from a sample of experts are more credible than those from a sample that includes both experts and non-experts, although the findings are still not generalizable to the population at large.

Snowball sampling. In snowball sampling, you start by identifying a few respondents that match the criteria for inclusion in your study, and then ask them to recommend others they know who also meet your selection criteria. For example, if you wish to survey computer network administrators and you know of only one or two such people, you can start with them and ask them to recommend others who also do network administration. Although this method hardly leads to representative samples, it may sometimes be the only way to reach hard-to-reach populations or when no sampling frame is available.

Statistics of Sampling

In the preceding sections, we introduced terms such as population parameter, sample statistic, and sampling bias. In this section, we will try to understand what these terms mean and how they are related to each other.

When you measure a certain observation from a given unit, such as a person's response to a Likert-scaled item, that observation is called a response (see Figure 8.2). In other words, a response is a measurement value provided by a sampled unit. Each respondent will give you different responses to different items in an instrument. Responses from different respondents to the same item or observation can be graphed into a frequency distribution based on their frequency of occurrences. For a large number of responses in a sample, this frequency distribution tends to resemble a bell-shaped curve called a normal distribution, which can be used to estimate overall characteristics of the entire sample, such as sample mean (average of all observations in a sample) or standard deviation (variability or spread of observations in a sample). These sample estimates are called sample statistics (a "statistic" is a value that is estimated from observed data). Populations also have means and standard deviations that could be obtained if we could sample the entire population. However, since the entire population can never be sampled, population characteristics are always unknown, and are called population parameters (and not "statistics" because they are not statistically estimated from data). Sample statistics may differ from population parameters if the sample is not perfectly representative of the population; the difference between the two is called sampling error. Theoretically, if we could gradually increase the sample size so that the sample approaches closer and closer to the population, then sampling error will decrease and a sample statistic will increasingly approximate the corresponding population parameter.

If a sample is truly representative of the population, then the estimated sample statistics should be identical to the corresponding theoretical population parameters. How do we know if the sample statistics are at least reasonably close to the population parameters? Here, we need to understand the concept of a sampling distribution. Imagine that you took three different random samples from a given population, as shown in Figure 8.3, and for each sample, you derived sample statistics such as sample mean and standard deviation. If each random sample was truly representative of the population, then your three sample means from the three random samples will be identical (and equal to the population parameter), and the variability in sample means will be zero. But this is extremely unlikely, given that each random sample will likely constitute a different subset of the population, and hence, their means may be slightly different from each other. However, you can take these three sample means and plot a frequency histogram of sample means. If the number of such samples increases from three to 10 to 100, the frequency histogram becomes a sampling distribution. Hence, a sampling distribution is a frequency distribution of a sample statistic (like sample mean) from a set of samples, while the commonly referenced frequency distribution is the distribution of a response (observation) from a single sample. Just like a frequency distribution, the sampling distribution will also tend to have more sample statistics clustered around the mean (which presumably is an estimate of a population parameter), with fewer values scattered farther away from the mean. With an infinitely large number of samples, this distribution will approach a normal distribution. The variability or spread of a sample statistic in a sampling distribution (i.e., the standard deviation of a sampling statistic) is called its standard error. In contrast, the term standard deviation is reserved for variability of an observed response from a single sample.
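The idea of a sampling distribution can be explored with a short simulation: draw many random samples from one population, record each sample mean, and look at the spread of those means. The population below is synthetic (normally distributed with mean 50 and standard deviation 10), and the sample size and number of samples are arbitrary choices for illustration. The comparison value σ/√n is a standard result for the standard error of the mean that is not stated in the text above.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic population (an assumption for illustration only)
population = [rng.gauss(50, 10) for _ in range(100_000)]

sample_size = 100
n_samples = 1000

# One sample mean per random sample; together they form the sampling distribution
sample_means = [
    statistics.fmean(rng.sample(population, sample_size))
    for _ in range(n_samples)
]

# Standard error = standard deviation of the sample statistic across samples
standard_error = statistics.stdev(sample_means)

# Standard textbook approximation: population SD divided by sqrt(sample size)
theoretical_se = statistics.pstdev(population) / sample_size ** 0.5
```

Under these assumptions the two values should come out close to each other, and the mean of sample_means should sit near the population mean, which is the sense in which the center of the sampling distribution estimates the population parameter.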

Figure 8.2. Sample statistic.

The mean value of a sample statistic in a sampling distribution is presumed to be an estimate of the unknown population parameter. Based on the spread of this sampling distribution (i.e., based on standard error), it is also possible to estimate confidence intervals for that predicted population parameter. A confidence interval is a range of sample statistic values within which a population parameter is estimated to lie with a specified probability. All normal distributions tend to follow a 68-95-99 percent rule (see Figure 8.4), which says that over 68% of the cases in the distribution lie within one standard deviation of the mean value (µ ± 1σ), over 95% of the cases in the distribution lie within two standard deviations of the mean (µ ± 2σ), and over 99% of the cases in the distribution lie within three standard deviations of the mean value (µ ± 3σ). Since a sampling distribution with an infinite number of samples will approach a normal distribution, the same 68-95-99 rule applies, and it can be said that:

  • (Sample statistic ± one standard error) represents a 68% confidence interval for the population parameter.
  • (Sample statistic ± two standard errors) represents a 95% confidence interval for the population parameter.
  • (Sample statistic ± three standard errors) represents a 99% confidence interval for the population parameter.
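The three intervals listed above can be computed from a single sample as a rough sketch. The responses below are made-up Likert-scale values, and the standard error is estimated as the sample standard deviation divided by the square root of the sample size, which is a common approximation rather than something the text prescribes.

```python
import statistics

def confidence_intervals(sample):
    """Return 68%, 95%, and 99% intervals as mean +/- 1, 2, and 3 standard errors."""
    mean = statistics.fmean(sample)
    # Assumed estimator: standard error of the mean = s / sqrt(n)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return {
        level: (mean - z * se, mean + z * se)
        for level, z in ((68, 1), (95, 2), (99, 3))
    }

# Hypothetical responses to one 5-point Likert item
responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 4, 5, 3, 4, 2, 4, 3, 5, 4, 3, 4]
intervals = confidence_intervals(responses)
```

Each interval is centered on the sample mean and widens as the confidence level rises, so the 68% interval sits inside the 95% interval, which sits inside the 99% interval.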

Figure 8.3. The sampling distribution.

A sample is "biased" (i.e., not representative of the population) if its sampling distribution cannot be estimated or if the sampling distribution violates the 68-95-99 percent rule. As an aside, note that in most regression analysis where we examine the significance of regression coefficients with p<0.05, we are attempting to see if the sampling statistic (regression coefficient) predicts the corresponding population parameter (true effect size) with a 95% confidence interval. Interestingly, the "six sigma" standard attempts to identify manufacturing defects outside the 99% confidence interval, which spans six standard deviations (three on either side of the mean; standard deviation is represented using the Greek letter sigma), representing significance testing at p<0.01.

Figure 8.4. The 68-95-99 percent rule for confidence intervals.

Source: https://courses.lumenlearning.com/suny-hccc-research-methods/chapter/chapter-8-sampling/

Posted by: osbywaye1974.blogspot.com
