Stratified Random Sampling under Optimum Allocation and Comparison with Simple Random Sampling
Sampling Methods:
Sampling is the process of selecting a subset of a population in order to draw conclusions about the whole. A sampling plan specifies how much data to collect and how often it should be collected.
There are many sampling techniques a researcher can use for drawing samples. The two main categories are 'Probability Sampling' and 'Non-probability Sampling'.
This article covers 'Stratified Random Sampling', a probability sampling method, and its efficiency compared with Simple Random Sampling.
Random samples are useful only when they represent the population well. When the population is far from uniform, the method by which a sample is obtained becomes crucial, and Simple Random Sampling is not always the most efficient way to draw samples.
In the real world, the population has many divisions based on various characteristics. These divisions seldom have equal proportions of data points. Here, drawing a sample by Simple Random Sampling may not give a precise representation of the population. In this situation, we use Stratified Random Sampling.
Stratified Random Sampling:
Stratified sampling is a type of sampling method in which the total population is divided into smaller groups or strata to complete the sampling process. The strata are formed based on some common characteristics in the population data. After dividing the population into strata, random samples are then selected from each stratum.
Here the subpopulations, that is, the strata, are internally homogeneous, which gives a precise estimate of each stratum mean. To get a precise overall estimate, the strata should be as homogeneous as possible internally and as heterogeneous as possible between one another.
The strata or sub-groups should be distinct, and no data point should belong to more than one stratum. Within each stratum, the researcher should use simple random sampling. The population can be divided on characteristics such as age, gender, nationality, job profile, educational level, etc. Stratified sampling is also used when the researcher wants to understand the relationship between two or more groups. Two questions then have to be answered: how to construct the strata, and how to allocate the sample across the strata?
The two main methods for allocating the sample size among the strata are:
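The procedure described above (split the population into strata, then draw a random sample from each stratum) can be sketched in a few lines of Python. This is only an illustration: the article's own analysis was done in R, and the function name `stratified_sample`, the toy population and the per-stratum sample sizes below are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, sizes, seed=0):
    """Draw a simple random sample without replacement from each stratum.

    records    : list of population units
    stratum_of : function mapping a unit to its stratum label
    sizes      : dict {stratum label: number of units to draw}
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)      # each unit lands in exactly one stratum
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical toy population of (category, price) pairs, not the article's data.
population = ([("fiction", 8 + i % 5) for i in range(12)]
              + [("non-fiction", 12 + i % 7) for i in range(18)])

sample = stratified_sample(population, lambda rec: rec[0],
                           {"fiction": 4, "non-fiction": 6})
```

Because every unit is assigned to a single stratum before sampling, the strata cannot overlap in this sketch.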
- Proportional Allocation
- Optimum Allocation
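As a rough sketch of how the two allocation rules differ: proportional allocation sets the stratum sample size in proportion to the stratum size N_h, while optimum allocation with equal sampling cost per unit (often called Neyman allocation) sets it in proportion to N_h times the stratum standard deviation S_h. The stratum sizes below are the ones used later in the article; the total sample size `n = 100` and the standard deviations are hypothetical values chosen only for illustration.

```python
def proportional_allocation(n, sizes):
    """n_h = n * N_h / N"""
    total = sum(sizes.values())
    return {h: n * N_h / total for h, N_h in sizes.items()}

def neyman_allocation(n, sizes, sds):
    """n_h = n * N_h * S_h / sum(N_k * S_k), assuming equal cost per unit."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: n * weights[h] / total for h in sizes}

sizes = {"fiction": 240, "non-fiction": 310}   # stratum sizes from the article
sds = {"fiction": 6.0, "non-fiction": 12.0}    # hypothetical standard deviations
n = 100                                        # hypothetical total sample size

prop = proportional_allocation(n, sizes)
opt = neyman_allocation(n, sizes, sds)
```

The functions return unrounded values; in practice each n_h is rounded to a whole number. With these made-up inputs, the more variable stratum receives a larger share of the sample under optimum allocation than under proportional allocation.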
Consider the following example:
This data is about the Amazon Top 50 Bestselling Books from 2009 to 2019. The dataset contains 550 books, each categorized as 'fiction' or 'non-fiction' using Goodreads. This characteristic can be the basis for stratification.
Suppose we have to estimate the mean price (in dollars) of the top 50 bestselling books using stratified random sampling with 'Optimum Allocation'. 'R' software can be used for the analysis. Of the 550 top-selling books, 240 are fiction and 310 are non-fiction.
Therefore there will be 2 strata (or sub-populations): 'Fiction' of size 240 and 'Non-fiction' of size 310.
The difference in price between the two categories of books is visible in the figure and is confirmed by an appropriate inferential test (p-value = 0.032).
Obtaining stratum size, stratum mean and stratum standard deviation:
The mean prices (in dollars) of fiction and non-fiction books are significantly different.
Mean price of fiction books = 10.85 dollars.
Mean price of non-fiction books = 14.84194 dollars.
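If the 550 books are treated as the whole population, the overall mean price follows from the two stratum means as a weighted average with weights N_h / N. The short check below uses only the stratum sizes and means quoted above; the variable names are illustrative.

```python
sizes = {"fiction": 240, "non-fiction": 310}        # stratum sizes N_h
means = {"fiction": 10.85, "non-fiction": 14.84194}  # stratum mean prices in dollars

N = sum(sizes.values())
overall_mean = sum(sizes[h] / N * means[h] for h in sizes)

print(f"Overall mean price: {overall_mean:.2f} dollars")  # about 13.10
```

This value is the quantity that both sampling schemes are trying to estimate.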
Stratified random sampling using optimum allocation:
The estimates of the mean price (in dollars) are 10.02778 for fiction and 15.375 for non-fiction.
The estimate of the mean price of books = 13.450 dollars. The SE of this estimate = 1.0149741. The 95% confidence interval for this estimate = (11.460687, 15.43931).
The variance of the estimated mean is an important measure of the efficiency, or precision, of the estimate.
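A minimal sketch of how such an estimate and its interval are usually computed is given below. It assumes the standard textbook estimator (stratum sample means weighted by N_h / N, with a finite population correction in each stratum) and a normal-approximation interval. The function names are hypothetical, and this is not the article's R code.

```python
import math
from statistics import NormalDist, mean, variance

def stratified_mean(samples, sizes):
    """Return (estimate, standard error) of the population mean.

    samples : dict {stratum: list of sampled values}, drawn without replacement
    sizes   : dict {stratum: stratum population size N_h}
    """
    N = sum(sizes.values())
    est, var = 0.0, 0.0
    for h, y in samples.items():
        W_h, n_h = sizes[h] / N, len(y)
        est += W_h * mean(y)
        # finite population correction times sample variance over n_h
        var += W_h ** 2 * (1 - n_h / sizes[h]) * variance(y) / n_h
    return est, math.sqrt(var)

def normal_ci(est, se, level=0.95):
    """Normal-approximation confidence interval for an estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est - z * se, est + z * se

# Applying the interval formula to the estimate and SE quoted above
low, high = normal_ci(13.450, 1.0149741)
print(f"({low:.6f}, {high:.5f})")
```

With the quoted estimate and standard error, the normal-approximation interval agrees with the interval reported in the article to the printed precision.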
The variance of mean price (in dollars) under Stratified Random Sampling with Optimum Allocation = 0.7420331.
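For reference, the textbook variance of the stratified mean under optimum (Neyman) allocation with equal cost per unit is (sum of W_h S_h)^2 / n minus (sum of W_h S_h^2) / N, where W_h = N_h / N. The sketch below evaluates it and checks it against the general stratified variance formula. The standard deviations and total sample size are hypothetical, so the result is not expected to match the article's figure.

```python
def variance_neyman(n, sizes, sds):
    """Variance of the stratified mean under Neyman allocation."""
    N = sum(sizes.values())
    W = {h: sizes[h] / N for h in sizes}
    first = sum(W[h] * sds[h] for h in sizes) ** 2 / n
    second = sum(W[h] * sds[h] ** 2 for h in sizes) / N
    return first - second

def variance_general(alloc, sizes, sds):
    """General formula: sum of W_h^2 * S_h^2 * (1/n_h - 1/N_h)."""
    N = sum(sizes.values())
    return sum((sizes[h] / N) ** 2 * sds[h] ** 2 * (1 / alloc[h] - 1 / sizes[h])
               for h in sizes)

sizes = {"fiction": 240, "non-fiction": 310}   # stratum sizes from the article
sds = {"fiction": 6.0, "non-fiction": 12.0}    # hypothetical standard deviations
n = 100                                        # hypothetical total sample size

# Neyman allocation (unrounded): n_h proportional to N_h * S_h
total = sum(sizes[h] * sds[h] for h in sizes)
alloc = {h: n * sizes[h] * sds[h] / total for h in sizes}

v_opt = variance_neyman(n, sizes, sds)
v_check = variance_general(alloc, sizes, sds)
```

The two values coincide because the closed-form expression is just the general formula with the Neyman sample sizes substituted in.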
Generating a simple random sample without replacement:
To compare the stratified estimate with the one from simple random sampling, we draw a random sample without stratification and compare the variance of its mean with the variance obtained through stratified sampling.
Thus, the variance of the mean price (in dollars) under Simple Random Sampling (SRSWOR) = 1.317934.
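A minimal sketch of the unstratified comparison is given below, assuming the usual SRSWOR variance estimator (1 - n/N) * s^2 / n. The function name and the toy price list are hypothetical. The article's figure came from the real dataset in R.

```python
import random
from statistics import mean, variance

def srswor_mean(population, n, seed=0):
    """Draw a simple random sample without replacement.

    Returns (estimated mean, estimated variance of that mean).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)   # sampling without replacement
    N = len(population)
    est = mean(sample)
    var = (1 - n / N) * variance(sample) / n   # includes finite population correction
    return est, var

# Hypothetical toy prices, not the article's data
prices = [8, 9, 11, 12, 14, 15, 17, 20, 22, 25]
est, var = srswor_mean(prices, n=4)
```

The finite population correction (1 - n/N) shrinks the variance to zero as the sample approaches a full census.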
Gain in precision:
$$\text{Gain} = \frac{\text{Variance (SRSWOR)} - \text{Variance (Stratified Optimum Allocation)}}{\text{Variance (Stratified Optimum Allocation)}} \times 100 = 77.6112248\ \%$$
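The gain can be recomputed from the two variances reported above: the difference between the SRSWOR variance and the stratified variance, divided by the stratified variance, times 100. The function name below is illustrative.

```python
def gain_in_precision(var_srs, var_strat):
    """Percentage gain in precision of stratified sampling over SRSWOR."""
    return (var_srs - var_strat) / var_strat * 100

gain = gain_in_precision(1.317934, 0.7420331)
print(f"{gain:.2f} %")  # about 77.61 %
```

The small difference from the article's 77.6112248 in the later decimals comes from using the rounded variances.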
Conclusion:
The variance of the estimate under Simple Random Sampling is greater than that under Stratified Random Sampling. Thus, whenever the population is heterogeneous with respect to some characteristic, a sample obtained by Stratified Random Sampling gives a more precise estimate.
Overlapping can be an issue if there are subjects that fall into multiple subgroups. When simple random sampling is performed within each subgroup, subjects who belong to several subgroups are more likely to be chosen. The result could be a misrepresentation or inaccurate reflection of the population.
The above example, of course, has clearly defined groups, so stratification is easy to execute. In other situations, however, it might be far more difficult.
The main advantage of stratified random sampling is that it captures key population characteristics in the sample. Similar to a weighted average, this method of sampling produces characteristics in the sample that are proportional to those of the overall population. Stratified random sampling works well for populations with a variety of attributes, but it is ineffective when subgroups cannot be formed.
Stratification gives a smaller error in estimation and greater precision than the simple random sampling method. The greater the differences between the strata, the greater the gain in precision.




