STRATIFIED SAMPLING TECHNIQUE (WOR)-NEYMAN (OPTIMUM) AND PROPORTIONAL ALLOCATION AND GAIN IN EFFICIENCY DUE TO NEYMAN ALLOCATION OVER PROPORTIONAL ALLOCATION

NAME: SUROTAMA CHAKRABORTY
ROLL NO.: 2148148
EMAIL ID:surotama.chakraborty@stat.christuniversity.in
DATE: 22.11.2021
Abstract:
In practice, we often find that the data are not homogeneous. In such cases we use stratified sampling. There are several schemes for allocating the total sample size among the strata. Here we discuss Neyman allocation and proportional allocation and examine which of the two gives the more efficient estimator.

Introduction:
Suppose we have a population of size N that is suspected to be heterogeneous with respect to the study variable Y. In this case simple random sampling (SRS) may not provide a representative sample. A better approach is to divide the entire population into L non-overlapping groups, or strata, of sizes $N_1, N_2, \ldots, N_L$,
where $N_h$ is the size of the $h$-th stratum, $h = 1, 2, \ldots, L$, and $\sum_{h=1}^{L} N_h = N$.
The strata are formed so that the units within a stratum are more or less homogeneous, while the heterogeneity lies between the strata. In such a situation, instead of drawing units from the entire population, we divide the population into strata and draw a simple random sample from each stratum. This scheme is known as stratified random sampling.
Methodology:

Neyman Allocation (Optimal Allocation):
For stratified sampling, we should carefully consider how the strata are formed, the sampling procedure within each stratum, and the allocation of sample sizes to the strata. Sample allocation is the rule that decides how many units of the total sample are drawn from each stratum.
Neyman allocation is one of the most important allocation rules. It is a special case of optimum allocation, used when the per-unit sampling cost is approximately the same in every stratum and the total sample size n is fixed; it is also called minimum variance allocation. The allocation takes into account both the stratum size and the stratum variability. The sample sizes are allocated by
$$n_h = n \, \frac{N_h S_h}{\sum_{k=1}^{L} N_k S_k}, \qquad h = 1, 2, \ldots, L,$$
where $N_h$ is the size of the $h$-th stratum and $S_h$ is the standard deviation of Y within it.
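As a rough illustration of the Neyman rule, the sketch below allocates a fixed total sample size in proportion to $N_h S_h$. It is written in Python purely for illustration (the post's own analysis uses R). The function name `neyman_allocation`, the largest-remainder rounding step, and the input values are assumptions made for this example, not something taken from the post. The sketch does not check that an allocated size stays within its stratum size, which would matter for sampling without replacement from a small, highly variable stratum.

```python
from math import floor


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Split a total sample size n across strata in proportion to N_h * S_h."""
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    exact = [n * w / total for w in weights]  # real-valued allocation

    # Assumption: round with the largest-remainder method so that the
    # integer sample sizes still add up to n.
    sizes = [floor(x) for x in exact]
    leftover = n - sum(sizes)
    by_remainder = sorted(
        range(len(exact)), key=lambda h: exact[h] - sizes[h], reverse=True
    )
    for h in by_remainder[:leftover]:
        sizes[h] += 1
    return sizes


# Hypothetical strata: equal sizes, standard deviations in the ratio 1:2:3.
print(neyman_allocation([50, 50, 50], [1.0, 2.0, 3.0], 54))  # [9, 18, 27]
```

With equal stratum sizes the allocation simply follows the standard deviations, so the most variable stratum receives the largest share of the sample.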
Proportional Allocation:
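Sometimes we have no information about the stratum variances or the per-unit cost of the survey. A rational way of allocating the total sample size n is then to allocate it in proportion to the stratum sizes, such that
$$n_h = n \, \frac{N_h}{N}, \qquad h = 1, 2, \ldots, L,$$
where $N_h$ is the size of the $h$-th stratum and $N = \sum_{h=1}^{L} N_h$.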
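A minimal sketch of proportional allocation follows, in Python for illustration only (the post's own analysis uses R). The function name `proportional_allocation` and the largest-remainder rounding are assumptions introduced here. The stratum sizes (50 flowers per species) and total sample size (54) are the ones used later in the post.

```python
from math import floor


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to N_h."""
    N = sum(stratum_sizes)
    exact = [n * N_h / N for N_h in stratum_sizes]  # real-valued allocation

    # Assumption: largest-remainder rounding keeps the sizes summing to n.
    sizes = [floor(x) for x in exact]
    leftover = n - sum(sizes)
    by_remainder = sorted(
        range(len(exact)), key=lambda h: exact[h] - sizes[h], reverse=True
    )
    for h in by_remainder[:leftover]:
        sizes[h] += 1
    return sizes


# Three strata of 50 units each and a total sample of 54.
print(proportional_allocation([50, 50, 50], 54))  # [18, 18, 18]
```

Because only the stratum sizes enter the rule, no prior knowledge of stratum variability is needed.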

Gain in efficiency due to Neyman (optimum) allocation over proportional allocation:
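The formulas under this heading did not survive in the post. As a sketch under stated assumptions, the standard textbook expressions for sampling without replacement are given below, writing $W_h = N_h / N$ for the stratum weight, $S_h^2$ for the stratum variance and $\bar{S} = \sum_h W_h S_h$. The last line is one common definition of the gain in efficiency, and it may not be the exact expression the author evaluated.

```latex
\begin{align*}
V_{\text{prop}}(\bar{y}_{st}) &= \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{h=1}^{L} W_h S_h^2, \\
V_{\text{opt}}(\bar{y}_{st})  &= \frac{1}{n} \left(\sum_{h=1}^{L} W_h S_h\right)^{2} - \frac{1}{N} \sum_{h=1}^{L} W_h S_h^2, \\
V_{\text{prop}} - V_{\text{opt}} &= \frac{1}{n} \sum_{h=1}^{L} W_h \left(S_h - \bar{S}\right)^{2} \;\ge\; 0, \\
\text{Gain in efficiency} &= \frac{V_{\text{prop}} - V_{\text{opt}}}{V_{\text{opt}}}.
\end{align*}
```

The third line shows that the gain depends on how much the stratum standard deviations differ from one another: if all $S_h$ are equal, the two allocations coincide and the gain is zero.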


Advantages of proportional allocation over Neyman allocation (optimum allocation):

In practice, we often have no information about the stratum variances or the per-unit cost of the survey; in such a situation proportional allocation is the natural way to allocate the sample size.

Under proportional allocation, the sample size drawn from each stratum is representative of the size of that stratum.

Disadvantages of proportional allocation compared with Neyman allocation (optimum allocation):

If the variance differs from stratum to stratum, which is usually the case, Neyman allocation (optimum allocation) is more efficient than proportional allocation.

DATA ANALYSIS:

Now we take a real-life dataset and, using R, allocate the sample size under Neyman (optimum) allocation and proportional allocation.

DATA DESCRIPTION:

This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

Here the population of sepal length measurements (in centimeters) is heterogeneous with respect to the three species of iris, while the sepal lengths of flowers of the same species are comparatively homogeneous. Hence, if we divide the entire population into non-overlapping classes, i.e. strata, with respect to the stratification variable, i.e. species, we can use stratified sampling to allocate the sample size and estimate the population mean.

OBJECTIVE:

The main objective of this study is to draw a random sample of adequate size using proportional and optimum allocation, to estimate the population mean and population total of the sepal length of the flowers, and to find the gain in efficiency due to optimum allocation over proportional allocation.
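The estimation step named in the objective could be sketched as follows. This is Python for illustration only; the post's own R code and output are not shown. The function name `stratified_estimate`, the normal multiplier 1.96, and the synthetic population are all assumptions made for this example. The data below are simulated, not the iris measurements, so the printed numbers are not the post's results.

```python
import random
from math import sqrt
from statistics import mean, variance


def stratified_estimate(samples, stratum_sizes, z=1.96):
    """Stratified estimates from SRSWOR samples (one list of values per stratum)."""
    N = sum(stratum_sizes)
    est_mean = 0.0
    est_var = 0.0
    for y, N_h in zip(samples, stratum_sizes):
        n_h = len(y)
        W_h = N_h / N  # stratum weight
        est_mean += W_h * mean(y)
        # (1 - n_h / N_h) is the finite population correction for WOR sampling.
        est_var += W_h ** 2 * (1 - n_h / N_h) * variance(y) / n_h
    se = sqrt(est_var)
    return {
        "mean": est_mean,
        "se_mean": se,
        "ci_mean": (est_mean - z * se, est_mean + z * se),
        "total": N * est_mean,
        "se_total": N * se,
    }


# Synthetic illustration: three strata of 50 units with different means.
random.seed(1)
population = [[random.gauss(mu, 0.5) for _ in range(50)] for mu in (5.0, 6.0, 6.5)]
samples = [random.sample(stratum, 18) for stratum in population]
result = stratified_estimate(samples, [50, 50, 50])
print(result)
```

The estimated total and its standard error are simply N times the corresponding quantities for the mean, which is why the two sets of results in a study like this always move together.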
















CONCLUSION:

1. In this study, the appropriate stratification variable is the species of iris. We have taken a random sample of size 54. Under proportional allocation, the sample sizes for the three strata are 18, 18, 18. Under optimum allocation, the sample sizes for the three strata are 14, 20, 20.

2. The estimated population mean of sepal length under proportional allocation is 5.920370, with an estimated standard error of 0.06050068. That is, the estimate would typically vary by about 0.0605 from one sample to another. The 95% confidence interval for the mean sepal length under proportional allocation is [5.801791, 6.038950]. That is, if the sampling were repeated many times, about 95% of the intervals constructed in this way would contain the true population mean.

The estimated population total of sepal length under proportional allocation is 888.0556, with an estimated standard error of 9.075102. That is, the estimated total would typically vary by about 9.08 from one sample to another. The 95% confidence interval for the total sepal length under proportional allocation is [870.2687, 905.8424]. That is, about 95% of the intervals constructed in this way over repeated samples would contain the true population total.

3. The estimated population mean of sepal length under optimum allocation is 5.938889, with an estimated standard error of 0.05919022. That is, the estimate would typically vary by about 0.0592 from one sample to another. The 95% confidence interval for the mean sepal length under optimum allocation is [5.822878, 6.054900]. That is, about 95% of the intervals constructed in this way over repeated samples would contain the true population mean.

The estimated population total of sepal length under optimum allocation is 890.8333, with an estimated standard error of 8.878533. That is, the estimated total would typically vary by about 8.88 from one sample to another. The 95% confidence interval for the total sepal length under optimum allocation is [873.4317, 908.2349]. That is, about 95% of the intervals constructed in this way over repeated samples would contain the true population total.

4. The estimated gain in efficiency due to optimum allocation over proportional allocation (without replacement) is 0.1828147, i.e. optimum (Neyman) allocation is approximately 18% more efficient than proportional allocation.

5. Note also that the standard error under optimum allocation is smaller than under proportional allocation for both the population mean and the population total, so optimum allocation is the more efficient of the two. However, the gain in efficiency due to Neyman (optimum) allocation over proportional allocation is only about 18%. Hence the improvement in the estimator from optimum allocation is not substantial, and for this particular data set optimum allocation does not yield markedly better results than proportional allocation.
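The reported intervals and totals can be checked from the reported means and standard errors alone. The sketch below (Python, for illustration; the helper `normal_ci` and the multiplier 1.959964 are assumptions, since the post does not state how its intervals were computed) rebuilds each interval as estimate ± z × SE and each total as N × mean with N = 150. It also prints the ratio of the two estimated variances for comparison. The two estimates come from different random samples, so this ratio need not agree with the gain of 0.1828147 reported above, which was presumably computed from a different expression.

```python
N = 150  # 50 flowers from each of 3 species


def normal_ci(estimate, se, z=1.959964):
    """Normal-approximation confidence interval: estimate +/- z * se."""
    return (estimate - z * se, estimate + z * se)


# Values copied from the conclusions above.
reported = {
    "proportional": {"mean": 5.920370, "se": 0.06050068, "total": 888.0556},
    "optimum": {"mean": 5.938889, "se": 0.05919022, "total": 890.8333},
}

for name, r in reported.items():
    low, high = normal_ci(r["mean"], r["se"])
    print(name, "CI for mean:", round(low, 6), round(high, 6))
    print(name, "total:", round(N * r["mean"], 4), "SE of total:", round(N * r["se"], 6))

# Ratio of estimated variances minus one, from the two reported standard errors.
se_prop = reported["proportional"]["se"]
se_opt = reported["optimum"]["se"]
variance_ratio_gain = (se_prop / se_opt) ** 2 - 1
print("Relative reduction in estimated variance:", round(variance_ratio_gain, 4))
```

Under these assumptions the rebuilt intervals and totals agree with the reported ones up to rounding.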


In political surveys, stratified random sampling works better than systematic or simple random sampling. Here the researcher would specifically seek to include participants from various minority groups, such as those defined by race or religion, in proportion to their share of the total population, as described above.

