STRATIFIED SAMPLING TECHNIQUE (WOR)-NEYMAN (OPTIMUM) AND PROPORTIONAL ALLOCATION AND GAIN IN EFFICIENCY DUE TO NEYMAN ALLOCATION OVER PROPORTIONAL ALLOCATION
ROLL NO.: 2148148
EMAIL ID:surotama.chakraborty@stat.christuniversity.in
DATE: 22.11.2021
Abstract:
In practice, we often find that the data are not homogeneous. In such cases we use stratified sampling. There are several schemes for allocating the total sample size among the strata in stratified sampling. Here we discuss Neyman allocation and proportional allocation and examine which method yields the more efficient estimator.
Introduction:
Suppose we have a population of size $N$ that is suspected to be heterogeneous with respect to the study variable $Y$. In this case, simple random sampling (SRS) from the whole population does not provide a good sample. Suppose instead that we can divide the entire population into $L$ non-overlapping groups, or strata, of sizes $N_1, N_2, \ldots, N_L$, so that
$N = \sum_{h=1}^{L} N_h$,
where $N_h$ is the size of the $h$-th stratum, $h = 1, 2, \ldots, L$. The strata are formed so that the units within a stratum are more or less homogeneous, while the heterogeneity lies between the strata. In such a situation, instead of drawing units from the entire population, we divide the population into strata and then draw a simple random sample independently from each stratum. This scheme is known as stratified random sampling.
Methodology:
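The stratified scheme described above can be sketched in Python. This sketch assumes that the population is held in memory as a list and that a function returns each unit's stratum label. The analysis in this post was done in R. `stratified_srswor` and its arguments are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict

def stratified_srswor(units, stratum_of, n_h, seed=None):
    """Hypothetical helper (not from the original R analysis).

    Draws a stratified simple random sample without replacement:
    units      : list of population units
    stratum_of : function mapping a unit to its stratum label
    n_h        : dict {stratum label: sample size in that stratum}
    """
    rng = random.Random(seed)
    # Partition the population into non-overlapping strata.
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    # Draw an independent SRSWOR of the requested size from each stratum.
    return {h: rng.sample(strata[h], n_h[h]) for h in n_h}

# Toy population of (stratum, unit id) pairs; labels and sizes are made up.
pop = [("A", i) for i in range(10)] + [("B", i) for i in range(20)]
print(stratified_srswor(pop, lambda u: u[0], {"A": 3, "B": 5}, seed=1))
```

Because each stratum is sampled separately with `random.sample`, no unit is drawn twice, and every stratum receives exactly its allocated size.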
Neyman Allocation (Optimal Allocation):
For stratified sampling, we should carefully consider three problems: forming the strata, choosing the sampling procedure within each stratum, and allocating the total sample size among the strata. Sample allocation is the rule that decides how many units are drawn from each stratum.
Neyman allocation is one of the most important sample allocations. It is a special case of optimum allocation that is used when the per-unit cost of sampling is approximately the same in all strata. It is also called minimum variance allocation. The allocation of the sample among the strata is based on both the stratum size and the stratum variability. In this allocation, the sampling cost per unit is assumed to be the same across strata, and the total sample size $n$ is fixed. The sample sizes are allocated by
$n_h = n \dfrac{N_h S_h}{\sum_{k=1}^{L} N_k S_k}, \quad h = 1, 2, \ldots, L,$
where $S_h$ is the standard deviation of the study variable in the $h$-th stratum. For a fixed $n$, this choice minimizes the variance of the stratified sample mean.
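The Neyman formula can be sketched in Python. The formula usually gives non-integer $n_h$, and the post does not say how it rounded them. This sketch therefore assumes a largest-remainder rule that keeps the total equal to $n$. `neyman_allocation` is a hypothetical name, and the stratum figures in the example are made up.

```python
def neyman_allocation(n, N_h, S_h):
    """Neyman allocation: n_h proportional to N_h * S_h, summing to n.

    Rounding rule is an assumption (largest remainder), not from the source.
    """
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    raw = [n * w / total for w in weights]
    # Round down, then give the leftover units to the strata with the
    # largest fractional parts so that the n_h still add up to n.
    alloc = [int(r) for r in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc

# Hypothetical strata: equal sizes, different standard deviations.
print(neyman_allocation(30, [50, 50, 50], [1.0, 2.0, 3.0]))  # [5, 10, 15]
```

The more variable strata receive more units. The sketch does not check that $n_h \le N_h$. If some stratum would be over-allocated, that stratum has to be fully enumerated and the remaining units reallocated among the other strata.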
Proportional Allocation:
Sometimes we have no information about the stratum variances or the per-unit cost of the survey. A rational way of allocating the total sample size among the strata is then to allocate it in proportion to the stratum sizes, such that
$n_h = n \dfrac{N_h}{N}, \quad h = 1, 2, \ldots, L.$
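Proportional allocation needs only the stratum sizes. The following is a minimal Python sketch. The rounding rule (floor, then largest remainders) and the name `proportional_allocation` are assumptions made for illustration. With the iris strata of 50 flowers each and $n = 54$, it reproduces the 18, 18, 18 split reported in the conclusion.

```python
def proportional_allocation(n, N_h):
    """Proportional allocation: n_h = n * N_h / N, summing to n."""
    N = sum(N_h)
    raw = [n * Nh / N for Nh in N_h]
    # Round down, then give leftover units to the largest fractional parts
    # (an assumed rounding rule; exact when every n * N_h / N is an integer).
    alloc = [int(r) for r in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc

# Iris: three species with 50 flowers each, total sample size 54.
print(proportional_allocation(54, [50, 50, 50]))  # [18, 18, 18]
```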
Gain in efficiency due to Neyman (optimum) allocation over proportional allocation:
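The formulas for this comparison are not preserved in this copy. As a hedged reconstruction, the standard textbook variances of the stratified mean $\bar{y}_{st}$ under SRSWOR within strata are given below. Here $W_h = N_h/N$, $S_h^2$ is the variance of the $h$-th stratum, and the $n_h$ are taken unrounded. The "gain" figure reported later may have been computed from sample estimates or defined slightly differently.

```latex
\begin{aligned}
V_{prop}(\bar{y}_{st}) &= \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{h=1}^{L} W_h S_h^2,\\
V_{N}(\bar{y}_{st}) &= \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2,\\
V_{prop} - V_{N} &= \frac{1}{n}\left[\sum_{h=1}^{L} W_h S_h^2 - \left(\sum_{h=1}^{L} W_h S_h\right)^2\right]
 = \frac{1}{n}\sum_{h=1}^{L} W_h\left(S_h - \bar{S}\right)^2 \;\ge\; 0,
\qquad \bar{S} = \sum_{h=1}^{L} W_h S_h .
\end{aligned}
```

The difference is zero only when all $S_h$ are equal. One way to express the gain is the relative reduction $(V_{prop} - V_{N})/V_{N}$.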
Advantages of proportional allocation over Neyman allocation (optimum allocation):
In practice, we often do not have information about the stratum variances and the per-unit cost of the survey. In such a situation, proportional allocation is the natural way to allocate the sample size, since it needs only the stratum sizes. The sample size in each stratum is then proportional to the size of that stratum, so the sample reflects the stratum sizes.
Disadvantages of proportional allocation over Neyman allocation (optimum allocation):
The stratum variances usually differ from stratum to stratum. When they do, Neyman (optimum) allocation works better than proportional allocation: it gives an estimator with a smaller variance, i.e., it is more efficient.
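The claim can be checked numerically with a Python sketch of the two standard variance formulas for the stratified mean. The sketch uses $W_h = N_h/N$, ignores the rounding of $n_h$, and assumes $n_h \le N_h$. The function names and the stratum figures are hypothetical.

```python
def var_proportional(n, N_h, S_h):
    """V(ybar_st) under proportional allocation, SRSWOR within strata."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    return (1 / n - 1 / N) * sum(w * s ** 2 for w, s in zip(W, S_h))

def var_neyman(n, N_h, S_h):
    """V(ybar_st) under Neyman allocation, SRSWOR within strata (unrounded n_h)."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    s_bar = sum(w * s for w, s in zip(W, S_h))  # weighted mean of the S_h
    return s_bar ** 2 / n - sum(w * s ** 2 for w, s in zip(W, S_h)) / N

# Hypothetical strata: equal sizes, unequal standard deviations.
vp = var_proportional(30, [50, 50, 50], [1.0, 2.0, 3.0])
vn = var_neyman(30, [50, 50, 50], [1.0, 2.0, 3.0])
print(vp > vn, round((vp - vn) / vn, 3))  # True 0.217
```

With equal $S_h$ the two variances coincide. The more the $S_h$ differ, the larger the advantage of Neyman allocation.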
DATA ANALYSIS:
Now we take a real-life dataset and, using R software, allocate the sample size by Neyman (optimum) allocation and by proportional allocation.
DATA DESCRIPTION:
This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
Here the population consists of the N = 150 flowers. The sepal length measurements are heterogeneous across the three species of iris, but they are more or less homogeneous among flowers of the same species. Hence, we divide the entire population into non-overlapping strata according to the characteristic variable species. Using stratified sampling, we can then determine the sample sizes and estimate the population mean.
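Once a stratified sample is drawn, the usual estimators are $\bar{y}_{st} = \sum_h W_h \bar{y}_h$ for the mean and $N\bar{y}_{st}$ for the total. The estimated variance is $\sum_h W_h^2 (1 - n_h/N_h) s_h^2 / n_h$. The following Python sketch implements these textbook formulas. It is not the author's R code, and the function name and toy data are hypothetical.

```python
from statistics import mean, variance

def stratified_estimates(samples, N_h):
    """Hypothetical helper: stratified estimates of the mean and total (SRSWOR per stratum).

    samples : list of lists of observed y-values, one list per stratum
    N_h     : list of stratum sizes in the same order
    """
    N = sum(N_h)
    y_st = 0.0  # stratified mean: sum of W_h * ybar_h
    v_st = 0.0  # estimated variance of y_st
    for ys, Nh in zip(samples, N_h):
        n_h, W_h = len(ys), Nh / N
        y_st += W_h * mean(ys)
        # s_h^2 uses divisor n_h - 1; (1 - n_h / N_h) is the finite population correction.
        v_st += W_h ** 2 * (1 - n_h / Nh) * variance(ys) / n_h
    se = v_st ** 0.5
    return {"mean": y_st, "se_mean": se, "total": N * y_st, "se_total": N * se}

# Toy data: two strata of 10 units each (made-up values).
print(stratified_estimates([[1, 2, 3], [4, 6]], [10, 10]))
```

Each stratum needs at least two sampled units, because `statistics.variance` requires two data points.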
OBJECTIVE:
The main objective of this study is to draw a random sample of adequate size using proportional and optimum allocation, to estimate the population mean and population total of the sepal length of the flowers, and to find the gain in efficiency due to optimum allocation over proportional allocation.
CONCLUSION:
1. In this study, the appropriate stratification variable is the species of iris. We have taken a random sample of size 54. Under proportional allocation, the sample sizes for the three strata are 18, 18, 18. Under optimum allocation, they are 14, 20, 20.
2. The estimated population mean of sepal length under proportional allocation is 5.920370. Its estimated standard error is 0.06050068, so the estimate would typically vary by about 0.06 cm from one sample to another. The 95% confidence interval for the mean sepal length under proportional allocation is [5.801791, 6.038950]. If 100 samples were drawn in the same way and such an interval computed from each, about 95 of the intervals would be expected to contain the true population mean.
The estimated population total of sepal length under proportional allocation is 888.0556. Its estimated standard error is 9.075102, which measures the typical sample-to-sample variation of the estimated total. The 95% confidence interval for the total sepal length under proportional allocation is [870.2687, 905.8424]. About 95 out of 100 such intervals from repeated samples would be expected to contain the true population total.
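The reported intervals can be checked with a short Python sketch. The check assumes they are normal-theory intervals (estimate ± z·SE with z ≈ 1.96). It also assumes that the total and its standard error are N = 150 times the mean and its standard error. Both assumptions agree with the figures above to the rounding shown. `normal_ci` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_ci(estimate, se, level=0.95):
    """Normal-theory confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return estimate - z * se, estimate + z * se

N = 150  # 3 species x 50 flowers
mean_prop, se_prop = 5.920370, 0.06050068     # reported under proportional allocation
print(normal_ci(mean_prop, se_prop))          # interval for the mean
print(normal_ci(N * mean_prop, N * se_prop))  # interval for the total
```

The same helper applied to the optimum-allocation figures (5.938889 and 0.05919022) reproduces those intervals as well.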
3. The estimated population mean of sepal length under optimum allocation is 5.938889. Its estimated standard error is 0.05919022, i.e., the estimate would typically vary by about 0.059 cm from sample to sample. The 95% confidence interval for the mean sepal length under optimum allocation is [5.822878, 6.054900]. About 95 out of 100 such intervals from repeated samples would be expected to contain the true population mean.
The estimated population total of sepal length under optimum allocation is 890.8333. Its estimated standard error is 8.878533, which measures the typical sample-to-sample variation of the estimated total. The 95% confidence interval for the total sepal length under optimum allocation is [873.4317, 908.2349]. About 95 out of 100 such intervals from repeated samples would be expected to contain the true population total.
4. The estimated gain in efficiency due to optimum allocation over proportional allocation without replacement is 0.1828147. That is, optimum (Neyman) allocation is approximately 18% more efficient than proportional allocation without replacement.
5. Note also that the standard error under optimum allocation is smaller than under proportional allocation for both the population mean and the population total, so optimum allocation is more efficient than proportional allocation. However, the computed gain in efficiency of Neyman (optimum) allocation over proportional allocation is only about 18%, and the estimates and confidence intervals change little. Hence, optimum allocation does not give a substantial improvement in the estimator here. For this particular data, allocating the sample size by optimum allocation does not yield markedly better results than proportional allocation.