Estimation of gain due to Stratified Sampling (Neyman Allocation) over SRSWOR

Name: Anwesha Das
Registration No: 2148124
Email id: anwesha.das@stat.christuniversity.in


Introduction:
In simple random sampling, the precision of the standard estimator of the population mean depends on two things: the sample size and the variability of the character under study. So, to get a more precise estimator we can increase the sample size, but this is not always possible. Another way to estimate the population mean with greater precision is to divide the population into several groups, each more homogeneous than the entire population, and draw a sample of predetermined size from each group.

So, for heterogeneous real-life data, we can use stratified sampling. There are several ways of allocating the sample among the strata, depending on the nature of the strata, viz. proportional allocation, Neyman optimum allocation, cost optimum allocation, etc. Here we discuss Neyman allocation in stratified sampling without replacement.

Objectives:
i) Estimation of population mean using Neyman allocation in stratified sampling.
ii) Estimation of population mean using SRSWOR
iii) Estimation of the gain in efficiency due to stratification (Neyman allocation) over SRSWOR



Stratified Sampling:

In stratified sampling, a heterogeneous population is divided into groups (strata) that are internally homogeneous with respect to the characteristic under study, and a sample is then selected from each group separately. Doing so can increase efficiency greatly. Stratified sampling is commonly used in large-scale surveys, such as voter surveys and house price prediction surveys.

The population of $N$ units is stratified into $k$ strata, the $i$-th stratum having $N_i$ units. The strata are non-overlapping and together comprise the whole population, so that $N_1 + N_2 + \cdots + N_k = N$.

A sample is drawn independently from each stratum, the sample size within the $i$-th stratum being $n_i$, such that $n_1 + n_2 + \cdots + n_k = n$. Taking a sample in this way is called stratified sampling. If the sample from each stratum is selected by simple random sampling, the procedure is called stratified random sampling.

Methodology

Simple Random Sampling without Replacement (SRSWOR):

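SRSWOR is a method of selecting $n$ units out of the $N$ population units one by one, without replacement, such that at any stage of selection each unit not yet selected has an equal chance of being selected: $1/N$ at the first draw, $1/(N-1)$ at the second, and in general $1/(N-r+1)$ at the $r$-th draw. As a result, every possible sample of $n$ units has the same probability $1/\binom{N}{n}$ of being selected.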
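The draw-by-draw definition above can be illustrated with a small Python sketch (the post itself works in R; Python is used here only to show the mechanics). `srswor_sequential` is a hypothetical helper name: at each draw it picks uniformly among the units not yet selected, which is exactly the equal-chance rule described.

```python
import random

def srswor_sequential(population, n, rng=None):
    """Draw n units one by one without replacement.

    At the r-th draw every unit not yet selected has the same chance,
    1 / (N - r + 1), of being picked.
    """
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and N")
    rng = rng or random.Random()
    remaining = list(population)        # units still available for selection
    sample = []
    for _ in range(n):
        idx = rng.randrange(len(remaining))   # uniform over the remaining units
        sample.append(remaining.pop(idx))
    return sample

units = list(range(1, 11))              # a toy population of N = 10 labelled units
print(srswor_sequential(units, 4, random.Random(1)))
```

In practice Python's `random.sample(population, n)` gives a sample with the same distribution in a single call.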

Neyman Optimum Allocation (for a given sample size):

For stratified sampling, we should carefully consider how to form the strata, which sampling procedure to use within each stratum, and how to allocate the sample among the strata. Sample allocation is the rule that decides how much of the total sample is drawn from each stratum.

Neyman allocation is one of the most important sample allocations. It is a special case of optimum allocation and is also called minimum variance allocation. It allocates the sample among the strata by taking into account both the stratum size and the stratum variability, assuming that the sampling cost per unit is the same in every stratum and that the total sample size $n$ is fixed. The sample sizes are allocated by

$$n_h = n \, \frac{N_h S_h}{\sum_{i=1}^{k} N_i S_i}, \qquad h = 1, \dots, k,$$

where $S_h$ is the standard deviation of the character under study in the $h$-th stratum.

Under Neyman allocation, $n_h$ is proportional to $N_h S_h$. If the stratum standard deviations $S_h$ are all equal (and costs are equal), Neyman allocation reduces to proportional allocation, $n_h = n N_h / N$.
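As a minimal sketch of the allocation rule (Python, with hypothetical helper names and made-up stratum sizes and standard deviations), the real-valued Neyman sizes can be computed directly from $N_h$ and $S_h$; with equal $S_h$ they coincide with proportional allocation.

```python
def neyman_allocation(n, N_h, S_h):
    """Real-valued Neyman sizes n_h = n * N_h * S_h / sum(N_i * S_i)."""
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one stratum must have positive N_h * S_h")
    return [n * w / total for w in weights]

def proportional_allocation(n, N_h):
    """Proportional sizes n_h = n * N_h / N."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

# Hypothetical strata: sizes 100 and 300, standard deviations 20 and 5.
print(neyman_allocation(40, [100, 300], [20, 5]))    # roughly 22.86 and 17.14
# With equal S_h, Neyman allocation matches proportional allocation.
print(neyman_allocation(40, [100, 300], [7, 7]), proportional_allocation(40, [100, 300]))
```

Although the second stratum is three times larger, the first gets more units because it is four times as variable.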

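A formula for the minimum variance with fixed $n$ is obtained by substituting the Neyman sample sizes $n_h$ into the variance of the stratified estimator $\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h$, where $W_h = N_h/N$, under stratified random sampling:

$$V(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_h^2,$$

which gives

$$V_{\min}(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{k} W_h S_h\right)^2}{n} - \frac{\sum_{h=1}^{k} W_h S_h^2}{N}.$$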
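The substitution can be checked numerically. This Python sketch (hypothetical function names, made-up $N_h$ and $S_h$) evaluates the general stratified variance $\sum_h W_h^2 (1/n_h - 1/N_h) S_h^2$ at the Neyman sizes and compares it with the closed form $(\sum_h W_h S_h)^2/n - \sum_h W_h S_h^2/N$.

```python
def stratified_variance(N_h, S_h, n_h):
    """V(ybar_st) = sum W_h^2 (1/n_h - 1/N_h) S_h^2 under stratified random sampling."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * (1 / nh - 1 / Nh) * Sh ** 2
               for Nh, Sh, nh in zip(N_h, S_h, n_h))

def neyman_min_variance(n, N_h, S_h):
    """V_min = (sum W_h S_h)^2 / n - sum W_h S_h^2 / N."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    return (sum(w * s for w, s in zip(W, S_h)) ** 2 / n
            - sum(w * s ** 2 for w, s in zip(W, S_h)) / N)

N_h, S_h, n = [100, 300], [20, 5], 40
total = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
n_neyman = [n * Nh * Sh / total for Nh, Sh in zip(N_h, S_h)]
print(stratified_variance(N_h, S_h, n_neyman), neyman_min_variance(n, N_h, S_h))
# Proportional allocation (10 and 30 units) gives a larger variance here.
print(stratified_variance(N_h, S_h, [10, 30]))
```

Both expressions give about 1.617 for these strata, while proportional allocation gives about 2.672.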


Advantages of Stratified Sampling:

1) Allows us to draw comparisons between subgroups of a population, as the population is divided into homogeneous strata based on shared characteristics.

2) Often more precise than other probability sampling designs, such as SRS, because elements are chosen from each of several distinct, internally homogeneous groups of the population.

3) Because of this higher precision, a smaller sample can achieve the same accuracy. This saves researchers time and cost.

Disadvantages of Stratified Sampling:

1) A sampling frame for each stratum is required in order to use this sampling method. This may make it harder and more tedious to conduct sampling.

2) More time-consuming than other sampling methods, such as SRS or systematic sampling, as more steps are required to select the sample.

3) Stratified sampling is an expensive method of sampling as researchers need access to all elements of the target population in case they are selected to be a part of the sample groups.


Stratified sampling designs are widely used in sales surveys. In some cases, however, the sample is relatively small because of survey cost or other limitations, so the way the sampling effort is allocated among the strata plays a major role. Stratified designs are also among the standard designs in other fields, for example fishery-independent surveys. Here Neyman allocation is used to obtain high precision, i.e. minimum variance for a fixed total sample size.

Data Analysis:

Here we illustrate Neyman allocation in stratified sampling using R code. For this purpose we collected a dataset on car sales from Kaggle.

Data Description:

The data contains the manufacturer of the vehicle, model name of the vehicle, sales of cars, vehicle types, price of the cars etc.

In this blog we are interested in estimating the total sales of the cars (in thousands).

Note: All values of car sales are given in thousands.

Here, we have stratified the data based upon the vehicle types to estimate the total sales of the cars.
Then we will compare the estimated value with the SRSWOR method of estimation.


To work with stratified sampling in R, we use the package "samplingbook", which we have already installed. To use the package while knitting, we still have to load it with library(samplingbook).




Now, for stratified sampling we first have to identify the strata. Here we take the vehicle type "Passenger" as our first stratum and "Car" as the second stratum.
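The post forms the strata in R. A minimal Python sketch of the same step, assuming a CSV with columns named `Vehicle_type` and `Sales_in_thousands` (assumed to match the Kaggle file; check the actual header), groups the sales values by vehicle type. The rows below are made-up stand-ins, not the Kaggle data, and `split_by_stratum` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict

def split_by_stratum(rows, stratum_col, value_col):
    """Group the numeric values of value_col by the label in stratum_col."""
    strata = defaultdict(list)
    for row in rows:
        value = row[value_col].strip()
        if value:                                  # skip rows with a missing value
            strata[row[stratum_col]].append(float(value))
    return dict(strata)

# Illustrative stand-in for the Kaggle file; these values are made up.
toy_csv = """Manufacturer,Vehicle_type,Sales_in_thousands
A,Passenger,16.9
B,Passenger,39.4
C,Car,8.6
D,Passenger,20.4
E,Car,18.8
"""
rows = csv.DictReader(io.StringIO(toy_csv))
strata = split_by_stratum(rows, "Vehicle_type", "Sales_in_thousands")
print({label: len(values) for label, values in strata.items()})   # {'Passenger': 3, 'Car': 2}
```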





Now, to use Neyman allocation we have to find the required sample size for each stratum. Here we take the total sample size to be 50.
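Neyman sizes are rarely whole numbers, so they have to be rounded while keeping the total at $n = 50$. One common choice, sketched here in Python as an assumption (not necessarily what samplingbook does internally), is largest-remainder rounding; $S_h$ is computed from the full stratum with divisor $N_h - 1$. `neyman_integer_allocation` is a hypothetical name and the toy values are made up.

```python
import math
import statistics

def neyman_integer_allocation(n, strata):
    """Integer Neyman allocation that sums exactly to n.

    strata maps each stratum label to ALL y-values in that stratum, so N_h is
    its length and S_h its standard deviation (divisor N_h - 1; needs N_h >= 2).
    Real-valued sizes n * N_h * S_h / sum(N_i * S_i) are floored, and the units
    still missing go to the strata with the largest fractional parts.
    """
    labels = list(strata)
    N_h = {h: len(strata[h]) for h in labels}
    S_h = {h: statistics.stdev(strata[h]) for h in labels}
    total = sum(N_h[h] * S_h[h] for h in labels)
    exact = {h: n * N_h[h] * S_h[h] / total for h in labels}
    alloc = {h: math.floor(exact[h]) for h in labels}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(labels, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    if any(alloc[h] > N_h[h] for h in labels):
        # Neyman sizes can exceed N_h for very variable strata; such a stratum
        # should be fully enumerated and the rest of n reallocated.
        raise ValueError("allocation exceeds a stratum size")
    return alloc

# Made-up stand-in for the two vehicle-type strata (not the Kaggle values).
toy = {"Passenger": [16.9, 39.4, 20.4, 85.2, 9.3, 27.8],
       "Car": [8.6, 48.8, 12.1, 34.0]}
print(neyman_integer_allocation(5, toy))
```

With the real data the call would be made with `n = 50` and the two full vehicle-type strata.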


Now that we have the number of sample units to be drawn from each stratum, our next step is to draw the samples from the respective strata according to these sample sizes.
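Drawing the within-stratum samples amounts to an independent SRSWOR in each stratum. A minimal Python sketch, using the standard library's `random.Random.sample` with a fixed seed for reproducibility; `draw_stratified_sample` is a hypothetical name and the toy strata and sizes are made up.

```python
import random

def draw_stratified_sample(strata, alloc, seed=None):
    """Independent SRSWOR of size alloc[h] from every stratum h."""
    rng = random.Random(seed)          # one seeded generator for reproducibility
    return {h: rng.sample(strata[h], alloc[h]) for h in strata}

toy = {"Passenger": [16.9, 39.4, 20.4, 85.2, 9.3, 27.8],
       "Car": [8.6, 48.8, 12.1, 34.0]}
sample = draw_stratified_sample(toy, {"Passenger": 3, "Car": 2}, seed=42)
print(sample)
```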


 


Having drawn the samples from each stratum, we combine them into one data set for easier computation.


The estimates of the mean and the total of car sales are given below.
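Under stratified random sampling the mean is estimated by $\bar{y}_{st} = \sum_h W_h \bar{y}_h$ with $W_h = N_h/N$, and the total by $N \bar{y}_{st}$. A minimal Python sketch with a hypothetical function name and a made-up sample, just to show the arithmetic:

```python
import statistics

def stratified_mean_total(sample, N_h):
    """Return (ybar_st, Y_hat) with ybar_st = sum_h (N_h / N) * ybar_h and Y_hat = N * ybar_st."""
    N = sum(N_h.values())
    ybar_st = sum(N_h[h] / N * statistics.mean(sample[h]) for h in sample)
    return ybar_st, N * ybar_st

# Made-up sample and stratum sizes.
sample = {"Passenger": [20.0, 30.0, 40.0], "Car": [10.0, 14.0]}
N_h = {"Passenger": 6, "Car": 2}
print(stratified_mean_total(sample, N_h))   # (25.5, 204.0)
```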



Next, we calculate the variance of the estimated mean sales of cars under Neyman optimum allocation.
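From the sample itself, the variance of $\bar{y}_{st}$ is usually estimated by $v(\bar{y}_{st}) = \sum_h W_h^2 (1 - n_h/N_h)\, s_h^2 / n_h$, where $s_h^2$ is the sample variance in stratum $h$; its square root is the standard error. A minimal Python sketch (hypothetical function name, made-up numbers):

```python
import math
import statistics

def stratified_variance_estimate(sample, N_h):
    """v(ybar_st) = sum_h W_h^2 * (1 - n_h/N_h) * s_h^2 / n_h (needs n_h >= 2)."""
    N = sum(N_h.values())
    v = 0.0
    for h, ys in sample.items():
        n_h = len(ys)
        W_h = N_h[h] / N
        s2_h = statistics.variance(ys)      # sample variance, divisor n_h - 1
        v += W_h ** 2 * (1 - n_h / N_h[h]) * s2_h / n_h
    return v

sample = {"Passenger": [20.0, 30.0, 40.0], "Car": [10.0, 14.0]}
N_h = {"Passenger": 12, "Car": 4}
v = stratified_variance_estimate(sample, N_h)
print(v, math.sqrt(v))    # estimated variance of the mean and its standard error
```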


To compare the Neyman estimates of mean car sales with SRSWOR, we compute the same quantities from an SRSWOR sample.
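For SRSWOR the mean is estimated by the sample mean $\bar{y}$, the total by $N\bar{y}$, and the variance of $\bar{y}$ by $(1 - n/N)\, s^2/n$. A minimal Python sketch with a hypothetical function name and a toy population:

```python
import random
import statistics

def srswor_estimates(population, n, seed=None):
    """SRSWOR of size n: return (mean, total, estimated variance of the mean).

    The variance estimate is (1 - n/N) * s^2 / n, with s^2 the sample variance
    (divisor n - 1), so it needs n >= 2.
    """
    N = len(population)
    ys = random.Random(seed).sample(population, n)
    ybar = statistics.mean(ys)
    v = (1 - n / N) * statistics.variance(ys) / n
    return ybar, N * ybar, v

# Hypothetical population of 10 values, just to exercise the function.
population = [float(y) for y in range(1, 11)]
mean, total, v = srswor_estimates(population, 4, seed=7)
print(mean, total, v ** 0.5)   # estimate of mean, estimate of total, standard error
```

For a fair comparison with the stratified design, the SRSWOR sample would normally have the same size $n$.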



From the beginning we were interested in how much Neyman allocation gains over SRSWOR. For that purpose we estimate the gain in efficiency due to stratification with Neyman optimum allocation over SRSWOR.
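The post does not show which formula it used for the gain. One standard approach, sketched below in Python as an assumption, estimates from the stratified sample alone the variance that an SRSWOR mean of the same total size $n$ would have, using $(N-1)S^2 = \sum_h (N_h - 1)S_h^2 + \sum_h N_h(\bar{Y}_h - \bar{Y})^2$, and reports the relative gain $(v_{srs} - v_{st})/v_{st}$ (the relative efficiency is $v_{srs}/v_{st}$). The function names and the tiny population are hypothetical.

```python
import itertools
import statistics

def variance_estimates(sample, N_h):
    """Return (v_srs, v_st) estimated from one stratified random sample.

    v_st  = sum_h W_h^2 (1 - f_h) s_h^2 / n_h         (estimated V of ybar_st)
    v_srs = (1 - n/N) * S2_hat / n                    (estimated V of an SRSWOR mean)
    The between-strata part of S2_hat is corrected for the sampling error
    of the stratum means, which makes S2_hat unbiased for S^2.
    """
    N = sum(N_h.values())
    n = sum(len(ys) for ys in sample.values())
    W = {h: N_h[h] / N for h in sample}
    ybar = {h: statistics.mean(ys) for h, ys in sample.items()}
    s2 = {h: statistics.variance(ys) for h, ys in sample.items()}   # needs n_h >= 2
    v_h = {h: (1 - len(ys) / N_h[h]) * s2[h] / len(ys) for h, ys in sample.items()}
    v_st = sum(W[h] ** 2 * v_h[h] for h in sample)
    ybar_st = sum(W[h] * ybar[h] for h in sample)
    within = sum((N_h[h] - 1) * s2[h] for h in sample)
    between = (sum(N_h[h] * (ybar[h] - ybar_st) ** 2 for h in sample)
               - sum(N_h[h] * v_h[h] for h in sample) + N * v_st)
    S2_hat = (within + between) / (N - 1)
    v_srs = (1 - n / N) * S2_hat / n
    return v_srs, v_st

def estimated_gain(sample, N_h):
    """Relative gain (v_srs - v_st) / v_st; multiply by 100 for a percentage."""
    v_srs, v_st = variance_estimates(sample, N_h)
    return (v_srs - v_st) / v_st

# Sanity check on a tiny made-up population: averaging v_srs over every
# possible stratified sample (n_A = n_B = 2) recovers the exact SRSWOR variance.
A, B = [1.0, 2.0, 4.0], [10.0, 13.0, 15.0, 20.0]
N_h = {"A": len(A), "B": len(B)}
draws = [variance_estimates({"A": list(a), "B": list(b)}, N_h)[0]
         for a in itertools.combinations(A, 2) for b in itertools.combinations(B, 2)]
exact = (1 - 4 / 7) * statistics.variance(A + B) / 4
print(sum(draws) / len(draws), exact)
print(estimated_gain({"A": [1.0, 4.0], "B": [13.0, 20.0]}, N_h))
```

If instead a separate SRSWOR sample is drawn, as in the post, its own variance estimate can take the place of `v_srs`; the two approaches need not give the same number.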


Interpretation:

The estimate of the mean sales of the cars is 39.12876 using Neyman optimum allocation.
The estimate of the mean sales of the cars is 58.04892 using SRSWOR.

The estimate of the total sales of the cars is 6143.216 using Neyman optimum allocation.
The estimate of the total sales of the cars is 9113.68 using SRSWOR.

The standard error of the estimate of mean sales of the cars is 5.597687 under Neyman allocation, whereas for SRSWOR it is 65.45. The standard error measures how much the estimate would typically vary from one sample to another, so the smaller it is, the more precise the estimate. By this measure, the Neyman optimum allocation estimate is far more precise than the SRSWOR estimate.

The estimated gain in efficiency due to stratification, i.e. of Neyman allocation over SRSWOR, is 1.458147, which corresponds to a gain in efficiency of only about 1.46% for Neyman optimum allocation over SRSWOR. On this basis we conclude that stratification with Neyman optimum allocation is not really needed here, as SRSWOR is much easier to carry out.




