Stratified Random Sampling under Ratio and Regression Estimation

Description : 

Sanjana Rajamani - 2148145

Stratified Random Sampling : 

Stratified random sampling is a method of sampling that divides a population into smaller subgroups known as strata. These subpopulations are non-overlapping and together they comprise the whole population, so that

N_1 + N_2 + N_3 + ... + N_L = N


Stratification can only be fully utilized when the N_h values are known. After the strata have been determined, a sample is drawn from each of them, independently of the others. The sample sizes within the strata are denoted by n_1, n_2, n_3, ..., n_L, respectively.



The stratified random sampling process differs from simple random sampling, in which units are randomly selected from the entire population, so that each possible sample is equally likely to occur.


If a simple random sample is taken in each stratum, the whole procedure is described as stratified random sampling.
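As a minimal sketch of this procedure, the snippet below draws an independent simple random sample without replacement from each stratum. The function name, the dictionary layout and the small illustrative population are assumptions made for the example; they are not taken from the data analysed later in this post.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent SRSWOR from each stratum.

    strata: dict mapping a stratum label to the list of its units (hypothetical layout)
    sizes:  dict mapping a stratum label to its sample size n_h
    """
    rng = random.Random(seed)
    # random.Random.sample selects without replacement
    return {label: rng.sample(units, sizes[label]) for label, units in strata.items()}

# Illustrative population: three non-overlapping strata, N = 6 + 4 + 10 = 20
strata = {"A": list(range(0, 6)), "B": list(range(6, 10)), "C": list(range(10, 20))}
sizes = {"A": 3, "B": 2, "C": 5}
sample = stratified_sample(strata, sizes, seed=1)
```

Each stratum is sampled on its own, so the sample sizes n_h can be chosen separately for every stratum.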


The mean and variance of the estimate under stratified random sampling are :

Mean is given by :


Variance is given by :
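In the usual textbook notation, the two quantities can be written as below. The symbols are assumptions introduced here: W_h = N_h/N is the stratum weight, y-bar_h is the sample mean in stratum h, S_h^2 is the population variance within stratum h, and a simple random sample without replacement is taken in each stratum.

```latex
\bar{y}_{st} = \sum_{h=1}^{L} W_h \, \bar{y}_h ,
\qquad W_h = \frac{N_h}{N}

V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^{2} \, \frac{S_h^{2}}{n_h}
\left( 1 - \frac{n_h}{N_h} \right)
```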

Stratification is a common technique. The principal reasons for using it are the following:




  1. If data of known precision are needed for certain subdivisions of the population, each subdivision should be treated as a "population" in its own right.
  2. Administrative convenience may favour stratification.
  3. Sampling issues may differ markedly between parts of the population, for example among hotels, the general population, business lists, etc.
  4. Stratification may result in a gain in precision when estimating the characteristics of the entire population. A heterogeneous population might be divided into subpopulations, each of which is homogeneous within itself.

With the best choice of sample sizes, the estimates from a stratified sample attain maximum precision.

The main advantage of stratified random sampling is that it captures key characteristics of a population in a sample. As with a weighted average, this sampling method produces characteristics in the sample that are proportional to those in the overall population. When subgroups cannot be formed, stratified random sampling does not work well.

When do we use Stratified random sampling ?

  1. Ensuring the diversity of the sample
  2. Ensuring similar variance
  3. Lowering the overall variance in the population
  4. Allowing for a variety of data collection methods

Compared with simple random sampling, stratification results in smaller estimation error and greater precision. The greater the difference between strata, the greater the gain in precision.


Ratio estimates in Stratified Random Sampling : 

There are two ways in which a ratio estimate of the population total can be made. One is to make a separate ratio estimate of the total of each stratum and add these totals. No assumption is made that the true ratio remains constant from stratum to stratum. We only require knowledge of the separate stratum totals of the auxiliary variable. The estimate is given by,
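In standard notation (assumed here), y_h and x_h are the sample totals in stratum h, X_h is the known population total of the auxiliary variable in stratum h, and X is the overall total. The first line is the separate ratio estimate described above. The second line shows, for comparison, the other approach, usually called the combined ratio estimate, which forms a single ratio from the stratified means.

```latex
\hat{Y}_{Rs} = \sum_{h=1}^{L} \frac{y_h}{x_h} \, X_h
             = \sum_{h=1}^{L} \frac{\bar{y}_h}{\bar{x}_h} \, X_h

\hat{Y}_{Rc} = \frac{\bar{y}_{st}}{\bar{x}_{st}} \, X
```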


A concern with the separate ratio estimate is that with small sample sizes per stratum, the individual stratum estimates will be biased, and this bias accumulates across strata. The second approach, the combined ratio estimate, is therefore preferred with small stratum sample sizes (n_h below about 20), or if the within-stratum ratios are approximately equal.

Regression estimates in Stratified Random Sampling : 

Like the ratio estimate, the linear regression estimate is designed to increase precision by the use of an auxiliary variate xi which is correlated with yi. This suggests an estimate based on the linear regression of yi on xi rather than on the ratio of the two variables.

As with the ratio estimate, two types of regression estimate can be made in stratified random sampling. In the first, a separate regression estimate is computed for each stratum mean, that is,
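In standard notation (assumed here), b_h is the regression coefficient fitted within stratum h, X-bar_h is the known population mean of the auxiliary variate in that stratum, and W_h = N_h/N. The separate regression estimate can then be written as:

```latex
\bar{y}_{lrh} = \bar{y}_h + b_h \left( \bar{X}_h - \bar{x}_h \right)

\bar{y}_{lrs} = \sum_{h=1}^{L} W_h \, \bar{y}_{lrh}
```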


Comparison between ratio and regression estimators:

Aim :

The main aim is to derive the estimate of the population mean using regression estimation and compare it with ratio estimation.

Objective :

To estimate the average real estate farm loans, assuming that the average non-real estate farm loans in the country is known to be $878.16, and to use the regression estimator to give the estimate with a 95% confidence interval for this data set and discuss the results.

Notations :

X - Non-real estate farm loans 

Y - Real estate farm loans

Data Description :

Given below is a random sample of 21 states drawn by SRSWOR (simple random sampling without replacement) from the population of 50 states of a country.


Calling the required packages for our analysis : 



We first get our scatter plot to see the correlation and find the value of the correlation coefficient : 



We see that the fitted line passes through the origin and that the correlation coefficient is moderately high. We now use the auxiliary variable X and fit a regression model, which gives the regression coefficient “b”. Using the fitted coefficient, we estimate the values of Y from X :


In our regression model we take the intercept as 0, since the data points lie along a line through the origin :
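A minimal sketch of a no-intercept fit is given below. For the model y = b*x, the least-squares slope is the sum of x*y divided by the sum of x squared. The function name and the four data points are purely illustrative assumptions; they are not the farm loan data.

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Illustrative values only (not the farm loan data)
x_demo = [100.0, 200.0, 300.0, 400.0]
y_demo = [50.0, 90.0, 160.0, 190.0]
b_demo = slope_through_origin(x_demo, y_demo)  # 147000 / 300000
```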

The regression estimator is y_reg = y_bar + b*(X_bar - x_bar) :

The value of X_bar is known; it is the $878.16 given in the objective.
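The estimator above can be sketched in a few lines. The function name and the small data set are illustrative assumptions, and the slope b is passed in so that any fitted coefficient can be used.

```python
from statistics import mean

def regression_estimate(x, y, X_bar, b):
    """Regression estimate of the population mean of y:
    y_reg = y_bar + b * (X_bar - x_bar)."""
    return mean(y) + b * (X_bar - mean(x))

# Illustrative values only (not the farm loan data)
x_demo = [100.0, 200.0, 300.0, 400.0]   # sample mean x_bar = 250
y_demo = [50.0, 90.0, 160.0, 190.0]     # sample mean y_bar = 122.5
y_reg = regression_estimate(x_demo, y_demo, X_bar=300.0, b=0.49)
```

The sample mean of y is adjusted upwards when the known X_bar exceeds the sample mean of x (for positive b), and is left unchanged when the two coincide.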

Now we calculate the variance of the regression estimator :

Standard Error : we first find the sample estimate of the variance of the regression estimate of y bar and then take its square root. For this we need the correlation coefficient (r), N, n and the sample mean square of y (s_y^2).
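One common large-sample form that uses exactly these ingredients is v = ((N - n) / (N * n)) * s_y^2 * (1 - r^2). The sketch below assumes that form; whether the original analysis used it or a variant with a different divisor is not known. The function names and data are illustrative.

```python
import math
from statistics import mean, variance

def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def regression_se(x, y, N):
    """Approximate standard error of the regression estimate of the mean,
    assuming v = (N - n) / (N * n) * s_y^2 * (1 - r^2)."""
    n = len(y)
    s2_y = variance(y)            # sample mean square of y (divisor n - 1)
    r = pearson_r(x, y)
    v = (N - n) / (N * n) * s2_y * max(0.0, 1.0 - r ** 2)
    return math.sqrt(v)

# Illustrative values only: n = 4 units sampled from N = 10
se_demo = regression_se([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0], N=10)
```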

    

Ratio Estimate :
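For comparison, a minimal sketch of the ratio estimate of the mean and its standard error under simple random sampling is given below. It assumes the usual estimator (y_bar / x_bar) * X_bar and the usual variance estimate based on the residuals y_i - R*x_i; the function names and data are illustrative and are not the farm loan data.

```python
import math

def ratio_estimate_mean(x, y, X_bar):
    """Ratio estimate of the population mean of y: (y_bar / x_bar) * X_bar."""
    R_hat = sum(y) / sum(x)
    return R_hat * X_bar

def ratio_se(x, y, N):
    """Approximate standard error of the ratio estimate of the mean,
    assuming v = (N - n) / (N * n) * sum((y_i - R*x_i)^2) / (n - 1)."""
    n = len(y)
    R_hat = sum(y) / sum(x)
    resid_ss = sum((yi - R_hat * xi) ** 2 for xi, yi in zip(x, y))
    v = (N - n) / (N * n) * resid_ss / (n - 1)
    return math.sqrt(v)

# Illustrative values only: n = 4 units sampled from N = 10
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [1.0, 3.0, 2.0, 4.0]
y_ratio = ratio_estimate_mean(x_demo, y_demo, X_bar=3.0)
se_ratio = ratio_se(x_demo, y_demo, N=10)
```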



Conclusions :

1. The standard error of the regression estimator is less than the standard error of the ratio estimator, hence the regression estimator is the better method of estimation here.

2. The data show a moderately high correlation between the variables. We obtain the estimated regression coefficient as 0.39819 with a standard error of 0.03368. The estimate of the mean is $594.1101 with an estimated standard error of 68.15831. The 95% confidence interval is (451.9344, 736.2859), so we can say with 95% confidence that the population mean lies between these values.

3. On comparing with the ratio estimator, whose standard error is 121.1869, we can conclude that the regression estimator (with standard error 68.15831) is the better estimator. In general, the regression estimator is more precise than the ratio estimator unless y=kx, i.e. the relation between y and x is a straight line through the origin, in which case the two are equally precise.
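The confidence interval quoted in conclusion 2 can be checked from the reported estimate and standard error. The sketch below assumes the interval was built as estimate ± t * SE with a Student t multiplier on n - 1 = 20 degrees of freedom (about 2.085963); that choice of multiplier is an assumption, although it does reproduce the reported limits.

```python
def confidence_interval(estimate, se, t_crit):
    """Two-sided interval: estimate -/+ t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

# Values reported in the conclusions
y_reg = 594.1101
se_reg = 68.15831
t_20 = 2.085963   # assumed multiplier: 97.5% point of Student t with 20 df

lower, upper = confidence_interval(y_reg, se_reg, t_20)
```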





