Probability Proportional to Size Sampling without replacement (PPSWOR) using Murthy’s unordered estimator


By: Prachi Sharma 
Roll no. 2148141

Abstract:

Sample survey design is an integral aspect of statistical analysis. In this blog, I have attempted to explain the Probability Proportional to Size (PPS) sampling technique. A PPS sample without replacement is selected by the Sen-Midzuno method, and the parameters of a practical dataset are estimated using Murthy’s unordered estimator.

Introduction:

Starting from Scratch

Census v/s Sampling

An attempt to gather information about every individual unit in a population is called a census. This study is also known as a complete enumeration i.e., a complete count. 

A sample, on the other hand, is a part of the population, that is, a subset of the units in the population. This sample is then examined in order to draw conclusions about the whole.

Okay! But why do we need sampling?

Sampling has many advantages over a census. A few are listed below:

·         Low cost.

·         Less time consuming.

·         Data can be more accurate, since fewer units allow closer supervision of data collection.

·         Suitable when resources are limited.

·         Sampling is the only practical method when the population is infinite.

There are different types of sampling techniques. This means that we can select our desired sample in a number of different ways. The methodology used to select a sample from the population depends on the type of analysis being performed.

One such sampling method is Probability Proportional to Size (PPS) sampling.

But before understanding what PPS is, let us first understand what is meant by simple random sampling.

Simple Random Sampling (SRS)

Simple random sampling is a method of selecting a sample in which every unit in the population has an equal chance of being selected. As its name suggests, this sampling scheme provides a purely random sample.

Probability Proportional to size sampling (PPS)

Probability Proportional to Size sampling is a method of sampling in which the probability of selecting each unit in the population depends on its size.

SRS v/s PPS

In SRS, the probability of selection of each unit is equal whereas under the PPS scheme each unit has a probability of selection that is proportional to its size.

If Y is the variable under study (dependent variable) and X is an auxiliary variable (independent variable) related to Y, then under the PPS technique the units are selected with probability proportional to the value of X. This value is known as the size of the unit, and the scheme is termed probability proportional to a given measure of size (PPS) sampling.
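To make the contrast concrete, here is a minimal sketch of how the two sets of selection probabilities could be computed. The sizes below are hypothetical values chosen only for illustration (they are not the dataset used later in this post), and the function names are my own.

```python
# Hypothetical sizes (auxiliary variable X) for five units; illustration only.
sizes = [10, 20, 30, 25, 15]


def srs_probabilities(n_units):
    """Under SRS every unit has the same selection probability 1/N."""
    return [1 / n_units] * n_units


def pps_probabilities(sizes):
    """Under PPS unit i is selected with probability X_i / sum(X)."""
    total = sum(sizes)
    return [x / total for x in sizes]


srs_p = srs_probabilities(len(sizes))
pps_p = pps_probabilities(sizes)

for unit, (x, p_srs, p_pps) in enumerate(zip(sizes, srs_p, pps_p), start=1):
    print(f"unit {unit}: size={x}, SRS prob={p_srs:.3f}, PPS prob={p_pps:.3f}")
```

Both lists sum to 1; the difference is that the PPS probabilities follow the sizes while the SRS probabilities ignore them.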

Why PPS and not SRS?

Whenever the units vary in size, simple random sampling is not an appropriate procedure, as it gives no importance to the size of the unit. Auxiliary information about the size of the units can be used in selecting the sample so as to get more efficient estimators of the population parameters. The PPS scheme makes use of this size information through the auxiliary variable.

Where SRS is preferred:

If we survey the entire population of one company on a question for which the size of each department does not matter, there is no size information for a PPS design to exploit, so SRS is adequate.

Where PPS is preferred:

If we were to survey a company regarding a subject that affects each department according to its size, such as the number of break rooms to invest in for each section, the population of each department becomes a key factor. When we must take the sizes of different sections of the population into account, PPS is probably the better way to draw the sample.

Applications:

1.     Auditing:

In the auditing domain, PPS sampling is used to test account balances. It tests the reasonableness of a recorded account balance or class of transactions, and it is used to check the accuracy of financial accounts, in particular to test for overstatements.

2.     Industries:

Here the objective is to estimate the production of different industrial companies. Because the size of each company differs, company size can serve as the auxiliary variable, and with it we can estimate the variable of interest, that is, total production.

3.      Agriculture:

In an agricultural survey, the yield depends on the area under cultivation. So here the size is the value of the auxiliary variable X (area under crop) and the study variable Y is the yield. For example, if we have 100 farms with known area under crop and we want to estimate the average yield per farm and the population total, we can use this technique to get a more precise estimate.

Getting Started:

For selecting a sample there are always two ways: with replacement and without replacement.

Sampling without replacement is generally more efficient than sampling with replacement, and this is also the case with PPS sampling. As the without replacement technique provides better estimates, in this blog we’ll discuss only this method (PPSWOR).

In PPSWOR the selection probabilities change from draw to draw, because they depend on which units have already been selected. For this reason the estimators used with PPSWOR are divided into ordered estimators and unordered estimators.

To overcome the difficulty of expectations changing with each draw, a new variate is associated with each draw such that its expectation is equal to the population value of the variate under study. Such estimators take into account the order of the draws and are called ordered estimators.

Sen-Midzuno Method

The without replacement sample will be selected by the Sen-Midzuno method.

This method consists of selecting the first unit with probability proportional to size and the remaining units by SRSWOR.

If we are required to select a sample of “n” units from a population of size “N”, then according to the Sen-Midzuno method, after the first unit has been selected with probability proportional to size, the remaining “n-1” units are selected from the remaining “N-1” units of the population by SRSWOR.
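The two-step procedure can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch rather than the code used later in this post; the function name `sen_midzuno_sample` and the tree counts are hypothetical.

```python
import random


def sen_midzuno_sample(sizes, n, rng=None):
    """Draw n distinct unit indices (0-based) by the Sen-Midzuno scheme.

    Step 1: pick the first unit with probability proportional to its size.
    Step 2: pick the other n - 1 units by SRSWOR from the remaining N - 1 units.
    """
    rng = rng or random.Random()
    N = len(sizes)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and the population size")
    # Step 1: one PPS draw.
    first = rng.choices(range(N), weights=sizes, k=1)[0]
    # Step 2: simple random sample without replacement from the rest.
    remaining = [i for i in range(N) if i != first]
    rest = rng.sample(remaining, n - 1)
    return [first] + rest


# Hypothetical numbers of trees for 8 orchards (illustration only).
tree_counts = [50, 30, 25, 40, 26, 44, 20, 35]
sample = sen_midzuno_sample(tree_counts, n=2, rng=random.Random(2024))
print("selected orchards (1-based):", [i + 1 for i in sample])
```

Passing a seeded `random.Random` makes the draw reproducible; without it a fresh generator is used on each call.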

The first-order (individual) and second-order (pairwise) inclusion probabilities for this selection procedure are given as under:
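In the usual notation these standard results can be written as follows. The symbols are assumptions introduced here: $p_i = X_i / X$ is the probability of selecting unit $i$ at the first draw, with $X = \sum_{i=1}^{N} X_i$, and $\pi_i$, $\pi_{ij}$ denote the individual and pairwise inclusion probabilities.

```latex
\pi_i = p_i + (1 - p_i)\,\frac{n-1}{N-1}
      = \frac{N-n}{N-1}\,p_i + \frac{n-1}{N-1}

\pi_{ij} = \frac{n-1}{N-1}\left[\frac{N-n}{N-2}\,(p_i + p_j) + \frac{n-2}{N-2}\right],
\qquad i \neq j
```

The first expression says that unit $i$ enters the sample either at the first (PPS) draw or, failing that, among the $n-1$ units drawn by SRSWOR from the other $N-1$ units.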


If the sample of size “n” consists of the units “i”, “j”, …, “q”, the probability of selecting this particular sample is given by:
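A standard way to write this probability, assuming the notation $p_k = X_k / X$ for the first-draw selection probability of unit $k$ (my notation, not necessarily the original post's), is:

```latex
P(s) = \frac{1}{\binom{N-1}{n-1}} \sum_{k \in s} p_k
     = \frac{1}{\binom{N-1}{n-1}} \cdot \frac{X_i + X_j + \cdots + X_q}{X}
```

Each term of the sum corresponds to one unit of the sample being the unit selected at the first draw, after which the other $n-1$ units form one of the $\binom{N-1}{n-1}$ equally likely SRSWOR samples.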


Ordered and unordered estimators

The important point to note here is that the selection probabilities change from draw to draw, that is, with the order in which units are selected. Hence the estimators are divided into “ordered estimators” and “unordered estimators”.

Ordered estimators are those which are based on the order in which units are selected in the sample and don’t require calculation of inclusion probabilities.

On the other hand, unordered estimators don’t depend on the order in which the units are drawn within the sample.

For every ordered estimator there is a corresponding unordered estimator whose variance is no larger, so unordered estimators are at least as efficient as ordered estimators.

Murthy’s Unordered Estimator

Murthy’s unordered estimator is obtained by weighting all possible ordered estimators with their respective probabilities.

In sampling “n” units without replacement from a finite population of “N” units, there are C(N, n) (“N choose n”) unordered samples. Each unordered sample of size “n” can be ordered in M = n! ways, so an unordered sample “s” corresponds to “M” ordered samples.

We’ll now consider a selection scheme in which the probability of selecting the i-th ordered sample “s_i” is “p_si”. The probability of getting the unordered sample “s” is the sum of the probabilities of getting the ordered samples corresponding to “s”.

Mathematically, we can state this as under:
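Writing $p_s$ for the probability of the unordered sample $s$ (a symbol assumed here) and $p_{si}$ for the probability of its $i$-th ordering, the statement above reads:

```latex
p_s = \sum_{i=1}^{M} p_{si}, \qquad M = n!
```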


Let y_si be an estimator of a population parameter “θ” based on the ordered sample s_i. An unordered estimator of “θ” is given by:
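In symbols, with $p_{si}$ the probability of the ordered sample $s_i$ and $p_s = \sum_{i=1}^{M} p_{si}$ the probability of the unordered sample $s$ (notation as assumed above), the unordered estimator is the probability-weighted average of the ordered estimators:

```latex
\hat{y}_M = \sum_{i=1}^{M} y_{si}\,\frac{p_{si}}{p_s}
```

For a sample of size $n = 2$ containing units $i$ and $j$, drawn one at a time with probability proportional to size among the units not yet selected, and with the ordered estimator taken to be Des Raj's, this average works out to the familiar closed form for the population total:

```latex
\hat{y}_M = \frac{1}{2 - p_i - p_j}
            \left[(1 - p_j)\,\frac{y_i}{p_i} + (1 - p_i)\,\frac{y_j}{p_j}\right]
```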



In PPSWOR sampling, Murthy’s estimator ŷ_M is an unbiased estimator of θ whenever the ordered estimators it averages are unbiased, and its sampling variance is given as under:
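For the case $n = 2$ with $\theta$ the population total, and assuming the two units are drawn successively with probability proportional to size (the setting in which Murthy's estimator is usually stated), the variance and its unbiased estimator from a sample containing units $i$ and $j$ take the following standard form. The general-$n$ expression is not reproduced here.

```latex
\operatorname{Var}(\hat{y}_M)
  = \sum_{i<j} \frac{p_i\,p_j\,(1 - p_i - p_j)}{2 - p_i - p_j}
    \left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^{2}

\widehat{\operatorname{Var}}(\hat{y}_M)
  = \frac{(1 - p_i)(1 - p_j)(1 - p_i - p_j)}{(2 - p_i - p_j)^{2}}
    \left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^{2}
```

The standard error reported for a sample is the square root of the second expression.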



Example:

We’ll now understand this with the help of a practical example. Suppose we have data on 8 orchards. We need to estimate the total production of all 8 orchards from a sample selected using the Midzuno method, and we also need to calculate the standard error of this estimator.

The following data is given to us, where the number of trees is our auxiliary variable, denoted (Xi), and the yield is our study variable, denoted (Yi).


We have stored this data in Excel; we now import the dataset and do a basic analysis of it.


Getting the number of rows in our dataset, that is, the total population size.

Loading the required package to run some functions.

Drawing a sample of size 2 using the Midzuno method.

A sample of 2 units has been selected: the second and sixth orchards are in the sample selected using the Midzuno method.

We'll now store the yield and number of trees of the selected units, along with their respective selection probabilities, as under:


Using Murthy unordered estimator for total production.

The estimate of the variance of the estimator is calculated as under along with standard error.
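The two calculations above can be sketched in Python for a sample of size 2. This is an illustrative sketch, not the code originally run for this post: it assumes the usual n = 2 closed forms of Murthy's estimator and its variance estimator, the function names are my own, and the numbers are hypothetical stand-ins rather than the orchard data, so the printed values will not match the results quoted in this post.

```python
import math


def murthy_total_n2(y_i, y_j, p_i, p_j):
    """Murthy's estimate of the population total from a PPSWOR sample of size 2."""
    return ((1 - p_j) * y_i / p_i + (1 - p_i) * y_j / p_j) / (2 - p_i - p_j)


def murthy_variance_estimate_n2(y_i, y_j, p_i, p_j):
    """Unbiased estimate of the variance of Murthy's estimator (n = 2)."""
    factor = (1 - p_i) * (1 - p_j) * (1 - p_i - p_j) / (2 - p_i - p_j) ** 2
    return factor * (y_i / p_i - y_j / p_j) ** 2


# Hypothetical values, NOT the orchard data from this post:
x_total = 400        # assumed total number of trees in the population
x_i, y_i = 50, 60    # trees and yield of the first sampled unit
x_j, y_j = 30, 35    # trees and yield of the second sampled unit

# Initial selection probabilities are proportional to size.
p_i, p_j = x_i / x_total, x_j / x_total

estimate = murthy_total_n2(y_i, y_j, p_i, p_j)
variance = murthy_variance_estimate_n2(y_i, y_j, p_i, p_j)
std_error = math.sqrt(variance)

print(f"estimated total production: {estimate:.4f}")
print(f"estimated variance: {variance:.4f}")
print(f"standard error: {std_error:.4f}")
```

A quick sanity check on the formulas: if the yields were exactly proportional to the sizes, both terms y/p would equal the true total, the estimate would reproduce it exactly, and the variance estimate would be zero.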


Conclusion:

The estimated standard error of the estimator is 3.648062.
The standard error tells us how precise the estimate is, that is, how far the estimate from any given sample is likely to be from the true population total.
This implies that if the sampling were repeated, the value of Murthy's estimator would typically differ from the true total production by about 3.648062.
