Probability Proportional to Size Sampling without replacement (PPSWOR) using Murthy’s unordered estimator
PPSWOR using Murthy’s unordered estimator
Abstract:
Sample survey design is an integral aspect of statistical analysis. In this blog, I've attempted to explain the Probability Proportional to Size (PPS) sampling technique. A PPS sample without replacement is selected by the Sen-Midzuno method, and the parameters of a practical dataset are estimated using Murthy's unordered estimator.
Introduction:
Starting from Scratch
Census v/s Sampling
An attempt to gather information about every individual unit in a population is called a census. It is also known as a complete enumeration, i.e., a complete count.
A sample, on the other hand, is a part of the population, that is, a subset of its units. The sample is examined and the results are used to represent the whole population.
Okay!
But why do we need sampling?
Sampling has several advantages over a census. A few are listed below:
· Lower cost
· Less time-consuming
· Often more accurate data, because the smaller workload allows closer supervision and fewer non-sampling errors
· Suitable when resources are limited
· The only practical method when the population is infinite
There are different types of sampling techniques, which means that we can select our desired sample in a number of different ways. The methodology used to select a sample from the population depends on the type of analysis being performed.
One such sampling method is Probability Proportional to Size (PPS) sampling. But before understanding what PPS is, let us first understand what is meant by Simple Random Sampling (SRS).
SRS is a method of selecting a sample in which every unit in the population has an equal chance of being selected. As its name suggests, this sampling scheme provides a simple random sample.
Probability Proportional to size sampling (PPS)
Probability Proportional to Size sampling is a method of sampling in which each unit in the population is selected with a probability proportional to its size.
SRS v/s PPS
In SRS, the probability of selection is equal for every unit, whereas under the PPS scheme each unit has a probability of selection that is proportional to its size.
If Y is the variable under study (dependent variable) and X is an auxiliary variable (independent variable) related to Y, then under the PPS technique the units are selected with probability proportional to the value of X. This value is known as the size of the unit, and the scheme is termed probability proportional to a given measure of size (PPS) sampling.
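To make the difference concrete, here is a minimal sketch of the selection probabilities under both schemes. The size values and the function name `pps_probabilities` are hypothetical and chosen only for illustration.

```python
def pps_probabilities(sizes):
    """Selection probability of each unit, proportional to its size X_i."""
    total = sum(sizes)
    return [x / total for x in sizes]


# Hypothetical sizes (values of the auxiliary variable X) for four units.
sizes = [10, 20, 30, 40]

# SRS: every unit gets the same probability 1/N.
srs_probs = [1 / len(sizes)] * len(sizes)

# PPS: probability is X_i divided by the total of X.
pps_probs = pps_probabilities(sizes)

print("SRS:", srs_probs)
print("PPS:", pps_probs)
```

Under SRS the largest unit is no more likely to be picked than the smallest, while under PPS it is picked four times as often in this toy case.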
Why PPS and not SRS?
Whenever the units vary in size, simple random sampling is not an appropriate procedure, as it gives no importance to the size of the unit. Auxiliary information about the size of the units can be used in selecting the sample so as to get more efficient estimators of the population parameter. The PPS scheme makes use of this auxiliary size variable.
Where SRS is preferred:
If we survey the entire population of one company on a subject for which the size of each department is not relevant, a PPS design is not needed, because the size of each section does not matter. SRS is sufficient here.
Where PPS is preferred:
If we survey a company on a subject that affects each department differently, such as the number of break rooms to invest in for each section, the population of each department becomes a key factor. When we must take the size of different sections of the population into account, PPS is probably the right way to draw the sample.
Applications:
1. Auditing
In auditing, the objective of using PPS sampling is to test account balances. It tests the reasonableness of a recorded account balance or class of transactions, and it is used to determine the accuracy of financial accounts, that is, to test for overstatements. The process of using PPS in testing a company's account balances involves the following steps.
2. Industries:
Here the objective of PPS is to estimate the production of different industrial companies. Because the size of each company differs, we use size as the auxiliary variable and estimate the variable of interest, that is, total production.
3. Agriculture:
In an agricultural survey, the yield depends on the area under cultivation. So here the size is the value of the auxiliary variable X (area) and the study variable Y is the yield. For example, if we have 100 farms with known area under crop and we want to estimate the average yield per farm and the population total, we can use this technique to get a more precise result.
Getting Started:
As we know, sampling without replacement is generally more efficient than sampling with replacement, and this also holds for PPS sampling. In PPSWOR, the selection probabilities change from draw to draw and depend on the order in which the units are selected, so the estimators are classified as ordered estimators and unordered estimators.
Furthermore, to overcome the difficulty of the expectation changing with each draw, we associate a new variate with each draw such that its expectation is equal to the population value of the variate under study. Such estimators take the order of the draws into account and are called ordered estimators.
A sample can always be selected in two ways: with replacement and without replacement. Since the without-replacement technique gives us better estimates, in this blog we'll discuss only that method.
Sen – Midzuno Method
The without-replacement sample will be selected by the Sen-Midzuno method. This method consists of selecting the first unit with probability proportional to size and the remaining units by SRSWOR.
If we are required to select a sample of "n" units from a population of size "N", then according to the Sen-Midzuno method, after the first unit is selected with probability proportional to size, the remaining "n-1" units are selected from the other "N-1" units of the population by SRSWOR.
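The two-step selection described above can be sketched in Python as follows. This is only an illustrative sketch using the standard library; the function name `sen_midzuno_sample` and the size values are hypothetical.

```python
import random


def sen_midzuno_sample(sizes, n, rng=None):
    """Return indices of a sample of n units drawn by the Sen-Midzuno scheme.

    Step 1: draw the first unit with probability proportional to size.
    Step 2: draw the remaining n-1 units by SRSWOR from the other N-1 units.
    """
    rng = rng or random.Random()
    N = len(sizes)
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= N")

    # Step 1: one draw with probability proportional to the size X_i.
    first = rng.choices(range(N), weights=sizes, k=1)[0]

    # Step 2: simple random sample without replacement from the rest.
    remaining = [i for i in range(N) if i != first]
    rest = rng.sample(remaining, n - 1)

    return [first] + rest


# Hypothetical sizes for a population of N = 5 units.
sizes = [5, 10, 15, 20, 25]
print(sen_midzuno_sample(sizes, n=3, rng=random.Random(1)))
```

Passing a seeded `random.Random` makes the draw reproducible, which is handy when the sample has to be documented.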
The inclusion probabilities for individual and pairwise units for this selection procedure are given as under:
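The original expressions are not reproduced in the text, so the block below is a reconstruction from the selection procedure itself. It assumes p_i = X_i / (sum of all X) is the initial selection probability of unit i. Unit i is in the sample either because it is drawn first, or because it is not drawn first and then picked by SRSWOR.

```latex
\pi_i = p_i + (1 - p_i)\,\frac{n-1}{N-1}
      = \frac{N-n}{N-1}\,p_i + \frac{n-1}{N-1},
\qquad
\pi_{ij} = \frac{n-1}{N-1}\left[\frac{N-n}{N-2}\,(p_i + p_j) + \frac{n-2}{N-2}\right],
\quad i \neq j .
```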
Denoting the units in a sample of size "n" by "i", "j", ..., "q", the probability of selecting this particular set of "n" units as the sample is given by:
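As a sketch of that probability, again reconstructed from the procedure and assuming p_i denotes the initial selection probability of unit i: any of the n units may be the one drawn first, and the remaining n-1 units then form one of the equally likely SRSWOR samples from the other N-1 units.

```latex
P(s) = \frac{p_i + p_j + \cdots + p_q}{\binom{N-1}{n-1}}
     = \frac{1}{\binom{N-1}{n-1}} \sum_{k \in s} p_k .
```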
Ordered and unordered estimators
The important point to note here is that the probability of selection changes with the draw, that is, with the order of the selected units. Hence the estimators are divided into "ordered estimators" and "unordered estimators".
Ordered estimators
are those which are based on the order of units selected in the sample and
don’t require calculations of inclusion probabilities.
On the other hand,
unordered estimators don’t depend on the order in which the units are drawn
within the sample.
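An unordered estimator built by averaging an ordered estimator over all orderings of the same sample never has a larger variance than the ordered one, so unordered estimators are generally more efficient than ordered estimators.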
Murthy's Unordered Estimator
Murthy's unordered estimator is obtained when all possible ordered estimators for a sample are weighted by their respective probabilities.
In sampling "n" units without replacement from a finite population of "N" units, there are C(N, n) unordered samples. Each unordered sample of size "n" can be ordered in M = n! ways, so an unordered sample corresponds to "M" ordered samples.
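A quick way to see these counts is to compute them. The sketch below uses N = 8 and n = 2, which are the population and sample sizes of the orchard example later in this blog.

```python
from math import comb, factorial

N, n = 8, 2  # population size and sample size

unordered_samples = comb(N, n)       # number of unordered samples, C(N, n)
orderings_per_sample = factorial(n)  # M = n! orderings of each sample
ordered_samples = unordered_samples * orderings_per_sample

print(unordered_samples)     # 28
print(orderings_per_sample)  # 2
print(ordered_samples)       # 56
```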
We'll now consider a selection scheme in which the probability of selecting the ordered sample s_i is p_si. The probability p_s of getting the unordered sample s is the sum of the probabilities of getting the "M" ordered samples that correspond to it.
Mathematically, we can state this as under:
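Written out in the notation of the paragraph above (p_s for the unordered sample, p_si for its i-th ordering), the statement reads:

```latex
p_s = \sum_{i=1}^{M} p_{s_i}, \qquad M = n! .
```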
Let y_si be an estimator of a population parameter “θ” based on the ordered sample si. An unordered estimator of “θ” is given by
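The expression is not reproduced in the text. A sketch of the standard form, in which each ordered estimator is weighted by the conditional probability of its ordering given the unordered sample, is:

```latex
\hat{y}_M = \sum_{i=1}^{M} y_{s_i}\, p'_{s_i},
\qquad
p'_{s_i} = \frac{p_{s_i}}{p_s},
\qquad
\sum_{i=1}^{M} p'_{s_i} = 1 .
```

Here p'_si is a symbol introduced only for this sketch.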
In PPSWOR sampling, Murthy's estimator ŷ_M is an unbiased estimator of θ, and its sampling variance is given as under:
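The original expression is not reproduced in the text. As a hedged sketch, the block below gives the textbook form for the special case of the population total with n = 2 units drawn one at a time with PPS without replacement. It assumes Y_i is the value of the study variable for unit i and p_i = X_i / (sum of all X). The first line is the estimator of the total, the second its sampling variance, and the third an unbiased estimator of that variance computed from the two sampled units i and j.

```latex
\hat{Y}_M = \frac{1}{2 - p_i - p_j}
            \left[(1 - p_j)\,\frac{Y_i}{p_i} + (1 - p_i)\,\frac{Y_j}{p_j}\right],
\\[6pt]
V(\hat{Y}_M) = \frac{1}{2} \sum_{i \neq j}
               \frac{p_i\, p_j\, (1 - p_i - p_j)}{2 - p_i - p_j}
               \left(\frac{Y_i}{p_i} - \frac{Y_j}{p_j}\right)^{2},
\\[6pt]
v(\hat{Y}_M) = \frac{(1 - p_i)(1 - p_j)(1 - p_i - p_j)}{(2 - p_i - p_j)^{2}}
               \left(\frac{Y_i}{p_i} - \frac{Y_j}{p_j}\right)^{2}.
```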
Example:
We'll now understand this with the help of a practical example. Suppose we have data on 8 orchards. We need to estimate the total production of all 8 orchards from a sample selected using the Midzuno method, and we also need to calculate the standard error of this estimate.
The following data is given to us, where the number of trees is our auxiliary variable (Xi) and the yield is our study variable (Yi).
We have stored this data in Excel, and now we import the dataset and do a basic analysis of it.
Getting the number of rows in our dataset, that is, the total population size.
Loading the required package to run some functions.
Drawing a sample of size 2 using the Midzuno method of PPSWOR sampling.
A sample of 2 units is selected. The second and sixth orchards are in the sample selected using the Midzuno method.
We'll now store the yield and number of trees along with their respective probabilities as under:
Using Murthy unordered estimator for total production.
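As an illustration of this step, here is a minimal Python sketch of Murthy's estimator of the population total for a sample of two units. It assumes the textbook n = 2 form for units drawn one at a time with PPS without replacement; the conditional probabilities in Murthy's general formula depend on how the sample was drawn, so treat this as an assumption rather than the exact computation used here. All numbers are hypothetical placeholders, not the orchard data, and the function name `murthy_total_n2` is made up for the sketch.

```python
def murthy_total_n2(y1, y2, p1, p2):
    """Murthy's unordered estimate of the population total for n = 2.

    y1, y2: study-variable values of the two sampled units.
    p1, p2: their initial selection probabilities, X_i / sum(X).
    """
    weighted = (1 - p2) * y1 / p1 + (1 - p1) * y2 / p2
    return weighted / (2 - p1 - p2)


# Hypothetical placeholder values (NOT the blog's orchard data).
total_trees = 400          # assumed total of the auxiliary variable X
x1, x2 = 50, 30            # assumed number of trees in the sampled units
y1, y2 = 60.0, 35.0        # assumed yields of the sampled units

p1, p2 = x1 / total_trees, x2 / total_trees
estimate = murthy_total_n2(y1, y2, p1, p2)
print("Estimated total production:", estimate)
```

A useful sanity check: if the yield were exactly proportional to the number of trees, the estimate would equal the true total whichever two units were drawn.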
The estimate of the variance of the estimator is calculated as under along with standard error.
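A matching sketch for this step is shown below. It again assumes the textbook n = 2 variance estimator for Murthy's estimator under draw-by-draw PPS without replacement, and the input values are hypothetical placeholders rather than the orchard data. The function names are made up for the sketch.

```python
import math


def murthy_variance_n2(y1, y2, p1, p2):
    """Unbiased variance estimate of Murthy's total estimator for n = 2."""
    factor = (1 - p1) * (1 - p2) * (1 - p1 - p2) / (2 - p1 - p2) ** 2
    return factor * (y1 / p1 - y2 / p2) ** 2


def murthy_standard_error_n2(y1, y2, p1, p2):
    """Standard error: square root of the estimated variance."""
    return math.sqrt(murthy_variance_n2(y1, y2, p1, p2))


# Hypothetical placeholder values (NOT the blog's orchard data).
y1, y2 = 30.0, 30.0   # assumed yields of the two sampled units
p1, p2 = 0.2, 0.3     # assumed initial selection probabilities

print("Estimated variance:", murthy_variance_n2(y1, y2, p1, p2))
print("Standard error:", murthy_standard_error_n2(y1, y2, p1, p2))
```

The variance estimate shrinks towards zero as the ratios Y_i / p_i of the two sampled units get closer, which is the reason PPS pays off when yield is roughly proportional to size.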