PROBABILITY PROPORTIONAL TO SIZE WITH REPLACEMENT

NAME: SIVAPRIYA

ROLL NO: 2148147

PPS SAMPLING:

DEFINITION:

Probability proportional to size (PPS) sampling is a method of sampling from a finite population in which a size measure is available for each population unit before sampling, and the probability of selecting a unit is proportional to its size.

EXAMPLE:

In a survey of healthcare in different regions, the number of sample units allotted to each area would depend on the population of that area. For example, a village of only 2,500 residents requires fewer healthcare facilities than a city of 250,000. Similarly, if you survey the employees of a company by department, the number of positions in each department is part of the sampling calculation. Every person is still part of the sampling; the population is simply broken down into separate groups as part of the analysis.

TYPES OF PPS SAMPLING:

There are two types of PPS sampling:

1) Probability proportional to size with replacement

2) Probability proportional to size without replacement

We are going to discuss the procedure for PPS sampling with replacement.

PPS SAMPLING WITH REPLACEMENT (WR):
DEFINITION:

In PPS sampling with replacement, each selected unit is returned to the population before the next draw, so the probability of selecting a specified unit does not change: it is the same at every draw, and there is no redistribution of the probabilities after a draw.

There are two methods to draw a sample with PPSWR:

1. CUMULATIVE TOTAL METHOD:

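In simple random sampling, a sample of size n is selected by:

- associating the natural numbers 1 to N with the N units in the population, and

- then selecting the n units whose serial numbers correspond to a set of n random numbers, each less than or equal to N, drawn from a random number table.

In sampling with varying probabilities, each unit is instead associated with a set of consecutive natural numbers, the size of the set being proportional to the unit's size. If X1, X2, …, XN are positive integers proportional to the sizes, we write down the cumulative totals T1 = X1, T2 = X1 + X2, …, TN = X1 + X2 + … + XN. A random number R between 1 and TN is drawn, and the ith unit is selected if T(i-1) < R <= Ti, with T0 = 0. Repeating this n times, independently, gives a PPSWR sample of size n.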
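The cumulative total method can be sketched in R as below. This is a minimal illustration under stated assumptions, not code from any package: the function name `cumulative_total_ppswr` is hypothetical, and the sizes are assumed to be positive integers. `findInterval()` with `left.open = TRUE` returns, for each random number R, the unit i with T(i-1) < R <= Ti.

```r
# Cumulative total method for PPSWR (illustrative sketch, hypothetical helper).
# sizes: positive integer size measures X_1, ..., X_N
# n:     number of draws; returns the serial numbers of the selected units
cumulative_total_ppswr <- function(sizes, n) {
  cum_tot <- cumsum(sizes)            # T_1, T_2, ..., T_N
  total <- cum_tot[length(cum_tot)]   # T_N, the grand total of the sizes
  # One random number R in 1..T_N per draw, drawn independently (with replacement)
  r <- sample.int(total, n, replace = TRUE)
  # Unit i is selected when T_(i-1) < R <= T_i, with T_0 = 0
  findInterval(r, c(0, cum_tot), left.open = TRUE)
}
```

For example, with sizes 3, 5 and 2 the cumulative totals are 3, 8 and 10, so random numbers 1 to 3 select unit 1, 4 to 8 select unit 2, and 9 to 10 select unit 3; `cumulative_total_ppswr(c(3, 5, 2), 4)` returns four such unit numbers, possibly with repeats.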

DISADVANTAGES:

This procedure involves writing down the successive cumulative totals, which is time consuming and tedious when the number of units in the population is large. This problem can be overcome by Lahiri's method.

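2. LAHIRI'S METHOD:

Let M = max Xi, i.e., the maximum of the sizes of the N units in the population, or some convenient number greater than this maximum.

The sampling procedure has the following steps:

1. Select a pair of random numbers (i, j) such that 1 <= i <= N and 1 <= j <= M.
2. If j <= Xi, the ith unit is selected; otherwise the pair is rejected and another pair of random numbers is chosen.
3. To get a sample of size n, this procedure is repeated until n units are selected.

At each attempt, unit i is drawn and accepted with probability (1/N)(Xi/M), which is proportional to Xi, so the selected units are drawn with probability proportional to size.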
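A sketch of Lahiri's method in R follows, assuming integer sizes and an integer M. The function `lahiri_ppswr` is a hypothetical helper written for illustration; it also counts how many pairs (i, j) were drawn, so the rejected pairs are visible. It only looks up `sizes[i]` for the serial numbers i that are actually drawn.

```r
# Lahiri's method for PPSWR (illustrative sketch, hypothetical helper).
# sizes: integer size measures X_1, ..., X_N
# n:     number of units to select (with replacement)
# M:     max(sizes) or any convenient integer >= max(sizes)
lahiri_ppswr <- function(sizes, n, M = max(sizes)) {
  stopifnot(M >= max(sizes))
  N <- length(sizes)
  selected <- integer(n)
  k <- 0            # units accepted so far
  pairs_drawn <- 0  # pairs (i, j) drawn, including rejected ones
  while (k < n) {
    i <- sample.int(N, 1)   # 1 <= i <= N
    j <- sample.int(M, 1)   # 1 <= j <= M
    pairs_drawn <- pairs_drawn + 1
    if (j <= sizes[i]) {    # only the size of the drawn unit i is needed
      k <- k + 1
      selected[k] <- i      # unit i is accepted and stays available for later draws
    }                       # otherwise the pair is rejected and a new pair is drawn
  }
  list(units = selected, pairs_drawn = pairs_drawn)
}
```

Calling `lahiri_ppswr(c(3, 5, 2), 8)` returns eight unit numbers selected with probabilities 0.3, 0.5 and 0.2, together with the total number of pairs that had to be drawn.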



ADVANTAGES:

1. It does not require writing down all cumulative totals for each unit.
2. The sizes of all the units need not be known beforehand. We need only a number M at least as large as the maximum size, and the sizes of those units whose serial numbers i (from 1 to N) are drawn under this scheme.

DISADVANTAGES:
It wastes time and effort when pairs are rejected: a draw is ineffective whenever the selected pair (i, j) has j > Xi, and another pair must then be drawn.
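The cost of rejections can be quantified. At each attempt the pair (i, j) is accepted with probability equal to the sum over units of (1/N)(Xi/M), that is X/(NM) with X = X1 + … + XN, so on average NM/X pairs are needed per selected unit. The two helper functions below are hypothetical, written only to illustrate this arithmetic.

```r
# Probability that one pair (i, j) is accepted in Lahiri's method:
# sum over units of (1/N) * (X_i / M) = X / (N * M)
lahiri_acceptance <- function(sizes, M = max(sizes)) {
  sum(sizes) / (length(sizes) * M)
}

# Expected number of pairs needed to select n units
# (each selected unit waits a geometric number of attempts)
expected_pairs <- function(sizes, n, M = max(sizes)) {
  n / lahiri_acceptance(sizes, M)
}

sizes <- c(3, 5, 2)
lahiri_acceptance(sizes)          # 0.6666667
expected_pairs(sizes, 8)          # 12
expected_pairs(sizes, 8, M = 10)  # 24: a larger M wastes more draws
```

This is why M is usually taken as the maximum size itself, or as a number only slightly larger than it.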


NOTATIONS IN PPS SAMPLING WITH REPLACEMENT:

Yi - value of the study variable for the ith unit of the population, i = 1, 2, …, N; Y = Y1 + Y2 + … + YN is the population total.

Xi - known value of an auxiliary variable (size) for the ith unit of the population; X = X1 + X2 + … + XN.

Pi - probability of selecting the ith unit of the population at any given draw; it is proportional to size, Pi = Xi / X.

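In PPSWR, an unbiased estimator of the population total Y is the Hansen-Hurwitz estimator

$$\hat{Y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i},$$

where yi and pi are the value of the study variable and the selection probability of the unit selected at the ith draw.

The variance is given by

$$V(\hat{Y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y\right)^2,$$

and an unbiased estimator of this variance is

$$\hat{V}(\hat{Y}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{Y}_{pps}\right)^2.$$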
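The estimator and both variance formulas translate directly into R. The functions `hh_estimate` and `hh_true_variance` are hypothetical helpers written for illustration (they are not from the samplingbook or pps packages); `y` and `p` hold one entry per draw, so a unit drawn twice appears twice.

```r
# Hansen-Hurwitz estimator for PPSWR (illustrative sketch, hypothetical helpers).
# y: study-variable values of the n draws (a unit drawn twice appears twice)
# p: selection probabilities P_i = X_i / X of the same draws; needs n >= 2
hh_estimate <- function(y, p) {
  n <- length(y)
  z <- y / p                                    # y_i / p_i for each draw
  total <- mean(z)                              # (1/n) * sum(y_i / p_i)
  v_hat <- sum((z - total)^2) / (n * (n - 1))   # unbiased variance estimate
  c(total = total, var = v_hat, se = sqrt(v_hat))
}

# Exact variance of the estimator, computable only when the whole population is known:
# V = (1/n) * sum over all N units of P_i * (Y_i / P_i - Y)^2
hh_true_variance <- function(Y, X, n) {
  P <- X / sum(X)
  sum(P * (Y / P - sum(Y))^2) / n
}
```

For a toy population with Y = (10, 30, 8) and sizes X = (3, 5, 2), so that P = (0.3, 0.5, 0.2), drawing units 2 and 3 gives y/p = 60 and 40, and `hh_estimate(c(30, 8), c(0.5, 0.2))` returns a total of 50, an estimated variance of 100 and a standard error of 10.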


APPLICATION:

BUSINESS:
Applications of PPSWR sampling in business surveys can be split into two main categories: (a) sampling from area frames with a multi-stage design and (b) sampling ultimate units directly from a list frame. The classical situation for (a) is a multi-stage sample in which primary sampling units are drawn with probability proportional to some measure of the unit's size (PPS). Typically, because of extensive stratification, just a few units are selected in each stratum.

INDUSTRIES:
To estimate the total production of different industrial companies, we can use the PPS sampling technique, because the companies differ in size. Using company size as the auxiliary variable, we can estimate the variable of interest, the total production.

AGRICULTURE:
In an agricultural survey, the yield depends on the area under cultivation, so the size (area) is the auxiliary variable X and the study variable Y is the yield. For example, if we have 100 farms with known areas under the crop and want to estimate the average yield per farm and the population total, we can use this technique; it gives more precise estimates than equal-probability sampling when yield is roughly proportional to area.

R CODING:

AIM:

To draw a sample of size 8 using the PPSWR sampling technique, with the nonreal estate farm loans as the size measure, to estimate the total amount of real estate farm loans, and to estimate the relative efficiency of the PPSWR estimator with respect to the ratio estimator of the population total.
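For reference, the comparison in this aim rests on two standard textbook formulas. The expression for the ratio estimator assumes it is based on a simple random sample of the same size n drawn without replacement, with f = n/N, R = Y/X, and Sy, Sx and rho the population standard deviations and correlation of y and x; it is a first-order approximation, not an exact variance.

```latex
\hat{Y}_R = \frac{\bar{y}}{\bar{x}}\,X, \qquad
MSE(\hat{Y}_R) \approx \frac{N^2(1-f)}{n}\left(S_y^2 + R^2 S_x^2 - 2R\rho S_x S_y\right)

\text{Gain (\%)} = \frac{MSE(\hat{Y}_R) - V(\hat{Y}_{pps})}{V(\hat{Y}_{pps})} \times 100
```

A positive gain means that the PPSWR estimator has the smaller variance.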

DESCRIPTION OF THE DATA:

The data give the amounts of real estate and nonreal estate farm loans in different states of the US during 2007.

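library(readxl)
# Read the farm loan data from the Excel file
US <- read_excel("Estate in US.xlsx")
View(US)

# samplingbook attaches the pps package, which provides ppswr()
library(samplingbook)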


# ppswr sampling to select a sample of size n = 8,
# with nonreal estate farm loans as the size measure
set.seed(24)
sample <- ppswr(US$`Nonreal estate farm loans`, 8)
sample

## [1] 15 13 34 24 27 43 15 36

ppswr<-US[sample, ]
ppswr

## # A tibble: 8 x 4
##   S.No. `State and Territory` `Nonreal estate farm loans` `Real estate farm loa~
##   <dbl> <chr>                                       <dbl>                  <dbl>
## 1    15 IA                                          3910.                  2327.
## 2    13 IL                                          2611.                  2131.
## 3    34 ND                                          1241.                   449.
## 4    24 MS                                           550.                   627.
## 5    27 NE                                          3585.                  1338.
## 6    43 TX                                          3520.                  1249.
## 7    15 IA                                          3910.                  2327.
## 8    36 OK                                          1716.                   612.

# Size measure x: nonreal estate farm loans; study variable y: real estate farm loans
X <- sum(US$`Nonreal estate farm loans`)   # population total of the size measure
n <- 8
N <- 50

# Values and selection probabilities p_i = x_i / X for the 8 draws
# (IA, drawn twice, contributes once per draw)
y_s <- ppswr$`Real estate farm loans`
p_s <- ppswr$`Nonreal estate farm loans` / X

# Estimate the population total (Hansen-Hurwitz): (1/n) * sum(y_i / p_i)
y_hat <- mean(y_s / p_s)
y_hat

# Estimate the population mean
avg_y <- y_hat / N
avg_y

# Estimate the variance of the estimated population total
vt <- sum((y_s / p_s - y_hat)^2) / (n * (n - 1))
vt

# Standard error of the estimated total
SE <- sqrt(vt)
SE

# Estimate the variance of the estimated population mean
Vms <- vt / N^2
Vms

# Standard error of the estimated mean
se <- sqrt(Vms)
se

# Gain in efficiency of ppswr over ratio estimate of population total
Sy2 <- var(US$`Real estate farm loans`)      # S_y^2 of the study variable
Sy2

## [1] 342021.5

Sx2 <- var(US$`Nonreal estate farm loans`)   # S_x^2 of the size (auxiliary) variable
Sx2

## [1] 1176526

rh <- sum(US$`Real estate farm loans`) / sum(US$`Nonreal estate farm loans`)
rh

## [1] 0.6324964

Cr <- cor(US$`Real estate farm loans`, US$`Nonreal estate farm loans`)
Cr

## [1] 0.8038341

# Approximate variance of the ratio estimator of the total (SRSWOR of size n)
V_rat <- N^2 * ((N - n) / (N * n)) * (Sy2 + rh^2 * Sx2 - 2 * rh * Cr * sqrt(Sy2 * Sx2))
V_rat

# Percentage gain in efficiency of ppswr over the ratio estimator
gain <- ((V_rat - vt) / vt) * 100
gain

INTERPRETATION:

We have drawn a sample of size 8 from the population of size 50 using the PPSWR sampling technique. The selected units are 15, 13, 34, 24, 27, 43, 15 and 36; unit 15 (IA) appears twice, which can happen because the units are drawn with replacement.

From this sample we estimated the population total of real estate farm loans with the Hansen-Hurwitz estimator, and the population mean as the estimated total divided by N = 50, together with the estimated variance and standard error of each. The standard error of the estimated mean is the standard error of the estimated total divided by 50.

The gain in efficiency compares the approximate variance of the ratio estimator with the estimated variance of the PPSWR estimator: a positive gain implies that PPSWR is more efficient than the ratio estimator for these data, and a negative gain implies that the ratio estimator is more efficient.

CONCLUSION:

Thus, the PPSWR estimator gives an unbiased estimate of the population total from a sample drawn with probability proportional to size, and its precision improves as the sample size increases, since its variance is inversely proportional to n.

