HORVITZ-THOMPSON ESTIMATOR - An Unordered Estimator

ANGEL LAL-2148121

1MSTAT - Christ (Deemed to be University)

INTRODUCTION

The Horvitz-Thompson (H-T) estimator is an unbiased estimator of the population total. It is categorized as an unordered estimator because it does not depend on the order in which the units are drawn within the sample. The estimator is named after Daniel G. Horvitz and Donovan J. Thompson, who derived it in 1952. It is mostly applied in survey analysis, where it is used to account for missing data and for many sources of unequal selection probabilities. Unordered estimators of this kind are generally more efficient than the corresponding ordered estimators.

METHODOLOGY

Suppose that the initial probability of selection of the unit Ui is pi, where pi = xi/X for i = 1,2,...,N, xi is the size measure of the unit and X is its population total. Let 𝜋i denote the probability that the unit Ui is included in the sample.

Further, let 𝜋ij denote the probability that both the units Ui and Uj are included in the sample. Both probabilities are determined by the sampling scheme.
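As an illustration, consider the special case of a sample of size n = 2 drawn by two successive probability-proportional-to-size draws without replacement. The choice of n = 2 and of this particular scheme is an assumption made here for concreteness; other schemes give other expressions. Under that assumption the inclusion probabilities work out to

$$\pi_i = p_i \left[ 1 + \sum_{j \neq i} \frac{p_j}{1 - p_j} \right], \qquad \pi_{ij} = p_i p_j \left[ \frac{1}{1 - p_i} + \frac{1}{1 - p_j} \right]$$

For any design with a fixed sample size n, two identities are useful as checks on such expressions:

$$\sum_{i=1}^{N} \pi_i = n, \qquad \sum_{j \neq i} \pi_{ij} = (n - 1)\,\pi_i$$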

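Let yi be the value of the i-th unit and 𝜋i its probability of inclusion in the sample. The H-T estimator of the population total is then defined as

$$\hat{Y}_{HT} = \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$

And the H-T estimate of the mean is given by

$$\hat{\mu}_{HT} = \frac{1}{N} \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$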
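The two estimates amount to a weighted sum in which each sampled value is divided by its inclusion probability. A minimal base-R sketch follows; the sample values, the inclusion probabilities and the population size are hypothetical numbers chosen only for illustration.

```r
# Hypothetical sample of n = 3 units from a population of N = 8 units
y_s  <- c(20, 35, 50)        # observed values y_i (illustrative)
pi_s <- c(0.25, 0.40, 0.50)  # first-order inclusion probabilities (illustrative)
N    <- 8                    # population size (illustrative)

ht_total <- sum(y_s / pi_s)  # H-T estimate of the population total
ht_mean  <- ht_total / N     # H-T estimate of the population mean

cat("H-T total:", ht_total, "\n")  # 267.5
cat("H-T mean: ", ht_mean, "\n")   # 33.4375
```

Units with a small inclusion probability receive a large weight 1/𝜋i, which is how the estimator compensates for unequal selection probabilities.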

In a Bayesian probabilistic framework, 𝜋i is considered the proportion of individuals in a target population belonging to the i-th stratum. So yi/𝜋i can be considered an estimate of the complete sample of persons within the i-th stratum.

In post-stratified study designs, 𝜋 and 𝜇 are estimated in separate steps, so computing the variance of 𝛍HT is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the H-T estimator.

H-T ESTIMATOR AS AN UNBIASED ESTIMATOR OF POPULATION TOTAL IN PPSWOR

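The H-T estimator can be shown to be an unbiased estimator of the population total Y. Let a_i be an indicator taking the value 1 if the i-th unit is in the sample and 0 otherwise, so that E(a_i) = 𝜋_i. Consider the expectation of the Horvitz-Thompson estimator,

$$E(\hat{Y}_{HT}) = E\left( \sum_{i=1}^{N} a_i \frac{y_i}{\pi_i} \right) = \sum_{i=1}^{N} \frac{y_i}{\pi_i} E(a_i) = \sum_{i=1}^{N} y_i = Y$$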
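Unbiasedness can also be checked numerically by listing every possible sample. The sketch below assumes a hypothetical population of N = 4 units and a sample of size n = 2 drawn by two successive PPS draws without replacement; the data and the choice of n are illustrative assumptions, not values from this post.

```r
# Hypothetical population of N = 4 units (values chosen only for illustration)
x <- c(10, 20, 30, 40)   # size measure
y <- c(12, 25, 31, 48)   # study variable
p <- x / sum(x)          # initial selection probabilities p_i = x_i / X
N <- length(x)

# All unordered samples {i, j} of size n = 2
pairs <- t(combn(N, 2))
i <- pairs[, 1]
j <- pairs[, 2]

# Probability of each sample under two successive PPS draws without replacement
P_s <- p[i] * p[j] * (1 / (1 - p[i]) + 1 / (1 - p[j]))

# First-order inclusion probability: total probability of the samples containing unit k
pi_k <- sapply(seq_len(N), function(k) sum(P_s[i == k | j == k]))

# H-T estimate of the total for every possible sample, and its expectation
ht <- y[i] / pi_k[i] + y[j] / pi_k[j]
expected_ht <- sum(P_s * ht)

cat("E(HT) =", expected_ht, " true total =", sum(y), "\n")
```

The expectation over all six samples agrees with the true total up to floating-point error, although individual samples give estimates above or below it.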

REDUCTION OF H-T ESTIMATOR TO THE ESTIMATOR OF POPULATION TOTAL UNDER SRSWOR

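We know that in SRSWOR the probability that the i-th unit is included in the sample is n/N. So,

$$\hat{Y}_{HT} = \sum_{i=1}^{n} \frac{y_i}{n/N} = \frac{N}{n} \sum_{i=1}^{n} y_i = N\bar{y}$$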
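The reduction is easy to confirm in R. In this sketch the population values and the seed are hypothetical; with every inclusion probability equal to n/N, the H-T total coincides with N times the sample mean.

```r
# Hypothetical population (illustrative values only)
y <- c(12, 25, 31, 48, 9, 17)
N <- length(y)
n <- 3

set.seed(1)
s <- sample(N, n)             # simple random sample without replacement

pi_s      <- rep(n / N, n)    # SRSWOR inclusion probabilities
ht_total  <- sum(y[s] / pi_s) # H-T estimate of the total
srs_total <- N * mean(y[s])   # usual SRSWOR estimate of the total

cat("H-T:", ht_total, " N * ybar:", srs_total, "\n")
```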

VARIANCE OF THE HORVITZ-THOMPSON ESTIMATOR

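A general form of linear estimator can be written as

$$\hat{Y} = \sum_{i=1}^{N} a_i c_i y_i$$

where a_i is a random variate taking value 1 if the i-th unit is drawn and 0 otherwise; c_i are the constants attached to the units U_i (i=1,2,...,N).

a_i follows a binomial distribution for a sample of size 1 with probability 𝜋_i. So E(a_i) = 𝜋_i and V(a_i) = 𝜋_i(1-𝜋_i). Since a_i a_j is 1 only if both units are distinct and appear in the sample,

$$E(a_i a_j) = \pi_{ij}, \qquad \mathrm{Cov}(a_i, a_j) = \pi_{ij} - \pi_i \pi_j \quad (i \neq j)$$

Now,

$$E(\hat{Y}) = \sum_{i=1}^{N} c_i y_i E(a_i) = \sum_{i=1}^{N} c_i \pi_i y_i = \sum_{i=1}^{N} y_i$$

if it is unbiased.

Therefore,

$$c_i = \frac{1}{\pi_i}$$

So,

$$\hat{Y}_{HT} = \sum_{i=1}^{N} \frac{a_i y_i}{\pi_i} = \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$

is an unbiased estimator. Now the sampling variance is given by,

$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N} \frac{1 - \pi_i}{\pi_i} y_i^2 + \sum_{i=1}^{N} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} y_i y_j$$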

The variance of the estimator depends on the quantities 𝜋ij and 𝜋i which are calculated from the sampling procedure.

UNBIASED SAMPLE ESTIMATOR OF VARIANCE OF H-T ESTIMATOR

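An unbiased sample estimator of the variance of the H-T estimator is given by

$$v(\hat{Y}_{HT}) = \sum_{i=1}^{n} \frac{1 - \pi_i}{\pi_i^2} y_i^2 + \sum_{i=1}^{n} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j \pi_{ij}} y_i y_j$$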

provided that none of the 𝜋ij in the population is 0.

One drawback of this variance estimator is that it does not necessarily reduce to 0 even when all the ratios y_i/𝜋_i are equal, a case in which a fixed-size design gives the same estimate for every sample. Another is that it may take negative values for some samples.
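Both the sampling variance and the unbiasedness of its sample estimator can be checked by enumeration on a small example. The sketch below assumes a hypothetical population of N = 4 units and n = 2 successive PPS draws without replacement; the data are illustrative, and the matrix names (PI, pik) are chosen here for convenience.

```r
# Hypothetical population (illustrative values only)
x <- c(10, 20, 30, 40)   # size measure
y <- c(12, 25, 31, 48)   # study variable
p <- x / sum(x)
N <- length(x)

# Second-order inclusion probabilities for PPS without replacement, n = 2
PI <- outer(p, p) * outer(1 / (1 - p), 1 / (1 - p), "+")
diag(PI) <- 0
pik <- rowSums(PI)       # for n = 2, pi_i is the sum of pi_ij over j != i
diag(PI) <- pik          # convention pi_ii = pi_i

z <- y / pik             # expanded values y_i / pi_i

# Sampling variance from the formula (diagonal gives the (1 - pi_i) / pi_i * y_i^2 terms)
V_formula <- sum((PI - outer(pik, pik)) * outer(z, z))

# Brute force: variance of the H-T estimate over all possible samples
pairs <- t(combn(N, 2))
i <- pairs[, 1]
j <- pairs[, 2]
P_s <- PI[pairs]         # probability of each sample {i, j}
ht  <- z[i] + z[j]
V_brute <- sum(P_s * (ht - sum(y))^2)

# H-T variance estimate for each sample, and its expectation
v_ht <- (1 - pik[i]) * z[i]^2 + (1 - pik[j]) * z[j]^2 +
  2 * (1 - pik[i] * pik[j] / P_s) * z[i] * z[j]
E_v <- sum(P_s * v_ht)

cat("V (formula):", V_formula, " V (enumeration):", V_brute, " E(v):", E_v, "\n")
cat("Samples with a negative variance estimate:", sum(v_ht < 0), "\n")
```

The three variance figures agree up to floating-point error. The last line simply counts how many samples give a negative estimate for this particular population; the count depends on the data.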

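For n=2, the H-T estimator of the total is,

$$\hat{Y}_{HT} = \frac{y_1}{\pi_1} + \frac{y_2}{\pi_2}$$

and its estimated variance is

$$v(\hat{Y}_{HT}) = \frac{1 - \pi_1}{\pi_1^2} y_1^2 + \frac{1 - \pi_2}{\pi_2^2} y_2^2 + \frac{2(\pi_{12} - \pi_1 \pi_2)}{\pi_1 \pi_2 \pi_{12}} y_1 y_2$$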

HORVITZ-THOMPSON ESTIMATOR USING R PROGRAMMING

Let us discuss a scenario and see how to use the H-T estimator in R to estimate the population total and its variance.

In a village, there are 8 orchards with the following number of trees and corresponding yields.

Let us find out the total production of the 8 orchards and the variance using H-T estimator, by selecting a sample of 3.

Here we first select a sample of size 3 from the population using the Midzuno-Sen sampling procedure and then compute the H-T estimate of the population total and its variance. The H-T estimator is available in the samplingbook package in R, so first install the package and then load it with the library() function.

The syntax of the H-T estimator code in R is

htestimate(y, N, PI, pk, pik, method = 'ht')

with the arguments,

y

vector of observations

N

integer for population size

PI

square matrix of second order inclusion probabilities with n rows and cols. It is necessary to be specified for variance estimation by methods 'ht' and 'yg'. Here for H-T estimator, 'ht' is used.

pk

vector of first order inclusion probabilities of length n for the sample elements. It is necessary to be specified for variance estimation by methods 'hh' and 'ha'.

pik

an optional vector of first order inclusion probabilities of length N for the population elements. It can be used for variance estimation by method 'ha'.

method

method to be used for variance estimation. Options are 'yg' (Yates and Grundy) and 'ht' (Horvitz-Thompson), approximate options are 'hh' (Hansen-Hurwitz) and 'ha' (Hajek).
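To make the quantities behind these arguments concrete, here is a base-R sketch that does not call samplingbook at all. It builds the first and second order inclusion probabilities for the Midzuno-Sen scheme (first unit drawn with probability proportional to size, the remaining n-1 by simple random sampling without replacement) and then computes the H-T estimate and its variance estimate by hand. The tree counts, yields and seed are hypothetical and are not the orchard data of this post; the objects PI, pk and pik are named only to mirror the roles of the arguments described above.

```r
# Hypothetical data for 8 orchards (NOT the data from this post)
trees <- c(50, 30, 25, 40, 26, 44, 20, 35)  # size measure
yield <- c(60, 35, 30, 44, 30, 50, 22, 40)  # study variable
N <- length(trees)
n <- 3
p <- trees / sum(trees)                     # initial selection probabilities

# Midzuno-Sen inclusion probabilities for the whole population
pik <- (N - n) / (N - 1) * p + (n - 1) / (N - 1)
PI_pop <- (n - 1) / (N - 1) *
  ((N - n) / (N - 2) * outer(p, p, "+") + (n - 2) / (N - 2))
diag(PI_pop) <- pik                         # convention pi_ii = pi_i

# Draw the sample: first unit by PPS, the other n - 1 units by SRSWOR
set.seed(2148121)
first <- sample(N, 1, prob = p)
rest  <- sample(setdiff(seq_len(N), first), n - 1)
s     <- c(first, rest)

# Sample-level quantities, playing the roles of 'pk' and 'PI'
pk <- pik[s]
PI <- PI_pop[s, s]

# H-T estimates of the total and the mean
z        <- yield[s] / pk
ht_total <- sum(z)
ht_mean  <- ht_total / N

# H-T variance estimate of the total; it can be negative for some samples
v_total <- sum((1 - outer(pk, pk) / PI) * outer(z, z))
se_mean <- if (v_total >= 0) sqrt(v_total) / N else NA_real_

cat("sample:", s, "\n")
cat("H-T total:", ht_total, " H-T mean:", ht_mean, " SE(mean):", se_mean, "\n")
```

Writing the estimator out this way shows why the 'ht' method needs the full n by n matrix of second order inclusion probabilities, while the first order probabilities alone are enough for the point estimate.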

Now let us see the R code for the above mentioned problem.

INTERPRETATION OF THE R CODE

Here htestimate() is applied to the sample chosen from the population to obtain the estimate of the mean and its standard error. The mean production of the orchards estimated by the H-T estimator from the sample is 38.875, which is the same as the mean computed from the entire population. The standard error comes out to be 3.269, which indicates how much the estimated mean would typically vary from one sample to another. The estimate of the population total is obtained from the mean estimate: the total production of the 8 orchards using the H-T estimator is 8 × 38.875 = 311.

CONCLUSION

The Horvitz-Thompson estimator is an unordered, unbiased estimator of the population total. The R command for computing it is htestimate() from the samplingbook package. The estimator does not depend on the order in which the sample units are drawn from the population. It can also be used for any sampling design in which the sample contains only distinct units. It is observed that the H-T estimator under the Midzuno-Sen sampling scheme is more efficient than the estimator under PPS sampling with replacement.
