Relative Efficiency of Cluster Sampling
MST171 SAMPLE SURVEY DESIGN
NAME: KEERTHANA A (2148135)
INTRODUCTION TO CLUSTER SAMPLING
A population is defined as a collection of a finite number of distinct and
identifiable units known as sampling units. The element, or elementary unit, is
the smallest identifiable unit in the population. A cluster is a collection of
such elements. Clusters are usually made up of elements that share many
characteristics. Cluster sampling occurs when these clusters are treated as
sampling units and a small number of them are chosen with equal or unequal
probabilities. Every element in the selected clusters is then observed,
measured, or interviewed. Ideally, each cluster should contain a small number
of elements and the population should contain a large number of clusters. For
example, if we are interested in estimating the average monthly income of a
colony, the colony can be divided into N blocks, called clusters, and a simple
random sample of n blocks can be taken. The individuals living in the chosen
blocks are then interviewed to gather the data.
EXAMPLE: Consider a case of cluster sampling in which a number of people in a city are to be interviewed. Suppose the telephone directories are used to select the sample and it is decided to interview people by telephone. Since all the residents can be numbered, a random sampling technique could be used to choose the sample houses. We could also form strata of houses for high, middle, and low income groups. However, if we choose houses at random throughout the city, the cost of visiting widely scattered dwellings will certainly be prohibitive.
An alternative is to group blocks or areas into clusters of approximately equal population. A number of these clusters can then be chosen at random, and within each chosen cluster all households are interviewed. Compared with a random choice of households throughout the city, the cost per element (a household) is certainly going to be lower under cluster sampling, because of the lower listing cost (only the houses on the selected blocks need to be listed) and the lower location cost. It is also easier for an interviewer to talk to several people on one block than to several people scattered throughout the city.
A necessary condition for the validity of this procedure is that every unit of the population under study must belong to one and only one cluster, so that the sampling units in the frame cover all the units of the population without omission or duplication. When this condition is not satisfied, bias is introduced.
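The selection step described in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_cluster_sample`, the block and household labels, and the frame of 6 blocks with 3 households each are all hypothetical and do not come from the example above.

```python
import random

def select_cluster_sample(clusters, n, seed=None):
    """Draw n whole clusters by simple random sampling without replacement
    and return every element of each chosen cluster."""
    rng = random.Random(seed)
    # Sort the cluster labels so the draw is reproducible for a given seed.
    chosen_ids = rng.sample(sorted(clusters), n)
    # Every element of a chosen cluster enters the sample.
    return {cid: list(clusters[cid]) for cid in chosen_ids}

# Hypothetical frame: 6 blocks (clusters) of 3 households each.
frame = {
    f"block_{i}": [f"household_{i}_{j}" for j in range(1, 4)]
    for i in range(1, 7)
}

sample = select_cluster_sample(frame, n=2, seed=1)
for block, households in sample.items():
    print(block, households)
```

Because each household belongs to exactly one block, the frame satisfies the "one and only one cluster" condition stated above.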
Construction of clusters
The clusters are constructed so that the sampling units are heterogeneous
within clusters and homogeneous between clusters. Clusters can be constructed
in two ways: with equal sizes or with unequal sizes.
In the case of equal clusters, the population is divided into N clusters, each of size M. We select a sample of n clusters from the N clusters using simple random sampling without replacement (SRSWOR). Then the total population size = NM and the total sample size = nM.
Let yij : Value of the characteristic under study for the jth element (j = 1,2,…,M) in the ith cluster (i = 1,2,…,N).
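With this notation, the estimator for equal-sized clusters is usually taken to be the mean of the sampled cluster means. The block below is a sketch of the standard textbook forms, assuming SRSWOR of n clusters; the symbols \bar{y}_i (mean of the ith cluster), \bar{y}_c (cluster-sample estimator), \bar{Y} (population mean per element) and S_b^2 (mean square between cluster means) are introduced here for illustration and are not defined in the text above.

```latex
\bar{y}_{i} = \frac{1}{M}\sum_{j=1}^{M} y_{ij}, \qquad
\bar{y}_{c} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_{i}, \qquad
\bar{Y} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}

V(\bar{y}_{c}) = \frac{N-n}{Nn}\, S_b^{2}, \qquad
S_b^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{y}_{i}-\bar{Y}\right)^{2}
```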
Relative Efficiency
We note that the estimator for equal-sized clusters is based on a sample of
nM units in the form of n clusters, each consisting of M units. If the same
number of units, nM, were instead selected from the population of NM units by
simple random sampling without replacement, the sample mean estimator ȳ would
be the average of the nM sampled values, and its variance V(ȳ) would follow
from the usual formula for that design.
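For reference, the standard forms of these two relations for a without-replacement simple random sample of nM elements from NM can be written as follows. The index k over sampled elements and the symbols \bar{Y} (population mean per element) and S^2 (population mean square among all NM elements) are notation assumed here.

```latex
\bar{y} = \frac{1}{nM}\sum_{k=1}^{nM} y_{k}, \qquad
V(\bar{y}) = \frac{NM-nM}{NM \cdot nM}\, S^{2} = \frac{N-n}{NnM}\, S^{2}

S^{2} = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}\right)^{2}
```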
Relative to the sample mean estimator ȳ, the relative efficiency of the estimator for equal-sized clusters is the ratio of V(ȳ) to V(ȳ_c),
where V(ȳ_c) denotes the variance of the estimator based on equal-sized
clusters.
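Written out, and assuming the standard SRSWOR variances V(\bar{y}) = \frac{N-n}{NnM} S^2 and V(\bar{y}_c) = \frac{N-n}{Nn} S_b^2 (where S^2 is the population mean square among all NM elements and S_b^2 is the mean square among the N cluster means, both symbols assumed here), the ratio simplifies because the finite population correction cancels:

```latex
RE = \frac{V(\bar{y})}{V(\bar{y}_{c})}
   = \frac{\dfrac{N-n}{NnM}\, S^{2}}{\dfrac{N-n}{Nn}\, S_b^{2}}
   = \frac{S^{2}}{M\, S_b^{2}}
```

A value of RE below 1 indicates that the cluster sample is less efficient than a simple random sample of the same number of elements.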
Observe that the relative efficiency defined above involves the values of the
study variable for all population units. In practice, however, the investigator
has only the sample observations from n clusters of M units each. The
investigator therefore needs estimates of the two variances that appear in the
formula for relative efficiency (RE).
The estimator of the relative efficiency of the equal-sized-cluster estimator
with respect to the usual sample mean estimator, computed from a cluster
sample, is then obtained by replacing both variances with their sample estimates.
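One common way to write this estimator is sketched below. It assumes SRSWOR of n clusters and uses the sample mean squares between cluster means (s_b^2) and within clusters (s_w^2); these symbols, and \hat{S}^2 for the estimated population mean square, are notation assumed here, so the form may differ in layout from the one originally intended.

```latex
s_b^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_{i}-\bar{y}_{c}\right)^{2}, \qquad
s_w^{2} = \frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}\left(y_{ij}-\bar{y}_{i}\right)^{2}

\hat{S}^{2} = \frac{N(M-1)\, s_w^{2} + M(N-1)\, s_b^{2}}{NM-1}, \qquad
\widehat{RE} = \frac{\hat{S}^{2}}{M\, s_b^{2}}
```

The expression for \hat{S}^2 follows from splitting the total sum of squares about the population mean into its within-cluster and between-cluster parts and estimating each part from the sample.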
Solving a problem using R.
A company has 25 centers located at different places in a State. Each
center has been provided with 4 telephones. In order to estimate the average
number of calls per telephone made on a typical day for this company, a sample
of 5 centers was selected using simple random sampling without replacement.
The data on the number of calls made on a typical working day from each
telephone of the sample centers are summarized in the table.
Estimate the average number of daily calls per telephone made from all the 25 centers. Also, estimate the relative efficiency of the estimator used with respect to the usual sample mean estimator, from the sample selected above.
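The computation asked for here can be sketched as follows. The sketch is written in Python rather than R, and it is an assumption-laden illustration: the function name `cluster_estimates` is hypothetical, the formulas are the standard equal-sized-cluster forms under SRSWOR, and the call counts are placeholder values, not the data from the table.

```python
def cluster_estimates(sample, N):
    """Estimate the mean per element and the relative efficiency from a
    sample of n equal-sized clusters drawn by SRSWOR from N clusters.

    sample: list of n lists, each holding the M observations of one cluster.
    """
    n = len(sample)
    M = len(sample[0])
    if n < 2 or M < 2 or any(len(c) != M for c in sample):
        raise ValueError("need at least 2 clusters of equal size M >= 2")

    cluster_means = [sum(c) / M for c in sample]
    ybar_c = sum(cluster_means) / n  # estimated mean per element

    # Sample mean squares between cluster means and within clusters.
    s_b2 = sum((m - ybar_c) ** 2 for m in cluster_means) / (n - 1)
    s_w2 = sum(
        (y - m) ** 2 for c, m in zip(sample, cluster_means) for y in c
    ) / (n * (M - 1))

    # Estimated variance of the cluster-sample estimator.
    var_cluster = (N - n) / (N * n) * s_b2
    # Estimated population mean square, then the variance an SRSWOR
    # sample of n*M elements would have had.
    S2_hat = (N * (M - 1) * s_w2 + M * (N - 1) * s_b2) / (N * M - 1)
    var_srs = (N - n) / (N * n * M) * S2_hat

    return {
        "mean_per_element": ybar_c,
        "var_cluster": var_cluster,
        "var_srs": var_srs,
        "re_hat": var_srs / var_cluster,
    }

# PLACEHOLDER values only (5 centers x 4 telephones); NOT the table's data.
placeholder_calls = [
    [10, 12, 9, 11],
    [14, 15, 13, 16],
    [8, 7, 9, 10],
    [12, 11, 13, 12],
    [15, 17, 16, 14],
]
print(cluster_estimates(placeholder_calls, N=25))
```

To work the actual problem, the placeholder list would be replaced by the 5 rows of 4 observations from the table, keeping N = 25.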
Advantages:
a) Cluster sampling provides significant savings in data collection costs, since travel costs are smaller.
b) Since the researcher covers only a sample of clusters rather than all of them, it is a more practical method that facilitates fieldwork.
Limitations:
a) Cluster sampling is generally less precise than sampling units directly from the whole population, since the latter is expected to provide a better cross-section of the population, owing to the usual tendency of units within a cluster to be homogeneous.
b) The sampling efficiency of cluster sampling is likely to decrease as the cluster size increases, that is, as the number of clusters decreases for a fixed total number of units.
These advantages and limitations suggest why cluster sampling is used extensively in practical situations where cost matters more than sampling efficiency.