Exhaustive Comparison of SRSWOR and SRSWR
Aim
In this blog we give a brief overview of simple random sampling without replacement (SRSWOR) and simple random sampling with replacement (SRSWR), and use R to compare the two on real data.
Objective
The objective of this blog is to illustrate that SRSWOR is a better sampling technique than SRSWR.
What is Sampling?
Before getting to the deeper concepts, we need to understand what a sample is. A sample is a part of a larger population, selected so that it is usually representative of that population. A simple random sample is one in which every population unit has an equal chance of being selected.
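The two schemes compared in this blog differ only in whether a selected unit is returned to the population before the next draw. As a minimal sketch (the toy population of 10 labelled units and the seed are hypothetical, chosen only for illustration), base R's sample() switches between the two schemes through its replace argument:

```r
# Hypothetical toy population of N = 10 labelled units (illustration only)
population <- 1:10
set.seed(1)

# SRSWOR: a selected unit is not returned, so no unit can appear twice
wor <- sample(population, size = 10, replace = FALSE)

# SRSWR: every draw is made from the full population, so units can repeat
wr <- sample(population, size = 10, replace = TRUE)

wor                  # a rearrangement of 1:10
length(unique(wor))  # always 10 distinct units
length(unique(wr))   # typically fewer than 10, since some units are drawn more than once
```

Drawing all N units without replacement simply reorders the population, whereas with replacement the chance that all ten draws are distinct is only 10!/10^10, about 0.04%.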
Aim of the Experiment
The aim here is to take a dataset, pick one of its variables, and compare the variance of the sample-based estimates under SRSWOR and under SRSWR. Having argued above that SRSWOR is the better sampling method, we use this experiment to check that claim on real data.
Data Description
The data were obtained from a freely available Kaggle dataset that reports COVID-19 figures such as deaths, confirmed cases and recoveries, broken down by country. We take the number of deaths as our study variable and draw samples from it using both SRSWOR and SRSWR.
Code and Analysis
library(readxl)
DS <- read_excel("C:/Users/ShinPyro/Desktop/College/Assignment/Countrywise Covid dist.xlsx")
is.data.frame(DS)
## [1] TRUE
# head() shows the first six rows, giving an idea of the dataset; is.data.frame() above confirmed it is a data frame
head(DS)
## # A tibble: 6 x 15
##   `Country/Region`    Confirmed Deaths Recovered Active `New cases` `New deaths`
##   <chr>                   <dbl>  <dbl>     <dbl>  <dbl>       <dbl>        <dbl>
## 1 Afghanistan             36263   1269     25198   9796         106           10
## 2 Albania                  4880    144      2745   1991         117            6
## 3 Algeria                 27973   1163     18837   7973         616            8
## 4 Andorra                   907     52       803     52          10            0
## 5 Angola                    950     41       242    667          18            1
## 6 Antigua and Barbuda        86      3        65     18           4            0
## # ... with 8 more variables: New recovered <dbl>, Deaths / 100 Cases <dbl>,
## #   Recovered / 100 Cases <dbl>, Deaths / 100 Recovered <chr>,
## #   Confirmed last week <dbl>, 1 week change <dbl>, 1 week % increase <dbl>,
## #   WHO Region <chr>
# As we can see, the rows are arranged alphabetically by country
# Population total of deaths, against which we will compare the estimates of
# the population total obtained next with both SRSWOR and SRSWR
PTotal <- sum(DS$Deaths)
PTotal
## [1] 654036
set.seed(39014) # fixes the random-number seed so the same sample is drawn every time the code is run
# sample() draws 32 units from DS$Deaths; replace = TRUE means a unit can be selected more than once
sampleWR <- sample(DS$Deaths, 32, replace = TRUE, prob = NULL)
sampleWR
#We see the samples selected as follows:
##  [1]  6160  8777     7    43   116   121   483  4838  1166   408  7067   146
## [13]     0     6     0 35112    22  4838    26  5532   294 33408  1761   474
## [25]     1    24    34   748     0  1764     0  1945
m_WR<-mean(sampleWR)
m_WR
## [1] 3603.781
total_WR <- m_WR*187
total_WR
## [1] 673907.1
# Scaling the sample mean up by N = 187 gives an estimated population total of about 673,907 deaths
set.seed(39014)
# replace = FALSE: each unit can be selected at most once
sampleWOR <- sample(DS$Deaths, 32, replace = FALSE, prob = NULL)
sampleWOR
##  [1]  6160  8777     7    43   116   121  4838  1166   408  7067     0     6
## [13]     0 35112    22   146    26  5532   294 33408  1761   474     1    24
## [25]    34   748     0  1764     0  1945   141   423
m_WOR<-mean(sampleWOR)
m_WOR
## [1] 3455.125
total_WOR <- m_WOR*187
total_WOR
## [1] 646108.4
# Scaling the sample mean up by N = 187 gives an estimated population total of about 646,108 deaths
SRSWOR gives the closer estimate of the population total in this draw, but a single draw is not conclusive. To confirm the comparison, we calculate the variance of the sample mean under each scheme and the resulting gain in efficiency.
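For reference, these are the standard textbook expressions that the calculation uses. Here S² is the population variance with divisor N - 1, which is what R's var() returns when applied to the whole column DS$Deaths. Because S² cancels in the ratio, the gain reduces to a function of N and n alone:

```latex
\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i-\bar{Y}\right)^2,\\
V_{\mathrm{WOR}}(\bar{y}) &= \frac{N-n}{Nn}\,S^2,
\qquad
V_{\mathrm{WR}}(\bar{y}) = \frac{N-1}{Nn}\,S^2,\\
\text{gain} &= \frac{V_{\mathrm{WR}}(\bar{y})-V_{\mathrm{WOR}}(\bar{y})}{V_{\mathrm{WR}}(\bar{y})}
= \frac{(N-1)-(N-n)}{N-1}
= \frac{n-1}{N-1}
= \frac{31}{186}\approx 0.1667 .
\end{aligned}
```

The estimated total N·ȳ has N² times each of these variances, so its gain is the same. Some texts instead report the ratio V_WR/V_WOR = (N - 1)/(N - n) = 186/155 = 1.2, i.e. the SRSWR variance is 1.2 times the SRSWOR variance.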
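N <- 187 # population size: number of countries
n <- 32  # sample size
# var() uses the divisor N - 1, so var(DS$Deaths) is the population variance S^2
# Variance of the sample mean under SRSWOR: (N - n) / (N * n) * S^2
vsrswor <- (N - n) / (N * n) * var(DS$Deaths)
vsrswor
## [1] 5149659
# Variance of the sample mean under SRSWR: (N - 1) / (N * n) * S^2
vsrswr <- (N - 1) / (N * n) * var(DS$Deaths)
vsrswr
## [1] 6179591
# Gain in efficiency: proportional reduction in variance from using SRSWOR instead of SRSWR
gain <- (vsrswr - vsrswor) / vsrswr
gain
## [1] 0.1666667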
Code-Based Conclusion
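As seen above, the gain in efficiency is positive: for the same sample size, the estimate from an SRSWOR sample has a smaller variance than the estimate from an SRSWR sample, so SRSWOR yields more precise estimates of population parameters such as the mean and the total.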
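Since the variance comparison above rests on formulas, one way to see it empirically is to repeat each sampling scheme many times and compare the spread of the resulting estimated totals. The sketch below is an illustration under stated assumptions: because the author's Excel file is local, it uses a hypothetical stand-in population of 187 right-skewed values generated with rlnorm(), not the real death counts; setting pop <- DS$Deaths instead should run the same check on the actual data. The seed and the number of replications (20,000) are arbitrary choices.

```r
set.seed(123)

# Hypothetical stand-in population: 187 right-skewed values (NOT the real death counts).
# Replace with pop <- DS$Deaths to run the same check on the actual data.
N <- 187
n <- 32
pop <- round(rlnorm(N, meanlog = 5, sdlog = 0.8))
true_total <- sum(pop)

# Repeat each scheme R_reps times, recording the estimated total N * mean(sample)
R_reps <- 20000
est_wor <- replicate(R_reps, N * mean(sample(pop, n, replace = FALSE)))
est_wr  <- replicate(R_reps, N * mean(sample(pop, n, replace = TRUE)))

# Empirical variance of the estimated total under each scheme
v_wor <- var(est_wor)
v_wr  <- var(est_wr)

# Empirical gain in efficiency versus the theoretical value (n - 1) / (N - 1)
gain_sim    <- (v_wr - v_wor) / v_wr
gain_theory <- (n - 1) / (N - 1)

c(true_total = true_total,
  mean_wor = mean(est_wor), mean_wr = mean(est_wr),
  gain_sim = gain_sim, gain_theory = gain_theory)
```

With this many replications the empirical gain should land close to (n - 1)/(N - 1), about 0.167, and both averages of the estimated totals should sit close to the true total, since N·ȳ is unbiased for the total under either scheme. The difference between the schemes shows up in the spread of the estimates, not in their average.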
#########################################################################
Blog Conclusion
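In the gain-in-efficiency calculation, the variance under SRSWR came out higher than under SRSWOR. The two formulas differ only in their leading factor, (N - 1)/(Nn) for SRSWR versus (N - n)/(Nn) for SRSWOR, so the SRSWR variance exceeds the SRSWOR variance by (n - 1)S²/(Nn), which is positive whenever n > 1.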
Intuitively, SRSWR can pick the same unit again and again, and a repeated unit adds no new information about the population. SRSWOR rules out repeats entirely, so all n units in the sample are distinct and the sample is more representative of the population.
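The point about repeated units can be quantified. Under SRSWR, a given unit is missed by one draw with probability 1 - 1/N, so the expected number of distinct units among n draws is N(1 - (1 - 1/N)^n). The sketch below evaluates this for the blog's N = 187 and n = 32 and checks it by simulation; the seed and the 10,000 replications are arbitrary choices.

```r
N <- 187  # countries in the dataset
n <- 32   # sample size used above

# Expected number of distinct units in an SRSWR sample of size n
expected_distinct <- N * (1 - (1 - 1/N)^n)

# Monte Carlo check: draw unit labels with replacement and count the distinct ones
set.seed(2024)
sim_distinct <- mean(replicate(10000, length(unique(sample.int(N, n, replace = TRUE)))))

expected_distinct  # about 29.5: on average roughly 2.5 of the 32 draws are repeats
sim_distinct
```

Under SRSWOR the count is always exactly n = 32, so every draw contributes a new unit.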
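The real-world dataset bears this out: the gain in efficiency is 0.1667, meaning SRSWOR reduces the variance of the estimate by about 16.7% relative to SRSWR. This figure equals (n - 1)/(N - 1) = 31/186, so it depends only on the sample and population sizes, not on the death counts themselves.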
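In our particular draw, SRSWOR also gave an estimate of the population total (about 646,108) much closer to the actual total of 654,036 than SRSWR did (about 673,907): an error of about 7,928 deaths versus 19,871. One pair of samples cannot by itself prove that one method is better, but it agrees with the variance comparison, which shows SRSWOR to be the more precise estimator of population parameters.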









