Simple Random Sampling Without Replacement
Tejus Prabhu
2148116
Simple random sampling is one of the most fundamental methods of selecting a sample from a given population. In this method, units are drawn from the population at random, with no predefined order, and every unit has an equal chance of selection.
In this blog, we explain the method of Simple Random Sampling Without Replacement. It selects n sample units from N population units so that, at every draw, each unit that has not already been selected has an equal chance (equal probability) of selection. Once a unit is selected, it is not returned to the population.
The figure above illustrates Simple Random Sampling Without Replacement graphically, using a population of white and black balls. A selected sample can consist purely of white balls, purely of black balls, or a mix of black and white balls. In this case, the first sample has two black and two white balls, and the second and third samples each contain one black and three white balls.
The procedure of simple random sampling without replacement is as follows:
Label the N population units from 1 to N.
Choose a number between 1 and N at random and select the unit that carries that label.
If a number that has already been drawn comes up again, ignore it and draw again, without replacing any selected unit, until n distinct units have been selected.
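The labelling and drawing steps above can be sketched in R. This is a minimal illustration that assumes the units are simply labelled 1 to N. The function name `srswor_draw` is hypothetical and does not come from any package. The function draws one label at a time with base R's `sample.int()` and discards repeats, so at every draw each unit not yet selected is equally likely. In practice, `sample(1:N, n, replace = FALSE)` does the same job in a single call.

```r
# Hypothetical helper illustrating the steps: draw a random label from 1..N,
# ignore it if it was already drawn, and stop once n distinct labels are chosen.
srswor_draw <- function(N, n) {
  stopifnot(n <= N)
  selected <- integer(0)
  while (length(selected) < n) {
    label <- sample.int(N, 1)        # every label 1..N equally likely
    if (!(label %in% selected)) {    # repeated label: ignore and draw again
      selected <- c(selected, label)
    }
  }
  selected
}

set.seed(1)
srswor_draw(N = 81, n = 36)          # 36 distinct labels between 1 and 81
```

Drawing with rejection mirrors the manual procedure. Calling `sample()` directly is faster and gives samples with the same distribution.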
The probability of selecting any particular sample of size n by this method is $1/\binom{N}{n}$.
The number of possible samples is $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$.
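For a small population, you can check the number of samples and their common probability directly. The sketch below assumes a toy population of N = 5 units and samples of size n = 2, and uses the base R functions `choose()` and `combn()`. The simulation at the end draws many samples with `sample()` to show that each possible sample turns up about equally often.

```r
N <- 5                      # toy population size
n <- 2                      # sample size
n_samples <- choose(N, n)   # number of possible samples, C(5, 2) = 10
1 / n_samples               # probability of any particular sample, 0.1
all_samples <- combn(N, n)  # every possible sample, one per column
ncol(all_samples)           # 10

# Empirical check: draw many samples without replacement and tabulate them
set.seed(123)
draws <- replicate(10000, paste(sort(sample(N, n)), collapse = "-"))
table(draws) / 10000        # each of the 10 samples appears about 10% of the time
```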
Advantages of Simple Random Sampling Without Replacement:
It is a very simple method of sampling.
Every unit has the same chance of selection, so the sample mean is an unbiased estimator of the population mean.
The researcher does not need any prior knowledge about the population beyond a list of its units, and can select the sample at random without further preparation.
Disadvantages of Simple Random Sampling Without Replacement:
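It can be less efficient than designs such as Stratified or Systematic Sampling, especially when the population is heterogeneous and information for grouping the units is available.
It can be costly and time-consuming, because it needs a complete list of the population units and the selected units may be widely scattered.
Bias can still occur during the sampling process, for example through an incomplete list of units or through non-response.
By chance, a particular sample may be unrepresentative, for example by including outliers rather than typical units, which increases the sampling error.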
Formulas Involving Simple Random Sampling Without Replacement:
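The mean of the sample:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
The mean of the population:
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$
The variance of the sample mean, where $S^2$ is the population mean square:
$$V(\bar{y}) = \frac{N-n}{Nn}\,S^2, \qquad S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i-\bar{Y}\right)^2$$
The unbiased estimator of the variance, based on the sample variance $s^2$:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2, \qquad \hat{V}(\bar{y}) = \frac{N-n}{Nn}\,s^2$$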
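You can verify the standard results behind these formulas by listing every possible sample of a small population. The results are:
- The sample mean is unbiased for the population mean.
- The variance of the sample mean is (N - n)/(Nn) times S².
- The sample variance s² is unbiased for the population mean square S².

This sketch uses a hypothetical population of five values. Because all samples are equally likely, each expectation is just an average over the columns of `combn()`. R's `var()` divides by (number of values - 1), which matches the definitions of both S² and s².

```r
Y <- c(3, 7, 8, 12, 20)   # hypothetical toy population
N <- length(Y)
n <- 3
Ybar <- mean(Y)           # population mean
S2 <- var(Y)              # population mean square S^2 (divisor N - 1)

samples <- combn(N, n)    # all C(5, 3) = 10 possible samples, one per column
ybar <- apply(samples, 2, function(idx) mean(Y[idx]))  # sample means
s2   <- apply(samples, 2, function(idx) var(Y[idx]))   # sample variances

# Each sample has probability 1/10, so expectations are plain averages
mean(ybar) - Ybar                               # 0 up to rounding: ybar is unbiased
mean((ybar - Ybar)^2) - (N - n) / (N * n) * S2  # 0 up to rounding: variance formula
mean(s2) - S2                                   # 0 up to rounding: s^2 unbiased for S^2
```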
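The confidence interval formulas are given by:
Lower Limit = sample mean - Z * Standard Error
Upper Limit = sample mean + Z * Standard Error
Here, Z is the critical value of the standard normal distribution (1.96 for a 95% confidence level), and the Standard Error is the square root of the variance of the sample mean.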
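The two limits can be wrapped in a small R function. This is a sketch under a few assumptions:
- The name `srswor_ci` is hypothetical.
- The variance of the sample mean is estimated from the sample itself as (N - n)/(Nn) times s².
- Z is taken from `qnorm()` instead of being hard-coded as 1.96.

The sample values in the usage line are made up.

```r
# Hypothetical helper: normal-approximation confidence interval for the
# population mean under simple random sampling without replacement.
srswor_ci <- function(y, N, conf = 0.95) {
  n  <- length(y)
  se <- sqrt((N - n) / (N * n) * var(y))   # estimated standard error of the sample mean
  z  <- qnorm(1 - (1 - conf) / 2)          # 1.959964 for conf = 0.95
  c(lower = mean(y) - z * se, upper = mean(y) + z * se)
}

# Usage: a made-up sample of n = 4 values from a population of N = 20 units
srswor_ci(c(10, 12, 9, 13), N = 20)
```

When n = N (a full census), the factor (N - n) is zero and the interval collapses to the sample mean, because there is no sampling error.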
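Simple Random Sampling Without Replacement is more efficient than Simple Random Sampling With Replacement because its sample mean has a smaller variance. With $S^2$ the population mean square, the variance is $\frac{N-n}{Nn}S^2$ without replacement versus $\frac{N-1}{Nn}S^2$ with replacement, and $N-n < N-1$ whenever $n > 1$.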
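To make the comparison concrete: the variance of the sample mean is (N - 1)/(Nn) times S² with replacement and (N - n)/(Nn) times S² without replacement, so their ratio is (N - n)/(N - 1). The sketch below evaluates both for N = 81 and n = 36, the sizes used in the R example later in this post. S² is set to 1 as a placeholder because the ratio does not depend on it.

```r
N <- 81
n <- 36
S2 <- 1                                  # placeholder; the ratio does not depend on S^2
var_wor <- (N - n) / (N * n) * S2        # variance of the sample mean, without replacement
var_wr  <- (N - 1) / (N * n) * S2        # variance of the sample mean, with replacement
var_wor / var_wr                         # (N - n) / (N - 1) = 45 / 80 = 0.5625
```

For these sizes, sampling without replacement cuts the variance of the sample mean by roughly 44%.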
Applications of Simple Random Sampling Without Replacement:
Simple Random Sampling is used in research across fields such as business, medicine and engineering. Data collection often produces large volumes of data, and it can be impractical to analyze each and every unit of the population. A random sample lets the population be studied through a manageable subset.
Now, we demonstrate the method of simple random sampling without replacement using the R programming language.
Simple Random Sampling Without Replacement
Tejus Prabhu
11/23/2021
AIM
To take a sample from the given population using the method of Simple Random Sampling Without Replacement, and to provide estimates based on it.
Data Description
The data give the number of people vaccinated against Covid-19 (first dose, second dose and total), together with the population, for each of the 81 cities of Turkey, as recorded on 26 June 2021. The dataset was taken from Kaggle.
First, we import the dataset.
library(readxl) # provides read_excel()
DS=read_excel("C:/Users/tejus/OneDrive/Desktop/covid19Vaccination.xlsx") # path on the author's machine
DS
## # A tibble: 81 x 13
## ID DATE_ SEQID CITY CITY2 `_1DOSE` `_2DOSE` `_TOTAL`
## <dbl> <dttm> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 438102 2021-06-26 22:41:31 5412 Adana Adana 784464 355404 1139868
## 2 438103 2021-06-26 22:41:31 5412 Adiyaman Adiy~ 170847 77955 248802
## 3 438104 2021-06-26 22:41:31 5412 Afyonkarahisar Afyon 292611 138934 431545
## 4 438105 2021-06-26 22:41:31 5412 Agri Agri 94807 41146 135953
## 5 438106 2021-06-26 22:41:31 5412 Aksaray Aksa~ 131889 62603 194492
## 6 438107 2021-06-26 22:41:31 5412 Amasya Amas~ 169285 87322 256607
## 7 438108 2021-06-26 22:41:31 5412 Ankara Anka~ 2652085 1241906 3893991
## 8 438109 2021-06-26 22:41:31 5412 Antalya Anta~ 1055350 530415 1585765
## 9 438110 2021-06-26 22:41:31 5412 Ardahan Arda~ 34629 18642 53271
## 10 438111 2021-06-26 22:41:31 5412 Artvin Artv~ 80554 43132 123686
## # ... with 71 more rows, and 5 more variables: POPULATION <dbl>,
## # DIFF_1DOSE <dbl>, DIFF_2DOSE <dbl>, DIFF_TOTAL <dbl>, PREVID <dbl>
summary(DS)
## ID DATE_ SEQID
## Min. :438102 Min. :2021-06-26 22:41:31 Min. :5412
## 1st Qu.:438122 1st Qu.:2021-06-26 22:41:31 1st Qu.:5412
## Median :438142 Median :2021-06-26 22:41:31 Median :5412
## Mean :438142 Mean :2021-06-26 22:41:31 Mean :5412
## 3rd Qu.:438162 3rd Qu.:2021-06-26 22:41:31 3rd Qu.:5412
## Max. :438182 Max. :2021-06-26 22:41:31 Max. :5412
## CITY CITY2 _1DOSE _2DOSE
## Length:81 Length:81 Min. : 24128 Min. : 13155
## Class :character Class :character 1st Qu.: 106990 1st Qu.: 49145
## Mode :character Mode :character Median : 197631 Median : 98896
## Mean : 396543 Mean : 182891
## 3rd Qu.: 395582 3rd Qu.: 187138
## Max. :6197550 Max. :2419591
## _TOTAL POPULATION DIFF_1DOSE DIFF_2DOSE
## Min. : 37283 Min. : 81910 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 157545 1st Qu.: 284923 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 294010 Median : 537762 Median : 1.000 Median : 0.0000
## Mean : 579434 Mean : 1032276 Mean : 8.407 Mean : 0.4938
## 3rd Qu.: 580524 3rd Qu.: 1081065 3rd Qu.: 6.000 3rd Qu.: 0.0000
## Max. :8617141 Max. :15462452 Max. :193.000 Max. :11.0000
## DIFF_TOTAL PREVID
## Min. : 0.000 Min. :438021
## 1st Qu.: 0.000 1st Qu.:438041
## Median : 1.000 Median :438061
## Mean : 8.901 Mean :438061
## 3rd Qu.: 6.000 3rd Qu.:438081
## Max. :204.000 Max. :438101
str(DS)
## tibble [81 x 13] (S3: tbl_df/tbl/data.frame)
## $ ID : num [1:81] 438102 438103 438104 438105 438106 ...
## $ DATE_ : POSIXct[1:81], format: "2021-06-26 22:41:31" "2021-06-26 22:41:31" ...
## $ SEQID : num [1:81] 5412 5412 5412 5412 5412 ...
## $ CITY : chr [1:81] "Adana" "Adiyaman" "Afyonkarahisar" "Agri" ...
## $ CITY2 : chr [1:81] "Adana" "Adiyaman" "Afyon" "Agri" ...
## $ _1DOSE : num [1:81] 784464 170847 292611 94807 131889 ...
## $ _2DOSE : num [1:81] 355404 77955 138934 41146 62603 ...
## $ _TOTAL : num [1:81] 1139868 248802 431545 135953 194492 ...
## $ POPULATION: num [1:81] 2258718 632459 736912 535435 423011 ...
## $ DIFF_1DOSE: num [1:81] 15 2 8 0 3 1 91 29 0 0 ...
## $ DIFF_2DOSE: num [1:81] 0 0 0 0 0 0 6 4 0 0 ...
## $ DIFF_TOTAL: num [1:81] 15 2 8 0 3 1 97 33 0 0 ...
## $ PREVID : num [1:81] 438021 438022 438023 438024 438025 ...
plot(DS)
Now, we have to select a sample of size 36 from a population of size 81.
library(samplingbook)
set.seed(420)
DD=DS[sample(1:nrow(DS),36,replace=FALSE),] # draw 36 distinct rows at random
DD
## # A tibble: 36 x 13
## ID DATE_ SEQID CITY CITY2 `_1DOSE` `_2DOSE` `_TOTAL`
## <dbl> <dttm> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 438106 2021-06-26 22:41:31 5412 Aksaray Aksa~ 131889 62603 194492
## 2 438143 2021-06-26 22:41:31 5412 Kahramanmaras Kahr~ 336185 149207 485392
## 3 438179 2021-06-26 22:41:31 5412 Van Van 236641 91208 327849
## 4 438174 2021-06-26 22:41:31 5412 Tekirdag Teki~ 477726 208146 685872
## 5 438138 2021-06-26 22:41:31 5412 Hatay Hatay 512074 223365 735439
## 6 438122 2021-06-26 22:41:31 5412 Bursa Bursa 1272038 550539 1822577
## 7 438126 2021-06-26 22:41:31 5412 Denizli Deni~ 475605 212478 688083
## 8 438125 2021-06-26 22:41:31 5412 Çorum Corum 238440 121891 360331
## 9 438142 2021-06-26 22:41:31 5412 Izmir Izmir 2136705 1006223 3142928
## 10 438161 2021-06-26 22:41:31 5412 Mus Mus 62187 29056 91243
## # ... with 26 more rows, and 5 more variables: POPULATION <dbl>,
## # DIFF_1DOSE <dbl>, DIFF_2DOSE <dbl>, DIFF_TOTAL <dbl>, PREVID <dbl>
This gives a sample of 36 cities selected by Simple Random Sampling Without Replacement.
plot(DD)
Now, we analyze the sample by estimating some of its properties.
We take the population (the POPULATION variable) of the cities of Turkey and analyze it.
We estimate the mean and its confidence interval.
Mean=mean(DD$POPULATION)
Mean #sample mean
## [1] 1410072
variance=var(DS$POPULATION)
variance #population variance
## [1] 3.50654e+12
N=81
n=36
V=((N-n)/(N*n))*variance
V # variance of the estimate, (N-n)/(N*n) * S^2; approximately 5.4113e10
SE=sqrt(V)
SE # the standard error; approximately 232622.6
LL=Mean-1.96*SE
LL # lower limit; approximately 954131
UL=Mean+1.96*SE
UL # upper limit; approximately 1866012
CI=c(LL,UL)
CI # approximately 954131 1866012
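These limits give a 95% confidence interval (confidence level 95%, level of significance 5%) for the mean population of the cities. We took Z = 1.96 because the sample size is above 30, so the sample mean can be treated as approximately normally distributed. Conclusion: we can be 95% confident that the true population mean lies between the above values. In repeated sampling, about 95% of intervals constructed in this way would contain the true mean.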
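The 95% refers to the procedure rather than to any single interval. The simulation below sketches this under a few assumptions:
- The population is synthetic: N = 81 normally distributed values, not the Turkish data.
- Repeated samples of n = 36 are drawn without replacement.
- The variance of the sample mean is estimated from each sample with s².

The proportion of intervals that contain the true mean should come out close to 0.95.

```r
set.seed(2021)
N <- 81
n <- 36
pop <- rnorm(N, mean = 1e6, sd = 3e5)   # synthetic population, not the real data
true_mean <- mean(pop)
z <- qnorm(0.975)

covered <- replicate(5000, {
  y  <- pop[sample(N, n, replace = FALSE)]   # one sample without replacement
  se <- sqrt((N - n) / (N * n) * var(y))     # estimated standard error
  (mean(y) - z * se <= true_mean) && (true_mean <= mean(y) + z * se)
})

mean(covered)   # share of intervals containing the true mean, near 0.95
```

The share is typically a little below 0.95, because the z value ignores the extra uncertainty from estimating the variance with s².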