Ratio Estimator in Stratified Sampling


                               

                                - Rohan Regi (2148111)

Introduction:

The ratio estimator is a statistic that estimates the ratio of the means of two variables. Ratio estimates are biased, so when they are used in experimental or survey work the bias should be accounted for. Because their sampling distribution is generally asymmetric, symmetric procedures such as t-based intervals should be avoided when constructing confidence intervals.

Under various scenarios, the ratio estimator has been found to be more precise than the traditional sample mean estimator for estimating the population mean of the study character. Several researchers have therefore looked for more precise estimators that exploit known values of certain population parameters. Searls (1964) used the coefficient of variation of the study character at the estimation stage, although this coefficient is rarely known in practice. Various authors, including Sen (1978) and Sisodia and Dwivedi (1981), were motivated by Searls' (1964) work. In the ratio method of estimation, Singh et al. (1991) and Upadhyaya and Singh (1984) used the known coefficient of variation of an auxiliary character to estimate the population mean of the study character.

Singh et al. (1973) were the first to use a known value of the coefficient of kurtosis in estimating the population variance of the study character; Searls and Intarapanich (1990) used it later. Singh and Tailor (2003) suggested a modified ratio estimator based on a known value of the correlation coefficient. When prior information is available on an auxiliary attribute, Jhajj et al. (2006) and Singh et al. (2008) defined ratio estimators of the population mean using the point-biserial correlation coefficient between the auxiliary attribute and the study variable.

Several ratio-type estimators have been developed for stratified random sampling with auxiliary attributes. Expressions for their bias and mean squared error have been derived up to the first order of approximation, and under specific conditions the suggested estimators are shown to be more efficient than the traditional combined ratio estimator.

There are two ways to construct a ratio estimator under stratified random sampling. The first estimates the ratio separately within each stratum and then combines the stratum estimates using the stratum weights; this gives the separate ratio estimator. The second uses the stratified estimators $\bar{y}_{st}$ and $\bar{x}_{st}$ of $\mu_y$ and $\mu_x$, and takes $\bar{y}_{st}/\bar{x}_{st}$ as the estimator of $\mu_y/\mu_x$; this gives the combined ratio estimator.
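The two constructions can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the `(x, y)` pair layout, and the toy numbers in the usage note are assumptions made for the example, and the population means of x are taken as known.

```python
from statistics import mean

def separate_ratio_mean(samples, stratum_sizes, x_pop_means):
    """Separate ratio estimate of the population mean of y.

    samples: one list of (x, y) pairs per stratum (the drawn sample)
    stratum_sizes: population sizes N_i of the strata
    x_pop_means: known population mean of x inside each stratum
    """
    N = sum(stratum_sizes)
    estimate = 0.0
    for pairs, N_i, mu_x_i in zip(samples, stratum_sizes, x_pop_means):
        x_bar = mean([x for x, _ in pairs])
        y_bar = mean([y for _, y in pairs])
        # Ratio estimated inside the stratum, then weighted by N_i / N.
        estimate += (N_i / N) * (y_bar / x_bar) * mu_x_i
    return estimate

def combined_ratio_mean(samples, stratum_sizes, x_pop_mean):
    """Combined ratio estimate: (y_st / x_st) times the overall mean of x."""
    N = sum(stratum_sizes)
    weights = [N_i / N for N_i in stratum_sizes]
    y_st = sum(w * mean([y for _, y in p]) for w, p in zip(weights, samples))
    x_st = sum(w * mean([x for x, _ in p]) for w, p in zip(weights, samples))
    return (y_st / x_st) * x_pop_mean
```

For instance, with two strata of sizes 100 and 300, samples `[(10, 20), (20, 40)]` and `[(10, 30), (30, 90)]`, stratum means of x equal to 16 and 22, and overall mean of x equal to 20.5, the separate estimator uses the two stratum ratios (2 and 3) while the combined estimator uses the single pooled ratio $\bar{y}_{st}/\bar{x}_{st}$.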

Separate ratio-type estimators for the population mean, based on known parameters of the auxiliary variate, have also been proposed and their properties studied. Their bias and mean squared error are obtained up to the first order of approximation. Under certain conditions, they are shown to be more efficient than both the unbiased estimator in stratified random sampling and the usual separate ratio estimator.

Combined ratio estimator: The separate estimator assumes that the sample sizes $n_i$ in each stratum are large, which may not hold in practice. The combined estimator instead uses $\bar{X}$, the population mean of x based on all $N = \sum_{i=1}^{k} N_i$ units, where k is the number of strata. It does not require the individual stratum means $\bar{X}_i$, only the overall mean $\bar{X}$.

The separate ratio estimator's main drawback is that, with small sample sizes per stratum, the individual stratum ratio estimates are biased, and this bias can accumulate across strata. The separate ratio estimator is therefore recommended unless the stratum sample sizes are small (for example, $n_i < 20$) or the within-stratum ratios are nearly identical, in which case the combined estimator is preferable. Population totals are estimated by multiplying the estimated mean by the population size N, giving $\hat{\tau}_{yRS} = N\hat{\mu}_{yRS}$ or $\hat{\tau}_{yRC} = N\hat{\mu}_{yRC}$.

Formulas:

Estimating the Ratio


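In a situation with two strata (labelled A and B), the expression for estimating a mean using the separate ratio estimator is

$$\hat{\mu}_{yRS} = \frac{N_A}{N}\,\frac{\bar{y}_A}{\bar{x}_A}\,\mu_{xA} + \frac{N_B}{N}\,\frac{\bar{y}_B}{\bar{x}_B}\,\mu_{xB}$$

with estimated variance:

$$\hat{V}(\hat{\mu}_{yRS}) = \sum_{i \in \{A,B\}} \left(\frac{N_i}{N}\right)^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_{yi}^2 + r_i^2 s_{xi}^2 - 2 r_i s_{xyi}}{n_i}, \qquad r_i = \frac{\bar{y}_i}{\bar{x}_i}$$

where $s_{xi}^2$, $s_{yi}^2$ and $s_{xyi}$ are the sample variances and covariance of x and y in stratum i.

In a situation with two strata (labelled A and B), the expression for estimating a mean using the combined ratio estimator is:

$$\hat{\mu}_{yRC} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\mu_x, \qquad \bar{y}_{st} = \frac{N_A}{N}\bar{y}_A + \frac{N_B}{N}\bar{y}_B, \qquad \bar{x}_{st} = \frac{N_A}{N}\bar{x}_A + \frac{N_B}{N}\bar{x}_B$$

with estimated variance:

$$\hat{V}(\hat{\mu}_{yRC}) = \sum_{i \in \{A,B\}} \left(\frac{N_i}{N}\right)^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_{yi}^2 + r_c^2 s_{xi}^2 - 2 r_c s_{xyi}}{n_i}, \qquad r_c = \frac{\bar{y}_{st}}{\bar{x}_{st}}$$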
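The variance estimates for both estimators can be sketched in Python as follows. The sketch assumes the usual large-sample form, in which each stratum contributes $(N_i/N)^2 (1 - n_i/N_i)\, s_{ri}^2 / n_i$ and $s_{ri}^2$ is the sample variance of the residuals $(y - \bar{y}_i) - r\,(x - \bar{x}_i)$, with $r$ the stratum ratio (separate) or the pooled ratio (combined). The function names and the data layout are illustrative assumptions.

```python
from statistics import mean

def _residual_variance(pairs, ratio):
    """Sample variance of the centred residuals (y - y_bar) - ratio * (x - x_bar)."""
    x_bar = mean([x for x, _ in pairs])
    y_bar = mean([y for _, y in pairs])
    squares = [((y - y_bar) - ratio * (x - x_bar)) ** 2 for x, y in pairs]
    return sum(squares) / (len(pairs) - 1)

def ratio_variance_estimates(samples, stratum_sizes):
    """Return (var_separate, var_combined) for the estimated mean of y.

    samples: one list of (x, y) pairs per stratum
    stratum_sizes: population sizes N_i of the strata
    """
    N = sum(stratum_sizes)
    weights = [N_i / N for N_i in stratum_sizes]
    x_bars = [mean([x for x, _ in p]) for p in samples]
    y_bars = [mean([y for _, y in p]) for p in samples]
    # Pooled ratio y_st / x_st used by the combined estimator.
    r_c = (sum(w * yb for w, yb in zip(weights, y_bars))
           / sum(w * xb for w, xb in zip(weights, x_bars)))
    var_sep = var_comb = 0.0
    for pairs, N_i, w, xb, yb in zip(samples, stratum_sizes, weights, x_bars, y_bars):
        n_i = len(pairs)
        factor = w ** 2 * (1 - n_i / N_i) / n_i  # weight times finite population correction
        var_sep += factor * _residual_variance(pairs, yb / xb)
        var_comb += factor * _residual_variance(pairs, r_c)
    return var_sep, var_comb
```

When y is exactly proportional to x in every stratum, all residuals vanish and both variance estimates are zero, which is a quick sanity check on any implementation.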



Properties of separate ratio estimator:
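
As a hedged summary, the standard first-order results for the separate ratio estimator of the mean can be written as below. The notation is an assumption chosen to match the symbols $R_i$, $S_{ix}$ and $S_{iy}$ used in the comparison later in this post: $W_i = N_i/N$, $f_i = (N_i - n_i)/N_i$, $C_{ix} = S_{ix}/\bar{X}_i$, $C_{iy} = S_{iy}/\bar{Y}_i$, and $\rho_i$ is the correlation between x and y in stratum i.

$$\hat{\bar{Y}}_{RS} = \sum_i W_i\,\frac{\bar{y}_i}{\bar{x}_i}\,\bar{X}_i$$

$$\mathrm{Bias}(\hat{\bar{Y}}_{RS}) \approx \sum_i W_i\,\bar{Y}_i\,\frac{f_i}{n_i}\left(C_{ix}^2 - \rho_i C_{ix} C_{iy}\right)$$

$$\mathrm{MSE}(\hat{\bar{Y}}_{RS}) \approx \sum_i \frac{W_i^2 f_i}{n_i}\left(S_{iy}^2 + R_i^2 S_{ix}^2 - 2 R_i \rho_i S_{ix} S_{iy}\right), \qquad R_i = \frac{\bar{Y}_i}{\bar{X}_i}$$

The bias is a sum of stratum-level biases weighted by $W_i$ rather than $W_i^2$, which is why it can build up when there are many strata with small $n_i$.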

                         

             

                           

               ­­­­­

 

 

Properties of combined ratio estimator:
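
As a hedged summary, the standard first-order results for the combined ratio estimator of the mean can be written as below. The notation is an assumption chosen to match the symbols $R$, $S_{ix}$ and $S_{iy}$ used in the comparison later in this post: $W_i = N_i/N$, $f_i = (N_i - n_i)/N_i$, and $\rho_i$ is the correlation between x and y in stratum i.

$$\hat{\bar{Y}}_{RC} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X}$$

$$\mathrm{Bias}(\hat{\bar{Y}}_{RC}) \approx \bar{Y} \sum_i \frac{W_i^2 f_i}{n_i}\left(\frac{S_{ix}^2}{\bar{X}^2} - \frac{\rho_i S_{ix} S_{iy}}{\bar{X}\,\bar{Y}}\right)$$

$$\mathrm{MSE}(\hat{\bar{Y}}_{RC}) \approx \sum_i \frac{W_i^2 f_i}{n_i}\left(S_{iy}^2 + R^2 S_{ix}^2 - 2 R \rho_i S_{ix} S_{iy}\right), \qquad R = \frac{\bar{Y}}{\bar{X}}$$

Only the overall mean $\bar{X}$ enters the estimator, and the approximation relies on the total sample size being large rather than each $n_i$.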

  

 

 

 

Comparison of combined and separate ratio estimators

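A natural question is which of the estimators $\hat{\bar{Y}}_{RS}$ and $\hat{\bar{Y}}_{RC}$ is better, so we compare their MSEs. To the first order of approximation, the two MSEs differ only in the ratio that enters them: the stratum ratio $R_i = \bar{Y}_i/\bar{X}_i$ for the separate estimator and the population ratio $R = \bar{Y}/\bar{X}$ for the combined estimator. The difference is

$$D = \mathrm{MSE}(\hat{\bar{Y}}_{RC}) - \mathrm{MSE}(\hat{\bar{Y}}_{RS}) = \sum_i \frac{W_i^2 f_i}{n_i}\left[(R - R_i)^2 S_{ix}^2 + 2 (R - R_i)\left(R_i S_{ix}^2 - \rho_i S_{ix} S_{iy}\right)\right]$$

where $W_i = N_i/N$, $f_i = (N_i - n_i)/N_i$, and $\rho_i$ is the correlation between x and y in stratum i.

The difference D depends on

(i)              The magnitude of the difference between the strata ratios ($R_i$) and the whole population ratio ($R$).

(ii)            The value of $R_i S_{ix}^2 - \rho_i S_{ix} S_{iy}$, which is usually small and vanishes when the regression of y on x is linear and passes through the origin within each stratum. In such a case $D = \sum_i \frac{W_i^2 f_i}{n_i} (R - R_i)^2 S_{ix}^2 \ge 0$, so the separate estimator is at least as precise as the combined estimator to this order of approximation.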
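
The comparison can be explored numerically with a small Python sketch. It assumes the first-order expression $D = \sum_i (W_i^2 f_i / n_i)\,[(R - R_i)^2 S_{ix}^2 + 2 (R - R_i)(R_i S_{ix}^2 - \rho_i S_{ix} S_{iy})]$ with $W_i = N_i/N$ and $f_i = (N_i - n_i)/N_i$; the function name and the dictionary keys are hypothetical choices made for this example.

```python
def mse_difference(strata, R):
    """First-order MSE(combined) minus MSE(separate).

    strata: list of dicts with keys
        W (stratum weight N_i / N), f (finite population correction),
        n (stratum sample size), R_i (stratum ratio),
        S_x, S_y (stratum standard deviations), rho (stratum correlation)
    R: population ratio Y_bar / X_bar
    """
    total = 0.0
    for s in strata:
        c = s["W"] ** 2 * s["f"] / s["n"]
        d = R - s["R_i"]
        # Term that vanishes for a regression line through the origin.
        cross = s["R_i"] * s["S_x"] ** 2 - s["rho"] * s["S_x"] * s["S_y"]
        total += c * (d ** 2 * s["S_x"] ** 2 + 2 * d * cross)
    return total
```

If every stratum ratio equals R the function returns zero, and if the cross term vanishes in every stratum the result cannot be negative, in line with points (i) and (ii).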

                                                   

Advantages and Disadvantages of Ratio Estimator in stratified sampling:


Advantages

1.     It uses auxiliary information: when the study variable y is strongly and positively correlated with an auxiliary variable x whose population mean is known, the ratio estimator is usually more precise than the ordinary stratified sample mean.

2.     It is most efficient when the regression of y on x is linear and passes through the origin.

3.     The separate ratio estimator adapts to strata whose ratios differ considerably from one another.

4.     The combined ratio estimator needs only the overall population mean of x, not the stratum means, and remains usable when stratum sample sizes are small.

5.     Estimates of the population total follow directly by multiplying the estimated mean by N.

 

Disadvantages:

1.     Ratio estimators are biased, and the bias is negligible only in large samples.

2.     The separate ratio estimator requires the population mean of x in every stratum, and its bias can accumulate across strata when stratum sample sizes are small.

3.     The bias and mean squared error expressions are first-order approximations and may be inaccurate for small samples.

4.     If the correlation between y and x is weak or negative, the ratio estimator can be less precise than the ordinary stratified sample mean.


 

Application:

 

Cardiovascular diseases (CVDs) are the number one cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs, and this dataset contains 11 features that can be used to predict possible heart disease.

People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidemia or already established disease) need early detection and management wherein a machine learning model can be of great help.

Objective:

Our objective is to compute ratio estimates under stratified sampling for this dataset, with serum cholesterol [mg/dl] and maximum heart rate achieved [numeric value between 60 and 202] as the variables of interest.

Attribute Information

  1. Age: age of the patient [years]
  2. Sex: sex of the patient [M: Male, F: Female]
  3. ChestPainType: chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
  4. RestingBP: resting blood pressure [mm Hg]
  5. Cholesterol: serum cholesterol [mg/dl]
  6. FastingBS: fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
  7. RestingECG: resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]
  8. MaxHR: maximum heart rate achieved [Numeric value between 60 and 202]
  9. ExerciseAngina: exercise-induced angina [Y: Yes, N: No]
  10. Oldpeak: oldpeak = ST [Numeric value measured in depression]
  11. ST_Slope: the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
  12. HeartDisease: output class [1: heart disease, 0: Normal]

Source

This dataset was created by combining different datasets already available independently but not combined before. In this dataset, 5 heart datasets are combined over 11 common features which makes it the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:

  • Cleveland: 303 observations
  • Hungarian: 294 observations
  • Switzerland: 123 observations
  • Long Beach VA: 200 observations
  • Statlog (Heart) Data Set: 270 observations

Total: 1190 observations
Duplicated: 272 observations

Final dataset: 918 observations

Every dataset used can be found under the Index of heart disease datasets from UCI Machine Learning Repository on the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/
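
A hedged Python sketch of how such estimates could be computed is given below. Several choices here are assumptions and not taken from the analysis in this post: the stratifying variable (`Sex`), the roles of the two variables (`Cholesterol` as y and `MaxHR` as x), the 20% sampling fraction within each stratum, the function name, and the file name `heart.csv`. Because the whole dataset is available, the population means of x are computed from it directly.

```python
import random
from statistics import mean

def stratified_ratio_estimates(rows, stratum_key, x_key, y_key, frac=0.2, seed=0):
    """Separate and combined ratio estimates of the mean of y.

    rows: list of dicts (for example from csv.DictReader)
    frac: sampling fraction used inside every stratum (an assumption)
    """
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        pair = (float(row[x_key]), float(row[y_key]))
        strata.setdefault(row[stratum_key], []).append(pair)
    N = sum(len(units) for units in strata.values())
    separate = y_st = x_st = 0.0
    for units in strata.values():
        n_i = min(len(units), max(2, round(frac * len(units))))
        sample = rng.sample(units, n_i)  # simple random sample without replacement
        w = len(units) / N
        x_bar = mean([x for x, _ in sample])
        y_bar = mean([y for _, y in sample])
        mu_x_i = mean([x for x, _ in units])  # known stratum mean of x
        separate += w * (y_bar / x_bar) * mu_x_i
        y_st += w * y_bar
        x_st += w * x_bar
    mu_x = mean([x for units in strata.values() for x, _ in units])
    return {"separate": separate, "combined": (y_st / x_st) * mu_x}

# Hypothetical usage with the heart disease data saved locally as heart.csv:
# import csv
# with open("heart.csv", newline="") as fh:
#     rows = list(csv.DictReader(fh))
# print(stratified_ratio_estimates(rows, "Sex", "MaxHR", "Cholesterol"))
```

Changing `seed` redraws the sample, so repeating the call over many seeds gives a rough picture of how the two estimators vary around the true mean of y.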


                                  

                                  

                                    


                                  

                                     

                                     


                                      

                                       

                                        

                                         

Conclusion:

In this post we considered ratio-type estimators of the population mean under stratified random sampling that use information on an auxiliary variable. The results of the application to real data agree with the theoretical findings. In practice, the choice of estimator depends on which population parameters are available. The estimates can also serve as a basis for predicting future values.

 





















