Regression Method of Estimation

Khushi Dhasmana

2148136

One of the main objectives of statistical estimation is to obtain precise estimators of the parameters of interest. Incorporating additional information that is valid and relevant leads to better estimators. Several estimation methods exploit such information, including the ratio and regression methods of estimation. The ratio method uses auxiliary information that is correlated with the variable of interest to improve accuracy and precision. It works best when the regression of Y on X is linear and passes through the origin. When the regression is linear but the line does not pass through the origin, it is better to use the regression estimator to estimate the population parameters.

In simpler words, when the relationship between the ‘x’ and ‘y’ variables is approximately linear and the line does not pass through the origin, an estimate based on the linear regression of ‘y’ on ‘x’ is preferred to one based on the ratio of the two variables. Regression estimation thus uses the independent (auxiliary) variable or variables (usually ‘x’) to estimate a characteristic, such as the population mean, of the dependent variable (usually ‘y’).

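Consider the difference (x̄ - X̄), whose expectation under simple random sampling is zero, E(x̄ - X̄) = 0. Adding a multiple of it to the sample mean ȳ gives an improved estimator (Ȳ-hat) of Ȳ that remains unbiased for any constant 'μ':

(Ȳ-hat) = ȳ + μ(x̄ - X̄)

The constant 'μ' is chosen so that the variance of (Ȳ-hat) is minimum. Since

Var(Ȳ-hat) = Var(ȳ) + μ² Var(x̄) + 2μ Cov(x̄, ȳ)

setting the derivative with respect to 'μ' to zero gives

μ = -Cov(x̄, ȳ) / Var(x̄) = -S_xy / S_x²

where S_xy is the population covariance of x and y and S_x² is the population variance of x.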

Consider a simple linear regression model:

y = xβ + e

where 'y' is the dependent variable, 'x' is the independent variable and 'e' is the error term, which arises because the relationship between x and y is not exact. The value of 'β' is obtained by minimizing the sum of squares of the error terms. With x and y measured as deviations from their means, 'β' can be written as

β = Cov(x, y) / Var(x) = S_xy / S_x²

Thus the optimum value of 'μ' is equal to the negative of the regression coefficient 'β', i.e.

μ = -β

and therefore the regression estimator (Ȳ-hat) of Ȳ can be written as

(Ȳ-hat) = ȳ + β(X̄ - x̄)

with variance

Var(Ȳ-hat) = Var(ȳ)[1 - ρ²(x̄, ȳ)]

where ρ(x̄, ȳ) is the correlation coefficient between x̄ and ȳ. Hence, if the two are highly correlated, the variance is small and the estimator is very efficient.
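The claim that the variance is smallest at μ = -β can be checked numerically. The following is a minimal Python sketch, not the article's own code: the six-unit population is invented for illustration, and the variance is computed exactly by enumerating every SRSWOR sample of size 3.

```python
from itertools import combinations

# Hypothetical finite population of N = 6 (x, y) pairs, invented for illustration.
population = [(2, 5), (4, 9), (5, 10), (7, 15), (9, 17), (12, 25)]
N, n = len(population), 3

X_bar = sum(x for x, _ in population) / N

# Population covariance and variance (divisor N - 1), giving beta = S_xy / S_x^2.
Y_bar = sum(y for _, y in population) / N
S_xy = sum((x - X_bar) * (y - Y_bar) for x, y in population) / (N - 1)
S_xx = sum((x - X_bar) ** 2 for x, _ in population) / (N - 1)
beta = S_xy / S_xx

# Under SRSWOR every subset of size n is equally likely.
samples = list(combinations(population, n))


def estimator_variance(mu):
    """Exact variance of y_bar + mu * (x_bar - X_bar) over all SRSWOR samples."""
    estimates = []
    for sample in samples:
        x_bar = sum(x for x, _ in sample) / n
        y_bar = sum(y for _, y in sample) / n
        estimates.append(y_bar + mu * (x_bar - X_bar))
    centre = sum(estimates) / len(estimates)
    return sum((e - centre) ** 2 for e in estimates) / len(estimates)


# Scan a grid of mu values centred on -beta and pick the one with least variance.
grid = [-beta + k * 0.05 for k in range(-40, 41)]
best_mu = min(grid, key=estimator_variance)

print(f"beta = {beta:.4f}")
print(f"variance-minimising mu on the grid = {best_mu:.4f}")
print(f"variance at mu = 0 (plain sample mean): {estimator_variance(0.0):.4f}")
print(f"variance at mu = -beta: {estimator_variance(-beta):.4f}")
```

Because the variance is a quadratic in μ, the grid search should land on -β itself, and the variance there should be below that of the plain sample mean (μ = 0).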

Regression estimators can be found either with a pre-assigned ‘β’ or when the ‘β’ is computed from the sample. Usually ‘β’ is estimated from the results of the sample but can be taken as a constant beforehand as well.

1) When 'β' is pre-assigned

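Assume β = β₀ is known and the sample is drawn by SRSWOR. The regression estimator is then

(Ȳ-hat) = ȳ + β₀(X̄ - x̄)

and its bias is checked by taking the expectation:

E(Ȳ-hat) = E(ȳ) + β₀[X̄ - E(x̄)] = Ȳ + β₀(X̄ - X̄) = Ȳ

Thus the estimator is unbiased when β is known. Its variance is

Var(Ȳ-hat) = ((1 - f)/n)(S_y² + β₀²S_x² - 2β₀S_xy)

where f = n/N is the sampling fraction and

S_x² = Σ(X_i - X̄)² / (N - 1)

S_y² = Σ(Y_i - Ȳ)² / (N - 1)

S_xy = Σ(X_i - X̄)(Y_i - Ȳ) / (N - 1)

with the sums taken over all N population units. The variance is estimated by replacing these population quantities with their sample counterparts s_x², s_y² and s_xy (sums over the n sampled units, divisor n - 1):

v(Ȳ-hat) = ((1 - f)/n)(s_y² + β₀²s_x² - 2β₀s_xy)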
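As an illustration of the pre-assigned case, here is a minimal Python sketch that computes the regression estimate and its estimated variance for a given β₀. The function name, the sample values, the population mean of x, and the population size are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def regression_estimate_fixed_beta(xs, ys, pop_mean_x, pop_size, beta0):
    """Regression estimate of the population mean of y with a pre-assigned beta0.

    Returns (estimate, estimated variance) for a sample drawn by SRSWOR.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample variances and covariance with divisor n - 1.
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

    estimate = y_bar + beta0 * (pop_mean_x - x_bar)

    f = n / pop_size  # sampling fraction
    var_hat = (1 - f) / n * (s_yy + beta0 ** 2 * s_xx - 2 * beta0 * s_xy)
    return estimate, var_hat


# Hypothetical sample and population values, invented for illustration.
xs = [10, 12, 15, 18, 20]
ys = [25, 29, 36, 41, 47]
estimate, var_hat = regression_estimate_fixed_beta(
    xs, ys, pop_mean_x=16, pop_size=50, beta0=2.0
)
print(f"estimate = {estimate:.3f}, estimated variance = {var_hat:.4f}")
```

The bracketed term in the variance is simply the sample variance of the differences y_i - β₀x_i, which is why the estimated variance can never be negative.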

The variance of (Ȳ-hat) increases as the difference between β₀ and β_opt increases. To find the optimal value of 'β', we minimize the variance of (Ȳ-hat) by differentiating it with respect to 'β' and equating the derivative to zero. Mathematically,

d Var(Ȳ-hat)/dβ = ((1 - f)/n)(2βS_x² - 2S_xy) = 0, so that β_opt = S_xy / S_x²

and the minimum variance is

Var_min(Ȳ-hat) = ((1 - f)/n) S_y²(1 - ρ²)

where ρ = S_xy / (S_x S_y) is the population correlation coefficient between x and y. For any other pre-assigned value β₀,

Var(Ȳ-hat) = ((1 - f)/n)[(1 - ρ²)S_y² + (β₀ - β_opt)²S_x²]

For comparison, the variance of the sample mean under SRSWOR is

Var(ȳ) = ((1 - f)/n) S_y²

Since ρ² lies between 0 and 1, the minimum variance of (Ȳ-hat) is less than or equal to Var(ȳ), with equality only when ρ = 0. Thus, with β at its optimum value, the regression estimate is never less efficient than the simple mean under SRSWOR, although a poorly chosen β₀ can forfeit this gain.
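The effect of the choice of β₀ on the variance can be seen in a small calculation. The sketch below is a hedged Python illustration on an invented six-unit population; the function name and the data are assumptions, not part of the original practical. It evaluates the SRSWOR variance formulas at the optimum β and at a value three times larger.

```python
def srswor_variances(population, n, beta0):
    """Variances under SRSWOR of the sample mean and of y_bar + beta0 * (X_bar - x_bar)."""
    N = len(population)
    X_bar = sum(x for x, _ in population) / N
    Y_bar = sum(y for _, y in population) / N

    # Population variances and covariance with divisor N - 1.
    S_xx = sum((x - X_bar) ** 2 for x, _ in population) / (N - 1)
    S_yy = sum((y - Y_bar) ** 2 for _, y in population) / (N - 1)
    S_xy = sum((x - X_bar) * (y - Y_bar) for x, y in population) / (N - 1)

    factor = (1 - n / N) / n  # (1 - f) / n
    rho_sq = S_xy ** 2 / (S_xx * S_yy)
    return {
        "beta_opt": S_xy / S_xx,
        "var_mean": factor * S_yy,
        "var_reg": factor * (S_yy + beta0 ** 2 * S_xx - 2 * beta0 * S_xy),
        "var_min": factor * S_yy * (1 - rho_sq),
    }


# Hypothetical population, invented for illustration.
population = [(2, 5), (4, 9), (5, 10), (7, 15), (9, 17), (12, 25)]

beta_opt = srswor_variances(population, 3, 0.0)["beta_opt"]
at_opt = srswor_variances(population, 3, beta_opt)
far_off = srswor_variances(population, 3, 3 * beta_opt)

print(f"Var(sample mean)        = {at_opt['var_mean']:.4f}")
print(f"Var at beta_opt         = {at_opt['var_reg']:.4f}")
print(f"Var at 3 * beta_opt     = {far_off['var_reg']:.4f}")
```

At β_opt the variance equals the minimum value and lies below that of the sample mean, whereas at 3β_opt it exceeds the variance of the sample mean. In general, the regression estimator with a fixed β₀ beats the sample mean only when β₀ lies between 0 and 2β_opt.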

2) When 'β' is estimated from the sample

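When 'β' is unknown, it is estimated from a sample drawn by SRSWOR, and the regression estimator is given by

(Ȳ-hat) = ȳ + (β-hat)(X̄ - x̄)

where

β-hat = s_xy / s_x², with s_xy = Σ(x_i - x̄)(y_i - ȳ) / (n - 1) and s_x² = Σ(x_i - x̄)² / (n - 1)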

Since it is difficult to find exact expressions for the expectation and variance of (Ȳ-hat) when 'β' is estimated from the sample, they are approximated using the methodology given below-

and;

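The expectation of the regression estimate (Ȳ-hat) can be written as

E(Ȳ-hat) = Ȳ - Cov(β-hat, x̄)

so the estimator is in general biased when 'β' is estimated, although the bias becomes negligible in large samples.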
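When β is estimated from the sample, the calculation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the data, the population mean of x and the population size are invented, the variance uses the usual large-sample approximation ((1 - f)/n) s_y²(1 - r²), and the interval uses the normal quantile 1.96 (a t quantile may be more appropriate for small samples).

```python
from math import sqrt


def regression_estimate(xs, ys, pop_mean_x, pop_size, z=1.96):
    """Regression estimate of the mean of y with the slope estimated from the sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample variances and covariance with divisor n - 1.
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

    b = s_xy / s_xx                      # estimated regression coefficient
    r = s_xy / sqrt(s_xx * s_yy)         # sample correlation coefficient
    estimate = y_bar + b * (pop_mean_x - x_bar)

    # Large-sample approximation to the variance (an assumption of this sketch).
    var_hat = (1 - n / pop_size) / n * s_yy * (1 - r ** 2)
    half_width = z * sqrt(var_hat)
    return {
        "slope": b,
        "correlation": r,
        "estimate": estimate,
        "var_hat": var_hat,
        "ci": (estimate - half_width, estimate + half_width),
    }


# Hypothetical sample and population values, invented for illustration.
xs = [10, 12, 15, 18, 20]
ys = [25, 29, 36, 41, 47]
result = regression_estimate(xs, ys, pop_mean_x=16, pop_size=50)
print(result)
```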

The R-code for Regression estimation is-

An example comparing the ratio and regression estimators, worked in R, is given below.

The objective of the practical was to estimate the average real estate farm loans using the regression estimate, given that the average non-real estate farm loans in the country was known to be 878.16. The aim is to calculate the correlation between the two variables and a 95% confidence interval for the estimate. In addition, the regression and ratio estimates are compared to determine which is more efficient.
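The comparison described above can be sketched in a few lines. The following Python sketch is illustrative only and does not reproduce the class dataset: the sample, the population mean of x and the population size are hypothetical, and both variances are the usual large-sample approximations with sample quantities plugged in.

```python
def compare_ratio_and_regression(xs, ys, pop_mean_x, pop_size):
    """Ratio and regression estimates of the mean of y, with approximate variances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample variances and covariance with divisor n - 1.
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    factor = (1 - n / pop_size) / n  # (1 - f) / n

    # Ratio estimator: (y_bar / x_bar) * X_bar.
    r_hat = y_bar / x_bar
    ratio_est = r_hat * pop_mean_x
    ratio_var = factor * (s_yy + r_hat ** 2 * s_xx - 2 * r_hat * s_xy)

    # Regression estimator: y_bar + b * (X_bar - x_bar), with b = s_xy / s_xx.
    b = s_xy / s_xx
    reg_est = y_bar + b * (pop_mean_x - x_bar)
    reg_var = factor * (s_yy - s_xy ** 2 / s_xx)  # equals factor * s_yy * (1 - r^2)

    return {
        "ratio_est": ratio_est,
        "ratio_var": ratio_var,
        "reg_est": reg_est,
        "reg_var": reg_var,
    }


# Hypothetical data in which the fitted line does not pass through the origin.
xs = [3, 5, 8, 10, 14]
ys = [22, 25, 31, 34, 43]
result = compare_ratio_and_regression(xs, ys, pop_mean_x=9, pop_size=60)
print(result)
print("regression more efficient:", result["reg_var"] < result["ratio_var"])
```

With these plug-in formulas the estimated variance of the regression estimator cannot exceed that of the ratio estimator, and the two coincide only when the estimated slope equals the ratio ȳ/x̄, that is, when the fitted line passes through the origin.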

Data and Descriptive Statistics: In this step we install and load the required packages and import the dataset, which was given to us in class. The minimum values of the variables x and y are 0.471 and 6.044 respectively, and the maximum values are 3928.732 and 1756.169. The means of x and y are 899.358 and 602.551, and the medians are 431.439 and 408.978. Since the mean is greater than the median for both variables, we can infer that both distributions are positively (right) skewed.
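The direction of skew can also be checked directly rather than inferred from the mean and median alone. Below is a minimal Python sketch using the moment coefficient of skewness; the data are hypothetical and are not the class dataset.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    centre = mean(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical values with one large observation in the right tail.
values = [1, 2, 2, 3, 3, 4, 20]
print(f"mean = {mean(values)}, median = {median(values)}, skewness = {skewness(values):.3f}")
```

Here the mean lies above the median and the skewness coefficient is positive, the pattern of a distribution with a long right tail.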