Regression Estimation
Regression Method of Estimation
Khushi Dhasmana
2148136
One of the main objectives of statistical estimation is to obtain precise
estimators of the parameters of interest. Incorporating additional information
that is valid and relevant leads to better estimators. Two methods that do
this are the ratio and regression methods of estimation. The ratio method uses
auxiliary information that is correlated with the variable of interest to
improve precision. It works best when the regression of Y on X is linear and
the line passes through the origin. However, even when the regression is
linear, the line does not necessarily pass through the origin. In such
situations, it is better to use the regression estimator to estimate the
population parameters.
In simpler words, when the relationship between the ‘x’ and ‘y’ variables is
approximately linear and the line does not pass through the origin, it is
advisable to use an estimate based on the linear regression of ‘y’ on ‘x’
rather than the ratio of the two variables. Regression estimation thus uses
the relationship between the study variable (usually ‘y’) and one or more
auxiliary variables (usually ‘x’) to estimate the characteristics of ‘y’.
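Consider the quantity (x̄-X̄), whose expectation is 0 {E(x̄-X̄)=0}. Adding a multiple of it to ȳ gives an improved estimator (Ȳ-hat) of Ȳ which remains unbiased for any constant 'μ', and which is given by:
\[ \hat{\bar{Y}}^{*} = \bar{y} + \mu(\bar{x} - \bar{X}) \]
where 'μ' is a constant chosen so that the variance of the estimator (Ȳ-hat) is minimum. The variance and its minimizer are:
\[ Var(\hat{\bar{Y}}^{*}) = Var(\bar{y}) + \mu^{2}\,Var(\bar{x}) + 2\mu\,Cov(\bar{x}, \bar{y}) \]
\[ \frac{\partial\,Var(\hat{\bar{Y}}^{*})}{\partial \mu} = 0 \;\Rightarrow\; \mu = -\frac{Cov(\bar{x}, \bar{y})}{Var(\bar{x})} = -\frac{S_{xy}}{S_x^{2}} \]
Here S_x^2 is the population variance of x and S_xy is the population covariance of x and y.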
Consider a simple linear regression model:
\[ y = \alpha + \beta x + e \]
where 'y' is the dependent variable, 'x' is the independent variable, 'α' is the intercept and 'e' is the error which arises due to the lack of an exact relationship between x and y. The value of 'β' is obtained by minimizing the sum of squares of the error terms. Thus, 'β' can also be written as:
\[ \beta = \frac{Cov(x, y)}{Var(x)} = \frac{S_{xy}}{S_x^{2}} \]
Thus the optimum value of 'μ' is equal to the negative of the regression coefficient 'β', i.e.
\[ \mu = -\beta \]
and therefore the regression estimator can be written as:
\[ \hat{\bar{Y}}_{reg} = \bar{y} + \beta(\bar{X} - \bar{x}) \]
with minimum variance
\[ Var(\hat{\bar{Y}}_{reg}) = Var(\bar{y})\,[1 - \rho^{2}(\bar{x}, \bar{y})] \]
where ρ(x̄, ȳ) is the correlation coefficient between x̄ and ȳ.
Hence, if these two are highly correlated, the variance is very small and
the estimator is highly efficient.
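The minimization over 'μ' can be checked numerically. The following is a minimal sketch in base R on a simulated population; the population, the sample size and the helper name var_diff are all assumptions made for illustration and are not taken from the practical discussed later.

```r
# Hypothetical finite population (simulated; not the data used in this post)
set.seed(1)
N <- 500
x <- runif(N, 10, 100)
y <- 20 + 1.5 * x + rnorm(N, sd = 10)
n <- 50
f <- (N - n) / N                      # finite population correction

# Variance of ybar + mu * (xbar - Xbar) under SRSWOR, as a function of mu
var_diff <- function(mu) {
  (f / n) * (var(y) + mu^2 * var(x) + 2 * mu * cov(x, y))
}

beta   <- cov(x, y) / var(x)          # population regression coefficient
mu_opt <- optimize(var_diff, interval = c(-10, 10))$minimum

# The numerical minimizer should agree with -beta
c(mu_opt = mu_opt, minus_beta = -beta)

# The minimum variance should agree with Var(ybar) * (1 - rho^2)
c(var_min = var_diff(-beta),
  formula = (f / n) * var(y) * (1 - cor(x, y)^2))
```

The two printed pairs should agree up to the tolerance of optimize(), which illustrates that the optimum 'μ' is the negative of the regression coefficient.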
Regression estimators can be constructed either with a pre-assigned ‘β’ or
with a ‘β’ computed from the sample. Usually ‘β’ is estimated from the sample
results, but it can also be fixed as a constant beforehand.
1) When 'β' is pre-assigned
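Let the pre-assigned value be β0, so that the estimator is
\[ \hat{\bar{Y}}_{reg} = \bar{y} + \beta_0(\bar{X} - \bar{x}) \]
Under SRSWOR, E(ȳ) = Ȳ and E(x̄) = X̄, so
\[ E(\hat{\bar{Y}}_{reg}) = \bar{Y} + \beta_0(\bar{X} - \bar{X}) = \bar{Y} \]
Thus we can conclude that the estimator is unbiased when β is known. Similarly, the variance of the estimator is:
\[ Var(\hat{\bar{Y}}_{reg}) = \frac{f}{n}\left(S_y^{2} + \beta_0^{2} S_x^{2} - 2\beta_0 S_{xy}\right), \qquad f = \frac{N-n}{N} \]
The estimator of the variance is obtained by replacing the population quantities with their sample counterparts:
\[ \widehat{Var}(\hat{\bar{Y}}_{reg}) = \frac{f}{n}\left(s_y^{2} + \beta_0^{2} s_x^{2} - 2\beta_0 s_{xy}\right) \]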
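We can also conclude that the variance of (Ȳ-hat) in regression estimation
increases as the difference between β0 and βopt increases:
\[ Var(\hat{\bar{Y}}_{reg}) = \frac{f}{n}\left[(1-\rho^{2}) S_y^{2} + (\beta_0 - \beta_{opt})^{2} S_x^{2}\right], \qquad \beta_{opt} = \frac{S_{xy}}{S_x^{2}}, \quad f = \frac{N-n}{N} \]
where the minimum value of the variance of (Ȳ-hat), attained at the optimum value of β, is
\[ Var_{min}(\hat{\bar{Y}}_{reg}) = \frac{f}{n}(1-\rho^{2}) S_y^{2} \]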
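When we compare the variance of ȳ with the variance of the estimator (Ȳ-hat) at the optimum β, we see that Var(ȳ) is greater than or equal to the variance of (Ȳ-hat):
\[ Var(\bar{y}) = \frac{N-n}{Nn} S_y^{2} \;\ge\; \frac{N-n}{Nn}(1-\rho^{2}) S_y^{2} = Var_{min}(\hat{\bar{Y}}_{reg}) \]
Since 'ρ' lies between -1 and 1, the factor (1-ρ²) lies between 0 and 1, so the variance of (Ȳ-hat) in linear regression is less than or equal to the variance of the sample mean under SRS. Thus the regression estimate with the optimum β is never less precise than the simple mean under SRSWOR, and it is strictly more precise whenever ρ is not 0. For a pre-assigned β0 far from the optimum, this advantage can be lost.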
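The behaviour of the estimator with a pre-assigned β can be illustrated by simulation. The sketch below, in base R, uses a simulated population and an arbitrary value β0 = 1; both are assumptions for illustration only. It compares the Monte Carlo variance with the standard SRSWOR formula (f/n)(S_y² + β0² S_x² - 2 β0 S_xy), and with the variances at the optimum β and for the plain sample mean.

```r
# Simulated population (hypothetical values, for illustration only)
set.seed(2)
N <- 1000
x <- rgamma(N, shape = 2, scale = 50)
y <- 30 + 0.8 * x + rnorm(N, sd = 15)
Xbar <- mean(x)
n <- 40
f <- (N - n) / N

# Theoretical variance of ybar + beta0 * (Xbar - xbar) under SRSWOR
var_reg <- function(beta0) {
  (f / n) * (var(y) + beta0^2 * var(x) - 2 * beta0 * cov(x, y))
}
beta_opt <- cov(x, y) / var(x)

# Monte Carlo check for one pre-assigned value (beta0 = 1 is arbitrary)
beta0 <- 1
est <- replicate(5000, {
  s <- sample(N, n)                              # SRSWOR draw of n units
  mean(y[s]) + beta0 * (Xbar - mean(x[s]))
})

round(c(bias       = mean(est) - mean(y),        # should be close to 0
        mc_var     = var(est),
        theory_var = var_reg(beta0),
        var_opt    = var_reg(beta_opt),
        var_srs    = (f / n) * var(y)), 3)
```

In this setup the bias should be close to zero, the simulated and theoretical variances should be close, and the variance at the optimum β should be the smallest of the three.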
2) When 'β' is estimated from the sample
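When 'β' is unknown, it is estimated from a sample drawn by SRSWOR, and the
regression estimator is given by:
\[ \hat{\bar{Y}}_{reg} = \bar{y} + \hat{\beta}(\bar{X} - \bar{x}), \qquad \hat{\beta} = \frac{s_{xy}}{s_x^{2}} \]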
Since it is difficult to find exact expressions for the expectation and the variance of (Ȳ-hat) in regression in this case, they are approximated using a
different methodology, given below-
and;
The
expectation of the (Ȳ-hat) regression estimate can be written as-
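For orientation, the standard large-sample results for this case can be summarized as follows. They are stated here from general sampling theory rather than derived: the bias is of order 1/n, and the variance, to the first order of approximation, equals the minimum variance obtained with the optimum β.

```latex
E(\hat{\bar{Y}}_{reg}) = \bar{Y} + O\!\left(\tfrac{1}{n}\right),
\qquad
Var(\hat{\bar{Y}}_{reg}) \approx \frac{N-n}{Nn}\, S_y^{2}\,(1 - \rho^{2})
```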
The R-code for Regression estimation is-
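A minimal sketch of such a computation in base R, assuming a sample of (x, y) pairs, a known population mean of x and a known population size N. The function name regression_estimate and all data values are hypothetical and simulated. The standard error uses the usual large-sample formula based on the residual mean square.

```r
# Hypothetical inputs: a sample of (x, y) pairs, the known population mean
# of x, and the population size N. All values are simulated for illustration.
set.seed(3)
N    <- 3000
Xbar <- 100
samp <- data.frame(x = rgamma(60, shape = 4, scale = 25))
samp$y <- 50 + 0.6 * samp$x + rnorm(60, sd = 12)

regression_estimate <- function(x, y, Xbar, N, level = 0.95) {
  n    <- length(y)
  b    <- cov(x, y) / var(x)                  # beta estimated from the sample
  est  <- mean(y) + b * (Xbar - mean(x))      # regression estimator of Ybar
  res  <- y - mean(y) - b * (x - mean(x))     # least-squares residuals
  s2e  <- sum(res^2) / (n - 2)                # residual mean square
  se   <- sqrt((1 - n / N) * s2e / n)         # large-sample standard error
  tval <- qt(1 - (1 - level) / 2, df = n - 2)
  list(beta = b, estimate = est, se = se,
       ci = est + c(-1, 1) * tval * se)
}

fit <- regression_estimate(samp$x, samp$y, Xbar, N)
fit
```

The estimated slope coincides with the slope from lm(y ~ x, data = samp), so the estimate is the fitted regression line evaluated at the known X̄.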
An example comparing the ratio and regression estimators, carried out in R,
is given below.
The objective of the practical was to estimate the average real estate farm
loans using the regression estimator, given that the average non-real estate
farm loans in the country was known to be 878.16. The aim was also to calculate
the correlation between the two variables and a 95% confidence interval for the
estimate. In addition, the regression and ratio estimates were compared to
determine which is more efficient.
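The kind of comparison described here can be sketched by simulation. The base R code below is an illustration under stated assumptions, not the analysis from the practical: the population is simulated with a nonzero intercept, and the efficiency of the sample mean, the ratio estimator and the regression estimator is compared through their mean squared errors over repeated SRSWOR samples.

```r
# Simulated population with a nonzero intercept (hypothetical values)
set.seed(4)
N <- 2000
x <- rgamma(N, shape = 1.2, scale = 700)         # skewed auxiliary variable
y <- 200 + 0.45 * x + rnorm(N, sd = 150)         # line not through the origin
Xbar <- mean(x)
n <- 50

one_sample <- function() {
  s  <- sample(N, n)                             # SRSWOR draw of n units
  xs <- x[s]
  ys <- y[s]
  b  <- cov(xs, ys) / var(xs)                    # beta estimated from the sample
  c(mean  = mean(ys),
    ratio = mean(ys) / mean(xs) * Xbar,
    reg   = mean(ys) + b * (Xbar - mean(xs)))
}
sims <- t(replicate(4000, one_sample()))         # one row per simulated sample

mse <- colMeans((sims - mean(y))^2)              # mean squared error per estimator
round(rbind(mse = mse, rel_eff_vs_mean = mse["mean"] / mse), 3)
```

Because the simulated line has a sizeable intercept, the regression estimator should show the smallest mean squared error here; with a line through the origin the ratio estimator would be expected to perform comparably.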
Data and Descriptive Statistics: In this step we install the required packages and load their libraries, and then import the dataset which was given to us in class. The minimum values of the variables x and y are 0.471 and 6.044 respectively. The maximum values are 3928.732 and 1756.169 respectively. The means of x and y are 899.358 and 602.551 respectively. The medians of the two variables are 431.439 and 408.978 respectively. Since the mean is greater than the median for both variables, we can infer that both distributions are positively (right) skewed.
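The link between the mean, the median and the direction of skewness can be checked with a short base R sketch. The values below are simulated and are not the class dataset; the variable name loans and the helper skewness are hypothetical.

```r
# Simulated right-skewed values (not the dataset used in the practical)
set.seed(5)
loans <- rgamma(200, shape = 0.9, scale = 900)

# Moment coefficient of skewness: third central moment over variance^(3/2)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

summary(loans)                                   # min, quartiles, mean, max
c(mean_minus_median = mean(loans) - median(loans),
  skewness          = skewness(loans))
```

A positive skewness coefficient together with a mean above the median indicates a long right tail.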


























