November 27, 2011

2.1 A HYPOTHETICAL EXAMPLE1 (Damodar N. Gujarati)

As noted in Section 1.2, regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).2 To understand this, consider the data given in Table 2.1.

The data in the table refer to a total population of 60 families in a hypothetical community, together with their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260), and the weekly expenditures of each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of X and the corresponding Y values against each of the X values; so to speak, there are 10 Y subpopulations.
There is considerable variation in weekly consumption expenditure in each income group, which can be seen clearly from Figure 2.1. But the general picture that one gets is that, despite the variability of weekly consumption expenditure within each income bracket, on the average, weekly consumption expenditure increases as income increases. To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly income level of $80, the mean consumption expenditure is $65, while corresponding to the income level of $200, it is $137. In all, we have 10 mean values for the 10 subpopulations of Y. We call these mean values conditional expected values, as they depend on the given values of the (conditioning) variable X. Symbolically, we denote them as E(Y | X), which is read as "the expected value of Y given the value of X" (see also Table 2.2).
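Since the individual entries of Table 2.1 are not reproduced here, the minimal Python sketch below uses a handful of made-up (X, Y) pairs, not the book's data, to show that each conditional expected value E(Y | X) is simply the mean of Y within the subpopulation of families sharing one value of X. The list `families` and the helper `conditional_means` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Made-up (weekly income X, weekly consumption Y) pairs in dollars.
# Hypothetical illustration only; these are NOT the data of Table 2.1.
families = [
    (80, 60), (80, 66), (80, 72),
    (120, 85), (120, 90), (120, 95), (120, 98),
    (160, 110), (160, 116),
]

def conditional_means(pairs):
    """Return {x: E(Y | X = x)}, the mean of Y within each X subpopulation."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    # One mean per distinct income level, listed in increasing order of X.
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}

print(conditional_means(families))  # {80: 66.0, 120: 92.0, 160: 113.0}
```

Each key plays the role of one income level, and each value is one of the dark circled points that Figure 2.1 plots against it.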
It is important to distinguish these conditional expected values from the unconditional expected value of weekly consumption expenditure, E(Y). If we add the weekly consumption expenditures for all 60 families in the population and divide this number by 60, we get $121.20 ($7,272/60), which is the unconditional mean, or expected value, of weekly consumption expenditure, E(Y). It is unconditional in the sense that in arriving at this number we have disregarded the income levels of the various families.3 Obviously, the various conditional expected values of Y given in Table 2.1 are different from the unconditional expected value of Y of $121.20. When we ask the question, "What is the expected value of weekly consumption expenditure of a family?", we get the answer $121.20 (the unconditional mean). But if we ask the question, "What is the expected value of weekly consumption expenditure of a family whose weekly income is, say, $140?", we get the answer $101 (the conditional mean).
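A short sketch of the same distinction, again with made-up pairs rather than Table 2.1: the unconditional mean E(Y) pools every family's Y and ignores X entirely, which is the computation behind $7,272/60. As an extra check that the text does not state, it also confirms the standard identity (the law of total expectation) that E(Y) equals the conditional means weighted by each income group's share of the families. The variable names are hypothetical.

```python
import math
from collections import Counter

# Made-up (weekly income X, weekly consumption Y) pairs; NOT Table 2.1.
families = [
    (80, 60), (80, 66), (80, 72),
    (120, 85), (120, 90), (120, 95), (120, 98),
    (160, 110), (160, 116),
]

print(7272 / 60)  # the text's unconditional mean: 121.2

# Unconditional mean E(Y): pool all families and ignore income.
ys = [y for _, y in families]
unconditional_mean = sum(ys) / len(ys)
print(unconditional_mean)  # 88.0

# Conditional means E(Y | X = x), one per income level.
counts = Counter(x for x, _ in families)
cond = {x: sum(y for xi, y in families if xi == x) / n for x, n in counts.items()}
print(cond)  # {80: 66.0, 120: 92.0, 160: 113.0}

# Weighting each conditional mean by its group's share of families recovers E(Y).
weighted = sum(cond[x] * n / len(families) for x, n in counts.items())
print(math.isclose(weighted, unconditional_mean))  # True
```

None of the made-up conditional means (66, 92, 113) equals the unconditional mean of 88, in line with footnote 3's remark that the two generally differ.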

To put it differently, if we ask the question, "What is the best (mean) prediction of weekly expenditure of families with a weekly income of $140?", the answer would be $101. Thus the knowledge of the income level may enable us to better predict the mean value of consumption expenditure than if we did not have that knowledge.4 This is probably the essence of regression analysis, as we shall discover throughout this text.
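One way to make "better predict" concrete is mean squared prediction error. That criterion is our choice here, not something stated in the text. The sketch below works on made-up data rather than Table 2.1. It predicts each family's Y twice: once with the unconditional mean and once with the conditional mean of its income group. The helper `mse` is hypothetical.

```python
# Made-up (weekly income X, weekly consumption Y) pairs; NOT Table 2.1.
families = [
    (80, 60), (80, 66), (80, 72),
    (120, 85), (120, 90), (120, 95), (120, 98),
    (160, 110), (160, 116),
]

def mse(pairs, predict):
    """Mean squared error when each family's Y is predicted by predict(x)."""
    return sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs)

# Without knowledge of income: predict everyone with E(Y).
ys = [y for _, y in families]
overall_mean = sum(ys) / len(ys)

# With knowledge of income: predict with E(Y | X) for the family's group.
groups = {}
for x, y in families:
    groups.setdefault(x, []).append(y)
cond_mean = {x: sum(v) / len(v) for x, v in groups.items()}

mse_without_x = mse(families, lambda x: overall_mean)
mse_with_x = mse(families, lambda x: cond_mean[x])
print(round(mse_without_x, 2), round(mse_with_x, 2))  # 328.22 20.89
```

On these numbers, knowing X cuts the mean squared error from about 328 to about 21. With real data, the size of the gain depends on how far apart the conditional means are relative to the spread within each group.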
The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values. If we join these conditional mean values, we obtain what is known as the population regression line (PRL), or more generally, the population regression curve.5 More simply, it is the regression of Y on X. The adjective "population" comes from the fact that in this example we are dealing with the entire population of 60 families. Of course, in reality a population may have many families.
Geometrically, a population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting the means of the subpopulations of Y corresponding to the given values of the regressor X. It can be depicted as in Figure 2.2.

This figure shows that for each X (i.e., income level) there is a population of Y values (weekly consumption expenditures) spread around the (conditional) mean of those Y values, and that the regression line (or curve) passes through these (conditional) mean values.
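Footnote 5 says the PRL in this example is a straight line. The text quotes three of its ten conditional means: $65 at X = $80, $101 at X = $140, and $137 at X = $200. The minimal Python check below confirms that these three points are collinear and recovers the line through them using exact fractions. It checks only these three quoted points, not the full Table 2.1. The names `b1` (intercept) and `b2` (slope) are our own labels.

```python
from fractions import Fraction

# (weekly income X, conditional mean E(Y | X)) in dollars, as quoted in the text.
points = [(80, 65), (140, 101), (200, 137)]

# Slope between each pair of consecutive conditional means.
slopes = [Fraction(y2 - y1, x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
print(slopes)  # [Fraction(3, 5), Fraction(3, 5)]: equal slopes, so collinear

b2 = slopes[0]                          # slope
b1 = points[0][1] - b2 * points[0][0]   # intercept, from the first point
print(b1, b2)  # 17 3/5, i.e. E(Y | X) = 17 + 0.6 X

# Every quoted conditional mean lies on the line.
print(all(y == b1 + b2 * x for x, y in points))  # True
```

On these three points, each extra dollar of weekly income raises mean weekly consumption by 60 cents.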
With this background, the reader may find it instructive to reread the definition of regression given in Section 1.2.
________________________

1. The reader whose statistical knowledge has become somewhat rusty may want to freshen it up by reading the statistical appendix, App. A, before reading this chapter.
2. The expected value, or expectation, or population mean of a random variable Y is denoted by the symbol E(Y). On the other hand, the mean value computed from a sample of values from the Y population is denoted as Ȳ, read as "Y bar".
3. As shown in App. A, in general the conditional and unconditional mean values are different.
4. I am indebted to James Davidson for this perspective. See James Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 200, p. 11.
5. In the present example the PRL is a straight line, but it could be a curve (see Figure 2.3).
