December 3, 2011

2.6 THE SAMPLE REGRESSION FUNCTION (SRF) (Damodar N. Gujarati)

By confining our discussion so far to the population of Y values corresponding to the fixed X's, we have deliberately avoided sampling considerations (note that the data of Table 2.1 represent the population, not a sample). But it is about time to face up to the sampling problems, for in most practical situations what we have is but a sample of Y values corresponding to some fixed X's. Therefore, our task now is to estimate the PRF on the basis of the sample information.

As an illustration, pretend that the population of Table 2.1 was not known to us and the only information we had was a randomly selected sample of Y values for the fixed X's as given in Table 2.4. Unlike Table 2.1, we now have only one Y value corresponding to each given X; each Y (given Xi) in Table 2.4 is chosen randomly from similar Y's corresponding to the same Xi from the population of Table 2.1.

The question is: From the sample of Table 2.4, can we predict the average weekly consumption expenditure Y in the population as a whole corresponding to the chosen X's? In other words, can we estimate the PRF from the sample data? As the reader surely suspects, we may not be able to estimate the PRF "accurately" because of sampling fluctuations. To see this, suppose we draw another random sample from the population of Table 2.1, as presented in Table 2.5.

Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In the scattergram two sample regression lines are drawn so as to "fit" the scatter reasonably well: SRF1 is based on the first sample, and SRF2 is based on the second sample.

Which of the two regression lines represents the "true" population regression line? If we avoid the temptation of looking at Figure 2.1, which purportedly represents the population regression (PR), there is no way we can be absolutely sure that either of the regression lines shown in Figure 2.4 represents the true population regression line (or curve). The regression lines in Figure 2.4 are known as the sample regression lines. Supposedly they represent the population regression line, but because of sampling fluctuations they are at best an approximation of the true PR. In general, we would get N different SRFs for N different samples, and these SRFs are not likely to be the same.
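The effect of sampling fluctuations can be sketched with a small simulation. Everything in it is an assumption made for illustration: the population is synthetic (it is not the data of Table 2.1), the PRF parameters BETA1 and BETA2 are hypothetical, and the lines are fitted by least squares, which is only one way of "fitting the scatter reasonably well".

```python
import random

BETA1, BETA2 = 20.0, 0.5             # hypothetical PRF: E(Y | X) = 20 + 0.5 X
X_LEVELS = list(range(80, 280, 20))  # ten fixed X values: 80, 100, ..., 260


def build_population(rng, per_level=6, spread=10.0):
    """Map each fixed X to several Y values scattered around the assumed PRF."""
    return {
        x: [BETA1 + BETA2 * x + rng.uniform(-spread, spread)
            for _ in range(per_level)]
        for x in X_LEVELS
    }


def draw_sample(population, rng):
    """Pick one Y at random for each fixed X (one row per X, as in a sample table)."""
    return [(x, rng.choice(ys)) for x, ys in population.items()]


def fit_line(sample):
    """Fit a straight line by least squares; return (intercept, slope)."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)               # fixed seed so the run is repeatable
population = build_population(rng)
srf1 = fit_line(draw_sample(population, rng))
srf2 = fit_line(draw_sample(population, rng))

print("SRF1: intercept=%.3f slope=%.3f" % srf1)
print("SRF2: intercept=%.3f slope=%.3f" % srf2)
print("assumed PRF: intercept=%.3f slope=%.3f" % (BETA1, BETA2))
```

Each call to draw_sample plays the role of one sample table, so the two fitted lines generally differ from each other and from the assumed PRF, even though both samples come from the same population.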

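Now, analogously to the PRF that underlies the population regression line, we can develop the concept of the sample regression function (SRF) to represent the sample regression line. The sample counterpart of (2.2.2) may be written as

^Yi = ^B1 + ^B2 Xi                          (2.6.1)

where ^Yi is the estimator of E(Y | Xi), ^B1 is the estimator of B1, and ^B2 is the estimator of B2.
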
FIGURE 2.5 Sample and population regression lines

Because of sampling fluctuations, our estimate of the PRF based on the SRF is at best an approximate one. This approximation is shown diagrammatically in Figure 2.5.

For X = Xi, we have one (sample) observation Y = Yi. In terms of the SRF, the observed Yi can be expressed as

Yi = ^Yi + ^ui                              (2.6.3)

and in terms of the PRF, it can be expressed as

Yi = E(Y | Xi) + ui                         (2.6.4)
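A small numerical sketch may help keep the two decompositions apart. The parameter values, the estimates, and the single observation below are all hypothetical, and both the PRF and the SRF are assumed to be linear in X. In practice the PRF side cannot be computed, since the true parameters are unknown.

```python
# Hypothetical numbers, for illustration only.
BETA1, BETA2 = 20.0, 0.5       # assumed true PRF parameters (unknown in practice)
B1_HAT, B2_HAT = 26.0, 0.47    # assumed estimates obtained from one sample


def decompose(x_i, y_i):
    """Split one observed Yi in the two ways given by (2.6.3) and (2.6.4)."""
    y_hat = B1_HAT + B2_HAT * x_i   # ^Yi: the SRF value at Xi
    u_hat = y_i - y_hat             # ^ui: the residual, so Yi = ^Yi + ^ui
    e_y = BETA1 + BETA2 * x_i       # E(Y | Xi): the PRF value at Xi
    u = y_i - e_y                   # ui: the disturbance, so Yi = E(Y | Xi) + ui
    return {"y_hat": y_hat, "u_hat": u_hat, "e_y": e_y, "u": u}


parts = decompose(x_i=100, y_i=75.0)
for name, value in parts.items():
    print("%-5s = %.2f" % (name, value))
```

Both splits add back up to the same observed Yi; they differ only in whether the systematic part comes from the sample (^Yi) or from the population (E(Y | Xi)).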
 
Now obviously in Figure 2.5 ^Yi overestimates the true E(Y | Xi) for the Xi shown therein. By the same token, for any Xi to the left of the point A, the SRF will underestimate the true PRF. But the reader can readily see that such over- and underestimation is inevitable because of sampling fluctuations.
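The role of the point A can be sketched as follows, assuming two straight lines with hypothetical coefficients: an SRF that is steeper than the PRF, so that the two cross once. These numbers are not read off Figure 2.5.

```python
# Hypothetical lines, for illustration only.
BETA1, BETA2 = 50.0, 0.5       # assumed PRF: E(Y | X) = 50 + 0.5 X
B1_HAT, B2_HAT = 10.0, 0.75    # assumed SRF: ^Y = 10 + 0.75 X (steeper than the PRF)


def prf(x):
    return BETA1 + BETA2 * x


def srf(x):
    return B1_HAT + B2_HAT * x


# X-coordinate of the crossing point A, where srf(x) == prf(x).
x_a = (BETA1 - B1_HAT) / (B2_HAT - BETA2)


def classify(x):
    """Say whether the SRF over- or underestimates the PRF at X = x."""
    diff = srf(x) - prf(x)
    if diff > 0:
        return "overestimates"
    if diff < 0:
        return "underestimates"
    return "equals"


print("A is at X =", x_a)
for x in (100, 160, 200):
    print("X = %d: SRF %s the PRF" % (x, classify(x)))
```

With these assumed coefficients the SRF lies below the PRF to the left of A and above it to the right, which matches the pattern described for Figure 2.5; a different sample could just as well produce the opposite pattern.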
 
The critical question now is: Granted that the SRF is but an approximation of the PRF, can we devise a rule or a method that will make this approximation as "close" as possible? In other words, how should the SRF be constructed so that ^B1 is as "close" as possible to the true B1 and ^B2 is as "close" as possible to the true B2, even though we will never know the true B1 and B2?

The answer to this question will occupy much of our attention in Chapter 3. We note here that we can develop procedures that tell us how to construct the SRF to mirror the PRF as faithfully as possible. It is fascinating to consider that this can be done even though we never actually determine the PRF itself.








