December 10, 2011

2.8 SUMMARY AND CONCLUSION ( Damodar N. Gujarati )

1. The key concept underlying regression analysis is the concept of the conditional expectation function (CEF), or population regression function (PRF). Our objective in regression analysis is to find out how the average value of the dependent variable (or regressand) varies with the given value of the explanatory variable (or regressor).
2. This book largely deals with linear PRFs, that is, regressions that are linear in the parameters. They may or may not be linear in the regressand or the regressors.
3. For empirical purposes, it is the stochastic PRF that matters. The stochastic disturbance term ui plays a critical role in estimating the PRF.
4. The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Usually, one has a sample of observations from the population. Therefore, one uses the stochastic sample regression function (SRF) to estimate the PRF. How this is actually accomplished is discussed in Chapter 3.

EXERCISES

Questions

2.1 What is the conditional expectation function or the population regression function?
2.2 What is the difference between the population and sample regression functions? Is this a distinction without difference?
2.3 What is the role of the stochastic error term ui in regression analysis? What is the difference between the stochastic error term and the residual, ûi?
2.4 Why do we need regression analysis? Why not simply use the mean value of the regressand as its value?
2.5 What do we mean by a linear regression model?
2.6 Determine whether the following models are linear in the parameters, or the variables, or both. Which of these models are linear regression models?
2.8 What is meant by an intrinsically linear regression model? If β2 in exercise 2.7d were 0.8, would it be a linear or nonlinear regression model?
*2.9 Consider the following nonstochastic models (i.e., models without the stochastic error term). Are they linear regression models? If not, is it possible, by suitable algebraic manipulations, to convert them into linear models?
2.10 You are given the scattergram in Figure 2.7 along with the regression line. What general conclusion do you draw from this diagram? Is the regression line sketched in the diagram a population regression line or the sample regression line?


FIGURE 2.8 Skill intensity of exports and human capital endowment. Data are for 126 industrial and developing countries in 1985. Values along the horizontal axis are logarithms of the ratio of the country's average educational attainment to its land area; vertical axis values are logarithms of the ratio of manufactured to primary-products exports.

Source: World Bank, World Development Report 1995, p. 59. Original sources: export data from United Nations Statistical Office COMTRADE database; education data from UNDP 1990; land data from the World Bank.

2.11 From the scattergram given in Figure 2.8, what general conclusions do you draw? What is the economic theory that underlies this scattergram? (Hint: Look up any international economics textbook and read up on the Heckscher-Ohlin model of trade.)

2.12 What does the scattergram in Figure 2.9 reveal? On the basis of this diagram, would you argue that minimum wage laws are good for economic well-being?

2.13 Is the regression line shown in Figure I.3 of the Introduction the PRF or the SRF? Why? How would you interpret the scatterpoints around the regression line? Besides GDP, what other factors, or variables, might determine personal consumption expenditure?
2.14 You are given the data in Table 2.7 for the United States for years 1980-1996.
 
a. Plot the male civilian labor force participation rate against the male civilian unemployment rate. Eyeball a regression line through the scatterpoints. A priori, what is the expected relationship between the two, and what is the underlying economic theory? Does the scattergram support the theory?

b. Repeat part a for females.

c. Now plot both the male and female labor participation rates against average hourly earnings (in 1982 dollars). (You may use separate diagrams.) Now what do you find? And how would you rationalize your finding?

d. Can you plot the labor force participation rate against the unemployment rate and the average hourly earnings simultaneously? If not, how would you verbalize the relationship among the three variables?









2.7 AN ILLUSTRATIVE EXAMPLE (Damodar N. Gujarati)

We conclude this chapter with an example. Table 2.6 gives data on the level of education (measured by the number of years of schooling), the mean hourly wages earned by people at each level of education, and the number of people at the stated level of education. Ernst Berndt originally obtained the data presented in the table, and he derived these data from the Current Population Survey conducted in May 1985.14 We will explore these data (with additional explanatory variables) in a later chapter.

Plotting the (conditional) mean wage against education, we obtain the picture in Figure 2.6. The regression curve in the figure shows how mean wages vary with the level of education; they generally increase with the level of education, a finding one should not find surprising. We will study in a later chapter how variables besides education can also affect the mean wage.

December 3, 2011

2.6 THE SAMPLE REGRESSION FUNCTION (SRF) (Damodar N. Gujarati)

By confining our discussion so far to the population of Y values corresponding to the fixed X's, we have deliberately avoided sampling considerations (note that the data of Table 2.1 represent the population, not a sample). But it is about time to face up to the sampling problems, for in most practical situations what we have is but a sample of Y values corresponding to some fixed X's. Therefore, our task now is to estimate the PRF on the basis of the sample information.

As an illustration, pretend that the population of Table 2.1 was not known to us and the only information we had was a randomly selected sample of Y values for the fixed X's as given in Table 2.4. Unlike Table 2.1, we now have only one Y value corresponding to the given X's; each Y (given Xi) in Table 2.4 is chosen randomly from similar Y's corresponding to the same Xi from the population of Table 2.1.

The question is: From the sample of Table 2.4, can we predict the average weekly consumption expenditure Y in the population as a whole corresponding to the chosen X's? In other words, can we estimate the PRF from the sample data? As the reader surely suspects, we may not be able to estimate the PRF "accurately" because of sampling fluctuations. To see this, suppose we draw another random sample from the population of Table 2.1, as presented in Table 2.5.

Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In the scattergram two sample regression lines are drawn so as to "fit" the scatter reasonably well: SRF1 is based on the first sample, and SRF2 is based on the second sample.

Which of the two regression lines represents the "true" population regression line? If we avoid the temptation of looking at Figure 2.1, which purportedly represents the PR, there is no way we can be absolutely sure that either of the regression lines shown in Figure 2.4 represents the true population regression line (or curve). The regression lines in Figure 2.4 are known as the sample regression lines. Supposedly they represent the population regression line, but because of sampling fluctuations they are at best an approximation of the true PR. In general, we would get N different SRFs for N different samples, and these SRFs are not likely to be the same.
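The effect of sampling fluctuations can be sketched with a small simulation. Everything numerical here is an assumption made for illustration: the population line 17 + 0.6X, the disturbance spread of 10, and the random seed are hypothetical and are not the data of Tables 2.4 and 2.5. The lines are fitted by least squares, which is only one convenient way of drawing a line that fits the scatter; the text develops that method in Chapter 3.

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Assumed (hypothetical) population: E(Y | X) = 17 + 0.6 X plus a random disturbance.
BETA1, BETA2, SIGMA = 17.0, 0.6, 10.0
incomes = list(range(80, 261, 20))  # ten fixed income levels, $80 to $260


def draw_sample(rng):
    """Draw one Y value for each fixed X, mimicking a single random sample."""
    return [BETA1 + BETA2 * x + rng.gauss(0, SIGMA) for x in incomes]


rng = random.Random(0)
srf1 = ols(incomes, draw_sample(rng))  # line fitted to the first sample
srf2 = ols(incomes, draw_sample(rng))  # line fitted to the second sample

print("SRF1 (intercept, slope):", srf1)
print("SRF2 (intercept, slope):", srf2)
```

Each call to draw_sample gives a different set of Y values for the same X's, so the two fitted lines differ from each other and from the assumed population line, even though both samples come from the same population.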

Now, analogously to the PRF that underlies the population regression line, we can develop the concept of the sample regression function (SRF) to represent the sample regression line. The sample counterpart of (2.2.2) may be written as
Ŷi = β̂1 + β̂2 Xi                              (2.6.1)
FIGURE 2.5 Sample and population regression lines

Because of sampling fluctuations, an estimate of the PRF based on the SRF is at best an approximate one. This approximation is shown diagrammatically in Figure 2.5.

For X = Xi, we have one (sample) observation Y = Yi. In terms of the SRF, the observed Yi can be expressed as
Yi = Ŷi + ûi                              (2.6.3)
and in terms of the PRF, it can be expressed as

Yi = E(Y | Xi) + ui                         (2.6.4)

Now obviously in Figure 2.5 Ŷi overestimates the true E(Y | Xi) for the Xi shown therein. By the same token, for any Xi to the left of the point A, the SRF will underestimate the true PRF. But the reader can readily see that such over- and underestimation is inevitable because of sampling fluctuations.
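The difference between the two decompositions (2.6.3) and (2.6.4) can be made concrete with one observation. The numbers below are hypothetical and chosen only for illustration: the "true" parameters, the estimates, and the observed Y are assumptions, not values from the book's tables. In practice the true parameters are unknown, so only the residual can be computed.

```python
# Hypothetical numbers for a single observation at X_i = 200.
x_i = 200
y_i = 145.0                      # observed consumption expenditure

beta1, beta2 = 17.0, 0.6         # assumed "true" PRF parameters (unknown in practice)
b1_hat, b2_hat = 20.0, 0.62      # assumed estimates obtained from some sample

prf_mean = beta1 + beta2 * x_i   # E(Y | X_i), the PRF value at X_i
y_hat = b1_hat + b2_hat * x_i    # fitted value from the SRF at X_i

u_i = y_i - prf_mean             # disturbance u_i: not observable
u_hat_i = y_i - y_hat            # residual u-hat_i: computable from the sample

print("E(Y | X_i) =", prf_mean, " Y-hat_i =", y_hat)
print("u_i =", u_i, " u-hat_i =", u_hat_i)
print("SRF overestimates the PRF here:", y_hat > prf_mean)
```

Both decompositions reproduce the same observed Yi; they differ only in how Yi is split between a systematic part and a remainder.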
 
The critical question now is: Granted that the SRF is but an approximation of the PRF, can we devise a rule or a method that will make this approximation as "close" as possible? In other words, how should the SRF be constructed so that β̂1 is as "close" as possible to the true β1 and β̂2 is as "close" as possible to the true β2, even though we will never know the true β1 and β2?

The answer to this question will occupy much of our attention in Chapter 3. We note here that we can develop procedures that tell us how to construct the SRF to mirror the PRF as faithfully as possible. It is fascinating to consider that this can be done even though we never actually determine the PRF itself.









December 1, 2011

2.5 THE SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM (Damodar N. Gujarati)

As noted in Section 2.4, the disturbance term ui is a surrogate for all those variables that are omitted from the model but that collectively affect Y. The obvious question is: Why not introduce these variables into the model explicitly? Stated otherwise, why not develop a multiple regression model with as many variables as possible? The reasons are many.

1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often is, incomplete. We might know for certain that weekly income X influences weekly consumption expenditure Y, but we might be ignorant or unsure about the other variables affecting Y. Therefore, ui may be used as a substitute for all the excluded or omitted variables from the model.

2. Unavailability of data: Even if we know what some of the excluded variables are and therefore consider a multiple regression rather than a simple regression, we may not have quantitative information about these variables. It is a common experience in empirical analysis that the data we would ideally like to have often are not available. For example, in principle we could introduce family wealth as an explanatory variable in addition to the income variable to explain family consumption expenditure. But unfortunately, information on family wealth generally is not available. Therefore, we may be forced to omit the wealth variable from our model despite its great theoretical relevance in explaining consumption expenditure.

3. Core variables versus peripheral variables: Assume in our consumption-income example that besides income X1, the number of children per family X2, sex X3, religion X4, education X5, and geographical region X6 also affect consumption expenditure. But it is quite possible that the joint influence of all or some of these variables may be so small and at best nonsystematic or random that as a practical matter and for cost considerations it does not pay to introduce them into the model explicitly. One hopes that their combined effect can be treated as a random variable ui.10

4. Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some "intrinsic" randomness in individual Y's that cannot be explained no matter how hard we try. The disturbances, the u's, may very well reflect this intrinsic randomness.

5. Poor proxy variables: Although the classical regression model (to be developed in Chapter 3) assumes that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement. Consider, for example, Milton Friedman's well-known theory of the consumption function.11
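The idea in points 3 and 4, that omitted influences and intrinsic randomness end up together in ui, can be sketched with a simulation. All of it is assumed for illustration: the coefficients, the single omitted variable W, and above all the simplification that W has mean zero and is independent of income are hypothetical choices, not statements from the text.

```python
import random
import statistics

rng = random.Random(1)
n = 5000

# Assumed data-generating process (hypothetical): consumption depends on
# income X and on a variable W that the analyst leaves out of the model.
BETA1, BETA2, GAMMA = 17.0, 0.6, 5.0

income = [rng.uniform(80, 260) for _ in range(n)]
omitted = [rng.gauss(0, 1) for _ in range(n)]  # W: mean zero, independent of X
noise = [rng.gauss(0, 1) for _ in range(n)]    # intrinsic randomness

y = [BETA1 + BETA2 * x + GAMMA * w + e
     for x, w, e in zip(income, omitted, noise)]

# Disturbance of the income-only model: the part of Y not accounted for by
# the income term. It bundles the omitted variable and the pure noise.
u = [yi - (BETA1 + BETA2 * xi) for yi, xi in zip(y, income)]

print("mean of u:", round(statistics.fmean(u), 3))
print("std of u: ", round(statistics.pstdev(u), 3))
```

Under these assumptions the disturbance averages out close to zero while its spread reflects both the omitted variable and the noise. If W were correlated with income, the picture would be more complicated; that case is outside this sketch.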

__________________________________
10. A further difficulty is that variables such as sex, education, and religion are difficult to quantify.
11. Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J., 1957.
12. "That descriptions be kept as simple as possible until proved inadequate," The World of Mathematics, vol. 2, J. R. Newman (ed.), Simon & Schuster, New York, 1956, p. 1247, or "Entities should not be multiplied beyond necessity," Donald F. Morrison, Applied Linear Statistical Methods, Prentice Hall, Englewood Cliffs, N.J., 1983, p. 58.



November 27, 2011

2.4 STOCHASTIC SPECIFICATION OF PRF ( Damodar N. Gujarati )

It is clear from Figure 2.1 that, as family income increases, family consumption expenditure on the average increases, too. But what about the consumption expenditure of an individual family in relation to its (fixed) level of income? It is obvious from Table 2.1 and Figure 2.1 that an individual family's consumption expenditure does not necessarily increase as the income level increases. For example, from Table 2.1 we observe that corresponding to the income level of $100 there is one family whose consumption expenditure of $65 is less than the consumption expenditure of two families whose weekly income is only $80. But notice that the average consumption expenditure of families with a weekly income of $100 is greater than the average consumption expenditure of families with a weekly income of $80 ($77 versus $65).
What, then, can we say about the relationship between an individual family's consumption expenditure and a given level of income? We see from Figure 2.1 that, given the income level of Xi, an individual family's consumption expenditure is clustered around the average consumption of all families at that Xi, that is, around its conditional expectation. Therefore, we can express the deviation of an individual Yi around its expected value as follows:
ui = Yi - E(Y | Xi)
or
Yi = E(Y | Xi) + ui                              (2.4.1)
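As a quick worked instance of (2.4.1), the figures quoted above can be plugged in directly: at the income level of $100, one family spends $65 while the conditional mean is $77. The variable names below are introduced only for this illustration.

```python
# Figures quoted in the text for the income level X = 100.
y_i = 65          # consumption expenditure of one particular family
cond_mean = 77    # E(Y | X = 100), the average over families at that income

# Deviation of the individual family from its conditional mean.
u_i = y_i - cond_mean

# Rearranged form (2.4.1): the observation is the mean plus the deviation.
reconstructed = cond_mean + u_i

print("u_i =", u_i)
print("E(Y | X_i) + u_i =", reconstructed)
```

The negative ui says that this family spends less than the average family with the same income; families above the average have a positive ui.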


____________________
8. See App. A for a brief discussion of the properties of the expectation operator E. Note that E(Y | Xi), once the value of Xi is fixed, is a constant.
9. As a matter of fact, in the method of least squares to be developed in Chap. 3, it is assumed explicitly that E(ui | Xi) = 0. See Sec. 3.2.

2.3 THE MEANING OF THE TERM LINEAR ( Damodar N. Gujarati )

Since this text is concerned primarily with linear models like (2.2.2), it is essential to know what the term linear really means, for it can be interpreted in two different ways.


2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF) (Damodar N. Gujarati)

From the preceding discussion and Figures 2.1 and 2.2, it is clear that each conditional mean E(Y | Xi) is a function of Xi, where Xi is a given value of X.
Symbolically,
E(Y | Xi) = f(Xi)                                                   (2.2.1)

where f(Xi) denotes some function of the explanatory variable X. In our example, E(Y | Xi) is a linear function of Xi. Equation (2.2.1) is known as the conditional expectation function (CEF) or population regression function (PRF) or population regression (PR) for short. It states merely that the expected value of the distribution of Y given Xi is functionally related to Xi. In simple terms, it tells how the mean or average response of Y varies with X.
What form does the function f(Xi) assume? This is an important question because in real situations we do not have the entire population available for examination. The functional form of the PRF is therefore an empirical question, although in specific cases theory may have something to say.
For example, an economist might posit that consumption expenditure is linearly related to income. Therefore, as a first approximation or a working hypothesis, we may assume that the PRF E(Y | Xi) is a linear function of Xi, say, of the type
E(Y | Xi) = β1 + β2 Xi                                                   (2.2.2)

2.1 A HYPOTHETICAL EXAMPLE 1 (Damodar N. Gujarati)

As noted in Section 1.2, regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).2 To understand this, consider the data given in Table 2.1. The data in the table refer to a total population of 60 families in a hypothetical community and their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260) and the weekly expenditures of each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of X and the corresponding Y values against each of the X values; so to speak, there are 10 Y subpopulations.
There is considerable variation in weekly consumption expenditure in each income group, which can be seen clearly from Figure 2.1. But the general picture that one gets is that, despite the variability of weekly consumption expenditure within each income bracket, on the average, weekly consumption expenditure increases as income increases. To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly income level of $80, the mean consumption expenditure is $65, while corresponding to the income level of $200, it is $137. In all we have 10 mean values for the 10 subpopulations of Y. We call these mean values conditional expected values, as they depend on the given values of the (conditioning) variable X. Symbolically, we denote them as E(Y | X), which is read as the expected value of Y given the value of X (see also Table 2.2).
It is important to distinguish these conditional expected values from the unconditional expected value of weekly consumption expenditure, E(Y). If we add the weekly consumption expenditures for all the 60 families in the population and divide this number by 60, we get the number $121.20 ($7272/60), which is the unconditional mean, or expected, value of weekly consumption expenditure, E(Y); it is unconditional in the sense that in arriving at this number we have disregarded the income levels of the various families.3 Obviously, the various conditional expected values of Y given in Table 2.1 are different from the unconditional expected value of Y of $121.20. When we ask the question, "What is the expected value of weekly consumption expenditure of a family," we get the answer $121.20 (the unconditional mean). But if we ask the question, "What is the expected value of weekly consumption expenditure of a family whose weekly income is, say, $140," we get the answer $101 (the conditional mean).

To put it differently, if we ask the question, "What is the best (mean) prediction of weekly expenditure of families with a weekly income of $140," the answer would be $101. Thus the knowledge of the income level may enable us to better predict the mean value of consumption expenditure than if we do not have that knowledge.4 This probably is the essence of regression analysis, as we shall discover throughout this text.
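The arithmetic behind the unconditional and conditional means can be checked in a few lines. Only the figures quoted in the text are used: the total of $7272 over 60 families and four of the ten conditional means. The variable names are introduced for this illustration.

```python
# Figures quoted in the text for the 60-family population.
total_expenditure = 7272   # sum of weekly consumption expenditure, in dollars
n_families = 60

# Unconditional mean E(Y): income levels are disregarded.
unconditional_mean = total_expenditure / n_families

# Conditional means E(Y | X) quoted in the text, keyed by weekly income X.
conditional_mean = {80: 65, 100: 77, 140: 101, 200: 137}

print(f"E(Y) = {unconditional_mean:.2f}")
for income_level, mean_y in conditional_mean.items():
    gap = mean_y - unconditional_mean
    print(f"E(Y | X={income_level}) = {mean_y}, differs from E(Y) by {gap:+.2f}")
```

For a family with a weekly income of $140, predicting $101 rather than $121.20 uses the information that income carries about expenditure.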
The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values. If we join these conditional mean values, we obtain what is known as the population regression line (PRL), or more generally, the population regression curve.5 More simply, it is the regression of Y on X. The adjective "population" comes from the fact that we are dealing in this example with the entire population of 60 families. Of course, in reality a population may have many families.
Geometrically, then, a population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting the means of the subpopulations of Y corresponding to the given values of the regressor X. It can be depicted as in Figure 2.2.

This figure shows that for each X (i.e., income level) there is a population of Y values (weekly consumption expenditures) that are spread around the (conditional) mean of those Y values. And the regression line (or curve) passes through these (conditional) mean values.
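That the conditional means in this example trace out a straight line can be checked from the four means quoted in the text ($65, $77, $101, and $137 at incomes of $80, $100, $140, and $200). The sketch below uses only those four points, so the slope and intercept it prints are implied by the quoted figures rather than read from Table 2.1.

```python
# Conditional means E(Y | X) quoted in the text: (weekly income, mean expenditure).
points = [(80, 65), (100, 77), (140, 101), (200, 137)]

# Slope between each pair of neighbouring points.
slopes = [
    (y1 - y0) / (x1 - x0)
    for (x0, y0), (x1, y1) in zip(points, points[1:])
]

# If all slopes agree, the quoted means lie on a single straight line.
slope = slopes[0]
is_linear = all(abs(s - slope) < 1e-12 for s in slopes)

# Intercept of the line through the first point with that slope.
intercept = points[0][1] - slope * points[0][0]

print("slopes between neighbours:", slopes)
print("means lie on one line:", is_linear)
print("implied intercept:", intercept)
```

Equal slopes between every pair of neighbouring means is what it means for the locus of conditional means to be a line rather than a curve.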
With this background, the reader may find it instructive to reread the definition of regression given in Section 1.2.
________________________

1. The reader whose statistical knowledge has become somewhat rusty may want to freshen it up by reading the statistical appendix, App. A, before reading this chapter.
2. The expected value, or expectation, or population mean of a random variable Y is denoted by the symbol E(Y). On the other hand, the mean value computed from a sample of values from the Y population is denoted as Ȳ, read as Y bar.
3. As shown in App. A, in general the conditional and unconditional mean values are different.
4. I am indebted to James Davidson on this perspective. See James Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 2000, p. 11.
5. In the present example the PRL is a straight line, but it could be a curve (see Figure 2.3).

2. TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS ( Damodar N. Gujarati )

In Chapter 1 we discussed the concept of regression in broad terms. In this chapter we approach the subject somewhat formally. Specifically, this and the following two chapters introduce the reader to the theory underlying the simplest possible regression analysis, namely, the bivariate, or two-variable, regression in which the dependent variable (the regressand) is related to a single explanatory variable (the regressor). This case is considered first, not because of its practical adequacy, but because it presents the fundamental ideas of regression analysis as simply as possible and some of these ideas can be illustrated with the aid of two-dimensional graphs. Moreover, as we shall see, the more general multiple regression analysis in which the regressand is related to one or more regressors is in many ways a logical extension of the two-variable case.

November 24, 2011

1.8 SUMMARY AND CONCLUSIONS (Damodar N. Gujarati )

1. The key idea behind regression analysis is the statistical dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables.

2. The objective of such analysis is to estimate and/or predict the mean or average value of the dependent variable on the basis of the known or fixed values of the explanatory variables.

3. In practice the success of regression analysis depends on the availability of the appropriate data. This chapter discussed the nature, sources, and limitations of the data that are generally available for research, especially in the social sciences.
4. In any research, the researcher should clearly state the sources of the data used in the analysis, their definitions, their methods of collection, and any gaps or omissions in the data as well as any revisions in the data. Keep in mind that the macroeconomic data published by the government are often revised.

5. Since the reader may not have the time, energy, or resources to track down the data, the reader has the right to presume that the data used by the researcher are properly gathered and that the computations and analysis are correct.

EXERCISES

1.1 Table 1.2 gives data on the Consumer Price Index (CPI) for seven industrialized countries with 1982-1984 = 100 as the base of the index.
a. From the given data, compute the inflation rate for each country.16
b. Plot the inflation rate for each country against time (i.e., use the horizontal axis for time and the vertical axis for the inflation rate).
c. What broad conclusions can you draw about the inflation experience in the seven countries?
d. Which country's inflation rate seems to be most variable? Can you offer any explanation?
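For part a, the inflation rate is usually computed as the percentage change in the CPI from one period to the next. A minimal sketch of that calculation follows; the CPI values are hypothetical placeholders, not figures from Table 1.2, and the function name is introduced only for this illustration.

```python
def inflation_rates(cpi):
    """Percentage change in the CPI from each period to the next."""
    return [100 * (curr - prev) / prev for prev, curr in zip(cpi, cpi[1:])]


# Hypothetical CPI series for one country over three consecutive years.
cpi = [100.0, 104.0, 106.08]

rates = inflation_rates(cpi)
for year_index, rate in enumerate(rates, start=1):
    print(f"period {year_index}: {rate:.2f}%")
```

A series of n CPI values yields n - 1 inflation rates, since the first period has no earlier value to compare against.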

TABLE 1.2

1.2. a. Plot the inflation rate of Canada, France, Germany, Italy, Japan, and the United Kingdom against the United States inflation rate.
     b. Comment generally about the behavior of the inflation rate in the six countries vis-a-vis the U.S. inflation rate.
     c. If you find that the six countries' inflation rates move in the same direction as the U.S. inflation rate, would that suggest that U.S. inflation "causes" inflation in the other countries? Why or why not?
1.3. Table 1.3 gives the foreign exchange rates for seven industrialized countries for years 1977-1998. Except for the United Kingdom, the exchange rate is defined as the units of foreign currency for one U.S. dollar; for the United Kingdom, it is defined as the number of U.S. dollars for one U.K. pound.
     a. Plot these exchange rates against time and comment on the general behavior of exchange rates over the given time period.
     b. The dollar is said to appreciate if it can buy more units of a foreign currency. Contrarily, it is said to depreciate if it buys fewer units of a foreign currency. Over the time period 1977-1998, what has been the general behavior of the U.S. dollar? Incidentally, look up any textbook on macroeconomics or international economics to find out what factors determine the appreciation or depreciation of a currency.
1.4. The data behind the M1 money supply in Figure 1.5 are given in Table 1.4. Can you give reasons why the money supply has been increasing over the time period shown in the table?

TABLE 1.3


TABLE 1.4


1.5. Suppose you were to develop an economic model of criminal activities, say, the hours spent in criminal activities (e.g., selling illegal drugs). What variables would you consider in developing such a model? See if your model matches the one developed by the Nobel laureate economist Gary Becker.17
1.6. Controlled experiments in economics: On April 17, 2000, President Clinton signed into law a bill passed by both Houses of the U.S. Congress that lifted earnings limitations on Social Security recipients. Until then, recipients between the ages of 65 and 69 who earned more than $17,000 a year would lose 1 dollar's worth of Social Security benefits for every 3 dollars of income earned in excess of $17,000. How would you devise a study to assess the impact of this change in the law? Note: There was no income limitation for recipients over the age of 70 under the old law.

TABLE 1.5

1.7. The data presented in Table 1.5 were published in the March 1, 1984 issue of the Wall Street Journal. They relate to the advertising budget (in millions of dollars) of 21 firms for 1983 and millions of impressions retained per week by the viewers of the products of these firms. The data are based on a survey of 4000 adults in which users of the products were asked about commercials they had seen for the product category in the past week.
        a. Plot impressions on the vertical axis and advertising expenditure on the horizontal axis.
        b. What can you say about the nature of the relationship between the two variables?
        c. Looking at your graph, do you think it pays to advertise? Think about all those commercials shown on Super Bowl Sunday or during the World Series.

Note: We will explore further the data given in Table 1.5 in subsequent chapters.