November 27, 2011

2.4 STOCHASTIC SPECIFICATION OF PRF ( Damodar N. Gujarati )

It is clear from Figure 2.1 that, as family income increases, family consumption expenditure on the average increases, too. But what about the consumption expenditure of an individual family in relation to its (fixed) level of income? It is obvious from Table 2.1 and Figure 2.1 that an individual family's consumption expenditure does not necessarily increase as the income level increases. For example, from Table 2.1 we observe that corresponding to the income level of $100 there is one family whose consumption expenditure of $65 is less than the consumption expenditures of two families whose weekly income is only $80. But notice that the average consumption expenditure of families with a weekly income of $100 is greater than the average consumption expenditure of families with a weekly income of $80 ($77 versus $65).
What, then, can we say about the relationship between an individual family's consumption expenditure and a given level of income? We see from Figure 2.1 that, given the income level of Xi, an individual family's consumption expenditure is clustered around the average consumption of all families at that Xi, that is, around its conditional expectation. Therefore, we can express the deviation of an individual Yi around its expected value as follows:
ui = Yi - E(Y | Xi)
or
Yi = E(Y | Xi) + ui                                                   (2.4.1)
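
As a small worked illustration of (2.4.1), the sketch below takes the one family the text singles out (weekly income $100, consumption $65) together with the conditional mean of $77 quoted for that income group. The six consumption values in `group_100` are illustrative assumptions, chosen only so that their mean is $77 and one of them is $65; the full Table 2.1 is not reproduced here.

```python
# Deviation of an individual Y_i from its conditional mean: u_i = Y_i - E(Y | X_i).
# ASSUMPTION: these six values are illustrative, chosen so that their mean is 77
# (the conditional mean quoted in the text for X = 100) and one family spends 65.
group_100 = [65, 70, 74, 80, 85, 88]

cond_mean = sum(group_100) / len(group_100)      # E(Y | X = 100)
deviations = [y - cond_mean for y in group_100]  # u_i for each family

print(cond_mean)        # conditional mean of the group
print(deviations[0])    # deviation for the family spending 65
print(sum(deviations))  # deviations taken around a mean add up to zero
```

Because the deviations are measured around the group's own mean, they necessarily add up to zero within the group, which is the sample counterpart of a disturbance with zero conditional mean.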


____________________
8. See App. A for a brief discussion of the properties of the expectation operator E. Note that E(Y | Xi), once the value of Xi is fixed, is a constant.
9. As a matter of fact, in the method of least squares to be developed in Chap. 3, it is assumed explicitly that E(ui | Xi) = 0. See Sec. 3.2.

2.3 THE MEANING OF THE TERM LINEAR ( Damodar N. Gujarati )

Since this text is concerned primarily with linear models like (2.2.2), it is essential to know what the term linear really means, for it can be interpreted in two different ways.


2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF) (Damodar N. Gujarati)

From the preceding discussion and Figures 2.1 and 2.2, it is clear that each conditional mean E(Y | Xi) is a function of Xi, where Xi is a given value of X.
Symbolically,
E(Y | Xi) = f(Xi)                                                   (2.2.1)

where f(Xi) denotes some function of the explanatory variable X. In our example, E(Y | Xi) is a linear function of Xi. Equation (2.2.1) is known as the conditional expectation function (CEF) or population regression function (PRF) or population regression (PR) for short. It states merely that the expected value of the distribution of Y given Xi is functionally related to Xi. In simple terms, it tells how the mean or average response of Y varies with X.
What form does the function f(Xi) assume? This is an important question because in real situations we do not have the entire population available for examination. The functional form of the PRF is therefore an empirical question, although in specific cases theory may have something to say.
For example, an economist might posit that consumption expenditure is linearly related to income. Therefore, as a first approximation or working hypothesis, we may assume that the PRF E(Y | Xi) is a linear function of Xi, say, of the type

2.1 A HYPOTHETICAL EXAMPLE 1 (Damodar N. Gujarati)

As noted in Section 1.2, regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).2 To understand this, consider the data given in Table 2.1. The data in the table refer to a total population of 60 families in a hypothetical community and their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260) and the weekly expenditures of each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of X and the corresponding Y values against each of the X values; so to speak, there are 10 Y subpopulations.
There is considerable variation in weekly consumption expenditure in each income group, which can be seen clearly from Figure 2.1. But the general picture that one gets is that, despite the variability of weekly consumption expenditure within each income bracket, on the average, weekly consumption expenditure increases as income increases. To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly income level of $80, the mean consumption expenditure is $65, while corresponding to the income level of $200, it is $137. In all we have 10 mean values for the 10 subpopulations of Y. We call these mean values conditional expected values, as they depend on the given values of the (conditioning) variable X. Symbolically, we denote them as E(Y | X), which is read as the expected value of Y given the value of X (see also Table 2.2).
It is important to distinguish these conditional expected values from the unconditional expected value of weekly consumption expenditure, E(Y). If we add the weekly consumption expenditures for all the 60 families in the population and divide this number by 60, we get the number $121.20 ($7272/60), which is the unconditional mean, or expected, value of weekly consumption expenditure, E(Y); it is unconditional in the sense that in arriving at this number we have disregarded the income levels of the various families.3 Obviously, the various conditional expected values of Y given in Table 2.1 are different from the unconditional expected value of Y of $121.20. When we ask the question, "What is the expected value of weekly consumption expenditure of a family?" we get the answer $121.20 (the unconditional mean). But if we ask the question, "What is the expected value of weekly consumption expenditure of a family whose weekly income is, say, $140?" we get the answer $101 (the conditional mean).

To put it differently, if we ask the question, "What is the best (mean) prediction of weekly expenditure of families with a weekly income of $140?" the answer would be $101. Thus the knowledge of the income level may enable us to better predict the mean value of consumption expenditure than if we do not have that knowledge.4 This probably is the essence of regression analysis, as we shall discover throughout this text.
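
The distinction between the two kinds of mean can be sketched in a few lines of Python. The two income groups below are illustrative assumptions, chosen so that their group means match the $65 and $101 quoted in the text; they are not a reproduction of Table 2.1, and the function names are hypothetical.

```python
from statistics import mean

# ASSUMPTION: a tiny illustrative population, not the 60 families of Table 2.1.
# Keys are weekly income X; values are the weekly consumption Y of each family.
population = {
    80: [55, 60, 65, 70, 75],
    140: [80, 93, 95, 103, 108, 113, 115],
}

def conditional_mean(data, x):
    """E(Y | X = x): average Y within the subpopulation whose income is x."""
    return mean(data[x])

def unconditional_mean(data):
    """E(Y): average Y over all families, ignoring their income."""
    all_y = [y for ys in data.values() for y in ys]
    return mean(all_y)

cond_80 = conditional_mean(population, 80)
cond_140 = conditional_mean(population, 140)
overall = unconditional_mean(population)

# The arithmetic quoted in the text for the full population of 60 families.
text_unconditional = 7272 / 60

print(cond_80, cond_140, overall, text_unconditional)
```

Knowing a family's income group moves the prediction from the single overall mean to the mean of that group, which is the sense in which the conditional mean is the better predictor.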
The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values. If we join these conditional mean values, we obtain what is known as the population regression line (PRL), or more generally, the population regression curve.5 More simply, it is the regression of Y on X. The adjective "population" comes from the fact that we are dealing in this example with the entire population of 60 families. Of course, in reality a population may have many families.
Geometrically, then, a population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting the means of the subpopulations of Y corresponding to the given values of the regressor X. It can be depicted as in Figure 2.2.

This figure shows that for each X (i.e., income level) there is a population of Y values (weekly consumption expenditures) that are spread around the (conditional) mean of those Y values. And the regression line (or curve) passes through these (conditional) mean values.
With this background, the reader may find it instructive to reread the definition of regression given in Section 1.2.
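
As a quick check on the numbers quoted above, the sketch below takes the four conditional means that appear in the text (for weekly incomes of $80, $100, $140, and $200) and tests whether they lie on a single straight line. It uses only those four points; the remaining six conditional means of Table 2.1 are not reproduced here, so this is a consistency check rather than a derivation of the population regression line.

```python
from fractions import Fraction

# (weekly income X, conditional mean E(Y | X)) pairs quoted in the text.
cond_means = [(80, 65), (100, 77), (140, 101), (200, 137)]

# Slope between each pair of neighbouring points, kept exact with Fraction.
slopes = [
    Fraction(y1 - y0, x1 - x0)
    for (x0, y0), (x1, y1) in zip(cond_means, cond_means[1:])
]

# A single common slope means the points lie on one straight line.
is_linear = len(set(slopes)) == 1
slope = slopes[0]
intercept = cond_means[0][1] - slope * cond_means[0][0]

print(is_linear, slope, intercept)
```

The four quoted means share one slope, which agrees with the remark in footnote 5 that the PRL in this example is a straight line.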
________________________

1. The reader whose statistical knowledge has become somewhat rusty may want to freshen it up by reading the statistical appendix, App. A, before reading this chapter.
2. The expected value, or expectation, or population mean of a random variable Y is denoted by the symbol E(Y). On the other hand, the mean value computed from a sample of values from the Y population is denoted as Ȳ, read as Y bar.
3. As shown in App. A, in general the conditional and unconditional mean values are different.
4. I am indebted to James Davidson on this perspective. See Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 2000, p. 11.
5. In the present example the PRL is a straight line, but it could be a curve (see Figure 2.3).

2. TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS ( Damodar N. Gujarati )

In Chapter 1 we discussed the concept of regression in broad terms. In this chapter we approach the subject somewhat formally. Specifically, this and the following two chapters introduce the reader to the theory underlying the simplest possible regression analysis, namely, the bivariate, or two-variable, regression in which the dependent variable (the regressand) is related to a single explanatory variable (the regressor). This case is considered first, not because of its practical adequacy, but because it presents the fundamental ideas of regression analysis as simply as possible and some of these ideas can be illustrated with the aid of two-dimensional graphs. Moreover, as we shall see, the more general multiple regression analysis, in which the regressand is related to one or more regressors, is in many ways a logical extension of the two-variable case.

November 24, 2011

1.8 SUMMARY AND CONCLUSIONS (Damodar N. Gujarati )

1. The key idea behind regression analysis is the statistical dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables.

2. The objective of such analysis is to estimate and/or predict the mean or average value of the dependent variable on the basis of the known or fixed values of the explanatory variables.

3. In practice the success of regression analysis depends on the availability of the appropriate data. This chapter discussed the nature, sources, and limitations of the data that are generally available for research, especially in the social sciences.

4. In any research, the researcher should clearly state the sources of the data used in the analysis, their definitions, their methods of collection, and any gaps or omissions in the data as well as any revisions in the data. Keep in mind that the macroeconomic data published by the government are often revised.

5. Since the reader may not have the time, energy, or resources to track down the data, the reader has the right to presume that the data used by the researcher are properly gathered and that the computations and analysis are correct.

EXERCISES

1.1. Table 1.2 gives data on the Consumer Price Index (CPI) for seven industrialized countries with 1982-1984 = 100 as the base of the index.
a. From the given data, compute the inflation rate for each country.16
b. Plot the inflation rate for each country against time (i.e., use the horizontal axis for time and the vertical axis for the inflation rate).
c. What broad conclusions can you draw about the inflation experience in the seven countries?
d. Which country's inflation rate seems to be most variable? Can you offer any explanation?
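
For part (a), one common way to compute an inflation rate from a price index is the percent change from one year to the next. The sketch below assumes that definition (the exercise's own footnote is not reproduced here) and uses made-up index values rather than the figures in Table 1.2; `inflation_rates` is a hypothetical helper name.

```python
def inflation_rates(cpi):
    """Percent change in the CPI from each year to the next.

    ASSUMPTION: inflation is taken as 100 * (CPI_t - CPI_{t-1}) / CPI_{t-1}.
    """
    return [100.0 * (curr - prev) / prev for prev, curr in zip(cpi, cpi[1:])]

# ASSUMPTION: made-up index values, not the figures from Table 1.2.
cpi_example = [100.0, 104.0, 106.08]
rates = inflation_rates(cpi_example)

print([round(r, 2) for r in rates])
```

A series of n index values yields n - 1 inflation rates, since the first year has no predecessor to compare against.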

TABLE 1.2

1.2. a. Plot the inflation rate of Canada, France, Germany, Italy, Japan, and the United Kingdom against the United States inflation rate.
     b. Comment generally about the behavior of the inflation rate in the six countries vis-a-vis the U.S. inflation rate.
     c. If you find that the six countries' inflation rates move in the same direction as the U.S. inflation rate, would that suggest that U.S. inflation "causes" inflation in the other countries? Why or why not?
1.3. Table 1.3 gives the foreign exchange rates for seven industrialized countries for the years 1977-1998. Except for the United Kingdom, the exchange rate is defined as the units of foreign currency for one U.S. dollar; for the United Kingdom, it is defined as the number of U.S. dollars for one U.K. pound.
     a. Plot these exchange rates against time and comment on the general behavior of the exchange rates over the given time period.
     b. The dollar is said to appreciate if it can buy more units of a foreign currency. Contrarily, it is said to depreciate if it buys fewer units of a foreign currency. Over the time period 1977-1998, what has been the general behavior of the U.S. dollar? Incidentally, look up any textbook on macroeconomics or international economics to find out what factors determine the appreciation or depreciation of a currency.
1.4. The data behind the M1 money supply in Figure 1.5 are given in Table 1.4. Can you give reasons why the money supply has been increasing over the time period shown in the table?

TABLE 1.3


TABLE 1.4


1.5. Suppose you were to develop an economic model of criminal activities, say, the hours spent in criminal activities (e.g., selling illegal drugs). What variables would you consider in developing such a model? See if your model matches the one developed by the Nobel laureate economist Gary Becker.17
1.6. Controlled experiments in economics: On April 17, 2000, President Clinton signed into law a bill passed by both Houses of the U.S. Congress that lifted earnings limitations on Social Security recipients. Until then, recipients between the ages of 65 and 69 who earned more than $17,000 a year would lose 1 dollar's worth of Social Security benefit for every 3 dollars of income earned in excess of $17,000. How would you devise a study to assess the impact of this change in the law? Note: There was no income limitation for recipients over the age of 70 under the old law.

TABLE 1.5

1.7. The data presented in Table 1.5 were published in the March 1, 1984 issue of the Wall Street Journal. They relate to the advertising budget (in millions of dollars) of 21 firms for 1983 and millions of impressions retained per week by the viewers of the products of these firms. The data are based on a survey of 4000 adults in which users of the product category in the past week.
        a. Plot impressions on the vertical axis and advertising expenditure on the horizontal axis.
        b. What can you say about the nature of the relationship between the two variables?
        c. Looking at your graph, do you think it pays to advertise? Think about all those commercials shown on Super Bowl Sunday or during the World Series.

Note: We will explore further the data given in Table 1.5 in subsequent chapters.

November 22, 2011

1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS 10 (Damodar N. Gujarati)

The success of any econometric analysis ultimately depends on the availability of the appropriate data. It is therefore essential that we spend some time discussing the nature, sources, and limitations of the data that one may encounter in empirical analysis.

Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and pooled (i.e., combination of time series and cross-section) data.

Time Series Data   The data shown in Table 1.1 of the Introduction are an example of time series data. A time series is a set of observations on the values that a variable takes at different times. Such data may be collected at regular time intervals, such as daily (e.g., stock prices, weather reports), weekly (e.g., money supply figures), monthly [e.g., the unemployment rate, the Consumer Price Index (CPI)], quarterly (e.g., GDP), annually (e.g., government budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or decennially (e.g., the census of population).
Sometimes data are available both quarterly and annually, as in the case of the data on GDP and consumer expenditure. With the advent of high-speed computers, data can now be collected over an extremely short interval of time, such as the data on stock prices, which can be obtained literally continuously (the so-called real-time quote).
Although time series data are used heavily in econometric studies, they present special problems for econometricians. As we will show in chapters on time series econometrics later on, most empirical work based on time series data assumes that the underlying time series is stationary. Although it is too early to introduce the precise technical meaning of stationarity at this juncture, loosely speaking a time series is stationary if its mean and variance do not vary systematically over time. To see what this means, consider Figure 1.5, which depicts the behavior of the M1 money supply in the United States from January 1, 1959, to July 31, 1999. (The actual data are given in exercise 1.4.) As you can see from this figure, the M1 money supply shows a steady upward trend as well as variability over the years, suggesting that the M1 time series is not stationary.11 We will explore this topic fully in Chapter 21.

FIGURE 1.5. M1 money supply. United States, 1951:01-1999:09
________________
10. For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, chap. 3.
11. To see this more clearly, we divided the data into four time periods: 1951:01 to 1962:12; 1963:01 to 1974:12; 1975:01 to 1986:12; and 1987:01 to 1999:09. For these subperiods the mean values of the money supply (with corresponding standard deviations in parentheses) were, respectively, 165.88 (23.27), 323.20 (72.66), 788.12 (195.43), and 1099 (27.84), all figures in billions of dollars. This is a rough indication of the fact that the money supply over the entire period was not stationary.
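
The rough check described in footnote 11 (split the series into subperiods and compare their means and standard deviations) can be sketched as follows. The series here is synthetic and merely trends upward; it is not the M1 data. The helper name `subperiod_summary`, the equal-length split, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def subperiod_summary(series, n_periods):
    """Split a series into consecutive blocks and return (mean, std) per block."""
    size = len(series) // n_periods
    blocks = [series[i * size:(i + 1) * size] for i in range(n_periods - 1)]
    blocks.append(series[(n_periods - 1) * size:])  # last block takes the remainder
    return [(mean(b), pstdev(b)) for b in blocks]

# ASSUMPTION: a synthetic upward-trending series, not the actual M1 data.
trending = [100 + 2 * t for t in range(48)]
summary = subperiod_summary(trending, 4)

for m, s in summary:
    print(round(m, 2), round(s, 2))
```

Subperiod means that drift steadily, as they do here, are only an informal sign of nonstationarity; this comparison is not a formal statistical test.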

Cross-Section Data    Cross-section data are data on one or more variables collected at the same point in time, such as the census of population conducted by the Census Bureau every 10 years (the latest being in year 2000), the surveys of consumer expenditures conducted by the University of Michigan, and, of course, the opinion polls by Gallup and umpteen other organizations. A concrete example of cross-sectional data is given in Table 1.1.
This table gives data on egg production and egg prices for the 50 states in the union for 1990 and 1991. For each year the data on the 50 states are cross-sectional data. Thus, in Table 1.1 we have two cross-sectional samples.
Just as time series data create their own special problems (because of the stationarity issue), cross-sectional data too have their own problems, specifically the problem of heterogeneity. From the data given in Table 1.1 we see that we have some states that produce huge amounts of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska). When we include such heterogeneous units in a statistical analysis, the size or scale effect must be taken into account so as not to mix apples with oranges. To see this clearly, we plot in Figure 1.6 the data on eggs produced and their prices in 50 states for the year 1990. This figure shows how widely scattered the observations are. In Chapter 11 we will see how the scale effect can be an important factor in assessing relationships among economic variables.

TABLE 1.1



FIGURE 1.6. Relationship between eggs produced and prices, 1990.

Pooled Data   In pooled, or combined, data are elements of both time series and cross-section data. The data in Table 1.1 are an example of pooled data. For each year we have 50 cross-sectional observations and for each state we have two time series observations on prices and output of eggs, a total of 100 pooled (or combined) observations. Likewise, the data given in exercise 1.1 are pooled data in that the Consumer Price Index (CPI) for each country for 1973-1997 is time series data, whereas the data on the CPI for the seven countries for a single year are cross-sectional data. In the pooled data we have 175 observations: 25 annual observations for each of the seven countries.
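
The count of 175 pooled observations can be reproduced by building the index of (country, year) pairs. The sketch below assumes that the seven countries of exercise 1.1 are the ones named in exercise 1.2; the CPI values themselves are not reproduced, so only the structure of the pooled sample is shown.

```python
from itertools import product

# ASSUMPTION: the seven countries are those named in exercise 1.2.
countries = ["Canada", "France", "Germany", "Italy", "Japan",
             "United Kingdom", "United States"]
years = range(1973, 1998)  # 1973 through 1997, as stated in the text

# One pooled observation per (country, year) pair.
pooled_index = list(product(countries, years))

# Fixing the country gives a time series; fixing the year gives a cross section.
time_series_us = [(c, y) for c, y in pooled_index if c == "United States"]
cross_section_1990 = [(c, y) for c, y in pooled_index if y == 1990]

print(len(pooled_index), len(time_series_us), len(cross_section_1990))
```

Slicing the same index one way gives 25 time series observations for a country, and slicing it the other way gives 7 cross-sectional observations for a year.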

Panel, Longitudinal, or Micropanel Data   This is a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time. For example, the U.S. Department of Commerce carries out a census of housing at periodic intervals. At each periodic survey the same household (or the people living at the same address) is interviewed to find out if there has been any change in the housing and financial conditions of that household since the last survey. By interviewing the same household periodically, the panel data provide very useful information on the dynamics of household behavior, as we shall see in Chapter 16.

The Sources of Data 12
The data used in empirical analysis may be collected by a governmental agency (e.g., the Department of Commerce), an international agency (e.g., the International Monetary Fund (IMF) or the World Bank), a private organization (e.g., the Standard & Poor's Corporation), or an individual. Literally, there are thousands of such agencies collecting data for one purpose or another.

The Internet     The Internet has literally revolutionized data gathering. If you just "surf the net" with a keyword (e.g., exchange rates), you will be swamped with all kinds of data sources. In Appendix E we provide some of the frequently visited web sites that provide economic and financial data of all sorts. Most of the data can be downloaded without much cost. You may want to bookmark the various web sites that might provide you with useful economic data.
The data collected by various agencies may be experimental or nonexperimental. In experimental data, often collected in the natural sciences, the investigator may want to collect data while holding certain factors constant in order to assess the impact of some factors on a given phenomenon. For instance, in assessing the impact of obesity on blood pressure, the researcher would want to collect data while holding constant the eating, smoking, and drinking habits of the people in order to minimize the influence of these variables on blood pressure.
In the social sciences, the data that one generally encounters are nonexperimental in nature, that is, not subject to the control of the researcher.13
For example, the data on GNP, unemployment, stock prices, etc., are not directly under the control of the investigator. As we shall see, this lack of control often creates special problems for the researcher in pinning down the exact cause or causes affecting a particular situation. For example, is it the money supply that determines the (nominal) GDP or is it the other way round?

The Accuracy of Data 14
Although plenty of data are available for economic research, the quality of the data is often not that good. There are several reasons for that. First, as noted, most social science data are nonexperimental in nature. Therefore, there is the possibility of observational errors, either of omission or commission. Second, even in experimentally collected data, errors of measurement arise from approximations and roundoffs. Third, in questionnaire-type surveys, the problem of nonresponse can be serious; a researcher may get only a 40 percent response to a questionnaire. Analysis based on such partial response may not truly reflect the behavior of the 60 percent who did not respond, thereby leading to what is known as (sample) selectivity bias. Then there is the further problem that those who respond to the questionnaire may not answer all the questions, especially questions of a financially sensitive nature, thus leading to additional selectivity bias. Fourth, the sampling methods used in obtaining the data may vary so widely that it is often difficult to compare the results obtained from the various samples. Fifth, economic data are generally available at a highly aggregate level. For example, most macrodata (e.g., GNP, employment, inflation, unemployment) are available for the economy as a whole or at the most for some broad geographical regions. Such highly aggregated data may not tell us much about the individual or microunits that may be the ultimate object of study. Sixth, because of confidentiality, certain data can be published only in highly aggregate form. The IRS, for example, is not allowed by law to disclose data on individual tax returns; it can only release some broad summary data.
Therefore, if one wants to find out how much individuals with a certain level of income spent on health care, one cannot do that analysis except at a very highly aggregate level. But such macroanalysis often fails to reveal the dynamics of the behavior of the microunits. Similarly, the Department of Commerce, which conducts the census of business every 5 years, is not allowed to disclose information on production, employment, energy consumption, research and development expenditure, etc., at the firm level. It is therefore difficult to study the interfirm differences on these items.
Because of all these and many other problems, the researcher should always keep in mind that the results of research are only as good as the quality of the data. Therefore, if in given situations researchers find that the results of the research are "unsatisfactory," the cause may be not that they used the wrong model but that the quality of the data was poor. Unfortunately, because of the nonexperimental nature of the data used in most social science studies, researchers very often have no choice but to depend on the available data. But they should always keep in mind that the data used may not be the best and should try not to be too dogmatic about the results obtained from a given study, especially when the quality of the data is suspect.

A Note on the Measurement Scales of Variables 15

The variables that we will generally encounter fall into four broad categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is important that we understand each.

Ratio Scale    For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 - X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the values along the scale. Therefore, comparisons such as X2 < X1 or X2 > X1 are meaningful. Most economic variables belong to this category. Thus, it is meaningful to ask how big this year's GDP is compared with the previous year's GDP.

Interval Scale  An interval scale variable satisfies the last two properties of the ratio scale variable but not the first. Thus, the distance between two time periods, say (2000 - 1995), is meaningful, but not the ratio of two time periods (2000/1995).

Ordinal Scale   A variable belongs to this category only if it satisfies the third property of the ratio scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class (upper, middle, lower). For these variables the ordering exists but the distances between the categories cannot be quantified. Students of economics will recall the indifference curves between two goods, each higher indifference curve indicating a higher level of utility, but one cannot quantify by how much one indifference curve is higher than the others.

Nominal Scale  Variables in this category have none of the features of the ratio scale variables. Variables such as gender (male, female) and marital status (married, unmarried, divorced, separated) simply denote categories. Question: What is the reason why such variables cannot be expressed on the ratio, interval, or ordinal scales?
As we shall see, econometric techniques that may be suitable for ratio scale variables may not be suitable for nominal scale variables. Therefore, it is important to bear in mind the distinctions among the four types of the measurement scales discussed above.
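
The four scales can be summarized by which of the three properties (meaningful ratio, meaningful distance, natural ordering) each one satisfies. The lookup table below is a minimal sketch of that summary; the names `MEANINGFUL` and `allowed` are hypothetical and serve only to restate the definitions given above.

```python
# Which comparisons are meaningful on each measurement scale, as described above.
MEANINGFUL = {
    "ratio":    {"ratio": True,  "distance": True,  "ordering": True},
    "interval": {"ratio": False, "distance": True,  "ordering": True},
    "ordinal":  {"ratio": False, "distance": False, "ordering": True},
    "nominal":  {"ratio": False, "distance": False, "ordering": False},
}

def allowed(scale, operation):
    """Return True if `operation` is meaningful for a variable on `scale`."""
    return MEANINGFUL[scale][operation]

# Calendar years are the text's interval-scale example:
# the distance 2000 - 1995 is meaningful, the ratio 2000 / 1995 is not.
print(allowed("interval", "distance"), allowed("interval", "ratio"))
```

Reading down the table, each scale drops one property of the scale above it, which is why a technique that relies on ratios or distances may not carry over to ordinal or nominal variables.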
__________________________________
12. For an illuminating account, see Albert T. Sommers, The U.S. Economy Demystified: What the Major Economic Statistics Mean and Their Significance for Business, D.C. Heath, Lexington, Mass., 1985.
13. In the social sciences too sometimes one can have a controlled experiment. An example is given in exercise 1.6.
14. For a critical review, see O. Morgenstern, The Accuracy of Economic Observations, 2d ed., Princeton University Press, Princeton, N.J., 1963.
15. The following discussion relies heavily on Aris Spanos, Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press, New York, 1999, p. 24.