November 22, 2011

1.7 THE NATURE AND SOURCES OF DATA (Damodar N. Gujarati)

FOR ECONOMIC ANALYSIS 10

The success of any econometric analysis ultimately depends on the availability of the appropriate data. It is therefore essential that we spend some time discussing the nature, sources, and limitations of the data that one may encounter in empirical analysis.

Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and pooled (i.e., a combination of time series and cross-section) data.

Time Series Data   The data shown in Table I.1 of the Introduction are an example of time series data. A time series is a set of observations on the values that a variable takes at different times. Such data may be collected at regular time intervals, such as daily (e.g., stock prices, weather reports), weekly (e.g., money supply figures), monthly [e.g., the unemployment rate, the Consumer Price Index (CPI)], quarterly (e.g., GDP), annually (e.g., government budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or decennially (e.g., the census of population).
Sometimes data are available both quarterly and annually, as in the case of the data on GDP and consumer expenditure. With the advent of high-speed computers, data can now be collected over an extremely short interval of time, such as the data on stock prices, which can be obtained literally continuously (the so-called real-time quote).
Although time series data are used heavily in econometric studies, they present special problems for econometricians. As we will show in the chapters on time series econometrics later on, most empirical work based on time series data assumes that the underlying time series is stationary. Although it is too early to introduce the precise technical meaning of stationarity at this juncture, loosely speaking, a time series is stationary if its mean and variance do not vary systematically over time. To see what this means, consider Figure 1.5, which depicts the behavior of the M1 money supply in the United States from 1959 to July 31, 1999. (The actual data are given in exercise 1.4.) As you can see from this figure, the M1 money supply shows a steady upward trend as well as variability over the years, suggesting that the M1 time series is not stationary.11 We will explore this topic fully in Chapter 21.

FIGURE 1.5. M1 money supply. United States, 1951:01-1999:09
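
The loose definition of stationarity given above (a mean and variance that do not vary systematically over time) can be checked roughly by splitting a series into subperiods and comparing their means and standard deviations. The following is only a minimal sketch of that idea: the helper name subperiod_stats and both series are made up for illustration, and the numbers are synthetic, not the actual M1 data.

```python
import statistics

def subperiod_stats(series, n_periods):
    """Split a series into n_periods consecutive chunks; return (mean, std) per chunk."""
    size = len(series) // n_periods
    stats = []
    for i in range(n_periods):
        start = i * size
        # The last chunk absorbs any leftover observations.
        end = (i + 1) * size if i < n_periods - 1 else len(series)
        chunk = series[start:end]
        stats.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return stats

# Synthetic monthly series with a steady upward trend (NOT the actual M1 data).
trending = [100.0 + 2.0 * t for t in range(480)]
# Synthetic series that fluctuates around a constant level of 100.
flat = [100.0 + (5.0 if t % 2 == 0 else -5.0) for t in range(480)]

trend_stats = subperiod_stats(trending, 4)
flat_stats = subperiod_stats(flat, 4)

for label, stats in (("trending", trend_stats), ("flat", flat_stats)):
    for k, (mean, std) in enumerate(stats, start=1):
        print(f"{label} subperiod {k}: mean={mean:.2f}, std={std:.2f}")
```

In the trending series the subperiod means rise from one chunk to the next, which is a rough sign of nonstationarity; in the flat series they stay the same. A comparison like this is only an informal indication, not a formal test.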
________________
10 For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, chap. 3.
11 To see this more clearly, we divided the data into four time periods: 1951:01 to 1962:12; 1963:01 to 1974:12; 1975:01 to 1986:12; and 1987:01 to 1999:09. For these subperiods the mean values of the money supply (with corresponding standard deviations in parentheses) were, respectively, 165.88 (23.27), 323.20 (72.66), 788.12 (195.43), and 1099 (27.84), all figures in billions of dollars. This is a rough indication of the fact that the money supply over the entire period was not stationary.

Cross-Section Data   Cross-section data are data on one or more variables collected at the same point in time, such as the census of population conducted by the Census Bureau every 10 years (the latest being in year 2000), the surveys of consumer expenditures conducted by the University of Michigan, and, of course, the opinion polls by Gallup and umpteen other organizations. A concrete example of cross-sectional data is given in Table 1.1.
This table gives data on egg production and egg prices for the 50 states in the union for 1990 and 1991. For each year the data on the 50 states are cross-sectional data. Thus, in Table 1.1 we have two cross-sectional samples.
Just as time series data create their own special problems (because of the stationarity issue), cross-sectional data too have their own problems, specifically the problem of heterogeneity. From the data given in Table 1.1 we see that we have some states that produce huge amounts of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska). When we include such heterogeneous units in a statistical analysis, the size or scale effect must be taken into account so as not to mix apples with oranges. To see this clearly, we plot in Figure 1.6 the data on eggs produced and their prices in 50 states for the year 1990. This figure shows how widely scattered the observations are. In Chapter 11 we will see how the scale effect can be an important factor in assessing relationships among economic variables.

TABLE 1.1



FIGURE 1.6. Relationship between eggs produced and prices, 1990.

Pooled Data   In pooled, or combined, data are elements of both time series and cross-section data. The data in Table 1.1 are an example of pooled data. For each year we have 50 cross-sectional observations, and for each state we have two time series observations on prices and output of eggs, a total of 100 pooled (or combined) observations. Likewise, the data given in exercise 1.1 are pooled data in that the Consumer Price Index (CPI) for each country for 1973-1997 is time series data, whereas the data on the CPI for the seven countries for a single year are cross-sectional data. In the pooled data we have 175 observations: 25 annual observations for each of the seven countries.
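
The way a pooled data set combines the two dimensions can be sketched in a few lines of code. This is only an illustrative layout, assuming seven placeholder country labels (country_1 to country_7, invented here) and the years 1973-1997; it holds no actual CPI values.

```python
# Hypothetical pooled data set: 7 countries observed annually over 1973-1997.
# The country labels are placeholders, and no real CPI values are included.
countries = [f"country_{i}" for i in range(1, 8)]
years = list(range(1973, 1998))  # 25 annual observations

# Each pooled observation is identified by a (country, year) pair.
pooled = [(country, year) for country in countries for year in years]

# Time series: one country followed over all years.
time_series = [obs for obs in pooled if obs[0] == "country_1"]

# Cross-section: all countries observed in a single year.
cross_section = [obs for obs in pooled if obs[1] == 1990]

print(len(pooled), len(time_series), len(cross_section))  # prints: 175 25 7
```

Fixing the country gives a time series of 25 observations, fixing the year gives a cross-section of 7 observations, and the two dimensions together give the 175 pooled observations.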

Panel, Longitudinal, or Micropanel Data   This is a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time. For example, the U.S. Department of Commerce carries out a census of housing at periodic intervals. At each periodic survey the same household (or the people living at the same address) is interviewed to find out if there has been any change in the housing and financial conditions of that household since the last survey. By interviewing the same household periodically, the panel data provide very useful information on the dynamics of household behavior, as we shall see in Chapter 16.

The Sources of Data 12
The data used in empirical analysis may be collected by a governmental agency (e.g., the Department of Commerce), an international agency (e.g., the International Monetary Fund (IMF) or the World Bank), a private organization (e.g., the Standard & Poor's Corporation), or an individual. Literally, there are thousands of such agencies collecting data for one purpose or another.

The Internet   The Internet has literally revolutionized data gathering. If you just "surf the net" with a keyword (e.g., exchange rates), you will be swamped with all kinds of data sources. In Appendix E we provide some of the frequently visited web sites that provide economic and financial data of all sorts. Most of the data can be downloaded without much cost. You may want to bookmark the various web sites that might provide you with useful economic data.
The data collected by various agencies may be experimental or nonexperimental. In experimental data, often collected in the natural sciences, the investigator may want to collect data while holding certain factors constant in order to assess the impact of some factors on a given phenomenon. For instance, in assessing the impact of obesity on blood pressure, the researcher would want to collect data while holding constant the eating, smoking, and drinking habits of the people in order to minimize the influence of these variables on blood pressure.
In the social sciences, the data that one generally encounters are nonexperimental in nature, that is, not subject to the control of the researcher.13
For example, the data on GNP, unemployment, stock prices, etc., are not directly under the control of the investigator. As we shall see, this lack of control often creates special problems for the researcher in pinning down the exact cause or causes affecting a particular situation. For example, is it the money supply that determines the (nominal) GDP, or is it the other way round?

The Accuracy of Data 14
Although plenty of data are available for economic research, the quality of the data is often not that good. There are several reasons for that. First, as noted, most social science data are nonexperimental in nature. Therefore, there is the possibility of observational errors, either of omission or commission. Second, even in experimentally collected data, errors of measurement arise from approximations and roundoffs. Third, in questionnaire-type surveys, the problem of nonresponse can be serious; a researcher may get only a 40 percent response to a questionnaire. Analysis based on such a partial response may not truly reflect the behavior of the 60 percent who did not respond, thereby leading to what is known as (sample) selectivity bias. Then there is the further problem that those who respond to the questionnaire may not answer all the questions, especially questions of a financially sensitive nature, thus leading to additional selectivity bias. Fourth, the sampling methods used in obtaining the data may vary so widely that it is often difficult to compare the results obtained from the various samples. Fifth, economic data are generally available at a highly aggregate level. For example, most macrodata (e.g., GNP, employment, inflation, unemployment) are available for the economy as a whole or at the most for some broad geographical regions. Such highly aggregated data may not tell us much about the individual or microunits that may be the ultimate object of study. Sixth, because of confidentiality, certain data can be published only in highly aggregate form. The IRS, for example, is not allowed by law to disclose data on individual tax returns; it can only release some broad summary data.
Therefore, if one wants to find out how much individuals with a certain level of income spent on health care, one cannot do that analysis except at a very highly aggregate level. But such macroanalysis often fails to reveal the dynamics of the behavior of the microunits. Similarly, the Department of Commerce, which conducts the census of business every 5 years, is not allowed to disclose information on production, employment, energy consumption, research and development expenditure, etc., at the firm level. It is therefore difficult to study the interfirm differences on these items.
Because of all these and many other problems, the researcher should always keep in mind that the results of the research are only as good as the quality of the data. Therefore, if in given situations researchers find that the results of the research are "unsatisfactory," the cause may be not that they used the wrong model but that the quality of the data was poor. Unfortunately, because of the nonexperimental nature of the data used in most social science studies, researchers very often have no choice but to depend on the available data. But they should always keep in mind that the data used may not be the best and should try not to be too dogmatic about the results obtained from a given study, especially when the quality of the data is suspect.
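
The nonresponse problem mentioned among the reasons above can be made concrete with a small numerical sketch. Everything here is assumed for illustration: a hypothetical population of 100 households with incomes of 1 to 100 (thousand dollars), and the deliberately extreme assumption that only the 40 lowest-income households answer the questionnaire.

```python
import statistics

# Hypothetical population: 100 households with incomes 1, 2, ..., 100 (thousand dollars).
population = list(range(1, 101))

# Illustrative assumption: only the 40 lowest-income households respond,
# giving a 40 percent response rate that depends on income.
respondents = sorted(population)[:40]

population_mean = statistics.mean(population)   # true average income
respondent_mean = statistics.mean(respondents)  # average among those who answered
bias = respondent_mean - population_mean

print(f"population mean: {population_mean}")
print(f"respondent mean: {respondent_mean}")
print(f"bias from nonresponse: {bias}")
```

Because the decision to respond is related to the variable being measured, the average computed from the respondents understates the population average. If responses were unrelated to income, a 40 percent response would lose precision but would not systematically distort the estimate in this way.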

A Note on the Measurement Scales of Variables 15

The variables that we will generally encounter fall into four broad categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is important that we understand each.

Ratio Scale   For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 - X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the values along the scale.
Therefore, comparisons such as X2 < X1 or X2 > X1 are meaningful. Most economic variables belong to this category. Thus, it is meaningful to ask how big this year's GDP is compared with the previous year's GDP.

Interval Scale   An interval scale variable satisfies the last two properties of the ratio scale variable but not the first. Thus, the distance between two time periods, say (2000 - 1995), is meaningful, but not the ratio of two time periods (2000/1995).

Ordinal Scale   A variable belongs to this category only if it satisfies the third property of the ratio scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class (upper, middle, lower). For these variables the ordering exists, but the distances between the categories cannot be quantified. Students of economics will recall the indifference curves between two goods, each higher indifference curve indicating a higher level of utility, but one cannot quantify by how much one indifference curve is higher than the others.

Nominal Scale   Variables in this category have none of the features of the ratio scale variables. Variables such as gender (male, female) and marital status (married, unmarried, divorced, separated) simply denote categories. Question: Why can such variables not be expressed on the ratio, interval, or ordinal scales?
As we shall see, econometric techniques that may be suitable for ratio scale variables may not be suitable for nominal scale variables. Therefore, it is important to bear in mind the distinctions among the four types of the measurement scales discussed above.
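
The distinctions among the four scales can be summarized as a small lookup of which of the three properties (meaningful ratio, meaningful distance, natural ordering) each scale has. The names SCALE_PROPERTIES and is_meaningful are invented for this sketch; the table itself simply restates the definitions given above.

```python
# Which of the three properties each measurement scale satisfies,
# following the definitions in the text.
SCALE_PROPERTIES = {
    "ratio": {"ratio", "distance", "ordering"},
    "interval": {"distance", "ordering"},
    "ordinal": {"ordering"},
    "nominal": set(),
}

def is_meaningful(scale, operation):
    """Return True if the operation is meaningful for a variable on the given scale."""
    return operation in SCALE_PROPERTIES[scale]

# GDP (ratio scale): comparing this year's GDP with last year's as a ratio makes sense.
print(is_meaningful("ratio", "ratio"))        # True
# Calendar years (interval scale): 2000 - 1995 makes sense, 2000/1995 does not.
print(is_meaningful("interval", "distance"))  # True
print(is_meaningful("interval", "ratio"))     # False
# Grades (ordinal scale): A ranks above B, but the gap cannot be quantified.
print(is_meaningful("ordinal", "distance"))   # False
# Marital status (nominal scale): categories only, with no ordering.
print(is_meaningful("nominal", "ordering"))   # False
```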
__________________________________
12 For an illuminating account, see Albert T. Somers, The U.S. Economy Demystified: What the Major Economic Statistics Mean and Their Significance for Business, D.C. Heath, Lexington, Mass., 1985.
13 In the social sciences too, sometimes one can have a controlled experiment. An example is given in exercise 1.6.
14 For a critical review, see O. Morgenstern, The Accuracy of Economic Observations, 2d ed., Princeton University Press, Princeton, N.J., 1963.
15 The following discussion relies heavily on Aris Spanos, Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press, New York, 1999, p. 24.





 





