Closely related to, but conceptually very different from, regression analysis is correlation analysis, where the primary objective is to measure the strength or degree of linear association between two variables. The correlation coefficient, which we shall study in detail in Chapter 3, measures this strength of (linear) association. For example, we may be interested in finding the correlation (coefficient) between smoking and lung cancer, between scores on statistics and mathematics examinations, between high school grades and college grades, and so on. In regression analysis, as already noted, we are not primarily interested in such a measure. Instead, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables. Thus, we may want to know whether we can predict the average score on a statistics examination by knowing a student's score on a mathematics examination.
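To make the contrast concrete, the following is a minimal sketch in Python. The scores are invented for illustration, and the helper names `pearson_r` and `ols_fit` are our own, not part of the text. The first function returns a single measure of linear association; the second estimates a line from which the average statistics score can be predicted for a given mathematics score.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_fit(x, y):
    """Least-squares intercept and slope for the regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical examination scores (assumed data, not from the text).
math_scores = [55, 60, 65, 70, 75, 80, 85, 90]
stat_scores = [52, 61, 60, 72, 70, 79, 88, 86]

# Correlation analysis: one number summarizing linear association.
r = pearson_r(math_scores, stat_scores)

# Regression analysis: predict the average statistics score
# for a student whose mathematics score is 78.
intercept, slope = ols_fit(math_scores, stat_scores)
predicted_mean_at_78 = intercept + slope * 78

print(f"correlation coefficient: {r:.3f}")
print(f"predicted average statistics score at math = 78: {predicted_mean_at_78:.1f}")
```

The correlation coefficient says how tightly the two sets of scores move together; only the fitted line answers the question of what average statistics score to expect at a particular mathematics score.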
Regression and correlation have some fundamental differences that are worth mentioning. In regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated. The dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling),7 which was made explicit in the definition of regression given in Section 1.2. Thus, in Figure 1.2 we assumed that the variable age was fixed at given levels and height measurements were obtained at these levels. In correlation analysis, on the other hand, we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables. After all, the correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics examinations.
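The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses invented scores and our own helper names (`pearson_r`, `ols_slope`), so it is an illustration under those assumptions rather than a method taken from the text. Swapping the two variables leaves the correlation coefficient unchanged, while the slope from regressing statistics on mathematics is not the reciprocal of the slope from regressing mathematics on statistics.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope for the regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical examination scores (assumed data, not from the text).
math_scores = [55, 60, 65, 70, 75, 80, 85, 90]
stat_scores = [52, 61, 60, 72, 70, 79, 88, 86]

# Correlation: the order of the variables does not matter.
r_math_stat = pearson_r(math_scores, stat_scores)
r_stat_math = pearson_r(stat_scores, math_scores)

# Regression: which variable is dependent does matter.
slope_stat_on_math = ols_slope(math_scores, stat_scores)
slope_math_on_stat = ols_slope(stat_scores, math_scores)

print(f"r(math, stat) = {r_math_stat:.4f}, r(stat, math) = {r_stat_math:.4f}")
print(f"slope of stat on math = {slope_stat_on_math:.4f}")
print(f"1 / slope of math on stat = {1 / slope_math_on_stat:.4f}")
```

The two regression lines coincide only when the correlation is perfect. In general the product of the two slopes equals the squared correlation coefficient, which is one way to see how the symmetric measure and the two asymmetric fits are related.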
Moreover, both variables are assumed to be random. As we shall see, most of the correlation theory is based on the assumption of the randomness of variables, whereas most of the regression theory to be expounded in this book is conditional upon the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic.8
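The idea of explanatory variables that are fixed in repeated sampling can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: the age levels, the population line (mean height = 80 + 5 × age), the error standard deviation, and the helper name `ols_slope`. In every sample the ages take the same values; only the heights are drawn afresh, so the estimated slope varies from sample to sample solely because the dependent variable is stochastic.

```python
import random


def ols_slope(x, y):
    """Least-squares slope for the regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Assumed (hypothetical) population: mean height = 80 + 5 * age, plus noise.
TRUE_INTERCEPT = 80.0
TRUE_SLOPE = 5.0
ERROR_SD = 3.0

# The explanatory variable is held at the same levels in every sample.
fixed_ages = [6, 8, 10, 12, 14, 16, 18]

rng = random.Random(0)  # seeded so that the run is reproducible
slopes = []
for _ in range(500):
    # Only the dependent variable is redrawn from its distribution.
    heights = [
        TRUE_INTERCEPT + TRUE_SLOPE * age + rng.gauss(0, ERROR_SD)
        for age in fixed_ages
    ]
    slopes.append(ols_slope(fixed_ages, heights))

mean_slope = sum(slopes) / len(slopes)
print(f"smallest and largest estimated slope: {min(slopes):.2f}, {max(slopes):.2f}")
print(f"average estimated slope over 500 samples: {mean_slope:.2f}")
```

Under these assumptions the individual slope estimates scatter around the assumed population slope, and their average over many samples lies close to it.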
_______________________________
7 It is crucial to note that the explanatory variables may be intrinsically stochastic, but for the purpose of regression analysis we assume that their values are fixed in repeated sampling (that is, X assumes the same values in various samples), thus rendering them in effect nonrandom or nonstochastic. But more on this in Chap. 3, Sec. 3.2.
8 In advanced treatment of economics, one can relax the assumption that the explanatory variables are nonstochastic (see introduction to Part II).