Chapter 14

Regression Analysis and Correlation

Modified: 2007-05-03

In the example above, the regression line shows the best linear fit for the bivariate data (mileage and weight).
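As a rough illustration of what "best linear fit" means, the sketch below computes an ordinary least-squares line by hand. The function name fit_line and the weight and mileage values are invented for illustration; they are not the data from the example, and least squares is assumed here as the fitting criterion.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values, NOT the data plotted in the example.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]          # assumed units: thousands of pounds
mileage = [34.0, 30.0, 27.0, 23.0, 20.0]    # assumed units: miles per gallon

slope, intercept = fit_line(weight, mileage)
print(f"mileage = {slope:.2f} * weight + {intercept:.2f}")
```

With these made-up numbers the slope is negative, which matches the intuition that heavier cars tend to get lower mileage.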

Correlated data are often graphed on a scatterplot. One of the variables is plotted on the X axis and the other on the Y axis. Each dot of a scatterplot represents a pair of scores, one for the X variable and one for the Y variable. Here are some examples:

The Pearson correlation coefficient (symbolized r) is a widely used descriptive statistic that shows the degree of relationship between two variables. Correlation coefficients range from -1.00 to 1.00, with .00 in the middle.


The strongest degree of relationship is indicated by r = 1.00 and r = -1.00. Both coefficients indicate that the relationship is perfect, which means that changes in the scores of one variable are accompanied by perfectly predictable changes in the scores of the other variable. The middle value, r = .00, means that there is no linear relationship between the two variables. When r = .00, the changes in one variable give no clue as to linear changes in the other variable.


Positive correlation coefficients (from .01 to 1.00) indicate that the two variables vary in the same direction. That is, as scores on one variable increase, scores on the other variable increase as well. Negative correlation coefficients (from -.01 to -1.00) mean that the variables change in opposite directions. As scores on one variable increase, scores on the other variable decrease. The closer r is to 1.00 or -1.00, the more predictable the increase or decrease is.
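The behavior of r described above can be checked with a small sketch. The function pearson_r and the score lists are hypothetical, chosen so that the three cases (perfect positive, perfect negative, and zero) come out cleanly.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y divided by the
    product of their individual variations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                            # hypothetical scores

r_positive = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in step with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])    # y falls in step with x
r_zero = pearson_r(x, [1, 2, 3, 2, 1])         # y rises, then falls

print(round(r_positive, 2), round(r_negative, 2), round(r_zero, 2))
```

The third list is worth a second look: y clearly depends on x (it rises and then falls), yet r is essentially zero, because r only measures how well a straight line describes the relationship.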

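Outlier: A score separated from the others, lying more than 1.5(IQR) below the 25th percentile or above the 75th percentile, where IQR is the interquartile range (the 75th percentile minus the 25th percentile).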
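The 1.5(IQR) rule can be sketched in a few lines. The function find_outliers and the scores are made up for illustration. Note that textbooks and software use slightly different conventions for computing percentiles; this sketch assumes the "inclusive" method of Python's statistics.quantiles, so other tools may give slightly different cutoffs.

```python
import statistics

def find_outliers(scores):
    """Return the scores lying more than 1.5 * IQR below the 25th
    percentile or above the 75th percentile."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [s for s in scores if s < low_cutoff or s > high_cutoff]

# Hypothetical scores; one value sits far from the rest.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
outliers = find_outliers(scores)
print(outliers)
```

Here the 25th and 75th percentiles are 15 and 18, so the IQR is 3 and any score below 10.5 or above 22.5 is flagged.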

Here's an example of a dataset with an outlier:

The data above are the number of patents awarded by state in 1940 vs. 1950. Look for the outlier; it is not that easy to see. If the outlier were not there, where would the regression line be? What should we do with these data?
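One way to explore these questions is to fit the regression line twice, once with every point and once with the suspect point left out, and compare the results. The sketch below does this with invented numbers (not the patent data); the helper fit_line is hypothetical, and leaving a point out here is only a diagnostic, not a recommendation to discard it.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: five points on a straight line plus one stray point.
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 8, 10, 5]

slope_all, intercept_all = fit_line(x, y)                   # every point
slope_trimmed, intercept_trimmed = fit_line(x[:-1], y[:-1])  # last point left out

print(f"with outlier:    slope = {slope_all:.3f}")
print(f"without outlier: slope = {slope_trimmed:.3f}")
```

In this made-up case a single stray point flattens the line almost completely, which is why it pays to look at the scatterplot before trusting the fitted line.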

Online Resources

Correlation coefficient
Interactive page lets the user view the regression line and scatterplot of a bivariate distribution for correlation values ranging from -1.00 to +1.00.
http://noppa5.pc.helsinki.fi/koe/corr/cor7.html

Wikipedia on correlation coefficient
Long page discusses correlation coefficient in detail including mathematical properties, non-parametric coefficients, and common misconceptions about correlation.
http://en.wikipedia.org/wiki/Correlation

Scatterplot
Shows sample scatterplots and explains positive and negative associations between bivariate data.
http://www.stat.yale.edu/Courses/1997-98/101/scatter.htm

Multiple Regression
Online primer covers basic topics in multiple regression.
http://www.statsoft.com/textbook/stmulreg.html

