Wednesday, March 6, 2013

correlation basics

We use linear regression to find relationships between two or more variables and to create a model that attempts to describe the relationship. 
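Here is a minimal Python sketch of simple (one-predictor) linear regression by ordinary least squares, to make "a model that describes the relationship" concrete. The function name `fit_line` is hypothetical. The sketch assumes exactly two variables, x and y, rather than the "two or more" that regression can handle in general.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-products of deviations over squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` gives an intercept of 1 and a slope of 2. On Python 3.10 or later, the standard library's `statistics.linear_regression` performs the same fit.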

A scatterplot is a two-dimensional graph that displays paired data. Each point is one observation, with one variable plotted on the x-axis and the other on the y-axis.

A correlation coefficient is a number between -1 and +1 that tells the strength and direction of a relationship between two variables.

·         0 means no linear relationship exists (there could still be a curved relationship)

·         +1 or -1 means the points fall on a perfect straight line   

·         The closer the number is to +1 or -1, the stronger the relationship. 

·         A rule of thumb is that a value of +0.7 or higher (or -0.7 or lower) indicates a strong relationship, while a value around +0.5 or -0.5 indicates a moderate relationship.
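The bullets above can be tried out with a short Python sketch that computes Pearson's r from paired scores and labels it with the 0.7 / 0.5 rule of thumb. The function names are hypothetical. The rule of thumb only names the strong and moderate cutoffs, so labeling anything below 0.5 in size as "weak or none" is an assumption added here.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r for paired scores measured at least on an interval level."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Cross-products and sums of squares of the deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Return (strength, direction) using the 0.7 / 0.5 rule of thumb.

    Assumption: anything below 0.5 in size is labeled "weak or none".
    """
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak or none"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction
```

Points on a perfect straight line give r = 1 or r = -1, matching the second bullet. On Python 3.10 or later, `statistics.correlation(xs, ys)` in the standard library also returns Pearson's r.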

Kinds of correlation coefficients – which one to use depends on the kind of data:

·         Pearson r – used for data measured at least on an interval level (such as my data for the TL, SL, and SV scales)

·         Spearman rho – used when data is measured on an ordinal scale (such as a ranking); it measures monotonic relationships, not only linear ones

·         Phi – used when both variables are measured dichotomously (e.g., yes/no, pass/fail)

·         There are also point-biserial and eta, but I don’t need these for now
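Spearman's rho is Pearson's r computed on ranks, and phi equals Pearson's r for two variables coded 0/1, usually computed from a 2x2 table of counts. Below is a minimal Python sketch of both. The function names are hypothetical. Tied scores get the average of the ranks they span, which is the usual convention.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r; reused below for Spearman's rho."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank scores from 1 (lowest) to n; tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied scores
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of counts:

                 Y = yes   Y = no
        X = yes     a        b
        X = no      c        d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom
```

Here `spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])` is exactly 1 because y always rises with x. Pearson's r for the same curved data is only about 0.98.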

Under the null hypothesis, the expected correlation is 0.  The key question is whether the difference between the correlation we observe and 0 reflects a relationship that really exists, or whether it appears only because of sampling error.

A one-tailed hypothesis predicts the direction of the relationship in advance (either positive or negative).

A two-tailed hypothesis makes no assumption about the direction of the relationship.
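A permutation test shows the "real relationship or sampling error" question in action. It shuffles the y scores to break any real pairing with x, recomputes r many times, and counts how often chance alone produces an r at least as extreme as the observed one. This is a resampling approach swapped in for illustration, not the t-table method a textbook would use. The function name, the `alternative` parameter values, and the default of 10,000 shuffles are assumptions. A two-tailed test counts extreme values in either direction, while a one-tailed test counts only the direction predicted in advance.

```python
import math
import random
from statistics import mean

TOL = 1e-12  # guards against floating-point ties when comparing r values


def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, alternative="two-sided", n_shuffles=10_000, seed=0):
    """Estimate a p-value for H0: no correlation, by re-pairing y with x at random.

    alternative: "two-sided" (no predicted direction), "greater" (positive
    relationship predicted), or "less" (negative relationship predicted).
    """
    if alternative not in ("two-sided", "greater", "less"):
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        r = pearson_r(xs, shuffled)
        if alternative == "two-sided":
            hits += abs(r) >= abs(observed) - TOL
        elif alternative == "greater":
            hits += r >= observed - TOL
        else:
            hits += r <= observed + TOL
    # Counting the observed pairing itself keeps the estimate above zero
    return (hits + 1) / (n_shuffles + 1)
```

With a fixed `seed` the result is reproducible. A small p-value (for example, below 0.05) suggests that the correlation is unlikely to be due to sampling error alone.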

For a correlational study, degrees of freedom = N - 2, where N is the number of pairs of scores.  One degree of freedom is lost for every variable in the model, and a correlation involves two variables.  Degrees of freedom represent how many numbers are free to vary in a calculation sequence (Steinberg, 2008).
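Degrees of freedom feed into the usual significance test for r. That test converts r into a t statistic, t = r * sqrt(N - 2) / sqrt(1 - r^2), and compares it with the t distribution on N - 2 degrees of freedom. The following is a minimal Python sketch, where N is the number of pairs. The function name is hypothetical.

```python
import math


def correlation_t_test(r, n):
    """Convert r from n pairs into a t statistic for H0: rho = 0.

    Returns (df, t) with df = n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    if not -1 < r < 1:
        raise ValueError("t is undefined when r is exactly +1 or -1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return df, t
```

For example, r = 0.6 with 27 pairs gives df = 25 and t = 3.75. That t is then checked against a critical value from a t table for the chosen alpha and for a one- or two-tailed test. If SciPy is available, `scipy.stats.t.sf(abs(t), df)` gives the one-tailed p-value, and doubling it gives the two-tailed p-value.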


Rumsey, D. (2009). Statistics II for Dummies. Hoboken, NJ: Wiley Publishing, Inc.

Steinberg, W. J. (2008). Statistics Alive! Thousand Oaks, CA: Sage Publications.