Stack Overflow: 2013

Wednesday, March 6, 2013

correlation basics

We use linear regression to find relationships between two or more variables and to create a model that attempts to describe the relationship.

A scatterplot is a 2-dimensional graph that displays paired data: each point is one observation, plotted by its value on one variable against its value on the other.

A correlation coefficient is a number that tells the strength and direction of a relationship between two variables.

· 0 means no linear relationship exists (there could still be a curved relationship)

· +1 or -1 means the points fall on a perfect straight line

· The closer the number is to +1 or -1, the stronger the relationship.

· A rule of thumb is that +0.7 or -0.7 (or farther from 0) indicates a strong relationship; +0.5 or -0.5 indicates a moderate relationship.
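To make the bullets above concrete, here is a minimal Python sketch that computes Pearson r from its definition and labels it with the rule of thumb. The function names `pearson_r` and `describe_strength` are hypothetical, and calling anything below 0.5 "weak" is my own assumption, since the rule of thumb only names strong and moderate.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sum of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r):
    """Label r with the rule of thumb: |r| >= 0.7 strong, >= 0.5 moderate."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.7:
        label = "strong"
    elif size >= 0.5:
        label = "moderate"
    else:
        label = "weak"  # assumption: the rule of thumb does not name this band
    return f"{label} {direction}"

# Points on a perfect straight line give r = 1.0
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)
print(r, describe_strength(r))  # 1.0 strong positive
```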

Kinds of correlation coefficient – the one used depends on the kind of data:

· Pearson r – used for linear relationships when data are measured at least on an interval level (such as my data for the TL, SL, and SV scales)

· Spearman rho – used for monotonic relationships (consistently increasing or decreasing, not necessarily in a straight line) when data are measured on an ordinal scale (such as a ranking)

· Phi – used for relationships between two variables that are both measured dichotomously (e.g. yes/no, pass/fail)

· also Point Biserial and Eta....don’t care about these for now
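One way to see how these coefficients relate: Spearman rho is Pearson r computed on ranks (tied values get the average of their ranks), and phi is Pearson r computed on two variables coded 0/1. The sketch below assumes that 0/1 coding, and all data are made up. For the curved but steadily increasing data, rho is a perfect 1.0 while r is not.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def phi(x, y):
    """Phi: Pearson r computed on two variables coded 0/1."""
    return pearson_r(x, y)

# Made-up data: y rises with x, but along a curve (y = x cubed)
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(pearson_r(x, y), 3))  # 0.943
print(spearman_rho(x, y))         # 1.0

# Made-up dichotomous data coded 1 = yes, 0 = no
passed = [1, 1, 1, 0, 0, 0]
studied = [1, 1, 0, 0, 0, 1]
print(round(phi(passed, studied), 3))  # 0.333
```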

Under the null hypothesis, the expected correlation is 0. The key question is whether the difference between the observed correlation and 0 reflects a relationship that really exists, or arose only from sampling error.

A one-tailed hypothesis specifies the direction of the relationship in advance (positive or negative).

A two-tailed hypothesis makes no assumption about the direction of the relationship.
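The sampling-error question above can be illustrated with a permutation test, which I'm swapping in here because it needs no tables: shuffling one variable destroys any real pairing, so the shuffled correlations show how large r gets from sampling error alone. This is a sketch, not the t-test a stats package would run by default; the data and the `permutation_p_values` name are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_values(x, y, n_perm=10_000, seed=0):
    """Estimate one-tailed (positive) and two-tailed p-values for r."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    at_least_as_positive = 0  # one-tailed: shuffled r >= observed r
    at_least_as_extreme = 0   # two-tailed: |shuffled r| >= |observed r|
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        r = pearson_r(x, shuffled)
        if r >= observed:
            at_least_as_positive += 1
        if abs(r) >= abs(observed):
            at_least_as_extreme += 1
    # The +1 counts the observed arrangement itself, so p is never exactly 0
    p_one = (at_least_as_positive + 1) / (n_perm + 1)
    p_two = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_one, p_two

# Hypothetical paired scores
x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 4, 4, 6, 7, 9, 8]
r, p_one, p_two = permutation_p_values(x, y)
print(round(r, 3), p_one, p_two)
```

When the observed r is positive, the one-tailed p-value can never exceed the two-tailed one, which is why a directional prediction made in advance makes it easier to reject the null.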

For a correlational study, degrees of freedom = N - 2, where N is the number of pairs of scores. One degree of freedom is lost for each variable in the model, so a two-variable correlation has N - 2. Degrees of freedom represent how many values are free to vary in a calculation sequence (Steinberg, 2008).
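The N - 2 degrees of freedom show up in the usual t-test for a correlation, t = r√((N - 2) / (1 - r²)). A minimal sketch (the function name is hypothetical):

```python
import math

def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with df = n - 2."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    t = r * math.sqrt(df / (1 - r * r))
    return t, df

t, df = correlation_t(0.60, 27)
print(round(t, 2), df)  # 3.75 25
```

For r = .60 with N = 27, t is about 3.75 with df = 25, which you would compare to the critical value in a t table for df = 25 (about 2.06 for a two-tailed test at α = .05).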

References

Rumsey, D. (2009). Statistics II for Dummies. Hoboken, NJ: Wiley Publishing, Inc.

Steinberg, W. J. (2008). Statistics Alive! Thousand Oaks, CA: Sage Publications.

Sunday, February 17, 2013

normal curve

A normal curve is a theoretical distribution. The smaller the sample size, the more we can expect the observed percentages to deviate from the percentages predicted by the normal curve. Even so, the theory still holds (approximately) when there are minor violations of the normality assumption. This means the normal curve is useful for interpreting my data as long as the data are approximately normally distributed.
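The small-sample point above can be checked with a quick simulation: draw many samples from a true normal distribution and see how far the share of scores within 1 SD of the sample mean strays from the theoretical 68.27%. The sample sizes, replication count, and seed are arbitrary choices for illustration.

```python
import random
import statistics

def share_within_1sd(sample):
    """Fraction of points within 1 sample SD of the sample mean."""
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return sum(abs(v - m) <= s for v in sample) / len(sample)

def mean_deviation_from_theory(n, reps=200, seed=1):
    """Average |observed share - 68.27%| over repeated normal samples of size n."""
    rng = random.Random(seed)
    expected = 0.6827  # share within 1 SD for a true normal curve
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        total += abs(share_within_1sd(sample) - expected)
    return total / reps

results = {n: mean_deviation_from_theory(n) for n in (10, 100, 1000)}
for n, dev in results.items():
    print(n, round(dev, 3))
```

The average deviation shrinks as n grows, even though every sample comes from a perfectly normal population.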

When a distribution is normal, about 99.7% of the data points will fall within 3 standard deviations of the mean (about 68% within 1 SD and 95% within 2 SD). Standard deviation is roughly the typical distance of scores from the mean; formally, it is the square root of the variance.

[Figure: normal distribution curve]
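The normal-curve percentages can be computed directly rather than memorized: for a standard normal variable Z, P(|Z| < k) = erf(k/√2). The sketch also checks, on a small made-up set of scores, that the standard deviation is the square root of the variance, and compares it with the mean absolute deviation (the literal average distance from the mean), which turns out to be a different number.

```python
import math
import statistics

def share_within(k):
    """Share of a normal distribution lying within k SDs of its mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example data
variance = statistics.pvariance(scores)  # population variance: 4.0
sd = statistics.pstdev(scores)           # population SD: 2.0
mean = statistics.fmean(scores)
mad = statistics.fmean([abs(v - mean) for v in scores])  # mean absolute deviation: 1.5
print(math.isclose(sd, math.sqrt(variance)), sd, mad)
```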

Friday, February 15, 2013

fundamental: alpha and beta

Fundamental stuff to know...

In research, statistical tests don’t prove that an alternative hypothesis is true. Instead, we use statistical analysis to decide whether the evidence supports rejecting the null hypothesis or whether we fail to reject it.

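The p-value is the probability of getting results at least as extreme as those observed if the null hypothesis were true, so a smaller p-value means stronger evidence against the null hypothesis. Alpha (α) is the cutoff for the p-value. Alpha = .05 in my research, which is standard for most dissertations.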

· When p-value < α (in my case, α = .05), reject the null hypothesis.
· When p-value ≥ α, we do not have enough evidence to reject the null hypothesis.
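The decision rule above is mechanical enough to write down. This sketch also turns a z statistic into a two-tailed p-value with Python's `statistics.NormalDist`; the z values are made up, and using a z statistic here is an assumption for illustration.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value for a z statistic under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Reject H0 only when p < alpha; p equal to alpha is not enough."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

for z in (1.5, 2.5):
    p = two_tailed_p(z)
    print(z, round(p, 4), decide(p))
# 1.5 0.1336 fail to reject H0
# 2.5 0.0124 reject H0
```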

When you draw conclusions about a population based on a sample, there is an opportunity for error. You hope your sample represents the population, but our world is not perfect...dammit.

· Type I error = α = the probability of rejecting a null hypothesis that is true. You see an effect when there really isn’t one. It’s a false positive.
· Type II error = β = the probability of failing to reject a null hypothesis that is false (sometimes loosely called "accepting" it). The effect is there, but we don’t find it. It’s a false negative.
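Both error types can be seen by simulating many studies. This sketch assumes a one-sample z-test of H0: mean = 0 with a known SD of 1; the sample size, the true effect of 0.3 SD, and the seed are illustrative choices, not anything from my study.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def z_test_p(sample, mu0, sigma):
    """Two-tailed p-value for H0: mean == mu0 when sigma is known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def rejection_rate(true_mean, n=25, alpha=0.05, reps=4000, seed=2):
    """Share of simulated studies that reject H0: mean == 0 (sigma = 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1) for _ in range(n)]
        if z_test_p(sample, mu0=0, sigma=1) < alpha:
            rejections += 1
    return rejections / reps

# H0 true: every rejection is a Type I error, so this rate should be near alpha
type_1 = rejection_rate(true_mean=0.0)
# H0 false: every non-rejection is a Type II error
type_2 = 1 - rejection_rate(true_mean=0.3)
print(round(type_1, 3), round(type_2, 3))
```

With these settings the Type I rate lands near .05, while the Type II rate is large (roughly two studies in three miss the real effect), which is the problem the next paragraph addresses.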

You can reduce the likelihood of committing a Type I error by making alpha smaller, but for a fixed sample size this increases the likelihood of committing a Type II error. You can reduce the likelihood of committing a Type II error by increasing the sample size.
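Both levers in the paragraph above show up in the standard formula for the Type II error rate of a two-tailed one-sample z-test. The effect size (0.3 SD), the sample sizes, and the z-test itself are illustrative assumptions.

```python
from statistics import NormalDist

Z = NormalDist()

def beta_two_tailed_z(effect, n, alpha):
    """Type II error rate of a two-tailed one-sample z-test.

    effect is the true mean difference in SD units (delta / sigma).
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5
    # Power = chance the z statistic lands in either rejection region
    power = (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)
    return 1 - power

for alpha in (0.05, 0.01):
    for n in (25, 100):
        print(alpha, n, round(beta_two_tailed_z(0.3, n, alpha), 3))
```

Reading the output: moving from α = .05 to α = .01 raises β at each sample size, and moving from n = 25 to n = 100 lowers β at each alpha.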

The power of a hypothesis test = 1 − β = the probability of rejecting a null hypothesis that is false. Power represents the likelihood of detecting an effect that is real.
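Because power = 1 − β, you can turn the formula around and ask how large a sample is needed for a target power. This sketch again assumes a two-tailed one-sample z-test, uses the common normal-approximation formula n = ((z₁₋α/₂ + z_power) / effect)², and takes 80% power as a target only because it is a widely used convention; the effect size of 0.5 SD is hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def power_two_tailed_z(effect, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z-test; effect in SD units."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n)
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

def n_for_power(effect, power=0.80, alpha=0.05):
    """Approximate sample size for the target power (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    # Ignores the tiny chance of rejecting in the opposite tail
    return math.ceil(((z_crit + z_power) / effect) ** 2)

n = n_for_power(0.5)
print(n, round(power_two_tailed_z(0.5, n), 3))  # 32 0.807
```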