Choosing a suitable test

The test you use depends very much on what sort of data you are able to collect, and what you want to investigate.

There are two main choices: parametric tests and nonparametric tests. Parametric tests usually rely on the error in your data having a normal distribution, whereas nonparametric tests don't make this assumption. Parametric tests are usually more powerful than nonparametric ones (they are less likely to accept the null hypothesis when it is in fact false), but they are often wrongly applied to data that is not normally distributed.

If you are collecting count data and want to know if they fit some expected results (e.g. the number of corn kernels of a given colour from Mendelian genetic ratios), you will want the χ² test. Likewise, you'll want this test, or its near relative the gamma test, if you want to know whether a distribution of count data (e.g. the number of insects found on certain sorts of leaf) is nonrandom.
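As a rough illustration of the goodness-of-fit idea, the sketch below computes the chi-squared statistic by hand for a 9:3:3:1 ratio. The kernel counts are made up for the example, and the function and variable names are illustrative only. The statistic is compared with 7.815, the tabulated 5% critical value for 3 degrees of freedom, rather than converted to an exact p-value.

```python
# Chi-squared goodness-of-fit statistic, standard library only.
# The counts below are invented for illustration; they are not real data.

def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical kernel counts in four colour classes, tested against 9:3:3:1.
observed = [90, 30, 28, 12]
ratio = [9, 3, 3, 1]
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi2 = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of categories minus one

# Tabulated 5% critical value of chi-squared for 3 degrees of freedom.
CRITICAL_5_PERCENT_DF3 = 7.815
fits_expected_ratio = chi2 < CRITICAL_5_PERCENT_DF3

print(f"chi-squared = {chi2:.3f}, df = {degrees_of_freedom}")
print("consistent with 9:3:3:1" if fits_expected_ratio else "departs from 9:3:3:1")
```

A statistic below the critical value means the counts give no reason to reject the expected ratio; it does not prove the ratio is correct.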

If you are trying to show a correlation between two data sets (show that when one variable is changed, another changes in response), you can choose between a parametric test (regression) and a nonparametric test (Spearman rank correlation). The first of these relies on your data being in a straight line (or other well defined shape); the second doesn't care about the exact shape of the curve as long as when one variable increases, so does the other. The one you choose depends on how sure you are that your data is in a nice straight line. Plot it, find out, and choose the appropriate test.
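To make the difference concrete, here is a minimal sketch that compares a straight-line measure with a rank-based one on data that rise steadily but not in a straight line. Pearson's r is used here as a stand-in for how well a straight line fits (it is not a full regression), and the dose and response values are invented for the example.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson's r: strength of the straight-line relationship."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank from 1 upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: the response always increases with dose, but as a curve.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
response = [d ** 3 for d in dose]

linear_r = pearson(dose, response)
rank_r = spearman(dose, response)
print(f"Pearson r = {linear_r:.3f}, Spearman rho = {rank_r:.3f}")
```

The rank-based coefficient reaches 1 because the ordering is perfectly preserved, while the straight-line coefficient falls short of 1 because the points lie on a curve.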

If you are comparing data sets to see if they are different, you have several choices. If you are comparing just two means, you can use a t test if you are sure that the data is normally distributed (check it in a frequency table, or see if the median and mean are similar), or a Mann-Whitney U test if you are less certain (e.g. if your data is severely skewed). If you are comparing more than two means, analysis of variance (ANOVA) is the way to go, but this has the same assumptions as the t test, so be careful when applying it.
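The mean-versus-median check and the Mann-Whitney U statistic are both simple enough to sketch by hand. The example below is a sketch only, with made-up measurements and illustrative names; it stops at the U statistic, which you would then compare against a table of critical values for your sample sizes.

```python
from statistics import mean, median

def mann_whitney_u(a, b):
    """U for sample a: pairs where a's value beats b's, ties counting one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical measurements; the control group has one extreme value.
control = [12, 14, 15, 15, 17, 90]
treated = [18, 19, 21, 22, 24, 25]

# A large gap between mean and median hints that the data are skewed.
skew_hint = mean(control) - median(control)

u_control = mann_whitney_u(control, treated)
u_treated = mann_whitney_u(treated, control)
u_statistic = min(u_control, u_treated)  # the smaller U is the one looked up

print(f"mean - median (control) = {skew_hint:.1f}")
print(f"U = {u_statistic}")
```

Because U depends only on which values are larger, the single extreme value of 90 counts for no more than any other value that beats the whole treated group.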

The t test's assumptions are:

  1. The samples were randomly collected
  2. The error is normally distributed
  3. The two sample groups are independent of one another
  4. The variances (standard deviations) of the two sample groups are the same

There is nothing you can do about the first two assumptions, except use a different test. However, if you suspect that you are breaking the other two, you can use a slightly different sort of t test. There are several sorts of t test, not covered in detail here, but if you get confused by the plethora of them in Excel, here is a quick rundown:

  • One tailed vs two tailed. Use a one tailed test if you want to see if one mean is bigger than the other. Use a two tailed test if you want to see if one mean is different from the other (either bigger or smaller).
  • Paired vs unpaired. Use an unpaired test if your two means come from different populations. Use a paired test if your two means come from the same population treated twice. Hence, it's paired if you are comparing ten students' reactions to strong coffee to the same students' reaction to decaffeinated coffee, but it's unpaired if you are comparing the reaction of ten students to decaf, and a different ten's reaction to an espresso. Use the paired test to deal with non-independent sample groups (assumption 3).
  • Equal variance vs unequal variance. Use an unequal variance test if the standard deviations of your two sample groups appear to be very different (assumption 4).
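The variants in the list above differ mainly in how the standard error and the degrees of freedom are calculated. The sketch below computes the three t statistics by hand using only the standard library. The coffee scores are invented, the function names are illustrative, and the unequal variance version uses the Welch approximation for the degrees of freedom, which is an assumption about which unequal variance test is meant. Turning t into a p-value still needs a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Unpaired, equal variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Unpaired, unequal variance t statistic with Welch's approximate df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def paired_t(a, b):
    """Paired t statistic: a one-sample test on the per-subject differences."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / sqrt(variance(diffs) / len(diffs))
    return t, len(diffs) - 1

# Hypothetical reaction scores for the same ten students on two drinks.
espresso = [72, 80, 77, 85, 90, 69, 83, 76, 88, 81]
decaf = [70, 77, 76, 82, 86, 68, 80, 75, 84, 79]

t_pooled, df_pooled = pooled_t(espresso, decaf)
t_welch, df_welch = welch_t(espresso, decaf)
t_paired, df_paired = paired_t(espresso, decaf)

print(f"unpaired, equal variance:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"unpaired, unequal variance: t = {t_welch:.2f}, df = {df_welch:.1f}")
print(f"paired:                     t = {t_paired:.2f}, df = {df_paired}")
```

With these numbers the paired statistic is much larger than the unpaired ones, because pairing removes the student-to-student variation that otherwise swamps the small, consistent difference between drinks. The one tailed versus two tailed choice does not change t itself, only which tail or tails of the t distribution you look up.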