Lies And Statistics

Applied common sense

Science is not a religion, nor a way of proving things, nor a way of revealing ultimate truth, if such a thing exists. Science is working out exactly how certain you are that the world works the way you think it does. Science is about proposing models of reality, testing them, and refining them whenever new data contradicts the theory. With each improvement, your confidence that the model is correct under all circumstances increases; and although you'll never be able to say it's 100% bullet-proof correct, statistics does allow you to put a number on your certainty: "Theory agrees with reality 95% of the time" or "I am 95% confident that this effect is not due to random chance".

Consider throwing a ball into the air. Logically, there are two possibilities: either the ball will fall back down again, or it will not. If you throw the ball up a hundred times and find that every time, it comes back down again, your model of gravity might well be "What goes up must come down". However, if you throw the ball up using a Saturn V rocket: catastrophe! Model and reality no longer agree at all, and you will have to modify your theory to account for this new data. By throwing balls into the air at various speeds and in various directions, and observing the motion of the ball, you may well eventually come up with a model of gravity that bears a stunning similarity to that of Newton. The model and reality agree extremely well, and your confidence in the accuracy of its predictions would be extremely high, ever closer to 100%.

However, there is always a chance that the model is incorrect, and will fail to match reality some tiny fraction of the time, for two reasons:

  1. An unusually strong updraft of wind might accelerate the ball into orbit, or an astoundingly unlikely quantum effect might turn the ball into a penguin. The simple Newtonian model doesn't include a model for the effects of wind, or quantum jitters, or Einsteinian spacetime. Incorporating these features into our model will improve its predictive power (although the increased power might make it less practically useful if the maths gets gnarly).
  2. All the times the ball went up and then did what you predicted might really have been a gigantic fluke: a very unlikely fluke, but just an enormous piece of luck nonetheless. This is exactly what statistics allows us to deal with. If you do something a trillion times, and always get the right answer (according to your model), it shows that either your model is excellent, or you are incredibly lucky. You can never tell. All you can say is that your model explains the observations within a certain degree of probability. This probability represents how sure you are that the correct predictions weren't all a massive fluke, and that the model is genuinely useful.
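To put a rough number on the 'gigantic fluke' idea, here is a minimal Python sketch. It rests on an assumption that is not part of the argument above: that without any model of gravity, each throw would be a 50/50 coin-flip between coming down and not coming down, and that throws are independent. The function name is made up for illustration.

```python
def fluke_probability(p_success_by_chance: float, n_trials: int) -> float:
    """Chance that n independent trials ALL come out 'right' purely by luck.

    p_success_by_chance is an assumed per-trial probability of a lucky
    success (0.5 here is an illustrative guess, not a measured value).
    """
    return p_success_by_chance ** n_trials


# How quickly does 'it was all luck' become implausible?
for n in (1, 10, 100):
    print(n, fluke_probability(0.5, n))
```

Under that assumption, a hundred balls in a row coming back down by luck alone has a probability of roughly 8e-31. It never reaches zero, which is the point: you can only say how unlikely the fluke is, not that it is impossible.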

Science never proves anything, but its models of the world can be repeatedly refined, getting arbitrarily closer to a model that has no exceptions. For people who like to think in black and white, science may be rather unsatisfactory. Science can never prove anything absolutely, although it can 'prove' things beyond reasonable doubt. The only problem is convincing people their doubts are unreasonable! Science is not about black and white, it's about working out exactly which shade of grey you are dealing with.

Fair tests

Most British school kids these days will have been indoctrinated with the idea of the 'fair test' by the time they are seven. However, adults seem somewhat hazier, and herein lies a world of stupidity. Not understanding the basic idea of a fair test is the root of most pseudoscience, shoddy experimentation and other such displays of public stupidity. The idea is that if we want to do a scientific experiment investigating something, we should try to think of all the possible factors that might influence our chosen something, and then control all but the ones we want to investigate. For example, if we want to know the effect of water temperature on the solubility of salt, we should ensure all the other things that might influence solubility (like the volume of water, mass of salt, atmospheric pressure, purity of the water, etc.) are held consistently the same for all our experiments. This means that when we find out that hot water does dissolve more salt, we can be fairly sure it really is down to the temperature, and not down to something else that wasn't properly controlled.

We can take the idea of a fair test and push it a little further. As well as making sure all possible interfering factors are taken into account, and that we are systematic and consistent about our experimentation, we should also be aware of the following:

Significance

If you are comparing two samples, e.g. the sugar content of urine from diabetics and non-diabetics, you should be aware that not only can you work out a mean (average) from your data, but you should also calculate a standard deviation. The reason for doing this is that you can only be sure that two means are different if the data that went into them isn't too scattered. A mean of 2 may look smaller than a mean of 4, but you can only tell if they are significantly different if you also calculate their standard deviations. Big standard deviations imply the data from your two sample groups is very spread out, and the two sample populations overlap a lot. You can use statistical tests like the t-test to see if things are significantly different.
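As a rough illustration of why scatter matters, the sketch below computes a t statistic for two pairs of samples. The numbers are invented so that both pairs have means of 2 and 4; only the spread differs. It uses Welch's version of the t-test (which does not assume equal variances), a choice made here for the example rather than anything prescribed above.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(
        stdev(sample_a) ** 2 / len(sample_a)
        + stdev(sample_b) ** 2 / len(sample_b)
    )
    return (mean(sample_b) - mean(sample_a)) / standard_error


# Invented data: same means (2 and 4), very different scatter.
tight_a, tight_b = [1.9, 2.0, 2.1, 2.0, 2.0], [3.9, 4.0, 4.1, 4.0, 4.0]
loose_a, loose_b = [0.0, 4.0, 1.0, 5.0, 0.0], [1.0, 8.0, 2.0, 7.0, 2.0]

print("tight scatter:", welch_t(tight_a, tight_b))
print("loose scatter:", welch_t(loose_a, loose_b))
```

The tightly clustered pair gives a very large t (the means are many standard errors apart), while the scattered pair gives a t of about 1, which is nowhere near convincing. Turning t into a probability needs the t distribution; a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` does that step for you.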

Sample size

The bigger a sample size you can reliably take, the better. It's difficult to say what a 'big enough' sample size is, but for well controlled experiments (like ones you carry out in a lab), five replicates is often enough. For experiments you can't control very easily (like ecological experiments), thirty is nearer the mark.
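One way to see why bigger samples help is the standard error of the mean, which is the standard deviation divided by the square root of the sample size. This formula is standard statistics rather than something stated above, and the standard deviation of 10 used below is an arbitrary illustrative value.

```python
from math import sqrt


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / sqrt(n)


# Same scatter (sd = 10, an arbitrary value), increasing sample size.
for n in (5, 30, 120):
    print(n, round(standard_error(10.0, n), 2))
```

The uncertainty in the mean shrinks only with the square root of the sample size, so you need four times as many replicates to halve it. Noisy, poorly controlled experiments have a bigger standard deviation to start with, which is why they need more replicates to reach the same precision.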

Randomisation

It's all very well to have a huge sample, but it's worthless if you don't randomise. Say you wanted to see whether caffeine increases heart rate. You have twenty people to experiment on, so you decide to give ten of them coffee, and ten of them decaf. If you don't properly randomise your experiment, you run the risk of giving the caffeinated coffee only to boys, tall people, or those people who are regular coffee drinkers, purely by accident. These three groups are likely to be more resistant to caffeine than girls, short people and coffee-teetotallers. This will severely interfere with your interpretation of the results, particularly if the sample size is very small. One way to randomise things is to get everyone to pull a ticket marked 'A' or 'B' out of a hat, so that everyone has an equal chance of ending up in the coffee or decaf group.
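The ticket-in-a-hat procedure can be sketched in a few lines of Python. The volunteer names and the `assign_groups` helper are hypothetical; the `seed` argument is only there so the example can be repeated exactly.

```python
import random


def assign_groups(people, seed=None):
    """Shuffle the volunteers, then split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(people)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)        # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty hypothetical volunteers, ten per group.
volunteers = [f"person_{i}" for i in range(1, 21)]
coffee, decaf = assign_groups(volunteers, seed=1)
print("coffee:", coffee)
print("decaf: ", decaf)
```

Because the shuffle ignores sex, height and coffee habits entirely, none of those factors can systematically end up in one group, although with only twenty people an unlucky split is still possible by chance.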

Something else you can do is 'stratified' sampling. This is a sort of non-random randomisation, useful for small sample sizes. What you do is try to pair people off, e.g. ensure that there are an equal number of girls and boys, coffee-drinkers and non-drinkers, big and small, in each group. In ecology, you can divide a big field into nine squares, then take random samples from four places in each square. This makes sure your rather small sample of 36 doesn't end up clumped into one corner of the big field, which it might do if you used real randomisation.
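The field example can be sketched as follows. The field size of 90 units and the function name are assumptions made for illustration; the nine squares and four samples per square come from the description above.

```python
import random


def stratified_points(field_size=90.0, squares_per_side=3, per_square=4, seed=None):
    """Pick `per_square` random points inside each square of a grid laid over the field."""
    rng = random.Random(seed)
    side = field_size / squares_per_side
    points = []
    for row in range(squares_per_side):
        for col in range(squares_per_side):
            for _ in range(per_square):
                # Random position within this square only.
                x = (col + rng.random()) * side
                y = (row + rng.random()) * side
                points.append((x, y))
    return points


points = stratified_points(seed=42)
print(len(points), "sampling points")
```

Each point is still random within its own square, but no square can be missed and no corner of the field can hog the whole sample.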

Placebo control

Say you want to know if a new drug is effective. It is a Proven Fact™ that if you find someone with the disease you want to treat and give them completely useless sugar pills whilst telling them they are the latest treatment for their condition, there is a good chance they will feel better anyway (particularly if you call them homeopathic). This is called the placebo effect. In the previous coffee example, we used a placebo, the decaffeinated coffee, which is designed to look and feel just like the real drug (coffee), but doesn't actually contain the drug at all. A placebo is a special sort of control that accounts for the psychological effect of an experiment, and it is an essential part of any experiment that is done on human beings or other such easily duped animals.

Double blind

A single-blind experiment is placebo controlled: the experimentee doesn't know whether they are getting the real drug or the sugar pills. A double-blind experiment is one where the experimenter doesn't know either. For the coffee experiment, the experimenter can get a friend to put the coffee and decaf in plastic bags labelled A and B whilst she is in another room. The friend can seal a crib saying "A is the decaf" in an envelope and stick it to the ceiling, only to be retrieved and opened once the analysis of the experiment is complete. That way, the experimenter doesn't know which coffee is which. The idea of this is that it stops the experimenter's prejudices from clouding the analysis. If she is expecting caffeine to make the pulse rate increase, then it is human nature to expect her to dismiss inconvenient results as 'aberrant' just because they don't fit in with what she expected to happen.

Tiny effects

You should always criticise your conclusions. One thing to be wary of is significant but tiny effects. If your two samples are significantly different, you should still be wary. Say your caffeinated coffee drinkers had a mean increase of 4.0 BPM over the resting rate, and your decaf drinkers a mean increase of 3.5 BPM, and that this difference was significant. If you had to put lots of sugar into the coffee to make it palatable, and you think this may also increase the pulse rate, then you should think twice before accepting that the difference really is significant: it's just as likely that the difference was due to slight and accidental differences in how much sugar you added to the coffees. Beware of small differences between samples where there is a larger underlying difference that you haven't accounted for fully.

Picking holes

Armed with these ideas, see if you can pick holes in these experiments:

Antiviral drug trial

We took a sample of twenty people with cold sores. To a random half, we gave the drug, to the other half we gave no drug. After three weeks of taking the drug, the drugged group had significantly fewer sores. We concluded the drug was an effective treatment for cold sores.

Chemotherapy drug trial

We asked for volunteers with leukaemia to take part in a trial of a chemotherapy drug. To the first fifty volunteers who applied, we gave the drug, to the next fifty, we gave a placebo. The data showed no significant difference. However, we noted some aberrant results in the placebo group, and after removal and reanalysis, we found the drugged group were more likely to be in remission. We concluded the drug was an effective chemotherapeutic.

GM potato feeding trial on rats

We fed ten rats normal potatoes, ten rats potatoes genetically modified to produce an insecticidal toxin, and ten rats potatoes sprinkled with an equivalent amount of toxin. We looked at the effect on the rats' weight after two weeks on this diet. The weight losses were 50% (normal), 55% (sprinkled) and 60% (GM). These differences were significant. We concluded the GM process somehow made the toxin more toxic to the rats.

Correlation and causality

In addition to not understanding the basics (and not so basics) of fair tests, there's another common pitfall that many people fall for. When provided with a correlation, people immediately assume one thing causes the other. For example:

The above are all cases of correlation: when one variable is high, so is the other (positive correlation: more smoke = more cancer), or when one is high the other is low (negative correlation: more exercise = less hypertension). However, these cases may give you a false impression that whenever two things are correlated, the second is caused by the first:

It may be that all the people who die of lung cancer were also asbestos workers in the 1960s. Perhaps the tobacco companies gave away free cigarettes at asbestos factories during the 1960s to encourage the workers to take up the habit. We would then end up with a completely spurious correlation between smoking and lung cancer.

For any correlation (X is correlated with Y), we have three possibilities: X causes Y, Y causes X, or some third factor Z causes both X and Y.

You might think these alternative explanations are a little unlikely. For smoking this is true, but we only know it is true because huge, randomised, double-blind, controlled experiments have been done on animals to show that when we expose them to smoke, they are more likely to develop cancer.

You should always be careful when you interpret correlations: the following correlations are true, but the 'obvious' link (X causes Y) is nonsense.

Picking even bigger holes

See if you can come up with alternatives to the following, and classify them as X causes Y, Y causes X or Z causes X and Y:

Incidentally, in the first two cases, the largest and best-controlled experiments don't even show a correlation, let alone a causal link!

So how can we see if a correlation really is a simple X causes Y or something more complicated?

The best way to do this is to do controlled experiments. For the smoking/cancer link, we would need to take large, random samples of human beings, and force half of them to smoke, and the other half not to smoke. We would keep them captive so they couldn't do the wrong thing, and so we could control confounding factors (like the possibility that the drugs in cigarettes make humans more likely to become asbestos workers, nuclear technicians or X-ray radiographers). We would then see how many in each group developed cancer.

This is rather unethical, and you probably wouldn't get the grant money to perform such an experiment. The alternatives to this experiment include:

Avoiding the obvious pitfalls

In the last twenty years, the number of people in industrialised countries suffering from asthma has risen. Furthermore, in the last twenty years, the following could be said:

These may be relatively indisputable facts, but so are the following:

So what? Well, the thing is, just because something is 'obvious' doesn't mean it's the only possible explanation, or even the most likely one, and certainly not necessarily the most correct one. To a certain prejudice, children spending all their time watching television, not exercising, is the obvious cause of asthma. To another, it's vaccines, to another it's thick carpets, to another, it's surely the demise of the outside toilet. The problem with 'obvious' things is that quite often they are wrong, or at least worthy of a good hard look first. The problem is spurious correlation. If someone shows you a graph like this:

Obvious correlation between asthma and pollution.

(Units, etc. are arbitrary), it seems obvious that as the concentration of ozone in town air increases, as it may well have done over the last twenty years, so too does the rate of asthma. 'Obviously' the ozone is causing the asthma. Now this is a very good working hypothesis, and a scientifically valid guess at what is happening, but correlation does not necessarily imply cause. I could equally well show you this graph:

Obvious correlation between asthma and the speed of computer processors.

(Units and numbers here are probably largely accurate!), and it is equally 'obvious' that the average speed of computer processors is the cause of asthma.
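To see how easily two unrelated quantities can look tightly linked, here is a small sketch with entirely invented numbers: an 'asthma rate' and a 'processor speed' that both simply drift upwards over twenty years, each with its own small wobble. The Pearson correlation coefficient is computed by hand so the example needs nothing beyond the standard library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


years = list(range(20))
# Invented data: both series rise with time, with unrelated wobbles on top.
asthma_rate = [5.0 + 0.3 * t + (0.2 if t % 2 == 0 else -0.2) for t in years]
processor_speed = [100.0 + 40.0 * t + (30.0 if t % 3 == 0 else -15.0) for t in years]

r = pearson_r(asthma_rate, processor_speed)
print("correlation:", r)
```

The correlation comes out very close to 1, even though neither series was built from the other. The only thing they share is the passage of time, which plays the part of the hidden third factor behind both.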

It appears 'obvious' to many that pollution is to blame for asthma: air in the industrialised world is full of benzene and aldehydes, sulfur dioxide and ozone, and this damages children's lungs. Unfortunately, when you look at the rates of asthma in heavily polluted countries (particularly the Eastern Bloc, until recently), the asthma rates were not noticeably different to those in countries with lovely clean air. That's not to say that dirty air is a good thing, it does cause damage (not least to lichens, but people seem to be more attached to boring human children than fantastic fungal symbioses), but as an 'obvious' cause of asthma, it begins to look a little shaky.

That a single effect may have several causes, each contributing to a different degree in different situations is something some people would rather not have to deal with. One big, evil bogey man like pollution (which seems to have very little to do with most allergies), is so much more psychologically appealing than a nebulous mix of strange-sounding causes that may include such things as 'excessive hygiene'.

To take another famous example, the rate of leukaemia in children around the nuclear power station in Sellafield is unusually high. It is quite 'obvious' that the nuclear power station is to blame, as we all know about the dangers of radiation: exposure does lead to cancer. However, finding evidence for a link between radiation exposure and rates of cancer in the children of (male) workers has proved extremely tricky.

Here is another way of looking at the data: perhaps the number of people working at the station is indeed the reason that there are more childhood leukaemias, but the reason is not the radiation at all, but that the workers come from all over the country. Why should that be interesting? If you've ever worked in a school or university, which brings lots of varied humans into close proximity, you'll know all about the cold (or worse, meningitis C) everyone goes down with in about October. High human immigration from a wide area means high immigration of an exciting variety of infectious diseases too, possibly including retroviruses that cause leukaemia. Not such an obvious idea as the radiation causing the cancers, and certainly not the public health scandal everyone would like a juicy part of, but currently a runner among the explanations of this cancer cluster. Maybe it's not the sole cause, or even a cause at all: maybe it really is some unfound radiation problem, or some much more complex interaction, but blindly accepting the 'obvious' as certainly true is halfway down the path to stupidity.

Media scare stories are full of poorly-designed experiments on tiny sample sizes, vague correlations being trotted out as if they were indisputable causes, and - more often than not - press releases from parties who are neither disinterested nor peer-reviewed. If you want to know the real science (or complete lack thereof) in an item of newspaper scare-mongering, try searching PubMed, or at the very least see if you can find out whether there is any genuine, published, peer-reviewed original research to which the news story refers. If you don't, you'll never know whether the science behind the story is any good, because you can lay money on the chances that the journalist neither bothered nor knew!