Correlation does not equal causation PDF

Title Correlation does not equal causation
Course Foundations of statistics
Institution Swinburne University of Technology
Pages 3

Summary

Correlation does not equal causation. Useful reminder to keep around for future reference. Useful for exams and assignments...


Description

7.5 Correlation Doesn’t Always Mean Causation

When we find an association between two variables in an observational study, we can’t be sure why the variables are related. It might be because changes in one of the variables are causing changes in the other. On the other hand, it might be the influence of some nuisance variable – something other than the two variables we’ve explored.

One variable is causing changes in the other (but which variable is the cause and which the effect?)

Suppose we are interested in the relationship between weight and the amount of exercise people do. Suppose we measured the weight of a random sample of adults and monitored how much exercise they performed in a week. We might find a negative relationship between amount of exercise and weight – people who do more exercise tend to weigh less. This would be an observational study; we haven’t manipulated how much exercise the participants performed, we’ve just observed them in their normal routine.

It’s possible that amount of exercise has a direct effect on weight – because someone is exercising more, they weigh less. But we can’t be sure that it’s the variation in exercise which is causing the variation in weight. It could be the other way around. Maybe people who are heavier have less energy, and that’s why they do less exercise. Simply observing that two variables are associated with each other doesn’t tell us which one (if either) is causing the changes.

A study was done on advertising and occupancy rates in motels. Proprietors were asked how much they spent on advertising last week, and what their occupancy rate was last week.[9] Researchers were initially surprised to find a negative relationship between advertising expenditure and occupancy rates. What is happening here? Surely increasing spending on advertising would result in an increase in occupancy rates? Actually, this has the cause and the effect the wrong way around. The proprietors with higher occupancy rates last week didn’t bother to spend any money on advertising, while those with lower occupancy tended to spend more on advertising in an attempt to attract more customers. It was the occupancy rate which was causing the variation in advertising expenditure, rather than the other way around.

What about the following fairly trivial example? There is a positive correlation between sales of cough lollies and sales of firewood. What could explain this association? There are three possibilities here:

• Smoke from fires makes people cough, so they buy cough lollies – not very likely.
• Sucking on cough lollies makes people want to sit in front of a nice cosy fire – even less likely.
• There is a third variable which is responsible for both the variation in the sales of firewood and the variation in the sales of cough lollies.

The last alternative seems by far the most likely. The season is likely to account for variation in both sets of sales figures. In winter people tend to buy more firewood and there also tend to be more coughs and colds, so they tend to buy more cough lollies. The correlation between sales of firewood and sales of cough lollies is spurious – that is, it just reflects some other factor (the season).

Spurious relationships (both variables are affected by some nuisance variable)
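The firewood and cough-lolly example can be sketched as a small simulation. This is only an illustration under made-up assumptions: the sales figures, the noise levels, and the helper names `pearson_r` and `simulate_weekly_sales` are all hypothetical and do not come from any real data. Season is the only thing that affects both variables; within a season the two sales figures are generated independently.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_weekly_sales(n_weeks=400, seed=1):
    """Hypothetical weekly sales: season shifts both baselines, nothing else links them."""
    rng = random.Random(seed)
    weeks = []
    for _ in range(n_weeks):
        winter = rng.random() < 0.5
        # Assumed baselines (invented numbers); the noise terms are independent.
        firewood = (80 if winter else 20) + rng.gauss(0, 10)
        lollies = (60 if winter else 30) + rng.gauss(0, 10)
        weeks.append((winter, firewood, lollies))
    return weeks

weeks = simulate_weekly_sales()
winter_weeks = [w for w in weeks if w[0]]
other_weeks = [w for w in weeks if not w[0]]

# Correlation over all weeks, then separately within each season.
r_all = pearson_r([w[1] for w in weeks], [w[2] for w in weeks])
r_winter = pearson_r([w[1] for w in winter_weeks], [w[2] for w in winter_weeks])
r_other = pearson_r([w[1] for w in other_weeks], [w[2] for w in other_weeks])

print(f"all weeks:    r = {r_all:.2f}")
print(f"winter only:  r = {r_winter:.2f}")
print(f"other weeks:  r = {r_other:.2f}")
```

Under these assumptions the correlation across all weeks should come out strongly positive, while the correlation within each season should sit close to zero. Holding the nuisance variable fixed makes the apparent relationship disappear, which is what we would expect of a spurious correlation.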

[9] See Utts, J. (2005). Seeing through Statistics, 3rd ed. Brooks/Cole, Belmont.

There are a few different ways in which nuisance variables can explain a relationship. Firstly, the nuisance variable might be causing changes in both the independent and the dependent variables, as in the firewood example.

Here’s another example of this type of effect. A study showed a correlation between the number of managers in business organisations and the amount of fraud. Can we conclude that managers are causing the fraud? Not necessarily. If we reduce the number of managers, will this lead to a reduction in fraud? Again, not necessarily; in fact, it is not at all likely. Perhaps the cause and effect go the other way around – maybe an increase in fraud leads to an increase in the number of managerial staff. Actually, a third explanation is far more feasible. The number of managers in the organisation and the amount of fraud both reflect another underlying variable – the size of the organisation. Larger organisations tend to have more managers and they also tend to have more fraud. The correlation between number of managers and amount of fraud is totally spurious.

A nuisance variable which is associated with the independent variable is causing the changes in the dependent variable.

Early studies on factors affecting the risk of lung cancer identified a strong association between number of cups of coffee and incidence of lung cancer. People who drank more coffee had a much greater risk of developing lung cancer. Initially researchers concluded that something in the coffee was causing lung cancer, but further research revealed a confounding variable. There was an association between drinking coffee and smoking cigarettes. The sort of people who tended to drink a lot of coffee also tended to smoke, and as more recent studies have confirmed, it was the cigarette smoking which led to the higher risk of lung cancer.

This highlights the need to be very cautious when interpreting observational studies. The initial study showed an association between amount of coffee consumed and risk of lung cancer. If researchers had suggested that reducing the amount of coffee would substantially reduce the risk of lung cancer, this would have been a very misleading recommendation.
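The coffee and smoking example can also be sketched as a simulation in which we compare coffee drinkers separately among smokers and non-smokers. Every number here is invented for illustration and is not taken from any real study: the proportions of smokers and heavy coffee drinkers, the cancer risks, and the helper names `simulate_people` and `cancer_rate` are all assumptions. In this sketch coffee has no effect on cancer at all; only smoking does.

```python
import random

def simulate_people(n=20000, seed=2):
    """Hypothetical population of (smoker, heavy_coffee, cancer) records."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Assumption: smokers are more likely to be heavy coffee drinkers.
        heavy_coffee = rng.random() < (0.7 if smoker else 0.3)
        # Assumption: cancer risk depends on smoking only, never on coffee.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        people.append((smoker, heavy_coffee, cancer))
    return people

def cancer_rate(people, heavy_coffee, smoker=None):
    """Cancer rate in one coffee group, optionally restricted to one smoking group."""
    group = [p for p in people
             if p[1] == heavy_coffee and (smoker is None or p[0] == smoker)]
    return sum(p[2] for p in group) / len(group)

people = simulate_people()

# Crude comparison: ignores smoking altogether.
crude_heavy = cancer_rate(people, heavy_coffee=True)
crude_light = cancer_rate(people, heavy_coffee=False)

# Stratified comparison: heavy vs light coffee drinkers within each smoking group.
gap_smokers = (cancer_rate(people, True, smoker=True)
               - cancer_rate(people, False, smoker=True))
gap_nonsmokers = (cancer_rate(people, True, smoker=False)
                  - cancer_rate(people, False, smoker=False))

print(f"crude rates: heavy coffee {crude_heavy:.3f}, light coffee {crude_light:.3f}")
print(f"gap among smokers:     {gap_smokers:+.3f}")
print(f"gap among non-smokers: {gap_nonsmokers:+.3f}")
```

With these assumed settings the crude comparison should show a clearly higher cancer rate among heavy coffee drinkers, while the gaps within each smoking group should be close to zero. In real observational data we rarely know all the confounders in advance, so a comparison like this only guards against the nuisance variables we have thought to measure.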

The independent variable is acting indirectly through some mediating variable.

Suppose we found a correlation between seniority in an organisation and work-related stress. More senior staff tend to report more work-related stress. We might be interested in knowing why the more senior staff feel more stressed. Perhaps we could identify some mediating variable that would help explain the relationship. For example, maybe more senior staff operate under more time pressure and that leads to more stress. Or maybe more senior staff have more financial responsibility and that leads to more stress. Exploring why seniority and stress are related could lead to useful recommendations for reducing stress, especially amongst the more senior staff where stress levels are highest.

Be wary not to assume causation.

Drawing causal conclusions from observational studies is one of the most common errors in newspaper reports of statistical studies. You need to be aware of all of the possible reasons why two variables might be correlated with each other.

Consider one more example. Suppose a study finds a positive correlation between the length of visit with the doctor and patient satisfaction with the consultation. Dr. Crosspatch finds that his patients are not very satisfied with his consultations. Should we recommend that he spend longer with each patient? Not necessarily. Finding a correlation doesn’t mean that it is the length of consultation itself which directly affects the level of satisfaction. It could be that doctors who spend longer with their patients tend to spend more time listening to what their patients are saying, and it is patient involvement which leads to greater satisfaction. If Dr. Crosspatch simply increases the time he spends giving his patients detailed clinical information, and doesn’t spend any more time listening to what they have to say, their level of satisfaction won’t improve....

