An excellent paper published a few years ago, Sifting the Evidence, highlighted many of the problems inherent in significance testing and the use of P-values. One problem in particular was the use of arbitrary thresholds (typically P < 0.05) to divide results into “significant” and “non-significant”. More recently, there has been a lot of coverage of the problems of reproducibility in science, and in particular of distinguishing true effects from false positives. Confusion about what P-values actually tell us may contribute to this.
It is often not made clear whether research is exploratory or confirmatory. This distinction is now commonly made in genetic epidemiology, where individual studies routinely report “discovery” and “replication” samples. That in itself is helpful – it’s all too common for post-hoc analyses (e.g., of sub-groups within a sample) to be described as having been based on a priori hypotheses. This is sometimes called HARKing (Hypothesising After the Results are Known), which can make it seem like results were expected (and therefore more likely to be true), when in fact they were unexpected (and therefore less likely to be true). In other words, a P-value alone is often not very informative in telling us whether an observed effect is likely to be true – we also need to take into account whether it conforms with our prior expectations.
One way we can do this is by taking into account the pre-study probability that the effect or association being investigated is real. This is difficult, of course, because we can’t know this with certainty. What we can estimate, however, is the extent to which a study is exploratory (the first to address a particular question, or to use a newly-developed methodology) or confirmatory (the latest in a long series of studies addressing the same basic question). Broer et al. (2013) describe a simple way to take this into account and increase the likelihood that a reported finding is actually true. Their basic point is that the likelihood that a claimed finding is true (which they call the positive predictive value, or PPV) depends on three things: the prior probability (i.e., whether the study is exploratory or confirmatory), the statistical power (i.e., the probability of finding an effect if it really exists), and the Type I error rate (i.e., the P-value or significance threshold used). We have recently described the problems associated with low statistical power in neuroscience (Button et al., 2013).
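For readers who like to see the relationship written out, here is the standard way the PPV is usually expressed in this literature (a general formulation, not a formula quoted directly from Broer et al.), where π is the pre-study probability that the effect is real, 1 − β is the statistical power, and α is the Type I error rate:

```latex
\mathrm{PPV} \;=\; \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi \;+\; \alpha\,(1-\pi)}
```

A low π (exploratory research) shrinks the numerator, so a very small α is needed to keep the PPV high; a high π (confirmatory research) keeps the PPV high even at the conventional α of 0.05.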
What Broer and colleagues show is that if we adjust the P-value threshold we use, depending on whether a study is exploratory or confirmatory, we can dramatically increase the likelihood that a claimed finding is true. For highly exploratory research, with a very low prior probability, they suggest a P-value of 1 × 10⁻⁷. Where the prior probability is uncertain or difficult to estimate, they suggest a value of 1 × 10⁻⁵. Only for highly confirmatory research, where the prior probability is high, do they suggest that a “conventional” value of 0.05 is appropriate.
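To see what this means in practice, here is a minimal sketch in Python. The numbers are illustrative assumptions of mine rather than figures taken from the paper: 80% power throughout, and pre-study probabilities of 1 in 10,000 for exploratory work, 1 in 100 where the prior is uncertain, and 50:50 for confirmatory work.

```python
def ppv(prior, power, alpha):
    """Positive predictive value: the probability that a 'significant' finding is true.

    prior : pre-study probability that the effect is real
    power : probability of detecting the effect if it is real (1 - beta)
    alpha : Type I error rate (the significance threshold used)
    """
    true_positives = power * prior          # real effects that reach significance
    false_positives = alpha * (1 - prior)   # null effects that reach significance by chance
    return true_positives / (true_positives + false_positives)

# Illustrative (assumed) scenarios, all with 80% power:
scenarios = [
    ("Exploratory,     alpha = 0.05 ", 1e-4, 0.05),
    ("Exploratory,     alpha = 1e-7 ", 1e-4, 1e-7),
    ("Uncertain prior, alpha = 1e-5 ", 1e-2, 1e-5),
    ("Confirmatory,    alpha = 0.05 ", 0.5,  0.05),
]
for label, prior, alpha in scenarios:
    print(f"{label}: PPV = {ppv(prior, power=0.8, alpha=alpha):.3f}")
```

With these assumed numbers, a “significant” exploratory finding at α = 0.05 has a PPV of well under 1%, but tightening the threshold to 1 × 10⁻⁷ pushes it to around 99%; for confirmatory research the conventional 0.05 threshold already gives a PPV of roughly 94%.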
Psychologists are notorious for having an unhealthy fixation on P-values, and particularly the 0.05 threshold. This is unhelpful for lots of reasons, and many journals now discourage or even ban the use of the word “significant”. The genetics literature that Broer and colleagues draw on has learned these lessons from bitter experience. However, if we are going to use thresholds, it makes sense that these reflect the exploratory or confirmatory nature of our research question. Fewer findings might pass these new thresholds, but those that do will be much more likely to be true.
References:
Broer L, Lill CM, Schuur M, Amin N, Roehr JT, Bertram L, Ioannidis JP, van Duijn CM. (2013). Distinguishing true from false positives in genomic studies: p values. Eur J Epidemiol; 28(2): 131-8.
Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci; 14(5): 365-76.
Posted by Marcus Munafò, with thanks to Mark Stokes at Oxford University for the ‘Statistical power is truth power’ image.