Shifting the Evidence

An excellent paper published a few years ago, Sifting the Evidence, highlighted many of the problems inherent in significance testing and the use of P-values. One particular problem was the use of arbitrary thresholds (typically P < 0.05) to divide results into “significant” and “non-significant”. More recently, there has been a lot of coverage of the problems of reproducibility in science, and in particular of distinguishing true effects from false positives. Confusion about what P-values actually tell us may contribute to this.

It is often not made clear whether research is exploratory or confirmatory. This distinction is now commonly made in genetic epidemiology, where individual studies routinely report “discovery” and “replication” samples. That in itself is helpful – it’s all too common for post-hoc analyses (e.g., of sub-groups within a sample) to be described as having been based on a priori hypotheses. This is sometimes called HARKing (Hypothesising After the Results are Known), which can make it seem like results were expected (and therefore more likely to be true), when in fact they were unexpected (and therefore less likely to be true). In other words, a P-value alone is often not very informative in telling us whether an observed effect is likely to be true – we also need to take into account whether it conforms with our prior expectations.


One way we can do this is by taking into account the pre-study probability that the effect or association being investigated is real. This is difficult, of course, because we can’t know this with certainty. However, what we perhaps can estimate is the extent to which a study is exploratory (the first to address a particular question, or to use a newly-developed methodology) or confirmatory (the latest in a long series of studies addressing the same basic question). Broer et al. (2013) describe a simple way to take this into account and increase the likelihood that a reported finding is actually true. Their basic point is that the likelihood that a claimed finding is actually true (which they call the positive predictive value, or PPV) depends on three things: the prior probability (i.e., whether the study is exploratory or confirmatory), the statistical power (i.e., the probability of finding an effect if it really exists), and the Type I error rate (i.e., the P-value or significance threshold used). We have recently described the problems associated with low statistical power in neuroscience (Button et al., 2013).

What Broer and colleagues show is that if we adjust the P-value threshold we use, depending on whether a study is exploratory or confirmatory, we can dramatically increase the likelihood that a claimed finding is true. For highly exploratory research, with a very low prior probability, they suggest a P-value of 1 × 10⁻⁷. Where the prior probability is uncertain or difficult to estimate, they suggest a value of 1 × 10⁻⁵. Only for highly confirmatory research, where the prior probability is high, do they suggest that a “conventional” value of 0.05 is appropriate.
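The relationship between prior probability, power and the significance threshold can be made concrete with a few lines of code. This is a minimal sketch using the standard PPV formula (the illustrative numbers are mine, not from Broer and colleagues): `prior` is the pre-study probability that the effect is real, `power` is 1 − β, and `alpha` is the significance threshold.

```python
def ppv(prior, power, alpha):
    """Positive predictive value: the probability that a finding
    declared 'significant' reflects a true effect."""
    true_positives = power * prior          # true effects that reach significance
    false_positives = alpha * (1 - prior)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)

# Highly exploratory study: very low prior, conventional 0.05 threshold
print(round(ppv(prior=0.0001, power=0.8, alpha=0.05), 3))   # 0.002

# Same study held to the stricter exploratory threshold of 1e-7
print(round(ppv(prior=0.0001, power=0.8, alpha=1e-7), 3))   # 0.999
```

With a conventional 0.05 threshold, almost every “significant” finding in this hypothetical exploratory setting would be a false positive; tightening the threshold makes a claimed finding almost certain to be true, at the cost of requiring much larger samples to maintain power.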

Psychologists are notorious for having an unhealthy fixation on P-values, and particularly the 0.05 threshold. This is unhelpful for lots of reasons, and many journals now discourage or even ban the use of the word “significant”. The genetics literature that Broer and colleagues draw on has learned these lessons from bitter experience. However, if we are going to use thresholds, it makes sense that these reflect the exploratory or confirmatory nature of our research question. Fewer findings might pass these new thresholds, but those that do will be much more likely to be true.


Broer L, Lill CM, Schuur M, Amin N, Roehr JT, Bertram L, Ioannidis JP, van Duijn CM. (2013). Distinguishing true from false positives in genomic studies: p values. Eur J Epidemiol; 28(2): 131-8.

Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci; 14(5): 365-76.

Posted by Marcus Munafo and thanks to Mark Stokes at Oxford University for the ‘Statistical power is truth power’ image.

Health Technology Assessment report finds computer and other electronic aids can help people stop smoking

Smoking continues to be the greatest single preventable cause of premature illness and death in developed countries. Although rates of smoking have fallen, over 20% of the adult population in the UK continues to smoke. Anything which can be done to help people stop smoking will therefore have substantial public health benefits.

More and more people now have access to computers and other electronic devices (such as mobile phones), and there is growing interest in whether these can be used to prompt or support attempts to stop smoking. This could be by providing a prompt to quit, reaching smokers who would otherwise use no support, and/or supporting the degree to which people use their smoking cessation medication (e.g., nicotine replacement therapy).

A recent Health Technology Assessment review assessed the effectiveness of internet sites, computer programs, mobile telephone text messages and other electronic aids for helping smokers to quit, and/or to reduce relapse to smoking among those who had quit.


The reviewers conducted a systematic review of the literature from 1980 to 2009 and found 60 randomised controlled trials (RCTs) and quasi-RCTs evaluating smoking cessation programmes that utilised computer, internet, mobile telephone or other electronic aids. The review was restricted to studies of adult smokers.

The primary outcomes were smoking abstinence, measured in two ways: point prevalence abstinence and prolonged abstinence. The first is typically available in more studies (because it is easier to measure) but is a rather liberal measure of abstinence (since the smoker need only be abstinent at the point the assessment is made to count as having quit). The latter is more conservative (since it requires the smoker to have been abstinent for an extended period to count as having quit), and is generally the preferred measure. Smoking abstinence at the longest follow-up available in each study was used, again because this is the most conservative measure.


Combining the data from the 60 trials indicated that, overall, the use of computer and other electronic aids increased quit rates for both prolonged (pooled RR = 1.32, 95% CI 1.21 to 1.45) and point prevalence (pooled RR = 1.14, 95% CI 1.07 to 1.22) abstinence at longest follow-up, compared with no intervention or generic self-help materials.

The authors also looked at whether studies which aided cessation differed from those which prompted cessation, and found no evidence of any difference in the effect size between these. The effectiveness of the interventions also did not appear to vary with respect to mode of delivery or the concurrent use of non-electronic co-interventions (e.g., nicotine replacement therapies).



The review concluded that computer and other electronic aids do indeed increase the likelihood of cessation compared with no intervention or generic self-help materials, but the effect is small. However, even a small effect is likely to have important public health benefits, given the large number of people who smoke and the impact of smoking on health. The authors also note that uncertainty remains around the comparative effectiveness of different types of electronic intervention, which will require further study.
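The point about small effects and large populations is easy to illustrate with back-of-the-envelope arithmetic. The figures below are purely hypothetical; only the pooled RR of 1.32 comes from the review.

```python
# Hypothetical figures for illustration; only the RR is from the review
quit_attempts_per_year = 1_000_000  # assumed number of aided quit attempts
baseline_quit_rate = 0.05           # assumed prolonged-abstinence rate without the aid
rr = 1.32                           # pooled relative risk for prolonged abstinence

# Additional successful quitters attributable to the intervention
extra_quitters = quit_attempts_per_year * baseline_quit_rate * (rr - 1)
print(int(extra_quitters))  # 16000
```

Even a modest relative increase translates into thousands of additional successful quitters each year at the population level.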

The authors argue that further research is needed on the relative benefits of different forms of delivery for electronic aids, the content of delivery, and the acceptability of these technologies for smoking cessation with subpopulations of smokers, particularly disadvantaged groups. More evidence is also required on how electronic aids developed and tested in research settings are applied in routine practice and in the community.


Chen YF, Madan J, Welton N, Yahaya I, Aveyard P, Bauld L, Wang D, Fry-Smith A, Munafò MR. (2012). Effectiveness and cost-effectiveness of computer and other electronic aids for smoking cessation: a systematic review and network meta-analysis. Health Technol Assess; 16(38): 1-205, iii-v. doi: 10.3310/hta16380.

This article first appeared on the Mental Elf website on 11th March 2013 and is posted by Marcus Munafò.

Why I applied for an internship at the Behavioural Insights Team

Hi, and welcome to TARG’s shiny new blog! I’m Olivia Maynard and I’m a final year PhD student. I’ve recently found out that for three months over the summer, I will be working as a Research Fellow in the Behavioural Insights Team, part of the UK government Cabinet Office. I thought that I would use my first blog post to tell you a bit about the team I’ll be working in and why I applied for the job.

The Behavioural Insights Team’s aim is to ‘find innovative ways of encouraging, enabling and supporting people to make better choices for themselves’. Since its creation in 2010, the team has claimed among its many successes: encouraging more people to pay their income tax, saving energy by promoting loft insulation, helping more people into work and making government savings of 22 times the team’s cost.

The secret behind the team’s success is its reliance on ‘nudges’, which are anything that ‘alters people’s behaviour in a predictable way without forbidding any options or significantly changing their economic incentives’. It is therefore known as the ‘Nudge Unit’, and uses these nudges to help inform and design effective policy interventions. To use the example of the loft insulation scheme, the team realised that one of the greatest barriers to people insulating their lofts was the amount of junk they had stored up there. In a trial where residents of two London Boroughs were offered loft insulation with or without a subsidised loft clearance scheme, those offered the loft clearance were four times more likely to get their lofts insulated. In this case, removing the ‘hassle factor’ nudged people in the direction of saving both energy and money in the long-run.

The team also hope to make inroads into the nation’s health problems, with plans to encourage organ donation by changing the current opt-in scheme to an opt-out scheme, promoting healthy eating by placing signs at supermarket checkouts detailing the amount of fruit and vegetables the average shopper buys, and helping people to quit smoking through schemes offering rewards to those who sign a contract stating their commitment to quit. In addition to using these novel techniques to encourage behaviour change, the team have pioneered the use of rigorous randomised controlled trial methodology to assess these interventions.

My main motivation for wanting to work in this team stems from my strong interest in public health and the important interplay I see between research and policy. Now in the third and final year of my PhD, I’ve been using objective experimental techniques, such as eye-tracking and brain imaging to directly assess, for the first time, the likely impact of standardised tobacco packaging on behaviour. As an academic researcher, it’s all too easy to lock yourself in your windowless lab and ignore the outside world. However, I have learnt the value and importance of engaging with policy makers throughout my PhD, and as a result, my research has been used by a number of governments and by the European Union to inform their tobacco control policies. I hope that by working within the Behavioural Insights Team, I will gain a greater understanding of how evidence is actually used by governments to inform policy and I hope to be able to come back to research with a fresh perspective on how to engage and collaborate with policy makers.

Another motivation is that I want to try something new. Although I’ve enjoyed working in such an exciting field over the course of my PhD, the very nature of a PhD means that you spend three or four years focussing all of your attention on one particular idea or project. Now I’m approaching the home-stretch, I’m looking to expand my research interests and develop myself as an independent researcher. The three month internship will involve working collaboratively across government and local authorities, writing briefings for members of government on how behavioural science can inform policy and also designing, managing and analysing the data from a variety of policy intervention trials. I hope that this experience will provide me with new ideas for research when I return to my PhD in October.

Welcome to the TARG blog

Hi there everyone!

I’m Suzi Gage, a PhD student in TARG, and an avid blogger. I have a science blog called Sifting the Evidence, which is on the Guardian website. I love writing about science, for a variety of reasons. I believe that since most research conducted in universities is ultimately funded by the public, we as researchers have a duty to share any results we find. This can be difficult, because journals often sit behind paywalls, meaning research isn’t freely available. Also, academic papers are often written in dry, technical language which can be confusing or boring to read.

Blogging is a great way of sharing our findings with those people who are interested in what we get up to. We intend to use this TARG blog to do just this, as well as writing posts more generally about the type of research we do, or background summaries of areas of research we are interested in.

If there’s anything you’d like us to cover, do let us know.

A first post will be up soon. Enjoy!