The missing heritability problem

By Marcus Munafo

Missing heritability has been described as genetic “dark matter”.

In my last post I described the transition from candidate gene studies to genome-wide association studies, and argued that the corresponding change in method, focusing on the whole genome rather than on a handful of genes of presumed biological relevance, has transformed our understanding of the genetic basis of complex traits. In this post I discuss why, despite this success, we still have not accounted for all the genetic influences we expect to find.

As I discussed previously, genome-wide association studies (GWAS) have been extremely successful in identifying genetic variants associated with a range of disease outcomes – countless replicable associations have emerged over the last few years. Nevertheless, the individual variants identified are each associated with only a very small proportion of variance in the trait of interest (typically 0.1% or less), so that together they still account for only a modest proportion. Twin, family and adoption studies would lead us to expect that 50% or more of the variance in many complex traits is attributable to genetic influences, but so far we have found only a small fraction of that total. This has become known as the “missing heritability” problem. Where are the other genes? Should we be seeking common genetic variants of smaller and smaller effect, in larger and larger studies? Or is there a role for rare variants (i.e., those which occur at low frequency in a particular population, typically a minor allele frequency below 5%), which may have larger effects?

It is clear that some missing heritability will be accounted for by variants that have not yet been identified via GWAS. Most GWAS genotyping chips don’t capture rare variants very well, but evolutionary theory predicts that mutations which strongly influence complex phenotypes will tend to occur at low frequencies. Under the neutral evolutionary model, variants with these large effects are predicted to be rare. However, under the same model, while rare variants of large effect constitute the majority of causal variants, they still contribute only a small proportion of phenotypic variance in a population, precisely because they are rare. Common variants of small effect, on the other hand, contribute a greater overall proportion of variance. There are new methods which use a less stringent threshold for including variants identified via GWAS – instead of only including those that reach “genome-wide significance” (i.e., a P-value < 10⁻⁸ – see my earlier post), those which reach a much more modest level of statistical evidence (e.g., P < 0.5) are also included. This much more inclusive approach has shown that, when considered together, common genetic variants do in fact seem to account for a substantial proportion of the expected heritability.
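The threshold-based scoring approach described above can be sketched in a few lines of Python. Everything here – the number of variants, the effect sizes, the genotypes and the exact thresholds – is illustrative, not taken from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GWAS summary statistics: per-variant effect sizes (betas)
# and P-values for 1,000 common variants.
n_variants = 1000
betas = rng.normal(0, 0.01, n_variants)    # small effects, as expected
pvalues = rng.uniform(0, 1, n_variants)    # P-values from the GWAS

# One person's genotypes: allele counts (0, 1 or 2) at each variant
genotypes = rng.integers(0, 3, n_variants)

def polygenic_score(genotypes, betas, pvalues, threshold):
    """Sum allele counts weighted by effect size, including only
    variants whose GWAS P-value passes the inclusion threshold."""
    include = pvalues < threshold
    return float(np.dot(genotypes[include], betas[include]))

# Strict threshold: genome-wide significant variants only
strict = polygenic_score(genotypes, betas, pvalues, 1e-8)
# Liberal threshold: include anything with P < 0.5, as described above
liberal = polygenic_score(genotypes, betas, pvalues, 0.5)
```

With simulated null P-values, nothing survives the genome-wide threshold, while the liberal threshold includes roughly half the variants; in real data it is this inclusive score that captures a substantial share of the heritability.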

In other words, complex traits, such as most disease outcomes but also those behavioural traits of interest to psychologists, are highly polygenic – that is, they are influenced by a very large number of common genetic variants, each of very small effect. This, in turn, explains why we have yet to reliably identify specific genetic variants associated with many psychological and behavioural traits – while the latest GWAS of traits such as height and weight (the GIANT Consortium) includes data on over 250,000 individuals, no comparable collection of data exists for most psychological and behavioural traits. This situation is changing, though – a recent GWAS of educational attainment combined data on over 125,000 individuals, and three genetic loci were identified at genome-wide significance, although these were associated with very small effects (as we would expect). Excitingly, these findings have recently been replicated. Another large GWAS, this time of schizophrenia, identified 108 loci associated with the disease, putting this psychiatric condition on a par with traits such as height and weight in terms of our understanding of the underlying genetics.

The success of the GWAS method is remarkable – the recent schizophrenia GWAS, for example, has provided a number of intriguing new biological targets for further study. It should only be a matter of time (and sample size) before we begin to identify variants associated with personality, cognitive ability and so on. Once we do, we will understand more about the biological basis for these traits, and finally begin to account for the missing heritability.


Munafò, M.R., & Flint, J. (2014). Schizophrenia: genesis of a complex disease. Nature, 511, 412-3.

Rietveld, C.A., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-71.



This blog first appeared on The Inquisitive Mind site on 18th October 2014.

Cochrane review says there’s insufficient evidence to tell whether fluoxetine is better or worse than other treatments for depression

Depression is common in primary care and associated with a substantial personal, social and societal burden. There is considerable ongoing controversy regarding whether antidepressant pharmacotherapy works and, in particular, for whom. One widely-prescribed antidepressant is fluoxetine (Prozac), an antidepressant of the selective serotonin reuptake inhibitors (SSRI) class. Although a number of more recent antidepressants are available, fluoxetine (which went off patent in 2001) remains highly popular and is commonly prescribed.

This systematic review and meta-analysis, published through the Cochrane Collaboration, compared the effects of fluoxetine for depression with those of other SSRIs, tricyclic antidepressants (TCAs), selective noradrenaline reuptake inhibitors (SNRIs), monoamine oxidase inhibitors (MAOIs) and newer agents, as well as other conventional and unconventional treatments. This is an important clinical question – different antidepressants have different efficacy and side effect profiles, but direct comparisons are relatively rare.


Thank goodness for systematic reviewers who read hundreds of papers and combine the results, so you don’t have to

The review focused on studies of adults with unipolar major depressive disorder (regardless of the specific diagnostic criteria used), searching major databases for studies published up to 11 May 2012.

All randomised controlled trials comparing fluoxetine with any other antidepressant (including non-conventional agents such as hypericum, also known as St John’s wort) were included. Both dichotomous (reduction of at least 50% on the Hamilton Depression Scale) and continuous (mean scores at the end of the trial or change score on depression measures) outcomes were considered.


A total of 171 studies were included in the analysis, conducted between 1984 and 2012 and comprising data on 24,868 participants.

A number of differences in efficacy and tolerability between fluoxetine and certain other antidepressants were observed. However, these differences were typically small, and their clinical meaning is unclear.

Moreover, the majority of studies failed to report detail on methodological procedures, and most were sponsored by pharmaceutical companies.

Both factors increase the risk of bias and overestimation of treatment effects.


The review found sertraline and venlafaxine (and possibly other antidepressants) had a better efficacy profile than fluoxetine

The authors conclude that: “No definitive implications can be drawn from the studies’ results”.

There was some evidence for greater efficacy of sertraline and venlafaxine over fluoxetine, which may be clinically meaningful, but other considerations such as side-effect profile, patient acceptability and cost will also have a bearing on treatment decisions.

In other words, despite considerable effort and pooling all of the available evidence, we still can’t be certain whether one antidepressant is superior to another.

What this review really highlights is the ongoing difficulty in establishing whether some drugs are genuinely effective (and safe), because of publication bias against null results (Turner, 2008).

This situation is made worse when there are financial vested interests involved. Recently, there has been active discussion about how this problem can be resolved, for example by requiring pharmaceutical companies to release all data from clinical trials they conduct, irrespective of the nature of the findings.

Despite the mountains of trials published in this field, we still cannot say for sure which treatments work best for depression

Clinical decision making regarding the most appropriate medication to prescribe is complex, and made harder by the lack of direct comparisons. Moreover, the apparent efficacy of individual treatments may be inflated by publication bias. Direct comparisons between different treatments are therefore important, but remain relatively rare. This Cochrane Review provides very important information, even if only by highlighting how much we still don’t know about which treatments work best.


Magni LR, Purgato M, Gastaldon C, Papola D, Furukawa TA, Cipriani A, Barbui C. Fluoxetine versus other types of pharmacotherapy for depression. Cochrane Database of Systematic Reviews 2013, Issue 7. Art. No.: CD004185. DOI: 10.1002/14651858.CD004185.pub3.

Etchells, P. We don’t know if antidepressants work, so stop bashing them. The Guardian website, 15 Aug 2013.

Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008 Jan 17;358(3):252-60. doi: 10.1056/NEJMsa065779.

This article first appeared on the Mental Elf website on 1st October 2013 and is posted by Marcus Munafo

“Doubt is our product…”

Cigarette smoking is addictive. Cigarette smoking causes lung cancer. Today these statements are uncontroversial, but it’s easy to forget that this was not the case until relatively recently. The first studies reporting a link between smoking and lung cancer appeared in the 1950s (although scientists in Germany had reported a link earlier), while the addictiveness of tobacco, and the isolation of nicotine as its principal addictive constituent, was not established until some time later. Part of the reason for this is simply that scientific progress is generally slow, and scientists are typically not the kind of people to get ahead of the evidence.

However, another factor is that at every stage the tobacco industry has resisted the scientific evidence that has indicated the harms associated with the use of its products. One way in which it has done this is by suggesting that there is uncertainty around the core evidence base used to support tobacco control efforts. A 1969 Brown and Williamson document outlines this strategy: “Doubt is our product, since it is the best means of competing with the ‘body of fact’ [linking smoking with disease] that exists in the mind of the general public”.

This approach seeks to “neutralize the influence of academic scientists”, and has since been adopted more widely by other lobby groups. The energy industry has used a similar approach in response to consensus among climate scientists on the role of human activity in climate change. But what’s the problem? There are always a number of ways to interpret data, scientists will hold different theoretical positions despite being in possession of the same basic facts, people are entitled to their opinion… That’s fine, but the tobacco industry goes beyond this and actively misrepresents the facts. Why do I care? Because recently our research was misrepresented in this way…

There is ongoing debate around whether to introduce standardised packaging for tobacco products. Public health researchers mostly favour it, while the tobacco industry is opposed to it. No particular surprises there, but there’s a need for more research to inform the debate. We have done some research here in Bristol suggesting that standardised packs increase the prominence of health warnings in non-smokers and light smokers. Interestingly, we didn’t see this in regular smokers. This research contributed to the recent European Commission Tobacco Products Directive and the UK government consultation on standardised packaging. British American Tobacco (BAT) submitted a response to this consultation, which cited our research and said:

“The researchers concluded that daily smokers exhibited more eye movements towards health warnings when the pack was branded than when it was plain, but the opposite was true for non-smokers and non-daily smokers”.

We didn’t find that, and we didn’t say that. This isn’t a matter of interpretation or opinion – this is simple misrepresentation. What we actually concluded was:

“…among non-smokers and weekly … smokers, plain packaging increases visual attention towards health warning information and away from brand information. This effect is not observed among daily (i.e. established) cigarette smokers”.

In other words, standardised packaging increases the prominence of health warnings in non-smokers and light smokers, but doesn’t seem to have any effect in daily smokers. This is an important difference compared to how BAT represented this research. In its response to the consultation, BAT argued that “plain packaging may actually reduce smokers’ attention to warnings”. Of course it’s possible that there could be negative unintended consequences of standardised packaging, but there is no evidence in our study for this.

Why does this matter? Maybe it doesn’t – people get misrepresented all the time. But scientists produce data and ideas, the latter ideally based on the former, and so to misrepresent their conclusions is fundamentally distorting. Unfortunately this sort of thing happens all the time, including in media coverage of scientists’ work. This often makes scientists less willing to engage in important debates where they could make a valuable contribution. If this happens, then those with clear vested interests will succeed in removing valuable evidence from these debates. More importantly, this example illustrates why it’s vital that scientists do engage with the public and the media. Only by doing so can scientists make sure that their research is accurately represented, and that attempts to misrepresent their research are challenged.

As the health effects of smoking became apparent, successive governments acted to reduce the prevalence of smoking in the population. In the United Kingdom these efforts have been pretty successful – the overall prevalence of smoking is currently around 20%, down from a peak of over 50% in the 1950s. This is due to restrictions on tobacco advertising, increases in taxation on tobacco products, and other tobacco control measures, as well as public health campaigns to increase awareness of the health consequences of tobacco use and greater availability of services to help people stop smoking. We want these policies to be evidence-based, and we don’t want this evidence to be knowingly distorted. Scientists have an important part to play in this.

Posted by Marcus Munafo @MarcusMunafo


Having confidence…

I’ve written previously about the problems associated with an unhealthy fixation on P-values in psychology. Although null hypothesis significance testing (NHST) remains the dominant approach, there are a number of important problems with it. Tressoldi and colleagues summarise some of these in a recent article.

First, NHST focuses on rejection of the null hypothesis at a pre-specified level of probability (typically 5%, or 0.05). The implicit assumption, therefore, is that we are only interested in answering “Yes!” to questions of the form “Is there a difference from zero?”. What if we are interested in cases where the answer is “No!”? Since the null hypothesis is hypothetical and unobserved, NHST doesn’t allow us to conclude that the null hypothesis is true.

Second, P-values can vary widely when the same experiment is repeated (for example, because the participants you sample will be different each time) – in other words, they give very unreliable information about whether a finding is likely to be reproducible. This is important in the context of recent concerns about the poor reproducibility of many scientific findings.

Third, with a large enough sample size we will always be able to reject the null hypothesis. No observed distribution is ever exactly consistent with the null hypothesis, and as sample size increases the likelihood of being able to reject the null increases. This means that trivial differences (for example, a difference in age of a few days) can lead to a P-value less than 0.05 in a large enough sample, despite the difference having no theoretical or practical importance.
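The third point is easy to demonstrate with a small simulation (in Python, using NumPy; all of the numbers are arbitrary): two groups that differ by a trivial 0.05 standard deviations nevertheless yield a vanishingly small P-value once the sample is large enough.

```python
import math
import numpy as np

rng = np.random.default_rng(42)

# Two groups differing by a trivial 0.05 standard deviations
# (the kind of difference with no practical importance)
n = 500_000
a = rng.normal(0.00, 1.0, n)
b = rng.normal(0.05, 1.0, n)

# Two-sample z-test (a t-test is indistinguishable at this sample size)
se = math.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
z = (b.mean() - a.mean()) / se
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value

# p comes out astronomically small, despite the effect being negligible
```

Run in reverse, the same logic shows why small studies so often miss real effects: the P-value is a function of both the effect and the sample size, which is exactly why it cannot stand in for an effect size.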

The last point is particularly important, and relates to two other limitations. Namely, the P-value doesn’t tell us anything about how large an effect is (i.e., the effect size), or about how precise our estimate of the effect size is. Any measurement will include a degree of error, and it’s important to know how large this is likely to be.

There are a number of things that can be done to address these limitations. One is the routine reporting of effect size and confidence intervals. The confidence interval is essentially a measure of the reliability of our estimate of the effect size, and can be calculated for different ranges. A 95% confidence interval, for example, represents the range of values that we can be 95% confident that the true effect size in the underlying population lies within. Reporting the effect size and associated confidence interval therefore tells us both the likely magnitude of the observed effect, and the degree of precision associated with that estimate. The reporting of effect sizes and confidence intervals is recommended by a number of scientific organisations, including the American Psychological Association, and the International Committee of Medical Journal Editors.
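As a sketch of what this reporting looks like in practice, the following Python computes a standardised effect size (Cohen’s d) and an approximate 95% confidence interval for two made-up groups; the data, group sizes and the standard-error approximation used are illustrative assumptions, not taken from any particular study.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups (e.g. treatment vs control on some measure)
a = rng.normal(0.5, 1.0, 100)
b = rng.normal(0.0, 1.0, 100)

n1, n2 = len(a), len(b)
# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd = math.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                      / (n1 + n2 - 2))
d = (a.mean() - b.mean()) / pooled_sd

# Approximate standard error of d, and a 95% confidence interval
se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d
```

Reporting `d` together with `(ci_low, ci_high)` conveys both the likely magnitude of the effect and the precision of the estimate, which is exactly the information a bare P-value omits.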

How often does this happen in the best journals? Tressoldi and colleagues go on to assess the frequency with which effect sizes and confidence intervals are reported in some of the most prestigious journals, including Science, Nature, Lancet and New England Journal of Medicine. The results showed a clear split. Prestigious medical journals did reasonably well, with most selected articles reporting prospective power (Lancet 66%, New England Journal of Medicine 61%) and an effect size and associated confidence interval (Lancet 86%, New England Journal of Medicine 83%). However, non-medical journals did very poorly, with hardly any selected articles reporting prospective power (Science 0%, Nature 3%) or an effect size and associated confidence interval (Science 0%, Nature 3%). Conversely, these journals frequently (Science 42%, Nature 89%) reported P-values in the absence of any other information (such as prospective power, effect size or confidence intervals).

There are a number of reasons why we should be cautious when ranking journals according to metrics intended to reflect quality and convey a sense of prestige. One of these appears to be that many of the articles in the “best” journals neglect some simple reporting procedures for statistics. This may be for a number of reasons – editorial policy, common practices within a particular field, or article formats which encourage extreme brevity. Fortunately the situation appears to be improving – Nature recently introduced a methods reporting checklist for new submissions, which includes statistical power and sample size calculation. It’s not perfect (there’s no mention of effect size or confidence intervals, for example), but it’s a start…


Tressoldi, P.E., Giofré, D., Sella, F. & Cumming, G. (2013). High impact = high statistical standards? Not necessarily so. PLoS One, e56180.

Posted by Marcus Munafo

Health Technology Assessment report finds computer and other electronic aids can help people stop smoking

Smoking continues to be the greatest single preventable cause of premature illness and death in developed countries. Although rates of smoking have fallen, over 20% of the adult population in the UK continues to smoke. Anything which can be done to help people stop smoking will therefore have substantial public health benefits.

More and more people now have access to computers and other electronic devices (such as mobile phones), and there is growing interest in whether these can be used to prompt or support attempts to stop smoking. This could be by providing a prompt to quit, by reaching smokers who would otherwise use no support, and/or by supporting people’s use of their smoking cessation medication (e.g., nicotine replacement therapy).

A recent Health Technology Assessment review assessed the effectiveness of internet sites, computer programs, mobile telephone text messages and other electronic aids for helping smokers to quit, and/or to reduce relapse to smoking among those who had quit.


The reviewers conducted a systematic review of the literature from 1980 to 2009 and found 60 randomised controlled trials (RCTs) and quasi-RCTs evaluating smoking cessation programmes that utilised computer, internet, mobile telephone or other electronic aids. The review was restricted to studies of adult smokers.

The primary outcomes were smoking abstinence, measured in two ways: point prevalence abstinence and prolonged abstinence. The first is typically available in more studies (because it is easier to measure), but is a rather liberal measure of abstinence (since the smoker need only be abstinent at the point the assessment is made to count as having quit). The second is more conservative (since it requires the smoker to have been abstinent for an extended period to count as having quit), and is generally the preferred measure. Smoking abstinence at the longest available follow-up in each study was used, again because this is most conservative.


Combining the data from the 60 trials indicated that, overall, the use of computer and other electronic aids increased quit rates for both prolonged (pooled RR = 1.32, 95% CI 1.21 to 1.45) and point prevalence (pooled RR = 1.14, 95% CI 1.07 to 1.22) abstinence at longest follow-up, compared with no intervention or generic self-help materials.
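For readers unfamiliar with how a pooled risk ratio like those above is produced, here is a minimal Python sketch of the standard inverse-variance fixed-effect method; the three trials and their numbers are invented for illustration, not taken from the review (which used a more sophisticated network meta-analysis).

```python
import math

# Three made-up trials, each summarised as a risk ratio (RR)
# with its 95% confidence interval
trials = [
    (1.25, 1.05, 1.49),   # (RR, lower CI limit, upper CI limit)
    (1.40, 1.10, 1.78),
    (1.10, 0.90, 1.34),
]

weights, weighted_logs = [], []
for rr, lo, hi in trials:
    log_rr = math.log(rr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from CI width
    w = 1 / se**2                                    # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_rr)

# Pool on the log scale, then transform back
pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_rr = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * pooled_se),
      math.exp(pooled_log + 1.96 * pooled_se))
```

Because more precise trials get larger weights, the pooled estimate sits between the individual RRs and carries a narrower confidence interval than any single trial.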

The authors also looked at whether studies which aided cessation differed from those which prompted cessation, and found no evidence of any difference in effect size between these. The effectiveness of the interventions also did not appear to vary with the mode of delivery or the concurrent use of non-electronic co-interventions (e.g., nicotine replacement therapies).


Computer and other electronic aids do indeed increase the likelihood of cessation compared with no intervention or generic self-help materials, but the effect is small

The review concluded that computer and other electronic aids do indeed increase the likelihood of cessation compared with no intervention or generic self-help materials, but the effect is small. However, even a small effect is likely to have important public health benefits, given the large number of people who smoke and the impact of smoking on health. The authors also note that uncertainty remains around the comparative effectiveness of different types of electronic intervention, which will require further study.

The authors argue that further research is needed on the relative benefits of different forms of delivery for electronic aids, the content of delivery, and the acceptability of these technologies for smoking cessation with subpopulations of smokers, particularly disadvantaged groups. More evidence is also required on how electronic aids developed and tested in research settings are applied in routine practice and in the community.


Chen YF, Madan J, Welton N, Yahaya I, Aveyard P, Bauld L, Wang D, Fry-Smith A, Munafò MR. Effectiveness and cost-effectiveness of computer and other electronic aids for smoking cessation: a systematic review and network meta-analysis. Health Technol Assess 2012; 16(38): 1-205, iii-v. doi: 10.3310/hta16380.

This article first appeared on the Mental Elf website on 11th March 2013 and is posted by Marcus Munafo