Genome-wide association studies (GWAS) are getting old, turning ten next year. From 2014, we can now look back and identify the mood swings we went through during our particular gold rush for common genetic variants associated with disease. The ‘GWAS era’ (2005-?) is even starting to look boring! We now know what to expect (lesson 1: whenever a few thousand affected people are GWASed, a few, but not a lot of, common disease SNPs will be found) and also what not to expect (lesson 2: those newly associated SNPs will not explain much of the genetic susceptibility to the disease, aka heritability). These lessons were a bit shocking (not for everybody!). They ended up being known as the missing heritability case, thanks to a 2008 piece in Nature that became very famous (those pictures!).
This what-not-to-expect lesson was key to realizing the limitations of GWAS, and the need to look beyond them (see a nice review here). This is why I decided to start the year with a new study that looks like another what-not-to-expect moment for the community (mostly for those who hold strong hopes that rare variants will explain the missing heritability). In a paper entitled Whole-Exome Sequencing of 2,000 Danish Individuals and the Role of Rare Coding Variants in Type 2 Diabetes, Kirk Lohmueller and colleagues scanned the exomes of a bunch of Danish people, looking for rare variants associated with type 2 diabetes. But they found… none.
A time-limited reader may feel tempted to stop reading after that spoiler. No new ‘diabetes’ SNPs. Period. No new ‘diabetes’ genes. Period. It would seem that rare variants won’t rescue us, at least with sample sizes in the few thousands of individuals. So why enjoy this paper? Easy. Not because of the negative result, but because of the consequences of that result (as the authors cleverly show).
It is a long paper, a brute-force attack on how to call, trust, and analyze rare variants in an exome study. Let me summarize it by means of three of the five-W’s: who? what? why?
1. Who is it about?
There are several who-s to talk about. First, the phenotype: type 2 diabetes, with up to 63 loci known from GWAS, together explaining ~20% of the genetic susceptibility (6 percentage points out of the ~30% heritability). Second, the people: ~1,000 Danish diabetic people, hypertensive and at least a bit overweight (BMI > 27.5), versus ~1,000 more Danish people who show no insulin resistance, excess weight, or hypertension. Third, the kind of SNPs they check: genetic variants that lie in exonic regions and are rare in Denmark (MAF < 5%).
2. What happened?
They sequence up to 82 Mb of the genome (only 30 to 40 Mb are strictly exonic). After an impressive methodological curation (with two appendices on variant calling and filtering), they detect ~3 million polymorphic positions (1 in every 30 bp; it still amazes me that each of us carries >10,000 non-synonymous positions in our exome!). Next, they check whether any individual variant is associated with the phenotype. They also try up to 7 methods that collapse all variants within each gene, which lowers the number of tests to ~15,000 (instead of hundreds of thousands of variants). But the result is 100% negative. No SNP and no gene shows a p-value lower than what is expected by chance after running so many tests. You know, the typical bloodshed after Bonferroni…
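To make the Bonferroni bloodshed concrete, here is a minimal sketch of the per-test significance thresholds. The family-wise alpha of 0.05 and the figure of 300,000 single-variant tests are my own illustrative assumptions (the text above only says "hundreds of thousands"), and `bonferroni_threshold` is a hypothetical helper name, not something taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# ~15,000 gene-level tests, as quoted in the post.
gene_level = bonferroni_threshold(0.05, 15_000)

# ASSUMPTION: 300,000 stands in for "hundreds of thousands of variants".
variant_level = bonferroni_threshold(0.05, 300_000)

print(f"gene-level cutoff:    {gene_level:.2e}")
print(f"variant-level cutoff: {variant_level:.2e}")
```

Collapsing variants into genes relaxes the cutoff by the ratio of the two test counts, which is one reason gene-based methods are attractive for rare variants.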
This is when things become interesting again. After the negative results, the authors perform an easy but clever power study. The question is no longer “is there any gene with rare variants associated with diabetes?” but (with dramatic mode ON)… “not a single associated gene?! Does this mean that rare variants do not explain the missing heritability for diabetes?!”. This very same idea has been applied in abundance after GWAS results, such as by Park et al. in 2010. It can be easily summarized through Plato’s allegory of the cave. There are the shadows (“crap, not a single associated gene”), there is the reality we wanna discover (“how many genes contain rare variants for diabetes”), and there is the candle (“given that we found none, can we make a guess at that number?”). They create different architectures of the disease, built so that the missing heritability for diabetes is equally spread across a number of genes (15, 20, 500…). Each of these fake genes is assigned fake rare SNPs (mimicking the number of polymorphisms present in their real data). Finally, every fake SNP (associated with the fake disease!) is assigned an effect size so that its gene explains its bit of heritability.
In short, they test several models in which rare variants explain the missing heritability while being spread across a number of genes. As shown in the figure, they conclude that a few genes would have reached Bonferroni significance (light blue line) if the missing heritability were due to rare variants spread across few genes (e.g. <50). Since none did, rare variants may still explain the susceptibility to diabetes, but then they must be spread across hundreds of genes. It certainly sounds familiar…
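The logic of that power study can be sketched with a back-of-the-envelope calculation. To be clear, this is not the authors' simulation: it is an analytic shortcut of mine that treats each causal gene as a single 1-df test on a quantitative-like trait, with non-centrality N * h2 / (1 - h2). The missing heritability of 0.24 (30% minus the 6% already explained), the alpha of 0.05, and the function names `gene_power` and `expected_hits` are all assumptions for illustration. It also ignores the case-control design and the extreme sampling of cases and controls.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gene_power(h2_gene: float, n_samples: int, alpha: float) -> float:
    """Power of a two-sided 1-df test for a gene explaining h2_gene of the variance."""
    # ASSUMPTION: quantitative-trait approximation of the non-centrality parameter.
    ncp = n_samples * h2_gene / (1.0 - h2_gene)
    delta = sqrt(ncp)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # A 1-df non-central chi-square is (Z + delta)^2, so both tails are exact.
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


def expected_hits(
    n_causal_genes: int,
    missing_h2: float = 0.24,      # ASSUMPTION: 0.30 - 0.06 from the post
    n_samples: int = 2000,         # ~1,000 cases + ~1,000 controls
    n_genes_tested: int = 15_000,  # gene-level tests quoted in the post
    alpha: float = 0.05,           # ASSUMPTION: family-wise error rate
) -> float:
    """Expected number of causal genes passing Bonferroni under an equal-spread model."""
    per_gene_alpha = alpha / n_genes_tested
    h2_gene = missing_h2 / n_causal_genes  # heritability split equally across genes
    return n_causal_genes * gene_power(h2_gene, n_samples, per_gene_alpha)


for n_genes in (15, 50, 500):
    print(n_genes, round(expected_hits(n_genes), 2))
```

Under these assumptions, the expected number of significant genes shrinks as the same heritability is diluted across more genes, which is the qualitative pattern behind the authors' conclusion. Their actual simulations work at the level of individual rare variants and should be trusted over this sketch.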
3. Why did it happen?
From an evolutionary point of view, it makes sense to test rare variants (if they have been detrimental during human evolution, purifying selection would have kept them at low frequencies). Besides, rare variants are not covered by the genotyping arrays that are used in GWAS, and nobody had tested them before… We know of somebody who had the money to do so (2,000 exomes at ~$1,000 per exome, 2 million dollars overall?).
Me, I don’t have big complaints about the paper. I would have loved a QQ plot restricted to the 63 diabetes GWAS genes (all the more given the evidence that GWAS loci harbor rare variants, such as here). I would have also loved a discussion about their diabetic people (why look for overweight and hypertensive patients? is it difficult to sample lean people with type 2 diabetes?). And finally, I would have loved to see these results included in the evidence used by this recent Nature Genetics paper (maybe the most beautiful paper I have read in years!). Certainly, it is only a matter of time before we see exome studies with several tens of thousands of people, further narrowing the range of genetic architectures of diabetes that remain possible. Could it be the starting point towards a scenario in which not even the largest studies are powered to detect rare variants?
The show will go on…