Review Article: GWAS and Missing Heritability

A couple of years ago I was asked to write a review article on the progress of my field (computational genetics), targeted toward computer scientists. My article "Discovering Genes Involved in Disease and the Mystery of Missing Heritability" was just published as the cover article of the Communications of the ACM. The article is intended both to introduce the field and to describe the rapid progress over the past decade in discovering a large number of variants involved in common human diseases. It assumes no background in biology and is designed to be accessible to researchers and students outside the field. I hope that it will encourage other computational researchers to get involved in genetics. The journal also made a video highlighting the article, which is available here:

Discovering Genes Involved in Disease and the Mystery of Missing Heritability from CACM on Vimeo.


Emrah Kostem’s talk about his research

Emrah Kostem, who graduated this year and is now at Illumina, gave a talk at our lab retreat this summer about the research he completed in the lab. It is available here and gives a good overview of our group's goals as well as details of the projects Emrah completed.

One of the topics he discusses is his recently published work on estimating heritability, which quantifies how much of the variance of a trait is accounted for by genetics. He describes how to partition heritability into the contributions of genomic regions (10.1016/j.ajhg.2013.03.010).

He also talks about work that exploits the insight that association statistics follow a multivariate normal distribution, and applies it to two problems. The first is selecting follow-up SNPs using the results of an association study (10.1534/genetics.111.128595). The second is speeding up eQTL studies with a two-stage approach in which only a fraction of the association tests are performed, yet virtually all of the significant associations are still discovered (10.1089/cmb.2013.0087).
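The multivariate normal insight can be illustrated with a small sketch: the statistic observed at one SNP predicts the distribution of the statistic at a correlated SNP, which is what lets a two-stage scheme skip tests that are very unlikely to be significant. This is a simplified illustration under stated assumptions (unit-variance statistics, correlation equal to the LD r between the SNPs, and the tested SNP carrying the signal), not the procedure from the cited papers; the function name, LD values, and the 1e-3 cutoff are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_significant_given_tested(z_tested: float, r: float, threshold: float) -> float:
    """Probability that an untested SNP's statistic exceeds `threshold` in
    absolute value, given the observed statistic of a correlated tested SNP.

    Assumes the two statistics are jointly normal with unit variances and
    correlation r (the LD between the SNPs), and that the tested SNP carries
    the signal, so that z_untested | z_tested ~ N(r * z_tested, 1 - r**2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    conditional = NormalDist(mu=r * z_tested, sigma=sqrt(1.0 - r * r))
    # Two-sided: upper tail above +threshold plus lower tail below -threshold.
    return (1.0 - conditional.cdf(threshold)) + conditional.cdf(-threshold)


# Hypothetical two-stage filter using a genome-wide threshold of p = 5e-8 (two-sided).
threshold = NormalDist().inv_cdf(1 - 5e-8 / 2)
tested_z = 6.0
ld_with_untested = [0.95, 0.6, 0.1]  # made-up LD values for illustration
probs = [prob_significant_given_tested(tested_z, r, threshold) for r in ld_with_untested]
# Only SNPs with a non-negligible chance of significance go to the second stage.
to_test = [r for r, p in zip(ld_with_untested, probs) if p > 1e-3]
```

SNPs in weak LD with the tested SNP have almost no chance of crossing the threshold, so a second stage can concentrate its tests on the strongly correlated ones.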



How much does part of a genome contribute to a trait?

Both genetic and environmental factors contribute to a trait, and the genetic factors that contribute to a trait are typically spread across the genome. Emrah Kostem in our group recently published a paper on estimating how much a specific genomic region (such as a single chromosome) contributes to a trait (10.1016/j.ajhg.2013.03.010) and released software for performing this analysis, called HEIDI, which is available online. This type of analysis is referred to as "partitioning heritability into the contributions of genomic regions."

Estimating the heritability of a trait, i.e., measuring the relative influence of nature versus nurture, has been a fundamental question in genetics. Traditionally, heritability was estimated from related individuals with known pedigrees, such as twins or family cohorts. With the availability of high-throughput genomic technologies, it has been shown that heritability estimates similar to the traditional ones can be obtained from genome-wide association study (GWAS) datasets of unrelated individuals (10.1038/ng.608). In these approaches, the genetic similarities, or kinships, among individuals are computed from the observed SNP genotypes rather than inferred from pedigree data.
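Computing kinships from SNPs can be sketched as follows. This uses one common standardized-genotype estimator, A = Z Zᵀ / M, where each SNP column is centered by twice its allele frequency and scaled by its binomial standard deviation; published methods differ in details such as the treatment of the diagonal. The function name is hypothetical and this is not HEIDI's code.

```python
import numpy as np


def genetic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Kinship matrix estimated from SNP genotypes (hypothetical helper).

    genotypes: n_individuals x n_snps array of allele counts (0, 1, 2).
    Each SNP is standardized by its sample allele frequency p as
    (x - 2p) / sqrt(2p(1 - p)), and the kinship matrix is the average
    outer product of the standardized genotypes, A = Z Z^T / M.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                # monomorphic SNPs have zero variance
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / z.shape[1]


# Usage on simulated unrelated individuals (made-up frequencies).
rng = np.random.default_rng(0)
freqs = rng.uniform(0.05, 0.5, size=1000)
G = rng.binomial(2, freqs, size=(200, 1000))
A = genetic_relationship_matrix(G)
```

For unrelated individuals the diagonal of A averages close to 1 and the off-diagonal entries scatter around 0; pedigree-based kinship would instead be read from the family structure.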

High-throughput SNP data also make it possible to estimate local genetic similarities, which have recently been used to partition the heritability of a trait into the contributions of genomic regions (10.1038/ng.823). A naive approach estimates these contributions with a linear mixed model (LMM) in which each region is modeled by a separate variance component.
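To make the naive approach concrete, the K-component model can be written as below. The notation is ours rather than the paper's: y is the centered phenotype vector, A_k is the kinship matrix computed from the SNPs in region k, σ_k² is that region's variance component, and h_k² is its contribution to heritability.

```latex
y = \sum_{k=1}^{K} g_k + \varepsilon, \qquad
g_k \sim \mathcal{N}\!\left(0, \sigma_k^2 A_k\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0, \sigma_e^2 I\right)

\operatorname{Var}(y) = \sum_{k=1}^{K} \sigma_k^2 A_k + \sigma_e^2 I, \qquad
h^2 = \frac{\sum_{k=1}^{K} \sigma_k^2}{\sum_{k=1}^{K} \sigma_k^2 + \sigma_e^2}, \qquad
h_k^2 = \frac{\sigma_k^2}{\sum_{l=1}^{K} \sigma_l^2 + \sigma_e^2}
```

Every additional region adds another variance component that must be fit jointly with all the others, which is why this formulation becomes expensive as the number of regions grows.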

We presented a method called HEIDI (Heritability Estimations Distributed) that improves the accuracy and computational efficiency of partitioning the heritability of a trait into the contributions of genomic regions. We show that the naive approach is not accurate for a large number of regions and does not scale beyond a few partitions per chromosome in a study of 5,000 individuals. We propose an alternative approach in which the heritability contribution of a region is obtained from a model that includes the region and its genetic complement, i.e., the rest of the genome. A two-component model is computationally efficient and fast to fit. It also makes the estimation easy to parallelize, because the model for each region can be fit independently on a separate computer.
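The region-plus-complement idea can be sketched on simulated data. As a lighter stand-in for LMM fitting, this sketch swaps in Haseman-Elston regression, a method-of-moments estimator, so it is not HEIDI's estimation procedure. It also shows one reason the complement is cheap under the standardized-genotype kinship assumed here: the complement's kinship can be obtained by subtracting the region's kinship from a genome-wide kinship computed once. All function names and simulation settings are hypothetical, and the simulated SNPs are independent (no LD).

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Center and scale each column (SNP) to mean 0, variance 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def complement_grm(a_total, m_total, a_region, m_region):
    """Kinship of the rest of the genome, assuming A = Z Z^T / M per SNP set."""
    return (m_total * a_total - m_region * a_region) / (m_total - m_region)


def he_regression_two_component(y, a_region, a_rest):
    """Haseman-Elston estimate of (h2_region, h2_rest).

    For a standardized phenotype and i != j, E[y_i y_j] =
    h2_region * A_region[i, j] + h2_rest * A_rest[i, j], so regressing the
    cross products on the two kinship entries (no intercept) estimates both.
    """
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)            # each unordered pair once
    cross = np.outer(y, y)[iu]
    design = np.column_stack([a_region[iu], a_rest[iu]])
    coef, *_ = np.linalg.lstsq(design, cross, rcond=None)
    return coef[0], coef[1]


def simulate(n=1200, m_region=200, m_rest=200, h2_region=0.2, h2_rest=0.3, seed=1):
    """Hypothetical simulation with independent SNPs and Gaussian effects."""
    rng = np.random.default_rng(seed)
    m_total = m_region + m_rest
    freqs = rng.uniform(0.05, 0.5, size=m_total)
    geno = standardize(rng.binomial(2, freqs, size=(n, m_total)).astype(float))
    z_region = geno[:, :m_region]

    def component(z, h2):
        g = z @ rng.normal(size=z.shape[1])
        return g * np.sqrt(h2 / g.var())         # fix the realized variance

    noise = rng.normal(size=n)
    noise *= np.sqrt((1 - h2_region - h2_rest) / noise.var())
    y = component(z_region, h2_region) + component(geno[:, m_region:], h2_rest) + noise

    a_total = geno @ geno.T / m_total            # genome-wide kinship, computed once
    a_region = z_region @ z_region.T / m_region
    a_rest = complement_grm(a_total, m_total, a_region, m_region)
    return y, a_region, a_rest


y, a_region, a_rest = simulate()
h2_region_hat, h2_rest_hat = he_regression_two_component(y, a_region, a_rest)
```

The estimates should land near the simulated values of 0.2 and 0.3. Because each region only needs its own kinship and the shared genome-wide one, separate regions can be handled by separate jobs.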

We show that the estimated heritability contributions are inflated when the region and its genetic complement contain SNPs in linkage disequilibrium (LD), and we introduce a normalization procedure to mitigate the effect of LD. We normalize the chromosome contributions so that they sum to the genome-wide heritability estimate, and within each chromosome we normalize the regions' contributions so that they sum to that chromosome's contribution.
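The two-level normalization can be sketched as below, assuming proportional rescaling at each level. That is one natural reading of the description above; the paper may differ in details. The function name, input layout, and numbers are hypothetical, and the sketch assumes nonnegative raw estimates.

```python
def normalize_contributions(h2_genome, chrom_raw, region_raw):
    """Rescale raw (possibly LD-inflated) estimates in two levels.

    h2_genome:  genome-wide heritability estimate.
    chrom_raw:  dict chromosome -> raw chromosome contribution.
    region_raw: dict chromosome -> list of raw region contributions.
    Chromosomes are scaled to sum to h2_genome; regions on each chromosome
    are then scaled to sum to that chromosome's normalized contribution.
    """
    chrom_total = sum(chrom_raw.values())
    if chrom_total <= 0:
        raise ValueError("raw chromosome contributions must sum to a positive value")
    chrom_norm = {c: h2_genome * v / chrom_total for c, v in chrom_raw.items()}

    region_norm = {}
    for c, regions in region_raw.items():
        total = sum(regions)
        if total <= 0:
            raise ValueError(f"raw region contributions on {c} must sum to a positive value")
        region_norm[c] = [chrom_norm[c] * r / total for r in regions]
    return chrom_norm, region_norm


# Made-up example: raw chromosome estimates sum to 0.6, above the genome-wide 0.5.
chrom, regions = normalize_contributions(
    0.5,
    {"1": 0.36, "2": 0.24},
    {"1": [0.25, 0.15], "2": [0.1, 0.1, 0.1]},
)
```

Here chromosome 1 is scaled from 0.36 to 0.3, and its two regions are then split in their original 0.25 : 0.15 ratio.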
