Fine Mapping Causal Variants and Allelic Heterogeneity

On Friday, April 28, 2017, in the CNSI Auditorium, Eleazar Eskin presented ZarLab’s research on fine mapping causal variants and allelic heterogeneity at the 2nd Annual Institute for Quantitative and Computational Biosciences (QCBio) Symposium.

Geneticists use genome-wide association studies (GWAS) to identify genetic variants that influence whether an individual exhibits a particular trait or disease. Typically, a GWAS identifies an association signal, which suggests that genetic variants within a region of the genome (known as a locus) are associated with the condition. The process of identifying the actual variant in the region that has an effect on the disease is referred to as “fine mapping.”

In addition to finding the actual variants affecting a disease, fine mapping also seeks to address questions about the genetic basis of disease. First, how many causal variants does a locus contain? A disease could be caused by a single variant or by multiple variants that independently affect disease status. We refer to the latter phenomenon as allelic heterogeneity (AH).

Second, when analyzing results from multiple GWASes, can the same causal variant identified in one study be assumed causal in other studies? A GWAS can identify many variants that are associated with two or more traits; however, this correlation can be induced by a confounding factor known as linkage disequilibrium. Colocalization methods seek to identify shared and distinct causal variants.

Farhad Hormozdiari, a recent alumnus of our group and a post-doc at Harvard University, developed several novel approaches for improving the accuracy and efficiency of fine mapping despite the presence of AH in the study population. Hormozdiari’s software packages, CAVIAR, CAVIAR-Genes, and eCAVIAR, quantify the probability that a variant is causal in GWAS and eQTL studies, while allowing for an arbitrary number of causal variants.
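To make the colocalization question concrete, here is a minimal sketch of one simple way to score it. It assumes we already have, for each variant in a locus, a posterior probability of being causal in a GWAS and in an eQTL study, and that the two studies are independent, so the probability that the same variant is causal in both is the product of the two. The function name and the toy probabilities are hypothetical and are not taken from the eCAVIAR software.

```python
def colocalization_scores(gwas_posterior, eqtl_posterior):
    """Per-variant probability of being causal in both studies.

    Assumes the two studies are independent, so the joint probability
    is the product of the two per-study posterior probabilities.
    """
    if len(gwas_posterior) != len(eqtl_posterior):
        raise ValueError("both studies must cover the same variants")
    return [g * e for g, e in zip(gwas_posterior, eqtl_posterior)]


# Hypothetical posterior probabilities for three variants in one locus.
toy_gwas = [0.7, 0.2, 0.1]
toy_eqtl = [0.1, 0.8, 0.1]
toy_scores = colocalization_scores(toy_gwas, toy_eqtl)
```

In this toy locus the strongest GWAS candidate (the first variant) is not the strongest eQTL candidate (the second), so no single variant receives a high shared score. That is the kind of pattern that suggests distinct causal variants linked by LD rather than one shared variant.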

In his presentation, Eskin summarizes the progress on these problems. A video of the presentation may be found on the QCBio website:

More details about our research in fine mapping are available in the following papers:

Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar

Colocalization of GWAS and eQTL Signals Detects Target Genes.

In: Am J Hum Genet, 2016, ISSN: 1537-6605.

Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun Y; Pasaniuc, Bogdan; Eskin, Eleazar

Identification of causal genes for complex traits.

In: Bioinformatics, 31 (12), pp. i206-i213, 2015, ISSN: 1367-4811.

Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar

Identifying causal variants at loci with multiple signals of association.

In: Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631.

Hormozdiari F, Zhu A, Kichaev G, Ju CJ, Segrè AV, Joo JW, Won H, Sankararaman S, Pasaniuc B, Shifman S, Eskin E. Widespread allelic heterogeneity in complex traits. The American Journal of Human Genetics. 2017 May 4;100(5):789-802.

Widespread Allelic Heterogeneity in Complex Traits

This week, our group published a paper in the American Journal of Human Genetics that presents a new computational method for improving the accuracy of genome-wide association studies. ZarLab alumnus Farhad Hormozdiari (PhD, 2016) developed the method, CAVIAR (CAusal Variants Identification in Associated Regions), a statistical framework that quantifies the probability that each variant is causal while allowing an arbitrary number of causal variants.

Genome-wide association studies (GWASs) identify genetic variants associated with diseases and traits. Recent successes in GWASs make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. A more comprehensive understanding of these aspects will guide the development of new methods for fine mapping and association mapping of complex traits—and the discovery of new biomarkers for disease diagnosis and treatment.

One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH). Allelic heterogeneity occurs when different mutations at the same locus affect the same phenotype. AH is very common in Mendelian traits, but we know little about the extent to which AH contributes to common, complex disease. Undetected AH could bias the results of an association study, leading to false positives.

Levels of Allelic Heterogeneity in eQTL Studies. For more information, see our paper.

In order to take AH into account while conducting a GWAS, we developed a computational method to infer the probability of AH. Our method quantifies the number of independent causal variants at a locus that can be responsible for the observed association signals detected in a GWAS. Our method is incorporated into the CAVIAR approach, and it is based on the principle of jointly analyzing association signals (i.e., summary level Z-scores) and LD structure in order to estimate the number of causal variants.
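The idea of jointly analyzing Z-scores and LD can be illustrated with a small sketch. This is not the CAVIAR implementation. It is a brute-force illustration under assumptions we state here: given a causal configuration c, the Z-scores follow N(0, Σ + σ² Σ diag(c) Σ), where Σ is the LD matrix; each variant is causal a priori with probability `prior_causal`; and all 2^m configurations are enumerated, which is only feasible for a handful of variants. The function names, parameter values, toy LD matrix, and toy Z-scores are all hypothetical.

```python
import itertools
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_mvn_zero_mean(z, cov):
    """Log density of z under a zero-mean multivariate normal N(0, cov)."""
    n = len(z)
    low = cholesky(cov)
    # Forward substitution: solve low * y = z, so z^T cov^{-1} z = y^T y.
    y = []
    for i in range(n):
        y.append((z[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in y) + log_det + n * math.log(2 * math.pi))


def count_posterior(z, ld, prior_causal=0.1, ncp_var=25.0):
    """Posterior probability of 0, 1, ..., m causal variants at a locus."""
    m = len(z)
    log_weight = {}
    for config in itertools.product([0, 1], repeat=m):
        # Covariance of the Z-scores under this configuration:
        # ld + ncp_var * ld * diag(config) * ld.
        cov = [[ld[i][j] + ncp_var * sum(ld[i][k] * config[k] * ld[k][j]
                                         for k in range(m))
                for j in range(m)] for i in range(m)]
        k = sum(config)
        log_prior = k * math.log(prior_causal) + (m - k) * math.log(1 - prior_causal)
        log_weight[config] = log_mvn_zero_mean(z, cov) + log_prior
    # Normalize in a numerically stable way, then sum by number of causal variants.
    top = max(log_weight.values())
    total = sum(math.exp(v - top) for v in log_weight.values())
    posterior = [0.0] * (m + 1)
    for config, v in log_weight.items():
        posterior[sum(config)] += math.exp(v - top) / total
    return posterior


# Hypothetical locus: variants 0 and 2 are uncorrelated and both strongly associated.
toy_ld = [[1.0, 0.2, 0.0],
          [0.2, 1.0, 0.2],
          [0.0, 0.2, 1.0]]
toy_z = [5.0, 0.2, 5.0]
toy_posterior = count_posterior(toy_z, toy_ld)
```

In this toy locus, two variants in low LD with each other both carry large Z-scores, so most of the posterior mass falls on two causal variants. A single causal variant cannot explain both signals through LD alone.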

Our results show that our method is more accurate than the standard conditional method (CM). We applied our novel method to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of the presence of AH. The proportion of all loci with identified AH is 4%–23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH, indicating that limited statistical power prevents identification of AH at other loci.

One of the main benefits of our method is that it requires only summary statistics. Summary statistics of a GWAS or eQTL study are widely available, so our method is applicable to most existing datasets. We have shown that AH is widespread and more common than previously estimated in complex traits, both in GWASs and eQTL studies.

Our results highlight the importance of accounting for the presence of multiple causal variants when characterizing the mechanism of genetic association in complex traits. Failing to account for AH can reduce the power to detect true causal variants and may explain the limited success of fine mapping of GWASs.

In a related study, researchers at the University of California, Irvine, and the University of Kansas identified an analogous signal in eQTLs from genetic sequencing of flies. King et al. (2014) observe that the vast majority of genes with eQTLs are more consistent with heterogeneity than with bi-allelism. Read more about this related study, “Genetic Dissection of the Drosophila melanogaster Female Head Transcriptome Reveals Widespread Allelic Heterogeneity.”

CAVIAR was created by Farhad Hormozdiari, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc and Eleazar Eskin. Software is freely available for download:

For more information, see our full paper, which can be accessed through AJHG.

The full citation of our paper:
Hormozdiari F, Zhu A, Kichaev G, Ju CJ, Segrè AV, Joo JW, Won H, Sankararaman S, Pasaniuc B, Shifman S, Eskin E. Widespread allelic heterogeneity in complex traits. The American Journal of Human Genetics. 2017 May 4;100(5):789-802.

Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes

Farhad Hormozdiari and Eleazar Eskin recently applied an extension of CAVIAR to assess signals of selection in individuals of European ancestry. CAVIAR is a probabilistic method for detecting a set of SNPs that contains all the causal variants in a locus with a predefined probability (e.g., 90% or 95%), while taking into account biases generated by linkage disequilibrium. Farhad, now a post-doctoral scholar at Boston University, developed CAVIAR while a PhD student at UCLA.
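As a rough illustration of what such a set looks like, the sketch below builds one greedily from a posterior distribution over causal configurations. It assumes that distribution has already been computed, and it is not the search used in the CAVIAR software. A configuration is a tuple of 0/1 flags, one per SNP; the function names and the toy probabilities are hypothetical.

```python
def coverage(selected, config_posterior):
    """Posterior probability that every causal SNP lies inside `selected`."""
    return sum(p for config, p in config_posterior.items()
               if all(i in selected for i, flag in enumerate(config) if flag))


def greedy_causal_set(config_posterior, rho=0.95):
    """Greedily add SNPs until the set covers all causal SNPs with probability >= rho."""
    m = len(next(iter(config_posterior)))
    selected = set()
    while coverage(selected, config_posterior) < rho and len(selected) < m:
        # Add the SNP that increases the coverage the most.
        best = max((i for i in range(m) if i not in selected),
                   key=lambda i: coverage(selected | {i}, config_posterior))
        selected.add(best)
    return sorted(selected)


# Hypothetical posterior over configurations of four SNPs (sums to 1).
toy_config_posterior = {
    (1, 0, 0, 0): 0.50,
    (0, 1, 0, 0): 0.05,
    (1, 0, 1, 0): 0.40,
    (0, 0, 0, 1): 0.03,
    (0, 0, 0, 0): 0.02,
}
toy_set_90 = greedy_causal_set(toy_config_posterior, rho=0.90)
```

Raising `rho` can only keep the set the same size or enlarge it, which is the trade-off between confidence and the number of SNPs that need follow-up.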

This project was led by Matthew T. Buckley and Fernando Racimo at the University of California, Berkeley, and Morten E. Allentoft at the University of Copenhagen. Alleles with strong selection signals have been recently selected for and are thought to carry an evolutionary advantage for individuals in the population. Identifying these alleles helps expand our understanding of the selective pressures that shaped historic populations.

Allele frequency changes across FADS region. For more information, see our full paper.

In order to analyze the selective processes in Europeans across space and time, the project compared sequencing data from FADS genes obtained from present-day and Bronze Age (5000 to 3000 years ago) Europeans. We focused on FADS genes because prior studies indicate they have been subject to strong positive selection in Africa, South Asia, Greenland, and Europe. FADS genes encode fatty acid desaturases that are important for the conversion of short-chain polyunsaturated fatty acids (PUFAs) to long-chain fatty acids. Selective pressure on the FADS genes may therefore be linked to dietary adaptations.

Other analyses conducted by the project show that alleles in the FADS2 gene display the strongest changes in allele frequency since the Bronze Age, and that this change is associated with expression changes and multiple lipid-related phenotypes. Farhad and Eleazar used CAVIAR to look for the presence of allelic heterogeneity, a phenomenon in which different mutations at the same locus affect the same phenotype. In an evolutionary context, its presence suggests that strong selective pressure likely acted upon the population.

Application of CAVIAR to genomic data from the 1000 Genomes Project and 54 Bronze Age Europeans revealed that specific causal variants within the FADS2 gene have been subjected to selective pressure. In particular, FADS2 shows evidence of allelic heterogeneity in three tissue types: transformed fibroblast cells (Pr(2 causal variants) = 0.72), left heart ventricle (Pr(2 causal variants) = 0.74), and whole blood (Pr(3 causal variants) = 0.74).

The project’s comparison of modern to Bronze Age European genomic data shows that selection has indeed acted strongly on the FADS gene cluster over the past 3000 years. The selective patterns observed in European data may be driven by a change in the dietary composition of fatty acids following the human transition from hunting-and-gathering to agriculture. As Europeans obtained more lipids from plants, rather than from fish and mammals, their genes adapted to optimize metabolism of these cereal-based lipids.

For more information, see our paper, which is available for download through Molecular Biology and Evolution:

The full citation to our paper is: 

Buckley, M.T., Racimo, F., Allentoft, M.E., Jensen, M.K., Jonsson, A., Huang, H., Hormozdiari, F., Sikora, M., Marnetto, D., Eskin, E. and Jørgensen, M.E., 2017. Selection in Europeans on fatty acid desaturases associated with dietary changes. Molecular biology and evolution.

This project used a method introduced in a previous publication: 

Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar

Identifying causal variants at loci with multiple signals of association.

In: Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631.

CAVIAR was created by Farhad Hormozdiari, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and Eleazar Eskin. Visit the following page to download CAVIAR and eCAVIAR: