Colocalization of GWAS and eQTL Signals Detects Target Genes

Farhad Hormozdiari recently developed a method for combining genome-wide association studies (GWASs) and expression quantitative trait loci (eQTL) studies in a statistical framework that quantifies the probability that each variant is causal while allowing an arbitrary number of causal variants. Together with collaborators at the University of Oxford and the Broad Institute of MIT and Harvard, we present this work in a paper in The American Journal of Human Genetics. Here, we describe eQTL and GWAS CAusal Variants Identification in Associated Regions (eCAVIAR). We apply our approach to datasets from several GWASs and eQTL studies in order to assess its accuracy and potential contributions to colocalization and fine-mapping.

Integrating GWASs and eQTL studies is a promising way to explore the mechanisms by which non-coding variants affect disease. Integration of GWAS and eQTL data is challenging for two reasons: the uncertainty induced by linkage disequilibrium (LD), which is the non-random association of alleles at different loci, and the presence of loci that harbor multiple causal variants (allelic heterogeneity). Current methods assume that each locus contains a single causal variant and expect loci to be independent and associated randomly.

eCAVIAR is a novel probabilistic model for integrating GWAS and eQTL data that extends the CAVIAR (Hormozdiari et al. 2014) framework to explicitly estimate the posterior probability of the same variant being causal in both GWAS and eQTL studies, while accounting for allelic heterogeneity and LD. Our approach can quantify the strength between a causal variant and its associated signals in both studies, and it can be used to colocalize variants that pass the genome-wide significance threshold in GWAS. For any given peak variant identified in GWAS, eCAVIAR considers a collection of variants around that peak variant as one single locus.
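As a rough illustration of the quantity described above, the sketch below computes a per-variant colocalization probability from two vectors of per-study posterior causal probabilities. It assumes the two studies are independent, so the probability that a variant is causal in both is the product of its two posteriors. The function names, the toy numbers, the window size (`flank`), and the 0.01 cutoff are illustrative assumptions. This is not the released eCAVIAR software or its interface, and it does not reproduce the CAVIAR step that derives the per-study posteriors from summary statistics and LD.

```python
from typing import List, Sequence


def locus_window(peak_index: int, n_variants: int, flank: int = 50) -> range:
    """Indices of the variants treated as one locus around a GWAS peak variant.

    `flank` (number of variants kept on each side) is an assumed value.
    """
    start = max(0, peak_index - flank)
    stop = min(n_variants, peak_index + flank + 1)
    return range(start, stop)


def colocalization_probabilities(
    gwas_post: Sequence[float], eqtl_post: Sequence[float]
) -> List[float]:
    """Per-variant probability of being causal in both studies.

    Assumes independence between studies, so the joint probability is the
    product of the two per-study posterior causal probabilities.
    """
    if len(gwas_post) != len(eqtl_post):
        raise ValueError("both studies must cover the same variants")
    return [g * e for g, e in zip(gwas_post, eqtl_post)]


def colocalized_variants(clpp: Sequence[float], threshold: float = 0.01) -> List[int]:
    """Indices of variants whose joint probability exceeds an assumed cutoff."""
    return [i for i, p in enumerate(clpp) if p > threshold]


# Toy per-variant posteriors for a four-variant locus (made-up numbers).
gwas_post = [0.05, 0.80, 0.10, 0.05]
eqtl_post = [0.02, 0.60, 0.30, 0.08]

clpp = colocalization_probabilities(gwas_post, eqtl_post)
hits = colocalized_variants(clpp)
print(hits)
```

In this toy locus the second variant carries most of the posterior mass in both studies, so it dominates the joint probability; a variant that is likely causal in only one study receives a small product and is not reported.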

We apply eCAVIAR to the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) dataset and the GTEx dataset to detect the target gene and most relevant tissue for each GWAS risk locus. When applied to two phenotypes from the MAGIC dataset, eCAVIAR identifies genetic variants that are causal in both eQTL and GWAS. Further, eCAVIAR detects a large number of loci where the GWAS causal variants are clearly distinct from the causal variants in the eQTL data. Interestingly, eCAVIAR also identifies genes that colocalize in one tissue yet can be excluded in others. For the majority of loci in which we identify a single variant causal for both GWAS and eQTL, eCAVIAR implicates more than one causal variant across the 45 tissues.

We observe that eCAVIAR outperforms existing methods across different levels of non-colocalization. Using simulated datasets, we compared the accuracy, precision, and recall rate of eCAVIAR with those of RTC (Nica et al. 2010) and COLOC (Giambartolomei et al. 2014), two current methods for eQTL and GWAS colocalization. Our results show that eCAVIAR has high confidence for selecting loci to be colocalized between the GWAS and eQTL data and is conservative in selecting a locus to be colocalized.
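For readers less familiar with the three metrics named above, here is a minimal sketch of how they are computed from counts of correct and incorrect colocalization calls in a simulation where the truth is known. The class name, the function names, and the counts are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # loci called colocalized that truly share a causal variant
    fp: int  # loci called colocalized that do not
    tn: int  # loci correctly called not colocalized
    fn: int  # truly colocalized loci that were missed


def accuracy(c: Confusion) -> float:
    """Fraction of all loci that were called correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: Confusion) -> float:
    """Fraction of colocalization calls that are correct (0.0 if no calls; a convention assumed here)."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: Confusion) -> float:
    """Fraction of truly colocalized loci that were found (0.0 if none exist; a convention assumed here)."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


# Hypothetical counts for a conservative method: few false positives, some misses.
counts = Confusion(tp=40, fp=2, tn=50, fn=8)
print(f"accuracy={accuracy(counts):.3f} "
      f"precision={precision(counts):.3f} "
      f"recall={recall(counts):.3f}")
```

A conservative method in this sense trades recall for precision: it makes fewer colocalization calls, but the calls it does make are more likely to be correct.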

We hope that future applications of eCAVIAR will advance identification of specific GWAS loci that share a causal variant with eQTL studies in a tissue, thus providing insight into presently unclear disease mechanisms.

Figure 2: Overview of eCAVIAR.

eCAVIAR was created by Farhad Hormozdiari, Ayellet V. Segrè, Martijn van de Bunt, Xiao Li, Jong Wha J Joo, Michael Bilow, Jae Hoon Sul, Bogdan Pasaniuc and Eleazar Eskin. The article is available at: http://www.cell.com/ajhg/abstract/S0002-9297(16)30439-6.

Visit the following page to download CAVIAR and eCAVIAR: http://genetics.cs.ucla.edu/caviar/

The full citation to our paper is:

Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar. Colocalization of GWAS and eQTL Signals Detects Target Genes. In: Am J Hum Genet, 2016, ISSN: 1537-6605.

Our paper builds upon a method introduced in a previous publication:

Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar. Identifying causal variants at loci with multiple signals of association. In: Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631.

ZarLab goes to Vancouver for ASHG!

serghei

Last week many members of our group traveled to Vancouver, British Columbia, for the annual meeting of the American Society of Human Genetics. The 66th Annual Meeting, which took place October 18-22, 2016, featured over 3000 talks, workshops, and poster presentations on topics such as bioinformatics and computational methods, developmental genetics and gene function, cancer and cardiovascular diseases, evolutionary and population genetics, and genetic counseling.

ZarLab contributed 8 poster presentations and one research talk. Serghei Mangul discussed his recent work on dumpster-diving techniques in a talk titled, “Comprehensive analysis of RNA-sequencing to find the source of every last read across 544 individuals from 53 tissues,” as part of the Interpreting the Transcriptome in Health and Disease symposium. You can view his slides here: https://sergheimangul.files.wordpress.com/2016/10/ashg2016_public.pdf

ZarLab in Vancouver!

Recent alumnus Farhad Hormozdiari received a Reviewers’ Choice ribbon for his poster titled, “Joint fine mapping of GWAS and eQTL detects target gene and relevant tissue.” Only the top 10% of posters by topic receive this honor, as determined by the reviewers’ scores of the submitted abstracts. Congratulations, Farhad!

Other posters presented by members of our group:

  • Prevalence of allelic heterogeneity in complex traits. Eleazar Eskin
  • Modeling the covariance of effect sizes in a meta-analysis. Dat Duong
  • Estimating regional heritability in the presence of linkage disequilibrium. Lisa Gai
  • Linear mixed models for quantitative traits in health-system scale data. Michael Bilow
  • Utilizing allele specific expression to identify cis-regulatory variants. Jennifer Zou
  • Haplotype-based predictors for complex trait association. Rob Brown
  • Repeat elements expression profile across different tissues in GTEx samples. Harry Yang

UCLA Launches CGSI with Inaugural Summer Programs

In 2015, Profs. Eleazar Eskin (UCLA), Eran Halperin (UCLA), John Novembre (The University of Chicago), and Ben Raphael (Brown University) created the Computational Genomics Summer Institute (CGSI). A collaboration with the Institute for Pure and Applied Mathematics (IPAM) led by Russ Caflisch, CGSI aims to develop a flexible program for improving education and enhancing collaboration in Bioinformatics research. In summer 2016, the inaugural program included a five-day short course (July 18-22) followed by a three-week long course (July 22 to August 12).

Over the past two decades, technological developments have substantially changed research in Bioinformatics. New DNA sequencing technologies can perform large-scale measurements of cellular states at lower cost and with less computing time. These improvements have revolutionized the potential application of genomic studies toward clinical research and development of novel diagnostic tools and treatments for human disease.

Modern genomic data collection creates an enormous need for mathematical and computational infrastructures capable of analyzing datasets that are increasingly larger in scale and resolution. This poses several unique challenges to researchers in Bioinformatics, an interdisciplinary field that cuts across traditional academic fields of math, statistics, computer science, and biology—and includes private-industry sequence technology developers. Innovation depends on seamless collaboration among scientists with different skill sets, communication styles, and institution-driven career goals. Therefore, impactful Bioinformatics research requires an original framework for doing science that bridges traditional discipline-based academic structures.

The summer 2016 courses combined formal research talks and tutorials with informal interaction and mentorship in order to facilitate exchange among international researchers. Participants in the short program attended five full days packed with lectures, tutorials, and journal clubs covering a variety of cutting-edge techniques. Senior trainees, including advanced graduate students and post-docs, received additional training by staying in residence for the long program. The extended program enabled these scientists to interact with leading researchers through a mix of structured training programs and flexible time for collaboration with fellow participants and other program faculty.

Collaboration on a wide variety of problem types and research themes facilitated cross-disciplinary communication and networking. During both courses, CGSI participants shared technical skills in coding and data analysis relevant to genetic and epigenetic imputation, fine-mapping of complex traits, linear mixed models, and Bayesian statistics in human, canine, mouse, and bacteria datasets. Scholars at different stages of their careers explored application of these methods, among others, to emerging themes such as cancer, neuropsychiatric disorders, evolutionary adaptation, early human origins, and data privacy.

CGSI instructors and participants established mentor-mentee relationships in computational genomics labs at UCLA, including the ZarLab and Bogdan Lab, while tackling practical problems and laying groundwork for future publications. In addition, participants developed camaraderie and professional connections while enjoying a full schedule of social activities, including dinners at classic Los Angeles area restaurants, volleyball tournaments in Santa Monica, bike rides along the beach, morning runs around the UCLA campus, and even an excursion to see a live production of “West Side Story” at the Hollywood Bowl.

CGSI organizers thank the National Institutes of Health grant GM112625, UCLA Clinical and Translational Science Institute grant UL1TR000124, and IPAM for making this unique program possible. We look forward to fostering more collaboration between mathematicians, computer scientists, biologists, and sequencing technology developers in both industry and academia with future CGSI programs.

Visit the CGSI website for an up-to-date archive of program videos, slides, papers, and more:
http://computationalgenomics.bioinformatics.ucla.edu/

Enrollment in 2017 CGSI programs opens this fall with a registration deadline of February 1.
