Using Relatedness to Identify Disease Genes


An example of an IBD graph. An IBD detection method provides the IBD information (table). We then build a graph in which vertices are individuals and edges are IBD relationships.

The standard approach for detecting genetic variants involved in disease is the association study, in which genetic information is collected from a set of individuals who have the disease and a set of healthy individuals. Any genetic variants that are more common among the individuals who have the disease, referred to as “associated variants”, may be involved in the disease.
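To make the comparison concrete, here is a minimal sketch of a single-variant association test. It assumes we already have allele counts for cases and controls at one variant and applies a standard chi-square test to the 2x2 table. The function name and the counts in the usage note are illustrative only and do not come from any study.

```python
import math

def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were the same in both groups.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `allelic_association(60, 140, 40, 160)` compares a variant seen on 60 of 200 case chromosomes with the same variant seen on 40 of 200 control chromosomes. In a genome-wide study this test is repeated at every variant, so the resulting p-values must be corrected for multiple testing.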

Our group has just published a paper on an alternative and complementary approach for identifying regions involved in disease from the same genetic data. The basic idea is to consider how individuals are related in different parts of their genomes and how this relatedness corresponds to their disease status. If a region is involved in disease, individuals who have the disease will likely have more similar DNA sequences in that region than individuals who do not have the disease. Identifying pairs of individuals with similar DNA sequences is called Identity By Descent (IBD) mapping, and there are several methods which can identify IBD relations efficiently (18971310), (21310274), (24207118).

The way our approach works is that in each region of the genome, we build an IBD graph based on which pairs of individuals are related. A vertex in the graph is an individual and an edge is an IBD relation, which implies that the two individuals have similar DNA sequences at that point. In our graph, individuals who have the disease are red squares (cases) and individuals who are healthy are green circles (controls). Following our intuition, if the region is involved in the disease, we expect more edges between pairs of case individuals than between pairs of control individuals. Our approach simply considers this difference and then applies a permutation test, in which the assignment of case and control status to the individuals is randomized, in order to obtain a significance level. Our approach is not the first to apply this idea; it follows the paper by Thompson and Browning (23733848). The advantage of our paper is that we use a technique called importance sampling to speed up the computation of the significance levels by orders of magnitude. The hope is that this type of approach may be more effective at identifying regions of the genome that are involved in disease through rare variants, which are difficult to detect in association studies.
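The graph statistic and the permutation step can be sketched in a few lines. This is a simplified illustration, not the method from the paper: it uses the raw difference in edge counts as the statistic and plain label permutation, whereas the paper replaces the permutation loop with importance sampling. All function and variable names here are hypothetical.

```python
import random

def edge_difference(edges, is_case):
    """Number of case-case IBD edges minus number of control-control edges."""
    case_case = sum(1 for u, v in edges if is_case[u] and is_case[v])
    control_control = sum(
        1 for u, v in edges if not is_case[u] and not is_case[v]
    )
    return case_case - control_control

def permutation_p_value(edges, is_case, n_permutations=10000, seed=0):
    """One-sided permutation p-value for an excess of case-case edges.

    edges   : list of (individual, individual) pairs that are IBD in a region
    is_case : dict mapping each individual to True (case) or False (control)
    """
    rng = random.Random(seed)
    individuals = sorted(is_case)
    labels = [is_case[i] for i in individuals]
    observed = edge_difference(edges, is_case)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle case/control status while keeping the graph fixed.
        rng.shuffle(labels)
        permuted = dict(zip(individuals, labels))
        if edge_difference(edges, permuted) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)
```

The smallest p-value this loop can report is about 1 / n_permutations, so reaching genome-wide significance levels by brute force needs a very large number of permutations in every region. That cost is what motivates a faster way of estimating the significance level.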

The full citation for the paper is:

Han, Buhm; Kang, Eun Yong ; Raychaudhuri, Soumya ; de Bakker, Paul I W; Eskin, Eleazar

Fast Pairwise IBD Association Testing in Genome-wide Association Studies. Journal Article

In: Bioinformatics, 2013, ISSN: 1367-4811.



Discovering Genetic Variation that Affects Expression in Multiple Tissues

Over the past several years, Genome Wide Association Studies (GWAS) have discovered hundreds of genetic variants involved in complex diseases (10.1056/NEJMra0905980). The vast majority of these variants do not lie in the protein coding regions of genes and thus do not affect what the gene produces, but instead likely affect how the genes are regulated. For this reason, the study of how genetic variation affects gene activity levels (referred to as expression levels) has been a major focus of research for many years. Genetic variants that affect gene expression are referred to as expression quantitative trait loci (eQTLs) (10.1038/nrg2969).

Several studies collect expression from multiple tissues which leads to the question of whether or not the same genetic variants affect expression in multiple tissues(10.1038/ng.2653).  Another way to ask this question is: Are eQTLs tissue specific or not tissue specific?

A challenge in this type of analysis is that an eQTL may affect expression in multiple tissues, but because of small sample sizes, the eQTL may only be detected in one of the tissues. Thus, traditional eQTL detection techniques are systematically biased against detecting eQTLs in multiple tissues.

Jae-Hoon Sul and Buhm Han in our group developed a method to address this issue which builds upon recent methods in random effects meta-analysis (10.1016/j.ajhg.2011.04.014), (10.1371/journal.pgen.1002555). To apply these methods, we first analyze each tissue separately and then use the meta-analysis method to combine the results from the tissues. Since our methods are specifically designed to handle “heterogeneity”, meaning that the effect size can be different in each study, our method is able to perform well when the effect is present in all of the tissues or in just some of them. More information about our meta-analysis research is here.
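As a rough illustration of the first step, analyzing each tissue separately, the sketch below fits an ordinary least-squares regression of expression on genotype in every tissue and keeps the effect size and its standard error. This is a simplification: the paper combines a mixed model with meta-analysis, while this example uses plain linear regression, and the function names and data layout are assumptions made for illustration.

```python
import math

def regress_expression_on_genotype(genotypes, expression):
    """Simple linear regression; returns (effect size, standard error).

    genotypes  : allele counts (0, 1 or 2) for one variant, one per individual
    expression : expression levels of one gene for the same individuals
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    beta = sxy / sxx
    intercept = mean_e - beta * mean_g
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((e - intercept - beta * g) ** 2
              for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se

def per_tissue_summaries(tissues):
    """tissues maps a tissue name to a (genotypes, expression) pair."""
    return {name: regress_expression_on_genotype(g, e)
            for name, (g, e) in tissues.items()}
```

The per-tissue effect sizes and standard errors are the inputs that a meta-analysis method then combines, treating each tissue the way a conventional meta-analysis treats each study.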

The full citation of our paper is here:

Sul, Jae Hoon; Han, Buhm ; Ye, Chun ; Choi, Ted ; Eskin, Eleazar

Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches. Journal Article

In: PLoS Genet, 9 (6), pp. e1003491, 2013, ISSN: 1553-7404.


Over the past few years, our group has published several papers on methods for eQTL analysis. Our other papers on eQTL analysis include:


Gamazon, Eric R; Segrè, Ayellet V; van de Bunt, Martijn; Wen, Xiaoquan; Xi, Hualin S; Hormozdiari, Farhad; Ongen, Halit; Konkashbaev, Anuar; Derks, Eske M; Aguet, François; Quan, Jie; Nicolae, Dan L; Eskin, Eleazar; Kellis, Manolis; Getz, Gad; McCarthy, Mark I; Dermitzakis, Emmanouil T; Cox, Nancy J; Ardlie, Kristin G

Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Journal Article

In: Nat Genet, 50 (7), pp. 956-967, 2018, ISSN: 1546-1718.



Duong, Dat; Gai, Lisa; Snir, Sagi; Kang, Eun Yong; Han, Buhm; Sul, Jae Hoon; Eskin, Eleazar

Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes. Journal Article

In: Bioinformatics, 33 (14), pp. i67-i74, 2017, ISSN: 1367-4811.



Duong, Dat ; Zou, Jennifer ; Hormozdiari, Farhad ; Sul, Jae Hoon ; Ernst, Jason ; Han, Buhm ; Eskin, Eleazar

Using genomic annotations increases statistical power to detect eGenes. Journal Article

In: Bioinformatics, 32 (12), pp. i156-i163, 2016, ISSN: 1367-4811.


Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar

Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article

In: Genetics, 2016, ISSN: 1943-2631.


Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar

Colocalization of GWAS and eQTL Signals Detects Target Genes. Journal Article

In: Am J Hum Genet, 2016, ISSN: 1537-6605.


Peterson, Christine B; Service, Susan K; Jasinska, Anna J; Gao, Fuying; Zelaya, Ivette; Teshiba, Terri M; Bearden, Carrie E; Cantor, Rita M; Reus, Victor I; Macaya, Gabriel; López-Jaramillo, Carlos; Bogomolov, Marina; Benjamini, Yoav; Eskin, Eleazar; Coppola, Giovanni; Freimer, Nelson B; Sabatti, Chiara

Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder. Journal Article

In: PLoS Genet, 12 (5), pp. e1006046, 2016, ISSN: 1553-7404.


Hasin-Brumshtein, Yehudit; Khan, Arshad H; Hormozdiari, Farhad; Pan, Calvin; Parks, Brian W; Petyuk, Vladislav A; Piehowski, Paul D; Brümmer, Anneke; Pellegrini, Matteo; Xiao, Xinshu; Eskin, Eleazar; Smith, Richard D; Lusis, Aldons J; Smith, Desmond J

Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. Journal Article

In: Elife, 5 , 2016, ISSN: 2050-084X.



Sul, Jae Hoon; Raj, Towfique; de Jong, Simone; de Bakker, Paul I W; Raychaudhuri, Soumya; Ophoff, Roel A; Stranger, Barbara E; Eskin, Eleazar; Han, Buhm

Accurate and Fast Multiple-Testing Correction in eQTL Studies. Journal Article

In: Am J Hum Genet, 96 (6), pp. 857-68, 2015, ISSN: 1537-6605.



Joo, Jong Wha J; Sul, Jae Hoon ; Han, Buhm ; Ye, Chun ; Eskin, Eleazar

Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Journal Article

In: Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914.



Kostem, Emrah; Eskin, Eleazar

Efficiently Identifying Significant Associations in Genome-Wide Association Studies. Conference

Research in Computational Molecular Biology, University of California Springer Berlin Heidelberg, 2013.


Kostem, Emrah; Eskin, Eleazar

Efficiently Identifying Significant Associations in Genome-wide Association Studies. Journal Article

In: J Comput Biol, 20 (10), pp. 817-30, 2013, ISSN: 1557-8666.



Kang, Eun Yong; Ye, Chun ; Shpitser, Ilya ; Eskin, Eleazar

Detecting the presence and absence of causal relationships between expression of yeast genes with very few samples. Journal Article

In: J Comput Biol, 17 (3), pp. 533-46, 2010, ISSN: 1557-8666.



Ye, Chun; Galbraith, Simon J; Liao, James C; Eskin, Eleazar

Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. Journal Article

In: PLoS Comput Biol, 5 (3), pp. e1000311, 2009, ISSN: 1553-7358.



Kang, Hyun Min; Ye, Chun ; Eskin, Eleazar

Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Journal Article

In: Genetics, 180 (4), pp. 1909-25, 2008, ISSN: 0016-6731.



Heterogeneity and Meta-Analysis


Visualizing heterogeneity in meta-analyses of GWAS. The left panel shows a forest plot, which gives the estimated effect size and standard error for each study. The right panel shows a PM-plot, which for each study plots the p-value on the y-axis and the m-value on the x-axis. M-values have the following interpretation: a small m-value (e.g. < 0.1) suggests that the study does not have an effect, and a large m-value (e.g. > 0.9) suggests that the study is predicted to have an effect. Otherwise the prediction is ambiguous.

Over the past couple of years, a major focus of our group has been on meta-analysis. These efforts have been led by Buhm Han who is a graduate of our group and now a post-doc at the Broad Institute.

Meta-analysis is a statistical method for combining the results of many studies. Its advantage is that the statistical power of the combined studies is much higher than the statistical power of any individual study. In fact, the majority of the recently identified genetic variants associated with complex diseases have been discovered using meta-analysis (10.1146/annurev-genom-091212-153520), since the effect sizes of most of these variants are too small to detect at the sample sizes of the individual studies.

Standard meta-analysis techniques assume what is referred to as the “fixed effect model” (FE). In the FE model, the effect size in each study is assumed to be the same. In the case of genetic association studies, this is an unrealistic assumption because the studies are often collected in very different populations which are subject to very different environmental conditions. An alternate model is the “random effects model” (RE), where the effect sizes are assumed to be different in each study and are modeled as being drawn from a distribution with an estimated mean and variance. This difference in effect sizes between studies is referred to as “heterogeneity.”
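The two models can be contrasted with a short sketch. It uses inverse-variance weighting for the FE estimate and the conventional DerSimonian-Laird moment estimator for the between-study variance in the RE estimate. These are the traditional textbook estimators, shown here as an assumption for illustration; they are not the new RE test developed in our group, and the function names are hypothetical.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance weighted estimate assuming one shared effect size."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def between_study_variance(betas, ses):
    """DerSimonian-Laird estimate of the variance of effect sizes (tau^2)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_fe, _ = fixed_effects(betas, ses)
    # Cochran's Q: weighted squared deviations from the FE estimate.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - (len(betas) - 1)) / c)

def random_effects(betas, ses):
    """Estimate of the mean effect when effect sizes vary across studies."""
    tau2 = between_study_variance(betas, ses)
    # Each study's variance is inflated by the between-study variance.
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)
```

When the studies agree, the estimated between-study variance is zero and the two functions return the same answer. When the studies disagree, the RE standard error grows, which reflects the extra uncertainty that heterogeneity introduces.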

Buhm Han, in our group, made two contributions related to heterogeneity in meta-analysis. In his first paper, he noticed that previous approaches for hypothesis testing using the RE model did not correctly model the null hypothesis, which led to a significant loss in power (10.1016/j.ajhg.2011.04.014). His second paper presented a method for helping interpret meta-analysis studies by identifying in which studies an effect is present and in which studies it is not (10.1371/journal.pgen.1002555). One aspect of the interpretation framework is the m-value, which can be used to identify the studies in which an effect is present. A summary of the heterogeneity of the meta-analysis can be visualized using a PM-plot (see figure).
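Reading m-values off a PM-plot amounts to a simple three-way rule. The sketch below encodes only the example thresholds from the figure caption (0.1 and 0.9); it assumes the m-values have already been computed by the method in the paper, which is not reproduced here, and the function name and study names are hypothetical.

```python
def interpret_m_value(m_value, low=0.1, high=0.9):
    """Classify one study by its m-value using the caption's thresholds."""
    if not 0.0 <= m_value <= 1.0:
        raise ValueError("an m-value must lie between 0 and 1")
    if m_value < low:
        return "no effect"
    if m_value > high:
        return "effect"
    return "ambiguous"

def summarize_studies(m_values):
    """m_values maps a study name to its m-value."""
    return {study: interpret_m_value(m) for study, m in m_values.items()}
```

For instance, `summarize_studies({"study1": 0.97, "study2": 0.04, "study3": 0.5})` labels the first study as having an effect, the second as having no effect, and the third as ambiguous.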

The methods are implemented in the software that Buhm developed, METASOFT, available at

The full citations to his papers are below:

Han, Buhm; Eskin, Eleazar

Interpreting meta-analyses of genome-wide association studies. Journal Article

In: PLoS Genet, 8 (3), pp. e1002555, 2012, ISSN: 1553-7404.


Han, Buhm; Eskin, Eleazar

Random-Effects Model Aimed at Discovering Associations in Meta-Analysis of Genome-wide Association Studies. Journal Article

In: Am J Hum Genet, 88 (5), pp. 586-98, 2011, ISSN: 1537-6605.
