Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq data

Analyses of expression quantitative trait loci (eQTL), genomic loci that contribute to variation in gene expression levels, are essential to understanding the mechanisms of human disease. These studies identify regulators of gene expression as either cis-acting factors that regulate nearby genes, or trans-acting factors that affect unlinked genes through various functions. Traditional eQTL studies treat expression as a quantitative trait and associate it with genetic variation. This approach has identified many loci involved in the genetic regulation of common, complex diseases.

Standard eQTL methods are limited in power and accuracy by several phenomena common to genomic datasets. First, the correlation structure of genetic variation in the genome, known as linkage disequilibrium (LD), limits the ability of these methods to differentiate between the regulatory variant and neighboring variants that are in LD. Second, like other quantitative traits, the total expression of a gene is influenced by multiple genetic and environmental factors. The effect size for any given variant is therefore small, and standard methods require a large sample size to identify the effect.


ASE example and corresponding mathematical representation of three individuals (1, 2, 3). We assume that the third SNP is the causal SNP site affecting the differential gene expression level (Allele A/Allele T).

Our forthcoming paper in Genetics presents a new method that improves the accuracy and computational power of eQTL mapping with incorporation of allele specific expression (ASE) analysis. Our novel method uses genome sequencing, alongside measurements of ASE from RNA-seq data, to identify cis-acting regulatory variants.

In standard eQTL studies, the analysis of ASE is influenced by LD structure and the amount of allelic heterogeneity present in the genome. Individual effects appear weak because the effect of a variant is modest when compared to the variance of total expression. In our approach, the genotypes of each individual with ASE provide information useful for determining the variants that cause the observed ASE. Our approach leverages the relationship between LD and variant identification to map the variants affecting expression. Thus, analysis of ASE is advantageous over analysis of total expression levels, the standard approach to eQTL mapping.

We demonstrate the utility of our method by analyzing RNA-seq data from 77 unrelated northern and western European individuals (CEU). To map each gene, we simultaneously compare ASE measurements across a set of sequenced individuals. We then identify genetic variants that are in proximity to those genes and capable of explaining observed patterns of ASE. Here, we characterize the efficacy of this method with a measure termed the “reduction rate”: the fraction of SNPs in the proximal region of the gene that are eliminated from the set of candidate regulatory SNPs.
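The mapping step described above can be illustrated with a simple consistency filter. The sketch below is a simplification and not the published implementation: it assumes that a cis-acting regulatory SNP should be heterozygous in individuals who show ASE and homozygous in individuals who do not, and it keeps every SNP whose genotype pattern disagrees with the ASE pattern in at most `max_errors` individuals. The function name, the genotype encoding (0, 1, or 2 copies of the alternate allele), and the toy data are all hypothetical.

```python
def find_candidate_snps(genotypes, has_ase, max_errors=0):
    """Return indices of SNPs consistent with the observed ASE pattern.

    genotypes[i][j] is the alternate-allele count (0, 1, or 2) of
    individual i at SNP j; has_ase[i] is True if individual i shows ASE.
    Assumption: a causal cis SNP is heterozygous exactly when ASE is seen.
    """
    n_snps = len(genotypes[0])
    candidates = []
    for j in range(n_snps):
        # Count individuals whose heterozygosity disagrees with their ASE status.
        mismatches = sum(
            (row[j] == 1) != ase for row, ase in zip(genotypes, has_ase)
        )
        if mismatches <= max_errors:
            candidates.append(j)
    return candidates


# Toy data: three individuals, four SNPs; the third SNP (index 2) is causal.
genotypes = [
    [1, 0, 1, 1],  # individual 1, shows ASE
    [0, 1, 1, 0],  # individual 2, shows ASE
    [1, 1, 0, 0],  # individual 3, no ASE
]
has_ase = [True, True, False]

print(find_candidate_snps(genotypes, has_ase, max_errors=0))  # [2]
print(find_candidate_snps(genotypes, has_ase, max_errors=1))  # [2, 3]
```

Allowing one error admits a second SNP in this toy example, which mirrors the trade-off between tolerance to measurement error and the size of the candidate set.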

When applied to the CEU dataset, our method reduced the set of candidate SNPs from ten to two (a reduction rate of 80%). Allowing for one error increases the number of candidate SNPs to five and decreases the reduction rate to 50%. We also observe that the relationship between LD and variant identification plays a different role in ASE mapping than in eQTL studies, and yields different types of information useful to eQTL mapping studies.
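The percentages above follow if the reduction rate is read as the fraction of proximal SNPs that are eliminated as candidates. The short calculation below makes that arithmetic explicit; the function name is hypothetical and the formula is inferred from the numbers quoted in this post.

```python
def reduction_rate(n_total, n_candidates):
    """Fraction of SNPs near the gene that were ruled out as candidates."""
    return (n_total - n_candidates) / n_total


print(f"{reduction_rate(10, 2):.0%}")  # 80%
print(f"{reduction_rate(10, 5):.0%}")  # 50%
```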

ASE studies are a powerful approach to identifying associations between genetic variation and gene expression. Accurate measurement of ASE can identify cis-acting regulatory variants associated with common diseases. Our novel method for ASE mapping is based on a robust and computationally efficient non-parametric approach, and we hope it advances our understanding of functional risk alleles and facilitates development of new hypotheses for the causes and treatment of common diseases.

This project used software developed by Jennifer Zou, which is available for download at:

This project was led by Eun Yong Kang and involved Serghei Mangul, Buhm Han, and Sagiv Shifman. The article is available at:

The full citation to our paper is:

Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar

Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article

In: Genetics, 2016, ISSN: 1943-2631.


ForestPMPlot: A Flexible Tool for Visualizing Heterogeneity Between Studies in Meta-analysis

Our group recently published a paper in G3 that presents a new method for interpreting meta-analysis of genomic studies. Our software, called ForestPMPlot, is a free, open-source R package with a Python interface, available for download from ZarLab Software. In our article, we demonstrate how ForestPMPlot facilitates interpretation of meta-analysis results by producing a plot that visualizes the heterogeneous genetic effects on the phenotype in different study conditions. We show an example analysis where our visualization framework leads to plausible interpretations of gene-by-environment interaction and multiple tissue eQTL, which would not have been straightforward with the traditional framework.

Meta-analysis has become a popular tool for increasing power in genetic association studies, yet it remains a methodological challenge. Genetic association studies can differ from each other in terms of environmental conditions, study design, population types and sizes, statistical noise, and analytical use of covariates. These factors produce different effect sizes between studies, a phenomenon called between-study heterogeneity. Correctly interpreting and accounting for heterogeneity in genetic association studies would give us a more accurate model of the true effects genetic variants have on traits under specific conditions.

Compared to traditional forest plotting techniques, ForestPMPlot visualizes more of the information needed to interpret meta-analysis results. Specifically, our tool helps visualize differences in the effect sizes of genetic association studies and clarifies why such studies exhibit heterogeneity for a particular phenotype and locus pair under different conditions. To distinguish studies with an effect from studies without an effect, we use the m-value framework. The m-value (Han and Eskin 2012; Kang et al. 2014) is the posterior probability that the effect exists in each study. In our paper, we explain how to compute an m-value and propose using the PM-plot framework (Han and Eskin 2012) to plot the P-values and m-values of each study together. The PM-Plot visualizes the relationship between m-values and P-values in a two-dimensional space, allowing a researcher to easily distinguish which studies are predicted to have an effect and which are predicted not to have an effect.
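To make the idea of a per-study posterior probability concrete, here is a minimal sketch of an m-value calculation by exhaustive enumeration. It is not the ForestPMPlot code. It assumes that each study either has no effect or shares a single common effect, that the common effect has a normal prior with mean zero and standard deviation `prior_sd` (0.4 here is an illustrative choice), and that the number of studies with an effect has a uniform prior. All function names are hypothetical.

```python
import math
from itertools import product


def normal_pdf(x, var):
    """Density at x of a normal distribution with mean 0 and variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def shared_effect_likelihood(betas, variances, prior_var):
    """Likelihood of studies sharing one effect, with the effect integrated out."""
    n = len(betas)
    if n == 0:
        return 1.0
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean = sum(w * b for w, b in zip(weights, betas)) / total_w
    spread = sum(w * b * b for w, b in zip(weights, betas)) - total_w * mean * mean
    scale = ((2.0 * math.pi) ** (-(n - 1) / 2.0)
             * math.sqrt(math.prod(weights) / total_w)
             * math.exp(-0.5 * spread))
    return scale * normal_pdf(mean, 1.0 / total_w + prior_var)


def m_values(betas, std_errs, prior_sd=0.4):
    """Posterior probability, per study, that the effect exists in that study."""
    n = len(betas)
    variances = [s * s for s in std_errs]
    with_effect = [0.0] * n
    total = 0.0
    # Enumerate every assignment of studies to "effect" (1) or "no effect" (0).
    for config in product((0, 1), repeat=n):
        k = sum(config)
        # Uniform prior on the fraction of studies with an effect.
        prior = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)
        effect_idx = [i for i in range(n) if config[i] == 1]
        likelihood = shared_effect_likelihood(
            [betas[i] for i in effect_idx],
            [variances[i] for i in effect_idx],
            prior_sd ** 2,
        )
        for i in range(n):
            if config[i] == 0:
                likelihood *= normal_pdf(betas[i], variances[i])
        weight = prior * likelihood
        total += weight
        for i in effect_idx:
            with_effect[i] += weight
    return [w / total for w in with_effect]


# Two studies with a clear effect and one with an estimate of zero.
print(m_values([0.5, 0.45, 0.0], [0.1, 0.1, 0.1]))
```

In this toy input the first two studies receive m-values close to 1 and the third receives an m-value close to 0. Exhaustive enumeration visits 2^n configurations, so this sketch is only practical for a small number of studies.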

We applied ForestPMPlot to a GWAS meta-analysis of 17 HDL mouse studies that have different environmental conditions, such as diet (e.g., high fat/low fat), and genetic knockouts, including homozygous deficiency in leptin receptor (db/db), LDL receptor knockouts, and Apoe gene knockouts. Here, we observe that the confidence intervals of two effect estimates overlap each other when only the effect size estimates are considered in forest plot format. From the forest plot alone, it is therefore ambiguous whether the observed heterogeneity is a result of stochastic errors. However, in the PM-Plot, we observe that the posterior probabilities are well segregated for these two studies (m-value: 0.93 vs. 0.03), allowing us to hypothesize that the SNP effects on HDL in these strains under the Western diet condition may interact with sex.


Seventeen mouse HDL studies with various environmental/genetic conditions are combined in this meta-analysis. (A) Forest plot and (B) PM-plot for rs32595861 locus (Fabp3 gene) analyzing data from the Kang et al. (2014) study.


We continue to develop new applications for ForestPMPlot, and we hope that our tool will facilitate more accurate interpretations of meta-analysis in future genetic association research.

ForestPMPlot was developed by Eun Yong Kang and Yurang Park. The article is available at:

Visit the following page to download ForestPMPlot:

The full citation to our paper is: 

Kang, Eun Yong; Park, Yurang; Li, Xiao; Segrè, Ayellet V; Han, Buhm; Eskin, Eleazar

ForestPMPlot: A Flexible Tool for Visualizing Heterogeneity between Studies in Meta-analysis. Journal Article

In: G3 (Bethesda), 6 (7), pp. 1793-8, 2016, ISSN: 2160-1836.


This paper describes methods implemented based on research originally published by this group: 

Han, Buhm; Eskin, Eleazar

Interpreting meta-analyses of genome-wide association studies. Journal Article

In: PLoS Genet, 8 (3), pp. e1002555, 2012, ISSN: 1553-7404.


Han, Buhm; Eskin, Eleazar

Random-Effects Model Aimed at Discovering Associations in Meta-Analysis of Genome-wide Association Studies. Journal Article

In: Am J Hum Genet, 88 (5), pp. 586-98, 2011, ISSN: 1537-6605.


We discussed these methods and papers in a 2013 blog post:

Genetic and Environmental Control of Host-Gut Microbiota Interactions

Studies carried out over the last decade have revealed that gut microbiota contribute to a variety of common disorders, including obesity and diabetes (Musso et al. 2011), colitis (Devkota et al. 2012), atherosclerosis (Wang et al. 2011), rheumatoid arthritis (Vaahtovuo et al. 2008), and cancer (Yoshimoto et al. 2013). The evidence for metabolic interactions is particularly strong, as a large body of data now supports the conclusion that gut microbiota influence the energy harvest from dietary components, particularly complex carbohydrates, and that metabolites such as the short chain fatty acids produced by gut bacteria can perturb metabolic traits, including adiposity and insulin resistance (Turnbaugh et al. 2006; Backhed et al. 2007; Wen et al. 2008; Turnbaugh et al. 2009; Ridaura et al. 2013).

Gut microbiota communities are assembled each generation, influenced by maternal seeding, environmental factors, host genetics and age, resulting in substantial variations in composition among individuals in human populations (Eckburg et al. 2005; Costello et al. 2009; Huttenhower and Consortium 2012; Goodrich et al. 2014). Most experimental studies of host-gut microbiota interactions have employed large perturbations, such as comparisons of germ-free versus conventional mice, and the significance of common variations in gut microbiota composition for disease susceptibility is still poorly understood. Furthermore, while studies with germ-free mice have clearly implicated microbiota in clinically relevant traits, it has proven difficult to identify the responsible taxa of bacteria.

We now report a population-based analysis of host-gut microbiota interactions in the mouse. One of the issues we explore is the role of host genetics. Although some evidence is consistent with significant heritability of gut microbiota composition, the extent to which the host controls microbiota composition under controlled environmental conditions is unclear. We also examine the role of common variations in gut microbiota in metabolic traits such as obesity and insulin resistance. We performed our study using a resource termed the Hybrid Mouse Diversity Panel (HMDP), consisting of about 100 inbred strains of mice that have been either sequenced or subjected to high density genotyping (Bennett et al. 2010). The resource has several advantages for genetic analysis as compared to traditional genetic crosses. First, it allows high resolution mapping by association rather than linkage analysis, and it has now been used for the identification of a number of novel genes underlying complex traits (Farber et al. 2011; Lavinsky et al. 2015; Parks et al. 2015; Rau et al. 2015). Second, since the strains are permanent, the data from separate studies can be integrated, allowing the development of large, publicly available databases of physiological and molecular traits relevant to a variety of clinical disorders. Third, the panel is ideal for examining gene-by-environment interactions, since it is possible to examine individuals of a particular genotype under a variety of conditions (Orozco et al. 2012; Parks et al. 2013).

Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that in a controlled environment the genetic background accounts for a significant fraction of the abundance of most common microbiota. The mice were previously studied for response to a high fat, high sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, AxB19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.

In our study, we concluded:

– Across a total of 599 mice, the same 17 genera were abundant in 75% of the mice

– These 17 genera accounted for 68% of reads

– Consistent with previous studies, changing diet drastically changes gut microbiota composition, and these shifts are strongly dependent on the genetic background of the mice

– Gut microbiota contribute to dietary responsiveness

– Several gut microbiota (known and novel to this study) contribute to obesity and metabolic phenotypes

– Seven genome-wide significant loci (P < 4 x 10^-6) were found to be associated with common genera

– We were able to estimate the heritability by using a linear mixed model approach and assuming an additive effect, based on the proportion of phenotype variance accounted for by genetic relationships among the strains.
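The heritability estimate in the last point of the list above can be written out under a standard variance-component formulation of a linear mixed model. The notation here is assumed for illustration and is not taken from the paper: y is the vector of abundances of one genus across mice, X holds fixed-effect covariates, K is a genetic relationship (kinship) matrix computed from SNP genotypes, and the two variance parameters are the additive genetic and residual variances.

```latex
\begin{align}
y &= X\beta + u + e, \\
u &\sim \mathcal{N}\left(0,\ \sigma_g^2 K\right), \qquad
e \sim \mathcal{N}\left(0,\ \sigma_e^2 I\right), \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{align}
```

Under these assumptions, h^2 is the proportion of phenotypic variance explained by the genetic relationships among the strains, which is the quantity described in the list.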

We began our study with the hypothesis that the dietary response was dictated in part by differences in gut microbiota. We showed that different inbred strains of mice differ strikingly in the composition of gut microbiota and provided evidence that the variation is determined in part by the host genetic background. Consistent with our hypothesis, we showed that cross-fostering between two strains of mice affected dietary response to the high fat, high sucrose diet. By correlating microbiota composition with dietary response among the HMDP inbred strains, we were able to identify several candidate microbiota influencing dietary response.

For all the details of our research and our methods, read our paper:

Org, Elin; Parks, Brian W W; Joo, Jong Wha J; Emert, Benjamin; Schwartzman, William; Kang, Eun Yong; Mehrabian, Margarete; Pan, Calvin; Knight, Rob; Gunsalus, Robert; Drake, Thomas A; Eskin, Eleazar; Lusis, Aldons J

Genetic and environmental control of host-gut microbiota interactions. Journal Article

In: Genome Res, 2015, ISSN: 1549-5469.
