Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq data

Analyses of expression quantitative trait loci (eQTL), genomic loci that contribute to variation in gene expression levels, are essential to understanding the mechanisms of human disease. These studies identify regulators of gene expression as either cis-acting factors that regulate nearby genes, or trans-acting factors that affect unlinked genes through various mechanisms. Traditional eQTL studies treat expression as a quantitative trait and associate it with genetic variation. This approach has identified many loci involved in the genetic regulation of common, complex diseases.

Standard eQTL methods are limited in power and accuracy by several phenomena common to genomic datasets. First, the correlation structure of genetic variation in the genome, known as linkage disequilibrium (LD), limits the ability of these methods to differentiate between the regulatory variant and neighboring variants that are in LD. Second, like other quantitative traits, the total expression of a gene is influenced by multiple genetic and environmental factors. The effect size for any given variant is therefore small, and standard methods require a large sample size to identify the effect.
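To make the LD problem concrete, here is a minimal sketch of the usual r^2 measure of LD between two biallelic SNPs. The function name ld_r2 and the toy haplotypes are illustrative assumptions, not data or code from the paper. When r^2 equals 1, the two SNPs carry identical information across individuals, so an association test on total expression cannot tell which one is regulatory.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a and hap_b are equal-length lists of 0/1 alleles, one entry per
    phased haplotype. The inputs used below are toy values, not real data.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # allele-1 frequency at SNP A
    p_b = sum(hap_b) / n                                  # allele-1 frequency at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical haplotypes for three SNPs across eight chromosomes.
snp_causal = [0, 0, 1, 1, 0, 1, 0, 1]
snp_tag = [0, 0, 1, 1, 0, 1, 0, 1]   # identical pattern: perfect LD with snp_causal
snp_far = [0, 1, 0, 1, 0, 1, 1, 0]   # statistically independent of snp_causal

print(ld_r2(snp_causal, snp_tag))
print(ld_r2(snp_causal, snp_far))
```

In this toy example the first pair is in perfect LD and the second pair is unlinked, which is the contrast that standard eQTL methods struggle with when the tagging SNP sits next to the true regulatory variant.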

Figure: ASE example and corresponding mathematical representation of three individuals (1, 2, 3). We assume that the third SNP is the causal SNP site affecting the differential gene expression level (Allele A/Allele T).

Our forthcoming paper in Genetics presents a new method that improves the accuracy and computational power of eQTL mapping by incorporating allele specific expression (ASE) analysis. Our novel method uses genome sequencing, alongside measurements of ASE from RNA-seq data, to identify cis-acting regulatory variants.
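This post does not describe how ASE is measured from RNA-seq reads. One common approach, assumed here purely for illustration and not necessarily the one used in the paper, is to count the reads carrying each allele at a heterozygous site and test whether the split departs from 50/50. The function name ase_p_value and the read counts below are hypothetical.

```python
from math import comb


def ase_p_value(ref_reads, alt_reads):
    """Two-sided exact binomial test against a 50/50 allelic ratio.

    ref_reads and alt_reads are the RNA-seq read counts carrying each
    allele at one heterozygous site in one individual.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads overlap the heterozygous site")
    observed = comb(n, ref_reads)
    # Under p = 0.5 every outcome k has probability comb(n, k) / 2**n, so
    # outcomes "at most as likely as the observed one" can be found by
    # comparing the integer binomial coefficients directly.
    tail = sum(c for c in (comb(n, k) for k in range(n + 1)) if c <= observed)
    return tail / 2 ** n


# Hypothetical read counts at a heterozygous site in two individuals.
print(ase_p_value(10, 10))  # balanced expression of the two alleles
print(ase_p_value(20, 0))   # only one allele is expressed
```

A small p-value suggests that the two copies of the gene are expressed at different levels in that individual, which is the per-individual signal that ASE mapping builds on.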

In standard eQTL studies, the analysis of ASE is influenced by LD structure and the amount of allelic heterogeneity present in the genome. Individual effects appear weak since the effect of a variant is modest when compared to the variance of total expression. In our approach, the genotypes of each individual with ASE provide information useful for determining which variants cause the observed ASE. Our approach leverages the relationship between LD and variant identification to map the variants affecting expression. Thus, analysis of ASE is advantageous over analysis of total expression levels, the standard approach to eQTL mapping.

We demonstrate the utility of our method by analyzing RNA-seq data from 77 unrelated northern and western European individuals (CEU). To map each gene, we simultaneously compare ASE measurements across a set of sequenced individuals. We then identify genetic variants that are in proximity to those genes and capable of explaining observed patterns of ASE. Here, we characterize the efficacy of this method with a measure termed the “reduction rate”: the fraction of SNPs in the proximal region of the gene that are excluded from the set of candidate regulatory SNPs.

When applied to the CEU dataset, our method reduced the set of candidate SNPs from ten to two (a reduction rate of 80%). Allowing for one error increases the number of candidate SNPs to five and decreases the reduction rate to 50%. We also observe that LD affects variant identification differently in ASE mapping than in standard eQTL studies, and that ASE mapping therefore yields different types of information useful to eQTL mapping studies.
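The filtering idea and the reduction rate can be sketched in a few lines. The sketch assumes a simplified consistency rule that is not spelled out in this post: a cis-regulatory SNP should be heterozygous in exactly those individuals who show ASE, and each individual who disagrees counts as one error. The function names, the rule, and the toy genotypes are illustrative assumptions rather than the published implementation.

```python
def candidate_snps(is_het, has_ase, max_errors=0):
    """Return indices of SNPs consistent with the observed ASE pattern.

    is_het[i][j] is True when individual i is heterozygous at SNP j.
    has_ase[i] is True when individual i shows ASE for the gene.
    Assumed rule: a candidate SNP is heterozygous exactly in the
    individuals with ASE; every disagreement counts as one error.
    """
    n_snps = len(is_het[0])
    kept = []
    for j in range(n_snps):
        errors = sum(row[j] != ase for row, ase in zip(is_het, has_ase))
        if errors <= max_errors:
            kept.append(j)
    return kept


def reduction_rate(n_candidates, n_total):
    """Fraction of proximal SNPs excluded from the candidate set."""
    return 1 - n_candidates / n_total


# Toy example (made up): three individuals, four SNPs near one gene.
has_ase = [True, False, True]
is_het = [
    [True, False, True, True],     # individual 1
    [True, False, False, False],   # individual 2
    [False, False, True, False],   # individual 3
]

strict = candidate_snps(is_het, has_ase)                   # only the third SNP (index 2)
relaxed = candidate_snps(is_het, has_ase, max_errors=1)    # indices 2 and 3
print(strict, reduction_rate(len(strict), 4))
print(relaxed, reduction_rate(len(relaxed), 4))

# The figures quoted above: ten SNPs reduced to two, or to five with one error.
print(reduction_rate(2, 10), reduction_rate(5, 10))
```

Allowing errors trades specificity for robustness: a stricter rule removes more SNPs, while a tolerance of one mismatch protects against a single miscalled genotype or ASE measurement at the cost of a larger candidate set.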

ASE studies are a powerful approach to identifying associations between genetic variation and gene expression. Accurate measurement of ASE can identify cis-acting regulatory variants associated with common diseases. Our novel method for ASE mapping is based on a robust and computationally efficient non-parametric approach, and we hope it advances our understanding of functional risk alleles and facilitates development of new hypotheses for the causes and treatment of common diseases.

This project used software developed by Jennifer Zou, which is available for download at: http://genetics.cs.ucla.edu/ase/

This project was led by Eun Yong Kang and involved Serghei Mangul, Buhm Han, and Sagiv Shifman. The article is available at: http://www.genetics.org/content/204/3/1057

The full citation to our paper is:

Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar

Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article

In: Genetics, 2016, ISSN: 1943-2631.

Review Article: The Hybrid Mouse Diversity Panel

This year, we published a review of studies on the Hybrid Mouse Diversity Panel (HMDP) dataset, a project led by Aldons J. Lusis (David Geffen School of Medicine at UCLA). Our paper in the Journal of Lipid Research describes the dataset, summarizes discoveries it has facilitated so far, and explains how researchers can use correlation, genetic mapping, and statistical modeling methods with HMDP data to address cardiometabolic questions.

The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While the HMDP is not nearly as powerful as human genome-wide association studies for mapping genetic loci that contribute to traits, it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated.

Since its development in 2010, studies using the HMDP have validated over a dozen novel genes underlying complex traits. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of mice subjected to various environmental conditions. These analyses have identified many novel genes and significant loci associated with disease risk relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, and fatty liver disease.

The HMDP has substantial potential to advance interdisciplinary research on genetics and computational biology. In order to make HMDP and associated methods accessible to cardiometabolic researchers, our paper includes a glossary of genetics terms and an outline of how the database can be interrogated to address certain questions using correlation, genetic mapping, and statistical modeling.

All of the published data are available and can be readily used to formulate hypotheses about genes, pathways, and interactions. For more information about HMDP, read our article: https://www.ncbi.nlm.nih.gov/pubmed/27099397

The full citation to our paper is:

Lusis, Aldons J; Seldin, Marcus; Allayee, Hooman; Bennett, Brian J; Civelek, Mete; Davis, Richard C; Eskin, Eleazar; Farber, Charles; Hui, Simon T; Mehrabian, Margarete; Norheim, Frode; Pan, Calvin; Parks, Brian; Rau, Christoph; Smith, Desmond J; Vallim, Thomas; Wang, Yibin; Wang, Jessica

The Hybrid Mouse Diversity Panel: A Resource for Systems Genetics Analyses of Metabolic and Cardiovascular Traits. Journal Article

In: J Lipid Res, 2016, ISSN: 1539-7262.

Figure: Hypothetical examples of how information from the HMDP can be utilized to explore relationships between genes (A) and traits (B) of interest. Read our paper for more information on methods for exploring their relationships with multiple layers of information.

Chromosome conformation elucidates regulatory relationships in developing human brain

Farhad Hormozdiari, a recent ZarLab alumnus, contributed to a paper published this week in Nature. The paper reports new findings on genetic factors related to human cognition and neurodevelopmental disorders, the result of a collaboration with UCLA’s David Geffen School of Medicine and the School of Biotechnology and Biomolecular Sciences at University of New South Wales. Farhad implemented the software package CAVIAR, which was used to identify causal variants and to aid interpretation of the data.

Neurodevelopmental disorders such as autism and schizophrenia are thought to originate during embryonic development of the cerebral cortex. The project focused on genome-wide chromatin contacts: the 3D physical interactions between regions of chromatin, the complex of DNA and proteins that packages DNA into chromosomes within a cell’s nucleus and influences cell replication. Chromatin contacts regulate gene expression in specific tissues, and mapping them provides important biological insights into the malfunctioning gene regulatory mechanisms that drive these disorders.

The project generated high-resolution 3D maps of chromatin contacts active during development of the cortex region of the human brain. These maps enabled a large-scale annotation of previously uncharacterized regulatory mechanisms tied to the evolution of human cognition and disease. Using these data, the paper identified hundreds of genes involved with human cognitive function. Next, the paper integrated chromatin contacts with noncoding variants previously identified in schizophrenia genome-wide association studies (GWAS) and performed several analyses to explore the relationship between chromatin interactions and biological function. One use of CAVIAR in the paper was to verify that the causal variants involved in schizophrenia GWAS are in fact compatible with the 3D maps of chromatin contacts.

The paper also found several highly interacting chromatin regions that correlate with levels of gene expression and are associated with promoters, positive transcriptional regulators, and enhancers, areas of the genome that shape cell replication and neurological development. The paper identified specific sets of genes enriched in known intellectual disability risk genes, including mutations known to cause autosomal recessive primary microcephaly. The GWAS results identified approximately 500 genome-wide significant schizophrenia-associated loci, about 30% of which interact with schizophrenia SNPs exclusively in developing brain tissue. Genome editing in human neural progenitors suggests that one of these distal schizophrenia GWAS loci regulates FOXG1 expression, supporting its potential role as a schizophrenia risk gene.

This work provides a framework for understanding the effect of non-coding regulatory elements on human brain development and the evolution of cognition, and highlights novel mechanisms underlying neuropsychiatric disorders. Read the paper for a detailed account of our data, methods, and results: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature19847.html

The CAVIAR program was developed by Farhad Hormozdiari and is freely available for download on the following webpage: http://genetics.cs.ucla.edu/caviar/

The full citation to our paper is:

Won, Hyejung; de la Torre-Ubieta, Luis; Stein, Jason L; Parikshak, Neelroop N; Huang, Jerry; Opland, Carli K; Gandal, Michael J; Sutton, Gavin J; Hormozdiari, Farhad; Lu, Daning; Lee, Changhoon; Eskin, Eleazar; Voineagu, Irina; Ernst, Jason; Geschwind, Daniel H

Chromosome conformation elucidates regulatory relationships in developing human brain. Journal Article

In: Nature, 538 (7626), pp. 523-527, 2016, ISSN: 1476-4687.

Figure: Annotation of schizophrenia-associated loci identified by a GWAS of chromatin contact data.