Over the past several years, Genome Wide Association Studies (GWAS) have discovered hundreds of genetic variants involved in complex diseases(10.1056/NEJMra0905980). The vast majority of these variants do not lie in the protein coding regions of genes and thus do not affect what the gene produces, but instead likely affect how the genes are regulated. For this reason, the study of how genetic variation affect gene activity levels (referred to as expression levels) has been a major focus of research for many years. Genetic variation that affects gene expression are referred to as expression quantitative trait loci (eQTL)(10.1038/nrg2969).
Several studies collect expression from multiple tissues which leads to the question of whether or not the same genetic variants affect expression in multiple tissues(10.1038/ng.2653). Another way to ask this question is: Are eQTLs tissue specific or not tissue specific?
A challenge in this type of analysis is that an eQTL may affect expression in multiple tissues, but because of small sample sizes, the eQTL will only be detected in one of the tissues. Thus, traditional techniques for eQTLs will systematically be biased against detecting eQTLs in multiple tissues.
Jae-Hoon Sul and Buhm Han in our group developed a method to address this issue which builds upon recent methods in random effects meta-analysis(10.1016/j.ajhg.2011.04.014),(10.1371/journal.pgen.1002555). To apply these methods we first analyze each tissue separately and then use the meta-analysis method to combine the results of each tissue. Since our methods are specifically designed to handle “heterogeneity” which is that the effect size can be different in each study, our method is able to perform well when the effect is present in all of the tissues or just some of the tissues. More information about our meta-analysis research is here.
The full citation of our paper is here:
Sul, Jae Hoon; Han, Buhm ; Ye, Chun ; Choi, Ted ; Eskin, Eleazar Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches Journal Article In: PLoS Genet, 9 (6), pp. e1003491, 2013, ISSN: 1553-7404. @article{10.1371/journal.pgen.1003491, title = {Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches}, author = { Jae Hoon Sul and Buhm Han and Chun Ye and Ted Choi and Eleazar Eskin}, url = {http://dx.doi.org/10.1371%2Fjournal.pgen.1003491}, issn = {1553-7404}, year = {2013}, date = {2013-01-01}, journal = {PLoS Genet}, volume = {9}, number = {6}, pages = {e1003491}, publisher = {Public Library of Science}, address = {United States}, abstract = {Author Summary The combination of gene expression and genetic variation data has enabled the identification of genetic variants that affect gene expression levels. It has been shown that some variants influence gene expression in only one tissue while others influence gene expression in multiple tissues. However, an analysis of multiple tissue data using traditional statistical methods typically fails to identify those variants that affect multiple tissues because each tissue is treated independently and due to low statistical power, the effect in a given tissue may be missed. Building on recent advances in statistical methods for meta-analysis and mixed models, we present a novel method that combines information from multiple tissues to identify genetic variation that affects multiple tissues. We show that our method detects more genetic variation that influences multiple tissues than traditional statistical methods both on simulated and real data.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Author Summary The combination of gene expression and genetic variation data has enabled the identification of genetic variants that affect gene expression levels. It has been shown that some variants influence gene expression in only one tissue while others influence gene expression in multiple tissues. However, an analysis of multiple tissue data using traditional statistical methods typically fails to identify those variants that affect multiple tissues because each tissue is treated independently and due to low statistical power, the effect in a given tissue may be missed. Building on recent advances in statistical methods for meta-analysis and mixed models, we present a novel method that combines information from multiple tissues to identify genetic variation that affects multiple tissues. We show that our method detects more genetic variation that influences multiple tissues than traditional statistical methods both on simulated and real data. |
Over the past few years, our group has published several papers on methods for eQTL analysis. Our other paper on eQTL analysis include:
2018 |
Gamazon, Eric R; Segrè, Ayellet V; van de Bunt, Martijn; Wen, Xiaoquan; Xi, Hualin S; Hormozdiari, Farhad; Ongen, Halit; Konkashbaev, Anuar; Derks, Eske M; Aguet, François; Quan, Jie; Nicolae, Dan L; Eskin, Eleazar; Kellis, Manolis; Getz, Gad; McCarthy, Mark I; Dermitzakis, Emmanouil T; Cox, Nancy J; Ardlie, Kristin G Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Journal Article In: Nat Genet, 50 (7), pp. 956-967, 2018, ISSN: 1546-1718. @article{Gamazon:NatGenet:2018, title = {Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation.}, author = { Eric R. Gamazon and Ayellet V. Segrè and Martijn van de Bunt and Xiaoquan Wen and Hualin S. Xi and Farhad Hormozdiari and Halit Ongen and Anuar Konkashbaev and Eske M. Derks and François Aguet and Jie Quan and Dan L. Nicolae and Eleazar Eskin and Manolis Kellis and Gad Getz and Mark I. McCarthy and Emmanouil T. Dermitzakis and Nancy J. Cox and Kristin G. Ardlie}, url = {http://dx.doi.org/10.1038/s41588-018-0154-4}, issn = {1546-1718}, year = {2018}, date = {2018-01-01}, journal = {Nat Genet}, volume = {50}, number = {7}, pages = {956-967}, address = {United States}, organization = {Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA. egamazon@uchicago.edu.}, abstract = {We apply integrative approaches to expression quantitative loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40-80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. By integrating information from biological pathways with eQTL target genes and applying a gene-based approach, we validate previously implicated causal genes and pathways, and propose new variant and gene associations for several complex traits, which we replicate in the UK BioBank and BioVU}, keywords = {}, pubstate = {published}, tppubtype = {article} } We apply integrative approaches to expression quantitative loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40-80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. By integrating information from biological pathways with eQTL target genes and applying a gene-based approach, we validate previously implicated causal genes and pathways, and propose new variant and gene associations for several complex traits, which we replicate in the UK BioBank and BioVU |
2017 |
Duong, Dat; Gai, Lisa; Snir, Sagi; Kang, Eun Yong; Han, Buhm; Sul, Jae Hoon; Eskin, Eleazar Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes. Journal Article In: Bioinformatics, 33 (14), pp. i67-i74, 2017, ISSN: 1367-4811. @article{Duong:Bioinformatics:2017, title = {Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes.}, author = { Dat Duong and Lisa Gai and Sagi Snir and Eun Yong Kang and Buhm Han and Jae Hoon Sul and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btx227}, issn = {1367-4811}, year = {2017}, date = {2017-01-01}, journal = {Bioinformatics}, volume = {33}, number = {14}, pages = {i67-i74}, address = {England}, organization = {Department of Computer Science, University of California, Los Angeles, CA 90095, USA.}, abstract = {Motivation: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Gene Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues. Results: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses. Availability and Implementation: Source code is at https://github.com/datduong/RECOV . Contact: eeskin@cs.ucla.edu or datdb@cs.ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online}, keywords = {}, pubstate = {published}, tppubtype = {article} } Motivation: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Gene Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues. Results: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses. Availability and Implementation: Source code is at https://github.com/datduong/RECOV . Contact: eeskin@cs.ucla.edu or datdb@cs.ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online |
2016 |
Duong, Dat ; Zou, Jennifer ; Hormozdiari, Farhad ; Sul, Jae Hoon ; Ernst, Jason ; Han, Buhm ; Eskin, Eleazar Using genomic annotations increases statistical power to detect eGenes. Journal Article In: Bioinformatics, 32 (12), pp. i156-i163, 2016, ISSN: 1367-4811. @article{Duong:Bioinformatics:2016, title = {Using genomic annotations increases statistical power to detect eGenes.}, author = {Duong, Dat and Zou, Jennifer and Hormozdiari, Farhad and Sul, Jae Hoon and Ernst, Jason and Han, Buhm and Eskin, Eleazar}, url = {http://bioinformatics.oxfordjournals.org/content/32/12/i156.abstract}, doi = {10.1093/bioinformatics/btw272}, issn = {1367-4811}, year = {2016}, date = {2016-01-01}, journal = {Bioinformatics}, volume = {32}, number = {12}, pages = {i156-i163}, address = {England}, abstract = {MOTIVATION: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes or genes whose expressions are associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method however does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. RESULTS: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detected more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. CONTACT: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu}, keywords = {}, pubstate = {published}, tppubtype = {article} } MOTIVATION: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes or genes whose expressions are associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method however does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. RESULTS: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detected more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. CONTACT: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu |
Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article In: Genetics, 2016, ISSN: 1943-2631. @article{Kang:Genetics:2016, title = {Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data.}, author = { Eun Yong Kang and Lisa Martin and Serghei Mangul and Warin Isvilanonda and Jennifer Zou and Eyal Ben-David and Buhm Han and Aldons J. Lusis and Sagiv Shifman and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.115.177246}, issn = {1943-2631}, year = {2016}, date = {2016-01-01}, journal = {Genetics}, address = {United States}, organization = {University of California, Los Angeles; ekang@cs.ucla.edu.}, abstract = {The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here we increase the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We design a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-seq data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. 2309 SNPs were identified to be associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases}, keywords = {}, pubstate = {published}, tppubtype = {article} } The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here we increase the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We design a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-seq data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. 2309 SNPs were identified to be associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases |
Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar Colocalization of GWAS and eQTL Signals Detects Target Genes. Journal Article In: Am J Hum Genet, 2016, ISSN: 1537-6605. @article{Hormozdiari:AmJHumGenet:2016b, title = {Colocalization of GWAS and eQTL Signals Detects Target Genes.}, author = { Farhad Hormozdiari and Martijn van de Bunt and Ayellet V. Segrè and Xiao Li and Jong Wha J. Joo and Michael Bilow and Jae Hoon Sul and Sriram Sankararaman and Bogdan Pasaniuc and Eleazar Eskin}, url = {http:://dx.doi.org/10.1016/j.ajhg.2016.10.003}, issn = {1537-6605}, year = {2016}, date = {2016-01-01}, journal = {Am J Hum Genet}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA.}, abstract = {The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trail locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci}, keywords = {}, pubstate = {published}, tppubtype = {article} } The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trail locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci |
Peterson, Christine B; Service, Susan K; Jasinska, Anna J; Gao, Fuying; Zelaya, Ivette; Teshiba, Terri M; Bearden, Carrie E; Cantor, Rita M; Reus, Victor I; Macaya, Gabriel; López-Jaramillo, Carlos; Bogomolov, Marina; Benjamini, Yoav; Eskin, Eleazar; Coppola, Giovanni; Freimer, Nelson B; Sabatti, Chiara Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder. Journal Article In: PLoS Genet, 12 (5), pp. e1006046, 2016, ISSN: 1553-7404. @article{Peterson:PlosGenet:2016, title = {Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder.}, author = { Christine B. Peterson and Susan K. Service and Anna J. Jasinska and Fuying Gao and Ivette Zelaya and Terri M. Teshiba and Carrie E. Bearden and Rita M. Cantor and Victor I. Reus and Gabriel Macaya and Carlos López-Jaramillo and Marina Bogomolov and Yoav Benjamini and Eleazar Eskin and Giovanni Coppola and Nelson B. Freimer and Chiara Sabatti}, url = {http://dx.doi.org/10.1371/journal.pgen.1006046}, issn = {1553-7404}, year = {2016}, date = {2016-01-01}, journal = {PLoS Genet}, volume = {12}, number = {5}, pages = {e1006046}, address = {United States}, abstract = {The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus. In the eQTL analysis, we utilize a recently proposed hierarchical multiple testing strategy which controls error rates regarding the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs which may be relevant in future investigations of complex disease in this population. Since our subjects belong to extended families, we are able to compare traditional kinship-based estimates with those from more recent methods that depend only on genotype information}, keywords = {}, pubstate = {published}, tppubtype = {article} } The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus. In the eQTL analysis, we utilize a recently proposed hierarchical multiple testing strategy which controls error rates regarding the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs which may be relevant in future investigations of complex disease in this population. Since our subjects belong to extended families, we are able to compare traditional kinship-based estimates with those from more recent methods that depend only on genotype information |
Hasin-Brumshtein, Yehudit; Khan, Arshad H; Hormozdiari, Farhad; Pan, Calvin; Parks, Brian W; Petyuk, Vladislav A; Piehowski, Paul D; Brümmer, Anneke; Pellegrini, Matteo; Xiao, Xinshu; Eskin, Eleazar; Smith, Richard D; Lusis, Aldons J; Smith, Desmond J Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. Journal Article In: Elife, 5 , 2016, ISSN: 2050-084X. @article{HasinBrumshtein:Elife:2016, title = {Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes.}, author = { Yehudit Hasin-Brumshtein and Arshad H. Khan and Farhad Hormozdiari and Calvin Pan and Brian W. Parks and Vladislav A. Petyuk and Paul D. Piehowski and Anneke Brümmer and Matteo Pellegrini and Xinshu Xiao and Eleazar Eskin and Richard D. Smith and Aldons J. Lusis and Desmond J. Smith}, url = {http://dx.doi.org/10.7554/eLife.15614}, issn = {2050-084X}, year = {2016}, date = {2016-01-01}, journal = {Elife}, volume = {5}, address = {England}, abstract = {Previous studies had shown that the integration of genome wide expression profiles, in metabolic tissues, with genetic and phenotypic variance, provided valuable insight into the underlying molecular mechanisms. We used RNA-Seq to characterize hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP), a reference resource population for cardiovascular and metabolic traits. We report numerous novel transcripts supported by proteomic analyses, as well as novel non coding RNAs. High resolution genetic mapping of transcript levels in HMDP, reveals both local and trans expression Quantitative Trait Loci (eQTLs) demonstrating 2 trans eQTL 'hotspots' associated with expression of hundreds of genes. We also report thousands of alternative splicing events regulated by genetic variants. Finally, comparison with about 150 metabolic and cardiovascular traits revealed many highly significant associations. Our data provide a rich resource for understanding the many physiologic functions mediated by the hypothalamus and their genetic regulation}, keywords = {}, pubstate = {published}, tppubtype = {article} } Previous studies had shown that the integration of genome wide expression profiles, in metabolic tissues, with genetic and phenotypic variance, provided valuable insight into the underlying molecular mechanisms. We used RNA-Seq to characterize hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP), a reference resource population for cardiovascular and metabolic traits. We report numerous novel transcripts supported by proteomic analyses, as well as novel non coding RNAs. High resolution genetic mapping of transcript levels in HMDP, reveals both local and trans expression Quantitative Trait Loci (eQTLs) demonstrating 2 trans eQTL 'hotspots' associated with expression of hundreds of genes. We also report thousands of alternative splicing events regulated by genetic variants. Finally, comparison with about 150 metabolic and cardiovascular traits revealed many highly significant associations. Our data provide a rich resource for understanding the many physiologic functions mediated by the hypothalamus and their genetic regulation |
2015 |
Sul, Jae Hoon; Raj, Towfique; de Jong, Simone; de Bakker, Paul I W; Raychaudhuri, Soumya; Ophoff, Roel A; Stranger, Barbara E; Eskin, Eleazar; Han, Buhm Accurate and Fast Multiple-Testing Correction in eQTL Studies. Journal Article In: Am J Hum Genet, 96 (6), pp. 857-68, 2015, ISSN: 1537-6605. @article{Sul:AmJHumGenet:2015b, title = {Accurate and Fast Multiple-Testing Correction in eQTL Studies.}, author = { Jae Hoon Sul and Towfique Raj and Simone de Jong and Paul I. W. de Bakker and Soumya Raychaudhuri and Roel A. Ophoff and Barbara E. Stranger and Eleazar Eskin and Buhm Han}, url = {http://dx.doi.org/10.1016/j.ajhg.2015.04.012}, issn = {1537-6605}, year = {2015}, date = {2015-01-01}, journal = {Am J Hum Genet}, volume = {96}, number = {6}, pages = {857-68}, address = {United States}, abstract = {In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum pudotvalue among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset}, keywords = {}, pubstate = {published}, tppubtype = {article} } In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum pudotvalue among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset |
2014 |
Joo, Jong Wha J; Sul, Jae Hoon ; Han, Buhm ; Ye, Chun ; Eskin, Eleazar Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Journal Article In: Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914. @article{Joo:GenomeBiol:2014, title = {Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies.}, author = { Jong Wha J. Joo and Jae Hoon Sul and Buhm Han and Chun Ye and Eleazar Eskin}, url = {http://dx.doi.org/10.1186/gb-2014-15-4-r61}, issn = {1465-6914}, year = {2014}, date = {2014-01-01}, journal = {Genome Biol}, volume = {15}, number = {4}, pages = {R61}, abstract = {Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods}, keywords = {}, pubstate = {published}, tppubtype = {article} } Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods |
2013 |
Kostem, Emrah; Eskin, Eleazar Efficiently Identifying Significant Associations in Genome-Wide Association Studies Conference Research in Computational Molecular Biology, University of California Springer Berlin Heidelberg, 2013. @conference{Kostem:ResearchInComputationalMolecularBiology:201, title = {Efficiently Identifying Significant Associations in Genome-Wide Association Studies}, author = { Emrah Kostem and Eleazar Eskin}, url = {http://dx.doi.org/10.1007/978-3-642-37195-0_10}, year = {2013}, date = {2013-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {118-131}, publisher = {Springer Berlin Heidelberg}, organization = {University of California}, abstract = {Over the past several years, genome wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome which harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits where only a handful of phenotypes are analyzed per study, in (eQTL) studies, tens of thousands of gene expression levels are measured and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed-models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the SNPs. In the first-stage a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions which may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to state of the art testing approaches by a factor of 75.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Over the past several years, genome wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome which harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits where only a handful of phenotypes are analyzed per study, in (eQTL) studies, tens of thousands of gene expression levels are measured and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed-models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the SNPs. In the first-stage a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions which may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to state of the art testing approaches by a factor of 75. |
Kostem, Emrah; Eskin, Eleazar Efficiently Identifying Significant Associations in Genome-wide Association Studies. Journal Article In: J Comput Biol, 20 (10), pp. 817-30, 2013, ISSN: 1557-8666. @article{Kostem:JComputBiol:2013, title = {Efficiently Identifying Significant Associations in Genome-wide Association Studies.}, author = {Emrah Kostem and Eleazar Eskin}, url = {http://dx.doi.org/10.1089/cmb.2013.0087}, issn = {1557-8666}, year = {2013}, date = {2013-01-01}, journal = {J Comput Biol}, volume = {20}, number = {10}, pages = {817-30}, address = {United States}, organization = {1 Computer Science Department, University of California , Los Angeles, California.}, abstract = {Abstract Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75}, keywords = {}, pubstate = {published}, tppubtype = {article} } Abstract Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75 |
2010 |
Kang, Eun Yong; Ye, Chun ; Shpitser, Ilya ; Eskin, Eleazar Detecting the presence and absence of causal relationships between expression of yeast genes with very few samples. Journal Article In: J Comput Biol, 17 (3), pp. 533-46, 2010, ISSN: 1557-8666. @article{Kang:JComputBiol:2010b, title = {Detecting the presence and absence of causal relationships between expression of yeast genes with very few samples.}, author = { Eun Yong Kang and Chun Ye and Ilya Shpitser and Eleazar Eskin}, url = {http://dx.doi.org/10.1089/cmb.2009.0176}, issn = {1557-8666}, year = {2010}, date = {2010-01-01}, journal = {J Comput Biol}, volume = {17}, number = {3}, pages = {533-46}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA.}, abstract = {Inference of biological networks from high-throughput data is a central problem in bioinformatics. Particularly powerful for network reconstruction is data collected by recent studies that contain both genetic variation information and gene expression profiles from genetically distinct strains of an organism. Various statistical approaches have been applied to these data to tease out the underlying biological networks that govern how individual genetic variation mediates gene expression and how genes regulate and interact with each other. Extracting meaningful causal relationships from these networks remains a challenging but important problem. In this article, we use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. We evaluate our method using a well studied dataset consisting of both genetic variations and gene expressions collected over randomly segregated yeast strains. Our predictions of causal regulators, genes that control the expression of a large number of target genes, are consistent with previously known experimental evidence. In addition, our method can detect the absence of causal relationships and can distinguish between direct and indirect effects of variation on a gene expression level.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Inference of biological networks from high-throughput data is a central problem in bioinformatics. Particularly powerful for network reconstruction is data collected by recent studies that contain both genetic variation information and gene expression profiles from genetically distinct strains of an organism. Various statistical approaches have been applied to these data to tease out the underlying biological networks that govern how individual genetic variation mediates gene expression and how genes regulate and interact with each other. Extracting meaningful causal relationships from these networks remains a challenging but important problem. In this article, we use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. We evaluate our method using a well studied dataset consisting of both genetic variations and gene expressions collected over randomly segregated yeast strains. Our predictions of causal regulators, genes that control the expression of a large number of target genes, are consistent with previously known experimental evidence. In addition, our method can detect the absence of causal relationships and can distinguish between direct and indirect effects of variation on a gene expression level. |
2009 |
Ye, Chun; Galbraith, Simon J; Liao, James C; Eskin, Eleazar Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. Journal Article In: PLoS Comput Biol, 5 (3), pp. e1000311, 2009, ISSN: 1553-7358. @article{Ye:PlosComputBiol:2009, title = {Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast.}, author = { Chun Ye and Simon J. Galbraith and James C. Liao and Eleazar Eskin}, url = {http://dx.doi.org/10.1371/journal.pcbi.1000311}, issn = {1553-7358}, year = {2009}, date = {2009-01-01}, journal = {PLoS Comput Biol}, volume = {5}, number = {3}, pages = {e1000311}, address = {United States}, organization = {Bioinformatics Program, University of California San Diego, La Jolla, California, United States of America. cye@bioinf.ucsd.edu}, abstract = {Understanding the relationship between genetic variation and gene expression is a central question in genetics. With the availability of data from high-throughput technologies such as ChIP-Chip, expression, and genotyping arrays, we can begin to not only identify associations but to understand how genetic variations perturb the underlying transcription regulatory networks to induce differential gene expression. In this study, we describe a simple model of transcription regulation where the expression of a gene is completely characterized by two properties: the concentrations and promoter affinities of active transcription factors. We devise a method that extends Network Component Analysis (NCA) to determine how genetic variations in the form of single nucleotide polymorphisms (SNPs) perturb these two properties. Applying our method to a segregating population of Saccharomyces cerevisiae, we found statistically significant examples of trans-acting SNPs located in regulatory hotspots that perturb transcription factor concentrations and affinities for target promoters to cause global differential expression and cis-acting genetic variations that perturb the promoter affinities of transcription factors on a single gene to cause local differential expression. Although many genetic variations linked to gene expressions have been identified, it is not clear how they perturb the underlying regulatory networks that govern gene expression. Our work begins to fill this void by showing that many genetic variations affect the concentrations of active transcription factors in a cell and their affinities for target promoters. Understanding the effects of these perturbations can help us to paint a more complete picture of the complex landscape of transcription regulation. The software package implementing the algorithms discussed in this work is available as a MATLAB package upon request.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Understanding the relationship between genetic variation and gene expression is a central question in genetics. With the availability of data from high-throughput technologies such as ChIP-Chip, expression, and genotyping arrays, we can begin to not only identify associations but to understand how genetic variations perturb the underlying transcription regulatory networks to induce differential gene expression. In this study, we describe a simple model of transcription regulation where the expression of a gene is completely characterized by two properties: the concentrations and promoter affinities of active transcription factors. We devise a method that extends Network Component Analysis (NCA) to determine how genetic variations in the form of single nucleotide polymorphisms (SNPs) perturb these two properties. Applying our method to a segregating population of Saccharomyces cerevisiae, we found statistically significant examples of trans-acting SNPs located in regulatory hotspots that perturb transcription factor concentrations and affinities for target promoters to cause global differential expression and cis-acting genetic variations that perturb the promoter affinities of transcription factors on a single gene to cause local differential expression. Although many genetic variations linked to gene expressions have been identified, it is not clear how they perturb the underlying regulatory networks that govern gene expression. Our work begins to fill this void by showing that many genetic variations affect the concentrations of active transcription factors in a cell and their affinities for target promoters. Understanding the effects of these perturbations can help us to paint a more complete picture of the complex landscape of transcription regulation. The software package implementing the algorithms discussed in this work is available as a MATLAB package upon request. |
2008 |
Kang, Hyun Min; Ye, Chun ; Eskin, Eleazar Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Journal Article In: Genetics, 180 (4), pp. 1909-25, 2008, ISSN: 0016-6731. @article{Kang:Genetics:2008b, title = {Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots.}, author = { Hyun Min Kang and Chun Ye and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.108.094201}, issn = {0016-6731}, year = {2008}, date = {2008-01-01}, journal = {Genetics}, volume = {180}, number = {4}, pages = {1909-25}, address = {United States}, organization = {Department of Human Genetics, University of California, Los Angeles, California 90095, USA.}, abstract = {In genomewide mapping of expression quantitative trait loci (eQTL), it is widely believed that thousands of genes are trans-regulated by a small number of genomic regions called "regulatory hotspots," resulting in "trans-regulatory bands" in an eQTL map. As several recent studies have demonstrated, technical confounding factors such as batch effects can complicate eQTL analysis by causing many spurious associations including spurious regulatory hotspots. Yet little is understood about how these technical confounding factors affect eQTL analyses and how to correct for these factors. Our analysis of data sets with biological replicates suggests that it is this intersample correlation structure inherent in expression data that leads to spurious associations between genetic loci and a large number of transcripts inducing spurious regulatory hotspots. We propose a statistical method that corrects for the spurious associations caused by complex intersample correlation of expression measurements in eQTL mapping. Applying our intersample correlation emended (ICE) eQTL mapping method to mouse, yeast, and human identifies many more cis associations while eliminating most of the spurious trans associations. The concordances of cis and trans associations have consistently increased between different replicates, tissues, and populations, demonstrating the higher accuracy of our method to identify real genetic effects.}, keywords = {}, pubstate = {published}, tppubtype = {article} } In genomewide mapping of expression quantitative trait loci (eQTL), it is widely believed that thousands of genes are trans-regulated by a small number of genomic regions called "regulatory hotspots," resulting in "trans-regulatory bands" in an eQTL map. As several recent studies have demonstrated, technical confounding factors such as batch effects can complicate eQTL analysis by causing many spurious associations including spurious regulatory hotspots. Yet little is understood about how these technical confounding factors affect eQTL analyses and how to correct for these factors. Our analysis of data sets with biological replicates suggests that it is this intersample correlation structure inherent in expression data that leads to spurious associations between genetic loci and a large number of transcripts inducing spurious regulatory hotspots. We propose a statistical method that corrects for the spurious associations caused by complex intersample correlation of expression measurements in eQTL mapping. Applying our intersample correlation emended (ICE) eQTL mapping method to mouse, yeast, and human identifies many more cis associations while eliminating most of the spurious trans associations. The concordances of cis and trans associations have consistently increased between different replicates, tissues, and populations, demonstrating the higher accuracy of our method to identify real genetic effects. |