Our group publishes papers presenting new methodologies, describing the results of studies that use our software, and reviewing current topics in the field of bioinformatics. Scroll down for a complete list of papers produced by our lab. Since 2013, we have written blog posts summarizing new research papers and review articles:
GWAS
- Fine Mapping Causal Variants and Allelic Heterogeneity
- Widespread Allelic Heterogeneity in Complex Traits
- Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes
- Incorporating prior information into association studies
- Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder
- Simultaneous modeling of disease status and clinical phenotypes to increase power in GWAS
- Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Colocalization of GWAS and eQTL Signals Detects Target Genes
- Chromosome conformation elucidates regulatory relationships in developing human brain
Mouse Genetics
- Review Article: The Hybrid Mouse Diversity Panel
- Genes, Environments and Meta-Analysis
- Review Article: Mixed Models and Population Structure
- Identifying Genes Involved in Blood Cell Traits
- Genes, Diet, and Body Weight (in Mice)
- Review Article: Mouse Genetics
Population Structure
- Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models
- Multiple testing correction in linear mixed models
- Identification of causal genes for complex traits (CAVIAR-gene)
- Accurate viral population assembly from ultra-deep sequencing data
- GRAT: Speeding up Expression Quantitative Trait Loci (eQTL) Studies
- Correcting Population Structure using Mixed Models Webcast
- Mixed models can correct for population structure for genomic regions under selection
Review Articles
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Review Article: The Hybrid Mouse Diversity Panel
- Review Article: GWAS and Missing Heritability
- Review Article: Mixed Models and Population Structure
- Review Article: Mouse Genetics
Publications
2018 |
Gamazon, Eric R; Segrè, Ayellet V; van de Bunt, Martijn; Wen, Xiaoquan; Xi, Hualin S; Hormozdiari, Farhad; Ongen, Halit; Konkashbaev, Anuar; Derks, Eske M; Aguet, François; Quan, Jie; Nicolae, Dan L; Eskin, Eleazar; Kellis, Manolis; Getz, Gad; McCarthy, Mark I; Dermitzakis, Emmanouil T; Cox, Nancy J; Ardlie, Kristin G Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Journal Article Nat Genet, 50 (7), pp. 956-967, 2018, ISSN: 1546-1718. Abstract | Links | BibTeX | Tags: Co-Localization, Expression QTLs, Fine Mapping, GWAS+eQTL @article{Gamazon:NatGenet:2018, title = {Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation.}, author = { Eric R. Gamazon and Ayellet V. Segrè and Martijn van de Bunt and Xiaoquan Wen and Hualin S. Xi and Farhad Hormozdiari and Halit Ongen and Anuar Konkashbaev and Eske M. Derks and François Aguet and Jie Quan and Dan L. Nicolae and Eleazar Eskin and Manolis Kellis and Gad Getz and Mark I. McCarthy and Emmanouil T. Dermitzakis and Nancy J. Cox and Kristin G. Ardlie}, url = {http://dx.doi.org/10.1038/s41588-018-0154-4}, issn = {1546-1718}, year = {2018}, date = {2018-01-01}, journal = {Nat Genet}, volume = {50}, number = {7}, pages = {956-967}, address = {United States}, organization = {Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA. egamazon@uchicago.edu.}, abstract = {We apply integrative approaches to expression quantitative loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. 
Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40-80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. By integrating information from biological pathways with eQTL target genes and applying a gene-based approach, we validate previously implicated causal genes and pathways, and propose new variant and gene associations for several complex traits, which we replicate in the UK BioBank and BioVU}, keywords = {Co-Localization, Expression QTLs, Fine Mapping, GWAS+eQTL}, pubstate = {published}, tppubtype = {article} } |
2017 |
Duong, Dat; Gai, Lisa; Snir, Sagi; Kang, Eun Yong; Han, Buhm; Sul, Jae Hoon; Eskin, Eleazar Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes. Journal Article Bioinformatics, 33 (14), pp. i67-i74, 2017, ISSN: 1367-4811. Abstract | Links | BibTeX | Tags: Expression QTLs, Meta-Analysis @article{Duong:Bioinformatics:2017, title = {Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes.}, author = { Dat Duong and Lisa Gai and Sagi Snir and Eun Yong Kang and Buhm Han and Jae Hoon Sul and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btx227}, issn = {1367-4811}, year = {2017}, date = {2017-01-01}, journal = {Bioinformatics}, volume = {33}, number = {14}, pages = {i67-i74}, address = {England}, organization = {Department of Computer Science, University of California, Los Angeles, CA 90095, USA.}, abstract = {Motivation: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Gene Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. 
For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues. Results: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses. Availability and Implementation: Source code is at https://github.com/datduong/RECOV . Contact: eeskin@cs.ucla.edu or datdb@cs.ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online}, keywords = {Expression QTLs, Meta-Analysis}, pubstate = {published}, tppubtype = {article} } |
2016 |
Duong, Dat; Zou, Jennifer; Hormozdiari, Farhad; Sul, Jae Hoon; Ernst, Jason; Han, Buhm; Eskin, Eleazar Using genomic annotations increases statistical power to detect eGenes. Journal Article Bioinformatics, 32 (12), pp. i156-i163, 2016, ISSN: 1367-4811. Abstract | Links | BibTeX | Tags: Expression QTLs @article{Duong:Bioinformatics:2016, title = {Using genomic annotations increases statistical power to detect eGenes.}, author = {Duong, Dat and Zou, Jennifer and Hormozdiari, Farhad and Sul, Jae Hoon and Ernst, Jason and Han, Buhm and Eskin, Eleazar}, url = {http://bioinformatics.oxfordjournals.org/content/32/12/i156.abstract}, doi = {10.1093/bioinformatics/btw272}, issn = {1367-4811}, year = {2016}, date = {2016-01-01}, journal = {Bioinformatics}, volume = {32}, number = {12}, pages = {i156-i163}, address = {England}, abstract = {MOTIVATION: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes or genes whose expressions are associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method however does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. RESULTS: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detect more candidate eGenes. 
Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. CONTACT: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu}, keywords = {Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article Genetics, 2016, ISSN: 1943-2631. Abstract | Links | BibTeX | Tags: Allele Specific Expression, Expression QTLs @article{Kang:Genetics:2016, title = {Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data.}, author = { Eun Yong Kang and Lisa Martin and Serghei Mangul and Warin Isvilanonda and Jennifer Zou and Eyal Ben-David and Buhm Han and Aldons J. Lusis and Sagiv Shifman and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.115.177246}, issn = {1943-2631}, year = {2016}, date = {2016-01-01}, journal = {Genetics}, address = {United States}, organization = {University of California, Los Angeles; ekang@cs.ucla.edu.}, abstract = {The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here we increase the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We design a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-seq data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. 2309 SNPs were identified to be associated with ASE patterns. 
The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases}, keywords = {Allele Specific Expression, Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar Colocalization of GWAS and eQTL Signals Detects Target Genes. Journal Article Am J Hum Genet, 2016, ISSN: 1537-6605. Abstract | Links | BibTeX | Tags: Expression QTLs, Fine Mapping @article{Hormozdiari:AmJHumGenet:2016b, title = {Colocalization of GWAS and eQTL Signals Detects Target Genes.}, author = { Farhad Hormozdiari and Martijn van de Bunt and Ayellet V. Segrè and Xiao Li and Jong Wha J. Joo and Michael Bilow and Jae Hoon Sul and Sriram Sankararaman and Bogdan Pasaniuc and Eleazar Eskin}, url = {http://dx.doi.org/10.1016/j.ajhg.2016.10.003}, issn = {1537-6605}, year = {2016}, date = {2016-01-01}, journal = {Am J Hum Genet}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA.}, abstract = {The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. 
In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci}, keywords = {Expression QTLs, Fine Mapping}, pubstate = {published}, tppubtype = {article} } |
Peterson, Christine B; Service, Susan K; Jasinska, Anna J; Gao, Fuying; Zelaya, Ivette; Teshiba, Terri M; Bearden, Carrie E; Cantor, Rita M; Reus, Victor I; Macaya, Gabriel; López-Jaramillo, Carlos; Bogomolov, Marina; Benjamini, Yoav; Eskin, Eleazar; Coppola, Giovanni; Freimer, Nelson B; Sabatti, Chiara Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder. Journal Article PLoS Genet, 12 (5), pp. e1006046, 2016, ISSN: 1553-7404. Abstract | Links | BibTeX | Tags: Expression QTLs @article{Peterson:PlosGenet:2016, title = {Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder.}, author = { Christine B. Peterson and Susan K. Service and Anna J. Jasinska and Fuying Gao and Ivette Zelaya and Terri M. Teshiba and Carrie E. Bearden and Rita M. Cantor and Victor I. Reus and Gabriel Macaya and Carlos López-Jaramillo and Marina Bogomolov and Yoav Benjamini and Eleazar Eskin and Giovanni Coppola and Nelson B. Freimer and Chiara Sabatti}, url = {http://dx.doi.org/10.1371/journal.pgen.1006046}, issn = {1553-7404}, year = {2016}, date = {2016-01-01}, journal = {PLoS Genet}, volume = {12}, number = {5}, pages = {e1006046}, address = {United States}, abstract = {The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. 
The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus. In the eQTL analysis, we utilize a recently proposed hierarchical multiple testing strategy which controls error rates regarding the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs which may be relevant in future investigations of complex disease in this population. Since our subjects belong to extended families, we are able to compare traditional kinship-based estimates with those from more recent methods that depend only on genotype information}, keywords = {Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
Hasin-Brumshtein, Yehudit; Khan, Arshad H; Hormozdiari, Farhad; Pan, Calvin; Parks, Brian W; Petyuk, Vladislav A; Piehowski, Paul D; Brümmer, Anneke; Pellegrini, Matteo; Xiao, Xinshu; Eskin, Eleazar; Smith, Richard D; Lusis, Aldons J; Smith, Desmond J Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. Journal Article Elife, 5 , 2016, ISSN: 2050-084X. Abstract | Links | BibTeX | Tags: Expression QTLs, Mouse Genetics @article{HasinBrumshtein:Elife:2016, title = {Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes.}, author = { Yehudit Hasin-Brumshtein and Arshad H. Khan and Farhad Hormozdiari and Calvin Pan and Brian W. Parks and Vladislav A. Petyuk and Paul D. Piehowski and Anneke Brümmer and Matteo Pellegrini and Xinshu Xiao and Eleazar Eskin and Richard D. Smith and Aldons J. Lusis and Desmond J. Smith}, url = {http://dx.doi.org/10.7554/eLife.15614}, issn = {2050-084X}, year = {2016}, date = {2016-01-01}, journal = {Elife}, volume = {5}, address = {England}, abstract = {Previous studies had shown that the integration of genome wide expression profiles, in metabolic tissues, with genetic and phenotypic variance, provided valuable insight into the underlying molecular mechanisms. We used RNA-Seq to characterize hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP), a reference resource population for cardiovascular and metabolic traits. We report numerous novel transcripts supported by proteomic analyses, as well as novel non coding RNAs. High resolution genetic mapping of transcript levels in HMDP, reveals both local and trans expression Quantitative Trait Loci (eQTLs) demonstrating 2 trans eQTL 'hotspots' associated with expression of hundreds of genes. We also report thousands of alternative splicing events regulated by genetic variants. 
Finally, comparison with about 150 metabolic and cardiovascular traits revealed many highly significant associations. Our data provide a rich resource for understanding the many physiologic functions mediated by the hypothalamus and their genetic regulation}, keywords = {Expression QTLs, Mouse Genetics}, pubstate = {published}, tppubtype = {article} } |
2015 |
Sul, Jae Hoon; Raj, Towfique; de Jong, Simone; de Bakker, Paul I W; Raychaudhuri, Soumya; Ophoff, Roel A; Stranger, Barbara E; Eskin, Eleazar; Han, Buhm Accurate and Fast Multiple-Testing Correction in eQTL Studies. Journal Article Am J Hum Genet, 96 (6), pp. 857-68, 2015, ISSN: 1537-6605. Abstract | Links | BibTeX | Tags: Expression QTLs, Multiple Testing @article{Sul:AmJHumGenet:2015b, title = {Accurate and Fast Multiple-Testing Correction in eQTL Studies.}, author = { Jae Hoon Sul and Towfique Raj and Simone de Jong and Paul I. W. de Bakker and Soumya Raychaudhuri and Roel A. Ophoff and Barbara E. Stranger and Eleazar Eskin and Buhm Han}, url = {http://dx.doi.org/10.1016/j.ajhg.2015.04.012}, issn = {1537-6605}, year = {2015}, date = {2015-01-01}, journal = {Am J Hum Genet}, volume = {96}, number = {6}, pages = {857-68}, address = {United States}, abstract = {In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum pudotvalue among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. 
By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset}, keywords = {Expression QTLs, Multiple Testing}, pubstate = {published}, tppubtype = {article} } |
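The general idea in the abstract above, replacing permutations with draws from a multivariate normal whose correlation matrix is the LD among cis variants, can be illustrated with a minimal sketch. This is not the authors' implementation: it omits their small-sample correction, and the function names `cholesky` and `gene_level_p`, the toy LD matrices, and the number of draws are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gene_level_p(min_p, ld_corr, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(smallest p value <= min_p) under the null.

    ld_corr is a hypothetical LD correlation matrix for the variants in cis
    with one gene; min_p is the smallest two-sided p value observed there.
    """
    rng = random.Random(seed)
    # |z| threshold that corresponds to the observed minimum p value.
    z_thresh = NormalDist().inv_cdf(1.0 - min_p / 2.0)
    lower = cholesky(ld_corr)
    m = len(ld_corr)
    hits = 0
    for _ in range(n_draws):
        iid = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Correlated null statistics: z = L * iid has covariance ld_corr.
        z = [sum(lower[i][k] * iid[k] for k in range(i + 1)) for i in range(m)]
        if max(abs(v) for v in z) >= z_thresh:
            hits += 1
    return hits / n_draws


# Toy inputs (assumed): three independent variants vs. three variants in strong LD.
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
strong_ld = [[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]]
p_independent = gene_level_p(0.01, independent)
p_strong_ld = gene_level_p(0.01, strong_ld)
```

The cost of this sketch depends on the number of variants and draws, not on the number of samples in the study, which is the property the abstract highlights. Strong LD reduces the effective number of tests, so the corrected p value for the correlated toy matrix comes out smaller than for the independent one.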
2014 |
Joo, Jong Wha J; Sul, Jae Hoon; Han, Buhm; Ye, Chun; Eskin, Eleazar Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Journal Article Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914. Tags: Confounding, eQTL Confounding, Expression QTLs @article{Joo:GenomeBiol:2014, title = {Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies.}, author = { Jong Wha J. Joo and Jae Hoon Sul and Buhm Han and Chun Ye and Eleazar Eskin}, url = {http://dx.doi.org/10.1186/gb-2014-15-4-r61}, issn = {1465-6914}, year = {2014}, date = {2014-01-01}, journal = {Genome Biol}, volume = {15}, number = {4}, pages = {R61}, abstract = {Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods}, keywords = {Confounding, eQTL Confounding, Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
2013 |
Kostem, Emrah; Eskin, Eleazar Efficiently Identifying Significant Associations in Genome-Wide Association Studies Conference Research in Computational Molecular Biology, University of California Springer Berlin Heidelberg, 2013. Tags: Association Study Methods, Expression QTLs @conference{Kostem:ResearchInComputationalMolecularBiology:201, title = {Efficiently Identifying Significant Associations in Genome-Wide Association Studies}, author = { Emrah Kostem and Eleazar Eskin}, url = {http://dx.doi.org/10.1007/978-3-642-37195-0_10}, year = {2013}, date = {2013-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {118-131}, publisher = {Springer Berlin Heidelberg}, organization = {University of California}, abstract = {Over the past several years, genome wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome which harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits where only a handful of phenotypes are analyzed per study, in (eQTL) studies, tens of thousands of gene expression levels are measured and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed-models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the SNPs. In the first-stage a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions which may contain significant SNPs and only tests additional SNPs from those regions. 
We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to state of the art testing approaches by a factor of 75.}, keywords = {Association Study Methods, Expression QTLs}, pubstate = {published}, tppubtype = {conference} } |
Sul, Jae Hoon; Han, Buhm; Ye, Chun; Choi, Ted; Eskin, Eleazar Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches Journal Article PLoS Genet, 9 (6), pp. e1003491, 2013, ISSN: 1553-7404. Tags: Expression QTLs, Meta-Analysis, Mixed Models, Multiple Phenotypes @article{10.1371/journal.pgen.1003491, title = {Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches}, author = { Jae Hoon Sul and Buhm Han and Chun Ye and Ted Choi and Eleazar Eskin}, url = {http://dx.doi.org/10.1371%2Fjournal.pgen.1003491}, issn = {1553-7404}, year = {2013}, date = {2013-01-01}, journal = {PLoS Genet}, volume = {9}, number = {6}, pages = {e1003491}, publisher = {Public Library of Science}, address = {United States}, abstract = {Author Summary: The combination of gene expression and genetic variation data has enabled the identification of genetic variants that affect gene expression levels. It has been shown that some variants influence gene expression in only one tissue while others influence gene expression in multiple tissues. However, an analysis of multiple tissue data using traditional statistical methods typically fails to identify those variants that affect multiple tissues because each tissue is treated independently and due to low statistical power, the effect in a given tissue may be missed. Building on recent advances in statistical methods for meta-analysis and mixed models, we present a novel method that combines information from multiple tissues to identify genetic variation that affects multiple tissues. 
We show that our method detects more genetic variation that influences multiple tissues than traditional statistical methods both on simulated and real data.}, keywords = {Expression QTLs, Meta-Analysis, Mixed Models, Multiple Phenotypes}, pubstate = {published}, tppubtype = {article} } |
Kostem, Emrah; Eskin, Eleazar Efficiently Identifying Significant Associations in Genome-wide Association Studies. Journal Article J Comput Biol, 20 (10), pp. 817-30, 2013, ISSN: 1557-8666. Tags: Association Study Methods, Expression QTLs @article{Kostem:JComputBiol:2013, title = {Efficiently Identifying Significant Associations in Genome-wide Association Studies.}, author = {Emrah Kostem and Eleazar Eskin}, url = {http://dx.doi.org/10.1089/cmb.2013.0087}, issn = {1557-8666}, year = {2013}, date = {2013-01-01}, journal = {J Comput Biol}, volume = {20}, number = {10}, pages = {817-30}, address = {United States}, organization = {1 Computer Science Department, University of California , Los Angeles, California.}, abstract = {Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. 
We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75}, keywords = {Association Study Methods, Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
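The two-stage idea described in the abstract above (test a few proxy SNPs first, then test the remaining SNPs only in regions the proxies flag) can be sketched in a few lines. This is a simplified illustration, not the published procedure: the function `two_stage_scan`, the convention that the first SNP in each region is its proxy, the toy statistics, and both thresholds are assumptions. In particular, a fixed relaxed threshold does not guarantee that every significant SNP is recovered, which is the part the paper's method addresses.

```python
def two_stage_scan(assoc_stat, regions, relaxed, strict):
    """Return (significant SNPs, number of association tests computed).

    regions maps a region name to a list of SNP ids; the first id in each
    list is treated as that region's proxy (an assumption of this sketch).
    assoc_stat maps a SNP id to its association statistic.
    """
    significant = []
    tests_run = 0
    for snps in regions.values():
        proxy = snps[0]
        proxy_stat = abs(assoc_stat(proxy))
        tests_run += 1
        if proxy_stat < relaxed:
            continue  # Stage 1: the proxy gives no hint of signal, skip the region.
        if proxy_stat >= strict:
            significant.append(proxy)
        # Stage 2: test the remaining SNPs only in regions the proxy flagged.
        for snp in snps[1:]:
            tests_run += 1
            if abs(assoc_stat(snp)) >= strict:
                significant.append(snp)
    return significant, tests_run


# Toy data (assumed): precomputed |z| statistics for five SNPs in two regions.
toy_stats = {"rs1": 0.4, "rs2": 0.9, "rs3": 3.1, "rs4": 5.8, "rs5": 1.0}
toy_regions = {"regionA": ["rs1", "rs2"], "regionB": ["rs3", "rs4", "rs5"]}
hits, n_tests = two_stage_scan(toy_stats.get, toy_regions, relaxed=2.5, strict=5.0)
```

In this toy run only four of the five SNPs are tested, because the proxy for `regionA` falls below the relaxed threshold. The savings grow with the fraction of regions that carry no signal.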
2010 |
Kang, Eun Yong; Ye, Chun; Shpitser, Ilya; Eskin, Eleazar Detecting the presence and absence of causal relationships between expression of yeast genes with very few samples. Journal Article J Comput Biol, 17 (3), pp. 533-46, 2010, ISSN: 1557-8666. Tags: Causal Inference, Causal Inference Biology, Expression QTLs @article{Kang:JComputBiol:2010b, title = {Detecting the presence and absence of causal relationships between expression of yeast genes with very few samples.}, author = { Eun Yong Kang and Chun Ye and Ilya Shpitser and Eleazar Eskin}, url = {http://dx.doi.org/10.1089/cmb.2009.0176}, issn = {1557-8666}, year = {2010}, date = {2010-01-01}, journal = {J Comput Biol}, volume = {17}, number = {3}, pages = {533-46}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA.}, abstract = {Inference of biological networks from high-throughput data is a central problem in bioinformatics. Particularly powerful for network reconstruction is data collected by recent studies that contain both genetic variation information and gene expression profiles from genetically distinct strains of an organism. Various statistical approaches have been applied to these data to tease out the underlying biological networks that govern how individual genetic variation mediates gene expression and how genes regulate and interact with each other. Extracting meaningful causal relationships from these networks remains a challenging but important problem. In this article, we use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. We evaluate our method using a well studied dataset consisting of both genetic variations and gene expressions collected over randomly segregated yeast strains. 
Our predictions of causal regulators, genes that control the expression of a large number of target genes, are consistent with previously known experimental evidence. In addition, our method can detect the absence of causal relationships and can distinguish between direct and indirect effects of variation on a gene expression level.}, keywords = {Causal Inference, Causal Inference Biology, Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
2009 |
Ye, Chun; Galbraith, Simon J; Liao, James C; Eskin, Eleazar Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. Journal Article PLoS Comput Biol, 5 (3), pp. e1000311, 2009, ISSN: 1553-7358. Tags: Expression QTLs @article{Ye:PlosComputBiol:2009, title = {Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast.}, author = { Chun Ye and Simon J. Galbraith and James C. Liao and Eleazar Eskin}, url = {http://dx.doi.org/10.1371/journal.pcbi.1000311}, issn = {1553-7358}, year = {2009}, date = {2009-01-01}, journal = {PLoS Comput Biol}, volume = {5}, number = {3}, pages = {e1000311}, address = {United States}, organization = {Bioinformatics Program, University of California San Diego, La Jolla, California, United States of America. cye@bioinf.ucsd.edu}, abstract = {Understanding the relationship between genetic variation and gene expression is a central question in genetics. With the availability of data from high-throughput technologies such as ChIP-Chip, expression, and genotyping arrays, we can begin to not only identify associations but to understand how genetic variations perturb the underlying transcription regulatory networks to induce differential gene expression. In this study, we describe a simple model of transcription regulation where the expression of a gene is completely characterized by two properties: the concentrations and promoter affinities of active transcription factors. We devise a method that extends Network Component Analysis (NCA) to determine how genetic variations in the form of single nucleotide polymorphisms (SNPs) perturb these two properties. 
Applying our method to a segregating population of Saccharomyces cerevisiae, we found statistically significant examples of trans-acting SNPs located in regulatory hotspots that perturb transcription factor concentrations and affinities for target promoters to cause global differential expression and cis-acting genetic variations that perturb the promoter affinities of transcription factors on a single gene to cause local differential expression. Although many genetic variations linked to gene expressions have been identified, it is not clear how they perturb the underlying regulatory networks that govern gene expression. Our work begins to fill this void by showing that many genetic variations affect the concentrations of active transcription factors in a cell and their affinities for target promoters. Understanding the effects of these perturbations can help us to paint a more complete picture of the complex landscape of transcription regulation. The software package implementing the algorithms discussed in this work is available as a MATLAB package upon request.}, keywords = {Expression QTLs}, pubstate = {published}, tppubtype = {article} } |
2008 |
Kang, Hyun Min; Ye, Chun; Eskin, Eleazar Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Journal Article Genetics, 180 (4), pp. 1909-25, 2008, ISSN: 0016-6731. Tags: Expression QTLs, Mixed Models @article{Kang:Genetics:2008b, title = {Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots.}, author = { Hyun Min Kang and Chun Ye and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.108.094201}, issn = {0016-6731}, year = {2008}, date = {2008-01-01}, journal = {Genetics}, volume = {180}, number = {4}, pages = {1909-25}, address = {United States}, organization = {Department of Human Genetics, University of California, Los Angeles, California 90095, USA.}, abstract = {In genomewide mapping of expression quantitative trait loci (eQTL), it is widely believed that thousands of genes are trans-regulated by a small number of genomic regions called "regulatory hotspots," resulting in "trans-regulatory bands" in an eQTL map. As several recent studies have demonstrated, technical confounding factors such as batch effects can complicate eQTL analysis by causing many spurious associations including spurious regulatory hotspots. Yet little is understood about how these technical confounding factors affect eQTL analyses and how to correct for these factors. Our analysis of data sets with biological replicates suggests that it is this intersample correlation structure inherent in expression data that leads to spurious associations between genetic loci and a large number of transcripts inducing spurious regulatory hotspots. We propose a statistical method that corrects for the spurious associations caused by complex intersample correlation of expression measurements in eQTL mapping. 
Applying our intersample correlation emended (ICE) eQTL mapping method to mouse, yeast, and human identifies many more cis associations while eliminating most of the spurious trans associations. The concordances of cis and trans associations have consistently increased between different replicates, tissues, and populations, demonstrating the higher accuracy of our method to identify real genetic effects.}, keywords = {Expression QTLs, Mixed Models}, pubstate = {published}, tppubtype = {article} } |