Our group publishes papers presenting new methodologies, describing the results of studies that use our software, and reviewing current topics in the field of bioinformatics. Scroll down for a complete list of papers produced by our lab. Since 2013, we have written blog posts summarizing new research papers and review articles:
GWAS
- Fine Mapping Causal Variants and Allelic Heterogeneity
- Widespread Allelic Heterogeneity in Complex Traits
- Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes
- Incorporating prior information into association studies
- Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder
- Simultaneous modeling of disease status and clinical phenotypes to increase power in GWAS
- Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Colocalization of GWAS and eQTL Signals Detects Target Genes
- Chromosome conformation elucidates regulatory relationships in developing human brain
Mouse Genetics
- Review Article: The Hybrid Mouse Diversity Panel
- Genes, Environments and Meta-Analysis
- Review Article: Mixed Models and Population Structure
- Identifying Genes Involved in Blood Cell Traits
- Genes, Diet, and Body Weight (in Mice)
- Review Article: Mouse Genetics
Population Structure
- Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models
- Multiple testing correction in linear mixed models
- Identification of causal genes for complex traits (CAVIAR-gene)
- Accurate viral population assembly from ultra-deep sequencing data
- GRAT: Speeding up Expression Quantitative Trait Loci (eQTL) Studies
- Correcting Population Structure using Mixed Models Webcast
- Mixed models can correct for population structure for genomic regions under selection
Review Articles
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Review Article: The Hybrid Mouse Diversity Panel
- Review Article: GWAS and Missing Heritability
- Review Article: Mixed Models and Population Structure
- Review Article: Mouse Genetics
Publications
2019
Mangul, Serghei; Mosqueiro, Thiago; Abdill, Richard J; Duong, Dat; Mitchell, Keith; Sarwal, Varuni; Hill, Brian; Brito, Jaqueline; Littman, Russell Jared; Statz, Benjamin; Lam, Angela Ka-Mei; Dayama, Gargi; Grieneisen, Laura; Martin, Lana S; Flint, Jonathan; Eskin, Eleazar; Blekhman, Ran. Challenges and recommendations to improve the installability and archival stability of omics computational tools. Journal Article. PLoS Biol, 17 (6), pp. e3000333, 2019, ISSN: 1545-7885.
@article{Mangul:PlosBiol:2019, title = {Challenges and recommendations to improve the installability and archival stability of omics computational tools.}, author = { Serghei Mangul and Thiago Mosqueiro and Richard J. Abdill and Dat Duong and Keith Mitchell and Varuni Sarwal and Brian Hill and Jaqueline Brito and Russell Jared Littman and Benjamin Statz and Angela Ka-Mei Lam and Gargi Dayama and Laura Grieneisen and Lana S. Martin and Jonathan Flint and Eleazar Eskin and Ran Blekhman}, url = {http://dx.doi.org/10.1371/journal.pbio.3000333}, issn = {1545-7885}, year = {2019}, date = {2019-01-01}, journal = {PLoS Biol}, volume = {17}, number = {6}, pages = {e3000333}, address = {United States}, organization = {Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America.}, abstract = {Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Martin, Lana S; Hill, Brian L; Lam, Angela Ka-Mei; Distler, Margaret G; Zelikovsky, Alex; Eskin, Eleazar; Flint, Jonathan. Systematic benchmarking of omics computational tools. Journal Article. Nat Commun, 10 (1), pp. 1393, 2019, ISSN: 2041-1723.
@article{Mangul:NatCommun:2019, title = {Systematic benchmarking of omics computational tools.}, author = { Serghei Mangul and Lana S. Martin and Brian L. Hill and Angela Ka-Mei Lam and Margaret G. Distler and Alex Zelikovsky and Eleazar Eskin and Jonathan Flint}, url = {http://dx.doi.org/10.1038/s41467-019-09406-4}, issn = {2041-1723}, year = {2019}, date = {2019-01-01}, journal = {Nat Commun}, volume = {10}, number = {1}, pages = {1393}, address = {England}, organization = {Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA, 90095, USA. smangul@ucla.edu.}, abstract = {Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Martin, Lana S; Langmead, Ben; Sanchez-Galan, Javier E; Toma, Ian; Hormozdiari, Fereydoun; Pevzner, Pavel; Eskin, Eleazar. How bioinformatics and open data can boost basic science in countries and universities with limited resources. Journal Article. Nat Biotechnol, 37 (3), pp. 324-326, 2019, ISSN: 1546-1696.
@article{Mangul:NatBiotechnol:2019, title = {How bioinformatics and open data can boost basic science in countries and universities with limited resources.}, author = { Serghei Mangul and Lana S. Martin and Ben Langmead and Javier E. Sanchez-Galan and Ian Toma and Fereydoun Hormozdiari and Pavel Pevzner and Eleazar Eskin}, url = {http://dx.doi.org/10.1038/s41587-019-0053-y}, issn = {1546-1696}, year = {2019}, date = {2019-01-01}, journal = {Nat Biotechnol}, volume = {37}, number = {3}, pages = {324-326}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA. smangul@ucla.edu.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Duong, Dat; Ahmad, Wasi Uddin; Eskin, Eleazar; Chang, Kai-Wei W; Li, Jingyi Jessica. Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions. Journal Article. J Comput Biol, 26 (1), pp. 38-52, 2019, ISSN: 1557-8666.
@article{Duong:JComputBiol:2019, title = {Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions.}, author = { Dat Duong and Wasi Uddin Ahmad and Eleazar Eskin and Kai-Wei W. Chang and Jingyi Jessica Li}, url = {http://dx.doi.org/10.1089/cmb.2018.0093}, issn = {1557-8666}, year = {2019}, date = {2019-01-01}, journal = {J Comput Biol}, volume = {26}, number = {1}, pages = {38-52}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, California.}, abstract = {The gene ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. Under this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this article, we introduce two new solutions for this problem by focusing instead on the definitions of the GO terms. We apply neural network-based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word-ordering within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition entails another. We validate our methods in two ways. In the first experiment, we test the model's ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model in identifying orthologs from randomly matched genes in human, mouse, and fly. In both experiments, a hybrid of NLP and GO tree-based method achieves the best classification accuracy.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
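The first approach in the abstract above compares two GO definitions as unordered sets of words, scoring word pairs with a word-embedding model. The sketch below shows one common way to turn word-level scores into a definition-level score: a symmetric best-match average of cosine similarities. The toy three-dimensional vectors and the aggregation rule are assumptions for illustration, not the embeddings or the exact scoring used in the paper.

```python
import math

# Hypothetical toy embeddings; a trained model would map words into a
# much higher-dimensional space learned from text.
EMBEDDINGS = {
    "binding": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.2, 0.1],
    "dna": [0.1, 0.9, 0.2],
    "transport": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_average(words_a, words_b, emb=EMBEDDINGS):
    """Score two definitions treated as unordered sets of words.

    For each word in one set, take its best cosine match in the other set,
    average those maxima, then average both directions so the score is
    symmetric. Words missing from the embedding table are ignored
    (an assumption of this sketch).
    """
    a = [emb[w] for w in set(words_a) if w in emb]
    b = [emb[w] for w in set(words_b) if w in emb]
    if not a or not b:
        return 0.0
    a_to_b = sum(max(cosine(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(cosine(y, x) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2
```

For example, `best_match_average("protein binding".split(), "dna binding".split())` scores higher than comparing `"protein binding"` with `"transport"`, because the shared word and the closer vectors dominate the best matches. Ignoring word order is exactly what distinguishes this first approach from the sentence-encoder approach described in the abstract.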
Mangul, Serghei; Martin, Lana S; Eskin, Eleazar; Blekhman, Ran. Improving the usability and archival stability of bioinformatics software. Journal Article. Genome Biol, 20 (1), pp. 47, 2019, ISSN: 1474-760X.
@article{Mangul:GenomeBiol:2019, title = {Improving the usability and archival stability of bioinformatics software.}, author = { Serghei Mangul and Lana S. Martin and Eleazar Eskin and Ran Blekhman}, url = {http://dx.doi.org/10.1186/s13059-019-1649-8}, issn = {1474-760X}, year = {2019}, date = {2019-01-01}, journal = {Genome Biol}, volume = {20}, number = {1}, pages = {47}, address = {England}, organization = {Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA, 90095, USA. smangul@ucla.edu.}, abstract = {Implementation of bioinformatics software involves numerous unique challenges; a rigorous standardized approach is needed to examine software tools prior to their publication.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
LaPierre, Nathan; Mangul, Serghei; Alser, Mohammed; Mandric, Igor; Wu, Nicholas C; Koslicki, David; Eskin, Eleazar. MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples. Journal Article. BMC Genomics, 20 (Suppl 5), pp. 423, 2019, ISSN: 1471-2164.
@article{LaPierre:BmcGenomics:2019, title = {MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples.}, author = { Nathan LaPierre and Serghei Mangul and Mohammed Alser and Igor Mandric and Nicholas C. Wu and David Koslicki and Eleazar Eskin}, url = {http://dx.doi.org/10.1186/s12864-019-5699-9}, issn = {1471-2164}, year = {2019}, date = {2019-01-01}, journal = {BMC Genomics}, volume = {20}, number = {Suppl 5}, pages = {423}, address = {England}, organization = {Department of Computer Science, University of California, Los Angeles, 90095, CA, USA.}, abstract = {BACKGROUND: High throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes. RESULTS: Here we present a method, MiCoP (Microbiome Community Profiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously-reported results on real data from the Human Microbiome Project. CONCLUSIONS: MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at: https://github.com/smangul1/MiCoP}, keywords = {}, pubstate = {published}, tppubtype = {article} }
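The MiCoP abstract describes abundance profiling by mapping reads against full viral and eukaryotic genomes, but it does not spell out how abundances are computed from the mapped reads. As a generic illustration only, and not MiCoP's actual procedure, a common first approximation normalizes each genome's mapped read count by its length and rescales the results to sum to one. The function name, genome names, counts, and lengths here are hypothetical, and reads that map to several genomes are not modeled.

```python
def relative_abundance(read_counts, genome_lengths):
    """Length-normalized relative abundance from per-genome mapped read counts.

    read_counts: dict genome -> number of reads assigned to that genome
        (hypothetical input; multi-mapping reads are assumed already resolved)
    genome_lengths: dict genome -> genome length in bases
    Returns dict genome -> fraction of the community, summing to 1,
    or an empty dict if no reads were assigned.
    """
    # Reads per base: at equal abundance, longer genomes attract more reads,
    # so raw counts alone would overstate large genomes.
    density = {
        g: count / genome_lengths[g]
        for g, count in read_counts.items()
        if count > 0
    }
    total = sum(density.values())
    if total == 0:
        return {}
    return {g: d / total for g, d in density.items()}
```

For instance, 100 reads on a hypothetical 1,000 bp genome and 300 reads on a 3,000 bp genome give equal relative abundances of 0.5 each, even though the raw counts differ threefold.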
2018
Loohuis, Loes Olde M; Mangul, Serghei; Ori, Anil P S; Jospin, Guillaume; Koslicki, David; Yang, Harry Taegyun; Wu, Timothy; Boks, Marco P; Lomen-Hoerth, Catherine; Wiedau-Pazos, Martina; Cantor, Rita M; de Vos, Willem M; Kahn, René S; Eskin, Eleazar; Ophoff, Roel A. Transcriptome analysis in whole blood reveals increased microbial diversity in schizophrenia. Journal Article. Transl Psychiatry, 8 (1), pp. 96, 2018, ISSN: 2158-3188.
@article{OldeLoohuis:TranslPsychiatry:2018, title = {Transcriptome analysis in whole blood reveals increased microbial diversity in schizophrenia.}, author = { Loes M. Olde Loohuis and Serghei Mangul and Anil P. S. Ori and Guillaume Jospin and David Koslicki and Harry Taegyun Yang and Timothy Wu and Marco P. Boks and Catherine Lomen-Hoerth and Martina Wiedau-Pazos and Rita M. Cantor and Willem M. de Vos and René S. Kahn and Eleazar Eskin and Roel A. Ophoff}, url = {http://dx.doi.org/10.1038/s41398-018-0107-9}, issn = {2158-3188}, year = {2018}, date = {2018-01-01}, journal = {Transl Psychiatry}, volume = {8}, number = {1}, pages = {96}, address = {United States}, organization = {Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA.}, abstract = {The role of the human microbiome in health and disease is increasingly appreciated. We studied the composition of microbial communities present in blood across 192 individuals, including healthy controls and patients with three disorders affecting the brain: schizophrenia, amyotrophic lateral sclerosis, and bipolar disorder. By using high-quality unmapped RNA sequencing reads as candidate microbial reads, we performed profiling of microbial transcripts detected in whole blood. We were able to detect a wide range of bacterial and archaeal phyla in blood. Interestingly, we observed an increased microbial diversity in schizophrenia patients compared to the three other groups. We replicated this finding in an independent schizophrenia case-control cohort. This increased diversity is inversely correlated with estimated cell abundance of a subpopulation of CD8+ memory T cells in healthy controls, supporting a link between microbial products found in blood, immunity and schizophrenia.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
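The abstract above reports increased microbial diversity in schizophrenia but does not state which diversity index was used. As a hedged illustration, the Shannon index, one widely used alpha-diversity measure, can be computed from per-taxon read counts as H = -Σ p_i ln p_i, where p_i is the fraction of reads assigned to taxon i. The counts in the example are invented.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    counts: iterable of read counts per taxon (e.g. per phylum).
    Returns 0.0 for an empty or all-zero sample.
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing (0 * ln 0 -> 0)
            p = c / total
            h -= p * math.log(p)
    return h
```

A sample spread evenly over four taxa, such as `[25, 25, 25, 25]`, reaches the maximum H = ln 4, whereas a sample dominated by one taxon, such as `[97, 1, 1, 1]`, scores much lower. The index rises with both the number of taxa and the evenness of their abundances.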
Sul, Jae Hoon; Martin, Lana S; Eskin, Eleazar. Population structure in genetic studies: Confounding factors and mixed models. Journal Article. PLoS Genet, 14 (12), pp. e1007309, 2018, ISSN: 1553-7404. Tags: Mixed Models, Population Structure Methods.
@article{Sul:PlosGenet:2018, title = {Population structure in genetic studies: Confounding factors and mixed models.}, author = { Jae Hoon Sul and Lana S. Martin and Eleazar Eskin}, url = {http://dx.doi.org/10.1371/journal.pgen.1007309}, issn = {1553-7404}, year = {2018}, date = {2018-01-01}, journal = {PLoS Genet}, volume = {14}, number = {12}, pages = {e1007309}, address = {United States}, organization = {Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, California, United States of America.}, abstract = {A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.}, keywords = {Mixed Models, Population Structure Methods}, pubstate = {published}, tppubtype = {article} }
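The review above motivates mixed models as a correction for population structure. In the linear mixed model commonly used in GWAS, y = Xβ + u + e with u ~ N(0, σ_g² K), the genetic similarity (kinship) matrix K is often estimated from genotypes as K = ZZᵀ/m, where Z holds standardized genotypes for m variants. The pure-Python sketch below implements that one estimator under stated assumptions (0/1/2 dosages, each variant standardized to mean 0 and unit sample variance, monomorphic variants dropped); other standardizations exist, and this is not presented as the review's specific method. The genotype matrix in the test is invented.

```python
import math


def kinship_matrix(genotypes):
    """Estimate K = Z Z^T / m from an n x m matrix of 0/1/2 genotype dosages.

    Each variant (column) is standardized to mean 0 and variance 1 across the
    n individuals. Monomorphic variants are skipped because they carry no
    information about relatedness; m counts only the variants kept.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = []
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var == 0:
            continue
        sd = math.sqrt(var)
        columns.append([(x - mean) / sd for x in col])
    used = len(columns)
    if used == 0:
        raise ValueError("no polymorphic variants")
    # K[i][k] averages, over variants, the product of standardized genotypes:
    # individuals who share alleles at many variants get large off-diagonal values.
    return [
        [sum(col[i] * col[k] for col in columns) / used for k in range(n)]
        for i in range(n)
    ]
```

Because every kept column has unit variance, the diagonal of K averages exactly 1, and two individuals with identical genotypes get identical rows. In a mixed-model association test, K enters only through the covariance of the random effect u, which is how the model absorbs the genome-wide similarity that would otherwise produce false positive associations.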
Hormozdiari, Farhad; Gazal, Steven; van de Geijn, Bryce; Finucane, Hilary K; Ju, Chelsea J-T; Loh, Po-Ru R; Schoech, Armin; Reshef, Yakir; Liu, Xuanyao; O'Connor, Luke; Gusev, Alexander; Eskin, Eleazar; Price, Alkes L. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Journal Article. Nat Genet, 50 (7), pp. 1041-1047, 2018, ISSN: 1546-1718. Tags: Fine Mapping, Functional Genomics.
@article{Hormozdiari:NatGenet:2018, title = {Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits.}, author = { Farhad Hormozdiari and Steven Gazal and Bryce van de Geijn and Hilary K. Finucane and Chelsea J-T Ju and Po-Ru R. Loh and Armin Schoech and Yakir Reshef and Xuanyao Liu and Luke O'Connor and Alexander Gusev and Eleazar Eskin and Alkes L. Price}, url = {http://dx.doi.org/10.1038/s41588-018-0148-2}, issn = {1546-1718}, year = {2018}, date = {2018-01-01}, journal = {Nat Genet}, volume = {50}, number = {7}, pages = {1041-1047}, address = {United States}, organization = {Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA. Hormozdiari@hsph.harvard.edu.}, abstract = {There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84$\times$ for eQTLs; P=1.19$\times$10$^{-31}$) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80$\times$ for expression (e)QTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06$\times$; P=1.20$\times$10$^{-35}$). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.}, keywords = {Fine Mapping, Functional Genomics}, pubstate = {published}, tppubtype = {article} }
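The enrichment values quoted in the abstract above (for example 5.84× for fine-mapped eQTL annotations) are fold enrichments of heritability. Assuming the usual stratified LD score regression definition, which the abstract itself does not spell out, enrichment is the proportion of SNP-heritability attributed to an annotation divided by the proportion of SNPs in that annotation. The sketch below shows only that arithmetic. It takes per-SNP heritability values as a simplified, hypothetical input, whereas stratified LD score regression estimates annotation-level contributions from regression coefficients; the numbers in the example are invented, not values from the paper.

```python
def heritability_enrichment(per_snp_h2, in_annotation):
    """Fold enrichment of heritability for one annotation.

    per_snp_h2: heritability attributed to each SNP (hypothetical estimates)
    in_annotation: booleans, True if the SNP belongs to the annotation
    Enrichment = (fraction of total h2 in annotation) / (fraction of SNPs in annotation);
    a value of 1 means the annotation carries exactly its "fair share" of h2.
    """
    if len(per_snp_h2) != len(in_annotation):
        raise ValueError("inputs must have the same length")
    total_h2 = sum(per_snp_h2)
    n_in = sum(1 for flag in in_annotation if flag)
    if total_h2 <= 0 or n_in == 0:
        raise ValueError("need positive total h2 and a non-empty annotation")
    h2_in = sum(h for h, flag in zip(per_snp_h2, in_annotation) if flag)
    return (h2_in / total_h2) / (n_in / len(per_snp_h2))
```

For instance, if 2 of 8 SNPs (25%) carry 40% of the heritability, the enrichment is 0.40 / 0.25 = 1.6×; if heritability is spread uniformly, the enrichment is 1 for any annotation.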
Gamazon, Eric R; Segrè, Ayellet V; van de Bunt, Martijn; Wen, Xiaoquan; Xi, Hualin S; Hormozdiari, Farhad; Ongen, Halit; Konkashbaev, Anuar; Derks, Eske M; Aguet, François; Quan, Jie; Nicolae, Dan L; Eskin, Eleazar; Kellis, Manolis; Getz, Gad; McCarthy, Mark I; Dermitzakis, Emmanouil T; Cox, Nancy J; Ardlie, Kristin G. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Journal Article. Nat Genet, 50 (7), pp. 956-967, 2018, ISSN: 1546-1718. Tags: Co-Localization, Expression QTLs, Fine Mapping, GWAS+eQTL.
@article{Gamazon:NatGenet:2018, title = {Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation.}, author = { Eric R. Gamazon and Ayellet V. Segrè and Martijn van de Bunt and Xiaoquan Wen and Hualin S. Xi and Farhad Hormozdiari and Halit Ongen and Anuar Konkashbaev and Eske M. Derks and François Aguet and Jie Quan and Dan L. Nicolae and Eleazar Eskin and Manolis Kellis and Gad Getz and Mark I. McCarthy and Emmanouil T. Dermitzakis and Nancy J. Cox and Kristin G. Ardlie}, url = {http://dx.doi.org/10.1038/s41588-018-0154-4}, issn = {1546-1718}, year = {2018}, date = {2018-01-01}, journal = {Nat Genet}, volume = {50}, number = {7}, pages = {956-967}, address = {United States}, organization = {Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA. egamazon@uchicago.edu.}, abstract = {We apply integrative approaches to expression quantitative trait loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40-80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. By integrating information from biological pathways with eQTL target genes and applying a gene-based approach, we validate previously implicated causal genes and pathways, and propose new variant and gene associations for several complex traits, which we replicate in the UK Biobank and BioVU.}, keywords = {Co-Localization, Expression QTLs, Fine Mapping, GWAS+eQTL}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Martin, Lana S; Eskin, Eleazar Involving undergraduates in genomics research to narrow the education-research gap. Journal Article Nat Biotechnol, 36 (4), pp. 369-371, 2018, ISSN: 1546-1696.
@article{Mangul:NatBiotechnol:2018, title = {Involving undergraduates in genomics research to narrow the education-research gap.}, author = { Serghei Mangul and Lana S. Martin and Eleazar Eskin}, url = {http://dx.doi.org/10.1038/nbt.4113}, issn = {1546-1696}, year = {2018}, date = {2018-01-01}, journal = {Nat Biotechnol}, volume = {36}, number = {4}, pages = {369-371}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Wu, Yue; Hormozdiari, Farhad; Joo, Jong Wha J; Eskin, Eleazar Improving Imputation Accuracy by Inferring Causal Variants in Genetic Studies. Journal Article J Comput Biol, 2018, ISSN: 1557-8666.
@article{Wu:JComputBiol:2018, title = {Improving Imputation Accuracy by Inferring Causal Variants in Genetic Studies.}, author = { Yue Wu and Farhad Hormozdiari and Jong Wha J. Joo and Eleazar Eskin}, url = {http://dx.doi.org/10.1089/cmb.2018.0139}, issn = {1557-8666}, year = {2018}, date = {2018-01-01}, journal = {J Comput Biol}, address = {United States}, organization = {Department of Computer Science, University of California Los Angeles, Los Angeles, California.}, abstract = {Genotype imputation has been widely utilized for two reasons in the analysis of genome-wide association studies (GWAS). One reason is to increase the power of association studies when causal single nucleotide polymorphisms are not collected in the GWAS. The second reason is to aid the interpretation of a GWAS result by predicting the association statistics at untyped variants. In this article, we show that predicting association statistics at untyped variants that influence the trait is overly conservative. Current imputation methods assume that none of the variants in a region (a locus consisting of multiple variants) affect the trait, which is often inconsistent with the observed data. We propose a new method, CAUSAL-Imp, which can impute the association statistics at untyped variants while taking into account variants in the region that may affect the trait. Our method builds on recent methods that impute the marginal statistics for GWAS by utilizing the fact that marginal statistics follow a multivariate normal distribution. We utilize both simulated and real data sets to assess the performance of our method. We show that traditional imputation approaches underestimate the association statistics for variants involved in the trait, and our results demonstrate that our approach provides less biased estimates of these association statistics.}, keywords = {Imputation}, pubstate = {published}, tppubtype = {article} }
Rahmani, Elior; Schweiger, Regev; Shenhav, Liat; Wingert, Theodora; Hofer, Ira; Gabel, Eilon; Eskin, Eleazar; Halperin, Eran BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Journal Article Genome Biol, 19 (1), pp. 141, 2018, ISSN: 1474-760X.
@article{Rahmani:GenomeBiol:2018, title = {BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference.}, author = { Elior Rahmani and Regev Schweiger and Liat Shenhav and Theodora Wingert and Ira Hofer and Eilon Gabel and Eleazar Eskin and Eran Halperin}, url = {http://dx.doi.org/10.1186/s13059-018-1513-2}, issn = {1474-760X}, year = {2018}, date = {2018-01-01}, journal = {Genome Biol}, volume = {19}, number = {1}, pages = {141}, address = {England}, organization = {Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.}, abstract = {We introduce a Bayesian semi-supervised method for estimating cell counts from DNA methylation by leveraging easily obtainable prior knowledge of the cell-type composition distribution of the studied tissue. We show mathematically and empirically that alternative methods which attempt to infer cell counts without methylation reference only capture linear combinations of cell counts rather than provide one component per cell type. Our approach allows the construction of components such that each component corresponds to a single cell type, and provides a new opportunity to investigate cell compositions in genomic studies of tissues for which this was not previously possible.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas; Gruhl, Franziska; Porath, Hagit T; Hsieh, Kevin; Chen, Linus; Daley, Timothy; Christenson, Stephanie; Wesolowska-Andersen, Agata; Spreafico, Roberto; Rios, Cydney; Eng, Celeste; Smith, Andrew D; Hernandez, Ryan D; Ophoff, Roel A; Santana, Jose Rodriguez; Levanon, Erez Y; Woodruff, Prescott G; Burchard, Esteban; Seibold, Max A; Shifman, Sagiv; Eskin, Eleazar; Zaitlen, Noah ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues. Journal Article Genome Biol, 19 (1), pp. 36, 2018, ISSN: 1474-760X.
@article{Mangul:GenomeBiol:2018, title = {ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues.}, author = { Serghei Mangul and Harry Taegyun Yang and Nicolas Strauli and Franziska Gruhl and Hagit T. Porath and Kevin Hsieh and Linus Chen and Timothy Daley and Stephanie Christenson and Agata Wesolowska-Andersen and Roberto Spreafico and Cydney Rios and Celeste Eng and Andrew D. Smith and Ryan D. Hernandez and Roel A. Ophoff and Jose Rodriguez Santana and Erez Y. Levanon and Prescott G. Woodruff and Esteban Burchard and Max A. Seibold and Sagiv Shifman and Eleazar Eskin and Noah Zaitlen}, url = {http://dx.doi.org/10.1186/s13059-018-1403-7}, issn = {1474-760X}, year = {2018}, date = {2018-01-01}, journal = {Genome Biol}, volume = {19}, number = {1}, pages = {36}, address = {England}, organization = {Department of Computer Science, University of California, Los Angeles, CA, USA. smangul@ucla.edu.}, abstract = {High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read lengths. Additionally, we use ROP to investigate the functional mechanisms underlying connections between the immune system, microbiome, and disease. ROP is freely available at https://github.com/smangul1/rop/wiki}, keywords = {RNAseq}, pubstate = {published}, tppubtype = {article} }
Kang, Eun Yong; Lee, Cue Hyunkyu; Furlotte, Nicholas A; Joo, Jong Wha J; Kostem, Emrah; Zaitlen, Noah; Eskin, Eleazar; Han, Buhm An Association Mapping Framework To Account for Potential Sex Difference in Genetic Architectures. Journal Article Genetics, 2018, ISSN: 1943-2631.
@article{Kang:Genetics:2018, title = {An Association Mapping Framework To Account for Potential Sex Difference in Genetic Architectures.}, author = { Eun Yong Kang and Cue Hyunkyu Lee and Nicholas A. Furlotte and Jong Wha J. Joo and Emrah Kostem and Noah Zaitlen and Eleazar Eskin and Buhm Han}, url = {http://dx.doi.org/10.1534/genetics.117.300501}, issn = {1943-2631}, year = {2018}, date = {2018-01-01}, journal = {Genetics}, address = {United States}, organization = {University of California, Los Angeles.}, abstract = {Over the past few years, genome-wide association studies have identified many trait-associated loci that have different effects on females and males, which has increased attention to the differences in genetic architecture between the sexes. The between-sex differences in genetic architectures can cause a variety of phenomena such as differences in the effect sizes at trait-associated loci, differences in the magnitudes of polygenic background effects, and differences in the phenotypic variances. However, current association testing approaches for dealing with sex, such as including sex as a covariate, cannot fully account for these phenomena and can be suboptimal in statistical power. We present a novel association mapping framework, MetaSex, that can comprehensively account for the genetic architecture differences between the sexes. Through simulations and applications to real data, we show that our framework outperforms previous approaches in association mapping.}, keywords = {Association Study Methods, Meta-Analysis}, pubstate = {published}, tppubtype = {article} }
Kennedy, Elizabeth M; Goehring, George N; Nichols, Michael H; Robins, Chloe; Mehta, Divya; Klengel, Torsten; Eskin, Eleazar; Smith, Alicia K; Conneely, Karen N An integrated -omics analysis of the epigenetic landscape of gene expression in human blood cells. Journal Article BMC Genomics, 19 (1), pp. 476, 2018, ISSN: 1471-2164.
@article{Kennedy:BmcGenomics:2018, title = {An integrated -omics analysis of the epigenetic landscape of gene expression in human blood cells.}, author = { Elizabeth M. Kennedy and George N. Goehring and Michael H. Nichols and Chloe Robins and Divya Mehta and Torsten Klengel and Eleazar Eskin and Alicia K. Smith and Karen N. Conneely}, url = {http://dx.doi.org/10.1186/s12864-018-4842-3}, issn = {1471-2164}, year = {2018}, date = {2018-01-01}, journal = {BMC Genomics}, volume = {19}, number = {1}, pages = {476}, address = {England}, organization = {Genetics and Molecular Biology Program, Emory University, Atlanta, GA, USA. ekennedy983@gmail.com.}, abstract = {BACKGROUND: Gene expression can be influenced by DNA methylation 1) distally, at regulatory elements such as enhancers, as well as 2) proximally, at promoters. Our current understanding of the influence of distal DNA methylation changes on gene expression patterns is incomplete. Here, we characterize genome-wide methylation and expression patterns for ~13k genes to explore how DNA methylation interacts with gene expression throughout the genome. RESULTS: We used a linear mixed model framework to assess the correlation of DNA methylation at ~400k CpGs with gene expression changes at ~13k transcripts in two independent datasets from human blood cells. Among CpGs at which methylation significantly associates with transcription (eCpGs), >50% are distal (>50 kb) or trans (different chromosome) to the correlated gene. Many eCpG-transcript pairs are consistent between studies and ~90% of neighboring eCpGs associate with the same gene, within studies. We find that enhancers (P<5e-18) and microRNA genes (P=9e-3) are overrepresented among trans eCpGs, and insulators and long intergenic non-coding RNAs are enriched among cis and distal eCpGs. Intragenic-eCpG-transcript correlations are negative in 60-70% of occurrences and are enriched for annotated gene promoters and enhancers (P<0.002), highlighting the importance of intragenic regulation. Gene Ontology analysis indicates that trans eCpGs are enriched for transcription factor genes and chromatin modifiers, suggesting that some trans eCpGs represent the influence of gene networks and higher-order transcriptional control. CONCLUSIONS: This work sheds new light on the interplay between epigenetic changes and gene expression, and provides useful data for mining biologically relevant results from epigenome-wide association studies.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Hormozdiari, Farhad I; Jung, Junghyun; Eskin, Eleazar; Joo, Jong Wha J Leveraging allelic heterogeneity to increase power of association testing. Journal Article bioRxiv, pp. 498360, 2018.
@article{Hormozdiari:Biorxiv:2018, title = {Leveraging allelic heterogeneity to increase power of association testing}, author = { Farhad I. Hormozdiari and Junghyun Jung and Eleazar Eskin and Jong Wha J. Joo}, url = {http://dx.doi.org/10.1101/498360}, year = {2018}, date = {2018-01-01}, journal = {bioRxiv}, pages = {498360}, publisher = {Cold Spring Harbor Laboratory}, organization = {Department of Computer Science and Engineering, Dongguk University-Seoul}, abstract = {Standard genome-wide association studies (GWAS) detect associations between a single variant and a phenotype of interest. Recently, several studies reported that many risk loci may harbor multiple causal variants. For a locus with multiple causal variants of small effect sizes, the standard association test is underpowered to detect the associations. Alternatively, an approach that considers the effects of multiple variants simultaneously may increase statistical power by leveraging the effects of multiple causal variants. In this paper, we propose a new statistical method, Model-based Association test Reflecting causal Status (MARS), that tries to find an association between variants in risk loci and a phenotype, considering the causal status of the variants. One of the main advantages of MARS is that it only requires existing summary statistics to detect associated risk loci. Thus, MARS is applicable to any association study with summary statistics, even when individual-level data are not available for the study. Utilizing extensive simulated data sets, we show that MARS increases the power of detecting true associated risk loci compared to previous approaches that consider multiple variants, while robustly controlling the type I error. Applied to data of 44 tissues provided by the Genotype-Tissue Expression (GTEx) consortium, we show that MARS identifies more eGenes than previous approaches in most of the tissues; e.g., MARS identified 16% more eGenes than those reported by the GTEx consortium. Moreover, applied to Northern Finland Birth Cohort (NFBC) data, we demonstrate that MARS effectively identifies association loci with improved power (MARS found 56% more loci) in GWAS compared to the standard association test.}, keywords = {Allelic Heterogeneity, Association Study Methods, Multi-SNP Association}, pubstate = {published}, tppubtype = {article} }
Gai, Lisa; Eskin, Eleazar Finding associated variants in genome-wide association studies on multiple traits. Journal Article Bioinformatics, 34 (13), pp. i467-i474, 2018, ISSN: 1367-4811.
@article{Gai:Bioinformatics:2018, title = {Finding associated variants in genome-wide association studies on multiple traits.}, author = { Lisa Gai and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/bty249}, issn = {1367-4811}, year = {2018}, date = {2018-01-01}, journal = {Bioinformatics}, volume = {34}, number = {13}, pages = {i467-i474}, address = {England}, organization = {Department of Computer Science, University of California, Los Angeles, CA, USA.}, abstract = {Motivation: Many variants identified by genome-wide association studies (GWAS) have been found to affect multiple traits, either directly or through shared pathways. There is currently a wealth of GWAS data collected on numerous phenotypes, and analyzing multiple traits at once can increase power to detect shared variant effects. However, traditional meta-analysis methods are not suitable for combining studies on different traits. When applied to dissimilar studies, these meta-analysis methods can be underpowered compared to univariate analysis. The degree to which traits share variant effects is often not known, and the vast majority of GWAS meta-analyses consider only one trait at a time. Results: Here, we present a flexible method for finding associated variants from GWAS summary statistics for multiple traits. Our method estimates the degree of shared effects between traits from the data. Using simulations, we show that our method properly controls the false positive rate and increases power when an effect is present in a subset of traits. We then apply our method to the Northern Finland Birth Cohort and UK Biobank datasets using a variety of metabolic traits and discover novel loci. Availability and implementation: Our source code is available at https://github.com/lgai/CONFIT. Supplementary information: Supplementary data are available at Bioinformatics online.}, keywords = {Multiple Phenotypes}, pubstate = {published}, tppubtype = {article} }
2017
Brown, Robert; Kichaev, Gleb; Mancuso, Nicholas; Boocock, James; Pasaniuc, Bogdan Enhanced methods to detect haplotypic effects on gene expression. Journal Article Bioinformatics, pp. btx142, 2017.
@article{Brown2017, title = {Enhanced methods to detect haplotypic effects on gene expression}, author = {Robert Brown and Gleb Kichaev and Nicholas Mancuso and James Boocock and Bogdan Pasaniuc}, url = {https://www.ncbi.nlm.nih.gov/pubmed/28369161}, doi = {10.1093/bioinformatics/btx142}, year = {2017}, date = {2017-03-22}, journal = {Bioinformatics}, pages = {btx142}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Martin, Lana S; Hoffmann, Alexander; Pellegrini, Matteo; Eskin, Eleazar Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX. Journal Article Trends Biotechnol, 2017, ISSN: 1879-3096.
@article{Mangul:TrendsBiotechnol:2017, title = {Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX.}, author = { Serghei Mangul and Lana S. Martin and Alexander Hoffmann and Matteo Pellegrini and Eleazar Eskin}, url = {http://dx.doi.org/10.1016/j.tibtech.2017.06.007}, issn = {1879-3096}, year = {2017}, date = {2017-01-01}, journal = {Trends Biotechnol}, address = {England}, organization = {Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Institute for Quantitative and Computational Biosciences, Boyer Hall, 611 Charles Young Drive, UCLA, Los Angeles, CA 90095, USA. Electronic address: smangul@ucla.edu.}, abstract = {Life and medical science researchers increasingly rely on applications that lack a graphical interface. Scientists who are not trained in computer science face an enormous challenge analyzing high-throughput data. We present a training model for use of command-line tools when the learner has little to no prior knowledge of UNIX}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Crawford, Nicholas G; Kelly, Derek E; Hansen, Matthew E B; Beltrame, Marcia H; Fan, Shaohua; Bowman, Shanna L; Jewett, Ethan; Ranciaro, Alessia; Thompson, Simon; Lo, Yancy; Pfeifer, Susanne P; Jensen, Jeffrey D; Campbell, Michael C; Beggs, William; Hormozdiari, Farhad; Mpoloka, Sununguko Wata; Mokone, Gaonyadiwe George; Nyambo, Thomas; Meskel, Dawit Wolde; Belay, Gurja; Haut, Jake; Rothschild, Harriet; Zon, Leonard; Zhou, Yi; Kovacs, Michael A; Xu, Mai; Zhang, Tongwu; Bishop, Kevin; Sinclair, Jason; Rivas, Cecilia; Elliot, Eugene; Choi, Jiyeon; Li, Shengchao A; Hicks, Belynda; Burgess, Shawn; Abnet, Christian; Watkins-Chow, Dawn E; Oceana, Elena; Song, Yun S; Eskin, Eleazar; Brown, Kevin M; Marks, Michael S; Loftus, Stacie K; Pavan, William J; Yeager, Meredith; Chanock, Stephen; Tishkoff, Sarah A Loci associated with skin pigmentation identified in African populations. Journal Article Science, 358 (6365), 2017, ISSN: 1095-9203.
@article{Crawford:Science:2017, title = {Loci associated with skin pigmentation identified in African populations.}, author = { Nicholas G. Crawford and Derek E. Kelly and Matthew E. B. Hansen and Marcia H. Beltrame and Shaohua Fan and Shanna L. Bowman and Ethan Jewett and Alessia Ranciaro and Simon Thompson and Yancy Lo and Susanne P. Pfeifer and Jeffrey D. Jensen and Michael C. Campbell and William Beggs and Farhad Hormozdiari and Sununguko Wata Mpoloka and Gaonyadiwe George Mokone and Thomas Nyambo and Dawit Wolde Meskel and Gurja Belay and Jake Haut and Harriet Rothschild and Leonard Zon and Yi Zhou and Michael A. Kovacs and Mai Xu and Tongwu Zhang and Kevin Bishop and Jason Sinclair and Cecilia Rivas and Eugene Elliot and Jiyeon Choi and Shengchao A. Li and Belynda Hicks and Shawn Burgess and Christian Abnet and Dawn E. Watkins-Chow and Elena Oceana and Yun S. Song and Eleazar Eskin and Kevin M. Brown and Michael S. Marks and Stacie K. Loftus and William J. Pavan and Meredith Yeager and Stephen Chanock and Sarah A. Tishkoff}, url = {http://dx.doi.org/10.1126/science.aan8433}, issn = {1095-9203}, year = {2017}, date = {2017-01-01}, journal = {Science}, volume = {358}, number = {6365}, address = {United States}, organization = {Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.}, abstract = {Despite the wide range of skin pigmentation in humans, little is known about its genetic basis in global populations. Examining ethnically diverse African genomes, we identify variants in or near SLC24A5, MFSD12, DDB1, TMEM138, OCA2, and HERC2 that are significantly associated with skin pigmentation. Genetic evidence indicates that the light pigmentation variant at SLC24A5 was introduced into East Africa by gene flow from non-Africans. At all other loci, variants associated with dark pigmentation in Africans are identical by descent in South Asian and Australo-Melanesian populations. Functional analyses indicate that MFSD12 encodes a lysosomal protein that affects melanogenesis in zebrafish and mice, and that mutations in melanocyte-specific regulatory regions near DDB1/TMEM138 correlate with expression of ultraviolet response genes under selection in Eurasians.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Rahmani, Elior; Zaitlen, Noah; Baran, Yael; Eng, Celeste; Hu, Donglei; Galanter, Joshua; Oh, Sam; Burchard, Esteban G; Eskin, Eleazar; Zou, James; Halperin, Eran Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation. Journal Article Nat Methods, 14 (3), pp. 218-219, 2017, ISSN: 1548-7105. Tags: Confounding @article{Rahmani:NatMethods:2017, title = {Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation.}, author = { Elior Rahmani and Noah Zaitlen and Yael Baran and Celeste Eng and Donglei Hu and Joshua Galanter and Sam Oh and Esteban G. Burchard and Eleazar Eskin and James Zou and Eran Halperin}, url = {http://dx.doi.org/10.1038/nmeth.4190}, issn = {1548-7105}, year = {2017}, date = {2017-01-01}, journal = {Nat Methods}, volume = {14}, number = {3}, pages = {218-219}, address = {United States}, organization = {Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.}, keywords = {Confounding}, pubstate = {published}, tppubtype = {article} } |
Jasinska, Anna J; Zelaya, Ivette; Service, Susan K; Peterson, Christine B; Cantor, Rita M; Choi, Oi-Wa W; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A; Fears, Scott; Furterer, Allison E; Huang, Yu S; Ramensky, Vasily; Schmitt, Christopher A; Svardal, Hannes; Jorgensen, Matthew J; Kaplan, Jay R; Villar, Diego; Aken, Bronwen L; Flicek, Paul; Nag, Rishi; Wong, Emily S; Blangero, John; Dyer, Thomas D; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K; Jentsch, David J; Warren, Wesley; Coppola, Giovanni; Woods, Roger P; Freimer, Nelson B Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate. Journal Article Nat Genet, 49 (12), pp. 1714-1721, 2017, ISSN: 1546-1718. @article{Jasinska:NatGenet:2017, title = {Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate.}, author = { Anna J. Jasinska and Ivette Zelaya and Susan K. Service and Christine B. Peterson and Rita M. Cantor and Oi-Wa W. Choi and Joseph DeYoung and Eleazar Eskin and Lynn A. Fairbanks and Scott Fears and Allison E. Furterer and Yu S. Huang and Vasily Ramensky and Christopher A. Schmitt and Hannes Svardal and Matthew J. Jorgensen and Jay R. Kaplan and Diego Villar and Bronwen L. Aken and Paul Flicek and Rishi Nag and Emily S. Wong and John Blangero and Thomas D. Dyer and Marina Bogomolov and Yoav Benjamini and George M. Weinstock and Ken Dewar and Chiara Sabatti and Richard K. Wilson and J. David Jentsch and Wesley Warren and Giovanni Coppola and Roger P. Woods and Nelson B. Freimer}, url = {http://dx.doi.org/10.1038/ng.3959}, issn = {1546-1718}, year = {2017}, date = {2017-01-01}, journal = {Nat Genet}, volume = {49}, number = {12}, pages = {1714-1721}, address = {United States}, organization = {Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA.}, abstract = {By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
Buckley, Matthew T; Racimo, Fernando; Allentoft, Morten E; Jensen, Majken K; Jonsson, Anna; Huang, Hongyan; Hormozdiari, Farhad; Sikora, Martin; Marnetto, Davide; Eskin, Eleazar; Jørgensen, Marit E; Grarup, Niels; Pedersen, Oluf; Hansen, Torben; Kraft, Peter; Willerslev, Eske; Nielsen, Rasmus Selection in Europeans on fatty acid desaturases associated with dietary changes. Journal Article Mol Biol Evol, 2017, ISSN: 1537-1719. @article{Buckley:MolBiolEvol:2017, title = {Selection in Europeans on fatty acid desaturases associated with dietary changes.}, author = { Matthew T. Buckley and Fernando Racimo and Morten E. Allentoft and Majken K. Jensen and Anna Jonsson and Hongyan Huang and Farhad Hormozdiari and Martin Sikora and Davide Marnetto and Eleazar Eskin and Marit E. Jørgensen and Niels Grarup and Oluf Pedersen and Torben Hansen and Peter Kraft and Eske Willerslev and Rasmus Nielsen}, url = {http://dx.doi.org/10.1093/molbev/msx103}, issn = {1537-1719}, year = {2017}, date = {2017-01-01}, journal = {Mol Biol Evol}, address = {United States}, organization = {Departments of Integrative Biology and Statistics, University of California Berkeley, Berkeley, CA 94720, USA.}, abstract = {FADS genes encode fatty acid desaturases that are important for the conversion of short chain polyunsaturated fatty acids (PUFAs) to long chain fatty acids. Prior studies indicate that the FADS genes have been subjected to strong positive selection in Africa, South Asia, Greenland, and Europe. By comparing FADS sequencing data from present-day and Bronze Age (5-3k years ago) Europeans, we identify possible targets of selection in the European population, which suggest that selection has targeted different alleles in the FADS genes in Europe than it has in South Asia or Greenland. The alleles showing the strongest changes in allele frequency since the Bronze Age show associations with expression changes and multiple lipid-related phenotypes. Furthermore, the selected alleles are associated with a decrease in linoleic acid and an increase in arachidonic and eicosapentaenoic acids among Europeans; this is an opposite effect of that observed for selected alleles in Inuit from Greenland. We show that multiple SNPs in the region affect expression levels and PUFA synthesis. Additionally, we find evidence for a gene-environment interaction influencing low-density lipoprotein (LDL) levels between alleles affecting PUFA synthesis and PUFA dietary intake: carriers of the derived allele display lower LDL cholesterol levels with a higher intake of PUFAs. We hypothesize that the selective patterns observed in Europeans were driven by a change in dietary composition of fatty acids following the transition to agriculture, resulting in a lower intake of arachidonic acid and eicosapentaenoic acid, but a higher intake of linoleic acid and $\alpha$-linolenic acid}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
He, Dan; Wang, Zhanyong; Parida, Laxmi; Eskin, Eleazar IPED2: Inheritance Path based Pedigree Reconstruction Algorithm for Complicated Pedigrees. Journal Article IEEE/ACM Trans Comput Biol Bioinform, 2017, ISSN: 1557-9964. Tags: Pedigree Inference @article{He:IeeeAcmTransComputBiolBioinform:2017, title = {IPED2: Inheritance Path based Pedigree Reconstruction Algorithm for Complicated Pedigrees.}, author = { Dan He and Zhanyong Wang and Laxmi Parida and Eleazar Eskin}, url = {http://dx.doi.org/10.1109/TCBB.2017.2688439}, issn = {1557-9964}, year = {2017}, date = {2017-01-01}, journal = {IEEE/ACM Trans Comput Biol Bioinform}, address = {United States}, abstract = {Reconstruction of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. The problem is known to be NP-hard even for datasets known to only contain siblings. Some recent methods have been developed to accurately and efficiently reconstruct pedigrees. These methods, however, still consider relatively simple pedigrees, for example, they are not able to handle half-sibling situations where a pair of individuals only share one parent. In this work, we propose an efficient method, IPED2, based on our previous work, which specifically targets reconstruction of complicated pedigrees that include half-siblings. We note that the presence of half-siblings makes the reconstruction problem significantly more challenging which is why previous methods exclude the possibility of half-siblings. We proposed a novel model as well as an efficient graph algorithm and experiments show that our algorithm achieves relatively accurate reconstruction. To our knowledge, this is the first method that is able to handle pedigree reconstruction from genotype data when half-siblings exist in any generation of the pedigree}, keywords = {Pedigree Inference}, pubstate = {published}, tppubtype = {article} } |
Mangul, Serghei; Yang, Harry Taegyun; Hormozdiari, Farhad; Dainis, Alex; Tseng, Elizabeth; Ashley, Euan A; Zelikovsky, Alex; Eskin, Eleazar HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads. Journal Article IEEE Trans Nanobioscience, 16 (2), pp. 108-115, 2017, ISSN: 1558-2639. Tags: Allele Specific Expression, Haplotype Phasing, Haplotyping from Sequences, RNAseq, Sequence Assembly @article{Mangul:IeeeTransNanobioscience:2017, title = {HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads.}, author = { Serghei Mangul and Harry Taegyun Yang and Farhad Hormozdiari and Alex Dainis and Elizabeth Tseng and Euan A. Ashley and Alex Zelikovsky and Eleazar Eskin}, url = {http://dx.doi.org/10.1109/TNB.2017.2675981}, issn = {1558-2639}, year = {2017}, date = {2017-01-01}, journal = {IEEE Trans Nanobioscience}, volume = {16}, number = {2}, pages = {108-115}, address = {United States}, abstract = {Sequencing of RNA provides the possibility to study an individual's transcriptome landscape and determine allelic expression ratios. Single-molecule protocols generate multi-kilobase reads longer than most transcripts allowing sequencing of complete haplotype isoforms. This allows partitioning the reads into two parental haplotypes. While the read length of the single-molecule protocols is long, the relatively high error rate limits the ability to accurately detect the genetic variants and assemble them into the haplotype-specific isoforms. In this paper, we present HapIso (Haplotype-specific Isoform Reconstruction), a method able to tolerate the relatively high error-rate of the single-molecule platform and partition the isoform reads into the parental alleles. Phasing the reads according to the allele of origin allows our method to efficiently distinguish between the read errors and the true biological mutations. HapIso uses a k-means clustering algorithm aiming to group the reads into two meaningful clusters maximizing the similarity of the reads within cluster and minimizing the similarity of the reads from different clusters. Each cluster corresponds to a parental haplotype. We used family pedigree information to evaluate our approach. Experimental validation suggests that HapIso is able to tolerate the relatively high error-rate and accurately partition the reads into the parental alleles of the isoform transcripts. We also applied HapIso to novel clinical single-molecule RNA-Seq data to estimate ASE of genes of interest. Our method was able to correct reads and determine Glu1883Lys point mutation of clinical significance validated by GeneDx HCM Panel. Furthermore, our method is the first method able to reconstruct the haplotype-specific isoforms from long single-molecule reads}, keywords = {Allele Specific Expression, Haplotype Phasing, Haplotyping from Sequences, RNAseq, Sequence Assembly}, pubstate = {published}, tppubtype = {article} } |
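The HapIso abstract describes grouping error-prone long reads into two clusters, one per parental haplotype, with k-means. The sketch below is a rough illustration of that idea only and is not the HapIso code. It runs k-means with k = 2 on reads that are assumed to be already reduced to alleles at heterozygous sites. The read encoding, the distance that skips uncovered sites, the initialisation, and all function names are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: one entry per heterozygous site, 0/1 allele, None if the read does not cover it.
Read = List[Optional[int]]


def distance(read: Read, center: List[float]) -> float:
    """Mean absolute allele difference over the sites the read covers."""
    covered = [(a, c) for a, c in zip(read, center) if a is not None]
    if not covered:
        return 0.0
    return sum(abs(a - c) for a, c in covered) / len(covered)


def update_center(members: List[Read], old: List[float]) -> List[float]:
    """Per-site mean allele over the member reads that cover each site."""
    center = []
    for j, previous in enumerate(old):
        alleles = [r[j] for r in members if r[j] is not None]
        center.append(sum(alleles) / len(alleles) if alleles else previous)
    return center


def two_means_haplotypes(reads: List[Read], max_iter: int = 20) -> Tuple[List[int], List[List[int]]]:
    """Split reads into two clusters (k = 2); return labels and consensus haplotypes."""
    # Assumed initialisation: the first read and its complement are the two starting centers.
    first = [0.5 if a is None else float(a) for a in reads[0]]
    centers = [first, [1.0 - c for c in first]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each read joins the nearer center.
        new_labels = [0 if distance(r, centers[0]) <= distance(r, centers[1]) else 1 for r in reads]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each center from its current members.
        centers = [update_center([r for r, lab in zip(reads, labels) if lab == k], centers[k]) for k in (0, 1)]
    # Majority allele per site; minority alleles inside a cluster are treated as read errors.
    haplotypes = [[int(c >= 0.5) for c in center] for center in centers]
    return labels, haplotypes


reads = [
    [0, 0, 1, 1],
    [0, 0, 1, None],
    [0, 1, 1, 1],      # sequencing error at the second site
    [1, 1, 0, 0],
    [None, 1, 0, 0],
    [1, 1, 0, 1],      # sequencing error at the fourth site
]
labels, haplotypes = two_means_haplotypes(reads)
print(labels)       # [0, 0, 0, 1, 1, 1]
print(haplotypes)   # [[0, 0, 1, 1], [1, 1, 0, 0]]
```

Each cluster takes the majority allele at every site, so isolated sequencing errors are outvoted. This matches the abstract's point that phasing helps separate read errors from true variants.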
Bilow, Michael; Crespo, Fernando; Pan, Zhicheng; Eskin, Eleazar; Eyheramendy, Susana Simultaneous Modeling of Disease Status and Clinical Phenotypes To Increase Power in Genome-Wide Association Studies. Journal Article Genetics, 205 (3), pp. 1041-1047, 2017, ISSN: 1943-2631. Tags: Causal Inference Biology, Covariates @article{Bilow:Genetics:2017, title = {Simultaneous Modeling of Disease Status and Clinical Phenotypes To Increase Power in Genome-Wide Association Studies.}, author = { Michael Bilow and Fernando Crespo and Zhicheng Pan and Eleazar Eskin and Susana Eyheramendy}, url = {http://dx.doi.org/10.1534/genetics.116.198473}, issn = {1943-2631}, year = {2017}, date = {2017-01-01}, journal = {Genetics}, volume = {205}, number = {3}, pages = {1041-1047}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, California.}, abstract = {Genome-wide association studies have identified thousands of variants implicated in dozens of complex diseases. Most studies collect individuals with and without disease and search for variants with different frequencies between the groups. For many of these studies, additional disease traits are also collected. Jointly modeling clinical phenotype and disease status is a promising way to increase power to detect true associations between genetics and disease. In particular, this approach increases the potential for discovering genetic variants that are associated with both a clinical phenotype and a disease. Standard multivariate techniques fail to effectively solve this problem, because their case-control status is discrete and not continuous. Standard approaches to estimate model parameters are biased due to the ascertainment in case-control studies. We present a novel method that resolves both of these issues for simultaneous association testing of genetic variants that have both case status and a clinical covariate. We demonstrate the utility of our method using both simulated data and the Northern Finland Birth Cohort data}, keywords = {Causal Inference Biology, Covariates}, pubstate = {published}, tppubtype = {article} } |
Park, Danny S; Eskin, Itamar; Kang, Eun Yong; Gamazon, Eric R; Eng, Celeste; Gignoux, Christopher R; Galanter, Joshua M; Burchard, Esteban; Ye, Chun J; Aschard, Hugues; Eskin, Eleazar; Halperin, Eran; Zaitlen, Noah An ancestry-based approach for detecting interactions. Journal Article Genet Epidemiol, 2017, ISSN: 1098-2272. Tags: Ancestry Mapping, Gene-Gene Interactions @article{Park:GenetEpidemiol:2017, title = {An ancestry-based approach for detecting interactions.}, author = { Danny S. Park and Itamar Eskin and Eun Yong Kang and Eric R. Gamazon and Celeste Eng and Christopher R. Gignoux and Joshua M. Galanter and Esteban Burchard and Chun J. Ye and Hugues Aschard and Eleazar Eskin and Eran Halperin and Noah Zaitlen}, url = {http://dx.doi.org/10.1002/gepi.22087}, issn = {1098-2272}, year = {2017}, date = {2017-01-01}, journal = {Genet Epidemiol}, address = {United States}, organization = {Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.}, abstract = {BACKGROUND: Epistasis and gene-environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene-environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies. RESULTS: In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry, defined as the proportion of ancestry derived from each ancestral population (e.g., the fraction of European/African ancestry in African Americans), in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine interactions that were significant at P<5$\times$10$^{-8}$. We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low P-values (P<1.8$\times$10$^{-6}$). CONCLUSION: We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits}, keywords = {Ancestry Mapping, Gene-Gene Interactions}, pubstate = {published}, tppubtype = {article} } |
Lozano, Jose A; Hormozdiari, Farhad; Joo, Jong Wha; Han, Buhm; Eskin, Eleazar The Multivariate Normal Distribution Framework for Analyzing Association Studies Journal Article bioRxiv, pp. 208199, 2017. Tags: Fine Mapping, Multi-SNP Association, Multiple Testing @article{Lozano:Biorxiv:2017, title = {The Multivariate Normal Distribution Framework for Analyzing Association Studies}, author = { Jose A. Lozano and Farhad Hormozdiari and Jong Wha Joo and Buhm Han and Eleazar Eskin}, url = {http://dx.doi.org/10.1101/208199}, year = {2017}, date = {2017-01-01}, journal = {bioRxiv}, pages = {208199}, publisher = {Cold Spring Harbor Laboratory}, organization = {UCLA}, abstract = {Genome-wide association studies (GWAS) have discovered thousands of variants involved in common human diseases. In these studies, frequencies of genetic variants are compared between a cohort of individuals with a disease (cases) and a cohort of healthy individuals (controls). Any variant that has a significantly different frequency between the two cohorts is considered an associated variant. A challenge in the analysis of GWAS studies is the fact that human population history causes nearby genetic variants in the genome to be correlated with each other. In this review, we demonstrate how to utilize the multivariate normal (MVN) distribution to explicitly take into account the correlation between genetic variants in a comprehensive framework for analysis of GWAS. We show how the MVN framework can be applied to perform association testing, correct for multiple hypothesis testing, estimate statistical power, and perform fine mapping and imputation.}, keywords = {Fine Mapping, Multi-SNP Association, Multiple Testing}, pubstate = {published}, tppubtype = {article} } |
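The MVN framework summarized above treats the association statistics at nearby SNPs as jointly multivariate normal, with the LD (correlation) matrix as their covariance. Multiple testing correction is one of the uses the abstract lists, and plain Monte Carlo is a simple way to sketch it. Simulate null z-score vectors from MVN(0, Σ), record the largest |z| in each draw, and convert its (1 − α) quantile into a per-SNP p-value cutoff. This is a sketch under stated assumptions, not the procedure from the review. It assumes Σ is known and uses stdlib-only sampling through a Cholesky factor. The function names and the toy LD matrix are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List

Matrix = List[List[float]]


def cholesky(sigma: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                chol[i][j] = (sigma[i][j] - s) / chol[j][j]
    return chol


def sample_null_z(chol: Matrix, rng: random.Random) -> List[float]:
    """Draw one null vector z ~ MVN(0, sigma) as z = L e with e ~ N(0, I)."""
    e = [rng.gauss(0.0, 1.0) for _ in range(len(chol))]
    return [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(len(chol))]


def per_snp_threshold(sigma: Matrix, alpha: float = 0.05,
                      n_sims: int = 20000, seed: int = 1) -> float:
    """Per-SNP p-value cutoff keeping the family-wise error rate near alpha."""
    rng = random.Random(seed)
    chol = cholesky(sigma)
    # Null distribution of the largest |z| across the correlated SNPs.
    max_abs = sorted(max(abs(z) for z in sample_null_z(chol, rng))
                     for _ in range(n_sims))
    idx = min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)
    z_crit = max_abs[idx]
    # Convert the critical |z| into a two-sided per-SNP p-value.
    return 2 * (1 - NormalDist().cdf(z_crit))


def equicorrelated(n: int, rho: float) -> Matrix:
    """Toy LD matrix in which every pair of SNPs has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


independent = per_snp_threshold(equicorrelated(3, 0.0))
correlated = per_snp_threshold(equicorrelated(3, 0.9))
print(f"independent: {independent:.4f}, rho=0.9: {correlated:.4f}")
```

With three independent SNPs, the cutoff should land close to the Šidák value 1 − 0.95^(1/3) ≈ 0.017. With strong LD (ρ = 0.9) it is noticeably less stringent, because correlated tests behave like fewer independent ones.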
Lee, C H; Eskin, E; Han, B Increasing the power of meta-analysis of genome-wide association studies to detect heterogeneous effects. Journal Article Bioinformatics, 33 (14), pp. i379-i388, 2017, ISSN: 1367-4811. Tags: Meta-Analysis @article{Lee:Bioinformatics:2017, title = {Increasing the power of meta-analysis of genome-wide association studies to detect heterogeneous effects.}, author = { C. H. Lee and E. Eskin and B. Han}, url = {http://dx.doi.org/10.1093/bioinformatics/btx242}, issn = {1367-4811}, year = {2017}, date = {2017-01-01}, journal = {Bioinformatics}, volume = {33}, number = {14}, pages = {i379-i388}, address = {England}, organization = {Department of Convergence Medicine, University of Ulsan College of Medicine & Asan Institute for Life Sciences, Asan Medical Center, Songpa-gu, Seoul 138-736, Korea.}, abstract = {Motivation: Meta-analysis is essential to combine the results of genome-wide association studies (GWASs). Recent large-scale meta-analyses have combined studies of different ethnicities, environments and even studies of different related phenotypes. These differences between studies can manifest as effect size heterogeneity. We previously developed a modified random effects model (RE2) that can achieve higher power to detect heterogeneous effects than the commonly used fixed effects model (FE). However, RE2 cannot perform meta-analysis of correlated statistics, which are found in recent research designs, and the identified variants often overlap with those found by FE. Results: Here, we propose RE2C, which increases the power of RE2 in two ways. First, we generalized the likelihood model to account for correlations of statistics to achieve optimal power, using an optimization technique based on spectral decomposition for efficient parameter estimation. Second, we designed a novel statistic to focus on the heterogeneous effects that FE cannot detect, thereby, increasing the power to identify new associations. We developed an efficient and accurate p-value approximation procedure using analytical decomposition of the statistic. In simulations, RE2C achieved a dramatic increase in power compared with the decoupling approach (71% vs. 21%) when the statistics were correlated. Even when the statistics are uncorrelated, RE2C achieves a modest increase in power. Applications to real genetic data supported the utility of RE2C. RE2C is highly efficient and can meta-analyze one hundred GWASs in one day. Availability and implementation: The software is freely available at http://software.buhmhan.com/RE2C . Contact: buhm.han@amc.seoul.kr. Supplementary information: Supplementary data are available at Bioinformatics online}, keywords = {Meta-Analysis}, pubstate = {published}, tppubtype = {article} } |
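The RE2C abstract measures its gains against the commonly used fixed effects model (FE). As a point of reference, here is a minimal sketch of the standard inverse-variance FE meta-analysis, together with Cochran's Q and I², the textbook measures of the effect-size heterogeneity this paper targets. This is not RE2 or RE2C, which use a random-effects likelihood and are available from the RE2C software linked above. The sketch assumes independent studies, each summarized by an effect estimate and its standard error. The function names are illustrative.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def fixed_effects(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted FE estimate, its standard error, and a two-sided p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return beta, se, p


def heterogeneity(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Cochran's Q (chi-square with k - 1 df under homogeneity) and I^2 = max(0, (Q - df) / Q)."""
    beta_fe, _, _ = fixed_effects(betas, ses)
    q = sum(((b - beta_fe) / se) ** 2 for b, se in zip(betas, ses))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Toy studies: identical FE estimate and p-value, very different heterogeneity.
homogeneous = [0.1, 0.1, 0.1]
heterogeneous = [0.3, -0.1, 0.1]
ses = [0.05, 0.05, 0.05]
for betas in (homogeneous, heterogeneous):
    beta, se, p = fixed_effects(betas, ses)
    q, i2 = heterogeneity(betas, ses)
    print(f"beta_FE={beta:.3f} p={p:.2e} Q={q:.1f} I2={i2:.2f}")
```

In the toy example, both sets of study effects give the same FE estimate (0.1) and the same p-value, while Q and I² differ sharply. FE is blind to heterogeneity of this kind, and this is the case where a heterogeneity-aware test such as RE2 can gain power.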
Duong, Dat; Gai, Lisa; Snir, Sagi; Kang, Eun Yong; Han, Buhm; Sul, Jae Hoon; Eskin, Eleazar Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes. Journal Article Bioinformatics, 33 (14), pp. i67-i74, 2017, ISSN: 1367-4811. Tags: Expression QTLs, Meta-Analysis @article{Duong:Bioinformatics:2017, title = {Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes.}, author = {Dat Duong and Lisa Gai and Sagi Snir and Eun Yong Kang and Buhm Han and Jae Hoon Sul and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btx227}, issn = {1367-4811}, year = {2017}, date = {2017-01-01}, journal = {Bioinformatics}, volume = {33}, number = {14}, pages = {i67-i74}, address = {England}, organization = {Department of Computer Science, University of California, Los Angeles, CA 90095, USA.}, abstract = {Motivation: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Genotype-Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues.
For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues. Results: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses. Availability and Implementation: Source code is at https://github.com/datduong/RECOV . Contact: eeskin@cs.ucla.edu or datdb@cs.ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online}, keywords = {Expression QTLs, Meta-Analysis}, pubstate = {published}, tppubtype = {article} }
Hormozdiari, Farhad; Zhu, Anthony; Kichaev, Gleb; Ju, Chelsea J-T; Segrè, Ayellet V; Joo, Jong Wha J; Won, Hyejung; Sankararaman, Sriram; Pasaniuc, Bogdan; Shifman, Sagiv; Eskin, Eleazar Widespread Allelic Heterogeneity in Complex Traits. Journal Article Am J Hum Genet, 100 (5), pp. 789-802, 2017, ISSN: 1537-6605. Tags: Fine Mapping @article{Hormozdiari:AmJHumGenet:2017, title = {Widespread Allelic Heterogeneity in Complex Traits.}, author = {Farhad Hormozdiari and Anthony Zhu and Gleb Kichaev and Chelsea J-T Ju and Ayellet V. Segrè and Jong Wha J. Joo and Hyejung Won and Sriram Sankararaman and Bogdan Pasaniuc and Sagiv Shifman and Eleazar Eskin}, url = {http://dx.doi.org/10.1016/j.ajhg.2017.04.005}, issn = {1537-6605}, year = {2017}, date = {2017-01-01}, journal = {Am J Hum Genet}, volume = {100}, number = {5}, pages = {789-802}, address = {United States}, organization = {Department of Computer Science, University of California, Los Angeles, CA 90095, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.}, abstract = {Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4%-23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia.
For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R² = 0.85}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
Ritchie, Marylyn D; Davis, Joe R; Aschard, Hugues; Battle, Alexis; Conti, David; Du, Mengmeng; Eskin, Eleazar; Fallin, Daniele M; Hsu, Li; Kraft, Peter; Moore, Jason H; Pierce, Brandon L; Bien, Stephanie A; Thomas, Duncan C; Wei, Peng; Montgomery, Stephen B Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions. Journal Article Am J Epidemiol, 186 (7), pp. 771-777, 2017, ISSN: 1476-6256. Tags: Genes By Environment @article{Ritchie:AmJEpidemiol:2017, title = {Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions.}, author = { Marylyn D. Ritchie and Joe R. Davis and Hugues Aschard and Alexis Battle and David Conti and Mengmeng Du and Eleazar Eskin and M. Daniele Fallin and Li Hsu and Peter Kraft and Jason H. Moore and Brandon L. Pierce and Stephanie A. Bien and Duncan C. Thomas and Peng Wei and Stephen B. Montgomery}, url = {http://dx.doi.org/10.1093/aje/kwx229}, issn = {1476-6256}, year = {2017}, date = {2017-01-01}, journal = {Am J Epidemiol}, volume = {186}, number = {7}, pages = {771-777}, address = {United States}, abstract = {A growing knowledge base of genetic and environmental information has greatly enabled the study of disease risk factors. However, the computational complexity and statistical burden of testing all variants by all environments has required novel study designs and hypothesis-driven approaches. We discuss how incorporating biological knowledge from model organisms, functional genomics, and integrative approaches can empower the discovery of novel gene-environment interactions and discuss specific methodological considerations with each approach.
We consider specific examples where the application of these approaches has uncovered effects of gene-environment interactions relevant to drug response and immunity, and we highlight how such improvements enable a greater understanding of the pathogenesis of disease and the realization of precision medicine}, keywords = {Genes By Environment}, pubstate = {published}, tppubtype = {article} }
2016
Mangul, Serghei; Loohuis, Loes Olde M; Ori, Anil; Jospin, Guillaume; Koslicki, David; Yang, Harry Taegyun; Wu, Timothy; Boks, Marco P; Lomen-Hoerth, Catherine; Wiedau-Pazos, Martina; Cantor, Rita; de Vos, Willem M; Kahn, Rene S; Eskin, Eleazar; Ophoff, Roel A Total RNA Sequencing reveals microbial communities in human blood and disease specific effects. Journal Article BioRxiv, (057570), 2016. Tags: blood microbiome, RNA sequencing, schizophrenia, unmapped reads @article{Mangul2016b, title = {Total RNA Sequencing reveals microbial communities in human blood and disease specific effects.}, author = {Serghei Mangul and Loes M Olde Loohuis and Anil Ori and Guillaume Jospin and David Koslicki and Harry Taegyun Yang and Timothy Wu and Marco P Boks and Catherine Lomen-Hoerth and Martina Wiedau-Pazos and Rita Cantor and Willem M de Vos and Rene S Kahn and Eleazar Eskin and Roel A. Ophoff}, url = {http://biorxiv.org/content/early/2016/06/07/057570}, doi = {10.1101/057570}, year = {2016}, date = {2016-06-07}, journal = {BioRxiv}, number = {057570}, abstract = {An increasing body of evidence suggests an important role of the human microbiome in health and disease. We propose a 'lost and found' pipeline, which examines high-quality unmapped sequence reads for microbial taxonomic classification. Using this pipeline, we are able to detect bacterial and archaeal phyla in blood using RNA sequencing (RNA-Seq) data. Careful analyses, including the use of positive and negative control datasets, suggest that these detected phyla represent true microbial communities in whole blood and are not due to contaminants. We applied our pipeline to study the composition of microbial communities present in blood across 192 individuals from four subject groups: schizophrenia (n=48), amyotrophic lateral sclerosis (n=47), bipolar disorder (n=48) and healthy controls (n=49).
We observe a significantly increased microbial diversity in schizophrenia compared to the three other groups and replicate this finding in an independent schizophrenia case-control study. Our results demonstrate the potential use of total RNA to study microbes that inhabit the human body.}, keywords = {blood microbiome, RNA sequencing, schizophrenia, unmapped reads}, pubstate = {published}, tppubtype = {article} }
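The abstract does not state which diversity measure was used. Purely as an illustrative assumption, the sketch below computes the Shannon index, a common alpha-diversity measure, from one sample's counts of reads assigned to each phylum; `shannon_diversity` is a hypothetical helper and not part of the published pipeline.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    `counts` holds the number of reads assigned to each taxon (e.g. phylum)
    in one sample.
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no classified reads")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing (0 * ln 0 := 0)
            p = c / total
            h -= p * math.log(p)
    return h
```

Evenly spread counts such as `[10, 10, 10, 10]` give ln 4 ≈ 1.386, while a sample in which every read falls in a single phylum gives 0. Per-sample values like these could then be compared between subject groups.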
Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas; Gruhl, Franziska; Daley, Timothy; Christenson, Stephanie; Andersen, Agata Wesolowska; Spreafico, Roberto; Rios, Cydney; Eng, Celeste; Smith, Andrew D; Hernandez, Ryan D; Ophoff, Roel A; Santana, Jose Rodriguez; Woodruff, Prescott G; Burchard, Esteban; Seibold, Max A; Shifman, Sagiv; Eskin, Eleazar; Zaitlen, Noah Dumpster diving in RNA-sequencing to find the source of every last read. Journal Article BioRxiv, 2016. Tags: read origin protocol, RNA sequencing, unmapped reads @article{Mangul2016, title = {Dumpster diving in RNA-sequencing to find the source of every last read.}, author = {Serghei Mangul and Harry Taegyun Yang and Nicolas Strauli and Franziska Gruhl and Timothy Daley and Stephanie Christenson and Agata Wesolowska Andersen and Roberto Spreafico and Cydney Rios and Celeste Eng and Andrew D. Smith and Ryan D. Hernandez and Roel A. Ophoff and Jose Rodriguez Santana and Prescott G. Woodruff and Esteban Burchard and Max A. Seibold and Sagiv Shifman and Eleazar Eskin and Noah Zaitlen}, url = {http://biorxiv.org/content/early/2016/05/13/053041.article-info}, doi = {http://dx.doi.org/10.1101/053041}, year = {2016}, date = {2016-05-13}, journal = {BioRxiv}, keywords = {read origin protocol, RNA sequencing, unmapped reads}, pubstate = {published}, tppubtype = {article} }
Duong, Dat; Zou, Jennifer; Hormozdiari, Farhad; Sul, Jae Hoon; Ernst, Jason; Han, Buhm; Eskin, Eleazar Using genomic annotations increases statistical power to detect eGenes. Journal Article Bioinformatics, 32 (12), pp. i156-i163, 2016, ISSN: 1367-4811. Tags: Expression QTLs @article{Duong:Bioinformatics:2016, title = {Using genomic annotations increases statistical power to detect eGenes.}, author = {Duong, Dat and Zou, Jennifer and Hormozdiari, Farhad and Sul, Jae Hoon and Ernst, Jason and Han, Buhm and Eskin, Eleazar}, url = {http://bioinformatics.oxfordjournals.org/content/32/12/i156.abstract}, doi = {10.1093/bioinformatics/btw272}, issn = {1367-4811}, year = {2016}, date = {2016-01-01}, journal = {Bioinformatics}, volume = {32}, number = {12}, pages = {i156-i163}, address = {England}, abstract = {MOTIVATION: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes, or genes whose expression is associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method, however, does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. RESULTS: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detect more candidate eGenes.
Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. CONTACT: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu}, keywords = {Expression QTLs}, pubstate = {published}, tppubtype = {article} }
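The annotation-aware method itself is not reproduced here. The sketch below shows only the standard permutation baseline the abstract describes: test every nearby variant, keep the strongest association, and compare it with the same maximum computed on permuted expression values. It assumes a single-variant linear model without covariates, in which ranking variants by |Pearson r| is equivalent to ranking them by p-value at a fixed sample size; `abs_corr` and `egene_permutation_pvalue` are hypothetical names.

```python
import math
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def egene_permutation_pvalue(genotypes, expression, n_perm=1000, seed=0):
    """Permutation p-value for 'is this gene an eGene?'.

    genotypes:  one list of dosages (0/1/2) per nearby (cis) variant
    expression: expression level of the gene, one value per individual
    """
    rng = random.Random(seed)
    # Observed statistic: strongest association across all nearby variants.
    observed = max(abs_corr(g, expression) for g in genotypes)
    y = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y)  # breaks the genotype-expression link, keeps LD among variants
        null_max = max(abs_corr(g, y) for g in genotypes)
        if null_max >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Because the same permuted expression vector is tested against every variant, the null maximum automatically reflects the correlation (LD) among nearby variants. The smallest p-value this estimator can report is 1/(n_perm + 1), which is why the permutation approach becomes expensive when many genes must be tested at stringent thresholds.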
Han, Buhm; Duong, Dat; Sul, Jae Hoon; de Bakker, Paul I W; Eskin, Eleazar; Raychaudhuri, Soumya A general framework for meta-analyzing dependent studies with overlapping subjects in association mapping. Journal Article Hum Mol Genet, 2016, ISSN: 1460-2083. Tags: eQTL, genome-wide association studies, Meta-Analysis @article{Han:HumMolGenet:2016, title = {A general framework for meta-analyzing dependent studies with overlapping subjects in association mapping.}, author = {Buhm Han and Dat Duong and Jae Hoon Sul and Paul I. W. de Bakker and Eleazar Eskin and Soumya Raychaudhuri}, url = {http://dx.doi.org/10.1093/hmg/ddw049}, issn = {1460-2083}, year = {2016}, date = {2016-01-01}, journal = {Hum Mol Genet}, abstract = {Meta-analysis strategies have become critical to augment power of genome-wide association studies (GWAS). To reduce genotyping or sequencing cost, many studies today utilize shared controls, and these individuals can inadvertently overlap among multiple studies. If these overlapping individuals are not taken into account in meta-analysis, they can induce spurious associations. In this paper, we propose a general framework for adjusting association statistics to account for overlapping subjects within a meta-analysis. The key idea of our method is to transform the covariance structure of the data so it can be used in downstream analyses. As a result, the strategy is very flexible, and allows a wide range of meta-analysis methods, such as the random effects model, to account for overlapping subjects.
Using simulations and real datasets, we demonstrate that our method has utility in meta-analyses of GWAS, as well as in a multi-tissue mouse eQTL study where our method increases the number of discovered eQTLs by up to 19% compared to existing methods}, keywords = {eQTL, genome-wide association studies, Meta-Analysis}, pubstate = {published}, tppubtype = {article} }
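The paper's covariance transformation is more general than what is shown here. As a minimal sketch of why overlapping subjects matter, the code below combines per-study z-scores with generalized least squares weights, given a known correlation matrix Σ among them (for instance, correlation induced by shared controls). Under the null the result is standard normal, and with Σ = I it reduces to the equal-weight Stouffer z. Equal expected z across studies is an assumption of this sketch, estimating Σ is out of scope, and `correlated_fe_z` is a hypothetical name.

```python
import numpy as np


def correlated_fe_z(z, corr):
    """Combine per-study z-scores that may be correlated.

    z:    per-study z-scores for the same variant
    corr: correlation matrix of those z-scores under the null
          (e.g. induced by overlapping/shared control subjects)
    The GLS weights Sigma^{-1} 1 give a statistic that is N(0, 1) under the
    null; with corr = identity this is the equal-weight Stouffer z.
    """
    z = np.asarray(z, dtype=float)
    sigma = np.asarray(corr, dtype=float)
    ones = np.ones_like(z)
    w = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1 without forming the inverse
    return float(w @ z / np.sqrt(w @ ones))


print(correlated_fe_z([2.0, 2.0], np.eye(2)))
print(correlated_fe_z([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]]))
```

Two studies with z = 2 each combine to 4/√2 ≈ 2.83 if treated as independent, but only to 4/√3 ≈ 2.31 once a correlation of 0.5 is accounted for; ignoring the overlap therefore overstates significance, which is how spurious associations arise.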
Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun Y; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models. Journal Article PLoS Genet, 12 (3), pp. e1005849, 2016, ISSN: 1553-7404. Tags: gene-by-environment interactions, genome-wide association studies, Mixed Models @article{Sul:PlosGenet:2016, title = {Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.}, author = {Jae Hoon Sul and Michael Bilow and Wen-Yun Y. Yang and Emrah Kostem and Nick Furlotte and Dan He and Eleazar Eskin}, url = {http://dx.doi.org/10.1371/journal.pgen.1005849}, issn = {1553-7404}, year = {2016}, date = {2016-01-01}, journal = {PLoS Genet}, volume = {12}, number = {3}, pages = {e1005849}, address = {United States}, abstract = {Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for population structure on GEI statistics.
In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants}, keywords = {gene-by-environment interactions, genome-wide association studies, Mixed Models}, pubstate = {published}, tppubtype = {article} }
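As a minimal sketch of the mixed-model idea, assuming the kinship matrix K and the variance components σ_g² and σ_e² are already known (in practice they are estimated, e.g. by REML), the code below fits y = b0 + b_g*g + b_e*e + b_ge*(g*e) + u + ε, with u ~ N(0, σ_g² K) and ε ~ N(0, σ_e² I), by generalized least squares. The GEI test uses b_ge and its standard error. This is an illustrative GLS formulation, not necessarily the paper's exact estimator, and `gei_gls` is a hypothetical name.

```python
import numpy as np


def gei_gls(y, g, e, kinship, sigma_g2, sigma_e2):
    """GLS fit of a gene-by-environment model with a polygenic random effect.

    Model: y = b0 + b_g*g + b_e*e + b_ge*(g*e) + u + eps,
           u ~ N(0, sigma_g2 * K), eps ~ N(0, sigma_e2 * I).
    Variance components are taken as known inputs here.
    Returns (coefficients [b0, b_g, b_e, b_ge], standard errors).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    e = np.asarray(e, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), g, e, g * e])
    V = sigma_g2 * np.asarray(kinship, dtype=float) + sigma_e2 * np.eye(n)
    L = np.linalg.cholesky(V)     # V = L L^T
    X_w = np.linalg.solve(L, X)   # whitened design, L^{-1} X
    y_w = np.linalg.solve(L, y)   # whitened phenotype, L^{-1} y
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    cov = np.linalg.inv(X_w.T @ X_w)  # equals (X^T V^{-1} X)^{-1}
    return beta, np.sqrt(np.diag(cov))
```

The interaction z-score is `beta[3] / se[3]`. Setting `sigma_g2 = 0` collapses V to a scaled identity and recovers ordinary least squares, the uncorrected model in which, per the abstract, population structure can produce spurious GEIs.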
Joo, Jong Wha J; Hormozdiari, Farhad; Han, Buhm; Eskin, Eleazar Multiple testing correction in linear mixed models. Journal Article Genome Biol, 17 (1), pp. 62, 2016, ISSN: 1474-760X. Tags: genome-wide association studies, Mixed Models, Multiple Testing @article{Joo:GenomeBiol:2016, title = {Multiple testing correction in linear mixed models.}, author = {Jong Wha J. Joo and Farhad Hormozdiari and Buhm Han and Eleazar Eskin}, url = {http://dx.doi.org/10.1186/s13059-016-0903-6}, issn = {1474-760X}, year = {2016}, date = {2016-01-01}, journal = {Genome Biol}, volume = {17}, number = {1}, pages = {62}, address = {England}, abstract = {BACKGROUND: Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM. RESULTS: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach. CONCLUSIONS: We provide an efficient and accurate multiple testing correction approach for linear mixed models.
We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data}, keywords = {genome-wide association studies, Mixed Models, Multiple Testing}, pubstate = {published}, tppubtype = {article} }
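The paper's exact procedure is not reproduced here. One common alternative to permuting phenotypes, assumed for illustration, is to draw null test statistics directly from a multivariate normal distribution with the statistics' correlation matrix (in an LMM this matrix depends on genetic relatedness and heritability; here it is simply an input), take the (1 - α) quantile of the maximum |z|, and convert it to a per-marker p-value threshold. `per_marker_threshold` is a hypothetical name.

```python
import math

import numpy as np


def per_marker_threshold(corr, alpha=0.05, n_samples=10000, seed=0):
    """Per-marker p-value threshold controlling the family-wise error at alpha.

    corr: correlation matrix of the per-marker test statistics under the null
    Null z-vectors are sampled from N(0, corr) instead of permuting phenotypes.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_samples)
    max_abs = np.abs(z).max(axis=1)                 # strongest null signal per draw
    z_crit = float(np.quantile(max_abs, 1.0 - alpha))
    # Two-sided p-value at |z| = z_crit: 2 * (1 - Phi(z_crit)) = erfc(z_crit / sqrt(2)).
    return math.erfc(z_crit / math.sqrt(2.0))
```

For 10 independent markers the threshold comes out near 1 - 0.95^(1/10) ≈ 0.0051, close to the Bonferroni value of 0.005, whereas for perfectly correlated markers it approaches 0.05. Real genomes fall between these extremes, which is why a threshold that respects the correlation structure is less conservative than Bonferroni.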
Lusis, Aldons J; Seldin, Marcus; Allayee, Hooman; Bennett, Brian J; Civelek, Mete; Davis, Richard C; Eskin, Eleazar; Farber, Charles; Hui, Simon T; Mehrabian, Margarete; Norheim, Frode; Pan, Calvin; Parks, Brian; Rau, Christoph; Smith, Desmond J; Vallim, Thomas; Wang, Yibin; Wang, Jessica The Hybrid Mouse Diversity Panel: A Resource for Systems Genetics Analyses of Metabolic and Cardiovascular Traits. Journal Article J Lipid Res, 2016, ISSN: 1539-7262. Tags: genome-wide association studies, Hybrid Mouse Diversity Panel, Mouse Genetics @article{Lusis:JLipidRes:2016, title = {The Hybrid Mouse Diversity Panel: A Resource for Systems Genetics Analyses of Metabolic and Cardiovascular Traits.}, author = {Aldons J. Lusis and Marcus Seldin and Hooman Allayee and Brian J. Bennett and Mete Civelek and Richard C. Davis and Eleazar Eskin and Charles Farber and Simon T. Hui and Margarete Mehrabian and Frode Norheim and Calvin Pan and Brian Parks and Christoph Rau and Desmond J. Smith and Thomas Vallim and Yibin Wang and Jessica Wang}, url = {http://dx.doi.org/10.1194/jlr.R066944}, issn = {1539-7262}, year = {2016}, date = {2016-01-01}, journal = {J Lipid Res}, abstract = {The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While not nearly as powerful for mapping genetic loci contributing to the traits as human Genome-Wide Association Studies (GWAS), it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated.
Thus far, the HMDP has been studied for traits relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, fatty liver disease, and host-gut microbiota interactions. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of the mice under various environmental conditions. All of the published data are available and can be readily used to formulate hypotheses about genes, pathways and interactions}, keywords = {genome-wide association studies, Hybrid Mouse Diversity Panel, Mouse Genetics}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Yang, Harry; Hormozdiari, Farhad; Tseng, Elizabeth; Zelikovsky, Alex; Eskin, Eleazar. HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads. Book Chapter Bioinformatics Research and Applications, pp. 80-92, Springer International Publishing, 2016. Tags: RNAseq; Haplotyping from Sequences
@inbook{Mangul:BioinformaticsResearchAndApplications:2016, title = {HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads}, author = {Mangul, Serghei and Yang, Harry and Hormozdiari, Farhad and Tseng, Elizabeth and Zelikovsky, Alex and Eskin, Eleazar}, url = {http://link.springer.com/chapter/10.1007%2F978-3-319-38782-6_7}, doi = {10.1007/978-3-319-38782-6_7}, year = {2016}, date = {2016-01-01}, booktitle = {Bioinformatics Research and Applications}, pages = {80-92}, publisher = {Springer International Publishing}, organization = {University of California}, keywords = {RNAseq;Haplotyping from Sequences}, pubstate = {published}, tppubtype = {inbook} }
Hormozdiari, Farhad; Kang, Eun Yong; Bilow, Michael; Ben-David, Eyal; Vulpe, Chris; McLachlan, Stela; Lusis, Aldons J; Han, Buhm; Eskin, Eleazar. Imputing Phenotypes for Genome-wide Association Studies. Journal Article Am J Hum Genet, 99 (1), pp. 89-103, 2016, ISSN: 1537-6605. Tags: Multiple Phenotypes; Imputation
@article{Hormozdiari:AmJHumGenet:2016, title = {Imputing Phenotypes for Genome-wide Association Studies.}, author = {Hormozdiari, Farhad and Kang, Eun Yong and Bilow, Michael and Ben-David, Eyal and Vulpe, Chris and McLachlan, Stela and Lusis, Aldons J. and Han, Buhm and Eskin, Eleazar}, url = {https://www.ncbi.nlm.nih.gov/pubmed/27292110}, doi = {10.1016/j.ajhg.2016.04.013}, issn = {1537-6605}, year = {2016}, date = {2016-01-01}, journal = {Am J Hum Genet}, volume = {99}, number = {1}, pages = {89-103}, address = {United States}, abstract = {Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset}, keywords = {Multiple Phenotypes;Imputation}, pubstate = {published}, tppubtype = {article} }
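The summary-statistic form of phenotype imputation described above (the target's statistic as a weighted linear combination of related phenotypes' statistics, with weights derived from the phenotype correlation structure) can be illustrated with a minimal sketch. It assumes that, at a given SNP, the related and target z-scores are jointly normal with the same correlation matrix as the phenotypes, estimated from a smaller complete dataset, and it uses the best linear predictor with weights `R_rr^{-1} r_rt`. The function name and this particular weighting are illustrative assumptions, not the exact estimator from the paper.

```python
import numpy as np


def impute_target_z(z_related, corr_related, corr_target_related):
    """Impute a target phenotype's association z-score at one SNP.

    Assumption (illustrative): related and target z-scores are jointly
    normal with the phenotypic correlation matrix, so the best linear
    predictor uses weights w = R_rr^{-1} r_rt.
    """
    z_related = np.asarray(z_related, dtype=float)
    r_rr = np.asarray(corr_related, dtype=float)         # related x related
    r_rt = np.asarray(corr_target_related, dtype=float)  # related vs target
    # Weights of the linear combination of related-phenotype z-scores
    w = np.linalg.solve(r_rr, r_rt)
    z_hat = float(w @ z_related)
    # Under the null, Var(w^T z) = w^T R_rr w = r_rt^T w; rescale to unit variance
    return z_hat / np.sqrt(r_rt @ w), w


# Hypothetical example: two related phenotypes correlated 0.6 and 0.5 with the target
z, w = impute_target_z([4.0, 3.0], [[1.0, 0.3], [0.3, 1.0]], [0.6, 0.5])
print(f"imputed z = {z:.3f}, weights = {np.round(w, 3)}")
```

Rescaling by the predictor's null standard deviation keeps the imputed statistic on the z-score scale, so it can be compared against the usual genome-wide significance threshold; with a single related phenotype it simply returns that phenotype's z-score.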
Kang, Eun Yong; Park, Yurang; Li, Xiao; Segrè, Ayellet V; Han, Buhm; Eskin, Eleazar. ForestPMPlot: A Flexible Tool for Visualizing Heterogeneity between Studies in Meta-analysis. Journal Article G3 (Bethesda), 6 (7), pp. 1793-8, 2016, ISSN: 2160-1836. Tags: Expression QTLs; Meta-Analysis
@article{Kang:G3:2016, title = {ForestPMPlot: A Flexible Tool for Visualizing Heterogeneity between Studies in Meta-analysis.}, author = {Eun Yong Kang and Yurang Park and Xiao Li and Ayellet V. Segrè and Buhm Han and Eleazar Eskin}, url = {https://www.ncbi.nlm.nih.gov/pubmed/27194809}, doi = {10.1534/g3.116.029439}, issn = {2160-1836}, year = {2016}, date = {2016-01-01}, journal = {G3 (Bethesda)}, volume = {6}, number = {7}, pages = {1793-8}, address = {United States}, abstract = {Meta-analysis has become a popular tool for genetic association studies to combine different genetic studies. A key challenge in meta-analysis is heterogeneity, or the differences in effect sizes between studies. Heterogeneity complicates the interpretation of meta-analyses. In this paper, we describe ForestPMPlot, a flexible visualization tool for analyzing studies included in a meta-analysis. The main feature of the tool is visualizing the differences in the effect sizes of the studies to understand why the studies exhibit heterogeneity for a particular phenotype and locus pair under different conditions. We show the application of this tool to interpret a meta-analysis of 17 mouse studies and to interpret a multi-tissue eQTL study}, keywords = {Expression QTLs;Meta-Analysis}, pubstate = {published}, tppubtype = {article} }
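Heterogeneity between studies is commonly quantified with Cochran's Q and the I² statistic, computed from per-study effect sizes and standard errors under an inverse-variance fixed-effects model. The sketch below shows only these standard summary quantities; it is not how ForestPMPlot itself computes or visualizes heterogeneity, and the function name and example numbers are hypothetical.

```python
import numpy as np


def heterogeneity(betas, ses):
    """Inverse-variance fixed-effects estimate, Cochran's Q and I^2."""
    betas = np.asarray(betas, dtype=float)
    weights = 1.0 / np.asarray(ses, dtype=float) ** 2
    beta_fe = float(np.sum(weights * betas) / np.sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = float(np.sum(weights * (betas - beta_fe) ** 2))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta_fe, q, i2


# Hypothetical per-study effects at one locus (e.g. from several mouse studies)
print(heterogeneity([0.10, 0.35, 0.05, 0.40], [0.05, 0.06, 0.05, 0.07]))
```

A single I² value says how much heterogeneity exists but not which studies drive it, which is the gap a per-study visualization is meant to fill.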
Won, Hyejung; de la Torre-Ubieta, Luis; Stein, Jason L; Parikshak, Neelroop N; Huang, Jerry; Opland, Carli K; Gandal, Michael J; Sutton, Gavin J; Hormozdiari, Farhad; Lu, Daning; Lee, Changhoon; Eskin, Eleazar; Voineagu, Irina; Ernst, Jason; Geschwind, Daniel H. Chromosome conformation elucidates regulatory relationships in developing human brain. Journal Article Nature, 538 (7626), pp. 523-527, 2016, ISSN: 1476-4687. Tags: Fine Mapping
@article{Won:Nature:2016b, title = {Chromosome conformation elucidates regulatory relationships in developing human brain.}, author = { Hyejung Won and Luis de la Torre-Ubieta and Jason L. Stein and Neelroop N. Parikshak and Jerry Huang and Carli K. Opland and Michael J. Gandal and Gavin J. Sutton and Farhad Hormozdiari and Daning Lu and Changhoon Lee and Eleazar Eskin and Irina Voineagu and Jason Ernst and Daniel H. Geschwind}, url = {http://dx.doi.org/10.1038/nature19847}, issn = {1476-4687}, year = {2016}, date = {2016-01-01}, journal = {Nature}, volume = {538}, number = {7626}, pages = {523-527}, address = {England}, abstract = {Three-dimensional physical interactions within chromosomes dynamically regulate gene expression in a tissue-specific manner. However, the 3D organization of chromosomes during human brain development and its role in regulating gene networks dysregulated in neurodevelopmental disorders, such as autism or schizophrenia, are unknown. Here we generate high-resolution 3D maps of chromatin contacts during human corticogenesis, permitting large-scale annotation of previously uncharacterized regulatory relationships relevant to the evolution of human cognition and disease. Our analyses identify hundreds of genes that physically interact with enhancers gained on the human lineage, many of which are under purifying selection and associated with human cognitive function. We integrate chromatin contacts with non-coding variants identified in schizophrenia genome-wide association studies (GWAS), highlighting multiple candidate schizophrenia risk genes and pathways, including transcription factors involved in neurogenesis, and cholinergic signalling molecules, several of which are supported by independent expression quantitative trait loci and gene expression analyses. Genome editing in human neural progenitors suggests that one of these distal schizophrenia GWAS loci regulates FOXG1 expression, supporting its potential role as a schizophrenia risk gene. This work provides a framework for understanding the effect of non-coding regulatory elements on human brain development and the evolution of cognition, and highlights novel mechanisms underlying neuropsychiatric disorders}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar. Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article Genetics, 2016, ISSN: 1943-2631. Tags: Allele Specific Expression, Expression QTLs
@article{Kang:Genetics:2016, title = {Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data.}, author = { Eun Yong Kang and Lisa Martin and Serghei Mangul and Warin Isvilanonda and Jennifer Zou and Eyal Ben-David and Buhm Han and Aldons J. Lusis and Sagiv Shifman and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.115.177246}, issn = {1943-2631}, year = {2016}, date = {2016-01-01}, journal = {Genetics}, address = {United States}, organization = {University of California, Los Angeles; ekang@cs.ucla.edu.}, abstract = {The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here we increase the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We design a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-seq data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. 2309 SNPs were identified to be associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases}, keywords = {Allele Specific Expression, Expression QTLs}, pubstate = {published}, tppubtype = {article} }
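The basic signal behind allele specific expression is an imbalance between RNA-seq reads carrying the two alleles at a heterozygous site. The paper's method goes further, combining genome sequencing with ASE measurements across individuals to map cis-acting variants; the sketch below shows only the elementary per-site test, an exact two-sided binomial test against a 50/50 split. It assumes no reference-mapping bias, and the function name and read counts are hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of H0: reads split 50/50 between alleles.

    Because p = 0.5 makes the null distribution symmetric, the two-sided
    p-value is twice the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads overlapping the heterozygous site: no evidence
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical read counts at one heterozygous SNP in one individual
print(ase_binomial_pvalue(ref_reads=38, alt_reads=12))
```

Exact integer arithmetic via `math.comb` avoids any dependency beyond the standard library; at very high read depth a normal approximation would be cheaper.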
Artyomenko, Alexander; Wu, Nicholas C; Mangul, Serghei; Eskin, Eleazar; Sun, Ren; Zelikovsky, Alex. Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants. Book Chapter Research in Computational Molecular Biology, pp. 164-175, Springer International Publishing, 2016. Tags: Virus Genomics; Virus Assembly
@inbook{flu2016b, title = {Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants}, author = { Alexander Artyomenko and Nicholas C. Wu and Serghei Mangul and Eleazar Eskin and Ren Sun and Alex Zelikovsky}, doi = {10.1007/978-3-319-31957-5_12}, year = {2016}, date = {2016-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {164-175}, publisher = {Springer International Publishing}, organization = {Georgia State University}, keywords = {Virus Genomics;Virus Assembly}, pubstate = {published}, tppubtype = {inbook} }
Main, Bradley J; Lee, Yoosook; Ferguson, Heather M; Kreppel, Katharina S; Kihonda, Anicet; Govella, Nicodem J; Collier, Travis C; Cornel, Anthony J; Eskin, Eleazar; Kang, Eun Yong; Nieman, Catelyn C; Weakley, Allison M; Lanzaro, Gregory C. The Genetic Basis of Host Preference and Resting Behavior in the Major African Malaria Vector, Anopheles arabiensis. Journal Article PLoS Genet, 12 (9), pp. e1006303, 2016, ISSN: 1553-7404. Tags: Heritability
@article{Main:PlosGenet:2016, title = {The Genetic Basis of Host Preference and Resting Behavior in the Major African Malaria Vector, Anopheles arabiensis.}, author = { Bradley J. Main and Yoosook Lee and Heather M. Ferguson and Katharina S. Kreppel and Anicet Kihonda and Nicodem J. Govella and Travis C. Collier and Anthony J. Cornel and Eleazar Eskin and Eun Yong Kang and Catelyn C. Nieman and Allison M. Weakley and Gregory C. Lanzaro}, url = {http://dx.doi.org/10.1371/journal.pgen.1006303}, issn = {1553-7404}, year = {2016}, date = {2016-01-01}, journal = {PLoS Genet}, volume = {12}, number = {9}, pages = {e1006303}, address = {United States}, abstract = {Malaria transmission is dependent on the propensity of Anopheles mosquitoes to bite humans (anthropophily) instead of other dead-end hosts. Recent increases in the usage of Long Lasting Insecticide Treated Nets (LLINs) in Africa have been associated with reductions in highly anthropophilic and endophilic vectors such as Anopheles gambiae s.s., leaving species with a broader host range, such as Anopheles arabiensis, as the most prominent remaining source of transmission in many settings. An. arabiensis appears to be more of a generalist in terms of its host choice and resting behavior, which may be due to phenotypic plasticity and/or segregating allelic variation. To investigate the genetic basis of host choice and resting behavior in An. arabiensis we sequenced the genomes of 23 human-fed and 25 cattle-fed mosquitoes collected both indoors and outdoors in the Kilombero Valley, Tanzania. We identified a total of 4,820,851 SNPs, which were used to conduct the first genome-wide estimates of "SNP heritability" for host choice and resting behavior in this species. A genetic component was detected for host choice (human vs cow fed; permuted P = 0.002), but there was no evidence of a genetic component for resting behavior (indoors versus outside; permuted P = 0.465). A principal component analysis (PCA) segregated individuals based on genomic variation into three groups which were characterized by differences at the 2Rb and/or 3Ra paracentromeric chromosome inversions. There was a non-random distribution of cattle-fed mosquitoes between the PCA clusters, suggesting that alleles linked to the 2Rb and/or 3Ra inversions may influence host choice. Using a novel inversion genotyping assay, we detected a significant enrichment of the standard arrangement (non-inverted) of 3Ra among cattle-fed mosquitoes (N = 129) versus all non-cattle-fed individuals (N = 234; $\chi^2$}, keywords = {Heritability}, pubstate = {published}, tppubtype = {article} }
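The "permuted P" values reported above come from comparing an observed statistic with its distribution after shuffling phenotype labels. In the paper the statistic is a SNP-heritability estimate, which requires a variance-components model and is not reproduced here. The minimal sketch below shows only the generic permutation procedure; `mean_dosage_difference` is a hypothetical stand-in statistic, and the toy data are invented for illustration.

```python
import random


def permutation_pvalue(statistic, phenotype, n_perm=999, seed=0):
    """Permutation p-value: shuffle phenotype labels, recompute the statistic."""
    rng = random.Random(seed)
    observed = statistic(phenotype)
    permuted = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # breaks any genotype-phenotype link
        if statistic(permuted) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero
    return (exceed + 1) / (n_perm + 1)


def mean_dosage_difference(phenotype, dosages):
    """Hypothetical stand-in statistic: |mean dosage (label 1) - mean dosage (label 0)|."""
    group1 = [d for p, d in zip(phenotype, dosages) if p == 1]
    group0 = [d for p, d in zip(phenotype, dosages) if p == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


# Toy data: 1 = cattle-fed, 0 = human-fed; allele dosages at a single marker
dosages = [0, 0, 1, 0, 1, 0, 2, 1, 2, 2, 1, 2]
fed = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(permutation_pvalue(lambda ph: mean_dosage_difference(ph, dosages), fed))
```

With `n_perm` permutations the smallest attainable p-value is `1 / (n_perm + 1)`, so a reported value such as P = 0.002 implies that at least several hundred permutations were run.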
Lavinsky, Joel; Ge, Marshall; Crow, Amanda L; Pan, Calvin; Wang, Juemei; Dermanaki, Pehzman Salehi; Myint, Anthony; Eskin, Eleazar; Allayee, Hooman; Lusis, Aldons J; Friedman, Rick A. The Genetic Architecture of Noise-induced Hearing Loss: Evidence for a Gene-by-Environment Interaction. Journal Article G3 (Bethesda), 2016, ISSN: 2160-1836. Tags: Mouse Genetics
@article{Lavinsky:G3:2016, title = {The Genetic Architecture of Noise-induced Hearing Loss: Evidence for a Gene-by-Environment Interaction.}, author = { Joel Lavinsky and Marshall Ge and Amanda L. Crow and Calvin Pan and Juemei Wang and Pehzman Salehi Dermanaki and Anthony Myint and Eleazar Eskin and Hooman Allayee and Aldons J. Lusis and Rick A. Friedman}, url = {http://dx.doi.org/10.1534/g3.116.032516}, issn = {2160-1836}, year = {2016}, date = {2016-01-01}, journal = {G3 (Bethesda)}, abstract = {The discovery of environmentally specific genetic effects is crucial to the understanding of complex traits, such as susceptibility to noise-induced hearing loss (NIHL). In this manuscript we describe the first genome-wide association study (GWAS) for NIHL in a large and well-characterized population of inbred mouse strains known as the Hybrid Mouse Diversity Panel (HMDP). We recorded auditory brainstem response (ABR) thresholds both pre and post 2-hour exposure to 10 kHz octave band noise at 108 dB SPL (sound pressure level) in 5-6 week-old female mice from the HMDP (4-5 mice/strain). From the observation that NIHL susceptibility varied among the strains, we performed a GWAS with correction for population structure and mapped a locus on chromosome 6 that was statistically significantly associated with two adjacent frequencies. We then used a 'genetical genomics' approach that included the analysis of cochlear eQTLs to identify candidate genes within the GWAS QTL. In order to validate the gene-by-environment interaction, we compared the effects of the post noise exposure locus with that from the same unexposed strains. The most significant SNP at chromosome 6 (rs37517079) was associated with noise susceptibility, but was not significant at the same frequencies in our unexposed study. These findings demonstrate that the genetic architecture of NIHL is distinct from that of unexposed hearing levels and provide strong evidence for gene-by-environment interactions in NIHL}, keywords = {Mouse Genetics}, pubstate = {published}, tppubtype = {article} }
Kichaev, Gleb; Roytman, Megan; Johnson, Ruth; Eskin, Eleazar; Lindström, Sara; Kraft, Peter; Pasaniuc, Bogdan. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Journal Article Bioinformatics, 2016, ISSN: 1367-4811. Tags: Fine Mapping
@article{Kichaev:Bioinformatics:2016, title = {Improved methods for multi-trait fine mapping of pleiotropic risk loci.}, author = { Gleb Kichaev and Megan Roytman and Ruth Johnson and Eleazar Eskin and Sara Lindström and Peter Kraft and Bogdan Pasaniuc}, url = {http://dx.doi.org/10.1093/bioinformatics/btw615}, issn = {1367-4811}, year = {2016}, date = {2016-01-01}, journal = {Bioinformatics}, abstract = {MOTIVATION: Genome-wide association studies (GWAS) have identified thousands of regions in the genome that contain genetic variants that increase risk for complex traits and diseases. However, the variants uncovered in GWAS are typically not biologically causal, but rather, correlated to the true causal variant through linkage disequilibrium (LD). To discern the true causal variant(s), a variety of statistical fine-mapping methods have been proposed to prioritize variants for functional validation. RESULTS: In this work we introduce a new approach, fastPAINTOR, that leverages evidence across correlated traits, as well as functional annotation data, to improve fine-mapping accuracy at pleiotropic risk loci. To improve computational efficiency, we describe a new importance sampling scheme to perform model inference. First, we demonstrate in simulations that by leveraging functional annotation data, fastPAINTOR increases fine-mapping resolution relative to existing methods. Next, we show that jointly modeling pleiotropic risk regions improves fine-mapping resolution compared to standard single trait and pleiotropic fine mapping strategies. We report a reduction in the number of SNPs required for follow-up in order to capture 90% of the causal variants from 23 SNPs per locus using a single trait to 12 SNPs when fine-mapping two traits simultaneously. Finally, we analyze summary association data from a large-scale GWAS of lipids and show that these improvements are largely sustained in real data. AVAILABILITY AND IMPLEMENTATION: The fastPAINTOR framework is implemented in the PAINTOR v3.0 package which is publicly available to the research community http://bogdan.bioinformatics.ucla.edu/software/paintor CONTACT: gkichaev@ucla.edu}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
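The follow-up figures above (23 versus 12 SNPs per locus to capture 90% of the causal variants) count how many top-ranked SNPs are needed once per-SNP posterior probabilities of causality are available. A minimal sketch under the assumption that those posteriors are given as input; it does not reproduce PAINTOR's model, its use of functional annotations, or its importance sampling. The function name and example values are hypothetical.

```python
def snps_for_followup(posteriors, coverage=0.9):
    """Indices of top-ranked SNPs whose posteriors reach `coverage` of the total.

    The sum of per-SNP posterior probabilities equals the expected number of
    causal variants in the locus, so the returned set captures, in
    expectation, at least `coverage` of them.
    """
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    target = coverage * sum(posteriors)
    chosen, captured = [], 0.0
    for i in order:
        chosen.append(i)
        captured += posteriors[i]
        if captured >= target:
            break
    return chosen


# Hypothetical posteriors for six SNPs in one locus
print(len(snps_for_followup([0.41, 0.02, 0.30, 0.15, 0.07, 0.05])))
```

Sharper posteriors concentrate probability mass on fewer SNPs, so the same coverage is reached with a smaller follow-up set; that is the sense in which better fine-mapping resolution reduces experimental validation effort.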
Hasin-Brumshtein, Yehudit; Khan, Arshad H; Hormozdiari, Farhad; Pan, Calvin; Parks, Brian W; Petyuk, Vladislav A; Piehowski, Paul D; Brümmer, Anneke; Pellegrini, Matteo; Xiao, Xinshu; Eskin, Eleazar; Smith, Richard D; Lusis, Aldons J; Smith, Desmond J. Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. Journal Article Elife, 5, 2016, ISSN: 2050-084X. Tags: Expression QTLs, Mouse Genetics
@article{HasinBrumshtein:Elife:2016, title = {Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes.}, author = { Yehudit Hasin-Brumshtein and Arshad H. Khan and Farhad Hormozdiari and Calvin Pan and Brian W. Parks and Vladislav A. Petyuk and Paul D. Piehowski and Anneke Brümmer and Matteo Pellegrini and Xinshu Xiao and Eleazar Eskin and Richard D. Smith and Aldons J. Lusis and Desmond J. Smith}, url = {http://dx.doi.org/10.7554/eLife.15614}, issn = {2050-084X}, year = {2016}, date = {2016-01-01}, journal = {Elife}, volume = {5}, address = {England}, abstract = {Previous studies had shown that the integration of genome wide expression profiles, in metabolic tissues, with genetic and phenotypic variance, provided valuable insight into the underlying molecular mechanisms. We used RNA-Seq to characterize the hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP), a reference resource population for cardiovascular and metabolic traits. We report numerous novel transcripts supported by proteomic analyses, as well as novel non-coding RNAs. High-resolution genetic mapping of transcript levels in the HMDP reveals both local and trans expression Quantitative Trait Loci (eQTLs), demonstrating 2 trans eQTL 'hotspots' associated with expression of hundreds of genes. We also report thousands of alternative splicing events regulated by genetic variants. Finally, comparison with about 150 metabolic and cardiovascular traits revealed many highly significant associations. Our data provide a rich resource for understanding the many physiologic functions mediated by the hypothalamus and their genetic regulation}, keywords = {Expression QTLs, Mouse Genetics}, pubstate = {published}, tppubtype = {article} }