Our group publishes papers presenting new methodologies, describing the results of studies that use our software, and reviewing current topics in the field of bioinformatics. Scroll down for a complete list of papers produced by our lab. Since 2013, we have also written blog posts summarizing new research papers and review articles, listed below by topic:
GWAS
- Fine Mapping Causal Variants and Allelic Heterogeneity
- Widespread Allelic Heterogeneity in Complex Traits
- Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes
- Incorporating prior information into association studies
- Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder
- Simultaneous modeling of disease status and clinical phenotypes to increase power in GWAS
- Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Colocalization of GWAS and eQTL Signals Detects Target Genes
- Chromosome conformation elucidates regulatory relationships in developing human brain
Mouse Genetics
- Review Article: The Hybrid Mouse Diversity Panel
- Genes, Environments and Meta-Analysis
- Review Article: Mixed Models and Population Structure
- Identifying Genes Involved in Blood Cell Traits
- Genes, Diet, and Body Weight (in Mice)
- Review Article: Mouse Genetics
Population Structure
- Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models
- Multiple testing correction in linear mixed models
- Identification of causal genes for complex traits (CAVIAR-gene)
- Accurate viral population assembly from ultra-deep sequencing data
- GRAT: Speeding up Expression Quantitative Trait Loci (eQTL) Studies
- Correcting Population Structure using Mixed Models Webcast
- Mixed models can correct for population structure for genomic regions under selection
Review Articles
- Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models
- Review Article: The Hybrid Mouse Diversity Panel
- Review Article: GWAS and Missing Heritability
- Review Article: Mixed Models and Population Structure
- Review Article: Mouse Genetics
Publications
2016
Rahmani, Elior; Zaitlen, Noah; Baran, Yael; Eng, Celeste; Hu, Donglei; Galanter, Joshua; Oh, Sam; Burchard, Esteban G; Eskin, Eleazar; Zou, James; Halperin, Eran. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat Methods, 13 (5), pp. 443-5, 2016, ISSN: 1548-7105.
Link: http://dx.doi.org/10.1038/nmeth.3809
Tags: Confounding
Abstract: In epigenome-wide association studies (EWAS), different methylation profiles of distinct cell types may lead to false discoveries. We introduce ReFACTor, a method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in EWAS. ReFACTor does not require knowledge of cell counts, and it provides improved estimates of cell type composition, resulting in improved power and control for false positives in EWAS. Corresponding software is available at http://www.cs.tau.ac.il/~heran/cozygene/software/refactor.html
Peterson, Christine B; Service, Susan K; Jasinska, Anna J; Gao, Fuying; Zelaya, Ivette; Teshiba, Terri M; Bearden, Carrie E; Cantor, Rita M; Reus, Victor I; Macaya, Gabriel; López-Jaramillo, Carlos; Bogomolov, Marina; Benjamini, Yoav; Eskin, Eleazar; Coppola, Giovanni; Freimer, Nelson B; Sabatti, Chiara. Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder. PLoS Genet, 12 (5), pp. e1006046, 2016, ISSN: 1553-7404.
Link: http://dx.doi.org/10.1371/journal.pgen.1006046
Tags: Expression QTLs
Abstract: The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus. In the eQTL analysis, we utilize a recently proposed hierarchical multiple testing strategy that controls error rates regarding the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs that may be relevant in future investigations of complex disease in this population. Since our subjects belong to extended families, we are able to compare traditional kinship-based estimates with those from more recent methods that depend only on genotype information.
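The hierarchical multiple-testing idea mentioned in this abstract can be illustrated with a minimal two-level sketch. The sketch assumes one common scheme of this kind and is not necessarily the paper's exact procedure. First, the SNP p-values for each gene are combined with the Simes test. Next, genes are selected with Benjamini-Hochberg (BH). Finally, SNPs are tested only within the selected genes, at a level scaled by the fraction of genes selected. The function names and toy p-values are hypothetical.

```python
# Hypothetical two-level (gene, then SNP) testing sketch; not the paper's code.


def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return the indices rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value passes its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return set(order[:k])


def simes(pvals):
    """Simes combined p-value for the global null of one (non-empty) family."""
    m = len(pvals)
    return min(m * p / r for r, p in enumerate(sorted(pvals), start=1))


def two_level_test(families, q=0.05):
    """families maps a gene name to the p-values of its candidate SNPs.

    Level 1: BH across genes on their Simes p-values.
    Level 2: BH within each selected gene at level q * R / M, where R genes
    were selected out of M (the scaling assumed for this sketch).
    Returns {gene: sorted indices of SNPs called significant}.
    """
    genes = list(families)
    gene_p = [simes(families[g]) for g in genes]
    selected = [genes[i] for i in sorted(bh_reject(gene_p, q))]
    if not selected:
        return {}
    level = q * len(selected) / len(genes)
    return {g: sorted(bh_reject(families[g], level)) for g in selected}


# Toy p-values (hypothetical): geneA has one strong SNP, geneB has none.
print(two_level_test({"geneA": [0.0001, 0.2, 0.5], "geneB": [0.4, 0.6, 0.9]}))
# -> {'geneA': [0]}
```

Testing SNPs only inside the selected genes, at the reduced level, keeps the error rate under control over the discoveries that are actually reported rather than over every gene-SNP pair.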
Schweiger, Regev; Kaufman, Shachar; Laaksonen, Reijo; Kleber, Marcus E; März, Winfried; Eskin, Eleazar; Rosset, Saharon; Halperin, Eran. Fast and Accurate Construction of Confidence Intervals for Heritability. Am J Hum Genet, 98 (6), pp. 1181-92, 2016, ISSN: 1537-6605.
Link: http://dx.doi.org/10.1016/j.ajhg.2016.04.016
Tags: Mixed Models
Abstract: Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models (LMMs) has gained popularity because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. Existing methods for the construction of confidence intervals and estimators of SEs for REML rely on asymptotic properties. However, these assumptions are often violated because of the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. Here, we show that the estimation of confidence intervals by state-of-the-art methods is inaccurate, especially when the true heritability is relatively low or relatively high. We further show that these inaccuracies occur in datasets including thousands of individuals. Such biases are present, for example, in estimates of heritability of gene expression in the Genotype-Tissue Expression project and of lipid profiles in the Ludwigshafen Risk and Cardiovascular Health study. We also show that often the probability that the genetic component is estimated as 0 is high even when the true heritability is bounded away from 0, emphasizing the need for accurate confidence intervals. We propose a computationally efficient method, ALBI (accurate LMM-based heritability bootstrap confidence intervals), for estimating the distribution of the heritability estimator and for constructing accurate confidence intervals. Our method can be used as an add-on to existing methods for estimating heritability and variance components, such as GCTA, FaST-LMM, GEMMA, or EMMAX.
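A toy Monte Carlo sketch makes the abstract's point about boundary effects concrete. This sketch is not ALBI. As a loudly labeled assumption, it replaces the REML estimator with a hypothetical stand-in: a normal draw around the true heritability with a fixed standard error, clipped to [0, 1]. For each value on a grid of true heritabilities, the sketch simulates the estimator's distribution. It then keeps the values whose central 95% region contains the observed estimate, which amounts to inverting an acceptance region. Because of the clipping, an observed estimate of exactly 0 still yields an interval that reaches well above 0.

```python
import math
import random

# ASSUMPTION: toy_estimator is a hypothetical stand-in for REML (normal noise
# around the true h2, clipped to [0, 1]); this is an illustration, not ALBI.


def toy_estimator(h2, se, rng):
    """Draw one hypothetical estimate of h2, constrained to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(h2, se)))


def prob_estimate_is_zero(h2, se):
    """Under the toy model, P(estimate == 0) = P(N(h2, se^2) <= 0) = Phi(-h2/se)."""
    return 0.5 * (1.0 + math.erf(-h2 / (se * math.sqrt(2.0))))


def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an already sorted list."""
    return sorted_vals[int(round(q * (len(sorted_vals) - 1)))]


def inverted_ci(h2_obs, se, alpha=0.05, n_sim=2000, grid_size=101, seed=0):
    """Return (lower, upper) over the grid of true h2 values whose simulated
    central (1 - alpha) region of estimates contains h2_obs."""
    rng = random.Random(seed)
    accepted = []
    for i in range(grid_size):
        h2 = i / (grid_size - 1)
        draws = sorted(toy_estimator(h2, se, rng) for _ in range(n_sim))
        if quantile(draws, alpha / 2) <= h2_obs <= quantile(draws, 1 - alpha / 2):
            accepted.append(h2)
    return (min(accepted), max(accepted)) if accepted else None


# About 0.16: a sizeable chance of estimating exactly 0 when the true h2 is 0.1.
print(prob_estimate_is_zero(0.1, 0.1))
# An estimate of exactly 0 remains compatible with true h2 up to roughly 0.2.
print(inverted_ci(0.0, se=0.1))
```

The sketch builds the interval by simulation rather than from an asymptotic standard error, so it respects the [0, 1] boundary. The toy estimator, its standard error, and the grid resolution are all illustrative choices.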
Hasin-Brumshtein, Yehudit; Khan, Arshad H; Hormozdiari, Farhad; Pan, Calvin; Parks, Brian W; Petyuk, Vladislav A; Piehowski, Paul D; Brümmer, Anneke; Pellegrini, Matteo; Xiao, Xinshu; Eskin, Eleazar; Smith, Richard D; Lusis, Aldons J; Smith, Desmond J. Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. Elife, 5, 2016, ISSN: 2050-084X.
Link: http://dx.doi.org/10.7554/eLife.15614
Tags: Expression QTLs, Mouse Genetics
Abstract: Previous studies had shown that the integration of genome-wide expression profiles in metabolic tissues with genetic and phenotypic variance provided valuable insight into the underlying molecular mechanisms. We used RNA-Seq to characterize the hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP), a reference resource population for cardiovascular and metabolic traits. We report numerous novel transcripts supported by proteomic analyses, as well as novel non-coding RNAs. High-resolution genetic mapping of transcript levels in the HMDP reveals both local and trans expression Quantitative Trait Loci (eQTLs), demonstrating 2 trans eQTL 'hotspots' associated with expression of hundreds of genes. We also report thousands of alternative splicing events regulated by genetic variants. Finally, comparison with about 150 metabolic and cardiovascular traits revealed many highly significant associations. Our data provide a rich resource for understanding the many physiologic functions mediated by the hypothalamus and their genetic regulation.
Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet, 2016, ISSN: 1537-6605.
Link: http://dx.doi.org/10.1016/j.ajhg.2016.10.003
Tags: Expression QTLs, Fine Mapping
Abstract: The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.
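A short sketch can illustrate the colocalization question in this abstract under two simplifying assumptions. First, per-variant posterior probabilities of causality have already been computed separately for the GWAS and for the eQTL study, for example by a fine-mapping method that allows multiple causal variants. Second, the two studies are treated as independent. Under these assumptions, the probability that a given variant is causal in both studies is the product of its two posteriors. The variant IDs and numbers below are hypothetical, and this is not the eCAVIAR code.

```python
# Hypothetical sketch of a per-variant colocalization posterior probability.
# ASSUMPTIONS: causal posteriors were already computed for each study, and the
# two studies are treated as independent, so the joint probability factorizes.


def colocalization_posteriors(gwas_pp, eqtl_pp):
    """Per-variant probability that the same variant is causal in both studies."""
    if gwas_pp.keys() != eqtl_pp.keys():
        raise ValueError("both studies must cover the same set of variants")
    return {v: gwas_pp[v] * eqtl_pp[v] for v in gwas_pp}


# Toy posteriors for three hypothetical variants in one locus.
gwas = {"var1": 0.80, "var2": 0.15, "var3": 0.05}
eqtl = {"var1": 0.50, "var2": 0.45, "var3": 0.05}
clpp = colocalization_posteriors(gwas, eqtl)
best = max(clpp, key=clpp.get)
print(best, round(clpp[best], 3))  # -> var1 0.4
```

The product works on posteriors that already account for linkage disequilibrium and for multiple causal variants within each study. The loci therefore never need a single-causal-variant assumption.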
Won, Hyejung; de la Torre-Ubieta, Luis; Stein, Jason L; Parikshak, Neelroop N; Huang, Jerry; Opland, Carli K; Gandal, Michael J; Sutton, Gavin J; Hormozdiari, Farhad; Lu, Daning; Lee, Changhoon; Eskin, Eleazar; Voineagu, Irina; Ernst, Jason; Geschwind, Daniel H. Chromosome conformation elucidates regulatory relationships in developing human brain. Nature, 538 (7626), pp. 523-527, 2016, ISSN: 1476-4687.
Link: http://dx.doi.org/10.1038/nature19847
Tags: Fine Mapping
Abstract: Three-dimensional physical interactions within chromosomes dynamically regulate gene expression in a tissue-specific manner. However, the 3D organization of chromosomes during human brain development and its role in regulating gene networks dysregulated in neurodevelopmental disorders, such as autism or schizophrenia, are unknown. Here we generate high-resolution 3D maps of chromatin contacts during human corticogenesis, permitting large-scale annotation of previously uncharacterized regulatory relationships relevant to the evolution of human cognition and disease. Our analyses identify hundreds of genes that physically interact with enhancers gained on the human lineage, many of which are under purifying selection and associated with human cognitive function. We integrate chromatin contacts with non-coding variants identified in schizophrenia genome-wide association studies (GWAS), highlighting multiple candidate schizophrenia risk genes and pathways, including transcription factors involved in neurogenesis, and cholinergic signalling molecules, several of which are supported by independent expression quantitative trait loci and gene expression analyses. Genome editing in human neural progenitors suggests that one of these distal schizophrenia GWAS loci regulates FOXG1 expression, supporting its potential role as a schizophrenia risk gene. This work provides a framework for understanding the effect of non-coding regulatory elements on human brain development and the evolution of cognition, and highlights novel mechanisms underlying neuropsychiatric disorders.
Kichaev, Gleb; Roytman, Megan; Johnson, Ruth; Eskin, Eleazar; Lindström, Sara; Kraft, Peter; Pasaniuc, Bogdan. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics, 2016, ISSN: 1367-4811.
Link: http://dx.doi.org/10.1093/bioinformatics/btw615
Tags: Fine Mapping
Abstract:
MOTIVATION: Genome-wide association studies (GWAS) have identified thousands of regions in the genome that contain genetic variants that increase risk for complex traits and diseases. However, the variants uncovered in GWAS are typically not biologically causal, but rather correlated with the true causal variant through linkage disequilibrium (LD). To discern the true causal variant(s), a variety of statistical fine-mapping methods have been proposed to prioritize variants for functional validation.
RESULTS: In this work we introduce a new approach, fastPAINTOR, that leverages evidence across correlated traits, as well as functional annotation data, to improve fine-mapping accuracy at pleiotropic risk loci. To improve computational efficiency, we describe a new importance sampling scheme to perform model inference. First, we demonstrate in simulations that by leveraging functional annotation data, fastPAINTOR increases fine-mapping resolution relative to existing methods. Next, we show that jointly modeling pleiotropic risk regions improves fine-mapping resolution compared to standard single trait and pleiotropic fine mapping strategies. We report a reduction in the number of SNPs required for follow-up in order to capture 90% of the causal variants from 23 SNPs per locus using a single trait to 12 SNPs when fine-mapping two traits simultaneously. Finally, we analyze summary association data from a large-scale GWAS of lipids and show that these improvements are largely sustained in real data.
AVAILABILITY AND IMPLEMENTATION: The fastPAINTOR framework is implemented in the PAINTOR v3.0 package, which is publicly available to the research community at http://bogdan.bioinformatics.ucla.edu/software/paintor
CONTACT: gkichaev@ucla.edu
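The follow-up metric quoted in this abstract is the number of SNPs per locus needed to capture 90% of causal variants. It can be computed from simulated loci in a straightforward way. The sketch below assumes one plausible definition. SNPs in each locus are ranked by posterior probability of causality, and the same number k is taken from every locus. The result is the smallest k that captures at least 90% of all true causal variants. The paper's exact evaluation procedure may differ, and the toy posteriors are hypothetical.

```python
# Hypothetical evaluation sketch: how many top-ranked SNPs per locus must be
# followed up to capture a target fraction of the true causal variants.
# ASSUMPTION: the same number k is taken from every simulated locus.


def snps_needed(loci, target=0.9):
    """loci: list of (posteriors, causal_indices) pairs, one per simulated locus.

    Returns the smallest k such that taking the k SNPs with the highest
    posterior in each locus captures at least `target` of all causal variants.
    """
    total = sum(len(causal) for _, causal in loci)
    # Rank SNP indices within each locus by decreasing posterior probability.
    rankings = [
        sorted(range(len(post)), key=lambda i: post[i], reverse=True)
        for post, _ in loci
    ]
    max_k = max(len(post) for post, _ in loci)
    for k in range(1, max_k + 1):
        captured = sum(
            len(set(order[:k]) & set(causal))
            for order, (_, causal) in zip(rankings, loci)
        )
        if captured >= target * total:
            return k
    return max_k


# Two toy loci, each with one true causal SNP (index 0 and index 2).
loci = [([0.90, 0.05, 0.05], [0]), ([0.10, 0.60, 0.30], [2])]
print(snps_needed(loci))  # -> 2
```

Under this definition, a better-calibrated fine-mapping method concentrates posterior mass on the true causal SNPs and lowers k. That reduction is the kind the abstract reports when it compares single-trait with two-trait fine-mapping.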
Lavinsky, Joel; Ge, Marshall; Crow, Amanda L; Pan, Calvin; Wang, Juemei; Dermanaki, Pehzman Salehi; Myint, Anthony; Eskin, Eleazar; Allayee, Hooman; Lusis, Aldons J; Friedman, Rick A: The Genetic Architecture of Noise-induced Hearing Loss: Evidence for a Gene-by-Environment Interaction. Journal Article, G3 (Bethesda), 2016, ISSN: 2160-1836. Tags: Mouse Genetics
@article{Lavinsky:G3:2016, title = {The Genetic Architecture of Noise-induced Hearing Loss: Evidence for a Gene-by-Environment Interaction.}, author = { Joel Lavinsky and Marshall Ge and Amanda L. Crow and Calvin Pan and Juemei Wang and Pehzman Salehi Dermanaki and Anthony Myint and Eleazar Eskin and Hooman Allayee and Aldons J. Lusis and Rick A. Friedman}, url = {http://dx.doi.org/10.1534/g3.116.032516}, issn = {2160-1836}, year = {2016}, date = {2016-01-01}, journal = {G3 (Bethesda)}, abstract = {The discovery of environmentally specific genetic effects is crucial to the understanding of complex traits, such as susceptibility to noise-induced hearing loss (NIHL). In this manuscript we describe the first genome-wide association study (GWAS) for NIHL in a large and well-characterized population of inbred mouse strains known as the Hybrid Mouse Diversity Panel (HMDP). We recorded auditory brainstem response (ABR) thresholds both pre and post 2-hour exposure to 10 kHz octave band noise at 108 dB SPL (sound pressure level) in 5-6 week-old female mice from the HMDP (4-5 mice/strain). From the observation that NIHL susceptibility varied among the strains, we performed a GWAS with correction for population structure and mapped a locus on chromosome 6 that was statistically significantly associated with two adjacent frequencies. We then used a 'genetical genomics' approach that included the analysis of cochlear eQTLs to identify candidate genes within the GWAS QTL. In order to validate the gene-by-environment interaction, we compared the effects of the post noise exposure locus with that from the same unexposed strains. The most significant SNP at chromosome 6 (rs37517079) was associated with noise susceptibility, but was not significant at the same frequencies in our unexposed study. These findings demonstrate that the genetic architecture of NIHL is distinct from that of unexposed hearing levels and provide strong evidence for gene-by-environment interactions in NIHL.}, keywords = {Mouse Genetics}, pubstate = {published}, tppubtype = {article} }
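The gene-by-environment comparison described above (a locus associated with hearing thresholds after noise exposure but not in the same unexposed strains) is often formalized as a SNP × exposure interaction term. The sketch below is a generic illustration under that assumption, not the analysis from the paper, which ran a population-structure-corrected GWAS in each condition: it fits y = b0 + b1·g + b2·e + b3·g·e by ordinary least squares and reports a normal-approximation p-value for b3. The function name `interaction_test`, the 0/1 genotype coding and the simulated data are hypothetical, and the sketch ignores population structure.

```python
import math

import numpy as np


def interaction_test(y, genotype, exposed):
    """Test the SNP x exposure term in y ~ 1 + g + e + g*e by ordinary least squares.

    Returns (b3, p): the interaction estimate and a two-sided normal-approximation
    p-value. Population structure is NOT modeled (an assumption of this sketch).
    """
    X = np.column_stack([np.ones_like(y), genotype, exposed, genotype * exposed])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the OLS estimates
    z = beta[3] / math.sqrt(cov[3, 3])
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal tail
    return float(beta[3]), p


# Hypothetical simulation: the genotype shifts the threshold only after exposure.
rng = np.random.default_rng(0)
n = 400
g = rng.integers(0, 2, size=n).astype(float)   # inbred strains: homozygous, coded 0/1
e = np.repeat([0.0, 1.0], n // 2)              # 0 = unexposed, 1 = noise-exposed
y = 2.0 * e + 5.0 * g * e + rng.normal(0.0, 1.0, size=n)
b3, p = interaction_test(y, g, e)
print(b3, p)
```

A significant b3 says the genotype effect differs between conditions, which is the same contrast as finding a locus after exposure that is absent in unexposed animals, expressed as a single test.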
Main, Bradley J; Lee, Yoosook; Ferguson, Heather M; Kreppel, Katharina S; Kihonda, Anicet; Govella, Nicodem J; Collier, Travis C; Cornel, Anthony J; Eskin, Eleazar; Kang, Eun Yong; Nieman, Catelyn C; Weakley, Allison M; Lanzaro, Gregory C: The Genetic Basis of Host Preference and Resting Behavior in the Major African Malaria Vector, Anopheles arabiensis. Journal Article, PLoS Genet, 12 (9), pp. e1006303, 2016, ISSN: 1553-7404. Tags: Heritability
@article{Main:PlosGenet:2016, title = {The Genetic Basis of Host Preference and Resting Behavior in the Major African Malaria Vector, Anopheles arabiensis.}, author = { Bradley J. Main and Yoosook Lee and Heather M. Ferguson and Katharina S. Kreppel and Anicet Kihonda and Nicodem J. Govella and Travis C. Collier and Anthony J. Cornel and Eleazar Eskin and Eun Yong Kang and Catelyn C. Nieman and Allison M. Weakley and Gregory C. Lanzaro}, url = {http://dx.doi.org/10.1371/journal.pgen.1006303}, issn = {1553-7404}, year = {2016}, date = {2016-01-01}, journal = {PLoS Genet}, volume = {12}, number = {9}, pages = {e1006303}, address = {United States}, abstract = {Malaria transmission is dependent on the propensity of Anopheles mosquitoes to bite humans (anthropophily) instead of other dead-end hosts. Recent increases in the usage of Long Lasting Insecticide Treated Nets (LLINs) in Africa have been associated with reductions in highly anthropophilic and endophilic vectors such as Anopheles gambiae s.s., leaving species with a broader host range, such as Anopheles arabiensis, as the most prominent remaining source of transmission in many settings. An. arabiensis appears to be more of a generalist in terms of its host choice and resting behavior, which may be due to phenotypic plasticity and/or segregating allelic variation. To investigate the genetic basis of host choice and resting behavior in An. arabiensis, we sequenced the genomes of 23 human-fed and 25 cattle-fed mosquitoes collected both indoors and outdoors in the Kilombero Valley, Tanzania. We identified a total of 4,820,851 SNPs, which were used to conduct the first genome-wide estimates of "SNP heritability" for host choice and resting behavior in this species. A genetic component was detected for host choice (human vs cow fed; permuted P = 0.002), but there was no evidence of a genetic component for resting behavior (indoors versus outside; permuted P = 0.465). A principal component analysis (PCA) segregated individuals based on genomic variation into three groups which were characterized by differences at the 2Rb and/or 3Ra paracentromeric chromosome inversions. There was a non-random distribution of cattle-fed mosquitoes between the PCA clusters, suggesting that alleles linked to the 2Rb and/or 3Ra inversions may influence host choice. Using a novel inversion genotyping assay, we detected a significant enrichment of the standard arrangement (non-inverted) of 3Ra among cattle-fed mosquitoes (N = 129) versus all non-cattle-fed individuals (N = 234; $\chi^2$}, keywords = {Heritability}, pubstate = {published}, tppubtype = {article} }
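The abstract does not say which estimator produced the "SNP heritability" values and their permuted P values. As one hedged illustration, the sketch below swaps in Haseman-Elston regression (regressing the product of standardized phenotypes for each pair of individuals on their genomic relatedness) and obtains an empirical one-sided p-value by permuting phenotypes across individuals. The helper names `grm`, `he_regression` and `permutation_pvalue`, the 200 permutations and the simulated quantitative trait are assumptions; a 0/1-coded trait such as human- vs cattle-fed could be passed in the same way, giving an observed-scale estimate.

```python
import numpy as np


def grm(genotypes):
    """Genetic relationship matrix from an (n x m) matrix of 0/1/2 genotypes."""
    sd = genotypes.std(axis=0)
    keep = sd > 0                                      # drop monomorphic SNPs
    Z = (genotypes[:, keep] - genotypes[:, keep].mean(axis=0)) / sd[keep]
    return Z @ Z.T / Z.shape[1]


def he_regression(y, A):
    """Haseman-Elston slope: regress y_i * y_j on A_ij over all pairs i < j."""
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)
    prods = np.outer(y, y)[iu]
    a_c = A[iu] - A[iu].mean()
    return float(a_c @ (prods - prods.mean()) / (a_c @ a_c))


def permutation_pvalue(y, A, n_perm=200, seed=0):
    """Estimate h2, then an empirical one-sided p-value from permuted phenotypes."""
    rng = np.random.default_rng(seed)
    observed = he_regression(y, A)
    null = np.array([he_regression(rng.permutation(y), A) for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p)


# Hypothetical simulation: a quantitative trait with true SNP heritability 0.8.
rng = np.random.default_rng(1)
n, m = 200, 300
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
y = Z @ rng.normal(0.0, np.sqrt(0.8 / m), size=m) + rng.normal(0.0, np.sqrt(0.2), size=n)
h2, p = permutation_pvalue(y, grm(G))
print(h2, p)
```

Permuting phenotypes breaks any link between trait and relatedness while keeping both distributions intact, so the permuted slopes form a null distribution without distributional assumptions; with 200 permutations the smallest attainable p-value is 1/201.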
2015
Luykx, J J; Bakker, S C; Visser, W F; Verhoeven-Duif, N; Buizer-Voskamp, J E; den Heijer, J M; Boks, M P M; Sul, J H; Eskin, E; Ori, A P; Cantor, R M; Vorstman, J; Strengman, E; DeYoung, J; Kappen, T H; Pariama, E; van Dongen, E P A; Borgdorff, P; Bruins, P; de Koning, T J; Kahn, R S; Ophoff, R A: Genome-wide association study of NMDA receptor coagonists in human cerebrospinal fluid and plasma. Journal Article, Mol Psychiatry, 2015, ISSN: 1476-5578. Tags: genome-wide association studies, NMDAR, schizophrenia
@article{Luykx:MolPsychiatry:2015b, title = {Genome-wide association study of NMDA receptor coagonists in human cerebrospinal fluid and plasma.}, author = { J. J. Luykx and S. C. Bakker and W. F. Visser and N. Verhoeven-Duif and J. E. Buizer-Voskamp and J. M. den Heijer and M. P. M. Boks and J. H. Sul and E. Eskin and A. P. Ori and R. M. Cantor and J. Vorstman and E. Strengman and J. DeYoung and T. H. Kappen and E. Pariama and E. P. A. van Dongen and P. Borgdorff and P. Bruins and T. J. de Koning and R. S. Kahn and R. A. Ophoff}, url = {http://dx.doi.org/10.1038/mp.2014.190}, issn = {1476-5578}, year = {2015}, date = {2015-01-01}, journal = {Mol Psychiatry}, abstract = {The N-methyl-d-aspartate receptor (NMDAR) coagonists glycine, d-serine and l-proline play crucial roles in NMDAR-dependent neurotransmission and are associated with a range of neuropsychiatric disorders. We conducted the first genome-wide association study of concentrations of these coagonists and their enantiomers in plasma and cerebrospinal fluid (CSF) of human subjects from the general population (N=414). Genetic variants at chromosome 22q11.2, located in and near PRODH (proline dehydrogenase), were associated with l-proline in plasma ($\beta$=0.29; P=6.38 $\times$ 10$^{-10}$). The missense variant rs17279437 in the proline transporter SLC6A20 was associated with l-proline in CSF ($\beta$=0.28; P=9.68 $\times$ 10$^{-9}$). Suggestive evidence of association was found for the d-serine plasma-CSF ratio at the d-amino-acid oxidase (DAO) gene ($\beta$=-0.28; P=9.08 $\times$ 10$^{-8}$), whereas a variant in SRR (that encodes serine racemase and is associated with schizophrenia) constituted the most strongly associated locus for the l-serine to d-serine ratio in CSF. All these genes are highly expressed in rodent meninges and choroid plexus, anatomical regions relevant to CSF physiology. The enzymes and transporters they encode may be targeted to further construe the nature of NMDAR coagonist involvement in NMDAR gating. Furthermore, the highlighted genetic variants may be followed up in clinical populations, for example, schizophrenia and 22q11 deletion syndrome. Overall, this targeted metabolomics approach furthers the understanding of NMDAR coagonist concentration variability and sets the stage for non-targeted CSF metabolomics projects. Molecular Psychiatry advance online publication, 10 February 2015; doi:10.1038/mp.2014.190}, keywords = {genome-wide association studies, NMDAR, schizophrenia}, pubstate = {published}, tppubtype = {article} }
Eskin, Eleazar: Discovering Genes Involved in Disease and the Mystery of Missing Heritability. Journal Article, Commun. ACM, 58 (10), pp. 80-87, 2015, ISSN: 0001-0782. Tags: Association Study Methods, Heritability, Review
@article{Eskin:2015:DGI:2830674.2817827, title = {Discovering Genes Involved in Disease and the Mystery of Missing Heritability}, author = { Eleazar Eskin}, url = {http://doi.acm.org/10.1145/2817827}, doi = {10.1145/2817827}, issn = {0001-0782}, year = {2015}, date = {2015-01-01}, journal = {Commun. ACM}, volume = {58}, number = {10}, pages = {80-87}, publisher = {ACM}, address = {New York, NY, USA}, abstract = {The challenge of missing heritability offers great contribution options for computer scientists. Key Insights: 1. Over the past several years, thousands of genetic variants that have been implicated in dozens of common diseases have been discovered. 2. Despite this progress, only a fraction of the variants involved in disease have been discovered---a phenomenon referred to as “missing heritability.” 3. Many challenges related to understanding the mystery of missing heritability and discovering the variants involved in human disease require analysis of large datasets that present opportunities for computer scientists.}, keywords = {Association Study Methods, Heritability, Review}, pubstate = {published}, tppubtype = {article} }
Lavinsky, Joel; Crow, Amanda L; Pan, Calvin; Wang, Juemei; Aaron, Ksenia A; Ho, Maria K; Li, Qingzhong; Salehide, Pehzman; Myint, Anthony; Monges-Hernadez, Maya; Eskin, Eleazar; Allayee, Hooman; Lusis, Aldons J; Friedman, Rick A: Genome-wide association study identifies nox3 as a critical gene for susceptibility to noise-induced hearing loss. Journal Article, PLoS Genet, 11 (6), pp. e1005293, 2015, ISSN: 1553-7404. Tags: cochlear function, genome-wide association studies
@article{Lavinsky:PlosGenet:2015, title = {Genome-wide association study identifies nox3 as a critical gene for susceptibility to noise-induced hearing loss.}, author = { Joel Lavinsky and Amanda L. Crow and Calvin Pan and Juemei Wang and Ksenia A. Aaron and Maria K. Ho and Qingzhong Li and Pehzman Salehide and Anthony Myint and Maya Monges-Hernadez and Eleazar Eskin and Hooman Allayee and Aldons J. Lusis and Rick A. Friedman}, url = {http://dx.doi.org/10.1371/journal.pgen.1005293}, issn = {1553-7404}, year = {2015}, date = {2015-01-01}, journal = {PLoS Genet}, volume = {11}, number = {6}, pages = {e1005293}, address = {United States}, keywords = {cochlear function, genome-wide association studies}, pubstate = {published}, tppubtype = {article} }
Sul, Jae Hoon; Raj, Towfique; de Jong, Simone; de Bakker, Paul I W; Raychaudhuri, Soumya; Ophoff, Roel A; Stranger, Barbara E; Eskin, Eleazar; Han, Buhm: Accurate and Fast Multiple-Testing Correction in eQTL Studies. Journal Article, Am J Hum Genet, 96 (6), pp. 857-68, 2015, ISSN: 1537-6605. Tags: Expression QTLs, Multiple Testing
@article{Sul:AmJHumGenet:2015b, title = {Accurate and Fast Multiple-Testing Correction in eQTL Studies.}, author = { Jae Hoon Sul and Towfique Raj and Simone de Jong and Paul I. W. de Bakker and Soumya Raychaudhuri and Roel A. Ophoff and Barbara E. Stranger and Eleazar Eskin and Buhm Han}, url = {http://dx.doi.org/10.1016/j.ajhg.2015.04.012}, issn = {1537-6605}, year = {2015}, date = {2015-01-01}, journal = {Am J Hum Genet}, volume = {96}, number = {6}, pages = {857-68}, address = {United States}, abstract = {In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.}, keywords = {Expression QTLs, Multiple Testing}, pubstate = {published}, tppubtype = {article} }
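The quantity behind an eGene p value can be sketched as follows: under the null, the association z-scores of the cis variants are approximately multivariate normal with covariance equal to their LD (correlation) matrix, so the gene-level p value is the probability that the largest |z| among them is at least as extreme as the observed one. The published method evaluates this efficiently and adds small-sample corrections; the sketch below instead uses plain Monte Carlo sampling, which is slower and only meant to show what is being computed. `egene_pvalue` and its parameters are hypothetical names.

```python
from statistics import NormalDist

import numpy as np


def egene_pvalue(min_p, ld, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(min p-value over cis SNPs <= min_p) under the null.

    Assumes null z-scores ~ MVN(0, ld), where ld is the SNP-by-SNP correlation matrix.
    """
    z_obs = NormalDist().inv_cdf(1 - min_p / 2)        # two-sided threshold on |z|
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(ld)), ld, size=n_samples)
    hits = np.sum(np.max(np.abs(z), axis=1) >= z_obs)  # null draws at least as extreme
    return (hits + 1) / (n_samples + 1)


# Independent SNPs: should be close to the Sidak value 1 - 0.99**3 (about 0.0297).
p_indep = egene_pvalue(0.01, np.eye(3))
# Three SNPs in strong LD: fewer effective tests, so a smaller gene-level p value.
ld_tight = np.array([[1.0, 0.95, 0.9],
                     [0.95, 1.0, 0.95],
                     [0.9, 0.95, 1.0]])
p_tight = egene_pvalue(0.01, ld_tight)
print(p_indep, p_tight)
```

With independent SNPs the estimate approaches the Šidák correction, while strong LD pulls it toward the single-SNP p value; accounting for that correlation without permuting phenotypes is what lets the cost stay independent of sample size.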
Joo, Jong Wha J; Kang, Eun Yong; Org, Elin; Furlotte, Nick; Parks, Brian; Lusis, Aldons J; Eskin, Eleazar: Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure. In: Research in Computational Molecular Biology, pp. 136-153, Springer International Publishing, 2015. Tags: GAMMA, genome-wide association studies, microbiome
@inbook{Joo:ResearchInComputationalMolecularBiology:2015b, title = {Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure}, author = {Jong Wha J. Joo and Eun Yong Kang and Elin Org and Nick Furlotte and Brian Parks and Aldons J. Lusis and Eleazar Eskin}, url = {http://dx.doi.org/10.1007/978-3-319-16706-0_15}, year = {2015}, date = {2015-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {136-153}, publisher = {Springer International Publishing}, organization = {University of California}, abstract = {A typical GWAS tests correlation between a single phenotype and each genotype one at a time. However, it is often very useful to analyze many phenotypes simultaneously. For example, this may increase the power to detect variants by capturing unmeasured aspects of complex biological networks that a single phenotype might miss. There are several multivariate approaches that try to detect variants related to many phenotypes, but none of them consider population structure and each may result in a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA, that could both simultaneously analyze many phenotypes as well as correct for population structure. In a simulated study, GAMMA accurately identifies true genetic effects without false positive identifications, while other methods either fail to detect true effects or result in many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mouse and show that GAMMA identifies several variants that are likely to have a true biological mechanism.}, keywords = {GAMMA, genome-wide association studies, microbiome}, pubstate = {published}, tppubtype = {inbook} }
Furlotte, Nicholas A; Eskin, Eleazar: Efficient Multiple Trait Association and Estimation of Genetic Correlation Using the Matrix-Variate Linear Mixed-Model. Journal Article, Genetics, 200 (1), pp. 59-68, 2015, ISSN: 1943-2631. Tags: Heritability, Mixed Models, Multiple Phenotypes
@article{Furlotte:Genetics:2015b, title = {Efficient Multiple Trait Association and Estimation of Genetic Correlation Using the Matrix-Variate Linear Mixed-Model.}, author = { Nicholas A. Furlotte and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.114.171447}, issn = {1943-2631}, year = {2015}, date = {2015-01-01}, journal = {Genetics}, volume = {200}, number = {1}, pages = {59-68}, address = {United States}, abstract = {Multiple trait association mapping, in which multiple traits are used simultaneously in the identification of genetic variants affecting those traits, has recently attracted interest. One class of approaches for this problem builds on classical variance component methodology, utilizing a multi-trait version of a linear mixed-model. These approaches both increase power and provide insights into the genetic architecture of multiple traits. In particular, it is possible to estimate the genetic correlation which is a measure of the portion of the total correlation between traits that is due to additive genetic effects. Unfortunately, the practical utility of these methods is limited since they are computationally intractable for large sample sizes. In this paper, we introduce a reformulation of the multiple trait association mapping approach by defining the matrix-variate linear mixed model. Our approach reduces the computational time necessary to perform maximum-likelihood inference in a multiple trait model by utilizing a data transformation. By utilizing a well-studied human cohort, we show that our approach provides more than a 10-fold speed up, making multiple trait association feasible in a large population cohort on the genome-wide scale. We take advantage of the efficiency of our approach to analyze gene expression data. By decomposing gene coexpression into a genetic and environmental component, we show that our method provides fundamental insights into the nature of co-expressed genes. An implementation of this method is available at http://genetics.cs.ucla.edu/mvLMM}, keywords = {Heritability, Mixed Models, Multiple Phenotypes}, pubstate = {published}, tppubtype = {article} }
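The data transformation can be illustrated with the eigendecomposition trick common to many mixed-model methods, sketched here under simplifying assumptions (zero-mean traits, no fixed effects, known variance components). If vec(Y) ~ N(0, Vg ⊗ K + Ve ⊗ I) for an n × d trait matrix Y and K = U S U^T, then the rows of U^T Y are independent with covariance s_i Vg + Ve, so one nd × nd likelihood becomes n small d × d terms. This is not the code distributed at the mvLMM URL; the function names are hypothetical, and the check compares the rotated likelihood with the naive Kronecker form.

```python
import numpy as np


def loglik_direct(Y, K, Vg, Ve):
    """Naive log-likelihood of vec(Y) ~ N(0, Vg kron K + Ve kron I); costs O((nd)^3)."""
    n, d = Y.shape
    y = Y.flatten(order="F")                          # stack the trait columns
    sigma = np.kron(Vg, K) + np.kron(Ve, np.eye(n))
    _, logdet = np.linalg.slogdet(sigma)
    quad = y @ np.linalg.solve(sigma, y)
    return -0.5 * (n * d * np.log(2 * np.pi) + logdet + quad)


def loglik_rotated(Y, K, Vg, Ve):
    """Same likelihood after rotating by the eigenvectors of K: n independent d x d terms."""
    n, d = Y.shape
    s, U = np.linalg.eigh(K)                          # K = U diag(s) U^T
    Yr = U.T @ Y                                      # rows of Yr are now independent
    total = 0.0
    for i in range(n):
        C = s[i] * Vg + Ve                            # d x d covariance of rotated row i
        _, logdet = np.linalg.slogdet(C)
        quad = Yr[i] @ np.linalg.solve(C, Yr[i])
        total += -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
    return total


rng = np.random.default_rng(0)
n, d = 30, 2
W = rng.normal(size=(n, 50))
K = W @ W.T / 50                                      # toy kinship matrix (positive definite)
Vg = np.array([[1.0, 0.5], [0.5, 2.0]])               # genetic covariance between traits
Ve = np.array([[1.0, -0.2], [-0.2, 1.0]])             # environmental covariance
Y = rng.normal(size=(n, d))
print(np.isclose(loglik_direct(Y, K, Vg, Ve), loglik_rotated(Y, K, Vg, Ve)))  # True
```

The eigendecomposition is done once per kinship matrix, after which each likelihood evaluation during optimization over Vg and Ve only touches d × d matrices.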
Wang, Zhanyong; Sul, Jae Hoon; Snir, Sagi; Lozano, Jose A; Eskin, Eleazar: Gene-Gene Interactions Detection Using a Two-stage Model. Journal Article, J Comput Biol, 22 (6), pp. 563-76, 2015, ISSN: 1557-8666. Tags: genome-wide association studies, Threshold-based Efficient Pairwise Association Approach
@article{Wang:JComputBiol:2015b, title = {Gene-Gene Interactions Detection Using a Two-stage Model.}, author = { Zhanyong Wang and Jae Hoon Sul and Sagi Snir and Jose A. Lozano and Eleazar Eskin}, url = {http://dx.doi.org/10.1089/cmb.2014.0163}, issn = {1557-8666}, year = {2015}, date = {2015-01-01}, journal = {J Comput Biol}, volume = {22}, number = {6}, pages = {563-76}, address = {United States}, abstract = {Genome-wide association studies (GWAS) have discovered numerous loci involved in genetic traits. Virtually all studies have reported associations between individual single nucleotide polymorphisms (SNPs) and traits. However, it is likely that complex traits are influenced by interaction of multiple SNPs. One approach to detect interactions of SNPs is the brute force approach which performs a pairwise association test between a trait and each pair of SNPs. The brute force approach is often computationally infeasible because of the large number of SNPs collected in current GWAS studies. We propose a two-stage model, Threshold-based Efficient Pairwise Association Approach (TEPAA), to reduce the number of tests needed while maintaining almost identical power to the brute force approach. In the first stage, our method performs the single marker test on all SNPs and selects a subset of SNPs that achieve a certain significance threshold. In the second stage, we perform a pairwise association test between traits and pairs of the SNPs selected from the first stage. The key insight of our approach is that we derive the joint distribution between the association statistics of a single SNP and the association statistics of pairs of SNPs. This joint distribution allows us to provide guarantees that the statistical power of our approach will closely approximate the brute force approach. We applied our approach to the Northern Finland Birth Cohort data and achieved 63 times speedup while maintaining 99% of the power of the brute force approach.}, keywords = {genome-wide association studies, Threshold-based Efficient Pairwise Association Approach}, pubstate = {published}, tppubtype = {article} }
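The two-stage procedure can be sketched as below. Stage 1 runs a single-SNP regression test on every SNP and keeps those passing a threshold; stage 2 tests only pairs of the kept SNPs. The abstract does not specify the pairwise model, so this sketch assumes a test of the interaction coefficient in y ~ g1 + g2 + g1·g2 with normal-approximation p-values, and it omits the joint-distribution derivation that gives TEPAA its power guarantee. `two_stage_pairs`, `ols_pvalue` and the threshold of 1e-3 are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def ols_pvalue(X, y, j):
    """Two-sided normal-approximation p-value for coefficient j of an OLS fit."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[j, j])
    return math.erfc(abs(beta[j] / se) / math.sqrt(2.0))


def two_stage_pairs(G, y, threshold=1e-3):
    """Stage 1: single-SNP tests on all SNPs; stage 2: interaction tests among hits only."""
    n, m = G.shape
    ones = np.ones(n)
    stage1 = [ols_pvalue(np.column_stack([ones, G[:, k]]), y, 1) for k in range(m)]
    selected = [k for k in range(m) if stage1[k] < threshold]
    results = {}
    for a, b in combinations(selected, 2):            # only pairs of stage-1 hits
        X = np.column_stack([ones, G[:, a], G[:, b], G[:, a] * G[:, b]])
        results[(a, b)] = ols_pvalue(X, y, 3)         # p-value of the g_a * g_b term
    return selected, results


# Hypothetical simulation: SNPs 0 and 1 have main effects plus an interaction.
rng = np.random.default_rng(0)
n, m = 500, 200
G = rng.binomial(2, 0.4, size=(n, m)).astype(float)
y = 0.5 * G[:, 0] + 0.5 * G[:, 1] + 0.8 * G[:, 0] * G[:, 1] + rng.normal(size=n)
selected, results = two_stage_pairs(G, y)
print(len(results), "pairs tested instead of", m * (m - 1) // 2)
```

The number of stage-2 tests grows with the square of the number of SNPs passing stage 1 rather than of all SNPs, which is where the speedup comes from; the threshold trades speed against power lost for interacting pairs whose SNPs have weak marginal effects.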
Org, Elin; Parks, Brian W W; Joo, Jong Wha J; Emert, Benjamin; Schwartzman, William; Kang, Eun Yong; Mehrabian, Margarete; Pan, Calvin; Knight, Rob; Gunsalus, Robert; Drake, Thomas A; Eskin, Eleazar; Lusis, Aldons J: Genetic and environmental control of host-gut microbiota interactions. Journal Article, Genome Res, 2015, ISSN: 1549-5469. Tags: Mouse Genetics, HMDP, RNA sequencing
@article{Org:GenomeRes:2015b, title = {Genetic and environmental control of host-gut microbiota interactions.}, author = {Elin Org and Brian W. W. Parks and Jong Wha J. Joo and Benjamin Emert and William Schwartzman and Eun Yong Kang and Margarete Mehrabian and Calvin Pan and Rob Knight and Robert Gunsalus and Thomas A. Drake and Eleazar Eskin and Aldons J. Lusis}, url = {http://dx.doi.org/10.1101/gr.194118.115}, issn = {1549-5469}, year = {2015}, date = {2015-01-01}, journal = {Genome Res}, abstract = {Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Toward this end, we profiled gut microbiota using 16S rRNA gene sequencing in a panel of 110 diverse inbred strains of mice. This panel has previously been studied for a wide range of metabolic traits and can be used for high-resolution association mapping. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that in a controlled environment the genetic background accounts for a substantial fraction of abundance of most common microbiota. The mice were previously studied for response to a high-fat, high-sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, AxB19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. Among these, we chose Akkermansia muciniphila, a common anaerobe previously associated with metabolic effects. When it was administered to strain AxB19 by gavage, the dietary response was significantly blunted for obesity, plasma lipids, and insulin resistance. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.}, keywords = {Mouse Genetics, HMDP, RNA sequencing}, pubstate = {published}, tppubtype = {article} }
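The abstract does not spell out the model used to estimate the heritability of microbiota composition. A standard formulation of SNP-based heritability with a linear mixed model, assumed here for illustration only, treats the abundance vector y of one taxon across mice as follows, where X holds fixed covariates, K is a kinship (genetic similarity) matrix computed from SNPs, and the variance components are typically estimated by REML. The notation is ours, not necessarily the paper's.

```latex
\begin{aligned}
y &= X\beta + g + \varepsilon, \qquad
g \sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right),\\
h^2_{\mathrm{SNP}} &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
\end{aligned}
```

Under this model, a "substantial fraction" of abundance explained by genetic background corresponds to a large estimated ratio of \sigma_g^2 to total variance for that taxon.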
Hiyari, S; Atti, E; Camargo, P M; Eskin, E; Lusis, A J; Tetradis, S; Pirih, F Q: Heritability of periodontal bone loss in mice. Journal Article, J Periodontal Res, 2015, ISSN: 1600-0765. Tags: Periodontitis
@article{Hiyari:JPeriodontalRes:2015b, title = {Heritability of periodontal bone loss in mice.}, author = {S. Hiyari and E. Atti and P. M. Camargo and E. Eskin and A. J. Lusis and S. Tetradis and F. Q. Pirih}, url = {http://dx.doi.org/10.1111/jre.12258}, issn = {1600-0765}, year = {2015}, date = {2015-01-01}, journal = {J Periodontal Res}, abstract = {BACKGROUND: Periodontitis is an inflammatory disease of the periodontal tissues that compromises tooth support and can lead to tooth loss. Although bacterial biofilm is central in disease pathogenesis, the host response plays an important role in the progression and severity of periodontitis. Indeed, clinical genetic studies indicate that periodontitis is 50% heritable. In this study, we hypothesized that lipopolysaccharide (LPS) injections lead to a strain-dependent periodontal bone loss pattern. MATERIAL AND METHODS: We utilized five inbred mouse strains from which the recombinant strains of the hybrid mouse diversity panel are derived. Mice received Porphyromonas gingivalis LPS injections for 6 wk. RESULTS AND CONCLUSION: Micro-computed tomography analysis demonstrated a statistically significant strain-dependent bone loss. The most susceptible strain, C57BL/6J, had fivefold higher LPS-induced bone loss than the most resistant strain, A/J. More importantly, periodontal bone loss revealed 49% heritability, which closely mimics periodontitis heritability in patients. To further evaluate the functional differences that underlie periodontal bone loss, osteoclast numbers of C57BL/6J and A/J mice were measured in vivo and in vitro. In vitro analysis of osteoclastogenic potential showed a higher number of osteoclasts in C57BL/6J than in A/J mice. In vivo LPS injections statistically significantly increased osteoclast numbers in both groups. Importantly, the number of osteoclasts was higher in C57BL/6J vs. A/J mice. These data support a significant role of the genetic framework in LPS-induced periodontal bone loss and the feasibility of utilizing the hybrid mouse diversity panel to determine the genetic factors that affect periodontal bone loss. Expanding these studies will help predict which patients are genetically predisposed to periodontitis and identify the biological basis of disease susceptibility.}, keywords = {Periodontitis}, pubstate = {published}, tppubtype = {article} }
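Because mice within an inbred strain are genetically identical, heritability in strain panels is often estimated from how much trait variance lies between strains rather than within them. The abstract does not say which estimator produced the 49% figure. The sketch below assumes a balanced one-way ANOVA with strain as the only grouping factor; the function name and example values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def broad_sense_heritability(strain_values: Dict[str, List[float]]) -> float:
    """Estimate broad-sense heritability H^2 from replicate mice per inbred strain.

    Hypothetical helper. Assumes a balanced design: every strain has the same
    number n >= 2 of mice, and strain is the only (random) factor.
    """
    groups = list(strain_values.values())
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two strains")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need the same number (>= 2) of mice in every strain")

    grand_mean = mean(v for g in groups for v in g)
    strain_means = [mean(g) for g in groups]

    # One-way ANOVA mean squares between and within strains.
    ms_between = n * sum((m - grand_mean) ** 2 for m in strain_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, strain_means) for v in g
    ) / (k * (n - 1))

    # E[MS_between] = var_within + n * var_strain; solve for var_strain
    # and truncate negative estimates at zero.
    var_strain = max((ms_between - ms_within) / n, 0.0)
    total = var_strain + ms_within
    if total == 0:
        raise ValueError("trait shows no variance")
    return var_strain / total
```

For example, two hypothetical strains measured as [1, 2, 3] and [4, 5, 6] give H^2 = 25/31, about 0.81. Under this estimator, a value of 0.49 would mean that roughly half of the measured variance is attributable to strain.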
Rau, Christoph D; Parks, Brian; Wang, Yibin; Eskin, Eleazar; Simecek, Petr; Churchill, Gary A; Lusis, Aldons J: High Density Genotypes of Inbred Mouse Strains: Improved Power and Precision of Association Mapping. Journal Article, G3 (Bethesda), 5 (10), pp. 2021-6, 2015, ISSN: 2160-1836. Tags: HMDP, Mouse Genetics
@article{Rau:G3:2015b, title = {High Density Genotypes of Inbred Mouse Strains: Improved Power and Precision of Association Mapping.}, author = {Christoph D. Rau and Brian Parks and Yibin Wang and Eleazar Eskin and Petr Simecek and Gary A. Churchill and Aldons J. Lusis}, url = {http://dx.doi.org/10.1534/g3.115.020784}, issn = {2160-1836}, year = {2015}, date = {2015-01-01}, journal = {G3 (Bethesda)}, volume = {5}, number = {10}, pages = {2021-6}, address = {United States}, abstract = {Human genome-wide association studies (GWAS) have identified thousands of loci associated with disease phenotypes. GWAS studies have also become feasible using rodent models and these have some important advantages over human studies including controlled environment, access to tissues for molecular profiling, reproducible genotypes and a wide array of techniques for experimental validation. Association mapping with common mouse inbred strains generally requires one hundred or more strains to achieve sufficient power and mapping resolution; in contrast, sample sizes for human studies are typically one or more orders of magnitude greater than this. To enable well-powered studies in mice, we have generated high-density genotypes for ~175 inbred strains of mice using the Mouse Diversity Array. These new data increase marker density by 1.9-fold, have reduced missing data rates, and provide more accurate identification of heterozygous regions compared to previous genotype data. We report the discovery of new loci from previously reported association mapping studies using the new genotype data. The data are freely available for download and web-based tools provide easy access for association mapping and viewing of the underlying intensity data for individual loci}, keywords = {HMDP, Mouse Genetics}, pubstate = {published}, tppubtype = {article} }
Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun Y; Pasaniuc, Bogdan; Eskin, Eleazar: Identification of causal genes for complex traits. Journal Article, Bioinformatics, 31 (12), pp. i206-i213, 2015, ISSN: 1367-4811. Tags: Fine Mapping
@article{Hormozdiari:Bioinformatics:2015b, title = {Identification of causal genes for complex traits.}, author = {Farhad Hormozdiari and Gleb Kichaev and Wen-Yun Y. Yang and Bogdan Pasaniuc and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btv240}, issn = {1367-4811}, year = {2015}, date = {2015-01-01}, journal = {Bioinformatics}, volume = {31}, number = {12}, pages = {i206-i213}, address = {England}, abstract = {MOTIVATION: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. RESULTS: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability $\rho$. Through extensive simulations, we demonstrate that our method not only speeds up computation but also has an average of 10% higher recall rate compared with existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. AVAILABILITY AND IMPLEMENTATION: Software is freely available for download at genetics.cs.ucla.edu/caviar. CONTACT: eeskin@cs.ucla.edu}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
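The idea of a "minimally sized set of genes" that captures the causal genes with probability ρ can be illustrated with a deliberately simplified model. Unlike CAVIAR-Gene, which models multiple causal variants, LD, and population structure, the sketch below assumes exactly one causal gene per locus, so per-gene posterior probabilities are mutually exclusive and can be added. Under that assumption, taking genes in decreasing order of posterior until the total reaches ρ yields a smallest such set. This is not the CAVIAR-Gene algorithm; the function name and posterior values are hypothetical.

```python
from typing import Dict, List


def rho_level_gene_set(posteriors: Dict[str, float], rho: float) -> List[str]:
    """Return a smallest set of genes whose posterior probabilities sum to >= rho.

    Hypothetical helper. Assumes exactly one causal gene at the locus, so the
    per-gene posteriors are mutually exclusive and can simply be added. Under
    that assumption, greedily taking the most probable genes first is optimal.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    if sum(posteriors.values()) < rho - 1e-9:
        raise ValueError("posteriors cannot reach the requested rho")

    chosen: List[str] = []
    covered = 0.0
    # Visit genes from most to least probable until coverage reaches rho.
    for gene, prob in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= rho:
            break
        chosen.append(gene)
        covered += prob
    return chosen


if __name__ == "__main__":
    # Hypothetical posteriors for four candidate genes at one locus.
    example = {"gene_a": 0.62, "gene_b": 0.21, "gene_c": 0.10, "gene_d": 0.07}
    print(rho_level_gene_set(example, 0.8))
```

With these hypothetical posteriors and ρ = 0.8, the set is ['gene_a', 'gene_b'], so only two of four candidate genes would need functional follow-up. Raising ρ trades a larger set for higher confidence that the causal gene is included.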
Zhou, Xiaoying; Crow, Amanda L; Hartiala, Jaana; Spindler, Tassja J; Ghazalpour, Anatole; Barsky, Lora W; Bennett, Brian J; Parks, Brian W; Eskin, Eleazar; Jain, Rajan; Epstein, Jonathan A; Lusis, Aldons J; Adams, Gregor B; Allayee, Hooman: The Genetic Landscape of Hematopoietic Stem Cell Frequency in Mice. Journal Article, Stem Cell Reports, 5 (1), pp. 125-38, 2015, ISSN: 2213-6711. Tags: HMDP
@article{Zhou:StemCellReports:2015b, title = {The Genetic Landscape of Hematopoietic Stem Cell Frequency in Mice.}, author = {Xiaoying Zhou and Amanda L. Crow and Jaana Hartiala and Tassja J. Spindler and Anatole Ghazalpour and Lora W. Barsky and Brian J. Bennett and Brian W. Parks and Eleazar Eskin and Rajan Jain and Jonathan A. Epstein and Aldons J. Lusis and Gregor B. Adams and Hooman Allayee}, url = {http://dx.doi.org/10.1016/j.stemcr.2015.05.008}, issn = {2213-6711}, year = {2015}, date = {2015-01-01}, journal = {Stem Cell Reports}, volume = {5}, number = {1}, pages = {125-38}, address = {United States}, abstract = {Prior efforts to identify regulators of hematopoietic stem cell physiology have relied mainly on candidate gene approaches with genetically modified mice. Here we used a genome-wide association study (GWAS) strategy with the hybrid mouse diversity panel to identify the genetic determinants of hematopoietic stem/progenitor cell (HSPC) frequency. Among 108 strains, we observed 120- to 300-fold variation in three HSPC populations. A GWAS analysis identified several loci that were significantly associated with HSPC frequency, including a locus on chromosome 5 harboring the homeodomain-only protein gene (Hopx). Hopx previously had been implicated in cardiac development but was not known to influence HSPC biology. Analysis of the HSPC pool in Hopx(-/-) mice demonstrated significantly reduced cell frequencies and impaired engraftment in competitive repopulation assays, thus providing functional validation of this positional candidate gene. These results demonstrate the power of GWAS in mice to identify genetic determinants of the hematopoietic system}, keywords = {HMDP}, pubstate = {published}, tppubtype = {article} }
Patananan, Alexander Nikolich; Budenholzer, Lauren Michelle; Eskin, Ascia; Torres, Eric Rommel; Clarke, Steven Gerard: Ethanol-induced differential gene expression and acetyl-CoA metabolism in a longevity model of the nematode Caenorhabditis elegans. Journal Article, Exp Gerontol, 61, pp. 20-30, 2015, ISSN: 1873-6815. Tags: Caenorhabditis, ethanol, RNA sequencing
@article{Patananan:ExpGerontol:2015, title = {Ethanol-induced differential gene expression and acetyl-CoA metabolism in a longevity model of the nematode Caenorhabditis elegans.}, author = {Alexander Nikolich Patananan and Lauren Michelle Budenholzer and Ascia Eskin and Eric Rommel Torres and Steven Gerard Clarke}, url = {http://dx.doi.org/10.1016/j.exger.2014.11.010}, issn = {1873-6815}, year = {2015}, date = {2015-01-01}, journal = {Exp Gerontol}, volume = {61}, pages = {20-30}, address = {England}, abstract = {Previous studies have shown that exposing adults of the soil-dwelling nematode Caenorhabditis elegans to concentrations of ethanol in the range of 100-400 mM results in slowed locomotion, decreased fertility, and reduced longevity. In contrast, lower concentrations of ethanol (0.86-68 mM) have been shown to cause a two- to three-fold increase in the life span of animals in the stress-resistant L1 larval stage in the absence of a food source. However, little is known about how gene and protein expression is altered by low concentrations of ethanol and the mechanism for the increased longevity. Therefore, we used biochemical assays and next-generation mRNA sequencing to identify genes and biological pathways altered by ethanol. RNA-seq analysis of L1 larvae incubated in the presence of 17 mM ethanol resulted in the significant differential expression of 649 genes, 274 of which were downregulated and 375 upregulated. Many of the genes significantly altered were associated with the conversion of ethanol and triglycerides to acetyl-CoA and glucose, suggesting that ethanol serves as an energy source in the increased longevity of the L1 larvae as well as a signal for fat utilization. We also asked if L1 larvae could sense ethanol and respond by directed movement. Although we found that L1 larvae can chemotax to benzaldehyde, we observed little or no chemotaxis to ethanol. Understanding how low concentrations of ethanol increase the lifespan of L1 larvae may provide insight into not only the longevity pathways in C. elegans, but also those of higher organisms.}, keywords = {Caenorhabditis, ethanol, RNA sequencing}, pubstate = {published}, tppubtype = {article} }
Baxter, Ruth M; Arboleda, Valerie A; Lee, Hane; Barseghyan, Hayk; Adam, Margaret P; Fechner, Patricia Y; Bargman, Renee; Keegan, Catherine; Travers, Sharon; Schelley, Susan; Hudgins, Louanne; Mathew, Revi P; Stalker, Heather J; Zori, Roberto; Gordon, Ora K; Ramos-Platt, Leigh; Pawlikowska-Haddal, Anna; Eskin, Ascia; Nelson, Stanley F; Délot, Emmanuèle; Vilain, Eric: Exome sequencing for the diagnosis of 46,XY disorders of sex development. Journal Article, J Clin Endocrinol Metab, 100 (2), pp. E333-44, 2015, ISSN: 1945-7197. Tags: Disorders of sex development, exome sequencing
@article{Baxter:JClinEndocrinolMetab:2015, title = {Exome sequencing for the diagnosis of 46,XY disorders of sex development.}, author = {Ruth M. Baxter and Valerie A. Arboleda and Hane Lee and Hayk Barseghyan and Margaret P. Adam and Patricia Y. Fechner and Renee Bargman and Catherine Keegan and Sharon Travers and Susan Schelley and Louanne Hudgins and Revi P. Mathew and Heather J. Stalker and Roberto Zori and Ora K. Gordon and Leigh Ramos-Platt and Anna Pawlikowska-Haddal and Ascia Eskin and Stanley F. Nelson and Emmanuèle Délot and Eric Vilain}, url = {http://dx.doi.org/10.1210/jc.2014-2605}, issn = {1945-7197}, year = {2015}, date = {2015-01-01}, journal = {J Clin Endocrinol Metab}, volume = {100}, number = {2}, pages = {E333-44}, address = {United States}, abstract = {CONTEXT: Disorders of sex development (DSD) are clinical conditions where there is a discrepancy between the chromosomal sex and the phenotypic (gonadal or genital) sex of an individual. Such conditions can be stressful for patients and their families and have historically been difficult to diagnose, especially at the genetic level. In particular, for cases of 46,XY gonadal dysgenesis, once variants in SRY and NR5A1 have been ruled out, there are few other single gene tests available. OBJECTIVE: We used exome sequencing followed by analysis with a list of all known human DSD-associated genes to investigate the underlying genetic etiology of 46,XY DSD patients who had not previously received a genetic diagnosis. DESIGN: Samples were either submitted to the research laboratory or submitted as clinical samples to the UCLA Clinical Genomic Center. Sequencing data were filtered using a list of genes known to be involved in DSD. RESULTS: We were able to identify a likely genetic diagnosis in more than a third of cases, including 22.5% with a pathogenic finding, an additional 12.5% with likely pathogenic findings, and 15% with variants of unknown clinical significance. CONCLUSIONS: Early identification of the genetic cause of a DSD will in many cases streamline and direct the clinical management of the patient, with more focused endocrine and imaging studies and better-informed surgical decisions. Exome sequencing proved an efficient method toward such a goal in 46,XY DSD patients}, keywords = {Disorders of sex development, exome sequencing}, pubstate = {published}, tppubtype = {article} }
Bennett, Brian J; Davis, Richard C; Civelek, Mete; Orozco, Luz; Wu, Judy; Qi, Hannah; Pan, Calvin; Packard, René Sevag R; Eskin, Eleazar; Yan, Mujing; Kirchgessner, Todd; Wang, Zeneng; Li, Xinmin; Gregory, Jill C; Hazen, Stanley L; Gargalovic, Peter S; Lusis, Aldons J: Genetic Architecture of Atherosclerosis in Mice: A Systems Genetics Analysis of Common Inbred Strains. Journal Article, PLoS Genet, 11 (12), pp. e1005711, 2015, ISSN: 1553-7404. Tags: atherosclerosis, genome-wide association studies, HMDP
@article{Bennett:PlosGenet:2015, title = {Genetic Architecture of Atherosclerosis in Mice: A Systems Genetics Analysis of Common Inbred Strains.}, author = {Brian J. Bennett and Richard C. Davis and Mete Civelek and Luz Orozco and Judy Wu and Hannah Qi and Calvin Pan and René R. Sevag Packard and Eleazar Eskin and Mujing Yan and Todd Kirchgessner and Zeneng Wang and Xinmin Li and Jill C. Gregory and Stanley L. Hazen and Peter S. Gargalovic and Aldons J. Lusis}, url = {http://dx.doi.org/10.1371/journal.pgen.1005711}, issn = {1553-7404}, year = {2015}, date = {2015-01-01}, journal = {PLoS Genet}, volume = {11}, number = {12}, pages = {e1005711}, address = {United States}, abstract = {Common forms of atherosclerosis involve multiple genetic and environmental factors. While human genome-wide association studies have identified numerous loci contributing to coronary artery disease and its risk factors, these studies are unable to control environmental factors or examine detailed molecular traits in relevant tissues. We now report a study of natural variations contributing to atherosclerosis and related traits in over 100 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP). The mice were made hyperlipidemic by transgenic expression of human apolipoprotein E-Leiden (APOE-Leiden) and human cholesteryl ester transfer protein (CETP). The mice were examined for lesion size and morphology as well as plasma lipid, insulin and glucose levels, and blood cell profiles. A subset of mice was studied for plasma levels of metabolites and cytokines. We also measured global transcript levels in aorta and liver. Finally, the uptake of acetylated LDL by macrophages from HMDP mice was quantitatively examined. Loci contributing to the traits were mapped using association analysis, and relationships among traits were examined using correlation and statistical modeling. A number of conclusions emerged. First, relationships among atherosclerosis and the risk factors in mice resemble those found in humans. Second, a number of trait loci were identified, including some overlapping with previous human and mouse studies. Third, gene expression data enabled enrichment analysis of pathways contributing to atherosclerosis and prioritization of candidate genes at associated loci in both mice and humans. Fourth, the data provided a number of mechanistic inferences; for example, we detected no association between macrophage uptake of acetylated LDL and atherosclerosis. Fifth, broad-sense heritability for atherosclerosis was much larger than narrow-sense heritability, indicating an important role for gene-by-gene interactions. Sixth, stepwise linear regression showed that the combined variations in plasma metabolites, including LDL/VLDL-cholesterol, trimethylamine N-oxide (TMAO), arginine, glucose and insulin, account for approximately 30 to 40% of the variation in atherosclerotic lesion area. Overall, our data provide a rich resource for studies of complex interactions underlying atherosclerosis.}, keywords = {atherosclerosis, genome-wide association studies, HMDP}, pubstate = {published}, tppubtype = {article} }
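The contrast between broad- and narrow-sense heritability in the abstract follows from the textbook decomposition of phenotypic variance. The notation below is standard quantitative genetics, not taken from the paper: V_A is additive genetic variance, V_D dominance variance, V_I interaction (epistatic) variance, and V_E environmental variance.

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I,\\
H^2 &= \frac{V_G}{V_P}, \qquad h^2 = \frac{V_A}{V_P},\\
H^2 - h^2 &= \frac{V_D + V_I}{V_P}
\end{aligned}
```

A large gap between H^2 and h^2 therefore means that much of the genetic variance is non-additive, which the authors interpret as evidence for gene-by-gene interactions.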
Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun Y; Pasaniuc, Bogdan; Eskin, Eleazar Identification of causal genes for complex traits. Journal Article Bioinformatics, 31 (12), pp. i206-i213, 2015, ISSN: 1367-4811. Tags: Fine Mapping
@article{Hormozdiari:Bioinformatics:2015, title = {Identification of causal genes for complex traits.}, author = {Farhad Hormozdiari and Gleb Kichaev and Wen-Yun Y. Yang and Bogdan Pasaniuc and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btv240}, issn = {1367-4811}, year = {2015}, date = {2015-01-01}, journal = {Bioinformatics}, volume = {31}, number = {12}, pages = {i206-i213}, address = {England}, abstract = {MOTIVATION: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. RESULTS: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability $\rho$. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has an average of 10% higher recall rate compared with the existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. AVAILABILITY AND IMPLEMENTATION: Software is freely available for download at genetics.cs.ucla.edu/caviar. CONTACT: eeskin@cs.ucla.edu}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
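To make the idea of a set that "captures the genes which harbor causal variants with probability ρ" concrete, here is a minimal sketch under a strong simplifying assumption that is not part of CAVIAR-Gene: exactly one gene in the region is causal, and each gene's posterior probability of being that gene is already known. Under that assumption, the smallest qualifying set is obtained by taking genes in decreasing order of probability until the cumulative probability reaches ρ. CAVIAR-Gene itself models LD and population structure to obtain such probabilities and allows multiple causal variants, which this sketch does not attempt; the gene names and probabilities are hypothetical.

```python
def rho_level_gene_set(gene_probs, rho=0.95):
    """Return the smallest list of genes whose combined probability of
    containing the causal gene is at least rho.

    Simplifying assumption (not CAVIAR-Gene's model): exactly one gene is
    causal, so per-gene probabilities are mutually exclusive and can be added.
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must be in (0, 1]")
    total = sum(gene_probs.values())
    ranked = sorted(gene_probs.items(), key=lambda item: item[1], reverse=True)
    chosen, covered = [], 0.0
    for gene, prob in ranked:
        if covered >= rho:
            break
        chosen.append(gene)
        covered += prob / total  # renormalise in case inputs do not sum to 1
    return chosen, covered


# Hypothetical posterior probabilities for four genes in one associated region.
probs = {"GeneA": 0.60, "GeneB": 0.25, "GeneC": 0.10, "GeneD": 0.05}
print(rho_level_gene_set(probs, rho=0.8))
```

Raising ρ trades a larger candidate set for a higher chance that the causal gene is included, which is the trade-off behind the reported reduction in genes needing functional follow-up.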
Brown, Robert; Lee, Hane; Eskin, Ascia; Kichaev, Gleb; Lohmueller, Kirk E; Reversade, Bruno; Nelson, Stanley F; Pasaniuc, Bogdan Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders. Journal Article Eur J Hum Genet, 24 (1), pp. 113-9, 2015, ISSN: 1476-5438. Tags: Causal Inference, exome sequencing, monogenic disorders
@article{Brown:EurJHumGenet:2015, title = {Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders.}, author = {Robert Brown and Hane Lee and Ascia Eskin and Gleb Kichaev and Kirk E. Lohmueller and Bruno Reversade and Stanley F. Nelson and Bogdan Pasaniuc}, url = {http://dx.doi.org/10.1038/ejhg.2015.68}, issn = {1476-5438}, year = {2015}, date = {2015-01-01}, journal = {Eur J Hum Genet}, volume = {24}, number = {1}, pages = {113-9}, address = {England}, abstract = {Recent breakthroughs in exome-sequencing technology have made possible the identification of many causal variants of monogenic disorders. Although extremely powerful when closely related individuals (eg, child and parents) are simultaneously sequenced, sequencing of a single case is often unsuccessful due to the large number of variants that need to be followed up for functional validation. Many approaches filter out common variants above a given frequency threshold (eg, 1%), and then prioritize the remaining variants according to their functional, structural and conservation properties. Here we present methods that leverage the genetic structure across different populations to improve filtering performance while accounting for the finite sample size of the reference panels. We show that leveraging genetic structure reduces the number of variants that need to be followed up by 16% in simulations and by up to 38% in empirical data of 20 exomes from individuals with monogenic disorders for which the causal variants are known. European Journal of Human Genetics advance online publication, 22 April 2015; doi:10.1038/ejhg.2015.68}, keywords = {Causal Inference, exome sequencing, monogenic disorders}, pubstate = {published}, tppubtype = {article} }
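The abstract describes filtering out variants above a frequency threshold while accounting for the finite size of the reference panels. One simple way to express that idea, offered as an illustrative assumption rather than the authors' model, is to discard a variant only when the lower end of a confidence interval for its allele frequency exceeds the threshold in at least one reference population; a Wilson score interval is used below. The population labels and allele counts are hypothetical.

```python
import math


def wilson_lower_bound(k, n, z=1.96):
    """Lower end of the Wilson score interval for a frequency of k out of n."""
    if n == 0:
        return 0.0  # no data: cannot claim the variant is common
    p = k / n
    denom = 1.0 + z * z / n
    centre = p + z * z / (2.0 * n)
    margin = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return (centre - margin) / denom


def is_confidently_common(pop_counts, threshold=0.01, z=1.96):
    """True if, in at least one reference population, the allele frequency
    exceeds `threshold` even at the lower confidence bound.

    pop_counts maps a population label to (alt_allele_count, total_alleles).
    """
    return any(wilson_lower_bound(k, n, z) > threshold for k, n in pop_counts.values())


# Hypothetical reference-panel counts for one candidate variant.
small_panel = {"POP1": (3, 200)}                       # 1.5% from only 200 chromosomes
larger_panel = {"POP1": (3, 200), "POP2": (60, 2000)}  # adds 3.0% from 2,000 chromosomes
print(is_confidently_common(small_panel), is_confidently_common(larger_panel))
```

With 3 alternate alleles among 200 chromosomes the point estimate (1.5%) exceeds a 1% cutoff, but the lower bound (about 0.5%) does not, so the variant is kept for follow-up; the second, larger population supplies enough evidence to filter it.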
Ng, Calista K L; Shboul, Mohammad; Taverniti, Valerio; Bonnard, Carine; Lee, Hane; Eskin, Ascia; Nelson, Stanley F; Al-Raqad, Mohammed; Altawalbeh, Samah; Séraphin, Bertrand; Reversade, Bruno Loss of the scavenger mRNA decapping enzyme DCPS causes syndromic intellectual disability with neuromuscular defects. Journal Article Hum Mol Genet, 24 (11), pp. 3163-71, 2015, ISSN: 1460-2083. Tags: DCPS, mRNA
@article{Ng:HumMolGenet:2015, title = {Loss of the scavenger mRNA decapping enzyme DCPS causes syndromic intellectual disability with neuromuscular defects.}, author = {Calista K. L. Ng and Mohammad Shboul and Valerio Taverniti and Carine Bonnard and Hane Lee and Ascia Eskin and Stanley F. Nelson and Mohammed Al-Raqad and Samah Altawalbeh and Bertrand Séraphin and Bruno Reversade}, url = {http://dx.doi.org/10.1093/hmg/ddv067}, issn = {1460-2083}, year = {2015}, date = {2015-01-01}, journal = {Hum Mol Genet}, volume = {24}, number = {11}, pages = {3163-71}, address = {England}, abstract = {mRNA decay is an essential and active process that allows cells to continuously adapt gene expression to internal and environmental cues. There are two mRNA degradation pathways: 3' to 5' and 5' to 3'. The DCPS protein is the scavenger mRNA decapping enzyme which functions in the last step of the 3' end mRNA decay pathway. We have identified a DCPS pathogenic mutation in a large family with three affected individuals presenting with a novel recessive syndrome consisting of craniofacial anomalies, intellectual disability and neuromuscular defects. Using the patient's primary cells, we show that this homozygous splice mutation results in a DCPS loss-of-function allele. Diagnostic biochemical analyses using various m7G cap derivatives as substrates reveal no DCPS enzymatic activity in the patient's cells. Our results implicate DCPS, and more generally RNA catabolism, as a critical cellular process for neurological development, normal cognition and organismal homeostasis in humans.}, keywords = {DCPS, mRNA}, pubstate = {published}, tppubtype = {article} }
Nanda, Vikrum; Gutman, Boris; Bar, Ehab; Alghamdi, Suha; Tetradis, Sotirios; Lusis, Aldons J; Eskin, Eleazar; Moon, Won Quantitative analysis of 3-dimensional facial soft tissue photographic images: technical methods and clinical application. Journal Article Prog Orthod, 16, pp. 21, 2015, ISSN: 2196-1042. Tags: 3D photography, anthropomorphics
@article{Nanda:ProgOrthod:2015, title = {Quantitative analysis of 3-dimensional facial soft tissue photographic images: technical methods and clinical application.}, author = {Vikrum Nanda and Boris Gutman and Ehab Bar and Suha Alghamdi and Sotirios Tetradis and Aldons J. Lusis and Eleazar Eskin and Won Moon}, url = {http://dx.doi.org/10.1186/s40510-015-0082-0}, issn = {2196-1042}, year = {2015}, date = {2015-01-01}, journal = {Prog Orthod}, volume = {16}, pages = {21}, address = {Germany}, abstract = {BACKGROUND: The recent advent of 3D photography has created the potential for comprehensive facial evaluation. However, the lack of practical, true 3D analysis of the information collected from 3D images has been the factor limiting widespread utilization in orthodontics. Current evaluation of 3D facial soft tissue images relies on subjective visual evaluation and 2D distances to assess facial disharmony. The objectives of this project were to map the surface and define boundaries of 3D facial soft tissue, to modify mathematical functions to average multiple 3D facial images, and to mathematically average 3D facial images, allowing generation of color-coded surface deviation relative to a true average. METHODS: A collaboration headed by UCLA Orthodontics with UCLA Neuroimaging was initiated to modify advanced brain mapping technology to accurately map the facial surface in 3D. Ten subjects were selected as a sample for development of the technical protocol. 3dMD photographic images were segmented, corrected using a series of topology-correcting algorithms, and processed to create closed meshes. Shapes were mapped to a sphere using conformal and area-preserving maps, and were then registered using a spherical patch mapping approach. Finally, an average was created using 7-parameter Procrustes alignment. RESULTS: Size-standardized average facial images were generated for the sample population. A single patient was then superimposed on the average and color-coded displacement maps were generated to demonstrate the clinical applicability of this protocol. Further confirmation of the methods through 3D superimposition of the initial (T0) average to the 4-week (T4) average was completed and analyzed. CONCLUSIONS: The results of this investigation suggest that it is possible to average multiple facial images of highly variable topology. The immediate application of this research will be rapid and detailed diagnostic imaging analysis for orthodontic and surgical treatment planning. There is great potential for application to anthropometrics and genomics. This investigation resulted in establishment of a protocol for mapping the surface of the human face in three dimensions.}, keywords = {3D photography, anthropomorphics}, pubstate = {published}, tppubtype = {article} }
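A 7-parameter Procrustes alignment, as in the METHODS step, solves for three rotation angles, three translations and one isotropic scale factor. A minimal NumPy sketch of that similarity alignment (the closed-form SVD solution due to Umeyama) and of averaging aligned shapes follows. It assumes point-to-point correspondence between meshes is already established, which in the study came from the spherical patch mapping step; that step is not reproduced here, and the function names are illustrative.

```python
import numpy as np


def similarity_procrustes(source, target):
    """Least-squares scale, rotation R and translation t such that
    scale * R @ source[i] + t approximates target[i] for every point i.

    source, target: (n_points, 3) arrays with corresponding rows.
    Closed-form SVD solution (Umeyama, 1991): 3 + 3 + 1 = 7 parameters.
    """
    mu_src, mu_tgt = source.mean(axis=0), target.mean(axis=0)
    src, tgt = source - mu_src, target - mu_tgt
    cov = tgt.T @ src / len(source)  # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0  # force a proper rotation, not a reflection
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / (src ** 2).sum(axis=1).mean()
    t = mu_tgt - scale * R @ mu_src
    return scale, R, t


def apply_similarity(points, scale, R, t):
    """Map each row p of points to scale * R @ p + t."""
    return scale * points @ R.T + t


def procrustes_average(shapes):
    """Align every shape to the first one, then average coordinates point-wise."""
    reference = shapes[0]
    aligned = [apply_similarity(shape, *similarity_procrustes(shape, reference))
               for shape in shapes]
    return np.mean(aligned, axis=0)
```

Aligning everything to the first shape fixes the size of the average to that shape; a size-standardized average would additionally rescale, for example to unit centroid size, and generalized Procrustes analysis would iterate the alignment against the running mean.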
Crow, Amanda L; Ohmen, Jeffrey; Wang, Juemei; Lavinsky, Joel; Hartiala, Jaana; Li, Qingzhong; Li, Xin; Salehide, Pezhman; Eskin, Eleazar; Pan, Calvin; Lusis, Aldons J; Allayee, Hooman; Friedman, Rick A The Genetic Architecture of Hearing Impairment in Mice: Evidence for Frequency Specific Genetic Determinants. Journal Article G3 (Bethesda), 2015, ISSN: 2160-1836. Tags: cochlear function, genome-wide association studies, phenotypic heterogeneity
@article{Crow:G3:2015b, title = {The Genetic Architecture of Hearing Impairment in Mice: Evidence for Frequency Specific Genetic Determinants.}, author = {Amanda L. Crow and Jeffrey Ohmen and Juemei Wang and Joel Lavinsky and Jaana Hartiala and Qingzhong Li and Xin Li and Pezhman Salehide and Eleazar Eskin and Calvin Pan and Aldons J. Lusis and Hooman Allayee and Rick A. Friedman}, url = {http://dx.doi.org/10.1534/g3.115.021592}, issn = {2160-1836}, year = {2015}, date = {2015-01-01}, journal = {G3 (Bethesda)}, abstract = {Genome-wide association studies (GWAS) have been successfully applied in humans for the study of many complex phenotypes. However, identification of the genetic determinants of hearing in adults has been hampered, in part, by the relative inability to control for environmental factors that might affect hearing throughout the lifetime, as well as a large degree of phenotypic heterogeneity. These and other factors have limited the number of large-scale studies performed in humans that have identified candidate genes that contribute to the etiology of this complex trait. In order to address these limitations, we performed a GWAS using a set of inbred mouse strains from the Hybrid Mouse Diversity Panel. Among 99 strains characterized, we observed ~2 to 5-fold variation in hearing at six different frequencies, which are differentiated biologically from each other by the location in the cochlea where each frequency is registered. Among all frequencies tested, we identified a total of nine significant loci, several of which contained promising candidate genes for follow-up study. Taken together, our results indicate the existence of both genes that affect global cochlear function and anatomical- and frequency-specific genes, and further demonstrate the complex nature of mammalian hearing variation.}, keywords = {cochlear function, genome-wide association studies, phenotypic heterogeneity}, pubstate = {published}, tppubtype = {article} }
2014
He, Dan; Eskin, Eleazar IPED2X: a robust pedigree reconstruction algorithm for complicated pedigrees. Journal Article Journal of Bioinformatics and Computational Biology, 12, 2014. Tags: IPED2X, pedigree reconstruction
@article{25553812, title = {IPED2X: a robust pedigree reconstruction algorithm for complicated pedigrees.}, author = {Dan He and Eleazar Eskin}, url = {https://www.ncbi.nlm.nih.gov/pubmed/25553812}, doi = {10.1142/S0219720014420074}, year = {2014}, date = {2014-12-01}, journal = {Journal of Bioinformatics and Computational Biology}, volume = {12}, abstract = {Reconstruction of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. Some recent methods have been developed to reconstruct pedigrees using genotype data only. These methods are accurate and efficient for simple pedigrees which contain only siblings, where two individuals share the same pair of parents. The most recent method, IPED2, is able to handle complicated pedigrees with half-sibling relationships, where two individuals share only one parent. However, the method is shown to miss many true positive half-sibling relationships as it removes all suspicious half-sibling relationships during the parent construction process. In this work, we propose a novel method, IPED2X, which deploys a more robust algorithm for parent construction in the pedigrees by considering more possible operations rather than simple deletion. We convert the parent construction problem into a graph labeling problem and propose a more effective labeling algorithm. We show in our experiments that IPED2X is more powerful at capturing the true half-sibling relationships, which further leads to better reconstruction accuracy.}, keywords = {IPED2X, pedigree reconstruction}, pubstate = {published}, tppubtype = {article} }
Ohmen, Jeffrey; Kang, Eun Yong; Li, Xin; Joo, Jong Wha; Hormozdiari, Farhad; Zheng, Qing Yin; Davis, Richard C; Lusis, Aldons J; Eskin, Eleazar; Friedman, Rick A Genome-Wide Association Study for Age-Related Hearing Loss (AHL) in the Mouse: A Meta-Analysis. Journal Article J Assoc Res Otolaryngol, 15 (3), pp. 335-52, 2014, ISSN: 1438-7573. Tags: HMDP, Meta-Analysis, Mouse Genetics
@article{Ohmen:JAssocResOtolaryngol:2014, title = {Genome-Wide Association Study for Age-Related Hearing Loss (AHL) in the Mouse: A Meta-Analysis.}, author = {Jeffrey Ohmen and Eun Yong Kang and Xin Li and Jong Wha Joo and Farhad Hormozdiari and Qing Yin Zheng and Richard C. Davis and Aldons J. Lusis and Eleazar Eskin and Rick A. Friedman}, url = {http://dx.doi.org/10.1007/s10162-014-0443-2}, issn = {1438-7573}, year = {2014}, date = {2014-01-01}, journal = {J Assoc Res Otolaryngol}, volume = {15}, number = {3}, pages = {335-52}, address = {United States}, abstract = {Age-related hearing loss (AHL) is characterized by a symmetric sensorineural hearing loss primarily in high frequencies, and individuals have different levels of susceptibility to AHL. Heritability studies have shown that the sources of this variance are both genetic and environmental, with approximately half of the variance attributable to hereditary factors as reported by Huang and Tang (Eur Arch Otorhinolaryngol 267(8):1179-1191, 2010). To date, only a limited number of large-scale association studies for AHL have been undertaken in humans. An alternate and complementary approach to these human studies is through the use of mouse models. Advantages of mouse models include that the environment can be more carefully controlled, measurements can be replicated in genetically identical animals, and the proportion of the variability explained by genetic variation is increased. Complex traits in mouse strains have been shown to have higher heritability, and genetic loci often have stronger effects on the trait compared to humans. Motivated by these advantages, we have performed the first genome-wide association study of its kind in the mouse by combining several data sets in a meta-analysis to identify loci associated with age-related hearing loss. We identified five genome-wide significant loci ($P < 10^{-6}$). One of these loci confirmed a previously identified locus (ahl8) on distal chromosome 11 and greatly narrowed the candidate region. Specifically, the most significant associated SNP is located 450 kb upstream of Fscn2. These data confirm the utility of this approach and provide new high-resolution mapping information about variation within the mouse genome associated with hearing loss.}, keywords = {HMDP, Meta-Analysis, Mouse Genetics}, pubstate = {published}, tppubtype = {article} }
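Combining several data sets in a meta-analysis amounts to pooling per-study effect estimates at each SNP. The abstract does not specify the meta-analytic model, so the sketch below assumes the standard inverse-variance weighted fixed-effects combination and compares the resulting two-sided p-value with the 10^-6 genome-wide threshold quoted above. The effect sizes and standard errors are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of one SNP.

    betas: per-study effect estimates; ses: their standard errors.
    Returns (pooled effect, pooled standard error, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of effects and standard errors")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


GENOME_WIDE_THRESHOLD = 1e-6  # significance level quoted in the abstract

# Hypothetical per-study estimates for one SNP in three mouse data sets.
beta, se, z, p = fixed_effects_meta([0.30, 0.25, 0.35], [0.08, 0.10, 0.09])
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e} significant={p < GENOME_WIDE_THRESHOLD}")
```

Pooling shrinks the standard error roughly in proportion to the square root of the combined weight, which is why several modest studies can jointly reach a threshold none of them reaches alone.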
Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas; Joo, Jong Wha J; Shih, Diana; Davis, Richard C; Lusis, Aldons J; Eskin, Eleazar Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice. Journal Article PLoS Genet, 10 (1), pp. e1004022, 2014, ISSN: 1553-7404. Tags: Genes By Environment, Meta-Analysis, Mouse Genetics
@article{10.1371/journal.pgen.1004022, title = {Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice}, author = {Eun Yong Kang and Buhm Han and Nicholas Furlotte and Jong Wha J. Joo and Diana Shih and Richard C. Davis and Aldons J. Lusis and Eleazar Eskin}, url = {http://dx.doi.org/10.1371%2Fjournal.pgen.1004022}, issn = {1553-7404}, year = {2014}, date = {2014-01-01}, journal = {PLoS Genet}, volume = {10}, number = {1}, pages = {e1004022}, publisher = {Public Library of Science}, abstract = {Author Summary Identifying gene-by-environment interactions is important for understanding the architecture of a complex trait. Discovering gene-by-environment interactions requires the observation of the same phenotype in individuals under different environments. Model organism studies are often conducted under different environments. These studies provide an unprecedented opportunity for researchers to identify gene-by-environment interactions. A difference in the effect size of a genetic variant between two studies conducted in different environments may suggest the presence of a gene-by-environment interaction. In this paper, we propose to employ a random-effects-based meta-analysis approach to identify gene-by-environment interactions, which assumes different or heterogeneous effect sizes between studies. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional approaches for discovery of gene-by-environment interactions, which treat the gene-by-environment interactions as covariates in the analysis. We provide an intuitive way to visualize the results of the meta-analysis at a locus, which allows us to obtain biological insights into gene-by-environment interactions. We demonstrate our method by searching for gene-by-environment interactions by combining 17 mouse genetic studies totaling 4,965 distinct animals.}, keywords = {Genes By Environment, Meta-Analysis, Mouse Genetics}, pubstate = {published}, tppubtype = {article} }
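The key observation in this abstract is that a gene-by-environment interaction shows up as heterogeneity of a variant's effect size across studies run in different environments. The paper proposes a random-effects-based approach; as a simpler, standard stand-in (not the paper's method), the sketch below computes Cochran's Q and the I² statistic, which measure how much per-study estimates disagree beyond what their standard errors would predict. The input values are hypothetical.

```python
def heterogeneity(betas, ses):
    """Cochran's Q and I^2 for one variant's effect estimates across studies.

    A large Q (relative to its len(betas) - 1 degrees of freedom) or an I^2
    near 1 indicates that effect sizes differ between studies, which is the
    pattern a gene-by-environment interaction would produce.
    """
    if len(betas) < 2 or len(betas) != len(ses):
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical effects of one SNP measured in two studies in different environments.
print(heterogeneity([0.0, 1.0], [0.1, 0.1]))  # effects disagree: Q about 50, I^2 about 0.98
print(heterogeneity([0.5, 0.5], [0.1, 0.1]))  # identical effects: Q = 0, I^2 = 0
```

A fixed-effects test would treat both hypothetical SNPs alike if their pooled effects matched; only a heterogeneity-aware statistic separates a consistent effect from one that changes with the environment.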
He, Dan; Furlotte, Nicholas A; Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Ostrovsky, Rafail; Sahai, Amit; Eskin, Eleazar Identifying genetic relatives without compromising privacy. Journal Article Genome Res, 2014, ISSN: 1549-5469. Tags: Genomic Privacy
@article{He:GenomeRes:2014, title = {Identifying genetic relatives without compromising privacy.}, author = {Dan He and Nicholas A. Furlotte and Farhad Hormozdiari and Jong Wha J. Joo and Akshay Wadia and Rafail Ostrovsky and Amit Sahai and Eleazar Eskin}, url = {http://dx.doi.org/10.1101/gr.153346.112}, issn = {1549-5469}, year = {2014}, date = {2014-01-01}, journal = {Genome Res}, abstract = {The development of high-throughput genomic technologies has impacted many areas of genetic research. While many applications of these technologies focus on the discovery of genes involved in disease from population samples, applications of genomic technologies to an individual's genome, or personal genomics, have recently gained much interest. One such application is the identification of relatives from genetic data. In this application, genetic information from a set of individuals is collected in a database, and each pair of individuals is compared in order to identify genetic relatives. An inherent issue that arises in the identification of relatives is privacy. In this article, we propose a method for identifying genetic relatives without compromising privacy by taking advantage of novel cryptographic techniques customized for secure and private comparison of genetic information. We demonstrate the utility of these techniques by allowing a pair of individuals to discover whether or not they are related without compromising their genetic information or revealing it to a third party. The idea is that individuals only share enough special-purpose cryptographically protected information with each other to identify whether or not they are relatives, but not enough to expose any information about their genomes. We show in HapMap and 1000 Genomes data that our method can recover first- and second-order genetic relationships and, through simulations, show that our method can identify relationships as distant as third cousins while preserving privacy.}, keywords = {Genomic Privacy}, pubstate = {published}, tppubtype = {article} }
Mangul, Serghei; Wu, Nicholas C; Mancuso, Nicholas; Zelikovsky, Alex; Sun, Ren; Eskin, Eleazar. Accurate viral population assembly from ultra-deep sequencing data. Journal Article Bioinformatics, 30 (12), pp. i329-i337, 2014, ISSN: 1367-4811.
Tags: Virus Assembly
@article{Mangul:Bioinformatics:2014b, title = {Accurate viral population assembly from ultra-deep sequencing data.}, author = {Serghei Mangul and Nicholas C. Wu and Nicholas Mancuso and Alex Zelikovsky and Ren Sun and Eleazar Eskin}, url = {https://www.ncbi.nlm.nih.gov/pubmed/24932001}, issn = {1367-4811}, year = {2014}, date = {2014-01-01}, journal = {Bioinformatics}, volume = {30}, number = {12}, pages = {i329-i337}, address = {England}, keywords = {Virus Assembly}, pubstate = {published}, tppubtype = {article} }
Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads. Availability: Our tool VGA is freely available at http://genetics.cs.ucla.edu/vga/. Contact: serghei@cs.ucla.edu; eeskin@cs.ucla.edu
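The abstract states that VGA estimates the abundances of the assembled variants with an expectation-maximization algorithm. As a rough illustration of that general technique only (not VGA's code), the Python sketch below runs the textbook mixture EM, assuming each read has already been assigned the set of assembled variants it is consistent with. The function name `em_abundances` and its `compat` input are hypothetical, and read length and position effects are ignored.

```python
from typing import List


def em_abundances(compat: List[List[int]], n_variants: int,
                  iters: int = 500, tol: float = 1e-10) -> List[float]:
    """Estimate relative variant abundances by expectation-maximization.

    compat[r] lists the indices of the assembled variants that read r is
    consistent with. Each read is modeled as drawn from exactly one variant,
    chosen with probability equal to that variant's abundance.
    """
    freqs = [1.0 / n_variants] * n_variants  # start from uniform abundances
    for _ in range(iters):
        # E-step: split each read among its compatible variants in
        # proportion to their current abundance estimates.
        counts = [0.0] * n_variants
        for variants in compat:
            total = sum(freqs[v] for v in variants)
            if total == 0.0:
                continue  # read has no compatible variant with nonzero abundance
            for v in variants:
                counts[v] += freqs[v] / total
        n_assigned = sum(counts)
        if n_assigned == 0.0:
            raise ValueError("no read is compatible with any variant")
        # M-step: abundance = expected fraction of reads assigned to the variant.
        new = [c / n_assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


# 6 reads unique to variant 0, 2 unique to variant 1, 2 shared by both.
print(em_abundances([[0]] * 6 + [[1]] * 2 + [[0, 1]] * 2, n_variants=2))
```

The two shared reads end up split 3:1 in favor of the better-supported variant, so the estimates converge to approximately 0.75 and 0.25.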
Yang, Wen-Yun Y; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan. A Spatial-Aware Haplotype Copying Model with Applications to Genotype Imputation. Book Chapter Research in Computational Molecular Biology, pp. 371-384, Springer International Publishing, 2014.
Tags: Spatial Population Structure
@inbook{Yang:ResearchInComputationalMolecularBiology:2014, title = {A Spatial-Aware Haplotype Copying Model with Applications to Genotype Imputation}, author = {Wen-Yun Y. Yang and Farhad Hormozdiari and Eleazar Eskin and Bogdan Pasaniuc}, url = {http://dx.doi.org/10.1007/978-3-319-05269-4_30}, year = {2014}, date = {2014-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {371-384}, publisher = {Springer International Publishing}, organization = {University of California Los Angeles}, keywords = {Spatial Population Structure}, pubstate = {published}, tppubtype = {inbook} }
Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy and a lower computational runtime than the standard approach. Finally, we show that our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data.
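To make the idea of a non-uniform copying prior concrete, here is a minimal Python sketch, assuming a Gaussian kernel on 2-D map coordinates with an illustrative `bandwidth`, biallelic sites, and fixed switch and mismatch probabilities. None of these choices come from the chapter; the function names are hypothetical. The standard model would use a uniform `prior`; here nearby reference haplotypes get more weight, and the same weights are used to pick a small personalized panel.

```python
import math
from typing import List, Sequence, Tuple


def spatial_prior(target: Tuple[float, float],
                  ref_locs: Sequence[Tuple[float, float]],
                  bandwidth: float = 1.0) -> List[float]:
    """Prior copying weights that decay with distance (Gaussian kernel)."""
    w = [math.exp(-((x - target[0]) ** 2 + (y - target[1]) ** 2)
                  / (2.0 * bandwidth ** 2))
         for x, y in ref_locs]
    s = sum(w)
    return [v / s for v in w]


def copying_likelihood(target_hap: Sequence[int],
                       refs: Sequence[Sequence[int]],
                       prior: Sequence[float],
                       switch: float = 0.01,
                       mismatch: float = 0.01) -> float:
    """P(target haplotype) under a haplotype copying HMM (forward algorithm).

    The hidden state is the reference haplotype being copied. Between sites the
    copied haplotype is kept with probability 1 - switch; otherwise a new one is
    drawn from `prior`, so the spatial prior governs both the start and jumps.
    Sites are assumed biallelic. No rescaling is done, so this is only suitable
    for short haplotypes.
    """
    def emit(k: int, site: int) -> float:
        return 1.0 - mismatch if refs[k][site] == target_hap[site] else mismatch

    n_refs = len(refs)
    alpha = [prior[k] * emit(k, 0) for k in range(n_refs)]
    for site in range(1, len(target_hap)):
        total = sum(alpha)
        alpha = [((1.0 - switch) * alpha[k] + switch * prior[k] * total) * emit(k, site)
                 for k in range(n_refs)]
    return sum(alpha)


def personalized_panel(prior: Sequence[float], k: int) -> List[int]:
    """Indices of the k reference haplotypes with the largest prior weight."""
    return sorted(range(len(prior)), key=lambda i: prior[i], reverse=True)[:k]


p = spatial_prior((0.0, 0.0), [(0.0, 0.0), (5.0, 0.0)])
refs = [[0, 0, 0, 0], [1, 1, 1, 1]]
print(copying_likelihood([0, 0, 0, 0], refs, p), personalized_panel(p, 1))
```

When the target matches the geographically closer reference, its likelihood under the spatial prior is higher than under a uniform prior over the same panel.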
Hasin-Brumshtein, Yehudit; Hormozdiari, Farhad; Martin, Lisa; van Nas, Atila; Eskin, Eleazar; Lusis, Aldons J; Drake, Thomas A. Allele-specific expression and eQTL analysis in mouse adipose tissue. Journal Article BMC Genomics, 15 (1), pp. 471, 2014, ISSN: 1471-2164.
Tags: Allele Specific Expression
@article{HasinBrumshtein:BmcGenomics:2014, title = {Allele-specific expression and eQTL analysis in mouse adipose tissue.}, author = {Yehudit Hasin-Brumshtein and Farhad Hormozdiari and Lisa Martin and Atila van Nas and Eleazar Eskin and Aldons J. Lusis and Thomas A. Drake}, url = {http://dx.doi.org/10.1186/1471-2164-15-471}, issn = {1471-2164}, year = {2014}, date = {2014-01-01}, journal = {BMC Genomics}, volume = {15}, number = {1}, pages = {471}, keywords = {Allele Specific Expression}, pubstate = {published}, tppubtype = {article} }
BACKGROUND: The simplest definition of cis-eQTLs, as opposed to trans-eQTLs, refers to genetic variants that affect expression in an allele-specific manner, with implications for the underlying mechanism. Yet, due to technical limitations of expression microarrays, the vast majority of eQTL studies performed in the last decade used a genomic-distance-based definition as a surrogate for cis, therefore exploring local rather than cis-eQTLs. RESULTS: In this study we use RNA-seq to explore allele-specific expression (ASE) in adipose tissue of male and female F1 mice, produced from reciprocal crosses of C57BL/6J and DBA/2J strains. Comparison of the identified cis-eQTLs to local-eQTLs obtained from adipose tissue expression in two previous population-based studies in our laboratory yields poor overlap between the two mapping approaches, while both local-eQTL studies show highly concordant results. Specifically, local-eQTL studies show ~60% overlap between themselves, while only 15-20% of local-eQTLs are identified as cis by ASE, and less than 50% of ASE genes are recovered in local-eQTL studies. Utilizing recently published ENCODE data, we also find that ASE genes show a significant, ASE-direction-specific bias in SNP prevalence in DNase I hypersensitive sites. CONCLUSIONS: We suggest a new approach to the analysis of allele-specific expression that is more sensitive and accurate than the commonly used Fisher or chi-square statistics. Our analysis indicates that technical differences between the cis and local-eQTL approaches, such as differences in genomic background or sex specificity, account for a relatively small fraction of the discrepancy. Therefore, we suggest that the differences between the two eQTL mapping approaches may facilitate sorting of SNP-eQTL interactions into true cis and trans, and that a considerable portion of local-eQTLs may actually represent trans interactions.
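For reference, the conventional baseline that the abstract compares against can be sketched in a few lines. This is one of the "commonly used" chi-square tests, not the new approach from the paper: a 1-degree-of-freedom goodness-of-fit test of whether RNA-seq reads at a heterozygous site in an F1 hybrid split 1:1 between the C57BL/6J and DBA/2J alleles. The function name is hypothetical, and the test assumes there is no read-mapping bias toward either strain's allele.

```python
import math
from typing import Tuple


def ase_chi_square(b6_reads: int, d2_reads: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of a 1:1 allelic ratio (1 df).

    In an F1 hybrid every cell carries one allele from each parental strain,
    so without cis-acting differences the reads should split about 50:50.
    Returns (statistic, p-value).
    """
    n = b6_reads + d2_reads
    if n == 0:
        raise ValueError("no allele-informative reads at this site")
    expected = n / 2.0
    stat = ((b6_reads - expected) ** 2 + (d2_reads - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


print(ase_chi_square(50, 50))  # (0.0, 1.0): balanced expression
print(ase_chi_square(80, 20))  # statistic 36.0, p-value around 2e-9
```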
Joo, Jong Wha J; Sul, Jae Hoon; Han, Buhm; Ye, Chun; Eskin, Eleazar. Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Journal Article Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914.
Tags: Confounding, eQTL Confounding, Expression QTLs
@article{Joo:GenomeBiol:2014, title = {Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies.}, author = {Jong Wha J. Joo and Jae Hoon Sul and Buhm Han and Chun Ye and Eleazar Eskin}, url = {http://dx.doi.org/10.1186/gb-2014-15-4-r61}, issn = {1465-6914}, year = {2014}, date = {2014-01-01}, journal = {Genome Biol}, volume = {15}, number = {4}, pages = {R61}, keywords = {Confounding, eQTL Confounding, Expression QTLs}, pubstate = {published}, tppubtype = {article} }
Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applying it to simulated and real datasets, we validate that our method achieves greater sensitivity than previous methods while retaining low false discovery rates.
Wang, Zhanyong; Sul, Jae Hoon; Snir, Sagi; Lozano, Jose A; Eskin, Eleazar. Gene-Gene Interactions Detection Using a Two-Stage Model. Book Chapter Research in Computational Molecular Biology, pp. 340-355, Springer International Publishing, 2014.
Tags: Gene-Gene Interactions
@inbook{Wang:ResearchInComputationalMolecularBiology:2014, title = {Gene-Gene Interactions Detection Using a Two-Stage Model}, author = {Zhanyong Wang and Jae Hoon Sul and Sagi Snir and Jose A. Lozano and Eleazar Eskin}, url = {http://dx.doi.org/10.1007/978-3-319-05269-4_28}, year = {2014}, date = {2014-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {340-355}, publisher = {Springer International Publishing}, organization = {University of California Los Angeles}, keywords = {Gene-Gene Interactions}, pubstate = {published}, tppubtype = {inbook} }
Genome-wide association studies (GWAS) have discovered numerous loci involved in genetic traits. Virtually all studies have reported associations between individual single nucleotide polymorphisms (SNPs) and traits. However, it is likely that complex traits are influenced by interactions of multiple SNPs. One approach to detecting interactions of SNPs is the brute force approach, which performs a pairwise association test between a trait and each pair of SNPs. The brute force approach is often computationally infeasible because of the large number of SNPs collected in current GWAS. We propose a two-stage model, Threshold-based Efficient Pairwise Association Approach (TEPAA), to reduce the number of tests needed while maintaining almost identical power to the brute force approach. In the first stage, our method performs the single marker test on all SNPs and selects a subset of SNPs that achieve a certain significance threshold. In the second stage, we perform a pairwise association test between traits and pairs of the SNPs selected in the first stage. The key insight of our approach is that we derive the joint distribution between the association statistics of a single SNP and the association statistics of pairs of SNPs. This joint distribution allows us to provide guarantees that the statistical power of our approach will closely approximate that of the brute force approach. We applied our approach to the Northern Finland Birth Cohort data and achieved a 63-fold speedup while maintaining 99% of the power of the brute force approach.
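The two-stage procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the TEPAA implementation: every test uses a correlation-based statistic z = sqrt(n) * r, the stage-2 pairwise test is replaced by a test of the product (interaction) term of the two genotypes, and both thresholds are supplied by the caller rather than derived from the joint distribution of single-SNP and pairwise statistics as in the paper. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def z_stat(x: Sequence[float], y: Sequence[float]) -> float:
    """Association statistic z = sqrt(n) * Pearson r (about N(0, 1) under the null)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant trait: no information
    return math.sqrt(n) * sxy / math.sqrt(sxx * syy)


def two_stage_pairs(genotypes: List[List[float]], trait: List[float],
                    t1: float, t2: float
                    ) -> Tuple[List[int], Dict[Tuple[int, int], float]]:
    """genotypes[j][i] is the genotype of individual i at SNP j."""
    # Stage 1: single-marker test on every SNP; keep those with |z| >= t1.
    kept = [j for j, g in enumerate(genotypes) if abs(z_stat(g, trait)) >= t1]
    # Stage 2: pairwise tests only among retained SNPs, so the number of tests
    # drops from m(m-1)/2 to k(k-1)/2 for k retained SNPs.
    hits = {}
    for i, j in combinations(kept, 2):
        product = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        z = z_stat(product, trait)
        if abs(z) >= t2:
            hits[(i, j)] = z
    return kept, hits


genotypes = [[0, 0, 1, 1, 0, 0, 1, 1],
             [0, 1, 0, 1, 0, 1, 0, 1],
             [1, 1, 0, 0, 0, 0, 1, 1]]
trait = [0, 1, 1, 4, 0, 1, 1, 4]  # SNP 0 + SNP 1 + 2 * SNP 0 * SNP 1
kept, hits = two_stage_pairs(genotypes, trait, t1=1.5, t2=2.0)
print(kept, hits)
```

In this toy example SNP 2 is uncorrelated with the trait and is dropped in stage 1, so only the pair (0, 1) is tested in stage 2.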
Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Guan, Feng; Ostrovsky, Rafail; Sahai, Amit; Eskin, Eleazar. Privacy preserving protocol for detecting genetic relatives using rare variants. Journal Article Bioinformatics, 30 (12), pp. i204-i211, 2014, ISSN: 1367-4811.
Tags: Genomic Privacy
@article{Hormozdiari:Bioinformatics:2014, title = {Privacy preserving protocol for detecting genetic relatives using rare variants.}, author = {Farhad Hormozdiari and Jong Wha J. Joo and Akshay Wadia and Feng Guan and Rafail Ostrovsky and Amit Sahai and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btu294}, issn = {1367-4811}, year = {2014}, date = {2014-01-01}, journal = {Bioinformatics}, volume = {30}, number = {12}, pages = {i204-i211}, address = {England}, keywords = {Genomic Privacy}, pubstate = {published}, tppubtype = {article} }
MOTIVATION: High-throughput sequencing technologies have impacted many areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is the requirement of sharing your genetic data with a trusted third party to perform the relatedness test. RESULTS: In this work, we propose a secure protocol to detect genetic relatives from sequencing data without exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants, which provides the ability to detect much more distant relationships securely. We use simulated data generated from the 1000 Genomes data and illustrate that we can easily detect up to fifth-degree cousins, which was not possible using the existing methods. We also show in the 1000 Genomes data with cryptic relationships that our method can detect these individuals. Availability: The software is freely available for download at http://genetics.cs.ucla.edu/crypto/. Contact: fhormoz@cs.ucla.edu or eeskin@cs.ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar. Identifying causal variants at Loci with multiple signals of association. Journal Article Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631.
Tags: Fine Mapping
@article{Hormozdiari:Genetics:2014, title = {Identifying causal variants at Loci with multiple signals of association.}, author = {Farhad Hormozdiari and Emrah Kostem and Eun Yong Kang and Bogdan Pasaniuc and Eleazar Eskin}, url = {http://dx.doi.org/10.1534/genetics.114.167908}, issn = {1943-2631}, year = {2014}, date = {2014-01-01}, journal = {Genetics}, volume = {198}, number = {2}, pages = {497-508}, address = {United States}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/
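The central idea above (posteriors over causal configurations that may contain more than one variant, then a set that contains all causal variants with probability at least rho) can be sketched in pure Python for a handful of SNPs. This is an illustration in the spirit of the model, not the CAVIAR software. It assumes that, given a causal configuration c, the z-scores follow N(0, Sigma + sigma2 * Sigma D_c Sigma), where Sigma is the LD matrix and D_c marks the causal SNPs; an independent per-SNP prior `gamma`; an illustrative `sigma2`; a cap `max_causal` on the number of causal SNPs; and a greedy search for the rho-level set. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Config = Tuple[int, ...]


def log_mvn_zero_mean(z: Sequence[float], cov: List[List[float]]) -> float:
    """log N(z; 0, cov) via Cholesky factorization (cov must be positive definite)."""
    m = len(z)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y: List[float] = []  # forward substitution: L y = z, so z' cov^-1 z = |y|^2
    for i in range(m):
        y.append((z[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    return -0.5 * (m * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))


def config_posteriors(z: Sequence[float], ld: List[List[float]], max_causal: int = 2,
                      gamma: float = 0.01, sigma2: float = 25.0) -> Dict[Config, float]:
    m = len(z)
    configs: List[Config] = [()]
    for k in range(1, max_causal + 1):
        configs.extend(combinations(range(m), k))
    log_w = []
    for c in configs:
        # Sigma + sigma2 * Sigma D_c Sigma, with D_c = 1 on causal SNPs, 0 elsewhere
        cov = [[ld[i][j] + sigma2 * sum(ld[i][k] * ld[k][j] for k in c)
                for j in range(m)] for i in range(m)]
        log_prior = len(c) * math.log(gamma) + (m - len(c)) * math.log(1.0 - gamma)
        log_w.append(log_mvn_zero_mean(z, cov) + log_prior)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return {c: v / total for c, v in zip(configs, w)}


def rho_level_set(post: Dict[Config, float], m: int, rho: float = 0.95) -> List[int]:
    """Greedily grow a SNP set until P(all causal SNPs are in the set) >= rho."""
    def coverage(s: set) -> float:
        return sum(p for c, p in post.items() if set(c) <= s)

    chosen: set = set()
    while coverage(chosen) < rho and len(chosen) < m:
        best = max((i for i in range(m) if i not in chosen),
                   key=lambda i: coverage(chosen | {i}))
        chosen.add(best)
    return sorted(chosen)


def marginal_posteriors(post: Dict[Config, float], m: int) -> List[float]:
    """P(SNP i is causal) = total posterior of configurations containing i."""
    return [sum(p for c, p in post.items() if i in c) for i in range(m)]


ld = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
z = [5.0, 4.5, 0.5]
post = config_posteriors(z, ld)
print(marginal_posteriors(post, 3), rho_level_set(post, 3))
```

Exhaustive enumeration grows combinatorially with the number of SNPs, which is why the sketch caps `max_causal`; SNP 2 here (weak signal, no LD with the others) should receive little posterior mass.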
Kichaev, Gleb; Yang, Wen-Yun Y; Lindstrom, Sara; Hormozdiari, Farhad; Eskin, Eleazar; Price, Alkes L; Kraft, Peter; Pasaniuc, Bogdan. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. Journal Article PLoS Genet, 10 (10), pp. e1004722, 2014, ISSN: 1553-7404.
Tags: Fine Mapping
@article{Kichaev:PlosGenet:2014b, title = {Integrating functional data to prioritize causal variants in statistical fine-mapping studies.}, author = {Gleb Kichaev and Wen-Yun Y. Yang and Sara Lindstrom and Farhad Hormozdiari and Eleazar Eskin and Alkes L. Price and Peter Kraft and Bogdan Pasaniuc}, url = {http://dx.doi.org/10.1371/journal.pgen.1004722}, issn = {1553-7404}, year = {2014}, date = {2014-01-01}, journal = {PLoS Genet}, volume = {10}, number = {10}, pages = {e1004722}, address = {United States}, keywords = {Fine Mapping}, pubstate = {published}, tppubtype = {article} }
Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulation data. We validate our findings using a large-scale meta-analysis of four blood lipid traits and find that the relative probability for causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in these data.
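To make the idea of annotation-informed priors concrete, here is a deliberately simplified Python sketch. Unlike the framework described above, it assumes a single causal variant per locus, uses Wakefield's approximate Bayes factor computed from z-scores with an assumed variance ratio `r`, and takes the annotation weights as given instead of estimating them from summary statistics across loci. All names and parameter values are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def annotation_prior(annotations: Sequence[Sequence[int]],
                     weights: Sequence[float], intercept: float) -> List[float]:
    """Per-SNP prior probability of causality from binary annotations (logistic link)."""
    return [1.0 / (1.0 + math.exp(-(intercept + sum(w * a for w, a in zip(weights, row)))))
            for row in annotations]


def log_abf(z: float, r: float) -> float:
    """Log Wakefield approximate Bayes factor (association vs. no association).

    r = W / (V + W): prior effect variance over total variance.
    """
    return 0.5 * math.log(1.0 - r) + r * z * z / 2.0


def credible_set(z_scores: Sequence[float], prior: Sequence[float],
                 r: float = 0.1, level: float = 0.9) -> Tuple[List[int], List[float]]:
    """Smallest set of SNPs reaching `level` posterior mass (one causal SNP assumed)."""
    logs = [log_abf(z, r) + math.log(p) for z, p in zip(z_scores, prior)]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    post = [v / total for v in w]
    chosen: List[int] = []
    mass = 0.0
    for i in sorted(range(len(post)), key=lambda k: post[k], reverse=True):
        chosen.append(i)
        mass += post[i]
        if mass >= level:
            break
    return chosen, post


# Two SNPs with identical association signal; only SNP 0 is in an exon.
z = [4.0, 4.0, 0.5]
prior = annotation_prior([[1], [0], [0]], weights=[2.0], intercept=-3.0)
chosen, post = credible_set(z, prior)
print(chosen, [round(p, 3) for p in post])
```

With a flat prior the two equally associated SNPs would be indistinguishable; the annotation weight is what breaks the tie in favor of the exonic SNP.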
Zhang, Kuixing; Huentelman, Matthew J; Rao, Fangwen; Sun, Eric I; Corneveaux, Jason J; Schork, Andrew J; Wei, Zhiyun; Waalen, Jill; Miramontes-Gonzalez, Jose Pablo; Hightower, Makena C; Maihofer, Adam X; Mahata, Manjula; Pastinen, Tomi; Ehret, Georg B; Schork, Nicholas J; Eskin, Eleazar; Nievergelt, Caroline M; Saier, Milton H; O'Connor, Daniel T Genetic implication of a novel thiamine transporter in human hypertension. Journal Article J Am Coll Cardiol, 2014, ISSN: 1558-3597. Abstract | Links | BibTeX | Tags: genome-wide pooling, trait extremes @article{Zhang:JAmCollCardiol:2014, title = {Genetic implication of a novel thiamine transporter in human hypertension.}, author = { Kuixing Zhang and Matthew J. Huentelman and Fangwen Rao and Eric I. Sun and Jason J. Corneveaux and Andrew J. Schork and Zhiyun Wei and Jill Waalen and Jose Pablo Miramontes-Gonzalez and C. Makena Hightower and Adam X. Maihofer and Manjula Mahata and Tomi Pastinen and Georg B. Ehret and Nicholas J. Schork and Eleazar Eskin and Caroline M. Nievergelt and Milton H. Saier and Daniel T. O'Connor}, url = {http://dx.doi.org/10.1016/j.jacc.2014.01.007}, issn = {1558-3597}, year = {2014}, date = {2014-01-01}, journal = {J Am Coll Cardiol}, abstract = {OBJECTIVES: We coupled two strategies - trait extremes and genome-wide pooling - to discover a novel BP locus that encodes a previously uncharacterized thiamine transporter. BACKGROUND: Hypertension is a heritable trait that remains the most potent and widespread cardiovascular risk factor, though details of its genetic determination are poorly understood. Methods. Representative genomic DNA pools were created from male and female subjects in the highest and lowest 5(th) %iles of BP in a primary care population of >50,000 individuals. The peak associated SNPs were typed in individual DNA samples, as well as twins/siblings phenotyped for cardiovascular and autonomic traits. 
Biochemical properties of the associated transporter were evaluated in cellular assays. RESULTS: After chip hybridization and calculation of relative allele scores, the peak associations were typed in individual samples, revealing association of hypertension, SBP, and DBP with the previously uncharacterized solute carrier SLC35F3. The BP genetic association at SLC35F3 was validated by meta-analysis in an independent sample from the original source population, as well as in the ICBP (across North America and Western Europe). Sequence homology to a putative yeast thiamine (vitamin B1) transporter prompted us to express human SLC35F3 in E. coli, which catalyzed [3H]thiamine uptake. SLC35F3 risk allele (T/T) homozygotes displayed decreased erythrocyte thiamine content on microbiological assay. In twin pairs, the SLC35F3 risk allele predicted heritable cardiovascular traits previously associated with thiamine deficiency, including elevated cardiac stroke volume with decreased vascular resistance, and elevated pressor responses to environmental (cold) stress. Allelic expression imbalance (AEI) confirmed that cis-variation at the human SLC35F3 locus influenced expression of that gene, and the AEI peak coincided with the hypertension peak. CONCLUSIONS: Novel strategies were coupled to position a new hypertension susceptibility locus, uncovering a previously unsuspected thiamine transporter whose genetic variants predicted several disturbances in cardiac and autonomic function. The results have implications for the pathogenesis and treatment of systemic hypertension.}, keywords = {genome-wide pooling, trait extremes}, pubstate = {published}, tppubtype = {article} }
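The abstract above ranks SNPs by relative allele scores computed from pooled DNA hybridized to genotyping arrays. A minimal sketch, assuming the common convention RAS = A/(A+B) for the two allele-specific probe intensities and ignoring the replicate averaging and probe-bias corrections a real pooling pipeline would apply; the function names and intensities are hypothetical.

```python
def relative_allele_score(intensity_a, intensity_b):
    """RAS = A / (A + B): fraction of array signal from allele A, used as a
    proxy for that allele's frequency in a DNA pool."""
    total = intensity_a + intensity_b
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return intensity_a / total


def rank_pool_differences(high_pool, low_pool):
    """Rank SNPs by |RAS(high BP pool) - RAS(low BP pool)|.

    Each pool maps a SNP id to an (allele A, allele B) intensity pair;
    only SNPs measured in both pools are compared.
    """
    diffs = {
        snp: relative_allele_score(*high_pool[snp]) - relative_allele_score(*low_pool[snp])
        for snp in high_pool.keys() & low_pool.keys()
    }
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical intensities for two SNPs in the two extreme pools.
high = {"snp1": (600.0, 400.0), "snp2": (500.0, 500.0)}
low = {"snp1": (400.0, 600.0), "snp2": (510.0, 490.0)}
print(rank_pool_differences(high, low))
```

SNPs at the top of this ranking would be the candidates for individual typing, as in the study design described above.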
Yang, Wen-Yun Y; Platt, Alexander; Chiang, Charleston Wen-Kai; Eskin, Eleazar; Novembre, John; Pasaniuc, Bogdan. Spatial localization of recent ancestors for admixed individuals. Journal Article, G3 (Bethesda), 4 (12), pp. 2505-18, 2014, ISSN: 2160-1836. Tags: Spatial Population Structure
@article{Yang:G3:2014, title = {Spatial localization of recent ancestors for admixed individuals.}, author = { Wen-Yun Y. Yang and Alexander Platt and Charleston Wen-Kai Chiang and Eleazar Eskin and John Novembre and Bogdan Pasaniuc}, url = {http://dx.doi.org/10.1534/g3.114.014274}, issn = {2160-1836}, year = {2014}, date = {2014-01-01}, journal = {G3 (Bethesda)}, volume = {4}, number = {12}, pages = {2505-18}, address = {United States}, abstract = {Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such explicit models yield superior accuracy in ancestry inference over non-model-based methods. Here we extend such work to introduce a method that models admixture between ancestors from multiple sources across a geographic continuum. We devise efficient algorithms based on hidden Markov models to localize on a map the recent ancestors (e.g., grandparents) of admixed individuals, jointly with assigning ancestry at each locus in the genome. We validate our methods using empirical data from individuals with mixed European ancestry from the Population Reference Sample study and show that our approach is able to localize their recent ancestors within an average of 470 km of the reported locations of their grandparents.
Furthermore, simulations from real Population Reference Sample genotype data show that our method attains high accuracy in localizing recent ancestors of admixed individuals in Europe (an average of 550 km from their true location for localization of two ancestries in Europe, four generations ago). We explore the limits of ancestry localization under our approach and find that performance decreases as the number of distinct ancestries and the number of generations since admixture increase. Finally, we build a map of expected localization accuracy across admixed individuals according to the location of origin within Europe of their ancestors.}, keywords = {Spatial Population Structure}, pubstate = {published}, tppubtype = {article} }
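The abstract above builds on hidden Markov models that assign ancestry along the genome. The paper's spatial model, which places ancestors on a continuous map, is not reproduced here. As a minimal sketch of the HMM building block only, the forward-backward algorithm below computes, for one haplotype, the posterior probability that each site descends from source A rather than source B, assuming two discrete source populations with known allele frequencies and a hypothetical constant per-site switch probability. The function name and inputs are hypothetical.

```python
def local_ancestry_posteriors(alleles, freqs_a, freqs_b, switch=0.01, prior_a=0.5):
    """Posterior P(site t descends from source A | whole haplotype).

    alleles[t] is 0 or 1; freqs_a[t] and freqs_b[t] are the frequencies of
    allele 1 in sources A and B; `switch` is the per-site probability of an
    ancestry change between adjacent sites.
    """
    n = len(alleles)
    trans = [[1.0 - switch, switch], [switch, 1.0 - switch]]

    def emit(t, state):
        f = freqs_a[t] if state == 0 else freqs_b[t]
        return f if alleles[t] == 1 else 1.0 - f

    # Forward pass, rescaled at every site to avoid numerical underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            cur = [prior_a * emit(0, 0), (1.0 - prior_a) * emit(0, 1)]
        else:
            cur = [emit(t, s) * sum(fwd[t - 1][r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        z = cur[0] + cur[1]
        fwd.append([c / z for c in cur])

    # Backward pass, also rescaled; the scale factors cancel in the posterior.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        back = [sum(trans[s][r] * emit(t + 1, r) * bwd[t + 1][r] for r in (0, 1))
                for s in (0, 1)]
        z = back[0] + back[1]
        bwd[t] = [x / z for x in back]

    posteriors = []
    for t in range(n):
        pa = fwd[t][0] * bwd[t][0]
        pb = fwd[t][1] * bwd[t][1]
        posteriors.append(pa / (pa + pb))
    return posteriors


# Hypothetical haplotype: allele 1 (common in A), then allele 0 (common in B).
post = local_ancestry_posteriors([1] * 5 + [0] * 5, [0.9] * 10, [0.1] * 10)
print([round(p, 2) for p in post])
```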
Hormozdiari, Farhad; Eskin, Eleazar. Memory efficient assembly of human genome. Journal Article, J Bioinform Comput Biol, pp. 1550008, 2014, ISSN: 1757-6334. Tags: Sequence Assembly
@article{Hormozdiari:JBioinformComputBiol:2014, title = {Memory efficient assembly of human genome.}, author = {Farhad Hormozdiari and Eleazar Eskin}, url = {http://dx.doi.org/10.1142/S0219720015500080}, issn = {1757-6334}, year = {2014}, date = {2014-01-01}, journal = {J Bioinform Comput Biol}, pages = {1550008}, abstract = {The ability to detect genetic variation between two individuals is an essential component of genetic studies. In these studies, obtaining the genome sequence of both individuals is the first step toward solving the variation detection problem. The emergence of high-throughput sequencing (HTS) technology has made DNA sequencing practical, and HTS is widely used by diagnosticians to learn more about the causal factors in genetic diseases. As HTS advances, more data are generated every day than scientists can process. Genome assembly is one of the existing methods for tackling the variation detection problem. The de Bruijn graph formulation of the assembly problem is widely used in the field. Furthermore, it is the only method which can assemble any genome in linear time. However, it requires an enormous amount of memory to assemble any mammalian-size genome. The high demand for sequencing more individuals and the need to assemble them are the driving forces behind a memory-efficient assembler. In this work, we propose a novel method which builds the de Bruijn graph while consuming less memory. Our proposed method can reduce memory usage by 37% compared to existing methods.
In addition, we used a real data set (chromosome 17 of the A/J strain) to illustrate the performance of our method.}, keywords = {Sequence Assembly}, pubstate = {published}, tppubtype = {article} }
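The abstract above relies on the de Bruijn graph formulation of assembly. For reference, a minimal sketch of the textbook construction follows. It stores every (k-1)-mer node and every edge explicitly, which is the kind of memory cost the paper sets out to reduce, and it does not implement the paper's memory-saving technique, error correction, or reverse-complement handling. The function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph: every k-mer in every read adds an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix. Repeated k-mers add
    repeated edges, and every node string is stored explicitly."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGTC", "CGTCA"], 3))
# prints {'AC': ['CG'], 'CG': ['GT', 'GT'], 'GT': ['TC', 'TC'], 'TC': ['CA']}
```

An assembler would then look for an Eulerian path through this graph; the edge multiplicities recorded here are what a path must respect.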
2013
Sul, Jae Hoon; Han, Buhm; Ye, Chun; Choi, Ted; Eskin, Eleazar. Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches. Journal Article, PLoS Genet, 9 (6), pp. e1003491, 2013, ISSN: 1553-7404. Tags: Expression QTLs, Meta-Analysis, Mixed Models, Multiple Phenotypes
@article{10.1371/journal.pgen.1003491, title = {Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches}, author = { Jae Hoon Sul and Buhm Han and Chun Ye and Ted Choi and Eleazar Eskin}, url = {http://dx.doi.org/10.1371%2Fjournal.pgen.1003491}, issn = {1553-7404}, year = {2013}, date = {2013-01-01}, journal = {PLoS Genet}, volume = {9}, number = {6}, pages = {e1003491}, publisher = {Public Library of Science}, address = {United States}, abstract = {Author Summary: The combination of gene expression and genetic variation data has enabled the identification of genetic variants that affect gene expression levels. It has been shown that some variants influence gene expression in only one tissue while others influence gene expression in multiple tissues. However, analyzing multiple-tissue data with traditional statistical methods typically fails to identify the variants that affect multiple tissues, because each tissue is treated independently and, owing to low statistical power, the effect in a given tissue may be missed. Building on recent advances in statistical methods for meta-analysis and mixed models, we present a novel method that combines information from multiple tissues to identify genetic variation that affects multiple tissues.
We show that our method detects more genetic variation that influences multiple tissues than traditional statistical methods, on both simulated and real data.}, keywords = {Expression QTLs, Meta-Analysis, Mixed Models, Multiple Phenotypes}, pubstate = {published}, tppubtype = {article} }
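Combining per-tissue evidence can be illustrated with the simplest meta-analytic model. A minimal sketch of a fixed-effects, inverse-variance weighted meta-analysis of one SNP's per-tissue eQTL effects follows. It is not the paper's method: it ignores the correlation between tissues sampled from the same individuals (which the paper addresses with a mixed model) and assumes the effect is the same in every tissue. The function name and numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, std_errors):
    """Inverse-variance weighted (fixed-effects) meta-analysis.

    betas[t] and std_errors[t] are one SNP's effect estimate and standard
    error in tissue t. Returns (pooled beta, pooled standard error, z-score).
    Tissues are treated as independent, which is only an approximation when
    the same individuals contribute to several tissues.
    """
    if not betas or len(betas) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Hypothetical eQTL effects of one SNP in three tissues.
print(fixed_effects_meta([0.30, 0.25, 0.10], [0.12, 0.10, 0.15]))
```

Pooling shrinks the standard error below that of any single tissue, which is the source of the power gain for variants that act in several tissues.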
Yang, Wen-Yun Y; Hormozdiari, Farhad; Wang, Zhanyong; He, Dan; Pasaniuc, Bogdan; Eskin, Eleazar. Leveraging Multi-SNP Reads from Sequencing Data for Haplotype Inference. Journal Article, Bioinformatics, 2013, ISSN: 1367-4811. Tags: Haplotype Phasing, Haplotyping from Sequences, Imputation
@article{Yang:Bioinformatics:2013, title = {Leveraging Multi-SNP Reads from Sequencing Data for Haplotype Inference.}, author = { Wen-Yun Y. Yang and Farhad Hormozdiari and Zhanyong Wang and Dan He and Bogdan Pasaniuc and Eleazar Eskin}, url = {http://dx.doi.org/10.1093/bioinformatics/btt386}, issn = {1367-4811}, year = {2013}, date = {2013-01-01}, journal = {Bioinformatics}, organization = {Department of Computer Science, University of California, Los Angeles.}, abstract = {MOTIVATION: Haplotypes, defined as the sequence of alleles on one chromosome, are crucial for many genetic analyses. Because experimental determination of haplotypes is extremely expensive, haplotypes are traditionally inferred computationally from genotype data, i.e. the mixture of the genetic information from both haplotypes. The best-performing approaches for haplotype inference rely on hidden Markov models (HMMs), with the underlying assumption that the haplotypes of a given individual can be represented as a mosaic of segments from other haplotypes in the same population. Such algorithms use this model to predict the most likely haplotypes that explain the observed genotype data, conditional on a reference panel of haplotypes. With rapid advances in short-read sequencing technologies, sequencing is quickly establishing itself as a powerful approach for collecting genetic variation information.
As opposed to traditional genotyping-array technologies that independently call genotypes at polymorphic sites, short-read sequencing often collects haplotypic information; a read spanning more than one polymorphic locus (multi-SNP read) contains information on the haplotype from which the read originates. However, this information is generally ignored in existing approaches for haplotype phasing and genotype calling from short-read data. RESULTS: In this paper, we propose a novel framework for haplotype inference from short-read sequencing that leverages multi-SNP reads together with a reference panel of haplotypes. The basis of our approach is a new probabilistic model that finds the most likely haplotype segments from the reference panel to explain the short-read sequencing data for a given individual. We devised an efficient sampling method within this probabilistic model that outperforms existing methods. Using simulated sequencing reads from real individual genotypes in the HapMap data and the 1000 Genomes Project, we show that our method is highly accurate and computationally efficient. Our haplotype predictions improve accuracy over the basic haplotype copying model by around 20% with comparable computational time, and over another recently proposed approach, Hap-SeqX, by around 10% with significantly reduced computational time and memory usage. AVAILABILITY: Software is publicly available at http://genetics.cs.ucla.edu/harsh CONTACT: bpasaniuc@mednet.ucla.edu; eeskin@cs.ucla.edu}, keywords = {Haplotype Phasing, Haplotyping from Sequences, Imputation}, pubstate = {published}, tppubtype = {article} }
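The key observation in the abstract above is that a read spanning two or more heterozygous sites carries phase information. A minimal sketch of that idea, without the reference panel, HMM, or sampling scheme the paper uses, scores two candidate phasings of the same sites against multi-SNP reads under an assumed uniform per-base error rate. The function name, error rate, and reads are hypothetical.

```python
import math


def phasing_log_likelihood(reads, hap1, hap2, error=0.01):
    """Log-likelihood of multi-SNP reads under one candidate haplotype pair.

    Each read is a dict {site index: observed allele (0/1)}. A read is assumed
    to come from hap1 or hap2 with probability 1/2 each, and each observed
    allele is miscalled independently with probability `error`.
    """
    if not 0 < error < 1:
        raise ValueError("error must lie strictly between 0 and 1")
    total = 0.0
    for read in reads:
        likelihood = 0.0
        for hap in (hap1, hap2):
            p = 1.0
            for site, allele in read.items():
                p *= (1.0 - error) if hap[site] == allele else error
            likelihood += 0.5 * p
        total += math.log(likelihood)
    return total


# Two heterozygous SNPs; reads spanning both show the alleles travel together.
reads = [{0: 0, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 0}]
cis = phasing_log_likelihood(reads, (0, 0), (1, 1))
trans = phasing_log_likelihood(reads, (0, 1), (1, 0))
print(cis > trans)  # prints True: the reads support the cis phasing
```

A read covering a single site contributes the same likelihood under both phasings, which is why only multi-SNP reads can separate them.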
Lagarrigue, Sandrine; Martin, Lisa J; Hormozdiari, Farhad; Roux, Pierre-François F; Pan, Calvin; van Nas, Atila; Demeure, Olivier; Cantor, Rita; Ghazalpour, Anatole; Eskin, Eleazar; Lusis, Aldons J. Analysis of Allele Specific Expression in Mouse Liver by RNA-Seq: A Comparison with Journal Article, Genetics, 2013, ISSN: 1943-2631. Tags: Allele Specific Expression
@article{Lagarrigue:Genetics:2013b, title = {Analysis of Allele Specific Expression in Mouse Liver by RNA-Seq: A Comparison with }, author = {Sandrine Lagarrigue and Lisa J. Martin and Farhad Hormozdiari and Pierre-François F. Roux and Calvin Pan and Atila van Nas and Olivier Demeure and Rita Cantor and Anatole Ghazalpour and Eleazar Eskin and Aldons J. Lusis}, url = {http://dx.doi.org/10.1534/genetics.113.153882}, issn = {1943-2631}, year = {2013}, date = {2013-01-01}, journal = {Genetics}, organization = {INRA;}, abstract = {We report an analysis of allele-specific expression (ASE) and parent-of-origin expression in adult mouse liver using next-generation sequencing (RNA-Seq) of reciprocal crosses of heterozygous F1 mice from the parental strains C57BL/6J and DBA/2J. We found a 60% overlap between genes exhibiting ASE and putative cis-acting expression quantitative trait loci (cis-eQTL) identified in an intercross between the same strains. We discuss the various biological and technical factors that contribute to the differences. We also identify genes exhibiting parental imprinting and complex expression patterns. Our study demonstrates the importance of biological replicates for limiting the number of false positives with RNA-Seq data.}, keywords = {Allele Specific Expression}, pubstate = {published}, tppubtype = {article} }
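Allele-specific expression in an F1 is often tested gene by gene by counting RNA-Seq reads that carry each parental allele and comparing the split with 50:50. A minimal sketch of an exact two-sided binomial test for that comparison follows, computed in log space to avoid overflow at high read depth. The function name and counts are hypothetical, and the paper's exact statistical procedure may differ. A single-sample test like this ignores extra variation between biological replicates, which is consistent with the abstract's point that replicates are needed to limit false positives.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of every outcome that
    is no more likely than observing k allele-specific reads out of n."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")

    def log_pmf(i):
        # log of C(n, i) * p^i * (1 - p)^(n - i), via lgamma to avoid overflow
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    observed = log_pmf(k)
    # The small tolerance keeps outcomes tied with k despite rounding error.
    total = sum(math.exp(log_pmf(i)) for i in range(n + 1)
                if log_pmf(i) <= observed + 1e-7)
    return min(1.0, total)


# Hypothetical gene: 70 of 100 reads carry the C57BL/6J allele.
print(binomial_two_sided_p(70, 100))
```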
Eskin, Itamar; Hormozdiari, Farhad; Conde, Lucia; Riby, Jacques; Skibola, Chris; Eskin, Eleazar; Halperin, Eran. eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data. Journal Article, J Comput Biol, 2013, ISSN: 1557-8666. Tags: Sequencing with Pools
@article{Eskin:JComputBiol:2013, title = {eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data.}, author = {Itamar Eskin and Farhad Hormozdiari and Lucia Conde and Jacques Riby and Chris Skibola and Eleazar Eskin and Eran Halperin}, url = {http://dx.doi.org/10.1089/cmb.2013.0105}, issn = {1557-8666}, year = {2013}, date = {2013-01-01}, journal = {J Comput Biol}, organization = {The Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.}, abstract = {Recent advances in high-throughput sequencing technologies bring the potential for a better characterization of genetic variation in humans and other organisms. On many occasions, either by design or by necessity, the sequencing procedure is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario occurs naturally in metagenomic analysis, where a pool of bacteria is sequenced, and in population studies that use DNA pools by design. In particular, various pooling designs were recently suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects. A fundamental problem with such approaches for population studies is that uncertainty in the DNA proportions of different individuals in the pools might lead to spurious associations. Fortunately, the genotype data of at least some of the individuals in the pool are often known.
Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data to accurately estimate the proportions of the samples in the pool, even when not all individuals in the pool were genotyped (eALPS-LD). Using real data from a pooled sequencing study of non-Hodgkin's lymphoma, we demonstrate that estimating the proportions is crucial, since otherwise there is a risk of false discoveries. Additionally, we demonstrate that our approach is also applicable to quantifying species in metagenomic samples (eALPS-BCR) and is particularly suitable for metagenomic quantification of closely related species.}, keywords = {Sequencing with Pools}, pubstate = {published}, tppubtype = {article} }
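The abstract above does not spell out how eALPS estimates the pool proportions. As a generic sketch only, assuming every individual in the pool is genotyped and that the pooled alternate-allele fraction at each SNP is a proportion-weighted average of the individuals' allele dosages, the proportions can be fitted by least squares constrained to the probability simplex. The function names and the projected-gradient solver are illustrative choices, not the eALPS, eALPS-LD, or eALPS-BCR algorithms.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / i
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_pool_proportions(genotypes, pooled_freqs, iterations=2000):
    """Constrained least-squares estimate of each individual's share of a pool.

    genotypes[i][j]: alternate-allele count (0, 1 or 2) of individual i at SNP j.
    pooled_freqs[j]: alternate-allele fraction observed in the pooled reads.
    Assumed model: pooled_freqs[j] ~= sum_i w[i] * genotypes[i][j] / 2, with w
    non-negative and summing to 1; solved by projected gradient descent.
    """
    n, m = len(genotypes), len(pooled_freqs)
    x = [[g / 2.0 for g in row] for row in genotypes]
    # 2 * ||X||_F^2 bounds the Lipschitz constant of the squared-error gradient.
    lipschitz = 2.0 * sum(v * v for row in x for v in row)
    if lipschitz == 0:
        raise ValueError("genotypes carry no alternate alleles")
    step = 1.0 / lipschitz
    w = [1.0 / n] * n
    for _ in range(iterations):
        residual = [sum(w[i] * x[i][j] for i in range(n)) - pooled_freqs[j] for j in range(m)]
        grad = [2.0 * sum(residual[j] * x[i][j] for j in range(m)) for i in range(n)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w


# Hypothetical pool of three genotyped individuals mixed 50:30:20.
genotypes = [[2, 0, 0, 1, 2, 0], [0, 2, 0, 1, 0, 2], [0, 0, 2, 0, 1, 1]]
true_w = [0.5, 0.3, 0.2]
freqs = [sum(w * g[j] / 2 for w, g in zip(true_w, genotypes)) for j in range(6)]
print([round(w, 3) for w in estimate_pool_proportions(genotypes, freqs)])  # prints [0.5, 0.3, 0.2]
```

With noisy read counts the fit would no longer be exact, but the simplex constraint keeps every estimate a valid set of proportions.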
Marsden, Clare Diana; Lee, Yoosook; Kreppel, Katharina; Weakley, Allison; Cornel, Anthony; Ferguson, Heather M; Eskin, Eleazar; Lanzaro, Gregory C. Diversity, Differentiation and Linkage Disequilibrium: Prospects for Association Mapping in the Malaria Vector, Anopheles arabiensis. Journal Article, G3 (Bethesda), 2013, ISSN: 2160-1836. Tags: association mapping, population structure
@article{Marsden:G3:2013, title = {Diversity, Differentiation and Linkage Disequilibrium: Prospects for Association Mapping in the Malaria Vector, Anopheles arabiensis.}, author = { Clare Diana Marsden and Yoosook Lee and Katharina Kreppel and Allison Weakley and Anthony Cornel and Heather M. Ferguson and Eleazar Eskin and Gregory C. Lanzaro}, url = {http://dx.doi.org/10.1534/g3.113.008326}, issn = {2160-1836}, year = {2013}, date = {2013-01-01}, journal = {G3 (Bethesda)}, organization = {University of California, Davis.}, abstract = {Association mapping is a widely applied method for elucidating the genetic basis of phenotypic traits. However, factors such as linkage disequilibrium and levels of genetic diversity influence the power and resolution of this approach. Moreover, population subdivision among samples can produce spurious associations if not accounted for. It is therefore useful to have a detailed understanding of these factors before conducting association mapping experiments. Here we conducted whole-genome sequencing on 24 specimens of the malaria mosquito vector, Anopheles arabiensis, to further our understanding of patterns of genetic diversity, population subdivision and linkage disequilibrium in this species. We found high levels of genetic diversity within the An. arabiensis genome, with ~800,000 high-confidence single nucleotide polymorphisms detected. However, levels of nucleotide diversity varied significantly both within and between chromosomes.
We observed lower diversity on the X chromosome, within some inversions, and near centromeres. Population structure was absent at the local scale (Kilombero Valley, Tanzania) but detected between distant populations (Cameroon vs. Tanzania), where differentiation was largely restricted to certain autosomal chromosomal inversions such as 2Rb. Overall, linkage disequilibrium within An. arabiensis decayed very rapidly (within 200 bp) across all chromosomes. However, elevated linkage disequilibrium was observed within some inversions, suggesting that recombination is reduced in those regions. The overall low levels of linkage disequilibrium suggest that association studies in this taxon will be very challenging for all but variants of large effect and will require large sample sizes.}, keywords = {association mapping, population structure}, pubstate = {published}, tppubtype = {article} }
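Statements such as "linkage disequilibrium decayed within 200 bp" summarize r^2 between SNP pairs as a function of their distance. A minimal sketch, assuming phased haplotypes coded 0/1 (the study sequenced diploid specimens, so in practice a genotype-based LD estimator may be needed). The function names, positions, and haplotypes are hypothetical.

```python
def ld_r2(haplotypes, i, j):
    """r^2 between biallelic sites i and j from phased haplotypes coded 0/1:
    D = p_ab - p_a * p_b and r^2 = D^2 / (p_a (1 - p_a) p_b (1 - p_b))."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a site is monomorphic, so r^2 is undefined")
    d = p_ab - p_a * p_b
    return d * d / denom


def mean_r2_by_distance(haplotypes, positions, bin_size):
    """Average r^2 over all SNP pairs, grouped into distance bins of bin_size bp."""
    bins = {}
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            try:
                r2 = ld_r2(haplotypes, a, b)
            except ValueError:
                continue  # skip pairs involving a monomorphic site
            key = abs(positions[b] - positions[a]) // bin_size * bin_size
            bins.setdefault(key, []).append(r2)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}


# Hypothetical phased haplotypes at three SNPs (positions in bp).
haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(mean_r2_by_distance(haps, [0, 50, 300], bin_size=100))
# prints {0: 1.0, 200: 0.0, 300: 0.0}
```

Plotting the binned means against distance gives the LD decay curve; the distance at which it reaches background levels is the decay scale quoted in the abstract.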
Eskin, Itamar; Hormozdiari, Farhad; Conde, Lucia; Riby, Jacques; Skibola, Chris; Eskin, Eleazar; Halperin, Eran. eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data. Conference, Research in Computational Molecular Biology, Tel-Aviv University, Springer Berlin Heidelberg, 2013. Tags: Sequencing with Pools
@conference{Eskin:ResearchInComputationalMolecularBiology:2013, title = {eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data}, author = { Itamar Eskin and Farhad Hormozdiari and Lucia Conde and Jacques Riby and Chris Skibola and Eleazar Eskin and Eran Halperin}, url = {http://dx.doi.org/10.1007/978-3-642-37195-0_4}, year = {2013}, date = {2013-01-01}, booktitle = {Research in Computational Molecular Biology}, pages = {32-44}, publisher = {Springer Berlin Heidelberg}, organization = {Tel-Aviv University}, abstract = {Recent advances in high-throughput sequencing technologies bring the potential for a better characterization of genetic variation in humans and other organisms. On many occasions, either by design or by necessity, the sequencing procedure is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario occurs naturally in metagenomic analysis, where a pool of bacteria is sequenced, and in population studies that use DNA pools by design. In particular, various pooling designs were recently suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects. A fundamental problem with such approaches for population studies is that uncertainty in the DNA proportions of different individuals in the pools might lead to spurious associations. Fortunately, the genotype data of at least some of the individuals in the pool are often known.
Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data in order to accurately estimate the proportions of the samples in the pool, even in cases where not all individuals in the pool were genotyped (eALPS-LD). Using real data from a sequencing pooling study of Non-Hodgkins Lymphoma, we demonstrate that the estimation of the proportions is crucial, since otherwise there is a risk for false discoveries. Additionally, we demonstrate that our approach is also applicable to the problem of quantification of species in metagenomics samples (eALPS-BCR), and is particularly suitable for metagenomic quantification of closely-related species.}, keywords = {Sequencing with Pools}, pubstate = {published}, tppubtype = {conference} } The recent advances in high-throughput sequencing technologies bring the potential of a better characterization of the genetic variation in humans and other organisms. In many occasions, either by design or by necessity, the sequencing procedure is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario is naturally occurring in the case of metagenomics analysis where a pool of bacteria is sequenced, or in the case of population studies involving DNA pools by design. Particularly, various pooling designs were recently suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects. A fundamental problem with such approaches for population studies is that the uncertainly of DNA proportions from different individuals in the pools might lead to spurious associations. Fortunately, it is often the case that the genotype data of at least some of the individuals in the pool is known. 
Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data in order to accurately estimate the proportions of the samples in the pool, even in cases where not all individuals in the pool were genotyped (eALPS-LD). Using real data from a sequencing pooling study of Non-Hodgkins Lymphoma, we demonstrate that the estimation of the proportions is crucial, since otherwise there is a risk for false discoveries. Additionally, we demonstrate that our approach is also applicable to the problem of quantification of species in metagenomics samples (eALPS-BCR), and is particularly suitable for metagenomic quantification of closely-related species. |
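The core idea in the abstract, recovering each individual's share of a pool from known genotypes plus pooled reads, can be illustrated with a deliberately simplified model. This is not the eALPS model itself, whose likelihood and LD-based extension are not described here. As an assumption, the sketch treats the pooled alternate-allele read fraction at SNP j as approximately the sum over individuals i of w_i · g_ij / 2, with genotypes g_ij in {0, 1, 2}, and fits the proportions w by least squares constrained to be non-negative and sum to one. It further assumes every pooled individual is genotyped and ignores sequencing error and read-count noise. The names `project_to_simplex` and `estimate_pool_proportions` are hypothetical.

```python
# Illustrative sketch only: simplex-constrained least squares, NOT the published eALPS model.
# Assumes all pooled individuals are genotyped and ignores sequencing error / read-count noise.


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t  # keeps the threshold for the largest qualifying k
    return [max(x - theta, 0.0) for x in v]


def estimate_pool_proportions(genotypes, alt_fractions, n_iter=2000):
    """Estimate mixing proportions w of pooled individuals (hypothetical helper).

    genotypes[i][j]: alternate-allele count (0, 1 or 2) of individual i at SNP j.
    alt_fractions[j]: fraction of pooled reads carrying the alternate allele at SNP j.
    Model (assumed): alt_fractions[j] ~= sum_i w[i] * genotypes[i][j] / 2.
    """
    n = len(genotypes)
    m = len(alt_fractions)
    # X[j][i]: alternate-allele dosage of individual i at SNP j, scaled to [0, 1].
    X = [[genotypes[i][j] / 2.0 for i in range(n)] for j in range(m)]
    # Step 1/||X||_F^2 is conservative: ||X||_F^2 >= largest eigenvalue of X^T X.
    frob = sum(x * x for row in X for x in row)
    step = 1.0 / frob if frob > 0 else 1.0
    w = [1.0 / n] * n  # start from equal proportions
    for _ in range(n_iter):
        pred = [sum(X[j][i] * w[i] for i in range(n)) for j in range(m)]
        resid = [pred[j] - alt_fractions[j] for j in range(m)]
        # Gradient of 0.5 * ||X w - f||^2 is X^T (X w - f).
        grad = [sum(X[j][i] * resid[j] for j in range(m)) for i in range(n)]
        # Gradient step, then project back onto the probability simplex.
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w


if __name__ == "__main__":
    # Toy data: 3 genotyped individuals, 6 SNPs, true proportions 0.5 / 0.3 / 0.2.
    genotypes = [
        [2, 0, 1, 0, 2, 1],
        [0, 2, 1, 1, 0, 2],
        [1, 1, 0, 2, 0, 0],
    ]
    true_w = [0.5, 0.3, 0.2]
    fractions = [sum(true_w[i] * genotypes[i][j] / 2.0 for i in range(3)) for j in range(6)]
    w = estimate_pool_proportions(genotypes, fractions)
    print([round(x, 3) for x in w])  # approximately [0.5, 0.3, 0.2]
```

Projected gradient descent keeps the weights non-negative and summing to one after every step. With more SNPs than individuals and linearly independent genotype columns, the constrained least-squares solution is unique, so noise-free toy fractions are recovered almost exactly. Real pooled data would call for a noise model on read counts, which this sketch omits.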