
UCLA Computational Genetics
Joo, Jong Wha J; Hormozdiari, Farhad; Han, Buhm; Eskin, Eleazar Multiple testing correction in linear mixed models. Journal Article In: Genome Biol, 17 (1), pp. 62, 2016, ISSN: 1474-760X.
@article{Joo:GenomeBiol:2016,
  title = {Multiple testing correction in linear mixed models.},
  author = {Jong Wha J. Joo and Farhad Hormozdiari and Buhm Han and Eleazar Eskin},
  url = {http://dx.doi.org/10.1186/s13059-016-0903-6},
  issn = {1474-760X},
  year = {2016},
  date = {2016-01-01},
  journal = {Genome Biol},
  volume = {17},
  number = {1},
  pages = {62},
  address = {England},
  abstract = {BACKGROUND: Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM. RESULTS: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach. CONCLUSIONS: We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Joo, Jong Wha J; Kang, Eun Yong; Org, Elin; Furlotte, Nick; Parks, Brian; Lusis, Aldons J; Eskin, Eleazar Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure. In: Research in Computational Molecular Biology, pp. 136-153, Springer International Publishing, 2015.
@inbook{Joo:ResearchInComputationalMolecularBiology:2015b,
  title = {Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure},
  author = {Jong Wha J. Joo and Eun Yong Kang and Elin Org and Nick Furlotte and Brian Parks and Aldons J. Lusis and Eleazar Eskin},
  url = {http://dx.doi.org/10.1007/978-3-319-16706-0_15},
  year = {2015},
  date = {2015-01-01},
  booktitle = {Research in Computational Molecular Biology},
  pages = {136--153},
  publisher = {Springer International Publishing},
  organization = {University of California},
  abstract = {A typical GWAS tests correlation between a single phenotype and each genotype one at a time. However, it is often very useful to analyze many phenotypes simultaneously. For example, this may increase the power to detect variants by capturing unmeasured aspects of complex biological networks that a single phenotype might miss. There are several multivariate approaches that try to detect variants related to many phenotypes, but none of them consider population structure and each may result in a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA, that could both simultaneously analyze many phenotypes as well as correct for population structure. In a simulated study, GAMMA accurately identifies true genetic effects without false positive identifications, while other methods either fail to detect true effects or result in many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mouse and show that GAMMA identifies several variants that are likely to have a true biological mechanism.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inbook}
}
Joo, Jong Wha J; Sul, Jae Hoon; Han, Buhm; Ye, Chun; Eskin, Eleazar Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Journal Article In: Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914.
@article{Joo:GenomeBiol:2014,
  title = {Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies.},
  author = {Jong Wha J. Joo and Jae Hoon Sul and Buhm Han and Chun Ye and Eleazar Eskin},
  url = {http://dx.doi.org/10.1186/gb-2014-15-4-r61},
  issn = {1465-6914},
  year = {2014},
  date = {2014-01-01},
  journal = {Genome Biol},
  volume = {15},
  number = {4},
  pages = {R61},
  abstract = {Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Studies carried out over the last decade have revealed that gut microbiota contribute to a variety of common disorders, including obesity and diabetes (Musso et al. 2011), colitis (Devkota et al. 2012), atherosclerosis (Wang et al. 2011), rheumatoid arthritis (Vaahtovuo et al. 2008), and cancer (Yoshimoto et al. 2013). The evidence for metabolic interactions is particularly strong, as a large body of data now supports the conclusion that gut microbiota influence the energy harvest from dietary components, particularly complex carbohydrates, and that metabolites such as the short chain fatty acids produced by gut bacteria can perturb metabolic traits, including adiposity and insulin resistance (Turnbaugh et al. 2006; Backhed et al. 2007; Wen et al. 2008; Turnbaugh et al. 2009; Ridaura et al. 2013).
Gut microbiota communities are assembled by generation, influenced by maternal seeding, environmental factors, host genetics and age, resulting in substantial variations in composition among individuals in human populations (Eckburg et al. 2005; Costello et al. 2009; Huttenhower and Consortium 2012; Goodrich et al. 2014). Most experimental studies of host-gut microbiota interactions have employed large perturbations, such as comparisons of germ-free versus conventional mice, and the significance of common variations in gut microbiota composition for disease susceptibility is still poorly understood. Furthermore, while studies with germ-free mice have clearly implicated microbiota in clinically relevant traits, it has proven difficult to identify the responsible taxa of bacteria.
We now report a population-based analysis of host-gut microbiota interactions in the mouse. One of the issues we explore is the role of host genetics. Although some evidence is consistent with significant heritability of gut microbiota composition, the extent to which the host controls microbiota composition under controlled environmental conditions is unclear. We also examine the role of common variations in gut microbiota in metabolic traits such as obesity and insulin resistance. We performed our study using a resource termed the Hybrid Mouse Diversity Panel (HMDP), consisting of about 100 inbred strains of mice that have been either sequenced or subjected to high density genotyping (Bennett et al. 2010). The resource has several advantages for genetic analysis as compared to traditional genetic crosses. First, it allows high resolution mapping by association rather than linkage analysis, and it has now been used for the identification of a number of novel genes underlying complex traits (Farber et al. 2011; Lavinsky et al. 2015; Parks et al. 2015; Rau et al. 2015). Second, since the strains are permanent, the data from separate studies can be integrated, allowing the development of large, publicly available databases of physiological and molecular traits relevant to a variety of clinical disorders (systems.genetics.ucla.edu and phenome.jax.org). Third, the panel is ideal for examining gene-by-environment interactions, since it is possible to examine individuals of a particular genotype under a variety of conditions (Orozco et al. 2012; Parks et al. 2013).
Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that in a controlled environment the genetic background accounts for a significant fraction of abundance of most common microbiota. The mice were previously studied for response to a high fat, high sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, AxB19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.
In our study, we concluded:
– Across a total of 599 mice, the same 17 genera were abundant in 75% of the animals
– These 17 genera accounted for 68% of reads
– Consistent with previous studies, changing diet drastically changes gut microbiota composition, and these shifts are strongly dependent on the genetic background of the mice
– Gut microbiota contribute to dietary responsiveness
– Several gut microbiota (known and novel to this study) contribute to obesity and metabolic phenotypes
– Seven genome-wide significant loci (P < 4 × 10^-6) were found to be associated with common genera
– We were able to estimate the heritability using a linear mixed model approach, assuming an additive effect, based on the proportion of phenotype variance accounted for by genetic relationships among the strains.
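To make the last point concrete, the idea of heritability as the share of phenotype variance explained by genetic relationships can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the paper: it assumes the model y ~ N(mu, s2 * (h2 * K + (1 - h2) * I)), fits h2 by a simple maximum-likelihood grid search, and uses a toy kinship matrix K in which mice of the same strain are fully related and mice of different strains are unrelated. The function name `estimate_heritability` and all simulated values are hypothetical.

```python
import numpy as np

def estimate_heritability(y, K, grid=None):
    """Grid-search ML estimate of h2 in y ~ N(mu, s2 * (h2*K + (1-h2)*I)).

    Illustrative sketch only; K is an assumed kinship matrix.
    """
    if grid is None:
        # Stop short of 1.0 so the variances below stay strictly positive.
        grid = np.linspace(0.0, 0.99, 100)
    n = len(y)
    # Rotating by the eigenvectors of K makes the covariance diagonal.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    y_rot = eigvecs.T @ y
    ones_rot = eigvecs.T @ np.ones(n)
    best_h2, best_ll = 0.0, -np.inf
    for h2 in grid:
        d = h2 * eigvals + (1.0 - h2)  # per-component variance scale
        # Generalized least squares estimate of the mean.
        mu = np.sum(ones_rot * y_rot / d) / np.sum(ones_rot**2 / d)
        resid = y_rot - ones_rot * mu
        s2 = np.mean(resid**2 / d)  # ML estimate of the total variance
        ll = -0.5 * (n * np.log(2 * np.pi * s2) + np.sum(np.log(d)) + n)
        if ll > best_ll:
            best_h2, best_ll = h2, ll
    return float(best_h2)

# Toy data (assumed): 50 strains with 4 mice each.
rng = np.random.default_rng(0)
n_strains, n_reps = 50, 4
strain = np.repeat(np.arange(n_strains), n_reps)
K = (strain[:, None] == strain[None, :]).astype(float)
# Strong strain effect (sd 1) and small noise (sd 0.1), so the true h2 is about 0.99.
strain_effect = rng.normal(0.0, 1.0, n_strains)
y = strain_effect[strain] + rng.normal(0.0, 0.1, n_strains * n_reps)

h2_hat = estimate_heritability(y, K)
print(f"estimated heritability: {h2_hat:.2f}")
```

In a real analysis K would be computed from genome-wide SNP genotypes, and dedicated mixed-model software would typically replace the grid search.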
We began our study with the hypothesis that the dietary response was dictated in part by differences in gut microbiota. We showed that different inbred strains of mice differ strikingly in the composition of gut microbiota and provided evidence that the variation is determined in part by the host genetic background. Consistent with our hypothesis, we showed that cross-fostering between two strains of mice affected dietary response to the high fat, high sucrose diet. By correlating microbiota composition with dietary response among the HMDP inbred strains, we were able to identify several candidate microbiota influencing dietary response.
For all the details of our research and our methods, read our paper:
Org, Elin; Parks, Brian W W; Joo, Jong Wha J; Emert, Benjamin; Schwartzman, William; Kang, Eun Yong; Mehrabian, Margarete; Pan, Calvin; Knight, Rob; Gunsalus, Robert; Drake, Thomas A; Eskin, Eleazar; Lusis, Aldons J Genetic and environmental control of host-gut microbiota interactions. Journal Article In: Genome Res, 2015, ISSN: 1549-5469.
@article{Org:GenomeRes:2015b,
  title = {Genetic and environmental control of host-gut microbiota interactions.},
  author = {Elin Org and Brian W. W. Parks and Jong Wha J. Joo and Benjamin Emert and William Schwartzman and Eun Yong Kang and Margarete Mehrabian and Calvin Pan and Rob Knight and Robert Gunsalus and Thomas A. Drake and Eleazar Eskin and Aldons J. Lusis},
  url = {http://dx.doi.org/10.1101/gr.194118.115},
  issn = {1549-5469},
  year = {2015},
  date = {2015-01-01},
  journal = {Genome Res},
  abstract = {Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Toward this end, we profiled gut microbiota using 16S rRNA gene sequencing in a panel of 110 diverse inbred strains of mice. This panel has previously been studied for a wide range of metabolic traits and can be used for high resolution association mapping. Using a SNP-based approach with a linear mixed model we estimated the heritability of microbiota composition. We conclude that in a controlled environment the genetic background accounts for a substantial fraction of abundance of most common microbiota. The mice were previously studied for response to a high fat, high sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, AxB19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. Among these, we chose Akkermansia muciniphila, a common anaerobe previously associated with metabolic effects. When administered to strain AxB19 by gavage, the dietary response was significantly blunted for obesity, plasma lipids, and insulin resistance. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}