Multiple testing correction in linear mixed models

Our group recently published a new paper on multiple testing applied to genetic studies with population structure. This project was led by Jong Wha (Joanne) Joo and also involved Farhad Hormozdiari. The project was a joint effort with Buhm Han’s group and built upon Buhm Han’s previous work, SLIDE (Han et al. 2009; Han and Eskin 2012).
 
Genome-wide association studies (GWAS) have discovered many variants in the human genome that are associated with complex traits. In a GWAS, researchers collect phenotypic information from a population together with genetic information on variants spread throughout the genome. To identify the set of variants associated with a trait of interest, we assess the correlation between the phenotype and the genetic information at each variant, which we call the genotype. GWAS are now routinely performed on tens of thousands of individuals and millions of genetic variants.
 
GWAS methodology must address specific problems tied to this exceptionally large scale of analysis. One major challenge is multiple hypothesis testing. In routine analyses, significance is assessed by comparing each marker's p value with a per-marker threshold. However, a single GWAS computes up to millions of statistical tests. If the threshold used for a single test is applied unchanged, many markers will pass it by chance alone, producing false positives or spurious associations, so the p value threshold for significance must be adjusted to control the overall false positive rate.
Several approaches can make this correction, including Bonferroni correction and the permutation test.
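The two corrections named above can be contrasted in a short Python sketch (NumPy only). Everything here is an illustrative assumption rather than code from the paper: a simple correlation-based z test per marker (no LMM yet), two-sided normal p values, and hypothetical function names. Bonferroni divides alpha by the number of markers; the permutation test shuffles the phenotype, records the smallest p value across all markers in each shuffle, and takes the alpha quantile of those minima as the per-marker threshold, which automatically accounts for correlation between markers.

```python
import math

import numpy as np


def assoc_z(genotypes: np.ndarray, phenotype: np.ndarray) -> np.ndarray:
    """Per-marker z statistic: sqrt(n) times the Pearson correlation between
    each genotype column and the phenotype (approximately N(0, 1) under the null)."""
    n = genotypes.shape[0]
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = (phenotype - phenotype.mean()) / phenotype.std()
    return g.T @ y / math.sqrt(n)


def two_sided_p(z: float) -> float:
    """Two-sided normal p value for a single z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(n_markers: int, alpha: float = 0.05) -> float:
    """Per-marker threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_markers


def permutation_threshold(genotypes, phenotype, alpha=0.05, n_perm=1000, seed=0):
    """Per-marker threshold: the alpha quantile of the minimum p value
    observed across all markers in each permuted data set."""
    rng = np.random.default_rng(seed)
    min_p = np.empty(n_perm)
    for b in range(n_perm):
        z = assoc_z(genotypes, rng.permutation(phenotype))
        min_p[b] = two_sided_p(float(np.abs(z).max()))
    return float(np.quantile(min_p, alpha))
```

If the markers are, say, five copies each of 10 distinct markers, the permutation threshold should come out roughly where 10 independent tests would put it, noticeably less strict than Bonferroni's alpha/50. Plain permutation also assumes individuals are exchangeable, which is exactly the assumption population structure breaks.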
 
Recently, the linear mixed model (LMM) has become standard practice for performing GWAS. The LMM addresses two important challenges in GWAS: population structure and insufficient power. Population structure refers to the complex relatedness among the individuals in a study, which, if ignored, can produce false positives. By explicitly modeling the genetic relationships among individuals, LMM approaches can avoid these false positives and, in many cases, increase statistical power. However, an LMM analysis still tests millions of markers, so it needs multiple testing correction as well. Unfortunately, existing multiple testing correction approaches cannot be applied directly to LMM because, as we show in the paper, population structure changes the correlation structure of the test statistics.
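The last point can be made concrete with a short derivation. The simplifying assumptions are ours, not the paper's: a centered phenotype, no covariates besides the tested marker, a kinship matrix K, and variance components sigma_g^2 and sigma_e^2 treated as known. It is a textbook generalized least squares argument, not a reproduction of the paper's derivation.

```latex
% Assumed LMM for marker i with n individuals and kinship matrix K
y = x_i \beta_i + g + e, \qquad g \sim \mathcal{N}(0, \sigma_g^2 K), \qquad e \sim \mathcal{N}(0, \sigma_e^2 I)

% Under the null hypothesis \beta_i = 0
y \sim \mathcal{N}(0, V), \qquad V = \sigma_g^2 K + \sigma_e^2 I

% Generalized least squares statistic for marker i (unit variance under the null)
z_i = \frac{x_i^\top V^{-1} y}{\sqrt{x_i^\top V^{-1} x_i}}, \qquad \operatorname{Var}(z_i) = 1

% Null covariance (= correlation) between the statistics of markers i and j
\operatorname{Cov}(z_i, z_j)
  = \frac{x_i^\top V^{-1} V V^{-1} x_j}{\sqrt{(x_i^\top V^{-1} x_i)(x_j^\top V^{-1} x_j)}}
  = \frac{(V^{-1/2} x_i)^\top (V^{-1/2} x_j)}{\lVert V^{-1/2} x_i \rVert \, \lVert V^{-1/2} x_j \rVert}
```

With no population structure (V proportional to I) this reduces to the ordinary correlation between centered genotypes, i.e. linkage disequilibrium, which is what standard corrections rely on. Under the LMM it is instead the correlation of the transformed genotypes V^(-1/2)x, which depends on K and on the ratio sigma_g^2 / sigma_e^2.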
 
To address this issue, we developed the first gold-standard approach for multiple hypothesis testing correction in LMM. This method, called multiple testing in transformed space (MultiTrans), efficiently corrects for multiple testing in LMM approaches. MultiTrans is based on parametric bootstrapping, a resampling approach that serves as the LMM equivalent of the permutation test. Specifically, the bootstrap samples randomized null phenotypes from the distribution fitted by the LMM.
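A naive parametric bootstrap of this kind can be sketched in Python. This is our own illustration, not the MultiTrans code: we assume the kinship matrix K and the variance components sigma_g2 and sigma_e2 have already been estimated, ignore covariates, use the generalized least squares statistic z_i = x_i' V^-1 y / sqrt(x_i' V^-1 x_i), and give the functions hypothetical names. Each replicate draws a null phenotype from N(0, V) and rescans every marker, so the cost grows with replicates × markers × sample size.

```python
import math

import numpy as np


def gls_z(X: np.ndarray, y: np.ndarray, V_inv: np.ndarray) -> np.ndarray:
    """GLS statistics z_i = x_i' V^-1 y / sqrt(x_i' V^-1 x_i) for every column of X."""
    num = X.T @ (V_inv @ y)
    den = np.sqrt(np.einsum("ij,ij->j", X, V_inv @ X))
    return num / den


def parametric_bootstrap_threshold(X, K, sigma_g2, sigma_e2,
                                   alpha=0.05, n_boot=500, seed=0):
    """Naive parametric bootstrap: simulate null phenotypes from the fitted
    LMM covariance, rescan all markers, and take the alpha quantile of the
    per-replicate minimum p value."""
    n = X.shape[0]
    V = sigma_g2 * K + sigma_e2 * np.eye(n)   # null covariance of y
    V_inv = np.linalg.inv(V)
    L = np.linalg.cholesky(V)                  # V = L L', so L @ N(0, I) ~ N(0, V)
    rng = np.random.default_rng(seed)
    min_p = np.empty(n_boot)
    for b in range(n_boot):
        y_null = L @ rng.standard_normal(n)    # one randomized null phenotype
        z_max = float(np.abs(gls_z(X, y_null, V_inv)).max())
        min_p[b] = math.erfc(z_max / math.sqrt(2.0))  # smallest two-sided p
    return float(np.quantile(min_p, alpha))
```

Every replicate repeats a full genome scan; with millions of markers and the many replicates needed to resolve small thresholds, that inner loop is what becomes prohibitive.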
 
Straightforward parametric bootstrapping, in which phenotypes are sampled and every marker is retested, is prohibitively expensive computationally. MultiTrans instead uses a multivariate normal distribution to sample the association statistics directly. The figure below shows an overview of our methodology.
[Figure: overview of the MultiTrans methodology]
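The idea of sampling statistics directly can be sketched in Python under the same simplifying assumptions as a naive bootstrap (known K, sigma_g2 and sigma_e2; no covariates; hypothetical function names). Under the null, the vector of GLS statistics is multivariate normal with covariance equal to the correlation of the transformed genotypes V^(-1/2)x. One can therefore build that matrix once and draw statistic vectors from it without generating any phenotypes. This sketch factors the full marker-by-marker matrix, which only scales to small marker sets; it does not reproduce the windowed sampling (such as SLIDE's sliding-window approach) needed at genome scale.

```python
import math

import numpy as np


def transformed_correlation(X, K, sigma_g2, sigma_e2):
    """Null correlation matrix of the GLS statistics: cosine similarity
    between the transformed genotype columns V^(-1/2) x_i."""
    n = X.shape[0]
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    w, U = np.linalg.eigh(V)                   # V is symmetric positive definite
    V_inv_half = (U * w ** -0.5) @ U.T         # U diag(w^-1/2) U'
    Xt = V_inv_half @ X                        # genotypes in "transformed space"
    Xt = Xt / np.linalg.norm(Xt, axis=0)       # unit-length columns
    return Xt.T @ Xt


def mvn_threshold(R, alpha=0.05, n_samples=20000, seed=0):
    """Draw statistic vectors z ~ N(0, R) directly and return the per-marker
    p-value threshold controlling the family-wise error rate at alpha."""
    m = R.shape[0]
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(R + 1e-10 * np.eye(m))   # tiny jitter for stability
    Z = rng.standard_normal((n_samples, m)) @ L.T   # each row ~ N(0, R)
    z_max = np.abs(Z).max(axis=1)
    z_crit = float(np.quantile(z_max, 1.0 - alpha)) # (1 - alpha) quantile of max |z|
    return math.erfc(z_crit / math.sqrt(2.0))       # convert to a two-sided p value
```

The expensive step, building R, happens once; each additional sample then costs only a matrix-vector product instead of a genome scan, which is where the speedup over phenotype resampling comes from.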
The full citation to our paper is:

Joo, Jong Wha J; Hormozdiari, Farhad; Han, Buhm; Eskin, Eleazar

Multiple testing correction in linear mixed models.

In: Genome Biol, 17 (1), pp. 62, 2016, ISSN: 1474-760X.

Multiple hypothesis testing is an essential step in GWAS analysis. The correct per-marker threshold varies with species, marker density, genetic relatedness, and trait heritability, and no previous multiple testing correction method could comprehensively account for all of these factors. The method we developed to address this issue, MultiTrans, is an efficient and accurate multiple testing correction approach for LMM. Our method (a) transforms the genotype data to account for the actual genetic relatedness and heritability under the LMM, and (b) efficiently samples the association statistics from a multivariate normal distribution. Using MultiTrans, we accurately estimated per-marker thresholds in mouse, yeast, and human datasets while reducing computation time from months to hours.

Thesis Defense: Dr. Jong Wha (Joanne) Joo

Jong Wha (Joanne) Joo successfully defended her thesis, “Design of efficient and accurate statistical approaches to correct for confounding effects in genetic association studies,” on Friday, December 4, 2015 in Boelter 4760. Her talk, which is posted on our YouTube channel ZarlabUCLA, covers three methods: GAMMA, a mixed model approach that efficiently analyzes large numbers of phenotypes while simultaneously accounting for population structure; an expression quantitative trait loci (eQTL) mapping tool that eliminates spurious hotspots while retaining genuine regulatory hotspots; and slideLMM, a multiple testing correction method for linear mixed models.
More details about her research are available in the three papers she discusses:

Joo, Jong Wha J; Hormozdiari, Farhad; Han, Buhm; Eskin, Eleazar

Multiple testing correction in linear mixed models.

In: Genome Biol, 17 (1), pp. 62, 2016, ISSN: 1474-760X.

Joo, Jong Wha J; Kang, Eun Yong; Org, Elin; Furlotte, Nick; Parks, Brian; Lusis, Aldons J; Eskin, Eleazar

Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure.

In: Research in Computational Molecular Biology, pp. 136-153, Springer International Publishing, 2015.

Joo, Jong Wha J; Sul, Jae Hoon; Han, Buhm; Ye, Chun; Eskin, Eleazar

Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies.

In: Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914.

Genetic and Environmental Control of Host-Gut Microbiota Interactions

Studies carried out over the last decade have revealed that gut microbiota contribute to a variety of common disorders, including obesity and diabetes (Musso et al. 2011), colitis (Devkota et al. 2012), atherosclerosis (Wang et al. 2011), rheumatoid arthritis (Vaahtovuo et al. 2008), and cancer (Yoshimoto et al. 2013). The evidence for metabolic interactions is particularly strong, as a large body of data now supports the conclusion that gut microbiota influence the energy harvest from dietary components, particularly complex carbohydrates, and that metabolites such as the short-chain fatty acids produced by gut bacteria can perturb metabolic traits, including adiposity and insulin resistance (Turnbaugh et al. 2006; Backhed et al. 2007; Wen et al. 2008; Turnbaugh et al. 2009; Ridaura et al. 2013).

Gut microbiota communities are assembled anew in each generation and are shaped by maternal seeding, environmental factors, host genetics, and age, resulting in substantial variation in composition among individuals in human populations (Eckburg et al. 2005; Costello et al. 2009; Huttenhower and Consortium 2012; Goodrich et al. 2014). Most experimental studies of host-gut microbiota interactions have employed large perturbations, such as comparisons of germ-free versus conventional mice, and the significance of common variation in gut microbiota composition for disease susceptibility is still poorly understood. Furthermore, while studies with germ-free mice have clearly implicated microbiota in clinically relevant traits, it has proven difficult to identify the responsible bacterial taxa.

We now report a population-based analysis of host-gut microbiota interactions in the mouse. One of the issues we explore is the role of host genetics. Although some evidence is consistent with significant heritability of gut microbiota composition, the extent to which the host controls microbiota composition under controlled environmental conditions is unclear. We also examine the role of common variation in gut microbiota in metabolic traits such as obesity and insulin resistance. We performed our study using a resource termed the Hybrid Mouse Diversity Panel (HMDP), consisting of about 100 inbred strains of mice that have been either sequenced or subjected to high-density genotyping (Bennett et al. 2010). The resource has several advantages for genetic analysis compared with traditional genetic crosses. First, it allows high-resolution mapping by association rather than linkage analysis, and it has now been used to identify a number of novel genes underlying complex traits (Farber et al. 2011; Lavinsky et al. 2015; Parks et al. 2015; Rau et al. 2015). Second, since the strains are permanent, data from separate studies can be integrated, allowing the development of large, publicly available databases of physiological and molecular traits relevant to a variety of clinical disorders (systems.genetics.ucla.edu and phenome.jax.org). Third, the panel is ideal for examining gene-by-environment interactions, since individuals of a particular genotype can be examined under a variety of conditions (Orozco et al. 2012; Parks et al. 2013).

Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that, in a controlled environment, genetic background accounts for a significant fraction of the abundance of most common microbiota. The mice had previously been studied for their response to a high-fat, high-sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which pups of a strain showing a modest response, SWR, were seeded with microbiota from a strain showing a strong response, AxB19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly greater weight gain in response to the diet. To examine specific microbiota contributing to the response, we identified genera whose abundance correlated with dietary response. To further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.

In our study, we concluded:

– Across a total of 599 mice, the same 17 genera were abundant in 75% of the animals

– These 17 genera accounted for 68% of reads

– Consistent with previous studies, changing diet drastically changes gut microbiota composition, and these shifts are strongly dependent on the genetic background of the mice

– Gut microbiota contribute to dietary responsiveness

– Several gut microbiota (known and novel to this study) contribute to obesity and metabolic phenotypes

– Seven genome-wide significant loci (P < 4 × 10^-6) were associated with common genera

– We estimated heritability with a linear mixed model, assuming additive effects, as the proportion of phenotypic variance accounted for by genetic relationships among the strains
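The heritability estimate in the last point can be sketched as a maximum-likelihood grid search in Python. The assumptions are all ours: the model y = mu·1 + g + e with Var(y) = sigma²(h²K + (1 − h²)I), K built from standardized SNP genotypes, plain ML rather than REML, and a hypothetical function name. The study's actual estimator and software may differ. Rotating by the eigenvectors of K makes the covariance diagonal, so each grid point costs only O(n) after a single eigendecomposition.

```python
import math

import numpy as np


def estimate_h2(y: np.ndarray, K: np.ndarray, grid=None) -> float:
    """ML estimate of h2 = sigma_g^2 / (sigma_g^2 + sigma_e^2) by grid search,
    for y = mu*1 + g + e with Var(y) = sigma^2 * (h2*K + (1 - h2)*I)."""
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)
    n = y.shape[0]
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)           # clip tiny negative eigenvalues from rounding
    y_r = U.T @ y                       # rotated phenotype
    one_r = U.T @ np.ones(n)            # rotated intercept column
    best_h2, best_ll = 0.0, -math.inf
    for h2 in grid:
        d = h2 * s + (1.0 - h2)         # eigenvalues of h2*K + (1 - h2)*I
        mu = np.sum(one_r * y_r / d) / np.sum(one_r ** 2 / d)   # GLS intercept
        r = y_r - mu * one_r
        sigma2 = np.sum(r ** 2 / d) / n                          # ML total variance
        ll = -0.5 * (n * math.log(2 * math.pi * sigma2) + np.sum(np.log(d)) + n)
        if ll > best_ll:
            best_h2, best_ll = float(h2), ll
    return best_h2
```

In the HMDP, mice from the same inbred strain share a genotype and therefore have identical rows in K; replicate mice within a strain are what let the model separate genetic from residual variance.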

We began our study with the hypothesis that the dietary response was dictated in part by differences in gut microbiota. We showed that different inbred strains of mice differ strikingly in the composition of gut microbiota and provided evidence that the variation is determined in part by the host genetic background. Consistent with our hypothesis, we showed that cross-fostering between two strains of mice affected dietary response to the high fat, high sucrose diet. By correlating microbiota composition with dietary response among the HMDP inbred strains, we were able to identify several candidate microbiota influencing dietary response.
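The screen described above, correlating each genus with dietary response across strains, can be sketched in Python. The choice of Spearman rank correlation, the per-strain input vectors, and the genus names in any example are our assumptions for illustration; the paper's exact statistic and any multiple testing step are not reproduced here. Tied values get average ranks, since many genera have zero abundance in some strains.

```python
import math

import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    return float(rx @ ry / math.sqrt((rx @ rx) * (ry @ ry)))


def rank_genera(abundance: dict, response: np.ndarray) -> list:
    """Return (genus, rho) pairs sorted by |rho|, strongest first.
    abundance maps a genus name to its per-strain abundance vector."""
    scores = [(name, spearman(values, response)) for name, values in abundance.items()]
    return sorted(scores, key=lambda item: -abs(item[1]))
```

A genus whose abundance is constant across strains has no ranks to correlate and yields NaN here, so such genera would need to be filtered out first.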

For all the details of our research and our methods, read our paper:

Org, Elin; Parks, Brian W W; Joo, Jong Wha J; Emert, Benjamin; Schwartzman, William; Kang, Eun Yong; Mehrabian, Margarete; Pan, Calvin; Knight, Rob; Gunsalus, Robert; Drake, Thomas A; Eskin, Eleazar; Lusis, Aldons J

Genetic and environmental control of host-gut microbiota interactions.

In: Genome Res, 2015, ISSN: 1549-5469.