Identification of causal genes for complex traits (CAVIAR-gene)

Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider ‘causal variants’ as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations.

In our recently published work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that can operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it outputs a minimally sized set of genes that captures the genes harboring causal variants with probability q. Through extensive simulations, we demonstrate that our method not only speeds up computation but also has, on average, a 10% higher recall rate than existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2.

In the context of association studies, the genetic variants which are responsible for the association signal at a locus are referred to in the genetics literature as the ‘causal variants.’ Causal variants have a biological effect on the phenotype.

CAVIAR-Gene provides better ranking of the causal genes for Outbred, F2, and HMDP datasets. Panels a and b illustrate the results for Outbred genotypes with one and two causal genes, respectively. Panels c and d illustrate the results for F2 genotypes with one and two causal genes, respectively. Panels e and f illustrate the results for HMDP genotypes with one and two causal genes, respectively.

Generally, variants can be categorized into three main groups. The first group is the causal variants which have a biological effect on the phenotype and are responsible for the association signal. The second group is the variants which are statistically associated with the phenotype due to LD with a causal variant. Even though association tests for these variants may be statistically significant, under our definition, they are not causal variants. The third group is the variants which are not statistically associated with the phenotype and are not causal.

CAVIAR-Gene is a statistical method for fine mapping that addresses two main limitations of existing methods. First, as opposed to existing approaches that focus on individual variants, we propose to search only over the space of gene combinations that explain the statistical association signal, and thus drastically reduce runtime. Second, CAVIAR-Gene extends the existing framework for fine mapping to account for population structure. The output of our approach is a minimal set of genes that will contain the true causal gene at a pre-specified significance level. This gene set, together with each gene's individual probability of causality, provides a natural way of prioritizing genes for functional testing (e.g. knockout strategies) in model organisms. Through extensive simulations, we demonstrate that CAVIAR-Gene is superior to existing methodologies, requiring the smallest set of genes to follow up in order to capture the true causal gene(s).

Building off our previous work with CAVIAR,  CAVIAR-Gene takes as input the marginal statistics for each variant at a locus, an LD matrix consisting of pairwise Pearson correlations computed between the genotypes of a pair of genetic variants, a partitioning of the set of variants in a locus into genes, and the kinship matrix which indicates the genetic similarity between each pair of individuals. Marginal statistics are computed using methods that correct for population structure.  We consider a variant to be causal when the variant is responsible for the association signal at a locus and aim to discriminate these variants from ones that are correlated due to LD.
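As a rough illustration of two of these inputs, the sketch below computes an LD matrix of pairwise Pearson correlations from a toy genotype matrix and pairs it with a partition of variants into genes. The function name `ld_matrix`, the individuals-by-variants layout, the toy genotypes, and the gene labels are all assumptions made for this example; they are not the input format of the CAVIAR-Gene software.

```python
import numpy as np


def ld_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Return the variant-by-variant matrix of pairwise Pearson correlations.

    `genotypes` is assumed to have shape (n_individuals, n_variants), with
    each entry an allele count (0, 1, or 2). This layout is an assumption
    made for the example. Monomorphic variants would yield NaN entries.
    """
    # rowvar=False tells NumPy that each column (variant) is a variable.
    return np.corrcoef(genotypes, rowvar=False)


# Toy data: 5 individuals, 3 variants. Variants 0 and 1 are in perfect LD.
genotypes = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, 2, 0],
    [1, 1, 2],
    [0, 0, 1],
])

ld = ld_matrix(genotypes)

# Hypothetical partition of the variants in the locus into genes.
gene_of_variant = ["geneA", "geneA", "geneB"]

print(np.round(ld, 3))
```

Because variants 0 and 1 carry identical genotypes here, their correlation is 1, which is exactly the situation in which marginal statistics alone cannot tell the two variants apart.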

In model organisms, the large stretches of LD regions result in a large number of variants associated in each region, thus making CAVIAR computationally infeasible. Instead of producing a rho causal set of SNPs, CAVIAR-Gene detects a ‘q causal gene set’ which is a set of genes in the locus that will contain the actual causal genes with probability of at least q.
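To make the idea of a ‘q causal gene set’ concrete, here is a deliberately simplified sketch. It assumes that a posterior probability of causality is already available for each gene and that exactly one gene in the locus is causal, so that the probability a set contains the causal gene is the sum of its members' probabilities. The published method works from variant-level statistics and LD and allows multiple causal genes, so this is only an illustration of the selection step, not the CAVIAR-Gene algorithm. The function name and the gene probabilities are made up for the example.

```python
from typing import Dict, List


def causal_gene_set(gene_posteriors: Dict[str, float], q: float) -> List[str]:
    """Pick the smallest set of genes whose summed posterior reaches q.

    Simplifying assumption (not from the paper): exactly one gene is causal,
    so adding genes in order of decreasing posterior gives a minimal set.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")

    # Rank genes from most to least likely to be causal.
    ranked = sorted(gene_posteriors.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[str] = []
    total = 0.0
    for gene, posterior in ranked:
        if total >= q:
            break
        selected.append(gene)
        total += posterior
    return selected


# Hypothetical per-gene posteriors for a locus with four genes.
posteriors = {"gene1": 0.6, "gene2": 0.25, "gene3": 0.1, "gene4": 0.05}

print(causal_gene_set(posteriors, q=0.8))
```

With these toy numbers, a threshold of q = 0.8 selects two of the four genes, which mirrors how a causal gene set can shrink the list of candidates for functional follow-up.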

For further details of our new method, CAVIAR-Gene, see our full paper.

Genetic and Environmental Control of Host-Gut Microbiota Interactions

Studies carried out over the last decade have revealed that gut microbiota contribute to a variety of common disorders, including obesity and diabetes (Musso et al. 2011), colitis (Devkota et al. 2012), atherosclerosis (Wang et al. 2011), rheumatoid arthritis (Vaahtovuo et al. 2008), and cancer (Yoshimoto et al. 2013). The evidence for metabolic interactions is particularly strong, as a large body of data now supports the conclusion that gut microbiota influence the energy harvest from dietary components, particularly complex carbohydrates, and that metabolites such as the short chain fatty acids produced by gut bacteria can perturb metabolic traits, including adiposity and insulin resistance (Turnbaugh et al. 2006; Backhed et al. 2007; Wen et al. 2008; Turnbaugh et al. 2009; Ridaura et al. 2013).

Gut microbiota communities are assembled anew in each generation, influenced by maternal seeding, environmental factors, host genetics and age, resulting in substantial variations in composition among individuals in human populations (Eckburg et al. 2005; Costello et al. 2009; Huttenhower and Consortium 2012; Goodrich et al. 2014). Most experimental studies of host-gut microbiota interactions have employed large perturbations, such as comparisons of germ-free versus conventional mice, and the significance of common variations in gut microbiota composition for disease susceptibility is still poorly understood. Furthermore, while studies with germ-free mice have clearly implicated microbiota in clinically relevant traits, it has proven difficult to identify the responsible taxa of bacteria.

We now report a population-based analysis of host-gut microbiota interactions in the mouse. One of the issues we explore is the role of host genetics. Although some evidence is consistent with significant heritability of gut microbiota composition, the extent to which the host controls microbiota composition under controlled environmental conditions is unclear. We also examine the role of common variations in gut microbiota in metabolic traits such as obesity and insulin resistance. We performed our study using a resource termed the Hybrid Mouse Diversity Panel (HMDP), consisting of about 100 inbred strains of mice that have been either sequenced or subjected to high density genotyping (Bennett et al. 2010). The resource has several advantages for genetic analysis as compared to traditional genetic crosses. First, it allows high resolution mapping by association rather than linkage analysis, and it has now been used for the identification of a number of novel genes underlying complex traits (Farber et al. 2011; Lavinsky et al. 2015; Parks et al. 2015; Rau et al. 2015). Second, since the strains are permanent, the data from separate studies can be integrated, allowing the development of large, publicly available databases of physiological and molecular traits relevant to a variety of clinical disorders. Third, the panel is ideal for examining gene-by-environment interactions, since it is possible to examine individuals of a particular genotype under a variety of conditions (Orozco et al. 2012; Parks et al. 2013).

Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that in a controlled environment the genetic background accounts for a significant fraction of the abundance of most common microbiota. The mice were previously studied for response to a high fat, high sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, AxB19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.

In our study, we concluded:

– Across a total of 599 mice, the same 17 genera were abundant in 75% of the animals

– These 17 genera accounted for 68% of reads

– Consistent with previous studies, changing diet drastically changes gut microbiota composition, and these shifts are strongly dependent on the genetic background of the mice

– Gut microbiota contribute to dietary responsiveness

– Several gut microbiota (known and novel to this study) contribute to obesity and metabolic phenotypes

– Seven genome-wide significant loci (P < 4 × 10⁻⁶) were found to be associated with common genera

– We were able to estimate heritability using a linear mixed model approach, assuming an additive effect, based on the proportion of phenotypic variance accounted for by genetic relationships among the strains.
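For readers unfamiliar with this kind of heritability estimate, the standard variance-component form of a linear mixed model is sketched below. The notation is an assumption for illustration (y is the vector of genus abundances across mice, K the kinship matrix of genetic relationships, and σ_g² and σ_e² the genetic and residual variance components); the exact model and covariates used in the paper may differ.

```latex
\begin{align}
  y &= \mu \mathbf{1} + u + e, \\
  u &\sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
  e \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right), \\
  h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{align}
```

Under this additive model, h² is the proportion of phenotypic variance attributed to the genetic relationships encoded in K, so a genus whose abundance tracks strain relatedness closely would have h² near 1.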

We began our study with the hypothesis that the dietary response was dictated in part by differences in gut microbiota. We showed that different inbred strains of mice differ strikingly in the composition of gut microbiota and provided evidence that the variation is determined in part by the host genetic background. Consistent with our hypothesis, we showed that cross-fostering between two strains of mice affected dietary response to the high fat, high sucrose diet. By correlating microbiota composition with dietary response among the HMDP inbred strains, we were able to identify several candidate microbiota influencing dietary response.

For all the details of our research and our methods, see our paper.

Solving Crimes with DNA

Recently Zarlab hosted the first-ever Undergraduate Bioinformatics Speaker Series. Our lab has been steadily growing as our undergraduate research program becomes more robust, and we decided it was time we gave the undergrads an outlet of their own. Recently, the Computational Genetics Student Group (CGSG) was formed to serve the research, networking and extracurricular educational needs of the bioinformatics students (and those potentially interested in bioinformatics) at UCLA.

For our first event, we chose to explore the field of forensics and learn how bioinformatics and statistics can be used to solve crimes by analyzing DNA. Associate professor Kirk Lohmueller and Jill Licht, senior criminalist with the LA County Sheriff’s Department, gave insights into murder investigations in which they served as expert witnesses. Kirk spoke about a case that was overturned by the judge because key forensic evidence had been overlooked. At the second trial, Kirk was able to testify to a potential second suspect whose blood was found at the crime scene. However, even with the additional DNA evidence, the jury still convicted the primary suspect based on a child’s eyewitness account!

Jill shared stories of what the day-to-day life of a forensic biologist is like. At least one week every month, she has to remain alert and ready to drive to the scene of a crime 24 hours a day. Sometimes she’ll get the call at 2 a.m. and have to drive an hour to get to the location. She explained that the Los Angeles Police Department only has jurisdiction in the city of Los Angeles, while the sheriff’s department oversees the rest of LA County. That means she could be called anywhere from Pasadena to Long Beach. Although she is squeamish at the sight of blood, Jill says she is able to handle it at work. The ultimate goal is to determine the story behind the scene, and she must stay focused in order to do her best work there. Could you handle working with blood and brains?

If you are interested in this and future talks, leave us a comment below.