We recently published the first study to report a genetic component to host choice behavior in the major malaria vector Anopheles arabiensis. In a collaboration with the University of California Davis, University of Glasgow, and the Environmental Health and Ecological Sciences Group, Ifakara Health Institute, Ifakara, United Republic of Tanzania, we assessed the genetic basis for An. arabiensis host choice and resting behavior. We linked feeding on humans to allelic variation between the 3Ra inversion states. This effort was led by researchers at UC Davis, including Bradley Main, Yoosook Lee, Travis Collier, Anthony Cornel, Catelyn Nieman, Allison Weakley, and Gregory Lanzaro. Eleazar Eskin and Eun Yong Kang contributed data analysis and interpretation.
Mosquitoes that feed on human blood pose an enormous public health threat by transmitting numerous pathogens, such as dengue virus, Zika virus, and malaria. Together, these mosquito-borne diseases kill more than one million people per year. Human exposure to malaria is driven by variable mosquito behaviors such as: (1) propensity to feed on humans relative to other animals (anthropophily) and (2) preference for living in close proximity to humans, as reflected by biting and residing inside houses (endophily).
Our project focused on the potential for An. arabiensis, the only remaining malaria vector in many parts of Africa, to adapt its behavior to avoid control measures such as insecticide-treated nets and indoor residual sprays. To investigate the genetic basis of host choice and resting behavior, we sequenced the genomes of 23 human-fed and 25 cattle-fed mosquitoes collected both indoors and outdoors in the Kilombero Valley, Tanzania. We tested for genetic associations with each of the four phenotypes: human-fed, cattle-fed, resting indoors, and resting outdoors.
With these genomes, we identified a set of 4,820,851 segregating SNPs after imposing a minor allele frequency threshold of 10%. We estimated the genetic component (or “SNP heritability”) for each phenotype. Results suggest a genetic component for host choice and no genetic component for resting behavior.
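The minor allele frequency filter mentioned above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the paper: the function names, the toy genotypes, and the choice of an inclusive (`>=`) cutoff are all assumptions made for the example.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency (MAF) for one biallelic SNP.

    genotypes: alternate-allele counts per diploid individual (0, 1, or 2);
    None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(snps, threshold=0.10):
    """Keep SNPs whose MAF is at least `threshold` (inclusive cutoff assumed)."""
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) >= threshold}


# Hypothetical toy data: three SNPs typed in five mosquitoes.
toy_snps = {
    "snp_a": [0, 1, 2, 1, 0],     # alternate allele frequency 4/10
    "snp_b": [0, 0, 0, 0, 0],     # monomorphic, so not segregating
    "snp_c": [2, 2, 2, None, 1],  # alternate allele frequency 7/8, one missing call
}
kept = filter_by_maf(toy_snps)
print(sorted(kept))
```

Taking the smaller of the two allele frequencies means the filter treats reference and alternate alleles symmetrically, so a SNP is dropped only when one allele is rare in the sample.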
To test for the existence of genetic structure within our set of 48 sequenced genomes, individuals were partitioned by genetic relatedness using a Principal Component Analysis (Genome-Wide Complex Trait Analysis software, GCTA) applied to all SNPs. Using this approach, we observed three discrete genetic clusters. We used a novel population-scale inversion genotyping method to identify an association between the standard arrangement of 3Ra (3R+) and cattle-fed An. arabiensis. We highlight two intriguing candidate genes within 3Ra: the odorant binding protein Obp5 and the odorant receptor Or65. The enrichment of 3R+ among cattle-fed mosquitoes provides support for a genetic component to host choice, which is consistent with the report that zoophily can be selected for.
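To make the idea of "enrichment" concrete, one common way to ask whether an arrangement is over-represented in one group is Fisher's exact test on a 2x2 table. The sketch below is only an illustration of that general technique: the counts are invented, and the paper may have used a different test, so please consult the paper for the actual analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts, NOT from the paper.
# Rows: cattle-fed, human-fed. Columns: carries 3R+, does not carry 3R+.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

An exact test is a natural fit for small samples such as a few dozen mosquitoes, where large-sample approximations like the chi-squared test can be unreliable.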
Our multiplex genotyping assays allowed us to directly estimate relationships between host choice and genotype in wild mosquitoes in a high-throughput and economical fashion. Given the importance of mosquito feeding and resting behavior to the effectiveness of malaria control and transmission, there is an urgent need to understand the underlying biological determinants of these behaviors and their short- and long-term impact on the effectiveness of current public health interventions.
For more information, see our paper, which is available for download through PLoS Genetics: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006303.
The full citation to our paper is:
Main, B.J., Lee, Y., Ferguson, H.M., Kreppel, K.S., Kihonda, A., Govella, N.J., Collier, T.C., Cornel, A.J., Eskin, E., Kang, E.Y. and Nieman, C.C., 2016. The Genetic Basis of Host Preference and Resting Behavior in the Major African Malaria Vector, Anopheles arabiensis. PLoS Genet, 12(9), p.e1006303.
We have published numerous blog posts on managing scientific labs, writing papers, and strategizing a graduate career. Articles presenting our advice on these subjects have become the top-viewed posts on our website. Moving forward, we will be organizing the ZarLab website to feature this content. We believe that the practices and strategies described in these posts have greatly improved our productivity and advanced our careers. While our posts are written with Bioinformatics in mind, the concepts can be applied broadly to careers across STEM fields.
Here we present a summary of the posts that provide advice to scientists.
- Writing Tips: An Authorship Policy that Maximizes Collaboration
- Writing Tips: Why we Publish Methods Papers
- Writing Tips: Results Subsections
- Writing Tips: Methods Overview
- Writing Tips: Introduction
- Writing Tips: How we Edit
- Writing Tips: Getting Organized (and Staying that Way)
- Writing Tips: Motivation (or the Lack of It)
- Writing Tips: Overcoming Writer’s Block
Advice for Scientists
- UCLA Bioinformatics: The Philosophy of the Training Environment and Programs
- UCLA Bioinformatics: The Philosophy of the Ph.D. Program
- UCLA Bioinformatics: The Philosophy of the Undergraduate Program
- UCLA Launches CGSI with Inaugural Summer Programs
- Video Tutorial: Serghei Mangul’s Introduction to UNIX Workshops
- Video Tutorial: An Introduction to Read Mapping and Next Generation Sequencing
- Learning Bioinformatics @ UCLA: Finding Bioinformatics Research Opportunities
- Learning Bioinformatics @ UCLA: What Courses Should I Take?
- Learning Bioinformatics @ UCLA: The Undergraduate Bioinformatics Minor
ZarLab Thesis Defenses
- Thesis Defense: Dr. Farhad Hormozdiari
- Thesis Defense: Dr. Jong Wha (Joanne) Joo
- Thesis Defense: Dr. Zhanyong (Jerry) Wang
- Thesis Defense: Dr. Eun Yong Kang
- Thesis Defense: Dr. Jae Hoon Sul
- Thesis Defense: Dr. Nick Furlotte
Michael Bilow and Eleazar Eskin, together with Fernando Crespo, Zhicheng Pan, and Susana Eyheramendy, recently released a novel method for accurate joint modeling of clinical phenotype and disease status. This approach incorporates a clinical phenotype into case/control studies under the assumption that the genetic variant can affect both.
Genetic case-control association studies have found thousands of associations between genetic variants and disease. Most studies collect data from individuals with and without disease, and they often search for variants with different frequencies between the groups. Jointly modelling clinical phenotype and disease status is a promising way to increase power to detect true associations between genetics and disease. In particular, this method increases potential for discovering genetic variants that are associated with both a clinical phenotype and a disease.
However, standard multivariate techniques fail to effectively solve this problem because case-control status is discrete rather than continuous. In addition, standard approaches to estimating model parameters are biased due to the ascertainment in case/control studies. We present a novel method that resolves both of these issues for simultaneous association testing of genetic variants in studies that have both case status and a clinical covariate.
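The ascertainment issue described above can be illustrated with a small simulation. This is a sketch of a generic liability threshold setup, in which an individual is a case when an unobserved liability exceeds a threshold, and it is not the method from our paper. All parameter values (`prevalence`, `maf`, `effect`, `rho`) and function names are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean


def simulate_individual(rng, maf, effect, rho, threshold):
    """Draw one individual: (genotype, clinical phenotype, is_case)."""
    # Genotype: number of risk alleles, assuming Hardy-Weinberg proportions.
    genotype = (rng.random() < maf) + (rng.random() < maf)
    noise = rng.gauss(0.0, 1.0)
    liability = effect * genotype + noise
    # The clinical phenotype has correlation rho with the liability's noise term.
    phenotype = rho * noise + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    return genotype, phenotype, liability > threshold


def ascertain(rng, n_cases, n_controls, prevalence=0.05, maf=0.3,
              effect=0.3, rho=0.5):
    """Sample the population until the case and control quotas are filled."""
    # Threshold chosen so that roughly `prevalence` of non-carriers are cases.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g, y, is_case = simulate_individual(rng, maf, effect, rho, threshold)
        bucket, quota = (cases, n_cases) if is_case else (controls, n_controls)
        if len(bucket) < quota:
            bucket.append((g, y))
    return cases, controls


rng = random.Random(0)
cases, controls = ascertain(rng, n_cases=200, n_controls=200)
case_pheno = mean(y for _, y in cases)
control_pheno = mean(y for _, y in controls)
print(len(cases), len(controls))
print(case_pheno > control_pheno)
```

In this toy setup, half of the study sample are cases even though the disease is rare in the simulated population, and the clinical phenotype is shifted in cases relative to controls. That distortion is the kind of ascertainment effect that a method fitted to case/control data needs to account for.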
In our paper, we show the utility of our method using data from the Northern Finland Birth Cohort (NFBC) dataset. NFBC enrolled almost everyone born in 1966 in Finland’s two northernmost provinces. The NFBC dataset consists of 10 phenotypes and genotypes at 331,476 genetic variants measured in 5,327 individuals. We focus our study on the LDL cholesterol and triglyceride level phenotypes.
Our evaluation strategy analyzes a subset of the NFBC data and compares what we discover there to what was discovered in the full NFBC dataset, which we treat as the gold standard. We compare the performance of our novel approach to three other methods: (1) the single univariate test applied to the disease status, (2) the multivariate approach applied to the disease status and the clinical phenotype modeled as a multivariate normal distribution, and (3) the liability threshold model treating the clinical phenotype as a covariate.
Using the univariate approach, the p-values are much less significant than those observed in the full NFBC dataset. Running the multivariate approaches, which incorporate the triglyceride level phenotype, increased power (i.e., yielded more significant p-values at the SNPs).
Our method has the highest power in all scenarios. Its advantage is greater when selection bias is substantial than when it is low. Our method is even more powerful when the correlation between the clinical covariate and the disease liability is lower, because we explicitly estimate the underlying liability using all of the data.
For more information, see our paper in Genetics: http://www.genetics.org/content/early/2017/01/27/genetics.116.198473
The software implementing the methods described in this paper was developed by Fernando Crespo and is available at: http://genetics.cs.ucla.edu/multipheno/ and
The full citation to our paper is:
Bilow, M., Crespo, F., Pan, Z., Eskin, E. and Eyheramendy, S., 2017. Simultaneous Modeling of Disease Status and Clinical Phenotypes to Increase Power in GWAS. Genetics, pp.genetics-116.