Simultaneous Genetic Analysis of more than One Trait

Most methods that try to understand the relationship between an individual’s genetics and traits analyze one trait at a time. Our lab recently published a paper on analyzing multiple traits together. This subject is significant because analyzing multiple traits jointly can discover more genetic variants that affect traits, but the analysis methods are challenging and often computationally very inefficient. This is especially the case for mixed-model methods, which take into account the relatedness among individuals in the study. These approaches both increase power and provide insights into the genetic architecture of multiple traits. In particular, they make it possible to estimate the genetic correlation, a measure of the portion of the total correlation between traits that is due to additive genetic effects.

In our recent paper, we aim to solve this problem by introducing a technique that can be used to assess genome-wide association quickly, reducing analysis time from hours to seconds. Our method is called a Matrix Variate Linear Mixed Model (mvLMM) and is similar to the method recently developed by Matthew Stephens’s group ((22706312)). Our method is available as software that works together with pylmm, the mixed-model software we are developing, which is available at http://genetics.cs.ucla.edu/pylmm/. An implementation of mvLMM is available at http://genetics.cs.ucla.edu/mvLMM/.

We demonstrate the efficacy of our method by analyzing correlated traits in the Northern Finland Birth Cohort ((19060910)). Compared to a standard approach ((22843982); (22902788)), our method reduces analysis time more than 10-fold for a pair of correlated traits, from about 35 minutes to about 2.5 minutes for the cubic operations plus another 12 seconds for the iterative part of the algorithm. In addition, the result of the cubic operations can be saved so that it does not have to be recalculated when analyzing other traits in the same cohort. Finally, we demonstrate how this method can be used to analyze gene expression data. Using a well-studied yeast dataset ((18416601)), we show how estimating the genetic and environmental components of correlation between pairs of genes allows us to understand the relative contribution of genetics and environment to coexpression.

One of the key ideas of our approach is to represent the multiple phenotypes as a matrix where the rows are individuals and the columns are traits. We then assume the data follow a “matrix variate normal” distribution, in which we define a covariance structure among the rows (individuals) and among the columns (traits). The use of the matrix variate normal is the key to making our algorithm efficient.
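The following is a minimal sketch of why this row-and-column covariance structure can make computation cheap. It is not the mvLMM implementation. It assumes a zero-mean model in which the stacked phenotypes have covariance Vg ⊗ K + Ve ⊗ I, where K is a kinship matrix among individuals and Vg and Ve are genetic and environmental covariance matrices among traits. All function names, variable names, and the simulated data are hypothetical and chosen only for illustration. Under these assumptions, one eigendecomposition of K (the cubic step, done once per cohort) rotates the data so that individuals become independent, and each later likelihood evaluation only needs small trait-by-trait matrices.

```python
import numpy as np

def loglik_naive(Y, K, Vg, Ve):
    """Gaussian log-likelihood using the full (n*d) x (n*d) covariance."""
    n, d = Y.shape
    y = Y.reshape(-1, order="F")  # vec(Y): stack the trait columns
    sigma = np.kron(Vg, K) + np.kron(Ve, np.eye(n))
    _, logdet = np.linalg.slogdet(sigma)
    quad = y @ np.linalg.solve(sigma, y)
    return -0.5 * (n * d * np.log(2 * np.pi) + logdet + quad)

def loglik_rotated(Y, eigvals, eigvecs, Vg, Ve):
    """Same likelihood, using a precomputed eigendecomposition of K."""
    n, d = Y.shape
    Yr = eigvecs.T @ Y  # rotate individuals; rows are now independent
    # Row i of Yr has the d x d covariance eigvals[i] * Vg + Ve.
    covs = eigvals[:, None, None] * Vg + Ve          # shape (n, d, d)
    _, logdets = np.linalg.slogdet(covs)
    solved = np.linalg.solve(covs, Yr[:, :, None])[:, :, 0]
    quad = np.sum(Yr * solved)
    return -0.5 * (n * d * np.log(2 * np.pi) + logdets.sum() + quad)

def genetic_correlation(Vg):
    """Correlation between two traits implied by the genetic covariance Vg."""
    return Vg[0, 1] / np.sqrt(Vg[0, 0] * Vg[1, 1])

# Simulated toy data (assumption: not from any real cohort).
rng = np.random.default_rng(0)
n, d, m = 40, 2, 200
G = rng.standard_normal((n, m))
K = G @ G.T / m                              # toy kinship matrix
Vg = np.array([[1.0, 0.5], [0.5, 1.0]])      # hypothetical genetic covariance
Ve = np.array([[0.5, 0.1], [0.1, 0.5]])      # hypothetical environmental covariance
Y = rng.standard_normal((n, d))              # individuals x traits

eigvals, eigvecs = np.linalg.eigh(K)         # cubic step, reusable across traits
fast = loglik_rotated(Y, eigvals, eigvecs, Vg, Ve)
slow = loglik_naive(Y, K, Vg, Ve)
print(np.isclose(fast, slow), genetic_correlation(Vg))
```

In an actual analysis, Vg and Ve would be estimated by repeatedly evaluating a likelihood like `loglik_rotated` inside an optimizer, which is where reusing the saved eigendecomposition would pay off.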

The full paper about mvLMM is below:

Furlotte, Nicholas A; Eskin, Eleazar

Efficient Multiple Trait Association and Estimation of Genetic Correlation Using the Matrix-Variate Linear Mixed-Model. Journal Article

In: Genetics, 200 (1), pp. 59-68, 2015, ISSN: 1943-2631.

**Update** Since publishing, it has been brought to our attention there is related work published by Karin Meyer in 1985 (which cited earlier work by Robin Thompson from 1976) we did not cite. If our method interests you, please also take a moment to review the following paper:

Meyer, K

Maximum Likelihood Estimation of Variance Components for a Multivariate Mixed Model with Equal Design Matrices Journal Article

In: Biometrics, 41 (1), pp. 153-165, 1985, ISSN: 0006-341X.

Identifying genetic relatives without compromising privacy

Our DNA can tell us a lot about who our relatives are. Several companies, including 23andMe and AncestryDNA, now provide services in which they collect DNA from individuals and match it against a database of other people’s DNA to identify relatives. The company then informs the relatives that their DNA matches. Our lab was interested in whether this same type of service can be performed without involving a company and, more generally, without involving any third party. One way to do this would be to have individuals obtain their own DNA sequences and then share them directly with each other. Unfortunately, DNA sequences are considered medical information and it is inappropriate to share them in this way.

Through a collaboration between our lab and the UCLA cryptography group, we recently published a paper combining cryptography and genetics that describes an approach for identifying relatives without compromising privacy. Our paper was published in the April 2014 issue of Genome Research. The key idea is that individuals release an encrypted version of their DNA information. Another individual can download this encrypted version and then use their own DNA information to try to decrypt it. If they are related to each other, their DNA sequences will be close enough that the decryption will work, telling the individual that they are related. If they are unrelated, the decryption will fail. What is important in this approach is that individuals who are not related do not obtain any information about each other’s DNA sequences.

The intuitive idea behind the approach is the following. Individuals each release a copy of their own genomes encrypted with a key that is based on the genome itself. Other users then download this encrypted information and try to decrypt it using their own genomes as the key. The encryption scheme is designed to allow for decryption if the encrypting key and decrypting key are “close enough”. Since related individuals share a portion of their genomes, we set the threshold for “close enough” to be exactly the threshold of relatedness that we want to detect.
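The “close enough” idea can be illustrated with a toy sketch. This is not the construction from our paper and it is not secure; it only shows the mechanics of a key that can be recovered from an inexact match. It assumes genomes are represented as lists of 0/1 bits and uses a simple repetition code with majority-vote decoding, so a match succeeds when fewer than half of the bits in every block of `R` differ. The names `publish`, `try_match`, and `R` are hypothetical.

```python
import hashlib
import secrets

R = 5  # hypothetical repetition factor: each secret bit is repeated R times

def encode(bits):
    """Repetition code: repeat every bit R times."""
    return [b for b in bits for _ in range(R)]

def decode(bits):
    """Majority vote within each block of R bits."""
    return [int(sum(bits[i:i + R]) > R // 2) for i in range(0, len(bits), R)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def tag(bits):
    """Hash of the secret, used only to check whether recovery succeeded."""
    return hashlib.sha256(bytes(bits)).hexdigest()

def publish(genome):
    """Hide a random secret behind the genome; release helper data and a tag."""
    secret = [secrets.randbelow(2) for _ in range(len(genome) // R)]
    helper = xor(genome, encode(secret))
    return helper, tag(secret)

def try_match(other_genome, helper, expected_tag):
    """Try to recover the secret with a different genome as the key."""
    candidate = decode(xor(other_genome, helper))
    return tag(candidate) == expected_tag

# Toy bit strings (assumption: real genotype data would need a real encoding).
genome = [i % 2 for i in range(50)]
helper, released_tag = publish(genome)

relative = list(genome)
for position in (0, 5, 11):      # a few differences, at most one per block
    relative[position] ^= 1

stranger = [1 - b for b in genome]  # differs at every position

print(try_match(relative, helper, released_tag))   # close enough: recovers the secret
print(try_match(stranger, helper, released_tag))   # too different: recovery fails
```

In this toy version the tolerance is set by `R`; a real scheme would choose its error tolerance to correspond to the degree of relatedness to be detected, and would need a construction that does not leak the genome through the helper data.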

Our approach uses a relatively new type of cryptographic technique called Fuzzy Extractors, which were pioneered by our co-authors on this study, Amit Sahai and Rafail Ostrovsky. This type of technique allows for encryption and decryption with keys that match inexactly. Students in our group who were involved are Dan He, Nick Furlotte, Farhad Hormozdiari, and Jong Wha (Joanne) Joo. This research was supported by National Science Foundation grant 1065276.

The full citation of our paper is here:

He, Dan; Furlotte, Nicholas A; Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Ostrovsky, Rafail; Sahai, Amit; Eskin, Eleazar

Identifying genetic relatives without compromising privacy. Journal Article

In: Genome Res, 2014, ISSN: 1549-5469.

Thesis Defense: Dr. Nick Furlotte

Nick Furlotte’s thesis defense talk is available on our newly created YouTube channel ZarLabUCLA. This talk gives a great summary of Nick’s research over the course of his Ph.D. and an overview of the types of problems that our lab works on. Note that, for the record, students in the lab do not dress as well as Nick is dressed in the video. Nick actually bought those clothes the day before, especially for his defense. Today was Nick’s last day in the lab and he is now on his way to start the next chapter in his career at 23andMe.

His thesis title and abstract are:

Nick Furlotte Thesis Defense
“Computational Genetic Approaches for the Dissection of Complex Traits”

University of California, Los Angeles
May 15 at 2:30 pm

Committee:
Eleazar Eskin (Chair)
David Heckerman
Christopher Lee
A. Jake Lusis
Amit Sahai
Abstract:
Over the past two decades, major technological innovations have transformed the field of genetics allowing researchers to examine the relationship between genetic and phenotypic variation at an unprecedented level of granularity. As a result, genetics has increasingly become a data-driven science, demanding effective statistical procedures and efficient computational methods and necessitating a new interface that some refer to as computational genetics. This talk will focus on a few problems existing within this interface. First, I will introduce a statistical and computational construct called the matrix-variate linear mixed-model (mvLMM), which is used for multiple phenotype genome-wide association. I show how the application of this method results in increased association power over single trait mapping and leads to a dramatic reduction in computational time over classical multiple phenotype optimization procedures. For example, where a classically-based approach takes hours to perform parameter optimization for moderate sample sizes mvLMM takes minutes. Next, I introduce a meta-analysis technique that allows for genome-wide association studies to be combined across populations that are known to contain population structure. This development was motivated by a specific problem in mouse genetics, the aim of which is to utilize multiple mouse association studies jointly. I show that by combining the studies using meta-analysis, while accounting for population structure, the proposed method achieves increased statistical power and increased association resolution. Finally, I will introduce a method for calculating gene coexpression in a way that is robust to statistical confounding introduced through expression heterogeneity.