Most methods that try to understand the relationship between an individual’s genetics and traits analyze one trait at a time. Our lab recently published a paper that focuses on analyzing multiple traits together. This matters because a joint analysis can discover more genetic variants that affect traits, but the available analysis methods are challenging and often very computationally inefficient. This is especially true of mixed-model methods, which take into account the relatedness among individuals in the study. Multi-trait approaches both increase power and provide insights into the genetic architecture of multiple traits. In particular, they make it possible to estimate the genetic correlation, a measure of the portion of the total correlation between traits that is due to additive genetic effects.
In our recent paper, we address this problem by introducing a technique that can assess genome-wide association quickly, reducing analysis time from hours to seconds. Our method is called the Matrix Variate Linear Mixed Model (mvLMM) and is similar to the method recently developed by Matthew Stephens’s group ((22706312)). It works together with pylmm, the mixed-model software we are developing, which is available at http://genetics.cs.ucla.edu/pylmm/. An implementation of mvLMM is available at http://genetics.cs.ucla.edu/mvLMM/.
We demonstrate the efficacy of our method by analyzing correlated traits in the Northern Finland Birth Cohort ((19060910)). Compared with a standard approach ((22843982); (22902788)), our method achieves more than a 10-fold reduction in running time for a pair of correlated traits: the analysis time drops from about 35 minutes to about 2.5 minutes for the cubic operations plus another 12 seconds for the iterative part of the algorithm. In addition, the result of the cubic operation can be saved so that it does not have to be recalculated when analyzing other traits in the same cohort. Finally, we demonstrate how this method can be used to analyze gene expression data. Using a well-studied yeast dataset ((18416601)), we show how estimating the genetic and environmental components of correlation between pairs of genes allows us to understand the relative contribution of genetics and environment to coexpression.
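As a rough illustration of why a saved cubic step pays off, the sketch below assumes that the cubic operation is an eigendecomposition of the kinship matrix, as is common in mixed-model software. That is an assumption made for this example, and the post does not state which operation mvLMM actually uses. The function names and the toy data are hypothetical and are not taken from pylmm or mvLMM.

```python
import numpy as np

def decompose_kinship(K):
    """O(n^3) step, done once per cohort: K = Q diag(s) Q^T.
    (Assumed form of the cubic operation, for illustration only.)"""
    s, Q = np.linalg.eigh(K)  # eigh is intended for symmetric matrices
    return s, Q

def rotate_traits(Q, Y):
    """Cheap per-analysis step: project phenotypes onto the eigenvectors."""
    return Q.T @ Y

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 20))   # toy genotypes: 5 individuals, 20 SNPs
K = G @ G.T / G.shape[1]           # toy kinship matrix (symmetric)

s, Q = decompose_kinship(K)        # expensive; could be stored with np.savez

# Two different sets of traits from the same cohort reuse the same s and Q.
Y1 = rng.standard_normal((5, 2))
Y2 = rng.standard_normal((5, 2))
R1 = rotate_traits(Q, Y1)
R2 = rotate_traits(Q, Y2)
```

Under this assumption, only the rotation and the iterative fitting would need to be repeated for each new pair of traits.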
One of the key ideas of our approach is to represent the multiple phenotypes as a matrix where the rows are individuals and the columns are traits. We then assume the data follow a “matrix variate normal” distribution, in which we define one covariance structure among the rows (individuals) and another among the columns (traits). The use of the matrix variate normal is the key to making our algorithm efficient.
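To make the matrix variate normal concrete, here is a minimal sketch in Python with NumPy. It is an illustration only and not the mvLMM implementation; the function names and the toy covariance matrices are hypothetical. It draws a phenotype matrix whose rows covary according to `row_cov` and whose columns covary according to `col_cov`, and shows that the covariance of the stacked columns is the Kronecker product of the two.

```python
import numpy as np

def sample_matrix_normal(mean, row_cov, col_cov, rng):
    """Draw one Y ~ MN(mean, row_cov, col_cov) as Y = mean + A Z B^T."""
    A = np.linalg.cholesky(row_cov)      # row_cov = A A^T (individuals)
    B = np.linalg.cholesky(col_cov)      # col_cov = B B^T (traits)
    Z = rng.standard_normal(mean.shape)  # independent standard normals
    return mean + A @ Z @ B.T

def vec_covariance(row_cov, col_cov):
    """Covariance of vec(Y), with columns stacked: kron(col_cov, row_cov)."""
    return np.kron(col_cov, row_cov)

# Toy example: 3 individuals, 2 traits.
row_cov = np.array([[1.0, 0.5, 0.25],
                    [0.5, 1.0, 0.5],
                    [0.25, 0.5, 1.0]])   # stand-in for relatedness
col_cov = np.array([[1.0, 0.3],
                    [0.3, 1.0]])         # stand-in for trait covariance

rng = np.random.default_rng(1)
Y = sample_matrix_normal(np.zeros((3, 2)), row_cov, col_cov, rng)
full_cov = vec_covariance(row_cov, col_cov)  # 6 x 6
```

The point of the structure is that the two small matrices (3 x 3 and 2 x 2 here) fully determine the 6 x 6 covariance, so computations can work with the small factors instead of the full matrix.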
The full paper about mvLMM is below:
**Update** Since publishing, it has been brought to our attention that there is related work published by Karin Meyer in 1985 (which cited earlier work by Robin Thompson from 1976) that we did not cite. If our method interests you, please also take a moment to review the following paper: