Mixed models can correct for population structure for genomic regions under selection

Genome-wide association studies (GWAS) collect people with a disease (called “cases”) and people without the disease (called “controls”) and compare allele frequencies between cases and controls to identify genomic locations associated with the disease. An underlying assumption of GWAS is that cases and controls are sampled from the same population. If they are not, a phenomenon called “population structure” may cause spurious associations. Correcting for population structure has been a very important problem in GWAS, both in model organisms such as the mouse and in human genetics.
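To make the case/control comparison concrete, here is a minimal sketch of a per-SNP allelic test: a Pearson chi-square on the 2x2 table of allele counts in cases versus controls. It assumes genotypes are coded as 0/1/2 copies of one allele. The function names and example genotypes are hypothetical, and real GWAS pipelines add quality control and often use other tests. This simple test is exactly what population structure breaks: if cases come mostly from a population where an allele is more common, the statistic is inflated even at SNPs with no effect on the disease.

```python
import math


def allele_counts(genotypes):
    """Return (alt, ref) allele counts for genotypes coded as 0/1/2 copies of the alt allele."""
    alt = sum(genotypes)
    return alt, 2 * len(genotypes) - alt


def allelic_test(case_genotypes, control_genotypes):
    """Pearson chi-square (1 d.f.) on the 2x2 table of allele counts, cases vs controls.

    Returns (chi2, p_value). Assumes cases and controls come from one
    population; under population structure the statistic can be inflated.
    """
    a, b = allele_counts(case_genotypes)      # cases: alt, ref
    c, d = allele_counts(control_genotypes)   # controls: alt, ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin (e.g. a monomorphic SNP) carries no association signal.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical genotypes at a single SNP.
cases = [2, 1, 1, 0]
controls = [0, 0, 1, 0]
chi2, p = allelic_test(cases, controls)
print(f"chi2={chi2:.3f}, p={p:.3f}")
```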

In 2010, our group proposed a method called “EMMAX” (10.1038/ng.548) that uses a linear mixed model to correct for population structure in human GWAS. EMMAX computes the relationship between every pair of individuals from SNP data (called the “kinship matrix”) and uses this kinship matrix to control for population structure. Using two human GWAS datasets, we showed that our method removes the effects of population structure better than previous methods. However, Price et al. (10.1038/nrg2813) showed that EMMAX may be susceptible to spurious associations in genomic regions under selection, that is, regions where allele frequencies differ significantly between two populations.
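The post does not spell out how the kinship matrix is computed from SNP data. Below is a minimal sketch of one common estimator, the standardized genotype relationship matrix; it is an assumption for illustration and not necessarily the exact estimator EMMAX uses. Each SNP is centered by twice its allele frequency p and scaled by sqrt(2p(1 - p)), and entry (i, j) averages the products of the standardized genotypes of individuals i and j over all SNPs.

```python
import math


def kinship_matrix(genotypes):
    """Standardized genotype relationship matrix (one common kinship estimator).

    genotypes: one list per individual, each holding 0/1/2 dosages over the
    same SNPs. Returns an N x N matrix K where K[i][j] is the mean over SNPs
    of z_i * z_j, with z = (x - 2p) / sqrt(2p(1 - p)).
    """
    n = len(genotypes)
    num_snps = len(genotypes[0])
    standardized = []  # one standardized column per polymorphic SNP
    for s in range(num_snps):
        column = [individual[s] for individual in genotypes]
        p = sum(column) / (2 * n)        # sample allele frequency
        var = 2 * p * (1 - p)            # dosage variance under Hardy-Weinberg
        if var == 0:
            continue                     # monomorphic SNPs carry no relatedness information
        sd = math.sqrt(var)
        standardized.append([(x - 2 * p) / sd for x in column])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    m = len(standardized)
    return [[sum(z[i] * z[j] for z in standardized) / m for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: individuals 0 and 1 are opposite homozygotes at both SNPs.
K = kinship_matrix([[0, 2], [2, 0], [1, 1]])
print(K)
```

Individuals who share alleles at many SNPs get large positive entries, and individuals with opposite genotypes get negative ones. In a mixed model, the phenotype covariance is then modeled as a multiple of K plus independent noise.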

We investigated this issue further and found that, with an appropriate kinship matrix (or matrices), EMMAX can correct for population structure in genomic regions under selection. We showed in the paper that computing the kinship matrix only from SNPs whose allele frequencies differ greatly between two populations successfully removes the effects of population structure. We also proposed using two kinship matrices: one computed from SNPs under selection and the other from the remaining SNPs. This approach also correctly controls for population structure. Lastly, we examined whether SNPs under selection actually cause this problem in two human GWAS datasets, and found no evidence of it in either dataset.
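The two strategies described above can be sketched as follows, under stated assumptions. The post does not give the criterion for “very different” allele frequencies, so `split_snps` uses an absolute frequency difference against a hypothetical `threshold`. The two index lists would then each be fed to a kinship estimator to build one matrix from the differentiated SNPs and one from the rest. The two-matrix model is written in the standard form for a mixed model with two genetic variance components; in practice the variance parameters are estimated from the data rather than supplied by hand, and all names here are hypothetical.

```python
def allele_frequencies(genotypes):
    """Per-SNP allele frequency in one population (0/1/2 dosages, one list per individual)."""
    n = len(genotypes)
    num_snps = len(genotypes[0])
    return [sum(ind[s] for ind in genotypes) / (2 * n) for s in range(num_snps)]


def split_snps(pop_a, pop_b, threshold):
    """Split SNP indices into (differentiated, rest) using |p_a - p_b| > threshold.

    `threshold` is a hypothetical parameter; the post gives no specific cutoff.
    """
    differentiated, rest = [], []
    freqs = zip(allele_frequencies(pop_a), allele_frequencies(pop_b))
    for s, (pa, pb) in enumerate(freqs):
        if abs(pa - pb) > threshold:
            differentiated.append(s)
        else:
            rest.append(s)
    return differentiated, rest


def two_kinship_covariance(k_selected, k_rest, vg_selected, vg_rest, ve):
    """Phenotype covariance for a mixed model with two random genetic effects:
    V = vg_selected * K_selected + vg_rest * K_rest + ve * I.
    """
    n = len(k_selected)
    return [[vg_selected * k_selected[i][j] + vg_rest * k_rest[i][j]
             + (ve if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: SNP 0 is fixed for different alleles in the two groups.
pop_a = [[2, 0, 1], [2, 1, 1]]
pop_b = [[0, 0, 1], [0, 1, 1]]
selected, rest = split_snps(pop_a, pop_b, threshold=0.5)
print(selected, rest)  # [0] [1, 2]
```

Splitting the SNPs lets the model give the highly differentiated regions their own variance component instead of diluting them among all other SNPs.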

Full Citation:

Sul, Jae Hoon, and Eleazar Eskin. 2013. Mixed models can correct for population structure for genomic regions under selection. Nature Reviews Genetics 14, no. 4 (February 26): 300–300. http://dx.doi.org/10.1038/nrg2813-c1.


Identifying Genes Involved in Blood Cell Traits


In this study, blood cell traits were measured in each strain of the Hybrid Mouse Diversity Panel (HMDP), which consists of 100 mouse strains. Using EMMA (10.1534/genetics.107.080101), we identified associations with these traits. The main advantage of the HMDP over the traditional genetic cross approach is the higher resolution of the associations it detects.
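As a rough sketch of what a mixed-model association test computes, the code below tests one SNP by generalized least squares once the phenotype covariance V = vg*K + ve*I is fixed. This is a simplification: EMMA's contribution is estimating the variance components vg and ve efficiently and exactly for each SNP, whereas this sketch treats them as known inputs (closer in spirit to EMMAX, which estimates them once). The function names are hypothetical, and plain lists with Gaussian elimination are used only to keep the example self-contained.

```python
def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is n x n)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, w):
    return sum(p * q for p, q in zip(u, w))


def mixed_model_snp_test(y, snp, kinship, vg, ve):
    """Wald test of one SNP under y = b0 + b1*snp + g + e,
    g ~ N(0, vg*K), e ~ N(0, ve*I), with vg and ve treated as known.

    Returns (b1, wald_chi2); wald_chi2 ~ chi-square(1) under H0: b1 = 0.
    """
    n = len(y)
    v = [[vg * kinship[i][j] + (ve if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    ones = [1.0] * n
    vinv_1 = solve(v, ones)   # V^-1 times the intercept column
    vinv_x = solve(v, snp)    # V^-1 times the genotype column
    vinv_y = solve(v, y)      # V^-1 times the phenotype
    # 2x2 normal equations (X' V^-1 X) beta = X' V^-1 y with X = [1, snp]
    a11, a12, a22 = dot(ones, vinv_1), dot(ones, vinv_x), dot(snp, vinv_x)
    r1, r2 = dot(ones, vinv_y), dot(snp, vinv_y)
    det = a11 * a22 - a12 * a12
    b1 = (a11 * r2 - a12 * r1) / det
    var_b1 = a11 / det        # entry [2, 2] of (X' V^-1 X)^-1
    return b1, b1 * b1 / var_b1


# Hypothetical toy data: 4 individuals, the first two related (kinship 0.5).
K = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b1, stat = mixed_model_snp_test([1.0, 2.0, 3.0, 2.0], [0, 1, 2, 1], K, vg=0.5, ve=0.5)
print(b1, stat)
```

With K equal to the identity and vg = 0, the test reduces to ordinary least squares with a known residual variance.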

We identified a particularly striking association with mean corpuscular volume (MCV). The figure from the paper shows both the Manhattan plot for the HMDP and the linkage plot from a genetic cross examining the same trait on chromosome 7. This example clearly shows the advantage of the HMDP over the cross in terms of the resolution of the association. The peak is less than 1 Mb from Hbb-b1, which has previously been suggested to affect this trait.

Some reviews covering the HMDP and mouse genetics more broadly are available here.

Full Citation:
Davis, Richard C, Atila van Nas, Brian Bennett, Luz Orozco, Calvin Pan, Christoph D Rau, Eleazar Eskin, and Aldons J Lusis. 2013. Genome-wide association mapping of blood cell traits in mice. Mamm Genome. doi:10.1007/s00335-013-9448-0

Genetic variations in blood cell parameters can impact clinical traits. We report here the mapping of blood cell traits in a panel of 100 inbred strains of mice of the Hybrid Mouse Diversity Panel (HMDP) using genome-wide association (GWA). We replicated a locus previously identified using linkage analysis in several genetic crosses for mean corpuscular volume (MCV) and a number of other red blood cell traits on distal chromosome 7. Our peak for SNP association to MCV occurred in a linkage disequilibrium (LD) block spanning from 109.38 to 111.75 Mb that includes Hbb-b1, the likely causal gene. Altogether, we identified five loci controlling red blood cell traits (on chromosomes 1, 7, 11, 12, and 16), and four of these correspond to loci for red blood cell traits reported in a recent human GWA study. For white blood cells, including granulocytes, monocytes, and lymphocytes, a total of six significant loci were identified on chromosomes 1, 6, 8, 11, 12, and 15. An average of ten candidate genes were found at each locus, and these were prioritized by examining functional variants in the HMDP such as missense and expression variants. These results provide intermediate phenotypes and candidate loci for genetic studies of atherosclerosis and cancer as well as inflammatory and immune disorders in mice.