Mixed models can correct for population structure for genomic regions under selection

Genome-wide association studies (GWAS) collect people with a disease (called “cases”) and people without it (called “controls”) and compare allele frequencies between the two groups to identify genomic locations associated with the disease. An underlying assumption of GWAS is that cases and controls are sampled from the same population. If they are not, a phenomenon called “population structure” may cause spurious associations. Correcting for population structure has been an important problem in GWAS, both in model organisms such as mice and in human genetics.
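To make the problem concrete, here is a minimal sketch of a standard allelic chi-square test on a single SNP, applied to a hypothetical sample in which the SNP has no effect on disease. The allele frequencies (0.2 in population A, 0.6 in population B) and the sampling proportions are made-up numbers. Expected rather than randomly sampled allele counts are used so the result is deterministic. This is a textbook case-control test, not EMMAX.

```python
def expected_alt_alleles(n_from_a, n_from_b, freq_a, freq_b):
    """Expected alternate-allele count for diploids drawn from two populations."""
    return round(2 * (n_from_a * freq_a + n_from_b * freq_b))


def allelic_chi_square(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square statistic (1 df) on a 2x2 table of allele counts.

    *_alt are alternate-allele counts; *_total are total allele counts
    (two per diploid individual).
    """
    table = [
        [case_alt, case_total - case_alt],
        [ctrl_alt, ctrl_total - ctrl_alt],
    ]
    n = case_total + ctrl_total
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical SNP with no effect on disease: 100 cases and 100 controls.
# Cases: 20 from population A, 80 from B. Controls: 80 from A, 20 from B.
case_alt = expected_alt_alleles(20, 80, 0.2, 0.6)   # 104 of 200 alleles
ctrl_alt = expected_alt_alleles(80, 20, 0.2, 0.6)   # 56 of 200 alleles
stratified = allelic_chi_square(case_alt, 200, ctrl_alt, 200)

# Same SNP, but both groups sampled 50/50 from A and B.
balanced = allelic_chi_square(
    expected_alt_alleles(50, 50, 0.2, 0.6), 200,
    expected_alt_alleles(50, 50, 0.2, 0.6), 200,
)
print(stratified, balanced)
```

When 80% of cases but only 20% of controls come from population B, the statistic is 24 with 1 degree of freedom (p on the order of 1e-6), even though the SNP is unrelated to disease. With balanced sampling it drops to 0.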

In 2010, our group proposed a method called “EMMAX” (10.1038/ng.548) that uses a linear mixed model to correct for population structure in human GWAS. EMMAX estimates the genetic relationship between every pair of individuals from SNP data, summarized in what is called a “kinship matrix”, and uses this matrix to correct for population structure. Using two human GWAS datasets, we showed that our method removes the effects of population structure better than previous methods. However, Price et al. showed in their paper (10.1038/nrg2813) that EMMAX may be susceptible to spurious associations in genomic regions under selection, that is, regions where allele frequencies differ substantially between two populations.
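The kinship matrix can be estimated in several ways, and the sketch below is not necessarily the estimator EMMAX itself uses. It assumes one common choice: standardize each SNP by its sample allele frequency p (subtract 2p, divide by sqrt(2p(1 - p))) and average the products of standardized genotypes over SNPs, K = Z Z^T / M. In the mixed model, K enters as the covariance of a random genetic effect, y = Xβ + u + e with u ~ N(0, σ_g² K) and e ~ N(0, σ_e² I). Fitting the variance components is omitted here, and the genotypes are a made-up toy example.

```python
import math


def kinship_matrix(genotypes):
    """Realized relationship matrix K = Z Z^T / M from 0/1/2 genotypes.

    genotypes: one list per individual, each holding M alternate-allele counts.
    Each SNP is centered by 2p and scaled by sqrt(2p(1 - p)), where p is its
    sample allele frequency; monomorphic SNPs are skipped.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = []  # one list of n standardized values per usable SNP
    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        p = sum(column) / (2 * n)
        if p == 0 or p == 1:
            continue  # no variation, so the SNP carries no relatedness signal
        sd = math.sqrt(2 * p * (1 - p))
        standardized.append([(g - 2 * p) / sd for g in column])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    m_used = len(standardized)
    return [
        [sum(snp[i] * snp[k] for snp in standardized) / m_used for k in range(n)]
        for i in range(n)
    ]


# Toy data: individuals 0-1 and 2-3 look like two subpopulations.
genotypes = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
K = kinship_matrix(genotypes)
for row in K:
    print([round(x, 2) for x in row])
```

Individuals 0 and 1 share a genotype profile, as do individuals 2 and 3, so within-pair kinship is positive and between-pair kinship is negative. The matrix thus picks up the two-group structure that the mixed model then accounts for.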

We investigated this issue further and found that with an appropriate kinship matrix (or matrices), EMMAX can correct for population structure in genomic regions under selection. In the paper, we showed that computing the kinship matrix only from SNPs whose allele frequencies differ greatly between the two populations successfully removes the effects of population structure. We also proposed using two kinship matrices: one computed from the SNPs under selection and the other from the remaining SNPs. This approach also correctly controls for population structure. Lastly, we examined whether SNPs under selection actually cause this problem in two human GWAS datasets, and found no evidence of it in either dataset.
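The two-kinship idea can be sketched as follows. Here a SNP counts as “under selection” if the absolute allele-frequency difference between the two populations is at least a threshold; both this criterion and the threshold of 0.5 are hypothetical stand-ins, not necessarily the selection rule used in the paper. K_sel on its own corresponds to the single-kinship approach. Together, K_sel and K_rest would enter the model as two independent random effects, y = Xβ + u_1 + u_2 + e with u_1 ~ N(0, σ_1² K_sel) and u_2 ~ N(0, σ_2² K_rest). Estimating σ_1² and σ_2² requires a mixed-model solver and is not shown. Population labels and genotypes are toy data.

```python
import math


def allele_freq(genotypes, individuals, snp):
    """Alternate-allele frequency of one SNP within a subset of individuals."""
    return sum(genotypes[i][snp] for i in individuals) / (2 * len(individuals))


def split_snps(genotypes, pop_a, pop_b, threshold):
    """Split SNP indices by |p_A - p_B| (hypothetical 'under selection' rule)."""
    differentiated, rest = [], []
    for snp in range(len(genotypes[0])):
        delta = abs(allele_freq(genotypes, pop_a, snp) - allele_freq(genotypes, pop_b, snp))
        if delta >= threshold:
            differentiated.append(snp)
        else:
            rest.append(snp)
    return differentiated, rest


def kinship_from_snps(genotypes, snps):
    """K = Z Z^T / M using only the given SNP indices (standardized genotypes)."""
    n = len(genotypes)
    standardized = []
    for j in snps:
        column = [genotypes[i][j] for i in range(n)]
        p = sum(column) / (2 * n)
        if p == 0 or p == 1:
            continue  # skip SNPs that are monomorphic in the full sample
        sd = math.sqrt(2 * p * (1 - p))
        standardized.append([(g - 2 * p) / sd for g in column])
    if not standardized:
        raise ValueError("no polymorphic SNPs in this subset")
    m = len(standardized)
    return [[sum(s[i] * s[k] for s in standardized) / m for k in range(n)] for i in range(n)]


# Toy data: individuals 0-1 from population A, 2-3 from population B.
# SNPs 0-1 are strongly differentiated between populations; SNPs 2-3 are not.
genotypes = [
    [0, 2, 1, 0],
    [0, 2, 0, 1],
    [2, 0, 1, 0],
    [2, 0, 0, 1],
]
pop_a, pop_b = [0, 1], [2, 3]
selected, rest = split_snps(genotypes, pop_a, pop_b, threshold=0.5)
K_sel = kinship_from_snps(genotypes, selected)  # kinship from SNPs "under selection"
K_rest = kinship_from_snps(genotypes, rest)     # kinship from the remaining SNPs
print(selected, rest)  # [0, 1] [2, 3]
```

In this toy case K_sel separates the two populations (positive within, negative between), while K_rest captures a different pattern of sharing that cuts across them. Keeping the two matrices as separate random effects lets each receive its own variance component.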

Full Citation:

Sul, Jae Hoon, and Eleazar Eskin. 2013. Mixed models can correct for population structure for genomic regions under selection. Nature Reviews Genetics 14, no. 4 (February 26): 300. http://dx.doi.org/10.1038/nrg2813-c1.
