Emrah Kostem, who graduated this year and is now at Illumina, gave a talk at our retreat this summer about the research he completed in the lab. The talk is available here; it gives a good overview of the goals of our group and some details of the projects that Emrah completed in the lab.
One of the topics he discusses is his recently published work on estimating heritability, which quantifies how much of the variance of a trait is accounted for by genetics. In particular, he discusses how to partition heritability into the contributions of genomic regions (10.1016/j.ajhg.2013.03.010).
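As a rough illustration of what "partitioning" means here, the standard variance-components definition can be written as below. The symbols are assumptions chosen for this sketch (they are not taken from the paper): \sigma_g^2 is the genetic variance, \sigma_e^2 the environmental variance, and \sigma_r^2 the genetic variance attributed to region r out of R regions, under the simplifying assumption that the regional contributions are additive.

```latex
\[
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2},
\qquad
\sigma_g^2 = \sum_{r=1}^{R} \sigma_r^2,
\qquad
h_r^2 = \frac{\sigma_r^2}{\sigma_g^2 + \sigma_e^2},
\qquad
\sum_{r=1}^{R} h_r^2 = h^2 .
\]
```

Under these assumptions, each h_r^2 is the share of the trait variance attributed to region r, and the regional shares add up to the total heritability.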
He also talks about work that takes advantage of the insight that association statistics follow a multivariate normal distribution and applies it to two problems. The first is selecting follow-up SNPs using the results of an association study (10.1534/genetics.111.128595). The second is speeding up eQTL studies with a two-stage approach in which only a fraction of the association tests are performed but virtually all of the significant associations are still discovered (10.1089/cmb.2013.0087).
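To make the multivariate normal insight concrete, here is a minimal Python sketch, not the method from the papers. It assumes the statistics are zero-mean with unit variance and that their covariance is a known SNP correlation (LD) matrix; the function names `conditional_normal` and `prob_significant` and the example numbers are hypothetical. Given the statistics observed at tested SNPs, it computes the conditional distribution of an untested SNP's statistic and the probability that it would pass a significance threshold.

```python
import math
import numpy as np


def conditional_normal(sigma, observed_idx, observed_values, target_idx):
    """Conditional mean and variance of one statistic given observed ones.

    Assumes the full vector of statistics is zero-mean multivariate normal
    with covariance `sigma` (here, the SNP correlation matrix).
    """
    sigma = np.asarray(sigma, dtype=float)
    obs = np.asarray(observed_idx)
    s_obs = np.asarray(observed_values, dtype=float)
    sigma_oo = sigma[np.ix_(obs, obs)]   # covariance among observed SNPs
    sigma_to = sigma[target_idx, obs]    # covariance of target with observed
    weights = np.linalg.solve(sigma_oo, sigma_to)
    mean = float(weights @ s_obs)
    var = float(sigma[target_idx, target_idx] - weights @ sigma_to)
    return mean, var


def prob_significant(mean, var, threshold):
    """Two-sided probability P(|S| > threshold) for S ~ N(mean, var)."""
    sd = math.sqrt(var)
    upper = 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2)))
    lower = 0.5 * math.erfc((threshold + mean) / (sd * math.sqrt(2)))
    return upper + lower


# Hypothetical example: two SNPs with correlation 0.8; SNP 0 was tested
# and had statistic 5.0, SNP 1 has not been tested yet.
ld = [[1.0, 0.8], [0.8, 1.0]]
mean, var = conditional_normal(ld, [0], [5.0], 1)
p = prob_significant(mean, var, threshold=5.0)
print(mean, var, p)
```

In a two-stage setting, a probability like `p` could be used to decide which untested SNPs are worth testing in the second stage, or which follow-up SNPs are worth genotyping; the actual selection rules are described in the papers.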
Details of what he talked about are in his papers:
Kostem, Emrah; Eskin, Eleazar. Improving the accuracy and efficiency of partitioning heritability into the contributions of genomic regions. Journal Article. In: Am J Hum Genet, 92 (4), pp. 558-64, 2013, ISSN: 1537-6605.
@article{Kostem:AmJHumGenet:2013,
  title = {Improving the accuracy and efficiency of partitioning heritability into the contributions of genomic regions.},
  author = {Emrah Kostem and Eleazar Eskin},
  url = {http://dx.doi.org/10.1016/j.ajhg.2013.03.010},
  issn = {1537-6605},
  year = {2013},
  date = {2013-01-01},
  journal = {Am J Hum Genet},
  volume = {92},
  number = {4},
  pages = {558-64},
  address = {United States},
  organization = {Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA. Electronic address: ekostem@cs.ucla.edu.},
  abstract = {Quantifying heritability, the amount of genetic contribution in a complex trait, has been of fundamental interest to geneticists for decades. Recently, partitioning the heritability accounted for by common variants into the contributions of genomic regions has received a lot of attention given its important applications for understanding the genetic architecture of complex traits. Current methods partition the total heritability by jointly estimating the contributions of all regions. However, these methods are computationally intractable and can be inaccurate when the number of regions is large. In this paper, we present an alternative approach that partitions the total heritability into the contributions of an arbitrary number of regions. We demonstrate by using simulations that our approach is more accurate and computationally efficient than current approaches. Using a data set from a genome-wide association study on human height, we demonstrate the utility of our method by estimating the heritability contributions of chromosomes and subchromosomal regions.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Kostem, Emrah; Eskin, Eleazar. Efficiently Identifying Significant Associations in Genome-wide Association Studies. Journal Article. In: J Comput Biol, 20 (10), pp. 817-30, 2013, ISSN: 1557-8666.
@article{Kostem:JComputBiol:2013,
  title = {Efficiently Identifying Significant Associations in Genome-wide Association Studies.},
  author = {Emrah Kostem and Eleazar Eskin},
  url = {http://dx.doi.org/10.1089/cmb.2013.0087},
  issn = {1557-8666},
  year = {2013},
  date = {2013-01-01},
  journal = {J Comput Biol},
  volume = {20},
  number = {10},
  pages = {817-30},
  address = {United States},
  organization = {Computer Science Department, University of California, Los Angeles, California.},
  abstract = {Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Kostem, Emrah; Lozano, Jose A; Eskin, Eleazar. Increasing Power of Genome-wide Association Studies by Collecting Additional SNPs. Journal Article. In: Genetics, 2011, ISSN: 1943-2631.
@article{Kostem:Genetics:2011,
  title = {Increasing Power of Genome-wide Association Studies by Collecting Additional SNPs.},
  author = {Emrah Kostem and Jose A. Lozano and Eleazar Eskin},
  url = {http://dx.doi.org/10.1534/genetics.111.128595},
  issn = {1943-2631},
  year = {2011},
  date = {2011-01-01},
  journal = {Genetics},
  organization = {University of California, Los Angeles},
  abstract = {Genome-wide association studies (GWAS) have been effectively identifying the genomic regions associated with a disease trait. In a typical GWAS, an informative subset of the single nucleotide polymorphisms (SNPs), called tag SNPs, are genotyped in case-control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs included in these regions is costly and infeasible for biological validation. In this paper we address how to characterize these regions cost-effectively with the goal of providing investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case-control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case-Control Consortium to demonstrate that our method shows superior performance than the correlation and distance based traditional follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}