Thesis Defense: Dr. Jae Hoon Sul

Dr. Jae Hoon Sul with his committee.


Jae Hoon Sul successfully defended his thesis on Wednesday, September 19th. His talk is posted on our YouTube channel, ZarlabUCLA. The talk covers several projects, including using mixed models to correct for population structure, rare variant association studies, and a meta-analysis approach for detecting multi-tissue eQTLs. Fortunately for the lab, Jae Hoon is staying at UCLA for another year as a post-doc.

More details on the work covered in the talk are available in the papers he discusses:

Sul, Jae Hoon; Han, Buhm; Ye, Chun; Choi, Ted; Eskin, Eleazar

Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches Journal Article

In: PLoS Genet, 9 (6), pp. e1003491, 2013, ISSN: 1553-7404.


Sul, Jae Hoon; Han, Buhm; He, Dan; Eskin, Eleazar

An Optimal Weighted Aggregated Association Test for Identification of Rare Variants Involved in Common Diseases. Journal Article

In: Genetics, 188 (1), pp. 181-188, 2011, ISSN: 1943-2631.


Kang, Hyun Min; Sul, Jae Hoon; Service, Susan K; Zaitlen, Noah A; Kong, Sit-Yee Y; Freimer, Nelson B; Sabatti, Chiara; Eskin, Eleazar

Variance component model to account for sample structure in genome-wide association studies. Journal Article

In: Nat Genet, 42 (4), pp. 348-54, 2010, ISSN: 1546-1718.


Discovering Genetic Variation that Affects Expression in Multiple Tissues

Over the past several years, genome-wide association studies (GWAS) have discovered hundreds of genetic variants involved in complex diseases (10.1056/NEJMra0905980). The vast majority of these variants do not lie in the protein-coding regions of genes and thus do not affect what the gene produces, but instead likely affect how genes are regulated. For this reason, the study of how genetic variation affects gene activity levels (referred to as expression levels) has been a major focus of research for many years. Genetic variants that affect gene expression are referred to as expression quantitative trait loci (eQTLs) (10.1038/nrg2969).

Several studies collect expression data from multiple tissues, which raises the question of whether the same genetic variants affect expression in multiple tissues (10.1038/ng.2653). Another way to ask this question is: are eQTLs tissue-specific or shared across tissues?

A challenge in this type of analysis is that an eQTL may affect expression in multiple tissues, but because of small sample sizes, it may be detected in only one of them. Thus, traditional eQTL detection techniques are systematically biased against detecting eQTLs that are shared across tissues.

Jae Hoon Sul and Buhm Han in our group developed a method to address this issue, which builds upon recent methods in random effects meta-analysis (10.1016/j.ajhg.2011.04.014), (10.1371/journal.pgen.1002555). To apply these methods, we first analyze each tissue separately and then use the meta-analysis method to combine the per-tissue results. Because our methods are specifically designed to handle “heterogeneity”, meaning that the effect size can differ from study to study, they perform well whether the effect is present in all of the tissues or in just some of them. More information about our meta-analysis research is here.

The full citation of our paper is here:

Sul, Jae Hoon; Han, Buhm; Ye, Chun; Choi, Ted; Eskin, Eleazar

Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches Journal Article

In: PLoS Genet, 9 (6), pp. e1003491, 2013, ISSN: 1553-7404.


Over the past few years, our group has published several papers on methods for eQTL analysis. Our other papers on eQTL analysis include:

2018

Gamazon, Eric R; Segrè, Ayellet V; van de Bunt, Martijn; Wen, Xiaoquan; Xi, Hualin S; Hormozdiari, Farhad; Ongen, Halit; Konkashbaev, Anuar; Derks, Eske M; Aguet, François; Quan, Jie; Nicolae, Dan L; Eskin, Eleazar; Kellis, Manolis; Getz, Gad; McCarthy, Mark I; Dermitzakis, Emmanouil T; Cox, Nancy J; Ardlie, Kristin G

Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Journal Article

In: Nat Genet, 50 (7), pp. 956-967, 2018, ISSN: 1546-1718.


2017

Duong, Dat; Gai, Lisa; Snir, Sagi; Kang, Eun Yong; Han, Buhm; Sul, Jae Hoon; Eskin, Eleazar

Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes. Journal Article

In: Bioinformatics, 33 (14), pp. i67-i74, 2017, ISSN: 1367-4811.


2016

Duong, Dat; Zou, Jennifer; Hormozdiari, Farhad; Sul, Jae Hoon; Ernst, Jason; Han, Buhm; Eskin, Eleazar

Using genomic annotations increases statistical power to detect eGenes. Journal Article

In: Bioinformatics, 32 (12), pp. i156-i163, 2016, ISSN: 1367-4811.


Kang, Eun Yong; Martin, Lisa; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J; Shifman, Sagiv; Eskin, Eleazar

Discovering SNPs Regulating Human Gene Expression Using Allele Specific Expression from RNA-Seq Data. Journal Article

In: Genetics, 2016, ISSN: 1943-2631.


Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar

Colocalization of GWAS and eQTL Signals Detects Target Genes. Journal Article

In: Am J Hum Genet, 2016, ISSN: 1537-6605.


Peterson, Christine B; Service, Susan K; Jasinska, Anna J; Gao, Fuying; Zelaya, Ivette; Teshiba, Terri M; Bearden, Carrie E; Cantor, Rita M; Reus, Victor I; Macaya, Gabriel; López-Jaramillo, Carlos; Bogomolov, Marina; Benjamini, Yoav; Eskin, Eleazar; Coppola, Giovanni; Freimer, Nelson B; Sabatti, Chiara

Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder. Journal Article

In: PLoS Genet, 12 (5), pp. e1006046, 2016, ISSN: 1553-7404.


Hasin-Brumshtein, Yehudit; Khan, Arshad H; Hormozdiari, Farhad; Pan, Calvin; Parks, Brian W; Petyuk, Vladislav A; Piehowski, Paul D; Brümmer, Anneke; Pellegrini, Matteo; Xiao, Xinshu; Eskin, Eleazar; Smith, Richard D; Lusis, Aldons J; Smith, Desmond J

Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. Journal Article

In: Elife, 5, 2016, ISSN: 2050-084X.


2015

Sul, Jae Hoon; Raj, Towfique; de Jong, Simone; de Bakker, Paul I W; Raychaudhuri, Soumya; Ophoff, Roel A; Stranger, Barbara E; Eskin, Eleazar; Han, Buhm

Accurate and Fast Multiple-Testing Correction in eQTL Studies. Journal Article

In: Am J Hum Genet, 96 (6), pp. 857-68, 2015, ISSN: 1537-6605.


2014

Joo, Jong Wha J; Sul, Jae Hoon; Han, Buhm; Ye, Chun; Eskin, Eleazar

Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Journal Article

In: Genome Biol, 15 (4), pp. R61, 2014, ISSN: 1465-6914.


2013

Kostem, Emrah; Eskin, Eleazar

Efficiently Identifying Significant Associations in Genome-Wide Association Studies Conference

Research in Computational Molecular Biology, University of California Springer Berlin Heidelberg, 2013.


Kostem, Emrah; Eskin, Eleazar

Efficiently Identifying Significant Associations in Genome-wide Association Studies. Journal Article

In: J Comput Biol, 20 (10), pp. 817-30, 2013, ISSN: 1557-8666.


2010

Kang, Eun Yong; Ye, Chun; Shpitser, Ilya; Eskin, Eleazar

Detecting the presence and absence of causal relationships between expression of yeast genes with very few samples. Journal Article

In: J Comput Biol, 17 (3), pp. 533-46, 2010, ISSN: 1557-8666.


2009

Ye, Chun; Galbraith, Simon J; Liao, James C; Eskin, Eleazar

Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. Journal Article

In: PLoS Comput Biol, 5 (3), pp. e1000311, 2009, ISSN: 1553-7358.


2008

Kang, Hyun Min; Ye, Chun; Eskin, Eleazar

Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Journal Article

In: Genetics, 180 (4), pp. 1909-25, 2008, ISSN: 0016-6731.



Sequencing with DNA Pools

Our group has recently published several papers on sequencing using DNA pools. These include two methods for obtaining genotypes from pools (10.1186/1471-2105-12-S6-S2), (10.1109/ACSSC.2012.6489173), a method for correcting for errors when mixing the DNA into pools (10.1007/978-3-642-37195-0_4), and a method for performing association for rare variants when the sequence data is collected using pools (10.1534/genetics.113.150169).

High-throughput sequencing (HTS) technology has decreased the cost of sequencing one individual tremendously in the past few years. However, to perform genome-wide association studies (GWAS), we need to collect large cohorts of individuals who have the disease (called cases) and who do not have the disease (called controls). Unfortunately, performing whole genome sequencing for large cohorts is still very expensive.

The actual cost of sequencing a sample consists of two parts. The first part is the cost of preparing a DNA sample for sequencing, which is referred to as the library preparation cost. Library preparation is also the most labor-intensive part of a sequencing study. The second part is the cost of the actual sequencing, which is proportional to the amount of sequence collected and which we refer to as the per-base sequencing cost. Technological advances are rapidly reducing the per-base cost of sequencing, while library preparation costs are more stable (Figure 1).

Figure 1: The first step, extracting the DNA and making it ready for sequencing, is referred to as library preparation. The second step is to generate the DNA sequence from the pool of individuals. Library preparation is costly and labor-intensive compared to the second step.

 

Erlich et al. (10.1101/gr.092957.109) introduced the concept of DNA pooling. The basic idea behind this approach is that DNA from multiple individuals is pooled together into a single DNA mixture, which is then prepared as a single library and sequenced. In this approach, the library preparation cost is reduced because one library is prepared per pool instead of one library per sample.
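The savings can be sketched with simple arithmetic. The snippet below is a rough illustration, assuming the total amount of sequence collected is the same with and without pooling, so that only the number of libraries changes. The function names and all prices are hypothetical and chosen only to make the numbers easy to follow.

```python
def sequencing_cost(n_libraries, total_bases, library_prep_cost, per_base_cost):
    """Total cost = library preparation part + per-base sequencing part."""
    return n_libraries * library_prep_cost + total_bases * per_base_cost

def individual_vs_pooled(n_individuals, pool_size, bases_per_individual,
                         library_prep_cost, per_base_cost):
    """Compare one library per individual against one library per pool.

    Assumption (for illustration): the same total amount of sequence is
    collected in both designs.
    """
    n_pools = -(-n_individuals // pool_size)  # ceiling division
    total_bases = n_individuals * bases_per_individual
    individual = sequencing_cost(n_individuals, total_bases,
                                 library_prep_cost, per_base_cost)
    pooled = sequencing_cost(n_pools, total_bases,
                             library_prep_cost, per_base_cost)
    return individual, pooled

# Hypothetical prices: 100 per library, 0.00001 per base.
individual, pooled = individual_vs_pooled(
    n_individuals=1000, pool_size=10, bases_per_individual=1_000_000,
    library_prep_cost=100.0, per_base_cost=0.00001)
print(individual, pooled)
```

With these made-up prices, the per-base part of the bill is identical in both designs, and the entire difference comes from preparing 100 libraries instead of 1000.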

Pooling methods can be split into two categories. The first category puts each individual in only one pool, and each pool consists of a fixed number of individuals. These methods are referred to as non-overlapping pool methods. The second category puts each individual in multiple pools and uses this information to recover each individual’s genotype. These methods are referred to as overlapping pool methods.
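The difference between the two categories can be sketched in a few lines. The example below is a toy design, not the design used in any of the cited papers: it gives every individual a unique combination of pools, so a variant carried by exactly one individual can be traced back from the set of pools in which it is observed. All function names are hypothetical.

```python
from itertools import combinations

def non_overlapping_pools(individuals, pool_size):
    """Each individual appears in exactly one pool of (at most) pool_size."""
    return [individuals[i:i + pool_size]
            for i in range(0, len(individuals), pool_size)]

def overlapping_pools(individuals, n_pools, pools_per_individual):
    """Toy overlapping design: each individual gets a unique set of pools."""
    signatures = list(combinations(range(n_pools), pools_per_individual))
    if len(individuals) > len(signatures):
        raise ValueError("not enough distinct pool combinations")
    assignment = dict(zip(individuals, signatures))
    pools = [[] for _ in range(n_pools)]
    for individual, signature in assignment.items():
        for p in signature:
            pools[p].append(individual)
    return pools, assignment

def decode_single_carrier(positive_pools, assignment):
    """Return the individual whose pool set matches the pools showing the
    variant, assuming exactly one carrier and error-free observations."""
    target = tuple(sorted(positive_pools))
    for individual, signature in assignment.items():
        if signature == target:
            return individual
    return None

samples = ["s%d" % i for i in range(10)]
print(non_overlapping_pools(samples, 5))
pools, assignment = overlapping_pools(samples, n_pools=5, pools_per_individual=2)
print(pools)
print(decode_single_carrier({0, 1}, assignment))
```

In this toy setting, 10 individuals are covered by 5 pools and a singleton variant can still be assigned to its carrier, whereas the non-overlapping design only reveals which pool contains a carrier. Real data involve sequencing errors and variants with multiple carriers, which is why the methods discussed here are more involved.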

Many studies (10.1101/gr.088559.108), (10.1093/nar/gkq675), (10.1186/1471-2105-12-S6-S2) have shown that, using overlapping pools, we can recover rare SNPs with high accuracy. In our work, we develop two methods to detect the genotypes of both rare and common variants from pool sequencing (10.1109/ACSSC.2012.6489173). The idea is to take advantage of genotypes on a subset of the variants, which are often available for these cohorts. Both methods tend to have better accuracy than imputation, which is the standard approach for predicting the genotypes of variants that were not collected.

Pooling has been successful at detecting rare variants, which is the main reason many GWAS have used pooling to detect rare causal SNPs (10.1101/gr.094680.109), (10.1038/ng.952). However, all of these methods assume that every individual has the same abundance level in the pool. The abundance level of an individual is the fraction of the reads in a pool that originate from that individual. We show in our paper (10.1007/978-3-642-37195-0_4) that this simple assumption does not hold, and that ignoring differences in abundance levels can lead to spurious associations. In our paper, we describe a probabilistic model that can estimate the abundance levels of individuals when genotype data on a subset of the variants is available. Furthermore, we extend the model to the case where the genotype of one of the individuals is missing. We also show that leveraging the linkage disequilibrium (LD) pattern decreases the error rate.
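A small numeric sketch shows why the equal-abundance assumption matters. It computes the expected fraction of reads carrying the alternate allele in a pool as an abundance-weighted average of genotypes. This is only the simple forward calculation, not the probabilistic model from the paper, and the function name, genotypes, and abundance values are invented for the example.

```python
def pool_alt_allele_fraction(genotypes, abundances):
    """Expected fraction of reads in a pool that carry the alternate allele.

    genotypes:  alternate allele count (0, 1, or 2) for each individual
    abundances: fraction of the pool's reads from each individual (sums to 1)
    """
    if len(genotypes) != len(abundances):
        raise ValueError("one abundance is needed per individual")
    if abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must sum to 1")
    return sum(a * g / 2.0 for a, g in zip(abundances, genotypes))

# Hypothetical pool of four individuals; only the first carries the variant.
genotypes = [2, 0, 0, 0]
equal = pool_alt_allele_fraction(genotypes, [0.25, 0.25, 0.25, 0.25])
skewed = pool_alt_allele_fraction(genotypes, [0.55, 0.15, 0.15, 0.15])

# Interpreting the skewed pool as if abundances were equal overestimates
# the number of alternate alleles among the 8 chromosomes in the pool.
naive_allele_count = skewed * 2 * len(genotypes)
print(equal, skewed, naive_allele_count)
```

If the carrier happens to be over-represented in a case pool, the naive count suggests more than twice as many alternate alleles as are actually present, which is the kind of distortion that can produce a spurious association.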

Finally, in another recent paper (10.1534/genetics.113.150169), we extend methods for implicating rare variants in disease to data collected using DNA sequencing pools.

The full citations of our four papers are below.

1.

Navon, Oron; Sul, Jae Hoon; Han, Buhm; Conde, Lucia; Bracci, Paige; Riby, Jacques; Skibola, Christine F; Eskin, Eleazar; Halperin, Eran

Rare Variant Association Testing Under Low-Coverage Sequencing. Journal Article

In: Genetics, 2013, ISSN: 1943-2631.


2.

Eskin, Itamar; Hormozdiari, Farhad; Conde, Lucia; Riby, Jacques; Skibola, Chris; Eskin, Eleazar; Halperin, Eran

eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data Conference

Research in Computational Molecular Biology, Tel-Aviv University Springer Berlin Heidelberg, 2013.


3.

Hormozdiari, Farhad; Wang, Zhanyong; Yang, Wen-Yun; Eskin, Eleazar

Efficient genotyping of individuals using overlapping pool sequencing and imputation Conference

2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2012, ISBN: 978-1-4673-5051-8.


4.

He, Dan; Zaitlen, Noah; Pasaniuc, Bogdan; Eskin, Eleazar; Halperin, Eran

Genotyping common and rare variation using overlapping pool sequencing. Journal Article

In: BMC Bioinformatics, 12 Suppl 6, pp. S2, 2011, ISSN: 1471-2105.


 

 
