Chromosome conformation elucidates regulatory relationships in developing human brain

Farhad Hormozdiari, a recent ZarLab alumnus, contributed to a paper published this week in Nature. Our paper reports new findings on genetic factors related to human cognition and neurodevelopmental disorders, the result of a collaboration with UCLA’s David Geffen School of Medicine and the School of Biotechnology and Biomolecular Sciences at the University of New South Wales. Farhad implemented the software package CAVIAR, which was used to identify causal variants and to help interpret the data.

Neurodevelopmental disorders such as autism and schizophrenia are thought to originate during embryonic development of the cerebral cortex. The project focused on genome-wide 3D chromatin contacts, the physical interactions between regions of chromatin, the material in a cell’s nucleus that packages DNA into chromosomes and influences cell replication. Chromatin contacts regulate gene expression in specific tissues, and mapping them within chromosomes provides important biological insights into the malfunctioning gene regulatory mechanisms that drive these disorders.

The project generated high-resolution 3D maps of chromatin contacts active during development of the cortex region of the human brain. These maps enabled a large-scale annotation of previously uncharacterized regulatory mechanisms tied to the evolution of human cognition and disease. Using this data, the paper identified hundreds of genes involved in human cognitive function. Next, the paper integrated chromatin contacts with noncoding variants previously identified in schizophrenia genome-wide association studies (GWAS) and performed several analyses to explore the relationship between chromatin interactions and biological function. One of the uses of CAVIAR in the paper was to verify that the causal variants implicated in schizophrenia GWAS are in fact compatible with the 3D maps of chromatin contacts.

The paper also found several highly interacting chromatin regions that correlate with levels of gene expression and are associated with promoters, positive transcriptional regulators, and enhancers, areas of the genome that shape cell replication and neurological development. The paper identified specific sets of genes enriched in known intellectual disability risk genes, including mutations known to cause autosomal recessive primary microcephaly. The GWAS results identified approximately 500 genome-wide significant schizophrenia-associated loci, about 30% of which interact with schizophrenia SNPs exclusively in developing brain tissue. Genome editing in human neural progenitors suggests that one of these distal schizophrenia GWAS loci regulates FOXG1 expression, supporting its potential role as a schizophrenia risk gene.

This work provides a framework for understanding the effect of non-coding regulatory elements on human brain development and the evolution of cognition, and highlights novel mechanisms underlying neuropsychiatric disorders. Read the paper for a detailed account of our data, methods, and results: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature19847.html

The CAVIAR program was developed by Farhad Hormozdiari and is freely available for download on the following webpage: http://genetics.cs.ucla.edu/caviar/

The full citation to our paper is:

Won, Hyejung; de la Torre-Ubieta, Luis; Stein, Jason L; Parikshak, Neelroop N; Huang, Jerry; Opland, Carli K; Gandal, Michael J; Sutton, Gavin J; Hormozdiari, Farhad; Lu, Daning; Lee, Changhoon; Eskin, Eleazar; Voineagu, Irina; Ernst, Jason; Geschwind, Daniel H

Chromosome conformation elucidates regulatory relationships in developing human brain.

In: Nature, 538 (7626), pp. 523-527, 2016, ISSN: 1476-4687.

Figure: Annotation of schizophrenia-associated loci identified by a GWAS of chromatin contact data.

A general framework for meta-analyzing dependent studies with overlapping subjects in association mapping

Meta-analyses of genome-wide association studies (GWASs) have become essential to identifying new loci associated with human diseases. We recently developed a novel framework that improves the accuracy and power of meta-analyses, which we describe in our recent Human Molecular Genetics paper. This framework can be applied to the fixed effects (FE) model, which assumes that effect sizes of genetic variants are constant across studies, and the random effects (RE) model, which assumes that effect sizes can be different among studies.
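To make the distinction between the two models concrete, here is a minimal sketch in Python. It uses conventional textbook estimators, which are an assumption on our part and not necessarily the exact variants used in the paper: inverse-variance weighting for the FE model and the DerSimonian-Laird estimator for the RE model. The function names and the example numbers are illustrative only.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance weighted FE estimate: one shared effect size."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def random_effects(betas, ses):
    """DerSimonian-Laird RE estimate: effect sizes may vary across studies."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_fe, _ = fixed_effects(betas, ses)
    # Cochran's Q measures how much the study effects disagree.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    k = len(betas)
    c = total - sum(w ** 2 for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    beta = sum(w * b for w, b in zip(re_weights, betas)) / re_total
    return beta, math.sqrt(1.0 / re_total), tau2

# Hypothetical per-study effect estimates and standard errors.
betas = [0.0, 0.2, 0.4]
ses = [0.05, 0.05, 0.05]
print("FE:", fixed_effects(betas, ses))
print("RE:", random_effects(betas, ses))
```

When the study effects disagree, the RE standard error is larger than the FE standard error because the between-study variance is added to each study's own variance.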

Almost all GWAS publications today employ meta-analysis methodologies, the majority of which assume that component studies are independent and that individuals among studies are unrelated. Yet many studies today use shared controls to reduce genotyping or sequencing cost. These “shared control” individuals can inadvertently overlap between multiple studies and, if not accounted for in the methodology, induce false associations in GWAS results. Most meta-analysis tools, including the RE model, cannot account for these overlapping subjects.

In our paper, we propose a general framework for adjusting association statistics to account for overlapping subjects within a meta-analysis. The key idea of our method is to transform the covariance structure of the data so it can be used in methods that strictly assume independence between studies. Specifically, our method decouples dependent studies into independent studies and adjusts association statistics to account for uncertainties in dependent studies. As a result, our approach enables general meta-analysis methods, including the FE and RE models, to account for overlapping subjects. Existing pipelines implementing these models can be reused for dependent studies if our framework is applied at the front end of the analysis procedure.

Figure: A simple example of our decoupling approach. Ω and ΩDecoupled are the covariance matrices of the statistics of three studies A, B and C before and after decoupling, respectively. The thickness of the edges denotes the amount of correlation between the studies. After decoupling, the size of the nodes reflects the information that the studies contain in terms of the inverse variance.
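The decoupling idea can be sketched numerically. The construction below is our reading of the approach and should be treated as an assumption, not as the reference implementation: each study's decoupled inverse variance is taken to be the corresponding column sum of Ω⁻¹, so that a standard inverse-variance FE analysis on the decoupled studies reproduces the optimal combined estimate for the correlated studies. The covariance values and function names are hypothetical.

```python
import numpy as np

def correlated_estimate(x, omega):
    """Best linear combined estimate for correlated study statistics."""
    e = np.ones(len(x))
    omega_inv = np.linalg.inv(omega)
    var = 1.0 / (e @ omega_inv @ e)
    return var * (e @ omega_inv @ x), var

def decouple(omega):
    """Diagonal covariance whose inverse variances are column sums of omega^-1.

    Assumed construction; it requires every column sum to be positive.
    """
    inv_var = np.linalg.inv(omega).sum(axis=0)
    return np.diag(1.0 / inv_var)

def fixed_effects(x, variances):
    """Standard inverse-variance FE analysis assuming independent studies."""
    w = 1.0 / variances
    return (w @ x) / w.sum(), 1.0 / w.sum()

# Hypothetical effect estimates for studies A, B and C.
x = np.array([0.12, 0.08, 0.10])
sd = np.array([0.05, 0.04, 0.06])
# A and B share many controls; B and C share a few; A and C share none.
corr = np.array([[1.0, 0.3, 0.0],
                 [0.3, 1.0, 0.1],
                 [0.0, 0.1, 1.0]])
omega = corr * np.outer(sd, sd)

omega_decoupled = decouple(omega)
print("original variances: ", np.diag(omega))
print("decoupled variances:", np.diag(omega_decoupled))
print("correlated estimate:", correlated_estimate(x, omega))
print("FE on decoupled:    ", fixed_effects(x, np.diag(omega_decoupled)))
```

In this example the decoupled variances are larger than the original ones, which reflects the information lost to overlap, and an unmodified FE routine can then be run on the decoupled studies.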

We tested our framework for accuracy and power with five simulated datasets, each containing 1000 to 5000 individuals and 10,000 shared controls. A standard approach produced an inflated number of false positives. Our decoupling method, which systematically accounts for overlapping individuals in meta-analysis, and a standard splitting method, which splits the shared controls among the individual studies, both correctly controlled type 1 error. The advantage of our framework is apparent when assessing power; in one scenario, accounting for overlapping subjects with the decoupling method gained 25% power compared to the splitting method.

Next, we assessed the potential of our framework in identifying causal loci shared by multiple diseases and leveraging information from multiple tissues to increase power for eQTL identification. The decoupling and splitting methods controlled false-positive rates and produced significant p-values at several previously identified candidate shared loci among the three autoimmune conditions present in the Wellcome Trust Case Control Consortium (WTCCC) data. In comparison to the splitting method, our decoupling framework increased the significance of p-values in the shared loci test and increased the number of discovered eQTLs by 19%.

Our approach is flexible and allows many meta-analysis methods, such as the RE model, to account for dependency between studies and overlapping subjects. We developed this approach to complement standard software packages in the meta-analysis of GWAS. This project was led by Buhm Han and involved Dat Duong and Jae Hoon Sul. The article is available at:
https://www.ncbi.nlm.nih.gov/pubmed/26908615

The full citation to our paper is:

Han, Buhm; Duong, Dat; Sul, Jae Hoon; de Bakker, Paul I W; Eskin, Eleazar; Raychaudhuri, Soumya

A general framework for meta-analyzing dependent studies with overlapping subjects in association mapping.

In: Hum Mol Genet, 2016, ISSN: 1460-2083.

Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models

This year, our group published a paper in PLOS Genetics that describes our efforts to better understand and correct for population structure when computing gene-by-environment interaction (GEI) statistics in genome-wide association studies (GWASs). We use simulated and actual GWAS datasets to demonstrate that population structure, the relatedness of individuals within a cohort, inflates test statistics for both GEIs and genetic variants. We present a novel mixed model method capable of improving accuracy when computing GEI statistics in GWAS. This method can be efficiently applied to GWAS datasets containing thousands of individuals and hundreds of thousands of SNPs.

GWASs have discovered many genetic variants associated with complex traits and diseases, yet these genetic variants explain only a small fraction of phenotypic variance. Other sources of phenotypic variance include discrete environmental factors and GEIs, complex interactions between an individual’s genetic material and environmental factors. Recent GEI association analyses have demonstrated the importance of GEIs in complex traits and disease development. Identification of these causal GEIs would provide insight into disease pathways, particularly the effects of environmental factors on disease risk, and guide development of novel diagnostic tools and personalized therapies.

Several methodological challenges have limited successful identification of causal GEIs. As with standard GWAS approaches, GxE GWASs are prone to produce an inflated number of associations due to population structure. Unlike with standard GWASs, however, no existing method is designed to avoid detecting these spurious associations when computing GEI statistics. Accounting for genetic similarity with a standard GWAS approach does control inflation of test statistics for causal SNPs, but does not control inflation of the associated GEI statistics. Simultaneously accounting for both genetic and GxE similarity would control both types of population structure known to confound GWASs: false associations caused by SNPs under selection and those caused by the remaining SNPs.

Our linear mixed model approach introduces two random effects and takes into account two types of similarities between individuals: overlap in the genome itself and overlap in genetic expression caused by complex interactions between genes and environment. We use a pair of kinship matrices corresponding to the two types of similarity to include these two random effects in the model and correct for population structure.
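The structure of such a two-random-effect model can be sketched with NumPy. This is an illustration under stated assumptions, not the PyLMM implementation: the GxE kinship is assumed to be the genetic kinship multiplied element-wise by the outer product of the environment vector, the variance components are taken as given (in practice they would be estimated, for example by REML), and all function and variable names are hypothetical.

```python
import numpy as np

def kinship(genotypes):
    """Genetic kinship from an individuals-by-SNPs genotype matrix."""
    mean = genotypes.mean(axis=0)
    std = genotypes.std(axis=0)
    keep = std > 0  # drop monomorphic SNPs, which cannot be standardized
    z = (genotypes[:, keep] - mean[keep]) / std[keep]
    return z @ z.T / keep.sum()

def gxe_kinship(k, env):
    """Assumed GxE kinship: genetic kinship scaled by environment values."""
    return k * np.outer(env, env)

def gei_test(y, snp, env, k, k_gxe, var_g, var_gxe, var_e):
    """Generalized least squares estimate and z-score for the SNP-by-env term."""
    n = len(y)
    # Phenotype covariance with two random effects plus independent noise.
    v = var_g * k + var_gxe * k_gxe + var_e * np.eye(n)
    design = np.column_stack([np.ones(n), env, snp, snp * env])
    v_inv = np.linalg.inv(v)
    cov = np.linalg.inv(design.T @ v_inv @ design)
    beta = cov @ design.T @ v_inv @ y
    return beta[3], beta[3] / np.sqrt(cov[3, 3])

# Simulated toy data: 60 individuals, 300 SNPs, a binary environment.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(60, 300)).astype(float)
env = rng.integers(0, 2, size=60).astype(float)
y = rng.standard_normal(60)

k = kinship(genotypes)
k_gxe = gxe_kinship(k, env)
# Hypothetical variance components chosen for illustration.
effect, z = gei_test(y, genotypes[:, 0], env, k, k_gxe, 0.3, 0.2, 0.5)
print("GEI effect:", effect, "z-score:", z)
```

Setting both random-effect variances to zero reduces this to ordinary least squares, and setting only the GxE variance to zero corresponds to correcting for genetic similarity alone.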

To better understand false associations in GxE GWASs, we compare our approach to two standard approaches. We apply the three methods to two large genomic datasets, one human and one mouse, that are known to contain population structure and have many quantitative phenotypes with which to test the effect of GEIs. We use a standard GWAS method that does not correct for population structure (defined as “OLS” in our paper) and an approach that performs population structure correction for only SNP statistics (“One RE”). The last approach is our proposed mixed model approach that uses both genetic and GxE kinship to correct for population structure on both SNP and GEI statistics (“Two RE”).

Figure: Distribution of inflation factors of GEI statistics on HMDP GxE GWAS data. (A) Inflation factor for each phenotype with no population structure correction (OLS), population structure correction for SNP statistics (One RE), and population structure correction for both SNP and GEI statistics (Two RE). (B) QQ plot of one of the phenotypes (free fatty acids, ffa), showing the distributions of p-values of GEI statistics for the three methods.

In both datasets, even a moderate amount of population structure causes spurious GEIs when using standard approaches for identifying GEIs in GWAS. While the One RE approach reduces inflation of test statistics on SNPs (see Supplement S1 Figure), it has almost the same or slightly higher inflation factors on GxE statistics when compared to OLS. Results from both datasets suggest that our approach effectively controls population structure when computing statistics for GEIs and genetic variants. We hope our method is useful in advancing our understanding of how life history influences an individual’s disease risk.
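For readers unfamiliar with inflation factors, the sketch below computes one under the usual genomic-control definition, which we assume here: the median observed 1-degree-of-freedom chi-square statistic divided by the median expected under the null (about 0.4549). Values near 1 indicate well-calibrated statistics and larger values indicate inflation. The simulated statistics and the helper name are illustrative and are not taken from the paper.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Genomic-control inflation factor for 1-df chi-square statistics."""
    return float(np.median(chi2_stats) / CHI2_1DF_MEDIAN)

rng = np.random.default_rng(1)
# Well-calibrated statistics: squared standard normal z-scores.
null_stats = rng.standard_normal(100_000) ** 2
# Crude stand-in for confounding: every statistic scaled up by 30%.
inflated_stats = 1.3 * null_stats

print("calibrated:", inflation_factor(null_stats))
print("inflated:  ", inflation_factor(inflated_stats))
```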

This project was led by Jae Hoon Sul and involved Michael Bilow. The article is available at: http://dx.doi.org/10.1371/journal.pgen.1005849

The full citation to our paper is: 

Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun Y; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

In: PLoS Genet, 12 (3), pp. e1005849, 2016, ISSN: 1553-7404.

This approach uses our PyLMM software package available for download at: http://genetics.cs.ucla.edu/pylmm/.