UCLA Undergraduate Bioinformatics: The program and philosophy

Bioinformatics is an important interdisciplinary research area with tremendous opportunities in graduate training and industry employment.  Yet, few academic institutions offer undergraduate programs designed to prepare students for opportunities in Bioinformatics.

The UCLA Undergraduate Bioinformatics Minor is an academic program established in Fall 2012 at UCLA.  Undergraduates in any Major can obtain a Bioinformatics Minor by completing an additional 8 courses. Since Fall 2012, approximately 80 students have joined the Minor program. These students represent Majors in over a dozen UCLA departments, including: Computer Science; Chemistry; Molecular, Cell, & Developmental Biology; Microbiology, Immunology, and Molecular Genetics; Ecology and Evolutionary Biology; and Computational and Systems Biology.

Over 45 faculty specializing in computational and experimental biology are associated with the Bioinformatics Minor, spanning the fields of biology, mathematics, engineering, and medicine. Course offerings from more than 12 unique departments allow the Minor program to encompass the breadth of the growing Bioinformatics field.

Here we describe the principles and philosophy that guided the design of our Minor.

  1. Our Core Bioinformatics Courses Teach Interdisciplinary Computation. The foundation of our program is the cluster of three integrated core courses in Bioinformatics. These courses are truly interdisciplinary; they satisfy elective requirements in multiple departments and recruit students from different Majors to the Minor program. These core courses build upon the philosophy that students must first learn fundamental concepts in computation in order to later explore problems in Bioinformatics. These courses offer basic skills and appeal to many students beyond those interested in Bioinformatics.
  2. Rigorous Background in Computation. To be successful in Bioinformatics, students must have a solid background in both computation and Biology. Our core courses require as prerequisites a substantial background in computation and statistics. To enter the Minor, we require that students have completed one year of programming and one upper division Statistics course. To complete the Minor, our students take Linear Algebra and one upper division course on Algorithms taught by the Computer Science or Math Department. Our students also take a Molecular Biology course taught by the Life Sciences Department. We believe that it is important for faculty in Computer Science and Program in Computing to teach programming, and for faculty in the Life Sciences to teach Biology. Further, it is important for students to take the same programming classes as their peers in Engineering majors, and the same Biology classes as their peers in Life Sciences.
  3. The Bioinformatics Minor Builds upon the Students’ Major. Every student graduating from UCLA with a Bioinformatics Minor also completes an academic Major program. While we do adjust the Minor curriculum to help students efficiently complete both their Major and Minor requirements within 4 years, each of our graduates has exactly the same amount of training in their Major as fellow Majors who are not in the Minor. This avoids a common pitfall in interdisciplinary education: students receive only a superficial background in each academic area.
  4. Bioinformatics is a Research-Oriented Field. Our Minor is closely integrated with our undergraduate research program, which places students in the labs of Bioinformatics faculty. Most of the Bioinformatics Minors at UCLA are working in a research lab. Undergraduates are strongly encouraged to engage in research. The Minor allows for a substantial amount of research credits, an allowance that helps students complete their Major and Minor requirements in four years. In addition, many of our undergraduates participate in the Bruins-in-Genomics Summer (B.I.G. Summer) program or similar summer programs for undergraduates.
  5. Bioinformatics is an Increasingly Diverse Field. The core courses in Bioinformatics are designed to be interesting and accessible to students from a wide variety of educational backgrounds. Each course typically has enrollment approaching 100. Students who take these courses as electives to fulfill their Major requirements far outnumber the students enrolled in the Bioinformatics Minor. Student enthusiasm is high for these accessible interdisciplinary courses that combine computational sciences and Biology. We find that this approach boosts broader undergraduate engagement in the field and encourages students from traditionally underrepresented groups to pursue research, graduate school, or careers in STEM fields.
  6. Let Excitement Foster Program Growth. Bioinformatics is an exciting area, and specialized training is critical for the next generation of biomedical researchers. However, undergraduate Bioinformatics programs, when offered by a college or university, are typically quite small. Such programs are often limited in size and engagement because students are unaware of the field or become aware of Bioinformatics late in their college career. We designed the Bioinformatics Minor program at UCLA specifically to attract students at any stage of their college career and to maximize curricular flexibility so students can easily complete Minor requirements. Many students are attracted to the Minor when they enroll in Bioinformatics core courses to fulfill elective requirements for their Major; some develop a keen interest in the field and then join the Minor. Even students who are unable to complete all Minor requirements benefit from our program; they complete key coursework and join a research lab, acquiring knowledge and experience crucial for gaining employment or admission to graduate school.

Our current goal for the Bioinformatics Minor is to graduate 50 students per year.  We hope that 10 to 20 of them will enter graduate studies in Bioinformatics.  We are not there yet, but are growing. This year, around 10 graduates applied to Ph.D. programs in Bioinformatics.  Many of our students recently began or are applying to Ph.D. programs in Bioinformatics and related areas.  We expect that they will do very well in the admissions process and have great backgrounds for starting Ph.D. study in Bioinformatics.


Read more about the Bioinformatics Minor on the official website:
http://bioinformatics.ucla.edu/undergradute-bioinformatics-minor/

Check out a list of research opportunities available for undergrads at UCLA:
http://bioinformatics.ucla.edu/undergraduate-research/

Learn more about 2016 undergraduate research and B.I.G. Summer activities at ZarLab:
zarlab.cs.ucla.edu/b-i-g-summer-in-zarlab/

Applications to the 2017 B.I.G. Summer program are due January 27:
http://qcb.ucla.edu/big-summer/

Profiling adaptive immune repertoires across multiple human tissues by RNA Sequencing

In a project led by Serghei Mangul, members of our lab recently developed and tested a novel computational method that uses regular RNA-Seq data to rapidly and accurately profile the human immune system. Mangul and his collaborators, including UCLA graduate student Harry (Taegyun) Yang and 2016 B.I.G. Summer undergraduate participants Jeremy Rotman, Benjamin Statz, and Will Van Der Wey, recently published their results in a paper on bioRxiv.

Discoveries in human immunology and advancements in development of treatments for many common human diseases depend on detailed reconstructions of the adaptive immune repertoire. The “adaptive” immune repertoire recognizes pathogens and toxins that the “innate” defense system misses. Assay-based genetic studies provide a detailed view of these adaptive systems by profiling the genetic expression and repertoires of B and T cell receptors. Assay-based approaches have accurately characterized the immune repertoire of peripheral blood.

However, these methods are expensive and smaller in scale when compared to standard RNA sequencing (RNA-Seq). Characterizing the immunological repertoires of other tissues, including barrier tissues like skin and mucosae, requires large-scale study. RNA-Seq can capture the entire cellular population of a sample, including B and T cells and their receptors.

ImReP is the first method to efficiently extract B and T cell receptor-derived reads from RNA-Seq data, accurately assemble CDR3 sequences (the most variable regions of these receptors), and determine their antigen specificity. Mangul and his team used simulated data to test the feasibility of using RNA-Seq to study the adaptive immune repertoire. ImReP is able to identify 99% of CDR3-derived reads from the RNA-Seq mixture, suggesting it is a powerful tool for profiling RNA-Seq samples of immune-related tissues.

They also compared methods and investigated the sequencing depth and read length required to reliably assemble B and T cell receptor sequences from RNA-Seq data. ImReP consistently outperformed existing methods in both recall and precision rates for the majority of simulated parameters. Notably, ImReP was the only method with acceptable performance at 50bp read length, reconstructing significantly more CDR3 clonotypes at a higher precision rate.
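To make the two evaluation measures concrete, here is a minimal sketch of how precision and recall could be computed for a set of assembled CDR3 sequences against a simulated ground truth. This is not the evaluation code from the paper; the function name `precision_recall` and the toy sequences are made up for illustration.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for assembled CDR3 sequences.

    precision = fraction of assembled sequences that are truly present
    recall    = fraction of true sequences that were assembled
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy, made-up sequences used only to illustrate the calculation.
simulated = {"CASSLGTDTQYF", "CASSPGQGNEQFF", "CAWSVGGYEQYF", "CASSIRSSYEQYF"}
assembled = {"CASSLGTDTQYF", "CASSPGQGNEQFF", "CASSXXXXEQYF"}

precision, recall = precision_recall(assembled, simulated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three assembled sequences are correct and two of the four simulated sequences are recovered, so a method can score well on one measure while doing poorly on the other.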

Mangul and his team applied ImReP to 8,555 samples across 544 individuals from 53 tissues obtained from the Genotype-Tissue Expression study (GTEx v6). The data was derived from 38 solid organ tissues, 11 brain subregions, whole blood, and three cell lines. ImReP identified over 26 million reads overlapping 3.8 million distinct CDR3 sequences that originate from diverse human tissues.

Using ImReP, they created a systematic atlas of immunological sequences for B and T cell repertoires across a broad range of tissue types, most of which were not previously studied for B and T cell repertoires. They also examined the compositional similarities of clonal populations between tissues to track the flow of B and T clonotypes across immune-related tissues, including secondary lymphoid organs and organs encompassing mucosal, exocrine, and endocrine sites.
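As a rough illustration of what comparing clonal populations between tissues can look like, the sketch below scores each pair of tissues by the Jaccard index of their clonotype sets. The choice of Jaccard similarity is an assumption made here for illustration and is not necessarily the measure used in the paper; the tissue names and clonotype labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared clonotypes divided by all clonotypes seen."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(clonotypes_by_tissue):
    """Map each unordered tissue pair to the similarity of its clonotype sets."""
    return {
        (t1, t2): jaccard(clonotypes_by_tissue[t1], clonotypes_by_tissue[t2])
        for t1, t2 in combinations(sorted(clonotypes_by_tissue), 2)
    }


# Hypothetical clonotype labels; real input would be assembled CDR3 sequences.
toy = {
    "spleen": {"CDR3_A", "CDR3_B", "CDR3_C"},
    "small_intestine": {"CDR3_B", "CDR3_C", "CDR3_D"},
    "thyroid": {"CDR3_E"},
}

similarity = pairwise_similarity(toy)
for pair, score in similarity.items():
    print(pair, round(score, 2))
```

Tissue pairs that share many clonotypes score close to 1, while pairs with disjoint repertoires score 0.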

Advantages of using RNA-Seq to study immune repertoires include the ability to simultaneously capture both B and T cell clonotype populations during a single run, to simultaneously detect overall transcriptional responses of the adaptive immune system, and to scale up the atlas of B and T cell receptors, which will provide valuable insights into immune responses across various autoimmune diseases, allergies, and cancers.

Read more about ImReP in the full article, which is available for download on bioRxiv: http://biorxiv.org/content/early/2016/11/22/089235.article-metrics

ImReP was created by Igor Mandric and Serghei Mangul. ImReP is freely available at: https://sergheimangul.wordpress.com/imrep/

The atlas of T and B cell receptors, the largest collection of CDR3 sequences and tissue types, is freely available at https://sergheimangul.wordpress.com/atlas-immune-repertoires/. This resource has potential to enhance future studies in areas such as immunology and advance development of therapies for human diseases.

The full citation to our paper is:

Mangul, S., Mandric, I., Yang, H.T., Strauli, N., Montoya, D., Rotman, J., Van Der Wey, W., Ronas, J.R., Statz, B., Zelikovsky, A. and Spreafico, R., 2016. Profiling adaptive immune repertoires across multiple human tissues by RNA Sequencing. bioRxiv, p.089235.

 

Figure 1. Overview of ImReP. (See full paper for details.)

 

Figure 6. Flow of T and B cell clonotypes across diverse human tissues. (See full paper for details.)

 

Colocalization of GWAS and eQTL Signals Detects Target Genes

Farhad Hormozdiari recently developed a method for combining genome-wide association studies (GWASs) and expression quantitative trait loci (eQTL) studies in a statistical framework that quantifies the probability that each variant is causal while allowing an arbitrary number of causal variants. Together with collaborators at the University of Oxford and Broad Institute of MIT and Harvard, we present a paper in The American Journal of Human Genetics. Here, we describe eQTL and GWAS CAusal Variants Identification in Associated Regions (eCAVIAR). We apply our approach to datasets from several GWASs and eQTL studies in order to assess its accuracy and potential contributions to colocalization and fine-mapping.

Integrating GWASs and eQTL studies is a promising way to explore the mechanisms by which non-coding variants affect diseases. Integration of GWAS and eQTL data is challenging due to the uncertainty induced by linkage disequilibrium (LD), the non-random association of alleles at different loci, and the presence of loci that harbor multiple causal variants (allelic heterogeneity). Current methods assume that each locus contains a single causal variant and expect loci to be independent and associated randomly.

eCAVIAR is a novel probabilistic model for integrating GWAS and eQTL data that extends the CAVIAR (Hormozdiari et al. 2014) framework to explicitly estimate the posterior probability of the same variant being causal in both GWAS and eQTL studies, while accounting for allelic heterogeneity and LD. Our approach can quantify the strength between a causal variant and its associated signals in both studies, and it can be used to colocalize variants that pass the genome-wide significance threshold in GWAS. For any given peak variant identified in GWAS, eCAVIAR considers a collection of variants around that peak variant as one single locus.
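The idea of a joint posterior can be sketched in a few lines. The example below assumes that per-variant posterior probabilities of causality have already been computed separately for the GWAS and the eQTL study, and that the two studies are treated as independent, so the probability of a variant being causal in both is the product of the two. The function names, variant IDs, probabilities, and threshold are all hypothetical, and this is not the eCAVIAR implementation, which also has to model LD and multiple causal variants to obtain the per-study posteriors.

```python
def colocalization_posteriors(gwas_posteriors, eqtl_posteriors):
    """Per-variant probability of being causal in both studies.

    Assumes the two studies are independent, so the joint probability
    is the product of the per-study posterior probabilities.
    """
    shared = gwas_posteriors.keys() & eqtl_posteriors.keys()
    return {v: gwas_posteriors[v] * eqtl_posteriors[v] for v in sorted(shared)}


def colocalized_variants(joint_posteriors, threshold):
    """Variants whose joint posterior meets a user-chosen threshold."""
    return [v for v, p in joint_posteriors.items() if p >= threshold]


# Made-up posteriors for three variants in one locus.
gwas = {"rs1": 0.80, "rs2": 0.15, "rs3": 0.05}
eqtl = {"rs1": 0.50, "rs2": 0.02, "rs3": 0.40}

joint = colocalization_posteriors(gwas, eqtl)
hits = colocalized_variants(joint, threshold=0.1)  # hypothetical cutoff
print(joint, hits)
```

Note how rs3 has a sizable eQTL posterior but a low GWAS posterior, so its joint probability stays small: a strong signal in only one study is not evidence of colocalization.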

We apply eCAVIAR to the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) dataset and GTEx dataset to detect the target gene and most relevant tissue for each GWAS risk locus. When applied to the MAGIC dataset’s 2 phenotypes, eCAVIAR identifies genetic variants that are causal in both eQTL and GWAS. Further, eCAVIAR detects a large number of loci where the GWAS causal variants are clearly distinct from the causal variants in the eQTL data. Interestingly, eCAVIAR also identifies genes that colocalize in one tissue yet can be excluded in others. For the majority of loci in which we identify a single variant causal for both GWAS and eQTL, eCAVIAR implicates more than one causal variant across the 45 tissues.

We observe that eCAVIAR outperforms existing methods even when there are different values of non-colocalization. Using simulated datasets, we compared accuracy, precision, and recall rate of eCAVIAR to RTC (Nica et al. 2010) and COLOC (Giambartolomei et al. 2014), two current methods for eQTL and GWAS colocalization. Our results show that eCAVIAR has high confidence for selecting loci to be colocalized between the GWAS and eQTL data and is conservative in selecting a locus to be colocalized.

We hope that future applications of eCAVIAR will advance identification of specific GWAS loci that share a causal variant with eQTL studies in a tissue, thus providing insight into presently unclear disease mechanisms.

Figure 2. Overview of eCAVIAR.

 

eCAVIAR was created by Farhad Hormozdiari, Ayellet V. Segre, Martijn van de Bunt, Xiao Li, Jong Wha J Joo, Michael Bilow, Jae Hoon Sul, Bogdan Pasaniuc and Eleazar Eskin. The article is available at: http://www.cell.com/ajhg/abstract/S0002-9297(16)30439-6.

Visit the following page to download CAVIAR and eCAVIAR: http://genetics.cs.ucla.edu/caviar/

The full citation to our paper is:

Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet; Li, Xiao; Joo, Jong Wha; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar (2016): Colocalization of GWAS and eQTL Signals Detects Target Genes. In: Am J Hum Genet, 2016, ISSN: 1537-6605.

Our paper builds upon a method introduced in a previous publication:

Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar (2014): Identifying causal variants at Loci with multiple signals of association. In: Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631.