ZarLab goes to Vancouver for ASHG!


Last week, many members of our group traveled to Vancouver, British Columbia, for the annual meeting of the American Society of Human Genetics. The 66th Annual Meeting, which took place October 18-22, 2016, featured more than 3,000 talks, workshops, and poster presentations on topics such as bioinformatics and computational methods, developmental genetics and gene function, cancer and cardiovascular diseases, evolutionary and population genetics, and genetic counseling.

ZarLab contributed eight poster presentations and one research talk. Serghei Mangul discussed his recent work on dumpster-diving techniques in a talk titled “Comprehensive analysis of RNA-sequencing to find the source of every last read across 544 individuals from 53 tissues,” as part of the Interpreting the Transcriptome in Health and Disease symposium. You can view his slides here:

ZarLab in Vancouver!


Recent alumnus Farhad Hormozdiari received a Reviewers’ Choice ribbon for his poster titled “Joint fine mapping of GWAS and eQTL detects target gene and relevant tissue.” Only the top 10% of posters in each topic receive this honor, as determined by the reviewers’ scores of the submitted abstracts. Congratulations, Farhad!

Other posters presented by members of our group:

  • Prevalence of allelic heterogeneity in complex traits. Eleazar Eskin
  • Modeling the covariance of effect sizes in a meta-analysis. Dat Duong
  • Estimating regional heritability in the presence of linkage disequilibrium. Lisa Gai
  • Linear mixed models for quantitative traits in health-system scale data. Michael Bilow
  • Utilizing allele specific expression to identify cis-regulatory variants. Jennifer Zou
  • Haplotype-based predictors for complex trait association. Rob Brown
  • Repeat elements expression profile across different tissues in GTEx samples. Harry Yang

HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads

Recent advances in RNA sequencing technology can generate deep-coverage data containing millions of reads. RNA-Seq data are used to identify genetic variants and alternatively spliced isoforms (alternative splicing is a common mechanism for generating diversity from a single gene), both of which may play a role in heritable traits and diseases. Using this type of data, gene expression can be linked to each of the two parental haplotypes of a diploid organism. In other words, we can potentially identify the parent from which an individual inherited a group of genes.

Long single-molecule sequencing technologies, such as the Pacific Biosciences platform, produce multi-kilobase reads that are longer than most transcripts and therefore enable sequencing of complete haplotype-specific isoforms. New computational methods are required to analyze these highly complex data efficiently. In a recent paper, we present HapIso (Haplotype-specific Isoform Reconstruction), a comprehensive method that accurately reconstructs the haplotype-specific isoforms of a diploid cell. Our software package is the first method capable of reconstructing haplotype-specific isoforms from long single-molecule reads.

HapIso uses splice mapping of long single-molecule reads to partition the reads into the two parental haplotypes. Single-molecule reads span entire RNA transcripts and therefore bridge the single nucleotide variant (SNV) loci within a gene. To handle gapped coverage and the splicing structure of the gene, the haplotype reconstruction procedure is applied independently to regions of contiguous coverage, which are defined as transcribed segments. The reads restricted to each transcribed segment are partitioned into two local clusters using 2-means clustering. Using the linkage provided by the long single-molecule reads, we then connect the local clusters into two global clusters. Finally, an error-correction protocol is applied to the reads within each cluster.
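The segment-wise procedure described above can be illustrated with a minimal Python sketch. It assumes that each read has already been splice-mapped and reduced to the alleles it shows at known SNV positions, and that transcribed segments are given as coordinate intervals. The data layout, the function names (`consensus`, `two_means`, `link_segments`), the seeding rule, and the majority-vote correction are hypothetical stand-ins chosen for illustration, not code from the HapIso package.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: each read is reduced to {SNV position: observed allele}.
Read = Dict[int, str]


def consensus(reads: List[Read]) -> Read:
    """Majority allele per SNV position; the vote also corrects isolated read errors."""
    votes: Dict[int, Counter] = {}
    for read in reads:
        for pos, allele in read.items():
            votes.setdefault(pos, Counter())[allele] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in votes.items()}


def mismatches(read: Read, center: Read) -> int:
    """Hamming distance over SNV positions observed in both the read and the center."""
    return sum(1 for pos, allele in read.items() if pos in center and center[pos] != allele)


def two_means(reads: List[Read], max_iter: int = 20) -> List[int]:
    """Split the reads of one transcribed segment into two local clusters (labels 0/1)."""
    seed0 = reads[0]
    seed1 = max(reads, key=lambda r: mismatches(r, seed0))  # most dissimilar read
    centers = [dict(seed0), dict(seed1)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each read to the nearer center; ties go to cluster 0.
        new_labels = [0 if mismatches(r, centers[0]) <= mismatches(r, centers[1]) else 1
                      for r in reads]
        if new_labels == labels:  # assignments are stable
            break
        labels = new_labels
        centers = [consensus([r for r, lab in zip(reads, labels) if lab == k]) for k in (0, 1)]
    return labels


def link_segments(segments: List[Tuple[Set[str], Set[str]]]) -> Tuple[Set[str], Set[str]]:
    """Join per-segment cluster pairs (sets of read IDs) into two global clusters.

    Reads shared with the clusters built so far vote on whether each new pair
    keeps or swaps its orientation."""
    hap0, hap1 = set(segments[0][0]), set(segments[0][1])
    for c0, c1 in segments[1:]:
        keep = len(hap0 & c0) + len(hap1 & c1)
        swap = len(hap0 & c1) + len(hap1 & c0)
        if swap > keep:
            c0, c1 = c1, c0
        hap0 |= c0
        hap1 |= c1
    return hap0, hap1


# Toy example: two transcribed segments (SNVs at 100/200/250 and at 500/600).
reads = {
    "r6": {500: "A", 600: "C"},  # listed first so segment 2's local labels come out flipped
    "r1": {100: "A", 200: "C", 250: "G", 500: "G", 600: "T"},  # spans both segments
    "r2": {100: "G", 200: "T", 250: "C", 500: "A", 600: "C"},  # spans both segments
    "r3": {100: "A", 200: "C", 250: "G"},
    "r4": {100: "G", 200: "T", 250: "C"},
    "r5": {500: "G", 600: "T"},
    "r7": {100: "A", 200: "T", 250: "G"},  # sequencing error at position 200
}
segments = [(0, 300), (300, 700)]

local_clusters = []
for lo, hi in segments:
    ids = [rid for rid, r in reads.items() if any(lo <= p < hi for p in r)]
    restricted = [{p: a for p, a in reads[rid].items() if lo <= p < hi} for rid in ids]
    labels = two_means(restricted)
    local_clusters.append(({i for i, lab in zip(ids, labels) if lab == 0},
                           {i for i, lab in zip(ids, labels) if lab == 1}))

hap0, hap1 = link_segments(local_clusters)
hap0_alleles = consensus([reads[i] for i in sorted(hap0)])  # error at r7 is voted out
print(sorted(hap0), sorted(hap1), hap0_alleles)
```

In this sketch the initial centers are the first read and the read most dissimilar to it. Linking compares each new segment against all reads already placed, so a read spanning any earlier segment can fix the orientation; a segment sharing no reads with earlier ones is left in an arbitrary orientation.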

Partitioning the long reads into parental haplotypes allows HapIso to accurately calculate allele-specific gene expression and identify imprinted genes. Additionally, it has the potential to improve detection of the effects of cis- and trans-regulatory changes on gene expression. Long reads also provide access to genetic variation in regions previously unreachable by short-read protocols, which could lead to new insights into disease heritability.
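Once reads are assigned to haplotypes, allele-specific expression can be summarized from the number of reads in each haplotype cluster for a gene. The sketch below is a generic summary under that assumption, not a procedure described for HapIso: it reports the fraction of reads from haplotype 0 and a two-sided exact binomial p-value against equal expression (p = 0.5). The function name `allelic_fraction_and_pvalue` is hypothetical.

```python
from math import comb
from typing import Tuple


def allelic_fraction_and_pvalue(n_hap0: int, n_hap1: int) -> Tuple[float, float]:
    """Fraction of a gene's phased reads from haplotype 0, plus a two-sided exact
    binomial p-value for the null hypothesis of equal expression (p = 0.5)."""
    n = n_hap0 + n_hap1
    if n == 0:
        raise ValueError("gene has no phased reads")
    k = min(n_hap0, n_hap1)
    # Under p = 0.5 the binomial is symmetric, so the two-sided p-value
    # is twice the probability of the smaller tail (capped at 1).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_hap0 / n, min(1.0, 2 * tail)


print(allelic_fraction_and_pvalue(12, 11))  # roughly balanced expression
print(allelic_fraction_and_pvalue(20, 0))   # expression from a single haplotype
```

A gene whose phased reads come almost entirely from one haplotype is the kind of signal one would then follow up as a candidate imprinted gene, which additionally requires knowing the parent of origin of each haplotype.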

We applied HapIso to publicly available single-molecule RNA-Seq data from the GM12878 cell line, consisting of circular consensus sequencing (CCS) reads generated by the Pacific Biosciences platform. Our method discovered novel SNVs in regions that were previously unreachable by standard short-read protocols, and 53% of these SNVs follow Mendelian inheritance. Among 9,000 expressed genes, HapIso detected 921 genes with both haplotypes expressed. We observed 4,140 heterozygous loci, that is, positions where the alleles differ between the two inferred haplotypes. In principle, the inferred haplotypes could also be used to identify recombination events in the transmitted haplotypes.
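The two quantities reported above can be expressed compactly. In this hedged sketch, a heterozygous locus is a position covered by both inferred haplotypes at which their alleles differ, and a gene counts as having both haplotypes expressed when each haplotype cluster holds at least `min_reads` reads. The dictionary layout, the function names, and the `min_reads` threshold are assumptions for illustration, not parameters documented for HapIso.

```python
from typing import Dict, List

# Hypothetical layout: an inferred haplotype as {SNV position: allele}.
Haplotype = Dict[int, str]


def heterozygous_loci(hap0: Haplotype, hap1: Haplotype) -> List[int]:
    """Positions covered by both inferred haplotypes where the alleles differ."""
    return sorted(pos for pos in hap0.keys() & hap1.keys() if hap0[pos] != hap1[pos])


def both_haplotypes_expressed(n_hap0: int, n_hap1: int, min_reads: int = 1) -> bool:
    """True if each haplotype has at least min_reads reads (min_reads is an assumed threshold)."""
    return n_hap0 >= min_reads and n_hap1 >= min_reads


hap0 = {100: "A", 200: "C", 250: "G", 500: "G"}
hap1 = {100: "G", 200: "C", 250: "C", 600: "T"}
print(heterozygous_loci(hap0, hap1))
print(both_haplotypes_expressed(5, 0))
```

Positions observed in only one haplotype (500 and 600 here) are not counted, since their status cannot be decided from the inferred haplotypes alone.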

The open source Python implementation of HapIso was developed by Serghei Mangul and Harry (Taegyun) Yang, and the software package is freely available for download at

This paper appears in Proceedings of the International Symposium on Bioinformatics Research and Applications (ISBRA-2016), which can be downloaded here:

Serghei Mangul and Harry Yang led this project, which involved Farhad Hormozdiari. The full citation to our paper is:

Mangul, Serghei; Yang, Harry; Hormozdiari, Farhad; Tseng, Elizabeth; Zelikovsky, Alex; Eskin, Eleazar

HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads

In: Bioinformatics Research and Applications, pp. 80-92, Springer International Publishing, 2016.



Overview of HapIso.

B.I.G. Summer in ZarLab

This summer, six young adults engaged in a unique eight-week learning experience with ZarLab, learning practical skills in genomics and bioinformatics while conducting research on large-scale human genetic datasets. Four undergraduate students participated in the Bruins-In-Genomics (B.I.G.) Summer Program, an intensive laboratory and seminar program aimed at providing real-world experience for students who are interested in pursuing interdisciplinary graduate education in the quantitative and biological sciences. In addition, two Los Angeles-area high school students participated in laboratory activities as volunteer researchers.

Eleazar Eskin, co-organizer of the summer program, and Serghei Mangul, post-doctoral scholar, hosted the young scholars in ZarLab, a UCLA computational genetics group affiliated with both the Computer Science Department and the Human Genetics Department. Mangul supervised a group of students who collaborated on a project aimed at developing computational methods for the study of the human immune system and microbiome. Working with data from one of the largest sequencing projects in the world, the Genotype-Tissue Expression (GTEx) study, the students analyzed more than 8,000 samples obtained from 544 individuals and representing 53 different tissue types. In doing so, they became familiar with current approaches to studying how changes in our genes contribute to common human diseases.

During a poster session on August 12, 2016, the B.I.G. participants presented the results of their work on GTEx:

  • Jeremy Rotman: “Studying the microbiome by analyzing the coverage of sequencing reads mapped to viruses, eukaryotes, and bacteria”
  • Benjamin Statz: “An improved method for analysis of variable domain of B and T cell receptors”
  • William Van Der Wey: “Functional profiling of microbial communities across multiple human tissues”
  • Kevin Wesel: “Profiling repeat elements across multiple human tissues”

In addition to mentoring B.I.G. Program students in ZarLab, Mangul developed and presented a three-part workshop series introducing students to UNIX earlier in the program.

Eskin and Mangul also hosted a B.I.G. Program student, Samantha Jenson, who collaborated with Jonathan Flint, a world-renowned authority on the genetics of depression and co-director of UCLA’s Depression Grand Challenge. This year, Eskin facilitated a Neurogenetics working group and weekly neurogenetics seminar series for the B.I.G. Program. Participants in this group gained first-hand experience in developing methods for mapping the genetic causes underlying Major Depressive Disorder. Jenson presented her work on “Structural variant discovery in Major Depression Disorder” during the August 12th poster session.

The annual B.I.G. Program is a collaboration between multiple labs and includes next-generation sequencing analysis workshops, weekly science talks by researchers, a weekly student journal club, professional development seminars, social activities, concluding poster sessions, and an optional GRE test prep course. Participants also benefited from relevant workshops and research talks presented during the UCLA Computational Genomics Summer Institute (CGSI).

Congratulations to Benjamin, Jeremy, Kevin, Samantha, and William on their acceptance to and success in the B.I.G. Summer Program!


We thank the following generous institutions that made this year’s B.I.G. Summer Program a big success:

  • National Institutes of Health grant MH109172
  • UCOP for a UC-HBCU partnership Program in Genomics and Systems
  • NIH NIBIB for NGS Data Analysis Skills for the Biosciences Pipeline R25EB022364
  • NIH NIMH for Undergraduate Research Experience in Neuropsychiatric Genomics R25MH109172-01

Learn more about the B.I.G. Program:
UCLA Newsroom: UCLA hosts summer program for future biosciences leaders