Profiling adaptive immune repertoires across multiple human tissues by RNA Sequencing

In a project led by Serghei Mangul, members of our lab developed and tested a novel computational method that uses standard RNA-Seq data to rapidly and accurately profile the human immune system. Mangul and his collaborators, including UCLA graduate student Harry (Taegyun) Yang and 2016 B. I. G. Summer undergraduate participants Jeremy Rotman, Benjamin Statz, and Will Van Der Wey, recently published their results in a paper on bioRxiv.

Discoveries in human immunology and advances in the development of treatments for many common human diseases depend on detailed reconstructions of the adaptive immune repertoire. The "adaptive" immune system recognizes pathogens and toxins that the "innate" defense system misses. Assay-based genetic studies provide a detailed view of this adaptive system by profiling the expression and repertoires of B and T cell receptors. Such assay-based approaches have accurately characterized the immune repertoire of peripheral blood.

However, these assays are expensive and smaller in scale than standard RNA sequencing (RNA-Seq). Characterizing the immune repertoires of other tissues, including barrier tissues such as skin and mucosae, requires large-scale studies. RNA-Seq captures the entire cellular population of a sample, including B and T cells and their receptors.

ImReP is the first method to efficiently extract B and T cell receptor-derived reads from RNA-Seq data, accurately assemble CDR3 sequences (the most variable regions of these receptors), and determine their antigen specificity. Mangul and his team used simulated data to test the feasibility of using RNA-Seq to study the adaptive immune repertoire. ImReP identified 99% of the CDR3-derived reads in the RNA-Seq mixture, suggesting it is a powerful tool for profiling RNA-Seq samples of immune-related tissues.
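
To make the idea of pulling CDR3 sequences out of ordinary RNA-Seq reads more concrete, here is a minimal Python sketch of one simple heuristic. It is not ImReP's algorithm (see the paper for that). It translates a read in all six frames and looks for a stretch that starts with a cysteine and ends with the phenylalanine or tryptophan of the F/W-G-x-G motif often used to delimit CDR3. The regular expression, its length bounds, and the function names are illustrative assumptions.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical heuristic: a CDR3 runs from a Cys to the F/W of a downstream F/W-G-x-G
# motif. The 4-25 interior-residue bounds are assumptions chosen for illustration.
CDR3_PATTERN = re.compile(r"(C[A-Z]{4,25}?[FW])G[A-Z]G")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame`; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def extract_cdr3(read: str) -> set[str]:
    """Return candidate CDR3 amino-acid sequences found in any of the six frames."""
    read = read.upper()
    candidates = set()
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            protein = translate(strand, frame)
            candidates.update(m.group(1) for m in CDR3_PATTERN.finditer(protein))
    return candidates
```

For example, a read whose translation contains GYLCASSLAPGATNEKLFFGSGTRL yields the candidate CASSLAPGATNEKLFF. A motif search alone will also produce false positives, so treat this only as an illustration of the idea.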

They also compared ImReP with existing methods and investigated the sequencing depth and read length required to reliably assemble B and T cell receptor sequences from RNA-Seq data. ImReP consistently outperformed existing methods in both recall and precision for the majority of simulated parameters. Notably, ImReP was the only method with acceptable performance at 50 bp read length, reconstructing significantly more CDR3 clonotypes with higher precision.
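
Recall and precision here compare the clonotypes an assembler reports with the clonotypes planted in the simulation. A minimal sketch, assuming a clonotype counts as correct only when its CDR3 sequence exactly matches a simulated one (the paper may use a different matching rule):

```python
def precision_recall(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    """Precision and recall of assembled CDR3 clonotypes against simulated ones."""
    true_positives = len(predicted & truth)  # exact CDR3 matches (assumed criterion)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

With three reported clonotypes, two of which are among four simulated ones, precision is 2/3 and recall is 2/4.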

Mangul and his team applied ImReP to 8,555 samples from 544 individuals across 53 tissues obtained from the Genotype-Tissue Expression study (GTEx v6). The data were derived from 38 solid organ tissues, 11 brain subregions, whole blood, and three cell lines. ImReP identified over 26 million reads overlapping 3.8 million distinct CDR3 sequences originating from diverse human tissues.

Using ImReP, they created a systematic atlas of immunological sequences for B and T cell repertoires across a broad range of tissue types, most of which had not previously been studied for B and T cell repertoires. They also examined the compositional similarity of clonal populations between tissues to track the flow of B and T cell clonotypes across immune-related tissues, including secondary lymphoid organs and organs encompassing mucosal, exocrine, and endocrine sites.
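
One simple way to quantify how many clonotypes two tissues share is the Jaccard index of their CDR3 sets. The sketch below assumes that measure and uses hypothetical tissue names; the similarity statistic used in the paper may differ, so consult the full article for the actual analysis.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared CDR3 sequences divided by all distinct CDR3 sequences in the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(repertoires: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every pair of tissues (keys in sorted order)."""
    return {
        (t1, t2): jaccard(repertoires[t1], repertoires[t2])
        for t1, t2 in combinations(sorted(repertoires), 2)
    }
```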

Advantages of using RNA-Seq to study immune repertoires include the ability to capture both B and T cell clonotype populations in a single run, to simultaneously detect overall transcriptional responses of the adaptive immune system, and to scale up the atlas of B and T cell receptors, which will provide valuable insights into immune responses across various autoimmune diseases, allergies, and cancers.

Read more about ImReP in the full article, which is available for download on bioRxiv.

ImReP was created by Igor Mandric and Serghei Mangul. ImReP is freely available at:

The atlas of T and B cell receptors, the largest collection of CDR3 sequences and tissue types, is freely available at:

This resource has the potential to enhance future studies in immunology and to advance the development of therapies for human diseases.

The full citation to our paper is:

Mangul, S., Mandric, I., Yang, H.T., Strauli, N., Montoya, D., Rotman, J., Van Der Wey, W., Ronas, J.R., Statz, B., Zelikovsky, A. and Spreafico, R., 2016. Profiling adaptive immune repertoires across multiple human tissues by RNA Sequencing. bioRxiv, p.089235.



Figure 1. Overview of ImReP. (See full paper for details.)



Figure 6. Flow of T and B cell clonotypes across diverse human tissues. (See full paper for details.)


ZarLab goes to Vancouver for ASHG!


Last week, many members of our group traveled to Vancouver, British Columbia, for the annual meeting of the American Society of Human Genetics (ASHG). The 66th Annual Meeting, held October 18-22, 2016, featured more than 3,000 talks, workshops, and poster presentations on topics such as bioinformatics and computational methods, developmental genetics and gene function, cancer and cardiovascular diseases, evolutionary and population genetics, and genetic counseling.

ZarLab contributed eight poster presentations and one research talk. Serghei Mangul discussed his recent work on dumpster-diving techniques in a talk titled "Comprehensive analysis of RNA-sequencing to find the source of every last read across 544 individuals from 53 tissues," as part of the Interpreting the Transcriptome in Health and Disease symposium. You can view his slides here:

ZarLab in Vancouver!


Recent alumnus Farhad Hormozdiari received a Reviewers' Choice ribbon for his poster titled "Joint fine mapping of GWAS and eQTL detects target gene and relevant tissue." Only the top 10% of posters in each topic receive this honor, as determined by the reviewers' scores of the submitted abstracts. Congratulations, Farhad!

Other posters presented by members of our group:

  • Prevalence of allelic heterogeneity in complex traits. Eleazar Eskin
  • Modeling the covariance of effect sizes in a meta-analysis. Dat Duong
  • Estimating regional heritability in the presence of linkage disequilibrium. Lisa Gai
  • Linear mixed models for quantitative traits in health-system scale data. Michael Bilow
  • Utilizing allele specific expression to identify cis-regulatory variants. Jennifer Zou
  • Haplotype-based predictors for complex trait association. Rob Brown
  • Repeat elements expression profile across different tissues in GTEx samples. Harry Yang

HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads

Recent advances in RNA sequencing technology make it possible to generate deep-coverage data containing millions of reads. RNA-Seq data are used to identify genetic variants and alternatively spliced isoforms (alternative splicing is a common source of diversity within a gene), both of which may play a role in heritable traits and diseases. With this type of data, gene expression can be linked to one of the two parental haplotypes of a diploid organism. In other words, we can potentially trace the expressed alleles of a gene back to the parental haplotype from which they were inherited.

Long single-molecule reads span multiple kilobases; they are longer than most transcripts and enable sequencing of complete haplotype-specific isoforms. New computational methods are required to analyze this highly complex data efficiently. In a recent paper, we present HapIso (Haplotype-specific Isoform Reconstruction), a comprehensive method that accurately reconstructs the haplotype-specific isoforms of a diploid cell. HapIso is the first method capable of reconstructing haplotype-specific isoforms from long single-molecule reads.

HapIso uses splice mapping of long single-molecule reads to partition the reads into two parental haplotypes. The single-molecule reads span entire RNA transcripts and bridge the single nucleotide variation (SNV) loci across a gene. To cope with gapped coverage and the splicing structure of the gene, the haplotype reconstruction procedure is applied independently to regions of contiguous coverage, defined as transcribed segments. Reads restricted to each transcribed segment are partitioned into two local clusters using 2-means clustering. Using the linkage provided by the long single-molecule reads, we then connect the local clusters into two global clusters. Finally, an error-correction protocol is applied to the reads within each cluster.
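
The per-segment steps can be sketched in a few dozen lines of Python. This is a simplified illustration under several assumptions, not HapIso's implementation. Each read is reduced to a hypothetical dictionary mapping SNV positions to observed alleles. 2-means is seeded with the first read and the read farthest from it. Cluster centers are majority-vote consensus haplotypes, and error correction simply overwrites each read's alleles with its cluster consensus. Splice-aware alignment and the linking of local clusters into global ones are omitted.

```python
from collections import Counter

Read = dict[int, str]  # hypothetical representation: SNV position -> observed allele


def contiguous_segments(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge aligned blocks (start, end-exclusive) into regions of contiguous coverage."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def distance(read: Read, center: Read) -> int:
    """Mismatches at SNV positions covered by both the read and the center."""
    return sum(1 for pos, allele in read.items() if pos in center and center[pos] != allele)


def consensus(reads: list[Read]) -> Read:
    """Majority allele at each SNV position (ties go to the first-seen allele)."""
    votes: dict[int, Counter] = {}
    for read in reads:
        for pos, allele in read.items():
            votes.setdefault(pos, Counter())[allele] += 1
    return {pos: counts.most_common(1)[0][0] for pos, counts in votes.items()}


def two_means(reads: list[Read], max_iter: int = 20) -> list[int]:
    """Partition one segment's reads into two local clusters; returns a 0/1 label per read."""
    if len(reads) < 2:
        return [0] * len(reads)
    # Seeding choice is an assumption: first read plus the read farthest from it.
    centers = [reads[0], max(reads, key=lambda r: distance(r, reads[0]))]
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [
            0 if distance(r, centers[0]) <= distance(r, centers[1]) else 1 for r in reads
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(reads, labels) if lab == k]
            if members:
                centers[k] = consensus(members)
    return labels


def correct_errors(reads: list[Read], labels: list[int]) -> list[Read]:
    """Replace each read's alleles with its cluster consensus at the positions it covers."""
    centers = {
        k: consensus([r for r, lab in zip(reads, labels) if lab == k]) for k in set(labels)
    }
    return [{pos: centers[lab][pos] for pos in read} for read, lab in zip(reads, labels)]
```

Because each segment is clustered independently, the labels 0 and 1 are only meaningful within a segment; deciding which local cluster belongs to which global haplotype is the job of the linking step.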

Partitioning the long reads into parental haplotypes allows HapIso to accurately calculate allele-specific gene expression and identify imprinted genes. It also has the potential to improve detection of the effects of cis- and trans-regulatory changes on gene expression. Long reads provide access to genetic variation in regions previously unreachable by short-read protocols and could lead to new insights into disease heritability.
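
Once reads are assigned to haplotypes, allele-specific expression for a gene reduces to comparing the two haplotype read counts. A minimal sketch, assuming an exact two-sided binomial test against an even split, a hypothetical 90% cutoff for calling near-monoallelic expression, a hypothetical minimum read count, and hypothetical gene names. Monoallelic expression alone does not establish imprinting; that also requires knowing which parent each haplotype came from.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k of n reads on haplotype 1, under p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail.
    prob = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * prob)


def monoallelic_candidates(
    counts: dict[str, tuple[int, int]],
    threshold: float = 0.9,  # hypothetical cutoff on the major-haplotype fraction
    alpha: float = 0.05,
    min_reads: int = 10,  # hypothetical minimum coverage per gene
) -> list[str]:
    """Genes whose expression is significantly and strongly skewed toward one haplotype."""
    flagged = []
    for gene, (hap1, hap2) in sorted(counts.items()):
        total = hap1 + hap2
        if total < min_reads:
            continue
        major_fraction = max(hap1, hap2) / total
        if major_fraction >= threshold and binomial_two_sided_p(hap1, total) < alpha:
            flagged.append(gene)
    return flagged
```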

We applied HapIso to publicly available single-molecule RNA-Seq data from the GM12878 cell line, comprising circular-consensus (CCS) single-molecule reads generated by the Pacific Biosciences platform. Our method discovered novel SNVs in regions previously unreachable by standard short-read protocols, 53% of which follow Mendelian inheritance. HapIso detected 921 genes with both haplotypes expressed among 9,000 expressed genes. We observed 4,140 heterozygous loci, corresponding to positions with non-identical alleles between the inferred haplotypes. In principle, we can also identify recombinations in the transmitted haplotypes by counting the recombinations in the inferred haplotypes.
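
Counting heterozygous loci, checking Mendelian consistency, and looking for recombinations are all simple comparisons once haplotypes are inferred. The sketch below assumes haplotypes stored as hypothetical position-to-allele dictionaries and parental genotypes given as allele pairs. Counting switches in parental origin along an inferred haplotype is a common proxy for recombination, used here as an illustrative assumption rather than HapIso's exact procedure.

```python
def heterozygous_loci(hap1: dict[int, str], hap2: dict[int, str]) -> list[int]:
    """Positions called on both inferred haplotypes where their alleles differ."""
    return sorted(pos for pos in hap1.keys() & hap2.keys() if hap1[pos] != hap2[pos])


def is_mendelian_consistent(
    child: tuple[str, str], mother: tuple[str, str], father: tuple[str, str]
) -> bool:
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def count_switches(
    inferred: dict[int, str], parent_a: dict[int, str], parent_b: dict[int, str]
) -> int:
    """Count changes in which parental haplotype the inferred haplotype matches,
    using only informative positions where the two parental haplotypes differ."""
    origins = []
    for pos in sorted(inferred.keys() & parent_a.keys() & parent_b.keys()):
        if parent_a[pos] == parent_b[pos]:
            continue  # uninformative position
        if inferred[pos] == parent_a[pos]:
            origins.append("A")
        elif inferred[pos] == parent_b[pos]:
            origins.append("B")
    return sum(1 for prev, cur in zip(origins, origins[1:]) if prev != cur)
```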

The open-source Python implementation of HapIso was developed by Serghei Mangul and Harry (Taegyun) Yang and is freely available for download at:

This paper appears in Proceedings of the International Symposium on Bioinformatics Research and Applications (ISBRA-2016), which can be downloaded here:

Serghei Mangul and Harry Yang led this project, which involved Farhad Hormozdiari. The full citation to our paper is:

Mangul, Serghei; Yang, Harry; Hormozdiari, Farhad; Tseng, Elizabeth; Zelikovsky, Alex; Eskin, Eleazar. HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads. In: Bioinformatics Research and Applications, pp. 80-92, Springer International Publishing, 2016.



Overview of HapIso.