Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX

Serghei Mangul and Lana Martin, together with Alexander Hoffmann, Matteo Pellegrini, and Eleazar Eskin, recently published a paper describing a workshop model for training scientists who have no computer science background to use UNIX. Our paper is available online as a preprint and will appear in an upcoming “Scientific Life” section of Trends in Biotechnology.

Scientists who are not trained in computer science face an enormous challenge analyzing high-throughput data. Serghei developed a series of workshops in response to growing demand among life and medical science researchers to analyze their own data using the command line.

Administered by UCLA’s Institute for Quantitative and Computational Biosciences (QCBio), these workshops are designed to help life and medical science researchers use applications that lack a graphical interface. Our paper presents a training model for these workshops: a flexible approach that can be implemented at any institution to teach the use of command-line tools to learners with little to no prior knowledge of UNIX.

QCBio currently offers similar workshops to the UCLA community. In tandem with this publication, we created an online catalogue of resources and papers intended to give first-time learners a basic knowledge of the command line:

We encourage fellow instructors of Bioinformatics, as well as scientists who are new learners of the command line, to read our paper and share their thoughts! Email us at: lana [dot] martin [at] ucla [dot] edu.


The full citation of our paper:
Mangul, Serghei, Martin, Lana S., Hoffmann, Alexander, Pellegrini, Matteo, and Eskin, Eleazar. Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX. Trends in Biotechnology; doi: 10.1016/j.tibtech.2017.06.007.

Advance preprint copies of our paper may be downloaded here:

Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models

Bioinformatics is a rapidly growing field that draws on multiple academic disciplines. The work of quantitative geneticists is often not well understood by scholars conducting other types of research in Genetics. In response to this information gap, we are launching a series of reviews that aim to make common problems in computational biology research accessible to anyone in Genetics. We hope these reviews help researchers in Genetics better understand the scope and applicability of each other’s work, and serve as study guides for students taking college courses on the subject matter.

Today we made available on bioRxiv the first paper in this series, our review of population structure and relatedness in association studies. A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques that effectively test for association while correcting for population structure is a computational and statistical challenge. Our review uses laboratory mouse strains to motivate the problem of population structure in association studies and to show how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.
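To make the false-positive problem concrete, here is a minimal simulation sketch in plain Python. It is not the mixed-model method covered in the review; it swaps in a much simpler adjustment (centering each value on its subpopulation mean) purely for illustration. All numbers are hypothetical assumptions: two subpopulations whose mean body weights differ, and a variant that has no effect on weight but whose allele frequency differs between the subpopulations.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def center_within_groups(values, groups):
    """Subtract each group's mean from its members."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_per_group=1000, seed=0):
    """Simulate a non-causal variant in two structured subpopulations."""
    rng = random.Random(seed)
    groups, genotypes, phenotypes = [], [], []
    # Hypothetical (allele frequency, mean body weight) per subpopulation.
    settings = [(0.1, 20.0), (0.9, 30.0)]
    for group, (freq, mean_weight) in enumerate(settings):
        for _ in range(n_per_group):
            # Genotype is the number of copies of the allele (0, 1, or 2).
            genotype = int(rng.random() < freq) + int(rng.random() < freq)
            groups.append(group)
            genotypes.append(genotype)
            # Phenotype depends only on the subpopulation, not the genotype.
            phenotypes.append(rng.gauss(mean_weight, 3.0))
    return groups, genotypes, phenotypes


groups, genotypes, phenotypes = simulate()

# Naive test: ignores structure, so the non-causal variant looks associated.
naive_r = pearson(genotypes, phenotypes)

# Adjusted test: remove subpopulation means before testing for association.
adjusted_r = pearson(
    center_within_groups(genotypes, groups),
    center_within_groups(phenotypes, groups),
)

print(f"naive correlation:    {naive_r:.3f}")
print(f"adjusted correlation: {adjusted_r:.3f}")
```

In this sketch the naive correlation is large even though the variant has no effect, while the adjusted correlation is close to zero. Real studies rarely have known, discrete subpopulation labels, which is part of why mixed models that account for relatedness among individuals are used instead.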

To read the full review, download our paper:

This review was written by Lana Martin and Eleazar Eskin. We welcome feedback; please e-mail Lana if you have comments or questions: lana [dot] martin [at] ucla [dot] edu.

Body weight phenotypes of 38 inbred mouse strains from the Mouse Phenome Database generated by The Jackson Laboratory. The distribution of body weights shows that two clades of mice differ markedly in body weight.