Writing Tips: Why We Publish Methods Papers

by Eleazar Eskin

Computational genomics is a field where many diverse academic groups collaborate, each bringing its own distinct academic culture to a project.  In particular, each academic discipline involved in computational genomics has its own publication strategy in terms of the types of papers it publishes and how it packages methods and results in these papers.  Publishing papers is extremely important to careers in academia and science, because we scientists are reviewed for tenure or promotion based on our publication records.  An important factor in our review (unfortunately) is the impact factor of the journals that we publish in.  Here, we describe our lab’s publication strategy and the reasoning behind it.

Our lab is a computational lab, and the main contribution of our lab to Bioinformatics is the development of methods for solving important biological problems, particularly in the area of genetics.  These new methods are implemented in software packages that (hopefully) are used by others to enable biological discovery.  Naturally, the key papers our group produces are papers that describe and explain potential applications of these new methods.

Roughly speaking, there are two strategies for publishing methods in our field.  The first is to focus on writing methods papers that are primarily dedicated to describing the computational advances.  The second is to focus on publishing our novel methods as part of more comprehensive papers that present a biological contribution. In this case, our method is primarily described in the supplementary materials. Over the span of my career, I have seen computational researchers come under increasing pressure to follow the second strategy in order to have papers published in high-impact journals.  Unfortunately, following the second strategy often delays publication (sometimes for years), because peer review often involves applying the method to a new dataset and/or performing extensive functional validation.

Our group primarily follows the first strategy.  In addition, we work with other groups and, as collaborators, publish papers focused on biological contributions.  This strategy works out well for us, and we feel that writing methods-focused papers is the best way for us to make a contribution to science.  We hope that other computational biology groups will follow our example and publish more methods papers.

Here are some of the reasons we feel this is a good strategy:

  1. Doing Justice to our Work. We can fully explain the methods only in papers dedicated to methodology. Since our contribution is methods, the best way to push the science forward is to clearly describe our method and the context of its development and application. In a dedicated paper, we are most likely to have enough space to fully describe the method and explain how the approach works.  Methods papers also have the space (and are typically required) to compare the proposed method with previous methods. This comparison puts the performance of the method in perspective relative to the work of others.  Methods papers ideally provide enough details that other groups can build upon our method and compare their results to our published results. Sharing authorship on these papers also allows students who were involved in the development of these methods to demonstrate their strong technical skills.  In my view, computational biologists should be evaluated by the quality and impact of their methodology development, and departments should consider this impact when making hiring decisions.  The impact can be measured by the number of users of the software implementing the methods, the number of citations of the papers describing the methods, and the discoveries that these methods have enabled.  These factors are more important than the impact factor of the journals where the methods are published.
  2. Self Determination of Publishing. There are no outside bottlenecks preventing us from finishing our papers quickly, and we can control the publication process of our papers. A methods paper is primarily written by members within our lab, and authors evaluate the method using both simulated and established datasets.  This structure means we need not wait for outside collaborators or experiments to finish.  Finishing the paper faster means that we have more time to work on new papers.
  3. Increased Number and Improved Quality of Collaborations. The methods paper is a widely distributed, often freely available, finished product, and many prospective collaborators approach us after reading a paper from our group. More importantly, in our collaborations, we have very little competition over authorship.  Students in the group are happy to work hard on a project just to be in the middle of the author list of the collaborative paper, because they are already first authors on their own methods papers.  Our methods development students are not competing for credit with the students in the collaborator's group.
  4. Project Longevity. Writing a methods paper forces the method to be finished, evaluated, and documented, and publishing the paper forces us to release the software. This process gives the project more longevity. Once the method is fully developed, new students can easily pick up and build upon the previous method.  Once a student leaves the lab, the method can persist with new lab members because it is stable, well-documented, and debugged.  Long after they have left the lab, many of the students who wrote methods papers in our group continue to author papers related to applications of their method.

In full disclosure, we do identify one negative aspect of the methods paper publishing strategy.  High-impact papers require collaborations, and it is less likely that methods developers can publish in high-impact journals as senior or corresponding authors.  While it is less likely to occur, members of our lab do occasionally gain senior authorship in high-impact journals through collaboration.  We have found that the combination of methods papers, where you are the senior or first author, and high-impact papers, where you have middle authorship and it is clear that your role was the application of the method, is overall a positive outcome and looks good in your publication record.

For example, Eran Halperin and I published a 2004 paper in the lower-impact journal Bioinformatics that described the HAP haplotype phasing method.  The HAP method was later used in a Perlegen-led paper that was published, with Halperin and me as co-authors, in the notably high-impact journal Science. The 2005 Science paper helped me get my job at UCLA; it was clear what my contribution was, as I had also authored the methods paper in Bioinformatics.

Our lab has produced several other examples of methods papers paired with high-impact collaborations. Kang et al. (2008) present the EMMA method in Genetics (impact factor of 5.963), and a collaboration with the Jake Lusis group on the HMDP presents results in Genome Research (impact factor of 11.351) (Bennett et al. 2010).  More recently, we published the CAVIAR method (Hormozdiari et al. 2014) in Genetics and collaborated with Dan Geschwind’s group in applying the method in a paper published in Nature (Won et al. 2016).

Citations of papers mentioned in this post:

Won, Hyejung; de la Torre-Ubieta, Luis; Stein, Jason L; Parikshak, Neelroop N; Huang, Jerry; Opland, Carli K; Gandal, Michael J; Sutton, Gavin J; Hormozdiari, Farhad; Lu, Daning; Lee, Changhoon; Eskin, Eleazar; Voineagu, Irina; Ernst, Jason; Geschwind, Daniel H

Chromosome conformation elucidates regulatory relationships in developing human brain. Journal Article

In: Nature, 538 (7626), pp. 523-527, 2016, ISSN: 1476-4687.

Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar

Identifying causal variants at Loci with multiple signals of association. Journal Article

In: Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631.

Bennett, Brian J; Farber, Charles R; Orozco, Luz; Kang, Hyun Min; Ghazalpour, Anatole; Siemers, Nathan; Neubauer, Michael; Neuhaus, Isaac; Yordanova, Roumyana; Guan, Bo; Truong, Amy; Yang, Wen-Pin; He, Aiqing; Kayne, Paul; Gargalovic, Peter; Kirchgessner, Todd; Pan, Calvin; Castellani, Lawrence W; Kostem, Emrah; Furlotte, Nicholas; Drake, Thomas A; Eskin, Eleazar; Lusis, Aldons J

A high-resolution association mapping panel for the dissection of complex traits in mice. Journal Article

In: Genome Res, 20 (2), pp. 281-90, 2010, ISSN: 1549-5469.

Kang, Hyun Min; Ye, Chun; Eskin, Eleazar

Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Journal Article

In: Genetics, 180 (4), pp. 1909-25, 2008, ISSN: 0016-6731.

Hinds, David A; Stuve, Laura L; Nilsen, Geoffrey B; Halperin, Eran; Eskin, Eleazar; Ballinger, Dennis G; Frazer, Kelly A; Cox, David R

Whole-genome patterns of common DNA variation in three human populations. Journal Article

In: Science, 307 (5712), pp. 1072-9, 2005, ISSN: 1095-9203.

Halperin, Eran; Eskin, Eleazar

Haplotype reconstruction from genotype data using Imperfect Phylogeny. Journal Article

In: Bioinformatics, 20 (12), pp. 1842-9, 2004, ISSN: 1367-4803.

Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models

Bioinformatics is a rapidly growing field composed of multiple academic disciplines. The work of quantitative geneticists is often not well understood by scholars conducting other types of research in Genetics. In response to this information gap, we are launching a series of reviews that aim to make common problems in computational biology research accessible to anyone in Genetics. We hope these reviews help researchers in Genetics better understand the scope and applicability of each other’s work, and serve as study guides for students taking college courses on the subject matter.

Today we made available on bioRxiv the first paper in this series, our review of population structure and relatedness in association studies. A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to effectively test for association while correcting for population structure is a computational and statistical challenge. Our review motivates the problem of population structure in association studies using laboratory mouse strains and shows how it can cause false-positive associations. We then motivate mixed models in the context of unmodeled factors.
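To make the false-positive mechanism concrete, here is a minimal simulation sketch. It is not taken from the review: the two populations, their allele frequencies (0.1 and 0.9), their mean body weights (20 and 30), and all function names are illustrative assumptions. The correction shown is simple per-population centering, used here only as a stand-in for the mixed models that the review actually covers.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_population(rng, n, allele_freq, mean_weight, sd_weight=2.0):
    """Simulate n individuals; the variant has NO effect on the phenotype."""
    # Genotype = number of copies (0, 1, or 2) of the allele.
    genotypes = [
        (rng.random() < allele_freq) + (rng.random() < allele_freq)
        for _ in range(n)
    ]
    # Phenotype depends only on the population mean, never on the genotype.
    phenotypes = [rng.gauss(mean_weight, sd_weight) for _ in range(n)]
    return genotypes, phenotypes


def center(values):
    """Subtract the mean so that population-level differences are removed."""
    m = statistics.fmean(values)
    return [v - m for v in values]


def run_demo(seed=0, n=500):
    """Return (naive, adjusted) genotype-phenotype correlations."""
    rng = random.Random(seed)
    # Hypothetical populations A and B: different allele frequency AND weight.
    g_a, p_a = simulate_population(rng, n, allele_freq=0.1, mean_weight=20.0)
    g_b, p_b = simulate_population(rng, n, allele_freq=0.9, mean_weight=30.0)
    # Naive analysis: pool everyone and ignore population membership.
    naive = pearson(g_a + g_b, p_a + p_b)
    # Adjusted analysis: center within each population before pooling.
    adjusted = pearson(center(g_a) + center(g_b), center(p_a) + center(p_b))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = run_demo()
    print(f"naive correlation:    {naive:.3f}")
    print(f"adjusted correlation: {adjusted:.3f}")
```

Under these assumptions, the pooled analysis should report a strong correlation even though the variant does nothing, while the within-population analysis should report a correlation near zero. Real studies rarely have such clean, known population labels, which is one reason mixed models are attractive.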

To read the full review, download our paper: http://biorxiv.org/content/early/2016/12/07/092106.

This review was written by Lana Martin and Eleazar Eskin. We welcome feedback; please e-mail Lana if you have comments or questions: lana [dot] martin [at] ucla [dot] edu.

Body weight phenotypes of 38 inbred mouse strains from the Mouse Phenome Database generated by The Jackson Laboratory. The distribution of mouse body weights shows that two clades of mice have very different body weights.

UCLA Bioinformatics: The Philosophy of the Ph.D. Program

(This post is a collaboration between the instructors of the core courses for the UCLA Bioinformatics programs: Eleazar Eskin, Chris Lee, Wei Wang, Bogdan Pasaniuc, Jason Ernst, Sriram Sankararaman, and Jessica Li, along with the current director of the program, Yi Xing.)

Bioinformatics is an interdisciplinary field that combines different aspects of quantitative sciences, such as Computer Science, Statistics, and Mathematics, with biological sciences, such as Molecular Biology and Genetics.  Training programs in quantitative sciences and biomedical sciences have very different cultures and structures, particularly at the doctoral level.  At UCLA, we aim to combine the best of both worlds with the Interdepartmental Bioinformatics Ph.D. program.

We established our Ph.D. program in 2008, and we enroll 6 to 10 Ph.D. students each year. Over 45 faculty specializing in computational and experimental biology are associated with the Bioinformatics Ph.D. program, with active research and education programs spanning biology, mathematics, engineering, and medicine. The program encompasses the breadth of the growing Bioinformatics field by offering courses from over 12 departments.  The Bioinformatics Ph.D. is not housed in any one department but is an Interdepartmental Program (IDP) whose faculty are members of 17 UCLA departments.  The IDP is an administrative unit designed for multidisciplinary academic programs.   This unit also administers the Biomedical Informatics Ph.D. program and will administer the planned Ph.D. program in Systems Biology.

For many aspects of the UCLA Bioinformatics Ph.D. program, we draw upon different ideas from the cultures of Quantitative and Biomedical training programs.

In traditional Biomedical science Ph.D. programs, the majority of a student’s training in applied sciences takes place through mentorship in the laboratory.  Students do take some courses during their first year, but these courses mainly cover recent research in the field and are often team-taught by multiple faculty.  These courses typically require only minimal work outside of class.  During the first year of the Ph.D. program, these students focus on identifying a research lab to join by completing rotations in three labs.  Starting with their second year, students become members of their chosen lab and perform research full time.

On the other hand, in traditional quantitative science Ph.D. programs, the majority of a student’s training takes place didactically through challenging coursework.  In these programs, coursework consumes at least 50% of the student’s time during their first two years.  These intensive courses are usually taught by a single instructor (or sometimes a team of two) and require substantial homework assignments, course projects, and exams.  However, the courses lay a foundation for the technical skills that will become the basis of a student’s future research.  Students admitted to these types of programs are encouraged to join the research lab of a specific professor and start research right away.

Here we describe how we combine these two cultures with the principles and philosophy that guided the design of our Ph.D. Program.

  1. Training in Methodology Development. The UCLA Bioinformatics program is uniquely focused on preparing our students to develop novel methodologies that can contribute to important biological problems.  Students who are interested in methodology development are a great fit for our program.  Our program is able to maintain this focus, because UCLA hosts many other Ph.D. programs that can accommodate students who are interested in Bioinformatics but prefer a program with a different, sometimes more traditional, focus. These include the recently established Genetics and Genomics Ph.D. program, which focuses less on methodology development and prioritizes biological discovery.

    UCLA also has a broad set of other Ph.D. programs in quantitative sciences, such as Statistics and Computer Science, which also accommodate students who are interested in Ph.D. research in Bioinformatics but are primarily interested in a quantitative sciences training program.  UCLA also offers Ph.D. programs in Biomathematics, Biomedical Informatics, and Biostatistics for students interested in other areas of Computational Biology.  In addition, a new graduate program in Systems Biology is being developed in conjunction with the Bioinformatics IDP.  The multitude of programs at UCLA enables students to join a program whose training goals match their own, which in turn allows the programs to be organized around these goals.

  2. Our Core Curriculum Provides Rigorous Computational Training. Our core courses are structured in the style of a quantitative Ph.D. program, complete with rigorous training requirements that are met through homework assignments, exams, and course projects. The philosophy behind our courses is to teach fundamental concepts in computation and use Bioinformatics to explore these concepts.

    For this reason, our core courses are rigorous enough to satisfy course requirements in quantitative Ph.D. programs at UCLA, including those for the Computer Science and Statistics graduate programs.  Bioinformatics core courses are taught and administered by faculty who have appointments in these quantitative departments.  Six of the courses are administered by the Computer Science Department, and one by the Statistics Department. Our rigorous core curriculum appeals to students in these programs as well as students in the Bioinformatics Ph.D. program. In fact, the majority of students enrolled in our core courses are from quantitative graduate programs. This diversity of academic disciplines brings to these courses a high level of engagement and creativity.

  3. Substantial Didactic Training in Bioinformatics. Similar to a traditional quantitative sciences training program, our program offers a full load of Bioinformatics courses. Our program includes five core courses that we strongly recommend students take during their first year.  These courses are: Introduction to Bioinformatics (Chris Lee), Algorithms in Bioinformatics (Eleazar Eskin), Methods in Computational Genomics (Jason Ernst and Bogdan Pasaniuc), Statistical Methods in Bioinformatics (Jessica Li), and Computational Genetics (Eleazar Eskin).

    In addition, students are encouraged to take during their second year Machine Learning in Bioinformatics (Sriram Sankararaman) as well as the multiple offerings of Current Topics in Bioinformatics (rotating faculty).  The Current Topics courses cover relevant issues such as Data Mining in Bioinformatics or Advanced Computational Genetics.  We designed the coursework for the UCLA Bioinformatics Ph.D. program so that students can take many skills-building courses comparable to those offered by a traditional quantitative science program.

  4. Rotation Program. Upon entering a Ph.D. program, students typically do not yet know whose lab they want to join. For this reason, we adopt a rotation program styled after typical Biomedical training programs.  Here, students undertake three 10-week rotations, one during each of the three academic quarters of their first year.  Students use a rotation to try out a lab, and decide on a lab to join by the end of their first year in graduate school. Secondary but important goals of the Rotation Program are to develop diverse research skills and to build a collaborative network that may benefit the doctoral research project and career development.

  5. Seminar Program. An important aspect of Biomedical training programs is the informal training provided during seminars and journal clubs. The UCLA Bioinformatics Ph.D. program leverages informal training with a seminar that students are required to attend for the first two years of the program.  In fact, the weekly Bioinformatics Seminar series has become a key focal point of the UCLA Bioinformatics community.  Students also organize an annual overnight retreat where they share and get feedback on their research.

  6. Research Oriented Written Qualifier. Every Ph.D. program requires completion of a written qualifying exam, which typically occurs after coursework is completed. In traditional biomedical science programs, this exam is often preparation of a grant proposal in a topic of the student’s choice.  In traditional quantitative science Ph.D. programs, this requirement is often a challenging written exam covering topics in coursework. More recently, quantitative Ph.D. programs have abandoned the written qualifier and replaced it with an exam where the students write a paper demonstrating their research skills.

    In the UCLA Bioinformatics Ph.D. program, we have adopted such an exam.  After completion of first-year courses and faculty approval of their project proposal, students are given a one-month period to work independently on the project and to submit a written research paper reporting their results. Faculty in the program review the resulting papers. Although these projects are often small in scope because of the exam’s time constraints, the resulting papers are required to exhibit: 1) high-quality writing, 2) contextualization of the project within existing research, 3) conclusions supported by the chosen experiments, and 4) a logical flow of the arguments in the paper.  The idea behind the exam is not to weed out students who cannot pass it, but to set an objective bar for achievement that the students can attain.

Just as Bioinformatics is an interdisciplinary field that combines methods, data, and theories from different academic traditions, the UCLA Bioinformatics Ph.D. is earned in an interdisciplinary program that combines aspects of the training cultures of quantitative and biological sciences. Our unit is a new kind of program that has been specifically designed to administer a rigorous, cross-sectional training in methodology development.