Writing Tips: Improving Clarity on the Sentence Level

(This post is authored by Lana Martin.)

Clarity is especially important when writing scientific methods papers, proposals, and reports. Journal referees and grant reviewers typically read many submissions in one sitting; they expect to quickly and easily understand the mechanics, significance, and potential contributions of your work. Once a project is published, readers expect to quickly and easily understand how they can use and apply your method in their own work.

Improving clarity of writing is an iterative process that involves a lot of practice in writing, editing one’s own writing, and editing the writing of others. Clear, orderly writing is not a natural tendency for most of us because we don’t normally speak that way in conversation! Similarly, academic specialization leaves us in the dark concerning the amount of detail necessary to make a piece accessible to a broader audience. For most people, developing an intentional practice around routine writing tasks is necessary in order to improve writing skills.

The first draft of any document can always be improved with multiple editing passes. One strategy to improve editing efficiency is to dedicate each editing pass to a specific editing component, keeping in mind your own weak areas. For example, you may first clean up mechanical errors such as spelling and grammar. Second, you may rewrite sentences while considering a specific list of writing principles. Finally, editing for overall cohesion and completeness of ideas can be easier once you have clean copy to work with.

Here, we present five principles for clear writing on the sentence level. These guidelines are universal, yet particularly relevant to scientific and technical non-fiction writing.

1. Directly modify a verb. Often, when describing an action, our default inclination is to add a verb modifier later in the sentence—well after the verb appears. For example:

“…considering simultaneously the population structure…”
“…considering population structure simultaneously…”

This structure makes sense in conversation, because you can emphasize how you did something with tone and inflection. We read in a more linear fashion than we speak; in writing, consistently placing the adverb before the verb makes it clear to the reader which action the modifier belongs to. Consistently ordering verbs and verb modifiers is especially useful when listing a series of actions that are each modified differently, such as in a protocol.

“…simultaneously considering population structure…”

 

2. Front-load the star topic of a sentence. Another habit that we carry from conversation to writing is to bury the most important part of a sentence at the end. This may cause the reader, particularly those less familiar with your subject matter, to re-read the sentence. Here, the specific concept—the star topic of the sentence—follows the general concept:

 “As a result, a large number of false discoveries may be found in the common case where the cell type composition is correlated with the phenotype.”

When reading about a methodology problem, we usually want to first know what is specifically interesting about a concept, and then learn about the concept’s significance on a larger scale. These “flipped” sentences are common in first drafts and can be easily edited in a single pass.

“As a result, the cell type composition is commonly correlated with the phenotype, and the methods produce a large number of false discoveries.”

 

3. Refine use of the dependent clause. A dependent clause is a group of words with a subject and a verb; alone, it is not a complete sentence and does not express a complete thought. We tend to use dependent clauses in writing because we tend to use dependent clauses in our own thought processes. This may suffice for problem-solving in our head-space vacuum, but, in order to effectively communicate with other people, we must completely describe these ideas in writing.

For example, this statement has two dependent clauses next to each other:

“Detecting allelic heterogeneity in regions that are more complicated is not intuitive.”

Given the provided information, the object of “more complicated” and/or “less intuitive” may not be clear. Adding a conjunction (“that”) between the two clauses clarifies that detection is the object of “less intuitive,” and regions is the object of “more complicated.”

“Detecting allelic heterogeneity is less intuitive in regions that are more complicated.”

 

4. Replace a vague dependent clause with a compound sentence. Dependent clauses help present contrasts by defining the scope in which the given statement is valid, but they can also be vague and confusing. For example:

“In contrast to Mendelian traits, the extent of AH at loci contributing to common, complex disease is almost unknown.”

When reading scientific and technical writing, we want to see contrasts clearly described—especially for readers who may not have an in-depth understanding of the background concepts. We, as specialists, may not clearly define these concepts because we are not accustomed to working our way through the logic of fundamental ideas. Re-engineering the overly vague clause with a compound sentence can efficiently get the novice reader on the same page as the expert reader. The dependent clause is now a complete thought that stands on its own:

“The genetic causes of Mendelian traits are well understood, but the extent of AH at loci contributing to common, complex disease is almost unknown.”

 

5. Add, remove, or modify an article used before a noun. An article is a word (the, a, an) that is placed before a noun to indicate the type of reference being made by the noun. The use of articles is tricky and, at times, a matter of stylistic choice. However, in scientific and technical writing, there are a few best practices for using articles to improve clarity. For example, articles can specify the volume or numerical scope of the noun. When articles are used to clarify numerical scope, first decide if the noun is one (singular) or many (plural), then choose to include or omit the appropriate article.

Use the definite article “the” when you are referring to the one unique item or set of items. In descriptions of methodology, this type of article is commonly used to signal that the noun is a general concept, a broad system, or a one-and-only example.

Immunological properties is a general concept, which the author may separately define in detail:

“…the immunological properties of a B cell receptor…”

Adaptive immune system is a broad system comprised of many parts:

“A key function of the adaptive immune system is…”

GTEx v6 project is one-and-only; a future GTEx release will presumably be v7!

“…the Genotype-Tissue Expression (GTEx v6) project…”

Use the indefinite articles “a” or “an” when referring to a general type or group of items. In descriptions of methodology, this type of article is commonly used to signal that the noun can be any member of a group. “A” is placed before a word that begins with a consonant sound; “an” is paired with a word that begins with a vowel sound.

Assay-based protocol is a type of protocol:

“In contrast to an assay-based protocol…”

Useful tool is a type of tool:

“…ImReP provides a useful tool for mining large-scale RNA-Seq datasets …”

When using a plural noun, we typically omit the indefinite article.

“In contrast to assay-based protocols…”

In a hypothetical scenario, if ImReP actually provides not one—but many—useful tools:

“…ImReP provides useful tools for mining large-scale RNA-Seq datasets …”


Developing an intentional writing practice can be as simple as scanning your work for sentences with potential for improvement. By designating editing passes to specific mechanical errors or types of sentence-level improvement, writing in a consistent, clear manner may become more habitual—and feel less like an exercise in foreign language class. In upcoming blog posts, we will discuss more ways to efficiently improve the structure and readability of papers.

In addition, we have written numerous blog posts on strategies for writing papers:

Our group has also published numerous blog posts on managing scientific labs and strategizing a graduate career. Articles presenting our advice on these subjects have become the top-viewed posts on our website: http://www.zarlab.xyz/advice/.

Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes

Farhad Hormozdiari and Eleazar Eskin recently applied an extension of CAVIAR to assess signals of selection in populations of European ancestry. CAVIAR is a probabilistic method for identifying a confidence set of SNPs that contains all the causal variants in a locus with a predefined probability (e.g., 90% or 95%), while taking into account biases generated by linkage disequilibrium. Farhad, now a post-doctoral scholar at Boston University, developed CAVIAR while a PhD student at UCLA.
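To make the idea of a confidence set more concrete, here is a minimal sketch under a strong simplifying assumption: exactly one causal variant per locus, with a posterior probability of causality already available for each SNP. This is not CAVIAR itself, which allows several causal variants and models linkage disequilibrium. The function name, SNP labels, and probabilities are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def confidence_set(posteriors: Dict[str, float], rho: float = 0.95) -> List[str]:
    """Return the smallest set of SNPs whose summed posterior reaches rho.

    Simplifying assumption (not CAVIAR itself): exactly one causal variant
    per locus, so each SNP has a single posterior probability of being causal.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    total = sum(posteriors.values())
    if total <= 0.0:
        raise ValueError("posteriors must sum to a positive value")

    # Rank SNPs from most to least likely to be causal.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)

    selected: List[str] = []
    cumulative = 0.0
    for snp, prob in ranked:
        selected.append(snp)
        cumulative += prob / total  # normalise in case inputs do not sum to 1
        if cumulative >= rho:
            break
    return selected


# Hypothetical posterior probabilities for four SNPs in one locus.
example = {"snp_a": 0.60, "snp_b": 0.25, "snp_c": 0.10, "snp_d": 0.05}
print(confidence_set(example, rho=0.90))
```

Under this single-causal-variant assumption, raising rho can only enlarge the set, which mirrors the trade-off between confidence and resolution that any such set has to make.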

This project was led by Matthew T. Buckley and Fernando Racimo at the University of California, Berkeley, and Morten E. Allentoft at the University of Copenhagen. Alleles with strong selection signals have been recently selected for and are thought to carry an evolutionary advantage for individuals in the population. Identifying these alleles helps expand our understanding of the selective pressures that shaped historic populations.

Allele frequency changes across FADS region. For more information, see our full paper.

In order to analyze the selective processes in Europeans across space and time, the project compared sequencing data from FADS genes obtained from present-day and Bronze Age (5000 to 3000 years ago) Europeans. We focused on FADS genes because prior studies indicate they are subjected to strong positive selection in Africa, South Asia, Greenland, and Europe. FADS genes encode fatty acid desaturases that are important for the conversion of short chain polyunsaturated fatty acids (PUFAs) to long chain fatty acids. In other words, selective pressure in the FADS genes may be linked to dietary adaptations.

Other analyses conducted by the project show that alleles in the FADS2 gene display the strongest changes in allele frequency since the Bronze Age, and this change is associated with expression changes and multiple lipid-related phenotypes. Farhad and Eleazar used CAVIAR to look for the presence of allelic heterogeneity, an adaptive process in which different mutations at the same locus cause the same phenotype. In an evolutionary context, its presence suggests that strong selective pressure likely acted upon the population.

Application of CAVIAR to genomic data from the 1000 Genomes Project and 54 Bronze Age Europeans revealed that specific causal variants within the FADS2 gene have been subjected to selective pressure. In particular, FADS2 shows evidence of allelic heterogeneity in three tissue types: transformed fibroblast cells (Pr(2 causal variants) = 0.72), left heart ventricle (Pr(2 causal variants) = 0.74), and whole blood (Pr(3 causal variants) = 0.74).

The project’s comparison of modern to Bronze Age European genomic data shows that selection has indeed strongly acted on the FADS gene cluster over the past 3000 years. The selective patterns observed in European data may be driven by a change in the dietary composition of fatty acids following the human transition from hunting-and-gathering to agriculture. As Europeans obtained more lipids from plants, rather than from fish and mammals, their genes adapted to optimize metabolism of these cereal-based lipids.

For more information, see our paper, which is available for download through Molecular Biology and Evolution: https://www.ncbi.nlm.nih.gov/pubmed/28333262.

The full citation to our paper is: 

Buckley, M.T., Racimo, F., Allentoft, M.E., Jensen, M.K., Jonsson, A., Huang, H., Hormozdiari, F., Sikora, M., Marnetto, D., Eskin, E. and Jørgensen, M.E., 2017. Selection in Europeans on fatty acid desaturases associated with dietary changes. Molecular biology and evolution.

This project used a method introduced in a previous publication: 

CAVIAR was created by Farhad Hormozdiari, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and Eleazar Eskin. Visit the following page to download CAVIAR and eCAVIAR: http://genetics.cs.ucla.edu/caviar/.

Incorporating prior information into association studies

Genome-wide association studies (GWAS) seek to identify genetic variants involved in specific traits. GWAS are advantageous for linking variants with traits, because they interrogate the genome in a uniform way. In other words, they examine the whole genome without a preconceived notion of where the associations may lie.

However, we now know a lot about the putative function of genetic variants due to tremendous progress in functional genomics. In many cases, we even know which variants are more likely to be involved in disease when compared to others. Advancements in our understanding of functional genomics motivate the strategic incorporation of prior information in GWAS.

Our group has been interested in this problem for many years. One challenge to addressing this problem is that the widely utilized approach for GWAS involves evaluating an association statistic at each single nucleotide polymorphism (SNP), and these methods take into account only one SNP at a time. The results are then adjusted for multiple testing, and an association is identified if a statistic exceeds a certain threshold. This approach can be described as a frequentist approach. On the other hand, one can incorporate prior information on which SNPs are likely to be the causal variants affecting the trait. This approach is inherently a Bayesian concept. Reconciling these two approaches is not straightforward.
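As a rough illustration of the standard per-SNP workflow described above, the sketch below tests each SNP against one shared threshold. It assumes a Bonferroni correction, which is one common multiple-testing adjustment and is used here only for simplicity. The function name, SNP labels, and p-values are hypothetical.

```python
from typing import Dict, List


def uniform_threshold_hits(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Report SNPs whose p-value falls below a single Bonferroni threshold.

    Every SNP is tested on its own and held to the same cutoff, alpha / m,
    where m is the number of SNPs tested.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [snp for snp, p in p_values.items() if p < threshold]


# Hypothetical association p-values for four SNPs.
p_values = {"snp_a": 1e-9, "snp_b": 0.004, "snp_c": 0.2, "snp_d": 0.03}
print(uniform_threshold_hits(p_values))
```

Nothing in this procedure knows which SNPs are more plausible candidates beforehand, which is exactly the gap that prior information is meant to fill.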

Average power under varying relative risks. For more information, see our paper.

In a 2008 paper published in Genome Research, our group proposed a modification of the multiple testing framework to address this problem. Instead of using the same specific threshold for all of the association statistics, we use a different threshold for each association statistic, where the thresholds are adjusted based on the prior information. Our method takes advantage of the correlation structure by considering multiple markers within a region. In our paper, we demonstrate how to set the thresholds in order to optimally utilize prior information and maximize statistical power.

Using prior information in genetic association studies increases power over traditional association studies while maintaining the same overall false-positive rate. Compared to standard methods, our approach is equally simple to apply to association studies, produces interpretable results as p-values, and is optimal in its use of prior information with regard to statistical power.
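The idea of giving each association statistic its own threshold can be sketched with a weighted Bonferroni scheme. This is a simpler stand-in, not the optimal thresholds derived in our paper, and it ignores the correlation structure between markers. The function names, SNP labels, p-values, and prior weights below are hypothetical.

```python
from typing import Dict, List


def weighted_thresholds(priors: Dict[str, float], alpha: float = 0.05) -> Dict[str, float]:
    """Split the overall level alpha across SNPs in proportion to their priors.

    The per-SNP thresholds sum to alpha, so the union bound keeps the overall
    false-positive rate at or below alpha.
    """
    total = sum(priors.values())
    if total <= 0.0:
        raise ValueError("priors must sum to a positive value")
    return {snp: alpha * prior / total for snp, prior in priors.items()}


def weighted_hits(p_values: Dict[str, float], priors: Dict[str, float],
                  alpha: float = 0.05) -> List[str]:
    """Report SNPs whose p-value falls below their own prior-weighted threshold."""
    thresholds = weighted_thresholds(priors, alpha)
    return [snp for snp, p in p_values.items() if p < thresholds[snp]]


# Hypothetical p-values, plus a prior that favours snp_d (say, a coding variant).
p_values = {"snp_a": 1e-9, "snp_b": 0.01, "snp_c": 0.2, "snp_d": 0.03}
priors = {"snp_a": 1.0, "snp_b": 1.0, "snp_c": 1.0, "snp_d": 5.0}
print(weighted_hits(p_values, priors))
```

With equal priors, every threshold reduces to alpha divided by the number of SNPs. With the unequal priors in this example, snp_d gets a more lenient threshold while the others get stricter ones. That is the trade-off prior information introduces: power moves toward the variants believed more likely to be causal.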

In 2012, we extended this work to use only tag SNPs for the putative causal variant. This project was developed by Gregory Darnell (then UCLA undergraduate, now PhD student at Princeton University), Dat Duong (then UCLA undergraduate, now UCLA PhD student), and Buhm Han.

More recently, we have applied this framework to incorporate functional information in analysis of eQTL data. In this case, incorporating genomic annotation of variants significantly increases the statistical power of existing eQTL methods and detects more eGenes in comparison to standard approaches. Read the blog post on this paper, and download the full article.

For more information on our general approach, see our paper, which is available for download through Bioinformatics:
https://academic.oup.com/bioinformatics/article/28/12/i147/269880/Incorporating-prior-information-into-association
In addition, the open source implementation of our 2012 paper, MASA, which was developed by Greg Darnell and Dat Duong, is freely available for download at http://masa.cs.ucla.edu/.

The full citations to our papers on this topic are:

Eleazar Eskin. “Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information.” Genome Research. 18(4):653-60. Special Issue: Proceedings of the 12th Annual Conference on Research in Computational Biology (RECOMB-2008), 2008.