Our group, in an effort led by former UCLA PhD student Dan He, developed an algorithm for reconstructing pedigrees from genotype data. This novel approach is presented in a paper recently published in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Pedigree inference plays an important role in population genetics. Pedigrees, commonly known as family trees, represent genetic relationships between individuals of a family. A pedigree diagram provides a model to compute the inheritance probability for the observed genotype and encodes all possible inheritance options for an allele in an individual. Pedigree reconstruction methods face several challenges. First, there can be an exponential number of possible pedigree graphs, and, second, the number of unknown ancestors can become very large as the height of the pedigree increases.
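To make the second challenge concrete, the short Python sketch that follows counts ancestral slots. Each individual has 2^g ancestors g generations back, so a pedigree of height h has up to 2 + 4 + ... + 2^h = 2^(h+1) - 2 ancestors per extant individual. The function name and the choice of 100 extant individuals are illustrative assumptions. Related individuals share ancestors, so this is an upper bound, not a quantity IPED2 itself computes.

```python
def max_ancestors(num_extant: int, height: int) -> int:
    """Upper bound on ancestral slots in a pedigree of the given height.

    Each individual has 2 parents, 4 grandparents, ..., 2**g ancestors
    g generations back. Shared ancestry (e.g. inbreeding or related
    extant individuals) can only make the true number smaller.
    """
    per_individual = sum(2 ** g for g in range(1, height + 1))  # equals 2**(height + 1) - 2
    return num_extant * per_individual


# Hypothetical example: 100 extant individuals, pedigrees of increasing height.
print([max_ancestors(100, h) for h in (1, 3, 5, 10)])  # [200, 1400, 6200, 204600]
```

The bound doubles with every extra generation, which is why the number of unknown ancestors grows so quickly with pedigree height.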
Our project uses genotype data to reconstruct pedigrees with computational efficiency despite these challenges. Our previous method, IPED, is the only known algorithm scalable to large pedigrees with reasonable accuracy for cases involving both outbreeding and inbreeding. IPED starts from extant individuals and reconstructs the pedigree generation by generation backwards in time. For each generation, IPED predicts the pairwise relationships between the individuals in the current generation and creates parents for them according to their relationships.
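A single backward step of this kind can be pictured with a toy Python sketch. It assumes the pairwise relationships have already been predicted and handles only one kind, full siblings. Predicted sibling pairs are merged into groups with a union-find structure, and each group receives a newly created father and mother. The function name, its inputs, and the integer ids for new parents are hypothetical. The sketch does not show how IPED predicts relationships, and it omits everything else the published method does.

```python
from itertools import count


def create_parents(individuals, sibling_pairs, new_ids):
    """One backward step: group predicted full siblings and give each group new parents.

    individuals   -- ids of the current generation
    sibling_pairs -- (a, b) pairs assumed to be predicted full siblings
    new_ids       -- iterator yielding fresh ids for created parents
    Returns (parents, parent_generation); parents maps child -> (father, mother).
    """
    root = {x: x for x in individuals}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving keeps the trees shallow
            x = root[x]
        return x

    # Union every predicted sibling pair into one sibship.
    for a, b in sibling_pairs:
        root[find(a)] = find(b)

    groups = {}
    for x in individuals:
        groups.setdefault(find(x), []).append(x)

    # Each sibship gets exactly one new father and one new mother.
    parents, parent_generation = {}, []
    for members in groups.values():
        father, mother = next(new_ids), next(new_ids)
        parent_generation += [father, mother]
        for child in members:
            parents[child] = (father, mother)
    return parents, parent_generation


parents, generation = create_parents(["a", "b", "c", "d"], [("a", "b"), ("b", "c")], count(100))
print(parents)  # a, b, c share parents (100, 101); d gets (102, 103)
```

Repeating the step with `generation` as the new set of individuals moves the reconstruction one generation further back. Note that every sibship gets its own parent pair, which is exactly the simple structure discussed next.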
Existing methods, including IPED, only consider pedigrees with simple structure; they cannot handle populations where, for example, two children share only one parent. To improve pedigree reconstruction when populations have complex structure, we proposed the novel method IPED2. Our approach uses a new statistical test to detect half-sibling relationships and a new graph-based algorithm to reconstruct the pedigree when half-siblings are allowed.
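The effect of half-siblings is easiest to see as a constraint on parent assignments: full siblings must share both parents, while half-siblings share exactly one. A simple-structure pedigree, in which every sibship gets its own parent pair, cannot satisfy the half-sibling pairs. The Python check that follows is only an illustrative validator under that assumption. It is neither IPED2's statistical test nor its graph-based reconstruction algorithm, and all names in it are hypothetical.

```python
def shared_parents(parents, a, b):
    """How many parents individuals a and b have in common (0, 1, or 2)."""
    return len(set(parents[a]) & set(parents[b]))


def conflicting_pairs(parents, full_sibs, half_sibs):
    """Pairs whose parent assignment contradicts their predicted relationship."""
    bad = [(a, b) for a, b in full_sibs if shared_parents(parents, a, b) != 2]
    bad += [(a, b) for a, b in half_sibs if shared_parents(parents, a, b) != 1]
    return bad


full_sibs = [("x", "y")]
half_sibs = [("x", "z"), ("y", "z")]  # z shares only its mother with x and y

# Simple structure: every sibship gets its own parent pair, so z shares no parent.
simple = {"x": ("F1", "M1"), "y": ("F1", "M1"), "z": ("F2", "M2")}
# Half-sibling structure: x, y, and z all share mother M1.
with_half = {"x": ("F1", "M1"), "y": ("F1", "M1"), "z": ("F2", "M1")}

print(conflicting_pairs(simple, full_sibs, half_sibs))     # [('x', 'z'), ('y', 'z')]
print(conflicting_pairs(with_half, full_sibs, half_sibs))  # []
```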
In order to test the performance of our method on complicated pedigrees, we use simulated pedigrees with different parameter settings and, instead of genotype data, we simulate haplotypes directly. Our experiments show that IPED2 outperforms IPED and two other existing approaches for cases where there are half-siblings.
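Simulating haplotypes directly means passing phased chromosomes down the pedigree and deriving genotypes from them only when needed. The Python sketch that follows shows one minimal way to do this. It assumes biallelic sites, founders drawn with a fixed alternate-allele frequency, and a per-site switch probability as a crude stand-in for recombination. The parameter values and function names are hypothetical and are not the simulation settings used in the paper.

```python
import random


def transmit(parent_haps, rng, switch_prob=0.01):
    """Return one gamete: a mosaic of a parent's two haplotypes.

    Starts on a randomly chosen haplotype and switches to the other one
    between adjacent sites with probability switch_prob (a crude stand-in
    for recombination).
    """
    current = rng.randrange(2)
    gamete = []
    for site in range(len(parent_haps[0])):
        if site > 0 and rng.random() < switch_prob:
            current = 1 - current
        gamete.append(parent_haps[current][site])
    return gamete


def simulate_haplotypes(pedigree, founders, num_sites, alt_freq=0.3, seed=0):
    """pedigree maps child -> (father, mother); parents must appear before their children."""
    rng = random.Random(seed)
    # Founders get two independent haplotypes of 0/1 alleles.
    haps = {
        f: [[int(rng.random() < alt_freq) for _ in range(num_sites)] for _ in range(2)]
        for f in founders
    }
    # Each child inherits one gamete from each parent.
    for child, (father, mother) in pedigree.items():
        haps[child] = [transmit(haps[father], rng), transmit(haps[mother], rng)]
    return haps


def to_genotypes(haps):
    """Collapse phased haplotypes into unphased genotypes (alt-allele counts 0, 1, or 2)."""
    return {ind: [a + b for a, b in zip(*pair)] for ind, pair in haps.items()}


# Hypothetical three-generation pedigree.
pedigree = {"c1": ("F", "M"), "c2": ("F", "M"), "g1": ("c1", "S")}
haps = simulate_haplotypes(pedigree, founders=["F", "M", "S"], num_sites=50)
genos = to_genotypes(haps)
```

Keeping the haplotypes makes the true inheritance paths available for evaluation, while `to_genotypes` gives the unphased view that a genotype-based method would see.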
To our knowledge, this is the first method that can, using just genotype data, reconstruct pedigrees with half-siblings and inbreeding. IPED2 is also scalable to large pedigrees. In future work, we would like to consider additional genetic events, such as insertion, deletion, and replacement, to resolve the conflicts. We also plan to refine IPED2 to handle cases where genotypes of ancestral individuals are known and where genotypes of extant individuals who are not in the lowest generation are known.
For more information, see our paper, which is available for download through IEEE/ACM Transactions on Computational Biology and Bioinformatics: http://ieeexplore.ieee.org/abstract/document/7888513/.
In addition, the open source implementation of IPED2, which was developed by Dan He, is freely available for download at http://genetics.cs.ucla.edu/Dan/Software/IPED2.html.
The full citation to our paper is:
He, D., Wang, Z., Parida, L. and Eskin, E., 2017. IPED2: Inheritance path based pedigree reconstruction algorithm for complicated pedigrees. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
(This post is authored by Lana Martin.)
Clarity is especially important when writing scientific methods papers, proposals, and reports. Journal referees and grant reviewers typically read many submissions in one sitting; they expect to quickly and easily understand the mechanics, significance, and potential contributions of your work. Once a project is published, readers expect to quickly and easily understand how they can use and apply your method in their own work.
Improving clarity of writing is an iterative process that involves a lot of practice in writing, editing one’s own writing, and editing the writing of others. Clear, orderly writing is not a natural tendency for most of us because we don’t normally speak that way in conversation! Similarly, academic specialization leaves us in the dark concerning the amount of detail necessary to make a piece accessible to a broader audience. For most people, developing an intentional practice around routine writing tasks is necessary in order to improve writing skills.
The first draft of any document can always be improved with multiple editing passes. One strategy to improve editing efficiency is to designate each editing pass to a specific editing component, keeping in mind your own personal weak areas. For example, you may first clean up mechanical errors such as spelling and grammar. Second, you may re-write sentences while considering a specific list of writing principles. Finally, editing for overall cohesion and completeness of ideas can be easier once you have clean copy to work with.
Here, we present five principles for clear writing on the sentence level. These guidelines are universal, yet particularly relevant to scientific and technical non-fiction writing.
1. Directly modify a verb. Often, when describing an action, our default inclination is to add a verb modifier later in the sentence—well after the verb appears. For example:
“…considering simultaneously the population structure…”
“…considering population structure simultaneously…”
This structure makes sense in conversation, because you can emphasize how you did something with tone and inflection. We read in a more linear fashion than we speak; in writing, consistently placing the adverb before the verb makes it clear to the reader which action the modifier belongs to. Consistently ordering verbs and verb modifiers is especially useful when listing a series of actions that are each modified differently, such as in a protocol.
“…simultaneously considering population structure…”
2. Front-load the star topic of a sentence. Another habit that we carry from conversation to writing is to bury the most important part of a sentence at the end. This may cause readers, particularly those less familiar with your subject matter, to re-read the sentence. Here, the specific concept (the star topic of the sentence) follows the general concept:
“As a result, a large number of false discoveries may be found in the common case where the cell type composition is correlated with the phenotype.”
When reading about a methodology problem, we usually want to first know what is specifically interesting about a concept, and then learn about the concept’s significance on a larger scale. These “flipped” sentences are common in first drafts and can be easily edited in a single pass.
“As a result, the cell type composition is commonly correlated with the phenotype, and the methods produce a large number of false discoveries.”
3. Refine use of the dependent clause. A dependent clause is a group of words with a subject and a verb; alone, it is not a complete sentence and does not express a complete thought. We tend to use dependent clauses in writing because we tend to use dependent clauses in our own thought processes. This may suffice for problem-solving in our head-space vacuum, but, in order to effectively communicate with other people, we must completely describe these ideas in writing.
For example, this statement has two dependent clauses next to each other:
“Detecting allelic heterogeneity in regions that are more complicated is less intuitive.”
Given the provided information, it may not be clear what “more complicated” and “less intuitive” each describe. Moving “is less intuitive” directly after its subject, and leaving the relative clause “that are more complicated” attached to “regions,” clarifies that detection is what is less intuitive and that the regions are what is more complicated.
“Detecting allelic heterogeneity is less intuitive in regions that are more complicated.”
4. Replace a vague dependent clause with a compound sentence. Dependent clauses help present contrasts by defining the scope in which the given statement is valid, but they can also be vague and confusing. For example:
“In contrast to Mendelian traits, the extent of AH at loci contributing to common, complex disease is almost unknown.”
When reading scientific and technical writing, we want to see contrasts clearly described—especially for readers who may not have an in-depth understanding of the background concepts. We, as specialists, may not clearly define these concepts because we are not accustomed to working our way through the logic of fundamental ideas. Re-engineering the overly vague clause with a compound sentence can efficiently get the novice reader on the same page as the expert reader. The dependent clause is now a complete thought that stands on its own:
“The genetic causes of Mendelian traits are well understood, but the extent of AH at loci contributing to common, complex disease is almost unknown.”
5. Add, remove, or modify an article used before a noun. An article is a word (the, a, an) that is placed before a noun to indicate the type of reference being made by the noun. The use of articles is tricky and, at times, a matter of stylistic choice. However, in scientific and technical writing, there are a few best practices for using articles to improve clarity. For example, articles can specify the volume or numerical scope of the noun. When articles are used to clarify numerical scope, first decide if the noun is one (singular) or many (plural), then choose to include or omit the appropriate article.
Use the definite article “the” when you are referring to the one unique item or set of items. In descriptions of methodology, this type of article is commonly used to signal that the noun is a general concept, a broad system, or a one-and-only example.
Immunological properties is a general concept, which the author may separately define in detail:
“…the immunological properties of a B cell receptor…”
Adaptive immune system is a broad system composed of many parts:
“A key function of the adaptive immune system is…”
GTEx v6 project is one-and-only; a future GTEx release will presumably be v7!
“…the Genotype Tissue Expression (GTEx v6) project…”
Use the indefinite articles “a” or “an” when referring to a general type or group of items. In descriptions of methodology, this type of article is commonly used to signal that the noun can be any member of a group. “A” is placed before a word that begins with a consonant sound; “an” is placed before a word that begins with a vowel sound, which is why we write “an assay-based protocol” but “a useful tool.”
Assay-based protocol is a type of protocol:
“In contrast to an assay-based protocol…”
Useful tool is a type of tool:
“…ImReP provides a useful tool for mining large-scale RNA-Seq datasets …”
When using a plural noun, we typically omit the indefinite article.
“In contrast to assay-based protocols…”
In a hypothetical scenario, if ImReP actually provides not one—but many—useful tools:
“…ImReP provides useful tools for mining large-scale RNA-Seq datasets …”
Developing an intentional writing practice can be as simple as scanning your work for sentences with potential for improvement. If you designate editing passes to specific mechanical errors or types of sentence-level improvement, writing in a consistent, clear manner may become more habitual and feel less like an exercise in foreign language class. In upcoming blog posts, we will discuss more ways to efficiently improve the structure and readability of papers.
In addition, we have written numerous blog posts on strategies for writing papers:
- Writing Tips: Improving Clarity on the Sentence Level
- Writing Tips: An Authorship Policy that Maximizes Collaboration
- Writing Tips: Why we Publish Methods Papers
- Writing Tips: Results Subsections
- Writing Tips: Methods Overview
- Writing Tips: Introduction
- Writing Tips: How we Edit
- Writing Tips: Getting Organized (and Staying that Way)
- Writing Tips: Motivation (or the Lack of It)
- Writing Tips: Overcoming Writer’s Block
Our group has also published numerous blog posts on managing scientific labs and strategizing a graduate career. Articles presenting our advice on these subjects have become the top-viewed posts on our website: http://www.zarlab.xyz/advice/.
Farhad Hormozdiari and Eleazar Eskin recently applied an extension of CAVIAR to assess signals of selection in individuals of European ancestry. CAVIAR is a probabilistic method that detects a confidence set of SNPs containing all the causal variants in a locus with a predefined probability (e.g., 90% or 95%), while taking into account biases generated by linkage disequilibrium. Farhad, now a post-doctoral scholar at Boston University, developed CAVIAR while a PhD student at UCLA.
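The confidence set can be illustrated with a small Python sketch. It assumes the posterior probability of each causal configuration (a set of SNPs that are jointly causal) has already been computed. In CAVIAR that step uses the observed association statistics and the linkage disequilibrium between SNPs, which this sketch does not model. A set's coverage is the total posterior of the configurations that fit entirely inside it, and SNPs are added greedily until coverage reaches the predefined probability `rho`. The greedy search, the function names, and the toy posteriors are our simplifications, not CAVIAR's implementation.

```python
def coverage(snp_set, config_posteriors):
    """P(all causal SNPs lie in snp_set): total posterior of configurations contained in it."""
    return sum(p for config, p in config_posteriors.items() if config <= snp_set)


def rho_confidence_set(snps, config_posteriors, rho=0.95):
    """Greedily add SNPs until the set contains all causal SNPs with probability >= rho."""
    # Marginal posterior that each SNP is causal; used only to break ties.
    marginal = {s: sum(p for c, p in config_posteriors.items() if s in c) for s in snps}
    chosen = frozenset()
    while coverage(chosen, config_posteriors) < rho and len(chosen) < len(snps):
        best = max(
            (s for s in snps if s not in chosen),
            key=lambda s: (coverage(chosen | {s}, config_posteriors), marginal[s]),
        )
        chosen = chosen | {best}
    return chosen


# Toy posteriors over causal configurations (hypothetical values that sum to 1).
config_posteriors = {
    frozenset({"rs1"}): 0.50,
    frozenset({"rs2"}): 0.20,
    frozenset({"rs1", "rs3"}): 0.25,
    frozenset({"rs4"}): 0.05,
}
result = rho_confidence_set(["rs1", "rs2", "rs3", "rs4"], config_posteriors, rho=0.9)
print(sorted(result))  # ['rs1', 'rs2', 'rs3']
```

In this toy case, rs4 is left out because the configurations involving it carry only 5% of the posterior, so the remaining three SNPs already capture the causal variants with 95% probability.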
This project was led by Matthew T. Buckley and Fernando Racimo at the University of California, Berkeley, and Morten E. Allentoft at the University of Copenhagen. Alleles with strong selection signals have been recently selected for and are thought to carry an evolutionary advantage for individuals in the population. Identifying these alleles helps expand our understanding of the selective pressures that shaped historic populations.
In order to analyze the selective processes in Europeans across space and time, the project compared sequencing data from FADS genes obtained from present-day and Bronze Age (5000 to 3000 years ago) Europeans. We focused on FADS genes because prior studies indicate they have been subjected to strong positive selection in Africa, South Asia, Greenland, and Europe. FADS genes encode fatty acid desaturases that are important for the conversion of short-chain polyunsaturated fatty acids (PUFAs) to long-chain PUFAs. Because these enzymes determine how efficiently the body produces long-chain PUFAs from dietary precursors, selective pressure on the FADS genes may be linked to dietary adaptations.
Other analyses conducted by the project show that alleles in the FADS2 gene display the strongest changes in allele frequency since the Bronze Age, and these changes are associated with expression changes and multiple lipid-related phenotypes. Farhad and Eleazar used CAVIAR to look for the presence of allelic heterogeneity, a phenomenon in which different mutations at the same locus cause the same phenotype. In an evolutionary context, its presence suggests that strong selective pressure likely acted upon the population.
Application of CAVIAR to genomic data from the 1000 Genomes Project and 54 Bronze Age Europeans revealed that specific causal variants within the FADS2 gene have been subjected to selective pressure. In particular, FADS2 shows evidence of allelic heterogeneity in three tissue types: transformed fibroblast cells (Pr(2 causal variants) = 0.72), left heart ventricle (Pr(2 causal variants) = 0.74), and whole blood (Pr(3 causal variants) = 0.74).
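Quantities such as Pr(2 causal variants) are posterior probabilities on the number of causal SNPs in a locus. Given configuration posteriors of the kind CAVIAR computes, one way to obtain them is to sum the posterior of every configuration with exactly k SNPs, as in the Python sketch that follows. The configuration posteriors are made-up toy values, and the 0.5 cutoff for flagging allelic heterogeneity is a hypothetical rule, not the criterion used in the paper.

```python
from collections import defaultdict


def causal_count_posterior(config_posteriors):
    """Pr(k causal variants): total posterior of configurations containing exactly k SNPs."""
    dist = defaultdict(float)
    for config, p in config_posteriors.items():
        dist[len(config)] += p
    return dict(sorted(dist.items()))


def flags_allelic_heterogeneity(config_posteriors, threshold=0.5):
    """Hypothetical rule: flag the locus if Pr(2 or more causal variants) exceeds threshold."""
    dist = causal_count_posterior(config_posteriors)
    return sum(p for k, p in dist.items() if k >= 2) > threshold


# Toy configuration posteriors for one locus (made-up values that sum to 1).
config_posteriors = {
    frozenset({"rs1"}): 0.2,
    frozenset({"rs1", "rs2"}): 0.6,
    frozenset({"rs1", "rs2", "rs3"}): 0.2,
}
print(causal_count_posterior(config_posteriors))       # {1: 0.2, 2: 0.6, 3: 0.2}
print(flags_allelic_heterogeneity(config_posteriors))  # True
```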
The project’s comparison of modern to Bronze Age European genomic data shows that selection has indeed strongly acted on the FADS gene cluster over the past 3000 years. The selective patterns observed in European data may be driven by a change in the dietary composition of fatty acids following the human transition from hunting-and-gathering to agriculture. As Europeans obtained more lipids from plants, rather than from fish and mammals, their genes adapted to optimize metabolism of these cereal-based lipids.
For more information, see our paper, which is available for download through Molecular Biology and Evolution: https://www.ncbi.nlm.nih.gov/pubmed/28333262.
The full citation to our paper is:
Buckley, M.T., Racimo, F., Allentoft, M.E., Jensen, M.K., Jonsson, A., Huang, H., Hormozdiari, F., Sikora, M., Marnetto, D., Eskin, E. and Jørgensen, M.E., 2017. Selection in Europeans on fatty acid desaturases associated with dietary changes. Molecular Biology and Evolution.
This project used a method introduced in a previous publication:
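Hormozdiari, F., Kostem, E., Kang, E.Y., Pasaniuc, B. and Eskin, E., 2014. Identifying causal variants at loci with multiple signals of association. Genetics, 198(2), pp. 497-508.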
CAVIAR was created by Farhad Hormozdiari, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and Eleazar Eskin. Visit the following page to download CAVIAR and eCAVIAR: http://genetics.cs.ucla.edu/caviar/.