1
Li J, Singh U, Bhandary P, Campbell J, Arendsee Z, Seetharam AS, Wurtele ES. Foster thy young: enhanced prediction of orphan genes in assembled genomes. Nucleic Acids Res 2021; 50:e37. [PMID: 34928390 PMCID: PMC9023268 DOI: 10.1093/nar/gkab1238] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/27/2021] [Revised: 10/22/2021] [Accepted: 12/02/2021] [Indexed: 02/06/2023] Open
Abstract
Proteins encoded by newly-emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates ab initio predictions of BRAKER and Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly-curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (as few as 11 percent, under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes, and giving the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
Affiliation(s)
- Jing Li
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA
- Urminder Singh
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Priyanka Bhandary
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Jacqueline Campbell
- Corn Insects and Crop Genetics Research Unit, US Department of Agriculture, Agricultural Research Service, Ames, IA 50014, USA
- Zebulun Arendsee
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Arun S Seetharam
- Genome Informatics Facility, Iowa State University, Ames, IA 50014, USA
- Eve Syrkin Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
2
Bhandary P, Seetharam AS, Arendsee ZW, Hur M, Wurtele ES. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2018; 267:32-47. [PMID: 29362097 DOI: 10.1016/j.plantsci.2017.10.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/17/2017] [Revised: 10/07/2017] [Accepted: 10/15/2017] [Indexed: 05/19/2023]
Abstract
More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality by more use of controlled vocabulary and by metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system.
Affiliation(s)
- Priyanka Bhandary
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Arun S Seetharam
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA 50011, USA
- Zebulun W Arendsee
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Manhoi Hur
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Eve Syrkin Wurtele
- Dept. of Genetics Development and Cell Biology, Iowa State University, Ames IA 50010, USA; Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA.
3
Morgan K, McGaughran A, Rödelsperger C, Sommer RJ. Variation in rates of spontaneous male production within the nematode species Pristionchus pacificus supports an adaptive role for males and outcrossing. BMC Evol Biol 2017; 17:57. [PMID: 28228092 PMCID: PMC5322664 DOI: 10.1186/s12862-017-0873-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/10/2016] [Accepted: 01/05/2017] [Indexed: 12/18/2022] Open
Abstract
Background: The nematode species Pristionchus pacificus has an androdioecious mating system in which populations consist of self-fertilizing hermaphrodites and relatively few males. The prevalence of males in such a system is likely to depend on the relative pros and cons of outcrossing. While outcrossing generates novel allelic combinations and can therefore increase adaptive potential, it may also disrupt the potentially beneficial consequences of repeated generations of selfing. These include purging of deleterious alleles, inheritance of co-adapted allele complexes, improved hermaphrodite fitness and increased population growth. Here we use experimental and population genetic approaches to test hypotheses relating to male production and outcrossing in laboratory and natural populations of P. pacificus sampled from the volcanic island of La Réunion. Results: We find a significant interaction between sampling locality and temperature treatment influencing rates of spontaneous male production in the laboratory. While strains isolated at higher altitude, cooler localities produce a higher proportion of male offspring at 25 °C relative to 20 or 15 °C, the reverse pattern is seen in strains isolated from warmer, low altitude localities. Linkage disequilibrium extends across long physical distances, but fails to approach levels reported for the partially selfing nematode species Caenorhabditis elegans. Finally, we find evidence for admixture between divergent genetic lineages. Conclusions: Elevated rates of laboratory male generation appear to occur under environmental conditions which differ from those experienced by populations in nature. Such elevated male generation may result in higher outcrossing rates, hence driving increased effective recombination and the creation of potentially adaptive novel allelic combinations. Patterns of linkage disequilibrium decay support selfing as the predominant reproductive strategy in P. pacificus. Finally, despite the potential for outcrossing depression, our results suggest admixture has occurred between distinct genetic lineages since their independent colonization of the island, suggesting outcrossing depression may not be uniform in this species. Electronic supplementary material: The online version of this article (doi:10.1186/s12862-017-0873-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Katy Morgan
- Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany; Department of Biological Sciences, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148, USA.
- Angela McGaughran
- Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany; CSIRO Land & Water, Black Mountain Laboratories, Clunies Ross Street, Canberra, ACT 2601, Australia; University of Melbourne, School of BioSciences, 30 Flemington Road, Melbourne, VIC, 3010, Australia
- Christian Rödelsperger
- Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
- Ralf J Sommer
- Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
4
Kaplan REW, Baugh LR. L1 arrest, daf-16/FoxO and nonautonomous control of post-embryonic development. WORM 2016; 5:e1175196. [PMID: 27383290 DOI: 10.1080/21624054.2016.1175196] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/21/2016] [Accepted: 03/31/2016] [Indexed: 10/22/2022]
Abstract
Post-embryonic development is governed by nutrient availability. L1 arrest, dauer formation and aging illustrate how starvation, anticipation of starvation and caloric restriction, respectively, have profound influence on C. elegans development. Insulin-like signaling through the Forkhead box O transcription factor daf-16/FoxO regulates each of these processes. We recently reported that ins-4, ins-6 and daf-28 promote L1 development from the intestine and chemosensory neurons, similar to their role in dauer development. daf-16 functions cell-nonautonomously in regulation of L1 arrest, dauer development and aging. Discrepancies in daf-16 sites of action have been reported in each context, but the consensus implicates epidermis, intestine and nervous system. We suggest that technical limitations of the experimental approach are responsible for the discrepant results. Steroid hormone signaling through daf-12/NHR is known to function downstream of daf-16 in control of dauer development, but signaling pathways mediating cell-nonautonomous effects of daf-16 in aging and L1 arrest had not been identified. We recently showed that daf-16 promotes L1 arrest by inhibiting daf-12/NHR and dbl-1/TGF-β Sma/Mab signaling, two pathways that promote L1 development in fed larvae. We will review these results on L1 arrest and speculate on why there are so many signals and signaling centers regulating post-embryonic development.
Affiliation(s)
- L Ryan Baugh
- Department of Biology, Duke University, Durham, NC, USA