1
Fesenko I, Sahakyan H, Dhyani R, Shabalina SA, Storz G, Koonin EV. The hidden bacterial microproteome. Mol Cell 2025; 85:1024-1041.e6. [PMID: 39978337 PMCID: PMC11890958 DOI: 10.1016/j.molcel.2025.01.025] [Received: 06/10/2024] [Revised: 11/05/2024] [Accepted: 01/22/2025] [Indexed: 02/22/2025]
Abstract
Microproteins encoded by small open reading frames comprise the "dark matter" of proteomes. Although microproteins have been detected in diverse organisms from all three domains of life, many more remain to be identified, and only a few have been functionally characterized. In this comprehensive study of intergenic small open reading frames (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identify 67,297 clusters of ismORFs subject to purifying selection. Expression of tagged Escherichia coli microproteins is detected for 11 of the 16 tested, validating the predictions. Although the ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins, some globular folds, oligomeric structures, and possible interactions with proteins encoded by neighboring genes are predicted. Complete information on the predicted microprotein families, including evidence of transcription and translation, and structure predictions are available as an easily searchable resource for investigation of microprotein functions.
Affiliation(s)
- Igor Fesenko
- Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Harutyun Sahakyan
- Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Rajat Dhyani
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- Svetlana A Shabalina
- Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA.
- Eugene V Koonin
- Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
2
Xia S, Chen J, Arsala D, Emerson JJ, Long M. Functional innovation through new genes as a general evolutionary process. Nat Genet 2025; 57:295-309. [PMID: 39875578 DOI: 10.1038/s41588-024-02059-0] [Received: 06/12/2024] [Accepted: 12/15/2024] [Indexed: 01/30/2025]
Abstract
In the past decade, our understanding of how new genes originate in diverse organisms has advanced substantially, and more than a dozen molecular mechanisms for generating initial gene structures were identified, in addition to gene duplication. These new genes have been found to integrate into and modify pre-existing gene networks primarily through mutation and selection, revealing new patterns and rules with stable origination rates across various organisms. This progress has challenged the prevailing belief that new proteins evolve from pre-existing genes, as new genes may arise de novo from noncoding DNA sequences in many organisms, with high rates observed in flowering plants. New genes have important roles in phenotypic and functional evolution across diverse biological processes and structures, with detectable fitness effects of sexual conflict genes that can shape species divergence. Such knowledge of new genes can be of translational value in agriculture and medicine.
Affiliation(s)
- Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- Jianhai Chen
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- J J Emerson
- Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA.
3
Jin GT, Xu YC, Hou XH, Jiang J, Li XX, Xiao JH, Bian YT, Gong YB, Wang MY, Zhang ZQ, Zhang YE, Zhu WS, Liu YX, Guo YL. A de novo Gene Promotes Seed Germination Under Drought Stress in Arabidopsis. Mol Biol Evol 2025; 42:msae262. [PMID: 39719058 PMCID: PMC11721784 DOI: 10.1093/molbev/msae262] [Received: 07/03/2024] [Revised: 10/29/2024] [Accepted: 12/06/2024] [Indexed: 12/26/2024]
Abstract
The origin of genes from noncoding sequences is a long-term and fundamental biological question. However, how de novo genes originate and integrate into the existing pathways to regulate phenotypic variations is largely unknown. Here, we selected 7 genes from 782 de novo genes for functional exploration based on transcriptional and translational evidence. Subsequently, we revealed that Sun Wu-Kong (SWK), a de novo gene that originated from a noncoding sequence in Arabidopsis thaliana, plays a role in seed germination under osmotic stress. SWK is primarily expressed in dry seed, imbibing seed and silique. SWK can be fully translated into an 8 kDa protein, which is mainly located in the nucleus. Intriguingly, SWK was integrated into an extant pathway of hydrogen peroxide content (folate synthesis pathway) via the upstream gene cytHPPK/DHPS, an Arabidopsis-specific gene that originated from the duplication of mitHPPK/DHPS, and downstream gene GSTF9, to improve seed germination in osmotic stress. In addition, we demonstrated that the presence of SWK may be associated with drought tolerance in natural populations of Arabidopsis. Overall, our study highlights how a de novo gene originated and integrated into the existing pathways to regulate stress adaptation.
Affiliation(s)
- Guang-Teng Jin
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Xing-Hui Hou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Juan Jiang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xin-Xin Li
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jia-Hui Xiao
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yu-Tao Bian
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yan-Bo Gong
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ming-Yu Wang
- State Key Laboratory of Maize Bio-breeding/College of Plant Protection, China Agricultural University, Beijing 100193, China
- Zhi-Qin Zhang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yong E Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents and Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Wang-Sheng Zhu
- State Key Laboratory of Maize Bio-breeding/College of Plant Protection, China Agricultural University, Beijing 100193, China
- Yong-Xiu Liu
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
4
Cherezov RO, Vorontsova JE, Kuvaeva EE, Akishina AA, Zavoloka EL, Simonova OB. The lawc gene emerged de novo from conserved genomic elements and acquired a broad expression pattern in Drosophila. J Genet Genomics 2024:S1673-8527(24)00367-9. [PMID: 39733859 DOI: 10.1016/j.jgg.2024.12.014] [Received: 07/02/2024] [Revised: 12/17/2024] [Accepted: 12/18/2024] [Indexed: 12/31/2024]
Abstract
It has recently become evident that the de novo emergence of genes is widespread and documented for a variety of organisms. De novo genes frequently emerge in proximity to existing genes, forming gene overlaps. Here, we present an analysis of the evolutionary history of a putative de novo gene, lawc, which overlaps with the conserved Trf2 gene, which encodes a general transcription factor in Drosophila melanogaster. We demonstrate that lawc emerged approximately 68 million years ago in the 5'-untranslated region (UTR) of Trf2 and displays an extensive spatiotemporal expression pattern. One of the most remarkable features of the lawc evolutionary history is that its emergence was facilitated by the engagement of Drosophilidae-specific short, highly conserved regions located in Trf2 introns. This represents a unique example of putative de novo gene birth involving conserved DNA regions localized in introns of conserved genes. The observed lawc expression pattern may be due to the overlap of lawc with the 5'-UTR of Trf2. This study not only enriches our understanding of gene evolution but also highlights the complex interplay between genetic conservation and innovation.
Affiliation(s)
- Roman O Cherezov
- Kol'tsov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, 119334, Russia.
- Julia E Vorontsova
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, 119334, Russia
- Elena E Kuvaeva
- Kol'tsov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, 119334, Russia
- Angelina A Akishina
- Kol'tsov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, 119334, Russia
- Ekaterina L Zavoloka
- Kol'tsov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, 119334, Russia
- Olga B Simonova
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, 119334, Russia
5
Guay SY, Patel PH, Thomalla JM, McDermott KL, O'Toole JM, Arnold SE, Obrycki SJ, Wolfner MF, Findlay GD. An orphan gene is essential for efficient sperm entry into eggs in Drosophila melanogaster. bioRxiv 2024:2024.08.08.607187. [PMID: 39149251 PMCID: PMC11326263 DOI: 10.1101/2024.08.08.607187] [Indexed: 08/17/2024]
Abstract
While spermatogenesis has been extensively characterized in the Drosophila melanogaster model system, very little is known about the genes required for fly sperm entry into eggs. We identified a lineage-specific gene, which we named katherine johnson (kj), that is required for efficient fertilization. Males that do not express kj produce and transfer sperm that are stored normally in females, but sperm from these males enter eggs with severely reduced efficiency. Using a tagged transgenic rescue construct, we observed that the KJ protein localizes around the edge of the nucleus at various stages of spermatogenesis but is undetectable in mature sperm. These data suggest that kj exerts an effect on sperm development, the loss of which results in reduced fertilization ability. Interestingly, KJ protein lacks detectable sequence similarity to any other known protein, suggesting that kj could be a lineage-specific orphan gene. While previous bioinformatic analyses indicated that kj was restricted to the melanogaster group of Drosophila, we identified putative orthologs with conserved synteny, male-biased expression, and predicted protein features across the genus, as well as likely instances of gene loss in some lineages. Thus, kj was likely present in the Drosophila common ancestor and subsequently evolved an essential role in fertility in D. melanogaster. Our results demonstrate a new aspect of male reproduction that has been shaped by a lineage-specific gene and provide a molecular foothold for further investigating the mechanism of sperm entry into eggs in Drosophila.
Affiliation(s)
- Sara Y Guay
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Prajal H Patel
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Jonathon M Thomalla
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
- Kerry L McDermott
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Jillian M O'Toole
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Sarah E Arnold
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Sarah J Obrycki
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Mariana F Wolfner
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
6
Aldrovandi S, Fajardo Castro J, Ullrich K, Karger A, Luria V, Tautz D. Expression of Random Sequences and de novo Evolved Genes From the Mouse in Human Cells Reveals Functional Diversity and Specificity. Genome Biol Evol 2024; 16:evae175. [PMID: 39663928 PMCID: PMC11635099 DOI: 10.1093/gbe/evae175] [Accepted: 08/01/2024] [Indexed: 12/13/2024]
Abstract
Proteins that emerge de novo from noncoding DNA could negatively or positively influence cellular physiology in the sense of providing a possible adaptive advantage. Here, we employ two approaches to study such effects in a human cell line by expressing random sequences and mouse de novo genes that lack homologs in the human genome. We show that both approaches lead to differential growth effects of the cell clones dependent on the sequences they express. For the random sequences, 53% of the clones decreased in frequency, and about 8% increased in frequency in a joint growth experiment. Of the 14 mouse de novo genes tested in a similar joint growth experiment, 10 decreased, and 3 increased in frequency. When individually analysed, each mouse de novo gene triggers a unique transcriptomic response in the human cells, indicating mostly specific rather than generalized effects. Structural analysis of the de novo gene open reading frames (ORFs) reveals a range of intrinsic disorder scores and/or foldability into alpha-helices or beta sheets, but these do not correlate with their effects on the growth of the cells. Our results indicate that de novo evolved ORFs could easily become integrated into cellular regulatory pathways, since most interact with components of these pathways and could therefore become directly subject to positive selection if the general conditions allow this.
Affiliation(s)
- Silvia Aldrovandi
- Max-Planck Institute for Evolutionary Biology, Dept. Evol. Genetics, Plön 24306, Germany
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Johana Fajardo Castro
- Max-Planck Institute for Evolutionary Biology, Dept. Evol. Genetics, Plön 24306, Germany
- Science and Technology Academy, University of Kiel, Kiel 24118, Germany
- Kristian Ullrich
- Max-Planck Institute for Evolutionary Biology, Dept. Evol. Genetics, Plön 24306, Germany
- Amir Karger
- IT-Research Computing, Harvard Medical School, Boston, MA 02115, USA
- Victor Luria
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Diethard Tautz
- Max-Planck Institute for Evolutionary Biology, Dept. Evol. Genetics, Plön 24306, Germany
7
Zhao L, Svetec N, Begun DJ. De Novo Genes. Annu Rev Genet 2024; 58:211-232. [PMID: 39088850 PMCID: PMC12051474 DOI: 10.1146/annurev-genet-111523-102413] [Indexed: 08/03/2024]
Abstract
Although the majority of annotated new genes in a given genome appear to have arisen from duplication-related mechanisms, recent studies have shown that genes can also originate de novo from ancestrally nongenic sequences. Investigating de novo-originated genes offers rich opportunities to understand the origin and functions of new genes, their regulatory mechanisms, and the associated evolutionary processes. Such studies have uncovered unexpected and intriguing facets of gene origination, offering novel perspectives on the complexity of the genome and gene evolution. In this review, we provide an overview of the research progress in this field, highlight recent advancements, identify key technical and conceptual challenges, and underscore critical questions that remain to be addressed.
Affiliation(s)
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Nicolas Svetec
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- David J Begun
- Department of Evolution and Ecology, University of California, Davis, California, USA
8
Papadopoulos C, Arbes H, Cornu D, Chevrollier N, Blanchet S, Roginski P, Rabier C, Atia S, Lespinet O, Namy O, Lopes A. The ribosome profiling landscape of yeast reveals a high diversity in pervasive translation. Genome Biol 2024; 25:268. [PMID: 39402662 PMCID: PMC11472626 DOI: 10.1186/s13059-024-03403-7] [Received: 09/11/2023] [Accepted: 09/26/2024] [Indexed: 10/19/2024]
Abstract
BACKGROUND: Pervasive translation is a widespread phenomenon that plays a critical role in the emergence of novel microproteins, but the diversity of translation patterns contributing to their generation remains unclear. Based on 54 ribosome profiling (Ribo-Seq) datasets, we investigated the yeast Ribo-Seq landscape using a representation framework that allows the comprehensive inventory and classification of the entire diversity of Ribo-Seq signals, including non-canonical ones.
RESULTS: We show that if coding regions occupy specific areas of the Ribo-Seq landscape, noncoding regions encompass a wide diversity of Ribo-Seq signals and, conversely, populate the entire landscape. Our results show that pervasive translation can, nevertheless, be associated with high specificity, with 1055 noncoding ORFs exhibiting canonical Ribo-Seq signals. Using mass spectrometry under standard conditions or proteasome inhibition with an in-house analysis protocol, we report 239 microproteins originating from noncoding ORFs that display canonical but also non-canonical Ribo-Seq signals. Each condition yields dozens of additional microprotein candidates with comparable translation properties, suggesting a larger population of volatile microproteins that are challenging to detect. Our findings suggest that non-canonical translation signals may harbor valuable information and underscore the significance of considering them in proteogenomic studies. Finally, we show that the translation outcome of a noncoding ORF is primarily determined by the initiating codon and the codon distribution in its two alternative frames, rather than features indicative of functionality.
CONCLUSION: Our results enable us to propose a topology of a species' Ribo-Seq landscape, opening the way to comparative analyses of this translation landscape under different conditions.
Affiliation(s)
- Chris Papadopoulos
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Hospital del Mar Research Institute, Barcelona, Spain
- Hugo Arbes
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- David Cornu
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Sandra Blanchet
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Paul Roginski
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Camille Rabier
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Safiya Atia
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Olivier Lespinet
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Olivier Namy
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France
- Anne Lopes
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, Cedex, 91198, France.
9
Chen JH, Landback P, Arsala D, Guzzetta A, Xia S, Atlas J, Sosa D, Zhang YE, Cheng J, Shen B, Long M. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. bioRxiv 2024:2023.11.14.567139. [PMID: 38045239 PMCID: PMC10690195 DOI: 10.1101/2023.11.14.567139] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are genetic novelties pivotal in mammalian evolution. However, their phenotypic impacts and evolutionary patterns over time remain elusive in humans due to the technical and ethical complexities of functional studies. Integrating gene age dating with Mendelian disease phenotyping, our research shows a gradual rise in disease gene proportion as gene age increases. Logistic regression modeling indicates that this increase in older genes may be related to their longer sequence lengths and higher burdens of deleterious de novo germline variants (DNVs). We also find a steady integration of new genes with biomedical phenotypes into the human genome over macroevolutionary timescales (~0.07% per million years). Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures across gene ages. Notably, young genes show significant enrichment in diseases related to the male reproductive system, indicating strong sexual selection. Young genes also exhibit disease-related functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, musculoskeletal phenotypes, and color vision. We further reveal a logistic growth pattern of pleiotropy over evolutionary time, indicating a diminishing marginal growth of new functions for older genes due to intensifying selective constraints over time. We propose a "pleiotropy-barrier" model that delineates higher potentials for phenotypic innovation in young genes compared to older genes, a process that is subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
Affiliation(s)
- Jian-Hai Chen
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Patrick Landback
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Alexander Guzzetta
- Department of Pathology, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Jared Atlas
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Dylan Sosa
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Yong E. Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Jingqiu Cheng
- Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Bairong Shen
- Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
10
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, Structure, and Functional Space of Drosophila De Novo Proteins. Genome Biol Evol 2024; 16:evae176. [PMID: 39212966 PMCID: PMC11363682 DOI: 10.1093/gbe/evae176] [Accepted: 07/29/2024] [Indexed: 09/04/2024]
Abstract
During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine learning based tools and found that most de novo proteins are indeed different from conserved proteins both in their structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.
Affiliation(s)
- Lasse Middendorf
- Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Bharat Ravi Iyengar
- Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Lars A Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
11
Chen J, Li Q, Xia S, Arsala D, Sosa D, Wang D, Long M. The Rapid Evolution of De Novo Proteins in Structure and Complex. Genome Biol Evol 2024; 16:evae107. [PMID: 38753069 PMCID: PMC11149777 DOI: 10.1093/gbe/evae107] [Accepted: 05/10/2024] [Indexed: 06/06/2024]
Abstract
Recent genome-wide studies in rice have established that de novo genes, evolving from noncoding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their evolution in protein structure over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes) with comparative approaches to gene duplicates. We found that de novo genes evolve faster than gene duplicates in the intrinsically disordered regions (such as random coils), secondary structure elements (such as α helix and β strand), hydrophobicity, and molecular recognition features. In de novo proteins, specifically, we observed an 8% to 14% decay in random coils and intrinsically disordered region lengths and a 2.3% to 6.5% increase in structured elements, hydrophobicity, and molecular recognition features, per million years on average. These patterns of structural evolution align with changes in amino acid composition over time as well. We also revealed higher positive charges but smaller molecular weights for de novo proteins than duplicates. Tertiary structure predictions showed that most de novo proteins, though not typically well folded on their own, readily form low-energy and compact complexes with other proteins facilitated by extensive residue contacts and conformational flexibility, suggesting a faster-binding scenario in de novo proteins to promote interaction. These analyses illuminate a rapid evolution of protein structure in de novo genes in rice genomes, originating from noncoding sequences, highlighting their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.
Affiliation(s)
- Jianhai Chen
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
| | - Qingrong Li
- Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
| | - Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
| | - Dylan Sosa
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
| | - Dong Wang
- Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
|
12
|
Linnenbrink M, Breton G, Misra P, Pfeifle C, Dutheil JY, Tautz D. Experimental Evaluation of a Direct Fitness Effect of the De Novo Evolved Mouse Gene Pldi. Genome Biol Evol 2024; 16:evae084. [PMID: 38742287 PMCID: PMC11091481 DOI: 10.1093/gbe/evae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
De novo evolved genes emerge from random parts of noncoding sequences and have, therefore, no homologs from which a function could be inferred. While expression analysis and knockout experiments can provide insights into the function, they do not directly test whether the gene is beneficial for its carrier. Here, we have used a seminatural environment experiment to test the fitness of the previously identified de novo evolved mouse gene Pldi, which has been implicated in sperm differentiation. We used a knockout mouse strain for this gene and competed it against its parental wildtype strain for several generations of free reproduction. We found that the knockout (ko) allele frequency decreased consistently across three replicates of the experiment. Using an approximate Bayesian computation framework that simulated the data under a demographic scenario mimicking the experiment's demography, we could estimate a selection coefficient ranging between 0.21 and 0.61 for the wildtype allele compared to the ko allele in males, under various models. This implies a relatively strong selective advantage, which would fix the new gene in fewer than a hundred generations after its emergence.
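As a rough illustration of why selection coefficients in the reported 0.21 to 0.61 range imply a fast sweep, the sketch below uses a textbook deterministic genic-selection model. This is not the approximate Bayesian computation framework used in the study: it ignores drift, diploidy, dominance, and the male-specific nature of the estimate, and the start and end frequencies (0.001 and 0.999) and both function names are assumptions chosen only for illustration.

```python
import math


def next_frequency(p: float, s: float) -> float:
    """One generation of deterministic genic selection.

    The favored allele has relative fitness 1 + s and the alternative has
    fitness 1, so the allele's odds p / (1 - p) grow by a factor of (1 + s).
    """
    return p * (1.0 + s) / (1.0 + s * p)


def sweep_generations(s: float, p_start: float = 0.001, p_end: float = 0.999) -> float:
    """Generations needed to rise from p_start to p_end under the model above.

    p_start and p_end are hypothetical illustration values, not estimates
    taken from the study.
    """
    if s <= 0:
        raise ValueError("s must be positive for the allele to increase")
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    # The odds multiply by (1 + s) each generation, so solve
    # odds_start * (1 + s) ** t = odds_end for t.
    return math.log(odds_end / odds_start) / math.log(1.0 + s)


if __name__ == "__main__":
    for s in (0.21, 0.61):
        t = sweep_generations(s)
        print(f"s = {s:.2f}: about {t:.0f} generations from 0.1% to 99.9%")
```

Under these assumptions both ends of the reported range give sweep times well below a hundred generations, in line with the abstract's statement; a stochastic model with finite population size would be needed for an actual fixation-time estimate.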
Affiliation(s)
- Miriam Linnenbrink
- Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Present address: Max Planck Institute for Biological Intelligence, 82152 Martinsried, Germany
| | - Gwenna Breton
- Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Present address: Clinical Genomics Gothenburg, Science for Life Laboratory, Sahlgrenska Academy, University of Gothenburg, and Center for Medical Genomics, Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Sweden
| | - Pallavi Misra
- Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Present address: Laboratory Corporation of America (LabCorp), Westborough, MA 01581, USA
| | - Christine Pfeifle
- Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
| | - Julien Y Dutheil
- Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
| | - Diethard Tautz
- Department of Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
|
13
|
Fesenko I, Sahakyan H, Shabalina SA, Koonin EV. The Cryptic Bacterial Microproteome. bioRxiv 2024:2024.02.17.580829. [PMID: 38903115 PMCID: PMC11188072 DOI: 10.1101/2024.02.17.580829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Microproteins encoded by small open reading frames (smORFs) comprise the "dark matter" of proteomes. Although functional microproteins were identified in diverse organisms from all three domains of life, bacterial smORFs remain poorly characterized. In this comprehensive study of intergenic smORFs (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identified 67,297 clusters of ismORFs subject to purifying selection. The ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins. Using AlphaFold Multimer, we predicted interactions of some of the predicted microproteins encoded by transcribed ismORFs with proteins encoded by neighboring genes, revealing the potential of microproteins to regulate the activity of various proteins, particularly, under stress. We compiled a catalog of predicted microprotein families with different levels of evidence from synteny analysis, structure prediction, and transcription and translation data. This study offers a resource for investigation of biological functions of microproteins.
Affiliation(s)
- Igor Fesenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Harutyun Sahakyan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Svetlana A. Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
14
|
Chen J. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. Research Square 2023:rs.3.rs-3632644. [PMID: 38045389 PMCID: PMC10690325 DOI: 10.21203/rs.3.rs-3632644/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are structural novelties pivotal in mammalian evolution. Their phenotypic impact on humans, however, remains elusive due to the technical and ethical complexities in functional studies. Through combining gene age dating with Mendelian disease phenotyping, our research reveals that new genes associated with disease phenotypes steadily integrate into the human genome at a rate of ~0.07% every million years over macroevolutionary timescales. Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures between young and old genes. Notably, young genes show significant enrichment in the male reproductive system, indicating strong sexual selection. Young genes also exhibit functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, bipedal locomotion, and color vision. Our findings further reveal increasing levels of pleiotropy over evolutionary time, which accompanies stronger selective constraints. We propose a "pleiotropy-barrier" model that delineates different potentials for phenotypic innovation between young and older genes subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
|
15
|
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
|
16
|
Liu J, Yuan R, Shao W, Wang J, Silman I, Sussman JL. Do "Newly Born" orphan proteins resemble "Never Born" proteins? A study using three deep learning algorithms. Proteins 2023. [PMID: 37092778 DOI: 10.1002/prot.26496] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/02/2022] [Revised: 02/26/2023] [Accepted: 04/01/2023] [Indexed: 04/25/2023]
Abstract
"Newly Born" proteins, devoid of detectable homology to any other proteins, known as orphan proteins, occur in a single species or within a taxonomically restricted gene family. They are generated by the expression of novel open reading frames, and appear throughout evolution. We were curious if three recently developed programs for predicting protein structures, namely, AlphaFold2, RoseTTAFold, and ESMFold, might be of value for comparison of such "Newly Born" proteins to random polypeptides with amino acid content similar to that of native proteins, which have been called "Never Born" proteins. The programs were used to compare the structures of two sets of "Never Born" proteins that had been expressed: Group 1, which had been shown experimentally to possess substantial secondary structure, and Group 3, which had been shown to be intrinsically disordered. Overall, although the models generated were scored as being of low quality, they nevertheless revealed some general principles. Specifically, all four members of Group 1 were predicted to be compact by all three algorithms, in agreement with the experimental data, whereas the members of Group 3 were predicted to be very extended, as would be expected for intrinsically disordered proteins, again consistent with the experimental data. These predicted differences were shown to be statistically significant by comparing their accessible surface areas. The three programs were then used to predict the structures of three orphan proteins whose crystal structures had been solved, two of which display novel folds. Surprisingly, only for the protein which did not have a novel fold, and was taxonomically restricted, rather than being a true orphan, did all three algorithms predict very similar, high-quality structures, closely resembling the crystal structure. Finally, they were used to predict the structures of seven orphan proteins with well-identified biological functions, whose 3D structures are not known. Two proteins, which were predicted to be disordered based on their sequences, are predicted by all three structure algorithms to be extended structures. The other five were predicted to be compact structures with only two exceptions in the case of AlphaFold2. All three prediction algorithms make remarkably similar and high-quality predictions for one large protein, HCO_11565, from a nematode. It is conjectured that this is due to many homologs in the taxonomically restricted family of which it is a member, and to the fact that the Dali server revealed several nonrelated proteins with similar folds. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Proteins:3.
Affiliation(s)
- Jing Liu
- Department of Biotechnology and Food Engineering, Guangdong Technion-Israel Institute of Technology, Shantou, China
- Faculty of Biotechnology and Food Engineering, Technion-Israel Institute of Technology, Haifa, Israel
| | - Rongqing Yuan
- Department of Chemistry, Tsinghua University, Beijing, China
| | - Wei Shao
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Jitong Wang
- Department of Chemistry, Tsinghua University, Beijing, China
| | - Israel Silman
- Department of Brain Sciences, The Weizmann Institute of Science, Rehovot, Israel
| | - Joel L Sussman
- Department of Chemical and Structural Biology, The Weizmann Institute of Science, Rehovot, Israel
|
17
|
Heames B, Buchel F, Aubel M, Tretyachenko V, Loginov D, Novák P, Lange A, Bornberg-Bauer E, Hlouchová K. Experimental characterization of de novo proteins and their unevolved random-sequence counterparts. Nat Ecol Evol 2023; 7:570-580. [PMID: 37024625 PMCID: PMC10089919 DOI: 10.1038/s41559-023-02010-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 01/14/2022] [Accepted: 02/10/2023] [Indexed: 04/08/2023]
Abstract
De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and typically assumed to lack defined structure. While it remains unclear how likely a de novo protein is to assume a soluble and stable tertiary structure, intersecting evidence from random sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Taking putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have remarkably similar distributions of biophysical properties to unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility which is further induced by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated in the cellular system than random expectation, given their higher solubility.
Affiliation(s)
- Brennen Heames
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Filip Buchel
- Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
- Department of Biochemistry, Charles University, Prague, Czech Republic
| | - Margaux Aubel
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | | | - Dmitry Loginov
- Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
| | - Petr Novák
- Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
| | - Andreas Lange
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
- Department of Protein Evolution, MPI for Developmental Biology, Tübingen, Germany.
| | - Klára Hlouchová
- Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic.
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic.
|
18
|
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Accepted: 03/17/2023] [Indexed: 03/31/2023]
Abstract
Background: De novo protein coding genes emerge from scratch in the non-coding regions of the genome and have, per definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from flDPnn which has been found to outperform most other predictors in a comparative assessment study recently. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins. 
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
| | - Lars Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Department Protein Evolution, Max Planck-Institute for Biology, Tuebingen, 72076, Germany
|
19
|
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance, driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
|
20
|
Venkataraman K, Shai N, Lakhiani P, Zylka S, Zhao J, Herre M, Zeng J, Neal LA, Molina H, Zhao L, Vosshall LB. Two novel, tightly linked, and rapidly evolving genes underlie Aedes aegypti mosquito reproductive resilience during drought. eLife 2023; 12:e80489. [PMID: 36744865 PMCID: PMC10076016 DOI: 10.7554/elife.80489] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/22/2022] [Accepted: 01/29/2023] [Indexed: 02/07/2023]
Abstract
Female Aedes aegypti mosquitoes impose a severe global public health burden as vectors of multiple viral pathogens. Under optimal environmental conditions, Aedes aegypti females have access to human hosts that provide blood proteins for egg development, conspecific males that provide sperm for fertilization, and freshwater that serves as an egg-laying substrate suitable for offspring survival. As global temperatures rise, Aedes aegypti females are faced with climate challenges like intense droughts and intermittent precipitation, which create unpredictable, suboptimal conditions for egg-laying. Here, we show that under drought-like conditions simulated in the laboratory, females retain mature eggs in their ovaries for extended periods, while maintaining the viability of these eggs until they can be laid in freshwater. Using transcriptomic and proteomic profiling of Aedes aegypti ovaries, we identify two previously uncharacterized genes named tweedledee and tweedledum, each encoding a small, secreted protein that both show ovary-enriched, temporally-restricted expression during egg retention. These genes are mosquito-specific, linked within a syntenic locus, and rapidly evolving under positive selection, raising the possibility that they serve an adaptive function. CRISPR-Cas9 deletion of both tweedledee and tweedledum demonstrates that they are specifically required for extended retention of viable eggs. These results highlight an elegant example of taxon-restricted genes at the heart of an important adaptation that equips Aedes aegypti females with 'insurance' to flexibly extend their reproductive schedule without losing reproductive capacity, thus allowing this species to exploit unpredictable habitats in a changing world.
Affiliation(s)
- Krithika Venkataraman
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
| | - Nadav Shai
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
- Howard Hughes Medical Institute, New York, United States
| | - Priyanka Lakhiani
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
- Laboratory of Evolutionary Genetics and Genomics, Rockefeller University, New York, United States
| | - Sarah Zylka
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
| | - Jieqing Zhao
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
| | - Margaret Herre
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
- Kavli Neural Systems Institute, New York, United States
| | - Joshua Zeng
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
| | - Lauren A Neal
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
| | - Henrik Molina
- Proteomics Resource Center, Rockefeller University, New York, United States
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, Rockefeller University, New York, United States
| | - Leslie B Vosshall
- Laboratory of Neurogenetics and Behavior, Rockefeller University, New York, United States
- Howard Hughes Medical Institute, New York, United States
- Kavli Neural Systems Institute, New York, United States
|
21
|
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022]
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece.
| | - Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
| | - Kate M Duggan
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
| | - Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland.
|
22
|
Parikh SB, Houghton C, Van Oss SB, Wacholder A, Carvunis A. Origins, evolution, and physiological implications of de novo genes in yeast. Yeast 2022; 39:471-481. [PMID: 35959631 PMCID: PMC9544372 DOI: 10.1002/yea.3810] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/20/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 12/03/2022]
Abstract
De novo gene birth is the process by which new genes emerge in sequences that were previously noncoding. Over the past decade, researchers have taken advantage of the power of yeast as a model and a tool to study the evolutionary mechanisms and physiological implications of de novo gene birth. We summarize the mechanisms that have been proposed to explicate how noncoding sequences can become protein-coding genes, highlighting the discovery of pervasive translation of the yeast transcriptome and its presumed impact on evolutionary innovation. We summarize current best practices for the identification and characterization of de novo genes. Crucially, we explain that the field is still in its nascency, with the physiological roles of most young yeast de novo genes identified thus far still utterly unknown. We hope this review inspires researchers to investigate the true contribution of de novo gene birth to cellular physiology and phenotypic diversity across yeast strains and species.
Affiliation(s)
- Saurin B. Parikh
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - S. Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and EvolutionUniversity of PittsburghPittsburghPennsylvaniaUSA
| | - Anne‐Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, Pittsburgh Center for Evolutionary Biology and Evolution, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| |
|
23
|
Heinen T, Xie C, Keshavarz M, Stappert D, Künzel S, Tautz D. Evolution of a New Testis-Specific Functional Promoter Within the Highly Conserved Map2k7 Gene of the Mouse. Front Genet 2022; 12:812139. [PMID: 35069705 PMCID: PMC8766832 DOI: 10.3389/fgene.2021.812139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 12/08/2021] [Indexed: 12/03/2022]
Abstract
Map2k7 (synonym Mkk7) is a conserved regulatory kinase gene and a central component of the JNK signaling cascade with key functions during cellular differentiation. It shows complex transcription patterns, and different transcript isoforms are known in the mouse (Mus musculus). We have previously identified a newly evolved testis-specific transcript for the Map2k7 gene in the subspecies M. m. domesticus. Here, we identify the new promoter that drives this transcript and find that it codes for an open reading frame (ORF) of 50 amino acids. The new promoter was gained in the stem lineage of closely related mouse species but was secondarily lost in the subspecies M. m. musculus and M. m. castaneus. A single mutation can be correlated with its transcriptional activity in M. m. domesticus, and cell culture assays demonstrate the capability of this mutation to drive expression. A mouse knockout line in which the promoter region of the new transcript is deleted reveals a functional contribution of the newly evolved promoter to sperm motility and the spermatid transcriptome. Our data show that a new functional transcript (and possibly protein) can evolve within an otherwise highly conserved gene, supporting the notion of regulatory changes contributing to the emergence of evolutionary novelties.
Affiliation(s)
| | - Chen Xie
- Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Maryam Keshavarz
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen e. V. (DZNE), Bonn, Germany
| | - Dominik Stappert
- Deutsches Zentrum für Neurodegenerative Erkrankungen e. V. (DZNE), Bonn, Germany
| | - Sven Künzel
- Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Diethard Tautz
- Max Planck Institute for Evolutionary Biology, Plön, Germany
| |
|
24
|
Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel) 2021; 12:1913. [PMID: 34946861 PMCID: PMC8702183 DOI: 10.3390/genes12121913] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/17/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022]
Abstract
We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes in individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows the tracing of peptides of all lengths, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues) are more likely to increase in frequency, whereas longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder, and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered with lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by the cells or can provide a growth advantage.
Affiliation(s)
| | - Diethard Tautz
- Max Planck Institute for Evolutionary Biology, August-Thienemann Strasse 2, 24306 Plön, Germany
| |
|
25
|
Jin G, Ma PF, Wu X, Gu L, Long M, Zhang C, Li DZ. New Genes Interacted with Recent Whole Genome Duplicates in the Fast Stem Growth of Bamboos. Mol Biol Evol 2021; 38:5752-5768. [PMID: 34581782 PMCID: PMC8662795 DOI: 10.1093/molbev/msab288] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022]
Abstract
As drivers of evolutionary innovations, new genes allow organisms to explore new niches. However, clear examples of this process remain scarce. Bamboos, the unique grass lineage diversifying into the forest, have evolved with a key innovation of fast growth of the woody stem, reaching up to 1 m/day. Here, we identify 1,622 bamboo-specific orphan genes that appeared in the last 46 million years, 19 of which evolved from noncoding ancestral sequences with the entire de novo origination process reconstructed. The new genes evolved gradually in exon-intron structure, protein length, expression specificity, and evolutionary constraint. These new genes, whether or not from de novo origination, are dominantly expressed in the rapidly developing shoots, and make the transcriptomes of shoots the youngest among various bamboo tissues, rather than those of reproductive tissue as in other plants. Additionally, the particularity of bamboo shoots has also been shaped by recent whole-genome duplicates (WGDs), which evolved divergent expression patterns from ancestral states. New genes and WGDs have been evolutionarily recruited into coexpression networks to underline the fast-growing trait of the bamboo shoot. Our study highlights the importance of interactions between new genes and genome duplicates in generating morphological innovation.
Affiliation(s)
- Guihua Jin
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
| | - Peng-Fei Ma
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
| | - Xiaopei Wu
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
| | - Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, Fujian, 350002, China
| | - Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, 60637, USA
| | - Chengjun Zhang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
| | - De-Zhu Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
| |
|
26
|
Prabh N, Tautz D. Frequent lineage-specific substitution rate changes support an episodic model for protein evolution. G3 (Bethesda) 2021; 11:6372692. [PMID: 34542594 PMCID: PMC8664490 DOI: 10.1093/g3journal/jkab333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/13/2021] [Indexed: 12/04/2022]
Abstract
Since the inception of the molecular clock model for sequence evolution, the investigation of protein divergence has revolved around the question of a more or less constant change of amino acid sequences, with specific overall rates for each family. Although anomalies in clock-like divergence are well known, the assumption of a constant decay rate for a given protein family is usually taken as the null model for protein evolution. However, systematic tests of this null model at a genome-wide scale have lagged behind, despite the databases’ enormous growth. We focus here on divergence rate comparisons between very closely related lineages since this allows clear orthology assignments by synteny and reliable alignments, which are crucial for determining substitution rate changes. We generated a high-confidence dataset of syntenic orthologs from four ape species, including humans. We find that despite the appearance of an overall clock-like substitution pattern, several hundred protein families show lineage-specific acceleration and deceleration in divergence rates, or combinations of both in different lineages. Hence, our analysis uncovers a rather dynamic history of substitution rate changes, even between these closely related lineages, implying that one should expect that a large fraction of proteins will have had a history of episodic rate changes in deeper phylogenies. Furthermore, each of the lineages has a separate set of particularly fast diverging proteins. The genes with the highest percentage of branch-specific substitutions are ADCYAP1 in the human lineage (9.7%), CALU in chimpanzees (7.1%), SLC39A14 in the internal branch leading to humans and chimpanzees (4.1%), RNF128 in gorillas (9%), and S100Z in gibbons (15.2%). The mutational pattern in ADCYAP1 suggests a biased mutation process, possibly through asymmetric gene conversion effects. 
We conclude that a null model of constant change can be problematic for predicting the evolutionary trajectories of individual proteins.
Affiliation(s)
- Neel Prabh
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
| | - Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
| |
|
27
|
Rivard EL, Ludwig AG, Patel PH, Grandchamp A, Arnold SE, Berger A, Scott EM, Kelly BJ, Mascha GC, Bornberg-Bauer E, Findlay GD. A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster. PLoS Genet 2021; 17:e1009787. [PMID: 34478447 PMCID: PMC8445463 DOI: 10.1371/journal.pgen.1009787] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/29/2021] [Revised: 09/16/2021] [Accepted: 08/19/2021] [Indexed: 02/07/2023]
Abstract
Comparative genomics has enabled the identification of genes that potentially evolved de novo from non-coding sequences. Many such genes are expressed in male reproductive tissues, but their functions remain poorly understood. To address this, we conducted a functional genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster and identified one gene, atlas, required for male fertility. Detailed genetic and cytological analyses showed that atlas is required for proper chromatin condensation during the final stages of spermatogenesis. Atlas protein is expressed in spermatid nuclei and facilitates the transition from histone- to protamine-based chromatin packaging. Complementary evolutionary analyses revealed the complex evolutionary history of atlas. The protein-coding portion of the gene likely arose at the base of the Drosophila genus on the X chromosome but was unlikely to be essential, as it was then lost in several independent lineages. Within the last ~15 million years, however, the gene moved to an autosome, where it fused with a conserved non-coding RNA and evolved a non-redundant role in male fertility. Altogether, this study provides insight into the integration of novel genes into biological processes, the links between genomic innovation and functional evolution, and the genetic control of a fundamental developmental process, gametogenesis.
Affiliation(s)
- Emily L. Rivard
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Andrew G. Ludwig
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Prajal H. Patel
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | | | - Sarah E. Arnold
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | | | - Emilie M. Scott
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Brendan J. Kelly
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Grace C. Mascha
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Erich Bornberg-Bauer
- University of Münster, Münster, Germany
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Geoffrey D. Findlay
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| |
|
28
|
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022]
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| |
|
29
|
Genomic analyses of new genes and their phenotypic effects reveal rapid evolution of essential functions in Drosophila development. PLoS Genet 2021; 17:e1009654. [PMID: 34242211 PMCID: PMC8270118 DOI: 10.1371/journal.pgen.1009654] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/12/2021] [Accepted: 06/09/2021] [Indexed: 12/27/2022]
Abstract
It is a conventionally held dogma that the genetic basis underlying development is conserved over long evolutionary time scales. Ample experiments based on mutational, biochemical, functional, and complementary knockdown/knockout approaches have revealed the unexpectedly important role of recently evolved new genes in the development of Drosophila. The recent progress in the genome-wide experimental testing of gene effects and improvements in the computational identification of new genes (< 40 million years ago, Mya) open the door to investigating the evolution of gene essentiality with high phylogenetic resolution. These advancements also raised interesting issues in techniques and concepts related to phenotypic effect analyses of genes, particularly of those that recently originated. Here we report our analyses of these issues, including the reproducibility and efficiency of knockdown experiments and the differences between RNAi libraries in knockdown efficiency and in the testing of phenotypic effects. We further analyzed a large dataset from knockdowns of 11,354 genes (~75% of the Drosophila melanogaster total genes), including 702 new genes (~66% of the species total new genes that aged < 40 Mya), revealing a similarly high proportion (~32.2%) of essential genes that originated in various Sophophora subgenus lineages and distant ancestors beyond the Drosophila genus. The transcriptional compensation effect from CRISPR knockout was detected for highly similar duplicate copies. Knockout of a few young genes detected analogous essentiality in various functions in development. Taken together, our experimental and computational analyses provide valuable data for detection of phenotypic effects of genes in general and further strong evidence for the concept that new genes in Drosophila quickly evolved essential functions in viability during development.
|
30
|
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Zhang W, Tautz D. Dedicated transcriptomics combined with power analysis lead to functional understanding of genes with weak phenotypic changes in knockout lines. PLoS Comput Biol 2020; 16:e1008354. [PMID: 33180766 PMCID: PMC7685438 DOI: 10.1371/journal.pcbi.1008354] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/20/2020] [Revised: 11/24/2020] [Accepted: 09/20/2020] [Indexed: 12/26/2022]
Abstract
Systematic knockout studies in mice have shown that a large fraction of the gene replacements show no lethal or other overt phenotypes. This has led to the development of more refined analysis schemes, including physiological, behavioral, developmental and cytological tests. However, transcriptomic analyses have not yet been systematically evaluated for non-lethal knockouts. We conducted a power analysis to determine the experimental conditions under which even small changes in transcript levels can be reliably traced. We have applied this to two gene disruption lines of genes for which no function was known so far. Dedicated phenotyping tests informed by the tissues and stages of highest expression of the two genes show small effects on the tested phenotypes. For the transcriptome analysis of these stages and tissues, we used a prior power analysis to determine the number of biological replicates and the sequencing depth. We find that under these conditions, the knockouts have a significant impact on the transcriptional networks, with thousands of genes showing small transcriptional changes. GO analysis suggests that A930004D18Rik is involved in developmental processes through contributing to protein complexes, and A830005F24Rik in extracellular matrix functions. Subsampling analysis of the data reveals that increasing the number of biological replicates was more important than increasing the sequencing depth to arrive at these results. Hence, our proof-of-principle experiment suggests that transcriptomic analysis is indeed an option to study gene functions of genes with weak or no traceable phenotypic effects and it provides the boundary conditions under which this is possible.
Knockout mice benefit the understanding of gene functions in mammals. However, it has proven difficult for many genes to identify clear phenotypes, due to a lack of sufficient assays. As Lewis Wolpert put it in a famous quote “But did you take them to the opera?”, thus metaphorically alluding to the need to extend phenotyping efforts. This insight led to the establishment of phenotyping pipelines that are nowadays routinely used to characterize knock-out lines. However, transcriptomic approaches based on RNA-Seq have been much less explored for such deep-level studies. We conducted here both a theoretical power analysis and practical RNA-Seq experiments on two knockout lines with small phenotypic effects to investigate the parameters including sample size, sequencing depth, fold change, and dispersion. Our dedicated RNA-Seq studies discovered thousands of genes with small transcriptional changes and enriched in specific functions in both knockout lines. We find that it is more important to increase the number of samples than to increase the sequencing depth. Our work shows that a deep RNA-Seq study on knockouts is powerful for understanding gene functions in cases of weak phenotypic effects, and provides a guideline for the experimental design of such studies.
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Kristian K. Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Wenyu Zhang
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| |
|
31
|
Dowling D, Schmitz JF, Bornberg-Bauer E. Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage. Genome Biol Evol 2020; 12:2183-2195. [PMID: 33210146 PMCID: PMC7674706 DOI: 10.1093/gbe/evaa194] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Accepted: 09/12/2020] [Indexed: 12/12/2022]
Abstract
In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous adaptive de novo proteins. However, widespread translation of noncoding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. The dynamics of how de novo proteins emerge from potentially toxic raw materials and what influences their long-term survival are unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich gene-dense chromosomes, suggesting their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity, which have been proposed to play a role in survival of de novo genes, remain unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are concomitantly less likely to produce ORFs which code for harmful toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.
Affiliation(s)
- Daniel Dowling
- Institute for Evolution and Biodiversity, University of Münster, Germany
| | - Jonathan F Schmitz
- Institute for Evolution and Biodiversity, University of Münster, Germany
| | | |
|
32
|
Kiniry SJ, Michel AM, Baranov PV. Computational methods for ribosome profiling data analysis. Wiley Interdiscip Rev RNA 2020; 11:e1577. [PMID: 31760685 DOI: 10.1002/wrna.1577] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/30/2019] [Revised: 10/12/2019] [Accepted: 10/16/2019] [Indexed: 12/15/2022]
Abstract
Since the introduction of the ribosome profiling technique in 2009, its popularity has greatly increased. It is widely used for the comprehensive assessment of gene expression and for studying the mechanisms of regulation at the translational level. As the number of ribosome profiling datasets being produced continues to grow, so too does the need for reliable software that can provide answers to the biological questions it can address. This review describes the computational methods and tools that have been developed to analyze ribosome profiling data at the different stages of the process. It starts with initial routine processing of raw data and follows with more specific tasks such as the identification of translated open reading frames, differential gene expression analysis, or evaluation of local or global codon decoding rates. The review pinpoints challenges associated with each step and explains the ways in which they are currently addressed. In addition, it provides a comprehensive, albeit incomplete, list of publicly available software applicable to each step, which may be a beneficial starting point to those unexposed to ribosome profiling analysis. The outline of current challenges in ribosome profiling data analysis may inspire computational biologists to search for novel, potentially superior, solutions that will improve and expand the bioinformatician's toolbox for ribosome profiling data analysis. This article is categorized under: Translation > Ribosome Structure/Function; RNA Evolution and Genomics > Computational Analyses of RNA; Translation > Translation Mechanisms; Translation > Translation Regulation.
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| | - Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| | - Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia
| |
|
33
|
Ruiz-Orera J, Villanueva-Cañas JL, Albà MM. Evolution of new proteins from translated sORFs in long non-coding RNAs. Exp Cell Res 2020; 391:111940. [PMID: 32156600 DOI: 10.1016/j.yexcr.2020.111940] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/18/2019] [Revised: 02/26/2020] [Accepted: 03/02/2020] [Indexed: 01/07/2023]
Abstract
High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.
Affiliation(s)
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | | | - M Mar Albà
- Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Hospital Del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, 08010, Spain.
| |
|