1. Cridland JM, Polston ES, Begun DJ. New perspectives on Drosophila melanogaster de novo gene origination revealed by investigation of ancient African genetic variation. Genetics 2025; 230:iyaf044. [PMID: 40106667 PMCID: PMC12059636 DOI: 10.1093/genetics/iyaf044] [Received: 10/11/2024] [Accepted: 03/04/2025] [Indexed: 03/22/2025]
Abstract
De novo genes can be defined as sequences producing evolutionarily derived transcripts that are not homologous to transcripts produced in an ancestor. While they appear to be taxonomically widespread, there is little agreement regarding their abundance, their persistence times in genomes, the population genetic processes responsible for their spread or loss, or their possible functions. In Drosophila melanogaster, 2 approaches have been used to discover these genes and investigate their properties. One uses traditional comparative approaches and existing genomic resources and annotations. A second approach uses raw transcriptome data to discover unannotated genes for which there is no evidence of presence in related species. Investigations using the second approach have focused on D. melanogaster genotypes from recently established cosmopolitan populations. However, most of the genetic variation in the species is found in African populations, suggesting that a fuller understanding of genetic novelties in the species may follow from studies of these populations. Here, we investigate de novo gene candidates expressed in testis and accessory glands in a sample of flies from Zambia and compare them with candidate de novo genes expressed in North American populations. We report a large number of previously undiscovered de novo gene candidates, most of which are expressed polymorphically. Many are predicted to code for secreted proteins. In spite of very different levels of genomic variation in Zambian and North American populations, they express similar numbers of candidate de novo genes. We find evidence from genetic analysis of Raleigh inbred lines that a fraction of rarely expressed gene candidates in this population represent deleterious transcription promoted by inbreeding depression. Many de novo gene candidates are expressed in multiple tissues and both sexes, raising questions about how they may interact with natural selection. The relative importance of positive and negative selection, however, remains unclear.
Affiliation(s)
- Julie M Cridland
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
- Elizabeth S Polston
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
- David J Begun
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA

2. Glaser-Schmitt A, Lebherz M, Saydam E, Bornberg-Bauer E, Parsch J. Expression of De Novo Open Reading Frames in Natural Populations of Drosophila melanogaster. J Exp Zool B Mol Dev Evol 2025. [PMID: 40231390 DOI: 10.1002/jez.b.23297] [Received: 03/12/2025] [Revised: 03/14/2025] [Accepted: 04/03/2025] [Indexed: 04/16/2025]
Abstract
De novo genes, which originate from noncoding DNA, are known to have a high rate of turnover over short evolutionary timescales, such as within a species. Thus, their expression is often lineage- or genetic background-specific. However, little is known about their levels and breadth of expression as populations of a species diverge. In this study, we utilized publicly available RNA-seq data to examine the expression of newly evolved open reading frames (neORFs) in comparison to non- and protein-coding genes in Drosophila melanogaster populations from the derived species range in Europe and the ancestral range in sub-Saharan Africa. Our datasets included two adult tissue types as well as whole bodies at two temperatures for both sexes and three larval/prepupal developmental stages in a single tissue and sex, which allowed us to examine neORF expression and divergence across multiple sample types as well as sex and population. We detected a relatively large proportion (approximately 50%) of annotated neORFs as expressed in the population samples, with neORFs often showing greater expression divergence between populations than non- or protein-coding genes. However, differential expression of neORFs between populations tended to occur in a sample type-specific manner. On the other hand, neORFs displayed less sex-biased expression than the other two gene classes, with the majority of sex-biased neORFs detected in whole bodies, which may be attributable to the presence of the gonads. We also found that neORFs shared among multiple lines in the original set of inbred lines in which they were first detected were more likely to be both expressed and differentially expressed in the new population samples, suggesting that neORFs at a higher frequency (i.e. present in more individuals) within a species are more likely to be functional.
Affiliation(s)
- Amanda Glaser-Schmitt
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Munich, Bavaria, Germany
- Marie Lebherz
- Institute for Evolution and Biodiversity, University of Münster, Münster, North Rhine-Westphalia, Germany
- Ezgi Saydam
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Munich, Bavaria, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, North Rhine-Westphalia, Germany
- John Parsch
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Munich, Bavaria, Germany

3. Dohmen E, Aubel M, Eicholt LA, Roginski P, Luria V, Karger A, Grandchamp A. DeNoFo: a file format and toolkit for standardised, comparable de novo gene annotation. bioRxiv 2025:2025.03.31.644673. [PMID: 40236033 PMCID: PMC11996330 DOI: 10.1101/2025.03.31.644673] [Indexed: 04/17/2025]
Abstract
Motivation: De novo genes emerge from previously non-coding regions of the genome, challenging the traditional view that new genes primarily arise through duplication and adaptation of existing ones. Characterised by their rapid evolution and their novel structural properties or functional roles, de novo genes represent a young area of research. Therefore, the field currently lacks established standards and methodologies, leading to inconsistent terminology and challenges in comparing and reproducing results.
Results: This work presents a standardised annotation format to document the methodology of de novo gene datasets in a reproducible way. We developed DeNoFo, a toolkit that provides easy access to this format, simplifies the annotation of datasets and facilitates comparison across studies. Unifying the different protocols and methods in one standardised format, while providing integration into established file formats such as FASTA or GFF, ensures comparability of studies and supports new insights in this rapidly evolving field.
Availability and Implementation: DeNoFo is available through the official Python Package Index (PyPI) and at https://github.com/EDohmen/denofo. All tools have a graphical user interface and a command-line interface. The toolkit is implemented in Python 3, is available for all major platforms, and can be installed with pip and uv.

4. Xia S, Chen J, Arsala D, Emerson JJ, Long M. Functional innovation through new genes as a general evolutionary process. Nat Genet 2025; 57:295-309. [PMID: 39875578 DOI: 10.1038/s41588-024-02059-0] [Received: 06/12/2024] [Accepted: 12/15/2024] [Indexed: 01/30/2025]
Abstract
In the past decade, our understanding of how new genes originate in diverse organisms has advanced substantially, and more than a dozen molecular mechanisms for generating initial gene structures have been identified in addition to gene duplication. These new genes have been found to integrate into and modify pre-existing gene networks primarily through mutation and selection, revealing new patterns and rules with stable origination rates across various organisms. This progress has challenged the prevailing belief that new proteins evolve from pre-existing genes, as new genes may arise de novo from noncoding DNA sequences in many organisms, with high rates observed in flowering plants. New genes have important roles in phenotypic and functional evolution across diverse biological processes and structures, with detectable fitness effects of sexual conflict genes that can shape species divergence. Such knowledge of new genes can be of translational value in agriculture and medicine.
Affiliation(s)
- Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- Jianhai Chen
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- J J Emerson
- Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA

5. Zhao L, Svetec N, Begun DJ. De Novo Genes. Annu Rev Genet 2024; 58:211-232. [PMID: 39088850 PMCID: PMC12051474 DOI: 10.1146/annurev-genet-111523-102413] [Indexed: 08/03/2024]
Abstract
Although the majority of annotated new genes in a given genome appear to have arisen from duplication-related mechanisms, recent studies have shown that genes can also originate de novo from ancestrally nongenic sequences. Investigating de novo-originated genes offers rich opportunities to understand the origin and functions of new genes, their regulatory mechanisms, and the associated evolutionary processes. Such studies have uncovered unexpected and intriguing facets of gene origination, offering novel perspectives on the complexity of the genome and gene evolution. In this review, we provide an overview of the research progress in this field, highlight recent advancements, identify key technical and conceptual challenges, and underscore critical questions that remain to be addressed.
Affiliation(s)
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Nicolas Svetec
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- David J Begun
- Department of Evolution and Ecology, University of California, Davis, Davis, CA, USA

6. Roginski P, Grandchamp A, Quignot C, Lopes A. De Novo Emerged Gene Search in Eukaryotes with DENSE. Genome Biol Evol 2024; 16:evae159. [PMID: 39212967 PMCID: PMC11363675 DOI: 10.1093/gbe/evae159] [Accepted: 07/07/2024] [Indexed: 09/04/2024]
Abstract
The discovery of de novo emerged genes, originating from previously noncoding DNA regions, challenges traditional views of species evolution. Indeed, the hypothesis of neutrally evolving sequences giving rise to functional proteins is highly unlikely. This conundrum has sparked numerous studies to quantify and characterize these genes, aiming to understand their functional roles and contributions to genome evolution. Yet, no fully automated pipeline for their identification is available. Therefore, we introduce DENSE (DE Novo emerged gene SEarch), an automated Nextflow pipeline based on two distinct steps: detection of taxonomically restricted genes (TRGs) through phylostratigraphy, and filtering of TRGs for de novo emerged genes via genome comparisons and synteny search. DENSE is available as a user-friendly command-line tool, while the second step is accessible through a web server upon providing a list of TRGs. Highly flexible, DENSE provides various strategy and parameter combinations, enabling users to adapt to specific configurations or define their own strategy through a rational framework, facilitating protocol communication, and study interoperability. We apply DENSE to seven model organisms, exploring the impact of its strategies and parameters on de novo gene predictions. This thorough analysis across species with different evolutionary rates reveals useful metrics for users to define input datasets, identify favorable/unfavorable conditions for de novo gene detection, and control potential biases in genome annotations. Additionally, predictions made for the seven model organisms are compiled into a requestable database, which we hope will serve as a reference for de novo emerged gene lists generated with specific criteria combinations.
Affiliation(s)
- Paul Roginski
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Chloé Quignot
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Anne Lopes
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France

7. Iyengar BR, Grandchamp A, Bornberg-Bauer E. How antisense transcripts can evolve to encode novel proteins. Nat Commun 2024; 15:6187. [PMID: 39043684 PMCID: PMC11266595 DOI: 10.1038/s41467-024-50550-3] [Received: 08/30/2023] [Accepted: 07/12/2024] [Indexed: 07/25/2024]
Abstract
Protein-coding features can emerge de novo in noncoding transcripts, resulting in the emergence of new protein-coding genes. Studies across many species show that a large fraction of evolutionarily novel noncoding RNAs have an antisense overlap with protein-coding genes. The open reading frames (ORFs) in these antisense RNAs could also overlap with existing ORFs. In this study, we investigate how the evolution of an ORF could be constrained by its overlap with an existing ORF in three different reading frames. Using a combination of mathematical modeling and genome/transcriptome data analysis in two different model organisms, we show that antisense overlap can increase the likelihood of ORF emergence and reduce the likelihood of ORF loss, especially in one of the three reading frames. In addition to rationalising the repeatedly reported prevalence of de novo emerged genes in antisense transcripts, our work also provides a generic modeling and analytical framework that can be used to understand the evolution of antisense genes.
Affiliation(s)
- Bharat Ravi Iyengar
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
- Aix-Marseille Université, INSERM, TAGC, Marseille, France
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, Germany

8. Lebherz MK, Iyengar BR, Bornberg-Bauer E. Modeling Length Changes in De Novo Open Reading Frames during Neutral Evolution. Genome Biol Evol 2024; 16:evae129. [PMID: 38879874 PMCID: PMC11339603 DOI: 10.1093/gbe/evae129] [Accepted: 06/06/2024] [Indexed: 07/06/2024]
Abstract
For protein-coding genes to emerge de novo from non-genic DNA, the DNA sequence must gain an open reading frame (ORF) and the ability to be transcribed. The newborn de novo gene can then accumulate further changes in its sequence; consequently, it can also elongate or shrink over time. Existing literature shows that older de novo genes have longer ORFs, but it is not clear whether they elongated over time or have retained the same length since their inception. To address this question, we developed a mathematical model of ORF elongation as a Markov-jump process, and show that ORFs tend to keep their length over short evolutionary timescales. We also show that if a change occurs, it is likely to be a truncation. Our genomic and transcriptomic analyses of seven Drosophila melanogaster populations also agree with the model's prediction. We conclude that selection could facilitate ORF length extension, which may explain why longer ORFs were observed in old de novo genes in studies analysing longer evolutionary timescales. Alternatively, shorter ORFs may be purged because they may be less likely to yield functional proteins.
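To make ORF elongation and truncation concrete, the following Python sketch (a toy illustration, not the authors' Markov-jump model) shows how a single substitution changes ORF length: a new in-frame stop codon truncates the ORF, whereas a substitution that destroys the existing stop codon extends it to the next in-frame stop. The helper names `orf_length` and `mutate` and the toy sequence are hypothetical.

```python
# Toy illustration only: hypothetical helpers and sequence, not from the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length(seq: str, start: int = 0) -> int:
    """Return the ORF length in codons (stop codon excluded) for the frame
    beginning at `start`, which must be an ATG start codon.
    Returns -1 if no in-frame stop codon occurs before the sequence ends."""
    if seq[start:start + 3] != "ATG":
        raise ValueError("ORF must begin with ATG")
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return -1  # open-ended: no in-frame stop within the sequence


def mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of `seq` with a single-nucleotide substitution at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical ancestral sequence: ATG GCC AAA TAA GGC TTT TGA
ancestral = "ATGGCCAAATAAGGCTTTTGA"

print(orf_length(ancestral))                   # 3 codons (ATG GCC AAA)
print(orf_length(mutate(ancestral, 6, "T")))   # 2: AAA -> TAA truncates
print(orf_length(mutate(ancestral, 9, "C")))   # 6: TAA -> CAA extends to TGA
```

The sketch only enumerates the outcomes of individual substitutions; it does not model the rates of these events or selection acting on them.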
Affiliation(s)
- Marie Kristin Lebherz
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Bharat Ravi Iyengar
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen 72076, Germany

9. Vara C, Montañés JC, Albà MM. High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population. Genome Biol Evol 2024; 16:evae126. [PMID: 38934859 PMCID: PMC11221430 DOI: 10.1093/gbe/evae126] [Received: 02/13/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024]
Abstract
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be shorter than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in translation levels. These results suggest that variation in the translation levels of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Affiliation(s)
- Covadonga Vara
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- José Carlos Montañés
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- M Mar Albà
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain

10. Lebherz MK, Fouks B, Schmidt J, Bornberg-Bauer E, Grandchamp A. DNA Transposons Favor De Novo Transcript Emergence Through Enrichment of Transcription Factor Binding Motifs. Genome Biol Evol 2024; 16:evae134. [PMID: 38934893 PMCID: PMC11264136 DOI: 10.1093/gbe/evae134] [Received: 01/17/2024] [Revised: 06/11/2024] [Accepted: 06/15/2024] [Indexed: 06/28/2024]
Abstract
De novo genes emerge from noncoding regions of genomes via a succession of mutations. Among other things, such mutations activate transcription and create a new open reading frame (ORF). Although the mechanisms underlying ORF emergence are well documented, relatively little is known about the mechanisms enabling new transcription events. Yet, in many species a continuum between absent and very prominent transcription has been reported for essentially all regions of the genome. In this study, we searched for de novo transcripts by using newly assembled genomes and transcriptomes of seven inbred lines of Drosophila melanogaster, originating from six European and one African population. This setup allowed us to detect sample-specific de novo transcripts and compare them to their homologous nontranscribed regions in other samples, as well as to genic and intergenic control sequences. We studied the association with transposable elements (TEs) and the enrichment of transcription factor motifs upstream of de novo emerged transcripts, and compared them with regulatory elements. We found that de novo transcripts overlap with TEs more often than expected by chance. The emergence of new transcripts correlates with regions of high guanine-cytosine content and TE expression. Moreover, upstream regions of de novo transcripts are highly enriched with regulatory motifs. Such motifs are more enriched in new transcripts overlapping with TEs, particularly DNA TEs, and are more conserved upstream of de novo transcripts than upstream of their 'nontranscribed homologs'. Overall, our study demonstrates that TE insertion is important for transcript emergence, partly by introducing new regulatory motifs from DNA TE families.
Affiliation(s)
- Bertrand Fouks
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- CIRAD, UMR AGAP Institut, F-34398, Montpellier, France
- Julian Schmidt
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Tübingen, Germany
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany

11. Sanejouand YH. Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs? J Mol Evol 2024:10.1007/s00239-024-10174-z. [PMID: 38916610 DOI: 10.1007/s00239-024-10174-z] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 06/26/2024]
Abstract
A search for proteins lacking homologs in a reference database of 27 well-annotated primate proteomes and 52 well-annotated proteomes of other mammals identified 170 putative human-specific proteins. While most of them are deemed uncertain, 2 are known at the protein level and 23 at the transcript level, according to UniProt. Interestingly, 23 of these 25 proteins are found to be encoded, or to have close homologs, in an open reading frame of a long noncoding human RNA. However, half of them are predicted to be at least 80% globular, with a single structural domain, according to IUPred, and to have at least 80% ordered residues, according to flDPnn. Strikingly, there is a near-complete lack of structural knowledge about these proteins, with no tertiary structure presently available in the Protein Data Bank and a fair prediction for only one of them in the AlphaFold Protein Structure Database. Moreover, knowledge about the function of these possibly key proteins remains scarce.
Affiliation(s)
- Yves-Henri Sanejouand
- US2B, UMR 6286 of CNRS, Nantes University, 2 rue de la Houssinière, Nantes, 44322, Pays de la Loire, France

12. Aubel M, Buchel F, Heames B, Jones A, Honc O, Bornberg-Bauer E, Hlouchova K. High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential. Genome Biol Evol 2024; 16:evae069. [PMID: 38597156 PMCID: PMC11024478 DOI: 10.1093/gbe/evae069] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/23/2024] [Indexed: 04/11/2024]
Abstract
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to resemble random sequences and, accordingly, to have no stable tertiary fold and high predicted disorder. However, the structural properties of de novo proteins, and whether they differ between the stages of emergence and fixation, have not been studied in depth and rely heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with fluorescence-activated cell sorting, we were able to screen the library for the most compact protein structures, as well as the most elongated and flexible ones. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for the most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios and their implications underlying the age-dependencies of compactness and structural content of putative de novo proteins.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Filip Buchel
- Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Department of Biochemistry, Faculty of Science, Charles University, Prague, Czech Republic
- Brennen Heames
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Alun Jones
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Ondrej Honc
- Imaging Methods Core Facility, BIOCEV, Prague, Czech Republic
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Department of Protein Evolution, Max Planck-Institute for Biology Tuebingen, Tuebingen, Germany
- Klara Hlouchova
- Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic

13. Grandchamp A, Czuppon P, Bornberg-Bauer E. Quantification and modeling of turnover dynamics of de novo transcripts in Drosophila melanogaster. Nucleic Acids Res 2024; 52:274-287. [PMID: 38000384 PMCID: PMC10783523 DOI: 10.1093/nar/gkad1079] [Received: 07/14/2023] [Revised: 10/13/2023] [Accepted: 10/28/2023] [Indexed: 11/26/2023]
Abstract
Most of the transcribed portion of eukaryotic genomes consists of non-coding transcripts. Among these transcripts, some are newly transcribed when compared to outgroups and are referred to as de novo transcripts. De novo transcripts have been shown to play a major role in genomic innovations. However, little is known about the rates at which de novo transcripts are gained and lost in individuals of the same species. Here, we address this gap and estimate the de novo transcript turnover rate with an evolutionary model. We use DNA long reads and RNA short reads from seven geographically remote samples of inbred individuals of Drosophila melanogaster to detect de novo transcripts that are gained on a short evolutionary timescale. Overall, each sampled individual contains around 2500 unspliced de novo transcripts, most of which are sample-specific. We estimate that around 0.15 transcripts are gained per year, and that each gained transcript is lost at a rate of around 5 × 10⁻⁵ per year. This high turnover of transcripts suggests frequent exploration of new genomic sequences within species. These rate estimates are essential for understanding the process and timescale of de novo gene birth.
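As a back-of-the-envelope check on the reported rates, assume a simple immigration-death process (a simplification chosen here for illustration, not necessarily the authors' evolutionary model): transcripts are gained at a constant rate λ ≈ 0.15 per year and each existing transcript is lost at rate μ ≈ 5 × 10⁻⁵ per year. The expected number of transcripts N(t) then relaxes to the steady state λ/μ:

```latex
\frac{d\,\mathbb{E}[N(t)]}{dt} = \lambda - \mu\,\mathbb{E}[N(t)]
\quad\Longrightarrow\quad
\mathbb{E}[N^{*}] = \frac{\lambda}{\mu}
= \frac{0.15\ \text{yr}^{-1}}{5 \times 10^{-5}\ \text{yr}^{-1}} = 3000
```

Under this assumption the steady state is of the same order as the roughly 2500 unspliced de novo transcripts per individual reported in the abstract; closer agreement should not be expected from such a simplified model.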
Affiliation(s)
- Anna Grandchamp
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Peter Czuppon
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Tübingen, Germany

14. Mani S, Tlusty T. Gene birth in a model of non-genic adaptation. BMC Biol 2023; 21:257. [PMID: 37957718 PMCID: PMC10644530 DOI: 10.1186/s12915-023-01745-5] [Received: 09/18/2022] [Accepted: 10/24/2023] [Indexed: 11/15/2023]
Abstract
Background: Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. In particular, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci.
Results: Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires strict conditions to be met. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation.
Conclusions: Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.
Affiliation(s)
- Somya Mani
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Tsvi Tlusty
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea