1
Cheng L, Han Q, Hao Y, Qiao Z, Li M, Liu D, Yin H, Li T, Long W, Luo S, Gao Y, Zhang Z, Yu H, Sun X, Li H, Zhao Y. Genome assembly of Stewartia sinensis reveals origin and evolution of orphan genes in Theaceae. Commun Biol 2025; 8:354. [PMID: 40032980 DOI: 10.1038/s42003-025-07525-x] [Received: 02/12/2024] [Accepted: 01/13/2025] [Indexed: 03/05/2025] Open
Abstract
Orphan genes play crucial roles in diverse biological processes, but the evolutionary trajectories and functional divergence remain largely unexplored. The Theaceae family, including the economically and culturally important tea plant, offers a distinctive model to examine these aspects. Here, we integrated Nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C methods to decode a pseudo-chromosomal genome assembly of Stewartia sinensis, from the earliest-diverging tribe of Theaceae, spanning 2.95 Gb. Comparative genomic analysis revealed the absence of recent whole-genome duplication events in the Theaceae ancestor, highlighting tandem duplications as the predominant mechanism of gene expansion. We identified 31,331 orphan genes, some of which appear to have ancient origins, suggesting early emergence with frequent gains and losses, while others seem more specific and recent. Notably, orphan genes are distinguished by shorter lengths, fewer exons and functional domains compared to genes that originate much earlier, like transcription factors. Moreover, tandem duplication contributes significantly to the adaptive evolution and characteristic diversity of Theaceae, and it is also a major mechanism driving the origination of orphan genes. This study illuminates the evolutionary dynamics of orphan genes, providing a valuable resource for understanding the origin and evolution of tea plant flavor and enhancing genetic breeding efforts.
Affiliation(s)
- Lin Cheng
- Dabie Mountain Laboratory, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Henan International Joint Laboratory of Tea-oil Tree Biology and High-Value Utilization, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Qunwei Han
- Dabie Mountain Laboratory, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Henan International Joint Laboratory of Tea-oil Tree Biology and High-Value Utilization, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Yanlin Hao
- Dabie Mountain Laboratory, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Zhen Qiao
- Dabie Mountain Laboratory, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Mengge Li
- Dabie Mountain Laboratory, College of Tea and Food Science, Xinyang Normal University, Xinyang, China
- Daliang Liu
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Hao Yin
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Tao Li
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Wen Long
- Xinyang Normal University Library, Xinyang Normal University, Xinyang, China
- Shanshan Luo
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Ya Gao
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Zhihan Zhang
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Houlin Yu
- Department of Biochemistry and Molecular Biology, University of Massachusetts Amherst, Amherst, USA
- Broad Institute of MIT and Harvard, Cambridge, USA
- Xinhao Sun
- College of Science, Northeastern University, Boston, USA
- Hao Li
- School of Life Sciences, East China Normal University, Shanghai, China
- Shanghai Institute of Eco-Chongming (SIEC), Shanghai, China
- Yiyong Zhao
- Guizhou Key Laboratory of Functional Agriculture, College of Agriculture, Guizhou University, Guiyang, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
2
Cornelissen FMG, He Z, Ciputra E, de Haas RR, Beumer‐Chuwonpad A, Noske D, Vandertop WP, Piersma SR, Jiménez CR, Murre C, Westerman BA. The translatome of glioblastoma. Mol Oncol 2025; 19:716-740. [PMID: 39417309 PMCID: PMC11887679 DOI: 10.1002/1878-0261.13743] [Received: 02/13/2024] [Revised: 07/17/2024] [Accepted: 07/19/2024] [Indexed: 10/19/2024] Open
Abstract
Glioblastoma (GB), the most common and aggressive brain tumor, demonstrates intrinsic resistance to current therapies, resulting in poor clinical outcomes. Cancer progression can be partially attributed to the deregulation of protein translation mechanisms that drive cancer cell growth. In this study, we present the translatome landscape of GB as a valuable data resource. Eight patient-derived GB sphere cultures (GSCs) were analyzed using ribosome profiling and messenger RNA (mRNA) sequencing. We investigated inter-cell-line differences through differential expression analysis at both the translatome and transcriptome levels. Translational changes post-radiotherapy were assessed at 30 and 60 min. The translation of non-coding RNAs (ncRNAs) was validated using in-house and public mass spectrometry (MS) data, whereas RNA expression was confirmed by quantitative PCR (qPCR). Our findings demonstrate that ribosome sequencing provides more detailed information than MS or transcriptional analyses. Transcriptional similarities among GSCs correlate with translational similarities, aligning with previously defined subtypes such as proneural and mesenchymal. Additionally, we identified a broad spectrum of open reading frame types in both coding and non-coding mRNA regions, including long non-coding RNAs (lncRNAs) and pseudogenes undergoing active translation. Translation of ncRNAs into peptides was independently confirmed by in-house data and external MS data. We also observed that translational regulation of histones (downregulated) and splicing factors (upregulated) occurs in response to radiotherapy. These data offer new insights into genome-wide protein synthesis, identifying translationally regulated genes and alternative translation initiation sites in GB under normal and radiotherapeutic conditions, providing a rich resource for GB research. Further functional validation of differentially expressed genes after radiotherapy is needed. 
Understanding translational control in GB can reveal mechanistic insights and identify currently unknown biomarkers, ultimately enhancing the diagnosis and treatment of this aggressive brain cancer.
Affiliation(s)
- Fleur M. G. Cornelissen
- Department of Molecular Biology, University of California, San Diego, La Jolla, CA, USA
- Department of Neurosurgery, Amsterdam UMC, Location VUMC, Cancer Center, Amsterdam, The Netherlands
- Zhaoren He
- Department of Molecular Biology, University of California, San Diego, La Jolla, CA, USA
- Edward Ciputra
- Department of Neurosurgery, Amsterdam UMC, Location VUMC, Cancer Center, Amsterdam, The Netherlands
- Richard R. de Haas
- OncoProteomics Laboratory, Cancer Center Amsterdam, Amsterdam UMC, The Netherlands
- David Noske
- Department of Neurosurgery, Amsterdam UMC, Location VUMC, Cancer Center, Amsterdam, The Netherlands
- W. Peter Vandertop
- Department of Neurosurgery, Amsterdam UMC, Location VUMC, Cancer Center, Amsterdam, The Netherlands
- Sander R. Piersma
- OncoProteomics Laboratory, Cancer Center Amsterdam, Amsterdam UMC, The Netherlands
- Connie R. Jiménez
- OncoProteomics Laboratory, Cancer Center Amsterdam, Amsterdam UMC, The Netherlands
- Cornelis Murre
- Department of Molecular Biology, University of California, San Diego, La Jolla, CA, USA
- Bart A. Westerman
- Department of Neurosurgery, Amsterdam UMC, Location VUMC, Cancer Center, Amsterdam, The Netherlands
3
Zhang Q. Structural insights into the advancements of mobile colistin resistance enzymes. Microbiol Res 2025; 291:127983. [PMID: 39612773 DOI: 10.1016/j.micres.2024.127983] [Received: 10/02/2024] [Revised: 11/17/2024] [Accepted: 11/23/2024] [Indexed: 12/01/2024]
Abstract
The plasmid-encoded mobile colistin resistance enzyme (MCR) is challenging the clinical efficacy of colistin as a last-resort antibiotic against multidrug-resistant bacteria. This transferase catalyzes the addition of positively charged phosphoethanolamine to lipid A, and its catalytic domain in the periplasm has been elucidated. To date, there are many works on the catalytic domain and function of this enzyme class. However, the roles of unreported soluble or inter-membrane domains remain undefined, which might cause an inaccurate or even incorrect understanding of substrate recognition and binding. In this review, MCR-1 is first compared and analyzed from the perspective of the full-length alpha-fold MCR-1. Specifically, some disputed issues, especially in its architecture and catalytic mechanism are discussed independently. Meanwhile, the structure-based insights into MCRs variants, their evolutions, and the balance between colistin-resistance and survival costs, are also critically analyzed. Importantly, by comparing it with the full-length MCR-1, several potential pockets for drug design have been re-identified. Finally, recent advancements in inhibitors targeting MCR-1 are also in-depth summarized. These details offer a new perspective on MCRs and serve as a valuable foundation for drug development.
Affiliation(s)
- Qi Zhang
- Centre for Eye and Vision Research, Hong Kong Science Park, Hong Kong
4
Xia S, Chen J, Arsala D, Emerson JJ, Long M. Functional innovation through new genes as a general evolutionary process. Nat Genet 2025; 57:295-309. [PMID: 39875578 DOI: 10.1038/s41588-024-02059-0] [Received: 06/12/2024] [Accepted: 12/15/2024] [Indexed: 01/30/2025]
Abstract
In the past decade, our understanding of how new genes originate in diverse organisms has advanced substantially, and more than a dozen molecular mechanisms for generating initial gene structures were identified, in addition to gene duplication. These new genes have been found to integrate into and modify pre-existing gene networks primarily through mutation and selection, revealing new patterns and rules with stable origination rates across various organisms. This progress has challenged the prevailing belief that new proteins evolve from pre-existing genes, as new genes may arise de novo from noncoding DNA sequences in many organisms, with high rates observed in flowering plants. New genes have important roles in phenotypic and functional evolution across diverse biological processes and structures, with detectable fitness effects of sexual conflict genes that can shape species divergence. Such knowledge of new genes can be of translational value in agriculture and medicine.
Affiliation(s)
- Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- Jianhai Chen
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
- J J Emerson
- Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
5
Jin GT, Xu YC, Hou XH, Jiang J, Li XX, Xiao JH, Bian YT, Gong YB, Wang MY, Zhang ZQ, Zhang YE, Zhu WS, Liu YX, Guo YL. A de novo Gene Promotes Seed Germination Under Drought Stress in Arabidopsis. Mol Biol Evol 2025; 42:msae262. [PMID: 39719058 PMCID: PMC11721784 DOI: 10.1093/molbev/msae262] [Received: 07/03/2024] [Revised: 10/29/2024] [Accepted: 12/06/2024] [Indexed: 12/26/2024] Open
Abstract
The origin of genes from noncoding sequences is a long-term and fundamental biological question. However, how de novo genes originate and integrate into the existing pathways to regulate phenotypic variations is largely unknown. Here, we selected 7 genes from 782 de novo genes for functional exploration based on transcriptional and translational evidence. Subsequently, we revealed that Sun Wu-Kong (SWK), a de novo gene that originated from a noncoding sequence in Arabidopsis thaliana, plays a role in seed germination under osmotic stress. SWK is primarily expressed in dry seed, imbibing seed and silique. SWK can be fully translated into an 8 kDa protein, which is mainly located in the nucleus. Intriguingly, SWK was integrated into an extant pathway of hydrogen peroxide content (folate synthesis pathway) via the upstream gene cytHPPK/DHPS, an Arabidopsis-specific gene that originated from the duplication of mitHPPK/DHPS, and downstream gene GSTF9, to improve seed germination in osmotic stress. In addition, we demonstrated that the presence of SWK may be associated with drought tolerance in natural populations of Arabidopsis. Overall, our study highlights how a de novo gene originated and integrated into the existing pathways to regulate stress adaptation.
Affiliation(s)
- Guang-Teng Jin
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Xing-Hui Hou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Juan Jiang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xin-Xin Li
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jia-Hui Xiao
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yu-Tao Bian
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yan-Bo Gong
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ming-Yu Wang
- State Key Laboratory of Maize Bio-breeding/College of Plant Protection, China Agricultural University, Beijing 100193, China
- Zhi-Qin Zhang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yong E Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents and Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Wang-Sheng Zhu
- State Key Laboratory of Maize Bio-breeding/College of Plant Protection, China Agricultural University, Beijing 100193, China
- Yong-Xiu Liu
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
6
Pereira AB, Marano M, Bathala R, Zaragoza RA, Neira A, Samano A, Owoyemi A, Casola C. Orphan genes are not a distinct biological entity. Bioessays 2025; 47:e2400146. [PMID: 39491810 DOI: 10.1002/bies.202400146] [Received: 06/16/2024] [Revised: 10/06/2024] [Accepted: 10/11/2024] [Indexed: 11/05/2024]
Abstract
The genome sequencing revolution has revealed that all species possess a large number of unique genes critical for trait variation, adaptation, and evolutionary innovation. One widely used approach to identify such genes consists of detecting protein-coding sequences with no homology in other genomes, termed orphan genes. These genes have been extensively studied, under the assumption that they represent valid proxies for species-specific genes. Here, we critically evaluate taxonomic, phylogenetic, and sequence evolution evidence showing that orphan genes belong to a range of evolutionary ages and thus cannot be assigned to a single lineage. Furthermore, we show that the processes generating orphan genes are substantially more diverse than generally thought and include horizontal gene transfer, transposable element domestication, and overprinting. Thus, orphan genes represent a heterogeneous collection of genes rather than a single biological entity, making them unsuitable as a subject for meaningful investigation of gene evolution and phenotypic innovation.
Affiliation(s)
- Andres Barboza Pereira
- Interdisciplinary Graduate Program in Genetics & Genomics, Texas A&M University, College Station, Texas, USA
- Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Matthew Marano
- Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Ramya Bathala
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas, USA
- Andres Neira
- School of Pharmacy, Texas A&M University, College Station, Texas, USA
- Alex Samano
- Department of Biology, Texas A&M University, College Station, Texas, USA
- Adekola Owoyemi
- Department of Ecology and Conservation Biology, Texas A&M University, College Station, Texas, USA
- Claudio Casola
- Interdisciplinary Graduate Program in Genetics & Genomics, Texas A&M University, College Station, Texas, USA
- Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Department of Ecology and Conservation Biology, Texas A&M University, College Station, Texas, USA
7
Guay SY, Patel PH, Thomalla JM, McDermott KL, O'Toole JM, Arnold SE, Obrycki SJ, Wolfner MF, Findlay GD. An orphan gene is essential for efficient sperm entry into eggs in Drosophila melanogaster. bioRxiv 2024:2024.08.08.607187. [PMID: 39149251 PMCID: PMC11326263 DOI: 10.1101/2024.08.08.607187] [Indexed: 08/17/2024]
Abstract
While spermatogenesis has been extensively characterized in the Drosophila melanogaster model system, very little is known about the genes required for fly sperm entry into eggs. We identified a lineage-specific gene, which we named katherine johnson (kj), that is required for efficient fertilization. Males that do not express kj produce and transfer sperm that are stored normally in females, but sperm from these males enter eggs with severely reduced efficiency. Using a tagged transgenic rescue construct, we observed that the KJ protein localizes around the edge of the nucleus at various stages of spermatogenesis but is undetectable in mature sperm. These data suggest that kj exerts an effect on sperm development, the loss of which results in reduced fertilization ability. Interestingly, KJ protein lacks detectable sequence similarity to any other known protein, suggesting that kj could be a lineage-specific orphan gene. While previous bioinformatic analyses indicated that kj was restricted to the melanogaster group of Drosophila, we identified putative orthologs with conserved synteny, male-biased expression, and predicted protein features across the genus, as well as likely instances of gene loss in some lineages. Thus, kj was likely present in the Drosophila common ancestor and subsequently evolved an essential role in fertility in D. melanogaster. Our results demonstrate a new aspect of male reproduction that has been shaped by a lineage-specific gene and provide a molecular foothold for further investigating the mechanism of sperm entry into eggs in Drosophila.
Affiliation(s)
- Sara Y Guay
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Prajal H Patel
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Jonathon M Thomalla
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
- Kerry L McDermott
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Jillian M O'Toole
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Sarah E Arnold
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Sarah J Obrycki
- Department of Biology, College of the Holy Cross, Worcester, MA 01610
- Mariana F Wolfner
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
8
Wang Z, Wang Y, Kasuga T, Hassler H, Lopez‐Giraldez F, Dong C, Yarden O, Townsend JP. Origins of lineage-specific elements via gene duplication, relocation, and regional rearrangement in Neurospora crassa. Mol Ecol 2024; 33:e17168. [PMID: 37843462 PMCID: PMC11628664 DOI: 10.1111/mec.17168] [Received: 04/26/2023] [Revised: 09/20/2023] [Accepted: 09/27/2023] [Indexed: 10/17/2023]
Abstract
The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well-sampled, recently diversified, clades. One such clade is the model genus Neurospora, members of which lack recent gene duplications. Several Neurospora species are comprehensively characterized organisms apt for studying the evolution of lineage-specific genes (LSGs). Using gene synteny, we documented that 78% of Neurospora LSG clusters are located adjacent to the telomeres featuring extensive tracts of non-coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co-regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSG mas-1, a gene with roles in cell-wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a "rummage region" in the N. crassa genome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non-coding sequences.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Yen‐Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Takao Kasuga
- College of Biological Sciences, University of California, Davis, Davis, California, USA
- Hayley Hassler
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Jeffrey P. Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
9
Pai VJ, Lau CJ, Garcia-Ruiz A, Donaldson C, Vaughan JM, Miller B, De Souza EV, Pinto AM, Diedrich J, Gavva NR, Yu S, DeBoever C, Horman SR, Saghatelian A. Microprotein-encoding RNA regulation in cells treated with pro-inflammatory and pro-fibrotic stimuli. BMC Genomics 2024; 25:1034. [PMID: 39497054 PMCID: PMC11536906 DOI: 10.1186/s12864-024-10948-1] [Received: 07/09/2024] [Accepted: 10/24/2024] [Indexed: 11/06/2024] Open
Abstract
BACKGROUND: Recent analysis of the human proteome via proteogenomics and ribosome profiling of the transcriptome revealed the existence of thousands of previously unannotated microprotein-coding small open reading frames (smORFs). Most functional microproteins were chosen for characterization because of their evolutionary conservation. However, one example of a non-conserved immunomodulatory microprotein in mice suggests that strict sequence conservation misses some intriguing microproteins.
RESULTS: We examine the ability of gene regulation to identify human microproteins with potential roles in inflammation or fibrosis of the intestine. To do this, we collected ribosome profiling data of intestinal cell lines and peripheral blood mononuclear cells and used gene expression of microprotein-encoding transcripts to identify strongly regulated microproteins, including several examples of microproteins that are only conserved with primates.
CONCLUSION: This approach reveals a number of new microproteins worthy of additional functional characterization and provides a dataset that can be queried in different ways to find additional gut microproteins of interest.
Affiliation(s)
- Victor J Pai
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Calvin J Lau
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Almudena Garcia-Ruiz
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Cynthia Donaldson
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Joan M Vaughan
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Brendan Miller
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Eduardo V De Souza
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Antonio M Pinto
- Mass Spectrometry Core, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Jolene Diedrich
- Mass Spectrometry Core, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Narender R Gavva
- Takeda Development Center Americas, Inc, San Diego, CA, 92121, USA
- Shan Yu
- Takeda Development Center Americas, Inc, San Diego, CA, 92121, USA
- Shane R Horman
- Takeda Development Center Americas, Inc, San Diego, CA, 92121, USA
- Alan Saghatelian
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, 92037, USA
10
Zhao L, Svetec N, Begun DJ. De Novo Genes. Annu Rev Genet 2024; 58:211-232. [PMID: 39088850 DOI: 10.1146/annurev-genet-111523-102413] [Indexed: 08/03/2024]
Abstract
Although the majority of annotated new genes in a given genome appear to have arisen from duplication-related mechanisms, recent studies have shown that genes can also originate de novo from ancestrally nongenic sequences. Investigating de novo-originated genes offers rich opportunities to understand the origin and functions of new genes, their regulatory mechanisms, and the associated evolutionary processes. Such studies have uncovered unexpected and intriguing facets of gene origination, offering novel perspectives on the complexity of the genome and gene evolution. In this review, we provide an overview of the research progress in this field, highlight recent advancements, identify key technical and conceptual challenges, and underscore critical questions that remain to be addressed.
Affiliation(s)
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - Nicolas Svetec
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - David J Begun
- Department of Evolution and Ecology, University of California, Davis, California, USA
|
11
|
Ruiz-Orera J, Miller DC, Greiner J, Genehr C, Grammatikaki A, Blachut S, Mbebi J, Patone G, Myronova A, Adami E, Dewani N, Liang N, Hummel O, Muecke MB, Hildebrandt TB, Fritsch G, Schrade L, Zimmermann WH, Kondova I, Diecke S, van Heesch S, Hübner N. Evolution of translational control and the emergence of genes and open reading frames in human and non-human primate hearts. Nat Cardiovasc Res 2024; 3:1217-1235. [PMID: 39317836 PMCID: PMC11473369 DOI: 10.1038/s44161-024-00544-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 08/28/2024] [Indexed: 09/26/2024]
Abstract
Evolutionary innovations can be driven by changes in the rates of RNA translation and the emergence of new genes and small open reading frames (sORFs). In this study, we characterized the transcriptional and translational landscape of the hearts of four primate and two rodent species through integrative ribosome and transcriptomic profiling, including adult left ventricle tissues and induced pluripotent stem cell-derived cardiomyocyte cell cultures. We show here that the translational efficiencies of subunits of the mitochondrial oxidative phosphorylation chain complexes IV and V evolved rapidly across mammalian evolution. Moreover, we discovered hundreds of species-specific and lineage-specific genomic innovations that emerged during primate evolution in the heart, including 551 genes, 504 sORFs and 76 evolutionarily conserved genes displaying human-specific cardiac-enriched expression. Overall, our work describes the evolutionary processes and mechanisms that have shaped cardiac transcription and translation in recent primate evolution and sheds light on how these can contribute to cardiac development and disease.
Affiliation(s)
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany.
| | - Duncan C Miller
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
| | - Johannes Greiner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Carolin Genehr
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
| | - Aliki Grammatikaki
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Susanne Blachut
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Jeanne Mbebi
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Giannino Patone
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Anna Myronova
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Eleonora Adami
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Nikita Dewani
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Ning Liang
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Oliver Hummel
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Michael B Muecke
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Thomas B Hildebrandt
- Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
- Freie Universitaet Berlin, Berlin, Germany
| | - Guido Fritsch
- Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
| | - Lisa Schrade
- Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
| | - Wolfram H Zimmermann
- Institute of Pharmacology and Toxicology, University Medical Center Göttingen, Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Lower Saxony, Göttingen, Germany
- DZNE (German Center for Neurodegenerative Diseases), Göttingen, Germany
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Göttingen, Germany
| | - Ivanela Kondova
- Biomedical Primate Research Centre (BPRC), Rijswijk, The Netherlands
| | - Sebastian Diecke
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
| | - Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Norbert Hübner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany.
- DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany.
- Charité-Universitätsmedizin, Berlin, Germany.
- Helmholtz Institute for Translational AngioCardioScience (HI-TAC) of the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) at Heidelberg University, Heidelberg, Germany.
|
12
|
Anderson AG, Moyers BA, Loupe JM, Rodriguez-Nunez I, Felker SA, Lawlor JMJ, Bunney WE, Bunney BG, Cartagena PM, Sequeira A, Watson SJ, Akil H, Mendenhall EM, Cooper GM, Myers RM. Allele-specific transcription factor binding across human brain regions offers mechanistic insight into eQTLs. Genome Res 2024; 34:1224-1234. [PMID: 39152038 PMCID: PMC11444172 DOI: 10.1101/gr.278601.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
Transcription factors (TFs) regulate gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because TF occupancy is driven in part by recognition of DNA sequence, genetic variation can influence TF-DNA associations and gene regulation. To identify variants that impact TF binding in human brain tissues, we assessed allele-specific binding (ASB) at heterozygous variants for 94 TFs in nine brain regions from two donors. Leveraging graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signals between alleles at heterozygous variants within each brain region and identified thousands of variants exhibiting ASB for at least one TF. ASB reproducibility was measured by comparisons between independent experiments both within and between donors. We found that rare alleles in the general population more frequently led to reduced TF binding, whereas common alleles had an equal likelihood of increasing or decreasing binding. Further, for ASB variants in predicted binding motifs, the favored allele tended to be the one with the stronger expected motif match, but this concordance was not observed within highly occupied sites. We also found that neuron-specific cis-regulatory elements (cCREs), in contrast with oligodendrocyte-specific cCREs, showed depletion of ASB variants. We identified 2670 ASB variants associated with evidence for allele-specific gene expression in the brain from GTEx data and observed increasing eQTL effect direction concordance as ASB significance increases. These results provide a valuable and unique resource for mechanistic analysis of cis-regulatory variation in human brain tissue.
Affiliation(s)
- Ashlyn G Anderson
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
| | - Belle A Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Jacob M Loupe
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | | | | | - James M J Lawlor
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - William E Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
| | - Blynn G Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
| | - Preston M Cartagena
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
| | - Adolfo Sequeira
- Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
| | - Stanley J Watson
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Huda Akil
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Eric M Mendenhall
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Gregory M Cooper
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
|
13
|
Yada T, Taniguchi T. A putative scenario of how de novo protein-coding genes originate in the Saccharomyces cerevisiae lineage. BMC Genomics 2024; 25:834. [PMID: 39237856 PMCID: PMC11378370 DOI: 10.1186/s12864-024-10669-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 07/25/2024] [Indexed: 09/07/2024] Open
Abstract
BACKGROUND: Novel protein-coding genes were long considered to arise by reorganization of pre-existing genes, for example through gene duplication and gene fusion. However, recent progress in genome research has revealed that more protein-coding genes than expected were born de novo, that is, by accumulation of mutations in non-genic DNA sequences. Nonetheless, the detailed process (scenario) of de novo origination is not well understood. RESULTS: We devised a bioinformatic analysis for sketching a scenario of de novo origination of protein-coding genes. For each de novo protein-coding gene, we first identified, based on parsimony, the edge of a given phylogenetic tree on which the gene was born. Then, from a multiple sequence alignment of the de novo gene and its orthologous regions, we constructed the ancestral DNA sequences of the gene corresponding to both end nodes of that edge. Finally, we characterized the statistical features of the evolution between the two ancestral sequences. Applying this analysis to the Saccharomyces cerevisiae lineage, we sketched a putative scenario for de novo origination of protein-coding genes: (1) in the beginning there were GC-rich genome regions; (2) neutral mutations accumulated in these regions; (3) ORFs were extended or combined; and then (4) a translation signature (the Kozak consensus sequence) was recruited. Interestingly, as the scenario progresses from (2) to (4), the specificity of mutations increases. CONCLUSION: To the best of our knowledge, this is the first report outlining a scenario of de novo origination of protein-coding genes. Our bioinformatic analysis can capture events that occur during a short evolutionary time by directly observing the evolution of the ancestral sequences from non-genic to genic. This property makes it well suited to the analysis of fast-evolving de novo genes.
Affiliation(s)
- Tetsushi Yada
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan.
|
14
|
Camarena ME, Theunissen P, Ruiz M, Ruiz-Orera J, Calvo-Serra B, Castelo R, Castro C, Sarobe P, Fortes P, Perera-Bel J, Albà MM. Microproteins encoded by noncanonical ORFs are a major source of tumor-specific antigens in a liver cancer patient meta-cohort. Sci Adv 2024; 10:eadn3628. [PMID: 38985879 PMCID: PMC11235171 DOI: 10.1126/sciadv.adn3628] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/06/2023] [Accepted: 06/04/2024] [Indexed: 07/12/2024]
Abstract
The expression of tumor-specific antigens during cancer progression can trigger an immune response against the tumor. Here, we investigate if microproteins encoded by noncanonical open reading frames (ncORFs) are a relevant source of tumor-specific antigens. We analyze RNA sequencing data from 117 hepatocellular carcinoma (HCC) tumors and matched healthy tissue together with ribosome profiling and immunopeptidomics data. Combining human leukocyte antigen-epitope binding predictions and experimental validation experiments, we conclude that around 40% of the tumor-specific antigens in HCC are likely to be derived from ncORFs, including two peptides that can trigger an immune response in humanized mice. We identify a subset of 33 tumor-specific long noncoding RNAs expressing novel cancer antigens shared by more than 10% of the HCC samples analyzed, which, when combined, cover a large proportion of the patients. The results of the study open avenues for extending the range of anticancer vaccines.
Affiliation(s)
| | - Patrick Theunissen
- Center for Applied Medical Research (CIMA), University of Navarra (UNAV), Pamplona, Spain
| | - Marta Ruiz
- Center for Applied Medical Research (CIMA), University of Navarra (UNAV), Pamplona, Spain
| | - Jorge Ruiz-Orera
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Beatriz Calvo-Serra
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Robert Castelo
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Carla Castro
- Center for Applied Medical Research (CIMA), University of Navarra (UNAV), Pamplona, Spain
| | - Pablo Sarobe
- Center for Applied Medical Research (CIMA), University of Navarra (UNAV), Pamplona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Pamplona, Spain
- Instituto de Investigación Sanitaria de Navarra (IdiSNA), Pamplona, Spain
- Cancer Clinic University of Navarra (CCUN), Pamplona, Spain
| | - Puri Fortes
- Center for Applied Medical Research (CIMA), University of Navarra (UNAV), Pamplona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Pamplona, Spain
- Instituto de Investigación Sanitaria de Navarra (IdiSNA), Pamplona, Spain
- Cancer Clinic University of Navarra (CCUN), Pamplona, Spain
- Spanish Network for Advanced Therapies (TERAV ISCIII), Madrid, Spain
| | | | - M Mar Albà
- Hospital del Mar Research Institute, Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
|
15
|
Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origin, and expressed at low levels. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface (https://carvunislab.csb.pitt.edu/shiny/coexpression/) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
Affiliation(s)
- April Rich
- Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
| | - Omer Acar
- Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA.
|
16
|
Vara C, Montañés JC, Albà MM. High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population. Genome Biol Evol 2024; 16:evae126. [PMID: 38934859 PMCID: PMC11221430 DOI: 10.1093/gbe/evae126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024] Open
Abstract
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, their distribution in the population is not known. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in translation levels. These results suggest that variation in the translation level of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Affiliation(s)
- Covadonga Vara
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
| | - José Carlos Montañés
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
| | - M Mar Albà
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
|
17
|
Sanejouand YH. Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs? J Mol Evol 2024:10.1007/s00239-024-10174-z. [PMID: 38916610 DOI: 10.1007/s00239-024-10174-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 06/26/2024]
Abstract
By looking for a lack of homologs in a reference database of 27 well-annotated proteomes of primates and 52 well-annotated proteomes of other mammals, 170 putative human-specific proteins were identified. While most of them are deemed uncertain, 2 are known at the protein level and 23 at the transcript level, according to UniProt. Interestingly, 23 of these 25 proteins are found to be encoded or to have close homologs in an open reading frame of a long noncoding human RNA. However, half of them are predicted to be at least 80% globular, with a single structural domain, according to IUPred, and with at least 80% of ordered residues, according to flDPnn. Strikingly, there is a near-complete lack of structural knowledge about these proteins, with no tertiary structure presently available in the Protein Data Bank and a fair prediction for one of them in the AlphaFold Protein Structure Database. Moreover, knowledge about the function of these possibly key proteins remains scarce.
Affiliation(s)
- Yves-Henri Sanejouand
- US2B, UMR 6286 of CNRS, Nantes University, 2 rue de la Houssinière, Nantes, 44322, Pays de la Loire, France.
|
18
|
Lee U, Mozeika SM, Zhao L. A Synergistic, Cultivator Model of De Novo Gene Origination. Genome Biol Evol 2024; 16:evae103. [PMID: 38748819 PMCID: PMC11152449 DOI: 10.1093/gbe/evae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 06/07/2024] Open
Abstract
The origin and fixation of evolutionarily young genes is a fundamental question in evolutionary biology. However, understanding the origins of newly evolved genes arising de novo from noncoding genomic sequences is challenging. This is partly due to the low likelihood that several neutral or nearly neutral mutations fix prior to the appearance of an important novel molecular function. This issue is particularly exacerbated in large effective population sizes where the effect of drift is small. To address this problem, we propose a regulation-focused, cultivator model for de novo gene evolution. This cultivator-focused model posits that each step in a novel variant's evolutionary trajectory is driven by well-defined, selectively advantageous functions for the cultivator genes, rather than solely by the de novo genes, emphasizing the critical role of genome organization in the evolution of new genes.
Affiliation(s)
- UnJin Lee
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - Shawn M Mozeika
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
|
19
|
Yee SW, Ferrández-Peral L, Alentorn-Moron P, Fontsere C, Ceylan M, Koleske ML, Handin N, Artegoitia VM, Lara G, Chien HC, Zhou X, Dainat J, Zalevsky A, Sali A, Brand CM, Wolfreys FD, Yang J, Gestwicki JE, Capra JA, Artursson P, Newman JW, Marquès-Bonet T, Giacomini KM. Illuminating the function of the orphan transporter, SLC22A10, in humans and other primates. Nat Commun 2024; 15:4380. [PMID: 38782905 PMCID: PMC11116522 DOI: 10.1038/s41467-024-48569-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 05/06/2024] [Indexed: 05/25/2024] Open
Abstract
SLC22A10 is an orphan transporter with unknown substrates and function. The goal of this study is to elucidate its substrate specificity and functional characteristics. In contrast to orthologs from great apes, human SLC22A10, tagged with green fluorescent protein, is not expressed on the plasma membrane. Cells expressing great ape SLC22A10 orthologs exhibit significant accumulation of estradiol-17β-glucuronide, unlike those expressing human SLC22A10. Sequence alignments reveal a proline at position 220 in humans, which is a leucine in great apes. Replacing proline with leucine in SLC22A10-P220L restores plasma membrane localization and uptake function. Neanderthal and Denisovan genomes show proline at position 220, akin to modern humans, indicating functional loss during hominin evolution. Human SLC22A10 is a unitary pseudogene due to a fixed missense mutation, P220, while in great apes, its orthologs transport sex steroid conjugates. Characterizing SLC22A10 across species sheds light on its biological role, influencing organism development and steroid homeostasis.
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Luis Ferrández-Peral
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Pol Alentorn-Moron
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Claudia Fontsere
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003, Barcelona, Spain
- Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Øster Farimagsgade 5A, 1352, Copenhagen, Denmark
| | - Merve Ceylan
- Department of Pharmacy, Uppsala University, Uppsala, Sweden
| | - Megan L Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Niklas Handin
- Department of Pharmacy, Uppsala University, Uppsala, Sweden
| | - Virginia M Artegoitia
- United States Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA, 95616, USA
| | - Giovanni Lara
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Huan-Chieh Chien
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Jacques Dainat
- Joint Research Unit for Infectious Diseases and Vectors Ecology Genetics Evolution and Control (MIVEGEC), University of Montpellier, French National Center for Scientific Research (CNRS 5290), French National Research Institute for Sustainable Development (IRD 224), 911 Avenue Agropolis, BP 64501, 34394, Montpellier Cedex 5, France
| | - Arthur Zalevsky
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA, USA
| | - Colin M Brand
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
| | - Finn D Wolfreys
- Department of Ophthalmology, University of California, San Francisco, CA, USA
| | - Jia Yang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Jason E Gestwicki
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, CA, USA
| | - John A Capra
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Per Artursson
- Department of Pharmacy, Uppsala University, Uppsala, Sweden
- Science for Life Laboratories, Uppsala University, Uppsala, Sweden
| | - John W Newman
- United States Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA, 95616, USA
- Department of Nutrition, University of California, Davis, Davis, CA, 95616, USA
| | - Tomàs Marquès-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
- CNAG, Centro Nacional de Analisis Genomico, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193, Cerdanyola del Vallès, Barcelona, Spain
| | - Kathleen M Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA.
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
| |
|
20
|
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdiscip Rev RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
| | - Nicholas Delihas
- Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
| | - Li Zhang
- Chinese Institute for Brain Research, Beijing, China
| | - Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Southwest United Graduate School, Kunming, China
| |
|
21
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024] Open
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA.
| |
|
22
|
Wang Z, Wang YW, Kasuga T, Lopez-Giraldez F, Zhang Y, Zhang Z, Wang Y, Dong C, Sil A, Trail F, Yarden O, Townsend JP. Lineage-specific genes are clustered with HET-domain genes and respond to environmental and genetic manipulations regulating reproduction in Neurospora. PLoS Genet 2023; 19:e1011019. [PMID: 37934795 PMCID: PMC10684091 DOI: 10.1371/journal.pgen.1011019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/23/2023] [Revised: 11/28/2023] [Accepted: 10/16/2023] [Indexed: 11/09/2023] Open
Abstract
Lineage-specific genes (LSGs) have long been postulated to play roles in the establishment of genetic barriers to intercrossing and speciation. In the genome of Neurospora crassa, most of the 670 Neurospora LSGs that are aggregated adjacent to the telomeres are clustered with 61% of the HET-domain genes, some of which regulate self-recognition and define vegetative incompatibility groups. In contrast, the LSG-encoded proteins possess few to no domains that would help to identify potential functional roles. Possible functional roles of LSGs were further assessed by performing transcriptomic profiling in genetic mutants and in response to environmental alterations, as well as by examining gene knockouts for phenotypes. Among the 342 LSGs that are dynamically expressed during both asexual and sexual phases, 64% were detectable on unusual carbon sources such as furfural, a wildfire-produced chemical that is a strong inducer of sexual development, and the structurally related furan 5-hydroxymethyl furfural (HMF). Expression of a significant portion of the LSGs was sensitive to light and temperature, factors that also regulate the switch from asexual to sexual reproduction. Furthermore, expression of the LSGs was significantly affected in the knockouts of adv-1 and pp-1 that regulate hyphal communication, and expression of more than one quarter of the LSGs was affected by perturbation of the mating locus. These observations encouraged further investigation of the roles of clustered lineage-specific and HET-domain genes in ecology and the regulation of reproduction in Neurospora, especially the regulation of the switch from asexual growth to sexual reproduction in response to dramatic changes in environmental conditions.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Takao Kasuga
- College of Biological Sciences, University of California, Davis, California, United States of America
| | - Yang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Yaning Wang
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Anita Sil
- Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
| | - Frances Trail
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
| | - Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Jeffrey P. Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| |
|
23
|
Moyers BA, Loupe JM, Felker SA, Lawlor JM, Anderson AG, Rodriguez-Nunez I, Bunney WE, Bunney BG, Cartagena PM, Sequeira A, Watson SJ, Akil H, Mendenhall EM, Cooper GM, Myers RM. Allele biased transcription factor binding across human brain regions gives mechanistic insight into eQTLs. bioRxiv 2023:2023.10.06.561245. [PMID: 37873117 PMCID: PMC10592666 DOI: 10.1101/2023.10.06.561245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Transcription Factors (TFs) influence gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because genomic localization of TFs is in part driven by TF recognition of DNA sequence, variation in TF binding sites can disrupt TF-DNA associations and affect gene regulation. To identify variants that impact TF binding in human brain tissues, we quantified allele bias for 93 TFs analyzed with ChIP-seq experiments of multiple structural brain regions from two donors. Using graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signal between alleles at heterozygous variants within each tissue sample from each donor. Comparison of results from different brain regions within donors and the same regions between donors provided measures of allele bias reproducibility. We identified thousands of DNA variants that show reproducible bias in ChIP-seq for at least one TF. We found that alleles that are rarer in the general population were more likely than common alleles to exhibit large biases, and more frequently led to reduced TF binding. Combining ChIP-seq with RNA-seq, we identified TF-allele interaction biases with RNA bias in a phased allele linked to 6,709 eQTL variants identified in GTEx data, 3,309 of which were found in neural contexts. Our results provide insights into the effects of both common and rare variation on gene regulation in the brain. These findings can facilitate mechanistic understanding of cis-regulatory variation associated with biological traits, including disease.
Affiliation(s)
- Jacob M. Loupe
- HudsonAlpha Institute for Biotechnology, Huntsville AL, USA
| | - William E. Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Blynn G. Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Preston M. Cartagena
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Adolfo Sequeira
- Department of Psychiatry and Human Behavior, University of California, Irvine CA, USA
| | - Stanley J. Watson
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor MI, USA
| | - Huda Akil
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor MI, USA
| |
|
24
|
Yee SW, Ferrández-Peral L, Alentorn P, Fontsere C, Ceylan M, Koleske ML, Handin N, Artegoitia VM, Lara G, Chien HC, Zhou X, Dainat J, Zalevsky A, Sali A, Brand CM, Capra JA, Artursson P, Newman JW, Marques-Bonet T, Giacomini KM. Illuminating the Function of the Orphan Transporter, SLC22A10 in Humans and Other Primates. Research Square 2023:rs.3.rs-3263845. [PMID: 37790518 PMCID: PMC10543398 DOI: 10.21203/rs.3.rs-3263845/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
SLC22A10 is classified as an orphan transporter with unknown substrates and function. Here we describe the discovery of the substrate specificity and functional characteristics of SLC22A10. The human SLC22A10 tagged with green fluorescent protein was found to be absent from the plasma membrane, in contrast to the SLC22A10 orthologs found in great apes. Estradiol-17β-glucuronide accumulated in cells expressing great ape SLC22A10 orthologs (over 4-fold, p<0.001). In contrast, human SLC22A10 displayed no uptake function. Sequence alignments revealed two amino acid differences, including a proline at position 220 of human SLC22A10 and a leucine at the same position in great ape orthologs. Site-directed mutagenesis yielding human SLC22A10-P220L produced a protein with excellent plasma membrane localization and associated uptake function. Neanderthal and Denisovan genomes show human-like sequences at the proline 220 position, corroborating that SLC22A10 was rendered nonfunctional during hominin evolution after the divergence from the Pan lineage (chimpanzees and bonobos). These findings demonstrate that human SLC22A10 is a unitary pseudogene that was inactivated by a missense mutation fixed in humans, whereas orthologs in great apes transport sex steroid conjugates.
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Pol Alentorn
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
| | - Claudia Fontsere
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain; Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Øster Farimagsgade 5A, 1352 Copenhagen, Denmark
| | - Merve Ceylan
- Department of Pharmacy and Science for Life Laboratory, Uppsala University, P.O. Box 580, 75123, Uppsala, Sweden
| | - Megan L. Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Niklas Handin
- Department of Pharmacy and Science for Life Laboratory, Uppsala University, P.O. Box 580, 75123, Uppsala, Sweden
| | - Virginia M. Artegoitia
- United States Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA 95616, USA
| | - Giovanni Lara
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Huan-Chieh Chien
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Jacques Dainat
- Joint Research Unit for Infectious Diseases and Vectors Ecology Genetics Evolution and Control (MIVEGEC), University of Montpellier, French National Center for Scientific Research (CNRS 5290), French National Research Institute for Sustainable Development (IRD 224), 911 Avenue Agropolis, BP 64501, 34394 Montpellier Cedex 5, France
| | - Arthur Zalevsky
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, UCSF Box 0775 1700 4th St, University of California, San Francisco, San Francisco, CA 94158, United States; Department of Pharmaceutical Chemistry, University of California, San Francisco, UCSF Box 2880 600 16th St, San Francisco, CA 94143, United States; Quantitative Biosciences Institute (QBI), University of California, San Francisco, 1700 4th St, San Francisco, CA, United States
| | - Colin M. Brand
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
| | - John A. Capra
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
| | - Per Artursson
- Department of Pharmacy and Science for Life Laboratory, Uppsala University, P.O. Box 580, 75123, Uppsala, Sweden
| | - John W. Newman
- United States Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA 95616, USA; Department of Nutrition, University of California, Davis, Davis, CA 95616, USA; UC Davis West Coast Metabolomics Center, Davis, CA 95616, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain; Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain; CNAG, Centro Nacional de Analisis Genomico, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain; Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Kathleen M. Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| |
|
25
|
Liang X, Heath LS. Towards understanding paleoclimate impacts on primate de novo genes. G3 (Bethesda) 2023; 13:jkad135. [PMID: 37313728 PMCID: PMC10468307 DOI: 10.1093/g3journal/jkad135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 06/15/2023]
Abstract
De novo genes are genes that newly emerge in particular species; primate de novo genes, for example, emerged in certain primate species. Over the past decade, a great deal of research has examined their emergence, origins, functions, and other attributes in different species, some of it involving estimates of the ages of de novo genes. However, limited by the number of species with whole-genome sequences available, relatively few studies have focused specifically on the emergence time of primate de novo genes. Among those, even fewer investigate the association between primate gene emergence and environmental factors, such as paleoclimate (ancient climate) conditions. This study investigates the relationship between paleoclimate and human gene emergence at primate species divergence. Based on 32 available primate genome sequences, this study reveals possible associations between temperature changes and the emergence of de novo primate genes. Overall, de novo genes tended to emerge in the most recent 13 MY, when the temperature continued to cool, which is consistent with past findings. Furthermore, within the overall cooling trend, new primate genes were more likely to emerge during local warming periods, when the warm temperature more closely resembled the environmental conditions that preceded the cooling trend. Results also indicate that both primate de novo genes and human cancer-associated genes have later origins than random human genes. Future studies can examine human de novo gene emergence in more depth from an environmental perspective and can approach species divergence from a gene emergence perspective.
Affiliation(s)
- Xiao Liang
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Lenwood S Heath
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| |
|
26
|
Yee SW, Ferrández-Peral L, Alentorn P, Fontsere C, Ceylan M, Koleske ML, Handin N, Artegoitia VM, Lara G, Chien HC, Zhou X, Dainat J, Zalevsky A, Sali A, Brand CM, Capra JA, Artursson P, Newman JW, Marques-Bonet T, Giacomini KM. Illuminating the Function of the Orphan Transporter, SLC22A10 in Humans and Other Primates. bioRxiv 2023:2023.08.08.552553. [PMID: 37609337 PMCID: PMC10441401 DOI: 10.1101/2023.08.08.552553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
SLC22A10 is classified as an orphan transporter with unknown substrates and function. Here we describe the discovery of the substrate specificity and functional characteristics of SLC22A10. The human SLC22A10 tagged with green fluorescent protein was found to be absent from the plasma membrane, in contrast to the SLC22A10 orthologs found in great apes. Estradiol-17β-glucuronide accumulated in cells expressing great ape SLC22A10 orthologs (over 4-fold, p<0.001). In contrast, human SLC22A10 displayed no uptake function. Sequence alignments revealed two amino acid differences, including a proline at position 220 of human SLC22A10 and a leucine at the same position in great ape orthologs. Site-directed mutagenesis yielding human SLC22A10-P220L produced a protein with excellent plasma membrane localization and associated uptake function. Neanderthal and Denisovan genomes show human-like sequences at the proline 220 position, corroborating that SLC22A10 was rendered nonfunctional during hominin evolution after the divergence from the Pan lineage (chimpanzees and bonobos). These findings demonstrate that human SLC22A10 is a unitary pseudogene that was inactivated by a missense mutation fixed in humans, whereas orthologs in great apes transport sex steroid conjugates.
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Pol Alentorn
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
| | - Claudia Fontsere
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain; Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Øster Farimagsgade 5A, 1352 Copenhagen, Denmark
| | - Merve Ceylan
- Department of Pharmacy and Science for Life Laboratory, Uppsala University, P.O. Box 580, 75123, Uppsala, Sweden
| | - Megan L. Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Niklas Handin
- Department of Pharmacy and Science for Life Laboratory, Uppsala University, P.O. Box 580, 75123, Uppsala, Sweden
| | - Virginia M. Artegoitia
- United States Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA 95616, USA
| | - Giovanni Lara
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Huan-Chieh Chien
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Jacques Dainat
- Joint Research Unit for Infectious Diseases and Vectors Ecology Genetics Evolution and Control (MIVEGEC), University of Montpellier, French National Center for Scientific Research (CNRS 5290), French National Research Institute for Sustainable Development (IRD 224), 911 Avenue Agropolis, BP 64501, 34394 Montpellier Cedex 5, France
| | - Arthur Zalevsky
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, UCSF Box 0775 1700 4th St, University of California, San Francisco, San Francisco, CA 94158, United States; Department of Pharmaceutical Chemistry, University of California, San Francisco, UCSF Box 2880 600 16th St, San Francisco, CA 94143, United States; Quantitative Biosciences Institute (QBI), University of California, San Francisco, 1700 4th St, San Francisco, CA, United States
| | - Colin M. Brand
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
| | - John A. Capra
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
| | - Per Artursson
- Department of Pharmacy and Science for Life Laboratory, Uppsala University, P.O. Box 580, 75123, Uppsala, Sweden
| | - John W. Newman
- United States Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA 95616, USA; Department of Nutrition, University of California, Davis, Davis, CA 95616, USA; UC Davis West Coast Metabolomics Center, Davis, CA 95616, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain; Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain; CNAG, Centro Nacional de Analisis Genomico, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain; Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Kathleen M. Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| |
|
27
|
Athanasouli M, Akduman N, Röseler W, Theam P, Rödelsperger C. Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota. PLoS Genet 2023; 19:e1010832. [PMID: 37399201 DOI: 10.1371/journal.pgen.1010832] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/02/2023] [Accepted: 06/15/2023] [Indexed: 07/05/2023] Open
Abstract
Adaptation of organisms to environmental change may be facilitated by the creation of new genes. New genes without homologs in other lineages are known as taxonomically restricted orphan genes and may result from divergence or de novo formation. Previously, we have extensively characterized the evolution and origin of such orphan genes in the nematode model organism Pristionchus pacificus. Here, we employ large-scale transcriptomics to establish potential functional associations and to measure the degree of transcriptional plasticity among orphan genes. Specifically, we analyzed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures. Based on coexpression analysis, we identified 28 large modules that harbor 3,727 diplogastrid-specific orphan genes and that respond dynamically to different bacteria. These coexpression modules have distinct regulatory architecture and also exhibit differential expression patterns across development, suggesting a link between bacterial response networks and development. Phylostratigraphy revealed a considerably high number of family- and even species-specific orphan genes in certain coexpression modules. This suggests that new genes are not attached randomly to existing cellular networks and that integration can happen very fast. Integrative analysis of protein domains, gene expression and ortholog data facilitated the assignment of biological labels for 22 coexpression modules, with one of the largest, fast-evolving modules being associated with spermatogenesis. In summary, this work presents the first functional annotation for thousands of P. pacificus orphan genes and reveals insights into their integration into environmentally responsive gene networks.
Affiliation(s)
- Marina Athanasouli
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Nermin Akduman
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Waltraud Röseler
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Penghieng Theam
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| | - Christian Rödelsperger
- Department for Integrative Evolutionary Biology, Max Planck Institute for Biology, Tübingen, Germany
| |
Collapse
|
28
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. bioRxiv 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with their gene ages, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural change for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins become ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased in the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA

29
Grandchamp A, Kühl L, Lebherz M, Brüggemann K, Parsch J, Bornberg-Bauer E. Population genomics reveals mechanisms and dynamics of de novo expressed open reading frame emergence in Drosophila melanogaster. Genome Res 2023; 33:872-890. [PMID: 37442576 PMCID: PMC10519401 DOI: 10.1101/gr.277482.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 06/06/2023] [Indexed: 07/15/2023]
Abstract
Novel genes are essential for evolutionary innovations and differ substantially even between closely related species. Recently, multiple studies across many taxa showed that some novel genes arise de novo, that is, from previously noncoding DNA. To characterize the underlying mutations that allowed de novo gene emergence and their order of occurrence, homologous regions must be detected within noncoding sequences in closely related sister genomes. So far, most studies do not detect noncoding homologs of de novo genes because of incomplete assemblies and annotations, and long evolutionary distances separating genomes. Here, we overcome these issues by searching for de novo expressed open reading frames (neORFs), the not-yet fixed precursors of de novo genes that emerged within a single species. We sequenced and assembled genomes with long-read technology and the corresponding transcriptomes from inbred lines of Drosophila melanogaster, derived from seven geographically diverse populations. We found line-specific neORFs in abundance but few neORFs shared by lines, suggesting a rapid turnover. Gain and loss of transcription are more frequent than the creation of ORFs, for example, by forming new start and stop codons. Consequently, the gain of ORFs becomes rate-limiting and is frequently the initial step in neORF emergence. Furthermore, transposable elements (TEs) are major drivers for intragenomic duplications of neORFs, yet TE insertions are less important for the emergence of neORFs. However, highly mutable genomic regions around TEs provide new features that enable gene birth. In conclusion, neORFs have a high birth-death rate and are rapidly purged, but surviving neORFs spread neutrally through populations and within genomes.
Affiliation(s)
- Anna Grandchamp
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Lucas Kühl
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Marie Lebherz
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Kathrin Brüggemann
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- John Parsch
  - Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Munich, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
  - Max Planck Institute for Biology Tübingen, Department of Protein Evolution, 72076 Tübingen, Germany

30
Kawaguchi M, Chang WS, Tsuchiya H, Kinoshita N, Miyaji A, Kawahara-Miki R, Tomita K, Sogabe A, Yorifuji M, Kono T, Kaneko T, Yasumasu S. Orphan gene expressed in flame cone cells uniquely found in seahorse epithelium. Cell Tissue Res 2023:10.1007/s00441-023-03779-1. [PMID: 37227506 DOI: 10.1007/s00441-023-03779-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 04/26/2023] [Indexed: 05/26/2023]
Abstract
The seahorse is among the most morphologically distinctive teleost fishes. The body is surrounded by bony plates and spines, and the male fish possess a brooding organ, called the brood pouch, on their tail. The surfaces of the brood pouch and the spines are covered by characteristic so-called flame cone cells. Based on our histological observations, flame cone cells are present in the seahorse Hippocampus abdominalis, but not in the barbed pipefish Urocampus nanus or the seaweed pipefish Syngnathus schlegeli, both of which belong to the same family as the seahorse. In the flame cone cells, we observed expression of an "orphan gene" lacking homologs in other lineages. This gene, which we named the proline-glycine rich (pgrich) gene, codes for an amino acid sequence composed of repetitive units. In situ hybridization and immunohistochemical analyses detected pgrich-positive signals in the flame cone cells. Based on a survey of the genome sequences of 15 teleost species, the pgrich gene is found only in some species of Syngnathiformes (namely, the genera Syngnathus and Hippocampus). The amino acid sequence of the seahorse PGrich is somewhat similar to the sequence deduced from the antisense strand of elastin. Furthermore, there are many transposable elements around the pgrich gene. These results suggest that the pgrich gene may have originated from the elastin gene with the involvement of transposable elements and obtained its novel function in the flame cone cells during the evolution of the seahorse.
Affiliation(s)
- Mari Kawaguchi
  - Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Wen-Shan Chang
  - Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Hazuki Tsuchiya
  - Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Nana Kinoshita
  - Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Akira Miyaji
  - Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Ryouka Kawahara-Miki
  - Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Setagaya-Ku, Tokyo, Japan
- Kenji Tomita
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Atsushi Sogabe
  - Department of Biology, Faculty of Agriculture and Life Science, Hirosaki University, Bunkyo, Hirosaki, Aomori, 036-8561, Japan
- Makiko Yorifuji
  - Sesoko Station, Tropical Biosphere Research Center, University of the Ryukyus, Sesoko, Motobu, Okinawa, 905-0227, Japan
  - Demonstration Laboratory, Marine Ecology Research Institute, Arahama, Kashiwazaki, Niigata, 945-0017, Japan
- Tomohiro Kono
  - Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Setagaya-Ku, Tokyo, Japan
  - Department of Bioscience, Tokyo University of Agriculture, Setagaya-Ku, Tokyo, Japan
- Toyoji Kaneko
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Shigeki Yasumasu
  - Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan

31
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Saurin Bipin Parikh
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Nelson Castilho Coelho
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omer Acar
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Carly Houghton
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lin Chou
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA

32
Sandmann CL, Schulz JF, Ruiz-Orera J, Kirchner M, Ziehm M, Adami E, Marczenke M, Christ A, Liebe N, Greiner J, Schoenenberger A, Muecke MB, Liang N, Moritz RL, Sun Z, Deutsch EW, Gotthardt M, Mudge JM, Prensner JR, Willnow TE, Mertins P, van Heesch S, Hubner N. Evolutionary origins and interactomes of human, young microproteins and small peptides translated from short open reading frames. Mol Cell 2023; 83:994-1011.e18. [PMID: 36806354 PMCID: PMC10032668 DOI: 10.1016/j.molcel.2023.01.023] [Citation(s) in RCA: 65] [Impact Index Per Article: 32.5] [Received: 09/08/2022] [Revised: 12/12/2022] [Accepted: 01/25/2023] [Indexed: 02/19/2023]
Abstract
All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw materials for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded small peptides and young microproteins, we subjected 266 candidates to a mass-spectrometry-based interactome screen with motif resolution. Based on these interactomes and additional cellular assays, we can associate several candidates with mRNA splicing, translational regulation, and endocytosis. Our work provides insights into the evolutionary origins and interaction potential of young and small proteins, thereby helping to elucidate this underexplored territory of the human proteome.
Affiliation(s)
- Clara-L Sandmann
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jana F Schulz
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany
- Jorge Ruiz-Orera
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Marieluise Kirchner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Matthias Ziehm
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Eleonora Adami
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Maike Marczenke
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Annabel Christ
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Nina Liebe
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Johannes Greiner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Aaron Schoenenberger
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michael B Muecke
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Ning Liang
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Zhi Sun
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Michael Gotthardt
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John R Prensner
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
- Thomas E Willnow
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark
- Philipp Mertins
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Proteomics, 10117 Berlin, Germany
- Norbert Hubner
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany

33
Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.

34
An NA, Zhang J, Mo F, Luan X, Tian L, Shen QS, Li X, Li C, Zhou F, Zhang B, Ji M, Qi J, Zhou WZ, Ding W, Chen JY, Yu J, Zhang L, Shu S, Hu B, Li CY. De novo genes with an lncRNA origin encode unique human brain developmental functionality. Nat Ecol Evol 2023; 7:264-278. [PMID: 36593289 PMCID: PMC9911349 DOI: 10.1038/s41559-022-01925-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 11/26/2021] [Accepted: 10/04/2022] [Indexed: 01/03/2023]
Abstract
Human de novo genes can originate from neutral long non-coding RNA (lncRNA) loci and are evolutionarily significant in general, yet how and why this all-or-nothing transition to functionality happens remains unclear. Here, in 74 human/hominoid-specific de novo genes, we identified distinctive U1 elements and RNA splice-related sequences accounting for RNA nuclear export, differentiating mRNAs from lncRNAs, and driving the origin of de novo genes from lncRNA loci. The polymorphic sites facilitating the lncRNA-mRNA conversion through regulating nuclear export are selectively constrained, maintaining a boundary that differentiates mRNAs from lncRNAs. The functional new genes actively passing through this boundary thus showed a mode of pre-adaptive origin, in that they acquire functions along with the achievement of their coding potential. As a proof of concept, we verified that splicing and U1 recognition regulate the nuclear export efficiency of one of these genes, ENSG00000205704, in human neural progenitor cells. Notably, knock-out or over-expression of this gene in human embryonic stem cells accelerates or delays the neuronal maturation of cortical organoids, respectively. Transgenic mice with ectopically expressed ENSG00000205704 showed enlarged brains with cortical expansion. We thus demonstrate the key roles of nuclear export in de novo gene origin. These newly originated genes should reflect the novel uniqueness of human brain development.
Affiliation(s)
- Ni A An
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Jie Zhang
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Fan Mo
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xuke Luan
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Lu Tian
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Qing Sunny Shen
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Xiangshang Li
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chunqiong Li
  - Chinese Institute for Brain Research, Beijing, China
- Fanqi Zhou
  - State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing, China
- Boya Zhang
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingjun Ji
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Jianhuan Qi
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei-Zhen Zhou
  - State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Wanqiu Ding
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Jia-Yu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Jia Yu
  - State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing, China
- Li Zhang
  - Chinese Institute for Brain Research, Beijing, China
- Shaokun Shu
  - Peking University International Cancer Institute, Beijing, China
- Baoyang Hu
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Chuan-Yun Li
  - Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China

35
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Eva Borràs
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Federico Valverde
  - Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
- José Tomás Matus
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
- Eduard Sabidó
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- José Luis Riechmann
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

36
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022] Open
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece
- Zoe Vance
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Kate M Duggan
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Aoife McLysaght
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland

37
Ma C, Li C, Ma H, Yu D, Zhang Y, Zhang D, Su T, Wu J, Wang X, Zhang L, Chen CL, Zhang YE. Pan-cancer surveys indicate cell cycle-related roles of primate-specific genes in tumors and embryonic cerebrum. Genome Biol 2022; 23:251. [PMID: 36474250 PMCID: PMC9724437 DOI: 10.1186/s13059-022-02821-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/27/2021] [Accepted: 11/24/2022] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Despite extensive study, it remains largely unclear why humans bear a particularly high risk of cancer. The antagonistic pleiotropy hypothesis predicts that primate-specific genes (PSGs) tend to promote tumorigenesis, while the molecular atavism hypothesis predicts that PSGs involved in tumors may represent recently derived duplicates of unicellular genes. However, these predictions have not been tested.
RESULTS: By taking advantage of pan-cancer genomic data, we find the upregulation of PSGs across 13 cancer types, which is facilitated by copy-number gain and promoter hypomethylation. Meta-analyses indicate that upregulated PSGs (uPSGs) tend to promote tumorigenesis and to play cell cycle-related roles. The cell cycle-related uPSGs predominantly represent derived duplicates of unicellular genes. We prioritize 15 uPSGs and perform an in-depth analysis of one unicellular gene-derived duplicate involved in the cell cycle, DDX11. Genome-wide screening data and knockdown experiments demonstrate that DDX11 is broadly essential across cancer cell lines. Importantly, non-neutral amino acid substitution patterns and increased expression indicate that DDX11 has been under positive selection. Finally, we find that cell cycle-related uPSGs are also preferentially upregulated in the highly proliferative embryonic cerebrum.
CONCLUSIONS: Consistent with the predictions of the atavism and antagonistic pleiotropy hypotheses, primate-specific genes, especially those PSGs derived from cell cycle-related genes that emerged in unicellular ancestors, contribute to the early proliferation of the human cerebrum at the cost of hitchhiking by similarly highly proliferative cancer cells.
Affiliation(s)
- Chenyu Ma
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Chunyan Li
  - School of Engineering Medicine, Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), and Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, 100191, China
- Huijing Ma
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Daqi Yu
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yufei Zhang
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - School of Life Sciences, Nanjing University, Nanjing, 210093, China
- Dan Zhang
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Tianhan Su
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jianmin Wu
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Center for Cancer Bioinformatics, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xiaoyue Wang
  - State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Li Zhang
  - Chinese Institute for Brain Research, Beijing, 102206, China
- Chun-Long Chen
  - Institut Curie, Université PSL, Sorbonne Université, CNRS UMR3244, Dynamics of Genetic Information, 75005, Paris, France
- Yong E Zhang
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- Chinese Institute for Brain Research, Beijing, 102206, China.
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China.
| |
|
38
|
Rabbani M, Zheng X, Manske GL, Vargo A, Shami AN, Li JZ, Hammoud SS. Decoding the Spermatogenesis Program: New Insights from Transcriptomic Analyses. Annu Rev Genet 2022; 56:339-368. [PMID: 36070560 PMCID: PMC10722372 DOI: 10.1146/annurev-genet-080320-040045] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/19/2023]
Abstract
Spermatogenesis is a complex differentiation process coordinated spatiotemporally across and along seminiferous tubules. Cellular heterogeneity has made it challenging to obtain stage-specific molecular profiles of germ and somatic cells using bulk transcriptomic analyses. This has limited our ability to understand regulation of spermatogenesis and to integrate knowledge from model organisms to humans. The recent advancement of single-cell RNA-sequencing (scRNA-seq) technologies provides insights into the cell type diversity and molecular signatures in the testis. Fine-grained cell atlases of the testis contain both known and novel cell types and define the functional states along the germ cell developmental trajectory in many species. These atlases provide a reference system for integrated interspecies comparisons to discover mechanistic parallels and to enable future studies. Despite recent advances, we currently lack high-resolution data to probe germ cell-somatic cell interactions in the tissue environment, but the use of highly multiplexed spatial analysis technologies has begun to resolve this problem. Taken together, recent single-cell studies provide an improved understanding of gametogenesis to examine underlying causes of infertility and enable the development of new therapeutic interventions.
Affiliation(s)
- Mashiat Rabbani
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA;
| | - Xianing Zheng
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA;
| | - Gabe L Manske
- Cellular and Molecular Biology Graduate Program, University of Michigan, Ann Arbor, Michigan, USA
| | - Alexander Vargo
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA;
| | - Adrienne N Shami
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA;
| | - Jun Z Li
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA;
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Saher Sue Hammoud
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA;
- Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, Michigan, USA
- Department of Urology, University of Michigan, Ann Arbor, Michigan, USA
- Cellular and Molecular Biology Graduate Program, University of Michigan, Ann Arbor, Michigan, USA
| |
|
39
|
Liu Y, Zeng S, Wu M. Novel insights into noncanonical open reading frames in cancer. Biochim Biophys Acta Rev Cancer 2022; 1877:188755. [PMID: 35777601 DOI: 10.1016/j.bbcan.2022.188755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2022] [Revised: 06/11/2022] [Accepted: 06/23/2022] [Indexed: 12/12/2022]
Abstract
With technological advances, previously neglected noncanonical open reading frames (nORFs) are drawing ever-increasing attention. However, the translation potential of numerous putative nORFs remains elusive, and the functions of noncanonical peptides have not been systematically summarized. Moreover, the relationship between noncanonical peptides and their counterpart protein or RNA products remains unclear, and the clinical implementation of noncanonical peptides has not been explored. In this review, we highlight how recent technological advances such as ribosome profiling, bioinformatics approaches and CRISPR/Cas9 facilitate research on noncanonical peptides. We delineate the features of each nORF category and the evolutionary process underlying the nORFs. Most importantly, we summarize the diversified functions of noncanonical peptides in cancer based on their subcellular location, which reflect their extensive participation in key pathways and essential cellular activities in cancer cells. Meanwhile, the equilibrium between noncanonical peptides and their corresponding transcripts or counterpart products may be dysregulated under pathological states, which is essential for their roles in cancer. Lastly, we explore their underestimated potential in clinical application as diagnostic biomarkers and treatment targets against cancer.
Affiliation(s)
- Yihan Liu
- Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410013, Hunan, China; The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan 410008, China; Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; Key Laboratory for Molecular Radiation Oncology of Hunan Province, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Shan Zeng
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; Key Laboratory for Molecular Radiation Oncology of Hunan Province, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China.
| | - Minghua Wu
- Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410013, Hunan, China; The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan 410008, China.
| |
|
40
|
Ma D, Lai Z, Ding Q, Zhang K, Chang K, Li S, Zhao Z, Zhong F. Identification, Characterization and Function of Orphan Genes Among the Current Cucurbitaceae Genomes. Front Plant Sci 2022; 13:872137. [PMID: 35599909 PMCID: PMC9114813 DOI: 10.3389/fpls.2022.872137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 03/28/2022] [Indexed: 06/15/2023]
Abstract
Orphan genes (OGs), which lack identifiable homologs in other lineages, may contribute to a variety of biological functions. The Cucurbitaceae family comprises a wide range of fruit crops of worldwide or local economic significance. To date, very few functional mechanisms of OGs in Cucurbitaceae are known. In this study, we systematically identified the OGs of eight Cucurbitaceae species using a comparative genomics approach. The content of OGs varied widely among the eight Cucurbitaceae species, ranging from 1.63% in chayote to 16.55% in wax gourd. Genetic structure analysis showed that OGs have significantly shorter protein lengths and fewer exons in Cucurbitaceae. The subcellular localizations of OGs were basically the same, with only subtle differences. Except for aggregation in some chromosomal regions, the distribution density of OGs was higher near the telomeres and relatively even along the chromosomes. Gene expression analysis revealed that OGs had less abundant and highly tissue-specific expression. Interestingly, the largest proportion of these OGs showed significantly more tissue-specific expression in the flower than in other tissues, and more detectable expression was found in the male flower. Functional prediction of OGs showed that (1) 18 OGs are associated with male sterility in watermelon; (2) 182 OGs are associated with flower development in cucumber; (3) 51 OGs are associated with environmental adaptation in watermelon; and (4) 520 OGs may contribute to the large fruit size in wax gourd. Our results provide the molecular basis and research direction for some important mechanisms in Cucurbitaceae species and domesticated crops.
Affiliation(s)
- Dongna Ma
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
- College of the Environment and Ecology, Xiamen University, Fujian, China
| | - Zhengfeng Lai
- Subtropical Agricultural Research Institute, Fujian Academy of Agriculture Sciences, Fujian, China
| | - Qiansu Ding
- College of the Environment and Ecology, Xiamen University, Fujian, China
| | - Kun Zhang
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
| | - Kaizhen Chang
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
| | - Shuhao Li
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
| | - Zhizhu Zhao
- College of the Environment and Ecology, Xiamen University, Fujian, China
| | - Fenglin Zhong
- College of Horticulture, Fujian Agriculture and Forestry University, Fujian, China
| |
|
41
|
Gabay O, Shoshan Y, Kopel E, Ben-Zvi U, Mann TD, Bressler N, Cohen-Fultheim R, Schaffer AA, Roth SH, Tzur Z, Levanon EY, Eisenberg E. Landscape of adenosine-to-inosine RNA recoding across human tissues. Nat Commun 2022; 13:1184. [PMID: 35246538 PMCID: PMC8897444 DOI: 10.1038/s41467-022-28841-4] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 04/28/2021] [Accepted: 01/27/2022] [Indexed: 12/18/2022] Open
Abstract
RNA editing by adenosine deaminases changes the information encoded in the mRNA from its genomic blueprint. Editing of protein-coding sequences can introduce novel, functionally distinct, protein isoforms and diversify the proteome. The functional importance of a few recoding sites has been appreciated for decades. However, systematic methods to uncover these sites perform poorly, and the full repertoire of recoding in human and other mammals is unknown. Here we present a new detection approach, and analyze 9125 GTEx RNA-seq samples, to produce a highly accurate atlas of 1517 editing sites within the coding region and their editing levels across human tissues. Single-cell RNA-seq data show that protein recoding contributes to the variability across cell subpopulations. Most highly edited sites are evolutionarily conserved in non-primate mammals, attesting to adaptation. This comprehensive set can facilitate understanding of the role of recoding in human physiology and diseases.
Affiliation(s)
- Orshay Gabay
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Yoav Shoshan
- Raymond and Beverly Sackler School of Physics and Astronomy and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
| | - Eli Kopel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Udi Ben-Zvi
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Tomer D Mann
- Tel Aviv Sourasky Medical Center and Sackler school of medicine, Tel Aviv University, Tel Aviv, Israel
| | - Noam Bressler
- Raymond and Beverly Sackler School of Physics and Astronomy and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
| | - Roni Cohen-Fultheim
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Amos A Schaffer
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Shalom Hillel Roth
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Ziv Tzur
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Erez Y Levanon
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel; The Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat Gan, 5290002, Israel.
| | - Eli Eisenberg
- Raymond and Beverly Sackler School of Physics and Astronomy and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel.
| |
|
42
|
New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:genes13020284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022] Open
Abstract
De novo genes are novel genes which emerge from non-coding DNA. To date, little is known about how the properties of de novo genes relate to their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5′ untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic genes are more enriched in enhancers, even if the TATA motif is most commonly found upstream in these genes. Intergenic and intronic 5′ UTRs of proto-genes have a lower potential to stabilise mRNA structures than exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.
|
43
|
Cherezov RO, Vorontsova JE, Simonova OB. The Phenomenon of Evolutionary “De Novo Generation” of Genes. Russ J Dev Biol 2021. [DOI: 10.1134/s1062360421060035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
44
|
Lee J, Wacholder A, Carvunis AR. Evolutionary Characterization of the Short Protein SPAAR. Genes (Basel) 2021; 12:genes12121864. [PMID: 34946813 PMCID: PMC8702040 DOI: 10.3390/genes12121864] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2021] [Revised: 11/22/2021] [Accepted: 11/22/2021] [Indexed: 02/07/2023] Open
Abstract
Microproteins (<100 amino acids) are receiving increasing recognition as important participants in numerous biological processes, but their evolutionary dynamics are poorly understood. SPAAR is a recently discovered microprotein that regulates muscle regeneration and angiogenesis through interactions with conserved signaling pathways. Interestingly, SPAAR does not belong to any known protein family and has known homologs exclusively among placental mammals. This lack of distant homology could be caused by challenges in homology detection of short sequences, or it could indicate a recent de novo emergence from a noncoding sequence. By integrating syntenic alignments and homology searches, we identify SPAAR orthologs in marsupials and monotremes, establishing that SPAAR has existed at least since the emergence of mammals. SPAAR shows substantial primary sequence divergence but retains a conserved protein structure. In primates, we infer two independent evolutionary events leading to the de novo origination of 5' elongated isoforms of SPAAR from a noncoding sequence and find evidence of adaptive evolution in this extended region. Thus, SPAAR may be of ancient origin, but it appears to be experiencing continual evolutionary innovation in mammals.
Collapse
Affiliation(s)
- Jiwon Lee
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; (J.L.); (A.W.)
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Joint CMU-Pitt Ph.D. Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; (J.L.); (A.W.)
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; (J.L.); (A.W.)
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Correspondence: ; Tel.: +1-412-648-3335
| |
|
45
|
Singh U, Wurtele ES. orfipy: a fast and flexible tool for extracting ORFs. Bioinformatics 2021; 37:3019-3020. [PMID: 33576786 PMCID: PMC8479652 DOI: 10.1093/bioinformatics/btab090] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 11/09/2020] [Revised: 12/31/2020] [Accepted: 02/03/2021] [Indexed: 02/02/2023] Open
Abstract
SUMMARY: Searching for open reading frames is a routine task and a critical step prior to annotating protein-coding regions in newly sequenced genomes or de novo transcriptome assemblies. With the tremendous increase in genomic and transcriptomic data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in genomic and transcriptomic sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats. AVAILABILITY AND IMPLEMENTATION: orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
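For readers unfamiliar with the task this abstract describes, the following is a minimal, illustrative sketch of an open reading frame search in plain Python. It is not orfipy's implementation and does not use its API; the function name `find_orfs`, the `min_len` parameter, and the choice to scan only the forward strand for ATG-to-stop ORFs are all assumptions made for illustration.

```python
from typing import List, Tuple

# Standard stop codons and the canonical start codon (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_len: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs.

    Hypothetical helper: an ORF here runs from the first ATG in a frame
    to the next in-frame stop codon. `end` is exclusive and includes the
    stop codon. ORFs shorter than `min_len` nucleotides are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((frame, start, end))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    example = "CCATGAAATTTTAGGG"
    for frame, start, end in find_orfs(example, min_len=9):
        print(frame, start, end, example[start:end])
```

A production tool would additionally handle the reverse strand, alternative start codons and translation tables, and large FASTA inputs, which is where a dedicated program such as the one described above is useful.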
Affiliation(s)
- Urminder Singh
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
| | - Eve Syrkin Wurtele
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
| |
|
46
|
Rivard EL, Ludwig AG, Patel PH, Grandchamp A, Arnold SE, Berger A, Scott EM, Kelly BJ, Mascha GC, Bornberg-Bauer E, Findlay GD. A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster. PLoS Genet 2021; 17:e1009787. [PMID: 34478447 PMCID: PMC8445463 DOI: 10.1371/journal.pgen.1009787] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/29/2021] [Revised: 09/16/2021] [Accepted: 08/19/2021] [Indexed: 02/07/2023] Open
Abstract
Comparative genomics has enabled the identification of genes that potentially evolved de novo from non-coding sequences. Many such genes are expressed in male reproductive tissues, but their functions remain poorly understood. To address this, we conducted a functional genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster and identified one gene, atlas, required for male fertility. Detailed genetic and cytological analyses showed that atlas is required for proper chromatin condensation during the final stages of spermatogenesis. Atlas protein is expressed in spermatid nuclei and facilitates the transition from histone- to protamine-based chromatin packaging. Complementary evolutionary analyses revealed the complex evolutionary history of atlas. The protein-coding portion of the gene likely arose at the base of the Drosophila genus on the X chromosome but was unlikely to be essential, as it was then lost in several independent lineages. Within the last ~15 million years, however, the gene moved to an autosome, where it fused with a conserved non-coding RNA and evolved a non-redundant role in male fertility. Altogether, this study provides insight into the integration of novel genes into biological processes, the links between genomic innovation and functional evolution, and the genetic control of a fundamental developmental process, gametogenesis.
Affiliation(s)
- Emily L. Rivard
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Andrew G. Ludwig
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Prajal H. Patel
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | | | - Sarah E. Arnold
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | | | - Emilie M. Scott
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Brendan J. Kelly
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Grace C. Mascha
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| | - Erich Bornberg-Bauer
- University of Münster, Münster, Germany
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Geoffrey D. Findlay
- College of the Holy Cross, Worcester, Massachusetts, United States of America
| |
|
47
|
Abstract
Because gene expression is important for evolutionary adaptation, its misregulation is an important cause of maladaptation. A misregulated gene can be incorrectly silent ("off") when a transcription factor (TF) that is required for its activation does not bind its regulatory region. Conversely, a misregulated gene can be incorrectly active ("on") when a TF not normally involved in its activation binds its regulatory region, a phenomenon also known as regulatory crosstalk. DNA mutations that destroy or create TF binding sites on DNA are an important source of misregulation and crosstalk. Although misregulation reduces fitness in an environment to which an organism is well-adapted, it may become adaptive in a new environment. Here, I derive simple yet general mathematical expressions that delimit the conditions under which misregulation can be adaptive. These expressions depend on the strength of selection against misregulation, on the fraction of DNA sequence space filled with TF binding sites, and on the fraction of genes that must be expressed for optimal adaptation. I then use empirical data from RNA sequencing, protein-binding microarrays, and genome evolution, together with population genetic simulations, to ask when these conditions are likely to be met. I show that they can be met under realistic circumstances, but these circumstances may vary among organisms and environments. My analysis provides a framework in which improved theory and data collection can help us demonstrate the role of misregulation in adaptation. It also shows that misregulation, like DNA mutation, is one of life's many imperfections that can help propel Darwinian evolution.
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, CH-8057, Switzerland; The Santa Fe Institute, Santa Fe, NM 87501, USA; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
48
|
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
| | - Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| |
|
49
|
Tan S, Ma H, Wang J, Wang M, Wang M, Yin H, Zhang Y, Zhang X, Shen J, Wang D, Banes GL, Zhang Z, Wu J, Huang X, Chen H, Ge S, Chen CL, Zhang YE. DNA transposons mediate duplications via transposition-independent and -dependent mechanisms in metazoans. Nat Commun 2021; 12:4280. [PMID: 34257290 PMCID: PMC8277862 DOI: 10.1038/s41467-021-24585-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/12/2020] [Accepted: 06/23/2021] [Indexed: 01/06/2023] Open
Abstract
Despite long being considered as "junk", transposable elements (TEs) are now accepted as catalysts of evolution. One example is Mutator-like elements (MULEs, one type of terminal inverted repeat DNA TEs, or TIR TEs) capturing sequences as Pack-MULEs in plants. However, their origination mechanism remains perplexing, and whether TIR TEs mediate duplication in animals is almost unexplored. Here we identify 370 Pack-TIRs in 100 animal reference genomes and one Pack-TIR (Ssk-FB4) family in fly populations. We find that single-copy Pack-TIRs are mostly generated via transposition-independent gap filling, and multicopy Pack-TIRs are likely generated by transposition after replication fork switching. We show that a proportion of Pack-TIRs are transcribed and often form chimeras with hosts. We also find that Ssk-FB4s represent a young protein family, as supported by proteomics and signatures of positive selection. Thus, TIR TEs catalyze new gene structures and new genes in animals via both transposition-independent and -dependent mechanisms.
Affiliation(s)
- Shengjun Tan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Huijing Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jinbo Wang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Man Wang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Center for Cancer Bioinformatics, Peking University Cancer Hospital & Institute, Beijing, China
| | - Mengxia Wang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Haodong Yin
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yaqiong Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Xinying Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jieyu Shen
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Danyang Wang
- University of Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, and China National Center for Bioinformation, Chinese Academy of Sciences, Beijing, China
| | - Graham L Banes
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, USA
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Zhihua Zhang
- University of Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, and China National Center for Bioinformation, Chinese Academy of Sciences, Beijing, China
| | - Jianmin Wu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Center for Cancer Bioinformatics, Peking University Cancer Hospital & Institute, Beijing, China
| | - Xun Huang
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
| | - Hua Chen
- University of Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, and China National Center for Bioinformation, Chinese Academy of Sciences, Beijing, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Siqin Ge
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Chun-Long Chen
- Curie Institute, PSL Research University, CNRS UMR 3244, Paris, France
- Sorbonne University, Paris, France
| | - Yong E Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Chinese Institute for Brain Research, Beijing, China
| |
|
50
|
Khan N, de Manuel M, Peyregne S, Do R, Prufer K, Marques-Bonet T, Varki N, Gagneux P, Varki A. Multiple Genomic Events Altering Hominin SIGLEC Biology and Innate Immunity Predated the Common Ancestor of Humans and Archaic Hominins. Genome Biol Evol 2020; 12:1040-1050. [PMID: 32556248 PMCID: PMC7379906 DOI: 10.1093/gbe/evaa125] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Accepted: 06/12/2020] [Indexed: 12/11/2022] Open
Abstract
Human-specific pseudogenization of the CMAH gene eliminated the mammalian sialic acid (Sia) Neu5Gc (generating an excess of its precursor Neu5Ac), thus changing ubiquitous cell surface “self-associated molecular patterns” that modulate innate immunity via engagement of CD33-related-Siglec receptors. The Alu-fusion-mediated loss-of-function of CMAH fixed ∼2–3 Ma, possibly contributing to the origins of the genus Homo. The mutation likely altered human self-associated molecular patterns, triggering multiple events, including emergence of human-adapted pathogens with strong preference for Neu5Ac recognition and/or presenting Neu5Ac-containing molecular mimics of human glycans, which can suppress immune responses via CD33-related-Siglec engagement. Human-specific alterations reported in some gene-encoding Sia-sensing proteins suggested a “hotspot” in hominin evolution. The availability of more hominid genomes including those of two extinct hominins now allows full reanalysis and evolutionary timing. Functional changes occur in 8/13 members of the human genomic cluster encoding CD33-related Siglecs, all predating the human common ancestor. Comparisons with great ape genomes indicate that these changes are unique to hominins. We found no evidence for strong selection after the Human–Neanderthal/Denisovan common ancestor, and these extinct hominin genomes include almost all major changes found in humans, indicating that these changes in hominin sialobiology predate the Neanderthal–human divergence ∼0.6 Ma. Multiple changes in this genomic cluster may also explain human-specific expression of CD33rSiglecs in unexpected locations such as amnion, placental trophoblast, pancreatic islets, ovarian fibroblasts, microglia, natural killer (NK) cells, and epithelia. Taken together, our data suggest that innate immune interactions with pathogens markedly altered hominin Siglec biology between 0.6 and 2 Ma, potentially affecting human evolution.
Affiliation(s)
- Naazneen Khan
- Glycobiology Research and Training Center, Department of Medicine, University of California San Diego
- Center for Academic Research and Training in Anthropogeny (CARTA), University of California San Diego
| | - Marc de Manuel
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
| | - Stephane Peyregne
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Raymond Do
- Glycobiology Research and Training Center, Department of Medicine, University of California San Diego
- Center for Academic Research and Training in Anthropogeny (CARTA), University of California San Diego
| | - Kay Prufer
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Barcelona, Spain
| | - Nissi Varki
- Glycobiology Research and Training Center, Department of Medicine, University of California San Diego
- Center for Academic Research and Training in Anthropogeny (CARTA), University of California San Diego
| | - Pascal Gagneux
- Glycobiology Research and Training Center, Department of Medicine, University of California San Diego
- Center for Academic Research and Training in Anthropogeny (CARTA), University of California San Diego
| | - Ajit Varki
- Glycobiology Research and Training Center, Department of Medicine, University of California San Diego
- Center for Academic Research and Training in Anthropogeny (CARTA), University of California San Diego
| |
|