1
Allard JB, Sharma S, Patel R, Sanderford M, Tamura K, Vucetic S, Gerhard GS, Kumar S. Evolutionary sparse learning reveals the shared genetic basis of convergent traits. Nat Commun 2025; 16:3217. [PMID: 40185716 PMCID: PMC11971283 DOI: 10.1038/s41467-025-58428-8] [Received: 03/30/2023] [Accepted: 03/18/2025] [Indexed: 04/07/2025]
Abstract
Cases abound in which nearly identical traits have appeared in distant species facing similar environments. These unmistakable examples of adaptive evolution offer opportunities to gain insight into their genetic origins and mechanisms through comparative analyses. Here, we present an approach to build genetic models that underlie the independent origins of convergent traits using evolutionary sparse learning with paired species contrast (ESL-PSC). We tested the hypothesis that common genes and sites are involved in the convergent evolution of two key traits: C4 photosynthesis in grasses and echolocation in mammals. Genetic models were highly predictive of independent cases of convergent evolution of C4 photosynthesis. Genes contributing to genetic models for echolocation were highly enriched for functional categories related to hearing, sound perception, and deafness, a pattern that has eluded previous efforts applying standard molecular evolutionary approaches. These results support the involvement of sequence substitutions at common genetic loci in the evolution of convergent traits. Benchmarking on empirical and simulated datasets showed that ESL-PSC could be more sensitive in proteome-scale analyses to detect genes with convergent molecular evolution associated with the acquisition of convergent traits. We conclude that phylogeny-informed machine learning naturally excludes apparent molecular convergences due to shared species history, enhances the signal-to-noise ratio for detecting molecular convergence, and empowers the discovery of common genetic bases of trait convergences.
Affiliation(s)
- John B Allard
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Koichiro Tamura
  - Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
  - Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Tokyo, Japan
- Slobodan Vucetic
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Glenn S Gerhard
  - Lewis Katz School of Medicine at Temple University, Philadelphia, PA, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
2
Allard JB, Sharma S, Patel R, Sanderford M, Tamura K, Vucetic S, Gerhard GS, Kumar S. Evolutionary sparse learning with paired species contrast reveals the shared genetic basis of convergent traits. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.08.631987. [PMID: 39829798 PMCID: PMC11741315 DOI: 10.1101/2025.01.08.631987] [Indexed: 01/22/2025]
Abstract
Cases abound in which nearly identical traits have appeared in distant species facing similar environments. These unmistakable examples of adaptive evolution offer opportunities to gain insight into their genetic origins and mechanisms through comparative analyses. Here, we present a novel comparative genomics approach to build genetic models that underlie the independent origins of convergent traits using evolutionary sparse learning. We test the hypothesis that common genes and sites are involved in the convergent evolution of two key traits: C4 photosynthesis in grasses and echolocation in mammals. Genetic models were highly predictive of independent cases of convergent evolution of C4 photosynthesis. These results support the involvement of sequence substitutions in many common genetic loci in the evolution of convergent traits studied. Genes contributing to genetic models for echolocation were highly enriched for functional categories related to hearing, sound perception, and deafness (P < 10⁻⁶), a pattern that has eluded previous efforts applying standard molecular evolutionary approaches. We conclude that phylogeny-informed machine learning naturally excludes apparent molecular convergences due to shared species history, enhances the signal-to-noise ratio for detecting molecular convergence, and empowers the discovery of common genetic bases of trait convergences.
Affiliation(s)
- John B. Allard
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Koichiro Tamura
  - Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
  - Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Tokyo, Japan
- Slobodan Vucetic
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Glenn S. Gerhard
  - Lewis Katz School of Medicine at Temple University, Philadelphia, PA 19140, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
3
Yogadasan N, Doxey AC, Chuong SDX. A Machine Learning Framework Identifies Plastid-Encoded Proteins Harboring C3 and C4 Distinguishing Sequence Information. Genome Biol Evol 2023; 15:evad129. [PMID: 37462292 PMCID: PMC10368328 DOI: 10.1093/gbe/evad129] [Accepted: 07/12/2023] [Indexed: 07/27/2023]
Abstract
C4 photosynthesis is known to have at least 61 independent origins across plant lineages, making it one of the most notable examples of convergent evolution. Of the >60 independent origins, a predicted 22-24 origins, encompassing greater than 50% of all known C4 species, exist within the Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae (PACMAD) clade of the Poaceae family. This clade is therefore primed with species ideal for the study of genomic changes associated with the acquisition of the C4 photosynthetic trait. In this study, we take advantage of the growing availability of sequenced plastid genomes and employ a machine learning (ML) approach to screen for plastid genes harboring C3 and C4 distinguishing information in PACMAD species. We demonstrate that certain plastid-encoded protein sequences possess distinguishing and informative sequence information that allows them to train accurate ML C3/C4 classification models. Our RbcL-trained model, for example, informs a C3/C4 classifier with greater than 99% accuracy. Accurate prediction of photosynthetic type from individual sequences suggests biologically relevant and potentially differing roles of these sequence products in C3 versus C4 metabolism. With this ML framework, we have identified several key sequences and sites that are most predictive of C3/C4 status, including RbcL, subunits of the NAD(P)H dehydrogenase complex, and specific residues within, further highlighting their potential significance in the evolution and/or maintenance of C4 photosynthetic machinery. This general approach can be applied to uncover intricate associations between other similar genotype-phenotype relationships.
Affiliation(s)
- Andrew C Doxey
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Simon D X Chuong
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada