1
Yan W, Yu F, Tan L, Mengshan L, Xiaojun X, Weihong Z, Sheng S, Jun W, Fu-An W. A hybrid machine learning model with attention mechanism and multidimensional multivariate feature coding for essential gene prediction. BMC Biol 2025; 23:108. [PMID: 40275343 PMCID: PMC12023577 DOI: 10.1186/s12915-025-02209-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/07/2025] [Indexed: 04/26/2025]
Abstract
BACKGROUND Essential genes are crucial for the development, inheritance, and survival of species. Studying these genes can reveal complex mechanisms and fundamental life processes and identify potential therapeutic targets for various diseases, which makes their identification important. Machine learning has become the mainstream approach for essential gene prediction. However, several key challenges remain, including the extraction of genetic features, the impact of imbalanced data, and cross-species generalization. RESULTS Here, we propose a hybrid machine learning model for essential gene prediction, called EGP Hybrid-ML, which combines graph convolutional neural networks (GCN) and bidirectional long short-term memory (Bi-LSTM) with an attention mechanism and multidimensional multivariate feature coding. In the model, the GCN extracts feature encoding information from visualized graph representations of the gene sequences, and the attention mechanism, combined with the Bi-LSTM, assesses the importance of each feature in the gene sequences. We also analyzed the influence of different feature encoding methods and of data imbalance, and evaluated the cross-species predictive performance of the model through cross-validation. The results indicated that the sensitivity of the EGP Hybrid-ML model reached 0.9122. CONCLUSIONS The model demonstrated superior predictive performance and strong generalization compared with other models. The EGP Hybrid-ML model proposed in this paper has broad application prospects in bioinformatics, cheminformatics, and pharmaceutical informatics. The code, architectures, parameters, and datasets of the proposed model are freely available on GitHub ( https://github.com/gnnumsli/EGP-Hybrid-ML ).
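The abstract describes the architecture only at a high level. As a rough illustration of two ideas it names, the Python sketch below shows generic dot-product attention pooling over per-position hidden states (the kind of sequence output a Bi-LSTM produces) and the sensitivity metric used to report performance. This is not the EGP Hybrid-ML implementation: the function names `attention_pool` and `sensitivity`, the single scoring vector `query`, and the toy inputs are hypothetical assumptions for illustration only; the authors' actual code is in the GitHub repository listed in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical dot-product attention pooling (a generic sketch, not the paper's layer).

    hidden: T vectors, e.g. concatenated forward/backward Bi-LSTM outputs per position.
    query:  a scoring vector that would normally be learned; fixed here for illustration.
    Returns (weights, context): weights sum to 1 and context is the weighted sum of hidden.
    """
    # One relevance score per position: dot product of its hidden state with the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Context vector: attention-weighted sum of the hidden states, dimension by dimension.
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity (recall on the positive, i.e. essential, class) = TP / (TP + FN).

    Raises ZeroDivisionError if y_true contains no positive labels.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


# Toy usage with hypothetical inputs.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(hidden, query=[2.0, 0.0])
print(round(sensitivity([1, 1, 1, 0], [1, 0, 1, 0]), 4))  # prints 0.6667
```

Sensitivity is a natural headline metric when essential genes are the minority class, because it measures how many true essential genes the model recovers regardless of how many non-essential genes dominate the data.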
Affiliation(s)
- Wu Yan
- Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Fu Yu
- Ganzhou Power Supply Branch of State Grid, Jiangxi Electric Power Co., Ltd, Ganzhou, Jiangxi, 341000, China
- Li Tan
- Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Li Mengshan
- Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Ganzhou Power Supply Branch of State Grid, Jiangxi Electric Power Co., Ltd, Ganzhou, Jiangxi, 341000, China
- Xie Xiaojun
- Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Zhou Weihong
- Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Sheng Sheng
- Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Wang Jun
- Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Wu Fu-An
- Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
2
Tuia D, Kellenberger B, Beery S, Costelloe BR, Zuffi S, Risse B, Mathis A, Mathis MW, van Langevelde F, Burghardt T, Kays R, Klinck H, Wikelski M, Couzin ID, van Horn G, Crofoot MC, Stewart CV, Berger-Wolf T. Perspectives in machine learning for wildlife conservation. Nat Commun 2022; 13:792. [PMID: 35140206 PMCID: PMC8828720 DOI: 10.1038/s41467-022-27980-y] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 05/25/2021] [Accepted: 12/08/2021] [Indexed: 11/08/2022]
Abstract
Inexpensive and accessible sensors are accelerating data acquisition in animal ecology. These technologies hold great potential for large-scale ecological understanding, but are limited by current processing approaches which inefficiently distill data into relevant information. We argue that animal ecologists can capitalize on large datasets generated by modern sensors by combining machine learning approaches with domain knowledge. Incorporating machine learning into ecological workflows could improve inputs for ecological models and lead to integrated hybrid modeling tools. This approach will require close interdisciplinary collaboration to ensure the quality of novel approaches and train a new generation of data scientists in ecology and conservation.
Affiliation(s)
- Devis Tuia
- School of Architecture, Civil and Environmental Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Benjamin Kellenberger
- School of Architecture, Civil and Environmental Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sara Beery
- Department of Computing and Mathematical Sciences, California Institute of Technology (Caltech), Pasadena, CA, USA
- Blair R Costelloe
- Max Planck Institute of Animal Behavior, Radolfzell, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz, Germany
- Department of Biology, University of Konstanz, Konstanz, Germany
- Silvia Zuffi
- Institute for Applied Mathematics and Information Technologies, IMATI-CNR, Pavia, Italy
- Benjamin Risse
- Computer Science Department, University of Münster, Münster, Germany
- Alexander Mathis
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Mackenzie W Mathis
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Tilo Burghardt
- Computer Science Department, University of Bristol, Bristol, UK
- Roland Kays
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, USA
- North Carolina Museum of Natural Sciences, Raleigh, NC, USA
- Holger Klinck
- Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
- Martin Wikelski
- Max Planck Institute of Animal Behavior, Radolfzell, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz, Germany
- Iain D Couzin
- Max Planck Institute of Animal Behavior, Radolfzell, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz, Germany
- Department of Biology, University of Konstanz, Konstanz, Germany
- Grant van Horn
- Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
- Margaret C Crofoot
- Max Planck Institute of Animal Behavior, Radolfzell, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz, Germany
- Department of Biology, University of Konstanz, Konstanz, Germany
- Charles V Stewart
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
- Tanya Berger-Wolf
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
- Departments of Computer Science and Engineering; Electrical and Computer Engineering; Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
3
Biogeography of Long-Jawed Spiders Reveals Multiple Colonization of the Caribbean. DIVERSITY 2021. [DOI: 10.3390/d13120622] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Abstract
Dispersal ability can affect levels of gene flow, thereby shaping species distributions and richness patterns. The intermediate dispersal model of biogeography (IDM) predicts that, in island systems, species diversity is highest in lineages with intermediate dispersal potential. Here, we tested this prediction on long-jawed spiders (Tetragnatha) of the Caribbean archipelago using phylogenies of a total of 318 individuals delineated into 54 putative species. Our results support the monophyly of Tetragnatha (within our sampling) but reject the monophyly of the Caribbean lineages, in which we found low endemism yet high diversity. The reconstructed biogeographic history detects a potential early overwater colonization of the Caribbean, refuting both an ancient vicariant origin of Caribbean Tetragnatha and the GAARlandia land-bridge scenario. Instead, the results imply multiple colonization events to and from the Caribbean from the mid-Eocene to the late Miocene. Among arachnids, Tetragnatha is unique in comprising both excellent and poor dispersers. A direct test of the IDM would require assigning taxa to three categories of dispersers; however, long-jawed spiders do not fit any one of these three a priori definitions but instead represent a more complex combination of attributes. A taxon such as Tetragnatha, which readily undergoes evolutionary changes in dispersal propensity, can be called a ‘dynamic disperser’.
4
Turk E, Kralj-Fišer S, Kuntner M. Exploring diversification drivers in golden orbweavers. Sci Rep 2021; 11:9248. [PMID: 33927261 PMCID: PMC8084975 DOI: 10.1038/s41598-021-88555-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Accepted: 04/14/2021] [Indexed: 11/08/2022]
Abstract
Heterogeneity in species diversity is driven by the dynamics of speciation and extinction, potentially influenced by organismal and environmental factors. Here, we explore macroevolutionary trends on a phylogeny of golden orbweavers (spider family Nephilidae). Our initial inference detects heterogeneity in speciation and extinction, with accelerated extinction rates in the extremely sexually size-dimorphic Nephila and accelerated speciation in Herennia, a lineage defined by highly derived arboricolous webs and pronounced island endemism. We evaluate potential drivers of this heterogeneity related to the organisms and their environment. Primarily, we test two continuous organismal factors for correlation with diversification in nephilids: phenotypic extremeness (female and male body length, and sexual size dimorphism as their ratio) and dispersal propensity (using range size as a proxy). We predict a bell-shaped relationship between factor values and speciation, with intermediate phenotypes exhibiting the highest diversification rates. Analyses using SSE-class models fail to support either prediction, suggesting that phenotypic extremeness and dispersal propensity cannot explain patterns of nephilid diversification. Furthermore, two environmental factors (tropical versus subtropical and island versus continental distribution) provide only marginal support for higher speciation in the tropics. Although our results may be affected by methodological limitations imposed by a relatively small phylogeny, the tested organismal and environmental factors appear to play little to no role in nephilid diversification. For golden orbweavers, the recently proposed hypothesis of universal diversification dynamics may be the simplest explanation of the macroevolutionary patterns.
Affiliation(s)
- Eva Turk
- Evolutionary Zoology Laboratory, Institute of Biology, ZRC SAZU, Ljubljana, Slovenia
- Biotechnical Faculty, Department of Biology, University of Ljubljana, Ljubljana, Slovenia
- Simona Kralj-Fišer
- Evolutionary Zoology Laboratory, Institute of Biology, ZRC SAZU, Ljubljana, Slovenia
- Matjaž Kuntner
- Evolutionary Zoology Laboratory, Institute of Biology, ZRC SAZU, Ljubljana, Slovenia
- Evolutionary Zoology Laboratory, Department of Organisms and Ecosystems Research, National Institute of Biology, Ljubljana, Slovenia
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Centre for Behavioural Ecology and Evolution, School of Life Sciences, Hubei University, Wuhan, Hubei, China
- University of Ljubljana, Ljubljana, Slovenia