Zhao K, Ji Z, Zhang L, Quan N, Li Y, Yu G, Bi X. HPOseq: a deep ensemble model for predicting the protein-phenotype relationships based on protein sequences.
BMC Bioinformatics 2025;
26:110. [PMID:
40263997 PMCID:
PMC12013097 DOI:
10.1186/s12859-025-06122-3]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Accepted: 03/27/2025] [Indexed: 04/24/2025] Open
Abstract
BACKGROUND
Understanding the relationships between proteins and specific disease phenotypes contributes to the early detection of diseases and advances the development of personalized medicine. The acquisition of a large amount of proteomics data has facilitated this process. To improve discovery efficiency and reduce the time and financial costs associated with biological experiments, various computational methods have yielded promising results. However, the lack of rich and reliable protein-related information still presents challenges in this process.
RESULTS
In this paper, we propose an ensemble prediction model, named HPOseq, which predicts human protein-phenotype relationships based only on sequence information. HPOseq establishes two base models to achieve objectives. One directly extracts internal information from amino acid sequences as protein features to predict the associated phenotypes. The other builds a protein-protein network based on sequence similarity, extracting information between proteins for phenotype prediction. Ultimately, an ensemble module is employed to integrate the predictions from both base models, resulting in the final prediction.
CONCLUSION
The results of 5-fold cross-validation reveal that HPOseq outperforms seven baseline methods for predicting protein-phenotype relationships. Moreover, we conduct case studies from the points of phenotype annotation and protein analysis to verify the practical significance of HPOseq.
Collapse