1
Zhu T, Zhou Y, Zhang L, Kong L, Tang H, Xiao Q, Sun X, Shen F, Zhou H, Ni W, Liu S, Gao H, Jin G, Jia X, Hua F. A transcriptomic and proteomic analysis and comparison of human brain tissue from patients with and without epilepsy. Sci Rep 2025; 15:16369. [PMID: 40350490 PMCID: PMC12066717 DOI: 10.1038/s41598-025-00986-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Accepted: 05/02/2025] [Indexed: 05/14/2025] Open
Abstract
This study aimed to identify potential biomarkers and therapeutic targets for epilepsy through a transcriptomic and proteomic analysis of human brain tissue from patients with epileptic lesions. Brain tissue was collected from epileptic lesions after surgical resection, and surgically removed brain tissue from non-epileptic patients was collected for comparison. The samples were analyzed using RNA sequencing and iTRAQ-based proteomics. The transcriptomic analysis identified 1,604 differentially expressed genes (DEGs), with 584 upregulated and 1,020 downregulated. The proteomic analysis identified 694 differentially expressed proteins (DEPs), with 331 upregulated and 363 downregulated. The combined transcriptomic and proteomic analysis showed that the DEGs and DEPs were mainly enriched in biological processes such as D-aspartate transport, transmembrane transport, cell junctions, vesicle transport, and metabolic processes. Tubulin polymerization promoting protein family member-3 (TPPP3), proprotein convertase subtilisin/kexin type-1 (PCSK1), and dihydropyrimidinase-like 3 (DPYSL3) were significantly altered in the epilepsy patients, and their expression trends were confirmed by RT-qPCR, Western blot (WB), and immunohistochemical (IHC) staining. By integrating transcriptomic and proteomic analyses, we identified genes and proteins that are differentially expressed between epileptic and non-epileptic patients, together with their associated biological processes. Three key DEPs (TPPP3, PCSK1, and DPYSL3) were identified, indicating their potential significance in the pathological mechanisms of epilepsy.
Affiliation(s)
- Taiyang Zhu
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Yan Zhou
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Lei Zhang
  - Department of Neurosurgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Lingwen Kong
  - Department of Neurosurgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Hai Tang
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Qihua Xiao
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Xiaoyu Sun
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
  - Department of Rehabilitation Medicine, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Fanyu Shen
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Hui Zhou
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Wanyan Ni
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
  - Department of Neurology, West China Hospital, Sichuan University, Guo Xue Lane 37, Chengdu, 610041, Sichuan, PR China
- Sha Liu
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Huimin Gao
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Guoliang Jin
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Xiao Jia
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China.
- Fang Hua
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China.
  - Department of Interdisciplinary Health Science, College of Allied Health Science, Augusta University, Augusta, 30912, USA.
2
Li R, Yu J, Ye D, Liu S, Zhang H, Lin H, Feng J, Deng K. Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics. Toxins (Basel) 2025; 17:78. [PMID: 39998095 PMCID: PMC11860864 DOI: 10.3390/toxins17020078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2024] [Revised: 01/25/2025] [Accepted: 02/07/2025] [Indexed: 02/26/2025] Open
Abstract
Conotoxins, a diverse family of disulfide-rich peptides derived from the venom of Conus species, have gained prominence in biomedical research due to their highly specific interactions with ion channels, receptors, and neurotransmitter systems. Their pharmacological properties make them valuable molecular tools and promising candidates for therapeutic development. However, traditional conotoxin classification and functional characterization remain labor-intensive, necessitating the increasing adoption of computational approaches. In particular, machine learning (ML) techniques have facilitated advancements in sequence-based classification, functional prediction, and de novo peptide design. This review explores recent progress in applying ML and deep learning (DL) to conotoxin research, comparing key databases, feature extraction techniques, and classification models. Additionally, we discuss future research directions, emphasizing the integration of multimodal data and the refinement of predictive frameworks to enhance therapeutic discovery.
Affiliation(s)
- Kejun Deng
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (R.L.); (J.Y.); (D.Y.); (S.L.); (H.Z.); (H.L.); (J.F.)
3
Wu CY, Xu ZX, Li N, Qi DY, Wu HY, Ding H, Jin YT. Predicting cyclins based on key features and machine learning methods. Methods 2025; 234:112-119. [PMID: 39694304 DOI: 10.1016/j.ymeth.2024.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 12/12/2024] [Accepted: 12/15/2024] [Indexed: 12/20/2024] Open
Abstract
Cyclins are a group of proteins that regulate the cell cycle by modulating various stages of cell division to ensure correct cell proliferation, differentiation, and apoptosis. Research on cyclins is crucial for understanding the biological functions and pathological states of cells. However, current machine learning research on cyclin identification focuses only on accuracy and ignores the interpretability of features. Therefore, in this study, we pay more attention to the interpretation and analysis of key features associated with cyclins. First, we developed an SVM-based model for identifying cyclins that achieved an accuracy of 92.8% in 5-fold cross-validation. We then analyzed the physicochemical properties of the 14 key features used in model construction and identified the G and charged C1 features as critical for distinguishing cyclins from non-cyclins. Furthermore, we constructed an SVM-based model using only these two features, which achieved an accuracy of 81.3% in leave-one-out cross-validation. Our study shows that cyclins differ from non-cyclins in their physicochemical properties and that using only two features can achieve good prediction accuracy.
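The abstract reports accuracy under two evaluation schemes, 5-fold and leave-one-out cross-validation. As a rough illustration of how the two schemes partition a dataset, here is a minimal standard-library sketch. The function names and the `predict_fold` callback are hypothetical, the splits are unshuffled and unstratified, and nothing here reproduces the authors' SVM or feature set.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for unshuffled k-fold cross-validation."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold_indices(n_samples, n_samples)


def cross_validated_accuracy(labels, predict_fold, splits):
    """Fraction of held-out samples predicted correctly over all splits.

    predict_fold(train_idx, test_idx) is a hypothetical callback that trains
    on train_idx and returns one predicted label per index in test_idx.
    """
    correct = 0
    total = 0
    for train, test in splits:
        predictions = predict_fold(train, test)
        correct += sum(p == labels[i] for p, i in zip(predictions, test))
        total += len(test)
    return correct / total
```

In practice a classifier such as an SVM would be fitted inside `predict_fold`, and the samples would usually be shuffled or stratified by class before splitting.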
Affiliation(s)
- Cheng-Yan Wu
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teachers College, Baotou 014010, China.
- Zhi-Xue Xu
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teachers College, Baotou 014010, China.
- Nan Li
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teachers College, Baotou 014010, China.
- Dan-Yang Qi
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teachers College, Baotou 014010, China.
- Hong-Ye Wu
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teachers College, Baotou 014010, China.
- Hui Ding
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Yan-Ting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
4
Huang Y, Qiu H, Chen Q, Meng Z, Qiao D, Yue X. Exploring Potential Diagnostic Biomarkers for Mechanical Asphyxia in the Heart Based on Proteomics Technology. Int J Mol Sci 2024; 25:12710. [PMID: 39684422 DOI: 10.3390/ijms252312710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2024] [Revised: 11/19/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024] Open
Abstract
Mechanical asphyxia presents a challenging diagnostic issue in forensic medicine because it is often covert and the signs visible at autopsy are usually not specific. Despite some progress in understanding the effects of hypoxia, the inherent limitations of traditional methods mean that new biomarkers of mechanical asphyxia may be overlooked. This study employed 4D-DIA proteomics to explore the protein expression profiles of cardiac samples under conditions of mechanical asphyxia. Proteomic analysis identified 271 and 371 differentially expressed proteins in the strangulation and suffocation groups, respectively, compared to the control group. Seventy-eight differentially expressed proteins were identified across different mechanical asphyxia groups compared to the control group. GO and KEGG analysis showed enrichment in pathways including complement and coagulation cascades, cAMP and cGMP-PKG signaling pathways, inflammatory mediator regulation of TRP channels, and phagosomes. Through stringent selection based on protein interactions, ALKBH5, NAA10, and CLPB were identified as potential diagnostic biomarkers. ALKBH5 showed increased expression in asphyxia models, while NAA10 and CLPB were downregulated; these biomarker changes were validated in both animal models and human cardiac samples. This study highlights the potential of proteomics for discovering reliable biomarkers, which can enhance the specificity of mechanical asphyxia diagnosis in forensic practice and provide new insights into the pathophysiological mechanisms of mechanical asphyxia.
Affiliation(s)
- Yuebing Huang
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Hai Qiu
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Qianling Chen
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Zilin Meng
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Dongfang Qiao
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Xia Yue
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
5
Yang S, Xing J, Liu D, Song Y, Yu H, Xu S, Zuo Y. Review and new insights into the catalytic structural domains of the Fe(ll) and 2-Oxoglutarate families. Int J Biol Macromol 2024; 278:134798. [PMID: 39153678 DOI: 10.1016/j.ijbiomac.2024.134798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
Histone lysine demethylase (KDM), AlkB homolog (ALKBH), and Ten-Eleven Translocation (TET) proteins are members of the 2-oxoglutarate (2OG) and ferrous iron-dependent oxygenases, each of which harbors a catalytic domain centered on a double-stranded β-helix whose topology restricts the regions directly involved in substrate binding. However, they have different catalytic functions, and the underlying structural reasons are not yet clear. In this review, the catalytic domain features of the three protein families are summarized from both sequence and structural perspectives. Phylogenetic tree construction and structural comparison reveal ten relatively conserved β-sheets and three key regions with substantial structural differences. We summarize the relationship between these three key regions and the substrate compatibility of the three protein families. This review facilitates research into substrate-selective inhibition and bioengineering by providing new insights into the catalytic domains of KDM, ALKBH, and TET proteins.
Affiliation(s)
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Jixiang Xing
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Dongyang Liu
  - Key Laboratory of Photobiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing, China
- Yancheng Song
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Haoyu Yu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Shuhua Xu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China; State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China; Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China.
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China.
6
Liang Y, Guo Y, Zhai Y, Zhou J, Yang W, Zuo Y. Disease trend analysis platform accurately predicts the occurrence of cervical cancer under mixed diseases. Methods 2024; 230:108-115. [PMID: 39111721 DOI: 10.1016/j.ymeth.2024.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 07/26/2024] [Accepted: 07/29/2024] [Indexed: 08/17/2024] Open
Abstract
Cervical cancer (CC) is one of the most common gynecological malignancies. Cytological screening, while the most common and accurate method for detecting cervical cancer, is both time-consuming and costly. Bioinformatics-based prediction of CC can assist rapid early screening in clinical practice. Most recent CC prediction methods require a large amount of detection or sequencing data and are not ideal for CC detection in complex disease samples. We developed the Disease trend analysis platform (Dtap), which can quickly predict the occurrence of diseases using only routine blood test data. Routine blood test data were collected from 1,292 cervical cancer patients, 4,860 patients with complex diseases, and 4,980 healthy individuals from various sources. The results show that the Dtap-based trend model maintained good and stable performance in prediction tasks across multiple datasets as well as in complex disease samples. Finally, we built DTAPCC (http://bioinfor.imu.edu.cn/dtapcc), a Dtap-based CC prediction platform, to help users quickly predict CC and visualize trend features.
Affiliation(s)
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, PR China
- Yuting Guo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China
- Yifei Zhai
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China
- Wuritu Yang
  - Computer Department, Hohhot Vocational College, Hohhot 010020, PR China.
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, PR China; Computer Department, Hohhot Vocational College, Hohhot 010020, PR China.
7
Suárez T, Montaño DF, Suárez R. Construction of amino acids reduced alphabets from molecular descriptors for interpretation of N-carbamylase, luciferase and PI3K mutations. Biosystems 2024; 246:105331. [PMID: 39260761 DOI: 10.1016/j.biosystems.2024.105331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Revised: 09/04/2024] [Accepted: 09/08/2024] [Indexed: 09/13/2024]
Abstract
The classification of amino acids has proven to be a useful tool for understanding the importance of sequence in protein function. Reduced amino acid alphabets are an example of such classifications. When built from physicochemical, structural, and quantum characteristics of the amino acids, they simplify the representation of sequences and are useful in the modelling, design, and understanding of proteins. An objective selection of amino acid properties is therefore important, because the classes formed in a reduced alphabet depend on the descriptors used for classification. In this research, based on a careful selection of descriptors for the 20 amino acids, and using techniques such as the information content index and hierarchical cluster analysis with ties in proximity, 20,871,586 reduced amino acid alphabets were constructed. This large collection of reduced alphabets was used to interpret alterations in the function of three proteins (N-carbamylase, luciferase, and PI3K) caused by amino acid changes in their sequences. For this, the similar and different descriptors linked to these mutations were studied. Properties such as volume, hydrophobicity, charge, and autocorrelation can be associated with variations in the behaviour of these proteins, while the frequency in specific secondary structures, the Gibbs free energy, and some topological and quantum properties can be considered the factors that prevent the deactivation of protein function. This work offers the most complete collection of reduced alphabets, which promises to be a useful tool for interpreting alterations caused by amino acid mutations in protein sequences.
Affiliation(s)
- Tatiana Suárez
  - CHIMA Grupo de Química Matemática, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia
- Diego F Montaño
  - Departamento de Química, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia
- Rosana Suárez
  - CHIMA Grupo de Química Matemática, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia
8
Yang S, Liu D, Song Y, Liang Y, Yu H, Zuo Y. Designing a structure-function alphabet of helix based on reduced amino acid clusters. Arch Biochem Biophys 2024; 754:109942. [PMID: 38387828 DOI: 10.1016/j.abb.2024.109942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 02/16/2024] [Accepted: 02/19/2024] [Indexed: 02/24/2024]
Abstract
Several simple secondary structures can form complex and diverse functional proteins, which suggests that secondary structures contain considerable hidden information and are arranged according to certain principles that carry the information needed for functional specificity and diversity. However, this hidden information and these principles have not been understood systematically. In our study, we designed a structure-function alphabet of helices based on reduced amino acid clusters to describe the typical features of helices and explore this information. First, we selected 480 typical helices from membrane proteins, zymoproteins, transcription factors, and other proteins to define and calculate the interval ranges, and classified the helices by hydrophilicity, charge, and length: (1) hydrophobic helix (≤43%), amphiphilic helix (43% to 71%), and hydrophilic helix (≥71%); (2) positive helix, negative helix, electrically neutral helix, and uncharged helix; (3) short helix (≤8 aa), medium-length helix (9-28 aa), and long helix (≥29 aa). Then, we designed an alphabet of 36 triplet codes based on this classification, so that the main features of each helix can be represented by only three letters. This alphabet not only provides a preliminary definition of helix characteristics but also greatly reduces the informational dimension of protein structure. Finally, we present an application example to demonstrate the value of the structure-function alphabet in determining and differentiating protein function.
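The three-way classification described in the abstract (3 hydrophilicity classes, 4 charge classes, 3 length classes, hence 3 x 4 x 3 = 36 combinations) can be sketched as follows. The thresholds are those quoted in the abstract; everything else is an assumption made for illustration: the residue sets, the reading of the percentage as the share of hydrophilic residues, the rule separating "neutral" (equal numbers of positive and negative residues) from "uncharged" (no charged residues), and the use of words rather than the paper's actual three-letter codes.

```python
from itertools import product

# Assumed residue sets; the abstract does not specify them.
HYDROPHILIC = set("RKDENQHSTY")
POSITIVE = set("KR")
NEGATIVE = set("DE")

HYDRO_CLASSES = ("hydrophobic", "amphiphilic", "hydrophilic")
CHARGE_CLASSES = ("positive", "negative", "neutral", "uncharged")
LENGTH_CLASSES = ("short", "medium", "long")


def hydrophilicity_class(seq):
    """Classify by percentage of hydrophilic residues (<=43, 43 to 71, >=71)."""
    pct = 100.0 * sum(res in HYDROPHILIC for res in seq) / len(seq)
    if pct <= 43:
        return "hydrophobic"
    if pct >= 71:
        return "hydrophilic"
    return "amphiphilic"


def charge_class(seq):
    """Classify by the balance of positively and negatively charged residues."""
    pos = sum(res in POSITIVE for res in seq)
    neg = sum(res in NEGATIVE for res in seq)
    if pos == 0 and neg == 0:
        return "uncharged"
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def length_class(seq):
    """Classify by helix length in residues (<=8, 9-28, >=29)."""
    if len(seq) <= 8:
        return "short"
    if len(seq) <= 28:
        return "medium"
    return "long"


def helix_triplet(seq):
    """Return the (hydrophilicity, charge, length) triplet for one helix."""
    seq = seq.upper()
    if not seq:
        raise ValueError("helix sequence must not be empty")
    return hydrophilicity_class(seq), charge_class(seq), length_class(seq)


# All combinations of the three class lists, matching the 36 codes in the abstract.
ALL_TRIPLETS = list(product(HYDRO_CLASSES, CHARGE_CLASSES, LENGTH_CLASSES))
```

For example, `helix_triplet("LLLLAAAA")` evaluates to a hydrophobic, uncharged, short helix under these assumed residue sets.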
Affiliation(s)
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Dongyang Liu
  - Key Laboratory of Photobiology, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China; University of Chinese Academy of Sciences, Beijing, China
- Yancheng Song
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Haoyu Yu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China.
9
Hauswedell H, Hetzel S, Gottlieb SG, Kretzmer H, Meissner A, Reinert K. Lambda3: homology search for protein, nucleotide, and bisulfite-converted sequences. Bioinformatics 2024; 40:btae097. [PMID: 38485699 PMCID: PMC10955267 DOI: 10.1093/bioinformatics/btae097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 12/22/2023] [Accepted: 03/13/2024] [Indexed: 03/22/2024] Open
Abstract
MOTIVATION: Local alignments of query sequences in large databases represent a core part of metagenomic studies and facilitate homology search. Following the development of NCBI Blast, many applications aimed to provide faster and equally sensitive local alignment frameworks. Most applications focus on protein alignments, while only a few also facilitate DNA-based searches. None of the established programs allow searching DNA sequences from bisulfite sequencing experiments commonly used for DNA methylation profiling, for which specific alignment strategies need to be implemented. RESULTS: Here, we introduce Lambda3, a new version of the local alignment application Lambda. Lambda3 is the first solution that enables the search of protein, nucleotide, and bisulfite-converted nucleotide query sequences. Its protein mode achieves performance comparable to that of the highly optimized protein alignment application Diamond, while the nucleotide mode consistently outperforms established local nucleotide aligners. Combined, Lambda3 presents a universal local alignment framework that enables fast and sensitive homology searches for a wide range of use cases. AVAILABILITY AND IMPLEMENTATION: Lambda3 is free and open-source software publicly available at https://github.com/seqan/lambda/.
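To illustrate why bisulfite-converted reads need a dedicated alignment strategy, the sketch below simulates the conversion (unmethylated cytosines read as thymine) and shows a comparison that tolerates a reference C paired with a read T. This is a generic illustration of the problem on the forward strand only; it is not Lambda3's algorithm, and both function names are hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite treatment on the forward strand.

    Unmethylated C becomes T; positions listed in `methylated` (0-based)
    are protected and stay C.
    """
    return "".join(
        "T" if base == "C" and pos not in methylated else base
        for pos, base in enumerate(seq.upper())
    )


def matches_with_conversion(reference, read):
    """Ungapped comparison that accepts a read T where the reference has C."""
    if len(reference) != len(read):
        return False
    return all(
        ref == obs or (ref == "C" and obs == "T")
        for ref, obs in zip(reference.upper(), read.upper())
    )
```

An exact string comparison between a reference and its converted read fails wherever an unmethylated C was present, whereas the conversion-aware comparison still matches. Real aligners also have to handle the reverse strand (G to A) and gapped local alignment.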
Affiliation(s)
- Sara Hetzel
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Simon G Gottlieb
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
  - Institute for Bio- and Geosciences, Forschungszentrum Jülich GmbH, Jülich 52428, Germany
- Helene Kretzmer
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Alexander Meissner
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
  - Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Berlin 14195, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
  - Efficient Algorithms for Omics Data Group, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
10
Ieremie I, Ewing RM, Niranjan M. Protein language models meet reduced amino acid alphabets. Bioinformatics 2024; 40:btae061. [PMID: 38310333 PMCID: PMC10872054 DOI: 10.1093/bioinformatics/btae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 12/14/2023] [Accepted: 01/30/2024] [Indexed: 02/05/2024] Open
Abstract
MOTIVATION: Protein language models (PLMs), which borrow ideas for modelling and inference from natural language processing, have demonstrated the ability to extract meaningful representations in an unsupervised way. This has led to significant performance improvements in several downstream tasks. Clustering amino acids based on their physical-chemical properties to obtain reduced alphabets has been of interest in past research, but the application of reduced alphabets to PLMs or folding models is unexplored. RESULTS: Here, we investigate the efficacy of PLMs trained on reduced amino acid alphabets in capturing evolutionary information, and we explore how the loss of protein sequence information impacts learned representations and downstream task performance. Our empirical work shows that PLMs trained on the full alphabet and a large number of sequences capture fine details that are lost in alphabet reduction methods. We further show the ability of a structure prediction model (ESMFold) to fold CASP14 protein sequences translated using a reduced alphabet. For 10 proteins out of the 50 targets, reduced alphabets improve structural predictions with LDDT-Cα differences of up to 19%. AVAILABILITY AND IMPLEMENTATION: Trained models and code are available at github.com/Ieremie/reduced-alph-PLM.
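Translating a sequence into a reduced alphabet amounts to replacing each residue with a representative of its cluster. The sketch below shows this with a hypothetical six-cluster grouping chosen only for illustration; it is not one of the alphabets evaluated in the paper, and the function names are assumptions.

```python
# Hypothetical clusters by broad physicochemical class; the first letter of
# each cluster serves as its representative. Together they cover all 20
# standard amino acids.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(clusters):
    """Map every amino acid to the representative letter of its cluster."""
    mapping = {}
    for cluster in clusters:
        representative = cluster[0]
        for amino_acid in cluster:
            mapping[amino_acid] = representative
    return mapping


def reduce_sequence(seq, mapping, unknown="X"):
    """Translate a protein sequence into the reduced alphabet.

    Residues missing from the mapping (e.g. ambiguity codes) become `unknown`.
    """
    return "".join(mapping.get(residue, unknown) for residue in seq.upper())


REDUCED = build_mapping(CLUSTERS)
```

Under this grouping the vocabulary shrinks from 20 letters to 6, which is the kind of information loss whose effect on learned representations the abstract describes.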
Affiliation(s)
- Ioan Ieremie
  - Vision, Learning & Control Group, University of Southampton, Southampton SO17 1BJ, United Kingdom
- Rob M Ewing
  - Biological Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom
- Mahesan Niranjan
  - Vision, Learning & Control Group, University of Southampton, Southampton SO17 1BJ, United Kingdom
11
Zhang L, Zhou Q, Zhang J, Cao K, Fan C, Chen S, Jiang H, Wu F. Liver transcriptomic and proteomic analyses provide new insight into the pathogenesis of liver fibrosis in mice. Genomics 2023; 115:110738. [PMID: 37918454 DOI: 10.1016/j.ygeno.2023.110738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 09/25/2023] [Accepted: 10/30/2023] [Indexed: 11/04/2023]
Abstract
BACKGROUND: Liver fibrosis (LF) is a progressive liver injury response. The goal of this study was to achieve a more detailed understanding of the molecular changes in response to CCl4-induced LF by identifying differentially expressed liver transcripts and proteins. RESULTS: A total of 1224 differentially expressed genes (DEGs) and 302 differentially expressed proteins (DEPs) were identified at the transcriptomic and proteomic levels, respectively, and 69 genes (hereafter "cor-DEGs-DEPs" genes) were detected at both levels. Pathway enrichment analysis showed that these cor-DEGs-DEPs genes were significantly enriched in 133 pathways. Importantly, among the cor-DEGs-DEPs genes, Gstm1, Gstm3, Ephx1 and Gstp1 were shown to be associated with metabolic pathways, and this was confirmed by RT-qPCR and parallel reaction monitoring (PRM). CONCLUSIONS: Through the combined analysis of transcriptomic and proteomic data, this study provides valuable insights into the potential mechanism of the pathogenesis of LF and lays a theoretical foundation for the further development of targeted therapy for LF.
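Finding genes that are detected as differentially expressed at both the transcript and the protein level is, at its simplest, a set intersection. The sketch below assumes each analysis yields a mapping from gene symbol to log2 fold change; the function name, the placeholder gene names, and the extra concordance flag are illustrative assumptions, not the authors' pipeline.

```python
def shared_hits(deg_log2fc, dep_log2fc):
    """Return genes present in both result sets, sorted by name.

    Each entry is (gene, transcript log2FC, protein log2FC, concordant),
    where `concordant` is True when both levels change in the same direction.
    """
    shared = sorted(set(deg_log2fc) & set(dep_log2fc))
    return [
        (
            gene,
            deg_log2fc[gene],
            dep_log2fc[gene],
            (deg_log2fc[gene] > 0) == (dep_log2fc[gene] > 0),
        )
        for gene in shared
    ]


# Toy values with placeholder gene names, purely for illustration.
example_degs = {"geneA": 1.5, "geneB": -2.0, "geneC": 0.8}
example_deps = {"geneB": -1.1, "geneC": -0.4, "geneD": 2.2}
example_shared = shared_hits(example_degs, example_deps)
```

In a real analysis the inputs would first be filtered by significance and fold-change thresholds, and protein identifiers would need to be mapped to gene symbols before intersecting.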
Affiliation(s)
- Lili Zhang
  - Experimental Center of Clinical Research, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China; School of Pharmacy, Anhui University of Chinese Medicine, Hefei, China.
- Qiumei Zhou
  - Experimental Center of Clinical Research, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China.
- Jiafu Zhang
  - Department of Pharmacy, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China.
- Kefeng Cao
  - Departments of Laboratory Medicine, Traditional Chinese Medical Hospital of Taihe County, Fuyang, China.
- Chang Fan
  - Experimental Center of Clinical Research, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China; School of Pharmacy, Anhui University of Chinese Medicine, Hefei, China.
- Sen Chen
  - Experimental Center of Clinical Research, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China; School of Pharmacy, Anhui University of Chinese Medicine, Hefei, China.
- Hui Jiang
  - Experimental Center of Clinical Research, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, China; School of Pharmacy, Anhui University of Chinese Medicine, Hefei, China.
- Furong Wu
  - Department of Pharmacy, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China.
|
12
|
Liu S, Liang Y, Li J, Yang S, Liu M, Liu C, Yang D, Zuo Y. Integrating reduced amino acid composition into PSSM for improving copper ion-binding protein prediction. Int J Biol Macromol 2023:124993. [PMID: 37307968 DOI: 10.1016/j.ijbiomac.2023.124993] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/14/2023]
Abstract
Copper ion-binding proteins play an essential role in metabolic processes and are critical factors in many diseases, such as breast cancer, lung cancer, and Menkes disease. Many algorithms have been developed for predicting metal ion classification and binding sites, but none have been applied to copper ion-binding proteins. In this study, we developed a copper ion-binding protein classifier, RPCIBP, which integrates the reduced amino acid composition into the position-specific scoring matrix (PSSM). The reduced amino acid composition filters out a large number of uninformative evolutionary features, improving the operational efficiency and predictive ability of the model (feature dimension reduced from 2900 to 200, ACC raised from 83% to 85.1%). Compared with the basic model using only three sequence feature extraction methods (ACC between 73.8% and 86.2% on the training set and between 69.3% and 87.5% on the test set), the model integrating the evolutionary features of the reduced amino acid composition showed higher accuracy and robustness (ACC between 83.1% and 90.8% on the training set and between 79.1% and 91.9% on the test set). The best copper ion-binding protein classifiers, chosen through the feature selection process, were deployed in a user-friendly web server (http://bioinfor.imu.edu.cn/RPCIBP). RPCIBP can accurately predict copper ion-binding proteins, which is convenient for further structural and functional studies and conducive to mechanism exploration and targeted drug development.
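To make the idea of folding a reduced amino acid alphabet into a PSSM more concrete, the following is a minimal sketch and not the RPCIBP implementation. The five-group alphabet, the averaging scheme, and the function names are all assumptions chosen for illustration; the abstract does not specify which reduction or feature layout produced the 200-dimensional vector.

```python
# Column order conventionally used in PSI-BLAST PSSM output.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical 5-group reduced alphabet (rough physicochemical classes).
# This grouping is an assumption, not the clustering used by RPCIBP.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def reduce_pssm_row(row):
    """Average the 20 column scores of one PSSM row within each reduced group."""
    sums = [0.0] * len(GROUPS)
    for aa, score in zip(AA_ORDER, row):
        sums[AA_TO_GROUP[aa]] += score
    return [total / len(GROUPS[g]) for g, total in enumerate(sums)]


def reduced_pssm_features(sequence, pssm):
    """Build a fixed-length k*k vector from an L x 20 PSSM.

    Rows are pooled by the reduced group of the residue at that position,
    and columns are pooled by reduced group, so the output length depends
    only on the alphabet size k, not on the sequence length L.
    """
    k = len(GROUPS)
    totals = [[0.0] * k for _ in range(k)]
    counts = [0] * k
    for residue, row in zip(sequence, pssm):
        g = AA_TO_GROUP[residue]  # raises KeyError on non-standard residues
        reduced = reduce_pssm_row(row)
        for j in range(k):
            totals[g][j] += reduced[j]
        counts[g] += 1
    # Mean score per (row group, column group); 0.0 if a group never occurs.
    return [
        totals[g][j] / counts[g] if counts[g] else 0.0
        for g in range(k)
        for j in range(k)
    ]


# Toy input: a 2-residue sequence with constant PSSM rows (illustrative only).
toy_features = reduced_pssm_features("AK", [[1] * 20, [2] * 20])
```

With a k-group alphabet this layout yields k*k features instead of the 20*20 = 400 of an unreduced composition matrix, which illustrates how a reduced alphabet can shrink an evolutionary feature set.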
Affiliation(s)
- Shanghua Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
- Jinzhao Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Siqi Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Ming Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Chengfang Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Dezhi Yang
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China.
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China.
|
13
|
Chang CH, Nelson WC, Jerger A, Wright AT, Egbert RG, McDermott JE. Snekmer: a scalable pipeline for protein sequence fingerprinting based on amino acid recoding. Bioinformatics Advances 2023; 3:vbad005. [PMID: 36789294 PMCID: PMC9913046 DOI: 10.1093/bioadv/vbad005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 12/16/2022] [Accepted: 02/01/2023] [Indexed: 02/04/2023]
Abstract
Motivation: The vast expansion of sequence data generated from single organisms and microbiomes has precipitated the need for faster and more sensitive methods to assess evolutionary and functional relationships between proteins. Representing proteins as sets of short peptide sequences (kmers) has been used for rapid, accurate classification of proteins into functional categories; however, this approach employs an exact-match methodology and thus may be limited in terms of sensitivity and coverage. We have previously used similarity groupings, based on the chemical properties of amino acids, to form reduced character sets and recode proteins. This amino acid recoding (AAR) approach simplifies the construction of protein representations in the form of kmer vectors, which can link sequences with distant sequence similarity and provide accurate classification of problematic protein families. Results: Here, we describe Snekmer, a software tool for recoding proteins into AAR kmer vectors and performing either (i) construction of supervised classification models trained on input protein families or (ii) clustering for de novo determination of protein families. We provide examples of the operation of the tool against a set of nitrogen cycling families originally collected using both standard hidden Markov models and a larger set of proteins from UniProt and demonstrate that our method accurately differentiates these sequences in both operation modes. Availability and implementation: Snekmer is written in Python using Snakemake. Code and data used in this article, along with tutorial notebooks, are available at http://github.com/PNNL-CompBio/Snekmer under an open-source BSD-3 license. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
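The recoding step described in this abstract can be illustrated with a short, standard-library sketch. It does not use or reproduce Snekmer's own code or API; the five-group alphabet, the single-letter group codes, and the function names are hypothetical and chosen only to show how a reduced alphabet lets kmers match across sequences that share no exact kmers.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: each group of residues maps to one code letter.
# This grouping is an assumption for illustration, not Snekmer's recoding scheme.
GROUPS = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWYH",    # aromatic
    "o": "STNQ",    # polar
    "b": "KR",      # basic
    "n": "DEGP",    # acidic / small / proline
}
AA_TO_CODE = {aa: code for code, members in GROUPS.items() for aa in members}
ALPHABET = sorted(GROUPS)


def recode(sequence):
    """Rewrite a protein sequence in the reduced alphabet."""
    # Raises KeyError on non-standard residues such as X or B.
    return "".join(AA_TO_CODE[aa] for aa in sequence)


def kmer_vector(sequence, k):
    """Count recoded kmers and return them as a fixed-order vector."""
    recoded = recode(sequence)
    counts = Counter(recoded[i:i + k] for i in range(len(recoded) - k + 1))
    # One slot per possible kmer over the reduced alphabet (len(ALPHABET) ** k).
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


# Two toy peptides that share no residue at any position.
seq_a, seq_b = "AKLV", "VRIM"
same_after_recoding = recode(seq_a) == recode(seq_b)
vec_a = kmer_vector(seq_a, 2)
vec_b = kmer_vector(seq_b, 2)
```

In this toy case the two peptides have no exact 2-mer in common, yet both recode to the same string and therefore to identical kmer vectors, which is the kind of distant similarity an exact-match kmer approach would miss.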
Affiliation(s)
- Christine H Chang
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Abby Jerger
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Aaron T Wright
- Department of Biology, Baylor University, Waco, TX 76798, USA
- Robert G Egbert
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
|