1
Zuo Y, Wan M, Shen Y, Wang X, He W, Bi Y, Liu X, Deng Z. ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique. Comput Biol Chem 2024; 113:108212. [PMID: 39277959 DOI: 10.1016/j.compbiolchem.2024.108212] [Received: 07/08/2024] [Revised: 09/02/2024] [Accepted: 09/12/2024] [Indexed: 09/17/2024]
Abstract
Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identifying and understanding lysine crotonylation sites is therefore crucial in protein research. However, as the number of known non-histone crotonylation sites grows, existing classifiers based on traditional machine learning may encounter performance limitations. To address this problem, and given the advantages of deep learning techniques for sequence data analysis, this study presents an MLP-Attention-based deep learning model for identifying crotonylation sites. First, three feature extraction strategies (Amino Acid Composition, K-mer, and distance-based residue features) were used to encode crotonylated and non-crotonylated sequences. Then, to balance the training dataset, the FCM-GRNN undersampling algorithm, which combines fuzzy clustering with a generalized regression neural network, was introduced. Finally, various classification algorithms were compared experimentally, and the multilayer perceptron (MLP) combined with the superimposed self-attention mechanism was selected to construct the prediction model, ILYCROsite. Independent testing and five-fold cross-validation showed that ILYCROsite performs well. Notably, on the independent test set, ILYCROsite achieved an AUC of 87.93%, which is significantly better than existing state-of-the-art models. In addition, SHAP (SHapley Additive exPlanations) values were used to analyze the importance of features and their impact on model predictions. To make the model easier for researchers to use, we developed a prediction program that identifies crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite.
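The first two encodings named in the abstract, Amino Acid Composition and K-mer, can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above); the function names, the 11-residue example window, and the choice of k = 2 are assumptions made only for illustration, and the distance-based residue features are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_frequencies(seq, k=2):
    """Return normalized counts of overlapping k-mers (20**k values)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    n = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing non-standard symbols
            vec[j] += 1
            n += 1
    return [v / n for v in vec] if n else vec


# Hypothetical sequence window around a candidate lysine (K); not real data.
window = "AGSTKKLMNPA"
features = amino_acid_composition(window) + kmer_frequencies(window, k=2)
print(len(features))
```

Concatenating the two vectors gives one fixed-length feature row per candidate site, which is the form a downstream classifier such as an MLP would expect.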
Affiliation(s)
- Yun Zuo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Minquan Wan
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Yang Shen
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Xinheng Wang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Wenying He
  - School of Artificial Intelligence, Hebei University of Technology, Tianjin 300130, China
- Yue Bi
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
- Xiangrong Liu
  - Department of Computer Science and Technology, National Institute for Data Science in Health and Medicine, Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen 361005, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
2
Feng Y, Ho KL, Zhang M, Sundaresha NB, Cavanagh HL, Zhao S. Canine major histocompatibility complex class I (MHC-I) diversity landscape. bioRxiv 2024:2024.02.14.580220. [PMID: 38405923 PMCID: PMC10888748 DOI: 10.1101/2024.02.14.580220] [Indexed: 02/27/2024]
Abstract
The genes of the Major Histocompatibility Complex class I (MHC-I) are among the most diverse in the mammalian genome and play a crucial role in immunology. Understanding the diversity landscape of MHC-I is therefore of paramount importance. The dog is a key translational model in various biomedical fields. However, our understanding of the canine MHC-I diversity landscape lags significantly behind that of humans. To address this deficiency, we used our newly developed software, KPR de novo assembler and genotyper, to genotype 1,325 samples from 1,025 dogs with paired-end RNA-seq data from 43 BioProjects, after extensive quality control (QC). Among the 926 dogs that passed QC, 591 (64%) had at least one allele genotyped, and a total of 97 known alleles and 52 putative new alleles were identified. Further analysis reveals that DLA-I gene expression levels vary among tissues, being lowest in testis and brain and highest in blood, corpus luteum, and spleen. We identified dominant alleles in each of the 17 canine breeds, as well as in the entire canine population. Our analysis also identifies breed-specific alleles, as well as co-occurring and mutually exclusive alleles. Our study indicates that canine DLA-88 is as diversified as the human HLA-A/B/C genes within the entire population, but less diversified within a breed than HLA-A/B/C is within an ethnic group. Lastly, we examined the hypervariable regions (HVRs) within and across human and canine MHC-I alleles and found that 80% of the HVRs overlap between the two species. We further noted that 80% of the HVRs are within 4 Å of the peptides, and that the dog-human difference overlaps with only 20% of the HVRs. Our research offers valuable insights for immunological studies involving dogs.
3
Watson J, Wang T, Ho KL, Feng Y, Mahawan T, Dobbin KK, Zhao S. Human basal-like breast cancer is represented by one of the two mammary tumor subtypes in dogs. Breast Cancer Res 2023; 25:114. [PMID: 37789381 PMCID: PMC10546663 DOI: 10.1186/s13058-023-01705-5] [Received: 02/28/2023] [Accepted: 08/31/2023] [Indexed: 10/05/2023]
Abstract
BACKGROUND: About 20% of breast cancers in humans are basal-like, a subtype that is often triple-negative and difficult to treat. An effective translational model for basal-like breast cancer (BLBC) is currently lacking and urgently needed. To determine whether spontaneous mammary tumors in pet dogs could meet this need, we subtyped canine mammary tumors and evaluated the dog-human molecular homology at the subtype level.
METHODS: We subtyped 236 canine mammary tumors from 3 studies by applying various subtyping strategies to their RNA-seq data. We then performed PAM50 classification with canine tumors alone, as well as with canine tumors combined with human breast tumors. We identified feature genes for the human BLBC and luminal A subtypes via machine learning and used these genes to repeat the canine-alone and cross-species tumor classifications. We investigated differential gene expression, signature gene set enrichment, expression association, mutational landscape, and other features for dog-human subtype comparison.
RESULTS: Our independent genome-wide subtyping consistently identified two molecularly distinct subtypes among the canine tumors. One subtype is mostly basal-like and clusters with human BLBC in cross-species PAM50 and feature gene classifications, while the other subtype does not cluster with any human breast cancer subtype. Furthermore, the canine basal-like subtype recaptures key molecular features (e.g., cell cycle gene upregulation, TP53 mutation) and gene expression patterns that characterize human BLBC. It is enriched in histological subtypes that match human breast cancer, unlike the other canine subtype. However, about 33% of canine basal-like tumors are estrogen receptor negative (ER-) and progesterone receptor positive (PR+), which is rare in human breast cancer. Further analysis reveals that these ER-PR+ canine tumors harbor additional basal-like features, including upregulation of genes of the interferon-γ response and of the Wnt-pluripotency pathway. Interestingly, we observed an association of PGR expression with gene silencing in all canine tumors and with the expression of T cell exhaustion markers (e.g., PDCD1) in ER-PR+ canine tumors.
CONCLUSIONS: We identify a canine mammary tumor subtype that molecularly resembles human BLBC overall and thus could serve as a vital translational model of this devastating breast cancer subtype. Our study also sheds light on the dog-human difference in mammary tumor histology and the hormonal cycle.
Affiliation(s)
- Joshua Watson
  - Institute of Bioinformatics, University of Georgia, 120 E Green Street, Athens, GA, 30602, USA
- Tianfang Wang
  - Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, 120 E Green Street, Athens, GA, 30602, USA
- Kun-Lin Ho
  - Institute of Bioinformatics, University of Georgia, 120 E Green Street, Athens, GA, 30602, USA
- Yuan Feng
  - Institute of Bioinformatics, University of Georgia, 120 E Green Street, Athens, GA, 30602, USA
- Tanakamol Mahawan
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Kevin K Dobbin
  - Department of Biostatistics, University of Georgia, Athens, GA, 30602, USA
- Shaying Zhao
  - Institute of Bioinformatics, University of Georgia, 120 E Green Street, Athens, GA, 30602, USA
  - Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, 120 E Green Street, Athens, GA, 30602, USA
4
Watson J, Wang T, Ho KL, Feng Y, Dobbin KK, Zhao S. Human basal-like breast cancer is represented by one of the two mammary tumor subtypes in dogs. bioRxiv 2023:2023.03.02.530622. [PMID: 37034591 PMCID: PMC10081165 DOI: 10.1101/2023.03.02.530622] [Indexed: 06/19/2023]
Abstract
Background: About 20% of breast cancers in humans are basal-like, a subtype that is often triple negative and difficult to treat. An effective translational model for basal-like breast cancer (BLBC) is currently lacking and urgently needed. To determine if spontaneous mammary tumors in pet dogs could meet this need, we subtyped canine mammary tumors and evaluated the dog-human molecular homology at the subtype level.
Methods: We subtyped 236 canine mammary tumors from 3 studies by applying various subtyping strategies to their RNA-seq data. We then performed PAM50 classification with canine tumors alone, as well as with canine tumors combined with human breast tumors. We investigated differential gene expression, signature gene set enrichment, expression association, mutational landscape, and other features for dog-human subtype comparison.
Results: Our independent genome-wide subtyping consistently identified two molecularly distinct subtypes among the canine tumors. One subtype is mostly basal-like and clusters with human BLBC in cross-species PAM50 classification, while the other subtype does not cluster with any human breast cancer subtype. Furthermore, the canine basal-like subtype recaptures key molecular features (e.g., cell cycle gene upregulation, TP53 mutation) and gene expression patterns that characterize human BLBC. It is enriched in histological subtypes that match human breast cancer, unlike the other canine subtype. However, about 33% of canine basal-like tumors are estrogen receptor negative (ER-) and progesterone receptor positive (PR+), which is rare in human breast cancer. Further analysis reveals that these ER-PR+ canine tumors harbor additional basal-like features, including upregulation of genes of the interferon-γ response and of the Wnt-pluripotency pathway. Interestingly, we observed an association of PGR expression with gene silencing in all canine tumors, and with the expression of T cell exhaustion markers (e.g., PDCD1) in ER-PR+ canine tumors.
Conclusions: We identify a canine mammary tumor subtype that molecularly resembles human BLBC overall, and thus could serve as a vital spontaneous animal model of this devastating breast cancer subtype. Our study also sheds light on the dog-human difference in mammary tumor histology and the hormonal cycle.
Affiliation(s)
- Joshua Watson
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Tianfang Wang
  - Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Kun-Lin Ho
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Yuan Feng
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Kevin K Dobbin
  - Department of Biostatistics, University of Georgia, Athens, GA 30602, USA
- Shaying Zhao
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA