1
Bengs BD, Nde J, Dutta S, Li Y, Sardiu ME. Integrative approaches for predicting protein network perturbations through machine learning and structural characterization. J Proteomics 2025; 316:105439. [PMID: 40228603 DOI: 10.1016/j.jprot.2025.105439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2025] [Revised: 03/14/2025] [Accepted: 04/08/2025] [Indexed: 04/16/2025]
Abstract
Chromatin remodeling complexes, such as the Saccharomyces cerevisiae INO80 complex, exemplify how dynamic protein interaction networks govern cellular function through a balance of conserved structural modules and context-dependent functional partnerships, as revealed by integrative machine learning and structural mapping approaches. In this study, we explored the INO80 complex using machine learning to predict network changes caused by genetic deletions. Tree-based models outperformed linear approaches, highlighting non-linear relationships within the interaction network. Feature selection identified key INO80 components (e.g., Arp5, Arp8) and cross-compartment features from other remodeling complexes like SWR1 and NuA4, emphasizing shared functional pathways. Perturbation patterns aligned with biological modules, particularly those linked to telomere maintenance and aging, underscoring the functional coherence of these networks. Structural mapping revealed that not all interactions are predictable through proximity alone, particularly with Arp5 and Yta7. By combining structural insights with machine learning, we enhanced predictions of genetic perturbation effects, providing a template for analyzing cross-species homologs (e.g., human INO80) and their disease-associated variants. This integrative approach bridges the gap between static structural data and dynamic functional networks, offering a pathway to disentangle conserved mechanisms from context-dependent adaptations in chromatin biology.
SIGNIFICANCE: By leveraging an innovative, integrative machine learning approach, we have successfully predicted and analyzed perturbations in the INO80 network with good accuracy and depth. Our novel combination of machine learning, perturbation analysis, and structural investigation has provided crucial insights into the complex's structure-function relationships, shedding new light on its pivotal roles in affected pathways such as telomere maintenance. Our findings not only enhance our understanding of the INO80 complex but also establish a powerful framework for future studies in chromatin biology and beyond. This work represents a step forward in our understanding of chromatin remodeling complexes and their diverse cellular functions, laying the groundwork for future studies that can further refine our computational approaches and experimental techniques in this field.
Affiliation(s)
- Bethany D Bengs
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
- Jules Nde
  - Department of Cancer Biology, University of Kansas Medical Center, Kansas, USA
- Sreejata Dutta
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
- Yanming Li
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
- Mihaela E Sardiu
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
  - Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas, USA
2
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Xia F, Shukla M, Vasanthakumari P, Doroshow JH, Stevens RL. Data imbalance in drug response prediction: multi-objective optimization approach in deep learning setting. Brief Bioinform 2025; 26:bbaf134. [PMID: 40178282 PMCID: PMC11966611 DOI: 10.1093/bib/bbaf134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 02/07/2025] [Accepted: 02/18/2025] [Indexed: 04/05/2025] Open
Abstract
Drug response prediction (DRP) methods tackle the complex task of associating the effectiveness of small molecules with the specific genetic makeup of the patient. Anti-cancer DRP is a particularly challenging task requiring costly experiments, as the underlying pathogenic mechanisms are broad and associated with multiple genomic pathways. The scientific community has exerted significant efforts to generate public drug screening datasets, giving a path to various machine learning models that attempt to reason over the complex data space of small compounds and biological characteristics of tumors. However, the data depth is still lacking compared to application domains such as computer vision or natural language processing, limiting current learning capabilities. To combat this issue and improve the generalizability of DRP models, we explore strategies that explicitly address the imbalance in DRP datasets. We reframe the problem as a multi-objective optimization across multiple drugs to maximize deep learning model performance. We implement this approach by constructing the Multi-Objective Optimization Regularized by Loss Entropy loss function and plugging it into a deep learning model. We demonstrate the utility of the proposed drug discovery methods and make suggestions for further potential application of the work to achieve desirable outcomes in the healthcare field.
Affiliation(s)
- Oleksandr Narykov
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Yvonne A Evrard
  - Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD 21702, United States
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Priyanka Vasanthakumari
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- James H Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, 31 Center Dr, Bethesda, MD 20892, United States
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
  - Department of Computer Science, The University of Chicago, 5730 S Ellis Ave, Chicago, IL 60637, United States
3
Laine E, Freiberger MI. Toward a comprehensive profiling of alternative splicing proteoform structures, interactions and functions. Curr Opin Struct Biol 2025; 90:102979. [PMID: 39778413 PMCID: PMC7617313 DOI: 10.1016/j.sbi.2024.102979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 11/26/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025]
Abstract
The mRNA splicing machinery has been estimated to generate 100,000 known protein-coding transcripts for 20,000 human genes (Ensembl, Sept. 2024). However, this set is expanding with the massive and rapidly growing data coming from high-throughput technologies, particularly single-cell and long-read sequencing. Yet, the implications of splicing complexity at the protein level remain largely uncharted. In this review, we describe the current advances toward systematically assessing the contribution of alternative splicing to proteome function diversification. We discuss the potential and challenges of using artificial intelligence-based techniques in identifying alternative splicing proteoforms and characterising their structures, interactions, and functions.
Affiliation(s)
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, 75005 Paris, France
  - Institut universitaire de France (IUF), France
- Maria Inés Freiberger
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, 75005 Paris, France
4
Tang Q, Khvorova A. RNAi-based drug design: considerations and future directions. Nat Rev Drug Discov 2024; 23:341-364. [PMID: 38570694 PMCID: PMC11144061 DOI: 10.1038/s41573-024-00912-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/14/2024] [Indexed: 04/05/2024]
Abstract
More than 25 years after its discovery, the post-transcriptional gene regulation mechanism termed RNAi is now transforming pharmaceutical development, as shown by the recent FDA approval of multiple small interfering RNA (siRNA) drugs that target the liver. Synthetic siRNAs that trigger RNAi have the potential to specifically silence virtually any therapeutic target with unprecedented potency and durability. Bringing this innovative class of medicines to patients, however, has been riddled with substantial challenges, with delivery issues at the forefront. Several classes of siRNA drug are under clinical evaluation, but their utility in treating extrahepatic diseases remains limited, demanding continued innovation. In this Review, we discuss principal considerations and future directions in the design of therapeutic siRNAs, with a particular emphasis on chemistry, the application of informatics, delivery strategies and the importance of careful target selection, which together influence therapeutic success.
Affiliation(s)
- Qi Tang
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA, USA
  - Department of Dermatology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Anastasia Khvorova
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA, USA
  - Program in Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA, USA
5
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Shukla M, Xia F, Clyde A, Vasanthakumari P, Doroshow JH, Stevens RL. Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models. Cancers (Basel) 2023; 16:50. [PMID: 38201477 PMCID: PMC10777918 DOI: 10.3390/cancers16010050] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/03/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/12/2024] Open
Abstract
Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.
Affiliation(s)
- Oleksandr Narykov
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Yvonne A. Evrard
  - Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Austin Clyde
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
- Priyanka Vasanthakumari
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- James H. Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Rick L. Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
  - Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
6
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023] Open
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Abdullah Abood
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Charles R Farber
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Gloria M Sheynkman
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
  - UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA