1
Newaz K, Schaefers C, Weisel K, Baumbach J, Frishman D. Prognostic importance of splicing-triggered aberrations of protein complex interfaces in cancer. NAR Genom Bioinform 2024; 6:lqae133. [PMID: 39328266 PMCID: PMC11426328 DOI: 10.1093/nargab/lqae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 08/30/2024] [Accepted: 09/13/2024] [Indexed: 09/28/2024] Open
Abstract
Aberrant alternative splicing (AS) is a prominent hallmark of cancer. AS can perturb protein-protein interactions (PPIs) by adding or removing interface regions encoded by individual exons. Identifying prognostic exon-exon interactions (EEIs) from PPI interfaces can help discover AS-affected cancer-driving PPIs that can serve as potential drug targets. Here, we assessed the prognostic significance of EEIs across 15 cancer types by integrating RNA-seq data with three-dimensional (3D) structures of protein complexes. By analyzing the resulting EEI network we identified patient-specific perturbed EEIs (i.e., EEIs present in healthy samples but absent from the paired cancer samples or vice versa) that were significantly associated with survival. We provide the first evidence that EEIs can be used as prognostic biomarkers for cancer patient survival. Our findings provide mechanistic insights into AS-affected PPI interfaces. Given the ongoing expansion of available RNA-seq data and the number of 3D structurally-resolved (or confidently predicted) protein complexes, our computational framework will help accelerate the discovery of clinically important cancer-promoting AS events.
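The definition of a patient-specific perturbed EEI given in this abstract (present in the healthy sample but absent from the paired cancer sample, or vice versa) amounts to a set difference in each direction. The following is a minimal sketch of that idea only, not the authors' pipeline: the exon identifiers, the function names, and the assumption that the set of EEIs present in each sample has already been determined are all hypothetical.

```python
from typing import Dict, FrozenSet, Set

# An EEI is modelled as an unordered pair of exon identifiers (hypothetical IDs).
EEI = FrozenSet[str]


def make_eei(exon_a: str, exon_b: str) -> EEI:
    """Build an order-independent exon-exon interaction."""
    return frozenset((exon_a, exon_b))


def perturbed_eeis(healthy: Set[EEI], cancer: Set[EEI]) -> Dict[str, Set[EEI]]:
    """Split the EEIs that differ between paired samples into lost and gained."""
    return {
        "lost": healthy - cancer,    # present in healthy, absent from cancer
        "gained": cancer - healthy,  # absent from healthy, present in cancer
    }


# Toy data for one hypothetical patient.
healthy_sample = {make_eei("exonA", "exonB"), make_eei("exonC", "exonD")}
cancer_sample = {make_eei("exonC", "exonD"), make_eei("exonE", "exonF")}

result = perturbed_eeis(healthy_sample, cancer_sample)
print(sorted(sorted(pair) for pair in result["lost"]))
print(sorted(sorted(pair) for pair in result["gained"]))
```

Storing each EEI as a `frozenset` makes the pair order-independent, so the same interaction is recognized regardless of which exon is listed first.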
Affiliation(s)
- Khalique Newaz
- Institute for Computational Systems Biology and Center for Data and Computing in Natural Sciences, Universität Hamburg, 22761 Hamburg, Germany
- Christoph Schaefers
- Department of Oncology, Hematology and Bone Marrow Transplantation with Division of Pneumology, Universitätsklinikum Hamburg-Eppendorf, 20251 Hamburg, Germany
- Katja Weisel
- Department of Oncology, Hematology and Bone Marrow Transplantation with Division of Pneumology, Universitätsklinikum Hamburg-Eppendorf, 20251 Hamburg, Germany
- Jan Baumbach
- Institute for Computational Systems Biology and Center for Data and Computing in Natural Sciences, Universität Hamburg, 22761 Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Dmitrij Frishman
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
2
Xiao Y, Huang H, Chen Y, Zheng S, Chen J, Zou Z, Mehmood N, Ullah I, Liao X, Wang J. Insight on genetic features prevalent in five Ipomoea species using comparative codon pattern analysis reveals differences in major codons and reduced GC content at the 5’ end of CDS. Biochem Biophys Res Commun 2023; 657:92-99. [PMID: 37001285 DOI: 10.1016/j.bbrc.2023.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/10/2023] [Accepted: 03/10/2023] [Indexed: 03/30/2023]
Abstract
Ipomoea plants possess important commercial, medicinal, and ornamental value. Molecular and morphological studies have confirmed that most species of this genus exhibit similar phenotypes but complex phylogenetic relationships. To date, limited information is available on these evolutionary relationships. In this study, a systematic analysis of diverse Ipomoea species was used to elucidate the relationships within this genus. To this end, we analyzed the codon usage bias (CUB) of five Ipomoea species using measures such as the effective number of codons (ENC) and the GC content at the third synonymous codon position (GC3s). Three types of plots, namely ENC-GC3s, parity rule 2 (PR2), and neutrality plots, were employed to identify the factors determining CUB, and the frequencies of hydrogen bonds and nucleotides were calculated to dissect changes in GC content at the 5'-end of the coding sequence. Our results showed little difference in CUB among the five species, with a reduction in hydrogen-bond content at the 5'-end (with similar changes in cytosines). In addition, the optimal codons of Ipomoea aquatica ended with G or C, unlike those of the other four species, which ended in A or T. These results may be useful for exploring the evolutionary relationships within this group and for understanding the reasons for the variation among Ipomoea species.
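Two of the quantities named in this abstract can be illustrated on a toy coding sequence. The sketch below is not the authors' code. It assumes the standard genetic code, computes GC3s as the fraction of G or C at the third position of codons that have synonymous alternatives (so ATG, TGG, and the stop codons are excluded), and approximates hydrogen-bond content by counting three bonds per G or C and two per A or T. The function names and the example sequence are hypothetical.

```python
# Codons without synonymous alternatives in the standard genetic code (Met, Trp).
NON_SYNONYMOUS = {"ATG", "TGG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Watson-Crick hydrogen bonds contributed by each base when paired.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def gc3s(cds: str) -> float:
    """Fraction of G/C at the third position of synonymous codons."""
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable_length, 3)]
    usable = [
        codon for codon in codons
        if codon not in NON_SYNONYMOUS
        and codon not in STOP_CODONS
        and set(codon) <= set("ACGT")  # skip codons with ambiguous bases
    ]
    if not usable:
        raise ValueError("no synonymous codons found")
    return sum(codon[2] in "GC" for codon in usable) / len(usable)


def mean_hydrogen_bonds(seq: str) -> float:
    """Average number of hydrogen bonds per unambiguous base."""
    values = [HYDROGEN_BONDS[base] for base in seq.upper() if base in HYDROGEN_BONDS]
    if not values:
        raise ValueError("no unambiguous bases found")
    return sum(values) / len(values)


toy_cds = "ATGGCCAAATTTGGGTAA"  # hypothetical coding sequence
print(gc3s(toy_cds))
print(mean_hydrogen_bonds(toy_cds[:9]))  # first three codons, i.e. the 5'-end
```

Applying `mean_hydrogen_bonds` to successive windows from the 5'-end would give a positional profile comparable in spirit to the one the abstract describes, though the window size used in the study is not stated here.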
3
Li Q, Newaz K, Milenković T. Towards future directions in data-integrative supervised prediction of human aging-related genes. Bioinformatics Advances 2022; 2:vbac081. [PMID: 36699345 PMCID: PMC9710570 DOI: 10.1093/bioadv/vbac081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 09/23/2022] [Accepted: 10/31/2022] [Indexed: 11/13/2022]
Abstract
Motivation: Identification of human genes involved in the aging process is critical due to the incidence of many diseases with age. A state-of-the-art approach for this purpose infers a weighted dynamic aging-specific subnetwork by mapping gene expression (GE) levels at different ages onto the protein-protein interaction network (PPIN). Then, it analyzes this subnetwork in a supervised manner by training a predictive model to learn how network topologies of known aging- versus non-aging-related genes change across ages. Finally, it uses the trained model to predict novel aging-related gene candidates. However, the best current subnetwork resulting from this approach still yields suboptimal prediction accuracy. This could be because it was inferred using outdated GE and PPIN data. Here, we evaluate whether analyzing a weighted dynamic aging-specific subnetwork inferred from newer GE and PPIN data improves prediction accuracy upon analyzing the best current subnetwork inferred from outdated data.
Results: Unexpectedly, we find that not to be the case. To understand this, we perform aging-related pathway and Gene Ontology term enrichment analyses. We find that the suboptimal prediction accuracy, regardless of which GE or PPIN data is used, may be caused by the current knowledge about which genes are aging-related being incomplete, or by the current methods for inferring or analyzing an aging-specific subnetwork being unable to capture all of the aging-related knowledge. These findings can potentially guide future directions towards improving supervised prediction of aging-related genes via -omics data integration.
Availability and implementation: All data and code are available at Zenodo, DOI: 10.5281/zenodo.6995045.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Qi Li
- Department of Computer Science and Engineering, Lucy Family Institute for Data & Society, and Eck Institute for Global Health (EIGH), University of Notre Dame, Notre Dame, IN 46556, USA
- Khalique Newaz
- Department of Computer Science and Engineering, Lucy Family Institute for Data & Society, and Eck Institute for Global Health (EIGH), University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Data and Computing in Natural Sciences (CDCS), Institute for Computational Systems Biology, Universität Hamburg, Hamburg 20146, Germany
4
Newaz K, Piland J, Clark PL, Emrich SJ, Li J, Milenković T. Multi-layer sequential network analysis improves protein 3D structural classification. Proteins 2022; 90:1721-1731. [PMID: 35441395 PMCID: PMC9356989 DOI: 10.1002/prot.26349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Revised: 03/04/2022] [Accepted: 03/30/2022] [Indexed: 11/08/2022]
Abstract
Protein structural classification (PSC) is a supervised problem of assigning proteins into pre-defined structural (e.g., CATH or SCOPe) classes based on the proteins' sequence or 3D structural features. We recently proposed PSC approaches that model protein 3D structures as protein structure networks (PSNs) and analyze PSN-based protein features, which performed better than or comparable to state-of-the-art sequence or other 3D structure-based PSC approaches. However, existing PSN-based PSC approaches model the whole 3D structure of a protein as a static (i.e., single-layer) PSN. Because folding of a protein is a dynamic process, where some parts (i.e., sub-structures) of a protein fold before others, modeling the 3D structure of a protein as a PSN that captures the sub-structures might further help improve the existing PSC performance. Here, we propose to model 3D structures of proteins as multi-layer sequential PSNs that approximate 3D sub-structures of proteins, with the hypothesis that this will improve upon the current state-of-the-art PSC approaches that are based on single-layer PSNs (and thus upon the existing state-of-the-art sequence and other 3D structural approaches). Indeed, we confirm this on 72 datasets spanning ~44 000 CATH and SCOPe protein domains.
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Data and Computing in Natural Sciences (CDCS), Institute for Computational Systems Biology, Universität Hamburg, Hamburg, 20146, Germany
- Jacob Piland
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Patricia L. Clark
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Scott J. Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Jun Li
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
5
Komar AA. A Code Within a Code: How Codons Fine-Tune Protein Folding in the Cell. Biochemistry (Moscow) 2021; 86:976-991. [PMID: 34488574 DOI: 10.1134/s0006297921080083] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
The genetic code sets the correspondence between the sequence of a given nucleotide triplet in an mRNA molecule, called a codon, and the amino acid that is added to the growing polypeptide chain during protein synthesis. With four bases (A, G, U, and C), there are 64 possible triplet codons: 61 sense codons (encoding amino acids) and 3 nonsense codons (so-called stop codons that define termination of translation). In most organisms, there are 20 common/standard amino acids used in protein synthesis; thus, the genetic code is redundant, with most amino acids (with the exception of Met and Trp) being encoded by more than one (synonymous) codon. Synonymous codons were initially presumed to have entirely equivalent functions; however, the finding that synonymous codons are not present at equal frequencies in mRNA suggested that the specific codon choice might have functional implications beyond coding for an amino acid. Observation of nonequivalent use of codons in mRNAs implied the possible existence of auxiliary information in the genetic code. Indeed, it has been found that the genetic code contains several layers of such additional information and that synonymous codons are strategically placed within mRNAs to ensure particular translation kinetics, facilitating and fine-tuning co-translational protein folding in the cell via step-wise/sequential structuring of distinct regions of the polypeptide chain emerging from the ribosome at different points in time. This review summarizes key findings in the field that have identified the role of synonymous codons and their usage in protein folding in the cell.
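The codon arithmetic in this abstract (64 triplets, 61 sense codons, 3 stop codons, and Met and Trp as the only amino acids with a single codon) can be checked by enumerating the standard genetic code. The snippet below is an illustrative sketch that is not taken from the review. It builds the table from the usual compact one-letter listing in U, C, A, G order, and the variable names are arbitrary.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every triplet to its amino acid (first base varies slowest, third fastest).
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]

# Number of synonymous codons per amino acid (its degeneracy).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon_amino_acids = sorted(aa for aa, n in degeneracy.items() if n == 1)

print(len(CODON_TABLE), len(sense_codons), stop_codons)
print(len(degeneracy), single_codon_amino_acids)
```

The `degeneracy` counter also shows how unevenly the redundancy is distributed: for example, Leu, Ser, and Arg each have six codons in the standard code.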
Affiliation(s)
- Anton A Komar
- Center for Gene Regulation in Health and Disease and Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH 44115, USA
- Department of Biochemistry and Center for RNA Science and Therapeutics, Case Western Reserve University, Cleveland, OH 44106, USA
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- DAPCEL, Inc., Cleveland, OH 44106, USA
6
Brysbaert G, Lensink MF. Centrality Measures in Residue Interaction Networks to Highlight Amino Acids in Protein–Protein Binding. Frontiers in Bioinformatics 2021; 1:684970. [PMID: 36303777 PMCID: PMC9581030 DOI: 10.3389/fbinf.2021.684970] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/24/2021] [Accepted: 05/17/2021] [Indexed: 12/21/2022] Open
Abstract
Residue interaction networks (RINs) describe a protein structure as a network of interacting residues. Central nodes in these networks, identified by centrality analyses, highlight those residues that play a role in the structure and function of the protein. However, little is known about the capability of such analyses to identify residues involved in the formation of macromolecular complexes. Here, we applied six different centrality measures to the RINs generated from the complexes of the SKEMPI 2 database of changes in protein–protein binding upon mutation, in order to evaluate the capability of each of these measures to identify major binding residues. The analyses were performed with and without the crystallographic water molecules, in addition to the protein residues. We also investigated the use of a weight factor based on the inter-residue distances to improve the detection of these residues. We show that for the identification of major binding residues, closeness, degree, and PageRank result in good precision, whereas betweenness, eigenvector, and residue centrality analyses give a higher sensitivity. Including water in the analysis improves the sensitivity of all measures without losing precision. Applying weights only slightly raises the sensitivity of eigenvector centrality analysis. We finally show that a combination of multiple centrality analyses is the optimal approach to identify residues that play a role in protein–protein interaction.
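As a rough illustration of the kind of analysis this abstract describes, the sketch below computes two of the named measures (degree and closeness centrality) on a toy, unweighted, connected residue interaction network and combines them by taking the union of the top-ranked residues. The residue labels, the edges, the cutoff `k`, and the union rule are all assumptions made for illustration; the abstract does not specify how the authors combine measures.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_network(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from residue-residue contacts."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def closeness_centrality(graph: Graph) -> Dict[str, float]:
    """(n - 1) over the sum of shortest-path lengths; assumes a connected graph."""
    n = len(graph)
    scores = {}
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source residue
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total = sum(distance.values())
        scores[source] = (n - 1) / total if total else 0.0
    return scores


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring residues."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


# Hypothetical contacts between residues of chain A and chain B.
contacts = [("A1", "A2"), ("A2", "A3"), ("A2", "B1"), ("B1", "B2")]
network = build_network(contacts)

degree = degree_centrality(network)
closeness = closeness_centrality(network)
candidates = top_k(degree, 2) | top_k(closeness, 2)  # union of both rankings
print(sorted(candidates))
```

Taking a union favors sensitivity, while an intersection would favor precision; which trade-off is appropriate depends on the application.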