1
Midha T, Kolomeisky AB, Igoshin OA. Linear-Decoupling Enables Accurate Speed and Accuracy Predictions for Copolymerization Processes. J Phys Chem Lett 2024; 15:9361-9368. [PMID: 39240239 DOI: 10.1021/acs.jpclett.4c02132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Biological processes exhibit remarkable accuracy and speed and can be theoretically explored through various approaches. The Markov-chain copolymerization theory, describing polymer growth kinetics as a Markov chain, provides an exact set of equations to solve for error and speed. Still, due to nonlinearity, these equations are hard to solve. Alternatively, the enzyme-kinetics approach, which formulates a set of linear equations, simplifies the biological processes as transitions between discrete chemical states, but generally, it might not be accurate. Here, we show that the enzyme-kinetic approach can lead to inaccurate fluxes, even for first-order polymerization processes. To address the problem, we propose a simplified linear-decoupling approximation for steady-state probabilities of higher-order copolymer chains under biologically relevant conditions. Our findings demonstrate that the stationary speed and error rate obtained from the linear-decoupling method align closely with exact values from the Markov-chain (nonlinear) approximation. Extending the technique to higher-order processes with proofreading and internal states shows that it works equally well to describe trade-offs between speed and accuracy for DNA replication and transcription elongation. Our work underscores the proposed linear-decoupling approximation's efficacy in addressing the nonlinear behavior of the Markov-chain approach and the enzyme-kinetic approach's limitations, ensuring accurate predictions for high-fidelity biological processes.
Affiliation(s)
- Tripti Midha
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
  - Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Oleg A Igoshin
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
  - Department of Biosciences, Rice University, Houston, Texas 77005, United States
2
Midha T, Kolomeisky AB, Igoshin OA. Insights into Error Control Mechanisms in Biological Processes: Copolymerization and Enzyme-Kinetics Revisited. J Phys Chem B 2024; 128:5612-5622. [PMID: 38814670 DOI: 10.1021/acs.jpcb.4c02173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
The high fidelity observed in biological information processing ranging from replication to translation has stimulated significant research efforts to clarify the underlying microscopic picture. Theoretically, several approaches to analyze the error rates have been proposed. The copolymerization theory describes the addition and removal of monomers at the growing tip of a copolymer, leading to a closed set of nonlinear equations. On the other hand, enzyme-kinetics approaches formulate linear equations of biochemical networks, describing transitions between discrete chemical states. However, it is still unclear whether the error values computed by the two approaches agree. Moreover, there are conflicting interpretations on whether the error is under thermodynamic or kinetic discrimination control. In this work, we examine the error rate in persistent copying biochemical processes by specifically analyzing both theoretical approaches. The initial disagreement of the results between the two theories motivated us to rederive the formula for the error rate in the kinetic model. The error computed with the new method resulted in excellent agreement between both theoretical approaches and with Monte Carlo simulations. Furthermore, our theoretical analysis shows that the kinetic discrimination controls the error, even when the energy difference between adding the right and wrong products is very small. Our theoretical investigation gives important insights into the physical-chemical properties of complex biological processes by providing the quantitative framework to evaluate them.
Affiliation(s)
- Tripti Midha
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
  - Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Oleg A Igoshin
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
  - Department of Biosciences, Rice University, Houston, Texas 77005, United States
3
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Aydin Son Y, Tibshirani R, Unal S, Gevaert O. Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study. NPJ Digit Med 2024; 7:117. [PMID: 38714751 PMCID: PMC11076490 DOI: 10.1038/s41746-024-01128-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/25/2024] [Indexed: 05/10/2024] Open
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: Intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
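The statistic cor(Xu1, Zv1) in the abstract above is the correlation between the first pair of canonical variates, that is, between the projections of two data views onto their weight vectors. The following is a minimal sketch of that computation only, assuming the weight vectors have already been estimated by some sparse CCA solver. The toy matrices `X` and `Z`, the weights `u` and `v`, and all function names are hypothetical illustrations and are not taken from the study.

```python
import math

def project(rows, weights):
    """Canonical variate: dot product of each sample row with the weight vector."""
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def canonical_correlation(X, Z, u, v):
    """Correlation between the projections X·u and Z·v."""
    return pearson(project(X, u), project(Z, v))

# Hypothetical toy data: 4 samples, two views with 3 and 2 features.
X = [[1.0, 0.2, 3.0], [2.0, 0.1, 1.0], [3.0, 0.4, 2.0], [4.0, 0.3, 5.0]]
Z = [[2.1, 9.0], [3.9, 7.0], [6.2, 8.0], [8.0, 6.0]]

# Assumed sparse weights: zeros mean the feature is excluded from the variate.
u = [1.0, 0.0, 0.0]
v = [1.0, 0.0]

rho = canonical_correlation(X, Z, u, v)
print(f"cor(Xu, Zv) = {rho:.3f}")
```

The zero entries in `u` and `v` illustrate why sparsity helps interpretation: only the features with nonzero weight contribute to the reported correlation.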
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
4
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Son YA, Tibshirani R, Unal S, Gevaert O. Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19. RESEARCH SQUARE 2023:rs.3.rs-3569833. [PMID: 38045288 PMCID: PMC10690316 DOI: 10.21203/rs.3.rs-3569833/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
5
Midha T, Mallory JD, Kolomeisky AB, Igoshin OA. Synergy among Pausing, Intrinsic Proofreading, and Accessory Proteins Results in Optimal Transcription Speed and Tolerable Accuracy. J Phys Chem Lett 2023; 14:3422-3429. [PMID: 37010247 DOI: 10.1021/acs.jpclett.3c00345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Cleavage of dinucleotides after the misincorporational pauses serves as a proofreading mechanism that increases transcriptional elongation accuracy. The accuracy is further improved by accessory proteins such as GreA and TFIIS. However, it is not clear why RNAP pauses and why cleavage-factor-assisted proofreading is necessary despite transcriptional errors in vitro being of the same order as those in downstream translation. Here, we developed a chemical-kinetic model that incorporates most relevant features of transcriptional proofreading and uncovers how the balance between speed and accuracy is achieved. We found that long pauses are essential for high accuracy, whereas cleavage-factor-stimulated proofreading optimizes speed. Moreover, in comparison to the cleavage of a single nucleotide or three nucleotides, RNAP backtracking and dinucleotide cleavage improve both speed and accuracy. Our results thereby show how the molecular mechanism and the kinetic parameters of the transcriptional process were evolutionarily optimized to achieve maximal speed and tolerable accuracy.
Affiliation(s)
- Tripti Midha
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Joel D Mallory
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
  - Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Oleg A Igoshin
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
  - Department of Biosciences, Rice University, Houston, Texas 77005, United States
6
Yu Q, Kolomeisky AB, Igoshin OA. The energy cost and optimal design of networks for biological discrimination. J R Soc Interface 2022; 19:20210883. [PMID: 35259959 PMCID: PMC8905179 DOI: 10.1098/rsif.2021.0883] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022] Open
Abstract
Many biological processes discriminate between correct and incorrect substrates through the kinetic proofreading mechanism that enables lower error at the cost of higher energy dissipation. Elucidating physico-chemical constraints for global minimization of dissipation and error is important for understanding enzyme evolution. Here, we identify theoretically a fundamental error-cost bound that tightly constrains the performance of proofreading networks under any parameter variations preserving the rate discrimination between substrates. The bound is kinetically controlled, i.e. completely determined by the difference between the transition state energies on the underlying free energy landscape. The importance of the bound is analysed for three biological processes. DNA replication by T7 DNA polymerase is shown to be nearly optimized, i.e. its kinetic parameters place it in the immediate proximity of the error-cost bound. The isoleucyl-tRNA synthetase (IleRS) of E. coli also operates close to the bound, but further optimization is prevented by the need for reaction speed. In contrast, E. coli ribosome operates in a high-dissipation regime, potentially in order to speed up protein production. Together, these findings establish a fundamental error-dissipation relation in biological proofreading networks and provide a theoretical framework for studying error-dissipation trade-off in other systems with biological discrimination.
Affiliation(s)
- Qiwei Yu
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Anatoly B Kolomeisky
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
  - Department of Chemistry, Rice University, Houston, TX 77005, USA
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX 77005, USA
  - Department of Physics and Astronomy, Rice University, Houston, TX 77005, USA
- Oleg A Igoshin
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
  - Department of Chemistry, Rice University, Houston, TX 77005, USA
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA
  - Department of Biosciences, Rice University, Houston, TX 77005, USA
7
Domingo E, García-Crespo C, Lobo-Vega R, Perales C. Mutation Rates, Mutation Frequencies, and Proofreading-Repair Activities in RNA Virus Genetics. Viruses 2021; 13:1882. [PMID: 34578463 PMCID: PMC8473064 DOI: 10.3390/v13091882] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 07/30/2021] [Revised: 09/06/2021] [Accepted: 09/17/2021] [Indexed: 12/29/2022] Open
Abstract
The error rate displayed during template copying to produce viral RNA progeny is a biologically relevant parameter of the replication complexes of viruses. It has consequences for virus-host interactions, and it represents the first step in the diversification of viruses in nature. Measurements during infections and with purified viral polymerases indicate that mutation rates for RNA viruses are in the range of 10⁻³ to 10⁻⁶ copying errors per nucleotide incorporated into the nascent RNA product. Although viruses are thought to exploit high error rates for adaptation to changing environments, some of them possess misincorporation correcting activities. One of them is a proofreading-repair 3' to 5' exonuclease present in coronaviruses that may decrease the error rate during replication. Here we review experimental evidence and models of information maintenance that explain why elevated mutation rates have been preserved during the evolution of RNA (and some DNA) viruses. The models also offer an interpretation of why error correction mechanisms have evolved to maintain the stability of genetic information carried out by large viral RNA genomes such as the coronaviruses.
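The per-nucleotide error rates quoted in the abstract above can be turned into per-genome figures with simple arithmetic. The following is a rough sketch under two stated assumptions that do not come from the review: copying errors are independent across sites, and the genome is 30,000 nucleotides long (an illustrative value of roughly coronavirus size). The function names are hypothetical.

```python
def expected_errors(rate_per_nt, genome_length):
    """Expected number of copying errors per genome copy (rate times length)."""
    return rate_per_nt * genome_length

def prob_error_free(rate_per_nt, genome_length):
    """Probability that a whole genome is copied with no error,
    assuming independent errors at each site."""
    return (1.0 - rate_per_nt) ** genome_length

# Assumed, illustrative genome length (not a value from the review).
GENOME_LENGTH = 30_000

# The two ends of the per-nucleotide range quoted in the abstract.
for rate in (1e-3, 1e-6):
    mean_errors = expected_errors(rate, GENOME_LENGTH)
    clean = prob_error_free(rate, GENOME_LENGTH)
    print(f"rate={rate:g}: {mean_errors:g} errors/genome, P(error-free)={clean:.3g}")
```

Under these assumptions the high end of the range gives about 30 errors per copy, so essentially no error-free copies, while the low end gives about 0.03 errors per copy, which is one way to see why large RNA genomes benefit from an error-correcting activity.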
Affiliation(s)
- Esteban Domingo
  - Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Carlos García-Crespo
  - Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
- Rebeca Lobo-Vega
  - Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Av. Reyes Católicos 2, 28040 Madrid, Spain
- Celia Perales
  - Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Av. Reyes Católicos 2, 28040 Madrid, Spain