1
Pezoulas VC, Kourou KD, Mylona E, Papaloukas C, Liontos A, Biros D, Milionis OI, Kyriakopoulos C, Kostikas K, Milionis H, Fotiadis DI. ICU admission and mortality classifiers for COVID-19 patients based on subgroups of dynamically associated profiles across multiple timepoints. Comput Biol Med 2022; 141:105176. [PMID: 35007991 DOI: 10.1016/j.compbiomed.2021.105176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 12/22/2021] [Accepted: 12/23/2021] [Indexed: 01/08/2023]
Abstract
The coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), continues to place a heavy burden on the global healthcare system due to its increased transmissibility. Currently, there is an urgent unmet need to identify the underlying dynamic associations among COVID-19 patients and to distinguish patient subgroups with common clinical profiles, towards the development of robust classifiers for ICU admission and mortality. To address this need, we propose a four-step pipeline which: (i) enhances the quality of multiple time-series clinical data through an automated data curation workflow, (ii) deploys Dynamic Bayesian Networks (DBNs) for the detection of features with increased connectivity based on dynamic association analysis across multiple timepoints, (iii) utilizes Self-Organizing Maps (SOMs) and trajectory analysis for the early identification of COVID-19 patients with common clinical profiles, and (iv) trains robust multiple additive regression trees (MART) for ICU admission and mortality classification based on the extracted homogeneous clusters, to identify risk factors and biomarkers for disease progression. The contribution of the extracted clusters and the dynamically associated clinical data improved the classification performance for ICU admission to sensitivity 0.83 and specificity 0.83, and for mortality to sensitivity 0.74 and specificity 0.76. Additional information was included to enhance the performance of the classifiers, yielding a 4% increase in sensitivity and specificity for mortality. According to the risk factor analysis, the number of lymphocytes, SatO2, PO2/FiO2, and O2 supply type were highlighted as risk factors for ICU admission, and the percentage of neutrophils and lymphocytes, PO2/FiO2, LDH, and ALP for mortality, among others.
To our knowledge, this is the first study that combines dynamic modeling with clustering analysis to identify homogeneous groups of COVID-19 patients towards the development of robust classifiers for ICU admission and mortality.
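The abstract reports classifier performance as sensitivity and specificity. As a minimal sketch of how these two metrics are derived from binary predictions, the following uses hypothetical labels (1 = event such as ICU admission, 0 = no event); the function name and the toy data are illustrative assumptions and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that were correctly ruled out.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data, not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both values, as the abstract does, matters when the positive class is rare: a classifier can reach high accuracy while missing most positive cases.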
2
Pezoulas VC, Papaloukas C, Veyssiere M, Goules A, Tzioufas AG, Soumelis V, Fotiadis DI. A computational workflow for the detection of candidate diagnostic biomarkers of Kawasaki disease using time-series gene expression data. Comput Struct Biotechnol J 2021; 19:3058-3068. [PMID: 34136104 PMCID: PMC8178098 DOI: 10.1016/j.csbj.2021.05.036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2021] [Revised: 05/17/2021] [Accepted: 05/20/2021] [Indexed: 12/15/2022] Open
Abstract
Unlike autoimmune diseases, there is no known constitutive and disease-defining biomarker for systemic autoinflammatory diseases (SAIDs). Kawasaki disease (KD) is one of the "undiagnosed" types of SAIDs whose pathogenic mechanism and gene mutation remain unknown. To address this issue, we have developed a sequential computational workflow which clusters KD patients with similar gene expression profiles across the three different KD phases (Acute, Subacute and Convalescent) and utilizes the resulting clustermap to detect prominent genes that can be used as diagnostic biomarkers for KD. Self-Organizing Maps (SOMs) were employed to cluster patients with similar gene expression across the three phases through inter-phase and intra-phase clustering. Then, false discovery rate (FDR)-based feature selection was applied to detect genes that significantly deviate across the per-phase clusters. Our results revealed five genes as candidate biomarkers for KD diagnosis, namely HLA-DQB1, HLA-DRA, ZBTB48, TNFRSF13C, and CASD1. To our knowledge, these five genes are reported for the first time in the literature. The impact of the discovered genes for KD diagnosis against the known ones was demonstrated by training boosting ensembles (AdaBoost and XGBoost) for KD classification on common platform and cross-platform datasets. The classifiers which were trained on the proposed genes from the common platform data yielded an average increase of 4.40% in accuracy, 5.52% in sensitivity, and 3.57% in specificity over the known genes in the Acute and Subacute phases, followed by a notable increase of 2.30% in accuracy, 2.20% in sensitivity, and 4.70% in specificity in the cross-platform analysis.
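The abstract mentions FDR-based feature selection without naming the procedure. As an illustration only, the sketch below assumes the common Benjamini-Hochberg step-up procedure applied to per-gene p-values; the function name, the threshold `alpha`, and the p-values are hypothetical and do not come from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (step-up rule).
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values (e.g. from a test comparing clusters).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
selected = benjamini_hochberg(p_values, alpha=0.05)
print(selected)
```

With these toy values only the first two features pass, because the third smallest p-value (0.039) exceeds its rank-scaled threshold of 0.03 and no larger rank meets its own threshold.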
Affiliation(s)
- Vasileios C. Pezoulas
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina GR45110, Greece
- Costas Papaloukas
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina GR45110, Greece
  - Department of Biological Applications and Technology, University of Ioannina, Ioannina GR45100, Greece
- Maëva Veyssiere
  - INSERM U976, Human Immunology, Physiopathology and Immunotherapy, Paris, France
- Andreas Goules
  - Department of Pathophysiology, School of Medicine, University of Athens, Athens GR15772, Greece
- Athanasios G. Tzioufas
  - Department of Pathophysiology, School of Medicine, University of Athens, Athens GR15772, Greece
- Vassili Soumelis
  - INSERM U976, Human Immunology, Physiopathology and Immunotherapy, Paris, France
  - Hôpital Saint Louis, Saint Louis Research Institute, Paris, France
- Dimitrios I. Fotiadis
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina GR45110, Greece
  - Department of Biomedical Research, FORTH (Foundation for Research & Technology)-IMBB (Institute of Molecular Biology and Biotechnology), Ioannina GR45110, Greece
3
Love KR, Shah KA, Whittaker CA, Wu J, Bartlett MC, Ma D, Leeson RL, Priest M, Borowsky J, Young SK, Love JC. Comparative genomics and transcriptomics of Pichia pastoris. BMC Genomics 2016; 17:550. [PMID: 27495311 PMCID: PMC4974788 DOI: 10.1186/s12864-016-2876-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 03/10/2016] [Accepted: 07/05/2016] [Indexed: 11/24/2022] Open
Abstract
Background: Pichia pastoris has emerged as an important alternative host for producing recombinant biopharmaceuticals, owing to its high cultivation density, low host cell protein burden, and the development of strains with humanized glycosylation. Despite its demonstrated utility, relatively little strain engineering has been performed to improve Pichia, due in part to the limited number and inconsistent frameworks of reported genomes and transcriptomes. Furthermore, the co-mingling of genomic, transcriptomic and fermentation data collected about Komagataella pastoris and Komagataella phaffii, the two strains co-branded as Pichia, has generated confusion about host performance for these genetically distinct species. Generation of comparative high-quality genomes and transcriptomes will enable meaningful comparisons between the organisms, and potentially inform distinct biotechnological utilities for each species.
Results: Here, we present a comprehensive and standardized comparative analysis of the genomic features of the three most commonly used strains comprising the trade name Pichia: K. pastoris wild-type, K. phaffii wild-type, and K. phaffii GS115. We used a combination of long-read (PacBio) and short-read (Illumina) sequencing technologies to achieve over 1000X coverage of each genome. Construction of individual genomes was then performed using as few as seven individual contigs to create gap-free assemblies. We found substantial syntenic rearrangements between the species and characterized a linear plasmid present in K. phaffii. Comparative analyses between K. phaffii genomes enabled the characterization of the mutational landscape of the GS115 strain. We identified and examined 35 non-synonymous coding mutations present in GS115, many of which are likely to impact strain performance. Additionally, we investigated transcriptomic profiles of gene expression for both species during cultivation on various carbon sources. We observed that the most highly transcribed genes in both organisms were consistently highly expressed in all three carbon sources examined. We also observed selective expression of certain genes in each carbon source, including many sequences not previously reported as promoters for expression of heterologous proteins in yeasts.
Conclusions: Our studies establish a foundation for understanding critical relationships between genome structure, cultivation conditions and gene expression. The resources we report here will inform and facilitate rational, organism-wide strain engineering for improved utility as a host for protein production.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2876-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kerry R Love
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 76-253, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
- Kartik A Shah
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 76-253, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
- Charles A Whittaker
  - The Barbara K. Ostrom (1978) Bioinformatics and Computing Facility in the Swanson Biotechnology Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Jie Wu
  - The Barbara K. Ostrom (1978) Bioinformatics and Computing Facility in the Swanson Biotechnology Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- M Catherine Bartlett
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 76-253, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
- Duanduan Ma
  - The Barbara K. Ostrom (1978) Bioinformatics and Computing Facility in the Swanson Biotechnology Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Rachel L Leeson
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 76-253, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
- Margaret Priest
  - The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Jonathan Borowsky
  - The Barbara K. Ostrom (1978) Bioinformatics and Computing Facility in the Swanson Biotechnology Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Sarah K Young
  - The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- J Christopher Love
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 76-253, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA