51
|
A deep learning model (FociRad) for automated detection of γ-H2AX foci and radiation dose estimation. Sci Rep 2022; 12:5527. [PMID: 35365702 PMCID: PMC8975967 DOI: 10.1038/s41598-022-09180-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
DNA double-strand breaks (DSBs) are the most lethal form of damage to cells from irradiation. γ-H2AX (the phosphorylated form of the H2AX histone variant) has become one of the most reliable and sensitive biomarkers of DNA DSBs. However, the γ-H2AX foci assay is still limited by the time required for manual scoring and by possible variability between scorers. This study proposed a novel automated foci scoring method using a deep convolutional neural network based on the You-Only-Look-Once (YOLO) algorithm to quantify γ-H2AX foci in peripheral blood samples. FociRad, a two-stage deep learning approach, consisted of mononuclear cell (MNC) detection followed by γ-H2AX foci detection. Whole blood samples were irradiated with X-rays from a 6 MV linear accelerator at 1, 2, 4 or 6 Gy. Images were captured using confocal microscopy. Dose-response calibration curves were then established and applied to an unseen dataset. The results of the FociRad model were comparable with manual scoring. MNC detection yielded 96.6% accuracy, 96.7% sensitivity and 96.5% specificity. γ-H2AX foci detection showed very good F1 scores (> 0.9). Applying the calibration curve in the range of 0-4 Gy gave a mean absolute difference between estimated and actual doses of less than 1 Gy. In addition, the evaluation times of FociRad were very short (< 0.5 min per 100 images), while the time for manual scoring increased with the number of foci. In conclusion, FociRad was the first automated foci scoring method to use a YOLO algorithm with high detection performance and fast evaluation time, which opens the door for large-scale applications in radiation triage.
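The calibration step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a linear dose-response (mean foci per cell = intercept + slope × dose), which the abstract does not specify; the function names and the calibration points are hypothetical and are not data from the paper.

```python
def fit_linear_calibration(doses, foci_per_cell):
    """Ordinary least-squares fit of foci_per_cell = intercept + slope * dose.

    Assumption: a linear dose-response; the paper may use a different model.
    """
    n = len(doses)
    mean_d = sum(doses) / n
    mean_y = sum(foci_per_cell) / n
    sxx = sum((d - mean_d) ** 2 for d in doses)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(doses, foci_per_cell))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    return intercept, slope


def estimate_dose(foci, intercept, slope):
    """Invert the calibration line; clamp at 0 Gy because a dose cannot be negative."""
    return max(0.0, (foci - intercept) / slope)


# Hypothetical calibration points (dose in Gy, mean foci per cell).
doses = [0.0, 1.0, 2.0, 4.0]
foci = [0.5, 3.5, 6.5, 12.5]
intercept, slope = fit_linear_calibration(doses, foci)

# Dose estimate for an unseen sample with 9.5 foci per cell on average.
estimated = estimate_dose(9.5, intercept, slope)
```

The mean absolute difference quoted in the abstract would then be the average of |estimated dose − actual dose| over the unseen samples.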
|
52
|
Prediction of Pulmonary Function Parameters Based on a Combination Algorithm. Bioengineering (Basel) 2022; 9:bioengineering9040136. [PMID: 35447696 PMCID: PMC9032560 DOI: 10.3390/bioengineering9040136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 03/18/2022] [Accepted: 03/23/2022] [Indexed: 11/19/2022] Open
Abstract
Objective: Pulmonary function parameters play a pivotal role in the assessment of respiratory diseases. However, the accuracy of the existing methods for the prediction of pulmonary function parameters is low. This study proposes a combination algorithm to improve the accuracy of pulmonary function parameter prediction. Methods: We first established a system to collect volumetric capnography and then processed the data with a combination algorithm to predict pulmonary function parameters. The algorithm consists of three main parts: a medical feature regression structure consisting of support vector machine (SVM) and extreme gradient boosting (XGBoost) algorithms, a sequence feature regression structure consisting of a one-dimensional convolutional neural network (1D-CNN), and an error correction structure using an improved K-nearest neighbor (KNN) algorithm. Results: In a ten-fold cross-validation experiment, the root mean square error (RMSE) of the pulmonary function parameters predicted by the combination algorithm was less than 0.39 L and the R² was greater than 0.85. Conclusion: Compared with the existing methods for predicting pulmonary function parameters, the present algorithm achieves higher accuracy. At the same time, the algorithm uses specific processing structures for different features, so its interpretability is preserved while deep feature information is mined.
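For reference, the two metrics reported in this abstract can be computed as in the following sketch. The measured and predicted values are hypothetical volumes in litres, not results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted volumes (litres).
measured = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]
error_litres = rmse(measured, predicted)
fit_quality = r_squared(measured, predicted)
```

In a ten-fold cross-validation these metrics would be computed on each held-out fold and then summarised across folds.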
|
53
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure is still challenging in problems such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as its further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in problems such as disease-gene prediction.
Affiliation(s)
- Ju Xiang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
54
|
|
55
|
Gan Y, Huang X, Zou G, Zhou S, Guan J. Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network. Brief Bioinform 2022; 23:6529282. [PMID: 35172334 DOI: 10.1093/bib/bbac018] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/10/2021] [Revised: 12/27/2021] [Accepted: 01/13/2022] [Indexed: 12/20/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is of central importance for the analysis of scRNA-seq data, as it can be used to identify putative cell types. However, owing to noise, high dimensionality and pervasive dropout events, clustering analysis of scRNA-seq data remains a computational challenge. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates structural information into deep clustering of single cells. The proposed scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervised module. To learn the data representation from the sparse and zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture the structural information among cells. By joining the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual-supervised strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experimental results on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. Our method scDSC is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DHUDBlab/scDSC.
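The ZINB model named in this abstract combines a point mass at zero (dropout) with a negative binomial count distribution. The sketch below evaluates its log-probability using the common mean/inverse-dispersion parameterisation; this parameterisation and the function names are assumptions for illustration, and the code is not taken from the scDSC repository.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi (0 <= pi < 1) is the probability of a structural zero (dropout);
    with probability 1 - pi the count is drawn from the negative binomial.
    """
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


# Hypothetical parameters for one gene in one cell.
log_p_zero = zinb_log_pmf(0, mu=2.0, theta=1.5, pi=0.3)
log_p_three = zinb_log_pmf(3, mu=2.0, theta=1.5, pi=0.3)
```

In a ZINB autoencoder of this kind, a decoder would typically output mu, theta and pi per gene, and the negative sum of these log-probabilities would serve as the reconstruction loss.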
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University 201600, Shanghai, China
- Xingyu Huang
  - School of Computer Science and Technology, Donghua University 201600, Shanghai, China
- Guobing Zou
  - School of Computer Science and Technology, Shanghai University 200444, Shanghai, China
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University 200433, Shanghai, China
- Jihong Guan
  - Computer Science and Technology, Tongji University 200092, Shanghai, China
|
56
|
Ioannides AA, Orphanides GA, Liu L. Rhythmicity in heart rate and its surges usher a special period of sleep, a likely home for PGO waves. Curr Res Physiol 2022; 5:118-141. [PMID: 35243361 PMCID: PMC8867048 DOI: 10.1016/j.crphys.2022.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Revised: 02/01/2022] [Accepted: 02/06/2022] [Indexed: 11/30/2022] Open
Abstract
High amplitude electroencephalogram (EEG) events, such as the unitary K-complex (KC), are used to partition sleep into stages and hence define the hypnogram, a key instrument of sleep medicine. Throughout sleep the heart rate (HR) changes, often as a steady HR increase leading to a peak, known as a heart rate surge (HRS). The hypnogram is often unavailable when most needed, when sleep is disturbed and the graphoelements lose their identity. The hypnogram is also difficult to define during normal sleep, particularly at the start of sleep and the periods that precede and follow rapid eye movement (REM) sleep. Here, we use objective quantitative criteria that group together periods that cannot be assigned to a conventional sleep stage into what we call REM0 periods, with the presence of an HRS as one of their defining properties. Extended REM0 periods are characterized by highly regular sequences of HRS that generate an infra-low oscillation around 0.05 Hz. During these regular sequences of HRS, and just before each HRS event, we find avalanches of high amplitude events for each one of the mass electrophysiological signals, i.e. related to eye movement, the motor system and the general neural activity. The most prominent features of long REM0 periods are sequences of three to five KCs which we label multiple K-complexes (KCm). Regarding HRS, a clear dissociation is demonstrated between the presence or absence of high gamma band spectral power (55-95 Hz) of the two types of KCm events: KCm events with strong high frequencies (KCmWSHF) cluster just before the peak of HRS, while KCm between HRS show no increase in high gamma band (KCmNOHF). Tomographic estimates of activity from magnetoencephalography (MEG) in pre-KC periods (single and multiple) showed common increases in the cholinergic Nucleus Basalis of Meynert in the alpha band. The direct contrast of KCmWSHF with KCmNOHF showed increases in all subjects in the high sigma band in the base of the pons and in three subjects in both the delta and high gamma bands in the medial Pontine Reticular Formation (mPRF), the putative Long Lead Initial pulse (LLIP) for Ponto-Geniculo-Occipital (PGO) waves.
Affiliation(s)
- Andreas A. Ioannides
  - Lab. for Human Brain Dynamics, AAI Scientific Cultural Services Ltd., Nicosia, 1065, Cyprus
- Gregoris A. Orphanides
  - Lab. for Human Brain Dynamics, AAI Scientific Cultural Services Ltd., Nicosia, 1065, Cyprus
  - The English School, Nicosia, 1684, Cyprus
- Lichan Liu
  - Lab. for Human Brain Dynamics, AAI Scientific Cultural Services Ltd., Nicosia, 1065, Cyprus
|
57
|
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
- Layla Hosseini-Gerami
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
- Andreas Bender
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
|
58
|
Martinez-Tejada I, Riedel CS, Juhler M, Andresen M, Wilhjelm JE. k-Shape clustering for extracting macro-patterns in intracranial pressure signals. Fluids Barriers CNS 2022; 19:12. [PMID: 35123535 PMCID: PMC8817510 DOI: 10.1186/s12987-022-00311-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2021] [Accepted: 01/21/2022] [Indexed: 11/27/2022] Open
Abstract
Background Intracranial pressure (ICP) monitoring is a core component of neurosurgical diagnostics. With the introduction of telemetric monitoring devices in recent years, ICP monitoring has become feasible in a broader clinical setting, including monitoring during full mobilization and at home, where a greater diversity of ICP waveforms is present. Identification of these variations, the so-called macro-patterns lasting seconds to minutes, emerges as a potential tool for better understanding the physiological underpinnings of patient symptoms. Methods We introduce a new methodology that serves as a foundation for future automatic macro-pattern identification in the ICP signal to comprehensively understand the appearance and distribution of these macro-patterns in the ICP signal and their clinical significance. Specifically, we describe an algorithm based on k-Shape clustering to build a standard library of such macro-patterns. Results In total, seven macro-patterns were extracted from the ICP signals. This macro-pattern library may be used as a basis for the classification of new ICP variation distributions based on clinical disease entities. Conclusions We provide the starting point for future researchers to use a computational approach to characterize ICP recordings from a wide cohort of disorders.
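k-Shape clustering is generally described as comparing time series with a shape-based distance (SBD): one minus the maximum normalized cross-correlation over all relative shifts. The sketch below is a direct, quadratic-time illustration for equal-length series; practical implementations usually rely on the FFT, and the ICP-like pulses used here are hypothetical.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (all zeros if it is constant)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in series]


def shape_based_distance(x, y):
    """SBD = 1 - max over shifts of the normalized cross-correlation.

    Assumes len(x) == len(y). The result lies in [0, 2]; 0 means identical shape.
    """
    x, y = z_normalize(x), z_normalize(y)
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 1.0  # a constant series carries no shape information
    best = -1.0
    for shift in range(-(n - 1), n):
        # Overlap of x[i] with y[i - shift]; samples outside the window count as zero.
        cc = sum(x[i] * y[i - shift] for i in range(max(0, shift), min(n, n + shift)))
        best = max(best, cc / norm)
    return 1.0 - best


# Two hypothetical pulses with the same shape, offset by two samples.
pulse = [0, 0, 1, 2, 1, 0, 0, 0]
delayed_pulse = [0, 0, 0, 0, 1, 2, 1, 0]
distance = shape_based_distance(pulse, delayed_pulse)
```

Because the distance searches over shifts, the delayed pulse stays close to the original even though a sample-by-sample comparison would treat the two as quite different.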
|
59
|
Millán Arias P, Alipour F, Hill KA, Kari L. DeLUCS: Deep learning for unsupervised clustering of DNA sequences. PLoS One 2022; 17:e0261531. [PMID: 35061715 PMCID: PMC8782307 DOI: 10.1371/journal.pone.0261531] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/03/2021] [Accepted: 12/06/2021] [Indexed: 11/25/2022] Open
Abstract
We present a novel Deep Learning method for the Unsupervised Clustering of DNA Sequences (DeLUCS) that does not require sequence alignment, sequence homology, or (taxonomic) identifiers. DeLUCS uses Frequency Chaos Game Representations (FCGR) of primary DNA sequences, and generates "mimic" sequence FCGRs to self-learn data patterns (genomic signatures) through the optimization of multiple neural networks. A majority voting scheme is then used to determine the final cluster assignment for each sequence. The clusters learned by DeLUCS match true taxonomic groups for large and diverse datasets, with accuracies ranging from 77% to 100%: 2,500 complete vertebrate mitochondrial genomes, at taxonomic levels from sub-phylum to genera; 3,200 randomly selected 400 kbp-long bacterial genome segments, into clusters corresponding to bacterial families; three viral genome and gene datasets, averaging 1,300 sequences each, into clusters corresponding to virus subtypes. DeLUCS significantly outperforms two classic clustering methods (K-means++ and Gaussian Mixture Models) for unlabelled data, by as much as 47%. DeLUCS is highly effective: it is able to cluster datasets of unlabelled primary DNA sequences totalling over 1 billion bp of data, and it bypasses common limitations to classification resulting from the lack of sequence homology, variation in sequence length, and the absence or instability of sequence annotations and taxonomic identifiers. Thus, DeLUCS offers fast and accurate DNA sequence clustering for previously intractable datasets.
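An FCGR of order k can be viewed as a 2^k × 2^k grid of k-mer counts, with each k-mer placed according to the chaos game. The sketch below is a minimal illustration; the corner assignment for A, C, G and T varies between papers and is an assumption here, as are the function name and the example sequence.

```python
# Assumed corner convention as (column bit, row bit); other conventions exist.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Count k-mers of `sequence` into a 2**k x 2**k grid (list of rows).

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue
        col = row = 0
        # In the chaos game the most recent base selects the coarsest quadrant,
        # so the k-mer is read from its last base back to its first.
        for base in reversed(kmer):
            bit_x, bit_y = CORNERS[base]
            col = col * 2 + bit_x
            row = row * 2 + bit_y
        grid[row][col] += 1
    return grid


# Hypothetical short sequence; real inputs would be whole genomes or long segments.
signature = fcgr("ACGTACGTNACGT", k=2)
```

The resulting grid can be normalised and treated as an image-like input, which is how such representations are typically fed to neural networks.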
Affiliation(s)
- Pablo Millán Arias
  - School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Fatemeh Alipour
  - School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Kathleen A. Hill
  - Department of Biology, University of Western Ontario, London, ON, Canada
- Lila Kari
  - School of Computer Science, University of Waterloo, Waterloo, ON, Canada
|
60
|
Basciu A, Callea L, Motta S, Bonvin AM, Bonati L, Vargiu AV. No dance, no partner! A tale of receptor flexibility in docking and virtual screening. Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
61
|
Wu H, Zeng R, Qiu X, Chen K, Zhuo Z, Guo K, Xiang Y, Yang Q, Jiang R, Leung FW, Lian Q, Sha W, Chen H. Investigating regulatory patterns of NLRP3 Inflammasome features and association with immune microenvironment in Crohn's disease. Front Immunol 2022; 13:1096587. [PMID: 36685554 PMCID: PMC9849378 DOI: 10.3389/fimmu.2022.1096587] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2022] [Accepted: 12/02/2022] [Indexed: 01/06/2023] Open
Abstract
INTRODUCTION Crohn's disease is characterized by dysregulated inflammatory and immune reactions. The role of the NOD-like receptor family, pyrin domain-containing 3 (NLRP3) inflammasome in Crohn's disease remains largely unknown. METHODS The microarray-based transcriptomic data and corresponding clinical information of GSE100833 and GSE16879 were obtained from the Gene Expression Omnibus (GEO) database. NLRP3 inflammasome-related genes were identified and a LASSO regression model was constructed. The immune landscape was evaluated with ssGSEA. Crohn's disease samples were classified based on NLRP3 inflammasome-related genes with ConsensusClusterPlus. Functional enrichment analysis, gene set variation analysis (GSVA) and drug-gene interaction network analysis were also performed. RESULTS The expression of NLRP3 inflammasome-related genes was increased in diseased tissues, and higher expression of these genes was correlated with generally enhanced immune cell infiltration, immune-related pathways and human leukocyte antigen (HLA) gene expression. The gene-based signature performed well in the diagnosis of Crohn's disease. Moreover, consensus clustering identified two Crohn's disease clusters based on NLRP3 inflammasome-related genes, with cluster 2 showing higher expression of the genes. Cluster 2 demonstrated upregulated activities of the immune environment in Crohn's disease. Furthermore, four key hub genes were identified and potential drugs were explored for the treatment of Crohn's disease. CONCLUSIONS Our findings indicate that the NLRP3 inflammasome and its related genes could regulate immune cells and responses, and are involved in the pathogenesis of Crohn's disease from a transcriptomic perspective. These findings provide in silico insights into the diagnosis and treatment of Crohn's disease and might assist in the clinical decision-making process.
Affiliation(s)
- Huihuan Wu
  - Department of Gastroenterology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - School of Medicine, South China University of Technology, Guangzhou, China
- Ruijie Zeng
  - Department of Gastroenterology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - School of Medicine, Shantou University Medical College, Shantou, China
- Xinqi Qiu
  - Zhuguang Community Healthcare Center, Guangzhou, China
- Kequan Chen
  - Department of Gastroenterology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zewei Zhuo
  - Department of Gastroenterology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Kehang Guo
  - Department of Critical Care Medicine, The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yawen Xiang
  - Edinburgh Medical School, College of Medicine and Veterinary Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Qi Yang
  - Department of Gastroenterology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Rui Jiang
  - School of Medicine, South China University of Technology, Guangzhou, China
- Felix W. Leung
  - David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
  - *Correspondence: Felix W. Leung; Qizhou Lian; Weihong Sha; Hao Chen
- Qizhou Lian
  - Department of Medicine, Queen Mary Hospital, Hong Kong, Hong Kong SAR, China
- Weihong Sha
  - Department of Gastroenterology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - School of Medicine, South China University of Technology, Guangzhou, China
- Hao Chen
  - Department of Gastroenterology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - School of Medicine, South China University of Technology, Guangzhou, China
|
62
|
Dai W, Yue W, Peng W, Fu X, Liu L, Liu L. Identifying Cancer Subtypes Using a Residual Graph Convolution Model on a Sample Similarity Network. Genes (Basel) 2021; 13:genes13010065. [PMID: 35052405 PMCID: PMC8774659 DOI: 10.3390/genes13010065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 12/23/2021] [Accepted: 12/24/2021] [Indexed: 11/16/2022] Open
Abstract
Cancer subtype classification helps us to understand the pathogenesis of cancer and to develop new cancer drugs and treatments from which patients would benefit most. Most previous studies detect cancer subtypes by extracting features from individual samples, ignoring their associations with others. We believe that the interactions of cancer samples can help identify cancer subtypes. This work proposes a cancer subtype classification method based on a residual graph convolutional network and a sample similarity network. First, we constructed a sample similarity network based on cancer gene co-expression patterns. Then, the gene expression profiles of cancer samples, used as initial features, and the sample similarity network were passed into a two-layer graph convolutional network (GCN) model. We introduced the initial features to the GCN model to avoid over-smoothing during the training process. Finally, the classification of cancer subtypes was obtained through a softmax activation function. Our model was applied to breast invasive carcinoma (BRCA), glioblastoma multiforme (GBM) and lung cancer (LUNG) datasets. The accuracy values of our model reached 82.58%, 85.13% and 79.18% for BRCA, GBM and LUNG, respectively, which outperformed the existing methods. The survival analysis of our results proves the significant clinical features of the cancer subtypes identified by our model. Moreover, we can leverage our model to detect the essential genes enriched in gene ontology (GO) terms and the biological pathways related to a cancer subtype.
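The idea of feeding the initial features back into a graph convolution to limit over-smoothing can be sketched as follows. This is one common formulation (a weighted mix of the neighbourhood-averaged features and the initial features, with no trainable weights); the abstract does not give the paper's exact layer, so the mixing parameter `alpha` and all function names are assumptions.

```python
import math


def normalized_adjacency(adj):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2.

    Assumes `adj` is a square matrix without self-loops.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def propagate_with_initial_residual(adj_norm, hidden, initial, alpha):
    """One propagation step: (1 - alpha) * (adj_norm @ hidden) + alpha * initial."""
    smoothed = matmul(adj_norm, hidden)
    return [
        [(1.0 - alpha) * s + alpha * x for s, x in zip(s_row, x_row)]
        for s_row, x_row in zip(smoothed, initial)
    ]


# Hypothetical two-sample similarity network with one feature per sample.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
adj_norm = normalized_adjacency(adjacency)
updated = propagate_with_initial_residual(adj_norm, features, features, alpha=0.5)
```

Without the residual term both samples would collapse to the same value (2.0) after one step; with it they keep part of their original difference, which is the over-smoothing effect the abstract refers to.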
Affiliation(s)
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wenhao Yue
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
  - Correspondence: Tel.: +86-13700600056
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
|
63
|
Xu X, Chen Y, Zhang X, Zhang R, Chen X, Liu S, Sun Q. Modular characteristics and the mechanism of Chinese medicine's treatment of gastric cancer: a data mining and pharmacology-based identification. Annals of Translational Medicine 2021; 9:1777. [PMID: 35071471 PMCID: PMC8756228 DOI: 10.21037/atm-21-6301] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/27/2021] [Accepted: 12/17/2021] [Indexed: 11/27/2022]
Abstract
Background Traditional Chinese medicine (TCM) is increasingly being applied as a complementary and alternative therapy for gastric cancer (GC); however, there is a lack of large-scale evidence-based deep learning for the guidance of its clinical prescription. Methods The combinational search terms of “Gastric cancer and/or gastric malignancy” and “Traditional Chinese Medicine” were used to retrieve clinical study-based herbal prescriptions from public databases over the past 3 decades (1990–2020). Association rules mining (ARM) was used to analyze the prescription patterns of the herbs extracted from the eligible studies. Deep machine learning and computational prediction were conducted to explore candidate prescriptions with general applicability for GC. The action mechanism of the preferred prescription was investigated through network pharmacology, and further validated via in vivo and in vitro experiments. Results A total of 194 clinical study-based herbal prescriptions with good efficacy for GC were collected. TCM with a focus on invigorating the Spleen and tonifying the vital-Qi is a promising adjuvant therapy for GC. The preferred prescription is composed of Atractylodis Macrocephalae Rhizoma, Astragali Radix, Pinelliae Rhizoma, Citri Reticulatae Pericarpium, Herba Hedyotidis Diffusae, Crataegi Fructus, and so on. We screened 74 bioactive compounds and 2,128 predictive targets of the preferred prescription from public databases. Eventually, 135 GC-related genes were identified as the targets of the preferred prescription. The compound-target network revealed that the crucial substances in the preferred prescription are quercetin, kaempferol, baicalein, and nobiletin. Experimentally, the preferred prescription was validated to modulate GC cell survival and inhibit tumor progression mainly via the hTERT/MDM2-p53 signaling pathway in vivo and in vitro. Conclusions TCM aimed at invigorating the Spleen and tonifying the vital-Qi is a promising adjuvant therapy for GC, which offers guidance for the worldwide use of TCM in the treatment of GC.
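Association rule mining, as used in this study, ranks herb combinations by support, confidence and lift. The sketch below computes these three measures for a single rule; the four prescriptions are hypothetical (they reuse herb names from the abstract but are not data from the study), and the function name is illustrative.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = share of transactions containing both item sets
    confidence = share of antecedent-containing transactions that also contain the consequent
    lift       = confidence divided by the overall share of the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical prescriptions, each a set of herbs.
prescriptions = [
    {"Astragali Radix", "Atractylodis Macrocephalae Rhizoma", "Pinelliae Rhizoma"},
    {"Astragali Radix", "Atractylodis Macrocephalae Rhizoma"},
    {"Astragali Radix", "Crataegi Fructus"},
    {"Pinelliae Rhizoma", "Crataegi Fructus"},
]
support, confidence, lift = rule_metrics(
    prescriptions, ["Astragali Radix"], ["Atractylodis Macrocephalae Rhizoma"]
)
```

A lift above 1 indicates that the two herbs are prescribed together more often than would be expected if they were chosen independently.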
Collapse
Affiliation(s)
- Xintian Xu
  - Oncology Department, Jiangsu Province Hospital of Chinese Medicine, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, China
  - No. 1 Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China
- Yaling Chen
  - College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, China
- Xingxing Zhang
  - Gastroenterology Department, Jiangsu Province Hospital of Chinese Medicine, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, China
- Ruijuan Zhang
  - No. 1 Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China
- Xu Chen
  - No. 1 Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China
- Shenlin Liu
  - Oncology Department, Jiangsu Province Hospital of Chinese Medicine, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, China
- Qingmin Sun
  - Science and Technology Department, Jiangsu Province Hospital of Chinese Medicine, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, China
64
Ali S, Li J, Pei Y, Khurram R, Rehman KU, Rasool AB. State-of-the-Art Challenges and Perspectives in Multi-Organ Cancer Diagnosis via Deep Learning-Based Methods. Cancers (Basel) 2021; 13:5546. [PMID: 34771708 PMCID: PMC8583666 DOI: 10.3390/cancers13215546] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/26/2021] [Revised: 10/28/2021] [Accepted: 10/29/2021] [Indexed: 11/16/2022]
Abstract
Cancer is one of the leading causes of death worldwide. It involves abnormally growing tissue that threatens human survival. Hence, the timely detection of cancer is important for improving patient survival rates. In this survey, we analyze the state-of-the-art approaches for multi-organ cancer detection, segmentation, and classification. This article reviews recent work in the breast, brain, lung, and skin cancer domains. We then analytically compare the existing approaches to provide insight into ongoing trends and future challenges. This review also provides an objective description of widely employed imaging techniques, imaging modalities, gold-standard databases, and related literature on each cancer from 2016 to 2021. The main goal is to systematically examine cancer diagnosis systems for these organs of the human body. Our critical survey analysis reveals that more than 70% of deep learning researchers attain promising results with CNN-based approaches for the early diagnosis of multi-organ cancer. The survey includes an extensive discussion of current research challenges, possible solutions, and prospects. This research will provide novice researchers with valuable information to deepen their knowledge and leave room to develop new, robust computer-aided diagnosis systems, which assist health professionals in bridging the gap between rapid diagnosis and treatment planning for cancer patients.
Affiliation(s)
- Saqib Ali
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jianqiang Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yan Pei
  - Computer Science Division, University of Aizu, Aizuwakamatsu 965-8580, Japan
- Rooha Khurram
  - Beijing Key Laboratory for Green Catalysis and Separation, Department of Chemistry and Chemical Engineering, Beijing University of Technology, Beijing 100124, China
- Khalil ur Rehman
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Abdul Basit Rasool
  - Research Institute for Microwave and Millimeter-Wave (RIMMS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
65
Chelebian E, Avenel C, Kartasalo K, Marklund M, Tanoglidi A, Mirtti T, Colling R, Erickson A, Lamb AD, Lundeberg J, Wählby C. Morphological Features Extracted by AI Associated with Spatial Transcriptomics in Prostate Cancer. Cancers (Basel) 2021; 13:4837. [PMID: 34638322 PMCID: PMC8507756 DOI: 10.3390/cancers13194837] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/14/2021] [Revised: 09/21/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022]
Abstract
Prostate cancer is a common cancer type in men, yet some of its traits are still under-explored. One reason for this is high molecular and morphological heterogeneity. The purpose of this study was to develop a method to gain new insights into the connection between morphological changes and underlying molecular patterns. We used artificial intelligence (AI) to analyze the morphology of seven hematoxylin and eosin (H&E)-stained prostatectomy slides from a patient with multi-focal prostate cancer. We also paired the slides with spatially resolved expression for thousands of genes obtained by a novel spatial transcriptomics (ST) technique. As both spaces are high-dimensional, we focused on dimensionality reduction before seeking associations between them. Consequently, we extracted morphological features from H&E images using an ensemble of pre-trained convolutional neural networks and proposed a workflow for dimensionality reduction. To summarize the ST data into genetic profiles, we used a previously proposed factor analysis. We found that the regions automatically defined by unsupervised clustering were associated with independent manual annotations, in some cases revealing further relevant subdivisions. The morphological patterns were also correlated with molecular profiles and could predict the spatial variation of individual genes. This novel approach enables flexible, unsupervised AI-based studies relating morphological and genetic heterogeneity.
Affiliation(s)
- Eduard Chelebian
  - Science for Life Laboratory, Department of Information Technology, Uppsala University, 752 37 Uppsala, Sweden
- Christophe Avenel
  - Science for Life Laboratory, Department of Information Technology, Uppsala University, 752 37 Uppsala, Sweden
- Kimmo Kartasalo
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77 Stockholm, Sweden
- Maja Marklund
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65 Solna, Sweden
- Anna Tanoglidi
  - Department of Clinical Pathology, Uppsala University Hospital, 752 37 Uppsala, Sweden
- Tuomas Mirtti
  - Department of Pathology, Research Program in Systems Oncology, University of Helsinki, Helsinki University Hospital, 00100 Helsinki, Finland
- Richard Colling
  - Nuffield Department of Surgical Sciences, University of Oxford, Oxford OX3 7DQ, UK
  - Department of Cellular Pathology, Oxford University Hospitals NHS Foundation Trust, Oxford OX3 9DU, UK
- Andrew Erickson
  - Nuffield Department of Surgical Sciences, University of Oxford, Oxford OX3 7DQ, UK
- Alastair D. Lamb
  - Nuffield Department of Surgical Sciences, University of Oxford, Oxford OX3 7DQ, UK
  - Department of Urology, Oxford University Hospitals NHS Foundation Trust, Oxford OX3 7LE, UK
- Joakim Lundeberg
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, 171 65 Solna, Sweden
- Carolina Wählby
  - Science for Life Laboratory, Department of Information Technology, Uppsala University, 752 37 Uppsala, Sweden
66
Distance-based clustering challenges for unbiased benchmarking studies. Sci Rep 2021; 11:18988. [PMID: 34556686 PMCID: PMC8460803 DOI: 10.1038/s41598-021-98126-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/05/2021] [Accepted: 09/02/2021] [Indexed: 02/08/2023]
Abstract
Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures. Clustering yields arbitrary labels and often depends on the trial, leading to varying results. Moreover, recent research indicated that all partition comparison measures can yield the same results for different clustering solutions. Consequently, algorithm selection and parameter optimization by unsupervised quality measures (QM) are always biased and misleading. Only if the predefined structures happen to meet the particular clustering criterion and QM, can the clusters be recovered. Results are presented based on 41 open-source algorithms which are particularly useful in biomedical scenarios. Furthermore, comparative analysis with mirrored density plots provides a significantly more detailed benchmark than that with the typically used box plots or violin plots.
67
Ziletti A, Berns C, Treichel O, Weber T, Liang J, Kammerath S, Schwaerzler M, Virayah J, Ruau D, Ma X, Mattern A. Discovering Key Topics From Short, Real-World Medical Inquiries via Natural Language Processing. FRONTIERS IN COMPUTER SCIENCE 2021. [DOI: 10.3389/fcomp.2021.672867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Millions of unsolicited medical inquiries are received by pharmaceutical companies every year. It has been hypothesized that these inquiries represent a treasure trove of information, potentially giving insight into matters regarding medicinal products and the associated medical treatments. However, due to the large volume and specialized nature of the inquiries, it is difficult to perform timely, recurrent, and comprehensive analyses. Here, we combine biomedical word embeddings, non-linear dimensionality reduction, and hierarchical clustering to automatically discover key topics in real-world medical inquiries from customers. This approach requires neither ontologies nor annotations. The discovered topics are meaningful and medically relevant, as judged by medical information specialists, thus demonstrating that unsolicited medical inquiries are a source of valuable customer insights. Our work paves the way for the machine-learning-driven analysis of medical inquiries in the pharmaceutical industry, which ultimately aims at improving patient care.
68
Analysis of Nanotoxicity with Integrated Omics and Mechanobiology. NANOMATERIALS 2021; 11:nano11092385. [PMID: 34578701 PMCID: PMC8470953 DOI: 10.3390/nano11092385] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/27/2021] [Revised: 09/06/2021] [Accepted: 09/09/2021] [Indexed: 12/13/2022]
Abstract
Nanoparticles (NPs) in biomedical applications have benefits owing to their small size. However, their intricate and sensitive nature makes an evaluation of the adverse effects of NPs on health necessary and challenging. Since there are limitations to conventional toxicological methods and omics analyses provide a more comprehensive molecular profiling of multifactorial biological systems, omics approaches are necessary to evaluate nanotoxicity. Compared to a single omics layer, integrated omics across multiple omics layers provides more sensitive and comprehensive details on NP-induced toxicity based on network integration analysis. As multi-omics data are heterogeneous and massive, computational methods such as machine learning (ML) have been applied for investigating correlation among each omics. This integration of omics and ML approaches will be helpful for analyzing nanotoxicity. To that end, mechanobiology has been applied for evaluating the biophysical changes in NPs by measuring the traction force and rigidity sensing in NP-treated cells using a sub-elastomeric pillar. Therefore, integrated omics approaches are suitable for elucidating mechanobiological effects exerted by NPs. These technologies will be valuable for expanding the safety evaluations of NPs. Here, we review the integration of omics, ML, and mechanobiology for evaluating nanotoxicity.
69
Montemurro A, Schuster V, Povlsen HR, Bentzen AK, Jurtz V, Chronister WD, Crinklaw A, Hadrup SR, Winther O, Peters B, Jessen LE, Nielsen M. NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data. Commun Biol 2021; 4:1060. [PMID: 34508155 PMCID: PMC8433451 DOI: 10.1038/s42003-021-02610-3] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 04/08/2021] [Accepted: 08/27/2021] [Indexed: 12/17/2022]
Abstract
Prediction of T-cell receptor (TCR) interactions with MHC-peptide complexes remains highly challenging. This challenge is primarily due to three dominant factors: data accuracy, data scarceness, and problem complexity. Here, we showcase that "shallow" convolutional neural network (CNN) architectures are adequate to deal with the problem complexity imposed by the length variations of TCRs. We demonstrate that current public bulk CDR3β-pMHC binding data overall is of low quality and that the development of accurate prediction models is contingent on paired α/β TCR sequence data corresponding to at least 150 distinct pairs for each investigated pMHC. In comparison, models trained on CDR3α or CDR3β data alone demonstrated a variable and pMHC specific relative performance drop. Together these findings support that T-cell specificity is predictable given the availability of accurate and sufficient paired TCR sequence data. NetTCR-2.0 is publicly available at https://services.healthtech.dtu.dk/service.php?NetTCR-2.0 .
Affiliation(s)
- Alessandro Montemurro
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- Viktoria Schuster
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- Helle Rus Povlsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- Amalie Kai Bentzen
  - Department of Health Technology, Section for Experimental and Translational Immunology, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- Vanessa Jurtz
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- William D Chronister
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
- Austin Crinklaw
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
- Sine R Hadrup
  - Department of Health Technology, Section for Experimental and Translational Immunology, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- Ole Winther
  - Department of Biology, Bioinformatics Centre, University of Copenhagen, 2200, Copenhagen, Denmark
  - Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs., Lyngby, 2800, Denmark
  - Centre for Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, København, Ø 2100, Denmark
- Bjoern Peters
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
  - Department of Medicine, Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA, 92037, USA
- Leon Eyrich Jessen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
- Morten Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs, Lyngby, Denmark
  - Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina
70
A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data. Processes (Basel) 2021. [DOI: 10.3390/pr9081466] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022]
Abstract
Data-driven models with predictive ability are important in medicine and healthcare. However, the most challenging task in predictive modeling is constructing the prediction model itself, which can be addressed using machine learning (ML) methods. These methods learn and train a model from a gene expression dataset without being explicitly programmed. Due to the vast amount of gene expression data, this task becomes complex and time-consuming. This paper reviews recent progress in ML and deep learning (DL) for cancer classification, which has received increasing attention in bioinformatics and computational biology. The review focuses mainly on the development of cancer classification methods based on ML and DL. Although many methods have been applied to the cancer classification problem, recent progress shows that most of the successful techniques are based on supervised and DL methods. In addition, the sources of the healthcare datasets are described. The development of many ML methods for insight analysis in cancer classification has brought considerable improvement to healthcare. Currently, there appears to be strong demand for further development of efficient classification methods to support the expansion of healthcare applications.
71
Zheng H, Talukder A, Li X, Hu H. A systematic evaluation of the computational tools for lncRNA identification. Brief Bioinform 2021; 22:6343529. [PMID: 34368833 DOI: 10.1093/bib/bbab285] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/12/2021] [Revised: 06/21/2021] [Accepted: 07/03/2021] [Indexed: 12/28/2022]
Abstract
The computational identification of long non-coding RNAs (lncRNAs) is important to study lncRNAs and their functions. Despite the existence of many computation tools for lncRNA identification, to our knowledge, there is no systematic evaluation of these tools on common datasets and no consensus regarding their performance and the importance of the features used. To fill this gap, in this study, we assessed the performance of 17 tools on several common datasets. We also investigated the importance of the features used by the tools. We found that the deep learning-based tools have the best performance in terms of identifying lncRNAs, and the peptide features do not contribute much to the tool accuracy. Moreover, when the transcripts in a cell type were considered, the performance of all tools significantly dropped, and the deep learning-based tools were no longer as good as other tools. Our study will serve as an excellent starting point for selecting tools and features for lncRNA identification.
Affiliation(s)
- Hansi Zheng
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Amlan Talukder
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
72
Arnholdt-Schmitt B, Mohanapriya G, Bharadwaj R, Noceda C, Macedo ES, Sathishkumar R, Gupta KJ, Sircar D, Kumar SR, Srivastava S, Adholeya A, Thiers KL, Aziz S, Velada I, Oliveira M, Quaresma P, Achra A, Gupta N, Kumar A, Costa JH. From Plant Survival Under Severe Stress to Anti-Viral Human Defense - A Perspective That Calls for Common Efforts. Front Immunol 2021; 12:673723. [PMID: 34211468 PMCID: PMC8240590 DOI: 10.3389/fimmu.2021.673723] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/28/2021] [Accepted: 05/13/2021] [Indexed: 12/11/2022]
Abstract
Reprogramming of primary virus-infected cells is the critical step that makes viral attacks harmful to humans by initiating super-spreading at cell, organism and population levels. To develop early anti-viral therapies and proactive administration, it is important to understand the very first steps of this process. Plant somatic embryogenesis (SE) is the earliest and most studied model for de novo programming upon severe stress that, in contrast to virus attacks, promotes individual cell and organism survival. We argued that comparing transcript-level profiles of target genes established from in vitro SE induction, used as a reference, with virus-induced profiles can identify differential virus traits that link to harmful reprogramming. To validate this hypothesis, we selected a standard set of genes named 'ReprogVirus'. This approach was recently applied and published. It resulted in identifying 'CoV-MAC-TED', a complex trait that is promising to support combating SARS-CoV-2-induced cell reprogramming in primary infected nose and mouth cells. In this perspective, we aim to explain the rationale of our scientific approach. We highlight relevant background knowledge on SE, emphasize the role of alternative oxidase in plant reprogramming and resilience as a learning tool for designing human virus-defense strategies, and present the list of selected genes. As an outlook, we announce wider data collection in a 'ReprogVirus Platform' to support anti-viral strategy design through common efforts.
Affiliation(s)
- Birgit Arnholdt-Schmitt
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Functional Genomics and Bioinformatics Group, Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
- Gunasekaran Mohanapriya
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Revuru Bharadwaj
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Carlos Noceda
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Cell and Molecular Biotechnology of Plants (BIOCEMP)/Industrial Biotechnology and Bioproducts, Departamento de Ciencias de la Vida y de la Agricultura, Universidad de las Fuerzas Armadas-ESPE, Sangolquí, Ecuador
- Elisete Santos Macedo
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
- Ramalingam Sathishkumar
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Kapuganti Jagadis Gupta
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, India
- Debabrata Sircar
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Department of Biotechnology, Indian Institute of Technology, Roorkee, Uttarakhand, India
- Sarma Rajeev Kumar
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Shivani Srivastava
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Centre for Mycorrhizal Research, Sustainable Agriculture Division, The Energy and Resources Institute (TERI), TERI Gram, Gual Pahari, Gurugram, India
- Alok Adholeya
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Centre for Mycorrhizal Research, Sustainable Agriculture Division, The Energy and Resources Institute (TERI), TERI Gram, Gual Pahari, Gurugram, India
- Karine Leitão Lima Thiers
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Functional Genomics and Bioinformatics Group, Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
- Shahid Aziz
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Functional Genomics and Bioinformatics Group, Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
- Isabel Velada
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - MED—Mediterranean Institute for Agriculture, Environment and Development, Instituto de Investigação e Formação Avançada, Universidade de Évora, Évora, Portugal
- Manuela Oliveira
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Department of Mathematics and CIMA - Center for Research on Mathematics and its Applications, Universidade de Évora, Évora, Portugal
- Paulo Quaresma
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - NOVA LINCS – Laboratory for Informatics and Computer Science, University of Évora, Évora, Portugal
- Arvind Achra
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Department of Microbiology, Atal Bihari Vajpayee Institute of Medical Sciences & Dr Ram Manohar Lohia Hospital, New Delhi, India
- Nidhi Gupta
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
- Ashwani Kumar
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Hargovind Khorana Chair, Jayoti Vidyapeeth Womens University, Jaipur, India
- José Hélio Costa
  - Non-Institutional Competence Focus (NICFocus) ‘Functional Cell Reprogramming and Organism Plasticity’ (FunCROP), Coordinated from Foros de Vale de Figueira, Alentejo, Portugal
  - Functional Genomics and Bioinformatics Group, Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
73
Feldbauer R, Gosch L, Lüftinger L, Hyden P, Flexer A, Rattei T. DeepNOG: fast and accurate protein orthologous group assignment. Bioinformatics 2021; 36:5304-5312. [PMID: 33367584 PMCID: PMC8016488 DOI: 10.1093/bioinformatics/btaa1051] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/24/2020] [Revised: 12/02/2020] [Accepted: 12/10/2020] [Indexed: 11/30/2022]
Abstract
MOTIVATION Protein orthologous group databases are powerful tools for evolutionary analysis, functional annotation or metabolic pathway modeling across lineages. Sequences are typically assigned to orthologous groups with alignment-based methods, such as profile hidden Markov models, which have become a computational bottleneck. RESULTS We present DeepNOG, an extremely fast and accurate, alignment-free orthology assignment method based on deep convolutional networks. We compare DeepNOG against state-of-the-art alignment-based (HMMER, DIAMOND) and alignment-free methods (DeepFam) on two orthology databases (COG, eggNOG 5). DeepNOG can be scaled to large orthology databases like eggNOG, for which it outperforms DeepFam in terms of precision and recall by large margins. While alignment-based methods still provide the most accurate assignments among the investigated methods, computing time of DeepNOG is an order of magnitude lower on CPUs. Optional GPU usage further increases throughput massively. A command-line tool enables rapid adoption by users. AVAILABILITY AND IMPLEMENTATION Source code and packages are freely available at https://github.com/univieCUBE/deepnog. Install the platform-independent Python program with: pip install deepnog. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Roman Feldbauer
  - Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria
- Lukas Gosch
  - Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria
- Lukas Lüftinger
  - Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria
  - Ares Genetics GmbH, Vienna 1030, Austria
- Patrick Hyden
  - Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria
- Arthur Flexer
  - Institute of Computational Perception, Johannes Kepler University Linz, Linz 4040, Austria
- Thomas Rattei
  - Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria
74
Zhang N, Luo X, Huang J, Song H, Zhang X, Huang H, Zhao S, Wang G. The landscape of different molecular modules in an immune microenvironment during tuberculosis infection. Brief Bioinform 2021; 22:6204792. [PMID: 33787849 DOI: 10.1093/bib/bbab071] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/21/2020] [Revised: 02/02/2021] [Accepted: 02/10/2021] [Indexed: 12/13/2022]
Abstract
Tuberculosis is a chronic inflammatory disease caused by Mycobacterium tuberculosis. When M. tuberculosis invades the human body, innate immunity is the first line of defense. However, how the innate immune microenvironment responds remains unclear. In this research, we studied the function of each cell type and explained the principle of the immune microenvironment. Based on the differences in the innate immune microenvironment, we modularized the analysis of the response of five immune cells and two structural cells. The results showed that in the innate immune stress response, the genes CXCL3, PTGS2 and TNFAIP6, regulated by the nuclear factor kappa B (NF-κB) pathway, played a crucial role in fighting against tuberculosis. Based on the active pathway algorithm, each immune cell showed metabolic heterogeneity. In addition, after tuberculosis infection, structural cells showed a chemotactic immunity effect based on the co-expression immunoregulatory module.
Affiliation(s)
- Nan Zhang
- Department of Pathogen Biology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130021, China; College of Mathematics, Jilin University, Changchun 130021, China
| | - Xizi Luo
- Department of Pathogen Biology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130021, China
| | - JuanJuan Huang
- Department of Pathogen Biology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130021, China
| | - Hongyan Song
- College of Mathematics, Jilin University, Changchun 130021, China
| | - Xinyue Zhang
- Department of Pathogen Biology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130021, China
| | - Honglan Huang
- Department of Pathogen Biology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130021, China
| | - Shishun Zhao
- College of Mathematics, Jilin University, Changchun 130021, China
| | - Guoqing Wang
- Department of Pathogen Biology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130021, China
|
75
|
Abbas S, Jalil Z, Javed AR, Batool I, Khan MZ, Noorwali A, Gadekallu TR, Akbar A. BCD-WERT: a novel approach for breast cancer detection using whale optimization based efficient features and extremely randomized tree algorithm. PeerJ Comput Sci 2021; 7:e390. [PMID: 33817036 PMCID: PMC7959601 DOI: 10.7717/peerj-cs.390] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/16/2020] [Accepted: 01/20/2021] [Indexed: 06/12/2023]
Abstract
Breast cancer is one of the leading causes of death in the current age. It often results in subpar living conditions for patients, who have to go through expensive and painful treatments to fight this cancer. One in eight women all over the world is affected by this disease. Almost half a million women die from this disease each year. Machine learning algorithms have proven to outperform all existing solutions for the prediction of breast cancer using models built on the previously available data. In this paper, a novel approach named BCD-WERT is proposed that utilizes the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) for efficient feature selection and classification. WOA reduces the dimensionality of the dataset and extracts the relevant features for accurate classification. Experimental results on a state-of-the-art comprehensive dataset demonstrated improved performance in comparison with eight other machine learning algorithms: Support Vector Machine (SVM), Random Forest, Kernel Support Vector Machine, Decision Tree, Logistic Regression, Stochastic Gradient Descent, Gaussian Naive Bayes and k-Nearest Neighbor. BCD-WERT outperformed all of them with the highest accuracy rate of 99.30%, followed by SVM with 98.60% accuracy. Experimental results also reveal the effectiveness of feature selection techniques in improving prediction accuracy.
Affiliation(s)
- Shafaq Abbas
- Department of Computer Science, Air University, Islamabad, Pakistan
| | - Zunera Jalil
- Department of Cyber Security, Air University, Islamabad, Pakistan
| | | | - Iqra Batool
- Department of Computer Science, Air University, Islamabad, Pakistan
| | - Mohammad Zubair Khan
- Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah, Saudi Arabia
| | | | - Thippa Reddy Gadekallu
- School of Information Technology and Engineering, Vellore Institute of Technology University, Tamil Nadu, India
| | - Aqsa Akbar
- Department of Computer Science, Air University, Islamabad, Pakistan
|
76
|
Eitel F, Schulz MA, Seiler M, Walter H, Ritter K. Promises and pitfalls of deep neural networks in neuroimaging-based psychiatric research. Exp Neurol 2021; 339:113608. [PMID: 33513353 DOI: 10.1016/j.expneurol.2021.113608] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/18/2020] [Revised: 01/07/2021] [Accepted: 01/09/2021] [Indexed: 12/13/2022]
Abstract
By promising more accurate diagnostics and individual treatment recommendations, deep neural networks, and in particular convolutional neural networks, have become a powerful tool in medical imaging. Here, we first give an introduction to key methodological concepts and the resulting methodological promises, including representation and transfer learning, as well as modelling domain-specific priors. After reviewing recent applications within neuroimaging-based psychiatric research, such as the diagnosis of psychiatric diseases, delineation of disease subtypes, normative modeling, and the development of neuroimaging biomarkers, we discuss current challenges. These include, for example, the difficulty of training models on small, heterogeneous and biased data sets, the lack of validity of clinical labels, algorithmic bias, and the influence of confounding variables.
Affiliation(s)
- Fabian Eitel
- Charité - Universitätsmedizin Berlin, Corporate Member Of Freie Universität Berlin, Humboldt-Universität zu Berlin; Department of Psychiatry and Psychotherapy, 10117 Berlin, Germany; Bernstein Center for Computational Neuroscience, 10117 Berlin, Germany
| | - Marc-André Schulz
- Charité - Universitätsmedizin Berlin, Corporate Member Of Freie Universität Berlin, Humboldt-Universität zu Berlin; Department of Psychiatry and Psychotherapy, 10117 Berlin, Germany; Bernstein Center for Computational Neuroscience, 10117 Berlin, Germany
| | - Moritz Seiler
- Charité - Universitätsmedizin Berlin, Corporate Member Of Freie Universität Berlin, Humboldt-Universität zu Berlin; Department of Psychiatry and Psychotherapy, 10117 Berlin, Germany; Bernstein Center for Computational Neuroscience, 10117 Berlin, Germany
| | - Henrik Walter
- Charité - Universitätsmedizin Berlin, Corporate Member Of Freie Universität Berlin, Humboldt-Universität zu Berlin; Department of Psychiatry and Psychotherapy, 10117 Berlin, Germany; Bernstein Center for Computational Neuroscience, 10117 Berlin, Germany
| | - Kerstin Ritter
- Charité - Universitätsmedizin Berlin, Corporate Member Of Freie Universität Berlin, Humboldt-Universität zu Berlin; Department of Psychiatry and Psychotherapy, 10117 Berlin, Germany; Bernstein Center for Computational Neuroscience, 10117 Berlin, Germany.
|
77
|
Levy J, Haudenschild C, Barwick C, Christensen B, Vaickus L. Topological Feature Extraction and Visualization of Whole Slide Images using Graph Neural Networks. Pacific Symposium on Biocomputing 2021; 26:285-296. [PMID: 33691025 PMCID: PMC7959046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Whole-slide images (WSI) are digitized representations of thin sections of stained tissue from various patient sources (biopsy, resection, exfoliation, fluid) and often exceed 100,000 pixels in any given spatial dimension. Deep learning approaches to digital pathology typically extract information from sub-images (patches) and treat the sub-images as independent entities, ignoring contributing information from vital large-scale architectural relationships. Modeling approaches that can capture higher-order dependencies between neighborhoods of tissue patches have demonstrated the potential to improve predictive accuracy while capturing the most essential slide-level information for prognosis, diagnosis and integration with other omics modalities. Here, we review two promising methods for capturing macro and micro architecture of histology images, Graph Neural Networks, which contextualize patch level information from their neighbors through message passing, and Topological Data Analysis, which distills contextual information into its essential components. We introduce a modeling framework, WSI-GTFE that integrates these two approaches in order to identify and quantify key pathogenic information pathways. To demonstrate a simple use case, we utilize these topological methods to develop a tumor invasion score to stage colon cancer.
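The message-passing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the WSI-GTFE implementation: the function name `message_passing_step`, the toy patch graph, and the fixed 50/50 weighting between a patch's own embedding and the mean of its neighbors are all assumptions made for illustration. Real graph neural networks use learned weights and nonlinearities instead.

```python
from typing import Dict, List


def message_passing_step(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """One round of mean-aggregation message passing over a patch graph."""
    updated: Dict[str, List[float]] = {}
    for node, own in features.items():
        # Gather the embeddings of spatially adjacent patches.
        messages = [features[n] for n in neighbors.get(node, [])]
        if not messages:
            # Isolated patch: keep its embedding unchanged.
            updated[node] = list(own)
            continue
        dim = len(own)
        mean_msg = [sum(m[i] for m in messages) / len(messages) for i in range(dim)]
        # Mix own embedding with the neighborhood mean
        # (fixed equal weights: an illustrative assumption, not a learned layer).
        updated[node] = [0.5 * own[i] + 0.5 * mean_msg[i] for i in range(dim)]
    return updated


# Hypothetical toy graph: three patches, patch "a" touches "b" and "c".
patch_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
patch_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
smoothed = message_passing_step(patch_features, patch_graph)
```

After one step each patch embedding carries information from its immediate neighbors. Repeating the step propagates context across larger tissue regions.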
Affiliation(s)
- Joshua Levy
- Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH 03756, USA (to whom correspondence should be addressed)
|
78
|
Keenan TD. The Hitchhiker’s Guide to Cluster Analysis: Multi Pertransibunt et Augebitur Scientia. Ophthalmol Retina 2020; 4:1125-1128. [DOI: 10.1016/j.oret.2020.08.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/29/2020] [Accepted: 08/04/2020] [Indexed: 02/01/2023]
|
79
|
Massi MC, Ieva F, Lettieri E. Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases. BMC Med Inform Decis Mak 2020; 20:160. [PMID: 32664923 PMCID: PMC7362640 DOI: 10.1186/s12911-020-01143-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/29/2020] [Accepted: 06/01/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The healthcare sector is an attractive target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with data mining techniques, making the auditing process more efficient and effective. This research aims to develop a novel data mining model for fraud detection among hospitals using Hospital Discharge Charts (HDC) in Administrative Databases. In particular, it focuses on the DRG upcoding practice, i.e., the tendency to register codes for provided services and inpatients' health status so as to make the hospitalization fall within a more remunerative DRG class. METHODS We propose a two-step algorithm. The first step entails k-means clustering of providers to identify locally consistent and locally similar groups of hospitals, according to their characteristics and behavior in treating a specific disease, in order to spot outliers within these groups of peers. An initial grid search for the best number of features to be selected (through Principal Feature Analysis) and the best number of local groups makes the algorithm extremely flexible. In the second step, we propose a human-decision support system that helps auditors cross-validate the identified outliers, analyzing them with respect to fraud-related variables and the complexity of the patients' casemix they treated. The proposed algorithm was tested on a database of HDC collected by Regione Lombardia (Italy) over three years (2013-2015), focusing on the treatment of Heart Failure. RESULTS The model identified 6 clusters of hospitals and 10 outliers among the 183 units. Out of those providers, we report in depth the application of Step Two to three hospitals (two private and one public). Cross-validating with the patients' population and the hospitals' characteristics, the public hospital seemed justified in its outlierness, while the two private providers were deemed interesting for further investigation by auditors. CONCLUSIONS The proposed model is promising in identifying anomalous DRG coding behavior and is easily transferable to all diseases and contexts of interest. Our proposal contributes to the limited literature on behavioral models for fraud detection, identifying the most 'cautious' fraudsters. The results of the first and second steps together represent a valuable set of information for auditors in their preliminary investigation.
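The first step described above (cluster providers, then look for outliers within each group of peers) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it omits Principal Feature Analysis and the grid search, and it makes three assumptions of its own: the function names `kmeans` and `flag_outliers`, a deterministic initialization from the first `k` points, and an outlier rule (distance to centroid greater than `factor` times the cluster's mean distance). The hospital feature vectors are made-up toy values.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def flag_outliers(points: List[Point], labels: List[int],
                  centroids: List[List[float]], factor: float = 2.0) -> List[int]:
    """Flag points lying unusually far from their own cluster centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged: List[int] = []
    for c in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        mean_d = sum(dists[i] for i in idx) / len(idx)
        flagged += [i for i in idx if mean_d > 0 and dists[i] > factor * mean_d]
    return sorted(flagged)


# Hypothetical two-feature profiles for nine hospitals; index 7 is atypical.
hospitals = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.9), (0.9, 1.1), (8.1, 7.9),
             (7.9, 8.2), (1.1, 1.0), (3.5, 3.5), (1.0, 0.8)]
labels, centroids = kmeans(hospitals, k=2)
flagged = flag_outliers(hospitals, labels, centroids)
```

Measuring distance relative to each cluster's own spread means a provider is judged against its peers rather than against the whole population, which is the point of clustering before outlier detection.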
Affiliation(s)
- Michela Carlotta Massi
- MOX Laboratory, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, Milan, Italy; CADS - Center for Analysis, Decisions and Society, Human Technopole, Palazzo Italia, Via Cristina Belgioioso 28, Milan, 20157, Italy
| | - Francesca Ieva
- MOX Laboratory, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, Milan, Italy; CADS - Center for Analysis, Decisions and Society, Human Technopole, Palazzo Italia, Via Cristina Belgioioso 28, Milan, 20157, Italy; CHRP-National Center for Healthcare Research and Pharmacoepidemiology, Università degli Studi di Milano-Bicocca, via Bicocca degli Arcimboldi 8, Milan, 20126, Italy
| | - Emanuele Lettieri
- Department of Management Engineering, Politecnico di Milano, Via Lambruschini 4/c, Milan, 20100, Italy
|
80
|
Gal J, Bailleux C, Chardin D, Pourcher T, Gilhodes J, Jing L, Guigonis JM, Ferrero JM, Milano G, Mograbi B, Brest P, Chateau Y, Humbert O, Chamorey E. Comparison of unsupervised machine-learning methods to identify metabolomic signatures in patients with localized breast cancer. Comput Struct Biotechnol J 2020; 18:1509-1524. [PMID: 32637048 PMCID: PMC7327012 DOI: 10.1016/j.csbj.2020.05.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/11/2020] [Revised: 05/15/2020] [Accepted: 05/16/2020] [Indexed: 02/08/2023] Open
Abstract
Genomics and transcriptomics have led to the widely-used molecular classification of breast cancer (BC). However, heterogeneous biological behaviors persist within breast cancer subtypes. Metabolomics is a rapidly-expanding field of study dedicated to cellular metabolisms affected by the environment. The aim of this study was to compare metabolomic signatures of BC obtained by 5 different unsupervised machine learning (ML) methods. Fifty-two consecutive patients with BC with an indication for adjuvant chemotherapy between 2013 and 2016 were retrospectively included. We performed metabolomic profiling of tumor resection samples using liquid chromatography-mass spectrometry. Here, four hundred and forty-nine identified metabolites were selected for further analysis. Clusters obtained using 5 unsupervised ML methods (PCA k-means, sparse k-means, spectral clustering, SIMLR and k-sparse) were compared in terms of clinical and biological characteristics. With an optimal partitioning parameter k = 3, the five methods identified three prognosis groups of patients (favorable, intermediate, unfavorable) with different clinical and biological profiles. SIMLR and K-sparse methods were the most effective techniques in terms of clustering. In-silico survival analysis revealed a significant difference for 5-year predicted OS between the 3 clusters. Further pathway analysis using the 449 selected metabolites showed significant differences in amino acid and glucose metabolism between BC histologic subtypes. Our results provide proof-of-concept for the use of unsupervised ML metabolomics enabling stratification and personalized management of BC patients. The design of novel computational methods incorporating ML and bioinformatics techniques should make available tools particularly suited to improving the outcome of cancer treatment and reducing cancer-related mortalities.
Affiliation(s)
- Jocelyn Gal
- University Côte d’Azur, Epidemiology and Biostatistics Department, Centre Antoine Lacassagne, Nice F-06189, France
| | - Caroline Bailleux
- University Côte d’Azur, Medical Oncology Department Centre Antoine Lacassagne, Nice F-06189, France
| | - David Chardin
- University Côte d’Azur, Nuclear Medicine Department, Centre Antoine Lacassagne, Nice F-06189, France
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
| | - Thierry Pourcher
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
| | - Julia Gilhodes
- Department of Biostatistics, Institut Claudius Regaud, IUCT-O Toulouse, France
| | - Lun Jing
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
| | - Jean-Marie Guigonis
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
| | - Jean-Marc Ferrero
- University Côte d’Azur, Medical Oncology Department Centre Antoine Lacassagne, Nice F-06189, France
| | - Gerard Milano
- University Côte d’Azur, Centre Antoine Lacassagne, Oncopharmacology Unit, Nice F-06189, France
| | - Baharia Mograbi
- University Côte d’Azur, CNRS UMR7284, INSERM U1081, IRCAN TEAM4 Centre Antoine Lacassagne FHU-Oncoage, Nice F-06189, France
| | - Patrick Brest
- University Côte d’Azur, CNRS UMR7284, INSERM U1081, IRCAN TEAM4 Centre Antoine Lacassagne FHU-Oncoage, Nice F-06189, France
| | - Yann Chateau
- University Côte d’Azur, Epidemiology and Biostatistics Department, Centre Antoine Lacassagne, Nice F-06189, France
| | - Olivier Humbert
- University Côte d’Azur, Nuclear Medicine Department, Centre Antoine Lacassagne, Nice F-06189, France
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
| | - Emmanuel Chamorey
- University Côte d’Azur, Epidemiology and Biostatistics Department, Centre Antoine Lacassagne, Nice F-06189, France
|
81
|
Eicher T, Kinnebrew G, Patt A, Spencer K, Ying K, Ma Q, Machiraju R, Mathé EA. Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources. Metabolites 2020; 10:E202. [PMID: 32429287 PMCID: PMC7281435 DOI: 10.3390/metabo10050202] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 04/15/2020] [Revised: 05/07/2020] [Accepted: 05/13/2020] [Indexed: 02/06/2023] Open
Abstract
As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. This introduces a need for increased understanding by the metabolomics researcher of computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend that our review may be used as a guide for metabolomics researchers to choose effective techniques for multi-omics analyses relevant to their field of study.
Affiliation(s)
- Tara Eicher
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
| | - Garrett Kinnebrew
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Bioinformatics Shared Resource Group, The Ohio State University, Columbus, OH 43210, USA
| | - Andrew Patt
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
| | - Kyle Spencer
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Nationwide Children’s Research Hospital, Columbus, OH 43210, USA
| | - Kevin Ying
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Molecular, Cellular and Developmental Biology Program, The Ohio State University, Columbus, OH 43210, USA
| | - Qin Ma
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
| | - Raghu Machiraju
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210, USA
| | - Ewy A. Mathé
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
|