1
Sahoo K, Sundararajan V. Methods in DNA methylation array dataset analysis: A review. Comput Struct Biotechnol J 2024; 23:2304-2325. [PMID: 38845821 PMCID: PMC11153885 DOI: 10.1016/j.csbj.2024.05.015] [Received: 12/18/2023] [Revised: 04/25/2024] [Accepted: 05/08/2024] [Indexed: 06/09/2024]
Abstract
Understanding the intricate relationships between gene expression levels and epigenetic modifications in a genome is crucial to comprehending the pathogenic mechanisms of many diseases. With the advancement of DNA Methylome Profiling techniques, the emphasis on identifying Differentially Methylated Regions (DMRs/DMGs) has become crucial for biomarker discovery, offering new insights into the etiology of illnesses. This review surveys the current state of computational tools/algorithms for the analysis of microarray-based DNA methylation profiling datasets, focusing on key concepts underlying the diagnostic/prognostic CpG site extraction. It addresses methodological frameworks, algorithms, and pipelines employed by various authors, serving as a roadmap to address challenges and understand changing trends in the methodologies for analyzing array-based DNA methylation profiling datasets derived from diseased genomes. Additionally, it highlights the importance of integrating gene expression and methylation datasets for accurate biomarker identification, explores prognostic prediction models, and discusses molecular subtyping for disease classification. The review also emphasizes the contributions of machine learning, neural networks, and data mining to enhance diagnostic workflow development, thereby improving accuracy, precision, and robustness.
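Array-based methylation studies such as those surveyed in this review usually summarize each CpG probe as a beta value (the fraction of methylated signal) and often analyze the logit-transformed M-value, M = log2(beta / (1 - beta)). The Python sketch below is an illustration added here, not a method taken from the review: it converts beta values to M-values and ranks CpG sites by the difference in mean beta between two groups. The probe IDs and values are hypothetical, and a real pipeline would add normalization and a proper per-site statistical test before calling any site differentially methylated.

```python
import math
from statistics import mean


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0..1) to an M-value, log2(beta / (1 - beta)).
    eps clips beta away from 0 and 1 so the logit stays finite (an assumed safeguard)."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def rank_by_delta_beta(case, control):
    """Rank CpG probes by |mean beta (case) - mean beta (control)|, largest first.
    case/control map a probe ID to a list of per-sample beta values."""
    deltas = {
        probe: mean(case[probe]) - mean(control[probe])
        for probe in case.keys() & control.keys()
    }
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical probe IDs and beta values, for illustration only.
case = {"cg_A": [0.82, 0.79, 0.85], "cg_B": [0.40, 0.42, 0.38], "cg_C": [0.10, 0.12, 0.11]}
control = {"cg_A": [0.30, 0.28, 0.35], "cg_B": [0.41, 0.39, 0.40], "cg_C": [0.15, 0.13, 0.14]}

for probe, delta in rank_by_delta_beta(case, control):
    print(probe, round(delta, 3), round(beta_to_m(mean(case[probe])), 3))
print(beta_to_m(0.5))  # prints 0.0: a half-methylated site sits at M = 0
```

M-values are often preferred for statistical testing because beta values are bounded and compressed near 0 and 1, while delta-beta remains the more interpretable effect size.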
Affiliation(s)
- Vino Sundararajan
- Correspondence to: Department of Bio Sciences, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632 014, Tamil Nadu, India.
2
Ma W, Tang W, Kwok JS, Tong AH, Lo CW, Chu AT, Chung BH. A review on trends in development and translation of omics signatures in cancer. Comput Struct Biotechnol J 2024; 23:954-971. [PMID: 38385061 PMCID: PMC10879706 DOI: 10.1016/j.csbj.2024.01.024] [Received: 10/27/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 02/23/2024]
Abstract
The field of cancer genomics and transcriptomics has evolved from targeted profiling to swift sequencing of individual tumor genomes and transcriptomes. The steady growth in genome-wide genomic, epigenomic, and transcriptomic datasets has significantly increased our capability to capture signatures that represent both the intrinsic and extrinsic biological features of tumors. These biological differences can support precise molecular subtyping of cancer and the prediction of tumor progression, metastatic potential, and resistance to therapeutic agents. In this review, we summarized the current development of genomic, methylomic, transcriptomic, proteomic and metabolic signatures in cancer research and highlighted their potential in clinical applications to improve diagnosis, prognosis, and treatment decisions for cancer patients.
Affiliation(s)
- Wei Ma
- Hong Kong Genome Institute, Hong Kong, China
- Wenshu Tang
- Hong Kong Genome Institute, Hong Kong, China
- Brian H.Y. Chung
- Hong Kong Genome Institute, Hong Kong, China
- Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hong Kong Genome Project
- Hong Kong Genome Institute, Hong Kong, China
- Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
3
Chen S, Hu T, Zhao J, Zhu Q, Wang J, Huang Z, Xiang C, Zhao R, Zhu C, Lu S, Han Y. Novel molecular subtypes of METex14 non-small cell lung cancer with distinct biological and clinical significance. NPJ Precis Oncol 2024; 8:159. [PMID: 39060379 PMCID: PMC11282101 DOI: 10.1038/s41698-024-00642-6] [Received: 12/29/2023] [Accepted: 07/09/2024] [Indexed: 07/28/2024]
Abstract
Not all patients with MET exon 14 skipping (METex14) NSCLC benefit from MET inhibitors. We hypothesized that METex14 NSCLC exhibits inter-tumoral heterogeneity. Genomic and transcriptomic investigations were conducted in METex14 NSCLC samples from patients with stage I-III disease and from patients with recurrent/metastatic disease, which served as the discovery and validation cohorts, respectively. Four molecular subtypes were discovered. The MET-Driven subtype, with the worst prognosis, displayed MET overexpression, enrichment of MET-related pathways, and higher infiltration of fibroblasts and regulatory T cells. The Immune-Activated subtype, which had the most favorable long-term survival, had more tertiary lymphoid structures and spatial co-option of PD-L1+ cancer cells and GZMK+ CD8+ T cells. The FGFR- and Bypass-Activated subtypes displayed FGFR2 overexpression and enrichment of multiple oncogenic pathways, respectively. In the validation cohort, patients with the MET-Driven subtype had a better response to MET inhibitors than those with MET overexpression. Thus, molecular subtypes of METex14 NSCLC with distinct biological and clinical significance may indicate more precise therapeutic strategies for patients with METex14 NSCLC.
Affiliation(s)
- Shengnan Chen
- Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Tao Hu
- Department of Medicine, Amoy Diagnostics Co., Ltd., Xiamen, China
- Jikai Zhao
- Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Qian Zhu
- Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Jin Wang
- Department of Medicine, Amoy Diagnostics Co., Ltd., Xiamen, China
- Zhan Huang
- Department of Medicine, Amoy Diagnostics Co., Ltd., Xiamen, China
- Chan Xiang
- Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Ruiying Zhao
- Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Changbin Zhu
- Department of Medicine, Amoy Diagnostics Co., Ltd., Xiamen, China
- Shun Lu
- Department of Oncology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Yuchen Han
- Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
4
Chang Z, Peng CH, Chen KJ, Xu GK. Enhancing liver fibrosis diagnosis and treatment assessment: a novel biomechanical markers-based machine learning approach. Phys Med Biol 2024; 69:115046. [PMID: 38749471 DOI: 10.1088/1361-6560/ad4c4e] [Received: 01/16/2024] [Accepted: 05/15/2024] [Indexed: 05/31/2024]
Abstract
Accurate diagnosis and treatment assessment of liver fibrosis face significant challenges, including inherent limitations in current techniques like sampling errors and inter-observer variability. Addressing this, our study introduces a novel machine learning (ML) framework, which integrates light gradient boosting machine and multivariate imputation by chained equations to enhance liver status assessment using biomechanical markers. Building upon our previously established multiscale mechanical characteristics in fibrotic and treated livers, this framework employs Gaussian Bayesian optimization for post-imputation, significantly improving classification performance. Our findings indicate a marked increase in the precision of liver fibrosis diagnosis and provide a novel, quantitative approach for assessing fibrosis treatment. This innovative combination of multiscale biomechanical markers with advanced ML algorithms represents a transformative step in liver disease diagnostics and treatment evaluation, with potential implications for other areas in medical diagnostics.
Affiliation(s)
- Zhuo Chang
- Laboratory for Multiscale Mechanics and Medical Science, Department of Engineering Mechanics, SVL, School of Aerospace Engineering, Xi'an Jiaotong University, Xi'an 710049, People's Republic of China
- Chen-Hao Peng
- Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung 41170, Taiwan, R.O.C
- Kai-Jung Chen
- Department of Mechanical Engineering, National Chin-Yi University of Technology, Taichung 41170, Taiwan, R.O.C
- Guang-Kui Xu
- Laboratory for Multiscale Mechanics and Medical Science, Department of Engineering Mechanics, SVL, School of Aerospace Engineering, Xi'an Jiaotong University, Xi'an 710049, People's Republic of China
5
Chan KO, Mulcahy DG, Anuar S. The Artefactual Branch Effect and Phylogenetic Conflict: Species Delimitation with Gene Flow in Mangrove Pit Vipers (Trimeresurus purpureomaculatus-erythrurus Complex). Syst Biol 2023; 72:1209-1219. [PMID: 37478480 DOI: 10.1093/sysbio/syad043] [Received: 03/06/2023] [Revised: 05/19/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023]
Abstract
Mangrove pit vipers of the Trimeresurus purpureomaculatus-erythrurus complex are the only species of viper known to naturally inhabit mangroves. Despite serving integral ecological functions in mangrove ecosystems, the evolutionary history, distribution, and species boundaries of mangrove pit vipers remain poorly understood, partly due to overlapping distributions, confusing phenotypic variations, and the lack of focused studies. Here, we present the first genomic study on mangrove pit vipers and introduce a robust hypothesis-driven species delimitation framework that considers gene flow and phylogenetic uncertainty in conjunction with a novel application of a new class of speciation-based delimitation model implemented through the program Delineate. Our results showed that gene flow produced phylogenetic conflict in our focal species and substantiated the artefactual branch effect, where highly admixed populations appear as divergent nonmonophyletic lineages arranged in a stepwise manner at the basal position of clades. Despite the confounding effects of gene flow, we were able to obtain unequivocal support for the recognition of a new species based on the intersection and congruence of multiple lines of evidence. This study demonstrates that an integrative hypothesis-driven approach predicated on the consideration of multiple plausible evolutionary histories, population structure/differentiation, gene flow, and the implementation of a speciation-based delimitation model can effectively delimit species in the presence of gene flow and phylogenetic conflict.
Affiliation(s)
- Kin Onn Chan
- Lee Kong Chian Natural History Museum, National University of Singapore, 2 Conservatory Drive, Singapore 117377, Singapore
- School of Biological Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia
- Daniel G Mulcahy
- Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany
- Shahrul Anuar
- School of Biological Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia
6
Erfani M, Baalousha M, Goharian E. Unveiling elemental fingerprints: A comparative study of clustering methods for multi-element nanoparticle data. Sci Total Environ 2023; 905:167176. [PMID: 37730026 DOI: 10.1016/j.scitotenv.2023.167176] [Received: 07/02/2023] [Revised: 09/03/2023] [Accepted: 09/16/2023] [Indexed: 09/22/2023]
Abstract
Single particle inductively coupled plasma time-of-flight mass spectrometry (SP-ICP-TOF-MS) generates large datasets of the multi-elemental composition of nanoparticles. However, extracting useful information from such datasets is challenging. Hierarchical clustering (HC) has been successfully applied to extract elemental fingerprints from multi-element nanoparticle data obtained by SP-ICP-TOF-MS. However, many other clustering approaches that could be applied to SP-ICP-TOF-MS data have not yet been evaluated. This study fills this knowledge gap by comparing the performance of three clustering approaches for analyzing SP-ICP-TOF-MS data: HC, spectral clustering, and t-distributed Stochastic Neighbor Embedding coupled with Density-Based Spatial Clustering of Applications with Noise (tSNE-DBSCAN). The performance of these clustering techniques was evaluated by comparing the size of the extracted clusters and the similarity of the elemental composition of nanoparticles within each cluster. Hierarchical clustering often failed to achieve an optimal clustering solution for SP-ICP-TOF-MS data because HC is sensitive to the presence of outliers. Spectral clustering and tSNE-DBSCAN extracted clusters that were not identified by HC. This is because spectral clustering, a method based on graph theory, reveals the global and local structure in the data, while tSNE reduces and maps the data into a lower-dimensional space, enabling clustering algorithms such as DBSCAN to identify subclusters with subtle differences in their elemental composition. However, tSNE-DBSCAN can lead to unsatisfactory clustering solutions because tuning the perplexity hyperparameter of tSNE is difficult and time-consuming, and the relative distances between data points are not preserved.
Although the three clustering approaches successfully extract useful information from SP-ICP-TOF-MS data, spectral clustering outperforms HC and tSNE-DBSCAN by generating clusters of a large number of nanoparticles with similar elemental compositions.
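In the tSNE-DBSCAN approach compared above, DBSCAN clusters points that have already been embedded in a low-dimensional space. As a rough illustration of that density-based step only, the following self-contained Python sketch implements plain DBSCAN on 2-D points: a point with at least `min_pts` neighbours within `eps` (itself included) is a core point, clusters grow outward from core points, and points reachable from no core point are labelled noise. The coordinates, `eps`, and `min_pts` are hypothetical stand-ins for tSNE output, not the authors' data or settings, and the tSNE step itself is omitted; in practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would normally be used.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN returning one label per point (NOISE = -1 for outliers).
    A point is a core point if at least min_pts points (itself included) lie within eps."""
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so grow through it
        cluster += 1
    return labels


# Hypothetical 2-D embedding: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
print(dbscan(points, eps=1.5, min_pts=3))  # prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point ends up as noise instead of being forced into a cluster, which is one reason density-based methods behave differently from hierarchical clustering in the presence of outliers.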
Affiliation(s)
- Mahdi Erfani
- Department of Civil and Environmental Engineering, University of South Carolina, SC 29208, USA
- Mohammed Baalousha
- Center for Environmental Nanoscience and Risk, Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC, 29201, USA
- Erfan Goharian
- Department of Civil and Environmental Engineering, University of South Carolina, SC 29208, USA
7
Cai X, Zhou T, Shi W, Cai Y, Zhou J. Monkeypox Virus Crosstalk with HIV: An Integrated Skin Transcriptome and Machine Learning Study. ACS Omega 2023; 8:47283-47294. [PMID: 38107964 PMCID: PMC10720282 DOI: 10.1021/acsomega.3c07687] [Received: 10/03/2023] [Revised: 11/13/2023] [Accepted: 11/14/2023] [Indexed: 12/19/2023]
Abstract
The emergence of the monkeypox virus (MPXV) outbreak presents a formidable challenge to human health. Emerging evidence suggests that individuals with HIV have been disproportionately affected by MPXV, with adverse clinical outcomes and higher mortality rates. However, the shared molecular mechanisms underlying MPXV and HIV remain elusive. We identified differentially expressed genes (DEGs) from two public data sets, GSE219036 and GSE184320, and extracted the DEGs common to MPXV and HIV. We further performed Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein-protein interaction (PPI) analyses, candidate drug assessment, and immune-correlation analysis of hub genes. We validated the key biomarkers using multiple machine learning (ML) methods, including random forest (RF), t-distributed stochastic neighbor embedding (tSNE), and uniform manifold approximation and projection (UMAP). A total of 59 common DEGs were identified between MPXV and HIV. Our functional analysis highlighted multiple pathways, including the ERK cascade, NF-κB signaling, and various immune responses, playing a collaborative role in the progression of both diseases. The PPI and gene co-expression networks were constructed, and five key genes with significant immune correlations (SPRED1, SPHK1, ATF3, AKT3, and AKT1S1) were identified and validated by multiple ML models. Our study emphasizes the common pathogenesis of HIV and MPXV and highlights the pivotal genes and shared pathways, providing new opportunities for evidence-based management strategies in HIV patients co-infected with MPXV.
Affiliation(s)
- Xueyao Cai
- Department of Plastic Surgery, The Third Xiangya Hospital of Central South University, Changsha 410013, China
- Tianyi Zhou
- Department of Ophthalmology, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200011, China
- Wenjun Shi
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
- Yuchen Cai
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
- Jianda Zhou
- Department of Plastic Surgery, The Third Xiangya Hospital of Central South University, Changsha 410013, China
8
P V, Mohanan M, U K S, E Pa S, U C A J. Graph Attention Network based mapping of knowledge relations between chemical spaces of Nuclear factor kappa B and Centella asiatica. Comput Biol Chem 2023; 107:107955. [PMID: 37734134 DOI: 10.1016/j.compbiolchem.2023.107955] [Received: 11/01/2022] [Revised: 08/02/2023] [Accepted: 09/07/2023] [Indexed: 09/23/2023]
Abstract
The complex nature of the innate immunity target Nuclear Factor kappa B (NF-κB) and its interaction with Centella asiatica (CA) molecules call for advanced technologies such as deep learning methods. The integration of chemical space concepts with deep learning technologies is a new way of knowledge mapping used to explore drug-target interactions, especially in molecular libraries derived from traditional medicine. The current constraint of virtual screening for mechanistic target hunting is its reliance on binary classification models built from active and inactive molecules in in vitro experiments to explore drug-target interactions. This study aims to explore the regulatory nature of molecules in the NF-κB inhibition and activation bioassay data sets and to map this information against the molecules of CA, a low-growing tropical plant, for knowledge-based analysis. This leads to a new direction in the field, moving from the conventional active-inactive framework to a more comprehensive active-inactive-regulatory model, which can be thoroughly explored by leveraging a graph-based deep learning system. The study presents an innovative approach using a Graph Attention Network (GAT) to rank CA molecules in chemical space based on their similarity with NF-κB bioassay molecules, enabling efficient analysis of complex relationships between molecules and their regulatory function. GAT overcomes the limitations of traditional deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in handling non-Euclidean graph data and allows a more precise similarity ranking by utilizing molecular graphs and attention.
By measuring similarity and arranging a matrix of similarity ranking based on GAT, deep neural ranking-based algorithms confirmed the regulatory behaviour of an innate immunity target NF-κB with the support of underlying inverse mapping in the surjective chemical spaces of NF-κB bioassays and CA molecular spaces. Overall, the study introduces new techniques for exploring the regulatory behaviour of complex targets like NF-κB. We then used t-SNE for clustering in chemical space and scaffold hunting for scaffold property analysis and identified nine CA molecules that exhibit regulatory behavior of NF-κB target and are recommended for further investigation.
Affiliation(s)
- Vivek P
- UL Research Center, UL Cyber Park Calicut, India
- Sandesh E Pa
- UL Research Center, UL Cyber Park Calicut, India
- Jaleel U C A
- OSPF-NIAS Drug Discovery Lab, National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, India
9
Arana A, Esteves J, Ramírez R, Galetti PM, Pérez Z J, Ramirez JL. Population genomics reveals how 5 ka of human occupancy led the Lima leaf-toed gecko (Phyllodactylus sentosus) to the brink of extinction. Sci Rep 2023; 13:18465. [PMID: 37891335 PMCID: PMC10611785 DOI: 10.1038/s41598-023-45715-x] [Received: 03/01/2023] [Accepted: 10/23/2023] [Indexed: 10/29/2023]
Abstract
Small species with high home fidelity, high ecological specialization or low vagility are particularly prone to suffering from habitat modification and fragmentation. The Lima leaf-toed gecko (Phyllodactylus sentosus) is a critically endangered Peruvian species that shelters mostly in pre-Incan archeological areas called huacas, where the original environmental conditions are maintained. We used genotyping by sequencing to understand the population genomic history of P. sentosus. We found low genetic diversity (expected heterozygosity, He, 0.0406-0.134; nucleotide diversity 0.0812-0.145) and deviations of the observed heterozygosity from the expected heterozygosity in some populations (Fis -0.0202 to 0.0187). In all analyses, a clear population structure was observed that cannot be explained by isolation by distance alone. Low levels of historical gene flow were also observed between most populations, and contemporary migration-rate analysis showed that gene flow has since decreased. Demographic inference suggests these populations experienced bottleneck events during the last 5 ka. These results indicate that habitat modification since pre-Incan civilizations severely affected these populations, which currently face even more drastic urbanization threats. Finally, our predictions show that this species could become extinct within a decade without further intervention, which calls for urgent conservation action.
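The diversity statistics quoted above rest on standard population-genetic definitions: expected heterozygosity He = 1 - Σ p_i² over allele frequencies p_i, observed heterozygosity Ho is the fraction of heterozygous individuals, and the inbreeding coefficient Fis = 1 - Ho/He is positive when heterozygotes are scarcer than expected and negative when they are in excess. The Python sketch below computes these for a single locus in one population. The genotypes are hypothetical, the sketch omits the small-sample correction used by unbiased He estimators, and it is not the study's pipeline; studies like this one summarize such statistics across many loci.

```python
from collections import Counter


def heterozygosity_stats(genotypes):
    """Single-locus Ho, He and Fis for one population.
    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    He = 1 - sum(p_i^2) without small-sample correction; Fis = 1 - Ho / He
    (NaN for a monomorphic locus, where He = 0)."""
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n  # share of heterozygous individuals
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # diploid: two allele copies per individual
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Hypothetical genotypes at one biallelic SNP for 10 individuals.
sample = [("A", "A")] * 4 + [("A", "G")] * 4 + [("G", "G")] * 2
ho, he, fis = heterozygosity_stats(sample)
print(f"Ho={ho:.3f} He={he:.3f} Fis={fis:.3f}")  # prints Ho=0.400 He=0.480 Fis=0.167
```

Here the allele frequencies are 0.6 and 0.4, so He = 0.48, while only 40% of individuals are heterozygous, giving a modest heterozygote deficit (positive Fis).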
Affiliation(s)
- Alejandra Arana
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Juan Esteves
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Rina Ramírez
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Pedro M Galetti
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, SP, 13565-905, Brazil
- José Pérez Z
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Jorge L Ramirez
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
10
Zanelli S, Eveilleau K, Charlton PH, Ammi M, Hallab M, El Yacoubi MA. Clustered photoplethysmogram pulse wave shapes and their associations with clinical data. Front Physiol 2023; 14:1176753. [PMID: 37954447 PMCID: PMC10637540 DOI: 10.3389/fphys.2023.1176753] [Received: 02/28/2023] [Accepted: 09/05/2023] [Indexed: 11/14/2023]
Abstract
Photoplethysmography (PPG) is a non-invasive and well-known technology that enables the recording of the digital volume pulse (DVP). Although PPG is widely employed in research, several aspects remain unknown. One of these is how many waveform classes best express the variability in pulse shape. In the literature, it is common to classify DVPs into four classes based on the position of the dicrotic notch. However, with real data, assigning each waveform to one of these four classes is not straightforward and may be challenging. Correct identification of the DVP shape could enhance the precision and reliability of the extracted biomarkers. In this work we proposed unsupervised machine learning and deep learning approaches to overcome these data labelling limitations. Concretely, we performed K-medoids based clustering that takes as input 1) handcrafted DVP features, 2) a similarity matrix computed with Derivative Dynamic Time Warping, and 3) DVP features extracted from a CNN AutoEncoder. Each method was tested first by imposing four medoids representative of the Dawber classes, and then by automatically searching for four clusters. We then searched for the optimal number of clusters for each method using the silhouette score, prediction strength, and inertia. To validate the proposed approaches, we analysed the differences in clinical data between the obtained clusters.
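The pulse-shape study clusters waveforms with K-medoids, one variant operating on a precomputed DDTW similarity matrix, and compares cluster counts using the silhouette score among other criteria. The Python sketch below shows those two ingredients on a precomputed distance matrix: a simple alternating (Voronoi-iteration) K-medoids, which is neither the PAM swap algorithm nor necessarily the variant the authors used, and the mean silhouette s(i) = (b - a) / max(a, b). Absolute differences between hypothetical 1-D values stand in for DDTW distances, and the initial medoids are passed in explicitly, loosely mirroring the idea of imposing representative medoids.

```python
def assign(dist, medoids):
    """Index (into medoids) of the nearest medoid for every point; ties go to the first."""
    return [min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=100):
    """Alternating K-medoids on a precomputed n x n distance matrix.
    medoids: initial medoid indices. Returns (final medoids, labels)."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:  # empty cluster: keep its previous medoid
                new_medoids.append(old)
                continue
            # New medoid = member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def mean_silhouette(dist, labels):
    """Mean of s(i) = (b - a) / max(a, b); needs at least two clusters.
    a = mean distance to own cluster, b = lowest mean distance to another cluster.
    Points in singleton clusters get s(i) = 0."""
    clusters = set(labels)
    scores = []
    for i, li in enumerate(labels):
        same = [j for j, lj in enumerate(labels) if lj == li and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in same) / len(same)
        b = min(
            sum(dist[i][j] for j, lj in enumerate(labels) if lj == c) / labels.count(c)
            for c in clusters if c != li
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 1-D "waveform descriptors"; |x - y| stands in for a DDTW distance.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.0, 9.4]
dist = [[abs(x - y) for y in values] for x in values]
results = {}
for init in ([0, 6], [0, 3, 6]):
    medoids, labels = k_medoids(dist, init)
    results[len(init)] = (medoids, labels, mean_silhouette(dist, labels))
    print(len(init), medoids, labels, round(results[len(init)][2], 3))
```

With these toy values the three-cluster solution scores a clearly higher mean silhouette than the two-cluster one, which is the kind of comparison used to pick a cluster count.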
Affiliation(s)
- Serena Zanelli
- Laboratoire Analyse, Géométrie et Applications, University Sorbonne Nord, Villetaneuse, France
- Peter H. Charlton
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Mehdi Ammi
- Laboratoire Analyse, Géométrie et Applications, University of Sorbonne Nord, Saint-Denis, France
- Magid Hallab
- Axelife, Saint-Nicolas-de-Redon, France
- Clinique Bizet, Paris, France
- Mounim A. El Yacoubi
- SAMOVAR Telecom SudParis, CNRS, Institut Polytechnique de Paris, Palaiseau, France
11
Moon J, Posada-Quintero HF, Chon KH. Genetic data visualization using literature text-based neural networks: Examples associated with myocardial infarction. Neural Netw 2023; 165:562-595. [PMID: 37364469 DOI: 10.1016/j.neunet.2023.05.015] [Received: 08/17/2022] [Revised: 04/11/2023] [Accepted: 05/09/2023] [Indexed: 06/28/2023]
Abstract
Data visualization is critical to unraveling hidden information from complex and high-dimensional data. Interpretable visualization methods are especially important in the biological and medical fields; however, there are few effective visualization methods for large genetic data. Current visualization methods are limited to lower-dimensional data, and their performance suffers when data are missing. In this study, we propose a literature-based visualization method that reduces high-dimensional data without compromising the dynamics of the single nucleotide polymorphisms (SNPs) or textual interpretability. Our method is innovative because it is shown to (1) preserve both the global and local structures of SNPs while reducing the dimension of the data using literature text representations, and (2) enable interpretable visualizations using textual information. For performance evaluation, we applied the proposed approach to several classification tasks, including race, myocardial infarction event age group, and sex, using several machine learning models on the literature-derived SNP data. We used visualization approaches to examine clustering of the data, as well as quantitative performance metrics for the classification of the risk factors above. Our method outperformed all popular dimensionality reduction and visualization methods for both classification and visualization, and it is robust against missing and higher-dimensional data. Moreover, we found it feasible to incorporate both genetic and other risk information obtained from the literature with our method.
Affiliation(s)
- Jihye Moon
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06269, USA.
- Ki H Chon
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06269, USA.
12
Men K, Li Y, Wang X, Zhang G, Hu J, Gao Y, Han A, Liu W, Han H. Estimate the incubation period of coronavirus 2019 (COVID-19). Comput Biol Med 2023; 158:106794. [PMID: 37044045 PMCID: PMC10062796 DOI: 10.1016/j.compbiomed.2023.106794] [Received: 11/27/2022] [Revised: 02/23/2023] [Accepted: 03/20/2023] [Indexed: 04/14/2023]
Abstract
COVID-19 is an infectious disease that presents unprecedented challenges to society. Accurately estimating the incubation period of the coronavirus is critical for effective prevention and control. However, the exact incubation period remains unclear, as COVID-19 symptoms can appear as early as 2 days or as late as 14 days or more after exposure. Accurate estimation requires original chain-of-infection data, which may not be fully available from the original outbreak in Wuhan, China. In this study, we estimated the incubation period of COVID-19 by leveraging well-documented and epidemiologically informative chain-of-infection data collected from 10 regions outside the original Wuhan areas prior to February 10, 2020. We employed a Monte Carlo simulation approach that we proposed, together with nonparametric methods, to estimate the incubation period of COVID-19. We also utilized manifold learning and related statistical analysis to uncover incubation relationships between different age and gender groups. Our findings revealed that the incubation period of COVID-19 did not follow general distributions such as lognormal, Weibull, or Gamma. Using the proposed Monte Carlo simulations and nonparametric bootstrap methods, we estimated the mean and median incubation periods as 5.84 days (95% CI 5.42-6.25) and 5.01 days (95% CI 4.00-6.00), respectively. We also found a statistically significant difference between the incubation periods of people aged 40 years or older and those younger than 40 years. The older group had a longer incubation period and a larger variance than the younger group, suggesting the need for different quarantine times or medical intervention strategies. Our machine-learning results further demonstrated that the two age groups were linearly separable, consistent with previous statistical analyses. Additionally, our results indicated that the difference in incubation period between males and females was not statistically significant.
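The mean and median estimates in this abstract are reported with bootstrap confidence intervals. As a generic illustration, and not the authors' Monte Carlo procedure, the Python sketch below computes a percentile bootstrap interval: it resamples the observed incubation times with replacement, recomputes the statistic on each resample, and reads off the 2.5th and 97.5th percentiles of the resampled statistics. The incubation times are hypothetical, and the percentile method is an assumption on our part; other constructions such as BCa intervals are also common.

```python
import random
from statistics import mean, median


def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample data with replacement n_boot times,
    compute stat on each resample, and return (point estimate, lower, upper)."""
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return stat(data), lo, hi


# Hypothetical incubation periods in days, NOT the study's data.
days = [2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9, 11, 14]
for name, fn in (("mean", mean), ("median", median)):
    est, lo, hi = bootstrap_ci(days, fn)
    print(f"{name}: {est:.2f} days (95% CI {lo:.2f}-{hi:.2f})")
```

Because the bootstrap makes no assumption about the shape of the underlying distribution, it suits data that, as reported here, do not follow lognormal, Weibull, or Gamma forms.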
Affiliation(s)
- Ke Men
- Institute for Research on Health Information and Technology, School of Public Health, Xi'an Medical University, Xi'an, Shaanxi, 710021, China
| | - Yihao Li
- The Gabelli School of Business, Fordham University, Lincoln Center, New York, NY, 10023, USA
| | - Xia Wang
- The Air Force Military Medical University, Xi'an, Shaanxi, 710032, China
| | - Guangwei Zhang
- Institute for Research on Health Information and Technology, School of Public Health, Xi'an Medical University, Xi'an, Shaanxi, 710021, China
| | - Jingjing Hu
- Institute for Research on Health Information and Technology, School of Public Health, Xi'an Medical University, Xi'an, Shaanxi, 710021, China
| | - Yanyan Gao
- Institute for Research on Health Information and Technology, School of Public Health, Xi'an Medical University, Xi'an, Shaanxi, 710021, China
| | - Ashley Han
- The Skyline High School, Ann Arbor, MI, 48103, USA
| | - Wenbin Liu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China.
| | - Henry Han
- The Laboratory of Data Science and Artificial Intelligence Innovation, Department of Computer Science, School of Engineering and Computer Science, Baylor University, Waco, TX, 76789, USA.
| |
Collapse
|
13
|
Chen K, Chen G, Li J, Huang Y, Wang E, Hou T, Heng PA. MetaRF: attention-based random forest for reaction yield prediction with a few trails. J Cheminform 2023; 15:43. [PMID: 37038222 PMCID: PMC10084704 DOI: 10.1186/s13321-023-00715-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/03/2023] [Accepted: 03/21/2023] [Indexed: 04/12/2023]
Abstract
Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To address this challenge, we first put forth MetaRF, an attention-based random forest model specifically designed for few-shot yield prediction, in which the attention weights of the random forest are automatically optimized by a meta-learning framework and can be quickly adapted to predict the performance of new reagents when given a few additional samples. To improve few-shot learning performance, we further introduce a dimension-reduction-based sampling method to select valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and achieves satisfactory performance on few-shot prediction. On high-throughput experimentation (HTE) datasets, the average yield of the top 10 high-yield reactions selected by our methodology is relatively close to that of ideal yield selection.
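The core idea of weighting individual trees by attention can be sketched as a softmax-weighted average of per-tree predictions. This is a simplified illustration under that assumption only: the meta-learning step that MetaRF uses to optimize the weights is not reproduced, and the function names, tree outputs, and logits below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_forest_predict(tree_predictions, attention_logits):
    """Combine per-tree predictions with softmax-normalized attention weights."""
    weights = softmax(attention_logits)
    return sum(w * p for w, p in zip(weights, tree_predictions))

# Hypothetical yields (%) predicted by five trees for one new reaction.
tree_preds = [62.0, 70.0, 55.0, 68.0, 74.0]

# Equal logits reduce to the ordinary random-forest average.
print(attention_forest_predict(tree_preds, [0.0] * 5))
# Larger logits shift weight toward trees assumed to fit the few labelled reactions well.
print(attention_forest_predict(tree_preds, [0.1, 1.2, -0.5, 0.8, 1.5]))
```

With uniform logits the model is a plain random forest; adapting only the small vector of logits is what makes this kind of reweighting cheap to tune from a handful of new samples.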
Affiliation(s)
- Kexin Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, New Territories, Hong Kong SAR
- Yuansheng Huang
- College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
- Ercheng Wang
- Zhejiang Lab, Zhejiang, China
- College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
- Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, New Territories, Hong Kong SAR
- Zhejiang Lab, Zhejiang, China
14
dos Santos ALC, Sullasi HSL, Gokcumen O, Lindo J, DeGiorgio M. Spatiotemporal fluctuations of population structure in the Americas revealed by a meta-analysis of the first decade of archaeogenomes. American Journal of Biological Anthropology 2023; 180:703-714. [PMID: 39081397 PMCID: PMC11288623 DOI: 10.1002/ajpa.24673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 11/15/2022] [Indexed: 08/02/2024]
Abstract
Objectives: Since 2010, genome-wide data from hundreds of ancient Native Americans have contributed to the understanding of the Americas' prehistory. However, these samples have never been studied as a single dataset, and distinct relationships among them and with present-day populations may never have come to light. Here, we reassess the genomic diversity and population structure of 223 ancient Native Americans published between 2010 and 2019.
Materials and Methods: The genomic data from the ancient Americas were merged with a worldwide reference panel of 278 present-day genomes from the Simons Genome Diversity Project and then analyzed with ADMIXTURE, D-statistics, PCA, t-SNE, and UMAP.
Results: We find largely similar population structures in the ancient and present-day Americas. However, the population structure of contemporary Native Americans, traced here to at least 10,000 years before present, is noticeably less diverse than that of their ancient counterparts, a possible outcome of European contact. Additionally, in the past there were greater levels of population structure in North than in South America, except for ancient Brazil, which harbors comparatively high degrees of structure. Moreover, we find a component of genetic ancestry in the ancient dataset that is closely related to that of present-day Oceanic populations but does not correspond to the previously reported Australasian signal. Lastly, we report an expansion of the Ancient Beringian ancestry, previously reported for only one sample.
Discussion: Overall, our findings support a complex scenario for the settlement of the Americas, accommodating the occurrence of founder effects and the emergence of ancestral mixing events at the regional level.
Affiliation(s)
- Andre Luiz Campelo dos Santos
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
- Department of Archaeology, Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, Buffalo, New York, USA
- John Lindo
- Department of Anthropology, Emory University, Atlanta, Georgia, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
15
Lin B, Zhou X, Jiang D, Shen X, Ouyang H, Li W, Xu D, Fang L, Tian Y, Li X, Huang Y. Comparative transcriptomic analysis reveals candidate genes for seasonal breeding in the male Lion-Head goose. Br Poult Sci 2023; 64:157-163. [PMID: 36440984 DOI: 10.1080/00071668.2022.2152651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
1. Because of seasonal breeding, goose breeds from Southern China have low egg yield. The genetic makeup underlying the performance of local breeds is largely unknown, and few studies have investigated this problem. This study integrated 21 newly generated and 50 publicly available RNA-seq libraries, representing the hypothalamus, pituitary and testis, to identify candidate genes and important related pathways associated with seasonal breeding in male Lion-Head geese.
2. In total, 19, 119 and 302 differentially expressed genes (DEGs) were detected in the hypothalamus, pituitary and testis, respectively, of male Lion-Head geese between non-breeding and breeding periods. These genes were significantly involved in the neuropeptide signalling pathway, gland development, neuroactive ligand-receptor interaction, JAK-STAT signalling pathway, cAMP signalling pathway, PI3K-Akt signalling pathway and FoxO signalling pathway.
3. By integrating another 50 RNA-seq samples, 4, 18 and 40 promising DEGs were confirmed in the hypothalamus, pituitary and testis, respectively.
4. HOX genes were identified as having important roles in the development of the testis between non-breeding and breeding periods of male Lion-Head geese.
Affiliation(s)
- B Lin
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- X Zhou
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- D Jiang
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- X Shen
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- H Ouyang
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- W Li
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- D Xu
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- L Fang
- MRC Human Genetics Unit at Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Y Tian
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- X Li
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
- Y Huang
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, P. R. China
16
Xi J, Deng Z, Liu Y, Wang Q, Shi W. Integrating multi-type aberrations from DNA and RNA through dynamic mapping gene space for subtype-specific breast cancer driver discovery. PeerJ 2023; 11:e14843. [PMID: 36755866 PMCID: PMC9901305 DOI: 10.7717/peerj.14843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/11/2023] [Indexed: 02/05/2023]
Abstract
Driver event discovery is crucial for breast cancer diagnosis and therapy. In particular, discovering the subtype-specificity of drivers can promote personalized biomarker discovery and precision treatment of cancer patients. Still, most existing computational driver discovery studies mainly exploit information from DNA aberrations and gene interactions. Notably, cancer driver events can arise not only from DNA aberrations but also from RNA alterations, yet integrating multi-type aberrations from both DNA and RNA remains a challenging task for breast cancer driver discovery. On the one hand, the data formats of different aberration types differ from each other, known as data format incompatibility. On the other hand, different types of aberrations demonstrate distinct patterns across samples, known as aberration type heterogeneity. To promote the integrated analysis of subtype-specific breast cancer drivers, we design a "splicing-and-fusing" framework that addresses data format incompatibility and aberration type heterogeneity simultaneously. To overcome data format incompatibility, the "splicing-step" employs a knowledge graph structure to connect multi-type aberrations from the DNA and RNA data into a unified format. To tackle aberration type heterogeneity, the "fusing-step" adopts a dynamic mapping gene space integration approach to represent the multi-type information by vectorized profiles. Experiments demonstrate the advantages of our approach both in integrating multi-type aberrations from DNA and RNA and in discovering subtype-specific breast cancer drivers. In summary, our "splicing-and-fusing" framework, with knowledge graph connection and dynamic mapping gene space fusion of multi-type aberration data from DNA and RNA, can successfully discover potential breast cancer drivers with subtype-specificity indication.
Affiliation(s)
- Jianing Xi
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Zhen Deng
- School of Basic Medical Sciences, Guangzhou Medical University, Guangzhou, China
- Yang Liu
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Qian Wang
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Wen Shi
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
17
Espadoto M, Appleby G, Suh A, Cashman D, Li M, Scheidegger C, Anderson EW, Chang R, Telea AC. UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1559-1572. [PMID: 34748493 DOI: 10.1109/tvcg.2021.3125576] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparatively little work has been done on generalizable methods of inverse projection: the process of mapping the projected points, or more generally the projection space, back to the original high-dimensional space. In this article we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv and offer guidance on selecting these parameters. We extend the validation of NNInv's effectiveness through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
18
Piłat-Rożek M, Łazuka E, Majerek D, Szeląg B, Duda-Saternus S, Łagód G. Application of Machine Learning Methods for an Analysis of E-Nose Multidimensional Signals in Wastewater Treatment. Sensors (Basel, Switzerland) 2023; 23:487. [PMID: 36617095 PMCID: PMC9824643 DOI: 10.3390/s23010487] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/20/2022] [Revised: 12/26/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
The work represents a successful attempt to combine a gas sensor array with instrumentation (hardware) and machine learning methods as the basis for creating numerical codes (software), together constituting an electronic nose, to correctly classify the various stages of the wastewater treatment process. To evaluate the multidimensional measurements derived from the gas sensor array, dimensionality reduction was performed using the t-SNE method, which (unlike the commonly used PCA method) preserves the local structure of the data by minimizing the Kullback-Leibler divergence between the two distributions with respect to the locations of the points on the map. The k-median method was used to evaluate the discretization potential of the collected multidimensional data. It showed that observations from different stages of the wastewater treatment process have distinct chemical fingerprints. In the final stage of data analysis, a supervised machine learning method, in the form of a random forest, was used to classify observations based on the measurements from the sensor array. The quality of the resulting model was assessed using several measures commonly used in classification tasks. All of these measures confirmed that the classification model perfectly assigned classes to the observations from the test set, which also confirmed the absence of model overfitting.
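For reference, the Kullback-Leibler objective mentioned above takes the following form in the standard t-SNE formulation (Gaussian affinities in the input space, Student-t affinities on the map). This is the textbook definition, not a statement of the exact settings used in the cited study; here $x_i$ would be a sensor-array measurement vector, $y_i$ its 2D map position, $n$ the number of observations, and each $\sigma_i$ is set from the chosen perplexity.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

The cost $C$ is minimized with respect to the map positions $y_i$ only; because $p_{ij}$ is small for distant pairs, mismatches between far-apart points are penalized weakly, which is why t-SNE favours local over global structure, unlike PCA.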
Affiliation(s)
- Magdalena Piłat-Rożek
- Faculty of Technology Fundamentals, Lublin University of Technology, 20-618 Lublin, Poland
- Ewa Łazuka
- Faculty of Technology Fundamentals, Lublin University of Technology, 20-618 Lublin, Poland
- Dariusz Majerek
- Faculty of Technology Fundamentals, Lublin University of Technology, 20-618 Lublin, Poland
- Bartosz Szeląg
- Faculty of Environmental, Geomatic and Energy Engineering, Kielce University of Technology, 25-314 Kielce, Poland
- Grzegorz Łagód
- Faculty of Environmental Engineering, Lublin University of Technology, 20-618 Lublin, Poland
19
Lima DDS, Amichi LJA, Fernandez MA, Constantino AA, Seixas FAV. NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:557-565. [PMID: 34826297 DOI: 10.1109/tcbb.2021.3131136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among these classes, Y RNAs have been gaining attention as essential factors for the initiation of DNA replication in vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, and related sequences have been found in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset of 45,447 sncRNA sequences from a wide range of organisms was built from Rfam 14.3. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE to learned sequence representations could be useful for sequence analysis. Our model is freely available as a web server (https://www.gpea.uem.br/ncypred/).
20
Liu J, Vinck M. Improved visualization of high-dimensional data using the distance-of-distance transformation. PLoS Comput Biol 2022; 18:e1010764. [PMID: 36538561 PMCID: PMC9812310 DOI: 10.1371/journal.pcbi.1010764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2021] [Revised: 01/04/2023] [Accepted: 11/28/2022] [Indexed: 12/24/2022]
Abstract
Dimensionality reduction tools like t-SNE and UMAP are widely used for high-dimensional data analysis. For instance, these tools are applied in biology to describe spiking patterns of neuronal populations or the genetic profiles of different cell types. Here, we show that when data include noise points that are randomly scattered within a high-dimensional space, a "scattering noise problem" occurs in the low-dimensional embedding where noise points overlap with the cluster points. We show that a simple transformation of the original distance matrix by computing a distance between neighbor distances alleviates this problem and identifies the noise points as a separate cluster. We apply this technique to high-dimensional neuronal spike sequences, as well as the representations of natural images by convolutional neural network units, and find an improvement in the constructed low-dimensional embedding. Thus, we present an improved dimensionality reduction technique for high-dimensional data containing noise points.
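The "distance between neighbor distances" idea can be sketched as a second-order distance matrix. This sketch assumes one simplified reading, in which the new distance between points i and j is the Euclidean distance between their rows of the original distance matrix, compared only on the third-party points; the authors' exact definition (for example, any restriction to nearest neighbours) may differ. All function names and data points are hypothetical.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(points):
    """First-order distances: D[i][j] = ||x_i - x_j||."""
    return [[euclidean(p, q) for q in points] for p in points]

def distance_of_distance(D):
    """Second-order distances (assumed simplified form).

    DoD[i][j] compares how points i and j "see" every other point k:
    the Euclidean distance between rows i and j of D over columns k not in {i, j}.
    """
    n = len(D)
    dod = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            others = [k for k in range(n) if k != i and k != j]
            dod[i][j] = math.sqrt(sum((D[i][k] - D[j][k]) ** 2 for k in others))
    return dod

# Hypothetical 2-D data: two tight clusters plus one scattered noise point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
          (4.0, 7.0)]
dod = distance_of_distance(distance_matrix(points))
for row in dod:
    print(" ".join(f"{d:6.2f}" for d in row))
```

Points in the same cluster see the rest of the data almost identically and so stay close, while a scattered noise point sees everything differently. The resulting matrix could then be fed to an embedding method that accepts precomputed distances, such as scikit-learn's `TSNE(metric="precomputed", init="random")`.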
Affiliation(s)
- Jinke Liu
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen University, Nijmegen, Netherlands
- Martin Vinck
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen University, Nijmegen, Netherlands
21
Pan X, Lin H, Han C, Feng Z, Wang Y, Lin J, Qiu B, Yan L, Li B, Xu Z, Wang Z, Zhao K, Liu Z, Liang C, Chen X, Li Z, Cui Y, Lu C, Liu Z. Computerized tumor-infiltrating lymphocytes density score predicts survival of patients with resectable lung adenocarcinoma. iScience 2022; 25:105605. [PMID: 36505920 PMCID: PMC9730047 DOI: 10.1016/j.isci.2022.105605] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/19/2022] [Revised: 10/23/2022] [Accepted: 11/14/2022] [Indexed: 11/17/2022]
Abstract
A high abundance of tumor-infiltrating lymphocytes (TILs) has a positive impact on the prognosis of patients with lung adenocarcinoma (LUAD). We aimed to develop and validate an artificial intelligence-driven pathological scoring system for assessing TILs on H&E-stained whole-slide images of LUAD. Deep learning-based methods were applied to calculate the densities of lymphocytes in cancer epithelium (DLCE) and cancer stroma (DLCS), and a risk score (WELL score) was built through linear weighting of DLCE and DLCS. The association between the WELL score and patient outcome was explored in 793 patients with stage I-III LUAD in four cohorts. The WELL score was an independent prognostic factor for overall survival and disease-free survival in the discovery cohort and the validation cohorts. The prognostic prediction model integrating the WELL score demonstrated better discrimination performance than the clinicopathologic model in all four cohorts. This artificial intelligence-based workflow and scoring system could promote risk stratification for patients with resectable LUAD.
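The two densities and their linear weighting can be illustrated as follows, assuming each density is a lymphocyte count divided by the segmented tissue area in mm². The class and function names, the counts and areas, and the weights `w_epi` and `w_stroma` are hypothetical placeholders; the abstract does not give the published WELL score coefficients.

```python
from dataclasses import dataclass

@dataclass
class TilCounts:
    """Hypothetical per-slide outputs of a lymphocyte/tissue segmentation model."""
    lymphocytes_in_epithelium: int
    epithelium_area_mm2: float
    lymphocytes_in_stroma: int
    stroma_area_mm2: float

def til_densities(c: TilCounts) -> tuple[float, float]:
    """Return (DLCE, DLCS) as lymphocytes per mm^2 of cancer epithelium and of cancer stroma."""
    dlce = c.lymphocytes_in_epithelium / c.epithelium_area_mm2
    dlcs = c.lymphocytes_in_stroma / c.stroma_area_mm2
    return dlce, dlcs

def linear_til_score(dlce: float, dlcs: float, w_epi: float, w_stroma: float) -> float:
    """Linear weighting of the two densities; the weights are placeholders, not fitted values."""
    return w_epi * dlce + w_stroma * dlcs

slide = TilCounts(lymphocytes_in_epithelium=1200, epithelium_area_mm2=4.0,
                  lymphocytes_in_stroma=5400, stroma_area_mm2=6.0)
dlce, dlcs = til_densities(slide)  # 300.0 and 900.0 cells/mm^2
score = linear_til_score(dlce, dlcs, w_epi=0.6, w_stroma=0.4)
print(f"DLCE={dlce:.1f}, DLCS={dlcs:.1f}, score={score:.1f}")
```

Keeping the two compartments separate before weighting lets a fitted model (for example, a survival regression) decide how much epithelial versus stromal infiltration contributes to risk, rather than pooling all lymphocytes into one count.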
Affiliation(s)
- Xipeng Pan
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Cardiovascular Institute, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Huan Lin
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- School of Medicine, South China University of Technology, Guangzhou 510006, China
- Chu Han
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Zhengyun Feng
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Yumeng Wang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiatai Lin
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Bingjiang Qiu
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Cardiovascular Institute, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Lixu Yan
- Department of Pathology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Bingbing Li
- Department of Pathology, Guangdong Provincial People’s Hospital Ganzhou Hospital (Ganzhou Municipal Hospital), 49 Dagong Road, Ganzhou 341000, China
- Zeyan Xu
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- School of Medicine, South China University of Technology, Guangzhou 510006, China
- Zhizhen Wang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ke Zhao
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Cardiovascular Institute, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Zhenbing Liu
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Changhong Liang
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Xin Chen (corresponding author)
- Department of Radiology, Guangzhou First People’s Hospital, School of Medicine, South China University of Technology, Guangzhou 510180, China
- Zhenhui Li (corresponding author)
- Guangdong Cardiovascular Institute, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Department of Radiology, The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Yunnan Cancer Center, Kunming 650118, China
- Yanfen Cui (corresponding author)
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Cardiovascular Institute, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Department of Radiology, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences/Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan 030013, China
- Cheng Lu (corresponding author)
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Zaiyi Liu (corresponding author)
- Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
22
Yamada H, Xu L, Eto F, Takeichi R, Islam A, Mamun MA, Zhang C, Yao I, Sakamoto T, Aramaki S, Kikushima K, Sato T, Takahashi Y, Machida M, Kahyo T, Setou M. Changes of Mass Spectra Patterns on a Brain Tissue Section Revealed by Deep Learning with Imaging Mass Spectrometry Data. Journal of the American Society for Mass Spectrometry 2022; 33:1607-1614. [PMID: 35881989 DOI: 10.1021/jasms.2c00080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The characteristic patterns of mass spectra in imaging mass spectrometry (IMS) strongly reflect the tissue environment. However, the boundaries formed where different tissue environments meet have not been visually assessed. In this study, IMS and a convolutional neural network (CNN), a deep learning method, were applied to extract characteristic mass spectra patterns from training brain regions on rodent brain sections. The CNN produced classification models with high accuracy and low loss on all test data sets: mouse coronal sections measured by desorption electrospray ionization (DESI)-IMS, and mouse and rat sagittal sections measured by matrix-assisted laser desorption/ionization (MALDI)-IMS. On the basis of the extracted mass spectra pattern features, histologically plausible segmentation and classification score imaging of the brain sections were obtained. The boundary imaging generated from classification scores showed sharp changes in mass spectra patterns between tissue environments, with no significant buffer zones for an intermediate state. CNN-based analysis of IMS data is a useful tool for visually assessing changes in mass spectra patterns on a tissue section, and it will contribute to a comprehensive view of the tissue environment.
Collapse
Affiliation(s)
- Hidemoto Yamada
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
| | - Lili Xu
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Fumihiro Eto
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Rei Takeichi
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Ariful Islam
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Md Ai Mamun
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Chi Zhang
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Ikuko Yao
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Department of Biomedical Sciences, School of Biological and Environmental Sciences, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan
- Takumi Sakamoto
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- International Mass Imaging Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Shuhei Aramaki
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Department of Radiation Oncology, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Kenji Kikushima
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- International Mass Imaging Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Tomohito Sato
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- International Mass Imaging Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Yutaka Takahashi
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- International Mass Imaging Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Manabu Machida
- Department of Systems Molecular Anatomy, Institute for Medical Photonics Research, Preeminent Medical Photonics Education & Research Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Tomoaki Kahyo
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- International Mass Imaging Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Mitsutoshi Setou
- Department of Cellular and Molecular Anatomy, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- International Mass Imaging Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
- Department of Systems Molecular Anatomy, Institute for Medical Photonics Research, Preeminent Medical Photonics Education & Research Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka 431-3192, Japan
23
Ubbens J, Feldmann MJ, Stavness I, Sharpe AG. Quantitative evaluation of nonlinear methods for population structure visualization and inference. G3: Genes|Genomes|Genetics 2022; 12:6651067. [PMID: 35900169 PMCID: PMC9434256 DOI: 10.1093/g3journal/jkac191] [Received: 06/23/2022] [Accepted: 07/20/2022] [Indexed: 11/20/2022]
Abstract
Population structure (also called genetic structure and population stratification) is the presence of a systematic difference in allele frequencies between subpopulations in a population as a result of nonrandom mating between individuals. It can be informative of genetic ancestry, and in the context of medical genetics, it is an important confounding variable in genome-wide association studies. Recently, many nonlinear dimensionality reduction techniques have been proposed for the population structure visualization task. However, an objective comparison of these techniques has so far been missing from the literature. In this article, we discuss the previously proposed nonlinear techniques and some of their potential weaknesses. We then propose a novel quantitative evaluation methodology for comparing these nonlinear techniques, based on populations for which pedigree is known a priori either through artificial selection or simulation. Based on this evaluation metric, we find graph-based algorithms such as t-SNE and UMAP to be superior to principal component analysis, while neural network-based methods fall behind.
Affiliation(s)
- Jordan Ubbens
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK S7N 0W9, Canada
- Mitchell J Feldmann
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Ian Stavness
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK S7N 0W9, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 0W9, Canada
- Andrew G Sharpe
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK S7N 0W9, Canada
24
Hingray C, Ertan D, Reuber M, Lother A, Chrusciel J, Tarrada A, Michel N, Meyer M, Klemina I, Maillard L, Sanchez S, El‐Hage W. Heterogeneity of patients with functional/dissociative seizures: Three multidimensional profiles. Epilepsia 2022; 63:1500-1515. [PMID: 35305025 PMCID: PMC9790427 DOI: 10.1111/epi.17230] [Received: 11/18/2021] [Revised: 03/16/2022] [Accepted: 03/16/2022] [Indexed: 12/30/2022]
Abstract
OBJECTIVE Current concepts highlight the neurological and psychological heterogeneity of functional/dissociative seizures (FDS). However, it remains uncertain whether a limited number of FDS subtypes can be distinguished. We aimed to identify profiles of distinct FDS subtypes by cluster analysis of a multidimensional dataset without any a priori hypothesis. METHODS We conducted an exploratory, prospective multicenter study of 169 patients with FDS. We collected biographical, trauma (childhood and adulthood traumatic experiences), semiological (seizure characteristics), and psychopathological data (psychiatric comorbidities, dissociation, and alexithymia) through psychiatric interviews and standardized scales. Clusters were identified by the Partitioning Around Medoids method, with the similarity of patients computed using Gower distance. The clusters were compared using analysis of variance, chi-squared, or Fisher exact tests. RESULTS Three patient clusters were identified in this exploratory, hypothesis-generating study and named on the basis of their most prominent characteristics: a "No/Single Trauma" group (31.4%), with more male patients, intellectual disabilities, nonhyperkinetic seizures, and a low level of psychopathology; a "Cumulative Lifetime Traumas" group (42.6%), with clear female predominance, hyperkinetic seizures, relatively common comorbid epilepsy, and a high level of psychopathology; and a "Childhood Traumas" group (26%), commonly with comorbid epilepsy, a history of childhood sexual abuse (75%), and posttraumatic stress disorder, but also with a high level of anxiety and dissociation. SIGNIFICANCE Although our cluster analysis was undertaken without any a priori hypothesis, the nature of the trauma history emerged as the most important differentiator between three common FDS subtypes. 
This subdifferentiation of FDS may facilitate the development of more specific therapeutic programs for each patient profile.
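The Methods above rely on Gower distance, which compares patients described by a mix of numeric and categorical variables by averaging per-variable dissimilarities. The sketch below is a minimal Python illustration of the standard unweighted form, not the study's implementation: numeric variables contribute their absolute difference divided by the variable's range, and nominal variables contribute 0 if the categories match and 1 otherwise. The patient records and variables are hypothetical, and ordinal variables are not given special treatment here. A matrix like this is the kind of input a medoid-based method such as Partitioning Around Medoids works from.

```python
def gower_matrix(records, kinds):
    """Pairwise Gower distances for records mixing numeric and nominal features.

    records: list of equal-length tuples; kinds: "numeric" or "nominal" per column.
    """
    n, p = len(records), len(kinds)
    # Range of each numeric column over the whole dataset (used for normalisation).
    ranges = []
    for k in range(p):
        if kinds[k] == "numeric":
            column = [row[k] for row in records]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                x, y = records[i][k], records[j][k]
                if kinds[k] == "numeric":
                    # Range-normalised absolute difference, always in [0, 1].
                    total += abs(x - y) / ranges[k] if ranges[k] > 0 else 0.0
                else:
                    # Nominal feature: 0 if the categories match, 1 otherwise.
                    total += 0.0 if x == y else 1.0
            # Unweighted average over features; the matrix is symmetric.
            dist[i][j] = dist[j][i] = total / p
    return dist


# Hypothetical patients: (age, sex, seizure type); not data from the study.
patients = [
    (30, "F", "hyperkinetic"),
    (50, "M", "hyperkinetic"),
    (70, "F", "nonhyperkinetic"),
]
D = gower_matrix(patients, ["numeric", "nominal", "nominal"])
print(D[0][1])  # 0.5: age differs by half the range, sex differs, seizure type matches
```

Because every per-variable term lies in [0, 1], no single variable type dominates the distance, which is why this measure is a common choice for heterogeneous clinical data.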
Affiliation(s)
- Coraline Hingray
- Department of Neurology, Nancy Regional University Hospital Center, Nancy, France
- National Center for Scientific Research, Research Center for Automatic Control, Mixed Unit of Research 7039, University of Lorraine, Nancy, France
- Nancy Psychotherapeutic Center, University Hospital Center for Adult Psychiatry of Greater Nancy, Laxou, France
- Deniz Ertan
- National Center for Scientific Research, Research Center for Automatic Control, Mixed Unit of Research 7039, University of Lorraine, Nancy, France
- Clinical Research Unit, Teppe Institute, Tain‐l’Hérmitage, France
- Markus Reuber
- Academic Neurology Unit, Royal Hallamshire Hospital, University of Sheffield, Sheffield, UK
- Jan Chrusciel
- Public Health and Performance Territorial Center, Troyes Hospital Center, Troyes, France
- Alexis Tarrada
- Department of Neurology, Nancy Regional University Hospital Center, Nancy, France
- National Center for Scientific Research, Research Center for Automatic Control, Mixed Unit of Research 7039, University of Lorraine, Nancy, France
- Nathalie Michel
- La Conception Hospital, Marseille University Hospitals, Public Assistance–Marseille Hospitals, Marseille, France
- Mylene Meyer
- Department of Neurology, Nancy Regional University Hospital Center, Nancy, France
- Irina Klemina
- Department of Neurology, Nancy Regional University Hospital Center, Nancy, France
- Louis Maillard
- Department of Neurology, Nancy Regional University Hospital Center, Nancy, France
- National Center for Scientific Research, Research Center for Automatic Control, Mixed Unit of Research 7039, University of Lorraine, Nancy, France
- Stephane Sanchez
- Public Health and Performance Territorial Center, Troyes Hospital Center, Troyes, France
- Wissam El‐Hage
- Mixed Unit of Research 1253, iBrain, National Institute of Health and Medical Research, University of Tours, Tours, France
- Psychiatry Center, Tours Regional University Hospital Center, Tours, France
25
26
Bej S, Sarkar J, Biswas S, Mitra P, Chakrabarti P, Wolkenhauer O. Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach. Nutr Diabetes 2022; 12:27. [PMID: 35624098 PMCID: PMC9142500 DOI: 10.1038/s41387-022-00206-2] [Received: 12/13/2020] [Revised: 03/11/2022] [Accepted: 05/18/2022] [Indexed: 12/05/2022]
Abstract
BACKGROUND Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. Here we focus on the detection of T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which contains a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns, for 10,125 T2DM patients. METHODS Epidemiological data pose analytical challenges because of the diverse types of features they contain. In this case, conventional application of the state-of-the-art dimension reduction tool UMAP was found to be ineffective for the NFHS-4 dataset, which contains diverse feature types. We implemented a distributed clustering workflow combining different similarity measure settings of UMAP, clustering continuous, ordinal and nominal features separately. We integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data. RESULTS Our analysis reveals four significant clusters, two of which consist mainly of non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mainly of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients practised a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods. CONCLUSIONS From a methodological perspective, we show that for diverse data types, which are frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is more effective than the conventional use of the UMAP algorithm. The application of a UMAP-based clustering workflow to this type of dataset is itself novel. 
Our findings demonstrate heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. We conclude that the existence of significant non-obese T2DM sub-populations characterized by younger age groups and economic disadvantage calls for different screening criteria for T2DM among rural Indian residents.
Affiliation(s)
- Saptarshi Bej
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany.
- Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany.
- Jit Sarkar
- Division of Cell Biology and Physiology, CSIR-Indian Institute of Chemical Biology, Kolkata, India.
- Academy of Innovative and Scientific Research, Ghaziabad, India.
- Saikat Biswas
- Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, India
- Pabitra Mitra
- Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur, India
- Partha Chakrabarti
- Division of Cell Biology and Physiology, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- Academy of Innovative and Scientific Research, Ghaziabad, India
- Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany.
- Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany.
- Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, Stellenbosch, South Africa.
27
Omee SS, Louis SY, Fu N, Wei L, Dey S, Dong R, Li Q, Hu J. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns (N Y) 2022; 3:100491. [PMID: 35607621 PMCID: PMC9122959 DOI: 10.1016/j.patter.2022.100491] [Received: 11/27/2021] [Revised: 02/22/2022] [Accepted: 03/18/2022] [Indexed: 01/02/2023]
Abstract
Machine-learning-based materials property prediction models have emerged as a promising approach for new materials discovery, and among them graph neural networks (GNNs) have shown the best performance owing to their capability to learn high-level features from crystal structures. However, existing GNN models suffer from a lack of scalability, high hyperparameter tuning complexity, and performance constrained by over-smoothing. We propose a scalable global graph attention neural network model, DeeperGATGNN, with differentiable group normalization (DGN) and skip connections for high-performance materials property prediction. Our systematic benchmark studies show that our model achieves state-of-the-art prediction results on five out of six datasets, outperforming five existing GNN models by up to 10%. Our model is also the most scalable in terms of graph convolution layers, which allows us to train very deep networks (e.g., >30 layers) without significant performance degradation. Our implementation is available at https://github.com/usccolumbia/deeperGATGNN.
Affiliation(s)
- Sadman Sadeed Omee
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Steph-Yves Louis
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Nihang Fu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Lai Wei
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Sourin Dey
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Rongzhi Dong
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Qinyang Li
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
- Jianjun Hu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA
28
Prediction of GPCR activity using Machine Learning. Comput Struct Biotechnol J 2022; 20:2564-2573. [PMID: 35685352 PMCID: PMC9163700 DOI: 10.1016/j.csbj.2022.05.016] [Received: 03/10/2022] [Revised: 05/08/2022] [Accepted: 05/09/2022] [Indexed: 11/20/2022]
Abstract
GPCRs are the targets of one-third of FDA-approved drugs; however, the development of new drug molecules targeting GPCRs is limited by the lack of mechanistic understanding of the GPCR structure–activity–function relationship. To modulate GPCR activity with highly specific drugs and minimal side effects, it is necessary to quantitatively describe the important structural features of a GPCR and correlate them with its activation state. In this study, we developed three machine learning (ML) approaches to predict the conformation state of GPCR proteins. Additionally, we predict the activity level of GPCRs based on their structure. We leverage the unique advantages of each of the three approaches: the interpretability of XGBoost, the minimal feature engineering required by a 3D convolutional neural network, and the graph representation of protein structure in a graph neural network. Using these approaches, we predict the activation state of GPCRs with high accuracy (91%–95%) and their activity level with low error (MAE of 7.15–10.58). Furthermore, interpreting the ML models allows us to determine the importance of each feature in distinguishing between GPCR conformations.
29
Musher LJ, Giakoumis M, Albert J, Del-Rio G, Rego M, Thom G, Aleixo A, Ribas CC, Brumfield RT, Smith BT, Cracraft J. River network rearrangements promote speciation in lowland Amazonian birds. Science Advances 2022; 8:eabn1099. [PMID: 35394835 PMCID: PMC8993111 DOI: 10.1126/sciadv.abn1099] [Indexed: 05/04/2023]
Abstract
Large Amazonian rivers impede dispersal for many species, but lowland river networks frequently rearrange, thereby altering the location and effectiveness of river barriers through time. These rearrangements may promote biotic diversification by facilitating episodic allopatry and secondary contact among populations. We sequenced genome-wide markers to evaluate the histories of divergence and introgression in six Amazonian avian species complexes. We first tested the assumption that rivers are barriers for these taxa and found that even relatively small rivers facilitate divergence. We then tested whether species diverged with gene flow and recovered reticulate histories for all species, including one potential case of hybrid speciation. Our results support the hypothesis that river rearrangements promote speciation and reveal that many rainforest taxa are micro-endemic, unrecognized, and thus threatened with imminent extinction. We propose that Amazonian hyper-diversity originates partly from fine-scale barrier displacement processes, including river dynamics, that allow small populations to differentiate and disperse into secondary contact.
Affiliation(s)
- Lukas J. Musher
- Department of Ornithology, The Academy of Natural Sciences of Drexel University, Philadelphia, PA 19103, USA
- Department of Ornithology, American Museum of Natural History, New York, NY 10028, USA
- Corresponding author.
- Melina Giakoumis
- Department of Biology, City College of New York, New York, NY 10031, USA
- Graduate Center, City University of New York, New York, NY 10016, USA
- James Albert
- Department of Biology, University of Louisiana at Lafayette, Lafayette, LA 70503, USA
- Glaucia Del-Rio
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Marco Rego
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Gregory Thom
- Department of Ornithology, American Museum of Natural History, New York, NY 10028, USA
- Alexandre Aleixo
- Finnish Museum of Natural History of Helsinki, University of Helsinki, Helsinki, Finland
- Museu Paraense Emílio Goeldi, Belém, Brazil
- Instituto Tecnológico Vale, Belém, Brazil
- Camila C. Ribas
- Instituto Nacional de Pesquisas da Amazônia, INPA, Manaus, Brazil
- Robb T. Brumfield
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, New York, NY 10028, USA
- Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY 10028, USA
30
The Principal Component Analysis as a tool for predicting the mechanical properties of Perovskites and Inverse Perovskites. Chem Phys Lett 2022. [DOI: 10.1016/j.cplett.2022.139615] [Indexed: 11/22/2022]
31
Luo R, Li Y, Wu Z, Zhang Y, Luo J, Yang K, Qin X, Wang H, Huang R, Wang H, Luo H. Comprehensive Analysis of Microsatellite-Related Transcriptomic Signature and Identify Its Clinical Value in Colon Cancer. Front Surg 2022; 9:871823. [PMID: 35433823 PMCID: PMC9008782 DOI: 10.3389/fsurg.2022.871823] [Received: 02/08/2022] [Accepted: 03/03/2022] [Indexed: 12/03/2022]
Abstract
Background Microsatellite status has been shown to be an important prognostic factor and treatment reference in colon cancer. The transcriptome profile and tumor microenvironment differ between microsatellite statuses. Metastatic colon cancer patients with microsatellite instability-high (MSI-H) tumors are sensitive to immune checkpoint inhibitors (ICIs) but not to fluorouracil. Efforts have been devoted to identifying predictive factors for immunotherapy. Methods We analyzed the transcriptome profiles of different microsatellite statuses in colon cancer using single-cell and bulk transcriptome data from publicly available databases. Immune cells in the tumor microenvironment were analyzed with the ESTIMATION algorithm. The microsatellite-related gene signature (MSRS) was constructed by least absolute shrinkage and selection operator (LASSO) Cox regression based on the differentially expressed genes (DEGs), and its prognostic value and its value in predicting response to immunotherapy were assessed. The prognostic value of the MSRS was also validated in another cohort. Results MSI-H cancer cells clustered differently in the dimension reduction plot. Most immune cell types had a higher proportion in the tumor immune microenvironment, except for CD56 bright natural killer cells. A total of 238 DEGs were identified. Based on these 238 DEGs, a neural network was constructed with a Kappa coefficient of 0.706 in the testing cohort. The MSRS is a favorable prognostic factor for overall survival, which was also validated in another cohort (GSE39582). In addition, the MSRS is correlated with tumor mutation burden in MSI-H colon cancer. However, the MSRS is only a marginally satisfactory predictor of response to immunotherapy, with an area under the curve (AUC) of 0.624. Conclusion We developed the MSRS, a robust prognostic factor for overall survival, although it is only a marginally satisfactory predictor of immunotherapy response. Further studies may be needed to improve its predictive ability.
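The neural network result above is summarised by a Kappa coefficient (0.706). Assuming this is Cohen's kappa (the abstract does not name the variant), it is the observed agreement p_o between predicted and true labels corrected for the agreement p_e expected by chance from the label frequencies: kappa = (p_o - p_e) / (1 - p_e). The minimal Python sketch below computes it from two label lists; the function name and the labels are hypothetical and not taken from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: fraction of positions where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement p_e from the two marginal label distributions.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both sequences use one identical label
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels (not from the study): 1 = MSI-H, 0 = other.
observed = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(cohens_kappa(observed, predicted))  # 0.5
```

Here raw agreement is 0.75, but half of that agreement would be expected by chance, so kappa is 0.5; this is why kappa is usually preferred over plain accuracy when class frequencies are unbalanced.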
Affiliation(s)
- Rui Luo
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Yang Li
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Zhijie Wu
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Yuanxin Zhang
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Jian Luo
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Keli Yang
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xiusen Qin
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Huaiming Wang
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Rongkang Huang
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- *Correspondence: Rongkang Huang
- Hui Wang
- Department of Colorectal Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Hongzhi Luo
- Department of Tumor Surgery, Zhongshan City People's Hospital, Zhongshan, China
32
Zhang X, Chan T, Carbonella J, Gong X, Ahmed N, Liu C, Demandel I, Zhang J, Pashankar F, Mak M. A microfluidic-informatics assay for quantitative physical occlusion measurement in sickle cell disease. Lab on a Chip 2022; 22:1126-1136. [PMID: 35174373 DOI: 10.1039/d2lc00043a] [Indexed: 06/14/2023]
Abstract
Sickle cell disease (SCD) is a genetic condition that causes abnormalities in hemoglobin mechanics. Those affected are at high risk of vaso-occlusive crisis (VOC), which can induce life-threatening symptoms. Measurements related to vaso-occlusion facilitate diagnosis of the patient's disease state. To complement existing readouts, we design a microfluidic-informatics analytical system with varied confined geometries for quantifying sickle cell disease occlusion. We detect an increase in physical occlusion events in the most severe group, hemoglobin SS. We use bioinformatics and modeling to quantify the in vitro disease severity score (DSS) of individual patients. We also show the potential effect of hydration, clinically recommended for crisis management, in reducing the disease severity of high-risk patients. Overall, we demonstrate the device as an easy-to-use assay that quickly extracts occlusion information with a simple setup and minimal additional instruments. We show that the device can provide physical readouts distinct from clinical data, and that it is sensitive to differences between samples from patients with different disease severity. Finally, we demonstrate the system as a potential platform for testing the effectiveness of therapeutic strategies (e.g., hydration) in reducing sickle cell disease severity.
Affiliation(s)
- Xingjian Zhang
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
- Trevor Chan
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
- Judith Carbonella
- Section of Pediatric Hematology and Oncology, Yale University School of Medicine, New Haven, CT, 06520, USA
- Xiangyu Gong
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
- Noureen Ahmed
- Section of Pediatric Hematology and Oncology, Yale University School of Medicine, New Haven, CT, 06520, USA
- Chang Liu
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
- Israel Demandel
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
- Junqi Zhang
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
- Farzana Pashankar
- Section of Pediatric Hematology and Oncology, Yale University School of Medicine, New Haven, CT, 06520, USA
- Michael Mak
- Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA.
33
A Novel Feature Identification Method of Pipeline In-Line Inspected Bending Strain Based on Optimized Deep Belief Network Model. Energies 2022. [DOI: 10.3390/en15041586] [Indexed: 11/17/2022]
Abstract
Long-distance oil and gas pipelines often pass through areas with unstable geological conditions or natural disasters. As a result, they are prone to bending, displacement, and deformation under external environmental loading, which threatens the safe operation of pipelines. In-line inspection based on high-precision inertial measurement units (IMUs) has become the main technique for detecting pipeline bending stress and strain. To address the inconsistent identification, low identification efficiency, and high misjudgment rate of traditional manual identification methods, this work proposes a feature identification approach for in-line inspected pipeline bending strain based on an optimized deep belief network (DBN) model. The model automatically learns features from pipeline bending strain signals and performs classification and identification. After training and testing on actual pipeline bending strain inspection data, the model accurately identified and classified various pipeline features, with an identification accuracy of 97.8% and an identification efficiency of 0.02 min/km. The high efficiency, high accuracy, and strong robustness of the method can effectively improve in-line inspection of pipelines subjected to bending strain loads.
34
Intelligent Robust Cross-Domain Fault Diagnostic Method for Rotating Machines Using Noisy Condition Labels. Mathematics 2022. [DOI: 10.3390/math10030455] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Cross-domain fault diagnosis methods, which target practical industrial scenarios in which training and testing data come from different machinery working regimes, have been widely and successfully developed in recent years. Owing to their remarkable effectiveness on such problems, deep learning-based domain adaptation approaches have attracted increasing attention. However, existing methods in the literature are generally sensitive to environmental noise and limited data availability, and it is difficult for them to achieve promising performance under harsh practical conditions. This paper proposes a new cross-domain fault diagnosis method with enhanced robustness. Noisy labels are introduced to significantly increase the generalization ability of the data-driven model. Promising diagnostic performance can be obtained under strong noise interference in testing, as well as in practical cases with low-quality data. Experiments on two rotating machinery datasets are carried out for validation. The results indicate that the proposed algorithm is well suited to real industrial environments and achieves promising performance under varying working conditions.
35
Fu H, Sun H, Kong H, Lou B, Chen H, Zhou Y, Huang C, Qin L, Shan Y, Dai S. Discoveries in Pancreatic Physiology and Disease Biology Using Single-Cell RNA Sequencing. Front Cell Dev Biol 2022; 9:732776. [PMID: 35141228 PMCID: PMC8819087 DOI: 10.3389/fcell.2021.732776] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2021] [Accepted: 12/15/2021] [Indexed: 11/16/2022]
Abstract
Transcriptome analysis is used to study gene expression in human tissues. By characterizing the endocrine function of the pancreas in physiology and pathology, as well as gene expression in pancreatic tumors, it can promote the discovery of new therapeutic targets for related diseases. Compared with whole-tissue RNA sequencing, single-cell RNA sequencing (scRNA-seq) can detect transcriptional activity within individual cells. scRNA-seq has made invaluable contributions to discovering previously unknown cell subtypes in normal and diseased pancreases, studying the functional role of rare islet cells, and studying the various cell types involved in diabetes and cancer. Here, we review recent in vitro and in vivo advances, enabled by single-cell sequencing technology, in understanding pancreatic physiology and pathology, which may provide new insights for optimizing treatment strategies for diabetes and pancreatic cancer.
Affiliation(s)
- Haotian Fu
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Hongwei Sun
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, Wenzhou, China
- Hongru Kong
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Bin Lou
- Department of Surgery, The Third People’s Hospital of Yuhang District, Hangzhou, China
- Hao Chen
- Department of Thyroid Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Yilin Zhou
- Department of Biology, Boston University, Boston, MA, United States
- Chaohao Huang
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Lei Qin
- Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- *Correspondence: Lei Qin, ; Yunfeng Shan, ; Shengjie Dai,
- Yunfeng Shan
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, Wenzhou, China
- *Correspondence: Lei Qin, ; Yunfeng Shan, ; Shengjie Dai,
- Shengjie Dai
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- *Correspondence: Lei Qin, ; Yunfeng Shan, ; Shengjie Dai,
36
A Pricing Model for Urban Rental Housing Based on Convolutional Neural Networks and Spatial Density: A Case Study of Wuhan, China. ISPRS International Journal of Geo-Information 2022. [DOI: 10.3390/ijgi11010053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
With urbanization and the growth of floating populations, rental housing has become an increasingly common living choice, and housing rental prices have attracted great attention from individuals, enterprises, and the government. Housing rental prices are principally estimated from structural, locational, and neighborhood variables, whose relationships are complicated and can hardly be captured fully by simple one-dimensional models; in addition, the influence of geographic objects on price may vary as their quantity increases. However, existing pricing models usually feed these structural, locational, and neighborhood variables into neural networks as one-dimensional inputs and often neglect the aggregated effects of geographic objects, which may lead to unstable rental price estimates. Therefore, this paper proposes a rental housing price model based on a convolutional neural network (CNN) and the synthetic spatial density of points of interest (POIs). The CNN can efficiently extract complex characteristics from the relevant housing variables, and the two-dimensional locational and neighborhood variables derived from the synthetic spatial density effectively reflect the aggregated effects of urban facilities on rental prices, thereby improving the accuracy of the model. Taking Wuhan, China, as the study area, the proposed method achieves accurate rental price estimates (coefficient of determination (R²) = 0.9097, root mean square error (RMSE) = 3.5126) that compare favorably with other commonly used pricing models.
37
Bai Q, Shi M, Sun X, Lou Q, Peng H, Qu Z, Fan J, Dai L. Comprehensive analysis of the m6A-related molecular patterns and diagnostic biomarkers in osteoporosis. Front Endocrinol (Lausanne) 2022; 13:957742. [PMID: 36034449 PMCID: PMC9399504 DOI: 10.3389/fendo.2022.957742] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/31/2022] [Accepted: 06/30/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND N6-methyladenosine (m6A) modification is a critical epigenetic modification in eukaryotes that is involved in several biological processes and in the occurrence of diseases. However, the roles and regulatory mechanisms of m6A regulators in osteoporosis (OP) remain unclear. The purpose of this study was therefore to explore the roles and mechanisms of m6A regulators in OP. METHODS The mRNA and microRNA (miRNA) expression profiles were obtained from the GSE56815, GSE7158, and GSE93883 datasets in the Gene Expression Omnibus (GEO). The differential expression of 21 m6A regulators between high-bone mineral density (BMD) and low-BMD women was identified. Then, consensus clustering of low-BMD women was performed based on the differentially expressed (DE) m6A regulators. The m6A-related differentially expressed genes (DEGs), the differentially expressed miRNAs (DE-miRNAs), and their biological functions were investigated. Moreover, a weighted gene co-expression network analysis (WGCNA) was performed to identify OP-related hub modules, hub genes, and functional pathways. Next, an m6A regulator-target-pathway network and a competing endogenous RNA (ceRNA) network in key modules were constructed. A least absolute shrinkage and selection operator (LASSO) Cox regression model and a support vector machine-recursive feature elimination (SVM-RFE) model were constructed to identify candidate genes for OP prediction. Receiver operating characteristic (ROC) curves were used to validate the performance of the predictive models and candidate genes. RESULTS A total of 10,520 DEGs, 13 DE-m6A regulators, and 506 DE-miRNAs were identified between high-BMD and low-BMD women. Two m6A-related subclusters were classified for OP based on the 13 DE-m6A regulators. A total of 5,260 m6A-related DEGs were identified between the two subclusters; the PI3K-Akt, MAPK, and immune-related pathways and bone metabolism were mainly enriched in cluster 2, whereas cell cycle-related pathways, RNA methylation, and cell death-related pathways were significantly involved in cluster 1. Five modules were identified as key modules based on WGCNA, and an m6A regulator-target gene-pathway network and a ceRNA network were constructed in the brown module. Moreover, three m6A regulators (FTO, YTHDF2, and CBLL1) were selected as candidate genes for OP. CONCLUSION m6A regulators play an important role in the occurrence and diagnosis of OP.
Affiliation(s)
- Qiong Bai
- Laboratory of Genetic Breeding and Molecular Biology, Southwest Forestry University, Kunming, China
- Min Shi
- Laboratory of Genetic Breeding and Molecular Biology, Southwest Forestry University, Kunming, China
- Xinli Sun
- National Wetland Ecosystem Fixed Research Station of Yunnan Dianchi, Southwest Forestry University, Kunming, China
- Qiu Lou
- Department of Internal Medicine, The Affiliated Hospital of Yunnan University, Kunming, China
- Hangya Peng
- Department of Internal Medicine, Yunnan Fuwai Cardiovascular Hospital, Kunming, China
- Zhuan Qu
- Department of Internal Medicine, Yunnan Fuwai Cardiovascular Hospital, Kunming, China
- Jiashuang Fan
- Department of Internal Medicine, Yunnan Fuwai Cardiovascular Hospital, Kunming, China
- *Correspondence: Lifen Dai, ; Jiashuang Fan,
- Lifen Dai
- Department of Internal Medicine, Yunnan Fuwai Cardiovascular Hospital, Kunming, China
- Department of Internal Medicine, The Second Affiliated Hospital of Kunming Medical University, Kunming, China
- *Correspondence: Lifen Dai, ; Jiashuang Fan,
38
Zhang Q, Du Q, Liu G. A whole-process interpretable and multi-modal deep reinforcement learning for diagnosis and analysis of Alzheimer's disease. J Neural Eng 2021; 18:066032. [PMID: 34753116 DOI: 10.1088/1741-2552/ac37cc] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/27/2021] [Accepted: 11/09/2021] [Indexed: 01/09/2023]
Abstract
Objective. Alzheimer's disease (AD), a common disease of the elderly with unknown etiology, affects many people, especially as the population ages and the disease increasingly appears at younger ages. Current artificial intelligence (AI) methods based on individual information or magnetic resonance imaging (MRI) can address diagnostic sensitivity and specificity but still face challenges in interpretability and clinical feasibility. In this study, we propose an interpretable multimodal deep reinforcement learning model for inferring pathological features and diagnosing AD. Approach. First, for better clinical feasibility, the compressed-sensing MRI image is reconstructed using an interpretable deep reinforcement learning model. Then, the reconstructed MRI is input into a fully convolutional neural network to generate a pixel-level disease probability risk map (DPM) of the whole brain for AD. The DPM of important brain regions and individual information are then input into an attention-based fully deep neural network to obtain the diagnostic results and analyze the biomarkers. We used 1349 multi-center samples to construct and test the model. Main results. The model obtained areas under the curve of 99.6% ± 0.2%, 97.9% ± 0.2%, and 96.1% ± 0.3% on ADNI, AIBL, and NACC, respectively. The model also provides an effective analysis of multimodal pathology, predicting the imaging biomarkers in MRI and the weight of each item of individual information. The deep reinforcement learning model designed in this study can not only accurately diagnose AD but also analyze potential biomarkers. Significance. The model builds a bridge between clinical practice and AI diagnosis and provides a viewpoint on the interpretability of AI technology.
Affiliation(s)
- Quan Zhang
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, People's Republic of China
- Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Nankai University, Tianjin 300350, People's Republic of China
- Qian Du
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, People's Republic of China
- Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Nankai University, Tianjin 300350, People's Republic of China
- Guohua Liu
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, People's Republic of China
- Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Nankai University, Tianjin 300350, People's Republic of China
- Engineering Research Center of Thin Film Optoelectronics Technology, Ministry of Education, Nankai University, Tianjin 300350, People's Republic of China
39
Corredor G, Toro P, Koyuncu C, Lu C, Buzzy C, Bera K, Fu P, Mehrad M, Ely KA, Mokhtari M, Yang K, Chute D, Adelstein DJ, Thompson LDR, Bishop JA, Faraji F, Thorstad W, Castro P, Sandulache V, Koyfman SA, Lewis JS, Madabhushi A. An Imaging Biomarker of Tumor-Infiltrating Lymphocytes to Risk-Stratify Patients With HPV-Associated Oropharyngeal Cancer. J Natl Cancer Inst 2021; 114:609-617. [PMID: 34850048 PMCID: PMC9002277 DOI: 10.1093/jnci/djab215] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/18/2021] [Revised: 08/03/2021] [Accepted: 11/19/2021] [Indexed: 01/09/2023]
Abstract
BACKGROUND Human papillomavirus (HPV)-associated oropharyngeal squamous cell carcinoma (OPSCC) has excellent control rates compared with nonvirally associated OPSCC. Multiple trials are actively testing whether de-escalating treatment intensity for these patients can maintain oncologic equipoise while reducing treatment-related toxicity. We have developed OP-TIL, a biomarker that characterizes the spatial interplay between tumor-infiltrating lymphocytes (TILs) and surrounding cells in histology images. Herein, we sought to test whether OP-TIL can segregate stage I HPV-associated OPSCC patients into low-risk and high-risk groups and aid in patient selection for de-escalation clinical trials. METHODS The association between OP-TIL and patient outcome was explored on whole-slide hematoxylin and eosin images from 439 stage I HPV-associated OPSCC patients across 6 institutional cohorts. One institutional cohort (n = 94) was used to identify the most prognostic features and train a Cox regression model to predict the risk of recurrence and death. Survival analysis was used to validate the algorithm as a biomarker of recurrence or death in the remaining 5 cohorts (n = 345). All statistical tests were 2-sided. RESULTS OP-TIL separated stage I HPV-associated OPSCC patients with a smoking history of 30 or fewer pack-years into low-risk (2-year disease-free survival [DFS] = 94.2%; 5-year DFS = 88.4%) and high-risk (2-year DFS = 82.5%; 5-year DFS = 74.2%) groups (hazard ratio = 2.56, 95% confidence interval = 1.52 to 4.32; P < .001), even after adjusting for age, smoking status, T and N classification, and treatment modality in multivariable analysis for DFS (hazard ratio = 2.27, 95% confidence interval = 1.32 to 3.94; P = .003). CONCLUSIONS OP-TIL can identify stage I HPV-associated OPSCC patients who are likely to be poor candidates for treatment de-escalation. Following validation on previously completed multi-institutional clinical trials, OP-TIL has the potential to be a biomarker, beyond clinical stage and HPV status, that can be used clinically to optimize patient selection for de-escalation.
Affiliation(s)
- Germán Corredor
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA
- Paula Toro
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Can Koyuncu
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Cheng Lu
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Christina Buzzy
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Kaustav Bera
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Pingfu Fu
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Mitra Mehrad
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Kim A Ely
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Mojgan Mokhtari
- Department of Biomedical Engineering, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA
- Kailin Yang
- Department of Radiation Oncology, Cleveland Clinic, Cleveland, OH, USA
- Deborah Chute
- Department of Anatomic Pathology, Cleveland Clinic, Cleveland, OH, USA
- David J Adelstein
- Department of Medicine, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Lester D R Thompson
- Department of Pathology, Southern California Permanente Medical Group, Woodland Hills, CA, USA
- Justin A Bishop
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Farhoud Faraji
- Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, UC San Diego Health, La Jolla, CA, USA
- Wade Thorstad
- Department of Radiation Oncology, Washington University in St. Louis, St. Louis, MO, USA
- Patricia Castro
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
- Vlad Sandulache
- Department of Otolaryngology-Head and Neck Surgery, Baylor College of Medicine, Houston, TX, USA
- ENT Section, Operative Care Line, Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX, USA
- Center for Translational Research on Inflammatory Disease (CTRID), Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX, USA
- Shlomo A Koyfman
- Department of Radiation Oncology, Cleveland Clinic, Cleveland, OH, USA
- James S Lewis
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Anant Madabhushi
- Correspondence to: Anant Madabhushi, PhD, Center of Computational Imaging and Personalized Diagnostics, Case Western Reserve University, 2071 Martin Luther King Drive, Cleveland, OH 44106-7207, USA (e-mail: )
40
Papanicolau-Sengos A, Aldape K. DNA Methylation Profiling: An Emerging Paradigm for Cancer Diagnosis. Annual Review of Pathology: Mechanisms of Disease 2021; 17:295-321. [PMID: 34736341 DOI: 10.1146/annurev-pathol-042220-022304] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Indexed: 11/09/2022]
Abstract
Histomorphology has been a mainstay of cancer diagnosis in anatomic pathology for many years. DNA methylation profiling is an additional emerging tool that will serve as an adjunct to increase accuracy of pathological diagnosis. Genome-wide interrogation of DNA methylation signatures, in conjunction with machine learning methods, has allowed for the creation of clinical-grade classifiers, most prominently in central nervous system and soft tissue tumors. Tumor DNA methylation profiling has led to the identification of new entities and the consolidation of morphologically disparate cancers into biologically coherent entities, and it will progressively become mainstream in the future. In addition, DNA methylation patterns in circulating tumor DNA hold great promise for minimally invasive cancer detection and classification. Despite practical challenges that accompany any new technology, methylation profiling is here to stay and will become increasingly utilized as a cancer diagnostic tool across a range of tumor types. Expected final online publication date for the Annual Review of Pathology: Mechanisms of Disease, Volume 17 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Kenneth Aldape
- Laboratory of Pathology, National Cancer Institute, Bethesda, Maryland 20892, USA; ,
41
Tjoa E, Guan C. A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:4793-4813. [PMID: 33079674 DOI: 10.1109/tnnls.2020.3027314] [Citation(s) in RCA: 311] [Impact Index Per Article: 103.7] [Indexed: 05/06/2023]
Abstract
Recently, artificial intelligence and machine learning in general have demonstrated remarkable performance in many tasks, from image processing to natural language processing, especially with the advent of deep learning (DL). Along with research progress, they have encroached upon many different fields and disciplines. Some of these fields, such as the medical sector, require a high level of accountability and thus transparency. Explanations for machine decisions and predictions are therefore needed to justify their reliability. This requires greater interpretability, which often means that we need to understand the mechanisms underlying the algorithms. Unfortunately, the black-box nature of DL is still unresolved, and many machine decisions are still poorly understood. We review the interpretability approaches suggested by different research works and categorize them. The categories reflect different dimensions of interpretability research, from approaches that provide "obviously" interpretable information to studies of complex patterns. By applying the same categorization to interpretability in medical research, it is hoped that: 1) clinicians and practitioners can subsequently approach these methods with caution; 2) insights into interpretability will develop with greater consideration for medical practice; and 3) initiatives to push forward data-based, mathematically grounded, and technically grounded medical education are encouraged.
42
Updates in deep learning research in ophthalmology. Clin Sci (Lond) 2021; 135:2357-2376. [PMID: 34661658 DOI: 10.1042/cs20210207] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/12/2021] [Revised: 09/14/2021] [Accepted: 09/29/2021] [Indexed: 12/13/2022]
Abstract
Ophthalmology has been one of the early adopters of artificial intelligence (AI) within the medical field. Deep learning (DL), in particular, has garnered significant attention owing to the availability of large amounts of data and digitized ocular images. Currently, AI in ophthalmology is mainly focused on improving disease classification and supporting decision-making when treating ophthalmic diseases such as diabetic retinopathy, age-related macular degeneration (AMD), glaucoma, and retinopathy of prematurity (ROP). However, most of the DL systems (DLSs) developed thus far remain in the research stage, and only a handful have achieved clinical translation. This is due to a combination of factors, including concerns over security and privacy, poor generalizability, trust and explainability issues, unfavorable end-user perceptions, and uncertain economic value. Overcoming this challenge will require a combined approach. First, emerging techniques such as federated learning (FL), generative adversarial networks (GANs), autonomous AI, and blockchain will play an increasingly critical role in enhancing privacy, collaboration, and DLS performance. Next, compliance with reporting and regulatory guidelines, such as CONSORT-AI and STARD-AI, will be required to improve transparency, minimize abuse, and ensure reproducibility. Third, frameworks will be required to obtain patient consent, perform ethical assessment, and evaluate end-user perception. Lastly, proper health economic assessment (HEA) must be performed to provide financial visibility during the early phases of DLS development. This is necessary to manage resources prudently and guide the development of DLSs.
43
Klein S, Duda DG. Machine Learning for Future Subtyping of the Tumor Microenvironment of Gastro-Esophageal Adenocarcinomas. Cancers (Basel) 2021; 13:4919. [PMID: 34638408 PMCID: PMC8507866 DOI: 10.3390/cancers13194919] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/22/2021] [Revised: 09/27/2021] [Accepted: 09/28/2021] [Indexed: 12/11/2022]
Abstract
Tumor progression involves an intricate interplay between malignant cells and their surrounding tumor microenvironment (TME) at specific sites. The TME is dynamic and is composed of stromal, parenchymal, and immune cells, which mediate cancer progression and therapy resistance. Evidence from preclinical and clinical studies has revealed that targeting and reprogramming the TME can be a promising approach to achieve anti-tumor effects in several cancers, including gastro-esophageal adenocarcinoma (GEA). Thus, there is great interest in using modern technology to understand the components relevant to programming the TME. Here, we discuss machine learning, which has recently gained increasing interest because of its ability to measure tumor parameters at the cellular level, reveal relevant global features, and generate prognostic models. In this review, we discuss the relevant stromal composition of the TME in GEAs and how these components could be integrated. We also review current progress in the application of machine learning in different medical disciplines that are relevant to the management and study of GEA.
Affiliation(s)
- Sebastian Klein
- Gerhard-Domagk-Institute for Pathology, University Hospital Münster, 48149 Münster, Germany
- Institute for Pathology, Faculty of Medicine, University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Dan G. Duda
- Edwin L. Steele Laboratories for Tumor Biology, Department of Radiation Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02478, USA
44
Course MM, Sulovari A, Gudsnuk K, Eichler EE, Valdmanis PN. Characterizing nucleotide variation and expansion dynamics in human-specific variable number tandem repeats. Genome Res 2021; 31:1313-1324. [PMID: 34244228 PMCID: PMC8327921 DOI: 10.1101/gr.275560.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/25/2021] [Accepted: 06/25/2021] [Indexed: 12/14/2022]
Abstract
There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and their mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partly owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including those from the Human Genome Structural Variation Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare the composition of their alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition (in PCBP3), as well as several VNTRs that differ significantly in length between superpopulations (in ART1, PROP1, DYNC2I1, and LOC102723906). We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.
Affiliation(s)
- Meredith M Course
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Arvis Sulovari
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Kathryn Gudsnuk
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Paul N Valdmanis
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
45
Corredor G, Toro P, Bera K, Rasmussen D, Viswanathan VS, Buzzy C, Fu P, Barton LM, Stroberg E, Duval E, Gilmore H, Mukhopadhyay S, Madabhushi A. Computational pathology reveals unique spatial patterns of immune response in H&E images from COVID-19 autopsies: preliminary findings. J Med Imaging (Bellingham) 2021; 8:017501. [PMID: 34268443 DOI: 10.1117/1.jmi.8.s1.017501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 06/28/2021] [Indexed: 12/22/2022]
Abstract
Purpose: We used computerized image analysis and machine learning approaches to characterize spatial arrangement features of the immune response in digitized H&E tissue images of the lung from autopsies of patients with coronavirus disease 2019 (COVID-19). Additionally, we applied our approach to tease out potential morphometric differences between autopsies of patients who succumbed to COVID-19 and those who succumbed to H1N1. Approach: H&E lung whole slide images from autopsy specimens of nine COVID-19 and two H1N1 patients were computationally interrogated. A total of 606 image patches (~55 per patient) of 1024 × 882 pixels were extracted from the 11 autopsied patient studies. A watershed-based segmentation approach in conjunction with a machine learning classifier was employed to identify two nuclei families: lymphocytes and non-lymphocytes (i.e., other nucleated cells such as pneumocytes, macrophages, and neutrophils). Based on the proximity of individual nuclei, clusters were constructed for each nuclei family. For each resulting cluster, a series of quantitative measurements relating to the architecture and density of nuclei clusters was calculated. A receiver operating characteristic-based feature selection method, violin plots, and the t-distributed stochastic neighbor embedding algorithm were employed to study differences in immune patterns. Results: In COVID-19, the immune response consistently showed multiple small lymphocyte clusters, suggesting that the lymphocyte response is rather modest, possibly due to lymphocytopenia. In H1N1, we found larger lymphocyte clusters that were proximal to large clusters of non-lymphocytes, a possible reflection of increased prevalence of macrophages and other immune cells. Conclusion: Our study shows the potential of computational pathology to uncover immune response features that may not be obvious on routine visual histopathologic inspection.
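The abstract builds clusters for each nuclei family "based on the proximity of the individual nuclei" but does not state the linkage rule. A minimal sketch, assuming each nucleus is reduced to a centroid and that any two nuclei within a fixed distance are linked (single-linkage via union-find); `proximity_clusters`, `cluster_stats`, the threshold, and the bounding-box density measure are all hypothetical stand-ins, not the authors' implementation or feature set.

```python
import math


def proximity_clusters(centroids, max_dist):
    """Link every pair of nuclei whose centroids are within max_dist of each
    other and return the connected components (single-linkage clustering).
    Hypothetical helper: max_dist is an assumed pixel threshold."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise comparison, acceptable for a single image patch
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= max_dist:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def cluster_stats(centroids, clusters):
    """Per-cluster size and a simple density (nuclei per bounding-box area,
    area clamped to at least 1 to avoid division by zero)."""
    stats = []
    for members in clusters:
        xs = [centroids[k][0] for k in members]
        ys = [centroids[k][1] for k in members]
        area = max(1.0, (max(xs) - min(xs)) * (max(ys) - min(ys)))
        stats.append({"size": len(members), "density": len(members) / area})
    return stats


# Hypothetical lymphocyte centroids (x, y) in pixels
lymphocytes = [(10, 10), (14, 12), (12, 16), (200, 200), (205, 203), (500, 40)]
groups = proximity_clusters(lymphocytes, max_dist=10)
print(sorted(len(g) for g in groups))  # [1, 2, 3]
```

Running the same routine separately on lymphocyte and non-lymphocyte centroids yields one cluster set per nuclei family, from which size and density distributions can be compared between cases.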
Affiliation(s)
- Germán Corredor
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Louis Stokes Cleveland VA Medical Center, Cleveland, Ohio, United States
- Paula Toro
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Kaustav Bera
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Dylan Rasmussen
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Vidya Sankar Viswanathan
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Christina Buzzy
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Pingfu Fu
- Case Western Reserve University, Department of Population and Quantitative Health Sciences, Cleveland, Ohio, United States
- Lisa M Barton
- Oklahoma Office of the Chief Medical Examiner, Oklahoma City, Oklahoma, United States
- Edana Stroberg
- Oklahoma Office of the Chief Medical Examiner, Oklahoma City, Oklahoma, United States
- Eric Duval
- Oklahoma Office of the Chief Medical Examiner, Oklahoma City, Oklahoma, United States
- Hannah Gilmore
- University Hospitals, Department of Pathology, Cleveland, Ohio, United States
- Anant Madabhushi
- Case Western Reserve University, Center for Computational Imaging and Personalized Diagnostics, Cleveland, Ohio, United States
- Louis Stokes Cleveland VA Medical Center, Cleveland, Ohio, United States

46
Zhan X, Humbert-Droz M, Mukherjee P, Gevaert O. Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases. PATTERNS (NEW YORK, N.Y.) 2021; 2:100289. [PMID: 34286303 PMCID: PMC8276012 DOI: 10.1016/j.patter.2021.100289] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/19/2021] [Revised: 02/24/2021] [Accepted: 05/19/2021] [Indexed: 11/20/2022]
Abstract
Free-text clinical notes in electronic health records are difficult to mine, while structured diagnostic codes can be missing or erroneous. To improve the quality of diagnostic codes, this work extracts them from free-text notes: five old and new word vectorization methods were used to vectorize Stanford progress notes, and logistic regression was used to predict eight ICD-10 codes of common cardiovascular diseases. The models performed well, with TF-IDF, the best vectorization method, achieving the highest AUROC (0.9499-0.9915) and AUPRC (0.2956-0.8072). The models also transferred to MIMIC-III data, with AUROC from 0.7952 to 0.9790 and AUPRC from 0.2353 to 0.8084. Model interpretability was demonstrated by the most important words, whose clinical meanings matched each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes for information retrieval and downstream machine-learning applications.
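The abstract pairs TF-IDF vectorization with logistic regression to predict a diagnostic code from note text. A minimal, dependency-free sketch of that combination, assuming raw term counts, a smoothed IDF of the form used by scikit-learn's default `TfidfVectorizer` (ln((1 + n) / (1 + df)) + 1), L2 normalization, and unregularized binary logistic regression fit by batch gradient descent. The tokenizer, toy notes, and label are hypothetical and are not drawn from the Stanford or MIMIC-III data; this is not the authors' code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens; a real clinical pipeline would do more
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Term count times IDF, L2-normalized; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def predict_proba(vec, weights, bias):
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression on sparse dict vectors, batch gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y
            for t, v in vec.items():
                grad_w[t] += err * v
            grad_b += err
        for t, g in grad_w.items():
            weights[t] = weights.get(t, 0.0) - lr * g / len(vectors)
        bias -= lr * grad_b / len(vectors)
    return weights, bias


notes = [  # hypothetical progress-note snippets
    "patient with atrial fibrillation on anticoagulation, irregular rhythm",
    "atrial fibrillation with rapid ventricular response",
    "routine follow up, knee pain, no cardiac complaints",
    "seasonal allergies, follow up in one year",
]
labels = [1, 1, 0, 0]  # 1 = hypothetical atrial fibrillation code present
idf = fit_idf(notes)
weights, bias = train_logreg([tfidf(n, idf) for n in notes], labels)
p_af = predict_proba(tfidf("new onset atrial fibrillation", idf), weights, bias)
p_other = predict_proba(tfidf("knee pain, follow up", idf), weights, bias)
```

With one such binary model per ICD-10 code, the largest positive weights play the role of the "important words" the abstract uses for interpretability.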
Affiliation(s)
- Xianghao Zhan
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Marie Humbert-Droz
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Pritam Mukherjee
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA

47
Yu Y, Wu X, Chen J, Cheng G, Zhang X, Wan C, Hu J, Miao S, Yin Y, Wang Z, Shan T, Jing S, Wang W, Guo J, Hu X, Liu Y. Characterizing Brain Tumor Regions Using Texture Analysis in Magnetic Resonance Imaging. Front Neurosci 2021; 15:634926. [PMID: 34149343 PMCID: PMC8209330 DOI: 10.3389/fnins.2021.634926] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/29/2020] [Accepted: 04/06/2021] [Indexed: 11/13/2022]
Abstract
Purpose: To extract texture features from magnetic resonance imaging (MRI) scans of patients with brain tumors and use them to train a classification model to support early diagnosis. Methods: Two groups of regions (control and tumor) were selected from MRI scans of 40 patients with meningioma or glioma, and these regions were analyzed to obtain texture features. Statistical analysis was conducted using SPSS (version 20.0); the Shapiro-Wilk test and the Wilcoxon signed-rank test were used to test each feature for significant differences between the tumor and healthy regions. The t-distributed stochastic neighbor embedding (t-SNE) algorithm was used to visualize the data distribution so as to avoid tumor selection bias. The Gini impurity index in random forests (RFs) was used to select the top five features. Based on these five features, three classification models were built, one with each of three machine learning classifiers: RF, support vector machine (SVM), and back propagation (BP) neural network. Results: Sixteen of the 25 features differed significantly between the tumor and healthy areas. Using the Gini impurity index in RFs, standard deviation, first-order moment, variance, third-order absolute moment, and third-order central moment were selected to build the classification model. The model trained with the SVM classifier achieved the best performance, with a sensitivity, specificity, and area under the curve of 94.04%, 92.3%, and 0.932, respectively. Conclusion: Texture analysis with an SVM classifier can help differentiate between brain tumor and healthy areas quickly and accurately, which would facilitate its clinical application.
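The five selected features are first-order (intensity histogram) statistics of a region of interest. A minimal sketch of how they can be computed from a flattened list of pixel intensities, under assumed definitions: population moments, "first-order moment" taken as the mean, and the third-order absolute moment taken about the mean. The paper's exact formulas are not given in the abstract, and `first_order_features` is a hypothetical name.

```python
import math


def first_order_features(pixels):
    """First-order texture statistics of an ROI (assumed definitions:
    population moments; absolute third moment taken about the mean)."""
    n = len(pixels)
    if n == 0:
        raise ValueError("ROI must contain at least one pixel")
    mean = sum(pixels) / n                    # first-order (raw) moment
    dev = [p - mean for p in pixels]
    variance = sum(d ** 2 for d in dev) / n   # second-order central moment
    return {
        "first_order_moment": mean,
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
        "third_order_absolute_moment": sum(abs(d) ** 3 for d in dev) / n,
        "third_order_central_moment": sum(d ** 3 for d in dev) / n,
    }


roi = [1, 2, 3, 4, 10]  # hypothetical flattened ROI intensities
features = first_order_features(roi)
print(features["variance"], features["third_order_central_moment"])  # 10.0 36.0
```

In the study, vectors of these values for tumor and control regions were the inputs to the RF, SVM, and BP classifiers; the sketch stops at feature extraction.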
Affiliation(s)
- Yun Yu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Brain Functional Imaging, Nanjing Medical University, Nanjing, China
- Xi Wu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Jiu Chen
- Institute of Brain Functional Imaging, Nanjing Medical University, Nanjing, China
- Gong Cheng
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
- Xin Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Cheng Wan
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Jie Hu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Shumei Miao
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Yuechuchu Yin
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Zhongmin Wang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Tao Shan
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Shenqi Jing
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Wenming Wang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Jianjun Guo
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China
- Xinhua Hu
- Department of Neurosurgery, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Yun Liu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Institute of Medical Informatics and Management, Nanjing Medical University, Nanjing, China

48
ULGEN A, ÇETİN Ş, BALCI P, ŞIVGIN H, ŞIVGIN S, ÇETİN M, Lİ W. COVID-19 outpatients and surviving inpatients exhibit comparable blood test results that are distinct from non-surviving inpatients. JOURNAL OF HEALTH SCIENCES AND MEDICINE 2021. [DOI: 10.32322/jhsm.900462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/22/2023]

49
Dynamical Analysis of the Dow Jones Index Using Dimensionality Reduction and Visualization. ENTROPY 2021; 23:e23050600. [PMID: 34068154 PMCID: PMC8152974 DOI: 10.3390/e23050600] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2021] [Revised: 05/08/2021] [Accepted: 05/09/2021] [Indexed: 12/25/2022]
Abstract
Time series generated by complex systems (CSs) are often characterized by phenomena such as chaoticity, fractality, and memory effects, which make them difficult to analyze. The paper explores the dynamics of multidimensional data generated by a CS, using the Dow Jones Industrial Average (DJIA) index as a test-bed. The DJIA time series is normalized and segmented into several time-window vectors. These vectors are treated as objects that characterize the DJIA's dynamical behavior. The objects are then compared using several distance measures to generate inputs for dimensionality reduction and information visualization algorithms. These computational techniques produce meaningful representations of the original dataset according to the (dis)similarities between the objects. Time is displayed as a parametric variable, and non-locality can be visualized through the corresponding evolution of points and the formation of clusters. The generated portraits reveal a complex nature, which is further analyzed in terms of the emerging patterns. The results show that dimensionality reduction and visualization tools are a key modeling option for processing complex data with current computational resources.
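The preprocessing chain in this abstract (normalize, cut into window vectors, compare objects with a distance) produces the pairwise-distance matrix that dimensionality reduction and visualization algorithms consume. A minimal sketch under stated assumptions: z-score normalization, non-overlapping windows of a hypothetical width, and Euclidean and Manhattan distances as two example metrics. The abstract does not specify these choices, and the index values are made up, not DJIA data.

```python
import math


def zscore(series):
    """Normalize to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if sd == 0:
        raise ValueError("constant series cannot be z-scored")
    return [(x - mean) / sd for x in series]


def windows(series, width, step=None):
    """Cut the series into window vectors; the default step=width gives
    non-overlapping windows, and any incomplete tail is dropped."""
    step = step or width
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(objects, metric):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(objects[i], objects[j])
    return d


closes = [100, 102, 101, 105, 107, 106, 110, 108, 111]  # hypothetical values
objs = windows(zscore(closes), width=3)
dist = distance_matrix(objs, euclidean)
print(len(objs))  # 3
```

Swapping `euclidean` for `manhattan` (or any other metric) changes only the matrix passed downstream, which is how comparing "different distances" fits into the pipeline.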

50
Chan KO, Hutter CR, Wood PL, Su YC, Brown RM. Gene Flow Increases Phylogenetic Structure and Inflates Cryptic Species Estimations: A Case Study on Widespread Philippine Puddle Frogs (Occidozyga laevis). Syst Biol 2021; 71:40-57. [PMID: 33964168 DOI: 10.1093/sysbio/syab034] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/12/2020] [Revised: 04/29/2021] [Accepted: 05/06/2021] [Indexed: 11/14/2022]
Abstract
In cryptic amphibian complexes, there is a growing trend to equate high levels of genetic structure with hidden cryptic species diversity. Typically, phylogenetic structure and distance-based approaches are used to demonstrate the distinctness of clades and justify the recognition of new cryptic species. However, this approach does not account for gene flow, spatial, and environmental processes that can obfuscate phylogenetic inference and bias species delimitation. As a case study, we sequenced genome-wide exons and introns to evince the processes that underlie the diversification of Philippine Puddle Frogs, a group that is widespread, phenotypically conserved, and exhibits high levels of geographically based genetic structure. We showed that widely adopted tree- and distance-based approaches inferred up to 20 species, compared with genomic analyses that inferred an optimal number of five distinct genetic groups. Using a suite of clustering, admixture, and phylogenetic network analyses, we demonstrate extensive admixture among the five groups and elucidate two specific ways in which gene flow can cause overestimation of species diversity: (1) admixed populations can be inferred as distinct lineages characterized by long branches in phylograms; and (2) admixed lineages can appear genetically divergent, even from their parental populations, when simple measures of genetic distance are used. We demonstrate that the relationship between mitochondrial and genome-wide nuclear p-distances is decoupled in admixed clades, leading to erroneous estimates of genetic distances and, consequently, of species diversity. Genetic distance was also biased by spatial and environmental processes. Overall, we showed that the high levels of genetic diversity in Philippine Puddle Frogs predominantly comprise metapopulation lineages that arose through complex patterns of admixture, isolation-by-distance, and isolation-by-environment, as opposed to species divergence.
Our findings suggest that speciation may not be the major process underlying the high levels of hidden diversity observed in many taxonomic groups and that widely adopted tree- and distance-based methods overestimate species diversity in the presence of gene flow.
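The "simple measures of genetic distance" criticized above include the uncorrected p-distance: the proportion of aligned sites at which two sequences differ. A minimal sketch, assuming pre-aligned sequences and pairwise deletion of sites where either sequence has a gap or missing/ambiguous base (the `skip` characters are an assumed convention). The function name and sequences are hypothetical, not taken from the study.

```python
def p_distance(seq_a, seq_b, skip="-?N"):
    """Uncorrected p-distance between two aligned sequences.
    Sites where either base is in `skip` (assumed gap/missing symbols)
    are excluded before counting differences (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        differing += a != b  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

Because the measure only counts mismatches, it carries no information about whether shared or divergent alleles arose through admixture, which is why the abstract cautions that it can overstate divergence in admixed lineages.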
Affiliation(s)
- Kin Onn Chan
- Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377 Singapore
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
- Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
- Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, Alabama 36849, USA
- Yong-Chao Su
- Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA