1. Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024; 724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
Data acquisition for transcriptomic studies used to be the bottleneck in the transcriptomic analytical pipeline. However, recent developments in transcriptome profiling technologies have increased researchers' ability to obtain data, resulting in a shift in focus to data analysis. Incorporating machine learning (ML) into traditional analytical methods makes it possible to handle larger volumes of complex data more efficiently. Many bioinformaticians, especially those unfamiliar with ML in the study of human transcriptomics and complex biological systems, face a significant barrier stemming from their limited awareness of the current landscape of ML utilisation in this field. To address this gap, this review introduces those individuals to the general types of ML, followed by a comprehensive range of more specific techniques, demonstrated through examples of their incorporation into analytical pipelines for human transcriptome investigations. Important computational aspects such as data pre-processing, task formulation, results (performance of ML models), and validation methods are covered. For better practical relevance, there is a strong focus on studies published within the last five years, almost exclusively examining human transcriptomes, with outcomes compared with standard non-ML tools.
Affiliation(s)
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Kristina Santucci
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Grace Lindner
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia.
2. Choudhury P, Dasgupta S, Bhattacharyya P, Roychowdhury S, Chaudhury K. Understanding pulmonary hypertension: the need for an integrative metabolomics and transcriptomics approach. Mol Omics 2024; 20:366-389. [PMID: 38853716 DOI: 10.1039/d3mo00266g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Pulmonary hypertension (PH), characterised by mean pulmonary arterial pressure (mPAP) >20 mm Hg at rest, is a complex pathophysiological disorder associated with multiple clinical conditions. The high prevalence of the disease along with increased mortality and morbidity makes it a global health burden. Despite major advances in understanding the disease pathophysiology, much of the underlying complex molecular mechanism remains to be elucidated. Lack of a robust diagnostic test and specific therapeutic targets also poses major challenges. This review provides a comprehensive update on the dysregulated pathways and promising candidate markers identified in PH patients using transcriptomics and metabolomics approaches. The review also highlights the need for an integrative multi-omics approach to gain insight into the disease at the molecular level. The integrative multi-omics/pan-omics approach, envisaged to help bridge the gap from genotype to phenotype, is outlined. Finally, the challenges commonly encountered while conducting omics-driven studies are also discussed.
Affiliation(s)
- Priyanka Choudhury
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India.
- Sanjukta Dasgupta
- Department of Biotechnology, Brainware University, Barasat, West Bengal, India
- Koel Chaudhury
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India.
3. Kelly J, Moyeed R, Carroll C, Luo S, Li X. Blood biomarker-based classification study for neurodegenerative diseases. Sci Rep 2023; 13:17191. [PMID: 37821485 PMCID: PMC10567903 DOI: 10.1038/s41598-023-43956-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 09/30/2023] [Indexed: 10/13/2023] Open
Abstract
As the population ages, neurodegenerative diseases are becoming more prevalent, making it crucial to comprehend the underlying disease mechanisms and identify biomarkers to allow for early diagnosis and effective screening for clinical trials. Thanks to advancements in gene expression profiling, it is now possible to search for disease biomarkers on an unprecedented scale. Here we applied a selection of five machine learning (ML) approaches to identify blood-based biomarkers for Alzheimer's disease (AD) and Parkinson's disease (PD) with the application of multiple feature selection methods. Based on ROC AUC performance, one optimal random forest (RF) model was discovered for AD with 159 gene markers (ROC AUC = 0.886), while one optimal RF model was discovered for PD (ROC AUC = 0.743). Additionally, in comparison to traditional ML approaches, deep learning approaches were applied to evaluate their potential applications in future works. We demonstrated that convolutional neural networks perform consistently well across both the Alzheimer's (ROC AUC = 0.810) and Parkinson's (ROC AUC = 0.715) datasets, suggesting their potential in gene expression biomarker detection with further tuning of their architecture.
Affiliation(s)
- Jack Kelly
- Faculty of Medicine, Biology and Health, Centre for Biostatistics, School of Health Sciences, University of Manchester, Manchester, UK.
- Faculty of Health, University of Plymouth, Plymouth, PL6 8BU, UK.
- Rana Moyeed
- Faculty of Science and Engineering, University of Plymouth, Plymouth, PL6 8BU, UK
- Camille Carroll
- Faculty of Health, University of Plymouth, Plymouth, PL6 8BU, UK
- Shouqing Luo
- Faculty of Health, University of Plymouth, Plymouth, PL6 8BU, UK
- Xinzhong Li
- School of Health and Life Sciences, Teesside University, Middlesbrough, TS1 3BX, UK.
4. Intrinsic and Extrinsic Transcriptional Profiles That Affect the Clinical Response to PD-1 Inhibitors in Patients with Non-Small Cell Lung Cancer. Cancers (Basel) 2022; 15:cancers15010197. [PMID: 36612193 PMCID: PMC9818269 DOI: 10.3390/cancers15010197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 12/13/2022] [Accepted: 12/26/2022] [Indexed: 12/30/2022] Open
Abstract
Using a machine learning method, we investigated the intrinsic and extrinsic transcriptional profiles that affect the clinical response to PD-1 inhibitors in 57 patients with non-small cell lung cancer (NSCLC). Among the top 100 genes associated with the responsiveness to PD-1 inhibitors, the proportion of intrinsic genes in lung adenocarcinoma (LUAD) (69%) was higher than in NSCLC overall (36%) and lung squamous cell carcinoma (LUSC) (33%). The intrinsic gene signature of LUAD (mean area under the ROC curve (AUC) = 0.957 and mean accuracy = 0.9) had higher predictive power than either the intrinsic gene signature of NSCLC or LUSC or the extrinsic gene signature of NSCLC, LUAD, or LUSC. The high intrinsic gene signature group had a high overall survival rate in LUAD (p = 0.034). When we performed a pathway enrichment analysis, the cell cycle and cellular senescence pathways were related to the upregulation of intrinsic genes in LUAD. The intrinsic signature of LUAD also showed a positive correlation with other immune checkpoint targets, including CD274, LAG3, and PDCD1LG2 (Spearman correlation coefficient > 0.25). PD-1 inhibitor-related intrinsic gene patterns differed significantly between LUAD and LUSC and may be a particularly useful biomarker in LUAD.
5. Xu J, Zhong Y, Yin H, Linneman J, Luo Y, Xia S, Xia Q, Yang L, Huang X, Kang K, Wang J, Niu Y, Li L, Gou D. Methylation-mediated silencing of PTPRD induces pulmonary hypertension by promoting pulmonary arterial smooth muscle cell migration via the PDGFRB/PLCγ1 axis. J Hypertens 2022; 40:1795-1807. [PMID: 35848503 PMCID: PMC9451921 DOI: 10.1097/hjh.0000000000003220] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/14/2021] [Revised: 05/15/2022] [Accepted: 05/15/2022] [Indexed: 12/03/2022]
Abstract
OBJECTIVE: Pulmonary hypertension is a lethal disease characterized by pulmonary vascular remodeling and is mediated by abnormal proliferation and migration of pulmonary arterial smooth muscle cells (PASMCs). Platelet-derived growth factor BB (PDGF-BB) is the most potent mitogen for PASMCs and is involved in vascular remodeling in pulmonary hypertension development. Therefore, the objective of our study is to identify novel mechanisms underlying vascular remodeling in pulmonary hypertension. METHODS: We explored the effects and mechanisms of PTPRD downregulation in PASMCs and PTPRD knockdown rats in pulmonary hypertension induced by hypoxia. RESULTS: We demonstrated that PTPRD is dramatically downregulated in PDGF-BB-treated PASMCs, pulmonary arteries from pulmonary hypertension rats, and blood and pulmonary arteries from lung specimens of patients with hypoxic pulmonary arterial hypertension (HPAH) and idiopathic PAH (iPAH). Subsequently, we found that PTPRD was downregulated by promoter methylation via DNMT1. Moreover, we found that PTPRD knockdown altered cell morphology and migration in PASMCs by modulating focal adhesion and the cell cytoskeleton. We demonstrated that the increase in cell migration is mediated by the PDGFRB/PLCγ1 pathway. Furthermore, under hypoxic conditions, we observed significant pulmonary arterial remodeling and exacerbation of pulmonary hypertension in heterozygous (HET) PTPRD knock-out rats compared with the wild-type group. We also demonstrated that the HET group treated with chronic hypoxia had higher expression and activity of PLCγ1 in the pulmonary arteries compared with the wild-type group. CONCLUSION: We propose that PTPRD likely plays an important role in the process of pulmonary vascular remodeling and development of pulmonary hypertension in vivo.
Affiliation(s)
- Junhua Xu
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Yanfeng Zhong
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Haoyang Yin
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- John Linneman
- Washington University School of Medicine, St. Louis, Missouri, USA
- Yixuan Luo
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Sijian Xia
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Qinyi Xia
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Lei Yang
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Xingtao Huang
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Kang Kang
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Jun Wang
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Yanqin Niu
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Li Li
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
- Deming Gou
- Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Guangdong Provincial Key Laboratory of Regional Immunity and Disease, Carson International Cancer Center
6. Application of explainable artificial intelligence in the identification of Squamous Cell Carcinoma biomarkers. Comput Biol Med 2022; 146:105505. [DOI: 10.1016/j.compbiomed.2022.105505] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/20/2022] [Revised: 04/03/2022] [Accepted: 04/05/2022] [Indexed: 11/23/2022]
7. Artificial Intelligence and Cardiovascular Genetics. Life (Basel) 2022; 12:life12020279. [PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/30/2021] [Revised: 01/26/2022] [Accepted: 02/09/2022] [Indexed: 12/13/2022] Open
Abstract
Polygenic diseases, which are genetic disorders caused by the combined action of multiple genes, pose unique and significant challenges for the diagnosis and management of affected patients. A major goal of cardiovascular medicine has been to understand how genetic variation leads to the clinical heterogeneity seen in polygenic cardiovascular diseases (CVDs). Recent advances and emerging technologies in artificial intelligence (AI), coupled with the ever-increasing availability of next generation sequencing (NGS) technologies, now provide researchers with unprecedented possibilities for dynamic and complex biological genomic analyses. Combining these technologies may lead to a deeper understanding of heterogeneous polygenic CVDs, better prognostic guidance, and, ultimately, greater personalized medicine. Advances will likely be achieved through increasingly frequent and robust genomic characterization of patients, as well as the integration of genomic data with other clinical data, such as cardiac imaging, coronary angiography, and clinical biomarkers. This review discusses the current opportunities and limitations of genomics; provides a brief overview of AI; and identifies the current applications, limitations, and future directions of AI in genomics.
8. Integrating Statistical and Machine-Learning Approach for Meta-Analysis of Bisphenol A-Exposure Datasets Reveals Effects on Mouse Gene Expression within Pathways of Apoptosis and Cell Survival. Int J Mol Sci 2021; 22:ijms221910785. [PMID: 34639124 PMCID: PMC8509605 DOI: 10.3390/ijms221910785] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/01/2021] [Revised: 09/23/2021] [Accepted: 09/27/2021] [Indexed: 12/19/2022] Open
Abstract
Bisphenols are important environmental pollutants that are extensively studied due to different detrimental effects, while the molecular mechanisms behind these effects are less well understood. Like other environmental pollutants, bisphenols are being tested in various experimental models, creating large expression datasets found in open access storage. The meta-analysis of such datasets is, however, very complicated for various reasons. Here, we developed an integrated statistical and machine-learning approach for the meta-analysis of bisphenol A (BPA) exposure datasets from different mouse tissues. We constructed three joint datasets following three different strategies for dataset integration: using all common genes from the datasets, uncorrelated genes, and not co-expressed genes, respectively. By applying machine learning methods to these datasets, we identified genes whose expression was significantly affected in all of the BPA microanalysis data tested; these include genes involved in the regulation of cell survival (Tnfr2, Hgf-Met, Agtr1a, Bdkrb2; signaling through Mapk8 (Jnk1)); DNA repair (Hgf-Met, Mgmt); apoptosis (Tmbim6, Bcl2, Apaf1); and cellular junctions (F11r, Cldnd1, Ctnd1 and Yes1). Our results highlight the benefit of combining existing datasets for the integrated analysis of a specific topic when individual datasets are limited in size.
9. Wang W, Jiang Z, Zhang D, Fu L, Wan R, Hong K. Comparative Transcriptional Analysis of Pulmonary Arterial Hypertension Associated With Three Different Diseases. Front Cell Dev Biol 2021; 9:672159. [PMID: 34336829 PMCID: PMC8319719 DOI: 10.3389/fcell.2021.672159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/25/2021] [Accepted: 06/17/2021] [Indexed: 01/02/2023] Open
Abstract
Pulmonary arterial hypertension (PAH) is a severe cardiovascular disorder with high mortality. Multiple clinical diseases can induce PAH, but the underlying molecular mechanisms shared in PAHs associated with different diseases remain unclear. The aim of this study is to explore the key candidate genes and pathways in PAH associated with congenital heart disease (CHD-PAH), PAH associated with connective tissue disease (CTD-PAH), and idiopathic PAH (IPAH). We performed differential expression analysis based on a public microarray dataset GSE113439 and identified 1,442 differentially expressed genes, of which 80.3% were upregulated. Subsequently, both pathway enrichment analysis and protein–protein interaction network analysis revealed that the “Cell cycle” and “DNA damage” processes were significantly enriched in PAH. The expression of seven upregulated candidate genes (EIF2AK2, TOPBP1, CDC5L, DHX15, and CUL1–3) and three downregulated candidate genes (DLL4, EGFL7, and ACE) was validated by qRT-PCR. Furthermore, cell cycle-related genes Cul1 and Cul2 were identified in pulmonary arterial endothelial cells (PAECs) in vitro. The result revealed an increased expression of Cul2 in PAECs after hypoxic treatment. Silencing Cul2 could inhibit overproliferation and migration of PAECs in hypoxia. Taken together, according to bioinformatic analyses, our work revealed that “Cell cycle” and “DNA damage” process-related genes and pathways were significantly dysregulated in PAHs associated with three different diseases. This commonality in molecular discovery might broaden the genetic perspective and understanding of PAH. In addition, silencing Cul2 showed a protective effect in PAECs in hypoxia. The results may provide new treatment targets in multiple diseases induced by PAH.
Affiliation(s)
- Wei Wang
- Department of Cardiovascular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Zhenhong Jiang
- Department of Cardiovascular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China.,Jiangxi Key Laboratory of Molecular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Dandan Zhang
- Department of Cardiovascular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Linghua Fu
- Department of Cardiovascular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Rong Wan
- Jiangxi Key Laboratory of Molecular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Kui Hong
- Department of Cardiovascular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China.,Jiangxi Key Laboratory of Molecular Medicine, The Second Affiliated Hospital of Nanchang University, Nanchang, China
10. Cui X, Goff T, Cui S, Menefee D, Wu Q, Rajan N, Nair S, Phillips N, Walker F. Predicting carbon and water vapor fluxes using machine learning and novel feature ranking algorithms. Sci Total Environ 2021; 775:145130. [PMID: 33618314 DOI: 10.1016/j.scitotenv.2021.145130] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/27/2020] [Revised: 12/15/2020] [Accepted: 01/08/2021] [Indexed: 06/12/2023]
Abstract
Gap-filling eddy covariance flux data using quantitative approaches has increased over the past decade. Numerous methods have been proposed previously, including look-up table approaches, parametric methods, process-based models, and machine learning. In particular, the REddyProc package from the Max Planck Institute for Biogeochemistry and the ONEFlux package from AmeriFlux have been widely used in many studies. However, there is no consensus regarding the optimal model and feature selection method for predicting different flux targets (Net Ecosystem Exchange, NEE; or Evapotranspiration, ET), due to the limited systematic comparative research based on identical site data. Here, we compared NEE and ET gap-filling/prediction performance of the least-square-based linear model, artificial neural network, random forest (RF), and support vector machine (SVM) using data obtained from four major row-crop and forage agroecosystems located in the subtropical or the climate-transition zones in the US. Additionally, we tested the impacts of different training-testing data partitioning settings, including a 10-fold time-series sequential (10FTS), a 10-fold cross validation (CV) routine with single data point (10FCV), daily (10FCVD), weekly (10FCVW) and monthly (10FCVM) gap length, and a 7/14-day flanking window (FW) approach; and implemented a novel Sliced Inverse Regression-based Recursive Feature Elimination algorithm (SIRRFE). We benchmarked the model performance against REddyProc and ONEFlux-produced results. Our results indicated that accurate NEE and ET prediction models could be systematically constructed using SVM/RF and only a few top informative features. The gap-filling performance of ONEFlux is generally satisfactory (R² = 0.39-0.71), but results from REddyProc could be very limited or even unreliable in many cases (R² = 0.01-0.67). Overall, SIRRFE-refined SVM models yielded excellent results for predicting NEE (R² = 0.46-0.92) and ET (R² = 0.74-0.91). Finally, the performance of various models was greatly affected by the types of ecosystem, predicting targets, and training algorithms, but was insensitive to training-testing partitioning. Our research provided more insights into constructing novel gap-filling models and understanding the underlying drivers affecting boundary layer carbon/water fluxes on an ecosystem level.
Affiliation(s)
- Xia Cui
- Key Laboratory of Western China's Environmental Systems (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China.
- Thomas Goff
- Center for Computational Science, Middle Tennessee State University, Murfreesboro, TN 37132, USA
- Song Cui
- School of Agriculture, Middle Tennessee State University, Murfreesboro, TN 37132, USA
- Dorothy Menefee
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, 77843, USA
- Qiang Wu
- Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, USA
- Nithya Rajan
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, 77843, USA
- Shyam Nair
- Department of Agricultural Sciences and Engineering Technology, Sam Houston State University, Huntsville, TX 77341, USA
- Nate Phillips
- School of Agriculture, Middle Tennessee State University, Murfreesboro, TN 37132, USA
- Forbes Walker
- Department of Biosystems Engineering and Soil Science, University of Tennessee, Knoxville, TN 37996, USA
11. Papke DJ, Lohmann S, Downing M, Hufnagl P, Mutter GL. Computational augmentation of neoplastic endometrial glands in digital pathology displays. J Pathol 2020; 253:258-267. [PMID: 33165914 DOI: 10.1002/path.5586] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/11/2020] [Revised: 10/14/2020] [Accepted: 11/02/2020] [Indexed: 11/09/2022]
Abstract
The pathologic diagnosis of neoplasia requires localization and classification of lesional tissue, a process that depends on the recognition of an abnormal spatial distribution of neoplastic elements relative to admixed normal background tissue. In endometrial intraepithelial neoplasia (EIN), a pre-cancer usually managed by hysterectomy, a clonally mutated proliferation of cytologically altered glands ('neoplastic-EIN') aggregates in clusters that also contain background non-neoplastic glands ('background-NL'). Here, we used image analysis to classify individual glands within endometrial tissue fragments as neoplastic-EIN or background-NL, and we used the distribution of predictions to localize foci diagnostic of EIN. Nuclear coordinates were automatically assigned and were used as vertices to generate Delaunay triangulations for each gland. Graph statistical variables were used to develop random forest algorithms to classify glands as neoplastic-EIN or background-NL. Individual glands in an independent validation set were scored by a 'ground truth' biomarker (PAX2 immunohistochemistry). We found that exclusion of small glands led to improvement in classification accuracy. Using an inclusion threshold of 200 nuclei per gland, our final model classification accuracy was 77.5% in the validation set, with a positive predictive value of 0.81. We leveraged this high positive predictive value in a point cloud overlay display to assist end-user identification of EIN foci. This study demonstrates that graph theory approaches applied to small-scale anatomic elements in the endometrium allow biologic classification by machine learning, and that spatial superimposition over large-scale tissue expanses can have practical diagnostic utility. We expect this augmented diagnostic approach to be generalizable to commonly encountered problems in other organ systems. © 2020 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Affiliation(s)
- David J Papke
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA.,Harvard Medical School, Boston, MA, USA
- Sebastian Lohmann
- Institut für Pathologie, Charité - Universitätsmedizin, Berlin, Germany
- Michael Downing
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Peter Hufnagl
- Institut für Pathologie, Charité - Universitätsmedizin, Berlin, Germany
- George L Mutter
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA.,Harvard Medical School, Boston, MA, USA