1. Yap CX, Vo DD, Heffel MG, Bhattacharya A, Wen C, Yang Y, Kemper KE, Zeng J, Zheng Z, Zhu Z, Hannon E, Vellame DS, Franklin A, Caggiano C, Wamsley B, Geschwind DH, Zaitlen N, Gusev A, Pasaniuc B, Mill J, Luo C, Gandal MJ. Brain cell-type shifts in Alzheimer's disease, autism, and schizophrenia interrogated using methylomics and genetics. Sci Adv 2024; 10:eadn7655. [PMID: 38781333 PMCID: PMC11114225 DOI: 10.1126/sciadv.adn7655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 03/14/2024] [Indexed: 05/25/2024]
Abstract
Few neuropsychiatric disorders have replicable biomarkers, prompting high-resolution and large-scale molecular studies. However, we still lack consensus on a more foundational question: whether quantitative shifts in cell types, the functional unit of life, contribute to neuropsychiatric disorders. Leveraging advances in human brain single-cell methylomics, we deconvolve seven major cell types using bulk DNA methylation profiling across 1270 postmortem brains, including from individuals diagnosed with Alzheimer's disease, schizophrenia, and autism. We observe and replicate cell-type compositional shifts for Alzheimer's disease (endothelial cell loss), autism (increased microglia), and schizophrenia (decreased oligodendrocytes), and find age- and sex-related changes. Multiple layers of evidence indicate that endothelial cell loss contributes to Alzheimer's disease, with comparable effect size to APOE genotype among older people. Genome-wide association identified five genetic loci related to cell-type composition, involving plausible genes for the neurovascular unit (P2RX5 and TRPV3) and excitatory neurons (DPY30 and MEMO1). These results implicate specific cell-type shifts in the pathophysiology of neuropsychiatric disorders.
Affiliation(s)
- Chloe X. Yap
  - Mater Research Institute, University of Queensland, Brisbane, Queensland, Australia
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Daniel D. Vo
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Lifespan Brain Institute at Penn Medicine and The Children’s Hospital of Philadelphia, Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Matthew G. Heffel
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Arjun Bhattacharya
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
  - Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
  - Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Institute for Data Science in Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Cindy Wen
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Yuanhao Yang
  - Mater Research Institute, University of Queensland, Brisbane, Queensland, Australia
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Kathryn E. Kemper
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Jian Zeng
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Zhili Zheng
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Zhihong Zhu
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
  - The National Centre for Register-based Research, Aarhus University, Denmark
- Eilis Hannon
  - Department of Clinical and Biomedical Sciences, University of Exeter Medical School, University of Exeter, Exeter, UK
- Dorothea Seiler Vellame
  - Department of Clinical and Biomedical Sciences, University of Exeter Medical School, University of Exeter, Exeter, UK
- Alice Franklin
  - Department of Clinical and Biomedical Sciences, University of Exeter Medical School, University of Exeter, Exeter, UK
- Christa Caggiano
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Brie Wamsley
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
- Daniel H. Geschwind
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
- Noah Zaitlen
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Alexander Gusev
  - Department of Medical Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Brigham & Women’s Hospital, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Bogdan Pasaniuc
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
- Jonathan Mill
  - Department of Clinical and Biomedical Sciences, University of Exeter Medical School, University of Exeter, Exeter, UK
- Chongyuan Luo
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Michael J. Gandal
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Lifespan Brain Institute at Penn Medicine and The Children’s Hospital of Philadelphia, Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
2. Tao X, Zhu Z, Wang L, Li C, Sun L, Wang W, Gong W. Biomarkers of Aging and Relevant Evaluation Techniques: A Comprehensive Review. Aging Dis 2024; 15:977-1005. [PMID: 37611906 PMCID: PMC11081160 DOI: 10.14336/ad.2023.00808-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Accepted: 08/08/2023] [Indexed: 08/25/2023] Open
Abstract
The risk of developing chronic illnesses and disabilities is increasing with age. To predict and prevent aging, biomarkers relevant to the aging process must be identified. This paper reviews the known molecular, cellular, and physiological biomarkers of aging. Moreover, we discuss the currently available technologies for identifying these biomarkers, and their applications and potential in aging research. We hope that this review will stimulate further research and innovation in this emerging and fast-growing field.
Affiliation(s)
- Xue Tao
  - Department of Research, Beijing Rehabilitation Hospital, Capital Medical University, Beijing, China.
- Ziman Zhu
  - Beijing Rehabilitation Medicine Academy, Capital Medical University, Beijing, China.
- Liguo Wang
  - Key Laboratory of Protein Sciences, School of Pharmaceutical Sciences, Tsinghua University, Beijing, China.
- Chunlin Li
  - School of Biomedical Engineering, Capital Medical University, Beijing, China.
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China.
- Liwei Sun
  - School of Biomedical Engineering, Capital Medical University, Beijing, China.
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China.
- Wei Wang
  - Department of Rehabilitation Radiology, Beijing Rehabilitation Hospital, Capital Medical University, Beijing, China.
- Weijun Gong
  - Department of Neurological Rehabilitation, Beijing Rehabilitation Hospital, Capital Medical University, Beijing, China.
3. Zhao J, Li H, Qu J, Zong X, Liu Y, Kuang Z, Wang H. A multi-organization epigenetic age prediction based on a channel attention perceptron networks. Front Genet 2024; 15:1393856. [PMID: 38725481 PMCID: PMC11080615 DOI: 10.3389/fgene.2024.1393856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 04/09/2024] [Indexed: 05/12/2024] Open
Abstract
DNA methylation indicates the individual's aging, so-called Epigenetic clocks, which will improve the research and diagnosis of aging diseases by investigating the correlation between methylation loci and human aging. Although this discovery has inspired many researchers to develop traditional computational methods to quantify the correlation and predict the chronological age, the performance bottleneck delayed access to the practical application. Since artificial intelligence technology brought great opportunities in research, we proposed a perceptron model integrating a channel attention mechanism named PerSEClock. The model was trained on 24,516 CpG loci that can utilize the samples from all types of methylation identification platforms and tested on 15 independent datasets against seven methylation-based age prediction methods. PerSEClock demonstrated the ability to assign varying weights to different CpG loci. This feature allows the model to enhance the weight of age-related loci while reducing the weight of irrelevant loci. The method is free to use for academics at www.dnamclock.com/#/original.
Affiliation(s)
- Jian Zhao
  - School of Computer Science and Technology, Changchun University, Changchun, China
- Haixia Li
  - School of Computer Science and Technology, Changchun University, Changchun, China
- Jing Qu
  - School of Computer Science and Technology, Jilin University, Changchun, China
  - School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
- Xizeng Zong
  - Clinical Research Centre, Guangzhou First People’s Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong, China
  - School of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Yuchen Liu
  - Department of Medicine, Boston University School of Medicine, Boston, MA, United States
- Zhejun Kuang
  - School of Computer Science and Technology, Changchun University, Changchun, China
- Han Wang
  - School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
4. Meng D, Zhang S, Huang Y, Mao K, Han JDJ. Application of AI in biological age prediction. Curr Opin Struct Biol 2024; 85:102777. [PMID: 38310737 DOI: 10.1016/j.sbi.2024.102777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 12/12/2023] [Accepted: 01/15/2024] [Indexed: 02/06/2024]
Abstract
The development of anti-aging interventions requires quantitative measurement of biological age. Machine learning models, known as "aging clocks," are built by leveraging diverse aging biomarkers that vary across lifespan to predict biological age. In addition to traditional aging clocks harnessing epigenetic signatures derived from bulk samples, emerging technologies allow the biological age estimating at single-cell level to dissect cellular diversity in aging tissues. Moreover, imaging-based aging clocks are increasingly employed with the advantage of non-invasive measurement, making it suitable for large-scale human cohort studies. To fully capture the features in the ever-growing multi-modal and high-dimensional aging-related data and uncover disease associations, deep-learning based approaches, which are effective to learn complex and non-linear relationships without relying on pre-defined features, are increasingly applied. The use of big data and AI-based aging clocks has achieved high accuracy, interpretability and generalizability, guiding clinical applications to delay age-related diseases and extend healthy lifespans.
Affiliation(s)
- Dawei Meng
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing 100871, China
- Shiqiang Zhang
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing 100871, China
- Yuanfang Huang
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing 100871, China
- Kehang Mao
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing 100871, China
- Jing-Dong J Han
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing 100871, China
5. Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. Annu Rev Pathol 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Aignostics, Berlin, Germany
- Grégoire Montavon
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seoul, Korea
  - Max Planck Institute for Informatics, Saarbrücken, Germany
6. Prosz A, Pipek O, Börcsök J, Palla G, Szallasi Z, Spisak S, Csabai I. Biologically informed deep learning for explainable epigenetic clocks. Sci Rep 2024; 14:1306. [PMID: 38225268 PMCID: PMC10789766 DOI: 10.1038/s41598-023-50495-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 12/20/2023] [Indexed: 01/17/2024] Open
Abstract
Ageing is often characterised by progressive accumulation of damage, and it is one of the most important risk factors for chronic disease development. Epigenetic mechanisms including DNA methylation could functionally contribute to organismal aging, however the key functions and biological processes may govern ageing are still not understood. Although age predictors called epigenetic clocks can accurately estimate the biological age of an individual based on cellular DNA methylation, their models have limited ability to explain the prediction algorithm behind and underlying key biological processes controlling ageing. Here we present XAI-AGE, a biologically informed, explainable deep neural network model for accurate biological age prediction across multiple tissue types. We show that XAI-AGE outperforms the first-generation age predictors and achieves similar results to deep learning-based models, while opening up the possibility to infer biologically meaningful insights of the activity of pathways and other abstract biological processes directly from the model.
Affiliation(s)
- Aurel Prosz
  - Danish Cancer Institute, Copenhagen, Denmark
- Orsolya Pipek
  - Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
- Judit Börcsök
  - Danish Cancer Institute, Copenhagen, Denmark
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Gergely Palla
  - Department of Biological Physics, ELTE Eötvös Loránd University, Budapest, Hungary
  - Health Services Management Training Centre, Semmelweis University, Budapest, Hungary
- Sandor Spisak
  - Institute of Enzymology, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- István Csabai
  - Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
7. Kalyakulina A, Yusipov I, Moskalev A, Franceschi C, Ivanchenko M. eXplainable Artificial Intelligence (XAI) in aging clock models. Ageing Res Rev 2024; 93:102144. [PMID: 38030090 DOI: 10.1016/j.arr.2023.102144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/07/2023] [Accepted: 11/23/2023] [Indexed: 12/01/2023]
Abstract
XAI is a rapidly progressing field of machine learning, aiming to unravel the predictions of complex models. XAI is especially required in sensitive applications, e.g. in health care, when diagnosis, recommendations and treatment choices might rely on the decisions made by artificial intelligence systems. AI approaches have become widely used in aging research as well, in particular, in developing biological clock models and identifying biomarkers of aging and age-related diseases. However, the potential of XAI here awaits to be fully appreciated. We discuss the application of XAI for developing the "aging clocks" and present a comprehensive analysis of the literature categorized by the focus on particular physiological systems.
Affiliation(s)
- Alena Kalyakulina
  - Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia
  - Research Center for Trusted Artificial Intelligence, The Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow 109004, Russia
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod 603022, Russia
- Igor Yusipov
  - Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia
  - Research Center for Trusted Artificial Intelligence, The Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow 109004, Russia
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod 603022, Russia
- Alexey Moskalev
  - Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia
- Claudio Franceschi
  - Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia
- Mikhail Ivanchenko
  - Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod 603022, Russia
8. Shi L, Hai B, Kuang Z, Wang H, Zhao J. ResnetAge: A Resnet-Based DNA Methylation Age Prediction Method. Bioengineering (Basel) 2023; 11:34. [PMID: 38247911 PMCID: PMC10813502 DOI: 10.3390/bioengineering11010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/13/2023] [Accepted: 12/26/2023] [Indexed: 01/23/2024] Open
Abstract
Aging is a significant contributing factor to degenerative diseases such as cancer. The extent of DNA methylation in human cells indicates the aging process and screening for age-related methylation sites can be used to construct epigenetic clocks. Thereby, it can be a new aging-detecting marker for clinical diagnosis and treatments. Predicting the biological age of human individuals is conducive to the study of physical aging problems. Although many researchers have developed epigenetic clock prediction methods based on traditional machine learning and even deep learning, higher prediction accuracy is still required to match the clinical applications. Here, we proposed an epigenetic clock prediction method based on a Resnet neuro networks model named ResnetAge. The model accepts 22,278 CpG sites as a sample input, supporting both the Illumina 27K and 450K identification frameworks. It was trained using 32 public datasets containing multiple tissues such as whole blood, saliva, and mouth. The Mean Absolute Error (MAE) of the training set is 1.29 years, and the Median Absolute Deviation (MAD) is 0.98 years. The Mean Absolute Error (MAE) of the validation set is 3.24 years, and the Median Absolute Deviation (MAD) is 2.3 years. Our method has higher accuracy in age prediction in comparison with other methylation-based age prediction methods.
Affiliation(s)
- Lijuan Shi
  - Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
  - Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
- Boquan Hai
  - Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
  - Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
- Zhejun Kuang
  - Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
  - Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
- Han Wang
  - The Institution of Computational Biology of Northeast Normal University, Changchun 130000, China
- Jian Zhao
  - Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
  - Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
9. Hughes BK, Wallis R, Bishop CL. Yearning for machine learning: applications for the classification and characterisation of senescence. Cell Tissue Res 2023; 394:1-16. [PMID: 37016180 PMCID: PMC10558380 DOI: 10.1007/s00441-023-03768-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 03/05/2023] [Indexed: 04/06/2023]
Abstract
Senescence is a widely appreciated tumour suppressive mechanism, which acts as a barrier to cancer development by arresting cell cycle progression in response to harmful stimuli. However, senescent cell accumulation becomes deleterious in aging and contributes to a wide range of age-related pathologies. Furthermore, senescence has beneficial roles and is associated with a growing list of normal physiological processes including wound healing and embryonic development. Therefore, the biological role of senescent cells has become increasingly nuanced and complex. The emergence of sophisticated, next-generation profiling technologies, such as single-cell RNA sequencing, has accelerated our understanding of the heterogeneity of senescence, with distinct final cell states emerging within models as well as between cell types and tissues. In order to explore data sets of increasing size and complexity, the senescence field has begun to employ machine learning (ML) methodologies to probe these intricacies. Most notably, ML has been used to aid the classification of cells as senescent, as well as to characterise the final senescence phenotypes. Here, we provide a background to the principles of ML tasks, as well as some of the most commonly used methodologies from both traditional and deep ML. We focus on the application of these within the context of senescence research, by addressing the utility of ML for the analysis of data from different laboratory technologies (microscopy, transcriptomics, proteomics, methylomics), as well as the potential within senolytic drug discovery. Together, we aim to highlight both the progress and potential for the application of ML within senescence research.
Affiliation(s)
- Bethany K Hughes
  - Blizard Institute, Barts and The London Faculty of Medicine and Dentistry, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
- Ryan Wallis
  - Blizard Institute, Barts and The London Faculty of Medicine and Dentistry, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
- Cleo L Bishop
  - Blizard Institute, Barts and The London Faculty of Medicine and Dentistry, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
10. Yassi M, Chatterjee A, Parry M. Application of deep learning in cancer epigenetics through DNA methylation analysis. Brief Bioinform 2023; 24:bbad411. [PMID: 37985455 PMCID: PMC10661960 DOI: 10.1093/bib/bbad411] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/22/2023] [Revised: 10/08/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
DNA methylation is a fundamental epigenetic modification involved in various biological processes and diseases. Analysis of DNA methylation data at a genome-wide and high-throughput level can provide insights into diseases influenced by epigenetics, such as cancer. Recent technological advances have led to the development of high-throughput approaches, such as genome-scale profiling, that allow for computational analysis of epigenetics. Deep learning (DL) methods are essential in facilitating computational studies in epigenetics for DNA methylation analysis. In this systematic review, we assessed the various applications of DL applied to DNA methylation data or multi-omics data to discover cancer biomarkers, perform classification, imputation and survival analysis. The review first introduces state-of-the-art DL architectures and highlights their usefulness in addressing challenges related to cancer epigenetics. Finally, the review discusses potential limitations and future research directions in this field.
Affiliation(s)
- Maryam Yassi
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
- Aniruddha Chatterjee
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
  - Honorary Professor, UPES University, Dehradun, India
- Matthew Parry
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
  - Te Pūnaha Matatini Centre of Research Excellence, University of Auckland, Auckland, New Zealand

11
Martínez-Enguita D, Dwivedi SK, Jörnsten R, Gustafsson M. NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures. Brief Bioinform 2023; 24:bbad293. [PMID: 37587790 PMCID: PMC10516364 DOI: 10.1093/bib/bbad293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/25/2023] [Accepted: 07/29/2023] [Indexed: 08/18/2023] Open
Abstract
Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
Affiliation(s)
- David Martínez-Enguita
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
- Sanjiv K Dwivedi
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
- Rebecka Jörnsten
  - Department of Mathematical Sciences, Chalmers University of Technology, Sweden
- Mika Gustafsson
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden

12
Yuan T, Edelmann D, Fan Z, Alwers E, Kather JN, Brenner H, Hoffmeister M. Machine learning in the identification of prognostic DNA methylation biomarkers among patients with cancer: A systematic review of epigenome-wide studies. Artif Intell Med 2023; 143:102589. [PMID: 37673571 DOI: 10.1016/j.artmed.2023.102589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 04/19/2023] [Accepted: 04/30/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND: DNA methylation biomarkers have great potential in improving prognostic classification systems for patients with cancer. Machine learning (ML)-based analytic techniques might help overcome the challenges of analyzing high-dimensional data in relatively small sample sizes. This systematic review summarizes the current use of ML-based methods in epigenome-wide studies for the identification of DNA methylation signatures associated with cancer prognosis. METHODS: We searched three electronic databases (PubMed, EMBASE, and Web of Science) for articles published until 2 January 2023. ML-based methods and workflows used to identify DNA methylation signatures associated with cancer prognosis were extracted and summarized. Two authors independently assessed the methodological quality of included studies by a seven-item checklist adapted from 'A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies (PROBAST)' and from the 'Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK)'. Different ML methods and workflows used in included studies were summarized and visualized by a sunburst chart, a bubble chart, and Sankey diagrams. RESULTS: Eighty-three studies were included in this review. Three major types of ML-based workflows were identified: 1) unsupervised clustering, 2) supervised feature selection, and 3) deep learning-based feature transformation. For the three workflows, the most frequently used ML techniques were consensus clustering, least absolute shrinkage and selection operator (LASSO), and autoencoder, respectively. The systematic review revealed that the performance of these approaches has not been adequately evaluated yet and that methodological and reporting flaws were common in the identified studies using ML techniques.
CONCLUSIONS: There is great heterogeneity in ML-based methodological strategies used by epigenome-wide studies to identify DNA methylation markers associated with cancer prognosis. In theory, most existing workflows cannot handle the high multicollinearity and potentially non-linear interactions in epigenome-wide DNA methylation data. Benchmarking studies are needed to compare the relative performance of various approaches for specific cancer types. Adherence to relevant methodological and reporting guidelines is urgently needed.
Affiliation(s)
- Tanwei Yuan
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany
- Dominic Edelmann
  - Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ziwen Fan
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Elizabeth Alwers
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
  - Medical Oncology, National Center of Tumour Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
- Hermann Brenner
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michael Hoffmeister
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany

13
Park S, Rehman MU, Ullah F, Tayara H, Chong KT. iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data. Bioinformatics 2023; 39:btad474. [PMID: 37555812 PMCID: PMC10444964 DOI: 10.1093/bioinformatics/btad474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 05/11/2023] [Accepted: 08/08/2023] [Indexed: 08/10/2023] Open
Abstract
MOTIVATION: The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately. RESULTS: In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. AVAILABILITY AND IMPLEMENTATION: The data and methodologies used in this study are openly accessible to the research community. Researchers can access the positional features and algorithms used for predicting CpG site methylation patterns. To achieve superior performance, we employed the CatBoost algorithm followed by the stacking algorithm, which outperformed existing DNA methylation identifiers. The proposed iCpG-Pos approach utilizes only positional features, resulting in a substantial reduction in computational complexity compared to other known approaches for detecting CpG site methylation patterns. In conclusion, our study introduces a novel approach, iCpG-Pos, for predicting CpG site methylation patterns. By focusing on positional features, our model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.
Affiliation(s)
- Sehi Park
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Farman Ullah
  - College of Information Technology in the United Arab Emirates University (UAEU), Abu Dhabi 15551, UAE
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea

14
Gedefaw L, Liu CF, Ip RKL, Tse HF, Yeung MHY, Yip SP, Huang CL. Artificial Intelligence-Assisted Diagnostic Cytology and Genomic Testing for Hematologic Disorders. Cells 2023; 12:1755. [PMID: 37443789 PMCID: PMC10340428 DOI: 10.3390/cells12131755] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/11/2023] [Revised: 06/21/2023] [Accepted: 06/28/2023] [Indexed: 07/15/2023] Open
Abstract
Artificial intelligence (AI) is a rapidly evolving field of computer science that involves the development of computational programs that can mimic human intelligence. In particular, machine learning and deep learning models have enabled the identification and grouping of patterns within data, leading to the development of AI systems that have been applied in various areas of hematology, including digital pathology, alpha thalassemia patient screening, cytogenetics, immunophenotyping, and sequencing. These AI-assisted methods have shown promise in improving diagnostic accuracy and efficiency, identifying novel biomarkers, and predicting treatment outcomes. However, limitations such as limited databases, lack of validation and standardization, systematic errors, and bias prevent AI from completely replacing manual diagnosis in hematology. In addition, the processing of large amounts of patient data and personal information by AI poses potential data privacy issues, necessitating the development of regulations to evaluate AI systems and address ethical concerns in clinical AI systems. Nonetheless, with continued research and development, AI has the potential to revolutionize the field of hematology and improve patient outcomes. To fully realize this potential, however, the challenges facing AI in hematology must be addressed and overcome.
Affiliation(s)
- Lealem Gedefaw
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Chia-Fei Liu
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Rosalina Ka Ling Ip
  - Department of Pathology, Pamela Youde Nethersole Eastern Hospital, Hong Kong, China
- Hing-Fung Tse
  - Department of Pathology, Pamela Youde Nethersole Eastern Hospital, Hong Kong, China
- Martin Ho Yin Yeung
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Shea Ping Yip
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Chien-Ling Huang
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China

15
Beaude A, Rafiee Vahid M, Augé F, Zehraoui F, Hanczar B. AttOmics: attention-based architecture for diagnosis and prognosis from omics data. Bioinformatics 2023; 39:i94-i102. [PMID: 37387182 DOI: 10.1093/bioinformatics/btad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: The increasing availability of high-throughput omics data allows for considering a new medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially the ones based on deep-learning approaches, to improve diagnosis. Due to the high-dimensional small-sample nature of omics data, current deep-learning models end up with many parameters and have to be fitted with a limited training set. Furthermore, interactions between molecular entities inside an omics profile are not patient specific but are the same for all patients. RESULTS: In this article, we propose AttOmics, a new deep-learning architecture based on the self-attention mechanism. First, we decompose each omics profile into a set of groups, where each group contains related features. Then, by applying the self-attention mechanism to the set of groups, we can capture the different interactions specific to a patient. The results of different experiments carried out in this article show that our model can accurately predict the phenotype of a patient with fewer parameters than deep neural networks. Visualizing the attention maps can provide new insights into the essential groups for a particular phenotype. AVAILABILITY AND IMPLEMENTATION: The code and data are available at https://forge.ibisc.univ-evry.fr/abeaude/AttOmics. TCGA data can be downloaded from the Genomic Data Commons Data Portal.
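The two steps described in this abstract (grouping features, then applying self-attention across the groups) can be illustrated with a minimal sketch. This is not the AttOmics implementation: the contiguous grouping, the random projection matrices, and all function names below are assumptions chosen for illustration, and a real model would learn the projections and group related features rather than adjacent ones.

```python
import math
import random


def split_into_groups(profile, n_groups):
    """Split a flat profile into contiguous, equally sized groups.

    Assumption: adjacency stands in for "related features"; any trailing
    features that do not fill a whole group are dropped.
    """
    size = len(profile) // n_groups
    return [profile[i * size:(i + 1) * size] for i in range(n_groups)]


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(groups, w_q, w_k, w_v):
    """Scaled dot-product self-attention where each group is one token."""
    queries = [matvec(w_q, g) for g in groups]
    keys = [matvec(w_k, g) for g in groups]
    values = [matvec(w_v, g) for g in groups]
    scale = math.sqrt(len(keys[0]))
    attention_map, outputs = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attention_map.append(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs, attention_map


def random_matrix(rows, cols, rng):
    """Random projection matrix; a trained model would learn these weights."""
    return [[rng.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
profile = [rng.random() for _ in range(12)]        # toy omics profile
groups = split_into_groups(profile, n_groups=4)    # 4 groups of 3 features
w_q, w_k, w_v = (random_matrix(2, 3, rng) for _ in range(3))
outputs, attention_map = self_attention(groups, w_q, w_k, w_v)
```

Each row of `attention_map` shows how strongly one group attends to every other group for this particular profile, which is the kind of per-patient quantity the abstract describes visualizing.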
Affiliation(s)
- Aurélien Beaude
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
  - Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France
- Milad Rafiee Vahid
  - Sanofi R&D Data and Data Science, Artificial Intelligence & Deep Analytics, Omics Data Science, 450 Water Street, Cambridge, MA 02142, United States
- Franck Augé
  - Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France
- Farida Zehraoui
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
- Blaise Hanczar
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France

16
Li T, Li Y, Zhu X, He Y, Wu Y, Ying T, Xie Z. Artificial intelligence in cancer immunotherapy: Applications in neoantigen recognition, antibody design and immunotherapy response prediction. Semin Cancer Biol 2023; 91:50-69. [PMID: 36870459 DOI: 10.1016/j.semcancer.2023.02.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/10/2022] [Revised: 02/13/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Cancer immunotherapy is a method of controlling and eliminating tumors by reactivating the body's cancer-immunity cycle and restoring its antitumor immune response. The increased availability of data, combined with advancements in high-performance computing and innovative artificial intelligence (AI) technology, has resulted in a rise in the use of AI in oncology research. State-of-the-art AI models for functional classification and prediction in immunotherapy research are increasingly used to support laboratory-based experiments. This review offers a glimpse of the current AI applications in immunotherapy, including neoantigen recognition, antibody design, and prediction of immunotherapy response. Advancing in this direction will result in more robust predictive models for developing better targets, drugs, and treatments, and these advancements will eventually make their way into the clinical setting, pushing AI forward in the field of precision oncology.
Affiliation(s)
- Tong Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yupeng Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Xiaoyi Zhu
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Yao He
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yanling Wu
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Tianlei Ying
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China

17
Sugino RP, Ohira M, Mansai SP, Kamijo T. Comparative epigenomics by machine learning approach for neuroblastoma. BMC Genomics 2022; 23:852. [PMID: 36572864 PMCID: PMC9793522 DOI: 10.1186/s12864-022-09061-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Accepted: 12/02/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND: Neuroblastoma (NB) is the second most common pediatric solid tumor. Because the number of genetic mutations found in tumors is small, even in some patients with unfavorable NB, epigenetic variation is expected to play an important role in NB progression. DNA methylation is a major epigenetic mechanism, and its relationship with NB prognosis has been a concern. One limitation with the analysis of variation in DNA methylation is the lack of a suitable analytical model. Therefore, in this study, we performed a random forest (RF) analysis of the DNA methylome data of NB from multiple databases. RESULTS: RF is a popular machine learning model owing to its simplicity, intuitiveness, and computational cost. RF analysis identified novel intermediate-risk patient groups with characteristic DNA methylation patterns within the low-risk group. Feature selection analysis based on probe annotation revealed that enhancer-annotated regions had strong predictive power, particularly for MYCN-amplified NBs. We developed a gene-based analytical model to identify candidate genes related to disease progression, such as PRDM8 and FAM13A-AS1. RF analysis revealed sufficient predictive power compared to other machine learning models. CONCLUSIONS: RF is a useful tool for DNA methylome analysis in cancer epigenetic studies, and has the potential to identify novel cancer-related genes.
Affiliation(s)
- Ryuichi P. Sugino
  - Research Institute for Clinical Oncology, Saitama Cancer Center, Ina, Saitama, 362-0806 Japan (grid.416695.9, ISNI 0000 0000 8855 274X)
- Miki Ohira
  - Research Institute for Clinical Oncology, Saitama Cancer Center, Ina, Saitama, 362-0806 Japan (grid.416695.9, ISNI 0000 0000 8855 274X)
- Sayaka P. Mansai
  - Research Institute for Clinical Oncology, Saitama Cancer Center, Ina, Saitama, 362-0806 Japan (grid.416695.9, ISNI 0000 0000 8855 274X)
- Takehiko Kamijo
  - Research Institute for Clinical Oncology, Saitama Cancer Center, Ina, Saitama, 362-0806 Japan (grid.416695.9, ISNI 0000 0000 8855 274X)
  - Laboratory of Tumor Molecular Biology, Department of Graduate School of Science and Engineering, Saitama University, Kita-Urawa, Saitama, Japan (grid.263023.6, ISNI 0000 0001 0703 3735)

18
Fryett JJ, Morris AP, Cordell HJ. Investigating the prediction of CpG methylation levels from SNP genotype data to help elucidate relationships between methylation, gene expression and complex traits. Genet Epidemiol 2022; 46:629-643. [PMID: 35930604 PMCID: PMC9804820 DOI: 10.1002/gepi.22496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 06/27/2022] [Accepted: 07/19/2022] [Indexed: 01/09/2023]
Abstract
As popularised by PrediXcan (and related methods), transcriptome-wide association studies (TWAS), in which gene expression is imputed from single-nucleotide polymorphism (SNP) genotypes and tested for association with a phenotype, are a popular approach for investigating the role of gene expression in complex traits. Like gene expression, DNA methylation is an important biological process and, being under genetic regulation, may be imputable from SNP genotypes. Here, we investigate prediction of CpG methylation levels from SNP genotype data to help elucidate relationships between methylation, gene expression and complex traits. We start by examining how well CpG methylation can be predicted from SNP genotypes, comparing three penalised regression approaches and examining whether changing the window size improves prediction accuracy. Although methylation at most CpG sites cannot be accurately predicted from SNP genotypes, for a subset it can be predicted well. We next apply our methylation prediction models (trained using the optimal method and window size) to carry out a methylome-wide association study (MWAS) of primary biliary cholangitis. We intersect the regions identified via MWAS with those identified via TWAS, providing insight into the interplay between CpG methylation, gene expression and disease status. We conclude that MWAS has the potential to improve understanding of biological mechanisms in complex traits.
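As a rough illustration of the prediction step described here, the sketch below selects SNPs within a window around a CpG site and fits a lasso model by coordinate descent. The abstract does not name the three penalised regression approaches that were compared, so lasso is used only as one familiar example; the data, positions, window size, penalty, and function names are all hypothetical.

```python
def snps_in_window(snp_positions, cpg_position, window):
    """Return indices of SNPs within `window` base pairs of the CpG site."""
    return [j for j, pos in enumerate(snp_positions)
            if abs(pos - cpg_position) <= window]


def soft_threshold(value, penalty):
    """Shrink a value towards zero by `penalty` (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X, y, penalty, n_iter=200):
    """Minimise (1/2n) * ||y - b0 - X b||^2 + penalty * ||b||_1
    by cyclic coordinate descent with an unpenalised intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction for sample i without the contribution of SNP j.
                partial = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm > 0 else 0.0
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta


# Hypothetical toy data: 8 individuals, 4 SNPs coded as 0/1/2 allele counts.
snp_positions = [600, 900, 1200, 5000]
genotypes = [
    [0, 1, 2, 0], [1, 0, 1, 2], [2, 2, 0, 1], [0, 0, 1, 1],
    [1, 2, 2, 0], [2, 1, 0, 2], [0, 2, 1, 1], [1, 1, 0, 0],
]
# Methylation at the CpG is driven only by the SNP at position 900.
methylation = [0.2 + 0.1 * row[1] for row in genotypes]

window_idx = snps_in_window(snp_positions, cpg_position=1000, window=500)
X = [[row[j] for j in window_idx] for row in genotypes]
intercept, beta = fit_lasso(X, methylation, penalty=0.01)
```

Widening or narrowing `window` changes which SNPs enter the model, which corresponds to the window-size comparison the abstract mentions; in practice prediction accuracy would be assessed on held-out samples rather than on the training data.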
Affiliation(s)
- James J. Fryett
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Andrew P. Morris
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
- Heather J. Cordell
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK

19
de Lima Camillo LP, Lapierre LR, Singh R. A pan-tissue DNA-methylation epigenetic clock based on deep learning. NPJ AGING 2022. [PMCID: PMC9158789 DOI: 10.1038/s41514-022-00085-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 12/15/2022]
Abstract
Several age predictors based on DNA methylation, dubbed epigenetic clocks, have been created in recent years, with the vast majority based on regularized linear regression. This study explores the improvement in the performance and interpretation of epigenetic clocks using deep learning. First, we gathered 142 publicly available data sets from several human tissues to develop AltumAge, a neural network framework that is a highly accurate and precise age predictor. Compared to ElasticNet, AltumAge performs better for within-data set and cross-data set age prediction, being particularly more generalizable in older ages and new tissue types. We then used deep learning interpretation methods to learn which methylation sites contributed to the final model predictions. We observe that while most important CpG sites are linearly related to age, some highly-interacting CpG sites can influence the relevance of such relationships. Using chromatin annotations, we show that the CpG sites with the highest contribution to the model predictions were related to gene regulatory regions in the genome, including proximity to CTCF binding sites. We also found age-related KEGG pathways for genes containing these CpG sites. Lastly, we performed downstream analyses of AltumAge to explore its applicability and compare its age acceleration with Horvath’s 2013 model. We show that our neural network approach predicts higher age acceleration for tumors, for cells that exhibit age-related changes in vitro, such as immune and mitochondrial dysfunction, and for samples from patients with multiple sclerosis, type 2 diabetes, and HIV, among other conditions. Altogether, our neural network approach provides significant improvement and flexibility compared to current epigenetic clocks for both performance and model interpretability.

20
Li A, Koch Z, Ideker T. Epigenetic aging: Biological age prediction and informing a mechanistic theory of aging. J Intern Med 2022; 292:733-744. [PMID: 35726002 DOI: 10.1111/joim.13533] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 12/29/2022]
Abstract
Numerous studies have shown that epigenetic age (an individual's degree of aging based on patterns of DNA methylation) can be computed and is associated with an array of factors including diet, lifestyle, genetics, and disease. One can expect that still further associations will emerge with additional aging research, but to what end? Prediction of age was an important first step, but, in our view, the focus must shift from chasing increasingly accurate age computations to understanding the links between the epigenome and the mechanisms and physiological changes of aging. Here, we outline emerging areas of epigenetic aging research that prioritize biological understanding and clinical application. First, we survey recent progress in epigenetic clocks, which are beginning to predict not only chronological age but aging outcomes such as all-cause mortality and onset of disease, or which integrate aging signals across multiple biological processes. Second, we discuss research that exemplifies how investigation of the epigenome is building a mechanistic theory of aging and informing clinical practice. Such examples include identifying methylation sites and the genes most strongly predictive of aging, a subset of which have shown strong potential as biomarkers of neurodegenerative disease and cancer; relating epigenetic clock predictions to hallmarks of aging; and using longitudinal studies of DNA methylation to characterize human disease, resulting in the discovery of epigenetic indications of type 1 diabetes and the propensity for psychotic experiences.
Affiliation(s)
- Adam Li
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, California, USA
- Zane Koch
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, California, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, California, USA

21
Chen L, Saykin AJ, Yao B, Zhao F. Multi-task deep autoencoder to predict Alzheimer's disease progression using temporal DNA methylation data in peripheral blood. Comput Struct Biotechnol J 2022; 20:5761-5774. [PMID: 36756173 PMCID: PMC9619306 DOI: 10.1016/j.csbj.2022.10.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/10/2022] [Accepted: 10/11/2022] [Indexed: 11/03/2022] Open
Abstract
Traditional approaches for diagnosing Alzheimer's disease (AD), such as brain imaging and cerebrospinal fluid analysis, are invasive and expensive. It is desirable to develop a useful diagnostic tool by exploiting biomarkers obtained from peripheral tissues due to their noninvasive and easily accessible characteristics. However, the capacity of DNA methylation data from peripheral blood to predict AD progression is largely unknown. It is also challenging to develop an efficient prediction model considering the complex and high-dimensional DNA methylation data in a longitudinal study. Here, we develop two multi-task deep autoencoders, which are based on the convolutional autoencoder and long short-term memory autoencoder to learn the compressed feature representation by jointly minimizing the reconstruction error and maximizing the prediction accuracy. By benchmarking on longitudinal DNA methylation data collected from the peripheral blood in Alzheimer's Disease Neuroimaging Initiative, we demonstrate that the proposed multi-task deep autoencoders outperform state-of-the-art machine learning approaches for both predicting AD progression and reconstructing the temporal DNA methylation profiles. In addition, the proposed multi-task deep autoencoders can predict AD progression accurately using only the historical DNA methylation data and the performance is further improved by including all temporal DNA methylation data. Availability: https://github.com/lichen-lab/MTAE.
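The joint objective mentioned in this abstract can be sketched as a weighted sum of a reconstruction term and a prediction term. This is only an illustration of the general multi-task idea, not the MTAE code: using mean squared error for reconstruction, binary cross-entropy as a stand-in for "maximizing prediction accuracy", and the weighting parameter `alpha` are all assumptions.

```python
import math


def reconstruction_error(original, reconstructed):
    """Mean squared error between input profiles and their reconstructions."""
    n_values = sum(len(row) for row in original)
    squared = sum(
        (o - r) ** 2
        for o_row, r_row in zip(original, reconstructed)
        for o, r in zip(o_row, r_row)
    )
    return squared / n_values


def prediction_loss(labels, probabilities, eps=1e-12):
    """Binary cross-entropy; lower values mean better predictions."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        total += y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
    return -total / len(labels)


def multi_task_loss(original, reconstructed, labels, probabilities, alpha=0.5):
    """Weighted sum of both tasks (alpha is a hypothetical trade-off weight)."""
    return (alpha * reconstruction_error(original, reconstructed)
            + (1 - alpha) * prediction_loss(labels, probabilities))


# Hypothetical toy values: two samples, two methylation features each.
profiles = [[0.1, 0.9], [0.5, 0.5]]
labels = [1, 0]
good = multi_task_loss(profiles, [[0.1, 0.9], [0.5, 0.5]], labels, [0.9, 0.2])
poor = multi_task_loss(profiles, [[0.4, 0.6], [0.2, 0.8]], labels, [0.4, 0.7])
```

A model trained on such an objective is pushed to learn a compressed representation that both preserves the methylation profile and carries information about the outcome; how the two terms are actually balanced in the published method is not stated in the abstract.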
Affiliation(s)
- Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States
| | - Andrew J. Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, United States
| | - Bing Yao
- Department of Human Genetics, Emory University, Atlanta, GA 30322, United States
| | - Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States
| | - Alzheimer’s Disease Neuroimaging Initiative (ADNI)
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, United States
- Department of Human Genetics, Emory University, Atlanta, GA 30322, United States
| |
|
22
|
Kalyakulina A, Yusipov I, Bacalini MG, Franceschi C, Vedunova M, Ivanchenko M. Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI. Gigascience 2022; 11:giac097. [PMID: 36259657 PMCID: PMC9718659 DOI: 10.1093/gigascience/giac097] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Revised: 08/01/2022] [Accepted: 09/15/2022] [Indexed: 07/25/2023]
Abstract
BACKGROUND: DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires development of a specific workflow for joint data processing. RESULTS: We propose a comprehensive approach of combined DNA methylation datasets to classify controls and patients. The solution includes data harmonization, construction of machine learning classification models, dimensionality reduction of models, imputation of missing values, and explanation of model predictions by explainable artificial intelligence (XAI) algorithms. We show that harmonization can improve classification accuracy by up to 20% when preprocessing methods of the training and test datasets are different. The best accuracy results were obtained with tree ensembles, reaching above 95% for Parkinson's disease. Dimensionality reduction can substantially decrease the number of features, without detriment to the classification accuracy. The best imputation methods achieve almost the same classification accuracy for data with missing values as for the original data. XAI approaches have allowed us to explain model predictions from both populational and individual perspectives. CONCLUSIONS: We propose a methodologically valid and comprehensive approach to the classification of healthy individuals and patients with various diseases based on whole-blood DNA methylation data using Parkinson's disease and schizophrenia as examples. The proposed algorithm works better for the former pathology, characterized by a complex set of symptoms. It makes it possible to solve data harmonization problems for meta-analysis of many different datasets, impute missing values, and build classification models of small dimensionality.
Affiliation(s)
- Alena Kalyakulina
- Corresponding author: Alena Kalyakulina, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Gagarin avenue 22, Nizhny Novgorod 603022, Russia
| | | | | | - Claudio Franceschi
- Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
| | - Maria Vedunova
- Institute of Biology and Biomedicine, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
| | - Mikhail Ivanchenko
- Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
| |
|
23
|
Abstract
Condition monitoring of high voltage apparatus is of much importance for the maintenance of electric power systems. Whether it is detecting faults or partial discharges that take place in high voltage equipment, or detecting contamination and degradation of outdoor insulators, deep learning which is a branch of machine learning has been extensively investigated. Instead of using hand-crafted manual features as an input for the traditional machine learning algorithms, deep learning algorithms use raw data as the input where the feature extraction stage is integrated in the learning stage, resulting in a more automated process. This is the main advantage of using deep learning instead of traditional machine learning techniques. This paper presents a review of the recent literature on the application of deep learning techniques in monitoring high voltage apparatus such as GIS, transformers, cables, rotating machines, and outdoor insulators.
|
24
|
Jeong Y, de Andrade E Sousa LB, Thalmeier D, Toth R, Ganslmeier M, Breuer K, Plass C, Lutsik P. Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes. Brief Bioinform 2022; 23:6632618. [PMID: 35794707 PMCID: PMC9294431 DOI: 10.1093/bib/bbac248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 05/18/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022]
Abstract
DNA methylation analysis by sequencing is becoming increasingly popular, yielding methylomes at single-base pair and single-molecule resolution. It has tremendous potential for cell-type heterogeneity analysis using intrinsic read-level information. Although diverse deconvolution methods were developed to infer cell-type composition based on bulk sequencing-based methylomes, systematic evaluation has not been performed yet. Here, we thoroughly benchmark six previously published methods: Bayesian epiallele detection, DXM, PRISM, csmFinder+coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. Sequencing-based deconvolution methods consist of two main steps, informative region selection and cell-type composition estimation, thus each was individually assessed. With this elaborate evaluation, we aimed to establish which method achieves the highest performance in different scenarios of synthetic bulk samples. We found that cell-type deconvolution performance is influenced by different factors depending on the number of cell types within the mixture. Finally, we propose a best-practice deconvolution strategy for sequencing data and point out limitations that need to be handled. Array-based methods—both reference-based and reference-free—generally outperformed sequencing-based methods, despite the absence of read-level information. This implies that the current sequencing-based methods still struggle with correctly identifying cell-type-specific signals and eliminating confounding methylation patterns, which needs to be handled in future studies.
Affiliation(s)
- Yunhee Jeong
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany; Faculty of Mathematics and Informatics, Heidelberg University, Im Neuenheimer Feld 205, 69120, Heidelberg, Germany
| | | | - Dominik Thalmeier
- Helmholtz AI, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Reka Toth
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Marlene Ganslmeier
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Kersten Breuer
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Christoph Plass
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| |
|
25
|
Crawford J, Christensen BC, Chikina M, Greene CS. Widespread redundancy in -omics profiles of cancer mutation states. Genome Biol 2022; 23:137. [PMID: 35761387 PMCID: PMC9238138 DOI: 10.1186/s13059-022-02705-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/15/2021] [Accepted: 06/14/2022] [Indexed: 02/04/2023]
Abstract
BACKGROUND: In studies of cellular function in cancer, researchers are increasingly able to choose from many -omics assays as functional readouts. Choosing the correct readout for a given study can be difficult, and which layer of cellular function is most suitable to capture the relevant signal remains unclear. RESULTS: We consider prediction of cancer mutation status (presence or absence) from functional -omics data as a representative problem that presents an opportunity to quantify and compare the ability of different -omics readouts to capture signals of dysregulation in cancer. From the TCGA Pan-Cancer Atlas that contains genetic alteration data, we focus on RNA sequencing, DNA methylation arrays, reverse phase protein arrays (RPPA), microRNA, and somatic mutational signatures as -omics readouts. Across a collection of genes recurrently mutated in cancer, RNA sequencing tends to be the most effective predictor of mutation state. We find that one or more other data types for many of the genes are approximately equally effective predictors. Performance is more variable between mutations than that between data types for the same mutation, and there is little difference between the top data types. We also find that combining data types into a single multi-omics model provides little or no improvement in predictive ability over the best individual data type. CONCLUSIONS: Based on our results, for the design of studies focused on the functional outcomes of cancer mutations, there are often multiple -omics types that can serve as effective readouts, although gene expression seems to be a reasonable default option.
Affiliation(s)
- Jake Crawford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA (GRID grid.25879.31; ISNI 0000 0004 1936 8972)
| | - Brock C. Christensen
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA; Department of Molecular and Systems Biology, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA (GRID grid.254880.3; ISNI 0000 0001 2179 2404)
| | - Maria Chikina
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA (GRID grid.21925.3d; ISNI 0000 0004 1936 9000)
| | - Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA; Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA (GRID grid.430503.1; ISNI 0000 0001 0703 675X)
| |
|
26
|
Lin Y, Li H, Xiao X, Zhang L, Wang K, Zhao J, Wang M, Zheng F, Zhang M, Yang W, Han J, Yu R. DAISM-DNN XMBD: Highly accurate cell type proportion estimation with in silico data augmentation and deep neural networks. Patterns (New York, N.Y.) 2022; 3:100440. [PMID: 35510186 PMCID: PMC9058910 DOI: 10.1016/j.patter.2022.100440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/08/2021] [Revised: 09/29/2021] [Accepted: 01/06/2022] [Indexed: 12/31/2022]
Abstract
Understanding the immune cell abundance of cancer and other disease-related tissues has an important role in guiding disease treatments. Computational cell type proportion estimation methods have been previously developed to derive such information from bulk RNA sequencing data. Unfortunately, our results show that the performance of these methods can be seriously plagued by the mismatch between training data and real-world data. To tackle this issue, we propose the DAISM-DNN XMBD (XMBD: Xiamen Big Data, a biomedical open software initiative in the National Institute for Data Science in Health and Medicine, Xiamen University, China) (denoted as DAISM-DNN) pipeline that trains a deep neural network (DNN) with dataset-specific training data populated from a certain amount of calibrated samples using DAISM, a novel data augmentation method with an in silico mixing strategy. The evaluation results demonstrate that the DAISM-DNN pipeline outperforms other existing methods consistently and substantially for all the cell types under evaluation in real-world datasets.
- We propose a data augmentation method (DAISM) for DNN-based cell type deconvolution
- DAISM-DNN enables accurate cell type deconvolution with dataset-specific training data
- DAISM-DNN is robust to random errors in calibration samples
- Trained DAISM-DNN model is reusable across biomedical experiments following same SOP
Computational cell type deconvolution methods were developed to understand the cellular heterogeneity in disease-related tissues from bulk RNA-seq data. Due to the presence of strong batch effects, the performance of existing methods could fluctuate greatly when applied to different datasets even with the latest development in batch normalization or platform-agnostic signature designs. To tackle this issue, we proposed a DNN-based cell abundance estimation method with dataset-specific training data populated from a certain number of calibrated samples from a target dataset using DAISM, a data augmentation method using an in silico mixing strategy. DAISM-DNN enables accurate cell type proportions prediction and is robust to random errors in the ground truth cell type proportions of calibration samples. Importantly, we showed that with strict SOPs, it is possible to create a “train once, reuse many times” DAISM-DNN model for multiple biomedical experiments without the need for retraining.
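The general idea of in silico mixing mentioned in this abstract can be sketched as follows. This is a minimal, generic illustration and not the DAISM procedure itself: it assumes that a synthetic bulk profile is a proportion-weighted sum of per-cell-type expression profiles, and the function names (`random_proportions`, `mix_profiles`) and the toy values are hypothetical.

```python
import random


def random_proportions(n_types, rng):
    """Draw random cell-type proportions that sum to 1 (assumed sampling scheme)."""
    weights = [rng.random() for _ in range(n_types)]
    total = sum(weights)
    return [w / total for w in weights]


def mix_profiles(profiles, proportions):
    """Build one synthetic bulk profile as a proportion-weighted sum of
    per-cell-type expression profiles (one list of gene values per cell type)."""
    n_genes = len(profiles[0])
    return [
        sum(p * profile[g] for p, profile in zip(proportions, profiles))
        for g in range(n_genes)
    ]


# Toy example: two hypothetical cell types measured over three genes.
rng = random.Random(0)
cell_type_profiles = [
    [10.0, 0.0, 5.0],  # hypothetical cell type A
    [0.0, 8.0, 5.0],   # hypothetical cell type B
]
props = random_proportions(len(cell_type_profiles), rng)
synthetic_bulk = mix_profiles(cell_type_profiles, props)
# (synthetic_bulk, props) is one labeled training pair for a proportion estimator.
```

Because the proportions are chosen by the simulation, every mixed sample comes with known labels, which is what makes this kind of augmentation usable for supervised training.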
Affiliation(s)
- Yating Lin
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Haojun Li
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Xu Xiao
- School of Informatics, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Lei Zhang
- School of Life Science, Xiamen University, Xiamen 361102, China
| | - Kejia Wang
- School of Medicine, Xiamen University, Xiamen 361102, China
| | | | - Minshu Wang
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China; School of Medicine, Xiamen University, Xiamen 361102, China
| | | | - Minwei Zhang
- Department of Critical Care Medicine, The First Affiliated Hospital of Xiamen University, Xiamen 361003, China
| | | | - Jiahuai Han
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China; School of Life Science, Xiamen University, Xiamen 361102, China; Research Unit of Cellular Stress of CAMS, Cancer Research Center of Xiamen University, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Rongshan Yu
- School of Informatics, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China; Aginome Scientific, Xiamen, 361005, China
| |
|
27
|
Cristoferi I, Giacon TA, Boer K, van Baardwijk M, Neri F, Campisi M, Kimenai HJAN, Clahsen-van Groningen MC, Pavanello S, Furian L, Minnee RC. The applications of DNA methylation as a biomarker in kidney transplantation: a systematic review. Clin Epigenetics 2022; 14:20. [PMID: 35130936 PMCID: PMC8822833 DOI: 10.1186/s13148-022-01241-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 01/27/2022] [Indexed: 12/27/2022]
Abstract
Background: Although kidney transplantation improves patient survival and quality of life, long-term results are hampered by both immune- and non-immune-mediated complications. Current biomarkers of post-transplant complications, such as allograft rejection, chronic renal allograft dysfunction, and cutaneous squamous cell carcinoma, have a suboptimal predictive value. DNA methylation is an epigenetic modification that directly affects gene expression and plays an important role in processes such as ischemia/reperfusion injury, fibrosis, and alloreactive immune response. Novel techniques can quickly assess the DNA methylation status of multiple loci in different cell types, allowing a deep and interesting study of cells’ activity and function. Therefore, DNA methylation has the potential to become an important biomarker for prediction and monitoring in kidney transplantation.
Purpose of the study: The aim of this study was to evaluate the role of DNA methylation as a potential biomarker of graft survival and complications development in kidney transplantation. Material and Methods: A systematic review of several databases has been conducted. The Newcastle–Ottawa scale and the Jadad scale have been used to assess the risk of bias for observational and randomized studies, respectively.
Results: Twenty articles reporting on DNA methylation as a biomarker for kidney transplantation were included, all using DNA methylation for prediction and monitoring. DNA methylation pattern alterations in cells isolated from different tissues, such as kidney biopsies, urine, and blood, have been associated with ischemia–reperfusion injury and chronic renal allograft dysfunction. These alterations occurred in different and specific loci. DNA methylation status has also proved to be important for immune response modulation, having a crucial role in regulatory T cell definition and activity. Research also focused on a better understanding of the role of this epigenetic modification assessment for regulatory T cells isolation and expansion for future tolerance induction-oriented therapies. Conclusions: Studies included in this review are heterogeneous in study design, biological samples, and outcome. More coordinated investigations are needed to affirm DNA methylation as a clinically relevant biomarker important for prevention, monitoring, and intervention. Supplementary Information: The online version contains supplementary material available at 10.1186/s13148-022-01241-7.
Affiliation(s)
- Iacopo Cristoferi
- Division of HPB and Transplant Surgery, Department of Surgery, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Erasmus MC Transplant Institute, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands
| | - Tommaso Antonio Giacon
- Kidney and Pancreas Transplantation Unit, Department of Surgical, Oncological and Gastroenterological Sciences, Padua University Hospital, Via Giustiniani 2, 35128, Padua, Italy; Occupational Medicine, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, Padua University, Via Giustiniani 2, 35128, Padua, Italy; Environmental and Respiratory Physiology Laboratory, Department of Biomedical Sciences, Padua University, Via Marzolo 3, 35131, Padua, Italy; Institute of Anaesthesia and Intensive Care, Department of Medicine - DIMED, Padua University Hospital, Via Cesare Battisti 267, 35128, Padua, Italy
| | - Karin Boer
- Erasmus MC Transplant Institute, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Division of Nephrology and Transplantation, Department of Internal Medicine, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands
| | - Myrthe van Baardwijk
- Division of HPB and Transplant Surgery, Department of Surgery, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Erasmus MC Transplant Institute, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands
| | - Flavia Neri
- Kidney and Pancreas Transplantation Unit, Department of Surgical, Oncological and Gastroenterological Sciences, Padua University Hospital, Via Giustiniani 2, 35128, Padua, Italy
| | - Manuela Campisi
- Occupational Medicine, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, Padua University, Via Giustiniani 2, 35128, Padua, Italy
| | - Hendrikus J A N Kimenai
- Division of HPB and Transplant Surgery, Department of Surgery, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Erasmus MC Transplant Institute, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands
| | - Marian C Clahsen-van Groningen
- Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Erasmus MC Transplant Institute, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Pauwelsstraße 30, 52074, Aachen, Germany
| | - Sofia Pavanello
- Occupational Medicine, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, Padua University, Via Giustiniani 2, 35128, Padua, Italy
| | - Lucrezia Furian
- Kidney and Pancreas Transplantation Unit, Department of Surgical, Oncological and Gastroenterological Sciences, Padua University Hospital, Via Giustiniani 2, 35128, Padua, Italy
| | - Robert C Minnee
- Division of HPB and Transplant Surgery, Department of Surgery, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands; Erasmus MC Transplant Institute, Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, 3015GD, Rotterdam, the Netherlands
| |
|
28
|
Chow YL, Singh S, Carpenter AE, Way GP. Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic. PLoS Comput Biol 2022; 18:e1009888. [PMID: 35213530 PMCID: PMC8906577 DOI: 10.1371/journal.pcbi.1009888] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/12/2021] [Revised: 03/09/2022] [Accepted: 02/01/2022] [Indexed: 01/13/2023]
Abstract
A variational autoencoder (VAE) is a machine learning algorithm, useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, standard vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated these three VAE variants (Vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.
Affiliation(s)
- Yuen Ler Chow
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Brookline High School, Brookline, Massachusetts, United States of America
| | - Shantanu Singh
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Anne E. Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Gregory P. Way
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Center for Health AI and Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| |
|
29
|
DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning. Computational Intelligence and Neuroscience 2022; 2022:8393498. [PMID: 35111213 PMCID: PMC8803417 DOI: 10.1155/2022/8393498] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/30/2021] [Revised: 11/20/2021] [Accepted: 12/22/2021] [Indexed: 12/28/2022]
Abstract
Purpose. Age can be an important clue in uncovering the identity of persons that left biological evidence at crime scenes. With the availability of DNA methylation data, several age prediction models are developed by using statistical and machine learning methods. From epigenetic studies, it has been demonstrated that there is a close association between aging and DNA methylation. Most of the existing studies focused on healthy samples, whereas diseases may have a significant impact on human age. Therefore, in this article, an age prediction model is proposed using DNA methylation biomarkers for healthy and diseased samples. Methods. The dataset contains 454 healthy samples and 400 diseased samples from publicly available sources with age (1–89 years old). Six CpG sites are identified from this data having a high correlation with age using Pearson’s correlation coefficient. In this work, the age prediction model is developed using four different machine learning techniques, namely, Multiple Linear Regression, Support Vector Regression, Gradient Boosting Regression, and Random Forest Regression. Separate models are designed for healthy and diseased data. The data are split randomly into 80 : 20 ratios for training and testing, respectively. Results. Among all the techniques, the model designed using Random Forest Regression shows the best performance, and Gradient Boosting Regression is the second best model. In the case of healthy samples, the model achieved a MAD of 2.51 years for training data and 4.85 for testing data. Also, for diseased samples, a MAD of 3.83 years is obtained for training and 9.53 years for testing. Conclusion. These results showed that the proposed model can predict age for healthy and diseased samples.
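The workflow outlined in this abstract (rank CpG sites by Pearson correlation with age, split the data 80:20, and report a MAD) can be sketched with the standard library alone. This is an illustrative sketch on synthetic data, not the authors' pipeline: it assumes MAD denotes the mean absolute deviation between predicted and actual age, and it swaps in a one-variable least-squares line for the regression models named in the abstract. All function names and values are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def top_correlated(features, ages, k):
    """Indices of the k feature columns with the largest |r| against age."""
    scores = [abs(pearson(col, ages)) for col in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]


def split_80_20(n, seed=0):
    """Random 80:20 split of sample indices into train and test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * n)
    return idx[:cut], idx[cut:]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def mean_absolute_deviation(pred, actual):
    """Mean absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


# Synthetic data: 100 samples, three hypothetical CpG sites.
rng = random.Random(1)
ages = [rng.uniform(1, 89) for _ in range(100)]
features = [
    [0.2 + 0.005 * a + rng.gauss(0, 0.02) for a in ages],  # age-associated
    [rng.random() for _ in ages],                           # unrelated noise
    [0.9 - 0.004 * a + rng.gauss(0, 0.05) for a in ages],  # age-associated, noisier
]

best = top_correlated(features, ages, k=1)[0]
train, test = split_80_20(len(ages))
slope, intercept = fit_line([features[best][i] for i in train], [ages[i] for i in train])
predicted = [slope * features[best][i] + intercept for i in test]
mad_years = mean_absolute_deviation(predicted, [ages[i] for i in test])
```

Computing the MAD separately on the training and held-out indices reproduces the kind of train/test gap the abstract reports.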
|
30
|
Kanapeckaitė A, Burokienė N, Mažeikienė A, Cottrell GS, Widera D. Biophysics is reshaping our perception of the epigenome: from DNA-level to high-throughput studies. Biophysical Reports 2021; 1:100028. [PMID: 36425454 PMCID: PMC9680810 DOI: 10.1016/j.bpr.2021.100028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Accepted: 09/24/2021] [Indexed: 06/16/2023]
Abstract
Epigenetic research holds great promise to advance our understanding of biomarkers and regulatory processes in health and disease. An increasing number of new approaches, ranging from molecular to biophysical analyses, enable identifying epigenetic changes on the level of a single gene or the whole epigenome. The aim of this review is to highlight how the field is shifting from completely molecular-biology-driven solutions to multidisciplinary strategies including more reliance on biophysical analysis tools. Biophysics not only offers technical advancements in imaging or structure analysis but also helps to explore regulatory interactions. New computational methods are also being developed to meet the demand of growing data volumes and their processing. Therefore, it is important to capture these new directions in epigenetics from a biophysical perspective and discuss current challenges as well as multiple applications of biophysical methods and tools. Specifically, we gradually introduce different biophysical research methods by first considering the DNA-level information and eventually higher-order chromatin structures. Moreover, we aim to highlight that the incorporation of bioinformatics, machine learning, and artificial intelligence into biophysical analysis allows gaining new insights into complex epigenetic processes. The gained understanding has already proven useful in translational and clinical research providing better patient stratification options or new therapeutic insights. Together, this offers a better readiness to transform bench-top experiments into industrial high-throughput applications with a possibility to employ developed methods in clinical practice and diagnostics.
Affiliation(s)
- Austė Kanapeckaitė
- Algorithm379, Laisvės g. 7, LT 12007, Vilnius, Lithuania
- Reading School of Pharmacy, Whiteknights, Reading, UK, RG6 6UB
| | - Neringa Burokienė
- Clinics of Internal Diseases, Family Medicine and Oncology, Institute of Clinical Medicine, Faculty of Medicine, Vilnius University, M. K. Čiurlionio str. 21/27, LT-03101 Vilnius, Lithuania
| | - Asta Mažeikienė
- Department of Physiology, Biochemistry, Microbiology and Laboratory Medicine, Institute of Biomedical Sciences, Faculty of Medicine, M. K. Čiurlionio str. 21/27, LT-03101 Vilnius, Lithuania
| | | | - Darius Widera
- Reading School of Pharmacy, Whiteknights, Reading, UK, RG6 6UB
| |
|
31
|
Deep Learning for Human Disease Detection, Subtype Classification, and Treatment Response Prediction Using Epigenomic Data. Biomedicines 2021; 9:1733. [PMID: 34829962 PMCID: PMC8615388 DOI: 10.3390/biomedicines9111733] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/23/2021] [Revised: 10/26/2021] [Accepted: 11/17/2021] [Indexed: 12/25/2022]
Abstract
Deep learning (DL) is a distinct class of machine learning that has achieved first-class performance in many fields of study. For epigenomics, the application of DL to assist physicians and scientists in human disease-relevant prediction tasks has been relatively unexplored until very recently. In this article, we critically review published studies that employed DL models for disease detection, subtype classification, and treatment response prediction using epigenomic data. A comprehensive search on PubMed, Scopus, Web of Science, Google Scholar, and arXiv.org was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Among 1140 initially identified publications, we included 22 articles in our review. DNA methylation and RNA-sequencing data are most frequently used to train the predictive models. The reviewed models achieved high accuracy, ranging from 88.3% to 100.0% for disease detection tasks, from 69.5% to 97.8% for subtype classification tasks, and from 80.0% to 93.0% for treatment response prediction tasks. We generated a workflow to develop a predictive model that encompasses all steps from first defining human disease-related tasks to finally evaluating model performance. DL holds promise for transforming epigenomic big data into valuable knowledge that will enhance the development of translational epigenomics.
32
De Waele G, Clauwaert J, Menschaert G, Waegeman W. CpG Transformer for imputation of single-cell methylomes. Bioinformatics 2021; 38:597-603. [PMID: 34718418 PMCID: PMC8756163 DOI: 10.1093/bioinformatics/btab746] [Received: 06/08/2021] [Revised: 10/19/2021] [Accepted: 10/25/2021]
Abstract
MOTIVATION: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, underscoring the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes.
RESULTS: We adapt the transformer neural network architecture to operate on methylation matrices by combining axial attention with sliding window self-attention. The resulting CpG Transformer achieves state-of-the-art performance on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget.
AVAILABILITY AND IMPLEMENTATION: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gaetan De Waele
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent 9000, Belgium
- Jim Clauwaert
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent 9000, Belgium
- Gerben Menschaert
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent 9000, Belgium
33
Abstract
High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.
Affiliation(s)
- David S Watson
- Department of Statistical Science, University College London, London, UK
34
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overtake astronomy as the paradigmatic producer of big data. This has been made possible by huge advancements in high-throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist of studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
- CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
- CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
- CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
35
Huang K, Xiao C, Glass LM, Critchlow CW, Gibson G, Sun J. Machine learning applications for therapeutic tasks with genomics data. Patterns (New York, N.Y.) 2021; 2:100328. [PMID: 34693370 PMCID: PMC8515011 DOI: 10.1016/j.patter.2021.100328]
Abstract
Thanks to the increasing availability of genomics and other biomedical data, many machine learning algorithms have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records, cellular images, and clinical texts. We identify 22 applications of machine learning in genomics that span the whole therapeutics pipeline, from discovering novel targets, personalizing medicine, and developing gene-editing tools, all the way to facilitating clinical trials and post-market studies. We also pinpoint seven key challenges in this field with potential for expansion and impact. This survey examines recent research at the intersection of machine learning, genomics, and therapeutic development.
Affiliation(s)
- Kexin Huang
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Cao Xiao
- Amplitude, San Francisco, CA 94105, USA
- Lucas M. Glass
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Greg Gibson
- Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Jimeng Sun
- Computer Science Department and Carle's Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL 61820, USA
36
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152. [PMID: 34579788 PMCID: PMC8477474 DOI: 10.1186/s13073-021-00968-x] [Received: 12/13/2020] [Accepted: 09/12/2021]
Abstract
Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.
Affiliation(s)
- Khoa A. Tran
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Olga Kondrashova
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Andrew Bradley
- Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000 Australia
- Elizabeth D. Williams
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102 Australia
- John V. Pearson
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Nicola Waddell
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
37
Simpson DJ, Chandra T. Epigenetic age prediction. Aging Cell 2021; 20:e13452. [PMID: 34415665 PMCID: PMC8441394 DOI: 10.1111/acel.13452] [Received: 02/17/2021] [Revised: 07/21/2021] [Accepted: 07/27/2021]
Abstract
Advanced age is the main common risk factor for cancer, cardiovascular disease and neurodegeneration. Yet more is known about the molecular basis of any of these groups of diseases than about the changes that accompany ageing itself. Progress in molecular ageing research was slow because the tools predicting whether someone aged slowly or fast (biological age) were unreliable. To understand ageing as a risk factor for disease and to develop interventions, the molecular ageing field needed a quantitative measure: a clock for biological age. Over the past decade, a number of age predictors utilising DNA methylation have been developed, referred to as epigenetic clocks. While they appear to estimate biological age, it remains unclear whether the methylation changes used to train the clocks are a reflection of other underlying cellular or molecular processes, or whether methylation itself is involved in the ageing process. The precise aspects of ageing that the epigenetic clocks capture remain hidden and seem to vary between predictors. Nonetheless, the use of epigenetic clocks has opened the door towards studying biological ageing quantitatively, and new clocks and applications, such as forensics, appear frequently. In this review, we will discuss the range of epigenetic clocks available, their strengths and weaknesses, and their applicability to various scientific queries.
Affiliation(s)
- Daniel J. Simpson
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Tamir Chandra
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
38
Levy JJ, Chen Y, Azizgolshani N, Petersen CL, Titus AJ, Moen EL, Vaickus LJ, Salas LA, Christensen BC. MethylSPWNet and MethylCapsNet: Biologically Motivated Organization of DNAm Neural Networks, Inspired by Capsule Networks. NPJ Syst Biol Appl 2021; 7:33. [PMID: 34417465 PMCID: PMC8379254 DOI: 10.1038/s41540-021-00193-7] [Received: 09/17/2020] [Accepted: 07/01/2021]
Abstract
DNA methylation (DNAm) alterations have been heavily implicated in carcinogenesis and the pathophysiology of diseases through upstream regulation of gene expression. DNAm deep-learning approaches are able to capture features associated with aging, cell type, and disease progression, but lack incorporation of prior biological knowledge. Here, we present modular, user-friendly deep-learning methodology and software, MethylCapsNet and MethylSPWNet, that group CpGs into biologically relevant capsules (such as gene promoter context, CpG island relationship, or user-defined groupings) and relate them to diagnostic and prognostic outcomes. We demonstrate these models' utility on 3,897 individuals in the classification of central nervous system (CNS) tumors. MethylCapsNet and MethylSPWNet provide an opportunity to increase DNAm deep-learning analyses' interpretability by enabling a flexible organization of DNAm data into biologically relevant capsules.
Affiliation(s)
- Joshua J Levy
- Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Emerging Diagnostic and Investigative Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
- Youdinghuan Chen
- Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Nasim Azizgolshani
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Curtis L Petersen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, NH, USA
- Alexander J Titus
- Department of Life Sciences, University of New Hampshire, Manchester, NH, USA
- Erika L Moen
- The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, NH, USA
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Louis J Vaickus
- Emerging Diagnostic and Investigative Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
- Lucas A Salas
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Brock C Christensen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
39
Barefoot ME, Loyfer N, Kiliti AJ, McDeed AP, Kaplan T, Wellstein A. Detection of Cell Types Contributing to Cancer From Circulating, Cell-Free Methylated DNA. Front Genet 2021; 12:671057. [PMID: 34386036 PMCID: PMC8353442 DOI: 10.3389/fgene.2021.671057] [Received: 02/22/2021] [Accepted: 05/17/2021]
Abstract
Detection of cellular changes in tissue biopsies has been the basis for cancer diagnostics. However, tissue biopsies are invasive and limited by inaccuracies due to sampling locations, restricted sampling frequency, and poor representation of tissue heterogeneity. Liquid biopsies are emerging as a complementary approach to traditional tissue biopsies to detect dynamic changes in specific cell populations. Cell-free DNA (cfDNA) fragments released into the circulation from dying cells can be traced back to the tissues and cell types they originated from using DNA methylation, an epigenetic regulatory mechanism that is highly cell-type specific. Decoding changes in the cellular origins of cfDNA over time can reveal altered host tissue homeostasis due to local cancer invasion and metastatic spread to distant organs as well as treatment responses. In addition to host-derived cfDNA, changes in cancer cells can be detected from cell-free, circulating tumor DNA (ctDNA) by monitoring DNA mutations carried by cancer cells. Here, we will discuss computational approaches to identify and validate robust biomarkers of changed tissue homeostasis using cell-free, methylated DNA in the circulation. We highlight studies performing genome-wide profiling of cfDNA methylation and those that combine genetic and epigenetic markers to further identify cell-type specific signatures. Finally, we discuss opportunities and current limitations of these approaches for implementation in clinical oncology.
Affiliation(s)
- Megan E. Barefoot
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, United States
- Netanel Loyfer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Amber J. Kiliti
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, United States
- Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, Washington, DC, United States
- A. Patrick McDeed
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University, Washington, DC, United States
- Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Anton Wellstein
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, United States
40
Chaudhari M, Thapa N, Roy K, Newman RH, Saigo H, B K C D. DeepRMethylSite: a deep learning based approach for prediction of arginine methylation sites in proteins. Mol Omics 2021; 16:448-454. [PMID: 32555810 DOI: 10.1039/d0mo00025f]
Abstract
Methylation, which is one of the most prominent post-translational modifications on proteins, regulates many important cellular functions. Though several model-based methylation site predictors have been reported, all existing methods employ machine learning strategies, such as support vector machines and random forest, to predict sites of methylation based on a set of "hand-selected" features. As a consequence, the resulting models may be biased toward one set of features. Moreover, due to the large number of features, model development can often be computationally expensive. In this paper, we propose an alternative approach based on deep learning to predict arginine methylation sites. Our model, which we termed DeepRMethylSite, is computationally less expensive than traditional feature-based methods while eliminating potential biases that can arise through feature selection. Based on independent testing on our dataset, DeepRMethylSite achieved efficiency scores of 68%, 82% and 0.51 with respect to sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC), respectively. Importantly, in side-by-side comparisons with other state-of-the-art methylation site predictors, our method performs on par or better in all scoring metrics tested.
Affiliation(s)
- Meenal Chaudhari
- Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
- Niraj Thapa
- Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
- Kaushik Roy
- Department of Computer Science, North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
- Robert H Newman
- Department of Biology, North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
- Hiroto Saigo
- Department of Informatics, Kyushu University, Fukuoka 819-0395, Japan
- Dukka B K C
- Electrical Engineering and Computer Science Department, Wichita State University, Wichita, KS 67260, USA
41
Verifying explainability of a deep learning tissue classifier trained on RNA-seq data. Sci Rep 2021; 11:2641. [PMID: 33514769 PMCID: PMC7846764 DOI: 10.1038/s41598-021-81773-9] [Received: 08/19/2020] [Accepted: 01/11/2021]
Abstract
For complex machine learning (ML) algorithms to gain widespread acceptance in decision making, we must be able to identify the features driving the predictions. Explainability models allow transparency of ML algorithms; however, their reliability within high-dimensional data is unclear. To test the reliability of the explainability model SHapley Additive exPlanations (SHAP), we developed a convolutional neural network to predict tissue classification from Genotype-Tissue Expression (GTEx) RNA-seq data representing 16,651 samples from 47 tissues. Our classifier achieved an average F1 score of 96.1% on held-out GTEx samples. Using SHAP values, we identified the 2423 most discriminatory genes, of which 98.6% were also identified by differential expression analysis across all tissues. The SHAP genes reflected expected biological processes involved in tissue differentiation and function. Moreover, SHAP genes clustered tissue types with superior performance when compared to all genes, genes detected by differential expression analysis, or random genes. We demonstrate the utility and reliability of SHAP to explain a deep learning model and highlight the strengths of applying ML to transcriptome data.
42
Santoni D, Pignotti D, Vergni D. A genome-wide study on differential methylation in different cancers using TCGA database. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100542]
43
Smyth LJ, Patterson CC, Swan EJ, Maxwell AP, McKnight AJ. DNA Methylation Associated With Diabetic Kidney Disease in Blood-Derived DNA. Front Cell Dev Biol 2020; 8:561907. [PMID: 33178681 PMCID: PMC7593403 DOI: 10.3389/fcell.2020.561907] [Received: 05/14/2020] [Accepted: 09/15/2020]
Abstract
A subset of individuals with type 1 diabetes will develop diabetic kidney disease (DKD). DKD is heritable and large-scale genome-wide association studies have begun to identify genetic factors that influence DKD. Complementary to genetic factors, we know that a person’s epigenetic profile is also altered with DKD. This study reports analysis of DNA methylation, a major epigenetic feature, evaluating methylome-wide loci for association with DKD. Unique features (n = 485,577; 482,421 CpG probes) were evaluated in blood-derived DNA from carefully phenotyped White European individuals diagnosed with type 1 diabetes with (cases) or without (controls) DKD (n = 677 samples). Explicitly, 150 cases were compared to 100 controls using the 450K array, with subsequent analysis using data previously generated for a further 96 cases and 96 controls on the 27K array, and de novo methylation data generated for replication in 139 cases and 96 controls. Following stringent quality control, raw data were quantile normalized and beta values calculated to reflect the methylation status at each site. The difference in methylation status was evaluated between cases and controls; resultant P-values for array-based data were adjusted for multiple testing. Genes with significantly increased (hypermethylated) and/or decreased (hypomethylated) levels of DNA methylation were considered for biological relevance by functional enrichment analysis using KEGG pathways. Twenty-two loci demonstrated statistically significant fold changes associated with DKD and additional support for these associated loci was sought using independent samples derived from patients recruited with similar inclusion criteria. Markers associated with CCNL1 and ZNF187 genes are supported as differentially regulated loci (P < 10⁻⁸), with evidence also presented for AFF3, which has been identified from a meta-analysis and subsequent replication of genome-wide association studies. Further supporting evidence for differential gene expression in CCNL1 and ZNF187 is presented from kidney biopsy and blood-derived RNA in people with and without kidney disease from NephroSeq. Evidence confirming that methylation sites influence the development of DKD may aid risk prediction tools and stimulate research to identify epigenomic therapies which might be clinically useful for this disease.
Affiliation(s)
- Laura J Smyth
- Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom
- Elizabeth J Swan
- Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom
- Alexander P Maxwell
- Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom
- Regional Nephrology Unit, Belfast City Hospital, Belfast, United Kingdom
- Amy Jayne McKnight
- Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom
44
Levy JJ, O'Malley AJ. Don't dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning. BMC Med Res Methodol 2020; 20:171. [PMID: 32600277 PMCID: PMC7325087 DOI: 10.1186/s12874-020-01046-3] [Received: 12/17/2019] [Accepted: 06/10/2020]
Abstract
BACKGROUND: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each.
METHODS: We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions.
RESULTS: Preliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output.
CONCLUSIONS: When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.
Affiliation(s)
- Joshua J Levy
- Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, USA
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, USA
- Department of Pathology, Geisel School of Medicine at Dartmouth, Hanover, USA
- A James O'Malley
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, USA
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Hanover, USA