151
|
Wang XW, Wang T, Schaub DP, Chen C, Sun Z, Ke S, Hecker J, Maaser-Hecker A, Zeleznik OA, Zeleznik R, Litonjua AA, DeMeo DL, Lasky-Su J, Silverman EK, Liu YY, Weiss ST. Benchmarking omics-based prediction of asthma development in children. Respir Res 2023; 24:63. [PMID: 36842969 PMCID: PMC9969629 DOI: 10.1186/s12931-023-02368-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Accepted: 02/16/2023] [Indexed: 02/27/2023] Open
Abstract
BACKGROUND Asthma is a heterogeneous disease with high morbidity. Advances in high-throughput multi-omics approaches have enabled the collection of molecular assessments at different layers, providing a complementary perspective on complex diseases. Numerous computational methods have been developed for omics-based patient classification or disease outcome prediction. Yet a systematic benchmarking of those methods using various combinations of omics data for the prediction of asthma development is still lacking. OBJECTIVE We aimed to investigate computational methods for disease status prediction using multi-omics data. METHOD We systematically benchmarked 18 computational methods using all 63 combinations of six omics data types (GWAS, miRNA, mRNA, microbiome, metabolome, DNA methylation) collected in the Vitamin D Antenatal Asthma Reduction Trial (VDAART) cohort. We evaluated each method using standard performance metrics for each of the 63 omics combinations. RESULTS Our results indicate that, overall, Logistic Regression, Multi-Layer Perceptron, and MOGONET display superior performance, and that the combination of transcriptional, genomic and microbiome data achieves the best prediction. Moreover, we find that including clinical data can further improve prediction performance for some but not all of the omics combinations. CONCLUSIONS Specific omics combinations achieve the best prediction of asthma development in children, and certain computational methods outperform others.
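The figure of 63 combinations follows from counting every non-empty subset of six omics layers (2^6 - 1 = 63). A minimal Python sketch of that enumeration is given here for illustration; the function and variable names are hypothetical and are not taken from the paper's code.

```python
from itertools import combinations

# The six omics layers named in the abstract.
OMICS = ["GWAS", "miRNA", "mRNA", "microbiome", "metabolome", "DNA methylation"]


def omics_combinations(layers):
    """Return every non-empty subset of the given layers, smallest first."""
    return [
        subset
        for size in range(1, len(layers) + 1)
        for subset in combinations(layers, size)
    ]


ALL_COMBOS = omics_combinations(OMICS)
print(len(ALL_COMBOS))  # 2**6 - 1 = 63
```

A benchmark of this kind would then loop over each subset and each method; how the study itself organised that loop is not stated in the abstract.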
Affiliation(s)
- Xu-Wen Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Tong Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Darius P Schaub
- Department of Mathematics, University of Hamburg, 21109, Hamburg, Germany
| | - Can Chen
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Zheng Sun
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Shanlin Ke
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Julian Hecker
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Anna Maaser-Hecker
- Genetics and Aging Research Unit, Department of Neurology, McCance Center for Brain Health, Mass General Institute for Neurodegenerative Disease, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA
| | - Oana A Zeleznik
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Roman Zeleznik
- Department of Radiation Oncology, Brigham and Women's Hospital, Boston, MA, USA
| | - Augusto A Litonjua
- Division of Pediatric Pulmonology, Golisano Children's Hospital, Rochester, NY, USA
| | - Dawn L DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Yang-Yu Liu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
- Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
| | - Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
| |
|
152
|
Han Z, Zhang C, Fu H, Zhou JT. Trusted Multi-View Classification With Dynamic Evidential Fusion. IEEE Trans Pattern Anal Mach Intell 2023; 45:2551-2566. [PMID: 35503823 DOI: 10.1109/tpami.2022.3171983] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 06/14/2023]
Abstract
Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted and out-of-distribution data. Dynamically assessing the trustworthiness of each view for different samples could provide reliable integration. This can be achieved through uncertainty estimation. With this in mind, we propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC), providing a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The proposed TMC can promote classification reliability by considering evidence from each view. Specifically, we introduce the variational Dirichlet to characterize the distribution of the class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness against possible noise or corruption. Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
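The evidence-level fusion described in this abstract can be illustrated with a small Python sketch. It assumes the usual subjective-logic setup: each view outputs non-negative evidence per class, the Dirichlet parameters are evidence plus one, and two views are fused with a reduced form of Dempster's combination rule. The function names are hypothetical, and the sketch is not the authors' implementation.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    Assumes Dirichlet parameters alpha_k = e_k + 1 with strength
    S = sum(alpha), so b_k = e_k / S and u = K / S, giving sum(b) + u == 1.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    belief = [e / strength for e in evidence]
    return belief, num_classes / strength


def combine_opinions(belief_1, u_1, belief_2, u_2):
    """Fuse two opinions with a reduced Dempster combination rule."""
    # Conflict: belief mass the two views assign to different classes.
    conflict = sum(
        b1 * b2
        for i, b1 in enumerate(belief_1)
        for j, b2 in enumerate(belief_2)
        if i != j
    )
    # u_1 and u_2 are strictly positive here, so 1 - conflict > 0.
    scale = 1.0 / (1.0 - conflict)
    belief = [
        scale * (b1 * b2 + b1 * u_2 + b2 * u_1)
        for b1, b2 in zip(belief_1, belief_2)
    ]
    return belief, scale * u_1 * u_2


# A confident view fused with a view that saw no evidence at all.
confident = opinion_from_evidence([9.0, 1.0, 0.0])
vacuous = opinion_from_evidence([0.0, 0.0, 0.0])
fused_belief, fused_u = combine_opinions(*confident, *vacuous)
```

In this sketch a view with no evidence has uncertainty 1 and leaves the other view's opinion unchanged, which is the sense in which an uninformative or corrupted view can be down-weighted per sample.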
|
153
|
Sathyanarayanan A, Mueller TT, Ali Moni M, Schueler K, Baune BT, Lio P, Mehta D, Baune BT, Dierssen M, Ebert B, Fabbri C, Fusar-Poli P, Gennarelli M, Harmer C, Howes OD, Janzing JGE, Lio P, Maron E, Mehta D, Minelli A, Nonell L, Pisanu C, Potier MC, Rybakowski F, Serretti A, Squassina A, Stacey D, van Westrhenen R, Xicota L. Multi-omics data integration methods and their applications in psychiatric disorders. Eur Neuropsychopharmacol 2023; 69:26-46. [PMID: 36706689 DOI: 10.1016/j.euroneuro.2023.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/03/2022] [Revised: 11/22/2022] [Accepted: 01/02/2023] [Indexed: 01/27/2023]
Abstract
To study mental illness and health, in the past researchers have often broken down their complexity into individual subsystems (e.g., genomics, transcriptomics, proteomics, clinical data) and explored the components independently. Technological advancements and decreasing costs of high-throughput sequencing have led to an unprecedented increase in data generation. Furthermore, over the years it has become increasingly clear that these subsystems do not act in isolation but instead interact with each other to drive mental illness and health. Consequently, individual subsystems are now analysed jointly to promote a holistic understanding of the underlying biological complexity of health and disease. Complementing the increasing data availability, current research is geared towards developing novel methods that can efficiently combine the information-rich multi-omics data to discover biologically meaningful biomarkers for diagnosis, treatment, and prognosis. However, clinical translation of the research is still challenging. In this review, we summarise conventional and state-of-the-art statistical and machine learning approaches for biomarker discovery, diagnosis, and outcome and treatment response prediction through integrating multi-omics and clinical data. In addition, we describe the role of biological model systems and in silico multi-omics model designs in clinical translation of psychiatric research from bench to bedside. Finally, we discuss the current challenges and explore the application of multi-omics integration in future psychiatric research. The review provides a structured overview and latest updates in the field of multi-omics in psychiatry.
Affiliation(s)
- Anita Sathyanarayanan
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
| | - Tamara T Mueller
- Institute for Artificial Intelligence and Informatics in Medicine, TU Munich, 80333 Munich, Germany
| | - Mohammad Ali Moni
- Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Katja Schueler
- Clinic for Psychosomatics, Hospital zum Heiligen Geist, Frankfurt am Main, Germany; Frankfurt Psychoanalytic Institute, Frankfurt am Main, Germany
| | - Bernhard T Baune
- Department of Psychiatry and Psychotherapy, University of Münster, Germany; Department of Psychiatry, Melbourne Medical School, University of Melbourne, Australia; The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Australia
| | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
| | - Divya Mehta
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia.
| | | | - Bernhard T Baune
- Department of Psychiatry and Psychotherapy, University of Münster, Germany; Department of Psychiatry, Melbourne Medical School, University of Melbourne, Australia; The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Australia
| | - Mara Dierssen
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Bjarke Ebert
- Medical Strategy & Communication, H. Lundbeck A/S, Valby, Denmark
| | - Chiara Fabbri
- Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Paolo Fusar-Poli
- Early Psychosis: Intervention and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, King's College London, United Kingdom; Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
| | - Massimo Gennarelli
- Department of Molecular and Translational Medicine, University of Brescia; Genetics Unit, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
| | | | - Oliver D Howes
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; Psychiatric Imaging, Medical Research Council Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London, United Kingdom
| | | | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
| | - Eduard Maron
- Department of Psychiatry, University of Tartu, Tartu, Estonia; Centre for Neuropsychopharmacology, Division of Brain Sciences, Imperial College London, London, United Kingdom; Documental Ltd, Tallinn, Estonia; West Tallinn Central Hospital, Tallinn, Estonia
| | - Divya Mehta
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
| | - Alessandra Minelli
- Department of Molecular and Translational Medicine, University of Brescia; Genetics Unit, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
| | - Lara Nonell
- MARGenomics, IMIM (Hospital del Mar Research Institute), Barcelona, Spain
| | - Claudia Pisanu
- Department of Biomedical Sciences, Section of Neuroscience and Clinical Pharmacology, University of Cagliari, Cagliari, Italy
| | | | - Filip Rybakowski
- Department of Psychiatry, Poznan University of Medical Sciences, Poznan, Poland
| | - Alessandro Serretti
- Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy
| | - Alessio Squassina
- Department of Biomedical Sciences, Section of Neuroscience and Clinical Pharmacology, University of Cagliari, Cagliari, Italy
| | - David Stacey
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
| | - Roos van Westrhenen
- Parnassia Psychiatric Institute, Amsterdam, the Netherlands; Department of Psychiatry and Neuropsychology, Faculty of Health and Sciences, Maastricht University, Maastricht, the Netherlands; Institute of Psychiatry, Psychology & Neuroscience (IoPPN) King's College London, United Kingdom
| | - Laura Xicota
- Paris Brain Institute ICM, Salpetriere Hospital, Paris, France
| |
|
154
|
Wang S, Wang S, Wang Z. A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer. Front Med (Lausanne) 2023; 9:1109365. [PMID: 36703893 PMCID: PMC9871466 DOI: 10.3389/fmed.2022.1109365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 12/28/2022] [Indexed: 01/12/2023] Open
Abstract
Gastrointestinal cancer is becoming increasingly common and causes over 3 million deaths every year. No typical symptoms appear in the early stage of gastrointestinal cancer, posing a significant challenge in the diagnosis and treatment of patients. Many patients are already in the middle or late stages of the disease by the time they feel unwell, and unfortunately most of them will die of it. Recently, various artificial intelligence techniques, such as machine learning based on multi-omics, have been presented for cancer diagnosis and treatment in the era of precision medicine. This paper provides a survey on multi-omics-based cancer diagnosis using machine learning with potential application in gastrointestinal cancer. In particular, we provide a comprehensive summary and analysis from the perspective of multi-omics datasets, task types, and multi-omics-based integration methods. Furthermore, this paper points out the remaining challenges of multi-omics-based cancer diagnosis using machine learning and discusses future topics.
Affiliation(s)
- Suixue Wang
- School of Information and Communication Engineering, Hainan University, Haikou, China
| | - Shuling Wang
- Department of Neurology, Affiliated Haikou Hospital of Xiangya School of Medicine, Central South University, Haikou, China
| | - Zhengxia Wang
- School of Computer Science and Technology, Hainan University, Haikou, China
| |
|
155
|
Liu C, Duan Y, Zhou Q, Wang Y, Gao Y, Kan H, Hu J. A classification method of gastric cancer subtype based on residual graph convolution network. Front Genet 2023; 13:1090394. [PMID: 36685956 PMCID: PMC9845413 DOI: 10.3389/fgene.2022.1090394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 12/09/2022] [Indexed: 01/06/2023] Open
Abstract
Background: Clinical diagnosis and treatment of tumors are greatly complicated by their heterogeneity, and the subtype classification of cancer frequently plays a significant role in the subsequent treatment of tumors. Presently, the majority of studies rely far too heavily on gene expression data, omitting the enormous power of multi-omics fusion data and the potential for patient similarities. Method: In this study, we created a gastric cancer subtype classification model called RRGCN based on residual graph convolutional network (GCN) using multi-omics fusion data and patient similarity network. Given the multi-omics data's high dimensionality, we built an artificial neural network Autoencoder (AE) to reduce the dimensionality of the data and extract hidden layer features. The model is then built using the feature data. In addition, we computed the correlation between patients using the Pearson correlation coefficient, and this relationship between patients forms the edge of the graph structure. Four graph convolutional network layers and two residual networks with skip connections make up RRGCN, which reduces the amount of information lost during transmission between layers and prevents model degradation. Results: The results show that RRGCN significantly outperforms other classification methods with an accuracy as high as 0.87 when compared to four other traditional machine learning methods and deep learning models. Conclusion: In terms of subtype classification, RRGCN excels in all areas and has the potential to offer fresh perspectives on disease mechanisms and disease progression. It has the potential to be used for a broader range of disorders and to aid in clinical diagnosis.
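The patient similarity network mentioned in the Method can be sketched as follows: compute the Pearson correlation between every pair of patient feature vectors and keep an edge when the correlation is high enough. The threshold value, the function names and the edge-list representation are assumptions made for this illustration; the abstract does not state how the authors selected edges.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def similarity_edges(features, threshold=0.8):
    """Return (i, j, r) for each patient pair whose correlation r >= threshold.

    `features` holds one feature vector per patient, for example the
    hidden-layer features produced by an autoencoder. The threshold of
    0.8 is a hypothetical default, not a value from the paper.
    """
    edges = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            r = pearson(features[i], features[j])
            if r >= threshold:
                edges.append((i, j, r))
    return edges


# Three toy patients: the first two are strongly correlated.
toy_features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
]
toy_edges = similarity_edges(toy_features)
```

The resulting edge list, together with the per-patient feature vectors as node attributes, is the kind of graph input a graph convolutional network consumes.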
Affiliation(s)
- Can Liu
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Yuchen Duan
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
| | - Qingqing Zhou
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
| | - Yongkang Wang
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Yong Gao
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Hongxing Kan
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Jili Hu
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| |
|
156
|
Liao J, Li X, Gan Y, Han S, Rong P, Wang W, Li W, Zhou L. Artificial intelligence assists precision medicine in cancer treatment. Front Oncol 2023; 12:998222. [PMID: 36686757 PMCID: PMC9846804 DOI: 10.3389/fonc.2022.998222] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 07/19/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023] Open
Abstract
Cancer is a major medical problem worldwide. Due to its high heterogeneity, the use of the same drugs or surgical methods in patients with the same tumor may have different curative effects, leading to the need for more accurate treatment methods for tumors and personalized treatments for patients. Precise treatment of tumors is essential, which makes it urgent to understand in depth the changes that tumors undergo, including changes in their genes, proteins and cancer cell phenotypes, in order to develop targeted treatment strategies for patients. Artificial intelligence (AI) based on big data can extract the hidden patterns, important information, and corresponding knowledge behind the enormous amount of data. For example, machine learning (ML) and deep learning, both subsets of AI, can be used to mine deep-level information in genomics, transcriptomics, proteomics, radiomics, digital pathological images, and other data, giving clinicians a comprehensive understanding of tumors. In addition, AI can find new biomarkers from data to assist tumor screening, detection, diagnosis, treatment and prognosis prediction, so as to provide the best treatment for individual patients and improve their clinical outcomes.
Affiliation(s)
- Jinzhuang Liao
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Xiaoying Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Yu Gan
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Shuangze Han
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Pengfei Rong
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Wei Wang
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Wei Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Li Zhou
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Department of Pathology, The Xiangya Hospital of Central South University, Changsha, Hunan, China
| |
|
157
|
Pal M, Selvaraju S, Khan R. Editorial: Multi-omics approaches in cancer research with applications in tumour prognosis, metastasis and biosensor based diagnosis of biomarkers. Front Oncol 2023; 13:1168975. [PMID: 37025601 PMCID: PMC10071029 DOI: 10.3389/fonc.2023.1168975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 03/02/2023] [Indexed: 04/08/2023] Open
Affiliation(s)
- Mintu Pal
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Bathinda, Punjab, India
- *Correspondence: Mintu Pal,
| | - Sudhagar Selvaraju
- Department of Biotechnology, National Institute of Pharmaceutical Education and Research, Guwahati, Assam, India
| | - Raju Khan
- Industrial Waste Utilization, Nano and Biomaterials, CSIR-Advanced Materials and Processes Research Institute (AMPRI), Bhopal, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| |
|
158
|
Deep-Learning Algorithm and Concomitant Biomarker Identification for NSCLC Prediction Using Multi-Omics Data Integration. Biomolecules 2022; 12:biom12121839. [PMID: 36551266 PMCID: PMC9775093 DOI: 10.3390/biom12121839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/05/2022] [Accepted: 12/05/2022] [Indexed: 12/14/2022] Open
Abstract
Early diagnosis of lung cancer to increase the survival rate, which currently sits in the mid-30% range, remains a critical need. Despite this, multi-omics data have rarely been applied to non-small-cell lung cancer (NSCLC) diagnosis. We developed a multi-omics data-affinitive artificial intelligence algorithm based on the graph convolutional network that integrates mRNA expression, DNA methylation, and DNA sequencing data. This NSCLC prediction model achieved a 93.7% macro F1-score, indicating that the numbers of false positives and false negatives were low, which is desirable for accurate classification. Gene ontology enrichment and pathway analysis of features revealed that two major subtypes of NSCLC, lung adenocarcinoma and lung squamous cell carcinoma, have both specific and common GO biological processes. Numerous biomarkers (i.e., microRNA, long non-coding RNA, differentially methylated regions) were newly identified, whereas some biomarkers were consistent with previous findings in NSCLC (e.g., SPRR1B). Thus, using multi-omics data integration, we developed a promising cancer prediction algorithm.
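For readers unfamiliar with the metric, the macro F1-score averages the per-class F1 values with equal weight per class, so it is high only when false positives and false negatives are low in every class. A minimal Python sketch follows; the class labels and toy predictions are hypothetical and are not data from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: two NSCLC subtypes plus a non-cancer class.
truth = ["LUAD", "LUAD", "LUSC", "LUSC", "normal", "normal"]
preds = ["LUAD", "LUSC", "LUSC", "LUSC", "normal", "normal"]
toy_score = macro_f1(truth, preds)
```

Here one LUAD sample misclassified as LUSC lowers the F1 of both classes, and the macro average reflects that even though the third class is predicted perfectly.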
|
159
|
Clark C, Rabl M, Dayon L, Popp J. The promise of multi-omics approaches to discover biological alterations with clinical relevance in Alzheimer's disease. Front Aging Neurosci 2022; 14:1065904. [PMID: 36570537 PMCID: PMC9768448 DOI: 10.3389/fnagi.2022.1065904] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/10/2022] [Accepted: 11/21/2022] [Indexed: 12/12/2022] Open
Abstract
Beyond the core features of Alzheimer's disease (AD) pathology, i.e. amyloid pathology, tau-related neurodegeneration and microglia response, multiple other molecular alterations and pathway dysregulations have been observed in AD. However, their inter-individual variations, complex interactions and relevance for clinical manifestation and disease progression remain poorly understood. Heterogeneity at both pathophysiological and clinical levels complicates diagnosis, prognosis, treatment and drug design and testing. High-throughput "omics" comprise unbiased and untargeted data-driven methods which allow the exploration of a wide spectrum of disease-related changes at different endophenotype levels without focussing a priori on specific molecular pathways or molecules. Crucially, new methodological and statistical advances now allow for the integrative analysis of data resulting from multiple and different omics methods. These multi-omics approaches offer the unique advantage of providing a more comprehensive characterisation of the AD endophenotype and capturing molecular signatures and interactions spanning various biological levels. These new insights can then help decipher disease mechanisms more deeply. In this review, we describe the different multi-omics tools and approaches currently available and how they have been applied in AD research so far. We discuss how multi-omics can be used to explore molecular alterations related to core features of the AD pathologies and how they interact with comorbid pathological alterations. We further discuss whether the identified pathophysiological changes are relevant for the clinical manifestation of AD, in terms of both cognitive impairment and neuropsychiatric symptoms, and for clinical disease progression over time. Finally, we address the opportunities for multi-omics approaches to help discover novel biomarkers for diagnosis and monitoring of relevant pathophysiological processes, along with personalised intervention strategies in AD.
Affiliation(s)
- Christopher Clark
- Department of Psychiatry, Psychotherapy and Psychosomatics, University of Zürich, Zürich, Switzerland; Geriatric Psychiatry, University Hospital of Psychiatry Zürich, Zürich, Switzerland; *Correspondence: Christopher Clark,
| | - Miriam Rabl
- Geriatric Psychiatry, University Hospital of Psychiatry Zürich, Zürich, Switzerland; University of Lausanne, Lausanne, Switzerland
| | - Loïc Dayon
- Nestlé Institute of Food Safety and Analytical Sciences, Nestlé Research, Lausanne, Switzerland; Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Julius Popp
- Department of Psychiatry, Psychotherapy and Psychosomatics, University of Zürich, Zürich, Switzerland; Geriatric Psychiatry, University Hospital of Psychiatry Zürich, Zürich, Switzerland; Old Age Psychiatry, Department of Psychiatry, Lausanne University Hospital, Lausanne, Switzerland
| |
|
160
|
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
Affiliation(s)
- Efi Athieniti
- Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
| | - George M. Spyrou
- Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
| |
|
161
|
Zhang Y, Liu Y, Wang Z, Wang M, Xiong S, Huang G, Gong M. Uncovering the Relationship between Tissue-Specific TF-DNA Binding and Chromatin Features through a Transformer-Based Model. Genes (Basel) 2022; 13:1952. [PMID: 36360189 PMCID: PMC9690320 DOI: 10.3390/genes13111952] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Revised: 10/19/2022] [Accepted: 10/23/2022] [Indexed: 09/08/2024] Open
Abstract
Chromatin features can reveal tissue-specific TF-DNA binding, which leads to a better understanding of many critical physiological processes. Accurately identifying TF-DNA bindings and constructing their relationships with chromatin features is a long-standing goal in the bioinformatic field. However, this has remained elusive due to the complex binding mechanisms and heterogeneity among inputs. Here, we have developed the GHTNet (General Hybrid Transformer Network), a transformer-based model to predict TF-DNA binding specificity. The GHTNet decodes the relationship between tissue-specific TF-DNA binding and chromatin features via a specific input scheme of alternative inputs and reveals important gene regions and tissue-specific motifs. Our experiments show that the GHTNet has excellent performance, achieving about a 5% absolute improvement over existing methods. The TF-DNA binding mechanism analysis shows that the importance of TF-DNA binding features varies across tissues. The best predictor is based on the DNA sequence, followed by epigenomics and shape. In addition, cross-species studies address the limited data, thus providing new ideas in this case. Moreover, the GHTNet is applied to interpret the relationship among TFs, chromatin features, and diseases associated with AD46 tissue. This paper demonstrates that the GHTNet is an accurate and robust framework for deciphering tissue-specific TF-DNA binding and interpreting non-coding regions.
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
| | - Yuhang Liu
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
| | - Zixuan Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
| | - Maocheng Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
| | - Shuwen Xiong
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
| | - Guo Huang
- School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan 614000, China
| | - Meiqin Gong
- West China Second University Hospital, Sichuan University, Chengdu 610041, China
| |
|
162
|
He QE, Zhu JX, Wang LY, Ding EC, Song K. DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method. Front Genet 2022; 13:940214. [PMID: 36338981 PMCID: PMC9626520 DOI: 10.3389/fgene.2022.940214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 09/30/2022] [Indexed: 11/17/2022] Open
Abstract
Aberrant methylation is one of the early detectable events in many tumors, which is very promising for pan-cancer early-stage diagnosis and prognosis. To efficiently analyze the big pan-cancer methylation data and to overcome the co-methylation phenomenon, a MapReduce-based distributed and parallel-designed partial least squares approach was proposed. The large-scale high-dimensional methylation data were first decomposed into distributed blocks according to their genome locations. A distributed and parallel data processing strategy was proposed based on the framework of MapReduce, and then latent variables were further extracted for each distributed block. A set of pan-cancer signatures through a differential co-expression network followed by statistical tests was further identified based on their gene expression profiles. In total, 15 TCGA and 3 GEO datasets were used as the training and testing data, respectively, to verify our method. As a result, 22,000 potential methylation loci were selected as highly related loci with early-stage pan-cancer diagnosis. Of these, 67 methylation loci were further identified as pan-cancer signatures considering their gene expression as well. The survival analysis as well as pathway enrichment analysis on them shows that not only these loci may serve as potential drug targets, but also the proposed method may serve as a uniform framework for signature identification with big data.
Affiliation(s)
- Qi-en He
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
| | - Jun-xuan Zhu
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
| | - Li-yan Wang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
| | - En-ci Ding
- Tianjin First Central Hospital, Tianjin, China
| | - Kai Song
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- *Correspondence: Kai Song,
| |
|
163
|
Li Y, Wu X, Yang P, Jiang G, Luo Y. Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis. Genomics Proteomics Bioinformatics 2022; 20:850-866. [PMID: 36462630 PMCID: PMC10025752 DOI: 10.1016/j.gpb.2022.11.003] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 03/04/2022] [Revised: 10/03/2022] [Accepted: 11/17/2022] [Indexed: 12/03/2022]
Abstract
The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in effectively handling and fully utilizing the accumulation of such enormous amounts of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively characterized lung cancer through the use of different perspectives from these accrued data. In this review, we provide an overview of machine learning-based approaches that strengthen the varying aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction, and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.
Affiliation(s)
- Yawei Li
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Xin Wu
- Department of Medicine, University of Illinois at Chicago, Chicago, IL 60612, USA
| | - Ping Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905 / Scottsdale, AZ 85259, USA
| | - Guoqian Jiang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, USA
| | - Yuan Luo
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
| |
|
164
|
Bang D, Gu J, Park J, Jeong D, Koo B, Yi J, Shin J, Jung I, Kim S, Lee S. A Survey on Computational Methods for Investigation on ncRNA-Disease Association through the Mode of Action Perspective. Int J Mol Sci 2022; 23:11498. [PMID: 36232792 PMCID: PMC9570358 DOI: 10.3390/ijms231911498] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/26/2022] [Revised: 09/18/2022] [Accepted: 09/26/2022] [Indexed: 02/01/2023] Open
Abstract
Molecular and sequencing technologies have been successfully used in decoding biological mechanisms of various diseases. As revealed by many novel discoveries, the role of non-coding RNAs (ncRNAs) in understanding disease mechanisms is becoming increasingly important. Since ncRNAs primarily act as regulators of transcription, associating ncRNAs with diseases involves multiple inference steps. Leveraging the fast-accumulating high-throughput screening results, a number of computational models predicting ncRNA-disease associations have been developed. These tools suggest novel disease-related biomarkers or therapeutic targetable ncRNAs, contributing to the realization of precision medicine. In this survey, we first introduce the biological roles of different ncRNAs and summarize the databases containing ncRNA-disease associations. Then, we suggest a new trend in recent computational prediction of ncRNA-disease association, which is the mode of action (MoA) network perspective. This perspective includes integrating ncRNAs with mRNA, pathway and phenotype information. In the next section, we describe computational methodologies widely used in this research domain. Existing computational studies are then summarized in terms of their coverage of the MoA network. Lastly, we discuss the potential applications and future roles of the MoA network in terms of integrating biological mechanisms for ncRNA-disease associations.
Affiliation(s)
- Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
| | - Joonhyeong Park
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Korea
| | - Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Bonil Koo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Jungseob Yi
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
| | - Jihye Shin
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Inuk Jung
- Department of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Korea
- MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
| | - Sunho Lee
- AIGENDRUG Co., Ltd., Seoul 08826, Korea
- Correspondence:
| |
|
165
|
Singh KS, van der Hooft JJJ, van Wees SCM, Medema MH. Integrative omics approaches for biosynthetic pathway discovery in plants. Nat Prod Rep 2022; 39:1876-1896. [PMID: 35997060 PMCID: PMC9491492 DOI: 10.1039/d2np00032f] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 04/26/2022] [Indexed: 12/13/2022]
Abstract
Covering: up to 2022. With the emergence of large amounts of omics data, computational approaches for the identification of plant natural product biosynthetic pathways and their genetic regulation have become increasingly important. While genomes provide clues regarding functional associations between genes based on gene clustering, metabolome mining provides a foundational technology to chart natural product structural diversity in plants, and transcriptomics has been successfully used to identify new members of their biosynthetic pathways based on coexpression. Thus far, most approaches utilizing transcriptomics and metabolomics have been targeted towards specific pathways and use one type of omics data at a time. Recent technological advances now provide new opportunities for integration of multiple omics types and untargeted pathway discovery. Here, we review advances in plant biosynthetic pathway discovery using genomics, transcriptomics, and metabolomics, as well as recent efforts towards omics integration. We highlight how transcriptomics and metabolomics provide complementary information to link genes to metabolites, by associating temporal and spatial gene expression levels with metabolite abundance levels across samples, and by matching mass-spectral features to enzyme families. Furthermore, we suggest that elucidation of gene regulatory networks using time-series data may prove useful for efforts to unwire the complexities of biosynthetic pathway components based on regulatory interactions and events.
Affiliation(s)
- Kumar Saurabh Singh
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Plant-Microbe Interactions, Institute of Environmental Biology, Utrecht University, The Netherlands.
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Saskia C M van Wees
- Plant-Microbe Interactions, Institute of Environmental Biology, Utrecht University, The Netherlands.
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
| |
|
166
|
|
167
|
Liu H, Xing K, Jiang Y, Liu Y, Wang C, Ding X. Using Machine Learning to Identify Biomarkers Affecting Fat Deposition in Pigs by Integrating Multisource Transcriptome Information. J Agric Food Chem 2022; 70:10359-10370. [PMID: 35953074 PMCID: PMC9413214 DOI: 10.1021/acs.jafc.2c03339] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/16/2022] [Revised: 07/27/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Fat deposition in pigs is not only closely related to pig production efficiency and pork quality but also an ideal model for human obesity. Transcriptome sequencing is widely used to study fat deposition. However, due to small sample sizes, high false positive rates, and poor consistency of results from different studies, new strategies are urgently needed. Machine learning, a new analysis method, can effectively fit complex data and accurately identify samples and genes. In this study, 36 samples of adipose tissue, muscle tissue, and liver tissue were collected from Songliao black pigs and Landrace pigs, and the mRNA of all the samples was sequenced. In addition, we collected transcriptome data for 64 samples in the GEO database from four different sources. After standardization and imputation of missing values in the data set comprising 100 samples, traditional differential expression analysis was carried out, and different numbers of expressed genes were selected as features for the training model of eight machine learning methods. In the 1000 replications of fourfold cross validation with 100 samples, AdaBoost performed best, with an average prediction accuracy greater than 93% and the highest mean area under the curve in predicting the high- and low-fat content groups among the eight ML methods. According to their performance-based ranks inferred by AdaBoost, 12 genes related to fat deposition were identified; among them, FASN and APOD were specifically expressed in adipose tissue, and APOA1 was specifically expressed in the liver, which could be important candidate biomarkers affecting fat deposition.
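The evaluation protocol named in this abstract (repeated fourfold cross-validation over 100 samples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `stratified_folds`, `repeated_cv_accuracy`, and the toy data are hypothetical names introduced here, and `nearest_mean` is a simple stand-in classifier rather than AdaBoost.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def repeated_cv_accuracy(features, labels, fit_predict, k=4, repeats=1000, seed=0):
    """Mean accuracy over `repeats` independent rounds of k-fold cross-validation."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            predictions = fit_predict(
                [features[i] for i in train_idx],
                [labels[i] for i in train_idx],
                [features[i] for i in test_idx],
            )
            correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(labels))
    return sum(accuracies) / len(accuracies)


def nearest_mean(train_x, train_y, test_x):
    """Stand-in classifier (NOT AdaBoost): assign the class with the closest mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(train_x, train_y):
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}
    return [min(means, key=lambda y: abs(x - means[y])) for x in test_x]


# Hypothetical toy data: 100 samples with one feature and two fat-content groups.
toy_labels = ["low"] * 50 + ["high"] * 50
toy_features = [i * 0.01 for i in range(50)] + [1 + i * 0.01 for i in range(50)]
print(repeated_cv_accuracy(toy_features, toy_labels, nearest_mean, k=4, repeats=10))
```

Any classifier exposing the same `fit_predict(train_x, train_y, test_x)` shape could be swapped in; the abstract reports 1000 replications, while the usage line runs 10 to keep the sketch fast.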
|
168
|
Yip HF, Chowdhury D, Wang K, Liu Y, Gao Y, Lan L, Zheng C, Guan D, Lam KF, Zhu H, Tai X, Lu A. ReDisX, a machine learning approach, rationalizes rheumatoid arthritis and coronary artery disease patients uniquely upon identifying subpopulation differentiation markers from their genomic data. Front Med (Lausanne) 2022; 9:931860. [PMID: 36072953 PMCID: PMC9441882 DOI: 10.3389/fmed.2022.931860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 07/28/2022] [Indexed: 11/29/2022] Open
Abstract
Diseases originate at the molecular-genetic layer, manifest through altered biochemical homeostasis, and develop symptoms later. Hence, symptomatic diagnosis is inadequate to explain the underlying molecular-genetic abnormality and individual genomic disparities. The current trends include molecular-genetic information relying on algorithms to recognize the disease subtypes through gene expressions. Despite their disposition toward disease-specific heterogeneity and cross-disease homogeneity, a gap still exists in describing the extent of homogeneity within the heterogeneous subpopulation of different diseases. They are limited to obtaining the holistic sense of the whole genome-based diagnosis resulting in inaccurate diagnosis and subsequent management. Addressing those ambiguities, our proposed framework, ReDisX, introduces a unique classification system for the patients based on their genomic signatures. In this study, it is a scalable machine learning algorithm deployed to re-categorize the patients with rheumatoid arthritis and coronary artery disease. It reveals heterogeneous subpopulations within a disease and homogenous subpopulations across different diseases. Besides, it identifies granzyme B (GZMB) as a subpopulation-differentiation marker that plausibly serves as a prominent indicator for GZMB-targeted drug repurposing. The ReDisX framework offers a novel strategy to redefine disease diagnosis through characterizing personalized genomic signatures. It may rejuvenate the landscape of precision and personalized diagnosis and a clue to drug repurposing.
Affiliation(s)
- Hiu F. Yip
- Computational Medicine Laboratory, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Institute of Integrated Bioinformedicine and Translational Science, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Debajyoti Chowdhury
- Computational Medicine Laboratory, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Institute of Integrated Bioinformedicine and Translational Science, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Kexin Wang
- National Key Clinical Specialty, Engineering Technology Research Center of Education Ministry of China, Guangzhou, China
- Guangdong Provincial Key Laboratory on Brain Function Repair and Regeneration, Neurosurgery Institute, Department of Neurosurgery, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Yujie Liu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Yao Gao
- Department of Psychiatry, First Hospital, First Clinical Medical College of Shanxi Medical University, Taiyuan, China
| | - Liang Lan
- Department of Communication Studies, School of Communication, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Chaochao Zheng
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Daogang Guan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Guangzhou, China
| | - Kei F. Lam
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Hailong Zhu
- Computational Medicine Laboratory, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Institute of Integrated Bioinformedicine and Translational Science, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Xuecheng Tai
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| | - Aiping Lu
- Computational Medicine Laboratory, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Institute of Integrated Bioinformedicine and Translational Science, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
| |
|
169
|
de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:baac062. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
Affiliation(s)
- Valérie de Crécy-lagard
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | | | - Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
| | - Jill Babor
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Crysten Blaby-Haas
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Alan J Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
| | - Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Stacey Cleveland
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Lucy J Colwell
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Ana Conesa
- Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
| | - Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
| | - Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
| | - Anita de Waard
- Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
| | - Adam Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Raquel Dias
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
| | - Gang Fang
- NYU-Shanghai, Shanghai 200120, China
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
| | - John Gerlt
- Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Joshua Goldford
- Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Mark Gorelik
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Christopher Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Geoffrey Hutinet
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Marshall Jaroch
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
| | | | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Claire McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Paul Monaghan
- Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
| | - Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Darren A Natale
- Georgetown University Medical Center, Washington, DC 20007, USA
| | - William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
| | - Seán O’Donoghue
- School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | | | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Colbie Reed
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | | | - Dmitri Rodionov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
| | - Irina A Rodionova
- Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
| | - Jeffrey D Rudolf
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
| | - Lana Saleh
- New England Biolabs, Ipswich, MA 01938, USA
| | - Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
| | - Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
| | - Erica Watson Carter
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
| | | | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jin Xu
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
| |
|
170
|
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. RESULTS In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated by using three benchmarking metrics including accuracy, F1 macro, and F1 weighted. Meanwhile, the clustering performance is evaluated by using four benchmarking metrics including the Jaccard index (JI), C-index, silhouette score, and Davies Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. CONCLUSIONS Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest the future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.
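The three classification metrics named in this abstract (accuracy, F1 macro, F1 weighted) differ only in how per-class scores are averaged. The sketch below computes them from scratch using the standard definitions; the function names and the toy labels are hypothetical and are not taken from the benchmarked code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro- and support-weighted averages of per-class F1."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "f1_macro": sum(f1.values()) / len(f1),
        "f1_weighted": sum(f1[label] * support[label] for label in f1) / n,
    }


# Hypothetical toy labels: an imbalanced two-class problem.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
print(classification_metrics(y_true, y_pred))
```

On imbalanced data the macro average treats every class equally, so weak performance on a rare class lowers it more than it lowers the weighted average, which is one reason to report both.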
Affiliation(s)
- Dongjin Leng
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Linyi Zheng
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Yuqi Wen
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Yunhao Zhang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People’s Republic of China
| | - Jing Wang
- School of Medicine, Tsinghua University, Beijing, People’s Republic of China
| | - Meihong Wang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Song He
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| |
|
171
|
Moon S, Hwang J, Lee H. SDGCCA: Supervised Deep Generalized Canonical Correlation Analysis for Multi-Omics Integration. J Comput Biol 2022; 29:892-907. [PMID: 35951002 PMCID: PMC9805883 DOI: 10.1089/cmb.2021.0598] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
Abstract
Integration of multi-omics data provides opportunities for revealing biological mechanisms related to certain phenotypes. We propose a novel method of multi-omics integration called supervised deep generalized canonical correlation analysis (SDGCCA) for modeling correlation structures between nonlinear multi-omics manifolds that aims at improving the classification of phenotypes and revealing the biomarkers related to phenotypes. SDGCCA addresses the limitations of other canonical correlation analysis (CCA)-based models (such as deep CCA, deep generalized CCA) by considering complex/nonlinear cross-data correlations between multiple (≥2) modalities. Although there are a few methods to learn nonlinear CCA projections for classifying phenotypes, they only consider two views. Methods extended to multiple views either do not perform classification or do not provide feature ranking. In contrast, SDGCCA is a nonlinear multi-view CCA projection method that performs classification and ranks features. When we applied SDGCCA in predicting patients with Alzheimer's disease (AD) and discrimination of early- and late-stage cancers, it outperformed other CCA-based and other supervised methods. In addition, we demonstrate that SDGCCA can be applied for feature selection to identify important multi-omics biomarkers. On applying AD data, SDGCCA identified clusters of genes in multi-omics data, well known to be associated with AD.
Affiliation(s)
- Sehwan Moon
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Jeongyoung Hwang
- Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea.,Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, South Korea.,Address correspondence to: Dr. Hyunju Lee, School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
| |
|
172
|
Dong G, Zhang ZC, Feng J, Zhao XM. MorbidGCN: prediction of multimorbidity with a graph convolutional network based on integration of population phenotypes and disease network. Brief Bioinform 2022; 23:6627601. [PMID: 35780382 DOI: 10.1093/bib/bbac255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 05/17/2022] [Accepted: 06/01/2022] [Indexed: 02/06/2023] Open
Abstract
Exploring multimorbidity relationships among diseases is of great importance for understanding their shared mechanisms, precise diagnosis and treatment. However, the landscape of multimorbidities is still far from complete due to the complex nature of multimorbidity. Although various types of biological data, such as biomolecules and clinical symptoms, have been used to identify multimorbidities, the population phenotype information (e.g. physical activity and diet) remains less explored for multimorbidity. Here, we present a graph convolutional network (GCN) model, named MorbidGCN, for multimorbidity prediction by integrating population phenotypes and disease network. Specifically, MorbidGCN treats the multimorbidity prediction as a missing link prediction problem in the disease network, where a novel feature selection method is embedded to select important phenotypes. Benchmarking results on two large-scale multimorbidity data sets, i.e. the UK Biobank (UKB) and Human Disease Network (HuDiNe) data sets, demonstrate that MorbidGCN outperforms other competitive methods. With MorbidGCN, 9742 and 14 010 novel multimorbidities are identified in the UKB and HuDiNe data sets, respectively. Moreover, we notice that the selected phenotypes that are generally differentially distributed between multimorbidity patients and single-disease patients can help interpret multimorbidities and show potential for prognosis of multimorbidities.
Affiliation(s)
- Guiying Dong
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Zi-Chao Zhang
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Jianfeng Feng
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.,Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.,Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
| |
|
173
|
Li R, Zhou W. Multi-omics analysis to screen potential therapeutic biomarkers for anti-cancer compounds. Heliyon 2022; 8:e09616. [PMID: 36091949 PMCID: PMC9450078 DOI: 10.1016/j.heliyon.2022.e09616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 04/26/2022] [Accepted: 05/26/2022] [Indexed: 12/02/2022] Open
Abstract
Discovering potential biomarkers of response to anti-cancer therapies, including traditional Chinese medicine (TCM), is a critical but very difficult task in cancer research. Based on accumulated data and sophisticated methods, multi-omics analysis provides a feasible strategy for the discovery of potential therapeutic biomarkers. Here, we screened potential therapeutic biomarkers for anti-cancer compounds in TCM through multi-omics data analysis. First, compounds in TCM were collected from public databases. Then, the molecules on which those compounds can act in cell lines were carefully filtered from existing drug bioactivity datasets. Finally, multi-omics analysis, including gene mutation analysis, differential gene expression analysis, copy number variation analysis, and clinical survival analysis across cancer types, was conducted to screen potential therapeutic biomarkers for compounds in TCM. Thirteen target molecules of TCM compounds, namely ERBB2, MYC, FLT4, TEK, GLI1, TOP2A, PDE10A, SLC6A3, GPR55, TERT, EGFR, KCNA3 and HDAC4, are differentially expressed, frequently mutated, show a high copy number variation rate, and are significant in survival analysis; they are therefore considered potential therapeutic biomarkers.
|
174
|
Tabakhi S, Lu H. Multi-agent Feature Selection for Integrative Multi-omics Analysis. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:1638-1642. [PMID: 36086594 DOI: 10.1109/embc48229.2022.9871758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Multi-omics data integration is key for cancer prediction as it captures different aspects of molecular mechanisms. Nevertheless, the high dimensionality of multi-omics data with a relatively small number of patients presents a challenge for cancer prediction tasks. While feature selection techniques have been widely used to tackle the curse of dimensionality of multi-omics data, most existing methods have been applied to each type of omics data separately. In this paper, we propose a multi-agent architecture for feature selection, called MAgentOmics, to consider all omics data together. MAgentOmics extends the ant colony optimization algorithm to multi-omics data, which iteratively builds candidate solutions and evaluates them. Moreover, a new fitness function is introduced to assess the candidate feature subsets without using a prediction target such as the survival time of patients. Therefore, it can be considered an unsupervised method. We evaluate the performance of MAgentOmics on the TCGA ovarian cancer multi-omics data from 176 patients using 5-fold cross-validation. The results demonstrate that the integration power of MAgentOmics is relatively better than the state-of-the-art supervised multi-view method. The code is publicly available at https://github.com/SinaTabakhi/MAgentOmics. Clinical relevance: Discovering knowledge in existing multi-omics datasets through better feature selection enhances the clinical understanding of cancers and speeds up decision-making in the clinic.
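The evaluation protocol mentioned in this abstract, 5-fold cross-validation over 176 patients, can be pictured with a short sketch. This is a generic illustration in plain Python under stated assumptions, not the MAgentOmics code; the function name `k_fold_indices`, the shuffling, and the seed are hypothetical choices.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are shuffled once, then dealt round-robin into k folds so that
    fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With 176 samples and k=5, one fold holds 36 samples and four hold 35.
for fold_number, (train, test) in enumerate(k_fold_indices(176, k=5), start=1):
    print(fold_number, len(train), len(test))
```

Each patient appears in exactly one test fold, so every sample is used for evaluation once and for training in the other four rounds.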
|
175
|
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, these approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS: • The key challenges and solutions related to the big data derived from plant system biology have been highlighted. • Different methods of data integration have been discussed. • Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
| | - Milad Alizadeh
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | | | - Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada.,Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada.
| |
|
176
|
You Y, Lai X, Pan Y, Zheng H, Vera J, Liu S, Deng S, Zhang L. Artificial intelligence in cancer target identification and drug discovery. Signal Transduct Target Ther 2022; 7:156. [PMID: 35538061 PMCID: PMC9090746 DOI: 10.1038/s41392-022-00994-0] [Citation(s) in RCA: 142] [Impact Index Per Article: 47.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2021] [Revised: 03/14/2022] [Accepted: 04/05/2022] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.
Affiliation(s)
- Yujie You
- College of Computer Science, Sichuan University, Chengdu, 610065, China
| | - Xin Lai
- Laboratory of Systems Tumor Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, 91052, Germany
| | - Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Room D513, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, China
| | - Huiru Zheng
- School of Computing, Ulster University, Belfast, BT15 1ED, UK
| | - Julio Vera
- Laboratory of Systems Tumor Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, 91052, Germany
| | - Suran Liu
- College of Computer Science, Sichuan University, Chengdu, 610065, China
| | - Senyi Deng
- Institute of Thoracic Oncology, Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, 610065, China.
| | - Le Zhang
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China.
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China.
| |
|
177
|
Wang P, Schumacher AM, Shiu SH. Computational prediction of plant metabolic pathways. Curr Opin Plant Biol 2022; 66:102171. [PMID: 35078130 DOI: 10.1016/j.pbi.2021.102171] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/02/2021] [Revised: 12/07/2021] [Accepted: 12/18/2021] [Indexed: 06/14/2023]
Abstract
Uncovering genes encoding enzymes responsible for the biosynthesis of diverse plant metabolites is essential for metabolic engineering and production of plant metabolite-derived medicine. With the availability of multi-omics data for an ever-increasing number of plant species and the development of computational approaches, the metabolic pathways of many important plant compounds can be predicted, complementing a more traditional genetic and/or biochemical approach. Here, we summarize recent progress in predicting plant metabolic pathways using genome, transcriptome, proteome, interactome, and/or metabolome data, and the utility of integrating these data with machine learning to further improve metabolic pathway predictions.
Affiliation(s)
- Peipei Wang
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA.
| | - Ally M Schumacher
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA; Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
| |
|
178
|
Zhu Y, Bu D, Ma L. Integration of Multiplied Omics, a Step Forward in Systematic Dairy Research. Metabolites 2022; 12:225. [PMID: 35323668 PMCID: PMC8955540 DOI: 10.3390/metabo12030225] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2022] [Revised: 02/22/2022] [Accepted: 02/23/2022] [Indexed: 02/07/2023] Open
Abstract
Due to their unique multi-gastric digestion system, which is highly adapted for rumination, dairy livestock have a complicated physiology that differs from that of monogastric animals. However, the microbiome-based mechanism of the digestion system is well suited to biology approaches. Different omics and their integration have been widely applied in the dairy sciences since the previous decade to investigate the animals' physiology and pathology and to develop feed and management protocols. The rumen microbiome can digest dietary components into utilizable sugars, proteins, and volatile fatty acids, contributing to the energy intake and feed efficiency of dairy animals, which has made it one of the basic targets of omics applications in dairy science. The rumen, liver, and mammary gland are also frequently targeted in omics studies because of their crucial impact on dairy animals’ energy metabolism, production performance, and health status. The application of omics has contributed substantially to a deeper understanding of the physiology and etiology of dairy animals and to optimizing their management strategies, while multi-omics methods can draw together information from different levels and organs, providing an unprecedentedly broad view of the traits of dairy animals. This article reviews recent omics and multi-omics research on the physiology, feeding, and pathology of dairy animals and discusses the potential of multi-omics for systematic dairy research.
Affiliation(s)
- Yingkun Zhu
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China;
- School of Agriculture & Food Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
| | - Dengpan Bu
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China;
- Joint Laboratory on Integrated Crop-Tree-Livestock Systems of the Chinese Academy of Agricultural Sciences (CAAS), Ethiopian Institute of Agricultural Research (EIAR), and World Agroforestry Center (ICRAF), Beijing 100193, China
- Correspondence: D.B.; L.M.
| | - Lu Ma
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China;
- Correspondence: D.B.; L.M.
| |
|
179
|
Arjmand B, Hamidpour SK, Tayanloo-Beik A, Goodarzi P, Aghayan HR, Adibi H, Larijani B. Machine Learning: A New Prospect in Multi-Omics Data Analysis of Cancer. Front Genet 2022; 13:824451. [PMID: 35154283 PMCID: PMC8829119 DOI: 10.3389/fgene.2022.824451] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Accepted: 01/10/2022] [Indexed: 12/11/2022] Open
Abstract
Cancer is defined as a large group of diseases that are associated with abnormal cell growth and uncontrollable cell division and that may impinge on other tissues of the body by different mechanisms through metastasis. What makes cancer so important is that the cancer incidence rate is growing worldwide, which can have major health, economic, and even social impacts on both patients and governments. Therefore, early cancer prognosis, diagnosis, and treatment can play a crucial role at the front line of combating cancer. The onset and progression of cancer can occur under the influence of complicated mechanisms and alterations at the level of the genome, proteome, transcriptome, metabolome, etc. Consequently, the advent of omics science and its broad research branches (such as genomics, proteomics, transcriptomics, metabolomics, and so forth) as revolutionary biological approaches has opened new doors to a comprehensive perception of the cancer landscape. Due to the complexities of the formation and development of cancer, the study of mechanisms underlying cancer has gone beyond just one field of the omics arena. Therefore, making a connection between the resultant data from different branches of omics science and examining them in a multi-omics field can pave the way for the discovery of novel prognostic, diagnostic, and therapeutic approaches. As the volume and complexity of data from omics studies in cancer are increasing dramatically, the use of leading-edge technologies such as machine learning can have a promising role in the assessment of cancer research data. Machine learning is categorized as a subset of artificial intelligence that aims at data parsing, classification, and data pattern identification by applying statistical methods and algorithms. This acquired knowledge subsequently allows computers to learn and improve the accuracy of predictions through experience gained from data processing.
In this context, the application of machine learning, as a novel computational technology offers new opportunities for achieving in-depth knowledge of cancer by analysis of resultant data from multi-omics studies. Therefore, it can be concluded that the use of artificial intelligence technologies such as machine learning can have revolutionary roles in the fight against cancer.
Affiliation(s)
- Babak Arjmand
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- *Correspondence: Babak Arjmand; Bagher Larijani
| | - Shayesteh Kokabi Hamidpour
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Akram Tayanloo-Beik
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Parisa Goodarzi
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hamid Reza Aghayan
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hossein Adibi
- Diabetes Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Bagher Larijani
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- *Correspondence: Babak Arjmand; Bagher Larijani
| |
|
180
|
Watson ER, Taherian Fard A, Mar JC. Computational Methods for Single-Cell Imaging and Omics Data Integration. Front Mol Biosci 2022; 8:768106. [PMID: 35111809 PMCID: PMC8801747 DOI: 10.3389/fmolb.2021.768106] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022] Open
Abstract
Integrating single cell omics and single cell imaging allows for a more effective characterisation of the underlying mechanisms that drive a phenotype at the tissue level, creating a comprehensive profile at the cellular level. Although the use of imaging data is well established in biomedical research, its primary application has been to observe phenotypes at the tissue or organ level, often using medical imaging techniques such as MRI, CT, and PET. These imaging technologies complement omics-based data in biomedical research because they are helpful for identifying associations between genotype and phenotype, along with functional changes occurring at the tissue level. Single cell imaging can act as an intermediary between these levels. Meanwhile new technologies continue to arrive that can be used to interrogate the genome of single cells and its related omics datasets. As these two areas, single cell imaging and single cell omics, each advance independently with the development of novel techniques, the opportunity to integrate these data types becomes more and more attractive. This review outlines some of the technologies and methods currently available for generating, processing, and analysing single-cell omics- and imaging data, and how they could be integrated to further our understanding of complex biological phenomena like ageing. We include an emphasis on machine learning algorithms because of their ability to identify complex patterns in large multidimensional data.
Affiliation(s)
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| |
|
181
|
Abstract
Systematic bio- and databanks are key prerequisites for modern radiation research to investigate radiation response mechanisms in the context of genetic, environmental and lifestyle-associated factors. This report presents the current status of the German Uranium Miners’ Biobank. In 2008, the bio- and databank was established at the Federal Office for Radiation Protection, and the sampling of biological materials from former uranium miners with and without lung cancer was initiated. For this purpose, various biological specimens, such as DNA and RNA, were isolated from blood samples as well as from formalin-fixed paraffin-embedded lung tissue. High-quality biomaterials suitable for OMICs research and the associated data on occupational radiation and dust exposure, and medical and lifestyle data from over 1000 individuals have been stored so far. Various experimental data, e.g., genome-wide SNPs, whole genome transcriptomic and miRNA data, as well as individual chromosomal aberration data from subgroups of biobank samples, are already available upon request for in-depth research on radiation-induced long-term effects, individual radiation susceptibility to lung cancer and radon-induced fingerprints in lung cancer. This biobank is the first systematic uranium miners’ biobank worldwide that is suitable for OMICs research on radiation-exposed workers. It offers the opportunity to link radiation-induced perturbations of biological pathways or processes and putative adverse outcome(s) by OMICs profiling at different biological organization levels.
|
182
|
Pursuit of precision medicine: Systems biology approaches in Alzheimer's disease mouse models. Neurobiol Dis 2021; 161:105558. [PMID: 34767943 PMCID: PMC10112395 DOI: 10.1016/j.nbd.2021.105558] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2021] [Revised: 11/05/2021] [Accepted: 11/08/2021] [Indexed: 12/12/2022] Open
Abstract
Alzheimer's disease (AD) is a complex disease that is mediated by numerous factors and manifests in various forms. A systems biology approach to studying AD involves analyses of various body systems, biological scales, environmental elements, and clinical outcomes to understand the genotype to phenotype relationship that potentially drives AD development. Currently, there are many research investigations probing how modifiable and nonmodifiable factors impact AD symptom presentation. This review specifically focuses on how imaging modalities can be integrated into systems biology approaches using model mouse populations to link brain level functional and structural changes to disease onset and progression. Combining imaging and omics data promotes the classification of AD into subtypes and paves the way for precision medicine solutions to prevent and treat AD.
|
183
|
Westerlund AM, Hawe JS, Heinig M, Schunkert H. Risk Prediction of Cardiovascular Events by Exploration of Molecular Data with Explainable Artificial Intelligence. Int J Mol Sci 2021; 22:10291. [PMID: 34638627 PMCID: PMC8508897 DOI: 10.3390/ijms221910291] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 09/17/2021] [Accepted: 09/18/2021] [Indexed: 12/11/2022] Open
Abstract
Cardiovascular diseases (CVD) annually take almost 18 million lives worldwide. Most lethal events occur months or years after the initial presentation. Indeed, many patients experience repeated complications or require multiple interventions (recurrent events). Apart from affecting the individual, this leads to high medical costs for society. Personalized treatment strategies aiming at prediction and prevention of recurrent events rely on early diagnosis and precise prognosis. Complementing the traditional environmental and clinical risk factors, multi-omics data provide a holistic view of the patient and disease progression, enabling studies to probe novel angles in risk stratification. Specifically, predictive molecular markers allow insights into regulatory networks, pathways, and mechanisms underlying disease. Moreover, artificial intelligence (AI) represents a powerful, yet adaptive, framework able to recognize complex patterns in large-scale clinical and molecular data with the potential to improve risk prediction. Here, we review the most recent advances in risk prediction of recurrent cardiovascular events, and discuss the value of molecular data and biomarkers for understanding patient risk in a systems biology context. Finally, we introduce explainable AI which may improve clinical decision systems by making predictions transparent to the medical practitioner.
Affiliation(s)
- Annie M. Westerlund
  - Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany
  - Institute of Computational Biology, HelmholtzZentrum München, Ingolstädter Landstrasse 1, 85764 Munich, Germany
- Johann S. Hawe
  - Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany
- Matthias Heinig
  - Institute of Computational Biology, HelmholtzZentrum München, Ingolstädter Landstrasse 1, 85764 Munich, Germany
  - Department of Informatics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
- Heribert Schunkert
  - Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany
  - Deutsches Zentrum für Herz- und Kreislaufforschung (DZHK), Munich Heart Alliance, Biedersteiner Strasse 29, 80802 Munich, Germany
|
184
|
Li F, Song J, Zhang Y, Wang S, Wang J, Lin L, Yang C, Li P, Huang H. LINT-Web: A Web-Based Lipidomic Data Mining Tool Using Intra-Omic Integrative Correlation Strategy. SMALL METHODS 2021; 5:e2100206. [PMID: 34928054 DOI: 10.1002/smtd.202100206] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/20/2021] [Revised: 07/14/2021] [Indexed: 06/14/2023]
Abstract
Lipidomics is a younger member of the "omics" family. It aims to profile lipidome alterations occurring in biological systems. Similar to the other "omics", lipidomic data is high-dimensional and contains a massive amount of information awaiting deciphering and data mining. Currently, the available bioinformatic tools targeting lipidomic data processing and lipid pathway analysis are limited. The few tools designed for lipidomic analysis perform only basic statistical analyses, and lipid pathway analyses rely heavily on public databases (KEGG, Reactome, and HMDB). Due to the inadequate understanding of lipid signaling and metabolism, the use of public databases for lipid pathway analysis can be biased and misleading. Instead of using public databases to interpret lipidomic ontology, the authors introduce an intra-omic integrative correlation strategy for lipidomic data mining. Such an intra-omic strategy allows researchers to unscramble and predict lipid biological functions from correlated genomic ontological results using statistical approaches. To simplify and improve the lipidomic data processing experience, they designed an interactive web-based tool, LINT-web (http://www.lintwebomics.info/), to perform the intra-omic analysis strategy, and validated the functions of LINT-web using two biological systems. Users without sophisticated statistical experience can easily process lipidomic datasets and predict the potential lipid biological functions using LINT-web.
Affiliation(s)
- Fengsheng Li
  - Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Jia Song
  - Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Yingkun Zhang
  - The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Shuaikang Wang
  - Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Jinhui Wang
  - Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Li Lin
  - The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Chaoyong Yang
  - Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
  - The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Peng Li
  - Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
  - Shanghai Qi Zhi Institute, Shanghai, 200030, China
- He Huang
  - Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
  - Shanghai Qi Zhi Institute, Shanghai, 200030, China
|