1
de Andrade AFC, Andrade Torres M, Fukumasu H, Lázaro Rochetti A, Kitamura Martins SMM, de Novais FJ, Ramirez C, Cooper B. Molecular Determinants in Seminal Plasma and Spermatozoa: Nontargeted Metabolomics. Methods Mol Biol 2025; 2897:627-636. [PMID: 40202665 DOI: 10.1007/978-1-0716-4406-5_42] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2025]
Abstract
The metabolites present in seminal plasma are products of several functions in spermatozoa, such as energy production, motility, protection, pH control, and regulation of metabolic activity, among others. The use of metabolomics tools to search for biomarkers in human and animal andrology has grown in recent years and has proven to be highly efficient. With the present technique, it was possible to identify more than 1286 molecules in seminal plasma and more than 1393 molecules in boars' sperm.
Affiliation(s)
- André Furugen Cesar de Andrade
  - Department of Animal Reproduction, School of Veterinary Medicine and Animal Science, University of São Paulo, Pirassununga, São Paulo, Brazil
- Mariana Andrade Torres
  - Department of Animal Reproduction, School of Veterinary Medicine and Animal Science, University of São Paulo, Pirassununga, São Paulo, Brazil
- Heidge Fukumasu
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, São Paulo, Brazil
- Arina Lázaro Rochetti
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, São Paulo, Brazil
- Francisco José de Novais
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, São Paulo, Brazil
- Christina Ramirez
  - Bindley Bioscience Center, Purdue University, West Lafayette, IN, USA
- Bruce Cooper
  - Bindley Bioscience Center, Purdue University, West Lafayette, IN, USA
2
Liang W, Bai Y, Zhang H, Mo Y, Li X, Huang J, Lei Y, Gao F, Dong M, Li S, Liang J. Identification and Analysis of Potential Biomarkers Associated with Neutrophil Extracellular Traps in Cervicitis. Biochem Genet 2024:10.1007/s10528-024-10919-x. [PMID: 39419909 DOI: 10.1007/s10528-024-10919-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 09/14/2024] [Indexed: 10/19/2024]
Abstract
Early diagnosis of cervicitis is important. Previous studies have found that neutrophil extracellular traps (NETs) play pro-inflammatory and anti-inflammatory roles in many diseases, suggesting that they may be involved in inflammation of the uterine cervix and that NETs-related genes may serve as biomarkers of cervicitis. However, which NETs-related genes are associated with cervicitis remains to be determined. Transcriptome analysis was performed using samples of exfoliated cervical cells from 15 patients with cervicitis and 15 patients without cervicitis as the control group. First, differentially expressed genes (DEGs) were intersected with neutrophil extracellular trap-related genes (NETRGs) to obtain candidate genes, followed by functional enrichment analysis. We obtained hub genes through two machine learning algorithms. We then performed artificial neural network (ANN) and nomogram construction, confusion matrix and receiver operating characteristic (ROC) evaluation, gene set enrichment analysis (GSEA), and immune cell infiltration analysis. Moreover, we constructed a ceRNA network, an mRNA-transcription factor (TF) network, and a hub gene-drug network. We obtained 19 intersecting genes by intersecting 1398 DEGs and 136 NETRGs. Five hub genes were obtained through the two machine learning algorithms, namely PKM, ATG7, CTSG, RIPK3, and ENO1. Evaluation of the ANN model by confusion matrix and ROC curve showed high accuracy and stability. A nomogram containing the five hub genes was established to assess the disease rate in patients. The correlation analysis revealed that the expression of ATG7 was synergistic with RIPK3. The GSEA showed that most of the hub genes were related to ECM receptor interactions. It was predicted that the ceRNA network contained 2 hub genes, 3 targeted miRNAs, and 27 targeted lncRNAs, and that 5 mRNAs were regulated by 28 TFs. In addition, 36 small molecule drugs that target hub genes may improve the treatment of cervicitis.
In this study, five hub genes (PKM, ATG7, CTSG, RIPK3, ENO1) provided new directions for the diagnosis and treatment of patients with cervicitis.
Affiliation(s)
- Wantao Liang
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Yanyuan Bai
  - Guangxi University of Chinese Medicine, Nanning, 530001, Guangxi, China
- Hua Zhang
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Yan Mo
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Xiufang Li
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Junming Huang
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Yangliu Lei
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Fangping Gao
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Mengmeng Dong
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Shan Li
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
- Juan Liang
  - The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, 530023, Guangxi, China
3
Gervas-Arruga J, Barba-Romero MÁ, Fernández-Martín JJ, Gómez-Cerezo JF, Segú-Vergés C, Ronzoni G, Cebolla JJ. In Silico Modeling of Fabry Disease Pathophysiology for the Identification of Early Cellular Damage Biomarker Candidates. Int J Mol Sci 2024; 25:10329. [PMID: 39408658 PMCID: PMC11477023 DOI: 10.3390/ijms251910329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 09/19/2024] [Accepted: 09/24/2024] [Indexed: 10/20/2024]
Abstract
Fabry disease (FD) is an X-linked lysosomal disease whose ultimate consequences are the accumulation of sphingolipids and subsequent inflammatory events, mainly at the endothelial level. The outcomes include different nervous system manifestations as well as multiple organ damage. Despite the availability of known biomarkers, early detection of FD remains a medical need. This study aimed to develop an in silico model based on machine learning to identify candidate vascular and nervous system proteins for early FD damage detection at the cellular level. A combined systems biology and machine learning approach was carried out considering molecular characteristics of FD to create a computational model of vascular and nervous system disease. A data science strategy was applied to identify risk classifiers by using 10-fold cross-validation. Further biological and clinical criteria were used to prioritize the most promising candidates, resulting in the identification of 36 biomarker candidates with classifier abilities, which are easily measurable in body fluids. Among them, we propose four candidates, CAMK2A, ILK, LMNA, and KHSRP, which have high classification capabilities according to our models (cross-validated accuracy ≥ 90%) and are related to the vascular and nervous systems. These biomarkers show promise as high-risk cellular and tissue damage indicators that are potentially applicable in clinical settings, although in vivo validation is still needed.
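As an illustration of how a cross-validated accuracy like the one quoted in this abstract can be computed, here is a minimal sketch of 10-fold cross-validation. It is not the authors' pipeline: the single-feature threshold classifier, the function names, and the toy values are all assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(values, labels):
    """Hypothetical classifier: threshold at the midpoint of the two class means.
    Assumes both classes are present in the training data."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return (mean_pos + mean_neg) / 2, mean_pos >= mean_neg

def predict(value, threshold, positive_is_high):
    """Return 1 if the value falls on the positive side of the threshold."""
    return int((value > threshold) == positive_is_high)

def cross_validated_accuracy(values, labels, k=10, seed=0):
    """Fit on k-1 folds, score on the held-out fold, pool the results."""
    correct = 0
    for test_idx in k_fold_indices(len(values), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(values)) if i not in held_out]
        threshold, positive_is_high = fit_threshold(
            [values[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct += sum(
            predict(values[i], threshold, positive_is_high) == labels[i]
            for i in test_idx
        )
    return correct / len(values)

# Toy data (invented): one predicted protein activity per sample, two classes.
values = [0.10 + 0.01 * i for i in range(20)] + [0.80 + 0.01 * i for i in range(20)]
labels = [0] * 20 + [1] * 20
accuracy = cross_validated_accuracy(values, labels, k=10)
print(f"10-fold cross-validated accuracy: {accuracy:.2f}")
```

Each sample is scored exactly once by a model that never saw it during fitting, which is what makes the pooled accuracy an out-of-sample estimate.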
Affiliation(s)
- Miguel Ángel Barba-Romero
  - Department of Internal Medicine, Albacete University Hospital, 02006 Albacete, Spain
  - Albacete Medical School, Castilla-La Mancha University, 02006 Albacete, Spain
- Jorge Francisco Gómez-Cerezo
  - Department of Internal Medicine, Infanta Sofía University Hospital, 28702 Madrid, Spain
  - Faculty of Medicine, European University of Madrid, 28670 Madrid, Spain
4
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
5
Boadu VG, Teye E, Lamptey FP, Amuah CLY, Sam-Amoah L. Novel authentication of African geographical coffee types (bean, roasted, powdered) by handheld NIR spectroscopic method. Heliyon 2024; 10:e35512. [PMID: 39170384 PMCID: PMC11336767 DOI: 10.1016/j.heliyon.2024.e35512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 08/23/2024]
Abstract
African coffee is among the best traded coffee types worldwide, and rapid identification of its geographical origin is very important when trading the commodity. This study used NIR techniques to differentiate coffee types geographically and to provide a supply chain traceability method to avoid fraud. Geographic differentiation of African coffee types (bean, roasted, and powder) was achieved using handheld near-infrared spectroscopy and multivariate data processing. Robusta coffee was collected from five African countries of origin. The samples were individually scanned at a wavelength of 740-1070 nm, and their spectra profiles were preprocessed with mean centering (MC), multiplicative scatter correction (MSC), and standard normal variate (SNV). Support vector machines (SVM), linear discriminant analysis (LDA), neural networks (NN), random forests (RF), and partial least squares discriminant analysis (PLS-DA) were then used to develop a prediction model for African coffee types. The performance of the model was assessed using accuracy and F1-score. Proximate chemical composition analysis was also conducted on the raw and roasted coffee types. For raw bean coffee, the best classification models were SD-PLSDA and MC + SD-PLSDA, with an accuracy of 0.87 and an F1-score of 0.88. SNV + SD-SVM and MSC + SD-NN had accuracy and F1-scores of 0.97 for roasted coffee beans and 0.96 for roasted coffee powder, respectively. The results revealed that efficient quality assurance may be achieved by using handheld NIR spectroscopy combined with chemometrics to differentiate between different African coffee types according to their geographical origins.
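The three preprocessing steps named in this abstract (MC, MSC, SNV) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names and toy spectra are assumptions, SNV here uses the sample standard deviation, and MSC uses the mean spectrum as its reference.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    return [(x - m) / s for x in spectrum]

def mean_center(spectra):
    """Mean centering: subtract the per-wavelength mean across all spectra."""
    col_means = [mean(col) for col in zip(*spectra)]
    return [[x - m for x, m in zip(row, col_means)] for row in spectra]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.
    Each spectrum is regressed on the reference (x = intercept + slope * ref)
    and corrected as (x - intercept) / slope."""
    ref = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for row in spectra:
        row_mean = mean(row)
        slope = sum((r - ref_mean) * (x - row_mean) for r, x in zip(ref, row)) / sxx
        intercept = row_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in row])
    return corrected

# Toy spectra (invented): same band shape with different scale and offset.
spectra = [
    [0.10, 0.20, 0.40, 0.30],
    [0.20, 0.40, 0.80, 0.60],  # first spectrum with doubled intensity
    [0.15, 0.25, 0.45, 0.35],  # first spectrum with a constant offset
]
snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
```

In this toy case all three spectra become identical after SNV or MSC, because they differ only by additive and multiplicative effects of the kind these corrections are meant to remove.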
Affiliation(s)
- Vida Gyimah Boadu
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
  - Akenten Appiah-Menka University of Skills Training and Entrepreneurial Development, Department of Hospitality and Tourism Education, Kumasi, Ghana
- Ernest Teye
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
- Francis Padi Lamptey
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
  - Cape Coast Technical University, Department of Food Science and Postharvest Technology, Cape Coast, Ghana
- Charles Lloyd Yeboah Amuah
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Physical Sciences, Department of Physics, Cape Coast, Ghana
- L.K. Sam-Amoah
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
6
Cebolla JJ, Giraldo P, Gómez J, Montoto C, Gervas-Arruga J. Machine Learning-Driven Biomarker Discovery for Skeletal Complications in Type 1 Gaucher Disease Patients. Int J Mol Sci 2024; 25:8586. [PMID: 39201273 PMCID: PMC11354847 DOI: 10.3390/ijms25168586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 08/01/2024] [Accepted: 08/05/2024] [Indexed: 09/02/2024]
Abstract
Type 1 Gaucher disease (GD1) is a rare, autosomal recessive disorder caused by glucocerebrosidase deficiency. Skeletal manifestations represent one of the most debilitating and potentially irreversible complications of GD1. Although imaging studies are the gold standard, early diagnostic/prognostic tools, such as molecular biomarkers, are needed for the rapid management of skeletal complications. This study aimed to identify potential protein biomarkers capable of predicting the early diagnosis of bone skeletal complications in GD1 patients using artificial intelligence. An in silico study was performed using the novel Therapeutic Performance Mapping System methodology to construct mathematical models of GD1-associated complications at the protein level. Pathophysiological characterization was performed before modeling, and a data science strategy was applied to the predicted protein activity for each protein in the models to identify classifiers. Statistical criteria were used to prioritize the most promising candidates, and 18 candidates were identified. Among them, PDGFB, IL1R2, PTH and CCL3 (MIP-1α) were highlighted due to their ease of measurement in blood. This study proposes a validated novel tool to discover new protein biomarkers to support clinician decision-making in an area where medical needs have not yet been met. However, confirming the results using in vitro and/or in vivo studies is necessary.
Affiliation(s)
- Pilar Giraldo
  - FEETEG, 50006 Zaragoza, Spain
  - Hospital QuirónSalud Zaragoza, 50012 Zaragoza, Spain
7
Lopez E, Etxebarria-Elezgarai J, García-Sebastián M, Altuna M, Ecay-Torres M, Estanga A, Tainta M, López C, Martínez-Lage P, Amigo JM, Seifert A. Unlocking Preclinical Alzheimer's: A Multi-Year Label-Free In Vitro Raman Spectroscopy Study Empowered by Chemometrics. Int J Mol Sci 2024; 25:4737. [PMID: 38731955 PMCID: PMC11084676 DOI: 10.3390/ijms25094737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/13/2024]
Abstract
Alzheimer's disease is a progressive neurodegenerative disorder, the early detection of which is crucial for timely intervention and enrollment in clinical trials. However, the preclinical diagnosis of Alzheimer's encounters difficulties with gold-standard methods. The current definitive diagnosis of Alzheimer's still relies on expensive instrumentation and post-mortem histological examinations. Here, we explore label-free Raman spectroscopy with machine learning as an alternative to preclinical Alzheimer's diagnosis. A special feature of this study is the inclusion of patient samples from different cohorts, sampled and measured in different years. To develop reliable classification models, partial least squares discriminant analysis in combination with variable selection methods identified discriminative molecules, including nucleic acids, amino acids, proteins, and carbohydrates such as taurine/hypotaurine and guanine, when applied to Raman spectra taken from dried samples of cerebrospinal fluid. The robustness of the model is remarkable, as the discriminative molecules could be identified in different cohorts and years. A unified model notably classifies preclinical Alzheimer's, which is particularly surprising because of Raman spectroscopy's high sensitivity regarding different measurement conditions. The presented results demonstrate the capability of Raman spectroscopy to detect preclinical Alzheimer's disease for the first time and offer invaluable opportunities for future clinical applications and diagnostic methods.
Affiliation(s)
- Eneko Lopez
  - CIC nanoGUNE BRTA, 20018 San Sebastián, Spain; (E.L.); (J.E.-E.)
  - Department of Physics, University of the Basque Country (UPV/EHU), 20018 San Sebastián, Spain
- Maite García-Sebastián
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Miren Altuna
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Mirian Ecay-Torres
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Ainara Estanga
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Mikel Tainta
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Carolina López
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Pablo Martínez-Lage
  - Center for Research and Advanced Therapies, CITA-Alzhéimer Foundation, 20009 San Sebastián, Spain; (M.G.-S.); (M.A.); (M.E.-T.); (A.E.); (M.T.); (C.L.); (P.M.-L.)
- Jose Manuel Amigo
  - IKERBASQUE, Basque Foundation for Science, 48009 Bilbao, Spain
  - Department of Analytical Chemistry, University of the Basque Country, 48940 Leioa, Spain
- Andreas Seifert
  - CIC nanoGUNE BRTA, 20018 San Sebastián, Spain; (E.L.); (J.E.-E.)
  - IKERBASQUE, Basque Foundation for Science, 48009 Bilbao, Spain
8
Mateus Pereira de Souza N, Kimberli Abeg da Rosa D, de Moraes C, Caeran M, Bordin Hoffmann M, Pozzobon Aita E, Prochnow L, Lya Assmann da Motta A, Antonio Corbellini V, Rieger A. Structural characterization of DNA amplicons by ATR-FTIR spectroscopy as a guide for screening metainflammatory disorders in blood plasma. Spectrochim Acta A Mol Biomol Spectrosc 2024; 310:123897. [PMID: 38266599 DOI: 10.1016/j.saa.2024.123897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 01/08/2024] [Accepted: 01/14/2024] [Indexed: 01/26/2024]
Abstract
Attenuated total reflectance (ATR) Fourier transform infrared (FTIR) spectroscopy is a promising rapid, reagent-free, and low-cost technique considered for clinical translation. It allows the proteome, lipidome, and metabolome of biofluids to be characterized at once. Metainflammatory disorders share a constellation of chronic systemic inflammation, oxidative stress, aberrant adipogenesis, and hypoxia that significantly increases cardiovascular and cancer risk. As a result, these patients have elevated concentrations of cfDNA in the bloodstream. Considering this, DNA amplicons were analyzed by ATR-FTIR at 3 concentrations with 1:100 dilution (IU/mL): 718, 7.18, and 0.0718. The generated IR spectrum was used as a guide for variable selection. The main peaks in the biofingerprint (1800-900 cm⁻¹) give important information about the base, base-sugar, phosphate, and sugar-phosphate transitions of DNA. To validate our method of selecting variables in blood plasma, 38 control subjects and 12 subjects with metabolic syndrome were used. Using the wavenumbers of the peaks in the biofingerprint of the DNA amplicons, a discriminant analysis model with Mahalanobis distance was generated in blood plasma, and 100 % discrimination accuracy was obtained. In addition, the interval 1475-1188 cm⁻¹ showed the greatest sensitivity to variation in the concentration of DNA amplicons, so curve fitting with a Gaussian function was performed, obtaining an adjusted R² of 0.993. PCA with Mahalanobis distance in the interval 1475-1188 cm⁻¹ obtained an accuracy of 96 %, and PLS-DA modeling in the interval 1475-1088 cm⁻¹ obtained AUC = 0.991 with sensitivity of 95 % and specificity of 100 %. Therefore, ATR-FTIR spectroscopy with variable selection guided by DNA IR peaks is a promising and efficient method to be applied in metainflammatory disorders.
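To make the idea of discriminant analysis with Mahalanobis distance concrete, here is a minimal two-feature sketch. It is not the authors' model: the pooled-covariance formulation, the function names, the class labels, and the toy absorbance values are all assumptions chosen for illustration.

```python
def mean_vector(rows):
    """Per-feature mean of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(groups):
    """Pooled within-class covariance for two features (2x2 matrix)."""
    dof = sum(len(g) for g in groups) - len(groups)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = mean_vector(g)
        for x in g:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return [[v / dof for v in row] for row in s]

def invert_2x2(m):
    """Closed-form inverse of a non-singular 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def mahalanobis_sq(x, center, inv_cov):
    """Squared Mahalanobis distance from sample x to a class center."""
    d = [x[0] - center[0], x[1] - center[1]]
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(2) for j in range(2))

def classify(x, centers, inv_cov):
    """Assign x to the class whose center is nearest in Mahalanobis distance."""
    return min(centers, key=lambda label: mahalanobis_sq(x, centers[label], inv_cov))

# Toy data (invented): absorbance at two selected wavenumbers per plasma sample.
control = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.3]]
metabolic_syndrome = [[2.0, 3.0], [2.2, 3.4], [1.9, 2.9], [2.1, 3.1]]

centers = {
    "control": mean_vector(control),
    "metabolic_syndrome": mean_vector(metabolic_syndrome),
}
inv_cov = invert_2x2(pooled_covariance([control, metabolic_syndrome]))
print(classify([1.0, 2.0], centers, inv_cov))
print(classify([2.1, 3.2], centers, inv_cov))
```

Unlike Euclidean distance, the Mahalanobis distance accounts for the correlation between the selected wavenumbers, which tends to be strong in spectral data.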
Affiliation(s)
- Dhuli Kimberli Abeg da Rosa
  - Bioprocess Engineering and Biotechnology, State University of Rio Grande do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Caroline de Moraes
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Mariana Caeran
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Mairim Bordin Hoffmann
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Eduardo Pozzobon Aita
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Laura Prochnow
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Anna Lya Assmann da Motta
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Valeriano Antonio Corbellini
  - Department of Sciences, Humanities, and Education, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
  - Postgraduate Program in Health Promotion, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
  - Postgraduate Program in Environmental Technology, University of Santa Cruz do Sul, Rio Grande do Sul, Brazil
- Alexandre Rieger
  - Department of Life Sciences, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
  - Postgraduate Program in Health Promotion, University of Santa Cruz do Sul, Santa Cruz do Sul, Rio Grande do Sul, Brazil
  - Postgraduate Program in Environmental Technology, University of Santa Cruz do Sul, Rio Grande do Sul, Brazil
9
Tang J, Mou M, Zheng X, Yan J, Pan Z, Zhang J, Li B, Yang Q, Wang Y, Zhang Y, Gao J, Li S, Yang H, Zhu F. Strategy for Identifying a Robust Metabolomic Signature Reveals the Altered Lipid Metabolism in Pituitary Adenoma. Anal Chem 2024; 96:4745-4755. [PMID: 38417094 DOI: 10.1021/acs.analchem.3c03796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
Despite the well-established connection between systematic metabolic abnormalities and the pathophysiology of pituitary adenoma (PA), current metabolomic studies have reported an extremely limited number of metabolites associated with PA. Moreover, there was very little consistency in the identified metabolite signatures, resulting in a lack of robust metabolic biomarkers for the diagnosis and treatment of PA. Herein, we performed a global untargeted plasma metabolomic profiling on PA and identified a highly robust metabolomic signature based on a strategy. Specifically, this strategy is unique in (1) integrating repeated random sampling and a consensus evaluation-based feature selection algorithm and (2) evaluating the consistency of metabolomic signatures among different sample groups. This strategy demonstrated superior robustness and stronger discriminative ability compared with that of other feature selection methods including Student's t-test, partial least-squares-discriminant analysis, support vector machine recursive feature elimination, and random forest recursive feature elimination. More importantly, a highly robust metabolomic signature comprising 45 PA-specific differential metabolites was identified. Moreover, metabolite set enrichment analysis of these potential metabolic biomarkers revealed altered lipid metabolism in PA. In conclusion, our findings contribute to a better understanding of the metabolic changes in PA and may have implications for the development of diagnostic and therapeutic approaches targeting lipid metabolism in PA. We believe that the proposed strategy serves as a valuable tool for screening robust, discriminating metabolic features in the field of metabolomics.
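The combination of repeated random sampling with consensus-based feature selection described in this abstract can be illustrated as follows. This is a hypothetical sketch, not the published algorithm: the Welch-type ranking score, the parameter names and defaults, and the toy data are all assumptions.

```python
import random
from statistics import mean, stdev

def score_feature(case_values, control_values):
    """Absolute Welch-type t statistic, used here only as a ranking score."""
    diff = abs(mean(case_values) - mean(control_values))
    se = (stdev(case_values) ** 2 / len(case_values)
          + stdev(control_values) ** 2 / len(control_values)) ** 0.5
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se

def consensus_features(cases, controls, top_k=1, n_repeats=50,
                       fraction=0.8, min_frequency=0.9, seed=0):
    """Keep features ranked in the top_k in at least min_frequency of the
    random subsamples (all parameter values are illustrative assumptions)."""
    rng = random.Random(seed)
    n_features = len(cases[0])
    counts = [0] * n_features
    for _ in range(n_repeats):
        sub_cases = rng.sample(cases, max(2, int(fraction * len(cases))))
        sub_controls = rng.sample(controls, max(2, int(fraction * len(controls))))
        scores = [
            score_feature([r[j] for r in sub_cases], [r[j] for r in sub_controls])
            for j in range(n_features)
        ]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [j for j in range(n_features) if counts[j] / n_repeats >= min_frequency]

def make_sample(i, shift):
    """Toy sample (invented): feature 0 carries the group shift, the other
    four features follow the same pattern in both groups."""
    return [shift + (i % 5) * 0.1] + [((i * (j + 3)) % 7) / 7 for j in range(1, 5)]

cases = [make_sample(i, 5.0) for i in range(20)]
controls = [make_sample(i, 0.0) for i in range(20)]
selected = consensus_features(cases, controls)
print("Consensus feature indices:", selected)
```

A feature that only ranks highly in a few subsamples is dropped, which is the sense in which a consensus across resamples favors robust signatures over ones driven by a particular subset of samples.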
Affiliation(s)
- Jing Tang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xin Zheng
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Jin Yan
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jinsong Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Bo Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Qingxia Yang
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Ying Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Song Li
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Hui Yang
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
10
Lopez E, Etxebarria-Elezgarai J, Amigo JM, Seifert A. The importance of choosing a proper validation strategy in predictive models. A tutorial with real examples. Anal Chim Acta 2023; 1275:341532. [PMID: 37524478 DOI: 10.1016/j.aca.2023.341532] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/28/2023] [Revised: 06/13/2023] [Accepted: 06/14/2023] [Indexed: 08/02/2023]
Abstract
Machine learning is the art of combining a set of measurement data and predictive variables to forecast future events. Every day, new model approaches (with high levels of sophistication) can be found in the literature. However, less importance is given to the crucial stage of validation. Validation is the assessment that the model reliably links the measurements and the predictive variables. Nevertheless, a model can be validated and cross-validated in many seemingly reliable ways and still reflect the real nature of the data wrongly, so that it cannot be used to predict external samples. This manuscript shows in a didactical manner how important the data structure is when a model is constructed and how easy it is to obtain models that look promising with poorly designed cross-validation and external validation strategies. A comprehensive overview of the main validation strategies is shown, exemplified by three different scenarios, all of them focused on classification.
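One common way in which data structure undermines validation is when replicate measurements of the same sample or patient land on both sides of a split. The sketch below contrasts a naive split with a grouped one; it is an illustration under assumed names and toy data, not code from the tutorial.

```python
from collections import defaultdict

def naive_k_fold(n_samples, k):
    """Interleaved split that ignores which samples belong together."""
    folds = [list(range(n_samples))[i::k] for i in range(k)]
    for f in range(k):
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        yield train, folds[f]

def group_k_fold(groups, k):
    """Split so that all samples of one group share the same fold."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(k)]
    # Place the largest groups first, always into the currently smallest fold.
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    for f in range(k):
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        yield train, sorted(folds[f])

def shared_groups(train, test, groups):
    """Groups that appear in both the training and the test indices."""
    return {groups[i] for i in train} & {groups[i] for i in test}

# Toy data (invented): replicate spectra from four patients.
groups = ["P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3", "P4", "P4"]

for name, splitter in [("naive", naive_k_fold(len(groups), 2)),
                       ("grouped", group_k_fold(groups, 2))]:
    for train, test in splitter:
        print(name, "shared patients:", sorted(shared_groups(train, test, groups)))
```

With the naive split, a model can score well simply by recognizing patients it has already seen, so the estimate says little about performance on external samples.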
Affiliation(s)
- Eneko Lopez
- CIC NanoGUNE BRTA, Tolosa Hiribidea 76, San Sebastián, 20018, Spain; Department of Physics, University of the Basque Country (UPV/EHU), San Sebastián, 20018, Spain
| | - Jose Manuel Amigo
- IKERBASQUE, Basque Foundation for Science, Plaza Euskadi, 5, Bilbao, 48009, Spain; Department of Analytical Chemistry, University of the Basque Country, Barrio Sarriena S/N, Leioa, 48940, Spain.
| | - Andreas Seifert
- CIC NanoGUNE BRTA, Tolosa Hiribidea 76, San Sebastián, 20018, Spain; IKERBASQUE, Basque Foundation for Science, Plaza Euskadi, 5, Bilbao, 48009, Spain.
|
11
|
Teimouri H, Medvedeva A, Kolomeisky AB. Bacteria-Specific Feature Selection for Enhanced Antimicrobial Peptide Activity Predictions Using Machine-Learning Methods. J Chem Inf Model 2023; 63:1723-1733. [PMID: 36912047 DOI: 10.1021/acs.jcim.2c01551] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 03/14/2023]
Abstract
There are several classes of short peptide molecules, known as antimicrobial peptides (AMPs), which are produced during the immune responses of living organisms against various infections. In recent years, substantial progress has been achieved in applying machine-learning methods to predict the activities of AMPs against bacteria. In most investigated cases, however, the outcome is not bacterium-specific since the specific features of bacteria, such as chemical composition and structure of membranes, are not considered. To overcome this problem, we developed a new computational approach that allowed us to train several supervised machine-learning models using a specific set of data associated with peptides targeting E. coli bacteria. LASSO regression and Support Vector Machine techniques have been utilized to select, among more than 1500 physicochemical descriptors, the most important features that can be used to classify a peptide as antimicrobial or ineffective against E. coli. We then performed the classification of active versus inactive AMPs using the Support Vector classifiers, Logistic Regression, and Random Forest methods. This computational study allows us to make recommendations of how to design more efficient antibacterial drug therapies.
Affiliation(s)
- Hamid Teimouri
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Angela Medvedeva
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Anatoly B Kolomeisky
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States; Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
|
12
|
Hyperspectral Imaging Coupled with Multivariate Analyses for Efficient Prediction of Chemical, Biological and Physical Properties of Seafood Products. Food Engineering Reviews 2023. [DOI: 10.1007/s12393-022-09327-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
|
13
|
Fernández-Carballido C, Sanchez-Piedra C, Valls R, Garg K, Sánchez-Alonso F, Artigas L, Mas JM, Jovaní V, Manrique S, Campos C, Freire M, Martínez-González O, Castrejón I, Perella C, Coma M, van der Horst-Bruinsma IE. Female Sex, Age, and Unfavorable Response to Tumor Necrosis Factor Inhibitors in Patients With Axial Spondyloarthritis: Results of Statistical and Artificial Intelligence-Based Data Analyses of a National Multicenter Prospective Registry. Arthritis Care Res (Hoboken) 2023; 75:115-124. [PMID: 36278846 DOI: 10.1002/acr.25048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 10/17/2022] [Accepted: 10/20/2022] [Indexed: 12/31/2022]
Abstract
OBJECTIVE: Real-world studies are needed to identify factors associated with response to biologic therapies in patients with axial spondyloarthritis (SpA). The objective was to assess sex differences in response to tumor necrosis factor inhibitors (TNFi) and to explore possible risk factors associated with TNFi efficacy. METHODS: A total of 969 patients with axial SpA (315 females, 654 males) enrolled in the BIOBADASER registry (2000-2019) who initiated a TNFi (first, second, or further lines) were studied. Statistical and artificial intelligence (AI)-based data analyses were used to explore the association of sex differences and other factors to TNFi response, using the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), to calculate the BASDAI50, with an improvement of at least 50% of the BASDAI score, and using the Ankylosing Spondylitis Disease Activity Score, calculated using the C-reactive protein level (ASDAS-CRP). RESULTS: Females had a lower probability of reaching a BASDAI50 response with a first line TNFi treatment at the second year of follow-up (P = 0.018) and a lesser reduction of the ASDAS-CRP at this time point. The logistic regression model showed lower BASDAI50 responses to TNFi in females (P = 0.05). Other factors, such as older age (P = 0.004), were associated with unfavorable responses. The AI data analyses reinforced the idea that age at the beginning of the treatment was the main factor associated with an unfavorable response. The combination of age with other clinical characteristics (female sex or cardiovascular risk factors and events) potentially contributed to an unfavorable response to TNFi. CONCLUSION: In this national multicenter registry, female sex was associated with less response to a first-line TNFi by the second year of follow-up. A higher age at the start of the TNFi was the main factor associated with an unfavorable response to TNFi.
Affiliation(s)
- Carlos Sanchez-Piedra
- Health Technology Assessment Agency of Carlos III Institute of Health, Madrid, Spain
| | - Vega Jovaní
- Hospital General Universitario Dr. Balmis, Alicante, Spain
|
14
|
Zhou Y, Zhang Y, Li F, Lian X, Zhu Q, Zhu F, Qiu Y. SISPRO: signature identification for spatial proteomics. J Mol Biol 2023. [DOI: 10.1016/j.jmb.2022.167944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
|
15
|
Alshawaqfeh M, Rababah S, Hayajneh A, Gharaibeh A, Serpedin E. MetaAnalyst: a user-friendly tool for metagenomic biomarker detection and phenotype classification. BMC Med Res Methodol 2022; 22:336. [PMID: 36577938 PMCID: PMC9795700 DOI: 10.1186/s12874-022-01812-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: Many metagenomic studies have linked the imbalance in microbial abundance profiles to a wide range of diseases. These studies suggest utilizing the microbial abundance profiles as potential markers for metagenomic-associated conditions. Due to the inevitable importance of biomarkers in understanding the disease progression and the development of possible therapies, various computational tools have been proposed for metagenomic biomarker detection. However, most existing tools require prior scripting knowledge and lack user-friendly interfaces, so installing, configuring, and running them takes considerable time and effort. Besides, there is no available all-in-one solution for running and comparing various metagenomic biomarker detection tools simultaneously. In addition, most of these tools just present the suggested biomarkers without any statistical evaluation of their quality. RESULTS: To overcome these limitations, this work presents MetaAnalyst, a software package with a simple graphical user interface (GUI) that (i) automates the installation and configuration of 28 state-of-the-art tools, (ii) supports flexible study design to enable studying the dataset under different scenarios smoothly, (iii) runs and evaluates several algorithms simultaneously, (iv) supports different input formats and provides the user with several preprocessing capabilities, (v) provides a variety of metrics to evaluate the quality of the suggested markers, and (vi) presents the outcomes in the form of publication-quality plots with various formatting capabilities as well as Excel sheets. CONCLUSIONS: The utility of this tool has been verified through studying a metagenomic dataset under four scenarios. The executable file for MetaAnalyst along with its user manual are made available at https://github.com/mshawaqfeh/MetaAnalyst.
Affiliation(s)
- Mustafa Alshawaqfeh
- School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan
| | - Salahelden Rababah
- School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan; Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, NY, USA
| | - Abdullah Hayajneh
- Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, USA
| | - Ammar Gharaibeh
- School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan
| | - Erchin Serpedin
- Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, USA
|
16
|
Yang Q, Li B, Wang P, Xie J, Feng Y, Liu Z, Zhu F. LargeMetabo: an out-of-the-box tool for processing and analyzing large-scale metabolomic data. Brief Bioinform 2022; 23:bbac455. [PMID: 36274234 DOI: 10.1093/bib/bbac455] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 06/04/2022] [Revised: 09/06/2022] [Accepted: 09/24/2022] [Indexed: 12/14/2022] Open
Abstract
Large-scale metabolomics is a powerful technique that has attracted widespread attention in biomedical studies focused on identifying biomarkers and interpreting the mechanisms of complex diseases. Despite a rapid increase in the number of large-scale metabolomic studies, the analysis of metabolomic data remains a key challenge. Specifically, diverse unwanted variations and batch effects in processing many samples have a substantial impact on identifying true biological markers, and it is a daunting challenge to annotate a plethora of peaks as metabolites in untargeted mass spectrometry-based metabolomics. Therefore, the development of an out-of-the-box tool is urgently needed to realize data integration and to accurately annotate metabolites with enhanced functions. In this study, the LargeMetabo package based on R code was developed for processing and analyzing large-scale metabolomic data. This package is unique because it is capable of (1) integrating multiple analytical experiments to effectively boost the power of statistical analysis; (2) selecting the appropriate biomarker identification method by intelligent assessment for large-scale metabolic data and (3) providing metabolite annotation and enrichment analysis based on an enhanced metabolite database. The LargeMetabo package can facilitate flexibility and reproducibility in large-scale metabolomics. The package is freely available from https://github.com/LargeMetabo/LargeMetabo.
Affiliation(s)
- Qingxia Yang
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, Chongqing 401331, China
| | - Panpan Wang
- College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
| | - Jicheng Xie
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Yuhao Feng
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Ziqiang Liu
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
|
17
|
Waury K, Willemse EAJ, Vanmechelen E, Zetterberg H, Teunissen CE, Abeln S. Bioinformatics tools and data resources for assay development of fluid protein biomarkers. Biomark Res 2022; 10:83. [DOI: 10.1186/s40364-022-00425-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 10/25/2022] [Indexed: 11/16/2022] Open
Abstract
Fluid protein biomarkers are important tools in clinical research and health care to support diagnosis and to monitor patients. Especially within the field of dementia, novel biomarkers could address the current challenges of providing an early diagnosis and of selecting trial participants. While the great potential of fluid biomarkers is recognized, their implementation in routine clinical use has been slow. One major obstacle is the often unsuccessful translation of biomarker candidates from explorative high-throughput techniques to sensitive antibody-based immunoassays. In this review, we propose the incorporation of bioinformatics into the workflow of novel immunoassay development to overcome this bottleneck and thus facilitate the development of novel biomarkers towards clinical laboratory practice. Due to the rapid progress within the field of bioinformatics, many freely available and easy-to-use tools and data resources exist which can aid the researcher at various stages. Current prediction methods and databases can support the selection of suitable biomarker candidates, as well as the choice of appropriate commercial affinity reagents. Additionally, we examine methods that can determine or predict the epitope - an antibody’s binding region on its antigen - and can help to make an informed choice on the immunogenic peptide used for novel antibody production. Selected use cases for biomarker candidates help illustrate the application and interpretation of the introduced tools.
|
18
|
Yang Q, Li Y, Li B, Gong Y. A novel multi-class classification model for schizophrenia, bipolar disorder and healthy controls using comprehensive transcriptomic data. Comput Biol Med 2022; 148:105956. [PMID: 35981456 DOI: 10.1016/j.compbiomed.2022.105956] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/25/2022] [Revised: 07/30/2022] [Accepted: 08/06/2022] [Indexed: 01/01/2023]
Abstract
Two common psychiatric disorders, schizophrenia (SCZ) and bipolar disorder (BP), confer lifelong disability and collectively affect 2% of the world population. Because the diagnosis of psychiatry is based only on symptoms, developing more effective methods for the diagnosis of psychiatric disorders is a major international public health priority. Furthermore, SCZ and BP overlap considerably in terms of symptoms and risk genes. Therefore, the clarity of the underlying etiology and pathology remains lacking for these two disorders. Although many studies have been conducted, a classification model with higher accuracy and consistency was found to still be necessary for accurate diagnoses of SCZ and BP. In this study, a comprehensive dataset was combined from five independent transcriptomic studies. This dataset comprised 120 patients with SCZ, 101 patients with BP, and 149 healthy subjects. The partial least squares discriminant analysis (PLS-DA) method was applied to identify the gene signature among multiple groups, and 341 differentially expressed genes (DEGs) were identified. Then, the disease relevance of these DEGs was systematically performed, including (α) the great disease relevance of the identified signature, (β) the hub genes of the protein-protein interaction network playing a key role in psychiatric disorders, and (γ) gene ontology terms and enriched pathways playing a key role in psychiatric disorders. Finally, a popular multi-class classifier, support vector machine (SVM), was applied to construct a novel multi-class classification model using the identified signature for SCZ and BP. Using the independent test sets, the classification capacity of this multi-class model was assessed, which showed this model had a strong classification ability.
Affiliation(s)
- Qingxia Yang
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
| | - Yi Li
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, Chongqing, 401331, China
| | - Yaguo Gong
- School of Pharmacy, Macau University of Science and Technology, Macau, China.
|
19
|
Mendonca-Neto R, Li Z, Fenyo D, Silva CT, Nakamura FG, Nakamura EF. A Gene Selection Method Based on Outliers for Breast Cancer Subtype Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2547-2559. [PMID: 34860652 DOI: 10.1109/tcbb.2021.3132339] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Breast cancer is the second most common cancer type and is the leading cause of cancer-related deaths worldwide. Since it is a heterogeneous disease, subtyping breast cancer plays an important role in performing a specific treatment. Gene expression data are a viable alternative for cancer subtype classification, as they represent the state of a cell at the molecular level, but such datasets generally have a relatively small number of samples compared to a large number of genes. Gene selection is a promising approach that addresses this uneven high-dimensional matrix of genes versus samples and plays an important role in the development of efficient cancer subtype classification. In this work, an innovative outlier-based gene selection (OGS) method is proposed to select relevant genes to efficiently and effectively classify breast cancer subtypes. Experiments show that our strategy achieves an F1 score of 1.0 for basal and 0.86 for HER2, the two subtypes with the worst prognoses. Compared to other methods, our proposed method achieves a higher F1 score using 80% fewer genes. In general, our method selects only a few highly relevant genes, speeding up the classification and significantly improving the classifier's performance.
|
20
|
Identification of key candidate genes for IgA nephropathy using machine learning and statistics based bioinformatics models. Sci Rep 2022; 12:13963. [PMID: 35978028 PMCID: PMC9385868 DOI: 10.1038/s41598-022-18273-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/17/2022] [Accepted: 08/08/2022] [Indexed: 11/08/2022] Open
Abstract
Immunoglobulin A nephropathy (IgAN) is a kidney disease caused by the accumulation of IgA deposits in the kidneys, which causes inflammation and damage to the kidney tissues. Various bioinformatics analysis-based approaches are widely used to predict novel candidate genes and pathways associated with IgAN. However, there is still some scope to clearly explore the molecular mechanisms and causes of IgAN development and progression. Therefore, the present study aimed to identify key candidate genes for IgAN using machine learning (ML) and statistics-based bioinformatics models. First, differentially expressed genes (DEGs) were identified using limma, and then enrichment analysis was performed on DEGs using DAVID. A protein-protein interaction (PPI) network was constructed using STRING, and Cytoscape was used to determine hub genes based on connectivity and hub modules based on MCODE scores and their associated genes from DEGs. Furthermore, ML-based algorithms, namely support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), and partial least squares discriminant analysis (PLS-DA), were applied to identify the discriminative genes of IgAN from DEGs. Finally, the key candidate genes (FOS, JUN, EGR1, FOSB, and DUSP1) were identified as overlapping genes among the selected hub genes, hub module genes, and discriminative genes from SVM, LASSO, and PLS-DA; these genes can be used for the diagnosis and treatment of IgAN.
|
21
|
Matthaiou EI, Sharifi H, O'Donnell C, Chiu W, Owyang C, Chatterjee P, Turk I, Johnston L, Brondstetter T, Morris K, Cheng GS, Hsu JL. The safety and tolerability of pirfenidone for bronchiolitis obliterans syndrome after hematopoietic cell transplant (STOP-BOS) trial. Bone Marrow Transplant 2022; 57:1319-1326. [PMID: 35641662 PMCID: PMC9357121 DOI: 10.1038/s41409-022-01716-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/25/2022] [Revised: 05/09/2022] [Accepted: 05/13/2022] [Indexed: 02/03/2023]
Abstract
Bronchiolitis obliterans syndrome (BOS) is the most morbid form of chronic graft-versus-host disease (cGVHD) after hematopoietic cell transplantation (HCT). Progressive airway fibrosis leads to a 5-year survival of 40%. Treatment options for BOS are limited. A single-arm, 52-week, Phase I study of pirfenidone was conducted. The primary outcome was tolerability, defined as maintaining the recommended dose of pirfenidone (2403 mg/day) without a dose reduction totaling more than 21 days due to adverse events (AEs) or severe AEs (SAEs). Secondary outcomes included pulmonary function tests (PFTs) and patient-reported outcomes (PROs). Among 22 participants treated for 1 year, 13 (59%) tolerated the recommended dose, with an average daily tolerated dose of 2325.6 mg/day. Twenty-two SAEs were observed, with 90.9% related to infections; none were attributed to pirfenidone. There was an increase in the average percent predicted forced expiratory volume in 1 s (FEV1%) of 7 percentage points annually and improvements in PROs related to symptoms of cGVHD. In this Phase I study, treatment with pirfenidone was safe. The stabilization in PFTs and improvements in PROs suggest the potential of pirfenidone for BOS treatment and support the value of a randomized controlled trial to evaluate the efficacy of pirfenidone in BOS after HCT. The study is registered in ClinicalTrials.gov (NCT03315741).
Affiliation(s)
- Efthymia Iliana Matthaiou
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Husham Sharifi
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Christian O'Donnell
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Wayland Chiu
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Clark Owyang
- Department of Medicine, Division of Pulmonary and Critical Care Medicine, New York-Presbyterian Hospital/Weill Cornell Medical Center, New York, NY, USA
| | - Paulami Chatterjee
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Ihsan Turk
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Laura Johnston
- Department of Medicine, Division of Blood and Marrow Transplantation, Stanford University School of Medicine, Stanford, CA, USA
| | - Theresa Brondstetter
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Medicine, Division of Blood and Marrow Transplantation, Stanford University School of Medicine, Stanford, CA, USA
| | - Karen Morris
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Guang-Shing Cheng
- Clinical Research Division, Section of Pulmonary and Critical Care, Fred Hutchinson Cancer Research Center, Department of Medicine, Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington, Seattle, WA, USA
| | - Joe L Hsu
- Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA.
|
22
|
Li F, Yin J, Lu M, Yang Q, Zeng Z, Zhang B, Li Z, Qiu Y, Dai H, Chen Y, Zhu F. ConSIG: consistent discovery of molecular signature from OMIC data. Brief Bioinform 2022; 23:6618243. [PMID: 35758241 DOI: 10.1093/bib/bbac253] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 04/05/2022] [Revised: 05/09/2022] [Accepted: 05/31/2022] [Indexed: 12/12/2022] Open
Abstract
The discovery of proper molecular signature from OMIC data is indispensable for determining biological state, physiological condition, disease etiology, and therapeutic response. However, the identified signature is reported to be highly inconsistent, and there is little overlap among the signatures identified from different biological datasets. Such inconsistency raises doubts about the reliability of reported signatures and significantly hampers its biological and clinical applications. Herein, an online tool, ConSIG, was constructed to realize consistent discovery of gene/protein signature from any uploaded transcriptomic/proteomic data. This tool is unique in a) integrating a novel strategy capable of significantly enhancing the consistency of signature discovery, b) determining the optimal signature by collective assessment, and c) confirming the biological relevance by enriching the disease/gene ontology. With the increasingly accumulated concerns about signature consistency and biological relevance, this online tool is expected to be used as an essential complement to other existing tools for OMIC-based signature discovery. ConSIG is freely accessible to all users without login requirement at https://idrblab.org/consig/.
Affiliation(s)
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Qingxia Yang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, 79 QingChun Road, Hangzhou, Zhejiang 310000, China
| | - Haibin Dai
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China; Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
23
|
Data mining analyses for precision medicine in acromegaly: a proof of concept. Sci Rep 2022; 12:8979. [PMID: 35643771 PMCID: PMC9148300 DOI: 10.1038/s41598-022-12955-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/16/2022] [Accepted: 05/13/2022] [Indexed: 11/21/2022] Open
Abstract
Predicting which acromegaly patients could benefit from somatostatin receptor ligands (SRL) is a must for personalized medicine. Although many biomarkers linked to SRL response have been identified, there is no consensus criterion on how to assign this pharmacologic treatment according to biomarker levels. Our aim is to provide better predictive tools for accurate stratification of acromegaly patients by their ability to respond to SRL. We took advantage of a multicenter study of 71 acromegaly patients and used advanced mathematical modelling to predict SRL response by combining molecular and clinical information. Different models of patient stratification were obtained, with much higher accuracy when the studied cohort was divided according to relevant clinical characteristics. Considering all the models, a patient stratification based on the extrasellar growth of the tumor, sex, age and the expression of E-cadherin, GHRL, IN1-GHRL, DRD2, SSTR5 and PEBP1 is proposed, with accuracies between 71% and 95%. In conclusion, the use of data mining could be very useful for implementation of personalized medicine in acromegaly through interdisciplinary work between computer science, mathematics, biology and medicine. This new methodology opens a door to more precise and personalized medicine for acromegaly patients.
|
24
|
Lin MH, Wu PS, Wong TH, Lin IY, Lin J, Cox J, Yu SH. Benchmarking differential expression, imputation and quantification methods for proteomics data. Brief Bioinform 2022; 23:6566001. [PMID: 35397162 DOI: 10.1093/bib/bbac138] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/10/2022] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 11/14/2022] Open
Abstract
Data analysis is a critical part of quantitative proteomics studies in interpreting biological questions. Numerous computational tools for protein quantification, imputation and differential expression (DE) analysis have been developed in the past decade, and the search for optimal tools is still going on. Moreover, due to the rapid development of RNA sequencing (RNA-seq) technology, a vast number of DE analysis methods have been created for that purpose. The applicability of these newly developed RNA-seq-oriented tools to proteomics data remains in doubt. In order to benchmark these analysis methods, a proteomics dataset consisting of proteins derived from humans, yeast and Drosophila, in defined ratios, was generated in this study. Based on this dataset, DE analysis tools, including microarray- and RNA-seq-based ones, imputation algorithms and protein quantification methods were compared and benchmarked. Furthermore, applying these approaches to two public datasets showed that RNA-seq-based DE tools achieved higher accuracy (ACC) in identifying differentially expressed proteins (DEPs). This study provides useful guidelines for analyzing quantitative proteomics datasets. All the methods used in this study were integrated into the Perseus software, version 2.0.3.0, which is available at https://www.maxquant.org/perseus.
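The accuracy (ACC) figure mentioned in this abstract can be illustrated with a minimal sketch. It assumes that, in a mixed-species benchmark with defined ratios, each protein has a known ground-truth label (changed or unchanged) and that a DE tool returns a yes/no call per protein. The protein names and labels below are hypothetical toy values, not data from the study, and the function is an illustration rather than the authors' evaluation code.

```python
def accuracy(calls, truth):
    """Fraction of proteins whose DE call matches the ground-truth label."""
    if calls.keys() != truth.keys():
        raise ValueError("calls and truth must cover the same proteins")
    correct = sum(calls[protein] == truth[protein] for protein in truth)
    return correct / len(truth)


# Hypothetical ground truth: True means the protein was spiked at a changed ratio.
toy_truth = {
    "P1_yeast": True,
    "P2_yeast": True,
    "P3_human": False,
    "P4_human": False,
    "P5_fly": True,
}

# Hypothetical calls from some DE tool (one false negative, one false positive).
toy_calls = {
    "P1_yeast": True,
    "P2_yeast": False,
    "P3_human": False,
    "P4_human": True,
    "P5_fly": True,
}

print(accuracy(toy_calls, toy_truth))
```

Accuracy alone can hide class imbalance, so a real benchmark would usually report it alongside other measures such as sensitivity and specificity.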
Affiliation(s)
- Miao-Hsia Lin
- Graduate Institute and Department of Microbiology, College of Medicine, National Taiwan University, No.1 Jen Ai road section 1 Taipei 100 Taiwan
| | - Pei-Shan Wu
- Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, Taipei, Taiwan
| | - Tzu-Hsuan Wong
- Graduate Institute and Department of Microbiology, College of Medicine, National Taiwan University, No.1 Jen Ai road section 1 Taipei 100 Taiwan
| | - I-Ying Lin
- Graduate Institute and Department of Microbiology, College of Medicine, National Taiwan University, No.1 Jen Ai road section 1 Taipei 100 Taiwan
| | - Johnathan Lin
- Institute of Precision Medicine, National Sun Yat-sen University, No.70 Lien-hai Rd., Kaohsiung 80424, Taiwan
| | - Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
| | - Sung-Huan Yu
- Institute of Precision Medicine, National Sun Yat-sen University, No.70 Lien-hai Rd., Kaohsiung 80424, Taiwan
|
25
|
Transcriptomic Biomarker Signatures for Discrimination of Oral Cancer Surgical Margins. Biomolecules 2022; 12:biom12030464. [PMID: 35327656 PMCID: PMC8946245 DOI: 10.3390/biom12030464] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/28/2022] [Revised: 03/01/2022] [Accepted: 03/11/2022] [Indexed: 02/01/2023] Open
Abstract
Relapse after surgery for oral squamous cell carcinoma (OSCC) contributes significantly to morbidity, mortality and poor outcomes. The current histopathological diagnostic techniques are insufficiently sensitive for the detection of oral cancer and minimal residual disease in surgical margins. We used whole-transcriptome gene expression and small noncoding RNA profiles from tumour, close margin and distant margin biopsies from 18 patients undergoing surgical resection for OSCC. By applying multivariate regression algorithms (sPLS-DA) suitable for higher dimension data, we objectively identified biomarker signatures for tumour and marginal tissue zones. We were able to define molecular signatures that discriminated tumours from the marginal zones and between the close and distant margins. These signatures included genes not previously associated with OSCC, such as MAMDC2, SYNPO2 and ARMH4. For discrimination of the normal and tumour sampling zones, we were able to derive an effective gene-based classifying model for molecular abnormality based on a panel of eight genes (MMP1, MMP12, MYO1B, TNFRSF12A, WDR66, LAMC2, SLC16A1 and PLAU). We demonstrated the classification performance of these gene signatures in an independent validation dataset of OSCC tumour and marginal gene expression profiles. These biomarker signatures may contribute to the earlier detection of tumour cells and complement existing surgical and histopathological techniques used to determine clear surgical margins.
|
26
|
Li F, Zhou Y, Zhang Y, Yin J, Qiu Y, Gao J, Zhu F. POSREG: proteomic signature discovered by simultaneously optimizing its reproducibility and generalizability. Brief Bioinform 2022; 23:6532538. [PMID: 35183059 DOI: 10.1093/bib/bbac040] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 11/30/2021] [Revised: 01/21/2022] [Accepted: 01/27/2022] [Indexed: 12/17/2022] Open
Abstract
Mass spectrometry-based proteomics has become indispensable in the current exploration of complex and dynamic biological processes. Instrument development has largely ensured the effective production of proteomic data, which necessitates commensurate advances in the statistical framework used to discover the optimal proteomic signature. The current framework mainly emphasizes the generalizability of the identified signature in predicting independent data but neglects the reproducibility among signatures identified from independently repeated trials on different sub-datasets. This problem has seriously restricted the wide application of proteomics in molecular biology and related fields. Thus, it is crucial to enable the generalizable and reproducible discovery of the proteomic signature with the subsequent indication of phenotype association. However, no such tool has yet been developed or made available. Herein, an online tool, POSREG, was therefore constructed to identify the optimal signature for a set of proteomic data. It works by (i) identifying proteomic signatures of good reproducibility and aggregating them into an ensemble feature ranking by ensemble learning, (ii) assessing the generalizability of the ensemble feature ranking to acquire the optimal signature and (iii) indicating the phenotype association of the discovered signature. POSREG is unique in its capacity to discover the proteomic signature by simultaneously optimizing its reproducibility and generalizability. It is now accessible free of charge without any registration or login requirement at https://idrblab.org/posreg/.
Affiliation(s)
- Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ying Zhou
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang 310000, China
| | - Ying Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang 310000, China
| | - Jianqing Gao
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
27
|
Rahman SM, Lan J, Kaeli D, Dy J, Alshawabkeh A, Gu AZ. Machine learning-based biomarkers identification from toxicogenomics - Bridging to regulatory relevant phenotypic endpoints. J Hazard Mater 2022; 423:127141. [PMID: 34560480 PMCID: PMC9628282 DOI: 10.1016/j.jhazmat.2021.127141] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/05/2021] [Revised: 08/31/2021] [Accepted: 09/02/2021] [Indexed: 05/30/2023]
Abstract
One of the major challenges in realizing and implementing the Tox21 vision is the urgent need to establish a quantitative link between in-vitro assay molecular endpoints and in-vivo regulatory-relevant phenotypic toxicity endpoints. Current toxicomics approaches still mostly rely on a large number of redundant markers without pre-selection or ranking; selecting relevant biomarkers with minimal redundancy would therefore reduce the number of markers to be monitored and the cost, time, and complexity of toxicity screening and risk monitoring. Here, we demonstrated that, using a time-series toxicomics in-vitro assay along with machine learning-based feature selection (maximum relevance and minimum redundancy (MRMR)) and a classification method (support vector machine (SVM)), an "optimal" number of biomarkers with minimum redundancy can be identified for prediction of phenotypic toxicity endpoints with good accuracy. We included two case studies, for in-vivo carcinogenicity and Ames genotoxicity prediction, using 20 selected chemicals including model genotoxic chemicals and negative controls. The results suggested that, employing the adverse outcome pathway (AOP) concept, molecular endpoints based on a relatively small number of properly selected biomarker ensembles involved in the conserved DNA-damage and repair pathways among eukaryotes were able to predict both Ames genotoxicity endpoints and in-vivo carcinogenicity in rats. A prediction accuracy of 76% with AUC = 0.81 was achieved when predicting in-vivo carcinogenicity with the top-ranked five biomarkers. For Ames genotoxicity prediction, the top-ranked five biomarkers achieved a prediction accuracy of 70% with AUC = 0.75. However, the specific biomarkers identified as the top-ranked five differ between the two phenotypic genotoxicity assays. The top-ranked biomarkers for the in-vivo carcinogenicity prediction mainly focused on double strand break repair and DNA recombination, whereas the top-ranked biomarkers for Ames genotoxicity prediction are associated with base- and nucleotide-excision repair. The method developed in this study will help to fill the knowledge gap in phenotypic anchoring and predictive toxicology, and contribute to progress in implementing the Tox21 vision for environmental and health applications.
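The maximum-relevance, minimum-redundancy idea named in this abstract can be sketched as a greedy loop. The version below is a simplified illustration under stated assumptions: it uses absolute Pearson correlation as a stand-in for both relevance and redundancy (MRMR is commonly formulated with mutual information), scores candidates as relevance minus mean redundancy, and omits the SVM classification step. The function names and the toy marker values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_select(features, labels, k):
    """Greedily pick k features, scoring each by relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: g2 nearly duplicates g1, g3 carries weaker but distinct signal.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_features = {
    "g1": [1, 2, 1, 5, 6, 5],
    "g2": [1.1, 2.1, 1.0, 5.2, 6.1, 5.0],
    "g3": [3, 1, 2, 4, 2, 3],
}

print(mrmr_select(toy_features, toy_labels, 2))
```

In this toy case the near-duplicate marker is passed over in the second round because its redundancy with the first pick outweighs its relevance, which is the behaviour the abstract describes as selecting biomarkers with minimum redundancy.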
Affiliation(s)
- Sheikh Mokhlesur Rahman
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA; Department of Civil Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
| | - Jiaqi Lan
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA; Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - David Kaeli
- Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
| | - Jennifer Dy
- Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
| | - Akram Alshawabkeh
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
| | - April Z Gu
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA; School of Civil and Environmental Engineering, Cornell University, 263 Hollister Hall, Ithaca, NY 14853, USA.
|
28
|
Wang T, Tang L, Lin R, He D, Wu Y, Zhang Y, Yang P, He J. Individual variability in human urinary metabolites identifies age-related, body mass index-related, and sex-related biomarkers. Mol Genet Genomic Med 2021; 9:e1738. [PMID: 34293245 PMCID: PMC8404239 DOI: 10.1002/mgg3.1738] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2019] [Revised: 05/05/2019] [Accepted: 05/22/2019] [Indexed: 12/14/2022] Open
Abstract
Background: Metabolites present in human urine can be influenced by individual physiological parameters (e.g., body mass index [BMI], age, and sex). Observation of altered metabolite concentrations could provide insight into underlying disease pathology, disease prognosis and diagnosis, and facilitate discovery of novel biomarkers. Methods: Quantitative metabolomics analysis of the urine of 183 healthy individuals was performed based on high‐resolution liquid chromatography–mass spectrometry (LC–MS). Coefficients of variation (CV) were obtained for 109 urine metabolites across all 183 healthy subjects. Results: Three urine metabolites (such as dehydroepiandrosterone sulfate, acetaminophen glucuronide, and p‐anisic acid) with CV183 > 0.3, for which metabolomics studies have been scarce, are considered highly variable here. We identified 30 age‐related metabolites, 18 BMI‐related metabolites, and 42 sex‐related metabolites. Among the identified metabolites, three were found to be associated with all three physiological parameters (age, BMI, and sex): dehydroepiandrosterone sulfate, 3‐methylcrotonylglycine and N‐acetyl‐aspartic acid. Pearson's coefficients demonstrated that some age‐, BMI‐, and sex‐related compounds are strongly correlated, suggesting that age, BMI, and sex could affect them concomitantly. Conclusion: Metabolic differences between distinct physiological statuses were found to be related to several metabolic pathways (such as caffeine metabolism, amino acid metabolism, and carbohydrate metabolism), and these findings may be key for the discovery of new diagnostics and treatments as well as for new understanding of the mechanisms of some related diseases.
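The variability criterion in this abstract (a coefficient of variation above 0.3 across subjects) can be sketched as follows. This is a minimal illustration, assuming the CV is the sample standard deviation divided by the mean; the abstract does not state which estimator was used. The metabolite names and concentration values are hypothetical toy data, not measurements from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


def highly_variable(concentrations, threshold=0.3):
    """Names of metabolites whose CV across subjects exceeds the threshold."""
    return sorted(
        name
        for name, values in concentrations.items()
        if coefficient_of_variation(values) > threshold
    )


# Hypothetical concentrations (arbitrary units) for four subjects.
toy_concentrations = {
    "metabolite_A": [10.0, 10.5, 9.5, 10.2],
    "metabolite_B": [2.0, 8.0, 1.0, 12.0],
}

print(highly_variable(toy_concentrations))
```

The 0.3 cutoff comes from the abstract; everything else, including the use of `statistics.stdev`, is an assumption made for the sake of a runnable example.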
Affiliation(s)
- Tianling Wang
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China.,Dingxi Campus of Gansu, University of Traditional Chinese Medicine, Dingxi, China
| | - Lei Tang
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China
| | - Ruili Lin
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China
| | - Dian He
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China.,Gansu Institute for Drug Control, Lanzhou, China
| | - Yanqing Wu
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China
| | - Yang Zhang
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China.,School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
| | - Pingrong Yang
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China.,Gansu Institute for Drug Control, Lanzhou, China
| | - Junquan He
- Materia Medica Development Group, Institute of Medicinal Chemistry, Lanzhou University School of Pharmacy, Lanzhou, China.,Gansu Institute for Drug Control, Lanzhou, China
|
29
|
Yang Q, Li B, Chen S, Tang J, Li Y, Li Y, Zhang S, Shi C, Zhang Y, Mou M, Xue W, Zhu F. MMEASE: Online meta-analysis of metabolomic data by enhanced metabolite annotation, marker selection and enrichment analysis. J Proteomics 2021; 232:104023. [PMID: 33130111 DOI: 10.1016/j.jprot.2020.104023] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 12/23/2019] [Revised: 10/12/2020] [Accepted: 10/22/2020] [Indexed: 12/17/2022]
Abstract
Large-scale and long-term metabolomic studies have attracted widespread attention in biomedical research yet remain challenging despite recent technical progress. In particular, ineffective integration of experiments and limited capacity for metabolite annotation are known issues. Herein, we constructed an online tool, MMEASE, that enables the integration of multiple analytical experiments with enhanced metabolite annotation and enrichment analysis (https://idrblab.org/mmease/). MMEASE is unique in being capable of (1) integrating multiple analytical blocks; (2) providing enriched annotation for >330 thousand metabolites; (3) conducting enrichment analysis using various categories/sub-categories. All in all, MMEASE aims to supply a comprehensive service for large-scale and long-term metabolomics, which might provide valuable guidance to current biomedical studies. SIGNIFICANCE: To facilitate large-scale and long-term metabolomic analysis, MMEASE was developed to (1) achieve the online integration of multiple datasets from different analytical experiments, (2) provide the most diverse strategies for marker discovery, enabling performance assessment and (3) significantly amplify metabolite annotation and subsequent enrichment analysis.
Affiliation(s)
- Qingxia Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, Chongqing 401331, China
| | - Sijie Chen
- School of Pharmaceutical Sciences, School of Big Data and Software Engineering, Chongqing University, Chongqing, Chongqing 401331, China
| | - Jing Tang
- Department of Bioinformatics, Chongqing Medical University, Chongqing, Chongqing 400016, China
| | - Yinghong Li
- School of Pharmaceutical Sciences, School of Big Data and Software Engineering, Chongqing University, Chongqing, Chongqing 401331, China
| | - Yi Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Song Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Cheng Shi
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Ying Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences, School of Big Data and Software Engineering, Chongqing University, Chongqing, Chongqing 401331, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; School of Pharmaceutical Sciences, School of Big Data and Software Engineering, Chongqing University, Chongqing, Chongqing 401331, China.
|
30
|
Fu J, Luo Y, Mou M, Zhang H, Tang J, Wang Y, Zhu F. Advances in Current Diabetes Proteomics: From the Perspectives of Label-free Quantification and Biomarker Selection. Curr Drug Targets 2021; 21:34-54. [PMID: 31433754 DOI: 10.2174/1389450120666190821160207] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/06/2019] [Revised: 07/17/2019] [Accepted: 07/24/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Due to its prevalence and negative impacts on both the economy and society, diabetes mellitus (DM) has emerged as a worldwide concern. In light of this, label-free quantification (LFQ) proteomics and diabetic marker selection methods have been applied to elucidate the underlying mechanisms associated with insulin resistance, explore novel protein biomarkers, and discover innovative therapeutic protein targets. OBJECTIVE: The purpose of this manuscript is to review and analyze the recent computational advances and development of label-free quantification and diabetic marker selection in diabetes proteomics. METHODS: The Web of Science database, the PubMed database and Google Scholar were searched for label-free quantification, computational advances, feature selection and diabetes proteomics. RESULTS: In this study, we systematically review the computational advances of label-free quantification and diabetic marker selection methods that have been applied to understand the pathological mechanisms of DM. Firstly, popular quantification measurements and proteomic quantification software tools that have been applied to diabetes studies are comprehensively discussed. Secondly, a number of popular manipulation methods, including transformation, pretreatment (centering, scaling, and normalization) and missing value imputation, together with a variety of popular feature selection techniques applied to diabetes proteomic data, are overviewed with an objective evaluation of their advantages and disadvantages. Finally, guidelines for the efficient use of computation-based LFQ technology and feature selection methods in diabetes proteomics are proposed. CONCLUSION: In summary, this review provides guidelines for researchers who will engage in proteomics biomarker discovery; by properly applying these proteomic computational advances, more reliable therapeutic targets will be found in the field of diabetes mellitus.
Affiliation(s)
- Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Hongning Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jing Tang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China
|
31
|
Ruiz-Perez D, Guan H, Madhivanan P, Mathee K, Narasimhan G. So you think you can PLS-DA? BMC Bioinformatics 2020; 21:2. [PMID: 33297937 PMCID: PMC7724830 DOI: 10.1186/s12859-019-3310-7] [Citation(s) in RCA: 174] [Impact Index Per Article: 34.8] [Received: 11/26/2019] [Accepted: 12/09/2019] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of its close relative, from which it was originally derived, namely Principal Component Analysis (PCA). RESULTS: We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsda. CONCLUSIONS: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.
Affiliation(s)
- Daniel Ruiz-Perez
- Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, Miami, 33199, FL, USA
| | - Haibin Guan
- Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, Miami, 33199, FL, USA
| | - Purnima Madhivanan
- Department of Epidemiology, Florida International University, 11200 SW 8th St, Miami, 24105, FL, USA
| | - Kalai Mathee
- Herbert Wertheim College of Medicine, Florida International University, 11200 SW 8th St, Miami, 24105, FL, USA
| | - Giri Narasimhan
- Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, Miami, 33199, FL, USA.
|
32
|
Hossen Z, Abrar MA, Ara SR, Hasan MK. RATE-iPATH: On the design of integrated ultrasonic biomarkers for breast cancer detection. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2020.102053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
33
|
Dong H, Liu Y, Zeng WF, Shu K, Zhu Y, Chang C. A Deep Learning-Based Tumor Classifier Directly Using MS Raw Data. Proteomics 2020; 20:e1900344. [PMID: 32643271 DOI: 10.1002/pmic.201900344] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/23/2020] [Revised: 06/21/2020] [Indexed: 12/11/2022]
Abstract
Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurate prediction of tumor and non-tumor samples, as well as of tumor types, has become a key step in biological and medical research, such as biomarker discovery, diagnosis, and monitoring of diseases. The traditional MS-based classification strategy mainly depends on the identification and quantification results of MS data, which has some inherent limitations, such as the low identification rate of MS data. Here, a deep learning-based tumor classifier directly using MS raw data is proposed, which is independent of the identification and quantification results of MS data. The potential precursors, with their intensities and retention times, are first detected and extracted from MS data as input. Then, a deep learning-based classifier is trained, which can accurately distinguish between tumor and non-tumor samples. Finally, it is demonstrated that the deep learning-based classifier performs well compared with other machine learning methods and may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.
Affiliation(s)
- Hao Dong
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China.,School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.,Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
| | - Yi Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China.,College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100023, China
| | - Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Kunxian Shu
- School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.,Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
| | - Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
| | - Cheng Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
|
34
|
Tang J, Wang Y, Luo Y, Fu J, Zhang Y, Li Y, Xiao Z, Lou Y, Qiu Y, Zhu F. Computational advances of tumor marker selection and sample classification in cancer proteomics. Comput Struct Biotechnol J 2020; 18:2012-2025. [PMID: 32802273 PMCID: PMC7403885 DOI: 10.1016/j.csbj.2020.07.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/07/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 12/11/2022] Open
Abstract
Cancer proteomics has become a powerful technique for characterizing the protein markers driving malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis/prognosis and accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied to cancer proteomic studies. This review article describes the most recent advances in those various approaches together with their current applications in cancer-related studies. Firstly, a number of popular feature selection methods are overviewed with an objective evaluation of their advantages and disadvantages. Secondly, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of the advances in tumor marker identification and patient/sample/tissue separation, which could guide research in cancer proteomics.
Key Words
- ANN, Artificial Neural Network
- ANOVA, Analysis of Variance
- CFS, Correlation-based Feature Selection
- Cancer proteomics
- Computational methods
- DAPC, Discriminant Analysis of Principal Components
- DT, Decision Trees
- EDA, Estimation of Distribution Algorithm
- FC, Fold Change
- GA, Genetic Algorithms
- GR, Gain Ratio
- HC, Hill Climbing
- HCA, Hierarchical Cluster Analysis
- IG, Information Gain
- LDA, Linear Discriminant Analysis
- LIMMA, Linear Models for Microarray Data
- MBF, Markov Blanket Filter
- MWW, Mann–Whitney–Wilcoxon test
- OPLS-DA, Orthogonal Partial Least Squares Discriminant Analysis
- PCA, Principal Component Analysis
- PLS-DA, Partial Least Squares Discriminant Analysis
- RF, Random Forest
- RF-RFE, Random Forest with Recursive Feature Elimination
- SA, Simulated Annealing
- SAM, Significance Analysis of Microarrays
- SBE, Sequential Backward Elimination
- SFS, Sequential Forward Selection
- SOM, Self-organizing Map
- SU, Symmetrical Uncertainty
- SVM, Support Vector Machine
- SVM-RFE, Support Vector Machine with Recursive Feature Elimination
- Sample classification
- Tumor marker selection
- sPLSDA, Sparse Partial Least Squares Discriminant Analysis
- t-SNE, t-Distributed Stochastic Neighbor Embedding
- χ2, Chi-square
Affiliation(s)
- Jing Tang
- Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China
- Yi Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziyu Xiao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yan Lou
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Yunqing Qiu
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Feng Zhu
- Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
35
Tang J, Wang Y, Fu J, Zhou Y, Luo Y, Zhang Y, Li B, Yang Q, Xue W, Lou Y, Qiu Y, Zhu F. A critical assessment of the feature selection methods used for biomarker discovery in current metaproteomics studies. Brief Bioinform 2020; 21:1378-1390. [PMID: 31197323 DOI: 10.1093/bib/bbz061] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 03/27/2019] [Revised: 04/14/2019] [Indexed: 05/16/2025]
Abstract
Microbial communities (MCs) have a great impact on mediating complex disease indications, biogeochemical cycling and agricultural productivity, which makes metaproteomics a powerful technique for quantifying the diverse and dynamic composition of proteins or peptides. The key role of biostatistical strategies in MC studies is reported to be underestimated; in particular, the appropriate application of feature selection methods (FSMs) is largely ignored. Although extensive efforts have been devoted to assessing the performance of FSMs, previous studies focused only on their classification accuracy without considering their ability to correctly and comprehensively identify the spiked proteins. In this study, the performance of 14 FSMs was comprehensively assessed based on two key criteria (sample classification and spiked protein discovery) using a variety of metaproteomics benchmarks. First, the classification accuracies of the 14 FSMs were evaluated. Then, their ability to identify proteins at different spiked concentrations was assessed. Finally, seven FSMs (FC, LMEB, OPLS-DA, PLS-DA, SAM, SVM-RFE and T-Test) were identified as performing consistently superior or good under both criteria, with PLS-DA performing consistently superior. In summary, this study serves as a comprehensive analysis of the performance of current FSMs and could provide a valuable guideline for researchers in metaproteomics.
Affiliation(s)
- Jing Tang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Department of Bioinformatics, Chongqing Medical University, Chongqing, China
- Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ying Zhou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ying Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Bo Li
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Qingxia Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Yan Lou
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Yunqing Qiu
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
36
Tang J, Mou M, Wang Y, Luo Y, Zhu F. MetaFS: Performance assessment of biomarker discovery in metaproteomics. Brief Bioinform 2020; 22:5854399. [PMID: 32510556 DOI: 10.1093/bib/bbaa105] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 03/24/2020] [Revised: 04/17/2020] [Accepted: 05/05/2020] [Indexed: 12/19/2022]
Abstract
Metaproteomics suffers from the issues of dimensionality and sparsity. Data reduction methods can identify the relevant subset of significant differential features and reduce data redundancy. Feature selection (FS) methods are applied to obtain the significant differential subset. So far, a variety of FS methods have been developed for metaproteomic studies. However, because the performance of an FS method depends heavily on the data characteristics of a given study, a well-suited FS method must be carefully selected to obtain reproducible differential proteins. Moreover, it is critical to evaluate the performance of each FS method according to comprehensive criteria, because a single criterion is not sufficient to reflect its overall performance. Therefore, we developed an online tool named MetaFS, which provides 13 types of FS methods and conducts a comprehensive evaluation of the FS methods using four widely accepted and independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In sum, MetaFS could be a valuable tool for discovering the overall well-performing FS method for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.
37
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and the future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.
38
Label-free plasma proteomics identifies haptoglobin-related protein as candidate marker of idiopathic pulmonary fibrosis and dysregulation of complement and oxidative pathways. Sci Rep 2020; 10:7787. [PMID: 32385381 PMCID: PMC7211010 DOI: 10.1038/s41598-020-64759-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/29/2019] [Accepted: 04/19/2020] [Indexed: 02/06/2023]
Abstract
Idiopathic pulmonary fibrosis (IPF) is a lung parenchymal disease of unknown cause usually occurring in older adults. It is a chronic and progressive condition with poor prognosis, and diagnosis is largely clinical. Currently, there exist few biomarkers that can predict patient outcome or response to therapies. Given this lack of markers, the need for novel markers for the detection and monitoring of IPF is paramount. We have performed label-free plasma proteomics of 36 individuals, 17 of whom had confirmed IPF. Proteomics data were analyzed by volcano plot, hierarchical clustering, partial least squares discriminant analysis (PLS-DA) and Ingenuity pathway analysis. The overlap of univariate and multivariate statistical analyses identified haptoglobin-related protein as a possible marker of IPF when compared to control samples (area under the curve 0.851, ROC analysis). LXR/RXR activation and complement activation pathways were enriched in t-test significant proteins, and oxidative regulators, complement proteins and protease inhibitors were enriched in PLS-DA significant proteins. Our pilot study points towards aberrations in complement activation and oxidative damage in IPF patients and provides haptoglobin-related protein as a new candidate biomarker of IPF.
39
Cui X, Qin F, Song L, Wang T, Geng B, Zhang W, Jin L, Wang W, Li S, Tian X, Zhang H, Cai J. Novel Biomarkers for the Precisive Diagnosis and Activity Classification of Takayasu Arteritis. CIRCULATION-GENOMIC AND PRECISION MEDICINE 2020; 12:e002080. [PMID: 30645172 DOI: 10.1161/circgen.117.002080] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
BACKGROUND Establishing the diagnosis and determining disease activity of Takayasu arteritis (TA) remain challenging. Novel biomarkers might help to solve this problem. METHODS In the screening phase, large-scale protein arrays were used to analyze samples from 90 subjects (TA active, 29; TA inactive, 31; and controls, 30). In the validation phase, potential biomarkers for TA diagnosis and activity classification were measured by enzyme-linked immunosorbent assay (ELISA) in independent cohorts. RESULTS In the screening phase, 18 cytokines significantly differentially enriched between TA patients and controls and another 15 cytokines significantly differentially enriched between TA patients in active and inactive status were identified (adjusted P<0.05). In the validation phase, TIMP (tissue inhibitor of metalloproteinases)-1 was identified as a specific biomarker for TA diagnosis, for which a cutoff value of 221.86 μg/L provided a specificity of 89.58% and a positive predictive value of 0.92. Meanwhile, we found it unreliable to use a single biomarker for TA activity classification. Considering this, we further built a logistic regression model based on multiple cytokines, including CA (cancer antigen) 125, FLRG (follistatin-related protein), IGFBP (insulin-like growth factor-binding protein)-2, CA15-3, GROa (growth-regulated alpha protein), LYVE (lymphatic vessel endothelial hyaluronic acid receptor)-1, ULBP (UL16-binding protein)-2, and CD (cluster of differentiation) 99, with an area under the curve reaching 0.909 for discriminating TA activity status. CONCLUSIONS This study suggested TIMP-1 as a specific biomarker for TA diagnosis with a cutoff value of 221.86 μg/L. Furthermore, we provided a logistic regression model based on 8 biomarkers for the precise activity classification of TA with an area under the curve of 0.909.
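The specificity and positive predictive value quoted for the TIMP-1 cutoff follow from a simple confusion-matrix calculation. The sketch below illustrates that arithmetic only: the function name `diagnostic_metrics`, the toy concentrations, and the assumption that values at or above the cutoff are called positive are all hypothetical and are not taken from the study.

```python
def diagnostic_metrics(values, labels, cutoff):
    """Return confusion counts, specificity and PPV for a single cutoff.

    Assumption: a value >= cutoff is called positive.
    labels: 1 for a true case, 0 for a control.
    """
    tp = fp = tn = fn = 0
    for value, label in zip(values, labels):
        called_positive = value >= cutoff
        if called_positive and label == 1:
            tp += 1
        elif called_positive and label == 0:
            fp += 1
        elif not called_positive and label == 0:
            tn += 1
        else:
            fn += 1
    # Specificity = TN / (TN + FP); PPV = TP / (TP + FP).
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "specificity": specificity, "ppv": ppv}


# Made-up marker concentrations (ug/L); these are NOT data from the study.
toy_values = [250.0, 230.0, 240.0, 210.0, 200.0, 190.0, 225.0, 180.0]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
result = diagnostic_metrics(toy_values, toy_labels, cutoff=221.86)
print(result)
```

Note that the positive predictive value depends on the proportion of cases in the cohort, so a value computed this way applies only to a population with a similar case mix.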
Affiliation(s)
- Xiao Cui
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China; Cardiovascular Disease Center, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, P.R. China (X.C.)
- Fang Qin
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China; Department of Cardiology, Chongqing Cardiac Arrhythmias Therapeutic Service Center, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, P.R. China (F.Q.)
- Lei Song
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Tian Wang
- Department of Rheumatology, Beijing Anzhen Hospital, Capital Medical University, Beijing Institute of Heart, Lung and Blood Vessel Disease, Beijing, P.R. China (T.W.)
- Bin Geng
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Weili Zhang
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Ling Jin
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Wenjie Wang
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Shuangyue Li
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Xinping Tian
- Department of Rheumatology, Peking Union Medical College Hospital (X.T.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Huimin Zhang
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
- Jun Cai
- Hypertension Center, Fuwai Hospital, State Key Laboratory of Cardiovascular Diseases, National Center for Cardiovascular Diseases (X.C., F.Q., L.S., B.G., W.Z., L.J., W.W., S.L., H.Z., J.C.), Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, P.R. China
40
Machine learning for the detection of early immunological markers as predictors of multi-organ dysfunction. Sci Data 2019; 6:328. [PMID: 31857590 PMCID: PMC6923383 DOI: 10.1038/s41597-019-0337-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/03/2019] [Accepted: 12/05/2019] [Indexed: 12/14/2022]
Abstract
The immune response to major trauma has been analysed mainly within post-hospital admission settings where the inflammatory response is already underway and the early drivers of clinical outcome cannot be readily determined. Thus, there is a need to better understand the immediate immune response to injury and how this might influence important patient outcomes such as multi-organ dysfunction syndrome (MODS). In this study, we have assessed the immune response to trauma in 61 patients at three different post-injury time points (ultra-early (≤1 h), 4-12 h, 48-72 h) and analysed relationships with the development of MODS. We developed a pipeline using Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection methods that was able to identify 3 physiological features (decrease in neutrophil CD62L and CD63 expression and monocyte CD63 expression and frequency) as possible biomarkers for MODS development. After univariate and multivariate analysis for each feature alongside a stability analysis, the addition of these 3 markers to standard clinical trauma injury severity scores yields a Generalized Linear Model (GLM) with an average Area Under the Curve value of 0.92 ± 0.06. This performance provides an 8% improvement over the Probability of Survival (PS14) outcome measure and a 13% improvement over the New Injury Severity Score (NISS) for identifying patients at risk of MODS.
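The Area Under the Curve figure reported above summarises how well a model's scores rank patients who develop the outcome above those who do not. The following is a minimal sketch of that quantity using the pairwise (Mann-Whitney) formulation; the function name `roc_auc` and the toy scores are hypothetical, and this is not the authors' pipeline.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted risks (illustrative only, not study data).
perfect = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
chance = roc_auc([0.9, 0.3, 0.8, 0.4], [1, 1, 0, 0])
print(perfect, chance)
```

An average with a spread, as quoted in the abstract, would come from repeating such a calculation over resampled or cross-validated splits and summarising the resulting values.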
41
Stanstrup J, Broeckling CD, Helmus R, Hoffmann N, Mathé E, Naake T, Nicolotti L, Peters K, Rainer J, Salek RM, Schulze T, Schymanski EL, Stravs MA, Thévenot EA, Treutler H, Weber RJM, Willighagen E, Witting M, Neumann S. The metaRbolomics Toolbox in Bioconductor and beyond. Metabolites 2019; 9:E200. [PMID: 31548506 PMCID: PMC6835268 DOI: 10.3390/metabo9100200] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 08/11/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 11/17/2022]
Abstract
Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse Metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics specific packages primarily available on CRAN, Bioconductor and GitHub.
Affiliation(s)
- Jan Stanstrup
- Preventive and Clinical Nutrition, University of Copenhagen, Rolighedsvej 30, 1958 Frederiksberg C, Denmark.
- Corey D Broeckling
- Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, CO 80523, USA.
- Rick Helmus
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands.
- Nils Hoffmann
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany.
- Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
- Thomas Naake
- Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany.
- Luca Nicolotti
- The Australian Wine Research Institute, Metabolomics Australia, PO Box 197, Adelaide SA 5064, Australia.
- Kristian Peters
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- Johannes Rainer
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, 39100 Bolzano, Italy.
- Reza M Salek
- The International Agency for Research on Cancer, 150 cours Albert Thomas, CEDEX 08, 69372 Lyon, France.
- Tobias Schulze
- Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research-UFZ, Permoserstraße 15, 04318 Leipzig, Germany.
- Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
- Michael A Stravs
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600 Dubendorf, Switzerland.
- Etienne A Thévenot
- CEA, LIST, Laboratory for Data Sciences and Decision, MetaboHUB, Gif-Sur-Yvette F-91191, France.
- Hendrik Treutler
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- Ralf J M Weber
- Phenome Centre Birmingham and School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
- Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands.
- Michael Witting
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, 85764 Neuherberg, Germany.
- Chair of Analytical Food Chemistry, Technische Universität München, 85354 Weihenstephan, Germany.
- Steffen Neumann
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.
42
Erber L, Luo A, Chen Y. Targeted and Interactome Proteomics Revealed the Role of PHD2 in Regulating BRD4 Proline Hydroxylation. Mol Cell Proteomics 2019; 18:1772-1781. [PMID: 31239290 PMCID: PMC6731074 DOI: 10.1074/mcp.ra119.001535] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/30/2019] [Revised: 06/19/2019] [Indexed: 12/18/2022]
Abstract
Proline hydroxylation is a critical cellular mechanism regulating energy homeostasis and development. Our previous study identified and validated Bromodomain-containing protein 4 (BRD4) as a proline hydroxylation substrate in cancer cells. Yet, the regulatory mechanism and the functional significance of the modification remain unknown. In this study, we developed targeted quantification assays using parallel-reaction monitoring and biochemical analysis to identify the major regulatory enzyme of BRD4 proline hydroxylation. We further performed quantitative interactome analysis to determine the functional significance of the modification pathway in BRD4-mediated protein-protein interactions and gene transcription. Our findings revealed that PHD2 is the key regulatory enzyme of BRD4 proline hydroxylation and the modification significantly affects BRD4 interactions with key transcription factors as well as BRD4-mediated transcriptional activation. Taken together, this study provided mechanistic insights into the oxygen-dependent modification of BRD4 and revealed new roles of the pathway in regulating BRD4-dependent gene expression.
Affiliation(s)
- Luke Erber
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455
- Ang Luo
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455
- Yue Chen
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455.
43
Rago D, Rasmussen MA, Lee-Sarwar KA, Weiss ST, Lasky-Su J, Stokholm J, Bønnelykke K, Chawes BL, Bisgaard H. Fish-oil supplementation in pregnancy, child metabolomics and asthma risk. EBioMedicine 2019; 46:399-410. [PMID: 31399385 PMCID: PMC6712349 DOI: 10.1016/j.ebiom.2019.07.057] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 05/17/2019] [Revised: 07/22/2019] [Accepted: 07/22/2019] [Indexed: 12/15/2022]
Abstract
Background We recently demonstrated that maternal dietary supplementation with fish oil-derived n-3 long-chain polyunsaturated fatty acids (n-3 LCPUFAs) during pregnancy reduces the risk of asthma in the offspring but the mechanisms involved are unknown. Methods Here we investigated potential metabolic mechanisms using untargeted liquid chromatography-mass spectrometry-based metabolomics on 577 plasma samples collected at age 6 months in the offspring of mothers participating in the n-3 LCPUFA randomized controlled trial. First, associations between the n-3 LCPUFA supplementation groups and child metabolite levels were investigated using univariate regression models and data-driven partial least square discriminant analyses (PLS-DA). Second, we analyzed the association between the n-3 LCPUFA metabolomic profile and asthma development using Cox-regression. Third, we conducted mediation analyses to investigate whether the protective effect of n-3 LCPUFA on asthma was mediated via the metabolome. Findings The univariate analyses and the PLS-DA showed that maternal fish oil supplementation affected the child's metabolome, especially with lower levels of the n-6 LCPUFA pathway-related metabolites and saturated and monounsaturated long-chain fatty acids-containing compounds, lower levels of metabolites of the tryptophan pathway, and higher levels of metabolites in the tyrosine and glutamic acid pathway. This fish oil-related metabolic profile at age 6 months was significantly associated with a reduced risk of asthma by age 5 and the metabolic profile explained 24% of the observed asthma-protective effect in the mediation analysis. Interpretation Several of the observed pathways may be involved in the asthma-protective effect of maternal n-3 LCPUFA supplementation and act as mediators between the intervention and disease development. Funding COPSAC is funded by private and public research funds all listed on www.copsac.com.
Affiliation(s)
- Daniela Rago
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Morten A Rasmussen
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark; Department of Food Science, University of Copenhagen, Copenhagen, Denmark
- Kathleen A Lee-Sarwar
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Jakob Stokholm
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark; Department of Pediatrics, Slagelse Hospital, Slagelse, Denmark
- Klaus Bønnelykke
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Bo L Chawes
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark.
- Hans Bisgaard
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark.
44
Goh WWB, Wong L. Advanced bioinformatics methods for practical applications in proteomics. Brief Bioinform 2019; 20:347-355. [PMID: 30657890 DOI: 10.1093/bib/bbx128] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/14/2017] [Indexed: 12/22/2022]
Abstract
Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak.
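The abstract above names multiple-testing correction as a downstream step that can confound feature-selection performance. As a minimal sketch of one widely used correction, the Benjamini-Hochberg procedure is shown below; the abstract does not say which correction the authors consider, so the choice of method, the function name `benjamini_hochberg`, and the p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five candidate features (not from the paper).
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
# Features retained at a 5% false discovery rate.
selected = [i for i, qi in enumerate(q) if qi <= 0.05]
print(selected)
```

In this toy input, two features with raw p-values below 0.05 are dropped after adjustment, which illustrates how the correction step changes the selected set.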
|
45
|
Wang S, Jeong HH, Sohn KA. ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction. BMC Med Genomics 2019; 12:95. [PMID: 31296201 PMCID: PMC6624178 DOI: 10.1186/s12920-019-0512-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for the detection of biomarkers, and several studies have employed information-theoretic approaches. However, most of these methods generally require a long processing time. In addition, information-theoretic methods discretize continuous features, which is a drawback that can lead to the loss of information. RESULTS: In this paper, a novel supervised feature scoring method named ClearF is proposed. The proposed method is suitable for continuous-valued data, which is similar to the principle of feature selection using mutual information, with the added advantage of a reduced computation time. The proposed score calculation is motivated by the association between the reconstruction error and the information-theoretic measurement. Our method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given multi-class datasets such as a case-control study dataset, low-dimensional embedding is first applied to each class to obtain a compressed representation of the class, and also for the entire dataset. Reconstruction is then performed to calculate the error of each feature and the final score for each feature is defined in terms of the reconstruction errors. The correlation between the information theoretic measurement and the proposed method is demonstrated using a simulation. For performance validation, we compared the classification performance of the proposed method with those of various algorithms on benchmark datasets. CONCLUSIONS: The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment was conducted on the TCGA breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.
Affiliation(s)
- Sehee Wang
- Department of Computer Engineering, Ajou University, Suwon, 16499 South Korea
| | - Hyun-Hwan Jeong
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030 USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Kyung-Ah Sohn
- Department of Computer Engineering, Ajou University, Suwon, 16499 South Korea
|
46
|
Carrasco‐Rozas A, Fernández‐Simón E, Lleixà MC, Belmonte I, Pedrosa-Hernandez I, Montiel-Morillo E, Nuñez‐Peralta C, Llauger Rossello J, Segovia S, De Luna N, Suarez‐Calvet X, Illa I, Díaz‐Manera J, Gallardo E. Identification of serum microRNAs as potential biomarkers in Pompe disease. Ann Clin Transl Neurol 2019; 6:1214-1224. [PMID: 31353854 PMCID: PMC6649638 DOI: 10.1002/acn3.50800] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/14/2019] [Revised: 05/03/2019] [Accepted: 05/04/2019] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE: To analyze the microRNA profile in serum of patients with Adult Onset Pompe disease (AOPD). METHODS: We analyzed the expression of 185 microRNAs in serum of 15 AOPD patients and five controls using microRNA PCR Panels. The expression levels of microRNAs that were deregulated were further studied in 35 AOPD patients and 10 controls using Real-Time PCR. Additionally, the skeletal muscle expression of microRNAs that showed significantly increased levels in serum samples was also studied. Correlations between microRNA serum levels and muscle function tests, spirometry, and quantitative muscle MRI were assessed (these data correspond to the study NCT01914536 at ClinicalTrials.gov). RESULTS: We identified 14 microRNAs that showed different expression levels in serum samples of AOPD patients compared to controls. We validated these results in a larger cohort of patients and found increased levels of three microRNAs, the so-called dystromirs: miR-1-3p, miR-133a-3p, and miR-206. These microRNAs are involved in muscle regeneration, and their expression was increased in patients' muscle biopsies. Significant correlations between microRNA levels and muscle function tests were found. INTERPRETATION: Serum expression levels of dystromirs may represent additional biomarkers for the follow-up of AOPD patients.
Affiliation(s)
- Ana Carrasco‐Rozas
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Esther Fernández‐Simón
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Maria Cinta Lleixà
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Izaskun Belmonte
- Rehabilitation and Physiotherapy Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Irene Pedrosa-Hernandez
- Rehabilitation and Physiotherapy Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Elena Montiel-Morillo
- Rehabilitation and Physiotherapy Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Claudia Nuñez‐Peralta
- Radiology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Jaume Llauger Rossello
- Radiology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Sonia Segovia
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras, Valencia, Spain
| | - Noemí De Luna
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras, Valencia, Spain
| | - Xavier Suarez‐Calvet
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras, Valencia, Spain
| | - Isabel Illa
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras, Valencia, Spain
| | - Jordi Díaz‐Manera
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras, Valencia, Spain
| | - Eduard Gallardo
- Neuromuscular Disorders Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Universitat Autònoma de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Enfermedades Raras, Valencia, Spain
|
47
|
Comparison of Bi- and Tri-Linear PLS Models for Variable Selection in Metabolomic Time-Series Experiments. Metabolites 2019; 9:metabo9050092. [PMID: 31075899 PMCID: PMC6571821 DOI: 10.3390/metabo9050092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/03/2019] [Revised: 05/07/2019] [Accepted: 05/08/2019] [Indexed: 11/25/2022] Open
Abstract
Metabolomic studies with a time-series design are widely used for discovery and validation of biomarkers. In such studies, changes of metabolic profiles over time under different conditions (e.g., control and intervention) are compared, and metabolites responding differently between the conditions are identified as putative biomarkers. To incorporate time-series information into the variable (biomarker) selection in partial least squares regression (PLS) models, we created PLS models with different combinations of bilinear/trilinear X and group/time response dummy Y. In total, five PLS models were evaluated on two real datasets, and also on simulated datasets with varying characteristics (number of subjects, number of variables, inter-individual variability, intra-individual variability and number of time points). Variables showing specific temporal patterns observed visually and determined statistically were labelled as discriminating variables. Bootstrapped-VIP scores were calculated for variable selection, and the variable selection performance of the five PLS models was assessed based on their capacity to correctly select the discriminating variables. The results showed that the bilinear PLS model with group × time response as dummy Y provided the highest recall (true positive rate) of 83–95% with high precision, independent of most characteristics of the datasets. Trilinear PLS models tend to select a small number of variables with high precision but a relatively high false negative rate (lower power). They are also less affected by noise than bilinear PLS models. In datasets with high inter-individual variability, bilinear PLS models tend to provide higher recall while trilinear models tend to provide higher precision. Overall, we recommend bilinear PLS with group × time response Y for variable selection applications in metabolomics intervention time series studies.
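The abstract above judges each model by recall (true positive rate) and precision of the selected variables against the known discriminating variables. A minimal sketch of that bookkeeping follows; the function name `selection_metrics` and the metabolite labels are hypothetical, and the two example selections are invented to illustrate the recall-versus-precision trade-off, not taken from the paper's data.

```python
def selection_metrics(selected, discriminating):
    """Return (precision, recall) of a selected variable set.

    precision = true positives / number of selected variables
    recall    = true positives / number of truly discriminating variables
    """
    selected = set(selected)
    discriminating = set(discriminating)
    true_positives = len(selected & discriminating)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(discriminating) if discriminating else 0.0
    return precision, recall


# Hypothetical ground truth: four discriminating metabolites.
truth = {"m1", "m2", "m3", "m4"}

# A liberal selection finds every true variable plus one false positive.
broad = {"m1", "m2", "m3", "m4", "m9"}
# A conservative selection makes no false calls but misses half the truth.
narrow = {"m1", "m2"}

broad_metrics = selection_metrics(broad, truth)
narrow_metrics = selection_metrics(narrow, truth)
print(broad_metrics, narrow_metrics)
```

The liberal selection trades some precision for full recall, while the conservative one does the reverse, which is the pattern the abstract reports for bilinear versus trilinear models in high-variability datasets.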
|
48
|
Lualdi M, Fasano M. Statistical analysis of proteomics data: A review on feature selection. J Proteomics 2019; 198:18-26. [DOI: 10.1016/j.jprot.2018.12.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 10/12/2018] [Revised: 11/27/2018] [Accepted: 12/05/2018] [Indexed: 12/19/2022]
|
49
|
Mnatsakanyan R, Shema G, Basik M, Batist G, Borchers CH, Sickmann A, Zahedi RP. Detecting post-translational modification signatures as potential biomarkers in clinical mass spectrometry. Expert Rev Proteomics 2019; 15:515-535. [PMID: 29893147 DOI: 10.1080/14789450.2018.1483340] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Indexed: 02/08/2023]
Abstract
INTRODUCTION: Numerous diseases are caused by changes in post-translational modifications (PTMs). Therefore, the number of clinical proteomics studies that include the analysis of PTMs is increasing. Combining complementary information (for example, changes in protein abundance and PTM levels) with the genome and transcriptome (proteogenomics) holds great promise for discovering important drivers and markers of disease, as variations in copy number, expression levels, or mutations without spatial/functional/isoform information are often insufficient or even misleading. Areas covered: We discuss general considerations, requirements, pitfalls, and future perspectives in applying PTM-centric proteomics to clinical samples. This includes samples obtained from a human subject, for instance (i) bodily fluids such as plasma, urine, or cerebrospinal fluid, (ii) primary cells such as reproductive cells and blood cells, and (iii) tissue samples/biopsies. Expert commentary: PTM-centric discovery proteomics can substantially contribute to the understanding of disease mechanisms by identifying signatures with potential diagnostic or even therapeutic relevance but may require coordinated efforts of interdisciplinary and eventually multi-national consortia, such as those initiated in the cancer moonshot program. Additionally, robust and standardized mass spectrometry (MS) assays, particularly targeted MS, MALDI imaging, and immuno-MALDI, may be transferred to the clinic to improve patient stratification for precision medicine and to guide therapies.
Affiliation(s)
- Ruzanna Mnatsakanyan
- Protein Dynamics, Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, 44227, Germany
| | - Gerta Shema
- Protein Dynamics, Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, 44227, Germany
| | - Mark Basik
- Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, Montreal, Quebec H4A 3T2, Canada
| | - Gerald Batist
- Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, Montreal, Quebec H4A 3T2, Canada
| | - Christoph H Borchers
- Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, Montreal, Quebec H4A 3T2, Canada; University of Victoria-Genome British Columbia Proteomics Centre, University of Victoria, Victoria, British Columbia V8Z 7X8, Canada; Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, V8P 5C2, Canada; Segal Cancer Proteomics Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec H3T 1E2, Canada
| | - Albert Sickmann
- Protein Dynamics, Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, 44227, Germany; Medizinische Fakultät, Medizinische Proteom-Center (MPC), Ruhr-Universität Bochum, 44801 Bochum, Germany; Department of Chemistry, College of Physical Sciences, University of Aberdeen, Aberdeen AB24 3FX, Scotland, United Kingdom
| | - René P Zahedi
- Protein Dynamics, Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, 44227, Germany; Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, Montreal, Quebec H4A 3T2, Canada; Segal Cancer Proteomics Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec H3T 1E2, Canada
|
50
|
Shi L, Westerhuis JA, Rosén J, Landberg R, Brunius C. Variable selection and validation in multivariate modelling. Bioinformatics 2019; 35:972-980. [PMID: 30165467 PMCID: PMC6419897 DOI: 10.1093/bioinformatics/bty710] [Citation(s) in RCA: 133] [Impact Index Per Article: 22.2] [Received: 11/24/2017] [Revised: 07/04/2018] [Accepted: 08/24/2018] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION: Validation of variable selection and predictive performance is crucial in construction of robust multivariate models that generalize well, minimize overfitting and facilitate interpretation of results. Inappropriate variable selection leads instead to selection bias, thereby increasing the risk of model overfitting and false positive discoveries. Although several algorithms exist to identify a minimal set of most informative variables (i.e. the minimal-optimal problem), few can select all variables related to the research question (i.e. the all-relevant problem). Robust algorithms combining identification of both minimal-optimal and all-relevant variables with proper cross-validation are urgently needed. RESULTS: We developed the MUVR algorithm to improve predictive performance and minimize overfitting and false positives in multivariate analysis. In the MUVR algorithm, minimal variable selection is achieved by performing recursive variable elimination in a repeated double cross-validation (rdCV) procedure. The algorithm supports partial least squares and random forest modelling, and simultaneously identifies minimal-optimal and all-relevant variable sets for regression, classification and multilevel analyses. Using three authentic omics datasets, MUVR yielded parsimonious models with minimal overfitting and improved model performance compared with state-of-the-art rdCV. Moreover, MUVR showed advantages over other variable selection algorithms, i.e. Boruta and VSURF, including simultaneous variable selection and validation scheme and wider applicability. AVAILABILITY AND IMPLEMENTATION: Algorithms, data, scripts and tutorial are open source and available as an R package ('MUVR') at https://gitlab.com/CarlBrunius/MUVR.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
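The repeated double cross-validation (rdCV) procedure named in the abstract above nests an inner tuning loop inside an outer testing loop and repeats the whole partition several times. The sketch below shows only that fold structure, under assumed fold counts; it is not the MUVR package's API, it omits the recursive variable elimination and model fitting that MUVR runs inside the inner loop, and the names `k_fold` and `repeated_double_cv` are hypothetical.

```python
import random


def k_fold(indices, k):
    """Split a list of sample indices into k near-equal folds (round-robin)."""
    return [indices[i::k] for i in range(k)]


def repeated_double_cv(n_samples, n_repeats=2, n_outer=3, n_inner=2, seed=0):
    """Yield (repeat, outer_test, inner_train, inner_validation) index lists."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        outer_folds = k_fold(indices, n_outer)
        for o, outer_test in enumerate(outer_folds):
            # Samples outside the outer test fold are the only ones the
            # inner loop (tuning, variable ranking) is allowed to see.
            calibration = [
                i for j, fold in enumerate(outer_folds) if j != o for i in fold
            ]
            for inner_val in k_fold(calibration, n_inner):
                inner_train = [i for i in calibration if i not in inner_val]
                yield rep, outer_test, inner_train, inner_val


splits = list(repeated_double_cv(n_samples=12))
print(len(splits))
```

Keeping the outer test fold out of every inner split is what lets the outer-loop error serve as an estimate of predictive performance that is not biased by the variable selection itself.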
Affiliation(s)
- Lin Shi
- Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala SE-750 07, Sweden
- Department of Biology and Biological Engineering, Food and Nutrition Science, Chalmers University of Technology, Gothenburg SE-412 96, Sweden
| | - Johan A Westerhuis
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam XH, The Netherlands
- Metabolomics Center, North-West University, X6001, Potchefstroom, South Africa
| | - Johan Rosén
- Swedish National Food Agency, Uppsala, Sweden
| | - Rikard Landberg
- Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala SE-750 07, Sweden
- Department of Biology and Biological Engineering, Food and Nutrition Science, Chalmers University of Technology, Gothenburg SE-412 96, Sweden
| | - Carl Brunius
- Department of Biology and Biological Engineering, Food and Nutrition Science, Chalmers University of Technology, Gothenburg SE-412 96, Sweden
|