1
Zhang C, Chen X, Wang S, Hu J, Wang C, Liu X. Using CatBoost algorithm to identify middle-aged and elderly depression, national health and nutrition examination survey 2011-2018. Psychiatry Res 2021; 306:114261. [PMID: 34781111 DOI: 10.1016/j.psychres.2021.114261] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/04/2021] [Revised: 10/10/2021] [Accepted: 10/30/2021] [Indexed: 12/16/2022]
Abstract
Depression is one of the most common mental health problems in middle-aged and elderly people. Establishing a risk factor-based depression risk assessment model is conducive to early detection and early treatment of groups at high risk of depression. Five machine learning models (logistic regression (LR), back propagation (BP), random forest (RF), support vector machines (SVM), and category boosting (CatBoost)) were used to evaluate depression among 8374 middle-aged people and 4636 elderly people in the NHANES database from 2011 to 2018. In the 2011-2018 cycle, the estimated prevalence of depression was 8.97% in the middle-aged participants and 8.02% in the elderly participants. Among both the middle-aged and the elderly participants, CatBoost was the best model for identifying depression, with the highest area under the receiver operating characteristic curve (AUC). The LR and SVM models ranked second, while the BP and RF models performed slightly worse. The primary influencing factor of depression in middle-aged males was alanine aminotransferase. All five machine learning models could identify the occurrence of depression in the NHANES data set from the social demographic, lifestyle, laboratory and other data of middle-aged and elderly people, and among the five models, CatBoost performed best.
Affiliation(s)
- Chenyang Zhang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin 130021, China
- Xiaofei Chen
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin 130021, China
- Song Wang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin 130021, China
- Junjun Hu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin 130021, China
- Chunpeng Wang
  - School of Mathematics and Statistics, Northeast Normal University, Changchun 130000, China
- Xin Liu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin 130021, China
2
Arabameri A, Asemani D, Teymourpour P. Detection of Colorectal Carcinoma Based on Microbiota Analysis Using Generalized Regression Neural Networks and Nonlinear Feature Selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:547-557. [PMID: 30222584 DOI: 10.1109/tcbb.2018.2870124] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
To obtain a screening tool for colorectal cancer (CRC) based on gut microbiota, we seek here to identify an optimal classifier for CRC detection as well as a novel nonlinear feature selection method for determining the most discriminative microbial species. In this study, the intestinal microflora in feces of 141 patients were modeled using general regression neural networks (GRNNs) combined with the proposed feature selection method. The proposed model led to slightly higher accuracy (AUC = 0.911) than previous studies. The results show that Clostridium scindens and Bifidobacterium angulatum are indicators of healthy gut flora and that CRC reduces these bacterial species. In addition, Fusobacterium gonidiaformans was found to be closely correlated with CRC. The occurrence of colorectal adenoma was not sufficiently discriminatory based on fecal microbiota, implying that the change of colonic flora happens in the advanced phase of CRC development rather than at the initial adenoma stage. When the proposed model was integrated with the fecal occult blood test (FOBT), the CRC detection accuracy remained nearly unchanged (AUC = 0.915). The performance of the proposed method was validated using independent cohorts from America and Austria. Our results suggest that the proposed feature selection method combined with GRNN is potentially an accurate method for CRC detection.
3
Meinke P, Hintze S, Limmer S, Schoser B. Myotonic Dystrophy-A Progeroid Disease? Front Neurol 2018; 9:601. [PMID: 30140252 PMCID: PMC6095001 DOI: 10.3389/fneur.2018.00601] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 03/19/2018] [Accepted: 07/06/2018] [Indexed: 12/27/2022] [Open Access]
Abstract
Myotonic dystrophies (DM) are slowly progressing multisystemic disorders caused by repeat expansions in the DMPK or CNBP genes. The multisystemic involvement in DM patients often gives the appearance of accelerated aging. This is partly due to visible features such as cataracts, muscle weakness, and frontal baldness, but there are also less obvious features like cardiac arrhythmia, diabetes or hypogammaglobulinemia. These aging features suggest the hypothesis that DM could be a segmental progeroid disease. To identify the molecular cause of this characteristic appearance of accelerated aging, we compare clinical features of DM to “typical” segmental progeroid disorders caused by mutations in DNA repair or nuclear envelope proteins. Furthermore, we characterize whether this premature aging effect is also reflected at the cellular level in DM and investigate overlaps with “classical” progeroid disorders. To investigate the molecular similarities at the cellular level, we use primary DM and control cell lines. This analysis reveals many similarities to progeroid syndromes linked to the nuclear envelope. Our comparison on both clinical and molecular levels argues for qualification of DM as a segmental progeroid disorder.
Affiliation(s)
- Peter Meinke
  - Friedrich-Baur-Institute at the Department of Neurology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Stefan Hintze
  - Friedrich-Baur-Institute at the Department of Neurology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Sarah Limmer
  - Friedrich-Baur-Institute at the Department of Neurology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Benedikt Schoser
  - Friedrich-Baur-Institute at the Department of Neurology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
4
Santoro M, Fontana L, Maiorca F, Centofanti F, Massa R, Silvestri G, Novelli G, Botta A. Expanded [CCTG]n repetitions are not associated with abnormal methylation at the CNBP locus in myotonic dystrophy type 2 (DM2) patients. Biochim Biophys Acta Mol Basis Dis 2018; 1864:917-924. [DOI: 10.1016/j.bbadis.2017.12.037] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/21/2017] [Revised: 12/21/2017] [Accepted: 12/28/2017] [Indexed: 01/10/2023]
5
Ai L, Tian H, Chen Z, Chen H, Xu J, Fang JY. Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer. Oncotarget 2017; 8:9546-9556. [PMID: 28061434 PMCID: PMC5354752 DOI: 10.18632/oncotarget.14488] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 10/27/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022] [Open Access]
Abstract
Predicting colorectal cancer (CRC) based on fecal microbiota presents a promising method for non-invasive screening of CRC, but the optimization of classification models remains an unaddressed question. The purpose of this study was to systematically evaluate the effectiveness of different supervised machine-learning models in predicting CRC in two independent eastern and western populations. The structures of intestinal microflora in feces in a Chinese population (N = 141) were determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC based on fecal microbiota operational taxonomic units (OTUs). As a result, Bayes Net and Random Forest displayed higher accuracies than other algorithms in both populations, although Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and the combination of both approaches further improved the prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC = 0.994). Taken together, our results suggest that the Bayes Net classification model combined with unclassified OTUs may present an accurate method for predicting CRC based on the composition of gut microbiota.
Affiliation(s)
- Luoyan Ai
  - Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Haiying Tian
  - Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Zhaofei Chen
  - Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Huimin Chen
  - Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Jie Xu
  - Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Jing-Yuan Fang
  - Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
6
Ghorbani M, Themis M, Payne A. Genome wide classification and characterisation of CpG sites in cancer and normal cells. Comput Biol Med 2015; 68:57-66. [PMID: 26615449 DOI: 10.1016/j.compbiomed.2015.09.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/11/2015] [Revised: 09/16/2015] [Accepted: 09/29/2015] [Indexed: 11/30/2022]
Abstract
This study identifies common methylation patterns across different cancer types in an effort to identify common molecular events in diverse types of cancer cells, and provides evidence that the sequence surrounding a CpG influences its susceptibility to aberrant methylation. Using all the freely available microarray data, CpG sites throughout the genome were divided into four classes: sites that become either hypo- or hyper-methylated in a variety of cancers (HypoCancer and HyperCancer classes) and those found in a constant hypo-methylated (Never methylated class) or hyper-methylated (Always methylated class) state in both normal and cancer cells. Our data show that most CpG sites included in the HumanMethylation450K microarray remain unmethylated in normal and cancerous cells; however, certain sites in all the cancers investigated become specifically modified. More detailed analysis of the sites revealed that the majority of those in the Never methylated class were in CpG islands, whereas those in the HyperCancer class were mostly associated with miRNA coding regions. The sites in the Hypermethylated class are associated with genes involved in initiating or maintaining the cancerous state, being enriched for processes involved in apoptosis, and with transcription factors predicted to bind to these genes linked to apoptosis and tumourigenesis (notably including E2F). Further, we show that more LINE elements are associated with the HypoCancer class and more Alu repeats are associated with the HyperCancer class. Motifs that distinguish the classes based on the surrounding DNA sequence alone were identified, as were DNA sequences that could render sites more prone to aberrant methylation in cancer cells. This provides evidence that the sequence surrounding a CpG site has an influence on whether a site is hypo- or hyper-methylated.
Affiliation(s)
- Mohammadmersad Ghorbani
  - Department of Computer Science, Brunel University, Uxbridge, Middlesex UB8 3PH, UK; Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute
- Michael Themis
  - Department of Biosciences, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
- Annette Payne
  - Department of Computer Science, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
7
Mateos-Aierdi AJ, Goicoechea M, Aiastui A, Fernández-Torrón R, Garcia-Puga M, Matheu A, López de Munain A. Muscle wasting in myotonic dystrophies: a model of premature aging. Front Aging Neurosci 2015. [PMID: 26217220 PMCID: PMC4496580 DOI: 10.3389/fnagi.2015.00125] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Indexed: 01/02/2023] [Open Access]
Abstract
Myotonic dystrophy type 1 (DM1 or Steinert’s disease) and type 2 (DM2) are multisystem disorders of genetic origin. Progressive muscular weakness, atrophy and myotonia are the most prominent neuromuscular features of these diseases, while other clinical manifestations such as cardiomyopathy, insulin resistance and cataracts are also common. From a clinical perspective, most DM symptoms are interpreted as a result of accelerated aging (cataracts, muscular weakness and atrophy, cognitive decline, metabolic dysfunction, etc.), including an increased risk of developing tumors. From this point of view, DM1 could be described as a progeroid syndrome since a notable age-dependent dysfunction of all systems occurs. The underlying molecular disorder in DM1 consists of a pathological (CTG) triplet expansion in the 3′ untranslated region (UTR) of the Dystrophia Myotonica Protein Kinase (DMPK) gene, whereas (CCTG)n repeats in the first intron of the Cellular Nucleic acid Binding Protein/Zinc Finger Protein 9 (CNBP/ZNF9) gene cause DM2. The expansions are transcribed into (CUG)n and (CCUG)n-containing RNA, respectively, which form secondary structures and sequester RNA-binding proteins, such as the splicing factor muscleblind-like protein (MBNL), forming nuclear aggregates known as foci. Other splicing factors, such as CUGBP, are also disrupted, leading to a spliceopathy of a large number of downstream genes linked to the clinical features of these diseases. Skeletal muscle regeneration relies on muscle progenitor cells, known as satellite cells, which are activated after muscle damage, and which proliferate and differentiate into muscle cells, thus regenerating the damaged tissue. Satellite cell dysfunction seems to be a common feature of both age-dependent muscle degeneration (sarcopenia) and muscle wasting in DM and other muscle degenerative diseases.
This review aims to describe the cellular, molecular and macrostructural processes involved in the muscular degeneration seen in DM patients, highlighting the similarities found with muscle aging.
Affiliation(s)
- Alba Judith Mateos-Aierdi
  - Neuroscience Area, Biodonostia Health Research Institute, San Sebastián, Spain; CIBERNED, Instituto Carlos III, Ministerio de Economía y Competitividad, Madrid, Spain
- Maria Goicoechea
  - Neuroscience Area, Biodonostia Health Research Institute, San Sebastián, Spain; CIBERNED, Instituto Carlos III, Ministerio de Economía y Competitividad, Madrid, Spain
- Ana Aiastui
  - CIBERNED, Instituto Carlos III, Ministerio de Economía y Competitividad, Madrid, Spain; Cell Culture Platform, Biodonostia Health Research Institute, San Sebastián, Spain
- Roberto Fernández-Torrón
  - Neuroscience Area, Biodonostia Health Research Institute, San Sebastián, Spain; CIBERNED, Instituto Carlos III, Ministerio de Economía y Competitividad, Madrid, Spain; Department of Neurology, Hospital Universitario Donostia, San Sebastián, Spain
- Mikel Garcia-Puga
  - Oncology Area, Biodonostia Health Research Institute, San Sebastián, Spain
- Ander Matheu
  - Oncology Area, Biodonostia Health Research Institute, San Sebastián, Spain
- Adolfo López de Munain
  - Neuroscience Area, Biodonostia Health Research Institute, San Sebastián, Spain; CIBERNED, Instituto Carlos III, Ministerio de Economía y Competitividad, Madrid, Spain; Department of Neurology, Hospital Universitario Donostia, San Sebastián, Spain; Department of Neuroscience, Universidad del País Vasco UPV-EHU, San Sebastián, Spain
8
Azé J, Sola C, Zhang J, Lafosse-Marin F, Yasmin M, Siddiqui R, Kremer K, van Soolingen D, Refrégier G. Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm. PLoS One 2015; 10:e0130912. [PMID: 26154264 PMCID: PMC4496040 DOI: 10.1371/journal.pone.0130912] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 02/17/2015] [Accepted: 05/25/2015] [Indexed: 11/18/2022] [Open Access]
Abstract
Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing and in the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity either on CRISPRs (also known as spoligotypes) or on MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all the isolates from the database were subsequently implemented into an ensemble learning approach based on the Machine Learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, it was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, efficient on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface “TBminer.” Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance.
Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensual taxonomy for human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilize it.
Affiliation(s)
- Jérôme Azé
  - LIRMM UM CNRS, UMR 5506, 860 rue de St Priest, 34095 Montpellier cedex 5, France
- Christophe Sola
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, rue Gregor Mendel, Bât 400, 91405 Orsay cedex, France
- Jian Zhang
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, rue Gregor Mendel, Bât 400, 91405 Orsay cedex, France
- Florian Lafosse-Marin
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, rue Gregor Mendel, Bât 400, 91405 Orsay cedex, France
- Memona Yasmin
  - Pakistan Institute for Engineering and Applied Sciences (PIEAS), Lehtrar Road, Nilore, Islamabad, Pakistan
  - Health Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), P.O. Box # 577, Jhang Road, Faisalabad, Pakistan
- Rubina Siddiqui
  - Health Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), P.O. Box # 577, Jhang Road, Faisalabad, Pakistan
- Kristin Kremer
  - National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA Bilthoven, The Netherlands
- Dick van Soolingen
  - National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA Bilthoven, The Netherlands
  - Department of Pulmonary Diseases and Department of Microbiology, Radboud University Nijmegen Medical Centre, University Lung Centre Dekkerswald, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
- Guislaine Refrégier
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, rue Gregor Mendel, Bât 400, 91405 Orsay cedex, France
9
Padeken J, Zeller P, Gasser SM. Repeat DNA in genome organization and stability. Curr Opin Genet Dev 2015; 31:12-9. [PMID: 25917896 DOI: 10.1016/j.gde.2015.03.009] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Received: 02/11/2015] [Revised: 03/23/2015] [Accepted: 03/24/2015] [Indexed: 01/03/2023]
Abstract
Eukaryotic genomes contain millions of copies of repetitive elements (REs). Although the euchromatic parts of most genomes are clearly annotated, the repetitive/heterochromatic parts are poorly defined. It is estimated that between 50 and 70% of the human genome is composed of REs. Despite this, we know surprisingly little about the physiological relevance, molecular regulation and composition of these regions. This primarily reflects the difficulty that REs pose for PCR-based assays, and their poor mappability in next generation sequencing experiments. Here we first summarize the nature and classification of REs and then examine how this has been used in recent years to broaden our understanding of the mechanisms that keep the repetitive regions of our genomes silent and stable.
Affiliation(s)
- Jan Padeken
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH-4058 Basel, Switzerland
- Peter Zeller
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH-4058 Basel, Switzerland; Faculty of Natural Sciences, University of Basel, Basel, Switzerland
- Susan M Gasser
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, CH-4058 Basel, Switzerland; Faculty of Natural Sciences, University of Basel, Basel, Switzerland