2
Silverman JD, Roche K, Mukherjee S, David LA. Naught all zeros in sequence count data are the same. Comput Struct Biotechnol J 2020; 18:2789-2798. [PMID: 33101615 PMCID: PMC7568192 DOI: 10.1016/j.csbj.2020.09.014] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 06/26/2020] [Revised: 09/09/2020] [Accepted: 09/10/2020] [Indexed: 12/21/2022]
Abstract
Genomic studies feature multivariate count data from high-throughput DNA sequencing experiments, which often contain many zero values. These zeros can cause artifacts for statistical analyses and multiple modeling approaches have been developed in response. Here, we apply different zero-handling models to gene-expression and microbiome datasets and show models can disagree substantially in terms of identifying the most differentially expressed sequences. Next, to rationally examine how different zero handling models behave, we developed a conceptual framework outlining four types of processes that may give rise to zero values in sequence count data. Last, we performed simulations to test how zero handling models behave in the presence of these different zero generating processes. Our simulations showed that simple count models are sufficient across multiple processes, even when the true underlying process is unknown. On the other hand, a common zero handling technique known as "zero-inflation" was only suitable under a zero generating process associated with an unlikely set of biological and experimental conditions. In concert, our work here suggests several specific guidelines for developing and choosing state-of-the-art models for analyzing sparse sequence count data.
Affiliation(s)
- Justin D Silverman
- College of Information Science and Technology, Pennsylvania State University, State College, PA 16802, United States
- Institute for Computational and Data Science, Pennsylvania State University, State College, PA 16802, United States
- Department of Medicine, Pennsylvania State University, Hershey, PA 17033, United States
- Kimberly Roche
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, United States
- Sayan Mukherjee
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, United States
- Departments of Statistical Science, Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University, Durham, NC 27708, United States
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, United States
- Lawrence A David
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, United States
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, United States
- Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27708, United States
3
Li F, Shen Y, Lv D, Lin J, Liu B, He F, Wang Z. A Bayesian classification model for discriminating common infectious diseases in Zhejiang province, China. Medicine (Baltimore) 2020; 99:e19218. [PMID: 32080115 PMCID: PMC7034623 DOI: 10.1097/md.0000000000019218] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
Abstract
To develop a classification model for accurately discriminating common infectious diseases in Zhejiang province, China. Symptoms and signs, abnormal lab test results, epidemiological features, as well as the incidence rates were treated as predictors, and were collected from the published literature and a national surveillance system of infectious disease. A classification model was established using naïve Bayesian classifier. Dataset from historical outbreaks was applied for model validation, while sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC) and M-index were presented. A total of 146 predictors were included in the classification model, for discriminating 25 common infectious diseases. The sensitivity ranged from 44.44% for hepatitis E to 96.67% for measles. The specificity varied from 96.36% for dengue fever to 100% for 5 diseases. The median of total accuracy was 97.41% (range: 93.85%-99.04%). The AUCs exceeded 0.98 in 11 of 12 diseases, except in dengue fever (0.613). The M-index was 0.960 (95% CI 0.941-0.978). A novel classification model was constructed based on Bayesian approach to discriminate common infectious diseases in Zhejiang province, China. After entering symptoms and signs, abnormal lab test results, epidemiological features and city of disease origin, an output list of possible diseases ranked according to the calculated probabilities can be provided. The discrimination performance was reasonably good, making it useful in epidemiological applications.
Affiliation(s)
- Fudong Li
- Zhejiang Provincial Center for Disease Control and Prevention
- Yi Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Zhejiang University
- Duo Lv
- The First Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang Province, People's Republic of China
- Junfen Lin
- Zhejiang Provincial Center for Disease Control and Prevention
- Biyao Liu
- Zhejiang Provincial Center for Disease Control and Prevention
- Fan He
- Zhejiang Provincial Center for Disease Control and Prevention
- Zhen Wang
- Zhejiang Provincial Center for Disease Control and Prevention
4
Hollister EB, Oezguen N, Chumpitazi BP, Luna RA, Weidler EM, Rubio-Gonzales M, Dahdouli M, Cope JL, Mistretta TA, Raza S, Metcalf GA, Muzny DM, Gibbs RA, Petrosino JF, Heitkemper M, Savidge TC, Shulman RJ, Versalovic J. Leveraging Human Microbiome Features to Diagnose and Stratify Children with Irritable Bowel Syndrome. J Mol Diagn 2019; 21:449-461. [PMID: 31005411 PMCID: PMC6504675 DOI: 10.1016/j.jmoldx.2019.01.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 06/01/2018] [Revised: 10/30/2018] [Accepted: 01/06/2019] [Indexed: 02/06/2023]
Abstract
Accurate diagnosis and stratification of children with irritable bowel syndrome (IBS) remain challenging. Given the central role of recurrent abdominal pain in IBS, we evaluated the relationships of pediatric IBS and abdominal pain with intestinal microbes and fecal metabolites using a comprehensive clinical characterization and multiomics strategy. Using rigorous clinical phenotyping, we identified preadolescent children (aged 7 to 12 years) with Rome III IBS (n = 23) and healthy controls (n = 22) and characterized their fecal microbial communities using whole-genome shotgun metagenomics and global unbiased fecal metabolomic profiling. Correlation-based approaches and machine learning algorithms identified associations between microbes, metabolites, and abdominal pain. IBS cases differed from controls with respect to key bacterial taxa (eg, Flavonifractor plautii and Lachnospiraceae bacterium 7_1_58FAA), metagenomic functions (eg, carbohydrate metabolism and amino acid metabolism), and higher-order metabolites (eg, secondary bile acids, sterols, and steroid-like compounds). Significant associations between abdominal pain frequency and severity and intestinal microbial features were identified. A random forest classifier built on metagenomic and metabolic markers successfully distinguished IBS cases from controls (area under the curve, 0.93). Leveraging multiple lines of evidence, intestinal microbes, genes/pathways, and metabolites were associated with IBS, and these features were capable of distinguishing children with IBS from healthy children. These multi-omics features, and their links to childhood IBS coupled with nutritional interventions, may lead to new microbiome-guided diagnostic and therapeutic strategies.
Affiliation(s)
- Emily B Hollister
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Diversigen, Inc., Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Numan Oezguen
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Bruno P Chumpitazi
- Department of Pediatrics, Baylor College of Medicine, Houston, Texas; Section of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, Texas Children's Hospital, Houston, Texas
- Ruth Ann Luna
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Erica M Weidler
- Department of Pediatrics, Baylor College of Medicine, Houston, Texas; Section of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, Texas Children's Hospital, Houston, Texas; Children's Nutrition Research Center, Houston, Texas
- Michelle Rubio-Gonzales
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Mahmoud Dahdouli
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Julia L Cope
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Toni-Ann Mistretta
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Sabeen Raza
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Joseph F Petrosino
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas; Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, Houston, Texas
- Margaret Heitkemper
- Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle, Washington
- Tor C Savidge
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas
- Robert J Shulman
- Department of Pediatrics, Baylor College of Medicine, Houston, Texas; Section of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, Texas Children's Hospital, Houston, Texas; Children's Nutrition Research Center, Houston, Texas
- James Versalovic
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, Texas; Department of Pathology, Texas Children's Hospital, Houston, Texas; Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas