1. Bassetti D, Pospíšil L, Horenko I. On Entropic Learning from Noisy Time Series in the Small Data Regime. Entropy (Basel, Switzerland) 2024; 26:553. [PMID: 39056915] [PMCID: PMC11276242] [DOI: 10.3390/e26070553] [Received: 05/08/2024] [Revised: 06/24/2024] [Accepted: 06/25/2024]
Abstract
In this work, we present a novel methodology for the supervised classification of time-ordered noisy data; we call this methodology Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It is an extension of entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove the conditions for the existence and uniqueness of the learning problem solution and propose a one-shot numerical learning algorithm that, in the leading order, scales linearly in dimension. We show how this technique can be used for the computationally scalable identification of persistent (metastable) regime affiliations and regime switches from high-dimensional non-stationary and noisy time series, i.e., when the size of the data statistics is small compared to their dimensionality and when the noise variance is larger than the variance in the signal. We demonstrate its performance on a set of toy learning problems, comparing eSPA-Markov to state-of-the-art techniques, including deep learning and random forests. Finally, we show how this technique can be used for the analysis of noisy time series from DNA and RNA Nanopore sequencing.
Affiliation(s)
- Davide Bassetti
- Faculty of Mathematics, RPTU Kaiserslautern-Landau, Gottlieb-Daimler-Str. 48, 67663 Kaiserslautern, Germany
- Lukáš Pospíšil
- Department of Mathematics, Faculty of Civil Engineering, VŠB-TUO, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Illia Horenko
- Faculty of Mathematics, RPTU Kaiserslautern-Landau, Gottlieb-Daimler-Str. 48, 67663 Kaiserslautern, Germany
2. Vecchi E, Bassetti D, Graziato F, Pospíšil L, Horenko I. Gauge-Optimal Approximate Learning for Small Data Classification. Neural Comput 2024; 36:1198-1227. [PMID: 38669692] [DOI: 10.1162/neco_a_01664] [Received: 11/01/2023] [Accepted: 01/16/2024]
Abstract
Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists of piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that presents, under the assumption of a discrete segmentation of the feature space, a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
Affiliation(s)
- Edoardo Vecchi
- Università della Svizzera Italiana, Faculty of Informatics, Institute of Computing, 6962 Lugano, Switzerland
- Davide Bassetti
- Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
- Lukáš Pospíšil
- VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Illia Horenko
- Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
3.
Abstract
Regression learning is one of the long-standing problems in statistics, machine learning, and deep learning (DL). We show that writing this problem as a probabilistic expectation over (unknown) feature probabilities, thus increasing the number of unknown parameters and seemingly making the problem more complex, actually leads to its simplification and allows incorporating the physical principle of entropy maximization. It helps decompose a very general setting of this learning problem (including discretization, feature selection, and learning multiple piecewise-linear regressions) into an iterative sequence of simple substeps, which are either analytically solvable or cheaply computable through an efficient second-order numerical solver with a sublinear cost scaling. This leads to the computationally cheap and robust non-DL second-order Sparse Probabilistic Approximation for Regression Task Analysis (SPARTAn) algorithm, which can be efficiently applied to problems with millions of feature dimensions on a commodity laptop, where state-of-the-art learning tools would require supercomputers. SPARTAn is compared to a range of commonly used regression learning tools on synthetic problems and on the prediction of the El Niño Southern Oscillation, the dominant interannual mode of tropical climate variability. The obtained SPARTAn learners provide more predictive, sparse, and physically explainable data descriptions, clearly discerning the important role of ocean temperature variability at the thermocline in the equatorial Pacific. SPARTAn provides an easily interpretable description of the timescales by which these thermocline temperature features evolve and eventually express at the surface, thereby enabling enhanced predictability of the key drivers of the interannual climate.
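To illustrate the "multiple piecewise-linear regressions" substep mentioned in this abstract, one can fit a closed-form least-squares line per segment once a discrete segmentation is given. This is only a hypothetical one-dimensional sketch (names and data are our own; the published SPARTAn algorithm additionally learns the segmentation and the feature probabilities jointly):

```python
def fit_piecewise_linear(x, y, assign, K):
    """Fit one least-squares line per segment of a 1D signal.

    `assign[i]` gives the segment index of point i (taken as given
    here; SPARTAn would infer it). Each segment is assumed to be
    nonempty with at least two distinct x-values.
    Returns a list of (slope, intercept) pairs, one per segment.
    """
    models = []
    for k in range(K):
        xs = [xi for xi, a in zip(x, assign) if a == k]
        ys = [yi for yi, a in zip(y, assign) if a == k]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        # Closed-form simple linear regression on this segment.
        sxx = sum((xi - mx) ** 2 for xi in xs)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        slope = sxy / sxx
        models.append((slope, my - slope * mx))
    return models

# Two segments with opposite trends recover slopes +1 and -1.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
y = [0.0, 1.0, 2.0, -10.0, -11.0, -12.0]
models = fit_piecewise_linear(x, y, [0, 0, 0, 1, 1, 1], K=2)
```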
4. Horenko I, Pospíšil L, Vecchi E, Albrecht S, Gerber A, Rehbock B, Stroh A, Gerber S. Low-Cost Probabilistic 3D Denoising with Applications for Ultra-Low-Radiation Computed Tomography. J Imaging 2022; 8:156. [PMID: 35735955] [PMCID: PMC9224620] [DOI: 10.3390/jimaging8060156] [Received: 03/21/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022]
Abstract
We propose a pipeline for synthetic generation of personalized Computed Tomography (CT) images, with a radiation exposure evaluation and a lifetime attributable risk (LAR) assessment. We perform a patient-specific performance evaluation for a broad range of denoising algorithms (including the most popular deep learning denoising approaches, wavelets-based methods, methods based on Mumford-Shah denoising, etc.), focusing both on assessing the capability to reduce the patient-specific CT-induced LAR and on computational cost scalability. We introduce a parallel Probabilistic Mumford-Shah denoising model (PMS) and show that it markedly outperforms the compared common denoising methods in denoising quality and cost scaling. In particular, we show that it allows an approximately 22-fold robust patient-specific LAR reduction for infants and a 10-fold LAR reduction for adults. Using a normal laptop, the proposed algorithm for PMS allows cheap and robust (with a multiscale structural similarity index >90%) denoising of very large 2D videos and 3D images (with over 10^7 voxels) that are subject to ultra-strong noise (Gaussian and non-Gaussian) for signal-to-noise ratios far below 1.0. The code is provided for open access.
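A minimal 1D cousin of the Mumford-Shah idea referenced in this abstract is the piecewise-constant (Potts) fit, which can be solved exactly by dynamic programming. This toy sketch is ours, not the paper's 3D probabilistic model: it minimizes the squared reconstruction error plus a penalty `gamma` per jump.

```python
def potts_denoise(y, gamma):
    """Exact 1D piecewise-constant denoising: minimize
    sum_i (u_i - y_i)^2 + gamma * (number of jumps in u)
    by dynamic programming over the last segment start."""
    n = len(y)

    def sse(l, r):
        # Squared error of fitting the mean on y[l..r] inclusive.
        seg = y[l:r + 1]
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    best = [0.0] * (n + 1)   # best[i]: optimal cost of prefix y[:i]
    cut = [0] * (n + 1)      # cut[i]: start of the last segment
    for i in range(1, n + 1):
        best[i] = float("inf")
        for l in range(i):
            # A jump penalty applies to every segment except the first.
            cost = best[l] + sse(l, i - 1) + (gamma if l > 0 else 0.0)
            if cost < best[i]:
                best[i], cut[i] = cost, l

    # Backtrack and emit each segment's mean as the denoised value.
    out, i = [0.0] * n, n
    while i > 0:
        l = cut[i]
        m = sum(y[l:i]) / (i - l)
        for j in range(l, i):
            out[j] = m
        i = l
    return out

# A noisy two-level signal is recovered as two flat segments.
noisy = [0.1, -0.1, 0.0, 5.1, 4.9, 5.0]
clean = potts_denoise(noisy, gamma=1.0)
```

The quadratic number of `sse` evaluations keeps this sketch O(n^2) or worse; the scalability claims in the abstract refer to the authors' parallel 3D formulation, not to this illustration.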
Affiliation(s)
- Illia Horenko
- Faculty of Mathematics, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Correspondence: (I.H.); (S.G.)
- Lukáš Pospíšil
- Department of Mathematics, VSB Ostrava, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Edoardo Vecchi
- Institute of Computing, Faculty of Informatics, Università della Svizzera Italiana (USI), 6962 Viganello, Switzerland
- Steffen Albrecht
- Institute of Physiology, University Medical Center of the Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
- Alexander Gerber
- Institute of Occupational Medicine, Faculty of Medicine, GU Frankfurt, 60590 Frankfurt am Main, Germany
- Beate Rehbock
- Lung Radiology Center Berlin, 10627 Berlin, Germany
- Albrecht Stroh
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
- Susanne Gerber
- Institute for Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
- Correspondence: (I.H.); (S.G.)
5. Vecchi E, Pospíšil L, Albrecht S, O'Kane TJ, Horenko I. eSPA+: Scalable Entropy-Optimal Machine Learning Classification for Small Data Problems. Neural Comput 2022; 34:1220-1255. [PMID: 35344997] [DOI: 10.1162/neco_a_01490] [Received: 08/26/2021] [Accepted: 12/20/2021]
Abstract
Classification problems in the small data regime (with a small data statistics size T and a relatively large feature space dimension D) impose challenges for the common machine learning (ML) and deep learning (DL) tools. The standard learning methods from these areas tend to show a lack of robustness when applied to data sets with significantly fewer data points than dimensions and quickly reach the overfitting bound, thus leading to poor performance beyond the training set. To tackle this issue, we propose eSPA+, a significant extension of the recently formulated entropy-optimal scalable probabilistic approximation algorithm (eSPA). Specifically, we propose to change the order of the optimization steps and replace the most computationally expensive subproblem of eSPA with its closed-form solution. We prove that with these two enhancements, eSPA+ moves from the polynomial to the linear class of complexity scaling algorithms. On several small data learning benchmarks, we show that the eSPA+ algorithm achieves a many-fold speed-up with respect to eSPA and even better performance results when compared to a wide array of ML and DL tools. In particular, we benchmark eSPA+ against the standard eSPA and the main classes of common learning algorithms in the small data regime: various forms of support vector machines, random forests, and long short-term memory algorithms. In all the considered applications, the common learning methods and eSPA are markedly outperformed by eSPA+, which achieves significantly higher prediction accuracy with an orders-of-magnitude lower computational cost.
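The two ingredients named in this abstract, a discretization of the feature space into boxes and a closed-form Bayesian classification rule per box, can be caricatured in a few lines. This is a toy sketch under our own naming (`fit_espa_like`, fixed box centers), not the published eSPA+ algorithm, which learns the discretization and the rule jointly with entropic regularization:

```python
def fit_espa_like(X, y, centers, n_classes):
    """Assign each point to its nearest box center, then estimate
    Lam[k][c] ~ P(class c | box k) as empirical frequencies; this
    frequency estimate is the kind of closed-form substep the
    abstract alludes to (box centers are fixed here for brevity)."""
    K = len(centers)
    counts = [[0] * n_classes for _ in range(K)]
    for x, c in zip(X, y):
        # Nearest box by squared Euclidean distance.
        k = min(range(K),
                key=lambda j: sum((xi - ci) ** 2
                                  for xi, ci in zip(x, centers[j])))
        counts[k][c] += 1
    Lam = []
    for row in counts:
        t = sum(row)
        # Uniform fallback for empty boxes.
        Lam.append([r / t if t else 1.0 / n_classes for r in row])
    return Lam

# Two well-separated clusters yield a deterministic rule per box.
X = [(0.0,), (0.1,), (1.0,), (0.9,)]
y = [0, 0, 1, 1]
Lam = fit_espa_like(X, y, centers=[(0.0,), (1.0,)], n_classes=2)
```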
Affiliation(s)
- Edoardo Vecchi
- Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland
- Lukáš Pospíšil
- VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Steffen Albrecht
- University Medical Center of the Johannes Gutenberg-Universität, Institute of Physiology, 55128 Mainz, Germany
- Illia Horenko
- Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland
6. Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification. Proc Natl Acad Sci U S A 2022; 119:e2119659119. [PMID: 35197293] [PMCID: PMC8917346] [DOI: 10.1073/pnas.2119659119] [Accepted: 01/30/2022]
Abstract
Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS builds on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. The identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. The obtained analytic results also explain why mixtures of spherically symmetric Gaussians, used heuristically in many popular data analysis algorithms, represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of 97%±2% when predicting patient mortality after heart failure, statistically significantly outperforming the predictive performance of common learning tools on the same data.
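The closed-form solution mentioned in this abstract has a simple shape: minimizing the entropy-regularized weighted loss sum over the probability simplex yields a softmax of the negative per-sample losses. The following is a minimal sketch of that idea under our own naming (the regularization strength `eps` is a free choice, and the paper's full EOS procedure does more than this single step):

```python
import math

def entropic_outlier_weights(losses, eps=1.0):
    """Minimize sum_i w_i * L_i + eps * sum_i w_i * log(w_i) over
    the probability simplex; the stationarity conditions give the
    closed-form w_i proportional to exp(-L_i / eps)."""
    m = min(losses)  # shift losses for numerical stability
    unnorm = [math.exp(-(l - m) / eps) for l in losses]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# A sample with an anomalously large loss receives a vanishing
# weight, i.e., it is "sparsified away" during learning.
w = entropic_outlier_weights([0.1, 0.2, 0.1, 25.0], eps=0.5)
```

The cost of this step is a single pass over the samples, which matches the abstract's claim of a linear dependence on statistics size and independence of data dimension.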
7. Gerber S, Pospisil L, Sys S, Hewel C, Torkamani A, Horenko I. Co-Inference of Data Mislabelings Reveals Improved Models in Genomics and Breast Cancer Diagnostics. Front Artif Intell 2022; 4:739432. [PMID: 35072059] [PMCID: PMC8766632] [DOI: 10.3389/frai.2021.739432] [Received: 07/10/2021] [Accepted: 11/19/2021]
Abstract
Mislabeling of cases as well as controls in case-control studies is a frequent source of strong bias in prognostic and diagnostic tests and algorithms. Common data processing methods available to researchers in the biomedical community do not allow for consistent and robust treatment of labeled data in situations where both the case and the control groups contain a non-negligible proportion of mislabeled data instances. This is an especially prominent issue in studies regarding late-onset conditions, where individuals who may convert to cases may populate the control group, and for screening studies that often have high false-positive/false-negative rates. To address this problem, we propose a method for a simultaneous robust inference of Lasso-reduced discriminative models and of latent group-specific mislabeling risks, not requiring any exactly labeled data. We apply it to a standard breast cancer imaging dataset and infer the mislabeling probabilities (being rates of false-negative and false-positive core-needle biopsies) together with a small set of simple diagnostic rules, outperforming the state-of-the-art BI-RADS diagnostics on these data. The inferred mislabeling rates for breast cancer biopsies agree with the published purely empirical studies. Applying the method to human genomic data from a healthy-ageing cohort reveals a previously unreported compact combination of single-nucleotide polymorphisms that are strongly associated with a healthy-ageing phenotype for Caucasians. It determines that 7.5% of Caucasians in the 1000 Genomes dataset (selected as a control group) carry a pattern characteristic of healthy ageing.
Affiliation(s)
- Susanne Gerber
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Correspondence: Susanne Gerber; Illia Horenko
- Lukas Pospisil
- Faculty of Informatics, Institute of Computational Science, Università della Svizzera Italiana, Lugano, Switzerland
- Stanislav Sys
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Charlotte Hewel
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Ali Torkamani
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Illia Horenko
- Faculty of Informatics, Institute of Computational Science, Università della Svizzera Italiana, Lugano, Switzerland
- Correspondence: Susanne Gerber; Illia Horenko
8. Pfenninger M, Reuss F, Kiebler A, Schönnenbeck P, Caliendo C, Gerber S, Cocchiararo B, Reuter S, Blüthgen N, Mody K, Mishra B, Bálint M, Thines M, Feldmeyer B. Genomic basis for drought resistance in European beech forests threatened by climate change. eLife 2021; 10:e65532. [PMID: 34132196] [PMCID: PMC8266386] [DOI: 10.7554/elife.65532] [Received: 12/07/2020] [Accepted: 06/07/2021]
Abstract
In the course of global climate change, Central Europe is experiencing more frequent and prolonged periods of drought. The drought years 2018 and 2019 affected European beeches (Fagus sylvatica L.) differently: even in the same stand, drought-damaged trees neighboured healthy trees, suggesting that the genotype rather than the environment was responsible for this conspicuous pattern. We used this natural experiment to study the genomic basis of drought resistance with Pool-GWAS. Contrasting the extreme phenotypes identified 106 significantly associated single-nucleotide polymorphisms (SNPs) throughout the genome. Most annotated genes with associated SNPs (>70%) were previously implicated in the drought reaction of plants. Non-synonymous substitutions led either to a functional amino acid exchange or to premature termination. An SNP assay with 70 loci allowed correct prediction of the drought phenotype in 98.6% of a validation sample of 92 trees. Drought resistance in European beech is a moderately polygenic trait that should respond well to natural selection, selective management, and breeding.
Affiliation(s)
- Markus Pfenninger
- Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
- Friederike Reuss
- Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Angelika Kiebler
- Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Philipp Schönnenbeck
- Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Cosima Caliendo
- Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Susanne Gerber
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Berardino Cocchiararo
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
- Conservation Genetics Section, Senckenberg Research Institute and Natural History Museum Frankfurt, Gelnhausen, Germany
- Sabrina Reuter
- Ecological Networks lab, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Nico Blüthgen
- Ecological Networks lab, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Karsten Mody
- Ecological Networks lab, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Department of Applied Ecology, Hochschule Geisenheim University, Geisenheim, Germany
- Bagdevi Mishra
- Biological Archives, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Miklós Bálint
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
- Functional Environmental Genomics, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Agricultural Sciences, Nutritional Sciences, and Environmental Management, Universität Giessen, Giessen, Germany
- Marco Thines
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
- Biological Archives, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Institute for Ecology, Evolution and Diversity, Johann Wolfgang Goethe-University, Frankfurt am Main, Germany
- Barbara Feldmeyer
- Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
9. Rodrigues DR, Everschor-Sitte K, Gerber S, Horenko I. A deeper look into natural sciences with physics-based and data-driven measures. iScience 2021; 24:102171. [PMID: 33665584] [PMCID: PMC7907479] [DOI: 10.1016/j.isci.2021.102171]
Abstract
With the development of machine learning in recent years, it is possible to glean much more information from an experimental data set. In this perspective, we discuss some state-of-the-art data-driven tools for analyzing latent effects in data and explain their applicability in the natural sciences, focusing on two recently introduced, physics-motivated, computationally cheap tools: latent entropy and latent dimension. We exemplify their capabilities on several examples from the natural sciences and show that they reveal so far unobserved features such as a gradient in a magnetic measurement and a latent network of glymphatic channels in mouse brain microscopy data. What sets these techniques apart is the relaxation of restrictive assumptions typical of many machine learning models, instead incorporating aspects that best fit the dynamical systems at hand.
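As a loose, self-invented proxy for the kind of entropy-based measure this perspective discusses, one can compute the Shannon entropy of an empirical distribution of latent-state affiliations; the paper's latent-entropy measure is more involved, so treat this purely as an illustration of the underlying quantity:

```python
import math
from collections import Counter

def affiliation_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of
    latent-state labels: 0 when one state dominates entirely,
    log2(K) when K states are used equally often."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

h_uniform = affiliation_entropy([0, 1, 0, 1])  # two equally used states
h_single = affiliation_entropy([0, 0, 0, 0])   # a single state
```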
Affiliation(s)
- Davi Röhe Rodrigues
- Institute of Physics, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Susanne Gerber
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany
- Illia Horenko
- Università della Svizzera Italiana, Faculty of Informatics, Via G. Buffi 13, 6900 Lugano, Switzerland
10. Weißbach S, Sys S, Hewel C, Todorov H, Schweiger S, Winter J, Pfenninger M, Torkamani A, Evans D, Burger J, Everschor-Sitte K, May-Simera HL, Gerber S. Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines. BMC Genomics 2021; 22:62. [PMID: 33468057] [PMCID: PMC7814447] [DOI: 10.1186/s12864-020-07362-8] [Received: 07/30/2020] [Accepted: 12/30/2020]
Abstract
Background: Next Generation Sequencing (NGS) is the foundation of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform's impact.
Results: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in indel absolute numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups.
Conclusion: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-020-07362-8.
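The cross-platform concordance this study quantifies can be sketched in its simplest form as a Jaccard index over variant call sets keyed by position and allele. This is our own toy stand-in with made-up data, not the study's pipeline (which compares full call sets per individual and per variant class):

```python
def concordance(calls_a, calls_b):
    """Jaccard concordance between two variant call sets, with each
    variant keyed as (chrom, pos, ref, alt); 1.0 means identical
    call sets, 0.0 means no shared calls."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical calls from two platforms: 2 shared of 4 total.
platform_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 50, "G", "GA")}
platform_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA"),
              ("chr3", 7, "T", "C")}
j = concordance(platform_a, platform_b)
```

Keying variants by (chrom, pos, ref, alt) is what makes indel representation differences between pipelines (e.g., left- vs right-alignment) show up as discordance, which is one reason the study reports lower concordance for indels than for SNPs.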
Affiliation(s)
- Stephan Weißbach
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Institute of Developmental Biology and Neurobiology, Johannes Gutenberg-University Mainz, Mainz, Germany
- Stanislav Sys
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Charlotte Hewel
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Hristo Todorov
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Susann Schweiger
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Leibniz Institute for Resilience Research, Mainz, Germany
- Jennifer Winter
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Leibniz Institute for Resilience Research, Mainz, Germany
- Markus Pfenninger
- Department of Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt am Main, Germany
- Institute for Molecular and Organismic Evolution, Johannes Gutenberg-University Mainz, Johann-Joachim-Becher-Weg 7, 55128 Mainz, Germany
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325 Frankfurt am Main, Germany
- Ali Torkamani
- Department of Integrative Structural and Computational Biology, Scripps Research Translational Institute, California Campus, San Diego, USA
- Doug Evans
- Department of Integrative Structural and Computational Biology, Scripps Research Translational Institute, California Campus, San Diego, USA
- Joachim Burger
- Institute of Anthropology, Johannes Gutenberg-University Mainz, Mainz, Germany
- Susanne Gerber
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
11. Horenko I. On a Scalable Entropic Breaching of the Overfitting Barrier for Small Data Problems in Machine Learning. Neural Comput 2020; 32:1563-1579. [PMID: 32521216] [DOI: 10.1162/neco_a_01296]
Abstract
Overfitting and treatment of small data are among the most challenging problems in machine learning (ML), when a relatively small data statistics size T is not enough to provide a robust ML fit for a relatively large data feature dimension D. Deploying a massively parallel ML analysis of generic classification problems for different D and T, we demonstrate the existence of statistically significant linear overfitting barriers for common ML methods. The results reveal that for a robust classification of bioinformatics-motivated generic problems with the long short-term memory deep learning classifier (LSTM), one needs in the best case a statistics size T that is at least 13.8 times larger than the feature dimension D. We show that this overfitting barrier can be breached at a 10^-12 fraction of the computational cost by means of the entropy-optimal scalable probabilistic approximations algorithm (eSPA), performing a joint solution of the entropy-optimal Bayesian network inference and feature space segmentation problems. Application of eSPA to experimental single-cell RNA sequencing data exhibits a 30-fold classification performance boost when compared to standard bioinformatics tools and a 7-fold boost when compared to the deep learning LSTM classifier.
Affiliation(s)
- Illia Horenko
- Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland