1
|
Hu S, Yang J, Huang J, Li D, Li C. An Improved Kernel Entropy Component Analysis for Damage Detection Under Environmental and Operational Variations. SENSORS (BASEL, SWITZERLAND) 2025; 25:1332. [PMID: 40096133 PMCID: PMC11902547 DOI: 10.3390/s25051332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2025] [Revised: 02/17/2025] [Accepted: 02/19/2025] [Indexed: 03/19/2025]
Abstract
Environmental effects often trigger false alarms in vibration-based damage detection methods used for structural health monitoring (SHM). While conventional techniques like Principal Component Analysis (PCA) and cointegration have been somewhat effective in addressing this issue, challenges such as measurement noise, nonlinear behavior, and non-Gaussian data distribution continue to affect their performance. To address these limitations, a novel damage detection method combining Variational Mode Decomposition (VMD) and Dynamic Kernel Entropy Component Analysis (DKECA) is proposed. The proposed method initially uses the VMD technique to remove seasonal patterns and noise from the modal frequencies. Subsequently, a DKECA model is constructed based on a time-delay data matrix, and the principal components that maximize the Rényi entropy in the high-dimensional space are selected. Using these principal components, a damage detector developed from the T2 statistic is used to determine damage indices for SHM. The effectiveness of the proposed method is verified through both a simulated 7-DOF model and real-world data from the Z24 bridge, with comparative studies highlighting its advantages over existing techniques.
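The time-delay data matrix and the Rényi entropy criterion named in this abstract can be illustrated with a minimal sketch. It assumes a univariate modal-frequency series, a Gaussian kernel with a hand-picked width `sigma`, and the kernel-based estimate of the quadratic Rényi entropy, H2 = -log((1/N^2) * sum_ij k(x_i, x_j)). The function names, the lag count, and the toy frequency values are illustrative assumptions, not the authors' implementation; the VMD step, the component selection, and the T2 detector are not shown.

```python
import math

def time_delay_matrix(series, lags):
    """Row t holds [x_t, x_{t-1}, ..., x_{t-lags}] for every t >= lags."""
    return [
        [series[t - k] for k in range(lags + 1)]
        for t in range(lags, len(series))
    ]

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def renyi_quadratic_entropy(rows, sigma):
    """Kernel estimate of the quadratic Renyi entropy: -log of the mean kernel value."""
    n = len(rows)
    potential = sum(gaussian_kernel(a, b, sigma) for a in rows for b in rows) / n ** 2
    return -math.log(potential)

# Hypothetical modal-frequency readings (Hz), for illustration only.
freqs = [3.90, 3.92, 3.88, 3.95, 3.91, 3.89]
X = time_delay_matrix(freqs, lags=2)          # 4 rows, 3 columns
H2 = renyi_quadratic_entropy(X, sigma=0.05)   # entropy of the lagged samples
```

In kernel entropy component analysis, the double sum inside the logarithm is decomposed over the eigenpairs of the kernel matrix, and the components contributing most to it are retained; the sketch stops at the entropy estimate itself.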
Affiliation(s)
- Shuigen Hu
- Anhui Provincial International Joint Research Center of Data Diagnosis and Smart Maintenance on Bridge Structures, Chuzhou 239099, China;
| | - Jian Yang
- Department of Civil and Intelligent Construction Engineering, Shantou University, Shantou 515063, China
| | - Jiezhong Huang
- Department of Civil and Intelligent Construction Engineering, Shantou University, Shantou 515063, China
- Guangdong Engineering Center for Structure Safety and Health Monitoring, Shantou University, Shantou 515063, China
| | - Dongsheng Li
- Department of Civil and Intelligent Construction Engineering, Shantou University, Shantou 515063, China
- Guangdong Engineering Center for Structure Safety and Health Monitoring, Shantou University, Shantou 515063, China
- Shantou Key Laboratory of Offshore Wind Energy, Shantou 515063, China
| | - Cheng Li
- Key Laboratory for Health and Safety of Bridge Structures, Wuhan 430034, China
|
2
|
Pržulj N, Malod-Dognin N. Simplicity within biological complexity. BIOINFORMATICS ADVANCES 2025; 5:vbae164. [PMID: 39927291 PMCID: PMC11805345 DOI: 10.1093/bioadv/vbae164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 10/01/2024] [Accepted: 10/23/2024] [Indexed: 02/11/2025]
Abstract
Motivation Heterogeneous, interconnected, systems-level, molecular (multi-omic) data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, repurpose known and discover new drugs to personalize medical treatment. Existing methodologies are limited and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. Results In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding of multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods (also called graph representation learning) map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications, and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power call for a creation and training of efficient, explainable and controllable models, having no potentially dangerous, unexpected behaviour, that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics, focusing on precision medicine and personalized drug discovery. 
It will lead to a paradigm shift in the computational and biomedical understanding of data and diseases that will open up ways to solve some of the major bottlenecks in precision medicine and other domains.
Affiliation(s)
- Nataša Pržulj
- Computational Biology Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E6BT, United Kingdom
- ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
|
3
|
Taş G, Westerdijk T, Postma E, Veldink JH, Schönhuth A, Balvert M. Computing linkage disequilibrium aware genome embeddings using autoencoders. Bioinformatics 2024; 40:btae326. [PMID: 38775680 PMCID: PMC11208726 DOI: 10.1093/bioinformatics/btae326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 04/23/2024] [Accepted: 05/17/2024] [Indexed: 06/28/2024]
Abstract
MOTIVATION The completion of the genome has paved the way for genome-wide association studies (GWAS), which explained certain proportions of heritability. GWAS are not optimally suited to detect non-linear effects in disease risk, possibly hidden in non-additive interactions (epistasis). Alternative methods for epistasis detection using, e.g. deep neural networks (DNNs) are currently under active development. However, DNNs are constrained by finite computational resources, which can be rapidly depleted due to increasing complexity with the sheer size of the genome. Besides, the curse of dimensionality complicates the task of capturing meaningful genetic patterns for DNNs and therefore necessitates dimensionality reduction. RESULTS We propose a method to compress single nucleotide polymorphism (SNP) data, while leveraging the linkage disequilibrium (LD) structure and preserving potential epistasis. This method involves clustering correlated SNPs into haplotype blocks and training per-block autoencoders to learn a compressed representation of the block's genetic content. We provide an adjustable autoencoder design to accommodate diverse blocks and bypass extensive hyperparameter tuning. We applied this method to genotyping data from Project MinE, and achieved 99% average test reconstruction accuracy (i.e. minimal information loss) while compressing the input to nearly 10% of the original size. We demonstrate that haplotype-block based autoencoders outperform linear Principal Component Analysis (PCA) by approximately 3% chromosome-wide accuracy of reconstructed variants. To the best of our knowledge, our approach is the first to simultaneously leverage haplotype structure and DNNs for dimensionality reduction of genetic data. AVAILABILITY AND IMPLEMENTATION Data are available for academic use through Project MinE at https://www.projectmine.com/research/data-sharing/, contingent upon terms and requirements specified by the source studies. 
Code is available at https://github.com/gizem-tas/haploblock-autoencoders.
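The step of clustering correlated SNPs into blocks can be illustrated with a small sketch. The abstract does not say how blocks are formed, so the greedy adjacent-SNP rule below is an assumed stand-in rather than the authors' procedure: linkage disequilibrium is approximated by the squared Pearson correlation (r^2) between genotype dosage vectors coded 0/1/2, and a new block starts whenever r^2 with the preceding SNP falls below a threshold. The function names, the threshold, and the toy genotypes are hypothetical; the per-block autoencoders are not shown.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return cov * cov / (var_a * var_b)

def greedy_blocks(snps, threshold):
    """Group consecutive SNP indices; split where r^2 with the previous SNP < threshold."""
    if not snps:
        return []
    blocks = [[0]]
    for j in range(1, len(snps)):
        if r_squared(snps[j - 1], snps[j]) >= threshold:
            blocks[-1].append(j)   # still in LD with the previous SNP
        else:
            blocks.append([j])     # LD broken: open a new block
    return blocks

# Toy data: each inner list is one SNP observed across six individuals.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 0, 1, 0, 2, 1],
    [2, 0, 1, 0, 2, 2],
]
blocks = greedy_blocks(snps, threshold=0.5)
```

Each resulting block would then be compressed separately, which keeps the input dimension of any single model small.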
Affiliation(s)
- Gizem Taş
- Department of Econometrics and Operations Research, Tilburg University, Tilburg 5037AB, The Netherlands
| | - Timo Westerdijk
- Department of Neurology, University Medical Center Utrecht, Utrecht 3584CX, The Netherlands
| | - Eric Postma
- Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg 5037AB, The Netherlands
| | - Jan H Veldink
- Department of Neurology, University Medical Center Utrecht, Utrecht 3584CX, The Netherlands
| | | | - Marleen Balvert
- Department of Econometrics and Operations Research, Tilburg University, Tilburg 5037AB, The Netherlands
|
4
|
Khadirnaikar S, Shukla S, Prasanna SRM. Integration of pan-cancer multi-omics data for novel mixed subgroup identification using machine learning methods. PLoS One 2023; 18:e0287176. [PMID: 37856446 PMCID: PMC10586677 DOI: 10.1371/journal.pone.0287176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 05/30/2023] [Indexed: 10/21/2023]
Abstract
Cancer is a heterogeneous disease, and patients with tumors from different organs can share similar epigenetic and genetic alterations. Therefore, it is crucial to identify the novel subgroups of patients with similar molecular characteristics. It is possible to propose a better treatment strategy when the heterogeneity of the patient is accounted for during subgroup identification, irrespective of the tissue of origin. This work proposes a machine learning (ML) based pipeline for subgroup identification in pan-cancer. Here, mRNA, miRNA, DNA methylation, and protein expression features from pan-cancer samples were concatenated and non-linearly projected to a lower dimension using an ML algorithm. This data was then clustered to identify multi-omics-based novel subgroups. The clinical characterization of these ML subgroups indicated significant differences in overall survival (OS) and disease-free survival (DFS) (p-value<0.0001). The subgroups formed by the patients from different tumors shared similar molecular alterations in terms of immune microenvironment, mutation profile, and enriched pathways. Further, decision-level and feature-level fused classification models were built to identify the novel subgroups for unseen samples. Additionally, the classification models were used to obtain the class labels for the validation samples, and the molecular characteristics were verified. To summarize, this work identified novel ML subgroups using multi-omics data and showed that the patients with different tumor types could be similar molecularly. We also proposed and validated the classification models for subgroup identification. The proposed classification models can be used to identify the novel multi-omics subgroups, and the molecular characteristics of each subgroup can be used to design appropriate treatment regimens.
Affiliation(s)
- Seema Khadirnaikar
- Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
| | - Sudhanshu Shukla
- Department of Biosciences and Bioengineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
| | - S. R. M. Prasanna
- Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
|
5
|
Khadirnaikar S, Shukla S, Prasanna SRM. Machine learning based combination of multi-omics data for subgroup identification in non-small cell lung cancer. Sci Rep 2023; 13:4636. [PMID: 36944673 PMCID: PMC10030850 DOI: 10.1038/s41598-023-31426-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/08/2022] [Accepted: 03/11/2023] [Indexed: 03/23/2023]
Abstract
Non-small Cell Lung Cancer (NSCLC) is a heterogeneous disease with a poor prognosis. Identifying novel subtypes in cancer can help classify patients with similar molecular and clinical phenotypes. This work proposes an end-to-end pipeline for subgroup identification in NSCLC. Here, we used a machine learning (ML) based approach to compress the multi-omics NSCLC data to a lower dimensional space. This data is subjected to consensus K-means clustering to identify the five novel clusters (C1-C5). Survival analysis of the resulting clusters revealed a significant difference in the overall survival of clusters (p-value: 0.019). Each cluster was then molecularly characterized to identify specific molecular characteristics. We found that cluster C3 showed minimal genetic aberration with a high prognosis. Next, classification models were developed using data from each omic level to predict the subgroup of unseen patients. Decision‑level fused classification models were then built using these classifiers, which were used to classify unseen patients into five novel clusters. We also showed that the multi-omics-based classification model outperformed single-omic-based models, and the combination of classifiers proved to be a more accurate prediction model than the individual classifiers. In summary, we have used ML models to develop a classification method and identified five novel NSCLC clusters with different genetic and clinical characteristics.
Affiliation(s)
- Seema Khadirnaikar
- Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, India
| | - Sudhanshu Shukla
- Department of Biosciences and Bioengineering, Indian Institute of Technology Dharwad, Dharwad, India.
| | - S R M Prasanna
- Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, India
|
6
|
Saikia D, Jadhav P, Hole AR, Krishna CM, Singh SP. Growth Kinetics Monitoring of Gram-Negative Pathogenic Microbes Using Raman Spectroscopy. APPLIED SPECTROSCOPY 2022; 76:1263-1271. [PMID: 35694822 DOI: 10.1177/00037028221109624] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Optical density based measurements are routinely performed to monitor the growth of microbes. These measurements solely depend upon the number of cells and do not provide any information about the changes in the biochemical milieu or biological status. Objective information about these parameters is essential for the evaluation of novel therapies and for maximizing metabolite production. In the present study, we have applied Raman spectroscopy to monitor growth kinetics of three different pathogenic Gram-negative microbes: Escherichia coli, Pseudomonas aeruginosa, and Acinetobacter baumannii. Spectral measurements were performed under 532 nm excitation with 5 seconds of exposure time. Spectral features suggest temporal changes in the "peptide" and "nucleic acid" content of cells under different growth stages. Using principal component analysis (PCA), successful discrimination between growth phases was also achieved. Overall, the findings are supportive of the prospective adoption of Raman based approaches for monitoring microbial growth.
Affiliation(s)
- Dimple Saikia
- Department of Biosciences and Bioengineering, Indian Institute of Technology Dharwad, Dharwad, India
| | - Priyanka Jadhav
- Tata Memorial Centre, Advanced Centre for Treatment Research and Education in Cancer, Navi Mumbai, India
- Training School Complex, Homi Bhabha National Institute, Anushakti Nagar, India
| | - Arti R Hole
- Tata Memorial Centre, Advanced Centre for Treatment Research and Education in Cancer, Navi Mumbai, India
| | - Chilakapati Murali Krishna
- Tata Memorial Centre, Advanced Centre for Treatment Research and Education in Cancer, Navi Mumbai, India
- Training School Complex, Homi Bhabha National Institute, Anushakti Nagar, India
| | - Surya P Singh
- Department of Biosciences and Bioengineering, Indian Institute of Technology Dharwad, Dharwad, India
|
7
|
Qin X, Chiang CWK, Gaggiotti OE. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis. Brief Bioinform 2022; 23:bbac202. [PMID: 35649387 PMCID: PMC9294434 DOI: 10.1093/bib/bbac202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2022] [Revised: 04/05/2022] [Accepted: 04/29/2022] [Indexed: 12/30/2022]
Abstract
Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
Affiliation(s)
- Xinghu Qin
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
| | - Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
| | - Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
|
8
|
Ausmees K, Nettelblad C. A deep learning framework for characterization of genotype data. G3 GENES|GENOMES|GENETICS 2022; 12:6515290. [PMID: 35078229 PMCID: PMC8896001 DOI: 10.1093/g3journal/jkac020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2021] [Accepted: 01/18/2022] [Indexed: 01/05/2023]
Abstract
Dimensionality reduction is a data transformation technique widely used in various fields of genomics research. The application of dimensionality reduction to genotype data is known to capture genetic similarity between individuals, and is used for visualization of genetic variation, identification of population structure as well as ancestry mapping. Among frequently used methods are principal component analysis, which is a linear transform that often misses more fine-scale structures, and neighbor-graph based methods which focus on local relationships rather than large-scale patterns. Deep learning models are a type of nonlinear machine learning method in which the features used in data transformation are decided by the model in a data-driven manner, rather than by the researcher, and have been shown to present a promising alternative to traditional statistical methods for various applications in omics research. In this study, we propose a deep learning model based on a convolutional autoencoder architecture for dimensionality reduction of genotype data. Using a highly diverse cohort of human samples, we demonstrate that the model can identify population clusters and provide richer visual information in comparison to principal component analysis, while preserving global geometry to a higher extent than t-SNE and UMAP, yielding results that are comparable to an alternative deep learning approach based on variational autoencoders. We also discuss the use of the methodology for more general characterization of genotype data, showing that it preserves spatial properties in the form of decay of linkage disequilibrium with distance along the genome and demonstrating its use as a genetic clustering method, comparing results to the ADMIXTURE software frequently used in population genetic studies.
Affiliation(s)
- Kristiina Ausmees
- Division of Scientific Computing, Department of Information Technology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Carl Nettelblad
- Division of Scientific Computing, Department of Information Technology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
9
|
Qin X, Lock TR, Kallenbach RL. DA: Population structure inference using discriminant analysis. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13748] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Affiliation(s)
- Xinghu Qin
- Beijing Institute of Genomics Chinese Academy of Sciences Beijing China
| | - Thomas Ryan Lock
- Division of Plant Sciences University of Missouri Columbia MO USA
|
10
|
Durán C, Ciucci S, Palladini A, Ijaz UZ, Zippo AG, Sterbini FP, Masucci L, Cammarota G, Ianiro G, Spuul P, Schroeder M, Grill SW, Parsons BN, Pritchard DM, Posteraro B, Sanguinetti M, Gasbarrini G, Gasbarrini A, Cannistraci CV. Nonlinear machine learning pattern recognition and bacteria-metabolite multilayer network analysis of perturbed gastric microbiome. Nat Commun 2021; 12:1926. [PMID: 33771992 PMCID: PMC7997970 DOI: 10.1038/s41467-021-22135-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/14/2020] [Accepted: 02/24/2021] [Indexed: 12/11/2022]
Abstract
The stomach is inhabited by diverse microbial communities, co-existing in a dynamic balance. Long-term use of drugs such as proton pump inhibitors (PPIs), or bacterial infection such as Helicobacter pylori, cause significant microbial alterations. Yet, studies revealing how the commensal bacteria re-organize, due to these perturbations of the gastric environment, are at an early stage and rely principally on linear techniques for multivariate analysis. Here we disclose the importance of complementing linear dimensionality reduction techniques with nonlinear ones to unveil hidden patterns that remain unseen by linear embedding. Then, we prove the advantages of completing multivariate pattern analysis with differential network analysis, to reveal mechanisms of bacterial network re-organizations which emerge from perturbations induced by a medical treatment (PPIs) or an infectious state (H. pylori). Finally, we show how to build bacteria-metabolite multilayer networks that can deepen our understanding of the metabolite pathways significantly associated with the perturbed microbial communities.
Affiliation(s)
- Claudio Durán
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
| | - Sara Ciucci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
| | - Alessandra Palladini
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
- Paul Langerhans Institute Dresden, Helmholtz Zentrum Munchen, Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany
| | - Umer Z Ijaz
- Department of Infrastructure and Environment University of Glasgow, School of Engineering, Glasgow, UK
| | - Antonio G Zippo
- Institute of Neuroscience, Consiglio Nazionale delle Ricerche, Milan, Italy
| | | | - Luca Masucci
- Institute of Microbiology, Università Cattolica del Sacro Cuore, Rome, Italy
| | - Giovanni Cammarota
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
| | - Gianluca Ianiro
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
| | - Pirjo Spuul
- Department of Chemistry and Biotechnology, Division of Gene Technology, Tallinn University of Technology, Tallinn, 12618, Estonia
| | - Michael Schroeder
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Dresden, Germany
| | - Stephan W Grill
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Dresden, Germany
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Bryony N Parsons
- Department of Cellular and Molecular Physiology, Institute of Translational Medicine, University of Liverpool, Liverpool, UK
| | - D Mark Pritchard
- Department of Cellular and Molecular Physiology, Institute of Translational Medicine, University of Liverpool, Liverpool, UK
- Department of Gastroenterology, Royal Liverpool and Broadgreen University Hospitals NHS Trust, Liverpool, UK
| | - Brunella Posteraro
- Institute of Microbiology, Università Cattolica del Sacro Cuore, Rome, Italy
| | | | - Giovanni Gasbarrini
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
| | - Antonio Gasbarrini
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
| | - Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany.
- Center for Complex Network Intelligence (CCNI) at Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Biomedical Engineering, Tsinghua University, Beijing, China.
|
11
|
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022]
Abstract
Background In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular of which is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction association epistasis studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods To identify causal variants in synergy and to improve interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently better control type I error rate at the nominal level than MBMDR-GC. Moreover, our proposed three methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.
| | | | | | | | - Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Diane Duroux
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Elena S Gusareva
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium; WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium
|
12
|
Abegaz F, Chaichoompu K, Génin E, Fardo DW, König IR, Mahachie John JM, Van Steen K. Principals about principal components in statistical genetics. Brief Bioinform 2020; 20:2200-2216. [PMID: 30219892 DOI: 10.1093/bib/bby081] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/18/2018] [Revised: 07/21/2018] [Accepted: 08/12/2018] [Indexed: 12/13/2022]
Abstract
Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA and of their impact on study results, compared to alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
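The definition of a principal component given in this abstract, a derived variable that explains as much of the total variance as possible, can be made concrete with a short sketch. It finds only the first PC by power iteration on the sample covariance matrix, which is one of several ways to compute it and is chosen here purely for brevity; the function name, the iteration count, and the toy data are illustrative assumptions.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, variance along it, total variance) for the first PC."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Start vector; assumed not to be orthogonal to the leading direction.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current vector
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    total = sum(cov[i][i] for i in range(p))
    return v, variance, total

# Toy data: two strongly correlated variables measured on four samples.
data = [[2.0, 1.9], [0.0, 0.1], [1.0, 1.1], [3.0, 2.9]]
direction, pc_variance, total_variance = first_principal_component(data)
explained = pc_variance / total_variance  # share of variance captured by PC1
```

Later components would be obtained by repeating the procedure after removing the variance already explained, which is what makes successive PCs uncorrelated.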
|
13
|
Tini G, Marchetti L, Priami C, Scott-Boyer MP. Multi-omics integration-a comparison of unsupervised clustering methodologies. Brief Bioinform 2020; 20:1269-1279. [PMID: 29272335 DOI: 10.1093/bib/bbx167] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 09/02/2017] [Revised: 11/06/2017] [Indexed: 12/19/2022] Open
Abstract
With the recent developments in the field of multi-omics integration, the interest in factors such as data preprocessing, choice of the integration method and the number of different omics considered has increased. In this work, the impact of these factors is explored when solving the problem of sample classification, by comparing the performances of five unsupervised algorithms: Multiple Canonical Correlation Analysis, Multiple Co-Inertia Analysis, Multiple Factor Analysis, Joint and Individual Variation Explained and Similarity Network Fusion. These methods were applied to three real data sets taken from the literature and to several ad hoc simulated scenarios to discuss classification performance under different conditions of noise and signal strength across the data types. The impact of experimental design, feature selection and parameter training has also been evaluated to unravel important conditions that can affect the accuracy of the result.
|
14
|
Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019; 138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
Abstract
The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.
Affiliation(s)
- K Van Steen
- WELBIO, GIGA-R Medical Genomics-BIO3, University of Liège, Liege, Belgium.
- Department of Human Genetics, University of Leuven, Leuven, Belgium.
| | - J H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
| |
|
15
|
Chaichoompu K, Abegaz F, Tongsima S, Shaw PJ, Sakuntabhai A, Pereira L, Van Steen K. IPCAPS: an R package for iterative pruning to capture population structure. Source Code for Biology and Medicine 2019; 14:2. [PMID: 30936940 PMCID: PMC6427891 DOI: 10.1186/s13029-019-0072-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/25/2017] [Accepted: 02/21/2019] [Indexed: 01/29/2023]
Abstract
Background: Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. Results: This work presents an R package called IPCAPS, which uses SNP information for resolving possibly fine-scale population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework that systematically assigns individuals to genetically similar subgroups. In each iteration, our tool is able to detect and eliminate outliers, thereby avoiding severe misclassification errors. Conclusions: IPCAPS supports different measurement scales for variables used to identify substructure. Hence, panels of gene expression and methylation data can be accommodated as well. The tool can also be applied in patient sub-phenotyping contexts. IPCAPS is developed in R and is freely available from http://bio3.giga.ulg.ac.be/ipcaps.
Affiliation(s)
- Kridsadakorn Chaichoompu
- GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, 4000 Liege, Belgium
| | - Fentaw Abegaz
- GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, 4000 Liege, Belgium
| | - Sissades Tongsima
- Genome Technology Research Unit, National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Phahonyothin Road, Khlong Neung, Khlong Luang, Pathum Thani 12120 Thailand
| | - Philip James Shaw
- Medical Molecular Biology Research Unit, National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Phahonyothin Road, Khlong Neung, Khlong Luang, Pathum Thani 12120 Thailand
| | - Anavaj Sakuntabhai
- Functional Genetics of Infectious Diseases Unit, Institut Pasteur, 25-28, rue du Docteur Roux, 75015 Paris, France.,Centre National de la Recherche Scientifique, URA3012, Paris, France
| | - Luísa Pereira
- Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal.,Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Rua Júlio Amaral de Carvalho, 45, 4200-135 Porto, Portugal
| | - Kristel Van Steen
- GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, 4000 Liege, Belgium.,WELBIO (Walloon Excellence in Lifesciences and Biotechnology), Avenue Pasteur 6, 1300 Wavre, Belgium
| |
|
16
|
Zanardi A, Conti A, Cremonesi M, D'Adamo P, Gilberti E, Apostoli P, Cannistraci CV, Piperno A, David S, Alessio M. Ceruloplasmin replacement therapy ameliorates neurological symptoms in a preclinical model of aceruloplasminemia. EMBO Mol Med 2019; 10:91-106. [PMID: 29183916 PMCID: PMC5760856 DOI: 10.15252/emmm.201708361] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/21/2022] Open
Abstract
Aceruloplasminemia is a monogenic disease caused by mutations in the ceruloplasmin gene that result in loss of protein ferroxidase activity. Ceruloplasmin plays a role in iron homeostasis, and impairment of its activity leads to iron accumulation in the liver, pancreas, and brain. Iron deposition promotes diabetes, retinal degeneration, and progressive neurodegeneration. Current therapies, mainly based on iron chelation, partially control systemic iron deposition but are ineffective against neurodegeneration. We investigated the potential of ceruloplasmin replacement therapy in reducing the neurological pathology in the ceruloplasmin-knockout (CpKO) mouse model of aceruloplasminemia. CpKO mice were administered human ceruloplasmin intraperitoneally for 2 months; the protein was able to enter the brain, restoring protein levels and rescuing ferroxidase activity. Ceruloplasmin-treated mice showed amelioration of motor incoordination that was associated with diminished loss of Purkinje neurons and reduced brain iron deposition, in particular in the choroid plexus. Computational analysis showed that ceruloplasmin-treated CpKO mice share a similar pattern with wild-type animals, highlighting the efficacy of the therapy. These data suggest that enzyme replacement therapy may be a promising strategy for the treatment of aceruloplasminemia.
Affiliation(s)
- Alan Zanardi
- Proteome Biochemistry, Division of Genetics and Cell Biology, IRCCS-San Raffaele Scientific Institute, Milan, Italy
| | - Antonio Conti
- Proteome Biochemistry, Division of Genetics and Cell Biology, IRCCS-San Raffaele Scientific Institute, Milan, Italy
| | - Marco Cremonesi
- Proteome Biochemistry, Division of Genetics and Cell Biology, IRCCS-San Raffaele Scientific Institute, Milan, Italy
| | - Patrizia D'Adamo
- Molecular Genetics of Intellectual Disabilities, Division of Neuroscience, IRCCS-San Raffaele Scientific Institute, Milan, Italy
| | - Enrica Gilberti
- Unit of Occupational Health and Industrial Hygiene, Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy
| | - Pietro Apostoli
- Unit of Occupational Health and Industrial Hygiene, Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy
| | - Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Dresden, Germany.,Brain Bio-Inspired Computation (BBC) Lab, IRCCS Centro Neurolesi "Bonino Pulejo", Messina, Italy
| | - Alberto Piperno
- School of Medicine and Surgery, University of Milano Bicocca, Monza, Italy.,Centre for Diagnosis and Treatment of Hemochromatosis, ASST-S.Gerardo Hospital, Monza, Italy
| | - Samuel David
- Center for Research in Neuroscience, The Research Institute of The McGill University Health Center, Montreal, QC, Canada
| | - Massimo Alessio
- Proteome Biochemistry, Division of Genetics and Cell Biology, IRCCS-San Raffaele Scientific Institute, Milan, Italy
| |
|
17
|
Härtner F, Andrade-Navarro MA, Alanis-Lobato G. Geometric characterisation of disease modules. Applied Network Science 2018; 3:10. [PMID: 30839777 PMCID: PMC6214295 DOI: 10.1007/s41109-018-0066-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/30/2018] [Accepted: 05/28/2018] [Indexed: 05/07/2023]
Abstract
There is growing evidence supporting the existence of a hyperbolic geometry underlying the network representation of complex systems. In particular, it has been shown that the latent geometry of the human protein network (hPIN) captures biologically relevant information, leading to a meaningful visual representation of protein-protein interactions and translating challenging systems biology problems into measuring distances between proteins. Moreover, proteins can efficiently communicate with each other, without global knowledge of the hPIN structure, via a greedy routing (GR) process in which hyperbolic distances guide biological signals from source to target proteins. It is thanks to this effective information routing throughout the hPIN that the cell operates, communicates with other cells and reacts to environmental changes. As a result, the malfunction of one or a few members of this intricate system can disturb its dynamics and result in disease phenotypes. In fact, it is known that the proteins associated with a single disease agglomerate non-randomly in the same region of the hPIN, forming one or several connected components known as the disease module (DM). Here, we present a geometric characterisation of DMs. First, we found that DM positions on the two-dimensional hyperbolic plane reflect their fragmentation and functional heterogeneity, rendering an informative picture of the cellular processes that the disease is affecting. Second, we used a distance-based dissimilarity measure to cluster DMs with shared clinical features. Finally, we took advantage of the GR strategy to study how defective proteins affect the transduction of signals throughout the hPIN.
Affiliation(s)
- Franziska Härtner
- Faculty for Physics, Mathematics and Computer Science, Johannes Gutenberg Universität, Institute of Computer Science, Staudingerweg 7, Mainz, 55128 Germany
| | - Miguel A. Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128 Germany
| | - Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128 Germany
| |
|
18
|
Miendlarzewska EA, Ciucci S, Cannistraci CV, Bavelier D, Schwartz S. Reward-enhanced encoding improves relearning of forgotten associations. Sci Rep 2018; 8:8557. [PMID: 29867116 PMCID: PMC5986818 DOI: 10.1038/s41598-018-26929-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/05/2017] [Accepted: 05/18/2018] [Indexed: 12/16/2022] Open
Abstract
Research on human memory has shown that monetary incentives can enhance hippocampal memory consolidation and thereby protect memory traces from forgetting. However, it is not known whether initial reward may facilitate the recovery of already forgotten memories weeks after learning. Here, we investigated the influence of monetary reward on later relearning. Nineteen healthy human participants learned object-location associations, for half of which we offered money. Six weeks later, most of these associations had been forgotten as measured by a test of declarative memory. Yet, relearning in the absence of any reward was faster for the originally rewarded associations. Thus, associative memories encoded in a state of monetary reward motivation may persist in a latent form despite the failure to retrieve them explicitly. Alternatively, such facilitation could be analogous to the renewal effect observed in animal conditioning, whereby a reward-associated cue can reinstate anticipatory arousal, which would in turn modulate relearning. This finding has important implications for learning and education, suggesting that even when learned information is no longer accessible via explicit retrieval, the enduring effects of a past prospect of reward could facilitate its recovery.
Affiliation(s)
- Ewa A Miendlarzewska
- Department of Neuroscience, University of Geneva, Geneva, Switzerland.,Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland.,Geneva Finance Research Institute, University of Geneva, Geneva, Switzerland
| | - Sara Ciucci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307, Dresden, Germany.,Lipotype GmbH, Tatzberg 47, 01307, Dresden, Germany
| | - Carlo V Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307, Dresden, Germany.,Brain Bio-Inspired Computing (BBC) Lab, IRCCS Centro Neurolesi "Bonino Pulejo", Messina, 98124, Italy
| | - Daphne Bavelier
- Psychology Section, FPSE, University of Geneva, Geneva, Switzerland.,Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States
| | - Sophie Schwartz
- Department of Neuroscience, University of Geneva, Geneva, Switzerland.,Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland.,Geneva Neuroscience Center, University of Geneva, Geneva, Switzerland
| |
|
19
|
Lötsch J, Lippmann C, Kringel D, Ultsch A. Integrated Computational Analysis of Genes Associated with Human Hereditary Insensitivity to Pain. A Drug Repurposing Perspective. Front Mol Neurosci 2017; 10:252. [PMID: 28848388 PMCID: PMC5550731 DOI: 10.3389/fnmol.2017.00252] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/02/2017] [Accepted: 07/26/2017] [Indexed: 12/31/2022] Open
Abstract
Genes causally involved in human insensitivity to pain provide a unique molecular source for studying the pathophysiology of pain and developing novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic targets in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, these processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified that shares important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence.
Affiliation(s)
- Jörn Lötsch
- Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany.,Fraunhofer Institute of Molecular Biology and Applied Ecology-Project Group, Translational Medicine and Pharmacology (IME-TMP), Frankfurt am Main, Germany
| | - Catharina Lippmann
- Fraunhofer Institute of Molecular Biology and Applied Ecology-Project Group, Translational Medicine and Pharmacology (IME-TMP), Frankfurt am Main, Germany
| | - Dario Kringel
- Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany
| | - Alfred Ultsch
- DataBionics Research Group, University of Marburg, Marburg, Germany
| |
|
20
|
Lorimer T, Held J, Stoop R. Clustering: how much bias do we need? Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2017; 375:rsta.2016.0293. [PMID: 28507238 PMCID: PMC5434083 DOI: 10.1098/rsta.2016.0293] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Accepted: 12/05/2016] [Indexed: 05/05/2023]
Abstract
Scientific investigations in medicine and beyond increasingly require observations to be described by more features than can be simultaneously visualized. Simply reducing the dimensionality by projections destroys essential relationships in the data. Similarly, traditional clustering algorithms introduce data bias that prevents detection of natural structures expected from generic nonlinear processes. We examine how these problems can best be addressed, where in particular we focus on two recent clustering approaches, Phenograph and Hebbian learning clustering, applied to synthetic and natural data examples. Our results reveal that already for very basic questions, minimizing clustering bias is essential, but that results can benefit further from biased post-processing. This article is part of the themed issue 'Mathematical methods in medicine: neuroscience, cardiology and pathology'.
Affiliation(s)
- Tom Lorimer
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| | - Jenny Held
- Eawag, Überlandstrasse 133, 8600 Dübendorf, Switzerland
| | - Ruedi Stoop
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| |
|
21
|
Ciucci S, Ge Y, Durán C, Palladini A, Jiménez-Jiménez V, Martínez-Sánchez LM, Wang Y, Sales S, Shevchenko A, Poser SW, Herbig M, Otto O, Androutsellis-Theotokis A, Guck J, Gerl MJ, Cannistraci CV. Enlightening discriminative network functional modules behind Principal Component Analysis separation in differential-omic science studies. Sci Rep 2017; 7:43946. [PMID: 28287094 PMCID: PMC5347127 DOI: 10.1038/srep43946] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/21/2016] [Accepted: 02/06/2017] [Indexed: 01/08/2023] Open
Abstract
Omic science is rapidly growing, and one of the most employed techniques to explore differential patterns in omic datasets is principal component analysis (PCA). However, a method to enlighten the network of omic features that contribute most to the sample separation obtained by PCA is missing. An alternative is to build correlation networks between univariately-selected significant omic features, but this neglects the multivariate unsupervised feature compression responsible for the PCA sample segregation. Biologists and medical researchers often prefer effective methods that offer an immediate interpretation to complicated algorithms that in principle promise an improvement but in practice are difficult to apply and interpret. Here we present PC-corr: a simple algorithm that associates a discriminative network of features with any PCA segregation. Such a network can be inspected in search of functional modules useful in the definition of combinatorial and multiscale biomarkers from multifaceted omic data in systems and precision biomedicine. We offer proofs of PC-corr efficacy on lipidomic, metagenomic, developmental genomic, population genetic, cancer promoteromic and cancer stem-cell mechanomic data. Finally, PC-corr is a general functional network inference approach that can be easily adopted for big data exploration in computer science and analysis of complex systems in physics.
Affiliation(s)
- Sara Ciucci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany.,Lipotype GmbH, Tatzberg 47, 01307 Dresden, Germany
| | - Yan Ge
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | - Claudio Durán
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | - Alessandra Palladini
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany.,Lipotype GmbH, Tatzberg 47, 01307 Dresden, Germany.,Membrane Biochemistry Group, DZD Paul Langerhans Institute, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | - Víctor Jiménez-Jiménez
- Integrin Signalling Group, Fundación Centro Nacional de Investigaciones Cardiovasculares Carlos III, Melchor Fernández Almagro 3, 28029 Madrid, Spain
| | - Luisa María Martínez-Sánchez
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | - Yuting Wang
- MPI of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany.,Center for Regenerative Therapies Dresden (CRTD), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstraße 105, 01307 Dresden, Germany
| | - Susanne Sales
- MPI of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
| | - Andrej Shevchenko
- MPI of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
| | - Steven W Poser
- Department of Internal Medicine III, University Hospital Carl Gustav Carus at the Technische Universität Dresden, Fetscherstr. 74, 01307 Dresden, Germany
| | - Maik Herbig
- Cellular Machines Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | - Oliver Otto
- Cellular Machines Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | - Andreas Androutsellis-Theotokis
- Center for Regenerative Therapies Dresden (CRTD), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstraße 105, 01307 Dresden, Germany.,Department of Internal Medicine III, University Hospital Carl Gustav Carus at the Technische Universität Dresden, Fetscherstr. 74, 01307 Dresden, Germany.,Department of Stem Cell Biology, Centre for Biomolecular Sciences, Division of Cancer and Stem Cells, School of Medicine, University of Nottingham, Nottingham NG7 2RD, U.K.
| | - Jochen Guck
- Cellular Machines Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| | | | - Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
| |
|
22
|
Gender, Contraceptives and Individual Metabolic Predisposition Shape a Healthy Plasma Lipidome. Sci Rep 2016; 6:27710. [PMID: 27295977 PMCID: PMC4906355 DOI: 10.1038/srep27710] [Citation(s) in RCA: 88] [Impact Index Per Article: 9.8] [Received: 03/03/2016] [Accepted: 05/24/2016] [Indexed: 12/26/2022] Open
Abstract
Lipidomics of human blood plasma is an emerging biomarker discovery approach that compares lipid profiles under pathological and physiologically normal conditions, but how a healthy lipidome varies within the population is poorly understood. By quantifying 281 molecular species from 27 major lipid classes in the plasma of 71 healthy young Caucasians whose 35 clinical blood test and anthropometric indices matched the medical norm, we provided a comprehensive, expandable and clinically relevant resource of reference molar concentrations of individual lipids. We established that gender is a major lipidomic factor, whose impact is strongly enhanced by hormonal contraceptives and mediated by sex hormone-binding globulin. In lipidomics, epidemiological studies should avoid mixed-gender cohorts, and females taking hormonal contraceptives should be considered as a separate sub-cohort. Within a gender-restricted cohort, lipidomics revealed a compositional signature that indicates a predisposition towards early development of metabolic syndrome in ca. 25% of healthy male individuals, suggesting a healthy plasma lipidome as a resource for early biomarker discovery.
|
23
|
Alanis-Lobato G. Mining protein interactomes to improve their reliability and support the advancement of network medicine. Front Genet 2015; 6:296. [PMID: 26442112 PMCID: PMC4585290 DOI: 10.3389/fgene.2015.00296] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/29/2015] [Accepted: 09/07/2015] [Indexed: 12/12/2022] Open
Abstract
High-throughput detection of protein interactions has had a major impact on our understanding of the intricate molecular machinery underlying the living cell, and has permitted the construction of very large protein interactomes. The protein networks that are currently available are incomplete, and a significant percentage of their interactions are false positives. Fortunately, the structural properties observed in good-quality social or technological networks are also present in biological systems. This has encouraged the development of tools to improve the reliability of protein networks and predict new interactions based merely on the topological characteristics of their components. Since diseases are rarely caused by the malfunction of a single protein, having a more complete and reliable interactome is crucial in order to identify groups of inter-related proteins involved in disease etiology. These system components can then be targeted with minimal collateral damage. In this article, a substantial number of network mining tools are reviewed, together with resources from which reliable protein interactomes can be constructed. In addition to the review, a few representative examples of how molecular and clinical data can be integrated to deepen our understanding of pathogenesis are discussed.
Affiliation(s)
- Gregorio Alanis-Lobato
- Faculty of Biology, Institute of Molecular Biology, Johannes Gutenberg University of Mainz, Mainz, Germany.,Integrative Systems Biology Lab, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| |
|