For: Kim H, Golub GH, Park H. Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics 2004;21:187-98. [PMID: 15333461 DOI: 10.1093/bioinformatics/bth499] [Citation(s) in RCA: 198] [Impact Index Per Article: 9.4] [Indexed: 11/13/2022] Open

Cited by Other Article(s)

Sakthivel K, Lal SB, Srivastava S, Chaturvedi KK, Khan YJ, Mishra DC, Madival SD, Vaidhyanathan R, Jha GK. A Statistical Approach for Identifying the Best Combination of Normalization and Imputation Methods for Label-Free Proteomics Expression Data. J Proteome Res 2025;24:158-170. [PMID: 39659155 DOI: 10.1021/acs.jproteome.4c00552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024]

Schumann Y, Gocke A, Neumann JE. Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets. Proteomics 2025;25:e202400100. [PMID: 39740174 DOI: 10.1002/pmic.202400100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 11/08/2024] [Accepted: 11/26/2024] [Indexed: 01/02/2025]

Abstract

Molecular profiling of different omic-modalities (e.g., DNA methylomics, transcriptomics, proteomics) in biological systems represents the basis for research and clinical decision-making. Measurement-specific biases, so-called batch effects, often hinder the integration of independently acquired datasets, and missing values further hamper the applicability of typical data processing algorithms. In addition to careful experimental design, well-defined standards in data acquisition and data exchange, the alleviation of these phenomena particularly requires a dedicated data integration and preprocessing pipeline. This review aims to give a comprehensive overview of computational methods for data integration and missing value imputation for omic data analyses. We provide formal definitions for missing value mechanisms and propose a novel statistical taxonomy for batch effects, especially in the presence of missing data. Based on an automated document search and systematic literature review, we describe 32 distinct data integration methods from five main methodological categories, as well as 37 algorithms for missing value imputation from five separate categories. Additionally, this review highlights multiple quantitative evaluation methods to aid researchers in selecting a suitable set of methods for their work. Finally, this work provides an integrated discussion of the relevance of batch effects and missing values in omics with corresponding method recommendations. We then propose a comprehensive three-step workflow from the study conception to final data analysis and deduce perspectives for future research. Eventually, we present a comprehensive flow chart as well as exemplary decision trees to aid practitioners in the selection of specific approaches for imputation and data integration in their studies.

Etourneau L, Fancello L, Wieczorek S, Varoquaux N, Burger T. Penalized likelihood optimization for censored missing value imputation in proteomics. Biostatistics 2024;26:kxaf006. [PMID: 40120089 DOI: 10.1093/biostatistics/kxaf006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 01/31/2025] [Accepted: 02/03/2025] [Indexed: 03/25/2025] Open

Ryan-Despraz J, Wissler A. Imputation methods for mixed datasets in bioarchaeology. Archaeological and Anthropological Sciences 2024;16:187. [PMID: 39450370 PMCID: PMC11496361 DOI: 10.1007/s12520-024-02078-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 09/16/2024] [Indexed: 10/26/2024]

Chungnoy K, Tanantong T, Songmuang P. Missing value imputation on gene expression data using bee-based algorithm to improve classification performance. PLoS One 2024;19:e0305492. [PMID: 39208345 PMCID: PMC11361674 DOI: 10.1371/journal.pone.0305492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 05/28/2024] [Indexed: 09/04/2024] Open

Manis G, Platakis D, Sassi R. Sample Entropy Computation on Signals with Missing Values. Entropy (Basel, Switzerland) 2024;26:704. [PMID: 39202174 PMCID: PMC11353543 DOI: 10.3390/e26080704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/03/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024]

Lane RE, Korbie D, Khanna KK, Mohamed A, Hill MM, Trau M. Defining the relationship between cellular and extracellular vesicle (EV) content in breast cancer via an integrative multi-omic analysis. Proteomics 2024;24:e2300089. [PMID: 38168906 DOI: 10.1002/pmic.202300089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 01/05/2024]

Li Q, Button-Simons KA, Sievert MAC, Chahoud E, Foster GF, Meis K, Ferdig MT, Milenković T. Enhancing Gene Co-Expression Network Inference for the Malaria Parasite Plasmodium falciparum. Genes (Basel) 2024;15:685. [PMID: 38927622 PMCID: PMC11202799 DOI: 10.3390/genes15060685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 05/22/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024] Open

Abstract

BACKGROUND

Malaria results in more than 550,000 deaths each year due to drug resistance in the most lethal Plasmodium (P.) species P. falciparum. A full P. falciparum genome was published in 2002, yet 44.6% of its genes have unknown functions. Improving the functional annotation of genes is important for identifying drug targets and understanding the evolution of drug resistance.

RESULTS

Genes function by interacting with one another. So, analyzing gene co-expression networks can enhance functional annotations and prioritize genes for wet lab validation. Earlier efforts to build gene co-expression networks in P. falciparum have been limited to a single network inference method or gaining biological understanding for only a single gene and its interacting partners. Here, we explore multiple inference methods and aim to systematically predict functional annotations for all P. falciparum genes. We evaluate each inferred network based on how well it predicts existing gene-Gene Ontology (GO) term annotations using network clustering and leave-one-out crossvalidation. We assess overlaps of the different networks' edges (gene co-expression relationships), as well as predicted functional knowledge. The networks' edges are overall complementary: 47-85% of all edges are unique to each network. In terms of the accuracy of predicting gene functional annotations, all networks yielded relatively high precision (as high as 87% for the network inferred using mutual information), but the highest recall reached was below 15%. All networks having low recall means that none of them capture a large amount of all existing gene-GO term annotations. In fact, their annotation predictions are highly complementary, with the largest pairwise overlap of only 27%. We provide ranked lists of inferred gene-gene interactions and predicted gene-GO term annotations for future use and wet lab validation by the malaria community.

CONCLUSIONS

The different networks seem to capture different aspects of the P. falciparum biology in terms of both inferred interactions and predicted gene functional annotations. Thus, relying on a single network inference method should be avoided when possible.

SUPPLEMENTARY DATA

Attached.

Gong Y, Ding W, Wang P, Wu Q, Yao X, Yang Q. Evaluating Machine Learning Methods of Analyzing Multiclass Metabolomics. J Chem Inf Model 2023;63:7628-7641. [PMID: 38079572 DOI: 10.1021/acs.jcim.3c01525] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/26/2023]

Abstract

Multiclass metabolomic studies have become popular for revealing the differences in multiple stages of complex diseases, various lifestyles, or the effects of specific treatments. In multiclass metabolomics, there are multiple data manipulation steps for analyzing raw data, which consist of data filtering, the imputation of missing values, data normalization, marker identification, sample separation, classification, and so on. In each step, several to dozens of machine learning methods can be chosen for the given data set, with potentially hundreds or thousands of method combinations in the whole data processing chain. Therefore, a clear understanding of these machine learning methods is helpful for selecting an appropriate method combination for obtaining stable and reliable analytical results of specific data. However, there has rarely been an overall introduction or evaluation of these methods based on multiclass metabolomic data. Herein, detailed descriptions of these machine learning methods in multiple data manipulation steps are reviewed. Moreover, an assessment of these methods was performed using a benchmark data set for multiclass metabolomics. First, 12 imputation methods for imputing missing values were evaluated based on the PSS (Procrustes statistical shape analysis) and NRMSE (normalized root-mean-square error) values. Second, 17 normalization methods for processing multiclass metabolomic data were evaluated by applying the PMAD (pooled median absolute deviation) value. Third, different methods of identifying markers of multiclass metabolomics were evaluated based on the CWrel (relative weighted consistency) value. Fourth, nine classification methods for constructing multiclass models were assessed using the AUC (area under the curve) value. Performance evaluations of machine learning methods are highly recommended to select the most appropriate method combination before performing the final analysis of the given data. 
Overall, detailed descriptions and evaluation of various machine learning methods are expected to improve analyses of multiclass metabolomic data.
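The NRMSE criterion named in the abstract above can be illustrated with a minimal sketch. It assumes one common definition from the imputation literature: the root mean squared error over the artificially masked entries, divided by the standard deviation of the true values at those entries. The abstract does not state which normalization the authors used, and the function name `nrmse` and its arguments are hypothetical.

```python
import numpy as np


def nrmse(true_values, imputed_values, mask):
    """Normalized root-mean-square error over masked (held-out) entries.

    Assumed definition: sqrt(mean((imputed - true)^2) / var(true)),
    evaluated only where `mask` is True. Other normalizations exist.
    """
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Keep only the entries that were hidden before imputation.
    t = true_values[mask]
    p = imputed_values[mask]
    if t.size == 0:
        raise ValueError("mask selects no entries")

    variance = t.var()  # population variance (ddof=0)
    if variance == 0:
        raise ValueError("true values at masked entries have zero variance")

    return float(np.sqrt(np.mean((p - t) ** 2) / variance))
```

Under this definition a perfect imputation scores 0, and filling every masked entry with the mean of the true masked values scores 1, so values well below 1 indicate an improvement over that trivial baseline.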

Jung M, Zimmermann R. Quantitative Mass Spectrometry Characterizes Client Spectra of Components for Targeting of Membrane Proteins to and Their Insertion into the Membrane of the Human ER. Int J Mol Sci 2023;24:14166. [PMID: 37762469 PMCID: PMC10532041 DOI: 10.3390/ijms241814166] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/09/2023] [Revised: 09/07/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023] Open

Abstract

To elucidate the redundancy in the components for the targeting of membrane proteins to the endoplasmic reticulum (ER) and/or their insertion into the ER membrane under physiological conditions, we previously analyzed different human cells by label-free quantitative mass spectrometry. The HeLa and HEK293 cells had been depleted of a certain component by siRNA or CRISPR/Cas9 treatment or were deficient patient fibroblasts and compared to the respective control cells by differential protein abundance analysis. In addition to clients of the SRP and Sec61 complex, we identified membrane protein clients of components of the TRC/GET, SND, and PEX3 pathways for ER targeting, and Sec62, Sec63, TRAM1, and TRAP as putative auxiliary components of the Sec61 complex. Here, a comprehensive evaluation of these previously described differential protein abundance analyses, as well as similar analyses on the Sec61-co-operating EMC and the characteristics of the topogenic sequences of the various membrane protein clients, i.e., the client spectra of the components, are reported. As expected, the analysis characterized membrane protein precursors with cleavable amino-terminal signal peptides or amino-terminal transmembrane helices as predominant clients of SRP, as well as the Sec61 complex, while precursors with more central or even carboxy-terminal ones were found to dominate the client spectra of the SND and TRC/GET pathways for membrane targeting. For membrane protein insertion, the auxiliary Sec61 channel components indeed share the client spectra of the Sec61 complex to a large extent. However, we also detected some unexpected differences, particularly related to EMC, TRAP, and TRAM1. The possible mechanistic implications for membrane protein biogenesis at the human ER are discussed and can be expected to eventually advance our understanding of the mechanisms that are involved in the so-called Sec61-channelopathies, resulting from deficient ER protein import.

Kong W, Wong BJH, Hui HWH, Lim KP, Wang Y, Wong L, Goh WWB. ProJect: a powerful mixed-model missing value imputation method. Brief Bioinform 2023:bbad233. [PMID: 37419612 DOI: 10.1093/bib/bbad233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 05/24/2023] [Accepted: 06/05/2023] [Indexed: 07/09/2023] Open

Abstract

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS) (exhibiting 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle different types of MVs commonly found in real-world data. Unlike most MVI methods that are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines if an MV is missing at random or missing not at random. It then employs targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.

Dutt M, Hartel G, Richards RS, Shah AK, Mohamed A, Apostolidou S, Gentry‐Maharaj A, Australian Ovarian Cancer Study Group, Hooper JD, Perrin LC, Menon U, Hill MM. Discovery and validation of serum glycoprotein biomarkers for high grade serous ovarian cancer. Proteomics Clin Appl 2023;17:e2200114. [PMID: 37147936 PMCID: PMC7615076 DOI: 10.1002/prca.202200114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2022] [Revised: 04/06/2023] [Accepted: 04/27/2023] [Indexed: 05/07/2023]

Sun W, He Q, Liu J, Xiao X, Wu Y, Zhou S, Ma S, Wang R. Dynamic monitoring of maize grain quality based on remote sensing data. Frontiers in Plant Science 2023;14:1177477. [PMID: 37426960 PMCID: PMC10325687 DOI: 10.3389/fpls.2023.1177477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 05/31/2023] [Indexed: 07/11/2023]

Wu E, Trevino AE, Wu Z, Swanson K, Kim HJ, D’Angio HB, Preska R, Chiou AE, Charville GW, Dalerba P, Duvvuri U, Colevas AD, Levi J, Bedi N, Chang S, Sunwoo J, Egloff AM, Uppaluri R, Mayer AT, Zou J. 7-UP: Generating in silico CODEX from a small set of immunofluorescence markers. PNAS Nexus 2023;2:pgad171. [PMID: 37275261 PMCID: PMC10236358 DOI: 10.1093/pnasnexus/pgad171] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/08/2022] [Accepted: 05/15/2023] [Indexed: 06/07/2023]

Sapashnik D, Newman R, Pietras CM, Zhou D, Devkota K, Qu F, Kofman L, Boudreau S, Fried I, Slonim DK. Cell-specific imputation of drug connectivity mapping with incomplete data. PLoS One 2023;18:e0278289. [PMID: 36795645 PMCID: PMC9934325 DOI: 10.1371/journal.pone.0278289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Accepted: 11/15/2022] [Indexed: 02/17/2023] Open

Fan S, Wilson CM, Fridley BL, Li Q. Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis. Methods Mol Biol 2023;2629:247-269. [PMID: 36929081 DOI: 10.1007/978-1-0716-2986-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]

Qureshi R, Zou B, Alam T, Wu J, Lee VHF, Yan H. Computational Methods for the Analysis and Prediction of EGFR-Mutated Lung Cancer Drug Resistance: Recent Advances in Drug Design, Challenges and Future Prospects. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:238-255. [PMID: 35007197 DOI: 10.1109/tcbb.2022.3141697] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]

Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022;22:e2200092. [PMID: 36349819 DOI: 10.1002/pmic.202200092] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]

Li H, Cao Q, Bai Q, Li Z, Hu H. Multistate time series imputation using generative adversarial network with applications to traffic data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07961-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Dubey A, Rasool A. Usage of Clustering and Weighted Nearest Neighbors for Efficient Missing Data Imputation of Microarray Gene Expression Dataset. Advanced Theory and Simulations 2022. [DOI: 10.1002/adts.202200460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Soemartojo SM, Siswantining T, Fernando Y, Sarwinda D, Al-Ash HS, Syarofina S, Saputra N. Iterative bicluster-based Bayesian principal component analysis and least squares for missing-value imputation in microarray and RNA-sequencing data. Mathematical Biosciences and Engineering: MBE 2022;19:8741-8759. [PMID: 35942733 DOI: 10.3934/mbe.2022405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

A joint optimization framework integrated with biological knowledge for clustering incomplete gene expression data. Soft comput 2022. [DOI: 10.1007/s00500-022-07180-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Pham TH, Qiu Y, Liu J, Zimmer S, O’Neill E, Xie L, Zhang P. Chemical-induced gene expression ranking and its application to pancreatic cancer drug repurposing. Patterns 2022;3:100441. [PMID: 35465231 PMCID: PMC9023899 DOI: 10.1016/j.patter.2022.100441] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/02/2021] [Revised: 09/13/2021] [Accepted: 01/12/2022] [Indexed: 12/18/2022]

Abstract

Chemical-induced gene expression profiles provide critical information of chemicals in a biological system, thus offering new opportunities for drug discovery. Despite their success, large-scale analysis leveraging gene expressions is limited by time and cost. Although several methods for predicting gene expressions were proposed, they only focused on imputation and classification settings, which have limited applications to real-world scenarios of drug discovery. Therefore, a chemical-induced gene expression ranking (CIGER) framework is proposed to target a more realistic but more challenging setting in which overall rankings in gene expression profiles induced by de novo chemicals are predicted. The experimental results show that CIGER significantly outperforms existing methods in both ranking and classification metrics. Furthermore, a drug screening pipeline based on CIGER is proposed to identify potential treatments of drug-resistant pancreatic cancer. Our predictions have been validated by experiments, thereby showing the effectiveness of CIGER for phenotypic compound screening of precision medicine.

• A new deep-learning method (CIGER) for chemical-induced gene expression ranking
• CIGER can predict gene expression for de novo chemicals from chemical structures
• We discovered drugs for the treatment of drug-resistant pancreatic cancer

In recent years, a phenotype-based drug discovery approach using chemical-induced gene expressions has been shown to be effective in drug discovery and precision medicine. However, it is not feasible to experimentally determine chemical-induced gene expressions for all available chemicals of interest, thereby hindering the application of gene expression-based compound screening on a large scale. Thus, it is crucial to design a computational approach that can generate gene expression information for any chemical. We proposed a new deep-learning framework named chemical-induced gene expression ranking (CIGER) to predict a landmark gene expression profile (i.e., gene ranking) induced by de novo chemicals based on their chemical structures. Leveraging CIGER, we predicted and experimentally validated that several existing drugs can increase the therapeutic response on drug-resistant pancreatic cancer. Our results demonstrated the effectiveness of CIGER for precision drug discovery in practice.

Shi Q, Miao T, Liu Y, Hu L, Yang H, Shen H, Piao M, Huang Z, Zhang Z. Fabrication and Decryption of a Microarray of Digital Dithiosuccinimide Oligomers. Macromol Rapid Commun 2022;43:e2200029. [PMID: 35322486 DOI: 10.1002/marc.202200029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2022] [Revised: 03/11/2022] [Indexed: 11/11/2022]

Mohammad Mirzaei N, Changizi N, Asadpoure A, Su S, Sofia D, Tatarova Z, Zervantonakis IK, Chang YH, Shahriyari L. Investigating key cell types and molecules dynamics in PyMT mice model of breast cancer through a mathematical model. PLoS Comput Biol 2022;18:e1009953. [PMID: 35294447 PMCID: PMC8959189 DOI: 10.1371/journal.pcbi.1009953] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/08/2021] [Revised: 03/28/2022] [Accepted: 02/22/2022] [Indexed: 02/07/2023] Open

Ni Z, Zheng X, Zheng X, Zou X. scLRTD: A Novel Low Rank Tensor Decomposition Method for Imputing Missing Values in Single-Cell Multi-Omics Sequencing Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022;19:1144-1153. [PMID: 32960767 DOI: 10.1109/tcbb.2020.3025804] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]

Baruzzo G, Patuzzi I, Di Camillo B. Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results. BMC Bioinformatics 2022;22:618. [PMID: 35130833 PMCID: PMC8822630 DOI: 10.1186/s12859-022-04587-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis of data produced by amplicon sequencing, however, brings very specific methodological issues that need to be properly addressed to obtain reliable biological conclusions. Among these, 16S count data tend to be very sparse, with many null values reflecting species that are present but went unobserved due to the multiplexing constraints. However, current data workflows do not include a step in which the information about unobserved species is recovered.

RESULTS

In this work, we evaluate for the first time the effects of introducing a new preprocessing step, zero-imputation, into the 16S data workflow to recover this lost information. Due to the lack of published zero-imputation methods specifically designed for 16S count data, we considered a set of zero-imputation strategies available for other frameworks, and benchmarked them using in silico 16S count data reflecting different experimental designs. Additionally, we assessed the effect of combining zero-imputation and normalization, i.e., the only preprocessing step in the current 16S workflow. Overall, we benchmarked 35 16S preprocessing pipelines, assessing their ability to handle data sparsity, identify species presence/absence, recover sample proportional abundance distributions, and improve typical downstream analyses such as computation of alpha and beta diversity indices and differential abundance analysis.

CONCLUSIONS

The results clearly show that 16S data analysis greatly benefits from a properly performed zero-imputation step, although the choice of the right zero-imputation method plays a pivotal role. In addition, we identify a set of best-performing pipelines that could serve as a valuable guide for data analysts.

Dubey A, Rasool A. Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour. Sci Rep 2021;11:24297. [PMID: 34934107 PMCID: PMC8692342 DOI: 10.1038/s41598-021-03438-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/20/2021] [Accepted: 11/22/2021] [Indexed: 02/03/2023] Open

Gwark S, Ahn HS, Yeom J, Yu J, Oh Y, Jeong JH, Ahn JH, Jung KH, Kim SB, Lee HJ, Gong G, Lee SB, Chung IY, Kim HJ, Ko BS, Lee JW, Son BH, Ahn SH, Kim K, Kim J. Plasma Proteome Signature to Predict the Outcome of Breast Cancer Patients Receiving Neoadjuvant Chemotherapy. Cancers (Basel) 2021;13:6267. [PMID: 34944885 PMCID: PMC8699627 DOI: 10.3390/cancers13246267] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/05/2021] [Revised: 12/07/2021] [Accepted: 12/10/2021] [Indexed: 12/31/2022] Open

Affiliation(s)

Sungchan Gwark Department of Surgery, Ewha Womans University Mokdong Hospital, Ewha Womans University College of Medicine, Seoul 07985, Korea
Hee-Sung Ahn Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea; Convergence Medicine Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea
Jeonghun Yeom Convergence Medicine Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea
Jiyoung Yu Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea
Yumi Oh Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea; Department of Biomedical Sciences, University of Ulsan College of Medicine, Seoul 05505, Korea
Jae Ho Jeong Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Jin-Hee Ahn Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Kyung Hae Jung Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Sung-Bae Kim Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Hee Jin Lee Department of Pathology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Gyungyub Gong Department of Pathology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Sae Byul Lee Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Il Yong Chung Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Hee Jeong Kim Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Beom Seok Ko Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Jong Won Lee Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Byung Ho Son Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Sei Hyun Ahn Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
Kyunggon Kim Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea; Convergence Medicine Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea; Department of Biomedical Sciences, University of Ulsan College of Medicine, Seoul 05505, Korea; Clinical Proteomics Core Laboratory, Convergence Medicine Research Center, Asan Medical Center, Seoul 05505, Korea; Bio-Medical Institute of Technology, Asan Medical Center, Seoul 05505, Korea
Jisun Kim Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea

Climer S. Connecting the dots: The boons and banes of network modeling. PATTERNS (NEW YORK, N.Y.) 2021;2:100374. [PMID: 34950902 PMCID: PMC8672149 DOI: 10.1016/j.patter.2021.100374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]

Zimmermann R, Lang S, Lerner M, Förster F, Nguyen D, Helms V, Schrul B. Quantitative Proteomics and Differential Protein Abundance Analysis after the Depletion of PEX3 from Human Cells Identifies Additional Aspects of Protein Targeting to the ER. Int J Mol Sci 2021;22:13028. [PMID: 34884833 PMCID: PMC8658024 DOI: 10.3390/ijms222313028] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Revised: 11/19/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022] Open

Abstract

Protein import into the endoplasmic reticulum (ER) is the first step in the biogenesis of around 10,000 different soluble and membrane proteins in humans. It involves the co- or post-translational targeting of precursor polypeptides to the ER, and their subsequent membrane insertion or translocation. So far, three pathways for the ER targeting of precursor polypeptides and four pathways for the ER targeting of mRNAs have been described. Typically, these pathways deliver their substrates to the Sec61 polypeptide-conducting channel in the ER membrane. Next, the precursor polypeptides are inserted into the ER membrane or translocated into the ER lumen, which may involve auxiliary translocation components, such as the TRAP and Sec62/Sec63 complexes, or auxiliary membrane protein insertases, such as EMC and the TMCO1 complex. Recently, the PEX19/PEX3-dependent pathway, which has a well-known function in targeting and inserting various peroxisomal membrane proteins into pre-existent peroxisomal membranes, was also found to act in the targeting and, putatively, insertion of monotopic hairpin proteins into the ER. These either remain in the ER as resident ER membrane proteins, or are pinched off from the ER as components of new lipid droplets. Therefore, the question arose as to whether this pathway may play a more general role in ER protein targeting, i.e., whether it represents a fourth pathway for the ER targeting of precursor polypeptides. Thus, we addressed the client spectrum of the PEX19/PEX3-dependent pathway in both PEX3-depleted HeLa cells and PEX3-deficient Zellweger patient fibroblasts by an established approach which involved the label-free quantitative mass spectrometry of the total proteome of depleted or deficient cells, as well as differential protein abundance analysis. 
The negatively affected proteins included twelve peroxisomal proteins and two hairpin proteins of the ER, thus confirming two previously identified classes of putative PEX19/PEX3 clients in human cells. Interestingly, fourteen collagen-related proteins with signal peptides or N-terminal transmembrane helices belonging to the secretory pathway were also negatively affected by PEX3 deficiency, which may suggest compromised collagen biogenesis as a hitherto-unknown contributor to organ failures in the respective Zellweger patients.

Wang J, Zou Q, Lin C. A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data. Brief Bioinform 2021;23:6361043. [PMID: 34472590 DOI: 10.1093/bib/bbab345] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2021] [Revised: 07/22/2021] [Accepted: 08/04/2021] [Indexed: 11/13/2022] Open

Wu Y, Qie R, Cheng M, Zeng Y, Huang S, Guo C, Zhou Q, Li Q, Tian G, Han M, Zhang Y, Wu X, Li Y, Zhao Y, Yang X, Feng Y, Liu D, Qin P, Hu D, Hu F, Xu L, Zhang M. Air pollution and DNA methylation in adults: A systematic review and meta-analysis of observational studies. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2021;284:117152. [PMID: 33895575 DOI: 10.1016/j.envpol.2021.117152] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Revised: 04/04/2021] [Accepted: 04/05/2021] [Indexed: 05/24/2023]

Abstract

This systematic review and meta-analysis aimed to investigate the association between air pollution and DNA methylation in adults from published observational studies. PubMed, Web of Science and Embase databases were systematically searched for available studies on the association between air pollution and DNA methylation published up to March 9, 2021. Three DNA methylation approaches were considered: global methylation, candidate-gene, and epigenome-wide association studies (EWAS). Meta-analysis was used to summarize the combined estimates for the association between air pollutants and global DNA methylation levels. Heterogeneity was assessed with the Cochran Q test and quantified with the I² statistic. In total, 38 articles were included in this study: 16 using global methylation, 18 using candidate genes, and 11 using EWAS, with 7 studies using more than one approach. Meta-analysis revealed an imprecise but inverse association between exposure to PM_2.5 and global DNA methylation (for each 10-μg/m³ PM_2.5, combined estimate: -0.39; 95% confidence interval: -0.97 to 0.19). The candidate-gene results were consistent for the ERCC3 and SOX2 genes, suggesting hypermethylation in ERCC3 associated with benzene and that in SOX2 associated with PM_2.5 exposure. EWAS identified 201 CpG sites and 148 differentially methylated regions that showed differential methylation associated with air pollution. Among the 307 genes investigated in 11 EWAS, a locus in the nucleoredoxin gene was found to be positively associated with PM_2.5 in two studies. The current meta-analysis indicates that PM_2.5 is imprecisely and inversely associated with DNA methylation. The candidate-gene results consistently suggest hypermethylation in ERCC3 associated with benzene exposure and that in SOX2 associated with PM_2.5 exposure. 
The Kyoto Encyclopedia of Genes and Genomes (KEGG) network analyses revealed that these genes were associated with African trypanosomiasis, Malaria, Antifolate resistance, Graft-versus-host disease, and so on. More evidence is needed to clarify the association between air pollution and DNA methylation.

Affiliation(s)

Yuying Wu Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Ranran Qie Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Min Cheng Department of Cardiology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Yunhong Zeng Center for Health Management, The Affiliated Shenzhen Hospital of University of Chinese Academy of Sciences, Shenzhen, Guangdong, People's Republic of China
Shengbing Huang Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Chunmei Guo Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Qionggui Zhou Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Quanman Li Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Gang Tian Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Minghui Han Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Yanyan Zhang Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Xiaoyan Wu Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Yang Li Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Yang Zhao Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Xingjin Yang Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Yifei Feng Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Dechen Liu Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Pei Qin Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Dongsheng Hu Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, People's Republic of China
Fulan Hu Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Lidan Xu Department of Nutrition, The Second Affiliated Hospital, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China
Ming Zhang Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China; Guangdong Provincial Key Laboratory of Regional Immunity and Diseases, Shenzhen University Health Science Center, Shenzhen, Guangdong, People's Republic of China.

Nguyen T, Nguyen DH, Nguyen H, Nguyen BT, Wade BA. EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.02.077] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Rahmatbakhsh M, Gagarinova A, Babu M. Bioinformatic Analysis of Temporal and Spatial Proteome Alternations During Infections. Front Genet 2021;12:667936. [PMID: 34276775 PMCID: PMC8283032 DOI: 10.3389/fgene.2021.667936] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Accepted: 06/08/2021] [Indexed: 12/13/2022] Open

Abstract

Microbial pathogens have evolved numerous mechanisms to hijack host systems, thus causing disease. This is mediated by alterations in the combined host-pathogen proteome in time and space. Mass spectrometry-based proteomics approaches have been developed and tailored to map disease progression. The result is complex multidimensional data that pose numerous analytic challenges for downstream interpretation. However, a systematic review of approaches for the downstream analysis of such data has been lacking in the field. In this review, we detail the steps of a typical temporal and spatial analysis, including data pre-processing steps (i.e., quality control, data normalization, the imputation of missing values, and dimensionality reduction), different statistical and machine learning approaches, validation, interpretation, and the extraction of biological information from mass spectrometry data. We also discuss current best practices for these steps based on a collection of independent studies to guide users in selecting the most suitable strategies for their dataset and analysis objectives. Moreover, we compiled a list of commonly used R software packages for each step of the analysis. These could be easily integrated into one's analysis pipeline. Furthermore, we guide readers through various analysis steps by applying these workflows to mock and host-pathogen interaction data from public datasets. The workflows presented in this review will serve as an introduction for data analysis novices, while also helping established users update their data analysis pipelines. We conclude the review by discussing future directions and developments in temporal and spatial proteomics and data analysis approaches. Data analysis code prepared for this review is available from https://github.com/BabuLab-UofR/TempSpac, where guidelines and sample datasets are also offered for testing purposes.

Faisal S, Tutz G. Imputation methods for high-dimensional mixed-type datasets by nearest neighbors. Comput Biol Med 2021;135:104577. [PMID: 34216892 DOI: 10.1016/j.compbiomed.2021.104577] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Revised: 06/10/2021] [Accepted: 06/11/2021] [Indexed: 11/18/2022]

Bhadra P, Schorr S, Lerner M, Nguyen D, Dudek J, Förster F, Helms V, Lang S, Zimmermann R. Quantitative Proteomics and Differential Protein Abundance Analysis after Depletion of Putative mRNA Receptors in the ER Membrane of Human Cells Identifies Novel Aspects of mRNA Targeting to the ER. Molecules 2021;26:3591. [PMID: 34208277 PMCID: PMC8230838 DOI: 10.3390/molecules26123591] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/28/2022] Open

Dabke K, Kreimer S, Jones MR, Parker SJ. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets. J Proteome Res 2021;20:3214-3229. [PMID: 33939434 DOI: 10.1021/acs.jproteome.1c00070] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]

Abstract

Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level (the fragment level) improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down to the larger proteomic data set's most accurate methods. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.

Zhu X, Wang J, Sun B, Ren C, Yang T, Ding J. An efficient ensemble method for missing value imputation in microarray gene expression data. BMC Bioinformatics 2021;22:188. [PMID: 33849444 PMCID: PMC8045198 DOI: 10.1186/s12859-021-04109-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Accepted: 03/29/2021] [Indexed: 11/10/2022] Open

Pham TH, Qiu Y, Zeng J, Xie L, Zhang P. A deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing. NAT MACH INTELL 2021;3:247-257. [PMID: 33796820 PMCID: PMC8009091 DOI: 10.1038/s42256-020-00285-9] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2020] [Accepted: 12/15/2020] [Indexed: 12/15/2022]

A comparative study of evaluating missing value imputation methods in label-free proteomics. Sci Rep 2021;11:1760. [PMID: 33469060 PMCID: PMC7815892 DOI: 10.1038/s41598-021-81279-4] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2020] [Accepted: 12/31/2020] [Indexed: 12/29/2022] Open

Mancuso CA, Canfield JL, Singla D, Krishnan A. A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes. Nucleic Acids Res 2020;48:e125. [PMID: 33074331 PMCID: PMC7708069 DOI: 10.1093/nar/gkaa881] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Revised: 08/24/2020] [Accepted: 09/28/2020] [Indexed: 12/15/2022] Open

An Exploratory Pilot Study with Plasma Protein Signatures Associated with Response of Patients with Depression to Antidepressant Treatment for 10 Weeks. Biomedicines 2020;8:455. [PMID: 33126421 PMCID: PMC7692261 DOI: 10.3390/biomedicines8110455] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2020] [Revised: 10/26/2020] [Accepted: 10/26/2020] [Indexed: 12/11/2022] Open

A deep learning-based, unsupervised method to impute missing values in electronic health records for improved patient management. J Biomed Inform 2020;111:103576. [PMID: 33010424 DOI: 10.1016/j.jbi.2020.103576] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Revised: 09/13/2020] [Accepted: 09/19/2020] [Indexed: 01/23/2023]

Wang S, Li W, Hu L, Cheng J, Yang H, Liu Y. NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses. Nucleic Acids Res 2020;48:e83. [PMID: 32526036 PMCID: PMC7641313 DOI: 10.1093/nar/gkaa498] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2020] [Revised: 04/20/2020] [Accepted: 06/08/2020] [Indexed: 02/05/2023] Open

Ma Q, Lee WC, Fu TY, Gu Y, Yu G. MIDIA: exploring denoising autoencoders for missing data imputation. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00706-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

Pham TH, Qiu Y, Zeng J, Xie L, Zhang P. A deep learning framework for high-throughput mechanism-driven phenotype compound screening. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020. [PMID: 32743586 PMCID: PMC7386506 DOI: 10.1101/2020.07.19.211235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Abstract

Target-based high-throughput compound screening dominates the conventional one-drug-one-gene drug discovery process. However, the readout from the chemical modulation of a single protein is poorly correlated with the phenotypic response of the organism, leading to a high failure rate in drug development. Chemical-induced gene expression profiles provide an attractive solution to phenotype-based screening. However, the use of such data is currently limited by their sparseness, unreliability, and relatively low throughput. Several methods have been proposed to impute missing values for gene expression datasets. However, few existing methods can perform de novo chemical compound screening. In this study, we propose a mechanism-driven neural network-based method named DeepCE (Deep Chemical Expression), which utilizes a graph convolutional neural network to learn chemical representation and a multi-head attention mechanism to model chemical substructure-gene and gene-gene feature associations. In addition, we propose a novel data augmentation method which extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE achieves superior performance not only in the de novo chemical setting but also in the traditional imputation setting compared to state-of-the-art baselines for the prediction of chemical-induced gene expression. We further verify the effectiveness of gene expression profiles generated from DeepCE by comparing them with gene expression profiles in the L1000 dataset for downstream classification tasks including drug-target and disease predictions. To demonstrate the value of DeepCE, we apply it to patient-specific drug repurposing of COVID-19 for the first time, and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling by utilizing noisy omics data as well as screening novel chemicals for the modulation of systemic response to disease.

Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2020. [DOI: 10.3390/ijgi9040227] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Schorr S, Nguyen D, Haßdenteufel S, Nagaraj N, Cavalié A, Greiner M, Weissgerber P, Loi M, Paton AW, Paton JC, Molinari M, Förster F, Dudek J, Lang S, Helms V, Zimmermann R. Identification of signal peptide features for substrate specificity in human Sec62/Sec63-dependent ER protein import. FEBS J 2020;287:4612-4640. [PMID: 32133789 DOI: 10.1111/febs.15274] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2019] [Revised: 01/22/2020] [Accepted: 03/02/2020] [Indexed: 02/06/2023]

Abstract

In mammalian cells, one-third of all polypeptides are integrated into the membrane or translocated into the lumen of the endoplasmic reticulum (ER) via the Sec61 channel. While the Sec61 complex facilitates ER import of most precursor polypeptides, the Sec61-associated Sec62/Sec63 complex supports ER import in a substrate-specific manner. So far, mainly posttranslationally imported precursors and the two cotranslationally imported precursors of ERj3 and prion protein were found to depend on the Sec62/Sec63 complex in vitro. Therefore, we determined the rules for engagement of Sec62/Sec63 in ER import in intact human cells using a recently established unbiased proteomics approach. In addition to confirming ERj3, we identified 22 novel Sec62/Sec63 substrates under these in vivo-like conditions. As a common feature, those previously unknown substrates share signal peptides (SP) whose hydrophobic region is comparatively longer but less hydrophobic and whose carboxy-terminal region (C-region) has lower polarity. Further analyses with four substrates, and ERj3 in particular, revealed the combination of a slowly gating SP and a downstream translocation-disruptive positively charged cluster of amino acid residues as decisive for the Sec62/Sec63 requirement. In the case of ERj3, these features were found to be responsible for an additional immunoglobulin heavy-chain binding protein (BiP) requirement and to correlate with sensitivity toward the Sec61-channel inhibitor CAM741. Thus, the human Sec62/Sec63 complex may support Sec61-channel opening for precursor polypeptides with slowly gating SPs by direct interaction with the cytosolic amino-terminal peptide of Sec61α or via recruitment of BiP and its interaction with the ER-lumenal loop 7 of Sec61α. These novel insights into the mechanism of human ER protein import contribute to our understanding of the etiology of SEC63-linked polycystic liver disease. 
DATABASES: The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (http://www.ebi.ac.uk/pride/archive/projects/Identifiers) with the dataset identifiers: PXD008178, PXD011993, and PXD012078. Supplementary information was deposited at Mendeley Data (https://data.mendeley.com/datasets/6s5hn73jcv/2).

Zhang L, Zhang S. Comparison of Computational Methods for Imputing Single-Cell RNA-Sequencing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:376-389. [PMID: 29994128 DOI: 10.1109/tcbb.2018.2848633] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]