1
Sullivan GJ, Barquist L, Cain AK. A method to correct for local alterations in DNA copy number that bias functional genomics assays applied to antibiotic-treated bacteria. mSystems 2024; 9:e0066523. [PMID: 38470252 PMCID: PMC11019837 DOI: 10.1128/msystems.00665-23] [Received: 07/07/2023] [Accepted: 02/13/2024] [Indexed: 03/13/2024]
Abstract
Functional genomics techniques, such as transposon insertion sequencing and RNA-sequencing, are key to studying relative differences in bacterial mutant fitness or gene expression under selective conditions. However, certain stress conditions, mutations, or antibiotics can directly interfere with DNA synthesis, resulting in systematic changes in local DNA copy numbers along the chromosome. This can lead to artifacts in sequencing-based functional genomics data when comparing antibiotic treatment to an unstressed control. Further, relative differences in gene-wise read counts may result from alterations in chromosomal replication dynamics, rather than selection or direct gene regulation. We term this artifact "chromosomal location bias" and implement a principled statistical approach to correct it by calculating local normalization factors along the chromosome. These normalization factors are then directly incorporated into statistical analyses using standard RNA-sequencing analysis methods without modifying the read counts themselves, preserving important information about the mean-variance relationship in the data. We illustrate the utility of this approach by generating and analyzing a ciprofloxacin-treated transposon insertion sequencing data set in Escherichia coli as a case study. We show that ciprofloxacin treatment generates chromosomal location bias in the resulting data, and we further demonstrate that failing to correct for this bias leads to false predictions of mutant drug sensitivity as measured by minimum inhibitory concentrations. We have developed an R package and user-friendly graphical Shiny application, ChromoCorrect, that detects and corrects for chromosomal bias in read count data, enabling the application of functional genomics technologies to the study of antibiotic stress.
IMPORTANCE
Altered gene dosage due to changes in DNA replication has been observed under a variety of stresses with a variety of experimental techniques.
However, the implications of changes in gene dosage for sequencing-based functional genomics assays are rarely considered. We present a statistically principled approach to correcting for the effect of changes in gene dosage, enabling testing for differences in the fitness effects or regulation of individual genes in the presence of confounding differences in DNA copy number. We show that failing to correct for these effects can lead to incorrect predictions of resistance phenotype when applying functional genomics assays to investigate antibiotic stress, and we provide a user-friendly application to detect and correct for changes in DNA copy number.
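The abstract describes local normalization factors computed along the chromosome. A minimal Python sketch of that idea, assuming a running median of library-size-scaled log2 fold changes over genes ordered by chromosomal position, is shown below. The function name, window size and pseudocount are hypothetical choices for illustration, not ChromoCorrect's actual implementation.

```python
import math
from statistics import median


def local_normalization_factors(control, treated, window=5, pseudocount=0.5):
    """Hypothetical sketch of per-gene local scaling factors along a circular chromosome.

    control, treated: read counts per gene, ordered by chromosomal position.
    window: genes on each side used for the running median (an assumed tuning value).
    """
    lib_c, lib_t = sum(control), sum(treated)
    # Library-size-scaled log2 fold change per gene; the pseudocount avoids log(0)
    lfc = [
        math.log2(((t + pseudocount) / lib_t) / ((c + pseudocount) / lib_c))
        for c, t in zip(control, treated)
    ]
    n = len(lfc)
    factors = []
    for i in range(n):
        # Wrap around the ends because bacterial chromosomes are circular
        neighbours = [lfc[(i + k) % n] for k in range(-window, window + 1)]
        # The running median tracks the smooth copy-number trend, not single-gene effects
        factors.append(2 ** median(neighbours))
    return factors
```

Consistent with the abstract's point about leaving counts unmodified, such factors would typically be handed to a count model (for example, as log-scale per-gene offsets in a negative binomial GLM) rather than used to divide the read counts; how ChromoCorrect itself does this is not shown here.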
Affiliation(s)
- Geraldine J. Sullivan
- ARC Centre of Excellence in Synthetic Biology, School of Natural Sciences, Macquarie University, Sydney, Australia
- Lars Barquist
- Faculty of Medicine, University of Würzburg, Würzburg, Germany
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Center for Infection Research (HZI), Würzburg, Germany
- Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Amy K. Cain
- ARC Centre of Excellence in Synthetic Biology, School of Natural Sciences, Macquarie University, Sydney, Australia
2
Liu J, Kreimer A, Li WV. Differential variability analysis of single-cell gene expression data. Brief Bioinform 2023; 24:bbad294. [PMID: 37598422 PMCID: PMC10516347 DOI: 10.1093/bib/bbad294] [Received: 04/10/2023] [Revised: 07/18/2023] [Accepted: 07/29/2023] [Indexed: 08/22/2023]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technologies has enabled gene expression profiling at single-cell resolution, making it possible to quantify and compare transcriptional variability among individual cells. Although alterations in transcriptional variability have been observed in various biological states, statistical methods for quantifying and testing differential variability between groups of cells are still lacking. To identify best practices in differential variability analysis of single-cell gene expression data, we propose and compare 12 statistical pipelines using different combinations of methods for normalization, feature selection, dimensionality reduction and variability calculation. Using high-quality synthetic scRNA-seq datasets, we benchmarked the proposed pipelines and found that the most powerful and accurate pipeline performs simple library size normalization, retains all genes in the analysis and uses denSNE-based distances to cluster medoids as the variability measure. By applying this pipeline to scRNA-seq datasets of COVID-19 and autism patients, we identified changes in cellular variability between patients with different severity status or between patients and healthy controls.
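The best-performing pipeline combines library size normalization with a distance-to-medoid variability measure. The sketch below illustrates only that measure, under stated assumptions: counts are scaled to a common total of 10,000 and log1p-transformed, and plain Euclidean distance in that space stands in for the denSNE embedding used by the authors. All function names are hypothetical.

```python
import math


def normalize_library_size(cells, scale=1e4):
    """Scale each cell's counts to a common total (assumed 1e4), then log1p-transform."""
    normalized = []
    for counts in cells:
        total = sum(counts)
        normalized.append([math.log1p(c / total * scale) for c in counts])
    return normalized


def medoid(points):
    """The medoid is the observed point with the smallest summed distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def group_variability(cells):
    """Mean Euclidean distance of each cell to the group medoid (denSNE step omitted)."""
    points = normalize_library_size(cells)
    center = medoid(points)
    return sum(math.dist(p, center) for p in points) / len(points)
```

Comparing `group_variability` between two groups of cells (for example, patients versus controls) gives a simple per-group variability contrast; a formal test would additionally need a null distribution, for instance from permuting group labels.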
Affiliation(s)
- Jiayi Liu
- Graduate Programs in Molecular Biosciences, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, 08854, NJ, USA
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, 08854, NJ, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, 08854, NJ, USA
- Anat Kreimer
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, 08854, NJ, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, 08854, NJ, USA
- Wei Vivian Li
- Department of Statistics, University of California, Riverside, 900 University Ave, Riverside, 92521, CA, USA
- Previous affiliation where part of the work was completed: Department of Biostatistics and Epidemiology, Rutgers, The State University of New Jersey, 683 Hoes Lane West, Piscataway, 08854, NJ, USA
3
Zhang Y, Fan S, Wohlgemuth G, Fiehn O. Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography-Mass Spectrometry. Metabolites 2023; 13:944. [PMID: 37623887 PMCID: PMC10456436 DOI: 10.3390/metabo13080944] [Received: 07/13/2023] [Revised: 07/31/2023] [Accepted: 08/08/2023] [Indexed: 08/26/2023]
Abstract
Large-scale metabolomics assays are widely used in epidemiology for biomarker discovery and risk assessment. However, systematic errors introduced by instrumental signal drift pose a major challenge in large-scale assays, especially for derivatization-based gas chromatography-mass spectrometry (GC-MS). Here, we compare the results of different normalization methods for a type 2 diabetes cohort study with more than 4000 human plasma samples, in addition to 413 pooled quality control (QC) samples, 413 commercial pooled plasma samples, and a set of 25 stable isotope-labeled internal standards used for every sample. Data acquisition was conducted across 1.2 years, including seven column changes. In total, 413 pooled QC (training) and 413 BioIVT samples (validation) were used for normalization comparisons. Surprisingly, neither internal standard nor sum-based normalization yielded a median precision below 30% across all 563 metabolite annotations. While the machine-learning-based SERRF algorithm gave 19% median precision based on the pooled quality control samples, external cross-validation with BioIVT plasma pools yielded a median 34% relative standard deviation (RSD). We developed a new method: systematic error reduction by denoising autoencoder (SERDA). SERDA lowered the median RSD of the training QC samples to 16%, yielding an overall error of 19% RSD when applied to the independent BioIVT validation QC samples. This is the largest GC-MS metabolomics study ever reported, demonstrating that technical errors can be normalized and handled effectively for this assay. SERDA was further validated on two additional large-scale GC-MS-based human plasma metabolomics studies, confirming the superior performance of SERDA over SERRF or sum normalizations.
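The precision figures above are median RSDs of each metabolite across repeated QC injections. The sketch below shows how that metric, and the sum normalization used as a baseline, can be computed; it does not implement SERRF or SERDA, and the function names and the choice of the median sample total as the rescaling target are assumptions for illustration.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation (%): sample standard deviation divided by the mean."""
    return 100 * stdev(values) / mean(values)


def median_rsd(qc_matrix):
    """Median RSD across metabolites; rows are QC injections, columns are metabolites."""
    return median([rsd_percent(column) for column in zip(*qc_matrix)])


def sum_normalize(matrix):
    """Sum normalization: rescale each sample so its total intensity equals the median total."""
    totals = [sum(row) for row in matrix]
    target = median(totals)
    return [[x * target / total for x in row] for row, total in zip(matrix, totals)]
```

A normalization method can then be judged by computing `median_rsd` on the QC rows after normalization, ideally on QC samples that were not used to fit it, which is the role the BioIVT pools play in the study.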
Affiliation(s)
- Oliver Fiehn
- West Coast Metabolomics Center, UC Davis, 451 Health Sciences Drive, Davis, CA 95616, USA; (Y.Z.); (S.F.); (G.W.)
4
Leacy E, Batten I, Sanelli L, McElheron M, Brady G, Little MA, Khouri H. Optimal LC-MS metabolomic profiling reveals emergent changes to monocyte metabolism in response to lipopolysaccharide. Front Immunol 2023; 14:1116760. [PMID: 37033938 PMCID: PMC10077522 DOI: 10.3389/fimmu.2023.1116760] [Received: 12/05/2022] [Accepted: 03/03/2023] [Indexed: 04/11/2023]
Abstract
Introduction: Immunometabolism examines the links between immune cell function and metabolism. Dysregulation of immune cell metabolism is now an established feature of innate immune cell activation. Advances in liquid chromatography mass spectrometry (LC-MS) technologies have yielded unique insights into cellular metabolism. Here we studied and compared different sample preparation techniques and data normalisation methods described in the literature when applied to metabolomic profiling of human monocytes. Methods: Primary monocytes stimulated with lipopolysaccharide (LPS) for four hours were used as a study model. Monocytes (n=24) were freshly isolated from whole blood and stimulated with LPS for four hours. A methanol-based extraction protocol was developed, and metabolomic profiling was carried out using a Hydrophilic Interaction Liquid Chromatography (HILIC) LC-MS method. Data analysis pipelines used both targeted and untargeted approaches, and over 40 different data normalisation techniques to account for technical and biological variation were examined. Cytokine levels in supernatants were measured by ELISA. Results: This method provided broad coverage of the monocyte metabolome. The most efficient and consistent normalisation method was measurement of residual protein in the metabolite fraction, which was further validated and optimised using a commercial kit. Alterations to the monocyte metabolome in response to LPS can be detected as early as four hours post stimulation. Broad and profound changes in monocyte metabolism were seen, in line with increased cytokine production. Elevated levels of amino acids and Krebs cycle metabolites were noted, and decreases in aspartate and β-alanine are reported for the first time. In the untargeted analysis, 154 metabolite entities were significantly altered compared to unstimulated cells.
Pathway analysis revealed that the most prominent changes occurred in (phospho-)inositol metabolism, glycolysis, and the pentose phosphate pathway. Discussion: These data report the emergent changes to monocyte metabolism in response to LPS, in line with reports from later time points. A number of these metabolites are reported to alter inflammatory gene expression, which may facilitate the increases in cytokine production. Further validation is needed to confirm the link between metabolic activation and upregulation of inflammatory responses.
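Residual-protein normalisation amounts to dividing each sample's metabolite signals by the protein measured in that sample. A minimal sketch, assuming protein is reported in µg per sample and that peak areas are arranged one list per sample (both assumptions, with hypothetical function names), followed by a simple log2 fold change of stimulated versus unstimulated cells:

```python
import math
from statistics import mean


def protein_normalize(peak_areas, protein_ug):
    """Divide each sample's metabolite peak areas by that sample's protein content (µg)."""
    if len(peak_areas) != len(protein_ug):
        raise ValueError("need exactly one protein measurement per sample")
    return [
        [area / protein for area in sample]
        for sample, protein in zip(peak_areas, protein_ug)
    ]


def log2_fold_change(stimulated, unstimulated):
    """log2 ratio of mean normalised abundance, e.g. LPS-stimulated vs unstimulated."""
    return math.log2(mean(stimulated) / mean(unstimulated))
```

In this toy form the normalisation removes differences in cell input between wells, so two samples that differ only in how much material was extracted end up with identical normalised profiles.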
Affiliation(s)
- Emma Leacy
- Trinity Translational Medicine Institute, Faculty of Health Sciences, Trinity College Dublin, Dublin, Ireland
- *Correspondence: Emma Leacy; Mark A. Little
- Isabella Batten
- Trinity Translational Medicine Institute, Faculty of Health Sciences, Trinity College Dublin, Dublin, Ireland
- Laetitia Sanelli
- Faculty of Health Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
- Matthew McElheron
- Trinity Translational Medicine Institute, Faculty of Health Sciences, Trinity College Dublin, Dublin, Ireland
- Gareth Brady
- Trinity Translational Medicine Institute, Faculty of Health Sciences, Trinity College Dublin, Dublin, Ireland
- Mark A. Little
- Trinity Translational Medicine Institute, Faculty of Health Sciences, Trinity College Dublin, Dublin, Ireland
- Trinity Health Kidney Centre, Tallaght University Hospital, Dublin, Ireland
- Hania Khouri
- Agilent Technologies, Stockport, England, United Kingdom
5
Shi Z, Li H, Zhang W, Chen Y, Zeng C, Kang X, Xu X, Xia Z, Qing B, Yuan Y, Song G, Caldana C, Hu J, Willmitzer L, Li Y. A Comprehensive Mass Spectrometry-Based Workflow for Clinical Metabolomics Cohort Studies. Metabolites 2022; 12. [PMID: 36557207 DOI: 10.3390/metabo12121168] [Received: 09/20/2022] [Revised: 11/14/2022] [Accepted: 11/16/2022] [Indexed: 11/27/2022]
Abstract
As a comprehensive analysis of all metabolites in a biological system, metabolomics is being widely applied in various clinical and health areas for disease prediction, diagnosis, and prognosis. However, challenges remain in dealing with metabolomic complexity, massive data volumes, metabolite identification, intra- and inter-individual variation, and reproducibility, which largely limit its widespread implementation. This study provides a comprehensive workflow for clinical metabolomics, covering sample collection and preparation, mass spectrometry (MS) data acquisition, and data processing and analysis. Sample collection from multiple clinical sites was strictly carried out according to standardized operating procedures (SOPs). During data acquisition, three types of quality control (QC) samples were used for each MS platform (GC-MS, LC-MS polar, and LC-MS lipid) to assess MS performance, facilitate metabolite identification, and eliminate contamination. Compound annotation and identification were performed with commercial software and the in-house-developed PAppLine™ and UlibMS library. Batch effects were removed using a deep learning model (NormAE). Potential biomarkers were identified with tree-based modeling algorithms including random forest, AdaBoost, and XGBoost. Modeling performance was evaluated using the F1 score based on 10 repeated trials for each algorithm. Finally, a sub-cohort case study validated the reliability of the entire workflow.
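Model performance here is summarized by the F1 score averaged over repeated trials. A self-contained sketch of that metric, assuming binary labels with `1` as the positive (case) class and each trial supplied as a `(y_true, y_pred)` pair; the function names are hypothetical and the tree-based models themselves are not reproduced:

```python
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def repeated_trial_f1(trials):
    """Average F1 over repeated trials, each given as a (y_true, y_pred) pair."""
    return mean(f1_score(t, p) for t, p in trials)
```

Averaging over repeated random splits reduces the dependence of the reported score on any single train/test partition.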
6
Zheng H, Zhao H, Zhang X, Liang Z, He Q. Systematic Identification and Validation of Suitable Reference Genes for the Normalization of Gene Expression in Prunella vulgaris under Different Organs and Spike Development Stages. Genes (Basel) 2022; 13:1947. [PMID: 36360184 PMCID: PMC9689956 DOI: 10.3390/genes13111947] [Received: 09/15/2022] [Revised: 10/19/2022] [Accepted: 10/24/2022] [Indexed: 08/01/2023]
Abstract
Quantitative real-time PCR (qRT-PCR) is an efficient and sensitive method for determining gene expression levels, but the accuracy of its results depends substantially on the stability of the reference gene (RG). Choosing an appropriate reference gene is therefore a critical step in normalizing qRT-PCR data. Prunella vulgaris L. is a traditional Chinese medicinal herb widely used in China. Its main medicinal part is the fruiting spike, termed Spica Prunellae. However, few studies have so far examined the mechanism of Spica Prunellae development, and no reliable RGs have been reported in P. vulgaris. In this study, the expression levels of 14 candidate RGs were analyzed in various organs and at different stages of Spica Prunellae development. Four statistical algorithms (Delta Ct, BestKeeper, NormFinder, and geNorm) were used to assess RG stability, and an integrated stability rating was generated with the RefFinder website. The final ranking revealed that eIF-2 was the most stable RG, whereas VAB2 was the least suitable. Furthermore, eIF-2 + Histon3.3 was identified as the best RG combination across different periods and in the total sample set. Finally, the expression of the PvTAT and Pv4CL2 genes, which are related to the regulation of rosmarinic acid synthesis, was measured in different organs to verify the stable and unstable RGs. This work is the first to identify and verify stable RGs in P. vulgaris, providing strong support for reliable qPCR analysis and laying the foundation for in-depth research on the developmental mechanism of Spica Prunellae.
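Of the four algorithms listed, geNorm ranks genes by a stability value M: the mean, over all other candidates, of the pairwise variation V, defined as the standard deviation across samples of the log2 expression ratio between two genes. The sketch below is a simplified re-implementation under an added assumption of roughly 100% amplification efficiency, so a log2 ratio reduces to a Ct difference; it is not the geNorm software or RefFinder's integrated ranking, and the function name is hypothetical.

```python
from statistics import mean, stdev


def genorm_m_values(ct):
    """Simplified geNorm stability measure M for each candidate reference gene.

    ct: dict mapping gene name -> list of Ct values, one per sample (same order).
    Assumes ~100% amplification efficiency, so a log2 expression ratio equals a Ct difference.
    Lower M means more stable expression.
    """
    m_values = {}
    for gene in ct:
        pairwise_variation = []
        for other in ct:
            if other == gene:
                continue
            # Pairwise variation V: SD of the log2 ratio (here, the Ct difference) across samples
            diffs = [a - b for a, b in zip(ct[gene], ct[other])]
            pairwise_variation.append(stdev(diffs))
        m_values[gene] = mean(pairwise_variation)
    return m_values
```

Two genes whose Ct values rise and fall together have V = 0 and therefore pull each other's M down, which is why geNorm tends to recommend pairs, as with the eIF-2 + Histon3.3 combination reported above.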
Affiliation(s)
- Hui Zheng
- Key Laboratory of Plant Secondary Metabolism and Regulation of Zhejiang Province, College of Life Science and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Hongguang Zhao
- Tasly Botanical Pharmaceutical Co., Ltd., Shangluo 726000, China
- Xuemin Zhang
- Tasly R&D Institute, Tasly Holding Group Co., Ltd., Tianjin 300410, China
- Zongsuo Liang
- Shaoxing Academy of Biomedicine, Zhejiang Sci-Tech University, Shaoxing 312000, China
- Qiuling He
- Key Laboratory of Plant Secondary Metabolism and Regulation of Zhejiang Province, College of Life Science and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China
7
Rodriguez J, Gomez-Cano L, Grotewold E, de Leon N. Normalizing and Correcting Variable and Complex LC-MS Metabolomic Data with the R Package pseudoDrift. Metabolites 2022; 12:435. [PMID: 35629939 PMCID: PMC9144304 DOI: 10.3390/metabo12050435] [Received: 03/28/2022] [Revised: 05/09/2022] [Accepted: 05/10/2022] [Indexed: 01/27/2023]
Abstract
In biological research, liquid chromatography-mass spectrometry (LC-MS) has become the preferred technique for generating high-quality metabolomic data. However, even with advanced instrumentation and established data acquisition protocols, technical errors are still routinely encountered and can pose a significant challenge to uncovering biologically relevant information. In large-scale studies, technical errors most commonly manifest as signal drift and batch effects. We developed pseudoDrift, an R package with capabilities for data simulation and outlier detection, together with a new training and testing approach to capture and optionally correct for technical errors in LC-MS metabolomic data. Using data simulation, we demonstrate that our approach performs as well as existing methods and offers increased flexibility to the researcher. As part of our study, we generated a targeted LC-MS dataset that profiled 33 phenolic compounds from seedling stem tissue in 602 genetically diverse non-transgenic maize inbred lines. This dataset provides a unique opportunity to investigate the dynamics of specialized metabolism in plants.
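To make the idea of QC-based drift correction concrete, the sketch below fits a straight line to one metabolite's QC intensities against injection order and divides every injection by the fitted trend, rescaled to the QC median. This is not pseudoDrift's algorithm: the linear trend, the per-metabolite treatment and the function names are simplifying assumptions (real drift is often non-linear and is usually modelled per batch).

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def qc_linear_drift_correct(injection_order, intensity, is_qc):
    """Correct one metabolite for linear signal drift estimated from QC injections."""
    qc_x = [o for o, q in zip(injection_order, is_qc) if q]
    qc_y = [v for v, q in zip(intensity, is_qc) if q]
    slope, intercept = fit_line(qc_x, qc_y)
    target = median(qc_y)
    # Divide by the fitted drift at each injection, then rescale to the QC median
    return [
        v * target / (slope * o + intercept)
        for o, v in zip(injection_order, intensity)
    ]
```

Because the trend is learned only from QC injections, holding some QC samples out of the fit and checking their corrected RSD is one simple way to train and test such a correction.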
Affiliation(s)
- Jonas Rodriguez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
- Lina Gomez-Cano
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA; (L.G.-C.); (E.G.)
- Erich Grotewold
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA; (L.G.-C.); (E.G.)
- Natalia de Leon
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
8
Hirsch SM, Chapman CJ, Frost DM, Beach TAC. Comparison of 5 Normalization Methods for Knee Joint Moments in the Single-Leg Squat. J Appl Biomech 2022;:1-10. [PMID: 35042188 DOI: 10.1123/jab.2021-0143] [Received: 04/27/2021] [Revised: 10/13/2021] [Accepted: 12/01/2021] [Indexed: 11/18/2022]
Abstract
Ratio scaling is the most common magnitude normalization approach for net joint moment (NJM) data. Generally, researchers compute a ratio between NJM and (some combination of) physical body characteristics (eg, mass, height, limb length, etc). However, 3 assumptions must be verified when normalizing NJM data this way. First, the regression line between NJM and the characteristic(s) used passes through the origin. Second, normalizing NJM eliminates its correlation with the characteristic(s). Third, the statistical interpretations following normalization are consistent with adjusted linear models. The purpose of this study was to assess these assumptions using data collected from 16 males and 16 females who performed a single-leg squat. Standard inverse dynamics analyses were conducted, and ratios were computed between the mediolateral and anteroposterior components of the knee NJM and participant mass, height, leg length, mass × height, and mass × leg length. Normalizing NJM-mediolateral by mass × height and mass × leg length satisfied all 3 assumptions. Normalizing NJM-anteroposterior by height and leg length satisfied all 3 assumptions. Therefore, if normalization of the knee NJM is deemed necessary to address a given research question, it cannot be assumed that using (any combination of) participant mass, height, or leg length as the denominator is appropriate, nor that the appropriate denominator is consistent across joint axes.
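The first two assumptions can be checked numerically: the intercept of an ordinary least-squares regression of NJM on the body characteristic should be close to zero, and the correlation between the ratio and the characteristic should vanish. A minimal sketch with hypothetical function names; the third assumption (agreement with adjusted linear models) and any significance testing are not covered.

```python
import math
from statistics import mean


def ols(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ratio_scaling_checks(njm, characteristic):
    """Diagnostics for assumptions 1 and 2 of NJM / characteristic ratio scaling."""
    _, intercept = ols(characteristic, njm)
    ratios = [m / c for m, c in zip(njm, characteristic)]
    return {
        "intercept": intercept,  # assumption 1: should be close to 0
        "r_before": pearson(characteristic, njm),
        "r_after": pearson(characteristic, ratios),  # assumption 2: should be close to 0
    }
```

When the regression has a non-zero intercept, the ratio over-corrects: in the toy case NJM = 20 + 2 × mass, heavier participants end up with smaller normalized moments, so a negative correlation remains after normalization.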
9
Kim YJ, Kim KG. Detection and Weak Segmentation of Masses in Gray-Scale Breast Mammogram Images Using Deep Learning. Yonsei Med J 2022; 63:S63-S73. [PMID: 35040607 PMCID: PMC8790585 DOI: 10.3349/ymj.2022.63.s63] [Received: 09/10/2021] [Revised: 11/10/2021] [Accepted: 11/11/2021] [Indexed: 11/27/2022]
Abstract
PURPOSE In this paper, we propose a deep-learning methodology with which to enhance the mass differentiation performance of convolutional neural network (CNN)-based architecture. MATERIALS AND METHODS We differentiated breast mass lesions from gray-scale X-ray mammography images based on regions of interest (ROIs). Our dataset comprised breast mammogram images for 150 cases of malignant masses from which we extracted the mass ROI, and we built a CNN-based deep learning model trained on this dataset to identify ROI mass lesions. The test dataset was created by shifting some of the training data images. Thus, although the two datasets were different, they retained a deep structural similarity. We then applied our trained deep-learning model to detect masses on 8-bit mammogram images containing malignant masses. The input images were preprocessed by applying an intensity scaling parameter before being used to train the CNN model for mass differentiation. RESULTS The highest area under the receiver operating characteristic curve was 0.897 (Î 20). CONCLUSION Our results indicated that the proposed patch-wise detection method can be utilized as a mass detection and segmentation tool.
Affiliation(s)
- Young Jae Kim
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon, Korea.
- Kwang Gi Kim
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon, Korea.
10
Kubinski R, Djamen-Kepaou JY, Zhanabaev T, Hernandez-Garcia A, Bauer S, Hildebrand F, Korcsmaros T, Karam S, Jantchou P, Kafi K, Martin RD. Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease. Front Genet 2022; 13:784397. [PMID: 35251123 PMCID: PMC8895431 DOI: 10.3389/fgene.2022.784397] [Received: 09/27/2021] [Accepted: 01/13/2022] [Indexed: 12/14/2022]
Abstract
Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. In order to reduce time until diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome's composition are currently being explored. To date, these models have had limited clinical application due to decreased performance when applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data which may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, in which all samples from one study were left out of the training set and used for testing. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics.
These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
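The three processing ideas in this benchmark can be sketched in a few lines: a centered log-ratio (CLR) transform as one example of a compositional transformation, naive zero-centering of each feature within each dataset as batch correction, and leave-one-dataset-out splits. The pseudocount of 1, the use of CLR specifically, and all function names are assumptions for illustration rather than the authors' exact pipeline.

```python
import math
from statistics import mean


def clr(abundances, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxon counts (pseudocount avoids log(0))."""
    logs = [math.log(a + pseudocount) for a in abundances]
    center = mean(logs)
    return [v - center for v in logs]


def zero_center_by_dataset(features, dataset_ids):
    """Naive batch correction: subtract each dataset's per-feature mean."""
    centered = [list(row) for row in features]
    for dataset in set(dataset_ids):
        rows = [i for i, d in enumerate(dataset_ids) if d == dataset]
        for j in range(len(features[0])):
            mu = mean(features[i][j] for i in rows)
            for i in rows:
                centered[i][j] = features[i][j] - mu
    return centered


def lodo_splits(dataset_ids):
    """Yield (held-out dataset, train indices, test indices) for leave-one-dataset-out CV."""
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        yield held_out, train, test
```

Zero-centering uses no diagnostic labels, so it can be applied to a held-out dataset using only that dataset's own feature means without leaking label information into training.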
Affiliation(s)
- Ryszard Kubinski
- Phyla Technologies Inc, Montréal, QC, Canada
- *Correspondence: Ryszard Kubinski; Ryan D. Martin
- Alex Hernandez-Garcia
- Mila, Quebec Artificial Intelligence Institute, University of Montreal, Montréal, QC, Canada
- Stefan Bauer
- Max Planck Institute for Intelligent Systems, Tübingen, Germany
- Falk Hildebrand
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, United Kingdom
- Earlham Institute, Norwich, United Kingdom
- Tamas Korcsmaros
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, United Kingdom
- Earlham Institute, Norwich, United Kingdom
- Sani Karam
- Phyla Technologies Inc, Montréal, QC, Canada
- Prévost Jantchou
- Centre Hospitalier Universitaire Sainte-Justine, Montréal, QC, Canada
- Kamran Kafi
- Phyla Technologies Inc, Montréal, QC, Canada
- Ryan D. Martin
- Phyla Technologies Inc, Montréal, QC, Canada
11
Ivanova L, Rangel-Huerta OD, Tartor H, Gjessing MC, Dahle MK, Uhlig S. Fish Skin and Gill Mucus: A Source of Metabolites for Non-Invasive Health Monitoring and Research. Metabolites 2021; 12:28. [PMID: 35050150 PMCID: PMC8781917 DOI: 10.3390/metabo12010028] [Received: 11/24/2021] [Revised: 12/16/2021] [Accepted: 12/25/2021] [Indexed: 11/28/2022]
Abstract
Mucous membranes such as the gill and skin mucosa in fish protect them against a multitude of environmental factors. At the same time, changes in the molecular composition of mucus may provide valuable information about the interaction of the fish with their environment, as well as their health and welfare. In this study, the metabolite profiles of the plasma, skin and gill mucus of freshwater Atlantic salmon (Salmo salar) were compared using liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS). Several normalization procedures aimed at reducing unwanted variation in the untargeted data were tested. In addition, the basal metabolism of skin and gills, and the impact of the anesthetic benzocaine used for euthanasia, were studied. For targeted metabolomics, the commercial AbsoluteIDQ p400 HR kit was used to evaluate potential differences in metabolic composition between epidermal mucus and plasma. The targeted metabolomics data showed a high level of correlation between different types of biological fluids from the same individual, indicating that mucus metabolite composition could be used for fish health monitoring and research.
Affiliation(s)
- Lada Ivanova
- Norwegian Veterinary Institute, P.O. Box 64, N-1431 Ås, Norway; (O.D.R.-H.); (H.T.); (M.C.G.); (M.K.D.); (S.U.)
12
Herrmann HA, Rusz M, Baier D, Jakupec MA, Keppler BK, Berger W, Koellensperger G, Zanghellini J. Thermodynamic Genome-Scale Metabolic Modeling of Metallodrug Resistance in Colorectal Cancer. Cancers (Basel) 2021; 13:cancers13164130. [PMID: 34439283 PMCID: PMC8391396 DOI: 10.3390/cancers13164130] [Received: 06/24/2021] [Revised: 07/23/2021] [Accepted: 08/03/2021] [Indexed: 12/11/2022]
Abstract
Simple Summary: Cancer, and also its treatment, can lead to a reprogramming of cellular metabolism. These changes are observable in metabolite abundances, which can be measured in an unbiased way by mass spectrometry-based metabolomics. However, even when the metabolome changes strongly, a (mechanistic) interpretation is difficult, as metabolite levels do not necessarily correspond directly to pathway activities. Here we measure the changes of the cellular metabolome in colorectal cancer cell lines sensitive and resistant to the ruthenium-based drug BOLD-100/KP1339 and the platinum-based drug oxaliplatin. We map these changes onto a cancer-specific genome-scale metabolic model, which allows us not only to compute intracellular flux distributions, but also to disentangle drug-specific effects, growth differences, and differences in metabolic adaptations due to resistance. Specifically, we find that resistance to BOLD-100/KP1339 induces more extensive reprogramming than resistance to oxaliplatin, especially with respect to fatty acid and amino acid metabolism.
Background: Mass spectrometry-based metabolomics approaches provide an immense opportunity to enhance our understanding of the mechanisms that underpin the cellular reprogramming of cancers. Accurate comparative metabolic profiling of heterogeneous conditions, however, is still a challenge. Methods: Measuring both intracellular and extracellular metabolite concentrations, we constrain four instances of a thermodynamic genome-scale metabolic model of the HCT116 colorectal carcinoma cell line to compare the metabolic flux profiles of cells that are either sensitive or resistant to ruthenium- or platinum-based treatments with BOLD-100/KP1339 and oxaliplatin, respectively. Results: By normalizing to growth rate and normalizing resistant cells to their respective sensitive controls, we are able to dissect metabolic responses specific to the drug and to the resistance states. We find the normalization steps to be crucial in the interpretation of the metabolomics data and show that the metabolic reprogramming in resistant cells is limited to a select number of pathways. Conclusions: Here, we elucidate the key importance of normalization steps in the interpretation of metabolomics data, allowing us to uncover drug-specific metabolic reprogramming during acquired metal-drug resistance.
Affiliation(s)
- Helena A. Herrmann
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Mate Rusz
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria
- Dina Baier
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria
- Michael A. Jakupec
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria
- Research Cluster Translational Cancer Therapy Research, University of Vienna and Medical University of Vienna, 1090 Vienna, Austria
- Bernhard K. Keppler
- Institute of Inorganic Chemistry, University of Vienna, 1090 Vienna, Austria
- Research Cluster Translational Cancer Therapy Research, University of Vienna and Medical University of Vienna, 1090 Vienna, Austria
- Walter Berger
- Research Cluster Translational Cancer Therapy Research, University of Vienna and Medical University of Vienna, 1090 Vienna, Austria
- Institute of Cancer Research and Comprehensive Cancer Center, Medical University of Vienna, 1090 Vienna, Austria
- Gunda Koellensperger
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Vienna Metabolomics Center (VIME), University of Vienna, 1090 Vienna, Austria
- Research Network Chemistry Meets Microbiology, University of Vienna, 1090 Vienna, Austria
- Correspondence: G.K.; J.Z.
- Jürgen Zanghellini
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Correspondence: G.K.; J.Z.
13
Haering M, Habermann BH. RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis. F1000Res 2021; 10:654. [PMID: 35186266 PMCID: PMC8825645 DOI: 10.12688/f1000research.54533.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Accepted: 11/04/2021] [Indexed: 09/23/2023]
Abstract
RNA sequencing (RNA-seq) is a widely adopted, affordable method for large-scale gene expression profiling. However, user-friendly and versatile tools that let wet-lab biologists analyse RNA-seq data beyond standard analyses such as differential expression are rare. In particular, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored to model organisms and are not easily adaptable to other species. With RNfuzzyApp, we provide a user-friendly, web-based R shiny app for differential expression analysis as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, fully automated pipeline for soft clustering with the Mfuzz R package, including methods to aid in cluster number selection, cluster overlap analysis, Mfuzz loop computations, as well as cluster enrichments. RNfuzzyApp is an intuitive, easy-to-use and interactive R shiny app for RNA-seq differential expression and time-series analysis, offering a rich selection of interactive plots, providing a quick overview of raw data and generating rapid analysis results. Furthermore, its ortholog assignment, enrichment analysis and ID conversion functions are accessible for non-model organisms.
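Mfuzz-style soft clustering is based on fuzzy c-means, in which each gene receives a graded membership in every cluster instead of a single hard label. The following is a minimal pure-Python sketch of that idea, not RNfuzzyApp's or Mfuzz's own code; the function names, the fuzzifier m = 2, the fixed iteration count, and seeding the centres with chosen profiles are all illustrative assumptions.

```python
import math
import statistics


def standardise(profile):
    """Scale one gene's time course to mean 0 and standard deviation 1."""
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mu) / sd for x in profile] if sd > 0 else [0.0] * len(profile)


def fuzzy_cmeans(data, centres, m=2.0, iters=100):
    """Fuzzy c-means: every profile gets a membership in [0, 1] for each cluster.

    data: list of equal-length profiles; centres: initial cluster centres;
    m > 1 is the fuzzifier (larger m gives softer memberships).
    """
    centres = [list(c) for c in centres]
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        memberships = []
        for x in data:
            d = [max(math.dist(x, c), 1e-12) for c in centres]  # avoid division by 0
            memberships.append(
                [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(d)))
                 for i in range(len(d))]
            )
        # Centre update: mean of all profiles weighted by u_ik ** m
        for i in range(len(centres)):
            w = [u[i] ** m for u in memberships]
            total = sum(w)
            centres[i] = [sum(wk * x[t] for wk, x in zip(w, data)) / total
                          for t in range(len(data[0]))]
    return centres, memberships
```

For example, with four hypothetical genes whose time courses rise ([1, 2, 3, 4] and [2, 3, 5, 6]) or fall ([4, 3, 2, 1] and [9, 7, 4, 2]), running fuzzy_cmeans on the standardised profiles with the first rising and first falling profile as starting centres gives both rising genes a membership above 0.9 in the first cluster and both falling genes a membership above 0.9 in the second, and every membership row sums to 1.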
Affiliation(s)
- Margaux Haering
- Aix-Marseille University, CNRS, IBDM UMR 7288, The Turing Centre for Living systems, Marseille, 13009, France
- Bianca H Habermann
- Aix-Marseille University, CNRS, IBDM UMR 7288, The Turing Centre for Living systems, Marseille, 13009, France
14
Ni A, Qin LX. Performance evaluation of transcriptomics data normalization for survival risk prediction. Brief Bioinform 2021; 22:6317608. [PMID: 34245143 PMCID: PMC8575026 DOI: 10.1093/bib/bbab257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2021] [Revised: 05/20/2021] [Accepted: 06/17/2021] [Indexed: 11/13/2022]
Abstract
One pivotal feature of transcriptomics data is the presence of unwanted variation caused by disparate experimental handling, known as handling effects. Various data normalization methods have been developed to alleviate the adverse impact of handling effects in the setting of differential expression analysis. However, little research has been done to evaluate their performance in the setting of survival outcome prediction, an important analysis goal for transcriptomics data in biomedical research. Leveraging a unique pair of datasets for the same set of tumor samples (one with handling effects and the other without), we developed a benchmarking tool for conducting such an evaluation in microRNA microarrays. We applied this tool to evaluate the performance of three popular normalization methods (quantile normalization, median normalization and variance stabilizing normalization) in survival prediction, using various approaches for model building and designs for sample assignment. We showed that handling effects can have a strong impact on survival prediction and that quantile normalization, the most popular method in current practice, tends to underperform median normalization and variance stabilizing normalization. We demonstrated with a small example the reason for quantile normalization's poor performance in this setting. Our findings highlight the importance of evaluating normalization in the context of the downstream analysis setting, and the potential for improving the development of survival predictors by applying median normalization. We make our benchmarking tool available for performing such evaluations on additional normalization methods in connection with prediction modeling approaches.
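To make the contrast between two of the compared methods concrete, here is a toy pure-Python sketch of median and quantile normalization on log-scale data. It is an illustration under simplifying assumptions (median normalization as an additive shift to the average median, no tie handling in quantile normalization), not the authors' benchmarking tool, and the data are invented.

```python
import statistics


def median_normalise(samples):
    """Shift each sample (log scale) so that all samples share the same median.

    samples: list of per-sample lists of log-intensities for the same features.
    """
    medians = [statistics.median(s) for s in samples]
    target = statistics.mean(medians)
    return [[x - m + target for x in s] for s, m in zip(samples, medians)]


def quantile_normalise(samples):
    """Force every sample onto one distribution: the mean of the sorted samples.

    Ties are ignored for brevity: each value takes the reference value at its rank.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [statistics.mean(col) for col in zip(*sorted_samples)]
    out = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # feature indices by rank
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = reference[rank]
        out.append(normalised)
    return out
```

With two hypothetical samples a = [5, 7, 9, 11] and b = [6, 8, 10, 20], median normalization gives [5.5, 7.5, 9.5, 11.5] and [5.5, 7.5, 9.5, 19.5], keeping the large difference in the last feature, whereas quantile normalization maps both samples to [5.5, 7.5, 9.5, 15.5] and erases it. This shows one way that forcing identical distributions can remove real signal; it is not offered as the paper's own example.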
Affiliation(s)
- Ai Ni
- Ohio State University, New York, NY 10017 USA
- Li-Xuan Qin
- Memorial Sloan Kettering Cancer Center, New York, NY 10017 USA
15
Zhao Z, Zhou H, Nie Z, Wang X, Luo B, Yi Z, Li X, Hu X, Yang T. Appropriate Reference Genes for RT-qPCR Normalization in Various Organs of Anemone flaccida Fr. Schmidt at Different Growing Stages. Genes (Basel) 2021; 12:459. [PMID: 33807101 DOI: 10.3390/genes12030459] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/19/2021] [Revised: 03/12/2021] [Accepted: 03/17/2021] [Indexed: 11/17/2022]
Abstract
Anemone flaccida Fr. Schmidt is a traditional medicinal herb in southwestern China with multiple pharmacological effects on bruise injuries and rheumatoid arthritis (RA). A new drug with a good curative effect on RA has recently been developed from the extract of A. flaccida rhizomes, whose main medicinal ingredients are triterpenoid saponins. Due to excessive exploitation, the wild population has become scarce and endangered in a few of its natural habitats, and research on the cultivation of the plant has commenced. Studies on the expression of genes related to the biosynthesis of triterpenoid saponins are not only helpful for understanding the effects of environmental factors on the accumulation of medicinal ingredients but also necessary for monitoring the herb quality of cultivated plants. Reverse transcription quantitative polymerase chain reaction (RT-qPCR), a sensitive and powerful technique, has been widely used to detect gene expression across tissues in plants at different stages; however, its accuracy and reliability depend largely on reference gene selection. In this study, the expression of 10 candidate reference genes was evaluated in various organs of wild and cultivated plants at different stages, using the geNorm, NormFinder and BestKeeper algorithms. The purpose of this study was to identify suitable reference genes for RT-qPCR detection in A. flaccida. The results showed that two reference genes were sufficient for RT-qPCR data normalization in A. flaccida. PUBQ and ETIF1a can be used as reference genes in most organs at various stages because of their expression stability, whereas the PUBQ and EF1α genes were desirable in the rhizomes of the plant at the vegetative stage.
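The geNorm stability measure M can be sketched as follows: for each candidate gene, compute the standard deviation of its log2 expression ratio with every other candidate across samples, then average those pairwise variations, so that a lower M means more stable expression. This is a minimal pure-Python sketch, assuming linear-scale relative quantities as input and the sample standard deviation; the full geNorm procedure also repeatedly drops the least stable gene and recomputes M, which is not shown, and NormFinder and BestKeeper rely on different statistics. The gene names are hypothetical.

```python
import math
import statistics


def genorm_m(expression):
    """geNorm-style stability value M for each candidate reference gene.

    expression: dict gene -> list of relative quantities (linear scale), one per sample.
    Lower M means more stable expression relative to the other candidates.
    """
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise = []
        for k in genes:
            if k == j:
                continue
            # V_jk: variation of the log2 expression ratio of genes j and k across samples
            ratios = [math.log2(a / b) for a, b in zip(expression[j], expression[k])]
            pairwise.append(statistics.stdev(ratios))
        # M_j: average pairwise variation of gene j with all other candidates
        m_values[j] = statistics.mean(pairwise)
    return m_values
```

For hypothetical genes GENE_A = [1, 2, 4], GENE_B = [2, 4, 8] and GENE_C = [1, 1, 8], genorm_m returns M = 0.5 for GENE_A and GENE_B, whose ratio is constant, and M = 1.0 for GENE_C, which would be ranked least stable.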
16
Abstract
Big data approaches are generally helpful to the healthcare and biomedical sectors for predicting disease. For minor symptoms, it is difficult to see a doctor at the hospital at any time. Big data can therefore provide essential information about diseases on the basis of a patient's symptoms. For many medical organizations, disease prediction is important for making the best possible health care decisions. However, the conventional medical care model relies on structured input, and more accurate and consistent prediction is required. This paper develops multi-disease prediction using an improved deep learning concept. Datasets for diabetes, hepatitis, lung cancer, liver tumor, heart disease, Parkinson's disease, and Alzheimer's disease were gathered from the benchmark UCI repository for the experiments. The proposed model involves three phases: (a) data normalization, (b) weighted normalized feature extraction, and (c) prediction. First, the dataset is normalized so that every attribute lies within a common range. Next, weighted feature extraction is performed, in which each attribute value is multiplied by a weight function to produce large-scale deviations. The weight function is optimized using a combination of two meta-heuristic algorithms, termed the Jaya Algorithm-based Multi-Verse Optimization algorithm (JA-MVO). The optimally extracted features are passed to hybrid deep learning algorithms, namely a Deep Belief Network (DBN) and a Recurrent Neural Network (RNN). As a modification to the hybrid deep learning architecture, the weights of both the DBN and the RNN are optimized using the same hybrid optimization algorithm. Finally, a comparative evaluation against existing models on various performance measures confirms the effectiveness of the proposed prediction model.
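The first two phases can be pictured with a short pure-Python sketch. The abstract does not say which normalization is used, so min-max scaling of each attribute to [0, 1] is an assumption here, and the fixed weights are hypothetical placeholders for the weights that the paper optimizes with JA-MVO; neither the optimizer nor the DBN/RNN stage is shown.

```python
def min_max_normalise(rows):
    """Rescale each attribute (column) to the range [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def weight_features(rows, weights):
    """Multiply each normalised attribute by its weight (placeholder weights here)."""
    return [[x * w for x, w in zip(row, weights)] for row in rows]
```

For three hypothetical records [[120, 30], [80, 20], [100, 40]], min_max_normalise gives [[1.0, 0.5], [0.0, 0.0], [0.5, 1.0]], and weighting with [2.0, 0.5] gives [[2.0, 0.25], [0.0, 0.0], [1.0, 0.5]].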
Affiliation(s)
- Anusha Ampavathi
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
- T Vijaya Saradhi
- Department of Computer Science and Engineering, Sreenidhi Institute of Science and Technology - SNIST, Hyderabad, Telangana, India
17
Philips A, Nowis K, Stelmaszczuk M, Jackowiak P, Podkowiński J, Handschuh L, Figlerowicz M. Expression Landscape of circRNAs in Arabidopsis thaliana Seedlings and Adult Tissues. Front Plant Sci 2020; 11:576581. [PMID: 33014000 PMCID: PMC7511659 DOI: 10.3389/fpls.2020.576581] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/26/2020] [Accepted: 08/25/2020] [Indexed: 05/27/2023]
Abstract
RNA-seq is currently the only method that can provide a comprehensive landscape of circular RNAs (circRNAs) in a whole organism and its particular organs. Recent years have brought an increasing number of RNA-seq-based reports on plant circRNAs. Notably, the picture they reveal is questionable and depends on the circRNA identification and quantification techniques applied. Consequently, little is known about the biogenesis and functions of circRNAs in plants. In this work, we tested two experimental and six bioinformatics procedures for circRNA analysis to determine the optimal approach for studying circRNA profiles in Arabidopsis thaliana. Then, using the optimized strategy, we determined the accumulation of circular and corresponding linear transcripts in plant seedlings and organs. We observed that only a small fraction of circRNAs was reproducibly generated. Among them, two groups of circRNAs were discovered: ubiquitous and organ-specific. The highest number of circRNAs with significantly increased accumulation compared with other organs/seedlings was found in roots. The circRNAs in seedlings, leaves and flowers originated mainly from genes involved in photosynthesis and the response to stimulus. The levels of circular and linear transcripts were not correlated. Although RNase R treatment enriches the analyzed RNA samples in circular transcripts, it may also negatively affect the stability of some circRNAs. We also showed that normalizing NGS data by library size is not appropriate for circRNA quantification. Instead, we proposed four other normalization types, whose accuracy was confirmed by ddPCR. Moreover, we provided a comprehensive characterization of circRNAs in A. thaliana organs and seedlings. Our analyses revealed that plant circRNAs are formed in both stochastic and controlled processes. The latter are less frequent and likely engage circRNA-specific mechanisms. Only a few circRNAs were organ-specific.
The lack of correlation between the accumulation of linear and circular transcripts indicated that their biogenesis depends on different mechanisms.
Affiliation(s)
- Anna Philips
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Katarzyna Nowis
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Michal Stelmaszczuk
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Paulina Jackowiak
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Jan Podkowiński
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Luiza Handschuh
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Marek Figlerowicz
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
18
Benedetti E, Gerstner N, Pučić-Baković M, Keser T, Reiding KR, Ruhaak LR, Štambuk T, Selman MH, Rudan I, Polašek O, Hayward C, Beekman M, Slagboom E, Wuhrer M, Dunlop MG, Lauc G, Krumsiek J. Systematic Evaluation of Normalization Methods for Glycomics Data Based on Performance of Network Inference. Metabolites 2020; 10:E271. [PMID: 32630764 PMCID: PMC7408386 DOI: 10.3390/metabo10070271] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/21/2020] [Revised: 05/29/2020] [Accepted: 06/04/2020] [Indexed: 01/15/2023]
Abstract
Glycomics measurements, like those of all other high-throughput technologies, are subject to technical variation due to fluctuations in the experimental conditions. The removal of this non-biological signal from the data is referred to as normalization. Unlike for other omics data types, no systematic evaluation of normalization options for glycomics data has been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with an innovative approach. It has been shown previously that Gaussian Graphical Models (GGMs) inferred from glycomics data are able to identify enzymatic steps in the glycan synthesis pathways in a data-driven fashion. Based on this finding, we quantify the quality of a given normalization method according to how well a GGM inferred from the respective normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore exploits a biological measure of goodness. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms: Liquid Chromatography - ElectroSpray Ionization - Mass Spectrometry (LC-ESI-MS), Ultra High Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD), and Matrix Assisted Laser Desorption Ionization - Fourier Transform Ion Cyclotron Resonance - Mass Spectrometry (MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the 'Probabilistic Quotient' method followed by log-transformation, irrespective of the measurement platform. This recommendation is further supported by an additional analysis, in which we ranked normalization methods based on their statistical associations with age, a factor known to associate with glycomics measurements.
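Probabilistic Quotient Normalization estimates one dilution factor per sample as the median of the feature-wise quotients between that sample and a reference profile. Below is a minimal pure-Python sketch, assuming strictly positive abundances and a feature-wise median reference; it is not the authors' pipeline, and some PQN implementations first apply total-area normalization or use a mean or control-sample reference instead. The natural log stands in for the log-transformation step.

```python
import math
import statistics


def pqn(samples):
    """Probabilistic Quotient Normalization of positive feature abundances.

    samples: list of per-sample lists, all with the same features in the same order.
    """
    # Reference profile: feature-wise median across all samples
    reference = [statistics.median(col) for col in zip(*samples)]
    normalised = []
    for s in samples:
        # Dilution factor: median of the quotients sample / reference
        quotients = [x / r for x, r in zip(s, reference)]
        dilution = statistics.median(quotients)
        normalised.append([x / dilution for x in s])
    return normalised


def pqn_log(samples):
    """PQN followed by a natural-log transform of every value."""
    return [[math.log(x) for x in s] for s in pqn(samples)]
```

For hypothetical samples [10, 20, 30, 40], [20, 40, 60, 80] and [5, 10, 15, 80], the estimated dilution factors are 1, 2 and 0.5, so pqn returns [10, 20, 30, 40] for the first two samples and [10, 20, 30, 160] for the third: the overall dilution is removed while the fourfold increase in the last feature survives.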
Affiliation(s)
- Elisa Benedetti
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10022, USA
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Nathalie Gerstner
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Max Planck Institute for Psychiatry, 80804 Munich, Germany
- Maja Pučić-Baković
- Genos Glycoscience Research Laboratory, 10000 Zagreb, Croatia
- Toma Keser
- Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
- Karli R. Reiding
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, 3584 CH Utrecht, The Netherlands
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- L. Renee Ruhaak
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Department of Clinical Chemistry and Laboratory Medicine, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Tamara Štambuk
- Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
- Maurice H.J. Selman
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, 3584 CH Utrecht, The Netherlands
- Igor Rudan
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh EH8 9AG, UK
- Ozren Polašek
- Medical School, University of Split, 21000 Split, Croatia
- Gen-Info Ltd., 10000 Zagreb, Croatia
- Caroline Hayward
- Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Marian Beekman
- Section of Molecular Epidemiology, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Eline Slagboom
- Section of Molecular Epidemiology, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Manfred Wuhrer
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Malcolm G. Dunlop
- Colon Cancer Genetics Group, Institute of Genetics and Molecular Medicine, University of Edinburgh and Medical Research Council Human Genetics Unit, Edinburgh EH8 9YL, UK
- Gordan Lauc
- Genos Glycoscience Research Laboratory, 10000 Zagreb, Croatia
- Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
- Jan Krumsiek
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10022, USA
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany
19
Liu F, Singhal K, Matney R, Acharya S, Akdis CA, Nadeau KC, Chien AS, Leib RD. Enhancing Data Reliability in TOMAHAQ for Large-Scale Protein Quantification. Proteomics 2020; 20:e1900105. [PMID: 32032464 DOI: 10.1002/pmic.201900105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/13/2019] [Revised: 01/19/2020] [Indexed: 11/10/2022]
Abstract
The analytical scale of most mass-spectrometry-based targeted proteomics assays is usually limited by assay performance and instrument utilization. A recently introduced method, called triggered by offset, multiplexed, accurate mass, high resolution, and absolute quantitation (TOMAHAQ), combines peptide and sample multiplexing to improve analytical scale and quantitative performance simultaneously. In the present work, critical technical requirements and data analysis considerations for successful implementation of the TOMAHAQ technique are discussed, based on a study of a total of 185 target peptides across more than 200 clinical plasma samples. Importantly, significant interference is observed to originate from the TMTzero reporter ion used for the synthetic trigger peptides. This interference is unexpected, because only TMT10plex reporter ions from the target peptides should be observed under typical TOMAHAQ conditions. To unlock the great promise of the technique for high-throughput quantification, a post-acquisition data correction strategy is proposed here to deconvolute the reporter ion superposition and recover reliable data.
Affiliation(s)
- Fang Liu
- Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, 94305, USA
- Kratika Singhal
- Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, 94305, USA
- Rowan Matney
- Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, 94305, USA
- Swati Acharya
- Sean Parker Center, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Cezmi A Akdis
- Swiss Institute of Allergy and Asthma Research, University of Zurich, Davos Platz, CH-7270, Switzerland
- Kari C Nadeau
- Sean Parker Center, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Allis S Chien
- Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, 94305, USA
- Ryan D Leib
- Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, 94305, USA
20
Li F, Rao G, Du J, Xiang Y, Zhang Y, Selek S, Hamilton JE, Xu H, Tao C. Ontological representation-oriented term normalization and standardization of the Research Domain Criteria. Health Informatics J 2019; 26:726-737. [PMID: 30843449 DOI: 10.1177/1460458219832059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/20/2023]
Abstract
The Research Domain Criteria, launched by the National Institute of Mental Health, is a new dimensional and interdisciplinary research framework for mental disorders. The Research Domain Criteria matrix is its core component. Because an ontology supports semantic inferencing and automatic data processing, we aim to transform the Research Domain Criteria matrix into an ontological structure. With respect to data normalization, which is an essential part of an ontology representation, the Research Domain Criteria elements (mainly in the Units of Analysis) have some limitations. In this article, we propose a series of solutions to improve data normalization of the Research Domain Criteria elements in the Units of Analysis, including leveraging standard terminologies (i.e., the Unified Medical Language System Metathesaurus), context-combining queries, and domain expertise. The evaluation results show that the percentage of positive (Yes) responses exceeds 80 percent, indicating that our work is favorably received by mental health professionals and that we have formed a good data foundation for the Research Domain Criteria ontological representation in future work.
Affiliation(s)
- Fang Li
- The University of Texas Health Science Center at Houston, USA
- Cui Tao
- The University of Texas Health Science Center at Houston, USA
21
Krasnov GS, Kudryavtseva AV, Snezhkina AV, Lakunina VA, Beniaminov AD, Melnikova NV, Dmitriev AA. Pan-Cancer Analysis of TCGA Data Revealed Promising Reference Genes for qPCR Normalization. Front Genet 2019; 10:97. [PMID: 30881377 PMCID: PMC6406071 DOI: 10.3389/fgene.2019.00097] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/31/2018] [Accepted: 01/29/2019] [Indexed: 11/20/2022]
Abstract
Quantitative PCR (qPCR) remains the most widely used technique for gene expression evaluation. Obtaining reliable data using this method requires reference genes (RGs) with stable mRNA levels under the experimental conditions. This issue is especially crucial in cancer studies because each tumor has a unique molecular portrait. The Cancer Genome Atlas (TCGA) project provides RNA-Seq data for thousands of samples corresponding to dozens of cancers and provides a basis for assessing the suitability of genes as reference genes for qPCR data normalization. Using TCGA RNA-Seq data and the previously developed CrossHub tool, we evaluated the mRNA levels of 32 traditionally used RGs in 12 cancer types, including those of the lung, breast, prostate, kidney, and colon. We developed an 11-component scoring system for the assessment of gene expression stability. Among the 32 genes, PUM1 was one of the most stably expressed in the majority of examined cancers, whereas GAPDH, which is widely used as an RG, showed significant mRNA level alterations in more than half of the cases. For each of the 12 cancer types, we suggested a pair of genes that are the most suitable for use as reference genes. These genes are characterized by high expression stability and an absence of correlation between their mRNA levels. Next, the scoring system was expanded with several features of a gene: mutation rate, number of transcript isoforms and pseudogenes, participation in cancer-related processes on the basis of Gene Ontology, and mentions in PubMed-indexed articles. All the genes covered by RNA-Seq data in TCGA were analyzed using the expanded scoring system, which allowed us to reveal novel promising RGs for each examined cancer type and to identify several "universal" pan-cancer RG candidates, including SF3A1, CIAO1, and SFRS4. The choice of RGs is the basis for precise gene expression evaluation by qPCR.
Here, we suggested optimal pairs of traditionally used RGs for 12 cancer types and identified novel promising RGs that demonstrate high expression stability and other features of reliable and convenient RGs (high expression level, low mutation rate, non-involvement in cancer-related processes, single transcript isoform, and absence of pseudogenes).
Affiliation(s)
- George S. Krasnov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Alexey A. Dmitriev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
22
Taylor SC, Nadeau K, Abbasi M, Lachance C, Nguyen M, Fenrich J. The Ultimate qPCR Experiment: Producing Publication Quality, Reproducible Data the First Time. Trends Biotechnol 2019; 37:761-774. [PMID: 30654913 DOI: 10.1016/j.tibtech.2018.12.002] [Citation(s) in RCA: 352] [Impact Index Per Article: 70.4] [Received: 10/01/2018] [Revised: 11/30/2018] [Accepted: 12/07/2018] [Indexed: 12/20/2022]
Abstract
Quantitative PCR (qPCR) is one of the most common techniques for quantification of nucleic acid molecules in biological and environmental samples. Although the methodology is perceived to be relatively simple, there are a number of steps and reagents that require optimization and validation to ensure reproducible data that accurately reflect the biological question(s) being posed. This review article describes and illustrates the critical pitfalls and sources of error in qPCR experiments, along with a rigorous, stepwise process to minimize variability, time, and cost in generating reproducible, publication quality data every time. Finally, an approach to make an informed choice between qPCR and digital PCR technologies is described.
Affiliation(s)
- Sean C Taylor
- Bio-Rad Laboratories Canada Inc., 1329 Meyerside Drive, Mississauga, Ontario L5T1C9, Canada.
- Katia Nadeau
- Bio-Rad Laboratories Canada Inc., 1329 Meyerside Drive, Mississauga, Ontario L5T1C9, Canada
- Meysam Abbasi
- Bio-Rad Laboratories Canada Inc., 1329 Meyerside Drive, Mississauga, Ontario L5T1C9, Canada
- Claude Lachance
- Bio-Rad Laboratories Canada Inc., 1329 Meyerside Drive, Mississauga, Ontario L5T1C9, Canada
- Marie Nguyen
- Bio-Rad Laboratories, 255 Linus Pauling Drive, Hercules, CA 94547, USA
- Joshua Fenrich
- Bio-Rad Laboratories, 255 Linus Pauling Drive, Hercules, CA 94547, USA
23
Zacharias HU, Altenbuchinger M, Gronwald W. Statistical Analysis of NMR Metabolic Fingerprints: Established Methods and Recent Advances. Metabolites 2018; 8:E47. [PMID: 30154338 PMCID: PMC6161311 DOI: 10.3390/metabo8030047] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/02/2018] [Revised: 08/01/2018] [Accepted: 08/18/2018] [Indexed: 01/02/2023]
Abstract
In this review, we summarize established and recent bioinformatic and statistical methods for the analysis of NMR-based metabolomics. Data analysis of NMR metabolic fingerprints exhibits several challenges, including unwanted biases, high dimensionality, and typically low sample numbers. Common analysis tasks comprise the identification of differential metabolites and the classification of specimens. However, analysis results strongly depend on the preprocessing of the data, and there is no consensus yet on how to remove unwanted biases and experimental variance prior to statistical analysis. Here, we first review established and new preprocessing protocols and illustrate their pros and cons, including different data normalizations and transformations. Second, we give a brief overview of state-of-the-art statistical analysis in NMR-based metabolomics. Finally, we discuss a recent development in statistical data analysis, where data normalization becomes obsolete. This method, called zero-sum regression, builds metabolite signatures whose estimation as well as predictions are independent of prior normalization.
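The key property behind zero-sum regression can be checked in a few lines: for a linear model on log-transformed abundances whose coefficients sum to zero, multiplying every feature of a sample by the same factor (a dilution or normalization constant) shifts each log value by the same amount, and that shift cancels in the prediction. This pure-Python sketch only demonstrates the invariance; it does not fit a model, and projecting arbitrary coefficients onto the constraint by subtracting their mean is an illustrative shortcut, not the published estimation procedure. The function names are hypothetical.

```python
import math


def to_zero_sum(beta):
    """Project coefficients onto the constraint sum(beta) == 0 by removing their mean."""
    mean_beta = sum(beta) / len(beta)
    return [b - mean_beta for b in beta]


def predict(beta, intercept, abundances):
    """Linear predictor on log-abundances: intercept + sum_j beta_j * log(x_j)."""
    return intercept + sum(b * math.log(x) for b, x in zip(beta, abundances))


# Dividing all abundances of a sample by c adds -log(c) * sum(beta) to its
# prediction, which vanishes when the coefficients sum to zero.
```

For example, with beta = to_zero_sum([0.8, -0.2, 0.3]), predict(beta, 1.0, x) is unchanged (up to rounding) when x = [4.0, 9.0, 2.5] is replaced by its one-third dilution, whereas the unconstrained coefficients, which sum to 0.9, shift the prediction by 0.9 * ln 3, about 0.99.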
Affiliation(s)
- Helena U Zacharias
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany.
- Michael Altenbuchinger
- Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Am Biopark 9, 93053 Regensburg, Germany.
- Wolfram Gronwald
- Institute of Functional Genomics, University of Regensburg, Am Biopark 9, 93053 Regensburg, Germany.

24

Hochrein J, Zacharias HU, Taruttis F, Samol C, Engelmann JC, Spang R, Oefner PJ, Gronwald W. Data Normalization of ¹H NMR Metabolite Fingerprinting Data Sets in the Presence of Unbalanced Metabolite Regulation. J Proteome Res 2015; 14:3217-28. [PMID: 26147738 DOI: 10.1021/acs.jproteome.5b00192] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 01/06/2023]
Abstract
Data normalization is an essential step in NMR-based metabolomics. Conducted properly, it improves data quality and removes unwanted biases. The choice of the appropriate normalization method is critical and depends on the inherent properties of the data set in question. In particular, the presence of unbalanced metabolic regulation, where the different specimens and cohorts under investigation do not contain approximately equal shares of up- and down-regulated features, may strongly influence data normalization. Here, we demonstrate the suitability of the Shapiro-Wilk test to detect such unbalanced regulation. Next, employing a Latin-square design consisting of eight metabolites spiked into a urine specimen at eight different known concentrations, we show that commonly used normalization and scaling methods fail to retrieve true metabolite concentrations in the presence of increasing amounts of glucose added to simulate unbalanced regulation. However, by learning the normalization parameters on a subset of nonregulated features only, Linear Baseline Normalization, Probabilistic Quotient Normalization, and Variance Stabilization Normalization were found to account well for different dilutions of the samples without distorting the true spike-in levels, even in the presence of marked unbalanced metabolic regulation. Finally, the methods described were applied successfully to a real-world example of unbalanced regulation, namely, a set of plasma specimens collected from patients with and without acute kidney injury after cardiac surgery with cardiopulmonary bypass use.
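Probabilistic Quotient Normalization, one of the methods named above, can be sketched in a few lines. This is a minimal illustration of the commonly described PQN recipe (a feature-wise median reference spectrum, and the median quotient against it as each sample's dilution factor), not the authors' code. It omits any preceding total-area normalization, and the optional `feature_mask` argument is a hypothetical input standing in for the idea of learning normalization parameters on nonregulated features only; how the study selects those features is not reproduced here.

```python
import numpy as np


def pqn_normalize(X, feature_mask=None):
    """Probabilistic Quotient Normalization (PQN) sketch.

    X: array of shape (n_samples, n_features) with strictly positive intensities.
    feature_mask: optional boolean array of length n_features marking features
        assumed to be non-regulated (hypothetical input). Only these features
        are used to learn the reference spectrum and the dilution factors.
    Returns (normalized X, per-sample dilution factors).
    """
    X = np.asarray(X, dtype=float)
    if feature_mask is None:
        feature_mask = np.ones(X.shape[1], dtype=bool)
    feature_mask = np.asarray(feature_mask, dtype=bool)
    sub = X[:, feature_mask]
    # Reference spectrum: feature-wise median across samples.
    reference = np.median(sub, axis=0)
    # Quotient of each sample against the reference, feature by feature.
    quotients = sub / reference
    # Most probable dilution factor per sample: median of its quotients.
    factors = np.median(quotients, axis=1)
    # Every feature (masked or not) is divided by the sample's factor.
    return X / factors[:, None], factors
```

Passing a mask that excludes features expected to be regulated (for example, a strongly spiked signal such as the added glucose in the design described above) keeps those features from pulling the dilution factors, while the normalized output still covers all features.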
Affiliation(s)
- Jochen Hochrein
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Helena U Zacharias
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Franziska Taruttis
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Claudia Samol
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Julia C Engelmann
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Rainer Spang
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Peter J Oefner
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany
- Wolfram Gronwald
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Str. 9, 93053 Regensburg, Germany

25

Vigelsø A, Dybboe R, Hansen CN, Dela F, Helge JW, Guadalupe Grau A. GAPDH and β-actin protein decreases with aging, making Stain-Free technology a superior loading control in Western blotting of human skeletal muscle. J Appl Physiol (1985) 2014; 118:386-94. [PMID: 25429098 DOI: 10.1152/japplphysiol.00840.2014] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 02/02/2023]
Abstract
Reference proteins (RP) or the total protein (TP) loaded are used to correct for uneven loading and/or transfer in Western blotting. However, signal sensitivity and the influence of physiological conditions may call these normalization methods into question. Therefore, three widely used reference proteins [β-actin, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), and α-tubulin], as well as TP loaded measured by Stain-Free technology (SF), were tested as normalization tools. This was done using skeletal muscle samples from men subjected to physiological conditions often investigated in applied physiology, where the intervention has been suggested to impede normalization (ageing, muscle atrophy, and different muscle fiber type composition). The linearity of the signal and the methodological coefficient of variation were obtained. Furthermore, the inter- and intraindividual variation in signals obtained from SF and RP was measured in relation to ageing, muscle atrophy, and different muscle fiber type composition, respectively. SF and β-actin showed stronger linearity than GAPDH and α-tubulin. The methodological variation was relatively low for all four methods (4-11%). Protein levels of β-actin and GAPDH were lower in older men than in young men. In conclusion, β-actin, GAPDH, and α-tubulin should not be used for normalization in studies that include subjects with a large age difference. In contrast, the RPs may be unaffected in studies that include muscle wasting and differences in muscle fiber type. The novel SF technology adds less variation to the results than the existing methods for correcting loading inaccuracy in Western blotting of human skeletal muscle in applied physiology.
Affiliation(s)
- Andreas Vigelsø
- Center for Healthy Aging, Department of Biomedical Sciences, Faculty of Health Sciences, University of Copenhagen, Denmark
- Rie Dybboe
- Center for Healthy Aging, Department of Biomedical Sciences, Faculty of Health Sciences, University of Copenhagen, Denmark
- Christina Neigaard Hansen
- Center for Healthy Aging, Department of Biomedical Sciences, Faculty of Health Sciences, University of Copenhagen, Denmark
- Flemming Dela
- Center for Healthy Aging, Department of Biomedical Sciences, Faculty of Health Sciences, University of Copenhagen, Denmark
- Jørn W Helge
- Center for Healthy Aging, Department of Biomedical Sciences, Faculty of Health Sciences, University of Copenhagen, Denmark
- Amelia Guadalupe Grau
- Center for Healthy Aging, Department of Biomedical Sciences, Faculty of Health Sciences, University of Copenhagen, Denmark

26

Mangat CS, Bharat A, Gehrke SS, Brown ED. Rank ordering plate data facilitates data visualization and normalization in high-throughput screening. J Biomol Screen 2014; 19:1314-20. [PMID: 24828052 DOI: 10.1177/1087057114534298] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/16/2022]
Abstract
High-throughput screening (HTS) of chemical and microbial strain collections is an indispensable tool for modern chemical and systems biology; however, HTS data sets have inherent systematic and random error, which may lead to false-positive or false-negative results. Several methods of normalization of data exist; nevertheless, due to the limitations of each, no single method has been universally adopted. Here, we present a method of data visualization and normalization that is effective, intuitive, and easy to implement in a spreadsheet program. For each plate, the data are ordered by ascending values and a plot thereof yields a curve that is a signature of the plate data. Curve shape characteristics provide intuitive visualization of the frequency and strength of inhibitors, activators, and noise on the plate, allowing potentially problematic plates to be flagged. To reduce plate-to-plate variation, the data can be normalized by the mean of the middle 50% of ordered values, also called the interquartile mean (IQM) or the 50% trimmed mean of the plate. Positional effects due to bias in columns, rows, or wells can be corrected using the interquartile mean of each well position across all plates (IQMW) as a second level of normalization. We illustrate the utility of this method using data sets from biochemical and phenotypic screens.
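The two normalization levels described above map directly onto array operations. The sketch below is an illustration, not the authors' spreadsheet procedure: it assumes a hypothetical `plates` array of shape (n_plates, n_wells), assumes that "normalized by" means dividing by the IQM (the abstract does not state whether values are divided or subtracted), and takes the middle 50% by dropping int(0.25 * n) sorted values from each tail.

```python
import numpy as np


def interquartile_mean(values, axis=-1):
    """Mean of the middle 50% of the sorted values (a 25% trimmed mean)."""
    v = np.sort(np.asarray(values, dtype=float), axis=axis)
    n = v.shape[axis]
    k = int(0.25 * n)  # number of values dropped from each tail
    middle = np.take(v, np.arange(k, n - k), axis=axis)
    return middle.mean(axis=axis)


def iqm_normalize(plates):
    """Two-level normalization sketch (division is an assumption).

    plates: array of shape (n_plates, n_wells), one row per plate,
        with the same well order on every plate (hypothetical layout).
    Level 1: divide each plate by its own IQM (plate-to-plate variation).
    Level 2: divide each well position by the IQM of that position across
        all plates, i.e. the IQMW (row, column, or well positional effects).
    """
    plates = np.asarray(plates, dtype=float)
    per_plate = plates / interquartile_mean(plates, axis=1)[:, None]
    iqmw = interquartile_mean(per_plate, axis=0)
    return per_plate / iqmw[None, :]
```

The rank-ordered "signature" curve used for visual inspection is simply each plate's sorted values, `np.sort(plates[i])`, plotted against rank.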
Affiliation(s)
- Chand S Mangat
- M. G. DeGroote Institute for Infectious Disease Research and Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Amrita Bharat
- M. G. DeGroote Institute for Infectious Disease Research and Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Sebastian S Gehrke
- M. G. DeGroote Institute for Infectious Disease Research and Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Eric D Brown
- M. G. DeGroote Institute for Infectious Disease Research and Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada; McMaster High Throughput Screening Laboratory, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada

27

Abstract
Copy number variation (CNV) has emerged as an important genetic component in human diseases, which are increasingly being studied for large numbers of samples by sequencing the coding regions of the genome, i.e., exome sequencing. Nonetheless, detecting this variation from such targeted sequencing data is a difficult task, involving sorting out signal from noise, for which we have recently developed a set of statistical and computational tools called XHMM. In this unit, we give detailed instructions on how to run XHMM and how to use the resulting CNV calls in biological analyses.
Affiliation(s)
- Menachem Fromer
- Division of Psychiatric Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA; Stanley Center for Psychiatric Research and Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Shaun M. Purcell
- Division of Psychiatric Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA; Stanley Center for Psychiatric Research and Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA

28

Zauber H, Schüler V, Schulze W. Systematic evaluation of reference protein normalization in proteomic experiments. Front Plant Sci 2013; 4:25. [PMID: 23450762 PMCID: PMC3583035 DOI: 10.3389/fpls.2013.00025] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/18/2012] [Accepted: 02/04/2013] [Indexed: 06/01/2023]
Abstract
Quantitative comparative analyses of protein abundances using peptide ion intensities and their modifications have become a widely used approach for studying various biological questions. In the past years, several methods for quantitative proteomics were established using stable-isotope labeling and label-free approaches. We systematically evaluated the application of reference protein normalization (RPN) for proteomic experiments using a high-mass-accuracy LC-MS/MS platform. In RPN, all sample peptide intensities were normalized to an average protein intensity of a spiked reference protein. The main advantage of this method is that it avoids fraction-of-total-based relative analysis of proteomic data, which is often strongly dependent on sample complexity. We show that reference protein ion intensity sums are sufficiently reproducible to ensure a reliable normalization. We validated the RPN strategy by analyzing changes in protein abundances induced by nutrient starvation in Arabidopsis thaliana. Beyond that, we provide a general guideline for determining the optimal combination of sample protein and reference protein load on individual LC-MS/MS systems.
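The contrast between reference protein normalization and fraction-of-total scaling can be illustrated with a toy intensity table. In this minimal sketch, the per-sample reference level is taken as the mean intensity of the spiked protein's peptides (an assumption; the study's exact summary statistic may differ), and `is_reference` is a hypothetical boolean mask marking those peptides.

```python
import numpy as np


def fraction_of_total(intensities):
    """Each peptide intensity expressed as a fraction of its sample's total."""
    X = np.asarray(intensities, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def reference_protein_normalize(intensities, is_reference):
    """Divide each sample's peptide intensities by the mean intensity of the
    spiked reference protein's peptides in that same sample.

    intensities: (n_samples, n_peptides) array.
    is_reference: boolean mask over peptides (hypothetical input) marking
        the peptides that belong to the spiked reference protein.
    """
    X = np.asarray(intensities, dtype=float)
    mask = np.asarray(is_reference, dtype=bool)
    ref_level = X[:, mask].mean(axis=1, keepdims=True)
    return X / ref_level
```

If one sample contains an additional abundant peptide, the fraction-of-total values of the shared peptides shrink in that sample, whereas the reference-normalized values stay equal as long as the same amount of reference protein was spiked into both samples.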
Affiliation(s)
- Henrik Zauber
- Max Planck Institute of Molecular Plant Physiology, Golm, Germany
- Vivian Schüler
- Max Planck Institute of Molecular Plant Physiology, Golm, Germany
- Waltraud Schulze
- Max Planck Institute of Molecular Plant Physiology, Golm, Germany
- Plant Systems Biology, University of Hohenheim, Stuttgart, Germany

29

Kohl SM, Klein MS, Hochrein J, Oefner PJ, Spang R, Gronwald W. State-of-the art data normalization methods improve NMR-based metabolomic analysis. Metabolomics 2012; 8:146-160. [PMID: 22593726 PMCID: PMC3337420 DOI: 10.1007/s11306-011-0350-z] [Citation(s) in RCA: 141] [Impact Index Per Article: 11.8] [Received: 04/05/2011] [Accepted: 08/01/2011] [Indexed: 12/20/2022]
Abstract
Extracting biomedical information from large metabolomic datasets by multivariate data analysis is of considerable complexity. Common challenges include, among others, screening for differentially produced metabolites, estimation of fold changes, and sample classification. Prior to these analysis steps, it is important to minimize contributions from unwanted biases and experimental variance. This is the goal of data preprocessing. In this work, different data normalization methods were compared systematically employing two different datasets generated by means of nuclear magnetic resonance (NMR) spectroscopy. To this end, two different types of normalization methods were used: one aims to remove unwanted sample-to-sample variation, while the other adjusts the variance of the different metabolites by variable scaling and variance stabilization methods. The impact of all methods tested on sample classification was evaluated on urinary NMR fingerprints obtained from healthy volunteers and patients suffering from autosomal dominant polycystic kidney disease (ADPKD). Performance in terms of screening for differentially produced metabolites was investigated on a dataset following a Latin-square design, where varied amounts of 8 different metabolites were spiked into a human urine matrix while keeping the total spike-in amount constant. In addition, specific tests were conducted to systematically investigate the influence of the different preprocessing methods on the structure of the analyzed data. In conclusion, preprocessing methods originally developed for DNA microarray analysis, in particular Quantile and Cubic-Spline Normalization, performed best in reducing bias, accurately detecting fold changes, and classifying samples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-011-0350-z) contains supplementary material, which is available to authorized users.
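Quantile Normalization, one of the two best-performing methods in this comparison, forces every sample to share a single intensity distribution. The sketch below is a generic textbook version, not the authors' code: it assumes a samples-by-features array, uses the mean of the sorted samples as the common distribution, and breaks ties by sort order rather than averaging them, a simplification relative to some implementations. Cubic-Spline Normalization is not sketched.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile normalization sketch.

    X: (n_samples, n_features). Each sample (row) is replaced by the common
    reference distribution, i.e. the element-wise mean of the sorted rows,
    with values assigned back according to the row's own ranking.
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=1)                   # per-sample ranking
    mean_sorted = np.sort(X, axis=1).mean(axis=0)   # reference distribution
    out = np.empty_like(X)
    rows = np.arange(X.shape[0])[:, None]
    # The r-th smallest value of each row receives the r-th reference value.
    out[rows, order] = mean_sorted
    return out
```

After normalization every row contains exactly the same set of values, so differences between samples are carried only by the ranks of the features within each sample.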
Affiliation(s)
- Stefanie M. Kohl
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Matthias S. Klein
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Jochen Hochrein
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Peter J. Oefner
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Rainer Spang
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Wolfram Gronwald
- Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany