1
Hall LM, Hill DW, Bugden K, Cawley S, Hall LH, Chen MH, Grant DF. Development of a Reverse Phase HPLC Retention Index Model for Nontargeted Metabolomics Using Synthetic Compounds. J Chem Inf Model 2018; 58:591-604. [PMID: 29489351 DOI: 10.1021/acs.jcim.7b00496] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
The MolFind application has been developed as a nontargeted metabolomics chemometric tool to facilitate structure identification when HPLC biofluids analysis reveals a feature of interest. Here synthetic compounds are selected and measured to form the basis of a new, more accurate, HPLC retention index model for use with MolFind. We show that relatively inexpensive synthetic screening compounds with simple structures can be used to develop an artificial neural network model that is successful in making quality predictions for human metabolites. A total of 1955 compounds were obtained and measured for the model. A separate set of 202 human metabolites was used for independent validation. The new ANN model showed improved accuracy over previous models. The model, based on relatively simple compounds, was able to make quality predictions for complex compounds not similar to training data. Independent validation metabolites with feature combinations found in three or more training compounds were predicted with 97% sensitivity while metabolites with feature combinations found in less than three training compounds were predicted with >90% sensitivity. The study describes the method used to select synthetic compounds and new descriptors developed to encode the relationship between lipophilic molecular subgraphs and HPLC retention. Finally, we introduce the QRI (qualitative range of interest) modification of neural network backpropagation learning to generate models simultaneously based on quantitative and qualitative data.
Affiliation(s)
- L Mark Hall
  - Hall Associates Consulting, Quincy, Massachusetts 02170, United States
- Dennis W Hill
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
- Kelly Bugden
  - South Carolina Law Enforcement Division, Toxicology Department, Columbia, South Carolina 29210, United States
- Lowell H Hall
  - Department of Chemistry, Eastern Nazarene College, Quincy, Massachusetts 02170, United States
- Ming-Hui Chen
  - Department of Statistics, University of Connecticut, Storrs, Connecticut 06269, United States
- David F Grant
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
2
Dona AC, Kyriakides M, Scott F, Shephard EA, Varshavi D, Veselkov K, Everett JR. A guide to the identification of metabolites in NMR-based metabonomics/metabolomics experiments. Comput Struct Biotechnol J 2016; 14:135-53. [PMID: 27087910 PMCID: PMC4821453 DOI: 10.1016/j.csbj.2016.02.005] [Citation(s) in RCA: 207] [Impact Index Per Article: 23.0] [Received: 11/19/2015] [Revised: 02/16/2016] [Accepted: 02/23/2016] [Indexed: 01/14/2023]
Abstract
Metabonomics/metabolomics is an important science for the understanding of biological systems and the prediction of their behaviour, through the profiling of metabolites. Two technologies are routinely used in order to analyse metabolite profiles in biological fluids: nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS), the latter typically with hyphenation to a chromatography system such as liquid chromatography (LC), in a configuration known as LC-MS. With both NMR and MS-based detection technologies, the identification of the metabolites in the biological sample remains a significant obstacle and bottleneck. This article provides guidance on methods for metabolite identification in biological fluids using NMR spectroscopy, and is illustrated with examples from recent studies on mice.
Affiliation(s)
- Anthony C Dona
  - Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, SW7 2AZ, United Kingdom
- Michael Kyriakides
  - Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, SW7 2AZ, United Kingdom
- Flora Scott
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
- Elizabeth A Shephard
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
- Dorsa Varshavi
  - Medway Metabonomics Research Group, University of Greenwich, Chatham Maritime, Kent ME4 4TB, United Kingdom
- Kirill Veselkov
  - Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, SW7 2AZ, United Kingdom
- Jeremy R Everett
  - Medway Metabonomics Research Group, University of Greenwich, Chatham Maritime, Kent ME4 4TB, United Kingdom
3
Optimizing artificial neural network models for metabolomics and systems biology: an example using HPLC retention index data. Bioanalysis 2015; 7:939-55. [PMID: 25966007 DOI: 10.4155/bio.15.1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/30/2022]
Abstract
BACKGROUND Artificial Neural Networks (ANN) are extensively used to model 'omics' data. Different modeling methodologies and combinations of adjustable parameters influence model performance and complicate model optimization. METHODOLOGY We evaluated optimization of four ANN modeling parameters (learning rate annealing, stopping criteria, data split method, network architecture) using retention index (RI) data for 390 compounds. Models were assessed by independent validation (I-Val) using newly measured RI values for 1492 compounds. CONCLUSION The best model demonstrated an I-Val standard error of 55 RI units and was built using a Ward's clustering data split and a minimally nonlinear network architecture. Use of validation statistics for stopping and final model selection resulted in better independent validation performance than the use of test set statistics.
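For readers unfamiliar with the statistic, the independent-validation standard error quoted above can be illustrated with a minimal sketch. It assumes the standard error is the root-mean-square difference between measured and predicted RI values; the abstract does not state the exact formula or any degrees-of-freedom correction, and the function name and sample values below are hypothetical.

```python
import math


def standard_error(observed, predicted):
    """Root-mean-square residual between observed and predicted RI values.

    Assumption: no degrees-of-freedom correction is applied; the cited
    abstract does not specify the exact definition it used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical measured and predicted retention indices (RI units).
measured_ri = [500.0, 600.0, 700.0, 800.0]
predicted_ri = [520.0, 580.0, 720.0, 780.0]
se = standard_error(measured_ri, predicted_ri)
print(f"I-Val standard error: {se:.1f} RI units")
```

Computed on compounds that were never used for training or stopping, a figure of this kind is what allows models built with different data splits or architectures to be compared on equal terms.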
4
Jeanneret F, Tonoli D, Rossier MF, Saugy M, Boccard J, Rudaz S. Evaluation of steroidomics by liquid chromatography hyphenated to mass spectrometry as a powerful analytical strategy for measuring human steroid perturbations. J Chromatogr A 2015. [PMID: 26195035 DOI: 10.1016/j.chroma.2015.07.008] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Indexed: 01/19/2023]
Abstract
This review presents the evolution of steroid analytical techniques, including gas chromatography coupled to mass spectrometry (GC-MS), immunoassay (IA) and targeted liquid chromatography coupled to mass spectrometry (LC-MS), and it evaluates the potential of extended steroid profiles by a metabolomics-based approach, namely steroidomics. Steroids regulate essential biological functions including growth and reproduction, and perturbations of the steroid homeostasis can generate serious physiological issues; therefore, specific and sensitive methods have been developed to measure steroid concentrations. GC-MS measuring several steroids simultaneously was considered the first historical standard method for analysis. Steroids were then quantified by immunoassay, allowing a higher throughput; however, major drawbacks included the measurement of a single compound instead of a panel and cross-reactivity reactions. Targeted LC-MS methods with selected reaction monitoring (SRM) were then introduced for quantifying a small steroid subset without the problems of cross-reactivity. The next step was the integration of metabolomic approaches in the context of steroid analyses. As metabolomics tends to identify and quantify all the metabolites (i.e., the metabolome) in a specific system, appropriate strategies were proposed for discovering new biomarkers. Steroidomics, defined as the untargeted analysis of the steroid content in a sample, was implemented in several fields, including doping analysis, clinical studies, in vivo or in vitro toxicology assays, and more. This review discusses the current analytical methods for assessing steroid changes and compares them to steroidomics. Steroids, their pathways, their implications in diseases and the biological matrices in which they are analysed will first be described. Then, the different analytical strategies will be presented with a focus on their ability to obtain relevant information on the steroid pattern. 
The future technical requirements for improving steroid analysis will also be presented.
Affiliation(s)
- Fabienne Jeanneret
  - School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, 1211 Geneva 4, Switzerland; Human Protein Sciences Department, University of Geneva, 1211 Geneva 4, Switzerland; Swiss Centre for Applied Human Toxicology, Geneva, Switzerland
- David Tonoli
  - School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, 1211 Geneva 4, Switzerland; Human Protein Sciences Department, University of Geneva, 1211 Geneva 4, Switzerland; Swiss Centre for Applied Human Toxicology, Geneva, Switzerland
- Michel F Rossier
  - Swiss Centre for Applied Human Toxicology, Geneva, Switzerland; Institut Central (ICHV), Hôpital du Valais, Sion, Switzerland
- Martial Saugy
  - Swiss Laboratory for Doping Analyses, University Center of Legal Medicine, Epalinges, Switzerland
- Julien Boccard
  - School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, 1211 Geneva 4, Switzerland
- Serge Rudaz
  - School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, 1211 Geneva 4, Switzerland; Swiss Centre for Applied Human Toxicology, Geneva, Switzerland
5
Hamdalla MA, Ammar RA, Rajasekaran S. A molecular structure matching approach to efficient identification of endogenous mammalian biochemical structures. BMC Bioinformatics 2015; 16 Suppl 5:S11. [PMID: 25859612 PMCID: PMC4402589 DOI: 10.1186/1471-2105-16-s5-s11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
Abstract
Metabolomics is the study of small molecules, called metabolites, of a cell, tissue or organism. It is of particular interest as endogenous metabolites represent the phenotype resulting from gene expression. A major challenge in metabolomics research is the structural identification of unknown biochemical compounds in complex biofluids. In this paper we present an efficient cheminformatics tool, BioSMXpress, that uses known endogenous mammalian biochemicals and graph matching methods to identify endogenous mammalian biochemical structures in chemical structure space. The results of a comprehensive set of empirical experiments suggest that BioSMXpress identifies endogenous mammalian biochemical structures with high accuracy. BioSMXpress is 8 times faster than our previous work, BioSM, without compromising the accuracy of the predictions made. BioSMXpress is freely available at http://engr.uconn.edu/~rajasek/BioSMXpress.zip
6
Kussmann M, Morine MJ, Hager J, Sonderegger B, Kaput J. Perspective: a systems approach to diabetes research. Front Genet 2013; 4:205. [PMID: 24187547 PMCID: PMC3807566 DOI: 10.3389/fgene.2013.00205] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 04/10/2013] [Accepted: 09/24/2013] [Indexed: 12/17/2022]
Abstract
We review here the status of human type 2 diabetes studies from a genetic, epidemiological, and clinical (intervention) perspective. Most studies limit analyses to one or a few omic technologies providing data of components of physiological processes. Since all chronic diseases are multifactorial and arise from complex interactions between genetic makeup and environment, type 2 diabetes mellitus (T2DM) is a collection of sub-phenotypes resulting in high fasting glucose. The underlying gene–environment interactions that produce these classes of T2DM are imperfectly characterized. Based on assessments of the complexity of T2DM, we propose a systems biology approach to advance the understanding of origin, onset, development, prevention, and treatment of this complex disease. This systems-based strategy is based on new study design principles and the integrated application of omics technologies: we pursue longitudinal studies in which each subject is analyzed at both homeostasis and after (healthy and safe) challenges. Each enrolled subject functions thereby as their own case and control and this design avoids assigning the subjects a priori to case and control groups based on limited phenotyping. Analyses at different time points along this longitudinal investigation are performed with a comprehensive set of omics platforms. These data sets are generated in a biological context, rather than biochemical compound class-driven manner, which we term “systems omics.”
Affiliation(s)
- Martin Kussmann
  - Nestlé Institute of Health Sciences SA, Lausanne, Switzerland; Faculty of Life Sciences, Ecole Polytechnique Fédérale Lausanne, Switzerland; Faculty of Science, Aarhus University, Aarhus, Denmark
7
Menikarachchi LC, Hill DW, Hamdalla MA, Mandoiu II, Grant DF. In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics. J Chem Inf Model 2013; 53:2483-92. [PMID: 23991755 DOI: 10.1021/ci400368v] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
Current methods of structure identification in mass-spectrometry-based nontargeted metabolomics rely on matching experimentally determined features of an unknown compound to those of candidate compounds contained in biochemical databases. A major limitation of this approach is the relatively small number of compounds currently included in these databases. If the correct structure is not present in a database, it cannot be identified, and if it cannot be identified, it cannot be included in a database. Thus, there is an urgent need to augment metabolomics databases with rationally designed biochemical structures using alternative means. Here we present the In Vivo/In Silico Metabolites Database (IIMDB), a database of in silico enzymatically synthesized metabolites, to partially address this problem. The database, which is available at http://metabolomics.pharm.uconn.edu/iimdb/, includes ~23,000 known compounds (mammalian metabolites, drugs, secondary plant metabolites, and glycerophospholipids) collected from existing biochemical databases plus more than 400,000 computationally generated human phase-I and phase-II metabolites of these known compounds. IIMDB features a user-friendly web interface and a programmer-friendly RESTful web service. Ninety-five percent of the computationally generated metabolites in IIMDB were not found in any existing database. However, 21,640 were identical to compounds already listed in PubChem, HMDB, KEGG, or HumanCyc. Furthermore, the vast majority of these in silico metabolites were scored as biological using BioSM, a software program that identifies biochemical structures in chemical structure space. These results suggest that in silico biochemical synthesis represents a viable approach for significantly augmenting biochemical databases for nontargeted metabolomics applications.
Affiliation(s)
- Lochana C Menikarachchi
  - Department of Pharmaceutical Sciences, University of Connecticut, 69 North Eagleville Road, Storrs, Connecticut 06269, United States
8
Menikarachchi LC, Hamdalla MA, Hill DW, Grant DF. Chemical structure identification in metabolomics: computational modeling of experimental features. Comput Struct Biotechnol J 2013; 5:e201302005. [PMID: 24688698 PMCID: PMC3962140 DOI: 10.5936/csbj.201302005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 10/31/2012] [Revised: 12/20/2012] [Accepted: 12/24/2012] [Indexed: 11/30/2022]
Abstract
The identification of compounds in complex mixtures remains challenging despite recent advances in analytical techniques. At present, no single method can detect and quantify the vast array of compounds that might be of potential interest in metabolomics studies. High performance liquid chromatography/mass spectrometry (HPLC/MS) is often considered the analytical method of choice for analysis of biofluids. The positive identification of an unknown involves matching at least two orthogonal HPLC/MS measurements (exact mass, retention index, drift time etc.) against an authentic standard. However, due to the limited availability of authentic standards, an alternative approach involves matching known and measured features of the unknown compound with computationally predicted features for a set of candidate compounds downloaded from a chemical database. Computationally predicted features include retention index, ECOM50 (energy required to decompose 50% of a selected precursor ion in a collision induced dissociation cell), drift time, whether the unknown compound is biological or synthetic and a collision induced dissociation (CID) spectrum. Computational predictions are used to filter the initial “bin” of candidate compounds. The final output is a ranked list of candidates that best match the known and measured features. In this mini review, we discuss cheminformatics methods underlying this database search-filter identification approach.
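The database search-filter approach summarized in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Candidate` class, the window widths, and the use of a single CID match score for ranking are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_ri: float      # predicted retention index (RI units)
    predicted_ecom50: float  # predicted ECOM50 (eV)
    cid_score: float         # CID spectrum match score, higher is better


def filter_and_rank(candidates, measured_ri, measured_ecom50,
                    ri_window=100.0, ecom50_window=1.0):
    """Drop candidates whose predictions fall outside the windows, then rank.

    The window widths are placeholders; in practice they would be derived
    from the error statistics of each predictive model.
    """
    kept = [
        c for c in candidates
        if abs(c.predicted_ri - measured_ri) <= ri_window
        and abs(c.predicted_ecom50 - measured_ecom50) <= ecom50_window
    ]
    # Rank the surviving candidates by spectral match, best first.
    return sorted(kept, key=lambda c: c.cid_score, reverse=True)


# Hypothetical "bin" of candidates sharing the unknown's exact mass.
candidate_bin = [
    Candidate("A", 510.0, 3.2, 0.40),
    Candidate("B", 820.0, 3.1, 0.90),  # removed by the RI filter
    Candidate("C", 495.0, 3.4, 0.75),
    Candidate("D", 505.0, 6.0, 0.95),  # removed by the ECOM50 filter
]
ranked = filter_and_rank(candidate_bin, measured_ri=500.0, measured_ecom50=3.3)
print([c.name for c in ranked])
```

Filtering before ranking means that a candidate with a strong spectral match but an implausible retention index or ECOM50 value (B and D here) never reaches the ranked list.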
Affiliation(s)
- Lochana C Menikarachchi
  - Department of Pharmaceutical Sciences, University of Connecticut, 69 N Eagleville Rd, Storrs, CT 06269, United States
- Mai A Hamdalla
  - Department of Computer Science & Engineering, University of Connecticut, 371 Fairfield Road, Unit 2155 Storrs, CT 06269, United States
- Dennis W Hill
  - Department of Pharmaceutical Sciences, University of Connecticut, 69 N Eagleville Rd, Storrs, CT 06269, United States
- David F Grant
  - Department of Pharmaceutical Sciences, University of Connecticut, 69 N Eagleville Rd, Storrs, CT 06269, United States
9
Hamdalla MA, Mandoiu II, Hill DW, Rajasekaran S, Grant DF. BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space. J Chem Inf Model 2013; 53:601-12. [PMID: 23330685 DOI: 10.1021/ci300512q] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022]
Abstract
The structural identification of unknown biochemical compounds in complex biofluids continues to be a major challenge in metabolomics research. Using LC/MS, there are currently two major options for solving this problem: searching small biochemical databases, which often do not contain the unknown of interest or searching large chemical databases which include large numbers of nonbiochemical compounds. Searching larger chemical databases (larger chemical space) increases the odds of identifying an unknown biochemical compound, but only if nonbiochemical structures can be eliminated from consideration. In this paper we present BioSM; a cheminformatics tool that uses known endogenous mammalian biochemical compounds (as scaffolds) and graph matching methods to identify endogenous mammalian biochemical structures in chemical structure space. The results of a comprehensive set of empirical experiments suggest that BioSM identifies endogenous mammalian biochemical structures with high accuracy. In a leave-one-out cross validation experiment, BioSM correctly predicted 95% of 1388 Kyoto Encyclopedia of Genes and Genomes (KEGG) compounds as endogenous mammalian biochemicals using 1565 scaffolds. Analysis of two additional biological data sets containing 2330 human metabolites (HMDB) and 2416 plant secondary metabolites (KEGG) resulted in biochemical annotations of 89% and 72% of the compounds, respectively. When a data set of 3895 drugs (DrugBank and USAN) was tested, 48% of these structures were predicted to be biochemical. However, when a set of synthetic chemical compounds (Chembridge and Chemsynthesis databases) were examined, only 29% of the 458,207 structures were predicted to be biochemical. Moreover, BioSM predicted that 34% of 883,199 randomly selected compounds from PubChem were biochemical. We then expanded the scaffold list to 3927 biochemical compounds and reevaluated the above data sets to determine whether scaffold number influenced model performance. 
Although there were significant improvements in model sensitivity and specificity using the larger scaffold list, the data set comparison results were very similar. These results suggest that additional biochemical scaffolds will not further improve our representation of biochemical structure space and that the model is reasonably robust. BioSM provides a qualitative (yes/no) and quantitative (ranking) method for endogenous mammalian biochemical annotation of chemical space and, thus, will be useful in the identification of unknown biochemical structures in metabolomics. BioSM is freely available at http://metabolomics.pharm.uconn.edu.
Affiliation(s)
- Mai A Hamdalla
  - Computer Science and Engineering Department and Pharmaceutical Sciences Department, University of Connecticut, Connecticut, United States
10
Menikarachchi LC, Cawley S, Hill DW, Hall LM, Hall L, Lai S, Wilder J, Grant DF. MolFind: a software package enabling HPLC/MS-based identification of unknown chemical structures. Anal Chem 2012; 84:9388-94. [PMID: 23039714 PMCID: PMC3523192 DOI: 10.1021/ac302048x] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 12/30/2022]
Abstract
In this paper, we present MolFind, a highly multithreaded pipeline type software package for use as an aid in identifying chemical structures in complex biofluids and mixtures. MolFind is specifically designed for high-performance liquid chromatography/mass spectrometry (HPLC/MS) data inputs typical of metabolomics studies where structure identification is the ultimate goal. MolFind enables compound identification by matching HPLC/MS-based experimental data obtained for an unknown compound with computationally derived HPLC/MS values for candidate compounds downloaded from chemical databases such as PubChem. The downloaded "bins" consist of all compounds matching the monoisotopic molecular weight of the unknown. The computational HPLC/MS values predicted include retention index (RI), ECOM(50) (energy required to fragment 50% of a selected precursor ion), drift time, and collision induced dissociation (CID) spectrum. RI, ECOM(50), and drift-time models are used for filtering compounds downloaded from PubChem. The remaining candidates are then ranked based on CID spectra matching. Current RI and ECOM(50) models allow for the removal of about 28% of compounds from PubChem bins. Our estimates suggest that this could be improved to as much as 87% with additional chemical structures included in the computational models. Quantitative structure property relationship-based modeling of drift times showed a better correlation with experimentally determined drift times than did Mobcal cross-sectional areas. In 23 of 35 example cases, filtering PubChem bins with RI and ECOM(50) predictive models resulted in improved ranking of the unknown compounds compared to previous studies using CID spectra matching alone. In 19 of 35 examples, the correct candidate was ranked within the top 20 compounds in bins containing an average of 1635 compounds.
Affiliation(s)
- Lochana C. Menikarachchi
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- Shannon Cawley
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- Dennis W. Hill
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- L. Mark Hall
  - Hall Associates Consulting, Quincy, Massachusetts, United States
- Lowell Hall
  - Department of Chemistry, Eastern Nazarene College, Quincy, Massachusetts, United States
- Steven Lai
  - Waters Corporation, Beverly, Massachusetts, United States
- Janine Wilder
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- David F. Grant
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
11
Hall LM, Hall LH, Kertesz TM, Hill DW, Sharp TR, Oblak EZ, Dong YW, Wishart DS, Chen MH, Grant DF. Development of Ecom₅₀ and retention index models for nontargeted metabolomics: identification of 1,3-dicyclohexylurea in human serum by HPLC/mass spectrometry. J Chem Inf Model 2012; 52:1222-37. [PMID: 22489687 DOI: 10.1021/ci300092s] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 12/24/2022]
Abstract
The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sampled or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental "features" such as retention time, exact mass, and collision induced dissociation spectra. Thus, without knowing structure, biological significance remains obscure. In this study, we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features; MS Ecom₅₀ (the energy in electronvolts required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a data set of 52 compounds, Ecom₅₀ models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back-propagation artificial neural network and 33 Molconn structure descriptors. External validation gave a v² = 0.87 and standard error of 38 retention index units. 
As a test of the validity of the filtering approach, the Ecom₅₀ and retention index models, along with exact mass and collision induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids and its elemental formula was identical to 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom₅₀ and retention index predictive models can improve nontargeted metabolite structure identification using HPLC/MS derived structural features.
Affiliation(s)
- L Mark Hall
  - Hall Associates Consulting, Quincy, Massachusetts, USA
12
Hamdalla M, Grant D, Mandoiu I, Hill D, Rajasekaran S, Ammar R. The use of graph matching algorithms to identify biochemical substructures in synthetic chemical compounds: Application to metabolomics. IEEE ... INTERNATIONAL CONFERENCE ON COMPUTATIONAL ADVANCES IN BIO AND MEDICAL SCIENCES : [PROCEEDINGS]. IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL ADVANCES IN BIO AND MEDICAL SCIENCES 2012; 2012. [PMID: 26448899 DOI: 10.1109/iccabs.2012.6182637] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
Metabolomics is a rapidly growing field studying the small-molecule metabolite profile of a biological organism. Studying metabolism has a potential to contribute to biomedical research as well as drug discovery. One of the current challenges in metabolomics is the identification of unknown metabolites as existing chemical databases are incomplete. We present a novel way of utilizing known mammalian metabolites in an effort to identify unknown ones. The system relies on a mammalian scaffolds database to aid the classification process. The results show that 96% of the mammalian compounds were identified as truly mammalian in a leave-one-out experiment. The system was also tested with a random set of synthetic compounds, downloaded from ChemBridge and ChemSynthesis databases. The system was able to eliminate 54% of the set, leaving 46% of the compounds as potentially unknown mammalian metabolites.
Affiliation(s)
- Mai Hamdalla
  - Computer Science and Engineering Department, University of Connecticut, Connecticut, USA
- David Grant
  - Pharmaceutical Sciences Department, University of Connecticut, Connecticut, USA
- Ion Mandoiu
  - Computer Science and Engineering Department, University of Connecticut, Connecticut, USA
- Dennis Hill
  - Pharmaceutical Sciences Department, University of Connecticut, Connecticut, USA
- Reda Ammar
  - Computer Science and Engineering Department, University of Connecticut, Connecticut, USA
13
Kind T, Fiehn O. Advances in structure elucidation of small molecules using mass spectrometry. BIOANALYTICAL REVIEWS 2010; 2:23-60. [PMID: 21289855 PMCID: PMC3015162 DOI: 10.1007/s12566-010-0015-9] [Citation(s) in RCA: 310] [Impact Index Per Article: 20.7] [Received: 02/26/2010] [Accepted: 08/03/2010] [Indexed: 12/22/2022]
Abstract
The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for modern mass spectrometers, such as mass resolving power, mass accuracy, isotopic abundance accuracy, accurate mass multiple-stage MS(n) capability, as well as hybrid mass spectrometric and orthogonal chromatographic approaches. The latter part discusses mass spectral data handling strategies, which includes background and noise subtraction, adduct formation and detection, charge state determination, accurate mass measurements, elemental composition determinations, and complex data-dependent setups with ion maps and ion trees. The importance of mass spectral library search algorithms for tandem mass spectra and multiple-stage MS(n) mass spectra as well as mass spectral tree libraries that combine multiple-stage mass spectra are outlined. The successive chapter discusses mass spectral fragmentation pathways, biotransformation reactions and drug metabolism studies, the mass spectral simulation and generation of in silico mass spectra, expert systems for mass spectral interpretation, and the use of computational chemistry to explain gas-phase phenomena. A single chapter discusses data handling for hyphenated approaches including mass spectral deconvolution for clean mass spectra, cheminformatics approaches and structure retention relationships, and retention index predictions for gas and liquid chromatography. The last section reviews the current state of electronic data sharing of mass spectra and discusses the importance of software development for the advancement of structure elucidation of small molecules. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12566-010-0015-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tobias Kind
  - Genome Center–Metabolomics, University of California Davis, Davis, CA 95616 USA
- Oliver Fiehn
  - Genome Center–Metabolomics, University of California Davis, Davis, CA 95616 USA