1. Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/01/2023]
Abstract
Covering: up to March 2023. Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review summarizes recent advances in ML-assisted analysis of mass spectrometry (MS) and nuclear magnetic resonance (NMR) data for establishing the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, in which ML algorithms are used to calculate spectral similarity, predict MS/MS fragments, and generate molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Furthermore, the use of ML algorithms in NMR-based structural studies of NPs is discussed from four perspectives: NMR prediction, functional group identification, structural categorization, and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and trends in the ML-based structure elucidation of NPs.
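A minimal sketch of library-matching similarity, assuming peaks are given as (m/z, intensity) pairs and binned to 1 Da. It computes the classical binned cosine score, a common baseline against which the ML-based similarity measures mentioned above are usually compared; it is not one of those ML methods, and the function names and example peaks are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into integer-indexed m/z bins of width bin_width."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Binned cosine score: 1.0 for identical relative intensities, 0.0 for no shared bins."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical query and library spectra as (m/z, intensity) pairs.
query = [(91.05, 100.0), (119.05, 40.0), (147.04, 15.0)]
library = [(91.06, 95.0), (119.04, 45.0), (165.05, 10.0)]
print(round(cosine_similarity(query, library), 3))
```

Fixed-width binning is the simplest choice; it can split peaks that straddle a bin edge, which is one reason learned similarity measures are of interest.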
Affiliation(s)
- Guilin Hu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Minghua Qiu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China

2. Middha K, Mittal A. An effective feature selection method for type 2 diabetes mellitus detection using gene expression data. Intelligent Decision Technologies 2022. [DOI: 10.3233/idt-220077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Type 2 diabetes mellitus (T2DM) is a chronic disease caused by an insulin disorder: decreased insulin secretion raises the blood glucose level, and the body cannot respond adequately to the high glucose level. People with T2DM either do not produce enough insulin or are resistant to it. The symptoms of T2DM include increased hunger, thirst, fatigue, frequent urination and blurred vision, although in some cases there are no symptoms. Commonly used treatments for T2DM are exercise, diet, insulin therapy and medication. In this paper, a hybrid deep learning scheme based on the Competitive Multi-Verse Rider Optimizer (CMVRO) is devised for T2DM detection. The hybrid scheme combines two classifiers, the Rider-based Neural Network (RideNN) and the Deep Residual Network (DRN). In addition, T2DM detection performance is compared across several feature selection approaches: Tanimoto similarity, Chi-square (Chi-2), Fisher Score (FS), Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine recursive feature elimination (SVM-RFE). Among these, the Tanimoto similarity feature selection approach performed best, with testing accuracy, sensitivity and specificity of 0.932, 0.932 and 0.914, respectively.
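The abstract does not state how Tanimoto similarity is applied to gene expression data. The sketch below assumes one plausible scheme: score each non-negative (for example, min-max scaled) feature column by its continuous Tanimoto coefficient with the binary class labels, T(x, y) = x·y / (|x|² + |y|² - x·y), and keep the top-k features. The function names and toy data are hypothetical.

```python
def tanimoto(x, y):
    """Continuous Tanimoto coefficient x.y / (|x|^2 + |y|^2 - x.y); equals Jaccard for 0/1 vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom else 0.0


def rank_features_by_tanimoto(samples, labels, top_k):
    """Indices of the top_k feature columns whose values are most Tanimoto-similar to the 0/1 labels."""
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        scores.append((tanimoto(column, labels), j))
    scores.sort(reverse=True)  # highest similarity first
    return [j for _, j in scores[:top_k]]


# Hypothetical scaled expression matrix: 4 samples (rows) x 3 genes (columns).
expression = [
    [0.9, 0.5, 0.0],
    [0.8, 0.5, 0.1],
    [0.1, 0.5, 0.9],
    [0.0, 0.5, 1.0],
]
labels = [1, 1, 0, 0]  # hypothetical: 1 = T2DM, 0 = control
print(rank_features_by_tanimoto(expression, labels, top_k=2))
```

Gene 0 tracks the labels most closely and is ranked first. A gene that is anti-correlated with the labels (gene 2) scores low under this scheme even though it is informative, which is one limitation of scoring features this way.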

3. Liu Y, De Vijlder T, Bittremieux W, Laukens K, Heyndrickx W. Current and future deep learning algorithms for tandem mass spectrometry (MS/MS)-based small molecule structure elucidation. Rapid Communications in Mass Spectrometry 2021:e9120. [PMID: 33955607 DOI: 10.1002/rcm.9120] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/13/2021] [Revised: 04/13/2021] [Accepted: 04/29/2021] [Indexed: 06/12/2023]
Abstract
RATIONALE Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation. ARCHITECTURES Recent early-stage DL frameworks mostly follow a "two-step approach" that translates MS/MS spectra to database structures after first predicting molecular descriptors. The related architectures could suffer from: (1) computational complexity because of the separate training of descriptor-specific classifiers, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, and (3) low substructure coverage and the class imbalance problem of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery, we have conceptualized and designed hypothetical DL architectures to tackle the above issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search.
CONCLUSIONS In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.
Affiliation(s)
- Wout Bittremieux
- University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), University of Antwerp, Antwerp, Belgium
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, San Diego, CA, USA
- Kris Laukens
- University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), University of Antwerp, Antwerp, Belgium

4. Gholizadeh A, Coblinski JA, Saberioon M, Ben-Dor E, Drábek O, Demattê JAM, Borůvka L, Němeček K, Chabrillat S, Dajčl J. vis-NIR and XRF Data Fusion and Feature Selection to Estimate Potentially Toxic Elements in Soil. Sensors 2021; 21:2386. [PMID: 33808185 PMCID: PMC8037398 DOI: 10.3390/s21072386] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/18/2021] [Revised: 03/20/2021] [Accepted: 03/26/2021] [Indexed: 11/16/2022]
Abstract
Soil contamination by potentially toxic elements (PTEs) is intensifying under increasing industrialization. Thus, the ability to efficiently delineate contaminated sites is crucial. Visible-near infrared (vis-NIR: 350-2500 nm) and X-ray fluorescence (XRF: 0.02-41.08 keV) spectroscopic techniques have attracted tremendous attention for the assessment of PTEs. Recently, the application of fused vis-NIR and XRF spectroscopy, which is based on the complementary effect of data fusion, is also increasing. Moreover, different data manipulation methods, including feature selection approaches, affect the prediction performance. This study investigated the feasibility of using single and fused vis-NIR and XRF spectra while exploring feature selection algorithms for the assessment of key soil PTEs. The soil samples were collected from one of the most heavily polluted areas of the Czech Republic and scanned using laboratory vis-NIR and XRF spectrometers. A univariate filter (UF) and a genetic algorithm (GA) were used to select the bands of greater importance for PTE prediction. A support vector machine (SVM) was then used to train the models using the full-range and feature-selected spectra of single sensors and their fusion. It was found that XRF spectra alone (primarily GA-selected) performed better than single vis-NIR and fused spectral data for predictions of PTEs. Moreover, the prediction models derived from the fused data set (particularly the GA-selected ones) were more accurate than those derived from the single vis-NIR spectra. In general, the results suggest that the GA-selected spectra obtained from the single XRF spectrometer (for As and Pb) and from the fusion of vis-NIR and XRF (for Pb) are promising for accurate quantitative estimation of the mentioned PTEs.
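A minimal sketch of genetic-algorithm band selection, assuming each candidate is a binary mask over spectral bands (1 = band kept). In the study, fitness would come from model performance (for example, cross-validated SVM accuracy on the selected bands); here the fitness function, band indices and GA settings (binary tournament selection, one-point crossover, bit-flip mutation, elitism) are hypothetical choices for illustration.

```python
import random


def genetic_band_selection(n_bands, fitness, pop_size=30, generations=60,
                           mutation_rate=0.05, seed=0):
    """Evolve binary band masks (1 = band kept) that maximise fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bands)]
                  for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random masks becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bands)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each band toggles with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical fitness: bands 1, 4 and 6 are "informative" (+2 each);
# every other kept band costs 1, standing in for a model-accuracy penalty.
INFORMATIVE = {1, 4, 6}


def toy_fitness(mask):
    return sum(2 if i in INFORMATIVE else -1 for i, bit in enumerate(mask) if bit)


mask, score = genetic_band_selection(10, toy_fitness)
print(mask, score)
```

The toy fitness peaks at 6 when exactly bands 1, 4 and 6 are kept. Elitism guarantees the best fitness never decreases between generations; with a real SVM-based fitness each evaluation is expensive, so caching scores per mask is a sensible addition.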
Affiliation(s)
- Asa Gholizadeh
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, 16500 Prague, Czech Republic
- Correspondence: ; Tel.: +420-224-382-633
- João A. Coblinski
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, 16500 Prague, Czech Republic
- Mohammadmehdi Saberioon
- Helmholtz Centre Potsdam, GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany
- Eyal Ben-Dor
- Remote Sensing Laboratory, Department of Geography and Human Environment, Porter School of Environment and Earth Science, Tel Aviv University, Tel Aviv 69978, Israel
- Ondřej Drábek
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, 16500 Prague, Czech Republic
- José A. M. Demattê
- Department of Soil Science, Luiz de Queiroz College of Agriculture, University of Sao Paulo, Padua Dias Avenue, 11, CP 9, Piracicaba 13418-900, Brazil
- Luboš Borůvka
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, 16500 Prague, Czech Republic
- Karel Němeček
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, 16500 Prague, Czech Republic
- Sabine Chabrillat
- Helmholtz Centre Potsdam, GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany
- Julie Dajčl
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, 16500 Prague, Czech Republic

5. Nguyen DH, Nguyen CH, Mamitsuka H. Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches. Brief Bioinform 2019; 20:2028-2043. [PMID: 30099485 PMCID: PMC6954430 DOI: 10.1093/bib/bby066] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 04/30/2018] [Revised: 06/14/2018] [Accepted: 07/03/2018] [Indexed: 12/25/2022]
Abstract
MOTIVATION Metabolomics involves the study of a large number of metabolites, which are small molecules present in biological systems. They serve many important functions, such as energy transport, signaling, acting as building blocks of cells, and inhibition/catalysis. Understanding the biochemical characteristics of metabolites is an essential part of metabolomics for expanding our knowledge of biological systems. It is also key to the development of many applications in areas such as biotechnology, biomedicine and pharmaceuticals. However, metabolite identification remains a challenging task in metabolomics, with a huge number of potentially interesting but unknown metabolites. The standard method for identifying metabolites is mass spectrometry (MS) preceded by a separation technique. Over many decades, many techniques have been proposed for MS-based metabolite identification, which can be divided into four groups: mass spectral databases, in silico fragmentation, fragmentation trees and machine learning. In this review, we thoroughly survey currently available tools for metabolite identification, focusing on in silico fragmentation and machine learning-based approaches. We also discuss in depth advanced machine learning methods that could further improve performance on this task.
Affiliation(s)
- Dai Hai Nguyen
- Department of machine learning and bioinformatics, Bioinformatics Center, Kyoto University, Uji, Japan
- Canh Hao Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan
- Department of Computer Science, Aalto University, Otakaari, FI, Finland

6. [Special Issue for Honor Award dedicating to Prof Kimito Funatsu] Similarity, Diversity - Chemoinformatics. Journal of Computer Aided Chemistry 2019. [DOI: 10.2751/jcac.20.29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]

7. Lai Z, Fiehn O. Mass spectral fragmentation of trimethylsilylated small molecules. Mass Spectrometry Reviews 2018; 37:245-257. [PMID: 27580014 DOI: 10.1002/mas.21518] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 05/02/2016] [Revised: 08/08/2016] [Accepted: 08/11/2016] [Indexed: 06/06/2023]
Abstract
Mass spectrometry-based untargeted metabolomics detects many peaks that cannot be identified. While advances have been made in automatic structure annotation for LC-electrospray-MS/MS, no open-source solutions are available for the hard electron ionization used in GC-MS. In metabolomics, most compounds bear moieties with acidic protons, for example, amino, hydroxyl, or carboxyl groups. Such functional groups increase the boiling points of metabolites too much for use in GC-MS. Hence, in GC-MS-focused metabolomics, derivatization of these groups is essential and has been employed since the 1960s. Specifically, trimethylsilylation is known as a mild and universal method for GC-MS analysis. Here, we comprehensively compile accurate mass fragmentation rules and pathways of trimethylsilylated small molecules from 80 research articles over the past 5 decades, including diagnostic fragment ions, neutral losses, and typical ion ratios, for alcohols, carboxylic acids, amines, amino acids, sugars, steroids, thiols, and phosphates. These fragmentation rules were subsequently validated by specificity and sensitivity assessments using the NIST 14 nominal mass library and a new in-house GC-QTOF MS library containing 589 accurate mass spectra. From 556 tested fragmentation patterns, 228 rules yielded true positive hits within 4 mDa mass accuracy. These rules can be applied to assign substructures for mass spectra computation and unknown identification. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 37:245-257, 2018.
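The rules above are validated at an accurate-mass tolerance of 4 mDa. A minimal sketch of checking a spectrum against such a rule, assuming a hypothetical one-entry rule table: the trimethylsilyl cation [Si(CH3)3]+ at m/z 73.0468 is used only as an illustrative diagnostic ion and is not taken from the compiled rule set; the function names are also hypothetical.

```python
TMS_ION = "trimethylsilyl cation [Si(CH3)3]+"

# Hypothetical rule table: rule name -> diagnostic fragment m/z (illustrative value).
DIAGNOSTIC_IONS = {TMS_ION: 73.0468}


def matches_within_tolerance(observed_mz, reference_mz, tol_da=0.004):
    """True if an observed accurate mass lies within +/- tol_da (default 4 mDa) of the reference."""
    return abs(observed_mz - reference_mz) <= tol_da


def find_diagnostic_ions(peaks, rules=DIAGNOSTIC_IONS, tol_da=0.004):
    """Names of rules whose diagnostic ion appears among the (m/z, intensity) peaks."""
    hits = []
    for name, ref_mz in rules.items():
        if any(matches_within_tolerance(mz, ref_mz, tol_da) for mz, _ in peaks):
            hits.append(name)
    return hits


# Hypothetical accurate-mass spectrum.
spectrum = [(73.0472, 999.0), (100.0, 50.0)]
print(find_diagnostic_ions(spectrum))
```

A real rule set would also encode neutral losses and ion ratios, and a hit on a single common ion is weak evidence on its own; the 4 mDa window is the only detail here taken from the abstract.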
Affiliation(s)
- Zijuan Lai
- West Coast Metabolomics Center, University of California Davis, Davis, CA
- Oliver Fiehn
- West Coast Metabolomics Center, University of California Davis, Davis, CA
- Biochemistry Department, King Abdulaziz University, Jeddah, Saudi Arabia

8. Vis-NIR Spectroscopy and PLS Regression with Waveband Selection for Estimating the Total C and N of Paddy Soils in Madagascar. Remote Sensing 2017. [DOI: 10.3390/rs9101081] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 11/17/2022]

9. Byrne HJ, Knief P, Keating ME, Bonnier F. Spectral pre and post processing for infrared and Raman spectroscopy of biological tissues and cells. Chem Soc Rev 2016; 45:1865-78. [DOI: 10.1039/c5cs00440c] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Indexed: 01/17/2023]
Abstract
This review presents the current understanding of the factors influencing the quality of spectra recorded and the pre-processing steps commonly employed to improve on spectral quality, as well as some of the most common techniques for classification and analysis of the spectral data for biomedical applications.
Affiliation(s)
- Hugh J. Byrne
- FOCAS Research Institute, Dublin Institute of Technology, Dublin 8, Ireland
- Peter Knief
- Department of Medical Physics and Physiology, Royal College of Surgeons in Ireland, Dublin 2, Ireland
- Mark E. Keating
- FOCAS Research Institute, Dublin Institute of Technology, Dublin 8, Ireland
- School of Physics
- Franck Bonnier
- Université François-Rabelais de Tours, Faculty of Pharmacy, EA 6295 Nanomédicaments et Nanosondes, 37200 Tours, France

10. Ma Y, Kind T, Yang D, Leon C, Fiehn O. MS2Analyzer: A software for small molecule substructure annotations from accurate tandem mass spectra. Anal Chem 2014; 86:10724-31. [PMID: 25263576 PMCID: PMC4222628 DOI: 10.1021/ac502818e] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 07/17/2014] [Accepted: 09/27/2014] [Indexed: 01/08/2023]
Abstract
Systematic analysis and interpretation of the large number of tandem mass spectra (MS/MS) obtained in metabolomics experiments is a bottleneck in discovery-driven research. MS/MS mass spectral libraries are small compared to all known small molecule structures and are often not freely available. MS2Analyzer was therefore developed to enable user-defined searches of thousands of spectra for mass spectral features such as neutral losses, m/z differences, and product and precursor ions from MS/MS spectra in MSP/MGF files. The software is freely available at http://fiehnlab.ucdavis.edu/projects/MS2Analyzer/ . As the reference query set, 147 literature-reported neutral losses and their corresponding substructures were collected. This set was tested for the accuracy of linking neutral loss analysis to substructure annotations using 19 329 accurate mass tandem mass spectra of structurally known compounds from the NIST11 MS/MS library. Validation studies showed that 13 typical neutral losses, such as acetylations, cysteine conjugates, or glycosylations, correctly annotated the associated substructures in 92.1 ± 6.4% of cases, while the absence of mass spectral features does not necessarily imply the absence of such substructures. Use of this tool has been successfully demonstrated for complex lipids in microalgae.
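A minimal sketch of a neutral-loss search of the kind described, assuming singly charged ions so that a loss is simply precursor m/z minus fragment m/z. The two-entry loss table (water, 18.0106 Da; hexose C6H10O5, 162.0528 Da), the 5 mDa tolerance and the function name are illustrative assumptions, not MS2Analyzer's 147-entry reference set or its interface.

```python
# Hypothetical neutral-loss table: name -> monoisotopic mass in Da.
NEUTRAL_LOSSES = {
    "H2O": 18.0106,
    "hexose (C6H10O5)": 162.0528,
}


def annotate_neutral_losses(precursor_mz, fragment_mzs, losses=NEUTRAL_LOSSES,
                            tol_da=0.005):
    """(loss name, fragment m/z) pairs where precursor - fragment matches a known loss.

    Assumes singly charged precursor and fragment ions.
    """
    hits = []
    for frag in fragment_mzs:
        delta = precursor_mz - frag
        for name, mass in losses.items():
            if abs(delta - mass) <= tol_da:
                hits.append((name, frag))
    return hits


# Hypothetical precursor and fragments: losses of 18.0106, 162.0528 and an unmatched 265.1028.
print(annotate_neutral_losses(465.1028, [447.0922, 303.0500, 200.0]))
```

As the abstract notes, a missing loss does not prove the substructure is absent, so a search like this supports positive annotations only.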
Affiliation(s)
- Yan Ma
- UC Davis Genome Center–Metabolomics, University of California, Davis, California 95616, United States
- Tobias Kind
- UC Davis Genome Center–Metabolomics, University of California, Davis, California 95616, United States
- Dawei Yang
- UC Davis Genome Center–Metabolomics, University of California, Davis, California 95616, United States
- SPKLOMHNM and Central Laboratory, Zhong Yuan Academy of Biological Medicine, Liaocheng University, Liaocheng People’s Hospital, Liaocheng, Shandong 252000, P. R. China
- Carlos Leon
- UC Davis Genome Center–Metabolomics, University of California, Davis, California 95616, United States
- Biomedical Engineering School, Carlos III University, Avda Universidad 30, 28911, Leganes, Madrid, Spain
- Oliver Fiehn
- UC Davis Genome Center–Metabolomics, University of California, Davis, California 95616, United States

11. Zhang L, Tang C, Cao D, Zeng Y, Tan B, Zeng M, Fan W, Xiao H, Liang Y. Strategies for structure elucidation of small molecules using gas chromatography-mass spectrometric data. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.02.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]

12. Erić S, Kalinić M, Popović A, Zloh M, Kuzmanovski I. Prediction of aqueous solubility of drug-like molecules using a novel algorithm for automatic adjustment of relative importance of descriptors implemented in counter-propagation artificial neural networks. Int J Pharm 2012; 437:232-41. [DOI: 10.1016/j.ijpharm.2012.08.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/26/2012] [Revised: 08/12/2012] [Accepted: 08/16/2012] [Indexed: 10/28/2022]

13. Neutral losses: A type of important variables in prediction of branching degree for acyclic alkenes from mass spectra. Anal Chim Acta 2012; 720:16-21. [DOI: 10.1016/j.aca.2011.11.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/25/2011] [Revised: 11/10/2011] [Accepted: 11/14/2011] [Indexed: 11/20/2022]

14. Le Roux A, Kuzmanovski I, Habrant D, Meunier S, Bischoff P, Nadal B, Thetiot-Laurent SAL, Le Gall T, Wagner A, Novič M. Design and Synthesis of New Antioxidants Predicted by the Model Developed on a Set of Pulvinic Acid Derivatives. J Chem Inf Model 2011; 51:3050-9. [DOI: 10.1021/ci200205d] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Antoine Le Roux
- Laboratoire des Systèmes Chimiques Fonctionnels, UMR 7199, Faculté de Pharmacie, 74 route du Rhin, BP 24, 67401 Illkirch-Graffenstaden, France
- Laboratoire de Radiobiologie, EA 3430, Université de Strasbourg, Centre Régional de Lutte contre le Cancer Paul Strauss, 3 rue de la Porte de l’Hôpital, BP 42, 67065 Strasbourg, France
- Igor Kuzmanovski
- Laboratory of Chemometrics, National Institute of Chemistry, Hajdrihova 19, POB 660, SI-1001, Ljubljana, Slovenia
- Institut za hemija, PMF, Univerzitet “Sv. Kiril i Metodij”, P.O. Box 162, 1001 Skopje, Macedonia
- Damien Habrant
- Laboratoire des Systèmes Chimiques Fonctionnels, UMR 7199, Faculté de Pharmacie, 74 route du Rhin, BP 24, 67401 Illkirch-Graffenstaden, France
- Stéphane Meunier
- Laboratoire des Systèmes Chimiques Fonctionnels, UMR 7199, Faculté de Pharmacie, 74 route du Rhin, BP 24, 67401 Illkirch-Graffenstaden, France
- Pierre Bischoff
- Laboratoire de Radiobiologie, EA 3430, Université de Strasbourg, Centre Régional de Lutte contre le Cancer Paul Strauss, 3 rue de la Porte de l’Hôpital, BP 42, 67065 Strasbourg, France
- Brice Nadal
- CEA Saclay, iBiTecS, Service de Chimie Bioorganique et de Marquage, 91191 Gif-sur-Yvette, France
- Thierry Le Gall
- CEA Saclay, iBiTecS, Service de Chimie Bioorganique et de Marquage, 91191 Gif-sur-Yvette, France
- Alain Wagner
- Laboratoire des Systèmes Chimiques Fonctionnels, UMR 7199, Faculté de Pharmacie, 74 route du Rhin, BP 24, 67401 Illkirch-Graffenstaden, France
- Marjana Novič
- Laboratory of Chemometrics, National Institute of Chemistry, Hajdrihova 19, POB 660, SI-1001, Ljubljana, Slovenia

15. Duraipandian S, Zheng W, Ng J, Low JJH, Ilancheran A, Huang Z. In vivo diagnosis of cervical precancer using Raman spectroscopy and genetic algorithm techniques. Analyst 2011; 136:4328-36. [PMID: 21869948 DOI: 10.1039/c1an15296c] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 11/21/2022]
Abstract
This study aimed to evaluate the clinical utility of applying near-infrared (NIR) Raman spectroscopy and genetic algorithm-partial least squares-discriminant analysis (GA-PLS-DA) to identify biomolecular changes of cervical tissues associated with dysplastic transformation during colposcopic examination. A total of 105 in vivo Raman spectra were measured from 57 cervical sites (35 normal and 22 precancer sites) of 29 patients recruited, in which 65 spectra were from normal sites, while 40 spectra were from cervical precancerous lesions (i.e., 7 low-grade CIN and 33 high-grade CIN). The GA feature selection technique incorporated with PLS was utilized to study the significant biochemical Raman bands for differentiation between normal and precancer cervical tissues. The GA-PLS-DA algorithm with double cross-validation (dCV) identified seven diagnostically significant Raman bands in the ranges of 925-935, 979-999, 1080-1090, 1240-1260, 1320-1340, 1400-1420, and 1625-1645 cm⁻¹ related to proteins, nucleic acids and lipids in tissue, and yielded a diagnostic accuracy of 82.9% (sensitivity of 72.5% (29/40) and specificity of 89.2% (58/65)) for precancer detection. The results of this exploratory study suggest that Raman spectroscopy in conjunction with GA-PLS-DA and dCV methods has the potential to provide clinically significant discrimination between normal and precancer cervical tissues at the molecular level.
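The reported figures are internally consistent. Treating the 40 precancer spectra as positives and the 65 normal spectra as negatives, with 29 true positives (so 11 false negatives) and 58 true negatives (so 7 false positives):

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{29}{40} = 72.5\% \\
\text{Specificity} &= \frac{TN}{TN + FP} = \frac{58}{65} \approx 89.2\% \\
\text{Accuracy} &= \frac{TP + TN}{TP + FN + TN + FP} = \frac{29 + 58}{105} = \frac{87}{105} \approx 82.9\%
\end{aligned}
```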
Affiliation(s)
- Shiyamala Duraipandian
- Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, 9, Engineering Drive 1, Singapore 117576

16. Stojić N, Erić S, Kuzmanovski I. Prediction of toxicity and data exploratory analysis of estrogen-active endocrine disruptors using counter-propagation artificial neural networks. J Mol Graph Model 2010; 29:450-60. [PMID: 20952233 DOI: 10.1016/j.jmgm.2010.09.001] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 07/06/2010] [Revised: 09/05/2010] [Accepted: 09/09/2010] [Indexed: 11/29/2022]
Abstract
In this work, a novel algorithm for optimizing counter-propagation artificial neural networks was used to develop a quantitative structure-activity relationship model for predicting the estrogenic activity of endocrine-disrupting chemicals. The search for the best model was performed using genetic algorithms, which were used not only to select the most suitable descriptors for modeling but also to automatically adjust their relative importance. Using our recently developed algorithm for automatic adjustment of the relative importance of the input variables, we developed simple models with very good generalization performance using only a few interpretable descriptors. One of the developed models is discussed in detail in this article. The simplicity of the chosen descriptors and their relative importance in this model enabled a detailed exploratory data analysis, which gave us insight into the structural features required for the activity of estrogenic endocrine-disrupting chemicals.
Affiliation(s)
- Nataša Stojić
- Institut za Hemija, PMF, Univerzitet "Sv. Kiril i Metodij", PO Box 162, 1001 Skopje, Macedonia

17. Hummel J, Strehmel N, Selbig J, Walther D, Kopka J. Decision tree supported substructure prediction of metabolites from GC-MS profiles. Metabolomics 2010; 6:322-333. [PMID: 20526350 PMCID: PMC2874469 DOI: 10.1007/s11306-010-0198-7] [Citation(s) in RCA: 220] [Impact Index Per Article: 15.7] [Received: 11/30/2009] [Accepted: 01/25/2010] [Indexed: 11/29/2022]
Abstract
Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, currently the majority of mass spectral tags (MSTs) remain unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information, together with data on already identified metabolites and reference substances, have been archived in the GMD. Structural feature extraction was applied to subdivide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures based on mass spectral features and RI information is demonstrated to result in highly sensitive and specific detection of substructures contained in the compounds. The underlying set of DTs can be inspected by the user and is made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to database entities.
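A single node of a decision tree can be sketched as choosing the binary spectral feature whose split gives the lowest weighted Gini impurity for a substructure label. This assumes hypothetical presence/absence flags per m/z bin and omits the retention index information that the GMD trees also use; it is a one-level stump for illustration, not the GMD implementation, and the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_binary_split(features, labels):
    """Return (feature index, weighted Gini) of the best single split.

    features: one row per spectrum, each a list of 0/1 flags (1 = ion present in that m/z bin).
    labels:   one 0/1 flag per spectrum (1 = substructure present).
    """
    n = len(labels)
    best_j, best_impurity = None, float("inf")
    for j in range(len(features[0])):
        left = [y for row, y in zip(features, labels) if row[j] == 0]
        right = [y for row, y in zip(features, labels) if row[j] == 1]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_j, best_impurity = j, impurity
    return best_j, best_impurity


# Hypothetical data: 4 spectra x 3 m/z-bin flags; flag 1 tracks the substructure exactly.
flags = [[1, 1, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
has_substructure = [1, 1, 0, 0]
print(best_binary_split(flags, has_substructure))
```

A full tree applies the same search recursively to each child subset until the nodes are pure or a depth limit is reached.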
Affiliation(s)
- Jan Hummel
- Department Prof. L. Willmitzer, Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany
- Nadine Strehmel
- Department Prof. L. Willmitzer, Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany
- Joachim Selbig
- Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Strasse 24-25, Haus 20, 14476 Potsdam-Golm, Germany
- Dirk Walther
- Department Prof. L. Willmitzer, Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany
- Joachim Kopka
- Department Prof. L. Willmitzer, Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany

18. Zhang L, Liang Y, Chen A. Selection of neutral losses and characteristic ions for mass spectral classifier. Analyst 2009; 134:1717-24. [PMID: 20448943 DOI: 10.1039/b904156g] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
Gas chromatography-mass spectrometry (GC-MS) is widely used in many fields because of its high sensitivity, high resolution and reproducibility. The major challenge of this analytical technology is the identification of components in complex samples. Mass spectral library searching is commonly employed to assist in the identification of unknown spectra. However, this widely available method only provides a hit list of candidates ranked by their numerical similarity indices. When an unknown compound has many isomers or is absent from the reference library, this approach may be of limited use. Classification of mass spectra, a technique complementary to library searching, is beneficial for computer-aided mass spectral interpretation but suffers from the fact that the variables used in the classifier are usually uninterpretable. In this study, a novel classifier is built on data mining and feature analysis. In this classifier, neutral losses are used to identify the differences between the mass spectra of alcohols and ethers in the data set. Compared with two chemometric methods, Fisher ratio linear discriminant analysis (LDA) and genetic algorithm partial least squares discriminant analysis (GA-DPLS), our method achieves better predictive ability. More importantly, this method is able to predict whether or not a compound will be classified correctly.
Affiliation(s)
- Liangxiao Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, China
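To make the neutral-loss features above concrete, the sketch below converts a peak list into nominal-mass neutral losses relative to the molecular ion and then into a presence/absence vector that a classifier could consume. It assumes the molecular-ion m/z is known, and the peak list, intensity cutoff and candidate losses are hypothetical; the authors' feature-selection step is not reproduced. (For example, a loss of 18 u corresponds to water.)

```python
from typing import Dict, Iterable, List

def neutral_losses(peaks: Dict[int, float], molecular_ion: int,
                   min_intensity: float = 0.05) -> List[int]:
    """Nominal-mass neutral losses (M - m/z) for fragment peaks below the
    molecular ion whose relative intensity reaches min_intensity."""
    return sorted(
        molecular_ion - mz
        for mz, intensity in peaks.items()
        if mz < molecular_ion and intensity >= min_intensity
    )

def loss_features(losses: Iterable[int], candidate_losses: Iterable[int]) -> List[int]:
    """Binary presence/absence vector over a chosen list of candidate losses."""
    present = set(losses)
    return [1 if loss in present else 0 for loss in candidate_losses]

# Hypothetical peak list (m/z -> relative intensity) with M = 74.
peaks = {74: 0.01, 56: 0.9, 43: 0.6, 31: 1.0, 29: 0.02}
losses = neutral_losses(peaks, molecular_ion=74)
print(losses)                               # [18, 31, 43]
print(loss_features(losses, [15, 18, 31]))  # [0, 1, 1]
```

Unlike raw m/z channels, each feature here names a chemically interpretable fragment loss, which is the interpretability argument the abstract makes.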
19
Zhang YX, Xiong Q, Yang G, Li ML. Computer-assisted prediction of the classification of the pesticide chemical structure in mass spectra. Chin J Anal Chem 2007. [DOI: 10.1016/s1872-2040(07)60088-7]
20
Kuzmanovski I, Dimitrovska-Lazova S, Aleksovska S. Classification of perovskites with supervised self-organizing maps. Anal Chim Acta 2007; 595:182-9. [PMID: 17605999 DOI: 10.1016/j.aca.2007.04.062] [Received: 09/29/2006] [Revised: 04/17/2007] [Accepted: 04/24/2007]
Abstract
In this work, supervised self-organizing maps were used for the structural classification of perovskites. For this purpose, structural data for a total of 286 perovskites of the ABO3 and/or A2BB'O6 types were collected from the literature: 130 of these are cubic, 85 orthorhombic and 71 monoclinic. For classification, the effective ionic radii of the cations, the electronegativities of the cations in the B-position, and the oxidation states of these cations were used as input variables. The parameters of the developed models, as well as the most suitable variables for classification, were selected using genetic algorithms. Two-thirds of all the compounds were used in the training phase. During optimization, model performance was checked using leave-1/10-out cross-validation. The performance of the resulting solutions was then checked on a test set composed of the remaining one-third of the compounds. The models obtained for classifying these three classes of perovskites show very good results: classification of the test set yielded only a small number of discrepancies (4.2-6.4%) between the actual crystallographic class and the class predicted by the models. These results strongly support the validity of supervised self-organizing maps for this type of classification, and the proposed procedure could be used for the crystallographic classification of perovskites into one of these three classes.
Affiliation(s)
- Igor Kuzmanovski
- Institute of Chemistry, Faculty of Natural Sciences and Mathematics, University Sts. Cyril and Methodius, P.O. Box 162, 1001 Skopje, Macedonia.
21
Bak A, Polanski J. Modeling robust QSAR 3: SOM-4D-QSAR with iterative variable elimination IVE-PLS: application to steroid, azo dye, and benzoic acid series. J Chem Inf Model 2007; 47:1469-80. [PMID: 17567123 DOI: 10.1021/ci700025m]
Abstract
In the current paper we present a receptor-independent 4D-QSAR method based on self-organizing mapping (SOM-4D-QSAR) and in particular focus on its pharmacophore mapping ability. We use a novel stochastic procedure to verify the predictive ability of the method for a large population of 4D-QSAR models generated. This systematic study was conducted on a series of benzoic acids, azo dyes, and steroids that bind aromatase. We show that the 4D-QSAR method coupled with IVE-PLS provides a very stable and predictive modeling technique. The method enables us to identify the molecular motifs contributing the most to the fiber-dye affinity and the aromatase enzyme binding activity of the steroid. However, the method appeared much less effective for the benzoic acid series, in which the efficacy was limited by electronic effects strictly correlated to a single conformer.
Affiliation(s)
- Andrzej Bak
- Department of Organic Chemistry, Institute of Chemistry, University of Silesia, PL-40-006 Katowice, Poland
22
Xiong Q, Zhang Y, Li M. Computer-assisted prediction of pesticide substructure using mass spectra. Anal Chim Acta 2007; 593:199-206. [PMID: 17543608 DOI: 10.1016/j.aca.2007.04.060] [Received: 02/12/2007] [Revised: 04/25/2007] [Accepted: 04/26/2007]
Abstract
Mass spectral classifiers for 16 substructures present in the basic structures of pesticides have been investigated to assist pesticide residue analysis as well as the screening of pesticide lead compounds. Mass spectral data are first transformed into 396 features; Genetic Algorithm-Partial Least Squares (GA-PLS) as a feature selection method and Support Vector Machine (SVM) as a validation method are then used together to obtain an optimized feature set for each substructure. Finally, the AdaBoost algorithm combined with Classification and Regression Trees (AdaBoost-CART) is trained on the optimized mass spectral feature sets to predict the presence or absence of the 16 substructures. Using the optimized feature sets instead of the original 396 features, the presence or absence of the 16 pesticide substructures is predicted with recognition success rates mostly in the range of 85-100%.
Affiliation(s)
- Qing Xiong
- College of Chemistry, Sichuan University, Chengdu 610064, China
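The boosting step described above can be sketched as discrete AdaBoost over decision stumps, with depth-1 trees standing in for the CART base learners. This is a minimal sketch under stated assumptions: spectra are already reduced to binary features, the GA-PLS/SVM feature selection is not shown, and the toy target is hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, int]  # (feature index, polarity)

def stump_predict(stump: Stump, x: List[int]) -> int:
    """Depth-1 tree on a binary feature: returns polarity if x[f] == 1, else -polarity."""
    f, polarity = stump
    return polarity if x[f] == 1 else -polarity

def adaboost(X: List[List[int]], y: List[int], rounds: int = 50) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; labels in y must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Pick the stump with the smallest weighted training error.
        best, best_err = (0, 1), float("inf")
        for f in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict((f, polarity), xi) != yi)
                if err < best_err:
                    best, best_err = (f, polarity), err
        if best_err >= 0.5:
            break  # no stump beats chance; stop boosting
        best_err = max(best_err, 1e-10)  # guard log/division when the stump is perfect
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Increase weights of misclassified samples, decrease the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble: List[Tuple[float, Stump]], x: List[int]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy target (hypothetical): "substructure present" iff at least two of three
# binary spectral features are present. No single stump fits it exactly.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(x) >= 2 else -1 for x in X]
model = adaboost(X, y)
print([predict(model, x) for x in X] == y)  # True
```

Each round re-weights the spectra the previous stumps got wrong, so later stumps focus on them; the weighted vote is what lets simple base learners fit a target none of them can fit alone.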
23
Kemsley EK, Le Gall G, Dainty JR, Watson AD, Harvey LJ, Tapp HS, Colquhoun IJ. Multivariate techniques and their application in nutrition: a metabolomics case study. Br J Nutr 2007; 98:1-14. [PMID: 17381968 DOI: 10.1017/s0007114507685365]
Abstract
The post-genomic technologies are generating vast quantities of data, but many nutritional scientists are not trained or equipped to analyse it. In high-resolution NMR spectra of urine, for example, the number and complexity of spectral features mean that computational techniques are required to interrogate and display the data in a manner intelligible to the researcher. In addition, multiple underlying biological factors often influence the data, and it is difficult to pinpoint which are having the most significant effect. This is especially true in nutritional studies, where small variations in diet can trigger multiple changes in gene expression and metabolite concentration. One class of computational tools useful for analysing such highly multivariate data includes the well-known 'whole spectrum' methods of principal component analysis and partial least squares. In this work, we present a nutritional case study in which NMR data generated from a human dietary Cu intervention study are analysed using multivariate methods, and the advantages and disadvantages of each technique are discussed. We conclude that an alternative approach, called feature subset selection, will be important in this type of work; here we used a genetic algorithm to identify the small peaks (arising from metabolites of low concentration) that were altered significantly following a dietary intervention.
24
Abstract
Quantitative Structure-Activity Relationship (QSAR) is a term describing a variety of approaches that are of substantial interest for chemistry. QSAR can be defined as indirect molecular design: the chemical compound space is sampled iteratively to optimize a certain property, thereby indirectly designing the molecular structure that has this property. However, modeling the interactions of chemical molecules in biological systems yields highly noisy data, which makes predictions a gamble. In this paper we briefly review the origins of this noise, particularly in multidimensional QSAR, and classify it as data, superimposition, molecular similarity, conformational, and molecular recognition noise. We also indicate possible robust approaches that can improve the modeling and predictive ability of QSAR, especially the self-organizing mapping of molecular objects, in particular molecular surfaces, a method brought into chemistry by Gasteiger and Zupan.
Affiliation(s)
- Jaroslaw Polanski
- Department of Organic Chemistry, Institute of Chemistry, University of Silesia, PL-40-006 Katowice, Poland.
25
Adam T, Baker RR, Zimmermann R. Investigation, by single photon ionisation (SPI)-time-of-flight mass spectrometry (TOFMS), of the effect of different cigarette-lighting devices on the chemical composition of the first cigarette puff. Anal Bioanal Chem 2007; 387:575-84. [PMID: 17171340 DOI: 10.1007/s00216-006-0945-9] [Received: 09/05/2006] [Revised: 10/12/2006] [Accepted: 10/13/2006]
Abstract
Soft single-photon ionisation (SPI)-time-of-flight mass spectrometry (TOFMS) has been used to investigate the effect of different cigarette-lighting devices on the chemical composition of the mainstream smoke from the first cigarette puff. The lighting devices examined were a Borgwaldt electric lighter, a propane/butane gas lighter, a match, a candle, and the burning zone of another cigarette. To eliminate the effect of the different masses of tobacco burnt with the different lighting methods, a normalisation procedure was performed, which enabled changes in the chemical patterns of the resulting smoke to be investigated. When another cigarette was used as the lighting device, elevated levels of ammonia and other nitrogen-containing substances were observed. These substances are abundant in the sidestream smoke of the cigarette used for lighting and would be drawn into the mainstream smoke of the cigarette being lit. In contrast, smoke from the cigarette lit by the electric lighter contained slightly higher normalised amounts of isoprene. Lighting the cigarette with a candle resulted in larger amounts of substances such as benzene, which most probably originated from the thermal decomposition of wax. The composition of the first puff of smoke obtained with the three open-flame lighting methods (gas lighter, match, and candle) was usually similar, whereas the smoke produced with the electric lighter and with a cigarette as the lighter was more distinctive. The chemical patterns generated by the different lighting devices could, however, be separated by principal component analysis. Two additional test series were also studied. In the first, the cigarette was lit with an electric lighter, then extinguished, the ash was cut off, and the cigarette was re-lit. In the second, the cigarette was heated in an oven to 80 degrees C for 5 min before being lit.
These treatments did not change the chemical composition compared with cigarettes lit in the ordinary way.
Affiliation(s)
- Thomas Adam
- Division of Analytical Chemistry, Institute of Physics, University of Augsburg, 86159, Augsburg, Germany.
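The abstract above does not say which normalisation was used to remove the effect of different burnt tobacco masses. As one common option, the sketch below assumes total-signal normalisation (each intensity divided by the summed intensity of the spectrum); the intensities are hypothetical and only the m/z values of ammonia (17), isoprene (68) and benzene (78) are taken from chemistry.

```python
from typing import Dict

def normalise_total_signal(spectrum: Dict[int, float]) -> Dict[int, float]:
    """Divide every intensity by the summed intensity so that spectra recorded
    from different amounts of burnt material become directly comparable."""
    total = sum(spectrum.values())
    if total <= 0:
        raise ValueError("spectrum must have a positive total signal")
    return {mz: intensity / total for mz, intensity in spectrum.items()}

# Hypothetical intensities at m/z 17 (ammonia), 68 (isoprene) and 78 (benzene);
# the second puff burnt twice as much tobacco but has the same chemical pattern.
puff_a = {17: 2.0, 68: 1.0, 78: 1.0}
puff_b = {17: 4.0, 68: 2.0, 78: 2.0}
print(normalise_total_signal(puff_a))  # {17: 0.5, 68: 0.25, 78: 0.25}
print(normalise_total_signal(puff_a) == normalise_total_signal(puff_b))  # True
```

After such scaling, only differences in relative composition remain, which is what a subsequent principal component analysis would then separate.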
26
Engrand C, Kissel J, Krueger FR, Martin P, Silén J, Thirkell L, Thomas R, Varmuza K. Chemometric evaluation of time-of-flight secondary ion mass spectrometry data of minerals in the frame of future in situ analyses of cometary material by COSIMA onboard ROSETTA. Rapid Commun Mass Spectrom 2006; 20:1361-8. [PMID: 16555371 DOI: 10.1002/rcm.2448]
Abstract
Chemometric data evaluation methods for time-of-flight secondary ion mass spectrometry (TOF-SIMS) have been tested for the characterization and classification of minerals. Potential applications of these methods include the data expected from cometary material to be measured by the COSIMA instrument onboard the ESA mission ROSETTA in 2014. Samples of the minerals serpentine, enstatite, olivine, and talc were used as proxies for minerals existing in extraterrestrial matter. High-mass-resolution TOF-SIMS data allow the selection of peaks from inorganic ions relevant to minerals. Multivariate cluster analysis of peak intensity data by principal component analysis and the new method CORICO showed good separation of the mineral classes. Classification by k-nearest neighbors (KNN) or binary decision trees (CART method) resulted in more than 90% correct class assignments in leave-one-out cross-validation.
Affiliation(s)
- Cecile Engrand
- Centre de Spectrométrie Nucléaire et de Spectrométrie de Masse, CNRS-Univ. Paris XI, F-91405 Orsay Campus, France
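The KNN classification with leave-one-out cross-validation mentioned above can be sketched in a few lines: each sample is held out in turn, classified by a majority vote of its k nearest neighbours among the rest, and the fraction of correct assignments is reported. The peak-intensity vectors and class labels below are hypothetical, and Euclidean distance is an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence

def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(X: List[Sequence[float]], y: List[str], k: int = 3) -> float:
    """Hold out each sample once, classify it with the rest, return the hit rate."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)

# Hypothetical normalised peak intensities (three ions) for two mineral classes.
X = [[0.9, 0.1, 0.2], [1.0, 0.0, 0.1], [0.8, 0.2, 0.3],
     [0.1, 0.9, 0.2], [0.0, 1.0, 0.3], [0.2, 0.8, 0.1]]
y = ["olivine"] * 3 + ["talc"] * 3
print(leave_one_out_accuracy(X, y, k=3))  # 1.0
```

Leave-one-out is a natural choice for small reference sets like mineral proxies, since every sample serves once as a test case without shrinking the training set much.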
27
Pena-Mendez E, Gajdosova D, Novotna K, Prosek P, Havel J. Mass spectrometry of humic substances of different origin including those from Antarctica. A comparative study. Talanta 2005; 67:880-90. [DOI: 10.1016/j.talanta.2005.03.032] [Received: 10/27/2004] [Revised: 02/22/2005] [Accepted: 03/31/2005]
28
A generalized boosting algorithm and its application to two-class chemical classification problem. Anal Chim Acta 2005. [DOI: 10.1016/j.aca.2005.04.043]
29
Kuzmanovski I, Trpkovska M, Šoptrajanov B. Optimization of supervised self-organizing maps with genetic algorithms for classification of urinary calculi. J Mol Struct 2005. [DOI: 10.1016/j.molstruc.2005.01.059]
30
Abstract
The applicability of genetic algorithms for solving multicomponent analyses is systematically examined. A genetic algorithm (GA) following the basic proposal of Goldberg is implemented in a straightforward manner to simulate multicomponent analyses analogous to the well-established UV-vis or IR methods, especially multicomponent regression. The main focus of the study is to investigate the behavior of the genetic algorithm and compare it with the well-known behavior of multicomponent regression. A remarkable difference between the two methods is that the genetic algorithm does not need any calibration procedure because it is a pure search method. As important features of multicomponent systems, the degree of signal overlap (selectivity), the behavior of systems with known and unknown component numbers and identities, and linear as well as nonlinear relationships between the analytical signal and concentration are varied within the simulations. As with multicomponent regression, recovering concentrations with a genetic algorithm is of limited applicability, except for systems with a low degree of signal overlap. On the other hand, recovery of a probe spectrum in the analytical process always gives satisfactory results, independent of the features of the probe system. The genetic algorithm clearly shows autoadaptive behavior in probe spectrum recovery: the identity and amount of the resulting components may differ dramatically from the given probe, although the resulting spectrum is nearly the same. In such cases, the resulting component mixture can be interpreted as an imitation of the probe. Like probe spectra, theoretically designed spectra can also be autoadapted by genetic algorithms; the only limitation is that the desired spectrum must, of course, lie within the search space defined by the involved components. Furthermore, a spectral signal is only one property of a chemical compound or mixture.
Because of the nonlinear search characteristic of genetic algorithms, any other chemical or physical property can also be treated as a desired property. The study therefore concludes that an old challenge of applied chemistry, namely the development of new chemical products with desired properties, appears to be attainable with genetic algorithms.
Affiliation(s)
- Peter Zinn
- Ruhr-Universität Bochum, Lehrstuhl für Analytische Chemie, 44780 Bochum, Germany.
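Probe-spectrum recovery as described above can be sketched as a GA that searches non-negative concentrations whose linear mixture of component spectra matches the probe, with fitness taken as the sum of squared residuals. This sketch uses a real-coded GA (truncation selection, uniform crossover, Gaussian mutation), a common variant that is not necessarily identical to Goldberg's binary-coded scheme used by the author; the component spectra, probe and GA settings are hypothetical.

```python
import random
from typing import List, Sequence

def mix(components: Sequence[Sequence[float]], conc: Sequence[float]) -> List[float]:
    """Linear mixture spectrum: sum_i conc[i] * components[i][j] at each channel j."""
    return [sum(c * comp[j] for c, comp in zip(conc, components))
            for j in range(len(components[0]))]

def sse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ga_fit(components, probe, pop_size=40, generations=200,
           mutation_sd=0.05, seed=1) -> List[float]:
    """Search non-negative concentrations whose mixture spectrum matches probe."""
    rng = random.Random(seed)
    n = len(components)

    def error(ind: List[float]) -> float:  # lower is fitter
        return sse(mix(components, ind), probe)

    pop = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]                  # uniform crossover
            child = [max(0.0, g + rng.gauss(0.0, mutation_sd)) for g in child]  # Gaussian mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=error)

# Hypothetical 3-channel component spectra and a probe mixed as 0.3 A + 0.6 B.
components = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
probe = mix(components, [0.3, 0.6])
best = ga_fit(components, probe)
print([round(c, 2) for c in best], sse(mix(components, best), probe))
```

No calibration step appears anywhere: the GA only compares candidate mixture spectra against the probe, which matches the abstract's point that the method is a pure search. With strongly overlapping components, many concentration vectors give nearly the same spectrum, which is the "imitation" effect the abstract describes.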
31
De Maesschalck R, Van den Kerkhof T. Implementation of a simple semi-quantitative near-infrared method for the classification of clinical trial tablets. J Pharm Biomed Anal 2005; 37:109-14. [PMID: 15664749 DOI: 10.1016/j.jpba.2004.10.016] [Received: 07/15/2004] [Revised: 09/30/2004] [Accepted: 10/10/2004]
Abstract
Near-infrared transmission spectroscopy combined with chemometric methods can be applied for identity confirmation of double-blind clinical trial tablets. Samples from two clinical studies investigating the dose and placebo effect of an experimental drug were studied. The identity of the blistered tablets was checked using partial least squares beta classification (PLSBC) applied to their NIR transmission spectra. PLSBC is a new supervised classification approach based on partial least squares (PLS) regression combined with beta-error-driven class boundaries. It can limit the probability of misclassification to a known value, thereby giving the method developer a tool for deciding whether the NIR spectra of the different tablet strengths are specific enough to obtain a robust classification model. The approach has the advantage of being applicable with most commercially available near-infrared spectroscopy (NIRS) instrumentation software, and it can be applied in a GMP environment, since validation according to the ICH Q2A and Q2B guidelines on analytical method validation is fast and relatively easy.
Affiliation(s)
- R De Maesschalck
- Janssen Pharmaceutica NV, Pharmaceutical Research and Development, Global Analytical Development, Turnhoutseweg 30, B-2340 Beerse, Belgium.
32
Demuth W, Karlovits M, Varmuza K. Spectral similarity versus structural similarity: mass spectrometry. Anal Chim Acta 2004. [DOI: 10.1016/j.aca.2004.04.014]
33
Dieterle F, Busche S, Gauglitz G. Different approaches to multivariate calibration of nonlinear sensor data. Anal Bioanal Chem 2004; 380:383-96. [PMID: 15156303 DOI: 10.1007/s00216-004-2652-8] [Received: 02/01/2004] [Revised: 04/13/2004] [Accepted: 04/21/2004]
Abstract
In this study, different approaches to the multivariate calibration of the vapors of two refrigerants are reported. Because the relationships between the time-resolved sensor signals and the analyte concentrations are nonlinear, the widely used partial least-squares regression (PLS) fails. Therefore, several methods known to be able to deal with nonlinearities in data are used. First, the Box-Cox transformation, which transforms the dependent variables nonlinearly, was applied. The second approach, implicit nonlinear PLS regression, tries to account for nonlinearities by adding squared terms of the independent variables to the original independent variables. The third approach, quadratic PLS (QPLS), uses a nonlinear quadratic inner relationship in the model instead of the linear relationship used by PLS. Tree algorithms are also used, which split a nonlinear problem into smaller subproblems that are modeled using linear methods or discrete values. Finally, neural networks, which are able to model any relationship, are applied. Special implementations, such as genetic algorithms with neural networks and growing neural networks, are also used to prevent overfitting. Among the fast and simpler algorithms, QPLS shows good results. Different implementations of neural networks show excellent results, with the most sophisticated and computing-intensive algorithms (growing neural networks) performing best. Thus, the optimal method for the data set presented is a compromise between quality of calibration and complexity of the algorithm.
Affiliation(s)
- Frank Dieterle
- Institute of Physical and Theoretical Chemistry, Auf der Morgenstelle 8, 72076, Tübingen, Germany
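The Box-Cox transformation mentioned above maps a strictly positive variable v to (v^λ - 1)/λ for λ ≠ 0 and to ln v for λ = 0. The sketch below applies it to a hypothetical dependent variable (concentrations) and inverts it back; λ = 0.5 is an arbitrary illustrative choice, and how the authors selected λ (in practice often by maximum likelihood) is not shown.

```python
import math
from typing import List

def box_cox(values: List[float], lam: float) -> List[float]:
    """Box-Cox transform: (v**lam - 1) / lam for lam != 0, log(v) for lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def inverse_box_cox(values: List[float], lam: float) -> List[float]:
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(t) for t in values]
    return [(lam * t + 1) ** (1 / lam) for t in values]

# Hypothetical analyte concentrations (dependent variable) and a chosen lambda.
conc = [1.0, 4.0, 9.0]
transformed = box_cox(conc, 0.5)
print(transformed)                        # [0.0, 2.0, 4.0]
print(inverse_box_cox(transformed, 0.5))  # [1.0, 4.0, 9.0]
```

A linear calibration (e.g. PLS) is then fitted to the transformed concentrations, and its predictions are mapped back with the inverse transform.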
34
Dieterle F, Busche S, Gauglitz G. Growing neural networks for a multivariate calibration and variable selection of time-resolved measurements. Anal Chim Acta 2003. [DOI: 10.1016/s0003-2670(03)00338-6]
35
Selection of quasi-optimal inputs in chemometrics modeling by artificial neural network analysis. Anal Chim Acta 2003. [DOI: 10.1016/s0003-2670(03)00349-0]
36
Johnson HE, Broadhurst D, Goodacre R, Smith AR. Metabolic fingerprinting of salt-stressed tomatoes. Phytochemistry 2003; 62:919-28. [PMID: 12590119 DOI: 10.1016/s0031-9422(02)00722-7]
Abstract
The aim of this study was to adopt the approach of metabolic fingerprinting through the use of Fourier transform infrared (FT-IR) spectroscopy and chemometrics to study the effect of salinity on tomato fruit. Two varieties of tomato were studied, Edkawy and Simge F1. Salinity treatment significantly reduced the relative growth rate of Simge F1 but had no significant effect on that of Edkawy. In both tomato varieties, salt treatment significantly reduced mean fruit fresh weight and size class but had no significant effect on total fruit number. Marketable yield was, however, reduced in both varieties due to the occurrence of blossom end rot in response to salinity. Whole fruit flesh extracts from control and salt-grown tomatoes were analysed using FT-IR spectroscopy. Each sample spectrum contained 882 variables (absorbance values at different wavenumbers), making visual analysis difficult; therefore, machine learning methods were applied. The unsupervised clustering method, principal component analysis (PCA), showed no discrimination between the control and salt-treated fruit for either variety. The supervised method, discriminant function analysis (DFA), was able to classify control and salt-treated fruit in both varieties. Genetic algorithms (GA) were applied to identify discriminatory regions within the FT-IR spectra important for fruit classification. When classifying the whole data set, the GA models classified control and salt-treated fruit with a typical error of 9% in Edkawy and 5% in Simge F1. Key regions were identified within the spectra corresponding to nitrile-containing compounds and amino radicals. The application of GA enabled the identification of functional groups of potential importance in relation to the response of tomato to salinity.
Affiliation(s)
- Helen E Johnson
- Institute of Biological Sciences, Cledwyn Building, University of Wales, Aberystwyth, Ceredigion, SY23 3DD, Wales, UK.
37
Du Y, Liang Y, Li B, Xu C. Orthogonalization of block variables by subspace-projection for quantitative structure property relationship (QSPR) research. J Chem Inf Comput Sci 2002; 42:993-1003. [PMID: 12376986 DOI: 10.1021/ci020283+]
Abstract
A subspace-projection method is developed to construct orthogonal block variables, which originate from series of topological indices or quantum chemical parameters. With the help of canonical correlation analysis, the orthogonal block variables were used to establish a structure-retention index correlation model. Regressing the retention index on only a few new orthogonal variables obtained by canonical correlation analysis significantly improves both the fitting and the predictive ability of the correlation model. Moreover, the quantitative intercorrelation between different block variables of topological indices can also be evaluated with the subspace-projection technique proposed in this work.
Affiliation(s)
- Yiping Du
- Institute of Chemometrics and Chemical Sensing Technology, Hunan University, Changsha 410082, P R China
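The core of a subspace projection like the one described above is removing from a variable its component in the span of another descriptor block, leaving a variable orthogonal to that block. The sketch below does this generically with modified Gram-Schmidt; it is an assumption-laden illustration, not the authors' procedure, and it omits their canonical correlation step. Variables are column vectors over compounds, and the data are hypothetical.

```python
import math
from typing import List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors: List[Sequence[float]], tol: float = 1e-12) -> List[List[float]]:
    """Modified Gram-Schmidt; drops vectors that are (numerically) linearly dependent."""
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = dot(w, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def project_out(block_a: List[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Remove from v its component lying in the subspace spanned by block_a,
    returning a variable orthogonal to every column of block_a."""
    w = list(v)
    for b in orthonormal_basis(block_a):
        coef = dot(w, b)
        w = [wi - coef * bi for wi, bi in zip(w, b)]
    return w

# Hypothetical: a descriptor block with two variables measured on 4 compounds,
# and a variable from a second block to be orthogonalised against it.
block_a = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
v = [2.0, 3.0, 4.0, 5.0]
v_perp = project_out(block_a, v)
print(v_perp, [dot(v_perp, a) for a in block_a])
```

Because the orthogonalised variable carries no information already present in the first block, its contribution to a later regression can be assessed separately from that block.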