1
Khondkaryan L, Tevosyan A, Navasardyan H, Khachatrian H, Tadevosyan G, Apresyan L, Chilingaryan G, Navoyan Z, Stopper H, Babayan N. Datasets Construction and Development of QSAR Models for Predicting Micronucleus In Vitro and In Vivo Assay Outcomes. Toxics 2023; 11:785. [PMID: 37755795 PMCID: PMC10537630 DOI: 10.3390/toxics11090785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/07/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023]
Abstract
In silico (quantitative) structure-activity relationship modeling is an approach that provides a fast and cost-effective alternative to assess the genotoxic potential of chemicals. However, one of the limiting factors for model development is the availability of consolidated experimental datasets. In the present study, we collected experimental data on micronuclei in vitro and in vivo, utilizing databases and conducting a PubMed search, aided by text mining using the BioBERT large language model. Chemotype enrichment analysis on the updated datasets was performed to identify enriched substructures. Additionally, chemotypes common for both endpoints were found. Five machine learning models in combination with molecular descriptors, twelve fingerprints and two data balancing techniques were applied to construct individual models. The best-performing individual models were selected for the ensemble construction. The curated final datasets consist of 981 chemicals for micronuclei in vitro and 1309 chemicals for mouse micronuclei in vivo. Out of 18 chemotypes enriched in micronuclei in vitro, only 7 were found to be relevant for in vivo prediction. The ensemble model exhibited high accuracy and sensitivity when applied to an external test set of in vitro data. A good balanced predictive performance was also achieved for the micronucleus in vivo endpoint.
Affiliation(s)
- Lusine Khondkaryan
  - Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia
  - Toxometris.ai, Yerevan 0009, Armenia
- Ani Tevosyan
  - Toxometris.ai, Yerevan 0009, Armenia
  - YerevaNN, Yerevan 0025, Armenia
- Hrant Khachatrian
  - YerevaNN, Yerevan 0025, Armenia
  - Department of Informatics and Applied Mathematics, Yerevan State University, Yerevan 0025, Armenia
- Gohar Tadevosyan
  - Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia
  - Toxometris.ai, Yerevan 0009, Armenia
- Lilit Apresyan
  - Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia
  - Toxometris.ai, Yerevan 0009, Armenia
- Zaven Navoyan
  - Toxometris.ai, Yerevan 0009, Armenia
- Helga Stopper
  - Institute of Pharmacology and Toxicology, University of Würzburg, 97078 Würzburg, Germany
- Nelly Babayan
  - Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia
  - Toxometris.ai, Yerevan 0009, Armenia
2
Tevosyan A, Khondkaryan L, Khachatrian H, Tadevosyan G, Apresyan L, Babayan N, Stopper H, Navoyan Z. Improving VAE based molecular representations for compound property prediction. J Cheminform 2022; 14:69. [PMID: 36242073 PMCID: PMC9569108 DOI: 10.1186/s13321-022-00648-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/15/2022] [Accepted: 10/01/2022] [Indexed: 11/25/2022]
Abstract
Collecting labeled data for many important tasks in chemoinformatics is time-consuming and requires expensive experiments. In recent years, machine learning has been used to learn rich representations of molecules using large scale unlabeled molecular datasets and transfer the knowledge to solve the more challenging tasks with limited datasets. Variational autoencoders are one of the tools that have been proposed to perform the transfer for both chemical property prediction and molecular generation tasks. In this work we propose a simple method to improve chemical property prediction performance of machine learning models by incorporating additional information on correlated molecular descriptors in the representations learned by variational autoencoders. We verify the method on three property prediction tasks. We explore the impact of the number of incorporated descriptors, the correlation between the descriptors and the target properties, the sizes of the datasets, etc. Finally, we show the relation between the performance of property prediction models and the distance between the property prediction dataset and the larger unlabeled dataset in the representation space.
Affiliation(s)
- Ani Tevosyan
  - YerevaNN, Charents str. 20, 0025, Yerevan, Armenia
- Lusine Khondkaryan
  - Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
- Hrant Khachatrian
  - YerevaNN, Charents str. 20, 0025, Yerevan, Armenia
  - Yerevan State University, Alex Manoogian str. 1, 0025, Yerevan, Armenia
- Gohar Tadevosyan
  - Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
- Lilit Apresyan
  - Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
- Nelly Babayan
  - Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
  - Toxometris.ai, Sarmen str. 7, 0009, Yerevan, Armenia
- Helga Stopper
  - Department of Toxicology, Institute of Pharmacology and Toxicology, University of Würzburg, Versbacher str. 9, 97078, Würzburg, Germany
- Zaven Navoyan
  - Toxometris.ai, Sarmen str. 7, 0009, Yerevan, Armenia
3
Harutyunyan H, Khachatrian H, Kale DC, Ver Steeg G, Galstyan A. Multitask learning and benchmarking with clinical time series data. Sci Data 2019; 6:96. [PMID: 31209213 PMCID: PMC6572845 DOI: 10.1038/s41597-019-0103-9] [Citation(s) in RCA: 182] [Impact Index Per Article: 36.4] [Received: 01/10/2019] [Accepted: 05/24/2019] [Indexed: 11/08/2022]
Abstract
Health care is one of the most exciting frontiers in data mining and machine learning. Successful adoption of electronic health records (EHRs) created an explosion in digital clinical data available for analysis, but progress in machine learning for healthcare research has been difficult to measure because of the absence of publicly available benchmark data sets. To address this problem, we propose four clinical prediction benchmarks using data derived from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) database. These tasks cover a range of clinical problems including modeling risk of mortality, forecasting length of stay, detecting physiologic decline, and phenotype classification. We propose strong linear and neural baselines for all four tasks and evaluate the effect of deep supervision, multitask training and data-specific architectural modifications on the performance of neural models.
Affiliation(s)
- Hrayr Harutyunyan
  - USC Information Sciences Institute, Marina del Rey, California, 90292, United States of America
- Hrant Khachatrian
  - YerevaNN, Yerevan, 0025, Armenia
  - Yerevan State University, Yerevan, 0025, Armenia
- David C Kale
  - USC Information Sciences Institute, Marina del Rey, California, 90292, United States of America
- Greg Ver Steeg
  - USC Information Sciences Institute, Marina del Rey, California, 90292, United States of America
- Aram Galstyan
  - USC Information Sciences Institute, Marina del Rey, California, 90292, United States of America