Tay DWP, Yeo NZX, Adaikkappan K, Lim YH, Ang SJ. 67 million natural product-like compound database generated via molecular language processing. Sci Data 2023; 10:296. [PMID: 37208372 DOI: 10.1038/s41597-023-02207-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 04/21/2023] [Indexed: 05/21/2023]
Abstract
Natural products are a rich resource of bioactive compounds for valuable applications across multiple fields such as food, agriculture, and medicine. For natural product discovery, high throughput in silico screening offers a cost-effective alternative to traditional resource-heavy assay-guided exploration of structurally novel chemical space. In this data descriptor, we report a characterized database of 67,064,204 natural product-like molecules generated using a recurrent neural network trained on known natural products, demonstrating a significant 165-fold expansion in library size over the approximately 400,000 known natural products. This study highlights the potential of using deep generative models to explore novel natural product chemical space for high throughput in silico discovery.
Affiliation(s)
- Dillon W P Tay
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
- Naythan Z X Yeo
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
  - Hwa Chong Institution, 661 Bukit Timah Road, Singapore, 269734, Republic of Singapore.
- Krishnan Adaikkappan
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
  - National Junior College, 37 Hillcrest Road, Singapore, 288913, Republic of Singapore.
- Yee Hwee Lim
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore, 117597, Republic of Singapore.
- Shi Jun Ang
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore.
  - Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore, 138632, Republic of Singapore.