1
Lea-Smith DJ, Hassard F, Coulon F, Partridge N, Horsfall L, Parker KDJ, Smith RDJ, McCarthy RR, McKew B, Gutierrez T, Kumar V, Dotro G, Yang Z, Krasnogor N. Engineering biology applications for environmental solutions: potential and challenges. Nat Commun 2025; 16:3538. [PMID: 40229265 PMCID: PMC11997111 DOI: 10.1038/s41467-025-58492-0] [Received: 07/24/2024] [Accepted: 03/24/2025] [Indexed: 04/16/2025]
Abstract
Engineering biology applies synthetic biology to address global environmental challenges like bioremediation, biosequestration, pollutant monitoring, and resource recovery. This perspective outlines innovations in engineering biology, its integration with other technologies (e.g., nanotechnology, IoT, AI), and commercial ventures leveraging these advancements. We also discuss commercialisation and scaling challenges, biosafety and biosecurity considerations including biocontainment strategies, social and political dimensions, and governance issues that must be addressed for successful real-world implementation. Finally, we highlight future perspectives and propose strategies to overcome existing hurdles, aiming to accelerate the adoption of engineering biology for environmental solutions.
Affiliation(s)
- Natalio Krasnogor
- GitLife Biotech Ltd, Newcastle Upon Tyne, UK.
- Newcastle University, Newcastle upon Tyne, UK.
2
Mo W, Vaiana CA, Myers CJ. The need for adaptability in detection, characterization, and attribution of biosecurity threats. Nat Commun 2024; 15:10699. [PMID: 39702312 PMCID: PMC11659417 DOI: 10.1038/s41467-024-55436-y] [Received: 08/15/2024] [Accepted: 12/12/2024] [Indexed: 12/21/2024]
Abstract
Modern biotechnology necessitates robust biosecurity protocols to address the risk of engineered biological threats. Current efforts focus on screening DNA and rejecting the synthesis of dangerous elements but face technical and logistical barriers. Screening should integrate into a broader strategy that addresses threats at multiple stages of development and deployment. The success of this approach hinges upon reliable detection, characterization, and attribution of engineered DNA. Recent advances notably aid the potential to both develop threats and analyze them. However, further work is needed to translate developments into biosecurity applications. This work reviews cutting-edge methods for DNA analysis and recommends avenues to improve biosecurity in an adaptable manner.
Affiliation(s)
- William Mo
- Draper Scholar, The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA, USA
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, 1111 Engineering Dr, Boulder, CO, USA
- Christopher A Vaiana
- The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA, USA
- Chris J Myers
- Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, 1111 Engineering Dr, Boulder, CO, USA.
3
Berezin CT, Peccoud S, Kar DM, Peccoud J. Cryptographic approaches to authenticating synthetic DNA sequences. Trends Biotechnol 2024; 42:1002-1016. [PMID: 38418329 PMCID: PMC11309913 DOI: 10.1016/j.tibtech.2024.02.002] [Received: 10/21/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/01/2024]
Abstract
In a bioeconomy that relies on synthetic DNA sequences, the ability to ensure their authenticity is critical. DNA watermarks can encode identifying data in short sequences and can be combined with error correction and encryption protocols to ensure that sequences are robust to errors and securely communicated. New digital signature techniques allow for public verification that a sequence has not been modified and can contain sufficient information for synthetic DNA to be self-documenting. In translating these techniques from bacteria to more complex genetically modified organisms (GMOs), special considerations must be made to allow for public verification of these products. We argue that these approaches should be widely implemented to assert authorship, increase the traceability, and detect the unauthorized use of synthetic DNA.
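The watermark-plus-authentication idea summarized in this abstract can be illustrated with a minimal sketch. It rests on assumptions that are not from the paper: a simple 2-bits-per-base mapping for the identifying data, and an HMAC (a shared-key message authentication code from Python's standard library) standing in for the public-key digital signatures the authors discuss. Unlike a true digital signature, an HMAC cannot be verified publicly without sharing the key, and the error correction and encryption layers are omitted. All function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical 2-bits-per-base mapping (00->A, 01->C, 10->G, 11->T). Real DNA
# watermarks also avoid biologically problematic sequences; this sketch does not.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Decode bases back to bytes; raises ValueError on bad length or characters."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    values = [BASES.index(base) for base in seq]  # ValueError if not A/C/G/T
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )


def _mac(payload: str, key: bytes, tag_bytes: int) -> bytes:
    """Truncated HMAC-SHA256 over the payload (symmetric stand-in for a signature)."""
    if tag_bytes < 1:
        raise ValueError("tag_bytes must be at least 1")
    return hmac.new(key, payload.encode("ascii"), hashlib.sha256).digest()[:tag_bytes]


def tag_sequence(payload: str, key: bytes, tag_bytes: int = 8) -> str:
    """Append the DNA-encoded authentication tag to the payload sequence."""
    return payload + bytes_to_dna(_mac(payload, key, tag_bytes))


def verify_sequence(tagged: str, key: bytes, tag_bytes: int = 8) -> bool:
    """Split off the tag, recompute it over the payload, compare in constant time."""
    tag_len = 4 * tag_bytes
    if tag_bytes < 1 or len(tagged) < tag_len:
        return False
    payload, tag_dna = tagged[:-tag_len], tagged[-tag_len:]
    return hmac.compare_digest(dna_to_bytes(tag_dna), _mac(payload, key, tag_bytes))


if __name__ == "__main__":
    key = b"example-shared-key"  # hypothetical key
    construct = "ATGGCTAGC" + bytes_to_dna(b"LAB42")  # insert + hypothetical ID watermark
    signed = tag_sequence(construct, key)
    print(verify_sequence(signed, key))  # True
    mutated = "T" + signed[1:]  # single substitution at position 0 (A -> T)
    print(verify_sequence(mutated, key))  # False (with overwhelming probability)
```

Any single-base change to the construct or to the tag makes verification fail, which is the "has not been modified" property the abstract describes; a public-key signature scheme would add verification by parties who do not hold the signing key.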
Affiliation(s)
- Casey-Tyler Berezin
- Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA
- Samuel Peccoud
- Department of Electrical Engineering, Colorado State University, Fort Collins, CO, USA
- Diptendu M Kar
- Department of Computer Sciences, Northeastern University, Boston, MA, USA
- Jean Peccoud
- Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA; Department of Computer Sciences, Colorado State University, Fort Collins, CO, USA; School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA; Department of Systems Engineering, Colorado State University, Fort Collins, CO, USA.
4
Tay AP, Didi K, Wickramarachchi A, Bauer DC, Wilson LOW, Maselko M. Synsor: a tool for alignment-free detection of engineered DNA sequences. Front Bioeng Biotechnol 2024; 12:1375626. [PMID: 39070163 PMCID: PMC11272466 DOI: 10.3389/fbioe.2024.1375626] [Received: 01/24/2024] [Accepted: 06/18/2024] [Indexed: 07/30/2024]
Abstract
DNA sequences of nearly any desired composition, length, and function can be synthesized to alter the biology of an organism for purposes ranging from the bioproduction of therapeutic compounds to invasive pest control. Yet despite offering many benefits, engineered DNA poses a risk because of its possible misuse or abuse by malicious actors, or its unintentional introduction into the environment. Monitoring the presence of engineered DNA in biological or environmental systems is therefore crucial for routine and timely detection of emerging biological threats, and for improving public acceptance of genetic technologies. To address this, we developed Synsor, a tool for identifying engineered DNA sequences in high-throughput sequencing data. Synsor leverages the k-mer signature differences between naturally occurring and engineered DNA sequences and uses an artificial neural network to classify whether a DNA sequence is natural or engineered. By querying suspected sequences against the model, Synsor can identify sequences that are likely to have been engineered. Using natural plasmid and engineered vector sequences, we showed that Synsor identifies engineered DNA with >99% accuracy. We demonstrate how Synsor can be used to detect potential genetically engineered organisms and locate where engineered DNA is being introduced into the environment by analysing genomic and metagenomic data from yeast and wastewater samples, respectively. Synsor is therefore a powerful tool that will streamline the process of identifying engineered DNA in poorly characterized biological or environmental systems, thereby allowing for enhanced monitoring of emerging biological threats.
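The alignment-free representation Synsor relies on can be sketched as a k-mer frequency profile. This is an illustration under stated assumptions, not Synsor's code: k = 4 is an arbitrary choice, and the classifier below is a hypothetical nearest-centroid rule using cosine similarity, swapped in for the artificial neural network the authors actually use.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequencies of all 4**k k-mers in lexicographic (ACGT) order.

    k-mers containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i : i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_label(seq: str, references: dict[str, list[str]], k: int = 4) -> str:
    """Hypothetical stand-in classifier: pick the label whose mean reference
    k-mer profile is most cosine-similar to the query's profile."""
    query = kmer_profile(seq, k)
    best_label, best_score = "", -1.0
    for label, seqs in references.items():
        if not seqs:
            continue  # skip labels without reference sequences
        profiles = [kmer_profile(s, k) for s in seqs]
        centroid = [sum(col) / len(profiles) for col in zip(*profiles)]
        score = cosine(query, centroid)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, with toy references `{"class_a": ["AAAAAAAA"], "class_b": ["CGCGCGCG"]}` and `k=2`, the query `"AAAAAAAAAA"` is assigned to `"class_a"`. A neural network, as in Synsor, can learn non-linear combinations of k-mer frequencies that a single centroid comparison cannot capture.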
Affiliation(s)
- Aidan P. Tay
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Kieran Didi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Anuradha Wickramarachchi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Denis C. Bauer
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Laurence O. W. Wilson
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Maciej Maselko
- Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
5
Adler A, Bader JS, Basnight B, Booth BW, Cai J, Cho E, Collins JH, Ge Y, Grothendieck J, Keating K, Marshall T, Persikov A, Scott H, Siegelmann R, Singh M, Taggart A, Toll B, Wan KH, Wyschogrod D, Yaman F, Young EM, Celniker SE, Roehner N. Ensemble Detection of DNA Engineering Signatures. ACS Synth Biol 2024; 13:1105-1115. [PMID: 38468602 DOI: 10.1021/acssynbio.3c00398] [Indexed: 03/13/2024]
Abstract
Synthetic biology is creating genetically engineered organisms at an increasing rate for many potentially valuable applications, but this potential comes with the risk of misuse or accidental release. To begin to address this issue, we have developed a system called GUARDIAN that can automatically detect signatures of engineering in DNA sequencing data, and we have conducted a blinded test of this system using a curated Test and Evaluation (T&E) data set. GUARDIAN uses an ensemble approach based on the guiding principle that no single approach is likely to be able to detect engineering with perfect accuracy. Critically, ensembling enables GUARDIAN to detect sequence inserts in 13 target organisms with a high degree of specificity that requires no subject matter expert (SME) review.
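The ensemble principle described here (no single detector is trusted on its own) can be sketched as threshold voting over independent detectors. This is a hypothetical illustration, not GUARDIAN's design: the detectors are toy signals (two restriction-site matches and a GC-content bound with assumed limits), and the vote threshold is an assumption.

```python
from typing import Callable, Sequence

Detector = Callable[[str], bool]


def has_ecori_site(seq: str) -> bool:
    """Toy signal: EcoRI recognition site (GAATTC); also occurs in natural DNA."""
    return "GAATTC" in seq.upper()


def has_bamhi_site(seq: str) -> bool:
    """Toy signal: BamHI recognition site (GGATCC); also occurs in natural DNA."""
    return "GGATCC" in seq.upper()


def gc_outlier(seq: str, low: float = 0.3, high: float = 0.7) -> bool:
    """Toy signal: GC fraction outside an assumed [low, high] range."""
    s = seq.upper()
    if not s:
        return False
    gc = (s.count("G") + s.count("C")) / len(s)
    return gc < low or gc > high


def ensemble_flag(seq: str, detectors: Sequence[Detector], min_votes: int) -> bool:
    """Flag seq as engineered when at least min_votes detectors fire."""
    if not 1 <= min_votes <= len(detectors):
        raise ValueError("min_votes must be between 1 and len(detectors)")
    votes = sum(1 for detect in detectors if detect(seq))
    return votes >= min_votes


DETECTORS: list[Detector] = [has_ecori_site, has_bamhi_site, gc_outlier]
```

With `min_votes=2`, `"GAATTCGGATCC"` is flagged (two site matches), while `"ATATATATAT"` is not (only the GC bound fires). Raising the threshold trades sensitivity for specificity, which is the lever an ensemble offers for reducing the need for expert review.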
Affiliation(s)
- Aaron Adler
- Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Joel S Bader
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Brian Basnight
- Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Benjamin W Booth
- Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Jitong Cai
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Elizabeth Cho
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Joseph H Collins
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Yuchen Ge
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Kevin Keating
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Tyler Marshall
- Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Anton Persikov
- Department of Computer Science, Princeton University, Princeton, New Jersey 08544, United States
- Helen Scott
- Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Roy Siegelmann
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, New Jersey 08544, United States
- Benjamin Toll
- Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Kenneth H Wan
- Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Fusun Yaman
- Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Eric M Young
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Susan E Celniker
- Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
6
Wijethilake N, Anandakumar M, Zheng C, So PTC, Yildirim M, Wadduwage DN. DEEP-squared: deep learning powered De-scattering with Excitation Patterning. Light Sci Appl 2023; 12:228. [PMID: 37704619 PMCID: PMC10499829 DOI: 10.1038/s41377-023-01248-6] [Received: 03/01/2023] [Revised: 07/21/2023] [Accepted: 07/29/2023] [Indexed: 09/15/2023]
Abstract
Limited throughput is a key challenge in in vivo deep tissue imaging using nonlinear optical microscopy. Point scanning multiphoton microscopy, the current gold standard, is slow especially compared to the widefield imaging modalities used for optically cleared or thin specimens. We recently introduced "De-scattering with Excitation Patterning" or "DEEP" as a widefield alternative to point-scanning geometries. Using patterned multiphoton excitation, DEEP encodes spatial information inside tissue before scattering. However, to de-scatter at typical depths, hundreds of such patterned excitations were needed. In this work, we present DEEP2, a deep learning-based model that can de-scatter images from just tens of patterned excitations instead of hundreds. Consequently, we improve DEEP's throughput by almost an order of magnitude. We demonstrate our method in multiple numerical and experimental imaging studies, including in vivo cortical vasculature imaging up to 4 scattering lengths deep in live mice.
Affiliation(s)
- Navodini Wijethilake
- Center for Advanced Imaging, Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA
- Department of Electronic and Telecommunication Engineering, University of Moratuwa, Moratuwa, Sri Lanka
- Mithunjha Anandakumar
- Center for Advanced Imaging, Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA
- Cheng Zheng
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Laser Biomedical Research Center, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Peter T C So
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Laser Biomedical Research Center, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Murat Yildirim
- Laser Biomedical Research Center, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA
- Department of Neuroscience, Cleveland Clinic Lerner Research Institute, Cleveland, OH, 44195, USA
- Dushan N Wadduwage
- Center for Advanced Imaging, Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA.
7
Spirgel R, Comolli J, Guido NJ. A Machine Learning Method for Genome Engineering Design Tool Attribution. Health Secur 2023; 21:407-414. [PMID: 37594776 DOI: 10.1089/hs.2022.0152] [Indexed: 08/19/2023]
Abstract
As the ability to engineer biological systems improves with increasingly advanced technology, the risk of accidental or intentional release of a dangerous genetically modified organism becomes greater. It is important that authorities can carry out attribution for the source of a genetically modified biological agent release. In the absence of evidence that ties a release directly to the individuals responsible, attribution can be carried out in part by discovering the in silico tools used to design the engineered genetic components, which can leave a signature in the DNA of the organism. Previous attribution methods have focused on identifying the laboratory of origin of an engineered organism using machine learning on plasmid signatures. The next logical step is to address attribution using signatures from the tools that are used to create the engineered modifications. A random forest classifier was developed that discriminates between design tools used to optimize coding regions for incorporation into the genome of another organism. To this end, tens of thousands of genes were optimized with 4 different codon optimization methods and relevant features from these sequences were generated for a machine learning classifier. This method achieves more than 97% accuracy in predicting which tools were used to design codon optimized genes for expression in other organisms. The methods presented here lay the groundwork for the creation of effective organism engineering attribution techniques. Such methods can act both as deterrents for future attempts at creating dangerous organisms as well as tools for forensic science.
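The abstract does not list the features fed to the random forest. A plausible starting point for telling codon-optimization tools apart is codon usage, since each tool tends to favour particular synonymous codons. The sketch below computes a hypothetical feature set (relative frequencies of the 64 codons plus GC content at the third codon position, GC3); it is an assumption for illustration, not the authors' feature list. Such vectors could then be passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`, which is not shown here.

```python
from collections import Counter
from itertools import product

# Fixed codon order so every sequence maps to the same feature layout.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i : i + 3] for i in range(0, len(cds), 3))
    total = sum(counts[c] for c in CODONS)  # ignores codons containing N, etc.
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions."""
    third = cds.upper()[2::3]
    return sum(base in "GC" for base in third) / len(third) if third else 0.0


def feature_vector(cds: str) -> list[float]:
    """Hypothetical 65-dimensional feature vector: 64 codon frequencies + GC3."""
    usage = codon_usage(cds)
    return [usage[c] for c in CODONS] + [gc3(cds)]
```

For the toy sequence `"ATGGCCGCGTAA"`, each of its four codons has frequency 0.25 and GC3 is 0.75 (third positions G, C, G, A).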
Affiliation(s)
- Rebecca Spirgel
- Rebecca Spirgel, MS, is Associate Technical Staff, Group 23, MIT Lincoln Laboratory, Lexington, MA
- James Comolli
- James Comolli, PhD, Group 23, MIT Lincoln Laboratory, Lexington, MA
- Nicholas J Guido
- Nicholas J. Guido, PhD, is Technical Staff, Group 23, MIT Lincoln Laboratory, Lexington, MA
8
Zhang C, Liu H, Li X, Xu F, Li Z. Modularized synthetic biology enabled intelligent biosensors. Trends Biotechnol 2023; 41:1055-1065. [PMID: 36967259 DOI: 10.1016/j.tibtech.2023.03.005] [Received: 12/29/2022] [Revised: 02/27/2023] [Accepted: 03/06/2023] [Indexed: 03/29/2023]
Abstract
Biosensors that sense the concentration of a specified target and produce a specific signal output have become important technology for biological analysis. Recently, intelligent biosensors have received great interest due to their adaptability to meet sophisticated demands. Advances in developing standard modules and carriers in synthetic biology have shed light on intelligent biosensors that can implement advanced analytical processing to better accommodate practical applications. This review focuses on intelligent synthetic biology-enabled biosensors (SBBs). First, we illustrate recent progress in intelligent SBBs with the capability of computation, memory storage, and self-calibration. Then, we discuss emerging applications of SBBs in point-of-care testing (POCT) and wearable monitoring. Finally, future perspectives on intelligent SBBs are proposed.
Affiliation(s)
- Chao Zhang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an 710049, P.R. China
- Hao Liu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an 710049, P.R. China
- Xiujun Li
- Department of Chemistry and Biochemistry, University of Texas at El Paso, 500 West University Ave, El Paso, TX 79968, USA
- Feng Xu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an 710049, P.R. China.
- Zedong Li
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China; Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an 710049, P.R. China; TFX Group-Xi'an Jiaotong University Institute of Life Health, Xi'an 710049, P.R. China.
9
Wei Z, Boivin JR, Xue Y, Burnell K, Wijethilake N, Chen X, So PTC, Nedivi E, Wadduwage DN. De-scattering Deep Neural Network Enables Fast Imaging of Spines through Scattering Media by Temporal Focusing Microscopy. Res Sq 2023:rs.3.rs-2410214. [PMID: 37333305 PMCID: PMC10275030 DOI: 10.21203/rs.3.rs-2410214/v1] [Indexed: 06/20/2023]
Abstract
Today the gold standard for in vivo imaging through scattering tissue is point-scanning two-photon microscopy (PSTPM). Especially in neuroscience, PSTPM is widely used for deep-tissue imaging in the brain. However, due to sequential scanning, PSTPM is slow. Temporal focusing microscopy (TFM), on the other hand, focuses femtosecond pulsed laser light temporally while keeping wide-field illumination, and is consequently much faster. However, due to the use of a camera detector, TFM suffers from the scattering of emission photons. As a result, TFM produces images of poor quality, obscuring fluorescent signals from small structures such as dendritic spines. In this work, we present a de-scattering deep neural network (DeScatterNet) to improve the quality of TFM images. Using a 3D convolutional neural network (CNN) we build a map from TFM to PSTPM modalities, to enable fast TFM imaging while maintaining high image quality through scattering media. We demonstrate this approach for in vivo imaging of dendritic spines on pyramidal neurons in the mouse visual cortex. We quantitatively show that our trained network rapidly outputs images that recover biologically relevant features previously buried in the scattered fluorescence in the TFM images. In vivo imaging that combines TFM and the proposed neural network is one to two orders of magnitude faster than PSTPM but retains the high quality necessary to analyze small fluorescent structures. The proposed approach could also be beneficial for improving the performance of many speed-demanding deep-tissue imaging applications, such as in vivo voltage imaging.
Affiliation(s)
- Zhun Wei
- Center for Advanced Imaging, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- State Key Laboratory of Modern Optical Instrumentation, ZJU-Hangzhou Global Science and Technology Innovation Center, College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
- Josiah R. Boivin
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yi Xue
- Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
- Kendyll Burnell
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Navodini Wijethilake
- Center for Advanced Imaging, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Xudong Chen
- Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583, Singapore
- Peter T. C. So
- Dept. of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Laser Biomedical Research Center, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Dept. of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Elly Nedivi
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Dushan N. Wadduwage
- Center for Advanced Imaging, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
10
Ding N, Zhang G, Zhang L, Shen Z, Yin L, Zhou S, Deng Y. Engineering an AI-based forward-reverse platform for the design of cross-ribosome binding sites of a transcription factor biosensor. Comput Struct Biotechnol J 2023; 21:2929-2939. [PMID: 38213883 PMCID: PMC10781712 DOI: 10.1016/j.csbj.2023.04.026] [Received: 01/29/2023] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 01/13/2024]
Abstract
A cross-ribosome binding site (cRBS) adjusts the dynamic range of transcription factor-based biosensors (TFBs) by controlling protein expression and folding. The rational design of a cRBS with a desired TFB dynamic range remains an important issue in TFB forward and reverse engineering. Here, we report a novel artificial intelligence (AI)-based forward-reverse engineering platform for TFB dynamic range prediction and de novo cRBS design with selected TFB dynamic ranges. The platform demonstrated superior performance in processing unbalanced minority-class datasets and was guided by sequence characteristics from trained cRBSs. The platform identified correlations between cRBSs and dynamic ranges to mimic bidirectional design between these factors, based on a Wasserstein generative adversarial network (GAN) with gradient penalty (GP) (WGAN-GP) and a balancing GAN with GP (BAGAN-GP). For forward and reverse engineering, the predictive accuracy was up to 98% and 82%, respectively. Collectively, we generated an AI-based method for the rational design of TFBs with desired dynamic ranges.
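For reference, the gradient penalty (GP) that gives WGAN-GP its name is, in the standard formulation, an extra term in the critic loss. The abstract does not state Ding et al.'s exact settings, so the expression below is the generic form only. Here D is the critic, P_r and P_g are the real and generated data distributions, x-hat is sampled uniformly along straight lines between real and generated samples, and lambda is the penalty weight:

```latex
L_D = \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x\sim\mathbb{P}_r}\big[D(x)\big]
    + \lambda\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon\sim U[0,1]
```

The penalty pushes the critic's gradient norm toward 1, a soft version of the 1-Lipschitz constraint required by the Wasserstein distance.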
Affiliation(s)
- Nana Ding
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People’s Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People’s Republic of China
- Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, NO.1239 Siping Road, Shanghai 201210, People’s Republic of China
- Guangkun Zhang
- Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, NO.1239 Siping Road, Shanghai 201210, People’s Republic of China
- LinPei Zhang
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People’s Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People’s Republic of China
- Ziyun Shen
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People’s Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People’s Republic of China
- Lianghong Yin
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou 311300, People’s Republic of China
- Zhejiang Provincial Key Laboratory of Resources Protection and Innovation of Traditional Chinese Medicine, Zhejiang A&F University, Hangzhou 311300, People’s Republic of China
- Shenghu Zhou
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People’s Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People’s Republic of China
- Yu Deng
- National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People’s Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People’s Republic of China
11
Thanikkal JG, Dubey AK, Thomas MT. An Efficient Mobile Application for Identification of Immunity Boosting Medicinal Plants using Shape Descriptor Algorithm. Wirel Pers Commun 2023; 131:1-17. [PMID: 37360141 PMCID: PMC10119011 DOI: 10.1007/s11277-023-10476-3] [Accepted: 04/12/2023] [Indexed: 06/28/2023]
Abstract
During the COVID-19 pandemic, the world has been looking for immunity-boosting techniques to fight the coronavirus. Every plant is a medicine in one way or another, but Ayurveda explains the uses of plant-based medicines and immunity boosters for specific requirements of the human body. To support Ayurveda, botanists are trying to identify more species of immunity-boosting medicinal plants by evaluating leaf characteristics. For a layperson, detecting immunity-boosting plants is a difficult task. Deep learning networks provide highly accurate results in image processing. In medicinal plant analysis, however, many leaves resemble one another, so directly analyzing leaf images with a deep learning network causes many problems for medicinal plant identification. To meet this broad need, a leaf shape descriptor combined with a deep learning-based mobile application is proposed for identifying immunity-boosting medicinal plants with a smartphone. The SDAMPI algorithm describes how numerical descriptors are generated for closed shapes. The mobile application achieved 96% accuracy on 64 × 64 images.
Affiliation(s)
- Jibi G. Thanikkal
- Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, U.P. 201313 India
- Ashwani Kumar Dubey
- Department of Electronics and Communication Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, U.P. 201313 India
- M. T. Thomas
- Department of Botany, St. Thomas College, Thrissur, Kerala India
12
McGuffie MJ, Barrick JE. Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences. bioRxiv 2023:2023.04.10.536277. [PMID: 37090600 PMCID: PMC10120640 DOI: 10.1101/2023.04.10.536277] [Indexed: 04/25/2023]
Abstract
Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and 3/5 of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science.
Affiliation(s)
- Matthew J. McGuffie
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States
- Jeffrey E. Barrick
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States
13
Ge F, Yu Z, Li Y, Zhu M, Zhang B, Zhang Q, Harrison RM, Chen L. Predicting aviation non-volatile particulate matter emissions at cruise via convolutional neural network. Sci Total Environ 2022; 850:158089. [PMID: 35985597 DOI: 10.1016/j.scitotenv.2022.158089] [Received: 04/25/2022] [Revised: 07/25/2022] [Accepted: 08/13/2022] [Indexed: 06/15/2023]
Abstract
Aviation emissions are the only direct source of anthropogenic particulate pollution at high altitudes, which can form contrails and contrail-induced clouds, with consequent effects upon global radiative forcing. In this study, we develop a predictive model, called APMEP-CNN, for aviation non-volatile particulate matter (nvPM) emissions using a convolutional neural network (CNN) technique. The model is established with data sets from the newly published aviation emission databank and measurement results from several field studies on the ground and during cruise operation. The model also takes the influence of sustainable aviation fuels (SAFs) on nvPM emissions into account by considering fuel properties. This study demonstrates that the APMEP-CNN can predict the nvPM emission index in mass (EIm) and number (EIn) for a number of high-bypass turbofan engines. The accuracy of predicting EIm and EIn at ground level is significantly improved (R² = 0.96 for both) compared to previously published models. We verify the suitability and applicability of the APMEP-CNN model for estimating nvPM emissions at cruise and when burning SAFs and blend fuels, and find that our predictions for EIm are, on average, within ±36.4 % of the measurements at cruise and within ±33.0 % of the measurements when burning SAFs. In the worst case, the APMEP-CNN prediction differs by -69.2 % from the measurements at cruise for the JT3D-3B engine. Thus, the APMEP-CNN model can provide new data for establishing accurate emission inventories of global aviation and help assess the impact of aviation emissions on human health, the environment and climate. SYNOPSIS: The results of this paper provide accurate predictions of nvPM emissions from in-use aircraft engines, which impact airport local air quality and global radiative forcing.
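The abstract reports its accuracy as R² values and as signed percentage differences between predicted and measured emission indices. The sketch below shows how these metrics are conventionally computed. The sign convention (prediction minus measurement, divided by measurement) is an assumption, and the numbers are made-up illustrative values, not data from the study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def percent_difference(measured, predicted):
    """Signed difference of a prediction from a measurement, in percent.

    Negative values mean the model under-predicts (assumed sign convention).
    """
    return 100.0 * (predicted - measured) / measured


def mean_abs_percent_difference(measured, predicted):
    """Average magnitude of the percent differences ('within +/-X % on average')."""
    diffs = [abs(percent_difference(m, p)) for m, p in zip(measured, predicted)]
    return sum(diffs) / len(diffs)


# Made-up emission indices, for illustration only.
measured = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(round(r_squared(measured, predicted), 2))                       # 0.96
print([percent_difference(m, p) for m, p in zip(measured, predicted)])  # [20.0, -10.0, 0.0]
print(mean_abs_percent_difference(measured, predicted))               # 10.0
```

A worst-case figure such as -69.2 % is the most negative single `percent_difference`. An "on average" bound corresponds to the mean of the absolute values.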
Affiliation(s)
- Fudong Ge
- School of Energy and Power Engineering, Beihang University, Beijing 100191, China; Beihang Hangzhou Innovation Institute Yuhang, Xixi Octagon City, Yuhang District, Hangzhou 310023, China
- Zhenhong Yu
- Beihang Hangzhou Innovation Institute Yuhang, Xixi Octagon City, Yuhang District, Hangzhou 310023, China
- Yan Li
- School of Energy and Power Engineering, Beihang University, Beijing 100191, China; Beihang Hangzhou Innovation Institute Yuhang, Xixi Octagon City, Yuhang District, Hangzhou 310023, China
- Meiyin Zhu
- Beihang Hangzhou Innovation Institute Yuhang, Xixi Octagon City, Yuhang District, Hangzhou 310023, China
- Bin Zhang
- Beihang Hangzhou Innovation Institute Yuhang, Xixi Octagon City, Yuhang District, Hangzhou 310023, China
- Qian Zhang
- School of Energy and Power Engineering, Beihang University, Beijing 100191, China
- Roy M Harrison
- School of Geography, Earth & Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
- Longfei Chen
- School of Energy and Power Engineering, Beihang University, Beijing 100191, China; Beihang Hangzhou Innovation Institute Yuhang, Xixi Octagon City, Yuhang District, Hangzhou 310023, China
14
Crook OM, Warmbrod KL, Lipstein G, Chung C, Bakerlee CW, McKelvey TG, Holland SR, Swett JL, Esvelt KM, Alley EC, Bradshaw WJ. Analysis of the first genetic engineering attribution challenge. Nat Commun 2022; 13:7374. [PMID: 36450726 PMCID: PMC9712580 DOI: 10.1038/s41467-022-35032-8] [Received: 01/26/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022]
Abstract
The ability to identify the designer of engineered biological sequences, termed genetic engineering attribution (GEA), would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA techniques. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered plasmid sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
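Top-k accuracy, the headline metric in this abstract, counts a prediction as correct when the true lab appears among the model's k highest-ranked labs. The sketch below computes it. It also shows one possible ensemble, a simple unweighted average of per-lab scores. The challenge's actual ensembling scheme may differ. The lab names and scores are invented, and `average_ensemble` is a hypothetical helper.

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true lab is among the k highest-scoring labs.

    score_rows: one dict per sample mapping lab -> score (higher = more likely).
    """
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(true_labels)


def average_ensemble(*models_scores):
    """Unweighted ensemble: average each lab's score across models, per sample."""
    combined = []
    for per_sample in zip(*models_scores):
        labs = per_sample[0].keys()
        combined.append({lab: sum(s[lab] for s in per_sample) / len(per_sample)
                         for lab in labs})
    return combined


# Invented scores for two plasmids and three candidate labs.
model_a = [{"lab1": 0.6, "lab2": 0.3, "lab3": 0.1},
           {"lab1": 0.5, "lab2": 0.4, "lab3": 0.1}]
model_b = [{"lab1": 0.45, "lab2": 0.5, "lab3": 0.05},
           {"lab1": 0.1, "lab2": 0.8, "lab3": 0.1}]
truth = ["lab1", "lab2"]

print(top_k_accuracy(model_a, truth, k=1))  # 0.5
print(top_k_accuracy(model_b, truth, k=1))  # 0.5
print(top_k_accuracy(average_ensemble(model_a, model_b), truth, k=1))  # 1.0
```

In this toy case each model is right on a different plasmid, so averaging their scores corrects both errors. This is the kind of complementarity that can let a simple ensemble beat its members.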
Affiliation(s)
- Oliver M Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Kelsey Lane Warmbrod
- Johns Hopkins Center for Health Security, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Institute of Public Health Genetics, University of Washington, Seattle, WA, USA
- Kevin M Esvelt
- altLabs Inc, Berkeley, CA, USA
- Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ethan C Alley
- altLabs Inc, Berkeley, CA, USA
- Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- William J Bradshaw
- altLabs Inc, Berkeley, CA, USA
- Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
15
Chandler M, Jain S, Halman J, Hong E, Dobrovolskaia MA, Zakharov AV, Afonin KA. Artificial Immune Cell, AI-cell, a New Tool to Predict Interferon Production by Peripheral Blood Monocytes in Response to Nucleic Acid Nanoparticles. Small 2022; 18:e2204941. [PMID: 36216772 PMCID: PMC9671856 DOI: 10.1002/smll.202204941] [Received: 08/11/2022] [Revised: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Nucleic acid nanoparticles, or NANPs, rationally designed to communicate with the human immune system, can offer innovative therapeutic strategies to overcome the limitations of traditional nucleic acid therapies. Each set of NANPs is unique in its architectural parameters and physicochemical properties, which, together with the type of delivery vehicle, determine the kind and magnitude of their immune response. Currently, there are no predictive tools that reliably guide the design of NANPs toward a desired immunological outcome, a step crucial for the success of personalized therapies. Through a systematic approach investigating the physicochemical and immunological profiles of a comprehensive panel of NANPs, the research team develops and experimentally validates a computational model, based on the transformer architecture, that predicts the immune activities of NANPs. It is anticipated that this freely accessible computational tool, called an "artificial immune cell," or AI-cell, will help address current critical public health challenges related to the safety criteria of nucleic acid therapies in a timely manner and promote the development of novel biomedical tools.
Affiliation(s)
- Morgan Chandler
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Sankalp Jain
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
- Justin Halman
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Enping Hong
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Marina A. Dobrovolskaia
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
- Kirill A. Afonin
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
16
Tsimenidis S, Vrochidou E, Papakostas GA. Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022]
Abstract
Medical discoveries depend mainly on the capability to process and analyze biological datasets, which inundate the scientific community and keep expanding as the cost of next-generation sequencing decreases. Deep learning (DL) is a viable method to exploit this massive data stream, as it has advanced rapidly through successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology, because both fields are evolving at a breakneck pace, making it hard for an individual to stay at the forefront of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. It provides an overview of the most common types of biological data and data representations used to train DL models, with additional information on the models themselves and the various tasks being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could also be useful to biology researchers who want to understand and harness DL to gain better insights into, and extract important information from, omics data.
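One-hot encoding is one of the most common data representations used to feed DNA sequence to DL models. Each base becomes a length-4 indicator vector, so a sequence of length L becomes an L × 4 matrix. The minimal sketch below illustrates it. Mapping ambiguous bases such as `N` to all-zero rows is an assumption for this sketch, since some pipelines use 0.25 in every channel instead.

```python
def one_hot_dna(seq, alphabet="ACGT"):
    """Encode a DNA string as a list of one-hot rows (length x 4).

    Bases outside the alphabet (e.g. 'N') become all-zero rows (assumed convention).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


print(one_hot_dna("ACGN"))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

In practice the rows would be stacked into an array or tensor before being passed to a convolutional or transformer model.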
Affiliation(s)
- George A. Papakostas
- MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
17
Abstract
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
18
Using metric learning to identify the lab-of-origin of engineered DNA. Nat Comput Sci 2022; 2:296-297. [PMID: 38177813 DOI: 10.1038/s43588-022-00240-1] [Indexed: 01/06/2024]
19
Soares IM, Camargo FHF, Marques A, Crook OM. Improving lab-of-origin prediction of genetically engineered plasmids via deep metric learning. Nat Comput Sci 2022; 2:253-264. [PMID: 38177551 DOI: 10.1038/s43588-022-00234-z] [Received: 11/11/2021] [Accepted: 03/22/2022] [Indexed: 01/06/2024]
Abstract
Genome engineering is undergoing unprecedented development and is now becoming widely available. Genetic engineering attribution can make sequence-lab associations and assist forensic experts in ensuring responsible biotechnology innovation and reducing misuse of engineered DNA sequences. Here we propose a method based on metric learning to rank the most likely labs of origin while simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models. Our approach employs a circular shift augmentation method and can correctly rank the lab of origin 90% of the time within its top-10 predictions. We also demonstrate that we can perform few-shot learning and obtain 76% top-10 accuracy using only 10% of the sequences. Finally, our approach can also extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
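Plasmids are circular, so the position recorded as base 1 in a deposited sequence is arbitrary. A circular-shift augmentation exploits this by training on rotated copies of each plasmid. The sketch below is one plausible form of it. The abstract does not say how offsets are sampled, so drawing them uniformly at random is an assumption, and `augment_circular` and `is_rotation` are hypothetical helpers.

```python
import random


def circular_shift(seq, offset):
    """Rotate a circular sequence so that it starts at position `offset`."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def is_rotation(a, b):
    """True if b is some rotation of a, i.e. the same circular molecule."""
    return len(a) == len(b) and b in a + a


def augment_circular(seq, n_views, rng=None):
    """Return n_views copies of a plasmid, each rotated by a random offset.

    Uniformly random offsets are an assumption made for this sketch.
    """
    if rng is None:
        rng = random.Random()
    return [circular_shift(seq, rng.randrange(len(seq))) for _ in range(n_views)]


plasmid = "ATGCCGTTAG"  # toy sequence, far shorter than a real plasmid
print(circular_shift(plasmid, 3))  # CCGTTAGATG
views = augment_circular(plasmid, n_views=4, rng=random.Random(0))
print(all(is_rotation(plasmid, v) for v in views))  # True
```

Every augmented view describes the same molecule. A model trained on them is therefore pushed to ignore the arbitrary start coordinate rather than memorise it.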
Affiliation(s)
- Oliver M Crook
- Oxford Protein Informatics Group, University of Oxford, Oxford, UK
20
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL in five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
- Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
- Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
- Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
- Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
- Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
21
Singer JM, Novotney S, Strickland D, Haddox HK, Leiby N, Rocklin GJ, Chow CM, Roy A, Bera AK, Motta FC, Cao L, Strauch EM, Chidyausiku TM, Ford A, Ho E, Zaitzeff A, Mackenzie CO, Eramian H, DiMaio F, Grigoryan G, Vaughn M, Stewart LJ, Baker D, Klavins E. Large-scale design and refinement of stable proteins using sequence-only models. PLoS One 2022; 17:e0265020. [PMID: 35286324 PMCID: PMC8920274 DOI: 10.1371/journal.pone.0265020] [Received: 11/18/2021] [Accepted: 02/18/2022] [Indexed: 12/25/2022]
Abstract
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.
Affiliation(s)
- Scott Novotney
- Two Six Technologies, Arlington, Virginia, United States of America
- Devin Strickland
- Department of Electrical and Computer Engineering, University of Washington, Seattle, Washington, United States of America
- Hugh K. Haddox
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Nicholas Leiby
- Two Six Technologies, Arlington, Virginia, United States of America
- Gabriel J. Rocklin
- Department of Pharmacology and Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Cameron M. Chow
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Anindya Roy
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Asim K. Bera
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Francis C. Motta
- Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, Florida, United States of America
- Longxing Cao
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Eva-Maria Strauch
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, Georgia, United States of America
- Tamuka M. Chidyausiku
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Alex Ford
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Ethan Ho
- Texas Advanced Computing Center, Austin, Texas, United States of America
- Craig O. Mackenzie
- Quantitative Biomedical Sciences Graduate Program, Dartmouth College, Hanover, New Hampshire, United States of America
- Hamed Eramian
- Netrias, Cambridge, Massachusetts, United States of America
- Frank DiMaio
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Gevorg Grigoryan
- Departments of Computer Science and Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Matthew Vaughn
- Texas Advanced Computing Center, Austin, Texas, United States of America
- Lance J. Stewart
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
- Eric Klavins
- Department of Electrical and Computer Engineering, University of Washington, Seattle, Washington, United States of America
22
Lee BD, Gitter A, Greene CS, Raschka S, Maguire F, Titus AJ, Kessler MD, Lee AJ, Chevrette MG, Stewart PA, Britto-Borges T, Cofer EM, Yu KH, Carmona JJ, Fertig EJ, Kalinin AA, Signal B, Lengerich BJ, Triche TJ, Boca SM. Ten quick tips for deep learning in biology. PLoS Comput Biol 2022; 18:e1009803. [PMID: 35324884 PMCID: PMC8946751 DOI: 10.1371/journal.pcbi.1009803] [Indexed: 12/05/2022]
Affiliation(s)
- Benjamin D. Lee
- In-Q-Tel Labs, Arlington, Virginia, United States of America
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Sebastian Raschka
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Finlay Maguire
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
- Alexander J. Titus
- University of New Hampshire, Manchester, New Hampshire, United States of America
- Bioeconomy.XYZ, Manchester, New Hampshire, United States of America
- Michael D. Kessler
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marc G. Chevrette
- Wisconsin Institute for Discovery and Department of Plant Pathology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Paul Allen Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
- Thiago Britto-Borges
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany
- Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Heidelberg, Germany
- Evan M. Cofer
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Pathology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Juan Jose Carmona
- Philips Healthcare, Cambridge, Massachusetts, United States of America
- Elana J. Fertig
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Biomedical Engineering, Department of Applied Mathematics and Statistics, Convergence Institute, Johns Hopkins University, Baltimore, Maryland, United States of America
- Alexandr A. Kalinin
- Medical Big Data Group, Shenzhen Research Institute of Big Data, Shenzhen, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Brandon Signal
- School of Medicine, College of Health and Medicine, University of Tasmania, Hobart, Australia
- Benjamin J. Lengerich
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Timothy J. Triche
- Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- Department of Pediatrics, College of Human Medicine, Michigan State University, East Lansing, Michigan, United States of America
- Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Simina M. Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, District of Columbia, United States of America
- Department of Oncology, Georgetown University Medical Center, Washington, DC, United States of America
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, United States of America
- Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center, Washington, DC, United States of America
23
Montesinos-López OA, Montesinos-López A, Hernandez-Suarez CM, Barrón-López JA, Crossa J. Deep-learning power and perspectives for genomic selection. Plant Genome 2021; 14:e20122. [PMID: 34309215 DOI: 10.1002/tpg2.20122] [Received: 04/11/2021] [Accepted: 05/24/2021] [Indexed: 06/13/2023]
Abstract
Deep learning (DL) is revolutionizing the development of artificial intelligence systems. For example, before 2015, humans were better than machines at classifying images and solving many computer-vision problems (related to object localization and detection in images), but machines have since surpassed human ability in this specific task. This is just one example of how the application of these models has surpassed human abilities and the performance of other machine-learning algorithms. For this reason, DL models have been adopted for genomic selection (GS). In this article we provide insight into the power of DL in solving complex prediction tasks and how combining GS and DL models can accelerate the revolution provoked by GS methodology in plant breeding. Furthermore, we mention some trends in DL methods, emphasizing areas of opportunity to fully exploit DL methodology in GS; however, we are aware that considerable research is required not only to use existing DL methods in conjunction with GS, but also to adapt and develop DL methods that take the peculiarities of breeding inputs and GS into consideration.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, 44430, México
- José Alberto Barrón-López
- Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, Lima, 15024, Perú
- José Crossa
- Colegio de Postgraduados, Montecillos, Edo. de México, 56230, México
- Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Edo. De, Mexico DF, 52640, Mexico
24
Kim J, Kim D, Sohn KA. HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball. Bioinformatics 2021; 37:2971-2980. [PMID: 33760022 DOI: 10.1093/bioinformatics/btab193] [Received: 07/26/2020] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using an embedding method such as the Word2Vec-based method to represent entities as numeric vectors. However, this method has the limitation that embedding large graph-structured data in the Euclidean space cannot prevent a loss of information of latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré balls are more suitable for modeling hierarchies, as they have a geometric property in which the distance increases exponentially as it nears the boundary because of negative curvature. RESULTS In this article, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding specialized in the representation of hierarchy through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products similar to or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing the GO and gene semantics and in data utilization as well. It can be robustly applied to manipulate various biological knowledge. AVAILABILITY AND IMPLEMENTATION https://github.com/JaesikKim/HiG2Vec. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
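The claim that distance grows rapidly near the boundary follows from the standard Poincaré-ball metric, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), which is the metric used by Poincaré embeddings in general. The sketch below computes it for points given as plain coordinate lists. It is an illustration of the metric only and is not code from the HiG2Vec repository.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# The same Euclidean gap (0.1) costs far more hyperbolic distance near the boundary.
near_centre = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(round(near_centre, 3), round(near_boundary, 3))  # 0.201 1.151
```

The boundary effect is why hierarchies embed well: general terms can sit near the centre while their many specific descendants spread out near the boundary without crowding one another.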
Affiliation(s)
- Jaesik Kim
- Department of Computer Engineering, Ajou University, Suwon 16499, South Korea
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Dokyoon Kim
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kyung-Ah Sohn
- Department of Computer Engineering, Ajou University, Suwon 16499, South Korea
- Department of Artificial Intelligence, Ajou University, Suwon 16499, South Korea
25
Bartoszewicz JM, Genske U, Renard BY. Deep learning-based real-time detection of novel pathogens during sequencing. Brief Bioinform 2021; 22:6326527. [PMID: 34297793 DOI: 10.1093/bib/bbab269] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/19/2021] [Revised: 06/09/2021] [Accepted: 06/23/2021] [Indexed: 11/12/2022]
Abstract
Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.
Affiliation(s)
- Jakub M Bartoszewicz
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Ulrich Genske
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Bernhard Y Renard
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
26
In silico, in vitro, and in vivo machine learning in synthetic biology and metabolic engineering. Curr Opin Chem Biol 2021; 65:85-92. [PMID: 34280705 DOI: 10.1016/j.cbpa.2021.06.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/13/2021] [Revised: 05/31/2021] [Accepted: 06/01/2021] [Indexed: 01/29/2023]
Abstract
Among the main learning methods reviewed in this study and used in synthetic biology and metabolic engineering are supervised learning, reinforcement and active learning, and in vitro or in vivo learning. In the context of biosynthesis, supervised machine learning is being exploited to predict biological sequence activities, predict structures and engineer sequences, and optimize culture conditions. Active and reinforcement learning methods use training sets acquired through an iterative process generally involving experimental measurements. They are applied to design, engineer, and optimize metabolic pathways and bioprocesses. The nascent but promising developments with in vitro and in vivo learning comprise molecular circuits performing simple tasks such as pattern recognition and classification.
27
Development of a growth coupled and multi-layered dynamic regulation network balancing malonyl-CoA node to enhance (2S)-naringenin biosynthesis in Escherichia coli. Metab Eng 2021; 67:41-52. [PMID: 34052445 DOI: 10.1016/j.ymben.2021.05.007] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 01/14/2021] [Revised: 04/29/2021] [Accepted: 05/21/2021] [Indexed: 02/07/2023]
Abstract
Metabolic heterogeneity and dynamic changes in metabolic fluxes are two inherent characteristics of microbial fermentation that limit the precise control of metabolism, often leading to impaired cell growth and low productivity. Dynamic metabolic engineering addresses these challenges through the design of multi-layered and multi-genetic dynamic regulation networks (DRNs) that allow a single cell to autonomously adjust metabolic flux in response to its growth and metabolite accumulation conditions. Here, we developed a growth coupled NCOMB (Naringenin-Coumaric acid-Malonyl-CoA-Balanced) DRN with systematic optimization of (2S)-naringenin and p-coumaric acid-responsive regulation pathways for real-time control of intracellular supply of malonyl-CoA. In this scenario, the acyl carrier protein was used as a novel critical node for fine-tuning malonyl-CoA consumption instead of direct repression of fatty acid synthase commonly employed in previous studies. To do so, we first engineered a multi-layered DRN enabling single cells to concurrently regulate acpH, acpS, acpT, acs, and ACC in malonyl-CoA catabolic and anabolic pathways. Next, the NCOMB DRN was optimized to enhance the synergies between different dynamic regulation layers via a biosensor-based directed evolution strategy. Finally, a high producer obtained from the NCOMB DRN approach yielded an 8.7-fold improvement in (2S)-naringenin production (523.7 ± 51.8 mg/L) with a concomitant 20% increase in cell growth compared to the base strain using a static strain engineering approach, thus demonstrating the high efficiency of this system for improving pathway production.
28
Zhang Z, van Dijk F, de Klein N, van Gijn ME, Franke LH, Sinke RJ, Swertz MA, van der Velde KJ. Feasibility of predicting allele specific expression from DNA sequencing using machine learning. Sci Rep 2021; 11:10606. [PMID: 34012022 PMCID: PMC8134421 DOI: 10.1038/s41598-021-89904-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/21/2021] [Accepted: 05/04/2021] [Indexed: 11/09/2022]
Abstract
Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All used code and machine learning models are available at GitHub and Zenodo.
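ASE is observed as an imbalance in RNA-seq reads between the two alleles at a heterozygous site. One common way to call such imbalance is an exact two-sided binomial test of the alternative-allele read count against an expected fraction of 0.5. This is an assumption for illustration: the BIOS and GTEx ASE labels used in the paper may be defined differently, and the function names below are hypothetical.

```python
from math import exp, lgamma, log


def allelic_fraction(ref_reads, alt_reads):
    """Fraction of RNA-seq reads supporting the alternative allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def binom_pmf(i, n, p):
    # Binomial probability computed in log space so deep coverage does not overflow.
    return exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1.0 - p))


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n or not 0.0 < p < 1.0:
        raise ValueError("need 0 <= k <= n and 0 < p < 1")
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so outcomes tied in probability count as equally extreme.
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= observed * (1.0 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts at one heterozygous site: 30 reference and 10 alternative reads.
    print(allelic_fraction(30, 10), binom_two_sided_p(10, 40))
```

The DNA-only models in the paper aim to predict this kind of RNA-derived label without the RNA-seq counts.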
Affiliation(s)
- Zhenhua Zhang
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Freerk van Dijk
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Prinses Maxima Center for Child Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
- Niek de Klein
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Mariëlle E van Gijn
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Lude H Franke
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Richard J Sinke
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Morris A Swertz
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- K Joeri van der Velde
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
29
Parker MT, Kunjapur AM. Deployment of Engineered Microbes: Contributions to the Bioeconomy and Considerations for Biosecurity. Health Secur 2021; 18:278-296. [PMID: 32816583 DOI: 10.1089/hs.2020.0010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
Abstract
Engineering at microscopic scales has an immense effect on the modern bioeconomy. Microbes contribute to such disparate markets as chemical manufacturing, fuel production, crop optimization, and pharmaceutical synthesis, to name a few. Due to new and emerging synthetic biology technologies, and the sophistication and control afforded by them, we are on the brink of deploying engineered microbes to not only enhance traditional applications but also to introduce these microbes to sectors, contexts, and formats not previously attempted. In microbially managed medicine, microbial engineering holds promise for increasing efficacy, improving tissue penetration, and sustaining treatment. In the environment, the most effective areas for deployment are in the management of crops and protection of ecosystems. However, caution is warranted before introducing engineered organisms to new environments where they may proliferate without control and could cause unforeseen effects. We summarize ideas and data that can inform identification and assessment of the risks that these tools present to ensure that realistic hazards are described and unrealistic ones do not hinder advancement. Further, because modes of containment are crucial complements to deployment, we describe the state of the art in microbial biocontainment strategies, current gaps, and how these gaps might be addressed through technological advances in synthetic engineering. Collectively, this work highlights engineered microbes as a foundational and expanding facet of the bioeconomy, projects their utility in upcoming deployments outside the laboratory, and identifies knowns and unknowns that will be necessary considerations and points of focus in this endeavor.
Affiliation(s)
- Michael T Parker
- Office of the Dean, Georgetown University, Washington, DC
- Aditya M Kunjapur
- Chemical and Biomolecular Engineering, University of Delaware, Newark, DE
30
Early forecasting of tsunami inundation from tsunami and geodetic observation data with convolutional neural networks. Nat Commun 2021; 12:2253. [PMID: 33859177 PMCID: PMC8050057 DOI: 10.1038/s41467-021-22348-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/01/2020] [Accepted: 03/03/2021] [Indexed: 11/18/2022]
Abstract
Rapid and accurate hazard forecasting is important for prompt evacuations and reducing casualties during natural disasters. In the decade since the 2011 Tohoku tsunami, various tsunami forecasting methods using real-time data have been proposed. However, rapid and accurate tsunami inundation forecasting in coastal areas remains challenging. Here, we propose a tsunami forecasting approach using convolutional neural networks (CNNs) for early warning. Numerical tsunami forecasting experiments for Tohoku demonstrated excellent performance with average maximum tsunami amplitude and tsunami arrival time forecasting errors of ~0.4 m and ~48 s, respectively, for 1,000 unknown synthetic tsunami scenarios. Our forecasting approach required only 0.004 s on average using a single CPU node. Moreover, the CNN trained on only synthetic tsunami scenarios provided reasonable inundation forecasts using actual observation data from the 2011 event, even with noisy inputs. These results verify the feasibility of AI-enabled tsunami forecasting for providing rapid and accurate early warnings.
31
Suh Y, Bostanabad R, Won Y. Deep learning predicts boiling heat transfer. Sci Rep 2021; 11:5622. [PMID: 33692489 PMCID: PMC7970936 DOI: 10.1038/s41598-021-85150-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/09/2020] [Accepted: 02/15/2021] [Indexed: 01/17/2023]
Abstract
Boiling is arguably Nature's most effective thermal management mechanism that cools submersed matter through bubble-induced advective transport. Central to the boiling process is the development of bubbles. Connecting boiling physics with bubble dynamics is an important, yet daunting challenge because of the intrinsically complex and high-dimensional nature of bubble dynamics. Here, we introduce a data-driven learning framework that correlates high-quality imaging on dynamic bubbles with associated boiling curves. The framework leverages cutting-edge deep learning models including convolutional neural networks and object detection algorithms to automatically extract both hierarchical and physics-based features. By training on these features, our model learns physical boiling laws that statistically describe the manner in which bubbles nucleate, coalesce, and depart under boiling conditions, enabling in situ boiling curve prediction with a mean error of 6%. Our framework offers an automated, learning-based alternative to conventional boiling heat transfer metrology.
Affiliation(s)
- Youngjoon Suh
- Department of Mechanical and Aerospace Engineering, University of California, 5200 Engineering Hall, Irvine, CA, 92617-2700, USA
- Ramin Bostanabad
- Department of Mechanical and Aerospace Engineering, University of California, 5200 Engineering Hall, Irvine, CA, 92617-2700, USA
- Yoonjin Won
- Department of Mechanical and Aerospace Engineering, University of California, 5200 Engineering Hall, Irvine, CA, 92617-2700, USA
- , 4200 Engineering Gateway, Irvine, USA.
32
Wang Q, Kille B, Liu TR, Elworth RAL, Treangen TJ. PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment. Nat Commun 2021; 12:1167. [PMID: 33637701 PMCID: PMC7910462 DOI: 10.1038/s41467-021-21180-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/22/2020] [Accepted: 01/12/2021] [Indexed: 12/26/2022]
Abstract
With advances in synthetic biology and genome engineering comes a heightened awareness of potential misuse related to biosafety concerns. A recent study employed machine learning to identify the lab-of-origin of DNA sequences to help mitigate some of these concerns. Despite their promising results, this deep learning based approach had limited accuracy, was computationally expensive to train, and wasn't able to provide the precise features that were used in its predictions. To address these shortcomings, we developed PlasmidHawk for lab-of-origin prediction. Compared to a machine learning approach, PlasmidHawk has higher prediction accuracy; PlasmidHawk can successfully predict unknown sequences' depositing labs 76% of the time and 85% of the time the correct lab is in the top 10 candidates. In addition, PlasmidHawk can precisely single out the signature sub-sequences that are responsible for the lab-of-origin detection. In summary, PlasmidHawk represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences. PlasmidHawk is available at https://gitlab.com/treangenlab/plasmidhawk.git .
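The two figures reported for PlasmidHawk (correct lab predicted 76% of the time, correct lab among the top 10 candidates 85% of the time) are top-1 and top-10 accuracy. A minimal sketch of that metric, assuming each query plasmid yields a ranked list of candidate labs; the lab names are made up and this is not PlasmidHawk's code.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of queries whose true label is among the first k ranked candidates."""
    if len(ranked_predictions) != len(true_labels) or not true_labels:
        raise ValueError("need one ranked list per query and at least one query")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)


if __name__ == "__main__":
    # Hypothetical ranked candidates (best first) for three query plasmids.
    ranked = [["labA", "labB", "labC"],
              ["labB", "labA", "labC"],
              ["labC", "labB", "labA"]]
    truth = ["labA", "labA", "labB"]
    print(top_k_accuracy(ranked, truth, k=1), top_k_accuracy(ranked, truth, k=2))
```

In this toy example, top-1 accuracy is 1/3 and top-2 accuracy is 1.0. A gap between top-1 and top-k accuracy indicates how often the right lab is near the top of the list but not first.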
Affiliation(s)
- Qi Wang
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Rice University, Houston, Texas, 77005, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, Texas, 77005, United States
- Tian Rui Liu
- Department of Computer Science, Rice University, Houston, Texas, 77005, United States
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, Texas, 77005, United States
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, Texas, 77005, United States
33
Lewis G, Jordan JL, Relman DA, Koblentz GD, Leung J, Dafoe A, Nelson C, Epstein GL, Katz R, Montague M, Alley EC, Filone CM, Luby S, Church GM, Millett P, Esvelt KM, Cameron EE, Inglesby TV. The biosecurity benefits of genetic engineering attribution. Nat Commun 2020; 11:6294. [PMID: 33293537 PMCID: PMC7722838 DOI: 10.1038/s41467-020-19149-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/09/2020] [Accepted: 09/28/2020] [Indexed: 11/23/2022]
Abstract
Biology can be misused, and the risk of this causing widespread harm increases in step with the rapid march of technological progress. A key security challenge involves attribution: determining, in the wake of a human-caused biological event, who was responsible. Recent scientific developments have demonstrated a capability for detecting whether an organism involved in such an event has been genetically modified and, if modified, to infer from its genetic sequence its likely lab of origin. We believe this technique could be developed into powerful forensic tools to aid the attribution of outbreaks caused by genetically engineered pathogens, and thus protect against the potential misuse of synthetic biology.
Affiliation(s)
- Gregory Lewis
- Future of Humanity Institute, Oxford University, Oxford, UK
- Alt. Technology Labs, Inc., Berkeley, CA, USA
- David A Relman
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Microbiology & Immunology, Stanford University School of Medicine; and Center for International Security and Cooperation, Stanford University, Stanford, CA, USA
- Gregory D Koblentz
- Schar School of Policy and Government, George Mason University, Washington, DC, USA
- Jade Leung
- Future of Humanity Institute, Oxford University, Oxford, UK
- Allan Dafoe
- Future of Humanity Institute, Oxford University, Oxford, UK
- Cassidy Nelson
- Future of Humanity Institute, Oxford University, Oxford, UK
- Gerald L Epstein
- Center for the Study of Weapons of Mass Destruction, National Defense University, Washington, DC, USA
- Rebecca Katz
- Center for Global Health Science and Security, Georgetown University, Washington, DC, USA
- Michael Montague
- Center for Health Security, Johns Hopkins University, Baltimore, MD, USA
- Ethan C Alley
- Alt. Technology Labs, Inc., Berkeley, CA, USA
- Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Stephen Luby
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- George M Church
- Alt. Technology Labs, Inc., Berkeley, CA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Piers Millett
- Future of Humanity Institute, Oxford University, Oxford, UK
- International Genetically Engineered Machine Competition, Boston, MA, USA
- Kevin M Esvelt
- Alt. Technology Labs, Inc., Berkeley, CA, USA
- Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Thomas V Inglesby
- Center for Health Security, Johns Hopkins University, Baltimore, MD, USA
34
A machine learning toolkit for genetic engineering attribution to facilitate biosecurity. Nat Commun 2020; 11:6293. [PMID: 33293535 PMCID: PMC7722865 DOI: 10.1038/s41467-020-19612-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/16/2020] [Accepted: 10/05/2020] [Indexed: 12/21/2022]
Abstract
The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic to different genetic designers, termed 'genetic engineering attribution', would deter misuse, yet is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and basic phenotype data can reach 70% attribution accuracy in distinguishing between over 1,300 labs. To make these models usable in practice, we introduce a framework for weighing predictions against other investigative evidence using calibration, and bring our model to within 1.6% of perfect calibration. Additionally, we demonstrate that simple models can accurately predict both the nation-state-of-origin and ancestor labs, forming the foundation of an integrated attribution toolkit which should promote responsible innovation and international security alike.
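Calibration here means that a prediction made with confidence c should be correct about a fraction c of the time. Assuming a binned expected calibration error (ECE) as the measure, "within 1.6% of perfect calibration" would correspond to an ECE of about 0.016. The exact metric and binning used in the paper are not given in this abstract, so the sketch below is an illustration of the general technique, not a reproduction of the authors' evaluation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width confidence bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Map confidence to a bin; a confidence of exactly 1.0 goes to the top bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            # Each bin contributes in proportion to how many predictions it holds.
            ece += len(members) / len(confidences) * abs(accuracy - mean_conf)
    return ece
```

A model that says 0.9 but is right only half the time in that bin contributes a gap of 0.4 there. A model whose 0.75-confidence predictions are right three times out of four contributes nothing.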
35
Ding N, Yuan Z, Zhang X, Chen J, Zhou S, Deng Y. Programmable cross-ribosome-binding sites to fine-tune the dynamic range of transcription factor-based biosensor. Nucleic Acids Res 2020; 48:10602-10613. [PMID: 32976557 PMCID: PMC7544201 DOI: 10.1093/nar/gkaa786] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 04/08/2020] [Revised: 08/20/2020] [Accepted: 09/09/2020] [Indexed: 11/24/2022]
Abstract
Currently, predictive translation tuning of regulatory elements to the desired output of transcription factor (TF)-based biosensors remains a challenge. The gene expression of a biosensor system must exhibit appropriate translation intensity, which is controlled by the ribosome-binding site (RBS), to achieve fine-tuning of its dynamic range (i.e. fold change in gene expression between the presence and absence of inducer) by adjusting the translation level of the TF and reporter. However, existing TF-based biosensors generally suffer from unpredictable dynamic range. Here, we elucidated the connections and partial mechanisms between RBS, translation level, protein folding and dynamic range, and presented a design platform that predictably tuned the dynamic range of biosensors based on deep learning of large datasets of cross-RBSs (cRBSs). In doing so, a library containing 7053 designed cRBSs was divided into five sub-libraries through fluorescence-activated cell sorting to establish a classification model based on a convolutional neural network. Finally, the present work exhibited a powerful platform to enable predictable translation tuning of RBS to the dynamic range of biosensors.
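The abstract defines dynamic range as the fold change in gene expression between the presence and absence of inducer. A minimal sketch of that ratio follows; the optional background subtraction (for example, autofluorescence of a reporter-free strain) is an assumption not stated in the abstract, and the readings in the usage note are hypothetical.

```python
def dynamic_range(induced_signal, uninduced_signal, background=0.0):
    """Fold change in reporter output with versus without inducer.

    `background` is optional and is subtracted from both signals; this
    correction is an illustrative assumption, not taken from the paper.
    """
    on = induced_signal - background
    off = uninduced_signal - background
    if off <= 0.0:
        raise ValueError("uninduced signal must exceed background")
    return on / off
```

With hypothetical readings of 1200 (induced) and 100 (uninduced) fluorescence units, the dynamic range is 12-fold. Subtracting a background of 50 raises it to 23-fold, so biosensor comparisons should state whether background was removed.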
Affiliation(s)
- Nana Ding
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People's Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People's Republic of China
- Zhenqi Yuan
- School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People's Republic of China
- Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Wuxi 214122, People's Republic of China
- Xiaojuan Zhang
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People's Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People's Republic of China
- Jing Chen
- School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People's Republic of China
- Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Wuxi 214122, People's Republic of China
- Shenghu Zhou
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People's Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People's Republic of China
- Yu Deng
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, People's Republic of China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, People's Republic of China
36
Application of deep learning in genomics. Sci China Life Sci 2020; 63:1860-1878. [PMID: 33051704 DOI: 10.1007/s11427-020-1804-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 06/13/2020] [Accepted: 08/15/2020] [Indexed: 12/19/2022]
Abstract
In recent years, deep learning has been widely used in diverse fields of research, such as speech recognition, image classification, autonomous driving and natural language processing. Deep learning has showcased dramatically improved performance in complex classification and regression problems, where the intricate structure in the high-dimensional data is difficult to discover using conventional machine learning algorithms. In biology, applications of deep learning are gaining increasing popularity in predicting the structure and function of genomic elements, such as promoters, enhancers, or gene expression levels. In this review paper, we described the basic concepts in machine learning and artificial neural networks, followed by elaboration on the workflow of using convolutional neural networks in genomics. Then we provided a concise introduction to deep learning applications in genomics and synthetic biology at the levels of DNA, RNA and protein. Finally, we discussed the current challenges and future perspectives of deep learning in genomics.
37
Bartoszewicz JM, Seidel A, Rentzsch R, Renard BY. DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks. Bioinformatics 2020; 36:81-89. [PMID: 31298694 DOI: 10.1093/bioinformatics/btz541] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/09/2019] [Revised: 06/22/2019] [Accepted: 07/10/2019] [Indexed: 12/31/2022]
Abstract
MOTIVATION We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. Moreover, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, which limits their performance on unknown, unrecognized and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads, even though the biological context is unavailable. RESULTS We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a flexible framework allowing easy evaluation of neural architectures with reverse-complement parameter sharing. We show that convolutional neural networks and LSTMs outperform the state-of-the-art based on both sequence homology and machine learning. Combining a deep learning approach with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art. AVAILABILITY AND IMPLEMENTATION The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
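DeePaC builds reverse-complement parameter sharing into its network layers. A simpler route to strand invariance, shown here only as an illustration and not as DeePaC's implementation, is to average a model's output over a read and its reverse complement. Both mates of a pair are then combined by averaging. `toy_score` is a hypothetical stand-in for a trained classifier.

```python
# Complement table for upper-case nucleotides; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA read (input is upper-cased first)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def toy_score(seq):
    """Hypothetical stand-in for a trained classifier: fraction of 'A' bases.

    It is deliberately strand-dependent, so the averaging below has an effect.
    """
    return seq.upper().count("A") / len(seq) if seq else 0.0


def rc_averaged(score_fn, seq):
    # Average the score of a read and its reverse complement, so the result
    # no longer depends on which DNA strand the read came from.
    return 0.5 * (score_fn(seq) + score_fn(reverse_complement(seq)))


def pair_prediction(score_fn, mate1, mate2):
    # Combine both mates of a paired-end read by averaging their strand-averaged scores.
    return 0.5 * (rc_averaged(score_fn, mate1) + rc_averaged(score_fn, mate2))
```

For any scorer and any A/C/G/T/N read `s`, `rc_averaged(f, s)` equals `rc_averaged(f, reverse_complement(s))`. This is the orientation invariance that the parameter-sharing architectures in the paper provide inside the network.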
Affiliation(s)
- Jakub M Bartoszewicz
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
| | - Anja Seidel
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Robert Rentzsch
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
38
Kar DM, Ray I, Gallegos J, Peccoud J, Ray I. Synthesizing DNA molecules with identity-based digital signatures to prevent malicious tampering and enabling source attribution. J Comput Secur 2020. [DOI: 10.3233/jcs-191383] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Diptendu Mohan Kar
- Department of Computer Science, Colorado State University, Fort Collins, Colorado, USA
- Indrajit Ray
- Department of Computer Science, Colorado State University, Fort Collins, Colorado, USA
- Jenna Gallegos
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, USA
- Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, USA
- GenoFAB, Inc., Fort Collins, Colorado, USA
- Indrakshi Ray
- Department of Computer Science, Colorado State University, Fort Collins, Colorado, USA
39
Yuan Y, Ma G, Cheng C, Zhou B, Zhao H, Zhang HT, Ding H. A general end-to-end diagnosis framework for manufacturing systems. Natl Sci Rev 2020; 7:418-429. [PMID: 34692057 PMCID: PMC8289032 DOI: 10.1093/nsr/nwz190] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/29/2019] [Revised: 10/20/2019] [Accepted: 11/03/2019] [Indexed: 11/13/2022]
Abstract
The manufacturing sector is envisioned to be heavily influenced by artificial-intelligence-based technologies with the extraordinary increases in computational power and data volumes. A central challenge in the manufacturing sector lies in the requirement of a general framework to ensure satisfactory diagnosis and monitoring performance in different manufacturing applications. Here, we propose a general data-driven, end-to-end framework for the monitoring of manufacturing systems. This framework, derived from deep-learning techniques, evaluates fused sensory measurements to detect and even predict faults and wear conditions. This work exploits the predictive power of deep learning to automatically extract hidden degradation features from noisy, time-course data. We have tested the proposed framework on 10 representative data sets drawn from a wide variety of manufacturing applications. Results reveal that the framework performs well in examined benchmark applications and can be applied in diverse contexts, indicating its potential use as a critical cornerstone in smart manufacturing.
Affiliation(s)
- Ye Yuan
- School of Artificial Intelligence and Automation, MOE Key Lab of Intelligent Control and Image Processing, Huazhong University of Science and Technology, Wuhan 430074, China
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Guijun Ma
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Cheng Cheng
- School of Artificial Intelligence and Automation, MOE Key Lab of Intelligent Control and Image Processing, Huazhong University of Science and Technology, Wuhan 430074, China
- Beitong Zhou
- School of Artificial Intelligence and Automation, MOE Key Lab of Intelligent Control and Image Processing, Huazhong University of Science and Technology, Wuhan 430074, China
- Huan Zhao
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
- Hai-Tao Zhang
- School of Artificial Intelligence and Automation, MOE Key Lab of Intelligent Control and Image Processing, Huazhong University of Science and Technology, Wuhan 430074, China
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Han Ding
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
40
MacIntyre CR, Adam DC, Turner R, Chughtai AA, Engells T. Public awareness, acceptability and risk perception about infectious diseases dual-use research of concern: a cross-sectional survey. BMJ Open 2020; 10:e029134. [PMID: 31911509 PMCID: PMC6955500 DOI: 10.1136/bmjopen-2019-029134] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/15/2019] [Revised: 10/25/2019] [Accepted: 10/25/2019] [Indexed: 11/23/2022]
Abstract
OBJECTIVES In this study, we aimed to measure the awareness, acceptability and perceptions of current issues in biosecurity posed by infectious diseases dual-use research of concern (DURC) in the community. DURC is conducted today in many locations around the world for the benefit of humanity but may also cause harm through either a laboratory accident or deliberate misuse. Most DURC is approved by animal ethics committees, which do not typically consider harm to humans. Given the unique characteristics of contagion and the potential for epidemics and pandemics, the community is an important stakeholder in DURC. DESIGN Self-administered web-based cross-sectional survey. PARTICIPANTS Participants over the age of 18 in Australia and 21 in the USA were included in the survey. A total of 604 participants completed the study. The results of 52 participants were excluded due to potential biases about DURC stemming from their employment as medical researchers, infectious diseases researchers or law enforcement professionals, leaving 552 participants. Of those, 274 respondents resided in Australia and 278 in the USA. OUTCOMES Baseline awareness, acceptability and perceptions of current issues surrounding DURC. Changes in perception from baseline were measured after provision of information about DURC. RESULTS Presurvey, 77% of respondents were unaware of DURC and 64% found it unacceptable or were unsure. Two-thirds of respondents did not change their views. The baseline perception of high risk for laboratory accidents (29%) and deliberate bioterrorism (34%) was low but increased with increasing provision of information (42% and 44%, respectively, p<0.001), with men more accepting of DURC (OR=1.79, 95% CI 1.25 to 2.57, p=0.002). Postsurvey, higher education predicted lower risk perception of laboratory accidents (OR=0.56, 95% CI 0.34 to 0.93, p=0.02) and bioterrorism (OR=0.48, 95% CI 0.29 to 0.80, p=0.004).
CONCLUSION The community is an important stakeholder in infectious diseases DURC but has a low awareness of this kind of research. Only a minority support DURC, and this proportion decreased with increasing provision of knowledge. There were differences of opinion between age groups, gender and education levels. The community should be informed and engaged in decisions about DURC.
Affiliation(s)
- Chandini Raina MacIntyre
- Biosecurity Program, Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
- College of Health Solutions, Arizona State University, Tempe, Arizona, USA
- College of Public Service & Community Solutions, Arizona State University, Tempe, Arizona, USA
- Dillon Charles Adam
- Biosecurity Program, Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
- Robin Turner
- Centre for Biostatistics, Division of Health Sciences, University of Otago Dunedin School of Medicine, Dunedin, New Zealand
- Abrar Ahmad Chughtai
- University of New South Wales School of Public Health and Community Medicine, Sydney, New South Wales, Australia
- Thomas Engells
- University of Texas Medical Branch, Galveston, Texas, USA
41
de Los Santos ELC. NeuRiPP: Neural network identification of RiPP precursor peptides. Sci Rep 2019; 9:13406. [PMID: 31527713 PMCID: PMC6746993 DOI: 10.1038/s41598-019-49764-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 05/22/2019] [Accepted: 08/30/2019] [Indexed: 01/29/2023]
Abstract
Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes, remains challenging. To address this, machine learning has been used to accurately identify PP sequences. Current machine learning tools have limitations, since they are specific to the RiPP class they are trained for and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network architectures that are suitable for peptide classification with weights trained on PP datasets. It is able to identify known PP sequences, and sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP was able to identify PP sequences in significantly more putative RiPP clusters than current tools while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.
Affiliation(s)
- Emmanuel L C de Los Santos
- Warwick Integrative Synthetic Biology Centre, School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, United Kingdom.