1. Wheeler NE. Responsible AI in biotechnology: balancing discovery, innovation and biosecurity risks. Front Bioeng Biotechnol 2025; 13:1537471. [PMID: 39974189 PMCID: PMC11835847 DOI: 10.3389/fbioe.2025.1537471] [Received: 11/30/2024] [Accepted: 01/03/2025] [Indexed: 02/21/2025]
Abstract
The integration of artificial intelligence (AI) in protein design presents unparalleled opportunities for innovation in bioengineering and biotechnology. However, it also raises significant biosecurity concerns. This review examines the changing landscape of bioweapon risks, the dual-use potential of AI-driven bioengineering tools, and the necessary safeguards to prevent misuse while fostering innovation. It highlights emerging policy frameworks, technical safeguards, and community responses aimed at mitigating risks and enabling responsible development and application of AI in protein design.
Affiliation(s)
- Nicole E. Wheeler
  - Department of Microbes, Infection and Microbiomes, School of Infection, Inflammation and Immunology, College of Medicine and Health, University of Birmingham, Birmingham, United Kingdom
2. Mo W, Vaiana CA, Myers CJ. The need for adaptability in detection, characterization, and attribution of biosecurity threats. Nat Commun 2024; 15:10699. [PMID: 39702312 PMCID: PMC11659417 DOI: 10.1038/s41467-024-55436-y] [Received: 08/15/2024] [Accepted: 12/12/2024] [Indexed: 12/21/2024]
Abstract
Modern biotechnology necessitates robust biosecurity protocols to address the risk of engineered biological threats. Current efforts focus on screening DNA and rejecting the synthesis of dangerous elements but face technical and logistical barriers. Screening should integrate into a broader strategy that addresses threats at multiple stages of development and deployment. The success of this approach hinges upon reliable detection, characterization, and attribution of engineered DNA. Recent advances notably aid the potential to both develop threats and analyze them. However, further work is needed to translate developments into biosecurity applications. This work reviews cutting-edge methods for DNA analysis and recommends avenues to improve biosecurity in an adaptable manner.
Affiliation(s)
- William Mo
  - Draper Scholar, The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA, USA
  - Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, 1111 Engineering Dr, Boulder, CO, USA
- Christopher A Vaiana
  - The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA, USA
- Chris J Myers
  - Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, 1111 Engineering Dr, Boulder, CO, USA
3. Berezin CT, Peccoud S, Kar DM, Peccoud J. Cryptographic approaches to authenticating synthetic DNA sequences. Trends Biotechnol 2024; 42:1002-1016. [PMID: 38418329 PMCID: PMC11309913 DOI: 10.1016/j.tibtech.2024.02.002] [Received: 10/21/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/01/2024]
Abstract
In a bioeconomy that relies on synthetic DNA sequences, the ability to ensure their authenticity is critical. DNA watermarks can encode identifying data in short sequences and can be combined with error correction and encryption protocols to ensure that sequences are robust to errors and securely communicated. New digital signature techniques allow for public verification that a sequence has not been modified and can contain sufficient information for synthetic DNA to be self-documenting. In translating these techniques from bacteria to more complex genetically modified organisms (GMOs), special considerations must be made to allow for public verification of these products. We argue that these approaches should be widely implemented to assert authorship, increase the traceability, and detect the unauthorized use of synthetic DNA.
Affiliation(s)
- Casey-Tyler Berezin
  - Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA
- Samuel Peccoud
  - Department of Electrical Engineering, Colorado State University, Fort Collins, CO, USA
- Diptendu M Kar
  - Department of Computer Sciences, Northeastern University, Boston, MA, USA
- Jean Peccoud
  - Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA; Department of Computer Sciences, Colorado State University, Fort Collins, CO, USA; School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA; Department of Systems Engineering, Colorado State University, Fort Collins, CO, USA
4. Tay AP, Didi K, Wickramarachchi A, Bauer DC, Wilson LOW, Maselko M. Synsor: a tool for alignment-free detection of engineered DNA sequences. Front Bioeng Biotechnol 2024; 12:1375626. [PMID: 39070163 PMCID: PMC11272466 DOI: 10.3389/fbioe.2024.1375626] [Received: 01/24/2024] [Accepted: 06/18/2024] [Indexed: 07/30/2024]
Abstract
DNA sequences of nearly any desired composition, length, and function can be synthesized to alter the biology of an organism for purposes ranging from the bioproduction of therapeutic compounds to invasive pest control. Yet despite offering many great benefits, engineered DNA poses a risk due to their possible misuse or abuse by malicious actors, or their unintentional introduction into the environment. Monitoring the presence of engineered DNA in biological or environmental systems is therefore crucial for routine and timely detection of emerging biological threats, and for improving public acceptance of genetic technologies. To address this, we developed Synsor, a tool for identifying engineered DNA sequences in high-throughput sequencing data. Synsor leverages the k-mer signature differences between naturally occurring and engineered DNA sequences and uses an artificial neural network to classify whether a DNA sequence is natural or engineered. By querying suspected sequences against the model, Synsor can identify sequences that are likely to have been engineered. Using natural plasmid and engineered vector sequences, we showed that Synsor identifies engineered DNA with >99% accuracy. We demonstrate how Synsor can be used to detect potential genetically engineered organisms and locate where engineered DNA is being introduced into the environment by analysing genomic and metagenomic data from yeast and wastewater samples, respectively. Synsor is therefore a powerful tool that will streamline the process of identifying engineered DNA in poorly characterized biological or environmental systems, thereby allowing for enhanced monitoring of emerging biological threats.
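As an illustration of the k-mer signature idea mentioned in this abstract, the following is a minimal sketch of turning a DNA sequence into a normalized k-mer frequency vector, the kind of feature a classifier could consume. It is not Synsor's implementation: the function name `kmer_profile`, the choice of k = 4, and the handling of ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    Hypothetical helper for illustration; windows containing symbols
    other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    # Fixed ordering so every sequence maps to a vector of length 4**k.
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[km] / total if total else 0.0 for km in kmers]
```

Each sequence maps to a vector of length 4**k regardless of its own length, which is what makes the representation alignment-free: two sequences can be compared or classified without first aligning them.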
Affiliation(s)
- Aidan P. Tay
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
  - Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Kieran Didi
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Anuradha Wickramarachchi
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
- Denis C. Bauer
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
  - Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Laurence O. W. Wilson
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
  - Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Maciej Maselko
  - Applied Biosciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
  - Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia
5. McGuffie MJ, Barrick JE. Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences. PLoS One 2024; 19:e0304164. [PMID: 38805426 PMCID: PMC11132462 DOI: 10.1371/journal.pone.0304164] [Received: 02/06/2024] [Accepted: 05/07/2024] [Indexed: 05/30/2024]
Abstract
Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and 3/5 of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science.
Affiliation(s)
- Matthew J. McGuffie
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States of America
6. de Lima RC, Sinclair L, Megger R, Maciel MAG, Vasconcelos PFDC, Quaresma JAS. Artificial intelligence challenges in the face of biological threats: emerging catastrophic risks for public health. Front Artif Intell 2024; 7:1382356. [PMID: 38800763 PMCID: PMC11116769 DOI: 10.3389/frai.2024.1382356] [Received: 02/05/2024] [Accepted: 04/29/2024] [Indexed: 05/29/2024]
Abstract
The threat landscape of biological hazards with the evolution of AI presents challenges. While AI promises innovative solutions, concerns arise about its misuse in the creation of biological weapons. The convergence of AI and genetic editing raises questions about biosecurity, potentially accelerating the development of dangerous pathogens. The mapping conducted highlights the critical intersection between AI and biological threats, underscoring emerging risks in the criminal manipulation of pathogens. Technological advancement in biology requires preventative and regulatory measures. Expert recommendations emphasize the need for solid regulations and responsibility of creators, demanding a proactive, ethical approach and governance to ensure global safety.
Affiliation(s)
- Renan Chaves de Lima
  - GA.IA—AI Integrated Analysis Group, Remote Group, Brazil
  - Postgraduate Program in Tropical Diseases, Tropical Medicine Center, Federal University of Pará, Belém, PA, Brazil
- Lucas Sinclair
  - GA.IA—AI Integrated Analysis Group, Remote Group, Brazil
- Ricardo Megger
  - GA.IA—AI Integrated Analysis Group, Remote Group, Brazil
- Magno Alessandro Guedes Maciel
  - GA.IA—AI Integrated Analysis Group, Remote Group, Brazil
  - MBA Program in Artificial Intelligence for Business, Faculdade Exame, São Paulo, SP, Brazil
- Juarez Antônio Simões Quaresma
  - Postgraduate Program in Tropical Diseases, Tropical Medicine Center, Federal University of Pará, Belém, PA, Brazil
  - Department of Pathology, State University of Pará, Belém, PA, Brazil
  - School of Medicine, São Paulo University, São Paulo, SP, Brazil
7. Adler A, Bader JS, Basnight B, Booth BW, Cai J, Cho E, Collins JH, Ge Y, Grothendieck J, Keating K, Marshall T, Persikov A, Scott H, Siegelmann R, Singh M, Taggart A, Toll B, Wan KH, Wyschogrod D, Yaman F, Young EM, Celniker SE, Roehner N. Ensemble Detection of DNA Engineering Signatures. ACS Synth Biol 2024; 13:1105-1115. [PMID: 38468602 DOI: 10.1021/acssynbio.3c00398] [Indexed: 03/13/2024]
Abstract
Synthetic biology is creating genetically engineered organisms at an increasing rate for many potentially valuable applications, but this potential comes with the risk of misuse or accidental release. To begin to address this issue, we have developed a system called GUARDIAN that can automatically detect signatures of engineering in DNA sequencing data, and we have conducted a blinded test of this system using a curated Test and Evaluation (T&E) data set. GUARDIAN uses an ensemble approach based on the guiding principle that no single approach is likely to be able to detect engineering with perfect accuracy. Critically, ensembling enables GUARDIAN to detect sequence inserts in 13 target organisms with a high degree of specificity that requires no subject matter expert (SME) review.
Affiliation(s)
- Aaron Adler
  - Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Joel S Bader
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Brian Basnight
  - Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Benjamin W Booth
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Jitong Cai
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Elizabeth Cho
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Joseph H Collins
  - Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Yuchen Ge
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Kevin Keating
  - Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Tyler Marshall
  - Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Anton Persikov
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08544, United States
- Helen Scott
  - Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Roy Siegelmann
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08544, United States
- Benjamin Toll
  - Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Kenneth H Wan
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Fusun Yaman
  - Raytheon BBN, Cambridge, Massachusetts 02138, United States
- Eric M Young
  - Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Susan E Celniker
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
8. Spirgel R, Comolli J, Guido NJ. A Machine Learning Method for Genome Engineering Design Tool Attribution. Health Secur 2023; 21:407-414. [PMID: 37594776 DOI: 10.1089/hs.2022.0152] [Indexed: 08/19/2023]
Abstract
As the ability to engineer biological systems improves with increasingly advanced technology, the risk of accidental or intentional release of a dangerous genetically modified organism becomes greater. It is important that authorities can carry out attribution for the source of a genetically modified biological agent release. In the absence of evidence that ties a release directly to the individuals responsible, attribution can be carried out in part by discovering the in silico tools used to design the engineered genetic components, which can leave a signature in the DNA of the organism. Previous attribution methods have focused on identifying the laboratory of origin of an engineered organism using machine learning on plasmid signatures. The next logical step is to address attribution using signatures from the tools that are used to create the engineered modifications. A random forest classifier was developed that discriminates between design tools used to optimize coding regions for incorporation into the genome of another organism. To this end, tens of thousands of genes were optimized with 4 different codon optimization methods and relevant features from these sequences were generated for a machine learning classifier. This method achieves more than 97% accuracy in predicting which tools were used to design codon optimized genes for expression in other organisms. The methods presented here lay the groundwork for the creation of effective organism engineering attribution techniques. Such methods can act both as deterrents for future attempts at creating dangerous organisms as well as tools for forensic science.
Affiliation(s)
- Rebecca Spirgel
  - Rebecca Spirgel, MS, Associate Technical Staff, Group 23, MIT Lincoln Laboratory, Lexington, MA
- James Comolli
  - James Comolli, PhD, Group 23, MIT Lincoln Laboratory, Lexington, MA
- Nicholas J Guido
  - Nicholas J. Guido, PhD, Technical Staff, Group 23, MIT Lincoln Laboratory, Lexington, MA
9. McGuffie MJ, Barrick JE. Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences. bioRxiv 2023:2023.04.10.536277. [PMID: 37090600 PMCID: PMC10120640 DOI: 10.1101/2023.04.10.536277] [Indexed: 04/25/2023]
Abstract
Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and 3/5 of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science.
Affiliation(s)
- Matthew J. McGuffie
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States
10. Crook OM, Warmbrod KL, Lipstein G, Chung C, Bakerlee CW, McKelvey TG, Holland SR, Swett JL, Esvelt KM, Alley EC, Bradshaw WJ. Analysis of the first genetic engineering attribution challenge. Nat Commun 2022; 13:7374. [PMID: 36450726 PMCID: PMC9712580 DOI: 10.1038/s41467-022-35032-8] [Received: 01/26/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022]
Abstract
The ability to identify the designer of engineered biological sequences, termed genetic engineering attribution (GEA), would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA techniques. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered plasmid sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
Affiliation(s)
- Oliver M Crook
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Kelsey Lane Warmbrod
  - Johns Hopkins Center for Health Security, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Institute of Public Health Genetics, University of Washington, Seattle, WA, USA
- Kevin M Esvelt
  - altLabs Inc, Berkeley, CA, USA
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ethan C Alley
  - altLabs Inc, Berkeley, CA, USA
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- William J Bradshaw
  - altLabs Inc, Berkeley, CA, USA
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
11. Kalendar R, Orbovic V, Egea-Cortines M, Song GQ. Editorial: Recent advances in plant genetic engineering and innovative applications. Front Plant Sci 2022; 13:1045417. [PMID: 36340337 PMCID: PMC9629865 DOI: 10.3389/fpls.2022.1045417] [Received: 09/15/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Affiliation(s)
- Ruslan Kalendar
  - Helsinki Institute of Life Science HiLIFE, University of Helsinki, Helsinki, Finland
  - National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Vladimir Orbovic
  - University of Florida, Institute of Food and Agricultural Sciences, Lake Alfred, FL, United States
- Guo-qing Song
  - Plant Biotechnology Resource and Outreach Center, Department of Horticulture, Michigan State University, East Lansing, MI, United States
12. Using metric learning to identify the lab-of-origin of engineered DNA. Nat Comput Sci 2022; 2:296-297. [PMID: 38177813 DOI: 10.1038/s43588-022-00240-1] [Indexed: 01/06/2024]
13. Soares IM, Camargo FHF, Marques A, Crook OM. Improving lab-of-origin prediction of genetically engineered plasmids via deep metric learning. Nat Comput Sci 2022; 2:253-264. [PMID: 38177551 DOI: 10.1038/s43588-022-00234-z] [Received: 11/11/2021] [Accepted: 03/22/2022] [Indexed: 01/06/2024]
Abstract
Genome engineering is undergoing unprecedented development and is now becoming widely available. Genetic engineering attribution can make sequence-lab associations and assist forensic experts in ensuring responsible biotechnology innovation and reducing misuse of engineered DNA sequences. Here we propose a method based on metric learning to rank the most likely labs of origin while simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models. Our approach employs a circular shift augmentation method and can correctly rank the lab of origin 90% of the time within its top-10 predictions. We also demonstrate that we can perform few-shot learning and obtain 76% top-10 accuracy using only 10% of the sequences. Finally, our approach can also extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
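The circular shift augmentation mentioned in this abstract can be illustrated with a short sketch. The idea, as assumed here, is that a plasmid is a circular molecule, so the position where its linear sequence record starts is arbitrary and any rotation describes the same plasmid. The function names `circular_shift` and `augment` and the random-offset sampling are assumptions for this example, not the authors' code.

```python
import random
from typing import Optional


def circular_shift(seq: str, offset: int) -> str:
    """Rotate a circular sequence so it starts at position `offset`."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def augment(seq: str, n: int, rng: Optional[random.Random] = None) -> list[str]:
    """Return n random rotations of seq (hypothetical helper)."""
    if not seq:
        return [seq] * n
    rng = rng or random.Random()
    return [circular_shift(seq, rng.randrange(len(seq))) for _ in range(n)]
```

Training on several rotations of each plasmid discourages a model from relying on where the sequence record happens to begin.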
Affiliation(s)
- Oliver M Crook
  - Oxford Protein Informatics Group, University of Oxford, Oxford, UK
14.
Abstract
PURPOSE OF REVIEW: Due to the impact of the COVID-19 pandemic this past year, we have witnessed a significant acceleration in the science, technology, and policy of global health security. This review highlights important progress made toward the mitigation of Zika, Ebola, and COVID-19 outbreaks. These epidemics and their shared features suggest a unified policy and technology agenda that could broadly improve global health security.
RECENT FINDINGS: Molecular epidemiology is not yet in widespread use, but shows promise toward informing on-the-ground decision-making during outbreaks. Point-of-care (POC) diagnostics have been achieved for each of these threats; however, deployment of Zika and Ebola diagnostics lags behind those for COVID-19. POC metagenomics offers the possibility of identifying novel viruses. Vaccines have been successfully approved for Ebola and COVID-19, due in large part to public-private partnerships and advance purchase commitments. Therapeutics trials conducted during ongoing epidemics have identified effective antibody therapeutics for Ebola, as well as steroids (both inhaled and oral) and a broad-spectrum antiviral for COVID-19.
SUMMARY: Achieving global health security remains a challenge, though headway has been made over the past years. Promising policy and technology strategies that would increase resilience across emerging viral pathogens should be pursued.
Affiliation(s)
- Michele Barry
  - School of Medicine
  - Center for Innovation in Global Health, Stanford University, Stanford, California, USA
15. Wang Q, Kille B, Liu TR, Elworth RAL, Treangen TJ. PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment. Nat Commun 2021; 12:1167. [PMID: 33637701 PMCID: PMC7910462 DOI: 10.1038/s41467-021-21180-w] [Received: 05/22/2020] [Accepted: 01/12/2021] [Indexed: 12/26/2022]
Abstract
With advances in synthetic biology and genome engineering comes a heightened awareness of potential misuse related to biosafety concerns. A recent study employed machine learning to identify the lab-of-origin of DNA sequences to help mitigate some of these concerns. Despite their promising results, this deep learning based approach had limited accuracy, was computationally expensive to train, and wasn't able to provide the precise features that were used in its predictions. To address these shortcomings, we developed PlasmidHawk for lab-of-origin prediction. Compared to a machine learning approach, PlasmidHawk has higher prediction accuracy; PlasmidHawk can successfully predict unknown sequences' depositing labs 76% of the time and 85% of the time the correct lab is in the top 10 candidates. In addition, PlasmidHawk can precisely single out the signature sub-sequences that are responsible for the lab-of-origin detection. In summary, PlasmidHawk represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences. PlasmidHawk is available at https://gitlab.com/treangenlab/plasmidhawk.git .
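The two figures quoted in this abstract (correct lab 76% of the time, correct lab among the top 10 candidates 85% of the time) are top-1 and top-10 accuracy. The sketch below shows how such a metric is typically computed from ranked predictions; the function name `top_k_accuracy` and the toy lab labels are hypothetical and are not taken from PlasmidHawk.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of samples whose true label is among the first k ranked guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


# Toy example with made-up lab names, ranked from most to least likely.
ranked = [
    ["labA", "labB", "labC"],
    ["labB", "labC", "labA"],
    ["labC", "labA", "labB"],
]
truth = ["labA", "labA", "labA"]
top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
```

Top-k accuracy can only stay the same or increase as k grows, which is why the top-10 figure is at least as high as the top-1 figure.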
Affiliation(s)
- Qi Wang
  - Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Rice University, Houston, Texas, 77005, USA
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, Texas, 77005, United States
- Tian Rui Liu
  - Department of Computer Science, Rice University, Houston, Texas, 77005, United States
- R A Leo Elworth
  - Department of Computer Science, Rice University, Houston, Texas, 77005, United States
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, Texas, 77005, United States
16. Lewis G, Jordan JL, Relman DA, Koblentz GD, Leung J, Dafoe A, Nelson C, Epstein GL, Katz R, Montague M, Alley EC, Filone CM, Luby S, Church GM, Millett P, Esvelt KM, Cameron EE, Inglesby TV. The biosecurity benefits of genetic engineering attribution. Nat Commun 2020; 11:6294. [PMID: 33293537 PMCID: PMC7722838 DOI: 10.1038/s41467-020-19149-2] [Received: 03/09/2020] [Accepted: 09/28/2020] [Indexed: 11/23/2022]
Abstract
Biology can be misused, and the risk of this causing widespread harm increases in step with the rapid march of technological progress. A key security challenge involves attribution: determining, in the wake of a human-caused biological event, who was responsible. Recent scientific developments have demonstrated a capability for detecting whether an organism involved in such an event has been genetically modified and, if modified, to infer from its genetic sequence its likely lab of origin. We believe this technique could be developed into powerful forensic tools to aid the attribution of outbreaks caused by genetically engineered pathogens, and thus protect against the potential misuse of synthetic biology.
Affiliation(s)
- Gregory Lewis
  - Future of Humanity Institute, Oxford University, Oxford, UK
  - Alt. Technology Labs, Inc., Berkeley, CA, USA
- David A Relman
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Microbiology & Immunology, Stanford University School of Medicine; and Center for International Security and Cooperation, Stanford University, Stanford, CA, USA
- Gregory D Koblentz
  - Schar School of Policy and Government, George Mason University, Washington, DC, USA
- Jade Leung
  - Future of Humanity Institute, Oxford University, Oxford, UK
- Allan Dafoe
  - Future of Humanity Institute, Oxford University, Oxford, UK
- Cassidy Nelson
  - Future of Humanity Institute, Oxford University, Oxford, UK
- Gerald L Epstein
  - Center for the Study of Weapons of Mass Destruction, National Defense University, Washington, DC, USA
- Rebecca Katz
  - Center for Global Health Science and Security, Georgetown University, Washington, DC, USA
- Michael Montague
  - Center for Health Security, Johns Hopkins University, Baltimore, MD, USA
- Ethan C Alley
  - Alt. Technology Labs, Inc., Berkeley, CA, USA
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Stephen Luby
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- George M Church
  - Alt. Technology Labs, Inc., Berkeley, CA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Piers Millett
  - Future of Humanity Institute, Oxford University, Oxford, UK
  - International Genetically Engineered Machine Competition, Boston, MA, USA
- Kevin M Esvelt
  - Alt. Technology Labs, Inc., Berkeley, CA, USA
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Thomas V Inglesby
  - Center for Health Security, Johns Hopkins University, Baltimore, MD, USA