1
Gao Y, Cheng Z, Huang B, Mao Y, Hu J, Wang S, Wang Z, Wang M, Huang S, Han M. Deciphering the profiles and hosts of antibiotic resistance genes and evaluating the risk assessment of general and non-general hospital wastewater by metagenomic sequencing. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 375:126313. [PMID: 40288632 DOI: 10.1016/j.envpol.2025.126313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 02/28/2025] [Accepted: 04/25/2025] [Indexed: 04/29/2025]
Abstract
Hospital wastewater (HWW) is a substantial environmental reservoir of antibiotic resistance genes (ARGs) and poses risks to public health and aquatic ecosystems. However, research on the diversity, transmission mechanisms, pathogenic hosts, and risks of ARGs in different HWW types is limited. This study involved the collection of HWW samples from 15 hospitals in Hefei, China, which were subsequently categorized as general hospitals (GHs) and non-general hospitals (NGHs). A 280.28-Gbp sequencing dataset was generated using a metagenomic sequencing strategy and analyzed using metagenomic assembly and binning approaches to highlight these issues in GHs and NGHs. Results showed significant differences between GHs and NGHs in ARG distribution, microbial community composition, and hosts of ARGs. Potential pathogens such as Rhodocyclaceae bacterium ICHIAU1 and Acidovorax caeni were more abundant in GHs. Furthermore, plasmid-mediated ARGs (45.21%) were more prevalent than chromosome-mediated ARGs (25.74%) in HWW, with a significantly higher proportion of plasmid-mediated ARGs in GHs compared to NGHs. The co-occurrence of ARGs and mobile genetic elements was more frequent in GHs. Additionally, the antibiotic resistome risk index was higher in GHs (38.73 ± 12.84) than NGHs (22.53 ± 11.80), indicating a greater risk of ARG transmission in GHs. This pioneering study provides valuable insights into the transmission mechanisms and hosts of ARGs in hospital settings, emphasizing the increased risk of ARG transmission in GHs.
Affiliation(s)
- Yue Gao
- School of Life Sciences, Anhui Medical University, Hefei, Anhui, 230032, China; Microbial Medicinal Resources Development Research Team, Anhui Provincial Institute of Translational Medicine, China
- Zhixiang Cheng
- Department of Blood Transfusion, The Fourth Affiliated Hospital of Anhui Medical University, Hefei, Anhui, 230012, China
- Binbin Huang
- Department of Maternal, Child and Adolescent Health, School of Public Health, Anhui Medical University, Hefei, 230032, Anhui, China
- Yujie Mao
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China
- Jie Hu
- School of Life Sciences, Anhui Medical University, Hefei, Anhui, 230032, China
- Shu Wang
- The First People's Hospital of Hefei, The Third Affiliated Hospital of Anhui Medical University, Hefei, 230032, Anhui, China
- Zhi Wang
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China
- Mingchao Wang
- Qingdao University of Science and Technology, Qingdao, 266000, China
- Shenghai Huang
- Department of Microbiology, The Institute of Clinical Virology, School of Basic Medical Sciences, Anhui Medical University, Hefei, Anhui 230032, China.
- Maozhen Han
- School of Life Sciences, Anhui Medical University, Hefei, Anhui, 230032, China; Microbial Medicinal Resources Development Research Team, Anhui Provincial Institute of Translational Medicine, China; Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China.
2
van Zyl DJ, Dunaiski M, Tegally H, Baxter C, de Oliveira T, Xavier JS. Alignment-free viral sequence classification at scale. BMC Genomics 2025; 26:389. [PMID: 40251515 PMCID: PMC12007369 DOI: 10.1186/s12864-025-11554-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Accepted: 04/01/2025] [Indexed: 04/20/2025] Open
Abstract
BACKGROUND The rapid increase in nucleotide sequence data generated by next-generation sequencing (NGS) technologies demands efficient computational tools for sequence comparison. Alignment-free (AF) methods offer a scalable alternative to traditional alignment-based approaches such as BLAST. This study evaluates alignment-free methods as scalable and rapid alternatives for viral sequence classification, focusing on identifying techniques that maintain high accuracy and efficiency when applied to extremely large datasets. RESULTS We employed six established AF techniques to extract feature vectors from viral genomes, which were subsequently used to train Random Forest classifiers. Our primary dataset comprises 297,186 SARS-CoV-2 nucleotide sequences, categorized into 3502 distinct lineages. Furthermore, we validated our models using dengue and HIV sequences to demonstrate robustness across different viral datasets. Our AF classifiers achieved 97.8% accuracy on the SARS-CoV-2 test set, and 99.8% and 89.1% accuracy on dengue and HIV test sets, respectively. CONCLUSION Despite the high-class dimensionality, we show that word-based AF methods effectively represent viral sequences. Our study highlights the practical advantages of AF techniques, including significantly faster processing compared to alignment-based methods and the ability to classify sequences using modest computational resources.
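To make the idea of a word-based alignment-free representation concrete, the following is a minimal sketch of a k-mer frequency vector, the kind of fixed-length feature that could be passed to a classifier such as a Random Forest. The abstract does not name the six techniques used, so the function name `kmer_profile`, the choice of `k`, and the handling of ambiguous bases are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 3) -> list[float]:
    """Return relative k-mer frequencies in lexicographic A/C/G/T order.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    seq = sequence.upper()
    # Fixed vocabulary of 4**k words so every sequence maps to the same length.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTAC", k=2)
    print(len(profile), round(sum(profile), 6))
```

Because the vector length depends only on `k` and not on genome length, sequences of different sizes can be compared or classified without computing an alignment.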
Affiliation(s)
- Daniel J van Zyl
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa.
- Computer Science Division, Department of Mathematical Sciences, Faculty of Science, Stellenbosch University, Stellenbosch, South Africa.
- Marcel Dunaiski
- Computer Science Division, Department of Mathematical Sciences, Faculty of Science, Stellenbosch University, Stellenbosch, South Africa
- Houriiyah Tegally
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- Cheryl Baxter
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), Durban, South Africa
- Tulio de Oliveira
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), Durban, South Africa
- Kwazulu-Natal Research Innovation and Sequencing Platform (KRISP), Nelson R Mandela School of Medicine, University of Kwazulu-Natal, Durban, South Africa
- Department of Global Health, University of Washington, Seattle, USA
- Joicymara S Xavier
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri (UFVJM), Unaí, Brazil
- Institute of Biological Sciences, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
3
Chemla Y, Sweeney CJ, Wozniak CA, Voigt CA. Design and regulation of engineered bacteria for environmental release. Nat Microbiol 2025; 10:281-300. [PMID: 39905169 DOI: 10.1038/s41564-024-01918-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 12/04/2024] [Indexed: 02/06/2025]
Abstract
Emerging products of biotechnology involve the release of living genetically modified microbes (GMMs) into the environment. However, regulatory challenges limit their use. So far, GMMs have mainly been tested in agriculture and environmental cleanup, with few approved for commercial purposes. Current government regulations do not sufficiently address modern genetic engineering and limit the potential of new applications, including living therapeutics, engineered living materials, self-healing infrastructure, anticorrosion coatings and consumer products. Here, based on 47 global studies on soil-released GMMs and laboratory microcosm experiments, we discuss the environmental behaviour of released bacteria and offer engineering strategies to help improve performance, control persistence and reduce risk. Furthermore, advanced technologies that improve GMM function and control, but lead to increases in regulatory scrutiny, are reviewed. Finally, we propose a new regulatory framework informed by recent data to maximize the benefits of GMMs and address risks.
Affiliation(s)
- Yonatan Chemla
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Connor J Sweeney
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Christopher A Voigt
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
4
Argentel-Martínez L, Peñuelas-Rubio O, Herrera-Sepúlveda A, González-Aguilera J, Sudheer S, Salim LM, Lal S, Pradeep CK, Ortiz A, Sansinenea E, Hathurusinghe SHK, Shin JH, Babalola OO, Azizoglu U. Biotechnological advances in plant growth-promoting rhizobacteria for sustainable agriculture. World J Microbiol Biotechnol 2024; 41:21. [PMID: 39738995 DOI: 10.1007/s11274-024-04231-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2024] [Accepted: 12/13/2024] [Indexed: 01/02/2025]
Abstract
The rhizosphere, the soil zone surrounding plant roots, serves as a reservoir for numerous beneficial microorganisms that enhance plant productivity and crop yield, with substantial potential for application as biofertilizers. These microbes play critical roles in ecological processes such as nutrient recycling, organic matter decomposition, and mineralization. Plant growth-promoting rhizobacteria (PGPR) represent a promising tool for sustainable agriculture, enabling green management of crop health and growth as eco-friendly alternatives to chemical fertilizers and pesticides. In this sense, biotechnological advances in genomics and gene editing have been crucial to the development of microbiome engineering, which is pivotal in designing microbial consortia to improve crop production. Genome mining, which involves comprehensive analysis of the entire genome sequence data of PGPR, is crucial for identifying genes encoding valuable bacterial enzymes and metabolites. The CRISPR-Cas system, a cutting-edge genome-editing technology, has shown significant promise in beneficial microbial species. Advances in genetic engineering, particularly CRISPR-Cas, have markedly enhanced grain output, plant biomass, resistance to pests, and the sensory and nutritional quality of crops. The use of PGPR in important crops has advanced greatly; however, synthetic microbial communities, microbiome engineering, and gene editing approaches still need further study in field trials. This review focuses on future research directions covering several factors and topics around the use of PGPR, with special emphasis on biotechnological advances.
Affiliation(s)
- Leandris Argentel-Martínez
- Tecnológico Nacional de México/Instituto Tecnológico del Valle del Yaqui, CP: 85260, Bácum, Sonora, Mexico.
- Ofelda Peñuelas-Rubio
- Tecnológico Nacional de México/Instituto Tecnológico del Valle del Yaqui, CP: 85260, Bácum, Sonora, Mexico
- Angélica Herrera-Sepúlveda
- Tecnológico Nacional de México/Instituto Tecnológico del Valle del Yaqui, CP: 85260, Bácum, Sonora, Mexico
- Jorge González-Aguilera
- Department of Agronomy, Universidad Estadual de Mato Grosso Do Sul (UEMS), Cassilândia, MS, 79540-000, Brazil
- Surya Sudheer
- Institute of Ecology and Earth Sciences, Department of Botany, University of Tartu, 51005, Tartu, Estonia
- Linu M Salim
- Faculty of Fisheries Engineering, Kerala University of Fisheries and Ocean Studies, Cochin, India
- Sunaina Lal
- Department of Biochemistry, Sikkim Manipal Institute of Medical Sciences, Gangtok, Sikkim, India
- Chittethu Kunjan Pradeep
- Microbiology Division, Jawaharlal Nehru Tropical Botanic Garden & Research Institute, Palode, Thiruvananthapuram, Kerala, 695562, India
- Aurelio Ortiz
- Facultad de Ciencias Químicas, Benemérita Universidad Autónoma de Puebla, C.P. 72570, Puebla, Puebla, México
- Estibaliz Sansinenea
- Facultad de Ciencias Químicas, Benemérita Universidad Autónoma de Puebla, C.P. 72570, Puebla, Puebla, México
- Jae-Ho Shin
- School of Applied Biosciences, College of Agriculture and Life Sciences, Kyungpook National University, Daegu, 41566, Republic of Korea
- Olubukola Oluranti Babalola
- Food Security and Safety Focus Area, Faculty of Natural and Agricultural Sciences, North-West University, Private Bag X2046, Mmabatho, 2735, South Africa
- Ugur Azizoglu
- Department of Crop and Animal Production, Safiye Cikrikcioglu Vocational College, Kayseri University, Kayseri, Türkiye.
- Genome and Stem Cell Research Center, Erciyes University, Kayseri, Türkiye.
5
Wang C, Mao Y, Zhang L, Wei H, Wang Z. Insight into environmental adaptability of antibiotic resistome from surface water to deep sediments in anthropogenic lakes by metagenomics. WATER RESEARCH 2024; 256:121583. [PMID: 38614031 DOI: 10.1016/j.watres.2024.121583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/06/2024] [Accepted: 04/06/2024] [Indexed: 04/15/2024]
Abstract
Escalating antibiotic resistance threatens long-term global health. Lake sediment is a vital hotspot for the transmission of antibiotic resistance genes (ARGs); however, their vertical distribution pattern and driving mechanisms in sediment cores remain unclear. This study first utilized metagenomics to reveal how the resistome is distributed from surface water to 45 cm sediments in four representative lakes in central China. Significant vertical variations in ARG profiles were observed (R² = 0.421, p < 0.001), with significant reductions in numbers, abundance, and Shannon index from the surface water to deep sediment (all p-values < 0.05). ARGs also had interconnections within the vertical profile of the lakes: twelve ARGs persisted at all sites and depths, and shared ARGs (e.g., vanS and mexF) were assembled by diverse hosts at varying depths. The 0-18 cm sediment had the highest mobility and health risk of ARGs, followed by the 18-45 cm sediment and water. The drivers of ARGs changed along the profile of the lakes: microbial communities and mobile genetic elements (MGEs) dominated in water, whereas environmental variables gradually became the primary drivers by regulating microbial communities and MGEs with increasing sediment depth. Interestingly, stochastic processes governed ARG assembly, while the stochasticity diminished under the mediation of Chloroflexi, Candidatus Bathyarchaeota and oxidation-reduction potential with increasing depth. Overall, we formulated a conceptual framework to elucidate the vertical environmental adaptability of the resistome in anthropogenic lakes. This study sheds light on the resistance risks and their environmental adaptability in sediment cores, which could reinforce the governance of public health issues.
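For readers unfamiliar with the Shannon index reported above, here is a minimal sketch of how it is commonly computed from an abundance profile, H = -Σ pᵢ ln pᵢ. The abstract does not state the logarithm base or the level at which ARGs were grouped, so the natural logarithm and the function name `shannon_index` are assumptions made for illustration.

```python
import math


def shannon_index(abundances: list[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero abundances.

    Assumption: natural logarithm; the source does not specify the base.
    """
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    h = -sum((a / total) * math.log(a / total) for a in positive)
    # abs() avoids returning -0.0 when only one category is present.
    return abs(h)


if __name__ == "__main__":
    # Hypothetical ARG abundances for two samples, not data from the study.
    print(shannon_index([10, 10, 10, 10]))
    print(shannon_index([37, 1, 1, 1]))
```

An even profile gives a higher value than one dominated by a single gene, which is why a falling index with depth indicates a less diverse resistome.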
Affiliation(s)
- Cong Wang
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yujie Mao
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Lu Zhang
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Huimin Wei
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhi Wang
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430077, China.
6
Gündüz HA, Mreches R, Moosbauer J, Robertson G, To XY, Franzosa EA, Huttenhower C, Rezaei M, McHardy AC, Bischl B, Münch PC, Binder M. Optimized model architectures for deep learning on genomic data. Commun Biol 2024; 7:516. [PMID: 38693292 PMCID: PMC11063068 DOI: 10.1038/s42003-024-06161-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 04/08/2024] [Indexed: 05/03/2024] Open
Abstract
The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.
Affiliation(s)
- Hüseyin Anil Gündüz
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- René Mreches
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Julia Moosbauer
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Gary Robertson
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Xiao-Yin To
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Eric A Franzosa
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Mina Rezaei
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Alice C McHardy
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Bernd Bischl
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Philipp C Münch
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany.
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.
- German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany.
- Martin Binder
- Department of Statistics, LMU Munich, Munich, Germany.
- Munich Center for Machine Learning, Munich, Germany.
7
Ma C, Liu S, Koslicki D. MetagenomicKG: a knowledge graph for metagenomic applications. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.14.585056. [PMID: 38559251 PMCID: PMC10980061 DOI: 10.1101/2024.03.14.585056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Motivation The sheer volume and variety of genomic content within microbial communities makes metagenomics a field rich in biomedical knowledge. To traverse these complex communities and their vast unknowns, metagenomic studies often depend on distinct reference databases, such as the Genome Taxonomy Database (GTDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), for various analytical purposes. These databases are crucial for genetic and functional annotation of microbial communities. Nevertheless, the inconsistent nomenclature or identifiers of these databases present challenges for effective integration, representation, and utilization. Knowledge graphs (KGs) offer an appropriate solution by organizing biological entities and their interrelations into a cohesive network. The graph structure not only facilitates the unveiling of hidden patterns but also enriches our biological understanding with deeper insights. Despite KGs having shown potential in various biomedical fields, their application in metagenomics remains underexplored. Results We present MetagenomicKG, a novel knowledge graph specifically tailored for metagenomic analysis. MetagenomicKG integrates taxonomic, functional, and pathogenesis-related information from widely used databases, and further links these with established biomedical knowledge graphs to expand biological connections. Through several use cases, we demonstrate its utility in enabling hypothesis generation regarding the relationships between microbes and diseases, generating sample-specific graph embeddings, and providing robust pathogen prediction. Availability and Implementation The source code and technical details for constructing the MetagenomicKG and reproducing all analyses are available at GitHub: https://github.com/KoslickiLab/MetagenomicKG. We also host a Neo4j instance: http://mkg.cse.psu.edu:7474 for accessing and querying this graph.
Affiliation(s)
- Chunyu Ma
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Shaopeng Liu
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- David Koslicki
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Biology, Pennsylvania State University, State College, Pennsylvania, USA
- The One Health Microbiome Center, Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
8
Cho HJ, Wang Z, Cong Y, Bekiranov S, Zhang A, Zang C. DARDN: A Deep-Learning Approach for CTCF Binding Sequence Classification and Oncogenic Regulatory Feature Discovery. Genes (Basel) 2024; 15:144. [PMID: 38397134 PMCID: PMC10888155 DOI: 10.3390/genes15020144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 02/25/2024] Open
Abstract
Characterization of gene regulatory mechanisms in cancer is a key task in cancer genomics. CCCTC-binding factor (CTCF), a DNA binding protein, exhibits specific binding patterns in the genome of cancer cells and has a non-canonical function to facilitate oncogenic transcription programs by cooperating with transcription factors bound at flanking distal regions. Identification of DNA sequence features from a broad genomic region that distinguish cancer-specific CTCF binding sites from regular CTCF binding sites can help find oncogenic transcription factors in a cancer type. However, the presence of long DNA sequences without localization information makes it difficult to perform conventional motif analysis. Here, we present DNAResDualNet (DARDN), a computational method that utilizes convolutional neural networks (CNNs) for predicting cancer-specific CTCF binding sites from long DNA sequences and employs DeepLIFT, a method for interpretability of deep learning models that explains the model's output in terms of the contributions of its input features. The method is used for identifying DNA sequence features associated with cancer-specific CTCF binding. Evaluation on DNA sequences associated with CTCF binding sites in T-cell acute lymphoblastic leukemia (T-ALL) and other cancer types demonstrates DARDN's ability in classifying DNA sequences surrounding cancer-specific CTCF binding from control constitutive CTCF binding and identifying sequence motifs for transcription factors potentially active in each specific cancer type. We identify potential oncogenic transcription factors in T-ALL, acute myeloid leukemia (AML), breast cancer (BRCA), colorectal cancer (CRC), lung adenocarcinoma (LUAD), and prostate cancer (PRAD). Our work demonstrates the power of advanced machine learning and feature discovery approach in finding biologically meaningful information from complex high-throughput sequencing data.
Affiliation(s)
- Hyun Jae Cho
- Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
- Zhenjia Wang
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Yidan Cong
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Stefan Bekiranov
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Aidong Zhang
- Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
- Chongzhi Zang
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
9
Ming Z, Chen X, Wang S, Liu H, Yuan Z, Wu M, Xia H. HostNet: improved sequence representation in deep neural networks for virus-host prediction. BMC Bioinformatics 2023; 24:455. [PMID: 38041071 PMCID: PMC10691023 DOI: 10.1186/s12859-023-05582-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 11/24/2023] [Indexed: 12/03/2023] Open
Abstract
BACKGROUND The escalation of viruses over the past decade has highlighted the need to determine their respective hosts, particularly for emerging ones that pose a potential menace to the welfare of both human and animal life. Yet, the traditional means of ascertaining the host range of viruses, which involves field surveillance and laboratory experiments, is a laborious and demanding undertaking. A computational tool with the capability to reliably predict host ranges for novel viruses can provide timely responses in the prevention and control of emerging infectious diseases. The intricate nature of viral-host prediction involves issues such as data imbalance and deficiency. Therefore, developing highly accurate computational tools capable of predicting virus-host associations is a challenging and pressing demand. RESULTS To overcome the challenges of virus-host prediction, we present HostNet, a deep learning framework that utilizes a Transformer-CNN-BiGRU architecture and two enhanced sequence representation modules. The first module, k-mer to vector, pre-trains a background vector representation of k-mers from a broad range of virus sequences to address the issue of data deficiency. The second module, an adaptive sliding window, truncates virus sequences of various lengths to create a uniform number of informative and distinct samples for each sequence to address the issue of data imbalance. We assess HostNet's performance on a benchmark dataset of "Rabies lyssavirus" and an in-house dataset of "Flavivirus". Our results show that HostNet surpasses the state-of-the-art deep learning-based method in host-prediction accuracies and F1 score. The enhanced sequence representation modules significantly improve HostNet's training generalization, performance in challenging classes, and stability. CONCLUSION HostNet is a promising framework for predicting virus hosts from genomic sequences, addressing challenges posed by sparse and varying-length virus sequence data. Our results demonstrate its potential as a valuable tool for virus-host prediction in various biological contexts. Virus-host prediction based on genomic sequences using deep neural networks is a promising approach to identifying their potential hosts accurately and efficiently, with significant impacts on public health, disease prevention, and vaccine development.
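The adaptive sliding window idea can be illustrated with a short sketch: each sequence yields the same number of fixed-length windows, with the stride stretching or shrinking to fit the sequence length. The abstract does not give the actual algorithm, so the function name `adaptive_windows`, the even spacing of window starts, and the `N` padding for short sequences are all assumptions rather than HostNet's implementation.

```python
def adaptive_windows(sequence: str, window: int, n_samples: int) -> list[str]:
    """Cut `n_samples` windows of length `window` from one sequence.

    Hypothetical sketch: window starts are spread evenly over the sequence,
    so longer sequences get a larger stride and shorter ones more overlap.
    """
    if window <= 0 or n_samples <= 0:
        raise ValueError("window and n_samples must be positive")
    # Assumption: sequences shorter than one window are right-padded with N.
    padded = sequence.ljust(window, "N")
    last_start = len(padded) - window
    if n_samples == 1:
        starts = [0]
    else:
        starts = [round(i * last_start / (n_samples - 1)) for i in range(n_samples)]
    return [padded[s:s + window] for s in starts]


if __name__ == "__main__":
    for seq in ("ACGTACGTAC", "ACGTACGTACGTACGTACGT"):
        print(adaptive_windows(seq, window=4, n_samples=3))
```

With this scheme every sequence contributes exactly `n_samples` training examples regardless of its length. Very short sequences produce overlapping or identical windows, so a real pipeline would need an extra rule to keep samples distinct.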
Affiliation(s)
- Zhaoyan Ming
- School of Computer and Computing Science, Hangzhou City University, Hangzhou, 310015, China
- Xiangjun Chen
- Polytechnic Institute, Zhejiang University, Hangzhou, 310058, China
- Shunlong Wang
- Key Laboratory of Virology and Biosafety, Wuhan Institute of Virology, Wuhan, 430071, China
- University of Chinese Academy of Sciences, Beijing, 100190, China
| | - Hong Liu
- Institute of Biomedicine, Shandong University of Technology, Zibo, 255000, China
| | - Zhiming Yuan
- Key Laboratory of Virology and Biosafety, Wuhan Institute of Virology, Wuhan, 430071, China
- University of Chinese Academy of Sciences, Beijing, 100190, China
| | - Minghui Wu
- School of Computer and Computing Science, Hangzhou City University, Hangzhou, 310015, China.
| | - Han Xia
- Key Laboratory of Virology and Biosafety, Wuhan Institute of Virology, Wuhan, 430071, China.
- University of Chinese Academy of Sciences, Beijing, 100190, China.
- Hubei Jiangxia Laboratory, Wuhan, 430200, China.
| |
Collapse
|
10
|
Roev GV, Borisova NI, Chistyakova NV, Agletdinov MR, Akimkin VG, Khafizov K. Unlocking the Viral Universe: Metagenomic Analysis of Bat Samples Using Next-Generation Sequencing. Microorganisms 2023; 11:2532. [PMID: 37894190 PMCID: PMC10608967 DOI: 10.3390/microorganisms11102532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 10/02/2023] [Accepted: 10/05/2023] [Indexed: 10/29/2023]
Abstract
Next-generation sequencing technologies have revolutionized the field of virology by enabling the reading of complete viral genomes, extensive metagenomic studies, and the identification of novel viral pathogens. Although metagenomic sequencing has the advantage of not requiring specific probes or primers, it faces significant challenges in analyzing data and identifying novel viruses. Traditional bioinformatics tools for sequence identification mainly depend on homology-based strategies, which may not allow the detection of a virus significantly different from known variants due to the extensive genetic diversity and rapid evolution of viruses. In this work, we performed metagenomic analysis of bat feces from different Russian cities and identified a wide range of viral pathogens. We then selected sequences with minimal homology to a known picornavirus and used "Switching Mechanism at the 5' end of RNA Template" technology to obtain a longer genome fragment, allowing for more reliable identification. This study emphasizes the importance of integrating advanced computational methods with experimental strategies for identifying unknown viruses to better understand the viral universe.
Affiliation(s)
- German V. Roev
  - Central Research Institute of Epidemiology, 111123 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, 115184 Dolgoprudny, Russia
- Nadezhda V. Chistyakova
  - A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, 119071 Moscow, Russia
- Matvey R. Agletdinov
  - Central Research Institute of Epidemiology, 111123 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, 115184 Dolgoprudny, Russia
- Kamil Khafizov
  - Central Research Institute of Epidemiology, 111123 Moscow, Russia
11
Jung J, Yoo S. Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches. Genes (Basel) 2023; 14:1820. [PMID: 37761960 PMCID: PMC10530902 DOI: 10.3390/genes14091820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 09/29/2023]
Abstract
Cancer metastasis accounts for approximately 90% of cancer deaths, and elucidating markers in metastasis is the first step in its prevention. To characterize metastasis marker genes (MGs) of breast cancer, XGBoost models that classify metastasis status were trained with gene expression profiles from TCGA. Then, a metastasis score (MS) was assigned to each gene by calculating the inner product between the feature importance and the AUC performance of the models. As a result, 54, 202, and 357 genes with the highest MS were characterized as MGs by empirical p-value cutoffs of 0.001, 0.005, and 0.01, respectively. The three sets of MGs were compared with those from existing metastasis marker databases, which provided significant results in most comparisons (p-value < 0.05). They were also significantly enriched in biological processes associated with breast cancer metastasis. The three MGs, SPPL2C, KRT23, and RGS7, showed highly significant results (p-value < 0.01) in the survival analysis. The MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified via the literature. Additionally, we checked how close the MGs were to each other in the protein-protein interaction networks. We expect that the characterized markers will help understand and prevent breast cancer metastasis.
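The metastasis score described above can be read as a per-gene inner product across models: each model contributes its feature importance for the gene, weighted by that model's AUC. A minimal sketch under that reading follows. The function name `metastasis_scores`, the list-of-lists layout (one row of importances per model), and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
def metastasis_scores(importance: list[list[float]],
                      auc: list[float]) -> list[float]:
    """Return one score per gene.

    importance[m][g] is the feature importance of gene g in model m;
    auc[m] is the AUC of model m. The score of gene g is the inner
    product of column g of `importance` with `auc`.
    """
    if not importance or len(importance) != len(auc):
        raise ValueError("need one AUC value per model")
    n_genes = len(importance[0])
    if any(len(row) != n_genes for row in importance):
        raise ValueError("every model must score the same genes")

    return [
        sum(row[g] * a for row, a in zip(importance, auc))
        for g in range(n_genes)
    ]


# Toy example (made-up numbers): two models, three genes.
toy_importance = [[0.5, 0.5, 0.0],
                  [0.2, 0.0, 0.8]]
toy_auc = [0.8, 0.6]
toy_scores = metastasis_scores(toy_importance, toy_auc)
```

Weighting by AUC means a gene that is important only in poorly performing models receives a lower score than one that is important in accurate models. Ranking genes by this score and applying an empirical p-value cutoff would then give marker sets like those reported.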
Affiliation(s)
- Jinmyung Jung
  - Division of Data Science, College of Information and Communication Technology, The University of Suwon, Hwaseong 18323, Republic of Korea
- Sunyong Yoo
  - Department of ICT Convergence System Engineering, Chonnam National University, Gwangju 61005, Republic of Korea
12
Johnson MA, Vinatzer BA, Li S. Reference-Free Plant Disease Detection Using Machine Learning and Long-Read Metagenomic Sequencing. Appl Environ Microbiol 2023; 89:e0026023. [PMID: 37184398 PMCID: PMC10304783 DOI: 10.1128/aem.00260-23] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2023] [Accepted: 04/14/2023] [Indexed: 05/16/2023]
Abstract
Surveillance for early disease detection is crucial to reduce the threat of plant diseases to food security. Metagenomic sequencing and taxonomic classification have recently been used to detect and identify plant pathogens. However, for an emerging pathogen, its genome may not be similar enough to any public genome to permit reference-based tools to identify infected samples. Also, in the case of point-of-care diagnosis in the field, database access may be limited. Therefore, here we explore reference-free detection of plant pathogens using metagenomic sequencing and machine learning (ML). We used long-read metagenomes from healthy and infected plants as our model system and constructed k-mer frequency tables to test eight different ML models. The accuracy in classifying individual reads as coming from a healthy or infected metagenome was compared. Of all models, random forest (RF) had the best combination of short run-time and high accuracy (over 0.90) using tomato metagenomes. We further evaluated the RF model with a different tomato sample infected with the same pathogen or a different pathogen and a grapevine sample infected with a grapevine pathogen and achieved similar performances. ML models can thus learn features to successfully perform reference-free detection of plant diseases whereby a model trained with one pathogen-host system can also be used to detect different pathogens on different hosts. The potential and challenges of applying ML to metagenomics in plant disease detection are discussed. IMPORTANCE Climate change may lead to the emergence of novel plant diseases caused by yet unknown pathogens. Surveillance for emerging plant diseases is crucial to reduce their threat to food security. However, conventional genomic-based methods require knowledge of existing plant pathogens and cannot be applied to detecting newly emerged pathogens. In this work, we explored reference-free, metagenomic sequencing-based disease detection using machine learning. By sequencing the genomes of all microbial species extracted from an infected plant sample, we were able to train machine learning models to accurately classify individual sequencing reads as coming from a healthy or an infected plant sample. This method has the potential to be integrated into a generic pipeline for a metagenomic-based plant disease surveillance approach but also has limitations that still need to be overcome.
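To make the k-mer frequency table concrete, here is a minimal sketch of how a single read could be turned into one row of such a table. The function name `kmer_frequencies`, the default `k=3`, and the choice to skip k-mers containing characters other than A, C, G, or T are assumptions for illustration; the abstract does not state the value of k or the preprocessing the authors used.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(read: str, k: int = 3) -> list[float]:
    """Return the relative frequency of every DNA k-mer in `read`.

    The output has 4**k entries in lexicographic k-mer order
    (AAA, AAC, AAG, ...), so rows from different reads line up
    as columns of a feature table.
    """
    read = read.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    # Count every overlapping substring of length k.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))

    # Only k-mers made purely of A/C/G/T contribute to the total
    # (an assumption: ambiguous bases such as N are ignored).
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]
```

Stacking one such row per read, with a healthy or infected label taken from the source metagenome, gives a table that a classifier such as a random forest can be trained on, without consulting any reference genome.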
Affiliation(s)
- Marcela A. Johnson
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA
  - Graduate Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, Virginia, USA
- Boris A. Vinatzer
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA
- Song Li
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA
13
Naor-Hoffmann S, Svetlitsky D, Sal-Man N, Orenstein Y, Ziv-Ukelson M. Predicting the pathogenicity of bacterial genomes using widely spread protein families. BMC Bioinformatics 2022; 23:253. [PMID: 35751023 PMCID: PMC9233384 DOI: 10.1186/s12859-022-04777-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 04/13/2022] [Indexed: 11/15/2022]
Abstract
Background The human body is inhabited by a diverse community of commensal non-pathogenic bacteria, many of which are essential for our health. By contrast, pathogenic bacteria have the ability to invade their hosts and cause a disease. Characterizing the differences between pathogenic and commensal non-pathogenic bacteria is important for the detection of emerging pathogens and for the development of new treatments. Previous methods for classification of bacteria as pathogenic or non-pathogenic used either raw genomic reads or protein families as features. Using protein families instead of reads provided a better interpretability of the resulting model. However, the accuracy of protein-families-based classifiers can still be improved. Results We developed a wide scope pathogenicity classifier (WSPC), a new protein-content-based machine-learning classification model. We trained WSPC on a newly curated dataset of 641 bacterial genomes, where each genome belongs to a different species. A comparative analysis we conducted shows that WSPC outperforms existing models on two benchmark test sets. We observed that the most discriminative protein-family features in WSPC are widely spread among bacterial species. These features correspond to proteins that are involved in the ability of bacteria to survive and replicate during an infection, rather than proteins that are directly involved in damaging or invading the host.
Affiliation(s)
- Shaked Naor-Hoffmann
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Dina Svetlitsky
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Neta Sal-Man
  - The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Yaron Orenstein
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Michal Ziv-Ukelson
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
14
Abstract
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
15
Mavaie P, Holder L, Beck D, Skinner MK. Predicting environmentally responsive transgenerational differential DNA methylated regions (epimutations) in the genome using a hybrid deep-machine learning approach. BMC Bioinformatics 2021; 22:575. [PMID: 34847877 PMCID: PMC8630850 DOI: 10.1186/s12859-021-04491-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/23/2021] [Accepted: 11/18/2021] [Indexed: 11/24/2022]
Abstract
BACKGROUND Deep learning is an active bioinformatics artificial intelligence field that is useful in solving many biological problems, including predicting altered epigenetics such as DNA methylation regions. Deep learning (DL) can learn an informative representation that addresses the need for defining relevant features. However, deep learning models are computationally expensive, and they require large training datasets to achieve good classification performance. RESULTS One approach to addressing these challenges is to use a less complex deep learning network for feature selection and Machine Learning (ML) for classification. In the current study, we introduce a hybrid DL-ML approach that uses a deep neural network for extracting molecular features and a non-DL classifier to predict environmentally responsive transgenerational differential DNA methylated regions (DMRs), termed epimutations, based on the extracted DL-based features. Sperm epimutations from epigenetic transgenerational inheritance induced by various environmental toxicants were used to train the model on the rat genome DNA sequence, and the model was then used to predict transgenerational DMRs (epimutations) across the entire genome. CONCLUSION The approach was also used to predict potential DMRs in the human genome. Experimental results show that the hybrid DL-ML approach outperforms deep learning and traditional machine learning methods.
Affiliation(s)
- Pegah Mavaie
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164-2752, USA
- Lawrence Holder
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164-2752, USA
- Daniel Beck
  - Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, 99164-4236, USA
- Michael K Skinner
  - Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, 99164-4236, USA
16
The fate of plant growth-promoting rhizobacteria in soilless agriculture: future perspectives. 3 Biotech 2021; 11:382. [PMID: 34350087 DOI: 10.1007/s13205-021-02941-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/08/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
The application of plant growth-promoting rhizobacteria (PGPRs) can be an excellent and eco-friendly alternative to the use of chemical fertilizers. While PGPRs are often used in traditional agriculture to facilitate yield increases, their use in soilless agriculture has been limited. Soilless agriculture is growing in popularity among commercial farmers because it eliminates soil-borne problems, and the essential strategy is to keep the system as clean as possible. However, a new trend is the inclusion of PGPRs to enhance plant development. Despite the plethora of research that has been performed to date, there remains a huge knowledge gap that needs to be addressed to facilitate the commercialization of PGPRs for sustainable soilless agriculture. Hence, the development of proper strategies and additional research and trials are required. The present review provides an update on recent developments in the use of PGPRs in soilless agriculture, examining these bacteria from different perspectives in an attempt to generate critical discussion and aid in the understanding of the interaction between soilless agriculture and PGPRs.
17
Abstract
Current studies on environmental chemistry mainly focus on a single stressor or single group of stressors, which does not reflect the multiple stressors in the dynamic exposome we are facing. Similarly, current studies on environmental toxicology mostly target humans, animals, or the environment separately, which is inadequate to solve the grand challenge of multiple receptors in One Health. Though chemical, biological, and physical stressors all pose health threats, the susceptibilities of different organisms are different. As such, significant relationships and interactions of the chemical, biological, and physical stressors in the environment and their holistic environmental and biological consequences remain unclear. Fortunately, the rapid developments in various techniques, as well as the concepts of multistressors in the exposome and multireceptor in One Health, make it possible to understand our environment better. Since the combined stressor is location-specific and mixture toxicity is species-specific, more comprehensive frameworks to guide risk assessment and environmental treatment are urgently needed. Here, three conceptual frameworks to categorize unknown stressors, spatially visualize the riskiest stressors, and investigate the combined effects of multiple stressors across multiple species within the concepts of the exposome and One Health are proposed for the first time.
Affiliation(s)
- Peng Gao
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94304, United States
18
Bartoszewicz JM, Seidel A, Renard BY. Interpretable detection of novel human viruses from genome sequencing data. NAR Genom Bioinform 2021; 3:lqab004. [PMID: 33554119 PMCID: PMC7849996 DOI: 10.1093/nargab/lqab004] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/12/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 01/21/2023]
Abstract
Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
Affiliation(s)
- Jakub M Bartoszewicz
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
  - Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
- Anja Seidel
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
  - Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
19
O'Brien JT, Nelson C. Assessing the Risks Posed by the Convergence of Artificial Intelligence and Biotechnology. Health Secur 2020; 18:219-227. [PMID: 32559154 PMCID: PMC7310294 DOI: 10.1089/hs.2019.0122] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/14/2019] [Revised: 03/04/2020] [Accepted: 04/29/2020] [Indexed: 12/22/2022]
Abstract
Rapid developments are currently taking place in the fields of artificial intelligence (AI) and biotechnology, and applications arising from the convergence of these 2 fields are likely to offer immense opportunities that could greatly benefit human health and biosecurity. The combination of AI and biotechnology could potentially lead to breakthroughs in precision medicine, improved biosurveillance, and discovery of novel medical countermeasures as well as facilitate a more effective public health emergency response. However, as is the case with many preceding transformative technologies, new opportunities often present new risks in parallel. Understanding the current and emerging risks at the intersection of AI and biotechnology is crucial for health security specialists and unlikely to be achieved by examining either field in isolation. Uncertainties multiply as technologies merge, showcasing the need to identify robust assessment frameworks that could adequately analyze the risk landscape emerging at the convergence of these 2 domains. This paper explores the criteria needed to assess risks associated with artificial intelligence and biotechnology and evaluates 3 previously published risk assessment frameworks. After highlighting their strengths and limitations and applying them to relevant artificial intelligence and biotechnology examples, the authors suggest a hybrid framework with recommendations for future approaches to risk assessment for convergent technologies.
Affiliation(s)
- John T. O'Brien
  - John T. O'Brien, MS, is a Research Associate, Bipartisan Commission on Biodefense, Washington, DC
- Cassidy Nelson
  - Cassidy Nelson, MBBS, MPH, is a Research Scholar, Future of Humanity Institute, University of Oxford, Oxford, UK
20
Chen J, Karanth S, Pradhan AK. Quantitative microbial risk assessment for Salmonella: Inclusion of whole genome sequencing and genomic epidemiological studies, and advances in the bioinformatics pipeline. Journal of Agriculture and Food Research 2020; 2:100045. [DOI: 10.1016/j.jafr.2020.100045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/05/2025]
21
Koo PK, Ploenzke M. Deep learning for inferring transcription factor binding sites. Current Opinion in Systems Biology 2020; 19:16-23. [PMID: 32905524 PMCID: PMC7469942 DOI: 10.1016/j.coisb.2020.04.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 01/28/2023]
Abstract
Deep learning is a powerful tool for predicting transcription factor binding sites from DNA sequence. Despite the high predictive accuracy of such models, there is no guarantee that a high-performing deep learning model will learn causal sequence-function relationships. Thus, a move beyond performance comparisons on benchmark datasets is needed. Interpreting model predictions is a powerful approach to identify which features drive performance gains and, ideally, to provide insight into the underlying biological mechanisms. Here we highlight timely advances in deep learning for genomics, with a focus on inferring transcription factor binding sites. We describe recent applications, model architectures, and advances in local and global model interpretability methods, then conclude with a discussion on future research directions.
Affiliation(s)
- Peter K Koo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Matt Ploenzke
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA