1
De Niz M, Pereira SS, Kirchenbuechler D, Lemgruber L, Arvanitis C. Artificial intelligence-powered microscopy: Transforming the landscape of parasitology. J Microsc 2025. [PMID: 40492595 DOI: 10.1111/jmi.13433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2025] [Revised: 05/16/2025] [Accepted: 05/19/2025] [Indexed: 06/12/2025]
Abstract
Microscopy and image analysis play a vital role in parasitology research; they are critical for identifying parasitic organisms and elucidating their complex life cycles. Despite major advancements in imaging and analysis, several challenges remain. These include the integration of interdisciplinary data, information derived from various model organisms, and data acquired from clinical research. In our view, artificial intelligence, with the latest advances in machine and deep learning, holds enormous potential to address many of these challenges. This review addresses how artificial intelligence, machine learning and deep learning have been used in the field of parasitology, focusing mainly on the Apicomplexan, Diplomonad, and Kinetoplastid groups. We explore how gaps in our understanding could be filled by AI in future parasitology research and diagnosis in the field. Moreover, we address the challenges and limitations currently faced in implementing and expanding the use of artificial intelligence across biomedical fields. The necessary increased collaboration between biologists and computational scientists will facilitate understanding, development, and implementation of the latest advances for both scientific discovery and clinical impact. Current and future AI tools hold the potential to revolutionise parasitology and expand One Health principles.
Affiliation(s)
- Mariana De Niz
  - Center for Advanced Microscopy and Nikon Imaging Center, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
  - Department of Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Sara Silva Pereira
  - Católica Biomedical Research Centre, Católica Medical School, Universidade Católica Portuguesa, Lisbon, Portugal
- David Kirchenbuechler
  - Center for Advanced Microscopy and Nikon Imaging Center, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
  - Department of Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Leandro Lemgruber
  - Cellular Analysis Facility, MVLS Shared Research Facilities, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Constadina Arvanitis
  - Center for Advanced Microscopy and Nikon Imaging Center, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
  - Department of Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
2
Elmanzalawi M, Fujisawa T, Mori H, Nakamura Y, Tanizawa Y. DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes. BMC Bioinformatics 2025; 26:3. [PMID: 39773409 PMCID: PMC11705978 DOI: 10.1186/s12859-024-06030-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Accepted: 12/27/2024] [Indexed: 01/11/2025] Open
Abstract
BACKGROUND: Accurate taxonomic classification in genome databases is essential for reliable biological research and effective data sharing. Mislabeling or inaccuracies in genome annotations can lead to incorrect scientific conclusions and hinder the reproducibility of research findings. Despite advances in genome analysis techniques, challenges persist in ensuring precise and reliable taxonomic assignments. Existing tools for genome verification often involve extensive computational resources or lengthy processing times, which can limit their accessibility and scalability for large-scale projects. There is a need for more efficient, user-friendly solutions that can handle diverse datasets and provide accurate results with minimal computational demands. This work aimed to address these challenges by introducing a novel tool that enhances taxonomic accuracy, offers a user-friendly interface, and supports large-scale analyses.
RESULTS: We introduce a novel tool for the quality control and taxonomic classification of prokaryotic genomes, called DFAST_QC, which is available as both a command-line tool and a web service. DFAST_QC can quickly identify species based on NCBI and GTDB taxonomies by combining genome-distance calculations using MASH with ANI calculations using Skani. We evaluated DFAST_QC's performance in species identification and found it to be highly consistent with existing taxonomic standards, successfully identifying species across diverse datasets. In several cases, DFAST_QC identified potential mislabeling of species names in public databases and highlighted discrepancies in current classifications, demonstrating its capability to uncover errors and enhance taxonomic accuracy. Additionally, the tool's efficient design allows it to operate smoothly on local machines with minimal computational requirements, making it a practical choice for large-scale genome projects.
CONCLUSIONS: DFAST_QC is a reliable and efficient tool for accurate taxonomic identification and genome quality control, well-suited for large-scale genomic studies. Its compatibility with limited-resource environments, combined with its user-friendly design, ensures seamless integration into existing workflows. DFAST_QC's ability to refine species assignments in public databases highlights its value as a complementary tool for maintaining and enhancing the accuracy of taxonomic data in genomic research. The web version is available at https://dfast.ddbj.nig.ac.jp/dqc/submit/, and the source code for local use can be found at https://github.com/nigyta/dfast_qc.
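The genome-distance step mentioned in the abstract can be illustrated with a small sketch. It is not DFAST_QC's code and does not call MASH or Skani. It only shows the published Mash distance formula, D = -(1/k) ln(2j / (1 + j)), applied to a Jaccard index j. As a simplifying assumption, j is computed here from exact k-mer sets rather than from MinHash sketches, and reverse complements are ignored. The function names are hypothetical.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq (exact set, no MinHash sketching)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two k-mer sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j: float, k: int) -> float:
    """Mash distance for Jaccard index j and k-mer size k.

    Returns 1.0 when nothing is shared (j == 0), where the formula diverges.
    """
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)
```

For example, `mash_distance(jaccard(kmer_set(s1, 21), kmer_set(s2, 21)), 21)` gives a rough distance between two sequences `s1` and `s2`. Real tools estimate j from fixed-size sketches so that whole genomes can be compared quickly.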
Affiliation(s)
- Mohamed Elmanzalawi
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima, 411-8540, Japan
- Takatomo Fujisawa
  - Department of Informatics, National Institute of Genetics, Mishima, 411-8540, Japan
- Hiroshi Mori
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima, 411-8540, Japan
  - Department of Informatics, National Institute of Genetics, Mishima, 411-8540, Japan
- Yasukazu Nakamura
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima, 411-8540, Japan
  - Department of Informatics, National Institute of Genetics, Mishima, 411-8540, Japan
- Yasuhiro Tanizawa
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima, 411-8540, Japan
  - Department of Informatics, National Institute of Genetics, Mishima, 411-8540, Japan
3
Kim J, Kim E, Yang SM, Park SH, Kim HY. Direct On-Chip Diagnostics of Streptococcus bovis/Streptococcus equinus Complex in Bovine Mastitis Using Bioinformatics-Driven Portable qPCR. Biomolecules 2024; 14:1624. [PMID: 39766331 PMCID: PMC11726764 DOI: 10.3390/biom14121624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2024] [Revised: 12/14/2024] [Accepted: 12/17/2024] [Indexed: 01/15/2025] Open
Abstract
This study introduces an innovative on-site diagnostic method for rapidly detecting the Streptococcus bovis/Streptococcus equinus complex (SBSEC), crucial for livestock health and food safety. Through a comprehensive genomic analysis of 206 genomes, this study identified genetic markers that improved classification and addressed misclassifications, particularly in genomes labeled S. equinus and S. lutetiensis. These markers were integrated into a portable quantitative polymerase chain reaction (qPCR) that can detect SBSEC species with high sensitivity (down to 10^1 or 10^0 colony-forming units/mL). The portable system featuring a flat chip and compact equipment allows immediate diagnosis within 30 min. The diagnostic method was validated in field conditions directly from cattle udders, farm environments, and dairy products. Among the 100 samples, 51 tested positive for bacteria associated with mastitis. The performance of this portable qPCR was comparable to laboratory methods, offering a reliable alternative to whole-genome sequencing for early detection in clinical, agricultural, and environmental settings.
Affiliation(s)
- Jaewook Kim
  - Institute of Life Sciences & Resources, Department of Food Science and Biotechnology, Kyung Hee University, Yongin 17104, Republic of Korea
- Eiseul Kim
  - Institute of Life Sciences & Resources, Department of Food Science and Biotechnology, Kyung Hee University, Yongin 17104, Republic of Korea
- Seung-Min Yang
  - Institute of Life Sciences & Resources, Department of Food Science and Biotechnology, Kyung Hee University, Yongin 17104, Republic of Korea
- Si Hong Park
  - Institute of Life Sciences & Resources, Department of Food Science and Biotechnology, Kyung Hee University, Yongin 17104, Republic of Korea
  - Department of Food Science and Technology, Oregon State University, Corvallis, OR 97331, USA
- Hae-Yeong Kim
  - Institute of Life Sciences & Resources, Department of Food Science and Biotechnology, Kyung Hee University, Yongin 17104, Republic of Korea
4
Weber CC. Disentangling cobionts and contamination in long-read genomic data using sequence composition. G3 (Bethesda) 2024; 14:jkae187. [PMID: 39148415 PMCID: PMC11540323 DOI: 10.1093/g3journal/jkae187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites, and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualizing two-dimensional representations of read tetranucleotide composition learned by a variational autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualization tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
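As a rough illustration of the input features described above, the following sketch computes a tetranucleotide composition vector for a single read. It is an assumption-laden example, not the author's pipeline: the function name is hypothetical, reverse complements are not merged, and the variational autoencoder that learns the two-dimensional embedding is not shown.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides over A, C, G, T in lexicographic order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq: str) -> list[float]:
    """Return the relative frequency of each tetranucleotide in seq.

    Windows containing symbols other than A, C, G, T (for example N)
    are ignored. An all-zero vector is returned if no valid window exists.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]
```

Each read then becomes a 256-dimensional vector that sums to 1, a representation that a dimensionality-reduction model can take as input regardless of read length.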
Affiliation(s)
- Claudia C Weber
  - Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
5
Fenske L, Jelonek L, Goesmann A, Schwengers O. BakRep - a searchable large-scale web repository for bacterial genomes, characterizations and metadata. Microb Genom 2024; 10:001305. [PMID: 39475723 PMCID: PMC11524574 DOI: 10.1099/mgen.0.001305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Accepted: 09/19/2024] [Indexed: 11/02/2024] Open
Abstract
Bacteria are fascinating research objects in many disciplines for countless reasons, and whole-genome sequencing (WGS) has become the paramount methodology to advance our microbiological understanding. Meanwhile, access to cost-effective sequencing platforms has accelerated bacterial WGS to unprecedented levels, introducing new challenges in terms of data accessibility, computational demands, heterogeneity of analysis workflows and, thus, ultimately its scientific usability. To this end, a previous study released a uniformly processed set of 661 405 bacterial genome assemblies obtained from the European Nucleotide Archive as of November 2018. Building on these accomplishments, we conducted further genome-based analyses like taxonomic classification, multilocus sequence typing and annotation of all genomes. Here, we present BakRep, a searchable large-scale web repository of these genomes enriched with consistent genome characterizations and original metadata. The platform provides a flexible search engine combining taxonomic, genomic and metadata information, as well as interactive elements to visualize genomic features. Furthermore, all results can be downloaded for offline analyses via an accompanying command line tool. The web repository is accessible via https://bakrep.computational.bio.
Affiliation(s)
- Linda Fenske
  - Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany
- Lukas Jelonek
  - Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany
- Alexander Goesmann
  - Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany
- Oliver Schwengers
  - Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany
6
Batovska J, Brohier ND, Mee PT, Constable FE, Rodoni BC, Lynch SE. The Australian Biosecurity Genomic Database: a new resource for high-throughput sequencing analysis based on the National Notifiable Disease List of Terrestrial Animals. Database (Oxford) 2024; 2024:baae084. [PMID: 39197058 PMCID: PMC11352597 DOI: 10.1093/database/baae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 06/21/2024] [Accepted: 08/07/2024] [Indexed: 08/30/2024]
Abstract
The Australian Biosecurity Genomic Database (ABGD) is a curated collection of reference viral genome sequences based on the Australian National Notifiable Disease List of Terrestrial Animals. It was created to facilitate the screening of high-throughput sequencing (HTS) data for the potential presence of viruses associated with notifiable disease. The database includes a single verified sequence (the exemplar species sequence, where relevant) for each of the 60 virus species across 21 viral families that are associated with or cause these notifiable diseases, as recognized by the World Organisation for Animal Health. The open-source ABGD on GitHub provides usage guidance documents and is intended to support building a culture in Australian HTS communities that promotes the use of quality-assured, standardized, and verified databases for Australia's national biosecurity interests. Future expansion of the database will include the addition of more strains or subtypes for highly variable viruses, viruses causing diseases of aquatic animals, and genomes of other types of pathogens associated with notifiable diseases, such as bacteria. Database URL: https://github.com/ausbiopathgenDB/AustralianBiosecurityGenomicDatabase.
Affiliation(s)
- Jana Batovska
  - Agriculture Victoria Research, AgriBio Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria 3083, Australia
- Natasha D Brohier
  - Agriculture Victoria Research, AgriBio Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria 3083, Australia
- Peter T Mee
  - Agriculture Victoria Research, AgriBio Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria 3083, Australia
  - School of Applied Systems Biology (SASB), La Trobe University, Bundoora, Melbourne, Victoria 3086, Australia
- Fiona E Constable
  - Agriculture Victoria Research, AgriBio Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria 3083, Australia
  - School of Applied Systems Biology (SASB), La Trobe University, Bundoora, Melbourne, Victoria 3086, Australia
- Brendan C Rodoni
  - Agriculture Victoria Research, AgriBio Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria 3083, Australia
  - School of Applied Systems Biology (SASB), La Trobe University, Bundoora, Melbourne, Victoria 3086, Australia
- Stacey E Lynch
  - Agriculture Victoria Research, AgriBio Centre for AgriBioscience, 5 Ring Road, Bundoora, Victoria 3083, Australia
7
Bolaji AJ, Duggan AT. In silico analyses identify sequence contamination thresholds for Nanopore-generated SARS-CoV-2 sequences. PLoS Comput Biol 2024; 20:e1011539. [PMID: 39159257 PMCID: PMC11398645 DOI: 10.1371/journal.pcbi.1011539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 09/13/2024] [Accepted: 07/14/2024] [Indexed: 08/21/2024] Open
Abstract
The SARS-CoV-2 pandemic has brought molecular biology and genomic sequencing into the public consciousness and lexicon. With an emphasis on rapid turnaround, genomic data informed both diagnostic and surveillance decisions for the current pandemic at a previously unheard-of scale. The surge in the submission of genomic data to publicly available databases proved essential as comparing different genome sequences offers a wealth of knowledge, including phylogenetic links, modes of transmission, rates of evolution, and the impact of mutations on infection and disease severity. However, the scale of the pandemic has meant that sequencing runs are rarely repeated due to limited sample material and/or the availability of sequencing resources, resulting in the upload of some imperfect runs to public repositories. As a result, it is crucial to investigate the data obtained from these imperfect runs to determine whether the results are reliable prior to depositing them in a public database. Numerous studies have identified a variety of sources of contamination in public next-generation sequencing (NGS) data as the number of NGS studies increases along with the diversity of sequencing technologies and procedures. For this study, we conducted an in silico experiment with known SARS-CoV-2 sequences produced from Oxford Nanopore Technologies sequencing to investigate the effect of contamination on lineage calls and single nucleotide variants (SNVs). A contamination threshold below which runs are expected to generate accurate lineage calls and maintain genome-relatedness and integrity was identified. Together, these findings provide a benchmark below which imperfect runs may be considered robust for reporting results to both stakeholders and public repositories and reduce the need for repeat or wasted runs.
Affiliation(s)
- Ayooluwa J Bolaji
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, Canada
  - Cadham Provincial Laboratory, Winnipeg, Canada
- Ana T Duggan
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, Canada
8
Hauptfeld E, Pappas N, van Iwaarden S, Snoek BL, Aldas-Vargas A, Dutilh BE, von Meijenfeldt FAB. Integrating taxonomic signals from MAGs and contigs improves read annotation and taxonomic profiling of metagenomes. Nat Commun 2024; 15:3373. [PMID: 38643272 PMCID: PMC11032395 DOI: 10.1038/s41467-024-47155-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 03/20/2024] [Indexed: 04/22/2024] Open
Abstract
Metagenomic analysis typically includes read-based taxonomic profiling, assembly, and binning of metagenome-assembled genomes (MAGs). Here we integrate these steps in Read Annotation Tool (RAT), which uses robust taxonomic signals from MAGs and contigs to enhance read annotation. RAT reconstructs taxonomic profiles with high precision and sensitivity, outperforming other state-of-the-art tools. In high-diversity groundwater samples, RAT annotates a large fraction of the metagenomic reads, calling novel taxa at the appropriate, sometimes high taxonomic ranks. Thus, RAT integrative profiling provides an accurate and comprehensive view of the microbiome from shotgun metagenomics data. The package of Contig Annotation Tool (CAT), Bin Annotation Tool (BAT), and RAT is available at https://github.com/MGXlab/CAT_pack (from CAT pack v6.0). The CAT pack now also supports Genome Taxonomy Database (GTDB) annotations.
Affiliation(s)
- Ernestina Hauptfeld
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Nikolaos Pappas
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Sandra van Iwaarden
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Basten L Snoek
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Andrea Aldas-Vargas
  - Environmental Technology, Wageningen University & Research, P.O. Box 17, 6700, EV Wageningen, The Netherlands
- Bas E Dutilh
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
  - Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Rosalind Franklin Strasse 1, 07743, Jena, Germany
- F A Bastiaan von Meijenfeldt
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
  - Department of Marine Microbiology and Biogeochemistry (MMB), NIOZ Royal Netherlands Institute for Sea Research, PO Box 59, 1790AB, Den Burg, The Netherlands
9
Paul B. Concatenated 16S rRNA sequence analysis improves bacterial taxonomy. F1000Res 2023; 11:1530. [PMID: 37767069 PMCID: PMC10521043 DOI: 10.12688/f1000research.128320.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/29/2023] [Indexed: 09/29/2023] Open
Abstract
Background: Microscopic, biochemical, molecular, and computer-based approaches are extensively used to identify and classify bacterial populations. Advances in DNA sequencing and bioinformatics workflows have facilitated sophisticated genome-based methods for microbial taxonomy, although sequencing of the 16S rRNA gene is widely employed to identify and classify bacterial communities as a cost-effective, single-gene approach. However, the accuracy of 16S rRNA sequence-based species identification is limited because of the occurrence of multiple copies of the 16S rRNA gene and higher sequence identity between closely related species. The availability of the genomes of several bacterial species provided an opportunity to develop comprehensive species-specific 16S rRNA reference libraries. Methods: Sequences of the 16S rRNA genes were retrieved from the whole genomes available in the Genome databases. With defined criteria, four 16S rRNA gene copy variants were concatenated to develop a species-specific reference library. The sequence similarity search was performed with a web-based BLAST program, and MEGA software was used to construct the phylogenetic tree. Results: Using this approach, species-specific 16S rRNA gene libraries were developed for four closely related Streptococcus species (S. gordonii, S. mitis, S. oralis, and S. pneumoniae). Sequence similarity and phylogenetic analysis using concatenated 16S rRNA copies yielded better resolution than single gene copy approaches. Conclusions: The approach is very effective in classifying genetically closely related bacterial species and may reduce misclassification of bacterial species and genome assemblies.
Affiliation(s)
- Bobby Paul
  - Department of Bioinformatics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
10
Of data and transparency. Nat Comput Sci 2023; 3:571. [PMID: 38177745 DOI: 10.1038/s43588-023-00499-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
11
Liu Y, Li X, Lin L. Transcriptome of the pygmy grasshopper Formosatettix qinlingensis (Orthoptera: Tetrigidae). PeerJ 2023; 11:e15123. [PMID: 37016680 PMCID: PMC10066883 DOI: 10.7717/peerj.15123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/27/2022] [Accepted: 03/03/2023] [Indexed: 04/03/2023] Open
Abstract
Formosatettix qinlingensis (Zheng, 1982) is a tiny grasshopper endemic to Qinling in China. To study its transcriptomic features, we obtained RNA-Seq data using the Illumina HiSeq X Ten sequencing platform. Transcriptomic analysis showed that the read numbers of the two female samples and one male sample were 25,043,314, 24,429,905, and 25,034,457, respectively. We assembled 65,977 unigenes with an average length of 1,072.09 bp and an N50 length of 2,031 bp. The average lengths of F. qinlingensis female and male unigenes were 911.30 bp and 941.82 bp, and the N50 lengths were 1,745 bp and 1,735 bp, respectively. Eight databases were used to annotate the functions of unigenes, and 23,268 functional unigenes were obtained. We also studied the body color, immunity and insecticide resistance of F. qinlingensis. Thirty-nine pigment-related genes were annotated. Some immunity genes and signaling pathways were found, such as the JAK-STAT and Toll-like receptor signaling pathways. There are also some insecticide resistance genes and signaling pathways, such as nAChR, GST and DDT. Further, some of these genes, including pigment, immunity and insecticide resistance genes, were differentially expressed between female and male samples. The transcriptomic study of F. qinlingensis will provide reference data for gene prediction and molecular expression studies of other Tetrigidae species in the future. Differential genetic screening of males and females provides a basis for studying sex and immune balance in insects.
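The N50 values quoted in the abstract follow a standard assembly statistic: the length L such that sequences of length at least L together cover at least half of the total assembled bases. The sketch below shows that conventional definition. It is assumed, not confirmed by the source, that the study's assembler reports N50 in this way, and the function name is hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest, and the length at which
    the running total first reaches half of the summed length is returned.
    """
    if not lengths:
        raise ValueError("n50 requires at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety
```

An N50 well above the mean length, as reported here, indicates that a minority of long unigenes accounts for much of the assembled sequence.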
Affiliation(s)
- Yuxin Liu
  - Shaanxi Normal University, Xi’an, China
12
Vuong P, Wise MJ, Whiteley AS, Kaur P. Ten simple rules for investigating (meta)genomic data from environmental ecosystems. PLoS Comput Biol 2022; 18:e1010675. [PMID: 36480496 PMCID: PMC9731419 DOI: 10.1371/journal.pcbi.1010675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022] Open
Affiliation(s)
- Paton Vuong
  - UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
- Michael J. Wise
  - School of Physics, Mathematics and Computing, University of Western Australia, Perth, Australia
  - The Marshall Centre of Infectious Diseases, School of Biological Sciences, The University of Western Australia, Perth, Australia
- Andrew S. Whiteley
  - Centre for Environment & Life Sciences, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Floreat, Australia
- Parwinder Kaur
  - UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
13
Goudey B, Geard N, Verspoor K, Zobel J. Propagation, detection and correction of errors using the sequence database network. Brief Bioinform 2022; 23:6764545. [PMID: 36266246 PMCID: PMC9677457 DOI: 10.1093/bib/bbac416] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/10/2022] [Revised: 07/31/2022] [Accepted: 08/28/2022] [Indexed: 12/14/2022] Open
Abstract
Nucleotide and protein sequences stored in public databases are the cornerstone of many bioinformatics analyses. The records containing these sequences are prone to a wide range of errors, including incorrect functional annotation, sequence contamination and taxonomic misclassification. One source of information that can help to detect errors is the strong interdependency between records. Novel sequences in one database draw their annotations from existing records, may generate new records in multiple other locations and will have varying degrees of similarity with existing records across a range of attributes. A network perspective of these relationships between sequence records, within and across databases, offers new opportunities to detect, or even correct, erroneous entries and more broadly to make inferences about record quality. Here, we describe this novel perspective of sequence database records as a rich network, which we call the sequence database network, and illustrate the opportunities this perspective offers for quantification of database quality and detection of spurious entries. We provide an overview of the relevant databases and describe how the interdependencies between sequence records across these databases can be exploited by network analyses. We review the process of sequence annotation and provide a classification of sources of error, highlighting propagation as a major source. We illustrate the value of a network perspective through three case studies that use network analysis to detect errors, and explore the quality and quantity of critical relationships that would inform such network analyses. This systematic description of a network perspective of sequence database records provides a novel direction to combat the proliferation of errors within these critical bioinformatics resources.
Affiliation(s)
- Benjamin Goudey
- Corresponding author. Benjamin Goudey, School of Computing and Information Systems, University of Melbourne Parkville, Victoria, 3010,
| | - Nicholas Geard
- School of Computing and Information Systems, University of Melbourne Parkville, Victoria, 3010
| | - Karin Verspoor
- School of Computing Technologies, RMIT University Melbourne, Victoria, 3000
| | - Justin Zobel
- School of Computing and Information Systems, University of Melbourne Parkville, Victoria, 3010
|
14
|
Nassar M, Rogers AB, Talo' F, Sanchez S, Shafique Z, Finn RD, McEntyre J. A machine learning framework for discovery and enrichment of metagenomics metadata from open access publications. Gigascience 2022; 11:giac077. [PMID: 35950838 PMCID: PMC9366992 DOI: 10.1093/gigascience/giac077] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/16/2022] [Revised: 06/13/2022] [Accepted: 07/12/2022] [Indexed: 11/17/2022] Open
Abstract
Metagenomics is a culture-independent method for studying the microbes inhabiting a particular environment. Comparing the composition of samples (functionally/taxonomically), either from a longitudinal study or cross-sectional studies, can provide clues into how the microbiota has adapted to the environment. However, a recurring challenge, especially when comparing results between independent studies, is that key metadata about the sample and molecular methods used to extract and sequence the genetic material are often missing from sequence records, making it difficult to account for confounding factors. Nevertheless, these missing metadata may be found in the narrative of publications describing the research. Here, we describe a machine learning framework that automatically extracts essential metadata for a wide range of metagenomics studies from the literature contained in Europe PMC. This framework has enabled the extraction of metadata from 114,099 publications in Europe PMC, including 19,900 publications describing metagenomics studies in the European Nucleotide Archive (ENA) and MGnify. Using this framework, a new metagenomics annotations pipeline was developed and integrated into Europe PMC to regularly enrich up-to-date ENA and MGnify metagenomics studies with metadata extracted from research articles. These metadata are now available for researchers to explore and retrieve on the MGnify and Europe PMC websites, as well as through the Europe PMC annotations API.
Affiliation(s)
- Maaly Nassar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Current affiliation: SciBite - an Elsevier Company, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
| | - Alexander B Rogers
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Francesco Talo'
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Santiago Sanchez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zunaira Shafique
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johanna McEntyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
15
|
Lelwala RV, LeBlanc Z, Gauthier MEA, Elliott CE, Constable FE, Murphy G, Tyle C, Dinsdale A, Whattam M, Pattemore J, Barrero RA. Implementation of GA-VirReport, a Web-Based Bioinformatics Toolkit for Post-Entry Quarantine Screening of Virus and Viroids in Plants. Viruses 2022; 14:v14071480. [PMID: 35891459 PMCID: PMC9317486 DOI: 10.3390/v14071480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 06/29/2022] [Accepted: 06/29/2022] [Indexed: 02/01/2023] Open
Abstract
High-throughput sequencing (HTS) of host plant small RNA (sRNA) is a popular approach for plant virus and viroid detection. The major bottlenecks for implementing this approach in routine virus screening of plants in quarantine include lack of computational resources and/or expertise in command-line environments and limited availability of curated plant virus and viroid databases. We developed: (1) virus and viroid report web-based bioinformatics workflows on Galaxy Australia called GA-VirReport and GA-VirReport-Stats for detecting viruses and viroids from host plant sRNA extracts and (2) a curated higher plant virus and viroid database (PVirDB). We implemented sRNA sequencing with unique dual indexing on a set of plants with known viruses. Sequencing data were analyzed using GA-VirReport and PVirDB to validate these resources. We detected all known viruses in this pilot study with no cross-sample contamination. We then conducted a large-scale diagnosis of 105 imported plants processed at the post-entry quarantine facility (PEQ), Australia. We detected various pathogens in 14 imported plants and discovered that de novo assembly using 21–22 nt sRNA fraction and the megablast algorithm yielded better sensitivity and specificity. This study reports the successful, large-scale implementation of HTS and a user-friendly bioinformatics workflow for virus and viroid screening of imported plants at the PEQ.
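The 21–22 nt size selection mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the GA-VirReport workflow itself; the function names `parse_fastq` and `filter_by_length` are hypothetical, and the sketch assumes well-formed four-line FASTQ records held in memory or read from an open file.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one read; a hypothetical minimal record type
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed four-line FASTQ text (assumption)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 21, max_len: int = 22
) -> Iterator[FastqRecord]:
    """Keep only reads whose length lies in [min_len, max_len]."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual
```

Reads passing this filter would then go to whatever assembler the workflow uses; the default bounds simply mirror the 21–22 nt fraction named in the abstract.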
Affiliation(s)
- Ruvini V. Lelwala
- eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
- Science and Surveillance Group, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (C.E.E.); (J.P.)
| | - Zacharie LeBlanc
- eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
| | - Marie-Emilie A. Gauthier
- eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
| | - Candace E. Elliott
- Science and Surveillance Group, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (C.E.E.); (J.P.)
| | - Fiona E. Constable
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia;
| | - Greg Murphy
- Technology Infrastructure Branch, Information Services Division, Department of Agriculture, Fisheries and Forestry, Canberra, ACT 2601, Australia; (G.M.); (C.T.)
| | - Callum Tyle
- Technology Infrastructure Branch, Information Services Division, Department of Agriculture, Fisheries and Forestry, Canberra, ACT 2601, Australia; (G.M.); (C.T.)
| | - Adrian Dinsdale
- Plant Innovation Centre, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (A.D.); (M.W.)
| | - Mark Whattam
- Plant Innovation Centre, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (A.D.); (M.W.)
| | - Julie Pattemore
- Science and Surveillance Group, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (C.E.E.); (J.P.)
| | - Roberto A. Barrero
- eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
- Correspondence:
|
16
|
Abbà S, Rossi M, Vallino M, Galetto L, Marzachì C, Turina M. Metatranscriptomic Assessment of the Microbial Community Associated With the Flavescence dorée Phytoplasma Insect Vector Scaphoideus titanus. Front Microbiol 2022; 13:866523. [PMID: 35516423 PMCID: PMC9063733 DOI: 10.3389/fmicb.2022.866523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2022] [Accepted: 03/14/2022] [Indexed: 11/13/2022] Open
Abstract
Phytoplasmas are insect-borne pathogenic bacteria that cause major economic losses to several crops worldwide. The dynamic microbial community associated with insect vectors influences several aspects of their biology, including their vector competence for pathogens. Unraveling the diversity of the microbiome of phytoplasma insect vectors is gaining increasing importance in the quest to develop novel microbe-based pest control strategies that can minimize the use of insecticides for better environmental quality. The leafhopper Scaphoideus titanus is the primary vector of the Flavescence dorée phytoplasma, a quarantine pest which is dramatically affecting the main grape-growing European countries. In this study, the RNA-Seq data, which were previously used for insect virus discovery, were further explored to assess the composition of the whole microbial community associated with insects caught in the wild in both its native (the United States) and invasive (Europe) areas. The first de novo assembly of the insect transcriptome was used to filter the host sequencing reads. The remaining ones were assembled into contigs and analyzed by blastx to provide the taxonomic identification of the microorganisms associated with S. titanus, including the non-bacterial components. By comparing the transcriptomic libraries, we could differentiate the stable and consistent associations from the more ephemeral and flexible ones. Two species appeared to be universal to the core microbiome of S. titanus: the obligate bacterial symbiont Candidatus Sulcia muelleri and an Ophiocordyceps-allied fungus distantly related to yeast-like symbionts described from other hemipterans. Bacteria of the genus Cardinium have been identified as another dominant member of the microbiome, but only in the European specimens. Although we are yet to witness how the interplay among the microorganisms influences the vector competence of S. titanus, this unbiased in silico characterization of its microbiome is paramount for identifying the naturally occurring targets for new biocontrol strategies to counteract Flavescence dorée spread in Europe.
|
17
|
Hubert CB, de Carvalho LPS. Metabolomic approaches for enzyme function and pathway discovery in bacteria. Methods Enzymol 2022; 665:29-47. [DOI: 10.1016/bs.mie.2021.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
18
|
The ATCC Genome Portal: Microbial Genome Reference Standards with Data Provenance. Microbiol Resour Announc 2021; 10:e0081821. [PMID: 34817215 PMCID: PMC8612085 DOI: 10.1128/mra.00818-21] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022] Open
Abstract
Lack of data provenance negatively impacts scientific reproducibility and the reliability of genomic data. The ATCC Genome Portal (https://genomes.atcc.org) addresses this by providing data provenance information for microbial whole-genome assemblies originating from authenticated biological materials. To date, we have sequenced 1,579 complete genomes, including 466 type strains and 1,156 novel genomes.
|
19
|
Morrissey KL, Iveša L, Delva S, D'Hondt S, Willems A, De Clerck O. Impacts of environmental stress on resistance and resilience of algal-associated bacterial communities. Ecol Evol 2021; 11:15004-15019. [PMID: 34765156 PMCID: PMC8571626 DOI: 10.1002/ece3.8184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2021] [Revised: 08/10/2021] [Accepted: 08/11/2021] [Indexed: 11/18/2022] Open
Abstract
Algal-associated bacteria are fundamental to the ecological success of marine green macroalgae such as Caulerpa. The resistance and resilience of algal-associated microbiota to environmental stress can promote algal health and genetic adaptation to changing environments. The composition of bacterial communities has been shown to be unique to algal morphological niches. Therefore, the level of response to various environmental perturbations may in fact be different for each niche-specific community. Factorial in situ experiments were set up to investigate the effect of nutrient enrichment and temperature stress on the bacterial communities associated with Caulerpa cylindracea. Bacteria were characterized using the 16S rRNA gene, and the community compositions were compared between different parts of the algal thallus (endo-, epi-, and rhizomicrobiome). Resistance and resilience were calculated to further understand the changes of microbial composition in response to perturbations. The results of this study provide evidence that nutrient enrichment has a significant influence on the taxonomic and functional structure of the epimicrobiota, with a low community resistance index observed for both. Temperature and nutrient stress had a significant effect on the rhizomicrobiota taxonomic composition, exhibiting the lowest overall resistance to change. The functional performance of the rhizomicrobiota had low resilience to the combination of stressors, indicating potential additive effects. Interestingly, the endomicrobiota had the highest overall resistance, yet the lowest overall resilience to environmental stress. This further contributes to our understanding of algal microbiome dynamics in response to environmental changes.
Affiliation(s)
| | - Ljiljana Iveša
- Center for Marine Research, Ruđer Bošković Institute, Rovinj, Croatia
| | - Soria Delva
- Phycology Research Group, Department of Biology, Ghent University, Ghent, Belgium
| | - Sofie D'Hondt
- Phycology Research Group, Department of Biology, Ghent University, Ghent, Belgium
| | - Anne Willems
- Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Olivier De Clerck
- Phycology Research Group, Department of Biology, Ghent University, Ghent, Belgium
|
20
|
Kapili BJ, Dekas AE. PPIT: an R package for inferring microbial taxonomy from nifH sequences. Bioinformatics 2021; 37:2289-2298. [PMID: 33580675 DOI: 10.1093/bioinformatics/btab100] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/15/2020] [Revised: 07/22/2020] [Accepted: 02/11/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale. RESULTS: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood are: (1) taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes. AVAILABILITY: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
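The two-condition rule described in the abstract can be illustrated with a minimal Python sketch. It is not PPIT's code (PPIT is an R package) and does not reproduce its actual neighborhood search or thresholds. The `Neighbor` structure, the function name `infer_taxon`, and the 90% identity cutoff are hypothetical, and the sketch assumes that "taxonomically consistent" means every reference in the neighborhood carries the same label and that every reference must meet the identity cutoff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    """A reference sequence near the query on the tree (hypothetical structure)."""
    taxon: str               # taxonomic label of the reference at the rank of interest
    percent_identity: float  # pairwise identity to the query, on a 0-100 scale


def infer_taxon(neighbors: List[Neighbor], min_identity: float = 90.0) -> Optional[str]:
    """Return a taxon only when both conditions hold; otherwise make no inference."""
    if not neighbors:
        return None
    # Condition 1 (assumed reading): all neighbors share a single taxonomic label.
    taxa = {n.taxon for n in neighbors}
    if len(taxa) != 1:
        return None
    # Condition 2 (assumed reading): every neighbor is similar enough to the query.
    # The 90% default is an illustrative placeholder, not a PPIT parameter.
    if min(n.percent_identity for n in neighbors) < min_identity:
        return None
    return taxa.pop()
```

Returning `None` rather than a best guess mirrors the trade-off the abstract reports: fewer total inferences in exchange for a higher proportion of correct ones.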
Affiliation(s)
- Bennett J Kapili
- Department of Earth System Science, Stanford University, Stanford, CA, 94305, USA
| | - Anne E Dekas
- Department of Earth System Science, Stanford University, Stanford, CA, 94305, USA
|