1. Wijaya AJ, Anžel A, Richard H, Hattab G. Current state and future prospects of Horizontal Gene Transfer detection. NAR Genom Bioinform 2025; 7:lqaf005. [PMID: 39935761 PMCID: PMC11811736 DOI: 10.1093/nargab/lqaf005] [Received: 10/11/2024] [Revised: 12/26/2024] [Accepted: 02/04/2025] [Indexed: 02/13/2025] Open access
Abstract
Artificial intelligence (AI) has been shown to be beneficial in a wide range of bioinformatics applications. Horizontal Gene Transfer (HGT) is a driving force of evolutionary changes in prokaryotes. It is widely recognized that it contributes to the emergence of antimicrobial resistance (AMR), which poses a particularly serious threat to public health. Many computational approaches have been developed to study and detect HGT. However, the application of AI in this field has not been investigated. In this work, we conducted a review to provide information on the current trend of existing computational approaches for detecting HGT and to decipher the use of AI in this field. Here, we show a growing interest in HGT detection, characterized by a surge in the number of computational approaches, including AI-based approaches, in recent years. We organize existing computational approaches into a hierarchical structure of computational groups based on their computational methods and show how each computational group evolved. We make recommendations and discuss the challenges of HGT detection in general and the adoption of AI in particular. Moreover, we provide future directions for the field of HGT detection.
Affiliation(s)
- Andre Jatmiko Wijaya
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität, Arnimallee 14, 14195 Berlin, Germany
- Genome Competence Center (MF1), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Aleksandar Anžel
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Hugues Richard
- Genome Competence Center (MF1), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Georges Hattab
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität, Arnimallee 14, 14195 Berlin, Germany
2. Chakraborty J, Roy RP, Chatterjee R, Chaudhuri P. Performance assessment of genomic island prediction tools with an improved version of Design-Island. Comput Biol Chem 2022; 98:107698. [PMID: 35597186 DOI: 10.1016/j.compbiolchem.2022.107698] [Received: 09/09/2021] [Revised: 04/01/2022] [Accepted: 05/11/2022] [Indexed: 11/03/2022]
Abstract
Genomic Islands (GIs) play an important role in the evolution and adaptation of prokaryotes. The origin and extent of ecological diversity of prokaryotes can be analyzed by comparing GIs across closely or distantly related prokaryotes. To understand the importance of GIs and to study bacterial evolution, several GI prediction tools have been developed. An unsupervised method, Design-Island, was developed to identify GIs using a Monte Carlo statistical test on randomly selected segments of a chromosome. In the present study, Design-Island was modified by incorporating majority voting and multiple hypothesis testing correction. The performance of the modified version, Design-Island-II, was tested and compared with existing GI prediction tools. Performance assessment and benchmarking of GI prediction tools require an experimentally validated dataset, which is lacking. Therefore, different datasets, either generated or taken from the literature, were used to compare the sensitivity (SN), specificity (SP), precision (PPV) and accuracy (AC) of Design-Island-II. It showed substantial improvement in terms of SN, SP, PPV and AC, and significantly reduced the computation time of the algorithm. The performance of Design-Island-II was also compared with several GI prediction tools using a curated dataset of putative horizontally transferred genes. Design-Island-II showed the highest sensitivity and F1 score, and comparable specificity, precision and accuracy, relative to the other available methods. IslandViewer4 and Islander outperformed all the available methods in terms of AC and PPV, respectively. Our study suggests that Design-Island-II, IslandViewer4 and GIHunter are among the top-performing GI prediction tools when both the sensitivity and the specificity of the methods are considered.
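As the abstract describes it, Design-Island judges a region with a Monte Carlo statistical test against randomly selected segments of the same chromosome. The Python sketch below illustrates only that general idea, under simplifying assumptions that are not taken from the paper: GC content is the single compositional statistic, the null distribution is built from same-length segments drawn at random positions, and `gc_fraction` and `monte_carlo_pvalue` are hypothetical helpers. Design-Island-II's actual statistics, majority voting and multiple-testing correction are not reproduced.

```python
import random


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def monte_carlo_pvalue(genome: str, start: int, end: int,
                       n_draws: int = 999, seed: int = 0) -> float:
    """Empirical p-value for how atypical the GC content of genome[start:end] is.

    Null model: segments of the same length taken at random positions on the
    same chromosome. Atypicality is the absolute distance from genome-wide GC.
    """
    if not 0 <= start < end <= len(genome):
        raise ValueError("segment must lie inside the genome")
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    length = end - start
    background = gc_fraction(genome)
    observed = abs(gc_fraction(genome[start:end]) - background)
    extreme = 0
    for _ in range(n_draws):
        s = rng.randrange(0, len(genome) - length + 1)
        if abs(gc_fraction(genome[s:s + length]) - background) >= observed:
            extreme += 1
    # Add-one estimator: the observed segment counts as one draw, so p is never 0
    return (extreme + 1) / (n_draws + 1)


backbone = "AATTGC" * 50                   # toy AT-rich host sequence
genome = backbone + "GC" * 50 + backbone   # GC-only insert at positions 300-400
print(monte_carlo_pvalue(genome, 300, 400))  # small: few random segments are as GC-rich
```

A small p-value means few random segments lie as far from the genome-wide GC content as the tested one; when many segments are scanned, such p-values still need a multiple-testing correction of the kind the abstract mentions.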
Affiliation(s)
- Joyeeta Chakraborty
- Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India.
- Rudra Prasad Roy
- Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India.
- Raghunath Chatterjee
- Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India.
- Probal Chaudhuri
- Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India.
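The benchmark measures named in the Design-Island-II abstract (SN, SP, PPV, AC and the F1 score) follow standard confusion-matrix definitions. A minimal Python sketch, assuming that predicted and reference GIs have already been reduced to true/false positive and negative counts (for example per gene or per nucleotide, a choice the abstract does not specify); the `Confusion` type, the `gi_metrics` function and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """True/false positive/negative counts from comparing predictions with a reference set."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty (e.g. no predictions)
    return num / den if den else 0.0


def gi_metrics(c: Confusion) -> dict[str, float]:
    sn = _ratio(c.tp, c.tp + c.fn)                       # sensitivity (recall)
    sp = _ratio(c.tn, c.tn + c.fp)                       # specificity
    ppv = _ratio(c.tp, c.tp + c.fp)                      # precision
    ac = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)  # accuracy
    f1 = _ratio(2 * ppv * sn, ppv + sn)                  # harmonic mean of PPV and SN
    return {"SN": sn, "SP": sp, "PPV": ppv, "AC": ac, "F1": f1}


# Hypothetical counts, for illustration only
print(gi_metrics(Confusion(tp=80, fp=20, tn=880, fn=20)))
```

Because GIs usually cover a small fraction of a genome, true negatives dominate the counts and accuracy can remain high even for a weak predictor; sensitivity, precision and F1 are less affected by this imbalance.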
3. Genomic Island Prediction via Chi-Square Test and Random Forest Algorithm. Comput Math Methods Med 2021; 2021:9969751. [PMID: 34122622 PMCID: PMC8169257 DOI: 10.1155/2021/9969751] [Received: 03/29/2021] [Accepted: 05/14/2021] [Indexed: 12/02/2022]
Abstract
Genomic islands are related to microbial adaptation and carry genomic characteristics different from those of the host. Therefore, many methods have been proposed to distinguish genomic islands from the rest of the genome by evaluating their sequence composition. Many sequence features have been proposed, but many of them have not yet been applied to the identification of genomic islands. In this paper, we present a scheme to predict genomic islands using the chi-square test and the random forest algorithm. We extract seven kinds of sequence features and select the important ones with the chi-square test. All the selected features are then input into a random forest to predict genomic islands. Three experiments and comparisons show that the proposed method achieves the best performance. These findings may be useful for designing more powerful methods for genomic island prediction.
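The chi-square feature-selection step described in this abstract can be sketched with the standard library alone. This is a minimal illustration, assuming each genomic window has already been converted into non-negative sequence features (for example k-mer counts) and labelled as GI or host; it follows the same observed-versus-expected formulation as scikit-learn's `chi2` scorer, but the paper's seven feature types and exact settings are not reproduced, and `chi2_scores`, `select_top_k` and the toy data are hypothetical. The selected columns would then be passed to a random-forest classifier (for example scikit-learn's `RandomForestClassifier`), which is not shown.

```python
from collections import defaultdict


def chi2_scores(X: list[list[float]], y: list[int]) -> list[float]:
    """Chi-square score of each non-negative feature column against class labels.

    For every feature, the per-class totals of the feature are compared with
    the totals expected if the feature were independent of the class.
    """
    n = len(X)
    classes = sorted(set(y))
    class_freq = {c: y.count(c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        observed = defaultdict(float)
        for row, label in zip(X, y):
            observed[label] += row[j]
        score = 0.0
        for c in classes:
            expected = class_freq[c] * total
            if expected > 0:
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X: list[list[float]], y: list[int], k: int):
    """Keep the k highest-scoring feature columns, preserving their original order."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Toy windows: columns are hypothetical k-mer counts; label 1 = GI, 0 = host
X = [[5, 1, 3], [6, 1, 3], [1, 1, 3], [0, 1, 3]]
y = [1, 1, 0, 0]
print(select_top_k(X, y, k=1))  # ([0], [[5], [6], [1], [0]])
```

Only the first column differs between the two classes, so it is the only one with a non-zero score and the only one kept.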
4. Genomic islands and the evolution of livestock-associated Staphylococcus aureus genomes. Biosci Rep 2021; 40:226941. [PMID: 33185245 PMCID: PMC7689654 DOI: 10.1042/bsr20202287] [Received: 07/21/2020] [Revised: 09/23/2020] [Accepted: 10/07/2020] [Indexed: 12/16/2022] Open access
Abstract
BACKGROUND Genomic Islands (GIs) are commonly believed to be relics of horizontal transfer and are associated with specific metabolic capacities, including the virulence of the strain. Horizontal gene transfer (HGT) plays a vital role in the acquisition of GIs and in the evolution and adaptation of bacterial genomes. OBJECTIVE The present study was designed to predict the GIs and the role of HGT in the evolution of livestock-associated Staphylococcus aureus (LA-SA). METHODS GIs were predicted with two methods, namely the Ensemble algorithm for Genomic Island Detection (EGID) tool and the SeqWord Sniffer script. Functional characterization of GI elements was performed by clustering of orthologs. Putative donors of the GIs were predicted with the aid of the pre_GI database. RESULTS The present study predicted a pan of 46 GIs across the LA-SA genomes. Functional characterization of GI sequences revealed a few unique results, such as the presence of metabolic operons like leuABCD and folPK genes in GIs, and showed the importance of GIs in adaptation to the host niche. Results from the developed GI donor-prediction framework revealed Rickettsia and Mycoplasma as the major donors of GI elements. CONCLUSIONS The present study indicates a role for GIs in the evolutionary race of LA-SA; niche adaptation of LA-SA was presumably enhanced by these GIs. Future studies could focus on the evolutionary relationships of Rickettsia and Mycoplasma sp. with S. aureus, and on the evolution of the leucine/isoleucine mosaic operon (leuABCD).
5. Bertelli C, Tilley KE, Brinkman FSL. Microbial genomic island discovery, visualization and analysis. Brief Bioinform 2020; 20:1685-1698. [PMID: 29868902 PMCID: PMC6917214 DOI: 10.1093/bib/bby042] [Received: 02/23/2018] [Revised: 04/30/2018] [Indexed: 12/27/2022] Open access
Abstract
Horizontal gene transfer (also called lateral gene transfer) is a major mechanism of microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc.). As large-scale analyses of microbial genomes increase, such as genomic epidemiology investigations of infectious disease outbreaks in public health, there is growing appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for the routine analysis of microbial genomes in this era of rapid whole-genome sequencing. We also assess 20 GI prediction software methods that use sequence-composition bias to identify GIs, using a reference GI dataset from 104 genomes obtained with an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
Affiliation(s)
- Claire Bertelli
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Keith E Tilley
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Fiona S L Brinkman
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
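Many of the composition-based predictors assessed in the Bertelli et al. review rest on the observation that horizontally acquired regions often differ from the host genome's overall nucleotide composition. The Python sketch below shows that idea in its simplest form, assuming GC content as the only signal, fixed-size overlapping windows and an illustrative z-score cutoff; `flag_gc_outliers` and its default parameters are hypothetical, and real tools combine richer signals (codon usage, k-mer frequencies, mobility genes) with more careful boundary handling.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flag_gc_outliers(genome: str, window: int = 5000, step: int = 2500,
                     z_cutoff: float = 2.0) -> list[tuple[int, int, float]]:
    """Return (start, end, z) for windows whose GC content is an outlier.

    Each window's GC fraction is converted to a z-score relative to the
    distribution of GC fractions over all windows of the same genome.
    Windows that would run past the end are not created, so a short tail
    of the genome may be skipped.
    """
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    windows = [(s, s + window, gc_fraction(genome[s:s + window])) for s in starts]
    values = [gc for _, _, gc in windows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # uniform composition: nothing stands out
    return [(s, e, (gc - mu) / sigma) for s, e, gc in windows
            if abs(gc - mu) / sigma >= z_cutoff]


backbone = "AATTGC" * 50                   # 300 bp toy host, GC fraction about 0.33
genome = backbone + "GC" * 50 + backbone   # 100 bp GC-only insert at 300-400
print(flag_gc_outliers(genome, window=100, step=100))  # flags only the window (300, 400)
```

Window size trades resolution for noise: small windows localize boundaries better but make the GC estimate of each window less stable.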
6. Mageeney CM, Lau BY, Wagner JM, Hudson CM, Schoeniger JS, Krishnakumar R, Williams KP. New candidates for regulated gene integrity revealed through precise mapping of integrative genetic elements. Nucleic Acids Res 2020; 48:4052-4065. [PMID: 32182341 PMCID: PMC7192596 DOI: 10.1093/nar/gkaa156] [Received: 01/27/2020] [Revised: 02/26/2020] [Accepted: 02/28/2020] [Indexed: 12/12/2022] Open access
Abstract
Integrative genetic elements (IGEs) are mobile multigene DNA units that integrate into and excise from host bacterial genomes. Each IGE usually targets a specific site within a conserved host gene, integrating in a manner that preserves target gene function. However, a small number of bacterial genes are known to be inactivated upon IGE integration and reactivated upon excision, regulating phenotypes of virulence, mutation rate, and terminal differentiation in multicellular bacteria. The list of regulated gene integrity (RGI) cases has been slow-growing because IGEs have been challenging to precisely and comprehensively locate in genomes. We present software (TIGER) that maps IGEs with unprecedented precision and without attB site bias. TIGER uses a comparative genomic, ping-pong BLAST approach, based on the principle that the IGE integration module (i.e. its int-attP region) is cohesive. The resultant IGEs from 2168 genomes, along with integrase phylogenetic analysis and gene inactivation tests, revealed 19 new cases of genes whose integrity is regulated by IGEs (including dut, eccCa1, gntT, hrpB, merA, ompN, prkA, tqsA, traG, yifB, yfaT and ynfE), as well as recovering previously known cases (in sigK, spsM, comK, mlrA and hlb genes). It also recovered known clades of site-promiscuous integrases and identified possible new ones.
Affiliation(s)
- Catherine M Mageeney
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Britney Y Lau
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Julian M Wagner
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Corey M Hudson
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Joseph S Schoeniger
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Raga Krishnakumar
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Kelly P Williams
- Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
7. 2SigFinder: the combined use of small-scale and large-scale statistical testing for genomic island detection from a single genome. BMC Bioinformatics 2020; 21:159. [PMID: 32349677 PMCID: PMC7191778 DOI: 10.1186/s12859-020-3501-2] [Received: 01/11/2020] [Accepted: 04/16/2020] [Indexed: 11/10/2022] Open access
Abstract
BACKGROUND Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host. Some methods perform an overall test to identify genomic islands based on their local features. However, regions of different scales display different genomic features. RESULTS We propose here a novel method, "2SigFinder", the first combined use of small-scale and large-scale statistical testing for genomic island detection. The proposed method was tested on genomic island boundary detection and on the identification of genomic islands or functional features in real biological data. We also compared the proposed method with comparative genomics and composition-based approaches. The results indicate that 2SigFinder is more efficient in identifying genomic islands. CONCLUSIONS On real biological data, 2SigFinder identified genomic islands from a single genome and reported robust results across different experiments, without annotated information of genomes or prior knowledge from other datasets. 2SigFinder identified 25 of 27 Pathogenicity, 1 of 1 tRNA, 2 of 2 Virulence and 2 of 2 Repeats features, and detected 101 of 130 Phage and 28 of 36 HEG features in S. enterica Typhi CT18, indicating that it is more efficient in detecting functional features associated with GIs.
8. Dai Q, Bao C, Hai Y, Ma S, Zhou T, Wang C, Wang Y, Huo W, Liu X, Yao Y, Xuan Z, Chen M, Zhang MQ. MTGIpick allows robust identification of genomic islands from a single genome. Brief Bioinform 2019; 19:361-373. [PMID: 28025178 DOI: 10.1093/bib/bbw118] [Indexed: 11/13/2022] Open access
Abstract
Genomic islands (GIs), which are associated with microbial adaptations and carry sequence patterns different from those of the host, are sporadically distributed among closely related species. This bias can dominate the signal of interest in GI detection. However, variations also exist among segments of the host genome, and there is no uniform standard for the best way of discriminating GIs from the rest of the genome in terms of compositional bias. In the present work, we propose a robust software tool, MTGIpick, which uses regions with pattern bias showing multiscale difference levels to identify GIs within the host genome. MTGIpick can identify GIs from a single genome without annotated information of genomes or prior knowledge from other data sets. On real biological data, MTGIpick performed better than existing methods and revealed potential GIs with accurate sizes that existing methods missed because they rely on a uniform standard. Software and supplementary materials are freely available at http://bioinfo.zstu.edu.cn/MTGI or https://github.com/bioinfo0706/MTGIpick.
Affiliation(s)
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China; Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Chaohui Bao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yabing Hai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Sheng Ma
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Tao Zhou
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Cong Wang
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yunfei Wang
- Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Wenwen Huo
- Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
- Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhenyu Xuan
- Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Min Chen
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Michael Q Zhang
- Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA; Division of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
9. Tao J, Liu X, Yang S, Bao C, He P, Dai Q. An efficient genomic signature ranking method for genomic island prediction from a single genome. J Theor Biol 2019; 467:142-149. [PMID: 30768974 DOI: 10.1016/j.jtbi.2019.02.008] [Received: 07/18/2018] [Revised: 02/07/2019] [Accepted: 02/11/2019] [Indexed: 01/13/2023]
Abstract
Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host; thus, many methods have been proposed to select informative genomic signatures from a range of organisms and to discriminate genomic islands from the rest of the genome in terms of these signature biases. However, these methods are of limited use when closely related genomes are unavailable. In the present work, we propose a kurtosis-based ranking method to select informative genomic signatures from a single genome. In simulations with alien fragments from artificial and real genomes, the proposed kurtosis-based ranking method efficiently selected informative genomic signatures from a single genome, without annotated information of genomes or prior knowledge from other datasets. These findings may be useful for designing more powerful methods for genomic island detection.
Affiliation(s)
- Jin Tao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Siqian Yang
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Chaohui Bao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China; Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75080, USA.
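The Tao et al. abstract does not spell out its ranking procedure, so the following is only an illustration of the general notion of ranking genomic signatures by kurtosis, under stated assumptions: each signature (for example a k-mer frequency) has already been measured in consecutive windows along a single genome, and signatures whose window-wise distribution is most heavy-tailed are ranked first, on the reasoning that a short alien fragment shows up as a few extreme windows. The functions `excess_kurtosis` and `rank_signatures` and the toy profiles are hypothetical.

```python
from statistics import mean


def excess_kurtosis(values: list[float]) -> float:
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution).

    Returns 0.0 for a constant profile, where the ratio is undefined.
    """
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m4 = mean((v - mu) ** 4 for v in values)
    return m4 / m2 ** 2 - 3 if m2 > 0 else 0.0


def rank_signatures(profiles: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank signatures by the kurtosis of their window-wise values, highest first.

    A heavy-tailed profile means a few windows deviate strongly from the rest,
    the pattern a short alien fragment would leave in that signature.
    """
    scored = [(name, excess_kurtosis(vals)) for name, vals in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-window frequencies of two signatures over ten windows
profiles = {
    "AA": [0.1, 0.3] * 5,        # fluctuates evenly: light-tailed
    "GC": [0.5] * 9 + [0.9],     # one extreme window: heavy-tailed
}
print(rank_signatures(profiles))  # "GC" ranks first
```

Excess kurtosis is used here so that a roughly normal profile scores near zero; whether the original method uses raw or excess kurtosis does not change the ranking.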
10. da Silva Filho AC, Raittz RT, Guizelini D, De Pierri CR, Augusto DW, Dos Santos-Weiss ICR, Marchaukoski JN. Comparative Analysis of Genomic Island Prediction Tools. Front Genet 2018; 9:619. [PMID: 30631340 PMCID: PMC6315130 DOI: 10.3389/fgene.2018.00619] [Received: 05/15/2018] [Accepted: 11/23/2018] [Indexed: 12/11/2022] Open access
Abstract
Tools for genomic island prediction use comparative genomics and sequence composition analysis strategies. The goal of comparative analysis is to identify unique regions in the genomes of related organisms, whereas sequence composition analysis evaluates and relates the composition of specific regions to that of other regions in the genome. The goal of this study was to qualitatively and quantitatively evaluate extant genomic island predictors. We chose tools reported to produce significant results using sequence composition prediction, comparative genomics, and hybrid genomics methods. To maintain diversity, the tools were applied to eight complete genomes of organisms with distinct characteristics and belonging to different families. Escherichia coli CFT073 was used as a control and considered the gold standard because its islands were previously curated in vitro. The results of predictions with the gold standard were manually curated, and the content and characteristics of each predicted island were analyzed. For the other organisms, we created GenBank (GBK) files using Artemis software for each predicted island. We copied only the amino acid sequences of the coding sequences and constructed a multi-FASTA file for each predictor. We used BLASTp to compare all results and generate hits to evaluate similarities and differences among the predictions. Comparison of the results with the gold standard revealed that GIPSy produced the best results, covering ~91% of the composition and regions of the islands, followed by Alien Hunter (81%), IslandViewer (47.8%), Predict Bias (31%), GI Hunter (17%), and Zisland Explorer (16%). The tools with the best results in the analyses of the set of organisms were the same ones that performed best in the tests with the gold standard.
Affiliation(s)
- Antonio Camilo da Silva Filho
- Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Roberto Tadeu Raittz
- Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Dieval Guizelini
- Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Diônata Willian Augusto
- Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Jeroniza Nunes Marchaukoski
- Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
11. Bush EC, Clark AE, DeRanek CA, Eng A, Forman J, Heath K, Lee AB, Stoebel DM, Wang Z, Wilber M, Wu H. xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria. BMC Bioinformatics 2018; 19:32. [PMID: 29402213 PMCID: PMC5799925 DOI: 10.1186/s12859-018-2038-0] [Received: 08/10/2017] [Accepted: 01/23/2018] [Indexed: 12/13/2022] Open access
Abstract
Background Genomic islands play an important role in microbial genome evolution, providing a mechanism for strains to adapt to new ecological conditions. A variety of computational methods, both genome-composition based and comparative, have been developed to identify them. Some of these methods are explicitly designed to work in single strains, while others make use of multiple strains. In general, existing methods do not identify islands in the context of the phylogeny in which they evolved. Even multiple strain approaches are best suited to identifying genomic islands that are present in one strain but absent in others. They do not automatically recognize islands which are shared between some strains in the clade or determine the branch on which these islands inserted within the phylogenetic tree. Results We have developed a software package, xenoGI, that identifies genomic islands and maps their origin within a clade of closely related bacteria, determining which branch they inserted on. It takes as input a set of sequenced genomes and a tree specifying their phylogenetic relationships. Making heavy use of synteny information, the package builds gene families in a species-tree-aware way, and then attempts to combine into islands those families whose members are adjacent and whose most recent common ancestor is shared. The package provides a variety of text-based analysis functions, as well as the ability to export genomic islands into formats suitable for viewing in a genome browser. We demonstrate the capabilities of the package with several examples from enteric bacteria, including an examination of the evolution of the acid fitness island in the genus Escherichia. In addition we use output from simulations and a set of known genomic islands from the literature to show that xenoGI can accurately identify genomic islands and place them on a phylogenetic tree. 
Conclusions xenoGI is an effective tool for studying the history of genomic island insertions in a clade of microbes. It identifies genomic islands, and determines which branch they inserted on within the phylogenetic tree for the clade. Such information is valuable because it helps us understand the adaptive path that has produced living species.
Affiliation(s)
- Eliot C Bush
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA.
- Anne E Clark
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, 98195-5065, WA, USA
- Carissa A DeRanek
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Alexander Eng
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, 98195-5065, WA, USA
- Juliet Forman
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Kevin Heath
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Department of Biology and Biotechnology, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, 01609, MA, USA
- Alexander B Lee
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Quantitative Biosciences Program, Georgia Institute of Technology, 837 State Street, Atlanta, 30332-0430, GA, USA
- Daniel M Stoebel
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Zunyan Wang
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Matthew Wilber
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Helen Wu
- Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
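The final step the xenoGI abstract describes, combining into islands those gene families whose members are adjacent and whose most recent common ancestor is shared, can be illustrated with a simple run-length grouping. This Python sketch assumes that each gene along one genome has already been assigned the tree branch on which its family was inserted (the hard, species-tree-aware part that xenoGI itself performs); `merge_into_islands`, the `skip` label for core genes and the example data are hypothetical and do not reflect xenoGI's own interface.

```python
from itertools import groupby


def merge_into_islands(genes: list[tuple[str, str]],
                       skip: str = "root") -> list[tuple[str, list[str]]]:
    """Group consecutive genes that share the same inferred insertion branch.

    `genes` is an ordered list of (gene_id, branch) pairs along one genome.
    Genes whose branch equals `skip` (e.g. families already present at the
    clade root) are treated as core genome and never form an island.
    """
    islands = []
    for branch, run in groupby(genes, key=lambda gene: gene[1]):
        if branch != skip:
            islands.append((branch, [gene_id for gene_id, _ in run]))
    return islands


genes = [("g1", "root"), ("g2", "n5"), ("g3", "n5"), ("g4", "root"), ("g5", "n2")]
print(merge_into_islands(genes))
# [('n5', ['g2', 'g3']), ('n2', ['g5'])]
```

Real data would also need tolerance for small gaps and for genes whose branch assignment is uncertain; this sketch merges only strictly consecutive genes.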
12. Acuña-Amador L, Primot A, Cadieu E, Roulet A, Barloy-Hubler F. Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains. BMC Genomics 2018; 19:54. [PMID: 29338683 PMCID: PMC5771137 DOI: 10.1186/s12864-017-4429-4] [Received: 09/26/2017] [Accepted: 12/29/2017] [Indexed: 12/15/2022] Open access
Abstract
BACKGROUND Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation. RESULTS We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains. CONCLUSIONS In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. 
We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.
Affiliation(s)
- Luis Acuña-Amador
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France; Laboratorio de Investigación en Bacteriología Anaerobia, Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Aline Primot
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Edouard Cadieu
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Alain Roulet
- GenoToul Genome & Transcriptome (GeT-PlaGe), INRA, US1426, Castanet-Tolosan, France
- Frédérique Barloy-Hubler
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
13
Insights into horizontal acquisition patterns of dormancy and reactivation regulon genes in mycobacterial species using a partitioning-based framework. J Biosci 2016; 41:475-85. [DOI: 10.1007/s12038-016-9622-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
14
Lu B, Leong HW. Computational methods for predicting genomic islands in microbial genomes. Comput Struct Biotechnol J 2016; 14:200-6. [PMID: 27293536 PMCID: PMC4887561 DOI: 10.1016/j.csbj.2016.05.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 01/31/2016] [Revised: 05/01/2016] [Accepted: 05/03/2016] [Indexed: 11/02/2022]
Abstract
Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and niche adaptation, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. We then group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages and limitations. Finally, we look into potential improvements for better GI prediction.
Affiliation(s)
- Bingxin Lu
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
- Hon Wai Leong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
15
Zhu Q, Kosoy M, Dittmar K. HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers. BMC Genomics 2014; 15:717. [PMID: 25159222 PMCID: PMC4155097 DOI: 10.1186/1471-2164-15-717] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 05/11/2014] [Accepted: 08/20/2014] [Indexed: 11/23/2022]
Abstract
Background First-pass methods based on BLAST matches are commonly used as an initial step to separate the different phylogenetic histories of genes in microbial genomes and to target putative horizontal gene transfer (HGT) events. This will remain necessary given the rapid growth of genomic data and the technical difficulties of conducting large-scale explicit phylogenetic analyses. However, these methods often produce misleading results because they cannot resolve indirect phylogenetic links and are vulnerable to stochastic events. Results A new computational method for rapid, exhaustive and genome-wide detection of HGT was developed, featuring the systematic analysis of BLAST hit distribution patterns in the context of a priori defined hierarchical evolutionary categories. Genes that fall beyond a series of statistically determined thresholds are identified as not adhering to the typical vertical history of the organisms in question, but instead as having a putative horizontal origin. Tests on simulated genomic data suggest that this approach effectively targets atypically distributed genes that are highly likely to be HGT-derived, and that it performs robustly compared with conventional BLAST-based approaches. The method was further tested on real genomic datasets, including Rickettsia genomes, and compared with previous studies; the results are consistent with currently employed categories of HGT prediction methods. In-depth analysis of both simulated and real genomic data suggests that the method is notably insensitive to stochastic events such as gene loss, rate variation and database error, which are common challenges for the current methodology. An automated pipeline implementing this approach was created and made publicly available at: https://github.com/DittmarLab/HGTector. The program is versatile, easily deployed, and has low computational resource requirements.
Conclusions HGTector is an effective tool for initial or standalone large-scale discovery of candidate HGT-derived genes.
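The core idea described above, scoring each gene's BLAST hits by predefined evolutionary category and flagging genes whose scores are atypical, can be sketched as follows. This is a simplified illustration under stated assumptions, not HGTector's implementation: the input format, the function names, the normalization by the self-hit bitscore and the quartile cutoffs are all hypothetical stand-ins for the program's own statistically determined thresholds.

```python
from statistics import quantiles


def group_scores(self_bitscore, hits):
    """Sum hit bitscores per evolutionary category, normalized by the
    self-hit bitscore. `hits` holds (category, bitscore) pairs; each hit's
    taxon is assumed to be already assigned to "close" or "distal"."""
    scores = {"close": 0.0, "distal": 0.0}
    for category, bitscore in hits:
        if category in scores:
            scores[category] += bitscore / self_bitscore
    return scores


def flag_putative_hgt(genes):
    """Return genes whose close score is below the first quartile and whose
    distal score is above the third quartile across all genes.
    Hypothetical cutoffs; requires at least two genes."""
    scored = {name: group_scores(s, hits) for name, (s, hits) in genes.items()}
    close_q1 = quantiles([sc["close"] for sc in scored.values()], n=4)[0]
    distal_q3 = quantiles([sc["distal"] for sc in scored.values()], n=4)[2]
    return {
        name
        for name, sc in scored.items()
        if sc["close"] < close_q1 and sc["distal"] > distal_q3
    }


if __name__ == "__main__":
    # Toy data: four genes with hits mostly in close relatives, one without.
    typical = (100.0, [("close", 100.0)] * 3 + [("distal", 10.0)])
    genes = {name: typical for name in ("geneA", "geneB", "geneC", "geneD")}
    genes["geneE"] = (100.0, [("distal", 100.0), ("distal", 100.0)])
    print(flag_putative_hgt(genes))  # {'geneE'}
```

Using category-level sums rather than the single best hit is what lets this kind of approach tolerate a missing or misannotated individual hit, which is in the spirit of the robustness to stochastic events claimed above.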
Affiliation(s)
- Qiyun Zhu
- Department of Biological Sciences, University at Buffalo, State University of New York, 109 Cooke Hall, Buffalo, NY 14260, USA.
16
Identifying pathogenicity islands in bacterial pathogenomics using computational approaches. Pathogens 2014; 3:36-56. [PMID: 25437607 PMCID: PMC4235732 DOI: 10.3390/pathogens3010036] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 11/30/2013] [Revised: 12/30/2013] [Accepted: 01/07/2014] [Indexed: 12/22/2022]
Abstract
High-throughput sequencing technologies have made it possible to study bacteria by analyzing their genome sequences. For instance, comparative genome sequence analyses can reveal phenomena such as gene loss, gene gain, or gene exchange in a genome. Analyses of pathogenic bacterial genomes show that, in many pathogenic bacteria, the pathogenic genomic regions were horizontally transferred from other bacteria; these regions are known as pathogenicity islands (PAIs). PAIs have some detectable properties, such as genomic signatures that differ from those of the rest of the host genome and mobility genes that allow them to integrate into the host genome. In this review, we discuss various pathogenicity island-associated features and current computational approaches for the identification of PAIs. Existing pathogenicity island databases and related computational resources are also discussed, so that researchers may find them useful for studies of bacterial evolution and pathogenicity mechanisms.
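One PAI feature mentioned above, a genomic signature that differs from the rest of the host genome, can be illustrated with the simplest such signature, GC content. The sketch below is not taken from the review: the function names, window size, step and the z-score cutoff are hypothetical, and it ignores the other PAI features, such as mobility genes. It flags windows whose GC fraction deviates strongly from the distribution of GC values across all windows.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """GC fraction of seq, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def atypical_gc_windows(seq, window=5000, step=2500, z=2.0):
    """Return (start, end, gc) for windows whose GC fraction deviates from
    the mean window GC by more than z standard deviations."""
    starts = list(range(0, max(len(seq) - window, 0) + 1, step))
    gcs = [gc_fraction(seq[s:s + window]) for s in starts]
    if len(gcs) < 2:
        return []
    m, sd = mean(gcs), pstdev(gcs)
    return [
        (s, s + window, gc)
        for s, gc in zip(starts, gcs)
        if sd > 0 and abs(gc - m) > z * sd
    ]


if __name__ == "__main__":
    host = "GCATATATAT" * 10    # 100-bp block, GC fraction 0.2
    island = "GCGCGCGCAT" * 10  # 100-bp block, GC fraction 0.8
    genome = host * 9 + island + host * 10
    print(atypical_gc_windows(genome, window=100, step=100))  # [(900, 1000, 0.8)]
```

GC content alone misses islands whose composition has ameliorated toward the host, which is one reason the review discusses several complementary features.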
17
Jaron KS, Moravec JC, Martínková N. SigHunt: horizontal gene transfer finder optimized for eukaryotic genomes. Bioinformatics 2013; 30:1081-1086. [PMID: 24371153 DOI: 10.1093/bioinformatics/btt727] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 08/09/2013] [Accepted: 12/09/2013] [Indexed: 12/15/2022]
Abstract
MOTIVATION Genomic islands (GIs) are DNA fragments incorporated into a genome through horizontal gene transfer (also called lateral gene transfer), often with functions novel for a given organism. While methods for their detection are well researched in prokaryotes, the complexity of eukaryotic genomes makes direct use of these methods unreliable, so labour-intensive phylogenetic searches are used instead. RESULTS We present a surrogate method that investigates the nucleotide base composition of the DNA sequence in a eukaryotic genome and identifies putative GIs. We calculate a genomic signature as a vector of tetranucleotide (4-mer) frequencies using a sliding window approach. Extending the neighbourhood of the sliding window, we establish a local kernel density estimate of the 4-mer frequency. We score the number of 4-mer frequencies in the sliding window that deviate from the credibility interval of their local genomic density using a newly developed discrete interval accumulative score (DIAS). To further improve the effectiveness of DIAS, we select informative 4-mers in a range of organisms using the tetranucleotide quality score developed herein. We show that the SigHunt method is computationally efficient and able to detect GIs in eukaryotic genomes that represent non-ameliorated integration. Thus, it is suited to scanning for change in organisms with different DNA composition. AVAILABILITY AND IMPLEMENTATION Source code and scripts are freely available for download at http://www.iba.muni.cz/index-en.php?pg=research-data-analysis-tools-sighunt; they are implemented in C and R and are platform-independent. CONTACT 376090@mail.muni.cz or martinkova@ivb.cz.
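The tetranucleotide-signature idea can be sketched as follows. This is not SigHunt or DIAS: as a loudly stated assumption, the local kernel density estimate and its credibility interval are replaced by a simple mean ± z·SD interval computed from neighbouring windows, and no informative-4-mer selection is applied. The function names and the defaults for window size, step, neighbourhood size and `z` are hypothetical. Each window's score is the number of 4-mers whose frequency falls outside that interval, loosely mirroring the idea of accumulating deviating 4-mers.

```python
from itertools import product
from statistics import mean, pstdev

# All 256 tetranucleotides in a fixed order, so signatures are comparable vectors.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_signature(seq):
    """Relative frequencies of the 256 4-mers in seq (4-mers with N etc. skipped)."""
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in KMERS]


def window_scores(seq, window=5000, step=1000, neighbours=5, z=2.0):
    """Return (start, score) pairs. score = number of 4-mers whose frequency in
    the window lies outside mean +/- z*SD over the neighbouring windows.
    Assumption: this interval stands in for SigHunt's local density estimate."""
    seq = seq.upper()
    starts = list(range(0, max(len(seq) - window, 0) + 1, step))
    sigs = [tetranucleotide_signature(seq[s:s + window]) for s in starts]
    results = []
    for i, sig in enumerate(sigs):
        lo, hi = max(0, i - neighbours), min(len(sigs), i + neighbours + 1)
        context = [sigs[j] for j in range(lo, hi) if j != i]
        score = 0
        if len(context) >= 2:
            for k in range(len(KMERS)):
                values = [c[k] for c in context]
                if abs(sig[k] - mean(values)) > z * pstdev(values):
                    score += 1
        results.append((starts[i], score))
    return results


if __name__ == "__main__":
    background = "AATC" * 50
    insert = "GGCG" * 10  # toy "foreign" segment with a different composition
    genome = background + insert + background
    for start, score in window_scores(genome, window=40, step=20, neighbours=2):
        print(start, score)
```

In the toy run, windows far from the insert score 0, while the window covering the insert scores above 0. The exact `statistics` functions keep the comparison stable for identical windows but are slow; a real genome-scale scan would use vectorized arithmetic.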
Affiliation(s)
- Kamil S Jaron
- Institute of Biostatistics and Analyses, Masaryk University and Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic
- Jiří C Moravec
- Institute of Biostatistics and Analyses, Masaryk University and Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic
- Natália Martínková
- Institute of Biostatistics and Analyses, Masaryk University and Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic
18
Hasan MS, Liu Q, Wang H, Fazekas J, Chen B, Che D. GIST: Genomic island suite of tools for predicting genomic islands in genomic sequences. Bioinformation 2012; 8:203-5. [PMID: 22419842 PMCID: PMC3302003 DOI: 10.6026/97320630008203] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 02/06/2012] [Accepted: 02/07/2012] [Indexed: 11/24/2022]
Abstract
UNLABELLED Genomic Islands (GIs) are genomic regions that originate from other organisms through a process known as Horizontal Gene Transfer (HGT). Detection of GIs plays a significant role in biomedical research, since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that combines the results of existing tools into an ensemble for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that automatically collects the input genomes from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java and was compiled and executed on Linux/Unix operating systems. AVAILABILITY The software is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.
Affiliation(s)
- Mohammad Shabbir Hasan
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA
- Qi Liu
- College of Life Science and Biotechnology, Tongji University, Shanghai, 200092, China
- Han Wang
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA
- John Fazekas
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA
- Bernard Chen
- Department of Computer Science, University of Central Arkansas, Conway, AR, 72035, USA
- Dongsheng Che
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301, USA
19
Das C, Ghosh TS, Mande SS. Computational analysis of the ESX-1 region of Mycobacterium tuberculosis: insights into the mechanism of type VII secretion system. PLoS One 2011; 6:e27980. [PMID: 22140496 PMCID: PMC3227618 DOI: 10.1371/journal.pone.0027980] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 08/16/2011] [Accepted: 10/28/2011] [Indexed: 01/17/2023]
Abstract
The type VII secretion system (T7SS) is a recently discovered bacterial secretion system. First identified in Mycobacterium tuberculosis, it has since been reported in organisms of the order Actinomycetales and even in distant phyla such as Firmicutes. The genome of M. tuberculosis H37Rv contains five gene clusters that have evolved through gene duplication events and include components of the T7SS secretion machinery. These clusters are called ESAT-6 secretion system (ESX) 1 through 5. Of these, ESX-1 has been the most widely studied region because of its pathological importance. Despite this, the overall mechanism of protein translocation through the ESX-1 secretion machinery is not clearly understood. Specifically, the structural components contributing to translocation through the mycomembrane have not yet been characterized. In this study, we carried out a comprehensive in silico analysis of the genes known to be involved in the ESX-1 secretion pathway and identified putative proteins with a high probability of being associated with this pathway. Our study includes analysis of phylogenetic profiles, identification of domains, transmembrane helices, 3D folds and signal peptides, and prediction of protein-protein associations. Based on our analysis, we could assign probable novel functions to a few of the ESX-1 components. Additionally, we identified a few proteins with a probable role in the initial activation and formation of the mycomembrane translocon of the ESX-1 secretion machinery. We also propose a probable working model of T7SS involving the ESX-1 secretion pathway.
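Phylogenetic profiling, one of the analyses listed above, rests on the idea that proteins which are consistently present or absent together across genomes may be functionally linked. A minimal sketch, assuming binary presence/absence profiles over the same set of genomes and simple matching similarity; the study's actual scoring is not described here, and the protein names and the 0.8 cutoff are hypothetical.

```python
def profile_similarity(a, b):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_by_profile(query, profiles, min_similarity=0.8):
    """Rank other proteins by profile similarity to `query`, keeping those at
    or above `min_similarity` (a hypothetical cutoff)."""
    scored = [
        (name, profile_similarity(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    hits = [item for item in scored if item[1] >= min_similarity]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # 1 = protein present in that genome, 0 = absent (toy data, six genomes)
    profiles = {
        "query": [1, 1, 1, 0, 0, 1],
        "protA": [1, 1, 1, 0, 0, 1],
        "protB": [1, 1, 0, 0, 0, 1],
        "protC": [0, 0, 0, 1, 1, 0],
    }
    # protA (identical profile) ranks first, then protB; protC is excluded.
    print(rank_by_profile("query", profiles))
```

Simple matching treats shared absences as agreement; measures that ignore joint absences (e.g., Jaccard) or weight genomes by phylogenetic distance are common alternatives.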
Affiliation(s)
- Chandrani Das
- Bio-sciences R&D Division, Tata Consultancy Services Innovation Labs, Tata Consultancy Services Ltd, Hyderabad, Andhra Pradesh, India
- Tarini Shankar Ghosh
- Bio-sciences R&D Division, Tata Consultancy Services Innovation Labs, Tata Consultancy Services Ltd, Hyderabad, Andhra Pradesh, India
- Sharmila S. Mande
- Bio-sciences R&D Division, Tata Consultancy Services Innovation Labs, Tata Consultancy Services Ltd, Hyderabad, Andhra Pradesh, India
20
Che D, Hasan MS, Wang H, Fazekas J, Huang J, Liu Q. EGID: an ensemble algorithm for improved genomic island detection in genomic sequences. Bioinformation 2011; 7:311-4. [PMID: 22355228 PMCID: PMC3280502 DOI: 10.6026/007/97320630007311] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 11/16/2011] [Accepted: 11/17/2011] [Indexed: 11/23/2022]
Abstract
Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them, and generates consensus predictions. Performance comparisons between our ensemble algorithm and existing programs showed that our ensemble algorithm performs better than any of the other programs. EGID was implemented in Java and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
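Consensus over several GI predictors can be illustrated with a per-position vote. This is a generic sketch, not EGID's actual filtering rule: the function name, the `min_votes` parameter and the half-open interval format are hypothetical, and each tool's own intervals are assumed not to overlap, so that one tool contributes at most one vote per position.

```python
def consensus_regions(predictions, genome_length, min_votes):
    """Merge GI predictions from several tools into regions supported by at
    least `min_votes` tools.

    predictions: dict mapping a tool name to a list of half-open (start, end)
    intervals. Assumption: each tool's own intervals do not overlap.
    """
    # Difference array: +1 where a predicted island starts, -1 where it ends.
    delta = [0] * (genome_length + 1)
    for intervals in predictions.values():
        for start, end in intervals:
            delta[start] += 1
            delta[end] -= 1
    regions, depth, open_at = [], 0, None
    for pos in range(genome_length):
        depth += delta[pos]  # number of tools predicting a GI at this position
        if depth >= min_votes and open_at is None:
            open_at = pos
        elif depth < min_votes and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    if open_at is not None:
        regions.append((open_at, genome_length))
    return regions


if __name__ == "__main__":
    predictions = {
        "tool_A": [(100, 200)],
        "tool_B": [(150, 260)],
        "tool_C": [(180, 300), (500, 600)],
    }
    print(consensus_regions(predictions, 1000, min_votes=2))  # [(150, 260)]
```

Raising `min_votes` trades recall for precision: with one vote the union of all predictions is returned, with three only the region all tools agree on survives, which is the recall/precision tension the abstract describes.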
Affiliation(s)
- Dongsheng Che
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301
- Han Wang
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301
- John Fazekas
- Department of Computer Science, East Stroudsburg University, East Stroudsburg, PA 18301
- Jinling Huang
- Department of Biology, East Carolina University, Greenville, NC 27858
- Qi Liu
- College of Life Science and Biotechnology, Tongji University, Shanghai, 200092, China