1
|
Aziz T, Naveed M, Shabbir MA, Sarwar A, Naseeb J, Zhao L, Yang Z, Cui H, Lin L, Albekairi TH. Unveiling the whole genomic features and potential probiotic characteristics of novel Lactiplantibacillus plantarum HMX2. Front Microbiol 2024; 15:1504625. [PMID: 39611087 PMCID: PMC11602494 DOI: 10.3389/fmicb.2024.1504625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2024] [Accepted: 10/24/2024] [Indexed: 11/30/2024] Open
Abstract
This study investigates the genomic features and probiotic potential of Lactiplantibacillus plantarum HMX2, isolated from Chinese Sauerkraut, using whole-genome sequencing (WGS) and bioinformatics for the first time. This study also aims to find genetic diversity, antibiotic resistance genes, and functional capabilities to help us better understand its food safety applications and potential as a probiotic. L. plantarum HMX2 was cultured, and DNA was extracted for WGS. Genomic analysis comprised average nucleotide identity (ANI) prediction, genome annotation, pangenome, and synteny analysis. Bioinformatics techniques were used to identify CoDing Sequences (CDSs), transfer RNA (tRNA) and ribosomal RNA (rRNA) genes, and antibiotic resistance genes, as well as to conduct phylogenetic analysis to establish genetic diversity and evolution. The study found a significant genetic similarity (99.17% ANI) between L. plantarum HMX2 and the reference strain. Genome annotation revealed 3,242 coding sequences, 65 tRNA genes, and 16 rRNA genes. Significant genetic variety was found, including 25 antibiotic resistance genes. A phylogenetic study placed L. plantarum HMX2 among closely related bacteria, emphasizing its potential for probiotic and food safety applications. The genomic investigation of L. plantarum showed essential genes, including plnJK and plnEF, which contribute to antibacterial action against foodborne pathogens. Furthermore, genes such as MurA, Alr, and MprF improve food safety and probiotic potential by promoting bacterial survival under stress conditions in food and the gastrointestinal tract. This study introduces the new genomic features of L. plantarum HMX2 about specific genetics and its possibility of relevant uses in food security and technologies. These findings of specific genes involved in antimicrobial activity provide fresh possibilities for exploiting this strain in forming probiotic preparations and food preservation methods. The future research should focus on the experimental validation of antibiotic resistance genes, comparative genomics to investigate functional diversity, and the development of novel antimicrobial therapies that take advantage of L. plantarum's capabilities.
Collapse
Affiliation(s)
- Tariq Aziz
- Department of Food Science and Technology, College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, Guangdong, China
- School of Biomedical Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Key Laboratory of Geriatric Nutrition and Health of Ministry of Education, Beijing Advanced Innovation Center for Food Nutrition and Human Health, Beijing Engineering and Technology Research Center of Food Additives, Beijing Technology and Business University, Beijing, China
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, China
| | - Muhammad Naveed
- Department of Biotechnology, Faculty of Science and Technology, University of Central Punjab, Lahore, Pakistan
| | - Muhammad Aqib Shabbir
- Department of Biotechnology, Faculty of Biological Sciences, Lahore University of Biological and Applied Sciences, Lahore, Pakistan
| | - Abid Sarwar
- Key Laboratory of Geriatric Nutrition and Health of Ministry of Education, Beijing Advanced Innovation Center for Food Nutrition and Human Health, Beijing Engineering and Technology Research Center of Food Additives, Beijing Technology and Business University, Beijing, China
| | - Jasra Naseeb
- Key Laboratory of Geriatric Nutrition and Health of Ministry of Education, Beijing Advanced Innovation Center for Food Nutrition and Human Health, Beijing Engineering and Technology Research Center of Food Additives, Beijing Technology and Business University, Beijing, China
| | - Liqing Zhao
- Department of Food Science and Technology, College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, Guangdong, China
| | - Zhennai Yang
- Key Laboratory of Geriatric Nutrition and Health of Ministry of Education, Beijing Advanced Innovation Center for Food Nutrition and Human Health, Beijing Engineering and Technology Research Center of Food Additives, Beijing Technology and Business University, Beijing, China
| | - Haiying Cui
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, China
| | - Lin Lin
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, China
| | - Thamer H. Albekairi
- Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
| |
Collapse
|
2
|
Revived Amplicon Sequence Variants Monitoring in Closed Systems Identifies More Dormant Microorganisms. Microorganisms 2023; 11:microorganisms11030757. [PMID: 36985330 PMCID: PMC10055844 DOI: 10.3390/microorganisms11030757] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2023] [Revised: 03/08/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023] Open
Abstract
The large number of dormant microorganisms present in the environment is an important component of microbial diversity, and neglecting dormant microorganisms would be disruptive to all research under the science of microbial diversity. However, current methods can only predict the dormancy potential of microorganisms in a sample and are not yet able to monitor dormant microorganisms directly and efficiently. Based on this, this study proposes a new method for the identification of dormant microorganisms based on high-throughput sequencing technology: Revived Amplicon sequence variants (ASV) Monitoring (RAM). Pao cai (Chinese fermented vegetables) soup was used to construct a closed experimental system, and sequenced samples were collected at 26 timepoints over a 60-day period. RAM was used to identify dormant microorganisms in the samples. The results were then compared with the results of the currently used gene function prediction (GFP), and it was found that RAM was able to identify more dormant microorganisms. In 60 days, GFP monitored 5045 ASVs and 270 genera, while RAM monitored 27,415 ASVs and 616 genera, and the RAM results were fully inclusive of the GFP results. Meanwhile, the consistency of GFP and RAM was also found in the results. The dormant microorganisms monitored by both showed a four-stage distribution pattern over a 60-day period, with significant differences in the community structure between the stages. Therefore, RAM monitoring of dormant microorganisms is effective and feasible. It is worth noting that the results of GFP and RAM can complement and refer to each other. In the future, the results obtained from RAM can be used as a database to extend and improve the monitoring of dormant microorganisms by GFP, and the two can be combined with each other to build a dormant microorganism detection system.
Collapse
|
3
|
Zhao Y, Wang J, Guo M, Zhang X, Yu G. Cross-Species Protein Function Prediction with Asynchronous-Random Walk. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1439-1450. [PMID: 31562099 DOI: 10.1109/tcbb.2019.2943342] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Protein function prediction is a fundamental task in the post-genomic era. Available functional annotations of proteins are incomplete and the annotations of two homologous species are complementary to each other. However, how to effectively leverage mutually complementary annotations of different species to further boost the prediction performance is still not well studied. In this paper, we propose a cross-species protein function prediction approach by performing Asynchronous Random Walk on a heterogeneous network (AsyRW). AsyRW first constructs a heterogeneous network to integrate multiple functional association networks derived from different biological data, established homology-relationships between proteins from different species, known annotations of proteins and Gene Ontology (GO). To account for the intrinsic structures of intra- and inter-species of proteins and that of GO, AsyRW quantifies the individual walk lengths of each network node using the gravity-like theory, and then performs asynchronous-random walk with the individual length to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk length and asynchronous-random walk can effectively leverage the complementary annotations of different species, AsyRW has a significantly improved performance to other related and competitive methods. The codes of AsyRW are available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
Collapse
|
4
|
Piovesan D, Tosatto SCE. INGA 2.0: improving protein function prediction for the dark proteome. Nucleic Acids Res 2020; 47:W373-W378. [PMID: 31073595 PMCID: PMC6602455 DOI: 10.1093/nar/gkz375] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2019] [Revised: 04/29/2019] [Accepted: 04/30/2019] [Indexed: 12/21/2022] Open
Abstract
Our current knowledge of complex biological systems is stored in a computable form through the Gene Ontology (GO) which provides a comprehensive description of genes function. Prediction of GO terms from the sequence remains, however, a challenging task, which is particularly critical for novel genomes. Here we present INGA 2.0, a new version of the INGA software for protein function prediction. INGA exploits homology, domain architecture, interaction networks and information from the ‘dark proteome’, like transmembrane and intrinsically disordered regions, to generate a consensus prediction. INGA was ranked in the top ten methods on both CAFA2 and CAFA3 blind tests. The new algorithm can process entire genomes in a few hours or even less when additional input files are provided. The new interface provides a better user experience by integrating filters and widgets to explore the graph structure of the predicted terms. The INGA web server, databases and benchmarking are available from URL: https://inga.bio.unipd.it/.
Collapse
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padua, Italy.,CNR Institute of Neuroscience, Padua, Italy
| |
Collapse
|
5
|
Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020; 11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods making use of GO have been proposed. But no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that there remain many largely overlooked but important topics for future research.
Collapse
Affiliation(s)
- Yingwen Zhao
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jian Chen
- State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, China Agricultural University, Beijing, China
| | - Xiangliang Zhang
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| |
Collapse
|
6
|
Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype. Sci Rep 2019; 9:19537. [PMID: 31863070 PMCID: PMC6925100 DOI: 10.1038/s41598-019-55984-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2019] [Accepted: 12/02/2019] [Indexed: 01/01/2023] Open
Abstract
Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions – corresponding to semantically distant Gene Ontology terms – are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict phenotype of individuals from their genome sequence.
Collapse
|
7
|
Li J, Zhang L, Li H, Ping Y, Xu Q, Wang R, Tan R, Wang Z, Liu B, Wang Y. Integrated entropy-based approach for analyzing exons and introns in DNA sequences. BMC Bioinformatics 2019; 20:283. [PMID: 31182012 PMCID: PMC6557737 DOI: 10.1186/s12859-019-2772-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Numerous essential algorithms and methods, including entropy-based quantitative methods, have been developed to analyze complex DNA sequences since the last decade. Exons and introns are the most notable components of DNA and their identification and prediction are always the focus of state-of-the-art research. RESULTS In this study, we designed an integrated entropy-based analysis approach, which involves modified topological entropy calculation, genomic signal processing (GSP) method and singular value decomposition (SVD), to investigate exons and introns in DNA sequences. We optimized and implemented the topological entropy and the generalized topological entropy to calculate the complexity of DNA sequences, highlighting the characteristics of repetition sequences. By comparing digitalizing entropy values of exons and introns, we observed that they are significantly different. After we converted DNA data to numerical topological entropy value, we applied SVD method to effectively investigate exon and intron regions on a single gene sequence. Additionally, several genes across five species are used for exon predictions. CONCLUSIONS Our approach not only helps to explore the complexity of DNA sequence and its functional elements, but also provides an entropy-based GSP method to analyze exon and intron regions. Our work is feasible across different species and extendable to analyze other components in both coding and noncoding region of DNA sequences.
Collapse
Affiliation(s)
- Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Li Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Huinian Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Yuan Ping
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Qingzhe Xu
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Rongjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| | - Renjie Tan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| | - Zhen Wang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
| | - Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| |
Collapse
|
8
|
Abstract
Motivation Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue-specific cellular function remains a critical challenge for biomedicine. Results Here, we present OhmNet, a hierarchy-aware unsupervised node feature learning approach for multi-layer networks. We build a multi-layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding-based low-dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi-layer protein interaction network of 107 human tissues. In 48 tissues with known tissue-specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue-specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems. Availability and implementation Source code and datasets are available at http://snap.stanford.edu/ohmnet.
Collapse
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
| |
Collapse
|