1
Shi H, Hou G, Jiang S, Su X. PM-profiler: a high-resolution and fast tool for taxonomy annotation of amplicon-based microbiome. Microbiol Spectr 2024; 12:e0069524. [PMID: 38912828 PMCID: PMC11302061 DOI: 10.1128/spectrum.00695-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 05/12/2024] [Indexed: 06/25/2024]
Abstract
Amplicon sequencing stands as a cornerstone in microbiome profiling, yet concerns persist regarding its resolution and accuracy. The enhancement of reference databases and annotations marks a new era for 16S rRNA-based profiling. Capitalizing on this potential, we introduce PM-profiler, a novel tool for profiling amplicon short reads. PM-profiler is implemented with C++-based advanced algorithms, such as pre-allocated hash for reference construction, hybrid and dynamic short-read matching, a big-data-guided dual-mode hierarchical taxonomy annotation strategy, and full-procedure parallel computing. This tool delivers species-level resolution and ultrafast speed for large-scale microbiomes, surpassing alignment-based approaches and the Naïve-Bayesian model. Furthermore, recognizing the globally uneven distribution of microbes, we delineate optimal annotation strategies for each sampling habitat based on microbial patterns over 270,000 microbiomes. Integrated with the established workflow of Parallel-Meta Suite and the latest curated reference databases, this endeavor offers a swift and dependable solution for high-precision microbiome surveys. IMPORTANCE Our study introduces PM-profiler, a new tool that deciphers the complexity of microbial communities. With advanced algorithms, flexible annotation strategies, and well-organized big data, PM-profiler provides a faster and more accurate way to study microbiomes, paving the way for discoveries that could improve our understanding of microbiomes and their impact on the world.
Affiliation(s)
- Haobo Shi
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Guosen Hou
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Sikai Jiang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
2
Piquer-Esteban S, Arnau V, Diaz W, Moya A. OMD Curation Toolkit: a workflow for in-house curation of public omics datasets. BMC Bioinformatics 2024; 25:184. [PMID: 38724907 PMCID: PMC11084137 DOI: 10.1186/s12859-024-05803-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 05/07/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these present limitations that often lead to the need for further in-house curation and processing. RESULTS Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a python3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources. CONCLUSIONS Thus, it offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and benefit investigators in developing novel omics meta-analyses based on sequencing data.
Affiliation(s)
- Samuel Piquer-Esteban
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain.
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain.
- Vicente Arnau
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain
- Biomedical Research Networking Centre for Epidemiology and Public Health (CIBEResp), Madrid, Spain
- Wladimiro Diaz
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain
- Biomedical Research Networking Centre for Epidemiology and Public Health (CIBEResp), Madrid, Spain
- Andrés Moya
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain.
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain.
- Biomedical Research Networking Centre for Epidemiology and Public Health (CIBEResp), Madrid, Spain.
3
Zheng Y, Wang B, Gao P, Yang Y, Xu B, Su X, Ning D, Tao Q, Li Q, Zhao F, Wang D, Zhang Y, Li M, Winkler MKH, Ingalls AE, Zhou J, Zhang C, Stahl DA, Jiang J, Martens-Habbena W, Qin W. Novel order-level lineage of ammonia-oxidizing archaea widespread in marine and terrestrial environments. The ISME Journal 2024; 18:wrad002. [PMID: 38365232 PMCID: PMC10811736 DOI: 10.1093/ismejo/wrad002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 11/03/2023] [Accepted: 10/28/2023] [Indexed: 02/18/2024]
Abstract
Ammonia-oxidizing archaea (AOA) are among the most ubiquitous and abundant archaea on Earth, widely distributed in marine, terrestrial, and geothermal ecosystems. However, the genomic diversity, biogeography, and evolutionary process of AOA populations in subsurface environments are vastly understudied compared to those in marine and soil systems. Here, we report a novel AOA order Candidatus (Ca.) Nitrosomirales which forms a sister lineage to the thermophilic Ca. Nitrosocaldales. Metagenomic and 16S rRNA gene-read mapping demonstrates the abundant presence of Nitrosomirales AOA in various groundwater environments and their widespread distribution across a range of geothermal, terrestrial, and marine habitats. Terrestrial Nitrosomirales AOA show the genetic capacity of using formate as a source of reductant and using nitrate as an alternative electron acceptor. Nitrosomirales AOA appear to have acquired key metabolic genes and operons from other mesophilic populations via horizontal gene transfer, including genes encoding urease, nitrite reductase, and V-type ATPase. The additional metabolic versatility conferred by acquired functions may have facilitated their radiation into a variety of subsurface, marine, and soil environments. We also provide evidence that each of the four AOA orders spans both marine and terrestrial habitats, which suggests a more complex evolutionary history for major AOA lineages than previously proposed. Together, these findings establish a robust phylogenomic framework of AOA and provide new insights into the ecology and adaptation of this globally abundant functional guild.
Affiliation(s)
- Yue Zheng
- State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Baozhan Wang
- Department of Microbiology, Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing 210095, China
- Ping Gao
- Department of Microbiology, Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing 210095, China
- Yiyan Yang
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Bu Xu
- Department of Ocean Science and Engineering, Shenzhen Key Laboratory of Marine Archaea Geo-Omics, Southern University of Science and Technology, Shenzhen 518055, China
- Shanghai Sheshan National Geophysical Observatory, Shanghai 201602, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao 266101, China
- Daliang Ning
- School of Biological Sciences, Institute for Environmental Genomics, University of Oklahoma, Norman, OK 73019, United States
- Qing Tao
- School of Biological Sciences, Institute for Environmental Genomics, University of Oklahoma, Norman, OK 73019, United States
- Qian Li
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
- Feng Zhao
- CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
- Dazhi Wang
- State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Yao Zhang
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
- Meng Li
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Mari-K H Winkler
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA 98195, United States
- Anitra E Ingalls
- School of Oceanography, University of Washington, Seattle, WA 98195, United States
- Jizhong Zhou
- School of Biological Sciences, Institute for Environmental Genomics, University of Oklahoma, Norman, OK 73019, United States
- School of Civil Engineering and Environmental Sciences, University of Oklahoma, Norman, OK 73019, United States
- Department of Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Chuanlun Zhang
- Department of Ocean Science and Engineering, Shenzhen Key Laboratory of Marine Archaea Geo-Omics, Southern University of Science and Technology, Shenzhen 518055, China
- Shanghai Sheshan National Geophysical Observatory, Shanghai 201602, China
- David A Stahl
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA 98195, United States
- Jiandong Jiang
- Department of Microbiology, Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing 210095, China
- Willm Martens-Habbena
- Department of Microbiology and Cell Science, Fort Lauderdale Research and Education Center, University of Florida, Davie, FL 33314, United States
- Wei Qin
- School of Biological Sciences, Institute for Environmental Genomics, University of Oklahoma, Norman, OK 73019, United States
4
Zhang W, Fan X, Shi H, Li J, Zhang M, Zhao J, Su X. Comprehensive Assessment of 16S rRNA Gene Amplicon Sequencing for Microbiome Profiling across Multiple Habitats. Microbiol Spectr 2023; 11:e0056323. [PMID: 37102867 PMCID: PMC10269731 DOI: 10.1128/spectrum.00563-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/06/2023] [Accepted: 04/10/2023] [Indexed: 04/28/2023]
Abstract
The 16S rRNA gene works as a rapid and effective marker for the identification of microorganisms in complex communities; hence, a huge number of microbiomes have been surveyed by 16S amplicon-based sequencing. The resolution of the 16S rRNA gene is always considered only at the genus level; however, it has not been verified on a wide range of microbes yet. To fully explore the ability and potential of the 16S rRNA gene in microbial profiling, here, we propose Qscore, a comprehensive method to evaluate the performance of amplicons by integrating the amplification rate, multitier taxonomic annotation, sequence type, and length. Our in silico assessment by a "global view" of 35,889 microbe species across multiple reference databases summarizes the optimal sequencing strategy for 16S short reads. On the other hand, since microbes are unevenly distributed according to their habitats, we also provide the recommended configuration for 16 typical ecosystems based on the Qscores of 157,390 microbiomes in the Microbiome Search Engine (MSE). Detailed data simulation further proves that the 16S amplicons produced with Qscore-suggested parameters exhibit high precision in microbiome profiling, which is close to that of shotgun metagenomes under CAMI metrics. Therefore, by reconsidering the precision of 16S-based microbiome profiling, our work not only enables the high-quality reusability of massive sequence legacy that has already been produced but is also significant for guiding microbiome studies in the future. We have implemented the Qscore as an online service at http://qscore.single-cell.cn to parse the recommended sequencing strategy for specific habitats or expected microbial structures. IMPORTANCE 16S rRNA has long been used as a biomarker to identify distinct microbes from complex communities. However, due to the influence of the amplification region, sequencing type, sequence processing, and reference database, the accuracy of 16S rRNA has not been fully verified on a global range. More importantly, the microbial composition of different habitats varies greatly, and it is necessary to adopt different strategies according to the corresponding target microbes to achieve optimal analytical performance. Here, we developed Qscore, which evaluates the comprehensive performance of 16S amplicons from multiple perspectives, thus providing the best sequencing strategies for common ecological environments by using big data.
Affiliation(s)
- Wenke Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Xiaoqian Fan
- Shouguang Hospital of Traditional Chinese Medicine, Weifang, China
- Haobo Shi
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Jian Li
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Mingqian Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Jin Zhao
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, China
5
Sun Y, Lu J, Yang J, Liu Y, Liu L, Zeng F, Niu Y, Dong L, Yang F. Construction of a caries diagnosis model based on microbiome novelty score. Hua Xi Kou Qiang Yi Xue Za Zhi = Huaxi Kouqiang Yixue Zazhi = West China Journal of Stomatology 2023; 41:208-217. [PMID: 37056188 PMCID: PMC10427253 DOI: 10.7518/hxkq.2023.2022301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Revised: 12/30/2022] [Indexed: 04/15/2023]
Abstract
OBJECTIVES This study aimed to analyze the bacteria in dental caries and establish an optimized dental caries diagnosis model based on 16S ribosomal RNA (rRNA) data of oral flora. METHODS We searched public microbiome databases, including NCBI, MG-RAST, EMBL-EBI, and QIITA, and collected data from relevant research on human oral microbiomes worldwide. The samples in the caries dataset (1,703) were compared with healthy ones (20,540) by using the microbial search engine (MSE) to obtain the microbiome novelty score (MNS) and construct a caries diagnosis model based on this index. Nonparametric multivariate ANOVA was used to analyze and compare the impact of different host factors on the oral flora MNS, and the model was optimized by controlling related factors. Finally, the effect of the model was evaluated by receiver operating characteristic (ROC) curve analysis. RESULTS 1) The oral microbiota distribution obviously differed among people with various oral-health statuses, and the species richness and species diversity index decreased. 2) The ROC curve was used to evaluate the caries dataset, and the area under the ROC curve (AUC) was 0.67. 3) Among the five host factors including caries status, country, age, decayed missing filled tooth (DMFT) indices, and sampling site displayed the strongest effect on MNS of samples (P=0.001). 4) The AUC of the model was 0.87, 0.74, 0.74, and 0.75 in high caries, medium caries, low caries samples in Chinese children, and mixed dental plaque samples after controlling host factors, respectively. CONCLUSIONS The model based on the analysis of 16S rRNA data of oral flora had good diagnostic efficiency.
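The AUC values reported in this abstract summarize how well a single score separates two groups. As a minimal sketch (not the authors' code), the AUC can be computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example scores and labels are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical novelty scores: label 1 = caries, label 0 = healthy.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based formulation would scale better to datasets of the size described here.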
Affiliation(s)
- Yanfei Sun
- School of Stomatology, Qingdao University, Qingdao 266003, China
- Dept. of Pediatric Dentistry, Center of Stomatology, Municipal Hospital, Qingdao 266071, China
- Jie Lu
- Dept. of Stomatology, Pujiang Stomatological Hospital, Jinhua 322299, China
- Jiazhen Yang
- Dept. of Pediatric Dentistry, Stomatological Hospital of Qingdao, Qingdao 266000, China
- Yuhan Liu
- Central Laboratory, Stomatological Hospital of Qingdao, Qingdao 266000, China
- Lu Liu
- Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao 266101, China
- Fei Zeng
- Dept. of Stomatology, Affiliated Hospital of Jining Medical University, Jining 272000, China
- Yufen Niu
- Dept. of Pediatric Dentistry, Center of Stomatology, Municipal Hospital, Qingdao 266071, China
- School of Stomatology, Dalian Medical University, Dalian 116044, China
- Lei Dong
- Dept. of Pediatric Dentistry, Center of Stomatology, Municipal Hospital, Qingdao 266071, China
- School of Stomatology, Dalian Medical University, Dalian 116044, China
- Fang Yang
- School of Stomatology, Qingdao University, Qingdao 266003, China
- Dept. of Pediatric Dentistry, Center of Stomatology, Municipal Hospital, Qingdao 266071, China
6
Bai H, He LY, Gao FZ, Wu DL, Yao KS, Zhang M, Jia WL, He LX, Zou HY, Yao MS, Ying GG. Airborne antibiotic resistome and human health risk in railway stations during COVID-19 pandemic. Environment International 2023; 172:107784. [PMID: 36731187 PMCID: PMC9884615 DOI: 10.1016/j.envint.2023.107784] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/10/2022] [Revised: 12/22/2022] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
Antimicrobial resistance is recognized as one of the greatest public health concerns. It is becoming an increasing threat during the COVID-19 pandemic due to increasing usage of antimicrobials, such as antibiotics and disinfectants, in healthcare facilities or public spaces. To explore the characteristics of the airborne antibiotic resistome in public transport systems, we assessed the distribution and health risks of the airborne antibiotic resistome and microbiome in railway stations before and after the pandemic outbreak by culture-independent and culture-dependent metagenomic analysis. Results showed that the diversity of airborne antibiotic resistance genes (ARGs) decreased following the pandemic, while the relative abundance of core ARGs increased. A total of 159 horizontally acquired ARGs, predominantly conferring resistance to macrolides and aminoglycosides, were identified in the airborne bacteria and dust samples. Meanwhile, the abundance of horizontally acquired ARGs hosted by pathogens increased during the pandemic. A bloom of bacteria resistant to clinically important antibiotics (tigecycline and meropenem) was found following the pandemic outbreak. A total of 251 high-quality metagenome-assembled genomes (MAGs) were recovered from 27 metagenomes, and 86 genera and 125 species were classified. The relative abundance of ARG-carrying MAGs, taxonomically assigned to the genera Bacillus, Pseudomonas, Acinetobacter, and Staphylococcus, was found to have increased during the pandemic. Bayesian source tracking estimated that human skin and anthropogenic activities were presumptive resistome sources for the public transit air. Moreover, risk assessment based on resistome and microbiome data revealed elevated airborne health risks during the pandemic.
Affiliation(s)
- Hong Bai
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Liang-Ying He
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China.
- Fang-Zhou Gao
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Dai-Ling Wu
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China; Aquatic Ecology and Water Quality Management group, Wageningen University, P.O. Box 47, 6700 AA Wageningen, the Netherlands
- Kai-Sheng Yao
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China; Aquatic Ecology and Water Quality Management group, Wageningen University, P.O. Box 47, 6700 AA Wageningen, the Netherlands
- Min Zhang
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Wei-Li Jia
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Lu-Xi He
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Hai-Yan Zou
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Mao-Sheng Yao
- State Key Joint Laboratory of Environmental Simulation and Pollution Control, College of Environmental Sciences and Engineering, Peking University, Beijing 100871, China
- Guang-Guo Ying
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China.
7
Abbasi BUD, Fatima I, Mukhtar H, Khan S, Alhumam A, Ahmad HF. Autonomous schema markups based on intelligent computing for search engine optimization. PeerJ Comput Sci 2022; 8:e1163. [PMID: 36532807 PMCID: PMC9748814 DOI: 10.7717/peerj-cs.1163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 10/26/2022] [Indexed: 06/17/2023]
Abstract
With advances in artificial intelligence and semantic technology, search engines are integrating semantics to address complex search queries and improve the results. This requires identification of well-known concepts or entities and their relationships from web page contents. But the increase in complex unstructured data on web pages has made the task of concept identification overly complex. Existing research focuses on entity recognition from the perspective of linguistic structures such as complete sentences and paragraphs, whereas a huge part of the data on web pages exists as unstructured text fragments enclosed in HTML tags. Ontologies provide schemas to structure the data on the web. However, including them in web pages requires additional resources and expertise from organizations or webmasters, which is a major hindrance to their large-scale adoption. We propose an approach for autonomous identification of entities from short text present in web pages to populate semantic models based on a specific ontology model. The proposed approach has been applied to a public dataset containing academic web pages. We employ a long short-term memory (LSTM) deep learning network and the random forest machine learning algorithm to predict entities. The proposed methodology gives an overall accuracy of 0.94 on the test dataset, indicating a potential for automated prediction even in the case of a limited number of training samples for various entities, thus significantly reducing the required manual workload in practical applications.
Affiliation(s)
- Iram Fatima
- Schema App-Hunch Manifest Inc, Guelph, Canada
- Hamid Mukhtar
- Department of Computer Science, College of Engineering and Physical Sciences (EPS), University of Birmingham Dubai, Dubai, United Arab Emirates
- Sharifullah Khan
- PAF-Institute of Applied Sciences and Technology, Haripur, Pakistan
- Abdulaziz Alhumam
- Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Al-Ahsa, Saudi Arabia
- Hafiz Farooq Ahmad
- Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Al-Ahsa, Saudi Arabia
8
Jing X, Gong Y, Pan H, Meng Y, Ren Y, Diao Z, Mu R, Xu T, Zhang J, Ji Y, Li Y, Wang C, Qu L, Cui L, Ma B, Xu J. Single-cell Raman-activated sorting and cultivation (scRACS-Culture) for assessing and mining in situ phosphate-solubilizing microbes from nature. ISME Communications 2022; 2:106. [PMID: 37938284 PMCID: PMC9723661 DOI: 10.1038/s43705-022-00188-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 10/05/2022] [Accepted: 10/06/2022] [Indexed: 01/25/2023]
Abstract
Due to the challenges in detecting in situ activity and cultivating the not-yet-cultured, functional assessment and mining of living microbes from nature has typically followed a 'culture-first' paradigm. Here, employing phosphate-solubilizing microbes (PSM) as a model, we introduce a 'screen-first' strategy that is underpinned by a precisely one-cell-resolution, complete workflow of single-cell Raman-activated Sorting and Cultivation (scRACS-Culture). Directly from domestic sewage, individual cells were screened for in situ organic-phosphate-solubilizing activity via D2O intake rate, sorted by function via Raman-activated Gravity-driven Encapsulation (RAGE), and then cultivated from precisely one cell. By scRACS-Culture, pure cultures of strong organic PSM, including Comamonas spp., Acinetobacter spp., Enterobacter spp. and Citrobacter spp., were derived, whose phosphate-solubilizing activities in situ are 90-200% higher than in pure culture, underscoring the importance of the 'screen-first' strategy. Moreover, employing scRACS-Seq for post-RACS cells that remain uncultured, we discovered a previously unknown, low-abundance, strong organic PSM of Cutibacterium spp. that employs secretory metallophosphoesterase (MPP), cell-wall-anchored 5'-nucleotidase (encoded by ushA) and a periplasmic-membrane-located PstSCAB-PhoU transporter system for efficient solubilization and scavenging of extracellular phosphate in sewage. Therefore, scRACS-Culture and scRACS-Seq provide an in situ function-based, 'screen-first' approach for assessing and mining microbes directly from the environment.
Affiliation(s)
- Xiaoyan Jing
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
- Yanhai Gong
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
- Huihui Pan
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
- Yu Meng
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Yishang Ren
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Zhidian Diao
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Runzhi Mu
- Qingdao Zhang Cun River Water Co., Ltd, Qingdao, Shandong, China
| | - Teng Xu
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Jia Zhang
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Yuetong Ji
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
- Qingdao Single-Cell Biotechnology Co., Ltd, Qingdao, Shandong, China
| | - Yuandong Li
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Chen Wang
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Energy Institute, Qingdao, Shandong, China
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China
| | - Lingyun Qu
- The First Institute of Oceanography, Ministry of Natural Resources, Qingdao, Shandong, China
| | - Li Cui
- Key Laboratory of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, Fujian, China
| | - Bo Ma
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Shandong Energy Institute, Qingdao, Shandong, China.
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China.
| | - Jian Xu
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Shandong Energy Institute, Qingdao, Shandong, China.
- Qingdao New Energy Shandong Laboratory, Qingdao, Shandong, China.
| |
|
9
|
Chen Y, Su X. Search-based health status detection and disease classification using species-level profiles of metagenomes. Medicine in Microecology 2022. [DOI: 10.1016/j.medmic.2021.100048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
|
10
|
Chen Y, Li J, Zhang Y, Zhang M, Sun Z, Jing G, Huang S, Su X. Parallel-Meta Suite: Interactive and rapid microbiome data analysis on multiple platforms. iMeta 2022; 1:e1. [PMID: 38867729 PMCID: PMC10989749 DOI: 10.1002/imt2.1] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/30/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021] [Indexed: 06/14/2024]
Abstract
Massive microbiome sequencing data have been generated, elucidating associations between microbes and their environmental phenotypes such as host health or ecosystem status. Outstanding bioinformatic tools are the basis for deciphering the biological information hidden in microbiome data. However, most approaches are difficult for nonprofessional users to access. In addition, computing throughput has become a significant bottleneck for many analytical pipelines when processing large-scale datasets. In this study, we introduce Parallel-Meta Suite (PMS), an interactive software package for fast and comprehensive microbiome data analysis, visualization, and interpretation. It covers a wide array of functions for data preprocessing, statistics, and visualization using state-of-the-art algorithms in a user-friendly graphical interface that is accessible to diverse users. To meet rapidly increasing computational demands, the entire procedure of PMS has been optimized by a parallel computing scheme, enabling the rapid processing of thousands of samples. PMS is compatible with multiple platforms, and an installer has been integrated for fully automatic installation.
Affiliation(s)
- Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Jian Li
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Yufeng Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Mingqian Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Zheng Sun
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Gongchao Jing
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Shi Huang
- Faculty of Dentistry, The University of Hong Kong, Hong Kong, Hong Kong SAR, China
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| |
|
11
|
Sun Z, Liu X, Jing G, Chen Y, Jiang S, Zhang M, Liu J, Xu J, Su X. Comprehensive understanding to the public health risk of environmental microbes via a microbiome-based index. J Genet Genomics 2022; 49:685-688. [PMID: 35017120 DOI: 10.1016/j.jgg.2021.12.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/29/2021] [Revised: 12/20/2021] [Accepted: 12/20/2021] [Indexed: 12/29/2022]
Affiliation(s)
- Zheng Sun
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Science, Qingdao 266101, China
| | - Xudong Liu
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Science, Qingdao 266101, China
| | - Gongchao Jing
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Science, Qingdao 266101, China
| | - Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
| | - Shuaiming Jiang
- Department of Endocrinology, Hainan General Hospital, School of Food Science and Engineering, Hainan University, Haikou 570228, China
| | - Meng Zhang
- Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Huhehot 010018, China
| | - Jiquan Liu
- Procter & Gamble Singapore Innovation Center, 138589, Singapore
| | - Jian Xu
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Science, Qingdao 266101, China
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China.
| |
|
12
|
Jing G, Zhang Y, Liu L, Wang Z, Sun Z, Knight R, Su X, Xu J. A Scale-Free, Fully Connected Global Transition Network Underlies Known Microbiome Diversity. mSystems 2021; 6:e0039421. [PMID: 34254819 PMCID: PMC8407412 DOI: 10.1128/msystems.00394-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/31/2021] [Accepted: 06/24/2021] [Indexed: 12/14/2022] Open
Abstract
Microbiomes are inherently linked by their structural similarity, yet the global features of such similarity are not clear. Here, we propose as a solution a search-based microbiome transition network. By traversing a composition-similarity-based network of 177,022 microbiomes, we show that although the compositions are distinct by habitat, each microbiome is on average only seven neighbors from any other microbiome on Earth, indicating the inherent homology of microbiomes at the global scale. This network is scale-free, suggesting a high degree of stability and robustness in microbiome transition. By tracking the minimum spanning tree in this network, a global roadmap of microbiome dispersal was derived that tracks the potential paths of formulating and propagating microbiome diversity. Such search-based global microbiome networks, reconstructed within hours on just one computing node, provide a readily expanded reference for tracing the origin and evolution of existing or new microbiomes. IMPORTANCE It remains unclear whether and how compositional changes at the "community to community" level among microbiomes are linked to the origin and evolution of global microbiome diversity. Here we propose a microbiome transition model and a network-based analysis framework to describe and simulate the variation and dispersal of the global microbial beta-diversity across multiple habitats. The traversal of a transition network with 177,022 samples shows the inherent homology of microbiomes at the global scale. Then a global roadmap of microbiome dispersal derived from the network tracks the potential paths of formulating and propagating microbiome diversity. Such a search-based microbiome network provides a readily expanded reference for tracing the origin and evolution of existing or new microbiomes at the global scale.
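As a rough illustration of the two graph operations this abstract mentions (counting how many neighbor hops separate two microbiomes, and extracting a minimum spanning tree as a dispersal "roadmap"), the following is a minimal Python sketch. The toy similarity values, the 0.5 edge threshold, the use of 1 - similarity as edge weight, and all function names are assumptions made for illustration; the paper's own similarity search and network construction are not reproduced here.

```python
from collections import deque
import heapq

# Hypothetical pairwise similarities (0..1) between five toy samples.
SIM = {
    ("gut_a", "gut_b"): 0.9,
    ("gut_b", "oral_a"): 0.6,
    ("oral_a", "soil_a"): 0.55,
    ("soil_a", "marine_a"): 0.7,
    ("gut_a", "marine_a"): 0.1,
}

def build_graph(sim, threshold=0.5):
    """Keep an undirected edge when similarity >= threshold (assumed cutoff)."""
    graph = {}
    for (u, v), s in sim.items():
        graph.setdefault(u, {})
        graph.setdefault(v, {})
        if s >= threshold:
            # Edge weight is an assumed distance: 1 - similarity.
            graph[u][v] = graph[v][u] = 1.0 - s
    return graph

def hops(graph, src, dst):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == dst:
            return depth
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

def minimum_spanning_tree(graph, root):
    """Prim's algorithm over the connected component that contains root."""
    visited, tree = {root}, []
    heap = [(w, root, v) for v, w in graph[root].items()]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree

graph = build_graph(SIM)
print("hops gut_a -> marine_a:", hops(graph, "gut_a", "marine_a"))
for u, v, w in minimum_spanning_tree(graph, "gut_a"):
    print(f"{u} -- {v} (distance {w:.2f})")
```

In this toy network the direct gut-to-marine edge falls below the threshold, so the two samples are linked only through intermediate habitats, which is the kind of path the hop count and the spanning tree are meant to expose.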
Affiliation(s)
- Gongchao Jing
- Single-Cell Center, CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yufeng Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Lu Liu
- Single-Cell Center, CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zengbin Wang
- Single-Cell Center, CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zheng Sun
- Single-Cell Center, CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Rob Knight
- University of California, San Diego, California, USA
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
| | - Jian Xu
- Single-Cell Center, CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
13
|
Abstract
Quantitative comparison among microbiomes can link microbial beta-diversity to environmental features, thus enabling prediction of ecosystem properties or dissection of host-microbiome interaction. However, to compute beta-diversity, current methods mainly employ the entire community profiles of taxa or functions, which can miss the subtle differences caused by low-abundance community members that may play crucial roles in the properties of interest. In this work, I review the distance metrics and search engines that we developed to match microbiomes at a large scale based on whole-community-level similarities, as well as their limitations in tackling the microbiome changes caused by less abundant community features. Then I propose the concept of microbiome "local alignment," including an algorithm to measure microbiome similarity on specific fractions of biodiversity and an indexing strategy for rapidly fetching microbiome local-alignment matches from the data repository.
Affiliation(s)
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
| |
|
15
|
Zhang Y, Huang S, Jia S, Sun Z, Li S, Li F, Zhang L, Lu J, Tan K, Teng F, Yang F. The predictive power of saliva electrolytes exceeds that of saliva microbiomes in diagnosing early childhood caries. J Oral Microbiol 2021; 13:1921486. [PMID: 34035879 PMCID: PMC8131007 DOI: 10.1080/20002297.2021.1921486] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022] Open
Abstract
Early childhood caries (ECC) is one of the most prevalent chronic diseases affecting children worldwide, and thus its etiology, diagnosis, and prognosis are of particular clinical significance. This study aims to test the ability of the salivary microbiome and electrolytes to diagnose ECC, and to examine their interplay within the same population. We simultaneously profiled the salivary microbiome and biochemical components of 331 children (166 caries-free (H group) and 165 caries-active (C group)) aged 4-6 years. We identified both salivary microbial and biochemical dysbiosis associated with ECC. Remarkably, K⁺, Cl⁻, NH₄⁺, Na⁺, SO₄²⁻, Ca²⁺, Mg²⁺, and Br⁻ were enriched while pH and NO₃⁻ were depleted in ECC. Moreover, the dmft index (ECC severity) correlated positively with Cl⁻, NH₄⁺, Ca²⁺, Mg²⁺, and Br⁻, and negatively with pH and NO₃⁻. Furthermore, machine-learning classification models were constructed based on these biomarkers from the saliva microbiota or from electrolytes (and pH). Unexpectedly, the electrolyte-based classifier (AUROC = 0.94) outperformed both the microbiome-based one (AUROC = 0.70) and the composite one (with both microbial and biochemical data; AUC = 0.89) in predicting ECC. Collectively, these findings indicate ECC-associated alterations and interplay in the oral microbiota, electrolytes and pH, underscoring the necessity of developing diagnostic models with predictors from salivary electrolytes.
Affiliation(s)
- Ying Zhang
- School of Stomatology, Qingdao University, Qingdao, Shandong, China
| | - Shi Huang
- Centre of Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, California, 92093, USA; UCSD Health Department of Pediatrics, University of California, San Diego, La Jolla, California, 92093, USA
| | - Songbo Jia
- Department of Stomatology, Tianjin Children's Hospital, Tianjin, 300400 China
| | - Zheng Sun
- Single-Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Shanshan Li
- School of Stomatology, Qingdao University, Qingdao, Shandong, China
| | - Fan Li
- School of Stomatology, Qingdao University, Qingdao, Shandong, China; Stomatology Centre, Qingdao Municipal Hospital, Qingdao, Shandong, 266071 China
| | - Lijuan Zhang
- Department of Stomatology, Women & Children's Health Care Hospital of Linyi, Linyi, Shandong, 276000 China
| | - Jie Lu
- Stomatology Centre, Qingdao Municipal Hospital, Qingdao, Shandong, 266071 China
| | - Kaixuan Tan
- Stomatology Centre, Qingdao Municipal Hospital, Qingdao, Shandong, 266071 China
| | - Fei Teng
- Single-Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Fang Yang
- School of Stomatology, Qingdao University, Qingdao, Shandong, China
| |
|
16
|
Wu S, Chen Y, Li Z, Li J, Zhao F, Su X. Towards multi-label classification: Next step of machine learning for microbiome research. Comput Struct Biotechnol J 2021; 19:2742-2749. [PMID: 34093989 PMCID: PMC8131981 DOI: 10.1016/j.csbj.2021.04.054] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/24/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022] Open
Abstract
Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, so as to determine the status of new samples. To clearly understand the host-microbe interaction of specific diseases, previous studies have focused on well-designed cohorts in which each sample was labeled with exactly one status type. In fact, however, an individual may be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches of single-label classification for microbiome research, and demonstrate their limitations in multi-label disease detection using a real dataset. We then look ahead to a further step of ML towards multi-label classification that could potentially solve this problem, including a series of promising strategies and key technical issues for applying multi-label classification in microbiome-based studies.
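To make the single-label versus multi-label distinction concrete, here is a minimal Python sketch of binary relevance, one common multi-label strategy in which an independent yes/no model is trained per disease label so that a sample can receive zero, one, or several labels. Binary relevance and the nearest-centroid stand-in classifier are choices made for this illustration, not necessarily the strategies the review discusses; the abundance profiles, label names, and function names are all invented.

```python
# Toy relative-abundance profiles (three taxa per sample); all values invented.
TRAIN = [
    ([0.70, 0.20, 0.10], {"disease_A"}),
    ([0.60, 0.30, 0.10], {"disease_A"}),
    ([0.10, 0.20, 0.70], {"disease_B"}),
    ([0.20, 0.10, 0.70], {"disease_B"}),
    ([0.45, 0.10, 0.45], {"disease_A", "disease_B"}),  # comorbid sample
    ([0.30, 0.40, 0.30], set()),                       # healthy control
    ([0.25, 0.50, 0.25], set()),                       # healthy control
]
LABELS = ["disease_A", "disease_B"]

def centroid(rows):
    """Column-wise mean of a list of equal-length profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_binary_relevance(train, labels):
    """Train one independent nearest-centroid model per label."""
    models = {}
    for label in labels:
        pos = [x for x, ys in train if label in ys]
        neg = [x for x, ys in train if label not in ys]
        models[label] = (centroid(pos), centroid(neg))
    return models

def predict(models, x):
    """Return every label whose positive centroid is closer than its negative one."""
    return {
        label
        for label, (pos, neg) in models.items()
        if sq_dist(x, pos) < sq_dist(x, neg)
    }

models = fit_binary_relevance(TRAIN, LABELS)
for sample in ([0.65, 0.25, 0.10], [0.50, 0.05, 0.45], [0.28, 0.45, 0.27]):
    print(sample, "->", sorted(predict(models, sample)))
```

A single-label classifier would be forced to pick exactly one status for the second sample; with per-label models it can be flagged for both conditions, which is the comorbidity case the abstract says regular models can miss.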
Affiliation(s)
- Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Zhiruo Li
- School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong 266071, China
| | - Jian Li
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Fengyang Zhao
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| |
|