1
Jiang X, Yan J, Huang H, Ai L, Yu X, Zhong P, Chen Y, Liang Z, Qiu W, Huang H, Yan W, Liang Y, Chen P, Wang R. Development of novel parameters for pathogen identification in clinical metagenomic next-generation sequencing. Front Genet 2023; 14:1266990. [PMID: 38046047 PMCID: PMC10693447 DOI: 10.3389/fgene.2023.1266990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 10/27/2023] [Indexed: 12/05/2023]
Abstract
Introduction: Metagenomic next-generation sequencing (mNGS) has emerged as a powerful tool for rapid pathogen identification in clinical practice. However, the parameters used to interpret mNGS data, such as read count, genus rank, and coverage, lack explicit performance evaluation. In this study, established indicators and novel parameters were assessed for their performance in pathogen detection. Methods: We developed several new parameters: 10M normalized reads, double-discard reads, Genus Rank Ratio, King Genus Rank Ratio, Genus Rank Ratio*Genus Rank, and King Genus Rank Ratio*Genus Rank. These parameters, together with frequently used read indicators (raw reads, reads per million mapped reads (RPM), transcripts per kilobase per million mapped reads (TPM), Genus Rank, and coverage), were analyzed for their diagnostic efficiency in bronchoalveolar lavage fluid (BALF), a common specimen for detecting eight pathogens: seven bacteria (Acinetobacter baumannii, Klebsiella pneumoniae, Streptococcus pneumoniae, Staphylococcus aureus, Haemophilus influenzae, Stenotrophomonas maltophilia, and Pseudomonas aeruginosa) and the fungus Aspergillus fumigatus. Results: The indicators showed good diagnostic efficacy for the eight pathogens. Nearly all AUC values exceeded 0.9, and the corresponding sensitivity and specificity values were nearly all above 0.8, except for coverage. The negative predictive value of every indicator exceeded 0.9. Double-discard reads, Genus Rank Ratio*Genus Rank, and King Genus Rank Ratio*Genus Rank showed better diagnostic efficiency than raw reads, RPM, TPM, and Genus Rank. These parameters can serve as a reference for interpreting mNGS data from BALF. Moreover, precision filters integrating the novel parameters were built through machine learning to detect the eight pathogens in BALF samples.
Summary: In this study, we developed a set of novel parameters for pathogen identification in clinical mNGS based on read counts and genus ranking. Several of these parameters were more effective for diagnosing pathogens than traditional read-based indicators. The findings can help improve the interpretation of clinical mNGS reports, particularly for BALF.
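The read normalization and diagnostic-accuracy measures named above can be illustrated with a minimal Python sketch. RPM follows its usual definition (reads scaled to one million mapped reads); `per_10m` is only an assumed reading of "10M normalized reads" (scaling to ten million reads), and all function names and example counts are hypothetical. The abstract does not define double-discard reads or the rank-based parameters (Genus Rank Ratio and its variants), so they are not reproduced here. Sensitivity, specificity, and negative predictive value are computed at a single threshold, and AUC uses the Mann-Whitney formulation.

```python
import numpy as np


def rpm(pathogen_reads, total_mapped_reads):
    """Reads per million mapped reads."""
    return pathogen_reads * 1e6 / total_mapped_reads


def per_10m(pathogen_reads, total_reads):
    """Assumed definition of '10M normalized reads': counts scaled to 1e7 reads."""
    return pathogen_reads * 1e7 / total_reads


def diagnostic_metrics(scores, truth, threshold):
    """Sensitivity, specificity and NPV when samples with score >= threshold are called positive."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    called = scores >= threshold
    tp = int(np.sum(called & truth))
    fp = int(np.sum(called & ~truth))
    tn = int(np.sum(~called & ~truth))
    fn = int(np.sum(~called & truth))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


def auc(scores, truth):
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    pos, neg = scores[truth], scores[~truth]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float(greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical per-sample read counts for one pathogen and reference-standard labels.
scores = [120, 15, 300, 0, 8, 2]
truth = [1, 0, 1, 0, 1, 0]
print(diagnostic_metrics(scores, truth, threshold=10))
print(auc(scores, truth))
```

Sweeping `threshold` across the observed scores traces the ROC curve whose area `auc` reports; a per-pathogen cutoff would be picked from that sweep.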
Affiliation(s)
- Xiwen Jiang
- College of Biological Science and Engineering, Fuzhou University, Fuzhou, China
- Jinghai Yan
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Hao Huang
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Lu Ai
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Xuegao Yu
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Pengqiang Zhong
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yili Chen
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Zhikun Liang
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
- Wancen Qiu
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
- Huiying Huang
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
- Wenyan Yan
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
- Yan Liang
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
- Peisong Chen
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Ruizhi Wang
- Department of Laboratory Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
2
Lehmann R, Machné R, Georg J, Benary M, Axmann I, Steuer R. How cyanobacteria pose new problems to old methods: challenges in microarray time series analysis. BMC Bioinformatics 2013; 14:133. [PMID: 23601192 PMCID: PMC3679775 DOI: 10.1186/1471-2105-14-133] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 06/13/2012] [Accepted: 03/18/2013] [Indexed: 11/24/2022]
Abstract
Background The transcriptomes of several cyanobacterial strains have been shown to exhibit diurnal oscillation patterns, reflecting the diurnal phototrophic lifestyle of the organisms. The analysis of such genome-wide transcriptional oscillations is often facilitated by clustering algorithms in conjunction with a number of pre-processing steps. Biological interpretation is usually focussed on the time and phase of expression of the resulting groups of genes. However, microarray studies of this kind require the data to be normalized during pre-processing, and the qualitative and quantitative impact of normalization on the derived number of oscillating transcripts and their phases is unclear. Results A microarray-based evaluation of diurnal expression in the cyanobacterium Synechocystis sp. PCC 6803 is presented. As expected, the temporal expression patterns reveal strong oscillations in transcript abundance. We compare the Fourier transformation-based expression phase before and after the application of quantile normalization, median polishing, cyclic LOESS, and least oscillating set (LOS) normalization. Whereas LOS normalization mostly preserves the phases of the raw data, the remaining methods introduce systematic biases. In particular, quantile normalization is found to introduce a phase shift of 180°, effectively turning night-expressed genes into day-expressed ones. Comparison of a large number of clustering results from differently normalized data shows that the normalization method determines the result. Subsequent steps, such as the choice of data transformation, similarity measure, and clustering algorithm, play only minor roles. We find that standardization and the DFT transformation are favorable for clustering time series, in contrast to the 12 m transformation.
We use cluster-wise functional enrichment of a clustering obtained with LOS normalization, flowClust clustering, and the DFT transformation to derive the diurnal biological program of Synechocystis sp. Conclusion Applying quantile normalization, median polishing, or cyclic LOESS normalization to the presented cyanobacterial dataset leads to increased numbers of oscillating genes and a systematic shift of the expression phase. LOS normalization minimizes these detrimental effects. Because previous analyses employed a variety of normalization methods, direct comparisons of their results must be treated with caution.
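The Fourier-based expression phase, and why a 180° shift swaps day- and night-expressed genes, can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline: it assumes evenly spaced samples spanning a whole number of 24 h periods and evaluates a single DFT coefficient at the diurnal frequency; the function name `diurnal_phase` and the example profile are hypothetical.

```python
import numpy as np


def diurnal_phase(expression, sampling_interval_h, period_h=24.0):
    """Amplitude and peak time (hours) of the period_h component of an evenly sampled profile.

    Assumes the samples span a whole number of periods, so the single DFT
    coefficient at 1/period_h is not contaminated by other frequencies.
    """
    x = np.asarray(expression, dtype=float)
    x = x - x.mean()  # drop the constant (zero-frequency) term
    n = len(x)
    t = np.arange(n) * sampling_interval_h
    # DFT coefficient at the diurnal frequency. For x(t) = A*cos(2*pi*t/T - phi)
    # it equals (n*A/2) * exp(-1j*phi).
    coeff = np.sum(x * np.exp(-2j * np.pi * t / period_h))
    amplitude = 2.0 * np.abs(coeff) / n
    phi = -np.angle(coeff)
    peak_time = (phi / (2.0 * np.pi) * period_h) % period_h
    return amplitude, peak_time


# A day-expressed profile sampled every 2 h for 48 h, peaking at hour 6.
t = np.arange(24) * 2.0
day_gene = 10 + 3 * np.cos(2 * np.pi * t / 24 - np.pi / 2)
print(diurnal_phase(day_gene, 2.0))       # peak near hour 6
print(diurnal_phase(20 - day_gene, 2.0))  # mirrored profile: peak near hour 18
```

Mirroring a profile about its mean moves its estimated peak by 12 h, i.e. a 180° phase shift, which is the kind of day-to-night reversal described above.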
Affiliation(s)
- Robert Lehmann
- Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, D-10115 Berlin, Germany.
3
Rosa BA, Jiao Y, Oh S, Montgomery BL, Qin W, Chen J. Frequency-based time-series gene expression recomposition using PRIISM. BMC Systems Biology 2012; 6:69. [PMID: 22703599 PMCID: PMC3464900 DOI: 10.1186/1752-0509-6-69] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/31/2011] [Accepted: 06/15/2012] [Indexed: 11/30/2022]
Abstract
Background Circadian rhythm pathways influence the expression patterns of as much as 31% of the Arabidopsis genome through complicated interaction pathways, and they are significantly disrupted by biotic and abiotic stress treatments. This complicates treatment-response gene discovery, because clock pattern mismatches distort fold change-based statistics. The PRIISM (Pattern Recomposition for the Isolation of Independent Signals in Microarray data) algorithm outlined in this paper is designed to separate pattern changes induced by different forces, including treatment-response pathways and circadian clock rhythm disruptions. Results Using the Fourier transform, high-resolution time-series microarray data are projected onto the frequency domain. By identifying the clock frequency range from the core circadian clock genes, we separate the frequency spectrum into sections containing treatment-frequency (representing up- or down-regulation by an adaptive treatment response), clock-frequency (representing the circadian clock-disruption response), and noise-frequency components. We then project each component's spectrum back to the expression domain to reconstruct isolated, independent gene expression patterns representing the effects of the different influences. By applying PRIISM to a high-resolution time-series Arabidopsis microarray dataset under cold treatment, we systematically evaluated the method using maximum fold change and principal component analyses. The ranked treatment-frequency fold change results produced fewer false positives than the original methodology, and the 26-hour time point in our dataset provided the best statistic for distinguishing the largest number of known cold-response genes. In addition, six novel cold-response genes were discovered. PRIISM also yields gene expression data that represent only circadian clock influences, which may be useful for circadian clock studies.
Conclusion PRIISM is a novel approach for overcoming the problem of circadian disruptions caused by stress treatments in plants. PRIISM can be integrated with any existing analysis approach for gene expression data to separate circadian-influenced changes in gene expression, and it can be extended to any organism with regular oscillations in gene expression across a large portion of the genome.
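The PRIISM decomposition described above (forward Fourier transform, partition of the spectrum, inverse transform of each part) can be sketched with NumPy's real FFT. Assumptions: the clock band is supplied directly as a pair of frequencies in cycles per hour rather than estimated from core clock genes; frequencies below the band are treated as the treatment component and those above it as noise; the function name `split_by_frequency` and the example profile are hypothetical. Because the three masks partition the spectrum, the components sum back to the original series.

```python
import numpy as np


def split_by_frequency(expression, sampling_interval_h, clock_band):
    """Split an evenly sampled series into treatment, clock and noise components.

    clock_band is a (low, high) pair in cycles per hour; frequencies below it are
    assigned to the treatment component and frequencies above it to noise.
    """
    x = np.asarray(expression, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=sampling_interval_h)  # cycles per hour
    low, high = clock_band
    masks = {
        "treatment": freqs < low,
        "clock": (freqs >= low) & (freqs <= high),
        "noise": freqs > high,
    }
    # Zero every coefficient outside a band, then project back to the time domain.
    return {
        name: np.fft.irfft(np.where(mask, spectrum, 0), n=n)
        for name, mask in masks.items()
    }


# Hypothetical hourly profile over 48 h: baseline, 24 h rhythm and a fast 4 h wiggle.
t = np.arange(48.0)
profile = 5 + 2 * np.cos(2 * np.pi * t / 24) + 0.3 * np.cos(2 * np.pi * t / 4)
parts = split_by_frequency(profile, 1.0, clock_band=(1 / 30, 1 / 20))
print(np.allclose(parts["clock"], 2 * np.cos(2 * np.pi * t / 24)))
```

Fold changes computed on `parts["treatment"]` instead of the raw profile would then be free of the clock component, which is the motivation the abstract gives for the recomposition.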
Affiliation(s)
- Bruce A Rosa
- Department of Biology, Lakehead University, ON, Canada