1
Wattam AR, Bowers N, Brettin T, Conrad N, Cucinell C, Davis JJ, Dickerman AW, Dietrich EM, Kenyon RW, Machi D, Mao C, Nguyen M, Olson RD, Overbeek R, Parrello B, Pusch GD, Shukla M, Stevens RL, Vonstein V, Warren AS. Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance. Methods Mol Biol 2024; 2802:547-571. [PMID: 38819571 DOI: 10.1007/978-1-0716-3838-5_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/ . The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource.
Affiliation(s)
- Alice R Wattam
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Nicole Bowers
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Thomas Brettin
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Neal Conrad
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Clark Cucinell
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- James J Davis
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Allan W Dickerman
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Emily M Dietrich
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Ronald W Kenyon
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Dustin Machi
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Chunhong Mao
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Marcus Nguyen
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Robert D Olson
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Ross Overbeek
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Bruce Parrello
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Gordon D Pusch
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Maulik Shukla
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
- Rick L Stevens
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Andrew S Warren
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
2
Chen H, Zhu Z, Qiu Y, Ge X, Zheng H, Peng Y. Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms. Virol Sin 2022; 37:437-444. [PMID: 35513273 PMCID: PMC9060714 DOI: 10.1016/j.virs.2022.04.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/01/2021] [Accepted: 04/02/2022] [Indexed: 12/05/2022]
Abstract
The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, there is still a lack of effective tools for determining the cleavage sites of the 3CL protease. This study systematically investigated the diversity of the cleavage sites of the coronavirus 3CL protease on the viral polyprotein, and found that the cleavage motifs were highly conserved for viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease based on the representation of residues in cleavage motifs by amino acid indexes, and the model achieved an AUC of 0.96 in cross-validation. The RF model was further tested on an independent test dataset composed of cleavage sites on 99 proteins from multiple coronavirus hosts. It achieved an AUC of 0.95 and correctly predicted 80% of the cleavage sites. The RF model then predicted that 1,352 human proteins would be cleaved by the 3CL protease. These proteins were enriched in several GO terms related to the cytoskeleton, such as the microtubule, actin and tubulin. Finally, a webserver named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides an effective tool for identifying cleavage sites of the 3CL protease and provides insights into the molecular mechanism underlying the pathogenicity of coronaviruses.
Affiliation(s)
- Huiting Chen
  - Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China
- Zhaozhong Zhu
  - Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China
- Ye Qiu
  - Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China
- Xingyi Ge
  - Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China
- Heping Zheng
  - Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China
- Yousong Peng
  - Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China