Dort EN, Layne E, Feau N, Butyaev A, Henrissat B, Martin FM, Haridas S, Salamov A, Grigoriev IV, Blanchette M, Hamelin RC. Large-scale genomic analyses with machine learning uncover predictive patterns associated with fungal phytopathogenic lifestyles and traits. Sci Rep 2023; 13:17203. [PMID: 37821494 PMCID: PMC10567782 DOI: 10.1038/s41598-023-44005-w] [Received: 04/04/2023] [Accepted: 10/03/2023] [Indexed: 10/13/2023]
Abstract
Invasive plant pathogenic fungi have a global impact, with devastating economic and environmental effects on crops and forests. Biosurveillance, a critical component of threat mitigation, requires risk prediction based on fungal lifestyles and traits. Recent studies have revealed distinct genomic patterns associated with specific groups of plant pathogenic fungi. We sought to establish whether these phytopathogenic genomic patterns hold across diverse taxonomic and ecological groups from the Ascomycota and Basidiomycota and, furthermore, whether those patterns can be used in a predictive capacity for biosurveillance. Using a supervised machine learning approach that integrates phylogenetic and genomic data, we analyzed 387 fungal genomes to test a proof-of-concept for the use of genomic signatures in predicting fungal phytopathogenic lifestyles and traits during biosurveillance activities. Our machine learning feature sets were derived from genome annotation data of carbohydrate-active enzymes (CAZymes), peptidases, secondary metabolite clusters (SMCs), transporters, and transcription factors. We found that machine learning could successfully predict fungal lifestyles and traits across taxonomic groups, with the best predictive performance coming from feature sets comprising CAZyme, peptidase, and SMC data. While phylogeny was an important component in most predictions, the inclusion of genomic data improved prediction performance for every lifestyle and trait tested. Plant pathogenicity was one of the best-predicted traits, showing the promise of predictive genomics for biosurveillance applications. Furthermore, our machine learning approach revealed expansions in the number of genes from specific CAZyme and peptidase families in the genomes of plant pathogens compared to non-phytopathogenic genomes (saprotrophs, endo- and ectomycorrhizal fungi). Such genomic feature profiles give insight into the evolution of fungal phytopathogenicity and could be useful to predict the risks of unknown fungi in future biosurveillance activities.
Affiliation(s)
- E N Dort
  - Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC, Canada
- E Layne
  - School of Computer Science, McGill University, Montreal, QC, Canada
- N Feau
  - Pacific Forestry Centre, Canadian Forest Service, Natural Resources Canada, Victoria, BC, Canada
- A Butyaev
  - School of Computer Science, McGill University, Montreal, QC, Canada
- B Henrissat
  - Department of Biotechnology and Biomedicine (DTU Bioengineering), Technical University of Denmark, 2800, Kgs. Lyngby, Denmark
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- F M Martin
  - Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Unité Mixte de Recherche Interactions Arbres/Microorganismes, Centre INRAE, Grand Est-Nancy, Université de Lorraine, Champenoux, France
- S Haridas
  - Lawrence Berkeley National Laboratory, U.S. Department of Energy Joint Genome Institute, Berkeley, CA, USA
- A Salamov
  - Lawrence Berkeley National Laboratory, U.S. Department of Energy Joint Genome Institute, Berkeley, CA, USA
- I V Grigoriev
  - Lawrence Berkeley National Laboratory, U.S. Department of Energy Joint Genome Institute, Berkeley, CA, USA
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
- M Blanchette
  - School of Computer Science, McGill University, Montreal, QC, Canada
- R C Hamelin
  - Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Département des Sciences du bois et de la Forêt, Faculté de Foresterie et Géographie, Université Laval, Québec, QC, Canada