1
Orlandi KN, Phillips SR, Sailer ZR, Harman JL, Harms MJ. Topiary: Pruning the manual labor from ancestral sequence reconstruction. Protein Sci 2023; 32:e4551. [PMID: 36565302 PMCID: PMC9847077 DOI: 10.1002/pro.4551] [Received: 08/22/2022] [Revised: 12/14/2022] [Accepted: 12/17/2022] [Indexed: 12/25/2022]
Abstract
Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
Affiliation(s)
- Kona N. Orlandi
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Biology, University of Oregon, Eugene, Oregon, USA
- Sophia R. Phillips
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Zachary R. Sailer
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Joseph L. Harman
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Michael J. Harms
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
2
Simmons MP, Maurin O, Bailey P, Brewer GE, Roy S, Lombardi JA, Forest F, Baker WJ. Benefits of alignment quality-control processing steps and an Angiosperms353 phylogenomics pipeline applied to the Celastrales. Cladistics 2022; 38:595-611. [PMID: 35569142 DOI: 10.1111/cla.12507] [Accepted: 04/24/2022] [Indexed: 01/31/2023]
Abstract
We examined the impact of successive alignment quality-control steps on downstream phylogenomic analyses. We applied a recently published phylogenomics pipeline that was developed for the Angiosperms353 target-sequence-capture probe set to the flowering plant order Celastrales. Our final dataset consists of 158 species, including at least one exemplar from all 109 currently recognized Celastrales genera. We performed nine quality-control steps and compared the inferred resolution, branch support, and topological congruence of the inferred gene and species trees with those generated after each of the first six steps. We describe and justify each of our quality-control steps, including manual masking, in detail so that they may be readily applied to other lineages. We found that highly supported clades could generally be relied upon even if stringent orthology and alignment quality-control measures had not been applied. But separate instances were identified, for both concatenation and coalescence, wherein a clade was highly supported before manual masking but then subsequently contradicted. These results are generally reassuring for broad-scale analyses that use phylogenomics pipelines, but also indicate that we cannot rely exclusively on these analyses to conclude how challenging phylogenetic problems are best resolved.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, Colorado, 80523-1878, USA
- Olivier Maurin
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Paul Bailey
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Grace E Brewer
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Shyamali Roy
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Julio A Lombardi
- Departamento de Botânica, Instituto de Biociências de Rio Claro, Universidade Estadual Paulista - UNESP, Av. 24-A 1515 - Bela Vista, Caixa Postal 199, São Paulo, Brazil
- Félix Forest
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK