1
Ceron-Noriega A, Schoonenberg VAC, Butter F, Levin M. AlexandrusPS: A User-Friendly Pipeline for the Automated Detection of Orthologous Gene Clusters and Subsequent Positive Selection Analysis. Genome Biol Evol 2023; 15:evad187. [PMID: 37831426 PMCID: PMC10612477 DOI: 10.1093/gbe/evad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 09/26/2023] [Accepted: 10/06/2023] [Indexed: 10/14/2023]
Abstract
The detection of adaptive selection in a system approach considering all protein-coding genes allows for the identification of mechanisms and pathways that enabled adaptation to different environments. Currently, available programs for the estimation of positive selection signals can be divided into two groups. They are either easy to apply but can analyze only one gene family at a time, restricting system analysis; or they can handle larger cohorts of gene families, but require considerable prerequisite data such as orthology associations, codon alignments, phylogenetic trees, and proper configuration files. All these steps require extensive computational expertise, restricting this endeavor to specialists. Here, we introduce AlexandrusPS, a high-throughput pipeline that overcomes technical challenges when conducting transcriptome-wide positive selection analyses on large sets of nucleotide and protein sequences. The pipeline streamlines 1) the execution of an accurate orthology prediction as a precondition for positive selection analysis, 2) preparing and organizing configuration files for CodeML, 3) performing positive selection analysis using CodeML, and 4) generating an output that is easy to interpret, including all maximum likelihood and log-likelihood test results. The only input needed from the user is the CDS and peptide FASTA files of proteins of interest. The pipeline is provided in a Docker image, requiring no program or module installation, enabling the application of the pipeline in any computing environment. AlexandrusPS and its documentation are available via GitHub (https://github.com/alejocn5/AlexandrusPS).
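Step 2 of the pipeline described above, preparing configuration files for CodeML, can be illustrated with a minimal sketch. This is not AlexandrusPS's own code: the function name `codeml_ctl_text`, the file names, and the chosen option values are assumptions for illustration. The option keys themselves (`seqfile`, `treefile`, `outfile`, `seqtype`, `model`, `NSsites`, and so on) are standard codeml control-file settings.

```python
def codeml_ctl_text(seqfile, treefile, outfile, nssites=(1, 2)):
    """Build the text of a codeml control file for site-model runs.

    The values below are illustrative defaults (an assumption), not the
    settings used by AlexandrusPS.
    """
    options = {
        "seqfile": seqfile,      # codon alignment
        "treefile": treefile,    # tree in Newick format
        "outfile": outfile,      # main result file
        "noisy": 0,
        "verbose": 0,
        "runmode": 0,            # evaluate the user-supplied tree
        "seqtype": 1,            # 1 = codon sequences
        "CodonFreq": 2,          # 2 = F3x4 codon frequencies
        "model": 0,              # one omega ratio across branches
        "NSsites": " ".join(str(n) for n in nssites),  # e.g. M1a and M2a
        "icode": 0,              # universal genetic code
        "fix_kappa": 0,
        "kappa": 2,
        "fix_omega": 0,
        "omega": 1,
        "cleandata": 0,
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in options.items()) + "\n"


# Hypothetical file names for one orthologous gene cluster.
ctl_text = codeml_ctl_text("cluster_0001.phy", "cluster_0001.tree", "cluster_0001.out")
print(ctl_text)
```

Generating one such file per orthologous cluster is the kind of repetitive bookkeeping that a pipeline of this sort automates.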
Affiliation(s)
- Alejandro Ceron-Noriega
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
  - Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Department of Human Genetics, Mainz, Germany
- Vivien A C Schoonenberg
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
  - Present address: Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA
- Falk Butter
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
  - Institute of Molecular Virology and Cell Biology, Friedrich-Loeffler-Institute, Greifswald, Germany
- Michal Levin
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany

2
Maldonado E, Antunes A. LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation. BMC Bioinformatics 2019; 20:739. [PMID: 31888452 PMCID: PMC6937843 DOI: 10.1186/s12859-019-3292-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/04/2019] [Accepted: 11/26/2019] [Indexed: 01/22/2023]
Abstract
Background Recent advances in genome sequencing technologies and the falling cost of high-throughput sequencing continue to produce a deluge of data for downstream analyses. Evolutionary biologists, among others, often use genomic data to uncover phenotypic diversity and adaptive evolution in protein-coding genes, which requires multiple sequence alignments (MSA) and phylogenetic trees (PT) to be estimated with optimal results. However, preparing an initial dataset of multiple sequence file(s) (MSF) and carrying out the steps involved can be challenging with extensive amounts of data. A tool is therefore needed that removes potential sources of error and automates the time-consuming steps of a typical workflow, with high-throughput and optimal MSA and PT estimation. Results We introduce LMAP_S (Lightweight Multigene Alignment and Phylogeny eStimation), a user-friendly command-line and interactive package designed to handle an improved alignment and phylogeny estimation workflow: MSF preparation, MSA estimation, outlier detection, refinement, consensus, phylogeny estimation, comparison, and editing. File and directory organization, execution, and manipulation of information are automated, with minimal manual user intervention. LMAP_S was developed for the workstation multi-core environment and provides a unique advantage for processing multiple datasets. Our software proved efficient throughout the workflow, including the (unlimited) handling of more than 20 datasets. Conclusions We have developed a simple and versatile LMAP_S package that enables researchers to effectively estimate MSAs and PTs for multiple datasets in a high-throughput fashion. LMAP_S integrates more than 25 software programs, providing more than 65 algorithm choices overall, distributed across five stages. At minimum, one FASTA file is required within a single input directory.
To our knowledge, no other software combines MSA and phylogeny estimation with as many alternatives or provides the means to find optimal MSAs and phylogenies. Moreover, a case study comparing methodologies highlighted the usefulness of our software. LMAP_S has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP_S package is released under the GPLv3 license and is freely available at https://lmap-s.sourceforge.io/.
Affiliation(s)
- Emanuel Maldonado
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
- Agostinho Antunes
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal

3
Dobon B, Montanucci L, Peretó J, Bertranpetit J, Laayouni H. Gene connectivity and enzyme evolution in the human metabolic network. Biol Direct 2019; 14:17. [PMID: 31481097 PMCID: PMC6724310 DOI: 10.1186/s13062-019-0248-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/22/2019] [Accepted: 08/21/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Determining the factors involved in the likelihood of a gene being under adaptive selection is still a challenging goal in Evolutionary Biology. Here, we perform an evolutionary analysis of the human metabolic genes to explore the associations between network structure and the presence and strength of natural selection in the genes whose products are involved in metabolism. Purifying and positive selection are estimated at interspecific (among mammals) and intraspecific (among human populations) levels, and the connections between enzymatic reactions are differentiated between incoming (in-degree) and outgoing (out-degree) links. RESULTS We confirm that purifying selection has been stronger in highly connected genes. Long-term positive selection has targeted poorly connected enzymes, whereas short-term positive selection has targeted different enzymes depending on whether the selective sweep has reached fixation in the population: genes under a complete selective sweep are poorly connected, whereas those under an incomplete selective sweep have high out-degree connectivity. The last steps of pathways are more conserved due to stronger purifying selection, with long-term positive selection targeting preferentially enzymes that catalyze the first steps. However, short-term positive selection has targeted enzymes that catalyze the last steps in the metabolic network. Strong signals of positive selection have been found for metabolic processes involved in lipid transport and membrane fluidity and permeability. CONCLUSIONS Our analysis highlights the importance of analyzing the same biological system at different evolutionary timescales to understand the evolution of metabolic genes and of distinguishing between incoming and outgoing links in a metabolic network. 
Short-term positive selection has targeted enzymes with a different connectivity profile depending on the completeness of the selective sweep, while long-term positive selection has targeted genes with fewer connections that code for enzymes that catalyze the first steps in the network. REVIEWERS This article was reviewed by Diamantis Sellis and Brandon Invergo.
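The distinction between incoming (in-degree) and outgoing (out-degree) links can be made concrete with a toy directed network. The enzyme labels and edges below are invented for illustration and are not data from the study; an edge from A to B is assumed here to mean that a product of the reaction catalyzed by A is a substrate of the reaction catalyzed by B.

```python
from collections import Counter


def degree_profile(edges):
    """Map each node to its (in_degree, out_degree) in a directed edge list."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    nodes = sorted({node for edge in edges for node in edge})
    return {node: (in_degree[node], out_degree[node]) for node in nodes}


# Hypothetical four-enzyme pathway: E1 feeds E2 and E3, which both feed E4.
toy_edges = [("E1", "E2"), ("E1", "E3"), ("E2", "E4"), ("E3", "E4")]
profile = degree_profile(toy_edges)

for enzyme, (n_in, n_out) in profile.items():
    print(f"{enzyme}: in-degree={n_in}, out-degree={n_out}")
```

In this toy network E1 sits at the first step (no incoming links, two outgoing) and E4 at the last step (two incoming, none outgoing), which is the kind of positional contrast the abstract draws.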
Affiliation(s)
- Begoña Dobon
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Ludovica Montanucci
  - Dipartimento di Biomedicina Comparata e Alimentazione, Università degli Studi di Padova, Padua, Italy
- Juli Peretó
  - Institute for Integrative Systems Biology I2SysBio (University of Valencia-CSIC) and Department of Biochemistry and Molecular Biology, University of Valencia, Valencia, Spain
- Jaume Bertranpetit
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Hafid Laayouni
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
  - Bioinformatics Studies, ESCI-UPF, Pg. Pujades 1, 08003, Barcelona, Catalonia, Spain

4
Machado JP, Philip S, Maldonado E, O'Brien SJ, Johnson WE, Antunes A. Positive Selection Linked with Generation of Novel Mammalian Dentition Patterns. Genome Biol Evol 2016; 8:2748-59. [PMID: 27613398 PMCID: PMC5630915 DOI: 10.1093/gbe/evw200] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]
Abstract
A diverse group of genes are involved in the tooth development of mammals. Several studies, focused mainly on mice and rats, have provided a detailed depiction of the processes coordinating tooth formation and shape. Here we surveyed 236 tooth-associated genes in 39 mammalian genomes and tested for signatures of selection to assess patterns of molecular adaptation in genes regulating mammalian dentition. Of the 236 genes, 31 (∼13.1%) showed strong signatures of positive selection that may be responsible for the phenotypic diversity observed in mammalian dentition. Mammalian-specific tooth-associated genes had accelerated mutation rates compared with older genes found across all vertebrates. More recently evolved genes had fewer interactions (either genetic or physical), were associated with fewer Gene Ontology terms and had faster evolutionary rates compared with older genes. The introns of these positively selected genes also exhibited accelerated evolutionary rates, which may reflect additional adaptive pressure in the intronic regions that are associated with regulatory processes that influence tooth-gene networks. The positively selected genes were mainly involved in processes like mineralization and structural organization of tooth specific tissues such as enamel and dentin. Of the 236 analyzed genes, 12 mammalian-specific genes (younger genes) provided insights on diversification of mammalian teeth as they have higher evolutionary rates and exhibit different expression profiles compared with older genes. Our results suggest that the evolution and development of mammalian dentition occurred in part through positive selection acting on genes that previously had other functions.
Affiliation(s)
- João Paulo Machado
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
  - Abel Salazar Biomedical Sciences Institute (ICBAS), University of Porto, Porto, Portugal
- Siby Philip
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- Emanuel Maldonado
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Stephen J O'Brien
  - Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
  - Oceanographic Center, Nova Southeastern University, Ft Lauderdale
- Warren E Johnson
  - Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia, USA
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
  - Abel Salazar Biomedical Sciences Institute (ICBAS), University of Porto, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal

5
Maldonado E, Almeida D, Escalona T, Khan I, Vasconcelos V, Antunes A. LMAP: Lightweight Multigene Analyses in PAML. BMC Bioinformatics 2016; 17:354. [PMID: 27597435 PMCID: PMC5011788 DOI: 10.1186/s12859-016-1204-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/17/2016] [Accepted: 08/24/2016] [Indexed: 12/22/2022]
Abstract
Background Uncovering how phenotypic diversity arises and is maintained in nature has long been a major interest of evolutionary biologists. Recent advances in genome sequencing technologies have remarkably increased the efficiency of pinpointing genes involved in the adaptive evolution of phenotypes. The reliability of such findings is most often examined with statistical and computational methods using Maximum Likelihood codon-based models (i.e., site, branch, branch-site and clade models), such as those available in codeml from the Phylogenetic Analysis by Maximum Likelihood (PAML) package. While these models represent a well-defined workflow for documenting adaptive evolution, in practice they can be challenging for researchers with vast amounts of data, because multiple types of relevant codon-based datasets are generated, making the overall process tedious, error-prone and time-consuming. Results We introduce LMAP (Lightweight Multigene Analyses in PAML), a user-friendly command-line and interactive package designed to handle the codeml workflow, namely directory organization, execution, and results gathering and organization for Likelihood Ratio Test estimations, with minimal manual user intervention. LMAP was developed for the workstation multi-core environment and provides a unique advantage for processing one, several, or all codeml codon-based models for multiple datasets at a time. Our software proved efficient throughout the codeml workflow, including, but not limited to, simultaneously handling more than 20 datasets. Conclusions We have developed a simple and versatile LMAP package, with outstanding performance, enabling researchers to analyze multiple different codon-based datasets in a high-throughput fashion. At minimum, two file types are required within a single input directory: one for the multiple sequence alignment and another for the phylogenetic tree.
To our knowledge, no other software combines all codeml codon substitution models of adaptive evolution. LMAP has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP package is released under the GPLv3 license and is freely available at http://lmapaml.sourceforge.net/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1204-5) contains supplementary material, which is available to authorized users.
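The Likelihood Ratio Test that closes the codeml workflow can be sketched in a few lines. This is a generic illustration, not LMAP's implementation: the function names and the log-likelihood values are hypothetical. The test statistic is twice the difference in log-likelihood between the alternative and null models, compared against a chi-square distribution; only the closed forms for 1 and 2 degrees of freedom are implemented here so that the example needs nothing beyond the standard library.

```python
import math


def chi2_sf(x, df):
    """Chi-square survival function P(X > x), closed forms for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p_value) for two nested models."""
    # Clamp at zero: small negative values can arise from optimizer noise.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a null site model (e.g. M1a) and an
# alternative that allows positive selection (e.g. M2a); M1a vs M2a uses df = 2.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-990.0, df=2)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.3g}")
```

For other degrees of freedom a general chi-square routine from a statistics library would be needed instead of the closed forms above.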
Affiliation(s)
- Emanuel Maldonado
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
- Daniela Almeida
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Tibisay Escalona
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Imran Khan
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Vitor Vasconcelos
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Agostinho Antunes
  - CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal