1
Seo CW, Yoo S, Cho Y, Kim JS, Steinegger M, Lim YW. FunVIP: Fungal Validation and Identification Pipeline based on phylogenetic analysis. J Microbiol 2025; 63:e2411017. [PMID: 40313148 DOI: 10.71150/jm.2411017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Accepted: 01/20/2025] [Indexed: 05/03/2025]
Abstract
The growth of sequence data in public nucleotide databases has made DNA sequence-based identification an indispensable tool for fungal identification. However, the large proportion of mislabeled sequence data in public databases leads to frequent misidentifications. Inaccurate identification causes severe problems, especially for industrial and clinical fungi and edible mushrooms. Existing species identification pipelines require separate validation of datasets obtained from public databases, which contain mislabeled taxonomic identifications. To address this issue, we developed FunVIP, a fully automated phylogeny-based fungal validation and identification pipeline (https://github.com/Changwanseo/FunVIP). FunVIP employs phylogeny-based identification with validation, and the result is achievable with only a query, a database, and a single command. The FunVIP command comprises nine steps within a workflow: input management, sequence-set organization, alignment, trimming, concatenation, model selection, tree inference, tree interpretation, and report generation. Users obtain identification results, phylogenetic tree evidence, and reports of conflicts and issues detected at multiple checkpoints during the analysis. The conflicting-sample validation performance of FunVIP was demonstrated by repeating the manual revision of a fungal genus whose database contains mislabeled sequences, Fuscoporia. We also compared the identification performance of FunVIP with BLAST and q2-feature-classifier on two mass double-revised fungal datasets, Sanghuangporus and Aspergillus section Terrei. With its automatic validation ability and high identification performance, FunVIP is a promising tool for easy and accurate fungal identification.
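One way a phylogeny can expose mislabeled database sequences is a monophyly check: if the sequences carrying one species label do not form an exclusive clade, that label is flagged for review. The sketch below is only an illustration of that general idea, not FunVIP's implementation; the nested-tuple tree format, the function names, and the toy labels are all assumptions made for this example, and the result depends on how the tree is rooted.

```python
from collections import defaultdict

# Toy rooted tree as nested tuples. A leaf is a (sequence_id, species_label)
# pair of strings; every identifier and label here is hypothetical.

def is_leaf(node):
    return len(node) == 2 and isinstance(node[0], str) and isinstance(node[1], str)

def leaves(node):
    """Collect all leaves below a node."""
    if is_leaf(node):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found

def non_monophyletic_species(tree):
    """Return species labels whose sequences do not form an exclusive clade."""
    counts = defaultdict(int)
    for _, species in leaves(tree):
        counts[species] += 1

    monophyletic = set()

    def visit(node):
        below = leaves(node)
        labels = {species for _, species in below}
        # A clade is exclusive to one species if it carries a single label
        # and contains every sequence with that label.
        if len(labels) == 1:
            species = next(iter(labels))
            if len(below) == counts[species]:
                monophyletic.add(species)
        if not is_leaf(node):
            for child in node:
                visit(child)

    visit(tree)
    return sorted(set(counts) - monophyletic)

toy_tree = (
    ((("s1", "A"), ("s2", "A")), ("s3", "B")),
    ((("s4", "B"), ("s5", "B")), ("s6", "C")),
)
flagged = non_monophyletic_species(toy_tree)
print(flagged)
```

In the toy tree, label "B" is split across two clades, so it is the one flagged; whether the cause is a mislabeled sequence or a genuine taxonomic problem still needs a human decision.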
Affiliation(s)
- Chang Wan Seo
  - School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
  - Institute of Biodiversity, Seoul National University, Seoul 08826, Republic of Korea
- Shinnam Yoo
  - School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
  - Institute of Biodiversity, Seoul National University, Seoul 08826, Republic of Korea
- Yoonhee Cho
  - School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
  - Institute of Biodiversity, Seoul National University, Seoul 08826, Republic of Korea
- Ji Seon Kim
  - School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
  - Institute of Biodiversity, Seoul National University, Seoul 08826, Republic of Korea
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Republic of Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul 08826, Republic of Korea
- Young Woon Lim
  - School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
  - Institute of Biodiversity, Seoul National University, Seoul 08826, Republic of Korea
2
Vargas-Castro I, Giorda F, Mattioda V, Goria M, Serracca L, Varello K, Carta V, Nodari S, Maniaci MG, Dell’Atti L, Testori C, Pussini N, Iulini B, Battistini R, Zoppi S, Nocera FD, Lucifora G, Fontanesi E, Acutis P, Casalone C, Grattarola C, Peletto S. Herpesvirus surveillance in stranded striped dolphins (Stenella coeruleoalba) and bottlenose dolphins (Tursiops truncatus) from Italy with emphasis on neuropathological characterization. PLoS One 2024; 19:e0311767. [PMID: 39441833 PMCID: PMC11498698 DOI: 10.1371/journal.pone.0311767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/24/2024] [Indexed: 10/25/2024] Open
Abstract
Herpesvirus (HV) is widely distributed among cetacean populations, with the highest prevalence reported in the Mediterranean Sea. In this study, a comprehensive analysis was conducted, including epidemiological, phylogenetic, and pathological aspects, with particular emphasis on neuropathology, to better understand the impact of HV in these animals. Our results show a higher presence of HV in males than in females, with males exhibiting a greater number of positive tissues. Additionally, adults were more frequently affected by HV infection than juveniles, with no infections detected in calves or neonates. The affected species were striped dolphins (Stenella coeruleoalba) and bottlenose dolphins (Tursiops truncatus). The highest positivity rates were observed in the genital system, cerebrum, and skin tissues. Phylogenetic analysis indicated a higher occurrence of Gammaherpesvirus (GHV) sequences but greater genetic diversity within Alphaherpesvirus (AHV). Key neuropathological features included astro-microgliosis (n = 4) and meningitis with minimal to mild perivascular cuffing (n = 2). The presence of concurrent infections with other pathogens, particularly cetacean morbillivirus (CeMV), underscores the complex nature of infectious diseases in cetaceans. However, the presence of lesions in the central nervous system (CNS) with molecular positivity for GHV, excluding the involvement of other potential neurotropic agents, would confirm the potential of this HV subfamily to induce neurological damage. Pathological examination identified lesions in other organs that could potentially be associated with HV, characterized by lymphoid depletion and tissue inflammation. These findings enhance our understanding of HV in odontocetes and highlight the need for ongoing research into the factors driving these infections and their broader implications.
Affiliation(s)
- Ignacio Vargas-Castro
  - VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, Spain
- Federica Giorda
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Virginia Mattioda
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Maria Goria
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Laura Serracca
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Katia Varello
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Valerio Carta
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Sabrina Nodari
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Maria Grazia Maniaci
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Luana Dell’Atti
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Camilla Testori
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Nicola Pussini
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Barbara Iulini
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Roberta Battistini
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Simona Zoppi
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Fabio Di Nocera
  - Istituto Zooprofilattico Sperimentale del Mezzogiorno, Naples, Italy
- Giuseppe Lucifora
  - Istituto Zooprofilattico Sperimentale del Mezzogiorno, Naples, Italy
- Pierluigi Acutis
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Cristina Casalone
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Carla Grattarola
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Simone Peletto
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta—WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
3
Tang X, Ortner NJ, Nikonishyna YV, Fernández-Quintero ML, Kokot J, Striessnig J, Liedl KR. Pathogenicity of de novo CACNA1D Ca2+ channel variants predicted from sequence co-variation. Eur J Hum Genet 2024; 32:1065-1073. [PMID: 38553610 PMCID: PMC11369236 DOI: 10.1038/s41431-024-01594-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 03/02/2024] [Accepted: 03/12/2024] [Indexed: 09/04/2024] Open
Abstract
Voltage-gated L-type Cav1.3 Ca2+ channels support numerous physiological functions including neuronal excitability, sinoatrial node pacemaking, hearing, and hormone secretion. De novo missense mutations in the gene of their pore-forming α1-subunit (CACNA1D) induce severe gating defects which lead to autism spectrum disorder and a more severe neurological disorder with or without endocrine symptoms. The number of CACNA1D variants reported is constantly rising, but their pathogenic potential often remains unclear, which complicates clinical decision-making. Since functional tests are time-consuming and not always available, bioinformatic tools that further improve prediction of the pathogenic potential of novel variants are needed. Here we employed evolutionary analysis considering sequences of the Cav1.3 α1-subunit throughout the animal kingdom to predict the pathogenicity of human disease-associated CACNA1D missense variants. Co-variation analyses of evolutionary information revealed residue-residue couplings and allowed us to generate a score, which correctly predicted previously identified pathogenic variants, supported pathogenicity in variants previously classified as likely pathogenic, and even led to the re-classification or re-examination of 18 out of 80 variants previously assessed with clinical and electrophysiological data. Based on the prediction score, we electrophysiologically tested one variant (V584I) and found significant gating changes associated with pathogenic risks. Thus, our co-variation model represents a valuable addition to complement the assessment of the pathogenicity of CACNA1D variants, completely independent of clinical diagnoses, electrophysiology, and structural or biophysical considerations, and based solely on evolutionary analyses.
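The abstract does not specify how residue-residue couplings are scored, so the following is only a generic illustration of what "co-variation" between alignment columns means. It uses mutual information, a common simple stand-in, on a made-up four-sequence alignment; the function names and the toy data are assumptions, and this is not the scoring model of the study.

```python
import math
from collections import Counter

def column(msa, i):
    """Residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in msa]

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(column(msa, i))
    freq_j = Counter(column(msa, j))
    freq_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Hypothetical toy alignment: columns 0 and 1 always change together,
# column 2 varies independently of column 0, column 3 is fully conserved.
toy_msa = ["AVKL", "AVRL", "GIKL", "GIRL"]

coupled = mutual_information(toy_msa, 0, 1)
independent = mutual_information(toy_msa, 0, 2)
print(coupled, independent)
```

Columns that vary together score high, while independent or fully conserved columns score zero. Real co-variation analyses need many sequences and corrections for phylogenetic bias that this sketch omits.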
Affiliation(s)
- Xuechen Tang
  - Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Nadine J Ortner
  - Department of Pharmacology and Toxicology, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Yuliia V Nikonishyna
  - Department of Pharmacology and Toxicology, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Monica L Fernández-Quintero
  - Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Janik Kokot
  - Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Jörg Striessnig
  - Department of Pharmacology and Toxicology, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria.
- Klaus R Liedl
  - Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria.
4
Tong H, Omar MAA, Wang Y, Li M, Li Z, Li Z, Ao Y, Wang Y, Jiang M, Li F. Essential roles of histone lysine methyltransferases EZH2 and EHMT1 in male embryo development of Phenacoccus solenopsis. Commun Biol 2024; 7:1021. [PMID: 39164404 PMCID: PMC11336100 DOI: 10.1038/s42003-024-06705-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 08/08/2024] [Indexed: 08/22/2024] Open
Abstract
Paternal genome elimination (PGE) is an intriguing but poorly understood reproductive strategy in which females are typically diploid, but males lose paternal genomes. Paternal genome heterochromatin (PGH) occurs in arthropods with germline PGE, such as the mealybug, coffee borer beetles, and booklice. Here, we present evidence that PGH initially occurs during early embryo development at around 15 h post-mating (hpm) in the cotton mealybug, Phenacoccus solenopsis Tinsley. Transcriptome analysis followed by qPCR validation indicated that six histone lysine methyltransferase (KMT) genes are predominantly expressed in adult females. We knocked down these genes through dsRNA microinjection. We found that downregulation of two KMT genes, PsEZH2-X1 and PsEHMT1, resulted in a decrease of heterochromatin-related methylations, including H3K27me1, H3K27me3, and H3K9me3 in the ovaries, fewer PGH male embryos, and reduced male offspring. For further confirmation, we obtained two strains of transgenic tobacco highly expressing dsRNA targeting PsEZH2-X1 and PsEHMT1, respectively. Similarly, fewer PGH embryos and fewer male offspring were observed when mealybugs fed on these transgenic tobacco plants. Overall, we present evidence that PsEZH2-X1 and PsEHMT1 have essential roles in male embryo survival by regulating PGH formation in cotton mealybugs.
Affiliation(s)
- Haojie Tong
  - College of Life Sciences, China Jiliang University, Hangzhou, China.
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China.
- Mohamed A A Omar
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
  - Department of Plant Protection, Faculty of Agriculture (Saba Basha), Alexandria University, Alexandria, Egypt
- Yuan Wang
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Meizhen Li
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Zicheng Li
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Zihao Li
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yan Ao
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Ying Wang
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Mingxing Jiang
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China.
- Fei Li
  - Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insect Pests, Institute of Insect Sciences, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China.
5
Iglhaut C, Pečerska J, Gil M, Anisimova M. Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment Including Long Indels. Mol Biol Evol 2024; 41:msae109. [PMID: 38842253 PMCID: PMC11221656 DOI: 10.1093/molbev/msae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/30/2024] [Accepted: 06/03/2024] [Indexed: 06/07/2024] Open
Abstract
Despite having important biological implications, insertion and deletion (indel) events are often disregarded or mishandled during phylogenetic inference. In multiple sequence alignment, indels are represented as gaps and are estimated without considering the distinct evolutionary history of insertions and deletions. Consequently, indels are usually excluded from subsequent inference steps, such as ancestral sequence reconstruction and phylogenetic tree search. Here, we introduce indel-aware parsimony (indelMaP), a novel way to treat gaps under the parsimony criterion by considering insertions and deletions as separate evolutionary events and accounting for long indels. By identifying the precise location of an evolutionary event on the tree, we can separate overlapping indel events and use affine gap penalties for long indel modeling. Our indel-aware approach harnesses the phylogenetic signal from indels, incorporating them into all inference stages. Validation and comparison to state-of-the-art inference tools on simulated data show that indelMaP is most suitable for densely sampled datasets with closely to moderately related sequences, where it can reach alignment quality comparable to probabilistic methods and accurately infer ancestral sequences, including indel patterns. Due to its remarkable speed, our method is well suited for epidemiological datasets, eliminating the need for downsampling and enabling the exploitation of the additional information provided by dense taxonomic sampling. Moreover, indelMaP offers new insights into the indel patterns of biologically significant sequences and advances our understanding of genetic variability by considering gaps as crucial evolutionary signals rather than mere artefacts.
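To make the phrase "affine gap penalties" concrete: under an affine scheme a gap run of length L costs an opening penalty plus (L - 1) extension penalties, so one long indel is cheaper than several scattered single-residue gaps. The sketch below scores the gap runs of a single alignment row and compares that with a linear scheme. The penalty values and function names are hypothetical, and the sketch does not reproduce indelMaP, which places insertions and deletions as separate events on a tree.

```python
def gap_runs(row, gap="-"):
    """Lengths of the maximal runs of gap characters in one alignment row."""
    runs, current = [], 0
    for ch in row:
        if ch == gap:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def affine_gap_cost(row, gap_open=2.5, gap_extend=0.5):
    """Affine scheme: each run of length L costs gap_open + (L - 1) * gap_extend."""
    return sum(gap_open + (length - 1) * gap_extend for length in gap_runs(row))

def linear_gap_cost(row, per_gap=2.5):
    """Linear scheme: every gap character costs the same."""
    return per_gap * sum(gap_runs(row))

row = "AC---GT-A"                 # one long gap (3) and one short gap (1)
print(gap_runs(row))              # [3, 1]
print(affine_gap_cost(row))       # 6.0
print(linear_gap_cost(row))       # 10.0
```

With these assumed penalties, the three-residue gap costs 3.5 under the affine scheme against 7.5 under the linear one, which is why affine costs favor explaining adjacent gaps as a single long indel event.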
Affiliation(s)
- Clara Iglhaut
  - Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
  - Faculty of Mathematics and Science, University of Zurich, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jūlija Pečerska
  - Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Manuel Gil
  - Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Maria Anisimova
  - Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
6
Sha L, Liang X, Zhang X, Gao S, Zhang Y, Zhou Y, Fan X. Roegneria yenchiana: A new species in the Triticeae (Poaceae) from the Hengduan Mountain region. Ecol Evol 2024; 14:e11171. [PMID: 38495436 PMCID: PMC10944672 DOI: 10.1002/ece3.11171] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/18/2024] [Revised: 02/27/2024] [Accepted: 03/01/2024] [Indexed: 03/19/2024] Open
Abstract
Roegneria yenchiana sp. nov. (Triticeae) is described as a new species from Shangri-la, Yunnan Province, China, based on morphological, cytological, and molecular data. It is morphologically characterized by one spikelet per node, rectangular glumes, and awns flanked by two short mucros in lemmas, which distinguish it from other species of Roegneria. The genomic in situ hybridization results indicate that R. yenchiana is an allotetraploid, and its genomic constitution is StY. Phylogenetic analyses based on multiple loci suggested that R. yenchiana is closely related to Pseudoroegneria and Roegneria, and that Pseudoroegneria served as the maternal donor during its polyploid speciation.
Affiliation(s)
- Li‐Na Sha
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, China
- Xiao Liang
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
- Xin‐Yi Zhang
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
- Shan Gao
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
- Yue Zhang
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
- Yong‐Hong Zhou
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
- Xing Fan
  - Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
7
Gahtori R, Tripathi AH, Chand G, Pande A, Joshi P, Rai RC, Upadhyay SK. Phytochemical Screening of Nyctanthes arbor-tristis Plant Extracts and Their Antioxidant and Antibacterial Activity Analysis. Appl Biochem Biotechnol 2024; 196:436-456. [PMID: 37140779 DOI: 10.1007/s12010-023-04552-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 04/18/2023] [Indexed: 05/05/2023]
Abstract
Nyctanthes arbor-tristis, alias "Vishnu Parijat," is a medicinal plant used in traditional medicine to treat various inflammation-associated ailments and to combat numerous infections. In the present study, we collected samples of N. arbor-tristis from the lower Himalayan region of Uttarakhand, India, and carried out their molecular identification through DNA barcoding. To examine the antioxidant and antibacterial activities, we prepared ethanolic and aqueous extracts (from flowers and leaves) and performed their phytochemical analysis using different qualitative and quantitative approaches. The phytoextracts showed marked antioxidant potential, as revealed by a comprehensive set of assays. The ethanolic leaf extract showed marked antioxidant potential towards DPPH, ABTS, and NO scavenging (IC50 = 30.75 ± 0.006, 30.83 ± 0.002, and 51.23 ± 0.009 μg/mL, respectively). We used a TLC-bioautography assay to characterize different antioxidant constituents (based on their Rf values) in chromatograms run under different mobile phases. For one of the prominent antioxidant spots in TLC bioautography, GC-MS analysis identified cis-9-hexadecenal and n-hexadecanoic acid as the major constituents. Furthermore, in the antibacterial study, the ethanolic leaf extract showed marked activity against Aeromonas salmonicida (113.40 mg/mL of extract was equivalent to 100 μg/mL of kanamycin). In contrast, the ethanolic flower extract showed considerable antibacterial activity against Pseudomonas aeruginosa (125.85 mg/mL of extract was equivalent to 100 μg/mL of kanamycin). This study presents the phylogenetic account and unravels the antioxidant-related properties and antibacterial potential of N. arbor-tristis.
Affiliation(s)
- Rekha Gahtori
  - Department of Biotechnology, Kumaun University, Bhimtal Campus, Nainital, Uttarakhand, 263136, India
- Ankita H Tripathi
  - Department of Biotechnology, Kumaun University, Bhimtal Campus, Nainital, Uttarakhand, 263136, India
- Garima Chand
  - Department of Chemistry, Kumaun University, DSB Campus, Nainital, Uttarakhand, 263001, India
- Amit Pande
  - ICAR-Directorate Coldwater Fisheries Research, Bhimtal, Uttarakhand, 263136, India
- Penny Joshi
  - Department of Chemistry, Kumaun University, DSB Campus, Nainital, Uttarakhand, 263001, India
- Ramesh Chandra Rai
  - Translational Health Science and Technology Institute (THSTI), Faridabad, Haryana, 121001, India.
- Santosh K Upadhyay
  - Department of Biotechnology, Kumaun University, Bhimtal Campus, Nainital, Uttarakhand, 263136, India.
8
Bastolla U, Abia D, Piette O. PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score. Bioinformatics 2023; 39:btad630. [PMID: 37847775 PMCID: PMC10628387 DOI: 10.1093/bioinformatics/btad630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/01/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION: Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships. RESULTS: Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or by modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of pairwise alignments (PAs) and refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, the highest PC_sim score, and the highest similarity with the MSAs produced by other tools and with the reference MSA Balibase. AVAILABILITY AND IMPLEMENTATION: https://github.com/ugobas/PC_ali.
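The idea of building one hybrid score from the main principal component of several correlated similarity measures can be sketched as follows. This is a generic illustration under stated assumptions, not the PC_sim definition from the paper: the four values per protein pair are synthetic, the standardization and the power-iteration routine are choices made for this example, and all function names are hypothetical.

```python
import math

def standardize(rows):
    """Column-wise z-scores (sample standard deviation) for equal-length rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_principal_component(z, iterations=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, k = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def hybrid_scores(rows):
    """Project each standardized row onto the first principal component."""
    z = standardize(rows)
    weights = first_principal_component(z)
    return [sum(x * w for x, w in zip(row, weights)) for row in z]

# Synthetic protein pairs, ordered from close to distant homologs. Columns:
# aligned fraction, sequence identity, superimposed fraction, contact overlap.
pairs = [
    (0.95, 0.80, 0.90, 0.85),
    (0.85, 0.45, 0.75, 0.70),
    (0.70, 0.25, 0.55, 0.50),
    (0.55, 0.12, 0.35, 0.30),
]
scores = hybrid_scores(pairs)
print([round(s, 3) for s in scores])
```

Because the four synthetic measures rise and fall together, the first component weights all of them positively and the combined score ranks the pairs from most to least similar.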
Affiliation(s)
- Ugo Bastolla
  - Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- David Abia
  - Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Oscar Piette
  - Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
9
Vargas-Castro I, Crespo-Picazo JL, Jiménez Martínez MÁ, Marco-Cabedo V, Muñoz-Baquero M, García-Párraga D, Sánchez-Vizcaíno JM. First description of a lesion in the upper digestive mucosa associated with a novel gammaherpesvirus in a striped dolphin (Stenella coeruleoalba) stranded in the Western Mediterranean Sea. BMC Vet Res 2023; 19:118. [PMID: 37563731 PMCID: PMC10413511 DOI: 10.1186/s12917-023-03677-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 07/25/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND: A wide variety of lesions have been associated with herpesvirus in cetaceans. However, descriptions of herpesvirus infections in the digestive system of cetaceans are scarce. CASE REPORT: A young female striped dolphin stranded in the Valencian Community (Spain) on 6 August 2021. The animal showed external macroscopic lesions suggestive of an aggressive interaction with bottlenose dolphins (rake marks in the epidermis). Internally, the main findings included congestion of the central nervous system and multiple, well-defined, whitish, irregularly shaped, proliferative lesions on the oropharyngeal and laryngopharyngeal mucosa. Histopathology revealed lymphoplasmacytic and histiocytic meningoencephalitis, consistent with neurobrucellosis. The oropharyngeal and laryngopharyngeal plaques consisted histologically of focally extensive epithelial hyperplasia. As part of the health surveillance program, tissue samples were tested for cetacean morbillivirus using a real-time reverse transcription-PCR, for Brucella spp. using a real-time PCR, and for herpesvirus using a conventional nested PCR. All samples were negative for cetacean morbillivirus; molecular positivity for Brucella spp. was obtained in pharyngeal tonsils and cerebrospinal fluid; herpesvirus was detected in a proliferative lesion in the upper digestive mucosa. Phylogenetic analysis showed that the herpesvirus sequence was included in the Gammaherpesvirinae subfamily. This novel sequence showed the greatest identity with other herpesvirus sequences detected in skin, pharyngeal and genital lesions in five different species. CONCLUSIONS: To the best of the authors' knowledge, this is the first report of a proliferative lesion in the upper digestive mucosa associated with gammaherpesvirus positivity in a striped dolphin (Stenella coeruleoalba).
Affiliation(s)
- Ignacio Vargas-Castro
  - VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, 28040, Spain.
- José Luis Crespo-Picazo
  - Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Mª Ángeles Jiménez Martínez
  - Department of Animal Medicine and Surgery, Veterinary Faculty, Complutense University of Madrid, Madrid, 28040, Spain
- Vicente Marco-Cabedo
  - Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Marta Muñoz-Baquero
  - Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
- Daniel García-Párraga
  - Research Department, Fundación Oceanogràfic de la Comunidad Valenciana, 46013, Valencia, Spain
  - Biology Department, Oceanogràfic, Ciudad de las Artes y las Ciencias, 46013, Valencia, Spain
- José Manuel Sánchez-Vizcaíno
  - VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, 28040, Spain
Collapse
|
10
|
Vargas-Castro I, Peletto S, Mattioda V, Goria M, Serracca L, Varello K, Sánchez-Vizcaíno JM, Puleio R, Nocera FD, Lucifora G, Acutis P, Casalone C, Grattarola C, Giorda F. Epidemiological and genetic analysis of Cetacean Morbillivirus circulating on the Italian coast between 2018 and 2021. Front Vet Sci 2023; 10:1216838. [PMID: 37583469 PMCID: PMC10424449 DOI: 10.3389/fvets.2023.1216838] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/04/2023] [Accepted: 07/04/2023] [Indexed: 08/17/2023]
Abstract
Cetacean morbillivirus (CeMV) has caused several outbreaks, unusual mortality events, and interepidemic single-lethal disease episodes in the Mediterranean Sea. Since 2012, a new strain with a northeast (NE) Atlantic origin has been circulating among Mediterranean cetaceans, causing numerous deaths. The objective of this study was to determine the prevalence of CeMV in cetaceans stranded in Italy between 2018 and 2021 and to characterize the circulating CeMV strain. Of the 354 cetaceans stranded along the Italian coastlines, 113 were CeMV-positive. This prevalence (31.9%) is one of the highest reported without an associated outbreak. All marine sectors along the Italian coastlines, except for the northern Adriatic coast, reported a positive molecular diagnosis of CeMV. In one-third of the CeMV-positive cetaceans submitted to a histological evaluation, a chronic form of the infection (detectable viral antigen, the absence of associated lesions, and concomitant coinfections) was suspected. Tissues from 24 animals were used to characterize the strain, obtaining 57 sequences from phosphoprotein, nucleocapsid, and fusion protein genes, which were submitted to GenBank. Our sequences showed the highest identity with NE-Atlantic strain sequences, and in the phylogenetic study, they clustered together with them. Regarding age and species, most of these individuals were adults (17/24, 70.83%) and striped dolphins (19/24, 79.16%). This study improves our understanding of the NE-Atlantic CeMV strain in Italian waters, supporting the hypothesis of an endemic circulation of the virus in this area; however, additional studies are necessary to fully understand the epidemiology of this strain in the Mediterranean Sea.
Affiliation(s)
- Ignacio Vargas-Castro
  - VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, Madrid, Spain
- Simone Peletto
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Virginia Mattioda
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Maria Goria
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Laura Serracca
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Katia Varello
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Roberto Puleio
  - Istituto Zooprofilattico Sperimentale della Sicilia, Palermo, Italy
- Fabio Di Nocera
  - Istituto Zooprofilattico Sperimentale del Mezzogiorno, Naples, Italy
- Giuseppe Lucifora
  - Istituto Zooprofilattico Sperimentale del Mezzogiorno, Naples, Italy
- Pierluigi Acutis
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Cristina Casalone
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Carla Grattarola
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
- Federica Giorda
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta - WOAH Collaborating Centre for the Health of Marine Mammals, Turin, Italy
11
Santus L, Garriga E, Deorowicz S, Gudyś A, Notredame C. Towards the accurate alignment of over a million protein sequences: Current state of the art. Curr Opin Struct Biol 2023; 80:102577. [PMID: 37012200 DOI: 10.1016/j.sbi.2023.102577] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/20/2022] [Revised: 02/21/2023] [Accepted: 02/27/2023] [Indexed: 04/04/2023]
Abstract
Large-scale genomics requires highly scalable and accurate multiple sequence alignment methods. Results collected over the last decade suggest that accuracy is lost when scaling beyond a few thousand sequences. This issue has been actively addressed with a number of innovative algorithmic solutions that combine low-level hardware optimization with novel higher-level heuristics. This review provides an extensive critical overview of these recent methods. Using established reference datasets, we conclude that although significant progress has been achieved, a unified framework able to consistently and efficiently produce high-accuracy large-scale multiple alignments is still lacking.
12
Kuang M, Zhang Y, Lam TW, Ting HF. MLProbs: A Data-Centric Pipeline for Better Multiple Sequence Alignment. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:524-533. [PMID: 35120007 DOI: 10.1109/tcbb.2022.3148382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
In this paper, we explore using the data-centric approach to tackle the Multiple Sequence Alignment (MSA) construction problem. Unlike the algorithm-centric approach, which reduces the construction problem to a combinatorial optimization problem based on an abstract mathematical model, the data-centric approach explores using classification models trained from existing benchmark data to guide the construction. We identified two simple classifications to help us choose a better alignment tool and determine whether and how much to carry out realignment. We show that shallow machine-learning algorithms suffice to train sensitive models for these classifications. Based on these models, we implemented a new multiple sequence alignment pipeline, called MLProbs. Compared with 10 other popular alignment tools over four benchmark databases (namely, BAliBASE, OXBench, OXBench-X and SABMark), MLProbs consistently gives the highest TC score. More importantly, MLProbs shows non-trivial improvement for protein families with low similarity; in particular, when evaluated against the 1,356 protein families with similarity ≤ 50%, MLProbs achieves a TC score of 56.93, while the next best three tools are in the range of [55.41, 55.91] (increased by more than 1.8%). We also compared the performance of MLProbs and other MSA tools in two real-life applications - Phylogenetic Tree Construction Analysis and Protein Secondary Structure Prediction - and MLProbs also had the best performance. In our study, we used only shallow machine-learning algorithms to train our models. It would be interesting to study whether deep-learning methods can help make further improvements, so we suggest some possible research directions in the conclusion section.
13
SIN-3 functions through multi-protein interaction to regulate apoptosis, autophagy, and longevity in Caenorhabditis elegans. Sci Rep 2022; 12:10560. [PMID: 35732652 PMCID: PMC9217932 DOI: 10.1038/s41598-022-13864-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/09/2022] [Indexed: 11/08/2022]
Abstract
SIN3/HDAC is a multi-protein complex that acts as a regulatory unit and functions as a co-repressor/co-activator and a general transcription factor. SIN3 acts as a scaffold in the complex, binding directly to HDAC1/2 and other proteins, and plays crucial roles in regulating apoptosis, differentiation, cell proliferation, development, and the cell cycle. However, its exact mechanism of action remains elusive. Using the Caenorhabditis elegans (C. elegans) model, we can overcome the challenges posed by the functional redundancy of SIN3 isoforms. In this regard, we have previously demonstrated the role of SIN-3 in uncoupling autophagy and longevity in C. elegans. To understand the mechanism of action of SIN3 in these processes, we carried out a comparative analysis of the SIN3 protein interactome from model organisms of different phyla. We identified conserved, expanded, and contracted gene classes. The C. elegans SIN-3 interactome revealed the presence of well-known proteins, such as DAF-16, SIR-2.1, SGK-1, and AKT-1/2, involved in autophagy, apoptosis, and longevity. Overall, our analyses propose potential mechanisms by which SIN3 participates in multiple biological processes, examine their conservation across species, and identify candidate genes for further experimental analysis.
14
Hubley R, Wheeler TJ, Smit AFA. Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families. NAR Genom Bioinform 2022; 4:lqac040. [PMID: 35591887 PMCID: PMC9112768 DOI: 10.1093/nargab/lqac040] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/18/2021] [Revised: 03/29/2022] [Accepted: 04/29/2022] [Indexed: 02/06/2023]
Abstract
The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low- to medium-divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning highly divergent and fragmented instances of a family.
Affiliation(s)
- Robert Hubley
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Travis J Wheeler
  - Department of Computer Science, University of Montana, Missoula, MT 59801, USA
15
Chao J, Tang F, Xu L. Developments in Algorithms for Sequence Alignment: A Review. Biomolecules 2022; 12:biom12040546. [PMID: 35454135 PMCID: PMC9024764 DOI: 10.3390/biom12040546] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/11/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 01/27/2023]
Abstract
The continuous development of sequencing technologies has enabled researchers to obtain large amounts of biological sequence data, which has increased the demand for software that can perform sequence alignment quickly and accurately. A number of algorithms and tools for sequence alignment have been designed to meet the various needs of biologists. Here, we summarize the prevailing ideas in sequence alignment research and some methods for estimating the quality of multiple sequence alignment tools.
Affiliation(s)
- Jiannan Chao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Furong Tang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
16
Kostenko DO, Korotkov EV. Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences. Int J Mol Sci 2022; 23:ijms23073764. [PMID: 35409125 PMCID: PMC8998981 DOI: 10.3390/ijms23073764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/10/2022]
Abstract
The aim of this work was to compare the multiple alignment methods MAHDS, T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK in their ability to align highly divergent amino acid sequences. To accomplish this, we created test amino acid sequences with an average number of substitutions per amino acid (x) from 0.6 to 5.6, a total of 81 sets. Comparison of the performance of sequence alignments constructed by MAHDS and previously developed algorithms using the CS and Z score criteria and the benchmark alignment database (BAliBASE) indicated that, although the quality of the alignments built with MAHDS was somewhat lower than that of the other algorithms, it was compensated by greater statistical significance. MAHDS could construct statistically significant alignments of artificial sequences with x ≤ 4.8, whereas the other algorithms (T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK) could not perform that at x > 2.4. The application of MAHDS to align 21 families of highly diverged proteins (identity < 20%) from Pfam and HOMSTRAD databases showed that it could calculate statistically significant alignments in cases when the other methods failed. Thus, MAHDS could be used to construct statistically significant multiple alignments of highly divergent protein sequences, which accumulated multiple mutations during evolution.
17
Tumescheit C, Firth AE, Brown K. CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments. PeerJ 2022; 10:e12983. [PMID: 35310163 PMCID: PMC8932311 DOI: 10.7717/peerj.12983] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/06/2021] [Accepted: 02/01/2022] [Indexed: 01/11/2023]
Abstract
Background Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. This slows down computation and can impact conclusions without being biologically relevant. Cleaning the alignment by removing common issues such as gaps, divergent sequences, large insertions and deletions and poorly aligned sequence ends can substantially improve analyses. Manual editing of MSAs is very widespread but is time-consuming and difficult to reproduce. Results We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed. The main functionalities of the tool include removing regions of low coverage due to insertions, removing gaps, cropping poorly aligned sequence ends and removing sequences that are too divergent or too short. The thresholds for each function can be specified by the user and parameters can be adjusted to each individual MSA. CIAlign is designed with an emphasis on solving specific and common alignment problems and on providing transparency to the user. Conclusion CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. This tool can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to automatically clean up parts of an MSA and those requiring a new, accessible way of visualising large MSAs.
Affiliation(s)
- Andrew E. Firth
  - Department of Pathology, University of Cambridge, Cambridge, United Kingdom
- Katherine Brown
  - Department of Pathology, University of Cambridge, Cambridge, United Kingdom
18
Baylon JL, Ursu O, Muzdalo A, Wassermann AM, Adams GL, Spale M, Mejzlik P, Gromek A, Pisarenko V, Hancharyk D, Jenkins E, Bednar D, Chang C, Clarova K, Glick M, Bitton DA. PepSeA: Peptide Sequence Alignment and Visualization Tools to Enable Lead Optimization. J Chem Inf Model 2022; 62:1259-1267. [PMID: 35192366 DOI: 10.1021/acs.jcim.1c01360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Therapeutic peptides offer potential advantages over small molecules in terms of selectivity, affinity, and their ability to target "undruggable" proteins that are associated with a wide range of pathologies. Despite their importance, current molecular design capabilities that inform medicinal chemistry decisions on peptide programs are limited. More specifically, there are unmet needs for structure-activity relationship (SAR) analysis and visualization of linear, cyclic, and cross-linked peptides containing non-natural motifs, which are widely used in drug discovery. To bridge this gap, we developed PepSeA (Peptide Sequence Alignment and Visualization), an open-source, freely available package of sequence-based tools (https://github.com/Merck/PepSeA). PepSeA enables multiple sequence alignment of non-natural amino acids and enhanced visualization with the hierarchical editing language for macromolecules (HELM). Via stepwise SAR analysis of a ChEMBL peptide data set, we demonstrate the utility of PepSeA to accelerate decision making in lead optimization campaigns in a pharmaceutical setting. PepSeA represents an initial attempt to expand cheminformatics capabilities for therapeutic peptides and to enable more rapid and efficient design-make-test cycles.
Affiliation(s)
- Javier L Baylon
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Oleg Ursu
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Anja Muzdalo
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Anne Mai Wassermann
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Gregory L Adams
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Martin Spale
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Petr Mejzlik
  - AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Anna Gromek
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Viktor Pisarenko
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Dzianis Hancharyk
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Esteban Jenkins
  - Foundational Data and Analytics, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- David Bednar
  - Foundational Data and Analytics, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Charlie Chang
  - Discovery Research IT, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Kamila Clarova
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
  - Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague 166 28, Czech Republic
- Meir Glick
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Danny A Bitton
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
19
Liu H, Zou Q, Xu Y. A novel fast multiple nucleotide sequence alignment method based on FM-index. Brief Bioinform 2021; 23:6458932. [PMID: 34893794 DOI: 10.1093/bib/bbab519] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/20/2021] [Revised: 10/19/2021] [Accepted: 11/14/2021] [Indexed: 11/13/2022]
Abstract
Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms struggle to handle large-scale sets of sequences, especially long sequences. Therefore, some recent aligners adopt an efficient divide-and-conquer strategy to divide long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) at which to divide the sequences is critical, as it directly affects accuracy and time cost. We therefore propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use an FM-index to extract long common segments at a low cost rather than using a space-consuming hash table. Moreover, after finding the longer optimal common segments, the sequences are divided at those segments. FMAlign has been tested on virus, bacterial, and human mitochondrial genome datasets and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in terms of running time and has high accuracy on long sequence sets. All the results demonstrate that our method is applicable to large-scale nucleotide sequences in terms of both sequence length and sequence number. The source code and related data are available at https://github.com/iliuh/FMAlign.
Affiliation(s)
- Huan Liu
  - School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China
- Quan Zou
  - Institute of basic and Frontier Sciences, University of Electronic Science and Technology of China and Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Yun Xu
  - School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China
20
Ghosh P, Bhattacharya M, Patra P, Sharma G, Patra BC, Lee SS, Sharma AR, Chakraborty C. Evaluation and Designing of Epitopic-Peptide Vaccine Against Bunyamwera orthobunyavirus Using M-Polyprotein Target Sequences. Int J Pept Res Ther 2021; 28:5. [PMID: 34867129 PMCID: PMC8634745 DOI: 10.1007/s10989-021-10322-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/17/2021] [Indexed: 11/30/2022]
Abstract
Bunyamwera orthobunyavirus and its serogroup can cause several diseases in humans, cattle, ruminants, and birds. The viral M-polyprotein helps the virus to enter the host body. Therefore, this protein might serve as a potential vaccine target against Bunyamwera orthobunyavirus. The present study applied the immunoinformatics technique to design an epitopic vaccine component that could protect against Bunyamwera infection. Phylogenetic analysis revealed the presence of conserved patterns of M-polyprotein within the viral serogroup. Three epitopes common to both B cells and T cells (YQPTELTRS, YKAHDKEET, and ILGTGTPKF) were identified and merged with a specific linker peptide to construct an active vaccine component. The low atomic contact energy value of the docking complex between human TLR4 (TLR4/MD2 complex) and the vaccine construct confirms the elevated protein–protein binding interaction. Molecular dynamics simulation and normal mode analysis illustrate the docking complex's stability, especially by the higher eigenvalue. In silico cloning of the vaccine construct was applied to amplify the desired vaccine component. Structural allocation of both the vaccine and epitopes also shows the efficacy of the developed vaccine. Hence, the computational research design outcomes support that the peptide-based vaccine construction is a crucial drive target to limit the infection of Bunyamwera orthobunyavirus to an extent.
Affiliation(s)
- Pratik Ghosh
  - Department of Zoology, Vidyasagar University, Midnapore, West Bengal 721102 India
- Manojit Bhattacharya
  - Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha 756020 India
- Prasanta Patra
  - Department of Zoology, Vidyasagar University, Midnapore, West Bengal 721102 India
- Garima Sharma
  - Neuropsychopharmacology and Toxicology Program, College of Pharmacy, Kangwon National University, Chuncheon-si, Republic of Korea
- Bidhan Chandra Patra
  - Department of Zoology, Vidyasagar University, Midnapore, West Bengal 721102 India
- Sang-Soo Lee
  - Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si, 24252 Gangwon-do Republic of Korea
- Ashish Ranjan Sharma
  - Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si, 24252 Gangwon-do Republic of Korea
- Chiranjib Chakraborty
  - Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Barasat-Barrackpore Rd, Kolkata, West Bengal 700126 India
21
Vargas-Castro I, Melero M, Crespo-Picazo JL, Jiménez MDLÁ, Sierra E, Rubio-Guerri C, Arbelo M, Fernández A, García-Párraga D, Sánchez-Vizcaíno JM. Systematic Determination of Herpesvirus in Free-Ranging Cetaceans Stranded in the Western Mediterranean: Tissue Tropism and Associated Lesions. Viruses 2021; 13:v13112180. [PMID: 34834986 PMCID: PMC8621769 DOI: 10.3390/v13112180] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/27/2021] [Revised: 10/22/2021] [Accepted: 10/25/2021] [Indexed: 11/16/2022]
Abstract
The monitoring of herpesvirus infection provides useful information when assessing marine mammals’ health. This paper shows the prevalence of herpesvirus infection (80.85%) in 47 cetaceans stranded on the coast of the Valencian Community, Spain. Of the 966 tissues evaluated, 121 tested positive when employing nested-PCR (12.53%). The largest proportion of herpesvirus-positive tissue samples was in the reproductive system, nervous system, and tegument. Herpesvirus was more prevalent in females, juveniles, and calves. More than half the DNA PCR positive tissues contained herpesvirus RNA, indicating the presence of actively replicating virus. This RNA was most frequently found in neonates. Fourteen unique sequences were identified. Most amplified sequences belonged to the Gammaherpesvirinae subfamily, but a greater variation was found in Alphaherpesvirinae sequences. This is the first report of systematic herpesvirus DNA and RNA determination in free-ranging cetaceans. Nine (19.14%) were infected with cetacean morbillivirus and all of them (100%) were coinfected with herpesvirus. Lesions similar to those caused by herpesvirus in other species were observed, mainly in the skin, upper digestive tract, genitalia, and central nervous system. Other lesions were also attributable to concomitant etiologies or were nonspecific. It is necessary to investigate the possible role of herpesvirus infection in those cases.
Affiliation(s)
- Ignacio Vargas-Castro
  - VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
- Mar Melero
  - VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
  - Division of External Health, Government Delegation in the Community of Madrid, Ministry of Territorial Policy, 28071 Madrid, Spain
- José Luis Crespo-Picazo
  - Research Department, Fundación Oceanogràfic de la Comunitat Valenciana, 46013 Valencia, Spain
- María de los Ángeles Jiménez
  - Department of Animal Medicine and Surgery, Veterinary Faculty, Complutense University of Madrid, 28040 Madrid, Spain
- Eva Sierra
  - Division of Veterinary Histology and Pathology, Institute for Animal Health, Veterinary School, University of Las Palmas de Gran Canaria, 35416 Canary Islands, Spain
- Consuelo Rubio-Guerri
  - VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
  - Department of Pharmacy, Facultad de CC de la Salud, UCH-CEU University, 46113 Valencia, Spain
- Manuel Arbelo
  - Division of Veterinary Histology and Pathology, Institute for Animal Health, Veterinary School, University of Las Palmas de Gran Canaria, 35416 Canary Islands, Spain
- Antonio Fernández
  - Division of Veterinary Histology and Pathology, Institute for Animal Health, Veterinary School, University of Las Palmas de Gran Canaria, 35416 Canary Islands, Spain
- Daniel García-Párraga
  - Research Department, Fundación Oceanogràfic de la Comunitat Valenciana, 46013 Valencia, Spain
- José Manuel Sánchez-Vizcaíno
  - VISAVET Health Surveillance Centre and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040 Madrid, Spain
22
Qu J, Zou X, Cao W, Xu Z, Liang Z. Two new species of Hirsutella (Ophiocordycipitaceae, Sordariomycetes) that are parasitic on lepidopteran insects from China. MycoKeys 2021; 82:81-96. [PMID: 34408539 PMCID: PMC8367965 DOI: 10.3897/mycokeys.82.66927] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/05/2021] [Accepted: 06/13/2021] [Indexed: 11/16/2022]
Abstract
Hirsutella are globally distributed entomopathogenic fungi that offer important economic applications in biological control and biomedicine. Hirsutella was suppressed in favour of Ophiocordyceps following the ending of dual nomenclature for pleomorphic fungi in 2011. Currently, Hirsutella has been resurrected as a genus under Ophiocordycipitaceae. In this study, we introduce two new species of Hirsutella, based on morphological and phylogenetic analyses. Hirsutella flava and H. kuankuoshuiensis are pathogenic on different species of larval Lepidoptera in China. Hirsutella flava primarily differs from related species by its awl-shaped base; long and narrow neck, 24–40.8 × 2.2–2.5 μm; and long and narrow cymbiform or fusoid conidia, 6.5–10 × 2.1–4.3 μm. Hirsutella kuankuoshuiensis has two types of phialides and distinctive clavate or botuliform conidia, 9.9–12.6 × 2.7–4.5 μm. The distinctions amongst the new species and phylogenetic relationships with other Hirsutella species are discussed.
Affiliation(s)
- Jiaojiao Qu
- College of Tea Sciences, Guizhou University, Guiyang, 550025, China
- Xiao Zou
- Institute of Fungal Resources, College of Life Sciences, Guizhou University, Guiyang, 550025, China
- Wei Cao
- Institute of Fungal Resources, College of Life Sciences, Guizhou University, Guiyang, 550025, China
- Zhongshun Xu
- Institute of Fungal Resources, College of Life Sciences, Guizhou University, Guiyang, 550025, China
- Zongqi Liang
- Institute of Fungal Resources, College of Life Sciences, Guizhou University, Guiyang, 550025, China
23
Ramanathan N, Ramamurthy J, Natarajan G. Numerical Characterization of DNA Sequences for Alignment-free Sequence Comparison - A Review. Comb Chem High Throughput Screen 2021; 25:365-380. [PMID: 34382516 DOI: 10.2174/1386207324666210811101437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2020] [Revised: 06/16/2021] [Accepted: 06/24/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Biological macromolecules, namely DNA, RNA, and protein, have their building blocks organized in a particular sequence, and the sequential arrangement encodes the evolutionary history of the organism (species). Hence, biological sequences have been used for studying evolutionary relationships among species. This is usually carried out by multiple sequence alignment (MSA) algorithms. Due to certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review is on alignment-free sequence comparison methods carried out using numerical characterization of DNA sequences. DISCUSSION The graphical representation of DNA sequences by chaos game representation and other 2-dimensional and 3-dimensional methods is discussed. The evolution of numerical characterization from the various graphical representations and the application of the DNA invariants thus computed in phylogenetic analysis are presented. The extension of computing molecular descriptors in chemometrics to the calculation of a new set of DNA invariants and their use in alignment-free sequence comparison in an N-dimensional space and construction of phylogenetic trees is also reviewed. CONCLUSION The phylogenetic trees constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods is now extended to study viral sequences of infectious diseases for the identification of conserved regions to design peptide-based vaccines by combining numerical characterization and graphical representation.
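To make the chaos game representation named in this abstract concrete, here is a minimal Python sketch. It is an illustration only: the corner assignment for A, C, G and T is one common convention and is assumed here, and the function names and the mean-coordinate invariant are hypothetical choices, not taken from the review.

```python
from typing import List, Tuple

# Assumed corner layout of the unit square (a common convention; the
# reviewed methods may use a different assignment).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(seq: str) -> List[Tuple[float, float]]:
    """Map a DNA string to a list of 2-D points.

    Starting at the centre of the unit square, each base moves the
    current point halfway towards the corner assigned to that base.
    Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ignore ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def mean_coordinates(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """A very simple numerical descriptor: the centroid of the CGR points."""
    if not points:
        raise ValueError("no points to summarise")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
```

Two sequences can then be compared without alignment by comparing such descriptors, for example by the Euclidean distance between their centroids; descriptors used in practice are typically richer than this single centroid.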
Affiliation(s)
- Natarajan Ramanathan
- Department of Chemistry, Sri Sarada Niketan College for Women, Karur-639005, Tamil Nadu, India
- Jayalakshmi Ramamurthy
- Department of Computer Science, Sri Sarada Niketan College for Women, Karur-639005, Tamil Nadu, India
- Ganapathy Natarajan
- Department of Mechanical Engineering and Industrial Engineering, University of Wisconsin, Platteville, WI 53818, United States
24
Aguilar-Vega C, Rivera B, Lucientes J, Gutiérrez-Boada I, Sánchez-Vizcaíno JM. A study of the composition of the Obsoletus complex and genetic diversity of Culicoides obsoletus populations in Spain. Parasit Vectors 2021; 14:351. [PMID: 34217330 PMCID: PMC8254917 DOI: 10.1186/s13071-021-04841-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/19/2021] [Accepted: 06/11/2021] [Indexed: 11/10/2022]
Abstract
Background The Culicoides obsoletus species complex (henceforth ‘Obsoletus complex’) is implicated in the transmission of several arboviruses that can cause severe disease in livestock, such as bluetongue, African horse sickness, epizootic hemorrhagic disease and Schmallenberg disease. Thus, this study aimed to increase our knowledge of the composition and genetic diversity of the Obsoletus complex by partial sequencing of the cytochrome c oxidase I (cox1) gene in poorly studied areas of Spain. Methods A study of C. obsoletus populations was carried out using a single-tube multiplex polymerase chain reaction (PCR) assay that was designed to differentiate the Obsoletus complex sibling species Culicoides obsoletus and Culicoides scoticus, based on the partial amplification of the cox1 gene, as well as cox1 georeferenced sequences from Spain available at GenBank. We sampled 117 insects of the Obsoletus complex from six locations and used a total of 238 sequences of C. obsoletus (ss) individuals (sampled here, and from GenBank) from 14 sites in mainland Spain, the Balearic Islands and the Canary Islands for genetic diversity and phylogenetic analyses. Results We identified 90 C. obsoletus (ss), 19 Culicoides scoticus and five Culicoides montanus midges from the six collection sites sampled, and found that the genetic diversity of C. obsoletus (ss) was higher in mainland Spain than in the Canary Islands. The multiplex PCR had limitations in terms of specificity, and no cryptic species within the Obsoletus complex were identified. Conclusions Within the Obsoletus complex, C. obsoletus (ss) was the predominant species in the analyzed sites of mainland Spain. Information about the species composition of the Obsoletus complex could be of relevance for future epidemiological studies when specific aspects of the vector competence and capacity of each species have been identified. Our results indicate that the intraspecific divergence is higher in C. obsoletus (ss) northern populations, and demonstrate the isolation of C. obsoletus (ss) populations of the Canary Islands.
Supplementary Information The online version contains supplementary material available at 10.1186/s13071-021-04841-z.
Affiliation(s)
- Cecilia Aguilar-Vega
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
- Belén Rivera
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
- Javier Lucientes
- Department of Animal Pathology (Animal Health), Faculty of Veterinary Medicine, AgriFood Institute of Aragón IA2, University of Zaragoza, Zaragoza, Spain
- Isabel Gutiérrez-Boada
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
- José Manuel Sánchez-Vizcaíno
- Animal Health Department, Faculty of Veterinary Medicine, VISAVET Health Surveillance Centre, Complutense University of Madrid, Madrid, Spain
25
Gortázar C, Barroso-Arévalo S, Ferreras-Colino E, Isla J, de la Fuente G, Rivera B, Domínguez L, de la Fuente J, Sánchez-Vizcaíno JM. Natural SARS-CoV-2 Infection in Kept Ferrets, Spain. Emerg Infect Dis 2021; 27:1994-1996. [PMID: 34152974 PMCID: PMC8237878 DOI: 10.3201/eid2707.210096] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Indexed: 12/26/2022]
Abstract
We found severe acute respiratory syndrome coronavirus 2 RNA in 6 (8.4%) of 71 ferrets in central Spain and isolated and sequenced virus from 1 oral and 1 rectal swab specimen. Natural infection occurs in kept ferrets when virus circulation among humans is high. However, small ferret collections probably cannot maintain virus circulation.
26
Krebs FS, Zoete V, Trottet M, Pouchon T, Bovigny C, Michielin O. Swiss-PO: a new tool to analyze the impact of mutations on protein three-dimensional structures for precision oncology. NPJ Precis Oncol 2021; 5:19. [PMID: 33737716 PMCID: PMC7973488 DOI: 10.1038/s41698-021-00156-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/28/2020] [Accepted: 02/04/2021] [Indexed: 12/12/2022]
Abstract
Swiss-PO is a new web tool to map gene mutations on the 3D structure of corresponding proteins and to intuitively assess the structural implications of protein variants for precision oncology. Swiss-PO is constructed around a manually curated database of 3D structures, variant annotations, and sequence alignments, for a list of 50 genes taken from the Ion AmpliSeq™ Custom Cancer Hotspot Panel. The website was designed to guide users in the choice of the most appropriate structure to analyze regarding the mutated residue, the role of the protein domain it belongs to, or the drug that could be selected to treat the patient. The importance of the mutated residue for the structure and activity of the protein can be assessed based on the molecular interactions exchanged with neighbor residues in 3D within the same protein or between different biomacromolecules, its conservation in orthologs, or the known effect of reported mutations in its 3D or sequence-based vicinity. Swiss-PO is available free of charge and without login at https://www.swiss-po.ch.
Affiliation(s)
- Fanny S Krebs
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Vincent Zoete
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Maxence Trottet
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Timothée Pouchon
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Christophe Bovigny
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Olivier Michielin
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Department of Oncology, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
27
Akand EH, Murray JM. NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences. BMC Bioinformatics 2021; 22:54. [PMID: 33557755 PMCID: PMC7869453 DOI: 10.1186/s12859-020-03901-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2020] [Accepted: 11/23/2020] [Indexed: 08/29/2023]
Abstract
BACKGROUND The high variability in envelope regions of some viruses such as HIV allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for multiple sequence alignment (MSA) methods that provide the first step in their analysis. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences. RESULTS We developed an automated library building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences as well as a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum of pair scores, column score, and similarity heat maps, NGlyAlign+Dialign proved superior to methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. As well as this application to HIV, our method can be used for other highly variable glycoproteins such as hepatitis C virus envelope. CONCLUSIONS NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0.
Affiliation(s)
- Elma H Akand
- School of Mathematics and Statistics, UNSW, Sydney, NSW, Australia
- John M Murray
- School of Mathematics and Statistics, UNSW, Sydney, NSW, Australia
28
Interactomes: Experimental and In Silico Approaches. Advances in Experimental Medicine and Biology 2021; 1346:107-117. [DOI: 10.1007/978-3-030-80352-0_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
29
Vargas-Castro I, Crespo-Picazo JL, Rivera-Arroyo B, Sánchez R, Marco-Cabedo V, Jiménez-Martínez MÁ, Fayos M, Serdio Á, García-Párraga D, Sánchez-Vizcaíno JM. Alpha- and gammaherpesviruses in stranded striped dolphins (Stenella coeruleoalba) from Spain: first molecular detection of gammaherpesvirus infection in central nervous system of odontocetes. BMC Vet Res 2020; 16:288. [PMID: 32787898 PMCID: PMC7425534 DOI: 10.1186/s12917-020-02511-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/10/2020] [Accepted: 08/06/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Herpesvirus infections in cetaceans have always been attributed to the Alphaherpesvirinae and Gammaherpesvirinae subfamilies. To date, gammaherpesviruses have not been reported in the central nervous system of odontocetes. CASE PRESENTATION A mass stranding of 14 striped dolphins (Stenella coeruleoalba) occurred in Cantabria (Spain) on 18th May 2019. Tissue samples were collected and tested for herpesvirus using nested polymerase chain reaction (PCR), and for cetacean morbillivirus using reverse transcription-PCR. Cetacean morbillivirus was not detected in any of the animals, while gammaherpesvirus was detected in nine male and one female dolphins. Three of these males were coinfected by alphaherpesviruses. Alphaherpesvirus sequences were detected in the cerebrum, spinal cord and tracheobronchial lymph node, while gammaherpesvirus sequences were detected in the cerebrum, cerebellum, spinal cord, pharyngeal tonsils, mesenteric lymph node, tracheobronchial lymph node, lung, skin and penile mucosa. Macroscopic and histopathological post-mortem examinations did not unveil the potential cause of the mass stranding event or any evidence of severe infectious disease in the dolphins. The only observed lesions that may be associated with herpesvirus were three cases of balanitis and one penile papilloma. CONCLUSIONS To the authors' knowledge, this is the first report of gammaherpesvirus infection in the central nervous system of odontocete cetaceans. This raises new questions for future studies about how gammaherpesviruses reach the central nervous system and how infection manifests clinically.
Affiliation(s)
- Ignacio Vargas-Castro
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
- Belén Rivera-Arroyo
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
- Rocío Sánchez
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
- Manena Fayos
- Centro de Recuperación de Fauna Silvestre de Cantabria, 39690, Santander, Spain; Tragsatec, 39005, Santander, Spain
- Ángel Serdio
- Dirección General de Biodiversidad, Medio Ambiente y Cambio Climático, 39011, Santander, Spain
- José Manuel Sánchez-Vizcaíno
- VISAVET Center and Animal Health Department, Veterinary School, Complutense University of Madrid, 28040, Madrid, Spain
30
Hanf ZR, Chavez AS. A Comprehensive Multi-Omic Approach Reveals a Relatively Simple Venom in a Diet Generalist, the Northern Short-Tailed Shrew, Blarina brevicauda. Genome Biol Evol 2020; 12:1148-1166. [PMID: 32520994 PMCID: PMC7486961 DOI: 10.1093/gbe/evaa115] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 06/05/2020] [Indexed: 12/15/2022]
Abstract
Animals that use venom to feed on a wide diversity of prey may evolve a complex mixture of toxins to target a variety of physiological processes and prey-defense mechanisms. Blarina brevicauda, the northern short-tailed shrew, is one of few venomous mammals, and is also known to eat evolutionarily divergent prey. Despite their complex diet, earlier proteomic and transcriptomic studies of this shrew's venom have only identified two venom proteins. Here, we investigated with comprehensive molecular approaches whether B. brevicauda venom is more complex than previously understood. We generated de novo assemblies of a B. brevicauda genome and submaxillary-gland transcriptome, as well as sequenced the salivary proteome. Our findings show that B. brevicauda's venom composition is simple relative to their broad diet and is likely limited to seven proteins from six gene families. Additionally, we explored expression levels and rate of evolution of these venom genes and the origins of key duplications that led to toxin neofunctionalization. We also found three proteins that may be involved in endogenous self-defense. The possible synergism of the toxins suggests that vertebrate prey may be the main target of the venom. Further functional assays for all venom proteins on both vertebrate and invertebrate prey would provide further insight into the ecological relevance of venom in this species.
Affiliation(s)
- Zachery R Hanf
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University
- Andreas S Chavez
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University
- Translational Data Analytics Institute, The Ohio State University
31
Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020; 35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Multiple sequence alignment programs have proved to be very useful and have already been evaluated in the literature, unlike alignment programs based on structure or on both sequence and structure. In the present article we evaluate the added value provided by considering structures. RESULTS We compared the multiple alignments resulting from 25 programs, based on sequence, structure or both, to reference alignments deposited in five databases (BALIBASE 2 and 3, HOMSTRAD, OXBENCH and SISYPHUS). On the whole, the structure-based methods compute more reliable alignments than the sequence-based ones, and even than the sequence+structure-based programs, whatever the database. Two programs lead, MAMMOTH and MATRAS; nevertheless, the performances of MUSTANG, MATT, 3DCOMB, TCOFFEE+TM_ALIGN and TCOFFEE+SAP are better for some alignments. The advantage of structure-based methods increases at low levels of sequence identity, and for residues that are buried or in regular secondary structures. Concerning gap management, sequence-based programs set fewer gaps than structure-based programs. Concerning the databases, the alignments of the manually built databases are more challenging for the programs. AVAILABILITY AND IMPLEMENTATION All data and results presented in this study are available at: http://wwwabi.snv.jussieu.fr/people/mathilde/download/AliMulComp/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, Paris, France
- Jacques Chomilier
- Sorbonne Université, MNHN, CNRS, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), BiBiP, Paris, France
32
Müller T, Miladi M, Hutter F, Hofacker I, Will S, Backofen R. The locality dilemma of Sankoff-like RNA alignments. Bioinformatics 2020; 36:i242-i250. [PMID: 32657398 PMCID: PMC7355259 DOI: 10.1093/bioinformatics/btaa431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Motivation Elucidating the functions of non-coding RNAs by homology has been strongly limited due to fundamental computational and modeling issues. While existing simultaneous alignment and folding (SA&F) algorithms successfully align homologous RNAs with precisely known boundaries (global SA&F), the more pressing problem of identifying new classes of homologous RNAs in the genome (local SA&F) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted. We hypothesize that local SA&F approaches are compromised this way due to a score bias, which is caused by the contribution of RNA structure similarity to their overall alignment score. Results In the light of this hypothesis, we study pairwise local SA&F for the first time systematically—based on a novel local RNA alignment benchmark set and quality measure. First, we vary the relative influence of structure similarity compared to sequence similarity. Putting more emphasis on the structure component leads to overestimating the length of local alignments. This clearly shows the bias of current scores and strongly hints at the structure component as its origin. Second, we study the interplay of several important scoring parameters by learning parameters for local and global SA&F. The divergence of these optimized parameter sets underlines the fundamental obstacles for local SA&F. Third, by introducing a position-wise correction term in local SA&F, we constructively solve its principal issues. Availability and implementation The benchmark data, detailed results and scripts are available at https://github.com/BackofenLab/local_alignment. The RNA alignment tool LocARNA, including the modifications proposed in this work, is available at https://github.com/s-will/LocARNA/releases/tag/v2.0.0RC6. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Teresa Müller
- Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany
- Milad Miladi
- Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany
- Frank Hutter
- Machine Learning Lab, Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Ivo Hofacker
- Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Vienna, Wien 1090, Austria
- Sebastian Will
- Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Vienna, Wien 1090, Austria; Bioinformatics Group AMIBio, LIX-Laboratoire d'Informatique d'École Polytechnique, IPP, Palaiseau 91120, France
- Rolf Backofen
- Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg 79104, Germany
33
Naznooshsadat E, Elham P, Ali SZ. FAME: fast and memory efficient multiple sequences alignment tool through compatible chain of roots. Bioinformatics 2020; 36:3662-3668. [PMID: 32170927 DOI: 10.1093/bioinformatics/btaa175] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/25/2019] [Revised: 02/10/2020] [Accepted: 03/12/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can only produce short multiple alignments in an acceptable time. When researchers face genome-size sequences, multiple alignment requires huge processing space and time. Accordingly, a method that can align genome-size sequences rapidly and precisely has a great effect, especially on the analysis of very long alignments. Herein, we propose an efficient method, called FAME, which vertically divides sequences at the places where they have common areas; then they are arranged in consecutive order. These common areas are then shifted and placed under each other, and the subsequences between them are aligned using any existing MSA tool. RESULTS The results demonstrate that the combination of FAME and the MSA methods, deploying minimizers, can be executed on a personal computer and finely aligns long sequences with a much higher sum-of-pair (SP) score than the standalone MSA tools. As we select genomic datasets with longer length, the SP score of the combined methods gradually improves. The calculated computational complexity of the methods supports the results, in that combining FAME and the MSA tools leads to at least four times faster execution on the datasets. AVAILABILITY AND IMPLEMENTATION The source code and all datasets and run-parameters are freely accessible at http://github.com/naznoosh/msa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Etminan Naznooshsadat
- Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
- Parvinnia Elham
- Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
- Sharifi-Zarchi Ali
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
34
Bankapur S, Patil N. ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system. J Bioinform Comput Biol 2020; 18:2050005. [PMID: 32372711 DOI: 10.1142/s0219720020500055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Aligning more than two biological sequences is termed multiple sequence alignment (MSA). In the analysis of biological sequences, MSA is one of the primary activities, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, all the gene and protein databases are consistently loaded with a vast amount of raw sequence data which are neither analyzed nor annotated. To analyze these growing volumes of raw sequences, there is a strong need for computationally efficient (polynomial time) models with accurate alignment. In this study, a progressive-based alignment model is proposed, named ProgSIO-MSA, which consists of an effective scoring system and an optimization framework. The proposed scoring system aligns sequences effectively using the combination of two scoring strategies, i.e. Look Back Ahead, that scores a residue pair dynamically based on the status information of the previous position to improve the sum-of-pair score, and Position-Residue-Specific Dynamic Gap Penalty, that dynamically penalizes a gap using a mutation matrix on the basis of the residue and its position information. The proposed single iterative optimization (SIO) framework identifies and optimizes the local optima trap to improve the alignment quality. The proposed model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, i.e. BAliBASE and SABmark. The alignment quality (biological accuracy) of the proposed model is increased by 17.7% on the BAliBASE dataset. The proposed model's efficiency is compared with state-of-the-art models using time complexity as well as runtime analysis. Wilcoxon signed-rank statistical test results concluded that the proposed model significantly outperformed progressive-based state-of-the-art models in alignment quality.
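For readers unfamiliar with the sum-of-pair score mentioned in this and the preceding abstract, the following Python sketch shows the classical scoring-function form: every pair of rows is scored in every column and the results are summed. The match, mismatch and gap values are placeholders chosen for illustration, not the parameters used by ProgSIO-MSA or FAME, and benchmark suites such as BAliBASE report a different, reference-based SP measure.

```python
from itertools import combinations
from typing import Sequence


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of aligned symbols ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: Sequence[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum pair_score over all row pairs in all columns of an alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, match, mismatch, gap)
    return total
```

In practice the fixed match and mismatch values would be replaced by a substitution matrix lookup and the gap value by an affine or position-specific penalty, which is the part the abstract describes as dynamic.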
Affiliation(s)
- Sanjay Bankapur
- Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangalore 575025, Karnataka, India
- Nagamma Patil
- Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangalore 575025, Karnataka, India
35
Paul L, Mudogo CN, Mtei KM, Machunda RL, Ntie-Kang F. A computer-based approach for developing linamarase inhibitory agents. Physical Sciences Reviews 2020. [DOI: 10.1515/psr-2019-0098] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
Cassava is a strategic crop, especially for developing countries. However, the presence of cyanogenic compounds in cassava products limits the proper utilization of its nutrients. The poor availability of solved and elucidated structures in the Protein Data Bank limits the full understanding of the enzyme, how to inhibit it, and its applications in different fields. There is a need to solve the three-dimensional (3-D) structure of linamarase from cassava. The structural elucidation will allow the development of a competitive inhibitor and various industrial applications of the enzyme. The goal of this review is to summarize and present the available 3-D modeling structure of the linamarase enzyme using different computational strategies. This approach could help in determining the structure of linamarase and later guide the structure elucidation in silico and experimentally.
Affiliation(s)
- Lucas Paul
- The Department of Materials and Energy Science & Engineering, The Nelson Mandela African Institution of Science and Technology, P.O. Box 447, Arusha, Tanzania
- Department of Chemistry, Dar es Salaam University College of Education, P.O. Box 2329, 255 Dar es Salaam, Tanzania
- Celestin N. Mudogo
- Biochemistry and Molecularbiology, University of Hamburg Institute of Biochemistry and Molecularbiology, Hamburg, Germany
- Department of Basic Sciences, School of Medicine, University of Kinshasa, Kinshasa, Congo (Democratic Republic of the)
- Kelvin M. Mtei
- The Department of Water and Environmental Science and Engineering, The Nelson Mandela African Institution of Science and Technology, P.O. Box 447, Arusha, Tanzania
- Revocatus L. Machunda
- The Department of Water and Environmental Science and Engineering, The Nelson Mandela African Institution of Science and Technology, P.O. Box 447, Arusha, Tanzania
- Fidele Ntie-Kang
- Department of Pharmaceutical Chemistry, Martin-Luther University Halle-Wittenberg, Wolfgang-Langenbeck Str. 4, Halle (Saale) 06120, Germany
- Department of Informatics and Chemistry, University of Chemistry and Technology Prague, Technická 5, Prague 6, Dejvice 166 28, Czech Republic
- Department of Chemistry, University of Buea, P. O. Box 63, Buea, Cameroon
|
36
|
Bayegan AH, Clote P. RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment. PLoS One 2020; 15:e0227177. [PMID: 31978147 PMCID: PMC6980424 DOI: 10.1371/journal.pone.0227177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/10/2018] [Accepted: 12/13/2019] [Indexed: 11/19/2022]
Abstract
Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
Affiliation(s)
- Amir H. Bayegan
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
- Peter Clote
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
|
37
|
Chandrasekaran M, Lee JM, Ye BM, Jung SM, Kim J, Kim JW, Chun SC. Isolation and Characterization of Avirulent and Virulent Strains of Agrobacterium tumefaciens from Rose Crown Gall in Selected Regions of South Korea. Plants (Basel, Switzerland) 2019; 8:E452. [PMID: 31731525 PMCID: PMC6918265 DOI: 10.3390/plants8110452] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/26/2019] [Revised: 10/18/2019] [Accepted: 10/24/2019] [Indexed: 01/22/2023]
Abstract
Agrobacterium tumefaciens is a plant pathogen that causes crown gall disease in various hosts across kingdoms. In the present study, five regions (Wonju, Jincheon, Taean, Suncheon, and Kimhae) of South Korea were chosen to isolate A. tumefaciens strains on roses and assess their opine metabolism (agrocinopine, nopaline, and octopine) genes based on PCR amplification. These isolated strains were confirmed as Agrobacterium using morphological, biochemical, and 16S rDNA analyses; and pathogenicity tests, including the growth characteristics of the white colony appearance on ammonium sulfate glucose minimal media, enzyme activities, 16S rDNA sequence alignment, and pathogenicity on tomato (Solanum lycopersicum). Carbon utilization, biofilm formation, tumorigenicity, and motility assays were performed to demarcate opine metabolism genes. Of 87 isolates, 18 pathogenic isolates were affirmative for having opine plasmid genes. Most of these isolates showed the presence of an agrocinopine type of carbon utilization. Two isolates showed nopaline types. However, none of these isolates showed octopine metabolic genes. The objectives of the present study were to isolate and confirm virulent strains from rose crown galls grown in the different regions of Korea and characterize their physiology and opine types. This is the first report to describe the absence of the octopine type inciting the crown gall disease of rose in South Korea.
Affiliation(s)
- Murugesan Chandrasekaran
- Department of Food Science and Biotechnology, Sejong University, Gwangjin-gu, Seoul 05006, Korea
- Jong Moon Lee
- Department of Environmental Health Science, Konkuk University, Gwangjin-gu, Seoul-143 701, Korea
- Bee-Moon Ye
- Department of Environmental Health Science, Konkuk University, Gwangjin-gu, Seoul-143 701, Korea
- So Mang Jung
- Department of Environmental Health Science, Konkuk University, Gwangjin-gu, Seoul-143 701, Korea
- Jinwoo Kim
- Institute of Agriculture & Life Science and Division of Applied Life Science, Gyeongsang National University, Jinju 52828, Korea
- Jin-Won Kim
- Department of Environmental Horticulture, University of Seoul, Seoul 02504, Korea
- Se Chul Chun
- Department of Environmental Health Science, Konkuk University, Gwangjin-gu, Seoul-143 701, Korea
|
38
|
Sievers F, Higgins DG. QuanTest2: benchmarking multiple sequence alignments using secondary structure prediction. Bioinformatics 2019; 36:90-95. [PMID: 31292629 PMCID: PMC9881607 DOI: 10.1093/bioinformatics/btz552] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/27/2019] [Revised: 06/17/2019] [Accepted: 07/09/2019] [Indexed: 02/02/2023]
Abstract
MOTIVATION Secondary structure prediction accuracy (SSPA) in the QuanTest benchmark can be used to measure accuracy of a multiple sequence alignment. SSPA correlates well with the sum-of-pairs (SP) score if the results are averaged over many alignments, but not on an alignment-by-alignment basis. This is due to a sub-optimal selection of reference and non-reference sequences in QuanTest. RESULTS We develop an improved strategy for selecting reference and non-reference sequences for a new benchmark, QuanTest2. In QuanTest2, SSPA and SP correlate better on an alignment-by-alignment basis than in QuanTest. Guide-trees for QuanTest2 are more balanced with respect to reference sequences than in QuanTest. QuanTest2 scores correlate well with other well-established benchmarks. AVAILABILITY AND IMPLEMENTATION QuanTest2 is available at http://bioinf.ucd.ie/quantest2.tar and comprises reference and non-reference sequence sets and a scoring script. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fabian Sievers
- Conway Institute, UCD School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
|
39
|
Lambert MÈ, Arsenault J, Delisle B, Audet P, Poljak Z, D'Allaire S. Impact of alignment algorithm on the estimation of pairwise genetic similarity of porcine reproductive and respiratory syndrome virus (PRRSV). BMC Vet Res 2019; 15:135. [PMID: 31068211 PMCID: PMC6505299 DOI: 10.1186/s12917-019-1890-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/09/2018] [Accepted: 04/29/2019] [Indexed: 12/19/2022]
Abstract
Background Porcine reproductive and respiratory syndrome (PRRS) is a major threat to the swine industry. It is caused by the PRRS virus (PRRSV). Determination and comparison of the nucleotide sequences of PRRSV strains provides useful information in support of control initiatives or epidemiological studies on transmission patterns. The alignment of sequences is the first step in analyzing sequence data, with multiple algorithms being available, but little is known on the impact of this methodological choice. Here, a study was conducted to evaluate the impact of different alignment algorithms on the resulting aligned sequence dataset and on practical issues when applied to a large field database of PRRSV open reading frame (ORF) 5 sequences collected in Quebec, Canada, from 2010 to 2014. Five multiple sequence alignment programs were compared: Clustal W, Clustal Omega, Muscle, T-Coffee and MAFFT. Results The resulting alignments showed very similar results in terms of average pairwise genetic similarity, proportion of pairwise comparisons having ≥97.5% genetic similarity and sum of pairs (SP) score, except for T-Coffee where increased length of aligned datasets as well as limitation to handle large datasets were observed. Conclusions Based on efficiency at minimizing the number of gaps in different dataset sizes with default open gap values as well as the capability to handle a large number of sequences in a timely manner, the use of Clustal Omega might be recommended for the management of PRRSV extensive database for both research and surveillance purposes.
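The two similarity summaries named in this abstract (average pairwise genetic similarity and the proportion of pairs at or above 97.5%) can be sketched as follows. This is an illustration only: the choice to skip gapped positions when computing identity is an assumption, and the abstract does not state how the study handled gaps.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical positions between two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def similarity_summary(aligned, threshold=0.975):
    """Return (mean pairwise identity, share of pairs >= threshold)."""
    values = [pairwise_identity(a, b) for a, b in combinations(aligned, 2)]
    if not values:
        raise ValueError("need at least two aligned sequences")
    mean = sum(values) / len(values)
    share = sum(v >= threshold for v in values) / len(values)
    return mean, share


if __name__ == "__main__":
    print(similarity_summary(["ACGT", "ACGT", "ACGA"]))
```

Because both summaries are computed from the aligned rows, running the same function on the output of different aligners gives a direct way to compare them on one dataset.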
Affiliation(s)
- Marie-Ève Lambert
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
- Julie Arsenault
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
- Benjamin Delisle
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
- Pascal Audet
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
- Zvonimir Poljak
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
- Sylvie D'Allaire
- Laboratoire d'épidémiologie et de médecine porcine (LEMP), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada; Swine and Poultry Infectious Diseases Research Center (CRIPA), Faculty of Veterinary Medicine, Université de Montréal, St. Hyacinthe, Quebec, Canada
|
40
|
Nute M, Saleh E, Warnow T. Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets. Syst Biol 2019; 68:396-411. [PMID: 30329135 PMCID: PMC6472439 DOI: 10.1093/sysbio/syy068] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/18/2018] [Revised: 09/27/2018] [Accepted: 10/11/2018] [Indexed: 01/15/2023]
Abstract
The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities. [BAli-Phy; homology; multiple sequence alignment; protein sequences; structural alignment.]
Affiliation(s)
- Michael Nute
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
- Ehsan Saleh
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1205 W. Clark St., Urbana, IL 61801, USA; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
41
|
He J, Zhao H, Cheng Z, Ke Y, Liu J, Ma H. Evolution Analysis of the Fasciclin-Like Arabinogalactan Proteins in Plants Shows Variable Fasciclin-AGP Domain Constitutions. Int J Mol Sci 2019; 20:E1945. [PMID: 31010036 PMCID: PMC6514703 DOI: 10.3390/ijms20081945] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 02/22/2019] [Revised: 04/17/2019] [Accepted: 04/19/2019] [Indexed: 01/03/2023]
Abstract
The fasciclin-like arabinogalactan proteins (FLAs) play important roles in plant development and adaptation to the environment. FLAs contain both fasciclin domains and arabinogalactan protein (AGP) regions, which have been identified in several plants. The evolutionary history of this gene family in plants is still undiscovered. In this study, we identified the FLA gene family in 13 plant species covering major lineages of plants using bioinformatics methods. A total of 246 FLA genes are identified with gene copy numbers ranging from one (Chondrus crispus) to 49 (Populus trichocarpa). These FLAs are classified into seven groups, mainly based on the phylogenetic analysis of plant FLAs. All FLAs in land plants contain one or two fasciclin domains, while in algae, several FLAs contain four or six fasciclin domains. It has been proposed that there was a divergence event, represented by the reduced number of fasciclin domains from algae to land plants in evolutionary history. Furthermore, introns in FLA genes are lost during plant evolution, especially from green algae to land plants. Moreover, it is found that gene duplication events, including segmental and tandem duplications are essential for the expansion of FLA gene families. The duplicated gene pairs in FLA gene family mainly evolve under purifying selection. Our findings give insight into the origin and expansion of the FLA gene family and help us understand their functions during the process of evolution.
Affiliation(s)
- Jiadai He
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China
- Hua Zhao
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China
- Zhilu Cheng
- College of Landscape Architecture and Arts, Northwest A&F University, Xianyang 712100, Shaanxi, China
- Yuwei Ke
- College of Life Sciences, Northwest A&F University, Xianyang 712100, Shaanxi, China
- Jiaxi Liu
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China
- Haoli Ma
- College of Agronomy, Northwest A&F University, Xianyang 712100, Shaanxi, China
|
42
|
Exploring the sequence, function, and evolutionary space of protein superfamilies using sequence similarity networks and phylogenetic reconstructions. Methods Enzymol 2019; 620:315-347. [PMID: 31072492 DOI: 10.1016/bs.mie.2019.03.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
Integrative computational methods can facilitate the discovery of new protein functions and enzymatic reactions by enabling the observation and investigation of complex sequence-structure-function and evolutionary relationships within protein superfamilies. Here, we highlight the use of sequence similarity networks (SSNs) and phylogenetic reconstructions to map the functional divergence and evolutionary history of protein superfamilies. We exemplify this approach using the nitroreductase (NTR) flavoenzyme superfamily, demonstrating that SSN investigations can provide a rapid and effective means to classify groups of proteins, expose sequence similarity relationships across the global scale of a protein superfamily, and efficiently support detailed phylogenetic analyses. Integration of such approaches with systematic experimental characterization will expand our understanding of the functional diversity of enzymes, their evolution, and their associated physiological roles.
|
43
|
Ashkenazy H, Sela I, Levy Karin E, Landan G, Pupko T. Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction. Syst Biol 2019; 68:117-130. [PMID: 29771363 PMCID: PMC6657586 DOI: 10.1093/sysbio/syy036] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/12/2017] [Revised: 05/07/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023]
Abstract
The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet, inferred MSAs were shown to be inaccurate and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach, in which instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether or not the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful for many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
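The concatenation step described in this abstract can be sketched in a few lines. The sketch assumes each alternative MSA is given as a mapping from sequence name to aligned row; how the alternative MSAs are generated is not shown, and the function name `concatenate_msas` is hypothetical rather than part of the authors' software.

```python
def concatenate_msas(msas):
    """Join alternative alignments of the same sequences column-wise.

    msas: list of dicts mapping sequence name -> aligned row.
    Returns one dict whose rows are the concatenation of the input rows.
    """
    if not msas:
        raise ValueError("need at least one alignment")
    names = list(msas[0])
    for msa in msas:
        if set(msa) != set(names):
            raise ValueError("all alignments must contain the same sequences")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within an alignment must have equal length")
    return {name: "".join(msa[name] for msa in msas) for name in names}


if __name__ == "__main__":
    msa_a = {"s1": "AC-G", "s2": "ACTG"}
    msa_b = {"s1": "ACG-", "s2": "ACTG"}
    print(concatenate_msas([msa_a, msa_b]))
```

The resulting rows can be passed to a tree inference program like any other alignment, which is what lets columns absent from a single MSA still contribute signal.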
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Itamar Sela
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Giddy Landan
- Institute of Microbiology, Christian-Albrechts-University of Kiel, 24118 Kiel, Germany
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
|
44
|
Abstract
Background Protein sequence alignment analyses have become a crucial step in many bioinformatics studies over the past decades. Multiple sequence alignment (MSA) and pairwise sequence alignment (PSA) are the two major approaches to sequence alignment. Earlier benchmark studies revealed drawbacks of MSA methods on nucleotide sequence alignments. To test whether similar drawbacks also affect protein sequence alignment analyses, we propose a new benchmark framework for protein clustering based on cluster validity. This new framework directly reflects the biological ground truth of the application scenarios that adopt sequence alignments, and evaluates alignment quality according to the achievement of the biological goal rather than by comparison at the sequence level only, which averts the biases introduced by alignment scores or manual alignment templates. In contrast to earlier studies, we calculate the cluster validity score based on sequence distances instead of clustering results. This strategy avoids the influence of different clustering methods and thus makes the results more dependable. Results PSA methods performed better than MSA methods on most of the BAliBASE benchmark datasets. Analyses on the 80 re-sampled benchmark datasets, constructed by randomly choosing 90% of each dataset 10 times, showed similar results. Conclusions These results confirm that the drawbacks of MSA methods revealed at the nucleotide level also exist in protein sequence alignment analyses and affect the accuracy of results. Electronic supplementary material The online version of this article (10.1186/s12859-018-2524-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yingying Wang
- Research Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences, Shenzhen, China
- Hongyan Wu
- Research Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences, Shenzhen, China
- Yunpeng Cai
- Research Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences, Shenzhen, China
|
45
|
Abstract
Pervasive application of CRISPR-Cas systems in genome editing has prompted an increase in both interest and necessity to further elucidate existing systems as well as discover putative novel systems. The ubiquity and power of current computational platforms have made in silico approaches to CRISPR-Cas identification and characterization accessible to a wider audience and increasingly amenable for processing extensive data sets. Here, we describe in silico methods for predicting and visualizing notable features of CRISPR-Cas systems, including Cas domain determination, CRISPR array visualization, and inference of the protospacer-adjacent motif. The efficiency of these tools enables rapid exploration of CRISPR-Cas diversity across prokaryotic genomes and supports scalable analysis of large genomic data sets.
Affiliation(s)
- Matthew A Nethery
- Genomic Sciences Graduate Program, North Carolina State University, Raleigh, NC, United States; Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, United States
- Rodolphe Barrangou
- Genomic Sciences Graduate Program, North Carolina State University, Raleigh, NC, United States; Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, United States
|
46
|
Dijkstra M, Bawono P, Abeln S, Feenstra KA, Fokkink W, Heringa J. Motif-Aware PRALINE: Improving the alignment of motif regions. PLoS Comput Biol 2018; 14:e1006547. [PMID: 30383764 PMCID: PMC6233922 DOI: 10.1371/journal.pcbi.1006547] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/11/2018] [Revised: 11/13/2018] [Accepted: 10/05/2018] [Indexed: 11/21/2022]
Abstract
Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems. The most important functional parts of proteins are often small—but very specific—sequence motifs. Moreover, these motifs tend to be strongly conserved during evolution due to their functional role. Nevertheless, when trying to align protein sequences of the same family, it is often very difficult to align such motifs using standard multiple sequence alignment methods. Aligning functional residues correctly is essential to detect motif conservation, which can be used to filter out spuriously occurring motifs. Additionally, many downstream analyses, such as phylogenetics, are strongly reliant on alignment quality. 
We have developed a sequence alignment program named Motif-Aware PRALINE (MA-PRALINE) that incorporates information about motifs explicitly. Motifs are provided to MA-PRALINE in the PROSITE pattern syntax; it then scans the input sequences for instances of the pattern and provides a score bonus to matching sequence positions. Our method provides a reproducible alternative to editing alignments by hand in order to account for motif conservation, which is a tedious and error-prone process. We will show that MA-PRALINE allows the alignment of motif-rich regions to be fine-tuned while not degrading the rest of the alignment. MA-PRALINE is available on GitHub as open source software; this allows it to be easily tailored to similar problems. We apply MA-PRALINE on the HIV-1 envelope glycoprotein (gp120) to get an improved alignment of the N-terminal glycosylation motifs. The presence of these motifs is essential for the virus in evading the immune response of the host.
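The motif-scanning idea described above (scan each input sequence for a PROSITE-style pattern and give matching positions a score bonus) can be sketched as follows. This is not MA-PRALINE's code: the converter handles only a small subset of the PROSITE syntax, the function names are hypothetical, and the flat `bonus` value is a stand-in for the substitution-matrix boost controlled by α in the paper.

```python
import re

# One pattern element: x, a residue, [allowed], or {excluded},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def motif_positions(sequence, pattern):
    """Sorted indices covered by any (possibly overlapping) motif match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    covered = set()
    for m in regex.finditer(sequence):
        covered.update(range(m.start(1), m.end(1)))
    return sorted(covered)


def bonus_profile(sequence, pattern, bonus=1.0):
    """Per-position bonus: `bonus` inside a motif match, 0.0 elsewhere."""
    hits = set(motif_positions(sequence, pattern))
    return [bonus if i in hits else 0.0 for i in range(len(sequence))]


if __name__ == "__main__":
    # N-glycosylation site pattern in PROSITE syntax.
    print(motif_positions("MNGSAANPTA", "N-{P}-[ST]-{P}"))
```

In an aligner, a profile like this would be added to the pairwise score of two positions only when both lie inside a motif match, so that motif instances attract each other during dynamic programming.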
Affiliation(s)
- Maurits Dijkstra
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Punto Bawono
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- K. Anton Feenstra
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Wan Fokkink
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
|
47
|
Chaabane L. A hybrid solver for protein multiple sequence alignment problem. J Bioinform Comput Biol 2018; 16:1850015. [DOI: 10.1142/s0219720018500154] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
In this work, a novel hybrid model called PSOSA is proposed for solving the multiple sequence alignment (MSA) problem. The approach combines the particle swarm optimization (PSO) algorithm with the simulated annealing (SA) technique. In PSOSA, PSO is used for global search, but it is easily trapped in local optima and may converge prematurely. SA is therefore incorporated as a local improvement step to escape local optima and intensify the search in local regions, improving solution quality. Numerical results on the BAliBASE benchmark show the effectiveness of the proposed method and its ability to achieve good-quality solutions compared with those given by other existing methods.
Affiliation(s)
- Lamiche Chaabane
- Department of Computer Science, Mohamed Boudiaf University, BP. 166 M’sila 28000, Algeria
|
48
|
|
49
|
Nethery MA, Barrangou R. CRISPR Visualizer: rapid identification and visualization of CRISPR loci via an automated high-throughput processing pipeline. RNA Biol 2018; 16:577-584. [PMID: 30130453 DOI: 10.1080/15476286.2018.1493332] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/19/2022]
Abstract
A CRISPR locus, defined by an array of repeat and spacer elements, constitutes a genetic record of the ceaseless battle between bacteria and viruses, showcasing the genomic integration of spacers acquired from invasive DNA. In particular, iterative spacer acquisitions represent unique evolutionary histories and are often useful for high-resolution bacterial genotyping, including comparative analysis of closely related organisms, clonal lineages, and clinical isolates. Current spacer visualization methods are typically tedious and can require manual data manipulation and curation, including spacer extraction at each CRISPR locus from genomes of interest. Here, we constructed a high-throughput extraction pipeline coupled with a local web-based visualization tool which enables CRISPR spacer and repeat extraction, rapid visualization, graphical comparison, and progressive multiple sequence alignment. We present the bioinformatic pipeline and investigate the loci of reference CRISPR-Cas systems and model organisms in 4 well-characterized subtypes. We illustrate how this analysis uncovers the evolutionary tracks and homology shared between various organisms through visual comparison of CRISPR spacers and repeats, driven through progressive alignments. Due to the ability to process unannotated genome files with minimal preparation and curation, this pipeline can be implemented promptly. Overall, this efficient high-throughput solution supports accelerated analysis of genomic data sets and enables and expedites genotyping efforts based on CRISPR loci.
Affiliation(s)
- Matthew A Nethery
- Genomic Sciences Graduate Program, North Carolina State University, Raleigh, NC, USA; Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, USA
- Rodolphe Barrangou
- Genomic Sciences Graduate Program, North Carolina State University, Raleigh, NC, USA; Department of Food, Bioprocessing & Nutrition Sciences, North Carolina State University, Raleigh, NC, USA
|
50
|
Orlando G, Raimondi D, Khan T, Lenaerts T, Vranken WF. SVM-dependent pairwise HMM: an application to protein pairwise alignments. Bioinformatics 2018; 33:3902-3908. [PMID: 28666322 DOI: 10.1093/bioinformatics/btx391] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/29/2016] [Accepted: 06/12/2017] [Indexed: 12/27/2022] Open
Abstract
Motivation: Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years, many different algorithms have been developed, and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes, and that more customizable approaches, capable of dealing with different alignment definitions, would be an advantage.
Results: Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets users decide the optimal features for aligning their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences.
Availability and implementation: A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo.
Contact: wim.vranken@vub.be.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; Structural Biology Research Center, VIB; Structural Machine Learning Group, Université Libre de Bruxelles
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; Structural Biology Research Center, VIB; Structural Machine Learning Group, Université Libre de Bruxelles
- Taushif Khan
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Machine Learning Group, Université Libre de Bruxelles; Artificial Intelligence Lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; Structural Biology Research Center, VIB
|