1
Lorenzi JN, Graner F, Courtier-Orgogozo V, Achaz G. CNCA aligns small annotated genomes. BMC Bioinformatics 2024; 25:89. [PMID: 38424511 PMCID: PMC10905818 DOI: 10.1186/s12859-024-05700-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 02/12/2024] [Indexed: 03/02/2024] [Open Access]
Abstract
BACKGROUND: To explore the evolutionary history of sequences, a sequence alignment is a first and necessary step, and its quality is crucial. In the context of the study of the proximal origins of SARS-CoV-2 coronavirus, we wanted to construct an alignment of genomes closely related to SARS-CoV-2 using both coding and non-coding sequences. To our knowledge, there is no tool that can be used to construct this type of alignment, which motivated the creation of CNCA.
RESULTS: CNCA is a web tool that aligns annotated genomes from GenBank files. It generates a nucleotide alignment that is then updated based on the protein sequence alignment. The final nucleotide alignment matches the protein alignment and guarantees no frameshift. CNCA was designed to align closely related small genome sequences up to 50 kb (typically viruses) for which the gene order is conserved.
CONCLUSIONS: CNCA constructs multiple alignments of small genomes by integrating both coding and non-coding sequences. This preserves regions traditionally ignored in conventional back-translation methods, such as non-coding regions.
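The conventional back-translation step that the abstract contrasts CNCA with can be sketched as follows. This is a minimal illustration under stated assumptions, not CNCA's implementation: the function name `thread_codons` and the toy sequences are hypothetical, and the sketch handles a single coding sequence only, so it ignores exactly the non-coding regions that CNCA is designed to preserve.

```python
def thread_codons(protein_row: str, cds: str) -> str:
    """Thread a coding sequence onto one row of a protein alignment.

    Each aligned residue is replaced by its codon and each protein gap
    ("-") by a three-nucleotide gap, so the result cannot contain a frameshift.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in protein_row if aa != "-")
    if len(codons) != n_residues:
        raise ValueError("codon count does not match the number of residues")
    codon_iter = iter(codons)
    # Gaps expand to "---"; every other column consumes the next codon.
    return "".join("---" if aa == "-" else next(codon_iter) for aa in protein_row)


# Hypothetical toy example: protein row "MK-L" with CDS ATG AAA CTG.
print(thread_codons("MK-L", "ATGAAACTG"))  # ATGAAA---CTG
```

Because gaps are only ever inserted in multiples of three, every row of the resulting nucleotide alignment stays in frame with the protein alignment it was built from.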
Affiliation(s)
- Jean-Noël Lorenzi
- Université Paris Cité, Paris, France.
- CNRS, Institut Jacques Monod, 75013, Paris, France.
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, 75006, Paris, France.
- François Graner
- Université Paris Cité, Paris, France
- CNRS, Matière Et Systèmes Complexes, 75013, Paris, France
- Guillaume Achaz
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, 75006, Paris, France
2
Zhou H, Piñeiro Llanes J, Sarntinoranont M, Subhash G, Simmons CS. Label-free quantification of soft tissue alignment by polarized Raman spectroscopy. Acta Biomater 2021; 136:363-374. [PMID: 34537413 DOI: 10.1016/j.actbio.2021.09.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/15/2021] [Revised: 08/24/2021] [Accepted: 09/09/2021] [Indexed: 11/29/2022]
Abstract
The organization of proteins is an important determinant of functionality in soft tissues. However, such organization is difficult to monitor over time in soft tissue with complex compositions. Here, we establish a method to determine the alignment of proteins in soft tissues of varying composition by polarized Raman spectroscopy (PRS). Unlike most conventional microscopy methods, PRS leverages non-destructive, label-free sample preparation. PRS data from highly aligned muscle layers were utilized to derive a weighting function for aligned proteins via principal component analysis (PCA). This trained weighting function was used as a master loading function to calculate a principal component score (PC1 Score) as a function of polarized angle for tendon, dermis, hypodermis, and fabricated collagen gels. Since the PC1 Score calculated at arbitrary angles was insufficient to determine the level of alignment, we developed an Amplitude Alignment Metric by fitting a sine function to PC1 Score with respect to polarized angle. We found that our PRS-based Amplitude Alignment Metric can be used as an indicator of the level of protein alignment in soft tissues in a non-destructive manner with label-free preparation, and that its capacity to discriminate between isotropic and anisotropic samples is similar to that of a microscopy-based image processing method. This PRS method does not require a priori knowledge of sample orientation or composition and appears insensitive to changes in protein composition among different tissues. The Amplitude Alignment Metric introduced here could enable convenient and adaptable evaluation of protein alignment in soft tissues of varying protein and cell composition.
STATEMENT OF SIGNIFICANCE: Polarized Raman spectroscopy (PRS) has been used to characterize the organization of soft tissues. However, most of the reported applications of PRS have been on collagen-rich tissues and reliant on intensities of collagen-related vibrations. This work describes a PRS method via a multivariate analysis to characterize alignment in soft tissues composed of varying proteins. Of note, the highly aligned muscle layer of mouse skin was used to train a master function that was then applied to other soft tissue samples, and the degree of anisotropy in the PRS response was evaluated to obtain the level of alignment in tissues. We have demonstrated that this method supports convenient and adaptable evaluation of protein alignment in soft tissues of varying protein and cell composition.
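The sine-fitting step behind the Amplitude Alignment Metric can be illustrated with a short sketch. It is not the authors' code and rests on assumptions not stated in the abstract: the PC1 Score is taken to repeat every 180° of polarization angle, the scores are sampled at N equally spaced angles k·180°/N, and the function name `amplitude_metric` is hypothetical. Under those assumptions the least-squares fit of c + A·sin(2θ + φ) reduces to a discrete Fourier projection.

```python
import math


def amplitude_metric(scores):
    """Amplitude A of the least-squares fit c + A*sin(2*theta + phi).

    Assumes scores[k] was measured at theta_k = k * 180 / N degrees
    (equal spacing over one full 180-degree period, N >= 3).
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least 3 equally spaced angles")
    # 2 * theta_k in radians equals 2 * pi * k / n for this sampling.
    a = 2.0 / n * sum(y * math.sin(2 * math.pi * k / n) for k, y in enumerate(scores))
    b = 2.0 / n * sum(y * math.cos(2 * math.pi * k / n) for k, y in enumerate(scores))
    return math.hypot(a, b)


# Synthetic PC1 Scores at 12 angles (0, 15, ..., 165 degrees).
anisotropic = [0.5 + 2.0 * math.sin(2 * math.pi * k / 12 + 0.3) for k in range(12)]
isotropic = [0.5] * 12
print(round(amplitude_metric(anisotropic), 6), round(amplitude_metric(isotropic), 6))
```

In this toy setting a strongly angle-dependent score yields a large amplitude and a flat score yields an amplitude near zero, regardless of the phase φ, which is consistent with the abstract's point that sample orientation need not be known in advance.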
Affiliation(s)
- Hui Zhou
- Department of Mechanical and Aerospace Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, USA
- Janny Piñeiro Llanes
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, USA
- Malisa Sarntinoranont
- Department of Mechanical and Aerospace Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, USA
- Ghatu Subhash
- Department of Mechanical and Aerospace Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, USA
- Chelsey S Simmons
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, USA.
3
Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 2019; 20:473. [PMID: 31521110 PMCID: PMC6744700 DOI: 10.1186/s12859-019-3019-7] [Citation(s) in RCA: 488] [Impact Index Per Article: 97.6] [Received: 02/19/2019] [Accepted: 08/02/2019] [Indexed: 01/06/2023] [Open Access]
Abstract
Background: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins.
Results: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite.
Conclusion: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
Affiliation(s)
- Martin Steinegger
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany; Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Markus Meier
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany
- Milot Mirdita
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany
- Harald Vöhringer
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany; European Bioinformatics Institute, Cambridge, CB10 1SD, United Kingdom
- Johannes Söding
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, Munich, 81379, Germany.
4
Ghavami S, Toozandehjani H, Ghavami G, Sardari S. Innovative protein translation into music and color image applicable for assessing protein alignment based on bio-mimicking human perception system. Int J Biol Macromol 2018; 119:896-901. [PMID: 30076932 DOI: 10.1016/j.ijbiomac.2018.07.185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/02/2018] [Revised: 07/11/2018] [Accepted: 07/29/2018] [Indexed: 11/17/2022]
Abstract
Protein sequence alignment is a valuable bioinformatics technique for searching, comparing, and ordering protein sequences. It is used to identify regions of similarity that may reflect functional, structural, or evolutionary relationships between the sequences. In the current investigation, an innovative similarity search/alignment algorithm for pattern recognition in protein structures is presented; it mimics the pattern recognition capabilities of the human visual and auditory systems, with the aim of opening novel approaches to protein sequence alignment. The approach draws on both bioinformatics and psychological knowledge and leads to an automatic translational system (ATS-P) that translates protein structures into musical compositions and color image combinations, providing a new pattern and method for protein alignment. Representing protein sequences visually and sonically is intended to support researchers in protein pattern recognition and structural demonstration. In other words, by mimicking the highly developed human visual and auditory perception systems, the presented algorithm can help protein scientists carry out protein alignment successfully.
Affiliation(s)
- Setareh Ghavami
- Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 13164, Iran; Department of Psychology, School of Humanities, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
- Hassan Toozandehjani
- Department of Psychology, School of Humanities, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
- Ghazaleh Ghavami
- Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 13164, Iran
- Soroush Sardari
- Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 13164, Iran.
5
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Affiliation(s)
- Abel Rodriguez
- University of California, Santa Cruz and Duke University