1
Kadoya SS, Maeda H, Katayama H. Correspondence of SARS-CoV-2 genomic sequences obtained from wastewater samples and COVID-19 patient at long-term care facilities. Sci Total Environ 2024; 916:170103. [PMID: 38232855 DOI: 10.1016/j.scitotenv.2024.170103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/07/2024] [Accepted: 01/09/2024] [Indexed: 01/19/2024]
Abstract
Wastewater-based epidemiology (WBE) has been in the spotlight because of its applicability to the early detection of virus outbreaks and new variants in a catchment area. However, there has been a notable absence of research directly confirming the association between SARS-CoV-2 in wastewater and patient specimens. In this study, we performed a quantitative and qualitative investigation with a genetic-level comparison of SARS-CoV-2 between COVID-19 patients and SARS-CoV-2-positive wastewater samples at long-term care facilities. Wastewater samples were collected via passive sampling from manholes, and the SARS-CoV-2 load in wastewater was determined by qPCR. We performed a correlation analysis between SARS-CoV-2 load and COVID-19 case numbers, which suggested that SARS-CoV-2 was detected in wastewater earlier than COVID-19 cases were ascertained. Six RNA samples from COVID-19-positive cases and six from wastewater, drawn from two facilities, were then subjected to amplicon sequencing analysis. Mutation analysis revealed high sequence similarity of SARS-CoV-2 variants between wastewater and patient samples (>99%). To the best of our knowledge, this is the first study demonstrating that WBE is also effective in predicting the predominant SARS-CoV-2 variant at the facility level, which is helpful for developing an early-warning system for outbreaks involving a predominant variant.
Affiliation(s)
- Syun-Suke Kadoya
  - Department of Urban Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Hideo Maeda
  - Kita City Public Health Center, 2-7-3 Higashijujo, Kita-ku, Tokyo 114-0001, Japan
- Hiroyuki Katayama
  - Department of Urban Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
2
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Philipp E Bayer
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Cassandria G Tay Fernandez
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Patrick M Finnegan
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
3
Sriraja LO, Werhli A, Petsalaki E. Phosphoproteomics data-driven signalling network inference: Does it work? Comput Struct Biotechnol J 2023; 21:432-43. [PMID: 36618990 DOI: 10.1016/j.csbj.2022.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 11/16/2022] [Accepted: 12/06/2022] [Indexed: 12/23/2022] Open
Abstract
The advent of global phosphoproteome profiling has led to wide phosphosite coverage and therefore the opportunity to predict kinase-substrate associations from these datasets. However, the regulatory kinase is unknown for most substrates, due to biased and incomplete database annotations. In this study we compare the performance of six pairwise measures to predict kinase-substrate associations using a data driven approach on publicly available time resolved and perturbation mass spectrometry-based phosphoproteome data. First, we validated the performance of these measures using as a reference both a literature-based phosphosite-specific protein interaction network and a predicted kinase-substrate (KS) interactions set. The overall performance in predicting kinase-substrate associations using pairwise measures across both these reference sets was poor. To expand into the wider interactome space, we applied the approach on a network comprising pairs of substrates regulated by the same kinase (substrate-substrate associations) but found the performance to be equally poor. However, the addition of a sequence similarity filter for substrate-substrate associations led to a significant boost in performance. Our findings imply that the use of a filter to reduce the search space, such as a sequence similarity filter, can be used prior to the application of network inference methods to reduce noise and boost the signal. We also find that the current gold standard for reference sets is not adequate for evaluation as it is limited and context-agnostic. Therefore, there is a need for additional evaluation methods that have increased coverage and take into consideration the context-specific nature of kinase-substrate associations.
4
Abstract
BACKGROUND Protein function prediction is an important part of bioinformatics and genomics studies. Many different predictors are available; however, most of them take the form of web servers rather than open-source, locally installable versions. Such local versions are necessary for large-scale genomics studies because web servers impose limitations such as queues, prediction speed, and the updatability of databases. METHODS This paper describes Wei2GO: a weighted sequence similarity and Python-based open-source protein function prediction software. It uses DIAMOND and HMMScan sequence alignment searches against the UniProtKB and Pfam databases respectively, transfers Gene Ontology terms from the reference protein to the query protein, and uses a weighting algorithm to calculate a score for the Gene Ontology annotations. RESULTS Wei2GO is compared against the Argot2 and Argot2.5 web servers, which use a similar concept, and DeepGOPlus, which acts as a reference. Wei2GO shows an increase in performance according to precision and recall curves, Fmax scores, and Smin scores for biological process and molecular function ontologies. Computational time compared to Argot2 and Argot2.5 is decreased from several hours to several minutes. AVAILABILITY Wei2GO is written in Python 3, and can be found at https://gitlab.com/mreijnders/Wei2GO.
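The annotation-transfer step described in this abstract can be illustrated with a minimal sketch. It is not Wei2GO's actual weighting algorithm, which the abstract does not specify. The function name `score_go_terms`, the use of bit scores as weights, and the toy hits are all assumptions made for illustration: each Gene Ontology term is scored by the share of total hit weight that comes from reference proteins carrying that term.

```python
from collections import defaultdict


def score_go_terms(hits):
    """Aggregate GO-term scores from similarity-search hits.

    hits: list of (bitscore, go_terms) tuples, one per reference protein.
    Returns {go_term: score in [0, 1]}; a term's score is the fraction of
    the total hit weight contributed by references annotated with it.
    This weighting is an illustrative assumption, not Wei2GO's formula.
    """
    total = sum(bitscore for bitscore, _ in hits)
    if total <= 0:
        return {}
    scores = defaultdict(float)
    for bitscore, go_terms in hits:
        for term in set(go_terms):  # count each term once per reference
            scores[term] += bitscore / total
    return dict(scores)


# Toy hits for one query protein (hypothetical values).
hits = [
    (200.0, ["GO:0005524", "GO:0004672"]),
    (100.0, ["GO:0005524"]),
    (100.0, ["GO:0003677"]),
]
scores = score_go_terms(hits)
```

Terms supported by several strong hits end up with higher scores than terms seen on a single weak hit, which is the general idea behind weighted annotation transfer.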
Affiliation(s)
- Maarten J.M.F. Reijnders
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
5
Barreto C, Silva A, Wiech E, Lopez A, San A, Singh S. Proteomic Tools for the Analysis of Cytoskeleton Proteins. Methods Mol Biol 2022; 2364:363-425. [PMID: 34542864 DOI: 10.1007/978-1-0716-1661-1_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
6
Abstract
BACKGROUND High-throughput experiments have generated a large amount of protein interaction data, which is being used to study protein networks. Studying complete protein networks can reveal more insight about healthy and disease states than studying proteins in isolation. Similarly, a comparative study of protein-protein interaction (PPI) networks of different species reveals important insights which may help in disease analysis and drug design. The study of PPI network alignment can also help in understanding the different biological systems of different species, and it can be used to transfer knowledge across species. Different aligners have been introduced in the last decade, but developing an accurate and scalable global alignment algorithm that can ensure biologically significant alignments is still challenging. RESULTS This paper presents a novel global pairwise network alignment algorithm, SAlign, which uses topological and biological information in the alignment process. The proposed algorithm incorporates sequence and structural information for computing biological scores, whereas previous algorithms only use sequence information. The alignment based on the proposed technique shows that the combined effect of structure and sequence results in significantly better pairwise alignments. We have compared SAlign with state-of-the-art algorithms on the basis of semantic similarity of alignment and the number of aligned nodes on multiple PPI network pairs. The results of SAlign on the network pairs that have a high percentage of proteins with available structure are 3-63% semantically better than all existing techniques. Furthermore, it also aligns 5-14% more nodes of these network pairs as compared to existing aligners. The results of SAlign on other PPI network pairs are comparable to or better than all existing techniques. We also introduce [Formula: see text], a Monte Carlo based alignment algorithm that produces multiple network alignments with similar semantic similarity. This helps the user to pick biologically meaningful alignments. CONCLUSION The proposed algorithm is able to find alignments that are more biologically significant or relevant than the alignments of existing aligners. Furthermore, the proposed method is able to generate alternate alignments that help in studying different genes and proteins of the species.
Affiliation(s)
- Umair Ayub
  - Department of Computing, National University of Computer and Emerging Sciences, Islamabad, 40100, Pakistan; Computational Biology Research Lab, Islamabad, 40100, Pakistan
- Imran Haider
  - Department of Computing, National University of Computer and Emerging Sciences, Islamabad, 40100, Pakistan; Computational Biology Research Lab, Islamabad, 40100, Pakistan
- Hammad Naveed
  - Department of Computing, National University of Computer and Emerging Sciences, Islamabad, 40100, Pakistan; Computational Biology Research Lab, Islamabad, 40100, Pakistan
7
Ehsani S. COVID-19 and iron dysregulation: distant sequence similarity between hepcidin and the novel coronavirus spike glycoprotein. Biol Direct 2020; 15:19. [PMID: 33066821 PMCID: PMC7563913 DOI: 10.1186/s13062-020-00275-2] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 07/03/2020] [Accepted: 10/08/2020] [Indexed: 12/20/2022] Open
Abstract
The spike glycoprotein of the SARS-CoV-2 virus, which causes COVID-19, has attracted attention for its vaccine potential and binding capacity to host cell surface receptors. Much of this research focus has centered on the ectodomain of the spike protein. The ectodomain is anchored to a transmembrane region, followed by a cytoplasmic tail. Here we report a distant sequence similarity between the cysteine-rich cytoplasmic tail of the coronavirus spike protein and the hepcidin protein that is found in humans and other vertebrates. Hepcidin is thought to be the key regulator of iron metabolism in humans through its inhibition of the iron-exporting protein ferroportin. An implication of this preliminary observation is to suggest a potential route of investigation in the coronavirus research field making use of an already-established literature on the interplay of local and systemic iron regulation, cytokine-mediated inflammatory processes, respiratory infections and the hepcidin protein. The question of possible homology and an evolutionary connection between the viral spike protein and hepcidin is not assessed in this report, but some scenarios for its study are discussed.
Affiliation(s)
- Sepehr Ehsani
  - Theoretical and Philosophical Biology, Department of Philosophy, University College London, Bloomsbury, London, WC1E 6BT, UK
  - Ronin Institute for Independent Scholarship, Montclair, NJ, 07043, USA
8
Das JK, Sengupta A, Choudhury PP, Roy S. Mapping sequence to feature vector using numerical representation of codons targeted to amino acids for alignment-free sequence analysis. Gene 2021; 766:145096. [PMID: 32919006 DOI: 10.1016/j.gene.2020.145096] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/25/2020] [Revised: 08/16/2020] [Accepted: 08/24/2020] [Indexed: 12/17/2022]
Abstract
The phylogenetic analysis based on sequence similarity targeted to real biological taxa is one of the major challenging tasks. In this paper, we propose a novel alignment-free method, CoFASA (Codon Feature based Amino acid Sequence Analyser), for similarity analysis of nucleotide sequences. At first, we assign numerical weights to the four nucleotides. We then calculate a score of each codon based on the numerical value of the constituent nucleotides, termed as degree of codons. Accordingly, we obtain the degree of each amino acid based on the degree of codons targeted towards a specific amino acid. Utilizing the degree of twenty amino acids and their relative abundance within a given sequence, we generate 20-dimensional features for every coding DNA sequence or protein sequence. We use the features for performing phylogenetic analysis of the set of candidate sequences. We use multiple protein sequences derived from Beta-globin (BG), NADH dehydrogenase subunit 5 (ND5), Transferrins (TFs), Xylanases, low identity (<40%) and high identity (⩾40%) protein sequences (encompassing 533 and 1064 protein families) for experimental assessments. We compare our results with sixteen (16) well-known methods, including both alignment-based and alignment-free methods. Various assessment indices are used, such as the Pearson correlation coefficient, RF (Robinson-Foulds) distance and ROC score for performance analysis. While comparing the performance of CoFASA with alignment-based methods (ClustalW, ClustalΩ, MAFFT, and MUSCLE), it shows very similar results. Further, CoFASA shows better performance in comparison to well-known alignment-free methods, including LZW-Kernal, jD2Stat, FFP, spaced, and AFKS-D2s in predicting taxonomic relationship among candidate taxa. Overall, we observe that the features derived by CoFASA are very much useful in isolating the sequences according to their taxonomic labels. 
Our method is cost-effective and, at the same time, produces consistent and satisfactory outcomes.
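The feature construction outlined in this abstract (nucleotide weights, then codon degrees, then amino acid degrees, then a 20-dimensional vector) can be sketched as follows. The abstract does not give the actual weights or combination rules, so everything numeric here is an assumption: the `WEIGHTS` values are hypothetical, a codon's degree is taken as the sum of its nucleotide weights, and an amino acid's degree as the mean degree of its codons in the standard genetic code. This is an illustration of the idea, not the CoFASA implementation.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})  # the 20 standard amino acids

# Hypothetical nucleotide weights (assumption; not taken from the paper).
WEIGHTS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def codon_degree(codon):
    """Degree of a codon: sum of its nucleotide weights (assumed rule)."""
    return sum(WEIGHTS[base] for base in codon)


# Degree of an amino acid: mean degree of the codons encoding it (assumed rule).
AA_DEGREE = {}
for aa in AMINO_ACIDS:
    degrees = [codon_degree(c) for c, a in CODON_TABLE.items() if a == aa]
    AA_DEGREE[aa] = sum(degrees) / len(degrees)


def features(cds):
    """Return a 20-dimensional vector for a coding DNA sequence.

    Each entry is the amino acid degree multiplied by that amino acid's
    relative abundance in the translated sequence. Stop codons and codons
    with unknown bases are skipped.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    protein = [CODON_TABLE[c] for c in codons if CODON_TABLE.get(c, "*") != "*"]
    n = len(protein)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [AA_DEGREE[aa] * protein.count(aa) / n for aa in AMINO_ACIDS]


vec = features("ATGGCC")  # translates to Met-Ala
```

Because every sequence maps to a vector of the same length, pairwise distances between vectors can feed a standard tree-building method without any alignment step.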
9
Thakkar N, Bailey-Kellogg C. Balancing sensitivity and specificity in distinguishing TCR groups by CDR sequence similarity. BMC Bioinformatics 2019; 20:241. [PMID: 31092185 PMCID: PMC6521430 DOI: 10.1186/s12859-019-2864-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/30/2019] [Accepted: 04/29/2019] [Indexed: 12/18/2022] Open
Abstract
Background Repertoire sequencing is enabling deep explorations into the cellular immune response, including the characterization of commonalities and differences among T cell receptor (TCR) repertoires from different individuals, pathologies, and antigen specificities. In seeking to understand the generality of patterns observed in different groups of TCRs, it is necessary to balance how well each pattern represents the diversity among TCRs from one group (sensitivity) vs. how many TCRs from other groups it also represents (specificity). The variable complementarity determining regions (CDRs), particularly the third CDRs (CDR3s), interact with major histocompatibility complex (MHC)-presented epitopes from putative antigens, and thus encode the determinants of recognition. Results We here systematically characterize the predictive power that can be obtained from CDR3 sequences, using representative, readily interpretable methods for evaluating CDR sequence similarity and then clustering and classifying sequences based on similarity. An initial analysis of CDR3s of known structure, clustered by structural similarity, helps calibrate the limits of sequence diversity among CDRs that might have a common mode of interaction with presented epitopes. Subsequent analyses demonstrate that this same range of sequence similarity strikes a favorable specificity/sensitivity balance in distinguishing twins from non-twins based on overall CDR3 repertoires, classifying CDR3 repertoires by antigen specificity, and distinguishing general pathologies. Conclusion We conclude that within a fairly broad range of sequence similarity, matching CDR3 sequences are likely to share specificities.
Affiliation(s)
- Neerja Thakkar
  - Department of Computer Science, Dartmouth, Hanover, NH, USA
10
Cho M, Son HS. Prediction of cross-species infection propensities of viruses with receptor similarity. Infect Genet Evol 2019; 73:71-80. [PMID: 31026604 PMCID: PMC7106226 DOI: 10.1016/j.meegid.2019.04.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/24/2018] [Revised: 02/21/2019] [Accepted: 04/19/2019] [Indexed: 11/16/2022]
Abstract
Studies of host factors that affect susceptibility to viral infections have led to the possibility of determining the risk of emerging infections in potential host organisms. In this study, we constructed a computational framework to estimate the probability of virus transmission between potential hosts based on the hypothesis that the major barrier to virus infection is differences in cell-receptor sequences among species. Information regarding host susceptibility to virus infection was collected to classify the cross-species infection propensity between hosts. Evolutionary divergence matrices and a sequence similarity scoring program were used to determine the distance and similarity of receptor sequences. The discriminant analysis was validated with cross-validation methods. The results showed that the primary structure of the receptor protein influences host susceptibility to cross-species viral infections. Pair-wise distance, relative distance, and sequence similarity showed the best accuracy in identifying the susceptible group. Based on the results of the discriminant analysis, we constructed ViCIPR (http://lcbb3.snu.ac.kr/ViCIPR/home.jsp), a server-based tool to enable users to easily extract the cross-species infection propensities of specific viruses using a simple two-step procedure. Our sequence-based approach suggests that it may be possible to identify virus transmission between hosts without requiring complex structural analysis. Due to a lack of available data, this method is limited to viruses whose receptor use has been determined. However, the significant accuracy of predictive variables that positively and negatively influence virus transmission suggests that this approach could be improved with further analysis of receptor sequences.
Affiliation(s)
- Myeongji Cho
  - Laboratory of Computational Biology & Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Hyeon Seok Son
  - Laboratory of Computational Biology & Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea; Interdisciplinary Graduate Program in Bioinformatics, College of Natural Science, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
11
Abstract
BACKGROUND Identifying protein-protein interactions (PPIs) is of paramount importance for understanding cellular processes. Machine learning-based approaches have been developed to predict PPIs, but the effectiveness of these approaches is unsatisfactory. One major reason is that they randomly choose non-interacting protein pairs (negative samples) or heuristically select non-interacting pairs with low quality. RESULTS To boost the effectiveness of predicting PPIs, we propose two novel approaches (NIP-SS and NIP-RW) to generate high quality non-interacting pairs based on sequence similarity and random walk, respectively. Specifically, the known PPIs collected from public databases are used to generate the positive samples. NIP-SS then selects the top-m dissimilar protein pairs as negative examples and controls the degree distribution of selected proteins to construct the negative dataset. NIP-RW performs random walk on the PPI network to update the adjacency matrix of the network, and then selects protein pairs not connected in the updated network as negative samples. Next, we use auto covariance (AC) descriptor to encode the feature information of amino acid sequences. After that, we employ deep neural networks (DNNs) to predict PPIs based on extracted features, positive and negative examples. Extensive experiments show that NIP-SS and NIP-RW can generate negative samples with higher quality than existing strategies and thus enable more accurate prediction. CONCLUSIONS The experimental results prove that negative datasets constructed by NIP-SS and NIP-RW can reduce the bias and have good generalization ability. NIP-SS and NIP-RW can be used as a plugin to boost the effectiveness of PPIs prediction. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NIP .
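The NIP-SS idea of picking the most dissimilar protein pairs as negative samples can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' code: `difflib.SequenceMatcher` stands in for whatever sequence similarity measure the paper uses, the function name `select_negative_pairs` and the toy sequences are hypothetical, and the degree-distribution control mentioned in the abstract is omitted.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Crude sequence similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def select_negative_pairs(sequences, positives, m):
    """Pick the m least similar protein pairs that are not known interactions.

    sequences: {protein_id: amino acid sequence}
    positives: iterable of (id_a, id_b) known interacting pairs
    Returns a list of (id_a, id_b) tuples, least similar first; ties are
    broken by protein id so the result is deterministic.
    """
    known = {frozenset(pair) for pair in positives}
    candidates = []
    for p, q in combinations(sorted(sequences), 2):
        if frozenset((p, q)) in known:
            continue  # never use a known interaction as a negative sample
        candidates.append((similarity(sequences[p], sequences[q]), p, q))
    candidates.sort()
    return [(p, q) for _, p, q in candidates[:m]]


# Toy data (hypothetical sequences and interaction).
sequences = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQK",
    "P3": "MKTAYLAKQR",
    "P4": "GGSWWPLLDE",
}
positives = [("P1", "P2")]
negatives = select_negative_pairs(sequences, positives, m=2)
```

On this toy input the two selected pairs both involve `P4`, the only protein that shares no residues with `P1` or `P2`, while the near-identical but unlabeled pairs such as `P1`/`P3` are left out of the negative set.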
Affiliation(s)
- Long Zhang
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Guoxian Yu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Jun Wang
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
12
Boone K, Camarda K, Spencer P, Tamerler C. Antimicrobial peptide similarity and classification through rough set theory using physicochemical boundaries. BMC Bioinformatics 2018; 19:469. [PMID: 30522443 PMCID: PMC6282327 DOI: 10.1186/s12859-018-2514-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/28/2018] [Accepted: 11/20/2018] [Indexed: 01/09/2023] Open
Abstract
Background Antimicrobial peptides attract considerable interest as novel agents to combat infections. Their long-time potency across bacteria, viruses and fungi as part of diverse innate immune systems offers a solution to overcome the rising concerns from antibiotic resistance. With the rapid increase of antimicrobial peptides reported in the databases, peptide selection becomes a challenge. We propose similarity analyses to describe key properties that distinguish between active and non-active peptide sequences building upon the physicochemical properties of antimicrobial peptides. We used an iterative supervised machine learning approach to classify active peptides from inactive peptides with low false discovery rates in a relatively short computational search time. Results By generating explicit boundaries, our method defines new categories of active and inactive peptides based on their physicochemical properties. Consequently, it describes physicochemical characteristics of similarity among active peptides and the physicochemical boundaries between active and inactive peptides in a single process. To build the similarity boundaries, we used the rough set theory approach; to our knowledge, this is the first time that this approach has been used to classify peptides. The modified rough set theory method limits the number of values describing a boundary to a user-defined limit. Our method is optimized for specificity over selectivity. Noting that false positives increase activity assays while false negatives only increase computational search time, our method provided a low false discovery rate. Published datasets were used to compare our rough set theory method to other published classification methods and based on this comparison, we achieved high selectivity and comparable sensitivity to currently available methods. 
Conclusions We developed rule sets that define physicochemical boundaries which allow us to directly classify the active sequences from inactive peptides. Existing classification methods are either sequence-order insensitive or length-dependent, whereas our method generates the rule sets that combine order-sensitive descriptors with length-independent descriptors. The method provides comparable or improved performance to currently available methods. Discovering the boundaries of physicochemical properties may lead to a new understanding of peptide similarity.
Affiliation(s)
- Kyle Boone
  - Bioengineering Program, Institute of Bioengineering Research, University of Kansas, Learned Hall, Room 5109, 1530 W 15th Street, Lawrence, KS, 66045, USA
- Kyle Camarda
  - Chemical and Petroleum Engineering Department, University of Kansas, Learned Hall, Room 4154, 1530 West 15th Street, Lawrence, KS, 66045, USA
- Paulette Spencer
  - Mechanical Engineering Department, Bioengineering Program, Institute of Bioengineering Research, University of Kansas, Learned Hall, Room 3111, 1530 West 15th Street, Lawrence, KS, 66045, USA
- Candan Tamerler
  - Mechanical Engineering Department, Bioengineering Program, Institute of Bioengineering Research, University of Kansas, Learned Hall, Room 3135A, 1530 W 15th St, Lawrence, KS, 66045, USA
13
Malysh JM, Vorontsova YL, Glupov VV, Tsarev AA, Tokarev YS. Vairimorpha ephestiae is a synonym of Vairimorpha necatrix (Opisthosporidia: Microsporidia) based on multilocus sequence analysis. Eur J Protistol 2018; 66:63-67. [PMID: 30145519 DOI: 10.1016/j.ejop.2018.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/01/2018] [Revised: 08/12/2018] [Accepted: 08/13/2018] [Indexed: 11/29/2022]
Abstract
An isolate of the microsporidium Vairimorpha ephestiae (originally isolated from Ephestia kühniella) from collection of Prof. J. Weiser was propagated in a laboratory culture of Galleria mellonella. Only disporoblastic sporogony was observed and formation of octospores, characteristic of the genus Vairimorpha, never occurred. A partial nucleotide sequence of the small subunit rRNA gene (1247 bp) for this microsporidium showed 100% identity to the homologous sequences of Vairimorpha (Nosema) necatrix (Genbank accession # U11051 and # DQ996241), a microsporidium with a broad host range within the Lepidoptera. Sequence similarity of protein-coding genes (RPB1, HSP70 and actin) between V. ephestiae and V. necatrix was about 98-100%. The level of genetic polymorphism in the RPB1 locus between these two species was essentially the same as between isolates of V. necatrix. It is therefore concluded that V. ephestiae is in fact an isolate of V. necatrix and the former species should be synonymized with the latter. Though described later, V. necatrix has prevailing usage and its precedence over V. ephestiae is proposed to conserve stability and avoid confusion.
Affiliation(s)
- Julia M Malysh
  - All-Russian Institute of Plant Protection, Podbelskogo 3, Pushkin, St. Petersburg 196068, Russia
- Yana L Vorontsova
  - Institute of Systematics and Ecology of Animals SB RAS, Frunze 11, Novosibirsk 630091, Russia
- Viktor V Glupov
  - Institute of Systematics and Ecology of Animals SB RAS, Frunze 11, Novosibirsk 630091, Russia
- Alexander A Tsarev
  - All-Russian Institute of Plant Protection, Podbelskogo 3, Pushkin, St. Petersburg 196068, Russia
- Yuri S Tokarev
  - All-Russian Institute of Plant Protection, Podbelskogo 3, Pushkin, St. Petersburg 196068, Russia
|
14
|
Lin J, Wei J, Adjeroh D, Jiang BH, Jiang Y. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform. BMC Bioinformatics 2018; 19:165. [PMID: 29720081 PMCID: PMC5930706 DOI: 10.1186/s12859-018-2155-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/19/2017] [Accepted: 04/11/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. RESULTS A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. CONCLUSIONS Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
Affiliation(s)
- Jie Lin
  - College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350108, People's Republic of China
- Jing Wei
  - College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350108, People's Republic of China
- Donald Adjeroh
  - Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, 26506, WV, USA
- Bing-Hua Jiang
  - Department of Pathology, University of Iowa, Iowa City, 52242, Iowa, USA
- Yue Jiang
  - College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350108, People's Republic of China
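The pipeline described in the SSAW abstract (k-mers, then complex numbers, then a stationary wavelet transform) can be illustrated with a minimal sketch. The abstract does not specify the k-mer-to-complex mapping, the wavelet family, or how coefficients are summarized into a feature vector, so everything below is an assumption: `BASE_TO_COMPLEX` and `kmer_to_complex` are hypothetical, and a single-level undecimated Haar transform with periodic extension stands in for the SDWT step.

```python
import math
from typing import List, Tuple

# Hypothetical base-to-complex assignment (not taken from the paper).
BASE_TO_COMPLEX = {"A": 1 + 0j, "C": 0 + 1j, "G": -1 + 0j, "T": 0 - 1j}


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_to_complex(kmer: str) -> complex:
    """Hypothetical mapping: average the complex values of the bases.

    Raises KeyError for characters outside A/C/G/T.
    """
    return sum(BASE_TO_COMPLEX[b] for b in kmer) / len(kmer)


def haar_swt_level1(signal: List[complex]) -> Tuple[List[complex], List[complex]]:
    """One level of an undecimated (stationary) Haar transform.

    Unlike the decimated DWT, both outputs keep the input length;
    the signal is extended periodically at the right edge.
    """
    n = len(signal)
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[(i + 1) % n]) * s for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) * s for i in range(n)]
    return approx, detail


series = [kmer_to_complex(km) for km in kmers("ACGTAC", 2)]
approx, detail = haar_swt_level1(series)
```

Because the transform is undecimated, sequences of different lengths still yield coefficient series of different lengths; some further summarization step, not shown here, would be needed to obtain fixed-length vectors for clustering or classification.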
|
15
|
Abstract
Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
Affiliation(s)
- Won Cheol Yim
  - Department of Biochemistry and Molecular Biology, University of Nevada-Reno, Reno, NV, United States of America
- John C Cushman
  - Department of Biochemistry and Molecular Biology, University of Nevada-Reno, Reno, NV, United States of America
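The "query sequence distribution approach" mentioned in the abstract amounts to splitting a multi-FASTA query file into chunks that can be searched independently. The sketch below is not DCBLAST's actual code: `read_fasta` and `split_queries` are hypothetical helper names, and balancing chunks by total residue count is an assumed heuristic rather than the tool's documented behavior.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_queries(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute queries over n_chunks, balancing total sequence length.

    Greedy heuristic (an assumption): place the longest remaining query
    into the chunk that currently holds the fewest residues.
    """
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    loads = [0] * n_chunks
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = loads.index(min(loads))
        chunks[i].append(rec)
        loads[i] += len(rec[1])
    return chunks


fasta = ">q1\nACGTACGT\n>q2\nACGT\n>q3\nACG\n>q4\nA\n"
chunks = split_queries(read_fasta(fasta), 2)
```

Each chunk could then be written to its own file and submitted as a separate BLAST job, with the per-chunk outputs concatenated afterwards; the job submission itself depends on the cluster scheduler and is left out.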
|
16
|
Rinčić M, Iourov IY, Liehr T. Thoughts about SLC16A2, TSIX and XIST gene like sites in the human genome and a potential role in cellular chromosome counting. Mol Cytogenet 2016; 9:56. [PMID: 27504142 DOI: 10.1186/s13039-016-0271-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/06/2016] [Accepted: 07/25/2016] [Indexed: 12/12/2022]
Abstract
Background Chromosome counting is a process in which cells somehow determine their intrinsic chromosome number(s). The best-studied cellular mechanism that involves chromosome counting is the ‘chromosome-kissing’ and X-chromosome inactivation (XCI) mechanism. It is necessary for the well-known dosage compensation between the genders in mammals, balancing the number of active X-chromosomes (Xa) against the diploid set of autosomes. At the onset of XCI, two X-chromosomes come into close proximity and pair physically through a specific segment termed the X-pairing region (Xpr), which involves the SLC16A2 gene. Results An Ensembl BLAST search for human and mouse SLC16A2/Slc16a2 homologues revealed that highly similar sequences can be found on almost every chromosome in the corresponding genomes. Additionally, a BLAST search for SLC16A2/TSIX/XIST (genes responsible for XCI) revealed that “SLC16A2/TSIX/XIST like sequences” also cover all chromosomes equally. On this basis we propose the following hypotheses. Hypotheses If a single genomic region containing the SLC16A2 gene on the X-chromosome is responsible for maintaining “balanced” active copy numbers, it is possible that similar sequences or gene/s have the same function on other chromosomes (autosomes). SLC16A2 like sequences on autosomes could encompass evolutionarily older, but functionally active, key regions for chromosome counting in early embryogenesis. SLC16A2 like sequences on autosomes could also be involved in inappropriate chromosome pairing and thereby in aneuploidy formation during embryogenesis and cancer development. In addition, “SLC16A2/TSIX/XIST gene like sequence combinations” covering the whole genome could be important for the determination of the X:autosome ratio in cells and for chromosome counting.
Conclusions SLC16A2 and/or SLC16A2/TSIX/XIST like sequences dispersed across autosomes and X-chromosome(s) could serve as the basis for a counting mechanism to determine the X:autosome ratio and could potentially be a mechanism by which a cell also counts its autosomes. It could also be that such specific genomic regions have the same function for each specific autosome. As errors during the obviously existing process of chromosome counting are one of, if not the, major origins of germline/somatic aneuploidy, the hypotheses presented here should be further elaborated and experimentally tested.
|
17
|
Singh K, Zulkifli M, Prasad NG. Identification and characterization of novel natural pathogen of Drosophila melanogaster isolated from wild captured Drosophila spp. Microbes Infect 2016; 18:813-821. [PMID: 27492855 DOI: 10.1016/j.micinf.2016.07.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/16/2016] [Revised: 06/13/2016] [Accepted: 07/26/2016] [Indexed: 11/28/2022]
Abstract
Drosophila melanogaster is an emerging model system for the study of the evolutionary ecology of immunity. However, a large number of studies have used non-natural pathogens, as very few natural pathogens have been isolated and identified. Our aim was to isolate and characterize natural pathogen/s of D. melanogaster. A bacterial pathogen was isolated from wild-caught Drosophila spp., identified as a new strain of Staphylococcus succinus subsp. succinus and named PK-1. This strain induced substantial mortality (36-62%) in adults of several laboratory populations of D. melanogaster. PK-1 grew rapidly within the body of the flies post infection, and both males and females had roughly the same number of colony forming units. Mortality was affected by mode of infection and dosage of the pathogen. However, mating status of the host had no effect on mortality post infection. Given that there are very few known natural bacterial pathogens of D. melanogaster and that PK-1 can establish a sustained infection across various outbred and inbred populations of D. melanogaster, this new isolate is a potential resource for future studies on immunity.
Affiliation(s)
- Karan Singh
  - Indian Institute of Science Education and Research Mohali, Department of Biological Sciences, Knowledge City, Sector 81, SAS Nagar, PO Manauli, Punjab 140306, India
- Mohammad Zulkifli
  - Indian Institute of Science Education and Research Mohali, Department of Biological Sciences, Knowledge City, Sector 81, SAS Nagar, PO Manauli, Punjab 140306, India
- N G Prasad
  - Indian Institute of Science Education and Research Mohali, Department of Biological Sciences, Knowledge City, Sector 81, SAS Nagar, PO Manauli, Punjab 140306, India
|
18
|
Pizzi C. MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics. Algorithms Mol Biol 2016; 11:6. [PMID: 27103940 PMCID: PMC4839165 DOI: 10.1186/s13015-016-0072-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/07/2015] [Accepted: 01/08/2016] [Indexed: 11/11/2022]
Abstract
Background Measuring sequence similarity is central for many problems in bioinformatics. In several contexts alignment-free techniques based on exact occurrences of substrings are faster, but also less accurate, than alignment-based approaches. Recently, several studies attempted to bridge the accuracy gap with the introduction of approximate matches in the definition of composition-based similarity measures. Results In this work we present MissMax, an exact algorithm for the computation of the longest common substring with mismatches between each suffix of a sequence x and a sequence y. This collection of statistics is useful for the computation of two similarity measures: the longest and the average common substring with k mismatches. As a further contribution we provide a “relaxed” version of MissMax that does not guarantee the exact solution, but it is faster in practice and still very precise.
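The statistic described in the MissMax abstract, the longest common substring with at most k mismatches between each suffix of x and a sequence y, can be made concrete with a brute-force reference. This is not the MissMax algorithm, which uses filtering and heuristics to run far faster; it is a naive cubic-time sketch with hypothetical function names, useful only for checking definitions on short strings.

```python
from typing import List


def lcs_k_mismatches(x: str, y: str, k: int) -> List[int]:
    """For each start i in x, return the length of the longest substring
    x[i:i+L] that matches some y[j:j+L] with at most k mismatches."""
    result = []
    for i in range(len(x)):
        best = 0
        for j in range(len(y)):
            mismatches = 0
            length = 0
            # Extend the pair of substrings until a sequence ends or the
            # (k+1)-th mismatch would be needed.
            while i + length < len(x) and j + length < len(y):
                if x[i + length] != y[j + length]:
                    if mismatches == k:
                        break
                    mismatches += 1
                length += 1
            best = max(best, length)
        result.append(best)
    return result


def longest_common_substring_k(x: str, y: str, k: int) -> int:
    """Longest common substring with k mismatches (max over suffixes)."""
    return max(lcs_k_mismatches(x, y, k), default=0)


def average_common_substring_k(x: str, y: str, k: int) -> float:
    """Average common substring with k mismatches (mean over suffixes)."""
    values = lcs_k_mismatches(x, y, k)
    return sum(values) / len(values) if values else 0.0


exact = lcs_k_mismatches("ACGT", "AGGT", 0)    # per-suffix lengths, exact matching
relaxed = lcs_k_mismatches("ACGT", "AGGT", 1)  # same, allowing one mismatch
```

The two summary functions correspond to the two similarity measures named in the abstract: the longest and the average common substring with k mismatches.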
|
19
|
Machnicka MA, Dunin-Horkawicz S, de Crécy-Lagard V, Bujnicki JM. tRNAmodpred: A computational method for predicting posttranscriptional modifications in tRNAs. Methods 2016; 107:34-41. [PMID: 27016142 DOI: 10.1016/j.ymeth.2016.03.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/14/2016] [Revised: 03/19/2016] [Accepted: 03/21/2016] [Indexed: 11/20/2022]
Abstract
tRNA molecules contain numerous chemically altered nucleosides, which are formed by enzymatic modification of the primary transcripts during the complex tRNA maturation process. Some of the modifications are introduced by single reactions, while others require a complex series of reactions carried out by several different enzymes. The location and distribution of various types of modifications vary greatly between different tRNA molecules, organisms and organelles. We have developed a computational method, tRNAmodpred, for predicting modifications in tRNA sequences. Briefly, our method takes as input one or more unmodified tRNA sequences and a set of protein sequences corresponding to the proteome of a cell. Subsequently, it identifies homologs of known tRNA modification enzymes in the proteome, predicts tRNA modification activities and maps them onto known pathways of RNA modification from the MODOMICS database. Thereby, theoretically possible modification pathways are identified, and products of these modification reactions are proposed for query tRNAs. This method allows for predicting modification patterns for newly sequenced genomes as well as for checking the tentative modification status of tRNAs from one species treated with enzymes from another source, e.g. to predict the possible modifications of eukaryotic tRNAs expressed in bacteria. tRNAmodpred is freely available as a web server at http://genesilico.pl/trnamodpred/.
|
20
|
Scarpati M, Heavner ME, Wiech E, Singh S. Proteomic Tools for the Analysis of Cytoskeleton Proteins. Methods Mol Biol 2016; 1365:385-413. [PMID: 26498799 DOI: 10.1007/978-1-4939-3124-8_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence. Although a wealth of structural and functional information is available for a large number of cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice, while retaining advanced functionality for more experienced computational biologists.
Affiliation(s)
- Michael Scarpati
  - Biology Program, The Graduate Center, City University of New York, New York, NY, USA
- Mary Ellen Heavner
  - Biochemistry Program, The Graduate Center, City University of New York, New York, NY, USA
- Eliza Wiech
  - Biology Program, The Graduate Center, City University of New York, New York, NY, USA
- Shaneen Singh
  - Biochemistry Program, The Graduate Center, City University of New York, New York, NY, USA
  - Department of Biology, Brooklyn College, City University of New York, 209 Ingersoll Hall Extension, 2900 Bedford Ave., Brooklyn, NY, 11210, USA
  - Biology Program, The Graduate Center, City University of New York, New York, NY, USA
|
21
|
King BR, Aburdene M, Thompson A, Warres Z. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity. EURASIP J Bioinform Syst Biol 2014; 2014:8. [PMID: 24991213 PMCID: PMC4077688 DOI: 10.1186/1687-4153-2014-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/16/2013] [Accepted: 05/01/2014] [Indexed: 11/27/2022]
Abstract
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
Affiliation(s)
- Brian R King
  - Department of Computer Science, Bucknell University, Lewisburg, PA 17837, USA
- Maurice Aburdene
  - Department of Electrical and Computer Engineering, Bucknell University, Lewisburg, PA 17837, USA
- Alex Thompson
  - Department of Electrical and Computer Engineering, Bucknell University, Lewisburg, PA 17837, USA
- Zach Warres
  - Department of Electrical and Computer Engineering, Bucknell University, Lewisburg, PA 17837, USA
|
22
|
Zhang L, Zhao X, Kong L. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2014; 355:105-10. [PMID: 24735902 DOI: 10.1016/j.jtbi.2014.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 12/06/2013] [Revised: 02/26/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]
Abstract
Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, predicting the structural class of low-similarity sequences from the protein sequence alone remains a challenge in computational biology. In this study, a novel sequence representation method based on the position specific scoring matrix is proposed for protein structural class prediction. Using a defined evolutionary difference formula, proteins of varying length are expressed as vectors of uniform dimension, which represent evolutionary difference information between the adjacent residues of a given protein. To evaluate the proposed method, a support vector machine and jackknife tests are employed on three widely used datasets (25PDB, 1189 and 640) with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with previous methods shows that our approach is promising for predicting protein structural class, especially for low-similarity sequences.
Affiliation(s)
- Lichao Zhang
  - College of Marine Life Science, Ocean University of China, Yushan Road, Qingdao 266003, PR China
- Xiqiang Zhao
  - College of Mathematical Science, Ocean University of China, Songling Road, Qingdao 266100, PR China
- Liang Kong
  - College of Mathematics and Information Technology, Hebei Normal University of Science and Technology, Qinhuangdao 066004, PR China
|
23
|
Jeong BS, Golam Bari AT, Rokeya Reaz M, Jeon S, Lim CG, Choi HJ. Codon-based encoding for DNA sequence analysis. Methods 2014; 67:373-9. [PMID: 24530970 DOI: 10.1016/j.ymeth.2014.01.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 10/31/2013] [Revised: 12/31/2013] [Accepted: 01/24/2014] [Indexed: 11/19/2022]
Abstract
With the exponential growth of biological sequence data (DNA or protein sequences), DNA sequence analysis has become an essential task for biologists seeking to understand the features, functions, structures, and evolution of species. Encoding DNA sequences is an effective method to extract features from them. It is commonly used for visualizing DNA sequences and analyzing similarities/dissimilarities between different species or cells. Although many encoding approaches have been proposed for DNA sequence analysis, more elegant approaches are required for higher accuracy. In this paper, we propose a novel encoding approach for measuring the degree of similarity/dissimilarity between different species. Our approach can preserve the physicochemical properties, positional information, and the codon usage bias of nucleotides. An extensive performance study shows that our approach provides higher accuracy than existing approaches in terms of the degree of similarity.
|
24
|
Nosaka M, Hirata K, Tsuji R, Sunaba S. Planes formed with four intron-positions in tertiary structures of retinol binding protein and calpain domain VI. J Theor Biol 2014; 340:139-45. [PMID: 24029156 DOI: 10.1016/j.jtbi.2013.08.035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/18/2012] [Revised: 08/28/2013] [Accepted: 08/30/2013] [Indexed: 11/28/2022]
Abstract
Eukaryotic genes have intervening sequences, introns, in their coding regions. Since introns are spliced out from mRNA before translation, they are considered to have no effect on the protein structure. Here, we report a novel relationship between introns and the tertiary structures of retinol binding protein and calpain domain VI. We identified "intron-positions" as amino acid residues on which or just after which introns are found in their corresponding nucleotide sequences, and then found that four intron-positions form a plane. We also found that the four intron-positions of retinol binding protein enclose its ligand retinol. The tertiary structure of calpain domain VI changes after Ca²⁺ binding, and the four intron-positions form a plane that includes its ligand calpastatin. To evaluate the statistical significance of the planarity, we calculated the mean distance of each intron-position from the plane defined by the other three intron-positions, and showed that it is significantly smaller than the one calculated for randomly generated locations based on exon size distribution. On the basis of this finding, we discuss the evolution of retinol binding protein and the origin of introns.
Affiliation(s)
- Michiko Nosaka
  - Material and Biological Engineering, Sasebo National College of Technology, Japan
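The planarity measure in the abstract above, the mean distance of each of four points from the plane defined by the other three, follows directly from the point-to-plane distance formula. The sketch below is an illustration under assumptions: the function names are hypothetical, the coordinates are made up rather than taken from any protein structure, and the randomized control based on exon size distribution is not reproduced.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def distance_to_plane(p: Point, a: Point, b: Point, c: Point) -> float:
    """Distance from p to the plane through a, b and c."""
    normal = _cross(_sub(b, a), _sub(c, a))
    length = math.sqrt(_dot(normal, normal))
    if length == 0.0:
        raise ValueError("a, b and c are collinear and define no plane")
    return abs(_dot(normal, _sub(p, a))) / length


def mean_leave_one_out_distance(points: Sequence[Point]) -> float:
    """Mean, over the four points, of each point's distance from the
    plane defined by the other three. Zero means exactly coplanar."""
    if len(points) != 4:
        raise ValueError("exactly four points are required")
    total = 0.0
    for i in range(4):
        others = [q for j, q in enumerate(points) if j != i]
        total += distance_to_plane(points[i], *others)
    return total / 4


# Illustrative coordinates only (a unit square in the z = 0 plane).
flat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
planarity = mean_leave_one_out_distance(flat)
```

A significance test in the spirit of the abstract would compare this value against the same statistic computed for many randomly placed sets of four positions.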
|
25
|
Roorkiwal M, Sharma PC. Sequence similarity based identification of abiotic stress responsive genes in chickpea. Bioinformation 2012; 8:92-7. [PMID: 22359442 PMCID: PMC3282263 DOI: 10.6026/97320630008092] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/07/2012] [Accepted: 01/07/2012] [Indexed: 12/03/2022]
Abstract
Chickpea (Cicer arietinum L.) is an important food legume crop, particularly for arid regions including the Indian subcontinent. Considering the detrimental effect of drought, temperature and salt stress on crop yield, efforts have been initiated to develop improved varieties and design alternate strategies to sustain chickpea production in adverse environmental conditions. Identification of genes that confer abiotic stress tolerance in plants remains a challenge in contemporary plant breeding. The present study focused on the identification of abiotic stress responsive genes in chickpea based on a sequence similarity approach exploiting known abiotic stress responsive genes from model crops or other plant species. Ten abiotic stress responsive genes identified in other plants were partially amplified from eight chickpea genotypes and their presence in chickpea was confirmed after sequencing the PCR products. These genes have been functionally validated and reported to play a significant role in stress response in model plants like Arabidopsis, rice and other legume crops. Chickpea EST sequences available at the NCBI EST database were used for the identification of abiotic stress responsive genes. A total of 8,536 unique coding long sequences were used for identification of chickpea homologues of these abiotic stress responsive genes by sequence similarity search (BLASTN and BLASTX). These genes can be further explored towards achieving the goal of developing superior chickpea varieties providing improved yields under stress conditions using modern molecular breeding approaches.
Affiliation(s)
- Manish Roorkiwal
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, Dwarka Sector 16C, New Delhi-110075, India
  - Present Address - International Crops Research Institute for the Semi-Arid Tropics, Patancheru-502324, India
- Prakash Chand Sharma
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, Dwarka Sector 16C, New Delhi-110075, India
  - Prakash Chand Sharma: Phone: 011-25302306 (Direct), Fax: 25302112
|