Reference Citation Analysis

For: Henderson J, Salzberg S, Fasman KH. Finding genes in DNA with a Hidden Markov Model. J Comput Biol 1997;4:127-41. [PMID: 9228612 DOI: 10.1089/cmb.1997.4.127] [Citation(s) in RCA: 77] [Impact Index Per Article: 2.9] [Indexed: 02/04/2023]

Cited by Other Article(s)

Chen W, Cui Y, He Y, Zhao L, Cui R, Liu X, Huang H, Zhang Y, Fan Y, Feng X, Ni K, Jiang T, Han M, Lei Y, Liu M, Meng Y, Chen X, Lu X, Wang D, Wang J, Wang S, Guo L, Chen Q, Ye W. Raffinose degradation-related gene GhAGAL3 was screened out responding to salinity stress through expression patterns of GhAGALs family genes. Frontiers in Plant Science 2023;14:1246677. [PMID: 38192697 PMCID: PMC10773686 DOI: 10.3389/fpls.2023.1246677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 11/27/2023] [Indexed: 01/10/2024]

Affiliation(s)

Wenhua Chen Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China Engineering Research Centre of Cotton, Ministry of Education/College of Agriculture, Xinjiang Agricultural University, Urumqi, China
Yupeng Cui Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Yunxin He Hunan Institute of Cotton Science, Changde, Hunan, China
Lanjie Zhao Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Ruifeng Cui Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Xiaoyu Liu Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Hui Huang Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Yuexin Zhang Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Yapeng Fan Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Xixian Feng Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Kesong Ni Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Tiantian Jiang Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Mingge Han Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Yuqian Lei Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Mengyue Liu Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Yuan Meng Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Xiugui Chen Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Xuke Lu Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Delong Wang Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Junjuan Wang Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Shuai Wang Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Lixue Guo Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China
Quanjia Chen Engineering Research Centre of Cotton, Ministry of Education/College of Agriculture, Xinjiang Agricultural University, Urumqi, China
Wuwei Ye Institute of Cotton Research of Chinese Academy of Agricultural Sciences/Research Base, Anyang Institute of Technology, National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Anyang, Henan, China Engineering Research Centre of Cotton, Ministry of Education/College of Agriculture, Xinjiang Agricultural University, Urumqi, China

Procopio A, Cesarelli G, Donisi L, Merola A, Amato F, Cosentino C. Combined mechanistic modeling and machine-learning approaches in systems biology - A systematic literature review. Computer Methods and Programs in Biomedicine 2023;240:107681. [PMID: 37385142 DOI: 10.1016/j.cmpb.2023.107681] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/17/2023] [Revised: 06/14/2023] [Accepted: 06/14/2023] [Indexed: 07/01/2023]

Abstract

BACKGROUND AND OBJECTIVE

Mechanistic-based Model simulations (MM) are an effective approach commonly employed, for research and learning purposes, to better investigate and understand the inherent behavior of biological systems. Recent advancements in modern technologies and the large availability of omics data allowed the application of Machine Learning (ML) techniques to different research fields, including systems biology. However, the availability of information regarding the analyzed biological context, sufficient experimental data, as well as the degree of computational complexity, represent some of the issues that both MMs and ML techniques could present individually. For this reason, recently, several studies suggest overcoming or significantly reducing these drawbacks by combining the above-mentioned two methods. In the wake of the growing interest in this hybrid analysis approach, with the present review, we want to systematically investigate the studies available in the scientific literature in which both MMs and ML have been combined to explain biological processes at genomics, proteomics, and metabolomics levels, or the behavior of entire cellular populations.

METHODS

The Elsevier Scopus®, Clarivate Web of Science™ and National Library of Medicine PubMed® databases were searched using the queries reported in Table 1, resulting in 350 scientific articles.

RESULTS

Only 14 of the 350 documents returned by the comprehensive search conducted on the three major online databases met our search criteria, i.e., presented a hybrid approach consisting of the synergistic combination of MMs and ML to treat a particular aspect of systems biology.

CONCLUSIONS

Despite the recent interest in this methodology, a careful analysis of the selected papers showed that examples of integration between MMs and ML are already present in systems biology, highlighting the great potential of this hybrid approach at both micro and macro biological scales.

Du J, Wang C, Wang L, Mao S, Zhu B, Li Z, Fan X. Automatic block-wise genotype-phenotype association detection based on hidden Markov model. BMC Bioinformatics 2023;24:138. [PMID: 37029361 PMCID: PMC10082540 DOI: 10.1186/s12859-023-05265-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 03/31/2023] [Indexed: 04/09/2023]

Ahmed YW, Alemu BA, Bekele SA, Gizaw ST, Zerihun MF, Wabalo EK, Teklemariam MD, Mihrete TK, Hanurry EY, Amogne TG, Gebrehiwot AD, Berga TN, Haile EA, Edo DO, Alemu BD. Epigenetic tumor heterogeneity in the era of single-cell profiling with nanopore sequencing. Clin Epigenetics 2022;14:107. [PMID: 36030244 PMCID: PMC9419648 DOI: 10.1186/s13148-022-01323-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/27/2022] [Accepted: 08/12/2022] [Indexed: 11/29/2022]

Affiliation(s)

Yohannis Wondwosen Ahmed Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia.
Berhan Ababaw Alemu Department of Medical Biochemistry, School of Medicine, St. Paul's Hospital, Millennium Medical College, Addis Ababa, Ethiopia
Sisay Addisu Bekele Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Solomon Tebeje Gizaw Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Muluken Fekadie Zerihun Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Endriyas Kelta Wabalo Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Maria Degef Teklemariam Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Tsehayneh Kelemu Mihrete Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Endris Yibru Hanurry Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Tensae Gebru Amogne Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Assaye Desalegne Gebrehiwot Department of Medical Anatomy, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
Tamirat Nida Berga Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Ebsitu Abate Haile Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Dessiet Oma Edo Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
Bizuwork Derebew Alemu Department of Statistics, College of Natural and Computational Sciences, Mizan Tepi University, Tepi, Ethiopia

Yuan Z, Yang H, Pan L, Zhao W, Liang L, Gatera A, Tucker MR, Xu D. Systematic identification and expression profiles of the BAHD superfamily acyltransferases in barley (Hordeum vulgare). Sci Rep 2022;12:5063. [PMID: 35332203 PMCID: PMC8948222 DOI: 10.1038/s41598-022-08983-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/23/2021] [Accepted: 03/14/2022] [Indexed: 12/28/2022]

ASRmiRNA: Abiotic Stress-Responsive miRNA Prediction in Plants by Using Machine Learning Algorithms with Pseudo K-Tuple Nucleotide Compositional Features. Int J Mol Sci 2022;23:ijms23031612. [PMID: 35163534 PMCID: PMC8835813 DOI: 10.3390/ijms23031612] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/29/2021] [Revised: 01/23/2022] [Accepted: 01/26/2022] [Indexed: 02/04/2023]

Nowak S, Rosin M, Stuerzlinger W, Bartram L. Visual Analytics: A Method to Explore Natural Histories of Oral Epithelial Dysplasia. Frontiers in Oral Health 2022;2:703874. [PMID: 35048041 PMCID: PMC8757761 DOI: 10.3389/froh.2021.703874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 07/02/2021] [Indexed: 11/17/2022]

Domínguez-Santos R, Pérez-Cobas AE, Cuti P, Pérez-Brocal V, García-Ferris C, Moya A, Latorre A, Gil R. Interkingdom Gut Microbiome and Resistome of the Cockroach Blattella germanica. mSystems 2021;6:6/3/e01213-20. [PMID: 33975971 PMCID: PMC8125077 DOI: 10.1128/msystems.01213-20] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 02/06/2023]

McClintock BT, Langrock R, Gimenez O, Cam E, Borchers DL, Glennie R, Patterson TA. Uncovering ecological state dynamics with hidden Markov models. Ecol Lett 2020;23:1878-1903. [PMID: 33073921 PMCID: PMC7702077 DOI: 10.1111/ele.13610] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 07/16/2020] [Revised: 08/13/2020] [Accepted: 08/25/2020] [Indexed: 01/03/2023]

Khodaei A, Feizi-Derakhshi MR, Mozaffari-Tazehkand B. A Markov chain-based feature extraction method for classification and identification of cancerous DNA sequences. BioImpacts 2020;11:87-99. [PMID: 33842279 PMCID: PMC8022238 DOI: 10.34172/bi.2021.16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/23/2019] [Revised: 01/06/2020] [Accepted: 01/21/2020] [Indexed: 01/06/2023]

Li Z, Guan Y, Yuan X, Zheng P, Zhu H. Prediction of Sphingosine protein-coding regions with a self adaptive spectral rotation method. PLoS One 2019;14:e0214442. [PMID: 30943219 PMCID: PMC6447165 DOI: 10.1371/journal.pone.0214442] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/08/2019] [Accepted: 03/13/2019] [Indexed: 01/08/2023]

Meher PK, Sahu TK, Gahoi S, Tomar R, Rao AR. funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genet 2019;20:2. [PMID: 30616524 PMCID: PMC6323839 DOI: 10.1186/s12863-018-0710-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/13/2018] [Accepted: 12/26/2018] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, the DNA barcoding technique can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Though the computational prediction of fungal species has become feasible with the availability of a huge volume of barcode sequences in the public domain, prediction of fungal species is challenging due to the high degree of variability among ITS regions within species.

RESULTS

A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets respectively for prediction of fungal species using RF. More than 85% accuracy was found when 4 sequences per species in the reference set were utilized, whereas accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used for training of the model. The proposed model achieved comparable accuracy when evaluated against existing methods through a cross-validation procedure. The proposed model also outperformed several existing models used for identification of different species other than fungi.

CONCLUSIONS

An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. Besides, an R-package funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ) is also available for prediction using high throughput sequence data. The effort put in this work will certainly supplement the future endeavors in the direction of fungal taxonomy assignments based on DNA barcode.

Mishra A, Siwach P, Singhal P, Jayaram B. ChemGenome2.1: An Ab Initio Gene Prediction Software. Methods Mol Biol 2019;1962:121-138. [PMID: 31020557 DOI: 10.1007/978-1-4939-9173-0_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]

Abstract

Gene prediction, also known as gene identification, gene finding, gene recognition, or gene discovery, is one of the important problems of molecular biology and is receiving increasing attention due to the advent of large-scale genome sequencing projects. We designed an ab initio model (called ChemGenome) for gene prediction in prokaryotic genomes based on physicochemical characteristics of codons. In this chapter, we present the methodology of the latest version of this model, ChemGenome2.1 (CG2.1). The first module of the protocol builds a three-dimensional vector from three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy, and an index of the propensity of a codon for protein-nucleic acid interactions. As this three-dimensional vector moves along any genome, the net orientation of the resultant vector should differ significantly for gene and non-genic regions to make a distinction feasible. The putative protein-coding genes predicted from the above parameters are passed through a second module of the protocol, which reduces the number of false positives by utilizing a filter based on stereochemical properties of protein sequences. The chemical properties of amino acid side chains taken into consideration are the presence of sp3 hybridized γ carbon atom, hydrogen bond donor ability, short/absence of δ carbon and linearity of the side chains/non-occurrence of bi-dentate forks with terminal hydrogen atoms in the side chain. The final prediction of the potential protein-coding genes is based on the frequency of occurrence of amino acids in the predicted protein sequences and their deviation from the frequency values of Swissprot protein sequences, both at monomer and tripeptide levels. The final screening is based on Z-score.
Though CG2.1 is a gene finding tool for prokaryotes, considering the underlying similarity in the chemical and physical properties of DNA among prokaryotes and eukaryotes, we attempted to evaluate its applicability for gene finding in the lower eukaryotes. The results give a hope that the concept of gene finding based on physicochemical model of codons is a viable idea for eukaryotes as well, though, undoubtedly, improvements are needed.

Meher PK, Sahu TK, Mohanty J, Gahoi S, Purru S, Grover M, Rao AR. nifPred: Proteome-Wide Identification and Categorization of Nitrogen-Fixation Proteins of Diaztrophs Based on Composition-Transition-Distribution Features Using Support Vector Machine. Front Microbiol 2018;9:1100. [PMID: 29896173 PMCID: PMC5986947 DOI: 10.3389/fmicb.2018.01100] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/04/2017] [Accepted: 05/08/2018] [Indexed: 11/13/2022]

Abstract

As inorganic nitrogen compounds are essential for basic building blocks of life (e.g., nucleotides and amino acids), the role of biological nitrogen-fixation (BNF) is indispensable. All nitrogen fixing microbes rely on the same nitrogenase enzyme for nitrogen reduction, which is in fact an enzyme complex consisting of as many as 20 genes. However, the occurrence of six genes viz., nifB, nifD, nifE, nifH, nifK, and nifN has been proposed to be essential for a functional nitrogenase enzyme. Therefore, identification of these genes is important to understand the mechanism of BNF as well as to explore the possibilities for improving BNF from an agricultural sustainability point of view. Further, though computational tools are available for the annotation and phylogenetic analysis of nifH gene sequences alone, to the best of our knowledge no tool is available for the computational prediction of the above mentioned six categories of nitrogen-fixation (nif) genes or proteins. Thus, we proposed an approach, which is the first of its kind for the computational identification of nif proteins encoded by the six categories of nif genes. Sequence-derived features were employed to map the input sequences into vectors of numeric observations that were subsequently fed to the support vector machine as input. Two types of classifier were constructed: (i) a binary classifier for classification of nif and non-nitrogen-fixation (non-nif) proteins, and (ii) a multi-class classifier for classification of six categories of nif proteins. Higher accuracies were observed for the combination of composition-transition-distribution (CTD) feature set and radial kernel, as compared to the other feature-kernel combinations. Overall accuracies of >90% were observed in both binary and multi-class classifications. The developed approach further achieved >92% accuracy when evaluated with blind (independent) test datasets.
The developed approach also produced higher accuracy in identifying nif proteins when evaluated using proteome-wide datasets of several species. Furthermore, we established a prediction server nifPred (http://webapp.cabgrid.res.in/nifPred) to assist the scientific community for proteome-wide identification of six categories of nif proteins. Besides, the source code of nifPred is also available at https://github.com/PrabinaMeher/nifPred. The developed web server is expected to supplement the transcriptional profiling and comparative genomics studies for the identification and functional annotation of genes related to BNF.

Meher PK, Sahu TK, Banchariya A, Rao AR. DIRProt: a computational approach for discriminating insecticide resistant proteins from non-resistant proteins. BMC Bioinformatics 2017;18:190. [PMID: 28340571 PMCID: PMC5364559 DOI: 10.1186/s12859-017-1587-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/17/2016] [Accepted: 03/09/2017] [Indexed: 02/06/2023]

Abstract

BACKGROUND

Insecticide resistance is a major challenge for insect pest control programs in the fields of crop protection, human and animal health, etc. Resistance to different insecticides is conferred by the proteins encoded by certain classes of insect genes. To date, no computational tool is available to distinguish insecticide resistant proteins from non-resistant proteins. Thus, development of such a computational tool will be helpful in predicting the insecticide resistant proteins, which can be targeted for developing appropriate insecticides.

RESULTS

Five different feature sets, viz., amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input to a support vector machine (SVM) for classification of insecticide resistant and non-resistant proteins. Higher accuracies were obtained under the RBF kernel than under other kernels. Further, accuracies were observed to be higher for the DPC feature set as compared to others. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins, i.e., detoxification-based and target-based, were discriminated from non-resistant proteins with >95% accuracy. Besides, >95% accuracy was also observed for discrimination of proteins involved in detoxification- and target-based resistance mechanisms. The proposed approach not only outperformed Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy when assessed using an independent dataset of 75 insecticide resistant proteins.

CONCLUSIONS

This paper presents the first computational approach for discriminating the insecticide resistant proteins from non-resistant proteins. Based on the proposed approach, an online prediction server DIRProt has also been developed for computational prediction of insecticide resistant proteins, which is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is believed to supplement the efforts needed to develop dynamic insecticides in wet-lab by targeting the insecticide resistant proteins.

Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep 2017;7:42362. [PMID: 28205576 PMCID: PMC5304217 DOI: 10.1038/srep42362] [Citation(s) in RCA: 274] [Impact Index Per Article: 39.1] [Received: 10/24/2016] [Accepted: 01/09/2017] [Indexed: 11/13/2022]

Al Bataineh M, Al-qudah Z. A novel gene identification algorithm with Bayesian classification. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]

Meher PK, Sahu TK, Rao AR, Wahi SD. A computational approach for prediction of donor splice sites with improved accuracy. J Theor Biol 2016;404:285-294. [PMID: 27302911 DOI: 10.1016/j.jtbi.2016.06.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/18/2015] [Revised: 04/18/2016] [Accepted: 06/09/2016] [Indexed: 11/24/2022]

Meher PK, Sahu TK, Rao AR, Wahi SD. Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms Mol Biol 2016;11:16. [PMID: 27252772 PMCID: PMC4888255 DOI: 10.1186/s13015-016-0078-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/28/2015] [Accepted: 05/17/2016] [Indexed: 11/16/2022]

A Comprehensive Review of Emerging Computational Methods for Gene Identification. Journal of Information Processing Systems 2016. [DOI: 10.3745/jips.04.0023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022]

Hidden Markov models for gene sequence classification. Pattern Anal Appl 2015. [DOI: 10.1007/s10044-015-0508-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]

El Yazid Boudaren M, Monfrini E, Pieczynski W, Aïssani A. Phasic Triplet Markov Chains. IEEE Transactions on Pattern Analysis and Machine Intelligence 2014;36:2310-2316. [PMID: 26353069 DOI: 10.1109/tpami.2014.2327974] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]

Regional effects on chimera formation in 454 pyrosequenced amplicons from a mock community. J Microbiol 2014;52:566-73. [DOI: 10.1007/s12275-014-3485-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 09/23/2013] [Revised: 02/04/2014] [Accepted: 03/12/2014] [Indexed: 11/27/2022]

Molina J, Hazzouri KM, Nickrent D, Geisler M, Meyer RS, Pentony MM, Flowers JM, Pelser P, Barcelona J, Inovejas SA, Uy I, Yuan W, Wilkins O, Michel CI, LockLear S, Concepcion GP, Purugganan MD. Possible loss of the chloroplast genome in the parasitic flowering plant Rafflesia lagascae (Rafflesiaceae). Mol Biol Evol 2014;31:793-803. [PMID: 24458431 PMCID: PMC3969568 DOI: 10.1093/molbev/msu051] [Citation(s) in RCA: 125] [Impact Index Per Article: 12.5] [Indexed: 12/13/2022]

Won KJ, Zhang X, Wang T, Ding B, Raha D, Snyder M, Ren B, Wang W. Comparative annotation of functional regions in the human genome using epigenomic data. Nucleic Acids Res 2013;41:4423-32. [PMID: 23482391 PMCID: PMC3632130 DOI: 10.1093/nar/gkt143] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/17/2022]

Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics. BMC Genomics 2012;13 Suppl 8:S19. [PMID: 23282225 PMCID: PMC3535712 DOI: 10.1186/1471-2164-13-s8-s19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Bonneville R, Jin VX. A hidden Markov model to identify combinatorial epigenetic regulation patterns for estrogen receptor α target genes. Bioinformatics 2012;29:22-8. [PMID: 23104890 DOI: 10.1093/bioinformatics/bts639] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Zhang L, Tian F, Wang S. A modified statistically optimal null filter method for recognizing protein-coding regions. GENOMICS PROTEOMICS & BIOINFORMATICS 2012;10:166-73. [PMID: 22917190 PMCID: PMC5054498 DOI: 10.1016/j.gpb.2012.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/15/2011] [Revised: 02/04/2012] [Accepted: 02/21/2012] [Indexed: 11/21/2022]

Mørk S, Holmes I. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs. Bioinformatics 2012;28:636-42. [PMID: 22215819 PMCID: PMC3289911 DOI: 10.1093/bioinformatics/btr698] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Alioto T. Gene prediction. Methods Mol Biol 2012;855:175-201. [PMID: 22407709 DOI: 10.1007/978-1-61779-582-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Chen B, Ji P. Numericalization of the self adaptive spectral rotation method for coding region prediction. J Theor Biol 2011;296:95-102. [PMID: 22178641 DOI: 10.1016/j.jtbi.2011.12.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2011] [Revised: 10/24/2011] [Accepted: 12/01/2011] [Indexed: 11/27/2022]

Suvorova YM, Rudenko VM, Korotkov EV. Detection change points of triplet periodicity of gene. Gene 2011;491:58-64. [PMID: 21982972 DOI: 10.1016/j.gene.2011.08.032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2011] [Revised: 08/10/2011] [Accepted: 08/25/2011] [Indexed: 10/17/2022]

Sahu SS, Panda G. Identification of protein-coding regions in DNA sequences using a time-frequency filtering approach. GENOMICS, PROTEOMICS & BIOINFORMATICS 2011;9:45-55. [PMID: 21641562 PMCID: PMC5054166 DOI: 10.1016/s1672-0229(11)60007-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/19/2010] [Accepted: 10/31/2010] [Indexed: 11/13/2022]

Machado-Lima A, Kashiwabara AY, Durham AM. Decreasing the number of false positives in sequence classification. BMC Genomics 2010;11 Suppl 5:S10. [PMID: 21210966 PMCID: PMC3045793 DOI: 10.1186/1471-2164-11-s5-s10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Abstract

Background

Many probabilistic models used in sequence analysis assign non-zero probability to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, in which the probability under the model characterizing the sequence family of interest is compared with that under an alternative probability model. A null model can serve as this alternative. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, including the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper evaluates the impact of the choice of null model on the final classification result. In particular, we are interested in minimizing the number of false predictions in a classification, which is crucial for reducing the cost of biological validation.

Results

For all the tests, the target null model presented the lowest number of false positives when random sequences were used as the test set. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the applicability of the results, the study used randomly generated sequences. Previous studies were performed on amino acid sequences, used only one probabilistic model (HMM) and a specific benchmark, and did not reach more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results.

Conclusions

Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the uniform model exhibits a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation because of its higher specificity.
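
The scoring scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the family model here is a hypothetical position-independent distribution (the tools named in the abstract use HMMs or covariance models), and the function names, the pseudocount and the example sequence are assumptions chosen only for illustration. The sketch computes a log-odds score of a sequence under a family model against two of the null models listed above, the uniform distribution and the target sequence distribution.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def uniform_null():
    """Uniform null model: every nucleotide has probability 0.25."""
    return {base: 0.25 for base in ALPHABET}

def target_null(seq, pseudocount=1.0):
    """Target null model: composition of the candidate sequence itself.

    The pseudocount (an assumption of this sketch) keeps every
    probability non-zero so the log-odds ratio is always defined.
    """
    counts = Counter(seq)
    total = len(seq) + pseudocount * len(ALPHABET)
    return {base: (counts[base] + pseudocount) / total for base in ALPHABET}

def log_odds(seq, model, null):
    """Sum of per-residue log2 ratios; a positive score favours the family model."""
    return sum(math.log2(model[base] / null[base]) for base in seq)

# Hypothetical GC-rich family model, position-independent for simplicity.
family_model = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

# A GC-only sequence standing in for a compositionally biased random candidate.
candidate = "GCGCGGCCGCGCCGGC"

score_uniform = log_odds(candidate, family_model, uniform_null())
score_target = log_odds(candidate, family_model, target_null(candidate))

print(f"uniform null: {score_uniform:.2f}")
print(f"target null:  {score_target:.2f}")
```

With a decision threshold of zero, the uniform null accepts this candidate because its GC-rich composition alone resembles the family model, while the target null discounts that composition and the score drops below zero. This is a toy instance of the GC bias the conclusions describe, not a reproduction of the study's results.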

Chen B, Ji P. Visualization of the protein-coding regions with a self adaptive spectral rotation approach. Nucleic Acids Res 2010;39:e3. [PMID: 20947567 PMCID: PMC3017620 DOI: 10.1093/nar/gkq891] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Zeng J, Alhajj R, Demetrick D. Adaptive multi-agent architecture for functional sequence motifs recognition. Bioinformatics 2009;25:3084-92. [PMID: 19808882 DOI: 10.1093/bioinformatics/btp567] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Providing predictions on distributed HMMs with privacy. Artif Intell Rev 2009. [DOI: 10.1007/s10462-009-9106-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Frenkel FE, Korotkov EV. Using triplet periodicity of nucleotide sequences for finding potential reading frame shifts in genes. DNA Res 2009;16:105-14. [PMID: 19261626 PMCID: PMC2671204 DOI: 10.1093/dnares/dsp002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Galimov AR, Kruglov AA, Bol'sheva NL, Iurkevich OI, Lipin'sh DI, Mufazalov IA, Kuprash DV, Nedospasov SA. [Chromosomal localization and molecular organization of human genomic fragment containing TNF/LT locus in transgenic mice]. Mol Biol (Mosk) 2008;42:629-38. [PMID: 18856063 DOI: 10.1134/s0026893308040201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Frenkel FE, Korotkov EV. Classification analysis of triplet periodicity in protein-coding regions of genes. Gene 2008;421:52-60. [PMID: 18593596 DOI: 10.1016/j.gene.2008.06.012] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2008] [Revised: 05/14/2008] [Accepted: 06/06/2008] [Indexed: 11/16/2022]

Gene Identification: Classical and Computational Intelligence Approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2008. [DOI: 10.1109/tsmcc.2007.906066] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Melodelima C, Gautier C, Piau D. A markovian approach for the prediction of mouse isochores. J Math Biol 2007;55:353-64. [PMID: 17486342 DOI: 10.1007/s00285-007-0087-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2006] [Revised: 03/01/2007] [Indexed: 10/23/2022]

Keibler E, Arumugam M, Brent MR. The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs. Bioinformatics 2007;23:545-54. [PMID: 17237054 DOI: 10.1093/bioinformatics/btl659] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Segovia-Juarez JL, Colombano S, Kirschner D. Identifying DNA splice sites using hypernetworks with artificial molecular evolution. Biosystems 2006;87:117-24. [PMID: 17116361 DOI: 10.1016/j.biosystems.2006.09.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2005] [Revised: 07/08/2006] [Accepted: 07/15/2006] [Indexed: 11/28/2022]

Akalin PK. Introduction to bioinformatics. Mol Nutr Food Res 2006;50:610-9. [PMID: 16810733 DOI: 10.1002/mnfr.200500273] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Prediction of wolf (Canis lupus) kill-sites using hidden Markov models. Ecol Modell 2006. [DOI: 10.1016/j.ecolmodel.2006.02.043] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Dutta S, Singhal P, Agrawal P, Tomer R, Kritee K, Khurana E, Jayaram B. A physicochemical model for analyzing DNA sequences. J Chem Inf Model 2006;46:78-85. [PMID: 16426042 DOI: 10.1021/ci050119x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Huang J, Li T, Chen K, Wu J. An approach of encoding for prediction of splice sites using SVM. Biochimie 2006;88:923-9. [PMID: 16626852 DOI: 10.1016/j.biochi.2006.03.006] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2004] [Revised: 03/06/2006] [Accepted: 03/09/2006] [Indexed: 11/18/2022]

Sczyrba A, Beckstette M, Brivanlou AH, Giegerich R, Altmann CR. XenDB: full length cDNA prediction and cross species mapping in Xenopus laevis. BMC Genomics 2005;6:123. [PMID: 16162280 PMCID: PMC1261260 DOI: 10.1186/1471-2164-6-123] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2005] [Accepted: 09/14/2005] [Indexed: 11/23/2022] Open

Abstract

Background

Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large-scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full-length protein-encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems.

Description

Using a suffix-array-based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full-length proteins, and these have been matched to publicly available IMAGE clones where possible. Each sequence has been compared to the KOG database, and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined.

Conclusion

The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross-species queries to identify potential Xenopus homologues and their associated full-length clones. Examples are provided, including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches.

Supplementary material can be found at .

A neural network based multi-classifier system for gene identification in DNA sequences. Neural Comput Appl 2004. [DOI: 10.1007/s00521-004-0447-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]