1
Sahu TK, Manerkar S, Mondkar J, Kalamdani P, Patra S, Kalathingal T, Kaur S. Effect of early total enteral feeding vs incremental feeding in small for gestational age very low birth weight infants: A randomized controlled trial. J Neonatal Perinatal Med 2024:NPM230195. [PMID: 38640177 DOI: 10.3233/npm-230195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2024]
Affiliation(s)
- T K Sahu
  - Department of Neonatology, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
- S Manerkar
  - Department of Neonatology, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
- J Mondkar
  - Department of Neonatology, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
- P Kalamdani
  - Department of Neonatology, Ex-faculty, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
- S Patra
  - Department of Neonatology, Ex-faculty, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
- T Kalathingal
  - Department of Neonatology, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
- S Kaur
  - Department of Neonatology, Ex-faculty, Lokmanya Tilak Municipal Medical College and General Hospital, Sion, Mumbai, India
2
Sahu S, Rao AR, Sahu TK, Pandey J, Varshney S, Kumar A, Gaikwad K. Predictive Role of Cluster Bean (Cyamopsis tetragonoloba) Derived miRNAs in Human and Cattle Health. Genes (Basel) 2024; 15:448. [PMID: 38674383 PMCID: PMC11049822 DOI: 10.3390/genes15040448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 08/22/2023] [Accepted: 09/11/2023] [Indexed: 04/28/2024] Open
Abstract
MicroRNAs (miRNAs) are small, conserved non-coding molecules 18-25 nt in length. Plant miRNAs are very stable and may have been transferred across kingdoms via food intake. Such miRNAs, also called exogenous miRNAs, regulate gene expression in host organisms. The miRNAs present in the cluster bean, a drought-tolerant legume crop of high commercial value, might also play a regulatory role for genes involved in nutrient synthesis or disease pathways in animals, including humans, through dietary intake of cluster bean plant parts. However, the predictive role of cluster bean miRNAs in gene-disease association across kingdoms, such as in cattle and humans, is not yet fully explored. Thus, the aim of the present study is to (i) find the cluster bean miRNAs (cb-miRs) functionally similar to miRNAs of cattle and humans and predict their target genes' involvement in the occurrence of complex diseases, and (ii) identify the role of cb-miRs that are functionally non-similar to the miRNAs of cattle and humans and predict their targeted genes' association with complex diseases in host systems. Here, we predicted a total of 33 and 15 cb-miRs functionally similar (fs-cb-miRs) to human and cattle miRNAs, respectively. Further, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed the participation of targeted genes of fs-cb-miRs in 24 and 12 different pathways in humans and cattle, respectively. A few targeted genes in humans, such as LCP2, GABRA6, and MYH14, were predicted to be associated with the pathways of Yersinia infection (hsa05135), neuroactive ligand-receptor interaction (hsa04080), and pathogenic Escherichia coli infection (hsa05130), respectively. In addition, targeted genes of fs-cb-miRs in humans such as KLHL20, TNS1, and PAPD4 are associated with Alzheimer's disease, malignant tumor of the breast, and hepatitis C virus infection, respectively.
Similarly, in cattle, targeted genes of fs-cb-miRs such as ATG2B and DHRS11 participate in the pathways of Huntington disease and steroid biosynthesis, respectively. Additionally, targeted genes of fs-cb-miRs such as SURF4 and EDME2 are associated with mastitis and bovine osteoporosis, respectively. We also found a few cb-miRs that have no functional similarity with human and cattle miRNAs but are found to target genes in the host organisms and are also associated with human and cattle diseases. Interestingly, a few genes such as NRM, PTPRE, and SUZ12 were observed to be associated with rheumatoid arthritis, asthma, and endometrial stromal sarcoma, respectively, in humans, and genes such as SCNN1B with renal disease in cattle.
Affiliation(s)
- Sarika Sahu
  - Indian Agricultural Statistics Research Institute, ICAR, New Delhi 110012, India
  - Amity Institute of Biotechnology, Amity University, Noida 201303, India
- Atmakuri Ramakrishna Rao
  - Indian Agricultural Statistics Research Institute, ICAR, New Delhi 110012, India
  - Indian Council of Agricultural Research, New Delhi 110001, India
- Tanmaya Kumar Sahu
  - Indian Grassland and Fodder Research Institute, ICAR, Jhansi 284003, India
- Jaya Pandey
  - Indian Agricultural Statistics Research Institute, ICAR, New Delhi 110012, India
- Shivangi Varshney
  - Indian Agricultural Statistics Research Institute, ICAR, New Delhi 110012, India
- Archna Kumar
  - Amity Institute of Biotechnology, Amity University, Noida 201303, India
- Kishor Gaikwad
  - National Institute for Plant Biotechnology, ICAR, New Delhi 110012, India
3
Meher PK, Sahu TK, Gupta A, Kumar A, Rustgi S. ASRpro: A machine-learning computational model for identifying proteins associated with multiple abiotic stress in plants. Plant Genome 2024; 17:e20259. [PMID: 36098562 DOI: 10.1002/tpg2.20259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 08/10/2022] [Indexed: 06/15/2023]
Abstract
One of the thrust areas of research in plant breeding is to develop crop cultivars with enhanced tolerance to abiotic stresses. Thus, identifying abiotic stress-responsive genes (SRGs) and proteins is important for plant breeding research. However, identifying such genes via established genetic approaches is laborious and resource intensive. Although transcriptome profiling has remained a reliable method of SRG identification, it is species specific. Additionally, identifying multistress-responsive genes using gene expression studies is cumbersome. This underscores the need for a computational method to identify the genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. The predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), with autocross covariance (ACC) and K-mer compositional features as input. With ACC, K-mer, and ACC + K-mer compositional features, overall accuracies of ~60-77%, ~75-86%, and ~61-78%, respectively, were obtained using the SVM algorithm with fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was also assessed with an independent dataset and obtained an accuracy consistent with cross-validation. The proposed model is the first of its kind and is expected to serve the requirements of experimental biologists; however, the prediction accuracy was modest. Given its importance for the research community, the online prediction application, ASRpro, is made freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.
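The abstract names K-mer compositional features without defining them. As a minimal sketch of the general idea, and not the ASRpro implementation, the function below computes the relative frequency of every possible K-mer in a sequence. The function name `kmer_composition`, the 20-letter amino acid alphabet, and the default `k=2` are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: protein sequences over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k=2, alphabet=AMINO_ACIDS):
    """Return {k-mer: relative frequency} over all len(alphabet)**k k-mers.

    Overlapping windows are counted; windows containing letters outside
    the alphabet are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    # Every sequence maps to a vector of the same length, so sequences of
    # different lengths can be fed to the same classifier.
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


features = kmer_composition("ACAC", k=2)
```

The fixed-length output is the property that makes such features usable as classifier input; how ASRpro scales or combines them with ACC features is not stated in the abstract.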
Affiliation(s)
- Ajit Gupta
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anuj Kumar
  - Dep. of Microbiology and Immunology, Dalhousie Univ., Halifax, Nova Scotia, Canada
  - Laboratory of Immunity, Shantou Univ. Medical College, Shantou, PRC
- Sachin Rustgi
  - Dep. of Plant and Environmental Sciences, Pee Dee Research and Education Centre, Clemson Univ., Florence, SC, USA
4
Mir ZA, Chauhan D, Pradhan AK, Srivastava V, Sharma D, Budhlakoti N, Mishra DC, Jadon V, Sahu TK, Grover M, Gangwar OP, Kumar S, Bhardwaj SC, Padaria JC, Singh AK, Rai A, Singh GP, Kumar S. Comparative transcriptome profiling of near isogenic lines PBW343 and FLW29 to unravel defense related genes and pathways contributing to stripe rust resistance in wheat. Funct Integr Genomics 2023; 23:169. [PMID: 37209309 DOI: 10.1007/s10142-023-01104-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/11/2023] [Accepted: 05/15/2023] [Indexed: 05/22/2023]
Abstract
Stripe rust (Sr), caused by Puccinia striiformis f. sp. tritici (Pst), is the most devastating disease of wheat and poses a serious threat to wheat-growing nations across the globe. Developing resistant cultivars is the most challenging aspect of wheat breeding. The function of resistance genes (R genes) and the mechanisms by which they influence plant-host interactions are poorly understood. In the present investigation, comparative transcriptome analysis was carried out on two near-isogenic lines (NILs), PBW343 and FLW29. The seedlings of both genotypes were inoculated with Pst pathotype 46S119. In total, 1106 differentially expressed genes (DEGs) were identified at the early stage of infection (12 hpi), whereas 877 and 1737 DEGs were observed at later stages (48 and 72 hpi) in FLW29. The identified DEGs comprised defense-related genes including putative R genes, 7 WRKY transcription factors, and genes associated with calcium and hormonal signaling. Moreover, pathways involved in signaling of receptor kinases, G protein, and light showed higher expression in the resistant cultivar and were common across different time points. Quantitative real-time PCR was used to further confirm the transcriptional expression of eight critical genes involved in the plant defense mechanism against stripe rust. The information about these genes is likely to improve our knowledge of the genetic mechanism that controls stripe rust resistance in wheat, and the data on resistance response-linked genes and pathways will be a significant resource for future research.
Affiliation(s)
- Zahoor Ahmad Mir
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Divya Chauhan
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Vivek Srivastava
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Divya Sharma
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Neeraj Budhlakoti
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
- Vasudha Jadon
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Tanmaya Kumar Sahu
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Monendra Grover
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
- Om Prakash Gangwar
  - ICAR-Indian Institute of Wheat and Barley Research, Flowerdale, Shimla, Himachal Pradesh, 171002, India
- Subodh Kumar
  - ICAR-Indian Institute of Wheat and Barley Research, Flowerdale, Shimla, Himachal Pradesh, 171002, India
- S C Bhardwaj
  - ICAR-Indian Institute of Wheat and Barley Research, Flowerdale, Shimla, Himachal Pradesh, 171002, India
- Jasdeep C Padaria
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012, India
- Amit Kumar Singh
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Anil Rai
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
- G P Singh
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
- Sundeep Kumar
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, 110012, India
5
Choudhury N, Sahu TK, Rao AR, Rout AK, Behera BK. An Improved Machine Learning-Based Approach to Assess the Microbial Diversity in Major North Indian River Ecosystems. Genes (Basel) 2023; 14:1082. [PMID: 37239442 DOI: 10.3390/genes14051082] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/28/2023] [Revised: 05/08/2023] [Accepted: 05/12/2023] [Indexed: 05/28/2023] Open
Abstract
The rapidly evolving high-throughput sequencing (HTS) technologies generate voluminous genomic and metagenomic sequences, which can help classify the microbial communities with high accuracy in many ecosystems. Conventionally, rule-based binning techniques are used to classify contigs or scaffolds based on either sequence composition or sequence similarity. However, accurate classification of microbial communities remains a major challenge because of the massive data volumes at hand and the need for efficient binning methods and classification algorithms. Therefore, we implemented iterative K-Means clustering for the initial binning of metagenomic sequences and applied various machine learning algorithms (MLAs) to classify the newly identified unknown microbes. The clusters were annotated with the BLAST program of NCBI, which grouped the assembled scaffolds into five classes, i.e., bacteria, archaea, eukaryota, viruses and others. The annotated cluster sequences were used to train MLAs to develop prediction models for classifying unknown metagenomic sequences. In this study, we used metagenomic datasets of samples collected from the Ganga (Kanpur and Farakka) and the Yamuna (Delhi) rivers in India for clustering and training the MLA models. Further, the performance of the MLAs was evaluated by 10-fold cross validation. The results revealed that the model based on Random Forest performed better than the other learning algorithms considered. The proposed method can be used for annotating metagenomic scaffolds/contigs and is complementary to existing methods of metagenomic data analysis. An offline predictor source code with the best prediction model is available at https://github.com/Nalinikanta7/metagenomics.
Affiliation(s)
- Nalinikanta Choudhury
  - ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
- Tanmaya Kumar Sahu
  - ICAR-Indian Grassland and Fodder Research Institute, Jhansi 284003, India
- Atmakuri Ramakrishna Rao
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
  - Indian Council of Agricultural Research (ICAR), New Delhi 110001, India
- Ajaya Kumar Rout
  - ICAR-Central Inland Fisheries Research Institute, West Bengal 700120, India
  - Rani Lakshmi Bai Central Agricultural University, Jhansi 284003, India
- Bijay Kumar Behera
  - ICAR-Central Inland Fisheries Research Institute, West Bengal 700120, India
  - Rani Lakshmi Bai Central Agricultural University, Jhansi 284003, India
6
Sahu TK, Singh AK, Mittal S, Jha SK, Kumar S, Jacob SR, Singh K. G-DIRT: a web server for identification and removal of duplicate germplasms based on identity-by-state analysis using single nucleotide polymorphism genotyping data. Brief Bioinform 2022; 23:6678959. [PMID: 36040109 DOI: 10.1093/bib/bbac348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2022] [Revised: 07/11/2022] [Accepted: 07/26/2022] [Indexed: 01/26/2023] Open
Abstract
Maintaining duplicate germplasms in genebanks hampers effective conservation and utilization of genebank resources. Redundant germplasm adds to the cost of conservation by diverting a large proportion of the genebank's financial resources towards conservation rather than towards enriching diversity. Besides, genome-wide association analysis using an association panel with over-represented germplasms can be biased, resulting in spurious marker-trait associations. The conventional methods of germplasm duplicate removal using passport information suffer from incomplete or missing passport information and data handling errors at various stages of germplasm enrichment. This limitation is less likely in the case of genotypic data. Therefore, we developed a web-based tool, Germplasm Duplicate Identification and Removal Tool (G-DIRT), which allows germplasm duplicate identification based on identity-by-state analysis using single-nucleotide polymorphism genotyping information along with pre-processing of genotypic data. A homozygous genotypic difference threshold of 0.1% for germplasm duplicates was determined using tetraploid wheat genotypic data with 94.97% accuracy. Based on the genotypic difference, the tool also builds a dendrogram that visually depicts the relationship between genotypes. To overcome the constraint of high-dimensional genotypic data, an offline version of G-DIRT with an R interface has also been developed. G-DIRT is expected to help genebank curators, breeders and other researchers across the world identify germplasm duplicates in the global genebank collections using only easily sharable genotypic data instead of physically exchanging seeds or propagating materials. The web server will complement the existing methods of germplasm duplicate identification based on passport or phenotypic information and is freely accessible at http://webtools.nbpgr.ernet.in/gdirt/.
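The abstract reports a homozygous genotypic difference threshold of 0.1% for calling duplicates but does not spell out the calculation. The sketch below shows one plausible reading, not G-DIRT's actual code: genotypes are assumed to be coded 0 and 2 for the two homozygous states, 1 for heterozygous, and `None` for missing, and the difference is taken as the fraction of comparable loci at which two accessions carry opposite homozygous genotypes. The function names and the coding scheme are hypothetical.

```python
def homozygous_difference(g1, g2):
    """Fraction of comparable loci with opposite homozygous genotypes.

    Assumed coding: 0 and 2 are the two homozygous states, 1 is
    heterozygous, None is a missing call. Loci missing in either
    accession are skipped.
    """
    comparable = 0
    different = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        comparable += 1
        if a != b and a in (0, 2) and b in (0, 2):
            different += 1
    return different / comparable if comparable else float("nan")


def is_duplicate(g1, g2, threshold=0.001):
    """Flag a pair as duplicates when the difference is at most 0.1%."""
    return homozygous_difference(g1, g2) <= threshold


acc_a = [0, 2, 2, 0, None, 1]
acc_b = [0, 2, 0, 0, 2, 1]
diff = homozygous_difference(acc_a, acc_b)  # one of five comparable loci differs
```

With a threshold this tight, a pair can only pass on realistic marker panels (thousands of SNPs) if it differs at a handful of loci at most, which is why the pre-processing of genotypic data mentioned in the abstract matters.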
Affiliation(s)
- Tanmaya Kumar Sahu
  - ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India
- Amit Kumar Singh
  - ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India
- Shikha Mittal
  - ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India
- Sundeep Kumar
  - ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India
- Sherry Rachel Jacob
  - ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India
- Kuldeep Singh
  - ICAR-National Bureau of Plant Genetic Resources (ICAR-NBPGR), New Delhi, India
  - ICAR-Indian Agricultural Research Institute (ICAR-IARI), New Delhi, India
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
7
Sahu TK, Meher PK, Choudhury NK, Rao AR. A comparative analysis of amino acid encoding schemes for the prediction of flexible length linear B-cell epitopes. Brief Bioinform 2022; 23:6673853. [PMID: 35998895 DOI: 10.1093/bib/bbac356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 07/06/2022] [Accepted: 07/30/2022] [Indexed: 11/12/2022] Open
Abstract
Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most of the B-cell epitope prediction methods considered fixed length of epitope sequences and achieved good accuracy. Though a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in the prediction performance is still expected. Thus, here we made an attempt to analyze the performance of machine learning approaches (MLA) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11-56 amino acids) from well-established public resources. The performances of machine learning algorithms with the encoded epitope sequence datasets were evaluated. Besides, the feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition-transition-distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC + AC and APC + APP with random forest classifier were identified to have improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest over other considered MLAs in the prediction of flexible length linear B-cell epitopes.
Affiliation(s)
- Tanmaya Kumar Sahu
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, India
- Atmakuri Ramakrishna Rao
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
  - Indian Council of Agricultural Research (ICAR), New Delhi, India
8
Prasad G, Mittal S, Kumar A, Chauhan D, Sahu TK, Kumar S, Singh R, Yadav MC, Singh AK. Transcriptome Analysis of Bread Wheat Genotype KRL3-4 Provides a New Insight Into Regulatory Mechanisms Associated With Sodicity (High pH) Tolerance. Front Genet 2022; 12:782366. [PMID: 35222517 PMCID: PMC8864244 DOI: 10.3389/fgene.2021.782366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 12/14/2021] [Indexed: 11/24/2022] Open
Abstract
Globally, sodicity is one of the major abiotic stresses limiting wheat productivity in arid and semi-arid regions. An investigation of the complex gene network associated with sodicity stress tolerance is therefore required to identify transcriptional changes in plants during abiotic stress conditions. For this purpose, we sequenced the flag leaf transcriptome of a highly tolerant bread wheat germplasm (KRL 3-4) to better understand the molecular basis of sodicity tolerance. A total of 1,980 genes were differentially expressed in the flag leaf due to sodicity stress. Among these genes, 872 DEGs were upregulated and 1,108 were downregulated. Furthermore, annotation of DEGs revealed that a total of 1,384 genes were assigned to 2,267 GO terms corresponding to 502 (biological process), 638 (cellular component), and 1,127 (molecular function). GO annotation also revealed the involvement of genes related to several transcription factors; the important ones are expansins, peroxidase, glutathione-S-transferase, and metal ion transporters in response to sodicity. Additionally, of 127 KEGG pathways, only 40 were confidently enriched at a p-value <0.05, covering the five main KEGG categories, i.e., metabolism, environmental information processing, genetic information processing, organismal systems, and cellular processes. The most enriched pathways were prioritized using MapMan software, which revealed that lipid metabolism, nutrient uptake, and protein homeostasis were paramount. We also found 39 SNPs that mapped to important sodicity stress-responsive genes associated with various pathways such as ROS scavenging, serine/threonine protein kinase, calcium signaling, and metal ion transporters.
In a nutshell, only 19 important candidate genes contributing to sodicity tolerance in bread wheat were identified, and these genes might be helpful for better understanding and further improvement of sodicity tolerance in bread wheat.
Affiliation(s)
- Geeta Prasad
  - Division of Genomic Resources, ICAR-NBPGR, New Delhi, India
- Shikha Mittal
  - Division of Genomic Resources, ICAR-NBPGR, New Delhi, India
- Arvind Kumar
  - ICAR-Central Soil Salinity Research Institute, Karnal, India
- Divya Chauhan
  - Division of Genomic Resources, ICAR-NBPGR, New Delhi, India
- Sundeep Kumar
  - Division of Genomic Resources, ICAR-NBPGR, New Delhi, India
- Rakesh Singh
  - Division of Genomic Resources, ICAR-NBPGR, New Delhi, India
9
Meher PK, Dash S, Sahu TK, Satpathy S, Pradhan SK. GIpred: a computational tool for prediction of GIGANTEA proteins using machine learning algorithm. Physiol Mol Biol Plants 2022; 28:1-16. [PMID: 35221569 PMCID: PMC8847649 DOI: 10.1007/s12298-022-01130-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 12/31/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]
Abstract
In plants, the GIGANTEA (GI) protein performs different biological functions including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, resource-intensive experimental methods have mostly been used to identify GI proteins. Thus, in this study we attempted to develop a computational model for fast and accurate prediction of GI proteins. Ten supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, with amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties as numerical inputs. The highest accuracies, i.e., 96.75% AUC-ROC and 86.7% AUC-PR, were observed for SVM coupled with the AAC + PHYC feature combination when evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved. When the model was evaluated with an independent dataset of 18 GI sequences, 17 were correctly predicted. We also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server, "GIpred", is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. Supplementary information: the online version contains supplementary material available at 10.1007/s12298-022-01130-6.
Affiliation(s)
- Prabina Kumar Meher
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
  - Division of Statistical Genetics, ICAR-IASRI, New Delhi-12, India
- Sagarika Dash
  - Orissa University of Agriculture and Technology, Bhubaneswar, Odisha, India
- Tanmaya Kumar Sahu
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Subhrajit Satpathy
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
10
Kochhar A, Khan NS, Deval R, Pradhan D, Jena L, Bhuyan R, Sahu TK, Jain AK. Protein-protein interaction and in silico mutagenesis studies on IL17A and its peptide inhibitor. 3 Biotech 2021; 11:305. [PMID: 34194898 DOI: 10.1007/s13205-021-02856-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2019] [Accepted: 05/20/2021] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interactions of Interleukin-17 (IL17) play a vital role in autoimmune and inflammatory diseases such as rheumatoid arthritis, multiple sclerosis, and psoriasis. Potent therapeutics for these diseases could be developed by blocking or modulating these interactions through biologics, peptide inhibitors and small molecule inhibitors. Unlike biologics, peptide inhibitors are cost effective and can be orally available. Unlike small molecules, peptide inhibitors also do not require a binding groove. Therefore, the crystal structure of IL17A in complex with a high affinity peptide inhibitor (HAP) (1-IHVTIPADLWDWIN-14) was investigated with the aim of finding hot spots that could improve its potency. An in silico mutagenesis strategy was implemented using FoldX PSSM to scan for positions tolerant to amino acid substitution. Three positions, T4, A7, and N14, showed improved stability when mutated to 'F/M/Y', 'P' and 'F/M/Y', respectively. A set of 31 mutant peptides was designed through combinations of these tolerated mutations using the Build Model application of FoldX. The binding affinity and interactions of the 31 peptides were assessed through protein-peptide docking and binding free energy calculations. Two peptides, namely P1 ("1-IHVTIPPDLWDWIY-14") and P2 ("1-IHVMIPPDLWDWIF-14"), showed better binding affinity to the IL17A dimerization site than HAP. The interactions of P1, P2 and HAP were also analyzed through 100 ns molecular dynamics simulations using GROMACS v5.0. The results revealed that the P2 peptide is likely to offer better potency than HAP and P1. Therefore, the P2 peptide can be synthesized to develop oral therapies for autoimmune and inflammatory diseases, subject to further experimental evaluation. Supplementary information: the online version contains supplementary material available at 10.1007/s13205-021-02856-y.
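The abstract does not say how the 31 mutant peptides were enumerated. One reading that reproduces the count is sketched below: each of the three tolerant positions either keeps its wild-type residue or takes one of its listed substitutions, giving 4 x 2 x 4 = 32 peptides, and the unmutated HAP sequence is dropped. This is an illustrative reconstruction in plain Python, not the FoldX workflow; the names `TOLERATED` and `enumerate_mutants` are hypothetical.

```python
from itertools import product

HAP = "IHVTIPADLWDWIN"  # wild-type peptide, residues numbered 1-14

# Tolerated substitutions per 1-based position, as listed in the abstract.
TOLERATED = {4: "FMY", 7: "P", 14: "FMY"}


def enumerate_mutants(wild_type, substitutions):
    """Return every combination of substitutions, excluding the wild type."""
    positions = sorted(substitutions)
    # At each position the choices are the wild-type residue plus its
    # tolerated replacements.
    choices = [wild_type[p - 1] + substitutions[p] for p in positions]
    mutants = []
    for combo in product(*choices):
        peptide = list(wild_type)
        for pos, residue in zip(positions, combo):
            peptide[pos - 1] = residue
        peptide = "".join(peptide)
        if peptide != wild_type:
            mutants.append(peptide)
    return mutants


mutants = enumerate_mutants(HAP, TOLERATED)
print(len(mutants))  # 31
```

Under this reading, both peptides highlighted in the abstract appear in the set: P1 (IHVTIPPDLWDWIY) combines A7P with N14Y, and P2 (IHVMIPPDLWDWIF) combines T4M, A7P and N14F.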
Affiliation(s)
- Aishwarya Kochhar
  - Biomedical Informatics Centre, National Institute of Pathology-ICMR, New Delhi, 110029 India
  - Department of Biotechnology, Invertis University, NH-24, Bareilly, U.P. 243123 India
- Noor Saba Khan
  - Biomedical Informatics Centre, National Institute of Pathology-ICMR, New Delhi, 110029 India
  - Department of Biotechnology, Invertis University, NH-24, Bareilly, U.P. 243123 India
- Ravi Deval
  - Department of Biotechnology, Invertis University, NH-24, Bareilly, U.P. 243123 India
- Dibyabhaba Pradhan
  - Biomedical Informatics Centre, National Institute of Pathology-ICMR, New Delhi, 110029 India
  - ICMR-AIIMS Computational Genomics Centre (ISRM Division)-Indian Council of Medical Research, New Delhi, 110029 India
- Lingaraja Jena
  - Department of Bioscience and Biotechnology, Banasthali Vidyapith, Banasthali, 304022 India
- Rajabrata Bhuyan
  - Bioinformatics Centre, Mahatma Gandhi Institute of Medical Sciences, Sevagram, Maharashtra 442102 India
- Tanmaya Kumar Sahu
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Arun Kumar Jain
  - Biomedical Informatics Centre, National Institute of Pathology-ICMR, New Delhi, 110029 India
11
Meher PK, Sahu TK, Gahoi S, Satpathy S, Rao AR. Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition. Gene 2019; 705:113-126. [PMID: 31009682 DOI: 10.1016/j.gene.2019.04.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Revised: 03/27/2019] [Accepted: 04/17/2019] [Indexed: 02/02/2023]
Abstract
Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identification of splice sites. However, nucleotide strings must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 different sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of both 1st and 2nd order Markov model (MM1 + MM2) and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and the performances were then validated with another four species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver-operating-characteristics) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM was found to achieve the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination performed better than other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were observed to be higher for species with low intron density. To our limited knowledge, this is the first comprehensive evaluation of sequence encoding schemes for prediction of splice sites. 
We have also developed an R-package EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html) for encoding of splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is believed to be useful for the computational biologists for predicting different functional elements on the genomic DNA.
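To make the idea of sequence encoding concrete, the following is a minimal sketch of a first-order Markov style encoding, written in Python rather than R. It is an assumption-laden illustration and not the EncDNA implementation: it assumes that each position after the first is replaced by the position-specific conditional probability of its base given the preceding base, estimated from aligned true-site sequences with a pseudocount. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGT"

def fit_first_order_markov(sequences, pseudocount=1.0):
    """Estimate position-specific P(base at i | base at i-1) from aligned sequences."""
    length = len(sequences[0])
    counts = [defaultdict(float) for _ in range(length - 1)]
    for seq in sequences:
        for i in range(1, length):
            counts[i - 1][(seq[i - 1], seq[i])] += 1.0
    model = []
    for pos_counts in counts:
        probs = {}
        for prev in ALPHABET:
            # Normalize over the four possible current bases for this previous base.
            total = sum(pos_counts[(prev, cur)] + pseudocount for cur in ALPHABET)
            for cur in ALPHABET:
                probs[(prev, cur)] = (pos_counts[(prev, cur)] + pseudocount) / total
        model.append(probs)
    return model

def encode(seq, model):
    """Map a sequence of length n to n-1 conditional probabilities."""
    return [model[i - 1][(seq[i - 1], seq[i])] for i in range(1, len(seq))]

# Toy aligned donor-like motifs, invented for illustration only.
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
model = fit_first_order_markov(true_sites)
vector = encode("AGGTAAGT", model)
print(vector)
```

The resulting numeric vector is the kind of input a classifier such as SVM or RF would consume; the actual scheme definitions should be taken from the paper and the package documentation.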
Affiliation(s)
- Prabina Kumar Meher
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
| | - Tanmaya Kumar Sahu
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
| | - Shachi Gahoi
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
| | - Subhrajit Satpathy
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
|
12
|
Meher PK, Sahu TK, Gahoi S, Tomar R, Rao AR. funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genet 2019; 20:2. [PMID: 30616524 PMCID: PMC6323839 DOI: 10.1186/s12863-018-0710-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/13/2018] [Accepted: 12/26/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, DNA barcoding can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Though computational prediction of fungal species has become feasible with the availability of a huge volume of barcode sequences in the public domain, prediction remains challenging due to the high degree of variability among ITS regions within species. RESULTS A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was obtained when 4 sequences per species in the reference set were used, and accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used for training the model. The proposed model achieved comparable accuracy when evaluated against existing methods through a cross-validation procedure. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R-package funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ) is available for prediction using high throughput sequence data. This work is expected to supplement future endeavors in fungal taxonomy assignment based on DNA barcodes.
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
| | - Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
| | - Shachi Gahoi
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
| | - Ruchi Tomar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat, Uttar Pradesh 250611 India
| | - Atmakuri Ramakrishna Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
|
13
|
Sahu TK, Pradhan D, Rao AR, Jena L. In silico site-directed mutagenesis of neutralizing mAb 4C4 and analysis of its interaction with G-H loop of VP1 to explore its therapeutic applications against FMD. J Biomol Struct Dyn 2018; 37:2641-2651. [PMID: 30051760 DOI: 10.1080/07391102.2018.1494631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Investigating the behaviour of bio-molecules through computational mutagenesis is gaining interest as a way to facilitate the development of new therapeutic solutions for infectious diseases. The antigenically variant genotypes of foot and mouth disease virus (FMDV) and their subsequent infections are challenging to tackle with traditional vaccination. In such a scenario, neutralizing antibodies might provide an alternative way to manage FMDV infection. Thus, we analysed the interaction of the mAb 4C4 with a synthetic G-H loop of FMDV-VP1 through in silico mutagenesis and molecular modelling. Initially, a set of 25,434 mutants was designed, and the mutants with better energetic stability than 4C4 were clustered based on sequence identity. The best mutant representing each cluster was selected and evaluated for its binding affinity with the antigen in terms of docking scores, interaction energy and binding energy. Six mutants showed better binding affinities towards the antigen than 4C4. Further, the interaction of these mutants with the natural G-H loop that is bound to mAb SD6 was also evaluated. One 4C4 variant with mutations at positions 2034(N→L), 2096(N→C), 2098(D→Y), 2532(T→K) and 2599(A→G) showed better binding affinities towards both the synthetic and natural G-H loops than 4C4 and SD6, respectively. A 50 ns molecular dynamics simulation of the mutant and wild-type antibody structures supported the pre-simulation results. Therefore, these mutations on mAb 4C4 are believed to provide a better antibody-based therapeutic option for FMD. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Tanmaya Kumar Sahu
- a Centre for Agricultural Bioinformatics , ICAR-Indian Agricultural Statistics Research Institute , New Delhi , Delhi , India
| | - Dibyabhaba Pradhan
- b Biomedical Informatics Centre , ICMR-National Institute of Pathology , New Delhi , Delhi , India.,c ICMR-Computational Genomics Centre , Indian Council of Medical Research , New Delhi , Delhi , India
| | - Atmakuri Ramakrishna Rao
- a Centre for Agricultural Bioinformatics , ICAR-Indian Agricultural Statistics Research Institute , New Delhi , Delhi , India
| | - Lingaraj Jena
- d Bioinformatics Centre , Mahatma Gandhi Institute of Medical Sciences , Sevagram , Maharashtra , India
|
14
|
Saba Khan N, Verma R, Pradhan D, Nayek A, Bhuyan R, Kumar Sahu T, Kumar Jain A. Analysis of interleukin 23 and 7G10 interactions for computational design of lead antibodies against immune-mediated inflammatory diseases. J Recept Signal Transduct Res 2018; 38:327-334. [PMID: 30481093 DOI: 10.1080/10799893.2018.1511729] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/28/2022]
Abstract
A wealth of structural data on therapeutic targets in complex with monoclonal antibodies (mAbs), together with advances in molecular modeling algorithms, presents exciting opportunities for novel biologic design. Interleukin 23 (IL23), a well-known drug target for autoimmune diseases, in complex with mAb 7G10 offers the prospect of designing potent lead antibodies by traversing the complete epitope-paratope interface. Herein, key interactions aiding antibody-based neutralization in the IL23-7G10 complex are resolved using PyMOL, LigPlot+, Antibody i-Patch, DiscoTope and FoldX. Six amino acids, Ser31, Val33, Asn55 and Lys59 in the heavy chain and His34 and Ser93 in the light chain, are subjected to in silico mutagenesis with the residues Met, Trp, Ile, Leu and Arg. A set of 431 mutant macromolecules is outlined. Binding affinities of these molecules with IL23 are estimated through protein-protein docking with ZDOCK, ClusPro and RosettaDock. Subsequently, the macromolecules that gave results comparable with 7G10 are cross-validated through binding free-energy calculations using the Molecular Mechanics/Poisson-Boltzmann Surface Area method in CHARMM. Thirty-nine designed theoretical antibodies showed improved outcomes in all evaluations; of these, the top 10 molecules showed at least nine units better binding affinity than the known mAb. These molecules have the potential to act as lead antibodies. Subsequent molecular dynamics simulations also supported the prospect that the best-ranked molecule has therapeutic implications in autoimmune and inflammatory diseases. 
Abbreviations: IL23: interleukin 23; IL17: interleukin17; Ab: antibody; Ag: antigen; mAbs: monoclonal antibodies; STAT3: signal transducer and activator of transcription 3; STAT4: signal transducer and activator of transcription 4; PDB: protein databank; MM/PBSA: molecular mechanics Poisson-Boltzmann surface area; Ag-Ab: antigen- antibody complex; SPC/E: extended simple point charge; SD: steepest descents; PME: particle mesh ewald; dG: binding free energies; Fv: variable fragment.
Affiliation(s)
- Noor Saba Khan
- a Biomedical Informatics Centre , ICMR-National Institute of Pathology , New Delhi , India
| | - Rashi Verma
- a Biomedical Informatics Centre , ICMR-National Institute of Pathology , New Delhi , India
| | - Dibyabhaba Pradhan
- a Biomedical Informatics Centre , ICMR-National Institute of Pathology , New Delhi , India.,b ICMR-AIIMS Computational Genomics Centre , Indian Council of Medical Research , New Delhi , India
| | - Arnab Nayek
- a Biomedical Informatics Centre , ICMR-National Institute of Pathology , New Delhi , India
| | - Rajabrata Bhuyan
- c Bioinformatics Infrastructure Facility , University of Kalyani , West Bengal , India
| | - Tanmaya Kumar Sahu
- d Centre for Agricultural Bioinformatics , ICAR-IASRI , New Delhi , India
| | - Arun Kumar Jain
- a Biomedical Informatics Centre , ICMR-National Institute of Pathology , New Delhi , India
|
15
|
Meher PK, Sahu TK, Banchariya A, Rao AR. DIRProt: a computational approach for discriminating insecticide resistant proteins from non-resistant proteins. BMC Bioinformatics 2017; 18:190. [PMID: 28340571 PMCID: PMC5364559 DOI: 10.1186/s12859-017-1587-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/17/2016] [Accepted: 03/09/2017] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Insecticide resistance is a major challenge for insect pest control programs in crop protection, human health and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of insect genes. To date, no computational tool is available to distinguish insecticide resistant proteins from non-resistant proteins. Thus, development of such a tool will be helpful in predicting insecticide resistant proteins, which can be targeted for developing appropriate insecticides. RESULTS Five different feature sets, viz., amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input to a support vector machine (SVM) for classification of insecticide resistant and non-resistant proteins. Higher accuracies were obtained with the RBF kernel than with other kernels. Further, accuracies were observed to be higher for the DPC feature set than for the others. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins, i.e., detoxification-based and target-based, were discriminated from non-resistant proteins with >95% accuracy. In addition, >95% accuracy was observed for discriminating between proteins involved in detoxification-based and target-based resistance mechanisms. The proposed approach not only outperformed the Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy when assessed using an independent dataset of 75 insecticide resistant proteins. CONCLUSIONS This paper presents the first computational approach for discriminating insecticide resistant proteins from non-resistant proteins. 
Based on the proposed approach, an online prediction server DIRProt has also been developed for computational prediction of insecticide resistant proteins, which is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is believed to supplement the efforts needed to develop dynamic insecticides in wet-lab by targeting the insecticide resistant proteins.
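Two of the feature sets named above, AAC and DPC, have simple standard definitions and can be sketched in a few lines. The code below is a generic illustration, not the DIRProt implementation: AAC is taken as the fraction of each of the 20 standard amino acids, and DPC as the fraction of each of the 400 ordered adjacent residue pairs. Skipping non-standard residues is an assumption made here for robustness, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def amino_acid_composition(protein):
    """Return 20 fractions, one per standard amino acid (AAC)."""
    residues = [r for r in protein.upper() if r in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(a) / total if total else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(protein):
    """Return 400 fractions, one per ordered pair of adjacent residues (DPC)."""
    seq = protein.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]

# Toy peptide, invented for illustration only.
aac = amino_acid_composition("MKKLA")
dpc = dipeptide_composition("MKKLA")
print(len(aac), len(dpc))
```

Each protein then becomes a fixed-length vector (20 or 400 values) regardless of its sequence length, which is what allows a kernel classifier such as an SVM to be trained on proteins of different sizes.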
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
| | - Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
| | - Anjali Banchariya
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.,Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat, 250611, Uttar Pradesh, India
| | - Atmakuri Ramakrishna Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
|
16
|
Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep 2017; 7:42362. [PMID: 28205576 PMCID: PMC5304217 DOI: 10.1038/srep42362] [Citation(s) in RCA: 274] [Impact Index Per Article: 39.1] [Received: 10/24/2016] [Accepted: 01/09/2017] [Indexed: 11/13/2022] Open
Abstract
Antimicrobial peptides (AMPs) are important components of the innate immune system that have been found to be effective against disease causing pathogens. Identification of AMPs through wet-lab experiment is expensive. Therefore, development of efficient computational tool is essential to identify the best candidate AMP prior to the in vitro experimentation. In this study, we made an attempt to develop a support vector machine (SVM) based computational approach for prediction of AMPs with improved accuracy. Initially, compositional, physico-chemical and structural features of the peptides were generated that were subsequently used as input in SVM for prediction of AMPs. The proposed approach achieved higher accuracy than several existing approaches, while compared using benchmark dataset. Based on the proposed approach, an online prediction server iAMPpred has also been developed to help the scientific community in predicting AMPs, which is freely accessible at http://cabgrid.res.in:8080/amppred/. The proposed approach is believed to supplement the tools and techniques that have been developed in the past for prediction of AMPs.
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India
| | - Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India
| | - Varsha Saini
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India.,Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat-250611, Uttar Pradesh, India
| | - Atmakuri Ramakrishna Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India
|
17
|
Meher PK, Sahu TK, Rao AR, Wahi SD. A computational approach for prediction of donor splice sites with improved accuracy. J Theor Biol 2016; 404:285-294. [PMID: 27302911 DOI: 10.1016/j.jtbi.2016.06.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/18/2015] [Revised: 04/18/2016] [Accepted: 06/09/2016] [Indexed: 11/24/2022]
Abstract
Identification of splice sites is important due to their key role in predicting the exon-intron structure of protein coding genes. Though several approaches have been developed for the prediction of splice sites, further improvement in the prediction accuracy will help predict gene structure more accurately. This paper presents a computational approach for prediction of donor splice sites with higher accuracy. In this approach, true and false splice sites were first encoded into numeric vectors and then used as input in artificial neural network (ANN), support vector machine (SVM) and random forest (RF) for prediction. ANN and SVM were found to perform equally and better than RF, while tested on HS3D and NN269 datasets. Further, the performance of ANN, SVM and RF were analyzed by using an independent test set of 50 genes and found that the prediction accuracy of ANN was higher than that of SVM and RF. All the predictors achieved higher accuracy while compared with the existing methods like NNsplice, MEM, MDD, WMM, MM1, FSPLICE, GeneID and ASSP, using the independent test set. We have also developed an online prediction server (PreDOSS) available at http://cabgrid.res.in:8080/predoss, for prediction of donor splice sites using the proposed approach.
Affiliation(s)
- Prabina Kumar Meher
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
| | - Tanmaya Kumar Sahu
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
| | - A R Rao
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
| | - S D Wahi
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
|
18
|
Meher PK, Sahu TK, Rao AR. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier. Gene 2016; 592:316-24. [PMID: 27393648 DOI: 10.1016/j.gene.2016.07.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/02/2016] [Revised: 07/02/2016] [Accepted: 07/04/2016] [Indexed: 11/17/2022]
Abstract
DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying the species by comparing its barcode with the barcode sequence of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies and then Random forest methodology was employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based, diagnostic-based approaches and found comparable with existing supervised learning based approaches in terms of species identification success rate, while compared using real and simulated datasets. Based on the proposed approach, an online web interface SPIDBAR has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by the taxonomists.
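The k-mer mapping step can be illustrated as follows. This is a minimal sketch under stated assumptions, not the SPIDBAR code: it counts overlapping k-mers over the alphabet A, C, G, T, ignores windows containing ambiguous bases, and normalizes to relative frequencies. The value k = 3, the function name and the toy barcode are all hypothetical choices made for the example.

```python
from itertools import product

def kmer_frequency_vector(seq, k=3):
    """Map a DNA sequence to a vector of 4**k relative k-mer frequencies."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with N or other ambiguity codes are skipped
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]

# Toy barcode fragment, invented for illustration only.
features = kmer_frequency_vector("ACGTACGTNACG", k=3)
print(len(features))
```

Vectors built this way for the reference library and for query barcodes have identical length, so they can be passed directly to a classifier such as a Random Forest.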
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
| | - Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
| | - A R Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
|
19
|
Meher PK, Sahu TK, Rao AR, Wahi SD. Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms Mol Biol 2016; 11:16. [PMID: 27252772 PMCID: PMC4888255 DOI: 10.1186/s13015-016-0078-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/28/2015] [Accepted: 05/17/2016] [Indexed: 11/16/2022] Open
Abstract
Background Identification of splice sites is essential for annotation of genes. Though existing approaches have achieved an acceptable level of accuracy, there is still a need for further improvement. Moreover, most of the approaches are species-specific, so approaches that work across species are required. Results Each splice site sequence was transformed into a numeric vector of length 49, of which four were positional, four were dependency and 41 were compositional features. Using the transformed vectors as input, prediction was made through a support vector machine. Using a balanced training set, the proposed approach achieved an area under the ROC curve (AUC-ROC) of 96.05, 96.96, 96.95 and 96.24% and an area under the PR curve (AUC-PR) of 97.64, 97.89, 97.91 and 97.90% when tested on human, cattle, fish and worm datasets, respectively. On the other hand, AUC-ROC of 97.21, 97.45, 97.41 and 98.06% and AUC-PR of 93.24, 93.34, 93.38 and 92.29% were obtained when imbalanced training datasets were used. The proposed approach was found comparable with state-of-the-art splice site prediction approaches when compared using the benchmark NN269 dataset and other datasets. Conclusions The proposed approach achieved consistent accuracy across different species and was comparable with the existing approaches. Thus, we believe that the proposed approach can be used as a complementary method to the existing methods for the prediction of splice sites. A web server named ‘HSplice’ has also been developed based on the proposed approach for easy prediction of 5′ splice sites and is freely available at http://cabgrid.res.in:8080/HSplice.
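For readers unfamiliar with the AUC-ROC figures quoted above, the sketch below computes the metric from classifier scores using its rank interpretation: the probability that a randomly chosen true site scores higher than a randomly chosen false site, with ties counted as one half. This is a generic textbook formulation with hypothetical names and toy data; it is not taken from the paper, and it does not cover AUC-PR.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(positives) * len(negatives))

# Toy scores and labels (1 = true splice site, 0 = false site), for illustration only.
toy_auc = auc_roc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(toy_auc)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a demonstration; large evaluations normally use a sort-based computation instead.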
|
20
|
Meher PK, Sahu TK, Rao AR. Prediction of donor splice sites using random forest with a new sequence encoding approach. BioData Min 2016; 9:4. [PMID: 26807151 PMCID: PMC4724119 DOI: 10.1186/s13040-016-0086-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/02/2015] [Accepted: 01/19/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detection of splice sites plays a key role for predicting the gene structure and thus development of efficient analytical methods for splice site prediction is vital. This paper presents a novel sequence encoding approach based on the adjacent di-nucleotide dependencies in which the donor splice site motifs are encoded into numeric vectors. The encoded vectors are then used as input in Random Forest (RF), Support Vector Machines (SVM) and Artificial Neural Network (ANN), Bagging, Boosting, Logistic regression, kNN and Naïve Bayes classifiers for prediction of donor splice sites. RESULTS The performance of the proposed approach is evaluated on the donor splice site sequence data of Homo sapiens, collected from Homo Sapiens Splice Sites Dataset (HS3D). The results showed that RF outperformed all the considered classifiers. Besides, RF achieved higher prediction accuracy than the existing methods viz., MEM, MDD, WMM, MM1, NNSplice and SpliceView, while compared using an independent test dataset. CONCLUSION Based on the proposed approach, we have developed an online prediction server (MaLDoSS) to help the biological community in predicting the donor splice sites. The server is made freely available at http://cabgrid.res.in:8080/maldoss. Due to computational feasibility and high prediction accuracy, the proposed approach is believed to help in predicting the eukaryotic gene structure.
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, Indian Agricultural Statistics Research Institute, New Delhi, 110 012 India
| | - Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, 110 012 India
| | - Atmakuri Ramakrishna Rao
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, 110 012 India
|
21
|
Nayak MK, Nair HG, Bakshi AK, Sahani PK, Singh S, Khan S, Verma D, Dev V, Sahu TK, Khare M, Kumar V, Bandyopadhyay T, Tripathi RM, Sharma DN. Radiation safety aspects of the operation of first three synchrotron beam lines of Indus-2. Radiat Prot Dosimetry 2015; 164:187-193. [PMID: 25209995 DOI: 10.1093/rpd/ncu273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2014] [Accepted: 08/03/2014] [Indexed: 06/03/2023]
Abstract
Five synchrotron radiation beam lines are commissioned and now under regular operation at the Synchrotron Radiation Source, Indus-2 at Raja Ramanna Centre For Advanced Technology (RRCAT), Indore, India. Nine beam lines are under trial operation, and six beam lines are in the installation stage. In the early phase of installation of beam lines on Indus-2, three bending magnet beam lines, Extended X-ray Absorption Fine Structure (EXAFS, BL-8), Energy Dispersive X-ray Diffraction (EDXRD, BL-11) and Angle Dispersive X-ray Diffraction (ADXRD, BL-12), were installed and commissioned, after approval from Atomic Energy Regulatory Board (AERB), India. These beam lines are pink (BL-8), white (BL-11) and monochromatic (BL-12), which are housed in specially designed shielded hutches. In order to ensure safety of users and other working personnel from ionizing radiations present in these beam lines, several safety systems are incorporated and safety procedures are followed. The paper describes the radiological safety aspects of the three beam lines during its initial commissioning trials and also the measurements on radiation levels carried out in and around the beam line hutches.
Affiliation(s)
- M K Nayak
- Health Physics Division, BARC, Mumbai 400085, India
| | | | - A K Bakshi
- Radiological Physics and Advisory Division, BARC, Mumbai 400085, India
| | - P K Sahani
- Indus Operation and Accelerator Physics Design Division, RRCAT, Indore, MP 452013, India
| | - Sunil Singh
- Radiological Physics and Advisory Division, BARC, Mumbai 400085, India
| | - Saleem Khan
- Indus Operation and Accelerator Physics Design Division, RRCAT, Indore, MP 452013, India
| | - Dimple Verma
- Health Physics Division, BARC, Mumbai 400085, India
| | - Vipin Dev
- Indus Operation and Accelerator Physics Design Division, RRCAT, Indore, MP 452013, India
| | - T K Sahu
- Health Physics Division, BARC, Mumbai 400085, India
| | - Mukesh Khare
- Indus Operation and Accelerator Physics Design Division, RRCAT, Indore, MP 452013, India
| | - Vijay Kumar
- Indus Operation and Accelerator Physics Design Division, RRCAT, Indore, MP 452013, India
| | | | - R M Tripathi
- Health Physics Division, BARC, Mumbai 400085, India
| | - D N Sharma
- Health Safety and Environmental group, BARC, Mumbai 400085, India
|
22
|
Sahu TK, Rao AR, Meher PK, Sahoo BC, Gupta S, Rai A. Computational prediction of MHC class I epitopes for most common viral diseases in cattle (Bos taurus). Indian J Biochem Biophys 2015; 52:34-44. [PMID: 26040110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Viral diseases such as foot-and-mouth disease (FMD), calf scour (CS), bovine viral diarrhea (BVD) and infectious bovine rhinotracheitis (IBR) affect the growth and milk production of cattle (Bos taurus), causing severe economic loss. Epitope-based vaccine design has evolved to provide a new strategy for therapeutic application of pathogen-specific immunity in animals. Identification of major histocompatibility complex (MHC) binding peptides as potential T-cell epitopes is therefore widely applied in peptide vaccine design and immunotherapy. In this study, the MetaMHCI tool was used with seven different algorithms to predict the potential T-cell epitopes for FMD, BVD, IBR and CS in cattle. A total of 54 protein sequences were selected from a total set of 6351 sequences of the pathogens causing these diseases using bioinformatics approaches. These selected protein sequences were used as the key inputs for the MetaMHCI tool to predict the epitopes for the BoLA-A11 MHC class I allele of B. taurus. Further, the epitopes were ranked based on a proposed principal component analysis based epitope score (PbES). The best epitope for each disease, chosen for being predicted by the maximum number of predictors and having a low PbES, was modeled in the PEP-FOLD server and docked with the BoLA-A11 protein to understand the MHC-epitope interaction. Finally, a total of 78 epitopes were predicted, of which 27 were for FMD, 25 for BVD, 12 for CS and 14 for IBR. These epitopes could be artificially synthesized and recommended for vaccinating cattle against the considered diseases. In addition, the methodology adopted here could also be used to predict and analyze epitopes for other microbial diseases of important animal species.
|
23
|
Khan S, Sahu TK, Kumar V, Haridas G. Bremsstrahlung photon dose measurement inside Indus-2 synchrotron radiation source ring area. Radiat Prot Environ 2015. [DOI: 10.4103/0972-0464.176162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
|
24
|
Meher PK, Sahu TK, Rao AR, Wahi SD. A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics 2014; 15:362. [PMID: 25420551 PMCID: PMC4702320 DOI: 10.1186/s12859-014-0362-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/01/2014] [Accepted: 10/24/2014] [Indexed: 11/17/2022] Open
Abstract
Background: Most approaches to splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they use long sequence windows. Hence, they may not be suitable for predicting novel splice variants from the short sequence reads generated by next-generation sequencing technologies. Further, machine learning techniques require numerically encoded data and produce different accuracies with different encoding procedures. Splice site prediction with short sequence motifs and without encoding sequence data therefore motivated the present study.
Results: An approach for finding associations among nucleotide bases in splice site motifs was developed and then used to determine the appropriate window size. In addition, an approach for predicting donor splice sites using a sum of absolute error criterion was proposed. The proposed approach was compared with commonly used approaches, i.e., Maximum Entropy Modeling (MEM), Maximal Dependency Decomposition (MDD), Weighted Matrix Method (WMM) and Markov Model of first order (MM1), and was found to perform equally with MEM and MDD and better than WMM and MM1 in terms of prediction accuracy.
Conclusions: The proposed approach can predict donor splice sites with higher accuracy using short sequence motifs and hence can be used as a complementary method to the existing approaches. Based on the proposed methodology, a web server was also developed for easy prediction of donor splice sites by users and is available at http://cabgrid.res.in:8080/sspred.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0362-6) contains supplementary material, which is available to authorized users.
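As an illustration of one of the baseline methods named in this abstract, a weight matrix method (WMM) scorer can be sketched as below. This is a generic textbook-style sketch, not the authors' sum of absolute error approach, whose details the abstract does not give. The four training 9-mers, the pseudocount and the uniform background are assumptions chosen for the example.

```python
import math

BASES = "ACGT"

def build_wmm(sites, pseudocount=1.0):
    """Estimate per-position base frequencies from aligned, equal-length sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix

def log_odds(window, matrix, background=0.25):
    """Sum of log2(position frequency / background) over the window."""
    return sum(math.log2(matrix[i][b] / background)
               for i, b in enumerate(window))

def scan(sequence, matrix):
    """Score every window of the matrix width along the sequence."""
    width = len(matrix)
    return [(i, log_odds(sequence[i:i + width], matrix))
            for i in range(len(sequence) - width + 1)]

# Toy donor-site windows (3 exonic + 6 intronic bases); made up for illustration.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
wmm = build_wmm(toy_sites)
best_start, best_score = max(scan("TTTCAGGTAAGTCCC", wmm), key=lambda t: t[1])
```

The sketch assumes uppercase A/C/G/T input only; any other character raises a `KeyError`. Because WMM treats positions as independent, it ignores the associations among bases that the study above sets out to capture.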
Affiliation(s)
- Prabina Kumar Meher, Division of Statistical Genetics, Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
- Tanmaya Kumar Sahu, Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
- Atmakuri Ramakrishna Rao, Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
- Sant Dass Wahi, Division of Statistical Genetics, Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
|
25
|
Rao AR, Dash M, Sahu TK, Behera BK, Mohapatra T. Detection of novel key residues of MnSOD enzyme and its role in salinity management across species. J Genet 2014; 93:e8-e16. [PMID: 24823309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Affiliation(s)
- A R Rao
- Indian Agricultural Statistics Research Institute, Library Avenue, Pusa, New Delhi 110 012, India.
|
26
|
Sahu TK, Rao AR, Vasisht S, Singh N, Singh UP. Computational approaches, databases and tools for in silico motif discovery. Interdiscip Sci 2012; 4:239-255. [PMID: 23354813 DOI: 10.1007/s12539-012-0141-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2011] [Revised: 04/12/2012] [Accepted: 06/13/2012] [Indexed: 06/01/2023]
Abstract
Motifs are biologically significant fragments of nucleotide or peptide sequences that occur in a specific pattern. They are categorized as structural motifs and sequence motifs, and are discovered by phylogenetic studies of similar genes across species. Structural motifs are formed by three-dimensional arrangements of amino acids consisting of two or more α helices or β strands, whereas sequence motifs are formed by nucleotide fragments appearing in the exons of a gene. The arrangement of residues may not be continuous in structural motifs, while it is continuous in sequence motifs. Sequence motifs may encode structural motifs. The algorithms used for motif discovery are an important part of bio-computational studies. The purpose of motif discovery is to identify patterns in biopolymer (nucleotide or protein) sequences in order to understand the structure and function of the molecules and their evolutionary aspects. The main aim of this paper is to provide a systematic review of the different approaches, databases and tools used in motif discovery.
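To make the idea of a continuous sequence motif concrete, here is a minimal sketch of the simplest kind of motif search: scanning a nucleotide sequence for a known consensus written with IUPAC ambiguity codes. It is not drawn from the review itself. The helper names and the example sequence are made up, and the consensus `TATAWAW` is used only as a familiar TATA-box-style pattern.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(consensus):
    """Compile a consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")

def find_motif(sequence, consensus):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    pattern = motif_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Made-up sequence containing two TATA-box-like stretches.
hits = find_motif("GGTATAAAAGGCTATATAAG", "TATAWAW")
```

Matching a known consensus is only the lookup half of the problem. The discovery algorithms the review surveys must infer such patterns from unaligned sequences, which this sketch does not attempt.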
Affiliation(s)
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India
|
27
|
Singh N, Sahu TK, Rao AR, Mohapatra T. shRNAPred (version 1.0): An open source and standalone software for short hairpin RNA (shRNA) prediction. Bioinformation 2012; 8:629-33. [PMID: 22829744 PMCID: PMC3400981 DOI: 10.6026/97320630008629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2012] [Accepted: 06/08/2012] [Indexed: 11/23/2022] Open
Abstract
The small hairpin RNAs (shRNAs) are useful in many ways, such as identification of trait-specific molecular markers, gene silencing and characterization of a species. Hardly any standalone software for shRNA prediction exists in the public domain. Hence, the software shRNAPred (1.0) is proposed here to offer a user-friendly command-line user interface (CUI) for predicting 'shRNA-like' regions from a large set of nucleotide sequences. The software is developed in Perl version 5.12.5 and takes into account parameters such as stem and loop length combinations, specific loop sequence, GC content, melting temperature, position-specific nucleotides and a low-complexity filter. Each parameter is assigned a specific score, on the basis of which the software ranks the predicted shRNAs. The high-scoring shRNAs are reported as potential shRNAs and provided to the user as a text file. The software also allows the user to customize certain parameters when predicting specific shRNAs of interest. shRNAPred (1.0) is open-access software available to academic users. It can be downloaded freely along with a user manual, an example dataset and sample output for easy understanding and implementation. AVAILABILITY: The software is available for free at http://bioinformatics.iasri.res.in/EDA/downloads/shRNAPred_v1.0.exe.
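Some of the sequence features listed in this abstract can be computed in a few lines. The sketch below is written in Python rather than Perl and is not shRNAPred's code or scoring scheme, which the abstract does not spell out. The function names, the stem and loop sequences, and the use of the Wallace rule for melting temperature (a rough estimate suited to short oligonucleotides) are all assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def gc_content(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)

def wallace_tm(seq):
    """Wallace rule estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))

def is_hairpin(seq, stem_len, loop_len):
    """True if seq reads stem + loop + reverse complement of the stem."""
    seq = seq.upper()
    if len(seq) != 2 * stem_len + loop_len:
        return False
    stem = seq[:stem_len]
    tail = seq[stem_len + loop_len:]
    return tail == reverse_complement(stem)

# Hypothetical 19-nt stem and 9-nt loop, written as a DNA template.
stem = "GCAAGCTGACCCTGAAGTT"
loop = "TTCAAGAGA"
candidate = stem + loop + reverse_complement(stem)
features = {
    "hairpin": is_hairpin(candidate, len(stem), len(loop)),
    "stem_gc": gc_content(stem),
    "stem_tm": wallace_tm(stem),
}
```

A ranking tool along the lines described would assign each such feature a score and sum them. The weights and thresholds used for that are exactly the part this sketch leaves out.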
Affiliation(s)
- Nishtha Singh, Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India, 110 012
- Tanmaya Kumar Sahu, Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India, 110 012
- Atmakuri Ramakrishna Rao, Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India, 110 012
- AR Rao: Phone: +91-9999422935; Fax: +91-11-25841564
|