1
|
RBProkCNN: Deep learning on appropriate contextual evolutionary information for RNA binding protein discovery in prokaryotes. Comput Struct Biotechnol J 2024; 23:1631-1640. [PMID: 38660008 PMCID: PMC11039349 DOI: 10.1016/j.csbj.2024.04.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 04/26/2024] Open
Abstract
RNA-binding proteins (RBPs) are central to key functions such as post-transcriptional regulation, mRNA stability, and adaptation to varied environmental conditions in prokaryotes. While the majority of research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods have emerged in recent years to identify RBPs, they have fallen short in accurately identifying prokaryotic RBPs due to their generic nature. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model meticulously designed for the accurate prediction of prokaryotic RBPs. The prediction process involves the utilization of eight shallow learning algorithms and four deep learning models, incorporating PSSM-based evolutionary features. By leveraging a convolutional neural network (CNN) and evolutionarily significant features selected through extreme gradient boosting variable importance measure, RBProkCNN achieved the highest accuracy in five-fold cross-validation, yielding 98.04% auROC and 98.19% auPRC. Furthermore, RBProkCNN demonstrated robust performance with an independent dataset, showcasing a commendable 95.77% auROC and 95.78% auPRC. Noteworthy is its superior predictive accuracy when compared to several state-of-the-art existing models. RBProkCNN is available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/), offering free access to interested users. This tool represents a substantial contribution, enriching the array of resources available for the accurate and efficient prediction of prokaryotic RBPs.
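The two metrics reported above can be illustrated with a minimal sketch that computes auROC and auPRC from binary labels and classifier scores. This is not the RBProkCNN code: the function names are illustrative, auPRC is approximated here by average precision (one of several common estimators, and not necessarily the one the authors used), tied scores are not given special treatment in the precision-recall calculation, and the labels and scores are invented toy values.

```python
def au_roc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / hits


# Toy data (hypothetical): 1 = RNA-binding, 0 = non-binding.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(au_roc(toy_labels, toy_scores))             # 0.75
print(average_precision(toy_labels, toy_scores))  # (1/1 + 2/3) / 2
```

auROC is insensitive to class balance, whereas auPRC depends on the proportion of positives, which is one reason both are commonly reported together.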
|
2
|
ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins. Protein Sci 2024; 33:e5015. [PMID: 38747369 PMCID: PMC11094783 DOI: 10.1002/pro.5015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/18/2024] [Accepted: 04/21/2024] [Indexed: 05/19/2024]
Abstract
Prokaryotic DNA binding proteins (DBPs) play pivotal roles in governing gene regulation, DNA replication, and various cellular functions. Accurate computational models for predicting prokaryotic DBPs hold immense promise in accelerating the discovery of novel proteins, fostering a deeper understanding of prokaryotic biology, and facilitating the development of targeted therapeutics for potential disease interventions. However, existing generic prediction models often exhibit lower accuracy in predicting prokaryotic DBPs. To address this gap, we introduce ProkDBP, a novel machine learning-driven computational model for prediction of prokaryotic DBPs. For prediction, a total of nine shallow learning algorithms and five deep learning models were utilized, with the shallow learning models demonstrating higher performance metrics compared to their deep learning counterparts. The light gradient boosting machine (LGBM), coupled with evolutionarily significant features selected via random forest variable importance measure (RF-VIM), yielded the highest five-fold cross-validation accuracy. The model achieved the highest auROC (0.9534) and auPRC (0.9575) among the 14 machine learning models evaluated. Additionally, ProkDBP demonstrated substantial performance with an independent dataset, with an auROC of 0.9332 and an auPRC of 0.9371. Notably, when benchmarked against several cutting-edge existing models, ProkDBP showcased superior predictive accuracy. Furthermore, to promote accessibility and usability, ProkDBP (https://iasri-sg.icar.gov.in/prokdbp/) is available as an online prediction tool, enabling free access to interested users. This tool stands as a significant contribution, enhancing the repertoire of resources for accurate and efficient prediction of prokaryotic DBPs.
|
3
|
ASPTF: A computational tool to predict abiotic stress-responsive transcription factors in plants by employing machine learning algorithms. Biochim Biophys Acta Gen Subj 2024; 1868:130597. [PMID: 38490467 DOI: 10.1016/j.bbagen.2024.130597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/10/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND Abiotic stresses pose a serious threat to the growth and yield of crop plants. Several studies suggest that in plants, transcription factors (TFs) are important regulators of gene expression, especially when it comes to coping with abiotic stresses. Therefore, it is crucial to identify TFs associated with abiotic stress response for breeding of abiotic stress tolerant crop cultivars. METHODS Based on a machine learning framework, a computational model was envisaged to predict TFs associated with abiotic stress response in plants. To numerically encode TF sequences, four distinct sequence-derived features were generated. The prediction was performed using ten shallow learning and four deep learning algorithms. For prediction using more pertinent and informative features, feature selection techniques were also employed. RESULTS Using the features chosen by the light-gradient boosting machine-variable importance measure (LGBM-VIM), the LGBM achieved the highest cross-validation performance metrics (accuracy: 86.81%, auROC: 92.98%, and auPRC: 94.03%). Further evaluation of the proposed model (LGBM prediction method + LGBM-VIM selected features) was also done using an independent test dataset, where the accuracy, auROC and auPRC were 81.98%, 90.65% and 91.30%, respectively. CONCLUSIONS To facilitate the adoption of the proposed strategy by users, the approach was implemented as a prediction server called ASPTF, accessible at https://iasri-sg.icar.gov.in/asptf/. The developed approach and the corresponding web application are anticipated to supplement experimental methods in the identification of transcription factors (TFs) responsive to abiotic stress in plants.
|
4
|
ASRpro: A machine-learning computational model for identifying proteins associated with multiple abiotic stress in plants. THE PLANT GENOME 2024; 17:e20259. [PMID: 36098562 DOI: 10.1002/tpg2.20259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 08/10/2022] [Indexed: 06/15/2023]
Abstract
One of the thrust areas of research in plant breeding is to develop crop cultivars with enhanced tolerance to abiotic stresses. Thus, identifying abiotic stress-responsive genes (SRGs) and proteins is important for plant breeding research. However, identifying such genes via established genetic approaches is laborious and resource intensive. Although transcriptome profiling has remained a reliable method of SRG identification, it is species specific. Additionally, identifying multistress responsive genes using gene expression studies is cumbersome. This underscores the need to develop a computational method for identifying the genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. The predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), where the autocross covariance (ACC) and K-mer compositional features were used as input. With ACC, K-mer, and ACC + K-mer compositional features, overall accuracies of ∼60-77, ∼75-86, and ∼61-78%, respectively, were obtained using the SVM algorithm with fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was also assessed with an independent dataset and obtained an accuracy consistent with cross-validation. The proposed model is the first of its kind and is expected to serve the requirement of experimental biologists; however, the prediction accuracy was modest. Given its importance for the research community, the online prediction application, ASRpro, is made freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.
|
5
|
Evaluation of eight Bayesian genomic prediction models for three micronutrient traits in bread wheat (Triticum aestivum L.). THE PLANT GENOME 2023; 16:e20332. [PMID: 37122189 DOI: 10.1002/tpg2.20332] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/21/2022] [Revised: 02/21/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
In wheat, genomic prediction accuracy (GPA) was assessed for three micronutrient traits (grain iron, grain zinc, and β-carotenoid concentrations) using eight Bayesian regression models. For this purpose, data on 246 accessions, each genotyped with 17,937 DArT markers, were utilized. The phenotypic data on traits were available for 2013-2014 from Powerkheda (Madhya Pradesh) and for 2014-2015 from Meerut (Uttar Pradesh), India. The accuracy of the models was measured in terms of reliability, which was computed following a repeated cross-validation approach. The predictions were obtained independently for each of the two environments after adjusting for the local effects and across environments after adjusting for the environmental effects. The Bayes ridge regression (BayesRR) model outperformed the other seven models, whereas BayesLASSO (BayesL) was the least efficient. The GPA increased with an increase in the size of the training set as well as with an increase in marker density. The GPA values differed for the three traits and were higher for the best linear unbiased estimate (BLUE) (obtained after adjusting for the environmental effects) relative to those for the two environments. The GPA also remained unaffected after accounting for the population structure. The results of the present study suggest that only the best model should be used for the estimations of genomic estimated breeding values (GEBVs) before their use for genomic selection to improve the grain micronutrient contents.
|
6
|
ASLncR: a novel computational tool for prediction of abiotic stress-responsive long non-coding RNAs in plants. Funct Integr Genomics 2023; 23:113. [PMID: 37000299 DOI: 10.1007/s10142-023-01040-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 03/23/2023] [Accepted: 03/24/2023] [Indexed: 04/01/2023]
Abstract
Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that a large number of long non-coding RNAs (lncRNAs) are key to many abiotic stress responses. Thus, identifying abiotic stress-responsive lncRNAs is essential in crop breeding programs in order to develop crop cultivars resistant to abiotic stresses. In this study, we have developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. The lncRNA sequences which were responsive and non-responsive to abiotic stresses served as the two classes of the dataset for binary classification using the machine learning algorithms. The training dataset was created using 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consists of 101 sequences from both classes. As the machine learning model can adopt only the numeric data, the Kmer features ranging from sizes 1 to 6 were utilized to represent lncRNAs in numeric form. To select important features, four different feature selection strategies were utilized. Among the seven learning algorithms, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets. The observed 5-fold cross-validation accuracy, AU-ROC, and AU-PRC were found to be 68.84, 72.78, and 75.86%, respectively. Furthermore, the robustness of the developed model (SVM with the selected feature) was evaluated using an independent test dataset, where the overall accuracy, AU-ROC, and AU-PRC were found to be 76.23, 87.71, and 88.49%, respectively. The developed computational approach was also implemented in an online prediction tool ASLncR accessible at https://iasri-sg.icar.gov.in/aslncr/ . The proposed computational model and the developed prediction tool are believed to supplement the existing effort for the identification of abiotic stress-responsive lncRNAs in plants.
|
7
|
ASmiR: a machine learning framework for prediction of abiotic stress-specific miRNAs in plants. Funct Integr Genomics 2023; 23:92. [PMID: 36939943 DOI: 10.1007/s10142-023-01014-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 01/18/2023] [Accepted: 03/06/2023] [Indexed: 03/21/2023]
Abstract
Abiotic stresses have become a major challenge in recent years due to their pervasive nature and severe impacts on plant growth, development, and quality. MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of specific abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational model for prediction of miRNAs associated with four specific abiotic stresses, namely cold, drought, heat and salt. The pseudo K-tuple nucleotide compositional features of Kmer size 1 to 5 were used to represent miRNAs in numeric form. A feature selection strategy was employed to select important features. With the selected feature sets, support vector machine (SVM) achieved the highest cross-validation accuracy in all four abiotic stress conditions. The highest cross-validated prediction accuracies in terms of area under precision-recall curve were found to be 90.15, 90.09, 87.71, and 89.25% for cold, drought, heat and salt, respectively. Overall prediction accuracies for the independent dataset were 84.57, 80.62, 80.38 and 82.78% for the respective stresses. The SVM was also seen to outperform different deep learning models for prediction of abiotic stress-responsive miRNAs. To implement our method with ease, an online prediction server "ASmiR" has been established at https://iasri-sg.icar.gov.in/asmir/. The proposed computational model and the developed prediction tool are believed to supplement the existing effort for identification of specific abiotic stress-responsive miRNAs in plants.
|
8
|
PlDBPred: a novel computational model for discovery of DNA binding proteins in plants. Brief Bioinform 2023; 24:6840070. [PMID: 36416116 DOI: 10.1093/bib/bbac483] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/26/2022] [Revised: 10/10/2022] [Accepted: 10/11/2022] [Indexed: 11/24/2022] Open
Abstract
DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes including nucleotide recognition, transcriptional control and the regulation of gene expression. The majority of the existing computational techniques for identifying DBPs are mainly applicable to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for plant-specific DBP identification. Five shallow learning and six deep learning models were initially used for prediction, where shallow learning methods outperformed deep learning algorithms. In particular, support vector machine achieved the highest repeated 5-fold cross-validation accuracy of 94.0% area under receiver operating characteristic curve (AUC-ROC) and 93.5% area under precision recall curve (AUC-PR). With an independent dataset, the developed approach secured 93.8% AUC-ROC and 94.6% AUC-PR. When compared with the state-of-the-art existing tools by using an independent dataset, the proposed model achieved much higher accuracy. Overall results suggest that the developed computational model is more efficient and reliable as compared to the existing models for the prediction of DBPs in plants. For the convenience of the majority of experimental scientists, the developed prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction using a large-size dataset.
|
9
|
A comparative analysis of amino acid encoding schemes for the prediction of flexible length linear B-cell epitopes. Brief Bioinform 2022; 23:6673853. [PMID: 35998895 DOI: 10.1093/bib/bbac356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 07/06/2022] [Accepted: 07/30/2022] [Indexed: 11/12/2022] Open
Abstract
Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most of the B-cell epitope prediction methods considered fixed length of epitope sequences and achieved good accuracy. Though a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in the prediction performance is still expected. Thus, here we made an attempt to analyze the performance of machine learning approaches (MLA) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11-56 amino acids) from well-established public resources. The performances of machine learning algorithms with the encoded epitope sequence datasets were evaluated. Besides, the feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition-transition-distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC + AC and APC + APP with random forest classifier were identified to have improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest over other considered MLAs in the prediction of flexible length linear B-cell epitopes.
|
10
|
Performance of Bayesian and BLUP alphabets for genomic prediction: analysis, comparison and results. Heredity (Edinb) 2022; 128:519-530. [PMID: 35508540 DOI: 10.1038/s41437-022-00539-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/31/2021] [Revised: 04/19/2022] [Accepted: 04/19/2022] [Indexed: 11/09/2022] Open
Abstract
We evaluated the performances of three BLUP and five Bayesian methods for genomic prediction by using nine actual and 54 simulated datasets. The genomic prediction accuracy was measured using Pearson's correlation coefficient between the genomic estimated breeding value (GEBV) and the observed phenotypic data using a fivefold cross-validation approach with 100 replications. The Bayesian alphabets performed better for the traits governed by a few genes/QTLs with relatively larger effects. On the contrary, the BLUP alphabets (GBLUP and CBLUP) exhibited higher genomic prediction accuracy for the traits controlled by several small-effect QTLs. Additionally, Bayesian methods performed better for the highly heritable traits and, for other traits, performed at par with the BLUP methods. Further, genomic BLUP (GBLUP) was identified as the least biased method for the GEBV estimation. Among the Bayesian methods, the Bayesian ridge regression and Bayesian LASSO were less biased than other Bayesian alphabets. Nonetheless, genomic prediction accuracy increased with an increase in trait heritability, irrespective of the sample size, marker density, and the QTL type (major/minor effect). In sum, this study provides valuable information regarding the choice of the selection method for genomic prediction in different breeding programs.
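The accuracy measure described above (Pearson's correlation between predicted and observed values under k-fold cross-validation) can be sketched as follows. This is an illustration only, not the study's pipeline: `marker_sum_regression` is a hypothetical toy stand-in for a BLUP or Bayesian model, the marker matrix and phenotypes are simulated, and all function names are assumptions introduced for the example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def marker_sum_regression(train_x, train_y, test_x):
    """Toy placeholder model: least-squares line on each line's allele total."""
    s = [sum(row) for row in train_x]
    ms, my = sum(s) / len(s), sum(train_y) / len(train_y)
    sxx = sum((a - ms) ** 2 for a in s)
    slope = sum((a - ms) * (b - my) for a, b in zip(s, train_y)) / sxx
    return [my + slope * (sum(row) - ms) for row in test_x]


def cv_accuracy(markers, phenotypes, fit_predict, k=5, seed=0):
    """Mean Pearson r between predicted and observed values over k folds."""
    rs = []
    for fold in kfold_indices(len(phenotypes), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(phenotypes)) if i not in held_out]
        preds = fit_predict(
            [markers[i] for i in train],
            [phenotypes[i] for i in train],
            [markers[i] for i in fold],
        )
        rs.append(pearson(preds, [phenotypes[i] for i in fold]))
    return sum(rs) / len(rs)


# Simulated data (hypothetical): 50 lines, 20 markers coded 0/1/2.
rng = random.Random(1)
markers = [[rng.choice((0, 1, 2)) for _ in range(20)] for _ in range(50)]
phenotypes = [sum(row) + rng.gauss(0, 1) for row in markers]
print(round(cv_accuracy(markers, phenotypes, marker_sum_regression), 3))
```

Repeating the call with different `seed` values and averaging the results would correspond to the replicated cross-validation mentioned in the abstract.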
|
11
|
GWAS for main effects and epistatic interactions for grain morphology traits in wheat. PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS : AN INTERNATIONAL JOURNAL OF FUNCTIONAL PLANT BIOLOGY 2022; 28:651-668. [PMID: 35465203 PMCID: PMC8986918 DOI: 10.1007/s12298-022-01164-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 03/05/2022] [Accepted: 03/07/2022] [Indexed: 06/05/2023]
Abstract
In the present study in wheat, GWAS was conducted for identification of marker trait associations (MTAs) for the following six grain morphology traits: (1) grain cross-sectional area (GCSA), (2) grain perimeter (GP), (3) grain length (GL), (4) grain width (GWid), (5) grain length-width ratio (GLWR) and (6) grain form-density (GFD). The data were recorded on a subset of spring wheat reference set (SWRS) comprising 225 diverse genotypes, which were genotyped using 10,904 SNPs and phenotyped for two consecutive years (2017-2018, 2018-2019). GWAS was conducted using five different models including two single-locus models (CMLM, SUPER), one multi-locus model (FarmCPU), one multi-trait model (mvLMM) and a model for Q x Q epistatic interactions. False discovery rate (FDR) [P value -log10(p) ≥ 5] and Bonferroni correction [P value -log10(p) ≥ 6] (corrected p value < 0.05) were applied to eliminate false positives due to multiple testing. This exercise gave 88 main effect and 29 epistatic MTAs after FDR and 13 main effect and 6 epistatic MTAs after Bonferroni corrections. MTAs obtained after Bonferroni corrections were further utilized for identification of 55 candidate genes (CGs). In silico expression analysis of CGs in different tissues at different parts of the seed at different developmental stages was also carried out. MTAs and CGs identified during the present study are useful addition to available resources for MAS to supplement wheat breeding programmes after due validation and also for future strategic basic research. Supplementary Information The online version contains supplementary material available at 10.1007/s12298-022-01164-w.
|
12
|
GIpred: a computational tool for prediction of GIGANTEA proteins using machine learning algorithm. PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS : AN INTERNATIONAL JOURNAL OF FUNCTIONAL PLANT BIOLOGY 2022; 28:1-16. [PMID: 35221569 PMCID: PMC8847649 DOI: 10.1007/s12298-022-01130-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 12/31/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]
Abstract
In plants, GIGANTEA (GI) protein performs diverse biological functions including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, resource-intensive experimental methods have mostly been utilized for identification of GI proteins. Thus, we made an attempt in this study to develop a computational model for fast and accurate prediction of GI proteins. Ten different supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, where the amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties were used as numerical inputs for the learning algorithms. Higher accuracies, i.e., 96.75% of AUC-ROC and 86.7% of AUC-PR, were observed for SVM coupled with the AAC + PHYC feature combination when evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% of AUC-ROC and 87.89% of AUC-PR were respectively achieved. When the model was evaluated with an independent dataset of 18 GI sequences, 17 were correctly predicted. We have also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server "GIpred" is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s12298-022-01130-6.
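Of the three feature types named above, amino acid composition (AAC) is the simplest to illustrate: each protein becomes a 20-dimensional vector of residue fractions. The sketch below is a minimal, assumed implementation (the residue ordering, the handling of non-standard letters, and the toy sequence are choices made for this example, not details taken from GIpred).

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order (an assumed
# ordering; any fixed order works as long as it is used consistently).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(protein):
    """Fraction of each standard residue, returned in AMINO_ACIDS order.

    Letters outside the 20 standard residues (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in protein.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


# Toy sequence (hypothetical, not a real GI protein).
aac = amino_acid_composition("MKKA")
print(dict(zip(AMINO_ACIDS, aac))["K"])  # 0.5
```

Because the vector length is fixed at 20 regardless of protein length, AAC vectors can be fed directly to classifiers such as an SVM or concatenated with other fixed-length descriptors.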
|
13
|
Improved recognition of splice sites in A. thaliana by incorporating secondary structure information into sequence-derived features: a computational study. 3 Biotech 2021; 11:484. [PMID: 34790508 DOI: 10.1007/s13205-021-03036-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Accepted: 10/18/2021] [Indexed: 10/19/2022] Open
Abstract
Identification of splice sites is an important aspect with regard to the prediction of gene structure. In most of the existing splice site prediction studies, machine learning algorithms coupled with sequence-derived features have been successfully employed for splice site recognition. However, the splice site identification by incorporating the secondary structure information is lacking, particularly in plant species. Thus, we made an attempt in this study to evaluate the performance of structural features on the splice site prediction accuracy in Arabidopsis thaliana. Prediction accuracies were evaluated with the sequence-derived features alone as well as by incorporating the structural features into the sequence-derived features, where support vector machine (SVM) was employed as prediction algorithm. Both short (40 base pairs) and long (105 base pairs) sequence datasets were considered for evaluation. After incorporating the secondary structure features, improvements in accuracies were observed only for the longer sequence dataset and the improvement was found to be higher with the sequence-derived features that accounted nucleotide dependencies. On the other hand, either a little or no improvement in accuracies was found for the short sequence dataset. The performance of SVM was further compared with that of LogitBoost, Random Forest (RF), AdaBoost and XGBoost machine learning methods. The prediction accuracies of SVM, AdaBoost and XGBoost were observed to be at par and higher than that of RF and LogitBoost algorithms. While prediction was performed by taking all the sequence-derived features along with the structural features, a little improvement in accuracies was found as compared to the combination of individual sequence-based features and structural features. 
To the best of our knowledge, this is the first attempt concerning the computational prediction of splice sites using machine learning methods by incorporating the secondary structure information into the sequence-derived features. All the source codes are available at https://github.com/meher861982/SSFeature. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s13205-021-03036-8.
|
14
|
Single-trait, multi-locus and multi-trait GWAS using four different models for yield traits in bread wheat. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2021; 41:46. [PMID: 37309385 PMCID: PMC10236106 DOI: 10.1007/s11032-021-01240-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/08/2021] [Accepted: 06/30/2021] [Indexed: 06/14/2023]
Abstract
A genome-wide association study (GWAS) for 10 yield and yield component traits was conducted using an association panel comprising 225 diverse spring wheat genotypes. The panel was genotyped using 10,904 SNPs and evaluated for three years (2016-2019), which constituted three environments (E1, E2 and E3). Heritability for different traits ranged from 29.21 to 97.69%. Marker-trait associations (MTAs) were identified for each trait using data from each environment separately and also using BLUP values. Four different models were used, which included three single trait models (CMLM, FarmCPU, SUPER) and one multi-trait model (mvLMM). Hundreds of MTAs were obtained using each model, but after Bonferroni correction, only 6 MTAs for 3 traits were available using CMLM, and 21 MTAs for 4 traits were available using FarmCPU; none of the 525 MTAs obtained using SUPER could qualify after Bonferroni correction. Using BLUP, 20 MTAs were available, five of which also figured among MTAs identified for individual environments. Using mvLMM model, after Bonferroni correction, 38 multi-trait MTAs, for 15 different trait combinations were available. Epistatic interactions involving 28 pairs of MTAs were also available for seven of the 10 traits; no epistatic interactions were available for GNPS, PH, and BYPP. As many as 164 putative candidate genes (CGs) were identified using all the 50 MTAs (CMLM, 3; FarmCPU, 9; mvLMM, 6, epistasis, 21 and BLUP, 11 MTAs), which ranged from 20 (CMLM) to 66 (epistasis) CGs. In-silico expression analysis of CGs was also conducted in different tissues at different developmental stages. The information generated through the present study proved useful for developing a better understanding of the genetics of each of the 10 traits; the study also provided novel markers for marker-assisted selection (MAS) to be utilized for the development of wheat cultivars with improved agronomic traits. 
Supplementary Information The online version contains supplementary material available at 10.1007/s11032-021-01240-1.
|
15
|
mLoc-mRNA: predicting multiple sub-cellular localization of mRNAs using random forest algorithm coupled with feature selection via elastic net. BMC Bioinformatics 2021; 22:342. [PMID: 34167457 PMCID: PMC8223360 DOI: 10.1186/s12859-021-04264-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/18/2020] [Accepted: 06/11/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Localization of messenger RNAs (mRNAs) plays a crucial role in the growth and development of cells, particularly in regulating spatio-temporal gene expression. In situ hybridization is a promising experimental technique for determining the localization of mRNAs, but it is costly and laborious. A single mRNA can be present in more than one location, whereas the existing computational tools can predict only a single location for such mRNAs. A computational tool is therefore required for reliable and timely prediction of multiple subcellular locations of mRNAs, and we developed the present computational model for that purpose. RESULTS The mRNA sequences from 9 different localizations were considered. Each sequence was first transformed into a numeric feature vector of size 5460, based on k-mer features of sizes 1-6. Of the 5460 k-mer features, 1812 important features were selected by the Elastic Net statistical model. The Random Forest supervised learning algorithm was then employed to predict the localizations using the selected features. Five-fold cross-validation accuracies of 70.87, 68.32, 68.36, 68.79, 96.46, 73.44, 70.94, 97.42 and 71.77% were obtained for the cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior and ribosome, respectively. With an independent test set, accuracies of 65.33, 73.37, 75.86, 72.99, 94.26, 70.91, 65.53, 93.60 and 73.45% were obtained for the respective localizations. The developed approach also achieved higher accuracies than the existing localization prediction tools. CONCLUSIONS This study presents a novel computational tool for predicting the multiple localizations of mRNAs. Based on the proposed approach, an online prediction server "mLoc-mRNA" is accessible at http://cabgrid.res.in:8080/mlocmrna/ . The developed approach is believed to supplement the existing tools and techniques for the localization prediction of mRNAs.
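The feature-vector size of 5460 quoted above follows from counting all k-mers of sizes 1 to 6 over a four-letter alphabet (4 + 16 + 64 + 256 + 1024 + 4096). The sketch below illustrates one way such a vector could be built. The function names are hypothetical, and normalizing counts to relative frequencies per k is an assumption; the abstract does not say how the k-mer features were scaled.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(max_k: int = 6) -> list:
    """All k-mers of sizes 1..max_k, in a fixed order."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product(ALPHABET, repeat=k)]


def kmer_features(seq: str, max_k: int = 6) -> list:
    """Relative k-mer frequencies (assumed normalization) for k = 1..max_k."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as its DNA equivalent
    vocab = kmer_vocabulary(max_k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    features = [0.0] * len(vocab)
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        if n_windows <= 0:
            continue  # sequence shorter than k: leave those features at zero
        for i in range(n_windows):
            j = index.get(seq[i:i + k])
            if j is not None:  # skip windows containing ambiguous bases
                features[j] += 1.0 / n_windows
    return features


vocab_size = len(kmer_vocabulary())
example = kmer_features("ACGUACGUAC")
print(vocab_size, len(example))
```

A vector like this would then be passed to a feature-selection step and a classifier; those stages are not sketched here.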
|
16
|
PredCRG: A computational method for recognition of plant circadian genes by employing support vector machine with Laplace kernel. PLANT METHODS 2021; 17:46. [PMID: 33902670 PMCID: PMC8074503 DOI: 10.1186/s13007-021-00744-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/29/2020] [Accepted: 04/07/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND Circadian rhythms regulate several physiological and developmental processes of plants. Hence, the identification of genes with the underlying circadian rhythmic features is pivotal. Though computational methods have been developed for the identification of circadian genes, all these methods are based on gene expression datasets. We could not find any sequence-based model, which motivated us to develop the present computational method to identify the proteins encoded by the circadian genes. RESULTS Support vector machine (SVM) with seven kernels, i.e., linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace, was utilized for prediction by employing compositional, transitional and physico-chemical features. The highest accuracy, 62.48%, was achieved with the Laplace kernel under fivefold cross-validation. The developed model further secured 62.96% accuracy with an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, i.e., Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, namely Oryza sativa and Sorghum bicolor, followed by the functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms. CONCLUSIONS To the best of our knowledge, this is the first computational method to identify circadian genes from sequence data. Based on the proposed method, we have developed an R-package PredCRG ( https://cran.r-project.org/web/packages/PredCRG/index.html ) for the scientific community for proteome-wide identification of circadian genes. The present study supplements the existing computational methods as well as wet-lab experiments for the recognition of circadian genes.
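For readers unfamiliar with the Laplace kernel named above, the sketch below shows one common parameterization, k(x, y) = exp(-sigma * ||x - y||) with the Euclidean norm. The abstract does not give the kernel's exact form or its sigma value, so both are assumptions here, and the feature vectors are toy values rather than real protein descriptors.

```python
import math


def laplace_kernel(x, y, sigma=1.0):
    """Laplace kernel exp(-sigma * ||x - y||), Euclidean norm (assumed form)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)


def gram_matrix(rows, sigma=1.0):
    """Pairwise kernel values for a list of feature vectors."""
    return [[laplace_kernel(r, s, sigma) for s in rows] for r in rows]


# Toy feature vectors, for illustration only.
toy = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
K = gram_matrix(toy, sigma=0.1)
for row in K:
    print(["%.4f" % v for v in row])
```

A Gram matrix of this kind can be supplied to SVM implementations that accept a precomputed kernel; unlike the radial (Gaussian) kernel, the distance in the exponent is not squared.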
|
17
|
ToxA-Tsn1 Interaction for Spot Blotch Susceptibility in Indian Wheat: An Example of Inverse Gene-for-Gene Relationship. PLANT DISEASE 2020; 104:71-81. [PMID: 31697221 DOI: 10.1094/pdis-05-19-1066-re] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 05/26/2023]
Abstract
The ToxA-Tsn1 system is an example of an inverse gene-for-gene relationship. The gene ToxA encodes a host-selective toxin (HST) which functions as a necrotrophic effector and is often responsible for the virulence of the pathogen. The genomes of several fungal pathogens (e.g., Pyrenophora tritici-repentis, Parastagonospora nodorum, and Bipolaris sorokiniana) have been shown to carry the ToxA gene. Tsn1 is a sensitivity gene in the host, whose presence generally helps a ToxA-positive pathogen to cause spot blotch in wheat. Cultivars lacking Tsn1 are generally resistant to spot blotch; this resistance is attributed to a number of other known genes which impart resistance in the absence of Tsn1. In the present study, 110 isolates of B. sorokiniana strains, collected from the ME5A and ME4C megaenvironments of India, were screened for the presence of the ToxA gene; 77 (70%) were found to be ToxA positive. Similarly, 220 Indian wheat cultivars were screened for the presence of the Tsn1 gene; 81 (36.8%) were found to be Tsn1 positive. When 20 wheat cultivars (11 with Tsn1 and 9 with tsn1) were inoculated with ToxA-positive isolates, seedlings of only those carrying the Tsn1 allele (not tsn1) developed necrotic spots surrounded by a chlorotic halo. No such distinction between Tsn1 and tsn1 carriers was observed when adult plants were inoculated. This study suggests that the absence of Tsn1 facilitated resistance against spot blotch of wheat. Therefore, the selection of wheat genotypes for the absence of the Tsn1 allele can improve resistance to spot blotch.
|
18
|
Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition. Gene 2019; 705:113-126. [PMID: 31009682 DOI: 10.1016/j.gene.2019.04.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Revised: 03/27/2019] [Accepted: 04/17/2019] [Indexed: 02/02/2023]
Abstract
Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identification of splice sites. However, nucleotide sequences must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 different sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of both 1st and 2nd order Markov model (MM1 + MM2) and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and the performances were then validated with another four species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver operating characteristic) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM achieved the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination performed better than the other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were higher for the species with low intron density. To the best of our knowledge, this is the first comprehensive evaluation of sequence encoding schemes for prediction of splice sites. We have also developed an R-package EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html) for encoding of splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is expected to be useful to computational biologists for predicting different functional elements on genomic DNA.
|
19
|
funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genet 2019; 20:2. [PMID: 30616524 PMCID: PMC6323839 DOI: 10.1186/s12863-018-0710-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/13/2018] [Accepted: 12/26/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, DNA barcoding can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Though computational prediction of fungal species has become feasible with the availability of a huge volume of barcode sequences in the public domain, prediction remains challenging due to the high degree of variability among ITS regions within species. RESULTS A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was obtained when 4 sequences per species in the reference set were utilized, and accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used for training the model. The proposed model achieved comparable accuracy when evaluated against existing methods through cross-validation. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. Besides, an R-package funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ) is also available for prediction using high throughput sequence data. This work is expected to support future efforts in fungal taxonomy assignment based on DNA barcodes.
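The abstract above does not define "gapped base pair compositions" precisely. A plausible reading, sketched below as an assumption rather than the authors' exact encoding, is the relative frequency of each ordered nucleotide pair whose two bases are separated by a fixed number of intervening positions. The function names and the choice of gaps 0 to 2 are hypothetical.

```python
from itertools import product

# The 16 ordered nucleotide pairs, in a fixed order.
PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def gapped_pair_composition(seq: str, gap: int) -> list:
    """Frequencies of pairs (seq[i], seq[i + gap + 1]) over all valid i."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # ignore pairs containing ambiguous bases
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def encode(seq: str, max_gap: int = 2) -> list:
    """Concatenate compositions for gaps 0..max_gap (16 features per gap)."""
    features = []
    for gap in range(max_gap + 1):
        features.extend(gapped_pair_composition(seq, gap))
    return features


gap0 = gapped_pair_composition("ACGTAC", 0)
gap1 = gapped_pair_composition("ACGTAC", 1)
vector = encode("ACGTAC")
print(len(vector))
```

Because every sequence maps to a vector of the same length regardless of its own length, reference and query barcodes encoded this way can be fed directly to a classifier such as Random Forest.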
|
20
|
Genome wide association mapping of agro-morphological traits among a diverse collection of finger millet (Eleusine coracana L.) genotypes using SNP markers. PLoS One 2018; 13:e0199444. [PMID: 30092057 PMCID: PMC6084814 DOI: 10.1371/journal.pone.0199444] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/28/2017] [Accepted: 06/07/2018] [Indexed: 11/19/2022] Open
Abstract
Finger millet (Eleusine coracana L.) is an important dry-land cereal in Asia and Africa because of its ability to provide an assured harvest under extreme dry conditions and its excellent nutritional properties. However, genetic improvement of the crop has lagged in the absence of suitable genomic resources for reliable genotype-phenotype associations. Keeping this in view, a diverse global finger millet germplasm collection of 113 accessions was evaluated for 14 agro-morphological characters in two environments, viz. ICAR-Vivekananda Institute of Hill Agriculture, Almora (E1) and Crop Research Centre (CRC), GBPUA&T, Pantnagar (E2), India. Principal component analysis and cluster analysis of phenotypic data separated the Indian and exotic accessions into two groups. SNPs previously generated through genotyping by sequencing (GBS) were used for association mapping to identify reliable marker(s) linked to grain yield and its component traits. The marker-trait associations were determined using single locus single trait (SLST), multi-locus mixed model (MLMM) and multi-trait mixed model (MTMM) approaches. SLST led to the identification of 20 marker-trait associations (MTAs) (p value<0.01 and <0.001) for 5 traits, while the advanced models, MLMM and MTMM, resulted in an additional 36 and 53 MTAs, respectively. Of the total 109 associations, nine MTAs were common to all three mapping approaches (SLST, MLMM and MTMM). Among these nine SNPs, five SNP sequences showed homology to candidate genes of Oryza sativa (Rice) and Setaria italica (Foxtail millet), which play an important role in flowering, maturity and grain yield. In addition, 67 and 14 epistatic interactions were identified for 10 and 7 traits at the E1 and E2 locations, respectively. Hence, the 109 novel SNPs associated with important agro-morphological traits, reported for the first time in this study, could be utilized in finger millet genetic improvement after validation.
|
21
|
Population structure and genetic diversity of hatchery stocks as revealed by combined mtDNA fragment sequences in Indian major carp, Catla catla. Mitochondrial DNA A DNA Mapp Seq Anal 2018; 30:289-295. [PMID: 29989460 DOI: 10.1080/24701394.2018.1484120] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
Catla catla is the second most important Indian major carp due to its high growth rate and consumer acceptance for its food value. It is widely cultured in the Indian subcontinent in monoculture or polyculture. In the present study, genetic diversity among hatchery stocks (218 catla samples in total) collected from different geographical regions of India was examined using mtDNA fragment sequences of Cyt b (306 bp) and D loop (710 bp). A high number (57) of population-specific haplotypes was observed. The results revealed significant genetic heterogeneity for the sequence data (FST = 0.27546, p < .05). Analysis of molecular variance revealed significant genetic differentiation among different catla populations. The information generated in the present study could be useful for developing catla populations with a broad genetic base.
|
22
|
A study on nitrogen fixation related proteins. CANADIAN JOURNAL OF BIOTECHNOLOGY 2017. [DOI: 10.24870/cjb.2017-a40] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
23
|
DIRProt: a computational approach for discriminating insecticide resistant proteins from non-resistant proteins. BMC Bioinformatics 2017; 18:190. [PMID: 28340571 PMCID: PMC5364559 DOI: 10.1186/s12859-017-1587-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/17/2016] [Accepted: 03/09/2017] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Insecticide resistance is a major challenge for insect pest control programs in fields such as crop protection and human and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of insect genes. To date, no computational tool is available to distinguish insecticide resistant proteins from non-resistant proteins. Such a tool would help in predicting insecticide resistant proteins, which can be targeted for developing appropriate insecticides. RESULTS Five different feature sets, viz., amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input to a support vector machine (SVM) for classification of insecticide resistant and non-resistant proteins. Higher accuracies were obtained with the RBF kernel than with the other kernels. Further, accuracies were higher for the DPC feature set than for the others. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins, i.e., detoxification-based and target-based, were discriminated from non-resistant proteins with >95% accuracy. Besides, >95% accuracy was also observed for discrimination between proteins involved in detoxification-based and target-based resistance mechanisms. The proposed approach not only outperformed the Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy when assessed using an independent dataset of 75 insecticide resistant proteins. CONCLUSIONS This paper presents the first computational approach for discriminating insecticide resistant proteins from non-resistant proteins. Based on the proposed approach, an online prediction server, DIRProt, has also been developed for computational prediction of insecticide resistant proteins, which is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is believed to supplement the efforts needed to develop dynamic insecticides in wet-lab by targeting the insecticide resistant proteins.
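Two of the feature sets named above, amino acid composition (AAC) and di-peptide composition (DPC), are simple enough to sketch. The code below uses the standard definitions (relative frequencies of the 20 standard residues and of the 400 ordered residue pairs); the function names and the example peptide are hypothetical, and the handling of non-standard residues is an assumption, since the abstract does not describe it.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list:
    """Amino acid composition: 20 relative residue frequencies."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(seq: str) -> list:
    """Di-peptide composition: 400 relative frequencies of adjacent pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues (assumed)
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


peptide = "MKKLLA"  # made-up sequence, for illustration only
aac_vec = aac(peptide)
dpc_vec = dpc(peptide)
print(len(aac_vec), len(dpc_vec))
```

Either vector (or their concatenation) gives a fixed-length numeric representation of a protein, which is the form a classifier such as an SVM requires.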
|
24
|
Rapid recovery of complete mitogenome of Indian major carp, Catla catla from low depth paired end Illumina sequencing. Mitochondrial DNA B Resour 2017; 2:155-156. [PMID: 33473750 PMCID: PMC7799541 DOI: 10.1080/23802359.2017.1298413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022] Open
Abstract
Here we report the reconstruction of the complete mitochondrial genome sequence of catla (Catla catla) from low depth paired end Illumina sequencing. The genome is 16,597 bp in size. Similar to other vertebrate mtgenomes, it consists of 13 protein-coding genes, 22 tRNAs, 2 rRNAs and a putative control region. The present mtgenome is 3 bp longer than the catla mtgenome reported earlier from our laboratory. The majority of the mitochondrial genes are encoded by the H-strand. Phylogenetic analysis revealed that Catla catla is closer to Labeo rohita than to other Labeo species. The present study demonstrates the power of next generation sequencing for rapid, hassle-free sequencing of mitochondrial genomes of non-model organisms.
|
25
|
Inferring Gene Regulatory Networks Using Kendall's Tau Correlation Coefficient and Identification of Salinity Stress Responsive Genes in Rice. CURR SCI INDIA 2017. [DOI: 10.18520/cs/v112/i06/1257-1262] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
|
26
|
Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep 2017; 7:42362. [PMID: 28205576 PMCID: PMC5304217 DOI: 10.1038/srep42362] [Citation(s) in RCA: 274] [Impact Index Per Article: 39.1] [Received: 10/24/2016] [Accepted: 01/09/2017] [Indexed: 11/13/2022] Open
Abstract
Antimicrobial peptides (AMPs) are important components of the innate immune system that have been found to be effective against disease-causing pathogens. Identification of AMPs through wet-lab experiments is expensive. Therefore, an efficient computational tool is essential for identifying the best candidate AMPs prior to in vitro experimentation. In this study, we developed a support vector machine (SVM) based computational approach for prediction of AMPs with improved accuracy. Initially, compositional, physico-chemical and structural features of the peptides were generated, and these were subsequently used as input to the SVM for prediction of AMPs. The proposed approach achieved higher accuracy than several existing approaches when compared using a benchmark dataset. Based on the proposed approach, an online prediction server iAMPpred has also been developed to help the scientific community in predicting AMPs, which is freely accessible at http://cabgrid.res.in:8080/amppred/. The proposed approach is believed to supplement the tools and techniques that have been developed in the past for prediction of AMPs.
|
27
|
Statistical Approaches for Gene Selection, Hub Gene Identification and Module Interaction in Gene Co-Expression Network Analysis: An Application to Aluminum Stress in Soybean (Glycine max L.). PLoS One 2017; 12:e0169605. [PMID: 28056073 PMCID: PMC5215982 DOI: 10.1371/journal.pone.0169605] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 09/19/2016] [Accepted: 12/19/2016] [Indexed: 11/30/2022] Open
Abstract
Selection of informative genes is an important problem in gene expression studies. The small sample size and the large number of genes in gene expression data make the selection process complex. Further, the selected informative genes may act as a vital input for gene co-expression network analysis. Moreover, the identification of hub genes and module interactions in gene co-expression networks is yet to be fully explored. This paper presents a statistically sound gene selection technique based on the support vector machine algorithm for selecting informative genes from high dimensional gene expression data. A statistical approach was also developed for identification of hub genes in the gene co-expression network. In addition, a differential hub gene analysis approach was developed to group the identified hub genes based on their gene connectivity in a case vs. control study. Based on this proposed approach, an R package, dhga (https://cran.r-project.org/web/packages/dhga), has been developed. The comparative performance of the proposed gene selection technique and the hub gene identification approach was evaluated on three different crop microarray datasets. The proposed gene selection technique outperformed most of the existing techniques in selecting a robust set of informative genes. The proposed hub gene identification approach identified fewer hub genes than the existing approach, which is in accordance with the scale-free property of real networks. In this study, some key genes along with their Arabidopsis orthologs have been reported, which can be used for engineering the Aluminum toxic stress response in soybean. The functional analysis of various selected key genes revealed the underlying molecular mechanisms of the Aluminum toxic stress response in soybean.
|
28
|
A computational approach for prediction of donor splice sites with improved accuracy. J Theor Biol 2016; 404:285-294. [PMID: 27302911 DOI: 10.1016/j.jtbi.2016.06.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/18/2015] [Revised: 04/18/2016] [Accepted: 06/09/2016] [Indexed: 11/24/2022]
Abstract
Identification of splice sites is important due to their key role in predicting the exon-intron structure of protein coding genes. Though several approaches have been developed for the prediction of splice sites, further improvement in the prediction accuracy will help predict gene structure more accurately. This paper presents a computational approach for prediction of donor splice sites with higher accuracy. In this approach, true and false splice sites were first encoded into numeric vectors and then used as input to an artificial neural network (ANN), support vector machine (SVM) and random forest (RF) for prediction. ANN and SVM performed equally well, and better than RF, when tested on the HS3D and NN269 datasets. Further, the performance of ANN, SVM and RF was analyzed using an independent test set of 50 genes, on which the prediction accuracy of ANN was higher than that of SVM and RF. All the predictors achieved higher accuracy than existing methods such as NNsplice, MEM, MDD, WMM, MM1, FSPLICE, GeneID and ASSP on the independent test set. We have also developed an online prediction server (PreDOSS), available at http://cabgrid.res.in:8080/predoss, for prediction of donor splice sites using the proposed approach.
|
29
|
Genome Wide Single Locus Single Trait, Multi-Locus and Multi-Trait Association Mapping for Some Important Agronomic Traits in Common Wheat (T. aestivum L.). PLoS One 2016; 11:e0159343. [PMID: 27441835 PMCID: PMC4956103 DOI: 10.1371/journal.pone.0159343] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 04/07/2016] [Accepted: 06/30/2016] [Indexed: 01/18/2023] Open
Abstract
A genome wide association study (GWAS) was conducted for 14 agronomic traits in wheat following the widely used single locus single trait (SLST) approach and two recent approaches, viz. multi locus mixed model (MLMM) and multi-trait mixed model (MTMM). The association panel consisted of 230 diverse Indian bread wheat cultivars (released during 1910–2006 for commercial cultivation in different agro-climatic regions in India). Three years of phenotypic data for 14 traits and genotyping data for 250 SSR markers (distributed across all the 21 wheat chromosomes) were utilized for GWAS. Using SLST, as many as 213 MTAs (p ≤ 0.05, 129 SSRs) were identified for 14 traits; however, only 10 MTAs (~9%; 10 out of 123 MTAs) qualified FDR criteria; these MTAs did not show any linkage drag. Interestingly, these genomic regions coincided with genomic regions already known to harbor QTLs for the same or related agronomic traits. Using MLMM and MTMM, many more QTLs and markers were identified: 22 MTAs (19 QTLs, 21 markers) using MLMM, and 58 MTAs (29 QTLs, 40 markers) using MTMM. In addition, 63 epistatic QTLs were identified for 13 of the 14 traits, flag leaf length (FLL) being the only exception. Clearly, the power of association mapping improved with the MLMM and MTMM analyses. The epistatic interactions detected also provided better insight into the genetic architecture of the 14 traits examined. The following eight wheat genotypes carried desirable alleles of QTLs for one or more traits: WH542, NI345, NI170, Sharbati Sonora, A90, HW1085, HYB11, and DWR39 (Pragati). These genotypes and the markers associated with important QTLs for major traits can be used in wheat improvement programs using either marker-assisted recurrent selection (MARS) or the pseudo-backcrossing method.
|
30
|
Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier. Gene 2016; 592:316-24. [PMID: 27393648 DOI: 10.1016/j.gene.2016.07.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/02/2016] [Revised: 07/02/2016] [Accepted: 07/04/2016] [Indexed: 11/17/2022]
Abstract
DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. In this study, we developed a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and Random Forest was then applied to the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches, and was comparable with existing supervised learning based approaches in terms of species identification success rate, when compared using real and simulated datasets. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists.
|
31
|
Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms Mol Biol 2016; 11:16. [PMID: 27252772 PMCID: PMC4888255 DOI: 10.1186/s13015-016-0078-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/28/2015] [Accepted: 05/17/2016] [Indexed: 11/16/2022] Open
Abstract
Background Identification of splice sites is essential for annotation of genes. Though existing approaches have achieved an acceptable level of accuracy, there is still a need for further improvement. Besides, most of the approaches are species-specific, and hence approaches that are compatible across species are required. Results Each splice site sequence was transformed into a numeric vector of length 49, of which four were positional, four were dependency and 41 were compositional features. Using the transformed vectors as input, prediction was made through a support vector machine. Using a balanced training set, the proposed approach achieved area under ROC curve (AUC-ROC) of 96.05, 96.96, 96.95, 96.24% and area under PR curve (AUC-PR) of 97.64, 97.89, 97.91, 97.90% when tested on human, cattle, fish and worm datasets, respectively. On the other hand, AUC-ROC of 97.21, 97.45, 97.41, 98.06% and AUC-PR of 93.24, 93.34, 93.38, 92.29% were obtained when imbalanced training datasets were used. The proposed approach was found to be comparable with state-of-the-art splice site prediction approaches when compared using the benchmark NN269 dataset and other datasets. Conclusions The proposed approach achieved consistent accuracy across different species and was comparable with the existing approaches. Thus, we believe that the proposed approach can be used as a complementary method to the existing methods for the prediction of splice sites. A web server named ‘HSplice’ has also been developed based on the proposed approach for easy prediction of 5′ splice sites by users and is freely available at http://cabgrid.res.in:8080/HSplice.
|
32
|
Prediction of donor splice sites using random forest with a new sequence encoding approach. BioData Min 2016; 9:4. [PMID: 26807151 PMCID: PMC4724119 DOI: 10.1186/s13040-016-0086-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/02/2015] [Accepted: 01/19/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Detection of splice sites plays a key role in predicting gene structure, so the development of efficient analytical methods for splice site prediction is vital. This paper presents a novel sequence encoding approach based on adjacent di-nucleotide dependencies, in which the donor splice site motifs are encoded into numeric vectors. The encoded vectors are then used as input to Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), Bagging, Boosting, Logistic regression, kNN and Naïve Bayes classifiers for prediction of donor splice sites. RESULTS: The performance of the proposed approach is evaluated on the donor splice site sequence data of Homo sapiens, collected from the Homo Sapiens Splice Sites Dataset (HS3D). The results showed that RF outperformed all the other classifiers considered. In addition, RF achieved higher prediction accuracy than the existing methods, viz. MEM, MDD, WMM, MM1, NNSplice and SpliceView, when compared using an independent test dataset. CONCLUSION: Based on the proposed approach, we have developed an online prediction server (MaLDoSS) to help the biological community in predicting donor splice sites. The server is freely available at http://cabgrid.res.in:8080/maldoss. Owing to its computational feasibility and high prediction accuracy, the proposed approach is believed to help in predicting eukaryotic gene structure.
|
33
|
Low-depth shotgun sequencing resolves complete mitochondrial genome sequence of Labeo rohita. Mitochondrial DNA A DNA Mapp Seq Anal 2015; 27:3517-8. [PMID: 26260184 DOI: 10.3109/19401736.2015.1074197] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
Labeo rohita, popularly known as rohu, is widely cultured throughout the Indian subcontinent. In the present study, we used an in-silico approach to resolve the complete mitochondrial genome of rohu. Low-depth shotgun sequencing using Roche 454 GS FLX (Branford, Connecticut, USA) followed by de novo assembly in CLC Genomics Workbench version 7.0.4 (Aarhus, Denmark) revealed the complete mitogenome of L. rohita to be 16 606 bp long (accession No. KR185963). It comprised 13 protein-coding genes, 22 tRNAs, 2 rRNAs and 1 putative control region. The gene order and organization are similar to those of most vertebrates. The mitogenome in the present investigation has 99% similarity with previously reported mitogenomes of rohu, which is also evident from the phylogenetic study using the maximum-likelihood (ML) tree method. This study was done to determine the feasibility, accuracy and reliability of low-depth sequence data obtained from an NGS platform as compared to Sanger sequencing. Thus, NGS technology has proven to be a competent and rapid in-silico alternative for resolving the complete mitochondrial genome sequence, thereby reducing labor and time.
|
34
|
Computational prediction of MHC class I epitopes for most common viral diseases in cattle (Bos taurus). INDIAN JOURNAL OF BIOCHEMISTRY & BIOPHYSICS 2015; 52:34-44. [PMID: 26040110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Viral diseases such as foot-and-mouth disease (FMD), calf scour (CS), bovine viral diarrhea (BVD) and infectious bovine rhinotracheitis (IBR) affect the growth and milk production of cattle (Bos taurus), causing severe economic loss. Epitope-based vaccine design has evolved to provide a new strategy for therapeutic application of pathogen-specific immunity in animals. Therefore, identification of major histocompatibility complex (MHC) binding peptides as potential T-cell epitopes is widely applied in peptide vaccine design and immunotherapy. In this study, the MetaMHCI tool was used with seven different algorithms to predict potential T-cell epitopes for FMD, BVD, IBR and CS in cattle. A total of 54 protein sequences were selected from a total set of 6351 sequences of the pathogens causing these diseases using bioinformatics approaches. The selected protein sequences were used as the key inputs to the MetaMHCI tool to predict epitopes for the BoLA-A11 MHC class I allele of B. taurus. Further, the epitopes were ranked using a proposed principal component analysis based epitope score (PbES). For each disease, the best epitope, chosen for being predicted by the maximum number of predictors and having a low PbES, was modeled in the PEP-FOLD server and docked with the BoLA-A11 protein to understand the MHC-epitope interaction. Finally, a total of 78 epitopes were predicted, of which 27 were for FMD, 25 for BVD, 12 for CS and 14 for IBR. These epitopes could be artificially synthesized and recommended for vaccinating cattle against the considered diseases. In addition, the methodology adopted here could be used to predict and analyze epitopes for other microbial diseases of important animal species.
|
35
|
A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics 2014; 15:362. [PMID: 25420551 PMCID: PMC4702320 DOI: 10.1186/s12859-014-0362-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/01/2014] [Accepted: 10/24/2014] [Indexed: 11/17/2022] Open
Abstract
Background: Most approaches for splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they use long sequence windows. Hence, they may not be suitable for predicting novel splice variants from the short sequence reads generated by next-generation sequencing technologies. Further, machine learning techniques require numerically encoded data and produce different accuracy with different encoding procedures. Therefore, splice site prediction with short sequence motifs and without encoding sequence data became the motivation for the present study. Results: An approach for finding association among nucleotide bases in the splice site motifs is developed and then used to determine the appropriate window size. In addition, an approach for prediction of donor splice sites using a sum of absolute error criterion is proposed. The proposed approach was compared with commonly used approaches, i.e., Maximum Entropy Modeling (MEM), Maximal Dependency Decomposition (MDD), Weighted Matrix Method (WMM) and Markov Model of first order (MM1), and was found to perform on par with MEM and MDD and better than WMM and MM1 in terms of prediction accuracy. Conclusions: The proposed approach can predict donor splice sites with higher accuracy using short sequence motifs and hence can be used as a complementary method to the existing approaches. Based on the proposed methodology, a web server was also developed for easy prediction of donor splice sites by users and is available at http://cabgrid.res.in:8080/sspred. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0362-6) contains supplementary material, which is available to authorized users.
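For context on one of the baselines named in this abstract, a Weighted Matrix Method scorer can be sketched in Python as follows. This is a generic illustration of WMM only, not the sum-of-absolute-error approach proposed in the paper. The toy motifs, the pseudocount, the uniform background and the function names are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def train_wmm(motifs, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length motifs."""
    length = len(motifs[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases keep a nonzero probability.
        counts = {base: pseudocount for base in BASES}
        for motif in motifs:
            counts[motif[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix

def wmm_score(matrix, seq, background=0.25):
    """Sum of per-position log-odds against a uniform background (an assumption)."""
    return sum(math.log2(matrix[i][base] / background) for i, base in enumerate(seq))

# Toy aligned donor-site windows, invented for illustration (not real data).
toy_donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]
toy_matrix = train_wmm(toy_donors)

score_with_gt = wmm_score(toy_matrix, "CAGGTAAGT")
score_without_gt = wmm_score(toy_matrix, "CAGCCAAGT")
```

A candidate window that keeps the conserved GT dinucleotide scores higher than one that replaces it, which is the behaviour a position-independent model such as WMM is expected to show.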
|
36
|
New encoded single-indicator sequences based on physico-chemical parameters for efficient exon identification. INTERNATIONAL JOURNAL OF BIOINFORMATICS RESEARCH AND APPLICATIONS 2012; 8:126-40. [PMID: 22450275 DOI: 10.1504/ijbra.2012.045955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
The first step in the gene identification problem based on genomic signal processing is to convert character strings into numerical sequences. These numerical sequences are then analysed spectrally or with digital filtering techniques for the period-3 peaks, which are present in exons (coding areas) and absent in introns (non-coding areas). In this paper, we show that single-indicator sequences can be generated by encoding schemes based on physico-chemical properties. Two new methods are proposed for generating single-indicator sequences based on hydration energy and dipole moments. The proposed methods produce high peaks at exon locations and effectively suppress false exons (intron regions having a greater peak than exon regions), resulting in a high discriminating factor, sensitivity and specificity.
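The period-3 idea described here can be illustrated with a minimal Python sketch: map each base to a single number and measure the spectral power at one cycle per three bases. The per-base values below are hypothetical placeholders, not the hydration-energy or dipole-moment values used by the authors, and the function names are chosen only for this example.

```python
import cmath

# Hypothetical per-base values standing in for a physico-chemical property.
# These are NOT the hydration-energy or dipole-moment values from the paper.
ILLUSTRATIVE_VALUES = {"A": 0.1, "C": 0.4, "G": 0.9, "T": 0.6}

def single_indicator(seq, values=ILLUSTRATIVE_VALUES):
    """Encode a DNA string as one numeric sequence (one value per base)."""
    return [values[base] for base in seq.upper()]

def period3_power(signal):
    """Power of the mean-removed signal at a frequency of 1/3 cycle per base."""
    n = len(signal)
    mean = sum(signal) / n
    # Single DFT coefficient evaluated directly at frequency 1/3.
    coeff = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * i / 3)
        for i, x in enumerate(signal)
    )
    return abs(coeff) ** 2 / n

# A toy sequence with a strict 3-base repeat versus one with a 2-base repeat.
power_period3 = period3_power(single_indicator("ATG" * 30))
power_period2 = period3_power(single_indicator("AT" * 45))
```

In practice such a measure is usually evaluated in a sliding window along the genome, so that peaks can be located; the window length would be a further assumption.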
|
37
|
Derivation and characterization of embryonic stem-like cells of Indian major carp Catla catla. JOURNAL OF FISH BIOLOGY 2010; 77:1096-1113. [PMID: 21039493 DOI: 10.1111/j.1095-8649.2010.02755.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
Embryonic stem (ES)-like cells were derived from mid-blastula stage embryos of a freshwater fish, catla Catla catla, under feeder-free conditions and designated as CCES cells. The conditioned medium was optimized with 10% foetal bovine serum (FBS), fish embryo extract (FEE) having 100 µg ml⁻¹ protein concentration, 15 ng ml⁻¹ basic fibroblast growth factor (bFGF) and basic media containing Leibovitz-15, DMEM with 4.5 g l⁻¹ glucose and Ham's F12 (LDF) in a 2:1:1 ratio, using a primary culture of CCES cells. Cells attached to gelatin-coated plates 24 h after seeding, and ES-like colonies were obtained from day 5 onwards. A stable cell culture was obtained after passage 10 and further maintained up to passage 44. These cells were characterized by their typical morphology, high alkaline phosphatase activity, positive expression of the cell-surface antigen SSEA-1, the transcription factor Oct4 and the germ cell marker vasa, and a consistent karyotype over extended periods. The undifferentiated state was confirmed by their ability to form embryoid bodies and their differentiation potential.
|
38
|
|
39
|
Effect of endotoxin on the immunity of Indian major carp, Labeo rohita. FISH & SHELLFISH IMMUNOLOGY 2008; 24:394-399. [PMID: 18289877 DOI: 10.1016/j.fsi.2007.09.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 07/25/2007] [Revised: 09/04/2007] [Accepted: 09/21/2007] [Indexed: 05/25/2023]
Abstract
Endotoxin, a lipopolysaccharide component of the outer cell wall membrane of Gram-negative bacteria, is responsible for a number of biological effects, including immunostimulatory activities, in different animal species including fish. In this study, L. rohita yearlings weighing 80 to 100 g were injected intraperitoneally with endotoxin at doses of 0.5, 1, 2, 5, 10 and 20 EU/fish to determine its effect on immunity. The L. rohita yearlings were found to tolerate endotoxin doses up to 20 EU/fish, and at the lower doses, i.e., at 1 and 2 EU/fish, it acted as an immune potentiator. Different serum and immune parameters such as protein, globulin, lysozyme, respiratory burst activity, myeloperoxidase activity and natural agglutination titre were found to be significantly high (p<0.01) at a dose of 1 EU/fish. At 10 and 20 EU/fish, most of these parameters were lower, indicating the immunosuppressive nature of the endotoxin at these higher doses.
|
40
|
Non-specific immune parameters of brood Indian major carp Labeo rohita and their seasonal variations. FISH & SHELLFISH IMMUNOLOGY 2007; 22:38-43. [PMID: 16679030 DOI: 10.1016/j.fsi.2006.03.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 01/20/2006] [Revised: 02/17/2006] [Accepted: 03/17/2006] [Indexed: 05/09/2023]
Abstract
Different non-specific immune parameters and their seasonal changes were investigated in brood Indian major carp Labeo rohita reared in two major freshwater aquaculture regions of India, viz. West Bengal and Orissa. The study was undertaken for 2 consecutive years and covered the three main seasons of the year: summer (March-May), rainy (July-September) and winter (November-January). Total serum protein, albumin and globulin levels were not significantly different throughout the year (p>0.01). Serum lysozyme and myeloperoxidase activities were lower in winter (7.26 ± 0.87 mg/ml and 0.54 ± 0.11 OD, respectively) than in any other season of the year. The bacterial agglutination titer was higher (p<0.01) in the rainy season (8.70 ± 1.70) than in the summer and winter seasons (3.40 ± 0.60 and 4.00 ± 0.89, respectively). Haemagglutination and haemolytic activities did not vary (p>0.01) throughout the year. In blood smears, the lymphocyte percentage (75-80%) was higher than those of neutrophils (10-15%) and monocytes (5-10%), while eosinophilic granulocytes were present in only a few cases. The differential leucocyte count did not vary significantly (p>0.05) in any season. This study indicated that certain non-specific immune parameters of this species can be modulated at certain times of the year.
|
41
|
The immunomodulatory effects of tuftsin on the non-specific immune system of Indian Major carp, Labeo rohita. FISH & SHELLFISH IMMUNOLOGY 2006; 20:728-38. [PMID: 16293422 DOI: 10.1016/j.fsi.2005.09.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/27/2005] [Revised: 08/15/2005] [Accepted: 09/05/2005] [Indexed: 05/05/2023]
Abstract
The purpose of this study was to determine whether injections of different dosages of tuftsin would enhance the immune response and disease resistance against infections caused by the opportunistic pathogens Aeromonas hydrophila and Edwardsiella tarda in Labeo rohita fingerlings. Four dosages of tuftsin in PBS suspension (0, 5, 10 and 15 mg kg⁻¹ body weight of fish) were injected intraperitoneally into L. rohita fingerlings four times at 2-week intervals. After every 2-week interval, serum biochemical, haematological and immunological parameters of the fish were evaluated: biochemical and haematological parameters including serum total protein, albumin and globulin contents, albumin:globulin ratio, glucose content and leucocyte counts; cellular immune parameters including superoxide anion production, phagocytic activities and lymphokine production index; and humoral immune parameters including lysozyme activity, complement activity and serum bactericidal activity. After 56 days, the fish in each major treatment group were divided into two subgroups for challenge with the two pathogens, A. hydrophila and E. tarda. The mortality (%) and agglutinating antibody titre were recorded on the 28th day post challenge. Most of the immune parameters, including leucocyte count, phagocytic ratio, phagocytic index, lysozyme activity, complement activity and serum bactericidal activity, were significantly (p ≤ 0.05) highest on day 42, after three i.p. injections of 10 mg kg⁻¹ body weight of tuftsin. The challenge study indicated the least mortality in the group of fish injected four times with 10 mg kg⁻¹ body weight of tuftsin. Multiple injections of tuftsin might have maintained the activation of phagocytic cells for a long period, which in turn led to long-term protection in the fish. Thus, three injections of 10 mg kg⁻¹ body weight of tuftsin can be advocated for enhancing the immune response of fish species under aquaculture.
|
42
|
Passive transfer of maternal antibodies and their existence in eggs, larvae and fry of Indian major carp, Labeo rohita (Ham.). FISH & SHELLFISH IMMUNOLOGY 2006; 20:519-27. [PMID: 16157486 DOI: 10.1016/j.fsi.2005.06.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 12/13/2004] [Revised: 05/25/2005] [Accepted: 06/28/2005] [Indexed: 05/04/2023]
Abstract
Lack of immune competence in the early stages of life leads to severe mortality in the larval stages of different fish species, including Indian major carp (IMC). Investigation through indirect enzyme linked immunosorbent assay (ELISA) and agglutination test revealed a significant increase in specific serum antibody response in the brood fish of Indian major carp, Labeo rohita (Ham.), following immunisation with a virulent Aeromonas hydrophila bacterin 1 month prior to breeding, and this antibody was transferred to the larvae through the egg. No significant differences (P > 0.05) in mean antibody levels in larvae at the 1st and 2nd weeks post-hatch were recorded, while a slight rise in antibody level was observed in 3-week-old fry, perhaps due to exposure to A. hydrophila present in the aquatic environment. Non-reducing sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) of immunised brood fish serum, egg and larval extracts, followed by western blot analysis, revealed an antibody molecule of approximate molecular weight 210 kDa. On challenge with virulent A. hydrophila, a significant reduction in mortality was recorded in immunised larvae and fry (58.0, 43.75 and 37.14% in the 1st, 2nd and 3rd week, respectively) relative to control fish (87.0, 79.0 and 76.4% in the 1st, 2nd and 3rd week, respectively). The present study indicated the role of maternally derived antibody in protecting hatchlings of Indian major carp against specific pathogens.
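The mortality figures quoted here can be put on a common scale with relative percent survival (RPS), a measure widely used in fish vaccination trials. The abstract does not say that the authors computed RPS, so the Python sketch below is only an illustrative calculation from the reported percentages, and the function name is an assumption.

```python
def relative_percent_survival(immunised_mortality, control_mortality):
    """RPS = (1 - immunised mortality / control mortality) * 100, inputs in percent."""
    if control_mortality <= 0:
        raise ValueError("control mortality must be positive")
    return (1.0 - immunised_mortality / control_mortality) * 100.0

# Mortality percentages as reported in the abstract for weeks 1, 2 and 3.
immunised = [58.0, 43.75, 37.14]
control = [87.0, 79.0, 76.4]

rps_by_week = [
    relative_percent_survival(imm, ctl) for imm, ctl in zip(immunised, control)
]
```

With these inputs the RPS comes to roughly 33%, 45% and 51% for weeks 1, 2 and 3, respectively.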
|
43
|
High antigenic cross-reaction among the bacterial species responsible for diseases of cultured freshwater fishes and strategies to overcome it for specific serodiagnosis. Comp Immunol Microbiol Infect Dis 2003; 26:199-211. [PMID: 12581749 DOI: 10.1016/s0147-9571(02)00059-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
Abstract
Antigenic sharing among the most common bacterial pathogens of Indian major carps, namely Aeromonas hydrophila, Edwardsiella tarda and Pseudomonas fluorescens, has been studied using immunological reactions such as cross-agglutination, disc diffusion and indirect enzyme linked immunosorbent assay (ELISA). The data were analysed using the Statistical Analysis System (SAS), version 6.12. The results showed high antigenic similarities among the bacterial whole cells, whole cell lysates, somatic 'O' antigens, lipopolysaccharides (LPS) and extracellular products (ECP). However, few or no similarities were observed in ECP components of <20 kD. The present study indicates a need to develop serology-based differential diagnostic methods by choosing the highly specific, less cross-reactive ECP antigen.
|
44
|
Bath immunisation of spawn, fry and fingerlings of Indian major carps using a particulate bacterial antigen. FISH & SHELLFISH IMMUNOLOGY 2002; 13:133-140. [PMID: 12400863 DOI: 10.1006/fsim.2001.0388] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Larval mortality in Indian major carps is one of the major problems encountered in the pond culture system. The present investigation was carried out to determine the proper age, duration of exposure and optimum bacterin concentration for vaccinating rohu (Labeo rohita) and catla (Catla catla) at their early stages with a formalin-killed Edwardsiella tarda bacterin suspension. The development of immunological competence was recorded in 3-week-old spawn of rohu and catla exposed to the bacterin at a concentration of 10⁹ cfu ml⁻¹ for 15 min, and it persisted up to 4 weeks post vaccination. These fish showed significant resistance against challenge with virulent E. tarda bacteria. Significant antibody titre could be recorded in advanced fry and fingerlings exposed to a bacterin concentration of 10⁹ cfu ml⁻¹ for 45 and 60 min, respectively.
|