1
Xie Y, Yang J, Ouyang JF, Petretto E. scPanel: a tool for automatic identification of sparse gene panels for generalizable patient classification using scRNA-seq datasets. Brief Bioinform 2024; 25:bbae482. [PMID: 39350339 PMCID: PMC11442147 DOI: 10.1093/bib/bbae482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 08/30/2024] [Accepted: 09/12/2024] [Indexed: 10/04/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies can generate transcriptomic profiles at a single-cell resolution in large patient cohorts, facilitating discovery of gene and cellular biomarkers for disease. Yet, when the number of biomarker genes is large, the translation to clinical applications is challenging due to prohibitive sequencing costs. Here, we introduce scPanel, a computational framework designed to bridge the gap between biomarker discovery and clinical application by identifying a sparse gene panel for patient classification from the cell population(s) most responsive to perturbations (e.g. diseases/drugs). scPanel incorporates a data-driven way to automatically determine a minimal number of informative biomarker genes. Patient-level classification is achieved by aggregating the prediction probabilities of cells associated with a patient using the area under the curve score. Application of scPanel to scleroderma, colorectal cancer, and COVID-19 datasets resulted in high patient classification accuracy using only a small number of genes (<20), automatically selected from the entire transcriptome. In the COVID-19 case study, we demonstrated cross-dataset generalizability in predicting disease state in an external patient cohort. scPanel outperforms other state-of-the-art gene selection methods for patient classification and can be used to identify parsimonious sets of reliable biomarker candidates for clinical translation.
Affiliation(s)
- Yi Xie
- Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
- Jianfei Yang
- The School of Mechanical and Aerospace Engineering and the School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore
- John F Ouyang
- Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
- Enrico Petretto
- Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
2
Karimi-Fard A, Saidi A, TohidFar M, Emami SN. Novel candidate genes for environmental stresses response in Synechocystis sp. PCC 6803 revealed by machine learning algorithms. Braz J Microbiol 2024; 55:1219-1229. [PMID: 38705959 PMCID: PMC11153407 DOI: 10.1007/s42770-024-01338-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 04/03/2024] [Indexed: 05/07/2024] Open
Abstract
Cyanobacteria have developed acclimation strategies to adapt to harsh environments, making them a model organism. Understanding the molecular mechanisms of tolerance to abiotic stresses can help elucidate how cells change their gene expression patterns in response to stress. Recent advances in sequencing techniques and bioinformatics analysis methods have led to the discovery of many genes involved in stress response in organisms. Synechocystis sp. PCC 6803 is a suitable microorganism for studying transcriptome response under environmental stress. Therefore, for the first time, we employed two effective feature selection techniques, namely support vector machine recursive feature elimination (SVM-RFE) and LASSO (Least Absolute Shrinkage and Selection Operator), to pinpoint the crucial genes responsive to environmental stresses in Synechocystis sp. PCC 6803. We applied these machine learning algorithms to analyze the transcriptomic data of Synechocystis sp. PCC 6803 under distinct conditions, encompassing light, salt and iron stress conditions. Seven candidate genes, namely sll1862, slr0650, sll0760, slr0091, ssl3044, slr1285, and slr1687, were selected by both the LASSO and SVM-RFE algorithms. RNA-seq analysis was performed to validate the efficiency of our feature selection approach in selecting the most important genes. The RNA-seq analysis revealed significantly high expression for five genes, namely sll1862, slr1687, ssl3044, slr1285, and slr0650, under ion stress condition. Among these five genes, ssl3044 and slr0650 could be introduced as new potential candidate genes for further confirmatory genetic studies to determine their roles in the response to abiotic stresses.
Affiliation(s)
- Abbas Karimi-Fard
- Department of Cell and Molecular Biology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Abbas Saidi
- Department of Cell and Molecular Biology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Masoud TohidFar
- Department of Cell and Molecular Biology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Seyedeh Noushin Emami
- Department of Molecular Biosciences, Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
3
Murmu S, Sinha D, Chaurasia H, Sharma S, Das R, Jha GK, Archak S. A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions. Frontiers in Plant Science 2024; 15:1292054. [PMID: 38504888 PMCID: PMC10948452 DOI: 10.3389/fpls.2024.1292054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 01/24/2024] [Indexed: 03/21/2024]
Abstract
Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases.
Affiliation(s)
- Sneha Murmu
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Dipro Sinha
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Himanshushekhar Chaurasia
- Central Institute for Research on Cotton Technology, Indian Council of Agricultural Research (ICAR), Mumbai, India
- Soumya Sharma
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Ritwika Das
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Girish Kumar Jha
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Sunil Archak
- National Bureau of Plant Genetic Resources, Indian Council of Agricultural Research (ICAR), New Delhi, India
4
Tahmasebi A, Niazi A, Akrami S. Integration of meta-analysis, machine learning and systems biology approach for investigating the transcriptomic response to drought stress in Populus species. Sci Rep 2023; 13:847. [PMID: 36646724 PMCID: PMC9842770 DOI: 10.1038/s41598-023-27746-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/16/2022] [Accepted: 01/06/2023] [Indexed: 01/18/2023] Open
Abstract
In Populus, drought is a major problem affecting plant growth and development, which can be closely reflected by corresponding transcriptomic changes. Nevertheless, these changes in Populus are not fully understood. Here, we first used meta-analysis and machine learning methods to identify water stress-responsive genes and then performed a systematic approach to discover important gene networks. Our analysis revealed that large transcriptional variations occur during drought stress. These changes were more associated with the response to stress, cellular catabolic process, metabolic pathways, and hormone-related genes. The differential gene coexpression analysis highlighted two acetyltransferase NATA1-like and putative cytochrome P450 genes that have a special contribution in response to drought stress. In particular, the findings showed that MYBs and MAPKs have a prominent role in the drought stress response that could be considered to improve the drought tolerance of Populus. We also suggest ARF2-like and PYL4-like genes as potential markers for use in breeding programs. This study provides a better understanding of how Populus responds to drought, which could be useful for improving tolerance to stress in Populus.
Affiliation(s)
- Ahmad Tahmasebi
- Institute of Biotechnology, Shiraz University, Shiraz, 7144165186, Iran
- Ali Niazi
- Institute of Biotechnology, Shiraz University, Shiraz, 7144165186, Iran
- Sahar Akrami
- Institute of Biotechnology, Shiraz University, Shiraz, 7144165186, Iran
5
Favreau E, Geist KS, Wyatt CDR, Toth AL, Sumner S, Rehan SM. Co-expression Gene Networks and Machine-learning Algorithms Unveil a Core Genetic Toolkit for Reproductive Division of Labour in Rudimentary Insect Societies. Genome Biol Evol 2023; 15:evac174. [PMID: 36527688 PMCID: PMC9830183 DOI: 10.1093/gbe/evac174] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/27/2022] [Revised: 12/06/2022] [Accepted: 12/10/2022] [Indexed: 12/23/2022] Open
Abstract
The evolution of eusociality requires that individuals forgo some or all of their own reproduction to assist the reproduction of others in their group, such as a primary egg-laying queen. A major open question is how genes and genetic pathways sculpt the evolution of eusociality, especially in rudimentary forms of sociality, that is, those with smaller cooperative nests when compared with species such as honeybees that possess large societies. We lack comprehensive comparative studies examining shared patterns and processes across multiple social lineages. Here we examine the mechanisms of molecular convergence across two lineages of bees and wasps exhibiting such rudimentary societies. These societies consist of few individuals and their life histories range from facultatively to obligately social. Using six species across four independent origins of sociality, we conduct a comparative meta-analysis of publicly available transcriptomes. Standard methods detected little similarity in patterns of differential gene expression in brain transcriptomes among reproductive and non-reproductive individuals across species. By contrast, both supervised machine learning and consensus co-expression network approaches uncovered sets of genes with conserved expression patterns among reproductive and non-reproductive phenotypes across species. These sets overlap substantially, and may comprise a shared genetic "toolkit" for sociality across the distantly related taxa of bees and wasps and independently evolved lineages of sociality. We also found many lineage-specific genes and co-expression modules associated with social phenotypes and possible signatures of shared life-history traits. These results reveal how taxon-specific molecular mechanisms complement a core toolkit of molecular processes in sculpting traits related to the evolution of eusociality.
Affiliation(s)
- Emeline Favreau
- Department of Genetics, Environment, Evolution, University College London, London WC1E 6BT, United Kingdom
- Katherine S Geist
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011
- Christopher D R Wyatt
- Department of Genetics, Environment, Evolution, University College London, London WC1E 6BT, United Kingdom
- Amy L Toth
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011
- Seirian Sumner
- Department of Genetics, Environment, Evolution, University College London, London WC1E 6BT, United Kingdom
- Sandra M Rehan
- Department of Biology, York University, Toronto, ON M3J 1P3, Canada
6
Chataika BY, Akundabweni LSM, Sibiya J, Achigan-Dako EG, Sogbohossou DEO, Kwapata K, Awala S. Major Production Constraints and Spider Plant [Gynandropsis gynandra (L.) Briq.] Traits Preferences Amongst Smallholder Farmers of Northern Namibia and Central Malawi. Frontiers in Sustainable Food Systems 2022. [DOI: 10.3389/fsufs.2022.831821] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
Spider plant (Gynandropsis gynandra (L.) Briq.) is among the most important African Leafy Vegetables (ALVs) as a source of essential nutrients with the potential of contributing significantly to household food and nutritional security and mitigation of hidden hunger. Nevertheless, the vegetable is considered an orphan crop and its production is challenged by inadequate research to identify and improve traits preferred by smallholder farmers. The research was conducted to identify the main challenges impacting the production of spider plants and identify traits preferred by smallholder farmers in northern Namibia and central Malawi for use in demand-led crop improvement. Semi-structured interviews involving a random selection of 197 farming households from five regions of northern Namibia and three districts of central Malawi were conducted. In addition, six key informant interviews and four focus group discussions were conducted to triangulate the findings. Data were analyzed using IBM SPSS version 20. Fisher's exact test was used to test for independence in the ranking of production constraints and agronomic traits, while Kendall's Coefficient of Concordance (W) was used to measure agreement levels in the ranking across the countries. Farmers indicated lack of seed, poor soil fertility, poor seed germination and drought as the main production challenges across the two countries. Production constraints were ranked differently (p < 0.001) across the study sites, suggesting the influence of biophysical and socio-economic factors associated with production. High yield and drought tolerance were considered the most important agronomic traits among the smallholder farmers in both countries. The findings of this study are useful for designing demand-driven pre-breeding trials that prioritize the needs of the end-users. Demand-led breeding has the potential to stimulate the production and utilization of spider plant, hence contributing to household food and nutritional security.
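For readers unfamiliar with the agreement statistic named in this abstract, the following is a minimal sketch of Kendall's Coefficient of Concordance for untied rankings, W = 12S / (m^2 (n^3 - n)), where m is the number of rankers, n the number of ranked items, and S the sum of squared deviations of the rank sums from their mean. The function name `kendalls_w` and the example rankings are hypothetical illustrations, not the study's data or its SPSS procedure.

```python
def kendalls_w(rankings):
    """Kendall's W for m complete, untied rankings of the same n items.

    rankings: list of m lists; each inner list holds the ranks 1..n
    assigned by one ranker to items 0..n-1.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Guard: this simple formula assumes every ranker uses each rank 1..n once.
    for r in rankings:
        if sorted(r) != list(range(1, n + 1)):
            raise ValueError("each ranking must be a permutation of 1..n")
    # Total rank received by each item across all rankers.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # S: squared deviations of the rank sums from their mean.
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four constraints by three groups of farmers
# (made-up numbers for illustration only).
constraints = ["lack of seed", "poor soil fertility",
               "poor seed germination", "drought"]
group_rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(group_rankings)
print(f"Kendall's W over {len(constraints)} constraints: {w:.3f}")
```

W ranges from 0 (no agreement among rankers) to 1 (identical rankings); tied ranks would need a correction term that this sketch omits.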
7
Gene Correlation Guided Gene Selection for Microarray Data Classification. BioMed Research International 2021; 2021:6490118. [PMID: 34435048 PMCID: PMC8382518 DOI: 10.1155/2021/6490118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2021] [Accepted: 08/09/2021] [Indexed: 12/14/2022]
Abstract
The microarray cancer data obtained by DNA microarray technology play an important role in cancer prevention, diagnosis, and treatment. However, predicting the different types of tumors is a challenging task since the sample size in microarray data is often small but the dimensionality is very high. Gene selection is an effective means of mitigating the curse of dimensionality and can boost the classification accuracy of microarray data. However, many previous gene selection methods focus on model design but neglect the correlation between different genes. In this paper, we introduce a novel unsupervised gene selection method that takes gene correlation into consideration, named gene correlation guided gene selection (G3CS). Specifically, we calculate the covariance of different gene dimension pairs and embed it into our unsupervised gene selection model to regularize the gene selection coefficient matrix. In such a manner, redundant genes can be effectively excluded. In addition, we utilize a matrix factorization term to exploit the cluster structure of the original microarray data to assist the learning process. We design an iterative updating algorithm with a convergence guarantee to solve the resultant optimization problem. Experiments on six publicly available microarray datasets are conducted to validate the efficacy of our proposed method.
8
Modern Approaches for Transcriptome Analyses in Plants. Advances in Experimental Medicine and Biology 2021; 1346:11-50. [DOI: 10.1007/978-3-030-80352-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
9
Das S, Rai SN. Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data. Entropy (Basel, Switzerland) 2020; 22:E1205. [PMID: 33286973 PMCID: PMC7712650 DOI: 10.3390/e22111205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/08/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/16/2022]
Abstract
Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most of the available gene selection methods are based on either a relevancy or a redundancy measure, and are usually judged by post-selection classification accuracy. In these methods, genes are ranked on a single high-dimensional expression dataset, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines a support vector machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, the genes were selected through statistical significance values computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e., subject classification and biologically relevant criteria based on quantitative trait loci and gene ontology. Our analytical results showed that the proposed approach selects genes which are more biologically relevant than those selected by the existing methods. Moreover, the proposed approach was also found to perform better than the competing existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.
Affiliation(s)
- Samarendra Das
- Division of Statistical Genetics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Netaji Subhas-Indian Council of Agricultural Research (ICAR) International Fellow, Indian Council of Agricultural Research, Krishi Bhawan, New Delhi 110001, India
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40292, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Shesh N. Rai
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40292, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Alcohol Research Center, University of Louisville, Louisville, KY 40292, USA
- Department of Hepatobiology and Toxicology, University of Louisville, Louisville, KY 40292, USA
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA
- Wendell Cherry Chair in Clinical Trial Research, University of Louisville, Louisville, KY 40292, USA
10
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. Entropy 2020; 22:e22040427. [PMID: 33286201 PMCID: PMC7516904 DOI: 10.3390/e22040427] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/24/2020] [Revised: 03/18/2020] [Accepted: 04/03/2020] [Indexed: 12/22/2022]
Abstract
Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
11
Sun S, Wang C, Ding H, Zou Q. Machine learning and its applications in plant molecular studies. Brief Funct Genomics 2019; 19:40-48. [DOI: 10.1093/bfgp/elz036] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 01/16/2023] Open
Abstract
The advent of high-throughput genomic technologies has resulted in the accumulation of massive amounts of genomic information. However, biologists are challenged with how to effectively analyze these data. Machine learning can provide tools for better and more efficient data analysis. Unfortunately, because many plant biologists are unfamiliar with machine learning, its application in plant molecular studies has been restricted to a few species and a limited set of algorithms. Thus, in this study, we provide the basic steps for developing machine learning frameworks and present a comprehensive overview of machine learning algorithms and various evaluation metrics. Furthermore, we introduce sources of important curated plant genomic data and R packages to enable plant biologists to easily and quickly apply appropriate machine learning algorithms in their research. Finally, we discuss current applications of machine learning algorithms for identifying various genes related to resistance to biotic and abiotic stress. Broad application of machine learning and the accumulation of plant sequencing data will advance plant molecular studies.
Affiliation(s)
- Shanwen Sun
- University of Bayreuth in Germany. He is now a postdoctoral fellow at the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Chunyu Wang
- Harbin Institute of Technology in China. He is an associate professor in the School of Computer Science and Technology, Harbin Institute of Technology
- Hui Ding
- Inner Mongolia University in China. She is an associate professor in the Center for Informational Biology, University of Electronic Science and Technology of China
- Quan Zou
- Harbin Institute of Technology in China. He is a professor in the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
12
Rahaman MM, Ahsan MA, Chen M. Data-mining Techniques for Image-based Plant Phenotypic Traits Identification and Classification. Sci Rep 2019; 9:19526. [PMID: 31862925 PMCID: PMC6925301 DOI: 10.1038/s41598-019-55609-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/25/2019] [Accepted: 11/21/2019] [Indexed: 11/09/2022] Open
Abstract
Statistical data-mining (DM) and machine learning (ML) are promising tools to assist in the analysis of complex datasets. In recent decades, plant phenomics has become crucial in precision agriculture for high-throughput phenotyping of local crop cultivars. Therefore, an integrated or new analytical approach is needed to deal with these phenomics data. We proposed a statistical framework for the analysis of phenomics data by integrating DM and ML methods. The most popular supervised ML methods, namely Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine with linear (SVM-l) and radial basis (SVM-r) kernels, are used to classify or predict plant status (stress/non-stress) to validate our proposed approach. Several simulated and real plant phenotype datasets were analyzed. The results described the significant contribution of the features (selected by our proposed approach) throughout the analysis. In this study, we showed that the proposed approach reduced the complexity of phenotype data analysis, reduced the computational time of ML algorithms, and increased prediction accuracy.
Affiliation(s)
- Md Matiur Rahaman
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China; Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh
- Md Asif Ahsan
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
13
Gene selection for microarray data classification via adaptive hypergraph embedded dictionary learning. Gene 2019; 706:188-200. [DOI: 10.1016/j.gene.2019.04.060] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/27/2018] [Revised: 04/03/2019] [Accepted: 04/22/2019] [Indexed: 01/19/2023]
14
Olohan L, Gardiner LJ, Lucaci A, Steuernagel B, Wulff B, Kenny J, Hall N, Hall A. A modified sequence capture approach allowing standard and methylation analyses of the same enriched genomic DNA sample. BMC Genomics 2018; 19:250. [PMID: 29653520 PMCID: PMC5899405 DOI: 10.1186/s12864-018-4640-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/28/2017] [Accepted: 04/03/2018] [Indexed: 01/01/2023] Open
Abstract
Background: Bread wheat has a large complex genome that makes whole genome resequencing costly. Therefore, genome complexity reduction techniques such as sequence capture make re-sequencing cost effective. With a high-quality draft wheat genome now available it is possible to design capture probe sets and to use them to accurately genotype and anchor SNPs to the genome. Furthermore, in addition to genetic variation, epigenetic variation provides a source of natural variation contributing to changes in gene expression and phenotype that can be profiled at the base pair level using sequence capture coupled with bisulphite treatment. Here, we present a new 12 Mbp wheat capture probe set that allows both the profiling of genotype and methylation from the same DNA sample. Furthermore, we present a method, based on Agilent SureSelect Methyl-Seq, that will use a single capture assay as a starting point to allow both DNA sequencing and methyl-seq. Results: Our method uses a single capture assay that is sequentially split and used for both DNA sequencing and methyl-seq. The resultant genotype and epi-type data is highly comparable in terms of coverage and SNP/methylation site identification to that generated from separate captures for DNA sequencing and methyl-seq. Furthermore, by defining SNP frequencies in a diverse landrace from the Watkins collection we highlight the importance of having genotype data to prevent false positive methylation calls. Finally, we present the design of a new 12 Mbp wheat capture and demonstrate its successful application to re-sequence wheat. Conclusions: We present a cost-effective method for performing both DNA sequencing and methyl-seq from a single capture reaction, thus reducing reagent costs, sample preparation time and DNA requirements for these complementary analyses.
Collapse
Affiliation(s)
- Lisa Olohan
  - Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool, UK
- Anita Lucaci
  - Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool, UK
- Brande Wulff
  - John Innes Centre, Norwich Research Park, Norwich, UK
- John Kenny
  - Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool, UK
- Neil Hall
  - Earlham Institute, Norwich Research Park, Norwich, UK; University of East Anglia, Norwich, UK
- Anthony Hall
  - Earlham Institute, Norwich Research Park, Norwich, UK; University of East Anglia, Norwich, UK
|
15
|
Statistical approach for selection of biologically informative genes. Gene 2018; 655:71-83. [PMID: 29458166 DOI: 10.1016/j.gene.2018.02.044] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/24/2017] [Revised: 11/26/2017] [Accepted: 02/14/2018] [Indexed: 11/23/2022]
Abstract
Selection of informative genes from high-dimensional gene expression data has emerged as an important research area in genomics. Most gene selection techniques proposed so far are based on either a relevancy or a redundancy measure. Further, the performance of these techniques has been judged by post-selection classification accuracy, computed with a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biologically sufficient criteria, Gene Set Enrichment with QTL (GSEQ) and a biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique against 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes that are more biologically relevant. The proposed technique was also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that, under the multiple criteria decision-making setup, the proposed technique is the best of the available alternatives for informative gene selection. Based on the proposed approach, an R package, BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR.
This study will provide a practical guide to choosing statistical techniques for selecting informative genes from high-dimensional expression data for breeding and systems biology studies.
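The maximum-relevance, minimum-redundancy idea behind the composite measure described above can be sketched as a greedy loop. What follows is a minimal illustration only, not the Boot-MRMR procedure or the BootMRMR package API: it assumes absolute Pearson correlation as a stand-in for both relevance and redundancy, omits bootstrapping entirely, and uses hypothetical names (`pearson`, `mrmr_select`) and toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_select(genes, labels, k):
    """Greedily pick up to k genes, each maximizing relevance minus mean redundancy.

    Assumption: relevance = |corr(gene, labels)| and
    redundancy = mean |corr(gene, already selected gene)|.
    """
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}
    selected = []
    candidates = set(genes)

    def score(gene):
        if not selected:
            return relevance[gene]
        redundancy = sum(abs(pearson(genes[gene], genes[s])) for s in selected)
        return relevance[gene] - redundancy / len(selected)

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy expression values for six samples (three controls, three cases)
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.0, 6.1],  # nearly duplicates geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],  # weakly relevant, not redundant
}
labels = [0, 0, 0, 1, 1, 1]

print(mrmr_select(expression, labels, 2))
```

In this toy case the second pick skips the near-duplicate gene in favor of the less redundant one, which is the behavior the redundancy penalty is meant to produce.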
|
16
|
Statistical Approach for Gene Set Analysis with Trait Specific Quantitative Trait Loci. Sci Rep 2018; 8:2391. [PMID: 29402907 PMCID: PMC5799309 DOI: 10.1038/s41598-018-19736-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/03/2017] [Accepted: 12/06/2017] [Indexed: 11/20/2022] Open
Abstract
The analysis of gene sets is usually carried out based on gene ontology terms and known biological pathways. These approaches may not establish any formal relation between genotype and trait-specific phenotype. In plant biology and breeding, analysis of gene sets with trait-specific Quantitative Trait Loci (QTL) data is considered a great source for biological knowledge discovery. Therefore, we proposed an innovative statistical approach called Gene Set Analysis with QTLs (GSAQ) for interpreting gene expression data in the context of gene sets with traits. The utility of GSAQ was studied on five different complex abiotic and biotic stress scenarios in rice, yielding specific trait/stress-enriched gene sets. Further, the GSAQ approach was more innovative and effective than the existing approach in performing gene set analysis with underlying QTLs and in identifying QTL candidate genes. The GSAQ approach also provided two potential biologically relevant criteria for performance analysis of gene selection methods. Based on this proposed approach, an R package, GSAQ (https://cran.r-project.org/web/packages/GSAQ), has been developed. The GSAQ approach provides a valuable platform for integrating gene expression data with genetically rich QTL data.
|
17
|
Sogbohossou EOD, Achigan-Dako EG, Maundu P, Solberg S, Deguenon EMS, Mumm RH, Hale I, Van Deynze A, Schranz ME. A roadmap for breeding orphan leafy vegetable species: a case study of Gynandropsis gynandra (Cleomaceae). Horticulture Research 2018; 5:2. [PMID: 29423232 PMCID: PMC5798814 DOI: 10.1038/s41438-017-0001-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/07/2017] [Revised: 10/23/2017] [Accepted: 11/29/2017] [Indexed: 05/24/2023]
Abstract
Despite an increasing awareness of the potential of "orphan" or unimproved crops to contribute to food security and enhanced livelihoods for farmers, coordinated research agendas to facilitate production and use of orphan crops by local communities are generally lacking. We provide an overview of the current knowledge on leafy vegetables with a focus on Gynandropsis gynandra, a highly nutritious species used in Africa and Asia, and highlight general and species-specific guidelines for participatory, genomics-assisted breeding of orphan crops. Key steps in genome-enabled orphan leafy vegetables improvement are identified and discussed in the context of Gynandropsis gynandra breeding, including: (1) germplasm collection and management; (2) product target definition and refinement; (3) characterization of the genetic control of key traits; (4) design of the 'process' for cultivar development; (5) integration of genomic data to optimize that 'process'; (6) multi-environmental participatory testing and end-user evaluation; and (7) crop value chain development. The review discusses each step in detail, with emphasis on improving leaf yield, phytonutrient content, organoleptic quality, resistance to biotic and abiotic stresses and post-harvest management.
Affiliation(s)
- E. O. Deedi Sogbohossou
  - Biosystematics Group, Wageningen University, Postbus 647 6700AP, Wageningen, The Netherlands
  - Laboratory of Genetics, Horticulture and Seed Sciences, Faculty of Agronomic Sciences, University of Abomey-Calavi, BP 2549 Abomey-Calavi, Benin
- Enoch G. Achigan-Dako
  - Laboratory of Genetics, Horticulture and Seed Sciences, Faculty of Agronomic Sciences, University of Abomey-Calavi, BP 2549 Abomey-Calavi, Benin
- Patrick Maundu
  - Kenya Resource Center for Indigenous Knowledge (KENRIK), Centre for Biodiversity, National Museums of Kenya, Museum Hill, P.O. Box 40658, Nairobi, 00100 Kenya
- Svein Solberg
  - World Vegetable Center (AVRDC), P.O. Box 42, Shanhua, Tainan 74199 Taiwan
- Rita H. Mumm
  - Department of Crop Sciences, University of Illinois, Urbana-Champaign, IL 61801 USA
- Iago Hale
  - Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH 03824 USA
- Allen Van Deynze
  - Department of Plant Sciences, University of California, Davis, CA 95616 USA
- M. Eric Schranz
  - Biosystematics Group, Wageningen University, Postbus 647 6700AP, Wageningen, The Netherlands
|
18
|
Kumar A, Jeya Sundara Sharmila D, Singh S. SVMRFE based approach for prediction of most discriminatory gene target for type II diabetes. Genomics Data 2017; 12:28-37. [PMID: 28275550 PMCID: PMC5331150 DOI: 10.1016/j.gdata.2017.02.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/29/2016] [Revised: 02/07/2017] [Accepted: 02/15/2017] [Indexed: 12/22/2022]
Abstract
Type II diabetes is a chronic condition that affects the way the body metabolizes sugar, its main source of fuel, and it is becoming increasingly common all over the world. It is therefore necessary to identify new potential drug targets that can not only control the disease but also treat it. Support vector machines (SVMs) are classifiers with the potential to separate discriminatory genes from non-discriminatory genes. SVMRFE, a modification of SVM, ranks the genes based on their discriminatory power and eliminates the genes that are not involved in causing the disease. A gene regulatory network was constructed from the top-ranked coding genes to identify their role in causing diabetes. To further validate the results, a pathway study was performed to identify the involvement of the coding genes in type II diabetes. The genes obtained from this study showed significant involvement in causing the disease and may be used as potential drug targets.
Affiliation(s)
- Atul Kumar
  - Department of Biotechnology and Health Sciences, Karunya University, Coimbatore, Tamil Nadu, India
- D Jeya Sundara Sharmila
  - Department of Nanosciences and Technology, Tamil Nadu Agriculture University, Coimbatore, Tamil Nadu, India
- Sachidanand Singh
  - Department of Biotechnology and Health Sciences, Karunya University, Coimbatore, Tamil Nadu, India
|
19
|
Gene selection for tumor classification using neighborhood rough sets and entropy measures. J Biomed Inform 2017; 67:59-68. [PMID: 28215562 DOI: 10.1016/j.jbi.2017.02.007] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 06/02/2016] [Revised: 01/25/2017] [Accepted: 02/09/2017] [Indexed: 01/04/2023]
Abstract
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection from gene expression data is a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to the gene selection field, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data usually undergo a discretization preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper introduces an entropy measure under the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. Use of this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression datasets show that the proposed gene selection method is effective in improving the accuracy of tumor classification.
|
20
|
Transcriptomic basis for drought-resistance in Brassica napus L. Sci Rep 2017; 7:40532. [PMID: 28091614 PMCID: PMC5238399 DOI: 10.1038/srep40532] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 08/01/2016] [Accepted: 12/07/2016] [Indexed: 01/06/2023] Open
Abstract
Based on transcriptomic data from four experimental settings with drought-resistant and drought-sensitive cultivars under drought and well-watered conditions, statistical analysis revealed three categories encompassing 169 highly differentially expressed genes (DEGs) in response to drought in Brassica napus L., including 37 drought-resistant cultivar-related genes, 35 drought-sensitive cultivar-related genes and 97 cultivar non-specific ones. We provide evidence that the identified DEGs are fairly uniformly distributed on different chromosomes and that their expression patterns are variety specific. Apart from a common enrichment in responses to various stimuli or stresses, the different categories of DEGs show specific enrichment in certain biological processes or pathways, which indicates the possibility of functional differences among the three categories. Network analysis revealed relationships among the 169 DEGs, annotated biological processes and pathways. The 169 DEGs can be classified into different functional categories via preferred pathways or biological processes. Some pathways might simultaneously involve a large number of shared DEGs, and these pathways are likely to cross-talk and have overlapping biological functions. Several of the identified DEGs fit into the drought stress signal transduction pathway in Arabidopsis thaliana. Finally, quantitative real-time PCR validations confirmed the reproducibility of the RNA-seq data. These findings are useful for the improvement of crop varieties through transgenic engineering.
|
21
|
Das S, Meher PK, Rai A, Bhar LM, Mandal BN. Statistical Approaches for Gene Selection, Hub Gene Identification and Module Interaction in Gene Co-Expression Network Analysis: An Application to Aluminum Stress in Soybean (Glycine max L.). PLoS One 2017; 12:e0169605. [PMID: 28056073 PMCID: PMC5215982 DOI: 10.1371/journal.pone.0169605] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 09/19/2016] [Accepted: 12/19/2016] [Indexed: 11/30/2022] Open
Abstract
Selection of informative genes is an important problem in gene expression studies. The small sample size and the large number of genes in gene expression data make the selection process complex. Further, the selected informative genes may act as a vital input for gene co-expression network analysis. Moreover, the identification of hub genes and module interactions in gene co-expression networks is yet to be fully explored. This paper presents a statistically sound gene selection technique based on the support vector machine algorithm for selecting informative genes from high-dimensional gene expression data. An attempt has also been made to develop a statistical approach for identifying hub genes in the gene co-expression network. In addition, a differential hub gene analysis approach has been developed to group the identified hub genes based on their gene connectivity in a case vs. control study. Based on this proposed approach, an R package, dhga (https://cran.r-project.org/web/packages/dhga), has been developed. The comparative performance of the proposed gene selection technique and of the hub gene identification approach was evaluated on three different crop microarray datasets. The proposed gene selection technique outperformed most of the existing techniques in selecting a robust set of informative genes. With the proposed hub gene identification approach, a smaller number of hub genes was identified than with the existing approach, which is in accordance with the scale-free property of real networks. In this study, some key genes, along with their Arabidopsis orthologs, have been reported, which can be used for engineering the aluminum toxic stress response in soybean. The functional analysis of various selected key genes revealed the underlying molecular mechanisms of the aluminum toxic stress response in soybean.
Affiliation(s)
- Samarendra Das
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anil Rai
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Lal Mohan Bhar
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Baidya Nath Mandal
  - Division of Design of Experiments, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
|
22
|
Lu X, Yang Y, Wu F, Gao M, Xu Y, Zhang Y, Yao Y, Du X, Li C, Wu L, Zhong X, Zhou Y, Fan N, Zheng Y, Xiong D, Peng H, Escudero J, Huang B, Li X, Ning Y, Wu K. Discriminative analysis of schizophrenia using support vector machine and recursive feature elimination on structural MRI images. Medicine (Baltimore) 2016; 95:e3973. [PMID: 27472673 PMCID: PMC5265810 DOI: 10.1097/md.0000000000003973] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 01/20/2016] [Revised: 05/16/2016] [Accepted: 05/26/2016] [Indexed: 12/11/2022] Open
Abstract
Structural abnormalities in schizophrenia (SZ) patients have been well documented with structural magnetic resonance imaging (MRI) data using voxel-based morphometry (VBM) and region of interest (ROI) analyses. However, these analyses can only detect group-wise differences and thus have poor predictive value for individuals. In the present study, we applied a machine learning method that combined a support vector machine (SVM) with recursive feature elimination (RFE) to discriminate SZ patients from normal controls (NCs) using their structural MRI data. We first employed both VBM and ROI analyses to compare gray matter volume (GMV) and white matter volume (WMV) between 41 SZ patients and 42 age- and sex-matched NCs. The method of SVM combined with RFE was used to discriminate SZ patients from NCs using significant between-group differences in both GMV and WMV as input features. We found that SZ patients showed GM and WM abnormalities in several brain structures primarily involved in the emotion, memory, and visual systems. An SVM with an RFE classifier using the significant structural abnormalities identified by the VBM analysis as input features achieved the best performance (an accuracy of 88.4%, a sensitivity of 91.9%, and a specificity of 84.4%) in the discriminative analyses of SZ patients. These results suggested that distinct neuroanatomical profiles associated with SZ patients might provide a potential biomarker for disease diagnosis, and that machine-learning methods can reveal neurobiological mechanisms in psychiatric diseases.
Affiliation(s)
- Xiaobing Lu
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
  - GBH-SCUT Joint Research Centre for Neuroimaging, Guangzhou, China
- Yongzhe Yang
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
  - School of Medicine, South China University of Technology (SCUT), Guangzhou, China
  - Department of Radiology, Guangdong Academy of Medical Sciences, Guangdong General Hospital, Guangzhou, China
- Fengchun Wu
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
  - GBH-SCUT Joint Research Centre for Neuroimaging, Guangzhou, China
- Minjian Gao
  - School of Computer Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Yong Xu
  - School of Computer Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Yue Zhang
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Yongcheng Yao
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Xin Du
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Chengwei Li
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Lei Wu
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
  - School of Medicine, South China University of Technology (SCUT), Guangzhou, China
  - Department of Radiology, Guangdong Academy of Medical Sciences, Guangdong General Hospital, Guangzhou, China
- Xiaomei Zhong
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
  - GBH-SCUT Joint Research Centre for Neuroimaging, Guangzhou, China
- Yanling Zhou
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
- Ni Fan
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
- Yingjun Zheng
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
- Dongsheng Xiong
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
- Hongjun Peng
  - Department of Clinical Psychology, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
- Javier Escudero
  - Institute for Digital Communications, School of Engineering, The University of Edinburgh, Edinburgh EH9 3JL, UK
- Biao Huang
  - School of Medicine, South China University of Technology (SCUT), Guangzhou, China
  - Department of Radiology, Guangdong Academy of Medical Sciences, Guangdong General Hospital, Guangzhou, China
- Xiaobo Li
  - Department of Biomedical Engineering, New Jersey Institute of Technology, NJ, US
  - Department of Electric and Computer Engineering, New Jersey Institute of Technology, NJ, US
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, NY, US
- Yuping Ning
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
  - GBH-SCUT Joint Research Centre for Neuroimaging, Guangzhou, China
- Kai Wu
  - Department of Psychiatry, Guangzhou Brain Hospital (GBH)/(Guangzhou Huiai Hospital, The Affiliated Brain Hospital of Guangzhou Medical University), Guangzhou, China
  - Department of Biomedical Engineering, School of Materials Science and Engineering, South China University of Technology (SCUT), Guangzhou, China
  - GBH-SCUT Joint Research Centre for Neuroimaging, Guangzhou, China
  - Department of Nuclear Medicine and Radiology, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
|
23
|
D'Andrea RM, Triassi A, Casas MI, Andreo CS, Lara MV. Identification of genes involved in the drought adaptation and recovery in Portulaca oleracea by differential display. Plant Physiology and Biochemistry: PPB 2015; 90:38-49. [PMID: 25767913 DOI: 10.1016/j.plaphy.2015.02.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/20/2015] [Accepted: 02/28/2015] [Indexed: 06/04/2023]
Abstract
Portulaca oleracea is one of the richest plant sources of ω-3 and ω-6 fatty acids and other compounds potentially valuable for nutrition. It is broadly established in arid, semiarid and well-watered fields, making it a promising candidate for research on abiotic stress resistance mechanisms. It is capable of withstanding severe drought and then recovering upon rehydration. Here, the adaptation to drought and the subsequent recovery were evaluated at the transcriptomic level by differential display validated by qRT-PCR. Of the 2279 transcript-derived fragments amplified, 202 showed differential expression. Ninety of them were successfully isolated and sequenced. Selected genes were tested against different abiotic stresses in P. oleracea, and the behavior of their orthologous genes in Arabidopsis thaliana was also explored to seek conserved response mechanisms. In drought-adapted and in recovered plants, changes in the expression of many protein metabolism-, lipid metabolism- and stress-related genes were observed. Many genes of unknown function were detected, which also respond to other abiotic stresses. Some of them are also involved in the seed desiccation/imbibition process and would thus be of great interest for further research. The potential use of candidate genes to engineer improved drought tolerance and recovery is discussed.
Affiliation(s)
- Rodrigo Matías D'Andrea
  - Centro de Estudios Fotosintéticos y Bioquímicos (CEFOBI), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario, 2000, Argentina
- Agustina Triassi
  - Centro de Estudios Fotosintéticos y Bioquímicos (CEFOBI), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario, 2000, Argentina
- María Isabel Casas
  - Centro de Estudios Fotosintéticos y Bioquímicos (CEFOBI), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario, 2000, Argentina
- Carlos Santiago Andreo
  - Centro de Estudios Fotosintéticos y Bioquímicos (CEFOBI), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario, 2000, Argentina
- María Valeria Lara
  - Centro de Estudios Fotosintéticos y Bioquímicos (CEFOBI), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario, 2000, Argentina
|
24
|
Chen Y, Zhou W, Wang H, Yuan Z. Prediction of O-glycosylation sites based on multi-scale composition of amino acids and feature selection. Med Biol Eng Comput 2015; 53:535-44. [PMID: 25752770 DOI: 10.1007/s11517-015-1268-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/22/2014] [Accepted: 03/02/2015] [Indexed: 12/21/2022]
Abstract
Protein glycosylation is one of the most important and complex post-translational modifications, providing greater proteomic diversity than any other post-translational modification. Fast and reliable computational methods to identify glycosylation sites are in great demand. Two key issues, feature encoding and feature selection, can critically affect the accuracy of a computational method. We present a new O-glycosylation site prediction method using only amino acid sequence information. The method includes the following components: (1) on the basis of multi-scale theory, features based on the multi-scale composition of amino acids are extracted from the training sequences with identified glycosylation sites; (2) a two-stage feature selection removes features that have adverse effects on the prediction, consisting of a preliminary first-stage filtering with Student's t test and a second-stage screening through iterative elimination using novel pairwise comparisons conducted in random subspaces with a support vector machine. The important features retained are used to build the prediction model. The method is evaluated with sequence-based tenfold cross-validation tests on balanced datasets. The results of our experiments show that our method significantly outperforms those reported in the literature in terms of sensitivity, specificity, accuracy, and Matthews correlation coefficient. The prediction accuracy for serine and threonine sites reached 95.7% and 92.7%, respectively. The Matthews correlation coefficient of our method for S and T sites is 0.914 and 0.873, respectively. The method evaluates each feature in interaction with the other features still included in the model, and has the advantage of high efficiency.
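For reference, the evaluation metrics named in the abstract above all follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not data from the paper.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts on a balanced test set of 100 sites per class
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells symmetrically, so it stays informative when the two classes are unbalanced.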
Affiliation(s)
- Yuan Chen
  - Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha, 410128, China
|
25
|
Meng J, Zhang J, Luan Y. Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:433-444. [PMID: 26357229 DOI: 10.1109/tcbb.2014.2361329] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 06/05/2023]
Abstract
Mining knowledge from gene expression data is an active research topic in bioinformatics. Gene selection and sample classification are significant research directions, owing to the large number of genes and the small number of samples in gene expression data. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers have introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can simultaneously present the information from multiple perspectives and does not weaken the information of individual genes for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three data sets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the interpretability is good.
|
26
|
Gene selection using rough set based on neighborhood for the analysis of plant stress response. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.09.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
|
27
|
Zhang X, Lu G, Long W, Zou X, Li F, Nishio T. Recent progress in drought and salt tolerance studies in Brassica crops. Breeding Science 2014; 64:60-73. [PMID: 24987291 PMCID: PMC4031111 DOI: 10.1270/jsbbs.64.60] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 08/27/2013] [Accepted: 03/19/2014] [Indexed: 05/19/2023]
Abstract
Water deficit imposed by either drought or salinity brings about severe growth retardation and yield loss in crops. Since Brassica crops are important contributors to total oilseed production, there is an urgent need to develop tolerant cultivars to ensure yields under such adverse conditions. There are various physiochemical mechanisms for dealing with drought and salinity in plants at different developmental stages. Accordingly, different indicators of tolerance to drought or salinity at the germination, seedling, flowering and mature stages have been developed and used for germplasm screening and selection in breeding practices. Classical genetic and modern genomic approaches coupled with precise phenotyping have boosted the unravelling of genes and metabolic pathways conferring drought or salt tolerance in crops. QTL mapping of drought and salt tolerance has provided several dozen target QTLs in Brassica and the closely related Arabidopsis. Many drought- or salt-tolerance genes have also been isolated, some of which have been confirmed to have great potential for genetic improvement of plant tolerance. It has been suggested that molecular breeding approaches, such as marker-assisted selection and gene transformation, that will enhance oil product security under a changing climate be integrated in the development of drought- and salt-tolerant Brassica crops.
Affiliation(s)
- Xuekun Zhang
- Key Laboratory of Oil Crops Biology and Genetic Improvement, Ministry of Agriculture, Oil Crops Research Institute, CAAS, Wuhan 430062, China
| | - Guangyuan Lu
- Key Laboratory of Oil Crops Biology and Genetic Improvement, Ministry of Agriculture, Oil Crops Research Institute, CAAS, Wuhan 430062, China
| | - Weihua Long
- Key Laboratory of Oil Crops Biology and Genetic Improvement, Ministry of Agriculture, Oil Crops Research Institute, CAAS, Wuhan 430062, China
- Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Xiling Zou
- Key Laboratory of Oil Crops Biology and Genetic Improvement, Ministry of Agriculture, Oil Crops Research Institute, CAAS, Wuhan 430062, China
| | - Feng Li
- Key Laboratory of Oil Crops Biology and Genetic Improvement, Ministry of Agriculture, Oil Crops Research Institute, CAAS, Wuhan 430062, China
| | - Takeshi Nishio
- Graduate School of Agricultural Science, Tohoku University, Sendai, Miyagi 981-8555, Japan
| |
|
28
|
Shaik R, Ramakrishna W. Machine learning approaches distinguish multiple stress conditions using stress-responsive genes and identify candidate genes for broad resistance in rice. Plant Physiology 2014; 164:481-95. [PMID: 24235132 PMCID: PMC3875824 DOI: 10.1104/pp.113.225862] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Received: 07/31/2013] [Accepted: 11/13/2013] [Indexed: 05/18/2023]
Abstract
Abiotic and biotic stress responses are traditionally thought to be regulated by discrete signaling mechanisms. Recent experimental evidence revealed a more complex picture where these mechanisms are highly entangled and can have synergistic and antagonistic effects on each other. In this study, we identified shared stress-responsive genes between abiotic and biotic stresses in rice (Oryza sativa) by performing meta-analyses of microarray studies. About 70% of the 1,377 common differentially expressed genes showed conserved expression status, and the majority of the rest were down-regulated in abiotic stresses and up-regulated in biotic stresses. Using dimension reduction techniques, principal component analysis, and partial least squares discriminant analysis, we were able to segregate abiotic and biotic stresses into separate entities. The supervised machine learning model, recursive-support vector machine, could classify abiotic and biotic stresses with 100% accuracy using a subset of differentially expressed genes. Furthermore, using a random forests decision tree model, eight out of 10 stress conditions were classified with high accuracy. Comparison of genes contributing most to the accurate classification by partial least squares discriminant analysis, recursive-support vector machine, and random forests revealed 196 common genes with a dynamic range of expression levels in multiple stresses. Functional enrichment and coexpression network analysis revealed the different roles of transcription factors and genes responding to phytohormones or modulating hormone levels in the regulation of stress responses. We envisage the top-ranked genes identified in this study, which highly discriminate abiotic and biotic stresses, as key components to further our understanding of the inherently complex nature of multiple stress responses in plants.
|
29
|
Bhardwaj J, Chauhan R, Swarnkar MK, Chahota RK, Singh AK, Shankar R, Yadav SK. Comprehensive transcriptomic study on horse gram (Macrotyloma uniflorum): De novo assembly, functional characterization and comparative analysis in relation to drought stress. BMC Genomics 2013; 14:647. [PMID: 24059455 PMCID: PMC3853109 DOI: 10.1186/1471-2164-14-647] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Received: 05/14/2013] [Accepted: 09/13/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Drought tolerance is an attribute maintained in plants by cross-talk between multiple and cascading metabolic pathways. Without a sequenced genome available for horse gram (Macrotyloma uniflorum), it is difficult to comprehend such complex networks and intercalated genes associated with its drought tolerance. Therefore, de novo transcriptome discovery and associated analyses were carried out for this highly drought-tolerant yet underexploited legume to decipher its genetic makeup. RESULTS: Eight samples comprising shoot and root tissues of two horse gram genotypes (drought-sensitive M-191 and drought-tolerant M-249) were used for comparison under control and polyethylene glycol-induced drought stress conditions. Using Illumina sequencing technology, a total of 229,297,896 paired-end read pairs were generated and utilized for de novo assembly of horse gram. Significant BLAST hits were obtained for 26,045 transcripts, while 3,558 transcripts had no hits but contained important conserved domains. A total of 21,887 unigenes were identified. SSR-containing sequences covered 16.25% of the transcriptome, with predominant tri- and mono-nucleotides (43%). The total GC content of the transcriptome was found to be 43.44%. Under Gene Ontology, response to stimulus, DNA binding and catalytic activity were highly expressed during drought stress conditions. Serine/threonine protein kinase was found to dominate in Enzyme Classification, while pathways belonging to ribosome metabolism, followed by plant-pathogen interaction and plant hormone signal transduction, were predominant in Kyoto Encyclopedia of Genes and Genomes analysis. An independent search on plant metabolic network pathways suggested valine degradation, gluconeogenesis and purine nucleotide degradation to be highly influenced under drought stress in horse gram. Transcription factors belonging to NAC, MYB-related, and WRKY families were found highly represented under drought stress. qRT-PCR validated the expression profile for 9 out of 10 genes analyzed in response to drought stress. CONCLUSIONS: De novo transcriptome discovery and analysis has generated enormous information on horse gram genomics. The genes and pathways identified suggest efficient regulation leading to active adaptation as a basal defense response against drought stress by horse gram. The knowledge generated can be further utilized for exploring other underexploited plants for stress-responsive genes and improving plant tolerance.
Affiliation(s)
- Jyoti Bhardwaj
- Plant Metabolic Engineering Laboratory, Council of Scientific and Industrial Research-Institute of Himalayan Bioresource Technology, Palampur 176061, HP, India.
|
30
|
Wang J, Chen L, Wang Y, Zhang J, Liang Y, Xu D. A computational systems biology study for understanding salt tolerance mechanism in rice. PLoS One 2013; 8:e64929. [PMID: 23762267 PMCID: PMC3676415 DOI: 10.1371/journal.pone.0064929] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 01/09/2013] [Accepted: 04/19/2013] [Indexed: 01/22/2023] Open
Abstract
Salinity is one of the most common abiotic stresses in agricultural production. Salt tolerance of rice (Oryza sativa) is an important trait controlled by various genes. The mechanism of rice salt tolerance, which is currently poorly understood, is of great interest to molecular breeding for improving grain yield. In this study, a gene regulatory network of rice salt tolerance is constructed using a systems biology approach with a number of novel computational methods. We developed an improved volcano plot method in conjunction with a new machine-learning method for gene selection based on gene expression data and applied the method to choose genes related to salt tolerance in rice. The results were then assessed by quantitative trait loci (QTL), co-expression and regulatory binding motif analysis. The selected genes were constructed into a number of network modules based on predicted protein interactions including modules of phosphorylation activity, ubiquity activity, and several proteinase activities such as peroxidase, aspartic proteinase, glucosyltransferase, and flavonol synthase. All of these discovered modules are related to the salt tolerance mechanism of signal transduction, ion pump, abscisic acid mediation, reactive oxygen species scavenging and ion sequestration. We also predicted the three-dimensional structures of some crucial proteins related to the salt tolerance QTL for understanding the roles of these proteins in the network. Our computational study sheds some new light on the mechanism of salt tolerance and provides a systems biology pipeline for studying plant traits in general.
Affiliation(s)
- Juexin Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Digital Biology Laboratory, Computer Science Department, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
| | - Liang Chen
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Yan Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Jingfen Zhang
- Digital Biology Laboratory, Computer Science Department, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
| | - Yanchun Liang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Dong Xu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Digital Biology Laboratory, Computer Science Department, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
| |
|
31
|
Hu C, Wang J, Zheng C, Xu S, Zhang H, Liang Y, Bi L, Fan Z, Han B, Xu W. Raman spectra exploring breast tissues: Comparison of principal component analysis and support vector machine-recursive feature elimination. Med Phys 2013; 40:063501. [DOI: 10.1118/1.4804054] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 11/07/2022] Open
|
32
|
Landschoot S, Waegeman W, Audenaert K, Vandepitte J, Haesaert G, De Baets B. Toward a Reliable Evaluation of Forecasting Systems for Plant Diseases: A Case Study Using Fusarium Head Blight of Wheat. Plant Disease 2012; 96:889-896. [PMID: 30727362 DOI: 10.1094/pdis-08-11-0665] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Despite great efforts to forecast plant diseases, many of the existing systems often fall short in providing farmers with accurate predictions. One of the main problems arises from the existence of year and location effects, so that more advanced procedures are required for evaluating existing systems in an unbiased manner. This paper illustrates the case of Fusarium head blight of winter wheat in Belgium. We present a new cross-validation strategy that enables the evaluation of the predictive performance of a forecasting system for years and locations that are different from the years and locations on which the forecast was developed. Four different cross-validation strategies and five regression techniques are used. The results demonstrated that traditional evaluation strategies are too optimistic in their predictions, whereas the cross-year cross-location validation strategy yielded more realistic outcomes. Using this procedure, the mean squared error increased and the coefficient of determination decreased in predicting disease severity and deoxynivalenol content, suggesting that existing evaluation strategies may generate a substantial optimistic bias. The strongest discrepancies between the cross-validation strategies were observed for multiple linear regression models.
Affiliation(s)
- S Landschoot
- KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure links 653, BE-9000 Gent, Belgium, and Faculty of Applied Bioscience Engineering, University College Ghent, Valentin Vaerwyckweg 1, BE-9000 Gent, Belgium
| | - W Waegeman
- KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
| | - K Audenaert
- Faculty of Applied Bioscience Engineering, University College Ghent, and Department of Crop Protection, Laboratory of Phytopathology, Ghent University
| | - J Vandepitte
- KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
| | - G Haesaert
- Faculty of Applied Bioscience Engineering, University College Ghent, and Department of Crop Protection, Laboratory of Phytopathology, Ghent University
| | - B De Baets
- KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
| |
|