Reference Citation Analysis

For: Weston J, Leslie C, Ie E, Zhou D, Elisseeff A, Noble WS. Semi-supervised protein classification using cluster kernels. Bioinformatics 2005;21:3241-7. [PMID: 15905279 DOI: 10.1093/bioinformatics/bti497] [Citation(s) in RCA: 127] [Impact Index Per Article: 6.4] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Ghosh D, Chakraborty S, Kodamana H, Chakraborty S. Application of machine learning in understanding plant virus pathogenesis: trends and perspectives on emergence, diagnosis, host-virus interplay and management. Virol J 2022;19:42. [PMID: 35264189 PMCID: PMC8905280 DOI: 10.1186/s12985-022-01767-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/13/2021] [Accepted: 02/27/2022] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

The inclusion of high-throughput technologies in the field of biology has generated massive amounts of data in recent years. Transforming these huge volumes of data into knowledge is now the primary challenge in computational biology. Traditional methods of data analysis have failed to carry out the task. Hence, researchers are turning to machine learning-based approaches for the analysis of high-dimensional big data. In machine learning, once a model is trained with a training dataset, it can be applied to an independent testing dataset. Currently, deep learning algorithms further promote the application of machine learning in several fields of biology, including plant virology.

MAIN BODY

Plant viruses have emerged as one of the principal global threats to food security due to their devastating impact on crops and vegetables. The emergence of new viral strains and species helps viruses to evade the concurrent preventive methods. According to a survey conducted in 2014, plant viruses are anticipated to cause a global yield loss of more than thirty billion USD per year. In order to design effective, durable and broad-spectrum management protocols, it is very important to understand the mechanistic details of viral pathogenesis. The application of machine learning enables precise diagnosis of plant viral diseases at an early stage. Furthermore, the development of several machine learning-guided bioinformatics platforms has primed plant virologists to understand the host-virus interplay better. In addition, machine learning has tremendous potential in deciphering the pattern of plant virus evolution and emergence as well as in developing viable control options.

CONCLUSIONS

Considering the significant progress in the application of machine learning to plant virology, this review provides an introductory note on machine learning and comprehensively discusses the trends and prospects of machine learning in the diagnosis of viral diseases, the understanding of host-virus interplay, and the emergence of plant viruses.

Yakimovich A, Beaugnon A, Huang Y, Ozkirimli E. Labels in a haystack: Approaches beyond supervised learning in biomedical applications. PATTERNS (NEW YORK, N.Y.) 2021;2:100383. [PMID: 34950904 PMCID: PMC8672145 DOI: 10.1016/j.patter.2021.100383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Di Grazia L, Aminpour M, Vezzetti E, Rezania V, Marcolin F, Tuszynski JA. A new method for protein characterization and classification using geometrical features for 3D face analysis: An example of tubulin structures. Proteins 2020;89:e25993. [PMID: 32779779 DOI: 10.1002/prot.25993] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/13/2020] [Revised: 07/22/2020] [Accepted: 07/26/2020] [Indexed: 11/12/2022]

A probabilistic approach towards an unbiased semi-supervised cluster tree. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]

Breitman MF, Domingos FM, Bagley JC, Wiederhecker HC, Ferrari TB, Cavalcante VH, Pereira AC, Abreu TL, De-Lima AKS, Morais CJ, Prette ACD, Silva IP, Mello RD, Carvalho G, Lima TM, Silva AA, Matias CA, Carvalho GC, Pantoja JA, Monteiro Gomes I, Paschoaletto IP, Rodrigues GF, Talarico ÂNV, Barreto-Lima AF, Colli GR. A New Species of Enyalius (Squamata, Leiosauridae) Endemic to the Brazilian Cerrado. HERPETOLOGICA 2018. [DOI: 10.1655/0018-0831.355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]

Affiliation(s)

M. Florencia Breitman Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Fabricius M.C.B. Domingos Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Justin C. Bagley Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Helga C. Wiederhecker Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Tayná B. Ferrari Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Vitor H.G.L. Cavalcante Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
André C. Pereira Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Tarcísio L.S. Abreu Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Anderson Kennedy Soares De-Lima Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Carlos J.S. Morais Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Ana C.H. Del Prette Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Izabella P.M.C. Silva Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Rodrigo De Mello Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Gabriela Carvalho Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Thiago M. De Lima Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Anandha A. Silva Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Caroline Azevedo Matias Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Gabriel C. Carvalho Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
João A.L. Pantoja Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Isabella Monteiro Gomes Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Ingrid Pinheiro Paschoaletto Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Gabriela Ferreira Rodrigues Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Ângela V.C. Talarico Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
André F. Barreto-Lima Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Guarino R. Colli Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil

Breitman MF, Domingos FM, Bagley JC, Wiederhecker HC, Ferrari TB, Cavalcante VH, Pereira AC, Abreu TL, De-Lima AKS, Morais CJ, del Prette AC, Silva IP, de Mello R, Carvalho G, de Lima TM, Silva AA, Matias CA, Carvalho GC, Pantoja JA, Gomes IM, Paschoaletto IP, Rodrigues GF, Talarico ÂV, Barreto-Lima AF, Colli GR. A New Species of Enyalius (Squamata, Leiosauridae) Endemic to the Brazilian Cerrado. HERPETOLOGICA 2018. [DOI: 10.1655/herpetologica-d-17-00041.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]

Affiliation(s)

M. Florencia Breitman Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Fabricius M.C.B. Domingos Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil Instituto de Ciências Biológicas e da Saúde, Universidade Federal de Mato Grosso, Pontal do Araguaia, MT 78698-000, Brazil
Justin C. Bagley Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil Departamento de Zoologia e Botânica, Universidade Estadual Paulista, São José do Rio Preto, SP 15054-000, Brazil
Helga C. Wiederhecker Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Tayná B. Ferrari Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Vitor H.G.L. Cavalcante Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil Instituto Federal do Piauí, Teresina, PI 64000-040, Brazil
André C. Pereira Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Tarcísio L.S. Abreu Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Anderson Kennedy Soares De-Lima Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Carlos J.S. Morais Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Ana C.H. del Prette Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Izabella P.M.C. Silva Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Rodrigo de Mello Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Gabriela Carvalho Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Thiago M. de Lima Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
Anandha A. Silva Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Caroline Azevedo Matias Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Gabriel C. Carvalho Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
João A.L. Pantoja Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Isabella Monteiro Gomes Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Ingrid Pinheiro Paschoaletto Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Gabriela Ferreira Rodrigues Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Ângela V.C. Talarico Campus I, Universidade Católica de Brasília, Águas Claras, DF 71966-700, Brazil
André F. Barreto-Lima Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil
Guarino R. Colli Departamento de Zoologia, Universidade de Brasília, Brasília, DF 70910-900, Brazil

Yu J, Kim SB. Consensus rate-based label propagation for semi-supervised classification. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.06.074] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]

Peikari M, Salama S, Nofech-Mozes S, Martel AL. A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification. Sci Rep 2018;8:7193. [PMID: 29739993 PMCID: PMC5940864 DOI: 10.1038/s41598-018-24876-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/15/2017] [Accepted: 04/11/2018] [Indexed: 01/25/2023] Open

Liu F, Ma R, Tay CYA, Octavia S, Lan R, Chung HKL, Riordan SM, Grimm MC, Leong RW, Tanaka MM, Connor S, Zhang L. Genomic analysis of oral Campylobacter concisus strains identified a potential bacterial molecular marker associated with active Crohn's disease. Emerg Microbes Infect 2018;7:64. [PMID: 29636463 PMCID: PMC5893538 DOI: 10.1038/s41426-018-0065-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/04/2018] [Revised: 03/14/2018] [Accepted: 03/20/2018] [Indexed: 02/08/2023]

Leveraging Big Data Tools and Technologies: Addressing the Challenges of the Water Quality Sector. SUSTAINABILITY 2017. [DOI: 10.3390/su9122160] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]

Deepthi P, Thampi SM. Predicting cancer subtypes from microarray data using semi-supervised fuzzy C-means algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-169222] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]

K. K, P. G. L, Rangarajan L, K. AK. Effective Feature Selection for Classification of Promoter Sequences. PLoS One 2016;11:e0167165. [PMID: 27978541 PMCID: PMC5158321 DOI: 10.1371/journal.pone.0167165] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/13/2016] [Accepted: 11/09/2016] [Indexed: 11/18/2022] Open

Abstract

Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way for the functional analysis of promoters and the discovery of functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

Hanif M, Hafeez A, Suleman Y, Mustafa Rafique M, Butt AR, Iqbal SM. An accelerated framework for the classification of biological targets from solid-state micropore data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016;134:53-67. [PMID: 27480732 DOI: 10.1016/j.cmpb.2016.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2015] [Revised: 05/05/2016] [Accepted: 06/13/2016] [Indexed: 06/06/2023]

Koyano H, Hayashida M, Akutsu T. Maximum margin classifier working in a set of strings. Proc Math Phys Eng Sci 2016;472:20150551. [PMID: 27118908 PMCID: PMC4841474 DOI: 10.1098/rspa.2015.0551] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/10/2015] [Accepted: 02/02/2016] [Indexed: 11/12/2022] Open

Stanescu A, Caragea D. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets. BMC SYSTEMS BIOLOGY 2015;9 Suppl 5:S1. [PMID: 26356316 PMCID: PMC4565116 DOI: 10.1186/1752-0509-9-s5-s1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers.

RESULTS

Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines.

CONCLUSIONS

In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.

Dai HL. Imbalanced Protein Data Classification Using Ensemble FTM-SVM. IEEE Trans Nanobioscience 2015;14:350-359. [DOI: 10.1109/tnb.2015.2431292] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]

Spectral clustering with the probabilistic cluster kernel. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.08.068] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]

Brayet J, Zehraoui F, Jeanson-Leh L, Israeli D, Tahi F. Towards a piRNA prediction using multiple kernel fusion and support vector machine. Bioinformatics 2015;30:i364-70. [PMID: 25161221 PMCID: PMC4147894 DOI: 10.1093/bioinformatics/btu441] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022] Open

Yu G, Rangwala H, Domeniconi C, Zhang G, Zhang Z. Predicting Protein Function Using Multiple Kernels. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:219-233. [PMID: 26357091 DOI: 10.1109/tcbb.2014.2351821] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]

Chakraborty D, Maulik U. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE-JTEHM 2014;2:4300211. [PMID: 27170887 PMCID: PMC4848046 DOI: 10.1109/jtehm.2014.2375820] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 06/08/2014] [Revised: 09/20/2014] [Accepted: 11/22/2014] [Indexed: 11/07/2022]

Abstract

Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expressions. Microarray-based gene expression profiling can be used to identify genes whose expressions are changed in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers, which have dysregulation in normal and malignant tissues. However, small sample size remains a bottleneck in designing successful classification methods. On the other hand, an adequate number of microarray data sets that lack clinical knowledge can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S(3)VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S(3)VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers.

Charuvaka A, Rangwala H. Classifying Protein Sequences Using Regularized Multi-Task Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014;11:1087-1098. [PMID: 26357046 DOI: 10.1109/tcbb.2014.2338303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]

Fuzzy Preference Based Feature Selection and Semisupervised SVM for Cancer Classification. IEEE Trans Nanobioscience 2014;13:152-60. [DOI: 10.1109/tnb.2014.2312132] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 11/06/2022]

Kuksa PP. Biological sequence classification with multivariate string kernels. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:1201-1210. [PMID: 24384708 DOI: 10.1109/tcbb.2013.15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023]

Yu G, Rangwala H, Domeniconi C, Zhang G, Yu Z. Protein function prediction using multilabel ensemble classification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:1045-57. [PMID: 24334396 DOI: 10.1109/tcbb.2013.111] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 06/03/2023]

Hamp T, Goldberg T, Rost B. Accelerating the Original Profile Kernel. PLoS One 2013;8:e68459. [PMID: 23825697 PMCID: PMC3688983 DOI: 10.1371/journal.pone.0068459] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/19/2013] [Accepted: 05/31/2013] [Indexed: 11/19/2022] Open

Xu X, Lu L, He P, Chen L. Protein localization prediction using random walks on graphs. BMC Bioinformatics 2013;14 Suppl 8:S4. [PMID: 23815126 PMCID: PMC3654884 DOI: 10.1186/1471-2105-14-s8-s4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022] Open

Maulik U, Mukhopadhyay A, Chakraborty D. Gene-Expression-Based Cancer Subtypes Prediction Through Feature Selection and Transductive SVM. IEEE Trans Biomed Eng 2013;60:1111-7. [DOI: 10.1109/tbme.2012.2225622] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]

Maulik U, Sarkar A. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels. PLoS One 2013;8:e46468. [PMID: 23457439 PMCID: PMC3574063 DOI: 10.1371/journal.pone.0046468] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/11/2011] [Accepted: 09/04/2012] [Indexed: 11/18/2022] Open

Wang S, Huang Q, Jiang S, Tian Q, Qin L. Nearest-neighbor method using multiple neighborhood similarities for social media data mining. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2011.06.039] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]

Beyer O, Cimiano P. Online semi-supervised growing neural gas. Int J Neural Syst 2012;22:1250023. [DOI: 10.1142/s0129065712500232] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]

Recursive weighted kernel regression for semi-supervised soft-sensing modeling of fed-batch processes. J Taiwan Inst Chem Eng 2012. [DOI: 10.1016/j.jtice.2011.06.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]

Mutual or Unrequited Love: Identifying Stable Clusters in Social Networks with Uni- and Bi-directional Links. LECTURE NOTES IN COMPUTER SCIENCE 2012. [DOI: 10.1007/978-3-642-30541-2_9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]

Bespalov D, Qi Y, Bai B, Shokoufandeh A. Sentiment Classification with Supervised Sequence Embedding. 2012. [DOI: 10.1007/978-3-642-33460-3_16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/16/2023]

Nguyen TP, Ho TB. Detecting disease genes based on semi-supervised learning and protein-protein interaction networks. Artif Intell Med 2011;54:63-71. [PMID: 22000346 DOI: 10.1016/j.artmed.2011.09.003] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 09/23/2009] [Revised: 05/24/2011] [Accepted: 09/01/2011] [Indexed: 11/19/2022]

Abstract

OBJECTIVE

Predicting or prioritizing the human genes that cause disease, or "disease genes", is one of the emerging tasks in biomedical informatics. Research on network-based approaches to this problem is carried out under the key assumption that "the network-neighbour of a disease gene is likely to cause the same or a similar disease", and mostly employs data regarding well-known disease genes, using supervised learning methods. This work aims to find an effective method to exploit the disease gene neighbourhood and the integration of several useful omics data sources, which can potentially enhance disease gene predictions.

METHODS

We have presented a novel method to effectively predict disease genes by exploiting, in the semi-supervised learning (SSL) scheme, data regarding both disease genes and disease gene neighbours via protein-protein interaction network. Multiple proteomic and genomic data were integrated from six biological databases, including Universal Protein Resource, Interologous Interaction Database, Reactome, Gene Ontology, Pfam, and InterDom, and a gene expression dataset.

RESULTS

By employing a 10 times stratified 10-fold cross validation, the SSL method performs better than the k-nearest neighbour method and the support vector machines method in terms of sensitivity of 85%, specificity of 79%, precision of 81%, accuracy of 82%, and a balanced F-function of 83%. The other comparative experimental evaluations demonstrate advantages of the proposed method given a small amount of labeled data with accuracy of 78%. We have applied the proposed method to detect 572 putative disease genes, which are biologically validated by some indirect ways.

CONCLUSION

Semi-supervised learning improved the ability to study disease genes, especially for a specific disease, where the known disease genes (as labeled data) are very often limited. In addition to the computational improvement, the analysis of predicted disease proteins indicates that the findings are beneficial in deciphering the pathogenic mechanisms.

Shi M, Zhang B. Semi-supervised learning improves gene expression-based prediction of cancer recurrence. Bioinformatics 2011;27:3017-23. [PMID: 21893520 DOI: 10.1093/bioinformatics/btr502] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 12/15/2022]

Conotoxin protein classification using free scores of words and support vector machines. BMC Bioinformatics 2011;12:217. [PMID: 21619696 PMCID: PMC3133552 DOI: 10.1186/1471-2105-12-217] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 12/27/2010] [Accepted: 05/29/2011] [Indexed: 11/23/2022] Open

Gui J, Wang SL, Lei YK. Multi-step dimensionality reduction and semi-supervised graph-based tumor classification using gene expression data. Artif Intell Med 2011;50:181-91. [PMID: 20599367 DOI: 10.1016/j.artmed.2010.05.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 05/12/2009] [Revised: 04/28/2010] [Accepted: 05/18/2010] [Indexed: 11/29/2022]

A sparse large margin semi-supervised learning method. J Korean Stat Soc 2010. [DOI: 10.1016/j.jkss.2009.10.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]

Caragea C, Caragea D, Silvescu A, Honavar V. Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models. BMC Bioinformatics 2010;11 Suppl 8:S6. [PMID: 21034431 PMCID: PMC2966293 DOI: 10.1186/1471-2105-11-s8-s6] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open

Toussaint NC, Widmer C, Kohlbacher O, Rätsch G. Exploiting physico-chemical properties in string kernels. BMC Bioinformatics 2010;11 Suppl 8:S7. [PMID: 21034432 PMCID: PMC2966294 DOI: 10.1186/1471-2105-11-s8-s7] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open

Santos MA, Turinsky AL, Ong S, Tsai J, Berger MF, Badis G, Talukder S, Gehrke AR, Bulyk ML, Hughes TR, Wodak SJ. Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences. Nucleic Acids Res 2010;38:7927-42. [PMID: 20705649 PMCID: PMC3001082 DOI: 10.1093/nar/gkq714] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022] Open

Classifying proteins using gapped Markov feature pairs. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2009.12.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]

Shen YQ, Lang BF, Burger G. Diversity and dispersal of a ubiquitous protein family: acyl-CoA dehydrogenases. Nucleic Acids Res 2009;37:5619-31. [PMID: 19625492 PMCID: PMC2761260 DOI: 10.1093/nar/gkp566] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022] Open

Min R, Bonner A, Li J, Zhang Z. Learned random-walk kernels and empirical-map kernels for protein sequence classification. J Comput Biol 2009;16:457-74. [PMID: 19254184 DOI: 10.1089/cmb.2008.0031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open

Kuksa P, Huang PH, Pavlovic V. Efficient use of unlabeled data for protein sequence classification: a comparative study. BMC Bioinformatics 2009;10 Suppl 4:S2. [PMID: 19426450 PMCID: PMC2681072 DOI: 10.1186/1471-2105-10-s4-s2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open

Semi-supervised Bayesian ARTMAP. Appl Intell 2009. [DOI: 10.1007/s10489-009-0167-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

Jung I, Kim D. SIMPRO: simple protein homology detection method by using indirect signals. Bioinformatics 2009;25:729-35. [DOI: 10.1093/bioinformatics/btp048] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open

Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection. BMC Bioinformatics 2008;9:298. [PMID: 18590572 PMCID: PMC2459191 DOI: 10.1186/1471-2105-9-298] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 01/08/2008] [Accepted: 07/01/2008] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments improves performance of fold recognition and remote homolog detection remarkably. However, it is not clear which parts of sequences are essential for the performance improvement.

RESULTS

The performance of fold recognition and remote homolog detection using NMF features is compared to that of the unmodified profile-profile alignment (PPA) features by estimating Receiver Operating Characteristic (ROC) scores. The overall performance is noticeably improved. For fold recognition at the fold level, SVM with NMF features recognizes 30% of homolog proteins at > 0.99 ROC scores, while the original PPA features, HHsearch, and PSI-BLAST recognize almost none. For detecting remote homologs that are related at the superfamily level, NMF features also achieve higher performance than the original PPA features. At > 0.90 ROC50 scores, 25% of proteins with NMF features correctly detect remotely related proteins, whereas with the original PPA features only 1% of proteins detect remote homologs. In addition, we investigate the effect of the number of positive training examples and the number of basis vectors on performance improvement. We also analyze the ability of NMF to extract essential features by comparing NMF basis vectors with functionally important sites and structurally conserved regions of proteins. The results show that NMF basis vectors have significant overlap with functional sites from PROSITE and with structurally conserved regions from the multiple structural alignments generated by MUSTANG. The correlation between NMF basis vectors and biologically essential parts of proteins supports our conjecture that NMF basis vectors can explicitly represent important sites of proteins.
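The abstract does not define the ROC50 score. A minimal sketch, assuming the commonly used definition (the area under the ROC curve up to the first 50 false positives, normalized to lie between 0 and 1), is given below. The function name `roc_n`, the handling of lists with fewer than `n` negatives, and the toy scores are assumptions made for illustration; tied scores are not treated specially.

```python
def roc_n(scores, labels, n=50):
    """ROC_n score: normalized ROC area up to the first n false positives.

    scores: higher means more likely positive; labels: 1 = positive, 0 = negative.
    """
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y == 1)
    total_neg = len(ranked) - total_pos
    # Assumption: if fewer than n negatives exist, use all of them.
    limit = min(n, total_neg)
    if total_pos == 0 or limit == 0:
        raise ValueError("need at least one positive and one negative example")

    tp_seen = 0
    fp_seen = 0
    area = 0
    for _, y in ranked:
        if y == 1:
            tp_seen += 1
        else:
            fp_seen += 1
            # Each false positive contributes the positives ranked above it.
            area += tp_seen
            if fp_seen == limit:
                break
    return area / (limit * total_pos)


# Toy ranking: positive, negative, positive, negative.
toy_score = roc_n([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0], n=2)
print(toy_score)  # 0.75
```

With `n` at least as large as the number of negatives, this quantity coincides with the ordinary area under the ROC curve when there are no tied scores.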

CONCLUSION

The present work demonstrates that applying NMF to profile-profile alignments can reveal essential features of proteins and that these features significantly improve the performance of fold recognition and remote homolog detection.

Shah AR, Oehmen CS, Webb-Robertson BJ. SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection. Bioinformatics 2008;24:783-90. [DOI: 10.1093/bioinformatics/btn028] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open

Mitra J, Mundra P, Kulkarni BD, Jayaraman VK. Using Recurrence Quantification Analysis Descriptors for Protein Sequence Classification with Support Vector Machines. J Biomol Struct Dyn 2007;25:289-98. [DOI: 10.1080/07391102.2007.10507177] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]