Reference Citation Analysis

For: Di Camillo B, Sanavia T, Martini M, Jurman G, Sambo F, Barla A, Squillario M, Furlanello C, Toffolo G, Cobelli C. Effect of size and heterogeneity of samples on biomarker discovery: synthetic and real data assessment. PLoS One 2012;7:e32200. [PMID: 22403633 PMCID: PMC3293892 DOI: 10.1371/journal.pone.0032200] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 06/30/2011] [Accepted: 01/24/2012] [Indexed: 01/04/2023]

Cited by Other Article(s)

Ghislat G, Hernandez-Hernandez S, Piyawajanusorn C, Ballester PJ. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin Drug Discov 2024;19:1297-1307. [PMID: 39316009 DOI: 10.1080/17460441.2024.2403639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/25/2024]

Gagliardi I, Campolo F, Borges de Souza P, Rossi L, Albertelli M, Grillo F, Caputi L, Mazza M, Faggiano A, Zatelli MC. Comparative Targeted Genome Profiling between Solid and Liquid Biopsies in Gastroenteropancreatic Neuroendocrine Neoplasms: A Proof-of-Concept Pilot Study. Neuroendocrinology 2024:1-12. [PMID: 39447548 DOI: 10.1159/000541346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 06/19/2024] [Indexed: 10/26/2024]

Lee Y, Cappellato M, Di Camillo B. Machine learning-based feature selection to search stable microbial biomarkers: application to inflammatory bowel disease. Gigascience 2023;12:giad083. [PMID: 37882604 PMCID: PMC10600917 DOI: 10.1093/gigascience/giad083] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/21/2023] [Revised: 08/23/2023] [Accepted: 09/17/2023] [Indexed: 10/27/2023]

Abstract

BACKGROUND

Biomarker discovery exploiting feature importance of machine learning has risen recently in the microbiome landscape with its high predictive performance in several disease states. To have a concrete selection among a high number of features, recursive feature elimination (RFE) has been widely used in the bioinformatics field. However, machine learning-based RFE has factors that decrease the stability of feature selection. In this article, we suggested methods to improve stability while sustaining performance.

RESULTS

We exploited the abundance matrices of the gut microbiome (283 taxa at species level and 220 at genus level) to classify between patients with inflammatory bowel disease (IBD) and healthy controls (1,569 samples). We found that applying an already published data transformation before RFE improves feature stability significantly. Moreover, we performed an in-depth evaluation of different variants of the data transformation and identified those that demonstrate better improvement in stability while not sacrificing classification performance. To ensure a robust comparison, we evaluated stability using various similarity metrics, distances, the common number of features, and the ability to filter out noise features. We were able to confirm that the mapping by the Bray-Curtis similarity matrix before RFE consistently improves the stability while maintaining good performance. The multilayer perceptron algorithm exhibited the highest performance among 8 different machine learning algorithms when a large number of features (a few hundred) were considered, based on the best performance across 100 bootstrapped internal test sets. Conversely, when utilizing only a limited number of biomarkers as a trade-off between optimal performance and method generalizability, the random forest algorithm demonstrated the best performance. Using the optimal pipeline we developed, we identified 14 biomarkers for IBD at the species level and analyzed their roles using Shapley additive explanations.

CONCLUSION

Taken together, our work not only showed how to improve biomarker discovery in the metataxonomic field without sacrificing classification performance but also provided useful insights for future comparative studies.
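
As an illustration of the "mapping by the Bray-Curtis similarity matrix" mentioned in the abstract above, the following is a minimal sketch, not the authors' pipeline. It assumes each sample is a vector of non-negative taxon abundances and that the mapping replaces each sample by its row of pairwise similarities; the function names, the toy data, and the handling of all-zero samples are hypothetical, and the exact transformation used in the article may differ.

```python
def bray_curtis_similarity(u, v):
    """Return 1 - sum(|u_i - v_i|) / sum(u_i + v_i) for two abundance vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        # Assumption: two all-zero samples are treated as identical.
        return 1.0
    return 1.0 - sum(abs(a - b) for a, b in zip(u, v)) / total


def similarity_matrix(samples):
    """Map each sample to its vector of similarities to every sample."""
    return [[bray_curtis_similarity(a, b) for b in samples] for a in samples]


# Toy abundance table: 3 samples x 3 taxa (made-up numbers).
toy = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 12, 1],
]
mapped = similarity_matrix(toy)
for row in mapped:
    print([round(x, 3) for x in row])
```

In this sketch the rows of `mapped` would serve as the transformed feature vectors passed to a feature-selection step such as RFE.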

White BS, Khan SA, Mason MJ, Ammad-Ud-Din M, Potdar S, Malani D, Kuusanmäki H, Druker BJ, Heckman C, Kallioniemi O, Kurtz SE, Porkka K, Tognon CE, Tyner JW, Aittokallio T, Wennerberg K, Guinney J. Bayesian multi-source regression and monocyte-associated gene expression predict BCL-2 inhibitor resistance in acute myeloid leukemia. NPJ Precis Oncol 2021;5:71. [PMID: 34302041 PMCID: PMC8302655 DOI: 10.1038/s41698-021-00209-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/08/2020] [Accepted: 06/22/2021] [Indexed: 11/09/2022]

Affiliation(s)

Brian S White: Computational Oncology, Sage Bionetworks, Seattle, WA, USA; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
Suleiman A Khan: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Mike J Mason: Computational Oncology, Sage Bionetworks, Seattle, WA, USA
Muhammad Ammad-Ud-Din: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Swapnil Potdar: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Disha Malani: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Heikki Kuusanmäki: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland; Biotech Research & Innovation Centre (BRIC) and Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), University of Copenhagen, Copenhagen, Denmark
Brian J Druker: Howard Hughes Medical Institute, Portland, OR, USA; Division of Hematology and Medical Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
Caroline Heckman: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Olli Kallioniemi: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland; Scilifelab, Karolinska Institute, Solna, Sweden
Stephen E Kurtz: Division of Hematology and Medical Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
Kimmo Porkka: HUS Comprehensive Cancer Center, Hematology Research Unit Helsinki and iCAN Digital Precision Cancer Center Medicine Flagship, University of Helsinki, Helsinki, Finland
Cristina E Tognon: Howard Hughes Medical Institute, Portland, OR, USA; Division of Hematology and Medical Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
Jeffrey W Tyner: Division of Hematology and Medical Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
Tero Aittokallio: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland; Department of Mathematics and Statistics, University of Turku, Turku, Finland; Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
Krister Wennerberg: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland; Biotech Research & Innovation Centre (BRIC) and Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), University of Copenhagen, Copenhagen, Denmark
Justin Guinney: Computational Oncology, Sage Bionetworks, Seattle, WA, USA; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA

Ye Z, Ke H, Chen S, Cruz-Cano R, He X, Zhang J, Dorgan J, Milton DK, Ma T. Biomarker Categorization in Transcriptomic Meta-Analysis by Concordant Patterns With Application to Pan-Cancer Studies. Front Genet 2021;12:651546. [PMID: 34276766 PMCID: PMC8283696 DOI: 10.3389/fgene.2021.651546] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/10/2021] [Accepted: 05/28/2021] [Indexed: 01/21/2023]

Comin M, Di Camillo B, Pizzi C, Vandin F. Comparison of microbiome samples: methods and computational challenges. Brief Bioinform 2020;22:88-95. [PMID: 32577746 DOI: 10.1093/bib/bbaa121] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/22/2019] [Revised: 05/09/2020] [Accepted: 05/18/2020] [Indexed: 12/14/2022]

Brichetto G, Monti Bragadin M, Fiorini S, Battaglia MA, Konrad G, Ponzio M, Pedullà L, Verri A, Barla A, Tacchino A. The hidden information in patient-reported outcomes and clinician-assessed outcomes: multiple sclerosis as a proof of concept of a machine learning approach. Neurol Sci 2019;41:459-462. [PMID: 31659583 PMCID: PMC7005074 DOI: 10.1007/s10072-019-04093-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/08/2019] [Accepted: 09/28/2019] [Indexed: 11/30/2022]

Di Camillo B, Hakaste L, Sambo F, Gabriel R, Kravic J, Isomaa B, Tuomilehto J, Alonso M, Longato E, Facchinetti A, Groop LC, Cobelli C, Tuomi T. HAPT2D: high accuracy of prediction of T2D with a model combining basic and advanced data depending on availability. Eur J Endocrinol 2018;178:331-341. [PMID: 29371336 DOI: 10.1530/eje-17-0921] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/03/2017] [Accepted: 01/25/2018] [Indexed: 12/26/2022]

Affiliation(s)

Barbara Di Camillo: Department of Information Engineering, University of Padova, Padova, Italy
Liisa Hakaste: Endocrinology, Abdominal Centre, University of Helsinki and Helsinki University Hospital, Research Program for Diabetes and Obesity, University of Helsinki, Helsinki, Finland; Folkhälsan Research Center, Helsinki, Finland
Francesco Sambo: Department of Information Engineering, University of Padova, Padova, Italy
Rafael Gabriel: Department of International Health, National School of Public Health, Instituto de Salud Carlos III, Madrid, Spain; Asociación Española Para el Desarrollo de la Epidemiología Clínica (AEDEC), Madrid, Spain
Jasmina Kravic: Lund University Diabetes Centre, Department of Clinical Sciences Malmö, Lund University, Skåne University Hospital, Malmö, Sweden
Bo Isomaa: Folkhälsan Research Center, Helsinki, Finland
Jaakko Tuomilehto: Asociación Española Para el Desarrollo de la Epidemiología Clínica (AEDEC), Madrid, Spain; Dasman Diabetes Institute, Dasman, Kuwait City, Kuwait; Department of Neuroscience and Preventive Medicine, Danube-University Krems, Krems, Austria; Saudi Diabetes Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
Margarita Alonso: Department of International Health, National School of Public Health, Instituto de Salud Carlos III, Madrid, Spain; Asociación Española Para el Desarrollo de la Epidemiología Clínica (AEDEC), Madrid, Spain
Enrico Longato: Department of Information Engineering, University of Padova, Padova, Italy
Andrea Facchinetti: Department of Information Engineering, University of Padova, Padova, Italy
Leif C Groop: Lund University Diabetes Centre, Department of Clinical Sciences Malmö, Lund University, Skåne University Hospital, Malmö, Sweden; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
Claudio Cobelli: Department of Information Engineering, University of Padova, Padova, Italy
Tiinamaija Tuomi: Endocrinology, Abdominal Centre, University of Helsinki and Helsinki University Hospital, Research Program for Diabetes and Obesity, University of Helsinki, Helsinki, Finland; Folkhälsan Research Center, Helsinki, Finland; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland

Vitova L, Tuma Z, Moravec J, Kvapil M, Matejovic M, Mares J. Early urinary biomarkers of diabetic nephropathy in type 1 diabetes mellitus show involvement of kallikrein-kinin system. BMC Nephrol 2017;18:112. [PMID: 28359252 PMCID: PMC5372325 DOI: 10.1186/s12882-017-0519-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/09/2016] [Accepted: 03/21/2017] [Indexed: 01/06/2023]

Abstract

BACKGROUND

Additional urinary biomarkers for diabetic nephropathy (DN) are needed to provide early and reliable diagnosis and new insights into its mechanisms. Rigorous selection criteria and a homogeneous study population may improve the reproducibility of the proteomic approach.

METHODS

Long-term type 1 diabetes patients without metabolic comorbidities were included, 11 with sustained microalbuminuria (MA) and 14 without MA (nMA). Morning urine proteins were precipitated and resolved by 2D electrophoresis. Principal component analysis (PCA) and Projection to latent structures discriminatory analysis (PLS-DA) were adopted to assess general data validity, to pick protein fractions for identification with mass spectrometry (MS), and to test predictive value of the resulting model.

RESULTS

Proteins (n = 113) detected in more than 90% of patients were considered representative. Unsupervised PCA showed excellent natural data clustering without outliers. Protein spots reaching a Variable Importance in Projection score above 1 in PLS (n = 42) were subjected to MS, yielding 33 positive identifications. The PLS model rebuilt with these proteins achieved accurate classification of all patients (R2X = 0.553, R2Y = 0.953, Q2 = 0.947). Thus, multiple earlier recognized biomarkers of DN were confirmed and several putative new biomarkers suggested. Among them, the highest significance was met in kininogen-1. Its activation products detected in nMA patients exceeded by an order of magnitude the amount found in MA patients.

CONCLUSIONS

Reducing the metabolic complexity of the diseased and control groups through meticulous patient selection makes it possible to focus the biomarker search in DN. The suggested new biomarkers, particularly kininogen fragments, exhibit the highest degree of correlation with MA and merit validation in larger and more varied cohorts.

Gangeh MJ, Zarkoob H, Ghodsi A. Fast and Scalable Feature Selection for Gene Expression Data Using Hilbert-Schmidt Independence Criterion. IEEE/ACM Trans Comput Biol Bioinform 2017;14:167-181. [PMID: 28182548 DOI: 10.1109/tcbb.2016.2631164] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 06/06/2023]

Omae K, Komori O, Eguchi S. Reproducible detection of disease-associated markers from gene expression data. BMC Med Genomics 2016;9:53. [PMID: 27538512 PMCID: PMC4991096 DOI: 10.1186/s12920-016-0214-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/03/2015] [Accepted: 08/03/2016] [Indexed: 01/22/2023]

Kamkar I, Gupta SK, Phung D, Venkatesh S. Stabilizing l1-norm prediction models by supervised feature grouping. J Biomed Inform 2015;59:149-68. [PMID: 26689771 DOI: 10.1016/j.jbi.2015.11.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/14/2015] [Revised: 11/18/2015] [Accepted: 11/23/2015] [Indexed: 01/05/2023]

Georga EI, Protopappas VC, Polyzos D, Fotiadis DI. Evaluation of short-term predictors of glucose concentration in type 1 diabetes combining feature ranking with regression models. Med Biol Eng Comput 2015;53:1305-18. [PMID: 25773366 DOI: 10.1007/s11517-015-1263-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/31/2014] [Accepted: 02/27/2015] [Indexed: 01/04/2023]

Pegolo S, Di Camillo B, Montesissa C, Cannizzo FT, Biolatti B, Bargelloni L. Toxicogenomic markers for corticosteroid treatment in beef cattle: Integrated analysis of transcriptomic data. Food Chem Toxicol 2015;77:1-11. [DOI: 10.1016/j.fct.2014.12.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/09/2014] [Revised: 11/26/2014] [Accepted: 12/02/2014] [Indexed: 11/29/2022]

Sambo F, Malovini A, Sandholm N, Stavarachi M, Forsblom C, Mäkinen VP, Harjutsalo V, Lithovius R, Gordin D, Parkkonen M, Saraheimo M, Thorn LM, Tolonen N, Wadén J, He B, Osterholm AM, Tuomilehto J, Lajer M, Salem RM, McKnight AJ, Tarnow L, Panduru NM, Barbarini N, Di Camillo B, Toffolo GM, Tryggvason K, Bellazzi R, Cobelli C, Groop PH. Novel genetic susceptibility loci for diabetic end-stage renal disease identified through robust naive Bayes classification. Diabetologia 2014;57:1611-22. [PMID: 24871321 DOI: 10.1007/s00125-014-3256-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/10/2013] [Accepted: 04/11/2014] [Indexed: 10/25/2022]

Abstract

AIMS/HYPOTHESIS

Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD.

METHODS

We exploited a novel algorithm, 'Bag of Naive Bayes', whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK-Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US).

RESULTS

Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case-control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno.

CONCLUSIONS/INTERPRETATION

This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.

Östlund G, Sonnhammer EL. Avoiding pitfalls in gene (co)expression meta-analysis. Genomics 2014;103:21-30. [DOI: 10.1016/j.ygeno.2013.10.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/09/2013] [Revised: 09/30/2013] [Accepted: 10/22/2013] [Indexed: 11/16/2022]

Di Camillo B, Sambo F, Toffolo G, Cobelli C. ABACUS: an entropy-based cumulative bivariate statistic robust to rare variants and different direction of genotype effect. Bioinformatics 2013;30:384-91. [PMID: 24292361 DOI: 10.1093/bioinformatics/btt697] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]

Lauria M. Rank-based transcriptional signatures. Systems Biomedicine 2013;1:228-239. [DOI: 10.4161/sysb.25982] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 09/02/2023]

Wu MY, Dai DQ, Zhang XF, Zhu Y. Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm. PLoS One 2013;8:e66256. [PMID: 23799085 PMCID: PMC3684607 DOI: 10.1371/journal.pone.0066256] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 02/08/2013] [Accepted: 05/02/2013] [Indexed: 11/29/2022]

Abstract

In cancer biology, it is very important to understand the phenotypic changes of the patients and discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem based on gene expression profiles, which may contain outliers due to either chemical or electrical reasons. These undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways, and are related to only a few interdependent biomarkers. This motivates a need for robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures and identifying cancer-related biomarkers. This study proposes a penalized model-based Student's t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account and having robustness against outliers. Meanwhile, biomarker identification and network reconstruction are achieved by imposing an adaptive penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm utilizing the graphical lasso. Here, a network-based gene selection criterion that identifies biomarkers not as individual genes but as subnetworks is applied. This allows us to implicate low discriminative biomarkers which play a central role in the subnetwork by interconnecting many differentially expressed genes, or have cluster-specific underlying network structures. Experimental results on simulated datasets and one available cancer dataset attest to the effectiveness and robustness of PMT-UC in cancer subtype discovery. Moreover, PMT-UC has the ability to select cancer-related biomarkers which have been verified in biochemical or biomedical research and to learn biologically significant correlations among genes.

Zycinski G, Barla A, Squillario M, Sanavia T, Di Camillo B, Verri A. Knowledge Driven Variable Selection (KDVS) - a new approach to enrichment analysis of gene signatures obtained from high-throughput data. Source Code Biol Med 2013;8:2. [PMID: 23302187 PMCID: PMC3605163 DOI: 10.1186/1751-0473-8-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/28/2012] [Accepted: 12/13/2012] [Indexed: 11/10/2022]

Abstract

Background

High-throughput (HT) technologies provide huge amounts of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such a mapping is typically used in the assessment of the coverage of the gene signature by biological concepts, which is either score-based or requires tunable parameters as well, limiting its power.

Results

We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices that are easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Unlike the standard approach, where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold-dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method.

Conclusions

We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also the identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, the KDVS approach can be considered a viable alternative to enrichment-based approaches.

Jurman G, Riccadonna S, Visintainer R, Furlanello C. Algebraic comparison of partial lists in bioinformatics. PLoS One 2012;7:e36540. [PMID: 22615778 PMCID: PMC3355159 DOI: 10.1371/journal.pone.0036540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 06/10/2011] [Accepted: 04/06/2012] [Indexed: 12/20/2022]