Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lee YS, Krishnan A, Zhu Q, Troyanskaya OG. Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies. Bioinformatics 2013;29:3036-44. [PMID: 24037214 PMCID: PMC3834796 DOI: 10.1093/bioinformatics/btt529] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]

Cited by Other Article(s)

Hawkins NT, Maldaver M, Yannakopoulos A, Guare LA, Krishnan A. Systematic tissue annotations of genomics samples by modeling unstructured metadata. Nat Commun 2022;13:6736. [PMID: 36347858 PMCID: PMC9643451 DOI: 10.1038/s41467-022-34435-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/09/2021] [Accepted: 10/25/2022] [Indexed: 11/10/2022]

Mancuso CA, Canfield JL, Singla D, Krishnan A. A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes. Nucleic Acids Res 2020;48:e125. [PMID: 33074331 PMCID: PMC7708069 DOI: 10.1093/nar/gkaa881] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/09/2020] [Revised: 08/24/2020] [Accepted: 09/28/2020] [Indexed: 12/15/2022]

Bernstein MN, Ma Z, Gleicher M, Dewey CN. CellO: comprehensive and hierarchical cell type classification of human cells with the Cell Ontology. iScience 2020;24:101913. [PMID: 33364592 PMCID: PMC7753962 DOI: 10.1016/j.isci.2020.101913] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/30/2020] [Revised: 10/28/2020] [Accepted: 12/02/2020] [Indexed: 12/15/2022]

Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020;36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022]

Abstract

Background

Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem.

Results

In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows.
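As a rough illustration of the label propagation baseline mentioned in these results, the sketch below scores genes by a random walk with restart from a set of known positive genes. The abstract does not say which propagation variant was benchmarked, so the random-walk formulation, the function name `propagate_labels`, the restart weight `alpha`, and the six-node toy network are all assumptions made for illustration only.

```python
def propagate_labels(adjacency, seeds, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from the seed nodes on an undirected graph.

    adjacency is a symmetric 0/1 matrix (list of lists); seeds is a set of
    node indices with known positive labels. Hypothetical helper, not the
    implementation used in the cited study.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # The walker restarts uniformly over the seed nodes.
    start = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    scores = start[:]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            # Score flowing into node i: each neighbour j splits its
            # current score evenly across its own edges.
            inflow = sum(
                adjacency[j][i] * scores[j] / degree[j]
                for j in range(n)
                if degree[j] > 0
            )
            updated.append((1 - alpha) * start[i] + alpha * inflow)
        change = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
seeds = {0, 1}
scores = propagate_labels(adjacency, seeds)

# Rank the unlabeled nodes by propagated score, highest first.
ranking = sorted(
    (i for i in range(len(adjacency)) if i not in seeds),
    key=lambda i: scores[i],
    reverse=True,
)
print(ranking)
```

In this toy network, node 2 (adjacent to both seeds) ranks above node 3, which ranks above the far triangle. The supervised alternative described in the abstract would instead treat each gene's row of the adjacency matrix as a feature vector for a classifier; that part is not sketched here.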

Availability and implementation

The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

Contact

arjun@msu.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020;36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]

Wang Z, Lachmann A, Ma'ayan A. Mining data and metadata from the gene expression omnibus. Biophys Rev 2019;11:103-110. [PMID: 30594974 PMCID: PMC6381352 DOI: 10.1007/s12551-018-0490-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 10/24/2018] [Accepted: 12/04/2018] [Indexed: 12/16/2022]

Lee YS, Krishnan A, Oughtred R, Rust J, Chang CS, Ryu J, Kristensen VN, Dolinski K, Theesfeld CL, Troyanskaya OG. A Computational Framework for Genome-wide Characterization of the Human Disease Landscape. Cell Syst 2019;8:152-162.e6. [PMID: 30685436 PMCID: PMC7374759 DOI: 10.1016/j.cels.2018.12.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/08/2018] [Revised: 10/16/2018] [Accepted: 12/20/2018] [Indexed: 01/21/2023]

Giles CB, Brown CA, Ripperger M, Dennis Z, Roopnarinesingh X, Porter H, Perz A, Wren JD. ALE: automated label extraction from GEO metadata. BMC Bioinformatics 2017;18:509. [PMID: 29297276 PMCID: PMC5751806 DOI: 10.1186/s12859-017-1888-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]

Abstract

Background

NCBI’s Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because they ensure standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many of the available data sets do not contain explicit text informing the researcher about the age and gender of the subjects in the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence.

Results

Our analysis shows only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels, based upon gene expression of the samples and compare this to the text-based method.

Conclusion

Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach as well as machine learning. We show the two methods together improve accuracy of label assignment to GEO samples.
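To make the text-based half of this approach concrete, the sketch below pulls sex and age labels out of free-text sample metadata with regular expressions. The abstract says only that a heuristic approach is used, so the patterns, the `key: value` metadata layout, and the function name `extract_labels` are assumptions for illustration and not the ALE implementation.

```python
import re

# Hypothetical patterns for "key: value" style sample metadata.
SEX_PATTERN = re.compile(r"\b(?:gender|sex)\s*[:=]\s*(male|female|m|f)\b", re.IGNORECASE)
AGE_PATTERN = re.compile(r"\bage\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

# Map abbreviations onto a single vocabulary.
SEX_NORMALIZED = {"m": "male", "male": "male", "f": "female", "female": "female"}


def extract_labels(metadata_text):
    """Return sex and age found in the text, or None where nothing matches."""
    sex_match = SEX_PATTERN.search(metadata_text)
    age_match = AGE_PATTERN.search(metadata_text)
    return {
        "sex": SEX_NORMALIZED[sex_match.group(1).lower()] if sex_match else None,
        "age": float(age_match.group(1)) if age_match else None,
    }


labeled = extract_labels("tissue: liver; gender: Female; age: 42 years")
unlabeled = extract_labels("liver biopsy, no donor details recorded")
print(labeled)
print(unlabeled)
```

Samples for which the text yields `None` are the ones where, per the abstract, expression-based machine-learning predictions would be needed to supply a label.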

Schomburg I, Jeske L, Ulbrich M, Placzek S, Chang A, Schomburg D. The BRENDA enzyme information system–From a database to an expert system. J Biotechnol 2017;261:194-206. [DOI: 10.1016/j.jbiotec.2017.04.020] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 02/16/2017] [Revised: 04/11/2017] [Accepted: 04/18/2017] [Indexed: 02/06/2023]

Breschi A, Gingeras TR, Guigó R. Comparative transcriptomics in human and mouse. Nat Rev Genet 2017;18:425-440. [PMID: 28479595 PMCID: PMC6413734 DOI: 10.1038/nrg.2017.19] [Citation(s) in RCA: 185] [Impact Index Per Article: 23.1] [Indexed: 12/12/2022]

Amar D, Izraeli S, Shamir R. Utilizing somatic mutation data from numerous studies for cancer research: proof of concept and applications. Oncogene 2017;36:3375-3383. [PMID: 28092680 PMCID: PMC5485176 DOI: 10.1038/onc.2016.489] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/12/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 02/07/2023]

Yang G, Hu Z. Gene Feature Extraction Based on Nonnegative Dual Graph Regularized Latent Low-Rank Representation. Biomed Res Int 2017;2017:1096028. [PMID: 28466003 PMCID: PMC5390636 DOI: 10.1155/2017/1096028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/21/2017] [Accepted: 03/13/2017] [Indexed: 01/16/2023]

Angeles-Albores D, Lee RYN, Chan J, Sternberg PW. Tissue enrichment analysis for C. elegans genomics. BMC Bioinformatics 2016;17:366. [PMID: 27618863 PMCID: PMC5020436 DOI: 10.1186/s12859-016-1229-9] [Citation(s) in RCA: 110] [Impact Index Per Article: 12.2] [Received: 04/30/2016] [Accepted: 08/26/2016] [Indexed: 01/04/2023]

He F, Yoo S, Wang D, Kumari S, Gerstein M, Ware D, Maslov S. Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis. Plant J 2016;86:472-480. [PMID: 27015116 DOI: 10.1111/tpj.13175] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/16/2015] [Revised: 02/24/2016] [Accepted: 03/21/2016] [Indexed: 06/05/2023]

Li WV, Razaee ZS, Li JJ. Epigenome overlap measure (EPOM) for comparing tissue/cell types based on chromatin states. BMC Genomics 2016;17 Suppl 1:10. [PMID: 26817822 PMCID: PMC4895267 DOI: 10.1186/s12864-015-2303-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022]

Abstract

Background

The dynamics of epigenomic marks in their relevant chromatin states regulate distinct gene expression patterns, biological functions and phenotypic variations in biological processes. The availability of high-throughput epigenomic data generated by next-generation sequencing technologies allows a data-driven approach to evaluate the similarities and differences of diverse tissue and cell types in terms of epigenomic features. While ChromImpute has allowed for the imputation of large-scale epigenomic information to yield more robust data to capture meaningful relationships between biological samples, widely used methods such as hierarchical clustering and correlation analysis cannot adequately utilize epigenomic data to accurately reveal the distinction and grouping of different tissue and cell types.

Methods

We utilize a three-step testing procedure (ANOVA, t test and overlap test) to identify tissue/cell-type-associated enhancers and promoters and to calculate a newly defined Epigenomic Overlap Measure (EPOM). EPOM results in a clear correspondence map of biological samples from different tissue and cell types through comparison of epigenomic marks evaluated in their relevant chromatin states.

Results

Correspondence maps by EPOM show strong capability in distinguishing and grouping different tissue and cell types and reveal biologically meaningful similarities between Heart and Muscle, Blood & T-cell and HSC & B-cell, Brain and Neurosphere, etc. The gene ontology enrichment analysis both supports and explains the discoveries made by EPOM and suggests that the associated enhancers and promoters demonstrate distinguishable functions across tissue and cell types. Moreover, the tissue/cell-type-associated enhancers and promoters show enrichment in the disease-related SNPs that are also associated with the corresponding tissue or cell types. This agreement suggests the potential of identifying causal genetic variants relevant to cell-type-specific diseases from our identified associated enhancers and promoters.

Conclusions

The proposed EPOM measure demonstrates superior capability in grouping and finding a clear correspondence map of biological samples from different tissue and cell types. The identified associated enhancers and promoters provide a comprehensive catalog to study distinct biological processes and disease variants in different tissue and cell types. Our results also find that the associated promoters exhibit more cell-type-specific functions than the associated enhancers do, suggesting that the non-associated promoters have more housekeeping functions than the non-associated enhancers.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-2303-9) contains supplementary material, which is available to authorized users.

Sokolov A, Paull EO, Stuart JM. One-class detection of cell states in tumor subtypes. Pac Symp Biocomput 2016;21:405-16. [PMID: 26776204 PMCID: PMC4856035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]

Ramírez-Gordillo D, Powers TR, van Velkinburgh JC, Trujillo-Provencio C, Schilkey F, Serrano EE. RNA-Seq and microarray analysis of the Xenopus inner ear transcriptome discloses orthologous OMIM® genes for hereditary disorders of hearing and balance. BMC Res Notes 2015;8:691. [PMID: 26582541 PMCID: PMC4652436 DOI: 10.1186/s13104-015-1485-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/11/2015] [Accepted: 09/21/2015] [Indexed: 12/14/2022]

Amar D, Hait T, Izraeli S, Shamir R. Integrated analysis of numerous heterogeneous gene expression profiles for detecting robust disease-specific biomarkers and proposing drug targets. Nucleic Acids Res 2015;43:7779-89. [PMID: 26261215 PMCID: PMC4652780 DOI: 10.1093/nar/gkv810] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/22/2015] [Revised: 07/23/2015] [Accepted: 07/29/2015] [Indexed: 12/18/2022]

Kim M, Zorraquino V, Tagkopoulos I. Microbial forensics: predicting phenotypic characteristics and environmental conditions from large-scale gene expression profiles. PLoS Comput Biol 2015;11:e1004127. [PMID: 25774498 PMCID: PMC4361189 DOI: 10.1371/journal.pcbi.1004127] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/26/2014] [Accepted: 01/14/2015] [Indexed: 01/13/2023]

Abstract

A tantalizing question in cellular physiology is whether the cellular state and environmental conditions can be inferred from the expression signature of an organism. To investigate this relationship, we created an extensive normalized gene expression compendium for the bacterium Escherichia coli that was further enriched with meta-information through an iterative learning procedure. We then constructed an ensemble method to predict environmental and cellular state, including strain, growth phase, medium, oxygen level, antibiotic and carbon source presence. Results show that gene expression is an excellent predictor of environmental structure, with multi-class ensemble models achieving balanced accuracy between 70.0% (±3.5%) and 98.3% (±2.3%) for the various characteristics. Interestingly, this performance can be significantly boosted when environmental and strain characteristics are simultaneously considered, as a composite classifier that captures the inter-dependencies of three characteristics (medium, phase and strain) achieved 10.6% (±1.0%) higher performance than any individual model. Contrary to expectations, only 59% of the top informative genes were also identified as differentially expressed under the respective conditions. Functional analysis of the respective genetic signatures implicates a wide spectrum of Gene Ontology terms and KEGG pathways with condition-specific information content, including iron transport, transferases, and enterobactin synthesis. Further experimental phenotypic-to-genotypic mapping that we conducted for knock-out mutants argues for the information content of top-ranked genes. This work demonstrates the degree to which genome-scale transcriptional information can be predictive of latent, heterogeneous and seemingly disparate phenotypic and environmental characteristics, with far-reaching applications.

The transcriptional profile of an organism contains clues about the environmental context in which it has evolved and currently lives, its behavior and cellular state. It is yet unclear, however, how much information can be efficiently extracted and how it can be used to classify new samples with respect to their environmental and genetic characteristics. Here, we have constructed an extensive transcriptome compendium of Escherichia coli that we have further enriched via an iterative learning approach. We then apply an ensemble of various machine learning algorithms to infer environmental and cellular information such as strain, growth phase, medium, oxygen level, antibiotic and carbon source. Functional analysis of the most informative genes provides mechanistic insights and palpable hypotheses regarding their role in each environmental or genetic context. Our work argues that genome-scale gene expression can be a multi-purpose marker for identifying latent, heterogeneous cellular and environmental states and that optimal classification can be achieved with a feature set of a couple hundred genes that might not necessarily have the most pronounced differential expression in the respective conditions.
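The balanced accuracy figures quoted in the abstract above can be read with the following sketch in mind. It assumes the common definition of balanced accuracy as the unweighted mean of per-class recall; the abstract does not spell out its formula, and the growth-phase labels and predictions below are made-up toy data, not results from the study.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: 8 exponential-phase samples, 2 stationary-phase.
y_true = ["exponential"] * 8 + ["stationary"] * 2
y_pred = ["exponential"] * 8 + ["exponential", "stationary"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

Here plain accuracy is 0.9 while balanced accuracy is 0.75, because the single error falls in the rare class. This is why balanced accuracy is a common choice when class sizes in a compendium are uneven.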

Sebestyén E, Zawisza M, Eyras E. Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer. Nucleic Acids Res 2015;43:1345-56. [PMID: 25578962 PMCID: PMC4330360 DOI: 10.1093/nar/gku1392] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Indexed: 02/07/2023]

Sparse representation for tumor classification based on feature extraction using latent low-rank representation. Biomed Res Int 2014;2014:420856. [PMID: 24678505 PMCID: PMC3942202 DOI: 10.1155/2014/420856] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/13/2013] [Revised: 12/27/2013] [Accepted: 12/27/2013] [Indexed: 11/17/2022]