Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Thompson JA, Tan J, Greene CS. Cross-platform normalization of microarray and RNA-seq data for machine learning applications. PeerJ 2016;4:e1621. [PMID: 26844019 PMCID: PMC4736986 DOI: 10.7717/peerj.1621] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 10/30/2015] [Accepted: 01/02/2016] [Indexed: 01/08/2023]

Cited by Other Article(s)

Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024;724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]

Wang B, Luan Y. Evaluation of normalization methods for predicting quantitative phenotypes in metagenomic data analysis. Front Genet 2024;15:1369628. [PMID: 38903761 PMCID: PMC11188486 DOI: 10.3389/fgene.2024.1369628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024]

Skubleny D, Ghosh S, Spratlin J, Schiller DE, Rayat GR. Feature-specific quantile normalization and feature-specific mean-variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data. BMC Bioinformatics 2024;25:136. [PMID: 38549046 PMCID: PMC11265146 DOI: 10.1186/s12859-024-05759-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 03/20/2024] [Indexed: 04/02/2024]

Abstract

BACKGROUND

Cross-platform normalization seeks to minimize technological bias between microarray and RNAseq whole-transcriptome data. Incorporating multiple gene expression platforms permits external validation of experimental findings, and augments training sets for machine learning models. Here, we compare the performance of Feature Specific Quantile Normalization (FSQN) to a previously used but unvalidated and uncharacterized method we label as Feature Specific Mean Variance Normalization (FSMVN). We evaluate the performance of these methods for bidirectional normalization in the context of nested feature selection.

RESULTS

FSQN and FSMVN provided clinically equivalent bidirectional model performance with and without feature selection for colon CMS and breast PAM50 classification. Using principal component analysis, we determine that these methods eliminate batch effects related to technological platforms. Without feature selection, no statistical difference was identified between the performance of FSQN and FSMVN on cross-platform data compared to within-platform distributions. Under optimal feature selection conditions, the balanced accuracy of FSQN and FSMVN was statistically equivalent to the within-platform distribution performance in multivariable linear regression analysis. FSQN and FSMVN also provided similar performance to within-platform distributions as the number of selected genes used to create models decreased.

CONCLUSIONS

In the context of generating supervised machine learning classifiers for molecular subtypes, FSQN and FSMVN are equally effective. Under optimal modeling conditions, FSQN and FSMVN provide equivalent model accuracy performance on cross-platform normalization data compared to within-platform data. Using cross-platform data should still be approached with caution as subtle performance differences may exist depending on the classification problem, training, and testing distributions.
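The abstract names the two methods but does not spell out their arithmetic. The sketch below shows one plausible reading of the names, not the authors' implementation: FSMVN rescales each gene in the source platform to the mean and standard deviation of the same gene in the target platform, and FSQN maps each gene's values by rank onto the quantiles of the same gene in the target platform. The function names `fsmvn` and `fsqn`, the samples-by-genes array layout, and the quantile positions are all assumptions made for illustration.

```python
import numpy as np


def fsmvn(source, target):
    """Rescale each gene (column) of `source` to the mean and standard
    deviation of the same gene in `target`.

    Both inputs are samples x genes arrays with identical gene order.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_mean = source.mean(axis=0)
    src_sd = source.std(axis=0, ddof=1)
    tgt_mean = target.mean(axis=0)
    tgt_sd = target.std(axis=0, ddof=1)
    # Constant genes have zero spread; divide by 1 so they map to the target mean.
    safe_sd = np.where(src_sd == 0, 1.0, src_sd)
    z = (source - src_mean) / safe_sd
    return z * tgt_sd + tgt_mean


def fsqn(source, target):
    """Map each gene of `source`, by rank, onto the empirical quantiles of
    the same gene in `target` (assumed reading of "feature-specific
    quantile normalization")."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples, n_genes = source.shape
    # Mid-point plotting positions, one per source sample.
    probs = (np.arange(n_samples) + 0.5) / n_samples
    out = np.empty_like(source)
    for j in range(n_genes):
        ranks = source[:, j].argsort().argsort()
        target_quantiles = np.quantile(target[:, j], probs)
        out[:, j] = target_quantiles[ranks]
    return out
```

In either direction (RNA-seq to microarray or the reverse), the platform being converted is passed as `source` and the platform the classifier was trained on as `target`.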

Wang B, Sun F, Luan Y. Comparison of the effectiveness of different normalization methods for metagenomic cross-study phenotype prediction under heterogeneity. Sci Rep 2024;14:7024. [PMID: 38528097 DOI: 10.1038/s41598-024-57670-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/20/2024] [Indexed: 03/27/2024]

Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023;10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]

Abstract

Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features such as differentially expressed genes associated with disease. Currently, most bioinformatic tools enable normalization in a flexible format that depends on the individual datasets under analysis. Thus, the output data of such normalizations will be poorly compatible with each other. Recently we proposed a new approach to gene expression data normalization termed Shambhala, which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria for different versions of Shambhala and other methods of transcriptomic harmonization with flexible output data format. Such criteria dealt with the biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for using machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs validation datasets and more than twofold lower instability in the calculation of drug efficiency scores compared to other methods.

Zhou M, Bao S, Gong T, Wang Q, Sun J, Li J, Lu M, Sun W, Su J, Chen H, Liu Z. The transcriptional landscape and diagnostic potential of long non-coding RNAs in esophageal squamous cell carcinoma. Nat Commun 2023;14:3799. [PMID: 37365153 DOI: 10.1038/s41467-023-39530-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/22/2022] [Accepted: 06/14/2023] [Indexed: 06/28/2023]

Affiliation(s)

Meng Zhou: School of Biomedical Engineering, Eye Hospital, Wenzhou Medical University, 325027, Wenzhou, P. R. China
Siqi Bao: School of Biomedical Engineering, Eye Hospital, Wenzhou Medical University, 325027, Wenzhou, P. R. China
Tongyang Gong: State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China
Qiang Wang: Department of Anesthesiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China
Jie Sun: School of Biomedical Engineering, Eye Hospital, Wenzhou Medical University, 325027, Wenzhou, P. R. China
Jiaqi Li: School of Biomedical Engineering, Eye Hospital, Wenzhou Medical University, 325027, Wenzhou, P. R. China
Minyi Lu: State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China
Wanyuan Sun: State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China
Jianzhong Su: School of Biomedical Engineering, Eye Hospital, Wenzhou Medical University, 325027, Wenzhou, P. R. China
Hongyan Chen: State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China; Key Laboratory of Cancer and Microbiome, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China
Zhihua Liu: State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 100021, Beijing, P. R. China

Qin ZX, Chen GZ, Yang QQ, Wu YJ, Sun CQ, Yang XM, Luo M, Yi CR, Zhu J, Chen WH, Liu Z. Cross-Platform Transcriptomic Data Integration, Profiling, and Mining in Vibrio cholerae. Microbiol Spectr 2023;11:e0536922. [PMID: 37191528 PMCID: PMC10269641 DOI: 10.1128/spectrum.05369-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/24/2023] [Indexed: 05/17/2023]

Abstract

A large number of transcriptome studies generate important data and information for the study of pathogenic mechanisms of pathogens, including Vibrio cholerae. V. cholerae transcriptome data include RNA-seq and microarray: microarray data mainly include clinical human and environmental samples, and RNA-seq data mainly focus on laboratory processing conditions, including different stresses and experimental animals in vivo. In this study, we integrated the data sets of both platforms using Rank-in and the limma R package normalizeBetweenArrays function, achieving the first cross-platform transcriptome data integration of V. cholerae. By integrating the entire transcriptome data, we obtained the profiles of the most active or silent genes. By transferring the integrated expression profiles into the weighted correlation network analysis (WGCNA) pipeline, we identified the important functional modules of V. cholerae in vitro stress treatment, gene manipulation, and in vitro culture as DNA transposon, chemotaxis and signaling, signal transduction, and secondary metabolic pathways, respectively. The analysis of functional module hub genes revealed the uniqueness of clinical human samples; however, under specific expression patterning, the Δhns, ΔoxyR1 strains, and tobramycin treatment group showed high expression profile similarity with human samples. By constructing a protein-protein interaction (PPI) network, we discovered several unreported novel protein interactions within transposon functional modules.

IMPORTANCE We used two techniques to integrate RNA-seq data for laboratory studies with clinical microarray data for the first time. We obtained the interactions between V. cholerae genes from a global perspective, compared the similarity between clinical human samples and the current experimental conditions, and uncovered the functional modules that play a major role under different conditions. We believe that this data integration can provide us with some insight and basis for elucidating the pathogenesis and clinical control of V. cholerae.

Sun R, Zhu H, Wang Y, Wang J, Jiang C, Cao Q, Zhang Y, Zhang Y, Yuan S, Liu Q. Circular RNA expression and the competitive endogenous RNA network in pathological, age-related macular degeneration events: A cross-platform normalization study. J Biomed Res 2023;37:367-381. [PMID: 37366063 PMCID: PMC10541779 DOI: 10.7555/jbr.37.20230010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 02/20/2023] [Accepted: 02/20/2023] [Indexed: 06/28/2023]

Sadeghi M, Karimi MR, Karimi AH, Ghorbanpour Farshbaf N, Barzegar A, Schmitz U. Network-Based and Machine-Learning Approaches Identify Diagnostic and Prognostic Models for EMT-Type Gastric Tumors. Genes (Basel) 2023;14:genes14030750. [PMID: 36981021 PMCID: PMC10048224 DOI: 10.3390/genes14030750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 03/10/2023] [Accepted: 03/14/2023] [Indexed: 03/30/2023]

Foltz SM, Greene CS, Taroni JN. Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously. Commun Biol 2023;6:222. [PMID: 36841852 PMCID: PMC9968332 DOI: 10.1038/s42003-023-04588-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/02/2017] [Accepted: 02/13/2023] [Indexed: 02/27/2023]

Zeng W, Li W, Huang K, Lin Z, Dai H, He Z, Liu R, Zeng Z, Qin G, Chen W, Wu Y. Predicting futile recanalization, malignant cerebral edema, and cerebral herniation using intelligible ensemble machine learning following mechanical thrombectomy for acute ischemic stroke. Front Neurol 2022;13:982783. [PMID: 36247767 PMCID: PMC9554641 DOI: 10.3389/fneur.2022.982783] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/30/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022]

Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022;10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022]

Borisov N, Sorokin M, Zolotovskaya M, Borisov C, Buzdin A. Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats. Curr Protoc 2022;2:e444. [PMID: 35617464 DOI: 10.1002/cpz1.444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]

Abstract

Uniformly shaped harmonization of gene expression profiles is central for the simultaneous comparison of multiple gene expression datasets. It is expected to operate with the gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can be further compared directly. However, current harmonization techniques have strong limitations that prevent their broad use for bioinformatic applications. They can either operate with only up to two datasets/platforms or return data in a dynamic format that will be different for every comparison under analysis. This also does not allow for adding new data to the previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity in restoring tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles, and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preserving cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC. 
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)

Isali I, McClellan P, Calaway A, Prunty M, Abbosh P, Mishra K, Ponsky L, Markt S, Psutka SP, Bukavina L. Gene network profiling in muscle-invasive bladder cancer: A systematic review and meta-analysis. Urol Oncol 2022;40:197.e11-197.e23. [PMID: 35039218 PMCID: PMC10123538 DOI: 10.1016/j.urolonc.2021.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2021] [Revised: 10/17/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]

Belotti Y, Lim EH, Lim CT. The Role of the Extracellular Matrix and Tumor-Infiltrating Immune Cells in the Prognostication of High-Grade Serous Ovarian Cancer. Cancers (Basel) 2022;14:404. [PMID: 35053566 PMCID: PMC8773831 DOI: 10.3390/cancers14020404] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/30/2021] [Revised: 01/05/2022] [Accepted: 01/11/2022] [Indexed: 12/12/2022]

Udaondo Z. Big data and computational advancements for next generation of Microbial Biotechnology. Microb Biotechnol 2022;15:107-109. [PMID: 34713973 PMCID: PMC8719813 DOI: 10.1111/1751-7915.13936] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2021] [Accepted: 09/09/2021] [Indexed: 11/30/2022]

Prognostic Matrisomal Gene Panel and Its Association with Immune Cell Infiltration in Head and Neck Carcinomas. Cancers (Basel) 2021;13:cancers13225761. [PMID: 34830910 PMCID: PMC8616409 DOI: 10.3390/cancers13225761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/27/2021] [Revised: 11/09/2021] [Accepted: 11/13/2021] [Indexed: 01/04/2023]

Abstract

Simple Summary

Squamous cell carcinoma of the head and neck (SCCHN) is a heterogeneous group of tumors arising from squamous cells lining different anatomic sites. This type of malignancy has been mainly investigated by focusing primarily on tumor cells, but recent evidence highlighted the importance of the tumor microenvironment (TME) in cancer growth, progression and metastasis. Hence, we hypothesized that dysregulated matrisomal components could have a common association with patient survival, irrespective of the subsite of origin of the SCCHN. Using bioinformatic methods and public datasets, we successfully identified a gene panel with prognostic value in HPV-negative and non-metastatic node-negative tumors and demonstrated its association with immune cell infiltration.

Abstract

Squamous cell carcinoma of the head and neck (SCCHN) is common worldwide and related to several risk factors including smoking, alcohol consumption, poor dentition and human papillomavirus (HPV) infection. Different etiological factors may influence the tumor microenvironment and play a role in dictating response to therapeutics. Here, we sought to investigate whether an early-stage SCCHN-specific prognostic matrisome-derived gene signature could be identified for HPV-negative SCCHN patients (n = 168), by applying a bioinformatics pipeline to the publicly available SCCHN-TCGA dataset. We identified six matrisome-derived genes with high association with prognostic outcomes in SCCHN. A six-gene risk score, the SCCHN TMI (SCCHN-tumor matrisome index: composed of MASP1, EGFL6, SFRP5, SPP1, MMP8 and P4HA1) was constructed and used to stratify patients into risk groups. Using machine learning-based deconvolution methods, we found that the risk groups were characterized by a differing abundance of infiltrating immune cells. This work highlights the key role of immune infiltration cells in the overall survival of patients affected by HPV-negative SCCHN. The identified SCCHN TMI represents a genomic tool that could potentially aid patient stratification and selection for therapy in these patients.

Andrieux G, Chakraborty S. Editorial: Integration of Multi-Omics Techniques in Cancer. Front Genet 2021;12:733965. [PMID: 34434225 PMCID: PMC8380985 DOI: 10.3389/fgene.2021.733965] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 12/13/2022]

Tang K, Ji X, Zhou M, Deng Z, Huang Y, Zheng G, Cao Z. Rank-in: enabling integrative analysis across microarray and RNA-seq for cancer. Nucleic Acids Res 2021;49:e99. [PMID: 34214174 PMCID: PMC8464058 DOI: 10.1093/nar/gkab554] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/07/2021] [Revised: 05/10/2021] [Accepted: 06/25/2021] [Indexed: 12/13/2022]

Identification of transcriptional subtypes in lung adenocarcinoma and squamous cell carcinoma through integrative analysis of microarray and RNA sequencing data. Sci Rep 2021;11:8709. [PMID: 33888829 PMCID: PMC8062554 DOI: 10.1038/s41598-021-88209-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/26/2020] [Accepted: 04/08/2021] [Indexed: 02/02/2023]

Abstract

Classification of tumors into subtypes can inform personalized approaches to treatment including the choice of targeted therapies. The two most common lung cancer histological subtypes, lung adenocarcinoma and lung squamous cell carcinoma, have been previously divided into transcriptional subtypes using microarray data, and corresponding signatures were subsequently used to classify RNA-seq data. Cross-platform unsupervised classification facilitates the identification of robust transcriptional subtypes by combining vast amounts of publicly available microarray and RNA-seq data. However, cross-platform classification is challenging because of intrinsic differences in data generated using the two gene expression profiling technologies. In this report, we show that robust gene expression subtypes can be identified in integrated data representing over 3500 normal and tumor lung samples profiled using two widely used platforms, Affymetrix HG-U133 Plus 2.0 Array and Illumina HiSeq RNA sequencing. We tested and analyzed consensus clustering for 384 combinations of data processing methods. The agreement between subtypes identified in single-platform and cross-platform normalized data was then evaluated using a variety of statistics. Results show that unsupervised learning can be achieved with combined microarray and RNA-seq data using selected preprocessing, cross-platform normalization, and unsupervised feature selection methods. Our analysis confirmed three lung adenocarcinoma transcriptional subtypes, but only two consistent subtypes in squamous cell carcinoma, as opposed to four subtypes previously identified. Further analysis showed that tumor subtypes were associated with distinct patterns of genomic alterations in genes coding for therapeutic targets. Importantly, by integrating quantitative proteomics data, we were able to identify tumor subtype biomarkers that effectively classify samples on the basis of both gene and protein expression. This study provides the basis for further integrative data analysis across gene and protein expression profiling platforms.

Chen W, Alexandre PA, Ribeiro G, Fukumasu H, Sun W, Reverter A, Li Y. Identification of Predictor Genes for Feed Efficiency in Beef Cattle by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data. Front Genet 2021;12:619857. [PMID: 33664767 PMCID: PMC7921797 DOI: 10.3389/fgene.2021.619857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/21/2020] [Accepted: 01/15/2021] [Indexed: 12/22/2022]

Zoabi Y, Shomron N. Processing and Analysis of RNA-seq Data from Public Resources. Methods Mol Biol 2021;2243:81-94. [PMID: 33606253 DOI: 10.1007/978-1-0716-1103-6_4] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/12/2023]

Linke F, Aldighieri M, Lourdusamy A, Grabowska AM, Stolnik S, Kerr ID, Merry CL, Coyle B. 3D hydrogels reveal medulloblastoma subgroup differences and identify extracellular matrix subtypes that predict patient outcome. J Pathol 2020;253:326-338. [PMID: 33206391 PMCID: PMC7986745 DOI: 10.1002/path.5591] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/27/2020] [Revised: 10/19/2020] [Accepted: 11/10/2020] [Indexed: 12/13/2022]

Liu Z, Jiang Z, Wu N, Zhou G, Wang X. Classification of gastric cancers based on immunogenomic profiling. Transl Oncol 2020;14:100888. [PMID: 33096337 PMCID: PMC7576512 DOI: 10.1016/j.tranon.2020.100888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2020] [Revised: 09/03/2020] [Accepted: 09/21/2020] [Indexed: 12/24/2022]

van der Kloet FM, Buurmans J, Jonker MJ, Smilde AK, Westerhuis JA. Increased comparability between RNA-Seq and microarray data by utilization of gene sets. PLoS Comput Biol 2020;16:e1008295. [PMID: 32997685 PMCID: PMC7549825 DOI: 10.1371/journal.pcbi.1008295] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/01/2019] [Revised: 10/12/2020] [Accepted: 08/27/2020] [Indexed: 12/30/2022]

Abstract

The field of transcriptomics uses and measures mRNA as a proxy of gene expression. There are currently two major platforms in use for quantifying mRNA, microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method to increase comparability of both platforms, enabling analysis of merged data from both platforms. We transformed high dimensional transcriptomics data from two different platforms into a lower dimensional and biologically relevant dataset by calculating enrichment scores based on gene set collections for all samples. We compared the similarity between data from both platforms based on the raw data and on the enrichment scores. We show that this transformation changes the data in a biologically relevant way and filters out noise, which leads to increased platform concordance. We validate the procedure using predictive models built with microarray based enrichment scores to predict subtypes of breast cancer using enrichment scores based on sequenced data. Although microarray and RNA-Seq expression levels might appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in data integration of the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation on the effect of the composition, size, and number of gene sets that are used for the transformation is suggested for future research.
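The idea of replacing a genes-wide matrix with a much narrower matrix of per-sample gene set scores can be illustrated as follows. The abstract does not say which enrichment statistic was used, so this sketch substitutes a deliberately simple one (the mean of within-platform z-scores over the genes in each set); the function name `gene_set_scores` and its arguments are hypothetical.

```python
import numpy as np


def gene_set_scores(expr, genes, gene_sets):
    """Collapse a samples x genes matrix into a samples x gene-sets matrix.

    expr      : samples x genes expression values from ONE platform
    genes     : list of gene names, one per column of `expr`
    gene_sets : dict mapping a set name to a list of gene names

    Each score is the mean z-score of the set's genes (a stand-in for a
    real enrichment statistic, chosen only to keep the sketch short).
    """
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=0)
    # Z-score each gene within the platform; constant genes become 0.
    z = (expr - expr.mean(axis=0)) / np.where(sd == 0, 1.0, sd)
    column_of = {gene: i for i, gene in enumerate(genes)}
    names = sorted(gene_sets)
    scores = np.zeros((expr.shape[0], len(names)))
    for k, name in enumerate(names):
        # Ignore set members that were not measured on this platform.
        cols = [column_of[g] for g in gene_sets[name] if g in column_of]
        if cols:
            scores[:, k] = z[:, cols].mean(axis=1)
    return names, scores
```

Running the same function separately on microarray and RNA-Seq matrices yields two score matrices with identical columns, which is what makes a model trained on one usable on the other.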

Angel PW, Rajab N, Deng Y, Pacheco CM, Chen T, Lê Cao KA, Choi J, Wells CA. A simple, scalable approach to building a cross-platform transcriptome atlas. PLoS Comput Biol 2020;16:e1008219. [PMID: 32986694 PMCID: PMC7544119 DOI: 10.1371/journal.pcbi.1008219] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/25/2020] [Revised: 10/08/2020] [Accepted: 08/04/2020] [Indexed: 12/21/2022]

Abstract

Gene expression atlases have transformed our understanding of the development, composition and function of human tissues. New technologies promise improved cellular or molecular resolution, and have led to the identification of new cell types, or better defined cell states. But as new technologies emerge, information derived on old platforms becomes obsolete. We demonstrate that it is possible to combine a large number of different profiling experiments summarised from dozens of laboratories and representing hundreds of donors, to create an integrated molecular map of human tissue. As an example, we combine 850 samples from 38 platforms to build an integrated atlas of human blood cells. We achieve robust and unbiased cell type clustering using a variance partitioning method, selecting genes with low platform bias relative to biological variation. Other than an initial rescaling, no other transformation to the primary data is applied through batch correction or renormalisation. Additional data, including single-cell datasets, can be projected for comparison, classification and annotation. The resulting atlas provides a multi-scaled approach to visualise and analyse the relationships between sets of genes and blood cell lineages, including the maturation and activation of leukocytes in vivo and in vitro. In allowing for data integration across hundreds of studies, we address a key reproducibility challenge which is faced by any new technology. This allows us to draw on the deep phenotypes and functional annotations that accompany traditional profiling methods, and provide important context to the high cellular resolution of single cell profiling. Here, we have implemented the blood atlas in the open access Stemformatics.org platform, drawing on its extensive collection of curated transcriptome data. The method is simple, scalable and amenable for rapid deployment in other biological systems or computational workflows.
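Selecting genes "with low platform bias" can be pictured as measuring, per gene, how much of the total variance is attributable to the platform label and keeping only genes below some limit. The sketch below uses a plain one-way sum-of-squares decomposition as a stand-in; the abstract does not describe the authors' variance partitioning model, and the names `platform_variance_fraction`, `select_low_bias_genes`, and the 0.2 limit are assumptions for illustration only.

```python
import numpy as np


def platform_variance_fraction(expr, platform):
    """Fraction of each gene's variance explained by platform membership.

    expr     : samples x genes matrix (already rescaled per sample)
    platform : one platform label per sample
    Returns between-platform SS / total SS for every gene (0 to 1).
    """
    expr = np.asarray(expr, dtype=float)
    platform = np.asarray(platform)
    grand_mean = expr.mean(axis=0)
    total_ss = ((expr - grand_mean) ** 2).sum(axis=0)
    between_ss = np.zeros(expr.shape[1])
    for label in np.unique(platform):
        block = expr[platform == label]
        between_ss += block.shape[0] * (block.mean(axis=0) - grand_mean) ** 2
    # Genes with no variance at all get a fraction of 0.
    return np.divide(between_ss, total_ss,
                     out=np.zeros_like(between_ss), where=total_ss > 0)


def select_low_bias_genes(expr, platform, max_fraction=0.2):
    """Column indices of genes whose platform fraction is at most the
    (hypothetical) limit `max_fraction`."""
    fraction = platform_variance_fraction(expr, platform)
    return np.flatnonzero(fraction <= max_fraction)
```

A gene whose values differ only between platforms scores 1 and is dropped; a gene whose variation is unrelated to platform scores near 0 and is kept for clustering.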

Li X, Liu L, Goodall GJ, Schreiber A, Xu T, Li J, Le TD. A novel single-cell based method for breast cancer prognosis. PLoS Comput Biol 2020;16:e1008133. [PMID: 32833968 PMCID: PMC7470419 DOI: 10.1371/journal.pcbi.1008133] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/30/2020] [Revised: 09/03/2020] [Accepted: 07/09/2020] [Indexed: 12/12/2022] Open

Fajarda O, Duarte-Pereira S, Silva RM, Oliveira JL. Merging microarray studies to identify a common gene expression signature to several structural heart diseases. BioData Min 2020;13:8. [PMID: 32670412 PMCID: PMC7346458 DOI: 10.1186/s13040-020-00217-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/18/2020] [Accepted: 06/05/2020] [Indexed: 12/22/2022] Open

Abstract

BACKGROUND

Heart disease is the leading cause of death worldwide. Knowing a gene expression signature in heart disease can lead to the development of more efficient diagnosis and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most microarray datasets contain only a small number of samples, so to obtain more reliable results several datasets have to be merged, which is a challenging task. The identification of differentially expressed genes is commonly done using statistical methods. Nonetheless, these methods are based on the definition of an arbitrary threshold to select the differentially expressed genes, and there is no consensus on the values that should be used.

RESULTS

Nine publicly available microarray datasets from studies of different heart diseases were merged to form a dataset composed of 689 samples and 8354 features. Subsequently, the adjusted p-value and fold change were determined, and by combining a set of adjusted p-value cutoffs with a list of different fold change thresholds, 12 sets of differentially expressed genes were obtained. The random forest algorithm was then used to select the set of differentially expressed genes that most accurately separates samples from patients with heart diseases from samples from patients with no heart condition. A set of 62 differentially expressed genes having a classification accuracy of approximately 95% was identified.

CONCLUSIONS

We identified a gene expression signature common to different cardiac diseases and supported our findings by showing their involvement in the pathophysiology of the heart. The approach used in this study is suitable for the identification of gene expression signatures, and can be extended to different diseases.
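
The selection procedure described under RESULTS can be sketched as follows. The abstract does not list the actual cutoffs, so the three adjusted p-value cutoffs and four fold-change thresholds below (3 × 4 = 12 candidate sets) are hypothetical, as are the per-gene statistics, the use of log2 fold change, and the `toy_score` callback, which merely stands in for the random forest classification accuracy used in the study.

```python
from itertools import product


def candidate_gene_sets(stats, p_cutoffs, fc_thresholds):
    """Build one gene set per (adjusted p-value cutoff, |log2 FC| threshold) pair."""
    sets = {}
    for p_cut, fc_min in product(p_cutoffs, fc_thresholds):
        sets[(p_cut, fc_min)] = {
            gene
            for gene, (adj_p, log2_fc) in stats.items()
            if adj_p < p_cut and abs(log2_fc) >= fc_min
        }
    return sets


def best_gene_set(sets, score):
    """Return the (key, genes) pair whose gene set maximizes score(genes)."""
    return max(sets.items(), key=lambda item: score(item[1]))


# Hypothetical per-gene statistics: gene -> (adjusted p-value, log2 fold change).
stats = {
    "G1": (0.0005, 2.1),
    "G2": (0.008, -1.2),
    "G3": (0.03, 0.6),
    "G4": (0.2, 3.0),
}
p_cutoffs = [0.05, 0.01, 0.001]       # assumed values
fc_thresholds = [0.5, 1.0, 1.5, 2.0]  # assumed values

sets = candidate_gene_sets(stats, p_cutoffs, fc_thresholds)


def toy_score(genes):
    """Placeholder score; in practice this would be the accuracy of a
    classifier trained on the candidate genes and evaluated on held-out samples."""
    return len(genes & {"G1", "G2"}) - 0.1 * len(genes)


best_key, best_genes = best_gene_set(sets, toy_score)
```

Passing the scoring function as an argument keeps the threshold grid independent of the classifier, so a different model or accuracy estimate can be substituted without touching the set construction.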

Analysis of the Circadian Regulation of Cancer Hallmarks by a Cross-Platform Study of Colorectal Cancer Time-Series Data Reveals an Association with Genes Involved in Huntington's Disease. Cancers (Basel) 2020;12:cancers12040963. [PMID: 32295075 PMCID: PMC7226183 DOI: 10.3390/cancers12040963] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/04/2020] [Revised: 04/07/2020] [Accepted: 04/10/2020] [Indexed: 02/06/2023] Open

A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science. UNSUPERVISED AND SEMI-SUPERVISED LEARNING 2020. [DOI: 10.1007/978-3-030-22475-2_1] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Indexed: 02/06/2023]

Emmett MJ, Lazar MA. Integrative regulation of physiology by histone deacetylase 3. Nat Rev Mol Cell Biol 2019;20:102-115. [PMID: 30390028 DOI: 10.1038/s41580-018-0076-0] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Indexed: 01/10/2023]

Jiang S, Cheng SJ, Ren LC, Wang Q, Kang YJ, Ding Y, Hou M, Yang XX, Lin Y, Liang N, Gao G. An expanded landscape of human long noncoding RNA. Nucleic Acids Res 2019;47:7842-7856. [PMID: 31350901 PMCID: PMC6735957 DOI: 10.1093/nar/gkz621] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 04/28/2019] [Revised: 06/18/2019] [Accepted: 07/11/2019] [Indexed: 12/21/2022] Open

Affiliation(s)

Shuai Jiang Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Si-Jin Cheng Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Li-Chen Ren Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Qian Wang Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Yu-Jian Kang Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Yang Ding Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Mei Hou Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Xiao-Xu Yang Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Yuan Lin Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Nan Liang Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
Ge Gao Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China

Zhang L, Thapa I, Haas C, Bastola D. Multiplatform biomarker identification using a data-driven approach enables single-sample classification. BMC Bioinformatics 2019;20:601. [PMID: 31752658 PMCID: PMC6868758 DOI: 10.1186/s12859-019-3140-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/20/2019] [Accepted: 10/09/2019] [Indexed: 11/10/2022] Open

Peters TJ, French HJ, Bradford ST, Pidsley R, Stirzaker C, Varinli H, Nair S, Qu W, Song J, Giles KA, Statham AL, Speirs H, Speed TP, Clark SJ. Evaluation of cross-platform and interlaboratory concordance via consensus modelling of genomic measurements. Bioinformatics 2019;35:560-570. [PMID: 30084929 PMCID: PMC6378945 DOI: 10.1093/bioinformatics/bty675] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/02/2018] [Revised: 07/10/2018] [Accepted: 07/31/2018] [Indexed: 01/23/2023] Open

Abstract

Motivation

A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, allowing granular insights and opportunities for validation of original findings. However, problems arise when validating against a “gold standard” measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a “gold standard” we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies.

Results

We assess technologies including the Infinium MethylationEPIC BeadChip, whole genome bisulfite sequencing (WGBS), two different RNA-Seq protocols (PolyA+ and Ribo-Zero) and five different gene expression array platforms. Each technology thus is characterised herein, relative to the consensus. We showcase a number of applications of the row-linear model, including correlation with known interfering traits. We demonstrate a clear effect of cross-hybridisation on the sensitivity of Infinium methylation arrays. Additionally, we perform a true interlaboratory test on a set of samples interrogated on the same platform across twenty-one separate testing laboratories.

Availability and implementation

A full implementation of the row-linear model, plus extra functions for visualisation, is available in the R package consensus at https://github.com/timpeters82/consensus.

Supplementary information

Supplementary data are available at Bioinformatics online.
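
As a rough illustration of the consensus idea described under Motivation, the sketch below regresses each technology's measurements on the per-locus mean across all technologies, reading the fitted slope as sensitivity and the residual standard deviation as a measure of imprecision. This is a simplified reading of a row-linear fit under stated assumptions; the function name `row_linear_fit` and the toy data are hypothetical, and the code does not reproduce the API of the `consensus` R package.

```python
import math


def row_linear_fit(measurements):
    """Fit each technology against the per-locus consensus.

    measurements: dict of technology name -> list of values, one per locus.
    Returns dict of technology -> (intercept, slope, residual_sd).
    """
    techs = list(measurements)
    n_loci = len(measurements[techs[0]])
    # Consensus at each locus: the mean over all technologies.
    consensus = [
        sum(measurements[t][j] for t in techs) / len(techs) for j in range(n_loci)
    ]
    c_mean = sum(consensus) / n_loci
    c_ss = sum((c - c_mean) ** 2 for c in consensus)

    fits = {}
    for t in techs:
        y = measurements[t]
        y_mean = sum(y) / n_loci
        # Ordinary least-squares slope and intercept against the consensus.
        slope = sum((c - c_mean) * (v - y_mean) for c, v in zip(consensus, y)) / c_ss
        intercept = y_mean - slope * c_mean
        resid_ss = sum(
            (v - (intercept + slope * c)) ** 2 for c, v in zip(consensus, y)
        )
        residual_sd = math.sqrt(resid_ss / (n_loci - 2))
        fits[t] = (intercept, slope, residual_sd)
    return fits


# Toy data: three hypothetical platforms measured at four loci.
data = {
    "platform_a": [1.0, 2.0, 3.0, 4.0],
    "platform_b": [2.0, 4.0, 6.0, 8.0],
    "platform_c": [0.0, 0.0, 0.0, 0.0],
}
fits = row_linear_fit(data)
```

Because the consensus is the mean of the technologies themselves, the fitted slopes average to 1 by construction, so a slope above 1 indicates a technology that responds more strongly than the consensus and a slope below 1 one that responds more weakly.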

Affiliation(s)

Timothy J Peters Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Hugh J French Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Liverpool, NSW, Australia
Stephen T Bradford Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; CSIRO Health and Biosecurity, North Ryde, NSW, Australia
Ruth Pidsley Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Clare Stirzaker Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW, Darlinghurst, NSW, Australia
Hilal Varinli Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; CSIRO Health and Biosecurity, North Ryde, NSW, Australia; Department of Biological Sciences, Macquarie University, North Ryde, NSW, Australia; NSW Ministry of Health, LMB 961, North Sydney, NSW, Australia
Shalima Nair Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Wenjia Qu Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Jenny Song Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Katherine A Giles Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Aaron L Statham Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
Helen Speirs Ramaciotti Centre for Genomics, University of New South Wales, Randwick, NSW, Australia
Terence P Speed Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; Department of Mathematics & Statistics, University of Melbourne, Melbourne, VIC, Australia
Susan J Clark Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW, Darlinghurst, NSW, Australia

Lim SB, Tan SJ, Lim WT, Lim CT. Compendiums of cancer transcriptomes for machine learning applications. Sci Data 2019;6:194. [PMID: 31594947 PMCID: PMC6783425 DOI: 10.1038/s41597-019-0207-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/15/2019] [Accepted: 07/25/2019] [Indexed: 12/18/2022] Open

Akter S, Xu D, Nagel SC, Bromfield JJ, Pelch K, Wilshire GB, Joshi T. Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data. Front Genet 2019;10:766. [PMID: 31552087 PMCID: PMC6737999 DOI: 10.3389/fgene.2019.00766] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/18/2019] [Accepted: 07/19/2019] [Indexed: 12/29/2022] Open

Franks JM, Cai G, Whitfield ML. Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data. Bioinformatics 2019;34:1868-1874. [PMID: 29360996 DOI: 10.1093/bioinformatics/bty026] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/18/2017] [Accepted: 01/16/2018] [Indexed: 12/22/2022] Open

Taroni JN, Grayson PC, Hu Q, Eddy S, Kretzler M, Merkel PA, Greene CS. MultiPLIER: A Transfer Learning Framework for Transcriptomics Reveals Systemic Features of Rare Disease. Cell Syst 2019;8:380-394.e4. [PMID: 31121115 PMCID: PMC6538307 DOI: 10.1016/j.cels.2019.04.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 08/28/2018] [Revised: 01/15/2019] [Accepted: 04/12/2019] [Indexed: 12/22/2022]

Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif Intell Med 2019;95:133-145. [DOI: 10.1016/j.artmed.2018.10.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 06/21/2018] [Revised: 10/23/2018] [Accepted: 10/23/2018] [Indexed: 01/14/2023]

Bobak CA, Titus AJ, Hill JE. Comparison of common machine learning models for classification of tuberculosis using transcriptional biomarkers from integrated datasets. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2018.10.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/28/2022]

Chen C, Meng Q, Xia Y, Ding C, Wang L, Dai R, Cheng L, Gunaratne P, Gibbs RA, Min S, Coarfa C, Reid JG, Zhang C, Jiao C, Jiang Y, Giase G, Thomas A, Fitzgerald D, Brunetti T, Shieh A, Xia C, Wang Y, Wang Y, Badner JA, Gershon ES, White KP, Liu C. The transcription factor POU3F2 regulates a gene coexpression network in brain tissue from patients with psychiatric disorders. Sci Transl Med 2018;10:eaat8178. [PMID: 30545964 PMCID: PMC6494100 DOI: 10.1126/scitranslmed.aat8178] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 04/09/2018] [Revised: 07/26/2018] [Accepted: 11/07/2018] [Indexed: 12/22/2022]

Affiliation(s)

Chao Chen Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
Qingtuan Meng Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
Yan Xia Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
Chaodong Ding Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
Le Wang Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; Child Health Institute of New Jersey, Department of Neuroscience, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
Rujia Dai Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
Lijun Cheng Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
Preethi Gunaratne Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
Richard A Gibbs Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Shishi Min Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
Cristian Coarfa Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Jeffrey G Reid Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown, NY, USA
Chunling Zhang Department of Neuroscience and Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
Chuan Jiao Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
Yi Jiang Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
Gina Giase School of Public Health, University of Illinois at Chicago, Chicago, IL, USA
Amber Thomas Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
Dominic Fitzgerald Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
Tonya Brunetti Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Annie Shieh Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
Cuihua Xia Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
Yongjun Wang The Second Xiangya Hospital, Central South University, Changsha, China
Yunpeng Wang Norwegian Centre for Mental Disorders Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; LifeSpan Changes in Brain and Cognition (LCBC), Department of Psychology, University of Oslo, Oslo, Norway
Judith A Badner Department of Psychiatry, Rush University Medical Center, Chicago, IL, USA
Elliot S Gershon Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL, USA
Kevin P White Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA; Tempus Labs Inc., Chicago, IL, USA
Chunyu Liu Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China; Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA; Department of Psychology, Shaanxi Normal University, Xi'an, China

Pedersen CB, Nielsen FC, Rossing M, Olsen LR. Using microarray-based subtyping methods for breast cancer in the era of high-throughput RNA sequencing. Mol Oncol 2018;12:2136-2146. [PMID: 30289602 PMCID: PMC6275246 DOI: 10.1002/1878-0261.12389] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/09/2018] [Revised: 09/19/2018] [Accepted: 09/25/2018] [Indexed: 11/30/2022] Open

Johnson NT, Dhroso A, Hughes KJ, Korkin D. Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers? RNA (NEW YORK, N.Y.) 2018;24:1119-1132. [PMID: 29941426 PMCID: PMC6097660 DOI: 10.1261/rna.062802.117] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/30/2017] [Accepted: 06/03/2018] [Indexed: 05/09/2023]

Xiang R, Hayes BJ, Vander Jagt CJ, MacLeod IM, Khansefid M, Bowman PJ, Yuan Z, Prowse-Wilkins CP, Reich CM, Mason BA, Garner JB, Marett LC, Chen Y, Bolormaa S, Daetwyler HD, Chamberlain AJ, Goddard ME. Genome variants associated with RNA splicing variations in bovine are extensively shared between tissues. BMC Genomics 2018;19:521. [PMID: 29973141 PMCID: PMC6032541 DOI: 10.1186/s12864-018-4902-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 11/16/2017] [Accepted: 06/27/2018] [Indexed: 12/12/2022] Open

Abstract

Background

Mammalian phenotypes are shaped by numerous genome variants, many of which may regulate gene transcription or RNA splicing. To identify variants with regulatory functions in cattle, an important economic and model species, we used sequence variants to map a type of expression quantitative trait loci (expression QTLs) that are associated with variations in RNA splicing, i.e., sQTLs. To further the understanding of regulatory variants, sQTLs were compared with two other types of expression QTLs, 1) variants associated with variations in gene expression, i.e., geQTLs and 2) variants associated with variations in exon expression, i.e., eeQTLs, in different tissues.

Results

Using whole genome and RNA sequence data from four tissues of over 200 cattle, sQTLs identified using exon inclusion ratios were verified by matching their effects on adjacent intron excision ratios. sQTLs contained the highest percentage of variants that are within the intronic region of genes and contained the lowest percentage of variants that are within intergenic regions, compared to eeQTLs and geQTLs. Many geQTLs and sQTLs are also detected as eeQTLs. Many expression QTLs, including sQTLs, were significant in all four tissues and had a similar effect in each tissue. To verify such expression QTL sharing between tissues, variants surrounding (±1 Mb) the exon or gene were used to build local genomic relationship matrices (LGRM) and to estimate genetic correlations between tissues. For many exons, the splicing and expression level was determined by the same cis additive genetic variance in different tissues. Thus, an effective but simple-to-implement meta-analysis combining information from three tissues is introduced to increase power to detect and validate sQTLs. sQTLs and eeQTLs together were more enriched for variants associated with cattle complex traits, compared to geQTLs. Several putative causal mutations were identified, including an sQTL at Chr6:87392580 within the 5th exon of kappa casein (CSN3) associated with milk production traits.

Conclusions

Using novel analytical approaches, we report the first identification of numerous bovine sQTLs which are extensively shared between multiple tissue types. The significant overlaps between bovine sQTLs and complex traits QTL highlight the contribution of regulatory mutations to phenotypic variations.

Electronic supplementary material

The online version of this article (10.1186/s12864-018-4902-8) contains supplementary material, which is available to authorized users.

Affiliation(s)

Ruidong Xiang Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Ben J Hayes Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St. Lucia, QLD, 4067, Australia
Christy J Vander Jagt Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Iona M MacLeod Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Majid Khansefid Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Phil J Bowman Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
Zehu Yuan Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Claire P Prowse-Wilkins Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Coralie M Reich Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Brett A Mason Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Josie B Garner Agriculture Victoria, Dairy Production Science, Ellinbank, VIC, 3821, Australia
Leah C Marett Agriculture Victoria, Dairy Production Science, Ellinbank, VIC, 3821, Australia
Yizhou Chen Elizabeth Macarthur Agricultural Institute, New South Wales Department of Primary Industries, Camden, NSW, 2570, Australia
Sunduimijid Bolormaa Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Hans D Daetwyler Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
Amanda J Chamberlain Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
Michael E Goddard Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia

Thompson JA, Christensen BC, Marsit CJ. Methylation-to-Expression Feature Models of Breast Cancer Accurately Predict Overall Survival, Distant-Recurrence Free Survival, and Pathologic Complete Response in Multiple Cohorts. Sci Rep 2018;8:5190. [PMID: 29581450 PMCID: PMC5979962 DOI: 10.1038/s41598-018-23494-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/13/2017] [Accepted: 03/13/2018] [Indexed: 12/03/2022] Open

Song Y, Yan Z. Exploring of the molecular mechanism of rhinitis via bioinformatics methods. Mol Med Rep 2017;17:3014-3020. [PMID: 29257233 PMCID: PMC5783521 DOI: 10.3892/mmr.2017.8213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/22/2017] [Accepted: 10/06/2017] [Indexed: 12/27/2022] Open

Zhang W, Wang J, Menon S. Advancing cancer drug development through precision medicine and innovative designs. J Biopharm Stat 2017;28:229-244. [PMID: 29173004 DOI: 10.1080/10543406.2017.1402784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023]

Tan J, Huyck M, Hu D, Zelaya RA, Hogan DA, Greene CS. ADAGE signature analysis: differential expression analysis with data-defined gene sets. BMC Bioinformatics 2017;18:512. [PMID: 29166858 PMCID: PMC5700673 DOI: 10.1186/s12859-017-1905-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/27/2017] [Accepted: 11/01/2017] [Indexed: 12/18/2022] Open

Abstract

BACKGROUND

Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data.

RESULTS

Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr.

CONCLUSIONS

We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.


Tan J, Doing G, Lewis KA, Price CE, Chen KM, Cady KC, Perchuk B, Laub MT, Hogan DA, Greene CS. Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks. Cell Syst 2017;5:63-71.e6. [PMID: 28711280 PMCID: PMC5532071 DOI: 10.1016/j.cels.2017.06.003] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2016] [Revised: 04/11/2017] [Accepted: 06/08/2017] [Indexed: 01/18/2023]

Way GP, Allaway RJ, Bouley SJ, Fadul CE, Sanchez Y, Greene CS. A machine learning classifier trained on cancer transcriptomes detects NF1 inactivation signal in glioblastoma. BMC Genomics 2017;18:127. [PMID: 28166733 PMCID: PMC5292791 DOI: 10.1186/s12864-017-3519-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2016] [Accepted: 01/26/2017] [Indexed: 12/14/2022] Open

Abstract

BACKGROUND

We have identified molecules that exhibit synthetic lethality in cells with loss of the neurofibromin 1 (NF1) tumor suppressor gene. However, recognizing tumors that have inactivation of the NF1 tumor suppressor function is challenging because the loss may occur via mechanisms that do not involve mutation of the genomic locus. Degradation of the NF1 protein, independent of NF1 mutation status, phenocopies inactivating mutations to drive tumors in human glioma cell lines. NF1 inactivation may alter the transcriptional landscape of a tumor and allow a machine learning classifier to detect which tumors will benefit from synthetic lethal molecules.

RESULTS

We developed a strategy to predict tumors with low NF1 activity and hence tumors that may respond to treatments that target cells lacking NF1. Using RNA-Seq data from The Cancer Genome Atlas (TCGA), we trained an ensemble of 500 logistic regression classifiers that integrates mutation status with whole transcriptomes to predict NF1 inactivation in glioblastoma (GBM). On TCGA data, the classifier detected NF1 mutated tumors (test set area under the receiver operating characteristic curve (AUROC) mean = 0.77, 95% quantile = 0.53–0.95) over 50 random initializations. On RNA-Seq data transformed into the space of gene expression microarrays, this method produced a classifier with similar performance (test set AUROC mean = 0.77, 95% quantile = 0.53–0.96). We applied our ensemble classifier trained on the transformed TCGA data to a microarray validation set of 12 samples with matched RNA and NF1 protein-level measurements. The classifier's NF1 score was associated with NF1 protein concentration in these samples.

CONCLUSIONS

We demonstrate that TCGA can be used to train accurate predictors of NF1 inactivation in GBM. The ensemble classifier performed well for samples with very high or very low NF1 protein concentrations but had mixed performance in samples with intermediate NF1 concentrations. Nevertheless, high-performing and validated predictors have the potential to be paired with targeted therapies and personalized medicine.
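To make the evaluation terms in this abstract concrete, the sketch below shows how an ensemble of logistic models can be scored by averaging member probabilities, and how AUROC can be computed from labels and scores. It is an illustration under stated assumptions, not the authors' pipeline: the member weights, feature vectors, and labels are hypothetical, and the averaging rule is assumed rather than taken from the paper.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + exp(-z))


def ensemble_score(features, members):
    """Average predicted probability across logistic models given as (weights, bias)."""
    probs = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in members
    ]
    return sum(probs) / len(probs)


def auroc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ensemble of three members over two expression features.
members = [([1.2, -0.7], 0.1), ([0.9, -0.4], -0.2), ([1.5, -1.0], 0.0)]

# Hypothetical test samples; label 1 means NF1 inactivated.
samples = [[2.0, 0.5], [0.3, 0.2], [0.4, 1.5], [-1.0, 0.8]]
labels = [1, 1, 0, 0]

scores = [ensemble_score(x, members) for x in samples]
print(f"AUROC: {auroc(labels, scores):.2f}")
```

Repeating the train and test split under different random initializations and summarizing the resulting AUROC values yields a mean and quantile range of the kind reported above.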
