Chang YT, Yao CT, Su SL, Chou YC, Chu CM, Huang CS, Terng HJ, Chou HL, Wetter T, Chen KH, Chang CW, Shih YW, Lai CH. Verification of gene expression profiles for colorectal cancer using 12 internet public microarray datasets.
World J Gastroenterol 2014; 20:17476-17482. [PMID: 25516661 PMCID: PMC4265608 DOI: 10.3748/wjg.v20.i46.17476]
[Received: 07/04/2013] [Accepted: 03/13/2014]
Abstract
AIM: To verify gene expression profiles for colorectal cancer using 12 internet public microarray datasets.
METHODS: Logistic regression analysis was performed, and odds ratios for each gene were determined between colorectal cancer (CRC) cases and controls. Twelve public microarray datasets (GSE4107, GSE4183, GSE8671, GSE9348, GSE10961, GSE13067, GSE13294, GSE13471, GSE14333, GSE15960, GSE17538 and GSE18105), comprising 519 adenocarcinoma cases and 88 normal mucosa controls, were pooled and used to verify 17 genes selected from 3 published studies and to estimate their external generalizability.
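The per-gene odds ratio mentioned in METHODS comes from the coefficient of a logistic regression: with logit(p) = b0 + b1*x, the odds ratio per unit increase in expression x is exp(b1). The sketch below fits a single-gene model by Newton-Raphson in plain Python. It assumes one normalized expression value per sample and a binary label (1 = CRC, 0 = control); the helper name `fit_logistic_1d` and the toy data are hypothetical and do not reflect the authors' software, preprocessing or results. The multivariate models in the study would extend this to several genes at once.

```python
import math

def fit_logistic_1d(x, y, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson (hypothetical helper).

    Returns (b0, b1); the odds ratio per unit increase in x is exp(b1).
    Assumes the classes are not perfectly separated by x, otherwise the
    maximum-likelihood estimate does not exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Hypothetical toy data: expression level of one gene and case/control labels
x_toy = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.1]
y_toy = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logistic_1d(x_toy, y_toy)
odds_ratio = math.exp(b1_hat)
print(f"b1 = {b1_hat:.3f}, odds ratio = {odds_ratio:.3f}")
```

An odds ratio above 1 means higher expression of that gene is associated with higher odds of being a CRC case; Newton-Raphson is used here only because it is the textbook way to fit this model without external libraries.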
RESULTS: We validated the 17 CRC-associated genes from the studies by Chang et al (Model 1: 5 genes), Marshall et al (Model 2: 7 genes) and Han et al (Model 3: 5 genes) and performed multivariate logistic regression analysis, with external validation, on the pooled 12 public microarray datasets. The Hosmer-Lemeshow (H-L) goodness-of-fit test was statistically significant (P = 0.044) for Model 2 (Marshall et al), indicating that observed event rates did not match expected event rates in subgroups of the model population. In contrast, Models 1, 3 and 4 were well calibrated, with similar expected and observed event rates in subgroups (non-significant H-L P values of 0.460, 0.194 and 1.000, respectively). A 7-gene model (CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB) was selected pairwise and gave the best results in the logistic regression analysis (H-L P = 1.000, R² = 0.951, area under the curve = 0.999, accuracy = 0.968, specificity = 0.966 and sensitivity = 0.994).
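The calibration and discrimination measures reported above can be sketched as follows, given predicted probabilities `p` from a fitted model and binary labels `y`. The helper names, the decile grouping for the H-L test (the abstract does not state how many groups were used), the 0.5 classification threshold and the toy data are all assumptions for illustration, not the authors' settings. A small H-L P value (as for Model 2) signals poor calibration; the AUC here uses the Mann-Whitney pairwise definition.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; this closed form is valid only for even df."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """H-L statistic and P value; samples are binned by sorted predicted probability."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        if n_g == 0:
            continue              # skip empty bins (fewer samples than groups)
        obs = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        denom = expected * (1.0 - expected / n_g)
        if denom > 0:
            stat += (obs - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)

def classification_metrics(y, p, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed probability threshold."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y),
    }

def auc(y, p):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions (1 = CRC, 0 = control); far smaller than a real cohort,
# so only 4 H-L groups are used here instead of the default 10.
y_toy = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
p_toy = [0.05, 0.1, 0.2, 0.3, 0.45, 0.55, 0.7, 0.8, 0.9, 0.95]

hl_stat, hl_p = hosmer_lemeshow(y_toy, p_toy, groups=4)
metrics = classification_metrics(y_toy, p_toy)
auc_value = auc(y_toy, p_toy)
print(hl_stat, hl_p, metrics, auc_value)
```

With the default of 10 groups the H-L statistic has 8 degrees of freedom, so the even-df closed form above covers the conventional setting without needing a statistics library.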
CONCLUSION: A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.