1. Jiao Y, Hu R, Hu S, Wang B, Huang D, Lan Z. Shenrong Wuzi Pill affects the pathway of thyroid hormone synthesis by targeting thyrotropin-releasing hormone receptor and altering cAMP production. J Herb Med 2021. [DOI: 10.1016/j.hermed.2021.100470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

2. Alshawaqfeh M, Al Kawam A, Serpedin E, Datta A. Robust Recurrent CNV Detection in the Presence of Inter-Subject Variability. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1056-1067. [PMID: 30387737 DOI: 10.1109/tcbb.2018.2878560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The study of recurrent copy number variations (CNVs) plays an important role in understanding the onset and evolution of complex diseases such as cancer. Array-based comparative genomic hybridization (aCGH) is a widely used microarray based technology for identifying CNVs. However, due to high noise levels and inter-sample variability, detecting recurrent CNVs from aCGH data remains a challenging topic. This paper proposes a novel method for identification of the recurrent CNVs. In the proposed method, the noisy aCGH data is modeled as the superposition of three matrices: a full-rank matrix of weighted piece-wise generating signals accounting for the clean aCGH data, a Gaussian noise matrix to model the inherent experimentation errors and other sources of error, and a sparse matrix to capture the sparse inter-sample (sample-specific) variations. We demonstrated the ability of our method to separate accurately recurrent CNVs from sample-specific variations and noise in both simulated (artificial) data and real data. The proposed method produced more accurate results than current state-of-the-art methods used in recurrent CNV detection and exhibited robustness to noise and sample-specific variations.

3. Gao X. Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations. BMC Bioinformatics 2015; 16:407. [PMID: 26652207 PMCID: PMC4676147 DOI: 10.1186/s12859-015-0835-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/22/2015] [Accepted: 11/23/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With increasing resolution of array-based comparative genomic hybridization (aCGH) arrays, more and more raw copy number data are collected for multiple arrays. It is natural to realize the co-existence of both recurrent and individual-specific CNVs, together with the possible data contamination during the data generation process. Therefore, there is a great need for an efficient and robust statistical model for simultaneous recovery of both recurrent and individual-specific CNVs. RESULT We develop a penalized weighted low-rank approximation method (WPLA) for robust recovery of recurrent CNVs. In particular, we formulate multiple aCGH arrays into a realization of a hidden low-rank matrix with some random noises and let an additional weight matrix account for those individual-specific effects. Thus, we do not restrict the random noise to be normally distributed, or even homogeneous. We show its performance through three real datasets and twelve synthetic datasets from different types of recurrent CNV regions associated with either normal random errors or heavily contaminated errors. CONCLUSION Our numerical experiments have demonstrated that the WPLA can successfully recover the recurrent CNV patterns from raw data under different scenarios. Compared with two other recent methods, it performs the best regarding its ability to simultaneously detect both recurrent and individual-specific CNVs under normal random errors. More importantly, the WPLA is the only method which can effectively recover the recurrent CNVs region when the data is heavily contaminated.
Affiliation(s)
- Xiaoli Gao
- Department of Mathematics and Statistics, University of North Carolina at Greensboro, 1400 Spring Garden St, Greensboro, NC, USA.

4. Goswami S, Sharma-Walia N. Osteoprotegerin secreted by inflammatory and invasive breast cancer cells induces aneuploidy, cell proliferation and angiogenesis. BMC Cancer 2015; 15:935. [PMID: 26608463 PMCID: PMC4660791 DOI: 10.1186/s12885-015-1837-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Accepted: 10/19/2015] [Indexed: 12/12/2022]
Abstract
Background Osteoprotegerin (OPG) is a glycoprotein that has multifaceted role and is associated with several cancer malignancies like that of bladder carcinoma, gastric carcinoma, prostate cancer, multiple myeloma and breast cancer. Also OPG has been associated with several organ pathologies. The widespread expression of OPG suggests that OPG may have multiple biological activities that are yet to be explored. Methods The anchorage-independent sphere cultures of the adherent cells were instrumental in our study as it provided a deeper insight into the complexity of a 3D tumor. Cytokine profiling was performed for OPG’s detection in the microenvironment. ELISA and western blotting were performed to quantify the OPG secretion and measure the protein levels respectively. OPG expression was detected in human breast cancer tissue samples by IHC. To decipher OPG’s role in tumor aggressiveness both recombinant human OPG as well as OPG rich and depleted breast cancer cell conditioned media were tested. Western blotting and MTT assay were performed to detect changes in signaling pathways and proliferation that were induced in presence of OPG. Onset of aneuploidy, in presence of OPG, was measured by cell cycle analysis and western blotting. Finally, human Breast Cancer qBiomarker Copy Number PCR Array was used to detect how OPG remarkably induced gene copy numbers for oncogenic pathway regulators. Results SUM149PT and SUM1315M02 cells secrete high levels of the cytokine OPG compared to primary human mammary epithelial cells (HMEC). High expression of OPG was also detected in human breast cancer tissue samples compared to the uninvolved tissue from the same patient. OPG induced proliferation of control HMEC spheres and triggered the onset of aneuploidy in HMEC sphere cultures. 
OPG induced the expression of aneuploidy related kinases Aurora-A Kinase (IAK-1), Bub1 and BubR1 probably through the receptor activator of nuclear factor kappa-B ligand (RANKL) and syndecan-1 receptors via the Erk, AKT and GSK3β signaling pathway. In the presence of OPG, gene copy numbers increased for oncogenic pathway regulators such as AKT1, Aurora-A Kinase (AURKA or IAK-1), epidermal growth factor receptor (EGFR) and MYC, and decreased for cyclin dependent kinase inhibitor 2A (CDKN2A), PTEN and DNA topoisomerase 2 alpha (TOP2A). Conclusions These results highlight the role of OPG in reprogramming normal mammary epithelial cells to a tumorigenic state and suggest promising avenues for treating inflammatory breast cancer as well as highly invasive breast cancer with new therapeutic targets. Electronic supplementary material The online version of this article (doi:10.1186/s12885-015-1837-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sudeshna Goswami
- Department of Microbiology and Immunology, H. M. Bligh Cancer Research Laboratories, Chicago Medical School, Rosalind Franklin University of Medicine and Science, 3333 Green Bay Road, North Chicago, IL, 60064, USA.
- Neelam Sharma-Walia
- Department of Microbiology and Immunology, H. M. Bligh Cancer Research Laboratories, Chicago Medical School, Rosalind Franklin University of Medicine and Science, 3333 Green Bay Road, North Chicago, IL, 60064, USA.

5. Genome-wide scan identifies a copy number variable region at 3p21.1 that influences the TLR9 expression levels in IgA nephropathy patients. Eur J Hum Genet 2014; 23:940-8. [PMID: 25293716 DOI: 10.1038/ejhg.2014.208] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 10/23/2013] [Revised: 08/01/2014] [Accepted: 08/29/2014] [Indexed: 11/08/2022]
Abstract
Immunoglobulin A nephropathy (IgAN) is a complex multifactorial disease characterized by genetic factors that influence the pathogenesis of the disease. In this context, an intriguing role could be ascribed to copy number variants (CNVs). We performed the whole-genome screening of CNVs in familial IgAN patients, their healthy relatives and healthy subjects (HSs). In the initial screening, we included 217 individuals consisting of 51 biopsy-proven familial IgAN cases and 166 healthy relatives. We identified 148 IgAN-specific aberrations, specifically 105 loss and 43 gain, using a new statistical approach that allowed us to identify aberrations that were concordant across multiple samples. Several CNVs overlapped with regions evidenced by previous genome-wide genetic studies. We focused our attention on a CNV located in chromosome 3, which contains the TLR9 gene and found that IgAN patients characterized by deteriorated renal function carried low copy number of this CNV. Moreover, the TLR9 gene expression was low and significantly correlated with the loss aberration. Conversely, IgAN patients with normal renal function had no aberration and the TLR9 mRNA was expressed at the same level as in HSs. We confirmed our data in another cohort of Greek subjects. In conclusion, here we performed the first genome-wide CNV study in IgAN identifying structural variants that could help the genetic dissection of this complex disease, and pointed out a loss aberration in the chromosome 3, which is responsible for the downregulation of TLR9 expression that, in turn, could contribute to the deterioration of the renal function in IgAN patients.

6. Zhou X, Liu J, Wan X, Yu W. Piecewise-constant and low-rank approximation for identification of recurrent copy number variations. Bioinformatics 2014; 30:1943-9. [PMID: 24642062 DOI: 10.1093/bioinformatics/btu131] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022]
Abstract
MOTIVATION The post-genome era sees urgent need for more novel approaches to extracting useful information from the huge amount of genetic data. The identification of recurrent copy number variations (CNVs) from array-based comparative genomic hybridization (aCGH) data can help understand complex diseases, such as cancer. Most of the previous computational methods focused on single-sample analysis or statistical testing based on the results of single-sample analysis. Finding recurrent CNVs from multi-sample data remains a challenging topic worth further study. RESULTS We present a general and robust method to identify recurrent CNVs from multi-sample aCGH profiles. We express the raw dataset as a matrix and demonstrate that recurrent CNVs will form a low-rank matrix. Hence, we formulate the problem as a matrix recovering problem, where we aim to find a piecewise-constant and low-rank approximation (PLA) to the input matrix. We propose a convex formulation for matrix recovery and an efficient algorithm to globally solve the problem. We demonstrate the advantages of PLA compared with alternative methods using synthesized datasets and two breast cancer datasets. The experimental results show that PLA can successfully reconstruct the recurrent CNV patterns from raw data and achieve better performance compared with alternative methods under a wide range of scenarios. AVAILABILITY AND IMPLEMENTATION The MATLAB code is available at http://bioinformatics.ust.hk/pla.zip.
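The abstract's claim that recurrent CNVs form a low-rank matrix can be illustrated with a small sketch. It is not the PLA algorithm: the two patterns, the per-sample weights, the samples-as-rows orientation, and the helper `matrix_rank` are all hypothetical choices made for illustration. When every sample is a weighted mix of a few shared piecewise-constant patterns, the rank is bounded by the number of patterns, and a single sample-specific aberration raises it.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination using exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Two hypothetical recurrent patterns over 10 probes (piecewise constant).
gain = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
loss = [0, 0, 0, 0, 0, 0, -1, -1, 0, 0]

# Hypothetical per-sample amplitudes for (gain, loss); one row per sample.
weights = [(1, 0), (2, 1), (0, 1), (1, 1), (3, 2), (0, 0)]
profiles = [[a * g + b * l for g, l in zip(gain, loss)] for a, b in weights]
recurrent_rank = matrix_rank(profiles)

# Add one sample-specific aberration at the last probe of the last sample.
with_private = [row[:] for row in profiles]
with_private[5][9] = 1
private_rank = matrix_rank(with_private)

print(recurrent_rank, private_rank)
```

Six samples built from two shared patterns give a matrix of rank 2, and the single private aberration lifts the rank to 3. This is the kind of structure a low-rank approximation exploits when separating recurrent events from sample-specific ones.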
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Jiming Liu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Xiang Wan
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Weichuan Yu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China

7. Sykulski M, Gambin T, Bartnik M, Derwińska K, Wiśniowiecka-Kowalnik B, Stankiewicz P, Gambin A. Multiple samples aCGH analysis for rare CNVs detection. J Clin Bioinforma 2013; 3:12. [PMID: 23758813 PMCID: PMC3691624 DOI: 10.1186/2043-9113-3-12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/14/2012] [Accepted: 05/23/2013] [Indexed: 11/20/2022]
Abstract
Background DNA copy number variations (CNV) constitute an important source of genetic variability. The standard method used for CNV detection is array comparative genomic hybridization (aCGH). Results We propose a novel multiple sample aCGH analysis methodology aimed at rare CNV detection. In contrast to the majority of previous approaches, which deal with cancer datasets, we focus on constitutional genomic abnormalities identified in a diverse spectrum of diseases in humans. Our method is tested on an exon-targeted aCGH array of 366 patients affected with developmental delay/intellectual disability, epilepsy, or autism. The proposed algorithms can be applied as a post-processing filter to any given segmentation method. Conclusions Thanks to the additional information obtained from multiple samples, we could efficiently detect significant segments corresponding to rare CNVs responsible for pathogenic changes. The robust statistical framework applied in our method makes it possible to eliminate the influence of a widespread technical artifact termed ‘waves’.
Affiliation(s)
- Maciej Sykulski
- Institute of Informatics, University of Warsaw, Warsaw, Poland.

8. Niida A, Imoto S, Shimamura T, Miyano S. Statistical model-based testing to evaluate the recurrence of genomic aberrations. Bioinformatics 2013; 28:i115-20. [PMID: 22689750 PMCID: PMC3371835 DOI: 10.1093/bioinformatics/bts203] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022]
Abstract
Motivation: In cancer genomes, chromosomal regions harboring cancer genes are often subjected to genomic aberrations like copy number alteration and loss of heterozygosity. Given this, finding recurrent genomic aberrations is considered an apt approach for screening cancer genes. Although several permutation-based tests have been proposed for this purpose, none of them are designed to find recurrent aberrations from the genomic dataset without paired normal sample controls. Their application to unpaired genomic data may lead to false discoveries, because they retrieve pseudo-aberrations that exist in normal genomes as polymorphisms. Results: We develop a new parametric method named parametric aberration recurrence test (PART) to test for the recurrence of genomic aberrations. The introduction of Poisson-binomial statistics allows us to compute small P-values more efficiently and precisely than the previously proposed permutation-based approach. Moreover, we extended PART to cover unpaired data (PART-up) so that there is a statistical basis for analyzing unpaired genomic data. PART-up uses information from unpaired normal sample controls to remove pseudo-aberrations in unpaired genomic data. Using PART-up, we successfully predict recurrent genomic aberrations in cancer cell line samples whose paired normal sample controls are unavailable. This article thus proposes a powerful statistical framework for the identification of driver aberrations, which would be applicable to ever-increasing amounts of cancer genomic data seen in the era of next generation sequencing. Availability: Our implementations of PART and PART-up are available from http://www.hgc.jp/~niiyan/PART/manual.html. Contact: aniida@ims.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
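To make the Poisson-binomial idea concrete, here is a minimal sketch of an exact upper-tail P-value for the number of aberrant samples at one locus. It is not the PART implementation: it assumes, for illustration only, that each sample has its own known background aberration probability and that samples are independent. The function names and the example probabilities are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """PMF of the success count over independent Bernoulli trials
    with possibly unequal success probabilities (dynamic programming)."""
    pmf = [1.0]  # with zero trials, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def upper_tail_p_value(probs, observed):
    """P(count >= observed) under the Poisson-binomial null."""
    pmf = poisson_binomial_pmf(probs)
    return sum(pmf[observed:])


# Hypothetical per-sample background aberration probabilities at one locus.
background = [0.10, 0.20, 0.05, 0.30, 0.15]
# Hypothetical observation: 4 of the 5 samples are aberrant at this locus.
p_value = upper_tail_p_value(background, 4)
print(p_value)
```

Because the tail is computed exactly, very small P-values need no permutations, which is the efficiency argument the abstract makes. The dynamic program costs on the order of n² operations for n samples.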
Affiliation(s)
- Atushi Niida
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.

9. Zhou X, Yang C, Wan X, Zhao H, Yu W. Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:230-235. [PMID: 23702561 PMCID: PMC3715577 DOI: 10.1109/tcbb.2012.166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/02/2023]
Abstract
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
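The smoothness term mentioned in the abstract, the total variation of each sample, can be sketched as follows. This only illustrates that one penalty under assumed toy data. It does not compute the nuclear norm, which needs a singular value decomposition, and it is not the authors' optimization algorithm. The function names and the two example signals are hypothetical.

```python
def total_variation(signal):
    """Sum of absolute differences between neighboring probes."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def summed_total_variation(samples):
    """Total variation penalty added up over all samples."""
    return sum(total_variation(s) for s in samples)


# Hypothetical log-ratio profiles over 7 probes.
piecewise = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]   # one clean gain segment
jittery = [0.0, 0.3, 0.8, 1.2, 0.9, 0.2, -0.1]    # same event, noisy

tv_piecewise = total_variation(piecewise)
tv_jittery = total_variation(jittery)
print(tv_piecewise, tv_jittery)
```

The clean segment pays only for its two breakpoints, while the noisy profile pays at every probe, so minimizing this term favors piecewise-constant reconstructions of each sample.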
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China.

10. Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One 2012; 7:e52516. [PMID: 23285074 PMCID: PMC3527554 DOI: 10.1371/journal.pone.0052516] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/07/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022]
Abstract
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice since very few efforts have been focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in cancer genomes, it is imperative to conduct a comprehensive comparison of performance and limitations among existing methods. In this paper, six representative methods proposed in the latest six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria including type I error rate, detection power, Receiver Operating Characteristics (ROC) curve and the area under curve (AUC), and computational complexity, to evaluate performance of the methods under multiple simulation scenarios. We also characterize their abilities on applications to two real datasets obtained from lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. It should help accelerate the development of novel and improved methods.

11. Yuan X, Zhang J, Yang L, Zhang S, Chen B, Geng Y, Wang Y. TAGCNA: a method to identify significant consensus events of copy number alterations in cancer. PLoS One 2012; 7:e41082. [PMID: 22815924 PMCID: PMC3399811 DOI: 10.1371/journal.pone.0041082] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/03/2012] [Accepted: 06/17/2012] [Indexed: 01/20/2023]
Abstract
Somatic copy number alteration (CNA) is a common phenomenon in cancer genome. Distinguishing significant consensus events (SCEs) from random background CNAs in a set of subjects has been proven to be a valuable tool to study cancer. In order to identify SCEs with an acceptable type I error rate, better computational approaches should be developed based on reasonable statistics and null distributions. In this article, we propose a new approach named TAGCNA for identifying SCEs in somatic CNAs that may encompass cancer driver genes. TAGCNA employs a peel-off permutation scheme to generate a reasonable null distribution based on a prior step of selecting tag CNA markers from the genome being considered. We demonstrate the statistical power of TAGCNA on simulated ground truth data, and validate its applicability using two publicly available cancer datasets: lung and prostate adenocarcinoma. TAGCNA identifies SCEs that are known to be involved with proto-oncogenes (e.g. EGFR, CDK4) and tumor suppressor genes (e.g. CDKN2A, CDKN2B), and provides many additional SCEs with potential biological relevance in these data. TAGCNA can be used to analyze the significance of CNAs in various cancers. It is implemented in R and is freely available at http://tagcna.sourceforge.net/.
Affiliation(s)
- Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- * E-mail: (JZ); (YW)
- Liying Yang
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Shengli Zhang
- Department of Mathematics, Xidian University, Xi'an, People’s Republic of China
- Baodi Chen
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Yaojun Geng
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Yue Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia, United States of America
- * E-mail: (JZ); (YW)

12. Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. Neuroimage 2012; 63:310-9. [PMID: 22732562 DOI: 10.1016/j.neuroimage.2012.06.026] [Citation(s) in RCA: 233] [Impact Index Per Article: 19.4] [Received: 02/05/2012] [Revised: 06/12/2012] [Accepted: 06/15/2012] [Indexed: 11/21/2022]
Abstract
At its best, connectivity mapping can offer researchers great insight into how spatially disparate regions of the human brain coordinate activity during brain processing. A recent investigation conducted by Smith and colleagues (2011) on methods for estimating connectivity maps suggested that those which attempt to ascertain the direction of influence among ROIs rarely provide reliable results. Another problem gaining increasing attention is heterogeneity in connectivity maps. Most group-level methods require that the data come from homogeneous samples, and misleading findings may arise from current methods if the connectivity maps for individuals vary across the sample (which is likely the case). The utility of maps resulting from effective connectivity on the individual or group levels is thus diminished because they do not accurately inform researchers. The present paper introduces a novel estimation technique for fMRI researchers, Group Iterative Multiple Model Estimation (GIMME), which demonstrates that using information across individuals assists in the recovery of the existence of connections among ROIs used by Smith and colleagues (2011) and the direction of the influence. Using heterogeneous in-house data, we demonstrate that GIMME offers a unique improvement over current approaches by arriving at reliable group and individual structures even when the data are highly heterogeneous across individuals comprising the group. An added benefit of GIMME is that it obtains reliable connectivity map estimates equally well using the data from resting state, block, or event-related designs. GIMME provides researchers with a powerful, flexible tool for identifying directed connectivity maps at the group and individual levels.

13. Cutts RJ, Dayem Ullah AZ, Sangaralingam A, Gadaleta E, Lemoine NR, Chelala C. O-miner: an integrative platform for automated analysis and mining of -omics data. Nucleic Acids Res 2012; 40:W560-8. [PMID: 22600742 PMCID: PMC3394300 DOI: 10.1093/nar/gks432] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
High-throughput profiling has generated massive amounts of data across basic, clinical and translational research fields. However, open source comprehensive web tools for analysing data obtained from different platforms and technologies are still lacking. To fill this gap and the unmet computational needs of ongoing research projects, we developed O-miner, a rapid, comprehensive, efficient web tool that covers all the steps required for the analysis of both transcriptomic and genomic data starting from raw image files through in-depth bioinformatics analysis and annotation to biological knowledge extraction. O-miner was developed from a biologist end-user perspective. Hence, it is as simple to use as possible within the confines of the complexity of the data being analysed. It provides a strong analytical suite able to overlay and harness large, complicated, raw and heterogeneous sets of profiles with biological/clinical data. Biologists can use O-miner to analyse and integrate different types of data and annotations to build knowledge of relevant altered mechanisms and pathways in order to identify and prioritize novel targets for further biological validation. Here we describe the analytical workflows currently available using O-miner and present examples of use. O-miner is freely available at www.o-miner.org.
Affiliation(s)
- Rosalind J Cutts
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK

14. Miyaguchi K, Fukuoka Y, Mizushima H, Yasen M, Nemoto S, Ishikawa T, Uetake H, Tanaka S, Sugihara K, Arii S, Tanaka H. Genome-wide integrative analysis revealed a correlation between lengths of copy number segments and corresponding gene expression profile. Bioinformation 2011; 7:280-4. [PMID: 22355221 PMCID: PMC3280495 DOI: 10.6026/97320630007280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/05/2011] [Accepted: 11/08/2011] [Indexed: 12/31/2022]
Abstract
Microarray analysis has been applied to comprehensively reveal the abnormalities of DNA copy number (CN) and gene expression in human cancer research during the last decade. These analyses have individually contributed to identifying the genes associated with carcinogenesis, progression, metastasis of tumor cells and poor prognosis of cancer patients. However, it is known that profiles of CN and gene expression are not highly correlated. Factors which determine the degree of correlation remain largely unexplained. To investigate one such factor, we performed trend analyses between the lengths of CN segments and corresponding gene expression profiles from microarray data in hepatocellular carcinoma (HCC) and colorectal carcinoma (CRC). Significant correlations were observed in CN gain of HCC and CRC (p<0.05). The trend of the CN loss showed a significant correlation in HCC although there was no correlation between the length of CN loss segments and gene expression in CRC. Our findings suggest that the influence of CN on gene expression highly depends on the length of the CN region, especially in the case of CN gain. To the best of our knowledge, this is the first study describing the correlation between lengths of CNA segments and expression profiles of corresponding genes.
Affiliation(s)
- Yutaka Fukuoka
- Department of Biosystem Modeling, Graduate School of Biomedical Science, Tokyo Medical and Dental University, Japan
- Hiroshi Mizushima
- Center for Public Health Informatics, National Institute of Public Health, Japan
- Kenichi Sugihara
- Department of Surgical Oncology, Graduate School, Tokyo Medical and Dental University, Japan

15. Broadbent KM, Park D, Wolf AR, Van Tyne D, Sims JS, Ribacke U, Volkman S, Duraisingh M, Wirth D, Sabeti PC, Rinn JL. A global transcriptional analysis of Plasmodium falciparum malaria reveals a novel family of telomere-associated lncRNAs. Genome Biol 2011; 12:R56. [PMID: 21689454 PMCID: PMC3218844 DOI: 10.1186/gb-2011-12-6-r56] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Received: 02/07/2011] [Revised: 04/27/2011] [Accepted: 06/20/2011] [Indexed: 02/08/2023]
Abstract
BACKGROUND Mounting evidence suggests a major role for epigenetic feedback in Plasmodium falciparum transcriptional regulation. Long non-coding RNAs (lncRNAs) have recently emerged as a new paradigm in epigenetic remodeling. We therefore set out to investigate putative roles for lncRNAs in P. falciparum transcriptional regulation. RESULTS We used a high-resolution DNA tiling microarray to survey transcriptional activity across 22.6% of the P. falciparum strain 3D7 genome. We identified 872 protein-coding genes and 60 putative P. falciparum lncRNAs under developmental regulation during the parasite's pathogenic human blood stage. Further characterization of lncRNA candidates led to the discovery of an intriguing family of lncRNA telomere-associated repetitive element transcripts, termed lncRNA-TARE. We have quantified lncRNA-TARE expression at 15 distinct chromosome ends and mapped putative transcriptional start and termination sites of lncRNA-TARE loci. Remarkably, we observed coordinated and stage-specific expression of lncRNA-TARE on all chromosome ends tested, and two dominant transcripts of approximately 1.5 kb and 3.1 kb transcribed towards the telomere. CONCLUSIONS We have characterized a family of 22 telomere-associated lncRNAs in P. falciparum. Homologous lncRNA-TARE loci are coordinately expressed after parasite DNA replication, and are poised to play an important role in P. falciparum telomere maintenance, virulence gene regulation, and potentially other processes of parasite chromosome end biology. Further study of lncRNA-TARE and other promising lncRNA candidates may provide mechanistic insight into P. falciparum transcriptional regulation.
Affiliation(s)
- Kate M Broadbent
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
- Daniel Park
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Ashley R Wolf
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
- Daria Van Tyne
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
- Jennifer S Sims
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
- Ulf Ribacke
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
- Sarah Volkman
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
  - School of Nursing and Health Sciences, Simmons College, 300 The Fenway, Boston, MA 02115, USA
- Manoj Duraisingh
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
- Dyann Wirth
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
- Pardis C Sabeti
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
  - FAS Center for Systems Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- John L Rinn
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
  - Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
  - Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA
|
16
|
Nowak G, Hastie T, Pollack JR, Tibshirani R. A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics 2011; 12:776-91. [PMID: 21642389 DOI: 10.1093/biostatistics/kxr012] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
Array-based comparative genomic hybridization (aCGH) enables the measurement of DNA copy number across thousands of locations in a genome. The main goals of analyzing aCGH data are to identify the regions of copy number variation (CNV) and to quantify the amount of CNV. Although there are many methods for analyzing single-sample aCGH data, the analysis of multi-sample aCGH data is a relatively new area of research. Further, many of the current approaches for analyzing multi-sample aCGH data do not appropriately utilize the additional information present in the multiple samples. We propose a procedure called the Fused Lasso Latent Feature Model (FLLat) that provides a statistical framework for modeling multi-sample aCGH data and identifying regions of CNV. The procedure involves modeling each sample of aCGH data as a weighted sum of a fixed number of features. Regions of CNV are then identified through an application of the fused lasso penalty to each feature. Some simulation analyses show that FLLat outperforms single-sample methods when the simulated samples share common information. We also propose a method for estimating the false discovery rate. An analysis of an aCGH data set obtained from human breast tumors, focusing on chromosomes 8 and 17, shows that FLLat and Significance Testing of Aberrant Copy number (an alternative, existing approach) identify similar regions of CNV that are consistent with previous findings. However, through the estimated features and their corresponding weights, FLLat is further able to discern specific relationships between the samples, for example, identifying 3 distinct groups of samples based on their patterns of CNV for chromosome 17.
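For orientation, the model described in this abstract can be written compactly. The notation below is assumed for illustration and does not appear in the abstract: $y_{ls}$ is the aCGH value at location $l$ in sample $s$, $\beta_{lj}$ is the value of feature $j$ at location $l$, $\theta_{js}$ is the weight of feature $j$ in sample $s$, $J$ is the fixed number of features, $L$ is the number of locations, and $\lambda_1, \lambda_2$ are the tuning parameters of a standard fused lasso penalty.

```latex
y_{ls} = \sum_{j=1}^{J} \beta_{lj}\,\theta_{js} + \varepsilon_{ls},
\qquad
\mathrm{pen}(\beta_{\cdot j}) = \lambda_1 \sum_{l=1}^{L} \lvert \beta_{lj} \rvert
  + \lambda_2 \sum_{l=2}^{L} \lvert \beta_{lj} - \beta_{l-1,j} \rvert
```

In this form the first penalty term favors features that are zero over most locations, and the second favors piecewise-constant features, so nonzero stretches of a feature mark candidate regions of CNV.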
Affiliation(s)
- Gen Nowak
  - Department of Biostatistics, Harvard University, Boston, MA 02115, USA.
|
17
|
|
18
|
Hur Y, Lee H. Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays. BMC Bioinformatics 2011; 12:146. [PMID: 21569311 PMCID: PMC3114745 DOI: 10.1186/1471-2105-12-146] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 10/13/2010] [Accepted: 05/11/2011] [Indexed: 11/10/2022]
Abstract
Background: Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations span a wide range of chromosomes, making it hard to distinguish cancer related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become an important issue to develop a computational method to detect driving genes related to cancer development located in the focal regions of CNAs.
Results: In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). The use of the wavelet analysis, because it is a multi-resolution approach, makes it possible to effectively identify focal genomic aberrations in broadly aberrant regions. The proposed method integrates multiple cancer samples so that it enables the detection of the consistent aberrations across multiple samples. We then apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform. Through this process, we confirm the ability to detect previously known cancer related genes from both cancer types with high accuracy. Also, the application of this approach to a lung cancer data set identifies focal amplification regions that contain known oncogenes, though these regions are not reported using a recent CNAs detecting algorithm GISTIC: SMAD7 (chr18q21.1) and FGF10 (chr5p12).
Conclusions: Our results suggest that WIFA can be used to reveal cancer related genes in various cancer data sets.
Affiliation(s)
- Youngmi Hur
  - Dept. of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
|
19
|
Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 2011; 12:R41. [PMID: 21527027 PMCID: PMC3218867 DOI: 10.1186/gb-2011-12-4-r41] [Citation(s) in RCA: 2186] [Impact Index Per Article: 168.2] [Received: 08/18/2010] [Revised: 02/14/2011] [Accepted: 04/28/2011] [Indexed: 12/18/2022]
Abstract
We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.
Affiliation(s)
- Craig H Mermel
  - Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
|
20
|
Walter V, Nobel AB, Wright FA. DiNAMIC: a method to identify recurrent DNA copy number aberrations in tumors. Bioinformatics 2011; 27:678-85. [PMID: 21183584 PMCID: PMC3042182 DOI: 10.1093/bioinformatics/btq717] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 09/30/2010] [Revised: 12/06/2010] [Accepted: 12/21/2010] [Indexed: 12/16/2022]
Abstract
MOTIVATION: DNA copy number gains and losses are commonly found in tumor tissue, and some of these aberrations play a role in tumor genesis and development. Although high resolution DNA copy number data can be obtained using array-based techniques, no single method is widely used to distinguish between recurrent and sporadic copy number aberrations.
RESULTS: Here we introduce Discovering Copy Number Aberrations Manifested In Cancer (DiNAMIC), a novel method for assessing the statistical significance of recurrent copy number aberrations. In contrast to competing procedures, the testing procedure underlying DiNAMIC is carefully motivated, and employs a novel cyclic permutation scheme. Extensive simulation studies show that DiNAMIC controls false positive discoveries in a variety of realistic scenarios. We use DiNAMIC to analyze two publicly available tumor datasets, and our results show that DiNAMIC detects multiple loci that have biological relevance.
AVAILABILITY: Source code implemented in R, as well as text files containing examples and sample datasets are available at http://www.bios.unc.edu/research/genomic_software/DiNAMIC.
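The abstract names a cyclic permutation scheme without spelling it out. A minimal sketch of one plausible reading follows, assuming each tumor's profile is rotated by an independent random offset and the maximum column sum serves as the recurrence statistic. The function names, the choice of statistic, and the add-one p-value correction are illustrative assumptions; none of them is taken from the abstract or from the DiNAMIC R code.

```python
import random


def cyclic_shift(profile, offset):
    """Rotate a non-empty list of per-marker values left by `offset` positions."""
    k = offset % len(profile)
    return profile[k:] + profile[:k]


def max_column_sum(matrix):
    """Largest per-marker sum across samples (rows = samples, columns = markers)."""
    return max(sum(column) for column in zip(*matrix))


def cyclic_permutation_p_value(matrix, n_perm=1000, seed=0):
    """Estimate how often independently rotated rows match the observed statistic.

    Rotating each row keeps its internal ordering (and so its local
    correlation) while breaking the alignment between samples.
    """
    rng = random.Random(seed)
    observed = max_column_sum(matrix)
    hits = 0
    for _ in range(n_perm):
        shifted = [cyclic_shift(row, rng.randrange(len(row))) for row in matrix]
        if max_column_sum(shifted) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Rotating rather than fully shuffling each profile is the point of the sketch: neighbouring markers stay neighbours, so only the cross-sample alignment is randomized.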
Affiliation(s)
- Vonn Walter
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
|
21
|
Mei TS, Salim A, Calza S, Seng KC, Seng CK, Pawitan Y. Identification of recurrent regions of Copy-Number Variants across multiple individuals. BMC Bioinformatics 2010; 11:147. [PMID: 20307285 PMCID: PMC2851607 DOI: 10.1186/1471-2105-11-147] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 10/15/2009] [Accepted: 03/22/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Algorithms and software for CNV detection have been developed, but they detect the CNV regions sample-by-sample with individual-specific breakpoints, while common CNV regions are likely to occur at the same genomic locations across different individuals in a homogenous population. Current algorithms to detect common CNV regions do not account for the varying reliability of the individual CNVs, typically reported as confidence scores by SNP-based CNV detection algorithms. General methodologies for identifying these recurrent regions, especially those directed at SNP arrays, are still needed.
RESULTS: In this paper, we describe two new approaches for identifying common CNV regions based on (i) the frequency of occurrence of reliable CNVs, where reliability is determined by high confidence scores, and (ii) a weighted frequency of occurrence of CNVs, where the weights are determined by the confidence scores. In addition, motivated by the fact that we often observe partially overlapping CNV regions as a mixture of two or more distinct subregions, regions identified using the two approaches can be fine-tuned to smaller sub-regions using a clustering algorithm. We compared the performance of the methods with sequencing-based results in terms of discordance rates, rates of departure from Hardy-Weinberg equilibrium (HWE) and average frequency and size of the identified regions. The discordance rates as well as the rates of departure from HWE decrease when we select CNVs with higher confidence scores. We also performed comparisons with two previously published methods, STAC and GISTIC, and showed that the methods we consider are better at identifying low-frequency but high-confidence CNV regions.
CONCLUSIONS: The proposed methods for identifying common CNV regions in multiple individuals perform well compared to existing methods. The identified common regions can be used for downstream analyses such as group comparisons in association studies.
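The two frequency measures described in this abstract can be illustrated with a small sketch. It assumes CNV calls arrive as hypothetical `(individual_id, start, end, confidence)` tuples with confidence scores in [0, 1], and that both measures are normalized by the number of individuals; the tuple layout, the function names, and the normalization are assumptions made for illustration, not the authors' implementation.

```python
def reliable_frequency(calls, n_individuals, position, min_confidence):
    """Fraction of individuals with a call covering `position` whose
    confidence score is at least `min_confidence` (approach i)."""
    carriers = {
        individual
        for individual, start, end, confidence in calls
        if start <= position <= end and confidence >= min_confidence
    }
    return len(carriers) / n_individuals


def weighted_frequency(calls, n_individuals, position):
    """Confidence-weighted frequency at `position` (approach ii).

    Each individual contributes the highest confidence among their calls
    covering the position, so no call is discarded outright.
    """
    best = {}
    for individual, start, end, confidence in calls:
        if start <= position <= end:
            best[individual] = max(best.get(individual, 0.0), confidence)
    return sum(best.values()) / n_individuals


# Hypothetical calls for three individuals.
calls = [("a", 0, 10, 0.9), ("b", 5, 15, 0.4), ("c", 20, 30, 0.8)]
print(reliable_frequency(calls, 3, 7, 0.5))
print(weighted_frequency(calls, 3, 7))
```

The contrast is that the first measure drops low-confidence calls entirely, while the second keeps them and lets them count for less.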
Affiliation(s)
- Teo Shu Mei
  - Department of Epidemiology and Public Health, National University of Singapore, 16 Medical Drive, Singapore
|
22
|
Zhang NR. DNA Copy Number Profiling in Normal and Tumor Genomes. FRONTIERS IN COMPUTATIONAL AND SYSTEMS BIOLOGY 2010. [DOI: 10.1007/978-1-84996-196-7_14] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
|
23
|
Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, Borecki IB, Province MA. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 2009; 26:464-9. [PMID: 20031968 DOI: 10.1093/bioinformatics/btp708] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]
Abstract
MOTIVATION: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implication in tumorigenesis. Current commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of the overall statistical power due to segmentation and discretization of individual sample's data. We propose a population-based approach for RCNA detection with no need of single-sample analysis, which is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.
RESULTS: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. Directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of single-sample CNA calling based two-step approaches. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
Affiliation(s)
- Qunyuan Zhang
  - Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA.
|
24
|
Rueda OM, Diaz-Uriarte R. Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously. BMC Bioinformatics 2009; 10:308. [PMID: 19775444 PMCID: PMC2760535 DOI: 10.1186/1471-2105-10-308] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/01/2009] [Accepted: 09/23/2009] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Alterations in the number of copies of genomic DNA that are common or recurrent among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for common or recurrent CNA regions that affect only some subsets of the samples (without knowledge of the regions and subsets affected), but this is neglected by most methods.
RESULTS: We have developed two methods to define recurrent CNA regions from aCGH data. Our methods are unique and qualitatively different from existing approaches: they detect regions over both the complete set of arrays and alterations that are common only to some subsets of the samples (i.e., alterations that might characterize previously unknown groups); they use probabilities of alteration as input and return probabilities of being a common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and that researchers can modify, as needed, thresholds of immediate interpretability and develop custom statistics to answer specific research questions.
CONCLUSION: These methods represent a qualitative advance in the location of recurrent CNA regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can facilitate the clustering of samples with respect to patterns of CNA. Ultimately, the methods developed can become important tools in the search for genomic regions harboring disease-critical genes.
Affiliation(s)
- Oscar M Rueda
  - Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
  - Breast Cancer Functional Genomics, Cancer Research UK, Cambridge, UK
- Ramon Diaz-Uriarte
  - Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
|
25
|
Maciejewski JP, Tiu RV, O'Keefe C. Application of array-based whole genome scanning technologies as a cytogenetic tool in haematological malignancies. Br J Haematol 2009; 146:479-88. [PMID: 19563474 DOI: 10.1111/j.1365-2141.2009.07757.x] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/29/2022]
Abstract
Karyotypic analysis provides useful diagnostic information in many haematological malignancies. However, standard metaphase cytogenetics has technical limitations that result in the underestimation of the degree of chromosomal changes. Array-based technologies can be used for karyotyping and can supplant some of the shortcomings of metaphase cytogenetics, and include single nucleotide polymorphism arrays (SNP-A) and comparative genomic hybridization arrays (CGH-A). Array-based cytogenetic tools do not rely on cell division, have superb resolution for unbalanced lesions and allow for the detection of copy number-neutral loss of heterozygosity, a type of lesion not seen with metaphase cytogenetics. Moreover, genomic array analysis is automated and results can be objectively and systematically analysed using biostatistical algorithms. As a potential advantage over genomic approaches, metaphase cytogenetics can detect balanced chromosomal defects and resolves clonal mosaicism. Initial studies performed in various haematological malignancies indicate the potential of SNP-A-based karyotyping as a useful clinical cytogenetic detection tool. The current effort is aimed at developing rational diagnostic algorithms for the detection of somatic defects and the establishment of clinical correlations for novel SNP-A-detected chromosomal defects, including acquired somatic uniparental disomy. SNP-A can complement metaphase karyotyping and will probably play an important role in clinical cytogenetic diagnostics.
Affiliation(s)
- Jaroslaw P Maciejewski
  - Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH 44195, USA.
|
26
|
Bicciato S, Spinelli R, Zampieri M, Mangano E, Ferrari F, Beltrame L, Cifola I, Peano C, Solari A, Battaglia C. A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets. Nucleic Acids Res 2009; 37:5057-70. [PMID: 19542187 PMCID: PMC2731905 DOI: 10.1093/nar/gkp520] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
The integration of high-throughput genomic data represents an opportunity for deciphering the interplay between structural and functional organization of genomes and for discovering novel biomarkers. However, the development of integrative approaches to complement gene expression (GE) data with other types of gene information, such as copy number (CN) and chromosomal localization, still represents a computational challenge in the genomic arena. This work presents a computational procedure that directly integrates CN and GE profiles at genome-wide level. When applied to DNA/RNA paired data, this approach leads to the identification of Significant Overlaps of Differentially Expressed and Genomic Imbalanced Regions (SODEGIR). This goal is accomplished in three steps. The first step extends to CN a method for detecting regional imbalances in GE. The second part provides the integration of CN and GE data and identifies chromosomal regions with concordantly altered genomic and transcriptional status in a tumor sample. The last step elevates the single-sample analysis to an entire dataset of tumor specimens. When applied to study chromosomal aberrations in a collection of astrocytoma and renal carcinoma samples, the procedure proved to be effective in identifying discrete chromosomal regions of coordinated CN alterations and changes in transcriptional levels.
Affiliation(s)
- Silvio Bicciato
  - Department of Biomedical Sciences, University of Modena and Reggio Emilia, Modena 41100, Italy.
|
27
|
Shay T, Lambiv WL, Reiner-Benaim A, Hegi ME, Domany E. Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes. Cancer Inform 2009; 7:91-104. [PMID: 19352461 PMCID: PMC2664703 DOI: 10.4137/cin.s2144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Many types of tumors exhibit characteristic chromosomal losses or gains, as well as local amplifications and deletions. Within any given tumor type, sample specific amplifications and deletions are also observed. Typically, a region that is aberrant in more tumors, or whose copy number change is stronger, would be considered as a more promising candidate to be biologically relevant to cancer. We sought for an intuitive method to define such aberrations and prioritize them. We define V, the "volume" associated with an aberration, as the product of three factors: (a) fraction of patients with the aberration, (b) the aberration's length and (c) its amplitude. Our algorithm compares the values of V derived from the real data to a null distribution obtained by permutations, and yields the statistical significance (p-value) of the measured value of V. We detected genetic locations that were significantly aberrant, and combine them with chromosomal arm status (gain/loss) to create a succinct fingerprint of the tumor genome. This genomic fingerprint is used to visualize the tumors, highlighting events that are co-occurring or mutually exclusive. We apply the method on three different public array CGH datasets of Medulloblastoma and Neuroblastoma, and demonstrate its ability to detect chromosomal regions that were known to be altered in the tested cancer types, as well as to suggest new genomic locations to be tested. We identified a potential new subtype of Medulloblastoma, which is analogous to Neuroblastoma type 1.
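The volume statistic defined in this abstract is simple enough to sketch directly. The example below computes V as the product of the three stated factors and derives an empirical p-value from a list of null values. How the permutations are generated is not described in the abstract, so the null values are taken as a given input; the function names and the add-one correction are assumptions made for illustration.

```python
def volume(fraction_of_patients, length, amplitude):
    """V = (fraction of patients with the aberration) x (length) x (amplitude)."""
    return fraction_of_patients * length * amplitude


def permutation_p_value(observed_v, null_volumes):
    """Share of null volumes at least as large as the observed one.

    `null_volumes` stands in for values of V recomputed on permuted data.
    The add-one correction is an assumption, not taken from the abstract.
    """
    exceed = sum(1 for v in null_volumes if v >= observed_v)
    return (exceed + 1) / (len(null_volumes) + 1)


# Hypothetical aberration: half the patients, 10 probes long, amplitude 2.0.
v = volume(0.5, 10, 2.0)
print(v, permutation_p_value(v, [1.0, 2.0, 3.0, 12.0]))
```

Because V is a product, a short but strong and frequent aberration can score as high as a long, weak one, which matches the prioritization idea stated in the abstract.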
Affiliation(s)
- Tal Shay
  - Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Wanyu L. Lambiv
  - Laboratory of Brain Tumor Biology and Genetics, Neurosurgery, University Hospital Lausanne (CHUV), Lausanne, Switzerland
- Anat Reiner-Benaim
  - Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
  - Department of Statistics, University of Haifa, Haifa, Israel
- Monika E. Hegi
  - Laboratory of Brain Tumor Biology and Genetics, Neurosurgery, University Hospital Lausanne (CHUV), Lausanne, Switzerland
  - National Center for Competence Research Molecular Oncology, ISREC, Epalinges, Switzerland
- Eytan Domany
  - Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
28
|
Freire P, Vilela M, Deus H, Kim YW, Koul D, Colman H, Aldape KD, Bogler O, Yung WKA, Coombes K, Mills GB, Vasconcelos AT, Almeida JS. Exploratory analysis of the copy number alterations in glioblastoma multiforme. PLoS One 2008; 3:e4076. [PMID: 19115005 PMCID: PMC2605252 DOI: 10.1371/journal.pone.0004076] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 04/02/2008] [Accepted: 11/18/2008] [Indexed: 02/03/2023]
Abstract
BACKGROUND: The Cancer Genome Atlas project (TCGA) has initiated the analysis of multiple samples of a variety of tumor types, starting with glioblastoma multiforme. The analytical methods encompass genomic and transcriptomic information, as well as demographic and clinical data about the sample donors. The data create the opportunity for a systematic screening of the components of the molecular machinery for features that may be associated with tumor formation. The wealth of existing mechanistic information about cancer cell biology provides a natural reference for the exploratory exercise.
METHODOLOGY/PRINCIPAL FINDINGS: Glioblastoma multiforme DNA copy number data was generated by The Cancer Genome Atlas project for 167 patients using 227 aCGH experiments, and was analyzed to build a catalog of aberrant regions. Genome screening was performed using an information theory approach in order to quantify aberration as a deviation from a centrality without the bias of untested assumptions about its parametric nature. A novel Cancer Genome Browser software application was developed and is made public to provide a user-friendly graphical interface in which the reported results can be reproduced. The application source code and stand alone executable are available at (http://code.google.com/p/cancergenome) and (http://bioinformaticstation.org), respectively.
CONCLUSIONS/SIGNIFICANCE: The most important known copy number alterations for glioblastoma were correctly recovered using entropy as a measure of aberration. Additional alterations were identified in different pathways, such as cell proliferation, cell junctions and neural development. Moreover, novel candidates for oncogenes and tumor suppressors were also detected. A detailed map of aberrant regions is provided.
Affiliation(s)
- Pablo Freire
  - Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
  - Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Petrópolis, Rio de Janeiro, Brasil
- Marco Vilela
  - Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
  - Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
- Helena Deus
  - Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
  - Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
- Yong-Wan Kim
  - Department of Neuro-Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Dimpy Koul
  - Department of Neuro-Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Howard Colman
  - Department of Neuro-Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Kenneth D. Aldape
  - Department of Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Oliver Bogler
  - Department of Neurosurgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- W. K. Alfred Yung
  - Department of Neuro-Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Kevin Coombes
  - Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Gordon B. Mills
  - Department of Systems Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
- Ana T. Vasconcelos
  - Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Petrópolis, Rio de Janeiro, Brasil
- Jonas S. Almeida
  - Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America
|
29
|
Harada T, Chelala C, Crnogorac-Jurcevic T, Lemoine NR. Genome-wide analysis of pancreatic cancer using microarray-based techniques. Pancreatology 2008; 9:13-24. [PMID: 19077451 DOI: 10.1159/000178871] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Indexed: 12/11/2022]
Abstract
BACKGROUND/AIMS: Microarray-based comparative genomic hybridisation (CGH) has allowed high-resolution analysis of DNA copy number alterations across the entire cancer genome. Recent advances in bioinformatics tools enable us to perform a robust and highly sensitive analysis of array CGH data and facilitate the discovery of novel cancer-related genes.
METHODS: We analysed a total of 29 pancreatic ductal adenocarcinoma (PDAC) samples (6 cell lines and 23 microdissected tissue specimens) using 1-Mb-spaced CGH arrays. The transcript levels of all genes within the identified regions of genetic alterations were then screened using our Pancreatic Expression Database.
RESULTS: In addition to 238 high-level amplifications and 35 homozygous deletions, we identified 315 minimal common regions of 'non-random' genetic alterations (115 gains and 200 losses) which were consistently observed across our tumour samples. The small size of these aberrations (median size of 880 kb) contributed to the reduced number of candidate genes included (on average 12 Ensembl-annotated genes). The database has further specified the genes whose expression levels are consistent with their copy number status. Such genes were UQCRB, SQLE, DDEF1, SLA, ERICH1 and DLC1, indicating that these may be potential target candidates within regions of aberrations.
CONCLUSION: This study has revealed multiple novel regions that may indicate the locations of oncogenes or tumour suppressor genes in PDAC. Using the database, we provide a list of novel target genes whose altered DNA copy numbers could lead to significant changes in transcript levels in PDAC.
Affiliation(s)
- Tomohiko Harada
- Centre for Molecular Oncology, Cancer Research UK, Institute of Cancer, Barts and The London School of Medicine and Dentistry, Queen Mary, University of London, London, UK
|
30
|
Chiang DY, Getz G, Jaffe DB, O'Kelly MJT, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 2008; 6:99-103. [PMID: 19043412 DOI: 10.1038/nmeth.1276] [Citation(s) in RCA: 376] [Impact Index Per Article: 23.5] [Received: 07/28/2008] [Accepted: 10/28/2008] [Indexed: 12/29/2022]
Abstract
Cancer results from somatic alterations in key genes, including point mutations, copy-number alterations and structural rearrangements. A powerful way to discover cancer-causing genes is to identify genomic regions that show recurrent copy-number alterations (gains and losses) in tumor genomes. Recent advances in sequencing technologies suggest that massively parallel sequencing may provide a feasible alternative to DNA microarrays for detecting copy-number alterations. Here we present: (i) a statistical analysis of the power to detect copy-number alterations of a given size; (ii) SegSeq, an algorithm to segment equal copy numbers from massively parallel sequence data; and (iii) analysis of experimental data from three matched pairs of tumor and normal cell lines. We show that a collection of approximately 14 million aligned sequence reads from human cell lines has comparable power to detect events as the current generation of DNA microarrays and has over twofold better precision for localizing breakpoints (typically, to within approximately 1 kilobase).
Affiliation(s)
- Derek Y Chiang
- Broad Institute, Massachusetts Institute of Technology, 7 Cambridge Center, Cambridge, MA 02142, USA
|
31
|
Taylor BS, Barretina J, Socci ND, Decarolis P, Ladanyi M, Meyerson M, Singer S, Sander C. Functional copy-number alterations in cancer. PLoS One 2008; 3:e3179. [PMID: 18784837 PMCID: PMC2527508 DOI: 10.1371/journal.pone.0003179] [Citation(s) in RCA: 118] [Impact Index Per Article: 7.4] [Received: 08/07/2008] [Accepted: 08/19/2008] [Indexed: 11/24/2022] Open
Abstract
Understanding the molecular basis of cancer requires characterization of its genetic defects. DNA microarray technologies can provide detailed raw data about chromosomal aberrations in tumor samples. Computational analysis is needed (1) to deduce from raw array data actual amplification or deletion events for chromosomal fragments and (2) to distinguish causal chromosomal alterations from functionally neutral ones. We present a comprehensive computational approach, RAE, designed to robustly map chromosomal alterations in tumor samples and assess their functional importance in cancer. To demonstrate the methodology, we experimentally profile copy number changes in a clinically aggressive subtype of soft-tissue sarcoma, pleomorphic liposarcoma, and computationally derive a portrait of candidate oncogenic alterations and their target genes. Many affected genes are known to be involved in sarcomagenesis; others are novel, including mediators of adipocyte differentiation, and may include valuable therapeutic targets. Taken together, we present a statistically robust methodology applicable to high-resolution genomic data to assess the extent and function of copy-number alterations in cancer.
Affiliation(s)
- Barry S Taylor
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America.
|
32
|
Abstract
Over the years, methods of cytogenetic analysis evolved and became part of routine laboratory testing, providing valuable diagnostic and prognostic information in hematologic disorders. Karyotypic aberrations contribute to the understanding of the molecular pathogenesis of disease and thereby to rational application of therapeutic modalities. Most of the progress in this field stems from the application of metaphase cytogenetics (MC), but recently, novel molecular technologies have been introduced that complement MC and overcome many of the limitations of traditional cytogenetics, including a need for cell culture. Whole genome scanning using comparative genomic hybridization and single nucleotide polymorphism arrays (CGH-A; SNP-A) can be used for analysis of somatic or clonal unbalanced chromosomal defects. In SNP-A, the combination of copy number detection and genotyping enables diagnosis of copy-neutral loss of heterozygosity, a lesion that cannot be detected using MC but may have important pathogenetic implications. Overall, whole genome scanning arrays, despite the drawback of an inability to detect balanced translocations, allow for discovery of chromosomal defects in a higher proportion of patients with hematologic malignancies. Newly detected chromosomal aberrations, including somatic uniparental disomy, may lead to more precise prognostic schemes in many diseases.
|