1
Wang J, Gui L, Su WJ, Sabatti C, Owen AB. Detecting multiple replicating signals using adaptive filtering procedures. Ann Stat 2022; 50:1890-1909. [PMID: 39421244 PMCID: PMC11486506 DOI: 10.1214/21-aos2139] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/19/2024]
Abstract
Replicability is a fundamental quality of scientific discoveries: we are interested in those signals that are detectable in different laboratories, in different populations, across time, etc. Unlike meta-analysis, which accounts for experimental variability but does not guarantee replicability, testing a partial conjunction (PC) null aims specifically to identify the signals that are discovered in multiple studies. In many contemporary applications, for example, comparing multiple high-throughput genetic experiments, a large number M of PC nulls need to be tested simultaneously, calling for a multiple comparisons correction. However, standard multiple testing adjustments on the M PC p-values can be severely conservative, especially when M is large and the signals are sparse. We introduce AdaFilter, a new multiple testing procedure that increases power by adaptively filtering out unlikely candidates of PC nulls. We prove that AdaFilter can control the family-wise error rate (FWER) and the false discovery rate (FDR) as long as data across studies are independent, and that it has much higher power than other existing methods. We illustrate the application of AdaFilter with three examples: microarray studies of Duchenne muscular dystrophy, single-cell RNA sequencing of T cells in lung cancer tumors, and GWAS for metabolomics.
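To make the notion of a PC p-value concrete, here is a minimal sketch of the standard baseline that the abstract describes as conservative. It is not AdaFilter itself. It assumes the usual Bonferroni-style combination for the PC null "fewer than r of n studies are non-null", namely (n - r + 1) times the r-th smallest p-value, followed by a plain Bonferroni correction over the M hypotheses. The function names are hypothetical and the formula is an assumption not stated in the abstract.

```python
def pc_pvalue_bonferroni(pvalues, r):
    """Bonferroni-style partial conjunction p-value (assumed formula).

    pvalues: the n per-study p-values for one hypothesis.
    r: required number of studies in which the signal replicates.
    Returns min(1, (n - r + 1) * p_(r)), where p_(r) is the r-th smallest.
    """
    n = len(pvalues)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= number of studies")
    p_r = sorted(pvalues)[r - 1]
    return min(1.0, (n - r + 1) * p_r)


def bonferroni_reject(pc_pvalues, alpha):
    """Plain Bonferroni over M PC p-values: reject when p <= alpha / M."""
    m = len(pc_pvalues)
    return [p <= alpha / m for p in pc_pvalues]
```

With many hypotheses and sparse signals, the threshold alpha / M becomes very small, which is the conservativeness that motivates filtering out unlikely candidates before correcting.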
Affiliation(s)
- Jingshu Wang
- Department of Statistics, The University of Chicago
- Lin Gui
- Department of Statistics, The University of Chicago
- Weijie J. Su
- Department of Statistics and Data Science, University of Pennsylvania
- Art B. Owen
- Department of Statistics, Stanford University
2
Synthesis, structure and properties of tris(1-ethyl-4-isopropyl-imidazolyl-κN)phosphine copper(II). Inorganica Chim Acta 2015. [DOI: 10.1016/j.ica.2015.05.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
3
Subramanian A, Shackney S, Schwartz R. Novel multisample scheme for inferring phylogenetic markers from whole genome tumor profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1422-1431. [PMID: 24407301 PMCID: PMC3830698 DOI: 10.1109/tcbb.2013.33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Computational cancer phylogenetics seeks to enumerate the temporal sequences of aberrations in tumor evolution, thereby delineating the evolution of possible tumor progression pathways, molecular subtypes, and mechanisms of action. We previously developed a pipeline for constructing phylogenies describing evolution between major recurring cell types computationally inferred from whole-genome tumor profiles. The accuracy and detail of the phylogenies, however, depend on the identification of accurate, high-resolution molecular markers of progression, i.e., reproducible regions of aberration that robustly differentiate different subtypes and stages of progression. Here, we present a novel hidden Markov model (HMM) scheme for the problem of inferring such phylogenetically significant markers through joint segmentation and calling of multisample tumor data. Our method classifies sets of genome-wide DNA copy number measurements into a partitioning of samples into normal (diploid) or amplified at each probe. It differs from other similar HMM methods in its design specifically for the needs of tumor phylogenetics, by seeking to identify robust markers of progression conserved across a set of copy number profiles. We show an analysis of our method in comparison to other methods on both synthetic and real tumor data, which confirms its effectiveness for tumor phylogeny inference and suggests avenues for future advances.
Affiliation(s)
- Ayshwarya Subramanian
- Graduate student at the Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, 15213.
4
McCallum KJ, Wang JP. Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions. Biostatistics 2013; 14:600-11. [PMID: 23428932 DOI: 10.1093/biostatistics/kxt003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
Copy number variations (CNVs) are a significant source of genetic variation and are frequently found to be associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly being used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed that uses inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and on simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
5
Nilsen G, Liestøl K, Van Loo P, Moen Vollan HK, Eide MB, Rueda OM, Chin SF, Russell R, Baumbusch LO, Caldas C, Børresen-Dale AL, Lingjaerde OC. Copynumber: Efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics 2012; 13:591. [PMID: 23442169 PMCID: PMC3582591 DOI: 10.1186/1471-2164-13-591] [Citation(s) in RCA: 212] [Impact Index Per Article: 16.3] [Received: 05/16/2012] [Accepted: 10/15/2012] [Indexed: 12/15/2022]
Abstract
Background: Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, there is a demand for highly computationally efficient segmentation algorithms, due to the emergence of very high density scans of copy number.
Results: A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented.
Conclusions: The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
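The statement that estimates are least-squares optimal for a given number of breakpoints can be illustrated with a generic dynamic program. The sketch below is not the algorithm of the copynumber package, which the abstract does not spell out. It is a textbook exact search, assuming a single track and a piecewise-constant mean, and the function name is hypothetical.

```python
from itertools import accumulate


def best_segmentation(y, n_breaks):
    """Least-squares optimal piecewise-constant fit with exactly n_breaks breakpoints.

    Returns (breaks, cost): breaks are the start indices of segments 2..K,
    cost is the total within-segment sum of squared deviations.
    """
    n = len(y)
    n_segs = n_breaks + 1
    if n_segs < 1 or n_segs > n:
        raise ValueError("need 0 <= n_breaks < len(y)")

    # Prefix sums give the squared error of any segment in O(1).
    prefix = [0.0] + list(accumulate(float(v) for v in y))
    prefix_sq = [0.0] + list(accumulate(float(v) ** 2 for v in y))

    def sse(i, j):
        """Sum of squared deviations from the mean of y[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[s][j]: best cost of splitting y[:j] into s segments.
    cost = [[inf] * (n + 1) for _ in range(n_segs + 1)]
    back = [[0] * (n + 1) for _ in range(n_segs + 1)]
    for j in range(1, n + 1):
        cost[1][j] = sse(0, j)
    for s in range(2, n_segs + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):  # i = start of the last segment
                c = cost[s - 1][i] + sse(i, j)
                if c < cost[s][j]:
                    cost[s][j] = c
                    back[s][j] = i

    # Walk the back-pointers from the end to recover the breakpoints.
    breaks = []
    j = n
    for s in range(n_segs, 1, -1):
        j = back[s][j]
        breaks.append(j)
    breaks.reverse()
    return breaks, cost[n_segs][n]
```

This exact search takes time quadratic in the number of probes for each additional segment, which helps explain why the abstract stresses computational efficiency for very high density scans.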
Affiliation(s)
- Gro Nilsen
- Biomedical Informatics, Dept of Informatics, University of Oslo, Oslo, Norway
6
Yang TY. Simple binary segmentation frameworks for identifying variation in DNA copy number. BMC Bioinformatics 2012; 13:277. [PMID: 23107320 PMCID: PMC3571941 DOI: 10.1186/1471-2105-13-277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/30/2012] [Accepted: 10/22/2012] [Indexed: 11/25/2022]
Abstract
Background: Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nested hypothesis tests, each using the Bayesian information criterion.
Results: Our procedure is convenient for analyzing DNA copy number in two general situations: (1) when using data from multiple sources and (2) when using cohort analysis of multiple patients suffering from the same type of cancer. In the first case, data from multiple sources such as different platforms, labs, or preprocessing methods are used to study variation in copy number in the same individual. Combining these sources provides a higher resolution, which leads to a more detailed genome-wide survey of the individual. In this case, we provide a simple statistical framework to derive a consensus molecular signature. In the framework, the multiple sequences from various sources are integrated into a single sequence, and then the proposed segmentation procedure is applied to this sequence to detect aberrant regions. In the second case, cohort analysis of multiple patients is carried out to derive overall molecular signatures for the cohort. For this case, we provide another simple statistical framework in which data across multiple profiles are standardized before segmentation. The proposed segmentation procedure is then applied to the standardized profiles one at a time to detect aberrant regions. Any such regions that are common across two or more profiles are probably real and may play important roles in the cancer pathogenesis process.
Conclusions: The main advantages of the proposed procedure are flexibility and simplicity.
Affiliation(s)
- Tae Young Yang
- Department of Mathematics, Myongji University, Yongin, Kyonggi, 449-728, Korea.
7
Zhang Z, Lange K, Sabatti C. Reconstructing DNA copy number by joint segmentation of multiple sequences. BMC Bioinformatics 2012; 13:205. [PMID: 22897923 PMCID: PMC3534631 DOI: 10.1186/1471-2105-13-205] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/20/2012] [Accepted: 07/27/2012] [Indexed: 12/19/2022]
Abstract
Background: Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual.
Results: We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions. GFL is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets.
Conclusions: The flexibility of our framework makes it applicable to data obtained with a wide range of technologies. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
Affiliation(s)
- Zhongyang Zhang
- Department of Statistics, University of California, Los Angeles, CA, USA
- Kenneth Lange
- Department of Human Genetics, Biomathematics and Statistics, University of California, Los Angeles, CA, USA
- Chiara Sabatti
- Department of Health Research and Policy and Statistics, Stanford University, Stanford, CA, USA
8
Abstract
Psychiatric disorders are multifactorial in nature with complex genetic architecture. A number of recent studies, building upon earlier findings of copy number variants (CNVs) at the 22q11.2 locus, suggest that rare CNVs represent an important component of genetic heterogeneity in the etiology of complex psychiatric diseases, such as schizophrenia. De novo CNVs are found with higher frequency among sporadic cases, whereas inherited CNVs are enriched among familial cases. Despite substantial progress, a number of challenges remain, such as pinpointing causative relationships between specific gene(s) affected by CNVs and disease phenotypes, distinguishing abnormal structural mutations from neutral polymorphisms, and establishing a clear association between individual pathogenic CNVs and disease phenotypes.
Affiliation(s)
- Rebecca J Levy
- Department of Psychiatry, Columbia University Medical Center, New York, NY, USA
9
A meta-analysis of array-CGH studies implicates antiviral immunity pathways in the development of hepatocellular carcinoma. PLoS One 2011; 6:e28404. [PMID: 22174799 PMCID: PMC3236189 DOI: 10.1371/journal.pone.0028404] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 10/19/2011] [Accepted: 11/07/2011] [Indexed: 12/20/2022]
Abstract
Background: The development and progression of hepatocellular carcinoma (HCC) is significantly correlated with the accumulation of genomic alterations. Array-based comparative genomic hybridization (array CGH) has been applied to a wide range of tumors, including HCCs, for genome-wide high-resolution screening of DNA copy number changes. However, the relevant chromosomal variations that play a central role in the development of HCC are still not fully elucidated.
Methods: In the present study, in order to further characterize the copy number alterations (CNAs) important to HCC development, we conducted a meta-analysis of four published independent array-CGH datasets comprising 159 samples in total.
Results: Eighty-five significant gains (frequency ≥25%) were mostly mapped to five broad chromosomal regions, 1q, 6p, 8q, 17q and 20p, as well as two narrow regions, 5p15.33 and 9q34.2-34.3. Eighty-eight significant losses (frequency ≥25%) were most frequently present in 4q, 6q, 8p, 9p, 13q, 14q, 16q, and 17p. Significant correlations existed between chromosomal aberrations located either on the same chromosome or on different chromosomes. HCCs with different etiologies largely exhibited surprisingly similar profiles of chromosomal aberrations, with only a few exceptions. Furthermore, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated that the genes affected by these chromosomal aberrations were significantly enriched in 31 canonical pathways, with the highest enrichment observed for antiviral immunity pathways.
Conclusions: Taken together, our findings provide novel and important clues to the involvement of antiviral immunity-related gene pathways in the pathogenesis and progression of HCC.
10
Morganella S, Pagnotta SM, Ceccarelli M. Finding recurrent copy number alterations preserving within-sample homogeneity. Bioinformatics 2011; 27:2949-56. [PMID: 21873327 DOI: 10.1093/bioinformatics/btr488] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 01/06/2023]
Abstract
Motivation: Copy number alterations (CNAs) represent an important component of genetic variation and play a significant role in many human diseases. The development of array comparative genomic hybridization (aCGH) technology has made it possible to identify CNAs. Identification of recurrent CNAs is the first fundamental step in providing a list of genomic regions that form the basis for further biological investigations. The main problem in recurrent CNA discovery is the need to distinguish between functional changes and random events without pathological relevance. Within-sample homogeneity is a common feature of copy number profiles in cancer, so it can be used as an additional source of information to increase the accuracy of the results. Although several algorithms aimed at identifying recurrent CNAs have been proposed, no comprehensive comparison of the different approaches has yet been published.
Results: We propose a new approach, called Genomic Analysis of Important Alterations (GAIA), to find recurrent CNAs, in which a statistical hypothesis framework is extended to take into account within-sample homogeneity. Statistical significance and within-sample homogeneity are combined in an iterative procedure to extract the regions that are likely involved in functional changes. Results show that GAIA represents a valid alternative to other proposed approaches. In addition, we perform an accurate comparison using two real aCGH datasets and a carefully planned simulation study.
Availability: GAIA has been implemented as an R/Bioconductor package. It can be downloaded from the following page: http://bioinformatics.biogem.it/download/gaia.
Contact: ceccarelli@unisannio.it; morganella@unisannio.it.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sandro Morganella
- Department of Science, University of Sannio, 82100, Benevento, Italy.
11
Siegmund D, Yakir B, Zhang NR. Detecting simultaneous variant intervals in aligned sequences. Ann Appl Stat 2011. [DOI: 10.1214/10-aoas400] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 11/19/2022]
12
Teo SM, Pawitan Y, Kumar V, Thalamuthu A, Seielstad M, Chia KS, Salim A. Multi-platform segmentation for joint detection of copy number variants. Bioinformatics 2011; 27:1555-61. [PMID: 21471018 DOI: 10.1093/bioinformatics/btr162] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
Motivation: With the expansion of whole-genome studies, genotyping platforms are evolving rapidly. This leads to practical issues such as the upgrading of genotyping equipment, which often results in research groups having data from different platforms for the same samples. While having more data can potentially yield more accurate copy-number estimates, combining such data is not straightforward, as different platforms show different degrees of attenuation of the true copy number, as well as different noise characteristics and marker panels. Currently, there is still a relative lack of procedures for combining information from different platforms.
Results: We develop a method, called MPSS, based on a correlated random-effect model for the unobserved patterns, and extend the robust smooth segmentation approach to the multiple-platform scenario. We also propose an objective criterion for the discrete segmentation required for downstream analyses. For each identified segment, the software reports a P-value to indicate the likelihood of the segment being a true CNV. From analyses of real and simulated data, we show that MPSS has better operating characteristics than single-platform methods and has substantially higher sensitivity than an existing multiplatform method.
Availability: The methods are implemented in an R package MPSS, and the source is available from http://www.meb.ki.se/~yudpaw.
Affiliation(s)
- Shu Mei Teo
- Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, National University of Singapore, Singapore
13
Picard F, Lebarbier E, Hoebeke M, Rigaill G, Thiam B, Robin S. Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics 2011; 12:413-28. [PMID: 21209153 DOI: 10.1093/biostatistics/kxq076] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Indexed: 01/03/2023]
Abstract
The statistical analysis of array comparative genomic hybridization (CGH) data has now shifted to the joint assessment of copy number variations at the cohort level. Considering multiple profiles gives the opportunity to correct for systematic biases observed on single profiles, such as probe GC content or the so-called "wave effect." In this article, we extend the segmentation model developed in the univariate case to the joint analysis of multiple CGH profiles. Our contributions are several: we propose an integrated model to perform joint segmentation, normalization, and calling for multiple array CGH profiles. This model shows great flexibility, especially in the modeling of the wave effect, which gives a likelihood framework to approaches proposed by others. We propose a new dynamic programming algorithm for breakpoint positioning, as well as a model selection criterion based on a modified Bayesian information criterion proposed in the univariate case. The performance of our method is assessed using simulated and real data sets. Our method is implemented in the R package cghseg.
Affiliation(s)
- Franck Picard
- Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558 - Univ. Lyon 1, F-69622, Villeurbanne, France.
14
Bengtsson H, Neuvial P, Speed TP. TumorBoost: normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics 2010; 11:245. [PMID: 20462408 PMCID: PMC2894037 DOI: 10.1186/1471-2105-11-245] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Received: 12/08/2009] [Accepted: 05/12/2010] [Indexed: 12/15/2022]
Abstract
Background: High-throughput genotyping microarrays assess both total DNA copy number and allelic composition, which makes them a tool of choice for copy number studies in cancer, including total copy number and loss of heterozygosity (LOH) analyses. Even after state-of-the-art preprocessing methods, allelic signal estimates from genotyping arrays still suffer from systematic effects that make them difficult to use effectively for such downstream analyses.
Results: We propose a method, TumorBoost, for normalizing allelic estimates of one tumor sample based on estimates from a single matched normal. The method applies to any paired tumor-normal estimates from any microarray-based technology, combined with any preprocessing method. We demonstrate that it increases the signal-to-noise ratio of allelic signals, making it significantly easier to detect allelic imbalances.
Conclusions: TumorBoost increases the power to detect somatic copy-number events (including copy-neutral LOH) in the tumor from allelic signals of Affymetrix or Illumina origin. We also conclude that high-precision allelic estimates can be obtained from a single pair of tumor-normal hybridizations, if TumorBoost is combined with single-array preprocessing methods such as (allele-specific) CRMA v2 for Affymetrix or BeadStudio's (proprietary) XY-normalization method for Illumina. A bounded-memory implementation is available in the open-source and cross-platform R package aroma.cn, which is part of the Aroma Project (http://www.aroma-project.org/).
Affiliation(s)
- Henrik Bengtsson
- Department of Statistics, University of California, Berkeley, USA
- Pierre Neuvial
- Department of Statistics, University of California, Berkeley, USA
- Terence P Speed
- Department of Statistics, University of California, Berkeley, USA
- Bioinformatics Division, Walter & Eliza Hall Institute of Medical Research, Parkville, Australia
15
Zhang NR. DNA Copy Number Profiling in Normal and Tumor Genomes. Frontiers in Computational and Systems Biology 2010. [DOI: 10.1007/978-1-84996-196-7_14] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]