1. Jiang W, Bogdan M, Josse J, Majewski S, Miasojedow B, Ročková V. Adaptive Bayesian SLOPE: Model Selection With Incomplete Data. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1963263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Affiliation(s)
- Wei Jiang
  - Inria XPOP and CMAP, École Polytechnique, Palaiseau, France
- Małgorzata Bogdan
  - Faculty of Mathematics and Computer Science, University of Wroclaw, Wroclaw, Poland
  - Department of Statistics, Lund University, Lund, Sweden
- Julie Josse
  - Inria XPOP and CMAP, École Polytechnique, Palaiseau, France
- Błażej Miasojedow
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
2. Kremer PJ, Brzyski D, Bogdan M, Paterlini S. Sparse Index Clones via the sorted ℓ1-Norm. Quantitative Finance 2021; 22:349-366. [PMID: 35465255 PMCID: PMC9031478 DOI: 10.1080/14697688.2021.1962539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2020] [Accepted: 07/26/2021] [Indexed: 06/14/2023]
Abstract
Index tracking and hedge fund replication aim at cloning the return time series properties of a given benchmark, either by using only a subset of its original constituents or by using a set of risk factors. In this paper, we propose a model that relies on the Sorted ℓ1 Penalized Estimator, called SLOPE, for index tracking and hedge fund replication. We show that SLOPE is capable not only of providing sparsity but also of forming groups among assets depending on their partial correlation with the index or the hedge fund return time series. The grouping structure can then be exploited to create individual investment strategies that allow building portfolios with a smaller number of active positions but still comparable tracking properties. Considering equity index data and hedge fund returns, we discuss the real-world properties of SLOPE-based approaches with respect to state-of-the-art approaches.
3. Wallin J, Bogdan M, Szulc PA, Doerge RW, Siegmund DO. Ghost QTL and hotspots in experimental crosses: novel approach for modeling polygenic effects. Genetics 2021; 217:6067404. [PMID: 33789342 DOI: 10.1093/genetics/iyaa041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Accepted: 12/10/2020] [Indexed: 11/14/2022] Open
Abstract
Ghost quantitative trait loci (QTL) are false discoveries in QTL mapping that arise from the "accumulation" of polygenic effects uniformly distributed over the genome. The locations on the chromosome that are strongly correlated with the total of the polygenic effects depend on a specific sample correlation structure determined by the genotypes at all loci. The problem is particularly severe when the same genotypes are used to study multiple QTL, e.g., when using recombinant inbred lines or studying expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and suggest that it can be solved by applying an extended mixed effect model in which the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in a stepwise selection strategy, which allows for the simultaneous detection of several QTL. Extensive simulation studies illustrate that our approach eliminates ghost QTL and false hotspots while preserving high power to detect true QTL.
Affiliation(s)
- Jonas Wallin
  - Department of Statistics, Lund University, 220 07 Lund, Sweden
- Małgorzata Bogdan
  - Department of Statistics, Lund University, 220 07 Lund, Sweden
  - Department of Mathematics, Institute of Mathematics, University of Wroclaw, 50-137 Wroclaw, Poland
- Piotr A Szulc
  - Department of Mathematics, Institute of Mathematics, University of Wroclaw, 50-137 Wroclaw, Poland
- R W Doerge
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- David O Siegmund
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
4. Zhu G, Zhao T. Deep-gKnock: Nonlinear group-feature selection with deep neural networks. Neural Netw 2021; 135:139-147. [PMID: 33385830 DOI: 10.1016/j.neunet.2020.12.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/08/2020] [Revised: 11/26/2020] [Accepted: 12/02/2020] [Indexed: 01/21/2023]
Abstract
Feature selection is central to contemporary high-dimensional data analysis. Group structure among features arises naturally in various scientific problems. Many methods have been proposed to incorporate group structure information into feature selection. However, these methods are normally restricted to a linear regression setting. To relax the linear constraint, we design a new Deep Neural Network (DNN) architecture and integrate it with the recently proposed knockoff technique to perform nonlinear group-feature selection with controlled group-wise False Discovery Rate (gFDR). Experimental results on high-dimensional synthetic data demonstrate that our method achieves the highest power and accurate gFDR control compared with state-of-the-art methods. The performance of Deep-gKnock is especially superior in the following five situations: (1) nonlinear relationships; (2) dimension p greater than sample size n; (3) high between-group correlation; (4) high within-group correlation; (5) a large number of associated groups. Deep-gKnock is also demonstrated to be robust to misspecification of the feature distribution and to changes in network architecture. Moreover, Deep-gKnock achieves scientifically meaningful group-feature selection results for cutting-edge real-world datasets.
Affiliation(s)
- Guangyu Zhu
  - Department of Computer Science and Statistics, University of Rhode Island, United States of America
- Tingting Zhao
  - Department of Electrical and Computer Engineering, Northeastern University, United States of America
5. Liu L, Wang B, Han Q, Zhen C, Li J, Qu X, Wang F, Kong X, Zheng L. Bioinformatic Analysis to Identify a Multi-mRNA Signature for the Prediction of Metastasis in Hepatocellular Carcinoma. DNA Cell Biol 2020; 39:2028-2039. [PMID: 33147069 DOI: 10.1089/dna.2020.5513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023] Open
Abstract
Hepatocellular carcinoma (HCC) with metastasis indicates a worse prognosis for patients. However, current methods are insufficient to accurately predict HCC metastasis at an early stage. Based on the expression profiles of three Gene Expression Omnibus datasets, the differentially expressed genes associated with HCC metastasis were screened with the online analytical tool GEO2R and weighted gene co-expression network analysis. Second, a risk score model comprising 27 mRNAs was established by univariate Cox regression analyses, time-dependent ROC curves, and least absolute shrinkage and selection operator Cox regression analysis. Then, we validated the model in The Cancer Genome Atlas liver hepatocellular carcinoma cohort and analyzed the functions and key signaling pathways of the genes associated with the risk score model. According to the risk score model, patients were divided into two subgroups (high-risk and low-risk groups). The metastasis rate differed significantly between the two subgroups in the training cohort (p < 0.0001, hazard ratio [HR]: 10.3, 95% confidence interval [CI]: 6.827-15.55) and in the external validation cohort (p = 0.0008, HR: 1.768, 95% CI: 1.267-2.467). Multivariable analysis showed that the risk score model was superior to and independent of other clinical factors (such as tumor stage, tumor size, and other parameters) in predicting early HCC metastasis. Moreover, the risk score model could predict the overall survival of patients with HCC. Finally, most of the 27 mRNAs were enriched in exosomes and membrane-bounded organelles and were involved in transport and metabolic biological processes. Protein-protein interaction network analysis showed that most of these genes might be key genes affecting the progression of HCC. In addition, 3 of the 27 mRNAs were also differentially expressed in peripheral blood mononuclear cells. In conclusion, by using two combined methods and a broader set of HCC datasets, our study provides a reliable and superior predictive model for HCC metastasis, which will facilitate individual medical management for HCC patients at high risk of metastasis.
Affiliation(s)
- Longgen Liu
  - Institute of Hepatology, The Third People's Hospital of Changzhou, Jiangsu, P.R. China
- Bingrui Wang
  - Department of Tumor Interventional Oncology, Renji South Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, P.R. China
- Qiucheng Han
  - Department of Liver Diseases, Central Laboratory, ShuGuang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine, Shanghai, P.R. China
- Chao Zhen
  - Department of Tumor Interventional Oncology, Renji South Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, P.R. China
- Jichang Li
  - Department of Tumor Interventional Oncology, Renji South Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, P.R. China
- Xiaoye Qu
  - Department of Tumor Interventional Oncology, Renji South Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, P.R. China
- Fang Wang
  - Department of Liver Diseases, Central Laboratory, ShuGuang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine, Shanghai, P.R. China
- Xiaoni Kong
  - Department of Liver Diseases, Central Laboratory, ShuGuang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine, Shanghai, P.R. China
- Liming Zheng
  - Institute of Hepatology, The Third People's Hospital of Changzhou, Jiangsu, P.R. China
6. Structure Learning of Gaussian Markov Random Fields with False Discovery Rate Control. Symmetry (Basel) 2019. [DOI: 10.3390/sym11101311] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022] Open
Abstract
In this paper, we propose a new estimation procedure for discovering the structure of Gaussian Markov random fields (MRFs) with false discovery rate (FDR) control, making use of the sorted ℓ1-norm (SL1) regularization. A Gaussian MRF is an acyclic graph representing a multivariate Gaussian distribution, where nodes are random variables and edges represent the conditional dependence between the connected nodes. Since it is possible to learn the edge structure of Gaussian MRFs directly from data, Gaussian MRFs provide an excellent way to understand complex data by revealing the dependence structure among many input features, such as genes, sensors, users, documents, etc. In learning the graphical structure of Gaussian MRFs, the goal is to discover the actual edges of the underlying but unknown probabilistic graphical model; this becomes more complicated as the number of random variables (features) p increases relative to the number of data points n. In particular, when p ≫ n, it is statistically unavoidable for any estimation procedure to include false edges. Therefore, there have been many attempts to reduce the false detection of edges, in particular by using different types of regularization on the learning parameters. Our method makes use of the SL1 regularization, introduced recently for model selection in linear regression. We focus on the benefit of SL1 regularization that it can be used to control the FDR of detecting important random variables. Adapting SL1 for probabilistic graphical models, we show that SL1 can be used for the structure learning of Gaussian MRFs using our suggested procedure nsSLOPE (neighborhood selection Sorted L-One Penalized Estimation), controlling the FDR of detecting edges.
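As an illustration of the sorted ℓ1-norm (SL1) penalty that this abstract refers to, the following is a minimal sketch of its usual definition: the largest absolute coefficient is paired with the largest weight, the second largest with the second largest weight, and so on, and the products are summed. The function name, argument names, and example values are hypothetical and are not taken from the paper; the sketch only evaluates the penalty and is not the nsSLOPE procedure itself.

```python
def sorted_l1_norm(beta, lam):
    """Evaluate the sorted L1 norm: sum_i lam[i] * |beta|_(i).

    |beta|_(1) >= |beta|_(2) >= ... are the absolute coefficients in
    decreasing order; lam must be non-negative and non-increasing.
    Names and interface are illustrative assumptions.
    """
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    if any(w < 0 for w in lam):
        raise ValueError("weights must be non-negative")
    if any(a < b for a, b in zip(lam, lam[1:])):
        raise ValueError("weights must be non-increasing")
    # Sort absolute coefficients from largest to smallest.
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    # Pair the largest magnitude with the largest weight, and so on.
    return sum(w * m for w, m in zip(lam, magnitudes))


# Decreasing weights penalize the largest coefficient most heavily.
penalty = sorted_l1_norm([1.0, -3.0, 2.0], [3.0, 2.0, 1.0])

# With all weights equal, the penalty reduces to a scaled ordinary L1 norm.
plain_l1 = sorted_l1_norm([1.0, -3.0, 2.0], [1.0, 1.0, 1.0])
```

With constant weights the penalty coincides with the ordinary (lasso) ℓ1 penalty, so the decreasing weight sequence is what distinguishes SL1 from it.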
7. Hilgers RD, Bogdan M, Burman CF, Dette H, Karlsson M, König F, Male C, Mentré F, Molenberghs G, Senn S. Lessons learned from IDeAl - 33 recommendations from the IDeAl-net about design and analysis of small population clinical trials. Orphanet J Rare Dis 2018; 13:77. [PMID: 29751809 PMCID: PMC5948846 DOI: 10.1186/s13023-018-0820-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/02/2017] [Accepted: 05/01/2018] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: IDeAl (Integrated designs and analysis of small population clinical trials) is an EU-funded project developing new statistical design and analysis methodologies for clinical trials in small population groups. Here we provide an overview of IDeAl findings and give recommendations to applied researchers. METHOD: The description of the findings is broken down by the nine scientific IDeAl work packages and summarizes results from the project's more than 60 publications to date in peer-reviewed journals. In addition, we applied text mining to evaluate the publications and the IDeAl work packages' output in relation to the design and analysis terms derived from the IRDiRC task force report on small population clinical trials. RESULTS: The results are summarized, describing the developments from an applied viewpoint. The main results presented here are 33 practical recommendations drawn from the work, giving researchers comprehensive guidance on the improved methodology. In particular, the findings will help design and analyse efficient clinical trials in rare diseases with a limited number of patients available. We developed a network representation relating the hot topics developed by the IRDiRC task force on small population clinical trials to IDeAl's work, as well as relating important methodologies that, by IDeAl's definition, are necessary to consider in the design and analysis of small-population clinical trials. These network representations establish a new perspective on the design and analysis of small-population clinical trials. CONCLUSION: IDeAl has provided a huge number of options to refine the statistical methodology for small-population clinical trials from various perspectives. A total of 33 recommendations developed and related to the work packages help the researcher to design small-population clinical trials. The route to improvements is displayed in the IDeAl network, which represents the important statistical methodological skills necessary for the design and analysis of small-population clinical trials. The methods are ready for use.
Affiliation(s)
- Ralf-Dieter Hilgers
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Malgorzata Bogdan
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Carl-Fredrik Burman
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Holger Dette
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Mats Karlsson
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Franz König
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Christoph Male
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- France Mentré
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Geert Molenberghs
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany
- Stephen Senn
  - Department of Medical Statistics, RWTH Aachen University, Pauwelsstr. 19, D-52074, Aachen, Germany