1
Landeros A, Xu J, Lange K. MM optimization: Proximal distance algorithms, path following, and trust regions. Proc Natl Acad Sci U S A 2023; 120:e2303168120. [PMID: 37339185 PMCID: PMC10319036 DOI: 10.1073/pnas.2303168120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 05/09/2023] [Indexed: 06/22/2023]
Abstract
We briefly review the majorization-minimization (MM) principle and elaborate on the closely related notion of proximal distance algorithms, a generic approach for solving constrained optimization problems via quadratic penalties. We illustrate how the MM and proximal distance principles apply to a variety of problems from statistics, finance, and nonlinear optimization. Drawing from our selected examples, we also sketch a few ideas pertinent to the acceleration of MM algorithms: a) structuring updates around efficient matrix decompositions, b) path following in proximal distance iteration, and c) cubic majorization and its connections to trust region methods. These ideas are put to the test on several numerical examples, but for the sake of brevity, we omit detailed comparisons to competing methods. The current article, which is a mix of review and current contributions, celebrates the MM principle as a powerful framework for designing optimization algorithms and reinterpreting existing ones.
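The proximal distance idea with path following can be shown in a short sketch. It was written for this listing and is not the article's code. It assumes the illustrative problem of minimizing ½‖Ax − b‖² subject to x ≥ 0, so C is the nonnegative orthant and P_C clips each component at zero. Each MM step majorizes dist(x, C)² by ‖x − P_C(x_n)‖², with equality at x_n. The update therefore reduces to the linear system (AᵀA + ρI)x = Aᵀb + ρP_C(x_n). A single eigendecomposition of AᵀA is reused for every ρ, which is one simple way to structure updates around a matrix decomposition. The function name, the geometric ρ schedule, and all defaults are hypothetical choices. No acceleration is attempted, so progress can be slow when ρ is large relative to the smallest eigenvalue of AᵀA.

```python
import numpy as np


def proximal_distance_nnls(A, b, rho0=1.0, rho_max=1e6, growth=1.5,
                           inner_iters=50, tol=1e-10):
    """Hypothetical sketch: minimize 0.5*||A x - b||^2 subject to x >= 0
    through the penalized objective f(x) + (rho/2) * dist(x, C)^2."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    AtA = A.T @ A
    Atb = A.T @ b
    # One symmetric eigendecomposition AtA = V diag(w) V^T serves every rho:
    # (AtA + rho I)^{-1} = V diag(1 / (w + rho)) V^T.
    w, V = np.linalg.eigh(AtA)
    x = np.zeros(A.shape[1])
    rho = rho0
    while rho <= rho_max:
        for _ in range(inner_iters):
            # MM step: the surrogate replaces dist(x, C)^2 by ||x - P_C(x_n)||^2.
            proj = np.maximum(x, 0.0)  # P_C(x_n) for the nonnegative orthant
            rhs = Atb + rho * proj
            x_new = V @ ((V.T @ rhs) / (w + rho))
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        rho *= growth  # path following: tighten the penalty gradually
    return x
```

With `A` equal to the identity, the result approaches `max(b, 0)` componentwise. The clipped coordinates shrink toward zero roughly like 1/ρ.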
Affiliation(s)
- Alfonso Landeros
- Department of Computational Medicine, University of California, Los Angeles, CA 90095
- Jason Xu
- Department of Statistical Science, Duke University, Durham, NC 27708
- Kenneth Lange
- Department of Computational Medicine, University of California, Los Angeles, CA 90095
- Department of Human Genetics, University of California, Los Angeles, CA 90095
- Department of Statistics, University of California, Los Angeles, CA 90095
2
Abstract
Multivariate networks comprising several compositional and structural variables can be represented as multigraphs through various forms of aggregation based on vertex attributes. We propose a framework to perform exploratory and confirmatory multiplexity analysis of aggregated multigraphs in order to find relevant associations between vertex and edge attributes. The exploration is performed by comparing frequencies of the different edges within and between aggregated vertex categories, while the confirmatory analysis is performed using derived complexity or multiplexity statistics under different random multigraph models. These statistics are defined by the distribution of edge multiplicities and provide information on the covariation and dependencies of different edges given vertex attributes. The presented approach highlights the need to further analyse and model structural dependencies with respect to edge entrainment. We illustrate the approach by applying it to a well-known multivariate network dataset that has previously been analysed in the context of multiplexity.
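The aggregation step and the multiplicity distribution can be made concrete with a minimal sketch. It assumes that edges are given as vertex pairs and that each vertex carries a single categorical attribute. The sketch counts edge multiplicities for every unordered pair of categories, loops included, and then tabulates how often each multiplicity occurs. It also splits edges into within-category and between-category counts. The function names are hypothetical. The sketch does not implement the paper's complexity or multiplexity statistics or its random multigraph models.

```python
from collections import Counter
from itertools import combinations_with_replacement


def aggregate_multigraph(edges, category):
    """Collapse vertices onto their category labels (hypothetical helper).

    edges: iterable of (u, v) vertex pairs; repeated pairs are allowed.
    category: dict mapping each vertex to a sortable category label.
    Returns a Counter from unordered category pairs (a, b), a <= b, to the
    number of edges between them, i.e. the edge multiplicity of that pair."""
    mult = Counter()
    for u, v in edges:
        a, b = sorted((category[u], category[v]))
        mult[(a, b)] += 1
    return mult


def multiplicity_distribution(mult, categories):
    """Count how many category pairs (loops included) carry each multiplicity.
    Pairs with no edges are counted under multiplicity 0."""
    dist = Counter()
    for pair in combinations_with_replacement(sorted(categories), 2):
        dist[mult.get(pair, 0)] += 1
    return dist


def within_between(mult):
    """Split the total edge count into within- and between-category edges."""
    within = sum(m for (a, b), m in mult.items() if a == b)
    between = sum(m for (a, b), m in mult.items() if a != b)
    return within, between
```

For example, with vertices 1 and 2 in category `"A"`, vertices 3 and 4 in `"B"`, and edges `(1, 2), (1, 3), (2, 4), (3, 4), (4, 3)`, the aggregated multiplicities are 1 for A–A, 2 for A–B, and 2 for B–B.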
3
Khan AH, Lin A, Wang RT, Bloom JS, Lange K, Smith DJ. Pooled analysis of radiation hybrids identifies loci for growth and drug action in mammalian cells. Genome Res 2020; 30:1458-1467. [PMID: 32878976 PMCID: PMC7605260 DOI: 10.1101/gr.262204.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/10/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022]
Abstract
Genetic screens in mammalian cells commonly focus on loss-of-function approaches. To evaluate the phenotypic consequences of extra gene copies, we used bulk segregant analysis (BSA) of radiation hybrid (RH) cells. We constructed six pools of RH cells, each consisting of ∼2500 independent clones, and placed the pools under selection in media with or without paclitaxel. Low pass sequencing identified 859 growth loci, 38 paclitaxel loci, 62 interaction loci, and three loci for mitochondrial abundance at genome-wide significance. Resolution was measured as ∼30 kb, close to single-gene. Divergent properties were displayed by the RH-BSA growth genes compared to those from loss-of-function screens, refuting the balance hypothesis. In addition, enhanced retention of human centromeres in the RH pools suggests a new approach to functional dissection of these chromosomal elements. Pooled analysis of RH cells showed high power and resolution and should be a useful addition to the mammalian genetic toolkit.
Affiliation(s)
- Arshad H Khan
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
- Andy Lin
- Office of Information Technology, UCLA, Los Angeles, California 90095-1557, USA
- Richard T Wang
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Joshua S Bloom
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Howard Hughes Medical Institute, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Desmond J Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
4
Huang XT, Zhu Y, Chan LLH, Zhao Z, Yan H. An integrative C. elegans protein-protein interaction network with reliability assessment based on a probabilistic graphical model. Mol Biosyst 2016; 12:85-92. [PMID: 26555698 DOI: 10.1039/c5mb00417a] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022]
Abstract
In Caenorhabditis elegans, a large number of protein-protein interactions (PPIs) have been identified by different experiments. However, a comprehensive weighted PPI network, which is essential for signaling pathway inference, is not yet available for this model organism. Therefore, we first construct an integrative PPI network in C. elegans with 12,951 interactions involving 5039 proteins from seven molecular interaction databases. Then, a reliability score based on a probabilistic graphical model (RSPGM) is proposed to assess PPIs. It assumes that the random number of interactions between two proteins follows a Bernoulli distribution, which avoids multi-links. The main parameter of the RSPGM score contains a few latent variables, which can be regarded as common properties shared by two proteins. Validations on high-confidence yeast datasets show that RSPGM provides more accurate evaluation than other approaches, and that the PPIs in the reconstructed PPI network have higher biological relevance than those in the original network in terms of gene ontology, gene expression, essentiality and the prediction of known protein complexes. Furthermore, this weighted integrative PPI network in C. elegans is also employed to infer the interaction path of the canonical Wnt/β-catenin pathway. Most genes on the inferred interaction path have been validated as Wnt pathway components. Therefore, RSPGM is essential and effective for evaluating PPIs and inferring interaction paths. Finally, the PPI network with RSPGM scores can be queried and visualized on a user-interactive website, which is freely available at .
Affiliation(s)
- Xiao-Tai Huang
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Yuan Zhu
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China; School of Automation, China University of Geosciences, Wuhan, China
- Leanne Lai Hang Chan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Zhongying Zhao
- Department of Biology, Faculty of Science, Hong Kong Baptist University, Hong Kong, China
- Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
5
Amal ES, Ahmed FA, Magdy AA, Shabaan HA, Tamer ME. Isolation and characterization of two malathion-degrading Pseudomonas sp. in Egypt. ACTA ACUST UNITED AC 2016. [DOI: 10.5897/ajb2016.15273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/31/2022]
6
Lange K, Chi EC, Zhou H. A Brief Survey of Modern Optimization for Statisticians. Int Stat Rev 2014; 82:46-70. [PMID: 25242858 PMCID: PMC4166522 DOI: 10.1111/insr.12022] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 09/10/2012] [Accepted: 04/20/2013] [Indexed: 11/30/2022]
Abstract
Modern computational statistics is turning more and more to high-dimensional optimization to handle the deluge of big data. Once a model is formulated, its parameters can be estimated by optimization. Because model parsimony is important, models routinely include nondifferentiable penalty terms such as the lasso. This sober reality complicates minimization and maximization. Our broad survey stresses a few important principles in algorithm design. Rather than view these principles in isolation, it is more productive to mix and match them. A few well chosen examples illustrate this point. Algorithm derivation is also emphasized, and theory is downplayed, particularly the abstractions of the convex calculus. Thus, our survey should be useful and accessible to a broad audience.
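Nondifferentiable penalties such as the lasso can still be handled with simple updates. The sketch below shows cyclic coordinate descent for ½‖y − Xβ‖² + λ‖β‖₁. Here each one-dimensional subproblem has a closed-form soft-thresholding solution, so the kink in the penalty causes no difficulty. This generic textbook version was chosen for illustration and is not code from the survey. The function names, sweep limit, and stopping rule are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Closed-form minimizer over b of 0.5*(b - z)**2 + lam*|b|."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Hypothetical sketch: cyclic coordinate descent for
    0.5*||y - X beta||^2 + lam*||beta||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # ||x_j||^2 for each column
    r = y.copy()                   # residual y - X beta (beta starts at zero)
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the fit
            # Inner product of x_j with the partial residual that excludes beta_j
            z = X[:, j] @ r + col_sq[j] * beta[j]
            new = soft_threshold(z, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                r -= delta * X[:, j]  # keep the residual in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

When X is the identity, one sweep reproduces componentwise soft thresholding of y. For example, `lasso_cd(np.eye(3), [3.0, -0.5, -2.0], 1.0)` gives `[2.0, 0.0, -1.0]`.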
Affiliation(s)
- Kenneth Lange
- Departments of Biomathematics, Human Genetics, and Statistics, University of California, Los Angeles, CA 90095-1766
- Eric C Chi
- Department of Human Genetics, University of California, Los Angeles, CA 90095
- Hua Zhou
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203
7
Ranola JM, Langfelder P, Lange K, Horvath S. Cluster and propensity based approximation of a network. BMC Syst Biol 2013; 7:21. [PMID: 23497424 PMCID: PMC3663730 DOI: 10.1186/1752-0509-7-21] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/20/2012] [Accepted: 02/14/2013] [Indexed: 11/15/2022]
Abstract
Background: The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood-based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets.
Results: Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network generalizes not only correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood-based tests for assessing the significance of an edge after controlling for clustering. We present a novel majorization-minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bipartite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM).
Conclusions: The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood-based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust.
8
Zhu Y, Zhang XF, Dai DQ, Wu MY. Identifying spurious interactions and predicting missing interactions in the protein-protein interaction networks via a generative network model. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:219-225. [PMID: 23702559 DOI: 10.1109/tcbb.2012.164] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/02/2023]
Abstract
With the rapid development of high-throughput techniques for detecting protein-protein interactions (PPIs), a large amount of PPI network data is becoming available. However, the data produced by these techniques contain high levels of spurious and missing interactions. This study assigns a new reliability indicator to each protein pair via a new generative network model (RIGNM), which takes the scale-free property of the PPI network into account, in order to reliably identify both spurious and missing interactions in the observed high-throughput PPI network. The experimental results show that RIGNM is more effective and interpretable than the compared methods, which demonstrates that this approach has the potential to better describe PPI networks and drive new discoveries.
Affiliation(s)
- Yuan Zhu
- Department of Mathematics, Guangdong University of Business Studies, Guangzhou, China.
9
Zhang XF, Dai DQ, Ou-Yang L, Wu MY. Exploring overlapping functional units with various structure in protein interaction networks. PLoS One 2012; 7:e43092. [PMID: 22916212 PMCID: PMC3423443 DOI: 10.1371/journal.pone.0043092] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/19/2012] [Accepted: 07/16/2012] [Indexed: 11/18/2022]
Abstract
Revealing functional units in protein-protein interaction (PPI) networks is important for understanding cellular functional organization. Current algorithms for identifying functional units mainly focus on cohesive protein complexes, which have more internal than external interactions. Most of these approaches do not handle overlaps among complexes, since they usually allow a protein to belong to only one complex. Moreover, recent studies have shown that non-cohesive structural functional units other than complexes also exist in PPI networks. Thus, previous algorithms that focus only on non-overlapping cohesive complexes cannot fully capture the biological reality. Here, we develop a new regularized sparse random graph model (RSRGM) to explore overlapping functional units with various structures in PPI networks. RSRGM is governed principally by two model parameters. One defines functional units as groups of proteins that have similar patterns of connections to others, which allows RSRGM to detect non-cohesive structural functional units. The other represents the degree to which proteins belong to the units, which allows a protein to belong to more than one revealed unit. We also propose a regularizer to control the smoothness between the estimators of these two parameters. Experimental results on four S. cerevisiae PPI networks show that RSRGM outperforms previous competing algorithms in detecting cohesive complexes and overlapping complexes. Moreover, RSRGM can discover biologically significant functional units besides complexes.
Affiliation(s)
- Xiao-Fei Zhang
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Dao-Qing Dai
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Le Ou-Yang
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Meng-Yun Wu
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
10
Zhang XF, Dai DQ, Li XX. Protein complexes discovery based on protein-protein interaction data via a regularized sparse generative network model. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:857-870. [PMID: 22291160 DOI: 10.1109/tcbb.2012.20] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 05/31/2023]
Abstract
Detecting protein complexes from protein interaction networks is a major task in the postgenomic era. Previously developed computational algorithms for identifying complexes mainly focus on graph partitioning or dense-region finding. Most of these traditional algorithms cannot discover the overlapping complexes that do exist in protein-protein interaction (PPI) networks. Although some density-based methods have been developed to identify overlapping complexes, they cannot discover complexes that include peripheral proteins. In this study, motivated by the recent successful application of generative network models to describe the generation process of PPI networks and to detect communities in social networks, we develop a regularized sparse generative network model (RSGNM) for protein complex identification. RSGNM adds a process that generates propensities from an exponential distribution and incorporates a Laplacian regularizer into an existing generative network model. Because the propensities are assumed to follow an exponential distribution, their estimators are sparse, which not only has a good biological interpretation but also helps to control the overlap rate among detected complexes. The Laplacian regularizer, in turn, makes the propensity estimators smoother over the interaction network. Experimental results on three yeast PPI networks show that RSGNM outperforms six previous competing algorithms in terms of the quality of detected complexes. In addition, RSGNM can simultaneously detect overlapping complexes and complexes that include peripheral proteins. These results give new insights into the importance of generative network models in protein complex identification.
Affiliation(s)
- Xiao-Fei Zhang
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China.
11
Abstract
This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted into a personal computer and can dramatically accelerate many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent, block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board.
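Lee and Seung's multiplicative updates for nonnegative matrix factorization are a standard example of an MM algorithm that separates parameters. Within each step, every entry of H, and then of W, is rescaled independently of the others, and this property is what makes such updates a natural fit for many parallel cores. The sketch below is a plain NumPy version on the CPU and is not the GPU implementation from the paper. The function name, the random initialization, and the small `eps` added to numerators and denominators are hypothetical choices. The `eps` keeps every entry strictly positive, and because it only enlarges the diagonal curvature of the majorizer, each step still cannot increase the loss.

```python
import numpy as np


def nmf_mm(V, rank, n_iter=500, eps=1e-12, seed=0):
    """Hypothetical sketch: approximate a nonnegative matrix V by W @ H with
    W, H >= 0 by minimizing 0.5*||V - W H||_F^2 with multiplicative MM updates."""
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive starting values keep every update well defined.
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    losses = []
    for _ in range(n_iter):
        # H-step: minimizer of a separable quadratic majorizer with W fixed;
        # each entry of H is rescaled independently of the others.
        H *= (W.T @ V + eps) / (W.T @ W @ H + eps)
        # W-step: the same construction with H fixed.
        W *= (V @ H.T + eps) / (W @ H @ H.T + eps)
        losses.append(0.5 * np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, losses
```

The recorded `losses` should be nonincreasing up to rounding error, which is a convenient check that the MM descent property holds in practice.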
Affiliation(s)
- Hua Zhou
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203
- Kenneth Lange
- Departments of Biomathematics, Human Genetics, and Statistics, 5357A Gonda Building, UCLA, Los Angeles, CA 90095-1766
- Marc A. Suchard
- Departments of Biomathematics, Biostatistics, and Human Genetics, 6558 Gonda Building, UCLA, Los Angeles, CA 90095-1766