1. Lyu X, Kang J, Li L. High-dimensional multisubject time series transition matrix inference with application to brain connectivity analysis. Biometrics 2024; 80:ujae021. PMID: 38567733; PMCID: PMC10988359; DOI: 10.1093/biomtc/ujae021.
Abstract
Brain effective connectivity analysis quantifies the directed influence of one neural element or region over another, and it is of great scientific interest to understand how the effective connectivity pattern is affected by variations in subject conditions. Vector autoregression (VAR) is a useful tool for this type of problem. However, there is a paucity of solutions when there is measurement error, when there are multiple subjects, and when the focus is inference on the transition matrix. In this article, we study transition matrix inference under the high-dimensional VAR model with measurement error and multiple subjects. We propose a simultaneous testing procedure with three key components: a modified expectation-maximization (EM) algorithm, a test statistic based on the tensor regression of a bias-corrected estimator of the lagged autocovariance given the covariates, and a properly thresholded simultaneous test. We establish the uniform consistency of the estimators from our modified EM, and show that the subsequent test achieves consistent false discovery control while its power approaches one asymptotically. We demonstrate the efficacy of our method through both simulations and a brain connectivity study of task-evoked functional magnetic resonance imaging.
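The transition matrix in this abstract is the coefficient matrix of a VAR model. As a generic illustration only (not the authors' bias-corrected, measurement-error-aware estimator), a VAR(1) transition matrix can be recovered from the lag-0 and lag-1 sample autocovariances:

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 4000
A_true = 0.5 * np.eye(p) + 0.05 * rng.standard_normal((p, p))

# simulate a stationary VAR(1): x_t = A x_{t-1} + noise
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + rng.standard_normal(p)

# Yule-Walker-style estimator: A_hat = Sigma_1 @ inv(Sigma_0),
# where Sigma_k is the lag-k sample autocovariance
Xc = X - X.mean(axis=0)
S0 = Xc[:-1].T @ Xc[:-1] / (T - 1)
S1 = Xc[1:].T @ Xc[:-1] / (T - 1)
A_hat = S1 @ np.linalg.inv(S0)
```

With measurement error, this naive estimator is biased, which is precisely the gap the paper's bias-corrected estimator addresses.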
Affiliation(s)
- Xiang Lyu
- Division of Biostatistics, University of California, Berkeley, CA 94720, United States
- Jian Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States
- Lexin Li
- Division of Biostatistics, University of California, Berkeley, CA 94720, United States
2. Cho AE, Xiao J, Wang C, Xu G. Regularized Variational Estimation for Exploratory Item Factor Analysis. Psychometrika 2024; 89:347-375. PMID: 35831697; DOI: 10.1007/s11336-022-09874-6.
Abstract
Item factor analysis (IFA), also known as Multidimensional Item Response Theory (MIRT), is a general framework for specifying the functional relationship between respondents' multiple latent traits and their responses to assessment items. The key element in MIRT is the relationship between the items and the latent traits, the so-called item factor loading structure. Correct specification of this loading structure is crucial for accurate calibration of item parameters and recovery of individual latent traits. This paper proposes a regularized Gaussian Variational Expectation Maximization (GVEM) algorithm to efficiently infer the item factor loading structure directly from data. The main idea is to impose an adaptive L1-type penalty on the variational lower bound of the likelihood to shrink certain loadings to 0. The new algorithm retains the computational efficiency of the GVEM algorithm and is suitable for high-dimensional MIRT applications. Simulation studies show that the proposed method accurately recovers the loading structure and is computationally efficient. The new method is also illustrated using the National Education Longitudinal Study of 1988 (NELS:88) mathematics and science assessment data.
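The "adaptive L1-type penalty" acts entrywise through the soft-thresholding (proximal) operator, which is what drives small loadings exactly to 0. A minimal sketch of that building block (the loading values below are invented for illustration):

```python
import numpy as np

def soft_threshold(loadings, lam):
    """Proximal operator of the L1 penalty: shrink toward 0, zeroing small entries."""
    return np.sign(loadings) * np.maximum(np.abs(loadings) - lam, 0.0)

# a hypothetical 3-item x 2-trait loading matrix with near-zero cross-loadings
L = np.array([[1.20, 0.05],
              [0.90, -0.02],
              [0.03, 1.10]])
L_sparse = soft_threshold(L, lam=0.1)
```

After thresholding, the spurious cross-loadings vanish while the dominant loadings survive (slightly shrunk), which is the sparsity pattern the algorithm infers.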
Affiliation(s)
- April E Cho
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
- Jiaying Xiao
- College of Education, University of Washington, 312E Miller Hall, 2012 Skagit Ln, Seattle, WA, 98105, USA
- Chun Wang
- College of Education, University of Washington, 312E Miller Hall, 2012 Skagit Ln, Seattle, WA, 98105, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
3. Wang W, Tong G, Hirani SP, Newman SP, Halpern SD, Small DS, Li F, Harhay MO. A mixed model approach to estimate the survivor average causal effect in cluster-randomized trials. Stat Med 2024; 43:16-33. PMID: 37985966; DOI: 10.1002/sim.9939.
Abstract
In many medical studies, the outcome measure (such as quality of life, QOL) for some study participants becomes informatively truncated (censored, missing, or unobserved) due to death or other forms of dropout, creating a nonignorable missing data problem. In such cases, the use of a composite outcome or of imputation methods that fill in unmeasurable QOL values for those who died relies on strong and untestable assumptions and may be conceptually unappealing to certain stakeholders when estimating a treatment effect. The survivor average causal effect (SACE) is an alternative causal estimand that surmounts some of these issues. While principal stratification has been applied to estimate the SACE in individually randomized trials, methods for estimating the SACE in cluster-randomized trials are currently limited. To address this gap, we develop a mixed model approach along with an expectation-maximization algorithm to estimate the SACE in cluster-randomized trials. We model the continuous outcome measure with a random intercept to account for intracluster correlations due to cluster-level randomization, and model the principal strata membership both with and without a random intercept. In simulations, we compare the performance of our approaches with an existing fixed-effects approach to illustrate the importance of accounting for clustering in cluster-randomized trials. The methodology is then illustrated using a cluster-randomized trial of telecare and assistive technology on health-related QOL in the elderly.
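The random intercept in the authors' mixed model captures intracluster correlation (ICC). A toy illustration of why clustering matters, using the classical one-way ANOVA variance-component estimator (this is not the authors' EM procedure; all numbers are simulated):

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 100, 20                      # clusters and members per cluster
u = rng.normal(0.0, 1.0, k)         # cluster random intercepts, variance 1
y = u[:, None] + rng.normal(0.0, 2.0, (k, m))   # residual variance 4

cluster_means = y.mean(axis=1)
msw = ((y - cluster_means[:, None]) ** 2).sum() / (k * (m - 1))  # within-cluster
msb = m * ((cluster_means - y.mean()) ** 2).sum() / (k - 1)      # between-cluster
sigma2_u = (msb - msw) / m           # estimated random-intercept variance
icc = sigma2_u / (sigma2_u + msw)    # true ICC here is 1 / (1 + 4) = 0.2
```

Ignoring a nonzero ICC (a fixed-effects analysis of clustered data) understates standard errors, which motivates the random-intercept modeling in the paper.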
Affiliation(s)
- Wei Wang
- Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Guangyu Tong
- Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Center for Methods in Implementation and Prevention Science, Yale School of Public Health, New Haven, CT, USA
- Stanton P Newman
- School of Health Sciences, City University London, London, UK
- Division of Medicine, University College London, London, UK
- Scott D Halpern
- Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Dylan S Small
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Fan Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Center for Methods in Implementation and Prevention Science, Yale School of Public Health, New Haven, CT, USA
- Michael O Harhay
- Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
4. Hyun S, Cape MR, Ribalet F, Bien J. Modeling cell populations measured by flow cytometry with covariates using sparse mixture of regressions. Ann Appl Stat 2023; 17:357-377. PMID: 37485300; PMCID: PMC10360992; DOI: 10.1214/22-aoas1631.
Abstract
The ocean is filled with microscopic microalgae, called phytoplankton, which together are responsible for as much photosynthesis as all plants on land combined. Our ability to predict their response to the warming ocean relies on understanding how the dynamics of phytoplankton populations are influenced by changes in environmental conditions. One powerful technique for studying the dynamics of phytoplankton is flow cytometry, which measures the optical properties of thousands of individual cells per second. Today, oceanographers are able to collect flow cytometry data in real time onboard a moving ship, providing them with fine-scale resolution of the distribution of phytoplankton across thousands of kilometers. One of the current challenges is to understand how these small- and large-scale variations relate to environmental conditions, such as nutrient availability, temperature, light, and ocean currents. In this paper, we propose a novel sparse mixture of multivariate regressions model to estimate the time-varying phytoplankton subpopulations while simultaneously identifying the specific environmental covariates that are predictive of the observed changes to these subpopulations. We demonstrate the usefulness and interpretability of the approach using both synthetic data and real observations collected on an oceanographic cruise conducted in the northeast Pacific in the spring of 2017.
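A mixture of regressions lets the regression coefficients differ across latent subpopulations. A bare-bones EM for a two-component, one-covariate version (the authors' model is multivariate, sparse, and time-varying; the data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(-2.0, 2.0, n)
z = rng.random(n) < 0.5                        # latent subpopulation labels
y = np.where(z, 2.0 * x, -1.5 * x) + 0.3 * rng.standard_normal(n)

b = np.array([1.0, -1.0])                      # initial slopes
sigma, pi = 1.0, np.array([0.5, 0.5])
for _ in range(60):
    # E-step: responsibility of each regression line for each point
    resid = y[:, None] - x[:, None] * b[None, :]
    logp = np.log(pi) - 0.5 * (resid / sigma) ** 2
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per component
    for k in range(2):
        b[k] = (r[:, k] * x * y).sum() / (r[:, k] * x * x).sum()
    pi = r.mean(axis=0)
    resid = y[:, None] - x[:, None] * b[None, :]
    sigma = np.sqrt((r * resid ** 2).sum() / n)
```

EM recovers the two hidden slopes (2.0 and -1.5) without ever observing the labels, which is the mechanism that lets the paper tie subpopulations to environmental covariates.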
Affiliation(s)
- Sangwon Hyun
- Department of Data Sciences and Operations, University of Southern California
- Jacob Bien
- Department of Data Sciences and Operations, University of Southern California
5. Akdemir D, Somo M, Isidro-Sánchez J. An Expectation-Maximization Algorithm for Combining a Sample of Partially Overlapping Covariance Matrices. Axioms 2023; 12:161. PMID: 37284612; PMCID: PMC10243021; DOI: 10.3390/axioms12020161.
Abstract
The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate discovery across multiple scientific disciplines. One of these challenges is the harmonization of high-dimensional, unbalanced, and heterogeneous data. In this manuscript, we propose a statistical approach to combine incomplete and partially overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices sampled from Wishart distributions, and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method (i) using simulation studies and (ii) using empirical datasets. In general, being able to make inferences about the covariance of variables not observed in the same experiment is a valuable tool for data analysis, since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.
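The key quantity such methods recover is a covariance between variables never measured in the same experiment. Under a conditional-independence assumption, the missing block has the closed form Sigma_ac = Sigma_ab * inv(Sigma_bb) * Sigma_bc, which an EM algorithm generalizes to noisy, sample-based inputs. A scalar sketch (values chosen so the assumption holds exactly):

```python
# variable b is shared; a was measured only with b, and c only with b
alpha, gamma, s_bb = 0.8, -0.5, 2.0   # a = alpha*b + noise, c = gamma*b + noise
s_ab = alpha * s_bb
s_bc = gamma * s_bb
s_ac_true = alpha * gamma * s_bb      # true cov(a, c), never observed jointly

# plug-in completion, exact when a and c are independent given b
s_ac_hat = s_ab * s_bc / s_bb
```

The plug-in step conveys the idea only; the paper's EM handles matrix-valued blocks and sampling noise in the observed pieces.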
Affiliation(s)
- Deniz Akdemir
- Center of International Bone Marrow Transplantation Research, Minneapolis, MN 55401-1206, USA
- Julio Isidro-Sánchez
- Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Universidad Politécnica de Madrid, 28223 Madrid, Spain
6. Fu H, Nicolet D, Mrózek K, Stone RM, Eisfeld A, Byrd JC, Archer KJ. Controlled variable selection in Weibull mixture cure models for high-dimensional data. Stat Med 2022; 41:4340-4366. PMID: 35792553; PMCID: PMC9545322; DOI: 10.1002/sim.9513.
Abstract
Medical breakthroughs in recent years have led to cures for many diseases. The mixture cure model (MCM) is a type of survival model that is often used when a cured fraction exists. Many have sought to identify genomic features associated with a time-to-event outcome, which requires variable selection strategies for high-dimensional spaces. Unfortunately, few variable selection methods currently exist for MCMs, especially when there are more predictors than samples. This study develops high-dimensional penalized Weibull MCMs, which allow for the identification of prognostic factors associated with cure status, survival, or both. We demonstrate how such models may be estimated using two different iterative algorithms. The model-X knockoffs method is combined with these algorithms to control the false discovery rate (FDR) in variable selection. Through extensive simulation studies, our penalized MCMs are shown to outperform alternative methods on multiple metrics and to achieve high statistical power while controlling the FDR. In an acute myeloid leukemia (AML) application with gene expression data, our proposed approach identified 14 genes associated with potential cure and 12 genes associated with time to relapse, which may help inform treatment decisions for AML patients.
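The defining feature of a mixture cure model is a population survival curve that plateaus at the cured fraction instead of decaying to zero. A minimal sketch with a Weibull latency distribution (parameter values are arbitrary, chosen only to show the plateau):

```python
import math

def pop_survival(t, pi_cure, shape, scale):
    """Mixture cure survival: S(t) = pi_cure + (1 - pi_cure) * S_weibull(t)."""
    s_uncured = math.exp(-((t / scale) ** shape))
    return pi_cure + (1 - pi_cure) * s_uncured

s0 = pop_survival(0.0, pi_cure=0.3, shape=1.5, scale=2.0)       # starts at 1
s_late = pop_survival(50.0, pi_cure=0.3, shape=1.5, scale=2.0)  # plateaus at 0.3
```

The paper's contribution is penalizing both parts of this mixture (the cure probability and the Weibull survival) so that high-dimensional predictors can be selected with FDR control.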
Affiliation(s)
- Han Fu
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, Ohio, USA
- Deedra Nicolet
- Clara D. Bloomfield Center for Leukemia Outcomes Research, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Alliance Statistics and Data Management Center, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Krzysztof Mrózek
- Clara D. Bloomfield Center for Leukemia Outcomes Research, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Richard M. Stone
- Dana-Farber/Partners Cancer, Harvard University, Boston, Massachusetts, USA
- Ann-Kathrin Eisfeld
- Clara D. Bloomfield Center for Leukemia Outcomes Research, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- John C. Byrd
- Department of Internal Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Kellie J. Archer
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, Ohio, USA
7. Orellana R, Carvajal R, Escárate P, Agüero JC. On the Uncertainty Identification for Linear Dynamic Systems Using Stochastic Embedding Approach with Gaussian Mixture Models. Sensors (Basel) 2021; 21:3837. PMID: 34206104; PMCID: PMC8199550; DOI: 10.3390/s21113837.
Abstract
In control and monitoring of manufacturing processes, it is key to understand model uncertainty in order to achieve the required levels of consistency, quality, and economy, among others. In aerospace applications, models need to be very precise and able to describe the entire dynamics of an aircraft. In addition, the complexity of modern real systems has rendered deterministic models impractical, since they cannot adequately represent the behavior of disturbances in sensors and actuators, or tool and machine wear, to name a few. Thus, it is necessary to deal with model uncertainties in the dynamics of the plant by incorporating stochastic behavior. These uncertainties can also affect the effectiveness of the fault diagnosis methodologies used to increase safety and reliability in real-world systems. Determining suitable dynamic system models of real processes is essential to obtain effective process control strategies and accurate fault detection and diagnosis methodologies that deliver good performance. In this paper, a maximum likelihood estimation algorithm for uncertainty modeling in linear dynamic systems is developed using a stochastic embedding approach, in which system uncertainties are accounted for as a stochastic error term in a transfer function. We model the error-model probability density function as a finite Gaussian mixture. For the estimation of the nominal model and the probability density function of the error-model parameters, we develop an iterative algorithm based on the expectation-maximization algorithm using data from independent experiments. The benefits of our proposal are illustrated via numerical simulations.
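The finite Gaussian mixture at the heart of the approach is fit with the standard EM recursions. A generic one-dimensional, two-component version (not the authors' stochastic-embedding estimator; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(2.0, 0.7, 300)])

mu = np.array([-1.0, 1.0])
sd = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(100):
    # E-step: posterior component responsibilities for each sample
    d = (x[:, None] - mu) / sd
    logp = np.log(pi) - np.log(sd) - 0.5 * d ** 2
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted means, spreads, and mixing weights
    nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(x)
```

In the paper, the same E/M structure is applied to the distribution of the error-model parameters rather than to raw scalar data.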
Affiliation(s)
- Rafael Orellana
- Departamento Electrónica, Universidad Técnica Federico Santa María (UTFSM), Av. España 1680, Valparaíso 2390123, Chile
- Advanced Center for Electrical and Electronic Engineering, AC3E, Av. Matta 222, Valparaíso 2580129, Chile
- Departamento de Ingeniería Eléctrica, Facultad de Ingeniería, Universidad de Los Andes, Av. Alberto Carnevalli, Mérida 5101, Venezuela
- Rodrigo Carvajal
- Departamento Electrónica, Universidad Técnica Federico Santa María (UTFSM), Av. España 1680, Valparaíso 2390123, Chile
- Pedro Escárate
- Instituto de Electricidad y Electrónica, Facultad de Ciencias de la Ingeniería, Universidad Austral de Chile (UACH), General Lagos 2086, Valdivia 5111187, Chile
- Juan C. Agüero
- Departamento Electrónica, Universidad Técnica Federico Santa María (UTFSM), Av. España 1680, Valparaíso 2390123, Chile
- Advanced Center for Electrical and Electronic Engineering, AC3E, Av. Matta 222, Valparaíso 2580129, Chile
8
Abstract
OBJECTIVE To illustrate a method that accounts for sampling variation in identifying suppliers and counties with outlying rates of a particular pattern of inconsistent billing for ambulance services to Medicare. DATA SOURCES US Medicare claims for a 20% simple random sample of 2010-2014 fee-for-service beneficiaries. STUDY DESIGN We identified instances in which ambulance suppliers billed Medicare for transporting a patient to a hospital, but no corresponding hospital visit appeared in billing claims. We estimated the distributions of outlier supplier and county rates of such "ghost rides" by fitting a nonparametric empirical Bayes model with flexible distributional assumptions to account for sampling variation. DATA COLLECTION We included basic and advanced life support ground emergency ambulance claims with a hospital destination. PRINCIPAL FINDINGS "Ghost ride" rates varied considerably across both ambulance suppliers and counties. We estimated that 6.1% of suppliers and 5.0% of counties had rates that exceeded 3.6%, twice the national average of "ghost rides" (1.8% of all ambulance transports). CONCLUSIONS Health care fraud and abuse are frequently asserted but can be difficult to detect. Our data-driven approach may be a useful starting point for further investigation.
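The empirical Bayes idea is that a raw rate from a low-volume supplier is unreliable, so it is shrunk toward the overall mean before outliers are flagged. The authors fit a flexible nonparametric model; a parametric beta-binomial sketch conveys the mechanics (all counts and prior values below are invented):

```python
import numpy as np

# hypothetical (ghost rides, total transports) per ambulance supplier
ghosts = np.array([2, 0, 30, 1, 5])
totals = np.array([100, 50, 400, 80, 260])
raw = ghosts / totals

# Beta(a, b) prior with mean a/(a+b) = 2%; the posterior mean shrinks raw
# rates toward that prior mean, more strongly for small suppliers
a, b = 2.0, 98.0
shrunk = (ghosts + a) / (totals + a + b)
```

A zero-count supplier is pulled up toward the mean and an extreme high rate is pulled down, so only rates that remain high after shrinkage are flagged as outliers.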
Affiliation(s)
- Prachi Sanghavi
- Biological Sciences Division, Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Anupam B Jena
- Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; National Bureau of Economic Research, Cambridge, Massachusetts, USA
- Joseph P Newhouse
- Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA; National Bureau of Economic Research, Cambridge, Massachusetts, USA; Department of Health Policy and Management, Harvard School of Public Health, Boston, Massachusetts, USA; Harvard Kennedy School, Cambridge, Massachusetts, USA
- Alan M Zaslavsky
- Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA
9. Akdemir D, Knox R, Isidro y Sánchez J. Combining Partially Overlapping Multi-Omics Data in Databases Using Relationship Matrices. Front Plant Sci 2020; 11:947. PMID: 32765543; PMCID: PMC7381228; DOI: 10.3389/fpls.2020.00947.
Abstract
Private and public breeding programs, as well as companies and universities, have developed different genomics technologies that have generated unprecedented amounts of sequence data, bringing new challenges in terms of data management, query, and analysis. The magnitude and complexity of these datasets also present an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have enormous potential to accelerate the identification of key traits and to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high-dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype-to-phenotype spectrum, which can be used to combine partially overlapping relationship/covariance matrices. We show through applications that our approach can be advantageous over feature-imputation-based approaches; we demonstrate how the method can be used in genomic prediction with heterogeneous marker data and how to combine data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve the information available across gene banks, data repositories, and other data resources.
Affiliation(s)
- Deniz Akdemir
- Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
- Ron Knox
- Swift Current Research and Development Centre (SCRDC-CRDSW), Swift Current, SK, Canada
- Julio Isidro y Sánchez
- Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM – INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
10. Koslovsky MD, Swartz MD, Leon-Novelo L, Chan W, Wilkinson AV. Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates. J Stat Comput Simul 2018; 88:575-596. PMID: 29731525; DOI: 10.1080/00949655.2017.1398255.
Abstract
We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.
Affiliation(s)
- M D Koslovsky
- Department of Biostatistics, UTHealth, Houston, TX, USA
- M D Swartz
- Department of Biostatistics, UTHealth, Houston, TX, USA
- L Leon-Novelo
- Department of Biostatistics, UTHealth, Houston, TX, USA
- W Chan
- Department of Biostatistics, UTHealth, Houston, TX, USA
- A V Wilkinson
- Department of Epidemiology, UTHealth, Austin, TX, USA
11
Abstract
At present, most existing cognitive diagnosis models (CDMs) are designed to identify either the presence and absence of skills or the presence and absence of misconceptions, but not both. This article proposes a CDM that can simultaneously identify which skills and which misconceptions students possess, and it proposes the use of the expectation-maximization algorithm to estimate the model parameters. A simulation study is conducted to evaluate the viability of the proposed model and algorithm. Real data are analyzed to demonstrate the applicability of the proposed model and to compare it with existing CDMs. Furthermore, a real-data-based simulation study is conducted to determine how the correct classification rates under the proposed model can be improved. Issues related to the proposed model and future research are discussed.
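A standard building block of such CDMs is the DINA item response function: a student answers correctly (up to slip and guess noise) only if they have mastered every skill the item requires. A sketch of that function (the skill vectors and parameters are invented; the proposed model additionally tracks misconceptions):

```python
def p_correct(mastered, required, slip=0.1, guess=0.2):
    """DINA-style probability of a correct response given a skill profile."""
    has_all = all(m == 1 for m, r in zip(mastered, required) if r == 1)
    return (1.0 - slip) if has_all else guess

p_master = p_correct(mastered=[1, 1, 0], required=[1, 1, 0])   # has both skills
p_missing = p_correct(mastered=[1, 0, 0], required=[1, 1, 0])  # lacks skill 2
```

EM estimation then alternates between inferring each student's latent profile from their responses and re-estimating the slip and guess parameters.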
Affiliation(s)
- Bor-Chen Kuo
- National Taichung University of Education, Taiwan
12. Yang J, Fritsche LG, Zhou X, Abecasis G. A Scalable Bayesian Method for Integrating Functional Information in Genome-wide Association Studies. Am J Hum Genet 2017; 101:404-416. PMID: 28844487; DOI: 10.1016/j.ajhg.2017.08.002.
Abstract
Genome-wide association studies (GWASs) have identified many loci associated with complex traits. However, most loci reside in noncoding regions and have unknown biological functions. Integrative analysis that incorporates known functional information into GWASs can help elucidate the underlying biological mechanisms and prioritize important functional variants. Hence, we develop a flexible Bayesian variable selection model with efficient computational techniques for such integrative analysis. Different from previous approaches, our method models the effect-size distribution and probability of causality for variants with different annotations and jointly models genome-wide variants to account for linkage disequilibrium (LD), thus prioritizing associations based on the quantification of the annotations and allowing for multiple associated variants per locus. Our method dramatically improves both computational speed and posterior sampling convergence by taking advantage of the block-wise LD structures in human genomes. In simulations, our method accurately quantifies the functional enrichment and is more powerful for prioritizing the true associations than alternative methods, with the power gain especially apparent when multiple associated variants in LD reside in the same locus. We applied our method to an in-depth GWAS of age-related macular degeneration (AMD) with 33,976 individuals and 9,857,286 variants. We find the strongest enrichment for causality among non-synonymous variants (54× more likely to be causal, 1.4× larger effect sizes) and variants in transcription, repressed Polycomb, and enhancer regions, and we identify five additional candidate loci beyond the 32 known AMD risk loci. In conclusion, our method efficiently integrates functional information into GWASs, helping to identify associated functional variants and the underlying biology.
Affiliation(s)
- Jingjing Yang
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Lars G Fritsche
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA; K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7491 Trondheim, Norway
- Xiang Zhou
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Gonçalo Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA
13. Jain S, Ribbens A, Sima DM, Cambron M, De Keyser J, Wang C, Barnett MH, Van Huffel S, Maes F, Smeets D. Two Time Point MS Lesion Segmentation in Brain MRI: An Expectation-Maximization Framework. Front Neurosci 2016; 10:576. PMID: 28066162; PMCID: PMC5165245; DOI: 10.3389/fnins.2016.00576.
Abstract
Purpose: Lesion volume is a meaningful measure in multiple sclerosis (MS) prognosis. Manual lesion segmentation for computing volume at a single time point or across multiple time points is time consuming and suffers from intra- and inter-observer variability. Methods: In this paper, we present MSmetrix-long: a joint expectation-maximization (EM) framework for two time point white matter (WM) lesion segmentation. MSmetrix-long takes as input a 3D T1-weighted and a 3D FLAIR MR image and segments lesions in three steps: (1) cross-sectional lesion segmentation of the two time points; (2) creation of a difference image, which is used to model the lesion evolution; (3) a joint EM lesion segmentation framework that uses the output of steps (1) and (2) to provide the final lesion segmentation. The accuracy (Dice score) and reproducibility (absolute lesion volume difference) of MSmetrix-long are evaluated using two datasets. Results: On the first dataset, the median Dice score between MSmetrix-long and expert lesion segmentation was 0.63 and the Pearson correlation coefficient (PCC) was equal to 0.96. On the second dataset, the median absolute volume difference was 0.11 ml. Conclusions: MSmetrix-long is accurate and consistent in segmenting MS lesions. Also, MSmetrix-long compares favorably with the publicly available longitudinal MS lesion segmentation algorithm of the Lesion Segmentation Toolbox.
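The Dice score used to report accuracy measures the overlap of two binary lesion masks: twice the intersection over the sum of the mask sizes. A minimal implementation (the toy masks below are illustrative only):

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks (1.0 when both are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

mask_auto = np.array([[1, 1, 0],
                      [0, 1, 0]])
mask_expert = np.array([[1, 0, 0],
                        [0, 1, 1]])
score = dice(mask_auto, mask_expert)   # 2 overlapping voxels, 3 + 3 total
```

A score of 1.0 means perfect agreement and 0.0 means no overlap, so the paper's median of 0.63 reflects substantial but imperfect agreement with the expert.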
Affiliation(s)
- Diana M Sima
- icometrix, Leuven, Belgium; STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium
- Melissa Cambron
- Department of Neurology, Universitair Ziekenhuis Brussel, Vrije Universiteit Brussel (VUB), Brussel, Belgium
- Jacques De Keyser
- Department of Neurology, Universitair Ziekenhuis Brussel, Vrije Universiteit Brussel (VUB), Brussel, Belgium; Department of Neurology, University Medical Center Groningen (UMCG), Groningen, Netherlands
- Chenyu Wang
- Sydney Neuroimaging Analysis Centre, Brain and Mind Centre, University of Sydney, Sydney, NSW, Australia
- Michael H Barnett
- Sydney Neuroimaging Analysis Centre, Brain and Mind Centre, University of Sydney, Sydney, NSW, Australia
- Sabine Van Huffel
- STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium; Imec, Leuven, Belgium
- Frederik Maes
- Medical Image Computing, Processing Speech and Images (PSI), Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium
- Dirk Smeets
- icometrix, Leuven, Belgium; BioImaging Lab, Universiteit Antwerpen, Antwerp, Belgium
14
Barri A, Wang Y, Hansel D, Mongillo G. Quantifying Repetitive Transmission at Chemical Synapses: A Generative-Model Approach. eNeuro 2016; 3:ENEURO.0113-15.2016. [PMID: 27200414 DOI: 10.1523/ENEURO.0113-15.2016]
Abstract
The dependence of the synaptic responses on the history of activation and their large variability are both distinctive features of repetitive transmission at chemical synapses. Quantitative investigations have mostly focused on trial-averaged responses to characterize dynamic aspects of the transmission—thus disregarding variability—or on the fluctuations of the responses in steady conditions to characterize variability—thus disregarding dynamics. We present a statistically principled framework to quantify the dynamics of the probability distribution of synaptic responses under arbitrary patterns of activation. This is achieved by constructing a generative model of repetitive transmission, which includes an explicit description of the sources of stochasticity present in the process. The underlying parameters are then selected via an expectation-maximization algorithm that is exact for a large class of models of synaptic transmission, so as to maximize the likelihood of the observed responses. The method exploits the information contained in the correlation between responses to produce highly accurate estimates of both quantal and dynamic parameters from the same recordings. The method also provides important conceptual and technical advances over existing state-of-the-art techniques. In particular, the repetition of the same stimulation in identical conditions becomes unnecessary. This paves the way to the design of optimal protocols to estimate synaptic parameters, to the quantitative comparison of synaptic models over benchmark datasets, and, most importantly, to the study of repetitive transmission under physiologically relevant patterns of synaptic activation.
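The generative-model idea can be illustrated on a deliberately simplified static quantal model: responses are k·q plus Gaussian noise with k binomially released vesicles, and EM recovers release probability p, quantal size q, and noise σ. All numbers and symbols below are assumptions for this sketch, not the authors' dynamic model:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# simulated quantal responses: k ~ Binomial(N, p) vesicles, amplitude k*q + noise
N, p_true, q_true, sigma_true, n = 5, 0.4, 1.0, 0.2, 2000
k_lat = rng.binomial(N, p_true, size=n)
y = q_true * k_lat + rng.normal(0.0, sigma_true, size=n)

def em_quantal(y, N, iters=300):
    """EM for a static binomial-quantal model; returns (p, q, sigma)."""
    p, q, sig = 0.5, y.mean() / (0.5 * N), y.std()   # crude initial guesses
    ks = np.arange(N + 1)
    coef = np.array([comb(N, int(i)) for i in ks], dtype=float)
    for _ in range(iters):
        # E-step: posterior over the latent vesicle count for each trial
        prior = coef * p**ks * (1.0 - p)**(N - ks)
        lik = np.exp(-(y[:, None] - q * ks) ** 2 / (2.0 * sig**2))
        post = prior * lik
        post /= post.sum(axis=1, keepdims=True)
        ek, ek2 = post @ ks, post @ ks**2            # E[k], E[k^2] per trial
        # M-step: closed-form updates
        p = ek.mean() / N
        q = (ek * y).sum() / ek2.sum()
        sig = np.sqrt((y**2 - 2 * q * ek * y + q**2 * ek2).mean())
    return p, q, sig

p_hat, q_hat, sig_hat = em_quantal(y, N)
```

Note how the correlation structure the abstract emphasizes is absent here: this static sketch only shows the E/M alternation over latent release counts.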
15
Abstract
Dynamic Causal Modeling (DCM) can be used to quantify cognitive function in individuals as effective connectivity. However, ambiguity among subjects in the number and location of discernible active regions prevents all candidate models from being compared in all subjects, precluding the use of DCM as an individual cognitive phenotyping tool. This paper proposes a solution to this problem by treating missing regions in the first-level analysis as missing data, and estimating the time course associated with any missing region using one of four candidate methods: zero-filling, average-filling, noise-filling using a fixed stochastic process, or noise-filling using a process estimated via expectation-maximization. The effect of this estimation scheme was analyzed by treating it as a preprocessing step to DCM and observing the resulting effects on model evidence. Simulation studies show that estimation using expectation-maximization yields the highest classification accuracy under a simple loss function and the highest model evidence, relative to the other methods. This result held for various dataset sizes and varying numbers of candidate models. In real data, application to Go/No-Go and Simon tasks allowed computation of signals for the missing nodes and, consequently, of model evidence in all subjects, compared with 62 and 48 percent of subjects, respectively, when no preprocessing was performed. These results demonstrate the face validity of the preprocessing scheme and open the possibility of using single-subject DCM as an individual cognitive phenotyping tool.
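The four candidate filling strategies can be sketched as follows; the EM-estimated stochastic process is reduced here to a variance-matched noise process, a loud simplification of the paper's method, and the toy "region" time courses are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 100  # time points; rows of `observed` are the regions found in a subject
observed = np.vstack([np.sin(np.linspace(0, 6, T)) + 0.1 * rng.normal(size=T),
                      np.cos(np.linspace(0, 6, T)) + 0.1 * rng.normal(size=T)])

def fill_zero(obs, T):
    return np.zeros(T)                  # zero-filling

def fill_average(obs, T):
    return obs.mean(axis=0)             # mean time course of observed regions

def fill_noise_fixed(obs, T, sd=1.0):
    return rng.normal(0.0, sd, size=T)  # fixed stochastic process

def fill_noise_estimated(obs, T):
    # noise process with variance matched to the observed regions
    # (a crude stand-in for the EM-estimated process in the paper)
    return rng.normal(0.0, obs.std(), size=T)

filled = {f.__name__: f(observed, T)
          for f in (fill_zero, fill_average, fill_noise_fixed, fill_noise_estimated)}
```

Each candidate time course would then be inserted for the missing node before the DCM first-level analysis.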
Affiliation(s)
- Shaza B Zaghlool
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
- Christopher L Wyatt
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
16
Shi M, Umbach DM, Weinberg CR. Disentangling pooled triad genotypes for association studies. Ann Hum Genet 2014; 78:345-56. [PMID: 24962618 DOI: 10.1111/ahg.12073]
Abstract
Association studies that genotype affected offspring and their parents (triads) offer robustness to genetic population structure while enabling assessments of maternal effects, parent-of-origin effects, and gene-by-environment interaction. We propose case-parents designs that use pooled DNA specimens to make economical use of limited available specimens. One can markedly reduce the number of genotyping assays required by randomly partitioning the case-parent triads into pooling sets of h triads each and creating three pools from every pooling set, one pool each for mothers, fathers, and offspring. Maximum-likelihood estimation of relative risk parameters proceeds via log-linear modeling using the expectation-maximization algorithm. The approach can assess offspring and maternal genetic effects and accommodate genotyping errors and missing genotypes. We compare the power of our proposed analysis for testing offspring and maternal genetic effects to that based on a difference approach and that of the gold standard based on individual genotypes, under a range of allele frequencies, missing parent proportions, and genotyping error rates. Power calculations show that the pooling strategies cause only modest reductions in power if genotyping errors are low, while reducing genotyping costs and conserving limited specimens.
Affiliation(s)
- Min Shi
- Biostatistics Branch, NIEHS, NIH, DHHS, Research Triangle Park, NC, USA
17
Eberhard HP, Madbouly AS, Gourraud PA, Balère ML, Feldmann U, Gragert L, Torres HM, Pingel J, Schmidt AH, Steiner D, van der Zanden HGM, Oudshoorn M, Marsh SGE, Maiers M, Müller CR. Comparative validation of computer programs for haplotype frequency estimation from donor registry data. Tissue Antigens 2013; 82:93-105. [PMID: 23849067 DOI: 10.1111/tan.12160]
Abstract
Estimation of human leukocyte antigen (HLA) haplotype frequencies from unrelated stem cell donor registries presents a challenge because of large sample sizes and heterogeneity of HLA typing data. For the 14th International HLA and Immunogenetics Workshop, five bioinformatics groups initiated the 'Registry Diversity Component' aiming to cross-validate and improve current haplotype estimation tools. Five datasets were derived from different donor registries and then used as input for five different computer programs for haplotype frequency estimation. Because of issues related to heterogeneity and complexity of HLA typing data identified in the initial phase, the same five implementations, and two new ones, were used on simulated datasets in a controlled experiment where the correct results were known a priori. These datasets contained various fractions of missing HLA-DR modeled after European haplotype frequencies. We measured the contribution of sampling fluctuation and estimation error to the deviation of the frequencies from their true values, finding equivalent contributions of each for the chosen samples. Because of patient-directed activities, selective prospective typing strategies and the variety and evolution of typing technology, some donors have more complete and better HLA data. In this setting, we show that restricting estimation to fully typed individuals introduces biases that could be overcome by including all donors in frequency estimation. Our study underlines the importance of critical review and validation of tools in registry-related activity and provides a sustainable framework for validating the computational tools used. Accurate frequencies are essential for match prediction to improve registry operations and to help more patients identify suitably matched donors.
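A minimal example of the kind of EM haplotype-frequency estimation being validated, here the classic two-locus "gene counting" scheme on simulated unphased genotypes (a toy stand-in for registry-scale HLA data; loci, frequencies, and sample size are invented):

```python
import numpy as np

rng = np.random.default_rng(2)

def em_haplotypes(genotypes, iters=100):
    """Gene-counting EM for two-locus haplotype frequencies from unphased
    genotypes. genotypes[i] = (gA, gB): minor-allele counts in {0, 1, 2}.
    Returns frequencies of haplotypes [AB, Ab, aB, ab]."""
    f = np.full(4, 0.25)
    idx = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}
    for _ in range(iters):
        counts = np.zeros(4)
        for gA, gB in np.asarray(genotypes):
            if gA == 1 and gB == 1:
                # double heterozygote: phase is ambiguous (AB/ab vs Ab/aB),
                # so split the two haplotypes by the current frequencies (E-step)
                w = f[0] * f[3] / (f[0] * f[3] + f[1] * f[2])
                counts += w * np.array([1.0, 0, 0, 1.0]) + (1 - w) * np.array([0, 1.0, 1.0, 0])
            else:
                # phase determined: read off the two haplotypes directly
                for copy in range(2):
                    counts[idx[(int(copy < gA), int(copy < gB))]] += 1
        f = counts / counts.sum()        # M-step
    return f

# simulate individuals from known haplotype frequencies and recover them
true_f = np.array([0.5, 0.2, 0.2, 0.1])          # AB, Ab, aB, ab
allele = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
haps = rng.choice(4, size=(1000, 2), p=true_f)
geno = allele[haps[:, 0]] + allele[haps[:, 1]]
f_hat = em_haplotypes(geno)
```

The missing-typing and multi-allelic complications the workshop dealt with are exactly what make real registry estimation much harder than this sketch.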
Affiliation(s)
- H-P Eberhard
- Zentrales Knochenmarkspender-Register Deutschland (ZKRD), Ulm, Germany
18
Cheng G, Tang CSM, Wong EHM, Cheng WWC, So MT, Miao X, Zhang R, Cui L, Liu X, Ngan ESW, Lui VCH, Chung PHY, Chan IHY, Liu J, Zhong W, Xia H, Yu J, Qiu X, Wu XZ, Wang B, Dong X, Tou J, Huang L, Yi B, Ren H, Chan EKW, Ye K, O'Reilly PF, Wong KKY, Sham PC, Cherny SS, Tam PKH, Garcia-Barceló MM. Common genetic variants regulating ADD3 gene expression alter biliary atresia risk. J Hepatol 2013; 59:1285-91. [PMID: 23872602 DOI: 10.1016/j.jhep.2013.07.021]
Abstract
BACKGROUND & AIMS: Biliary atresia (BA) is a rare and the most severe cholestatic disease in neonates, but the pathogenic mechanisms are unknown. Through a previous genome-wide association study (GWAS) on Han Chinese, we discovered association of the 10q24.2 region encompassing the ADD3 and XPNPEP1 genes, which was replicated in Chinese and Thai populations. This study aims to fully characterize the genetic architecture at 10q24.2 and to reveal the link between the genetic variants and BA. METHODS: We genotyped 107 single nucleotide polymorphisms (SNPs) in 10q24.2 in 339 Han Chinese patients and 401 matched controls using Sequenom. Exhaustive follow-up studies of the association signals were performed. RESULTS: The combined BA-association p-value of the GWAS SNP (rs17095355) reached 6.06×10⁻¹⁰. Further, we revealed a common risk haplotype encompassing 5 tagging SNPs, capturing the risk-predisposing alleles in 10q24.2 (p=5.32×10⁻¹¹; odds ratio, OR: 2.38; confidence interval, CI: 2.14-2.62). Through Sanger sequencing, no deleterious rare variants (RVs) residing in the risk haplotype were found, dismissing the theory of "synthetic" association. Moreover, in bioinformatics and in vivo genotype-expression investigations, the BA-associated, potentially regulatory SNPs correlated with ADD3 gene expression (n=36; p=0.0030). Remarkably, the risk haplotype frequency coincides with BA incidence in the population, and positive selection (favoring the derived alleles that arose from mutations) was evident at the ADD3 locus, suggesting a possible role for the BA-associated common variants in shaping the general population diversity. CONCLUSIONS: Common genetic variants in 10q24.2 can alter BA risk by regulating ADD3 expression levels in the liver, and may exert an effect on disease epidemiology and on the general population.
Affiliation(s)
- Guo Cheng
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
19
Abstract
Aims: To develop a methodology for quantitative comparison of histological stains based on their classification and clustering performance, which may facilitate the choice of histological stains for automatic pattern and image analysis. Background: Machine learning and image analysis are becoming increasingly important in pathology applications for automatic analysis of histological tissue samples. Pathologists rely on multiple, contrasting stains to analyze tissue samples, but histological stains are developed for visual analysis and are not always ideal for automatic analysis. Materials and Methods: Thirteen different histological stains were used to stain adjacent prostate tissue sections from radical prostatectomies. We evaluated the stains for both supervised and unsupervised classification of stain/tissue combinations. For supervised classification we measured the error rate of nonlinear support vector machines, and for unsupervised classification we used the Rand index and the F-measure to assess the clustering results of a Gaussian mixture model fitted by expectation-maximization. Finally, we investigated class separability measures based on scatter criteria. Results: A methodology for quantitative evaluation of histological stains in terms of their classification and clustering efficacy, aimed at improving segmentation and color decomposition. We demonstrate that, for a specific tissue type, certain stains perform consistently better than others according to objective error criteria. Conclusions: The choice of histological stain for automatic analysis must be based on its classification and clustering performance; these are indicators of the performance of automatic segmentation of tissue into morphological components, which in turn may form the basis for diagnosis.
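The unsupervised half of the pipeline, a Gaussian mixture fitted by EM and scored with the Rand index, can be sketched in one dimension; the data and separation below are illustrative, not stain measurements:

```python
import numpy as np

rng = np.random.default_rng(3)

def gmm_em_1d(x, k=2, iters=100):
    """Minimal 1-D Gaussian mixture fitted by EM; returns hard cluster labels."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    sig = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * sig**2)) / sig
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weights, means, standard deviations
        nk = r.sum(axis=0)
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return r.argmax(axis=1)

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return (same_a[iu] == same_b[iu]).mean()

# two well-separated toy "stain feature" populations
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)])
truth = np.repeat([0, 1], 200)
labels = gmm_em_1d(x)
```

A stain whose feature distributions separate well yields a Rand index near 1 under this scheme, which is the paper's clustering-based criterion in miniature.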
Affiliation(s)
- Jimmy C Azar
- Department of Information Technology, Centre for Image Analysis, Uppsala University, Uppsala, Sweden
20
Abstract
To make fast and accurate behavioral choices, we need to integrate noisy sensory input, take prior knowledge into account, and adjust our decision criteria. It was shown previously that in two-alternative-forced-choice tasks, optimal decision making can be formalized in the framework of a sequential probability ratio test and is then equivalent to a diffusion model. However, this analogy hides a “chicken and egg” problem: to know how quickly we should integrate the sensory input and set the optimal decision threshold, the reliability of the sensory observations must be known in advance. Most of the time, we cannot know this reliability without first observing the decision outcome. We consider here a Bayesian decision model that simultaneously infers the probability of two different choices and at the same time estimates the reliability of the sensory information on which this choice is based. We show that this can be achieved within a single trial, based on the noisy responses of sensory spiking neurons. The resulting model is a non-linear diffusion to bound where the weight of the sensory inputs and the decision threshold are both dynamically changing over time. In difficult decision trials, early sensory inputs have a stronger impact on the decision, and the threshold collapses such that choices are made faster but with low accuracy. The reverse is true in easy trials: the sensory weight and the threshold increase over time, leading to slower decisions but at much higher accuracy. In contrast to standard diffusion models, adaptive sensory weights construct an accurate representation for the probability of each choice. This information can then be combined appropriately with other unreliable cues, such as priors. We show that this model can account for recent findings in a motion discrimination task, and can be implemented in a neural architecture using fast Hebbian learning.
Affiliation(s)
- Sophie Deneve
- Département d'Etudes Cognitives, Group for Neural Theory, Ecole Normale Supérieure, Paris, France
21
Yu C, Han Z, Zeng W, Liu S. Morphology cluster and prediction of growth of human brain pyramidal neurons. Neural Regen Res 2012; 7:36-40. [PMID: 25806056 PMCID: PMC4354113 DOI: 10.3969/j.issn.1673-5374.2012.01.006]
Abstract
Predicting neuron growth is valuable for understanding the morphology of neurons, and is thus helpful in the research of neuron classification. This study sought to propose a new method of predicting the growth of human neurons using 1,907 sets of data on human brain pyramidal neurons obtained from the website NeuroMorpho.Org. First, we analyzed the neurons' morphological features and used an expectation-maximization algorithm to group the neurons into six clusters. Second, a naive Bayes classifier was used to verify the accuracy of the expectation-maximization clustering. Experimental results showed that the resulting clusters were efficient and feasible. Finally, a new method of ranking the six expectation-maximization clusters was used to predict the growth of human pyramidal neurons.
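The verification step described above, training a naive Bayes classifier on EM-derived cluster labels and checking agreement, might look like this on toy two-feature data (all features and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# toy morphology features (two hypothetical clusters of pyramidal neurons)
X = np.vstack([rng.normal([1.0, 2.0], 0.3, size=(150, 2)),
               rng.normal([3.0, 5.0], 0.3, size=(150, 2))])
labels = np.repeat([0, 1], 150)   # stand-in for EM-derived cluster labels

def gnb_fit(X, y):
    """Per-class mean, variance, and prior: a Gaussian naive Bayes model."""
    return {c: (X[y == c].mean(0), X[y == c].var(0), (y == c).mean())
            for c in np.unique(y)}

def gnb_predict(model, X):
    classes = np.array(list(model))
    scores = np.stack([
        (-0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)).sum(axis=1)
        + np.log(prior)
        for mu, var, prior in model.values()])
    return classes[scores.argmax(axis=0)]

model = gnb_fit(X, labels)
agreement = (gnb_predict(model, X) == labels).mean()
```

High agreement between the classifier and the cluster labels is the kind of internal consistency check the paper uses to validate its six clusters.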
Affiliation(s)
- Chao Yu
- School of Computer, China University of Geosciences, Wuhan 430074, Hubei Province, China
Corresponding author: Chao Yu, School of Computer, China University of Geosciences, Wuhan 430074, Hubei Province, China (N20110831003/H)
- Zengxin Han
- School of Computer, China University of Geosciences, Wuhan 430074, Hubei Province, China
- Wencong Zeng
- School of Computer, China University of Geosciences, Wuhan 430074, Hubei Province, China
- Shenquan Liu
- School of Science, South China University of Technology, Guangzhou 510640, Guangdong Province, China
22
Abstract
High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here, we introduce a probabilistic approach for ChIP-Seq data analysis that utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem.
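The core of the multi-read EM idea, alternating between alignment probabilities proportional to region enrichment and enrichment re-estimated from fractional counts, can be sketched on a toy read set (this is a simplification of AREM, not its implementation; the candidate sets are invented):

```python
import numpy as np

# candidate regions for each short read (multi-mapping reads list several)
reads = [[0], [0], [0, 1], [0, 1], [1, 2], [2], [0, 2]]
R = 3
pi = np.full(R, 1.0 / R)               # region enrichment weights

for _ in range(200):
    counts = np.zeros(R)
    for cands in reads:
        w = pi[cands] / pi[cands].sum()  # E-step: alignment probabilities
        counts[cands] += w
    pi = counts / counts.sum()           # M-step: re-estimate enrichment
```

Region 1 has no uniquely mapping reads, so its weight shrinks toward zero while its ambiguous reads are reassigned to regions 0 and 2, which is exactly how unique reads anchor multi-mappers in this family of methods.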
Affiliation(s)
- Daniel Newkirk
- Department of Biological Chemistry, University of California, Irvine, California
- The Institute for Genomics and Bioinformatics, University of California, Irvine, California
- Jacob Biesinger
- Department of Computer Science, University of California, Irvine, California
- The Institute for Genomics and Bioinformatics, University of California, Irvine, California
- Alvin Chon
- Department of Computer Science, University of California, Irvine, California
- The Institute for Genomics and Bioinformatics, University of California, Irvine, California
- Kyoko Yokomori
- Department of Biological Chemistry, University of California, Irvine, California
- Xiaohui Xie
- Department of Computer Science, University of California, Irvine, California
- The Institute for Genomics and Bioinformatics, University of California, Irvine, California
23
Befekadu GK, Tadesse MG, Tsai TH, Ressom HW. Probabilistic mixture regression models for alignment of LC-MS data. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1417-1424. [PMID: 20837998 PMCID: PMC3006656 DOI: 10.1109/tcbb.2010.88]
Abstract
A novel framework of a probabilistic mixture regression model (PMRM) is presented for alignment of liquid chromatography-mass spectrometry (LC-MS) data with respect to retention time (RT) points. The expectation-maximization algorithm is used to estimate the joint parameters of spline-based mixture regression models and prior transformation density models. The latter accounts for the variability in RT points and peak intensities. The applicability of PMRM for alignment of LC-MS data is demonstrated through three data sets. The performance of PMRM is compared with other alignment approaches, including dynamic time warping, correlation optimized warping, and the continuous profile model, in terms of the coefficient of variation of replicate LC-MS runs and accuracy in detecting differentially abundant peptides/proteins.
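A stripped-down version of EM for a mixture of regressions, with the paper's spline bases replaced by straight lines, shows the alternation between responsibilities and weighted least squares (all data below are synthetic; this is a sketch of the model family, not PMRM itself):

```python
import numpy as np

rng = np.random.default_rng(5)

# two "peptide populations": observed vs reference RT following different lines
x = rng.uniform(0.0, 10.0, 300)
comp = rng.integers(0, 2, 300)
y = np.where(comp == 0, 1.0 + 2.0 * x, -2.0 - 1.0 * x) + 0.3 * rng.normal(size=300)

def em_mixture_regression(x, y, iters=100):
    """EM for a two-component linear-regression mixture."""
    X = np.column_stack([np.ones_like(x), x])
    B = np.array([[0.0, 1.0], [0.0, -0.5]])   # per-component [intercept, slope]
    sig = np.array([y.std(), y.std()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        resid = y[:, None] - X @ B.T
        dens = w * np.exp(-resid**2 / (2 * sig**2)) / sig
        r = dens / dens.sum(axis=1, keepdims=True)        # E-step
        for k in range(2):                                # M-step: weighted LS
            wk = r[:, k]
            B[k] = np.linalg.solve(X.T @ (wk[:, None] * X), X.T @ (wk * y))
            sig[k] = np.sqrt((wk * (y - X @ B[k]) ** 2).sum() / wk.sum())
        w = r.mean(axis=0)
    return B, r

B, r = em_mixture_regression(x, y)
slopes = np.sort(B[:, 1])
```

Replacing the design matrix `X` with a spline basis recovers the flexible warping functions the abstract describes.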
Affiliation(s)
- Getachew K. Befekadu
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
- Mahlet G. Tadesse
- Department of Mathematics, Georgetown University, Washington, DC 20057
- Tsung-Heng Tsai
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203
- Habtom W. Ressom
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
24
Kornak J, Young K, Soher BJ, Maudsley AA. Bayesian k-space-time reconstruction of MR spectroscopic imaging for enhanced resolution. IEEE Trans Med Imaging 2010; 29:1333-50. [PMID: 20304734 PMCID: PMC2911978 DOI: 10.1109/tmi.2009.2037956]
Abstract
A k-space-time Bayesian statistical reconstruction method (K-Bayes) is proposed for the reconstruction of metabolite images of the brain from proton (1H) magnetic resonance (MR) spectroscopic imaging (MRSI) data. K-Bayes performs full spectral fitting of the data while incorporating structural (anatomical) spatial information through the prior distribution. K-Bayes provides increased spatial resolution over conventional discrete Fourier transform (DFT) based methods by incorporating structural information from higher resolution coregistered and segmented structural MR images. The structural information is incorporated via a Markov random field (MRF) model that allows for differential levels of expected smoothness in metabolite levels within homogeneous tissue regions and across tissue boundaries. By further combining the structural prior model with a k-space-time MRSI signal and noise model (for a specific set of metabolites and based on knowledge from prior spectral simulations of metabolite signals), the impact of artifacts generated by low-resolution sampling is also reduced. The posterior-mode estimates are used to define the metabolite map reconstructions, obtained via a generalized expectation-maximization algorithm. K-Bayes was tested using simulated and real MRSI datasets consisting of sets of k-space-time-series (the recorded free induction decays). The results demonstrated that K-Bayes provided qualitative and quantitative improvement over DFT methods.
Affiliation(s)
- John Kornak
- Department of Radiology and Biomedical Imaging and the Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94107, USA
25
Abstract
The evaluation of the quality of segmentations of an image, and the assessment of intra- and inter-expert variability in segmentation performance, has long been recognized as a difficult task. For a segmentation validation task, it may be effective to compare the results of an automatic segmentation algorithm to multiple expert segmentations. Recently, an expectation-maximization (EM) algorithm for simultaneous truth and performance level estimation (STAPLE) was developed to this end, to compute both an estimate of the reference standard segmentation and performance parameters from a set of segmentations of an image. The performance is characterized by the rate of detection of each segmentation label by each expert in comparison to the estimated reference standard. This previous work provides estimates of performance parameters, but does not provide any information regarding the uncertainty of the estimated values. An estimate of this inferential uncertainty, if available, would allow the estimation of confidence intervals for the values of the parameters. This would facilitate the interpretation of the performance of segmentation generators and help determine whether a sufficient data size and number of segmentations have been obtained to precisely characterize the performance parameters. We present a new algorithm to estimate the inferential uncertainty of the performance parameters for binary and multi-category segmentations. It is derived, for the special case of the STAPLE algorithm, from established theory for general-purpose covariance matrix estimation in EM algorithms. The bounds on the performance parameters are estimated by computation of the observed information matrix. We use this algorithm to study the bounds on performance parameter estimates from simulated images with specified performance parameters, and from interactive segmentations of neonatal brain MRIs. We demonstrate that confidence intervals for expert segmentation performance parameters can be estimated with our algorithm. We investigate the influence of the number of experts and of the segmented data size on these bounds, showing that it is possible to determine the number of image segmentations and the size of images necessary to achieve a chosen level of accuracy in segmentation performance assessment.
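The binary STAPLE iteration that this paper builds on can be sketched as follows; here the foreground prior is re-estimated each iteration (a common simplification), and all rater parameters and data are simulated, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

# simulate a ground-truth mask and three raters with known sens/spec
truth = (rng.random(5000) < 0.3).astype(int)
sens, spec = np.array([0.9, 0.8, 0.95]), np.array([0.95, 0.9, 0.85])
D = np.stack([np.where(truth == 1,
                       rng.random(5000) < s,
                       rng.random(5000) >= sp).astype(int)
              for s, sp in zip(sens, spec)])

def staple(D, iters=50):
    """Binary STAPLE: estimates rater sensitivities p, specificities q,
    and a soft consensus W from the rater label matrix D (raters x voxels)."""
    J, N = D.shape
    p = np.full(J, 0.9)
    q = np.full(J, 0.9)
    g = 0.5                              # foreground prior
    for _ in range(iters):
        # E-step: posterior probability that each voxel is truly foreground
        a = g * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - g) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W = a / (a + b)
        # M-step: rater performance against the soft consensus
        p = (D * W).sum(axis=1) / W.sum()
        q = ((1 - D) * (1 - W)).sum(axis=1) / (1 - W).sum()
        g = W.mean()
    return p, q, W

p_hat, q_hat, W = staple(D)
```

The paper's contribution sits on top of this: the observed information matrix of the (p, q) estimates yields confidence intervals for exactly these performance parameters.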
Affiliation(s)
- Olivier Commowick
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Boston, MA 02115, USA
26
Abstract
The comparison of images of a patient to a reference standard may enable the identification of structural brain changes. These comparisons may involve the use of vector or tensor images (i.e., 3-D images for which each voxel can be represented as an ℝ^N vector) such as diffusion tensor images (DTI) or transformations. The recent introduction of the Log-Euclidean framework for diffeomorphisms and tensors has greatly simplified the use of these images by allowing all the computations to be performed on a vector space. However, many sources can result in a bias in the images, including disease or imaging artifacts. In order to estimate and compensate for these sources of variability, we developed a new algorithm, called continuous STAPLE, that estimates the reference standard underlying a set of vector images. This method, based on an expectation-maximization method similar in principle to the validation method STAPLE, also estimates for each image a set of parameters characterizing their bias and variance with respect to the reference standard. We demonstrate how to use these parameters for the detection of atypical images or outliers in the population under study. We identified significant differences between the tensors of diffusion images of multiple sclerosis patients and those of control subjects in the vicinity of lesions.
Affiliation(s)
- Olivier Commowick
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Boston, MA 02115, USA
27
Abstract
We present a new method for mapping ontology schemas that address similar domains. The problem of ontology matching is crucial since we are witnessing a decentralized development and publication of ontological data. We formulate the problem of inferring a match between two ontologies as a maximum likelihood problem, and solve it using the technique of expectation-maximization (EM). Specifically, we adopt directed graphs as our model for ontology schemas and use a generalized version of EM to arrive at a map between the nodes of the graphs. We exploit the structural, lexical, and instance similarity between the graphs, and differ from previous approaches in the way we utilize them to arrive at a possibly inexact match. Inexact matching is the process of finding a best possible match between two graphs when exact matching is not possible or is computationally difficult. In order to scale the method to large ontologies, we identify the computational bottlenecks and adapt the generalized EM by using a memory-bounded partitioning scheme. We provide comparative experimental results in support of our method on two well-known ontology alignment benchmarks and discuss their implications.
Affiliation(s)
- Prashant Doshi
- LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA 30602
- Ravikanth Kolli
- LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA 30602
- Christopher Thomas
- Kno.e.sis Center, Dept. of Computer Science and Engineering, Wright State University, Dayton, OH 45435
28
Zhou J, Senhadji L, Coatrieux JL, Luo L. Iterative PET Image Reconstruction Using Translation Invariant Wavelet Transform. IEEE Trans Nucl Sci 2009; 56:116-128. [PMID: 21869846 PMCID: PMC3156812 DOI: 10.1109/tns.2008.2009445]
Abstract
The present work describes a Bayesian maximum a posteriori (MAP) method using a statistical multiscale wavelet prior model. Rather than using the orthogonal discrete wavelet transform (DWT), this prior is built on the translation invariant wavelet transform (TIWT). The statistical modeling of wavelet coefficients relies on the generalized Gaussian distribution. Image reconstruction is performed in spatial domain with a fast block sequential iteration algorithm. We study theoretically the TIWT MAP method by analyzing the Hessian of the prior function to provide some insights on noise and resolution properties of image reconstruction. We adapt the key concept of local shift invariance and explore how the TIWT MAP algorithm behaves with different scales. It is also shown that larger support wavelet filters do not offer better performance in contrast recovery studies. These theoretical developments are confirmed through simulation studies. The results show that the proposed method is more attractive than other MAP methods using either the conventional Gibbs prior or the DWT-based wavelet prior.
Affiliation(s)
- Jian Zhou
- LTSI, Laboratoire Traitement du Signal et de l'Image, INSERM U642, Université de Rennes I, Campus de Beaulieu, 263 Avenue du Général Leclerc, CS 74205, 35042 Rennes Cedex, France
- CRIBS, Centre de Recherche en Information Biomédicale sino-français, INSERM Laboratoire International Associé, Université de Rennes I / Southeast University, Rennes, France
- Lotfi Senhadji
- LTSI, Laboratoire Traitement du Signal et de l'Image, INSERM U642, Université de Rennes I, Campus de Beaulieu, 263 Avenue du Général Leclerc, CS 74205, 35042 Rennes Cedex, France
- CRIBS, Centre de Recherche en Information Biomédicale sino-français, INSERM Laboratoire International Associé, Université de Rennes I / Southeast University, Rennes, France
- Jean-Louis Coatrieux
- LTSI, Laboratoire Traitement du Signal et de l'Image, INSERM U642, Université de Rennes I, Campus de Beaulieu, 263 Avenue du Général Leclerc, CS 74205, 35042 Rennes Cedex, France
- CRIBS, Centre de Recherche en Information Biomédicale sino-français, INSERM Laboratoire International Associé, Université de Rennes I / Southeast University, Rennes, France
- Limin Luo
- CRIBS, Centre de Recherche en Information Biomédicale sino-français, INSERM Laboratoire International Associé, Université de Rennes I / Southeast University, Rennes, France
- LIST, Laboratory of Image Science and Technology, Southeast University, Si Pai Lou 2, Nanjing 210096, China
29
Ou W, Nummenmaa A, Hämäläinen M, Golland P. Multimodal functional imaging using fMRI-informed regional EEG/MEG source estimation. Inf Process Med Imaging 2009; 21:88-100. PMID: 19694255; PMCID: PMC4031612.
Abstract
We propose a novel method, fMRI-Informed Regional Estimation (FIRE), which utilizes information from fMRI in E/MEG source reconstruction. FIRE takes advantage of the spatial alignment between the neural and the vascular activities, while allowing for substantial differences in their dynamics. Furthermore, with the regional approach, FIRE can be efficiently applied to a dense grid of sources. Inspection of our optimization procedure reveals that FIRE is related to the re-weighted minimum-norm algorithms, the difference being that the weights in the proposed approach are computed from both the current estimates and fMRI data. Analysis of both simulated and human fMRI-MEG data shows that FIRE reduces the ambiguities in source localization present in the minimum-norm estimates. Comparisons with several joint fMRI-E/MEG algorithms demonstrate robustness of FIRE in the presence of sources silent to either fMRI or E/MEG measurements.
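FIRE itself folds fMRI evidence into the weights; leaving the fMRI term aside, the re-weighted minimum-norm backbone it relates to can be sketched as follows. The lead field `A`, regularizer `lam`, and power-based re-weighting rule here are illustrative assumptions, not the paper's exact update:

```python
import numpy as np

def reweighted_mne(A, y, lam=1e-2, n_iter=10, eps=1e-12):
    """Iteratively re-weighted minimum-norm source estimate: each pass
    solves a weighted minimum-norm problem, then recomputes the source
    weights from the current estimate's power, focusing the solution."""
    n_sensors, n_sources = A.shape
    w = np.ones(n_sources)               # uniform weights: plain minimum norm
    for _ in range(n_iter):
        Aw = A * w                       # A @ diag(w)
        G = Aw @ A.T + lam * np.eye(n_sensors)
        s = w * (A.T @ np.linalg.solve(G, y))
        w = s ** 2 + eps                 # re-weight by estimated source power
    return s
```

With noiseless data from a single active source, the re-weighting concentrates the estimate while continuing to fit the measurements.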
Affiliation(s)
- Wanmei Ou
- Computer Science and Artificial Intelligence Laboratory, MIT, USA.
30
Abstract
The accuracy and precision of segmentations of medical images have been difficult to quantify in the absence of a 'ground truth' or reference standard segmentation for clinical data. Although physical or digital phantoms can help by providing a reference standard, they do not allow the reproduction of the full range of imaging and anatomical characteristics observed in clinical data. An alternative assessment approach is to compare with segmentations generated by domain experts. Segmentations may be generated by raters who are trained experts or by automated image analysis algorithms. Typically, these segmentations differ due to intra-rater and inter-rater variability. The most appropriate way to compare such segmentations has been unclear. We present here a new algorithm to enable the estimation of performance characteristics, and a true labelling, from observations of segmentations of imaging data where segmentation labels may be ordered or continuous measures. This approach may be used with, among others, surface, distance transform, or level-set representations of segmentations, and can be used to assess whether or not a rater consistently overestimates or underestimates the position of a boundary.
Affiliation(s)
- Simon K. Warfield
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Kelly H. Zou
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- William M. Wells
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA 02115, USA
31
Pohl KM, Bouix S, Nakamura M, Rohlfing T, McCarley RW, Kikinis R, Grimson WEL, Shenton ME, Wells WM. A hierarchical algorithm for MR brain image parcellation. IEEE Trans Med Imaging 2007; 26:1201-12. PMID: 17896593; PMCID: PMC2768067; DOI: 10.1109/TMI.2007.901433.
Abstract
We introduce an algorithm for segmenting brain magnetic resonance (MR) images into anatomical compartments such as the major tissue classes and neuro-anatomical structures of the gray matter. The algorithm is guided by prior information represented within a tree structure. The tree mirrors the hierarchy of anatomical structures and the subtrees correspond to limited segmentation problems. The solution to each problem is estimated via a conventional classifier. Our algorithm can be adapted to a wide range of segmentation problems by modifying the tree structure or replacing the classifier. We evaluate the performance of our new segmentation approach by revisiting a previously published statistical group comparison between first-episode schizophrenia patients, first-episode affective psychosis patients, and comparison subjects. The original study is based on 50 MR volumes in which an expert identified the brain tissue classes as well as the superior temporal gyrus, amygdala, and hippocampus. We generate analogous segmentations using our new method and repeat the statistical group comparison. The results of our analysis are similar to the original findings, except for one structure (the left superior temporal gyrus) in which a trend-level statistical significance (p = 0.07) was observed instead of statistical significance.
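The tree-guided control flow — each internal node delegates to a classifier that picks one child subproblem until a leaf is reached — can be sketched generically. The node names, thresholds, and dictionary encoding below are hypothetical, not the paper's anatomical atlas or classifier:

```python
def segment_hierarchical(feature, root, children, classify):
    """Descend an anatomical hierarchy: at each internal node, a per-node
    classifier assigns the voxel to one child; a leaf is the final label."""
    node = root
    while children.get(node):            # internal node: delegate to a subproblem
        node = classify[node](feature)
    return node                          # leaf: final anatomical label
```

Adapting the approach to a new segmentation problem then amounts to editing the `children` tree or swapping the per-node `classify` functions, mirroring the paper's modularity claim.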
Affiliation(s)
- Kilian M Pohl
- Surgical Planning Laboratory, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA.
32
Linguraru MG, Vasilyev NV, Del Nido PJ, Howe RD. Statistical segmentation of surgical instruments in 3-D ultrasound images. Ultrasound Med Biol 2007; 33:1428-37. PMID: 17521802; PMCID: PMC2597268; DOI: 10.1016/j.ultrasmedbio.2007.03.003.
Abstract
The recent development of real-time 3-D ultrasound (US) enables intracardiac beating-heart procedures, but the distorted appearance of surgical instruments is a major challenge to surgeons. In addition, tissue and instruments have similar gray levels in US images and the interface between instruments and tissue is poorly defined. We present an algorithm that automatically estimates instrument location in intracardiac procedures. Expert-segmented images are used to initialize the statistical distributions of blood, tissue and instruments. Voxels are labeled through an iterative expectation-maximization algorithm using information from the neighboring voxels through a smoothing kernel. Once the three classes of voxels are separated, additional neighboring information is combined with the known shape characteristics of instruments to correct for misclassifications. We analyze the major axis of segmented data through their principal components and refine the results by a watershed transform, which corrects the results at the contact between instrument and tissue. We present results on 3-D in-vitro data from a tank trial and 3-D in-vivo data from cardiac interventions on porcine beating hearts, using instruments of four types of materials. The comparison of algorithm results to expert-annotated images shows the correct segmentation and position of the instrument shaft.
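The voxel-labeling core — EM on a three-class intensity mixture for blood, tissue, and instrument — can be sketched in one dimension. The paper's neighborhood smoothing kernel, shape-based correction, and watershed refinement are omitted; the seed means and class count below are illustrative stand-ins for the expert-segmented initialization:

```python
import numpy as np

def em_gmm_1d(x, means, n_iter=30):
    """EM for a 1-D Gaussian mixture: label intensities into len(means)
    classes (e.g. blood / tissue / instrument), seeded with rough means."""
    x = np.asarray(x, dtype=float)
    K = len(means)
    mu = np.asarray(means, dtype=float)
    var = np.full(K, x.var() / K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each class for each voxel
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return r.argmax(axis=1), mu
```

On well-separated intensity clusters the seeded means converge to the class means and the hard labels follow the posterior responsibilities.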
Affiliation(s)
- Marius George Linguraru
- Division of Engineering and Applied Sciences, Harvard Medical School, Harvard University, Cambridge, and Department of Cardiac Surgery, Children's Hospital, Boston, MA, USA.
33
Heard NA, Holmes CC, Stephens DA, Hand DJ, Dimopoulos G. Bayesian coclustering of Anopheles gene expression time series: study of immune defense response to multiple experimental challenges. Proc Natl Acad Sci U S A 2005; 102:16939-44. PMID: 16287981; PMCID: PMC1287961; DOI: 10.1073/pnas.0408393102.
Abstract
We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.
Affiliation(s)
- Nicholas A Heard
- Department of Mathematics, Imperial College London, Huxley Building, 180 Queens Gate, London SW7 2AZ, United Kingdom.
34
Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging 2004; 23:903-21. PMID: 15250643; PMCID: PMC1283110; DOI: 10.1109/TMI.2004.828354.
Abstract
Characterizing the performance of image segmentation approaches has been a persistent challenge. Performance analysis is important since segmentation algorithms often have limited accuracy and precision. Interactive drawing of the desired segmentation by human raters has often been the only acceptable approach, and yet suffers from intra-rater and inter-rater variability. Automated algorithms have been sought in order to remove the variability introduced by raters, but such algorithms must be assessed to ensure they are suitable for the task. The performance of raters (human or algorithmic) generating segmentations of medical images has been difficult to quantify because of the difficulty of obtaining or estimating a known true segmentation for clinical data. Although physical and digital phantoms can be constructed for which ground truth is known or readily estimated, such phantoms do not fully reflect clinical images due to the difficulty of constructing phantoms which reproduce the full range of imaging characteristics and normal and pathological anatomical variability observed in clinical data. Comparison to a collection of segmentations by raters is an attractive alternative since it can be carried out directly on the relevant clinical imaging data. However, the most appropriate measure or set of measures with which to compare such segmentations has not been clarified and several measures are used in practice. We present here an expectation-maximization algorithm for simultaneous truth and performance level estimation (STAPLE). The algorithm considers a collection of segmentations and computes a probabilistic estimate of the true segmentation and a measure of the performance level represented by each segmentation. The source of each segmentation in the collection may be an appropriately trained human rater or raters, or may be an automated segmentation algorithm. 
The probabilistic estimate of the true segmentation is formed by estimating an optimal combination of the segmentations, weighting each segmentation depending upon the estimated performance level, and incorporating a prior model for the spatial distribution of structures being segmented as well as spatial homogeneity constraints. STAPLE is straightforward to apply to clinical imaging data, it readily enables assessment of the performance of an automated image segmentation algorithm, and enables direct comparison of human rater and algorithm performance.
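For the binary case, and setting aside the spatial prior and homogeneity constraints described above, the EM core can be sketched as follows (the initialization values and the scalar foreground prior are simplifying assumptions):

```python
import numpy as np

def staple_binary(D, prior=0.5, n_iter=50):
    """EM estimate of the true binary segmentation and per-rater
    sensitivity/specificity from R rater label vectors D (R x N)."""
    D = np.asarray(D, dtype=float)
    R, N = D.shape
    p = np.full(R, 0.9)   # sensitivity (true-positive rate) initialization
    q = np.full(R, 0.9)   # specificity (true-negative rate) initialization
    for _ in range(n_iter):
        # E-step: posterior probability each voxel is truly foreground
        a = np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        W = prior * a / (prior * a + (1 - prior) * b)
        # M-step: update each rater's performance parameters
        p = (D * W).sum(axis=1) / W.sum()
        q = ((1 - D) * (1 - W)).sum(axis=1) / (1 - W).sum()
    return W, p, q
```

A rater who disagrees with the consensus on some voxels receives a lower estimated performance level, while the consensus estimate `W` weights raters by those levels — the mutual reinforcement the abstract describes.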
Affiliation(s)
- Simon K Warfield
- Harvard Medical School and the Department of Radiology of Brigham and Women's Hospital, 75 Francis St, Boston, MA 02115, USA.