1. Song M, Kim M, Kang K, Kim YH, Jeon S. Application of Public Knowledge Discovery Tool (PKDE4J) to Represent Biomedical Scientific Knowledge. Front Res Metr Anal 2018. [DOI: 10.3389/frma.2018.00007]
2. Gündel M, Younesi E, Malhotra A, Wang J, Li H, Zhang B, de Bono B, Mevissen HT, Hofmann-Apitius M. HuPSON: the human physiology simulation ontology. J Biomed Semantics 2013; 4:35. [PMID: 24267822; PMCID: PMC4177144; DOI: 10.1186/2041-1480-4-35] [Received: 05/15/2013; Accepted: 10/07/2013]
Abstract
BACKGROUND Large biomedical simulation initiatives, such as the Virtual Physiological Human (VPH), depend substantially on controlled vocabularies to facilitate the exchange of information, data and models. These initiatives are hindered by the lack of a comprehensive ontology that covers the essential concepts of the simulation domain. RESULTS We propose a first version of a newly constructed ontology, HuPSON, as a basis for shared semantics and interoperability of simulations, models, algorithms and other resources in this domain. The ontology is based on the Basic Formal Ontology and adheres to the MIREOT principles; it has been evaluated via structural features, competency questions and use case scenarios. The ontology is freely available at: http://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads.html (OWL files) and http://bishop.scai.fraunhofer.de/scaiview/ (browser). CONCLUSIONS HuPSON provides a framework for a) annotating simulation experiments, b) retrieving relevant information that is required for modelling, c) enabling interoperability of algorithmic approaches used in biomedical simulation, d) comparing simulation results and e) linking knowledge-based approaches to simulation-based approaches. It is meant to foster a more rapid uptake of semantic technologies in the modelling and simulation domain, with particular focus on the VPH domain.
Affiliation(s)
- Michaela Gündel
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
- Erfan Younesi
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
- Ashutosh Malhotra
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
- Jiali Wang
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Hui Li
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Bijun Zhang
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Bernard de Bono
- University College London (UCL), Gower Street, WC1E 6BT, London, UK
- Heinz-Theodor Mevissen
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
3. Lee KH, Lee SH. Predicting Survival of DLBCL Patients in Pathway-Based Microarray Analysis. Korean Journal of Applied Statistics 2010. [DOI: 10.5351/kjas.2010.23.4.705]
4. Ni TT, Lemon WJ, Shyr Y, Zhong TP. Use of normalization methods for analysis of microarrays containing a high degree of gene effects. BMC Bioinformatics 2008; 9:505. [PMID: 19040742; PMCID: PMC2612699; DOI: 10.1186/1471-2105-9-505] [Received: 04/24/2008; Accepted: 11/28/2008]
Abstract
Background High-throughput microarrays are widely used to study gene expression across tissues and developmental stages. Analysis of gene expression data is challenging in these experiments because of the significant percentages of differentially expressed genes (DEG) observed between tissues and developmental stages. Data normalization methods that are widely used today are not designed for data with a large proportion of tissue or gene effects. Results In our current study, we describe a novel two-dimensional nonparametric normalization method for analyzing microarray data which functions well in the absence or presence of large numbers of gene effects. Rather than relying on an assumption of low variability among most genes, the method implements a unique peak selection strategy to distinguish DEG from genes that are invariant in expression, prior to nonlinear curve fitting. We compared the method under simulated and experimental conditions with five alternative nonlinear normalization approaches: quantile, lowess, robust lowess, invariant set, and cross-correlation (Xcorr). Simulations included various percentages of simulated DEG, and the experimental data were taken from publicly available datasets known to be difficult to analyze because approximately 34% of their genes are DEG. Conclusion We have demonstrated that the new method provides considerable improvement in the accuracy of data normalization when large proportions of gene effects are present. The performance improvement is mostly attributed to its variable selection component, which is designed to separate expression-invariant genes from DEG. Adding this key component of the new method to alternative normalization approaches removes most of their sensitivity to gene effects. The results indicate that our method may be used, without prior knowledge of or assumptions about housekeeping genes, to normalize microarrays that are quite different.
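Quantile normalization is one of the five comparison methods named in this abstract; it is not the authors' peak-selection method. The minimal Python sketch below assumes that every array measures the same genes in the same order. It breaks ties by position, which is a simplification, since some implementations average tied ranks instead. The function name is illustrative.

```python
from statistics import mean


def quantile_normalize(arrays: list[list[float]]) -> list[list[float]]:
    """Give every array the same empirical distribution (illustrative sketch).

    Each inner list holds one array's intensities for the same n genes.
    Ties are broken by original position, a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must measure the same genes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Gene indices ordered from smallest to largest intensity in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, gene in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


for row in quantile_normalize([[5.0, 2.0, 3.0, 4.0],
                               [4.0, 1.0, 4.5, 2.0],
                               [3.0, 4.0, 6.0, 8.0]]):
    print(row)
```

After normalization every array holds the same sorted set of values. This forced agreement is one reason such methods can struggle when a large share of genes is genuinely differentially expressed, which is the situation the abstract addresses.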
Affiliation(s)
- Terri T Ni
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.
5. Xiong H, Zhang D, Martyniuk CJ, Trudeau VL, Xia X. Using generalized procrustes analysis (GPA) for normalization of cDNA microarray data. BMC Bioinformatics 2008; 9:25. [PMID: 18199333; PMCID: PMC2275243; DOI: 10.1186/1471-2105-9-25] [Received: 09/12/2007; Accepted: 01/16/2008]
Abstract
BACKGROUND Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. Many normalization methods have been used to remove such biases within slides (Global, Lowess) and across slides (Scale, Quantile and VSN). However, all these popular approaches make critical assumptions about the data distribution, which are often not valid in practice. RESULTS In this study, we propose a novel assumption-free normalization method based on the Generalized Procrustes Analysis (GPA) algorithm. Using experimental and simulated normal microarray data and boutique array data, we systematically evaluate the normalization performance of the GPA method against six other popular normalization methods, including Global, Lowess, Scale, Quantile, VSN, and one boutique array-specific housekeeping gene method. The assessment of these methods is based on three different empirical criteria: across-slide variability, the Kolmogorov-Smirnov (K-S) statistic and the mean square error (MSE). Compared with the other methods, the GPA method performs effectively and consistently better in reducing across-slide variability and removing systematic bias. CONCLUSION The GPA method is an effective normalization approach for microarray data analysis. In particular, it is free from the statistical and biological assumptions inherent in other normalization methods, which are often difficult to validate. Therefore, the GPA method has a major advantage in that it can be applied to diverse types of array sets, especially to boutique arrays, where the majority of genes may be differentially expressed.
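The abstract names the Kolmogorov-Smirnov (K-S) statistic and the mean square error (MSE) as evaluation criteria, but it does not say exactly which quantities they are computed on. The sketch below makes two assumptions. First, the K-S statistic compares the intensity distributions of two normalized slides. Second, the MSE compares estimates against known simulated values. Both helpers are illustrative and are not the authors' code.

```python
from bisect import bisect_right


def ks_statistic(x: list[float], y: list[float]) -> float:
    """Two-sample K-S statistic: the largest gap between two empirical CDFs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    # Both empirical CDFs are right-continuous step functions that jump only
    # at observed values, so the supremum is attained at a pooled data point.
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def mean_square_error(estimated: list[float], truth: list[float]) -> float:
    """Average squared difference between estimates and known (e.g. simulated) values."""
    if len(estimated) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth)


print(ks_statistic([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]))  # 0.5
print(mean_square_error([1.0, 2.0], [1.5, 2.5]))  # 0.25
```

A smaller K-S value between slides indicates more similar intensity distributions after normalization. A smaller MSE indicates estimates closer to the simulated truth.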
Affiliation(s)
- Huiling Xiong
- Centre for Advanced Research in Environmental Genomics, Department of Biology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada.
6. Fan J, Niu Y. Selection and validation of normalization methods for c-DNA microarrays using within-array replications. Bioinformatics 2007; 23:2391-8. [PMID: 17660210; DOI: 10.1093/bioinformatics/btm361]
Abstract
MOTIVATION Normalization of microarray data is essential for multiple-array analyses. Several normalization protocols have been proposed based on different biological or statistical assumptions. A fundamental question is whether they have effectively normalized the arrays. In addition, for a given array, the question arises of how to choose the method that most effectively normalizes the microarray data. RESULTS We propose several techniques to compare the effectiveness of different normalization methods. We approach the problem by constructing statistics to test whether there are any systematic biases in the expression profiles among duplicated spots within an array. The test statistics involve estimating the genewise variances. This is accomplished by using several novel methods, including empirical Bayes methods for moderating the genewise variances and smoothing methods for aggregating variance information. P-values are estimated based on a normal or chi approximation. With the estimated P-values, we can choose the most appropriate method to normalize a specific array and assess the extent to which the systematic biases due to variations in experimental conditions have been removed. The effectiveness and validity of the proposed methods are convincingly illustrated by a carefully designed simulation study. The method is further illustrated by applications to three datasets. The first is human placenta cDNAs comprising a large number of clones with replications. The second is a customized microarray experiment, carrying just a few hundred genes, on the molecular roles of interferons in tumors. The third is the Agilent microarrays carrying tens of thousands of total RNA samples in the MAQC project, on the reproducibility, sensitivity and specificity of the data. AVAILABILITY Code to implement the method in the statistical package R is available from the authors.
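One common form of empirical Bayes variance moderation, popularized by the limma package, replaces each gene's sample variance s_g^2 (with d_g degrees of freedom) by (d0 * s0^2 + d_g * s_g^2) / (d0 + d_g). The abstract does not give Fan and Niu's exact estimator, so treat the sketch below as an assumption rather than their method. The prior degrees of freedom `d0` and prior variance `s0_sq` are supplied by hand here instead of being estimated from all genes, and the function name is hypothetical.

```python
from statistics import variance


def moderated_variances(replicates: list[list[float]], d0: float, s0_sq: float) -> list[float]:
    """Shrink each gene's sample variance toward a common prior value.

    replicates[g] holds the replicate measurements (e.g. duplicated spots)
    for gene g; each gene needs at least two replicates. d0 (prior degrees
    of freedom) and s0_sq (prior variance) are assumed known here.
    """
    if d0 < 0 or s0_sq < 0:
        raise ValueError("prior parameters must be non-negative")
    moderated = []
    for values in replicates:
        d = len(values) - 1          # residual degrees of freedom for this gene
        s_sq = variance(values)      # unbiased sample variance
        # Weighted average of the prior variance and the gene's own variance.
        moderated.append((d0 * s0_sq + d * s_sq) / (d0 + d))
    return moderated


print(moderated_variances([[1.0, 3.0], [2.0, 2.0]], d0=2.0, s0_sq=1.0))
```

With only duplicated spots, a gene's raw variance rests on a single degree of freedom. Under moderation, a gene whose two spots agree exactly no longer receives a variance of zero, which keeps test statistics built from these variances from becoming unstable.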
Affiliation(s)
- Jianqing Fan
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA.
7. Fujita A, Sato JR, Rodrigues LDO, Ferreira CE, Sogayar MC. Evaluating different methods of microarray data normalization. BMC Bioinformatics 2006; 7:469. [PMID: 17059609; PMCID: PMC1636075; DOI: 10.1186/1471-2105-7-469] [Received: 05/12/2006; Accepted: 10/23/2006]
Abstract
Background With the development of DNA hybridization microarray technologies, it is now possible to assess the expression levels of thousands to tens of thousands of genes simultaneously. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Because of technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical and deserves judicious consideration. Results Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have yet to be used for normalization, namely Kernel smoothing and Support Vector Regression. The results were compared using artificial microarray data and benchmark studies. They indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion In light of our results, Support Vector Regression is favored for microarray normalization because it estimates the normalization curve more robustly than the other methods.
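The authors applied two regression methods to normalization for the first time, and kernel smoothing is one of them (the abstract reports it performed worst). It can be sketched as a Nadaraya-Watson estimator of the log-ratio M as a function of the average log-intensity A, with the fitted trend subtracted from M. Several choices here are assumptions made for illustration, since the abstract does not specify them: the MA representation, the Gaussian kernel, the bandwidth value and the function name.

```python
import math


def kernel_normalize(a_vals: list[float], m_vals: list[float], bandwidth: float) -> list[float]:
    """Remove an intensity-dependent trend from log-ratios M with a Gaussian kernel smoother."""
    if len(a_vals) != len(m_vals):
        raise ValueError("A and M must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    normalized = []
    for a0, m0 in zip(a_vals, m_vals):
        # Weight every spot by its distance from a0 on the intensity axis.
        weights = [math.exp(-0.5 * ((a - a0) / bandwidth) ** 2) for a in a_vals]
        # Nadaraya-Watson estimate of the trend at a0. The spot's own weight
        # is exp(0) = 1, so the denominator is never zero.
        trend = sum(w * m for w, m in zip(weights, m_vals)) / sum(weights)
        normalized.append(m0 - trend)
    return normalized


print(kernel_normalize([6.0, 8.0, 10.0, 12.0], [0.5, 0.5, 0.5, 0.5], bandwidth=1.0))
```

The double loop is quadratic in the number of spots, which is acceptable for a sketch but slow on a full array. The bandwidth controls how local the fitted curve is.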
Affiliation(s)
- André Fujita
- Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010 – São Paulo, 05508-090 SP, Brazil
- Chemistry Institute, University of São Paulo, Av. Lineu Prestes, 748 – São Paulo, 05513-970 SP, Brazil
- João Ricardo Sato
- Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010 – São Paulo, 05508-090 SP, Brazil
- Carlos Eduardo Ferreira
- Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010 – São Paulo, 05508-090 SP, Brazil
- Mari Cleide Sogayar
- Chemistry Institute, University of São Paulo, Av. Lineu Prestes, 748 – São Paulo, 05513-970 SP, Brazil
8. Ma S, Kosorok MR, Huang J, Xie H, Manzella L, Soares MB. Robust semiparametric microarray normalization and significance analysis. Biometrics 2006; 62:555-61. [PMID: 16918920; DOI: 10.1111/j.1541-0420.2005.00452.x]
Abstract
Microarray technology allows the monitoring of expression levels of thousands of genes simultaneously. A semiparametric location and scale model is proposed to model gene expression levels for normalization and significance analysis purposes. Robust estimation based on weighted least absolute deviation regression and significance analysis based on the weighted bootstrap are investigated. The proposed approach naturally combines normalization and significance analysis, and properly incorporates the variation due to normalization into the significance analysis. A small simulation study is used to compare the finite-sample performance of the proposed approach with alternatives. We also demonstrate the proposed method with a real dataset.
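Weighted least absolute deviation (LAD) regression has a simplest special case: a model with only a location term. There, minimizing the weighted sum of absolute residuals gives a weighted median. The sketch below shows only that special case. It does not reproduce the authors' semiparametric location-and-scale model or their choice of weights, and the function name is illustrative.

```python
def weighted_median(values: list[float], weights: list[float]) -> float:
    """Return a minimizer of sum(w * |v - mu|) over mu (weighted L1 location).

    This is the location-only special case of weighted LAD regression.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for v, w in pairs:
        cumulative += w
        # The first value at which the running weight reaches half the total
        # is a minimizer of the weighted absolute-deviation objective.
        if cumulative >= half:
            return v
    return pairs[-1][0]  # not reached with a positive total; kept for safety


print(weighted_median([1.0, 2.0, 3.0, 100.0], [1, 1, 1, 1]))  # 2.0
```

A weighted mean behaves differently: the result here does not move when the largest value grows larger. This resistance to extreme spots is the robustness property that motivates LAD-type estimation.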
Affiliation(s)
- Shuangge Ma
- Department of Biostatistics, University of Washington, Seattle, Washington 98115, USA.
9. Engelen K, Naudts B, De Moor B, Marchal K. A calibration method for estimating absolute expression levels from microarray data. Bioinformatics 2006; 22:1251-8. [PMID: 16522672; DOI: 10.1093/bioinformatics/btl068]
Abstract
MOTIVATION We describe an approach to normalize spotted microarray data, based on a physically motivated calibration model. This model consists of two major components: one describes the hybridization of target transcripts to their corresponding probes, and the other describes the measurement of fluorescence from the hybridized, labeled target. The model parameters and error distributions are estimated from external control spikes. RESULTS Using a publicly available dataset, we show that our procedure is capable of adequately removing the typical non-linearities of the data, without making any assumptions about the distribution of differences in gene expression from one biological sample to the next. Since our model links target concentration to measured intensity, we show how absolute expression values of target transcripts in the hybridization solution can be estimated up to a certain degree.
Affiliation(s)
- Kristof Engelen
- BIOI@SCD, Department of Electrical Engineering, K.U.Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
10. Wang D, Huang J, Xie H, Manzella L, Soares MB. A robust two-way semi-linear model for normalization of cDNA microarray data. BMC Bioinformatics 2005; 6:14. [PMID: 15663789; PMCID: PMC549200; DOI: 10.1186/1471-2105-6-14] [Received: 08/18/2004; Accepted: 01/21/2005]
Abstract
BACKGROUND Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values. METHODS We propose a robust semiparametric method in a two-way semi-linear model (TW-SLM) for normalization of cDNA microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that: (i) the percentage of differentially expressed genes is small; or (ii) the numbers of up- and down-regulated genes are about the same, as required in the LOWESS normalization method. We conduct simulation studies to evaluate the proposed method and use a real data set from a specially designed microarray experiment to compare the performance of the proposed method with that of the LOWESS normalization approach. RESULTS The simulation results show that the proposed method performs better than the LOWESS normalization method in terms of mean square errors for estimated gene effects. The results of the analysis of the real data set also show that the proposed method yields more consistent results between the direct and the indirect comparisons and can also detect more differentially expressed genes than the LOWESS method. CONCLUSIONS Our simulation studies and the real data example indicate that the proposed robust TW-SLM method works at least as well as the LOWESS method and works better when the underlying assumptions for the LOWESS method are not satisfied. Therefore, it is a powerful alternative to the existing normalization methods.
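The intensity ratios mentioned in this abstract are commonly analyzed on the log scale as M = log2(R/G), with A = 0.5 * log2(R * G) as the average log-intensity of a spot. The sketch below computes M and A, then applies global median centering. That is a simple baseline which, like LOWESS, relies on most genes being unchanged. It is not the TW-SLM method, and the function names are illustrative.

```python
import math
from statistics import median


def ma_values(red: list[float], green: list[float]) -> tuple[list[float], list[float]]:
    """Convert two-channel intensities to M = log2(R/G) and A = 0.5 * log2(R * G)."""
    if len(red) != len(green):
        raise ValueError("channels must have the same length")
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive to take logarithms")
        m_vals.append(math.log2(r / g))
        a_vals.append(0.5 * math.log2(r * g))
    return m_vals, a_vals


def global_median_normalize(m_vals: list[float]) -> list[float]:
    """Shift log-ratios so that their median is zero (assumes most genes are unchanged)."""
    center = median(m_vals)
    return [m - center for m in m_vals]


m, a = ma_values([200.0, 400.0, 800.0], [100.0, 200.0, 400.0])
print(m)  # [1.0, 1.0, 1.0]
print(global_median_normalize(m))  # [0.0, 0.0, 0.0]
```

A uniform dye bias of one log2 unit, as in this toy input, is removed completely by median centering. An intensity-dependent bias would instead need a curve fit such as LOWESS. If many genes are differentially expressed in one direction, the median itself shifts, which is exactly the assumption the TW-SLM approach avoids.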
Affiliation(s)
- Deli Wang
- Biostatistics and Bioinformatics Unit, Comprehensive Cancer Center, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Jian Huang
- Department of Statistics and Actuarial Science, and Program in Public Health Genetics, the University of Iowa, Iowa City, IA 52242, USA
- Hehuang Xie
- Department of Pediatrics, the University of Iowa, Iowa City, IA 52242, USA
- Liliana Manzella
- Department of Pediatrics, the University of Iowa, Iowa City, IA 52242, USA
- Marcelo Bento Soares
- Department of Pediatrics, the University of Iowa, Iowa City, IA 52242, USA
- Departments of Biochemistry, Orthopaedics, Physiology and Biophysics, the University of Iowa, Iowa City, IA 52242, USA