1
Quantile Regression Approach for Analyzing Similarity of Gene Expressions under Multiple Biological Conditions. Stats 2022. [DOI: 10.3390/stats5030036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Temporal gene expression data contain ample information to characterize gene function and are now widely used in bio-medical research. A dense temporal gene expression usually shows various patterns in expression levels under different biological conditions. The existing literature investigates the gene trajectory using the mean function. However, temporal gene expression curves usually show a strong degree of heterogeneity under multiple conditions. As a result, rates of change for gene expressions may be different in non-central locations and a mean function model may not capture the non-central location of the gene expression distribution. Further, the mean regression model depends on the normality assumptions of the error terms of the model, which may be impractical when analyzing gene expression data. In this research, a linear quantile mixed model is used to find the trajectory of gene expression data. This method enables the changes in gene expression over time to be studied by estimating a family of quantile functions. A statistical test is proposed to test the similarity between two different gene expressions based on estimated parameters using a quantile model. Then, the performance of the proposed test statistic is examined using extensive simulation studies. Simulation studies demonstrate the good statistical performance of this proposed test statistic and show that this method is robust against normal error assumptions. As an illustration, the proposed method is applied to analyze a dataset of 18 genes in P. aeruginosa, expressed in 24 biological conditions. Furthermore, a minimum Mahalanobis distance is used to find the clustering tree for gene expressions.
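The quantile-based idea in the abstract above can be illustrated with a minimal sketch. It is not the linear quantile mixed model used in the paper: it fits only a constant to a made-up expression series by minimizing the check (pinball) loss, and the names `pinball_loss`, `constant_quantile_fit`, and the data values are hypothetical.

```python
def pinball_loss(residual: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) for quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values, tau):
    """Return the observed value that minimizes the total check loss.

    For a constant-only model this minimizer is a sample tau-quantile.
    """
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Illustrative (invented) expression levels with one extreme observation.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = constant_quantile_fit(expression, 0.5)  # central location
upper_fit = constant_quantile_fit(expression, 0.9)   # non-central location
mean_fit = sum(expression) / len(expression)         # mean, pulled by the outlier

print(median_fit, upper_fit, mean_fit)
```

The median fit stays near the bulk of the data while the mean is pulled toward the extreme value, which is the kind of behavior that motivates modeling several quantiles rather than the mean alone.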
2
Arenas AF, Salcedo GE, Gomez-Marin JE. R Script Approach to Infer Toxoplasma Infection Mechanisms From Microarrays and Domain-Domain Protein Interactions. Bioinform Biol Insights 2017; 11:1177932217747256. [PMID: 29317802 PMCID: PMC5753922 DOI: 10.1177/1177932217747256] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/01/2017] [Accepted: 11/18/2017] [Indexed: 01/25/2023] Open
Abstract
Pathogen-host protein-protein interaction systems examine the interactions between the protein repertoires of 2 distinct organisms. Some of these pathogen proteins interact with the host protein system and may manipulate it for their own advantages. In this work, we designed an R script by concatenating 2 functions called rowDM and rowCVmed to infer pathogen-host interaction using previously reported microarray data, including host gene enrichment analysis and the crossing of interspecific domain-domain interactions. We applied this script to the Toxoplasma-host system to describe pathogen survival mechanisms from human, mouse, and Toxoplasma Gene Expression Omnibus series. Our outcomes exhibited similar results with previously reported microarray analyses, but we found other important proteins that could contribute to toxoplasma pathogenesis. We observed that Toxoplasma ROP38 is the most differentially expressed protein among toxoplasma strains. Enrichment analysis and KEGG mapping indicated that the human retinal genes most affected by Toxoplasma infections are those related to antiapoptotic mechanisms. We suggest that proteins PIK3R1, PRKCA, PRKCG, PRKCB, HRAS, and c-JUN could be the possible substrates for differentially expressed Toxoplasma kinase ROP38. Likewise, we propose that Toxoplasma causes overexpression of apoptotic suppression human genes.
Affiliation(s)
- Ailan F Arenas
- Grupo de Estudio en Parasitología Molecular (GEPAMOL), Universidad del Quindío, Armenia, Colombia
- Ailan F Arenas, Grupo de Estudio en Parasitología Molecular (GEPAMOL), Universidad del Quindío, Carrera 15 Calle 12N, Armenia, 630001 Quindío, Colombia.
- Gladys E Salcedo
- Grupo de Investigación y Asesoría en Estadística, Universidad del Quindío, Armenia, Colombia
- Jorge E Gomez-Marin
- Grupo de Estudio en Parasitología Molecular (GEPAMOL), Universidad del Quindío, Armenia, Colombia
3
Zhou Y, Zhang B, Li G, Tong T, Wan X. GD-RDA: A New Regularized Discriminant Analysis for High-Dimensional Data. J Comput Biol 2017; 24:1099-1111. [DOI: 10.1089/cmb.2017.0029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023] Open
Affiliation(s)
- Yan Zhou
- College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, ShenZhen, China
- Baoxue Zhang
- School of Statistics, Capital University of Economics and Business, Beijing, China
- Gaorong Li
- Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, China
- Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
- Xiang Wan
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
4
5
Park HJ, Jung WY, Lee SS, Song JH, Kwon SY, Kim H, Kim C, Ahn JC, Cho HS. Use of heat stress responsive gene expression levels for early selection of heat tolerant cabbage (Brassica oleracea L.). Int J Mol Sci 2013; 14:11871-94. [PMID: 23736694 PMCID: PMC3709761 DOI: 10.3390/ijms140611871] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 04/24/2013] [Revised: 05/21/2013] [Accepted: 05/21/2013] [Indexed: 01/11/2023] Open
Abstract
Cabbage is a relatively robust vegetable at low temperatures. However, at high temperatures, cabbage has disadvantages, such as reduced disease tolerance and lower yields. Thus, selection of heat-tolerant cabbage is an important goal in cabbage breeding. Easier or faster selection of superior varieties of cabbage, which are tolerant to heat and disease and have improved taste and quality, can be achieved with molecular and biological methods. We compared heat-responsive gene expression between a heat-tolerant cabbage line (HTCL), "HO", and a heat-sensitive cabbage line (HSCL), "JK", by Genechip assay. Expression levels of specific heat stress-related genes were increased in response to high-temperature stress, according to Genechip assays. We performed quantitative RT-PCR (qRT-PCR) to compare expression levels of these heat stress-related genes in four HTCLs and four HSCLs. Transcript levels for heat shock protein BoHsp70 and transcription factor BoGRAS (SCL13) were more strongly expressed only in all HTCLs compared to all HSCLs, showing much lower level expressions at the young plant stage under heat stress (HS). Thus, we suggest that expression levels of these genes may be early selection markers for HTCLs in cabbage breeding. In addition, several genes that are involved in the secondary metabolite pathway were differentially regulated in HTCL and HSCL exposed to heat stress.
Affiliation(s)
- Hyun Ji Park
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea; E-Mails: (H.J.P.); (W.Y.J.); (S.S.L.); (S.-Y.K.); (H.K.)
- Won Yong Jung
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea; E-Mails: (H.J.P.); (W.Y.J.); (S.S.L.); (S.-Y.K.); (H.K.)
- Department of Animal Resources Technology, Gyeongnam National University of Science and Technology, Jinju 660-758, Korea; E-Mail:
- Sang Sook Lee
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea; E-Mails: (H.J.P.); (W.Y.J.); (S.S.L.); (S.-Y.K.); (H.K.)
- Jun Ho Song
- Asia Seed Company, 447-2, Inhwang-Ri, Janghowon-Eup, Ichen 467-906, Korea; E-Mail:
- Suk-Yoon Kwon
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea; E-Mails: (H.J.P.); (W.Y.J.); (S.S.L.); (S.-Y.K.); (H.K.)
- HyeRan Kim
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea; E-Mails: (H.J.P.); (W.Y.J.); (S.S.L.); (S.-Y.K.); (H.K.)
- ChulWook Kim
- Department of Animal Resources Technology, Gyeongnam National University of Science and Technology, Jinju 660-758, Korea; E-Mail:
- Jun Cheul Ahn
- Department of Pharmacology, Medical Sciences, Seonam University, Kwangchi-dong, Namwon 590-711, Korea
- Authors to whom correspondence should be addressed; E-Mails: (J.C.A.); (H.S.C.); Tel.: +82-63-620-0256 (J.C.A.); +82-42-860-4469 (H.S.C.); Fax: +82-63-620-0031 (J.C.A.); +82-42-860-4608 (H.S.C.)
- Hye Sun Cho
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea; E-Mails: (H.J.P.); (W.Y.J.); (S.S.L.); (S.-Y.K.); (H.K.)
- Authors to whom correspondence should be addressed; E-Mails: (J.C.A.); (H.S.C.); Tel.: +82-63-620-0256 (J.C.A.); +82-42-860-4469 (H.S.C.); Fax: +82-63-620-0031 (J.C.A.); +82-42-860-4608 (H.S.C.)
6
Analysis for temporal gene expressions under multiple biological conditions. Statistics in Biosciences 2012. [DOI: 10.1007/s12561-012-9063-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
7
Dutta B, Wallqvist A, Reifman J. PathNet: a tool for pathway analysis using topological information. Source Code for Biology and Medicine 2012; 7:10. [PMID: 23006764 PMCID: PMC3563509 DOI: 10.1186/1751-0473-7-10] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 07/05/2012] [Accepted: 08/03/2012] [Indexed: 01/01/2023]
Abstract
Background: Identification of canonical pathways through enrichment of differentially expressed genes in a given pathway is a widely used method for interpreting gene lists generated from high-throughput experimental studies. However, most algorithms treat pathways as sets of genes, disregarding any inter- and intra-pathway connectivity information, and do not provide insights beyond identifying lists of pathways. Results: We developed an algorithm (PathNet) that utilizes the connectivity information in canonical pathway descriptions to help identify study-relevant pathways and characterize non-obvious dependencies and connections among pathways using gene expression data. PathNet considers both the differential expression of genes and their pathway neighbors to strengthen the evidence that a pathway is implicated in the biological conditions characterizing the experiment. As an adjunct to this analysis, PathNet uses the connectivity of the differentially expressed genes among all pathways to score pathway contextual associations and statistically identify biological relations among pathways. In this study, we used PathNet to identify biologically relevant results in two Alzheimer’s disease microarray datasets, and compared its performance with existing methods. Importantly, PathNet identified de-regulation of the ubiquitin-mediated proteolysis pathway as an important component in Alzheimer’s disease progression, despite the absence of this pathway in the standard enrichment analyses. Conclusions: PathNet is a novel method for identifying enrichment and association between canonical pathways in the context of gene expression data. It takes into account topological information present in pathways to reveal biological information. PathNet is available as an R workspace image from http://www.bhsai.org/downloads/pathnet/.
Affiliation(s)
- Bhaskar Dutta
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, MD, 21702, USA.
8
Optimal gene subset selection using the modified SFFS algorithm for tumor classification. Neural Comput Appl 2012. [DOI: 10.1007/s00521-012-1148-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
9
Ronges D, Walsh JP, Sinclair BJ, Stillman JH. Changes in extreme cold tolerance, membrane composition and cardiac transcriptome during the first day of thermal acclimation in the porcelain crab Petrolisthes cinctipes. J Exp Biol 2012; 215:1824-36. [PMID: 22573761 DOI: 10.1242/jeb.069658] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022]
Abstract
Intertidal zone organisms can experience transient freezing temperatures during winter low tides, but their extreme cold tolerance mechanisms are not known. Petrolisthes cinctipes is a temperate mid-high intertidal zone crab species that can experience wintertime habitat temperatures below the freezing point of seawater. We examined how cold tolerance changed during the initial phase of thermal acclimation to cold and warm temperatures, as well as the persistence of cold tolerance during long-term thermal acclimation. Thermal acclimation for as little as 6 h at 8°C enhanced cold tolerance during a 1 h exposure to -2°C relative to crabs acclimated to 18°C. Potential mechanisms for this enhanced tolerance were elucidated using cDNA microarrays to probe for differences in gene expression in cardiac tissue of warm- and cold-acclimated crabs during the first day of thermal acclimation. No changes in gene expression were detected until 12 h of thermal acclimation. Genes strongly upregulated in warm-acclimated crabs represented immune response and extracellular/intercellular processes, suggesting that warm-acclimated crabs had a generalized stress response and may have been remodelling tissues or altering intercellular processes. Genes strongly upregulated in cold-acclimated crabs included many that are involved in glucose production, suggesting that cold acclimation involves increasing intracellular glucose as a cryoprotectant. Structural cytoskeletal proteins were also strongly represented among the genes upregulated in only cold-acclimated crabs. There were no consistent changes in composition or the level of unsaturation of membrane phospholipid fatty acids with cold acclimation, which suggests that neither short- nor long-term changes in cold tolerance are mediated by changes in membrane fatty acid composition. 
Overall, our study demonstrates that initial changes in cold tolerance are likely not regulated by transcriptomic responses, but that gene-expression-related changes in homeostasis begin within 12 h, the length of a tidal cycle.
Affiliation(s)
- Daria Ronges
- Romberg Tiburon Center and Department of Biology, San Francisco State University, Tiburon, CA 94920, USA
10
Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowé A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1106-19. [PMID: 22350210 DOI: 10.1109/tcbb.2012.33] [Citation(s) in RCA: 214] [Impact Index Per Article: 17.8] [Indexed: 05/20/2023]
Abstract
A plenitude of feature selection (FS) methods is available in the literature, most of them rising as a need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a general accepted rule, these methods are grouped in filters, wrappers, and embedded methods. More recently, a new group of methods has been added in the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.
Affiliation(s)
- Cosmin Lazar
- Computational Modeling Group, Department of Computer Science, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium.
11
Tong T, Chen L, Zhao H. Improved mean estimation and its application to diagonal discriminant analysis. Bioinformatics 2012; 28:531-7. [PMID: 22171335 DOI: 10.1093/bioinformatics/btr690] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION High-dimensional data such as microarrays have created new challenges to traditional statistical methods. One such example is on class prediction with high-dimension, low-sample size data. Due to the small sample size, the sample mean estimates are usually unreliable. As a consequence, the performance of the class prediction methods using the sample mean may also be unsatisfactory. To obtain more accurate estimation of parameters some statistical methods, such as regularizations through shrinkage, are often desired. RESULTS In this article, we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed under the scenario when the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean by the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings.
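The general shape of a shrinkage-based diagonal discriminant rule can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's optimal shrinkage parameter, so the sketch shrinks each class mean toward zero by a user-supplied weight, and the names `shrink_mean`, `diagonal_score`, `classify`, and all numbers are hypothetical.

```python
def shrink_mean(sample_mean, weight):
    """Shrink a sample mean vector toward zero by a fixed weight in [0, 1].

    The weight is an assumed input here, not the optimal value from the paper.
    """
    return [(1.0 - weight) * m for m in sample_mean]


def diagonal_score(x, mean, variances):
    """Squared distance from x to a class mean, scaled per feature (diagonal covariance)."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))


def classify(x, class_means, variances):
    """Assign x to the class whose (shrunken) mean gives the smallest diagonal score."""
    return min(class_means, key=lambda label: diagonal_score(x, class_means[label], variances))


# Invented two-feature example with two classes.
raw_means = {"A": [2.0, 0.0], "B": [-2.0, 0.0]}
shrunken = {label: shrink_mean(m, 0.25) for label, m in raw_means.items()}
pooled_variances = [1.0, 4.0]

label = classify([1.0, 3.0], shrunken, pooled_variances)
print(shrunken, label)
```

Replacing `raw_means` with `shrunken` in the rule is the only change from the plain diagonal discriminant rule; how the weight should be chosen is exactly what the cited article addresses.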
Affiliation(s)
- Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
12
Maudsley S, Chadwick W, Wang L, Zhou Y, Martin B, Park SS. Bioinformatic approaches to metabolic pathways analysis. Methods Mol Biol 2011; 756:99-130. [PMID: 21870222 DOI: 10.1007/978-1-61779-160-4_5] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 12/14/2022]
Abstract
The growth and development in the last decade of accurate and reliable mass data collection techniques has greatly enhanced our comprehension of cell signaling networks and pathways. At the same time however, these technological advances have also increased the difficulty of satisfactorily analyzing and interpreting these ever-expanding datasets. At the present time, multiple diverse scientific communities including molecular biological, genetic, proteomic, bioinformatic, and cell biological, are converging upon a common endpoint, that is, the measurement, interpretation, and potential prediction of signal transduction cascade activity from mass datasets. Our ever increasing appreciation of the complexity of cellular or receptor signaling output and the structural coordination of intracellular signaling cascades has to some extent necessitated the generation of a new branch of informatics that more closely associates functional signaling effects to biological actions and even whole-animal phenotypes. The ability to untangle and hopefully generate theoretical models of signal transduction information flow from transmembrane receptor systems to physiological and pharmacological actions may be one of the greatest advances in cell signaling science. In this overview, we shall attempt to assist the navigation into this new field of cell signaling and highlight several methodologies and technologies to appreciate this exciting new age of signal transduction.
Affiliation(s)
- Stuart Maudsley
- Receptor Pharmacology Unit, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA.
13
Liu JZ, Horstman HD, Braun E, Graham MA, Zhang C, Navarre D, Qiu WL, Lee Y, Nettleton D, Hill JH, Whitham SA. Soybean homologs of MPK4 negatively regulate defense responses and positively regulate growth and development. Plant Physiology 2011; 157:1363-78. [PMID: 21878550 PMCID: PMC3252160 DOI: 10.1104/pp.111.185686] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.8] [Received: 08/17/2011] [Accepted: 08/25/2011] [Indexed: 05/18/2023]
Abstract
Mitogen-activated protein kinase (MAPK) cascades play important roles in disease resistance in model plant species such as Arabidopsis (Arabidopsis thaliana) and tobacco (Nicotiana tabacum). However, the importance of MAPK signaling pathways in the disease resistance of crops is still largely uninvestigated. To better understand the role of MAPK signaling pathways in disease resistance in soybean (Glycine max), 13, nine, and 10 genes encoding distinct MAPKs, MAPKKs, and MAPKKKs, respectively, were silenced using virus-induced gene silencing mediated by Bean pod mottle virus. Among the plants silenced for various MAPKs, MAPKKs, and MAPKKKs, those in which GmMAPK4 homologs (GmMPK4s) were silenced displayed strong phenotypes including stunted stature and spontaneous cell death on the leaves and stems, the characteristic hallmarks of activated defense responses. Microarray analysis showed that genes involved in defense responses, such as those in salicylic acid (SA) signaling pathways, were significantly up-regulated in GmMPK4-silenced plants, whereas genes involved in growth and development, such as those in auxin signaling pathways and in cell cycle and proliferation, were significantly down-regulated. As expected, SA and hydrogen peroxide accumulation was significantly increased in GmMPK4-silenced plants. Accordingly, GmMPK4-silenced plants were more resistant to downy mildew and Soybean mosaic virus compared with vector control plants. Using bimolecular fluorescence complementation analysis and in vitro kinase assays, we determined that GmMKK1 and GmMKK2 might function upstream of GmMPK4. Taken together, our results indicate that GmMPK4s negatively regulate SA accumulation and defense response but positively regulate plant growth and development, and their functions are conserved across plant species.
Affiliation(s)
- Steven A. Whitham
- Department of Plant Pathology (J.-Z.L., H.D.H., E.B., C.Z., W.-L.Q., Y.L., J.H.H., S.A.W.), Department of Agronomy (M.A.G.), and Department of Statistics (D.N.), Iowa State University, Ames, Iowa 50011; Corn Insects and Crop Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Ames, Iowa 50011 (M.A.G.); United States Department of Agriculture-Agricultural Research Service, Department of Plant Pathology, Washington State University, Prosser, Washington 99350 (D.N.)
14
Subramaniam S, Fahy E, Gupta S, Sud M, Byrnes RW, Cotter D, Dinasarapu AR, Maurya MR. Bioinformatics and systems biology of the lipidome. Chem Rev 2011; 111:6452-90. [PMID: 21939287 PMCID: PMC3383319 DOI: 10.1021/cr200295k] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 12/14/2022]
Affiliation(s)
- Shankar Subramaniam
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Departments of Chemistry and Biochemistry, and Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, California 92093, USA
- Eoin Fahy
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Shakti Gupta
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Manish Sud
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Robert W. Byrnes
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Dawn Cotter
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Ashok Reddy Dinasarapu
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Mano Ram Maurya
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
15
Zheng CH, Chong YW, Wang HQ. Gene selection using independent variable group analysis for tumor classification. Neural Comput Appl 2011. [DOI: 10.1007/s00521-010-0513-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/24/2022]
16
Olex AL, Hiltbold EM, Leng X, Fetrow JS. Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates. BMC Immunol 2010; 11:41. [PMID: 20682054 PMCID: PMC2928180 DOI: 10.1186/1471-2172-11-41] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/27/2009] [Accepted: 08/03/2010] [Indexed: 01/04/2023] Open
Abstract
Background Dendritic cells (DC) play a central role in primary immune responses and become potent stimulators of the adaptive immune response after undergoing the critical process of maturation. Understanding the dynamics of DC maturation would provide key insights into this important process. Time course microarray experiments can provide unique insights into DC maturation dynamics. Replicate experiments are necessary to address the issues of experimental and biological variability. Statistical methods and averaging are often used to identify significant signals. Here a novel strategy for filtering of replicate time course microarray data, which identifies consistent signals between the replicates, is presented and applied to a DC time course microarray experiment. Results The temporal dynamics of DC maturation were studied by stimulating DC with poly(I:C) and following gene expression at 5 time points from 1 to 24 hours. The novel filtering strategy uses standard statistical and fold change techniques, along with the consistency of replicate temporal profiles, to identify those differentially expressed genes that were consistent in two biological replicate experiments. To address the issue of cluster reproducibility a consensus clustering method, which identifies clusters of genes whose expression varies consistently between replicates, was also developed and applied. Analysis of the resulting clusters revealed many known and novel characteristics of DC maturation, such as the up-regulation of specific immune response pathways. Intriguingly, more genes were down-regulated than up-regulated. Results identify a more comprehensive program of down-regulation, including many genes involved in protein synthesis, metabolism, and housekeeping needed for maintenance of cellular integrity and metabolism. 
Conclusions The new filtering strategy emphasizes the importance of consistent and reproducible results when analyzing microarray data and utilizes consistency between replicate experiments as a criterion in both feature selection and clustering, without averaging or otherwise combining replicate data. Observation of a significant down-regulation program during DC maturation indicates that DC are preparing for cell death and provides a path to better understand the process. This new filtering strategy can be adapted for use in analyzing other large-scale time course data sets with replicates.
Affiliation(s)
- Amy L Olex
- Department of Computer Science, Wake Forest University, Winston-Salem, NC 27109, USA
17
Yi M, Mudunuri U, Che A, Stephens RM. Seeking unique and common biological themes in multiple gene lists or datasets: pathway pattern extraction pipeline for pathway-level comparative analysis. BMC Bioinformatics 2009; 10:200. [PMID: 19563622 PMCID: PMC2709625 DOI: 10.1186/1471-2105-10-200] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 11/14/2008] [Accepted: 06/29/2009] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND One of the challenges in the analysis of microarray data is to integrate and compare the selected (e.g., differential) gene lists from multiple experiments for common or unique underlying biological themes. A common way to approach this problem is to extract common genes from these gene lists and then subject these genes to enrichment analysis to reveal the underlying biology. However, the capacity of this approach is largely restricted by the limited number of common genes shared by datasets from multiple experiments, which could be caused by the complexity of the biological system itself. RESULTS We now introduce a new Pathway Pattern Extraction Pipeline (PPEP), which extends the existing WPS application by providing a new pathway-level comparative analysis scheme. To facilitate comparing and correlating results from different studies and sources, PPEP contains new interfaces that allow evaluation of the pathway-level enrichment patterns across multiple gene lists. As an exploratory tool, this analysis pipeline may help reveal the underlying biological themes at both the pathway and gene levels. The analysis scheme provided by PPEP begins with multiple gene lists, which may be derived from different studies in terms of the biological contexts, applied technologies, or methodologies. These lists are then subjected to pathway-level comparative analysis for extraction of pathway-level patterns. This analysis pipeline helps to explore the commonality or uniqueness of these lists at the level of pathways or biological processes from different but relevant biological systems using a combination of statistical enrichment measurements, pathway-level pattern extraction, and graphical display of the relationships of genes and their associated pathways as Gene-Term Association Networks (GTANs) within the WPS platform. As a proof of concept, we have used the new method to analyze many datasets from our collaborators as well as some public microarray datasets. 
CONCLUSION This tool provides a new pathway-level analysis scheme for integrative and comparative analysis of data derived from different but relevant systems. The tool is freely available as a Pathway Pattern Extraction Pipeline implemented in our existing software package WPS, which can be obtained at http://www.abcc.ncifcrf.gov/wps/wps_index.php.
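The abstract does not say which statistical enrichment measurement PPEP uses. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for pathway enrichment), the per-list enrichment p-value for one pathway could be computed as follows. The function name and the gene counts are hypothetical and are not taken from WPS or PPEP.

```python
from math import comb

def hypergeom_enrichment_p(universe, pathway, selected, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe: total number of annotated genes (assumed background)
    pathway:  number of those genes that belong to the pathway
    selected: size of the gene list being tested
    overlap:  number of list genes that fall in the pathway
    """
    upper = min(pathway, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: the same pathway scored against two gene lists.
p_list_a = hypergeom_enrichment_p(universe=10000, pathway=50, selected=200, overlap=8)
p_list_b = hypergeom_enrichment_p(universe=10000, pathway=50, selected=200, overlap=1)
print(p_list_a < p_list_b)
```

Comparing such per-list p-values pathway by pathway is one way to look for pathway-level patterns even when the lists share few individual genes.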
Affiliation(s)
- Ming Yi
- Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick Inc, NCI-Frederick, Frederick, MD 21702, USA.
|
18
|
Characterizing Gene Expressions Based on Their Temporal Observations. J Biomed Biotechnol 2009; 2009:357937. [PMID: 19390582 PMCID: PMC2668864 DOI: 10.1155/2009/357937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2008] [Revised: 02/04/2009] [Accepted: 03/05/2009] [Indexed: 11/17/2022] Open
Abstract
Temporal gene expression data are of particular interest to researchers because they contain rich information for characterizing gene function and have been widely used in biomedical studies. However, extracting information and efficiently identifying treatment effects without losing temporal information remain problematic. In this paper, we propose a method of classifying temporal gene expression curves in which each individual expression trajectory is modeled as longitudinal data with a changeable variance and covariance structure. The method, based mainly on a generalized mixed model, is illustrated using dense temporal gene expression data from bacteria. We aimed to evaluate gene effects and treatments. The power and time points of measurements are also characterized via the longitudinal mixed model. The results indicate that the proposed methodology is promising for the analysis of temporal gene expression data and that it could be generally applicable to other high-throughput temporal gene expression analyses.
|
19
|
Fernández EA, Girotti MR, López del Olmo JA, Llera AS, Podhajcer OL, Cantet RJC, Balzarini M. Improving 2D-DIGE protein expression analysis by two-stage linear mixed models: assessing experimental effects in a melanoma cell study. Bioinformatics 2008; 24:2706-12. [PMID: 18818217 DOI: 10.1093/bioinformatics/btn508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Difference in-gel electrophoresis (DIGE)-based protein expression analysis allows assessing the relative expression of proteins in two biological samples differently labeled (Cy5, Cy3 CyDyes). In the same gel, a reference sample is also used (Cy2 CyDye) for spot matching during image analysis and volume normalization. The standard statistical techniques to identify differentially expressed (DE) proteins are the calculation of fold-changes and the comparison of treatment means by the t-test. The analyses rarely account for other experimental effects, such as CyDye and gel effects, which could be important sources of noise while detecting treatment effects. RESULTS We propose to identify DIGE DE proteins using a two-stage linear mixed model. The proposal consists of splitting the overall model for the measured intensity into two interconnected models. First, we fit a normalization model that accounts for the general experimental effects, such as gel and CyDye effects, as well as for the features of the associated random term distributions. Second, we fit a model that uses the residuals from the first step to account for differences between treatments on a protein-by-protein basis. The modeling strategy was evaluated using data from a melanoma cell study. We found that a heteroskedastic model in the first stage, which also accounts for CyDye and gel effects, best normalized the data, while allowing for an efficient estimation of the treatment effects. The Cy2 reference channel was used as a covariate in the normalization model to avoid skewness of the residual distribution. Its inclusion improved the detection of DE proteins in the second stage.
Affiliation(s)
- Elmer A Fernández
- School of Engineering, Intelligent Data Analysis Group, Catholic University of Córdoba, Argentina.
|
20
|
Yi M, Stephens RM. SLEPR: a sample-level enrichment-based pathway ranking method -- seeking biological themes through pathway-level consistency. PLoS One 2008; 3:e3288. [PMID: 18818771 PMCID: PMC2546449 DOI: 10.1371/journal.pone.0003288] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 04/14/2008] [Accepted: 08/29/2008] [Indexed: 11/25/2022] Open
Abstract
Analysis of microarray and other high-throughput (HTP) data often involves identification of genes consistently up- or down-regulated across samples as the first step in extraction of biological meaning. This gene-level paradigm can be limited as a result of valid sample fluctuations and biological complexities. In this report, we describe a novel method, SLEPR, which eliminates this limitation by relying on pathway-level consistencies. Our method first selects the sample-level differentiated genes from each individual sample, capturing genes missed by other analysis methods, ascertains the enrichment levels of associated pathways from each of those lists, and then ranks annotated pathways based on the consistency of enrichment levels of individual samples from both sample classes. As a proof of concept, we have used this method to analyze three public microarray datasets with a direct comparison with the GSEA method, one of the most popular pathway-level analysis methods in the field. We found that our method not only reproduced the earlier observations with significant improvements in depth of coverage for validated or expected biological themes, but also produced additional insights that make biological sense. This new method extends existing analysis approaches and facilitates integration of different types of HTP data.
Affiliation(s)
- Ming Yi
- Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, Maryland, United States of America
- Robert M. Stephens
- Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, Maryland, United States of America
|
21
|
Gadgil M. A Population Proportion approach for ranking differentially expressed genes. BMC Bioinformatics 2008; 9:380. [PMID: 18801167 PMCID: PMC2566584 DOI: 10.1186/1471-2105-9-380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/12/2008] [Accepted: 09/18/2008] [Indexed: 11/14/2022] Open
Abstract
Background DNA microarrays are used to investigate differences in gene expression between two or more classes of samples. Most currently used approaches compare mean expression levels between classes and are not geared to find genes whose expression is significantly different in only a subset of samples in a class. However, biological variability can lead to situations where key genes are differentially expressed in only a subset of samples. To facilitate the identification of such genes, a new method is reported. Methods The key difference between the Population Proportion Ranking Method (PPRM) presented here and almost all other methods currently used is in the quantification of variability. PPRM quantifies variability in terms of inter-sample ratios and can be used to calculate the relative merit of differentially expressed genes with a specified difference in expression level between at least some samples in the two classes, which at the same time have lower than a specified variability within each class. Results PPRM is tested on simulated data and on three publicly available cancer data sets. It is compared to the t test, PPST, COPA, OS, ORT and MOST using the simulated data. Under the conditions tested, it performs as well as or better than the other methods under low intra-class variability and better than the t test, PPST, COPA and OS when a gene is differentially expressed in only a subset of samples. It performs better than ORT and MOST in recognizing non-differentially expressed genes with high variability in expression levels across all samples. For biological data, the success of the identified predictor genes in correctly classifying an independent sample is reported.
Affiliation(s)
- Mugdha Gadgil
- Chemical Engineering and Process Development, National Chemical Laboratory, Pune, India.
|
22
|
Dutta B, Snyder R, Klapa MI. Significance analysis of time-series transcriptomic data: a methodology that enables the identification and further exploration of the differentially expressed genes at each time-point. Biotechnol Bioeng 2007; 98:668-78. [PMID: 17385748 DOI: 10.1002/bit.21432] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Time-series transcriptional profiling experiments are becoming increasingly popular, in light of the abundance of information regarding a biological system's regulation that they are expected to reveal. However, identification of differentially expressed genes as a function of time and comparison between physiological states based on the genes' variability in significance level over time remain intriguing tasks, due to certain limitations in the currently available algorithms. Based on the principles of the significance analysis of microarrays (SAM) method, we developed an algorithm that allows for the identification of the differentially expressed genes at each time-point of a time sequence, using a common reference distribution and significance threshold for all time-points. These results are further explored in a systematic way to extract information about (a) individual gene and gene class variability in significance level with time, (b) gene and time-point correlation based on (a), and (c) gene class comparison based on (a). All algorithms have been programmed in the C language in the form of four executable files for both Windows and Macintosh platforms under the overall name MiTimeS. MiTimeS was validated in the context of real transcriptomic data. It enables the extraction of biologically relevant information from dynamic transcriptomic profiles that currently goes unnoticed by the available algorithms. The applicability of MiTimeS is not limited to transcriptomic data; it could also be used for the analysis of dynamic data from other cellular fingerprints.
Affiliation(s)
- Bhaskar Dutta
- Metabolic Engineering and Systems Biology Laboratory, Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland, USA
|
23
|
Mathur S, Dolo S. A new efficient statistical test for detecting variability in the gene expression data. Stat Methods Med Res 2007; 17:405-19. [PMID: 17698928 DOI: 10.1177/0962280206078643] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
DNA microarray technology allows researchers to monitor the expressions of thousands of genes under different conditions. The detection of differential gene expression under two different conditions is very important in microarray studies. Microarray experiments are multi-step procedures and each step is a potential source of variance. This makes the measurement of variability difficult because an approach based on gene-by-gene estimation of variance will have few degrees of freedom. It is highly possible that the assumption of equal variance for all the expression levels may not hold. Also, the assumption of normality of gene expressions may not hold. Thus it is essential to have a statistical procedure that is not based on the normality assumption and that can also detect genes with differential variance efficiently. The detection of differential gene expression variance will allow us to identify experimental variables that affect different biological processes and the accuracy of DNA microarray measurements. In this article, a new nonparametric test for scale is developed based on the arctangent of the ratio of two expression levels. Most of the tests available in the literature require the assumption of a normal distribution, which makes them inapplicable in many situations, and it is also hard to verify the suitability of the normal distribution assumption for a given data set. The proposed test does not require any assumption about the distribution of the underlying population, which makes it more practical and widely applicable. The asymptotic relative efficiency is calculated under different distributions, which shows that the proposed test is very powerful when the assumption of normality breaks down. Monte Carlo simulation studies are performed to compare the power of the proposed test with some of the existing procedures. It is found that the proposed test is more powerful than commonly used tests under almost all the distributions considered in the study. A microarray data set is used to illustrate the working of the proposed test. Results indicate that the proposed test is more powerful than some of its competitors in detecting the smallest change in differential expression variance, with a higher degree of confidence.
Affiliation(s)
- Sunil Mathur
- Department of Mathematics, University of Mississippi, MS, USA
|
24
|
Pang S, Havukkala I, Hu Y, Kasabov N. Classification consistency analysis for bootstrapping gene selection. Neural Comput Appl 2007. [DOI: 10.1007/s00521-007-0110-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 12/01/2022]
|
25
|
Chung H, Kim HJ, Jang KS, Kim M, Yang J, Kim JH, Lee YS, Kong G. Comprehensive analysis of differential gene expression profiles on diclofenac-induced acute mouse liver injury and recovery. Toxicol Lett 2006; 166:77-87. [PMID: 16859844 DOI: 10.1016/j.toxlet.2006.05.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/20/2006] [Revised: 05/26/2006] [Accepted: 05/29/2006] [Indexed: 10/24/2022]
Abstract
Microarray analysis of RNA from diclofenac-administered mouse livers was performed to establish a global gene expression profile during injury and recovery stages at two different doses. A single dose of diclofenac at 9.5 mg/kg or 0.95 mg/kg body weight was given orally, and the liver samples were obtained after 6, 24, and 72 h. Histopathologic studies enabled the classification of the diclofenac effect into injury (6, 24 h) and recovery (72 h) stages. By using the Applied Biosystems Mouse Genome Survey Microarray, a total of 7370 out of 33,315 (22.1%) genes were found to be statistically reliable at p<0.05 by two-way ANOVA, and 602 (1.8%) probes at false discovery rate <5% by Significance Analysis of Microarray. Among the statistically reliable clones by both analytical methods, 49 genes were differentially expressed with more than a 1.625-fold difference (which equals 0.7 in log(2) scale) at one or more treatment conditions. Forty genes and two genes were identified as injury- and recovery-specific genes, respectively, showing that most of the transcriptomic changes were seen during the injury stage. Furthermore, multiple genes involved in oxidative stress, eicosanoid synthesis, apoptosis, and ATP synthesis showed variable transcript levels upon acute diclofenac administration.
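The equivalence quoted in the abstract between a 1.625-fold difference and 0.7 on the log2 scale can be checked directly. The snippet below is a small arithmetic sketch; the helper name `passes_fold_cutoff` and the use of the absolute value (so that up- and down-regulation are treated alike) are assumptions for illustration, not part of the study.

```python
import math

fold_change = 1.625
log2_fc = math.log2(fold_change)
print(round(log2_fc, 2))  # 0.7

def passes_fold_cutoff(log2_ratio, cutoff=0.7):
    """Hypothetical filter: True if the change exceeds the cutoff in either direction."""
    return abs(log2_ratio) > cutoff

print(passes_fold_cutoff(1.2), passes_fold_cutoff(-0.3))  # True False
```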
Affiliation(s)
- Heekyoung Chung
- Department of Pathology, College of Medicine, Hanyang University, Seongdong-gu, Seoul 133-791, Republic of Korea
|
26
|
Koutna I, Klabusay M, Kohutova V, Krontorad P, Svoboda Z, Kozubek M, Mayer J. Evaluation of CD34+ - and Lin- -selected cells from peripheral blood stem cell grafts of patients with lymphoma during differentiation in culture ex vivo using a cDNA microarray technique. Exp Hematol 2006; 34:832-40. [PMID: 16797410 DOI: 10.1016/j.exphem.2006.04.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/22/2005] [Revised: 03/06/2006] [Accepted: 04/04/2006] [Indexed: 11/15/2022]
Abstract
OBJECTIVE Hematopoietic stem cells (enriched in the fraction of CD34+ cells) have the ability to regenerate hematopoiesis in all of its lineages, and this potential is clinically used in transplanting bone marrow or peripheral blood stem cells. Our objective was to assemble a suitable method for evaluating gene expression in enriched populations of hematopoietic stem cells. We compared biologic properties of cells cultured ex vivo obtained using two different ways of immunomagnetic separation (positive selection of CD34+ cells and negative selection of Lin- cells) by means of a cDNA microarray technique. METHODS CD34+ and Lin- cells were enriched from peripheral blood stem cell (PBSC) grafts of patients with non-Hodgkin's lymphoma. Isolated cells were cultured in the presence of cytokine PBSCs, Flt-3 ligand, interleukin-3, interleukin-6, and granulocyte colony-stimulating factor. At days 0, 4, 6, 8, 10, 12, and 14, cells were harvested and analyzed by cDNA microarrays. Total cell expansion, CD34+, colony-forming unit for granulocyte-macrophage and megakaryocytes expansion, vitality, and phenotype of cells were also analyzed. RESULTS cDNA microarray analysis of cultured hematopoietic cells proved equivalence of the two enrichment methods for PBSC samples and helped us characterize differentiating cells cultured ex vivo. CONCLUSION Our methodologic approach is helpful in characterizing hematopoietic cells cultured ex vivo, but it is also suitable for more general purposes. Equivalence of CD34+ and Lin- selection methods from PBSC samples proved by cDNA microarray may have an implication for graft manipulation in an experimental setting of hematopoietic transplantation. Total cell expansion, colony formation, and phenotype from CD34+ selected and from Lin- samples were comparable.
Affiliation(s)
- Irena Koutna
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
|
27
|
Abstract
The study of gene expression profiling of cells and tissue has become a major tool for discovery in medicine. Microarray experiments allow description of genome-wide expression changes in health and disease. The results of such experiments are expected to change the methods employed in the diagnosis and prognosis of disease in obstetrics and gynecology. Moreover, an unbiased and systematic study of gene expression profiling should allow the establishment of a new taxonomy of disease for obstetric and gynecologic syndromes. Thus, a new era is emerging in which reproductive processes and disorders could be characterized using molecular tools and fingerprinting. The design, analysis, and interpretation of microarray experiments require specialized knowledge that is not part of the standard curriculum of our discipline. This article describes the types of studies that can be conducted with microarray experiments (class comparison, class prediction, class discovery). We discuss key issues pertaining to experimental design, data preprocessing, and gene selection methods. Common types of data representation are illustrated. Potential pitfalls in the interpretation of microarray experiments, as well as the strengths and limitations of this technology, are highlighted. This article is intended to assist clinicians in appraising the quality of the scientific evidence now reported in the obstetric and gynecologic literature.
Affiliation(s)
- Adi L. Tarca
- Perinatology Research Branch, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
- Department of Computer Science, Wayne State University
- Roberto Romero
- Perinatology Research Branch, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
- Center for Molecular Medicine and Genetics, Wayne State University
- Sorin Draghici
- Department of Computer Science, Wayne State University
- Karmanos Cancer Institute, Detroit, MI
|
28
|
Carles A, Millon R, Cromer A, Ganguli G, Lemaire F, Young J, Wasylyk C, Muller D, Schultz I, Rabouel Y, Dembélé D, Zhao C, Marchal P, Ducray C, Bracco L, Abecassis J, Poch O, Wasylyk B. Head and neck squamous cell carcinoma transcriptome analysis by comprehensive validated differential display. Oncogene 2006; 25:1821-31. [PMID: 16261155 DOI: 10.1038/sj.onc.1209203] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/09/2022]
Abstract
Head and neck squamous cell carcinoma (HNSCC) is common worldwide and is associated with a poor rate of survival. Identification of new markers and therapeutic targets, and understanding the complex transformation process, will require a comprehensive description of genome expression, which can only be achieved by combining different methodologies. We report here the HNSCC transcriptome that was determined by exhaustive differential display (DD) analysis coupled with validation by different methods on the same patient samples. The resulting 820 nonredundant sequences were analysed by high-throughput bioinformatics analysis. Human proteins were identified for 73% (596) of the DD sequences. A large proportion (>50%) of the remaining unassigned sequences match ESTs (expressed sequence tags) from human tumours. For the functionally annotated proteins, there is significant enrichment for relevant biological processes, including cell motility, protein biosynthesis, stress and immune responses, cell death, cell cycle, cell proliferation and/or maintenance and transport. Three of the novel proteins (TMEM16A, PHLDB2 and ARHGAP21) were analysed further to show that they have the potential to be developed as therapeutic targets.
Affiliation(s)
- A Carles
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, 67404 Illkirch Cedex, France
|
29
|
Pan KH, Lih CJ, Cohen SN. Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays. Proc Natl Acad Sci U S A 2005; 102:8961-5. [PMID: 15951424 PMCID: PMC1149502 DOI: 10.1073/pnas.0502674102] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Abstract
Global analysis of gene expression by using DNA microarrays is employed increasingly to search for differences in biological properties between normal and diseased tissue. In such studies, expression that deviates from defined thresholds commonly is used for creating genetic signatures that characterize disease vs. normality. Although it is axiomatic that the threshold parameters applied to microarray analysis will alter the contents of such genetic signatures, the extent to which threshold choice can affect the fundamental conclusions made from microarray-based studies has not been elucidated. We used GABRIEL (Genetic Analysis By Rules Incorporating Expert Logic), a platform of knowledge-based algorithms for the global analysis of gene expression, together with conventional statistical approaches, to examine the sensitivity of conclusions to threshold choice in recently published microarray-based studies. An analysis of the effects of threshold decisions in one of these studies [Ramaswamy, S., Ross, K. N., Lander, E. S. & Golub, T. R. (2003) Nat. Genet. 33, 49-54], which arrived at the important conclusion that the metastatic potential of primary tumors is encoded by the bulk of cells in the tumor, is the focus of this article. We discovered that support for this conclusion highly depends on the threshold used to create gene expression signatures. We also found that threshold choice dramatically affected the gene function categories represented nonrandomly in signatures. Our results suggest that the robustness of biological conclusions made by using microarray analysis should be routinely assessed by examining the validity of the conclusions by using a range of threshold parameters.
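The recommendation to re-examine conclusions over a range of thresholds can be illustrated with a toy sweep. This is a minimal sketch, not GABRIEL: the gene names, the log ratios, and the rule "absolute log ratio at or above the threshold" are all hypothetical choices made for illustration.

```python
# Hypothetical log2 expression ratios (diseased vs. normal) for five genes.
log_ratios = {"g1": 2.1, "g2": -1.4, "g3": 0.6, "g4": 1.0, "g5": -0.3}

def signature(ratios, threshold):
    """Genes whose absolute log ratio meets the threshold (assumed rule)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

# Sweep the threshold and watch the signature change.
for t in (0.5, 1.0, 2.0):
    print(t, signature(log_ratios, t))
# 0.5 ['g1', 'g2', 'g3', 'g4']
# 1.0 ['g1', 'g2', 'g4']
# 2.0 ['g1']
```

Any downstream conclusion (for example, which functional categories are over-represented) can then be recomputed for each signature to see whether it survives the change of threshold.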
Affiliation(s)
- Kuang-Hung Pan
- Department of Genetics and Program in Biomedical Informatics, Stanford University School of Medicine, Stanford University, Stanford, CA 94305-5120, USA
|
30
|
Tarca AL, Cooke JEK, Mackay J. A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data. Bioinformatics 2005; 21:2674-83. [PMID: 15797913 DOI: 10.1093/bioinformatics/bti397] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Microarray experiments are affected by numerous sources of non-biological variation that contribute systematic bias to the resulting data. In a dual-label (two-color) cDNA or long-oligonucleotide microarray, these systematic biases are often manifested as an imbalance of measured fluorescent intensities corresponding to Sample A versus those corresponding to Sample B. Systematic biases also affect between-slide comparisons. Making effective corrections for these systematic biases is a requisite for detecting the underlying biological variation between samples. Effective data normalization is therefore an essential step in the confident identification of biologically relevant differences in gene expression profiles. Several normalization methods for the correction of systemic bias have been described. While many of these methods have addressed intensity-dependent bias, few have addressed both intensity-dependent and spatiality-dependent bias. RESULTS We present a neural network-based normalization method for correcting the intensity- and spatiality-dependent bias in cDNA microarray datasets. In this normalization method, the dependence of the log-intensity ratio (M) on the average log-intensity (A) as well as on the spatial coordinates (X,Y) of spots is approximated with a feed-forward neural network function. Resistance to outliers is provided by assigning a weight to each spot based on how distant its M value is from the median over the spots whose A values are similar, as well as by using pseudospatial coordinates instead of spot row and column indices. A comparison of the robust neural network method with other published methods demonstrates its potential in reducing both intensity-dependent bias and spatial-dependent bias, which translates to more reliable identification of truly regulated genes.
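For readers unfamiliar with the M and A quantities named in the abstract, the sketch below computes them for a single spot of a two-color array. It assumes the conventional definitions, M = log2(R/G) and A = (1/2) log2(R·G), with base-2 logarithms; the function name `ma_values` and the intensities are hypothetical, and the neural network fit itself is not shown.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its two channel intensities.

    M is the log-intensity ratio and A is the average log-intensity,
    both in base 2 (an assumed convention).
    """
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a

print(ma_values(8.0, 2.0))  # (2.0, 2.0)
print(ma_values(4.0, 4.0))  # (0.0, 2.0)
```

A normalization method of the kind described would then model M as a function of A and the spot coordinates and subtract the fitted value from each spot's M.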
MESH Headings
- Algorithms
- Artifacts
- Gene Expression Profiling/methods
- Gene Expression Profiling/standards
- Image Interpretation, Computer-Assisted/methods
- Image Interpretation, Computer-Assisted/standards
- In Situ Hybridization, Fluorescence/methods
- In Situ Hybridization, Fluorescence/standards
- Microscopy, Fluorescence/methods
- Microscopy, Fluorescence/standards
- Models, Genetic
- Models, Statistical
- Neural Networks, Computer
- Oligonucleotide Array Sequence Analysis/methods
- Oligonucleotide Array Sequence Analysis/standards
- Pattern Recognition, Automated/methods
Affiliation(s)
- A L Tarca
- Research Center in Forest Biology, Department of Wood and Forest Science, Laval University, Sainte-Foy (QC), Canada G1K-7P4.
|
31
|
Shelest E, Wingender E. Construction of predictive promoter models on the example of antibacterial response of human epithelial cells. Theor Biol Med Model 2005; 2:2. [PMID: 15647113 PMCID: PMC546226 DOI: 10.1186/1742-4682-2-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/16/2004] [Accepted: 01/12/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Binding of a bacterium to a eukaryotic cell triggers a complex network of interactions in and between both cells. P. aeruginosa is a pathogen that causes acute and chronic lung infections by interacting with the pulmonary epithelial cells. We use this example for examining the ways of triggering the response of the eukaryotic cell(s), leading us to a better understanding of the details of the inflammatory process in general. RESULTS Considering a set of genes co-expressed during the antibacterial response of human lung epithelial cells, we constructed a promoter model for the search of additional target genes potentially involved in the same cell response. The model construction is based on the consideration of pair-wise combinations of transcription factor binding sites (TFBS). It has been shown that the antibacterial response of human epithelial cells is triggered by at least two distinct pathways. We therefore supposed that there are two subsets of promoters activated by each of them. Optimally, they should be "complementary" in the sense of appearing in complementary subsets of the (+)-training set. We developed the concept of complementary pairs, i.e., two mutually exclusive pairs of TFBS, each of which should be found in one of the two complementary subsets. CONCLUSIONS We suggest a simple but exhaustive method for searching for TFBS pairs which characterize the whole (+)-training set, as well as for complementary pairs. Applying this method, we came up with a promoter model of antibacterial response genes that consists of one TFBS pair which should be found in the whole training set and four complementary pairs. We applied this model to screening of 13,000 upstream regions of human genes and identified 430 new target genes which are potentially involved in antibacterial defense mechanisms.
Affiliation(s)
- Ekaterina Shelest
- Dept. of Bioinformatics, UKG, University of Göttingen, Goldschmidtstr. 1, D-37077 Göttingen, Germany
- Edgar Wingender
- Dept. of Bioinformatics, UKG, University of Göttingen, Goldschmidtstr. 1, D-37077 Göttingen, Germany
- BIOBASE GmbH, Halchtersche Str. 33, D-38304 Wolfenbüttel, Germany
|
32
|
N/A. N/A. Shijie Huaren Xiaohua Zazhi 2004; 12:2742-2744. [DOI: 10.11569/wcjd.v12.i11.2742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023] Open
|
33
|
Delmar P, Robin S, Daudin JJ. VarMixt: efficient variance modelling for the differential analysis of replicated gene expression data. Bioinformatics 2004; 21:502-8. [PMID: 15374871 DOI: 10.1093/bioinformatics/bti023] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Identifying differentially regulated genes in experiments comparing two experimental conditions is often a key step in the microarray data analysis process. Many different approaches and methodological developments have been put forward, yet the question remains open. RESULTS VarMixt is a powerful and efficient novel methodology for this task. It is based on a flexible and realistic variance modelling strategy. It compares favourably with other popular techniques (standard t-test, SAM and Cyber-T). The relevance of the approach is demonstrated with real-world and simulated datasets. The analysis strategy was successfully applied to both a 'two-colour' cDNA microarray and an Affymetrix Genechip. Strong control of false positive and false negative rates is proven in large simulation studies. AVAILABILITY The R package is freely available at http://www.inapg.inra.fr/ens_rech/mathinfo/recherche/mathematique/outil.html. CONTACT delmar@inapg.inra.fr. SUPPLEMENTARY INFORMATION http://www.inapg.inra.fr/ens_rech/mathinfo/recherche/mathematique/outil.html.
Affiliation(s)
- Paul Delmar
- Laboratoire MAS Ecole Centrale Paris, Grande Voie des vignes, 92295 Chatenay Malabry, France.
|
34
|
Lyons-Weiler J, Patel S, Becich MJ, Godfrey TE. Tests for finding complex patterns of differential expression in cancers: towards individualized medicine. BMC Bioinformatics 2004; 5:110. [PMID: 15307894 PMCID: PMC514539 DOI: 10.1186/1471-2105-5-110] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 02/11/2004] [Accepted: 08/12/2004] [Indexed: 11/10/2022]
Abstract
Background Microarray studies in cancer compare expression levels between two or more sample groups on thousands of genes. Data analysis follows a population-level approach (e.g., comparison of sample means) to identify differentially expressed genes. This leads to the discovery of 'population-level' markers, i.e., genes with the expression patterns A > B and B > A. We introduce the PPST test that identifies genes where a significantly large subset of cases exhibit expression values beyond upper and lower thresholds observed in the control samples. Results Interestingly, the test identifies A > B and B < A pattern genes that are missed by population-level approaches, such as the t-test, and many genes that exhibit both significant overexpression and significant underexpression in statistically significantly large subsets of cancer patients (ABA pattern genes). These patterns tend to show distributions that are unique to individual genes, and are aptly visualized in a 'gene expression pattern grid'. The low degree of among-gene correlations in these genes suggests unique underlying genomic pathologies and a high degree of unique tumor-specific differential expression. We compare the PPST and the ABA test to the parametric and non-parametric t-test by analyzing two independently published data sets from studies of progression in astrocytoma. Conclusions The PPST test yielded findings similar to the nonparametric t-test with higher self-consistency. These tests and the gene expression pattern grid may be useful for the identification of therapeutic targets and diagnostic or prognostic markers that are present only in subsets of cancer patients, and provide a more complete portrait of differential expression in cancer.
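The idea of flagging a gene when an unusually large subset of cases falls outside thresholds taken from the controls can be sketched as follows. This is an illustrative permutation test under stated assumptions, not the published PPST procedure: the function name is hypothetical, the thresholds are assumed to be the control minimum and maximum, and significance is assessed by shuffling group labels.

```python
import random


def subset_outlier_test(cases, controls, n_perm=2000, seed=0):
    """Count cases outside the control range and return a permutation p-value."""

    def count_outside(case_vals, control_vals):
        # Assumption: thresholds are the min and max of the control values.
        lo, hi = min(control_vals), max(control_vals)
        return sum(v < lo or v > hi for v in case_vals)

    observed = count_outside(cases, controls)
    pooled = list(cases) + list(controls)
    rng = random.Random(seed)
    n_cases = len(cases)
    at_least = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the statistic.
        rng.shuffle(pooled)
        if count_outside(pooled[:n_cases], pooled[n_cases:]) >= observed:
            at_least += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical expression values for one gene.
controls = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0]
cases = [5.0, 5.1, 9.0, 8.7, 1.2, 4.9, 1.0, 5.0]
n_outside, p_value = subset_outlier_test(cases, controls)
print(n_outside, p_value)
```

In the toy data half of the cases sit inside the control range while the rest are split between over- and underexpression, so the group means are close and a comparison of means would have little power; a count-based statistic still registers the four outlying cases.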
Affiliation(s)
- James Lyons-Weiler
- Department of Pathology, Center for Biomedical Informatics, and Interdisciplinary Biomedical Graduate Program, University of Pittsburgh, PA 15232 USA
- Clinical Genomics Facility, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232 USA
- Satish Patel
- Department of Pathology, Center for Biomedical Informatics, and Interdisciplinary Biomedical Graduate Program, University of Pittsburgh, PA 15232 USA
- Clinical Genomics Facility, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232 USA
- Michael J Becich
- Department of Pathology, Center for Biomedical Informatics, and Interdisciplinary Biomedical Graduate Program, University of Pittsburgh, PA 15232 USA
- Clinical Genomics Facility, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232 USA
- Tony E Godfrey
- Departments of Surgery and Human Genetics, University of Pittsburgh Medical School, Pittsburgh, PA 15232 USA
- Mount Sinai School of Medicine, One Gustave Levy Place, Box 1668, East Building, Room 1070C, New York, NY 10029 USA
|
35
|
Abstract
The "informatics revolution" in both bioinformatics and dental informatics will eventually change the way we practice dentistry. This convergence will play a pivotal role in creating a bridge of opportunity by integrating scientific and clinical specialties to promote the advances in treatment, risk assessment, diagnosis, therapeutics, and oral health-care outcome. Bioinformatics has been an emerging field in the biomedical research community and has been gaining momentum in dental medicine. This area has created a steady stream of large and complex genomic data, which has transformed the way a clinical or basic science researcher approaches genomic research. This application to dental medicine, termed "oral genomics", can aid in the molecular understanding of the genes and proteins, their interactions, pathways, and networks that are responsible for the development and progression of oral diseases and disorders. As the result of the Human Genome Project, new advances have prompted high-throughput technologies, such as DNA microarrays, which have become accepted tools in the biomedical research community. This manuscript reviews the two most commonly used microarray technologies, basic microarray data analysis, and the results from several ongoing oral cancer genomic studies.
Affiliation(s)
- W P Kuo
- Harvard School of Dental Medicine, Department of Oral Medicine, Infection, and Immunity, 188 Longwood Avenue, Boston, MA 02115, USA.
|
36
|
A primer on gene expression and microarrays for machine learning researchers. J Biomed Inform 2004; 37:293-303. [DOI: 10.1016/j.jbi.2004.07.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Received: 06/20/2004] [Indexed: 01/09/2023]
|
37
|
Meyers BC, Galbraith DW, Nelson T, Agrawal V. Methods for transcriptional profiling in plants. Be fruitful and replicate. Plant Physiology 2004; 135:637-52. [PMID: 15173570 PMCID: PMC514100 DOI: 10.1104/pp.104.040840] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 02/11/2004] [Revised: 03/19/2004] [Accepted: 03/19/2004] [Indexed: 05/18/2023]
Affiliation(s)
- Blake C Meyers
- Department of Plant and Soil Sciences and Delaware Biotechnology Institute, University of Delaware, Newark, Delaware 19711, USA.
|
38
|
Dombkowski AA, Thibodeau BJ, Starcevic SL, Novak RF. Gene-specific dye bias in microarray reference designs. FEBS Lett 2004; 560:120-4. [PMID: 14988009 DOI: 10.1016/s0014-5793(04)00083-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Received: 10/30/2003] [Revised: 12/19/2003] [Accepted: 01/07/2004] [Indexed: 11/23/2022]
Abstract
The most widely used microarray experiment design includes the use of a reference standard. Comparisons of gene expression between samples are facilitated because each sample is directly measured against the reference standard, using two fluorescent dyes. Numerous reports indicate that some genes incorporate the two commonly used dyes with different efficiencies, contributing to inaccurate data. However, it is widely assumed that these effects will not corrupt results if the reference standard is labeled with the same dye on each microarray. We demonstrate that this assumption is not reliable and that dye orientation can significantly influence measured changes in gene expression.
Affiliation(s)
- Alan A Dombkowski
- Institute of Environmental Health Sciences, Wayne State University, 2727 Second Ave, Detroit, MI 48201, USA.
|
39
|
Sauvageot C, Dahia PL, Lipan O, Park JK, Chang MS, Alberta JA, Stiles CD. Distinct temporal genetic signatures of neurogenic and gliogenic cues in cortical stem cell cultures. ACTA ACUST UNITED AC 2004; 62:121-33. [PMID: 15389679 DOI: 10.1002/neu.20072] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Cortical progenitor cells from rat embryos give rise to neurons or glia following exposure to platelet derived growth factor (PDGF) or ciliary neurotrophic factor (CNTF), respectively. Both growth factors impart their developmental cues quickly through a transcription-dependent mechanism. Do the alternate developmental responses to PDGF and CNTF reflect induction of qualitatively distinct genes? Alternatively, do the same genes respond to each growth factor, but with quantitatively distinct kinetics? Using differential library screening and custom cDNA microarrays we show that a common set of genes responds to either growth factor. However, quantitative differences in the onset and duration of gene induction equate to the expression of factor-specific gene signatures. Multitissue cluster analysis also reveals tissue-specific gene signatures that may play important roles in the developing brain.
Affiliation(s)
- Claire Sauvageot
- Department of Cancer Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
|