1
Li R, Fristensky B, Wang G. Sequence data analysis and preprocessing for oligo probe design in microbial genomes. AIMS Bioengineering 2017. [DOI: 10.3934/bioeng.2017.1.28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
2
A hypervariable genomic island identified in clinical and environmental Mycobacterium avium subsp. hominissuis isolates from Germany. Int J Med Microbiol 2016; 306:495-503. [PMID: 27481640 DOI: 10.1016/j.ijmm.2016.07.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/15/2016] [Revised: 06/06/2016] [Accepted: 07/17/2016] [Indexed: 11/21/2022]
Abstract
Mycobacterium avium subsp. hominissuis (MAH) is an opportunistic human pathogen widespread in the environment. Genomic islands (GIs) represent a part of the accessory genome of bacteria; they influence virulence, drug resistance or fitness and trigger bacterial evolution. We previously identified a novel GI in four MAH genomes. Here, we further explored this GI in a larger collection of MAH isolates from Germany (n=41), including 20 clinical and 21 environmental isolates. Based on comparative whole genome analysis, we detected this GI in 39/41 (95.1%) isolates. Although all these GIs integrated in the same insertion hotspot, there is high variability in the genetic structure of this GI: eight different types of GI have been identified, designated A-H (sized 6.2-73.3 kb). These GIs were arranged as a single GI (23/41, 56.1%), a combination of two different GIs (14/41, 34.1%) or a combination of three different GIs (2/41, 4.9%) in the insertion hotspot. Moreover, two GI types shared more than 80% sequence identity with sequences of M. canettii, responsible for tuberculosis. A total of 253 different genes were identified in all GIs, among them the previously documented virulence-related genes mmpL10 and mce. The diversity of the GI and the sequence similarity with other mycobacteria suggest cross-species transfer, also involving highly pathogenic species. Shuffling of potential virulence genes such as mmpL10 via this GI may create new pathogens that can cause future outbreaks.
3
Tulpan D, Ghiggi A, Montemanni R. Computational Sequence Design Techniques for DNA Microarray Technologies. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
In systems biology and biomedical research, microarray technology is a method of choice that enables the complete quantitative and qualitative ascertainment of gene expression patterns for whole genomes. The selection of high quality oligonucleotide sequences that behave consistently across multiple experiments is a key step in the design, fabrication and experimental performance of DNA microarrays. The aim of this chapter is to outline recent algorithmic developments in microarray probe design, evaluate existing probe sequences used in commercial arrays, and suggest methodologies that have the potential to improve on existing design techniques.
Affiliation(s)
- Dan Tulpan
- National Research Council of Canada, Canada
- Roberto Montemanni
- Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Switzerland
4
Baciu C, Thompson KJ, Mougeot JL, Brooks BR, Weller JW. The LO-BaFL method and ALS microarray expression analysis. BMC Bioinformatics 2012; 13:244. [PMID: 23006766 PMCID: PMC3526454 DOI: 10.1186/1471-2105-13-244] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/06/2012] [Accepted: 09/05/2012] [Indexed: 12/12/2022]
Abstract
BACKGROUND Sporadic Amyotrophic Lateral Sclerosis (sALS) is a devastating, complex disease of unknown etiology. We studied this disease with microarray technology to capture as much biological complexity as possible. The Affymetrix-focused BaFL pipeline takes into account problems with probes that arise from physical and biological properties, so we adapted it to handle the long-oligonucleotide probes on our arrays (hence LO-BaFL). The revised method was tested against a validated array experiment and then used in a meta-analysis of peripheral white blood cells from healthy control samples in two experiments. We predicted differentially expressed (DE) genes in our sALS data, combining the results obtained using the TM4 suite of tools with those from the LO-BaFL method. Those predictions were tested using qRT-PCR assays. RESULTS LO-BaFL filtering and DE testing accurately predicted previously validated DE genes in a published experiment on coronary artery disease (CAD). Filtering healthy control data from the sALS and CAD studies with LO-BaFL resulted in highly correlated expression levels across many genes. After bioinformatics analysis, twelve genes from the sALS DE gene list were selected for independent testing using qRT-PCR assays. High-quality RNA from six healthy Control and six sALS samples yielded the predicted differential expression for 7 genes: TARDBP, SKIV2L2, C12orf35, DYNLT1, ACTG1, B2M, and ILKAP. Four of the seven have been previously described in sALS studies, while ACTG1, B2M and ILKAP appear in the context of this disease for the first time. Supplementary material can be accessed at: http://webpages.uncc.edu/~cbaciu/LO-BaFL/supplementary_data.html. CONCLUSION LO-BaFL predicts DE results that are broadly similar to those of other methods. The small healthy control cohort in the sALS study is a reasonable foundation for predicting DE genes. 
Modifying the BaFL pipeline allowed us to remove noise and systematic errors, improving the power of this study, which had a small sample size. Each bioinformatics approach revealed DE genes not predicted by the other; subsequent PCR assays confirmed seven of twelve candidates, a relatively high success rate.
Affiliation(s)
- Cristina Baciu
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Kevin J Thompson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jean-Luc Mougeot
- ALS Biomarker Laboratory, Carolinas Neuromuscular/ALS-MDA Center, Department of Neurology, Carolinas Medical Center, Charlotte, NC, 28207, USA
- University of North Carolina School of Medicine, Charlotte Campus, Charlotte, NC, 28203, USA
- Benjamin R Brooks
- ALS Biomarker Laboratory, Carolinas Neuromuscular/ALS-MDA Center, Department of Neurology, Carolinas Medical Center, Charlotte, NC, 28207, USA
- University of North Carolina School of Medicine, Charlotte Campus, Charlotte, NC, 28203, USA
- Jennifer W Weller
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
5
Tulpan D, Ghiggi A, Montemanni R. Computational Sequence Design Techniques for DNA Microarray Technologies. Systemic Approaches in Bioinformatics and Computational Systems Biology 2011. [DOI: 10.4018/978-1-61350-435-2.ch003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/29/2023]
Abstract
In systems biology and biomedical research, microarray technology is a method of choice that enables the complete quantitative and qualitative ascertainment of gene expression patterns for whole genomes. The selection of high quality oligonucleotide sequences that behave consistently across multiple experiments is a key step in the design, fabrication and experimental performance of DNA microarrays. The aim of this chapter is to outline recent algorithmic developments in microarray probe design, evaluate existing probe sequences used in commercial arrays, and suggest methodologies that have the potential to improve on existing design techniques.
Affiliation(s)
- Dan Tulpan
- National Research Council of Canada, Canada
- Roberto Montemanni
- Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA), Switzerland
6
Auslander M, Neumann PM, Tom M. The effect of tert-butyl hydroperoxide on hepatic transcriptome expression patterns in the striped sea bream (Lithognathus mormyrus; Teleostei). Free Radic Res 2010; 44:991-1003. [PMID: 20553222 DOI: 10.3109/10715762.2010.492831] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
The study examined the effects of tert-butyl hydroperoxide (tBHP) on hepatic transcriptome expression patterns of the teleost fish Lithognathus mormyrus. tBHP is an organic hydroperoxide, widely used as a model pro-oxidant. It generates the reactive oxygen species (ROS) tert-butoxyl and tert-butylperoxyl. Complementary DNAs of tBHP-treated vs control fish were applied onto a previously produced cDNA microarray of approximately 1500 unique sequences. The effects of the tBHP application were demonstrated by leukocyte infiltration into the liver and by differential expression of various genes, some already known to be involved in ROS-related responses. Indicator genes of putative ROS effects were: aldehyde dehydrogenase 3A2, heme oxygenase and the hemopexin-like protein. Putative indicators of transendothelial leukocyte migration and function were: p22phox, Rac1 and CD63-like genes. Interestingly, 7-dehydrocholesterol reductase was significantly down-regulated in response to all treatments. Several non-annotated genes revealed uniform directions of differential expression in response to all treatments.
7
Thompson KJ, Deshmukh H, Solka JL, Weller JW. A white-box approach to microarray probe response characterization: the BaFL pipeline. BMC Bioinformatics 2009; 10:449. [PMID: 20040098 PMCID: PMC2804686 DOI: 10.1186/1471-2105-10-449] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/26/2009] [Accepted: 12/29/2009] [Indexed: 12/30/2022]
Abstract
BACKGROUND Microarrays depend on appropriate probe design to deliver the promise of accurate genome-wide measurement. Probe design, ideally, produces a unique probe-target match with homogeneous duplex stability over the complete set of probes. Much of microarray pre-processing is concerned with adjusting for non-ideal probes that do not report target concentration accurately. Cross-hybridizing probes (non-unique), probe composition and structure, as well as platform effects such as instrument limitations, have been shown to affect the interpretation of signal. Data cleansing pipelines seldom filter specifically for these constraints, relying instead on general statistical tests to remove the most variable probes from the samples in a study. This adjusts probes contributing to ProbeSet (gene) values in a study-specific manner. We refer to the complete set of factors as biologically applied filter levels (BaFL) and have assembled an analysis pipeline for managing them consistently. The pipeline and associated experiments reported here examine the outcome of comprehensively excluding probes affected by known factors on inter-experiment target behavior consistency. RESULTS We present here a 'white box' probe filtering and intensity transformation protocol that incorporates currently understood factors affecting probe and target interactions; the method has been tested on data from the Affymetrix human GeneChip HG-U95Av2, using two independent datasets from studies of a complex lung adenocarcinoma phenotype. The protocol incorporates probe-specific effects from SNPs, cross-hybridization and low heteroduplex affinity, as well as effects from scanner sensitivity, sample batches, and includes simple statistical tests for identifying unresolved biological factors leading to sample variability. Subsequent to filtering for these factors, the consistency and reliability of the remaining measurements is shown to be markedly improved. 
CONCLUSIONS The data cleansing protocol yields reproducible estimates of a given probe or ProbeSet's (gene's) relative expression that translates across datasets, allowing for credible cross-experiment comparisons. We provide supporting evidence for the validity of removing several large classes of probes, and for our approaches for removing outlying samples. The resulting expression profiles demonstrate consistency across the two independent datasets. Finally, we demonstrate that, given an appropriate sampling pool, the method enhances the t-test's statistical power to discriminate significantly different means over sample classes.
Affiliation(s)
- Kevin J Thompson
- Computer Science Dept, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
8
Uva P, de Rinaldis E. CrossHybDetector: detection of cross-hybridization events in DNA microarray experiments. BMC Bioinformatics 2008; 9:485. [PMID: 19014642 PMCID: PMC2596149 DOI: 10.1186/1471-2105-9-485] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/26/2008] [Accepted: 11/17/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND DNA microarrays contain thousands of different probe sequences represented on their surface. These are designed in such a way that potential cross-hybridization reactions with non-target sequences are minimized. However, given the large number of probes, the occurrence of cross-hybridization events cannot be excluded. This problem can dramatically affect the data quality and cause false positive/false negative results. RESULTS CrossHybDetector is a software package aimed at identifying cross-hybridization events that occurred during individual array hybridization, using the probe sequences and the array intensity values. As output, the software provides the user with a list of potentially 'corrupted' array spots and their associated p-values calculated by Monte Carlo simulations. Graphical plots are also generated, which provide a visual and global overview of the quality of the microarray experiment with respect to cross-hybridization issues. CONCLUSION CrossHybDetector is implemented as a package for the statistical computing environment R and is freely available under the LGPL license within the CRAN project.
9
Auslander M, Yudkovski Y, Chalifa-Caspi V, Herut B, Ophir R, Reinhardt R, Neumann PM, Tom M. Pollution-affected fish hepatic transcriptome and its expression patterns on exposure to cadmium. Marine Biotechnology (New York, N.Y.) 2008; 10:250-261. [PMID: 18213484 PMCID: PMC2921062 DOI: 10.1007/s10126-007-9060-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/17/2007] [Revised: 09/16/2007] [Accepted: 09/28/2007] [Indexed: 05/25/2023]
Abstract
Individuals of the fish Lithognathus mormyrus were exposed to a series of pollutants including: benzo[a]pyrene, pp-DDE, Aroclor 1254, perfluorooctanoic acid, tributyl-tin chloride, lindane, estradiol, 4-nonylphenol, methyl mercury chloride, and cadmium chloride. Five mixtures of the pollutants were injected. Each mixture included one to three compounds. A microarray was constructed using 4608 L. mormyrus hepatic cDNAs cloned from the pollutant-exposed fish. Most clones (4456) were sequenced and assembled into 1494 annotated unique clones. The constructed microarray was used to identify changes in hepatic gene expression profile on exposure to cadmium administered to the fish by feeding or injections. Thirty-one unique clones showed altered expression levels on exposure to cadmium. Prominently differentially expressed genes included elastase 4, carboxypeptidase B, trypsinogen, perforin, complement C31, cytochrome P450 2K5, ceruloplasmin, carboxyl ester lipase, and metallothionein. Twelve sequences have no available annotation. Most genes (23) were downregulated and hypothesized to be affected by general toxicity due to the intensive cadmium exposure regime. The concept of an operational multigene cDNA microarray, aimed at routine and fast biomonitoring of multiple environmental threats, is outlined and the cadmium exposure experiment has been used to demonstrate functional and methodological aspects of the biomonitoring tool. The components of the outlined system include: (1) spotted array, composed of both pollution-affected and constitutively expressed genes, the latter are used for normalization; (2) standard, repeatable labeling procedure of a reference transcript population; and (3) biomarker indices derived from the profile of expression ratio across the pollution-affected genes, between the field-sampled transcript populations and the reference.
Affiliation(s)
- M. Auslander
- Israel Oceanographic and Limnological Research, Haifa, 31080 Israel
- The Technion-Israel Institute of Technology, Faculty of Civil and Environmental Engineering, Technion City, Haifa 32000 Israel
- Y. Yudkovski
- Israel Oceanographic and Limnological Research, Haifa, 31080 Israel
- V. Chalifa-Caspi
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105 Israel
- B. Herut
- Israel Oceanographic and Limnological Research, Haifa, 31080 Israel
- R. Ophir
- Weizmann Institute of Science, 71600 Rehovot, Israel
- R. Reinhardt
- Max Planck Institute for Molecular Genetics, 14195 Berlin-Dahlem, Germany
- P. M. Neumann
- The Technion-Israel Institute of Technology, Faculty of Civil and Environmental Engineering, Technion City, Haifa 32000 Israel
- M. Tom
- Israel Oceanographic and Limnological Research, Haifa, 31080 Israel
10
11
Koltai H, Weingarten-Baror C. Specificity of DNA microarray hybridization: characterization, effectors and approaches for data correction. Nucleic Acids Res 2008; 36:2395-405. [PMID: 18299281 PMCID: PMC2367720 DOI: 10.1093/nar/gkn087] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Abstract
Microarray-hybridization specificity is one of the main effectors of microarray result quality. In the present review, we suggest a definition for specificity that spans four hybridization levels, from the single probe to the microarray platform. For increased hybridization specificity, it is important to quantify the extent of the specificity at each of these levels, and correct the data accordingly. We outline possible effects of low hybridization specificity on the obtained results and list possible effectors of hybridization specificity. In addition, we discuss several studies in which theoretical approaches, empirical means or data filtration were used to identify specificity effectors, and increase the specificity of the hybridization results. However, these various approaches may not yet provide an ultimate solution; rather, further tool development is needed to enhance microarray-hybridization specificity.
Affiliation(s)
- Hinanit Koltai
- Department of Ornamental Horticulture, ARO Volcani Center, Bet Dagan, Israel.
12
Yudkovski Y, Shechter A, Chalifa-Caspi V, Auslander M, Ophir R, Dauphin-Villemant C, Waterman M, Sagi A, Tom M. Hepatopancreatic multi-transcript expression patterns in the crayfish Cherax quadricarinatus during the moult cycle. Insect Molecular Biology 2007; 16:661-674. [PMID: 18092996 DOI: 10.1111/j.1365-2583.2007.00762.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 05/25/2023]
Abstract
Alterations of hepatopancreatic multi-transcript expression patterns, related to induced moult cycle, were identified in male Cherax quadricarinatus through cDNA microarray hybridizations of hepatopancreatic transcript populations. Moult was induced by X-organ sinus gland extirpation or by repeated injections of 20-hydroxyecdysone. Manipulated males were sacrificed at premoult or early postmoult, and a reference population was sacrificed at intermoult. Differentially expressed genes among the four combinations of two induction methods and two moult stages were identified. Biologically interesting clusters revealing concurrently changing transcript expressions across treatments were selected, characterized by a general shift of expression throughout premoult and early postmoult vs. intermoult, or by different premoult vs. postmoult expressions. A number of genes were differentially expressed in 20-hydroxyecdysone-injected crayfish vs. X-organ sinus gland extirpated males.
Affiliation(s)
- Y Yudkovski
- Israel Oceanographic and Limnological Research, Haifa, Israel
13
Casneuf T, Van de Peer Y, Huber W. In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. BMC Bioinformatics 2007; 8:461. [PMID: 18039370 PMCID: PMC2213692 DOI: 10.1186/1471-2105-8-461] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Received: 09/05/2007] [Accepted: 11/26/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Microarray co-expression signatures are an important tool for studying gene function and relations between genes. In addition to genuine biological co-expression, correlated signals can result from technical deficiencies like hybridization of reporters with off-target transcripts. An approach that is able to distinguish these factors permits the detection of more biologically relevant co-expression signatures. RESULTS We demonstrate a positive relation between off-target reporter alignment strength and expression correlation in data from oligonucleotide genechips. Furthermore, we describe a method that allows the identification, from their expression data, of individual probe sets affected by off-target hybridization. CONCLUSION The effects of off-target hybridization on expression correlation coefficients can be substantial, and can be alleviated by more accurate mapping between microarray reporters and the target transcriptome. We recommend attention to the mapping for any microarray analysis of gene expression patterns.
Affiliation(s)
- Tineke Casneuf
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium.
14
Bruland T, Anderssen E, Doseth B, Bergum H, Beisvag V, Laegreid A. Optimization of cDNA microarrays procedures using criteria that do not rely on external standards. BMC Genomics 2007; 8:377. [PMID: 17949480 PMCID: PMC2147032 DOI: 10.1186/1471-2164-8-377] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/05/2007] [Accepted: 10/18/2007] [Indexed: 11/18/2022]
Abstract
Background The measurement of gene expression using microarray technology is a complicated process in which a large number of factors can be varied. Because there are no standard calibration samples of the kind used in traditional chemical analysis, it can be difficult to evaluate whether changes made to the microarray procedure actually improve the identification of truly differentially expressed genes. The purpose of the present work is to report the optimization of several steps in the microarray process, both in laboratory practices and in data processing, using criteria that do not rely on external standards. Results We performed a cDNA microarray experiment including RNA from samples with high expected differential gene expression termed "high contrasts" (rat cell lines AR42J and NRK52E) compared to self-self hybridization, and optimized a pipeline to maximize the number of genes found to be differentially expressed in the "high contrasts" RNA samples by estimating the false discovery rate (FDR) using a null distribution obtained from the self-self experiment. The proposed high-contrast versus self-self method (HCSSM) requires only four microarrays per evaluation. The effects of blocking reagent dose, filtering, and background correction methodologies were investigated. In our experiments a dose of 250 ng LNA (locked nucleic acid) dT blocker, no background correction and weight-based filtering gave the largest number of differentially expressed genes. The choice of background correction method had a stronger impact on the estimated number of differentially expressed genes than the choice of filtering method. Cross-platform microarray (Illumina) analysis was used to validate that the increase in the number of differentially expressed genes found by HCSSM was real. Conclusion The results show that HCSSM can be a useful and simple approach to optimize microarray procedures without including external standards.
Our optimizing method is highly applicable to both long oligo-probe microarrays which have become commonly used for well characterized organisms such as man, mouse and rat, as well as to cDNA microarrays which are still of importance for organisms with incomplete genome sequence information such as many bacteria, plants and fish.
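The idea of estimating the FDR from a self-self null distribution can be illustrated with a small sketch. The abstract does not give the estimator the authors used, so the formula below is an assumption: a common empirical-null estimate in which the fraction of self-self statistics exceeding a threshold is scaled to the size of the high-contrast list and divided by the number of high-contrast genes called. The function name `estimate_fdr` and the toy statistics are hypothetical.

```python
from typing import Sequence


def estimate_fdr(contrast_stats: Sequence[float],
                 null_stats: Sequence[float],
                 threshold: float) -> float:
    """Empirical-null FDR estimate at |statistic| >= threshold.

    contrast_stats: per-gene statistics from the high-contrast comparison.
    null_stats: per-gene statistics from the self-self hybridization,
                where no true differential expression is expected.
    This is an illustrative estimator, not necessarily the one used in HCSSM.
    """
    if not null_stats:
        raise ValueError("null_stats must not be empty")
    called = sum(abs(s) >= threshold for s in contrast_stats)
    if called == 0:
        return 0.0  # nothing called, so no false discoveries to estimate
    # Fraction of null statistics that would be called at this threshold.
    null_rate = sum(abs(s) >= threshold for s in null_stats) / len(null_stats)
    # Expected number of false calls among the high-contrast genes.
    expected_false = null_rate * len(contrast_stats)
    return min(1.0, expected_false / called)


# Toy data (made up for illustration only).
contrast = [3.1, -2.8, 0.2, 2.5, -0.1, 0.4, 4.0, -3.3]
self_self = [0.3, -0.2, 0.1, 2.1, -0.4, 0.0, 0.5, -0.1]
fdr = estimate_fdr(contrast, self_self, threshold=2.0)
```

With such an estimate in hand, competing processing pipelines can be compared by counting how many genes each one calls at a fixed estimated FDR, which is the kind of criterion the abstract describes.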
Affiliation(s)
- Torunn Bruland
- Department of Cancer Research and Molecular Medicine, Faculty of Medicine, Norwegian University of Science and Technology (NTNU), N-7489 Trondheim, Norway.
15
Cohen R, Chalifa-Caspi V, Williams TD, Auslander M, George SG, Chipman JK, Tom M. Estimating the efficiency of fish cross-species cDNA microarray hybridization. Marine Biotechnology (New York, N.Y.) 2007; 9:491-9. [PMID: 17514486 DOI: 10.1007/s10126-007-9010-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 03/06/2007] [Revised: 03/06/2007] [Accepted: 03/09/2007] [Indexed: 04/12/2023]
Abstract
Using an available cross-species cDNA microarray is advantageous for examining multigene expression patterns in non-model organisms, saving the need for construction of species-specific arrays. The aim of the present study was to estimate relative efficiency of cross-species hybridizations across bony fishes, using bioinformatics tools. The methodology may serve also as a model for similar evaluations in other taxa. The theoretical evaluation was done by substituting comparative whole-transcriptome sequence similarity information into the thermodynamic hybridization equation. Complementary DNA sequence assemblages of nine fish species belonging to common families or suborders and distributed across the bony fish taxonomic branch were selected for transcriptome-wise comparisons. Actual cross-species hybridizations among fish of different taxonomic distances were used to validate and eventually to calibrate the theoretically computed relative efficiencies.
Affiliation(s)
- Raphael Cohen
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
16
Chen YA, Chou CC, Lu X, Slate EH, Peck K, Xu W, Voit EO, Almeida JS. A multivariate prediction model for microarray cross-hybridization. BMC Bioinformatics 2006; 7:101. [PMID: 16509965 PMCID: PMC1409802 DOI: 10.1186/1471-2105-7-101] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 09/12/2005] [Accepted: 03/01/2006] [Indexed: 11/17/2022]
Abstract
Background Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization. Results We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. Conclusion A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments.
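The contrast between the two univariate predictors named in this abstract, percent identity and the longest stretch of contiguous matching bases, can be made concrete with a short sketch. It assumes the probe and target are already aligned, ungapped and of equal length; the abstract does not say how the authors computed either feature, and the function names and sequences here are hypothetical.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percentage of aligned positions at which probe and target agree."""
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(probe, target))
    return 100.0 * matches / len(probe)


def longest_contiguous_match(probe: str, target: str) -> int:
    """Length of the longest run of consecutive matching aligned positions."""
    if len(probe) != len(target):
        raise ValueError("sequences must be of equal length")
    best = run = 0
    for p, t in zip(probe, target):
        run = run + 1 if p == t else 0  # a mismatch resets the current run
        best = max(best, run)
    return best


# Two made-up targets with the same percent identity to the probe
# but very different distributions of mismatches.
probe = "ACGTACGTACGT"
clustered = "ACGTACGTTTTT"   # mismatches grouped near one end
scattered = "ACTTACTTACTT"   # mismatches spread along the sequence

identity_clustered = percent_identity(probe, clustered)
identity_scattered = percent_identity(probe, scattered)
run_clustered = longest_contiguous_match(probe, clustered)
run_scattered = longest_contiguous_match(probe, scattered)
```

Both toy targets share the same percent identity with the probe, yet their longest matching runs differ, which shows why the two features can rank candidate cross-hybridizing targets differently.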
Affiliation(s)
- Yian A Chen
- Department of Biostatistics, Bioinformatics, and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
- Cheng-Chung Chou
- Center for Genomic Medicine, National Taiwan University, Taipei, Taiwan
- Xinghua Lu
- Department of Biostatistics, Bioinformatics, and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
- Elizabeth H Slate
- Department of Biostatistics, Bioinformatics, and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
- Konan Peck
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Wenying Xu
- Key Laboratory of Molecular and Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, P. R. China
- Eberhard O Voit
- Department of Biomedical Engineering, Georgia Tech, Atlanta, GA, USA
- Jonas S Almeida
- Department of Biostatistics and Applied Mathematics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
17
Abstract
The shotgun proteomic strategy based on digesting proteins into peptides and sequencing them using tandem mass spectrometry and automated database searching has become the method of choice for identifying proteins in most large scale studies. However, the peptide-centric nature of shotgun proteomics complicates the analysis and biological interpretation of the data especially in the case of higher eukaryote organisms. The same peptide sequence can be present in multiple different proteins or protein isoforms. Such shared peptides therefore can lead to ambiguities in determining the identities of sample proteins. In this article we illustrate the difficulties of interpreting shotgun proteomic data and discuss the need for common nomenclature and transparent informatic approaches. We also discuss related issues such as the state of protein sequence databases and their role in shotgun proteomic analysis, interpretation of relative peptide quantification data in the presence of multiple protein isoforms, the integration of proteomic and transcriptional data, and the development of a computational infrastructure for the integration of multiple diverse datasets.
18
Abstract
One of the critical problems in short oligo microarray technology is how to deal with cross-hybridization, which produces spurious data. Little is known about the details of the cross-hybridization effect at the molecular level. Here, we report a free energy analysis of cross-hybridization on short oligo microarrays using data from a spike-in study. Our analysis revealed that cross-hybridization on the arrays is mostly caused by oligo fragments with a run of 10–16 nt complementary to the probes. Mismatches were estimated to be energetically much more costly in cross-hybridization than in gene-specific hybridization, implying that the sources of cross-hybridization must be very different between the two probes of a PM–MM pair. Consequently, it is unreliable to use the MM probe signal to track the cross-hybridizing signal on the corresponding PM probe. Our results also showed that the oligo fragments tend to bind to the 5′ ends of the probes, and are rarely seen at the 3′ ends. These results are useful for microarray design and data analysis.
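As a rough illustration of the "run of 10–16 nt complementary to the probes" finding, the sketch below screens a fragment for the longest stretch that could base-pair contiguously with a probe. It is a sequence-only check under stated assumptions: perfect Watson-Crick pairing, DNA alphabet ACGT, and no free energy term, whereas the study itself used a free energy analysis. The function names and the sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_complementary_run(probe: str, fragment: str) -> int:
    """Longest contiguous stretch of `fragment` that can pair with `probe`.

    A stretch of the fragment pairs with the probe exactly when it occurs
    in the probe's reverse complement, so this is the longest common
    substring of reverse_complement(probe) and fragment, found by
    dynamic programming over match-run lengths.
    """
    target = reverse_complement(probe)
    best = 0
    prev = [0] * (len(fragment) + 1)
    for a in target:
        curr = [0] * (len(fragment) + 1)
        for j, b in enumerate(fragment, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1  # extend the run ending at j - 1
                best = max(best, curr[j])
        prev = curr
    return best


# Made-up 25-mer probe and a fragment carrying a 12-nt complementary core.
probe = "ATGCCGTAAGCTTGACCGTATGCAA"
fragment = "TTTT" + "GGTCAAGCTTAC" + "TTTT"

run = longest_complementary_run(probe, fragment)
in_reported_range = 10 <= run <= 16
```

A screen like this could flag candidate cross-hybridizing fragments during probe design, with the 10 to 16 nt window taken from the abstract and any thermodynamic scoring left to a separate step.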
Affiliation(s)
- Roberto Carta
- Department of Statistics and Actuarial Science, University of Central Florida, Orlando, FL 32816–2370, USA
- Li Zhang
- To whom correspondence should be addressed. Tel: +1 713 563 4298; Fax: +1 713 563 4243;