1
Qiao YP, Ren CL. Correlated Hybrid DNA Structures Explored by the oxDNA Model. Langmuir 2024; 40:109-117. [PMID: 38154122 DOI: 10.1021/acs.langmuir.3c02231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2023]
Abstract
Thermodynamically, perfect DNA hybridization can form between probes and their corresponding targets because it is energetically favorable. However, this is not the case dynamically. Here, we use molecular dynamics (MD) simulations based on the oxDNA model to investigate the process of DNA microarray hybridization. In general, correlated hybrid DNA structures are formed, including one probe associated with several targets as well as one target hybridized with multiple probes, leading to target-mediated hybridization. The formation of these two types of correlated structures largely depends on the surface coverage of the DNA microarray. Moreover, DNA sequence, DNA length, and spacer length affect structure formation. Our findings shed light on the dynamics of DNA hybridization, which is important for applications of DNA microarrays.
Affiliation(s)
- Ye-Peng Qiao
- National Laboratory of Solid State Microstructures and Department of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
- Chun-Lai Ren
- National Laboratory of Solid State Microstructures and Department of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
2
Abstract
Hybridization between nucleic acid strands immobilized on a solid support with partners in solution is widely practiced in bioanalytical technologies and materials science. An important fundamental aspect of understanding these reactions is the role played by immobilization in the dynamics of duplex formation and disassembly. This report reviews and analyzes literature kinetic data to identify commonly observed trends and to correlate them with probable molecular mechanisms. The analysis reveals that while under certain conditions impacts from immobilization are minimal so that surface and solution hybridization kinetics are comparable, it is more typical to observe pronounced offsets between the two scenarios. In the forward (hybridization) direction, rates at the surface commonly decrease by one to two decades relative to solution, while in the reverse direction rates of strand separation at the surface can exceed those in solution by tens of decades. By recasting the deviations in terms of activation barriers, a consensus of how immobilization impacts nucleation, zipping, and strand separation can be conceived within the classical mechanism in which duplex formation is rate limited by preassembly of a nucleus a few base pairs in length, while dehybridization requires the cumulative breakup of base pairs along the length of a duplex. Evidence is considered for how excess interactions encountered on solid supports impact these processes.
Affiliation(s)
- Eshan Treasurer
- Department of Chemical and Biomolecular Engineering, New York University Tandon School of Engineering, Brooklyn, New York 11201, United States
- Rastislav Levicky
- Department of Chemical and Biomolecular Engineering, New York University Tandon School of Engineering, Brooklyn, New York 11201, United States
3
Matveeva OV, Ogurtsov AY, Nazipova NN, Shabalina SA. Sequence characteristics define trade-offs between on-target and genome-wide off-target hybridization of oligoprobes. PLoS One 2018; 13:e0199162. [PMID: 29928000 PMCID: PMC6013149 DOI: 10.1371/journal.pone.0199162] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/12/2018] [Accepted: 06/02/2018] [Indexed: 12/20/2022]
Abstract
Interaction of off-target oligoprobes with partially complementary nucleotide sequences is a problem for many bio-techniques. The goal of the study was to identify oligoprobe sequence characteristics that control the ratio between on-target and off-target hybridization. To understand the complex interplay between specific and genome-wide off-target (cross-hybridization) signals, we analyzed a database derived from genomic comparison hybridization experiments performed with an Affymetrix tiling array. The database included two types of probes, with signals derived from (i) a combination of specific signal and cross-hybridization and (ii) genomic cross-hybridization only. All probes from the database were grouped into bins according to their sequence characteristics, and the two hybridization signals were averaged separately within each bin. For selection of specific probes, we analyzed the following sequence characteristics: vulnerability to self-folding, nucleotide composition bias, numbers of G nucleotides and GGG-blocks, and occurrence of the probe's k-mers in the human genome. Increases in bin ranges for these characteristics are accompanied by a decrease in hybridization specificity (the ratio between specific and cross-hybridization signals). However, both averaged hybridization signals grow as probe binding energy increases, with the specific signal increasing significantly faster than the cross-hybridization signal. The same trend is evident for the S function, which serves as a combined evaluation of probe binding energy and occurrence of the probe's k-mers in the genome. Application of S allows a larger number of specific probes to be extracted than binding energy alone. Thus, we showed that high values of specific and cross-hybridization signals are not mutually exclusive for probes with high values of binding energy and S. In this study, the application of a new set of sequence characteristics allows detection of probes that are highly specific to their targets, for array design and other bio-techniques that require selection of specific probes.
Affiliation(s)
- Olga V. Matveeva
- Biopolymer Design LLC, Acton, Massachusetts, United States of America
- * E-mail: (OVM); (SAS)
- Aleksey Y. Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Nafisa N. Nazipova
- Institute of Mathematical Problems of Biology, RAS – the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Svetlana A. Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (OVM); (SAS)
4
Alnasir J, Shanahan HP. A Novel Method to Detect Bias in Short Read NGS Data. J Integr Bioinform 2017; 14(3). [PMID: 28941355 PMCID: PMC6042817 DOI: 10.1515/jib-2017-0025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/04/2017] [Accepted: 08/10/2017] [Indexed: 11/15/2022]
Abstract
Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short-read next-generation sequencing (NGS) data. The method is based on determining intra-exon correlations between specific motifs. It requires the mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. It has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild-type Drosophila data set indicates a variation due to motif GC content that is more significant than that due to exon GC content. The software is available online and could be applied to cross-experiment transcriptome data analysis in eukaryotes.
5
Matveeva OV, Nechipurenko YD, Riabenko E, Ragan C, Nazipova NN, Ogurtsov AY, Shabalina SA. Optimization of signal-to-noise ratio for efficient microarray probe design. Bioinformatics 2017; 32:i552-i558. [PMID: 27587674 DOI: 10.1093/bioinformatics/btw451] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Target-specific hybridization depends on oligo-probe characteristics that improve hybridization specificity and minimize genome-wide cross-hybridization. The interplay between specific hybridization and genome-wide cross-hybridization has been insufficiently studied, despite its crucial role in efficient probe design and in data analysis.
RESULTS: In this study, we defined hybridization specificity as the ratio between oligo target-specific hybridization and oligo genome-wide cross-hybridization. A microarray database, derived from a Genomic Comparison Hybridization (GCH) experiment performed on the Affymetrix platform, contains two different types of probes. Oligo-probes of the first type do not have a specific target in the genome, and their hybridization signals are derived from genome-wide cross-hybridization alone. Oligo-probes of the second type have a specific target in the genomic DNA, and their signals combine specific and cross-hybridization components in a total signal. A comparative analysis of the hybridization specificity of oligo-probes, as well as their nucleotide sequences and thermodynamic features, was performed on the database. The comparison revealed that hybridization specificity was negatively affected by low stability of the fully paired oligo-target duplex, stable probe self-folding, G-rich content (including GGG motifs), low sequence complexity, and nucleotide composition symmetry.
CONCLUSION: Filtering out the probes with the defined 'negative' characteristics significantly increases specific hybridization and dramatically decreases genome-wide cross-hybridization. Selected oligo-probes have, on average, two times higher hybridization specificity than the probes that were filtered from the analysis by applying the suggested cutoff thresholds to the described parameters. A new approach for efficient oligo-probe design is described in our study.
CONTACT: shabalin@ncbi.nlm.nih.gov or olga.matveeva@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Olga V Matveeva
- Biopolymer Design LLC, Acton, MA 01721, USA; Engelhardt Institute of Molecular Biology, Moscow 119991, Russia
- Evgeniy Riabenko
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Chikako Ragan
- Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
- Nafisa N Nazipova
- Institute of Mathematical Problems of Biology, Pushchino, Moscow Region, 142290, Russia
- Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
6
Chiang HC, Levicky R. Effects of Chain-Chain Associations on Hybridization in DNA Brushes. Langmuir 2016; 32:12603-12610. [PMID: 27934512 DOI: 10.1021/acs.langmuir.6b02990] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
Hybridization of solution nucleic acids to DNA brushes is widely encountered in diagnostic and materials science applications. Typically, brush chain lengths of ten or more nucleotides are used to provide the needed sequence specificity and binding affinity. At these lengths, coincidental occurrence of complementary regions is expected to lead to associations between the nominally single-stranded brush chains due to intra- or interchain base pairing. This report investigates how these associations impact the brushes' hybridization activity toward complementary "target" sequences. Brushes were prepared from 20-mer chains with four-nucleotide-long "adhesive regions" through which neighboring chains could interact. The affinity and position of the adhesive region along the chain backbone were varied. DNA brushes were exposed to complementary solution targets, and the corresponding melting transitions were measured to estimate free energies of the brush-target hybridization. These results revealed that higher affinity adhesive regions more extensively suppressed brush hybridization relative to hybridization in solution. Associations near the middle of the chains were found to be more penalizing than those at the immobilized or the free end of the chains. Provided that the brush chains were close enough to associate, changes in brush density did not exert a significant effect on hybridization thermodynamics within the investigated coverage window. Comparison of the DNA brush results with those from commercial Affymetrix single-nucleotide-polymorphism (SNP) microarrays revealed agreement in the impact of chain associations on hybridization.
Affiliation(s)
- Hao-Chun Chiang
- Department of Chemical and Biomolecular Engineering, New York University Tandon School of Engineering, 6 Metrotech Center, Brooklyn, New York 11201, United States
- Rastislav Levicky
- Department of Chemical and Biomolecular Engineering, New York University Tandon School of Engineering, 6 Metrotech Center, Brooklyn, New York 11201, United States
7
Homouz D, Chen G, Kudlicki AS. Correcting positional correlations in Affymetrix® genome chips. Sci Rep 2015; 5:9078. [PMID: 25767049 PMCID: PMC4649851 DOI: 10.1038/srep09078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/28/2014] [Accepted: 02/16/2015] [Indexed: 12/03/2022]
Abstract
We report and model a previously undescribed systematic error causing spurious excess correlations that depend on the distance between probes on Affymetrix® microarrays. The phenomenon affects pairs of features with large chip separations, up to over 100 probes apart. The effect may have a significant impact on analysis of correlations in large collections of expression data, where the systematic experimental errors are repeated in many data sets. Examples of such studies include analysis of functions and interactions in groups of genes, as well as global properties of genomes. We find that the average correlations between probes on Affymetrix microarrays are larger for smaller chip distances, which points to a previously undescribed positional artifact. The magnitude of the artifact depends on the design of the chip, and we find it to be especially high for the yeast S98 microarray, where spurious excess correlations reach 0.1 at a distance of 50 probes. We have designed an algorithm to correct this bias and provide new data sets with the corrected expression values. The algorithm was successfully applied to remove the positional artifact from the S98 chip data while preserving the integrity of the data.
Affiliation(s)
- Dirar Homouz
- Khalifa University of Science, Technology and Research, Abu Dhabi, UAE
- Andrzej S Kudlicki
- [1] Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA [2] Institute for Translational Sciences, University of Texas Medical Branch, Galveston, TX, USA [3] Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, TX, USA
8
Shanahan HP, Owen AM, Harrison AP. Bioinformatics on the cloud computing platform Azure. PLoS One 2014; 9:e102642. [PMID: 25050811 PMCID: PMC4106841 DOI: 10.1371/journal.pone.0102642] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 04/23/2013] [Accepted: 06/20/2014] [Indexed: 12/27/2022]
Abstract
We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. In Appendix S1 we provide a walk-through that demonstrates explicitly how Azure can be used to perform these analyses, and we offer a comparison with a local computation. We note that the Platform as a Service (PaaS) offering of Azure can present a steep learning curve for bioinformatics developers, who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and to explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.
Affiliation(s)
- Hugh P. Shanahan
- Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
- Anne M. Owen
- Department of Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, United Kingdom
- Andrew P. Harrison
- Department of Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, United Kingdom
- Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, United Kingdom
9
Sources of high variance between probe signals in Affymetrix short oligonucleotide microarrays. Sensors 2013; 14:532-48. [PMID: 24385030 PMCID: PMC3926573 DOI: 10.3390/s140100532] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/15/2013] [Revised: 12/16/2013] [Accepted: 12/24/2013] [Indexed: 01/21/2023]
Abstract
High density oligonucleotide microarrays present a big challenge for statistical data processing methods which aim to separate changes induced by experimental factors from those caused by artifacts and measurement inaccuracies. Despite huge advances in the field of microarray probe design methods, the signal variation between probes that target a single transcript is substantially larger than their between-replicate array variability, suggesting a large influence of various probe-specific effects that introduce bias to the data. In this work we present the influence of probe-related design variations on the expression intensities of individual probes, focusing on five potential sources of high probe signal variance: the GC composition of the probe, the distance between individual probe target sites, G-quadruplex formation in the probe sequence, the occurrence of sequence motifs complementary to the oligo(dT) primer, and the specificity of unrecognized alternative splicing probeset assignment. By focusing on two high quality microarray datasets based on two distinct array designs we show the extent of variance between probes that target a specific transcript providing guidelines for the future design of microarrays and data processing methods.
10
Harrison A, Binder H, Buhot A, Burden CJ, Carlon E, Gibas C, Gamble LJ, Halperin A, Hooyberghs J, Kreil DP, Levicky R, Noble PA, Ott A, Pettitt BM, Tautz D, Pozhitkov AE. Physico-chemical foundations underpinning microarray and next-generation sequencing experiments. Nucleic Acids Res 2013; 41:2779-96. [PMID: 23307556 PMCID: PMC3597649 DOI: 10.1093/nar/gks1358] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022]
Abstract
Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of state-of-the-art, physico-chemically founded approaches to modeling the nucleic acid hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized.
Affiliation(s)
- Andrew Harrison
- University of Essex-Mathematical Sciences, Colchester CO4 3SQ, Essex, United Kingdom
11
Upton GJG, Harrison AP. Motif effects in Affymetrix GeneChips seriously affect probe intensities. Nucleic Acids Res 2012; 40:9705-16. [PMID: 22904084 PMCID: PMC3479185 DOI: 10.1093/nar/gks717] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
Abstract
An Affymetrix GeneChip consists of an array of hundreds of thousands of probes (each a sequence of 25 bases) with the probe values being used to infer the extent to which genes are expressed in the biological material under investigation. In this article, we demonstrate that these probe values are also strongly influenced by their precise base sequence. We use data from >28 000 CEL files relating to 10 different Affymetrix GeneChip platforms and involving nearly 1000 experiments. Our results confirm known effects (those due to the T7-primer and the formation of G-quadruplexes) but reveal other effects. We show that there can be huge variations from one experiment to another, and that there may also be sizeable disparities between batches within an experiment and between CEL files within a batch.
Affiliation(s)
- Graham J G Upton
- Department of Mathematical Sciences and School of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ, UK.
12
Ge D, Wang X, Williams K, Levicky R. Thermostable DNA immobilization and temperature effects on surface hybridization. Langmuir 2012; 28:8446-8455. [PMID: 22578171 PMCID: PMC3368703 DOI: 10.1021/la301165a] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 05/29/2023]
Abstract
Monolayer films of nucleic acids on solid supports are encountered in a range of diagnostic and bioanalytical applications. These applications often rely on elevated temperatures to improve performance; moreover, studies at elevated temperatures can provide fundamental information on layer organization and functionality. To support such applications, this study compares thermostability of oligonucleotide monolayers immobilized to gold by first coating the gold with a nanometer-thick film (an "anchor layer") of a polymercaptosiloxane, to which DNA oligonucleotides are subsequently tethered through maleimide-thiol conjugation, with thermostability of monolayers formed via widely used attachment through a terminal thiol moiety on the DNA. The temperature range covered is from 25 to 90 °C. After confirming stability of immobilization and, more importantly, retention of hybridization activity even under the harshest conditions investigated, these thermostable films are used to demonstrate measurements of (1) reversible surface melting transitions and (2) temperature dependence of competitive hybridization, when fully matched and mismatched sequences compete for binding to immobilized DNA oligonucleotides. The competitive hybridization experiments reveal a pronounced impact of temperature on rates of approach to equilibrium, with kinetic freezing into nonequilibrium states close to room temperature and rapid approach to equilibrium at elevated temperatures. Modeling of competitive surface hybridization equilibria using thermodynamic parameters derived from surface melting transitions of the individual sequences is also discussed.
13
Shanahan HP, Memon FN, Upton GJG, Harrison AP. Normalized Affymetrix expression data are biased by G-quadruplex formation. Nucleic Acids Res 2011; 40:3307-15. [PMID: 22199258 PMCID: PMC3333884 DOI: 10.1093/nar/gkr1230] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/27/2022]
Abstract
Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The elimination of these probes from any analysis in such affected data sets outweighs the increase of noise in the signal.
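As an illustration of the G-stack criterion described in this abstract, the following minimal Python sketch flags probes that contain a run of four or more guanines. The function names `has_g_stack` and `count_g_stack_probes` are hypothetical and are not taken from the paper; the sketch only applies the sequence rule and makes no attempt to model G-quadruplex formation itself.

```python
import re

# Assumption: a "G-stack" is a run of four or more consecutive guanines,
# following the definition given in the abstract.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(probe: str) -> bool:
    """Return True if the probe sequence contains a run of >= 4 guanines."""
    return G_STACK.search(probe.upper()) is not None


def count_g_stack_probes(probe_set: list[str]) -> int:
    """Count how many probes in a probe set contain a G-stack."""
    return sum(has_g_stack(p) for p in probe_set)
```

A filter built on `has_g_stack` could be used to exclude such probes before summarizing a probe set, which is the kind of elimination the abstract argues for.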
Affiliation(s)
- Hugh P Shanahan
- Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK.
14
Hafemeister C, Krause R, Schliep A. Selecting oligonucleotide probes for whole-genome tiling arrays with a cross-hybridization potential. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1642-1652. [PMID: 21358006 DOI: 10.1109/tcbb.2011.39] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
For designing oligonucleotide tiling arrays, popular current methods still rely on simple criteria like Hamming distance or longest common factors, neglecting base stacking effects which strongly contribute to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method using hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter using weighted ungapped q-grams to arrive at seeds. The computation of the CHP is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and quality of probes selected. We obtain very favorable results in comparison with prior approaches with respect to specificity and sensitivity for cross-hybridization and genome coverage with high-specificity probes. The combination of OSProbes and our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays, balancing probe distance, hybridization conditions, and uniqueness of hybridization.
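For orientation, the two "simple criteria" that this abstract contrasts with its energy-based approach can be sketched in a few lines of Python. This is not the CHP or the Nearest Neighbor Alignment of the paper; the function names are hypothetical, and the sketch compares sequences as plain strings, leaving strand orientation and complementarity aside.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def longest_common_factor(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b.

    Row-by-row dynamic programming: curr[j] holds the length of the
    common run that ends at the current character of a and at b[j - 1].
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best
```

Both measures treat every matching base as equivalent, which is the limitation the abstract points out: neither accounts for base stacking contributions to duplex stability.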
Affiliation(s)
- Christoph Hafemeister
- Department of Biology, New York University, 100 Washington Square East, Rm 1009, New York, NY 10003-6688, USA.
15
Gharaibeh RZ, Fodor AA, Gibas CJ. Accurate estimates of microarray target concentration from a simple sequence-independent Langmuir model. PLoS One 2010; 5:e14464. [PMID: 21209932 PMCID: PMC3012684 DOI: 10.1371/journal.pone.0014464] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/07/2010] [Accepted: 12/01/2010] [Indexed: 11/18/2022]
Abstract
Background: Microarray technology is a commonly used tool for assessing global gene expression. Many models for estimation of target concentration based on observed microarray signal have been proposed, but, in general, these models have been complex and platform-dependent.
Principal Findings: We introduce a universal Langmuir model for estimation of absolute target concentration from microarray experiments. We find that this sequence-independent model, characterized by only three free parameters, yields excellent predictions for four microarray platforms, including Affymetrix, Agilent, Illumina and a custom-printed microarray. The model also accurately predicts concentration for the MAQC data sets. This approach significantly reduces the computational complexity of quantitative target concentration estimates.
Conclusions: Using a simple form of the Langmuir isotherm model, with a minimum of parameters and assumptions, and without explicit modeling of individual probe properties, we were able to recover absolute transcript concentrations with high R² on four different array platforms. The results obtained here suggest that with a “spiked-in” concentration series targeting as few as 5–10 genes, reliable estimation of target concentration can be achieved for the entire microarray.
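To make the idea of a three-parameter Langmuir model concrete, here is a minimal Python sketch. The abstract does not name its three parameters, so the parameterization below (a background signal, a saturation amplitude, and a half-saturation concentration) is an assumption chosen for illustration, as are the function names; it should not be read as the authors' exact formulation.

```python
def langmuir_signal(conc: float, background: float,
                    saturation: float, k_half: float) -> float:
    """Predicted signal for a target concentration.

    Assumed form: signal = background + saturation * c / (c + k_half).
    """
    return background + saturation * conc / (conc + k_half)


def langmuir_concentration(signal: float, background: float,
                           saturation: float, k_half: float) -> float:
    """Invert the assumed isotherm to estimate concentration from signal."""
    net = signal - background
    if not 0 <= net < saturation:
        raise ValueError("signal outside the invertible range of the model")
    return k_half * net / (saturation - net)
```

In practice the three parameters would be fitted to a spiked-in concentration series, as the abstract suggests, and the inverse function then applied to the remaining probes; the fitting step is omitted here.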
Collapse
Affiliation(s)
- Raad Z. Gharaibeh
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Anthony A. Fodor
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Cynthia J. Gibas
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
|
16
|
Irving D, Gong P, Levicky R. DNA surface hybridization: comparison of theory and experiment. J Phys Chem B 2010; 114:7631-40. [PMID: 20469913 DOI: 10.1021/jp100860z] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Indexed: 01/16/2023]
Abstract
The design and interpretation of surface hybridization assays is complicated by poorly understood aspects of the interfacial environment that cause both kinetic and thermodynamic behaviors to deviate from those in solution. The origins of these differences lie in the additional interactions experienced by hybridizing strands at the surface. In this report, an analysis of surface hybridization equilibria is provided for end-tethered, single-stranded oligonucleotide "probes" hybridizing with similarly sized, single-stranded solution "target" molecules. Theoretical models by Vainrub and Pettitt (Phys. Rev. E 2002, 66, 041905) and by Halperin, Buhot, and Zhulina (Biophys. J. 2004, 86, 718), and an "extended" model that in addition includes a solution-like salt dependence of probe-target dimerization, are compared to experiments as a function of salt concentration and probe coverage. Good agreement with experiment is observed when the DNA volume fraction at the surface remains below approximately 0.25. None of the models, however, can account for strong suppression of hybridization when the volume fraction of DNA approaches 0.3, realizable in the limit of high buffer strength and densely tethered films. Under these conditions, hybridization yields become insensitive to increases in analyte concentration even though many probes remain available to bind targets. These observations are attributed to the onset of packing constraints which, interestingly, become limiting significantly below maximum DNA coverages estimated from ideally efficient hexagonal packing. By delineating conditions under which specific hybridization behaviors are observed, the results advance fundamental knowledge in support of DNA microarray and biosensor applications.
Affiliation(s)
- Damion Irving
- Department of Chemical & Biological Engineering, Polytechnic Institute of New York University, Brooklyn, New York 11201, USA
|
17
|
Memon FN, Upton GJG, Harrison AP. A Comparative Study of the Impact of G-Stack Probes on Various Affymetrix GeneChips of Mammalia. J Nucleic Acids 2010; 2010:489736. [PMID: 20725627 PMCID: PMC2915844 DOI: 10.4061/2010/489736] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/01/2009] [Revised: 02/11/2010] [Accepted: 04/22/2010] [Indexed: 11/20/2022] Open
Abstract
We have previously discovered that probes containing runs of four or more contiguous guanines are not reliable for measuring gene expression in the Human HG_U133A Affymetrix GeneChip data. These probes are not correlated with other members of their probe set, but they are correlated with each other. We now extend our analysis to different 3' GeneChip designs of mouse, rat, and human. We find that, in all these chip designs, the G-stack probes (probes with a run of exactly four consecutive guanines) are correlated highly with each other, indicating that such probes are not reliable measures of gene expression in mammalian studies. Furthermore, there is no specific position of G-stack where the correlation is highest in all the chips. We also find that the latest designs of rat and mouse chips have significantly fewer G-stack probes compared to their predecessors, whereas there has not been a similar reduction in G-stack density across the changes in human chips. Moreover, we find significant changes in RMA values (after removing G-stack probes) as the number of G-stack probes increases.
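As an illustration of the probe filter this abstract implies, the following minimal sketch flags probe sequences that contain a run of consecutive guanines. The function names and the default threshold of four are assumptions made for illustration; the abstract itself mentions both "four or more" and "exactly four" guanines, so the threshold is left as a parameter and a separate helper reports the longest run.

```python
import re


def longest_g_run(sequence):
    """Length of the longest run of consecutive G in a probe sequence."""
    runs = re.findall(r"G+", sequence.upper())
    return max((len(run) for run in runs), default=0)


def has_g_stack(sequence, min_run=4):
    """True if the probe contains at least `min_run` consecutive guanines.

    min_run=4 is an assumed default; pass min_run=3 for a looser
    poly-G criterion.
    """
    return longest_g_run(sequence) >= min_run


def split_probes(probes, min_run=4):
    """Partition probe sequences into (g_stack, other) lists."""
    flagged = [p for p in probes if has_g_stack(p, min_run)]
    kept = [p for p in probes if not has_g_stack(p, min_run)]
    return flagged, kept
```

To select probes with a run of exactly four guanines, compare `longest_g_run(seq) == 4` instead of using the threshold test.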
Affiliation(s)
- Farhat Naureen Memon
- Departments of Mathematical Sciences and Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
- Graham J. G. Upton
- Departments of Mathematical Sciences and Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
- Andrew P. Harrison
- Departments of Mathematical Sciences and Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
|
18
|
Ortiz-Estevez M, Bengtsson H, Rubio A. ACNE: a summarization method to estimate allele-specific copy numbers for Affymetrix SNP arrays. Bioinformatics 2010; 26:1827-33. [PMID: 20529889 DOI: 10.1093/bioinformatics/btq300] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 02/01/2023]
Abstract
MOTIVATION: Current algorithms for estimating DNA copy numbers (CNs) borrow concepts from gene expression analysis methods. However, single nucleotide polymorphism (SNP) arrays have special characteristics that, if taken into account, can improve the overall performance. For example, cross hybridization between alleles occurs in SNP probe pairs. In addition, most of the current CN methods focus on total CNs, while it has been shown that allele-specific CNs are of paramount importance for some studies. Therefore, we have developed a summarization method that estimates high-quality allele-specific CNs. RESULTS: The proposed method estimates the allele-specific DNA CNs for all Affymetrix SNP arrays, dealing directly with the cross hybridization between probes within SNP probesets. This algorithm outperforms (or at least performs as well as) other state-of-the-art algorithms for computing DNA CNs. It better discerns an aberration from a normal state and it also gives more precise allele-specific CNs. AVAILABILITY: The method is available in the open-source R package ACNE, which also includes an add-on to the aroma.affymetrix framework (http://www.aroma-project.org/).
Affiliation(s)
- Maria Ortiz-Estevez
- Group of Bioinformatics, CEIT and TECNUN, University of Navarra, San Sebastian, Spain
|
19
|
Fasold M, Stadler PF, Binder H. G-stack modulated probe intensities on expression arrays - sequence corrections and signal calibration. BMC Bioinformatics 2010; 11:207. [PMID: 20423484 PMCID: PMC2884167 DOI: 10.1186/1471-2105-11-207] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 02/10/2010] [Accepted: 04/27/2010] [Indexed: 02/02/2023] Open
Abstract
Background: The brightness of the probe spots on expression microarrays is intended to measure the abundance of specific mRNA targets. Probes with runs of at least three guanines (G) in their sequence show abnormally high intensities that reflect probe effects rather than target concentrations. This G-bias requires correction prior to downstream expression analysis. Results: Longer runs of three or more consecutive G along the probe sequence, and in particular triple degenerated G at its solution end ((GGG)1-effect), are associated with exceptionally large probe intensities on GeneChip expression arrays. This intensity bias is related to non-specific hybridization and affects both perfect match and mismatch probes. The (GGG)1-effect tends to increase gradually for microarrays of later GeneChip generations. It was found for DNA/RNA as well as for DNA/DNA probe/target-hybridization chemistries. Amplification of sample RNA using T7-primers is associated with strong positive amplitudes of the G-bias, whereas alternative amplification protocols using random primers give rise to much smaller and partly even negative amplitudes. We applied position-dependent sensitivity models to analyze the specifics of probe intensities in the context of all possible short sequence motifs of one to four adjacent nucleotides along the 25meric probe sequence. Most of the longer motifs are adequately described using a nearest-neighbor (NN) model. In contrast, runs of degenerated guanines require explicit consideration of next nearest neighbors (GGG terms). Preprocessing methods such as vsn, RMA, dChip, MAS5 and gcRMA do not sufficiently remove the G-bias from data. Conclusions: Position- and motif-dependent sensitivity models account for sequence effects of oligonucleotide probe intensities. We propose a position-dependent NN+GGG hybrid model to correct the intensity bias associated with probes containing poly-G motifs. It is implemented as a single-chip based calibration algorithm for GeneChips which can be applied in a pre-correction step prior to standard preprocessing.
Affiliation(s)
- Mario Fasold
- Interdisciplinary Centre for Bioinformatics, University Leipzig, Germany
|
20
|
Upton GJG, Sanchez-Graillet O, Rowsell J, Arteaga-Salas JM, Graham NS, Stalteri MA, Memon FN, May ST, Harrison AP. On the causes of outliers in Affymetrix GeneChip data. Briefings in Functional Genomics and Proteomics 2009; 8:199-212. [PMID: 19734302 DOI: 10.1093/bfgp/elp027] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
We describe various types of outliers seen in Affymetrix GeneChip data. We have been able to utilise the data in the Gene Expression Omnibus to screen GeneChips across a range of scales, from single probes, to spatially adjacent fractions of arrays, to whole arrays, to whole experiments. In this review we describe a number of reasons why some reported intensities on GeneChips might be misleading.
Affiliation(s)
- Graham J G Upton
- University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
|
21
|
Binder H, Fasold M, Glomb T. Mismatch and G-stack modulated probe signals on SNP microarrays. PLoS One 2009; 4:e7862. [PMID: 19924253 PMCID: PMC2775684 DOI: 10.1371/journal.pone.0007862] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 08/21/2009] [Accepted: 10/19/2009] [Indexed: 11/24/2022] Open
Abstract
Background: Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates. Methodology/Principal Findings: The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors, and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes, which facilitates complex formation of neighboring probes with at least three adjacent Gs in their sequence. Conclusions: The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested.
Affiliation(s)
- Hans Binder
- Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Leipzig, Germany.
|