Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Carvalho LE, Lawrence CE. Centroid estimation in discrete high-dimensional spaces with applications in biology. Proc Natl Acad Sci U S A 2008;105:3209-14. [PMID: 18305160 DOI: 10.1073/pnas.0712329105] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

For:	Carvalho LE, Lawrence CE. Centroid estimation in discrete high-dimensional spaces with applications in biology. Proc Natl Acad Sci U S A 2008;105:3209-14. [PMID: 18305160 DOI: 10.1073/pnas.0712329105] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Number

Cited by Other Article(s)

Ma H, Wen H, Xue Z, Li G, Zhang Z. RNANetMotif: Identifying sequence-structure RNA network motifs in RNA-protein binding sites. PLoS Comput Biol 2022;18:e1010293. [PMID: 35819951 PMCID: PMC9275694 DOI: 10.1371/journal.pcbi.1010293] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2021] [Accepted: 06/09/2022] [Indexed: 11/19/2022] Open

Abstract

RNA molecules can adopt stable secondary and tertiary structures, which are essential in mediating physical interactions with other partners such as RNA binding proteins (RBPs) and in carrying out their cellular functions. In vivo and in vitro experiments such as RNAcompete and eCLIP have revealed in vitro binding preferences of RBPs to RNA oligomers and in vivo binding sites in cells. Analysis of these binding data showed that the structure properties of the RNAs in these binding sites are important determinants of the binding events; however, it has been a challenge to incorporate the structure information into an interpretable model. Here we describe a new approach, RNANetMotif, which takes predicted secondary structure of thousands of RNA sequences bound by an RBP as input and uses a graph theory approach to recognize enriched subgraphs. These enriched subgraphs are in essence shared sequence-structure elements that are important in RBP-RNA binding. To validate our approach, we performed RNA structure modeling via coarse-grained molecular dynamics folding simulations for selected 4 RBPs, and RNA-protein docking for LIN28B. The simulation results, e.g., solvent accessibility and energetics, further support the biological relevance of the discovered network subgraphs.

RNA binding proteins (RBPs) regulate every aspect of RNA biology, including splicing, translation, transportation, and degradation. High-throughput technologies such as eCLIP have identified thousands of binding sites for a given RBP throughout the genome. It has been shown by earlier studies that, in addition to nucleotide sequences, the structure and conformation of RNAs also play important role in RBP-RNA interactions. Analogous to protein-protein interactions or protein-DNA interactions, it is likely that there exist intrinsic sequence-structure motifs common to these RNAs that underlie their binding specificity to specific RBPs. It is known that RNAs form energetically favorable secondary structures, which can be represented as graphs, with nucleotides being nodes and backbone covalent bonds and base-pairing hydrogen bonds representing edges. We hypothesize that these graphs can be mined by graph theory approaches to identify sequence-structure motifs as enriched sub-graphs. In this article, we described the details of this approach, termed RNANetMotif and associated new concepts, namely EKS (Extended K-mer Subgraph) and GraphK graph algorithm. To test the utility of our approach, we conducted 3D structure modeling of selected RNA sequences through molecular dynamics (MD) folding simulation and evaluated the significance of the discovered RNA motifs by comparing their spatial exposure with other regions on the RNA. We believe that this approach has the novelty of treating the RNA sequence as a graph and RBP binding sites as enriched subgraph, which has broader applications beyond RBP-RNA interactions.

Collapse

Method to Determine the Centroid of Non-Homogeneous Polygons Based on Suspension Theory. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2022. [DOI: 10.3390/ijgi11040233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

Yu B, Lu Y, Zhang QC, Hou L. Prediction and differential analysis of RNA secondary structure. QUANTITATIVE BIOLOGY 2020. [DOI: 10.1007/s40484-020-0205-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Takahashi S, Sugimoto N. Stability prediction of canonical and non-canonical structures of nucleic acids in various molecular environments and cells. Chem Soc Rev 2020;49:8439-8468. [DOI: 10.1039/d0cs00594k] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Akiyama M, Sato K, Sakakibara Y. A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model. J Bioinform Comput Biol 2019;16:1840025. [PMID: 30616476 DOI: 10.1142/s0219720018400255] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Johnston I, Hancock T, Mamitsuka H, Carvalho L. Gene-proximity models for genome-wide association studies. Ann Appl Stat 2016. [DOI: 10.1214/16-aoas907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

Pham LM, Carvalho L, Schaus S, Kolaczyk ED. Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach. J Am Stat Assoc 2016;111:73-92. [PMID: 27647944 DOI: 10.1080/01621459.2015.1110523] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Peng L, Carvalho L. Bayesian degree-corrected stochastic blockmodels for community detection. Electron J Stat 2016. [DOI: 10.1214/16-ejs1163] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

Parks MM, Lawrence CE, Raphael BJ. Detecting non-allelic homologous recombination from high-throughput sequencing data. Genome Biol 2015;16:72. [PMID: 25886137 PMCID: PMC4425883 DOI: 10.1186/s13059-015-0633-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2015] [Accepted: 03/16/2015] [Indexed: 12/27/2022] Open

Herman JL, Novák Á, Lyngsø R, Szabó A, Miklós I, Hein J. Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs. BMC Bioinformatics 2015;16:108. [PMID: 25888064 PMCID: PMC4395974 DOI: 10.1186/s12859-015-0516-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2014] [Accepted: 02/24/2015] [Indexed: 11/30/2022] Open

Abstract

BACKGROUND

A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment.

RESULTS

In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased.

CONCLUSIONS

The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign .

Collapse

Wang Z, He Q, Larget B, Newton MA. A multi-functional analyzer uses parameter constraints to improve the efficiency of model-based gene-set analysis. Ann Appl Stat 2015. [DOI: 10.1214/14-aoas777] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

RNA secondary structure prediction from multi-aligned sequences. Methods Mol Biol 2015;1269:17-38. [PMID: 25577370 DOI: 10.1007/978-1-4939-2291-8_2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]

Mori R, Hamada M, Asai K. Efficient calculation of exact probability distributions of integer features on RNA secondary structures. BMC Genomics 2014;15 Suppl 10:S6. [PMID: 25560710 PMCID: PMC4304215 DOI: 10.1186/1471-2164-15-s10-s6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Johnston I, Carvalho LE. A Bayesian hierarchical gene model on latent genotypes for genome-wide association studies. BMC Proc 2014;8:S45. [PMID: 25519327 PMCID: PMC4143727 DOI: 10.1186/1753-6561-8-s1-s45] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Ruggieri E, Lawrence CE. The Bayesian Change Point and Variable Selection Algorithm: Application to the δ18O Proxy Record of the Plio-Pleistocene. J Comput Graph Stat 2014. [DOI: 10.1080/10618600.2012.707852] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

Patro R, Kingsford C. Predicting protein interactions via parsimonious network history inference. Bioinformatics 2013;29:i237-46. [PMID: 23812989 PMCID: PMC3694678 DOI: 10.1093/bioinformatics/btt224] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Abstract

Motivation: Reconstruction of the network-level evolutionary history of protein–protein interactions provides a principled way to relate interactions in several present-day networks. Here, we present a general framework for inferring such histories and demonstrate how it can be used to determine what interactions existed in the ancestral networks, which present-day interactions we might expect to exist based on evolutionary evidence and what information extant networks contain about the order of ancestral protein duplications.

Results: Our framework characterizes the space of likely parsimonious network histories. It results in a structure that can be used to find probabilities for a number of events associated with the histories. The framework is based on a directed hypergraph formulation of dynamic programming that we extend to enumerate many optimal and near-optimal solutions. The algorithm is applied to reconstructing ancestral interactions among bZIP transcription factors, imputing missing present-day interactions among the bZIPs and among proteins from five herpes viruses, and determining relative protein duplication order in the bZIP family. Our approach more accurately reconstructs ancestral interactions than existing approaches. In cross-validation tests, we find that our approach ranks the majority of the left-out present-day interactions among the top 2 and 17% of possible edges for the bZIP and herpes networks, respectively, making it a competitive approach for edge imputation. It also estimates relative bZIP protein duplication orders, using only interaction data and phylogenetic tree topology, which are significantly correlated with sequence-based estimates.

Availability: The algorithm is implemented in C++, is open source and is available at http://www.cs.cmu.edu/ckingsf/software/parana2.

Contact:robp@cs.cmu.edu or carlk@cs.cmu.edu

Supplementary information:Supplementary data are available at Bioinformatics online.

Collapse

Carvalho L. Bayesian centroid estimation for motif discovery. PLoS One 2013;8:e80511. [PMID: 24324603 PMCID: PMC3855595 DOI: 10.1371/journal.pone.0080511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2013] [Accepted: 10/03/2013] [Indexed: 11/29/2022] Open

Boca SM, Bravo HC, Caffo B, Leek JT, Parmigiani G. A decision-theory approach to interpretable set analysis for high-dimensional data. Biometrics 2013;69:614-23. [PMID: 23909925 DOI: 10.1111/biom.12060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2012] [Revised: 03/01/2013] [Accepted: 03/01/2013] [Indexed: 02/06/2023]

Hamada M. Fighting against uncertainty: an essential issue in bioinformatics. Brief Bioinform 2013;15:748-67. [PMID: 23803300 DOI: 10.1093/bib/bbt038] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Hamada M. Direct updating of an RNA base-pairing probability matrix with marginal probability constraints. J Comput Biol 2013;19:1265-76. [PMID: 23210474 DOI: 10.1089/cmb.2012.0215] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

Sato K, Kato Y, Akutsu T, Asai K, Sakakibara Y. DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition. ACTA ACUST UNITED AC 2012;28:3218-24. [PMID: 23060618 DOI: 10.1093/bioinformatics/bts612] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

Ruggieri E, Lawrence CE. On efficient calculations for Bayesian variable selection. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.09.026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Lange SJ, Maticzka D, Möhl M, Gagnon JN, Brown CM, Backofen R. Global or local? Predicting secondary structure and accessibility in mRNAs. Nucleic Acids Res 2012;40:5215-26. [PMID: 22373926 PMCID: PMC3384308 DOI: 10.1093/nar/gks181] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open

Hamada M, Asai K. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012;19:532-49. [PMID: 22313125 DOI: 10.1089/cmb.2011.0197] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Okada Y, Saito Y, Sato K, Sakakibara Y. Improved measurements of RNA structure conservation with generalized centroid estimators. Front Genet 2012;2:54. [PMID: 22303350 PMCID: PMC3268607 DOI: 10.3389/fgene.2011.00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2011] [Accepted: 08/08/2011] [Indexed: 11/13/2022] Open

A model-based analysis to infer the functional content of a gene list. Stat Appl Genet Mol Biol 2012;11:/j/sagmb.2012.11.issue-2/1544-6115.1716/1544-6115.1716.xml. [PMID: 22499692 DOI: 10.2202/1544-6115.1716] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

SATO KENGO, HAMADA MICHIAKI, MITUYAMA TOUTAI, ASAI KIYOSHI, SAKAKIBARA YASUBUMI. A NON-PARAMETRIC BAYESIAN APPROACH FOR PREDICTING RNA SECONDARY STRUCTURES. J Bioinform Comput Biol 2011. [DOI: 10.1142/s0219720010004926] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Sato K, Kato Y, Hamada M, Akutsu T, Asai K. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. ACTA ACUST UNITED AC 2011;27:i85-93. [PMID: 21685106 PMCID: PMC3117384 DOI: 10.1093/bioinformatics/btr215] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

Wei D, Alpert LV, Lawrence CE. RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences. ACTA ACUST UNITED AC 2011;27:2486-93. [PMID: 21788211 PMCID: PMC3167047 DOI: 10.1093/bioinformatics/btr421] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Abstract

MOTIVATION

RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainly in the high-dimensional space of this difficult problem, which has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions.

RESULTS

Our analysis of three publically available datasets showed a substantial improvement in RNA structure prediction by RNAG over extant prediction methods. Additionally, our analysis of 17 RNA families showed that the RNAG sampled structures were generally compact around their ensemble centroids, and at least 11 families had at least two well-separated clusters of predicted structures. In general, the distance between a reference structure and our predicted structure was large relative to the variation among structures within an ensemble.

AVAILABILITY

The Perl implementation of the RNAG algorithm and the data necessary to reproduce the results described in Sections 3.1 and 3.2 are available at http://ccmbweb.ccv.brown.edu/rnag.html

CONTACT

charles_lawrence@brown.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Kiryu H, Terai G, Imamura O, Yoneyama H, Suzuki K, Asai K. A detailed investigation of accessibilities around target sites of siRNAs and miRNAs. ACTA ACUST UNITED AC 2011;27:1788-97. [PMID: 21531769 DOI: 10.1093/bioinformatics/btr276] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Abstract

MOTIVATION

The importance of RNA sequence analysis has been increasing since the discovery of various types of non-coding RNAs transcribed in animal cells. Conventional RNA sequence analyses have mainly focused on structured regions, which are stabilized by the stacking energies acting on adjacent base pairs. On the other hand, recent findings regarding the mechanisms of small interfering RNAs (siRNAs) and transcription regulation by microRNAs (miRNAs) indicate the importance of analyzing accessible regions where no base pairs exist. So far, relatively few studies have investigated the nature of such regions.

RESULTS

We have conducted a detailed investigation of accessibilities around the target sites of siRNAs and miRNAs. We have exhaustively calculated the correlations between the accessibilities around the target sites and the repression levels of the corresponding mRNAs. We have computed the accessibilities with an originally developed software package, called 'Raccess', which computes the accessibility of all the segments of a fixed length for a given RNA sequence when the maximal distance between base pairs is limited to a fixed size W. We show that the computed accessibilities are relatively insensitive to the choice of the maximal span W. We have found that the efficacy of siRNAs depends strongly on the accessibility of the very 3'-end of their binding sites, which might reflect a target site recognition mechanism in the RNA-induced silencing complex. We also show that the efficacy of miRNAs has a similar dependence on the accessibilities, but some miRNAs also show positive correlations between the efficacy and the accessibilities in broad regions downstream of their putative binding sites, which might imply that the downstream regions of the target sites are bound by other proteins that allow the miRNAs to implement their functions. We have also investigated the off-target effects of an siRNA as a potential RNAi therapeutic. We show that the off-target effects of the siRNA have similar correlations to the miRNA repression, indicating that they are caused by the same mechanism.

AVAILABILITY

The C++ source code of the Raccess software is available at http://www.ncrna.org/software/Raccess/ The microarray data on the measurements of the siRNA off-target effects are also available at the same site.

CONTACT

kiryu-h@k.u-tokyo.ac.jp

Collapse

Hamada M, Kiryu H, Iwasaki W, Asai K. Generalized centroid estimators in bioinformatics. PLoS One 2011;6:e16450. [PMID: 21365017 PMCID: PMC3041832 DOI: 10.1371/journal.pone.0016450] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2010] [Accepted: 12/22/2010] [Indexed: 11/27/2022] Open

Clyde MA, Ghosh J, Littman ML. Bayesian Adaptive Sampling for Variable Selection and Model Averaging. J Comput Graph Stat 2011. [DOI: 10.1198/jcgs.2010.09049] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Hamada M, Sato K, Asai K. Prediction of RNA secondary structure by maximizing pseudo-expected accuracy. BMC Bioinformatics 2010;11:586. [PMID: 21118522 PMCID: PMC3003279 DOI: 10.1186/1471-2105-11-586] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2010] [Accepted: 11/30/2010] [Indexed: 12/17/2022] Open

Sheridan P, Kamimura T, Shimodaira H. A scale-free structure prior for graphical models with applications in functional genomics. PLoS One 2010;5:e13580. [PMID: 21079769 PMCID: PMC2974640 DOI: 10.1371/journal.pone.0013580] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2010] [Accepted: 09/28/2010] [Indexed: 11/18/2022] Open

Abstract

The problem of reconstructing large-scale, gene regulatory networks from gene expression data has garnered considerable attention in bioinformatics over the past decade with the graphical modeling paradigm having emerged as a popular framework for inference. Analysis in a full Bayesian setting is contingent upon the assignment of a so-called structure prior—a probability distribution on networks, encoding a priori biological knowledge either in the form of supplemental data or high-level topological features. A key topological consideration is that a wide range of cellular networks are approximately scale-free, meaning that the fraction, , of nodes in a network with degree is roughly described by a power-law with exponent between and . The standard practice, however, is to utilize a random structure prior, which favors networks with binomially distributed degree distributions. In this paper, we introduce a scale-free structure prior for graphical models based on the formula for the probability of a network under a simple scale-free network model. Unlike the random structure prior, its scale-free counterpart requires a node labeling as a parameter. In order to use this prior for large-scale network inference, we design a novel Metropolis-Hastings sampler for graphical models that includes a node labeling as a state space variable. In a simulation study, we demonstrate that the scale-free structure prior outperforms the random structure prior at recovering scale-free networks while at the same time retains the ability to recover random networks. We then estimate a gene association network from gene expression data taken from a breast cancer tumor study, showing that scale-free structure prior recovers hubs, including the previously unknown hub SLC39A6, which is a zinc transporter that has been implicated with the spread of breast cancer to the lymph nodes. Our analysis of the breast cancer expression data underscores the value of the scale-free structure prior as an instrument to aid in the identification of candidate hub genes with the potential to direct the hypotheses of molecular biologists, and thus drive future experiments.

Collapse

Hamada M, Sato K, Asai K. Improving the accuracy of predicting secondary structure for aligned RNA sequences. Nucleic Acids Res 2010;39:393-402. [PMID: 20843778 PMCID: PMC3025558 DOI: 10.1093/nar/gkq792] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Iwasaki W, Takagi T. An intuitive, informative, and most balanced representation of phylogenetic topologies. Syst Biol 2010;59:584-93. [PMID: 20817714 PMCID: PMC2950835 DOI: 10.1093/sysbio/syq044] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open

Stovgaard K, Andreetta C, Ferkinghoff-Borg J, Hamelryck T. Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics 2010;11:429. [PMID: 20718956 PMCID: PMC2931518 DOI: 10.1186/1471-2105-11-429] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2010] [Accepted: 08/18/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Genome sequencing projects have expanded the gap between the amount of known protein sequences and structures. The limitations of current high resolution structure determination methods make it unlikely that this gap will disappear in the near future. Small angle X-ray scattering (SAXS) is an established low resolution method for routinely determining the structure of proteins in solution. The purpose of this study is to develop a method for the efficient calculation of accurate SAXS curves from coarse-grained protein models. Such a method can for example be used to construct a likelihood function, which is paramount for structure determination based on statistical inference.

RESULTS

We present a method for the efficient calculation of accurate SAXS curves based on the Debye formula and a set of scattering form factors for dummy atom representations of amino acids. Such a method avoids the computationally costly iteration over all atoms. We estimated the form factors using generated data from a set of high quality protein structures. No ad hoc scaling or correction factors are applied in the calculation of the curves. Two coarse-grained representations of protein structure were investigated; two scattering bodies per amino acid led to significantly better results than a single scattering body.

CONCLUSION

We show that the obtained point estimates allow the calculation of accurate SAXS curves from coarse-grained protein models. The resulting curves are on par with the current state-of-the-art program CRYSOL, which requires full atomic detail. Our method was also comparable to CRYSOL in recognizing native structures among native-like decoys. As a proof-of-concept, we combined the coarse-grained Debye calculation with a previously described probabilistic model of protein structure, TorusDBN. This resulted in a significant improvement in the decoy recognition performance. In conclusion, the presented method shows great promise for use in statistical inference of protein structures from SAXS data.

Collapse

Fromer M, Yanover C, Linial M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins 2010;78:530-47. [PMID: 19842166 DOI: 10.1002/prot.22575] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Frith MC, Hamada M, Horton P. Parameters for accurate genome alignment. BMC Bioinformatics 2010;11:80. [PMID: 20144198 PMCID: PMC2829014 DOI: 10.1186/1471-2105-11-80] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2009] [Accepted: 02/09/2010] [Indexed: 11/25/2022] Open

Janssen S, Giegerich R. Faster computation of exact RNA shape probabilities. ACTA ACUST UNITED AC 2010;26:632-9. [PMID: 20080511 PMCID: PMC2828121 DOI: 10.1093/bioinformatics/btq014] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]

Distribution of distances between topologies and its effect on detection of phylogenetic recombination. ANN I STAT MATH 2009. [DOI: 10.1007/s10463-009-0259-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Hamada M, Sato K, Kiryu H, Mituyama T, Asai K. CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score. Bioinformatics 2009;25:3236-43. [DOI: 10.1093/bioinformatics/btp580] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open

Hamada M, Sato K, Kiryu H, Mituyama T, Asai K. Predictions of RNA secondary structure by combining homologous sequence information. ACTA ACUST UNITED AC 2009;25:i330-8. [PMID: 19478007 PMCID: PMC2687982 DOI: 10.1093/bioinformatics/btp228] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Sato K, Hamada M, Asai K, Mituyama T. CENTROIDFOLD: a web server for RNA secondary structure prediction. Nucleic Acids Res 2009;37:W277-80. [PMID: 19435882 PMCID: PMC2703931 DOI: 10.1093/nar/gkp367] [Citation(s) in RCA: 198] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open

Tabei Y, Asai K. A local multiple alignment method for detection of non-coding RNA sequences. ACTA ACUST UNITED AC 2009;25:1498-505. [PMID: 19376823 DOI: 10.1093/bioinformatics/btp261] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

MOTIVATION

Non-coding RNAs (ncRNAs) show a unique evolutionary process in which the substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, the multiple alignment method for the detection of ncRNAs should take into account both the primary sequence and the secondary structure. Recently, there has been intense focus on multiple alignment investigations for the detection of ncRNAs; however, most of the proposed methods are designed for global multiple alignments. For this reason, these methods are not appropriate to identify locally conserved ncRNAs among genomic sequences. A more efficient local multiple alignment method for the detection of ncRNAs is required.

RESULTS

We propose a new local multiple alignment method for the detection of ncRNAs. This method uses a local multiple alignment construction procedure inspired by ProDA, which is a local multiple aligner program for protein sequences with repeated and shuffled elements. To align sequences based on secondary structure information, we propose a new alignment model which incorporates secondary structure features. We define the conditional probability of an alignment via a conditional random field and use a gamma-centroid estimator to align sequences. The locally aligned subsequences are clustered into blocks of approximately globally alignable subsequences between pairwise alignments. Finally, these blocks are multiply aligned via MXSCARNA. In benchmark experiments, we demonstrate the high ability of the implemented software, SCARNA_LM, for local multiple alignment for the detection of ncRNAs.

AVAILABILITY

The C++ source code for SCARNA_LM and its experimental datasets are available at http://www.ncrna.org/software/scarna_lm/download.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Advances in RNA structure prediction from sequence: new tools for generating hypotheses about viral RNA structure-function relationships. J Virol 2009;83:6326-34. [PMID: 19369331 DOI: 10.1128/jvi.00251-09] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Newberg LA, Lawrence CE. Exact calculation of distributions on integers, with application to sequence alignment. J Comput Biol 2009;16:1-18. [PMID: 19119992 DOI: 10.1089/cmb.2008.0137] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Joshi A, De Smet R, Marchal K, Van de Peer Y, Michoel T. Module networks revisited: computational assessment and prioritization of model predictions. Bioinformatics 2009;25:490-6. [PMID: 19136553 DOI: 10.1093/bioinformatics/btn658] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

Morita K, Saito Y, Sato K, Oka K, Hotta K, Sakakibara Y. Genome-wide searching with base-pairing kernel functions for noncoding RNAs: computational and expression analysis of snoRNA families in Caenorhabditis elegans. Nucleic Acids Res 2009;37:999-1009. [PMID: 19129214 PMCID: PMC2647286 DOI: 10.1093/nar/gkn1054] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Hamada M, Kiryu H, Sato K, Mituyama T, Asai K. Prediction of RNA secondary structure using generalized centroid estimators. ACTA ACUST UNITED AC 2008;25:465-73. [PMID: 19095700 DOI: 10.1093/bioinformatics/btn601] [Citation(s) in RCA: 157] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]