1
Fraga KJ, Huang YJ, Ramelot TA, Swapna GVT, Lashawn Anak Kendary A, Li E, Korf I, Montelione GT. SpecDB: A relational database for archiving biomolecular NMR spectral data. Journal of Magnetic Resonance (San Diego, Calif. : 1997) 2022; 342:107268. [PMID: 35930941 PMCID: PMC9922030 DOI: 10.1016/j.jmr.2022.107268] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/25/2022] [Revised: 06/16/2022] [Accepted: 07/06/2022] [Indexed: 05/11/2023]
Abstract
NMR is a valuable experimental tool in the structural biologist's toolkit to elucidate the structures, functions, and motions of biomolecules. The progress of machine learning, particularly in structural biology, reveals the critical importance of large, diverse, and reliable datasets in developing new methods and understanding in structural biology and science more broadly. Biomolecular NMR research groups produce large amounts of data, and there is renewed interest in organizing these data to train new, sophisticated machine learning architectures and to improve biomolecular NMR analysis pipelines. The foundational data type in NMR is the free-induction decay (FID). There are opportunities to build sophisticated machine learning methods to tackle long-standing problems in NMR data processing, resonance assignment, dynamics analysis, and structure determination using NMR FIDs. Our goal in this study is to provide a lightweight, broadly available tool for archiving FID data as it is generated at the spectrometer, and grow a new resource of FID data and associated metadata. This study presents a relational schema for storing and organizing the metadata items that describe an NMR sample and FID data, which we call Spectral Database (SpecDB). SpecDB is implemented in SQLite and includes a Python software library providing a command-line application to create, organize, query, backup, share, and maintain the database. This set of software tools and database schema allow users to store, organize, share, and learn from NMR time domain data. SpecDB is freely available under an open source license at https://github.rpi.edu/RPIBioinformatics/SpecDB.
Affiliation(s)
- Keith J Fraga
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
- Yuanpeng J Huang
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Theresa A Ramelot
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- G V T Swapna
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA; Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
- Ethan Li
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Ian Korf
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
- Gaetano T Montelione
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
2
Su Y, Sarell CJ, Eddy MT, Debelouchina GT, Andreas LB, Pashley CL, Radford SE, Griffin RG. Secondary structure in the core of amyloid fibrils formed from human β₂m and its truncated variant ΔN6. J Am Chem Soc 2014; 136:6313-25. [PMID: 24679070 PMCID: PMC4017606 DOI: 10.1021/ja4126092] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 01/29/2023]
Abstract
Amyloid fibrils formed from initially soluble proteins with diverse sequences are associated with an array of human diseases. In the human disorder dialysis-related amyloidosis (DRA), fibrils contain two major constituents, full-length human β2-microglobulin (hβ2m) and a truncation variant, ΔN6, which lacks the N-terminal six amino acids. These fibrils are assembled from initially natively folded proteins with an all antiparallel β-stranded structure. Here, backbone conformations of wild-type hβ2m and ΔN6 in their amyloid forms have been determined using a combination of dilute isotopic labeling strategies and multidimensional magic angle spinning (MAS) NMR techniques at high magnetic fields, providing valuable structural information at the atomic level about the fibril architecture. The secondary structures of both fibril types, determined by the assignment of ∼80% of the backbone resonances of these 100- and 94-residue proteins, respectively, reveal substantial backbone rearrangement compared with the location of β-strands in their native immunoglobulin folds. The identification of seven β-strands in hβ2m fibrils indicates that approximately 70 residues are in a β-strand conformation in the fibril core. By contrast, nine β-strands comprise the fibrils formed from ΔN6, indicating a more extensive core. The precise location and length of β-strands in the two fibril forms also differ. The results indicate fibrils of ΔN6 and hβ2m have an extensive core architecture involving the majority of residues in the polypeptide sequence. The common elements of the backbone structure of the two proteins likely facilitate their ability to copolymerize during amyloid fibril assembly.
Affiliation(s)
- Yongchao Su
- Department of Chemistry and Francis Bitter Magnet Laboratory, Massachusetts Institute of Technology Cambridge, Massachusetts 02139, United States
3
Tejero R, Snyder D, Mao B, Aramini JM, Montelione GT. PDBStat: a universal restraint converter and restraint analysis software package for protein NMR. Journal of Biomolecular NMR 2013; 56:337-51. [PMID: 23897031 PMCID: PMC3932191 DOI: 10.1007/s10858-013-9753-7] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 04/22/2013] [Accepted: 06/11/2013] [Indexed: 05/20/2023]
Abstract
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.
Affiliation(s)
- Roberto Tejero
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, and Northeast Structural Genomics Consortium, 679 Hoes Lane, Piscataway, New Jersey, 08854, USA
- Departamento de Química Física, Universidad de Valencia, Avenida Dr. Moliner 50, 46100 Burjassot, Valencia, Spain
- David Snyder
- Department of Chemistry, William Paterson University, 300 Pompton Road, Wayne, New Jersey 07470, USA
- Binchen Mao
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, and Northeast Structural Genomics Consortium, 679 Hoes Lane, Piscataway, New Jersey, 08854, USA
- James M. Aramini
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, and Northeast Structural Genomics Consortium, 679 Hoes Lane, Piscataway, New Jersey, 08854, USA
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, and Northeast Structural Genomics Consortium, 679 Hoes Lane, Piscataway, New Jersey, 08854, USA
- To whom correspondence should be addressed: Prof. Gaetano T. Montelione, CABM, Rutgers University, 679 Hoes Lane, Piscataway, NJ 08854-5638. Phone: 732-235-5321
4
Kobayashi N, Harano Y, Tochio N, Nakatani E, Kigawa T, Yokoyama S, Mading S, Ulrich EL, Markley JL, Akutsu H, Fujiwara T. An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database. Journal of Biomolecular NMR 2012; 53:311-320. [PMID: 22689068 PMCID: PMC4308039 DOI: 10.1007/s10858-012-9641-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 05/22/2012] [Accepted: 05/23/2012] [Indexed: 05/30/2023]
Abstract
Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including ¹H, ¹³C and ¹⁵N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001 to 2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.
Affiliation(s)
- Naohiro Kobayashi
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, 565-0871 Osaka, Japan.
5
Gifford LK, Carter LG, Gabanyi MJ, Berman HM, Adams PD. The Protein Structure Initiative Structural Biology Knowledgebase Technology Portal: a structural biology web resource. Journal of Structural and Functional Genomics 2012; 13:57-62. [PMID: 22527514 PMCID: PMC3588887 DOI: 10.1007/s10969-012-9133-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 10/18/2011] [Accepted: 03/05/2012] [Indexed: 02/01/2023]
Abstract
The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB (http://sbkb.org), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.
Affiliation(s)
- Lida K. Gifford
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Margaret J. Gabanyi
- Department of Chemistry & Chemical Biology, Rutgers – The State University of New Jersey, Piscataway, NJ 08854
- Helen M. Berman
- Department of Chemistry & Chemical Biology, Rutgers – The State University of New Jersey, Piscataway, NJ 08854
- Paul D. Adams
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
6
Farrell D, O'Meara F, Johnston M, Bradley J, Søndergaard CR, Georgi N, Webb H, Tynan-Connolly BM, Bjarnadottir U, Carstensen T, Nielsen JE. Capturing, sharing and analysing biophysical data from protein engineering and protein characterization studies. Nucleic Acids Res 2010; 38:e186. [PMID: 20724439 PMCID: PMC2978379 DOI: 10.1093/nar/gkq726] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Large amounts of data are being generated annually on the connection between the sequence, structure and function of proteins using site-directed mutagenesis, protein design and directed evolution techniques. These data provide the fundamental building blocks for our understanding of protein function, molecular biology and living organisms in general. However, much experimental data are never deposited in databases and are thus 'lost' in journal publications or in PhD theses. At the same time, theoretical scientists are in need of large amounts of experimental data for benchmarking and calibrating novel predictive algorithms, and theoretical progress is therefore often hampered by the lack of suitable data to validate or disprove a theoretical assumption. We present PEAT (Protein Engineering Analysis Tool), an application that integrates data deposition, storage and analysis for researchers carrying out protein engineering projects or biophysical characterization of proteins. PEAT contains modules for DNA sequence manipulation, primer design, fitting of biophysical characterization data (enzyme kinetics, circular dichroism spectroscopy, NMR titration data, etc.), and facilitates sharing of experimental data and analyses for a typical university-based research group. PEAT is freely available to academic researchers at http://enzyme.ucd.ie/PEAT.
Affiliation(s)
- Damien Farrell
- Centre for Synthesis and Chemical Biology, School of Biomolecular and Biomedical Science, UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
7
Tramesel D, Catherinot V, Delsuc MA. Modeling of NMR processing, toward efficient unattended processing of NMR experiments. Journal of Magnetic Resonance (San Diego, Calif. : 1997) 2007; 188:56-67. [PMID: 17616410 DOI: 10.1016/j.jmr.2007.05.023] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 12/10/2006] [Revised: 04/27/2007] [Accepted: 05/09/2007] [Indexed: 05/16/2023]
Abstract
Many alternative processing techniques have recently been proposed in the literature. Because most of these techniques rely on specific acquisition protocols as well as on specific data processing techniques, an efficient, versatile, and expandable NMR processing tool would be a particularly timely addition to the modern NMR spectroscopy laboratory. The work presented here consists of a modeling of the various possible NMR data processing approaches. This modeling presents a common working frame for most of the modern acquisition/processing protocols. Two different data modeling approaches are presented, strong modeling and weak modeling, depending on whether the system under study or the measurement is modeled. The emphasis is placed on the weak modeling approach. This modeling is implemented in a computer program developed in Python and called NPK (standing for NMR Processing Kernel), organized in four logical layers: (i) mathematical kernel; (ii) elementary actions; (iii) processing phases; (iv) processing strategies. This organisation, along with default values for most processing parameters, allows the use of the program in an unattended manner, producing close to optimal spectra. Examples are shown for 1D and 2D processing, and liquid and solid NMR spectroscopy. NPK is available from the site: http://abcis.cbs.cnrs.fr/NPK.
Affiliation(s)
- Dominique Tramesel
- Centre de Biochimie Structurale, 29 rue de Navacelles, CNRS UMR5048, INSERM U554, Université Montpellier 1 & 2, F34090 Montpellier, France
8
Yin C, Khan JA, Swapna GVT, Ertekin A, Krug RM, Tong L, Montelione GT. Conserved surface features form the double-stranded RNA binding site of non-structural protein 1 (NS1) from influenza A and B viruses. J Biol Chem 2007; 282:20584-92. [PMID: 17475623 DOI: 10.1074/jbc.m611619200] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.1] [Indexed: 11/06/2022]
Abstract
Influenza A viruses cause a highly contagious respiratory disease in humans and are responsible for periodic widespread epidemics with high mortality rates. The influenza A virus NS1 protein (NS1A) plays a key role in countering host antiviral defense and in virulence. The 73-residue N-terminal domain of NS1A (NS1A-(1-73)) forms a symmetric homodimer with a unique six-helical chain fold. It binds canonical A-form double-stranded RNA (dsRNA). Mutational inactivation of this dsRNA binding activity of NS1A highly attenuates virus replication. Here, we have characterized the unique structural features of the dsRNA binding surface of NS1A-(1-73) using NMR methods and describe the 2.1-Å X-ray crystal structure of the corresponding dsRNA binding domain from human influenza B virus NS1B-(15-93). These results identify conserved dsRNA binding surfaces on both NS1A-(1-73) and NS1B-(15-93) that are very different from those indicated in earlier "working models" of the complex between dsRNA and NS1A-(1-73). The combined NMR and crystallographic data reveal highly conserved surface tracks of basic and hydrophilic residues that interact with dsRNA. These tracks are structurally complementary to the polyphosphate backbone conformation of A-form dsRNA and run at an approximately 45° angle relative to the axes of helices α2/α2′. At the center of this dsRNA binding epitope, and common to NS1 proteins from influenza A and B viruses, is a deep pocket that includes both hydrophilic and hydrophobic amino acids. This pocket provides a target on the surface of the NS1 protein that is potentially suitable for the development of antiviral drugs targeting both influenza A and B viruses.
MESH Headings
- Crystallography, X-Ray
- Dimerization
- Humans
- Influenza A virus/chemistry
- Influenza A virus/metabolism
- Influenza A virus/pathogenicity
- Influenza B virus/chemistry
- Influenza B virus/metabolism
- Influenza B virus/pathogenicity
- Influenza, Human/metabolism
- Influenza, Human/mortality
- Nuclear Magnetic Resonance, Biomolecular
- Nucleic Acid Conformation
- Protein Binding
- Protein Folding
- Protein Structure, Quaternary
- Protein Structure, Secondary
- RNA, Double-Stranded/chemistry
- RNA, Double-Stranded/metabolism
- RNA, Viral/chemistry
- RNA, Viral/metabolism
- Viral Nonstructural Proteins/chemistry
- Viral Nonstructural Proteins/metabolism
Affiliation(s)
- Cuifeng Yin
- Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ 08854, USA