1
|
ElasticBLAST: accelerating sequence search via cloud computing. BMC Bioinformatics 2023; 24:117. [PMID: 36967390 PMCID: PMC10040096 DOI: 10.1186/s12859-023-05245-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/04/2023] [Accepted: 03/21/2023] [Indexed: 03/28/2023]
Abstract
BACKGROUND Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. RESULTS We present ElasticBLAST, a cloud native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. CONCLUSION We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, demonstrating that with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the threshold to move work to the cloud.
|
2
|
ElasticBLAST: Accelerating Sequence Search via Cloud Computing. bioRxiv 2023:2023.01.04.522777. [PMID: 36789435 PMCID: PMC9928022 DOI: 10.1101/2023.01.04.522777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
|
3
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2021; 49:D10-D17. [PMID: 33095870 PMCID: PMC7778943 DOI: 10.1093/nar/gkaa892] [Citation(s) in RCA: 410] [Impact Index Per Article: 136.7] [Received: 09/15/2020] [Revised: 09/25/2020] [Accepted: 10/08/2020] [Indexed: 11/14/2022]
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface and NCBI datasets. Additional resources that were updated in the past year include PMC, Bookshelf, Genome Data Viewer, SRA, ClinVar, dbSNP, dbVar, Pathogen Detection, BLAST, Primer-BLAST, IgBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
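As a rough illustration of what "programming interface" means for Entrez, the sketch below builds an E-utilities ESearch request URL with Python's standard library. The base address and the db, term, retmode and retmax parameters are taken from the public E-utilities documentation as understood here and should be checked against it; the helper name build_esearch_url and the example query are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the E-utilities service (assumed from NCBI's public documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for `term` in Entrez database `db` (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    # urlencode percent-escapes characters such as the square brackets in field tags.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query: PubMed records with "BLAST" in the title, at most five IDs.
url = build_esearch_url("pubmed", "BLAST[Title]", retmax=5)
print(url)
```

Fetching the URL and parsing the response is left out on purpose, so the sketch stays runnable offline.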
|
4
|
Reply to the paper: Misunderstood parameters of NCBI BLAST impacts the correctness of bioinformatics workflows. Bioinformatics 2020; 35:2699-2700. [PMID: 30590429 DOI: 10.1093/bioinformatics/bty1026] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/30/2018] [Accepted: 12/19/2018] [Indexed: 11/14/2022]
|
5
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2020; 48:D9-D16. [PMID: 31602479 DOI: 10.1093/nar/gkz899] [Citation(s) in RCA: 267] [Impact Index Per Article: 66.8] [Received: 09/16/2019] [Accepted: 10/09/2019] [Indexed: 11/14/2022]
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
6
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2020; 47:D23-D28. [PMID: 30395293 PMCID: PMC6323993 DOI: 10.1093/nar/gky1069] [Citation(s) in RCA: 331] [Impact Index Per Article: 82.8] [Received: 09/19/2018] [Accepted: 10/18/2018] [Indexed: 11/16/2022]
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, Genome Data Viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, IgBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
7
|
Magic-BLAST, an accurate RNA-seq aligner for long and short reads. BMC Bioinformatics 2019; 20:405. [PMID: 31345161 PMCID: PMC6659269 DOI: 10.1186/s12859-019-2996-x] [Citation(s) in RCA: 147] [Impact Index Per Article: 29.4] [Received: 03/21/2019] [Accepted: 07/16/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. RESULTS Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the performance of Magic-BLAST to accurately map short or long sequences and its ability to discover introns on real RNA-seq data sets from PacBio, Roche and Illumina runs, and on six benchmarks, and compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome. CONCLUSIONS We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.
|
8
|
Abstract
The variable domain of an immunoglobulin (IG) sequence is encoded by multiple genes, including the variable (V) gene, the diversity (D) gene and the joining (J) gene. Analysis of IG sequences typically requires identification of each gene, as well as a comparison of sequence variations in the context of defined regions. General purpose tools, such as the BLAST program, have only limited use for such tasks, as the rearranged nature of an IG sequence and the variable length of each gene require multiple rounds of BLAST searches for a single IG sequence. Additionally, manual assembly of different genes is difficult and error-prone. To address these issues and to facilitate other common tasks in analysing IG sequences, we have developed the sequence analysis tool IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/). With this tool, users can view the matches to the germline V, D and J genes, details at rearrangement junctions, and the delineation of IG V domain framework regions and complementarity determining regions. IgBLAST has the capability to analyse nucleotide and protein sequences and can process sequences in batches. Furthermore, IgBLAST allows searches against the germline gene databases and other sequence databases simultaneously to minimize the chance of missing the best-matching germline V gene.
|
9
|
Abstract
The Basic Local Alignment Search Tool (BLAST) website at the National Center for Biotechnology Information (NCBI) is an important resource for searching and aligning sequences. A new BLAST report allows faster loading of alignments, adds navigation aids, allows easy downloading of subject sequences and reports and has improved usability. Here, we describe these improvements to the BLAST report, discuss design decisions, describe other improvements to the search page and database documentation and outline plans for future development. The NCBI BLAST URL is http://blast.ncbi.nlm.nih.gov.
|
10
|
Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 2012; 13:134. [PMID: 22708584 PMCID: PMC3412702 DOI: 10.1186/1471-2105-13-134] [Citation(s) in RCA: 3443] [Impact Index Per Article: 286.9] [Received: 01/10/2012] [Accepted: 06/18/2012] [Indexed: 02/07/2023]
Abstract
BACKGROUND Choosing appropriate primers is probably the single most important factor affecting the polymerase chain reaction (PCR). Specific amplification of the intended target requires that primers do not have matches to other targets in certain orientations and within certain distances that allow undesired amplification. The process of designing specific primers typically involves two stages. First, the primers flanking regions of interest are generated either manually or using software tools; then they are searched against an appropriate nucleotide sequence database using tools such as BLAST to examine the potential targets. However, the latter is not an easy process as one needs to examine many details between primers and targets, such as the number and the positions of matched bases, the primer orientations and distance between forward and reverse primers. The complexity of such analysis usually makes this a time-consuming and very difficult task for users, especially when the primers have a large number of hits. Furthermore, although the BLAST program has been widely used for primer target detection, it is in fact not an ideal tool for this purpose as BLAST is a local alignment algorithm and does not necessarily return complete match information over the entire primer range. RESULTS We present a new software tool called Primer-BLAST to alleviate the difficulty in designing target-specific primers. This tool combines BLAST with a global alignment algorithm to ensure a full primer-target alignment and is sensitive enough to detect targets that have a significant number of mismatches to primers. Primer-BLAST allows users to design new target-specific primers in one step as well as to check the specificity of pre-existing primers. Primer-BLAST also supports placing primers based on exon/intron locations and excluding single nucleotide polymorphism (SNP) sites in primers. 
CONCLUSIONS We describe a robust and fully implemented general purpose primer design tool that designs target-specific PCR primers. Primer-BLAST offers flexible options to adjust the specificity threshold and other primer properties. This tool is publicly available at http://www.ncbi.nlm.nih.gov/tools/primer-blast.
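The specificity check described in the abstract (primer orientation, mismatch counts, and the distance between forward and reverse primers) can be sketched in a few lines. This is a deliberately simplified illustration, not Primer-BLAST's algorithm: the Hit record, the potential_amplicons function, the thresholds and the toy hits are all hypothetical, and cases such as one primer acting as both forward and reverse primer are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One primer match on a database sequence (hypothetical record layout)."""
    target: str       # name of the sequence that was hit
    start: int        # 0-based start of the match on the target
    end: int          # end of the match (exclusive)
    strand: str       # "+" or "-"
    mismatches: int   # mismatches between primer and target


def potential_amplicons(forward_hits, reverse_hits, max_size=1000, max_mismatches=2):
    """Pair plus-strand and minus-strand hits that could prime a PCR product."""
    products = []
    for f in forward_hits:
        for r in reverse_hits:
            if f.target != r.target:
                continue  # primers must bind the same template
            if f.strand != "+" or r.strand != "-":
                continue  # primers must face each other
            if f.end > r.start:
                continue  # the reverse primer must lie downstream of the forward primer
            if max(f.mismatches, r.mismatches) > max_mismatches:
                continue  # too many mismatches to expect priming (assumed cutoff)
            size = r.end - f.start
            if size <= max_size:
                products.append((f.target, f.start, r.end, size))
    return products


forward = [Hit("geneA", 100, 120, "+", 0), Hit("geneB", 50, 70, "+", 1)]
reverse = [Hit("geneA", 380, 400, "-", 0), Hit("geneB", 5000, 5020, "-", 0)]
print(potential_amplicons(forward, reverse))
```

In this toy input only geneA yields a product; the geneB hits are too far apart to amplify under the assumed size limit.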
|
11
|
New finite-size correction for local alignment score distributions. BMC Res Notes 2012; 5:286. [PMID: 22691307 PMCID: PMC3483159 DOI: 10.1186/1756-0500-5-286] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/30/2012] [Accepted: 05/16/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Local alignment programs often calculate the probability that a match occurred by chance. The calculation of this probability may require a "finite-size" correction to the lengths of the sequences, as an alignment that starts near the end of either sequence may run out of sequence before achieving a significant score. FINDINGS We present an improved finite-size correction that considers the distribution of sequence lengths rather than simply the corresponding means. This approach improves sensitivity and avoids substituting an ad hoc length for short sequences that can underestimate the significance of a match. We use a test set derived from ASTRAL to show improved ROC scores, especially for shorter sequences. CONCLUSIONS The new finite-size correction improves the calculation of probabilities for a local alignment. It is now used in the BLAST+ package and at the NCBI BLAST web site (http://blast.ncbi.nlm.nih.gov).
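For context, the sketch below shows why a length correction matters, using the standard Karlin-Altschul E-value, E = K·m·n·exp(-λS), together with the older mean-based correction that subtracts an expected alignment length ln(K·m·n)/H from each sequence length. It does not implement the distribution-based correction this paper introduces. The parameter values are illustrative assumptions, and the floor of 1 on the corrected lengths is a choice made for this sketch only.

```python
import math


def evalue(score, m, n, lam, K):
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * score)


def mean_length_correction(m, n, K, H):
    """Older-style correction: shorten both lengths by ln(K*m*n) / H."""
    ell = math.log(K * m * n) / H
    # Floor at 1 so very short sequences keep a positive length (assumption of this sketch).
    return max(m - ell, 1.0), max(n - ell, 1.0)


# Illustrative statistical parameters only; real values depend on the scoring system.
lam, K, H = 0.267, 0.041, 0.14
m, n, S = 250, 1_000_000, 100   # query length, database length, raw score

raw = evalue(S, m, n, lam, K)
m_eff, n_eff = mean_length_correction(m, n, K, H)
corrected = evalue(S, m_eff, n_eff, lam, K)
print(f"uncorrected E = {raw:.3g}, corrected E = {corrected:.3g}")
```

Because the effective lengths are shorter than the raw ones, the corrected E-value is smaller, which is the sense in which the choice of correction affects the reported significance of a match.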
|
12
|
Abstract
Background BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch. Results We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST. Conclusions DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at http://blast.ncbi.nlm.nih.gov. Reviewers This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.
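To make the idea of a position-specific score matrix concrete, here is a minimal sketch that turns a small set of aligned matches into per-column log-odds scores. It assumes a uniform background distribution and a fixed pseudocount, both simplifications chosen for this example; PSI-BLAST and DELTA-BLAST use sequence weighting and more careful pseudocount and background models. The names build_pssm and score and the toy alignment are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(ALPHABET)  # uniform background, a simplification
    pssm = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c in ALPHABET)  # gap characters are skipped
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores of an ungapped segment aligned to the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


matches = ["MKVL", "MRVL", "MKIL", "MK-L"]  # toy matches from a hypothetical search round
pssm = build_pssm(matches)
print(round(score(pssm, "MKVL"), 2), round(score(pssm, "WWWW"), 2))
```

A segment resembling the aligned matches scores well above an unrelated one, which is the property an iterated search exploits in its next round.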
|
13
|
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
14
|
Exclusion of Spherical Particles from the Nematic Phase of Reversibly Assembled Rod-Like Particles. ACTA ACUST UNITED AC 2011. [DOI: 10.1557/proc-248-95] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
The open-ended aggregation of amphiphilic molecules in aqueous solution generates a broadly polydisperse population of elongated particles that form a variety of partially ordered phases. Herzfeld and coworkers have shown that the phase behavior of these binary systems is well described by self-consistently combining scaled-particle theory for the effects of excluded volume in fluid dimensions, a simple cell model for the effects of excluded volume in positionally ordered dimensions, a mean-field treatment of soft-interactions, and a phenomenological model of aggregate formation. We have now extended this model to ternary systems. We find that the addition of spherical particles to a solution of rod-forming particles induces a very wide isotropic-nematic coexistence region in which a relatively dilute isotropic solution with little aggregation separates from a rather concentrated nematic solution that almost completely excludes the spherical solutes. The magnitude of this effect depends on the relative diameters of the two solutes.
|
15
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2011; 39:D38-51. [PMID: 21097890 PMCID: PMC3013733 DOI: 10.1093/nar/gkq1172] [Citation(s) in RCA: 475] [Impact Index Per Article: 36.5] [Received: 09/16/2010] [Revised: 10/29/2010] [Accepted: 11/01/2010] [Indexed: 12/03/2022]
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
16
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2010; 38:D5-16. [PMID: 19910364 PMCID: PMC2808881 DOI: 10.1093/nar/gkp967] [Citation(s) in RCA: 374] [Impact Index Per Article: 26.7] [Received: 09/15/2009] [Revised: 10/06/2009] [Accepted: 10/13/2009] [Indexed: 12/23/2022]
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
17
|
Abstract
BACKGROUND Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. RESULTS We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. CONCLUSION The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
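The statement that long queries are broken into chunks can be pictured with the sketch below, which splits a sequence into fixed-size overlapping pieces and records each piece's offset so that hits can be mapped back to query coordinates. The chunk size, the overlap and the function name split_query are assumptions made for illustration; the abstract does not state the values or the merging logic that the rewritten BLAST software uses.

```python
def split_query(sequence, chunk_size, overlap):
    """Yield (offset, chunk) pairs covering `sequence` with overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(sequence), step):
        yield start, sequence[start:start + chunk_size]
        if start + chunk_size >= len(sequence):
            break  # this chunk already reaches the end of the query


query = "ACGT" * 25  # a 100-base toy query
chunks = list(split_query(query, chunk_size=40, overlap=10))
print([(offset, len(chunk)) for offset, chunk in chunks])
```

The overlap is there so that an alignment spanning a chunk boundary is still fully contained in at least one chunk, provided it is no longer than the overlap.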
|
19
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2009; 37:D5-15. [PMID: 18940862 PMCID: PMC2686545 DOI: 10.1093/nar/gkn741] [Citation(s) in RCA: 654] [Impact Index Per Article: 43.6] [Received: 09/16/2008] [Revised: 10/01/2008] [Accepted: 10/02/2008] [Indexed: 11/13/2022]
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
20
|
Abstract
Motivation: The BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production usage of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA that are designed to find matches when query and database subsequences are highly similar. Results: We developed a new version of the MegaBLAST module of BLAST that does the initial phase of finding short seeds for matches by searching a database index. We also developed a program makembindex that preprocesses the database into a data structure for rapid seed searching. We show that the new ‘indexed MegaBLAST’ is faster than the ‘non-indexed’ version for most practical uses. We show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI's Web BLAST service, the storage of databases and the queueing mechanism were modified, so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases. Availability: The code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007.
The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast Contact: schaffer@helix.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.
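The contrast between preprocessing the query and preprocessing the database can be illustrated with a toy k-mer index: the database is scanned once to record where every k-mer occurs, and each query is then answered by dictionary lookups instead of a database scan. This is only a sketch of the general idea; the data structure built by makembindex is not described in the abstract, and the function names, the word size k=5 and the toy sequences here are assumptions.

```python
from collections import defaultdict


def build_index(database, k):
    """Map every k-mer to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seeds(query, index, k):
    """Return (query position, sequence id, subject position) for exact k-mer hits."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict during lookup.
        for seq_id, spos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, seq_id, spos))
    return seeds


database = {"chr_toy": "TTGACCATGGCATTA", "contig_toy": "GGCATGACC"}
index = build_index(database, k=5)   # done once, ahead of any query
print(find_seeds("CCATGGC", index, k=5))
```

Seeds that fall on the same diagonal (equal difference between subject and query position) would then be merged and extended into alignments, a step omitted here.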
|
21
|
Abstract
Basic Local Alignment Search Tool (BLAST) is a sequence similarity search program. The public interface of BLAST, http://www.ncbi.nlm.nih.gov/blast, at the NCBI website has recently been reengineered to improve usability and performance. Key new features include simplified search forms, improved navigation, a list of recent BLAST results, saved search strategies and a documentation directory. Here, we describe the BLAST web application's new features, explain design decisions and outline plans for future improvement.
|
22
|
Abstract
Database sequence similarity searching is carried out thousands of times each day by researchers worldwide and has become a very valuable tool. Over the years, a number of algorithms have been implemented to facilitate database searching. The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows searches to be done quickly and easily, but with sensitive, yet rigorous statistical expectations. In this unit, which is a completely new version of its predecessor of the same title, the user learns how to access the databases, determine the correct searching strategies, and apply examples of BLAST searches to his or her own data.
|
23
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2008; 36:D13-21. [PMID: 18045790 PMCID: PMC2238880 DOI: 10.1093/nar/gkm1000] [Citation(s) in RCA: 608] [Impact Index Per Article: 38.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2007] [Revised: 10/19/2007] [Accepted: 10/22/2007] [Indexed: 12/21/2022] Open
Abstract
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
24
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2007; 35:D5-12. [PMID: 17170002 PMCID: PMC1781113 DOI: 10.1093/nar/gkl1031] [Citation(s) in RCA: 626] [Impact Index Per Article: 36.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2006] [Revised: 10/16/2006] [Accepted: 10/17/2006] [Indexed: 01/12/2023] Open
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
25
|
Abstract
Basic local alignment search tool (BLAST) is a sequence similarity search program. The National Center for Biotechnology Information (NCBI) maintains a BLAST server with a home page at . We report here on recent enhancements to the results produced by the BLAST server at the NCBI. These include features to highlight mismatches between similar sequences, show where the query was masked for low-complexity sequence, and integrate information about the database sequences from the NCBI Entrez system into the BLAST display. Changes to how the database sequences are fetched have also improved the speed of the report generator.
|
26
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2006; 34:D173-80. [PMID: 16381840 PMCID: PMC1347520 DOI: 10.1093/nar/gkj158] [Citation(s) in RCA: 396] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2005] [Revised: 10/03/2005] [Accepted: 10/31/2005] [Indexed: 12/31/2022] Open
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1, Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
|
27
|
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data retrieval systems and computational resources for the analysis of data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, Entrez Programming Utilities, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov.
|
28
|
Abstract
Basic Local Alignment Search Tool (BLAST) is one of the most heavily used sequence analysis tools available in the public domain. There is now a wide choice of BLAST algorithms that can be used to search many different sequence databases via the BLAST web pages (http://www.ncbi.nlm.nih.gov/BLAST/). All the algorithm-database combinations can be executed with default parameters or with customized settings, and the results can be viewed in a variety of ways. A new online resource, the BLAST Program Selection Guide, has been created to assist in the definition of search strategies. This article discusses optimal search strategies and highlights some BLAST features that can make your searches more powerful.
|
29
|
Abstract
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
|
30
|
Abstract
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, Reference Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
|
31
|
Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res 2002; 30:13-6. [PMID: 11752242 PMCID: PMC99094 DOI: 10.1093/nar/30.1.13] [Citation(s) in RCA: 154] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov.
|
32
|
Development of a simplified, sensitive high-performance liquid chromatographic method using fluorescence detection to determine the concentration of UCN-01 in human plasma. JOURNAL OF CHROMATOGRAPHY. B, BIOMEDICAL SCIENCES AND APPLICATIONS 2001; 760:247-53. [PMID: 11530983 DOI: 10.1016/s0378-4347(01)00276-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
UCN-01 is a naturally derived anticancer agent isolated in the culture broth of actinomyces streptomyces. We have developed a sensitive high-performance liquid chromatographic method for the determination of UCN-01 in human plasma. UCN-01 was isolated from human plasma after intravenous administration, by using 100% ice-cold acetonitrile liquid-liquid phase extraction. Liquid chromatographic separation was achieved by isocratic elution on a phenyl analytical column. The mobile phase consisted of acetonitrile-0.5 M ammonium acetate (45:55) with 0.2% triethylamine added as a modifier. The UCN-01 peak was identified from other peaks using fluorescence excitation energy and emission energy wavelengths of 310 and 410 nm, respectively. Retention time for UCN-01 was 4.2 +/- 0.5 min. The UCN-01 peak was baseline resolved, with nearest peak at 2.6 min distance. No interfering peaks were observed at the retention time of UCN-01. Peak area amounts from extracted samples were proportional over the dynamic concentration range used: 0.2 to 30 microg/ml. Mean recoveries of UCN-01 at concentrations of 0.5 and 25 microg/ml were 89 and 90.2%, respectively. Relative standard deviations for UCN-01 calibration standards ranged from 1.89 to 2.31%, with relative errors ranging from 0.3 to 11.6%. Assay precision for UCN-01 based on quality control samples of 0.50 microg/ml was +/- 4.86% with an accuracy of +/-5.7%. For drug extracted from plasma the lowest limit of detection was 0.1 microg/ml, with the lowest limit of quantitation being 0.2 microg/ml. This method is suitable for routine analysis of UCN-01 in human plasma at concentration from 0.2 to 30 microg/ml.
|
33
|
Development of a high-performance liquid chromatographic method to determine the concentration of karenitecin, a novel highly lipophilic camptothecin derivative, in human plasma and urine. JOURNAL OF CHROMATOGRAPHY. B, BIOMEDICAL SCIENCES AND APPLICATIONS 2001; 759:117-24. [PMID: 11499615 DOI: 10.1016/s0378-4347(01)00206-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Karenitecin is a novel, highly lipophilic camptothecin derivative with potent anticancer potential. We have developed a sensitive high-performance liquid chromatographic method for the determination of karenitecin concentration in human plasma and urine. Karenitecin was isolated from human plasma and urine using solid-phase extraction. Separation was achieved by gradient elution, using a water and acetonitrile mobile phase, on an ODS analytical column. Karenitecin was detected using fluorescence detection at excitation and emission wavelengths of 370 and 490 nm, respectively. Retention time for karenitecin was 16.2 +/- 0.5 min and 8.0 +/- 0.2 min for camptothecin, the internal standard. The karenitecin peak was baseline resolved, with the nearest peak at 3.1 min distance. Using normal volunteer plasma and urine from multiple individuals, as well as samples from the 50 patients analyzed to date, no interfering peaks were detected. Inter- and intra-day coefficients of variance were <4.4 and 7.1% for plasma and <4.9 and 11.6% for urine. Assay precision, based on an extracted karenitecin standard plasma sample of 2.5 ng/ml, was +4.46% with a mean accuracy of 92.4%. For extracted karenitecin standard urine samples of 2.5 ng/ml assay precision was +2.35% with a mean accuracy of 99.5%. The mean recovery of karenitecin, at plasma concentrations of 1.0 and 50 ng/ml, was 81.9 and 87.8% respectively. In urine, at concentrations of 1.5 and 50 ng/ml, the mean recoveries were 90.3 and 78.4% respectively. The lower limit of detection (LLD) for karenitecin was 0.5 ng/ml in plasma and 1.0 ng/ml in urine. The lower limit of quantification (LLQ) for karenitecin was 1 ng/ml and 1.5 ng/ml for plasma and urine, respectively. Stability studies indicate that when frozen at -70 degrees C, karenitecin is stable in human plasma for up to 3 months and in human urine for up to 1 month. 
This method is useful for the quantification of karenitecin in plasma and urine samples for clinical pharmacology studies in patients receiving this agent in clinical trials.
|
34
|
Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 2001; 29:2994-3005. [PMID: 11452024 PMCID: PMC55814 DOI: 10.1093/nar/29.14.2994] [Citation(s) in RCA: 939] [Impact Index Per Article: 40.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2001] [Revised: 05/30/2001] [Accepted: 05/30/2001] [Indexed: 11/13/2022] Open
Abstract
PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 +/- 0.005 to 0.895 +/- 0.003. This does not include the benefits from four modifications we included in the 'baseline' version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
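The abstract does not say which ROC variant was used. As an illustration only, the following Python sketch computes one common truncated form, often written ROC_n: for each of the first n false positives in the ranked hit list, count the true positives ranked above it, then divide by n times the number of known true positives so the score lies between 0 and 1. The function name and the padding rule for short lists are assumptions.

```python
def roc_n(ranked_labels, n, total_true):
    """Truncated ROC score in [0, 1].

    ranked_labels: iterable of booleans, best-scoring hit first,
                   True for a true positive and False for a false positive.
    n:             number of false positives to consider.
    total_true:    number of true positives known to exist for the query.
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")
    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true positives ranked above this false positive
            if false_seen == n:
                break
    # Assumption: if fewer than n false positives were retrieved, the missing
    # ones are treated as ranking below every retrieved hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_true)
```

A perfect ranking (all true positives ahead of every false positive) scores 1, and a ranking that opens with n false positives scores 0.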
|
35
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2001; 29:11-6. [PMID: 11125038 PMCID: PMC29800 DOI: 10.1093/nar/29.1.11] [Citation(s) in RCA: 196] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2000] [Accepted: 10/04/2000] [Indexed: 11/14/2022] Open
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap'99, Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
|
36
|
Abstract
Arglabin [1(R),10(S)-epoxy-5(S),5(S),7(S)-guaia-3(4),11(13)-dien-6, 12-olide], a sesquiterpene gamma-lactone is isolated from Artemisia glabella, a species of wormwood endemic to the Karaganda region of Kazakstan. The compound has been modified to render it water-soluble through addition of a dimethylaminohydrochloride group to the C(13) carbohydride moiety to yield Arglabin-DMA. Arglabin-DMA is a registered antitumor substance in the Republic of Kazakstan. Previously, we have shown that this compound prevents protein farnesylation without altering geranylgeranylation. We now report that Arglabin-DMA inhibits the incorporation of [(3)H]farnesylpyrophosphate into human H-ras protein by FTase with an IC(50) of no greater than 25 microM. Kinetic studies show that the phosphorylated form of this compound competitively inhibits the binding of farnesyl diphosphate to FTase. This mechanism of action is different from other reported peptidomimetic FTIs which lower the affinity of ras protein to FTase. Our in vitro studies confirm that Arglabin-DMA inhibits post-translational modification of ras protein in cells. Arglabin-DMA inhibits anchorage-dependent proliferation of NB cells (IC50=10 microg/ml) and inhibits anchorage-independent growth of NB and KNRK cells with about the same IC(50). Soft-agar colony formation assay of H-ras and K-ras transformed cells show IC(50)s to be 2 and 5 microg/ml, respectively. In primary cultures of human tumor cells, Arglabin-DMA inhibits cell proliferation of a variety of tumor types with IC(90)s in the range of 0.85 to 5.0 microg/ml. Because of these pharmacologic properties, we propose that Arglabin-DMA is suitable for the treatment of ras related malignancies.
|
37
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2000; 28:10-4. [PMID: 10592169 PMCID: PMC102437 DOI: 10.1093/nar/28.1.10] [Citation(s) in RCA: 297] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/1999] [Revised: 09/14/1999] [Accepted: 10/08/1999] [Indexed: 11/14/2022] Open
Abstract
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing pages, GeneMap'99, Davis Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP) pages, Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP) pages, SAGEmap, Online Mendelian Inheritance in Man (OMIM) and the Molecular Modeling Database (MMDB). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov
|
38
|
Erratum to “BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences” [FEMS Microbiol. 174 (1999) 247–250]. FEMS Microbiol Lett 1999. [DOI: 10.1111/j.1574-6968.1999.tb13730.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
|
39
|
Abstract
'BLAST 2 Sequences', a new BLAST-based tool for aligning two protein or nucleotide sequences, is described. While the standard BLAST program is widely used to search for homologous sequences in nucleotide and protein databases, one often needs to compare only two sequences that are already known to be homologous, coming from related species or, e.g. different isolates of the same virus. In such cases searching the entire database would be unnecessarily time-consuming. 'BLAST 2 Sequences' utilizes the BLAST algorithm for pairwise DNA-DNA or protein-protein sequence comparison. A World Wide Web version of the program can be used interactively at the NCBI WWW site (http://www.ncbi.nlm.nih.gov/gorf/bl2.html). The resulting alignments are presented in both graphical and text form. The variants of the program for PC (Windows), Mac and several UNIX-based platforms can be downloaded from the NCBI FTP site (ftp://ncbi.nlm.nih.gov).
|
40
|
|
41
|
Pharmaceutical properties of related calanolide compounds with activity against human immunodeficiency virus. J Pharm Sci 1998; 87:1077-80. [PMID: 9724557 DOI: 10.1021/js980122d] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
The present studies were undertaken to compare the relative pharmacokinetic parameters and bioavailability of two chemically related natural products which are nonnucleoside inhibitors of reverse transcriptase. Both (+)-calanolide A (Cal A; NSC 675451) and (+)-dihydrocalanolide A (DHCal A; NSC 678323) are currently under development for the treatment of HIV infections. HPLC-based analytical assays were developed for both compounds using modifications of a previously published procedure. The assays were used to compare the intravenous pharmacokinetics of the dihydro analogue relative to the parent compound, Cal A, and to determine the relative oral bioavailability of each drug in CD2F1 mice. Although the pharmacokinetic parameters of each drug were similar (Cal A, 25 mg/kg: AUC: 9.4 [microg/mL].hr, t1/2beta: 0.25 h, t1/2gamma: 1.8 h, clearance: 2.7 L/h/kg versus DHCal A, 25 mg/kg: AUC: 6.9 [microg/mL].hr, t1/2beta: 0.22 h, t1/2gamma: 2.3 h, clearance: 3.6 L/h/kg), the oral bioavailability of DHCal A (F = 46.8%) was markedly better than that obtained for Cal A (F = 13.2%). The relative ability of Cal A and DHCal A to change to their inactive epimer forms, (+)-calanolide B and (+)-dihydrocalanolide B, respectively, was also determined. While conversion of active to inactive forms of the drugs was noted to occur in vitro especially under acidic conditions, no epimer forms of either compound were noted in plasma of mice after administration of either Cal A or DHCal A. Considered together with preliminary toxicology findings, the pharmacokinetic data obtained in the present series of experiments suggest that selection of the dihydro derivative of (+)-calanolide A may be a reasonable choice for further preclinical development and possible Phase I clinical evaluation.
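The reported clearances are consistent with the standard relation between an intravenous dose and the area under the concentration-time curve. The abstract does not state how clearance was calculated, so the worked check below assumes that relation and uses only the dose and AUC values quoted above.

```latex
\[
CL = \frac{\text{Dose}}{\text{AUC}}, \qquad 1\ \mu\text{g/mL} = 1\ \text{mg/L}
\]
\[
CL_{\text{Cal A}} = \frac{25\ \text{mg/kg}}{9.4\ \text{mg}\cdot\text{h/L}} \approx 2.7\ \text{L/h/kg},
\qquad
CL_{\text{DHCal A}} = \frac{25\ \text{mg/kg}}{6.9\ \text{mg}\cdot\text{h/L}} \approx 3.6\ \text{L/h/kg}
\]
```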
|
42
|
Abstract
Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
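The first step described above, locating other instances of the input pattern so they can serve as alignment seeds, might be sketched in Python as follows. This is not the PHI-BLAST code: it assumes a small PROSITE-like subset of pattern syntax (single residues, `x`, `[...]`, `{...}` and repeat counts), and the function names, pattern and sequences are hypothetical.

```python
import re

# One pattern element: a residue class plus an optional repeat such as (2) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern):
    """Translate a dash-separated PROSITE-like pattern into a Python regex."""
    out = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = m.groups()
        if residue == "x":                 # any residue
            residue = "."
        elif residue.startswith("{"):      # any residue except those listed
            residue = "[^" + residue[1:-1] + "]"
        out.append(residue + ("{" + repeat + "}" if repeat else ""))
    return "".join(out)


def find_pattern_seeds(database, pattern):
    """Return (sequence id, start, end) for every occurrence, overlaps included."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    seeds = []
    for seq_id, seq in database.items():
        for m in regex.finditer(seq):
            seeds.append((seq_id, m.start(1), m.end(1)))
    return seeds


database = {"p1": "MAGKLSAGTTTR", "p2": "MKKLLPP"}   # toy sequences
seeds = find_pattern_seeds(database, "G-x(2)-[ST]-{P}")
```

Each reported span would then anchor a local alignment to the query. That extension step and the associated statistics are outside this sketch.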
|
43
|
Abstract
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
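To make the idea of a position-specific score matrix concrete, here is a deliberately simplified Python sketch. It assumes a gap-free alignment, a uniform background frequency and a flat pseudocount. PSI-BLAST itself uses more elaborate sequence weighting and pseudocount estimation that the abstract does not detail, so none of these choices should be read as the published method.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)   # assumption: uniform background
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            if seq[col] in counts:         # gaps and unknown residues are skipped
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, segment):
    """Sum the column scores for a segment aligned to the matrix without gaps."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


aligned = ["ACD", "ACE", "AGD"]            # toy alignment of three hits
pssm = build_pssm(aligned)
```

Residues that recur in a column score above zero there and unseen residues score below zero, which is what lets the matrix favor weak but consistent similarities in the next search iteration.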
|
44
|
Abstract
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
|
45
|
A phase II and pharmacokinetic study of enloplatin in patients with platinum refractory advanced ovarian carcinoma. Anticancer Drugs 1997; 8:649-56. [PMID: 9311439 DOI: 10.1097/00001813-199708000-00001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
This was a study of enloplatin in 18 evaluable patients with platinum refractory ovarian cancer. They received an i.v. infusion of enloplatin over 1.5 h without prehydration every 21 days. One patient had a partial response (6%; 95% CI 0-26%) lasting 2.8 months. The median survival was 9.4 months (95%; CI 5.1-19.7%). Neutropenia was the dose-limiting toxicity. Nephrotoxicity was manageable. Enloplatin is the major form of the free drug in plasma. However, 13.5 h after initiation of treatment, 85% of the drug in plasma is protein bound. Elimination of the drug is mainly renal. Enloplatin pharmacokinetics is similar to that of carboplatin. Thus, the plasma pharmacokinetics of enloplatin is dictated by the cyclobutanedicarboxylato (CBDCA) ligand and not the novel amino ligand.
|
46
|
PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation. Genome Res 1997; 7:649-56. [PMID: 9199938 PMCID: PMC310664 DOI: 10.1101/gr.7.6.649] [Citation(s) in RCA: 209] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
As the rate of DNA sequencing increases, analysis by sequence similarity search will need to become much more efficient in terms of sensitivity, specificity, automation potential, and consistency in annotation. PowerBLAST was developed, in part, to address these problems. PowerBLAST includes a number of options for masking repetitive elements and low complexity subsequences. It also has the capacity to restrict the search to any level of NCBI's taxonomy index, thus supporting "comparative genomics" applications. Postprocessing of the BLAST output using the SIM series of algorithms produces optimal, gapped alignments, and multiple alignments when a region of the query sequence matches multiple database sequences. PowerBLAST is capable of processing sequences of any length because it divides long query sequences into overlapping fragments and then merges the results after searching. The results may be viewed graphically, as a textual representation, or as an HTML page with links to GenBank and Entrez. For matching database sequences, annotated features are superimposed on the aligned query sequence in the output, thus greatly increasing the ease of interpretation. Such features may be used for automated annotation of new sequence because PowerBLAST output in ASN.1 form may be "dragged and dropped" into NCBI's Sequin program for sequence annotation and submission. PowerBLAST is capable of analyzing and annotating a 100-kb query in 60 min on NCBI's BLAST server.
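The fragment-and-merge strategy described in this abstract can be sketched as follows. This is a minimal illustration only, not PowerBLAST's implementation: the fragment size, the overlap, the half-open interval convention, and the function names `split_query` and `merge_hits` are hypothetical choices for the example.

```python
from typing import Iterator, List, Tuple


def split_query(seq: str, size: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, fragment) pairs that cover seq with overlapping windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    start = 0
    while True:
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break
        start += step


def merge_hits(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open (start, end) intervals.

    Intervals are assumed to be in full-query coordinates already, i.e. each
    fragment-local hit has had its fragment offset added.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    for offset, fragment in split_query("ACGTACGTAC", size=4, overlap=2):
        print(offset, fragment)
    print(merge_hits([(0, 4), (3, 7), (10, 12)]))
```

The overlap keeps a match that straddles a fragment boundary fully contained in at least one fragment, provided the match is no longer than the overlap; the merge step then collapses the duplicate hits reported by neighbouring fragments.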
|
47
|
Abstract
The sequence databases continue to grow at an extraordinary rate. Contributions come from both small laboratories and large-scale projects, such as the Merck EST project. This growth has placed new demands on computational sequence comparison tools such as BLAST. Even now it is no longer practical to evaluate some BLAST reports manually; it is necessary to filter the output by, for example, organism, source, or degree of annotation. The new network BLAST service makes such tools possible. It is also possible to present BLAST output in different formats, such as BLANCE. Perhaps most important of all, it becomes simple to call BLAST from another application, making it one step within an integrated system. This makes the automated preparation of sequence evaluations that include BLAST runs possible. In the near future we expect to see a number of applications that use the network BLAST interface to help molecular biologists search against a database that is growing not only in size but in biological richness.
|
48
|
Crowding-induced organization of cytoskeletal elements: II. Dissolution of spontaneously formed filament bundles by capping proteins. J Cell Biol 1994; 126:169-74. [PMID: 8027175 PMCID: PMC2120095 DOI: 10.1083/jcb.126.1.169] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023] Open
Abstract
Through calculations of molecular packing constraints in crowded solutions, we have previously shown that dispersions of filament forming proteins and soluble proteins can be unstable at physiological concentrations, such that tight bundles of filaments are formed spontaneously, in the absence of any accessory binding proteins. Here we consider the modulation of this phenomenon by capping proteins. The theory predicts that, by shortening the average filament length, capping alleviates the packing problem. As a result, the dispersed isotropic solution is stable over an expanded range of compositions.
|
49
|
Crowding-induced organization of cytoskeletal elements: I. Spontaneous demixing of cytosolic proteins and model filaments to form filament bundles. Biophys J 1993; 65:1147-54. [PMID: 8241394 PMCID: PMC1225832 DOI: 10.1016/s0006-3495(93)81144-5] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.1] [Indexed: 01/29/2023] Open
Abstract
The theory for the effects of crowding on the behavior of reversibly self-assembling solutes is extended to mixtures containing nonassembling solutes. The theory predicts that excluded volume will cause dramatic demixing into domains of long, tightly packed, highly aligned fibers coexisting with an isotropic solution of unaggregated species. It suggests that the bundling of fibers in cells is entropically driven and that accessory binding proteins in the cytoplasm serve to modulate the process rather than create it.
|
50
|
Theoretical studies of DNA during orthogonal field alternating gel electrophoresis. J Chem Phys 1991. [DOI: 10.1063/1.459963] [Citation(s) in RCA: 28] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
|