1
Mapping Conformational Changes in the Saliva Proteome Potentially Associated with Oral Cancer Aggressiveness. J Proteome Res 2024. [PMID: 38785273 DOI: 10.1021/acs.jproteome.4c00093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Diverse proteomics-based strategies have been applied to saliva to quantitatively identify diagnostic and prognostic targets for oral cancer. Considering that these targets may be regulated by events that do not involve changes in protein abundance, we hypothesized that changes in protein conformation can be associated with diagnosis and prognosis, revealing biological processes and novel targets of clinical relevance. To test this, we employed limited proteolysis-mass spectrometry on saliva samples to explore structural alterations, comparing the proteomes of healthy controls and of oral squamous cell carcinoma (OSCC) patients with and without lymph node metastasis. Thirty-six proteins with potential structural rearrangements, including transketolase and its interacting partners, were associated with clinical features of the patients. Moreover, N-glycosylated peptides contribute to structural rearrangements of potential diagnostic and prognostic markers. Altogether, this approach uses saliva proteins to search for diagnostic and prognostic targets in oral cancer and can guide the discovery of potentially regulated sites beyond protein-level abundance.
2
NP3 MS Workflow: An Open-Source Software System to Empower Natural Product-Based Drug Discovery Using Untargeted Metabolomics. Anal Chem 2024; 96:7460-7469. [PMID: 38702053 PMCID: PMC11099897 DOI: 10.1021/acs.analchem.3c05829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/02/2024] [Accepted: 04/05/2024] [Indexed: 05/06/2024]
Abstract
Natural products (or specialized metabolites) have historically been the main source of new drugs. However, current drug discovery pipelines require a degree of miniaturization and speed that is incompatible with traditional natural product research methods, especially in the early stages of research. This article introduces the NP3 MS Workflow, a robust open-source software system for processing and analyzing liquid chromatography-tandem mass spectrometry (LC-MS/MS) untargeted metabolomics data, designed to rank bioactive natural products directly from complex mixtures of compounds, such as bioactive biota samples. NP3 MS Workflow requires minimal user intervention while allowing customization of each step of LC-MS/MS data processing, and it reports diagnostic statistics that help the user interpret and optimize the processing. NP3 MS Workflow improves the computation of MS2 spectra in an LC-MS/MS data set and provides tools for automatic [M + H]+ ion deconvolution using fragmentation rules, chemical structure annotation against MS2 databases, and relative quantification of the precursor ions for bioactivity correlation scoring. The software is presented with case studies and comparisons with equivalent tools currently available. NP3 MS Workflow offers a robust and useful approach to selecting bioactive natural products from complex mixtures, expanding the set of tools available for untargeted metabolomics. It can be easily integrated into natural product-based drug discovery pipelines and into other fields of research at the interface of chemistry and biology.
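As a rough illustration of bioactivity correlation scoring, the sketch below ranks precursor ions by the Pearson correlation between their relative intensities across fractions and a bioactivity readout measured on the same fractions. The data layout, ion names, and values are hypothetical, and this is not NP3 MS Workflow's actual scoring function.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_ions(intensities: dict[str, list[float]], bioactivity: list[float]) -> list[tuple[str, float]]:
    """Rank ions from most to least correlated with the bioactivity profile."""
    scores = [(ion, pearson(profile, bioactivity)) for ion, profile in intensities.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical data: four fractions, one bioactivity value per fraction
bioactivity = [0.1, 0.9, 0.8, 0.05]
intensities = {
    "ion_A": [10.0, 95.0, 80.0, 5.0],   # tracks the activity
    "ion_B": [50.0, 48.0, 52.0, 49.0],  # roughly constant
    "ion_C": [90.0, 10.0, 15.0, 95.0],  # anti-correlated with the activity
}
print(rank_ions(intensities, bioactivity))
```

Correlation only flags co-variation across fractions; a high score is a prioritization hint, not proof that an ion is the active compound.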
3
Abstract
The selection of suitable proteotypic peptides remains a challenge in designing targeted quantitative proteomics assays. Although the criteria are well established in the literature, these peptides are often selected in a subjective and time-consuming manner. Here, we developed a practical, semiautomated workflow implemented in an open-source program named Typic. Typic runs from the command line or through a graphical interface and helps users select a list of proteotypic peptides for targeted quantitation. The tool combines the input data with additional data downloaded from public repositories and produces one output file per protein. Each output file includes the information relevant to proteotypic peptide selection organized in a table, a color-coded ranking of peptides according to their potential value as quantitation targets, and auxiliary plots to assist users in the selection. Taken together, Typic enables practical and straightforward data extraction from multiple data sets, allowing the most suitable proteotypic peptides to be identified from established criteria in an unbiased and standardized manner, ultimately leading to more robust targeted proteomics assays.
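Candidate proteotypic peptides are usually enumerated by in silico digestion before any ranking. The sketch below assumes trypsin (cleavage after K or R unless the next residue is P, no missed cleavages) and then filters by a length window and by the absence of Met and Cys, criteria frequently cited for proteotypic peptides. The thresholds and function names are illustrative assumptions, not Typic's code.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    # Zero-width split point: preceded by K/R and not followed by P
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def candidate_peptides(sequence: str, min_len: int = 7, max_len: int = 25) -> list[str]:
    """Keep peptides in the assumed length window that contain no Met (oxidation-prone)
    and no Cys (usually chemically modified)."""
    return [
        p for p in tryptic_digest(sequence)
        if min_len <= len(p) <= max_len and "M" not in p and "C" not in p
    ]


print(candidate_peptides("PEPTIDEKLLLLSSSAVRMSTLLLLLKGG"))
```

A real selection would add further criteria (uniqueness in the proteome, observability in repositories, missed cleavages), which is exactly the kind of information the tool gathers from public data.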
4
Meta-omics analysis indicates the saliva microbiome and its proteins associated with the prognosis of oral cancer patients. Biochim Biophys Acta Proteins Proteom 2021; 1869:140659. [PMID: 33839314 DOI: 10.1016/j.bbapap.2021.140659] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/01/2020] [Revised: 04/04/2021] [Accepted: 04/05/2021] [Indexed: 12/27/2022]
Abstract
Saliva is a biofluid that maintains the health of oral tissues and the homeostasis of the oral microbiota. Studies have demonstrated that oral squamous cell carcinoma (OSCC) patients have a different salivary microbiota from that of healthy individuals. However, the relationship between these microbial differences and clinicopathological outcomes is still far from conclusive. Herein, we investigate whether metagenomic and metaproteomic saliva profiles can distinguish between control (C) individuals, OSCC patients without an active lesion (L0), and OSCC patients with an active lesion (L1). The results show significantly distinct taxonomic and functional changes in L1 patients compared with C and L0 individuals, suggesting compositional modulation of the oral microbiome: the relative abundances of Centipeda, Veillonella, and Gemella estimated by metagenomics are correlated with tumor size, clinical stage, and active lesion. Metagenomics results also demonstrated that poor overall patient survival is associated with a higher relative abundance of Stenotrophomonas, Staphylococcus, Centipeda, Selenomonas, Alloscardovia, and Acinetobacter. Finally, compositional and functional differences in saliva content identified by metaproteomics can distinguish healthy individuals from OSCC patients. In summary, our study suggests that the oral microbiota and its protein abundance have potential diagnostic and prognostic value for oral cancer patients. Further studies are necessary to understand the role of the metaproteins uniquely detected in the microbiota of healthy individuals and OSCC patients, as well as the crosstalk between saliva host proteins and the oral microbiome in OSCC.
5
Chemical Elicitors Induce Rare Bioactive Secondary Metabolites in Deep-Sea Bacteria under Laboratory Conditions. Metabolites 2021; 11:metabo11020107. [PMID: 33673148 PMCID: PMC7918856 DOI: 10.3390/metabo11020107] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/30/2020] [Revised: 01/29/2021] [Accepted: 02/03/2021] [Indexed: 02/06/2023]
Abstract
Bacterial genome sequencing has revealed a vast number of novel biosynthetic gene clusters (BGCs) with the potential to produce bioactive natural products. However, the biosynthesis of secondary metabolites by bacteria is often silenced under laboratory conditions, limiting the controlled expression of natural products. Here we describe an integrated methodology for constructing and screening an elicited, pre-fractionated library of marine bacteria. In this pilot study, chemical elicitors were evaluated for their ability to mimic the natural environment and induce the expression of cryptic BGCs in deep-sea bacteria. By integrating high-resolution untargeted metabolomics with cheminformatics analyses, it was possible to visualize, mine, identify, and map the chemical and biological space of the elicited bacterial metabolites. The results show that elicited bacterial metabolites correspond to ~45% of the compounds produced under laboratory conditions. In addition, the elicited chemical space is either novel (~70% of the elicited compounds) or concentrated in the chemical space of drugs. Fractionation of the crude extracts further revealed minor compounds (~90% of the collection) and enabled the detection of biological activity. This pilot work points to strategies for constructing and evaluating chemically diverse bacterial natural product libraries toward the identification of novel bacterial metabolites in natural product-based drug discovery pipelines.
6
gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections. Algorithms Mol Biol 2020; 15:18. [PMID: 32973918 PMCID: PMC7507297 DOI: 10.1186/s13015-020-00177-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/24/2020] [Accepted: 09/08/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND The construction of a suffix array for a collection of strings is a fundamental task in bioinformatics and in many other applications that process strings. Related data structures, such as the longest common prefix (LCP) array, the Burrows-Wheeler transform (BWT), and the document array, are often needed alongside the suffix array to efficiently solve a wide variety of problems. While several algorithms have been proposed to construct the suffix array of a single string, less emphasis has been placed on algorithms that construct suffix arrays for string collections. RESULTS In this paper we introduce gsufsort, open-source software for constructing the suffix array and related indexing data structures for a string collection with N symbols in O(N) time. Our tool is written in ANSI C and is based on the algorithm gSACA-K (Louza et al. in Theor Comput Sci 678:22-39, 2017), the fastest algorithm to construct suffix arrays for string collections. The tool accepts large FASTA, FASTQ, and text files with multiple strings as input. Experiments have shown very good performance on different types of strings. CONCLUSIONS gsufsort is a fast, portable, and lightweight tool for constructing the suffix array and additional data structures for string collections.
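To make these data structures concrete, the sketch below builds a generalized suffix array, document array, BWT, and LCP array for a tiny collection by naively sorting every suffix. It assumes each string ends with its own terminator that sorts before every character, with ties between terminators broken by string index (a common convention for collections, assumed here). It is far slower than the O(N) bound above and is neither gSACA-K nor gsufsort.

```python
def generalized_suffix_array(strings):
    """Naive generalized suffix array: (doc, pos) pairs in sorted suffix order.

    Each suffix is keyed by its characters followed by a terminator (0, doc), which
    compares below every character (1, ord(c)) and orders terminators by doc index.
    """
    entries = []
    for doc, s in enumerate(strings):
        for pos in range(len(s) + 1):  # pos == len(s) is the terminator-only suffix
            key = tuple((1, ord(c)) for c in s[pos:]) + ((0, doc),)
            entries.append((key, doc, pos))
    entries.sort()
    return [(doc, pos) for _, doc, pos in entries]


def lcp_length(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def build_indexes(strings):
    sa = generalized_suffix_array(strings)
    da = [doc for doc, _ in sa]  # document array: source string of each suffix
    # BWT: character preceding each suffix; "$" stands for the terminator of a string
    bwt = "".join(strings[doc][pos - 1] if pos > 0 else "$" for doc, pos in sa)
    # LCP of consecutive suffixes; distinct terminators never match, so they add nothing
    lcp = [0] + [lcp_length(strings[d1][p1:], strings[d2][p2:])
                 for (d1, p1), (d2, p2) in zip(sa, sa[1:])]
    return sa, da, bwt, lcp


sa, da, bwt, lcp = build_indexes(["ab", "b"])
print(sa, da, bwt, lcp)
```

Note that "$" in this BWT is a single placeholder for all terminators; implementations for collections keep the document array (or distinct separators) to tell them apart.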
7
Detection and identification of Xanthomonas pathotypes associated with citrus diseases using comparative genomics and multiplex PCR. PeerJ 2019; 7:e7676. [PMID: 31592342 PMCID: PMC6777491 DOI: 10.7717/peerj.7676] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/30/2019] [Accepted: 08/15/2019] [Indexed: 12/25/2022]
Abstract
Background In citrus crops, three species of Xanthomonas are known to cause distinct diseases. X. citri subsp. citri pathotype A, X. fuscans subsp. aurantifolii pathotypes B and C, and X. alfalfae subsp. citrumelonis are the causative agents of cancrosis A, B, and C and of citrus bacterial spot, respectively. Although these species exhibit different levels of virulence and aggressiveness, only limited alternatives are currently available for the proper and early detection of these diseases in the field. The present study aimed to develop a new molecular diagnostic method based on genomic sequences derived from the four species of Xanthomonas. Results Using comparative genomics approaches, primers were synthesized for the identification of the four causative agents of citrus diseases. These primers were validated for their specificity to their target DNA by both conventional and multiplex PCR. Their sensitivity was found to be 0.02 ng/µl in vitro and 1.5 × 10⁴ CFU ml⁻¹ in infected leaves. Additionally, none of the primers generated amplicons from 19 other Xanthomonas genomes not associated with citrus or from one species of Xylella, the causal agent of citrus variegated chlorosis (CVC). This indicates strong specificity of the primers for the different Xanthomonas species investigated in this study. Conclusions We demonstrated that these markers are potential candidates for in vivo molecular diagnosis specific to citrus-associated Xanthomonas. The bioinformatics pipeline developed in this study to identify specific genomic regions is capable of generating specific primers. It is freely available and can be applied to any other model organism.
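The specificity test described above can be pre-screened in silico before any wet-lab PCR. The sketch below is a simplified in silico PCR: it looks for the forward primer followed downstream by the reverse complement of the reverse primer, on both strands, and calls a primer pair specific if only the target genome yields an amplicon. Exact matching, the toy sequences, and all function names are assumptions for illustration; it is not the pipeline from the article, and real tools tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicons(genome: str, fwd: str, rev: str, max_len: int = 2000) -> list[int]:
    """Exact-match amplicon lengths on one strand: forward primer, then the reverse
    complement of the reverse primer downstream (first downstream site only)."""
    rev_rc = reverse_complement(rev)
    lengths = []
    start = genome.find(fwd)
    while start != -1:
        end = genome.find(rev_rc, start + len(fwd))
        if end != -1 and end + len(rev_rc) - start <= max_len:
            lengths.append(end + len(rev_rc) - start)
        start = genome.find(fwd, start + 1)
    return lengths


def is_specific(genomes: dict[str, str], target: str, fwd: str, rev: str) -> bool:
    """True if the pair amplifies the target genome and no other (both strands checked)."""
    hits = {
        name for name, seq in genomes.items()
        if predicted_amplicons(seq, fwd, rev)
        or predicted_amplicons(reverse_complement(seq), fwd, rev)
    }
    return hits == {target}


target = "GGACGTACTTTTTTTGGGCC"  # toy sequence, hypothetical
print(predicted_amplicons(target, "ACGTAC", "CCCAAA"))  # [16]
```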
8
External memory BWT and LCP computation for sequence collections with applications. Algorithms Mol Biol 2019; 14:6. [PMID: 30899322 PMCID: PMC6408864 DOI: 10.1186/s13015-019-0140-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/06/2018] [Accepted: 02/23/2019] [Indexed: 11/10/2022]
Abstract
Background Sequencing technologies produce ever-larger collections of biosequences that must be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows–Wheeler transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input, it is important to build these data structures in external memory, making the best possible use of the available RAM. Results We propose a space-efficient algorithm to compute the BWT and LCP array for a collection of sequences in the external or semi-external memory setting. Our algorithm splits the input collection into subcollections small enough that their BWTs can be computed in RAM using an optimal linear-time algorithm. Next, it merges the partial BWTs in external or semi-external memory, computing the LCP values in the process. Our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP array, provide simple, scan-based, external memory algorithms for three well-known problems in bioinformatics: the computation of maximal repeats, all-pairs suffix–prefix overlaps, and the construction of succinct de Bruijn graphs. Conclusions We prove that our algorithm performs $\mathcal{O}(n\,\mathsf{maxlcp})$ sequential I/Os, where $n$ is the total length of the collection and $\mathsf{maxlcp}$ is the maximum LCP value. The experimental results show that our algorithm is only slightly slower than the state of the art for short sequences, but it is up to 40 times faster for longer sequences or when the available RAM is at least equal to the size of the input.
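The external-memory merge of partial BWTs described above is not reproduced here. For orientation only, the sketch below shows the classic in-memory relationship between the three arrays on a single string with a unique terminator "$": a (naively built) suffix array, the LCP array obtained with Kasai et al.'s linear-time algorithm, and the BWT read off the suffix array. The largest value in the LCP array is the maxlcp quantity in the I/O bound above.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array; adequate only for small examples."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def kasai_lcp(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[r] = LCP of the suffixes at ranks r-1 and r (lcp[0] = 0)."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, reusing h - 1 from suffix i - 1
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def bwt_from_sa(text: str, sa: list[int]) -> str:
    """BWT[r] = character before the suffix of rank r (text[-1] wraps to '$')."""
    return "".join(text[i - 1] for i in sa)


text = "banana$"
sa = suffix_array(text)
lcp = kasai_lcp(text, sa)
print(sa, lcp, bwt_from_sa(text, sa), max(lcp))
```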
9
Abstract
BACKGROUND In phylogenetic reconstruction, the result is a tree in which all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, ancestral and living taxa may coexist, leading to a tree in which internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is widely used for phylogenetic reconstruction. RESULTS We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descendant taxa. We also applied Live Neighbor-Joining to a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their content-similarity relationships are represented by a phylogeny. CONCLUSION Our experiments produced interesting alternative phylogenetic hypotheses for RNA virus genomes and bacterial genomes, as well as alternative relationships among images and texts, illustrating a wide range of scenarios in which Live Neighbor-Joining may be used.
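For reference, the sketch below implements classic Neighbor-Joining, the heuristic that Live Neighbor-Joining builds on: compute the Q criterion, join the pair that minimizes it under a new hypothetical ancestor, and update distances. It is standard NJ, not the Live variant (which may instead place living taxa at internal nodes); the internal-node names u0, u1, ... are illustrative.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Classic Neighbor-Joining on a symmetric dict-of-dicts distance matrix.

    Returns (node, node, branch_length) edges of an unrooted tree whose leaves are the
    input labels and whose internal nodes u0, u1, ... are hypothetical ancestors.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(labels)
    edges = []
    k = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"u{k}"
        k += 1
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(a, u, la), (b, u, d[a][b] - la)]
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    x, y = nodes
    edges.append((x, y, d[x][y]))
    return edges


# Additive example generated from the tree ((A:1, B:2):3, (C:4, D:5))
pairs = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
         ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
dist = {x: {} for x in "ABCD"}
for (x, y), v in pairs.items():
    dist[x][y] = dist[y][x] = v
edges = neighbor_joining(list("ABCD"), dist)
print(edges)
```

On additive distances such as these, NJ recovers the generating tree, including its branch lengths.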
10
Generalized enhanced suffix array construction in external memory. Algorithms Mol Biol 2017; 12:26. [PMID: 29234460 PMCID: PMC5719966 DOI: 10.1186/s13015-017-0117-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/13/2017] [Accepted: 11/22/2017] [Indexed: 11/24/2022]
Abstract
Background Suffix arrays, augmented by additional data structures, allow many string processing problems to be solved efficiently. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or of the data structure exceeds the available internal memory. Results In this article we present and analyze eGSA [introduced in CPM (External memory generalized suffix and LCP arrays construction. In: Proceedings of CPM. pp 201–10, 2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix (LCP) array for a string collection. Our algorithm relies on a combination of buffers, induced sorting, and a heap to avoid direct string comparisons. We performed experiments covering different aspects of our algorithm, including running time, efficiency, external memory access, internal phases, and the influence of different optimization strategies. On real datasets of up to 24 GB, using 2 GB of internal memory, eGSA showed competitive performance compared with eSAIS and SAscan, which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm. Conclusions The proposed algorithm was validated through performance tests using real datasets from different domains, in various combinations, and showed competitive performance. Our algorithm can also construct the generalized Burrows–Wheeler transform of a string collection at no additional cost except for the output time.
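eGSA's full pipeline (buffers, induced sorting, and the avoidance of direct string comparisons) is beyond a short example. The sketch below only mimics the overall shape of a heap-based merge: each string's suffixes are sorted independently into a run (standing in for a partition that fits in RAM), and the runs are merged with `heapq.merge`, emitting the generalized suffix array, the LCP of consecutive outputs, and the generalized BWT in the same pass. Unlike eGSA, it materializes and compares suffixes directly, so it is an in-memory illustration under those simplifying assumptions.

```python
import heapq
import os


def sorted_run(doc, s):
    """Sort the suffixes of one string in memory (one 'run').

    Python orders a proper prefix before the longer string, matching a terminator
    that sorts before every character; equal suffixes from different strings are
    then ordered by document index."""
    return sorted((s[pos:], doc, pos) for pos in range(len(s) + 1))


def merge_runs(strings):
    """Heap-merge per-string runs into a generalized suffix array, LCP array and BWT."""
    runs = [sorted_run(doc, s) for doc, s in enumerate(strings)]
    gsa, lcp, bwt = [], [], []
    prev = None
    for suffix, doc, pos in heapq.merge(*runs):
        gsa.append((doc, pos))
        lcp.append(0 if prev is None else len(os.path.commonprefix([prev, suffix])))
        bwt.append(strings[doc][pos - 1] if pos > 0 else "$")  # "$": terminator
        prev = suffix
    return gsa, lcp, "".join(bwt)


gsa, lcp, gbwt = merge_runs(["ab", "b"])
print(gsa, lcp, gbwt)
```

As the abstract notes for eGSA, the BWT falls out of the merge essentially for free: each output position only needs the character preceding the emitted suffix.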
11
CellNetVis: a web tool for visualization of biological networks using force-directed layout constrained by cellular components. BMC Bioinformatics 2017; 18:395. [PMID: 28929969 PMCID: PMC5606216 DOI: 10.1186/s12859-017-1787-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 12/17/2022]
Abstract
BACKGROUND The advent of "omics" science has brought new perspectives to contemporary biology through high-throughput analyses of molecular interactions, providing new clues about protein/gene function and the organization of biological pathways. Biomolecular interaction networks, or graphs, are simple abstract representations in which the components of a cell (e.g. proteins, metabolites) are represented by nodes and their interactions by edges. An appropriate visualization of the data is crucial for understanding such networks, since pathways are related to functions that occur in specific regions of the cell. The force-directed layout is an important and widely used technique for drawing networks according to their topologies. Placing networks into cellular compartments helps to quickly identify where network elements are located and, more specifically, where they are concentrated. Currently, only a few tools can visually organize networks by cellular compartments, and most of them cannot handle large and dense networks. Even for small networks with hundreds of nodes, the available tools cannot reposition the network while the user is interacting with it, limiting visual exploration. RESULTS Here we propose CellNetVis, a web tool to easily display biological networks in a cell diagram using a constrained force-directed layout algorithm. The tool is freely available and open source. It was originally designed for networks generated by the Integrated Interactome System and can also be used with networks from other databases, such as InnateDB. CONCLUSIONS CellNetVis has proven applicable to the dynamic investigation of complex networks over a consistent representation of a cell on the Web, with capabilities not matched elsewhere.
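A constrained force-directed layout can be sketched as an ordinary spring embedder plus a projection step. The example below uses Fruchterman-Reingold-style forces (repulsion k²/d between all node pairs, attraction d²/k along edges) and, after every move, projects each node back into its compartment. Modelling compartments as independent discs, the compartment names, and the constants are all simplifying assumptions; CellNetVis's actual algorithm and cell diagram are not reproduced.

```python
import math
import random


def constrained_layout(nodes, edges, compartment_of, compartments, iters=200, seed=0):
    """Force-directed layout with each node confined to a circular compartment.

    compartments: name -> (cx, cy, radius); compartment_of: node -> compartment name.
    Returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {}
    for n in nodes:  # start each node near its compartment centre
        cx, cy, r = compartments[compartment_of[n]]
        pos[n] = [cx + rng.uniform(-r, r) / 2, cy + rng.uniform(-r, r) / 2]
    k, step = 0.5, 0.05  # assumed ideal edge length and maximum move per iteration
    for _ in range(iters):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):  # repulsion k^2 / d between every pair
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += f * dx / d
                disp[a][1] += f * dy / d
                disp[b][0] -= f * dx / d
                disp[b][1] -= f * dy / d
        for a, b in edges:  # spring attraction d^2 / k along each edge
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= f * dx / d
            disp[a][1] -= f * dy / d
            disp[b][0] += f * dx / d
            disp[b][1] += f * dy / d
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:  # move along the net force, capped at `step`
                move = min(length, step)
                pos[n][0] += dx / length * move
                pos[n][1] += dy / length * move
            # Constraint: project the node back into its compartment disc
            cx, cy, r = compartments[compartment_of[n]]
            ox, oy = pos[n][0] - cx, pos[n][1] - cy
            dist = math.hypot(ox, oy)
            if dist > r:
                pos[n][0] = cx + ox * r / dist
                pos[n][1] = cy + oy * r / dist
    return {n: (p[0], p[1]) for n, p in pos.items()}


comps = {"nucleus": (0.0, 0.0, 1.0), "extracellular": (5.0, 0.0, 1.0)}  # hypothetical
of = {"a": "nucleus", "b": "nucleus", "c": "extracellular"}
layout = constrained_layout(["a", "b", "c"], [("a", "b"), ("b", "c")], of, comps)
print(layout)
```

The quadratic repulsion loop is the usual bottleneck for large, dense networks; interactive tools typically approximate it (for example with spatial grids or quadtrees).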
13
Integrative analysis to select cancer candidate biomarkers to targeted validation. Oncotarget 2016; 6:43635-52. [PMID: 26540631 PMCID: PMC4791256 DOI: 10.18632/oncotarget.6018] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/24/2015] [Accepted: 10/17/2015] [Indexed: 01/15/2023]
Abstract
Targeted proteomics has flourished as the method of choice for prospecting for and validating potential candidate biomarkers in many diseases. However, challenges remain due to the lack of standardized routines that can prioritize a limited number of proteins for further validation in human samples. To help researchers identify the candidate biomarkers that best characterize their samples under study, a well-designed integrative analysis pipeline, comprising MS-based discovery, feature selection methods, clustering techniques, bioinformatic analyses, and targeted approaches, was applied to discovery-based proteomic data from the secretomes of three classes of human cell lines (carcinoma, melanoma, and non-cancerous). Three feature selection algorithms, namely Beta-binomial, Nearest Shrunken Centroids (NSC), and Support Vector Machine-Recursive Feature Elimination (SVM-RFE), indicated a panel of 137 candidate biomarkers for carcinoma and 271 for melanoma that were differentially abundant between the tumor classes. We further tested the strength of the pipeline in selecting candidate biomarkers by immunoblotting, human tissue microarrays, label-free targeted MS, and functional experiments. In conclusion, the proposed integrative analysis was able to pre-qualify and prioritize candidate biomarkers from discovery-based proteomics for targeted MS.
15
InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics 2015; 16:169. [PMID: 25994840 PMCID: PMC4455604 DOI: 10.1186/s12859-015-0611-3] [Citation(s) in RCA: 1243] [Impact Index Per Article: 138.1] [Received: 01/28/2015] [Accepted: 05/06/2015] [Indexed: 01/12/2023]
Abstract
Background Set comparisons permeate a large number of data analysis workflows, in particular workflows in the biological sciences. Venn diagrams are frequently employed for such analyses, but current tools are limited. Results We have developed InteractiVenn, a more flexible tool for interacting with Venn diagrams of up to six sets. It offers a clean interface for Venn diagram construction and enables analysis of set unions while preserving the shape of the diagram. Set unions are useful for revealing differences and similarities among sets, and in our tool they may be guided by a tree or by a list of set unions. The tool also allows users to obtain the elements of each subset, save and load sets for further analyses, and export the diagram in vector and image formats. InteractiVenn has been used to analyze two biological datasets, but it can serve set analysis in a broad range of domains. Conclusions InteractiVenn allows set unions in Venn diagrams to be explored thoroughly, thereby extending the ability to analyze combinations of sets with additional observations yielded by novel interactions between joined sets. InteractiVenn is freely available online at: www.interactivenn.net.
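To see what a set union does to a Venn diagram, it helps to compute the diagram's regions explicitly: each region is the group of elements belonging to exactly one combination of sets. The sketch below computes those regions and replaces a group of sets by their union, after which the regions are recomputed over fewer sets. The function names are hypothetical; InteractiVenn is a web tool and its code is not reproduced here.

```python
def venn_regions(sets: dict[str, set]) -> dict[frozenset, set]:
    """Map each combination of set names to the elements found in exactly those sets
    (one entry per non-empty region of the Venn diagram)."""
    regions: dict[frozenset, set] = {}
    for x in set().union(*sets.values()):
        key = frozenset(name for name, s in sets.items() if x in s)
        regions.setdefault(key, set()).add(x)
    return regions


def union_sets(sets: dict[str, set], names: list[str], new_name: str) -> dict[str, set]:
    """Replace the named sets by their union, leaving the other sets untouched."""
    merged = {k: v for k, v in sets.items() if k not in names}
    merged[new_name] = set().union(*(sets[n] for n in names))
    return merged


sets = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
print(venn_regions(sets))
print(venn_regions(union_sets(sets, ["A", "B"], "A|B")))
```

After joining A and B, elements 1, 2 and 4 fall into a single "A|B only" region, while 3 remains the only element shared with C.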
16
Abstract
The live phylogeny problem generalizes the phylogeny problem by admitting the existence of living ancestors among the taxonomic objects. This problem suits the case of fast-evolving species, such as viruses, and the construction of phylogenies for nonbiological objects such as documents, images, and database records. In this article, we formalize the live phylogeny problem for distances and character states and introduce polynomial-time algorithms for particular versions of the problems. We believe that more general versions of the problems are NP-hard and that many heuristic and approximation approaches may be developed as solution strategies.
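To make the idea of living ancestors concrete: if every object is allowed to be an ancestor, one simple way to connect all objects without any hypothetical internal nodes is a minimum spanning tree over the distance matrix, rooted at a chosen object. The sketch below uses Prim's algorithm for this. It is offered only as an illustration of a tree whose internal nodes are real objects; it is not claimed to be one of the algorithms introduced in the article, and the toy distances are hypothetical.

```python
import heapq


def prim_mst(names, dist, root):
    """Prim's minimum spanning tree over a dict-of-dicts distance matrix.

    Returns {node: parent} with the root mapped to None. Every node, including
    internal ones, is an input object rather than a hypothetical ancestor."""
    parent = {root: None}
    heap = [(dist[root][v], root, v) for v in names if v != root]
    heapq.heapify(heap)
    while heap and len(parent) < len(names):
        _w, u, v = heapq.heappop(heap)
        if v in parent:  # stale entry: v is already attached to the tree
            continue
        parent[v] = u
        for x in names:
            if x not in parent:
                heapq.heappush(heap, (dist[v][x], v, x))
    return parent


# Toy distances in which B lies "between" A and C
dist = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "C": 1}, "C": {"A": 2, "B": 1}}
print(prim_mst(["A", "B", "C"], dist, "A"))
```

Here B ends up as the parent of C, i.e. an internal node that is itself a living taxon, which is precisely the situation a classical phylogeny (with all taxa at the leaves) cannot express.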
17
Improved similarity trees and their application to visual data classification. IEEE Trans Vis Comput Graph 2011; 17:2459-2468. [PMID: 22034367 DOI: 10.1109/tvcg.2011.212] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
An alternative to multidimensional projections for the visual analysis of data represented in multidimensional spaces is the use of similarity trees, such as Neighbor Joining trees. They organize data objects on the visual plane, emphasizing their levels of similarity, with a high capability of detecting and separating groups and subgroups of objects. Besides this similarity-based hierarchical data organization, their advantages include reduced point clutter, high precision, and a consistent view of the data set during focusing, offering a very intuitive way to view the general structure of the data set as well as to drill down to groups and subgroups of interest. Disadvantages of similarity trees based on neighbor-joining strategies include their computational cost and the presence of virtual nodes that take up too much of the visual space. This paper presents a highly improved version of the similarity tree technique. The improvements are given by two procedures. The first is a strategy that replaces virtual nodes by promoting real leaf nodes to their place, saving large portions of display space while maintaining the expressiveness and precision of the technique. The second is an implementation that significantly accelerates the algorithm, enabling its use for larger data sets. We also illustrate the applicability of the technique in visual data mining, showing its advantages in supporting the visual classification of data sets, with special attention to image classification. We demonstrate the capabilities of the tree for analysis and iterative manipulation and employ those capabilities to support evolution toward a satisfactory data organization and classification.
18
Analysis and functional annotation of an expressed sequence tag collection for tropical crop sugarcane. Genome Res 2003; 13:2725-35. [PMID: 14613979 PMCID: PMC403815 DOI: 10.1101/gr.1532103] [Citation(s) in RCA: 216] [Impact Index Per Article: 10.3] [Indexed: 11/24/2022]
Abstract
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After sequence processing, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes had been tagged.
MESH Headings
- Computational Biology/methods
- Computational Biology/statistics & numerical data
- DNA, Complementary/analysis
- DNA, Complementary/classification
- DNA, Complementary/physiology
- DNA, Plant/analysis
- DNA, Plant/classification
- DNA, Plant/physiology
- Expressed Sequence Tags
- Gene Expression Regulation, Plant
- Gene Library
- Molecular Sequence Data
- Organ Specificity/genetics
- Peptides/classification
- Peptides/genetics
- Peptides/physiology
- Plant Proteins/classification
- Plant Proteins/genetics
- Plant Proteins/physiology
- Polymorphism, Genetic/genetics
- Protein Structure, Tertiary/genetics
- Saccharum/genetics
- Saccharum/growth & development
- Saccharum/physiology
- Sequence Analysis, DNA/methods
- Signal Transduction/genetics
19
Abstract
The Sugarcane EST project (SUCEST) produced 291,904 expressed sequence tags (ESTs) in a consortium that involved 74 sequencing and data mining laboratories. We created a web site for this project that served as a ‘meeting point’ for receiving, processing, analyzing, and providing services to help explore the sequence data. In this paper we describe the information pathway that we implemented to support this project and briefly explain the clustering procedure, which resulted in 43,141 clusters.
20
Abstract
The original clustering procedure adopted in the Sugarcane Expressed Sequence Tag project (SUCEST) had many problems, such as too many clusters and the presence of ribosomal sequences. We therefore redesigned the clustering procedure entirely, including a much more careful initial trimming of the reads. In this paper the new trimming and clustering strategies are described in detail, and we give the new official figures for the project: 237,954 expressed sequence tags and 43,141 clusters.