1
Jiang B, Quinn-Bohmann N, Diener C, Nathan VB, Han-Hallett Y, Reddivari L, Gibbons SM, Baloni P. Understanding disease-associated metabolic changes in human colon epithelial cells using iColonEpithelium metabolic reconstruction. bioRxiv 2024:2024.10.22.619644. [PMID: 39484551 PMCID: PMC11526933 DOI: 10.1101/2024.10.22.619644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
The colon epithelium plays a key role in host-microbiome interactions, allowing uptake of various nutrients and driving important metabolic processes. To unravel detailed metabolic activities in the human colon epithelium, our present study focuses on the generation of the first cell-type-specific genome-scale metabolic model (GEM) of human colonic epithelial cells, named iColonEpithelium. GEMs are powerful tools for exploring reactions and metabolites at the systems level and predicting flux distributions at steady state. Our cell-type-specific iColonEpithelium metabolic reconstruction captures genes specifically expressed in human colonic epithelial cells. iColonEpithelium is also capable of performing metabolic tasks specific to the cell type. A unique transport reaction compartment has been included to allow simulation of metabolic interactions with the gut microbiome. We used iColonEpithelium to identify metabolic signatures associated with inflammatory bowel disease. We integrated single-cell RNA sequencing data from Crohn's disease (CD) and ulcerative colitis (UC) samples with the iColonEpithelium metabolic network to predict metabolic signatures of colonocytes in CD and UC compared to healthy samples. We found that reactions in nucleotide interconversion, fatty acid synthesis and tryptophan metabolism were differentially regulated in CD and UC conditions, in accordance with experimental results. The iColonEpithelium metabolic network can be used to identify mechanisms at the cellular level, and our network has the potential to be integrated with gut microbiome models to explore the metabolic interactions between host and gut microbiota under various conditions.
2
Maruyama D, Liao WI, Tian X, Bredon M, Knapp J, Tat C, Doan TNM, Chassaing B, Bhargava A, Sokol H, Prakash A. Regulation of Lung Immune Tone by the Gut-Lung Axis via Dietary Fiber, Gut Microbiota, and Short-Chain Fatty Acids. bioRxiv 2023:2023.08.24.552964. [PMID: 37662303 PMCID: PMC10473695 DOI: 10.1101/2023.08.24.552964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 09/05/2023]
Abstract
Lung immune tone, i.e. the immune state of the lung, can vary between individuals and over a single individual's lifetime, and its basis and regulation in the context of inflammatory responses to injury are poorly understood. The gut microbiome, through the gut-lung axis, can influence lung injury outcomes, but how the diet and microbiota affect lung immune tone is also unclear. We hypothesized that lung immune tone would be influenced by the presence of fiber-fermenting short-chain fatty acid (SCFA)-producing gut bacteria. To test this hypothesis, we conducted a fiber diet intervention study followed by lung injury in mice and profiled gut microbiota using 16S sequencing, metabolomics, and lung immune tone. We also studied germ-free mice to evaluate lung immune tone in the absence of microbiota and performed in vitro mechanistic studies on immune tone and metabolic programming of alveolar macrophages exposed to the SCFA propionate (C3). Mice on a high-fiber diet were protected from sterile lung injury compared to mice on a fiber-free diet. This protection strongly correlated with lower lung immune tone, elevated propionate levels and enrichment of specific fecal microbiota taxa; conversely, lower levels of SCFAs and an increase in other fatty acid metabolites and bacterial taxa correlated with increased lung immune tone and increased lung injury in the fiber-free group. In vitro, C3 reduced lung alveolar macrophage immune tone (through suppression of IL-1β and IL-18) and metabolically reprogrammed them (switching from glycolysis to oxidative phosphorylation after LPS challenge). Overall, our findings reveal that the gut-lung axis, through dietary fiber intake and enrichment of SCFA-producing gut bacteria, can regulate innate lung immune tone via IL-1β and IL-18 pathways. These results provide a rationale for the therapeutic development of dietary interventions to preserve or enhance specific aspects of host lung immunity.
3
WGS Data Collections: How Do Genomic Databases Transform Medicine? Int J Mol Sci 2023; 24:ijms24033031. [PMID: 36769353 PMCID: PMC9917848 DOI: 10.3390/ijms24033031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/09/2023] Open
Abstract
As a scientific community we assumed that exome sequencing would elucidate the basis of most heritable diseases. However, this turned out not to be the case; therefore, attention has been increasingly focused on the non-coding sequences that encompass 98% of the genome and may play an important regulatory function. The first WGS-based datasets have already been released, including underrepresented populations. Although many databases contain pooled data from several cohorts, recently the importance of local databases has been highlighted. Genomic databases are not only collecting data but may also contribute to better diagnostics and therapies. They may find applications in population studies, rare diseases, oncology, pharmacogenetics, and infectious and inflammatory diseases. Further data may be analysed with AI technologies and in the context of other omics data. To exemplify their utility, we highlight the Polish genome database and its practical application.
4
Sharma T, Kundu N, Kaur S, Chakraborty A, Mahto AK, Dewangan RP, Shankaraswamy J, Saxena S. Recognition and unfolding of human telomeric G-quadruplex by short peptide binding identified from the HRDC domain of BLM helicase. RSC Adv 2022; 12:21760-21769. [PMID: 36043100 PMCID: PMC9358547 DOI: 10.1039/d2ra03646k] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/13/2022] [Accepted: 07/17/2022] [Indexed: 11/24/2022] Open
Abstract
Research in recent decades has revealed that the guanine (G)-quadruplex secondary structure in DNA modulates a variety of cellular events that are mostly related to serious diseases. Systems capable of regulating DNA G-quadruplex structures would therefore be useful for the modulation of various cellular events to produce biological effects. A high specificity for recognition of telomeric G-quadruplex has been observed for BLM helicase. We identified peptides from the HRDC domain of BLM using a molecular docking approach with various available solutions and crystal structures of human telomeres and recently created a peptide library. Herein, we tested one peptide (BLM HRDC peptide) from the library and examined its interaction with human telomeric variant-1 (HTPu-var-1) to understand the basis of G4-protein interactions. Our circular dichroism (CD) data showed that HTPu-var-1 folded into an anti-parallel G-quadruplex, and the CD intensity significantly decreased upon increasing the peptide concentration. There was a significant decrease in hypochromicity due to the formation of G-quadruplex-peptide complex at 295 nm, which indicated the unfolding of structure due to the decrease in stacking interactions. The fluorescence data showed quenching upon titrating the peptide with HTPu-var-1-G4. Electrophoretic mobility shift assay confirmed the unfolding of the G4 structure. Cell viability was significantly reduced in the presence of the BLM peptide, with IC50 values of 10.71 μM and 11.83 μM after 72 and 96 hours, respectively. These results confirmed that the selected peptide has the ability to bind to human telomeric G-quadruplex and unfold it. This is the first report in which a peptide was identified from the HRDC domain of the BLM G4-binding protein for the exploration of the G4-binding motif, which suggests a novel strategy to target G4 using natural key peptide segments. 
Schematic representation of HTPu-var-1-G4 located at the 3′ end, formation of G-quadruplex, model of the G-quadruplex structure, base stacking between G-quadruplex planes, G-quadruplex structure-peptide complex and twisting of G-quadruplex planes upon peptide binding.
Affiliation(s)
- Taniya Sharma
  - Structural Biology Lab, Amity Institute of Biotechnology, Amity University Uttar Pradesh Sector-125, Expressway Highway Noida 201313 India +0120-4735600
- Nikita Kundu
  - Structural Biology Lab, Amity Institute of Biotechnology, Amity University Uttar Pradesh Sector-125, Expressway Highway Noida 201313 India +0120-4735600
- Sarvpreet Kaur
  - Structural Biology Lab, Amity Institute of Biotechnology, Amity University Uttar Pradesh Sector-125, Expressway Highway Noida 201313 India +0120-4735600
- Amlan Chakraborty
  - Division of Immunology, Immunity to Infection and Respiratory Medicine (DIIRM), School of Biological Sciences, University of Manchester Manchester England
- Aman Kumar Mahto
  - Department of Pharmaceutical Chemistry, School of Pharmaceutical Education and Research, Jamia Hamdard New Delhi India
- Rikeshwer Prasad Dewangan
  - Department of Pharmaceutical Chemistry, School of Pharmaceutical Education and Research, Jamia Hamdard New Delhi India
- Jadala Shankaraswamy
  - Department of Fruit Science, College of Horticulture, Sri Konda Laxman Telangana State Horticultural University Mojerla 509382 Telangana India
- Sarika Saxena
  - Structural Biology Lab, Amity Institute of Biotechnology, Amity University Uttar Pradesh Sector-125, Expressway Highway Noida 201313 India +0120-4735600
5
Grübner M, Dunkel A, Steiner F, Hofmann T. Systematic Evaluation of Liquid Chromatography (LC) Column Combinations for Application in Two-Dimensional LC Metabolomic Studies. Anal Chem 2021; 93:12565-12573. [PMID: 34491041 DOI: 10.1021/acs.analchem.1c01857] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
In comparison to proteomics, the application of two-dimensional liquid chromatography (2D LC) in the field of metabolomics is still premature. One reason might be the elevated chemical complexity and the associated challenge of selecting proper separation conditions in each dimension. As orthogonality of dimensions is a major issue, the present study aimed to identify successful stationary phase combinations. To determine the degree of orthogonality, first, six different metrics, namely, Pearson's correlation coefficient (1 - |R|), the nearest-neighbor distances (H̅NND), the "asterisk equations" (AO), and surface coverage by bins (SCG), convex hulls (SCCH), and α-convex hulls (SCαH), were critically assessed by 15 artificial 2D data sets, and a systematic parameter optimization of α-convex hulls was conducted. SCG, SCαH with α = 0.1, and H̅NND generated valid results with sensitivity toward space utilization and data distribution and, therefore, were applied to pairs of experimental retention time sets obtained for >350 metabolites, selected to represent the chemical space of human urine. Normalized retention data were obtained for 23 chromatographic setups, comprising reversed-phase (RP), hydrophilic interaction liquid chromatography (HILIC), and mixed-mode separation systems with an ion exchange (IEX) contribution. As expected, no single LC setting provided separation of all considered analytes, but while conventional RP×HILIC combinations appeared complementary rather than orthogonal, the incorporation of IEX properties into the RP dimension substantially increased the 2D potential. Eventually, one of the most promising column combinations was implemented for an offline 2D LC time-of-flight mass spectrometry analysis of a lyophilized urine sample. Targeted screening resulted in a total of 164 detected metabolites and confirmed the outstanding coverage of the 2D retention space.
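As a rough illustration of the simplest of the six metrics named in this abstract, the Pearson-based orthogonality score 1 - |R| can be computed from two lists of normalized retention times. The sketch below is not the authors' code: the function name `pearson_orthogonality` and the toy retention values are hypothetical, and the other metrics (nearest-neighbor distances, asterisk equations, bin and hull coverage) are not shown.

```python
import math


def pearson_orthogonality(first_dim, second_dim):
    """Return 1 - |R| for paired retention times (hypothetical helper).

    A score near 0 means the two dimensions are strongly correlated;
    a score near 1 means there is no linear correlation.
    """
    if len(first_dim) != len(second_dim) or len(first_dim) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(first_dim)
    mean_x = sum(first_dim) / n
    mean_y = sum(second_dim) / n
    # Sums of centered products; the common 1/n factors cancel in R.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first_dim, second_dim))
    var_x = sum((x - mean_x) ** 2 for x in first_dim)
    var_y = sum((y - mean_y) ** 2 for y in second_dim)
    if var_x == 0 or var_y == 0:
        raise ValueError("retention times must vary in both dimensions")
    r = cov / math.sqrt(var_x * var_y)
    return 1.0 - abs(r)


# Toy normalized retention times (made-up values, not data from the study).
rp = [0.0, 0.0, 1.0, 1.0]
hilic = [0.0, 1.0, 0.0, 1.0]
print(pearson_orthogonality(rp, rp))     # identical dimensions: score near 0
print(pearson_orthogonality(rp, hilic))  # linearly uncorrelated: score near 1
```

Note that the abstract reports that the bin-coverage, α-convex-hull, and nearest-neighbor metrics, rather than this correlation-based score, were the ones carried forward to the experimental data.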
Affiliation(s)
- Maria Grübner
  - Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Straße 34, Freising 85354, Germany
  - Thermo Fisher Scientific, Dornierstraße 4, Germering 82110, Germany
- Andreas Dunkel
  - Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Straße 34, Freising 85354, Germany
  - Leibniz-Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, Freising 85354, Germany
- Frank Steiner
  - Thermo Fisher Scientific, Dornierstraße 4, Germering 82110, Germany
- Thomas Hofmann
  - Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Straße 34, Freising 85354, Germany
6
Quan L, Dong R, Yang W, Chen L, Lang J, Liu J, Song Y, Ma S, Yang J, Wang W, Meng B, Tian G. Simultaneous detection and comprehensive analysis of HPV and microbiome status of a cervical liquid-based cytology sample using Nanopore MinION sequencing. Sci Rep 2019; 9:19337. [PMID: 31852945 PMCID: PMC6920169 DOI: 10.1038/s41598-019-55843-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/24/2019] [Accepted: 11/21/2019] [Indexed: 02/06/2023] Open
Abstract
Human papillomavirus (HPV) is a major pathogen that causes cervical cancer and many other related diseases. The cervical microbiome associated with HPV infection could be an inducing factor of cervical cancer. However, it is uncommon to find a single test on the market that can simultaneously provide information on both HPV and the microbiome. Herein, a novel method was developed to simultaneously detect HPV infection and microbiota composition promptly and accurately. It provides a new and simple way to assess vaginal pathogen status and also provides valuable information for clinical diagnosis. This approach combined multiplex PCR, which targeted both HPV16 E6E7 and full-length 16S rRNA, with Nanopore sequencing to generate enough information to understand the vaginal condition of patients. One HPV-positive liquid-based cytology (LBC) sample was sequenced and analyzed. Compared with Illumina sequencing, the results from Nanopore showed a similar microbiome composition. An instant sequencing evaluation showed that 15 min of sequencing is enough to identify the top 10 most abundant bacteria. Moreover, two HPV integration sites were identified and verified by Sanger sequencing. This approach has many potential applications in pathogen detection and can potentially aid in providing a more rapid clinical diagnosis.
Affiliation(s)
- Lili Quan
  - Department of Gynaecology and Obstetrics, Sanmenxia Central Hospital of Henan University of Science and Technology, Sanmenxia, 472000, Henan, China
- Ruyi Dong
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
- Lanyou Chen
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
- Jidong Lang
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
- Jia Liu
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
- Yu Song
  - Department of Gynaecology and Obstetrics, Sanmenxia Central Hospital of Henan University of Science and Technology, Sanmenxia, 472000, Henan, China
- Shuiqing Ma
  - Department of Gynaecology and Obstetrics, Peking Union Medical College Hospital, Beijing, 100730, China
- Weiwei Wang
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
- Bo Meng
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
- Geng Tian
  - Geneis (Beijing) Co.Ltd, Beijing, 100102, China
7
Ruan R, Jiang Z, Wu Y, Xu M, Ni J. High-throughput sequence analysis reveals variation in the relative abundance of components of the bacterial and fungal microbiota in the rhizosphere of Ginkgo biloba. PeerJ 2019; 7:e8051. [PMID: 31741799 PMCID: PMC6859886 DOI: 10.7717/peerj.8051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/11/2019] [Accepted: 10/17/2019] [Indexed: 11/20/2022] Open
Abstract
Background: The narrow region of soil, in contact with and directly influenced by plant roots, is called the rhizosphere. Microbes living in the rhizosphere are considered to be important factors for the normal growth and development of plants. In this research, the structural and functional diversities of microbiota between the Ginkgo biloba root rhizosphere and the corresponding bulk soil were investigated. Methods: Three independent replicate sites were selected, and triplicate soil samples were collected from the rhizosphere and the bulk soil at each sampling site. The communities of bacteria and fungi were investigated using high-throughput sequencing of the 16S rRNA gene and the internal transcribed spacer (ITS) of the rRNA gene, respectively. Results: A number of bacterial genera showed significantly different abundance in the rhizosphere compared to the bulk soil, including Bradyrhizobium, Rhizobium, Sphingomonas, Streptomyces and Nitrospira. Functional enrichment analysis of bacterial microbiota revealed consistently increased abundance of ATP-binding cassette (ABC) transporters and decreased abundance of two-component systems in the rhizosphere community, compared to the bulk soil community. In contrast, the situation was more complex and inconsistent for fungi, indicating the independency of the rhizosphere fungal community on the local microenvironment.
Affiliation(s)
- Rujue Ruan
  - Hangzhou Normal University, Key Laboratory of Hangzhou City for Quality and Safety of Agricultural Products, College of Life and Environmental Sciences, Hangzhou, China
  - Hangzhou Normal University, Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou, China
- Zhifang Jiang
  - Hangzhou Normal University, Key Laboratory of Hangzhou City for Quality and Safety of Agricultural Products, College of Life and Environmental Sciences, Hangzhou, China
  - Hangzhou Normal University, Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou, China
- Yuhuan Wu
  - Hangzhou Normal University, Key Laboratory of Hangzhou City for Quality and Safety of Agricultural Products, College of Life and Environmental Sciences, Hangzhou, China
  - Hangzhou Normal University, Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou, China
- Maojun Xu
  - Hangzhou Normal University, Key Laboratory of Hangzhou City for Quality and Safety of Agricultural Products, College of Life and Environmental Sciences, Hangzhou, China
  - Hangzhou Normal University, Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou, China
- Jun Ni
  - Hangzhou Normal University, Key Laboratory of Hangzhou City for Quality and Safety of Agricultural Products, College of Life and Environmental Sciences, Hangzhou, China
  - Hangzhou Normal University, Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou, China
8
Xaxiri NA, Nikouli E, Berillis P, Kormas KA. Bacterial biofilm development during experimental degradation of Melicertus kerathurus exoskeleton in seawater. AIMS Microbiol 2019; 4:397-412. [PMID: 31294223 PMCID: PMC6604942 DOI: 10.3934/microbiol.2018.3.397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/14/2018] [Accepted: 05/29/2018] [Indexed: 11/28/2022] Open
Abstract
Chitinolytic bacteria are widespread in marine and terrestrial environments, reflecting the ubiquity on our planet of their principal growth substrate, chitin. In this paper, we investigated the development of naturally occurring bacterial biofilms on the exoskeleton of the shrimp Melicertus kerathurus during its degradation in sea water. During a 12-day experiment with exoskeleton fragments in batch cultures containing only sea water as the growth medium at 18 °C in darkness, we analysed the formation and succession of biofilms by scanning electron microscopy and 16S rRNA gene diversity by next generation sequencing. Bacteria belonging to the γ- and α-Proteobacteria and Bacteroidetes showed marked changes (a decrease or increase of more than 10%) in their relative abundance from the beginning of the experiment. The bacterial taxa related to known chitinolytic bacteria were Pseudoalteromonas porphyrae, Halomonas aquamarina, Reinekea aestuarii, Colwellia asteriadis and Vibrio crassostreae. These bacteria could be considered appropriate candidates for the degradation of chitinous crustacean waste from the seafood industry, as they dominated the biofilms that developed on the shrimp's exoskeleton in natural sea water with no added substrates, and degradation of the shrimp exoskeleton was also evidenced.
Affiliation(s)
- Nikolina-Alexandra Xaxiri
  - Department of Ichthyology & Aquatic Environment, School of Agricultural Sciences, University of Thessaly, 38446 Volos, Greece
- Eleni Nikouli
  - Department of Ichthyology & Aquatic Environment, School of Agricultural Sciences, University of Thessaly, 38446 Volos, Greece
- Panagiotis Berillis
  - Department of Ichthyology & Aquatic Environment, School of Agricultural Sciences, University of Thessaly, 38446 Volos, Greece
- Konstantinos Ar Kormas
  - Department of Ichthyology & Aquatic Environment, School of Agricultural Sciences, University of Thessaly, 38446 Volos, Greece
9
Galbraith H, Iwanowicz D, Spooner D, Iwanowicz L, Keller D, Zelanko P, Adams C. Exposure to synthetic hydraulic fracturing waste influences the mucosal bacterial community structure of the brook trout (Salvelinus fontinalis) epidermis. AIMS Microbiol 2018; 4:413-427. [PMID: 31294224 PMCID: PMC6604949 DOI: 10.3934/microbiol.2018.3.413] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/08/2018] [Accepted: 05/16/2018] [Indexed: 12/14/2022] Open
Abstract
Production of natural gas using unconventional technologies has risen as demand for alternative fuels has increased. Impacts on the environment from waste generated from these processes are largely unexplored. In particular, the outcomes of organismal exposure to hydraulic fracturing waste have not been rigorously evaluated. We evaluated the effects of exposure to surrogate hydraulic fracturing waste (HF waste) on mucosal bacterial community structure of the brook trout (Salvelinus fontinalis) epidermis. Brook trout are fish native to streams at risk to HF waste exposure. Here, fish were exposed to four treatments (control, 0.00%; low, 0.01%; medium, 0.10%; and high, 1.0% concentrations) of surrogate HF waste synthesized to mimic concentrations documented in the field. Epidermal mucus samples were collected and assessed 15 days post-exposure to determine if the associated bacterial community varied among treatments. We observed differences in epidermal mucosal bacterial community composition at multiple taxonomic scales among treatments. These community changes reflected compositional differences in taxa dominance and community similarity rather than losses or gains in taxonomic richness. The dominant bacterial genus that explained the greatest variation in community structure between exposed and unexposed fish was Flavobacterium. Two genera associated with salmonid diseases, Flavobacterium and Pseudomonas, were statistically more abundant in high treatments than controls. These results suggest that exposure to low levels of HF waste influences bacterial colonization and may lead to a disruption that favors bacterial populations associated with fish disease.
Affiliation(s)
- Heather Galbraith
  - U.S. Geological Survey, Leetown Science Center, Northern Appalachian Research Laboratory, 176 Straight Run Road, Wellsboro, PA, USA
- Deborah Iwanowicz
  - U.S. Geological Survey, Leetown Science Center, National Fish Health Research Laboratory, 11649 Leetown Road, Kearneysville, WV, USA
- Daniel Spooner
  - U.S. Geological Survey, Leetown Science Center, Northern Appalachian Research Laboratory, 176 Straight Run Road, Wellsboro, PA, USA
  - George Mason University, Department of Environmental Science and Policy, 4400 University Drive, Fairfax, VA, USA
- Luke Iwanowicz
  - U.S. Geological Survey, Leetown Science Center, National Fish Health Research Laboratory, 11649 Leetown Road, Kearneysville, WV, USA
- David Keller
  - The Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Pkwy, Philadelphia, PA, USA
- Paula Zelanko
  - George Mason University, Department of Environmental Science and Policy, 4400 University Drive, Fairfax, VA, USA
- Cynthia Adams
  - U.S. Geological Survey, Leetown Science Center, National Fish Health Research Laboratory, 11649 Leetown Road, Kearneysville, WV, USA
10
Bell MJ, Lord P. On patterns and re-use in bioinformatics databases. Bioinformatics 2018; 33:2731-2736. [PMID: 28525546 PMCID: PMC5860070 DOI: 10.1093/bioinformatics/btx310] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/08/2016] [Accepted: 05/16/2017] [Indexed: 11/20/2022] Open
Abstract
Motivation: As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. Results: We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. Availability and implementation: Analytical software is available on request.
Affiliation(s)
- Michael J Bell
  - School of Computing Science, Newcastle University, Newcastle-upon-Tyne, UK
- Phillip Lord
  - School of Computing Science, Newcastle University, Newcastle-upon-Tyne, UK
11
Imker HJ. 25 Years of Molecular Biology Databases: A Study of Proliferation, Impact, and Maintenance. Front Res Metr Anal 2018. [DOI: 10.3389/frma.2018.00018] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022] Open
12
Automated extraction of potential migraine biomarkers using a semantic graph. J Biomed Inform 2017; 71:178-189. [PMID: 28579531 DOI: 10.1016/j.jbi.2017.05.018] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/29/2016] [Revised: 04/03/2017] [Accepted: 05/23/2017] [Indexed: 01/20/2023]
Abstract
PROBLEM: Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers. METHOD: We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance. RESULTS: Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000, with 547 out of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that were not in the top of the list, an extensive error analysis has been performed. When evaluating the overall performance, we obtained a ROC-AUC of 0.974. DISCUSSION: Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
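The two coverage percentages quoted in the results can be re-derived from the counts given there. The short check below is only an arithmetic sketch; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract above.
reference_compounds = 222
compounds_in_top_2000 = 163
total_weight_points = 644
weight_points_in_top_2000 = 547

# Shares expressed as percentages.
compound_share = 100 * compounds_in_top_2000 / reference_compounds
weight_share = 100 * weight_points_in_top_2000 / total_weight_points

print(f"{compound_share:.1f}% of reference compounds")  # 73.4%
print(f"{weight_share:.1f}% of weight points")  # 84.9%
```

Rounded to whole numbers, these agree with the 73% and 85% reported in the abstract.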
13
Brown JAL. Evaluating the effectiveness of a practical inquiry-based learning bioinformatics module on undergraduate student engagement and applied skills. Biochem Mol Biol Educ 2016; 44:304-13. [PMID: 27161812 DOI: 10.1002/bmb.20954] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/25/2015] [Revised: 11/20/2015] [Accepted: 12/08/2015] [Indexed: 05/27/2023]
Abstract
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion, qualitative student-based module evaluation and the novelty, scientific validity and quality of written student reports. Bioinformatics is often the starting point for laboratory-based research projects, therefore high importance was placed on allowing students to individually develop and apply processes and methods of scientific research. Students led a bioinformatic inquiry-based project (within a framework of inquiry), discovering, justifying and exploring individually discovered research targets. Detailed assessable reports were produced, displaying data generated and the resources used. Mimicking research settings, undergraduates were divided into small collaborative groups, with distinctive central themes. The module was evaluated by assessing the quality and originality of the students' targets through reports, reflecting students' use and understanding of concepts and tools required to generate their data. Furthermore, evaluation of the bioinformatic module was assessed semi-quantitatively using pre- and post-module quizzes (a non-assessable activity, not contributing to their grade), which incorporated process- and content-specific questions (indicative of their use of the online tools). Qualitative assessment of the teaching intervention was performed using post-module surveys, exploring student satisfaction and other module specific elements. Overall, a positive experience was found, as was a post module increase in correct process-specific answers. In conclusion, an inquiry-based peer-assisted learning module increased students' engagement, practical bioinformatic skills and process-specific knowledge. 
© 2016 by The International Union of Biochemistry and Molecular Biology, 44:304-313 2016.
Affiliation(s)
- James A L Brown
- Department of Biochemistry, School of Natural Sciences, National University of Ireland Galway, Ireland and Discipline of Surgery, School of Medicine, Lambe Institute for Translational Research, National University of Ireland Galway, Ireland
|
14
|
Yohe SL, Carter AB, Pfeifer JD, Crawford JM, Cushman-Vokoun A, Caughron S, Leonard DGB. Standards for Clinical Grade Genomic Databases. Arch Pathol Lab Med 2016; 139:1400-12. [PMID: 26516938 DOI: 10.5858/arpa.2014-0568-cp] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
CONTEXT Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. OBJECTIVES To define CGGD and the categories of information contained in CGGDs and to frame recommendations for the structure and use of these databases in clinical patient care. DESIGN Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. RESULTS Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. CONCLUSION These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.
Affiliation(s)
- Debra G B Leonard
- From the Department of Laboratory Medicine and Pathology, University of Minnesota Medical Center, Minneapolis (Dr Yohe); the Department of Pathology and Laboratory Medicine and the Department of Biomedical Informatics, Emory University, Atlanta, Georgia (Dr Carter); the Department of Pathology, Washington University School of Medicine, St. Louis, Missouri (Dr Pfeifer); the Department of Pathology and Laboratory Medicine, Hofstra North Shore-Long Island Jewish School of Medicine, Hempstead, New York (Dr Crawford); the Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha (Dr Cushman-Vokoun); the MAWD Pathology Group, North Kansas City, Missouri (Dr Caughron); and the Department of Pathology and Laboratory Medicine, University of Vermont College of Medicine, Burlington (Dr Leonard)
|
15
|
Kerksick CM, Tsatsakis AM, Hayes AW, Kafantaris I, Kouretas D. How can bioinformatics and toxicogenomics assist the next generation of research on physical exercise and athletic performance. J Strength Cond Res 2015; 29:270-8. [PMID: 25353080 DOI: 10.1519/jsc.0000000000000730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
The past 2-3 decades have seen an explosion in analytical areas related to "omic" technologies. These advancements have reached a point where their application can be and are being used as a part of exercise physiology and sport performance research. Such advancements have drastically enabled researchers to analyze extremely large groups of data that can provide amounts of information never before made available. Although these "omic" technologies offer exciting possibilities, the analytical costs and time required to complete the statistical approaches are substantial. The areas of exercise physiology and sport performance continue to witness an exponential growth of published studies using any combination of these techniques. Because more investigators within these traditionally applied science disciplines use these approaches, the need for efficient, thoughtful, and accurate extraction of information from electronic databases is paramount. As before, these disciplines can learn much from other disciplines who have already developed software and technologies to rapidly enhance the quality of results received when searching for key information. In addition, further development and interest in areas such as toxicogenomics could aid in the development and identification of more accurate testing programs for illicit drugs, performance enhancing drugs abused in sport, and better therapeutic outcomes from prescribed drug use. This review is intended to offer a discussion related to how bioinformatics approaches may assist the new generation of "omic" research in areas related to exercise physiology and toxicogenomics. Consequently, more focus will be placed on popular tools that are already available for analyzing such complex data and highlighting additional strategies and considerations that can further aid in developing new tools and data management approaches to assist future research in this field. 
It is our contention that introducing more scientists to how this type of work can complement existing experimental approaches within exercise physiology and sport performance will foster additional discussion and stimulate new research in these areas.
Affiliation(s)
- Chad M Kerksick
- (1) Department of Exercise Science, School of Sport, Recreation and Exercise Sciences, Lindenwood University, St. Charles, Missouri; (2) Department of Forensic Sciences and Toxicology, Laboratory of Toxicology, Medical School, University of Crete, Heraklion, Greece; (3) Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts; (4) Spherix Consulting, Inc., Bethesda, Maryland; and (5) Department of Biochemistry and Biotechnology, University of Thessaly, Larissa, Greece
|
16
|
Preissner S, Kostka E, Mokross M, Kersten NV, Blunck U, Preissner R. DBEndo: a web-based endodontic case management tool. BMC Res Notes 2015; 8:685. [PMID: 26577058 PMCID: PMC4650323 DOI: 10.1186/s13104-015-1680-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2013] [Accepted: 11/06/2015] [Indexed: 11/30/2022] Open
Abstract
Background The success of endodontic treatment depends, among many other factors, on good documentation. Paper-based records are often difficult to read or incomplete, and commercially available tools focus on billing. An electronic record captures the state of treatment at all times. Databases are a common tool in everyday life. Results Here, we present a database created for the Charité-Universitätsmedizin Berlin, Germany. Through consistent digital documentation, data analytics of patients, root canal anatomies, instrumentation techniques, efficacy of chemical disinfection, root filling techniques, and corresponding recall success rates, which needed extensive research before, are now easy to perform. Tables and even graphics and data analytics are only one click away and can be exported to other programs. Conclusions DBEndo is a database to store and visualise internally, as well as to share endodontic cases online. For academic use we provide the database including all forms and some anonymous data for free at: http://dbendo.charite.de. Through easy import and export of the data, the system is open and flexible.
Affiliation(s)
- Saskia Preissner
- Department of Operative and Preventive Dentistry, Charité-Universitätsmedizin Berlin, Assmannshauser Straße 4-6, 14197, Berlin, Germany.
- Eckehard Kostka
- Department of Operative and Preventive Dentistry, Charité-Universitätsmedizin Berlin, Assmannshauser Straße 4-6, 14197, Berlin, Germany.
- Mareike Mokross
- Institute for Physiology and DKTK, Charité-Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany.
- Nina V Kersten
- Institute for Physiology and DKTK, Charité-Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany.
- Uwe Blunck
- Department of Operative and Preventive Dentistry, Charité-Universitätsmedizin Berlin, Assmannshauser Straße 4-6, 14197, Berlin, Germany.
- Robert Preissner
- Institute for Physiology and DKTK, Charité-Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany.
|
17
|
Karbhal R, Sawant S, Kulkarni-Kale U. BioDB extractor: customized data extraction system for commonly used bioinformatics databases. BioData Min 2015; 8:31. [PMID: 26516349 PMCID: PMC4624652 DOI: 10.1186/s13040-015-0067-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/19/2015] [Accepted: 10/20/2015] [Indexed: 12/15/2022] Open
Abstract
Background Diverse types of biological data, primary as well as derived, are available in various formats and are stored in heterogeneous resources. Database-specific as well as integrated search engines are available for carrying out efficient searches of databases. These search engines, however, do not support extraction of subsets of data with the same level of granularity that exists in typical database entries. In order to extract fine-grained subsets of data, users are required to download complete or partial database entries and write scripts for parsing and extraction. Results BioDBExtractor (BDE) has been developed to provide 26 customized data extraction utilities for some of the commonly used databases such as ENA (EMBL-Bank), UniprotKB, PDB, and KEGG. BDE eliminates the need for downloading entries and writing scripts. BDE has a simple web interface that enables users to input queries in the form of accession numbers/ID codes, choose utilities, and select fields/subfields of data. Conclusions BDE thus provides a common data extraction platform for multiple databases and is useful to both novice and expert users. BDE, however, is not a substitute for basic keyword-based database searches. Desired subsets of data compiled using BDE can subsequently be used for downstream processing, analyses and knowledge discovery. Availability BDE can be accessed from http://bioinfo.net.in/BioDB/Home.html.
Affiliation(s)
- Rajiv Karbhal
- Bioinformatics Centre, Savitribai Phule Pune University, Ganeshkhind, Pune, 411007 Maharashtra India
- Sangeeta Sawant
- Bioinformatics Centre, Savitribai Phule Pune University, Ganeshkhind, Pune, 411007 Maharashtra India
- Urmila Kulkarni-Kale
- Bioinformatics Centre, Savitribai Phule Pune University, Ganeshkhind, Pune, 411007 Maharashtra India
|
18
|
Jelokhani-Niaraki S, Tahmoorespur M, Minuchehr Z, Nassiri MR. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes. Genomics Inform 2015; 13:7-14. [PMID: 25873847 PMCID: PMC4394238 DOI: 10.5808/gi.2015.13.1.7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/06/2014] [Revised: 01/16/2015] [Accepted: 01/28/2015] [Indexed: 11/20/2022] Open
Abstract
During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.
Affiliation(s)
- Saber Jelokhani-Niaraki
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad 91775-1163, Iran
- Mojtaba Tahmoorespur
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad 91775-1163, Iran
- Zarrin Minuchehr
- National Institute of Genetic Engineering and Biotechnology, Tehran 14965-161, Iran
- Mohammad Reza Nassiri
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad 91775-1163, Iran
|
19
|
Zamah AM, Hassis ME, Albertolle ME, Williams KE. Proteomic analysis of human follicular fluid from fertile women. Clin Proteomics 2015; 12:5. [PMID: 25838815 PMCID: PMC4357057 DOI: 10.1186/s12014-015-9077-6] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Received: 08/07/2014] [Accepted: 02/09/2015] [Indexed: 01/08/2023] Open
Abstract
Background Follicular fluid is a unique biological fluid in which the critical events of oocyte and follicular maturation and somatic cell-germ cell communication occur. Because of the intimate proximity of follicular fluid to the maturing oocyte, this fluid provides a unique window into the processes occurring during follicular maturation. A thorough identification of the specific components within follicular fluid may provide a better understanding of intrafollicular signaling, as well as reveal potential biomarkers of oocyte health for women undergoing assisted reproductive treatment. In this study, we used high and low pH HPLC peptide separations followed by mass spectrometry to perform a comprehensive proteomic analysis of human follicular fluid from healthy ovum donors. Next, using samples from a second set of patients, an isobaric mass tagging strategy for quantitative analysis was used to identify proteins with altered abundances after hCG treatment. Results A total of 742 follicular fluid proteins were identified in healthy ovum donors, including 413 that have not been previously reported. The proteins belong to diverse functional groups including insulin growth factor and insulin growth factor binding protein families, growth factor and related proteins, receptor signaling, defense/immunity, anti-apoptotic proteins, matrix metalloprotease related proteins, and complement activity. In a quantitative analysis, follicular fluid samples from age-matched women undergoing in vitro fertilization oocyte retrieval were compared and 17 follicular fluid proteins were found at significantly altered levels (p < 0.05) between pre-hCG and post-hCG samples. These proteins belong to a variety of functional processes, including protease inhibition, inflammation, and cell adhesion. 
Conclusions This database of FF proteins significantly extends the known protein components present during the peri-ovulatory period and provides a useful basis for future studies comparing follicular fluid proteomes in various fertility, disease, and environmental exposure conditions. We identified 17 differentially expressed proteins after hCG treatment and together these data showed the feasibility for defining biomarkers that illuminate how the ovarian follicle microenvironment is altered in various infertility-related conditions. Electronic supplementary material The online version of this article (doi:10.1186/s12014-015-9077-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alberuni M Zamah
- Department of Obstetrics and Gynecology, Division of Reproductive Endocrinology and Infertility, University of Illinois at Chicago College of Medicine, Chicago, IL 60612 USA
- Maria E Hassis
- Sandler-Moore Mass Spectrometry Core Facility, University of California at San Francisco, San Francisco, CA 94143 USA
- Matthew E Albertolle
- Sandler-Moore Mass Spectrometry Core Facility, University of California at San Francisco, San Francisco, CA 94143 USA
- Katherine E Williams
- Sandler-Moore Mass Spectrometry Core Facility, University of California at San Francisco, San Francisco, CA 94143 USA; Center for Reproductive Sciences and the Department of Obstetrics and Gynecology, University of California at San Francisco, San Francisco, CA 94143 USA
|
20
|
Yu Q, Ding Y, Song M, Song S, Liu J, Zhang B. Tracing database usage: Detecting main paths in database link networks. J Informetr 2015. [DOI: 10.1016/j.joi.2014.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/17/2022]
|
21
|
Olsen LR, Campos B, Barnkob MS, Winther O, Brusic V, Andersen MH. Bioinformatics for cancer immunotherapy target discovery. Cancer Immunol Immunother 2014; 63:1235-49. [PMID: 25344903 PMCID: PMC11029190 DOI: 10.1007/s00262-014-1627-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/02/2013] [Accepted: 10/08/2014] [Indexed: 12/13/2022]
Abstract
The mechanisms of immune response to cancer have been studied extensively and great effort has been invested into harnessing the therapeutic potential of the immune system. Immunotherapies have seen significant advances in the past 20 years, but the full potential of protective and therapeutic cancer immunotherapies has yet to be fulfilled. The insufficient efficacy of existing treatments can be attributed to a number of biological and technical issues. In this review, we detail the current limitations of immunotherapy target selection and design, and review computational methods to streamline therapy target discovery in a bioinformatics analysis pipeline. We describe specialized bioinformatics tools and databases for three main bottlenecks in immunotherapy target discovery: the cataloging of potentially antigenic proteins, the identification of potential HLA binders, and the selection of epitopes and co-targets for single-epitope and multi-epitope strategies. We provide examples of application to the well-known tumor antigen HER2 and suggest bioinformatics methods to ameliorate therapy resistance and ensure efficient and lasting control of tumors.
Affiliation(s)
- Lars Rønn Olsen
- Department of Biology, Bioinformatics Centre, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
|
22
|
Möller S, Afgan E, Banck M, Bonnal RJP, Booth T, Chilton J, Cock PJA, Gumbel M, Harris N, Holland R, Kalaš M, Kaján L, Kibukawa E, Powel DR, Prins P, Quinn J, Sallou O, Strozzi F, Seemann T, Sloggett C, Soiland-Reyes S, Spooner W, Steinbiss S, Tille A, Travis AJ, Guimera R, Katayama T, Chapman BA. Community-driven development for computational biology at Sprints, Hackathons and Codefests. BMC Bioinformatics 2014; 15 Suppl 14:S7. [PMID: 25472764 PMCID: PMC4255748 DOI: 10.1186/1471-2105-15-s14-s7] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/23/2022] Open
Abstract
Background Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.
|
23
|
Abstract
Purpose
– The purpose of this paper is to model trails, that is, search patterns that span several search systems across a heterogeneous information environment. In addition, the author seeks to examine what kinds of trails occur in routine, semi-complex and complex tasks, and what barrier types occur during trail-blazing.
Design/methodology/approach
– The author used a qualitative task-based approach, shadowing six molecular medicine researchers over six months and collecting their web interaction logs. Data triangulation made this kind of detailed search system integration analysis possible.
Findings
– Five trail patterns emerged: branches, chains, lists, singles and berrypicking trails. Berrypicking was typical of complex work tasks, whereas branches were common in routine work tasks. Singles and lists were typically employed in semi-complex tasks. In all kinds of trails, barriers often occurred during interaction with a single system, but a considerable number of barriers arose from malfunctioning system integration and missing integration features. The findings suggest that trails could be used to reduce the amount of laborious manual system integration, and that the explorative search process in berrypicking trails needs support.
Originality/value
– Research on information behaviour that yields different types of search patterns across several search systems during real-world work task performance in molecular medicine has not been published previously. The author presents a task-based approach for modelling search behaviour patterns. The author discusses the issue of system integration, which is a great challenge in the biomedical domain, from the viewpoints of information studies and search behaviour.
|
24
|
D'Angelo G, Rampone S. Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications. BMC Bioinformatics 2014; 15 Suppl 5:S2. [PMID: 25077818 PMCID: PMC4095002 DOI: 10.1186/1471-2105-15-s5-s2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
Background The huge quantity of data produced in biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), which form a probability distribution, allowing the selection of the term literals. Its great versatility makes U-BRAIN applicable in many fields in which there are data to be analyzed. However, the memory and the execution time required to run it are of order O(n^3) and O(n^5), respectively, so the algorithm is unaffordable for huge data sets. Results We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. 
According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together with a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing system MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. Conclusions In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice-junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. 
A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one.
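As a reading aid for the abstract above, the following minimal Python sketch shows what it means for a DNF formula to be consistent with positive and negative instances. It is not the U-BRAIN algorithm and does not compute relevances; the names (Term, term_satisfied, dnf_satisfied, is_consistent), the string encoding of instances, and the treatment of a missing bit ('?') as never satisfying a literal are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical encoding: a term maps attribute index -> required bit ('0' or '1').
Term = Dict[int, str]


def term_satisfied(term: Term, instance: str) -> bool:
    """A conjunctive term holds only if every literal matches a known bit.

    Assumption: a missing bit ('?') never satisfies a literal.
    """
    return all(instance[i] == bit for i, bit in term.items())


def dnf_satisfied(formula: List[Term], instance: str) -> bool:
    """A DNF formula holds if at least one of its terms holds."""
    return any(term_satisfied(term, instance) for term in formula)


def is_consistent(formula: List[Term], positives: List[str], negatives: List[str]) -> bool:
    """Consistent = accepts every positive instance and rejects every negative one."""
    accepts_all_positives = all(dnf_satisfied(formula, p) for p in positives)
    rejects_all_negatives = not any(dnf_satisfied(formula, n) for n in negatives)
    return accepts_all_positives and rejects_all_negatives


# Formula (x0 AND NOT x1) OR x2 over three attributes.
formula = [{0: "1", 1: "0"}, {2: "1"}]
print(is_consistent(formula, positives=["10?", "??1"], negatives=["01?", "000"]))
```

A learner in the spirit of the abstract would search for a short formula for which this check succeeds; the check itself is the cheap part.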
|
25
|
A content and structural assessment of oxidative motifs across a diverse set of life forms. Comput Biol Med 2014; 53:179-89. [PMID: 25151511 DOI: 10.1016/j.compbiomed.2014.07.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/25/2014] [Revised: 06/14/2014] [Accepted: 07/16/2014] [Indexed: 11/24/2022]
Abstract
Exposure to weightlessness (microgravity) or other protein stresses is detrimental to animal and human protein tissue health. Protein damage has been associated with stress and is linked to aging and the onset of diseases such as Alzheimer's, Parkinson's, sepsis, and others. Protein stresses may cause alterations to physical protein structure, altering its functional identity. Alterations from stresses such as microgravity may be responsible for forms of muscle atrophy (as noted in returning astronauts); however, protein stresses come from other sources as well. Oxidative carbonylation is a protein stress which is a driving force behind protein decay and is attracted to protein segments enriched in R, K, P, T, E and S residues. Since mitochondria apply oxidative processes to produce ATP, their proteins may be placed in the same danger as those that are exposed to stresses. However, they do not appear to be impacted in the same way. Across 14 diverse organisms, we evaluate the coverage of motifs which are high in the amino acids thought to be affected by protein stresses such as oxidation. In this study, we examine RKPT and PEST motifs, which are both responsible for attracting forms of oxidation, across mitochondrial and non-mitochondrial proteins. We show that mitochondrial proteins have fewer of these oxidative sites compared to non-mitochondrial proteins. Additionally, we analyze the oxidative regions to determine that their motifs preferentially tend to make up the connection points between the four kinds of structures of folded proteins (helices, turns, sheets, and coils).
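The idea of motif "coverage" in the abstract above can be illustrated with a small sliding-window sketch in Python. This is not the authors' method: the function name motif_coverage, the window length of 10, and the 50% enrichment threshold are hypothetical choices made only to show how the fraction of a sequence lying in residue-enriched segments might be computed.

```python
def motif_coverage(sequence: str, residues: str = "RKPT",
                   window: int = 10, min_fraction: float = 0.5) -> float:
    """Fraction of positions lying in at least one window enriched in `residues`.

    Assumption: a window counts as enriched when at least `min_fraction`
    of its amino acids belong to `residues`.
    """
    seq = sequence.upper()
    n = len(seq)
    if n < window:
        return 0.0
    target = set(residues)
    covered = [False] * n
    for start in range(n - window + 1):
        hits = sum(1 for aa in seq[start:start + window] if aa in target)
        if hits / window >= min_fraction:
            # Mark every position of the enriched window as covered.
            for i in range(start, start + window):
                covered[i] = True
    return sum(covered) / n


# Toy sequences, not real proteins: compare an enriched and a depleted sequence.
print(motif_coverage("RKPTRAAAAAAAAAAAAAAA"))
print(motif_coverage("AAAAAAAAAAAAAAAAAAAA"))
```

Passing residues="PEST" gives the analogous measure for PEST-like segments; comparing the values across two sets of proteins mirrors the kind of comparison the abstract describes.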
|
26
|
Arend D, Lange M, Chen J, Colmsee C, Flemming S, Hecht D, Scholz U. e!DAL--a framework to store, share and publish research data. BMC Bioinformatics 2014; 15:214. [PMID: 24958009 PMCID: PMC4080583 DOI: 10.1186/1471-2105-15-214] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 01/07/2014] [Accepted: 06/12/2014] [Indexed: 11/10/2022] Open
Abstract
Background The life-science community faces a major challenge in handling “big data”, highlighting the need for high quality infrastructures capable of sharing and publishing research data. Data preservation, analysis, and publication are the three pillars in the “big data life cycle”. The infrastructures currently available for managing and publishing data are often designed to meet domain-specific or project-specific requirements, resulting in the repeated development of proprietary solutions and lower quality data publication and preservation overall. Results e!DAL is a lightweight software framework for publishing and sharing research data. Its main features are version tracking, metadata management, information retrieval, registration of persistent identifiers (DOI), an embedded HTTP(S) server for public data access, access as a network file system, and a scalable storage backend. e!DAL is available as an API for local non-shared storage and as a remote API featuring distributed applications. It can be deployed “out-of-the-box” as an on-site repository. Conclusions e!DAL was developed based on experiences coming from decades of research data management at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK). Initially developed as a data publication and documentation infrastructure for the IPK’s role as a data center in the DataCite consortium, e!DAL has grown towards being a general data archiving and publication infrastructure. The e!DAL software has been deployed into the Maven Central Repository. Documentation and Software are also available at: http://edal.ipk-gatersleben.de.
Affiliation(s)
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstr, 3, 06466 Stadt Seeland, Germany.
|
27
|
Li M, Chen YB, Clintworth WA. Expanding roles in a library-based bioinformatics service program: a case study. J Med Libr Assoc 2014; 101:303-9. [PMID: 24163602 DOI: 10.3163/1536-5050.101.4.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/20/2023] Open
Abstract
QUESTION How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community? SETTING A program at a health sciences library serving a large academic medical center with a strong research focus is described. METHODS The bioinformatics service program was established at the Norris Medical Library in 2005. As part of program development, the library assessed users' bioinformatics needs, acquired additional funds, established and expanded service offerings, and explored additional roles in promoting on-campus collaboration. RESULTS Personnel and software have increased along with the number of registered software users and use of the provided services. CONCLUSION With strategic efforts and persistent advocacy within the broader university environment, library-based bioinformatics service programs can become a key part of an institution's comprehensive solution to researchers' ever-increasing bioinformatics needs.
|
28
|
Bölling C, Weidlich M, Holzhütter HG. SEE: structured representation of scientific evidence in the biomedical domain using Semantic Web techniques. J Biomed Semantics 2014; 5:S1. [PMID: 25093070 PMCID: PMC4108886 DOI: 10.1186/2041-1480-5-s1-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accounts of evidence are vital to evaluate and reproduce scientific findings and integrate data on an informed basis. Currently, such accounts are often inadequate, unstandardized and inaccessible for computational knowledge engineering even though computational technologies, among them those of the semantic web, are ever more employed to represent, disseminate and integrate biomedical data and knowledge. RESULTS We present SEE (Semantic EvidencE), an RDF/OWL based approach for detailed representation of evidence in terms of the argumentative structure of the supporting background for claims even in complex settings. We derive design principles and identify minimal components for the representation of evidence. We specify the Reasoning and Discourse Ontology (RDO), an OWL representation of the model of scientific claims, their subjects, their provenance and their argumentative relations underlying the SEE approach. We demonstrate the application of SEE and illustrate its design patterns in a case study by providing an expressive account of the evidence for certain claims regarding the isolation of the enzyme glutamine synthetase. CONCLUSIONS SEE is suited to provide coherent and computationally accessible representations of evidence-related information such as the materials, methods, assumptions, reasoning and information sources used to establish a scientific finding by adopting a consistently claim-based perspective on scientific results and their evidence. SEE allows for extensible evidence representations, in which the level of detail can be adjusted and which can be extended as needed. It supports representation of arbitrarily many consecutive layers of interpretation and attribution and different evaluations of the same data. SEE and its underlying model could be a valuable component in a variety of use cases that require careful representation or examination of evidence for data presented on the semantic web or in other formats.
Affiliation(s)
- Christian Bölling
- Institute of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany
- Michael Weidlich
- Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
|
29
|
Adusumalli S, Mohd Omar MF, Soong R, Benoukraf T. Methodological aspects of whole-genome bisulfite sequencing analysis. Brief Bioinform 2014; 16:369-79. [DOI: 10.1093/bib/bbu016] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 02/24/2014] [Accepted: 04/17/2014] [Indexed: 12/17/2022] Open
|
30
|
Wittig U, Kania R, Bittkowski M, Wetsch E, Shi L, Jong L, Golebiewski M, Rey M, Weidemann A, Rojas I, Müller W. Data extraction for the reaction kinetics database SABIO-RK. Perspectives in Science 2014. [DOI: 10.1016/j.pisc.2014.02.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
|
31
|
Jung JY, DeLuca TF, Nelson TH, Wall DP. A literature search tool for intelligent extraction of disease-associated genes. J Am Med Inform Assoc 2014; 21:399-405. [PMID: 23999671 PMCID: PMC3994846 DOI: 10.1136/amiajnl-2012-001563] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/11/2012] [Revised: 07/15/2013] [Accepted: 08/08/2013] [Indexed: 12/27/2022] Open
Abstract
OBJECTIVE To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. METHODS We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. RESULTS We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. CONCLUSIONS We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
Affiliation(s)
- Jae-Yoon Jung
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Todd F DeLuca
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Tristan H Nelson
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- Dennis P Wall
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
|
32
|
Vitting-Seerup K, Porse BT, Sandelin A, Waage J. spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinformatics 2014; 15:81. [PMID: 24655717 PMCID: PMC3998036 DOI: 10.1186/1471-2105-15-81] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 11/15/2013] [Accepted: 03/17/2014] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND RNA-seq data is currently underutilized, in part because it is difficult to predict the functional impact of alternate transcription events. Recent software improvements in full-length transcript deconvolution prompted us to develop spliceR, an R package for classification of alternative splicing and prediction of coding potential. RESULTS spliceR uses the full-length transcript output from RNA-seq assemblers to detect single or multiple exon skipping, alternative donor and acceptor sites, intron retention, alternative first or last exon usage, and mutually exclusive exon events. For each of these events, spliceR also annotates the genomic coordinates of the differentially spliced elements, facilitating downstream sequence analysis. For each transcript, isoform fraction values are calculated to identify transcript switching between conditions. Lastly, spliceR predicts the coding potential, as well as the potential nonsense mediated decay (NMD) sensitivity, of each transcript. CONCLUSIONS spliceR is an easy-to-use tool that extends the usability of RNA-seq and assembly technologies by allowing greater depth of annotation of RNA-seq data. spliceR is implemented as an R package and is freely available from the Bioconductor repository (http://www.bioconductor.org/packages/2.13/bioc/html/spliceR.html).
Affiliation(s)
- Albin Sandelin
- Department of Biology, The Bioinformatics Centre, University of Copenhagen, Ole Maaloes Vej 5, Copenhagen, DK2200, Denmark.
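The spliceR abstract above mentions isoform fraction values used to spot transcript switching between conditions. The following is a minimal Python sketch of that idea, not the package's R implementation: it assumes the isoform fraction is an isoform's share of the summed expression of all isoforms of one gene, expressed here on a 0 to 1 scale. The function names and the toy expression values are hypothetical.

```python
from typing import Dict


def isoform_fractions(isoform_expr: Dict[str, float]) -> Dict[str, float]:
    """Share of a gene's total expression contributed by each isoform.

    Assumption: gene expression is the sum of its isoform expression values.
    """
    gene_total = sum(isoform_expr.values())
    if gene_total == 0:
        # An unexpressed gene has no meaningful fractions; report zeros.
        return {iso: 0.0 for iso in isoform_expr}
    return {iso: value / gene_total for iso, value in isoform_expr.items()}


def delta_isoform_fractions(cond_a: Dict[str, float],
                            cond_b: Dict[str, float]) -> Dict[str, float]:
    """Change in isoform fraction from condition A to condition B."""
    if_a = isoform_fractions(cond_a)
    if_b = isoform_fractions(cond_b)
    isoforms = sorted(set(if_a) | set(if_b))
    return {iso: if_b.get(iso, 0.0) - if_a.get(iso, 0.0) for iso in isoforms}


# Hypothetical expression values for two isoforms of one gene.
control = {"tx1": 8.0, "tx2": 2.0}
treated = {"tx1": 3.0, "tx2": 7.0}
changes = delta_isoform_fractions(control, treated)
for isoform, change in changes.items():
    print(isoform, round(change, 3))
```

A large change of opposite sign for two isoforms of the same gene, as in this toy case, is the kind of pattern that would suggest a switch in the dominant transcript.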
|
33
|
Canuel V, Rance B, Avillach P, Degoulet P, Burgun A. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinform 2014; 16:280-90. [PMID: 24608524 PMCID: PMC4364065 DOI: 10.1093/bib/bbu006] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Indexed: 12/31/2022] Open
Abstract
The rise of personalized medicine and the availability of high-throughput molecular analyses in the context of clinical care have increased the need for adequate tools for translational researchers to manage and explore these data. We reviewed the biomedical literature for translational platforms allowing the management and exploration of clinical and omics data, and identified seven platforms: BRISK, caTRIP, cBio Cancer Portal, G-DOC, iCOD, iDASH and tranSMART. We analyzed these platforms along seven major axes. (1) The community axis regrouped information regarding initiators and funders of the project, as well as availability status and references. (2) We regrouped under the information content axis the nature of the clinical and omics data handled by each system. (3) The privacy management environment axis encompassed functionalities allowing control over data privacy. (4) In the analysis support axis, we detailed the analytical and statistical tools provided by the platforms. We also explored (5) interoperability support and (6) system requirements. The final axis (7) platform support listed the availability of documentation and installation procedures. A large heterogeneity was observed in regard to the capability to manage phenotype information in addition to omics data, their security and interoperability features. The analytical and visualization features strongly depend on the considered platform. Similarly, the availability of the systems is variable. This review aims at providing the reader with the background to choose the platform best suited to their needs. To conclude, we discuss the desiderata for optimal translational research platforms, in terms of privacy, interoperability and technical features.
|
34
|
Katayama T, Wilkinson MD, Aoki-Kinoshita KF, Kawashima S, Yamamoto Y, Yamaguchi A, Okamoto S, Kawano S, Kim JD, Wang Y, Wu H, Kano Y, Ono H, Bono H, Kocbek S, Aerts J, Akune Y, Antezana E, Arakawa K, Aranda B, Baran J, Bolleman J, Bonnal RJ, Buttigieg PL, Campbell MP, Chen YA, Chiba H, Cock PJ, Cohen KB, Constantin A, Duck G, Dumontier M, Fujisawa T, Fujiwara T, Goto N, Hoehndorf R, Igarashi Y, Itaya H, Ito M, Iwasaki W, Kalaš M, Katoda T, Kim T, Kokubu A, Komiyama Y, Kotera M, Laibe C, Lapp H, Lütteke T, Marshall MS, Mori T, Mori H, Morita M, Murakami K, Nakao M, Narimatsu H, Nishide H, Nishimura Y, Nystrom-Persson J, Ogishima S, Okamura Y, Okuda S, Oshita K, Packer NH, Prins P, Ranzinger R, Rocca-Serra P, Sansone S, Sawaki H, Shin SH, Splendiani A, Strozzi F, Tadaka S, Toukach P, Uchiyama I, Umezaki M, Vos R, Whetzel PL, Yamada I, Yamasaki C, Yamashita R, York WS, Zmasek CM, Kawamoto S, Takagi T. BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains. J Biomed Semantics 2014; 5:5. [PMID: 24495517 PMCID: PMC3978116 DOI: 10.1186/2041-1480-5-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/16/2013] [Accepted: 11/26/2013] [Indexed: 01/24/2023] Open
Abstract
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Affiliation(s)
- Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 2-11-16, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
|
35
|
Masseroli M, Mons B, Bongcam-Rudloff E, Ceri S, Kel A, Rechenmann F, Lisacek F, Romano P. Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information. BMC Bioinformatics 2014; 15 Suppl 1:S2. [PMID: 24564249 PMCID: PMC4015876 DOI: 10.1186/1471-2105-15-s1-s2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023] Open
Abstract
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of the recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. Especially the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists. 
We point out as well how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.
Affiliation(s)
- Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, 20133, Italy
- Barend Mons
- Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Netherlands Bioinformatics Center, Nijmegen, 6500 HB, The Netherlands
- Erik Bongcam-Rudloff
- Department of Animal Breeding and Genetics, SLU-Global Bioinformatics Centre, Swedish University of Agricultural Sciences, Uppsala, 75124, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, 75108, Sweden
- Stefano Ceri
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, 20133, Italy
- Alexander Kel
- GeneXplain GmbH, Wolfenbüttel, 38302, Germany
- Institute of Chemical Biology and Fundamental Medicine SBRAS, Novosibirsk, 630090, Russia
- Frederique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211 Geneva 4, Switzerland
- Section of Biology, University of Geneva, 1211 Geneva 4, Switzerland
- Paolo Romano
- Biopolymers and Proteomics, IRCCS AOU San Martino IST, Genoa, 16132, Italy
|
36
|
Casado-Vela J, Fuentes M, Franco-Zorrilla JM. Screening of Protein–Protein and Protein–DNA Interactions Using Microarrays. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 95:231-81. [DOI: 10.1016/b978-0-12-800453-1.00008-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
|
37
|
Mudunuri US, Khouja M, Repetski S, Venkataraman G, Che A, Luke BT, Girard FP, Stephens RM. Knowledge and theme discovery across very large biological data sets using distributed queries: a prototype combining unstructured and structured data. PLoS One 2013; 8:e80503. [PMID: 24312478 PMCID: PMC3846626 DOI: 10.1371/journal.pone.0080503] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/02/2013] [Accepted: 10/03/2013] [Indexed: 11/21/2022] Open
Abstract
As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.
Affiliation(s)
- Uma S. Mudunuri
- Advanced Biomedical Computing Center, Information Systems Program, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Mohamad Khouja
- Oracle Corporation, Reston, Virginia, United States of America
- Anney Che
- Advanced Biomedical Computing Center, Information Systems Program, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Brian T. Luke
- Advanced Biomedical Computing Center, Information Systems Program, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Robert M. Stephens
- Advanced Biomedical Computing Center, Information Systems Program, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
|
38
|
Mori H, Maruyama F, Kato H, Toyoda A, Dozono A, Ohtsubo Y, Nagata Y, Fujiyama A, Tsuda M, Kurokawa K. Design and experimental application of a novel non-degenerate universal primer set that amplifies prokaryotic 16S rRNA genes with a low possibility to amplify eukaryotic rRNA genes. DNA Res 2013; 21:217-27. [PMID: 24277737 PMCID: PMC3989492 DOI: 10.1093/dnares/dst052] [Citation(s) in RCA: 284] [Impact Index Per Article: 23.7] [Indexed: 12/31/2022] Open
Abstract
The deep sequencing of 16S rRNA genes amplified by universal primers has revolutionized our understanding of microbial communities by allowing the characterization of the diversity of the uncultured majority. However, some universal primers also amplify eukaryotic rRNA genes, leading to a decrease in the efficiency of sequencing of prokaryotic 16S rRNA genes with possible mischaracterization of the diversity in the microbial community. In this study, we compared 16S rRNA gene sequences from genome-sequenced strains and identified candidates for non-degenerate universal primers that could be used for the amplification of prokaryotic 16S rRNA genes. The 50 identified candidates were investigated to calculate their coverage for prokaryotic and eukaryotic rRNA genes, including those from uncultured taxa and eukaryotic organelles, and a novel universal primer set, 342F-806R, covering many prokaryotic, but not eukaryotic, rRNA genes was identified. This primer set was validated by the amplification of 16S rRNA genes from a soil metagenomic sample and subsequent pyrosequencing using the Roche 454 platform. The same sample was also used for pyrosequencing of the amplicons by employing a commonly used primer set, 338F-533R, and for shotgun metagenomic sequencing using the Illumina platform. Our comparison of the taxonomic compositions inferred by the three sequencing experiments indicated that the non-degenerate 342F-806R primer set can characterize the taxonomic composition of the microbial community without substantial bias, and is highly expected to be applicable to the analysis of a wide variety of microbial communities.
Affiliation(s)
- Hiroshi Mori
- Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 B-36, Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
|
39
|
Isci S, Dogan H, Ozturk C, Otu HH. Bayesian network prior: network analysis of biological data using external knowledge. Bioinformatics 2013; 30:860-7. [PMID: 24215027 PMCID: PMC3957076 DOI: 10.1093/bioinformatics/btt643] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Motivation: Reverse engineering gene interaction (GI) networks from experimental data is a challenging task due to the complex nature of the networks and the noise inherent in the data. One way to overcome these hurdles would be incorporating the vast amounts of external biological knowledge when building interaction networks. We propose a framework where GI networks are learned from experimental data using Bayesian networks (BNs) and the incorporation of external knowledge is also done via a BN that we call Bayesian Network Prior (BNP). BNP depicts the relation between various evidence types that contribute to the event ‘gene interaction’ and is used to calculate the probability of a candidate graph (G) in the structure learning process. Results: Our simulation results on synthetic, simulated and real biological data show that the proposed approach can identify the underlying interaction network with high accuracy even when the prior information is distorted and outperforms existing methods. Availability: Accompanying BNP software package is freely available for academic use at http://bioe.bilgi.edu.tr/BNP. Contact: hasan.otu@bilgi.edu.tr Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Senol Isci
- Bogazici University, Institute of Biomedical Engineering, Kandilli Campus, 34684, Cengelkoy - Istanbul, TUBITAK-BILGEM, Informatics and Information Security Research Center, 41470, Gebze-Kocaeli and Istanbul Bilgi University, Department of Genetics and Bioengineering, 34060, Eyup - Istanbul, Turkey
|
40
|
Reed RB, Chattopadhyay A, Iwema CL. Using Google blogs and discussions to recommend biomedical resources: a case study. Med Ref Serv Q 2013; 32:396-411. [PMID: 24180648 DOI: 10.1080/02763869.2013.837726] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Abstract
This case study investigated whether data gathered from discussions within the social media provide a reliable basis for a biomedical resources recommendation system. Using a search query to mine text from Google Blogs and Discussions, a ranking of biomedical resources was determined based on those most frequently mentioned. To establish quality, these results were compared with rankings by subject experts. An overall agreement between the frequency of social media discussions and subject expert recommendations was observed when identifying key bioinformatics and consumer health resources. Testing the method in more than one biomedical area implies this procedure could be employed across different subjects.
Affiliation(s)
- Robyn B Reed
- George T. Harrell Health Sciences Library, Penn State College of Medicine, Penn State Hershey, Hershey, Pennsylvania, USA
|
41
|
Pampel H, Vierkant P, Scholze F, Bertelmann R, Kindling M, Klump J, Goebelbecker HJ, Gundlach J, Schirmbacher P, Dierolf U. Making research data repositories visible: the re3data.org Registry. PLoS One 2013; 8:e78080. [PMID: 24223762 PMCID: PMC3817176 DOI: 10.1371/journal.pone.0078080] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 05/28/2013] [Accepted: 09/06/2013] [Indexed: 11/26/2022] Open
Abstract
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarized under the term Research Data Repositories (RDR). The re3data.org project (Registry of Research Data Repositories) began indexing research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape. As of July 2013, re3data.org lists 400 research data repositories and counting; 288 of these are described in detail using the re3data.org vocabulary. Information icons help researchers to easily identify an adequate repository for the storage and reuse of their data. This article describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, the article outlines the features of re3data.org and shows how this registry helps to identify appropriate repositories for storage and search of research data.
Affiliation(s)
- Heinz Pampel
- Deutsches GeoForschungsZentrum GFZ, Library and Information Services (LIS), Potsdam, Germany
- Paul Vierkant
- Humboldt-Universität zu Berlin, Berlin School of Library and Information Science, Berlin, Germany
- Frank Scholze
- Karlsruhe Institute of Technology (KIT), KIT Library, Karlsruhe, Germany
- Roland Bertelmann
- Deutsches GeoForschungsZentrum GFZ, Library and Information Services (LIS), Potsdam, Germany
- Maxi Kindling
- Humboldt-Universität zu Berlin, Berlin School of Library and Information Science, Berlin, Germany
- Jens Klump
- Deutsches GeoForschungsZentrum GFZ, Library and Information Services (LIS), Potsdam, Germany
- Jens Gundlach
- Karlsruhe Institute of Technology (KIT), KIT Library, Karlsruhe, Germany
- Peter Schirmbacher
- Humboldt-Universität zu Berlin, Berlin School of Library and Information Science, Berlin, Germany
- Uwe Dierolf
- Karlsruhe Institute of Technology (KIT), KIT Library, Karlsruhe, Germany
|
42
|
Olsen L, Johan Kudahl U, Winther O, Brusic V. Literature classification for semi-automated updating of biological knowledgebases. BMC Genomics 2013; 14 Suppl 5:S14. [PMID: 24564403 PMCID: PMC3852072 DOI: 10.1186/1471-2164-14-s5-s14] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022] Open
Abstract
Background: As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results: We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded a classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases.
|
43
|
Bell MJ, Collison M, Lord P. Can inferred provenance and its visualisation be used to detect erroneous annotation? A case study using UniProtKB. PLoS One 2013; 8:e75541. [PMID: 24143170 PMCID: PMC3797126 DOI: 10.1371/journal.pone.0075541] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/17/2013] [Accepted: 08/17/2013] [Indexed: 12/31/2022] Open
Abstract
A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge; during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation originated. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB), at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is heavily prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over sentences. Over sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately are erroneous, whilst appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website at http://homepages.cs.ncl.ac.uk/m.j.bell1/sentence_analysis/.
Affiliation(s)
- Michael J. Bell
  - School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
- Matthew Collison
  - School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, United Kingdom
- Phillip Lord
  - School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
|
44
|
Maiden MCJ, Jansen van Rensburg MJ, Bray JE, Earle SG, Ford SA, Jolley KA, McCarthy ND. MLST revisited: the gene-by-gene approach to bacterial genomics. Nat Rev Microbiol 2013; 11:728-36. [PMID: 23979428 PMCID: PMC3980634 DOI: 10.1038/nrmicro3093] [Citation(s) in RCA: 491] [Impact Index Per Article: 40.9] [Indexed: 12/17/2022]
Abstract
Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria 'from domain to strain'. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data.
Affiliation(s)
- Martin C J Maiden
  - Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
|
45
|
Abstract
The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.
Affiliation(s)
- Leroy Hood
  - Institute for Systems Biology, 401 Terry Ave N., Seattle, WA 98109, USA
- Lee Rowen
  - Institute for Systems Biology, 401 Terry Ave N., Seattle, WA 98109, USA
|
46
|
Cox MJ, Cookson WOCM, Moffatt MF. Sequencing the human microbiome in health and disease. Hum Mol Genet 2013; 22:R88-94. [DOI: 10.1093/hmg/ddt398] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 12/31/2022] Open
|
47
|
Kouskoumvekaki I, Shublaq N, Brunak S. Facilitating the use of large-scale biological data and tools in the era of translational bioinformatics. Brief Bioinform 2013; 15:942-52. [PMID: 23908249 DOI: 10.1093/bib/bbt055] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022] Open
Abstract
As both the amount of generated biological data and the available computing power increase, computational experimentation is no longer exclusive to bioinformaticians but is moving across all biomedical domains. For bioinformatics to realize its translational potential, domain experts need access to user-friendly solutions to navigate, integrate and extract information from biological databases, as well as to combine tools and data resources in bioinformatics workflows. In this review, we present services that assist biomedical scientists in incorporating bioinformatics tools into their research. We review recent applications of Cytoscape, BioGPS and DAVID for data visualization, integration and functional enrichment. Moreover, we illustrate the use of Taverna, Kepler, GenePattern, and Galaxy as open-access workbenches for bioinformatics workflows. Finally, we mention services that facilitate the integration of biomedical ontologies and bioinformatics tools in computational workflows.
|
48
|
Tang JY, Lee JC, Chang YT, Hou MF, Huang HW, Liaw CC, Chang HW. Long noncoding RNAs-related diseases, cancers, and drugs. ScientificWorldJournal 2013; 2013:943539. [PMID: 23843741 PMCID: PMC3690748 DOI: 10.1155/2013/943539] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 04/03/2013] [Accepted: 05/20/2013] [Indexed: 12/20/2022] Open
Abstract
Long noncoding RNA (lncRNA) function is described in terms of related gene expressions, diseases, and cancers as well as their polymorphisms. Potential modulators of lncRNA function, including clinical drugs, natural products, and derivatives, are discussed, and bioinformatic resources are summarized. The improving knowledge of the lncRNA regulatory network has implications not only in gene expression, diseases, and cancers, but also in the development of lncRNA-based pharmacology.
Affiliation(s)
- Jen-Yang Tang
  - Department of Radiation Oncology, Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Department of Radiation Oncology, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
  - Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Jin-Ching Lee
  - Department of Biotechnology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Yung-Ting Chang
  - Doctor Degree Program in Marine Biotechnology, National Sun Yat-sen University/Academia Sinica, Kaohsiung, Taiwan
- Ming-Feng Hou
  - Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Kaohsiung Municipal Ta-Tung Hospital, Kaohsiung, Taiwan
- Hurng-Wern Huang
  - Institute of Biomedical Science, National Sun Yat-Sen University, Kaohsiung, Taiwan
- Chih-Chuang Liaw
  - Doctor Degree Program in Marine Biotechnology, National Sun Yat-sen University/Academia Sinica, Kaohsiung, Taiwan
  - Department of Marine Biotechnology and Resources, National Sun Yat-sen University, Kaohsiung, Taiwan
- Hsueh-Wei Chang
  - Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Graduate Institute of Natural Products, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
|
49
|
Protein-Protein Interactions: Gene Acronym Redundancies and Current Limitations Precluding Automated Data Integration. Proteomes 2013; 1:3-24. [PMID: 28250396 PMCID: PMC5314489 DOI: 10.3390/proteomes1010003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/18/2013] [Revised: 05/16/2013] [Accepted: 05/21/2013] [Indexed: 12/31/2022] Open
Abstract
Understanding protein interaction networks and their dynamic changes is a major challenge in modern biology. Currently, several experimental and in silico approaches allow the screening of protein interactors in a large-scale manner. Therefore, the bulk of information on protein interactions deposited in databases and peer-reviewed published literature is constantly growing. Multiple databases interfaced from user-friendly web tools recently emerged to facilitate the task of protein interaction data retrieval and data integration. Nevertheless, as we show in this report, despite the current efforts towards data integration, the information on protein interactions retrieved by in silico approaches is frequently incomplete and may even list false interactions. Here we point to some obstacles precluding confident data integration, with special emphasis on protein interactions, which include gene acronym redundancies and protein synonyms. Three human proteins (choline kinase, PPIase and uromodulin) and three different web-based data search engines focused on protein interaction data retrieval (PSICQUIC, DASMI and BIPS) were used to explain the potential occurrence of undesired errors that should be considered by researchers in the field. We demonstrate that, despite the recent initiatives towards data standardization, manual curation of protein interaction networks based on literature searches is still required to remove potential false positives. A three-step workflow consisting of (i) data retrieval from multiple databases, (ii) peer-reviewed literature searches, and (iii) data curation and integration is proposed as the best strategy to gather updated information on protein interactions. Finally, this strategy was applied to compile bona fide information on the human DREAM protein interactome, which constitutes reliable training datasets that can be used to improve computational predictions.
|
50
|
Jamieson DG, Roberts PM, Robertson DL, Sidders B, Nenadic G. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions. Database: The Journal of Biological Databases and Curation 2013; 2013:bat033. [PMID: 23707966 PMCID: PMC3662864 DOI: 10.1093/database/bat033] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
The vast collection of biomedical literature and its continued expansion have presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain.
Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: •••
Affiliation(s)
- Daniel G Jamieson
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, UK, M13 9PL
|