Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hunter L, Cohen KB. Biomedical language processing: what's beyond PubMed? Mol Cell 2006;21:589-94. [PMID: 16507357 PMCID: PMC1702322 DOI: 10.1016/j.molcel.2006.02.012] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

For:	Hunter L, Cohen KB. Biomedical language processing: what's beyond PubMed? Mol Cell 2006;21:589-94. [PMID: 16507357 PMCID: PMC1702322 DOI: 10.1016/j.molcel.2006.02.012] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

Number

Cited by Other Article(s)

101

Wang X, Tsujii J, Ananiadou S. Disambiguating the species of biomedical named entities using natural language parsers. Bioinformatics 2010;26:661-7. [PMID: 20053840 PMCID: PMC2828111 DOI: 10.1093/bioinformatics/btq002] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

102

MeSH Represented MEDLINE Query Results. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/978-3-642-13059-5_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]

103

Islamaj Dogan R, Murray GC, Névéol A, Lu Z. Understanding PubMed user search behavior through log analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2009;2009:bap018. [PMID: 20157491 PMCID: PMC2797455 DOI: 10.1093/database/bap018] [Citation(s) in RCA: 131] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/20/2009] [Revised: 10/05/2009] [Accepted: 10/06/2009] [Indexed: 11/20/2022]

104

Duchrow T, Shtatland T, Guettler D, Pivovarov M, Kramer S, Weissleder R. Enhancing navigation in biomedical databases by community voting and database-driven text classification. BMC Bioinformatics 2009;10:317. [PMID: 19799796 PMCID: PMC2768718 DOI: 10.1186/1471-2105-10-317] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2009] [Accepted: 10/03/2009] [Indexed: 11/29/2022] Open

Abstract

Background

The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them.

Results

Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly.

Conclusion

Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases.

The system can be accessed at .

Collapse

105

Chung GYC. Towards identifying intervention arms in randomized controlled trials: Extracting coordinating constructions. J Biomed Inform 2009;42:790-800. [DOI: 10.1016/j.jbi.2008.12.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2008] [Revised: 11/10/2008] [Accepted: 12/19/2008] [Indexed: 11/25/2022]

106

Garten Y, Tatonetti NP, Altman RB. Improving the prediction of pharmacogenes using text-derived drug-gene relationships. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2009:305-14. [PMID: 19908383 DOI: 10.1142/9789814295291_0033] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]

107

Cano C, Monaghan T, Blanco A, Wall DP, Peshkin L. Collaborative text-annotation resource for disease-centered relation extraction from biomedical text. J Biomed Inform 2009;42:967-77. [PMID: 19232400 PMCID: PMC2757509 DOI: 10.1016/j.jbi.2009.02.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2008] [Revised: 12/04/2008] [Accepted: 02/04/2009] [Indexed: 11/30/2022]

108

Korhonen A, Silins I, Sun L, Stenius U. The first step in the development of Text Mining technology for Cancer Risk Assessment: identifying and organizing scientific evidence in risk assessment literature. BMC Bioinformatics 2009;10:303. [PMID: 19772619 PMCID: PMC2759963 DOI: 10.1186/1471-2105-10-303] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2009] [Accepted: 09/22/2009] [Indexed: 12/03/2022] Open

109

Aljaber B, Stokes N, Bailey J, Pei J. Document clustering of scientific texts using citation contexts. INFORM RETRIEVAL J 2009. [DOI: 10.1007/s10791-009-9108-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

110

Smalheiser NR, Torvik VI, Zhou W. Arrowsmith two-node search interface: a tutorial on finding meaningful links between two disparate sets of articles in MEDLINE. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2009;94:190-7. [PMID: 19185946 PMCID: PMC2693227 DOI: 10.1016/j.cmpb.2008.12.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 09/07/2008] [Revised: 10/17/2008] [Accepted: 12/12/2008] [Indexed: 05/27/2023]

111

Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz RA, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol 2009;5:e1000215. [PMID: 19325874 PMCID: PMC2653649 DOI: 10.1371/journal.pcbi.1000215] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2008] [Accepted: 02/12/2009] [Indexed: 01/17/2023] Open

112

Lin J. Is searching full text more effective than searching abstracts? BMC Bioinformatics 2009;10:46. [PMID: 19192280 PMCID: PMC2695361 DOI: 10.1186/1471-2105-10-46] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2008] [Accepted: 02/03/2009] [Indexed: 11/10/2022] Open

113

Tüchler T, Velez G, Graf A, Kreil DP. BibGlimpse: the case for a light-weight reprint manager in distributed literature research. BMC Bioinformatics 2008;9:406. [PMID: 18828894 PMCID: PMC2571996 DOI: 10.1186/1471-2105-9-406] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2008] [Accepted: 10/01/2008] [Indexed: 12/02/2022] Open

114

Hull D, Pettifer SR, Kell DB. Defrosting the digital library: bibliographic tools for the next generation web. PLoS Comput Biol 2008;4:e1000204. [PMID: 18974831 PMCID: PMC2568856 DOI: 10.1371/journal.pcbi.1000204] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open

115

Müller HM, Rangarajan A, Teal TK, Sternberg PW. Textpresso for neuroscience: searching the full text of thousands of neuroscience research papers. Neuroinformatics 2008;6:195-204. [PMID: 18949581 PMCID: PMC2666735 DOI: 10.1007/s12021-008-9031-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2008] [Accepted: 09/22/2008] [Indexed: 10/21/2022]

116

Krallinger M, Valencia A, Hirschman L. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol 2008;9 Suppl 2:S8. [PMID: 18834499 PMCID: PMC2559992 DOI: 10.1186/gb-2008-9-s2-s8] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

117

Agarwal P, Searls DB. Literature mining in support of drug discovery. Brief Bioinform 2008;9:479-92. [DOI: 10.1093/bib/bbn035] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

118

Gabow AP, Leach SM, Baumgartner WA, Hunter LE, Goldberg DS. Improving protein function prediction methods with integrated literature data. BMC Bioinformatics 2008;9:198. [PMID: 18412966 PMCID: PMC2375131 DOI: 10.1186/1471-2105-9-198] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2007] [Accepted: 04/15/2008] [Indexed: 11/22/2022] Open

Abstract

Background

Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.

Results

We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial.

Conclusion

Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily-available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.

Collapse

119

Caporaso JG, Baumgartner WA, Randolph DA, Cohen KB, Hunter L. Rapid pattern development for concept recognition systems: application to point mutations. J Bioinform Comput Biol 2008;5:1233-59. [PMID: 18172927 DOI: 10.1142/s0219720007003144] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2007] [Revised: 07/30/2007] [Accepted: 08/23/2007] [Indexed: 11/18/2022]

120

Hunter L, Lu Z, Firby J, Baumgartner WA, Johnson HL, Ogren PV, Cohen KB. OpenDMAP: an open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression. BMC Bioinformatics 2008;9:78. [PMID: 18237434 PMCID: PMC2275248 DOI: 10.1186/1471-2105-9-78] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2007] [Accepted: 01/31/2008] [Indexed: 12/03/2022] Open

Abstract

Background

Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering.

Results

OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances.

Conclusion

OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at

Collapse

121

Cohen KB, Hunter L. Getting started in text mining. PLoS Comput Biol 2008;4:e20. [PMID: 18225946 PMCID: PMC2217579 DOI: 10.1371/journal.pcbi.0040020] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

122

Caporaso JG, Deshpande N, Fink JL, Bourne PE, Cohen KB, Hunter L. Intrinsic evaluation of text mining tools may not predict performance on realistic tasks. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2008:640-51. [PMID: 18229722 PMCID: PMC2517250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]

123

Zhou D, He Y. Extracting interactions between proteins from the literature. J Biomed Inform 2007;41:393-407. [PMID: 18207462 DOI: 10.1016/j.jbi.2007.11.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2007] [Revised: 11/21/2007] [Accepted: 11/28/2007] [Indexed: 11/29/2022]

124

Torii M, Hu ZZ, Song M, Wu CH, Liu H. A comparison study on algorithms of detecting long forms for short forms in biomedical text. BMC Bioinformatics 2007;8 Suppl 9:S5. [PMID: 18047706 PMCID: PMC2217663 DOI: 10.1186/1471-2105-8-s9-s5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

125

Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB. Frontiers of biomedical text mining: current progress. Brief Bioinform 2007;8:358-75. [PMID: 17977867 PMCID: PMC2516302 DOI: 10.1093/bib/bbm045] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

126

Torvik VI, Smalheiser NR. A quantitative model for linking two disparate sets of articles in MEDLINE. ACTA ACUST UNITED AC 2007;23:1658-65. [PMID: 17463015 DOI: 10.1093/bioinformatics/btm161] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

BACKGROUND

Identifying information that implicitly links two disparate sets of articles is a fundamental and intuitive data mining strategy that can help investigators address real scientific questions. The Arrowsmith two-node search finds title words and phrases (so-called B-terms) that are shared across two sets of articles within MEDLINE and displays them in a manner that facilitates human assessment. A serious stumbling-block has been the lack of a quantitative model for predicting which of the hundreds if not thousands of B-terms computed for a given search are most likely to be relevant to the investigator.

METHODOLOGY/PRINCIPAL FINDINGS

Using a public two-node search interface, field testers devised a set of two-node searches under real life conditions and a certain number of B-terms were marked relevant. These were employed as 'gold standards;' each B-term was characterized according to eight complementary features that were strongly correlated with relevance. A logistic regression model was developed that permits one to estimate the probability of relevance for each B-term, to rank B-terms according to their likely relevance, and to estimate the overall number of relevant B-terms inherent in a given two-node search.

CONCLUSIONS/SIGNIFICANCE

The model greatly simplifies and streamlines the process of carrying out a two-node search, and may be applicable to a number of other literature-based discovery applications, including the so-called one-node search and related gene-centric strategies that incorporate implicit links to predict how genes may be related to each other and to human diseases. This should encourage much wider exploration of text mining for implicit information among the general scientific community.

AVAILABILITY

Two-node searches can be carried out freely at http://arrowsmith.psych.uic.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

127

Schulz S, Hahn U. Towards the ontological foundations of symbolic biological theories. Artif Intell Med 2007;39:237-50. [PMID: 17321118 DOI: 10.1016/j.artmed.2006.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2004] [Revised: 12/05/2006] [Accepted: 12/06/2006] [Indexed: 10/23/2022]

128

Ananiadou S, Fluck J. Proceedings of the Second International Symposium for Semantic Mining in Biomedicine. BMC Bioinformatics 2006. [PMCID: PMC1764445 DOI: 10.1186/1471-2105-7-s3-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open

129

Ananiadou S, Kell DB, Tsujii JI. Text mining and its potential applications in systems biology. Trends Biotechnol 2006;24:571-9. [PMID: 17045684 DOI: 10.1016/j.tibtech.2006.10.002] [Citation(s) in RCA: 164] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2006] [Revised: 08/21/2006] [Accepted: 10/02/2006] [Indexed: 11/15/2022]