Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Shatkay H, Brady S, Wong A. Text as data: using text-based features for proteins representation and for computational prediction of their characteristics. Methods 2015;74:54-64. [PMID: 25448299 DOI: 10.1016/j.ymeth.2014.10.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2014] [Revised: 09/21/2014] [Accepted: 10/21/2014] [Indexed: 11/21/2022] Open

For:	Shatkay H, Brady S, Wong A. Text as data: using text-based features for proteins representation and for computational prediction of their characteristics. Methods 2015;74:54-64. [PMID: 25448299 DOI: 10.1016/j.ymeth.2014.10.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2014] [Revised: 09/21/2014] [Accepted: 10/21/2014] [Indexed: 11/21/2022] Open

Number

Cited by Other Article(s)

Du Z, He Y, Li J, Uversky VN. DeepAdd: Protein function prediction from k-mer embedding and additional features. Comput Biol Chem 2020;89:107379. [PMID: 33011616 DOI: 10.1016/j.compbiolchem.2020.107379] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2019] [Revised: 09/15/2020] [Accepted: 09/17/2020] [Indexed: 10/23/2022]

DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2020;2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]

Abstract

Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction reaction) homeostasis. When oxidation damage affects RNA carrying protein-coding information, this may result in the synthesis of aberrant proteins as well as a lower efficiency of translation. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of RNA oxidative damage in mammals is increasing by year due to the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information relevant to RNA oxidation and effects on human diseases, we developed DES-ROD, an online knowledgebase that contains processed information from 298,603 relevant documents that consist of PubMed abstracts and PubMed Central full-text articles. The system utilizes concepts/terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts, as well as enriched pairs of putatively associated concepts. In this way, one can explore mutual relationships between any combinations of two concepts from used dictionaries. Dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases. This makes insights into different facets of the effects of RNA oxidation and the control of this process possible. The usefulness of the DES-ROD system is demonstrated by case studies on some known information, as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first knowledgebase based on text and data mining that focused on the exploration of RNA oxidation and human diseases.

Collapse

Essack M, Salhi A, Stanimirovic J, Tifratene F, Bin Raies A, Hungler A, Uludag M, Van Neste C, Trpkovic A, Bajic VP, Bajic VB, Isenovic ER. Literature-Based Enrichment Insights into Redox Control of Vascular Biology. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2019;2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]

Taha K, Iraqi Y, Al Aamri A. Predicting protein functions by applying predicate logic to biomedical literature. BMC Bioinformatics 2019;20:71. [PMID: 30736739 PMCID: PMC6368809 DOI: 10.1186/s12859-019-2594-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2018] [Accepted: 01/03/2019] [Indexed: 12/02/2022] Open

Abstract

BACKGROUND

A large number of computational methods have been proposed for predicting protein functions. The underlying techniques adopted by most of these methods revolve around predicting the functions of an unannotated protein p from already annotated proteins that have similar characteristics as p. Recent Information Extraction methods take advantage of the huge growth of biomedical literature to predict protein functions. They extract biological molecule terms that directly describe protein functions from biomedical texts. However, they consider only explicitly mentioned terms that co-occur with proteins in texts. We observe that some important biological molecule terms pertaining functional categories may implicitly co-occur with proteins in texts. Therefore, the methods that rely solely on explicitly mentioned terms in texts may miss vital functional information implicitly mentioned in the texts.

RESULTS

To overcome the limitations of methods that rely solely on explicitly mentioned terms in texts to predict protein functions, we propose in this paper an Information Extraction system called PL-PPF. The proposed system employs techniques for predicting the functions of proteins based on their co-occurrences with explicitly and implicitly mentioned biological molecule terms that pertain functional categories in biomedical literature. That is, PL-PPF employs a combination of statistical-based explicit term extraction techniques and logic-based implicit term extraction techniques. The statistical component of PL-PPF predicts some of the functions of a protein by extracting the explicitly mentioned functional terms that directly describe the functions of the protein from the biomedical texts associated with the protein. The logic-based component of PL-PPF predicts additional functions of the protein by inferring the functional terms that co-occur implicitly with the protein in the biomedical texts associated with it. First, the system employs its statistical-based component to extract the explicitly mentioned functional terms. Then, it employs its logic-based component to infer additional functions of the protein. Our hypothesis is that important biological molecule terms pertaining functional categories of proteins are likely to co-occur implicitly with the proteins in biomedical texts. We evaluated PL-PPF experimentally and compared it with five systems. Results revealed better prediction performance.

CONCLUSIONS

The experimental results showed that PL-PPF outperformed the other five systems. This is an indication of the effectiveness and practical viability of PL-PPF's combination of explicit and implicit techniques. We also evaluated two versions of PL-PPF: one adopting the complete techniques (i.e., adopting both the implicit and explicit techniques) and the other adopting only the explicit terms co-occurrence extraction techniques (i.e., without the inference rules for predicate logic). The experimental results showed that the complete version outperformed significantly the other version. This is attributed to the effectiveness of the rules of predicate logic to infer functional terms that co-occur implicitly with proteins in biomedical texts. A demo application of PL-PPF can be accessed through the following link: http://ecesrvr.kustar.ac.ae:8080/plppf/.

Collapse

Fodeh SJ, Tiwari A. Exploiting MEDLINE for gene molecular function prediction via NMF based multi-label classification. J Biomed Inform 2018;86:160-166. [PMID: 30130573 DOI: 10.1016/j.jbi.2018.08.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Revised: 08/13/2018] [Accepted: 08/17/2018] [Indexed: 11/25/2022]

You R, Huang X, Zhu S. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation. Methods 2018;145:82-90. [PMID: 29883746 DOI: 10.1016/j.ymeth.2018.05.026] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Revised: 04/30/2018] [Accepted: 05/31/2018] [Indexed: 11/16/2022] Open

Zhou J, Fu BQ. The research on gene-disease association based on text-mining of PubMed. BMC Bioinformatics 2018;19:37. [PMID: 29415654 PMCID: PMC5804013 DOI: 10.1186/s12859-018-2048-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2017] [Accepted: 01/29/2018] [Indexed: 11/23/2022] Open

Taha K. Inferring the Functions of Proteins from the Interrelationships between Functional Categories. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:157-167. [PMID: 27723600 DOI: 10.1109/tcbb.2016.2615608] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Badal VD, Kundrotas PJ, Vakser IA. Text Mining for Protein Docking. PLoS Comput Biol 2015;11:e1004630. [PMID: 26650466 PMCID: PMC4674139 DOI: 10.1371/journal.pcbi.1004630] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2015] [Accepted: 10/29/2015] [Indexed: 11/18/2022] Open

Abstract

The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.

Protein interactions are central for many cellular processes. Physical characterization of these interactions is essential for understanding of life processes and applications in biology and medicine. Because of the inherent limitations of experimental techniques and rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied the text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.

Collapse

Zheng W, Blake C. Using distant supervised learning to identify protein subcellular localizations from full-text scientific articles. J Biomed Inform 2015. [PMID: 26220461 DOI: 10.1016/j.jbi.2015.07.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Funk CS, Kahanda I, Ben-Hur A, Verspoor KM. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct. J Biomed Semantics 2015;6:9. [PMID: 26005564 PMCID: PMC4441003 DOI: 10.1186/s13326-015-0006-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2014] [Accepted: 02/27/2015] [Indexed: 11/10/2022] Open

Andrade-Navarro M, Perez-Iratxeta C. Text mining of biomedical literature: doing well, but we could be doing better. Methods 2015;74:1-2. [PMID: 25703199 DOI: 10.1016/j.ymeth.2015.01.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022] Open