1
Catanuto G, Rocco N, Balafa K, Masannat Y, Karakatsanis A, Maglia A, Barry P, Pappalardo F, Nava MB, Caruso F. Natural Language Processing to Extract Meaningful Information from a Corpus of Written Knowledge in Breast Cancer: Transforming Books into Data. Breast Care (Basel) 2023; 18:209-212. [PMID: 37928810 PMCID: PMC10624050 DOI: 10.1159/000530448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 03/25/2023] [Indexed: 11/07/2023] Open
Abstract
Introduction: Books and papers are the most relevant source of theoretical knowledge for medical education. New artificial intelligence technologies can be designed to assist in selected educational tasks, such as reading a corpus made up of multiple documents and extracting relevant information in a quantitative way.
Methods: Thirty experts were selected transparently through an online public call on the website of the sponsor organization and on its social media. Six books edited or co-edited by members of this panel, containing general knowledge of breast cancer or specific surgical knowledge, were acquired. A team of computer scientists used this collection to train an artificial neural network based on a technique called Word2Vec.
Results: The corpus of six books contained about 2.2 billion words for 300-dimensional (300d) vectors. A few tests were performed. We evaluated cosine similarity between different words.
Discussion: This work represents an initial attempt to derive formal information from a textual corpus. It can be used to perform an augmented reading of the corpus of knowledge available in the books and papers of a discipline. This can generate new hypotheses and provide an actual estimate of their association within the expert opinions. Word embedding can also be a good tool for accruing narrative information from clinical notes, reports, etc., and for producing predictions about outcomes. More work is expected in this promising field to generate "real-world evidence."
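The cosine similarity test mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' code: the three-dimensional vectors, the `toy_embeddings` dictionary, and the `most_similar` helper are hypothetical and invented purely for illustration, whereas the study reports 300d vectors learned with Word2Vec.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings: made-up values, NOT taken from the paper.
toy_embeddings = {
    "mastectomy": [0.9, 0.1, 0.3],
    "reconstruction": [0.8, 0.2, 0.4],
    "chemotherapy": [0.1, 0.9, 0.2],
}


def most_similar(word, embeddings):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("mastectomy", toy_embeddings):
        print(f"{other}: {score:.3f}")
```

A score close to 1 means the two word vectors point in nearly the same direction, which is usually read as the words appearing in similar contexts in the corpus. A score near 0 suggests little association.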
Affiliation(s)
- Giuseppe Catanuto
- G.RE.T.A. Group for Reconstructive and Therapeutic Advancements, Milan, Naples, Catania, Italy
- Breast Surgery Unit, Humanitas Center, Catania, Italy
- Nicola Rocco
- G.RE.T.A. Group for Reconstructive and Therapeutic Advancements, Milan, Naples, Catania, Italy
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Yazan Masannat
- The Breast Unit, Aberdeen Royal Infirmary, NHS Grampian, Aberdeen, UK
- Andreas Karakatsanis
- Department of Surgical Sciences, Faculty of Medicine, Uppsala University, Uppsala, Sweden
- Section for Breast Surgery, Department of Surgery, Uppsala University Hospital (Akademiska), Uppsala, Sweden
- Anna Maglia
- G.RE.T.A. Group for Reconstructive and Therapeutic Advancements, Milan, Naples, Catania, Italy
- Peter Barry
- Department of Drug and Health Sciences, University of Catania, Catania, Italy
- Maurizio Bruno Nava
- G.RE.T.A. Group for Reconstructive and Therapeutic Advancements, Milan, Naples, Catania, Italy
2
Yatskou MM, Apanasovich VV. Data analysis in complex biomolecular systems. Informatika (Minsk) 2021. [DOI: 10.37661/1816-0301-2021-18-1-105-122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
Progress in biomolecular technology is directly related to the development of effective methods and algorithms for processing the large amounts of information produced by modern high-throughput experimental equipment. The priority task is to develop promising computational tools for analyzing and interpreting biophysical information using big data methods and computer models. An integrated approach to processing large datasets, based on data analysis and simulation modelling, is proposed. This approach makes it possible to determine the parameters of biophysical and optical processes occurring in complex biomolecular systems. The idea of the integrated approach is to simulate the biophysical processes occurring in the object of study, compare the simulated data with the most relevant experimental data selected by dimension reduction methods, and determine the characteristics of the investigated processes using data analysis algorithms. The application of the developed approach to the study of biomolecular systems in fluorescence spectroscopy experiments is considered. The effectiveness of the algorithms was verified by analyzing simulated and experimental data representing systems of molecules and proteins. The use of complex analysis increases the efficiency of studying biophysical systems when analyzing big data.