1
Yadav S, Vora DS, Sundar D, Dhanjal JK. TCR-ESM: Employing protein language embeddings to predict TCR-peptide-MHC binding. Comput Struct Biotechnol J 2024; 23:165-173. [PMID: 38146434 PMCID: PMC10749252 DOI: 10.1016/j.csbj.2023.11.037] [Received: 09/10/2023] [Revised: 11/19/2023] [Accepted: 11/20/2023] [Indexed: 12/27/2023] [Open access]
Abstract
Cognate target identification for T-cell receptors (TCRs) is a significant barrier in T-cell therapy development, which may be overcome by accurately predicting TCR interaction with peptide-bound major histocompatibility complex (pMHC). In this study, we have employed peptide embeddings learned from a large protein language model, Evolutionary Scale Modeling (ESM), to predict TCR-pMHC binding. The TCR-ESM model presented outperforms existing predictors. The complementarity-determining region 3 (CDR3) of the hypervariable TCR is located at the center of the paratope and plays a crucial role in peptide recognition. TCR-ESM trained on paired TCR data with both CDR3α and CDR3β chain information performs significantly better than models trained on data with only CDR3β, suggesting that both TCR chains contribute to specificity; their relative importance, however, depends on the specific peptide-MHC targeted. The study illuminates the importance of MHC information in TCR-peptide binding, which had remained inconclusive so far and was thought to depend on dataset characteristics. TCR-ESM outperforms existing approaches on external datasets, suggesting generalizability. Overall, the potential of deep learning for predicting TCR-pMHC interactions and improving the understanding of factors driving TCR specificity is highlighted. The prediction model is available at http://tcresm.dhanjal-lab.iiitd.edu.in/ as an online tool.
Affiliation(s)
- Shashank Yadav
  - Department of Biomedical Engineering, University of Arizona, Tucson 85721, AZ, USA
- Dhvani Sandip Vora
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
- Durai Sundar
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Jaspreet Kaur Dhanjal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
2
Kotlyarov R, Papachristos K, Wood GPF, Goodman JM. Leveraging Language Model Multitasking To Predict C-H Borylation Selectivity. J Chem Inf Model 2024. [PMID: 38708520 DOI: 10.1021/acs.jcim.4c00137] [Indexed: 05/07/2024]
Abstract
C-H borylation is a high-value transformation in the synthesis of lead candidates for the pharmaceutical industry because a wide array of downstream coupling reactions is available. However, predicting its regioselectivity, especially in drug-like molecules that may contain multiple heterocycles, is not a trivial task. Using a data set of borylation reactions from Reaxys, we explored how a language model originally trained on USPTO_500_MT, a broad-scope set of patent data, can be used to predict the C-H borylation reaction product in different modes: product generation and site reactivity classification. Our fine-tuned T5Chem multitask language model can generate the correct product in 79% of cases. It can also classify the reactive aromatic C-H bonds with 95% accuracy and 88% positive predictive value, exceeding purpose-developed graph-based neural networks.
Affiliation(s)
- Ruslan Kotlyarov
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Geoffrey P F Wood
  - Exscientia Plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Jonathan M Goodman
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
3
Susanty M, Naim Mursalim MK, Hertadi R, Purwarianti A, Rajab TLE. Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024; 173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Alkaliphilic proteins have great potential as biocatalysts in biotechnology, especially for enzyme engineering. Extensive research has focused on exploring the enzymatic potential of alkaliphiles and characterizing alkaliphilic proteins. However, the current method for identifying these proteins requires wet lab experiments and is time-consuming, labor-intensive, and expensive. Therefore, the development of a computational method for alkaliphilic protein identification would be invaluable for protein engineering and design. In this study, we present a novel approach that uses embeddings from a protein language model called ESM-2(3B) in a deep learning framework to classify alkaliphilic and non-alkaliphilic proteins. To our knowledge, this is the first attempt to employ embeddings from a pre-trained protein language model to classify alkaliphilic proteins. A reliable dataset comprising 1,002 alkaliphilic and 1,866 non-alkaliphilic proteins was constructed for training and testing the proposed model. The proposed model, dubbed ALPACA, achieves performance scores of 0.88, 0.84, and 0.75 for accuracy, F1-score, and Matthews correlation coefficient, respectively, on an independent dataset. ALPACA is likely to serve as a valuable resource for exploring protein alkalinity and its role in protein design and engineering.
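The three scores named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard definitions, not the authors' evaluation code; the function name and the counts in the usage lines are hypothetical and are not ALPACA's results.

```python
import math

def binary_scores(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, F1-score and Matthews correlation coefficient (MCC)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Hypothetical counts, chosen only to exercise the function.
scores = binary_scores(tp=80, fp=10, tn=90, fn=20)
print(scores)
```

Reporting MCC alongside accuracy is a common choice when classes are unbalanced, as with the 1,002 versus 1,866 proteins described above.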
Affiliation(s)
- Meredita Susanty
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Pertamina, School of Computer Science, Jl Teuku Nyak Arief Jakarta Selatan DKI Jakarta, Indonesia
- Muhammad Khaerul Naim Mursalim
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Universal, Kompleks Maha Vihara Duta Maitreya Bukit Beruntung, Sei Panas Batam, 29456, Kepulauan Riau, Indonesia
- Rukman Hertadi
  - Institut Teknologi Bandung Faculty of Math and Natural Sciences, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
- Ayu Purwarianti
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Center for Artificial Intelligence (U-CoE AI-VLB), Institut Teknologi Bandung, Bandung, Indonesia
- Tati LE Rajab
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
4
Blanco LE, Wilcox JH, Hughes MS, Lal RA. Development of a Real-time Force-based Algorithm for Infusion Failure Detection. J Diabetes Sci Technol 2024:19322968241247530. [PMID: 38654491 DOI: 10.1177/19322968241247530] [Indexed: 04/26/2024]
Abstract
BACKGROUND: Continuous subcutaneous insulin infusion (CSII) is a common treatment option for people with diabetes (PWD), but insulin infusion failures pose a significant challenge, leading to hyperglycemia, diabetes burnout, and increased hospitalizations. Current CSII pumps' occlusion alarm systems are limited in detecting infusion failures; therefore, a more effective detection method is needed.
METHODS: We conducted five preclinical animal studies to collect data on infusion failures, utilizing both insulin and non-insulin boluses. Data were captured using in-line pressure and flow rate sensors, with additional force data from CSII pumps' onboard sensors in one study. A novel classifier model was developed using this dataset, aimed at detecting different types of infusion failures through direct utilization of force sensor data. Performance was compared against various occlusion alarm thresholds from commercially available CSII pumps.
RESULTS: The testing dataset included 251 boluses. The Bagging classifier model showed the highest performance metrics among the models tested, exhibiting high accuracy (96%), sensitivity (94%), and specificity (98%), with lower false-positive and false-negative rates compared with traditional occlusion alarm pressure thresholds.
CONCLUSIONS: Our study developed a novel non-threshold classifier that outperforms current occlusion alarm systems in CSII pumps in detecting infusion failures. This advancement has the potential to reduce the risk of hyperglycemia and hospitalizations due to undetected infusion failures, offering a more reliable and effective CSII therapy for PWD. Further studies involving human participants are recommended to validate these findings and assess the classifier's performance in a real-world setting.
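Sensitivity, specificity, and the false-positive and false-negative rates reported here are complementary quantities computed from the same four counts. A short sketch of that relationship follows; the function name and the bolus counts are hypothetical and are not taken from the study's 251-bolus test set.

```python
def detection_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Rates for a failure detector, where 'positive' means an infusion failure."""
    sensitivity = tp / (tp + fn)   # share of true failures that were flagged
    specificity = tn / (tn + fp)   # share of normal boluses left unflagged
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,  # missed failures
        "false_positive_rate": 1 - specificity,  # false alarms
    }

# Hypothetical counts, chosen only to exercise the function.
rates = detection_rates(tp=45, fp=2, tn=48, fn=5)
print(rates)
```

Because each error rate is one minus its counterpart, a detector that raises both sensitivity and specificity necessarily lowers both error rates.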
5
Hadad E, Rokach L, Veksler-Lublinsky I. Empowering prediction of miRNA-mRNA interactions in species with limited training data through transfer learning. Heliyon 2024; 10:e28000. [PMID: 38560149 PMCID: PMC10981012 DOI: 10.1016/j.heliyon.2024.e28000] [Received: 07/15/2023] [Revised: 03/06/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024] [Open access]
Abstract
MicroRNAs (miRNAs) play a crucial role in mRNA regulation. Identifying functionally important mRNA targets of a specific miRNA is essential for uncovering its biological function and assisting miRNA-based drug development. Datasets of high-throughput direct bona fide miRNA-target interactions (MTIs) exist only for a few model organisms, prompting the need for computational prediction. However, the scarcity of data poses a challenge in training accurate machine learning models for MTI prediction. In this study, we explored the potential of transfer learning (with ANN and XGB models) to address the limited data challenge by leveraging the similarities in interaction rules between species. Furthermore, we introduced a novel approach called TransferSHAP for estimating the feature importance of transfer learning in tabular dataset tasks. We demonstrated that transfer learning improves MTI prediction accuracy for species with limited datasets and identified the specific interaction features the models employed to transfer information across different species.
Affiliation(s)
- Eyal Hadad
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, David Ben-Gurion Blvd. 1, Beer-Sheva 8410501, Israel
- Lior Rokach
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, David Ben-Gurion Blvd. 1, Beer-Sheva 8410501, Israel
- Isana Veksler-Lublinsky
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, David Ben-Gurion Blvd. 1, Beer-Sheva 8410501, Israel
6
Mollura M, Chicco D, Paglialonga A, Barbieri R. Identifying prognostic factors for survival in intensive care unit patients with SIRS or sepsis by machine learning analysis on electronic health records. PLOS Digit Health 2024; 3:e0000459. [PMID: 38489347 PMCID: PMC10942078 DOI: 10.1371/journal.pdig.0000459] [Received: 02/09/2023] [Accepted: 02/05/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Systemic inflammatory response syndrome (SIRS) and sepsis are the most common causes of in-hospital death. However, the characteristics associated with improvement in patient condition during the intensive care unit (ICU) stay have not been fully elucidated for each population, nor have the possible differences between the two.
GOAL: The aim of this study is to highlight the differences between the prognostic clinical features for the survival of patients diagnosed with SIRS and those of patients diagnosed with sepsis by using a multi-variable predictive modeling approach with a reduced set of easily available measurements collected at admission to the ICU.
METHODS: Data were collected from 1,257 patients (816 non-sepsis SIRS and 441 sepsis) admitted to the ICU. We compared the performance of five machine learning models in predicting patient survival. The Matthews correlation coefficient (MCC), estimated with Monte Carlo stratified cross-validation, was used to evaluate model performance and feature importance.
RESULTS: Extreme Gradient Boosting (MCC = 0.489) and Logistic Regression (MCC = 0.533) achieved the highest results for the SIRS and sepsis cohorts, respectively. In order of importance, APACHE II, mean platelet volume (MPV), eosinophil counts (EoC), and C-reactive protein (CRP) showed higher importance for predicting sepsis patient survival, whereas SOFA, APACHE II, platelet counts (PLTC), and CRP obtained higher importance in the SIRS cohort.
CONCLUSION: By using complete blood count parameters as predictors of ICU patient survival, machine learning models can accurately predict the survival of SIRS and sepsis ICU patients. Interestingly, feature importance highlights the role of CRP and APACHE II in both SIRS and sepsis populations. In addition, MPV and EoC are shown to be important features for the sepsis population only, whereas SOFA and PLTC have higher importance for SIRS patients.
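Monte Carlo stratified cross-validation, as named in the methods above, repeatedly draws random train/test splits that preserve class proportions. The sketch below illustrates the general technique in plain Python. The number of repeats, the test fraction, the seed, and the label vector are all assumptions for illustration, since the abstract does not state the settings used.

```python
import random
from collections import defaultdict

def monte_carlo_stratified_splits(labels, n_repeats=5, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs; every split keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]          # copy so the grouping is not mutated
            rng.shuffle(shuffled)
            n_test = max(1, round(len(shuffled) * test_fraction))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)

# Hypothetical outcome labels (1 = survived), not the study data.
labels = [1] * 80 + [0] * 20
splits = list(monte_carlo_stratified_splits(labels, n_repeats=3))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A score such as MCC would then be computed on each test split and averaged over the repeats. Unlike k-fold cross-validation, the test sets of different repeats may overlap.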
Affiliation(s)
- Maximiliano Mollura
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
- Alessia Paglialonga
  - CNR-Istituto di Elettronica e di Ingegneria dell’Informazione e delle Telecomunicazioni (CNR-IEIIT), Milan, Italy
- Riccardo Barbieri
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
7
Goi A, Costa A, De Marchi M. The ability of a handheld near-infrared spectrometer for a rapid quality assessment of bovine colostrum including the Ig G concentration. J Dairy Sci 2024:S0022-0302(24)00493-4. [PMID: 38395397 DOI: 10.3168/jds.2023-24005] [Received: 07/24/2023] [Accepted: 01/24/2024] [Indexed: 02/25/2024]
Abstract
Portable infrared-based instruments have made significant contributions in different research fields. Within the dairy supply chain, for example, most portable devices are based on near-infrared spectroscopy (NIRS) and are nowadays an important support for farmers and operators of the dairy sector, allowing fast decision-making, particularly for feed and milk quality evaluation and animal health and welfare monitoring. The affordability, portability, and ease of use of these innovative devices have been pivotal factors for their implementation on dairy farms. In fact, pocket-sized devices enable non-expert users to perform quick, low-cost, and non-destructive analysis on various samples without complex preparation. As bovine colostrum (BC) quality is mostly given by the Ig G (IgG) level, evaluating the ability of portable NIRS tools to measure antibody concentration is advisable. In this study we used the wireless device SCiO manufactured by Consumer Physics Inc. (Tel Aviv, Israel) to collect BC spectra and then attempt to predict IgG concentration and gross and fine composition in individual samples collected as soon as possible after calving (<6 h) in primiparous and pluriparous Holstein cows farmed in 9 Italian farms. Chemometric analyses revealed that SCiO has promising predictive performance for colostral IgG concentration, total Ig concentration, fat, and AA (R²CV ≥ 0.75). Excellent accuracy was observed for dry matter, protein, and S prediction in cross-validation and good prediction ability in external validation (R²CV ≥ 0.93; R²V ≥ 0.82). Nonetheless, SCiO's ability to discriminate between good- and low-quality samples was satisfactory. The affordable cost, the accurate predictions, and the user-friendly design coupled with the increased interest in colostrum quality within the dairy sector may boost the collection of extensive BC data for management and genetic purposes in the near future.
Affiliation(s)
- Arianna Goi
  - Department of Agronomy, Food, Natural resources, Animals and Environment, University of Padova, Viale dell'Università 16, 35020 Legnaro (PD), Italy
- Angela Costa
  - Department of Veterinary Medical Sciences, University of Bologna, via Tolara di Sopra 50, 40064 Ozzano dell'Emilia (BO), Italy
- Massimo De Marchi
  - Department of Agronomy, Food, Natural resources, Animals and Environment, University of Padova, Viale dell'Università 16, 35020 Legnaro (PD), Italy
8
Patel SY, Baum A, Basu S. Prediction of non emergent acute care utilization and cost among patients receiving Medicaid. Sci Rep 2024; 14:824. [PMID: 38263373 PMCID: PMC10805799 DOI: 10.1038/s41598-023-51114-z] [Received: 10/03/2023] [Accepted: 12/30/2023] [Indexed: 01/25/2024] [Open access]
Abstract
Patients receiving Medicaid often experience social risk factors for poor health and limited access to primary care, leading to high utilization of emergency departments and hospitals (acute care) for non-emergent conditions. As programs proactively reach out to Medicaid patients to offer primary care, they rely on risk models historically limited by poor-quality data. Following initiatives to improve data quality and collect data on social risk, we tested alternative, widely debated strategies to improve Medicaid risk models. Among a sample of 10 million patients receiving Medicaid from 26 states and Washington DC, the best-performing model tripled the probability of prospectively identifying at-risk patients versus a standard model (sensitivity 11.3% [95% CI 10.5, 12.1%] vs 3.4% [95% CI 3.0, 4.0%]), without increasing "false positives" that reduce efficiency of outreach (specificity 99.8% [95% CI 99.6, 99.9%] vs 99.5% [95% CI 99.4, 99.7%]), and with a roughly tenfold improved coefficient of determination when predicting costs (R²: 0.195-0.412 among population subgroups vs 0.022-0.050). Our best-performing model also reversed the lower sensitivity of risk prediction for Black versus White patients, a bias present in the standard cost-based model. Our results demonstrate a modeling approach to substantially improve risk prediction performance and equity for patients receiving Medicaid.
Affiliation(s)
- Sadiq Y Patel
  - Clinical Product Development, Waymark, San Francisco, CA, USA
  - School of Social Policy and Practice, University of Pennsylvania, 3701 Locust Walk, Philadelphia, PA, 19104, USA
- Aaron Baum
  - Clinical Product Development, Waymark, San Francisco, CA, USA
  - Icahn School of Medicine at Mt Sinai, New York, NY, USA
- Sanjay Basu
  - Clinical Product Development, Waymark, San Francisco, CA, USA
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
  - Center for Vulnerable Populations, San Francisco General Hospital/University of California San Francisco, San Francisco, CA, USA
9
Luo Z, Wang R, Sun Y, Liu J, Chen Z, Zhang YJ. Interpretable feature extraction and dimensionality reduction in ESM2 for protein localization prediction. Brief Bioinform 2024; 25:bbad534. [PMID: 38279650 PMCID: PMC10818170 DOI: 10.1093/bib/bbad534] [Received: 07/27/2023] [Revised: 11/19/2023] [Accepted: 12/15/2024] [Indexed: 01/28/2024] [Open access]
Abstract
As the application of large language models (LLMs) has broadened into the realm of biological predictions, leveraging their capacity for self-supervised learning to create feature representations of amino acid sequences, these models have set a new benchmark in tackling downstream challenges, such as subcellular localization. However, previous studies have primarily focused on either the structural design of models or differing strategies for fine-tuning, largely overlooking investigations into the nature of the features derived from LLMs. In this research, we propose different ESM2 representation extraction strategies, considering both the character type and position within the ESM2 input sequence. Using model dimensionality reduction, predictive analysis and interpretability techniques, we have illuminated potential associations between diverse feature types and specific subcellular localizations. In particular, prediction of mitochondrion and Golgi apparatus localization favors segment features closer to the N-terminus, and phosphorylation site-based features could mirror phosphorylation properties. We also evaluate the prediction performance and interpretability robustness of Random Forest and Deep Neural Networks with varied feature inputs. This work offers novel insights into maximizing LLMs' utility, understanding their mechanisms, and extracting biological domain knowledge. Furthermore, we have made the code, feature extraction API, and all relevant materials available at https://github.com/yujuan-zhang/feature-representation-for-LLMs.
Affiliation(s)
- Zeyu Luo
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Rui Wang
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Yawen Sun
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Junhao Liu
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Zongqing Chen
  - School of Mathematical Sciences, Chongqing Normal University, Chongqing 400047, China
- Yu-Juan Zhang
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
10
Carrara I, Papadopoulo T. Pseudo-online framework for BCI evaluation: a MOABB perspective using various MI and SSVEP datasets. J Neural Eng 2024; 21:016003. [PMID: 38113535 DOI: 10.1088/1741-2552/ad171a] [Received: 08/18/2023] [Accepted: 12/19/2023] [Indexed: 12/21/2023]
Abstract
Objective. BCI (Brain-Computer Interfaces) operate in three modes: online, offline, and pseudo-online. In online mode, real-time EEG data is constantly analyzed. In offline mode, the signal is acquired and processed afterwards. The pseudo-online mode processes collected data as if they were received in real-time. The main difference is that the offline mode often analyzes the whole data, while the online and pseudo-online modes only analyze data in short time windows. Offline processing tends to be more accurate, while online analysis is better for therapeutic applications. Pseudo-online implementation approximates online processing without real-time constraints. Many BCI studies, being offline, introduce biases compared to real-life scenarios, impacting classification algorithm performance.
Approach. The objective of this research paper is therefore to extend the current MOABB framework, operating in offline mode, so as to allow a comparison of different algorithms in a pseudo-online setting with the use of a technology based on overlapping sliding windows. To do this will require the introduction of an idle state event in the dataset that takes into account all different possibilities that are not task thinking. To validate the performance of the algorithms we will use the normalized Matthews correlation coefficient and the information transfer rate.
Main results. We analyzed the state-of-the-art algorithms of the last 15 years over several motor imagery and steady state visually evoked potential multi-subjects datasets, showing the differences between the two approaches from a statistical point of view.
Significance. The ability to analyze the performance of different algorithms in offline and pseudo-online modes will allow the BCI community to obtain more accurate and comprehensive reports regarding the performance of classification algorithms.
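The three ingredients named in the approach can be sketched in a few lines: overlapping sliding windows, the normalized Matthews correlation coefficient, and the information transfer rate. This is an illustrative sketch, not MOABB code. The function names are hypothetical, the normalization is the usual rescaling of MCC from [-1, 1] to [0, 1], and the ITR uses the widely cited Wolpaw definition, which is assumed here rather than confirmed by the abstract.

```python
import math

def sliding_windows(n_samples: int, window: int, step: int) -> list:
    """(start, stop) sample indices; windows overlap whenever step < window."""
    return [(s, s + window) for s in range(0, n_samples - window + 1, step)]

def normalized_mcc(mcc: float) -> float:
    """Rescale MCC from [-1, 1] to [0, 1]."""
    return (mcc + 1) / 2

def itr_bits_per_trial(n_classes: int, accuracy: float) -> float:
    """Wolpaw information transfer rate in bits per trial (needs n_classes >= 2)."""
    bits = math.log2(n_classes)
    if accuracy > 0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1:
        bits += (1 - accuracy) * math.log2((1 - accuracy) / (n_classes - 1))
    return bits

# Hypothetical values: a 10-sample signal cut into 4-sample windows every 2 samples.
windows = sliding_windows(n_samples=10, window=4, step=2)
print(windows)
print(normalized_mcc(0.0), itr_bits_per_trial(n_classes=4, accuracy=1.0))
```

In a pseudo-online evaluation each window would be classified in turn, with windows that contain no task assigned to the idle state class.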
Affiliation(s)
- Igor Carrara
  - Université Côte d'Azur (UCA), INRIA, Cronos Team, Nice, France
11
Dipaola F, Gatti M, Menè R, Shiffer D, Giaj Levra A, Solbiati M, Villa P, Costantino G, Furlan R. A Hybrid Model for 30-Day Syncope Prognosis Prediction in the Emergency Department. J Pers Med 2023; 14:4. [PMID: 38276219 PMCID: PMC10817569 DOI: 10.3390/jpm14010004] [Received: 10/25/2023] [Revised: 12/06/2023] [Accepted: 12/11/2023] [Indexed: 01/27/2024] [Open access]
Abstract
Syncope is a challenging problem in the emergency department (ED) as the available risk prediction tools have suboptimal predictive performances. Predictive models based on machine learning (ML) are promising tools whose application in the context of syncope remains underexplored. The aim of the present study was to develop and compare the performance of ML-based models in predicting the risk of clinically significant outcomes in patients presenting to the ED for syncope. We enrolled 266 consecutive patients (age 73, IQR 58-83; 52% males) admitted for syncope at three tertiary centers. We collected demographic and clinical information as well as the occurrence of clinically significant outcomes at a 30-day telephone follow-up. We implemented an XGBoost model based on the best-performing candidate predictors. Subsequently, we integrated the XGBoost predictors with knowledge-based rules. The obtained hybrid model outperformed the XGBoost model (AUC = 0.81 vs. 0.73, p < 0.001) with acceptable calibration. In conclusion, we developed an ML-based model characterized by a commendable capability to predict adverse events within 30 days post-syncope evaluation in the ED. This model relies solely on clinical data routinely collected during a patient's initial syncope evaluation, thus obviating the need for laboratory tests or syncope-experienced clinical judgment.
Affiliation(s)
- Franca Dipaola
  - Internal Medicine, Syncope Unit, IRCCS Humanitas Research Hospital, 20089 Milan, Italy
- Roberto Menè
  - Department of Medicine and Surgery, University of Milano-Bicocca, 20100 Milan, Italy
- Dana Shiffer
  - Emergency Department, IRCCS Humanitas Research Hospital, 20089 Milan, Italy
  - Department of Biomedical Sciences, Humanitas University, 20072 Milan, Italy
- Monica Solbiati
  - Emergency Department, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Università Degli Studi Di Milano, 20100 Milan, Italy
- Paolo Villa
  - Emergency Medicine Unit, Luigi Sacco Hospital, ASST Fatebenefratelli Sacco, 20100 Milan, Italy
- Giorgio Costantino
  - Emergency Department, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Università Degli Studi Di Milano, 20100 Milan, Italy
- Raffaello Furlan
  - Internal Medicine, Syncope Unit, IRCCS Humanitas Research Hospital, 20089 Milan, Italy
  - Department of Biomedical Sciences, Humanitas University, 20072 Milan, Italy
12
Sandhu H, Garg P. Machine Learning Enables Accurate Prediction of Quinone Formation during Drug Metabolism. Chem Res Toxicol 2023; 36:1876-1890. [PMID: 37885227 DOI: 10.1021/acs.chemrestox.3c00162] [Indexed: 10/28/2023]
Abstract
Metabolism helps in the elimination of drugs from the human body by making them more hydrophilic. Sometimes, drugs can be bioactivated to highly reactive metabolites or intermediates during metabolism. These reactive metabolites are often responsible for the toxicities associated with the drugs. Identification of reactive metabolites of drug candidates can be very helpful in the initial stages of drug discovery. Quinones are soft electrophiles that are generated as reactive intermediates during metabolism. Quinones make up more than 40% of the reactive metabolites. In this work, a reliable data set of 510 molecules was used to develop machine learning and deep learning-based predictive models to predict the formation of quinone-type metabolites. For representing molecules, two-dimensional (2D) descriptors, PubChem fingerprints, electro-topological state (E-state) fingerprints, and metabolic reactivity-based descriptors were used. Developed models were compared to the existing Xenosite web server using the untouched test set of 102 molecules. The best model achieved an accuracy of 86.27%, while the Xenosite server could achieve an accuracy of only 52.94% on the test set. Descriptor analysis revealed that the presence of greater numbers of polar moieties in a molecule can prevent the formation of quinone-type metabolites. In addition, the presence of a nitrogen atom in an aromatic ring and the presence of metabolophores V51, V52, and V53 (SMARTCyp descriptors) decrease the probability of quinone formation. Finally, a tool based on the best machine learning models was developed, which is accessible at http://14.139.57.41/quinonepred/.
Affiliation(s)
- Hardeep Sandhu
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar 160062, Punjab, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar 160062, Punjab, India
13
|
Seok J, Jeong ST, Yoon SY, Lee JY, Kim S, Cho H, Kang WS. Novel nomogram for predicting paradoxical chest wall movement in patients with flail segment of traumatic rib fracture: a retrospective cohort study. Sci Rep 2023; 13:20251. [PMID: 37985825 PMCID: PMC10662329 DOI: 10.1038/s41598-023-47700-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/17/2023] [Indexed: 11/22/2023] Open
Abstract
Flail chest is a severe injury to the chest wall and is related to adverse outcomes. A flail chest is classified as the physiologic, paradoxical motion of a chest wall or flail segment of rib fracture (RFX). We hypothesized that patients with paradoxical chest wall movement would present different clinical features from patients with a flail segment. This retrospective observational study included patients with blunt chest trauma who visited our level 1 trauma center between January 2019 and October 2022 and were diagnosed with one or more flail segments by computed tomography. The primary outcome of our study was a clinically diagnosed visible, paradoxical chest wall motion. We used the least absolute shrinkage and selection operator (LASSO) logistic regression model to minimize overfitting. After a feature selection using the LASSO regression model, we constructed a multivariable logistic regression (MLR) model and nomogram. A total of five risk factors were selected in the LASSO model and applied to the multivariable logistic regression model. Of these, four risk factors were statistically significant: the total number of RFX (adjusted OR [aOR], 1.28; 95% confidence interval [CI], 1.09-1.49; p = 0.002), number of segmental RFX including Grade III fractures (aOR, 1.78; 95% CI, 1.14-2.79; p = 0.012), laterally located primary fracture lines (aOR, 4.00; 95% CI, 1.69-9.43; p = 0.002), and anterior-lateral flail segments (aOR, 4.20; 95% CI, 1.60-10.99; p = 0.004). We constructed a nomogram to predict the personalized probability of the flail motion. A novel nomogram was developed in patients with flail segments of traumatic RFX to predict paradoxical chest wall motion. The number of RFX, Grade III segmental RFX, and the location of the RFX were significant risk factors.
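As a rough illustration of how the adjusted odds ratios quoted above could feed a patient-level prediction, the following Python sketch converts each aOR to a log-odds coefficient and shifts a baseline probability accordingly. The dictionary keys, the reference and patient profiles, and the 20% baseline risk are hypothetical; the abstract reports neither the model intercept nor the nomogram's point scale, so this is not the published nomogram.

```python
from math import exp, log

# Adjusted odds ratios quoted in the abstract (per unit increase or presence).
# The keys are illustrative names, not the authors' variable names.
ADJUSTED_OR = {
    "total_rfx": 1.28,
    "segmental_rfx_incl_grade3": 1.78,
    "lateral_primary_fracture_line": 4.00,
    "anterolateral_flail_segment": 4.20,
}


def log_odds_shift(patient, reference):
    """Change in log-odds when moving from `reference` to `patient`."""
    return sum(
        log(odds_ratio) * (patient[name] - reference[name])
        for name, odds_ratio in ADJUSTED_OR.items()
    )


def shifted_probability(baseline_probability, shift):
    """Apply a log-odds shift to a baseline probability."""
    baseline_logit = log(baseline_probability / (1.0 - baseline_probability))
    return 1.0 / (1.0 + exp(-(baseline_logit + shift)))


# HYPOTHETICAL profiles: the patient differs from the reference only by
# having a laterally located primary fracture line.
reference = {
    "total_rfx": 4,
    "segmental_rfx_incl_grade3": 1,
    "lateral_primary_fracture_line": 0,
    "anterolateral_flail_segment": 0,
}
patient = dict(reference, lateral_primary_fracture_line=1)

shift = log_odds_shift(patient, reference)
# HYPOTHETICAL baseline risk of 20% for the reference profile.
print(round(shifted_probability(0.20, shift), 3))
```

Because an odds ratio multiplies odds rather than probabilities, an aOR of 4.00 moves the assumed 20% baseline (odds 0.25) to odds of 1, that is, a probability of about 50%, not 80%.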
Affiliation(s)
- Junepill Seok
  - Department of Thoracic and Cardiovascular Surgery, Chungbuk National University Hospital, Cheongju, 28644, South Korea
- Soon Tak Jeong
  - Department of Physical Medicine and Rehabilitation, Ansanhyo Hospital, Ansan City, Republic of Korea
- Su Young Yoon
  - Department of Thoracic and Cardiovascular Surgery, Chungbuk National University Hospital, Cheongju, 28644, South Korea
- Jin Young Lee
  - Department of Trauma Surgery, Chungbuk National University Hospital, Cheongju, 28644, South Korea
- Seheon Kim
  - Department of Trauma Surgery, Chungbuk National University Hospital, Cheongju, 28644, South Korea
- Hyunmin Cho
  - Department of Trauma Surgery, Jeju Regional Trauma Center, Cheju Halla General Hospital, 65, Doryeong-ro, Jeju-si, Jeju-do, Republic of Korea
- Wu Seong Kang
  - Department of Trauma Surgery, Jeju Regional Trauma Center, Cheju Halla General Hospital, 65, Doryeong-ro, Jeju-si, Jeju-do, Republic of Korea
|
14
|
Chicco D, Haupt R, Garaventa A, Uva P, Luksch R, Cangelosi D. Computational intelligence analysis of high-risk neuroblastoma patient health records reveals time to maximum response as one of the most relevant factors for outcome prediction. Eur J Cancer 2023; 193:113291. [PMID: 37708628 DOI: 10.1016/j.ejca.2023.113291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 07/24/2023] [Accepted: 08/09/2023] [Indexed: 09/16/2023]
Abstract
OBJECTIVE Seek new candidate prognostic markers for neuroblastoma outcome, relapse or progression. MATERIALS AND METHODS In this multicentre and retrospective study, Random Forests coupled with recursive feature elimination techniques were applied to electronic records (55 clinical features) of 3034 neuroblastoma patients. To assess model performance and feature importance, the dataset was split into a training set (80%) and a test set (20%). RESULTS In the test set, the mean Matthews correlation coefficient for the Random Forests models was greater than 0.46. Feature importance analysis revealed that, together with maximum response to first-line treatment (D_MAX_RESP), time to maximum response to first-line treatment (TIME_MAX_RESP.days) is a relevant predictor of both patients' outcome and relapse/progression. We showed the prognostic value of the maximum response to first-line treatment in clinically relevant subsets of high-, intermediate-, and low-risk patients for both overall and relapse-free survival (log-rank p < 0.0001). In high-risk patients older than 18 months with stage 4 tumour achieving a complete response or very good partial response, patients who exhibited a D_MAX_RESP greater than 9 months showed a better prognosis with respect to patients achieving D_MAX_RESP earlier than 9 months (overall survival: hazard ratio 3.3, 95% confidence interval 1.8-5.9, log-rank p < 0.0001; relapse-free survival: hazard ratio 3.2, 95% CI 1.8-5.6, log-rank p < 0.0001). CONCLUSION Our findings provide evidence for the emerging role of TIME_MAX_RESP.days, in addition to D_MAX_RESP, as a relevant predictor of outcome and relapse/progression in neuroblastoma, with potential clinical impact on the management and treatment of patients.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada; Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
- Riccardo Haupt
  - DOPO Clinic, Department of Hematology/Oncology, IRCCS Istituto Giannina Gaslini, Genoa, Italy
- Paolo Uva
  - Unità di Bioinformatica Clinica, IRCCS Istituto Giannina Gaslini, Genoa, Italy
- Roberto Luksch
  - S.C. Pediatria oncologica, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Davide Cangelosi
  - Unità di Bioinformatica Clinica, IRCCS Istituto Giannina Gaslini, Genoa, Italy
|
15
|
Ahluwalia M, Abdalla M, Sanayei J, Seyyed-Kalantari L, Hussain M, Ali A, Fine B. The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups. Radiol Artif Intell 2023; 5:e220270. [PMID: 37795140 PMCID: PMC10546359 DOI: 10.1148/ryai.220270] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/30/2022] [Revised: 06/06/2023] [Accepted: 06/22/2023] [Indexed: 10/06/2023]
Abstract
Purpose To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis. Materials and Methods In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups. Results Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve. Conclusion Performance of deep learning chest radiograph classifiers was subject to patient, setting, and pathology factors, demonstrating that subgroup analysis is necessary to inform implementation and monitor ongoing performance to ensure optimal quality, safety, and equity. Keywords: Conventional Radiography, Thorax, Ethics, Supervised Learning, Convolutional Neural Network (CNN), Machine Learning Algorithms. Supplemental material is available for this article. © RSNA, 2023. See also the commentary by Huisman and Hannink in this issue.
Affiliation(s)
- Monish Ahluwalia, Mohamed Abdalla, James Sanayei, Laleh Seyyed-Kalantari, Mohannad Hussain, Amna Ali, Benjamin Fine
  - From the Kingston Health Sciences Centre, Queen’s University, Kingston, Ontario, Canada (M. Ahluwalia); Faculty of Medicine (M. Ahluwalia, J.S.), Institute of Health Policy, Management and Evaluation (M. Ahluwalia), Department of Computer Science (M. Abdalla, L.S.K.), and Department of Medical Imaging (B.F.), University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Canada (M. Abdalla, B.F.); Institute for Better Health (M. Abdalla, A.A., B.F.) and Department of Diagnostic Imaging (A.A., B.F.), Trillium Health Partners, 100 Queensway West, Clinical Administrative Building, 6th Floor, Mississauga, ON, Canada L5B 1B8; Department of Medicine, Royal University Hospital, Saskatoon, Saskatchewan, Canada (J.S.); Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada (L.S.K.); and Techie Maestro, Waterloo, Ontario, Canada (M.H.)
|
16
|
Chicco D, Jurman G. A statistical comparison between Matthews correlation coefficient (MCC), prevalence threshold, and Fowlkes-Mallows index. J Biomed Inform 2023; 144:104426. [PMID: 37352899 DOI: 10.1016/j.jbi.2023.104426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 06/09/2023] [Accepted: 06/15/2023] [Indexed: 06/25/2023]
Abstract
Although assessing binary classifications is a common task in scientific research, no consensus on a single statistic summarizing the confusion matrix has been reached so far. In recent studies, we demonstrated the advantages of the Matthews correlation coefficient (MCC) over other popular rates such as cross-entropy error, F1 score, accuracy, balanced accuracy, bookmaker informedness, diagnostic odds ratio, Brier score, and Cohen's kappa. In this study, we compared the MCC to two other statistics: prevalence threshold (PT), frequently used in obstetrics and gynecology, and the Fowlkes-Mallows index, a metric employed in fuzzy logic and drug discovery. Through the investigation of the mutual relations among the three metrics and the study of some relevant use cases, we show that, when positive data elements and negative data elements have the same importance, the Matthews correlation coefficient can again be more informative than its two competitors.
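To make the comparison above concrete, here is a minimal Python sketch that computes the three statistics from the four cells of a confusion matrix, using their standard textbook definitions. The function names and the example counts are illustrative assumptions and do not come from the paper.

```python
from math import sqrt


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def prevalence_threshold(tp, fp, tn, fn):
    """PT = sqrt(FPR) / (sqrt(TPR) + sqrt(FPR)); undefined if TPR = FPR = 0."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return sqrt(fpr) / (sqrt(tpr) + sqrt(fpr))


def fowlkes_mallows(tp, fp, tn, fn):
    """Geometric mean of precision and recall; ignores true negatives."""
    return tp / sqrt((tp + fp) * (tp + fn))


# Hypothetical positive-heavy dataset: 95 positives, 5 negatives, and a
# classifier that labels almost everything as positive.
counts = dict(tp=90, fn=5, fp=4, tn=1)

print("MCC:", round(mcc(**counts), 3))
print("PT :", round(prevalence_threshold(**counts), 3))
print("FM :", round(fowlkes_mallows(**counts), 3))
```

On these made-up counts the Fowlkes-Mallows index comes out above 0.95 while the MCC stays below 0.15, because only the MCC is penalized for the four out of five negatives that were misclassified.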
|
17
|
Gallagher A, Kar S, Sepúlveda MS. Computational Modeling of Human Serum Albumin Binding of Per- and Polyfluoroalkyl Substances Employing QSAR, Read-Across, and Docking. Molecules 2023; 28:5375. [PMID: 37513249 PMCID: PMC10383382 DOI: 10.3390/molecules28145375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/06/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023] Open
Abstract
Per- and polyfluoroalkyl substances (PFAS) are synthetic chemicals in widespread use that have been shown to be toxic to wildlife and humans. Human serum albumin (HSA) is a known transport protein that binds PFAS at various sites, leading to bioaccumulation and long-term toxicity. In silico tools like quantitative structure-activity relationship (QSAR), read-across, and quantitative read-across structure-property relationship (q-RASPR) are proven techniques for modeling chemical toxicity based on experimental data; they can be used to predict the toxicity of untested and new chemicals and, at the same time, help to identify the major features responsible for toxicity. Classification-based and regression-based QSAR models are employed in the present study to predict the binding affinities of 24 PFAS to HSA. Regression-based QSAR models revealed that the packing density index (PDI) and quantitative estimation of drug-likeness (QED) descriptors were both positively correlated with higher binding affinity, while the classification-based QSAR model showed the average connectivity index of order 4 (X4A) descriptor was inversely correlated with binding affinity. Whereas molecular docking studies suggested that PFAS with the highest binding affinity to HSA create hydrogen bonds with Arg348 and salt bridges with Arg348 and Arg485, PFAS with lower binding affinity either showed no interactions with either amino acid or only interactions with Arg348. Among the studied PFAS, perfluoroalkyl acids (PFAA) with large carbon chain length (>C10) have one of the lowest binding affinities, compared to PFAA with carbon chain length ranging from 7 to 9, which showed the highest affinity to HSA. Generalized Read-Across (GenRA) was used to predict toxicity outcomes for the top five highest binding affinity PFAS based on 10 structural analogs for each and found that all are predicted as being chronic to sub-chronically toxic to HSA. The developed in silico models presented in this work can provide a framework for designing PFAS alternatives, screening compounds currently in use, and for the study of PFAS mixture toxicity, which is an area of intense research.
Affiliation(s)
- Andrea Gallagher
  - Chemometrics and Molecular Modeling Laboratory, Department of Chemistry, Kean University, 1000 Morris Avenue, Union, NJ 07083, USA
- Supratik Kar
  - Chemometrics and Molecular Modeling Laboratory, Department of Chemistry, Kean University, 1000 Morris Avenue, Union, NJ 07083, USA
- Maria S Sepúlveda
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN 47907, USA
  - Faculty of Life Sciences, Universidad Andres Bello, Santiago 8370146, Chile
|
18
|
Qiu B, Shen Z, Yang D, Wang Q. Applying machine learning techniques to predict the risk of lung metastases from rectal cancer: a real-world retrospective study. Front Oncol 2023; 13:1183072. [PMID: 37293595 PMCID: PMC10247137 DOI: 10.3389/fonc.2023.1183072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 05/11/2023] [Indexed: 06/10/2023] Open
Abstract
Background Metastasis in the lungs is common in patients with rectal cancer, and it can have severe consequences on their survival and quality of life. Therefore, it is essential to identify patients who may be at risk of developing lung metastasis from rectal cancer. Methods In this study, we utilized eight machine-learning methods to create a model for predicting the risk of lung metastasis in patients with rectal cancer. Our cohort consisted of 27,180 rectal cancer patients selected from the Surveillance, Epidemiology and End Results (SEER) database between 2010 and 2017 for model development. Additionally, we validated our models using 1118 rectal cancer patients from a Chinese hospital to evaluate model performance and generalizability. We assessed our models' performance using various metrics, including the area under the curve (AUC), the area under the precision-recall curve (AUPR), the Matthews Correlation Coefficient (MCC), decision curve analysis (DCA), and calibration curves. Finally, we applied the best model to develop a web-based calculator for predicting the risk of lung metastasis in patients with rectal cancer. Results Our study employed tenfold cross-validation to assess the performance of eight machine-learning models for predicting the risk of lung metastasis in patients with rectal cancer. The AUC values ranged from 0.73 to 0.96 in the training set, with the extreme gradient boosting (XGB) model achieving the highest AUC value of 0.96. Moreover, the XGB model obtained the best AUPR and MCC in the training set, reaching 0.98 and 0.88, respectively. We found that the XGB model demonstrated the best predictive power, achieving an AUC of 0.87, an AUPR of 0.60, an accuracy of 0.92, and a sensitivity of 0.93 in the internal test set. Furthermore, the XGB model was evaluated in the external test set and achieved an AUC of 0.91, an AUPR of 0.63, an accuracy of 0.93, a sensitivity of 0.92, and a specificity of 0.93. The XGB model obtained the highest MCC in the internal test set and external validation set, with 0.61 and 0.68, respectively. Based on the DCA and calibration curve analysis, the XGB model had better clinical decision-making ability and predictive power than the other seven models. Lastly, we developed an online web calculator using the XGB model to assist doctors in making informed decisions and to facilitate the model's wider adoption (https://share.streamlit.io/woshiwz/rectal_cancer/main/lung.py). Conclusion In this study, we developed an XGB model based on clinicopathological information to predict the risk of lung metastasis in patients with rectal cancer, which may help physicians make clinical decisions.
Affiliation(s)
- Binxu Qiu
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
- Zixiong Shen
  - Department of Thoracic Surgery, The First Hospital of Jilin University, Changchun, China
- Dongliang Yang
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
- Quan Wang
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
|
19
|
Li D, Pehrson LM, Bonnevie R, Fraccaro M, Thrane J, Tøttrup L, Lauridsen CA, Butt Balaganeshan S, Jankovic J, Andersen TT, Mayar A, Hansen KL, Carlsen JF, Darkner S, Nielsen MB. Performance and Agreement When Annotating Chest X-ray Text Reports—A Preliminary Step in the Development of a Deep Learning-Based Prioritization and Detection System. Diagnostics (Basel) 2023; 13:1070. [PMID: 36980376 PMCID: PMC10047142 DOI: 10.3390/diagnostics13061070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/06/2023] [Accepted: 03/08/2023] [Indexed: 03/18/2023] Open
Abstract
A chest X-ray report is a communicative tool and can be used as data for developing artificial intelligence-based decision support systems. For both, consistent understanding and labeling is important. Our aim was to investigate how readers would comprehend and annotate 200 chest X-ray reports. Reports written between 1 January 2015 and 11 March 2022 were selected based on search words. Annotators included three board-certified radiologists, two trained radiologists (physicians), two radiographers (radiological technicians), a non-radiological physician, and a medical student. Consensus labels by two or more of the experienced radiologists were considered “gold standard”. The Matthews correlation coefficient (MCC) was calculated to assess annotation performance, and descriptive statistics were used to assess agreement between individual annotators and labels. The intermediate radiologist had the best correlation to “gold standard” (MCC 0.77). This was followed by the novice radiologist and medical student (MCC 0.71 for both), the novice radiographer (MCC 0.65), non-radiological physician (MCC 0.64), and experienced radiographer (MCC 0.57). Our findings showed that for developing an artificial intelligence-based support system, if trained radiologists are not available, annotations from non-radiological annotators with basic and general knowledge may be more aligned with radiologists compared to annotations from sub-specialized medical staff, if their sub-specialization is outside of diagnostic radiology.
Affiliation(s)
- Dana Li
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
  - Correspondence:
- Lea Marie Pehrson
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
  - Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
- Carsten Ammitzbøl Lauridsen
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
  - Radiography Education, University College Copenhagen, 2200 Copenhagen, Denmark
- Sedrah Butt Balaganeshan
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
- Jelena Jankovic
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
- Tobias Thostrup Andersen
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
- Alyas Mayar
  - Department of Health Sciences, Panum Institute, University of Copenhagen, 2100 Copenhagen, Denmark
- Kristoffer Lindskov Hansen
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
- Jonathan Frederik Carlsen
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
- Sune Darkner
  - Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
- Michael Bachmann Nielsen
  - Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
|
20
|
Liang Z. Novel method combining multiscale attention entropy of overnight blood oxygen level and machine learning for easy sleep apnea screening. Digit Health 2023; 9:20552076231211550. [PMID: 37936958 PMCID: PMC10627021 DOI: 10.1177/20552076231211550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 10/16/2023] [Indexed: 11/09/2023] Open
Abstract
Objective Sleep apnea is a common sleep disorder affecting a significant portion of the population, but many apnea patients remain undiagnosed because existing clinical tests are invasive and expensive. This study aimed to develop a method for easy sleep apnea screening. Methods Three supervised machine learning algorithms, including logistic regression, support vector machine, and light gradient boosting machine, were applied to develop apnea screening models at two apnea-hypopnea index cutoff thresholds: ≥ 5 and ≥ 30 events/hour. The SpO2 recordings of the Sleep Heart Health Study database (N = 5786) were used for model training, validation, and testing. Multiscale entropy analysis was performed to derive a set of multiscale attention entropy features from the SpO2 recordings. Demographic features including age, sex, body mass index, and blood pressure were also used. The dependency among the multiscale attention entropy features was handled with independent component analysis. Results For the cutoff of ≥ 5 events/hour, the logistic regression model achieved the highest Matthews correlation coefficient (0.402) and area under the curve (0.747), and reasonably good sensitivity (75.38%), specificity (74.02%), and positive predictive value (92.94%). For the cutoff of ≥ 30 events/hour, the support vector machine model achieved the highest Matthews correlation coefficient (0.545) and area under the curve (0.823), and good sensitivity (82.00%), specificity (82.69%), and negative predictive value (95.53%). Conclusions Our models achieved better performance than existing methods and have the potential to be integrated with home-use pulse oximeters.
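Sensitivity and specificity are properties of a screening model, whereas the predictive values quoted above also depend on how common the condition is in the tested population. The Python sketch below applies Bayes' rule to the sensitivity and specificity reported for the lower cutoff; the prevalence values in the loop are hypothetical and are not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity reported in the abstract for the lower cutoff.
SENSITIVITY, SPECIFICITY = 0.7538, 0.7402

# HYPOTHETICAL prevalence values, chosen only to show the trend.
for prevalence in (0.10, 0.30, 0.50, 0.80):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases, so predictive values measured on one cohort should not be assumed to carry over to a population with a different apnea rate.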
Affiliation(s)
- Zilu Liang
  - Kyoto University of Advanced Science (KUAS), Japan
|