1
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a smaller fraction of them found immediate applications, while a larger proportion served as testimony to the creative and empirical nature of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules be synthesized through fewer empirical attempts, rather than a larger library, to realize an active candidate. On this front, predictive efforts using machine learning (ML) models built on available data acquire timely significance. Prediction of molecular properties and reaction outcomes remains one of the burgeoning applications of ML in chemical science. Among the several methods of encoding molecular samples for ML models, those that employ language-like representations are gaining steady popularity. Such representations additionally allow well-developed natural language processing (NLP) models to be adopted for chemical applications. Against this advantageous background, we describe herein several successful chemical applications of NLP, focusing on molecular property and reaction outcome predictions. From relatively simple recurrent neural networks (RNNs) to complex models such as transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward- and retro-synthesis predictions. Chemical language models (CLMs) provide promising avenues toward a broad range of applications in a time- and cost-effective manner. While we showcase an optimistic outlook for CLMs, attention is also placed on the persisting challenges in the reaction domain, which will hopefully be addressed by advanced algorithms tailored to chemical language and by the increased availability of high-quality datasets.
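The language-like molecular representations this review surveys are typically SMILES strings split into chemistry-aware tokens before being fed to an RNN or transformer. As a minimal illustration (not code from the paper), a regex-based SMILES tokenizer in the style popularized for reaction transformers; the exact pattern is an illustrative choice:

```python
import re

# Regex-based SMILES tokenizer: bracket atoms, two-letter halogens, aromatic
# atoms, bonds, branches, and ring-closure digits each become one token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|"
    r"\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemistry-aware tokens for a language model."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Sanity check: the tokens must reassemble into the original string.
    assert "".join(tokens) == smiles, "unrecognized characters in SMILES"
    return tokens

# Aspirin: 'Cl'/'Br' are single tokens, ring-closure digits stay separate.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

The resulting token sequence plays the same role words do in NLP, which is what lets off-the-shelf sequence models be reused for chemistry.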
Affiliation(s)
- Manajit Das
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Ghosh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
2
Mishra V, Sarraju A, Kalwani NM, Dexter JP. Evaluation of Prompts to Simplify Cardiovascular Disease Information Generated Using a Large Language Model: Cross-Sectional Study. J Med Internet Res 2024; 26:e55388. [PMID: 38648104 DOI: 10.2196/55388] [Received: 12/11/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024]
Abstract
In this cross-sectional study, we evaluated the completeness, readability, and syntactic complexity of cardiovascular disease prevention information produced by GPT-4 in response to 4 kinds of prompts.
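Readability analyses like the one described typically rely on standard formulas such as the Flesch-Kincaid grade level. As a rough sketch (not the validated tooling a study like this would use), a grade-level calculator with a naive vowel-group syllable counter:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels. Validated
    # readability tools use dictionaries or better heuristics.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid grade level of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade formula.
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Simpler phrasing should score a lower grade level.
print(flesch_kincaid_grade("The cat sat on the mat."))
print(flesch_kincaid_grade(
    "Cardiovascular complications necessitate comprehensive multidisciplinary management."))
```

Comparing such scores across prompt variants is one way to quantify whether a "simplify this" prompt actually lowers the reading level of generated patient information.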
Affiliation(s)
- Vishala Mishra
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, United States
- Ashish Sarraju
- Department of Cardiovascular Medicine, Cleveland Clinic, Cleveland, OH, United States
- Neil M Kalwani
- Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, United States
- Division of Cardiovascular Medicine and the Cardiovascular Institute, Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States
- Joseph P Dexter
- Data Science Initiative, Harvard University, Allston, MA, United States
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, United States
- Institute of Collaborative Innovation, University of Macau, Taipa, Macao
3
Buehler MJ. Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design. ACS Eng Au 2024; 4:241-277. [PMID: 38646516 PMCID: PMC11027160 DOI: 10.1021/acsengineeringau.3c00058] [Received: 09/24/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023]
Abstract
Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design, and manufacturing, including their capacity to work effectively with human language, symbols, code, and numerical data. Here, we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials: retrieving key information about subject areas, developing research hypotheses, discovering mechanistic relationships across disparate areas of knowledge, and writing and executing simulation codes for active knowledge generation based on physical ground truths. Moreover, when used as sets of AI agents with specific features, capabilities, and instructions, LLMs can provide powerful problem-solution strategies for analysis and design problems. Our experiments focus on a fine-tuned model, MechGPT, developed from training data in the mechanics-of-materials domain. We first affirm that fine-tuning endows LLMs with a reasonable understanding of subject-area knowledge. However, when queried outside the context of learned material, LLMs can have difficulty recalling correct information and may hallucinate. We show how this can be addressed using retrieval-augmented Ontological Knowledge Graph strategies. The graph-based strategy helps us discern not only which concepts the model considers important but also how they are related, which significantly improves generative performance and naturally allows new and augmented data sources to be injected into generative AI algorithms. We find that the additional feature of relatedness provides advantages over regular retrieval-augmentation approaches: it not only improves LLM performance but also provides mechanistic insights for exploring a materials design process.
Illustrated for a use case relating distinct areas of knowledge (here, music and proteins), such strategies can also provide an interpretable graph structure with rich information at the node, edge, and subgraph levels that offers specific insights into mechanisms and relationships. We discuss other approaches to improving generative quality, including nonlinear sampling strategies and agent-based modeling, which offer enhancements over single-shot generation; in these, LLMs are used both to generate content and to assess content against an objective target. Examples provided include complex question answering, code generation, and code execution in the context of automated force-field development from actively learned density functional theory (DFT) modeling and data analysis.
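The core of a graph-based retrieval-augmentation strategy like the one described is retrieving a relevant subgraph and serializing it into the prompt. A minimal sketch of that idea follows; the graph, relation names, and prompt format are invented for illustration, whereas the paper's actual pipeline builds its ontological graph from an LLM over a materials-science corpus:

```python
from collections import defaultdict

class OntologicalGraph:
    """Tiny triple store: nodes are concepts, edges carry relation labels."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, neighbor)]

    def add(self, head: str, relation: str, tail: str):
        self.edges[head].append((relation, tail))
        self.edges[tail].append((f"inverse of {relation}", head))

    def retrieve_context(self, query_node: str) -> str:
        """Serialize the 1-hop neighborhood of a concept into prompt text."""
        triples = [f"{query_node} --{rel}--> {nbr}"
                   for rel, nbr in self.edges.get(query_node, [])]
        return "\n".join(triples)

kg = OntologicalGraph()
kg.add("spider silk", "exhibits", "high toughness")
kg.add("beta-sheet crystal", "strengthens", "spider silk")

# The retrieved subgraph would be prepended to the user question before
# querying the LLM, grounding generation in explicit relations.
context = kg.retrieve_context("spider silk")
print(context)
```

Because the retrieved context is a set of explicit labeled edges rather than raw text chunks, the model's answer can be traced back to specific relations, which is the interpretability benefit the abstract emphasizes.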
Affiliation(s)
- Markus J. Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
4
Hirosawa T, Harada Y, Tokumasu K, Ito T, Suzuki T, Shimizu T. Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration. JMIR Med Inform 2024; 12:e55627. [PMID: 38592758 DOI: 10.2196/55627] [Received: 12/18/2023] [Revised: 02/14/2024] [Accepted: 03/13/2024]
Abstract
BACKGROUND In the evolving field of health care, multimodal generative artificial intelligence (AI) systems, such as ChatGPT-4 with vision (ChatGPT-4V), represent a significant advancement, as they integrate visual data with text data. This integration has the potential to revolutionize clinical diagnostics by offering more comprehensive analysis capabilities. However, the impact on diagnostic accuracy of using image data to augment ChatGPT-4 remains unclear. OBJECTIVE This study aims to assess the impact of adding image data on ChatGPT-4's diagnostic accuracy and provide insights into how image data integration can enhance the accuracy of multimodal AI in medical diagnostics. Specifically, this study endeavored to compare the diagnostic accuracy between ChatGPT-4V, which processed both text and image data, and its counterpart, ChatGPT-4, which only uses text data. METHODS We identified a total of 557 case reports published in the American Journal of Case Reports from January 2022 to March 2023. After excluding cases that were nondiagnostic, pediatric, and lacking image data, we included 363 case descriptions with their final diagnoses and associated images. We compared the diagnostic accuracy of ChatGPT-4V and ChatGPT-4 without vision based on their ability to include the final diagnoses within differential diagnosis lists. Two independent physicians evaluated their accuracy, with a third resolving any discrepancies, ensuring a rigorous and objective analysis. RESULTS The integration of image data into ChatGPT-4V did not significantly enhance diagnostic accuracy, showing that final diagnoses were included in the top 10 differential diagnosis lists at a rate of 85.1% (n=309), comparable to the rate of 87.9% (n=319) for the text-only version (P=.33). Notably, ChatGPT-4V's performance in correctly identifying the top diagnosis was inferior, at 44.4% (n=161), compared with 55.9% (n=203) for the text-only version (P=.002, χ2 test). 
Additionally, ChatGPT-4V's self-reports indicated that image data accounted for 30% of the weight in developing the differential diagnosis lists in more than half of cases. CONCLUSIONS Our findings reveal that, currently, ChatGPT-4V predominantly relies on textual data, limiting its ability to fully use the diagnostic potential of visual information. This study underscores the need for further development of multimodal generative AI systems to effectively integrate and use clinical image data. Enhancing the diagnostic performance of such AI systems through improved multimodal data integration could significantly benefit patient care by providing more accurate and comprehensive diagnostic insights. Future research should focus on overcoming these limitations, paving the way for the practical application of advanced AI in medicine.
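The study's headline metrics reduce to two computations: top-n inclusion accuracy over differential diagnosis lists, and a 2x2 chi-square comparison between model variants. A sketch with invented toy data (the study itself scored 363 curated case reports with independent physician raters):

```python
def top_n_accuracy(differentials: list[list[str]], finals: list[str], n: int = 10) -> float:
    """Fraction of cases whose final diagnosis appears in the top-n differential list."""
    hits = sum(final in diffs[:n] for diffs, final in zip(differentials, finals))
    return hits / len(finals)

def chi_square_2x2(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for two proportions."""
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Toy data: 2 of 3 cases have the final diagnosis in the differential list.
diffs = [["sepsis", "pneumonia"], ["gout", "cellulitis"], ["migraine"]]
finals = ["pneumonia", "pseudogout", "migraine"]
print(top_n_accuracy(diffs, finals, n=10))
```

The statistic would then be compared against the chi-square distribution with 1 degree of freedom to obtain a P value, as in the reported P=.002 comparison of top-diagnosis rates.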
Affiliation(s)
- Takanobu Hirosawa
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
- Kazuki Tokumasu
- Department of General Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Tomoharu Suzuki
- Department of Hospital Medicine, Urasoe General Hospital, Okinawa, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
5
Xu X, Li C, Yuan X, Zhang Q, Liu Y, Zhu Y, Chen T. ACP-DRL: an anticancer peptides recognition method based on deep representation learning. Front Genet 2024; 15:1376486. [PMID: 38655048 PMCID: PMC11035771 DOI: 10.3389/fgene.2024.1376486] [Received: 01/25/2024] [Accepted: 03/25/2024]
Abstract
Cancer, a significant global public health issue, resulted in about 10 million deaths in 2022. Anticancer peptides (ACPs), a category of bioactive peptides, have emerged as a focal point in clinical cancer research owing to their potential to inhibit tumor cell proliferation with minimal side effects. However, recognizing ACPs through wet-lab experiments still suffers from low efficiency and high cost. To address these challenges, our work proposes ACP-DRL, a recognition method for ACPs based on deep representation learning. ACP-DRL marks an initial exploration of integrating protein language models into ACP recognition, employing in-domain further pre-training to enhance deep representation learning. It simultaneously employs bidirectional long short-term memory networks to extract amino acid features from sequences. Consequently, ACP-DRL eliminates constraints on sequence length and the dependence on manual features, showing remarkable competitiveness in comparison with existing methods.
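Before a protein language model or BiLSTM can consume a peptide, each amino acid must be mapped to an integer index, with no hard cap on sequence length. A sketch of that encoding step follows; the vocabulary ordering and special-token ids are illustrative, not those of ACP-DRL:

```python
# The 20 standard amino acids; ids 0 and 1 are reserved for padding and
# unknown residues (an illustrative convention, not ACP-DRL's actual one).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, UNK = 0, 1
AA_TO_ID = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}

def encode_peptide(seq: str) -> list[int]:
    """Variable-length integer encoding of a peptide sequence."""
    return [AA_TO_ID.get(aa, UNK) for aa in seq.upper()]

def pad_batch(seqs: list[list[int]]) -> list[list[int]]:
    """Right-pad a batch to its longest member. Padding is only needed for
    batching; it imposes no fixed maximum sequence length on the model."""
    width = max(len(s) for s in seqs)
    return [s + [PAD] * (width - len(s)) for s in seqs]

batch = pad_batch([encode_peptide("FLPIIAKLLGGLL"), encode_peptide("GLW")])
print(batch)
```

A recurrent model such as a BiLSTM then reads these index sequences in both directions, which is what frees the method from fixed-length, hand-crafted feature vectors.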
Affiliation(s)
- Xiaofang Xu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Chaoran Li
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Xinpu Yuan
- Department of General Surgery, First Medical Center, Chinese PLA General Hospital, Beijing, China
- Qiangjian Zhang
- Institute of Dataspace, Hefei Comprehensive National Science Center, Hefei, China
- Yi Liu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Yunping Zhu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Tao Chen
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
6
Beltrami EJ, Grant-Kels JM. Consulting ChatGPT: Ethical dilemmas in language model artificial intelligence. J Am Acad Dermatol 2024; 90:879-880. [PMID: 36907556 DOI: 10.1016/j.jaad.2023.02.052] [Received: 01/12/2023] [Revised: 02/14/2023] [Accepted: 02/28/2023]
Affiliation(s)
- Eric J Beltrami
- University of Connecticut School of Medicine, Farmington, CT
- Jane Margaret Grant-Kels
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT; Department of Dermatology, University of Florida, Gainesville, FL.
7
Beaulieu-Jones BR, Berrigan MT, Shah S, Marwaha JS, Lai SL, Brat GA. Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments. Surgery 2024; 175:936-942. [PMID: 38246839 PMCID: PMC10947829 DOI: 10.1016/j.surg.2023.12.014] [Received: 07/17/2023] [Revised: 12/09/2023] [Accepted: 12/15/2023]
Abstract
BACKGROUND Artificial intelligence has the potential to dramatically alter health care by enhancing how we diagnose and treat disease. One promising artificial intelligence model is ChatGPT, a general-purpose large language model trained by OpenAI. ChatGPT has shown human-level performance on several professional and academic benchmarks. We sought to evaluate its performance on surgical knowledge questions and assess the stability of this performance on repeat queries. METHODS We evaluated the performance of ChatGPT-4 on questions from the Surgical Council on Resident Education question bank and a second commonly used surgical knowledge assessment, referred to as Data-B. Questions were entered in 2 formats: open-ended and multiple-choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and the stability of performance on repeat queries. RESULTS A total of 167 Surgical Council on Resident Education and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71.3% and 67.9% of multiple choice and 47.9% and 66.1% of open-ended questions for Surgical Council on Resident Education and Data-B, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained nonobvious insights. Common reasons for incorrect responses included inaccurate information in a complex question (n = 16, 36.4%), inaccurate information in a fact-based question (n = 11, 25.0%), and accurate information with circumstantial discrepancy (n = 6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of questions answered incorrectly on the first query; the response accuracy changed for 6/16 (37.5%) questions. CONCLUSION Consistent with findings in other academic and professional domains, we demonstrate near or above human-level performance of ChatGPT on surgical knowledge questions from 2 widely used question banks. 
ChatGPT performed better on multiple-choice than on open-ended questions, prompting questions about its potential for clinical application. Unique to this study, we demonstrate inconsistency in ChatGPT responses on repeat queries. This finding warrants future consideration, including efforts to train large language models to provide the safe and consistent responses required for clinical application. Despite near or above human-level performance on question banks, and given these observations, it remains unclear whether large language models such as ChatGPT can safely assist clinicians in providing care.
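The repeat-query stability analysis amounts to measuring how often the selected answer stays the same between runs. A sketch with invented toy data (the study re-queried only questions answered incorrectly on the first pass):

```python
def answer_stability(first: list[str], repeat: list[str]) -> float:
    """Fraction of questions whose selected answer stayed the same on repeat query."""
    same = sum(a == b for a, b in zip(first, repeat))
    return same / len(first)

# Toy answer letters for five re-queried questions: 3 of 5 are unchanged.
first_run = ["B", "C", "A", "D", "B"]
second_run = ["B", "A", "A", "D", "C"]
print(answer_stability(first_run, second_run))
```

A low stability score flags exactly the inconsistency the abstract warns about: a model whose answer changes between identical queries cannot yet be trusted for clinical use.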
Affiliation(s)
- Brendin R Beaulieu-Jones
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA. https://twitter.com/bratogram
- Sahaj Shah
- Geisinger Commonwealth School of Medicine, Scranton, PA
- Jayson S Marwaha
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shuo-Lun Lai
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Gabriel A Brat
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
8
Klein E, Kinsella M, Stevens I, Fried-Oken M. Ethical issues raised by incorporating personalized language models into brain-computer interface communication technologies: a qualitative study of individuals with neurological disease. Disabil Rehabil Assist Technol 2024; 19:1041-1051. [PMID: 36403143 PMCID: PMC10351684 DOI: 10.1080/17483107.2022.2146217] [Received: 01/11/2022] [Revised: 09/01/2022] [Accepted: 11/07/2022]
Abstract
PURPOSE To examine the views of individuals with neurodegenerative diseases about ethical issues related to incorporating personalized language models into brain-computer interface (BCI) communication technologies. METHODS Fifteen semi-structured interviews and 51 online free response surveys were completed with individuals diagnosed with neurodegenerative disease that could lead to loss of speech and motor skills. Each participant responded to questions after six hypothetical ethics vignettes were presented that address the possibility of building language models with personal words and phrases in BCI communication technologies. Data were analyzed with consensus coding, using modified grounded theory. RESULTS Four themes were identified. (1) The experience of a neurodegenerative disease shapes preferences for personalized language models. (2) An individual's identity will be affected by the ability to personalize the language model. (3) The motivation for personalization is tied to how relationships can be helped or harmed. (4) Privacy is important to people who may need BCI communication technologies. Responses suggest that the inclusion of personal lexica raises ethical issues. Stakeholders want their values to be considered during development of BCI communication technologies. CONCLUSIONS With the rapid development of BCI communication technologies, it is critical to incorporate feedback from individuals regarding their ethical concerns about the storage and use of personalized language models. 
Stakeholder values and preferences about disability, privacy, identity, and relationships should drive design, innovation, and implementation. IMPLICATIONS FOR REHABILITATION Individuals with neurodegenerative diseases are important stakeholders to consider in the development of natural language processing within brain-computer interface (BCI) communication technologies. The incorporation of personalized language models raises issues related to disability, identity, relationships, and privacy. People who may one day rely on BCI communication technologies care not just about the usability of communication technology but about technology that supports their values and priorities. Qualitative ethics-focused research is a valuable tool for exploring stakeholder perspectives on new capabilities of BCI communication technologies, such as the storage and use of personalized language models.
Affiliation(s)
- Eran Klein
- Department of Neurology, Oregon Health & Science University, Portland, OR, USA
- Michelle Kinsella
- Institute on Development and Disability, Oregon Health & Science University, Portland, OR, USA
- Ian Stevens
- Department of Neurosurgery, Oregon Health & Science University, Portland, OR, USA
- Melanie Fried-Oken
- Department of Neurology, Oregon Health & Science University, Portland, OR, USA
- Institute on Development and Disability, Oregon Health & Science University, Portland, OR, USA
9
Kauf C, Tuckute G, Levy R, Andreas J, Fedorenko E. Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network. Neurobiol Lang (Camb) 2024; 5:7-42. [PMID: 38645614 PMCID: PMC11025651 DOI: 10.1162/nol_a_00116] [Received: 12/15/2022] [Accepted: 07/11/2023]
Abstract
Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand which aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences' word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words), rather than the sentence's syntactic form (conveyed via word order or function words), is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, the results are robust to whether the mapping model is trained on intact or perturbed stimuli and to whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
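The "mapping model" in analyses like this is typically a regularized linear regression from ANN sentence embeddings to voxel responses, scored by held-out correlation. A sketch with synthetic data and arbitrary dimensions (the study's actual pipeline, stimuli, and regularization choices differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_dims, n_voxels = 400, 100, 50, 20

X = rng.normal(size=(n_train + n_test, n_dims))        # ANN sentence embeddings
W_true = rng.normal(size=(n_dims, n_voxels))
Y = X @ W_true + 0.1 * rng.normal(size=(n_train + n_test, n_voxels))  # fMRI proxy

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge solution W = (X'X + lam*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

W = ridge_fit(X[:n_train], Y[:n_train])
pred = X[n_train:] @ W

# Brain predictivity: mean per-voxel correlation between predicted and
# observed held-out responses.
corrs = [np.corrcoef(pred[:, v], Y[n_train:, v])[0, 1] for v in range(n_voxels)]
print(round(float(np.mean(corrs)), 3))
```

Perturbation analyses then re-extract X from scrambled or word-subset stimuli and ask how much this held-out correlation drops, which is how lexical-semantic versus syntactic contributions are separated.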
Affiliation(s)
- Carina Kauf
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Roger Levy
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jacob Andreas
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
10
Antonello R, Huth A. Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data. Neurobiol Lang (Camb) 2024; 5:64-79. [PMID: 38645616 PMCID: PMC11025645 DOI: 10.1162/nol_a_00087] [Received: 02/28/2022] [Accepted: 10/26/2022]
Abstract
Many recent studies have shown that representations drawn from neural network language models are extremely effective at predicting brain responses to natural language. But why do these models work so well? One proposed explanation is that language models and brains are similar because they have the same objective: to predict upcoming words before they are perceived. This explanation is attractive because it lends support to the popular theory of predictive coding. We provide several analyses that cast doubt on this claim. First, we show that the ability to predict future words does not uniquely (or even best) explain why some representations are a better match to the brain than others. Second, we show that within a language model, representations that are best at predicting future words are strictly worse brain models than other representations. Finally, we argue in favor of an alternative explanation for the success of language models in neuroscience: These models are effective at predicting brain responses because they generally capture a wide variety of linguistic phenomena.
Affiliation(s)
- Richard Antonello
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Alexander Huth
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
11
Noda M, Ueno T, Koshu R, Takaso Y, Shimada MD, Saito C, Sugimoto H, Fushiki H, Ito M, Nomura A, Yoshizaki T. Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study. JMIR Med Educ 2024; 10:e57054. [PMID: 38546736 PMCID: PMC11009855 DOI: 10.2196/57054] [Received: 02/03/2024] [Revised: 02/22/2024] [Accepted: 03/09/2024]
Abstract
BACKGROUND Artificial intelligence models can learn from medical literature and clinical cases and generate answers that rival human experts. However, challenges remain in the analysis of complex data containing images and diagrams. OBJECTIVE This study aims to assess the answering capabilities and accuracy of ChatGPT-4 Vision (GPT-4V) on a set of 100 questions, including image-based questions, from the 2023 otolaryngology board certification examination. METHODS Answers to 100 questions from the 2023 otolaryngology board certification examination, including image-based questions, were generated using GPT-4V. The accuracy rate was evaluated using different prompts, and the effects of image presence, the clinical area of the questions, and variations in answer content were examined. RESULTS The accuracy rate for text-only input was, on average, 24.7% but improved to 47.3% with the addition of English translation and prompts (P<.001). The average nonresponse rate for text-only input was 46.3%; this decreased to 2.7% with the addition of English translation and prompts (P<.001). The accuracy rate was lower for image-based questions than for text-only questions across all types of input, with a relatively high nonresponse rate. General questions and questions from the fields of head and neck allergies and nasal allergies had relatively high accuracy rates, which increased with the addition of translation and prompts. In terms of content, questions related to anatomy had the highest accuracy rate, and for all content types, the addition of translation and prompts increased accuracy. For image-based questions, the average correct answer rate was 30.4% with text-only input and 41.3% with text-plus-image input (P=.02). CONCLUSIONS Examining artificial intelligence's answering capabilities on the otolaryngology board certification examination improves our understanding of its potential and limitations in this field. 
Although accuracy improved with the addition of translation and prompts, it remained lower for image-based questions than for text-based questions, suggesting room for improvement in GPT-4V at this stage. Furthermore, text-plus-image input yielded a higher rate of correct answers on image-based questions than text-only input. Our findings point to the usefulness and potential of GPT-4V in medicine; however, methods for its safe use require further consideration.
Affiliation(s)
- Masao Noda
- Department of Otolaryngology and Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan
- Department of Otolaryngology and Head and Neck Surgery, Kanazawa University, Kanazawa, Japan
- Takayoshi Ueno
- Department of Otolaryngology and Head and Neck Surgery, Kanazawa University, Kanazawa, Japan
- Ryota Koshu
- Department of Otolaryngology and Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan
- Yuji Takaso
- Department of Otolaryngology and Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan
- Department of Otolaryngology and Head and Neck Surgery, Kanazawa University, Kanazawa, Japan
- Mari Dias Shimada
- Department of Otolaryngology and Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan
- Chizu Saito
- Department of Otolaryngology and Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan
- Hisashi Sugimoto
- Department of Otolaryngology and Head and Neck Surgery, Kanazawa University, Kanazawa, Japan
- Hiroaki Fushiki
- Department of Otolaryngology, Mejiro University Ear Institute Clinic, Saitama, Japan
- Makoto Ito
- Department of Otolaryngology and Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan
- Akihiro Nomura
- College of Transdisciplinary Sciences for Innovation, Kanazawa University, Kanazawa, Japan
- Tomokazu Yoshizaki
- Department of Otolaryngology and Head and Neck Surgery, Kanazawa University, Kanazawa, Japan
12
Elyoseph Z, Levkovich I. Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study. JMIR Ment Health 2024; 11:e53043. [PMID: 38533615 PMCID: PMC11004608 DOI: 10.2196/53043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024] [Indexed: 03/28/2024] Open
Abstract
Background The current paradigm in mental health care focuses on clinical recovery and symptom remission. This model's efficacy is influenced by therapist trust in patient recovery potential and the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms where the possibility of recovery is a matter of debate. As artificial intelligence (AI) becomes integrated into the health care field, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia. Objective This study aimed to evaluate the ability of large language models (LLMs), in comparison to mental health professionals, to assess the prognosis of schizophrenia with and without professional treatment and the long-term positive and negative outcomes. Methods Vignettes were input into the interfaces of 4 AI platforms (ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude) and assessed 10 times by each. A total of 80 evaluations were collected and benchmarked against existing norms to analyze what mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public think about schizophrenia prognosis with and without professional treatment and the positive and negative long-term outcomes of schizophrenia interventions. Results For the prognosis of schizophrenia with professional treatment, ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs predicted that, without professional treatment, schizophrenia would remain static or worsen. For long-term outcomes, ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more pessimistic than Bard and ChatGPT-4.
Conclusions The finding that 3 out of the 4 LLMs aligned closely with the predictions of mental health professionals under the "with treatment" condition demonstrates the potential of this technology in providing professional clinical prognoses. The pessimistic assessment by ChatGPT-3.5 is a disturbing finding, since it may reduce the motivation of patients to start or persist with treatment for schizophrenia. Overall, although LLMs hold promise in augmenting health care, their application necessitates rigorous validation and a harmonious blend with human expertise.
Affiliation(s)
- Zohar Elyoseph
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College, Kiryat Tiv'on, Israel
13
Nakao T, Miki S, Nakamura Y, Kikuchi T, Nomura Y, Hanaoka S, Yoshikawa T, Abe O. Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study. JMIR Med Educ 2024; 10:e54393. [PMID: 38470459 DOI: 10.2196/54393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 12/26/2023] [Accepted: 02/16/2024] [Indexed: 03/13/2024]
Abstract
BACKGROUND Previous research applying large language models (LLMs) to medicine was focused on text-based information. Recently, multimodal variants of LLMs acquired the capability of recognizing images. OBJECTIVE We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination. METHODS We focused on 108 questions that had 1 or more images as part of a question and presented GPT-4V with the same questions under two conditions: (1) with both the question text and associated images and (2) with the question text only. We then compared the difference in accuracy between the 2 conditions using the exact McNemar test. RESULTS Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with and those without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively. CONCLUSIONS The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination.
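The exact McNemar test used in this study compares paired correct/incorrect outcomes using only the discordant pairs (questions answered correctly under one condition but not the other). A minimal stdlib-only sketch of the two-sided exact version; the function name and example counts are illustrative, not taken from the paper:

```python
from math import comb

def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test.

    b and c are the discordant-pair counts. Under the null
    hypothesis their split follows Binomial(b + c, 0.5), so the
    p-value is twice the smaller binomial tail, capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Perfectly balanced discordant pairs give no evidence of a difference.
print(exact_mcnemar_p(5, 5))  # 1.0
```

Only the off-diagonal counts matter; questions answered the same way under both conditions carry no information about which condition is better.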
Affiliation(s)
- Takahiro Nakao
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Soichiro Miki
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Yuta Nakamura
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Tomohiro Kikuchi
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Department of Radiology, School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan
- Yukihiro Nomura
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Center for Frontier Medical Engineering, Chiba University, Inage-ku, Chiba, Japan
- Shouhei Hanaoka
- Department of Radiology, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Takeharu Yoshikawa
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Osamu Abe
- Department of Radiology, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
14
Davis J, Van Bulck L, Durieux BN, Lindvall C. The Temperature Feature of ChatGPT: Modifying Creativity for Clinical Research. JMIR Hum Factors 2024; 11:e53559. [PMID: 38457221 PMCID: PMC10960206 DOI: 10.2196/53559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 12/11/2023] [Accepted: 01/24/2024] [Indexed: 03/09/2024] Open
Abstract
More clinicians and researchers are exploring uses for large language model chatbots, such as ChatGPT, for research, dissemination, and educational purposes. Therefore, it becomes increasingly relevant to consider the full potential of this tool, including the special features that are currently available through the application programming interface. One of these features is a variable called temperature, which changes the degree to which randomness is involved in the model's generated output. This is of particular interest to clinicians and researchers. By lowering this variable, one can generate more consistent outputs; by increasing it, one can receive more creative responses. For clinicians and researchers who are exploring these tools for a variety of tasks, the ability to tailor outputs to be less creative may be beneficial for work that demands consistency. Additionally, access to more creative text generation may enable scientific authors to describe their research in more general language and potentially connect with a broader public through social media. In this viewpoint, we present the temperature feature, discuss potential uses, and provide some examples.
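Under the hood, temperature rescales the model's logits before the softmax that turns them into token probabilities: low values concentrate probability mass on the top token (more consistent output), high values flatten the distribution (more varied output). A minimal illustration with made-up logits:

```python
from math import exp

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then normalize to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, temperature=0.2)  # nearly deterministic
hot = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
print(max(cold) > max(hot))  # True
```

In the OpenAI API this behavior is exposed through the `temperature` request parameter; as the authors note, lower values suit tasks that demand consistency, while higher values yield more creative text.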
Affiliation(s)
- Joshua Davis
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, United States
- Albany Medical College, Albany, NY, United States
- Liesbet Van Bulck
- KU Leuven Department of Public Health and Primary Care, KU Leuven-University of Leuven, Leuven, Belgium
- Research Foundation Flanders (FWO), Brussels, Belgium
- Brigitte N Durieux
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, United States
- Charlotta Lindvall
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, United States
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Harvard Medical School, Harvard University, Boston, MA, United States
15
Reynolds K, Tejasvi T. Potential Use of ChatGPT in Responding to Patient Questions and Creating Patient Resources. JMIR Dermatol 2024; 7:e48451. [PMID: 38446541 PMCID: PMC10955382 DOI: 10.2196/48451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Revised: 08/11/2023] [Accepted: 02/22/2024] [Indexed: 03/07/2024] Open
Abstract
ChatGPT (OpenAI) is an artificial intelligence-based free natural language processing model that generates complex responses to user-generated prompts. The advent of this tool comes at a time when physician burnout is at an all-time high, which is attributed at least in part to time spent outside of the patient encounter within the electronic medical record (documenting the encounter, responding to patient messages, etc). Although ChatGPT is not specifically designed to provide medical information, it can generate preliminary responses to patients' questions about their medical conditions and can rapidly create educational patient resources, which inevitably require rigorous editing and fact-checking on the part of the health care provider to ensure accuracy. In this way, this assistive technology has the potential to not only enhance a physician's efficiency and work-life balance but also enrich the patient-physician relationship and ultimately improve patient outcomes.
Affiliation(s)
- Kelly Reynolds
- Department of Dermatology, University of Michigan, Ann Arbor, MI, United States
- Trilokraj Tejasvi
- Department of Dermatology, University of Michigan, Ann Arbor, MI, United States
16
Roster K, Kann RB, Farabi B, Gronbeck C, Brownstone N, Lipner SR. Readability and Health Literacy Scores for ChatGPT-Generated Dermatology Public Education Materials: Cross-Sectional Analysis of Sunscreen and Melanoma Questions. JMIR Dermatol 2024; 7:e50163. [PMID: 38446502 PMCID: PMC10955394 DOI: 10.2196/50163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 01/02/2024] [Accepted: 02/06/2024] [Indexed: 03/07/2024] Open
Affiliation(s)
- Katie Roster
- New York Medical College, New York, NY, United States
- Banu Farabi
- Dermatology Department, NYC Health + Hospital/Metropolitan, New York, NY, United States
- Christian Gronbeck
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
- Nicholas Brownstone
- Department of Dermatology, Temple University Hospital, Philadelphia, PA, United States
- Shari R Lipner
- Department of Dermatology, Weill Cornell Medicine, New York, NY, United States
17
Rodriguez DV, Lawrence K, Gonzalez J, Brandfield-Harvey B, Xu L, Tasneem S, Levine DL, Mann D. Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study. JMIR Hum Factors 2024; 11:e52885. [PMID: 38446539 PMCID: PMC10955400 DOI: 10.2196/52885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Revised: 11/27/2023] [Accepted: 12/15/2023] [Indexed: 03/07/2024] Open
Abstract
BACKGROUND Generative artificial intelligence has the potential to revolutionize health technology product development by improving coding quality, efficiency, documentation, quality assessment and review, and troubleshooting. OBJECTIVE This paper explores the application of a commercially available generative artificial intelligence tool (ChatGPT) to the development of a digital health behavior change intervention designed to support patient engagement in a commercial digital diabetes prevention program. METHODS We examined the capacity, advantages, and limitations of ChatGPT to support digital product idea conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. In total, 11 evaluators, each with at least 10 years of experience in fields of study ranging from medicine and implementation science to computer science, participated in the output review process (ChatGPT vs human-generated output). All had familiarity or prior exposure to the original personalized automatic messaging system intervention. The evaluators rated the ChatGPT-produced outputs in terms of understandability, usability, novelty, relevance, completeness, and efficiency. RESULTS Most metrics received positive scores. We identified that ChatGPT can (1) support developers to achieve high-quality products faster and (2) facilitate nontechnical communication and system understanding between technical and nontechnical team members around the development goal of rapid and easy-to-build computational solutions for medical technologies. CONCLUSIONS ChatGPT can serve as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation. TRIAL REGISTRATION ClinicalTrials.gov NCT04049500; https://clinicaltrials.gov/ct2/show/NCT04049500.
Affiliation(s)
- Danissa V Rodriguez
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Katharine Lawrence
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
- Javier Gonzalez
- Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
- Beatrix Brandfield-Harvey
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Lynn Xu
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Sumaiya Tasneem
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Defne L Levine
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Devin Mann
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
18
Goldstein JA, Chao J, Grossman S, Stamos A, Tomz M. How persuasive is AI-generated propaganda? PNAS Nexus 2024; 3:pgae034. [PMID: 38380055 PMCID: PMC10878360 DOI: 10.1093/pnasnexus/pgae034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Accepted: 01/03/2024] [Indexed: 02/22/2024]
Abstract
Can large language models, a form of artificial intelligence (AI), generate persuasive propaganda? We conducted a preregistered survey experiment of US respondents to investigate the persuasiveness of news articles written by foreign propagandists compared to content generated by GPT-3 davinci (a large language model). We found that GPT-3 can create highly persuasive text as measured by participants' agreement with propaganda theses. We further investigated whether a person fluent in English could improve propaganda persuasiveness. Editing the prompt fed to GPT-3 and/or curating GPT-3's output made GPT-3 even more persuasive, and, under certain conditions, as persuasive as the original propaganda. Our findings suggest that propagandists could use AI to create convincing content with limited effort.
Affiliation(s)
- Josh A Goldstein
- Center for Security and Emerging Technology, Georgetown University, Washington, DC 20001, USA
- Jason Chao
- Stanford Internet Observatory, Stanford University, Stanford, CA 94305, USA
- Shelby Grossman
- Stanford Internet Observatory, Stanford University, Stanford, CA 94305, USA
- Alex Stamos
- Stanford Internet Observatory, Stanford University, Stanford, CA 94305, USA
- Michael Tomz
- Department of Political Science and Stanford Institute for Economic Policy Research, Stanford University, Stanford, CA 94305, USA
19
Ligeti B, Szepesi-Nagy I, Bodnár B, Ligeti-Nagy N, Juhász J. ProkBERT family: genomic language models for microbiome applications. Front Microbiol 2024; 14:1331233. [PMID: 38282738 PMCID: PMC10810988 DOI: 10.3389/fmicb.2023.1331233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 12/11/2023] [Indexed: 01/30/2024] Open
Abstract
Background In the evolving landscape of microbiology and microbiome analysis, the integration of machine learning is crucial for understanding complex microbial interactions, and predicting and recognizing novel functionalities within extensive datasets. However, the effectiveness of these methods in microbiology faces challenges due to the complex and heterogeneous nature of microbial data, further complicated by low signal-to-noise ratios, context-dependency, and a significant shortage of appropriately labeled datasets. This study introduces the ProkBERT model family, a collection of large language models, designed for genomic tasks. It provides a generalizable sequence representation for nucleotide sequences, learned from unlabeled genome data. This approach helps overcome the above-mentioned limitations in the field, thereby improving our understanding of microbial ecosystems and their impact on health and disease. Methods ProkBERT models are based on transfer learning and self-supervised methodologies, enabling them to use the abundant yet complex microbial data effectively. The introduction of the novel Local Context-Aware (LCA) tokenization technique marks a significant advancement, allowing ProkBERT to overcome the contextual limitations of traditional transformer models. This methodology not only retains rich local context but also demonstrates remarkable adaptability across various bioinformatics tasks. Results In practical applications such as promoter prediction and phage identification, the ProkBERT models show superior performance. For promoter prediction tasks, the top-performing model achieved a Matthews Correlation Coefficient (MCC) of 0.74 for E. coli and 0.62 in mixed-species contexts. In phage identification, ProkBERT models consistently outperformed established tools like VirSorter2 and DeepVirFinder, achieving an MCC of 0.85. 
These results underscore the models' exceptional accuracy and generalizability in both supervised and unsupervised tasks. Conclusions The ProkBERT model family is a compact yet powerful tool in the field of microbiology and bioinformatics. Its capacity for rapid, accurate analyses and its adaptability across a spectrum of tasks marks a significant advancement in machine learning applications in microbiology. The models are available on GitHub (https://github.com/nbrg-ppcu/prokbert) and HuggingFace (https://huggingface.co/nerualbioinfo) providing an accessible tool for the community.
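The Matthews Correlation Coefficient reported in these benchmarks summarizes a binary confusion matrix as a single value in [-1, 1], robust to class imbalance. A small reference implementation; the example counts are illustrative, not from the paper:

```python
from math import sqrt

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom

print(mcc(tp=50, tn=50, fp=0, fn=0))  # 1.0 for perfect prediction
```

An MCC of 1 indicates perfect agreement, 0 no better than chance, and -1 total disagreement, which makes the reported 0.74-0.85 range interpretable across tasks with different class balances.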
Affiliation(s)
- Balázs Ligeti
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- István Szepesi-Nagy
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Babett Bodnár
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Noémi Ligeti-Nagy
- Language Technology Research Group, HUN-REN Hungarian Research Centre for Linguistics, Budapest, Hungary
- János Juhász
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Institute of Medical Microbiology, Semmelweis University, Budapest, Hungary
20
Dergaa I, Fekih-Romdhane F, Hallit S, Loch AA, Glenn JM, Fessi MS, Ben Aissa M, Souissi N, Guelmami N, Swed S, El Omri A, Bragazzi NL, Ben Saad H. ChatGPT is not ready yet for use in providing mental health assessment and interventions. Front Psychiatry 2024; 14:1277756. [PMID: 38239905 PMCID: PMC10794665 DOI: 10.3389/fpsyt.2023.1277756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Accepted: 11/17/2023] [Indexed: 01/22/2024] Open
Abstract
Background Psychiatry is a specialized field of medicine that focuses on the diagnosis, treatment, and prevention of mental health disorders. With advancements in technology and the rise of artificial intelligence (AI), there has been a growing interest in exploring the potential of AI language model systems, such as Chat Generative Pre-training Transformer (ChatGPT), to assist in the field of psychiatry. Objective Our study aimed to evaluate the effectiveness, reliability, and safety of ChatGPT in assisting patients with mental health problems, and to assess its potential as a collaborative tool for mental health professionals through a simulated interaction with three distinct imaginary patients. Methods Three imaginary patient scenarios (cases A, B, and C) were created, representing different mental health problems. All three patients present with, and seek to eliminate, the same chief complaint (i.e., difficulty falling asleep and waking up frequently during the night in the last 2 weeks). ChatGPT was engaged as a virtual psychiatric assistant to provide responses and treatment recommendations. Results In case A, the recommendations were relatively appropriate (albeit non-specific), and could potentially be beneficial for both users and clinicians. However, as the complexity of the clinical cases increased (cases B and C), the information and recommendations generated by ChatGPT became inappropriate, even dangerous; and the limitations of the program became more glaring. The main strengths of ChatGPT lie in its ability to provide quick responses to user queries and to simulate empathy. One notable limitation is ChatGPT's inability to interact with users to collect further information relevant to the diagnosis and management of a patient's clinical condition. Another serious limitation is ChatGPT's inability to use critical thinking and clinical judgment to guide patient management.
Conclusion As of July 2023, ChatGPT failed to give simple medical advice in certain clinical scenarios. This suggests that the quality of ChatGPT-generated content is still far from sufficient to guide users and professionals in providing accurate mental health information. It therefore remains premature to draw conclusions about the usefulness and safety of ChatGPT in mental health practice.
Affiliation(s)
- Ismail Dergaa
- Primary Health Care Corporation (PHCC), Doha, Qatar
- Research Unit Physical Activity, Sport, and Health, UR18JS01, National Observatory of Sport, Tunis, Tunisia
- High Institute of Sport and Physical Education, University of Sfax, Sfax, Tunisia
- Feten Fekih-Romdhane
- The Tunisian Center of Early Intervention in Psychosis, Department of Psychiatry “Ibn Omrane”, Razi Hospital, Manouba, Tunisia
- Faculty of Medicine of Tunis, Tunis El Manar University, Tunis, Tunisia
- Souheil Hallit
- School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon
- Psychology Department, College of Humanities, Effat University, Jeddah, Saudi Arabia
- Applied Science Research Center, Applied Science Private University, Amman, Jordan
- Alexandre Andrade Loch
- Laboratorio de Neurociencias (LIM 27), Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Instituto de Psiquiatria, Universidade de Sao Paulo, São Paulo, Brazil
- Instituto Nacional de Biomarcadores em Neuropsiquiatria (INBION), Conselho Nacional de Desenvolvimento Científico e Tecnológico, São Paulo, Brazil
- Mohamed Ben Aissa
- Department of Human and Social Sciences, Higher Institute of Sport and Physical Education of Kef, University of Jendouba, Jendouba, Tunisia
- Nizar Souissi
- Research Unit Physical Activity, Sport, and Health, UR18JS01, National Observatory of Sport, Tunis, Tunisia
- Noomen Guelmami
- Department of Health Sciences (DISSAL), Postgraduate School of Public Health, University of Genoa, Genoa, Italy
- Sarya Swed
- Faculty of Medicine, Aleppo University, Aleppo, Syria
- Abdelfatteh El Omri
- Surgical Research Section, Department of Surgery, Hamad Medical Corporation, Doha, Qatar
- Nicola Luigi Bragazzi
- Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, ON, Canada
- Helmi Ben Saad
- Service of Physiology and Functional Explorations, Farhat HACHED Hospital, University of Sousse, Sousse, Tunisia
- Heart Failure (LR12SP09) Research Laboratory, Farhat HACHED Hospital, University of Sousse, Sousse, Tunisia
21
Bajorath J. Chemical language models for molecular design. Mol Inform 2024; 43:e202300288. [PMID: 38010610 DOI: 10.1002/minf.202300288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
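The tokenization step CLMs depend on can be illustrated for SMILES strings, the most common chemical "language". The regex below is a deliberately pared-down sketch for illustration, not an actual CLM vocabulary:

```python
import re

# Multi-character tokens (bracket atoms, Cl/Br) must appear before
# single-letter alternatives so the regex matches them first.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms such as [nH] or [O-]
    r"|Br|Cl"              # two-letter organic-subset atoms
    r"|[BCNOPSFIbcnops]"   # one-letter atoms (lowercase = aromatic)
    r"|[=#$/\\().+\-]"     # bonds, branches, charges, dots
    r"|%\d{2}|\d"          # ring-closure digits
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens

print(tokenize_smiles("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

Once molecules are tokenized this way, the resulting sequences can be fed to the same RNN or transformer machinery used for natural language.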
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
- Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
22
Cheng SL, Tsai SJ, Bai YM, Ko CH, Hsu CW, Yang FC, Tsai CK, Tu YK, Yang SN, Tseng PT, Hsu TW, Liang CS, Su KP. Comparisons of Quality, Correctness, and Similarity Between ChatGPT-Generated and Human-Written Abstracts for Basic Research: Cross-Sectional Study. J Med Internet Res 2023; 25:e51229. [PMID: 38145486 PMCID: PMC10760418 DOI: 10.2196/51229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 10/17/2023] [Accepted: 11/20/2023] [Indexed: 12/26/2023] Open
Abstract
BACKGROUND ChatGPT may act as a research assistant to help organize the direction of thinking and summarize research findings. However, few studies have examined the quality, similarity (abstracts being similar to the original one), and accuracy of the abstracts generated by ChatGPT when researchers provide full-text basic research papers. OBJECTIVE We aimed to assess the applicability of an artificial intelligence (AI) model in generating abstracts for basic preclinical research. METHODS We selected 30 basic research papers from Nature, Genome Biology, and Biological Psychiatry. Excluding abstracts, we inputted the full text into ChatPDF, an application of a language model based on ChatGPT, and we prompted it to generate abstracts with the same style as used in the original papers. A total of 8 experts were invited to evaluate the quality of these abstracts (based on a Likert scale of 0-10) and identify which abstracts were generated by ChatPDF, using a blind approach. These abstracts were also evaluated for their similarity to the original abstracts and the accuracy of the AI content. RESULTS The quality of ChatGPT-generated abstracts was lower than that of the actual abstracts (10-point Likert scale: mean 4.72, SD 2.09 vs mean 8.09, SD 1.03; P<.001). The difference in quality was significant in the unstructured format (mean difference -4.33; 95% CI -4.79 to -3.86; P<.001) but minimal in the 4-subheading structured format (mean difference -2.33; 95% CI -2.79 to -1.86). Among the 30 ChatGPT-generated abstracts, 3 showed wrong conclusions, and 10 were identified as AI content. The mean percentage of similarity between the original and the generated abstracts was not high (2.10%-4.40%). The blinded reviewers achieved a 93% (224/240) accuracy rate in guessing which abstracts were written using ChatGPT. CONCLUSIONS Using ChatGPT to generate a scientific abstract may not lead to issues of similarity when using real full texts written by humans. 
However, the quality of the ChatGPT-generated abstracts was suboptimal, and their accuracy was not 100%.
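The reported similarity scores (2.10%-4.40%) quantify textual overlap between each generated abstract and the original. The paper does not state which tool produced them; as a rough illustration only, a sequence-matching ratio from Python's standard library yields a comparable percentage:

```python
from difflib import SequenceMatcher

def similarity_pct(original: str, generated: str) -> float:
    """Percentage of matching text between two abstracts (illustrative;
    the study likely used dedicated plagiarism-detection software)."""
    return 100 * SequenceMatcher(None, original.lower(), generated.lower()).ratio()

orig = "The quality of generated abstracts was lower than the originals."
gen = "Generated abstracts scored below human-written ones on quality."
print(f"similarity: {similarity_pct(orig, gen):.1f}%")
```

A low ratio, as in the study, indicates the model paraphrased rather than copied the source text.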
Affiliation(s)
- Shu-Li Cheng: Department of Nursing, Mackay Medical College, Taipei, Taiwan
- Shih-Jen Tsai: Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan; Division of Psychiatry, School of Medicine, National Yang-Ming University, Taipei, Taiwan
- Ya-Mei Bai: Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan; Division of Psychiatry, School of Medicine, National Yang-Ming University, Taipei, Taiwan
- Chih-Hung Ko: Department of Psychiatry, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan; Department of Psychiatry, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Department of Psychiatry, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Chih-Wei Hsu: Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Fu-Chi Yang: Department of Neurology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Chia-Kuang Tsai: Department of Neurology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Yu-Kang Tu: Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan; Department of Dentistry, National Taiwan University Hospital, Taipei, Taiwan
- Szu-Nian Yang: Department of Psychiatry, Tri-Service Hospital, Beitou Branch, Taipei, Taiwan; Department of Psychiatry, Armed Forces Taoyuan General Hospital, Taoyuan, Taiwan; Graduate Institute of Health and Welfare Policy, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ping-Tao Tseng: Institute of Biomedical Sciences, Institute of Precision Medicine, National Sun Yat-sen University, Kaohsiung, Taiwan; Department of Psychology, College of Medical and Health Science, Asia University, Taichung, Taiwan; Prospect Clinic for Otorhinolaryngology and Neurology, Kaohsiung, Taiwan
- Tien-Wei Hsu: Department of Psychiatry, E-Da Dachang Hospital, I-Shou University, Kaohsiung, Taiwan; Department of Psychiatry, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan
- Chih-Sung Liang: Department of Psychiatry, Tri-Service Hospital, Beitou Branch, Taipei, Taiwan; Department of Psychiatry, National Defense Medical Center, Taipei, Taiwan
- Kuan-Pin Su: College of Medicine, China Medical University, Taichung, Taiwan; Mind-Body Interface Laboratory, China Medical University and Hospital, Taichung, Taiwan; An-Nan Hospital, China Medical University, Tainan, Taiwan
23
Watari T, Takagi S, Sakaguchi K, Nishizaki Y, Shimizu T, Yamamoto Y, Tokuda Y. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR Med Educ 2023; 9:e52202. [PMID: 38055323 PMCID: PMC10733815 DOI: 10.2196/52202]
Abstract
BACKGROUND The reliability of GPT-4, a state-of-the-art expansive language model specializing in clinical reasoning and medical knowledge, remains largely unverified across non-English languages. OBJECTIVE This study aims to compare fundamental clinical competencies between Japanese residents and GPT-4 by using the General Medicine In-Training Examination (GM-ITE). METHODS We used the GPT-4 model provided by OpenAI and the GM-ITE examination questions for the years 2020, 2021, and 2022 to conduct a comparative analysis. This analysis focused on evaluating the performance of individuals who were concluding their second year of residency in comparison to that of GPT-4. Given the current abilities of GPT-4, our study included only single-choice exam questions, excluding those involving audio, video, or image data. The assessment included 4 categories: general theory (professionalism and medical interviewing), symptomatology and clinical reasoning, physical examinations and clinical procedures, and specific diseases. Additionally, we categorized the questions into 7 specialty fields and 3 levels of difficulty, which were determined based on residents' correct response rates. RESULTS Upon examination of 137 GM-ITE questions in Japanese, GPT-4 scores were significantly higher than the mean scores of residents (residents: 55.8%, GPT-4: 70.1%; P<.001). In terms of specific disciplines, GPT-4 scored 23.5 points higher in the "specific diseases," 30.9 points higher in "obstetrics and gynecology," and 26.1 points higher in "internal medicine." In contrast, GPT-4 scores in "medical interviewing and professionalism," "general practice," and "psychiatry" were lower than those of the residents, although this discrepancy was not statistically significant. Upon analyzing scores based on question difficulty, GPT-4 scores were 17.2 points lower for easy problems (P=.007) but were 25.4 and 24.4 points higher for normal and difficult problems, respectively (P<.001). 
In year-on-year comparisons, GPT-4 scores were 21.7 and 21.5 points higher in the 2020 (P=.01) and 2022 (P=.003) examinations, respectively, but only 3.5 points higher in the 2021 examinations (no significant difference). CONCLUSIONS In the Japanese language, GPT-4 also outperformed the average medical residents in the GM-ITE test, originally designed for them. Specifically, GPT-4 demonstrated a tendency to score higher on difficult questions with low resident correct response rates and those demanding a more comprehensive understanding of diseases. However, GPT-4 scored comparatively lower on questions that residents could readily answer, such as those testing attitudes toward patients and professionalism, as well as those necessitating an understanding of context and communication. These findings highlight the strengths and limitations of artificial intelligence applications in medical education and practice.
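The headline comparison (GPT-4 70.1% vs residents' mean 55.8%) is a difference of proportions. The abstract does not name the statistical test, and the residents' effective denominator in the actual study is far larger than 137, so the sketch below (equal toy sample sizes, a two-proportion z-test) is illustrative only and yields a weaker p-value than the published P<.001:

```python
import math

def two_prop_z(p1: float, n1: int, p2: float, n2: int):
    """Two-proportion z-test: returns z and the two-sided p-value
    (normal approximation with a pooled standard error)."""
    x1, x2 = p1 * n1, p2 * n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Toy: treat both GPT-4 and residents as answering the same 137 questions
z, p = two_prop_z(0.701, 137, 0.558, 137)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```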
Affiliation(s)
- Takashi Watari: General Medicine Center, Shimane University Hospital, Izumo, Japan; Department of Medicine, University of Michigan Medical School, Ann Arbor, MI, United States; Medicine Service, VA Ann Arbor Healthcare System, Ann Arbor, MI, United States
- Soshi Takagi: Faculty of Medicine, Shimane University, Izumo, Japan
- Kota Sakaguchi: General Medicine Center, Shimane University Hospital, Izumo, Japan
- Yuji Nishizaki: Division of Medical Education, Juntendo University School of Medicine, Tokyo, Japan
- Taro Shimizu: Department of Diagnostic and Generalist Medicine, Dokkyo Medical University Hospital, Tochigi, Japan
- Yu Yamamoto: Division of General Medicine, Center for Community Medicine, Jichi Medical University, Tochigi, Japan
- Yasuharu Tokuda: Muribushi Okinawa Project for Teaching Hospitals, Okinawa, Japan
24
Semrl N, Feigl S, Taumberger N, Bracic T, Fluhr H, Blockeel C, Kollmann M. AI language models in human reproduction research: exploring ChatGPT's potential to assist academic writing. Hum Reprod 2023; 38:2281-2288. [PMID: 37833847 DOI: 10.1093/humrep/dead207]
Abstract
Artificial intelligence (AI)-driven language models have the potential to serve as an educational tool, facilitate clinical decision-making, and support research and academic writing. The benefits of their use are yet to be evaluated and concerns have been raised regarding the accuracy, transparency, and ethical implications of using this AI technology in academic publishing. At the moment, Chat Generative Pre-trained Transformer (ChatGPT) is one of the most powerful and widely debated AI language models. Here, we discuss its feasibility to answer scientific questions, identify relevant literature, and assist writing in the field of human reproduction. With consideration of the scarcity of data on this topic, we assessed the feasibility of ChatGPT in academic writing, using data from six meta-analyses published in a leading journal of human reproduction. The text generated by ChatGPT was evaluated and compared to the original text by blinded reviewers. While ChatGPT can produce high-quality text and summarize information efficiently, its current ability to interpret data and answer scientific questions is limited, and it cannot be relied upon for a literature search or accurate source citation due to the potential spread of incomplete or false information. We advocate for open discussions within the reproductive medicine research community to explore the advantages and disadvantages of implementing this AI technology. Researchers and reviewers should be informed about AI language models, and we encourage authors to transparently disclose their use.
Affiliation(s)
- N Semrl: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- S Feigl: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- N Taumberger: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- T Bracic: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- H Fluhr: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
- C Blockeel: Centre for Reproductive Medicine, Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium
- M Kollmann: Department of Obstetrics and Gynecology, Medical University of Graz, Graz, Austria
25
Savage T, Wang J, Shieh L. A Large Language Model Screening Tool to Target Patients for Best Practice Alerts: Development and Validation. JMIR Med Inform 2023; 11:e49886. [PMID: 38010803 DOI: 10.2196/49886]
Abstract
BACKGROUND Best Practice Alerts (BPAs) are alert messages to physicians in the electronic health record that are used to encourage appropriate use of health care resources. While these alerts are helpful in both improving care and reducing costs, BPAs are often broadly applied nonselectively across entire patient populations. The development of large language models (LLMs) provides an opportunity to selectively identify patients for BPAs. OBJECTIVE In this paper, we present an example case where an LLM screening tool is used to select patients appropriate for a BPA encouraging the prescription of deep vein thrombosis (DVT) anticoagulation prophylaxis. The artificial intelligence (AI) screening tool was developed to identify patients experiencing acute bleeding and exclude them from receiving a DVT prophylaxis BPA. METHODS Our AI screening tool used a BioMed-RoBERTa (Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach; AllenAI) model to perform classification of physician notes, identifying patients without active bleeding and thus appropriate for a thromboembolism prophylaxis BPA. The BioMed-RoBERTa model was fine-tuned using 500 history and physical notes of patients from the MIMIC-III (Medical Information Mart for Intensive Care) database who were not prescribed anticoagulation. A development set of 300 MIMIC patient notes was used to determine the model's hyperparameters, and a separate test set of 300 patient notes was used to evaluate the screening tool. RESULTS Our MIMIC-III test set population of 300 patients included 72 patients with bleeding (ie, were not appropriate for a DVT prophylaxis BPA) and 228 without bleeding who were appropriate for a DVT prophylaxis BPA. The AI screening tool achieved impressive accuracy with a precision-recall area under the curve of 0.82 (95% CI 0.75-0.89) and a receiver operator curve area under the curve of 0.89 (95% CI 0.84-0.94). 
The screening tool reduced the number of patients who would trigger an alert by 20% (240 instead of 300 alerts) and increased alert applicability by 14.8% (218 [90.8%] positive alerts from 240 total alerts instead of 228 [76%] positive alerts from 300 total alerts), compared to nonselectively sending alerts for all patients. CONCLUSIONS These results show a proof of concept on how language models can be used as a screening tool for BPAs. We provide an example AI screening tool that uses a HIPAA (Health Insurance Portability and Accountability Act)-compliant BioMed-RoBERTa model deployed with minimal computing power. Larger models (eg, Generative Pre-trained Transformers-3, Generative Pre-trained Transformers-4, and Pathways Language Model) will exhibit superior performance but require data use agreements to be HIPAA compliant. We anticipate LLMs to revolutionize quality improvement in hospital medicine.
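The reported ROC area under the curve has a direct probabilistic reading: it is the probability that a randomly chosen positive case (here, a bleeding patient) receives a higher model score than a randomly chosen negative case. A dependency-free sketch with made-up scores (a real evaluation would typically use scikit-learn's metrics):

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    positive/negative pairs where the positive outranks the negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = actively bleeding (exclude from the BPA), 0 = no bleeding
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]  # hypothetical P(bleeding)
print(f"ROC AUC = {roc_auc(labels, scores):.2f}")
```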
Affiliation(s)
- Thomas Savage: Division of Hospital Medicine, Department of Medicine, Stanford University, Palo Alto, CA, United States
- John Wang: Division of Gastroenterology and Hepatology, Department of Medicine, Stanford University, Palo Alto, CA, United States
- Lisa Shieh: Division of Hospital Medicine, Department of Medicine, Stanford University, Palo Alto, CA, United States
26
Mansoor S, Baek M, Juergens D, Watson JL, Baker D. Zero-shot mutation effect prediction on protein stability and function using RoseTTAFold. Protein Sci 2023; 32:e4780. [PMID: 37695922 PMCID: PMC10578109 DOI: 10.1002/pro.4780]
Abstract
Predicting the effects of mutations on protein function and stability is an outstanding challenge. Here, we assess the performance of a variant of RoseTTAFold jointly trained for sequence and structure recovery, RFjoint , for mutation effect prediction. Without any further training, we achieve comparable accuracy in predicting mutation effects for a diverse set of protein families using RFjoint to both another zero-shot model (MSA Transformer) and a model that requires specific training on a particular protein family for mutation effect prediction (DeepSequence). Thus, although the architecture of RFjoint was developed to address the protein design problem of scaffolding functional motifs, RFjoint acquired an understanding of the mutational landscapes of proteins during model training that is equivalent to that of recently developed large protein language models. The ability to simultaneously reason over protein structure and sequence could enable even more precise mutation effect predictions following supervised training on the task. These results suggest that RFjoint has a quite broad understanding of protein sequence-structure landscapes, and can be viewed as a joint model for protein sequence and structure which could be broadly useful for protein modeling.
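Zero-shot mutation effect predictors of this kind are typically benchmarked by rank correlation between model scores and experimental measurements; the abstract does not name the metric, but Spearman's rho is the usual choice in this literature. A stdlib-only sketch with toy values:

```python
def rank(xs):
    """1-based average ranks, with ties sharing the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

predicted = [-1.2, -0.3, -2.5, 0.1]  # hypothetical model log-likelihood scores
measured = [0.4, 0.8, 0.1, 0.9]      # hypothetical experimental fitness
print(f"Spearman rho = {spearman(predicted, measured):.2f}")
```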
Affiliation(s)
- Sanaa Mansoor: Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA; Molecular Engineering Graduate Program, University of Washington, Seattle, WA, USA
- Minkyung Baek: Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA; School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- David Juergens: Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA; Molecular Engineering Graduate Program, University of Washington, Seattle, WA, USA
- Joseph L. Watson: Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
- David Baker: Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
27
Benegas G, Batra SS, Song YS. DNA language models are powerful predictors of genome-wide variant effects. Proc Natl Acad Sci U S A 2023; 120:e2311219120. [PMID: 37883436 PMCID: PMC10622914 DOI: 10.1073/pnas.2311219120]
Abstract
The expanding catalog of genome-wide association studies (GWAS) provides biological insights across a variety of species, but identifying the causal variants behind these associations remains a significant challenge. Experimental validation is both labor-intensive and costly, highlighting the need for accurate, scalable computational methods to predict the effects of genetic variants across the entire genome. Inspired by recent progress in natural language processing, unsupervised pretraining on large protein sequence databases has proven successful in extracting complex information related to proteins. These models showcase their ability to learn variant effects in coding regions using an unsupervised approach. Expanding on this idea, we here introduce the Genomic Pre-trained Network (GPN), a model designed to learn genome-wide variant effects through unsupervised pretraining on genomic DNA sequences. Our model also successfully learns gene structure and DNA motifs without any supervision. To demonstrate its utility, we train GPN on unaligned reference genomes of Arabidopsis thaliana and seven related species within the Brassicales order and evaluate its ability to predict the functional impact of genetic variants in A. thaliana by utilizing allele frequencies from the 1001 Genomes Project and a comprehensive database of GWAS. Notably, GPN outperforms predictors based on popular conservation scores such as phyloP and phastCons. Our predictions for A. thaliana can be visualized as sequence logos in the UCSC Genome Browser (https://genome.ucsc.edu/s/gbenegas/gpn-arabidopsis). We provide code (https://github.com/songlab-cal/gpn) to train GPN for any given species using its DNA sequence alone, enabling unsupervised prediction of variant effects across the entire genome.
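A common way such genomic language models score a variant is the log-likelihood ratio between the alternate and reference alleles under the model's distribution at the masked position. The sketch below uses hypothetical probabilities; an actual run would obtain them from a trained GPN model via the linked repository:

```python
import math

def variant_effect_score(probs: dict, ref: str, alt: str) -> float:
    """Log-likelihood ratio log P(alt)/P(ref) at one masked position.
    Negative scores mean the model considers the alternate allele
    less likely, suggesting a deleterious variant."""
    return math.log(probs[alt] / probs[ref])

# Hypothetical masked-LM nucleotide distribution at one genomic position
probs = {"A": 0.70, "C": 0.05, "G": 0.20, "T": 0.05}
score = variant_effect_score(probs, ref="A", alt="C")
print(f"GPN-style score for A>C: {score:.2f}")
```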
Affiliation(s)
- Gonzalo Benegas: Graduate Group in Computational Biology, University of California, Berkeley, CA 94720
- Yun S. Song: Computer Science Division, University of California, Berkeley, CA 94720; Department of Statistics, University of California, Berkeley, CA 94720; Center for Computational Biology, University of California, Berkeley, CA 94720
Collapse
|
28
|
Zhou W, Prater LC, Goldstein EV, Mooney SJ. Identifying Rare Circumstances Preceding Female Firearm Suicides: Validating A Large Language Model Approach. JMIR Ment Health 2023; 10:e49359. [PMID: 37847549 PMCID: PMC10618876 DOI: 10.2196/49359]
Abstract
BACKGROUND Firearm suicide has been more prevalent among males, but age-adjusted female firearm suicide rates increased by 20% from 2010 to 2020, outpacing the rate increase among males by about 8 percentage points, and female firearm suicide may have different contributing circumstances. In the United States, the National Violent Death Reporting System (NVDRS) is a comprehensive source of data on violent deaths and includes unstructured incident narrative reports from coroners or medical examiners and law enforcement. Conventional natural language processing approaches have been used to identify common circumstances preceding female firearm suicide deaths but failed to identify rarer circumstances due to insufficient training data. OBJECTIVE This study aimed to leverage a large language model approach to identify infrequent circumstances preceding female firearm suicide in the unstructured coroners or medical examiners and law enforcement narrative reports available in the NVDRS. METHODS We used the narrative reports of 1462 female firearm suicide decedents in the NVDRS from 2014 to 2018. The reports were written in English. We coded 9 infrequent circumstances preceding female firearm suicides. We experimented with predicting those circumstances by leveraging a large language model approach in a yes/no question-answer format. We measured the prediction accuracy with F1-score (ranging from 0 to 1). F1-score is the harmonic mean of precision (positive predictive value) and recall (true positive rate or sensitivity). RESULTS Our large language model outperformed a conventional support vector machine-supervised machine learning approach by a wide margin. Compared to the support vector machine model, which had F1-scores less than 0.2 for most infrequent circumstances, our large language model approach achieved an F1-score of over 0.6 for 4 circumstances and 0.8 for 2 circumstances. CONCLUSIONS The use of a large language model approach shows promise. 
Researchers interested in using natural language processing to identify infrequent circumstances in narrative report data may benefit from large language models.
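The F1-score reported above is defined in the abstract as the harmonic mean of precision and recall; from confusion-matrix counts it can be computed directly:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision (tp/(tp+fp)) and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy counts for one rare circumstance label (not the study's data)
print(f"F1 = {f1_score(tp=8, fp=2, fn=2):.2f}")  # precision 0.8, recall 0.8
```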
Affiliation(s)
- Weipeng Zhou: Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- Laura C Prater: Department of Psychiatry and Behavioral Health, University of Washington, Seattle, WA, United States; Harborview Medical Center, School of Medicine, University of Washington, Seattle, WA, United States
- Evan V Goldstein: Department of Population Health Sciences, University of Utah, Salt Lake City, UT, United States
- Stephen J Mooney: Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, United States
29
Korolev V, Protsenko P. Accurate, interpretable predictions of materials properties within transformer language models. Patterns (N Y) 2023; 4:100803. [PMID: 37876904 PMCID: PMC10591138 DOI: 10.1016/j.patter.2023.100803]
Abstract
Property prediction accuracy has long been a key parameter of machine learning in materials informatics. Accordingly, advanced models showing state-of-the-art performance turn into highly parameterized black boxes missing interpretability. Here, we present an elegant way to make their reasoning transparent. Human-readable text-based descriptions automatically generated within a suite of open-source tools are proposed as materials representation. Transformer language models pretrained on 2 million peer-reviewed articles take as input well-known terms such as chemical composition, crystal symmetry, and site geometry. Our approach outperforms crystal graph networks by classifying four out of five analyzed properties if one considers all available reference data. Moreover, fine-tuned text-based models show high accuracy in the ultra-small data limit. Explanations of their internal machinery are produced using local interpretability techniques and are faithful and consistent with domain expert rationales. This language-centric framework makes accurate property predictions accessible to people without artificial-intelligence expertise.
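The materials representation here is plain text: composition, crystal symmetry, and site geometry rendered as sentences a pretrained transformer can consume. The template below is an illustrative stand-in, not the paper's actual open-source description generator:

```python
def describe_material(formula: str, space_group: str, sites: dict) -> str:
    """Compose a human-readable description of a crystal for a
    text-based property predictor (hypothetical template)."""
    site_text = "; ".join(f"{el} occupies a {geom} site" for el, geom in sites.items())
    return f"{formula} crystallizes in the {space_group} space group. {site_text}."

text = describe_material("NaCl", "Fm-3m", {"Na": "octahedral", "Cl": "octahedral"})
print(text)
```

Such strings would then be tokenized and fed to the fine-tuned language model in place of a crystal-graph encoding.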
Affiliation(s)
- Vadim Korolev: Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
- Pavel Protsenko: Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
30
Beltrami EJ, Grant-Kels JM. Dermatology in the wake of an AI revolution: Who gets a say? J Am Acad Dermatol 2023; 89:e159-e160. [PMID: 37268021 DOI: 10.1016/j.jaad.2023.05.053]
Affiliation(s)
- Eric J Beltrami: University of Connecticut School of Medicine, Farmington, Connecticut
- Jane M Grant-Kels: Department of Dermatology, University of Connecticut Health Center, Farmington, Connecticut; Department of Dermatology, University of Florida, Gainesville, Florida
31
Ferreira AL, Lipoff JB. The complex ethics of applying ChatGPT and language model artificial intelligence in dermatology. J Am Acad Dermatol 2023; 89:e157-e158. [PMID: 37263382 DOI: 10.1016/j.jaad.2023.05.054]
Affiliation(s)
- Alana Luna Ferreira: Department of Dermatology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Jules B Lipoff: Department of Dermatology, Lewis Katz School of Medicine, Temple University, Philadelphia, Pennsylvania
32
Májovský M, Mikolov T, Netuka D. AI Is Changing the Landscape of Academic Writing: What Can Be Done? Authors' Reply to: AI Increases the Pressure to Overhaul the Scientific Peer Review Process. Comment on "Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened". J Med Internet Res 2023; 25:e50844. [PMID: 37651175 PMCID: PMC10502592 DOI: 10.2196/50844]
Affiliation(s)
- Martin Májovský: Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, Prague, Czech Republic
- Tomas Mikolov: Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Prague, Czech Republic
- David Netuka: Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, Prague, Czech Republic
33
Liu N, Brown A. AI Increases the Pressure to Overhaul the Scientific Peer Review Process. Comment on "Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened". J Med Internet Res 2023; 25:e50591. [PMID: 37651167 PMCID: PMC10502600 DOI: 10.2196/50591]
Affiliation(s)
- Nicholas Liu: John A Burns School of Medicine, University of Hawai'i at Mānoa, Honolulu, HI, United States
- Amy Brown: Department of Quantitative Health Sciences, John A Burns School of Medicine, University of Hawai'i at Mānoa, Honolulu, HI, United States
34
Hsu HY, Hsu KC, Hou SY, Wu CL, Hsieh YW, Cheng YD. Examining Real-World Medication Consultations and Drug-Herb Interactions: ChatGPT Performance Evaluation. JMIR Med Educ 2023; 9:e48433. [PMID: 37561097 PMCID: PMC10477918 DOI: 10.2196/48433]
Abstract
BACKGROUND Since OpenAI released ChatGPT, with its strong capability in handling natural tasks and its user-friendly interface, it has garnered significant attention. OBJECTIVE A prospective analysis is required to evaluate the accuracy and appropriateness of medication consultation responses generated by ChatGPT. METHODS A prospective cross-sectional study was conducted by the pharmacy department of a medical center in Taiwan. The test data set comprised retrospective medication consultation questions collected from February 1, 2023, to February 28, 2023, along with common questions about drug-herb interactions. Two distinct sets of questions were tested: real-world medication consultation questions and common questions about interactions between traditional Chinese and Western medicines. We used the conventional double-review mechanism. The appropriateness of each response from ChatGPT was assessed by 2 experienced pharmacists. In the event of a discrepancy between the assessments, a third pharmacist stepped in to make the final decision. RESULTS Of 293 real-world medication consultation questions, a random selection of 80 was used to evaluate ChatGPT's performance. ChatGPT exhibited a higher appropriateness rate in responding to public medication consultation questions compared to those asked by health care providers in a hospital setting (31/51, 61% vs 20/51, 39%; P=.01). CONCLUSIONS The findings from this study suggest that ChatGPT could potentially be used for answering basic medication consultation questions. Our analysis of the erroneous information allowed us to identify potential medical risks associated with certain questions; this problem deserves our close attention.
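The "conventional double-review mechanism" described above reduces to a simple decision rule: accept the two primary pharmacists' rating when they agree, otherwise defer to the third. A minimal sketch (boolean appropriateness ratings are an assumption for illustration):

```python
def adjudicate(rater1: bool, rater2: bool, tiebreaker) -> bool:
    """Double review: when the two primary raters agree, take their
    verdict; on disagreement, consult a third reviewer."""
    if rater1 == rater2:
        return rater1
    return tiebreaker()

# The third pharmacist is consulted only when the first two disagree
verdict_agree = adjudicate(True, True, tiebreaker=lambda: False)
verdict_split = adjudicate(True, False, tiebreaker=lambda: False)
print(verdict_agree, verdict_split)
```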
Collapse
Affiliation(s)
- Hsing-Yu Hsu
- Department of Pharmacy, China Medical University Hospital, Taichung, Taiwan
- Graduate Institute of Clinical Pharmacy, College of Medicine, National Taiwan University, Taipei, Taiwan
- Kai-Cheng Hsu
- Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
- Department of Medicine, China Medical University, Taichung, Taiwan
- Shih-Yen Hou
- Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
- Ching-Lung Wu
- School of Pharmacy, College of Pharmacy, China Medical University, Taichung, Taiwan
- Yow-Wen Hsieh
- Department of Pharmacy, China Medical University Hospital, Taichung, Taiwan
- School of Pharmacy, College of Pharmacy, China Medical University, Taichung, Taiwan
- Yih-Dih Cheng
- Department of Pharmacy, China Medical University Hospital, Taichung, Taiwan
- School of Pharmacy, College of Pharmacy, China Medical University, Taichung, Taiwan
35
Borchert RJ, Hickman CR, Pepys J, Sadler TJ. Performance of ChatGPT on the Situational Judgement Test-A Professional Dilemmas-Based Examination for Doctors in the United Kingdom. JMIR Med Educ 2023; 9:e48978. [PMID: 37548997] [PMCID: PMC10442724] [DOI: 10.2196/48978]
Abstract
BACKGROUND ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors. OBJECTIVE We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics. METHODS All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were inputted into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed on the basis of the official UK Foundation Programme Office scoring template. Questions were categorized into domains of Good Medical Practice on the basis of the domains referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or multiple domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated. RESULTS Overall, ChatGPT performed well, scoring 76% on the SJT but scoring full marks on only a few questions (9%), which may reflect possible flaws in ChatGPT's situational judgement or inconsistencies in the reasoning across questions (or both) in the examination itself. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors. CONCLUSIONS Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education for standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics.
Affiliation(s)
- Robin J Borchert
- Department of Radiology, University of Cambridge, Cambridge, United Kingdom
- Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
- Charlotte R Hickman
- Department of General Medicine, Lister Hospital, East and North Hertfordshire NHS Trust, Stevenage, United Kingdom
- Jack Pepys
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
- Timothy J Sadler
- Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
36
Shaitarova A, Zaghir J, Lavelli A, Krauthammer M, Rinaldi F. Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey. Yearb Med Inform 2023; 32:230-243. [PMID: 38147865] [PMCID: PMC10751112] [DOI: 10.1055/s-0043-1768726]
Abstract
OBJECTIVES This survey aims to provide an overview of the current state of biomedical and clinical Natural Language Processing (NLP) research and practice in Languages other than English (LoE). We pay special attention to data resources, language models, and popular NLP downstream tasks. METHODS We explore the literature on clinical and biomedical NLP from the years 2020-2022, focusing on the challenges of multilinguality and LoE. We query online databases and manually select relevant publications. We also use recent NLP review papers to identify the possible information lacunae. RESULTS Our work confirms the recent trend towards the use of transformer-based language models for a variety of NLP tasks in medical domains. In addition, there has been an increase in the availability of annotated datasets for clinical NLP in LoE, particularly in European languages such as Spanish, German and French. Common NLP tasks addressed in medical NLP research in LoE include information extraction, named entity recognition, normalization, linking, and negation detection. However, there is still a need for the development of annotated datasets and models specifically tailored to the unique characteristics and challenges of medical text in some of these languages, especially low-resource ones. Lastly, this survey highlights the progress of medical NLP in LoE, and helps identify opportunities for future research and development in this field.
Affiliation(s)
- Jamil Zaghir
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alberto Lavelli
- Natural Language Processing Research Unit, Center for Digital Health and Wellbeing, Fondazione Bruno Kessler, Trento, Italy
- Michael Krauthammer
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Biomedical Informatics, University Hospital Zurich, Zurich, Switzerland
- Fabio Rinaldi
- Natural Language Processing Research Unit, Center for Digital Health and Wellbeing, Fondazione Bruno Kessler, Trento, Italy
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Dalle Molle Institute for Artificial Intelligence Research, Lugano, Switzerland
- Swiss Institute of Bioinformatics
37
Baker MN, Burruss CP, Wilson CL. ChatGPT: A Supplemental Tool for Efficiency and Improved Communication in Rural Dermatology. Cureus 2023; 15:e43812. [PMID: 37731429] [PMCID: PMC10508964] [DOI: 10.7759/cureus.43812]
Abstract
In November 2022, OpenAI released version 3.5 of ChatGPT, the first publicly available artificial intelligence (AI) language model designed to engage in natural, human-like dialogue with users. While this groundbreaking technology has been extensively studied in various domains, its potential applications in rural dermatology remain unexplored in the existing literature. Our research investigates the many benefits that ChatGPT could offer in rural dermatology, particularly concerning administrative tasks and communication with communities with lower healthcare literacy. However, we also acknowledge that utilizing ChatGPT without proper caution and discernment may lead to potential drawbacks. This paper examines the opportunities and challenges associated with integrating ChatGPT into rural dermatology practices, ultimately fostering a well-informed and responsible approach to its implementation.
Affiliation(s)
- Mindy N Baker
- Dermatology, University of Kentucky College of Medicine, Lexington, USA
- Clayton P Burruss
- Dermatology, University of Kentucky College of Medicine, Lexington, USA
38
Gala D, Makaryus AN. The Utility of Language Models in Cardiology: A Narrative Review of the Benefits and Concerns of ChatGPT-4. Int J Environ Res Public Health 2023; 20:6438. [PMID: 37568980] [PMCID: PMC10419098] [DOI: 10.3390/ijerph20156438]
Abstract
Artificial intelligence (AI) and language models such as ChatGPT-4 (Generative Pretrained Transformer) have made tremendous advances recently and are rapidly transforming the landscape of medicine. Cardiology is among many of the specialties that utilize AI with the intention of improving patient care. Generative AI, with the use of its advanced machine learning algorithms, has the potential to diagnose heart disease and recommend management options suitable for the patient. This may lead to improved patient outcomes not only by recommending the best treatment plan but also by increasing physician efficiency. Language models could assist physicians with administrative tasks, allowing them to spend more time on patient care. However, there are several concerns with the use of AI and language models in the field of medicine. These technologies may not be the most up-to-date with the latest research and could provide outdated information, which may lead to an adverse event. Secondly, AI tools can be expensive, leading to increased healthcare costs and reduced accessibility to the general population. There is also concern about the loss of the human touch and empathy as AI becomes more mainstream. Healthcare professionals would need to be adequately trained to utilize these tools. While AI and language models have many beneficial traits, all healthcare providers need to be involved and aware of generative AI so as to assure its optimal use and mitigate any potential risks and challenges associated with its implementation. In this review, we discuss the various uses of language models in the field of cardiology.
Affiliation(s)
- Dhir Gala
- Department of Clinical Science, American University of the Caribbean School of Medicine, Cupecoy, Sint Maarten, The Netherlands
- Amgad N. Makaryus
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hofstra University, 500 Hofstra Blvd., Hempstead, NY 11549, USA
- Department of Cardiology, Nassau University Medical Center, Hempstead, NY 11554, USA
39
Abstract
Large language models (LLMs) are one of the most impressive achievements of artificial intelligence in recent years. However, their relevance to the study of language more broadly remains unclear. This article considers the potential of LLMs to serve as models of language understanding in humans. While debate on this question typically centres around models' performance on challenging language understanding tasks, this article argues that the answer depends on models' underlying competence, and thus that the focus of the debate should be on empirical work which seeks to characterize the representations and processing algorithms that underlie model behaviour. From this perspective, the article offers counterarguments to two commonly cited reasons why LLMs cannot serve as plausible models of language in humans: their lack of symbolic structure and their lack of grounding. For each, a case is made that recent empirical trends undermine the common assumptions about LLMs, and thus that it is premature to draw conclusions about LLMs' ability (or lack thereof) to offer insights on human language representation and understanding. This article is part of a discussion meeting issue 'Cognitive artificial intelligence'.
Affiliation(s)
- Ellie Pavlick
- Department of Computer Science, Brown University, Providence, RI, USA
40
Beaulieu-Jones BR, Shah S, Berrigan MT, Marwaha JS, Lai SL, Brat GA. Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments. medRxiv 2023:2023.07.16.23292743. [PMID: 37502981] [PMCID: PMC10371188] [DOI: 10.1101/2023.07.16.23292743]
Abstract
Background Artificial intelligence (AI) has the potential to dramatically alter healthcare by enhancing how we diagnose and treat disease. One promising AI model is ChatGPT, a large general-purpose language model trained by OpenAI. The chat interface has shown robust, human-level performance on several professional and academic benchmarks. We sought to probe its performance and stability over time on surgical case questions. Methods We evaluated the performance of ChatGPT-4 on two surgical knowledge assessments: the Surgical Council on Resident Education (SCORE) and a second commonly used knowledge assessment, referred to as Data-B. Questions were entered in two formats: open-ended and multiple choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and the stability of performance on repeat encounters. Results A total of 167 SCORE and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71% and 68% of multiple-choice SCORE and Data-B questions, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained non-obvious insights. Common reasons for inaccurate responses included: inaccurate information in a complex question (n=16, 36.4%); inaccurate information in a fact-based question (n=11, 25.0%); and accurate information with circumstantial discrepancy (n=6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of inaccurate questions; the response accuracy changed for 6/16 questions. Conclusion Consistent with prior findings, we demonstrate robust near or above human-level performance of ChatGPT within the surgical domain. Unique to this study, we demonstrate a substantial inconsistency in ChatGPT responses with repeat query. This finding warrants future consideration and presents an opportunity to further train these models to provide safe and consistent responses. Without mental and/or conceptual models, it is unclear whether language models such as ChatGPT would be able to safely assist clinicians in providing care.
Affiliation(s)
- Brendin R Beaulieu-Jones
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Sahaj Shah
- Geisinger Commonwealth School of Medicine, Scranton, PA
- Jayson S Marwaha
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shuo-Lun Lai
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Gabriel A Brat
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
41
Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023; 5:1195017. [PMID: 37388252] [PMCID: PMC10303934] [DOI: 10.3389/fdgth.2023.1195017]
Abstract
Objectives The objective of this study is the exploration of Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also aim at evaluating how languages and institutional specificities of Swiss teaching hospitals are likely to affect the quality of the classification in French and German languages. Methods In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. Results The best strategies yield average F1-scores of 90% and 86% respectively for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks. Conclusions These results are competitive with the manual labeling as measured by Matthews correlation coefficient and Cohen's Kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize on new unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
Affiliation(s)
- Luc Mottin
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jean-Philippe Goldman
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Christoph Jäggli
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Rita Achermann
- Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Julien Gobeill
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Ehrsam
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexandre Wicky
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Camille L. Gérard
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Tanja Schwenk
- Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
- Mélinda Charrier
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Petros Tsantoulis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexander Leichtle
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Michael K. Kiessling
- Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Olivier Michielin
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Sylvain Pradervand
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Vasiliki Foufi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Patrick Ruch
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
42
Májovský M, Černý M, Kasal M, Komarc M, Netuka D. Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened. J Med Internet Res 2023; 25:e46924. [PMID: 37256685] [PMCID: PMC10267787] [DOI: 10.2196/46924]
Abstract
BACKGROUND Artificial intelligence (AI) has advanced substantially in recent years, transforming many industries and improving the way people live and work. In scientific research, AI can enhance the quality and efficiency of data analysis and publication. However, AI has also opened up the possibility of generating high-quality fraudulent papers that are difficult to detect, raising important questions about the integrity of scientific research and the trustworthiness of published papers. OBJECTIVE The aim of this study was to investigate the capabilities of current AI language models in generating high-quality fraudulent medical articles. We hypothesized that modern AI models can create highly convincing fraudulent papers that can easily deceive readers and even experienced researchers. METHODS This proof-of-concept study used ChatGPT (Chat Generative Pre-trained Transformer) powered by the GPT-3 (Generative Pre-trained Transformer 3) language model to generate a fraudulent scientific article related to neurosurgery. GPT-3 is a large language model developed by OpenAI that uses deep learning algorithms to generate human-like text in response to prompts given by users. The model was trained on a massive corpus of text from the internet and is capable of generating high-quality text in a variety of languages and on various topics. The authors posed questions and prompts to the model and refined them iteratively as the model generated the responses. The goal was to create a completely fabricated article including the abstract, introduction, material and methods, discussion, references, charts, etc. Once the article was generated, it was reviewed for accuracy and coherence by experts in the fields of neurosurgery, psychiatry, and statistics and compared to existing similar articles. 
RESULTS The study found that the AI language model can create a highly convincing fraudulent article that resembled a genuine scientific paper in terms of word usage, sentence structure, and overall composition. The AI-generated article included standard sections such as introduction, material and methods, results, and discussion, as well as a data sheet. It consisted of 1992 words and 17 citations, and the whole process of article creation took approximately 1 hour without any special training of the human user. However, there were some concerns and specific mistakes identified in the generated article, specifically in the references. CONCLUSIONS The study demonstrates the potential of current AI language models to generate completely fabricated scientific articles. Although the papers look sophisticated and seemingly flawless, expert readers may identify semantic inaccuracies and errors upon closer inspection. We highlight the need for increased vigilance and better detection methods to combat the potential misuse of AI in scientific research. At the same time, it is important to recognize the potential benefits of using AI language models in genuine scientific writing and research, such as manuscript preparation and language editing.
Affiliation(s)
- Martin Májovský
- Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, Prague, Czech Republic
- Martin Černý
- Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, Prague, Czech Republic
- Matěj Kasal
- Department of Psychiatry, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czech Republic
- Martin Komarc
- Institute of Biophysics and Informatics, First Faculty of Medicine, Charles University, Prague, Czech Republic
- Department of Methodology, Faculty of Physical Education and Sport, Charles University, Prague, Czech Republic
- David Netuka
- Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, Prague, Czech Republic
43
Bommasani R, Liang P, Lee T. Holistic Evaluation of Language Models. Ann N Y Acad Sci 2023. [PMID: 37230490] [DOI: 10.1111/nyas.15007]
Abstract
Language models (LMs) like GPT-3, PaLM, and ChatGPT are the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of LMs. LMs can serve many purposes and their behavior should satisfy many desiderata. To navigate the vast space of potential scenarios and metrics, we taxonomize the space and select representative subsets. We evaluate models on 16 core scenarios and 7 metrics, exposing important trade-offs. We supplement our core evaluation with seven targeted evaluations to deeply analyze specific aspects (including world knowledge, reasoning, regurgitation of copyrighted content, and generation of disinformation). We benchmark 30 LMs, from OpenAI, Microsoft, Google, Meta, Cohere, AI21 Labs, and others. Prior to HELM, models were evaluated on just 17.9% of the core HELM scenarios, with some prominent models not sharing a single scenario in common. We improve this to 96.0%: all 30 models are now benchmarked under the same standardized conditions. Our evaluation surfaces 25 top-level findings. For full transparency, we release all raw model prompts and completions publicly. HELM is a living benchmark for the community, continuously updated with new scenarios, metrics, and models https://crfm.stanford.edu/helm/latest/.
Affiliation(s)
- Rishi Bommasani
- Center for Research on Foundation Models, Stanford University, Stanford, California, USA
- Percy Liang
- Center for Research on Foundation Models, Stanford University, Stanford, California, USA
- Tony Lee
- Center for Research on Foundation Models, Stanford University, Stanford, California, USA
44
Richter-Pechanski P, Wiesenbach P, Schwab DM, Kiriakou C, He M, Geis NA, Frank A, Dieterich C. Few-Shot and Prompt Training for Text Classification in German Doctor's Letters. Stud Health Technol Inform 2023; 302:819-820. [PMID: 37203504] [DOI: 10.3233/shti230275]
Abstract
To classify sentences in cardiovascular German doctor's letters into eleven section categories, we used pattern-exploiting training, a prompt-based method for text classification in few-shot learning scenarios (20, 50 and 100 instances per class) using language models with various pre-training approaches evaluated on CARDIO:DE, a freely available German clinical routine corpus. Prompting improves results by 5-28% accuracy compared to traditional methods, reducing manual annotation efforts and computational costs in a clinical setting.
Affiliation(s)
- Phillip Richter-Pechanski
- Klaus Tschira Institute for Computational Cardiology, Heidelberg, Germany
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK) - Partner site Heidelberg/Mannheim, Germany
- Informatics for Life, Heidelberg, Germany
- Department of Computational Linguistics, Heidelberg University, Germany
- Philipp Wiesenbach
- Klaus Tschira Institute for Computational Cardiology, Heidelberg, Germany
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- Informatics for Life, Heidelberg, Germany
- Dominic M Schwab
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- Christina Kiriakou
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- Mingyang He
- Klaus Tschira Institute for Computational Cardiology, Heidelberg, Germany
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- Department of Computational Linguistics, Heidelberg University, Germany
- Nicolas A Geis
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- Informatics for Life, Heidelberg, Germany
- Anette Frank
- Department of Computational Linguistics, Heidelberg University, Germany
- Christoph Dieterich
- Klaus Tschira Institute for Computational Cardiology, Heidelberg, Germany
- Department of Internal Medicine III, University Hospital Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK) - Partner site Heidelberg/Mannheim, Germany
- Informatics for Life, Heidelberg, Germany
45
Dillion D, Tandon N, Gu Y, Gray K. Can AI language models replace human participants? Trends Cogn Sci 2023:S1364-6613(23)00098-0. [PMID: 37173156] [DOI: 10.1016/j.tics.2023.04.008]
Abstract
Recent work suggests that language models such as GPT can make human-like judgments across a number of domains. We explore whether and when language models might replace human participants in psychological science. We review nascent research, provide a theoretical model, and outline caveats of using AI as a participant.
Affiliation(s)
- Danica Dillion
- University of North Carolina, Department of Psychology and Neuroscience, Chapel Hill, NC 27599-3270, USA
- Yuling Gu
- Allen Institute for AI, Seattle, WA 98103, USA
- Kurt Gray
- University of North Carolina, Department of Psychology and Neuroscience, Chapel Hill, NC 27599-3270, USA
46
Kauf C, Tuckute G, Levy R, Andreas J, Fedorenko E. Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network. bioRxiv 2023:2023.05.05.539646. [PMID: 37205405] [PMCID: PMC10187317] [DOI: 10.1101/2023.05.05.539646]
Abstract
Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI dataset of responses to n=627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we i) perturbed sentences' word order, ii) removed different subsets of words, or iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical semantic content of the sentence (largely carried by content words) rather than the sentence's syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, results are robust to whether the mapping model is trained on intact or perturbed stimuli, and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result-that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones-aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
Affiliation(s)
- Carina Kauf: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Greta Tuckute: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Roger Levy: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Jacob Andreas: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
- Evelina Fedorenko: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology; Program in Speech and Hearing Bioscience and Technology, Harvard University
47
Owen D, Antypas D, Hassoulas A, Pardiñas AF, Espinosa-Anke L, Camacho Collados J. Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums Using Language Models: Longitudinal Analysis and Evaluation. JMIR AI 2023; 2:e41205. PMID: 37525646; PMCID: PMC7614849; DOI: 10.2196/41205.
Abstract
Background: Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter), such as cognitive distortions, are observed in verbal communication and can also manifest in the structure of written language. The automatic analysis of text may therefore provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and web-based forums.
Objective: The objective of this study was 2-fold. We sought to gauge the effectiveness of different machine learning approaches in identifying users of the mass web-based forum Reddit who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date was a relevant factor in this detection.
Methods: A total of 2 Reddit data sets containing posts belonging to users with and without a history of depression diagnosis were obtained. The intersection of these data sets provided users with an estimated date of depression diagnosis. This derived data set was used as input for several machine learning classifiers, including transformer-based language models (LMs).
Results: The Bidirectional Encoder Representations from Transformers (BERT) and MentalBERT transformer-based LMs proved the most effective in distinguishing forum users with a known depression diagnosis from those without, each obtaining a mean F1-score of 0.64 across the binary classification setups. The results also suggested that the final 12 to 16 weeks (about 3-4 months) of posts before a user's estimated diagnosis date are the most indicative of their illness; data before that period did not help the models detect more accurately. Furthermore, in the 4- to 8-week period before the user's estimated diagnosis date, posts exhibited more negative sentiment than in any other 4-week period in the user's post history.
Conclusions: Transformer-based LMs may be used on data from web-based social media forums to identify users at risk for psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental health care interventions to support those at risk.
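The study's finding that only the final 12 to 16 weeks of posts before the estimated diagnosis date are informative implies a time-windowing step before classification. A minimal sketch of that windowing, assuming a hypothetical post record with `date` and `text` fields (the study's actual data schema is not given here):

```python
from datetime import date, timedelta

def posts_in_window(posts, diagnosis_date, weeks_before=16):
    """Keep posts from the final `weeks_before` weeks prior to the
    estimated diagnosis date; earlier posts did not improve detection."""
    start = diagnosis_date - timedelta(weeks=weeks_before)
    return [p for p in posts if start <= p["date"] < diagnosis_date]

# Toy records standing in for a user's Reddit post history.
posts = [
    {"date": date(2022, 1, 1), "text": "old post"},
    {"date": date(2022, 11, 20), "text": "recent post"},
]
recent = posts_in_window(posts, diagnosis_date=date(2022, 12, 1))
print([p["text"] for p in recent])  # only the post within 16 weeks survives
```

The filtered posts would then be fed to a classifier such as BERT or MentalBERT; the gating by date is what lets the model focus on the period the study found most indicative.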
Affiliation(s)
- David Owen: School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- Dimosthenis Antypas: School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- Athanasios Hassoulas: Centre for Medical Education, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Antonio F Pardiñas: Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Luis Espinosa-Anke: School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- Jose Camacho Collados: School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
48
Abstract
We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: It solves vignette-based tasks as well as or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multiarmed bandit task, and shows signatures of model-based reinforcement learning. Yet we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. Taken together, these results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.
49
Salicchi L, Chersoni E, Lenci A. A study on surprisal and semantic relatedness for eye-tracking data prediction. Front Psychol 2023; 14:1112365. PMID: 36818086; PMCID: PMC9931754; DOI: 10.3389/fpsyg.2023.1112365.
Abstract
Previous research in computational linguistics has devoted considerable effort to using language models and/or distributional semantic models to predict metrics extracted from eye-tracking data. However, it is not clear whether the two components make distinct contributions, with recent studies claiming that surprisal scores estimated with large-scale, deep learning-based language models subsume the semantic relatedness component. In our study, we propose a regression experiment for estimating different eye-tracking metrics on two English corpora, contrasting the quality of the predictions with and without the surprisal and relatedness components. Different types of relatedness scores derived from both static and contextual models were also tested. Our results suggest that both components play a role in the prediction, with semantic relatedness surprisingly also contributing to the prediction of function words. Moreover, when the relatedness metric is computed with the contextual embeddings of the BERT model, it explains a larger amount of variance.
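The two predictors this study contrasts can be illustrated with toy stand-ins: surprisal as the negative log probability of a word given its context, and relatedness as the cosine between a word vector and the centroid of its context vectors. The bigram table and 2-dimensional vectors below are invented for demonstration; the study used large language models and static/contextual embeddings.

```python
import math

# Toy bigram probabilities and word vectors (illustrative only).
BIGRAM_P = {("the", "cat"): 0.2, ("cat", "sat"): 0.5}
VECTORS = {"the": [0.1, 0.9], "cat": [0.8, 0.3], "sat": [0.7, 0.4]}

def surprisal(prev, word):
    """Surprisal in bits: -log2 P(word | prev)."""
    return -math.log2(BIGRAM_P[(prev, word)])

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v)

def relatedness(word, context):
    """Cosine between a word's vector and the centroid of its context."""
    dim = len(VECTORS[word])
    centroid = [sum(VECTORS[w][i] for w in context) / len(context)
                for i in range(dim)]
    return cosine(VECTORS[word], centroid)

print(surprisal("cat", "sat"))            # 1.0 bit, since P = 0.5
print(relatedness("sat", ["the", "cat"]))
```

In the regression setup described above, per-token values like these would serve as predictors of eye-tracking metrics such as first-fixation or total reading time.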
Affiliation(s)
- Lavinia Salicchi: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Emmanuele Chersoni: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Alessandro Lenci: Computational Linguistics Laboratory (CoLing Lab), University of Pisa, Pisa, Italy
50
Luo M, Li S, Pang Y, Yao L, Ma R, Huang HY, Huang HD, Lee TY. Extraction of microRNA-target interaction sentences from biomedical literature by deep learning approach. Brief Bioinform 2023; 24:bbac497. PMID: 36440972; DOI: 10.1093/bib/bbac497.
Abstract
MicroRNA (miRNA)-target interactions (MTIs) play a substantial role in various cell activities, molecular regulation and physiological processes. Published biomedical literature is the carrier of high-confidence MTI knowledge; however, extracting this knowledge efficiently from large-scale published articles remains challenging. To address this issue, we constructed a deep learning-based model. We applied pre-trained language models to biomedical text to obtain representations and subsequently fed them into a deep neural network with gate mechanism layers and a fully connected layer to extract MTI information sentences. The performance of the proposed models was evaluated using two datasets constructed from text data obtained from miRTarBase. The validation and test results revealed that incorporating both PubMedBERT and SciBERT for sentence-level encoding with a long short-term memory (LSTM)-based deep neural network yields outstanding performance, with both F1 and accuracy above 80% on the validation and test data. Additionally, the proposed deep learning method outperformed the following machine learning methods: random forest, support vector machine, logistic regression and bidirectional LSTM. This work should greatly facilitate studies on MTI analysis and regulation. It is anticipated that it can assist in large-scale screening of miRNAs, revealing their functional roles in various diseases, which is important for the development of highly specific drugs with fewer side effects. Source code and corpus are publicly available at https://github.com/qi29.
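The gate mechanism mentioned in this abstract is not specified in detail here, but gating layers of this kind typically blend two encoder outputs element-wise through a learned sigmoid gate. A generic sketch under that assumption, with toy 3-dimensional vectors standing in for PubMedBERT and SciBERT sentence encodings (the names and weights below are illustrative, not the paper's parameters):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(vec_a, vec_b, gate_weights, gate_bias):
    """Element-wise gate g blends two sentence encodings:
    fused_i = g_i * a_i + (1 - g_i) * b_i, with
    g_i = sigmoid(w_i * (a_i + b_i) + bias)."""
    fused = []
    for a, b, w in zip(vec_a, vec_b, gate_weights):
        g = sigmoid(w * (a + b) + gate_bias)
        fused.append(g * a + (1.0 - g) * b)
    return fused

# Toy stand-ins for the two encoders' sentence vectors.
pubmedbert_vec = [0.2, -0.5, 0.9]
scibert_vec = [0.4, 0.1, -0.3]
fused = gated_fusion(pubmedbert_vec, scibert_vec, [1.0, 1.0, 1.0], 0.0)
print(fused)
```

Because each gate value lies in (0, 1), every fused component is a convex combination of the two encodings, letting the network learn per-dimension how much to trust each encoder before the LSTM and fully connected layers.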
Affiliation(s)
- Mengqi Luo: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China; School of Life Sciences, University of Science and Technology of China, Hefei, China
- Shangfu Li: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Yuxuan Pang: Warshel Institute for Computational Biology and School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Lantian Yao: Warshel Institute for Computational Biology and School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Renfei Ma: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China; School of Life Sciences, University of Science and Technology of China, Hefei, China
- Hsi-Yuan Huang: School of Medicine and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Hsien-Da Huang: School of Medicine and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Tzong-Yi Lee: Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China