1. Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024; 29:2304973. PMID: 38217884; PMCID: PMC10791098; DOI: 10.1080/10872981.2024.2304973
Abstract
With the rapid development of ChatGPT and its application in education, a new era of collaboration between humans and artificial intelligence in teaching and learning has arrived. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models such as ChatGPT can serve as virtual teaching aids, providing students with individualized, immediate medical knowledge and supporting interactive simulation-based learning and assessment. In this paper, we discuss the application of ChatGPT in anatomy teaching and its various levels of application based on our own teaching experience, and we weigh its advantages and disadvantages for anatomy teaching. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, it faces many challenges and limitations in medical education. Medical educators must keep pace with rapid technological change, taking into account ChatGPT's impact on curriculum design, assessment strategies, and teaching methods. Discussing the application of ChatGPT in medical education, and in anatomy teaching in particular, supports the effective integration of AI tools into medical education.
Affiliation(s)
- Lige Leng
- Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
2. Molenaar A, Jenkins EL, Brennan L, Lukose D, McCaffrey TA. The use of sentiment and emotion analysis and data science to assess the language of nutrition-, food- and cooking-related content on social media: a systematic scoping review. Nutr Res Rev 2024; 37:43-78. PMID: 36991525; DOI: 10.1017/s0954422423000069
Abstract
Social media data are rapidly evolving and accessible, which presents opportunities for research. Data science techniques, such as sentiment or emotion analysis, which analyse textual emotion, provide an opportunity to gather insight from social media. This paper describes a systematic scoping review of interdisciplinary evidence to explore how sentiment or emotion analysis methods, alongside other data science methods, have been used to examine nutrition, food and cooking social media content. A PRISMA search strategy was used to search nine electronic databases in November 2020 and January 2022. Of 7325 studies identified, thirty-six studies from seventeen countries were selected, and content was analysed thematically and summarised in an evidence table. Studies were published between 2014 and 2022 and used data from seven different social media platforms (Twitter, YouTube, Instagram, Reddit, Pinterest, Sina Weibo and mixed platforms). Five themes of research were identified: dietary patterns, cooking and recipes, diet and health, public health, and nutrition and food in general. Studies either developed their own sentiment or emotion analysis tool or used available open-source tools. Accuracy in predicting sentiment ranged from 33.33% (an open-source engine) to 98.53% (an engine developed for the study). On average, the proportion of sentiment was 38.8% positive, 46.6% neutral and 28.0% negative. Additional data science techniques used included topic modelling and network analysis. Future research requires optimising data extraction processes from social media platforms, interdisciplinary teams to develop suitable and accurate methods for the subject matter, and complementary methods to gather deeper insights into these complex data.
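The open-source sentiment engines surveyed above are typically invoked in a few lines of code. As a minimal illustration, not drawn from any reviewed study, the sketch below scores invented food-related posts with NLTK's open-source VADER analyzer:

```python
# Minimal sentiment-scoring sketch using NLTK's open-source VADER analyzer.
# The example posts are invented; the reviewed studies used real social media data.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

posts = [
    "This slow-cooker recipe was amazing, my kids loved it!",
    "Meal prepping again... so boring, but cheaper than takeout.",
]

sia = SentimentIntensityAnalyzer()
for post in posts:
    scores = sia.polarity_scores(post)  # dict with neg, neu, pos, compound
    print(f"{scores['compound']:+.3f}  {post}")
```

The compound score folds the positive, neutral and negative components into a single value in [-1, 1], which is one common way studies summarise post-level sentiment.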
Affiliation(s)
- Annika Molenaar
- Department of Nutrition, Dietetics and Food, Monash University, Level 1, 264 Ferntree Gully Road, Notting Hill, VIC 3168, Australia
- Eva L Jenkins
- Department of Nutrition, Dietetics and Food, Monash University, Level 1, 264 Ferntree Gully Road, Notting Hill, VIC 3168, Australia
- Linda Brennan
- School of Media and Communication, RMIT University, 124 La Trobe St, Melbourne, VIC 3004, Australia
- Dickson Lukose
- Monash Data Futures Institute, Monash University, Level 2, 13 Rainforest Walk, Clayton, VIC 3800, Australia
- Tracy A McCaffrey
- Department of Nutrition, Dietetics and Food, Monash University, Level 1, 264 Ferntree Gully Road, Notting Hill, VIC 3168, Australia
3. Johnson AL. Psychotic white men and bipolar black women? Racialized and gendered implications of mental health terminology. Soc Sci Med 2024; 352:117015. PMID: 38788530; DOI: 10.1016/j.socscimed.2024.117015
Abstract
This study investigates the intersection of race, gender, and criminality in the language surrounding mental health and illness. Applying computational word-embedding methods to full-text data from major American newspapers between 2000 and 2023, I show that the landscape of mental health is broadly racialized as black, challenging the notion of mental illness as a predominantly white phenomenon. Cultural ideas about mental illness are gendered such that women are medicalized and men are criminalized, yet certain terms blur the boundary between illness and criminality. I highlight how stereotypes embedded in mental health language perpetuate stigma around men's mental health and justify social control, with notable implications for black men. I conclude with recommendations for the mental health movement, advocating for more inclusive discussions around men's mental health and revised person-centric language.
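The word-embedding approach described above can be sketched compactly. The example below uses pretrained GloVe vectors via gensim as a stand-in for the study's newspaper-trained embeddings; the anchor and term lists are illustrative only, not the study's lexicons:

```python
# Sketch of measuring gendered associations of mental health terms in embeddings.
# Pretrained GloVe (via gensim) stands in for the study's newspaper-trained
# vectors; the anchor and term lists below are illustrative only.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # downloads ~130 MB on first use

male_anchors = ["man", "male", "he", "him"]
female_anchors = ["woman", "female", "she", "her"]
terms = ["psychotic", "bipolar", "depressed", "anxious"]

def mean_similarity(word, anchors):
    return np.mean([model.similarity(word, a) for a in anchors])

for term in terms:
    # Positive values: term sits closer to the male anchors; negative: female.
    bias = mean_similarity(term, male_anchors) - mean_similarity(term, female_anchors)
    print(f"{term:>10}: {bias:+.4f}")
```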
Affiliation(s)
- Amy L Johnson
- Department of Sociology and Anthropology, Lehigh University, 31 Williams Dr, Bethlehem, PA 18015, USA
4. Javed K, Li J. Artificial intelligence in judicial adjudication: Semantic biasness classification and identification in legal judgement (SBCILJ). Heliyon 2024; 10:e30184. PMID: 38737247; PMCID: PMC11088250; DOI: 10.1016/j.heliyon.2024.e30184
Abstract
History reveals that human societies have suffered injustice due to cognitive bias, and semantic bias tends to amplify cognitive bias. Because AI systems are trained on extensive historical data, cognitive biases embedded in that data can produce unethical and allegedly inhumane predictions. The innovation of artificial intelligence and its rapid integration across disciplines have prompted questions regarding the subjectivity of the technology. The present research focuses on semantic bias in legal judgments in order to increase the legitimacy of training data. Applying general-purpose artificial intelligence (AI) algorithms, we classify and detect the semantic bias present in the Chinese Artificial Intelligence and Law (CAIL) dataset. Our findings demonstrate that AI models achieve superior predictive power on the CAIL dataset, which comprises hundreds of cases, compared with a structured professional risk assessment tool. Innovative AI-based approaches may be implemented in the legal arena to assist legal practitioners in this process. To this end, we propose a model for the classification and identification of semantic biases in legal judgments, demonstrated on the CAIL dataset. We trained several classifiers, namely the Support Vector Machine (SVM), Naïve Bayes (NB), Multi-Layer Perceptron (MLP), and K-Nearest Neighbour (KNN), and compared their accuracy: SVM achieved 96.90%, NB 88.80%, MLP 86.75%, and KNN 85.66%, with SVM outperforming the other models. Additionally, we show that relatively high classification performance can be obtained when predicting case outcomes from the semantic bias categorization of judicial judgments alone.
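The reported classifier comparison follows a standard supervised text-classification recipe: vectorize the documents, split the data, and fit each model. A minimal scikit-learn sketch of that recipe is below; the texts and bias labels are placeholders, not the CAIL data or the authors' pipeline:

```python
# Sketch of the reported four-classifier comparison on TF-IDF text features.
# The corpus and bias labels are placeholders, not the CAIL dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

texts = ["judgment text A ...", "judgment text B ...",
         "judgment text C ...", "judgment text D ..."] * 25  # placeholder corpus
labels = [0, 1, 0, 1] * 25                                   # placeholder labels

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

models = {"SVM": SVC(), "NB": MultinomialNB(),
          "MLP": MLPClassifier(max_iter=500), "KNN": KNeighborsClassifier()}
for name, clf in models.items():
    acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: {acc:.2%}")
```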
Affiliation(s)
- Kashif Javed
- School of Law, Zhengzhou University, Zhengzhou, 450001, Henan, China
- Jianxin Li
- School of Law, Zhengzhou University, Zhengzhou, 450001, Henan, China
5. Berry P, Kotha S. The fundamental importance of exploring the risks alongside the benefits of artificial intelligence. J Hepatol 2024; 80:e223-e225. PMID: 37454874; DOI: 10.1016/j.jhep.2023.06.020
Affiliation(s)
- Philip Berry
- Department of Gastroenterology, Guy's and St Thomas' Foundation Trust, London, United Kingdom
- Sreelakshmi Kotha
- Department of Gastroenterology, Guy's and St Thomas' Foundation Trust, London, United Kingdom.
6. Yu Z, Peng C, Yang X, Dang C, Adekkanattu P, Gopal Patra B, Peng Y, Pathak J, Wilson DL, Chang CY, Lo-Ciganic WH, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform 2024; 153:104642. PMID: 38621641; PMCID: PMC11141428; DOI: 10.1016/j.jbi.2024.104642
Abstract
OBJECTIVE To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine bias among race and gender groups, test the generalizability of SDoH extraction across disease groups, and examine the population-level extraction ratio. METHODS We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package, SODA (i.e., SOcial DeterminAnts), to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains (cancer and opioid use), and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess the patient-level extraction ratio and examine differences among race and gender groups. RESULTS We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and a cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models; the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There was a small performance gap (~4%) between males and females, but a large performance gap (>16%) among race groups. Performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning on a smaller opioid SDoH corpus improved it. The extraction ratio varied across the three cancer cohorts: 10 SDoH could be extracted from over 70% of cancer patients, whereas 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups had a higher extraction ratio than other minority race groups. CONCLUSIONS Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
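The concept-extraction step in a package like SODA is a transformer token-classification (NER) task. The sketch below shows the generic Hugging Face pattern for that task, using a publicly available general-English NER model as a stand-in for a clinical SDoH model; SODA's actual models and interface are documented in the linked repository:

```python
# Generic transformer token-classification (NER) pattern for concept extraction.
# "dslim/bert-base-NER" is a public general-English model used here only as a
# stand-in; SODA's clinical SDoH models are documented in its repository.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",  # merge word pieces into entity spans
)

note = "Ms. Smith lives alone in Gainesville and is currently unemployed."
for entity in ner(note):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```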
Affiliation(s)
- Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Chong Dang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Prakash Adekkanattu
- Information Technologies and Services, Weill Cornell Medicine, New York, NY, USA
- Braja Gopal Patra
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Debbie L Wilson
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Ching-Yuan Chang
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Wei-Hsuan Lo-Ciganic
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Thomas J George
- Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
7. Tustison NJ, Yassa MA, Rizvi B, Cook PA, Holbrook AJ, Sathishkumar MT, Tustison MG, Gee JC, Stone JR, Avants BB. ANTsX neuroimaging-derived structural phenotypes of UK Biobank. Sci Rep 2024; 14:8848. PMID: 38632390; PMCID: PMC11024129; DOI: 10.1038/s41598-024-59440-6
Abstract
UK Biobank is a large-scale epidemiological resource for investigating prospective correlations between various lifestyle, environmental, and genetic factors with health and disease progression. In addition to individual subject information obtained through surveys and physical examinations, a comprehensive neuroimaging battery consisting of multiple modalities provides imaging-derived phenotypes (IDPs) that can serve as biomarkers in neuroscience research. In this study, we augment the existing set of UK Biobank neuroimaging structural IDPs, obtained from well-established software libraries such as FSL and FreeSurfer, with related measurements acquired through the Advanced Normalization Tools Ecosystem. This includes previously established cortical and subcortical measurements defined, in part, based on the Desikan-Killiany-Tourville atlas. Also included are morphological measurements from two recent developments: medial temporal lobe parcellation of hippocampal and extra-hippocampal regions in addition to cerebellum parcellation and thickness based on the Schmahmann anatomical labeling. Through predictive modeling, we assess the clinical utility of these IDP measurements, individually and in combination, using commonly studied phenotypic correlates including age, fluid intelligence, numeric memory, and several other sociodemographic variables. The predictive accuracy of these IDP-based models, in terms of root-mean-squared-error or area-under-the-curve for continuous and categorical variables, respectively, provides comparative insights between software libraries as well as potential clinical interpretability. Results demonstrate varied performance between package-based IDP sets and their combination, emphasizing the need for careful consideration in their selection and utilization.
Affiliation(s)
- Nicholas J Tustison
- Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, VA, USA.
- Department of Neurobiology and Behavior, University of California, Irvine, CA, USA.
- Michael A Yassa
- Department of Neurobiology and Behavior, University of California, Irvine, CA, USA
- Batool Rizvi
- Department of Neurobiology and Behavior, University of California, Irvine, CA, USA
- Philip A Cook
- Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
- Andrew J Holbrook
- Department of Biostatistics, University of California, Los Angeles, CA, USA
- James C Gee
- Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
- James R Stone
- Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, VA, USA
- Brian B Avants
- Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, VA, USA
8. Bailey AH, Williams A, Poddar A, Cimpian A. Intersectional male-centric and White-centric biases in collective concepts. Pers Soc Psychol Bull 2024:1461672241232114. PMID: 38613360; DOI: 10.1177/01461672241232114
Abstract
In principle, the fundamental concepts person, woman, and man should apply equally to people of different genders and races/ethnicities. In reality, these concepts might prioritize certain groups over others. Based on interdisciplinary theories of androcentrism, we hypothesized that (a) person is more associated with men than with women (person = man) and (b) woman is more associated with women than man is with men (i.e., women are more gendered: gender = woman). We applied natural language processing tools (specifically, word embeddings) to the linguistic output of millions of individuals (specifically, the Common Crawl corpus). We found the hypothesized person = man / gender = woman bias. This bias was stronger for Hispanic and White (vs. Asian) women and men. We also uncovered parallel biases favoring White individuals in the concepts person, woman, and man. Western society prioritizes men and White individuals as people and "others" women as people with gender, with implications for equity across policy- and decision-making contexts.
9. Gray M, Samala R, Liu Q, Skiles D, Xu J, Tong W, Wu L. Measurement and mitigation of bias in artificial intelligence: A narrative literature review for regulatory science. Clin Pharmacol Ther 2024; 115:687-697. PMID: 38018360; DOI: 10.1002/cpt.3117
Abstract
Artificial intelligence (AI) is increasingly being used in decision making across various industries, including the public health arena. Bias in any decision-making process can significantly skew outcomes, and AI systems have been shown to exhibit biases at times. The potential for AI systems to perpetuate and even amplify biases is a growing concern. Bias, as used in this paper, refers to a tendency toward a particular characteristic or behavior; thus, a biased AI system is one that shows biased associations between entities. In this literature review, we examine the current state of research on AI bias, including its sources, as well as methods for measuring, benchmarking, and mitigating it. We also examine the biases and mitigation methods specifically relevant to the healthcare field and offer a perspective on bias measurement and mitigation in regulatory science decision making.
Affiliation(s)
- Magnus Gray
- Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA
- Ravi Samala
- Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, US Food and Drug Administration Center for Devices and Radiological Health, Silver Spring, Maryland, USA
- Qi Liu
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Denny Skiles
- Office of Management, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA
- Joshua Xu
- Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA
- Weida Tong
- Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA
- Leihong Wu
- Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA
10. Kaplan DM, Tidwell CA, Chung JM, Alisic E, Demiray B, Bruni M, Evora S, Gajewski-Nemes JA, Macbeth A, Mangelsdorf SN, Mascaro JS, Minor KS, Noga RN, Nugent NR, Polsinelli AJ, Rentscher KE, Resnikoff AW, Robbins ML, Slatcher RB, Tejeda-Padron AB, Mehl MR. Diversity, equity, and inclusivity in observational ambulatory assessment: Recommendations from two decades of Electronically Activated Recorder (EAR) research. Behav Res Methods 2024; 56:3207-3225. PMID: 38066394; DOI: 10.3758/s13428-023-02293-0
Abstract
Ambient audio sampling methods such as the Electronically Activated Recorder (EAR) have become increasingly prominent in clinical and social sciences research. These methods record snippets of naturalistically assessed audio from participants' daily lives, enabling novel observational research about the daily social interactions, identities, environments, behaviors, and speech of populations of interest. In practice, these scientific opportunities are equaled by methodological challenges: researchers' own cultural backgrounds and identities can easily and unknowingly permeate the collection, coding, analysis, and interpretation of social data from daily life. Ambient audio sampling poses unique and significant challenges to cultural humility, diversity, equity, and inclusivity (DEI) in scientific research that require systematized attention. Motivated by this observation, an international consortium of 21 researchers who have used ambient audio sampling methodologies created a workgroup with the aim of improving upon existing published guidelines. We pooled formally and informally documented challenges pertaining to DEI in ambient audio sampling from our collective experience on 40+ studies (most of which used the EAR app) in clinical and healthy populations ranging from children to older adults. This article presents our resultant recommendations and argues for the incorporation of community-engaged research methods in observational ambulatory assessment designs looking forward. We provide concrete recommendations across each stage typical of an ambient audio sampling study (recruiting and enrolling participants, developing coding systems, training coders, handling multi-linguistic participants, data analysis and interpretation, and dissemination of results) as well as guiding questions that can be used to adapt these recommendations to project-specific constraints and needs.
Affiliation(s)
- Deanna M Kaplan
- Department of Family and Preventive Medicine, Emory University School of Medicine, Atlanta, GA, USA.
- Colin A Tidwell
- Department of Psychology, University of Arizona, Tucson, USA
- Joanne M Chung
- Department of Psychology, University of Toronto Mississauga, Mississauga, Canada
- Eva Alisic
- Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- Burcu Demiray
- Department of Psychology, University of Zurich, Zürich, Switzerland
- Michelle Bruni
- Department of Psychology, University of California-Riverside, Riverside, USA
- Selena Evora
- Center for Health Promotion and Health Equity, School of Public Health, Brown University, Providence, USA
- Jennifer S Mascaro
- Department of Family and Preventive Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Kyle S Minor
- Department of Psychology, Indiana University - Purdue University Indianapolis, Indianapolis, USA
- Rebecca N Noga
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina Chapel Hill, Chapel Hill, USA
- Nicole R Nugent
- Department of Psychiatry and Human Behavior, Alpert Medical School of Brown University, Providence, USA
- Kelly E Rentscher
- Department of Psychiatry and Behavioral Medicine, Medical College of Wisconsin, Milwaukee, USA
- Megan L Robbins
- Department of Psychology, University of California-Riverside, Riverside, USA
- Matthias R Mehl
- Department of Psychology, University of Arizona, Tucson, USA
11. Shlobin NA, Rosseau G. Opportunities and considerations for the incorporation of artificial intelligence into global neurosurgery: A generative pretrained transformer chatbot-based approach. World Neurosurg 2024:S1878-8750(24)00535-7. PMID: 38561032; DOI: 10.1016/j.wneu.2024.03.149
Abstract
OBJECTIVE Global neurosurgery is a public health focus in neurosurgery that seeks to ensure safe, timely, and affordable neurosurgical care to all individuals worldwide. Although investigators have begun to explore the promise of artificial intelligence (AI) for neurosurgery, its applicability to global neurosurgery has been largely hypothetical. We characterize opportunities and considerations for the incorporation of AI into global neurosurgery by synthesizing key themes from the outputs of a series of generative pretrained transformer (GPT) chatbots, discuss important limitations of GPTs and cautions when using AI in neurosurgery, and develop a framework for the equitable incorporation of AI into global neurosurgery. METHODS ChatGPT, Bing Chat/Copilot, You, Perplexity.ai, and Google Bard were queried with the prompt "How can AI be incorporated into global neurosurgery?" A layered ChatGPT-based thematic analysis was performed. The authors synthesized the results into opportunities and considerations for the incorporation of AI in global neurosurgery. A Pareto analysis was conducted to determine common themes. RESULTS Eight opportunities and 14 important considerations were synthesized. Of the opportunities, 6 related to patient care, 1 to education, and 1 to public health planning. Four of the important considerations were deemed specific to global neurosurgery. The Pareto analysis included all 8 opportunities and 5 of the considerations. CONCLUSIONS AI may be incorporated into global neurosurgery in a variety of capacities requiring numerous considerations. The framework presented in this manuscript may facilitate the incorporation of AI into global neurosurgery initiatives while balancing contextual factors and the reality of limited resources.
Affiliation(s)
- Nathan A Shlobin
- Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.
- Gail Rosseau
- Department of Neurosurgery, George Washington University School of Medicine and Health Sciences, Washington, District of Columbia, USA; Barrow Global, Barrow Neurological Institute, Phoenix, Arizona, USA
12. Hampton J, Mugambi P, Caggiano E, Eugene R, Valente A, Taylor M, Carreiro S. Closing the digital divide in interventions for substance use disorder. J Psychiatry Brain Sci 2024; 9:e240002. PMID: 38726224; PMCID: PMC11081399; DOI: 10.20900/jpbs.20240002
Abstract
Digital health interventions are proliferating in today's medical practice and have tremendous potential to support the treatment of substance use disorders (SUD). Developers and healthcare providers alike must be cognizant of the potential for digital interventions to exacerbate existing inequities in SUD treatment, particularly as they relate to social determinants of health (SDoH). To explore this evolving area of study, this manuscript reviews the existing concepts of the digital divide and digital inequities, and the role SDoH play as drivers of digital inequities. We then explore how the data used and the modeling strategies chosen can create bias in digital health tools for SUD. Finally, we discuss potential solutions and future directions to bridge these gaps, including smartphone ownership, Wi-Fi access, digital literacy, and mitigation of historical, algorithmic, and measurement bias. Thoughtful design of digital interventions is essential to reduce the risk of bias, narrow the digital divide, and create equitable health outcomes for individuals with SUD.
Affiliation(s)
- Jazmin Hampton
- Division of Toxicology, Department of Emergency Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Washington University of Health and Science, San Pedro, Belize, Central America
- Division of Public Health, Walden University, Minneapolis, MN 55401, USA
- Purity Mugambi
- Manning College of Information and Computer Sciences, University of Massachusetts-Amherst, Amherst, MA 01003, USA
- Emily Caggiano
- Division of Toxicology, Department of Emergency Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Reynalde Eugene
- Division of Toxicology, Department of Emergency Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Alycia Valente
- Division of Toxicology, Department of Emergency Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Melissa Taylor
- Division of Toxicology, Department of Emergency Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Stephanie Carreiro
- Division of Toxicology, Department of Emergency Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
13. Cao X, Kosinski M. Large language models know how the personality of public figures is perceived by the general public. Sci Rep 2024; 14:6735. PMID: 38509191; PMCID: PMC10954708; DOI: 10.1038/s41598-024-57271-z
Abstract
We show that people's perceptions of public figures' personalities can be accurately predicted from their names' location in GPT-3's semantic space. We collected Big Five personality perceptions of 226 public figures from 600 human raters. Cross-validated linear regression was used to predict human perceptions from public figures' name embeddings extracted from GPT-3. The models' accuracy ranged from r = .78 to .88 without controls and from r = .53 to .70 when controlling for public figures' likability and demographics, after correcting for attenuation. Prediction models showed high face validity as revealed by the personality-descriptive adjectives occupying their extremes. Our findings reveal that GPT-3 word embeddings capture signals pertaining to individual differences and intimate traits.
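The prediction step described above, cross-validated linear regression from embeddings to mean human ratings, is a standard pattern. The sketch below reproduces it on random stand-in data; in the actual study, X would hold the 226 name embeddings and y the mean ratings for one trait:

```python
# Sketch of cross-validated regression from name embeddings to trait ratings.
# Random arrays stand in for the 226 name embeddings (X) and mean ratings (y).
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(226, 1536))  # stand-in embedding matrix (dim is assumed)
y = rng.normal(size=226)          # stand-in mean ratings for one Big Five trait

y_hat = cross_val_predict(Ridge(alpha=1.0), X, y, cv=10)
r, _ = pearsonr(y, y_hat)         # accuracy as predicted-observed correlation
print(f"cross-validated r = {r:.2f}")
```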
Affiliation(s)
- Xubo Cao
- Stanford University, Stanford, USA.
14. Kaplan DM, Palitsky R, Arconada Alvarez SJ, Pozzo NS, Greenleaf MN, Atkinson CA, Lam WA. What's in a name? Experimental evidence of gender bias in recommendation letters generated by ChatGPT. J Med Internet Res 2024; 26:e51837. PMID: 38441945; PMCID: PMC10951834; DOI: 10.2196/51837
Abstract
BACKGROUND Artificial intelligence chatbots such as ChatGPT (OpenAI) have garnered excitement about their potential for delegating writing tasks ordinarily performed by humans. Many of these tasks (eg, writing recommendation letters) have social and professional ramifications, making the potential social biases in ChatGPT's underlying language model a serious concern. OBJECTIVE Three preregistered studies used the text analysis program Linguistic Inquiry and Word Count to investigate gender bias in recommendation letters written by ChatGPT in human-use sessions (N=1400 total letters). METHODS We conducted analyses using 22 existing Linguistic Inquiry and Word Count dictionaries, as well as 6 newly created dictionaries based on systematic reviews of gender bias in recommendation letters, to compare recommendation letters generated for the 200 most historically popular "male" and "female" names in the United States. Study 1 used 3 different letter-writing prompts intended to accentuate professional accomplishments associated with male stereotypes, female stereotypes, or neither. Study 2 examined whether lengthening each of the 3 prompts while holding the between-prompt word count constant modified the extent of bias. Study 3 examined the variability within letters generated for the same name and prompts. We hypothesized that when prompted with gender-stereotyped professional accomplishments, ChatGPT would evidence gender-based language differences replicating those found in systematic reviews of human-written recommendation letters (eg, more affiliative, social, and communal language for female names; more agentic and skill-based language for male names). RESULTS Significant differences in language between letters generated for female versus male names were observed across all prompts, including the prompt hypothesized to be neutral, and across nearly all language categories tested. Historically female names received significantly more social referents (5/6, 83% of prompts), communal or doubt-raising language (4/6, 67% of prompts), personal pronouns (4/6, 67% of prompts), and clout language (5/6, 83% of prompts). Contradicting the study hypotheses, some gender differences (eg, achievement language and agentic language) were significant in both the hypothesized and nonhypothesized directions, depending on the prompt. Heteroscedasticity between male and female names was observed in multiple linguistic categories, with greater variance for historically female names than for historically male names. CONCLUSIONS ChatGPT reproduces many gender-based language biases that have been reliably identified in investigations of human-written reference letters, although these differences vary across prompts and language categories. Caution should be taken when using ChatGPT for tasks that have social consequences, such as reference letter writing. The methods developed in this study may be useful for ongoing bias testing among progressive generations of chatbots across a range of real-world scenarios. TRIAL REGISTRATION OSF Registries osf.io/ztv96; https://osf.io/ztv96.
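LIWC itself is proprietary, but the dictionary-based counting it performs is straightforward to sketch. The example below scores a letter on two invented mini-dictionaries; LIWC's real dictionaries are far larger and psychometrically validated:

```python
# Sketch of LIWC-style dictionary scoring of a recommendation letter.
# The two mini-dictionaries and the letter text are invented for illustration;
# LIWC's real dictionaries are far larger and psychometrically validated.
import re

dictionaries = {
    "communal": {"caring", "warm", "helpful", "supportive"},
    "agentic": {"driven", "assertive", "independent", "confident"},
}

def category_rates(text):
    words = re.findall(r"[a-z']+", text.lower())
    return {cat: 100 * sum(w in vocab for w in words) / len(words)
            for cat, vocab in dictionaries.items()}

letter = "She is a warm, caring, and supportive colleague who is also driven."
print(category_rates(letter))  # percentage of words falling in each category
```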
Affiliation(s)
- Deanna M Kaplan
- Department of Family and Preventive Medicine, Emory University School of Medicine, Atlanta, GA, United States
- Roman Palitsky
- Emory Spiritual Health, Woodruff Health Science Center, Emory University, Atlanta, GA, United States
- Nicole S Pozzo
- Department of Family and Preventive Medicine, Emory University School of Medicine, Atlanta, GA, United States
- Ciara A Atkinson
- Department of Campus Recreation, University of Arizona, Tucson, AZ, United States
- Wilbur A Lam
- Wallace H Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
15. Wheatley T, Thornton MA, Stolk A, Chang LJ. The emerging science of interacting minds. Perspect Psychol Sci 2024; 19:355-373. PMID: 38096443; PMCID: PMC10932833; DOI: 10.1177/17456916231200177
Abstract
For over a century, psychology has focused on uncovering mental processes of a single individual. However, humans rarely navigate the world in isolation. The most important determinants of successful development, mental health, and our individual traits and preferences arise from interacting with other individuals. Social interaction underpins who we are, how we think, and how we behave. Here we discuss the key methodological challenges that have limited progress in establishing a robust science of how minds interact and the new tools that are beginning to overcome these challenges. A deep understanding of the human mind requires studying the context within which it originates and exists: social interaction.
Affiliation(s)
- Thalia Wheatley
- Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Santa Fe Institute, Santa Fe, NM, USA
- Mark A. Thornton
- Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Arjen Stolk
- Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Luke J. Chang
- Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
16. Charlesworth TES, Ghate K, Caliskan A, Banaji MR. Extracting intersectional stereotypes from embeddings: Developing and validating the Flexible Intersectional Stereotype Extraction procedure. PNAS Nexus 2024; 3:pgae089. PMID: 38505691; PMCID: PMC10949907; DOI: 10.1093/pnasnexus/pgae089
Abstract
Social group-based identities intersect. The meaning of "woman" is modulated by adding social class, as in "rich woman" or "poor woman." How does such intersectionality operate at scale in everyday language? Which intersections dominate (are most frequent)? What qualities (positivity, competence, warmth) are ascribed to each intersection? In this study, we make it possible to address such questions by developing a stepwise procedure, Flexible Intersectional Stereotype Extraction (FISE), applied to word embeddings (GloVe; BERT) trained on billions of words of English Internet text, revealing insights into intersectional stereotypes. First, applying FISE to occupation stereotypes across intersections of gender, race, and class showed alignment with ground-truth data on occupation demographics, providing initial validation. Second, applying FISE to trait adjectives showed strong androcentrism (Men) and ethnocentrism (White) dominating everyday English language (e.g., White + Men are associated with 59% of traits; Black + Women with 5%). Associated traits also revealed intersectional differences: advantaged intersectional groups, especially intersections involving Rich, had more common, positive, warm, competent, and dominant trait associates. Together, the empirical insights from FISE illustrate its utility for transparently and efficiently quantifying intersectional stereotypes in existing large text corpora, with the potential to expand intersectionality research across unprecedented times and places. This project further sets up the infrastructure necessary to pursue new research on the emergent properties of intersectional identities.
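At its core, FISE locates words along group axes in embedding space. The sketch below illustrates that core step only, projecting trait words onto gender and wealth axes built from anchor-word differences and bucketing them by quadrant; the word lists are illustrative, and the published procedure includes additional selection and validation steps:

```python
# Illustrative core of intersectional stereotype extraction: project trait
# words onto group axes built from anchor-word differences, then bucket by
# quadrant. Word lists are illustrative; FISE itself has further steps.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

def axis(words_a, words_b):
    a = np.mean([model[w] for w in words_a], axis=0)
    b = np.mean([model[w] for w in words_b], axis=0)
    return a - b  # direction pointing from the b-pole toward the a-pole

gender = axis(["man", "he", "him"], ["woman", "she", "her"])
wealth = axis(["rich", "wealthy"], ["poor", "needy"])

for trait in ["brilliant", "gentle", "greedy", "humble"]:
    v = model[trait]
    g, w = float(np.dot(v, gender)), float(np.dot(v, wealth))
    bucket = ("Men" if g > 0 else "Women") + " + " + ("Rich" if w > 0 else "Poor")
    print(f"{trait:>9} -> {bucket}")
```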
Affiliation(s)
- Kshitish Ghate
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Aylin Caliskan
- Information School, University of Washington, Seattle, WA 98105, USA
- Mahzarin R Banaji
- Department of Psychology, Harvard University, Cambridge, MA 02138, USA
17. Sheth S, Baker HP, Prescher H, Strelzow JA. Ethical considerations of artificial intelligence in health care: Examining the role of Generative Pretrained Transformer-4. J Am Acad Orthop Surg 2024; 32:205-210. PMID: 38175996; DOI: 10.5435/jaaos-d-23-00787
Abstract
The integration of artificial intelligence (AI) technologies, such as large language models (LLMs), in health care holds potential for improved efficiency and decision support. However, ethical concerns must be addressed before widespread adoption. This article focuses on the ethical principles surrounding the use of Generative Pretrained Transformer-4 and its conversational model, ChatGPT, in healthcare settings. One concern is potential inaccuracies in generated content: LLMs can produce believable yet incorrect information, risking errors in medical records. The opacity of their training data exacerbates this problem by hindering accuracy assessment. To mitigate this, LLMs should be trained on precise, validated medical data sets. Model bias is another critical concern, because LLMs may perpetuate biases from their training, leading to medically inaccurate and discriminatory responses. Sampling, programming, and compliance biases all contribute, necessitating careful consideration to avoid perpetuating harmful stereotypes. Privacy is also paramount in health care, and using public LLMs raises risks; strict data-sharing agreements and Health Insurance Portability and Accountability Act (HIPAA)-compliant training protocols are necessary to protect patient privacy. Although AI technologies offer promising opportunities in health care, careful consideration of ethical principles is crucial. Addressing concerns of inaccuracy, bias, and privacy will ensure responsible and patient-centered implementation, benefiting both healthcare professionals and patients.
Affiliation(s)
- Suraj Sheth
- Department of Orthopaedic Surgery, The University of Chicago, Chicago, IL, USA
18. Kvam PD, Irving LH, Sokratous K, Smith CT. Improving the reliability and validity of the IAT with a dynamic model driven by similarity. Behav Res Methods 2024; 56:2158-2193. PMID: 37450219; DOI: 10.3758/s13428-023-02141-1
Abstract
The Implicit Association Test (IAT), like many behavioral measures, seeks to quantify meaningful individual differences in cognitive processes that are difficult to assess with approaches like self-reports. However, much like other behavioral measures, many IATs appear to show low test-retest reliability and typical scoring methods fail to quantify all of the decision-making processes that generate the overt task performance. Here, we develop a new modeling approach for IATs based on the geometric similarity representation (GSR) model. This model leverages both response times and accuracy on IATs to make inferences about representational similarity between the stimuli and categories. The model disentangles processes related to response caution, stimulus encoding, similarities between concepts and categories, and response processes unrelated to the choice itself. This approach to analyzing IAT data illustrates that the unreliability in IATs is almost entirely attributable to the methods used to analyze data from the task: GSR model parameters show test-retest reliability around .80-.90, on par with reliable self-report measures. Furthermore, we demonstrate how model parameters result in greater validity compared to the IAT D-score, Quad model, and simple diffusion model contrasts, predicting outcomes related to intergroup contact and motivation. Finally, we present a simple point-and-click software tool for fitting the model, which uses a pre-trained neural network to estimate best-fit parameters of the GSR model. This approach allows easy and instantaneous fitting of IAT data with minimal demands on coding or technical expertise on the part of the user, making the new model accessible and effective.
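For context, the conventional D-score that the GSR model is benchmarked against is, in simplified form, a standardized response-time contrast between block types. A sketch of that baseline on invented response times, omitting the standard trial-level trimming and error penalties, is below:

```python
# Simplified IAT D-score baseline: standardized mean response-time difference
# between incompatible and compatible blocks. RTs are invented, and the
# standard trial trimming and error penalties are omitted for brevity.
import numpy as np

rng = np.random.default_rng(1)
compatible = rng.normal(700, 120, size=60)    # RTs in ms, compatible block
incompatible = rng.normal(800, 140, size=60)  # RTs in ms, incompatible block

pooled_sd = np.std(np.concatenate([compatible, incompatible]), ddof=1)
d_score = (incompatible.mean() - compatible.mean()) / pooled_sd
print(f"D = {d_score:.3f}")  # larger D: slower responding on incompatible block
```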
Affiliation(s)
- Peter D Kvam
- Department of Psychology, University of Florida, Florida, USA.
- Louis H Irving
- Department of Psychology, University of Florida, Florida, USA
19. Aceves P, Evans JA. Human languages with greater information density have higher communication speed but lower conversation breadth. Nat Hum Behav 2024. PMID: 38366103; DOI: 10.1038/s41562-024-01815-w
Abstract
Human languages vary widely in how they encode information within circumscribed semantic domains (for example, time, space, colour, human body parts and activities), but little is known about the global structure of semantic information and nothing about its relation to human communication. We first show that across a sample of ~1,000 languages, there is broad variation in how densely languages encode information into words. Second, we show that this language information density is associated with a denser configuration of semantic information. Finally, we trace the relationship between language information density and patterns of communication, showing that informationally denser languages tend towards faster communication but conceptually narrower conversations or expositions within which topics are discussed at greater depth. These results highlight an important source of variation across the human communicative channel, revealing that the structure of language shapes the nature and texture of human engagement, with consequences for human behaviour across levels of society.
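One simple, assumed operationalization of per-word information density is mean unigram surprisal: corpora whose word tokens are less predictable carry more bits per word. The toy sketch below computes this quantity; the study's cross-linguistic measures are substantially more involved:

```python
# Toy operationalization of information density: mean unigram surprisal
# (bits per word token) under the corpus's own word-frequency distribution.
import math
from collections import Counter

def bits_per_word(tokens):
    counts = Counter(tokens)
    total = len(tokens)
    # Mean Shannon surprisal: sum over types of count * -log2(frequency) / N.
    return sum(c * -math.log2(c / total) for c in counts.values()) / total

varied = "many distinct words rarely repeat across these sample sentences".split()
repetitive = "the the the cat cat sat sat on on the the mat mat".split()
print(f"varied corpus:     {bits_per_word(varied):.2f} bits/word")
print(f"repetitive corpus: {bits_per_word(repetitive):.2f} bits/word")
```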
Affiliation(s)
- Pedro Aceves
- Department of Management and Organization, Carey Business School, Johns Hopkins University, Baltimore, MD, USA.
- James A Evans
- Department of Sociology & Knowledge Lab, University of Chicago, Chicago, IL, USA
- Santa Fe Institute, Santa Fe, NM, USA
20. Lin H, Ni L, Phuong C, Hong JC. Natural language processing for radiation oncology: Personalizing treatment pathways. Pharmgenomics Pers Med 2024; 17:65-76. PMID: 38370334; PMCID: PMC10874185; DOI: 10.2147/pgpm.s396971
Abstract
Natural language processing (NLP), a technology that translates human language into machine-readable data, is revolutionizing numerous sectors, including cancer care. This review outlines the evolution of NLP and its potential for crafting personalized treatment pathways for cancer patients. Leveraging NLP's ability to transform unstructured medical data into structured learnable formats, researchers can tap into the potential of big data for clinical and research applications. Significant advancements in NLP have spurred interest in developing tools that automate information extraction from clinical text, potentially transforming medical research and clinical practices in radiation oncology. Applications discussed include symptom and toxicity monitoring, identification of social determinants of health, improving patient-physician communication, patient education, and predictive modeling. However, several challenges impede the full realization of NLP's benefits, such as privacy and security concerns, biases in NLP models, and the interpretability and generalizability of these models. Overcoming these challenges necessitates a collaborative effort between computer scientists and the radiation oncology community. This paper serves as a comprehensive guide to understanding the intricacies of NLP algorithms, their performance assessment, past research contributions, and the future of NLP in radiation oncology research and clinics.
Affiliation(s)
- Hui Lin
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- UC Berkeley-UCSF Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, San Francisco, CA, USA
- Lisa Ni
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Christina Phuong
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Julian C Hong
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Joint Program in Computational Precision Health, University of California, Berkeley and San Francisco, Berkeley, CA, USA
21. Lin S, Pandit S, Tritsch T, Levy A, Shoja MM. What goes in, must come out: Generative artificial intelligence does not present algorithmic bias across race and gender in medical residency specialties. Cureus 2024; 16:e54448. PMID: 38510858; PMCID: PMC10951939; DOI: 10.7759/cureus.54448
Abstract
Objective Artificial Intelligence (AI) has made significant inroads into various domains, including medicine, raising concerns about algorithmic bias. This study investigates the presence of biases in generative AI programs, with a specific focus on gender and racial representations across 19 medical residency specialties. Methodology This comparative study utilized DALL-E2 to generate faces representing 19 distinct residency training specialties, as identified by the Association of American Medical Colleges (AAMC), which were then compared to the AAMC's residency specialty breakdown with respect to race and gender. Results Our findings reveal an alignment between OpenAI's DALL-E2's predictions and the current demographic landscape of medical residents, suggesting an absence of algorithmic bias in this AI model. Conclusion This revelation gives rise to important ethical considerations. While AI excels at pattern recognition, it inherits and mirrors the biases present in its training data. To combat AI bias, addressing real-world disparities is imperative. Initiatives to promote inclusivity and diversity within medicine are commendable and contribute to reshaping medical education. This study underscores the need for ongoing efforts to dismantle barriers and foster inclusivity in historically male-dominated medical fields, particularly for underrepresented populations. Ultimately, our findings underscore the crucial role of real-world data quality in mitigating AI bias. As AI continues to shape healthcare and education, the pursuit of equitable, unbiased AI applications should remain at the forefront of these transformative endeavors.
Affiliation(s)
- Shu Lin
- Department of Medical Education, Nova Southeastern University Dr. Kiran C. Patel College of Allopathic Medicine, Fort Lauderdale, USA
- Saket Pandit
- Department of Medical Education, Nova Southeastern University Dr. Kiran C. Patel College of Allopathic Medicine, Fort Lauderdale, USA
- Tara Tritsch
- Department of Medical Education, Nova Southeastern University Dr. Kiran C. Patel College of Allopathic Medicine, Fort Lauderdale, USA
- Arkene Levy
- Department of Medical Education, Nova Southeastern University Dr. Kiran C. Patel College of Allopathic Medicine, Fort Lauderdale, USA
- Mohammadali M Shoja
- Department of Medical Education, Nova Southeastern University Dr. Kiran C. Patel College of Allopathic Medicine, Fort Lauderdale, USA
22. Brandsen S, Chandrasekhar T, Franz L, Grapel J, Dawson G, Carlson D. Prevalence of bias against neurodivergence-related terms in artificial intelligence language models. Autism Res 2024; 17:234-248. PMID: 38284311; DOI: 10.1002/aur.3094
Abstract
Given the increasing role of artificial intelligence (AI) in many decision-making processes, we investigate the presence of AI bias towards terms related to a range of neurodivergent conditions, including autism, ADHD, schizophrenia, and obsessive-compulsive disorder (OCD). We use 11 different language model encoders to test the degree to which words related to neurodiversity are associated with groups of words related to danger, disease, badness, and other negative concepts. For each group of words tested, we report the mean strength of association (Word Embedding Association Test [WEAT] score) averaged over all encoders and find generally high levels of bias. Additionally, we show that bias occurs even when testing words associated with autistic or neurodivergent strengths. For example, embedders had a negative average association between words related to autism and words related to honesty, despite honesty being considered a common strength of autistic individuals. Finally, we introduce a sentence similarity ratio test and demonstrate that many sentences describing types of disabilities, for example, "I have autism" or "I have epilepsy," have even stronger negative associations than control sentences such as "I am a bank robber."
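The WEAT score used above has a compact closed form: a word's association is its mean cosine similarity to one attribute set minus the other, and the effect size standardizes the difference between the two target sets. The sketch below implements that formula with pretrained GloVe standing in for the eleven encoders tested; the word lists are illustrative only:

```python
# Sketch of the WEAT effect size, with pretrained GloVe standing in for the
# eleven encoders tested in the paper; all word lists are illustrative only.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):  # mean similarity to attribute set A minus attribute set B
    return (np.mean([cos(model[w], model[a]) for a in A])
            - np.mean([cos(model[w], model[b]) for b in B]))

X = ["autism", "schizophrenia"]   # neurodivergence-related targets (illustrative)
Y = ["baseball", "cooking"]       # control targets (illustrative)
A = ["danger", "disease", "bad"]  # negative attributes
B = ["safety", "health", "good"]  # positive attributes

s = [assoc(w, A, B) for w in X + Y]
effect = (np.mean(s[:len(X)]) - np.mean(s[len(X):])) / np.std(s, ddof=1)
print(f"WEAT effect size d = {effect:.2f}")
```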
Affiliation(s)
- Sam Brandsen
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Duke Center for Autism and Brain Development, Duke University, Durham, North Carolina, USA
- Tara Chandrasekhar
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Duke Center for Autism and Brain Development, Duke University, Durham, North Carolina, USA
- Lauren Franz
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Duke Center for Autism and Brain Development, Duke University, Durham, North Carolina, USA
- Jordan Grapel
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Duke Center for Autism and Brain Development, Duke University, Durham, North Carolina, USA
- Geraldine Dawson
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Duke Center for Autism and Brain Development, Duke University, Durham, North Carolina, USA
- David Carlson
- Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA
23
Guilbeault D, Delecourt S, Hull T, Desikan BS, Chu M, Nadler E. Online images amplify gender bias. Nature 2024; 626:1049-1055. [PMID: 38355800 PMCID: PMC10901730 DOI: 10.1038/s41586-024-07068-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Accepted: 01/14/2024] [Indexed: 02/16/2024]
Abstract
Each year, people spend less time reading and more time viewing images [1], which are proliferating online [2-4]. Images from platforms such as Google and Wikipedia are downloaded by millions every day [2,5,6], and millions more are interacting through social media, such as Instagram and TikTok, that primarily consist of exchanging visual content. In parallel, news agencies and digital advertisers are increasingly capturing attention online through the use of images [7,8], which people process more quickly, implicitly and memorably than text [9-12]. Here we show that the rise of images online significantly exacerbates gender bias, both in its statistical prevalence and its psychological impact. We examine the gender associations of 3,495 social categories (such as 'nurse' or 'banker') in more than one million images from Google, Wikipedia and Internet Movie Database (IMDb), and in billions of words from these platforms. We find that gender bias is consistently more prevalent in images than text for both female- and male-typed categories. We also show that the documented underrepresentation of women online [13-18] is substantially worse in images than in text, public opinion and US census data. Finally, we conducted a nationally representative, preregistered experiment that shows that googling for images rather than textual descriptions of occupations amplifies gender bias in participants' beliefs. Addressing the societal effect of this large-scale shift towards visual communication will be essential for developing a fair and inclusive future for the internet.
Affiliation(s)
- Douglas Guilbeault
- Haas School of Business, University of California, Berkeley, Berkeley, CA, USA.
- Solène Delecourt
- Haas School of Business, University of California, Berkeley, Berkeley, CA, USA
- Mark Chu
- School of the Arts, Columbia University, New York, NY, USA
- Ethan Nadler
- Department of Physics, University of Southern California, Los Angeles, CA, USA
24
Giannini F, Marelli M, Stella F, Monzani D, Pancani L. Surfing the OCEAN: The machine learning psycholexical approach 2.0 to detect personality traits in texts. J Pers 2024. [PMID: 38217359 DOI: 10.1111/jopy.12915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Revised: 10/11/2023] [Accepted: 12/13/2023] [Indexed: 01/15/2024]
Abstract
OBJECTIVE We aimed to develop a machine learning model to infer OCEAN traits from text. BACKGROUND The psycholexical approach allows retrieving information about personality traits from human language. However, it has rarely been applied because of methodological and practical issues that current computational advancements could overcome. METHOD Classical taxonomies and a large Yelp corpus were leveraged to learn an embedding for each personality trait. These embeddings were used to train a feedforward neural network for predicting trait values. Their generalization performances have been evaluated through two external validation studies involving experts (N = 11) and laypeople (N = 100) in a discrimination task about the best markers of each trait and polarity. RESULTS Intrinsic validation of the model yielded excellent results, with R2 values greater than 0.78. The validation studies showed a high proportion of matches between participants' choices and model predictions, confirming its efficacy in identifying new terms related to the OCEAN traits. The best performance was observed for agreeableness and extraversion, especially for their positive polarities. The model was less efficient in identifying the negative polarity of openness and conscientiousness. CONCLUSIONS This innovative methodology can be considered a "psycholexical approach 2.0," contributing to research in personality and its practical applications in many fields.
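As a toy illustration of the modeling pipeline this abstract describes (embedding vectors fed to a feedforward network that predicts trait values), the following sketch uses random placeholder data; the feature dimensionality, network size and data are assumptions, not the study's Yelp-derived corpus or taxonomies.

```python
# Toy sketch: embeddings as input features to a feedforward regressor that
# predicts a trait value. Data are random placeholders, not real materials.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                       # stand-in term embeddings
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=200)  # stand-in trait scores

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:150], y[:150])
print("held-out R^2:", round(model.score(X[150:], y[150:]), 3))
```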
Affiliation(s)
- Federico Giannini
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Marco Marelli
- Department of Psychology, University of Milan-Bicocca, Milan, Italy
- Fabio Stella
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Dario Monzani
- Department of Psychology, Educational Science and Human Movement, University of Palermo, Palermo, Italy
- Luca Pancani
- Department of Psychology, University of Milan-Bicocca, Milan, Italy
25
Guevara M, Chen S, Thomas S, Chaunzwa TL, Franco I, Kann BH, Moningi S, Qian JM, Goldstein M, Harper S, Aerts HJWL, Catalano PJ, Savova GK, Mak RH, Bitterman DS. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med 2024; 7:6. [PMID: 38200151 PMCID: PMC10781957 DOI: 10.1038/s41746-023-00970-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 11/15/2023] [Indexed: 01/12/2024] Open
Abstract
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The benefit of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot setting, except GPT-4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
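For orientation, the sketch below shows what prompting an off-the-shelf instruction-tuned Flan-T5 model for SDoH mentions might look like; the model size, prompt wording and example note are assumptions, not the study's fine-tuned models or annotation scheme.

```python
# Hedged sketch of prompting an instruction-tuned Flan-T5 model to flag SDoH
# mentions in a snippet of narrative text. The note and prompt are invented.
from transformers import pipeline

extractor = pipeline("text2text-generation", model="google/flan-t5-base")

note = "Patient lives alone, recently lost his job, and has no car."
prompt = ("List any social determinants of health mentioned in this note "
          "(employment, housing, transportation, parental status, "
          "relationship, social support), or answer 'none': " + note)
print(extractor(prompt, max_new_tokens=40)[0]["generated_text"])
```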
Affiliation(s)
- Marco Guevara
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Shan Chen
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Spencer Thomas
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Tafadzwa L Chaunzwa
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Idalid Franco
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Benjamin H Kann
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Shalini Moningi
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Jack M Qian
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Susan Harper
- Adult Resource Office, Dana-Farber Cancer Institute, Boston, MA, USA
- Hugo J W L Aerts
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, The Netherlands
- Paul J Catalano
- Department of Data Science, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Raymond H Mak
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Danielle S Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA.
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA.
26
Cobert J, Mills H, Lee A, Gologorskaya O, Espejo E, Jeon SY, Boscardin WJ, Heintz TA, Kennedy CJ, Ashana DC, Chapman AC, Raghunathan K, Smith AK, Lee SJ. Measuring Implicit Bias in ICU Notes Using Word-Embedding Neural Network Models. Chest 2024:S0012-3692(24)00007-2. [PMID: 38199323 DOI: 10.1016/j.chest.2023.12.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Revised: 12/12/2023] [Accepted: 12/29/2023] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Language in nonmedical data sets is known to transmit human-like biases when used in natural language processing (NLP) algorithms that can reinforce disparities. It is unclear if NLP algorithms trained on medical notes could lead to similar transmissions of biases. RESEARCH QUESTION Can we identify implicit bias in clinical notes, and are biases stable across time and geography? STUDY DESIGN AND METHODS To determine whether different racial and ethnic descriptors are similar contextually to stigmatizing language in ICU notes and whether these relationships are stable across time and geography, we identified notes on critically ill adults admitted to the University of California, San Francisco (UCSF), from 2012 through 2022 and to Beth Israel Deaconess Medical Center (BIDMC) from 2001 through 2012. Because word meaning is derived largely from context, we trained unsupervised word-embedding algorithms to quantitatively measure the contextual similarity (cosine similarity) between a racial or ethnic descriptor (eg, African-American) and a stigmatizing target word (eg, noncooperative) or group of words (violence, passivity, noncompliance, nonadherence). RESULTS In UCSF notes, Black descriptors were less likely to be similar contextually to violent words compared with White descriptors. By contrast, in BIDMC notes, Black descriptors were more likely to be similar contextually to violent words compared with White descriptors. The UCSF data set also showed that Black descriptors were more similar contextually to passivity and noncompliance words compared with Latinx descriptors. INTERPRETATION Implicit bias is identifiable in ICU notes. Racial and ethnic group descriptors carry different contextual relationships to stigmatizing words, depending on when and where notes were written. Because NLP models seem able to transmit implicit bias from training data, use of NLP algorithms in clinical prediction could reinforce disparities. Active debiasing strategies may be necessary to achieve algorithmic fairness when using language models in clinical research.
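A minimal sketch of the study's core measurement, assuming a toy corpus: train an unsupervised word-embedding model on note text and compare a descriptor's average cosine similarity to a group of target words. A real analysis would use millions of tokenized notes and the study's own descriptor and target lists.

```python
# Illustrative sketch: train word2vec on (toy) note text, then compute the
# mean cosine similarity between a descriptor and a group of target words.
from gensim.models import Word2Vec

notes = [
    ["patient", "noncooperative", "with", "nursing", "staff"],
    ["patient", "calm", "cooperative", "and", "pleasant"],
]  # placeholder for millions of tokenized ICU note sentences

model = Word2Vec(sentences=notes, vector_size=50, window=5,
                 min_count=1, seed=0, workers=1)

def mean_similarity(descriptor, target_words):
    # average cosine similarity between a descriptor and a word group
    sims = [model.wv.similarity(descriptor, w)
            for w in target_words if w in model.wv]
    return sum(sims) / len(sims) if sims else float("nan")

print(mean_similarity("patient", ["noncooperative", "calm"]))
```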
Affiliation(s)
- Julien Cobert
- Anesthesia Service, San Francisco VA Health Care System, University of California, San Francisco, San Francisco, CA; Department of Anesthesia and Perioperative Care, University of California, San Francisco, San Francisco, CA.
- Hunter Mills
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Albert Lee
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Oksana Gologorskaya
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Edie Espejo
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- Sun Young Jeon
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- W John Boscardin
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- Timothy A Heintz
- School of Medicine, University of California, San Diego, San Diego, CA
- Christopher J Kennedy
- Department of Psychiatry, Harvard Medical School, Boston, MA; Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA
- Deepshikha C Ashana
- Division of Pulmonary, Allergy, and Critical Care Medicine, Duke University, Durham, NC
- Allyson Cook Chapman
- Department of Medicine, the Division of Critical Care and Palliative Medicine, University of California, San Francisco, San Francisco, CA; Department of Surgery, University of California, San Francisco, San Francisco, CA
- Karthik Raghunathan
- Department of Anesthesia and Perioperative Care, Duke University, Durham, NC
- Alex K Smith
- Department of Geriatrics, Palliative, and Extended Care, Veterans Affairs Medical Center, University of California, San Francisco, San Francisco, CA; Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- Sei J Lee
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
27
Pellert M, Lechner CM, Wagner C, Rammstedt B, Strohmaier M. AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories. PERSPECTIVES ON PSYCHOLOGICAL SCIENCE 2024:17456916231214460. [PMID: 38165766 DOI: 10.1177/17456916231214460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2024]
Abstract
We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphorically speaking) from the vast text corpora on which they are trained. Such corpora contain sediments of the personalities, values, beliefs, and biases of the countless human authors of these texts, which LLMs learn through a complex training process. The traits that LLMs acquire in such a way can potentially influence their behavior, that is, their outputs in downstream tasks and applications in which they are employed, which in turn may have real-world consequences for individuals and social groups. By eliciting LLMs' responses to language-based psychometric inventories, we can bring their traits to light. Psychometric profiling enables researchers to study and compare LLMs in terms of noncognitive characteristics, thereby providing a window into the personalities, values, beliefs, and biases these models exhibit (or mimic). We discuss the history of similar ideas and outline possible psychometric approaches for LLMs. We demonstrate one promising approach, zero-shot classification, for several LLMs and psychometric inventories. We conclude by highlighting open challenges and future avenues of research for AI Psychometrics.
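A minimal sketch of the zero-shot classification approach the authors demonstrate, assuming an off-the-shelf NLI-based model: present a psychometric inventory item and let the classifier score the response options. The model name, item and labels are illustrative, not the paper's exact materials.

```python
# Zero-shot scoring of a (placeholder) inventory item with an NLI model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

item = "I see myself as someone who is talkative."
options = ["agree", "neutral", "disagree"]

result = classifier(item, candidate_labels=options)
print(dict(zip(result["labels"], result["scores"])))
```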
Affiliation(s)
- Claudia Wagner
- GESIS-Leibniz Institute for the Social Sciences
- Department of Society, Technology and Human Factors, RWTH Aachen University
- Complexity Science Hub Vienna, Vienna, Austria
- Markus Strohmaier
- Business School, University of Mannheim
- GESIS-Leibniz Institute for the Social Sciences
- Complexity Science Hub Vienna, Vienna, Austria
28
Lee MHJ, Montgomery JM, Lai CK. America's racial framework of superiority and Americanness embedded in natural language. PNAS NEXUS 2024; 3:pgad485. [PMID: 38274118 PMCID: PMC10810327 DOI: 10.1093/pnasnexus/pgad485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024]
Abstract
America's racial framework can be summarized using two distinct dimensions: superiority/inferiority and Americanness/foreignness. We investigated America's racial framework in a corpus of spoken and written language using word embeddings. Word embeddings place words on a low-dimensional space where words with similar meanings are proximate, allowing researchers to test whether the positions of group and attribute words in a semantic space reflect stereotypes. We trained a word embedding model on the Corpus of Contemporary American English, a corpus of 1 billion words spanning 30 years and 8 text categories, and compared the positions of racial/ethnic groups with respect to superiority and Americanness. We found that America's racial framework is embedded in American English. We also captured an additional nuance: Asian people were stereotyped as more American than Hispanic people. These results are empirical evidence that America's racial framework is embedded in American English.
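In the spirit of the approach this abstract describes, the sketch below places group words along a semantic dimension defined by antonym pairs; the pre-trained GloVe model, pole pairs and group words are assumptions, since the study trained its own embedding on the Corpus of Contemporary American English.

```python
# Illustrative sketch: score words along an axis defined by antonym pairs.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

def dimension(pole_pairs):
    # the axis is the mean difference vector across the pole pairs
    return np.mean([vectors[hi] - vectors[lo] for hi, lo in pole_pairs], axis=0)

def projection(word, axis):
    # cosine of the word vector with the axis
    v = vectors[word]
    return float(np.dot(v, axis) / (np.linalg.norm(v) * np.linalg.norm(axis)))

superiority = dimension([("superior", "inferior"), ("rich", "poor")])
for group in ["american", "european", "asian", "hispanic"]:
    print(group, round(projection(group, superiority), 3))
```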
Affiliation(s)
- Messi H J Lee
- Division of Computational and Data Sciences, Washington University in St. Louis, St. Louis, MO 63130-4899, USA
- Jacob M Montgomery
- Department of Political Science, Washington University in St. Louis, St. Louis, MO 63130-4899, USA
- Calvin K Lai
- Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO 63130-4899, USA
29
Cacciamani GE, Chen A, Gill IS, Hung AJ. Artificial intelligence and urology: ethical considerations for urologists and patients. Nat Rev Urol 2024; 21:50-59. [PMID: 37524914 DOI: 10.1038/s41585-023-00796-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/22/2023] [Indexed: 08/02/2023]
Abstract
The use of artificial intelligence (AI) in medicine and in urology specifically has increased over the past few years, during which time it has enabled optimization of patient workflow, increased diagnostic accuracy and enhanced computer analysis of radiological and pathological images. However, before further use of AI is undertaken, possible ethical issues need to be evaluated to improve understanding of this technology and to protect patients and providers. Possible ethical issues that require consideration when applying AI in clinical practice include patient safety, cybersecurity, transparency and interpretability of the data, inclusivity and equity, fostering responsibility and accountability, and the preservation of providers' decision-making and autonomy. Ethical principles for the application of AI to health care and in urology are proposed to guide urologists, patients and regulators to improve use of AI technologies and guide policy-making.
Affiliation(s)
- Giovanni E Cacciamani
- The Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
- Department of Radiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Andrew Chen
- The Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Inderbir S Gill
- The Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andrew J Hung
- The Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
30
Sunsay C. A historical evaluation of the disease avoidance theory of xenophobia. PLoS One 2023; 18:e0294816. [PMID: 38150454 PMCID: PMC10752500 DOI: 10.1371/journal.pone.0294816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2023] [Accepted: 10/31/2023] [Indexed: 12/29/2023] Open
Abstract
Historical psychology is emerging as a multidisciplinary field for studying psychological phenomena in a historical context. Historical records can also serve as testbeds for psychological theories, particularly evolutionary ones. In Study 1 we aimed to gather evidence to evaluate the disease avoidance theory of xenophobia by analyzing the narratives of European explorers from the 15th and 16th centuries. Contrary to the theory's expectations, the narratives revealed numerous instances of close physical contact between the explorers and the native populations. Furthermore, rather than using disgust-laden words, the explorers portrayed the natives in a positive light. In Study 2, we employed a word embedding algorithm to explore whether native group names and their unfamiliar appearance were associated with disgust-laden words in the 19th century travel literature. The results indicated that while native group names showed such associations, their appearance did not. Finally, through network analysis, we demonstrated that embedded words such as "savages" mediated the perception of native groups as a potential disease threat. The findings highlight the significance of cultural factors that evolve over time, rather than cognitive adaptations believed to have evolved prior to the emergence of human culture, in explaining xenophobia.
Affiliation(s)
- Ceyhun Sunsay
- Kutztown University of Pennsylvania, Kutztown, PA, United States of America
31
Chen Y, Liu TX, Shan Y, Zhong S. The emergence of economic rationality of GPT. Proc Natl Acad Sci U S A 2023; 120:e2316205120. [PMID: 38085780 PMCID: PMC10740389 DOI: 10.1073/pnas.2316205120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Accepted: 11/13/2023] [Indexed: 12/18/2023] Open
Abstract
As large language models (LLMs) like GPT become increasingly prevalent, it is essential that we assess their capabilities beyond language processing. This paper examines the economic rationality of GPT by instructing it to make budgetary decisions in four domains: risk, time, social, and food preferences. We measure economic rationality by assessing the consistency of GPT's decisions with utility maximization in classic revealed preference theory. We find that GPT's decisions are largely rational in each domain and demonstrate higher rationality scores than those of human subjects in a parallel experiment and in the literature. Moreover, the estimated preference parameters of GPT are slightly different from those of human subjects and exhibit a lower degree of heterogeneity. We also find that the rationality scores are robust to the degree of randomness and demographic settings such as age and gender but are sensitive to contexts based on the language frames of the choice situations. These results suggest the potential of LLMs to make good decisions and the need to further understand their capabilities, limitations, and underlying mechanisms.
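The rationality assessment rests on revealed preference theory; as a toy illustration, the sketch below checks a set of two-good budget choices for violations of the Generalized Axiom of Revealed Preference (GARP). The prices and bundles are made up, and the study's actual rationality scores involve more than this binary check.

```python
# Toy GARP consistency check over a set of observed budget choices.
import numpy as np

prices = np.array([[1.0, 2.0], [2.0, 1.0]])   # price vector per decision
choices = np.array([[4.0, 1.0], [1.0, 4.0]])  # bundle chosen at those prices

n = len(choices)
# Direct revealed preference: i R j if bundle j was affordable when i was chosen.
R = np.array([[prices[i] @ choices[i] >= prices[i] @ choices[j]
               for j in range(n)] for i in range(n)])

# Transitive closure via Warshall's algorithm.
for k in range(n):
    R |= np.outer(R[:, k], R[k, :])

# GARP: if i is (indirectly) revealed preferred to j, then j must not be
# strictly directly revealed preferred to i.
violation = any(R[i, j] and prices[j] @ choices[j] > prices[j] @ choices[i]
                for i in range(n) for j in range(n))
print("GARP violation:", violation)
```

Consistent choice data print False here; a rationality score would summarize how far observed choices are from passing this test.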
Affiliation(s)
- Yiting Chen
- Department of Economics, Lingnan University, Hong Kong, China
- Tracy Xiao Liu
- Department of Economics, School of Economics and Management, National Center for Economic Research at Tsinghua University, Tsinghua University, Beijing 100084, China
- You Shan
- Department of Economics, School of Economics and Management, Tsinghua University, Beijing 100084, China
- Songfa Zhong
- Department of Economics, Hong Kong University of Science and Technology, Hong Kong, China
- Department of Economics, National University of Singapore, Singapore 117570, Singapore
32
Lewis M, Cahill A, Madnani N, Evans J. Local similarity and global variability characterize the semantic space of human languages. Proc Natl Acad Sci U S A 2023; 120:e2300986120. [PMID: 38079546 PMCID: PMC10743503 DOI: 10.1073/pnas.2300986120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2023] [Accepted: 11/06/2023] [Indexed: 12/18/2023] Open
Abstract
How does meaning vary across the world's languages? Scholars recognize the existence of substantial variability within specific domains, ranging from nature and color to kinship. The emergence of large language models enables a systems-level approach that directly characterizes this variability through comparison of word organization across semantic domains. Here, we show that meanings across languages manifest lower variability within semantic domains and greater variability between them, using models trained on both 1) large corpora of native language text comprising Wikipedia articles in 35 languages and also 2) Test of English as a Foreign Language (TOEFL) essays written by 38,500 speakers from the same native languages, which cluster into semantic domains. Concrete meanings vary less across languages than abstract meanings, but all vary with geographical, environmental, and cultural distance. By simultaneously examining local similarity and global difference, we harmonize these findings and provide a description of general principles that govern variability in semantic space across languages. In this way, the structure of a speaker's semantic space influences the comparisons cognitively salient to them, as shaped by their native language, and suggests that even successful bilingual communicators likely think with "semantic accents" driven by associations from their native language while writing English. These findings have dramatic implications for language education, cross-cultural communication, and literal translations, which are impossible not because the objects of reference are uncertain, but because associations, metaphors, and narratives interlink meanings in different, predictable ways from one language to another.
Affiliation(s)
- Molly Lewis
- Psychology & Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA 15213
- James Evans
- Sociology & Data Science, University of Chicago, Chicago, IL 60637
- Santa Fe Institute, Santa Fe, NM 87501
33
Cardona G, Argiles M, Pérez-Mañá L. Accuracy of a Large Language Model as a new tool for optometry education. Clin Exp Optom 2023:1-4. [PMID: 38044041 DOI: 10.1080/08164622.2023.2288174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 09/18/2023] [Indexed: 12/05/2023] Open
Abstract
CLINICAL RELEVANCE The unsupervised introduction of certain Artificial Intelligence tools in optometry education may challenge the proper acquisition of accurate clinical knowledge and skills proficiency. BACKGROUND Large Language Models like ChatGPT (Generative Pretrained Transformer) are increasingly being used by researchers and students for work and academic assignments. The authoritative and conversationally correct language provided by these tools may mask their inherent limitations when presented with specific scientific and clinical queries. METHODS Three sets of 10 queries related to contact lenses & anterior eye, low vision and binocular vision & vision therapy were presented to ChatGPT, with instructions to provide five relevant references to support each response. Three experts and 53 undergraduate and post-graduate students graded from 0 to 10 the accuracy of the responses, and the references were evaluated for precision and relevance. Students graded from 0 to 10 the potential usefulness of ChatGPT for their academic coursework. RESULTS Median scores were 7, 8 and 6 (experts) and 8, 9 and 7.5 (students) for the contact lenses & anterior eye, low vision and binocular vision & vision therapy categories, respectively. Responses to more specific queries were awarded lower scores by both experts (ρ = -0.612; P < 0.001) and students (ρ = -0.578; P = 0.001). Of 150 references, 24% were accurate and 19.3% relevant. Students graded the usefulness of ChatGPT with 7.5 (2 to 9), 7 (3 to 9) and 8.5 (3 to 10) for contact lenses & anterior eye, low vision and binocular vision & vision therapy, respectively. CONCLUSION Careful expert appraisal of the responses and, particularly, of the references provided by ChatGPT is required in research and academic settings. As the use of these tools becomes widespread, it is essential to take proactive steps to address their limitations and ensure their responsible use.
Affiliation(s)
- Genis Cardona
- Department of Optics and Optometry, Universitat Politècnica de Catalunya, Terrassa, Spain
- Marc Argiles
- Department of Optics and Optometry, Universitat Politècnica de Catalunya, Terrassa, Spain
- Lluis Pérez-Mañá
- Department of Optics and Optometry, Universitat Politècnica de Catalunya, Terrassa, Spain
34
Ferrara C, Sellitto G, Ferrucci F, Palomba F, De Lucia A. Fairness-aware machine learning engineering: how far are we? EMPIRICAL SOFTWARE ENGINEERING 2023; 29:9. [PMID: 38027253 PMCID: PMC10673752 DOI: 10.1007/s10664-023-10402-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 10/03/2023] [Indexed: 12/01/2023]
Abstract
Machine learning is part of the daily life of people and companies worldwide. Unfortunately, bias in machine learning algorithms risks unfairly influencing the decision-making process and reiterating possible discrimination. While the interest of the software engineering community in software fairness is rapidly increasing, there is still a lack of understanding of various aspects connected to fair machine learning engineering, i.e., the software engineering process involved in developing fairness-critical machine learning systems. Questions connected to the practitioners' awareness and maturity about fairness, the skills required to deal with the matter, and the best development phase(s) where fairness should be faced more are just some examples of the knowledge gaps currently open. In this paper, we provide insights into how fairness is perceived and managed in practice, to shed light on the instruments and approaches that practitioners might employ to properly handle fairness. We conducted a survey with 117 professionals who shared their knowledge and experience highlighting the relevance of fairness in practice, and the skills and tools required to handle it. The key results of our study show that fairness is still considered a second-class quality aspect in the development of artificial intelligence systems. The building of specific methods and development environments, other than automated validation tools, might help developers to treat fairness throughout the software lifecycle and revert this trend.
Affiliation(s)
- Carmine Ferrara
- Software Engineering (SeSa) Lab, University of Salerno, Salerno, Italy
- Giulia Sellitto
- Software Engineering (SeSa) Lab, University of Salerno, Salerno, Italy
- Filomena Ferrucci
- Software Engineering (SeSa) Lab, University of Salerno, Salerno, Italy
- Fabio Palomba
- Software Engineering (SeSa) Lab, University of Salerno, Salerno, Italy
- Andrea De Lucia
- Software Engineering (SeSa) Lab, University of Salerno, Salerno, Italy
35
Peng C, Yang X, Chen A, Smith KE, PourNejatian N, Costa AB, Martin C, Flores MG, Zhang Y, Magoc T, Lipori G, Mitchell DA, Ospina NS, Ahmed MM, Hogan WR, Shenkman EA, Guo Y, Bian J, Wu Y. A study of generative large language model for medical research and healthcare. NPJ Digit Med 2023; 6:210. [PMID: 37973919 PMCID: PMC10654385 DOI: 10.1038/s41746-023-00958-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Accepted: 11/01/2023] [Indexed: 11/19/2023] Open
Abstract
There is enormous enthusiasm, alongside concern, about applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which are not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical text from 126 clinical departments and approximately 2 million patients at the University of Florida Health and (2) 195 billion words of diverse general English text. We train GatorTronGPT using a GPT-3 architecture with up to 20 billion parameters and evaluate its utility for biomedical natural language processing (NLP) and healthcare text generation. GatorTronGPT improves biomedical natural language processing. We apply GatorTronGPT to generate 20 billion words of synthetic text. NLP models trained using synthetic text generated by GatorTronGPT outperform models trained using real-world clinical text. A physicians' Turing test using a 1 (worst) to 9 (best) scale shows that there are no significant differences in linguistic readability (p = 0.22; 6.57 of GatorTronGPT compared with 6.93 of human) and clinical relevance (p = 0.91; 7.0 of GatorTronGPT compared with 6.97 of human) and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.
Affiliation(s)
- Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Ying Zhang
- Research Computing, University of Florida, Gainesville, FL, USA
- Tanja Magoc
- Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
- Gloria Lipori
- Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
- Lillian S. Wells Department of Neurosurgery, Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Duane A Mitchell
- Lillian S. Wells Department of Neurosurgery, Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Naykky S Ospina
- Division of Endocrinology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- Mustafa M Ahmed
- Division of Cardiovascular Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Elizabeth A Shenkman
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
36
Brinkmann L, Baumann F, Bonnefon JF, Derex M, Müller TF, Nussberger AM, Czaplicka A, Acerbi A, Griffiths TL, Henrich J, Leibo JZ, McElreath R, Oudeyer PY, Stray J, Rahwan I. Machine culture. Nat Hum Behav 2023; 7:1855-1868. [PMID: 37985914 DOI: 10.1038/s41562-023-01742-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Accepted: 10/03/2023] [Indexed: 11/22/2023]
Abstract
The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of 'machine culture', culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission and selection. Recommender algorithms are altering social learning dynamics. Chatbots are forming a new mode of cultural transmission, serving as cultural models. Furthermore, intelligent machines are evolving as contributors in generating cultural traits-from game strategies and visual art to scientific results. We provide a conceptual framework for studying the present and anticipated future impact of machines on cultural evolution, and present a research agenda for the study of machine culture.
Affiliation(s)
- Levin Brinkmann
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany.
- Fabian Baumann
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
- Maxime Derex
- Toulouse School of Economics, Toulouse, France
- Institute for Advanced Study in Toulouse, Toulouse, France
- Thomas F Müller
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
- Anne-Marie Nussberger
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
- Agnieszka Czaplicka
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
- Alberto Acerbi
- Department of Sociology and Social Research, University of Trento, Trento, Italy
- Thomas L Griffiths
- Department of Psychology and Department of Computer Science, Princeton University, Princeton, NJ, USA
- Joseph Henrich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Richard McElreath
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Jonathan Stray
- Center for Human-Compatible Artificial Intelligence, University of California, Berkeley, Berkeley, CA, USA
- Iyad Rahwan
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany.
37
Napp C. Gender stereotypes embedded in natural language are stronger in more economically developed and individualistic countries. PNAS NEXUS 2023; 2:pgad355. [PMID: 38024410 PMCID: PMC10662454 DOI: 10.1093/pnasnexus/pgad355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/04/2023] [Revised: 10/11/2023] [Accepted: 10/25/2023] [Indexed: 12/01/2023]
Abstract
Gender stereotypes contribute to gender imbalances, and analyzing their variations across countries is important for understanding and mitigating gender inequalities. However, measuring stereotypes is difficult, particularly in a cross-cultural context. Word embeddings are a recent useful tool in natural language processing permitting to measure the collective gender stereotypes embedded in a society. In this work, we used word embedding models pre-trained on large text corpora from more than 70 different countries to examine how gender stereotypes vary across countries. We considered stereotypes associating men with career and women with family as well as those associating men with math or science and women with arts or liberal arts. Relying on two different sources (Wikipedia and Common Crawl), we found that these gender stereotypes are all significantly more pronounced in the text corpora of more economically developed and more individualistic countries. Our analysis suggests that more economically developed countries, while being more gender equal along several dimensions, also have stronger gender stereotypes. Public policy aiming at mitigating gender imbalances in these countries should take this feature into account. Besides, our analysis sheds light on the "gender equality paradox," i.e. on the fact that gender imbalances in a large number of domains are paradoxically stronger in more developed/gender equal/individualistic countries.
Affiliation(s)
- Clotilde Napp
- CNRS, UMR7088, France
- Université Paris-Dauphine, PSL Research University, Place du Maréchal de Lattre de Tassigny, 75016 Paris, France
38
Acerbi A, Stubbersfield JM. Large language models show human-like content biases in transmission chain experiments. Proc Natl Acad Sci U S A 2023; 120:e2313790120. [PMID: 37883432 PMCID: PMC10622889 DOI: 10.1073/pnas.2313790120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Accepted: 09/26/2023] [Indexed: 10/28/2023] Open
Abstract
As the use of large language models (LLMs) grows, it is important to examine whether they exhibit biases in their output. Research in cultural evolution, using transmission chain experiments, demonstrates that humans have biases to attend to, remember, and transmit some types of content over others. Here, in five preregistered experiments using material from previous studies with human participants, we use the same, transmission chain-like methodology, and find that the LLM ChatGPT-3 shows biases analogous to humans for content that is gender-stereotype-consistent, social, negative, threat-related, and biologically counterintuitive, over other content. The presence of these biases in LLM output suggests that such content is widespread in its training data and could have consequential downstream effects, by magnifying preexisting human tendencies for cognitively appealing and not necessarily informative, or valuable, content.
Affiliation(s)
- Alberto Acerbi
- Department of Sociology and Social Research, University of Trento, Trento 38122, Italy
39
Argyle LP, Bail CA, Busby EC, Gubler JR, Howe T, Rytting C, Sorensen T, Wingate D. Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale. Proc Natl Acad Sci U S A 2023; 120:e2311627120. [PMID: 37788311 PMCID: PMC10576030 DOI: 10.1073/pnas.2311627120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Accepted: 08/18/2023] [Indexed: 10/05/2023] Open
Abstract
Political discourse is the soul of democracy, but misunderstanding and conflict can fester in divisive conversations. The widespread shift to online discourse exacerbates many of these problems and corrodes the capacity of diverse societies to cooperate in solving social problems. Scholars and civil society groups promote interventions that make conversations less divisive or more productive, but scaling these efforts to online discourse is challenging. We conduct a large-scale experiment that demonstrates how online conversations about divisive topics can be improved with AI tools. Specifically, we employ a large language model to make real-time, evidence-based recommendations intended to improve participants' perception of feeling understood. These interventions improve reported conversation quality, promote democratic reciprocity, and improve the tone, without systematically changing the content of the conversation or moving people's policy attitudes.
Affiliation(s)
- Lisa P. Argyle
- Department of Political Science, Brigham Young University, Provo, UT, 84602
- Christopher A. Bail
- Department of Sociology, Political Science, and Public Policy, Duke University, Durham, NC, 27708
- Ethan C. Busby
- Department of Political Science, Brigham Young University, Provo, UT, 84602
- Joshua R. Gubler
- Department of Political Science, Brigham Young University, Provo, UT, 84602
- Thomas Howe
- Department of Computer Science, Brigham Young University, Provo, UT, 84602
- Taylor Sorensen
- Department of Computer Science, University of Washington, Seattle, WA, 98195
- David Wingate
- Department of Computer Science, Brigham Young University, Provo, UT, 84602
40
Mylrea M, Robinson N. Artificial Intelligence (AI) Trust Framework and Maturity Model: Applying an Entropy Lens to Improve Security, Privacy, and Ethical AI. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1429. [PMID: 37895550 PMCID: PMC10606888 DOI: 10.3390/e25101429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Revised: 08/30/2023] [Accepted: 09/15/2023] [Indexed: 10/29/2023]
Abstract
Recent advancements in artificial intelligence (AI) technology have raised concerns about the ethical, moral, and legal safeguards. There is a pressing need to improve metrics for assessing security and privacy of AI systems and to manage AI technology in a more ethical manner. To address these challenges, an AI Trust Framework and Maturity Model is proposed to enhance trust in the design and management of AI systems. Trust in AI involves an agreed-upon understanding between humans and machines about system performance. The framework utilizes an "entropy lens" to root the study in information theory and enhance transparency and trust in "black box" AI systems, which lack ethical guardrails. High entropy in AI systems can decrease human trust, particularly in uncertain and competitive environments. The research draws inspiration from entropy studies to improve trust and performance in autonomous human-machine teams and systems, including interconnected elements in hierarchical systems. Applying this lens to improve trust in AI also highlights new opportunities to optimize performance in teams. Two use cases are described to validate the AI framework's ability to measure trust in the design and management of AI systems.
Affiliation(s)
- Michael Mylrea
- Department of Computer Science & Engineering, Institute of Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Nikki Robinson
- Department of Computer and Data Science, Capitol Technology University, Laurel, MD 20708, USA
41
Leach S, Kitchin AP, Sutton RM. Word embeddings reveal growing moral concern for people, animals and the environment. BRITISH JOURNAL OF SOCIAL PSYCHOLOGY 2023; 62:1925-1938. [PMID: 37403899 DOI: 10.1111/bjso.12663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 06/01/2023] [Indexed: 07/06/2023]
Abstract
The Enlightenment idea of historical moral progress asserts that civil societies become more moral over time. This is often understood as an expanding moral circle and is argued to be tightly linked with language use, with some suggesting that shifts in how we express concern for others can be considered an important indicator of moral progress. Our research explores these notions by examining historical trends in natural language use during the 19th and 20th centuries. We found that the associations between words denoting moral concern and words referring to people, animals, and the environment grew stronger over time. The findings support widely-held views about the nature of moral progress by showing that language has changed in a way that reflects greater concern for others.
Affiliation(s)
- Stefan Leach
- School of Psychology, University of Kent, Canterbury, UK
42
Ash E, Stammbach D, Tobia K. What is (and was) a person? Evidence on historical mind perceptions from natural language. Cognition 2023; 239:105501. [PMID: 37480835 DOI: 10.1016/j.cognition.2023.105501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 07/24/2023]
Abstract
An important philosophical tradition identifies persons as those entities that have minds, such that mind perception is a window into person perception. Psychological research has found that human perceptions of mind consist of at least two distinct dimensions: agency (e.g. planning, deciding) and experience (e.g. feeling, hungering). Taking this insight into the semantic space of natural language, we develop a generalizable, scalable computational-linguistics method for measuring variation in perceived agency and experience in large archives of plain-text documents. The resulting text-based rankings of entities along these dimensions correspond to human judgments of perceived agency and experience assessed in blind surveys. We then map both dimensions of mind in historical English-language corpora over the last 200 years and identify two salient trends. First, we find that while women are now described as having similar levels of agency as men, they are still described as more experience-oriented. Second, we find that domesticated animals have gained higher attributions of experience (but not agency) relative to wild animals, especially since the rise of the global animal rights movement in the 1980s.
Affiliation(s)
- Kevin Tobia
- Georgetown University, United States of America.
43
Davis MA, Lim N, Jordan J, Yee J, Gichoya JW, Lee R. Imaging Artificial Intelligence: A Framework for Radiologists to Address Health Equity, From the AJR Special Series on DEI. AJR Am J Roentgenol 2023; 221:302-308. [PMID: 37095660 DOI: 10.2214/ajr.22.28802] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/24/2023]
Abstract
Artificial intelligence (AI) holds promise for helping patients access new and individualized health care pathways while increasing efficiencies for health care practitioners. Radiology has been at the forefront of this technology in medicine; many radiology practices are implementing and trialing AI-focused products. AI also holds great promise for reducing health disparities and promoting health equity. Radiology is ideally positioned to help reduce disparities given its central and critical role in patient care. The purposes of this article are to discuss the potential benefits and pitfalls of deploying AI algorithms in radiology, specifically highlighting the impact of AI on health equity; to explore ways to mitigate drivers of inequity; and to enhance pathways for creating better health care for all individuals, centering on a practical framework that helps radiologists address health equity during deployment of new tools.
Affiliation(s)
- Melissa A Davis
- Department of Diagnostic Radiology, Yale University School of Medicine, 789 Howard Ave, PO Box 20842, New Haven, CT 06520
- John Jordan
- Stanford University School of Medicine, Stanford, CA
- Judy Yee
- Montefiore Medical Center, Albert Einstein College of Medicine, New York, NY
- Ryan Lee
- Jefferson Health, Philadelphia, PA
44
Foltz PW, Chandler C, Diaz-Asper C, Cohen AS, Rodriguez Z, Holmlund TB, Elvevåg B. Reflections on the nature of measurement in language-based automated assessments of patients' mental state and cognitive function. Schizophr Res 2023; 259:127-139. [PMID: 36153250 DOI: 10.1016/j.schres.2022.07.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 07/12/2022] [Accepted: 07/13/2022] [Indexed: 11/23/2022]
Abstract
Modern advances in computational language processing methods have enabled new approaches to the measurement of mental processes. However, the field has primarily focused on model accuracy in predicting performance on a task or a diagnostic category. Instead, the field should focus more on determining which computational analyses align best with the targeted neurocognitive/psychological functions that we want to assess. In this paper we reflect on two decades of experience with the application of language-based assessment to patients' mental state and cognitive function by addressing the questions of what we are measuring, how it should be measured and why we are measuring the phenomena. We address the questions by advocating for a principled framework for aligning computational models to the constructs being assessed and the tasks being used, as well as defining how those constructs relate to patient clinical states. We further examine the assumptions that go into the computational models and the effects that model design decisions may have on the accuracy, bias and generalizability of models for assessing clinical states. Finally, we describe how this principled approach can further the goal of transitioning language-based computational assessments to part of clinical practice while gaining the trust of critical stakeholders.
Affiliation(s)
- Peter W Foltz
- Institute of Cognitive Science, University of Colorado Boulder, United States of America
- Chelsea Chandler
- Institute of Cognitive Science, University of Colorado Boulder, United States of America; Department of Computer Science, University of Colorado Boulder, United States of America
- Alex S Cohen
- Department of Psychology, Louisiana State University, United States of America; Center for Computation and Technology, Louisiana State University, United States of America
- Zachary Rodriguez
- Department of Psychology, Louisiana State University, United States of America; Center for Computation and Technology, Louisiana State University, United States of America
- Terje B Holmlund
- Department of Clinical Medicine, University of Tromsø - the Arctic University of Norway, Tromsø, Norway
- Brita Elvevåg
- Department of Clinical Medicine, University of Tromsø - the Arctic University of Norway, Tromsø, Norway; Norwegian Centre for eHealth Research, University Hospital of North Norway, Tromsø, Norway
45
Elad VM, Anton T, Ganios NC, Rebecca VW. Undereducation is afoot: Assessing the lack of acral lentiginous melanoma educational materials for skin of color. Pigment Cell Melanoma Res 2023; 36:431-438. [PMID: 37171057 DOI: 10.1111/pcmr.13090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Revised: 03/20/2023] [Accepted: 04/05/2023] [Indexed: 05/13/2023]
Abstract
Acral lentiginous melanoma (ALM) is a subtype of cutaneous melanoma notorious for poor outcomes, with mortality rates that disproportionately affect individuals with skin of color (e.g., of African, Hispanic, or Asian descent) compared with non-Hispanic White populations. Several societal factors contribute to racial disparities in ALM, including a lack of representative educational material in the context of patient education and medical instruction. This gap in representative information for the US population spans risk of disease, patterns of incidence, and differences in disease presentation in skin of color. The atypical presentation of ALM on acral volar skin sites makes early detection challenging and necessitates an increased index of suspicion on the part of physicians and patients alike. Studies underscoring the importance of early detection in reducing mortality risk make the availability of adequate representative educational materials indispensable.
Affiliation(s)
- Vissy M Elad
- College of Medicine, Northeast Ohio Medical University, Rootstown, Ohio, USA
- Trevena Anton
- College of Medicine, Northeast Ohio Medical University, Rootstown, Ohio, USA
- Natalie C Ganios
- College of Medicine, Northeast Ohio Medical University, Rootstown, Ohio, USA
- Vito W Rebecca
- Department of Biochemistry and Molecular Biology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
46
Evans KD, Robbins SA, Bryson JJ. Do We Collaborate With What We Design? Top Cogn Sci 2023. [PMID: 37582263 DOI: 10.1111/tops.12682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 08/17/2023]
Abstract
The use of terms like "collaboration" and "co-workers" to describe interactions between human beings and certain artificial intelligence (AI) systems has gained significant traction in recent years. Yet it remains an open question whether such anthropomorphic metaphors provide a fertile, or even a purely innocuous, lens through which to conceptualize designed commercial products. Rather, a respect for human dignity and the principle of transparency may require us to draw a sharp distinction between real and faux peers. At the heart of the concept of collaboration lies the assumption that the collaborating parties are (or behave as if they are) of similar status: two agents capable of comparable forms of intentional action, moral agency, or moral responsibility. Applied to current AI systems, this assumption seems to fail not only ontologically but also from a socio-political perspective. AI in the workplace is primarily an extension of capital, not of labor, and the AI "co-workers" of most individuals will likely be owned and operated by their employer. In this paper, we critically assess both the accuracy and the desirability of using the term "collaboration" to describe interactions between humans and AI systems. We begin by proposing an alternative ontology of human-machine interaction, one which features not two equivalently autonomous agents, but rather one machine that exists in a relationship of heteronomy to one or more human agents. In this sense, while the machine may have a significant degree of independence concerning the means by which it achieves its ends, the ends themselves are always chosen by at least one human agent, whose interests may differ from those of the individuals interacting with the machine. Finally, we consider the motivations and risks inherent in the continued use of the term "collaboration," exploring its strained relation to the concept of transparency and its consequences for the future of work.
47
Curto G, Comim F. SAF: Stakeholders' Agreement on Fairness in the Practice of Machine Learning Development. SCIENCE AND ENGINEERING ETHICS 2023; 29:29. [PMID: 37486434 PMCID: PMC10366323 DOI: 10.1007/s11948-023-00448-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/26/2021] [Accepted: 06/16/2023] [Indexed: 07/25/2023]
Abstract
This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology for translating the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in fairness decision-making within ML design and to support ML development teams in identifying, mitigating, and monitoring bias at each step of ML systems development. The process also provides guidance on how to explain to users the always-imperfect trade-offs made in mitigating bias.
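As one concrete illustration of the kind of bias check a team could run and re-run at every development step, the hypothetical sketch below computes a demographic parity gap and compares it against a tolerance agreed with stakeholders; the metric choice, data, and 0.10 threshold are illustrative assumptions, not taken from the paper's SAF methodology.

```python
# Hypothetical sketch, not the paper's SAF process itself: monitoring the
# demographic parity gap between protected groups. Predictions, group labels,
# and the 0.10 tolerance below are invented for illustration.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rates across groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])                 # model decisions
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # protected attribute

gap = demographic_parity_gap(y_pred, group)
print(f"demographic parity gap = {gap:.2f}")
if gap > 0.10:  # tolerance agreed with stakeholders (assumed value)
    print("gap exceeds agreed tolerance -> revisit mitigation with stakeholders")
```

Because the paper argues that bias can never be fully removed, the useful output of such a check is not a pass/fail verdict but a documented, stakeholder-visible record of the trade-off the number represents.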
Affiliation(s)
- Flavio Comim
- IQS School of Management, Universitat Ramon Llull, Barcelona, Spain
48
Fisher E, Flynn MA, Pratap P, Vietas JA. Occupational Safety and Health Equity Impacts of Artificial Intelligence: A Scoping Review. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:6221. [PMID: 37444068 PMCID: PMC10340692 DOI: 10.3390/ijerph20136221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 05/26/2023] [Accepted: 06/08/2023] [Indexed: 07/15/2023]
Abstract
Artificial intelligence (AI) has the potential to either reduce or exacerbate occupational safety and health (OSH) inequities in the workplace, and its impact will be mediated by numerous factors. This paper anticipates challenges to ensuring that the OSH benefits of technological advances are equitably distributed among social groups, industries, job arrangements, and geographical regions. A scoping review was completed to summarize the recent literature on AI's role in promoting OSH equity. The review was designed around three concepts: artificial intelligence, OSH, and health equity, and it identified 113 articles relevant for inclusion. The ways in which AI presents barriers and facilitators to OSH equity are outlined, along with priority focus areas, best practices for reducing OSH disparities, and knowledge gaps. In conclusion, AI's role in OSH equity is vastly understudied. An urgent need exists for multidisciplinary research that addresses where and how AI is being adopted and evaluated, and how its use is affecting OSH across industries, wage categories, and sociodemographic groups. OSH professionals can play a significant role in identifying strategies that ensure the benefits of AI in promoting workforce health and wellbeing are equitably distributed.
Affiliation(s)
- Elizabeth Fisher
- Division of Environmental and Occupational Health Sciences, School of Public Health, University of Illinois Chicago, Chicago, IL 60612, USA
- Michael A. Flynn
- Division of Science Integration, National Institute for Occupational Safety and Health, Cincinnati, OH 45226, USA
- Preethi Pratap
- Division of Environmental and Occupational Health Sciences, School of Public Health, University of Illinois Chicago, Chicago, IL 60612, USA
- Jay A. Vietas
- Division of Science Integration, National Institute for Occupational Safety and Health, Cincinnati, OH 45226, USA
49
Vorisek CN, Stellmach C, Mayer PJ, Klopfenstein SAI, Bures DM, Diehl A, Henningsen M, Ritter K, Thun S. Artificial Intelligence Bias in Health Care: Web-Based Survey. J Med Internet Res 2023; 25:e41089. [PMID: 37347528 PMCID: PMC10337406 DOI: 10.2196/41089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2022] [Revised: 11/11/2022] [Accepted: 04/20/2023] [Indexed: 06/23/2023] Open
Abstract
BACKGROUND Resources are increasingly spent on artificial intelligence (AI) solutions for medical applications aiming to improve the diagnosis, treatment, and prevention of diseases. While the need for transparency and reduction of bias in data and algorithm development has been addressed in past studies, little is known about the knowledge and perception of bias among AI developers. OBJECTIVE This study's objective was to survey AI specialists in health care to investigate developers' perceptions of bias in AI algorithms for health care applications and their awareness and use of preventative measures. METHODS A web-based survey was provided in both German and English, comprising a maximum of 41 questions using branching logic within the REDCap web application. Only the results of participants with experience in the field of medical AI applications and complete questionnaires were included for analysis. Demographic data, technical expertise, and perceptions of fairness, as well as knowledge of biases in AI, were analyzed, and variations among gender, age, and work environment were assessed. RESULTS A total of 151 AI specialists completed the web-based survey. The median age was 30 (IQR 26-39) years, and 67% (101/151) of respondents were male. About one-third rated their AI development projects as fair (47/151, 31%) and a further third as moderately fair (51/151, 34%); 12% (18/151) reported their AI to be barely fair, and 1% (2/151) not fair at all. The one participant identifying as diverse rated AI developments as barely fair, and the 2 participants of undefined gender rated AI developments as barely fair and moderately fair, respectively. Reasons for biases selected by respondents were a lack of fair data (90/132, 68%), a lack of guidelines or recommendations (65/132, 49%), or a lack of knowledge (60/132, 45%). More than half of the respondents worked with image data (83/151, 55%), half worked with data from 1 center only (76/151, 50%), and 35% (53/151) worked with national data exclusively. CONCLUSIONS This study shows that developers, on average, perceived their AI projects as only moderately fair. Participants from gender minorities did not once rate their AI development as fair or very fair. Therefore, further studies need to focus on minorities and women and their perceptions of AI. The results highlight the need to strengthen knowledge about bias in AI and to provide guidelines on preventing biases in AI health care applications.
Affiliation(s)
- Carina Nina Vorisek
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Caroline Stellmach
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Paula Josephine Mayer
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sophie Anne Ines Klopfenstein
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany; Institute for Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Anke Diehl
- Stabsstelle Digitale Transformation, Universitätsmedizin Essen, Essen, Germany
- Maike Henningsen
- Faculty of Health, University of Witten/Herdecke, Witten, Germany
- Kerstin Ritter
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sylvia Thun
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
50
Wright-Berryman J, Cohen J, Haq A, Black DP, Pease JL. Virtually screening adults for depression, anxiety, and suicide risk using machine learning and language from an open-ended interview. Front Psychiatry 2023; 14:1143175. [PMID: 37377466 PMCID: PMC10291825 DOI: 10.3389/fpsyt.2023.1143175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/12/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023] Open
Abstract
Background Current depression, anxiety, and suicide screening techniques rely on patients' retrospective reports of symptoms on standardized scales. A qualitative approach to screening, combined with natural language processing (NLP) and machine learning (ML) methods, has shown promise for enhancing person-centeredness while detecting depression, anxiety, and suicide risk from in-the-moment patient language derived from a brief open-ended interview. Objective To evaluate the performance of NLP/ML models in identifying depression, anxiety, and suicide risk from a single 5-10-min semi-structured interview with a large, national sample. Method Two thousand four hundred sixteen interviews were conducted with 1,433 participants over a teleconference platform to collect language about the participants' feelings and emotional state; 861 (35.6%), 863 (35.7%), and 838 (34.7%) sessions screened positive for depression, anxiety, and suicide risk, respectively. Logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGB) models were trained for each condition using term frequency-inverse document frequency (TF-IDF) features from the participants' language. Models were primarily evaluated with the area under the receiver operating characteristic curve (AUC). Results The best discriminative ability was found when identifying depression with an SVM model (AUC = 0.77; 95% CI = 0.75-0.79), followed by anxiety with an LR model (AUC = 0.74; 95% CI = 0.72-0.76) and suicide risk with an SVM model (AUC = 0.70; 95% CI = 0.68-0.72). Model performance was generally best for more severe depression, anxiety, or suicide risk, and improved when individuals with lifetime but no past-3-month suicide risk were treated as controls. Conclusion It is feasible to use a virtual platform to screen simultaneously for depression, anxiety, and suicide risk using a 5-to-10-min interview, and the NLP/ML models discriminated well for all three conditions. Although suicide risk classification had the lowest performance and its clinical utility remains undetermined, these results, taken together with the qualitative responses from the interview, can better inform clinical decision-making by surfacing additional drivers associated with suicide risk.
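For readers unfamiliar with the pipeline the Method describes, the sketch below shows a minimal TF-IDF-plus-classifier text screen evaluated by AUC; the toy transcripts, labels, and parameters are invented for illustration and are not the authors' data or code.

```python
# Minimal sketch (assumed, not the authors' code) of the pipeline type the
# abstract describes: TF-IDF features from interview language feeding a
# classifier, scored by AUC. Texts and labels below are invented toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = [  # stand-ins for transcribed interview responses
    "I have felt hopeless and exhausted for weeks now",
    "things are going well and I enjoy my work lately",
    "I cannot sleep and everything feels pointless to me",
    "I spent the weekend hiking with friends and family",
    "most days I struggle to get out of bed at all",
    "I am excited about the new project I am starting",
    "nothing I do seems to matter and I feel so alone",
    "life feels full and I am grateful for my routine",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = screened positive (hypothetical)

# TF-IDF weights each term by how informative it is across transcripts;
# logistic regression then learns a linear decision boundary over those
# weights. The roc_auc scorer uses the model's predicted probabilities.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, texts, labels, cv=2, scoring="roc_auc")
print("AUC per fold:", scores)
```

With so few examples the fold AUCs are meaningless; the sketch is only meant to show the shape of the pipeline, which the study applied to thousands of real interviews.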
Affiliation(s)
- Jennifer Wright-Berryman
- Department of Social Work, College of Allied Health Sciences, University of Cincinnati, Cincinnati, OH, United States
- Allie Haq
- Clarigent Health, Mason, OH, United States
- James L. Pease
- Department of Social Work, College of Allied Health Sciences, University of Cincinnati, Cincinnati, OH, United States