Reference Citation Analysis

Total Articles

34
(from Reference Citation Analysis)

Article PDFs (17)

Cited by > 0 (19)

Searched Name

Davy Weissenbacher

O'Connor K, Golder S, Weissenbacher D, Klein AZ, Magge A, Gonzalez-Hernandez G. Methods and Annotated Data Sets Used to Predict the Gender and Age of Twitter Users: Scoping Review. J Med Internet Res 2024;26:e47923. [PMID: 38488839 PMCID: PMC10980991 DOI: 10.2196/47923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 07/28/2023] [Accepted: 08/01/2023] [Indexed: 03/19/2024] Open

Abstract

BACKGROUND

Patient health data collected from a variety of nontraditional resources, commonly referred to as real-world data, can be a key information source for health and social science research. Social media platforms, such as Twitter (Twitter, Inc), offer vast amounts of real-world data. An important aspect of incorporating social media data in scientific research is identifying the demographic characteristics of the users who posted those data. Age and gender are considered key demographics for assessing the representativeness of the sample and enable researchers to study subgroups and disparities effectively. However, deciphering the age and gender of social media users poses challenges.

OBJECTIVE

This scoping review aims to summarize the existing literature on the prediction of the age and gender of Twitter users and provide an overview of the methods used.

METHODS

We searched 15 electronic databases and carried out reference checking to identify relevant studies that met our inclusion criteria: studies that predicted the age or gender of Twitter users using computational methods. The screening process was performed independently by 2 researchers to ensure the accuracy and reliability of the included studies.

RESULTS

Of the initial 684 studies retrieved, 74 (10.8%) studies met our inclusion criteria. Among these 74 studies, 42 (57%) focused on predicting gender, 8 (11%) focused on predicting age, and 24 (32%) predicted a combination of both age and gender. Gender prediction was predominantly approached as a binary classification task, with the reported performance of the methods ranging from 0.58 to 0.96 F1-score or 0.51 to 0.97 accuracy. Age prediction approaches varied in terms of classification groups, with a higher range of reported performance, ranging from 0.31 to 0.94 F1-score or 0.43 to 0.86 accuracy. The heterogeneous nature of the studies and the reporting of dissimilar performance metrics made it challenging to quantitatively synthesize results and draw definitive conclusions.
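The percentages quoted in the RESULTS paragraph can be reproduced from the reported counts. The following is a minimal sketch for checking that arithmetic; the variable and function names are illustrative and do not come from the review itself.

```python
# Counts as reported in the RESULTS paragraph above.
retrieved = 684
included = 74
by_focus = {"gender": 42, "age": 8, "age and gender": 24}


def pct(part, whole, digits=0):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# The three focus groups should account for every included study.
assert sum(by_focus.values()) == included

inclusion_rate = pct(included, retrieved, 1)
shares = {focus: pct(n, included) for focus, n in by_focus.items()}

print(inclusion_rate)
print(shares)
```

The three focus groups sum to the 74 included studies, so the rounded shares are internally consistent.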

CONCLUSIONS

Our review found that although automated methods for predicting the age and gender of Twitter users have evolved to incorporate techniques such as deep neural networks, a significant proportion of the attempts rely on traditional machine learning methods, suggesting that there is potential to improve the performance of these tasks by using more advanced methods. Gender prediction has generally achieved a higher reported performance than age prediction. However, the lack of standardized reporting of performance metrics or standard annotated corpora to evaluate the methods used hinders any meaningful comparison of the approaches. Potential biases stemming from the collection and labeling of data used in the studies were identified as a problem, emphasizing the need for careful consideration and mitigation of biases in future studies. This scoping review provides valuable insights into the methods used for predicting the age and gender of Twitter users, along with the challenges and considerations associated with these methods.

O’Connor K, Weissenbacher D, Elyaderani A, Lautenbach E, Scotch M, Gonzalez-Hernandez G. Patient-Related Metadata Reported in Sequencing Studies of SARS-CoV-2: Protocol for a Scoping Review and Bibliometric Analysis. medRxiv 2024:2023.07.14.23292681. [PMID: 37503241 PMCID: PMC10371180 DOI: 10.1101/2023.07.14.23292681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]

Abstract

Background

There has been an unprecedented effort to sequence the SARS-CoV-2 virus and examine its molecular evolution. This has been facilitated by the availability of publicly accessible databases, the Global Initiative on Sharing All Influenza Data (GISAID) and GenBank, which collectively hold millions of SARS-CoV-2 sequence records. Genomic epidemiology, however, seeks to go beyond phylogenetic analysis by linking genetic information to patient characteristics and disease outcomes, enabling a comprehensive understanding of transmission dynamics and disease impact. While these repositories include fields reflecting patient-related metadata for a given sequence, inclusion of these demographic and clinical details is scarce. The extent to which patient-related metadata is reported in published sequencing studies and its quality remains largely unexplored.

Methods

The NIH's LitCovid collection will be used for automated classification of articles reporting having deposited SARS-CoV-2 sequences in public repositories, while an independent search will be conducted in PubMed for validation. Data extraction will be conducted using Covidence. The extracted data will be synthesized and summarized to quantify the availability of patient metadata in the published literature of SARS-CoV-2 sequencing studies. For the bibliometric analysis, relevant data points, such as author affiliations and citation metrics, will be extracted.

Discussion

This scoping review will report on the extent and types of patient-related metadata reported in genomic viral sequencing studies of SARS-CoV-2, identify gaps in this reporting, and make recommendations for improving the quality and consistency of reporting in this area. The bibliometric analysis will uncover trends and patterns in the reporting of patient-related metadata, including differences in reporting based on study types or geographic regions. Co-occurrence networks of author keywords will also be presented. The insights gained from this study may help improve the quality and consistency of reporting patient metadata, enhancing the utility of sequence metadata and facilitating future research on infectious diseases.

Weissenbacher D, Courtright K, Rawal S, Crane-Droesch A, O'Connor K, Kuhl N, Merlino C, Foxwell A, Haines L, Puhl J, Gonzalez-Hernandez G. Detecting goals of care conversations in clinical notes with active learning. J Biomed Inform 2024;151:104618. [PMID: 38431151 DOI: 10.1016/j.jbi.2024.104618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 01/22/2024] [Accepted: 02/26/2024] [Indexed: 03/05/2024]

Boonstra MJ, Weissenbacher D, Moore JH, Gonzalez-Hernandez G, Asselbergs FW. Artificial intelligence: revolutionizing cardiology with large language models. Eur Heart J 2024;45:332-345. [PMID: 38170821 PMCID: PMC10834163 DOI: 10.1093/eurheartj/ehad838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 12/01/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open

Micale C, Golder S, O'Connor K, Weissenbacher D, Gross R, Hennessy S, Gonzalez-Hernandez G. Correction to: Patient-Reported Reasons for Antihypertensive Medication Change: A Quantitative Study Using Social Media. Drug Saf 2024;47:193. [PMID: 38231378 DOI: 10.1007/s40264-023-01394-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2024]

Weissenbacher D, Rawal S, Zhao X, Priestley JRC, Szigety KM, Schmidt SF, Higgins MJ, Magge A, O'Connor K, Gonzalez-Hernandez G, Campbell IM. PhenoID, a language model normalizer of physical examinations from genetics clinical notes. medRxiv 2024:2023.10.16.23296894. [PMID: 37904943 PMCID: PMC10614999 DOI: 10.1101/2023.10.16.23296894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]

Abstract

Background

Phenotypes identified during dysmorphology physical examinations are critical to genetic diagnosis and nearly universally documented as free-text in the electronic health record (EHR). Variation in how phenotypes are recorded in free-text makes large-scale computational analysis extremely challenging. Existing natural language processing (NLP) approaches to address phenotype extraction are trained largely on the biomedical literature or on case vignettes rather than actual EHR data.

Methods

We implemented a tailored system at the Children's Hospital of Philadelphia that allows clinicians to document dysmorphology physical exam findings. From the underlying data, we manually annotated a corpus of 3136 organ system observations using the Human Phenotype Ontology (HPO). We provide this corpus publicly. We trained a transformer-based NLP system to identify HPO terms from exam observations. The pipeline includes an extractor, which identifies tokens in the sentence expected to contain an HPO term, and a normalizer, which uses those tokens together with the original observation to determine the specific term mentioned.

Findings

We find that our labeler and normalizer NLP pipeline, which we call PhenoID, achieves state-of-the-art performance for the dysmorphology physical exam phenotype extraction task. PhenoID's performance on the test set was 0.717, compared to the nearest baseline system (Pheno-Tagger) performance of 0.633. An analysis of our system's normalization errors shows possible imperfections in the HPO terminology itself but also reveals a lack of semantic understanding by our transformer models.

Interpretation

Transformer-based NLP models are a promising approach to genetic phenotype extraction and, with recent development of larger pre-trained causal language models, may improve semantic understanding in the future. We believe our results also have direct applicability to more general extraction of medical signs and symptoms.

Funding

US National Institutes of Health.

Micale C, Golder S, O'Connor K, Weissenbacher D, Gross R, Hennessy S, Gonzalez-Hernandez G. Patient-Reported Reasons for Antihypertensive Medication Change: A Quantitative Study Using Social Media. Drug Saf 2024;47:81-91. [PMID: 37995049 DOI: 10.1007/s40264-023-01366-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/10/2023] [Indexed: 11/24/2023]

Abstract

INTRODUCTION

Hypertension is the leading cause of heart disease in the world, and discontinuation or nonadherence of antihypertensive medication constitutes a significant global health concern. Patients with hypertension have high rates of medication nonadherence. Studies of reasons for nonadherence using traditional surveys are limited, can be expensive, and suffer from response, white-coat, and recall biases. Mining relevant posts by patients on social media is inexpensive and less impacted by the pressures and biases of formal surveys, which may provide direct insights into factors that lead to non-compliance with antihypertensive medication.

METHODS

This study examined medication ratings posted to WebMD, an online health forum that allows patients to post medication reviews. We used a previously developed natural language processing classifier to extract indications and reasons for changes in angiotensin II receptor blocker (ARB) and angiotensin-converting enzyme inhibitor (ACEI) treatments. After extraction, ratings were manually annotated and compared with data from the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) public database.

RESULTS

From a collection of 343,459 WebMD reviews, we automatically extracted 1867 posts mentioning changes in ACEIs or ARBs, and manually reviewed the 300 most recent posts regarding ACEI treatments and the 300 most recent posts regarding ARB treatments. After excluding posts that only mentioned a dose change or were a false-positive mention, 142 posts in the ARBs dataset and 187 posts in the ACEIs dataset remained. The majority of posts (97% ARBs, 91% ACEIs) indicated experiencing an adverse event as the reason for medication change. The most common adverse events reported mapped to the Medical Dictionary for Regulatory Activities were "musculoskeletal and connective tissue disorders" like muscle and joint pain for ARBs, and "respiratory, thoracic, and mediastinal disorders" like cough and shortness of breath for ACEIs. These categories also had the largest differences in percentage points, appearing more frequently on WebMD data than FDA data (p < 0.001).

CONCLUSION

Musculoskeletal and respiratory symptoms were the most commonly reported adverse effects in social media postings associated with drug discontinuation. Managing such symptoms is a potential target of interventions seeking to improve medication persistence.

Lanera C, Lorenzoni G, Barbieri E, Piras G, Magge A, Weissenbacher D, Donà D, Cantarutti L, Gonzalez-Hernandez G, Giaquinto C, Gregori D. Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach. J Pers Med 2023;14:28. [PMID: 38248729 PMCID: PMC10817419 DOI: 10.3390/jpm14010028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/23/2024] Open

Affiliation(s)

Corrado Lanera Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
Giulia Lorenzoni Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
Elisa Barbieri Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
Gianluca Piras Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
Arjun Magge Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
Davy Weissenbacher Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
Daniele Donà Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
Luigi Cantarutti Società Servizi Telematici—Pedianet, 35100 Padova, Italy;
Graciela Gonzalez-Hernandez Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
Carlo Giaquinto Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.) Società Servizi Telematici—Pedianet, 35100 Padova, Italy;
Dario Gregori Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)

Weissenbacher D, O'Connor K, Klein A, Golder S, Flores I, Elyaderani A, Scotch M, Gonzalez-Hernandez G. Text mining biomedical literature to identify extremely unbalanced data for digital epidemiology and systematic reviews: dataset and methods for a SARS-CoV-2 genomic epidemiology study. medRxiv 2023:2023.07.29.23293370. [PMID: 37577535 PMCID: PMC10418574 DOI: 10.1101/2023.07.29.23293370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]

Abstract

There are many studies that require researchers to extract specific information from the published literature, such as details about sequence records or about a randomized controlled trial. While manual extraction is cost-efficient for small studies, larger studies such as systematic reviews are much more costly and time-consuming. To avoid exhaustive manual searches and extraction, and their related cost and effort, natural language processing (NLP) methods can be tailored for the more subtle extraction and decision tasks that typically only humans have performed. The need for such studies that use the published literature as a data source became even more evident as the COVID-19 pandemic raged through the world and millions of sequenced samples were deposited in public repositories such as GISAID and GenBank, promising large genomic epidemiology studies, but more often than not lacked many important details that prevented large-scale studies. Thus, granular geographic location or the most basic patient-relevant data, such as demographic information or clinical outcomes, were not noted in the sequence record. However, some of these data were indeed published, but in the text, tables, or supplementary material of a corresponding published article. We present here methods to identify relevant journal articles that report having produced new SARS-CoV-2 sequences and made them available in GenBank or GISAID, as those that initially produced and made available the sequences are the most likely articles to include the high-level details about the patients from whom the sequences were obtained. Human annotators validated the approach, creating a gold standard set for training and validation of a machine learning classifier.
Identifying these articles is a crucial step to enable future automated informatics pipelines that will apply Machine Learning and Natural Language Processing to identify patient characteristics such as co-morbidities, outcomes, age, gender, and race, enriching SARS-CoV-2 sequence databases with actionable information for defining large genomic epidemiology studies. Thus, enriched patient metadata can enable secondary data analysis, at scale, to uncover associations between the viral genome (including variants of concern and their sublineages), transmission risk, and health outcomes. However, for such enrichment to happen, the right papers need to be found and very detailed data needs to be extracted from them. Further, finding the very specific articles needed for inclusion is a task that also facilitates scoping and systematic reviews, greatly reducing the time needed for full-text analysis and extraction.

Weissenbacher D, O’Connor K, Rawal S, Zhang Y, Tsai RTH, Miller T, Xu D, Anderson C, Liu B, Han Q, Zhang J, Kulev I, Köprü B, Rodriguez-Esteban R, Ozkirimli E, Ayach A, Roller R, Piccolo S, Han P, Vydiswaran VGV, Tekumalla R, Banda JM, Bagherzadeh P, Bergler S, Silva JF, Almeida T, Martinez P, Rivera-Zavala R, Wang CK, Dai HJ, Alberto Robles Hernandez L, Gonzalez-Hernandez G. Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition. Database (Oxford) 2023;2023:baac108. [PMID: 36734300 PMCID: PMC9896308 DOI: 10.1093/database/baac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 10/28/2022] [Accepted: 12/13/2022] [Indexed: 02/04/2023]

Affiliation(s)

Davy Weissenbacher Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
Karen O’Connor DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Siddharth Rawal DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Yu Zhang Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan
Richard Tzong-Han Tsai Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan IoX Center, National Taiwan University, Da’an District, Section 4, Roosevelt Rd, No. 1, Barry Lam Hall, Taipei 106, Taiwan Research Center for Humanities and Social Sciences, Academia Sinica, No. 128, Section 2, Academia Rd, Nangang District, Taipei 115, Taiwan
Timothy Miller Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA Department of Pediatrics, Harvard Medical School, Boston, MA, USA
Dongfang Xu Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA Department of Pediatrics, Harvard Medical School, Boston, MA, USA
Carol Anderson NVIDIA, Santa Clara, CA, USA
Bo Liu NVIDIA, Santa Clara, CA, USA
Qing Han Department of Statistics, Florida State University, Tallahassee, FL, USA
Jinfeng Zhang Department of Statistics, Florida State University, Tallahassee, FL, USA
Igor Kulev Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
Berkay Köprü Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
Raul Rodriguez-Esteban Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Switzerland
Elif Ozkirimli Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
Ammer Ayach Speech and Language Technology Lab, DFKI, Berlin, Germany
Roland Roller Speech and Language Technology Lab, DFKI, Berlin, Germany
Stephen Piccolo Department of Biology, Brigham Young University, Provo, UT, USA
Peijin Han Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
V G Vinod Vydiswaran Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, USA School of Information, University of Michigan, Ann Arbor, MI, USA
Ramya Tekumalla Department of Computer Science, Georgia State University, Atlanta, GA, USA
Juan M Banda Department of Computer Science, Georgia State University, Atlanta, GA, USA
Parsa Bagherzadeh CLaC Labs, Concordia University, Montreal, Canada
Sabine Bergler CLaC Labs, Concordia University, Montreal, Canada
João F Silva DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
Tiago Almeida DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal Department of Computation, University of A Coruña, Spain
Paloma Martinez Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain
Renzo Rivera-Zavala Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain
Chen-Kai Wang Big Data Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Hong-Jie Dai Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Luis Alberto Robles Hernandez Department of Computer Science, Georgia State University, Atlanta, GA, USA
Graciela Gonzalez-Hernandez Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA

Golder S, Weissenbacher D, O’Connor K, Hennessy S, Gross R, Hernandez GG. Patient-Reported Reasons for Switching or Discontinuing Statin Therapy: A Mixed Methods Study Using Social Media. Drug Saf 2022;45:971-981. [PMID: 35933649 PMCID: PMC9402720 DOI: 10.1007/s40264-022-01212-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 07/10/2022] [Indexed: 11/16/2022]

Weissenbacher D, Flores JI, Wang Y, O’Connor K, Rawal S, Stevens R, Gonzalez-Hernandez G. Automatic Cohort Determination from Twitter for HIV Prevention amongst Black and Hispanic Men. AMIA Annu Symp Proc 2022;2022:504-513. [PMID: 35854738 PMCID: PMC9285152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]

Magge A, Weissenbacher D, O'Connor K, Scotch M, Gonzalez-Hernandez G. SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning. medRxiv 2022:2021.02.09.21251454. [PMID: 33594374 PMCID: PMC7885933 DOI: 10.1101/2021.02.09.21251454] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]

Golder S, Klein AZ, Magge A, O’Connor K, Cai H, Weissenbacher D, Gonzalez-Hernandez G. A chronological and geographical analysis of personal reports of COVID-19 on Twitter from the UK. Digit Health 2022;8:20552076221097508. [PMID: 35574580 PMCID: PMC9096830 DOI: 10.1177/20552076221097508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2020] [Accepted: 04/12/2022] [Indexed: 11/30/2022] Open

Weissenbacher D, Ge S, Klein A, O'Connor K, Gross R, Hennessy S, Gonzalez-Hernandez G. Active neural networks to detect mentions of changes to medication treatment in social media. J Am Med Inform Assoc 2021;28:2551-2561. [PMID: 34613417 PMCID: PMC8633624 DOI: 10.1093/jamia/ocab158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2020] [Revised: 04/13/2021] [Accepted: 07/23/2021] [Indexed: 12/30/2022] Open

Magge A, Tutubalina E, Miftahutdinov Z, Alimova I, Dirkson A, Verberne S, Weissenbacher D, Gonzalez-Hernandez G. DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter. J Am Med Inform Assoc 2021;28:2184-2192. [PMID: 34270701 PMCID: PMC8449608 DOI: 10.1093/jamia/ocab114] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/20/2021] [Revised: 05/20/2021] [Accepted: 06/08/2021] [Indexed: 11/17/2022] Open

Magge A, Weissenbacher D, O'Connor K, Tahsin T, Gonzalez-Hernandez G, Scotch M. GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography. Bioinformatics 2021;36:5120-5121. [PMID: 32683454 PMCID: PMC7755405 DOI: 10.1093/bioinformatics/btaa647] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/27/2020] [Revised: 07/03/2020] [Accepted: 07/13/2020] [Indexed: 12/27/2022] Open

Weissenbacher D, Sarker A, Klein A, O'Connor K, Magge A, Gonzalez-Hernandez G. Deep neural networks ensemble for detecting medication mentions in tweets. J Am Med Inform Assoc 2021;26:1618-1626. [PMID: 31562510 PMCID: PMC6857507 DOI: 10.1093/jamia/ocz156] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/28/2019] [Revised: 07/26/2019] [Accepted: 08/13/2019] [Indexed: 11/12/2022] Open

Klein AZ, Magge A, O'Connor K, Flores Amaro JI, Weissenbacher D, Gonzalez Hernandez G. Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set. J Med Internet Res 2021;23:e25314. [PMID: 33449904 PMCID: PMC7834613 DOI: 10.2196/25314] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/27/2020] [Revised: 12/14/2020] [Accepted: 12/14/2020] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

In the United States, the rapidly evolving COVID-19 outbreak, the shortage of available testing, and the delay of test results present challenges for actively monitoring its spread based on testing alone.

OBJECTIVE

The objective of this study was to develop, evaluate, and deploy an automatic natural language processing pipeline to collect user-generated Twitter data as a complementary resource for identifying potential cases of COVID-19 in the United States that are not based on testing and, thus, may not have been reported to the Centers for Disease Control and Prevention.

METHODS

Beginning January 23, 2020, we collected English tweets from the Twitter Streaming application programming interface that mention keywords related to COVID-19. We applied handwritten regular expressions to identify tweets indicating that the user potentially has been exposed to COVID-19. We automatically filtered out "reported speech" (eg, quotations, news headlines) from the tweets that matched the regular expressions, and two annotators annotated a random sample of 8976 tweets that are geo-tagged or have profile location metadata, distinguishing tweets that self-report potential cases of COVID-19 from those that do not. We used the annotated tweets to train and evaluate deep neural network classifiers based on bidirectional encoder representations from transformers (BERT). Finally, we deployed the automatic pipeline on more than 85 million unlabeled tweets that were continuously collected between March 1 and August 21, 2020.
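The first two filtering steps described above (regular-expression matching, then removal of "reported speech") can be sketched as follows. This is only an illustration under stated assumptions: both patterns are hypothetical stand-ins written for this example, not the handwritten expressions used in the study, and the function name is likewise invented.

```python
import re

# Hypothetical pattern (an assumption, not the study's expressions):
# a first-person statement that mentions COVID-19.
SELF_REPORT = re.compile(
    r"\b(i|my)\b.*\b(think|have|has|got)\b.*\b(covid|coronavirus)\b",
    re.IGNORECASE,
)

# Hypothetical "reported speech" cues (also an assumption): retweets,
# a leading quotation mark, or a link, which often marks a headline.
REPORTED_SPEECH = re.compile(r'^(rt\s+@|["“])|https?://', re.IGNORECASE)


def is_candidate(tweet):
    """True if the tweet matches the self-report pattern and does not
    look like reported speech."""
    return bool(SELF_REPORT.search(tweet)) and not REPORTED_SPEECH.search(tweet)


tweets = [
    "I think I have covid but I can't get tested",
    '"I have covid" says senator https://news.example/story',
    "RT @someone: I think I have covid",
    "Stay home everyone",
]
candidates = [t for t in tweets if is_candidate(t)]
print(candidates)
```

In the study, tweets surviving this kind of rule-based filter were then annotated manually and used to train BERT-based classifiers; the sketch covers only the rule-based stage.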

RESULTS

Interannotator agreement, based on dual annotations for 3644 (41%) of the 8976 tweets, was 0.77 (Cohen κ). A deep neural network classifier, based on a BERT model that was pretrained on tweets related to COVID-19, achieved an F1-score of 0.76 (precision=0.76, recall=0.76) for detecting tweets that self-report potential cases of COVID-19. Upon deploying our automatic pipeline, we identified 13,714 tweets that self-report potential cases of COVID-19 and have US state-level geolocations.
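The figures reported above can be checked with a little arithmetic: the F1-score is the harmonic mean of precision and recall, and the dual-annotated share is 3644 out of 8976 tweets. The Python sketch below is illustrative only. The function name `f1_score` is a hypothetical helper, not code from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract above.
f1 = f1_score(precision=0.76, recall=0.76)
dual_annotated_pct = 3644 / 8976 * 100

print(f"F1 = {f1:.2f}")
print(f"Dual-annotated share = {dual_annotated_pct:.0f}%")
```

When precision and recall are equal, the harmonic mean equals that shared value, which is why all three reported numbers coincide.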

CONCLUSIONS

We have made the 13,714 tweets identified in this study, along with each tweet's time stamp and US state-level geolocation, publicly available to download. This data set presents the opportunity for future work to assess the utility of Twitter data as a complementary resource for tracking the spread of COVID-19.

Weissenbacher D, O'Connor K, Hiraki AT, Kim JD, Gonzalez-Hernandez G. An empirical evaluation of electronic annotation tools for Twitter data. Genomics Inform 2020;18:e24. [PMID: 32634878 PMCID: PMC7362942 DOI: 10.5808/gi.2020.18.2.e24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2020] [Accepted: 06/16/2020] [Indexed: 11/30/2022] Open

Golder S, Klein AZ, Magge A, O'Connor K, Cai H, Weissenbacher D, Gonzalez-Hernandez G. Extending A Chronological and Geographical Analysis of Personal Reports of COVID-19 on Twitter to England, UK. medRxiv 2020:2020.05.05.20083436. [PMID: 32511492 PMCID: PMC7273260 DOI: 10.1101/2020.05.05.20083436] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]

Klein AZ, Magge A, O'Connor K, Cai H, Weissenbacher D, Gonzalez-Hernandez G. A Chronological and Geographical Analysis of Personal Reports of COVID-19 on Twitter. [PMID: 32511608 PMCID: PMC7276035 DOI: 10.1101/2020.04.19.20069948] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/24/2023]

Klein AZ, Cai H, Weissenbacher D, Levine LD, Gonzalez-Hernandez G. A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes. J Biomed Inform 2020;112S:100076. [PMID: 34417007 DOI: 10.1016/j.yjbinx.2020.100076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2020] [Revised: 06/30/2020] [Accepted: 07/27/2020] [Indexed: 10/23/2022]

Abstract

BACKGROUND

In the United States, 17% of pregnancies end in fetal loss: miscarriage or stillbirth. Preterm birth affects 10% of live births in the United States and is the leading cause of neonatal death globally. Preterm births with low birthweight are the second leading cause of infant mortality in the United States. Despite their prevalence, the causes of miscarriage, stillbirth, and preterm birth are largely unknown.

OBJECTIVE

The primary objectives of this study are to (1) assess whether women report miscarriage, stillbirth, and preterm birth, among others, on Twitter, and (2) develop natural language processing (NLP) methods to automatically identify users from which to select cases for large-scale observational studies.

METHODS

We handcrafted regular expressions to retrieve tweets that mention an adverse pregnancy outcome, from a database containing more than 400 million publicly available tweets posted by more than 100,000 users who have announced their pregnancy on Twitter. Two annotators independently annotated 8109 (one random tweet per user) of the 22,912 retrieved tweets, distinguishing those reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). We used the annotated tweets to train and evaluate feature-engineered and deep learning-based classifiers. We further annotated 7512 (of the 8109) tweets to develop a generalizable, rule-based module designed to filter out reported speech (that is, posts containing what was said by others) prior to automatic classification. We performed an extrinsic evaluation assessing whether the reported speech filter could improve the detection of women reporting adverse pregnancy outcomes on Twitter.
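Cohen's kappa, the agreement statistic cited above, corrects the raw agreement between two annotators for the agreement expected by chance. The Python sketch below shows the standard formula on a small invented set of labels. The function name `cohen_kappa` and the toy labels are assumptions for illustration and are not the study's data or code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Invented toy annotations ("outcome" vs "non-outcome"), not study data.
annotator_1 = ["outcome", "outcome", "non", "non", "outcome",
               "non", "non", "non", "outcome", "non"]
annotator_2 = ["outcome", "non", "non", "non", "outcome",
               "non", "non", "non", "outcome", "non"]

print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")
```

Because kappa discounts chance agreement, it is lower than the raw agreement rate whenever the two annotators disagree at all.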

RESULTS

The tweets annotated as "outcome" include 1632 women reporting miscarriage, 119 stillbirth, 749 preterm birth or premature labor, 217 low birthweight, 558 NICU admission, and 458 fetal/infant loss in general. A deep neural network, BERT-based classifier achieved the highest overall F₁-score (0.88) for automatically detecting "outcome" tweets (precision = 0.87, recall = 0.89), with an F₁-score of at least 0.82 and a precision of at least 0.84 for each of the adverse pregnancy outcomes. Our reported speech filter significantly (P < 0.05) improved the accuracy of Logistic Regression (from 78.0% to 80.8%) and majority voting-based ensemble (from 81.1% to 82.9%) classifiers. Although the filter did not improve the F₁-score of the BERT-based classifier, it did improve precision, a trade-off against recall that may be acceptable for automated case selection of more prevalent outcomes. Without the filter, reported speech is one of the main sources of errors for the BERT-based classifier.

CONCLUSION

This study demonstrates that (1) women do report their adverse pregnancy outcomes on Twitter, (2) our NLP pipeline can automatically identify users from which to select cases for large-scale observational studies, and (3) our reported speech filter would reduce the cost of annotating health-related social media data and can significantly improve the overall performance of feature-based classifiers.

Klein AZ, Sarker A, Weissenbacher D, Gonzalez-Hernandez G. Towards scaling Twitter for digital epidemiology of birth defects. NPJ Digit Med 2019;2:96. [PMID: 31583284 PMCID: PMC6773753 DOI: 10.1038/s41746-019-0170-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/22/2019] [Accepted: 08/12/2019] [Indexed: 11/13/2022] Open

Magge A, Weissenbacher D, Sarker A, Scotch M, Gonzalez-Hernandez G. Deep neural networks and distant supervision for geographic location mention extraction. Bioinformatics 2019;34:i565-i573. [PMID: 29950020 PMCID: PMC6022665 DOI: 10.1093/bioinformatics/bty273] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022] Open

Tahsin T, Weissenbacher D, O'Connor K, Magge A, Scotch M, Gonzalez-Hernandez G. GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records. Bioinformatics 2019;34:1606-1608. [PMID: 29240889 DOI: 10.1093/bioinformatics/btx799] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/04/2017] [Accepted: 12/11/2017] [Indexed: 11/13/2022] Open

Scotch M, Tahsin T, Weissenbacher D, O'Connor K, Magge A, Vaiente M, Suchard MA, Gonzalez-Hernandez G. Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography. Virus Evol 2019;5:vey043. [PMID: 30838129 PMCID: PMC6395475 DOI: 10.1093/ve/vey043] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/04/2023] Open

Abstract

Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed; often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxon is somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt; H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty including 10, 30, and 50 per cent uncertainty and varied how it was distributed for each taxon. This includes scenarios that: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location; thus ‘splitting’ the uncertainty among two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From this, we implemented five random selections of 25 per cent of the taxa and used these for specifying uncertainty. We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP.
For H5N1, the absolute error of virus persistence had a median range of 0.005–0.047 for scenarios with sampling uncertainty—(i) and (ii) above—versus a range of 0.063–0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found all but one of the H5N1 scenarios with sampling uncertainty had agreement with the RS on the origin of the outbreak whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography as compared to ad-hoc heuristics. We also found that, in general, there was limited difference in results regardless of how the sampling uncertainty was assigned; uniform distribution or split between two locations did not greatly impact posterior results. This framework is available in BEAST v.1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography.
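Scenarios (i) and (ii) described in the abstract can be pictured as probability vectors over a taxon's candidate sampling locations. The Python sketch below is one possible reading, assuming the stated percentage is probability mass placed on a single location. The function names and the location labels are hypothetical, and this is not the BEAST implementation.

```python
def uniform_scenario(locations, focal, p_focal):
    """Scenario (i): put p_focal on one location, spread the rest uniformly."""
    if not 0.0 <= p_focal <= 1.0:
        raise ValueError("p_focal must be between 0 and 1")
    others = [loc for loc in locations if loc != focal]
    if focal not in locations or not others:
        raise ValueError("need the focal location plus at least one other")
    share = (1.0 - p_focal) / len(others)
    probs = {loc: share for loc in others}
    probs[focal] = p_focal
    return probs


def split_scenario(focal, other, p_focal):
    """Scenario (ii): split all probability mass between two locations."""
    if not 0.0 <= p_focal <= 1.0:
        raise ValueError("p_focal must be between 0 and 1")
    return {focal: p_focal, other: 1.0 - p_focal}


# Hypothetical candidate locations for one taxon.
candidates = ["loc_A", "loc_B", "loc_C", "loc_D"]

for p in (0.10, 0.30, 0.50):
    print("uniform", p, uniform_scenario(candidates, "loc_A", p))
    print("split  ", p, split_scenario("loc_A", "loc_B", p))
```

In both cases the values for one taxon sum to 1, so they can be treated as a discrete distribution over its possible sampling locations.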

Affiliation(s)

Matthew Scotch College of Health Solutions, Arizona State University, 550 N. 3rd St., Phoenix, AZ, USA; Biodesign Center for Environmental Health Engineering, Arizona State University, 727 E. Tyler St, Tempe, AZ, USA
Tasnia Tahsin College of Health Solutions, Arizona State University, 550 N. 3rd St., Phoenix, AZ, USA
Davy Weissenbacher Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 423 Guardian Drive, Philadelphia, PA, USA
Karen O'Connor Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 423 Guardian Drive, Philadelphia, PA, USA
Arjun Magge College of Health Solutions, Arizona State University, 550 N. 3rd St., Phoenix, AZ, USA; Biodesign Center for Environmental Health Engineering, Arizona State University, 727 E. Tyler St, Tempe, AZ, USA
Matteo Vaiente College of Health Solutions, Arizona State University, 550 N. 3rd St., Phoenix, AZ, USA; Biodesign Center for Environmental Health Engineering, Arizona State University, 727 E. Tyler St, Tempe, AZ, USA
Marc A Suchard Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, 621 Charles E. Young Dr. South, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Dr. South, Los Angeles, CA, USA; Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, 650 Charles E Young Dr. South, Los Angeles, CA, USA
Graciela Gonzalez-Hernandez Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 423 Guardian Drive, Philadelphia, PA, USA

Magge A, Weissenbacher D, Sarker A, Scotch M, Gonzalez-Hernandez G. Bi-directional Recurrent Neural Network Models for Geographic Location Extraction in Biomedical Literature. Pac Symp Biocomput 2019;24:100-111. [PMID: 30864314 PMCID: PMC6417823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Klein AZ, Sarker A, Cai H, Weissenbacher D, Gonzalez-Hernandez G. Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter. J Biomed Inform 2018;87:68-78. [PMID: 30292855 PMCID: PMC6295660 DOI: 10.1016/j.jbi.2018.10.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/12/2018] [Revised: 09/26/2018] [Accepted: 10/03/2018] [Indexed: 10/28/2022]

Abstract

BACKGROUND

Although birth defects are the leading cause of infant mortality in the United States, methods for observing human pregnancies with birth defect outcomes are limited.

OBJECTIVE

The primary objectives of this study were (i) to assess whether rare health-related events (in this case, birth defects) are reported on social media, (ii) to design and deploy a natural language processing (NLP) approach for collecting such sparse data from social media, and (iii) to utilize the collected data to discover a cohort of women whose pregnancies with birth defect outcomes could be observed on social media for epidemiological analysis.

METHODS

To assess whether birth defects are mentioned on social media, we mined 432 million tweets posted by 112,647 users who were automatically identified via their public announcements of pregnancy on Twitter. To retrieve tweets that mention birth defects, we developed a rule-based, bootstrapping approach, which relies on a lexicon, lexical variants generated from the lexicon entries, regular expressions, post-processing, and manual analysis guided by distributional properties. To identify users whose pregnancies with birth defect outcomes could be observed for epidemiological analysis, inclusion criteria were (i) tweets indicating that the user's child has a birth defect, and (ii) accessibility to the user's tweets during pregnancy. We conducted a semi-automatic evaluation to estimate the recall of the tweet-collection approach, and performed a preliminary assessment of the prevalence of selected birth defects among the pregnancy cohort derived from Twitter.

RESULTS

We manually annotated 16,822 retrieved tweets, distinguishing tweets indicating that the user's child has a birth defect (true positives) from tweets that merely mention birth defects (false positives). Inter-annotator agreement was substantial: κ = 0.79 (Cohen's kappa). Analyzing the timelines of the 646 users whose tweets were true positives resulted in the discovery of 195 users that met the inclusion criteria. Congenital heart defects are the most common type of birth defect reported on Twitter, consistent with findings in the general population. Based on an evaluation of 4169 tweets retrieved using alternative text mining methods, the recall of the tweet-collection approach was 0.95.

CONCLUSIONS

Our contributions include (i) evidence that rare health-related events are indeed reported on Twitter, (ii) a generalizable, systematic NLP approach for collecting sparse tweets, (iii) a semi-automatic method to identify undetected tweets (false negatives), and (iv) a collection of publicly available tweets by pregnant users with birth defect outcomes, which could be used for future epidemiological analysis. In future work, the annotated tweets could be used to train machine learning algorithms to automatically identify users reporting birth defect outcomes, enabling the large-scale use of social media mining as a complementary method for such epidemiological research.

Weissenbacher D, Sarker A, Tahsin T, Scotch M, Gonzalez G. Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods. AMIA Jt Summits Transl Sci Proc 2017;2017:114-122. [PMID: 28815119 PMCID: PMC5543364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]

Tahsin T, Weissenbacher D, Jones-Shargani D, Magee D, Vaiente M, Gonzalez G, Scotch M. Named entity linking of geospatial and host metadata in GenBank for advancing biomedical research. Database (Oxford) 2017;2017:4781736. [PMID: 30412219 PMCID: PMC6225896 DOI: 10.1093/database/bax093] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/03/2017] [Revised: 11/20/2017] [Accepted: 11/21/2017] [Indexed: 02/06/2023]

Tahsin T, Weissenbacher D, Rivera R, Beard R, Firago M, Wallstrom G, Scotch M, Gonzalez G. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records. J Am Med Inform Assoc 2016;23:934-41. [PMID: 26911818 PMCID: PMC4997033 DOI: 10.1093/jamia/ocv172] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/21/2015] [Revised: 10/22/2015] [Accepted: 10/22/2015] [Indexed: 01/09/2023] Open

Weissenbacher D, Tahsin T, Beard R, Figaro M, Rivera R, Scotch M, Gonzalez G. Knowledge-driven geospatial location resolution for phylogeographic models of virus migration. Bioinformatics 2015;31:i348-56. [PMID: 26072502 PMCID: PMC4542781 DOI: 10.1093/bioinformatics/btv259] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022] Open

Affiliation(s)

Davy Weissenbacher Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
Tasnia Tahsin Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
Rachel Beard Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
Mari Figaro Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
Robert Rivera Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
Matthew Scotch Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
Graciela Gonzalez Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA

Ananiadou S, Thompson P, Thomas J, Mu T, Oliver S, Rickinson M, Sasaki Y, Weissenbacher D, McNaught J. Supporting the education evidence portal via text mining. Philos Trans A Math Phys Eng Sci 2010;368:3829-3844. [PMID: 20643679 PMCID: PMC2981997 DOI: 10.1098/rsta.2010.0152] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]