1
Ning X, Fan Z, Burgun E, Ren Z, Schleyer T. Improving information retrieval from electronic health records using dynamic and multi-collaborative filtering. PLoS One 2021; 16:e0255467. [PMID: 34351962 PMCID: PMC8341500 DOI: 10.1371/journal.pone.0255467] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2020] [Accepted: 07/16/2021] [Indexed: 12/02/2022] Open
Abstract
Due to the rapid growth of information available about individual patients, most physicians suffer from information overload and inefficiencies when they review patient information in health information technology systems. In this paper, we present a novel hybrid dynamic and multi-collaborative filtering method to improve information retrieval from electronic health records. This method recommends relevant information from electronic health records to physicians during patient visits. It models information search dynamics using a Markov model. It also leverages the key idea of collaborative filtering, originating from Recommender Systems, for prioritizing information based on various similarities among physicians, patients and information items. We tested this new method using electronic health record data from the Indiana Network for Patient Care, a large, inter-organizational clinical data repository maintained by the Indiana Health Information Exchange. Our experimental results demonstrated that, for top-5 recommendations, our method was able to correctly predict the information in which physicians were interested in 46.7% of all test cases. For top-1 recommendations, the corresponding figure was 24.7%. In addition, the new method was 22.3% better than the conventional Markov model for top-1 recommendations.
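The abstract reports top-1 and top-5 hit rates and compares against a conventional Markov model. As a rough illustration of what such a baseline and metric can look like, the sketch below fits first-order transition counts and scores top-k recommendations. It is not the authors' hybrid method: the function names, the session format, and the toy item labels (`labs`, `meds`, `notes`, `imaging`) are all hypothetical, and the collaborative filtering component is omitted entirely.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count first-order transitions (item -> next item) over search sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def recommend(counts, current, k):
    """Return up to k items that most often followed `current` in training."""
    followers = counts.get(current, Counter())
    return [item for item, _ in followers.most_common(k)]


def hit_rate_at_k(counts, sessions, k):
    """Fraction of transitions whose true next item is among the top-k."""
    hits = 0
    total = 0
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            total += 1
            if nxt in recommend(counts, current, k):
                hits += 1
    return hits / total if total else 0.0


# Toy data (hypothetical): each session is the ordered list of item types
# a physician looked up during one patient visit.
train_sessions = [
    ["labs", "meds", "notes"],
    ["labs", "meds", "imaging"],
    ["labs", "notes"],
    ["meds", "notes"],
]
test_sessions = [
    ["labs", "meds", "imaging"],
    ["labs", "notes"],
]

counts = fit_transitions(train_sessions)
top1 = hit_rate_at_k(counts, test_sessions, k=1)
top2 = hit_rate_at_k(counts, test_sessions, k=2)
print(f"hit rate @1 = {top1:.3f}, @2 = {top2:.3f}")
```

Widening k can only raise the hit rate, which is consistent with the gap between the top-1 and top-5 figures quoted in the abstract.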
Affiliation(s)
- Xia Ning
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States of America
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States of America
  - Translational Data Analytics Institute, The Ohio State University, Columbus, OH, United States of America
- Ziwei Fan
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States of America
- Evan Burgun
  - Defense Finance and Accounting Service, Indianapolis, IN, United States of America
- Zhiyun Ren
  - Hyperscience, New York, NY, United States of America
- Titus Schleyer
  - Regenstrief Institute, Indianapolis, IN, United States of America
  - Indiana University School of Medicine, Indianapolis, IN, United States of America
2
Delos Santos NP, Texari L, Benner C. MEIRLOP: improving score-based motif enrichment by incorporating sequence bias covariates. BMC Bioinformatics 2020; 21:410. [PMID: 32938397 PMCID: PMC7493370 DOI: 10.1186/s12859-020-03739-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/25/2020] [Accepted: 09/04/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND: Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate transcriptional response to a stimulus, or identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. This lack of control for sequence bias, such as that often found in CpG islands, can obscure the enrichment of biologically relevant motifs.
RESULTS: We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions, while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs found enriched in differentially active regulatory regions after interferon-beta stimulus, finding that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate how MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs.
CONCLUSIONS: Our results demonstrate the importance of controlling for sequence bias when accurately identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.
Affiliation(s)
- Nathaniel P Delos Santos
  - Department of Biomedical Informatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
- Lorane Texari
  - Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
- Christopher Benner
  - Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
3
Abstract
Virtual audits using Google Street View are an increasingly popular method of assessing neighborhood environments for health and urban planning research. However, the validity of these studies may be threatened by issues of image availability, image age, and variance of image age, particularly in the Global South. This study identifies patterns of Street View image availability, image age, and image age variance across cities in Latin America and assesses relationships between these measures and measures of resident socioeconomic conditions. Image availability was assessed at 530,308 near-road points within the boundaries of 371 Latin American cities described by the SALURBAL (Salud Urbana en America Latina) project. At the subcity level, mixed-effect linear and logistic models were used to assess relationships between measures of socioeconomic conditions and image availability, average image age, and the standard deviation of image age. Street View imagery was available at 239,394 points (45.1%) of the total sampled, and rates of image availability varied widely between cities and countries. Subcity units with higher scores on measures of socioeconomic conditions had higher rates of image availability (OR = 1.11 per point increase of combined index, p < 0.001) and the imagery was newer on average (-1.15 months per point increase of combined index, p < 0.001), but image capture date within these areas varied more (0.59-month increase in standard deviation of image age per point increase of combined index, p < 0.001). All three assessed threats to the validity of Street View virtual audit studies spatially covary with measures of socioeconomic conditions in Latin American cities. Researchers should be attentive to these issues when using Street View imagery.
Affiliation(s)
- Dustin Fry
  - Department of Epidemiology and Biostatistics, Drexel University Dornsife School of Public Health, 3600 Market Street 7th Floor, Philadelphia, PA 19104 USA
- Stephen J. Mooney
  - Department of Epidemiology, University of Washington School of Public Health, 1959 NE Pacific Street, Seattle, WA 98195 USA
- Daniel A. Rodríguez
  - Department of City & Regional Planning, University of California–Berkeley College of Environmental Design, 230 Wurster Hall, Berkeley, CA 94720 USA
- Waleska T. Caiaffa
  - Department of Preventive and Social Medicine, Federal University of Minas Gerais Observatory for Urban Health in Belo Horizonte, Av. Alfredo Balena, 190, Belo Horizonte, CEP: 30130-100 Brazil
- Gina S. Lovasi
  - Department of Epidemiology and Biostatistics, Drexel University Dornsife School of Public Health, 3600 Market Street 7th Floor, Philadelphia, PA 19104 USA
4
Abstract
For decades, health literacy has been used to describe the ability of individuals to locate, interpret, and apply health information to their decisions. The US Department of Health and Human Services has now proposed redefining the term to emphasize the role of society in providing accessible, comprehensible information. This redefinition would reflect a welcome shift to encompass the roles of those who communicate information, not simply those who seek it. However, redefining an accepted term would have serious negative effects on the indexing of the research literature and create difficulties interpreting studies conducted under the previous definition. Therefore, we strongly caution against redefining the accepted term. Instead, we propose introducing a new term, health information fluency, defined as universal effective use of health information. The old term can continue to be used to describe the set of concerns about individual skills, but by promoting the new term, the Department of Health and Human Services can encourage research into creating accurate, accessible health information that people can easily find, understand, and use to inform their decisions.
Affiliation(s)
- Jessica S Ancker
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
- Lisa V Grossman
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Natalie C Benda
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
5
Patel V, Spouge JL. Estimating the basic reproduction number of a pathogen in a single host when only a single founder successfully infects. PLoS One 2020; 15:e0227127. [PMID: 31923263 PMCID: PMC6953795 DOI: 10.1371/journal.pone.0227127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2019] [Accepted: 12/12/2019] [Indexed: 11/27/2022] Open
Abstract
If viruses or other pathogens infect a single host, the outcome of infection may depend on the initial basic reproduction number R0, the expected number of host cells infected by a single infected cell. This article shows that sometimes, phylogenetic models can estimate the initial R0, using only sequences sampled from the pathogenic population during its exponential growth or shortly thereafter. When evaluated by simulations mimicking the bursting viral reproduction of HIV and simultaneous sampling of HIV gp120 sequences during early viremia, the estimated R0 displayed useful accuracies in achievable experimental designs. Estimates of R0 have several potential applications to investigators interested in the progress of infection in single hosts, including: (1) timing a pathogen’s movement through different microenvironments; (2) timing the change points in a pathogen’s mode of spread (e.g., timing the change from cell-free spread to cell-to-cell spread, or vice versa, in an HIV infection); (3) quantifying the impact different initial microenvironments have on pathogens (e.g., in mucosal challenge with HIV, quantifying the impact that the presence or absence of mucosal infection has on R0); (4) quantifying subtle changes in infectability in therapeutic trials (either human or animal), even when therapies do not produce total sterilizing immunity; and (5) providing a variable predictive of the clinical efficacy of prophylactic therapies.
Affiliation(s)
- Vruj Patel
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- John L. Spouge
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
6
Abstract
BACKGROUND: In this paper, we discuss the design and development of a formal ontology to describe misinformation about vaccines. Vaccine misinformation is one of the drivers leading to vaccine hesitancy in patients. While there are various levels of vaccine hesitancy to combat and specific interventions to address those levels, it is important to have tools that help researchers understand this problem. With an ontology, not only can we collect and analyze varied misunderstandings about vaccines, but we can also develop tools that can provide informatics solutions.
RESULTS: We developed the Vaccine Misinformation Ontology (VAXMO) that extends the Misinformation Ontology and links to the nanopublication Resource Description Framework (RDF) model for false assertions of vaccines. Preliminary assessment using semiotic evaluation metrics indicated adequate quality for our ontology. We outlined and demonstrated proposed uses of the ontology to detect and understand anti-vaccine information.
CONCLUSION: We surmised that VAXMO and its proposed use cases can support tools and technology that can pave the way for vaccine misinformation detection and analysis. Using an ontology, we can formally structure knowledge for machines and software to better understand the vaccine misinformation domain.
Affiliation(s)
- Muhammad Amith
  - School of Biomedical Informatics, The University of Texas Health Science Center, 7000 Fannin Street, Suite 600, Houston, TX, USA
- Cui Tao
  - School of Biomedical Informatics, The University of Texas Health Science Center, 7000 Fannin Street, Suite 600, Houston, TX, USA
7
Padula AM, Huang H, Baer RJ, August LM, Jankowska MM, Jellife-Pawlowski LL, Sirota M, Woodruff TJ. Environmental pollution and social factors as contributors to preterm birth in Fresno County. Environ Health 2018; 17:70. [PMID: 30157858 PMCID: PMC6114053 DOI: 10.1186/s12940-018-0414-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/08/2017] [Accepted: 08/21/2018] [Indexed: 05/21/2023]
Abstract
BACKGROUND: Environmental pollution exposure during pregnancy has been identified as a risk factor for preterm birth. Most studies have evaluated exposures individually and in limited study populations.
METHODS: We examined the associations between several environmental exposures, both individually and cumulatively, and risk of preterm birth in Fresno County, California. We also evaluated early (<34 weeks) and spontaneous preterm birth. We used the Communities Environmental Health Screening Tool and linked hospital discharge records by census tract from 2009 to 2012. The environmental factors included air pollution, drinking water contaminants, pesticides, hazardous waste, traffic exposure and others. Social factors, including area-level socioeconomic status (SES) and race/ethnicity, were also evaluated as potential modifiers of the relationship between pollution and preterm birth.
RESULTS: In our study of 53,843 births, risk of preterm birth was associated with higher cumulative pollution scores and higher exposure to drinking water contaminants. Preterm birth was twice as likely among those exposed to high versus low levels of pollution. An exposure-response relationship was observed across the quintiles of the pollution burden score. The associations were stronger among early preterm births in areas of low SES.
CONCLUSIONS: In Fresno County, we found multiple pollution exposures associated with increased risk for preterm birth, with stronger associations among the most disadvantaged. This supports other evidence that environmental exposures are important risk factors for preterm birth and that the burden is higher in areas of low SES. These data support efforts to reduce the environmental burden on pregnant women.
Affiliation(s)
- Amy M. Padula
  - Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, 550 16th Street, Mail Stop 0132, San Francisco, CA 94143 USA
- Hongtai Huang
  - Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, 550 16th Street, Mail Stop 0132, San Francisco, CA 94143 USA
  - Department of Pediatrics, University of California, San Francisco, USA
- Rebecca J. Baer
  - Department of Pediatrics, University of California, San Diego, USA
- Laura M. August
  - Office of Environmental Health Hazard Assessment, California Environmental Protection Agency, Sacramento, USA
- Marina Sirota
  - Department of Pediatrics, University of California, San Francisco, USA
- Tracey J. Woodruff
  - Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, 550 16th Street, Mail Stop 0132, San Francisco, CA 94143 USA
8
Kilicoglu H, Ben Abacha A, Mrabet Y, Shooshan SE, Rodriguez L, Masterton K, Demner-Fushman D. Semantic annotation of consumer health questions. BMC Bioinformatics 2018; 19:34. [PMID: 29409442 PMCID: PMC5802048 DOI: 10.1186/s12859-018-2045-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/31/2017] [Accepted: 01/24/2018] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to the MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations.
RESULTS: The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence.
CONCLUSIONS: To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
Affiliation(s)
- Halil Kilicoglu
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
- Asma Ben Abacha
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
- Yassine Mrabet
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
- Sonya E. Shooshan
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
- Laritza Rodriguez
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
- Kate Masterton
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
- Dina Demner-Fushman
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
9
Romagnoli KM, Nelson SD, Hines L, Empey P, Boyce RD, Hochheiser H. Information needs for making clinical recommendations about potential drug-drug interactions: a synthesis of literature review and interviews. BMC Med Inform Decis Mak 2017; 17:21. [PMID: 28228132 PMCID: PMC5322613 DOI: 10.1186/s12911-017-0419-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/02/2016] [Accepted: 02/14/2017] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Drug information compendia and drug-drug interaction information databases are critical resources for clinicians and pharmacists working to avoid adverse events due to exposure to potential drug-drug interactions (PDDIs). Our goal is to develop information models, annotated data, and search tools that will facilitate the interpretation of PDDI information. To better understand the information needs and work practices of specialists who search and synthesize PDDI evidence for drug information resources, we conducted an inquiry that combined a thematic analysis of published literature with unstructured interviews.
METHODS: Starting from an initial set of relevant articles, we developed search terms and conducted a literature search. Two reviewers conducted a thematic analysis of included articles. Unstructured interviews with drug information experts were conducted and similarly coded. Information needs, work processes, and indicators of potential strengths and weaknesses of information systems were identified.
RESULTS: Review of 92 papers and 10 interviews identified 56 categories of information needs related to the interpretation of PDDI information including drug and interaction information; study design; evidence including clinical details, quality and content of reports, and consequences; and potential recommendations. We also identified strengths/weaknesses of PDDI information systems.
CONCLUSIONS: We identified the kinds of information that might be most effective for summarizing PDDIs. The drug information experts we interviewed had differing goals, suggesting a need for detailed information models and flexible presentations. Several information needs not discussed in previous work were identified, including temporal overlaps in drug administration, biological plausibility of interactions, and assessment of the quality and content of reports. Richly structured depictions of PDDI information may help drug information experts more effectively interpret data and develop recommendations. Effective information models and system designs will be needed to maximize the utility of this information.
Affiliation(s)
- Katrina M. Romagnoli
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
- Scott D. Nelson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN USA
- Lisa Hines
  - Pharmacy Quality Alliance, Springfield, VA USA
- Philip Empey
  - School of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, PA USA
- Richard D. Boyce
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
10
Carson MB, Scholtens DM, Frailey CN, Gravenor SJ, Kricke GE, Soulakis ND. An Outcome-Weighted Network Model for Characterizing Collaboration. PLoS One 2016; 11:e0163861. [PMID: 27706199 PMCID: PMC5051930 DOI: 10.1371/journal.pone.0163861] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/11/2016] [Accepted: 09/15/2016] [Indexed: 11/18/2022] Open
Abstract
Shared patient encounters form the basis of collaborative relationships, which are crucial to the success of complex and interdisciplinary teamwork in healthcare. Quantifying the strength of these relationships using shared risk-adjusted patient outcomes provides insight into interactions that occur between healthcare providers. We developed the Shared Positive Outcome Ratio (SPOR), a novel parameter that quantifies the concentration of positive outcomes between a pair of healthcare providers over a set of shared patient encounters. We constructed a collaboration network using hospital emergency department patient data from electronic health records (EHRs) over a three-year period. Based on an outcome indicating patient satisfaction, we used this network to assess pairwise collaboration and evaluate the SPOR. By comparing this network of 574 providers and 5,615 relationships to a set of networks based on randomized outcomes, we identified 295 (5.2%) pairwise collaborations having significantly higher patient satisfaction rates. Our results show extreme high- and low-scoring relationships over a set of shared patient encounters and quantify high variability in collaboration between providers. We identified 29 top performers in terms of patient satisfaction. Providers in the high-scoring group had both a greater average number of associated encounters and a higher percentage of total encounters with positive outcomes than those in the low-scoring group, implying that more experienced individuals may be able to collaborate more successfully. Our study shows that a healthcare collaboration network can be structurally evaluated to characterize the collaborative interactions that occur between healthcare providers in a hospital setting.
Affiliation(s)
- Matthew B. Carson
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States of America
- Denise M. Scholtens
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States of America
- Conor N. Frailey
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States of America
- Stephanie J. Gravenor
  - Department of Emergency Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States of America
- Gayle E. Kricke
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States of America
- Nicholas D. Soulakis
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States of America
11
Mikal J, Hurst S, Conway M. Ethical issues in using Twitter for population-level depression monitoring: a qualitative study. BMC Med Ethics 2016; 17:22. [PMID: 27080238 PMCID: PMC4832544 DOI: 10.1186/s12910-016-0105-5] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 10/22/2015] [Accepted: 04/06/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Recently, significant research effort has focused on using Twitter (and other social media) to investigate mental health at the population level. While there has been influential work in developing ethical guidelines for Internet discussion forum-based research in public health, there is currently limited work focused on addressing ethical problems in Twitter-based public health research, and less still that considers these issues from users' own perspectives. In this work, we aim to investigate public attitudes towards utilizing public domain Twitter data for population-level mental health monitoring using a qualitative methodology.
METHODS: The study explores user perspectives in a series of five 2-h focus group interviews. Following a semi-structured protocol, 26 Twitter users with and without a diagnosed history of depression discussed general Twitter use, along with privacy expectations, and ethical issues in using social media for health monitoring, with a particular focus on mental health monitoring. Transcripts were then transcribed, redacted, and coded using a constant comparative approach.
RESULTS: While participants expressed a wide range of opinions, there was an overall trend towards a relatively positive view of using public domain Twitter data as a resource for population-level mental health monitoring, provided that results are appropriately aggregated. Results are divided into five sections: (1) a profile of respondents' Twitter use patterns and use variability; (2) users' privacy expectations, including expectations regarding data reach and permanence; (3) attitudes towards social media-based population-level health monitoring in general, and attitudes towards mental health monitoring in particular; (4) attitudes towards individual versus population-level health monitoring; and (5) users' own recommendations for the appropriate regulation of population-level mental health monitoring.
CONCLUSIONS: Focus group data reveal a wide range of attitudes towards the use of public-domain social media "big data" in population health research, from enthusiasm, through acceptance, to opposition. Study results highlight new perspectives in the discussion of ethical use of public data, particularly with respect to consent, privacy, and oversight.
Affiliation(s)
- Jude Mikal
  - Minnesota Population Center, University of Minnesota, Twin Cities, 50 Willey Hall, 225 – 19th Avenue South, Minneapolis, MN 55455 USA
- Samantha Hurst
  - Department of Family Medicine & Public Health, University of California, San Diego, MTF 162E, 9500 Gilman Drive, La Jolla, CA USA
- Mike Conway
  - Department of Biomedical Informatics, University of Utah, Rm 2008, 421 Wakara Way, #140, Salt Lake City, UT USA
12
Pineda AL, Ogoe HA, Balasubramanian JB, Rangel Escareño C, Visweswaran S, Herman JG, Gopalakrishnan V. On Predicting lung cancer subtypes using 'omic' data from tumor and tumor-adjacent histologically-normal tissue. BMC Cancer 2016; 16:184. [PMID: 26944944 PMCID: PMC4778315 DOI: 10.1186/s12885-016-2223-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/13/2015] [Accepted: 02/28/2016] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types among lung cancers. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, using tissue collected as small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the process of distinguishing between the two subtypes. Molecular profiling can also be used to discriminate between the two lung cancer subtypes, provided that the biopsy is composed of at least 50 % tumor cells. However, in some cases, the tissue composition of a biopsy might be a mix of tumor and tumor-adjacent histologically normal tissue (TAHN). When this happens, a new biopsy is required, with associated cost, risks and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue.
METHODS Using publicly available datasets for gene expression and DNA methylation, we carried out four classification tasks, depending on the possible combinations of tumor and TAHN tissue. First, we used a feature selector (ReliefF/Limma) to select relevant variables, which were then used to build a simple naïve Bayes classification model. Then, we evaluated the classification performance of our models by measuring the area under the receiver operating characteristic curve (AUC). Finally, we analyzed the relevance of the selected genes using hierarchical clustering and IPA® software for gene functional analysis.
RESULTS All Bayesian models achieved high classification performance (AUC > 0.94), which was confirmed by hierarchical cluster analysis. Of the genes selected, 25 (93 %) were found to be related to cancer (19 were associated with ADC or SCC), confirming the biological relevance of our method.
CONCLUSIONS The results from this study confirm that computational methods using tumor and TAHN tissue can serve as a prognostic tool for lung cancer subtype classification. Our study complements results from other studies where TAHN tissue has been used as a prognostic tool for prostate cancer. The clinical implications of this finding could greatly benefit lung cancer patients.
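The pipeline described under METHODS (feature selection, then a naïve Bayes classifier scored by AUC) can be sketched roughly as follows. This is only an illustration on synthetic data, not the authors' implementation: the feature ranking is a simple standardized mean difference standing in for ReliefF/Limma, the classifier is a Gaussian naïve Bayes written from scratch, and every function name, sample size, and the choice of three informative features is an assumption made for the example.

```python
import math
import random
from statistics import mean, pvariance


def select_features(X, y, k):
    """Rank features by standardized mean difference between the two classes.

    A simple stand-in for illustration only; it is NOT ReliefF or Limma.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        pooled_sd = math.sqrt((pvariance(a) + pvariance(b)) / 2) + 1e-9
        scores.append((abs(mean(a) - mean(b)) / pooled_sd, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_gaussian_nb(X, y, features):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    model = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        stats = [
            (mean([r[j] for r in rows]), pvariance([r[j] for r in rows]) + 1e-6)
            for j in features
        ]
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model


def log_odds(model, row, features):
    """Log-odds of class 1 versus class 0 under the naive Bayes model."""
    def log_joint(c):
        total, stats = model[c]
        for j, (mu, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
        return total
    return log_joint(1) - log_joint(0)


def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n, n_features, informative, rng):
    """Hypothetical data: Gaussian noise, with a shift on informative features."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
informative = [0, 1, 2]  # assumed ground truth for the synthetic example
X_train, y_train = make_synthetic(200, 50, informative, rng)
X_test, y_test = make_synthetic(100, 50, informative, rng)

selected = select_features(X_train, y_train, k=3)
model = fit_gaussian_nb(X_train, y_train, selected)
test_auc = auc([log_odds(model, r, selected) for r in X_test], y_test)
print(sorted(selected), round(test_auc, 3))
```

In this sketch the features are ranked on the training split only, so the held-out AUC is not inflated by selecting variables on the test samples.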
Affiliation(s)
- Arturo López Pineda
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
- Henry Ato Ogoe
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
- Jeya Balaji Balasubramanian
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
- Claudia Rangel Escareño
- Department of Computational Genomics, National Institute of Genomic Medicine, Periferico Sur No. 4809, Col. Arenal Tepepan, Tlalpan, 14610, Mexico City, Mexico.
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
- James Gordon Herman
- Division of Hematology/Oncology, Department of Medicine, University of Pittsburgh School of Medicine, UPMC Cancer Pavilion, 5150 Centre Avenue, 15232, Pittsburgh, PA, USA.
- Vanathi Gopalakrishnan
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
|