1
Schairer CE, Mehta SR, Vinterbo SA, Hoenigl M, Kalichman M, Little SJ. Trust and Expectations of Researchers and Public Health Departments for the Use of HIV Molecular Epidemiology. AJOB Empir Bioeth 2019; 10:201-213. [PMID: 31050604 DOI: 10.1080/23294515.2019.1601648]
Abstract
Background: Molecular epidemiology (ME) is a technique used to study the dynamics of pathogen transmission through a population. When used to study HIV infections, ME generates powerful information about how HIV is transmitted, including epidemiologic patterns of linkage and, potentially, transmission direction. Thus, ME raises challenging questions about the most responsible way to protect individual privacy while acquiring and using these data to advance public health and inform HIV intervention strategies. Here, we report on stakeholders' expectations for how researchers and public health agencies might use HIV ME. Methods: We conducted in-depth semistructured interviews with 40 key stakeholders to find out how these individuals respond to the proposed risks and benefits of HIV ME. Transcripts were coded and analyzed using Atlas.ti. Expectations were assessed through analysis of responses to hypothetical scenarios designed to help interviewees think through the implications of this emerging technique in the contexts of research and public health. Results: Our analysis reveals a wide range of imagined responsibilities, capabilities, and trustworthiness of researchers and public health agencies. Specifically, many respondents expect researchers and public health agencies to use HIV ME carefully and maintain transparency about how data will be used. Informed consent was discussed as an important opportunity for notification of privacy risks. Furthermore, some respondents wished that public health agencies were held to the same form of oversight and accountability represented by informed consent in research. Conclusions: To prevent HIV ME from becoming a barrier to testing or a source of public mistrust, the sense of vulnerability expressed by some respondents must be addressed. In research, informed consent is an obvious opportunity for this. 
Without giving specimen donors a similar opportunity to opt out, public health agencies may find it difficult to adopt HIV ME without deterring testing and treatment.
Affiliation(s)
- Cynthia E Schairer
- Department of Psychiatry, University of California San Diego, La Jolla, California, USA
- Sanjay R Mehta
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Medicine, San Diego Veterans Affairs Medical Center, San Diego, California, USA
- Department of Pathology, University of California San Diego, La Jolla, California, USA
- Staal A Vinterbo
- Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Gjøvik, Norway
- Martin Hoenigl
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Michael Kalichman
- Department of Pathology, University of California San Diego, La Jolla, California, USA
- Research Ethics Program, University of California San Diego, La Jolla, California, USA
- Susan J Little
- Department of Medicine, University of California San Diego, La Jolla, California, USA
2
Müller L, Gangadharaiah R, Klein SC, Perry J, Bernstein G, Nurkse D, Wailes D, Graham R, El-Kareh R, Mehta S, Vinterbo SA, Aronoff-Spencer E. An open access medical knowledge base for community driven diagnostic decision support system development. BMC Med Inform Decis Mak 2019; 19:93. [PMID: 31029130 PMCID: PMC6486985 DOI: 10.1186/s12911-019-0804-1] [Received: 07/30/2018] [Accepted: 03/21/2019]
Abstract
INTRODUCTION While early diagnostic decision support systems were built around knowledge bases, more recent systems employ machine learning to consume large amounts of health data. We argue curated knowledge bases will remain an important component of future diagnostic decision support systems by providing ground truth and facilitating explainable human-computer interaction, but that prototype development is hampered by the lack of freely available computable knowledge bases. METHODS We constructed an open access knowledge base and evaluated its potential in the context of a prototype decision support system. We developed a modified set-covering algorithm to benchmark the performance of our knowledge base compared to existing platforms. Testing was based on case reports from selected literature and medical student preparatory material. RESULTS The knowledge base contains over 2000 ICD-10 coded diseases and 450 RX-Norm coded medications, with over 8000 unique observations encoded as SNOMED or LOINC semantic terms. Using 117 medical cases, we found the accuracy of the knowledge base and test algorithm to be comparable to established diagnostic tools such as Isabel and DXplain. Our prototype, as well as DXplain, showed the correct answer as "best suggestion" in 33% of the cases. While we identified shortcomings during development and evaluation, we found the knowledge base to be a promising platform for decision support systems. CONCLUSION We built and successfully evaluated an open access knowledge base to facilitate the development of new medical diagnostic assistants. This knowledge base can be expanded and curated by users and serve as a starting point to facilitate new technology development and system improvement in many contexts.
Affiliation(s)
- Robert El-Kareh
- Division of Biomedical Informatics, UCSD, San Diego, CA USA
- Division of Hospital Medicine, UCSD, San Diego, CA USA
- Sanjay Mehta
- Division of Infectious Diseases, UCSD, San Diego, CA USA
- Staal A. Vinterbo
- Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Trondheim, Norway
- Eliah Aronoff-Spencer
- Design Lab, UCSD, San Diego, CA USA
- Division of Infectious Diseases, UCSD, San Diego, CA USA
3
Schairer C, Mehta SR, Vinterbo SA, Hoenigl M, Kalichman M, Little S. Perceptions of molecular epidemiology studies of HIV among stakeholders. J Public Health Res 2017; 6:992. [PMID: 29291190 PMCID: PMC5736996 DOI: 10.4081/jphr.2017.992] [Received: 05/11/2017] [Revised: 08/21/2017] [Accepted: 09/04/2017]
Abstract
Background: Advances in viral sequence analysis make it possible to track the spread of infectious pathogens, such as HIV, within a population. When used to study HIV, these analyses (i.e., molecular epidemiology) potentially allow inference of the identity of individual research subjects. Current privacy standards are likely insufficient for this type of public health research. To address this challenge, it will be important to understand how stakeholders feel about the benefits and risks of such research. Design and Methods: To better understand perceived benefits and risks of these research methods, in-depth qualitative interviews were conducted with HIV-infected individuals, individuals at high risk for contracting HIV, and professionals in HIV care and prevention. To gather additional perspectives, attendees of a public lecture on molecular epidemiology were asked to complete an informal questionnaire. Results: Among those interviewed and polled, there was near unanimous support for using molecular epidemiology to study HIV. Questionnaires showed strong agreement about benefits of molecular epidemiology, but diverse attitudes regarding risks. Interviewees acknowledged several risks, including privacy breaches and provocation of anti-gay sentiment. The interviews also demonstrated a possibility that misunderstandings about molecular epidemiology may affect how risks and benefits are evaluated. Conclusions: While nearly all study participants agree that the benefits of HIV molecular epidemiology outweigh the risks, concerns about privacy must be addressed to ensure continued trust in research institutions and willingness to participate in research. Significance for public health: When molecular epidemiology is used to study HIV, it can demonstrate how HIV infections are related and how to target prevention efforts. Applying these analyses for maximal benefit in the fight against HIV would almost certainly make individuals whose data are analyzed vulnerable to discovery. However, absolute protection of this sensitive information would require that research into these methods not be done. The success of HIV molecular epidemiology will depend on finding a balance between public health and the interests of individuals living with HIV. The stakeholders interviewed in this study agreed that molecular epidemiology should be used to study HIV epidemics and transmission despite risks to privacy. However, these interviews also highlighted the difficulty of understanding molecular epidemiology and its privacy implications. For HIV molecular epidemiology to continue, privacy protections must go beyond simply masking traditional identifiers and assuming participants are informed enough to consent to the risks.
Affiliation(s)
- Sanjay R Mehta
- Department of Medicine, University of California San Diego, CA
- Department of Medicine, San Diego Veterans Affairs Medical Center, San Diego, CA
- Department of Pathology, University of California San Diego, CA, USA
- Martin Hoenigl
- Department of Medicine, University of California San Diego, CA
- Michael Kalichman
- Department of Pathology, University of California San Diego, CA, USA
- Susan Little
- Department of Medicine, University of California San Diego, CA
4
Abstract
Rapid growth in the genetic sequencing of pathogens in recent years has led to the creation of large sequence databases. This aggregated sequence data can be very useful for tracking and predicting epidemics of infectious diseases. However, the balance between the potential public health benefit and the risk to personal privacy for individuals whose genetic data (personal or pathogen) are included in such work has been difficult to delineate, because neither the true benefit nor the actual risk to participants has been adequately defined. Existing approaches to minimise the risk of privacy loss to participants are based on de-identification of data by removal of a predefined set of identifiers. These approaches neither guarantee privacy nor protect the usefulness of the data. We propose a new approach to privacy protection that will quantify the risk to participants, while still maximising the usefulness of the data to researchers. This emerging standard in privacy protection and disclosure control, which is known as differential privacy, uses a process-driven rather than data-centred approach to protecting privacy.
Affiliation(s)
- Sanjay R Mehta
- Division of Infectious Diseases, University of California, San Diego, CA, USA.
- Staal A Vinterbo
- Division of Biomedical Informatics, University of California, San Diego, CA, USA
- Susan J Little
- Division of Infectious Diseases, University of California, San Diego, CA, USA
5
Abstract
OBJECTIVE Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. METHODS A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. RESULTS Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. CONCLUSIONS Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.
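To make the idea of a perturbed count with a quantifiable privacy level concrete, here is a minimal sketch of the standard two-sided geometric mechanism for counting queries. It is not the user-tailored mechanism described in the abstract, which also takes user preferences on scale and direction into account. The function names `two_sided_geometric` and `perturbed_count` and the parameter `epsilon` are illustrative assumptions.

```python
import math
import random


def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Draw integer noise with P(k) proportional to exp(-epsilon * |k|).

    Standard construction (an assumption here, not the abstract's mechanism):
    the difference of two i.i.d. geometric variables with ratio exp(-epsilon).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    log_alpha = -epsilon  # log of alpha = exp(-epsilon)

    def geometric() -> int:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / log_alpha))

    return geometric() - geometric()


def perturbed_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Return the true count plus two-sided geometric noise.

    For a counting query (sensitivity 1) this gives epsilon-differential
    privacy; smaller epsilon means more noise and stronger privacy.
    """
    return true_count + two_sided_geometric(epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical cohort size of 120 queried at three privacy levels.
    for eps in (0.1, 1.0, 5.0):
        print(eps, perturbed_count(120, eps, rng))
```

Each call spends `epsilon` from a privacy budget, so an administrator tracking expenditure in the sense of the abstract could, under this sketch, simply sum the `epsilon` values of the queries answered.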
Affiliation(s)
- Staal A Vinterbo
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California 92093-0728, USA.
6
Ohno-Machado L, Bafna V, Boxwala AA, Chapman BE, Chapman WW, Chaudhuri K, Day ME, Farcas C, Heintzman ND, Jiang X, Kim H, Kim J, Matheny ME, Resnic FS, Vinterbo SA. iDASH: integrating data for analysis, anonymization, and sharing. J Am Med Inform Assoc 2012; 19:196-201. [PMID: 22081224 PMCID: PMC3277627 DOI: 10.1136/amiajnl-2011-000538] [Received: 08/11/2011] [Accepted: 08/15/2011]
Abstract
iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.
Affiliation(s)
- Lucila Ohno-Machado
- Division of Biomedical Informatics, University of California San Diego, La Jolla, California 92093, USA.
7
Kim J, Grillo JM, Boxwala AA, Jiang X, Mandelbaum RB, Patel BA, Mikels D, Vinterbo SA, Ohno-Machado L. Anomaly and signature filtering improve classifier performance for detection of suspicious access to EHRs. AMIA Annu Symp Proc 2011; 2011:723-731. [PMID: 22195129 PMCID: PMC3243249]
Abstract
Our objective is to facilitate semi-automated detection of suspicious access to EHRs. Previously we have shown that a machine learning method can play a role in identifying potentially inappropriate access to EHRs. However, the problem of sampling informative instances to build a classifier still remained. We developed an integrated filtering method leveraging both anomaly detection based on symbolic clustering and signature detection, a rule-based technique. We applied the integrated filtering to 25.5 million access records in an intervention arm, and compared this with 8.6 million access records in a control arm where no filtering was applied. On the training set with cross-validation, the AUC was 0.960 in the control arm and 0.998 in the intervention arm. The difference in false negative rates on the independent test set was significant, $P = 1.6 \times 10^{-6}$. Our study suggests that utilization of integrated filtering strategies to facilitate the construction of classifiers can be helpful.
Affiliation(s)
- Jihoon Kim
- Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
8
Abstract
The goal of data anonymization is to allow the release of scientifically useful data in a form that protects the privacy of its subjects. This requires more than simply removing personal identifiers from the data, because an attacker can still use auxiliary information to infer sensitive individual information. Additional perturbation is necessary to prevent these inferences, and the challenge is to perturb the data in a way that preserves its analytic utility. No existing anonymization algorithm provides both perfect privacy protection and perfect analytic utility. We make the new observation that anonymization algorithms are not required to operate in the original vector-space basis of the data, and many algorithms can be improved by operating in a judiciously chosen alternate basis. A spectral basis derived from the data's eigenvectors is one that can provide substantial improvement. We introduce the term spectral anonymization to refer to an algorithm that uses a spectral basis for anonymization, and we give two illustrative examples. We also propose new measures of privacy protection that are more general and more informative than existing measures, and a principled reference standard with which to define adequate privacy protection.
Affiliation(s)
- Thomas A Lasko
- Google, Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043.
9
Curtis DW, Pino EJ, Bailey JM, Shih EI, Waterman J, Vinterbo SA, Stair TO, Guttag JV, Greenes RA, Ohno-Machado L. SMART--an integrated wireless system for monitoring unattended patients. J Am Med Inform Assoc 2007; 15:44-53. [PMID: 17947629 DOI: 10.1197/jamia.m2016]
Abstract
Monitoring vital signs and locations of certain classes of ambulatory patients can be useful in overcrowded emergency departments and at disaster scenes, both on-site and during transportation. To be useful, such monitoring needs to be portable and low cost, and have minimal adverse impact on emergency personnel, e.g., by not raising an excessive number of alarms. The SMART (Scalable Medical Alert Response Technology) system integrates wireless patient monitoring (ECG, SpO₂), geo-positioning, signal processing, targeted alerting, and a wireless interface for caregivers. A prototype implementation of SMART was piloted in the waiting area of an emergency department and evaluated with 145 post-triage patients. System deployment aspects were also evaluated during a small-scale disaster-drill exercise.
Affiliation(s)
- Dorothy W Curtis
- Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, 77 Massachusetts Ave., BLDG 32-G914, Cambridge, MA 02139, USA.
10
Abstract
Background: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. Results: It is shown that the tagging problem is NP-hard but approximable within $1 + \ln((n^2 - n)/2)$ for $n$ haplotypes, but not approximable within $(1 - \varepsilon)\ln(n/2)$ for any $\varepsilon > 0$ unless $\mathrm{NP} \subset \mathrm{DTIME}(n^{\log\log n})$. A simple, very easily implementable algorithm that exhibits the above upper bound on solution quality is presented. This algorithm has running time $O((2m - p + 1)) \le O(m(n^2 - n)/2)$ where $p \le \min(n, m)$ for $n$ haplotypes of size $m$. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. Conclusion: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single processor machine. Hence, significant improvement in computational efforts expended can only be expected if the computational effort is distributed and done in parallel.
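One way to picture the tagging problem is as set cover over pairs of haplotypes: each SNP position "covers" the pairs it tells apart, and a tag set must cover every pair. The sketch below is the textbook greedy set-cover heuristic applied to that formulation. It is an illustrative assumption, not necessarily the algorithm presented in the paper, and the function name `greedy_tag_snps` and the string encoding of haplotypes are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def greedy_tag_snps(haplotypes: Sequence[str]) -> List[int]:
    """Pick SNP positions that distinguish every pair of distinct haplotypes.

    Greedy set cover: repeatedly choose the position that separates the
    largest number of still-unseparated haplotype pairs.
    """
    if not haplotypes:
        return []
    m = len(haplotypes[0])
    if any(len(h) != m for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    # Universe to cover: pairs of haplotypes that differ somewhere.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }
    chosen: List[int] = []
    while uncovered:
        # Position separating the most uncovered pairs (ties: lowest index).
        best = max(
            range(m),
            key=lambda s: sum(
                haplotypes[i][s] != haplotypes[j][s] for i, j in uncovered
            ),
        )
        chosen.append(best)
        uncovered = {
            (i, j)
            for i, j in uncovered
            if haplotypes[i][best] == haplotypes[j][best]
        }
    return sorted(chosen)


if __name__ == "__main__":
    haps = ["000", "011", "101", "110"]
    print(greedy_tag_snps(haps))
```

With $n$ haplotypes there are at most $(n^2 - n)/2$ pairs to cover, which is the quantity that appears inside the logarithm of the approximation bound quoted in the abstract.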
Affiliation(s)
- Staal A Vinterbo
- Decision Systems Group, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
- Stephan Dreiseitl
- Dept. of Software Engineering, Upper Austria University of Applied Sciences, Hagenberg, Austria
- Lucila Ohno-Machado
- Decision Systems Group, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
11
Abstract
MOTIVATION Interpretation of classification models derived from gene-expression data is usually not simple, yet it is an important aspect in the analytical process. We investigate the performance of small rule-based classifiers based on fuzzy logic in five datasets that are different in size, laboratory origin and biomedical domain. RESULTS The classifiers resulted in rules that can be readily examined by biomedical researchers. The fuzzy-logic-based classifiers compare favorably with logistic regression in all datasets. AVAILABILITY Prototype available upon request.
Affiliation(s)
- Staal A Vinterbo
- Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School/Massachusetts Institute of Technology, Boston, USA.
12
Abstract
We investigate the use of perceptrons for classification of microarray data where we use two datasets that were published in [Nat. Med. 7 (6) (2001) 673] and [Science 286 (1999) 531]. The classification problem studied by Khan et al. is related to the diagnosis of small round blue cell tumours (SRBCT) of childhood which are difficult to classify both clinically and via routine histology. Golub et al. study acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). We used a simulated annealing-based method in learning a system of perceptrons, each obtained by resampling of the training set. Our results are comparable to those of Khan et al. and Golub et al., indicating that there is a role for perceptrons in the classification of tumours based on gene-expression data. We also show that it is critical to perform feature selection in this type of models, i.e. we propose a method for identifying genes that might be significant for the particular tumour types. For SRBCTs, zero error on test data has been obtained for only 13 out of 2308 genes; for the ALL/AML problem, we have zero error for 9 out of 7129 genes that are used for the classification procedure. Furthermore, we provide evidence that Epicurean-style learning and simulated annealing-based search are both essential for obtaining the best classification results.
Affiliation(s)
- Andreas Albrecht
- Computer Science Department, University of Hertfordshire, Hatfield, Herts AL10 9AB, UK.
13
Ohno-Machado L, Vinterbo SA, Dreiseitl S. Effects of data anonymization by cell suppression on descriptive statistics and predictive modeling performance. Proc AMIA Symp 2001:503-7. [PMID: 11825239 PMCID: PMC2243599]
Abstract
Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppression of selected cells. Using table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals that a particular case must become indistinguishable from. This number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data. Preventing inferences may be important to preserve confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data, but might also be used for making inferences on particular individuals, which is a feature that may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.
Affiliation(s)
- L Ohno-Machado
- Decision Systems Group, Brigham & Women's Hospital, Harvard Medical School, Boston, MA, USA
14
Vinterbo SA, Ohno-Machado L, Dreiseitl S. Hiding information by cell suppression. Proc AMIA Symp 2001:726-30. [PMID: 11825281 PMCID: PMC2243346]
Abstract
Joining relational data can jeopardize patient confidentiality if disseminated data for research can be joined with publicly available data containing, for example, explicit identifiers. Ambiguity in data hinders the construction of primary keys that are of importance when joining data tables. We define two values to be indiscernible if they are the same or at least one of them is a special value. Two rows in a data table are indiscernible if their corresponding entries are indiscernible. We further define a table to be k-ambiguous if each row is indiscernible from at least k rows in the same table. We present two simple heuristics to make a table k-ambiguous by cell suppression, and compare them on example data.
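The definitions in this abstract translate almost directly into code. The following is a minimal sketch of a k-ambiguity check only, not of the two suppression heuristics the paper presents. It assumes that `None` plays the role of the "special value" for a suppressed cell and that a row counts as indiscernible from itself; both are assumptions, and the function names are hypothetical.

```python
from typing import Hashable, Optional, Sequence

# Assumed stand-in for the abstract's "special value" marking a suppressed cell.
SUPPRESSED = None

Row = Sequence[Optional[Hashable]]


def values_indiscernible(a: Optional[Hashable], b: Optional[Hashable]) -> bool:
    """Two values are indiscernible if equal or if either one is suppressed."""
    return a == b or a is SUPPRESSED or b is SUPPRESSED


def rows_indiscernible(r: Row, s: Row) -> bool:
    """Two rows are indiscernible if all corresponding entries are."""
    return len(r) == len(s) and all(
        values_indiscernible(a, b) for a, b in zip(r, s)
    )


def is_k_ambiguous(table: Sequence[Row], k: int) -> bool:
    """Check that each row is indiscernible from at least k rows of the table.

    Assumption: the count includes the row itself.
    """
    return all(
        sum(rows_indiscernible(r, s) for s in table) >= k for r in table
    )


if __name__ == "__main__":
    original = [("F", 34), ("F", 35), ("M", 34)]
    suppressed = [("F", None), ("F", None), (None, 34)]
    print(is_k_ambiguous(original, 2))
    print(is_k_ambiguous(suppressed, 3))
```

Note that indiscernibility defined this way is not transitive, since a suppressed cell matches any value, so rows cannot simply be grouped into equivalence classes; the check above compares every pair instead.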
Affiliation(s)
- S A Vinterbo
- Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.