Abstract
PURPOSE
A set of deidentified patient data compliant with the Health Insurance Portability and Accountability Act (HIPAA) was compiled, the data lost as a function of unique data elements (UDEs) were measured, and the deidentified data were tested for potential for reidentification.
METHODS
After approval by the institutional review board of an integrated health system, a limited-data set was created by querying the health system's pharmacy, administrative, and financial files for patients discharged between January 1 and December 31, 2000. Using the HIPAA "safe-harbor" method, this limited-data set was converted into a deidentified-data table for future statistical analysis, and UDEs in both data sets were identified and quantified. Unique combinations of commonly available data were also identified.
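The search for unique combinations described above can be illustrated with a minimal sketch. It is not the study's actual procedure: the function name, the field names (sex, birth_year, zip3), and the sample rows are all hypothetical and chosen only for illustration. The idea is that any combination of commonly available values that occurs in exactly one record is a candidate for reidentification.

```python
from collections import Counter

def find_unique_combinations(records, fields):
    """Return the field-value combinations that occur in exactly one record."""
    combos = Counter(tuple(rec[f] for f in fields) for rec in records)
    return sorted(c for c, n in combos.items() if n == 1)

# Hypothetical rows; these fields and values do not come from the study.
sample = [
    {"sex": "F", "birth_year": 1931, "zip3": "551"},
    {"sex": "F", "birth_year": 1931, "zip3": "551"},
    {"sex": "M", "birth_year": 1948, "zip3": "553"},
]

# Combinations held by a single record stand out as potentially identifying.
unique = find_unique_combinations(sample, ["sex", "birth_year", "zip3"])
```

In this sample only the third row has a combination shared with no other record, so it alone is flagged.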
RESULTS
The limited-data set, representing 4,738 patient discharges, contained 810,456 UDEs in 322,657 records organized into four data tables (demographics, diagnoses, medication orders, and laboratory test results). The deidentified-data table, representing 4,722 discharges, contained 562,171 UDEs in 128 data-type columns in a single data table. About 31% of the data volume was lost. Much of the information lost was of the type that is of special interest to researchers (e.g., time between episodes of care, ages of >89 years).
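The reported loss can be checked from the UDE counts given above. The short calculation below assumes that "data volume" is measured in UDEs, which the abstract implies but does not state outright; the variable names are illustrative.

```python
# Counts reported in the results.
limited_udes = 810_456        # UDEs in the limited-data set
deidentified_udes = 562_171   # UDEs in the deidentified-data table

# Assumption: "data volume" is measured in UDEs.
udes_lost = limited_udes - deidentified_udes
percent_lost = 100 * udes_lost / limited_udes

# Discharges no longer represented after deidentification.
discharges_dropped = 4_738 - 4_722

print(f"UDEs lost: {udes_lost:,} ({percent_lost:.1f}%)")
print(f"Discharges dropped: {discharges_dropped}")
```

Under that assumption the loss is 248,285 UDEs, which rounds to the reported figure of about 31%.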
CONCLUSION
A study suggested that deidentified patient data with a reasonable degree of protection against reidentification were less complete than may be necessary for good research.