The Health Insurance Portability and Accountability Act of 1996 (“HIPAA”) protects the privacy of patients and sets forth guidelines on how their private health information can be shared. Though the privacy of a patient must be protected, the legal right of a business to sell health information of patients has been upheld by the Supreme Court of the United States.
Data de-identification is the process of removing Personally Identifiable Information (PII) from any document or other media, including an individual’s Protected Health Information (PHI).
The HIPAA Safe Harbor Method is a precise standard for the de-identification of personal health information when disclosed for secondary purposes.
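It requires the removal of 18 types of identifiers from a dataset, including names, geographic subdivisions smaller than a state, dates (other than the year) directly related to an individual, telephone numbers, email addresses, Social Security numbers, and medical record numbers.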
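As a rough illustration of what Safe Harbor-style processing can look like on a single tabular record, the sketch below drops a few direct identifiers and generalizes three fields in the way the rule describes: ZIP codes are cut to their first three digits (or set to 000 for sparsely populated areas), dates are reduced to the year, and ages over 89 are grouped as "90+". The field names and the contents of `SPARSE_ZIP3` are hypothetical placeholders, and the sketch covers only a handful of the 18 identifier types, so it should not be read as a complete or compliant implementation.

```python
from datetime import date

# Hypothetical column names for direct identifiers; a real schema will differ.
DIRECT_IDENTIFIER_FIELDS = {"name", "phone", "email", "ssn", "medical_record_number"}

# Placeholder values standing in for the 3-digit ZIP prefixes that cover
# 20,000 or fewer people. This is NOT an authoritative list.
SPARSE_ZIP3 = {"036", "059"}


def safe_harbor_sketch(record):
    """Return a copy of `record` with a few Safe Harbor-style transformations."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop direct identifiers entirely
        if key == "zip":
            zip3 = str(value)[:3]
            out["zip3"] = "000" if zip3 in SPARSE_ZIP3 else zip3
        elif key == "admission_date":
            out["admission_year"] = value.year  # keep the year only
        elif key == "age":
            out["age"] = "90+" if value >= 90 else value
        else:
            out[key] = value  # non-identifying fields pass through unchanged
    return out
```

Calling `safe_harbor_sketch` on a record with a name, a full ZIP code, an admission date, and an age of 93 would return only the generalized fields plus any clinical fields that were not listed as identifiers.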
ARX is an open-source tool that anonymizes sensitive personal information. It supports a range of privacy and risk models, techniques for data transformation, and techniques to analyze the utility of output data.
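One widely used privacy model of this kind is k-anonymity, which requires every combination of quasi-identifier values to be shared by at least k records. The sketch below only measures k for a small table. It does not use ARX or its API, and the column names in the usage note are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values (0 for an empty table)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0
```

For example, calling `k_anonymity(rows, ["zip3", "birth_year"])` on a table where one person has a unique ZIP prefix and birth year returns 1, signalling that further generalization or suppression would be needed before release.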
The deid software package includes code and dictionaries that automatically locate and remove PHI in free text from medical records. It was developed using over 2,400 nursing notes that were methodically de-identified by a multi-pass process including various automated methods as well as reviews by multiple experts working independently.
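To give a feel for the general idea of combining patterns with dictionaries, here is a minimal sketch that replaces a few PHI-like patterns in free text with placeholder tags. It is not the deid package's code. The regular expressions and the tiny `KNOWN_NAMES` dictionary are assumptions made for illustration, and a real system needs far broader coverage plus expert review.

```python
import re

# Illustrative patterns only; real clinical text needs many more variants.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
]

# Hypothetical name dictionary, stored in lowercase.
KNOWN_NAMES = {"smith", "jones"}


def scrub(text):
    """Replace pattern matches and dictionary names with placeholder tags."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)

    def replace_name(match):
        word = match.group(0)
        return "[NAME]" if word.lower() in KNOWN_NAMES else word

    # Check each alphabetic word against the name dictionary.
    return re.sub(r"[A-Za-z]+", replace_name, text)
```

A note such as "Pt Smith seen 03/14/2011, call 617-555-0123." would come back with the name, date, and phone number replaced by tags, while the rest of the sentence is left untouched.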
Privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption
Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions.
We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data.
We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics.
Using the system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone.
Our work represents a necessary key step towards overcoming the privacy hurdle in enabling multi-centric scientific collaborations.
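To make the federated idea concrete, the sketch below computes a Kaplan-Meier curve from per-site summary counts instead of pooled patient records. It is not FAMHE's implementation: the counts are summed in plaintext here, whereas a system built on multiparty homomorphic encryption would perform that aggregation on encrypted values. All function names and the data layout are assumptions made for this example.

```python
from collections import Counter


def local_counts(observations):
    """Summarize one site's data.

    `observations` is a list of (time, event_observed) pairs. Returns two
    Counters keyed by time: observed events, and all exits (events plus
    censored subjects).
    """
    events, exits = Counter(), Counter()
    for time, observed in observations:
        exits[time] += 1
        if observed:
            events[time] += 1
    return events, exits


def federated_kaplan_meier(sites):
    """Combine per-site counts and return [(time, survival_probability), ...]."""
    events, exits = Counter(), Counter()
    for observations in sites:
        site_events, site_exits = local_counts(observations)
        # Plain addition stands in for the encrypted aggregation step.
        events.update(site_events)
        exits.update(site_exits)

    at_risk = sum(exits.values())
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # subjects leaving at time t are no longer at risk
    return curve
```

Because only the per-time counts leave each site, the combined curve matches what a centralized analysis of the pooled records would produce, which is the property the federated setting relies on.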