UK professors and researchers have developed the first general framework for safeguarding privacy in deep learning built over Pytorch. They have reported their findings in the paper “A generic framework for privacy preserving deep learning.”
Using constructs that preserve privacy
This paper introduces a transparent framework to preserve privacy while using deep learning in PyTorch. This framework puts a premium on data ownership and its securing processing. It introduces a value representation which is based on chains of commands and tensors. The resulting abstraction allows implementation of complex constructs that preserve privacy. Constructs like federated learning, secure multiparty computation, and differential privacy are used.
Boston Housing and Pima Indian Diabetes datasets are used in the paper to show early results. Except for differential privacy, other privacy features do not affect prediction accuracy. The current framework implementation introduces a significant overhead which is to be addressed in a later development stage.
Deep learning operations in untrusted environments
To perform operations in untrusted environments without disclosing data, Secure Multiparty Computation (SMPC) is used, which is a popular approach. In machine learning, SMPC can protect the model weights while allowing multiple worker nodes to participate in training with their own datasets. This is known as federated learning (FL). These securely trained models are still vulnerable to reverse engineering attacks. This vulnerability is addressed by differentially private (DP) methods.
The standardized PyTorch framework contains:
- A chain structure in which performing transformations or sending tensors to other workers can be shown as a chain of operations.
- For a virtual to real context of federated learning, a concept called Virtual Workers is introduced. These Virtual Workers reside in the same machine and do not communicate over the network.
Results and conclusion
A reasonably small overhead is observed when using Web Socket workers in place of Virtual Workers. This overhead is due to the low network latency when communication takes place between different local tabs. When using the Pima Indian Diabetes dataset, the same overhead in performance is observed. The design in this paper relies on chains of tensors exchanged between the local and remote workers. Decreasing training time is an issue to be addressed. Another concern is securing MPC to avoid malicious attempts targeted at corrupting the data or the model.
For more details, read the research paper.