Summary:
Data cleaning, also known as data cleansing or data scrubbing, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in datasets. It is a crucial step in statistical analysis and machine learning, as the quality of the input data directly affects the reliability and validity of the results of any analysis. Outsourcing the data cleaning process raises privacy and confidentiality challenges, and recent research has focused on privacy- and confidentiality-preserving data cleaning. In this paper, we propose a practical qualitative data cleaning system that preserves the privacy and confidentiality of the data by utilising trusted execution environments. We have implemented our system in Python and deployed it using the Gramine library operating system on Intel Software Guard Extensions (SGX) hardware.
Author(s):
Anirban Basu
Hitachi, Ltd.
Japan
Masayuki Yoshino
Hitachi, Ltd.
Japan
Minako Toba
Hitachi, Ltd.
Japan