Preserving & Randomizing Data Responses in Web Application

Today, the privacy of individual data has become a prime concern as huge amounts of data are generated online. Securing and preserving data has been a key focus for several decades. The issue with private data online is that it is sensitive and confidential. To address this issue, a group of mathematicians and cryptographers introduced the concept of differential privacy.
Differential privacy quantifies numerically how much information a computation leaks in total, which offers different ways to adjust the data as needed while keeping the original data private. IBM’s Cost of a Data Breach report put the average cost of a data breach in 2020 at close to $4 million. Differential privacy helps maintain the privacy of sensitive and personal data in compliance with data privacy and protection regulations such as GDPR, CCPA, HIPAA, and GLBA.
The project aims to integrate the mechanisms, models, and tools provided by Diffprivlib. Its primary purpose is to develop a user-friendly, open-source web application. The application is built in Python and experiments with a dataset to show how differential privacy algorithms affect accuracy and privacy at different values of epsilon.
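The effect of epsilon on accuracy can be illustrated with a minimal sketch of the Laplace mechanism applied to a counting query. This is a standard-library stand-in written for illustration, not Diffprivlib's own implementation; the function names (`laplace_noise`, `private_count`, `mean_abs_error`) and the synthetic age data are assumptions introduced here.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Release a count with the Laplace mechanism (hypothetical helper)."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon, trials=2000, seed=0):
    """Average absolute error of the private count over repeated releases."""
    rng = random.Random(seed)
    ages = [rng.randint(18, 90) for _ in range(1000)]  # synthetic data, assumed

    def is_senior(age):
        return age >= 65

    true_count = sum(1 for a in ages if is_senior(a))
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(ages, is_senior, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error = {mean_abs_error(eps):.3f}")
```

A smaller epsilon gives stronger privacy but larger noise: for this mechanism the expected absolute error equals sensitivity / epsilon, so accuracy improves as epsilon grows.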
Approach
The main focus of the project is to develop an integrated, user-friendly application with high resilience and sustainability using IBM’s Diffprivlib library, one that can perform computations based on digital technologies such as AI/ML, cryptography, and security.
The model was developed following the phases of the software development life cycle (SDLC) and then the data science (machine learning) lifecycle. The methodology consists of six steps that follow a Waterfall approach: gathering data, data preprocessing, model training, model testing, model deployment, and model monitoring. The project manages and runs the different privacy mechanisms in the Diffprivlib library. It currently supports only supervised datasets, for classification and regression models. Future work will extend the deployment mechanism to unsupervised raw data using the K-means algorithm.
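The first four steps can be sketched for a small supervised classification task. The sketch below is an assumption-laden illustration using only the standard library: the model is a nearest-centroid classifier with Laplace-noised class means, chosen as a simple stand-in and not one of Diffprivlib's models, and every name, bound, and data value is hypothetical. Deployment and monitoring are outside its scope.

```python
import random


def laplace(scale, rng):
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def gather(rng, n=400):
    """Step 1, gathering data: synthetic (feature, label) pairs (assumed data)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 30.0 if label == 0 else 70.0
        rows.append((rng.gauss(center, 8.0), label))
    return rows


def preprocess(rows, lower=0.0, upper=100.0):
    """Step 2, preprocessing: clip features to public bounds so sensitivity is known."""
    return [(min(max(x, lower), upper), y) for x, y in rows]


def train(rows, epsilon, rng, upper=100.0):
    """Step 3, training: one noisy mean per class."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in rows if y == label]
        # Each class spends epsilon/2 on its sum and epsilon/2 on its count.
        # The classes are disjoint, so the overall budget stays at epsilon.
        # With features clipped to [0, upper], one record moves the sum by at most upper.
        noisy_sum = sum(xs) + laplace(upper / (epsilon / 2), rng)
        noisy_count = len(xs) + laplace(1.0 / (epsilon / 2), rng)
        centroids[label] = noisy_sum / max(noisy_count, 1.0)
    return centroids


def predict(centroids, x):
    """Assign x to the class whose noisy centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def evaluate(centroids, rows):
    """Step 4, testing: accuracy on held-out rows."""
    hits = sum(1 for x, y in rows if predict(centroids, x) == y)
    return hits / len(rows)


def run_pipeline(epsilon, seed=0):
    rng = random.Random(seed)
    rows = preprocess(gather(rng))
    split = int(0.8 * len(rows))  # 80/20 train/test split
    model = train(rows[:split], epsilon, rng)
    return evaluate(model, rows[split:])


if __name__ == "__main__":
    for eps in (0.05, 1.0):
        print(f"epsilon={eps}: test accuracy = {run_pipeline(eps):.3f}")
```

Clipping in the preprocessing step is what makes the noise scale computable: without known bounds on the features, the sensitivity of the class sums would be unbounded.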
Results
The open-source web application was built using IBM’s differential privacy library, Diffprivlib. The differentially private results are presented as graphs and simple computed values. The application is cost-effective and easy to access, and it helps achieve differential privacy to a certain degree.
Keywords: Differential Privacy, Python Programming, Open-Source Library, Data Science, Machine Learning, Data Analytics