m-Privacy for Collaborative Data Publishing (2013)





In this paper, we consider the collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. We consider a new type of “insider attack” by colluding data providers who may use their own data records (a subset of the overall data) to infer the data records contributed by other data providers. The paper addresses this new threat and makes several contributions. First, we introduce the notion of m-privacy, which guarantees that the anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. Second, we present heuristic algorithms that exploit the monotonicity of privacy constraints to check m-privacy efficiently for a given group of records. Third, we present a data provider-aware anonymization algorithm with adaptive m-privacy checking strategies to ensure high utility and m-privacy of the anonymized data efficiently. Finally, we propose secure multi-party computation protocols for collaborative data publishing with m-privacy. All protocols are extensively analyzed, and their security and efficiency are formally proved. Experiments on real-life datasets suggest that our approach achieves utility and efficiency better than or comparable to existing and baseline algorithms while satisfying m-privacy.
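To make the m-privacy idea concrete, here is a minimal sketch that checks a single equivalence group against k-anonymity. It is not the paper's algorithm: the class name `MPrivacyCheck`, the method `isMPrivate`, and the representation of a group as per-provider record counts are all assumptions made for illustration. Because k-anonymity is monotonic (adding records to a group that satisfies it cannot break it), a smaller coalition always leaves a superset of what a larger one leaves, so the sketch only enumerates coalitions of exactly m providers.

```java
/**
 * Hypothetical sketch: m-privacy of ONE equivalence group with respect to
 * k-anonymity. All names here are illustrative assumptions.
 */
class MPrivacyCheck {

    /**
     * @param counts counts[i] is the number of records provider i put in the group
     * @param m      coalition size to defend against (0 <= m <= n - 1)
     * @param k      k-anonymity threshold
     * @return true if at least k records remain after removing the records
     *         of any coalition of exactly m providers
     */
    public static boolean isMPrivate(int[] counts, int m, int k) {
        if (m < 0 || m >= counts.length) {
            throw new IllegalArgumentException("m must be between 0 and n - 1");
        }
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return allCoalitionsSafe(counts, m, k, 0, total);
    }

    // Picks the remaining `left` coalition members from indices >= start and
    // tests the records that are left once the coalition is complete.
    private static boolean allCoalitionsSafe(int[] counts, int left, int k,
                                             int start, int remaining) {
        if (left == 0) {
            return remaining >= k;
        }
        for (int i = start; i <= counts.length - left; i++) {
            if (!allCoalitionsSafe(counts, left - 1, k, i + 1, remaining - counts[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] counts = {3, 2, 2, 1}; // four providers, eight records in the group
        System.out.println(isMPrivate(counts, 1, 5)); // true: worst case leaves 8 - 3 = 5
        System.out.println(isMPrivate(counts, 2, 5)); // false: removing 3 + 2 leaves 3
    }
}
```

For a non-monotonic constraint this shortcut would not be valid, and more coalitions would have to be examined.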


Most work has focused on a single data provider setting and considered the data recipient as an attacker. A large body of literature assumes limited background knowledge of the attacker and defines privacy using a relaxed adversarial notion that considers specific types of attacks. Representative principles include k-anonymity, l-diversity, and t-closeness. A few recent works have modeled instance-level background knowledge as corruption and studied perturbation techniques under these syntactic privacy notions.


1. Collaborative data publishing can be considered as a multi-party computation problem, in which multiple providers wish to compute an anonymized view of their data without disclosing any private and sensitive information.

2. The problem of inferring information from anonymized data has been widely studied in a single data provider setting. A data recipient that is an attacker, e.g., P0, attempts to infer additional information about data records using the published data, T∗, and background knowledge, BK.


We consider the collaborative data publishing setting with horizontally partitioned data across multiple data providers, each contributing a subset of records Ti. As a special case, a data provider could be the data owner itself, contributing its own records. This is a very common scenario in social networking and recommendation systems. Our goal is to publish an anonymized view of the integrated data such that no data recipient, including the data providers themselves, is able to compromise the privacy of the individual records provided by other parties.


Compared to our preliminary version, our new contributions extend the above results. First, we adapt the privacy verification and anonymization mechanisms to work for m-privacy with respect to any privacy constraint, including non-monotonic ones. We list all necessary privacy checks and prove that no smaller set of checks is enough to confirm m-privacy. Second, we propose SMC protocols for secure m-privacy verification and anonymization. For all protocols, we prove their security and complexity and experimentally confirm their efficiency.


1. Patient Registration

2. Attacks by External Data Recipient Using Anonymized Data

3. Attacks by Data Providers Using Anonymized Data and Their Own Data

4. Doctor Login

5. Admin Login

Modules Description

Patient Registration:

In this module, patients who need treatment register their details, such as name, age, the disease affecting them, and email. These details are maintained in a database by the hospital management. Only Doctors can see all patient details; a Patient can see only his or her own record.

Based on this paper:

When the data are distributed among multiple data providers or data owners, two main settings are used for anonymization. One approach is for each provider to anonymize its data independently (anonymize-and-aggregate, Figure 1A), which results in a potential loss of integrated data utility. A more desirable approach is collaborative data publishing, which anonymizes data from all providers as if they came from one source (aggregate-and-anonymize, Figure 1B), using either a trusted third party (TTP) or Secure Multi-party Computation (SMC) protocols to do the computations.
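The utility gap between the two settings can be illustrated with a small sketch. It assumes, purely for illustration, that records are already generalized to a quasi-identifier key (such as an age range) and that k-anonymity is enforced by suppressing any group with fewer than k records. The class `PublishingSettings` and its methods are hypothetical names, not part of the paper or the project code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch comparing the two publishing settings by records kept. */
class PublishingSettings {

    /** Counts records that survive when groups smaller than k are suppressed. */
    public static int publishedCount(String[] keys, int k) {
        Map<String, Integer> sizes = new HashMap<String, Integer>();
        for (String key : keys) {
            Integer old = sizes.get(key);
            sizes.put(key, old == null ? 1 : old + 1);
        }
        int published = 0;
        for (int size : sizes.values()) {
            if (size >= k) {
                published += size;
            }
        }
        return published;
    }

    /** Each provider anonymizes alone; the outputs are then combined. */
    public static int anonymizeAndAggregate(String[][] providers, int k) {
        int published = 0;
        for (String[] provider : providers) {
            published += publishedCount(provider, k);
        }
        return published;
    }

    /** All records are pooled first and anonymized as one table. */
    public static int aggregateAndAnonymize(String[][] providers, int k) {
        int n = 0;
        for (String[] provider : providers) {
            n += provider.length;
        }
        String[] all = new String[n];
        int pos = 0;
        for (String[] provider : providers) {
            System.arraycopy(provider, 0, all, pos, provider.length);
            pos += provider.length;
        }
        return publishedCount(all, k);
    }

    public static void main(String[] args) {
        String[][] providers = {
            {"20-29", "20-29", "30-39"},
            {"20-29", "30-39", "30-39"}
        };
        System.out.println(anonymizeAndAggregate(providers, 3)); // 0: every local group is too small
        System.out.println(aggregateAndAnonymize(providers, 3)); // 6: pooled groups reach size 3
    }
}
```

In this toy input no provider alone has a group of three records, so independent anonymization publishes nothing, while the pooled table keeps all six records.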

Attacks by External Data Recipient Using Anonymized Data:

A data recipient, e.g., P0, could be an attacker and attempt to infer additional information about the records using the published data (T∗) and some background knowledge (BK), such as publicly available external data.

Attacks by Data Providers Using Anonymized Data and Their Own Data:

Each data provider, such as P1 in Figure 1, can also use the anonymized data T∗ and its own data (T1) to infer additional information about other records. Compared to the external recipient in the first attack scenario, each provider has additional knowledge of its own records, which can help with the attack. The problem becomes worse when multiple data providers collude with each other.
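The attack described above can be sketched in a few lines: a provider, or a coalition of providers, removes its own records from an anonymized group and looks at what is left. The class `InsiderAttackDemo`, its method, and the sample diseases are hypothetical, and the group is assumed to be given as parallel arrays of owning provider and sensitive value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of an insider attack on one anonymized group. */
class InsiderAttackDemo {

    /**
     * Returns the distinct sensitive values that remain in the group once the
     * colluding providers subtract the records they contributed themselves.
     */
    public static Set<String> remainingSensitiveValues(String[] owners,
                                                       String[] values,
                                                       String... coalition) {
        if (owners.length != values.length) {
            throw new IllegalArgumentException("owners and values must align");
        }
        Set<String> attackers = new HashSet<String>(Arrays.asList(coalition));
        Set<String> remaining = new TreeSet<String>();
        for (int i = 0; i < owners.length; i++) {
            if (!attackers.contains(owners[i])) {
                remaining.add(values[i]);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        String[] owners = {"P1", "P1", "P2", "P3"};
        String[] values = {"Flu", "Cancer", "Asthma", "Asthma"};
        // P1 removes its two records and learns every other record is Asthma.
        System.out.println(remainingSensitiveValues(owners, values, "P1")); // [Asthma]
        // P2 alone learns much less.
        System.out.println(remainingSensitiveValues(owners, values, "P2")); // [Asthma, Cancer, Flu]
    }
}
```

The group looks diverse to an outsider (three distinct diseases in four records), yet P1 can pin down the disease of the patients contributed by P2 and P3.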


Figure 1: Distributed data publishing settings: (A) anonymize-and-aggregate, (B) aggregate-and-anonymize.


Figure 2

Doctor Login:

In this module, a Doctor can see all the patients' details and thereby obtains background knowledge (BK). A Doctor may also view the horizontally partitioned data in the distributed database of the hospital group and see how many patients are affected, without learning the individual records of those patients or sensitive information about the individuals.


Admin Login:

In this module, the Admin acts as the Trusted Third Party (TTP). The Admin can see all individual records and their sensitive information across the hospitals' distributed database. Anonymization is performed by the Admin, who collects the information from the various hospitals, groups it together, and produces the anonymized data.
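One common way a trusted party turns pooled records into an anonymized view is generalization, for example replacing exact ages with ranges and publishing only counts per range. The sketch below assumes that approach. The source does not say which generalization the project uses, and the class `TtpGeneralization`, its methods, and the 10-year range width are illustrative assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the generalization step a TTP might perform. */
class TtpGeneralization {

    /** Maps an exact age to a range label such as "20-29". */
    public static String generalizeAge(int age, int width) {
        if (age < 0 || width <= 0) {
            throw new IllegalArgumentException("age must be >= 0 and width > 0");
        }
        int low = (age / width) * width;
        return low + "-" + (low + width - 1);
    }

    /** Counts pooled patients per generalized age range; names and emails are never included. */
    public static Map<String, Integer> countByAgeRange(int[] ages, int width) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int age : ages) {
            String range = generalizeAge(age, width);
            Integer old = counts.get(range);
            counts.put(range, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] pooledAges = {23, 27, 31, 38, 25}; // ages collected from several hospitals
        System.out.println(countByAgeRange(pooledAges, 10)); // {20-29=3, 30-39=2}
    }
}
```

A view like this matches what the Doctor module describes: how many patients fall into each group, without the individual records.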





✓ Processor: Pentium IV

✓ Speed: 1.1 GHz

✓ RAM: 256 MB (min)

✓ Hard Disk: 20 GB

✓ Keyboard: Standard Windows Keyboard

✓ Mouse: Two or Three Button Mouse

✓ Monitor: SVGA



✓ Operating System: Windows XP

✓ Programming Language: Java

✓ Java Version: JDK 1.6 and above

