Technical Report TR720:
Addressing the Limitations of Γ-privacy
(Oct 2015), 7
[Written for PhD qualifying exam]
Collecting provenance information is an important aspect of any scientific workflow system. Workflow provenance generally captures a lot of information about the individual modules in a workflow, including input parameters, input and output data products, intermediate data products, and module invocation times. A complete provenance graph therefore contains enough information for someone to form a clear picture of the workflow structure, the individual modules, and the data flow within the workflow. This can cause privacy issues in workflows that consume sensitive information. To address these issues, workflow owners may want to keep some provenance information confidential and ensure that it is not published with the provenance data. Davidson et al. present Γ-privacy, which quantifies the module privacy requirements of scientific workflow provenance data. It ensures the privacy of all modules in the workflow by hiding some information from the original provenance data. Γ-privacy also tries to minimize the cost of the hidden data so that the maximum amount of provenance information is published. However, Cheney and Perera point out some limitations of Γ-privacy, including the difficulty of deciding on an appropriate value for Γ in a complex workflow. In this paper, we discuss those limitations in more detail and, using ideas from differential privacy and ProPub, present a solution that addresses the main limitations of Γ-privacy, including the difficulty of selecting a value for Γ.
- Available as: