Indiana University Bloomington

School of Informatics and Computing

Technical Report TR721:
Secure Provenance for Data Preservation Repositories

Isuru Suriarachchi
(Oct 2015), 9
[Written for PhD qualifying exam]
The importance of research data preservation and management is now accepted by scientists around the world, and interest and investment in data preservation projects are higher than ever before. There are already a number of well-known research data repositories for different types of research data. Data preservation, sharing, discovery, and reuse are the key features common to all such repositories. Data provenance is used to track the lineage or processing history of a particular data product, and capturing provenance has been identified as an important step in any scientific application. Data preservation repositories therefore also use provenance practices, mainly to enhance data discovery. In some situations, however, the complete provenance information about datasets cannot be published in preservation repositories for various reasons. Such repositories should therefore provide mechanisms to control the amount of provenance information exposed to outside parties. In this paper, we identify the scenarios in which conflicts between obfuscation and disclosure of provenance exist in the context of data preservation repositories. We propose a secure provenance model that is capable of preserving provenance integrity while satisfying obfuscation requirements. We build our design on the SEAD repository.

