Science projects are increasingly investing in computational reproducibility, and constructing software pipelines to demonstrate reproducibility is becoming increasingly common. To aid the construction of pipelines, science project members often adopt reproducible methods and tools. One such tool is CDE, a software packaging tool that encapsulates source code, datasets and environments. However, CDE does not include information about the origins of dependencies. Consequently, when multiple CDE packages are combined and merged to create a software pipeline, several issues arise that require an author to manually verify the compatibility of distributions, environment variables, software dependencies and compiler options. In this work, we propose including software provenance in CDE so that the resulting provenance-included CDE packages can easily be used to create software pipelines. We describe the provenance attributes that must be included and how they can be stored efficiently in a lightweight CDE package. Furthermore, we show how the provenance in a package can be used to create software pipelines and how it is maintained as new packages are created. We experimentally evaluate the overhead of auditing and maintaining provenance and compare it with heavyweight approaches to reproducibility such as virtualization. Our experiments indicate minimal overheads.
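To make the merge problem concrete, the following is a minimal sketch of how per-package provenance could turn the manual compatibility checks into an automatic one. It is an illustration under stated assumptions, not the authors' implementation: CDE does not define these structures, and `PackageProvenance`, `merge_conflicts`, `merge_packages` and all field names are hypothetical. The four checked attributes (distribution, environment variables, software dependencies, compiler options) come from the abstract. The `parents` field is one assumed way of keeping provenance as new packages are derived. For simplicity the sketch treats any mismatch as a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class PackageProvenance:
    """Hypothetical provenance record attached to a CDE-style package."""
    name: str
    distribution: str  # OS distribution the package was audited on
    env_vars: dict[str, str] = field(default_factory=dict)
    dependencies: dict[str, str] = field(default_factory=dict)  # library -> version
    compiler_options: frozenset[str] = frozenset()
    parents: tuple[str, ...] = ()  # packages this one was merged from (assumed)


def merge_conflicts(a: PackageProvenance, b: PackageProvenance) -> list[str]:
    """List the incompatibilities an author would otherwise verify by hand."""
    conflicts = []
    if a.distribution != b.distribution:
        conflicts.append(f"distribution: {a.distribution} vs {b.distribution}")
    # Only variables and libraries present in both packages can clash.
    for var in sorted(a.env_vars.keys() & b.env_vars.keys()):
        if a.env_vars[var] != b.env_vars[var]:
            conflicts.append(f"env {var}: {a.env_vars[var]} vs {b.env_vars[var]}")
    for dep in sorted(a.dependencies.keys() & b.dependencies.keys()):
        if a.dependencies[dep] != b.dependencies[dep]:
            conflicts.append(
                f"dependency {dep}: {a.dependencies[dep]} vs {b.dependencies[dep]}")
    if a.compiler_options != b.compiler_options:
        diff = sorted(a.compiler_options ^ b.compiler_options)
        conflicts.append(f"compiler options differ: {diff}")
    return conflicts


def merge_packages(name: str, a: PackageProvenance,
                   b: PackageProvenance) -> PackageProvenance:
    """Merge two compatible packages, recording where the result came from."""
    conflicts = merge_conflicts(a, b)
    if conflicts:
        raise ValueError("cannot merge: " + "; ".join(conflicts))
    return PackageProvenance(
        name=name,
        distribution=a.distribution,
        env_vars={**a.env_vars, **b.env_vars},
        dependencies={**a.dependencies, **b.dependencies},
        compiler_options=a.compiler_options | b.compiler_options,
        parents=(a.name, b.name),
    )


if __name__ == "__main__":
    aligner = PackageProvenance("aligner", "Ubuntu 12.04", {"LANG": "C"},
                                {"libc": "2.15", "zlib": "1.2.3"}, frozenset({"-O2"}))
    plotter = PackageProvenance("plotter", "Ubuntu 12.04", {"LANG": "en_US.UTF-8"},
                                {"libc": "2.15", "zlib": "1.2.8"}, frozenset({"-O2"}))
    for conflict in merge_conflicts(aligner, plotter):
        print(conflict)
```

Running the example reports the differing `LANG` value and the differing `zlib` version. These are exactly the mismatches that, without recorded provenance, would surface only when the merged pipeline misbehaved.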
Document type: Part of book or chapter of book
The different versions of the original document can be found in:
https://link.springer.com/content/pdf/10.1007%2F978-3-319-16462-5_8.pdf
http://link.springer.com/content/pdf/10.1007/978-3-319-16462-5_8
http://dx.doi.org/10.1007/978-3-319-16462-5_8
http://people.cs.uchicago.edu/~quanpt/sole/qpham_ipaw14.pdf
https://link.springer.com/chapter/10.1007%2F978-3-319-16462-5_8
https://rd.springer.com/chapter/10.1007/978-3-319-16462-5_8
https://academic.microsoft.com/#/detail/379886638
Published on 19/03/15
Accepted on 19/03/15
Submitted on 19/03/15
Volume 2015, 2015
DOI: 10.1007/978-3-319-16462-5_8
Licence: CC BY-NC-SA license