Document type: Part of book or chapter of book
 
 
== Full document ==
 
<pdf>Media:Draft_Content_717641091-beopen297-5362-document.pdf</pdf>
 
  
  

Latest revision as of 17:07, 21 January 2021

Abstract

Science projects are increasingly investing in computational reproducibility, and constructing software pipelines to demonstrate reproducibility is becoming increasingly common. To aid the construction of pipelines, science project members often adopt reproducible methods and tools. One such tool is CDE, a software packaging tool that encapsulates source code, datasets and environments. However, CDE does not include information about the origins of dependencies. Consequently, when multiple CDE packages are combined and merged to create a software pipeline, several issues arise that require an author to manually verify the compatibility of distributions, environment variables, software dependencies and compiler options. In this work, we propose that software provenance be included as part of CDE so that the resulting provenance-included CDE packages can be easily used for creating software pipelines. We describe the provenance attributes that must be included and how they can be efficiently stored in a lightweight CDE package. Furthermore, we show how the provenance in a package can be used for creating software pipelines and maintained as new packages are created. We experimentally evaluate the overhead of auditing and maintaining provenance and compare it with heavyweight approaches for reproducibility such as virtualization. Our experiments indicate minimal overheads.
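
The compatibility check that the abstract says authors currently perform by hand can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PackageProvenance` record, its field names and the `find_conflicts` helper are hypothetical and do not reflect CDE's actual package format or the attribute set defined in the chapter. The sketch also assumes that any difference in compiler options counts as a conflict, which may be stricter than necessary.

```python
from dataclasses import dataclass, field


@dataclass
class PackageProvenance:
    """Hypothetical provenance record for one package (not CDE's real format)."""
    name: str
    distribution: str                                   # OS distribution and release
    env_vars: dict = field(default_factory=dict)        # variable -> value
    dependencies: dict = field(default_factory=dict)    # library -> version
    compiler_options: tuple = ()


def find_conflicts(a: PackageProvenance, b: PackageProvenance) -> list:
    """Return human-readable conflicts that would need review before merging."""
    conflicts = []
    if a.distribution != b.distribution:
        conflicts.append(f"distribution: {a.distribution} vs {b.distribution}")
    # A key present in both packages with different values is a conflict;
    # keys that appear in only one package are treated as compatible.
    for label, left, right in (
        ("env", a.env_vars, b.env_vars),
        ("dependency", a.dependencies, b.dependencies),
    ):
        for key in sorted(left.keys() & right.keys()):
            if left[key] != right[key]:
                conflicts.append(f"{label} {key}: {left[key]} vs {right[key]}")
    # Assumption: any difference in compiler options is flagged.
    if set(a.compiler_options) != set(b.compiler_options):
        conflicts.append("compiler options differ")
    return conflicts


# Two made-up packages that disagree only on one library version.
pkg_a = PackageProvenance("align", "distro-A", {"LANG": "C"}, {"libfoo": "1.2"}, ("-O2",))
pkg_b = PackageProvenance("plot", "distro-A", {"LANG": "C"}, {"libfoo": "1.3"}, ("-O2",))
print(find_conflicts(pkg_a, pkg_b))  # ['dependency libfoo: 1.2 vs 1.3']
```

With records of this kind stored inside each package, a merge tool could report conflicts automatically instead of leaving the comparison to the pipeline author.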


Original document

The different versions of the original document can be found in:

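https://link.springer.com/content/pdf/10.1007%2F978-3-319-16462-5_8.pdf
http://link.springer.com/content/pdf/10.1007/978-3-319-16462-5_8
http://dx.doi.org/10.1007/978-3-319-16462-5_8
http://people.cs.uchicago.edu/~quanpt/sole/qpham_ipaw14.pdf
https://link.springer.com/chapter/10.1007/978-3-319-16462-5_8
https://dblp.uni-trier.de/db/conf/ipaw/ipaw2014.html#PhamMF14
https://www.scipedia.com/public/Pham_et_al_2015a
https://rd.springer.com/chapter/10.1007/978-3-319-16462-5_8
https://academic.microsoft.com/#/detail/379886638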

Document information

Published on 19/03/15
Accepted on 19/03/15
Submitted on 19/03/15

Volume 2015, 2015
DOI: 10.1007/978-3-319-16462-5_8
Licence: CC BY-NC-SA
