Harvesting provenance for streaming workflows presents challenges related to the high rate of updates and the wide distribution of the execution, which can be spread across several institutional infrastructures. Moreover, the typically large volume of data produced by each transformation step cannot always be stored and preserved efficiently. This can be an obstacle to the evaluation of the results, for instance in real time, which suggests the importance of customisable metadata-extraction procedures. In this paper we present our approach to the aforementioned provenance challenges within a use-case-driven scenario in the field of seismology, which requires the execution of processing pipelines over a large data stream. In particular, we discuss the current implementation and the upcoming challenges of an in-workflow programmatic approach to provenance tracing, building on composite functions, selective recording and domain-specific metadata production.
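The idea of in-workflow selective recording with domain-specific metadata production can be illustrated with a minimal sketch. All names below (`provenance_step`, `extract_metadata`, the `demean` step) are hypothetical and chosen for illustration only; they do not reflect the paper's actual implementation. Rather than preserving each step's full output, the sketch records a content fingerprint plus user-supplied metadata, which is one way to keep tracing feasible at high update rates.

```python
import hashlib
import json
import time

# Hypothetical sketch, not the authors' implementation: a decorator that
# performs selective, in-workflow provenance recording for a streaming
# transformation step.

PROV_LOG = []  # in-memory provenance store for this sketch

def provenance_step(name, extract_metadata=None):
    """Record lightweight provenance for each invocation of a step.

    Instead of storing the (potentially large) output itself, only a
    content hash plus optional, domain-specific metadata are kept.
    """
    def wrap(func):
        def inner(data):
            result = func(data)
            record = {
                "step": name,
                "timestamp": time.time(),
                # fingerprint of the output, not the full data volume
                "output_sha256": hashlib.sha256(
                    json.dumps(result, sort_keys=True).encode()
                ).hexdigest(),
            }
            if extract_metadata is not None:
                # selective, domain-specific metadata extraction
                record["metadata"] = extract_metadata(result)
            PROV_LOG.append(record)
            return result
        return inner
    return wrap

# Example step: remove the mean from a chunk of a seismic trace,
# recording only the sample count as domain metadata.
@provenance_step("demean", extract_metadata=lambda r: {"n_samples": len(r)})
def demean(samples):
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

demean([1.0, 2.0, 3.0, 4.0])
print(PROV_LOG[0]["step"], PROV_LOG[0]["metadata"])
```

Because the metadata extractor is passed in by the user, each scientific domain can decide which properties of a stream chunk are worth preserving for later evaluation, without forcing the full outputs into the provenance store.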