Abstract

Complex information extraction (IE) pipelines are becoming an integral component of most text processing frameworks. We introduce a first system that helps IE users interactively analyze extraction pipeline semantics and operator transformations while debugging. Interactive analysis keeps debugging effort proportional to need and lets users focus on the portions of the pipeline under greatest suspicion. We present a generic debugger for post-execution analysis of any IE pipeline composed of arbitrary types of operators. To support it, we propose an effective provenance model for IE pipelines that captures a variety of operator types, ranging from operators with full specifications to those with none. We have evaluated the proposed algorithms and provenance model on large-scale real-world extraction pipelines.


Original document

The different versions of the original document can be found in:

http://i.stanford.edu/~anishds/publications/cikm2011/p2229-dassarma.pdf
https://dblp.uni-trier.de/db/conf/cikm/cikm2011.html#SarmaJB11
https://doi.acm.org/10.1145/2063576.2063933
https://academic.microsoft.com/#/detail/2171897232
http://dx.doi.org/10.1145/2063576.2063933

Document information

Published on 01/01/2011

Volume 2011, 2011
DOI: 10.1145/2063576.2063933
Licence: CC BY-NC-SA license
