Abstract

Experimentation using IR systems has traditionally been a procedural and laborious process. Queries must be run on an index, with any parameters of the retrieval models suitably tuned. With the advent of learning-to-rank, such experimental processes (including the appropriate folding of queries to achieve cross-fold validation) have resulted in complicated experimental designs and hence scripting. At the same time, machine learning platforms such as Scikit-learn and Apache Spark have pioneered the notion of an experimental pipeline, which naturally allows a supervised classification experiment to be expressed as a series of stages, which can be learned or transformed. In this demonstration, we detail Terrier-Spark, a recent adaptation to the Terrier Information Retrieval platform which permits it to be used within the experimental pipelines of Spark. We argue that this (1) provides an agile experimental platform for information retrieval, comparable to that enjoyed by other branches of data science; (2) aids research reproducibility in information retrieval by facilitating easily-distributable notebooks containing conducted experiments; and (3) facilitates the teaching of information retrieval experiments in educational environments.
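To make the pipeline abstraction the abstract refers to concrete, the sketch below builds a minimal Spark ML pipeline in Scala: tokenisation, feature hashing, and a logistic regression learner are chained as stages and fitted in a single call. It uses only standard Spark ML classes with toy placeholder data; the Terrier-Spark demonstration would slot retrieval and learning-to-rank stages into a pipeline of this shape, but the specific Terrier-Spark transformer names are not given here.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PipelineSketch").getOrCreate()

    // Toy labelled documents: (id, text, label). Placeholder data only.
    val training = spark.createDataFrame(Seq(
      (0L, "terrier information retrieval platform", 1.0),
      (1L, "spark machine learning pipelines", 1.0),
      (2L, "unrelated text about something else", 0.0)
    )).toDF("id", "text", "label")

    // Each stage is a Transformer (maps a DataFrame to a DataFrame)
    // or an Estimator (is fitted to the data to produce a Transformer).
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    // Chain the stages; fit() learns the whole pipeline in one call.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)

    // The fitted PipelineModel can then transform (score) new data.
    model.transform(training).select("id", "prediction").show()

    spark.stop()
  }
}
```

The appeal for IR experimentation is that each stage (e.g. retrieval, feature computation, a learned ranker) is a reusable, parameterised object rather than ad hoc scripting, so an entire experiment can be expressed, tuned, and re-run as a single pipeline.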


Original document

The different versions of the original document can be found at:

http://dx.doi.org/10.1145/3209978.3210174 (under the license http://www.acm.org/publications/policies/copyright_policy#Background)
http://eprints.gla.ac.uk/160904
https://dl.acm.org/citation.cfm?id=3210174
https://academic.microsoft.com/#/detail/2798639660

Document information

Published on 01/01/2018

DOI: 10.1145/3209978.3210174
Licence: Other
