Abstract

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
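
To make the idea of an exact brute-force k-NN search concrete, the following is a minimal Python sketch. It is not the paper's implementation: the association table ASSOC, the toy similarity function, and the sample documents are all hypothetical and chosen only to illustrate how a similarity that accounts for term associations can retrieve a record sharing no terms with the query.

```python
import heapq

# Hypothetical term-association weights (illustrative values only).
ASSOC = {
    ("car", "automobile"): 0.8,
    ("fix", "repair"): 0.7,
}

def term_sim(q, d):
    """Return 1.0 for identical terms, else a looked-up association weight."""
    if q == d:
        return 1.0
    return max(ASSOC.get((q, d), 0.0), ASSOC.get((d, q), 0.0))

def similarity(query, doc):
    """Toy similarity: each query term is credited with its best match in doc."""
    if not doc:
        return 0.0
    return sum(max(term_sim(q, d) for d in doc) for q in query)

def brute_force_knn(query, docs, k):
    """Exact k-NN: score every document and keep the k most similar."""
    scored = ((similarity(query, doc), i) for i, doc in enumerate(docs))
    return heapq.nlargest(k, scored)

# Assumed sample data: document 0 shares no literal term with the query.
docs = [
    ["automobile", "repair", "manual"],
    ["car", "wash"],
    ["banana", "bread", "recipe"],
]
query = ["fix", "car"]
top = brute_force_knn(query, docs, k=2)
print(top)  # list of (score, document index) pairs, best first
```

In this sketch a purely term-based search would return only document 1, which shares the term "car" with the query, whereas the k-NN search ranks document 0 first. The cost is one similarity evaluation per document, which is the slowness that an approximate k-NN algorithm is meant to reduce.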


Original document

The different versions of the original document can be found in:

https://arxiv.org/abs/1610.10001,
https://ui.adsabs.harvard.edu/abs/2016arXiv161010001B/abstract,
https://arxiv.org/pdf/1610.10001v1,
http://dblp.uni-trier.de/db/journals/corr/corr1610.html#BoytsovNMN16,
https://doi.org/10.1145/2983323.2983815,
https://doi.acm.org/10.1145/2983323.2983815,
https://is.muni.cz/publication/1377704/cs/Off-the-Beaten-Path-Lets-Replace-Term-Based-Retrieval-with-k-NN-Search/Boytsov-Novak-Malkov-Nyberg,
https://www.muni.cz/vyzkum/publikace/1377704,
https://dl.acm.org/ft_gateway.cfm?id=2983815&ftid=1806330&dwn=1,
https://dl.acm.org/citation.cfm?id=2983815,
https://academic.microsoft.com/#/detail/2537425075
http://dx.doi.org/10.1145/2983323.2983815 under the license http://www.acm.org/publications/policies/copyright_policy#Background

Document information

Published on 01/01/2016

Volume 2016, 2016
DOI: 10.1145/2983323.2983815
Licence: Other
