Algorithm Selection for Document Retrieval

Pansell, Albin; Riis, Simon

Algorithm Selection for Document Retrieval

dc.contributor.author	Pansell, Albin
dc.contributor.author	Riis, Simon
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	en
dc.contributor.examiner	Petersen Moura Trancoso, Pedro
dc.contributor.supervisor	Petersen Moura Trancoso, Pedro
dc.contributor.supervisor	Hagman, Oskar
dc.date.accessioned	2025-09-10T09:00:07Z
dc.date.issued	2024
dc.date.submitted
dc.description.abstract	Energy consumption in data centers is increasing and is expected to use a significant share of the world’s energy in the future. As the quantities of data are also increasing, searching this data uses substantial computational resources. One way to reduce the energy consumption of search clusters is to reduce the search latency. The type of search used for unstructured data, such as in web search engines, is called top-k document retrieval. Several different top-k document retrieval algorithms exist, but no single algorithm performs best in all cases. A technique that can be used in problems with this characteristic is algorithm selection. Algorithm selection is a technique that involves predicting and using one, out of a set of complementary algorithms, that will perform the best in a specific problem instance. In this thesis, we study whether algorithm selection is a viable approach for improving query latency, over state-of-the-art query processing algorithms. To do this, we first measured the performance of a set of relevant state-of-the-art top-k document retrieval algorithms in different queries. Using this performance data, we trained a machine learning classifier to predict which algorithm would perform the best, given a query. We show that the potential latency improvement of the algorithm selection approach varies with the database size. The largest potential improvement we observed is near 25%. We present an algorithm selector which achieves 17 out of the 25%. We also discuss how this could reduce the energy consumption of the querying process by around 15%, but also some practical considerations.
dc.identifier.coursecode	DATX05
dc.identifier.uri	https://hdl.handle.net/20.500.12380/310443
dc.language.iso	eng
dc.relation.ispartofseries	CSE-24-163
dc.setspec.uppsok	Technology
dc.subject	information retrieval, top-k document retrieval, algorithm selection, query processing, energy consumption, machine learning, random forest, decision tree
dc.title	Algorithm Selection for Document Retrieval
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	High-performance computer systems (MPHPC), MSc

Ladda ner

Original bundle

Visar 1 - 1 av 1

Namn:: CSE 24-163 AP SR.pdf
Size:: 3.44 MB
Format:: Adobe Portable Document Format

Ladda ner

License bundle

Visar 1 - 1 av 1

Namn:: license.txt
Size:: 2.35 KB
Format:: Item-specific license agreed upon to submission
Description:

Ladda ner

Samlingar

Examensarbeten för masterexamen