Algorithm Selection for Document Retrieval

dc.contributor.authorPansell, Albin
dc.contributor.authorRiis, Simon
dc.contributor.departmentChalmers tekniska högskola / Institutionen för data och informationstekniksv
dc.contributor.departmentChalmers University of Technology / Department of Computer Science and Engineeringen
dc.contributor.examinerPetersen Moura Trancoso, Pedro
dc.contributor.supervisorPetersen Moura Trancoso, Pedro
dc.contributor.supervisorHagman, Oskar
dc.date.accessioned2025-09-10T09:00:07Z
dc.date.issued2024
dc.date.submitted
dc.description.abstractEnergy consumption in data centers is increasing and is expected to use a significant share of the world’s energy in the future. As the quantities of data are also increasing, searching this data uses substantial computational resources. One way to reduce the energy consumption of search clusters is to reduce the search latency. The type of search used for unstructured data, such as in web search engines, is called top-k document retrieval. Several different top-k document retrieval algorithms exist, but no single algorithm performs best in all cases. A technique that can be used in problems with this characteristic is algorithm selection. Algorithm selection is a technique that involves predicting and using one, out of a set of complementary algorithms, that will perform the best in a specific problem instance. In this thesis, we study whether algorithm selection is a viable approach for improving query latency, over state-of-the-art query processing algorithms. To do this, we first measured the performance of a set of relevant state-of-the-art top-k document retrieval algorithms in different queries. Using this performance data, we trained a machine learning classifier to predict which algorithm would perform the best, given a query. We show that the potential latency improvement of the algorithm selection approach varies with the database size. The largest potential improvement we observed is near 25%. We present an algorithm selector which achieves 17 out of the 25%. We also discuss how this could reduce the energy consumption of the querying process by around 15%, but also some practical considerations.
dc.identifier.coursecodeDATX05
dc.identifier.urihttp://hdl.handle.net/20.500.12380/310443
dc.language.isoeng
dc.relation.ispartofseriesCSE-24-163
dc.setspec.uppsokTechnology
dc.subjectinformation retrieval, top-k document retrieval, algorithm selection, query processing, energy consumption, machine learning, random forest, decision tree
dc.titleAlgorithm Selection for Document Retrieval
dc.type.degreeExamensarbete för masterexamensv
dc.type.degreeMaster's Thesisen
dc.type.uppsokH
local.programmeHigh-performance computer systems (MPHPC), MSc

Ladda ner

Original bundle

Visar 1 - 1 av 1
Hämtar...
Bild (thumbnail)
Namn:
CSE 24-163 AP SR.pdf
Storlek:
3.44 MB
Format:
Adobe Portable Document Format

License bundle

Visar 1 - 1 av 1
Hämtar...
Bild (thumbnail)
Namn:
license.txt
Storlek:
2.35 KB
Format:
Item-specific license agreed upon to submission
Beskrivning: