Algorithm Selection for Document Retrieval

Pansell, Albin; Riis, Simon

Algorithm Selection for Document Retrieval

Ladda ner

CSE 24-163 AP SR.pdf (3.44 MB)

Publicerad

2024

Författare

Pansell, Albin

Riis, Simon

Typ

Examensarbete för masterexamen
Master's Thesis

Program

High-performance computer systems (MPHPC), MSc

Sammanfattning

Energy consumption in data centers is increasing and is expected to use a significant share of the world’s energy in the future. As the quantities of data are also increasing, searching this data uses substantial computational resources. One way to reduce the energy consumption of search clusters is to reduce the search latency. The type of search used for unstructured data, such as in web search engines, is called top-k document retrieval. Several different top-k document retrieval algorithms exist, but no single algorithm performs best in all cases. A technique that can be used in problems with this characteristic is algorithm selection. Algorithm selection is a technique that involves predicting and using one, out of a set of complementary algorithms, that will perform the best in a specific problem instance. In this thesis, we study whether algorithm selection is a viable approach for improving query latency, over state-of-the-art query processing algorithms. To do this, we first measured the performance of a set of relevant state-of-the-art top-k document retrieval algorithms in different queries. Using this performance data, we trained a machine learning classifier to predict which algorithm would perform the best, given a query. We show that the potential latency improvement of the algorithm selection approach varies with the database size. The largest potential improvement we observed is near 25%. We present an algorithm selector which achieves 17 out of the 25%. We also discuss how this could reduce the energy consumption of the querying process by around 15%, but also some practical considerations.

Ämne/nyckelord

information retrieval, top-k document retrieval, algorithm selection, query processing, energy consumption, machine learning, random forest, decision tree

URI

https://hdl.handle.net/20.500.12380/310443

Samlingar

Examensarbeten för masterexamen

Visa fullständig post

Algorithm Selection for Document Retrieval

Ladda ner

Publicerad

Författare

Typ

Program

Modellbyggare

Tidskriftstitel

ISSN

Volymtitel

Utgivare

Sammanfattning

Beskrivning

Ämne/nyckelord

Citation

Arkitekt (konstruktör)

Geografisk plats

Byggnad (typ)

Byggår

Modelltyp

Skala

Teknik / material

Index

URI

Samlingar

Endorsement

Review

Supplemented By

Referenced By