Evaluating the Impact of Compression on Inverted Index Search Engine Performance

Type

Master's Thesis

Abstract

Efficient compression of inverted indexes is vital for scalable search engines, yet the literature lacks a comprehensive comparison of modern integer-specific and dictionary-based codecs. This thesis addresses that gap by integrating VarByte, Simple8b, FOR, and PFOR/NewPFOR/FastPFOR alongside LZ4, Snappy, and Zstandard into Apache Lucene Core and benchmarking their impact on compression ratio, indexing throughput, and query latency. Our systematic evaluation uncovers distinct trade-offs: integer codecs tend to enable faster indexing at the expense of larger footprints, while dictionary schemes offer stronger space savings with moderate latency overhead. Finally, we distill these insights into a lightweight selection tree that guides practitioners toward a codec choice that fits their application priorities.

Subject/keywords

Inverted Index, Compression Ratio, Search Latency, Query Latency, Index Compression, Information Retrieval
