Evaluating the Impact of Compression on Inverted Index Search Engine Performance
Type
Master's Thesis
Abstract
Efficient compression of inverted indexes is vital for scalable search engines, yet the literature lacks a comprehensive comparison of modern integer-specific and dictionary-based codecs. This thesis addresses that gap by integrating VarByte, Simple8b, FOR, and PFOR/NewPFOR/FastPFOR, alongside LZ4, Snappy, and Zstandard, into Apache Lucene Core and rigorously benchmarking their impact on compression ratio, indexing throughput, and query latency. Our systematic evaluation uncovers distinct trade-offs: integer codecs tend to enable faster indexing at the expense of larger footprints, while dictionary schemes offer stronger space savings with moderate latency overhead. Finally, we distill these insights into a lightweight decision-support selection tree that guides practitioners to a suitable codec choice based on their specific application priorities.
Subject/Keywords
Inverted Index, Compression Ratio, Search Latency, Query Latency, Index Compression, Information Retrieval
