Evaluating the Impact of Compression on Inverted Index Search Engine Performance

Type

Master's Thesis

Abstract

Efficient compression of inverted indexes is vital for scalable search engines, yet the literature lacks a comprehensive comparison of modern integer-specific and dictionary-based codecs. This thesis bridges that gap by integrating the integer codecs VarByte, Simple8b, FOR, and PFOR/NewPFOR/FastPFOR, alongside the dictionary-based codecs LZ4, Snappy, and Zstandard, into Apache Lucene Core and rigorously benchmarking their impact on compression ratio, indexing throughput, and query latency. Our systematic evaluation uncovers distinct trade-offs: integer codecs tend to enable faster indexing at the expense of larger footprints, while dictionary schemes offer stronger space savings with moderate latency overhead. Finally, we distill these insights into a lightweight decision-support selection tree that guides practitioners to a suitable codec choice based on their specific application priorities.
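
To make the notion of an integer-specific codec concrete, the following is a minimal Python sketch of VarByte applied to the gaps between sorted document IDs in a posting list. It is an illustration only and not the thesis's Lucene integration: the function names (`to_gaps`, `from_gaps`, `varbyte_encode`, `varbyte_decode`) are hypothetical, and the byte layout (7 payload bits per byte, low-order group first, high bit set on every byte except the last of a value) is one common VarByte convention assumed here.

```python
from typing import Iterable, List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Convert strictly increasing doc IDs into gaps (the first ID is kept as-is)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Inverse of to_gaps: a running sum restores the original doc IDs."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def varbyte_encode(values: Iterable[int]) -> bytes:
    """Encode non-negative integers, 7 payload bits per byte.

    Assumed convention: low-order group first; the high bit marks
    "more bytes follow" and is clear on the last byte of each value.
    """
    out = bytearray()
    for value in values:
        if value < 0:
            raise ValueError("VarByte encodes non-negative integers only")
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)  # continuation byte
            value >>= 7
        out.append(value)  # final byte, high bit clear
    return bytes(out)


def varbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by varbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more groups belong to this value
        else:
            values.append(current)
            current, shift = 0, 0
    return values


if __name__ == "__main__":
    postings = [3, 7, 130, 131, 20000]
    encoded = varbyte_encode(to_gaps(postings))
    assert from_gaps(varbyte_decode(encoded)) == postings
    print(len(encoded), "bytes versus", 4 * len(postings), "bytes as fixed 32-bit integers")
```

Small gaps fit in a single byte, which is why gap encoding is usually paired with such codecs. The byte-aligned layout keeps decoding simple, whereas the bit-packed schemes named in the abstract (Simple8b, FOR, the PFOR family) pack several values into one machine word instead.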

Keywords

Inverted Index, Compression Ratio, Search Latency, Query Latency, Index Compression, Information Retrieval
