Evaluating the Impact of Compression on Inverted Index Search Engine Performance
| dc.contributor.author | Kaulio, Hannes | |
| dc.contributor.author | Blom, Martin | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Petersen Moura Trancoso, Pedro | |
| dc.contributor.supervisor | Petersen Moura Trancoso, Pedro | |
| dc.date.accessioned | 2025-12-11T15:28:18Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | ||
| dc.description.abstract | Efficient compression of inverted indexes is vital for scalable search engines, yet the literature lacks a comprehensive comparison of modern integer-specific and dictionary-based codecs. This thesis bridges that gap by integrating VarByte, Simple8b, FOR, and PFOR/NewPFOR/FastPFOR alongside LZ4, Snappy, and Zstandard into Apache Lucene Core and rigorously benchmarking their impact on compression ratio, indexing throughput, and query latency. Our systematic evaluation uncovers distinct trade-offs: integer codecs tend to enable faster indexing at the expense of larger footprints, while dictionary schemes offer stronger space savings with moderate latency overhead. Finally, we distill these insights into a lightweight decision-support selection tree that guides practitioners to the optimal codec choice based on their specific application priorities. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310811 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | CSE 25-83 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Inverted Index, Compression Ratio, Search Latency, Query Latency, Index Compression, Information Retrieval | |
| dc.title | Evaluating the Impact of Compression on Inverted Index Search Engine Performance | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | High-performance computer systems (MPHPC), MSc | |
