Improved Pattern Generation for Bloom Filters with Bit Patterns Optimizing bit patterns for use in blocked Bloom filters
Examensarbete för masterexamen
Computer science – algorithms, languages and logic (MPALG), MSc
Set-membership is a commonly occurring problem in many areas of computing, from networking to database applications. A popular data structure for this problem is the Bloom filter: a small, hash-based probabilistic structure which guarantees no false negatives, but can result in false positives. Recently they have been used as an important tool in bioinformatics where the data sets are huge, and as a consequence the filters also need to be large. Blocked Bloom filters with bit patterns have been suggested as an alternative to cope with the deteriorated cache- and hash-behaviour in these cases. It was recently discovered that optimal pattern design for use in these structures is linked to two-stage group testing. There has also been some recent partial results that indicate a certain structure of optimal patterns. This thesis concerns itself with investigating these structural properties to find a better pattern design for use in Blocked Bloom filters with bit patterns. Our main result is a new, deterministic, algorithm for pattern generation used in these structures based on the Chinese Remainder Theorem. The results indicate that this construction improves the false positive rate for all our testing scenarios. As a side-result we also propose a modification to a known combinatorial design used in group testing which significantly reduces the needed number of tests for high number of defectives.
Data- och informationsvetenskap , Computer and Information Science