Consequences of positive matches in Bloom filters

Question

Assumptions:

The usernames of registered users are stored in a set
I want to use a Bloom filter to make lookups faster.
The Bloom filter has a certain probability of false positives (0.1%)
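To put the 0.1% figure in perspective, the standard Bloom filter sizing approximations give m ≈ -n·ln(p) / (ln 2)² bits and k ≈ (m/n)·ln 2 hash functions for n stored items at false-positive rate p. The Python sketch below simply applies those formulas. The function name bloom_parameters and the figure of one million usernames are hypothetical, not taken from the question.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m bits, k hash functions) for n items at false-positive rate p.

    Uses the standard approximations m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
    """
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


# Hypothetical workload: one million registered usernames at 0.1% false positives.
m, k = bloom_parameters(1_000_000, 0.001)
print(f"{m:,} bits ({m / 8 / 2**20:.2f} MiB), {k} hash functions")
```

At p = 0.1% this works out to roughly 14.4 bits per stored username and 10 hash functions, whatever the value of n.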

In most cases, when a new user wants to register, my UI tells them "This name is not in use, you're good to go."

But what does the backend need to do if a positive match is found?

The result might be a false positive. Wouldn't finding out the true answer add to the time complexity and thus make Bloom filters inefficient in many cases?

Telling a user "Name already in use, choose a different one" might not be so bad, but what about other use cases where you cannot afford to be wrong?

templatetypedef · Accepted Answer · 2020-04-20T15:22:20

The general model for using Bloom filters goes like this:

1. Query the filter to see whether the answer might be yes.
2. If the Bloom filter says no, the answer is definitely no.
3. If the Bloom filter says yes, the answer might be yes, so query a more accurate data structure to get a final determination.
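Here is a minimal Python sketch of the three steps above, applied to the username check from the question. It assumes an in-memory set as a stand-in for the authoritative user store (in practice this would be a database query). The class and method names (BloomFilter, UsernameRegistry, is_taken) are hypothetical. Hash positions come from SHA-256 with double hashing, which is one common way to derive k indexes but not the only one.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, capacity: int, fp_rate: float) -> None:
        # Standard sizing formulas for the bit count m and hash count k.
        self.m = math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k indexes from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # "| 1" keeps the step non-zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class UsernameRegistry:
    """Cheap filter first; the authoritative store is consulted only on a "maybe"."""

    def __init__(self, capacity: int = 10_000, fp_rate: float = 0.001) -> None:
        self.filter = BloomFilter(capacity, fp_rate)
        self.store: set[str] = set()  # assumption: stand-in for the real user database
        self.store_lookups = 0        # counts how often step 3 runs

    def register(self, name: str) -> None:
        self.store.add(name)
        self.filter.add(name)

    def is_taken(self, name: str) -> bool:
        if not self.filter.might_contain(name):
            return False              # step 2: definitely free, no store query needed
        self.store_lookups += 1       # step 3: resolve a possible false positive
        return name in self.store


reg = UsernameRegistry()
reg.register("alice")
print(reg.is_taken("alice"))  # True
print(reg.is_taken("bob"))    # False
```

Because every "maybe" is confirmed against the store, the answer is always exact. A false positive costs one extra store query and never produces a wrong "taken" result.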

Bloom filters really shine when step (3) is of the form “query some server somewhere to search a gigantic database to see if you have the item in question.” In that case, reducing the number of times the server needs to get pinged in order to make a determination can lead to huge performance gains on the client and reduce the load on the servers.
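A rough cost model for this trade-off, written as a small Python sketch. A Bloom filter has no false negatives, so every lookup for an item that really is present still reaches the server. Absent items reach it only with probability p, the false-positive rate. The function name and the workload figures (one million lookups, 5% of them for taken names) are hypothetical.

```python
def expected_server_queries(lookups: int, fraction_present: float, fp_rate: float) -> float:
    """Expected number of lookups that still reach the authoritative server.

    Present items always pass the filter (no false negatives); absent items
    pass it only on a false positive, which happens with probability fp_rate.
    """
    present = lookups * fraction_present
    absent = lookups - present
    return present + absent * fp_rate


# Hypothetical workload: 1,000,000 lookups, 5% for names that already exist, p = 0.1%.
with_filter = expected_server_queries(1_000_000, 0.05, 0.001)
without_filter = 1_000_000  # without a filter, every lookup goes to the server
print(f"{with_filter:,.0f} expected server queries instead of {without_filter:,}")
```

With these assumed numbers the server handles roughly 50,950 queries instead of 1,000,000. The saving shrinks as the workload shifts toward lookups for items that do exist, because those always fall through to step (3).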

On the other hand, if you’re storing a small data set locally on a machine, then a Bloom filter isn’t likely to do all that much because querying that data set directly is probably going to be fast enough for all your needs.
