I have a rather abstract question: typical hashing algorithms (both cryptographic and non-cryptographic) produce drastically different output if the input changes even slightly.
```ruby
Digest::SHA1.hexdigest 'hello'
# => "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"
Digest::SHA1.hexdigest 'hello!'
# => "8f7d88e901a5ad3a05d8cc0de93313fd76028f8c"
```
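To put a number on "changes drastically", here is a small sketch that XORs the two SHA-1 digests and counts how many of the 160 output bits differ. This is the avalanche effect: a good hash is expected to flip roughly half of its output bits for any change to the input. The helper name `bit_difference` is made up for illustration.

```ruby
require 'digest'

# Number of differing bits between the SHA-1 digests of two strings (0..160).
def bit_difference(a, b)
  x = Digest::SHA1.hexdigest(a).to_i(16) ^ Digest::SHA1.hexdigest(b).to_i(16)
  x.to_s(2).count('1') # popcount of the XOR = Hamming distance
end

puts bit_difference('hello', 'hello!')
```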
Are there hash algorithms that don't change the output when the input changes only slightly?
Ideally, such an algorithm would have a tolerance setting that controls how much the input may change before the hash output changes.
For example, with a 70% tolerance, "hello" and "hello!" should produce the same output, but with a 95% tolerance the two strings should produce slightly different output.
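One caveat: a function that returns exactly the same output for every pair of inputs within tolerance cannot exist in general, because "similar" is not transitive (a chain of small edits can connect any two strings). Practical tools therefore compare two digests and produce a similarity score, which is then checked against the tolerance. The sketch below shows that idea without any hashing at all, assuming similarity is measured as the Jaccard index of character trigrams; the method names and the `tolerance` parameter are hypothetical.

```ruby
require 'set'

# Overlapping 3-character substrings ("shingles") of a string.
def trigrams(s)
  (0..s.length - 3).map { |i| s[i, 3] }.to_set
end

# Jaccard similarity of the trigram sets: |A & B| / |A | B|, in 0.0..1.0.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 1.0 if union.empty? # both strings too short to have trigrams
  (ta & tb).size.to_f / union.size
end

# Treat the inputs as "the same" when their similarity reaches the tolerance.
def similar?(a, b, tolerance)
  similarity(a, b) >= tolerance
end

puts similarity('hello', 'hello!')     # 0.75
puts similar?('hello', 'hello!', 0.70) # true
puts similar?('hello', 'hello!', 0.95) # false
```

"hello" has the trigrams hel, ell, llo and "hello!" adds lo!, so 3 of 4 distinct trigrams are shared, which lands between the two tolerances from the question.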
Maybe it's not called hashing at all, but this area is an unknown unknown to me.
fuzzy hashing
– halex
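"Fuzzy hashing" usually refers to tools such as ssdeep (context-triggered piecewise hashing), whose digests are compared with a match score rather than for equality. A related technique that is short enough to sketch is SimHash, a locality-sensitive hash: similar inputs get fingerprints that differ in only a few bits, so the tolerance becomes a limit on the Hamming distance between fingerprints. This is a swapped-in technique, not ssdeep's algorithm. The sketch assumes character trigrams as features and 64-bit feature hashes taken from MD5; all method names are hypothetical.

```ruby
require 'digest'

BITS = 64

# Stable 64-bit hash of one feature (String#hash is randomized per process).
def feature_hash(feature)
  Digest::MD5.hexdigest(feature)[0, 16].to_i(16)
end

# Overlapping character trigrams; strings shorter than 3 chars are one feature.
def features(text)
  return [text] if text.length < 3
  (0..text.length - 3).map { |i| text[i, 3] }
end

# SimHash: every feature votes +1/-1 on each bit; a positive total sets the bit.
def simhash(text)
  votes = Array.new(BITS, 0)
  features(text).each do |f|
    h = feature_hash(f)
    BITS.times { |i| votes[i] += h[i] == 1 ? 1 : -1 }
  end
  votes.each_with_index.reduce(0) { |fp, (v, i)| v > 0 ? fp | (1 << i) : fp }
end

# Number of differing bits between two fingerprints.
def hamming(a, b)
  (a ^ b).to_s(2).count('1')
end

# Fraction of matching fingerprint bits, compared against the tolerance.
def same_within?(a, b, tolerance)
  1.0 - hamming(simhash(a), simhash(b)).to_f / BITS >= tolerance
end

puts hamming(simhash('hello'), simhash('hello!'))
```

With inputs as short as these, each trigram carries a lot of weight, so the fingerprints are noisy; SimHash-style fingerprints behave more predictably on longer texts with many features.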