I am totally desperate!
I am using Apache Flink with Java, and I would like to know whether it is possible to modify the keyBy method so that it keys by similarity rather than by the exact name.
I have two different DataStreams that I combine with union. In the first stream, the value of the field I want to key by is "John Locke", while in the second DataStream the value is "John L".
I have an algorithm that gives me a similarity score between two strings. My idea is: if the score between two strings is higher than, for example, 0.80, those two strings are considered the same, and when I apply keyBy("name") the similar strings are keyed as if they had exactly the same name.
Visual example:
datastream1----- John Locke, Mickey Micke, Will Williams
datastream2----- Mickey M., John L., Anthony Brown
DataStream d3 = datastream1.union(datastream2)
d3.keyBy by the score / the similarity, not the exact name.
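A minimal sketch of the idea in plain Java, under some loud assumptions: the class SimilarityKeys, its method names, and the token-prefix score are all hypothetical placeholders (the score is only a stand-in for my real algorithm), and the list of canonical names is assumed to be known in advance. The key is computed from one record at a time, so the sketch maps each name to a fixed canonical name instead of comparing records with each other.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Flink: maps a name to a canonical key.
final class SimilarityKeys {

    // Placeholder score in [0, 1]; a stand-in for the real similarity algorithm.
    // Counts tokens where one is a prefix of the other ("l" vs "locke").
    static double similarity(String a, String b) {
        String[] ta = tokens(a);
        String[] tb = tokens(b);
        if (ta.length == 0 || ta.length != tb.length) {
            return 0.0;
        }
        int matches = 0;
        for (int i = 0; i < ta.length; i++) {
            if (ta[i].startsWith(tb[i]) || tb[i].startsWith(ta[i])) {
                matches++;
            }
        }
        return (double) matches / ta.length;
    }

    // Lower-cases, drops dots, and splits on whitespace.
    private static String[] tokens(String s) {
        String cleaned = s.toLowerCase(Locale.ROOT).replace(".", "").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Returns the best-scoring canonical name whose score exceeds the threshold,
    // or the name itself when nothing is similar enough.
    static String canonicalKey(String name, List<String> canonicalNames, double threshold) {
        String best = name;
        double bestScore = threshold;
        for (String candidate : canonicalNames) {
            double score = similarity(name, candidate);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With something like this, the union could presumably be keyed with a KeySelector instead of a field name, for example d3.keyBy(value -> SimilarityKeys.canonicalKey(value.getName(), canonicalNames, 0.80)), where getName() and canonicalNames are assumptions about my record type and setup. The result depends only on the record and the fixed list, so the same name always gets the same key.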
I hope you understand, thanks!