0
votes

My question is whether custom classes used in Flink with Java need to override the hashCode() and equals() methods. I have read on this page that hashCode() must never be implemented in distributed systems, and Apache Flink is one of them.

Example: I have this class:

public class EventCounter {
    public String Id;
    public long count;
    public Timestamp firstEvent;
    public Timestamp lastEvent;
    public Date date;

    public EventCounter() {
    }
}

Do I need to implement hashCode() and equals() for this kind of class in Flink, or is it better for performance to let Flink manage those methods on its own?

Kind regards!

3 Answers

2
votes

Types that you want to use as keys in Flink (i.e., as values you return from a KeySelector) must have valid implementations of hashCode and equals. In particular, hashCode must be deterministic across JVMs (which is why arrays and enums don't work as keys in Flink).
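As a minimal sketch of what a "valid implementation" could look like for the class in the question, the version below derives both methods from the field values only, so two JVMs compute the same hash for equal objects. It assumes Timestamp is java.sql.Timestamp and Date is java.util.Date, which the question does not state, and it is one possible layout rather than anything Flink prescribes.

```java
import java.sql.Timestamp;
import java.util.Date;
import java.util.Objects;

// Package-private only to keep this sketch in a single file;
// the class in the question is declared public.
class EventCounter {
    public String Id;
    public long count;
    public Timestamp firstEvent; // assumed to be java.sql.Timestamp
    public Timestamp lastEvent;
    public Date date;            // assumed to be java.util.Date

    public EventCounter() {
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // Require the exact same class so the comparison stays symmetric.
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        EventCounter other = (EventCounter) o;
        return count == other.count
                && Objects.equals(Id, other.Id)
                && Objects.equals(firstEvent, other.firstEvent)
                && Objects.equals(lastEvent, other.lastEvent)
                && Objects.equals(date, other.date);
    }

    @Override
    public int hashCode() {
        // Built from field values only, never from object identity,
        // so equal objects hash the same in every JVM.
        return Objects.hash(Id, count, firstEvent, lastEvent, date);
    }
}
```

Note that a key built from mutable fields such as count changes whenever those fields change, so in practice keying by a stable field such as Id is often the simpler choice.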

0
votes

Before writing the two methods, think about what the equals() contract requires of your class: the comparison must be symmetric, transitive, and consistent.

These methods are designed specifically for hash-based algorithms, so you need to make sure you implement them properly. As a side note, computing a hash code can be a CPU-intensive task.

0
votes

The hashCode() and equals() methods need to be implemented only when the object/class is going to be used as a key in Flink, for example:

DataStream<EventCounter> stream = env.addSource(...);
// The selector returns the EventCounter object itself, so the key type is EventCounter.
KeyedStream<EventCounter, EventCounter> keyed = stream.keyBy(k -> k);