I am trying to understand the workaround mentioned in https://issues.apache.org/jira/browse/KAFKA-3705, which says:
Today in Kafka Streams DSL, KTable joins are only based on keys. If users want to join a KTable A by key a with another KTable B by key b but with a "foreign key" a, and assuming they are read from two topics which are partitioned on a and b respectively, they need to do the following pattern:
tableB' = tableB.groupBy(/* select on field "a" */).agg(...); // now tableB' is partitioned on "a"
tableA.join(tableB', joiner);
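Below is a minimal in-memory sketch of the quoted pattern. It assumes table A holds customers keyed by `a` (a customer id) and table B holds orders keyed by `b` (an order id), with each order carrying `a` as a field. The names `Customer`, `Order`, `groupByForeignKey` and `join` are hypothetical. Plain `HashMap`s stand in for KTables, so this is not the Kafka Streams API. It only mirrors the data flow of `groupBy(...).agg(...)` followed by a key-based join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the KAFKA-3705 workaround.
// HashMaps stand in for KTables; this is NOT the Kafka Streams API.
final class RekeyThenJoinSketch {

    // Table A row, keyed by "a" (here: a customer id).
    record Customer(String name) {}

    // Table B row, keyed by "b" (here: an order id); customerId is the foreign key "a".
    record Order(String customerId, int amount) {}

    // One joined row per "a": the A row plus every B row that references it.
    record CustomerWithOrders(Customer customer, List<Order> orders) {}

    // groupBy(select field "a") + agg(...): re-key B by the foreign key and
    // collect all B rows that share the same "a" into one list.
    // The result (tableB') is keyed on "a"; in Kafka Streams this re-keying
    // is what repartitions B so it is co-partitioned with A.
    static Map<String, List<Order>> groupByForeignKey(Map<String, Order> tableB) {
        Map<String, List<Order>> tableBPrime = new HashMap<>();
        for (Order order : tableB.values()) {
            tableBPrime.computeIfAbsent(order.customerId(), k -> new ArrayList<>()).add(order);
        }
        return tableBPrime;
    }

    // tableA.join(tableB', joiner): both sides are now keyed on "a",
    // so a plain key-based inner join is enough.
    static Map<String, CustomerWithOrders> join(Map<String, Customer> tableA,
                                                Map<String, List<Order>> tableBPrime) {
        Map<String, CustomerWithOrders> result = new HashMap<>();
        for (Map.Entry<String, Customer> entry : tableA.entrySet()) {
            List<Order> orders = tableBPrime.get(entry.getKey());
            if (orders != null) { // inner join: skip A rows without any B rows
                result.put(entry.getKey(), new CustomerWithOrders(entry.getValue(), orders));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Customer> tableA = Map.of(
                "c1", new Customer("Alice"),
                "c2", new Customer("Bob"));
        Map<String, Order> tableB = Map.of(
                "o1", new Order("c1", 10),
                "o2", new Order("c1", 5),
                "o3", new Order("c2", 7));

        Map<String, CustomerWithOrders> joined = join(tableA, groupByForeignKey(tableB));
        int aliceTotal = joined.get("c1").orders().stream().mapToInt(Order::amount).sum();
        System.out.println(aliceTotal); // prints 15
    }
}
```

The sketch ignores updates. In real Kafka Streams, aggregating a grouped KTable (`KGroupedTable.aggregate`) takes both an adder and a subtractor, because an update to a B row can change its `a`, and the row then has to be removed from its old group.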
I have a hard time understanding what exactly is happening.
In particular, this sentence is confusing: "If users want to join a KTable A by key a with another KTable B by key b but with a "foreign key" a". I do not understand the code above either.
Can someone clarify what is happening here?
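One way to read the confusing sentence: A is keyed by `a`, B is keyed by its own key `b`, and every B row also stores some `a` value that points at a row of A. This is a foreign key, as in a relational schema. The desired result is what `SELECT ... FROM B JOIN A ON B.a = A.a` would give: each B row enriched with the A row it references, still one row per `b`. The sketch below computes that directly with a lookup. It uses hypothetical `Customer`/`Order` types and plain maps instead of KTables. Kafka Streams cannot do this lookup directly when the topics are partitioned on `a` and `b`, because the B rows that reference a given `a` can sit on any partition of B. That is why the workaround re-keys B first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of foreign-key join *semantics* using plain maps.
// This is the result the workaround is trying to reach, not Kafka Streams code.
final class ForeignKeyLookupSketch {

    // Table A row, keyed by "a" (a customer id).
    record Customer(String name) {}

    // Table B row, keyed by "b" (an order id); customerId is the foreign key "a".
    record Order(String customerId, int amount) {}

    // Output row: one per B row, carrying the referenced A row.
    record EnrichedOrder(Order order, Customer customer) {}

    // Equivalent of SELECT ... FROM B JOIN A ON B.a = A.a:
    // for each B row, look up A by the foreign key; the output stays keyed by "b".
    static Map<String, EnrichedOrder> foreignKeyJoin(Map<String, Customer> tableA,
                                                     Map<String, Order> tableB) {
        Map<String, EnrichedOrder> result = new HashMap<>();
        for (Map.Entry<String, Order> entry : tableB.entrySet()) {
            Customer customer = tableA.get(entry.getValue().customerId());
            if (customer != null) { // inner join: drop orders whose customer is unknown
                result.put(entry.getKey(), new EnrichedOrder(entry.getValue(), customer));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Customer> tableA = Map.of("c1", new Customer("Alice"));
        Map<String, Order> tableB = Map.of(
                "o1", new Order("c1", 10),
                "o2", new Order("c1", 5),
                "o3", new Order("c9", 1)); // references a customer that does not exist
        Map<String, EnrichedOrder> joined = foreignKeyJoin(tableA, tableB);
        System.out.println(joined.size()); // prints 2
        System.out.println(joined.get("o2").customer().name()); // prints Alice
    }
}
```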
This is also mentioned in KIP-213 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable):
Close the gap between the semantics of KTables in streams and tables in relational databases. It is common practice to capture changes as they are made to tables in an RDBMS into Kafka topics (JDBC-connect, Debezium, Maxwell). These entities typically have multiple one-to-many relationships. Usually RDBMSs offer good support to resolve this relationship with a join. Streams falls short here and the workaround (group by - join - lateral view) is not well supported as well and is not in line with the idea of record based processing.
What does "group by - join - lateral view" mean? I suspect it is related to the code above, but again it is a bit hard to follow. Could anyone shed some light on this?
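Here is one reading of the KIP's shorthand, offered as a hedged sketch rather than the KIP authors' definition:

- "group by" re-keys B by the foreign key and packs all B rows for one `a` into a single collection.
- "join" joins that collection with A by key.
- "lateral view" (a term borrowed from Hive's `LATERAL VIEW explode`) unpacks the collection again, so there is one output row per B record.

Each group keeps B's own key `b` so the exploded rows can be re-keyed by `b`. The `Customer`/`Order` types and the method name are hypothetical, and plain maps stand in for KTables.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of "group by - join - lateral view".
// Plain maps stand in for KTables; this is NOT the Kafka Streams API.
final class GroupJoinLateralViewSketch {

    record Customer(String name) {}                 // table A, keyed by "a"
    record Order(String customerId, int amount) {}  // table B, keyed by "b"; customerId = "a"
    record JoinedGroup(Customer customer, Map<String, Order> ordersByB) {}
    record EnrichedOrder(Order order, Customer customer) {}

    static Map<String, EnrichedOrder> groupJoinLateralView(Map<String, Customer> tableA,
                                                           Map<String, Order> tableB) {
        // 1. group by: re-key B on "a"; each group maps b -> B row so that
        //    the original keys survive the aggregation.
        Map<String, Map<String, Order>> grouped = new HashMap<>();
        for (Map.Entry<String, Order> entry : tableB.entrySet()) {
            grouped.computeIfAbsent(entry.getValue().customerId(), k -> new HashMap<>())
                   .put(entry.getKey(), entry.getValue());
        }

        // 2. join: A and the grouped B are both keyed on "a" (inner join).
        Map<String, JoinedGroup> joined = new HashMap<>();
        for (Map.Entry<String, Customer> entry : tableA.entrySet()) {
            Map<String, Order> group = grouped.get(entry.getKey());
            if (group != null) {
                joined.put(entry.getKey(), new JoinedGroup(entry.getValue(), group));
            }
        }

        // 3. lateral view: explode every joined group back into one row per
        //    B record, keyed by "b" again.
        Map<String, EnrichedOrder> exploded = new HashMap<>();
        for (JoinedGroup jg : joined.values()) {
            for (Map.Entry<String, Order> entry : jg.ordersByB().entrySet()) {
                exploded.put(entry.getKey(), new EnrichedOrder(entry.getValue(), jg.customer()));
            }
        }
        return exploded;
    }

    public static void main(String[] args) {
        Map<String, Customer> tableA = Map.of("c1", new Customer("Alice"), "c2", new Customer("Bob"));
        Map<String, Order> tableB = Map.of(
                "o1", new Order("c1", 10),
                "o2", new Order("c1", 5),
                "o3", new Order("c2", 7));
        Map<String, EnrichedOrder> result = groupJoinLateralView(tableA, tableB);
        System.out.println(result.get("o3").customer().name()); // prints Bob
    }
}
```

The detour through a collection is presumably what makes the workaround feel at odds with record-based processing. A change to a single B row rewrites the whole aggregated group for its `a`, and every row of that group is then re-joined and re-exploded.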