I have some troubles using the SQL syntax from Flink to join multiple tables when at least one of them have a time attribute column.
I have a table Table1
that use the schema (id, value1, rowtime), when rowtime is used as a flink rowtime.
I want to join this table with a table Table2
, that use the schema (id, value2). The join must be done on matching id
.
In the end, I want to group the result of this join by using a tumbling time window.
Is it possible to do this using only the SQL syntax ?
Here is an example of what I would like to do :
SELECT
Table1.id as id,
TUMBLE_END(rowtime, INTERVAL '10' SECOND),
MAX(value1) as value1,
MAX(value2) as value2
FROM Table1 JOIN TABLE2 ON Table1.id = Table2.id
GROUP BY Table1.id, TUMBLE(rowtime, INTERVAL '10' SECOND)
But it give me the following error :
2019-11-12 16:37:57.191 [main] ERROR - Cannot generate a valid execution plan for the given query:
FlinkLogicalCalc(expr#0..6=[{inputs}], id=[$t0], EXPR$1=[$t4], value1=[$t1], value2=[$t2])
FlinkLogicalWindowAggregate(group=[{0}], value1=[MAX($2)], value2=[MAX($3)])
FlinkLogicalCalc(expr#0..2=[{inputs}], expr#3=[0], proj#0..1=[{exprs}], value1=[$t3], value2=[$t3])
FlinkLogicalJoin(condition=[=($0, $2)], joinType=[inner])
FlinkLogicalTableSourceScan(table=[[Table1]], fields=[id, value1, rowtime], source=[KafkaTableSource(id, value1, rowtime)])
FlinkLogicalTableSourceScan(table=[[Table2]], fields=[id, value2], source=[Table2_Type(id, value2)])
Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before.
Please check the documentation for the set of currently supported SQL features.
org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query:
FlinkLogicalCalc(expr#0..6=[{inputs}], id=[$t0], EXPR$1=[$t4], value1=[$t1], value2=[$t2])
FlinkLogicalWindowAggregate(group=[{0}], value1=[MAX($2)], value2=[MAX($3)])
FlinkLogicalCalc(expr#0..2=[{inputs}], expr#3=[0], proj#0..1=[{exprs}], value1=[$t3], value2=[$t3])
FlinkLogicalJoin(condition=[=($0, $2)], joinType=[inner])
FlinkLogicalTableSourceScan(table=[[kafkaDataStream]], fields=[id, value1, rowtime], source=[KafkaTableSource(id, value1, rowtime)])
FlinkLogicalTableSourceScan(table=[[SensorConfigurationUpdateHTTP]], fields=[id, value2], source=[Table2_Type(id, value2)])
Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before.
Please check the documentation for the set of currently supported SQL features.
at org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:387)
at org.apache.flink.table.api.TableEnvironment.optimizePhysicalPlan(TableEnvironment.scala:302)
at org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:816)
at org.apache.flink.table.api.StreamTableEnvironment.writeToSink(StreamTableEnvironment.scala:379)
at org.apache.flink.table.api.TableEnvironment.insertInto(TableEnvironment.scala:879)
at org.apache.flink.table.api.Table.insertInto(table.scala:1126)
...
I also tried to cast my rowtime
into a TIMESTAMP
type (as advised by the error message), but then I am no able to process the Time Window anymore. It leads to the following error :
2019-11-12 16:44:52.473 [main] ERROR - Window can only be defined over a time attribute column.
org.apache.flink.table.api.ValidationException: Window can only be defined over a time attribute column.
at org.apache.flink.table.plan.rules.datastream.DataStreamLogicalWindowAggregateRule.getOperandAsTimeIndicator$1(DataStreamLogicalWindowAggregateRule.scala:84)
at org.apache.flink.table.plan.rules.datastream.DataStreamLogicalWindowAggregateRule.translateWindowExpression(DataStreamLogicalWindowAggregateRule.scala:89)
at org.apache.flink.table.plan.rules.common.LogicalWindowAggregateRule.onMatch(LogicalWindowAggregateRule.scala:65)
at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:315)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:252)
at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
at org.apache.flink.table.api.TableEnvironment.runHepPlanner(TableEnvironment.scala:360)
at org.apache.flink.table.api.TableEnvironment.runHepPlannerSequentially(TableEnvironment.scala:326)
at org.apache.flink.table.api.TableEnvironment.optimizeNormalizeLogicalPlan(TableEnvironment.scala:282)
at org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:813)
at org.apache.flink.table.api.StreamTableEnvironment.writeToSink(StreamTableEnvironment.scala:379)
at org.apache.flink.table.api.TableEnvironment.insertInto(TableEnvironment.scala:879)
at org.apache.flink.table.api.Table.insertInto(table.scala:1126)