
I am new to Flink, and I am using Flink 1.3 (an old version, but it is what my company uses) to write a streaming application. The application joins one stream (from Kafka) with two static Hive tables (they change only once a day, and each has about 100 million rows).

I would like to ask what the best way is to do this join: the DataStream API or the SQL API? It looks to me like the DataStream API does not support reading from Hive.

In short, I want to know which API (the DataStream API or the streaming SQL API) I should use here.

Thanks!


1 Answer


I think the best approach for your case is to convert the stream to a Table and then join that data against the Hive table data:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

// Create a StreamTableEnvironment on top of the StreamExecutionEnvironment (Flink 1.3 API)
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// The Kafka-backed stream (source definition omitted)
DataStream<Tuple2<Long, String>> stream = ...;

// Convert the DataStream into a Table with the default field names "f0", "f1"
Table table1 = tableEnv.fromDataStream(stream);

// Convert the DataStream into a Table with the field names "myLong", "myString"
Table table2 = tableEnv.fromDataStream(stream, "myLong, myString");
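
For completeness, here is a minimal sketch of the next step with Flink 1.3's Java Table API: register the converted Table under a name, query it with SQL, and turn the result back into a DataStream. The table name "KafkaEvents" and the filter are placeholders of my own; how the Hive rows get into the job is left open here, since Flink 1.3 ships no Hive table source for streaming jobs, so the actual join would follow once that data is made available to the job.

// needs: import org.apache.flink.types.Row;

// Register the converted Table so it can be referenced from SQL
tableEnv.registerTable("KafkaEvents", table2);

// Query the stream-backed table with SQL; the join against the Hive-side data
// would go here once those rows are available to the job
Table result = tableEnv.sql(
    "SELECT myLong, myString FROM KafkaEvents WHERE myLong > 0");

// Convert the result back into an append-only DataStream of Rows
DataStream<Row> resultStream = tableEnv.toAppendStream(result, Row.class);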