1
votes

I have problem with KSQL is data lost while update stream (terminate query and drop stream , create new stream) and publish data to 'MainTopic'.

My KSQL architecture is:

MAIN_STREAM ----> CONDITION_STREAM

I lost data during terminate query and drop stream until create new CONDITION_STREAM.

Can suggest way to update new CONDITION_STREAM while publish data in 'MainTopic' and continue consume data at time of terminate query and drop stream.

Forgive my English skill. Thank.

I try to use 'auto.offset.reset'='earliest' for CONDITION_STREAM but this consume all of data in MainTopic from MAIN_STREAM.

STEP 1 : create MAIN_STREAM from main topic and condition stream for output topic

CREATE STREAM MAIN_STREAM (event_id VARCHAR , payload VARCHAR) WITH (KAFKA_TOPIC='MainTopic', VALUE_FORMAT='json');

STEP 2 : create CONDITION_STREAM filter data from MAIN_STREAM

CREATE STREAM CONDITION_STREAM WITH (KAFKA_TOPIC='OutputTopic', VALUE_FORMAT='json') AS SELECT * from MAIN_STREAM WHERE  event = "payment";

STEP 3 : terminate query id of CONDITION_STREAM

TERMINATE <CONDITION_STREAM_QUERY_ID>;

STEP 4 : create new CONDITION_STREAM

DROP STREAM CONDITION_STREAM;

STEP 5 : create new CONDITION_STREAM

CREATE STREAM CONDITION_STREAM WITH (KAFKA_TOPIC='OutputTopic', VALUE_FORMAT='json') AS SELECT * from main_stream WHERE  event = "something " AND EXTRACTJSONFIELD(payload, '$.status') = 'init';
1

1 Answers

0
votes

Recreating the CONDITION_STREAM and having it pick up processing of its main_stream input from where it left off is not currently possible, (as of ksqlDB v0.10).

However, its an area receiving a lot of thought and attention. There is a design proposal in review at the current time to take the first steps in allowing existing stream and table views to be updated in-place. See https://github.com/confluentinc/ksql/pull/5611 for more info.