
I am creating a ksql stream from a Kafka topic. The source topic has 50 partitions and the target stream also has 50 partitions, but records from source partition 1 end up in an arbitrary partition of the target stream (for example, partition 10).

Schema: CREATE STREAM SCHEMA_BASE ( ID VARCHAR, Timestamp VARCHAR, CITY VARCHAR, Partition INTEGER) WITH ( KAFKA_TOPIC = 'SPARK_EVENTS', VALUE_FORMAT = 'JSON', TIMESTAMP_FORMAT = 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSS''Z''', TIMESTAMP = 'Timestamp' );

Stream: CREATE STREAM spark_event_streams AS SELECT ID, Timestamp, CITY, Partition FROM SCHEMA_BASE PARTITION BY Partition;

Is there a way to force the target stream to keep exactly the same partitioning as the source topic?

Did you use a custom partitioner when producing your data into the main stream? What are the keys in the SPARK_EVENTS topic? It seems your main stream is not partitioned by PARTITION. - Ran Lupovich

1 Answer


Custom partitioning is not supported in ksqlDB. ksqlDB always uses the default partitioner, which implements a round-robin strategy if the message key is null.
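
When the key is not null, as is the case after PARTITION BY, Kafka's default partitioner hashes the serialized key with murmur2 and takes the result modulo the partition count, so a key value of 1 will not generally land in partition 1. The sketch below illustrates that mapping. It is a Python port of the murmur2 routine from Kafka's Java client, and it assumes the INTEGER key is serialized as a 4-byte big-endian value; the function names `murmur2` and `partition_for` are hypothetical, used only for this illustration.

```python
import struct


def murmur2(data: bytes) -> int:
    """32-bit murmur2 as used by Kafka's Java client (seed 0x9747b28c).

    Works on unsigned 32-bit values; the bit pattern matches Java's signed int.
    """
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Mix the input four bytes at a time, reading each block little-endian.
    for i in range(length // 4):
        i4 = i * 4
        k = (data[i4]
             | (data[i4 + 1] << 8)
             | (data[i4 + 2] << 16)
             | (data[i4 + 3] << 24))
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Handle the trailing 1 to 3 bytes, mirroring Java's fall-through switch.
    tail = length & ~3
    rest = length % 4
    if rest >= 3:
        h ^= data[tail + 2] << 16
    if rest >= 2:
        h ^= data[tail + 1] << 8
    if rest >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(key: int, num_partitions: int) -> int:
    """Partition chosen for a non-null integer key.

    Assumption: the key is serialized as a 4-byte big-endian integer.
    """
    key_bytes = struct.pack(">i", key)
    # Clear the sign bit (Kafka's toPositive), then take the modulo.
    return (murmur2(key_bytes) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    # Show where each source partition number would be routed as a key.
    for source_partition in range(5):
        target = partition_for(source_partition, 50)
        print(f"key {source_partition} -> target partition {target}")
```

The mapping is deterministic, so all records with the same key still share one target partition; it just is not the partition whose number equals the key.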

I filed https://github.com/confluentinc/ksql/issues/7984 to request this as a possible new feature for ksqlDB.