ClickHouse Kafka topic join

Question

I have two (or more) Kafka topics that I need to join. From what I have read on blogs and Stack Overflow, there are two options:

1) Stream both topics, via the ClickHouse Kafka engine or Spark Streaming, into separate tables and then run a join, which is not recommended in ClickHouse?

2) Build one table with all the columns and use the ClickHouse Kafka engine or Spark Streaming to update the same entry?

Any advice?

valo valo · Accepted Answer · 2019-03-07T15:44:18

As always, it really depends on what kind of data you import and how you are going to use it, but I would say that in most cases it is better to import the two topics into a single table (option 2). From there you will be able to quickly filter and aggregate the records. Depending on the queries you want to run, you should choose appropriate ORDER BY columns for that table, which will make your queries much faster.
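
To make option 2 more concrete, here is a minimal sketch that only builds the ClickHouse DDL strings for one possible layout: a Kafka engine table plus a materialized view per topic, all writing into a single MergeTree table. The topic names (`orders`, `payments`), the columns, the join key `order_id`, the broker address and the table names are all made up for illustration, since the real schema is not given in the question. It also assumes that columns missing from one topic are simply left at their default values in the shared table.

```python
# Sketch only: every topic, column and table name below is hypothetical.
TARGET_TABLE = "events"  # the single wide table that both topics feed

# Assumed schemas: each topic carries the shared key plus its own columns.
TOPICS = {
    "orders": {"order_id": "UInt64", "ts": "DateTime", "amount": "Float64"},
    "payments": {"order_id": "UInt64", "ts": "DateTime", "status": "String"},
}


def target_ddl(topics):
    """DDL for the shared table: the union of all topic columns plus a source tag."""
    columns = {}
    for cols in topics.values():
        columns.update(cols)
    col_sql = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    # ORDER BY starts with the key you filter/aggregate on most often.
    return (
        f"CREATE TABLE {TARGET_TABLE} (source String, {col_sql}) "
        f"ENGINE = MergeTree ORDER BY (order_id, ts)"
    )


def kafka_ddl(topic, cols, brokers="localhost:9092"):
    """DDL for one topic: a Kafka engine table and a view that copies rows to the target."""
    col_sql = ", ".join(f"{name} {ctype}" for name, ctype in cols.items())
    queue = (
        f"CREATE TABLE {topic}_queue ({col_sql}) ENGINE = Kafka "
        f"SETTINGS kafka_broker_list = '{brokers}', "
        f"kafka_topic_list = '{topic}', "
        f"kafka_group_name = 'clickhouse_{topic}', "
        f"kafka_format = 'JSONEachRow'"
    )
    view = (
        f"CREATE MATERIALIZED VIEW {topic}_mv TO {TARGET_TABLE} AS "
        f"SELECT '{topic}' AS source, {', '.join(cols)} FROM {topic}_queue"
    )
    return [queue, view]


statements = [target_ddl(TOPICS)]
for topic, cols in TOPICS.items():
    statements += kafka_ddl(topic, cols)

for stmt in statements:
    print(stmt + ";")
```

With a layout like this, records from both topics that share an `order_id` end up next to each other in the sort order, so they can be combined with a `GROUP BY order_id` aggregation instead of a JOIN. Whether `(order_id, ts)` is the right ORDER BY depends entirely on the queries you actually run.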

If you give more details about the schema of the data you want to join, I can be more specific with the answer.
