
I have two Spark Streaming jobs that consume data from a Kafka topic and perform the same operations. The only difference is the serialization: one uses Java serialization and the other uses Kryo serialization. How can I compare these two streaming jobs when they receive a different number of inputs per second and different input batch sizes over the same time interval?
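For context, a minimal sketch of how such a pair of jobs might differ, assuming the only change between them is the `spark.serializer` setting (Java serialization is Spark's default). The application name and the `MyRecord` class are illustrative placeholders, not names from the original jobs:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical record type standing in for whatever flows through the job.
    case class MyRecord(id: Long, payload: String)

    object KryoJobConfig {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kafka-consumer-kryo")
          // Java serialization is the default; the Kryo variant only changes this setting.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          // Registering classes keeps Kryo from writing full class names into every record.
          .registerKryoClasses(Array(classOf[MyRecord]))

        val ssc = new StreamingContext(conf, Seconds(10))
        // ... Kafka direct stream and the job's transformations would follow here ...
      }
    }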


1 Answer


I would say that you need to calculate the time required to process a single input record.

Say you get a batch with 1,000 records. Measure the time it takes to process the whole batch and divide by 1,000; that gives you the processing time per record. Collect that metric over multiple batches for each job and then compare the results. A sketch of how to gather it is below.
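If the jobs are DStream-based Spark Streaming applications, one way to collect this per-batch metric is a `StreamingListener`. This is a minimal sketch, not a drop-in solution; the listener name and the logging format are my own:

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    // Logs the per-record processing time for every completed batch.
    class PerRecordTimingListener extends StreamingListener {
      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
        val info = batchCompleted.batchInfo
        val records = info.numRecords
        // processingDelay is the time (ms) Spark spent processing the batch,
        // excluding the scheduling delay.
        for (processingMs <- info.processingDelay if records > 0) {
          val perRecordMs = processingMs.toDouble / records
          println(s"batch=${info.batchTime} records=$records " +
            f"processingMs=$processingMs perRecordMs=$perRecordMs%.3f")
        }
      }
    }

Register it with `ssc.addStreamingListener(new PerRecordTimingListener)` before calling `ssc.start()`. Averaging `perRecordMs` over many batches for each job gives you a number that stays comparable even though the two jobs see different batch sizes and input rates.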