We are using Kafka and Spark Streaming to process trade data. We receive the data from Kafka as Avro records of the form [key, byte[]], deserialize them, and send them on for further processing. The application uses DStreams. We have a requirement to partition the data based on the key of the received Avro record, so that whenever we receive data from Kafka as a stream, each record is routed to a specific executor.
There are 10 different possible keys that we receive from Kafka, so all records with key1 should go to Node1, all records with key2 to Node2, and so on.
Currently we map the received stream data to a plain RDD, not a PairRDD.
Please let us know whether we can configure partitioning based on the key of the record received from Kafka.
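What we are trying to achieve looks roughly like the sketch below. The class and variable names (`KeyPartitioner`, `tradeStream`) are our own placeholders, not from our actual code, and we assume the keys are the strings "key1" through "key10". We understand that a `Partitioner` controls which partition a record lands in rather than which physical node, but colocating each key in its own partition is the behavior we are after:

```scala
import org.apache.spark.Partitioner
import org.apache.spark.streaming.dstream.DStream

// Hypothetical partitioner: routes "key1" -> partition 0,
// "key2" -> partition 1, ..., "key10" -> partition 9.
class KeyPartitioner extends Partitioner {
  override val numPartitions: Int = 10
  override def getPartition(key: Any): Int =
    key.toString.stripPrefix("key").toInt - 1
}

// tradeStream: DStream[(String, Array[Byte])], i.e. the stream
// after deserialization, keyed by the Avro record's key.
def partitionByKey(tradeStream: DStream[(String, Array[Byte])]): DStream[(String, Array[Byte])] =
  tradeStream.transform(rdd => rdd.partitionBy(new KeyPartitioner))
```

Is this the right approach, or is there a way to configure this directly on the Kafka consumer side?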