
There is no out-of-the-box solution for cloning data from one Azure EventHub to another EventHub. What are the possible options to achieve this?

1 Answer

One simple option for duplicating an Azure EventHub stream is to write a clone job in PySpark. Read the stream from your source EventHub, select the body (and, if relevant for your scenario, the properties) from the source streaming DataFrame, and write that stream to your target EventHub:

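# ehSource / ehTarget: dicts of connector options for the source and target EventHub
# checkploc: path of the checkpoint directory (all three are defined elsewhere)
query = (
    spark.readStream
    .format("eventhubs")
    .options(**ehSource)
    .load()
    .select("properties", "body")  # drop "properties" if you only need the payload
    .writeStream
    .format("eventhubs")
    .options(**ehTarget)
    .option("checkpointLocation", checkploc)
    .start()  # returns a StreamingQuery handle, not a DataFrame
)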
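
The snippet above leaves the two option dictionaries undefined. Here is a minimal sketch of how they might be built, assuming the azure-event-hubs-spark connector, which reads its connection string from the `eventhubs.connectionString` option and, in recent versions, expects that string to be encrypted with `EventHubsUtils.encrypt`. The helper names `build_eh_conf` and `jvm_encrypt`, the consumer group, and the connection strings in the usage comment are hypothetical placeholders, not part of the original answer.

```python
def jvm_encrypt(spark, connection_string):
    """Encrypt a connection string via the connector's JVM helper.

    Assumes the azure-event-hubs-spark library is attached to the cluster;
    older connector versions may accept the plain string instead.
    """
    utils = spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils
    return utils.encrypt(connection_string)


def build_eh_conf(connection_string, encrypt=None, consumer_group=None):
    """Build the options dict passed to .options(**conf).

    encrypt: optional callable applied to the connection string.
    consumer_group: optional consumer group (only meaningful for the source).
    """
    value = encrypt(connection_string) if encrypt else connection_string
    conf = {"eventhubs.connectionString": value}
    if consumer_group:
        conf["eventhubs.consumerGroup"] = consumer_group
    return conf


# Hypothetical usage inside a Spark session (placeholder values):
# ehSource = build_eh_conf(source_conn_str, lambda s: jvm_encrypt(spark, s), "clone-job")
# ehTarget = build_eh_conf(target_conn_str, lambda s: jvm_encrypt(spark, s))
# checkploc = "/mnt/checkpoints/eventhub-clone"
```

Giving the clone job its own consumer group on the source keeps it from competing with other readers of that EventHub, and the checkpoint location should be unique to this job so that a restart resumes where the stream left off.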