1 vote

I am running a batch query in Spark Structured Streaming. The code snippet below throws the error "kafka is not a valid Spark SQL Data Source;". The dependency I am using is spark-sql-kafka-0-10_2.10. Your help is appreciated. Thanks.

Dataset<Row> df = spark
    .read()
    .format("kafka")
    .option("kafka.bootstrap.servers", "*****")
    .option("subscribePattern", "test.*")
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load();
Exception in thread "main" org.apache.spark.sql.AnalysisException: kafka is not a valid Spark SQL Data Source.;
Try to use spark-sql-kafka-0-10_2.11, not spark-sql-kafka-0-10_2.10. - himanshuIIITian
What Spark version do you use? How do you execute the above code? In spark-shell or as part of a Spark application? How do you execute the Spark application? - Jacek Laskowski
Jacek, I use Spark version 2.1.0. I am not using spark-shell for now; I am just running the main class from Eclipse. - Ansip

2 Answers

2 votes

I had the same problem, and like me you are using read() instead of readStream().

Changing spark.read() to spark.readStream() fixed it for me.
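For reference, a minimal sketch of the streaming variant of your snippet (the bootstrap servers value is the same placeholder, and the console sink is only for illustration; endingOffsets is dropped because it applies to batch queries only):

// Streaming read from Kafka; note readStream() instead of read().
Dataset<Row> df = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "*****")
    .option("subscribePattern", "test.*")
    .option("startingOffsets", "earliest")
    .load();

// A streaming query needs a sink and must be started;
// awaitTermination() throws the checked StreamingQueryException.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream()
  .format("console")
  .start()
  .awaitTermination();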

0 votes

Use the spark-submit mechanism and pass along --jars spark-sql-kafka-0-10_2.11-2.1.1.jar.

Adjust the Kafka, Scala and Spark versions of that library to match your own setup.
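For example, an invocation might look like the following (the class name and application jar are placeholders for your own):

spark-submit \
  --class com.example.KafkaBatchApp \
  --master local[*] \
  --jars spark-sql-kafka-0-10_2.11-2.1.1.jar \
  my-app.jar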