1 vote

I am running a batch query in Spark Structured Streaming. The code snippet below throws the error "kafka is not a valid Spark SQL Data Source;". The dependency I am using is spark-sql-kafka-0-10_2.10. Your help is appreciated. Thanks.

Dataset<Row> df = spark
    .read()
    .format("kafka")
    .option("kafka.bootstrap.servers", "*****")
    .option("subscribePattern", "test.*")
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load();
Exception in thread "main" org.apache.spark.sql.AnalysisException: kafka is not a valid Spark SQL Data Source.;
Try to use spark-sql-kafka-0-10_2.11, not spark-sql-kafka-0-10_2.10. - himanshuIIITian
What Spark version do you use? How do you execute the above code? In spark-shell or as part of a Spark application? How do you execute the Spark application? - Jacek Laskowski
Jacek, I use Spark version 2.1.0. I am not using spark-shell for now; I am just running the main class from Eclipse. - Ansip

2 Answers

2 votes

I had the same problem, and like me you are using read() instead of readStream().

Changing spark.read() to spark.readStream() fixed it for me.
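For reference, a minimal sketch of the streaming variant of your snippet (the bootstrap servers value is the same placeholder, and the console sink is only for illustration; endingOffsets is dropped because it applies to batch queries only):

// Streaming read from Kafka; note readStream() instead of read().
Dataset<Row> df = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "*****")
    .option("subscribePattern", "test.*")
    .option("startingOffsets", "earliest")
    .load();

// A streaming query needs a sink and must be started;
// awaitTermination() throws the checked StreamingQueryException.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream()
  .format("console")
  .start()
  .awaitTermination();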

0 votes

Use the spark-submit mechanism and pass along --jars spark-sql-kafka-0-10_2.11-2.1.1.jar.

Adjust the Kafka, Scala and Spark versions of that library to match your own setup.
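For example, an invocation might look like the following (the class name and application jar are placeholders for your own):

spark-submit \
  --class com.example.KafkaBatchApp \
  --master local[*] \
  --jars spark-sql-kafka-0-10_2.11-2.1.1.jar \
  my-app.jar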