I am trying to read from Kafka using Spark, but I am running into what I think is a library-related issue.
I am pushing events to a Kafka topic and can read them with the Kafka console consumer, but I am unable to read them through Spark. I am using the spark-sql-kafka library and the project is built with Maven. The Scala version is 2.11.12 and the Spark version is 2.4.3.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.4.3</version>
    <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.4.3</version>
    <scope>provided</scope>
</dependency>
My Java code is below:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("kafka-tutorials")
        .master("local[*]")
        .getOrCreate();

// Read the Kafka topic as a streaming Dataset
Dataset<Row> rows = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "meetup-trending-topics")
        .option("startingOffsets", "latest")
        .load();

// Write each micro-batch to the console
rows.writeStream()
        .outputMode("append")
        .format("console")
        .start();

spark.streams().awaitAnyTermination();
spark.stop();
Below is the error message I am getting:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:652)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:161)
Solution: Either 1) build an uber jar that bundles the Kafka source, or 2) pass --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 to spark-submit. I had earlier been putting the --packages option after the main class / application jar; it has to come before the application jar, since everything after the jar is treated as an argument to the application itself.
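For reference, a sketch of the spark-submit invocation for option 2, with --packages placed before the application jar (the main class and jar name here are just placeholders for your own):

spark-submit \
  --master local[*] \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 \
  --class com.example.KafkaTutorial \
  target/kafka-tutorials-1.0.jar

For option 1, one way to build the uber jar is the maven-shade-plugin; a minimal sketch is below, assuming the provided scope is removed from the spark-sql-kafka-0-10_2.11 dependency so it actually ends up in the jar. The ServicesResourceTransformer matters here: without it the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister files from the different Spark jars are not merged, and Spark still cannot find the kafka source even inside the uber jar.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- merge META-INF/services so the Kafka DataSourceRegister entry survives shading -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>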