Use Case: An application uses Spark to process data for 5 minutes; the data to be processed could be several hundred thousand records in the data store. We have chosen Elasticsearch as the data store.

Issue: Is there a Spark connector for Elasticsearch, similar to the MongoDB Spark connector?

https://www.mongodb.com/products/spark-connector

Investigation: I spent a lot of time searching, and the best I could find was a solution using the search API with scroll (which fetches a limited number of records per request within a given time interval). This does not fit my use case.
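For context, the scroll approach works roughly like this. The sketch below is in Python purely for illustration: the REST endpoints (`POST /<index>/_search?scroll=...`, `POST /_search/scroll`, `DELETE /_search/scroll`) are the standard Elasticsearch scroll API, while the `send` transport callable and the `scroll_all` name are hypothetical. It shows the "fixed-size pages, one request at a time" pattern described above.

```python
from typing import Callable, Iterator, List

# Hypothetical transport: takes (HTTP method, path, JSON body) and
# returns the parsed JSON response from the cluster.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, batch_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[List[dict]]:
    """Yield pages of hits from `index` using the Elasticsearch scroll API."""
    # The initial search opens a scroll context kept alive for `keep_alive`.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": batch_size, "query": {"match_all": {}}})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:  # an empty page means the scroll is exhausted
                break
            yield hits
            # Request the next page, renewing the keep-alive window.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once iteration stops.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

In real use, `send` would wrap an HTTP client pointed at the cluster; a single client walks through the pages one by one, which is the behaviour the question contrasts with a Spark connector.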

Please note that my Elasticsearch index will hold JSON data, and we do not want to save RDDs as described in the guide below:

https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html

1 Answer

You can use the Spark connector for Elasticsearch (elasticsearch-hadoop, the library documented at the link in the question). Data is not saved in any binary form: the RDD/DataFrame is serialized as JSON, and that JSON is what goes into Elasticsearch.
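To illustrate the point that rows end up as plain JSON documents, here is a stdlib-only Python sketch that turns rows into an Elasticsearch `_bulk` request body (newline-delimited JSON: one action line plus one source line per document). This is an assumption-laden illustration of the idea, not the connector's actual code; the function name `to_bulk_body` and the index name `events` are hypothetical.

```python
import json
from typing import Iterable, Mapping


def to_bulk_body(index: str, rows: Iterable[Mapping]) -> str:
    """Serialize rows as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for row in rows:
        # Action/metadata line: index the following document into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the row itself as a JSON document.
        lines.append(json.dumps(dict(row)))
    if not lines:
        return ""  # nothing to send
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body("events", [{"user": "a", "n": 1}, {"user": "b", "n": 2}])
```

Each document stored in the index is just the JSON of one row, which is why no binary RDD representation appears in Elasticsearch.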