I'm having trouble understanding how Spark interacts with storage.
I would like to build a Spark job that fetches data from a RocksDB database (or any other key-value store). However, at the moment the best I can do is fetch the whole dataset from the database into memory on each of the cluster nodes (into a map, for example) and build an RDD from that object. Roughly, it looks like the sketch below.
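For concreteness, here is a condensed version of what I'm doing now (the DB path and names are placeholders, and I've simplified it to a driver-side scan followed by `parallelize`):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.rocksdb.{Options, RocksDB}
import scala.collection.mutable.ArrayBuffer

object WholeDbToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rocksdb-to-rdd"))

    RocksDB.loadLibrary()
    val db = RocksDB.openReadOnly(new Options(), "/path/to/rocksdb") // placeholder path

    // Pull every key/value pair into memory -- this is the part I want to avoid
    val pairs = ArrayBuffer[(Array[Byte], Array[Byte])]()
    val it = db.newIterator()
    it.seekToFirst()
    while (it.isValid) {
      pairs += ((it.key(), it.value()))
      it.next()
    }
    it.close()
    db.close()

    // Build an RDD from the in-memory collection
    val rdd = sc.parallelize(pairs)
    println(rdd.count())
    sc.stop()
  }
}
```

This obviously doesn't scale: the entire dataset has to fit in memory before Spark ever sees it.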
What do I have to do so that Spark fetches only the necessary data (the way it does with HDFS)? I've read about Hadoop InputFormats and RecordReaders, but I don't completely grasp what I should implement. My rough attempt at a skeleton is below.
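Here is what I've pieced together so far from the InputFormat docs. Everything in it is guesswork on my part: the class names, the fixed key-range split strategy, and the hard-coded DB path are all placeholders. The idea, as I understand it, is that `getSplits` carves the keyspace into ranges and each task's `RecordReader` scans only its own range:

```scala
import java.io.{DataInput, DataOutput}
import java.util.{ArrayList => JArrayList, List => JList}

import org.apache.hadoop.io.{BytesWritable, Writable}
import org.apache.hadoop.mapreduce.{InputFormat, InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.rocksdb.{Options, RocksDB, RocksIterator}

// A split describes one key range; Spark schedules one task per split.
class KeyRangeSplit(var start: Array[Byte], var end: Array[Byte])
    extends InputSplit with Writable {
  def this() = this(Array.emptyByteArray, Array.emptyByteArray) // Hadoop needs a no-arg ctor
  override def getLength: Long = 0L // size unknown; acceptable for a sketch
  override def getLocations: Array[String] = Array.empty
  override def write(out: DataOutput): Unit = {
    out.writeInt(start.length); out.write(start)
    out.writeInt(end.length); out.write(end)
  }
  override def readFields(in: DataInput): Unit = {
    start = new Array[Byte](in.readInt()); in.readFully(start)
    end = new Array[Byte](in.readInt()); in.readFully(end)
  }
}

class RocksDbInputFormat extends InputFormat[BytesWritable, BytesWritable] {
  // Placeholder split logic: carve the keyspace into four ranges by first byte.
  // A real implementation would presumably derive ranges from store metadata.
  override def getSplits(context: JobContext): JList[InputSplit] = {
    val splits = new JArrayList[InputSplit]()
    val bounds = Array(0x00, 0x40, 0x80, 0xC0)
    for (i <- bounds.indices) {
      val start = Array(bounds(i).toByte)
      val end =
        if (i + 1 < bounds.length) Array(bounds(i + 1).toByte)
        else Array.emptyByteArray // empty = no upper bound
      splits.add(new KeyRangeSplit(start, end))
    }
    splits
  }

  // Hard-coded path for the sketch; a real version would read it
  // from context.getConfiguration instead.
  override def createRecordReader(split: InputSplit, context: TaskAttemptContext)
      : RecordReader[BytesWritable, BytesWritable] =
    new RocksDbRecordReader("/path/to/rocksdb")
}

// Each task opens the store and scans only its own key range.
class RocksDbRecordReader(dbPath: String)
    extends RecordReader[BytesWritable, BytesWritable] {
  private var db: RocksDB = _
  private var it: RocksIterator = _
  private var end: Array[Byte] = _
  private val key = new BytesWritable()
  private val value = new BytesWritable()

  override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = {
    val s = split.asInstanceOf[KeyRangeSplit]
    RocksDB.loadLibrary()
    db = RocksDB.openReadOnly(new Options(), dbPath)
    it = db.newIterator()
    it.seek(s.start) // jump straight to the start of this split's range
    end = s.end
  }

  override def nextKeyValue(): Boolean = {
    if (!it.isValid || (end.nonEmpty && !lessThan(it.key(), end))) return false
    key.set(it.key(), 0, it.key().length)
    value.set(it.value(), 0, it.value().length)
    it.next()
    true
  }

  override def getCurrentKey: BytesWritable = key
  override def getCurrentValue: BytesWritable = value
  override def getProgress: Float = 0f // unknown without counting keys
  override def close(): Unit = { it.close(); db.close() }

  // Unsigned lexicographic comparison, matching RocksDB's default key order.
  private def lessThan(a: Array[Byte], b: Array[Byte]): Boolean = {
    var i = 0
    val n = math.min(a.length, b.length)
    while (i < n) {
      val d = (a(i) & 0xFF) - (b(i) & 0xFF)
      if (d != 0) return d < 0
      i += 1
    }
    a.length < b.length
  }
}
```

Then, if I understand correctly, I'd hook it into Spark with something like `sc.newAPIHadoopRDD(new Configuration(), classOf[RocksDbInputFormat], classOf[BytesWritable], classOf[BytesWritable])`. Is that the right direction, or am I missing something fundamental about how the splits map to data locality?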
I know this is kind of a broad question, but I would really appreciate some help getting started. Thank you in advance.