I would like to perform an Apache Spark map-reduce on 5 files and output the results to MongoDB. I would prefer not to use HDFS, since the NameNode is a single point of failure (http://wiki.apache.org/hadoop/NameNode).
A. Is it possible to read multiple files into an RDD, perform a map-reduce on a key across all the files, and use the Casbah toolkit to output the results to MongoDB? (A sketch of what I mean is below, after this list.)
B. Is it possible to use the client to read from MongoDB into an RDD, perform a map-reduce, and write the output back to MongoDB using the Casbah toolkit?
C. Is it possible to read multiple files into an RDD, map them with keys that exist in MongoDB, reduce them to a single document, and insert that back into MongoDB?
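For A, something along these lines is roughly what I have in mind; the file paths, database, and collection names are placeholders, a word count stands in for the actual reduction on my key, and I have not verified this end to end:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.casbah.Imports._

object FilesToMongo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("files-to-mongo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile accepts a comma-separated list of paths, so no HDFS is needed
    val lines = sc.textFile("file1.txt,file2.txt,file3.txt,file4.txt,file5.txt")

    // classic map/reduce on a key (word count as a stand-in)
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // open one Casbah client per partition: the client is created inside the
    // closure because it cannot be serialized on the driver and shipped out
    counts.foreachPartition { iter =>
      val client = MongoClient("localhost", 27017)
      val coll = client("mydb")("wordcounts")
      iter.foreach { case (word, count) =>
        coll.insert(MongoDBObject("word" -> word, "count" -> count))
      }
      client.close()
    }

    sc.stop()
  }
}
```

My understanding is that foreachPartition (rather than a plain foreach) keeps the number of Mongo connections down to one per partition, but I may be missing something.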
I know all of this is possible using the mongo-hadoop connector. I just don't like the idea of using HDFS, since the NameNode is a single point of failure and backup NameNodes are not implemented yet.
I've read some things online, but they are not clear. For example, this question: MongoDBObject not being added to inside of an rrd foreach loop casbah scala apache spark
I'm not sure what's going on there; the JSON does not even appear to be valid...
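If I had to guess, the problem in that question is that an RDD foreach runs on the executors, so a MongoDBObject builder created on the driver is only mutated on copies (or the task fails to serialize at all) and nothing shows up on the driver afterwards. This is only my guess, with made-up names:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.casbah.Imports._

object DriverSideMutationGuess {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("driver-side-mutation").setMaster("local[*]"))
    val rdd = sc.parallelize(Seq("a" -> 1, "b" -> 2))

    // the pattern from the linked question, as I understand it: the closure is
    // serialized to the executors, so any += happens on executor-side copies
    val builder = MongoDBObject.newBuilder
    rdd.foreach { case (k, v) => builder += (k -> v) }
    println(builder.result()) // my expectation: nothing added here, if it gets this far

    sc.stop()
  }
}
```

If that guess is right, the insert itself would have to happen inside the closure, as in the sketch above.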
resources: