We are working on a proof of concept with MongoDB and Amazon EMR. We have a simple end-to-end solution working: it reads data from one MongoDB collection, runs map/reduce functions on it, and writes the output to another MongoDB collection.
My question: is it possible to read additional collections from MongoDB for lookup purposes? For example, the map/reduce functions would run over all the data in collection1, but they would also look up values in collection2 and collection3 while doing so.
If this is not possible, what is the best way to get the lookup data into Hadoop so that the map/reduce functions can use it?
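To make the question concrete, here is a plain-Python sketch of the kind of lookup we mean. It does not touch Hadoop, EMR, or MongoDB: the orders/customers/products data and all function names are hypothetical, and collection2 and collection3 are modeled as in-memory dicts that the map function reads from. This is only an illustration of the intent, not how the real job is (or must be) built.

```python
from collections import defaultdict

# Hypothetical stand-ins for the lookup collections (collection2, collection3).
# In the real job this data would live in MongoDB; here it is plain dicts.
CUSTOMERS = {"c1": "EU", "c2": "US"}        # collection2: customer_id -> region
PRODUCTS = {"p1": "books", "p2": "games"}   # collection3: product_id -> category

# Hypothetical documents from collection1, the collection being map/reduced.
ORDERS = [
    {"customer_id": "c1", "product_id": "p1", "amount": 10},
    {"customer_id": "c2", "product_id": "p1", "amount": 5},
    {"customer_id": "c1", "product_id": "p2", "amount": 7},
]


def map_order(order):
    """Emit ((region, category), amount), resolving ids via the lookup dicts."""
    region = CUSTOMERS.get(order["customer_id"], "unknown")
    category = PRODUCTS.get(order["product_id"], "unknown")
    yield (region, category), order["amount"]


def reduce_amounts(key, values):
    """Sum all amounts emitted for one (region, category) key."""
    return key, sum(values)


def run_job(records):
    # Simulated shuffle: group mapped values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_order(record):
            grouped[key].append(value)
    return dict(reduce_amounts(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(run_job(ORDERS))
```

The open question is where the equivalent of CUSTOMERS and PRODUCTS should come from when the job actually runs on EMR.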