Do MapReduce and the other Hadoop technologies (HBase, Hive, Pig, etc.) lend themselves well to situations where you have multiple input files and data needs to be compared between the different data sources?
In the past I've written a few MapReduce jobs using Hadoop and Pig, but those tasks were quite simple since they involved manipulating only a single dataset. Our current requirements dictate that we read data from multiple sources, compare various data elements against another data source, and then report on the differences. The datasets we are working with are in the region of 10 million to 60 million records, and so far we haven't managed to make these jobs fast enough.
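For context, this is roughly the shape of the comparison I'm trying to express, as a minimal sketch in Pig (the paths, schemas, and field names below are placeholders, not our real data):

    -- Load the two sources; schemas here are illustrative only
    a = LOAD 'source_a' USING PigStorage(',') AS (id:long, value:chararray);
    b = LOAD 'source_b' USING PigStorage(',') AS (id:long, value:chararray);

    -- Reduce-side join on the shared key, then keep only mismatches
    joined = JOIN a BY id, b BY id;
    diffs  = FILTER joined BY a::value != b::value;

    STORE diffs INTO 'differences';

In practice we compare several fields per record, not just one, but the join-then-filter pattern is the same.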
Is there a case for using MapReduce to solve this kind of problem, or am I going down the wrong route?
Any suggestions are much appreciated.