I am new to Hadoop and MapReduce. I have several directories, each containing N files (each file is about 10 MB, and N could be 100; the files may be compressed or uncompressed), like this: MyDir1/file1 MyDir1/file2 ... MyDir1/fileN
MyDir2/file1 MyDir2/file2 ... MyDir2/fileN
I want to design a MapReduce application in which a single mapper or reducer processes all of MyDir1, i.e. I don't want the files in MyDir1 to be split across multiple mappers. Similarly, I want MyDir2 to be processed in its entirety by another mapper or reducer, again without splitting.
Any ideas on how to go about this? Do I need to write my own InputFormat and read the input files myself?
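To make the intent concrete, here is a rough plain-Java sketch of the grouping I have in mind. It uses no Hadoop classes at all: the class name `DirGrouping` and the method `groupByParentDir` are made up for illustration, and the paths are just the example names from above. My assumption is that a custom InputFormat would do something like this when computing its splits, producing one unit of work per directory, but I don't know whether that is the right approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: groups file paths by parent directory.
class DirGrouping {

    // Returns a map from directory name to the files directly inside it,
    // keeping directories and files in the order they were first seen.
    static Map<String, List<String>> groupByParentDir(List<String> paths) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String path : paths) {
            int slash = path.lastIndexOf('/');
            // Files with no directory component are grouped under "".
            String dir = slash < 0 ? "" : path.substring(0, slash);
            groups.computeIfAbsent(dir, k -> new ArrayList<>()).add(path);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "MyDir1/file1", "MyDir1/file2",
                "MyDir2/file1", "MyDir2/file2");
        // Each map entry is what I would like a single mapper to receive.
        groupByParentDir(paths).forEach(
                (dir, files) -> System.out.println(dir + " -> " + files));
    }
}
```

Each entry of the resulting map is the set of files I would want handed to exactly one mapper, with no file in that set going to any other mapper.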