I'm trying to run a job on Elastic MapReduce (EMR) with a custom jar, processing about 1,000 files in a single directory. When I submit my job with the parameter s3n://bucketname/compressed/*.xml.gz, I get a "matched 0 files" error. If I pass the absolute path to a single file (e.g. s3n://bucketname/compressed/00001.xml.gz), it runs fine, but then only that one file gets processed. I also tried passing just the directory name (s3n://bucketname/compressed/), hoping the files within would be processed, but that just passes the directory itself to the job.
At the same time, I have a smaller local Hadoop installation. There, when I submit my job with a wildcard path (/path/to/dir/on/hdfs/*.xml.gz), it works fine and all 1,000 files are listed correctly.
How do I get EMR to list all my files?
It turned out there was an empty file named `compressed` in the same bucket. As soon as I deleted the empty file, the program started working. - Shashank Agarwal
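This failure mode can be sketched in a toy simulation. S3 has no real directories: every object is a flat key, and the s3n layer infers a "directory" from a key prefix. If a zero-byte object exists whose key equals the prefix itself, the path is treated as a plain file, so glob expansion under it finds nothing. The code below is my own simplified model of that behavior (the `list_matches` helper is hypothetical, not part of Hadoop), not the actual s3n implementation:

```python
from fnmatch import fnmatch

def list_matches(keys, glob):
    """Toy model of s3n-style glob expansion over a flat S3 keyspace.

    If an object exists whose key equals the glob's parent "directory",
    the filesystem layer treats that path as a file, so no children
    are enumerated and the glob matches zero files.
    """
    parent = glob.rsplit("/", 1)[0]
    if parent in keys:            # prefix shadowed by a plain object
        return []                 # -> "matched 0 files"
    return [k for k in keys if fnmatch(k, glob)]

keys = ["compressed/00001.xml.gz", "compressed/00002.xml.gz"]
print(list_matches(keys, "compressed/*.xml.gz"))  # both files match

keys.append("compressed")         # empty placeholder object appears
print(list_matches(keys, "compressed/*.xml.gz"))  # matched 0 files
```

Deleting the empty `compressed` object removes the shadowing key, after which the prefix is again interpreted as a directory and the wildcard matches all 1,000 files.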