If I am running an EMR job (in Java) on Amazon Web Services to process large amounts of data, is it possible to have every single mapper access a small file stored on S3? Note that the small file I am talking about is NOT the input to the mappers. Rather, the mappers need to process the input according to some rules in the small file. For example, maybe the large input file is a billion lines of text, and I want to filter out words that appear in a blacklist by reading a small file of blacklisted words stored in an S3 bucket. In this case, each mapper would process a different part of the input data, but they would all need to access the blacklist file on S3. How can I make the mappers do this in Java?
EDIT: I am not using the Hadoop framework, so there are no setup() or map() method calls. I am simply using the EMR streaming service and reading the input line by line from stdin.
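To make the setup concrete, here is a minimal sketch of what I have in mind for the streaming mapper: it downloads the small blacklist file from S3 once at startup, then filters each stdin line. It assumes the AWS SDK for Java (v1) is on the classpath and that credentials come from the instance role EMR attaches to each node; the bucket and key names are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

public class FilterMapper {
    public static void main(String[] args) throws IOException {
        // Placeholder bucket/key for the small blacklist file on S3.
        String bucket = "my-config-bucket";
        String key = "blacklist.txt";

        // Download the blacklist once, before processing any input.
        // Credentials are resolved from the EMR node's instance role.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        Set<String> blacklist = new HashSet<>();
        try (S3Object obj = s3.getObject(bucket, key);
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(obj.getObjectContent()))) {
            String word;
            while ((word = br.readLine()) != null) {
                blacklist.add(word.trim());
            }
        }

        // Streaming contract: read input records from stdin, write results to stdout.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            StringBuilder out = new StringBuilder();
            for (String token : line.split("\\s+")) {
                if (!blacklist.contains(token)) {
                    if (out.length() > 0) out.append(' ');
                    out.append(token);
                }
            }
            System.out.println(out);
        }
    }
}
```

Since every mapper runs this same program, each one would fetch the file from S3 independently at startup; is that the right approach, or is there a better way to share the file with all mappers?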