I'm researching Hadoop and MapReduce (I'm a beginner!) and have a simple question regarding HDFS. I'm a little confused about how HDFS and MapReduce work together.
Let's say I have logs from System A, Tweets, and a stack of documents from System B. When all of this is loaded into Hadoop/HDFS, is it thrown into one big HDFS bucket, or would there be three separate areas (for want of a better word)? If so, what is the correct terminology?
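To make the question concrete, here's roughly what I imagine the loading step might look like using the Java FileSystem API, with one directory per data source. The paths and file names here are just placeholders I made up, and I'm not sure this layout is the "right" way to do it:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadData {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical layout: one HDFS directory per data source
        fs.mkdirs(new Path("/data/system-a-logs"));
        fs.mkdirs(new Path("/data/tweets"));
        fs.mkdirs(new Path("/data/system-b-docs"));

        // Copy a local file into one of the directories
        fs.copyFromLocalFile(new Path("logs/app.log"),
                             new Path("/data/system-a-logs/app.log"));
    }
}
```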
The question stems from trying to understand how to execute a MapReduce job. If I only wanted to concentrate on the logs, for example, can that be done, or are all jobs executed against the entire content stored on the cluster?
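From what I've read so far, I think the job driver would look something like the sketch below, where the input path points only at the logs directory rather than at everything on the cluster. Again, the class name and paths are made-up examples, so please correct me if I've misunderstood:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogsOnlyJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "logs only");
        job.setJarByClass(LogsOnlyJob.class);

        // Point the job only at the logs directory,
        // not at the whole cluster's contents
        FileInputFormat.addInputPath(job, new Path("/data/system-a-logs"));
        FileOutputFormat.setOutputPath(job, new Path("/output/logs-analysis"));

        // Mapper and reducer classes would be set here as well
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```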
Thanks for your guidance! TM