I am totally new to Hadoop, though I understand the concept of MapReduce fairly well.
Most Hadoop tutorials start with the WordCount example, so I wrote a simple word count program, which worked perfectly well. Now I am trying to run a word count over a very large document (over 50 GB).
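For reference, my program is essentially the stock tutorial WordCount, roughly like the sketch below (class names are just the usual tutorial ones, nothing custom):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input value it receives
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This works fine on small inputs; my question is about what happens when the single input file is huge.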
So my question to the Hadoop experts is: how will Hadoop handle such a large file? Will it transfer a copy of the whole file to each mapper, or will it automatically split it into blocks and send those blocks to the mappers?
Most of my experience with MapReduce comes from CouchDB, where the mapper handles one document at a time. From what I have read about Hadoop, I wonder whether it is designed to handle many small files, a few large files, or both?