My Hadoop knowledge is about 4 weeks old. I am using a Hadoop sandbox.
As I understand the theory, when a file is copied into HDFS, it is split into 128 MB blocks. Each block is then stored on a data node and replicated to other data nodes (three copies by default).
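From what I have read, the block layout of a file can be inspected with fsck, something like this (the path here is just a placeholder for my file):

    hdfs fsck /user/me/data.csv -files -blocks -locations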
Questions:
When I copy a data file (~500 MB) from the local file system into HDFS with the put command, the -ls command still shows the entire file as a single entry. I was expecting to see 128 MB blocks. What am I doing wrong here?
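For reference, these are roughly the commands I ran (the paths are placeholders for my actual directories):

    # copy the ~500 MB file from the local file system into HDFS
    hdfs dfs -put /home/me/data.csv /user/me/data.csv

    # list the target directory; the file shows up as one ~500 MB entry
    hdfs dfs -ls /user/me/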
Supposing the file really is split and distributed across HDFS, is there a way to reassemble the blocks and retrieve the original file back to the local file system?
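I assume the reverse direction would look something like the following (again with placeholder paths), but I am not sure whether it transparently reassembles the blocks:

    # copy the file from HDFS back to the local file system
    hdfs dfs -get /user/me/data.csv /home/me/data_copy.csv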