When I copy a data file to HDFS using the `-copyFromLocal` command, the data gets copied into HDFS. When I view this file through the web browser, it shows a replication factor of 3, and the file is at "/user/hduser/inputData/TestData.txt" with a size of 250 MB.

I have three CentOS servers as DataNodes and a CentOS desktop acting as both NameNode and client.

When I copy from local to the above-mentioned path, where exactly is the data copied to? Is it copied to the NameNode, or to the DataNodes as blocks of 64 MB? Or is it not replicated until I run a MapReduce job, with the map phase preparing the splits and replicating the data to the DataNodes?

Please clarify my queries.

1 Answer

1. When I copy from local to the above-mentioned path, where exactly is it copied to? Ans: The data is copied into HDFS, the Hadoop Distributed File System, which consists of DataNodes and a NameNode. The data you copy resides on the DataNodes as blocks (64 MB each in your setup; the block size is configurable, and the last block of a file can be smaller). The NameNode stores only the metadata: which blocks make up the file, and which DataNodes hold each block and its replicas.

2. Is it copied to the NameNode, or to the DataNodes as many splits of 64 MB? Ans: Your file is stored on the DataNodes as blocks of 64 MB. The NameNode holds no file data; it stores the location and order of those blocks.
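To make the block layout concrete for the 250 MB file in the question, here is a small Python sketch of the arithmetic. It assumes the 64 MB block size and replication factor of 3 mentioned above; the function name `block_layout` is made up for illustration and is not part of Hadoop.

```python
MB = 1024 * 1024

def block_layout(file_size, block_size=64 * MB, replication=3):
    """Return (block_sizes, total_block_replicas, raw_bytes_stored).

    Hypothetical helper: it only does the arithmetic, it does not talk to HDFS.
    """
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data left over.
        sizes.append(remainder)
    return sizes, len(sizes) * replication, file_size * replication

sizes, replicas, raw = block_layout(250 * MB)
print([s // MB for s in sizes])  # [64, 64, 64, 58]
print(replicas)                  # 12 block replicas across the DataNodes
print(raw // MB)                 # 750 MB of raw disk space in total
```

So the file becomes four blocks, and with three replicas of each the cluster stores twelve block replicas in total, spread over your three DataNodes.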

3. Is it not replicated until I run a MapReduce job, with the map phase preparing the splits and replicating them to the DataNodes? Ans: This is not true. As soon as the data is copied into HDFS, the filesystem replicates it according to the configured replication factor, regardless of the process used to copy the data.
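If you want to check this yourself before running any MapReduce job, the `hadoop fsck` command can list the blocks of a file and the DataNodes holding each replica. Below is a minimal Python sketch that wraps that command. It assumes the `hadoop` binary is on the PATH of the client machine; the helper names `fsck_command` and `show_block_locations` are made up for illustration.

```python
import subprocess

def fsck_command(path):
    """Build the fsck invocation that reports files, blocks and replica locations."""
    return ["hadoop", "fsck", path, "-files", "-blocks", "-locations"]

def show_block_locations(path):
    """Run fsck for the given HDFS path and return its text report.

    Assumes a working Hadoop client configuration on this machine.
    """
    result = subprocess.run(
        fsck_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout

# Example call (needs a running cluster, so it is left commented out):
# print(show_block_locations("/user/hduser/inputData/TestData.txt"))
```

Run right after the copy, the report should already list every block of the file together with the DataNodes that hold its replicas.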