I am setting up distributed HBase on HDFS and I trying to understand behavior of the system during read operations.
This is how I understand high level steps of the read operation.
- Client connects to NameNode to get list of DataNodes which contain replicas of the rows that he interested in.
- From here Client caches list of DataNodes and start talking to chosen DataNode directly until it needs some other rows from other DataNode, in which case it asks NameNode again.
My questions are as follows:
- Who chooses the best replica DataNode to contact? How Client chooses "closest" replica? Does NameNode return list of relative DataNodes in a sorted order ?
- What are the scenarios(if any) when Client switches to another DataNode that has requested rows? For example if one of the DataNode becomes overloaded/slow can the client library figure out to contact another DataNode from the list returned by the NameNode?
- Is there a possibility of getting stale data from one of the replicas? For example client acquired list of DataNodes and starts reading from one of them. In the mean time there is a write request coming from another client to NameNode. We have dfs.replication == 3 and dfs.replication.min = 2. NameNode consider write successful after flushing to disk on 2 out of 3 nodes, while first client is reading from the 3rd node and doesn't know (yet) that there is another write that has been committed ?
- Hadoop maintains the same reading policy when supporting HBase?
Thank you