Since Hadoop runs over HDFS, and data is replicated across the HDFS cluster for redundancy, does a Hadoop map operation actually waste a lot of processor cycles by running mappers over the same data on different nodes in the cluster? (Since, by design, the nodes hold overlapping data, according to the replication factor.)
Or does the framework first, through some kind of job-management strategy, assign work to only a subset of the nodes, so that each piece of data is processed exactly once and that kind of duplicate computation is cleverly avoided?
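To make the second scenario concrete, here is a minimal sketch of what I imagine such a strategy might look like: each logical block is assigned to exactly one worker, no matter how many replicas exist. This is purely hypothetical code of my own (the class, `Block` record, and host names are all made up), not Hadoop's actual scheduler or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the conjecture above: one map task per logical
// block, even though each block is stored on several nodes. This is NOT
// Hadoop's real scheduling logic, just a sketch of the idea.
public class OneMapperPerBlockSketch {

    // A logical HDFS-style block plus the nodes that hold a replica of it.
    record Block(String blockId, List<String> replicaHosts) {}

    // Assign every block to a single host, preferring hosts with fewer tasks
    // so the work spreads out instead of piling onto one replica holder.
    static Map<String, List<String>> assign(List<Block> blocks) {
        Map<String, List<String>> tasksPerHost = new HashMap<>();
        for (Block b : blocks) {
            String chosen = b.replicaHosts().stream()
                    .min((h1, h2) -> Integer.compare(
                            tasksPerHost.getOrDefault(h1, List.of()).size(),
                            tasksPerHost.getOrDefault(h2, List.of()).size()))
                    .orElseThrow();
            tasksPerHost.computeIfAbsent(chosen, h -> new ArrayList<>())
                    .add(b.blockId());
        }
        return tasksPerHost;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block("blk_1", List.of("nodeA", "nodeB", "nodeC")),
                new Block("blk_2", List.of("nodeA", "nodeB", "nodeD")),
                new Block("blk_3", List.of("nodeC", "nodeD", "nodeA")));
        // Three blocks with replication factor 3 -> still only three map
        // tasks in total, one per block, never one per replica.
        System.out.println(assign(blocks));
    }
}
```

Is something along these lines what actually happens, or does each node simply run mappers over whatever blocks it happens to hold, replicas included?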