My question is about the concept of the distributed cache in Hadoop, and whether "distributed cache" is the right name for it. A conventional definition of a distributed cache is "A distributed cache spans multiple servers so that it can grow in size and in transactional capacity".
This does not hold in Hadoop: the distributed cache copies the same file (the one named in the driver code) to every node that runs tasks.
Shouldn't this be called a replicated cache instead? Under the conventional definition, the intersection of the cache contents across all nodes should be empty (or close to it). In Hadoop, however, the intersection is the whole file, since the same file is present on every node.
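To make the distinction concrete, here is a toy sketch of what I mean. It is not Hadoop code; the node names, file names, and the round-robin placement are all made up for illustration, and a real partitioned cache would typically place keys by hashing rather than round-robin.

```python
# Toy model only: not Hadoop code. Node and file names are hypothetical.

def partitioned_placement(keys, nodes):
    """Conventional distributed cache: each key lives on exactly one node."""
    placement = {node: set() for node in nodes}
    # Round-robin over the sorted keys so the result is deterministic.
    for i, key in enumerate(sorted(keys)):
        placement[nodes[i % len(nodes)]].add(key)
    return placement

def replicated_placement(keys, nodes):
    """What I understand Hadoop to do: every node gets a full copy."""
    return {node: set(keys) for node in nodes}

def common_to_all(placement):
    """Intersection of the cache contents across all nodes."""
    return set.intersection(*placement.values())

nodes = ["node1", "node2", "node3"]
files = {"lookup.txt", "stopwords.txt", "geo.dat"}

partitioned = partitioned_placement(files, nodes)
replicated = replicated_placement(files, nodes)

print(common_to_all(partitioned))  # empty: no file is on every node
print(common_to_all(replicated))   # all files: each one is on every node
```

In the partitioned case the total capacity grows as nodes are added, while in the replicated case every node holds the same data, which is the behaviour I am asking about.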
Is my understanding correct, or am I missing something? Please guide.
Thanks