I want to make sure my understanding of the Hadoop distributed cache is correct. I know that when we add files to the distributed cache, the files get copied to the local disk of every node in the cluster.
So how does the data in these files get transmitted to all the nodes in the cluster? Is it sent over the network? If so, won't that put a strain on the network?
I have the following thoughts. Are they correct?
If the files are large, won't there be network congestion?
If the number of nodes is large, even if the files are only small or medium-sized, won't replicating them and transmitting them to every node cause network congestion and memory constraints?
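To put rough numbers on this worry, here is a back-of-the-envelope sketch. It assumes that every node downloads one full copy of each cached file once per job; the file sizes and the node count are made-up examples, not measurements, and the function name is hypothetical.

```python
# Hypothetical estimate: assumes each node fetches one full copy of every
# cached file once per job, so total traffic grows linearly with node count.
def cache_traffic_bytes(file_sizes_bytes, num_nodes):
    """Total bytes sent over the network if every node pulls every file once."""
    return sum(file_sizes_bytes) * num_nodes

MB = 1024 ** 2
GB = 1024 ** 3

# Made-up scenario: one 500 MB file plus two 10 MB files on a 1000-node cluster.
files = [500 * MB, 10 * MB, 10 * MB]
total = cache_traffic_bytes(files, 1000)
print(f"{total / GB:.1f} GiB")  # prints "507.8 GiB"
```

Even modest files add up when the per-node copy is multiplied by a large cluster, which is the core of the concern above.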
Please help me understand these concepts.
Thanks!!!