I have a lot of small files (~1 MB each) that I need to distribute via the DistributedCache. It's well known that Hadoop and HDFS prefer large files, but I don't know whether that also applies to the DistributedCache, since the cached files are stored on the local disks of the task nodes rather than in HDFS.
If they do need to be merged, what is the best way to merge files programmatically on HDFS?
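For reference, this is roughly the kind of merge I have in mind, using `FileUtil.copyMerge` (which I believe is available in Hadoop 1.x/2.x). The paths are made up and I haven't tested this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths: a directory of ~1 MB files merged into one file
        Path srcDir = new Path("/user/me/small-files");
        Path dstFile = new Path("/user/me/merged/all-files.bin");

        // copyMerge concatenates every file under srcDir into dstFile.
        // deleteSource=false keeps the originals; addString=null inserts no separator.
        FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, null);
    }
}
```

My concern is that plain concatenation loses the original file boundaries, so maybe packing them into a SequenceFile (filename as key, file contents as value) would be more appropriate?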
One more question: what are the benefits of using symlinks with the DistributedCache? Thanks.