I have 1000+ files in HDFS, named 1_fileName.txt to N_fileName.txt. Each file is 1024 MB.
I need to merge these files into a single file (in HDFS) while preserving their order. For example, 5_fileName.txt should be appended only after 4_fileName.txt.
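To make the ordering requirement concrete, here is a minimal Python sketch of the order I mean. It only sorts file names by their numeric prefix and does not touch HDFS. The helper names `numeric_prefix` and `merge_order` and the sample list are hypothetical, made up for illustration. The point is that a plain string sort is not enough, because it would put 10_fileName.txt before 2_fileName.txt.

```python
import re


def numeric_prefix(name: str) -> int:
    """Return the integer before the first underscore, e.g. 12 for '12_fileName.txt'."""
    match = re.match(r"(\d+)_", name)
    if match is None:
        raise ValueError(f"no numeric prefix in {name!r}")
    return int(match.group(1))


def merge_order(names):
    """Return the names sorted by numeric prefix (hypothetical helper)."""
    return sorted(names, key=numeric_prefix)


if __name__ == "__main__":
    # Made-up sample; the real listing would come from HDFS.
    sample = ["10_fileName.txt", "2_fileName.txt", "1_fileName.txt"]
    for name in merge_order(sample):
        print(name)
```

Whatever merge method is used would need to consume the files in this order.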
What is the best and fastest way to perform this operation?
Is there any way to perform this merge without copying the actual data between DataNodes? For example, could I get the block locations of these files and create a new entry (file name) in the NameNode that points to those block locations?