
I was trying to merge 80 GB of files in a cluster using hadoop fs -getmerge.

However, since -getmerge copies the files from HDFS to the local file system, I have to copy them to local disk first and then use -copyFromLocal to put the result back into HDFS:

hadoop fs -getmerge hdfs:///path_in_hdfs/* ./local_path

hadoop fs -copyFromLocal ./local_path hdfs://Destination_hdfs_Path/

My problem is that the local disk on the datanode is smaller than 80 GB.

Is there an alternative to -getmerge where the merge happens directly from HDFS to HDFS?

I also tried hadoop fs -cat, but it did not work.


3 Answers

2 votes

The HDFS -cat command should work. Pipe the output of -cat into -put; the trailing dash tells -put to read from stdin, so the data streams through the client without being written to a local file:

hadoop fs -cat hdfs://input_hdfs_path/* | hadoop fs -put - hdfs://output_hdfs_path/output_file.txt

0 votes

Actually, there is no real alternative. You can achieve the same result with a MapReduce or Spark job (setting the output parallelism to 1), but there is no solution using pure HDFS shell commands.
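As a sketch of the Spark approach: the snippet below (paths are placeholders, and it assumes spark-shell is available on the cluster) reads all input files and coalesces them into a single partition, so the output directory contains one merged file.

```shell
# Hypothetical paths; assumes Spark is installed and can reach HDFS.
# coalesce(1) forces a single output partition, i.e. one merged part file.
echo 'spark.read.text("hdfs:///path_in_hdfs/*")
        .coalesce(1)
        .write.text("hdfs:///merged_output")' | spark-shell
```

Note that coalescing to one partition funnels all 80 GB through a single task, so the merge is not parallel; it does, however, stay entirely inside HDFS.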

0 votes

Hadoop Streaming may help. However, the merged file will be in sorted order (the text before the first tab is treated as the key). If sorting is not desirable, then streaming is not an option.

File 1

Tom     25
Pete    30
Kevin   26

File 2

Neil    28
Chris   31
Joe     27

Merged File

Chris   31
Joe     27
Kevin   26
Neil    28
Pete    30
Tom     25
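A streaming job producing the merged (sorted) output above might look like the following sketch; the paths are placeholders and the streaming jar location varies by Hadoop distribution.

```shell
# Hypothetical paths; adjust the streaming jar path for your distribution.
# /bin/cat as mapper and reducer passes records through unchanged; the
# shuffle sorts them by key (text before the first tab), and a single
# reducer (-D mapreduce.job.reduces=1) yields one merged output file.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.job.reduces=1 \
  -input hdfs:///path_in_hdfs \
  -output hdfs:///merged_output \
  -mapper /bin/cat \
  -reducer /bin/cat
```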