I have ten servers, each of which generates about 3 GB of log files every day. I have completed the Hadoop tutorial and installed Hadoop (HDFS) on each machine. What I want is to use MapReduce to analyze these logs.
My question is: how do I get the daily logs into Hadoop for MapReduce? Currently, for a server A, I manually copy a log file to an HDFS directory:
>hadoop fs -put local_log_path /A/log_20170219.1
and then
>hadoop jar MR_path MP_driver_class /A/log_20170219.1 output_path
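For reference, the manual steps above scripted per server might look something like this (the local log path, HDFS layout, and the dry-run `echo` default are my assumptions, not my actual setup); it could run nightly from cron on each machine:

```shell
#!/bin/sh
# Sketch of automating the nightly upload; paths are hypothetical.
# HADOOP_CMD defaults to a dry run that just prints the commands;
# set HADOOP_CMD=hadoop on a real cluster to execute them.
HADOOP_CMD="${HADOOP_CMD:-echo hadoop}"
LOG_DIR="${LOG_DIR:-/var/log/myapp}"   # hypothetical local log directory
HDFS_BASE="${HDFS_BASE:-/logs}"        # hypothetical HDFS target root

HOST=$(hostname)
DAY=$(date +%Y%m%d)                    # e.g. 20170219

# One directory per server per day keeps MapReduce input paths simple.
$HADOOP_CMD fs -mkdir -p "$HDFS_BASE/$HOST/$DAY"
$HADOOP_CMD fs -put "$LOG_DIR/app.log" "$HDFS_BASE/$HOST/$DAY/log_$DAY"
```

The MapReduce job could then take `$HDFS_BASE/*/$DAY` as its input path, covering all ten servers in one run.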
Is there a more efficient way, so that I do not have to go to each server and copy the newly generated logs into HDFS manually? Does the `fs -put` command actually move the large data file here?