0 votes

I have ten servers, and each of them generates about 3 GB of log files every day. I have completed the Hadoop tutorial and installed Hadoop (HDFS) on each machine. What I want is to use MapReduce to analyze these logs.

My question is: how do I feed the daily logs to MapReduce jobs in Hadoop? Currently, for server A, I manually copy a log file into an HDFS directory:

  >hadoop fs -put local_log_path /A/log_20170219.1

and then

  >hadoop jar MR_path MR_driver_class /A/log_20170219.1 output_path

Is there a more efficient way, so that I do not have to go to each server and copy the newly generated logs into HDFS manually? Also, does the fs -put command really involve moving the whole large data file here?


2 Answers

1 vote

You can have a look at Apache Flume, which serves exactly this use case: collecting server logs and storing them in HDFS, driven by configuration.
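For reference, here is a minimal sketch of what a Flume agent on each log server could look like. The agent name, local directory, and NameNode address below are placeholders, not values from the question: a spooling-directory source picks up finished log files and an HDFS sink writes them into date-partitioned HDFS directories.

  # flume-logs.conf -- hypothetical example configuration
  a1.sources  = logsrc
  a1.channels = memch
  a1.sinks    = hdfssink

  # Watch a local directory; Flume ingests files dropped here and marks them as done
  a1.sources.logsrc.type     = spooldir
  a1.sources.logsrc.spoolDir = /var/log/myapp/ready
  a1.sources.logsrc.channels = memch

  a1.channels.memch.type     = memory
  a1.channels.memch.capacity = 10000

  # Write events into date-partitioned HDFS directories
  a1.sinks.hdfssink.type          = hdfs
  a1.sinks.hdfssink.channel       = memch
  a1.sinks.hdfssink.hdfs.path     = hdfs://namenode:8020/logs/A/%Y%m%d
  a1.sinks.hdfssink.hdfs.fileType = DataStream
  a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true

The agent would then be started on each server with something like:

  >flume-ng agent --name a1 --conf ./conf --conf-file flume-logs.conf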

0 votes

There are many ways of achieving this.

1) If you want to keep the manual approach, you can check out distcp, which has an added advantage over the normal put or copyFromLocal commands: distcp is a distributed copy that runs as a MapReduce job. You can then schedule a cron job to perform the copy and execute your jar on successful completion; a minimal sketch of that flow follows the link below.

For more info: https://hadoop.apache.org/docs/r1.2.1/distcp2.html
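A minimal sketch of the scheduled flow is below. The script name, paths, NameNode addresses, jar, and driver class are placeholders, not values from the question. Note that distcp reads its source from map tasks running in the cluster, so the source URI has to be reachable from the cluster nodes (for example another HDFS or a shared mount); if the log only exists on one server's local disk, the fs -put from the question remains the simpler copy step.

  #!/bin/bash
  # nightly_log_job.sh -- hypothetical nightly job (all paths and names are placeholders).
  # Copies yesterday's log into the analysis cluster, then runs the MR job only
  # if the copy succeeded (set -e aborts the script on any failed command).
  set -euo pipefail

  DAY=$(date -d yesterday +%Y%m%d)
  SRC=hdfs://ingest-nn:8020/incoming/A/log_${DAY}.1    # must be readable by the cluster
  DEST=hdfs://analysis-nn:8020/A/log_${DAY}.1
  OUT=hdfs://analysis-nn:8020/output/${DAY}

  # distcp performs the copy as a distributed MapReduce job
  hadoop distcp "$SRC" "$DEST"

  # run the analysis jar on successful completion of the copy
  hadoop jar /opt/jobs/log-analyzer.jar com.example.LogDriver "$DEST" "$OUT"

This could be scheduled with a crontab entry such as:

  0 2 * * * /opt/jobs/nightly_log_job.sh >> /var/log/nightly_log_job.log 2>&1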

2) If you want to reduce this effort and use a dedicated tool, you can look at ingestion tools such as Flume or Splunk.