0 votes

I have ten servers, and each of them generates about 3 GB of log files every day. I have completed the Hadoop tutorial and installed Hadoop (HDFS) on each machine. What I want is to use MapReduce to analyze these logs.

My question is: how do I feed the daily logs to MapReduce jobs in Hadoop? Currently, for server A, I manually copy a log file into an HDFS directory:

  >hadoop fs -put local_log_path /A/log_20170219.1

and then

  >hadoop jar MR_path MR_driver_class /A/log_20170219.1 output_path

Is there a more efficient way, so that I do not have to go to each server and copy the newly generated logs into HDFS manually? Also, does the fs -put command really involve moving the whole large data file here?


2 Answers

1 vote

You can have a look at Apache Flume, which serves exactly this use case: collecting server logs and storing them in HDFS, driven by configuration.
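For reference, here is a minimal sketch of what a Flume agent on each log server could look like. The agent name, local directory, and NameNode address below are placeholders, not values from the question: a spooling-directory source picks up finished log files and an HDFS sink writes them into date-partitioned HDFS directories.

  # flume-logs.conf -- hypothetical example configuration
  a1.sources  = logsrc
  a1.channels = memch
  a1.sinks    = hdfssink

  # Watch a local directory; Flume ingests files dropped here and marks them as done
  a1.sources.logsrc.type     = spooldir
  a1.sources.logsrc.spoolDir = /var/log/myapp/ready
  a1.sources.logsrc.channels = memch

  a1.channels.memch.type     = memory
  a1.channels.memch.capacity = 10000

  # Write events into date-partitioned HDFS directories
  a1.sinks.hdfssink.type          = hdfs
  a1.sinks.hdfssink.channel       = memch
  a1.sinks.hdfssink.hdfs.path     = hdfs://namenode:8020/logs/A/%Y%m%d
  a1.sinks.hdfssink.hdfs.fileType = DataStream
  a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true

The agent would then be started on each server with something like:

  >flume-ng agent --name a1 --conf ./conf --conf-file flume-logs.conf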

0 votes

There are many ways of achieving this.

1) If you want to keep the manual approach, you can check out distcp, which has an added advantage over the normal put or copyFromLocal commands: distcp is a distributed copy that runs as a MapReduce job. You can then schedule a cron job to perform the copy and execute your jar on successful completion; a minimal sketch of that flow follows the link below.

For more info: https://hadoop.apache.org/docs/r1.2.1/distcp2.html
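A minimal sketch of the scheduled flow is below. The script name, paths, NameNode addresses, jar, and driver class are placeholders, not values from the question. Note that distcp reads its source from map tasks running in the cluster, so the source URI has to be reachable from the cluster nodes (for example another HDFS or a shared mount); if the log only exists on one server's local disk, the fs -put from the question remains the simpler copy step.

  #!/bin/bash
  # nightly_log_job.sh -- hypothetical nightly job (all paths and names are placeholders).
  # Copies yesterday's log into the analysis cluster, then runs the MR job only
  # if the copy succeeded (set -e aborts the script on any failed command).
  set -euo pipefail

  DAY=$(date -d yesterday +%Y%m%d)
  SRC=hdfs://ingest-nn:8020/incoming/A/log_${DAY}.1    # must be readable by the cluster
  DEST=hdfs://analysis-nn:8020/A/log_${DAY}.1
  OUT=hdfs://analysis-nn:8020/output/${DAY}

  # distcp performs the copy as a distributed MapReduce job
  hadoop distcp "$SRC" "$DEST"

  # run the analysis jar on successful completion of the copy
  hadoop jar /opt/jobs/log-analyzer.jar com.example.LogDriver "$DEST" "$OUT"

This could be scheduled with a crontab entry such as:

  0 2 * * * /opt/jobs/nightly_log_job.sh >> /var/log/nightly_log_job.log 2>&1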

2) If you want to reduce this effort and use a dedicated tool, you can look at ingestion tools such as Flume or Splunk.