
Hi, I have a few doubts regarding Flume configuration for log analysis from multiple log-producing servers.

I have 2 Apache servers running on Linux, and one node running HDFS with all daemons on that same node.

  1. On which node(s) does Flume need to be installed to capture the streaming logs from both servers and load them into HDFS?
  2. Please provide Flume configuration file(s) for this scenario, where we want to capture the stream using the command: tail -f /home/tomcat/webapps/logs/catalina.out
Sounds like you want us to do everything for you. Perhaps read the docs, have a go at getting it set up, and then ask specific questions about what stumps you? - Sarge

1 Answer


As to your first question:

1. On which node(s) does Flume need to be installed to capture the streaming logs from both servers and load them into HDFS?

Flume will need to be installed on each Apache server to read the Apache logs and then send the records to HDFS.

NOTE: When installing Apache Flume, don't forget to include the HDFS sink jar/plugin so that on startup it can actually send records to HDFS instead of giving you errors. Also make sure the Flume agent on each Apache Tomcat node can reach the host and port that the HDFS NameNode is running on.
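
For a quick connectivity check from each Tomcat node, something like the following works; the hostname and port below are placeholders for your actual NameNode:

# Hypothetical NameNode host/port -- replace with your own values
nc -zv namenode.example.com 8020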

As to your second question:

2. Please provide Flume configuration file(s) for this scenario, where we want to capture the stream using the command: tail -f /home/tomcat/webapps/logs/catalina.out

With regard to the Flume configuration, specifically the "source", here is a sample configuration:

# Describe/configure the source for tailing file
agent.sources.SrcLog.type = exec
agent.sources.SrcLog.command = tail -F /home/tomcat/webapps/logs/catalina.out
agent.sources.SrcLog.restart = true
agent.sources.SrcLog.restartThrottle = 1000
agent.sources.SrcLog.logStdErr = true
agent.sources.SrcLog.batchSize = 50
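
Note that a source alone is not a complete agent; Flume also needs a channel and a sink wired to it. Below is a minimal sketch of the remaining pieces, assuming a memory channel and an HDFS sink; the NameNode host/port and the target path are placeholders you would replace with your own:

# Name the components on this agent (the SrcLog source is defined above)
agent.sources = SrcLog
agent.channels = MemChannel
agent.sinks = HdfsSink

# Memory channel buffering events between the source and the sink
agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 10000
agent.channels.MemChannel.transactionCapacity = 100

# HDFS sink writing events as plain text, rolling a new file every 5 minutes
agent.sinks.HdfsSink.type = hdfs
agent.sinks.HdfsSink.hdfs.path = hdfs://namenode.example.com:8020/flume/tomcat/%Y-%m-%d
agent.sinks.HdfsSink.hdfs.fileType = DataStream
agent.sinks.HdfsSink.hdfs.writeFormat = Text
agent.sinks.HdfsSink.hdfs.rollInterval = 300
agent.sinks.HdfsSink.hdfs.rollSize = 0
agent.sinks.HdfsSink.hdfs.rollCount = 0
# Needed because the path uses time escapes and the exec source adds no timestamp header
agent.sinks.HdfsSink.hdfs.useLocalTimeStamp = true

# Wire the source and sink to the channel
agent.sources.SrcLog.channels = MemChannel
agent.sinks.HdfsSink.channel = MemChannel

You would then start the agent on each Tomcat node with something like this (the config file name and agent name here are assumptions):

flume-ng agent --conf ./conf --conf-file tomcat-to-hdfs.conf --name agent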

For more details, see the Apache Flume User Guide.