I am trying to understand how to tail a file with flume-ng so that I can push the data into HDFS. As a first step I have set up a simple conf file:
tail1.sources = source1
tail1.sinks = sink1
tail1.channels = channel1
tail1.sources.source1.type = exec
tail1.sources.source1.command = tail -F /var/log/apache2/access.log
tail1.sources.source1.channels = channel1
tail1.sinks.sink1.type = logger
tail1.channels.channel1.type = memory
tail1.channels.channel1.capacity = 1000
tail1.channels.channel1.transactionCapacity = 100
tail1.sources.source1.channels = channel1
tail1.sinks.sink1.channel = channel1
This is only a test: I expect to see each new line of the log file printed on the console. I run it with the following command:
flume-ng agent --conf-file tail1.conf -n tail1 -Dflume.root.logger=DEBUG,INFO,console
I get the following output:
12/12/05 11:01:07 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
12/12/05 11:01:07 INFO node.FlumeNode: Flume node starting - tail1
12/12/05 11:01:07 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
12/12/05 11:01:07 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 8
12/12/05 11:01:07 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
12/12/05 11:01:07 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:tail1.conf
12/12/05 11:01:07 INFO conf.FlumeConfiguration: Processing:sink1
12/12/05 11:01:07 INFO conf.FlumeConfiguration: Processing:sink1
12/12/05 11:01:07 INFO conf.FlumeConfiguration: Added sinks: sink1 Agent: tail1
12/12/05 11:01:07 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [tail1]
12/12/05 11:01:07 INFO properties.PropertiesFileConfigurationProvider: Creating channels
12/12/05 11:01:08 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: channel1, registered successfully.
12/12/05 11:01:08 INFO properties.PropertiesFileConfigurationProvider: created channel channel1
12/12/05 11:01:08 INFO sink.DefaultSinkFactory: Creating instance of sink: sink1, type: logger
12/12/05 11:01:08 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{source1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource@1839aa9 }} sinkRunners:{sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@11f0c98 counterGroup:{ name:null counters:{} } }} channels:{channel1=org.apache.flume.channel.MemoryChannel@1740f55} }
12/12/05 11:01:08 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel channel1
12/12/05 11:01:08 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 started
12/12/05 11:01:08 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink sink1
12/12/05 11:01:08 INFO nodemanager.DefaultLogicalNodeManager: Starting Source source1
12/12/05 11:01:08 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log
However, nothing further happens: no events from the log file appear on the console.
In another session I am running the same tail command directly:
tail -F /var/log/apache2/access.log
There I can see that the file is being written to:
192.168.1.81 - - [05/Dec/2012:10:58:07 +0000] "GET / HTTP/1.1" 200 483 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11"
192.168.1.81 - - [05/Dec/2012:10:58:07 +0000] "GET /favicon.ico HTTP/1.1" 404 502 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11"
192.168.1.81 - - [05/Dec/2012:10:58:21 +0000] "GET / HTTP/1.1" 304 209 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11"
192.168.1.81 - - [05/Dec/2012:10:58:22 +0000] "GET /favicon.ico HTTP/1.1" 404 502 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11"
Why does the agent print nothing after the exec source starts? Can you help? I am thoroughly confused.