3
votes

So i configured flume to write my apache2 access logs to hdfs ...and as i figured out the by the logs of flume is that all the configuration are correct but i dont know the reason why is it still not writing to hdfs. So here is my flume config file

#agent and component of agent
search.sources = so
search.sinks = si
search.channels = sc

# Configure a channel that buffers events in memory:
search.channels.sc.type = memory
search.channels.sc.capacity = 20000
search.channels.sc.transactionCapacity = 100


# Configure the source:
search.sources.so.channels = sc
search.sources.so.type = exec
search.sources.so.command = tail -F /var/log/apache2/access.log

# Describe the sink:
search.sinks.si.channel = sc
search.sinks.si.type = hdfs
search.sinks.si.hdfs.path = hdfs://localhost:9000/flumelogs/
search.sinks.si.hdfs.writeFormat = Text
search.sinks.si.hdfs.fileType = DataStream
search.sinks.si.hdfs.rollSize=0
search.sinks.si.hdfs.rollCount = 10000
search.sinks.si.hdfs.batchSize=1000
search.sinks.si.rollInterval=1

and here are my flume logs

14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Creating channels
14/12/18 17:47:56 INFO channel.DefaultChannelFactory: Creating instance of channel sc   type memory
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Created channel sc
14/12/18 17:47:56 INFO source.DefaultSourceFactory: Creating instance of source so, type exec
14/12/18 17:47:56 INFO sink.DefaultSinkFactory: Creating instance of sink: si, type: hdfs
14/12/18 17:47:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/18 17:47:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Channel sc connected to [so, si]
14/12/18 17:47:56 INFO node.Application: Starting new configuration:{ sourceRunners:{so=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:so,state:IDLE} }} sinkRunners:{si=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3de76481 counterGroup:{ name:null counters:{} } }} channels:{sc=org.apache.flume.channel.MemoryChannel{name: sc}} }
14/12/18 17:47:56 INFO node.Application: Starting Channel sc
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: sc: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: sc started
14/12/18 17:47:56 INFO node.Application: Starting Sink si
14/12/18 17:47:56 INFO node.Application: Starting Source so
14/12/18 17:47:56 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: si: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: si started
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: so: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: so started

and this is the command, i have used to start flume

flume-ng agent -n search -c conf -f ../conf/flume-conf-search 

and i have a path in hdfs

       hadoop fs -mkdir hdfs://localhost:9000/flumelogs

but i dont know why it is not writing to hdfs..i can see the access logs of apache2 ..but flume is not sending them to hdfs /flumelogs dir....please help ! !

2

2 Answers

1
votes

I don't think it's a permission issue, you would see exceptions when flume is flushing to HDFS. There are two possible reasons for this problem:

1) there's is not enough data in the buffer, flume doesn't think it has to flush yet. Your sink batch size is 1000, your channel's capacity is 20000. To verify this, CTRL -C your flume process, that will force the process to flush to HDFS.

2) the more probable reason is that your exec source is not running properly. This can be due to a path problem with the tail command. Add the full path to tail in your command, like /bin/tail -F /var/log/apache2/access.log or /usr/bin/tail -F /var/log/apache2/access.log (depending on your system) check

which tail 

for the correct path.

0
votes

Could you please check the permissions on this folder : hdfs://localhost:9000/flumelogs/

My guess is that flume doesn't have the permission to write to that folder