
I have a Flume 1.5 agent running on an Ubuntu workstation that collects logs from various devices and re-formats them into a comma-delimited file with very long rows. After the logs are collected and re-formatted, they are placed into a spool directory, from which the Flume agent sends the log file to a Flume agent running on a Hadoop server, which places it in an HDFS directory.

Everything works fine except that when Flume delivers the file to the HDFS directory, a line feed is inserted after every 2048 characters in each row.

Below are my Flume config files. Is there a setting to tell Flume not to insert line feeds?

#On Ubuntu Workstation
#list sources, sinks and channels in the agent
agent.sources = axon_source
agent.channels = memorychannel
agent.sinks = AvroOut

#define flow
agent.sources.axon_source.channels = memorychannel
agent.sinks.AvroOut.channel = memorychannel
agent.channels.memorychannel.type = memory
agent.channels.memorychannel.capacity = 100000

#source
agent.sources.axon_source.type = spooldir
agent.sources.axon_source.spoolDir = /home/ubuntu/workspace/logdump
agent.sources.axon_source.decodeErrorPolicy = ignore

#avro out
agent.sinks.AvroOut.type = avro
agent.sinks.AvroOut.hostname = 172.31.12.221
agent.sinks.AvroOut.port = 41415
agent.sinks.AvroOut.maxIoWorkers = 2


------------------------------------------------------------


#On Hadoop Server
agent.sources = AvroIn
agent.sources.AvroIn.type = avro
agent.sources.AvroIn.bind = 172.31.131.1
agent.sources.AvroIn.port = 41415
agent.sources.AvroIn.channels = MemChan1

agent.channels = MemChan1
agent.channels.MemChan1.type = memory
agent.channels.MemChan1.capacity = 100000

agent.sinks = HDFSSink
agent.sinks.HDFSSink.type = hdfs
agent.sinks.HDFSSink.channel = MemChan1
agent.sinks.HDFSSink.hdfs.path = /Logs/%Y%m/
agent.sinks.HDFSSink.hdfs.filePrefix = axoncapture
agent.sinks.HDFSSink.hdfs.fileSuffix = .log
agent.sinks.HDFSSink.hdfs.minBlockReplicas = 1
agent.sinks.HDFSSink.hdfs.rollCount = 0
agent.sinks.HDFSSink.hdfs.rollSize = 314572800
agent.sinks.HDFSSink.hdfs.writeFormat = Text
agent.sinks.HDFSSink.hdfs.fileType = DataStream
agent.sinks.HDFSSink.hdfs.useLocalTimeStamp = True

1 Answer


Found the answer to my question:

The default maxLineLength for the LINE deserializer is 2048 characters: http://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#line

I added the following line to my flume.conf file, which fixed the problem:

agent.sources.axon_source.deserializer.maxLineLength = 60000
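
For reference, a minimal sketch of how the spooldir source section looks with that property added (source name, spool path, and the 60000 limit are just the values from my own setup; the deserializer line is optional since LINE is the default, and lines longer than maxLineLength would still be split, so pick a value comfortably larger than your longest expected row):

#source on the Ubuntu workstation
agent.sources.axon_source.type = spooldir
agent.sources.axon_source.spoolDir = /home/ubuntu/workspace/logdump
agent.sources.axon_source.decodeErrorPolicy = ignore
#LINE is the default deserializer; raise its 2048-character limit
agent.sources.axon_source.deserializer = LINE
agent.sources.axon_source.deserializer.maxLineLength = 60000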