
I am sending events from Flume to a Kafka topic through a Flume sink, using a file channel, but throughput is very low: the sink writes to Kafka at about 190 messages per second, whereas the source puts events into the channel at about 3000 messages per second. How can I increase the sink's performance? I have tried various configurations; this is my current configuration file:

agent1.sources = AspectJ
agent1.channels = fileTailChannel
agent1.sinks = APMNullSink

#AspectJ source
agent1.sources.AspectJ.type=com.flume.test.DumbSource
agent1.sources.AspectJ.path=/media/sf_New_Log/calltracedata.txt
agent1.sources.AspectJ.eventtype=CallTrace
agent1.sources.AspectJ.channels=fileTailChannel
agent1.sources.AspectJ.batchSize=1000
agent1.sources.AspectJ.batchDurationMillis=1000
agent1.sources.AspectJ.application = AspectJ
agent1.sources.AspectJ.multi.line.mode=true
agent1.sources.AspectJ.new.event.marker={
agent1.sources.AspectJ.event.terminator=}
agent1.sources.AspectJ.tailer.start.reading.from.end = false


#File Tail Channel 
agent1.channels.fileTailChannel.type = file
agent1.channels.fileTailChannel.checkpointDir = /tmp/flume/filechannel/checkpoint
agent1.channels.fileTailChannel.dataDirs = /tmp/flume/filechannel/data
agent1.channels.fileTailChannel.transactionCapacity=100000

#APM Null Sink
agent1.sinks.APMNullSink.type = com.flume.test.TestJSON2KafkaSink
agent1.sinks.APMNullSink.channel = fileTailChannel
agent1.sinks.APMNullSink.testSize = 1000
agent1.sinks.APMNullSink.zk.connect=sandbox.hortonworks.com:2181
agent1.sinks.APMNullSink.metadata.broker.list=sandbox.hortonworks.com:6667
agent1.sinks.APMNullSink.topic=Test
agent1.sinks.APMNullSink.producer.type=async
agent1.sinks.APMNullSink.serializer.class=kafka.serializer.StringEncoder
agent1.sinks.APMNullSink.batch.num.messages=1000
agent1.sinks.APMNullSink.batchSize=50000
agent1.sinks.APMNullSink.batchDurationMillis=1000
agent1.sinks.APMNullSink.queue.buffering.max.ms=5000
agent1.sinks.APMNullSink.queue.buffering.max.messages=100000
agent1.sinks.APMNullSink.send.buffer.bytes=2097152
agent1.sinks.APMNullSink.compression.codec=snappy

1 Answer


I think your problem is that everything is running on one box, so the fast ingestion into the file channel is impacting Kafka, which has to do more work for each write.

I suggest two options:

  1. Set the capacity of the file channel to limit the queue size, so the source can't run ahead of what Kafka can ingest.
  2. Or use a Kafka channel: https://flume.apache.org/FlumeUserGuide.html#kafka-channel. With this option, however, the messages in the topic are wrapped in an AvroFlumeEvent, so subscribers need to deserialize them using that class.
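
As a rough sketch of both options, assuming the agent and channel names from the question: the numeric values are illustrative only, and `kafkaChannel` and the topic `flume-channel-test` are hypothetical names. The Kafka channel keys shown are the ones used by Flume 1.7 and later; Flume 1.6 uses `brokerList`, `zookeeperConnect` and `topic` instead, so check the user guide for your version.

```properties
# Option 1: bound the file channel so the source blocks once it is full.
# capacity must be at least transactionCapacity; both values are illustrative.
agent1.channels.fileTailChannel.type = file
agent1.channels.fileTailChannel.checkpointDir = /tmp/flume/filechannel/checkpoint
agent1.channels.fileTailChannel.dataDirs = /tmp/flume/filechannel/data
agent1.channels.fileTailChannel.capacity = 100000
agent1.channels.fileTailChannel.transactionCapacity = 10000

# Option 2: use a Kafka channel instead (Flume 1.7+ property names).
# "kafkaChannel" and the topic name are hypothetical.
agent1.channels = kafkaChannel
agent1.sources.AspectJ.channels = kafkaChannel
agent1.channels.kafkaChannel.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.kafkaChannel.kafka.bootstrap.servers = sandbox.hortonworks.com:6667
agent1.channels.kafkaChannel.kafka.topic = flume-channel-test
# true (the default) wraps each message as an AvroFlumeEvent in the topic
agent1.channels.kafkaChannel.parseAsFlumeEvent = true
```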

Also, I don't understand why you need your own com.flume.test.TestJSON2KafkaSink instead of the Kafka sink that comes with Flume. Maybe there is a performance issue in your code.
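
For comparison, a minimal sketch of the sink section using the Kafka sink bundled with Flume, reusing the sink name, broker address and topic from the question. The batch size and acks values are illustrative, not tuned recommendations. The keys shown are those of Flume 1.7 and later; Flume 1.6 uses `brokerList`, `topic`, `batchSize` and `requiredAcks` instead.

```properties
# Bundled Kafka sink (Flume 1.7+ property names); values are illustrative.
agent1.sinks.APMNullSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.APMNullSink.channel = fileTailChannel
agent1.sinks.APMNullSink.kafka.bootstrap.servers = sandbox.hortonworks.com:6667
agent1.sinks.APMNullSink.kafka.topic = Test
# Number of events taken from the channel per transaction
agent1.sinks.APMNullSink.flumeBatchSize = 1000
# Settings prefixed with kafka.producer. are passed to the Kafka producer
agent1.sinks.APMNullSink.kafka.producer.acks = 1
agent1.sinks.APMNullSink.kafka.producer.compression.type = snappy
```

If throughput improves with the bundled sink, that would point to the custom sink as the bottleneck, for example if it takes only one event from the channel per transaction.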