
I am setting the properties of a Flume Agent and I am not sure what value I should use for batchSize (the number of events to batch together per send).

In my particular case I will use the console as a sink. As I understand it, the logger sink is the type used in this case. But the Flume documentation doesn't mention the batchSize parameter for this kind of sink. Isn't it necessary to define a batchSize for logger sinks?


1 Answer


Well, I found an answer to the question: Isn't it necessary to define a batchSize for logger sinks?

https://flume.apache.org/FlumeUserGuide.html#logger-sink There is no batchSize parameter; instead there is a parameter called maxBytesToLog, which defines the maximum number of bytes of the Event body to log (its default value is 16). Here is a simple example I found of a Flume agent that uses the console as a sink:

node.sources = my-source
node.channels = my-channel
node.sinks = my-sink

# Since node 1's sink is avro-type, here we indicate avro as the source type
node.sources.my-source.type = avro
node.sources.my-source.bind = 0.0.0.0
node.sources.my-source.port = 11112
node.sources.my-source.channels = my-channel

# In-memory channel: up to 10000 buffered events, 100 events per transaction
node.channels.my-channel.type = memory
node.channels.my-channel.capacity = 10000
node.channels.my-channel.transactionCapacity = 100

# Logger sink writes events to the console; log at most 256 bytes of each event body
node.sinks.my-sink.type = logger
node.sinks.my-sink.channel = my-channel
node.sinks.my-sink.maxBytesToLog = 256

Source from: https://medium.com/@DCA/something-about-flume-3cb720ba00e8#.37zs23dnt
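To try it out, the agent can be started with the flume-ng script. The configuration file name (node.conf) below is just an assumption; the --name argument must match the property prefix used in the file (node), and the root logger is pointed at the console so the logger sink's output is visible:

flume-ng agent --conf conf --conf-file node.conf --name node -Dflume.root.logger=INFO,console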

And about the main question, how to determine the batchSize of the sink:

With regards to the hdfs batch size, the larger your batch size the better performance will be. However, keep in mind that if a transaction fails the entire transaction will be replayed which could have the implication of duplicate events downstream.

From: https://cwiki.apache.org/confluence/display/FLUME/BatchSize,+ChannelCapacity+and+ChannelTransactionCapacity+Properties
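As a sketch of where batchSize would actually be set, here is a hypothetical HDFS sink added to the agent above (the path and values are assumptions, not taken from the sources above). hdfs.batchSize controls how many events are written to the file before flushing to HDFS, and it should not exceed the channel's transactionCapacity:

node.sinks.hdfs-sink.type = hdfs
node.sinks.hdfs-sink.channel = my-channel
node.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/events
# Number of events written to the file before it is flushed to HDFS (assumed value)
node.sinks.hdfs-sink.hdfs.batchSize = 100
# Keep the channel transaction at least as large as the sink batch
node.channels.my-channel.transactionCapacity = 100

A larger batch improves throughput, but, as the quote above notes, a failed transaction is replayed in full, so bigger batches mean more potential duplicate events downstream.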