
In my Ambari cluster (version 2.6) we have master machines and worker machines, and Kafka is installed on the master machines.

The /data partition is only 15G, and the Kafka log folder is /data/var/kafka/kafka-logs.

Most of the folders under /data/var/kafka/kafka-logs are only 4K-40K in size, but two folders are huge (5G-7G), which fills /data to 100%.

For example, under /data/var/kafka/kafka-logs/mmno.aso.prpl.proces-90:

12K     00000000000000000000.index
1.0G    00000000000000000000.log
16K     00000000000000000000.timeindex
12K     00000000000001419960.index
1.0G    00000000000001419960.log
16K     00000000000001419960.timeindex
12K     00000000000002840641.index
1.0G    00000000000002840641.log
16K     00000000000002840641.timeindex
12K     00000000000004260866.index
1.0G    00000000000004260866.log
16K     00000000000004260866.timeindex
12K     00000000000005681785.index
1.0G    00000000000005681785.log

Is it possible to limit the size of the logs, or is there another solution? We have a small /data partition and the log files should not be 1G each. How can we solve this?


1 Answer


Kafka has a number of broker/topic configurations for limiting the size of logs. In particular:

  • log.retention.bytes: the maximum size a partition's log may reach before old segments are deleted
  • log.retention.hours: the number of hours to keep a log segment before deleting it

Note that these are not hard bounds, since deletion happens one segment at a time, as described in http://kafka.apache.org/documentation/#impl_deletes. Also, the limits apply per partition (and can be overridden per topic), not to the data directory as a whole. Still, by setting them you should be able to control the size of your data directory.
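For example, since each segment file in your listing is 1.0G (the default log.segment.bytes), one option is to lower both the retention limits and the segment size in the broker configuration (with Ambari this is normally done through the Kafka service's config page rather than by editing server.properties directly). The values below are only an illustrative sketch and should be sized to fit your 15G /data partition:

    # keep at most ~1 GB of log data per partition; older segments get deleted
    log.retention.bytes=1073741824
    # delete segments older than 24 hours
    log.retention.hours=24
    # roll new segments at ~256 MB instead of the 1 GB default, so space is freed sooner
    log.segment.bytes=268435456

Broker-level settings like these require a restart of the Kafka brokers to take effect.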

See http://kafka.apache.org/documentation/#brokerconfigs for the full list of log.retention.*/log.roll.*/log.segment.* configs
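If only the oversized topic from the question (mmno.aso.prpl.proces) needs tighter limits, a per-topic override is another option. As a sketch, assuming the standard HDP script location and with <zk-host> replaced by your ZooKeeper quorum:

    # override retention and segment size for a single topic
    /usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper <zk-host>:2181 \
      --entity-type topics --entity-name mmno.aso.prpl.proces \
      --alter --add-config 'retention.bytes=1073741824,segment.bytes=268435456'

Topic-level overrides take effect without a broker restart, and retention.bytes is enforced per partition of the topic.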