
I am trying to set up a syslog Flume agent which should eventually put the data into HDFS.
My scenario is as follows:

The syslog Flume agent is running on physical server A; the configuration details follow:

===

syslog_agent.sources = syslog_source
syslog_agent.channels = MemChannel
syslog_agent.sinks = HDFS

# Describing/Configuring the source
syslog_agent.sources.syslog_source.type = syslogudp
#syslog_agent.sources.syslog_source.bind = 0.0.0.0
syslog_agent.sources.syslog_source.bind = localhost
syslog_agent.sources.syslog_source.port = 514

# Describing/Configuring the sink
syslog_agent.sinks.HDFS.type=hdfs
syslog_agent.sinks.HDFS.hdfs.path=hdfs://<IP_ADD_OF_NN>:8020/user/ec2-user/syslog
syslog_agent.sinks.HDFS.hdfs.fileType=DataStream
syslog_agent.sinks.HDFS.hdfs.writeformat=Text
syslog_agent.sinks.HDFS.hdfs.batchSize=1000
syslog_agent.sinks.HDFS.hdfs.rollSize=0
syslog_agent.sinks.HDFS.hdfs.rollCount=10000
syslog_agent.sinks.HDFS.hdfs.rollInterval=600

# Describing/Configuring the channel
syslog_agent.channels.MemChannel.type=memory
syslog_agent.channels.MemChannel.capacity=10000
syslog_agent.channels.MemChannel.transactionCapacity=1000

#Bind sources and sinks to the channel
syslog_agent.sources.syslog_source.channels = MemChannel
syslog_agent.sinks.HDFS.channel = MemChannel
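(For reference, an agent using this configuration would typically be started with a command along these lines; the conf directory and the file name syslog_agent.conf are placeholders:)

flume-ng agent --conf conf --conf-file conf/syslog_agent.conf --name syslog_agent -Dflume.root.logger=INFO,console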

I am sending syslog messages from a different physical server B using the built-in "logger" utility, like this:

sudo logger --server <IP_Address_physical_server_A> --port 514 --udp
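(Note that if logger is given no message it reads from standard input; a one-shot test with an explicit message could look like the following, where the message text is just a placeholder:)

sudo logger --server <IP_Address_physical_server_A> --port 514 --udp "test message from server B"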

I do see the log messages arriving on physical server A, in /var/log/messages.

But I don't see any messages going into HDFS; it seems the Flume agent isn't receiving any data, even though the messages are making it from server B to server A.

Am I doing something wrong here? Can anyone help me resolve this?

EDIT

The following is the output of the netstat command on server A, where the syslog daemon is running:

tcp        0      0 0.0.0.0:514             0.0.0.0:*               LISTEN      573/rsyslogd
tcp6       0      0 :::514                  :::*                    LISTEN      573/rsyslogd
udp        0      0 0.0.0.0:514             0.0.0.0:*                           573/rsyslogd
udp6       0      0 :::514                  :::*                                573/rsyslogd
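(A listing like the one above can be produced with something along these lines; the exact flags may vary by distribution:)

sudo netstat -tulnp | grep 514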

1 Answer


I'm not sure what logger --server gives you, but most examples I have seen use netcat.
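For example, a single UDP test message can be pushed at the source with netcat along these lines; the <14> prefix is just a sample syslog priority header and the address is a placeholder:

echo "<14>test message from netcat" | nc -u -w1 <IP_Address_physical_server_A> 514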

In any case, you've set batchSize=1000, so until you send 1000 messages, Flume will not write to HDFS.
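For a quick test you could drop the batch size (and tighten the roll settings) so writes become visible sooner; a sketch using your property names, with illustrative values:

syslog_agent.sinks.HDFS.hdfs.batchSize = 1
syslog_agent.sinks.HDFS.hdfs.rollCount = 10
syslog_agent.sinks.HDFS.hdfs.rollInterval = 30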

Keep in mind that HDFS is not a streaming platform and prefers not to have small files.

If you're looking for log collection, look into Elasticsearch or Solr fronted by a Kafka topic.