
I would like to run a Flume agent outside of a Hadoop cluster. Is it possible to use Flume to send messages into the cluster using WebHDFS?

If not, are there alternatives to WebHDFS? A multi-tiered Flume setup would still require me to run Flume agents inside the Hadoop cluster.

I am looking for an answer to the same question, soaptree, but haven't figured it out yet. Will share when I do. If you have worked it out by now, please answer your own question. - nitinr708
Many thanks for this, soaptree. I tried to set one up myself, but this example is a godsend. - nitinr708

1 Answer


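Flume agents can run on their own machines without being part of a Hadoop cluster, as long as you set the sink type to "hdfs". The agent's host needs the Hadoop client libraries on Flume's classpath and network access to the cluster's NameNode and DataNodes.

I have a Flume agent writing Avro events to an HDFS sink, without being on a Hadoop cluster and without using WebHDFS.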

Here are its settings:

agent.sinks.sink1.channel = channel1
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://hadoopd1.x.y.z/day/id/
agent.sinks.sink1.hdfs.rollInterval = 300
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = Text
agent.sinks.sink1.hdfs.fileSuffix = .avro
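# The serializer key was set twice. In a properties file the last value wins,
# so the avro_event alias is commented out here to make that explicit.
# agent.sinks.sink1.serializer = avro_event
agent.sinks.sink1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder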
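
The settings above cover only the sink. To run, the agent also needs a source and a channel declared in the same file. Here is a minimal sketch of what the rest could look like, assuming an Avro source and a memory channel. The source name (source1), the bind address, the port, and the channel capacities are placeholders for illustration, not values from my setup:

```properties
# Declare the components of the agent named "agent".
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1

# Hypothetical Avro source that receives events from upstream senders.
# The bind address and port are assumptions; choose values that fit your network.
agent.sources.source1.type = avro
agent.sources.source1.bind = 0.0.0.0
agent.sources.source1.port = 4141
agent.sources.source1.channels = channel1

# In-memory channel between the source and the HDFS sink.
# Capacities are illustrative; a file channel may be preferable if events
# must survive an agent restart.
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 10000
agent.channels.channel1.transactionCapacity = 1000
```

With the sink settings appended to the same file, the agent would typically be started with something like `flume-ng agent --conf conf --conf-file agent.conf --name agent`, where agent.conf is a placeholder file name and the `--name` value must match the prefix used in the properties.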