Flume agent: How does a Flume agent get data from a web server located on a different physical server?

Question

I am trying to understand Flume and am working through the official documentation at flume.apache.org.

In particular, I am a bit confused by this section.

Do we need to run the Flume agent on the web server itself, or can we run Flume agents on a different physical server and acquire data from the web server?

If the latter is possible, how does the Flume agent get the data from the web server logs? How can the web server make its data available to the Flume agent?

Can anyone help me understand this?

OneCricketeer · Accepted Answer · 2018-04-08T21:29:39

A Flume agent takes in data through a source, which puts events onto a channel; a sink then drains the channel and writes the events to their destination.

You can install the Flume agent in either a local or a remote configuration. Keep in mind that a remote agent adds some network latency to your event processing, if that is a concern. You can also chain Flume agents in tiers (what the user guide calls consolidation): one local agent on each web server, all forwarding to a single remote aggregation agent.
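
As a rough sketch of the aggregation tier described above, a collector agent could expose an Avro source that the per-server agents forward to. The agent name `collector`, the component names, and port 4545 are assumptions for illustration, and the logger sink is only a stand-in for a real destination.

```properties
# Hypothetical aggregation agent running on its own server.
collector.sources = avroIn
collector.channels = mem
collector.sinks = out

# Avro source: listens for events forwarded by the agents on each web server.
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = mem

# In-memory channel buffering events between the source and the sink.
collector.channels.mem.type = memory
collector.channels.mem.capacity = 10000

# Logger sink as a placeholder; a real deployment would write to HDFS or similar.
collector.sinks.out.type = logger
collector.sinks.out.channel = mem
```

Each web server's local agent would then point an Avro sink at this host and port.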

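If a Flume agent is installed locally on the web server, it can read the logs directly. With an exec source it runs a command locally, such as tail -F on the log file, and turns each output line into an event. With a Spooling Directory (spooldir) source it ingests completed files that are dropped into a watched directory. Either way, this is how it gets data from the logs.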
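
A minimal sketch of such a local agent is shown here, assuming an Apache-style access log at /var/log/httpd/access_log and a second Flume agent reachable at collector.example.com:4545. The agent name `web`, the component names, the paths, and the host are all hypothetical.

```properties
# Hypothetical agent running on the web server itself.
web.sources = accessLog
web.channels = mem
web.sinks = forward

# Exec source: runs the command and turns each output line into an event.
# The log path is an assumption; adjust it to the web server's actual log.
web.sources.accessLog.type = exec
web.sources.accessLog.command = tail -F /var/log/httpd/access_log
web.sources.accessLog.channels = mem

# Alternative: Spooling Directory source for completed (rotated) files.
# web.sources.accessLog.type = spooldir
# web.sources.accessLog.spoolDir = /var/log/httpd/spool

web.channels.mem.type = memory

# Avro sink: forwards events to another Flume agent (hypothetical host and port).
web.sinks.forward.type = avro
web.sinks.forward.hostname = collector.example.com
web.sinks.forward.port = 4545
web.sinks.forward.channel = mem
```

The exec source with tail -F is simple but offers no delivery guarantee if the agent restarts; the spooldir alternative is more reliable but only picks up files that are no longer being written to.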

If the Flume agent is set up with a network source such as a Syslog or other TCP source (see the network sources in the data ingestion section), then it can run on a remote machine. In that case the data is pushed rather than read from disk: your logging application must open a network socket and publish its messages to the agent's server. This is a similar pattern to Apache Kafka.
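
For the remote case, the agent side might look roughly like the following, using Flume's Syslog TCP source. The agent name `remote`, the component names, and port 5140 are assumptions, and the logger sink is a placeholder for the real destination.

```properties
# Hypothetical agent on a server separate from the web server.
remote.sources = syslogIn
remote.channels = mem
remote.sinks = out

# Syslog TCP source: accepts syslog messages pushed over the network.
remote.sources.syslogIn.type = syslogtcp
remote.sources.syslogIn.host = 0.0.0.0
remote.sources.syslogIn.port = 5140
remote.sources.syslogIn.channels = mem

remote.channels.mem.type = memory

# Logger sink as a placeholder for the real destination.
remote.sinks.out.type = logger
remote.sinks.out.channel = mem
```

The web server, or the syslog daemon running next to it, would then be configured to send its log messages over TCP to this host and port.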
