
I have a Flume agent that reads from a spool directory source and, after some transformation, writes to HDFS. Since Flume marks processed files by renaming them with a '.COMPLETED' suffix, I'm getting a permission-denied exception: the agent has no write access to the spool directory.

I'm not sure it would be safe to grant write permission on a directory containing sensitive data.

Is there a workaround that lets Flume identify the processed files in the spool directory without renaming them?


1 Answer


The spool directory source works by renaming processed files; that is how it keeps track of which files are done, so write access to the directory is required.
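For reference, these are the spooldir source options involved: fileSuffix controls the marker suffix, and deletePolicy switches the source from renaming to deleting processed files. The agent and source names below are placeholders; the spoolDir path is an example.

```properties
# Hypothetical agent named "agent" with one spooldir source named "spool".
agent.sources = spool
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/flume/spool

# Default behavior: rename processed files with this suffix.
agent.sources.spool.fileSuffix = .COMPLETED

# Alternative: delete processed files instead of renaming them.
# agent.sources.spool.deletePolicy = immediate
```

Either way the source modifies the directory, which is why it needs write permission there.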

As a workaround, it's safer to keep a "read-only" master copy of the files and create some mechanism (e.g. a cron job) that copies new files into a spooling directory that Flume has write access to. (And possibly set the deletePolicy configuration option to immediate, to avoid filling the disk.)
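A minimal sketch of the copy step such a cron job could run. All paths are hypothetical; the script builds a throwaway demo layout so it is self-contained, and it keeps a state file of copied names because with deletePolicy = immediate the processed files vanish from the spool directory and can't be used to detect duplicates.

```shell
#!/bin/sh
# Demo layout (hypothetical paths; a real cron job would use fixed ones).
BASE=$(mktemp -d)
SRC="$BASE/readonly-input"     # read-only originals (Flume never writes here)
SPOOL="$BASE/spool"            # directory the Flume agent owns
STATE="$BASE/copied.list"      # names already handed over to Flume

mkdir -p "$SRC" "$SPOOL"
echo "event data" > "$SRC/app.log"   # stand-in for a real input file
touch "$STATE"

# Copy every file not yet handed over. The state file, not the spool
# directory contents, is what prevents re-ingestion once Flume has
# deleted (deletePolicy = immediate) or renamed a processed file.
for f in "$SRC"/*; do
  [ -f "$f" ] || continue
  name=$(basename "$f")
  if ! grep -qxF "$name" "$STATE"; then
    cp "$f" "$SPOOL/$name" && echo "$name" >> "$STATE"
  fi
done
```

Running it twice is safe: the second pass finds every name already recorded in the state file and copies nothing.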

If you would like to request such a feature, I recommend creating a new ticket for the Flume project at https://issues.apache.org