5
votes

I have to implement file watcher functionality in Erlang: there should be a process that lists the files in a specific directory and does something when new files appear.

I took a look at OTP, and at the moment I have the following ideas:

1. Create a supervisor that will control the gen_servers (one server per folder).
2. Create WatchServer: a gen_server for each folder that I want to monitor.
3. Create ProcessFileServer: a gen_server that should do something with the files (assume it copies them to a different folder).

So the first problem: WatchServer should not wait for a request; it should generate one at predefined intervals.

At the moment I create a timer in the init/1 function and handle the on_timer event in the handle_info function.

Now the questions:

1. Are there better ideas?
2. How should I inform ProcessFileServer that a file was found? It seems to me that it would be much more convenient to create WatchServers and ProcessServers independently, but in that case I do not know whom to send the message to.

Maybe there are some similar projects/libs available?

3
As you look new to SO: avoid "Hello" and "Thanks", this is regarded as clutter, see meta.stackexchange.com/questions/2950/… I have fixed your question already. - Peer Stritzinger

3 Answers

4
votes

If you are using Linux, you can use inotify. It is a kernel service that lets you subscribe to file system events. Don't poll the filesystem, let the filesystem call you.

You can try https://github.com/massemanet/inotify for observing your directory.

Ulf

2
votes

In Erlang it is very cheap to create processes (orders of magnitude cheaper than in other systems).

Therefore I recommend creating a new ProcessFileServer each time a new file to process appears. When it is done with the file, it just terminates with exit reason normal.
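A per-file worker along these lines might look like the following minimal sketch. It uses proc_lib rather than a full gen_server, which is only one possible choice. The module name proc_file, the DEST_DIR macro and its path are hypothetical, picked for illustration, and "processing" is assumed to mean copying the file as in the question.

```erlang
-module(proc_file).
-export([start_link/1, init/2]).

%% Hypothetical destination directory, assumed to exist already.
-define(DEST_DIR, "/tmp/processed").

%% Start one short-lived worker for a single file.
%% proc_lib:start_link/3 blocks until the worker calls init_ack/2.
start_link(Path) ->
    proc_lib:start_link(?MODULE, init, [self(), Path]).

init(Parent, Path) ->
    %% Tell the starter that we are up, so it gets {ok, Pid} back.
    proc_lib:init_ack(Parent, {ok, self()}),
    Target = filename:join(?DEST_DIR, filename:basename(Path)),
    %% A failed copy crashes the process with badmatch (abnormal exit).
    {ok, _BytesCopied} = file:copy(Path, Target),
    %% Returning from the initial function ends the process with
    %% exit reason normal.
    ok.
```

Because the process only lives for the duration of one file, there is no state to manage and no message loop to write.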

I would suggest the following structure:

                              top_supervisor
                                      |
              +-----------------------+-------------------------+
              |                                                 |
       directory_supervisor                             processing_supervisor
               |                                         simple_one_for_one
    +----------+-----...-----+                                   |
    |          |             |                       starts children transient
    |          |             |                                   |
dir_watcher_1 dir_watcher_2 dir_watcher_n   +-------------+------+---...----+
                                            |             |                 |
                                        proc_file_1   proc_file_2       proc_file_n

When a dir_watcher notices that a new file has appeared, it calls supervisor:start_child/2 on the processing_supervisor, passing an extra parameter, e.g. the file path.

The processing_supervisor should start its children with the transient restart policy.

So if one of the proc_file servers crashes it will be restarted, but when it terminates with exit reason normal it is not restarted. So you just exit normally when done and crash when anything else happens.
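The processing_supervisor described above could be sketched roughly as follows. The module name processing_sup, the helper process_file/1 and the restart intensity values are assumptions for illustration; proc_file stands for a hypothetical worker module that exports start_link/1 and takes the file path as its argument. The map-based specs assume OTP 18 or later.

```erlang
-module(processing_sup).
-behaviour(supervisor).

-export([start_link/0, process_file/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Meant to be called by a dir_watcher when it sees a new file.
%% For simple_one_for_one the second argument is a list of extra
%% arguments appended to the argument list in the child spec.
process_file(Path) ->
    supervisor:start_child(?MODULE, [Path]).

init([]) ->
    %% Intensity and period are arbitrary example values.
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 5,
                 period => 10},
    %% proc_file is a hypothetical worker module; with the empty
    %% argument list below, children start as proc_file:start_link(Path).
    ChildSpec = #{id => proc_file,
                  start => {proc_file, start_link, []},
                  restart => transient,   % restart only on abnormal exit
                  shutdown => 5000,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.
```

With restart set to transient, a worker that finishes and exits with reason normal is simply removed, while one that crashes is started again with the same path.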

If you don't overdo it, cyclic polling for files is OK. If the system becomes loaded because of this polling, you can look into kernel notification systems (e.g. FreeBSD kqueue or the higher-level services built on it on Mac OS X) that send you a message when a file appears in a directory. These services add complexity, however, because they have to give up when too many events happen (otherwise they would not be a performance improvement but the opposite). So you will need a robust polling solution as a fallback anyway.

So don't optimize prematurely: start with polling, and add improvements (which would be isolated in the dir_watcher servers) when they become necessary.


Regarding the comment about which behaviour to use for the dir_watcher process, since it doesn't use much of gen_server's functionality:

  • There is no problem with using only part of gen_server's possibilities; in fact it is very common not to use all of them. In your case you only set up a timer in init and use handle_info to do your work. The rest of the gen_server is just the unchanged template.

  • If you later want to change parameters such as the poll frequency, it is easy to add that.

  • gen_fsm is much less used since it only fits a quite limited model and is not very flexible. I use it only when it really fits 100% to the requirement (which it does almost never).

  • In a case where you just want a simple plain Erlang server you can use the spawn functions in proc_lib to get just the minimal functionality to run under a supervisor.

  • An interesting way to write more natural Erlang code and still have the OTP advantages is plain_fsm. It gives you selective receive and flexible message handling, which is needed especially when handling protocols, paired with the nice features of OTP.

Having said all this: if I were to write a dir_watcher, I'd just use a gen_server and use only what I need. The unused functionality doesn't really cost you anything and everybody understands what it does.
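A polling dir_watcher written this way might look like the sketch below. It sets up a timer in init/1 and does its work in handle_info/2; the other callbacks are plain template code. The module name, the state record and the OnNewFile callback are assumptions for illustration: OnNewFile is a fun of one argument that receives the full path of each newly seen file, and could for example call supervisor:start_child/2 on the processing supervisor.

```erlang
-module(dir_watcher).
-behaviour(gen_server).

-export([start_link/3]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {dir, interval, on_new_file, seen = []}).

%% Dir: directory to watch, IntervalMs: poll interval in milliseconds,
%% OnNewFile: fun((Path) -> any()) called once per newly seen file.
start_link(Dir, IntervalMs, OnNewFile) ->
    gen_server:start_link(?MODULE, {Dir, IntervalMs, OnNewFile}, []).

init({Dir, IntervalMs, OnNewFile}) ->
    %% Schedule the first poll instead of waiting for a request.
    erlang:send_after(IntervalMs, self(), poll),
    {ok, #state{dir = Dir, interval = IntervalMs, on_new_file = OnNewFile}}.

%% Unused parts of the gen_server template.
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(poll, #state{dir = Dir, interval = IntervalMs,
                         on_new_file = OnNewFile, seen = Seen} = State) ->
    NewSeen = case file:list_dir(Dir) of
                  {ok, Files} ->
                      %% Report only names that were absent on the last poll.
                      lists:foreach(
                        fun(F) -> OnNewFile(filename:join(Dir, F)) end,
                        Files -- Seen),
                      Files;
                  {error, _Reason} ->
                      %% Keep the old listing so that a transient error
                      %% does not make every file look new afterwards.
                      Seen
              end,
    %% Re-arm the timer for the next poll.
    erlang:send_after(IntervalMs, self(), poll),
    {noreply, State#state{seen = NewSeen}};
handle_info(_Other, State) ->
    {noreply, State}.
```

This sketch treats a file as new as soon as its name shows up, so a file that is still being written would be reported early; a real watcher would likely need some rule for that, such as waiting until the size stops changing.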

2
votes

I have written such a library, based on polling. (It would be nice to extend it to use inotify on platforms where this is supported.) It was originally meant to be used in EUnit, but I turned it into a separate project instead. You can find it here:

https://github.com/richcarl/file_monitor