1
votes

haven't done any Clojure for couple years, so decided to go back and not ignore core.async this time around ) pretty cool stuff, that - but it surprised me almost immediately. Now, I understand that there's inherent indeterminism when multiple threads are involved, but there's something bigger than that at play here.

The source code for my oh-so-simple example, where I am trying to copy lines from STDIN to a file:

(defn append-to-file
  "Write a string to the end of a file"
  ([filename s]
   (spit filename (str s "\n")
         :append true))
  ([s]
   (append-to-file "/tmp/journal.txt" s)))

(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (println "Initializing..")
  (let [out-chan (a/chan)]
    (loop [line (read-line)]
      (if (empty? line) :ok
          (do
            (go (>! out-chan line))
            (go (append-to-file (<! out-chan)))
            (recur (read-line)))))))

except, of course, this turned out to be not so simple. I think I've narrowed it down to something that's not properly cleaned up. Basically, running the main function produces inconsistent results. Sometimes I run it 4 times, and see 12 lines in the output. But sometimes, 4 run will produce just 10 lines. Or like below, 3 times, 6 lines:

akamac.home ➜  coras git:(master) ✗ make clean
cat /dev/null > /tmp/journal.txt
lein clean
akamac.home ➜  coras git:(master) ✗ make compile
lein uberjar
Compiling coras.core
Created /Users/akarpov/repos/coras/target/uberjar/coras-0.1.0-SNAPSHOT.jar
Created /Users/akarpov/repos/coras/target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar
akamac.home ➜  coras git:(master) ✗ make run    
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜  coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜  coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜  coras git:(master) ✗ make check  
cat /tmp/journal.txt
line a
line z
line b
line a
line b
line z

(Basically, sometimes a run produced 3 lines, sometimes 0, sometimes 1 or 2). The fact that lines appear in random order doesn't bother me - go blocks do things in a concurrent/threaded manner, and all bets are off. But why they don't do all of work all the time? (Because I am misusing them somehow, but where?) Thanks!

2
the one thing I am suspicious of is this: after I enqueue a message in the channel, I don't know how long it'll take for a thread from a go block to pick it up and do it's work. And since I don't block on anything, the main thread might terminate before a go-block's threa gets a chance to do the work. So there's a race condition... - alexakarpov
@CharlesDuffy you've a point. I forgot I'm limited in "accepting" my own answers by 2 days - but not in providing it! - alexakarpov

2 Answers

4
votes

There's many problems with this code, let me walk through them real quick:

1) Every time you call (go ...) you're spinning of a new "thread" that will be executed in a thread pool. It is undefined when this thread will run.

2) You aren't waiting for the completion of these threads, so it's possible (and very likely) that you will end up reading several lines form the file, writing several lines to the channel, before a read even occurs.

3) You are firing off multiple calls to append-to-file at the same time (see #2) these functions are not synchronized, so it's possible that multiple threads will append at once. Since access to files in most OSes is uncoordinated it's possible for two threads to write to your file at the same time, overwriting eachother's results.

4) Since you are creating a new go block for every line read, it's possible they will execute in a different order than you expect, this means the lines in the output file may be out of order.

I think all this can be fixed a bit by avoiding a rather common anti-pattern with core.async: don't create go blocks (or threads) inside unbounded or large loops. Often this is doing something you don't expect. Instead create one core.async/thread with a loop that reads from the file (since it's doing IO, never do IO inside a go block) and writes to the channel, and one that reads from the channel and writes to the output file.

View this as an assembly line build out of workers (go blocks) and conveyor belts (channels). If you built a factory you wouldn't have a pile of people and pair them up saying "you take one item, when you're done hand it to him". Instead you'd organize all the people once, with conveyors between them and "flow" the work (or data) between the workers. Your workers should be static, and your data should be moving.

0
votes

.. and of course, this was a misuse of core.async on my part:

If I care about seeing all the data in the output, I must use a blocking 'take' on the channel, when I want to pass the value to my I/O code -- and, as it was pointed out, that blocking call should not be inside a go block. A single line change was all I needed:

from:

(go (append-to-file (<! out-chan)))

to:

(append-to-file (<!! out-chan))