2 votes

I'm relatively new to the Julia language, and I've recently been trying to process some files in parallel. My code looks something like:

for ln in eachline(somefile)
    ...
    # process this line

    for ln2 in eachline(someotherfile)
        ...
        # process ln and ln2
        ...
    end
end

I've been trying to speed things up a bit with the @everywhere and @parallel macros, but they don't seem to work with the eachline function.

Am I missing something?

Thanks for the help.


1 Answer

4 votes

We already know that the @parallel macro has the form:

@parallel [reducer] for var = range
   body
end

The specified range is partitioned and locally executed across all workers.

To do the above job in the minimum time, @parallel first gets length(range) and then partitions the range among the nworkers().
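For example, here is a minimal sketch of a parallel loop with a (+) reducer (the addprocs(4) call and the loop body are just for illustration):

addprocs(4)                       # start 4 worker processes first
s = @parallel (+) for i in 1:100
        i^2                       # each worker sums the squares of its chunk of 1:100
    end
# the partial sums from the workers are combined with (+) and returned in s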

For more details you can:
. see the macro output -> macroexpand(:(@parallel for i in 1:5 i end))
or:
. check the macro source -> multi.jl

EachLine is one of Julia's iterables: it implements all the mandatory methods of the iteration interface, but length() is not one of them (check this discussion). So EachLine is not a range, and @parallel fails to do its task because the length() method is missing.
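You can check the missing method yourself (the file name here is just a placeholder):

length(1:5)                       # a range has a length(), so @parallel can split it
open("somefile.txt") do io
    length(eachline(io))          # throws a MethodError: EachLine has no length() method
end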

But there are at least two ways to parallelize the processing part:

  1. use lis=readlines() to collect the lines into an array, then @parallel for li in lis (see the sketch after this list)
  2. use pmap()
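A minimal sketch of the first option (the file name is hypothetical):

lis = open(readlines, "somefile.txt")   # collect every line into an Array
@parallel for li in lis
    # process li here; @parallel can partition lis because an Array has a length()
end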

Julia's pmap() is designed for the case where each function call does a large amount of work. In contrast, @parallel for can handle situations where each iteration is tiny, perhaps merely summing two numbers.

Sample code:

len = function(s::AbstractString)
    # return the line length together with the id of the worker that processed it
    string(length(s)) * " " * string(myid())
end

function test()
    open("eula.1028.txt") do io
        pmap(len, eachline(io))   # map len over the lines in parallel
    end
end
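To try it, start some workers first, then define len and test as above and call test() (the returned values below are only illustrative):

addprocs(3)          # add 3 worker processes
test()               # returns strings such as "57 2": the line length and the worker id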