Julia - the way of kings (generator performance)

Question

I had some Python code that I tried to port to Julia in order to learn this lovely language. The Python code uses generators. After porting, it seems to me (at this moment) that Julia is really slow in this area!

I simplified part of my code into this exercise:

Think of a 4x4 chess board. Find every path N squares long that a chess king could take. In this exercise, the king is not allowed to visit the same square twice in one path. Don't waste memory: make a generator of every path.

The algorithm is pretty simple.

If we label every square with a number:

0  1  2  3
4  5  6  7
8  9  10 11
12 13 14 15

Square 0 has 3 neighbors (1, 4, 5). We can build a table listing the neighbors of every square:

NEIG = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]]
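The table above can also be derived from the board geometry rather than typed by hand. The following is a minimal sketch, assuming row-major numbering as in the diagram; the function name king_neighbors and its size parameter are hypothetical and do not appear in the original code.

```python
def king_neighbors(size=4):
    """For each square 0..size*size-1 (row-major), list the squares
    a king can reach in one move, in ascending order."""
    table = []
    for pos in range(size * size):
        row, col = divmod(pos, size)
        adjacent = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue  # the king must actually move
                r, c = row + dr, col + dc
                if 0 <= r < size and 0 <= c < size:
                    adjacent.append(r * size + c)
        table.append(adjacent)
    return table


NEIG_GENERATED = king_neighbors(4)
```

Comparing NEIG_GENERATED against the hand-written NEIG is a cheap way to catch a typo in the table.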

PYTHON

A recursive generator function that extends a path. Its argument is either a single path (a list of points) or a generator of (generators of ...) paths:

def enlarge(path):
    if isinstance(path, list):
        for i in NEIG[path[-1]]:
            if i not in path:
                yield path[:] + [i]
    else:
        for i in path:
            yield from enlarge(i)

A generator function that yields every path of a given length:

def paths(length):
    steps = ([i] for i in range(16))  # first steps on every point on board
    for _ in range(length-1):
        nsteps = enlarge(steps)
        steps = nsteps
    yield from steps

We can see that there are 905776 paths of length 10:

sum(1 for i in paths(10))
Out[89]: 905776
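For comparison, the same paths can be produced lazily by a single generator with an explicit stack instead of a chain of nested generators. This is only a sketch under stated assumptions: paths_dfs and SIZE are hypothetical names, the neighbor table is rebuilt here so the block is self-contained, and the paths come out in a different order than from paths() above.

```python
SIZE = 4

# 0-based neighbor table, rebuilt from the board geometry (row-major numbering).
NEIG = [
    [r * SIZE + c
     for r in range(row - 1, row + 2)
     for c in range(col - 1, col + 2)
     if 0 <= r < SIZE and 0 <= c < SIZE and (r, c) != (row, col)]
    for row in range(SIZE)
    for col in range(SIZE)
]


def paths_dfs(length):
    """Lazily yield every king path of `length` squares with no revisits."""
    stack = [[start] for start in range(SIZE * SIZE)]
    while stack:
        path = stack.pop()
        if len(path) == length:
            yield path
            continue
        for nxt in NEIG[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])  # new list, the popped one is untouched
```

Only the partial paths waiting on the stack are held in memory at any time, so the generator still avoids materializing the full list of results.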

JULIA (this code was written by @gggg during our discussion)

const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py]
function enlarge(path::Vector{Int})
    (push!(copy(path),loc) for loc in NEIG[path[end]] if !(loc in path))
end
collect(enlarge([1]))
function enlargepaths(paths)
    Iterators.Flatten(enlarge(path) for path in paths)
end
collect(enlargepaths([[1],[2]]))
function paths(targetlen)
    paths = ([i] for i=1:16)
    for newlen in 2:targetlen
        paths = enlargepaths(paths)
    end
    paths
end
p = sum(1 for path in paths(10))

BENCHMARK

In IPython we can time it:

Python 3.6.3:

%timeit sum(1 for i in paths(10))
1.25 s ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Julia 0.6.0:

julia> @time sum(1 for path in paths(10))
  2.690630 seconds (41.91 M allocations: 1.635 GiB, 11.39% gc time)
905776

Julia 0.7.0-DEV.0:

julia> @time sum(1 for path in paths(10))
  4.951745 seconds (35.69 M allocations: 1.504 GiB, 4.31% gc time)
905776

Question(s):

We Julians say this: "It is important to note that the benchmark codes are not written for absolute maximal performance (the fastest code to compute recursion_fibonacci(20) is the constant literal 6765). Instead, the benchmarks are written to test the performance of identical algorithms and code patterns implemented in each language."

In this benchmark we use the same idea: just simple for loops over arrays, wrapped in generators. (Nothing from numpy, numba, pandas, or other compiled Python packages written in C.)

Is the assumption that Julia's generators are terribly slow correct?

What could we do to make it really fast?

paths(10) gives Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Flatten{Base.Generator{Base.Generator{UnitRange{Int64},##15#16},#enlarge}},#enlarge}},#enlarge}},#enlarge}},#enlarge}},#enlarge}},#enlarge}},#enlarge}},#enlarge}}(Base.Generator{Base.Iterators.F... I don't think this looks like it's going to be fast. — daycaster
Sorry if this is obvious, but you are running the Julia one twice and timing the second run, not the first, right? Otherwise you are just timing the compile time... — Colin T Bowers
I know it's cheating but the most effective way to get what you want is likely enumerate_paths (.pathcounts) in the LightGraphs package. — Michael K. Borregaard
Looking at the results of the proposed Julia code, I'm not sure the numbers are correct. For example, eyeballing [i for i in paths(3) if i[1] == 1], I can't see any paths where the king returns to position 1. So instead of the 15 paths from position 1 there should be 18, right? I might be misunderstanding the problem statement here. — niczky12
Yes, right now generators aren't the best way to program, but they work. Whenever you need to optimize, you can use a loop, so there is an easy workaround for the time being. A few type-inference and inlining issues that come up with generators cause this for now; hopefully they get fixed in a 1.x release. — Chris Rackauckas

tholy · Accepted Answer · 2017-11-03T20:29:18

const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py];

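# Depth-first count: `n` is the running total, `path` is a tuple of visited squares.
function expandto(n, path, targetlen)
    # A path that has reached the target length counts as one result.
    length(path) >= targetlen && return n+1
    for loc in NEIG[path[end]]
        loc in path && continue  # skip squares already visited
        # Extend the path by building a new, longer tuple.
        n = expandto(n, (path..., loc), targetlen)
    end
    n
end

# Count all paths of `targetlen` squares, starting from each of the 16 squares.
function npaths(targetlen)
    n = 0
    for i = 1:16
        path = (i,)
        n = expandto(n, path, targetlen)
    end
    n
end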

Benchmark (after executing once for JIT-compilation):

julia> @time npaths(10)
  0.069531 seconds (5 allocations: 176 bytes)
905776

which is considerably faster.
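For readers coming from the Python side of the question, the same counting scheme can be sketched in Python. This is an illustrative port, not part of the answer: it keeps the names expandto and npaths, uses 0-based squares, rebuilds the neighbor table so the block is self-contained, and is not expected to reproduce the Julia timing shown above.

```python
SIZE = 4

# 0-based neighbor table, rebuilt from the board geometry (row-major numbering).
NEIG = [
    [r * SIZE + c
     for r in range(row - 1, row + 2)
     for c in range(col - 1, col + 2)
     if 0 <= r < SIZE and 0 <= c < SIZE and (r, c) != (row, col)]
    for row in range(SIZE)
    for col in range(SIZE)
]


def expandto(n, path, targetlen):
    """Add the number of valid completions of `path` to the running total `n`."""
    if len(path) >= targetlen:
        return n + 1
    for loc in NEIG[path[-1]]:
        if loc in path:
            continue  # the king may not revisit a square
        n = expandto(n, path + (loc,), targetlen)
    return n


def npaths(targetlen):
    """Count all king paths of `targetlen` squares on the board."""
    n = 0
    for i in range(SIZE * SIZE):
        n = expandto(n, (i,), targetlen)
    return n
```

Here npaths(2) returns 84, the sum of all neighbor counts, and npaths(10) is expected to reproduce the 905776 reported in the question. The difference from the generator versions is that paths are only counted, never yielded or collected.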

