29 votes

Essentially the situation I have is this. I have a module (which also imports a number of other modules).

I have a script like:

import MyModule

tic()
MyModule.main()

tic()
MyModule.main()

In MyModule:

__precompile__()

module MyModule
    export main

    function main()
        toc()
        ...
    end
end

The first toc() call outputs around 20 seconds. The second outputs 2.3e-5. Can anyone offer a guess as to where the time is going? Does Julia do some kind of initialization on the first call into a module, and how can I figure out what that is?


2 Answers

62 votes

Precompilation can be confusing. I'll try to explain how it works.

Julia loads modules by first parsing them, then running so-called "top-level" statements, one at a time. Each top-level statement is lowered, then interpreted (if possible) or compiled and executed if the interpreter does not support that particular top-level statement.

What __precompile__ does is actually fairly simple (modulo details): it performs all of the steps listed above at precompile time. Note that the steps above include execution, which can be surprising if you are more familiar with statically-compiled languages. It is not possible, in general, to precompile dynamic code without executing it, because the execution of code can result in changes such as the creation of new functions, methods, and types.

The difference between a precompile run and a regular run is that serializable information from a precompile run is saved to a cache. Things that are serializable include ASTs from parsing and lowering and results of type inference.

This means that Julia precompilation goes much further than the compilation of most static languages. For instance, consider the following Julia package that computes the number 5000000050000000 in quite an inefficient way:

module TestPackage

export n

n = 0
for i in 1:10^8
    n += i
end

end

On my machine:

julia> @time using TestPackage
  2.151297 seconds (200.00 M allocations: 2.980 GB, 8.12% gc time)

julia> workspace()

julia> @time using TestPackage
  2.018412 seconds (200.00 M allocations: 2.980 GB, 2.90% gc time)

Now let's give the __precompile__() directive, changing the package to

__precompile__()

module TestPackage

export n

n = 0
for i in 1:10^8
    n += i
end

end

And look at the performance during, and after, precompilation:

julia> @time using TestPackage
INFO: Precompiling module TestPackage.
  2.696702 seconds (222.21 k allocations: 9.293 MB)

julia> workspace()

julia> @time using TestPackage
  0.000206 seconds (340 allocations: 16.180 KB)

julia> n
5000000050000000

What has happened here is that the module was run at precompilation time, and the result saved. This is distinct from what compilers for static languages typically do.


Can precompilation change the behaviour of a package? Certainly. Precompilation is, as mentioned earlier, effectively running the package at precompile-time instead of at load-time. That doesn't matter for pure functions (referential transparency guarantees their result is always the same), and it doesn't matter for most impure functions, but it does matter in some cases. Suppose we had a package that does nothing but println("Hello, World") when it is loaded. Without precompilation, it looks like this:

module TestPackage

println("Hello, World")

end

And this is how it behaves:

julia> using TestPackage
Hello, World

julia> workspace()

julia> using TestPackage
Hello, World

Now let's add the __precompile__() directive, and the result is now:

julia> using TestPackage
INFO: Precompiling module TestPackage.
Hello, World

julia> workspace()

julia> using TestPackage

There is no output the second time it is loaded! That's because the computation, println, was already done when the package was compiled, so it is not done again. This is the second point of surprise for those used to compiling static languages.

This of course raises the question of initialization steps that can't be done at compile-time: for instance, if my package needs the date and time that it's loaded, or needs to create, maintain, or delete resources like files and sockets (or, in a simple case, needs to print information to the terminal). So there is a special function that is not called at precompile-time, but is called at load-time: the __init__ function.

We redesign our package as follows:

__precompile__()

module TestPackage

function __init__()
    println("Hello, World")
end

end

yielding the following result:

julia> using TestPackage
INFO: Recompiling stale cache file /home/fengyang/.julia/lib/v0.6/TestPackage.ji for module TestPackage.
Hello, World

julia> workspace()

julia> using TestPackage
Hello, World

The point of the examples above is to possibly surprise, and hopefully illuminate. The first step to understanding precompilation is to understand that it is different from how static languages are typically compiled. What precompilation in a dynamic language like Julia means is:

  • All top-level statements are executed at precompile-time, instead of at load-time.
  • Any statements that are to be executed at load-time must be moved to the __init__ function.

This should also make it clearer why precompilation is not turned on by default: it is not always safe to do so! Package developers must check that they are not using any top-level statements that have side effects or varying results, and move those into the __init__ function.
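In practice, that move usually looks like the following sketch (the module name LoadClock and the Ref placeholder pattern are our illustration, not from the original answer; in a Julia 0.6 package file, __precompile__() would precede the module as in the earlier examples). A value that must reflect load-time, such as a timestamp, cannot be computed at top level:

```julia
module LoadClock

export load_time

# Placeholder filled in at load-time. A plain top-level
# `const LOAD_TIME = time()` would instead freeze the time of
# *precompilation* into the cache, which is exactly the bug to avoid.
const LOAD_TIME = Ref(0.0)

function __init__()
    LOAD_TIME[] = time()   # runs at every load, never at precompile-time
end

load_time() = LOAD_TIME[]

end
```

With this structure, every fresh `using` of the package records the actual load time, while everything else in the module still benefits from precompilation.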

So what does this have to do with the delay on first call into a module? Well, let's look at a more practical example:

__precompile__()

module TestPackage

export cube

square(x) = x * x
cube(x) = x * square(x)

end

And do the same measurement:

julia> @time using TestPackage
INFO: Recompiling stale cache file /home/fengyang/.julia/lib/v0.6/TestPackage.ji for module TestPackage.
  0.310932 seconds (1.23 k allocations: 56.328 KB)

julia> workspace()

julia> @time using TestPackage
  0.000341 seconds (352 allocations: 17.047 KB)

After precompilation, loading becomes much faster. That's because during precompilation, the statements square(x) = x * x and cube(x) = x * square(x) were executed. These are top-level statements like any other, and they involve a certain amount of work: each expression must be parsed and lowered, and the names square and cube bound inside the module. (There is also the export statement, which is less costly but still needs to be executed.) But as you noticed:

julia> @time using TestPackage
INFO: Recompiling stale cache file /home/fengyang/.julia/lib/v0.6/TestPackage.ji for module TestPackage.
  0.402770 seconds (220.37 k allocations: 9.206 MB)

julia> @time cube(5)
  0.003710 seconds (483 allocations: 26.096 KB)
125

julia> @time cube(5)
  0.000003 seconds (4 allocations: 160 bytes)
125

julia> workspace()

julia> @time using TestPackage
  0.000220 seconds (370 allocations: 18.164 KB)

julia> @time cube(5)
  0.003542 seconds (483 allocations: 26.096 KB)
125

julia> @time cube(5)
  0.000003 seconds (4 allocations: 160 bytes)
125

What's going on here? Why does cube need to be compiled again, when there is clearly a __precompile__() directive? And why doesn't the result of the compilation get saved?

The answers are fairly straightforward:

  • The cube(::Int) method was never compiled during precompilation. This follows from three facts: precompilation is execution; type inference and code generation do not happen until a method is actually called (unless forced); and the module contains no call to cube(::Int).
  • Once I type cube(5) in the REPL, we are no longer at precompile-time, so the results of my REPL run do not get saved to the cache.

Here is how to fix the problem: execute the cube function on the desired argument types.

__precompile__()

module TestPackage

export cube

square(x) = x * x
cube(x) = x * square(x)

# precompile hints
cube(0)

end

Then

julia> @time using TestPackage
INFO: Recompiling stale cache file /home/fengyang/.julia/lib/v0.6/TestPackage.ji for module TestPackage.
  0.411265 seconds (220.25 k allocations: 9.200 MB)

julia> @time cube(5)
  0.003004 seconds (15 allocations: 960 bytes)
125

julia> @time cube(5)
  0.000003 seconds (4 allocations: 160 bytes)
125

There is still some first-use overhead; however, note especially the allocation numbers for first-run. This time, we have already inferred and generated code for the cube(::Int) method during precompilation. The results of that inference and code generation are saved, and can be loaded from cache (which is faster and requires much less runtime allocation) instead of redone. The benefits are more significant for real-world loads than for our toy example, of course.

But:

julia> @time cube(5.)
  0.004048 seconds (439 allocations: 23.930 KB)
125.0

julia> @time cube(5.)
  0.000002 seconds (5 allocations: 176 bytes)
125.0

Since we have only executed cube(0), we have only inferred and compiled the cube(::Int) method, and so the first run of cube(5.) will still require inference and code generation.

Sometimes, you want to force Julia to compile something (possibly saving it to cache, if this is happening during precompilation) without actually running it. That's what the precompile function, which can be added to your precompile hints, is for.
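A hedged sketch of what that looks like, replacing the cube(0) execution with precompile calls (the module name and the extra Float64 hint are our additions, not from the original package):

```julia
module TestPackageHints

export cube

square(x) = x * x
cube(x) = x * square(x)

# Unlike executing `cube(0)`, these lines never run cube's body:
# they only trigger type inference and native code generation for
# the given argument-type signatures.
precompile(cube, (Int,))
precompile(cube, (Float64,))

end
```

Because nothing is executed, this form is safe even for functions with side effects, and the Float64 hint means the first call to cube(5.) avoids the inference cost demonstrated above.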


As a final note, note the following limitations of precompilation:

  • Precompilation only caches results from your package's module, for your package's functions. If you depend on functions from other modules, then those will not be precompiled.
  • Precompilation only supports serializable results. In particular, results that are C objects and contain C pointers are typically not serializable. This includes BigInt and BigFloat.
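The second limitation is usually worked around with the same __init__ pattern shown earlier. A sketch (the module and names are ours), assuming we want a BigInt constant available at load-time:

```julia
module BigConstants

export big_fact

# BigInts wrap GMP data behind C pointers, so a value computed at
# precompile-time could not be serialized into the cache safely.
# We therefore store only an empty placeholder at top level.
const BIG_FACT = Ref{BigInt}()

function __init__()
    BIG_FACT[] = factorial(big(30))   # computed fresh at every load
end

big_fact() = BIG_FACT[]

end
```

The placeholder Ref itself is cheap and serializable; the pointer-bearing BigInt is created anew in each session rather than loaded from the cache.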
8 votes

The quick answer is: the first time you run a function it has to compile, so you're measuring compilation time. If you're not aware of this, see the performance tips section of the manual.

But I'll assume you know that, and that it's still bothering you. The reason is that modules in Julia are not compiled as a unit: a module is the dynamic scope. When you're playing around in the REPL, you are working in the global scope of the module Main. When you use Juno and evaluate code inside a module, it is evaluated in that module's scope, which gives you a quick way to experiment dynamically in a module other than Main (I think you can point the REPL at another module too). Because modules are dynamic, they can't be compiled as a whole: when you see a module "precompile", it is really precompiling many of the functions defined inside it. (This is also why dynamic constructs like eval operate in a module's global scope.)

So putting main in a module is no different from defining it in the REPL. A module's global scope has the same type-stability and inference issues as the REPL (the REPL is just the global scope of the Main module). And, just like in the REPL, the first time you call the function it has to compile.
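You can observe this first-call compilation cost directly. A minimal sketch (the function sumsq is our example, not from the question):

```julia
# The first call to a fresh method includes type inference and code
# generation; the second call runs the already-compiled native code.
sumsq(n) = sum(i -> i * i, 1:n)

t_first  = @elapsed sumsq(1000)   # includes compile time for sumsq(::Int)
t_second = @elapsed sumsq(1000)   # pure run time
```

t_first is typically orders of magnitude larger than t_second, and the gap is compilation, not computation -- the same effect the questioner saw with the 20-second first call to main().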