Julia parallel speedup performance for large scale computations

Question

General context:

I have developed a fairly large Navier-Stokes (finite-difference) solver written in Fortran 90. It uses adaptive grids (hence a load-balancing issue), and I have tried various techniques (MPI, OpenMP, and hybrid OpenMP-MPI) to parallelize it. However, it does not scale well enough: judging by Amdahl's law, only 96-97% of the computation runs in parallel. The mesh typically has a couple of hundred million points, and this will need to grow in the future.
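To make the 96-97% figure concrete, here is a minimal sketch of Amdahl's law, S(N) = 1 / ((1 - p) + p / N), where p is the parallel fraction and N the core count. The function names `amdahl_speedup` and `amdahl_ceiling` are made up for illustration, and the model ignores communication and load-imbalance costs, so it is an optimistic bound rather than a prediction for this solver.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `parallel_fraction` of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def amdahl_ceiling(parallel_fraction: float) -> float:
    """Speedup limit as the core count goes to infinity."""
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Parallel fractions taken from the question; core counts are illustrative.
    for p in (0.96, 0.97):
        for n in (100, 1000, 4000):
            print(f"p={p:.2f}, cores={n:5d}: speedup ~ {amdahl_speedup(p, n):.1f}")
        print(f"p={p:.2f}: ceiling ~ {amdahl_ceiling(p):.1f}")
```

Under this model the speedup can never exceed roughly 25x (p = 0.96) to 33x (p = 0.97), no matter how many cores are added, which is why the serial fraction matters more than the choice of language at this scale.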

Query:

I am now thinking of switching to Julia, since the existing code has become very tedious to maintain and extend.

The problem is that I cannot find a good answer about Julia's parallel performance. I have searched the internet and watched many YouTube videos. Most people say that Julia is well suited to parallel computing, and some even show a bar chart of the reduction in elapsed time compared to serial code. However, some of these answers and videos are quite old, which makes them somewhat unreliable given how quickly this new language is evolving.

Therefore, I would like to know whether the language can scale to a few thousand cores.

Extra information:

I am still trying hard to improve the speedup of my existing code to achieve near-linear scaling on a couple of thousand cores. The solver needs to exchange overlapping points 3-4 times per timestep, so it incurs a large communication overhead. However, the non-adaptive-grid version of the code easily scales up to 20k cores.

I have also read somewhere that Julia does not use the InfiniBand standard for data communication in parallel.

Not an answer, but might help: a use case for parallel Julia. Found it using Google Scholar. Unfortunately, it's paywalled. - Felipe Lema
@FelipeLema Thanks! That's an interesting article. However, as you rightly said, it doesn't answer the question. - Soni

PatrickB · Accepted Answer · 2016-11-08T20:22:33

The following paper has scaling results for PDE-constrained parameter estimation problems, but not up to anywhere near the number of cores you seem to be interested in: https://arxiv.org/abs/1606.07399. I haven't seen any examples going up to thousands of cores.

Re InfiniBand: By default, Julia uses shared memory for communication within a node and TCP/IP across nodes, so InfiniBand is not supported out of the box. However, the language allows custom transports to be implemented, and I imagine someone will add InfiniBand support at some point, but I couldn't find any implementations with a quick Google search.
