How to execute a "g++" command on a Python server in Heroku?

Question

I am trying to deploy a Python server to Heroku, and I need to execute a "g++" command on one of the libraries to install it on the server.

I want to create a gunicorn and Flask server hosting Facebook's XLM model from cross-lingual language model pretraining: https://github.com/facebookresearch/XLM

The model requires the "fastBPE" library (https://github.com/glample/fastBPE), which has to be compiled with the command: g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast

However, since the Heroku server is configured for Python, it doesn't recognize the "g++" command.

Here is what I have tried so far:
- Adding the "heroku-buildpack-apt" buildpack in Heroku and creating an "Aptfile" in my source directory that lists "g++" as well as "build-essential".
- Inside the main Python file, creating a subprocess to launch "apt-get install g++":

import subprocess
# Run apt-get in a shell and capture stdout and stderr together
process = subprocess.Popen("apt-get install g++", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# communicate() waits for the command to finish and returns its output
(output, err) = process.communicate()
# The exit status is available once the command has finished
p_status = process.returncode
print("Command apt-get output: ", output)

However, whenever I run the following subprocess to compile the fastBPE package:

import subprocess
# Compile fastBPE with g++ and capture stdout and stderr together
process = subprocess.Popen("g++ -std=c++11 -pthread -O3 tools/fastBPE/fastBPE/main.cc -IfastBPE -o tools/fastBPE/fast", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# communicate() waits for the command to finish and returns its output
(output, err) = process.communicate()
p_status = process.returncode
print("Command g++ output: ", output)

I always get "g++: not found" as output.

Also, the command "which g++" returns nothing, but "which gcc" returns "/usr/bin/gcc", so gcc is installed but g++ is not.
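
The "which g++" check above can also be done from Python before attempting the compilation. The following is only a minimal sketch: the helper names find_compiler, build_compile_command and compile_fastbpe are hypothetical, the paths and flags are copied from the command in this question, and the fallback compiler names "c++" and "clang++" are an assumption about what might be on the PATH.

```python
import shutil
import subprocess


def find_compiler(candidates=("g++", "c++", "clang++")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # same lookup as the shell's `which`
        if path:
            return path
    return None


def build_compile_command(compiler):
    """Build the fastBPE compile command from the question as an argument list."""
    return [
        compiler, "-std=c++11", "-pthread", "-O3",
        "tools/fastBPE/fastBPE/main.cc",  # source path used in the question
        "-IfastBPE",
        "-o", "tools/fastBPE/fast",       # output binary used in the question
    ]


def compile_fastbpe():
    """Compile fastBPE, failing early with a clear message if no compiler exists."""
    compiler = find_compiler()
    if compiler is None:
        raise RuntimeError("No C++ compiler found on PATH; cannot compile fastBPE")
    result = subprocess.run(
        build_compile_command(compiler),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode the output as text
    )
    if result.returncode != 0:
        raise RuntimeError("Compilation failed:\n" + result.stdout)
    return result.stdout
```

Calling compile_fastbpe() on a machine without a C++ compiler raises a RuntimeError instead of printing "g++: not found", which makes the failure easier to spot in the server logs. Passing an argument list rather than a shell string also avoids shell quoting issues.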

Aside: You almost certainly shouldn't be compiling this library at runtime. It needs to be done during deployment, or else it will be lost every time your dyno restarts because of Heroku's ephemeral filesystem. And this definitely won't work either: "inside the main python file, I create a subprocess to launch 'apt-get install g++'". The Aptfile approach is likely the place to start here. — Chris
Hi Chris, I don't see any way to do that at deployment. My strategy is that if I manage to launch the compilation at the beginning of the main script, then even if the dyno restarts it will go through the installation script every time and compile the fastBPE library. — SoufianeL
That seems like a lot of work to do at launch. It will delay restarts, and this is fundamentally a build task. You may need to use a custom buildpack, e.g. by forking the official Python one. — Chris

SoufianeL · Accepted Answer · 2019-06-17T09:16:22

I managed to figure it out eventually.

For posterity, there are two solutions that worked for me:

1 - The not-so-good one was to run the g++ command on a Linux computer with exactly the same environment as the Heroku server, push the compiled binary to Heroku and make sure never to modify it afterwards. You can then call fastBPE with a subprocess as above. It works, but it is more of an unstable DIY solution. The associated GitHub main file is https://github.com/Tony4469/xlm-agir/blob/master/mlm_tlm_prod.py

2 - The best solution was to precompile everything in a Docker container with a Miniconda environment. There you can install and run all the necessary commands, and then push it easily to Heroku. You can find the Dockerfile I used here: https://github.com/Tony4469/laser-agir/blob/master/Dockerfile
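
With either solution, the server ends up calling an already compiled "fast" binary through a subprocess. Here is a rough sketch of what such a call could look like, not the exact code from the linked repositories. It assumes the "applybpe output input codes [vocab]" usage described in the fastBPE README. The helper names and all file paths in the example are hypothetical.

```python
import os
import subprocess


def build_applybpe_command(binary, output_path, input_path, codes_path, vocab_path=None):
    """Build the argument list for `fast applybpe output input codes [vocab]`."""
    cmd = [binary, "applybpe", output_path, input_path, codes_path]
    if vocab_path is not None:
        cmd.append(vocab_path)  # the vocabulary file is optional
    return cmd


def run_applybpe(binary, output_path, input_path, codes_path, vocab_path=None):
    """Run the precompiled fastBPE binary, checking first that it is usable."""
    if not (os.path.isfile(binary) and os.access(binary, os.X_OK)):
        raise FileNotFoundError(
            "fastBPE binary not found or not executable: " + binary
        )
    cmd = build_applybpe_command(binary, output_path, input_path, codes_path, vocab_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return output_path


# Example call with hypothetical paths:
# run_applybpe("tools/fastBPE/fast", "data/input.bpe", "data/input.txt", "data/codes")
```

Checking that the binary exists and is executable before calling it gives a clear error if the precompiled file was not pushed or lost its executable bit.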
