22
votes

I need to run an off-screen rendering program on an AWS EC2 GPU instance with CentOS. However, while Ubuntu is very easy to set up, I cannot get CentOS to work properly.

The goal is to run some essential utility/test tools on an EC2 GPU instance (without a screen or X client). Below, I describe how Ubuntu can be set up and how CentOS/Amazon Linux AMI fails.

Ubuntu

On Ubuntu 12.04, everything works very smoothly. The EC2 environment I used is:

  • Instance type: Both CG1 and G2 were tested and worked properly.
  • AMI image: Ubuntu Server 12.04.3 LTS for HVM Instances (ami-b93264d0 in US-East)
  • Most of the other settings are default.

After the instance is launched, I run the following commands:

# Install the Nvidia driver
sudo apt-add-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current
# The driver installation requires a reboot
sudo reboot

# Install and configure X window with virtual screen
sudo apt-get install xserver-xorg libglu1-mesa-dev freeglut3-dev mesa-common-dev libxmu-dev libxi-dev
sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
sudo /usr/bin/X :0 &

# OpenGL programs such as glxinfo and glxgears now work
DISPLAY=:0 glxinfo

glxgears also runs in the background without a physical screen or X client:

$ DISPLAY=:0 glxgears
95297 frames in 5.0 seconds = 19059.236 FPS
95559 frames in 5.0 seconds = 19111.727 FPS
94173 frames in 5.0 seconds = 18834.510 FPS
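
Because X is started in the background, the first OpenGL command can race the server's startup. Here is a minimal sketch of a wait helper, assuming xdpyinfo is installed (on Ubuntu it ships in the x11-utils package). The name wait_for_display is hypothetical, not part of any package:

```shell
# Poll until the X server on the given display accepts connections.
# Usage: wait_for_display DISPLAY [TIMEOUT_SECONDS]
# Returns 0 once the display answers, 1 if the timeout expires first.
wait_for_display() {
    display="$1"
    timeout="${2:-10}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # xdpyinfo exits 0 only if it can connect to the display
        if xdpyinfo -display "$display" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example use (requires a running X server, so it is left commented out):
#   sudo /usr/bin/X :0 &
#   wait_for_display :0 15 && DISPLAY=:0 glxinfo
```

Polling is used here only because it needs no extra tooling; a fixed sleep after starting X would also work but is less reliable.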

CentOS or Amazon Linux AMI

Both "CentOS" and "Amazon Linux AMI" are derived from Red Hat Enterprise Linux. However, I cannot make either of them work.

A few days ago, AWS announced their new G2 instance type. The announcement recommends the Amazon Linux AMI with NVIDIA Drivers for the Linux platform. In this AMI, the Nvidia driver, X window and OpenGL libraries are all preinstalled. However, I just get GLX error messages when I try to run OpenGL programs.

The EC2 instance is launched with the following settings:

The EC2 instance is launched with the following setting:

  • AMI image: Amazon Linux AMI with NVIDIA GRID GPU Driver (ami-637c220a in US-East)
  • Instance type: G2
  • Most of the other settings are default

After the instance has booted, the steps to reproduce this issue are very simple:

sudo X :0 &  # Start the X server
glxinfo
glxgears

The output is:

$ glxinfo
name of display: :0
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Error: couldn't find RGB GLX visual or fbconfig

Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".
Xlib:  extension "GLX" missing on display ":0".

$ glxgears
Xlib:  extension "GLX" missing on display ":0".
Error: couldn't get an RGB, Double-buffered visual

The following error is found in /var/log/Xorg.0.log:

[139017.484] (EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)
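
To pull only the error entries out of the X server log, something like the following sketch may help. It assumes the usual Xorg log layout, where each entry is an optional bracketed timestamp followed by a marker such as (EE). The function name xorg_errors is hypothetical:

```shell
# Print only the error entries from an Xorg log file.
# Usage: xorg_errors [LOGFILE]   (defaults to /var/log/Xorg.0.log)
xorg_errors() {
    # Match "(EE)" at the start of an entry, optionally preceded by a
    # "[ 123.456] " timestamp. This skips the marker legend near the
    # top of the log, which mentions "(EE)" in the middle of a line.
    grep -E '^(\[[ 0-9.]+\] )?\(EE\)' "${1:-/var/log/Xorg.0.log}"
}

# Example use:
#   xorg_errors /var/log/Xorg.0.log
```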

I have googled and tried a lot of possible solutions, such as:

  • Using a clean CentOS HVM AMI and installing the Nvidia driver manually
  • Trying both CG1 and G2 instance types
  • Regenerating the X window config with nvidia-xconfig
  • Using Xvfb instead of X window
  • Reinstalling the Nvidia driver after the mesa libraries are installed

... but none of them work.

Does anyone have a concrete solution for this issue? Everything I mentioned should be reproducible (I tried many times). I would appreciate reproducible instructions for making OpenGL (GLX) work on EC2 GPU instances with CentOS/Amazon Linux AMI.

2
You may find this useful: github.com/rncry/gpu-desktop (user2851943)

2 Answers

16
votes

lspci | grep VGA

You should see that the bus ID is 0:3:0 (lspci prints it as 00:03.0).

Using sudo, edit your xorg.conf and add the bus ID to the Device section, like so:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GRID K520"
    BusID          "0:3:0"
EndSection

This should fix GLX failures.
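
On other instance types the address may differ. lspci prints it in hexadecimal as bus:device.function (for example 00:03.0), while the xorg.conf man page documents the BusID form "PCI:bus:device:function", and as far as I know X expects decimal numbers there. Here is a small conversion sketch under those assumptions. The function name pci_to_busid is hypothetical, and it does not handle the domain-prefixed form printed by lspci -D:

```shell
# Convert an lspci address (hex, bus:device.function) into the
# "PCI:bus:device:function" form with decimal fields.
# Usage: pci_to_busid 00:03.0
pci_to_busid() {
    addr="$1"
    bus="${addr%%:*}"      # text before the first ":"
    rest="${addr#*:}"      # text after the first ":"
    dev="${rest%%.*}"      # text before the "."
    func="${rest#*.}"      # text after the "."
    # The 0x prefix makes printf read each field as hexadecimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$func"
}

# Example use:
#   pci_to_busid "$(lspci | grep VGA | cut -d' ' -f1)"
```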

6
votes

Just an additional finding:

I did this to get the X Server running:

sudo /usr/bin/X :0 &

However, my OpenGL application was still not using the GPU for image rendering, and was therefore REALLY slow.

This is what saved me: setting the DISPLAY environment variable to the same display (ID: 0) that the X Server is using:

export DISPLAY=:0.0
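
One way to check which renderer an application will get is to look at the "OpenGL renderer string" line that glxinfo prints. A minimal sketch follows; the helper name gl_renderer is hypothetical, and it only filters text read from standard input:

```shell
# Extract the renderer name from glxinfo output read on stdin.
gl_renderer() {
    sed -n 's/^OpenGL renderer string: //p'
}

# Example use (requires a running X server, so it is left commented out):
#   DISPLAY=:0.0 glxinfo | gl_renderer
```

If the printed name is a software rasterizer rather than the NVIDIA GPU, the application is probably not being hardware accelerated.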