Problems with Orca and OpenMPI for parallel jobs

Question

Hello to the community:

I recently started using the ORCA software for some quantum calculations, but I have been having a lot of trouble launching a parallel calculation on my university's cluster.

To install ORCA I used the static version, orca_4_2_1_linux_x86-64_openmpi314.tar.xz, unpacked into a shared directory of the cluster (/data/shared/opt/ORCA/), and I put the following in my ~/.bash_profile:

export PATH="/data/shared/opt/ORCA/orca_4_2_1_linux_x86-64_openmpi314:$PATH"
export LD_LIBRARY_PATH="/data/shared/opt/ORCA/orca_4_2_1_linux_x86-64_openmpi314:$LD_LIBRARY_PATH"

To install the corresponding OpenMPI version (3.1.4), I ran:

tar -xvf openmpi-3.1.4.tar.gz
cd openmpi-3.1.4
./configure --prefix="/data/shared/opt/ORCA/openmpi314/"
make -j 10
make install

When I use the frontend server everything works fine, with a .sh like this:

#! /bin/bash
export PATH="/data/shared/opt/ORCA/openmpi314/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/data/shared/opt/ORCA/openmpi314/lib"
$(which orca) test.inp > test.out

and an input like this:

# Computation of myjob at b3lyp/6-31+G(d,p)
%pal nprocs 10 end
%maxcore 8192

! RKS B3LYP 6-31+G(d,p)
! TightSCF Grid5 NoFinalGrid
! Opt
! Freq
%cpcm 
    smd true
    SMDsolvent "water"
end

* xyz 0 1
C 0 0 0
O 0 0 1.5
*

The problem appears when I use the nodes:

.inp file:

#! Computation at RKS B3LYP/6-31+G(d,p) for cis1_bh267_m_Cell_152
%pal nprocs 12 end
%maxcore 8192

! RKS B3LYP 6-31+G(d,p)
! TightSCF Grid5 NoFinalGrid
! Opt
! Freq
%cpcm 
    smd true
    SMDsolvent "water"
end

* xyz 0 1
 C  -4.38728130   0.21799058   0.17853303
 C  -3.02072869   0.82609890  -0.29733316
 F  -2.96869122   2.10937041   0.07179384
 F  -3.01136328   0.87651596  -1.63230798
 C  -1.82118365   0.05327804   0.23420220
 O  -2.26240947  -0.92805650   1.01540713
 C  -0.53557484   0.33394113  -0.05236121
 C   0.54692198  -0.46942807   0.50027196
 O   0.31128292  -1.43114232   1.22440290
 C   1.93990391  -0.12927675   0.16510948
 C   2.87355011  -1.15536140  -0.00858832
 C   4.18738231  -0.82592189  -0.32880964
 C   4.53045856   0.52514329  -0.45102225
 N   3.63662927   1.52101319  -0.26705841
 C   2.36381718   1.20228695   0.03146190
 F  -4.51788749   0.24084604   1.49796862
 F  -4.53935644  -1.04617745  -0.19111502
 F  -5.43718443   0.87033190  -0.30564680
 H  -1.46980819  -1.48461498   1.39034280
 H  -0.26291843   1.15748249  -0.71875720
 H   2.57132559  -2.20300864   0.10283592
 H   4.93858460  -1.60267627  -0.48060140
 H   5.55483009   0.83859415  -0.70271364
 H   1.67507560   2.05019549   0.17738396
*

.sh file (Slurm job):

#!/bin/bash
#SBATCH -p deflt #which partition I want
#SBATCH -o cis1_bh267_m_Cell_152_myjob.out #path for the slurm output
#SBATCH -e cis1_bh267_m_Cell_152_myjob.err #path for the slurm error output
#SBATCH -c 12 #number of cpu(logical cores)/task (task is normally an MPI process, default is one and the option to change it is -n)
#SBATCH -t 2-00:00 #how many time I want the resources (this impacts the job priority as well)
#SBATCH --job-name=cis1_bh267_m_Cell_152 #(to recognize your jobs when checking them with "squeue -u USERID")
#SBATCH -N 1 #number of node, usually 1 when no parallelization over nodes
#SBATCH --nice=0 #lowering your priority if >0
#SBATCH --gpus=0 #number of gpu you want

# This block is echoing some SLURM variables
echo "Jobid = $SLURM_JOBID"
echo "Host = $SLURM_JOB_NODELIST"
echo "Jobname = $SLURM_JOB_NAME"
echo "Subcwd = $SLURM_SUBMIT_DIR"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"

# This block is for the execution of the program
export PATH="/data/shared/opt/ORCA/openmpi314/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/data/shared/opt/ORCA/openmpi314/lib"
$(which orca) ${SLURM_JOB_NAME}.inp > ${SLURM_JOB_NAME}.log --use-hwthread-cpus

I added the --use-hwthread-cpus flag on a recommendation, but the same problem appears with and without it. The full error is:

There are not enough slots available in the system to satisfy the 12 slots that were requested by the application: /data/shared/opt/ORCA/orca_4_2_1_linux_x86-64_openmpi314/orca_gtoint_mpi

Either request fewer slots for your application, or make more slots available for use. A "slot" is the Open MPI term for an allocatable unit where we can launch a process. The number of slots available are defined by the environment in which Open MPI processes are run:

1. Hostfile, via "slots=N" clauses (N defaults to number of processor cores if not provided)

2. The --host command line parameter, via a ":N" suffix on the hostname (N defaults to 1 if not provided)

3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)

4. If none of a hostfile, the --host command line parameter, or an RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number of hardware threads instead of the number of processor cores, use the --use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the number of available slots when deciding the number of processes to launch.

[file orca_tools/qcmsg.cpp, line 458]:

.... aborting the run
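
The slot count in this message comes from what Open MPI learns from the resource manager (item 3 in the list above). A minimal sketch for checking what Slurm actually exported inside the job follows. The function name `show_slurm_slots` is made up for illustration; the variables are standard Slurm environment variables, and they print as `unset` when they are missing (for example outside a job):

```shell
#!/bin/bash
# Hypothetical helper: print the Slurm variables that are relevant
# when working out how many slots a job was given.
show_slurm_slots() {
    echo "SLURM_NTASKS = ${SLURM_NTASKS:-unset}"
    echo "SLURM_TASKS_PER_NODE = ${SLURM_TASKS_PER_NODE:-unset}"
    echo "SLURM_CPUS_PER_TASK = ${SLURM_CPUS_PER_TASK:-unset}"
    echo "SLURM_JOB_NODELIST = ${SLURM_JOB_NODELIST:-unset}"
}

show_slurm_slots
```

Calling it at the top of the job script, next to the existing echo block, puts the values in the Slurm output file so they can be compared with the number given in %pal nprocs.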

When I look at the output of the calculation, it seems to start running, but it fails when it launches the parallel processes and gives:

ORCA finished by error termination in GTOInt
Calling Command: mpirun -np 12 --use-hwthread-cpus /data/shared/opt/ORCA/orca_4_2_1_linux_x86-64_openmpi314/orca_gtoint_mpi cis1_bh267_m_Cell_448.int.tmp cis1_bh267_m_Cell_448
[file orca_tools/qcmsg.cpp, line 458]:
  .... aborting the run

We have two kinds of nodes on the cluster. A bunch of them are:

Xeon 6-core E-2136 @ 3.30GHz (12 logical cores) and Nvidia GTX 1070Ti

And the other ones:

AMD Epyc 24-core (24 logical cores) and 4x Nvidia RTX 2080Ti

Using the command scontrol show node, the details of one node from each group are:

First Group:

NodeName=fang1 Arch=x86_64 CoresPerSocket=6
CPUAlloc=12 CPUTot=12 CPULoad=12.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:gtx1070ti:1
NodeAddr=fang1 NodeHostName=fang1 Version=19.05.5
OS=Linux 5.7.12-arch1-1 #1 SMP PREEMPT Fri, 31 Jul 2020 17:38:22 +0000
RealMemory=15923 AllocMem=0 FreeMem=171 Sockets=1 Boards=1
State=ALLOCATED ThreadsPerCore=2 TmpDisk=7961 Weight=1 Owner=N/A MCS_label=N/A
Partitions=deflt,debug,long
BootTime=2020-10-27T09:56:18 SlurmdStartTime=2020-10-27T15:33:51
CfgTRES=cpu=12,mem=15923M,billing=12,gres/gpu=1,gres/gpu:gtx1070ti=1
AllocTRES=cpu=12,gres/gpu=1,gres/gpu:gtx1070ti=1
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Second Group:

NodeName=fang50 Arch=x86_64 CoresPerSocket=24
CPUAlloc=48 CPUTot=48 CPULoad=48.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:rtx2080ti:4
NodeAddr=fang50 NodeHostName=fang50 Version=19.05.5
OS=Linux 5.7.12-arch1-1 #1 SMP PREEMPT Fri, 31 Jul 2020 17:38:22 +0000
RealMemory=64245 AllocMem=0 FreeMem=807 Sockets=1 Boards=1
State=ALLOCATED ThreadsPerCore=2 TmpDisk=32122 Weight=1 Owner=N/A MCS_label=N/A
Partitions=deflt,long
BootTime=2020-12-15T10:09:43 SlurmdStartTime=2020-12-15T10:14:17
CfgTRES=cpu=48,mem=64245M,billing=48,gres/gpu=4,gres/gpu:rtx2080ti=4
AllocTRES=cpu=48,gres/gpu=4,gres/gpu:rtx2080ti=4
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

In the Slurm script I use the flag -c, --cpus-per-task=integer, and in the ORCA input the command %pal nprocs integer end. I tested different combinations of these two parameters to check whether I was requesting more CPUs than are available:

-c, --cpus-per-task    %pal nprocs
None                   6
None                   3
None                   2
1                      2
1                      12
2                      6
3                      4
12                     12

I tried this with different amounts of memory: 8000 MB and 2000 MB (my total memory is around 15 GB). In all cases the same error appears. I am not an expert in either ORCA or computing (as you may have guessed from the length of the question), so maybe the solution is simple, but I really can't find it. I don't know what's going on!

Many thanks in advance,

Alejandro.

Why not ask questions like this with the ORCA tag at MMSE? The author of ORCA is there: mattermodeling.stackexchange.com/a/4351/5 — user1271772

Alexey Alexey · Accepted Answer · 2020-12-26T23:01:05

I faced the same issue. Explicitly passing --prefix ${OMPI_HOME} directly as an ORCA parameter and using the statically linked ORCA version helped me:

export RSH_COMMAND="/usr/bin/ssh"
export PARAMS="--mca routed direct --oversubscribe -machinefile ${HOSTS_FILE} --prefix ${OMPI_HOME}"
$ORCA_DIR/orca $WORKDIR/$JOBFILE.inp "$PARAMS" > $WORKDIR/$JOBFILE.out
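
The snippet above leaves ORCA_DIR, OMPI_HOME, HOSTS_FILE, WORKDIR and JOBFILE undefined. A minimal sketch of how they might be filled in inside a Slurm job follows. The two installation paths are taken from the question; the machinefile location, the `NPROCS` variable and the `write_hosts_file` helper are assumptions made for illustration. The ORCA call is only echoed, so the sketch can be tried on a machine without ORCA:

```shell
#!/bin/bash
# Paths from the question; adjust to your own installation.
ORCA_DIR="/data/shared/opt/ORCA/orca_4_2_1_linux_x86-64_openmpi314"
OMPI_HOME="/data/shared/opt/ORCA/openmpi314"
WORKDIR="${SLURM_SUBMIT_DIR:-$PWD}"
JOBFILE="${SLURM_JOB_NAME:-test}"
HOSTS_FILE="${TMPDIR:-/tmp}/${JOBFILE}.hosts"   # assumed location
NPROCS=12                                       # should match %pal nprocs

# Hypothetical helper: write "host slots=N" lines (Open MPI hostfile syntax).
# Arguments: slots per host, then one or more host names.
write_hosts_file() {
    local slots="$1"; shift
    local host
    for host in "$@"; do
        echo "${host} slots=${slots}"
    done
}

# Inside a job, expand the node list with scontrol; otherwise use this host.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    mapfile -t HOSTS < <(scontrol show hostnames "$SLURM_JOB_NODELIST")
else
    HOSTS=("$(uname -n)")
fi
write_hosts_file "$NPROCS" "${HOSTS[@]}" > "$HOSTS_FILE"

export RSH_COMMAND="/usr/bin/ssh"
export PARAMS="--mca routed direct --oversubscribe -machinefile ${HOSTS_FILE} --prefix ${OMPI_HOME}"

# Echoed rather than executed; remove the echo and quoting to launch ORCA.
echo "$ORCA_DIR/orca $WORKDIR/$JOBFILE.inp \"$PARAMS\" > $WORKDIR/$JOBFILE.out"
```

Giving every host the same slot count is a simplification that fits the single-node job in the question; a multi-node job would need per-host counts.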

Also, it's better to build OpenMPI 3.1.x with the --disable-builtin-atomics flag.
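
Separately from the above, and as an assumption that is not confirmed anywhere in this thread: Open MPI under Slurm appears to count slots from the number of tasks rather than from --cpus-per-task, so a job submitted with -c 12 and the default single task may expose only one slot. A sketch of the job script from the question rewritten to request 12 tasks follows. The `NPROCS`/`NTASKS` check is an illustrative addition, not part of the original script:

```shell
#!/bin/bash
#SBATCH -p deflt
#SBATCH -N 1                 # one node
#SBATCH -n 12                # 12 tasks, matching %pal nprocs 12
#SBATCH -c 1                 # one CPU per task
#SBATCH -t 2-00:00
#SBATCH --job-name=cis1_bh267_m_Cell_152

NPROCS=12                        # must match %pal nprocs in the .inp file
NTASKS="${SLURM_NTASKS:-0}"      # 0 when run outside Slurm

export PATH="/data/shared/opt/ORCA/openmpi314/bin:$PATH"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}:/data/shared/opt/ORCA/openmpi314/lib"

if [ "$NTASKS" -lt "$NPROCS" ]; then
    # Warn instead of letting mpirun fail later with the "slots" error.
    echo "Only $NTASKS task(s) allocated but ORCA will ask for $NPROCS" >&2
elif command -v orca >/dev/null 2>&1; then
    "$(command -v orca)" "${SLURM_JOB_NAME}.inp" > "${SLURM_JOB_NAME}.log"
fi
```

If the warning is printed inside a job, the allocation rather than the ORCA input is the thing to change.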
