
I'm using sbatch to submit my job.
Running mpirun --version on the command line gives:

Intel(R) MPI library for Linux* OS, Version 5.0 Build 20140507
Copyright (C) 2003-2014, Intel Corporation. All rights reserved.

So I think I'm working with Intel MPI.
Following the instructions in "Submitting an MPI job using Intel MPI", I wrote my script like this:

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH -t 00:10:00

. ~/.bash_profile

module load intel
mpirun mycc

mycc is the executable I get after compiling the source files with mpicc.
Then I submitted it with sbatch -p partitionname -J myjob script.sh, and the job failed with exit code 127:0. The slurm-jobid.out file says (leaving aside the setlocale warning):

/usr/share/Modules/init/sh: line 2: /usr/bin/modulecmd: No such file or directory
/tmp/slurmd/job252624/slurm_script: line 10: mpirun: command not found

But I have checked, and /usr/bin/modulecmd does exist.
Any suggestion is appreciated.
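Exit code 127 is the status a POSIX shell returns when it cannot find a command, which matches the "mpirun: command not found" line. A guard like the following makes a job script stop early with a clearer message naming the node and its PATH. This is a minimal sketch assuming bash on the compute nodes; require_cmd is a hypothetical helper name, not part of Slurm or Intel MPI.

```shell
#!/bin/bash
# require_cmd (hypothetical helper): succeed if a command is on PATH,
# otherwise print the node name and PATH to stderr and return 127,
# the same status the shell itself uses for "command not found".
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$(uname -n): '$1' not found in PATH=$PATH" >&2
        return 127
    fi
}
```

In script.sh it would go just before the launch line, e.g. require_cmd mpirun || exit 127 followed by mpirun mycc, so the output file shows which node lacked mpirun and what PATH it saw.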

Edit
I also asked the question here.

I have removed the source statement and the module load line.
I tried to load the module on the login node before submitting my job, but something went wrong. It says:

ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'intel'

I used the module avail command to see which modules are available:

---------/usr/share/Modules/modulefiles-------------------
dot         module-info  mpich2-x86_64  use.own
module-cvs  modules      null

---------/etc/modulefiles---------------------------------
compat-openmpi-psm-x86_64  compat-openmpi-x86_64

Forgive me for the messy formatting.

Solved

The problem is finally solved. My final script.sh looks like this:

#!/bin/bash
srun -p partitionname -n 4 -t 00:10:00 mycc

Then I submit the job with sbatch -p partitionname -J myjob script.sh.

Did you check whether it exists on the login node or on the compute nodes? I suspect the script is executed on the first node Slurm allocated to you, so if the module command is not installed there, you will get this error message. - Gilles
Obviously not a Warwick student :^). You'll find more help on Super User, as this is more sysadmin-related than programming :). - Samidamaru
@Gilles Can you be more specific? I mean, how can I check whether the file exists on the login node or on the compute nodes? I had thought that all files are stored on storage nodes. - dudu
@Samidamaru I'll copy my question there. :) - dudu
@Samidamaru The problem with Super User is that there is no slurm tag, so many users come here to ask, as there are more answers on Stack Overflow. - Carles Fenoy

1 Answer


Apparently /usr/bin/modulecmd does not exist on all the compute nodes. Make sure it exists on every compute node and try again.
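To answer the question of whether the file exists on the compute nodes and not just the login node, one option is to run a small check on every allocated node, one task per node, so each node reports for itself. This is a minimal sketch; check_on_node is a hypothetical helper, and it assumes bash is available on the compute nodes.

```shell
#!/bin/bash
# check_on_node (hypothetical helper): report whether a file exists on the
# node this process runs on, prefixed with that node's hostname.
# Returns 0 if the file exists, 1 otherwise.
check_on_node() {
    local path="${1:-/usr/bin/modulecmd}"   # default: the file from the error
    if [ -e "$path" ]; then
        echo "$(uname -n): found $path"
    else
        echo "$(uname -n): MISSING $path"
        return 1
    fi
}
```

From the login node, something like srun -p partitionname -N 2 --ntasks-per-node=1 bash -c "$(declare -f check_on_node); check_on_node /usr/bin/modulecmd" ships the function definition to each node and prints one line per node; -N sets the node count and --ntasks-per-node=1 keeps it to one check per node.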

Also, you shouldn't need to source ~/.bash_profile if /home is shared by all the nodes, as Slurm by default exports your whole environment to the job.
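To see which variables from the submitting shell actually reached the job, one rough approach (an assumption for illustration, not something the answer prescribes) is to save env on the login node before submitting and again inside the job, then list the variable names that did not make it across. missing_vars is a hypothetical helper name.

```shell
#!/bin/bash
# missing_vars BEFORE AFTER (hypothetical helper): print, sorted and
# de-duplicated, the variable names present in the BEFORE dump but absent
# from the AFTER dump. Both files are plain `env` output; only the text
# before the first '=' is compared, so multi-line values may add noise.
missing_vars() {
    awk -F= 'FILENAME == ARGV[1] { seen[$1] = 1; next }
             !($1 in seen)       { print $1 }' "$2" "$1" | sort -u
}
```

Usage would be env > login.env on the login node, env > job.env inside script.sh, and then missing_vars login.env job.env; an empty result means every variable name from the login shell was present in the job.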