
I am trying to submit Python jobs to PBS and capture the printed output. A minimal example:

The Python file, test.py:

import time
print(time.time())

The PBS submission script, job_test.pbs:

#!/bin/bash
#PBS -l nodes=2:ppn=8,walltime=8:00:00
#PBS -N test
#PBS -q gpu

module load anaconda/3 torque cuda80 cudnn

cd /path-to-the-test.py-program
python test.py

and finally the qsub command:

qsub job_test.pbs

Since the job is trivial, I see its status go from Q to E to C almost immediately in qstat. The problem is that I don't see the output file, which should end up in /path-to-the-test.py-program. I tried both setting #PBS -o /path-to-the-test.py-program/output.txt in the PBS script and passing qsub -o /path-to-the-test.py-program/output.txt job_test.pbs, but neither works. How can I do this right?
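For reference, when no -o path is given, Torque writes stdout to <jobname>.o<numeric job id> (and stderr to <jobname>.e<numeric job id>) in the directory qsub was run from, and only copies it back after the job finishes. A small sketch of that naming, using a made-up job id "242.master" (qsub prints the full id on submission):

```shell
# Hypothetical values: "test" comes from #PBS -N, "242.master" is
# the job id string printed by qsub (242 on server "master")
jobname="test"
jobid="242.master"

# Torque strips the server suffix and names the files
# <jobname>.o<num> and <jobname>.e<num>
echo "${jobname}.o${jobid%%.*}"   # test.o242
echo "${jobname}.e${jobid%%.*}"   # test.e242
```

So even when submission succeeds, the file to look for is test.o242-style, not output.txt, unless -o is honored.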


1 Answer


I ran into this before.

I am not sure your cause is the same, but in my case the problem was that I did not have password-free login between the nodes.

You may check like this:

  1. ssh to your computational node. If it asks for a password, your problem is very likely the same as mine.

  2. Check the system log (as root), typically /var/log/messages, for something like:

Apr 10 14:52:19 node1 pbs_mom: LOG_ERROR::sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool/242.master.OU user@master:/home/user/path/to/sample.pbs.o242' failed with status=1, giving up after 4 attempts

The key point is the scp failure: pbs_mom could not copy the output file back to the submission host.

  3. Check /var/spool/torque/spool/ --- the undelivered output files should be sitting there.
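The checks above can be sketched as a small script. Here "node1" is a placeholder for your compute node's hostname; run this from the head node:

```shell
# Placeholder hostname -- replace with your compute node
node="node1"

# 1) BatchMode makes ssh fail instead of prompting for a password,
#    so this succeeds only if prompt-free login already works
if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
    echo "passwordless ssh to $node: OK"
else
    echo "passwordless ssh to $node: FAILED (likely why output never arrives)"
fi

# 2) On the node, as root, look for the copy-back failure:
#    grep sys_copy /var/log/messages
# 3) The undelivered output files sit in the MOM's spool:
#    ls /var/spool/torque/spool/
```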

So if your situation is the same as mine, running ssh-copy-id to your computational node fixes it. In my case, because /home is NFS-mounted and shared by all nodes, I only needed:

ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa localhost

Hope my answer helps.