Using Torque, when I submit a job with qsub with particular arguments and the job finishes, three things happen: 1) I get a file.eXXXX file containing the stderr of the process, 2) I get a file.oXXXX file containing the stdout of the process, and 3) I receive an email with information such as resource allocation and exit status.
I'd like to have this status information in a file next to the .oXXXX and .eXXXX files, because it is too difficult to correlate hundreds of emails with hundreds of job output files, especially several days later. I can't find such a capability built in. However, I noticed that I can use "qstat -f job-id" to get information quite similar to what's in the email. What I don't see in the documentation is how long after the job finishes I can still run qstat on it.
One idea: after launching job A with qsub, use its job ID to launch a dependent job B (qsub -W depend=...) that runs "qstat -f" on A's ID, passing that ID to B via an environment variable. However, I don't know how long after A finishes job B will actually run. Also, if job B does not run on the same node as A, will qstat still be able to find the correct information?
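For concreteness, the dependent-job idea above might be sketched roughly as follows. This is only an illustration under assumptions: job_a.sh and status_job.sh are hypothetical script names, JOB_A_ID is a hypothetical variable name, and afterany is chosen so that B runs whether A succeeded or failed. It does not settle whether the server still remembers A by the time B runs; that probably depends on how long the site keeps completed jobs visible to qstat.

```shell
#!/bin/sh
# Sketch: submit job A, then a dependent job B that records A's status.
# job_a.sh and status_job.sh are hypothetical script names.

# Build the -W argument that makes a job wait until job $1 has ended,
# regardless of its exit status (afterany).
depend_arg() {
    printf 'depend=afterany:%s' "$1"
}

submit_with_status_job() {
    # qsub prints the ID of the newly submitted job on stdout.
    job_a=$(qsub job_a.sh) || return 1
    # Pass A's ID to B in the environment variable JOB_A_ID (hypothetical name).
    qsub -W "$(depend_arg "$job_a")" -v "JOB_A_ID=$job_a" status_job.sh
}

# Only submit when qsub is actually available on this machine.
if command -v qsub >/dev/null 2>&1; then
    submit_with_status_job
fi
```

Inside status_job.sh, job B would then run something like qstat -f "$JOB_A_ID" and redirect the output to a file of its choosing.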
My idea seems convoluted. Isn't there an easier/better way of doing this?
I don't think this can be done by installing some sort of email monitor, because I read my email on a completely different machine which does not have access to the compute cluster.
echo Success at the end of the job script and check for that line in the .oXXXXXX file. - Dmitri Chubarov
qstat -f already contains the information in resources_used parameters. It is likely that something like qstat -f $PBS_JOBID | grep resources_used should work when executed as the last line of the job script. - Dmitri Chubarov
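The qstat suggestion in the comments above could look roughly like the job script below. It is a minimal sketch, not a tested recipe: it assumes qstat is on the PATH of the compute node and can reach the server, and the output file name status.$PBS_JOBID is a made-up convention. PBS_JOBID and PBS_O_WORKDIR are set by Torque inside a running job. Because the script is still running when qstat is called, the final exit status is most likely not part of the output yet; only the resources_used values so far are captured.

```shell
#!/bin/sh
# Sketch of a Torque job script that saves its own resource usage
# next to the .oXXXX and .eXXXX files.

# Keep only the resources_used.* lines from `qstat -f` output read on stdin.
# `|| true` keeps the function from failing when nothing matches.
filter_resources_used() {
    grep 'resources_used' || true
}

# Write the filtered status of job $1 into file $2.
save_job_status() {
    qstat -f "$1" | filter_resources_used > "$2"
}

# ... the real work of the job goes here ...

# Last step: record the status, but only when running under Torque.
# status.$PBS_JOBID is a hypothetical file name.
if [ -n "${PBS_JOBID:-}" ]; then
    save_job_status "$PBS_JOBID" "${PBS_O_WORKDIR:-.}/status.$PBS_JOBID"
fi
```

Writing into PBS_O_WORKDIR puts the status file in the directory where qsub was invoked, which is where the output files end up by default.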