2
votes

I am writing a Python script that submits multiple jobs with qsub, but first I need to determine the load on the cluster. If many jobs are already queued or the load is high, I need to inform the user and run the job in the local environment instead. I have checked the qstat help output but could not find anything useful:

qstat [options]
        [-ext]                            view additional attributes
        [-explain a|c|A|E]                show reason for c(onfiguration ambiguous), a(larm), suspend A(larm), E(rror) state
        [-f]                              full output
        [-fjc]                            full output grouped according to job class instances
        [-F [resource_attributes]]        full output and show (selected) resources of queue(s)
        [-g {c}]                          display cluster queue summary
        [-g {d}]                          display all job-array tasks (do not group)
        [-g {t}]                          display all parallel job tasks (do not group)
        [-help]                           print this help
        [-j job_identifier_list ]         show scheduler job information
        [-l resource_list]                request the given resources
        [-ne]                             hide empty queues
        [-ncb]                            suppress additional binding specific parameters
        [-pe pe_list]                     select only queues with one of these parallel environments
        [-nenv]                           do not request job environment
        [-njd]                            do not show details about foreign jobs
        [-q wc_queue_list]                print information on given queue
        [-qs {a|c|d|o|s|u|A|C|D|E|S}]     selects queues, which are in the given state(s)
        [-r]                              show requested resources of job(s)
        [-s {p|r|s|z|hu|ho|hs|hd|hj|ha|h|a}] show pending, running, suspended, zombie jobs
2
You could probably parse the output of qstat -a. - fjarri
My qstat does not have an -a option. - user765443
The one we have on our cluster (2.5.12) does. But in any case, even plain qstat shows all queued jobs, so you can estimate the load from the number of queued jobs and their walltimes. - fjarri
Agreed, running qstat gives me all the information. But running qstat and parsing its output will be a very time-consuming process. Is there a command or option that reports the load directly? It would save a lot of time. - user765443
Writing a parser will be somewhat time-consuming, but I doubt it will take long to run. I don't know of any existing command-line tools; we use Ganglia, but it takes effort to install. - fjarri

2 Answers

2
votes

The ideal solution for this is to use a scheduler, such as Moab or Maui (I think Maui can do this), that can assign nodes to jobs intelligently, including not using nodes in the cluster that are already under high load. Schedulers typically offer policies for handling common HPC scenarios such as this one. (In the interest of full disclosure, I am currently an engineer at the company that sells Moab; Maui is free to use.)

If you wish to do this via scripts, pbsnodes -a reports the load average for the nodes in the cluster. It is inside a larger status string in this format:

status = attr=[val][,attr2=[val]...]

The attribute you're looking for is loadave. If you wrap qsub in a script that calls pbsnodes (or uses cached results from pbsnodes) to obtain this value, the script can then either qsub the job or run it in your local environment. To me, it seems easier to use a scheduler.
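A minimal sketch of such a wrapper in Python, assuming pbsnodes -a prints each node name on an unindented line followed by indented "status = ..." lines. The helper names, the threshold of 4.0, and the rule "busy when no node is below the threshold" are illustrative assumptions, not part of TORQUE:

```python
import re
import subprocess


def parse_loadaves(pbsnodes_output):
    """Return {node_name: loadave} parsed from `pbsnodes -a` text output."""
    loads = {}
    node = None
    for line in pbsnodes_output.splitlines():
        if line and not line[0].isspace():
            node = line.strip()  # an unindented line starts a node record
        elif node and line.strip().startswith("status ="):
            status = line.split("=", 1)[1].strip()
            match = re.search(r"(?:^|,)loadave=(\d+(?:\.\d+)?)", status)
            if match:
                loads[node] = float(match.group(1))
    return loads


def cluster_is_busy(loads, threshold=4.0):
    """Assumed policy: busy when no node reports a load at or below threshold."""
    if not loads:
        return True  # no data: be conservative and fall back to local
    return min(loads.values()) > threshold


def submit_or_run_locally(script, threshold=4.0):
    """Submit `script` with qsub, or run it with sh if the cluster looks busy."""
    try:
        result = subprocess.run(["pbsnodes", "-a"], capture_output=True,
                                text=True, check=True)
        loads = parse_loadaves(result.stdout)
    except (OSError, subprocess.CalledProcessError):
        loads = {}
    if cluster_is_busy(loads, threshold):
        print("Cluster load is high; running the job locally.")
        return subprocess.run(["sh", script]).returncode
    return subprocess.run(["qsub", script]).returncode
```

Calling submit_or_run_locally("job.sh") would then cover both cases. Caching the pbsnodes output, as suggested above, would avoid one extra call per submitted job.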

0
votes

That looks like a Sun Grid Engine (or derivative) qstat.

You can ask Grid Engine not to launch the job unless it can run (more or less) immediately with qsub -now y. If you don't want it to run on a machine with a high load, you may be able to request load_avg, load_long, load_medium or load_short via the -l option to qsub, depending on how the cluster is configured.

To list queued jobs: qstat -u '*' -g d -s p

You can optionally add -xml to get the output in XML format.
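A rough Python sketch of counting the queued jobs from that XML output. It assumes each job appears as a job_list element with a state attribute, which is how the Grid Engine versions I have seen format it; check your own qstat output first. The function names are hypothetical:

```python
import subprocess
import xml.etree.ElementTree as ET


def count_pending_jobs(xml_text):
    """Count <job_list state="pending"> elements in qstat -xml output."""
    root = ET.fromstring(xml_text)
    return sum(1 for job in root.iter("job_list")
               if job.get("state") == "pending")


def pending_job_count():
    """Run qstat for all users' pending jobs and count them."""
    result = subprocess.run(
        ["qstat", "-u", "*", "-g", "d", "-s", "p", "-xml"],
        capture_output=True, text=True, check=True,
    )
    return count_pending_jobs(result.stdout)
```

Because of -g d, every task of an array job is counted separately. The script can compare the count against a limit of your choosing before deciding whether to call qsub.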