Maui & Torque Cluster

We have set up a Maui & Torque cluster on the new blade servers at ILP to support the Big_Data project. There are 59 compute nodes and one master node. We plan to incorporate more machines into this pool in the near future.

To access the master node, log in to maui-torque.bigdata. The command "diagnose -n" gives a quick listing of the nodes in the system from Maui's point of view:

 someuser@someuser-d3:~$ ssh maui-torque.bigdata
 someuser@blade063:~$ sudo -u maui diagnose -n
 diagnosing node table (5120 slots)
 Name                    State  Procs     Memory         Disk          Swap      Speed  Opsys   Arch Par   Load Res Classes                        Network                        Features              
 
 blade064                 Idle   2:2    12015:12015       1:1       12015:12015   1.00 DEFAUL [NONE] DEF   0.00 000 [primary_2:2]                  [DEFAULT]                      [NONE]              
 blade065                 Idle   2:2     5967:5967        1:1        5967:5967    1.00 DEFAUL [NONE] DEF   0.00 000 [primary_2:2]                  [DEFAULT]                      [NONE]              
 ...
 blade121                 Idle   4:4     7987:7987        1:1       14608:14608   1.00 DEFAUL [NONE] DEF   0.02 000 [primary_4:4]                  [DEFAULT]                      [NONE]              
 blade122                 Idle   4:4     7987:7987        1:1       14480:14480   1.00 DEFAUL [NONE] DEF   0.08 000 [primary_4:4]                  [DEFAULT]                      [NONE]              
 -----                     --- 218:218 459101:459101     59:59     803744:803744
 
 Total Nodes: 59  (Active: 0  Idle: 59  Down: 0)

If you are not connected to the ILP network, you can reach the master node over ssh by hopping through clsshsvr.pittsburgh.intel-research.net:

 mymachine:~> ssh someuser@clsshsvr.pittsburgh.intel-research.net
 clsshsvr:~> ssh maui-torque.bigdata
 blade063:~> sudo -u maui diagnose -n
 [...]

Because of a limitation in Maui's permission system, you must be user "maui" to run any of the diagnose commands. All users can sudo to "maui".


Maui & Torque Overview

Maui and Torque together form a batch job management system. Torque is based on OpenPBS and acts as a resource manager, while Maui acts as a scheduler. Torque keeps track of what resources are available (machines, CPUs, memory, disk, I/O, etc.) and provides low-level mechanisms for starting, stopping, and managing jobs and job queues. Although Torque comes with a basic scheduler, it is often replaced with something that utilizes the available resources more intelligently. This is Maui's job -- it communicates with Torque to learn what resources are available in the cluster and what jobs are waiting to be executed, then decides when each job will run and on which machines.


Command Overview

Since Maui & Torque are two separate entities, there are two sets of commands that may be used to look at the cluster's status. Below are the Torque commands with sample usage:

 qsub: Submit a job into the job queue
 
 someuser@blade063:~$ qsub -l ncpus=1 /homes/someuser/test-script 
 432.blade063
 qdel: Remove a job from the job queue
 
 someuser@blade063:~$ qdel 432.blade063
 qstat: Inspect the state of the job queue
 
 someuser@blade063:~$ qstat -Q
 Queue              Max   Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T         
 ----------------   ---   ---   ---   ---   ---   ---   ---   ---   ---   --- -         
 primary              0     0   yes   yes     0     0     0     0     0     0 E
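
 Instead of passing everything on the command line, qsub can also read options from "#PBS" directives embedded at the top of the job script. The script below is a sketch: the job name, walltime, and output handling are illustrative assumptions, not site requirements; only the ncpus=1 request matches the example above.

```shell
#!/bin/sh
# Sketch of a Torque job script with embedded directives.
# All values below are examples -- adjust them for your job.
#PBS -N test-job            # job name shown in the queue (assumed name)
#PBS -l ncpus=1             # same resource request as the qsub example above
#PBS -l walltime=00:05:00   # assumed time limit
#PBS -j oe                  # merge stderr into the stdout file

# Torque sets PBS_O_WORKDIR to the directory qsub was invoked from;
# fall back to "." so the script also runs outside the batch system.
cd "${PBS_O_WORKDIR:-.}"
hostname
date
```

 Submitting it is then simply "qsub /homes/someuser/test-job"; command-line options such as "-l ncpus=2" take precedence over the embedded directives.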

It's worth noting that all the Torque commands have man pages, and 'man qsub' is surprisingly helpful. There is really only one primary Maui command that is needed: diagnose. It takes a variety of arguments to specify what functionality is being requested.

 diagnose: The Maui Swiss Army knife
 
 diagnose -j: Diagnose jobs (job queue)
 diagnose -n: Diagnose nodes (node list)
 
 Both of these commands give information about the state of the cluster from Maui's perspective.  You can see all the jobs in the job queue using 'diagnose -j' and you can see which nodes are allocated and which are idle using 'diagnose -n'.
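
 To turn the 'diagnose -n' listing into a quick state summary, its output can be piped through a small awk filter. The helper below is a sketch, not part of the installation; its name is made up for illustration, and it assumes the column layout shown earlier (node name in column 1, state in column 2, node names starting with "blade").

```shell
#!/bin/sh
# count_states: tally node states from 'diagnose -n' output on stdin.
# Assumes node lines start with "blade" and the state is in column 2,
# as in the listing shown above.
count_states() {
    awk '/^blade/ { states[$2]++ }
         END { for (s in states) printf "%s %d\n", s, states[s] }'
}

# Typical use on the master node:
#   sudo -u maui diagnose -n | count_states
```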

That's about it for commands. Below are a few sample jobs just for practice.


Simple Jobs

Create a simple shell script that gathers information about the system it runs on. For example, save the following as test-script:

 #!/bin/sh
 sleep 15
 hostname
 who
 uptime
 date

Now, this job can be submitted to a machine to run in the cluster:

 someuser@blade063:~$ qsub -l ncpus=1 /homes/someuser/test-script 
 496.blade063
 someuser@blade063:~$ sudo -u maui diagnose -j
 Name                  State Par Proc QOS     WCLimit R  Min     User    Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs       Class Features
 
 496                 Running DEF    1 DEF    00:00:00 1    1   someuser   someuser        -    00:00:01   [NONE] [NONE] [NONE]    >=0    >=0    NC0 [primary:1] [NONE]
 
 
 Total Jobs: 1  Active Jobs: 1
 someuser@blade063:~$ ls
 bin  local  man  scratch  share  src	test-script  tmp
 someuser@blade063:~$ sleep 15
 someuser@blade063:~$ ls
 bin  local  man  scratch  share  src	test-script  test-script.e606  test-script.o606  tmp
 someuser@blade063:~$ cat ./test-script.o606
 blade122
  15:53:58 up 42 days, 22:58,  0 users,  load average: 0.10, 0.14, 0.10
 Tue Oct  9 15:53:58 EDT 2007

That is the simplest way to submit jobs. Here is a script (saved as pushhard) that submits many copies of this job and runs them on multiple machines:

 #! /bin/bash
 
 JOBCOUNT=50
 I=0
 while [[ $I -lt $JOBCOUNT ]]; do
 	qsub -l ncpus=1 /homes/someuser/test-script
 	I=$((I+1))
 done

 someuser@blade063:~$ ./pushhard 
 607.blade063
 608.blade063
 ...
 655.blade063
 656.blade063
 someuser@blade063:~$ cat ./test-script.o* | grep "^blade" | sort | uniq -c
     2 blade111
     4 blade112
     4 blade113
     4 blade114
     4 blade115
     4 blade116
     4 blade117
     4 blade118
     4 blade119
     4 blade120
     4 blade121
     8 blade122
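
 After firing off a batch like this, it is handy to block until every job has finished before collecting the output files. The helper below is a sketch (the function name is made up); it assumes the default qstat listing with the job owner in the third column, which is worth double-checking against your Torque version.

```shell
#!/bin/bash
# wait_for_jobs: poll qstat until the given user has no jobs left.
# $1 = username (defaults to the current user).
wait_for_jobs() {
    local user=${1:-$USER}
    while qstat 2>/dev/null |
          awk -v u="$user" '$3 == u { found = 1 } END { exit !found }'; do
        sleep 5
    done
}

# Typical use after the submission loop above:
#   ./pushhard && wait_for_jobs && cat ./test-script.o* | grep "^blade"
```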

Maui & Torque Management Quick Notes

Here are some useful commands to manage Maui and Torque:

  • "pbs_server" - main torque management process that runs on maui-torque
  • "pbs_mom" - torque server process that runs on each of the worker nodes
  • "/usr/local/maui/sbin/maui" - scheduler that runs on maui-torque
  • "qterm" - emergency stop of pbs_server; kills all running jobs
  • "qterm -t quick" - emergency stop of pbs_server, but does not kill running jobs
  • "pbs_server -t hot" - restart pbs_server safely after qterm -t quick (will not restart running jobs)
  • "schedctl -k" - kill maui cleanly; restart by running /usr/local/maui/sbin/maui
  • "pbsnodes -l" - list nodes in "interesting" states
  • "pbsnodes -o nodename" - set node to offline; useful for draining a node; existing jobs continue running
  • "pbsnodes -c nodename" - set node to online state
  • "qmgr -c 'create node nodename nodeopts'" - add a worker node dynamically; nodeopts sets options such as np=x, gpus=x, ...
  • "qmgr -c 'delete node nodename'" - delete a worker node dynamically
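
Putting a few of these together, draining a node for maintenance might look like the sketch below. The function name is invented, and the check for remaining jobs assumes that pbsnodes prints a "jobs = ..." attribute line only while jobs are still assigned to the node -- verify that against your Torque version before relying on it.

```shell
#!/bin/bash
# drain_node: take a node offline and wait until its jobs have finished.
# Existing jobs keep running; the node just stops receiving new ones.
drain_node() {
    local node=$1
    pbsnodes -o "$node"     # mark offline so no new jobs are scheduled
    # Assumption: a "jobs = ..." line appears only while jobs remain.
    while pbsnodes "$node" | grep -q 'jobs = .'; do
        sleep 30
    done
    echo "$node drained; run 'pbsnodes -c $node' when maintenance is done"
}
```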

Dependencies