Maui & Torque Cluster
We have set up a Maui & Torque cluster on the new blade servers at ILP to support the Big_Data project. There are 59 compute nodes and one master node; we plan to incorporate more machines into this pool in the near future.
To access the master node, log in to maui-torque.bigdata. The command "diagnose -n" gives a quick listing of the nodes in the system from Maui's point of view:
  someuser@someuser-d3:~$ ssh maui-torque.bigdata
  someuser@blade063:~$ sudo -u maui diagnose -n
  diagnosing node table (5120 slots)
  Name       State  Procs      Memory          Disk       Swap          Speed  Opsys  Arch   Par  Load  Res  Classes        Network    Features

  blade064   Idle   2:2     12015:12015       1:1    12015:12015       1.00  DEFAUL  [NONE]  DEF  0.00  000  [primary_2:2]  [DEFAULT]  [NONE]
  blade065   Idle   2:2      5967:5967        1:1     5967:5967        1.00  DEFAUL  [NONE]  DEF  0.00  000  [primary_2:2]  [DEFAULT]  [NONE]
  ...
  blade121   Idle   4:4      7987:7987        1:1    14608:14608       1.00  DEFAUL  [NONE]  DEF  0.02  000  [primary_4:4]  [DEFAULT]  [NONE]
  blade122   Idle   4:4      7987:7987        1:1    14480:14480       1.00  DEFAUL  [NONE]  DEF  0.08  000  [primary_4:4]  [DEFAULT]  [NONE]
  -----      ---    218:218  459101:459101   59:59  803744:803744

  Total Nodes: 59  (Active: 0  Idle: 59  Down: 0)
If you are not connected to the ILP network, you can access it via ssh through clsshsvr.pittsburgh.intel-research.net.
  someuser@someuser-d3:~$ ssh someuser@clsshsvr.pittsburgh.intel-research.net
  clsshsvr:~> ssh maui-torque.bigdata
  blade063:~> sudo -u maui diagnose -n
  [...]
Because of a limitation in Maui's permission system, you must be user "maui" to run any of the diagnose commands. All users can sudo to "maui".
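Since every Maui query has to go through sudo, a small wrapper function saves some typing. This is purely a local convenience sketch; the name "mauidiag" is our own and not part of Maui (it deliberately avoids clashing with real Maui command names):

```shell
# Local convenience wrapper (not a Maui command): forward any
# diagnose arguments through sudo as the "maui" user.
mauidiag () {
    sudo -u maui diagnose "$@"
}

# Usage (on maui-torque):
#   mauidiag -n    # node table
#   mauidiag -j    # job queue
```

Dropping this into your shell startup file on the master node means you can stop remembering the sudo incantation.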
Maui & Torque Overview
Maui and Torque together form a batch job management system. Torque, which is based on OpenPBS, acts as the resource manager, while Maui acts as the scheduler. Torque keeps track of what resources are available (machines, CPUs, memory, disk, I/O, etc.) and provides low-level mechanisms for starting, stopping, and managing jobs and job queues. Although Torque comes with a basic scheduler, it is often replaced with something that utilizes the available resources more intelligently. This is Maui's job -- it queries Torque for what resources are available in the cluster and what jobs are waiting to be executed, and then decides when each job will run and on which machines.
Since Maui & Torque are two separate entities, there are two sets of commands that may be used to look at the cluster's status. Below are the Torque commands with sample usage:
qsub: Submit a job into the job queue

  someuser@blade063:~$ qsub -l ncpus=1 /homes/someuser/test-script
  432.blade063
qdel: Remove a job from the job queue

  someuser@blade063:~$ qdel 432.blade063
qstat: Inspect the state of the job queue

  someuser@blade063:~$ qstat -Q
  Queue            Max Tot Ena Str Que Run Hld Wat Trn Ext T
  ---------------- --- --- --- --- --- --- --- --- --- --- -
  primary            0   0 yes yes   0   0   0   0   0   0 E
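As a side note, qsub options can also be embedded in the job script itself as "#PBS" directive lines, which qsub reads as if they were command-line flags. A minimal sketch (the file name and job name here are just examples):

```shell
# Write a job script whose "#PBS" lines replace command-line qsub options.
cat > pbs-demo <<'EOF'
#!/bin/sh
#PBS -l ncpus=1
#PBS -N pbs-demo
hostname
date
EOF
chmod +x pbs-demo

# On the cluster this would be submitted with just:  qsub pbs-demo
```

This keeps a job's resource requests versioned alongside the job itself instead of living only in someone's shell history.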
It's worth noting that all the Torque commands have man pages, and 'man qsub' is surprisingly helpful. On the Maui side, there is really only one command that is needed; it takes a variety of arguments specifying what functionality is being requested.
diagnose: The Maui Swiss Army knife

  diagnose -j: Diagnose jobs (job queue)
  diagnose -n: Diagnose nodes (node list)

Both of these commands give information about the state of the cluster from Maui's perspective. You can see all the jobs in the job queue using 'diagnose -j' and you can see which nodes are allocated and which are idle using 'diagnose -n'.
That's about it for commands. Below are a few sample jobs just for practice.
Create a simple shell script that gathers information about the system it runs on. Here is an example:
  #!/bin/sh
  sleep 15
  hostname
  who
  uptime
  date
Now, this job can be submitted to a machine to run in the cluster:
  someuser@blade063:~$ qsub -l ncpus=1 /homes/someuser/test-script
  496.blade063
  someuser@blade063:~$ sudo -u maui diagnose -j
  Name     State  Par  Proc  QOS  WCLimit   R  Min  User      Group     Account  QueuedTime  Network  Opsys   Arch    Mem  Disk  Procs  Class        Features

  496    Running  DEF     1  DEF  00:00:00  1    1  someuser  someuser  -          00:00:01  [NONE]   [NONE]  [NONE]  >=0   >=0  NC0   [primary:1]  [NONE]

  Total Jobs: 1  Active Jobs: 1
  someuser@blade063:~$ ls
  bin  local  man  scratch  share  src  test-script  tmp
  someuser@blade063:~$ sleep 15
  someuser@blade063:~$ ls
  bin  local  man  scratch  share  src  test-script  test-script.e606  test-script.o606  tmp
  someuser@blade063:~$ cat ./test-script.o606
  blade122
   15:53:58 up 42 days, 22:58,  0 users,  load average: 0.10, 0.14, 0.10
  Tue Oct 9 15:53:58 EDT 2007
That is the simplest way to submit jobs. Here is a script that submits many copies of this job (and runs them on multiple machines).
  #!/bin/bash
  JOBCOUNT=50
  I=0
  while [[ $I -lt $JOBCOUNT ]]; do
      qsub -l ncpus=1 /homes/someuser/test-script
      I=$((I+1))
  done
  someuser@blade063:~$ ./pushhard
  607.blade063
  608.blade063
  ...
  655.blade063
  656.blade063
  someuser@blade063:~$ cat ./test-script.o* | grep "^blade" | sort | uniq -c
        2 blade111
        4 blade112
        4 blade113
        4 blade114
        4 blade115
        4 blade116
        4 blade117
        4 blade118
        4 blade119
        4 blade120
        4 blade121
        8 blade122
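One refinement worth making to a loop like pushhard: qsub prints each new job ID on stdout, so the loop can save the IDs to a file for later bulk cleanup with qdel. A sketch (the script and file names here are arbitrary):

```shell
# Generate a variant of pushhard that records the job IDs qsub returns,
# so the whole batch can later be cancelled with:  qdel $(cat submitted-jobs)
cat > pushhard2 <<'EOF'
#!/bin/bash
JOBCOUNT=50
: > submitted-jobs
for ((I = 0; I < JOBCOUNT; I++)); do
    qsub -l ncpus=1 /homes/someuser/test-script >> submitted-jobs
done
EOF
chmod +x pushhard2
```

Without this, cancelling a half-finished batch means fishing the IDs back out of qstat by hand.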
Maui & Torque Management Quick Notes
Here are some useful commands to manage Maui and Torque:
- "pbs_server" - main Torque management process that runs on maui-torque
- "pbs_mom" - Torque process that runs on each of the worker nodes
- "/usr/local/maui/sbin/maui" - the Maui scheduler, which runs on maui-torque
- "qterm" - emergency stop of pbs_server; kills all running jobs
- "qterm -t quick" - emergency stop of pbs_server, but does not kill running jobs
- "pbs_server -t hot" - restart pbs_server safely after qterm -t quick (will not restart running jobs)
- "schedctl -k" - kill maui cleanly; restart by running /usr/local/maui/sbin/maui
- "pbsnodes -l" - list nodes in "interesting" states
- "pbsnodes -o nodename" - set node to offline; useful for draining a node; existing jobs continue running
- "pbsnodes -c nodename" - set node to online state
- "qmgr -c "create node nodename nodeopts" " - add a worker node dynamically; nodeopts sets options such as np=x, gpus=x, ...
- "qmgr -c "delete node nodename" " - delete a worker node dynamically
Some useful external references: