On Scientific Linux, the DRMAA packages can be installed from the YUM repository:
yum install pbs-drmaa     # Torque
yum install pbspro-drmaa  # PBS Professional
Alternatively, compile DRMAA from the source package available on SourceForge.
After installation you need to either:
- configure the DRMAA library to use Torque logs (RECOMMENDED). An example configuration of the DRMAA library (it must be placed in the /opt/qcg/dependencies/etc/pbs_drmaa.conf file):
# pbs_drmaa.conf - Sample pbs_drmaa configuration file.
wait_thread: 0,
pbs_home: "/var/spool/pbs",
cache_job_state: 600,
Note: Remember to mount the server log directory as described in the earlier note.
or
- configure Torque to keep information about completed jobs (e.g. by setting: qmgr -c 'set server keep_completed = 300'). When running in this configuration, try to provide more resources (e.g. two cores instead of one) for the VM that hosts the service. In addition, tune the DRMAA configuration to throttle the polling rate:
wait_thread: 0,
cache_job_state: 60,
pool_delay: 60,
The default queue can be set by defining the default job category (in the /etc/qcg/pbs_drmaa.conf file):
job_categories: {
  default: "-q plgrid",
},
Maui/Moab configuration
Add appropriate rights for the qcg-comp and qcg-broker users in the Maui scheduler configuration file:
vim /var/spool/maui/maui.cfg
# primary admin must be first in list
ADMIN1 root
ADMIN2 qcg-broker
ADMIN3 qcg-comp
The qcg-comp user needs ADMIN3 privileges in order to get detailed information about all users' jobs (e.g. estimated start time). The qcg-broker user needs ADMIN2 privileges in order to create advance reservations (optional).
Search PATH
The service assumes that the following commands are in the standard search path:
- pbsnodes
- showres
- setres
- releaseres
- checknode
If any of the above commands is not installed in a standard location (e.g. /usr/bin), you may need to edit the /etc/sysconfig/qcg-compd file and set the PATH variable accordingly, e.g.:
# INIT_WAIT=5
#
# DRM specific options
PATH=$PATH:/opt/maui/bin
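Before starting the service, it can help to confirm that all five commands resolve on the search path you intend to use. The following is a minimal sketch of such a check. It is not part of QCG-Computing, and the helper name `missing_commands` is hypothetical. It relies only on Python's standard `shutil.which`.

```python
import os
import shutil

# Commands the service expects to find, as listed above.
REQUIRED_COMMANDS = ["pbsnodes", "showres", "setres", "releaseres", "checknode"]


def missing_commands(search_path=None):
    """Return the required commands that cannot be found on search_path.

    search_path is an os.pathsep-separated string of directories;
    None means the PATH of the current process.
    """
    return [cmd for cmd in REQUIRED_COMMANDS
            if shutil.which(cmd, path=search_path) is None]


if __name__ == "__main__":
    # /opt/maui/bin is only the example location used above.
    path = os.environ.get("PATH", "") + os.pathsep + "/opt/maui/bin"
    missing = missing_commands(path)
    print("missing:", ", ".join(missing) if missing else "none")
```

If the script reports missing commands, add the directory that contains them to PATH in /etc/sysconfig/qcg-compd as shown above.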
Logging Level
If you compiled DRMAA with logging enabled (--enable-debug), you can set its logging level in /etc/sysconfig/qcg-compd:
# INIT_WAIT=5
#
# DRM specific options
DRMAA_LOG_LEVEL=INFO
Restricted node access
Read this section only if the system is configured in such a way that most nodes are not accessible through every queue or to every user. In that case you should provide a node filter expression in the sysconfig file (/etc/sysconfig/qcg-compd). Examples:
- Provide information only about nodes that were tagged with the qcg property:
QCG_NODE_FILTER=properties:qcg
- Provide information about all nodes except those tagged as gpgpu:
QCG_NODE_FILTER=properties:~gpgpu
- Provide information only about resources that have hp as the epoch value:
QCG_NODE_FILTER=resources_available.epoch:hp
In general, the QCG_NODE_FILTER must adhere to the following syntax:
pbsnodes-attr:regular-expression
or, if you want to reverse the semantics (i.e. select all nodes except those matching the expression):
pbsnodes-attr:~regular-expression
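To make the filter syntax concrete, here is a minimal sketch of how an expression of this form could be parsed and applied to a node's attributes. This is an illustration only, not the service's actual implementation. The function names are hypothetical. The sketch assumes that the regular expression is searched anywhere in the attribute value, and that a missing attribute is treated as an empty string.

```python
import re


def parse_node_filter(expr):
    """Split 'attr:regex' or 'attr:~regex' into (attr, compiled regex, negate)."""
    attr, sep, pattern = expr.partition(":")
    negate = pattern.startswith("~")
    if negate:
        pattern = pattern[1:]
    if not sep or not attr or not pattern:
        raise ValueError("expected pbsnodes-attr:regular-expression")
    return attr, re.compile(pattern), negate


def node_matches(node_attrs, expr):
    """Return True if a node (dict of pbsnodes attributes) passes the filter.

    Assumption: the expression may match anywhere in the attribute value,
    and a missing attribute counts as an empty string.
    """
    attr, regex, negate = parse_node_filter(expr)
    found = regex.search(node_attrs.get(attr, "")) is not None
    # With '~' the result is inverted: keep nodes that do NOT match.
    return found != negate


# Hypothetical node data, for illustration only.
nodes = {
    "node01": {"properties": "qcg,infiniband"},
    "node02": {"properties": "gpgpu"},
}
selected = [name for name, attrs in nodes.items()
            if node_matches(attrs, "properties:~gpgpu")]
print(selected)  # ['node01']
```

The expression is split at the first colon only, so the regular expression itself may contain colons.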
Configuring PBS DRMAA submit filter
In order to enforce the PL-Grid grant policy, you must configure the PBS DRMAA submit filter by editing the /etc/sysconfig/qcg-compd file and adding a variable pointing to the DRMAA submit filter, e.g.:
PBSDRMAA_SUBMIT_FILTER="/opt/exp_soft/plgrid/qcg-app-scripts/app-scripts/tools/plgrid-grants/pbsdrmaa_submit_filter.py"
An example submit filter can be found in the QosCosGrid SVN repository:
svn co https://apps.man.poznan.pl/svn/qcg-computing/trunk/app-scripts/tools/plgrid-grants
More about PBS DRMAA submit filters can be found here.
Restricting advance reservation
By default, the QCG-Computing service can reserve any number of hosts. This can be limited by configuring the Maui/Moab scheduler and the QCG-Computing service accordingly:
- In Maui/Moab, use the partition mechanism to mark a subset of nodes as reservable by QCG-Computing:
# all users can use both the DEFAULT and RENABLED partition
SYSCFG PLIST=DEFAULT,RENABLED
# in Moab you should use 0 instead of DEFAULT
#SYSCFG PLIST=0,RENABLED
# mark some set of the machines (e.g. 64 nodes) as reservable
NODECFG[node01] PARTITION=RENABLED
NODECFG[node02] PARTITION=RENABLED
NODECFG[node03] PARTITION=RENABLED
...
NODECFG[node64] PARTITION=RENABLED
- Tell QCG-Computing to limit reservations to the aforementioned partition by editing the /etc/sysconfig/qcg-compd configuration file:
QCG_AR_MAUI_PARTITION="RENABLED"
- Moreover, QCG-Computing (since version 2.4) can enforce limits on the maximum reservation duration (default: one week) and size (measured in the number of reserved slots):
...
<ReservationsPolicy>
  <MaxDuration>24</MaxDuration> <!-- 24 hours -->
  <MaxSlots>100</MaxSlots>
</ReservationsPolicy>
...
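To illustrate what the two limits mean in practice, the sketch below reads a policy fragment like the one above and checks a reservation request against it. It is a hypothetical illustration, not the service's own logic. It assumes the element carries no XML namespace, that MaxDuration is given in hours (as the comment in the example suggests), and that both limits are inclusive.

```python
import xml.etree.ElementTree as ET

# The policy fragment from the example above, without the surrounding "...".
POLICY_XML = """
<ReservationsPolicy>
  <MaxDuration>24</MaxDuration> <!-- 24 hours -->
  <MaxSlots>100</MaxSlots>
</ReservationsPolicy>
"""


def load_policy(xml_text):
    """Parse the fragment into a dict with integer limits."""
    root = ET.fromstring(xml_text)
    return {
        "max_hours": int(root.findtext("MaxDuration")),
        "max_slots": int(root.findtext("MaxSlots")),
    }


def reservation_allowed(policy, hours, slots):
    """Return True if the request fits within both limits (assumed inclusive)."""
    return hours <= policy["max_hours"] and slots <= policy["max_slots"]


policy = load_policy(POLICY_XML)
print(reservation_allowed(policy, hours=12, slots=64))   # True
print(reservation_allowed(policy, hours=48, slots=64))   # False
```

With the example policy, a 12-hour reservation of 64 slots is within the limits, while a 48-hour reservation exceeds MaxDuration.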
PBS JSDL Filter modules
There is no standard way of expressing the requested number of cores (on any number of nodes) across Torque/PBS Pro installations. It depends on the version, the scheduler used, and even the scheduler configuration. It might be:
- -l procs=N
- -l nodes=N
- -l select=N
If on your system the requested number of cores is specified in a way other than -l procs, you have to modify the configuration of the pbs_jsdl_filter module in the qcg-compd.xml file:
<sm:Module xsi:type="pbs_jsdl_filter">
  <smc:TCCExpression>-l nodes=%d </smc:TCCExpression>
</sm:Module>
You will learn more about the qcg-compd.xml file later in the manual. For now, just remember that you may need to come back here.
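The TCCExpression value in the module configuration above reads like a printf-style template. The sketch below shows how such a template would expand for a given core count. It assumes that %d is replaced by the requested number of cores, which the example suggests but this section does not state explicitly. The function name is hypothetical.

```python
def render_core_request(template, cores):
    """Expand a printf-style template such as '-l nodes=%d ' for N cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return template % cores


# The three variants listed above, each expanded for 8 cores.
for template in ("-l procs=%d ", "-l nodes=%d ", "-l select=%d "):
    print(repr(render_core_request(template, 8)))
```

Comparing the expanded string with a resource request that is known to work on your system is a quick way to decide which template to configure.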