[[PageOutline]]

= Introduction =
The QCG-Computing service is an open source service acting as a computing provider and offering on-demand access to computing resources and jobs over the HPC Basic Profile compliant Web Services interface. In addition, QCG-Computing offers a remote interface for Advance Reservations management.

Within !QosCosGrid the QCG-Notification service is widely used for brokering various types of notification messages related to the state of a job (e.g. a predefined status of a job or a snippet from the job's output file).

This document describes the installation of both QCG services: QCG-Computing and QCG-Notification. These services should be deployed on the same machine (or virtual machine) that:
 * has at least 1 GB of memory (recommended value: 2 GB)
 * has 10 GB of free disk space (most of the space will be used for the log files)
 * has any modern CPU (if you plan to use a virtual machine, you should dedicate one or two cores of the host machine to it)
 * is running under:
   * CentOS 7 (in most cases the provided RPMs should work with any operating system based on Red Hat Enterprise Linux 7)

= Prerequisites =
We assume that you have a local resource manager/scheduler already installed. The QCG services are typically installed on a submit machine for the scheduling system.

Since version 2.4 the QCG-Computing services discover installed applications using the [http://modules.sourceforge.net/ Environment Modules] package. For this reason you should install modules on the QCG-Computing host, mount the directories that contain all module files used at your cluster, and make sure that the `qcg-comp` user can see all these modules.

The QCG services do not require you to install any QCG component on the worker nodes. However, the provided application wrapper scripts that are typically used by QCG need the following software to be available on worker nodes:
 * bash,
 * rsync,
 * zip/unzip,
 * dos2unix,
 * python.
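The list above can be checked on a worker node with a small script. This is only a sketch: `missing_tools` is a hypothetical helper name, and the check assumes that each tool is installed under the command name given in the list.

```shell
# Print the names of commands that cannot be found on PATH, one per line.
# Returns non-zero if at least one command is missing.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Tool names taken from the prerequisites list.
if missing_tools bash rsync zip unzip dos2unix python; then
    echo "all required tools found"
else
    echo "some required tools are missing (listed above)" >&2
fi
```

Running it on one representative worker node is usually enough when all nodes share the same system image.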
These packages are usually available out of the box on most HPC systems.

Both services, QCG-Notification and QCG-Computing, require access to a database over ODBC. In most cases, the database is located on the same system as the services. Currently the PostgreSQL database and UnixODBC are supported. To install them on CentOS Linux invoke:
{{{
yum install postgresql postgresql-server
yum install unixODBC postgresql-odbc
}}}

== Shared file system ==
Deployment of QCG-Computing usually requires two shared file systems in the cluster:
 * "Users' directories" - shared between the QCG host and all worker nodes. Used for storing jobs' sandbox directories. It can be either the HOME or a scratch file system. You can read more about this [[ScratchOrNotToScratch| here]].
 * "Applications scripts" - shared between all worker nodes. Used for storing [[ApplicationScripts|Applications Scripts]]

= Firewall configuration =
In order to expose the !QosCosGrid services externally you need to open the following incoming ports in the firewall:
 * 19000 (TCP) - QCG-Computing
 * 19001 (TCP) - QCG-Notification
 * 2811 (TCP) - GridFTP server
 * 20000-25000 (TCP) - GridFTP port range (if you want to use a different port range, adjust the `GLOBUS_TCP_PORT_RANGE` variable in the `/etc/xinetd.d/gsiftp` file)

You may also want to allow SSH access from white-listed machines (for administration purposes only).

The following outgoing traffic should be allowed in general:
 * NTP, DNS, HTTP, HTTPS services
 * gridftp (TCP ports: 2811 and port range: 20000-25000)

= QCG-Notification =
== Installation ==
QCG-Notification may be installed using Yum Package Manager from RPMs.
The procedure is as follows:
 * First, install the QCG repository:
{{{
rpm -Uvh http://www.qoscosgrid.org/qcg-packages/centos7/x86_64/qcg-repo-unstable-1.0.0-1.centos7.noarch.rpm
}}}
 * Install QCG-Notification using the YUM Package Manager:
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install qcg-ntf qcg-ntf-logrotate
}}}
}}}

== Configuration ==
The first step is to configure the QCG-Notification database using the provided script:
{{{
#!div style="font-size: 90%"
{{{#!sh
/usr/share/qcg-ntf/tools/qcg-ntf-install.sh
Welcome to qcg-ntf installation script!

This script will guide you through process of configuring proper environment
for running the QCG-Notification service. You have to answer few questions regarding
parameters of your database. If you are not sure just press Enter and use the
default values.

Use local PostgreSQL server? (y/n) [y]: y
Database [qcg-ntf]:
User [qcg-ntf]:
Password [qcg-ntf]: MojeTajneHaslo
Create database? (y/n) [y]: y
Create user? (y/n) [y]: y

Checking for system user qcg_ntf...OK
Checking whether PostgreSQL server is installed...OK
Checking whether PostgreSQL server is running...OK

Performing installation
* Creating user qcg-ntf...OK
* Creating database qcg-ntf...OK
* Creating database schema...OK
* Checking for ODBC data source qcg-ntf...
* Installing ODBC data source...OK

The newly established database settings must be reflected in the Database section of the
QCG-Notification configuration file (by default /etc/qcg/qcg-ntf/qcg-ntfd.xml)

Remember to add appropriate entry to /var/lib/pgsql/data/pg_hba.conf (as the first rule!) to allow user qcg-ntf to
access database qcg-ntf. For instance:

host qcg-ntf qcg-ntf 127.0.0.1/32 md5

and reload Postgres server.
}}}
}}}

Add a new rule to `pg_hba.conf` as requested and reload Postgres:
{{{
#!div style="font-size: 90%"
{{{#!sh
vim /var/lib/pgsql/data/pg_hba.conf
systemctl reload postgresql
}}}
}}}

Now minor updates should also be applied to the QCG-Notification main configuration file located in `/etc/qcg/qcg-ntf/qcg-ntfd.xml`. You will probably need to change the ''Host'' parameter (in most cases it must be an external address; do not use the 0.0.0.0 wildcard address) as well as the ''Password'' parameter for the database connection. A part of the configuration file with marked key parameters is presented below:
{{{
#!div style="font-size: 90%"
{{{#!xml
.... host.example.com 19001 true .... true qcg-ntf qcg-ntf qcg-ntf false ....
}}}
}}}

== Running the service ==
The QCG-Notification startup script is available in the standard systemd paths:
{{{
#!div style="font-size: 90%"
{{{#!sh
systemctl start qcg-ntfd
}}}
}}}

The service logs can be found in:
{{{
#!div style="font-size: 90%"
{{{#!sh
/var/log/qcg/qcg-ntf/qcg-ntfd.log
}}}
}}}

The service can then be stopped with the following command:
{{{
#!div style="font-size: 90%"
{{{#!sh
systemctl stop qcg-ntfd
}}}
}}}

'''Note:''' `qcg-ntfd` will be started with the `qcg_ntf` user permissions.

= QCG-Computing =
== Preparation of the environment ==
==== CA and host certificates ====
First, install all the needed trusted CA certificates ([http://apps.man.poznan.pl/trac/qcg/wiki/CA%20certificates instruction]). Moreover, we assume that the X.509 host certificate (signed by your local [http://www.eugridpma.org/members/worldmap/ Certificate Authority]) and key are already installed in the following locations:
 * `/etc/grid-security/qcg-compcert.pem`
 * `/etc/grid-security/qcg-compkey.pem`

If QCG-Computing is run from an unprivileged account, these files must be owned by that account. Because the `qcg-comp` account is created during the installation, we suggest using this account as the owner of the certificate and key files.
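The ownership change can be scripted once the `qcg-comp` account exists (that is, after the packages are installed). The sketch below is not part of the QCG tooling: `set_cert_owner` is a hypothetical helper, and the file modes (644 for the certificate, 400 for the private key) are an assumption based on common practice for X.509 host credentials rather than a documented QCG requirement.

```shell
# Hand a certificate/key pair to one account and restrict the key to its owner.
# Usage: set_cert_owner OWNER CERT_FILE KEY_FILE
set_cert_owner() {
    local owner="$1" cert="$2" key="$3"
    chown "$owner" "$cert" "$key" || return 1
    chmod 644 "$cert" || return 1   # assumed mode: the certificate is public
    chmod 400 "$key"                # assumed mode: key readable by the owner only
}

# On the QCG host, as root, with the paths listed above:
# set_cert_owner qcg-comp /etc/grid-security/qcg-compcert.pem /etc/grid-security/qcg-compkey.pem
```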
==== Other ====
Most of the grid services and security infrastructures are sensitive to time skews. Thus we recommend installing a Network Time Protocol daemon or using any other solution that provides accurate clock synchronization. Also disable automatic package updates, as they may disrupt a running system.

== Installation ==
If it is not yet installed, install the QCG repository:
{{{
rpm -Uvh http://www.qoscosgrid.org/qcg-packages/centos7/x86_64/qcg-repo-unstable-1.0.0-1.centos7.noarch.rpm
}}}
Install the qcg-comp packages:
{{{
yum install qcg-comp qcg-comp-client qcg-comp-logrotate
}}}

== Database initialization ==
Set up the QCG-Computing database using the provided script:
{{{
#!div style="font-size: 90%"
{{{#!sh
/usr/share/qcg-comp/tools/qcg-comp-install.sh
Welcome to qcg-comp installation script!

This script will guide you through process of configuring proper environment
for running the QCG-Computing service. You have to answer few questions regarding
parameters of your database. If you are not sure just press Enter and use the
default values.

Use local PostgreSQL server? (y/n) [y]: y
Database [qcg-comp]:
User [qcg-comp]:
Password [RAND-PASSWD]: MojeTajneHaslo
Create database? (y/n) [y]: y
Create user? (y/n) [y]: y

Checking for system user qcg-comp...OK
Checking whether PostgreSQL server is installed...OK
Checking whether PostgreSQL server is running...OK

Performing installation
* Creating user qcg-comp...OK
* Creating database qcg-comp...OK
* Creating database schema...OK
* Checking for ODBC data source qcg-comp...
* Installing ODBC data source...OK

Remember to add appropriate entry to /var/lib/pgsql/data/pg_hba.conf (as the first rule!) to allow user qcg-comp to
access database qcg-comp. For instance:

host qcg-comp qcg-comp 127.0.0.1/32 md5

and reload Postgres server.
}}}
}}}

Add a new rule to `pg_hba.conf` as requested and reload Postgres:
{{{
#!div style="font-size: 90%"
{{{#!sh
vim /var/lib/pgsql/data/pg_hba.conf
systemctl reload postgresql
}}}
}}}

== Authorization modules ==
For testing purposes, or if your user community is small enough to maintain manually, you can use a plain grid mapfile, which provides a static mapping between a user's certificate Distinguished Name (DN) and a local account:
{{{
#!div style="font-size: 90%"
{{{#!default
# for test purposes only: add a mapping for your account
echo '"MyCertDN" myaccount' >> /etc/grid-security/grid-mapfile
}}}
}}}

For the single-account submit configuration, all DNs should be mapped onto the same `qcg-comp` account.

Additionally, a special entry for QCG-Broker must be added to the grid mapfile:
{{{
#!div style="font-size: 90%"
{{{#!default
echo '"/C=PL/O=GRID/O=PSNC/CN=qcg-broker/broker.compat.qcg.psnc.pl" qcg-comp' >> /etc/grid-security/grid-mapfile
}}}
}}}
This is the DN that the QCG-Broker service uses to periodically obtain a report about currently available resources and accounts.

== Getting the DRMAA library ==
The QCG-Computing service uses a DRMAA-compliant interface for batch job submission. Thus you need to install a library appropriate for your system. The latest version of the SLURM DRMAA library can be downloaded from the [[https://git.man.poznan.pl/stash/scm/qcg/slurm-drmaa.git| Git repository]].
==== Prerequisites ====
The following packages should be installed to build the SLURM DRMAA library:
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install autoconf automake libtool m4 bison gperf ragel hiredis-devel git
}}}
}}}

==== Build & install ====
{{{
#!div style="font-size: 90%"
{{{#!sh
git clone https://git.man.poznan.pl/stash/scm/qcg/slurm-drmaa.git
cd slurm-drmaa
./configure --prefix=/opt/qcg/dependencies --sysconfdir=/opt/qcg/dependencies/etc CFLAGS=-fstack-protector-all
make clean all
sudo make install
}}}
}}}

==== Configuration ====
An example configuration file is created in the destination directory. Copy it to the name the library reads:
{{{
#!div style="font-size: 90%"
{{{#!sh
sudo cp /opt/qcg/dependencies/etc/slurm_drmaa.conf.example /opt/qcg/dependencies/etc/slurm_drmaa.conf
}}}
}}}
The default settings should be appropriate for most installations.

==== Slurm notifications ====
The Slurm DRMAA library tracks the status of jobs submitted to the scheduling system by polling Slurm for the current status of each job. To minimize the number of queries, the `qcg-comp-slurm-redis-notifier` package has been developed. It contains a script that follows the Slurm controller logs and pushes notifications about jobs that changed their state to the local Redis database. The Slurm DRMAA library registers for the Redis notifications and waits until they arrive.

To use this mechanism, the following packages must be installed:
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install redis python-redis
systemctl enable redis
systemctl start redis
yum install qcg-comp-slurm-redis-notifier
}}}
}}}

The path to the Slurm controller logs should be configured in the `/etc/qcg/qcg-comp/qcg-slurm-redis-notifier.json` file.

Now the notifier service is ready to start:
{{{
#!div style="font-size: 90%"
{{{#!sh
systemctl start qcg-slurm-redis-notifier
}}}
}}}

The log file of the notifier service is stored in `/var/log/qcg/qcg-comp/qcg-slurm-redis-notifier.log`.
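Because the notifier and the DRMAA library both depend on the local Redis server, it can help to confirm that Redis answers before looking for problems elsewhere. The following is a minimal sketch, assuming the `redis-cli` client from the `redis` package is on the `PATH` and Redis listens on its default local address; `redis_alive` is a hypothetical helper name.

```shell
# Succeed when the local Redis server answers PING with PONG.
# Assumes redis-cli is on PATH and Redis uses its default local address.
redis_alive() {
    [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]
}

if redis_alive; then
    echo "Redis is reachable"
else
    echo "Redis did not answer; check 'systemctl status redis'" >&2
fi
```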
== Service configuration ==
Edit the preinstalled service configuration file (`/etc/qcg/qcg-comp/qcg-comp`**d**`.xml`):
{{{
#!div style="font-size: 90%"
{{{#!xml
/usr/lib64/qcg-core/modules/ /usr/lib64/qcg-comp/modules/ /var/log/qcg/qcg-comp/ /var/log/qcg/qcg-comp/qcg-compd.log INFO frontend.example.com 19000 false /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem /etc/grid-security/grid-mapfile http://frontend.example.com:19001/ /etc/qcg/qcg-comp/application_mapfile qcg-comp qcg-comp qcg-comp qcg-comp hpc.example.com QCG enabled cluster
}}}
}}}

In most cases it should be enough to change only the following elements:
 `Transport/Module/Host` :: the hostname of the machine where the service is deployed. You can put `0.0.0.0` here if you want to listen on all interfaces.
 `Transport/Module/Authentication/Module/X509CertFile` and `Transport/Module/Authentication/Module/X509KeyFile` :: paths to the certificate and key files (for a single submit account these files must be owned by the `qcg-comp` user).
 `Module[type="smc:notification_wsn"]/Module/ServiceURL` :: the localhost URL of the QCG-Notification service (in most cases this is the same address as the QCG-Computing service)
 `Module[type="submission_drmaa"]/@path` :: path to the DRMAA library (the `libdrmaa.so`).
 `Module[type="general_python"]/usr/lib64/qcg-comp/modules/python/monitoring.py` :: path to the monitoring module which gathers general information about currently available modules
 `Module[type="general_python"]/usr/lib64/qcg-comp/modules/python/modules_info.py` :: path to the plugin which gathers information about currently available environment modules; if enabled, the `environment-modules` package '''must''' be installed and configured
 `Module[type="general_python"]/usr/lib64/qcg-comp/modules/python/plgrid_info.py` :: path to the module which gathers extended information about the system, such as available and default grants and users' scratch directories; developed for the PL-Grid infrastructure (integrated with qcg-gridmapfilegenerator) but can also be used in other infrastructures, for example to report users' scratch directories. To report users' scratch directories, this module should be uncommented. For non-PL-Grid sites, the script `/usr/share/qcg-comp/tools/setup-plgrid-plugin.sh` should be executed to create the necessary files.
 `Module[type="general_python"]/usr/lib64/qcg-comp/modules/python/node_types.py` :: path to the module reporting available node types (developed for the Compat infrastructure). Every QCG-Computing service communicating with a Compat QCG-Broker instance should have this module enabled. The configuration file (`/etc/qcg/qcg-comp/nodes.conf`) contains the mapping between node type names and Slurm node features. In Compat, a node type is a class of nodes that have a similar configuration (and similar performance).
 `Module[type="reservation_python"]/@path` :: path to the reservation module. Change this if you are using a scheduler other than Maui (e.g. use `reservation_moab.py` for Moab, `reservation_pbs.py` for PBS Pro)
 `Database/Password` :: the `qcg-comp` database password set in the earlier step by the `qcg-comp-install.sh` script
 `SetuidEnabled` :: set this to `false` if the QCG-Computing service should run on an unprivileged account and all jobs should be submitted to the scheduling system from this single account; if set to `false`, the service startup script (`/usr/lib/systemd/system/qcg-compd.service`) should be modified to launch the service from an account other than `root`.
 `UseScratch` :: this element should be set to `true` if jobs should start in a directory other than the user's home directory. When `SetuidEnabled` is set to `false`, this also means that all jobs will be started from a subdirectory of the `qcg-comp` user's home directory (`/var/log/qcg/qcg-comp` by default). The `QCG_SCRATCH_DIR_ROOT` environment variable should be set in the `/etc/sysconfig/qcg-compd` file and point to the root directory of users' scratch directories. For example, if `QCG_SCRATCH_DIR_ROOT=/var/scratch` and `SetuidEnabled` is set to `false` (with the default `qcg-comp` as the unprivileged QCG-Computing account), all jobs will be started in the `/var/scratch/qcg-comp` directory.
 `FactoryAttributes/CommonName` :: a common name of the cluster (e.g. reef.man.poznan.pl). You can use any name that is unique among all systems (e.g. cluster name + domain name of your institution)
 `FactoryAttributes/LongDescription` :: a human-readable description of the cluster

==== Creating applications' script space ====
A common case for the QCG-Computing service is that an application is accessed using an abstract application name rather than an absolute executable path.
The mappings from application name/version to executable path are stored in the file `/etc/qcg/qcg-comp/application_mapfile`:
{{{
cat /etc/qcg/qcg-comp/application_mapfile
# ApplicationName ApplicationVersion Executable

bash * /opt/exp_soft/qcg/qcg-app-scripts/apps/bash.app
}}}

It is also common to provide wrapper scripts here rather than target executables. A wrapper script can handle such aspects of the application lifetime as environment initialization, copying files from/to scratch storage, and application monitoring. It is recommended to create a separate directory for those wrapper scripts (e.g. on the application partition). This directory must be readable by all users and from every worker node (the application partition usually fulfils those requirements). You **must** provide at least a mapping for the 'bash' application.

To install the basic set of application scripts (including the 'bash' application):
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install qcg-appscripts
}}}
}}}

Edit the configuration file `/etc/qcg/qcg-comp/app-scripts/config` and set `cluster_shared_path` to the path of the created directory accessible from all worker nodes.

To deploy the scripts to the shared path, execute:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-appscripts-deploy
}}}
}}}

The last step is to edit the `/etc/qcg/qcg-comp/application_mapfile` file and set the proper path to the deployed `*.app` files.

Please read more on [ApplicationScripts Application Scripts].
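Since a mapping for 'bash' is mandatory, a quick lookup in the mapfile can catch a missing or mistyped entry before the service is started. This is only a sketch: `lookup_app` is a hypothetical helper, and it assumes the whitespace-separated three-column layout shown in the example mapfile (name, version, executable) with `#` starting a comment line.

```shell
# Print the executable mapped to an application name (first match, any version).
# Returns non-zero if the name has no entry in the mapfile.
# Usage: lookup_app MAPFILE APPNAME
lookup_app() {
    awk -v app="$2" '
        $1 !~ /^#/ && $1 == app { print $3; found = 1; exit }
        END { exit !found }
    ' "$1"
}

map=/etc/qcg/qcg-comp/application_mapfile
if [ -r "$map" ]; then
    if exe=$(lookup_app "$map" bash); then
        echo "bash is mapped to $exe"
    else
        echo "no mapping for 'bash' in $map" >&2
    fi
fi
```

The sketch only reads the mapfile. Whether the mapped path is readable has to be checked on a worker node, because the application scripts directory is shared between the worker nodes.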
== Starting the service ==
As root type:
{{{
systemctl start qcg-compd
}}}

The service logs can be found in:
{{{
/var/log/qcg/qcg-comp/qcg-compd.log
}}}

**Note:** In the current version, whenever you restart the PostgreSQL server, you also need to restart the QCG-Computing and QCG-Notification services:
{{{
systemctl restart qcg-compd
systemctl restart qcg-ntfd
}}}

== Stopping the service ==
The service can be stopped using the following command:
{{{
systemctl stop qcg-compd
}}}

== Verifying the installation ==
 * Edit the QCG-Computing client configuration file (`/etc/qcg/qcg-comp/qcg-comp.xml`):
   * set the `Host` and `Port` to reflect the changes in the service configuration file (`qcg-compd.xml`).
{{{
/usr/lib64/qcg-core/modules/ /usr/lib64/qcg-comp/modules/ httpg://frontend.example.com:19000/
}}}
 * Initialize your credentials:
{{{
grid-proxy-init -rfc
Your identity: /C=PL/O=GRID/O=PSNC/CN=Mariusz Mamonski
Enter GRID pass phrase for this identity:
Creating proxy .................................................................. Done
Your proxy is valid until: Wed Apr 6 05:01:02 2012
}}}
 * Query the QCG-Computing service:
{{{
qcg-comp -G | xmllint --format - # xmllint is used only to present the result in a more readable form
true IT cluster IT department cluster for public use 0 1 worker.example.com x86_32 41073741824 http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing http://schemas.ogf.org/hpcp/2007/01/bp/BasicFilter http://schemas.qoscosgrid.org/comp/2011/04 http://example.com/SunGridEngine http://localhost:2211/
}}}
 * Submit a sample job:
{{{
qcg-comp -c -J /usr/share/qcg-comp/doc/examples/jsdl/sleep.xml
Activity Id: ccb6b04a-887b-4027-633f-412375559d73
}}}
 * Query its status:
{{{
qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Executing
qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Executing
qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Finished
exit status = 0
}}}
 * Submit a job which produces some output:
{{{
$ qcg-comp -c -J /usr/share/qcg-comp/doc/examples/jsdl/date.xml
Activity Id: 591effa9-143d-4cae-9dd9-02e40f760448
$ qcg-comp -s -a 591effa9-143d-4cae-9dd9-02e40f760448
status = Queued
$ qcg-comp -s -a 591effa9-143d-4cae-9dd9-02e40f760448
status = Finished (exit status = 0)
$ qcg-comp -o -J /usr/share/qcg-comp/doc/examples/jsdl/date.xml
File /tmp/date.staged.out staged out.
All files staged out.
$ cat /tmp/date.staged.out
Mon Jul 29 02:23:33 HST 2013
}}}
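Instead of repeating the status query by hand, the check can be wrapped in a small polling loop. The sketch below uses only the `qcg-comp -s -a` call shown above and assumes its output contains the text `status = Finished` once the job is done, as in the transcripts; `wait_for_activity`, its default interval, and its poll limit are hypothetical. Other terminal states are not handled, so a job that never reaches `Finished` simply runs into the poll limit.

```shell
# Poll an activity until its status is reported as Finished.
# Usage: wait_for_activity ACTIVITY_ID [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 when finished, 1 when the poll limit is reached, 2 if the query fails.
wait_for_activity() {
    local id="$1" interval="${2:-5}" max="${3:-60}" i=0 out
    while [ "$i" -lt "$max" ]; do
        out=$(qcg-comp -s -a "$id") || return 2
        case $out in
            *"status = Finished"*) echo "$out"; return 0 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Example, with the activity id printed at submission time:
# wait_for_activity ccb6b04a-887b-4027-633f-412375559d73 10
```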