[[PageOutline]]

= The QCG-Accounting Agent =

== Architecture ==
[[Image(QCG-Accounting.png, 800px)]]

== Installation ==
You can install the package using the !QosCosGrid yum repository:
{{{
yum install qcg-accounting qcg-accounting-logrotate
}}}

== Configuration ==
The whole configuration of the QCG-Accounting agent is stored in a single properties file (`/etc/qcg/qcg-acc/config.properties`). List of configuration properties:

=== Common ===
 * qcg.site.name - your GOCDB site name,
 * qcg.batch.server - hostname where the batch server is running,
 * qcg.parser.plugin - the name of the log parser plugin (e.g. `pbs`). Delete this property if the agent has no access to LRMS logs,
 * qcg.publishers.plugins - the comma-separated list of publisher plugins (e.g. `bat,apel`),
 * qcg.debug - if set to true, produce more verbose messages,
 * qcg.state.dir - local state directory (default: `/var/run/qcg/qcg-acc/`),
 * qcg.max.delay - maximum random delay before reporting starts; the random delay was introduced to avoid all sites sending reports at the same time,
 * qcg.subprocess.timeout - the general timeout after which child processes started by QCG-Accounting (e.g. log parsers) will be killed; a value less than 0 disables the timeout (default: -1),
 * qcg.default.vo - the default VO name sent when no FQAN is available, e.g. when the job was submitted using a non-VOMS proxy (default: "vo.plgrid.pl"),
 * qcg.db.pass - password of the QCG-Computing database (see `` section of the `qcg-compd.xml` file).

If your database setup is not standard, you may also need to configure the following properties:
 * qcg.db.host - QCG-Computing database host,
 * qcg.db.port - QCG-Computing database port,
 * qcg.db.name - QCG-Computing database name,
 * qcg.db.user - QCG-Computing database user,
 * qcg.db.max.days - limit processed job records to those that started no more than N days ago (by default the agent looks 90 days back),
 * qcg.db.max.records - the maximum number of jobs processed in a single run (default: 5000).

By default the QCG-Accounting agent tries to guess the local hostname automatically. If you want jobs to be reported with a different submit host, set the following property:
 * qcg.submit.host=host.second.alias

=== Parser plugins ===
==== SLURM plugin - slurm ====
No configuration needed. The plugin assumes that the `sacct` command is usable on the QCG machine.

==== PBS Pro and Torque log parser - pbs ====
 * qcg.pbs.home - the root of the Torque/PBS Pro spool directory (e.g. `/var/torque`),
 * qcg.pbs.max.days - maximum number of days to look back (default: 7 days).

=== Publishers plugins ===
==== BAT publisher (PL-Grid only) - bat ====
First, ask the BAT administrator to provide you with all the credentials (username/password and X.509 certificate) needed to connect to BAT. Copy the received keystore into the file `/etc/qcg/qcg-acc/truststore.ts` (make sure that this file is readable only by root).
 * qcg.bat.user and qcg.bat.pass - the values provided by the BAT administrator,
 * qcg.bat.keystore.pass - keystore password (provided with the key by the BAT administrator),
 * qcg.bat.test - enables test mode, i.e. do not send records to the BAT broker (default: false),
 * qcg.bat.grid.only - set this to `true` if you do not want to report LRMS-specific job information.

==== APEL SSM publisher - apel ====
 * Install APEL SSM2 from the UMD-3 repository:
{{{
yum install apel-ssm
}}}
 * Make sure that `/var/spool/apel/outgoing/` exists:
{{{
mkdir /var/spool/apel/outgoing/
chmod 0700 /var/spool/apel/outgoing/
}}}
 * In the GOCDB add a new endpoint for the QCG host of the '''gLite-APEL''' service type, providing its Host DN (needed to be able to publish records to APEL).
 * In the GOCDB add a new endpoint for the QCG host of the '''APEL''' service type, providing its Host DN (needed to be monitored by Nagios).
 * Update the configuration of APEL SSM2 in the `/etc/apel/sender.cfg` file:
   * Configure APEL SSM2 to publish CPU Accounting data ([[https://wiki.egi.eu/wiki/APEL/SSM2Configuration|instruction]]),
   * Provide appropriate authentication / authorization data,
   * Adjust logging. **IMPORTANT:** If you want to pipe the output of the ssmsend command to the QCG-Accounting log file, ensure that the console output is enabled (`console: true`).

Then configure the plugin itself:
 * qcg.ssm.msg.dir - directory for outgoing usage record messages (default: `/var/spool/apel/outgoing/`),
 * qcg.ssm.benchmark.type - benchmark name: either Si2k or HEPSPEC,
 * qcg.ssm.benchmark.value - benchmark value (if the cluster is composed of machines of various types, provide the weighted mean here),
 * qcg.ssm.site.name - site name as reported to APEL (optional; defaults to the value of `qcg.site.name`),
 * qcg.ssm.max.records - the maximum number of records sent in a single message (default: at most 1000 records per file),
 * qcg.ssm.safe.mode - do not run the APEL publisher if there are old unsent records (default: true),
 * qcg.ssm.send.timeout - the timeout after which the ssmsend program will be killed; if the value is less than 0, the timeout is disabled; otherwise the value overrides the one specified in `qcg.subprocess.timeout` (default: -1).

**Known Issues**
 - QCG-Accounting **must** be installed on a different machine than the glite-APEL and UNICORE SSM publishers, otherwise reports may get overwritten.

==== Grid-SAFE publisher - gridsafe ====
The Grid-SAFE publisher plugin was developed within the [http://www.mapper-project.eu/web/guest MAPPER project] to simplify gathering accounting data from many infrastructures (EGI, PRACE and campus resources).
Steps needed to configure the Grid-SAFE plugin:
 * You can use the host cert-key pair to authenticate to the Grid-SAFE RUPI service, but first you need to convert it into the PKCS12 format. You must also report your host DN to the Grid-SAFE administrator.
{{{
openssl pkcs12 -export -descert -inkey /etc/grid-security/hostkey.pem -in /etc/grid-security/hostcert.pem -out /etc/qcg/qcg-acc/hostcred.p12 -name "HOST Certificate"
chmod 0400 /etc/qcg/qcg-acc/hostcred.p12
}}}
 * You can use the example configuration:
{{{
qcg.gridsafe.url=https://gridsafe-mapper.drg.lrz.de:8443/axis2/services/RUPIService
qcg.gridsafe.keystore=/etc/qcg/qcg-acc/hostcred.p12
qcg.gridsafe.keystore.pass=gridsafepass
qcg.gridsafe.truststore=/etc/qcg/qcg-acc/gridsafe-truststore.jks
qcg.gridsafe.truststore.pass=storepass
qcg.gridsafe.truststore.type=jks
#send usage report only about the following users
qcg.gridsafe.filter.userdn.file=http://gridsafe-mapper.drg.lrz.de/gridsafe/mapper.users
}}}
 * or configure it manually:
   * qcg.gridsafe.url - URL of the Grid-SAFE RUPI !WebService (e.g. https://gridsafe-mapper.drg.lrz.de:8443/axis2/services/RUPIService),
   * qcg.gridsafe.keystore - path to the keystore file for the RUPI plugin,
   * qcg.gridsafe.keystore.pass - password to access the keystore,
   * qcg.gridsafe.keystore.type - type of the keystore: pkcs12 or jks (default is pkcs12),
   * qcg.gridsafe.truststore - path to the truststore file for the RUPI plugin,
   * qcg.gridsafe.truststore.pass - password to access the truststore,
   * qcg.gridsafe.truststore.type - type of the truststore: pkcs12 or jks (default is pkcs12).

== Filters ==
 * qcg.PUBLISHER.filter.userdn - send usage records only for jobs with the given X.509 DN,
 * qcg.PUBLISHER.filter.userdn.file - send usage records only for jobs with the X.509 DNs listed in the given file (the file location can be a URL, e.g.
   http://gridsafe-mapper.drg.lrz.de/gridsafe/mapper.users),
 * qcg.PUBLISHER.filter.project - send usage records only for jobs with the given project id (grant),
 * qcg.PUBLISHER.filter.project.file - send usage records only for jobs with a project id (grant) listed in the given file.

== Troubleshooting ==
The QCG-Accounting Agent stores all diagnostic information in the following log file: `/var/log/qcg/qcg-acc/qcg-accounting.log`. You may also set the `qcg.debug` configuration property to `true` to get more verbose log messages.

== Migration from version 2.X to 3.0 ==
 * Stop cron temporarily and make sure that no QCG-Accounting process is running:
{{{
/etc/init.d/crond stop
ps -AF | grep QCGAcc
}}}
 * Back up `/opt/plgrid/var/run/qcg-acc/`:
{{{
cp -r /opt/plgrid/var/run/qcg-acc/ /opt/plgrid/var/run/qcg-acc.copy
}}}
 * Update qcg-accounting:
{{{
yum update qcg-accounting qcg-accounting-logrotate
}}}
 * Copy the configuration files and keystores to the new configuration directory:
{{{
cp /opt/plgrid/qcg/etc/qcg-acc/config.properties.rpmsave /etc/qcg/qcg-acc/config.properties
cp /opt/plgrid/qcg/etc/qcg-acc/keystore.ks.rpmsave /etc/qcg/qcg-acc/keystore.ks
}}}
 * Update any paths in config.properties (i.e. change `/opt/plgrid/qcg/etc/qcg-acc/` to `/etc/qcg/qcg-acc/`).
 * **IMPORTANT** Copy the last reported job ids:
{{{
cp /opt/plgrid/var/run/qcg-acc.copy/* /var/run/qcg/qcg-acc/
}}}
 * Run the agent once and check for any errors (you may want to temporarily set `qcg.max.delay` to 0):
{{{
/usr/bin/qcg-accounting
...
[ INFO] - Tue May 21 00:37:29 CEST 2013: new lastReportedID: 26957. Processing took: 0 seconds.
}}}
 * Start cron again:
{{{
/etc/init.d/crond start
}}}

= License =
QCG-Accounting is released under the GPL license. For !QosCosGrid licensing details see: [[https://apps.man.poznan.pl/trac/qcg/wiki/license|QosCosGrid license]]

= FAQ =
 * Q: I want to republish records for all jobs that started N days ago. What should I do?
 * A: You can do this by deleting `/var/run/qcg/qcg-acc/PUBLISHER-NAME.last.id` and setting `qcg.db.max.days` to the number of days back for which you want to publish records. Also remember to adjust `qcg.pbs.max.days` so that it is not smaller than `qcg.db.max.days`.
 * Q: I am getting "Plugin apel throwed an exception: /var/spool/apel/outgoing//51dd5ec6 message dir not empty. Please rerun ssmsend manually java.lang.!IllegalStateException: /var/spool/apel/outgoing//51dd5ec6 message dir not empty. Please rerun ssmsend manually" but I have already run ssmsend. What is wrong?
 * A: Some messages may be locked. Delete the lock files and run ssmsend again:
{{{
rm /var/spool/apel/outgoing/*/*.lck
}}}

= Release Notes =
 * [[export:tags/current/RELEASE_NOTES#1|Release Notes]]
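
= Example configuration =
The properties listed in the Configuration section all go into the single file `/etc/qcg/qcg-acc/config.properties`. The fragment below is a minimal sketch of such a file for a site that parses Torque/PBS Pro logs and publishes to APEL only. The site name, batch server hostname, database password and benchmark value are placeholders chosen for illustration, not defaults; the remaining values repeat the defaults and examples given on this page.
{{{
# Common settings (site name, hostname and password are placeholders)
qcg.site.name=EXAMPLE-SITE
qcg.batch.server=batch.example.org
qcg.parser.plugin=pbs
qcg.publishers.plugins=apel
qcg.debug=false
qcg.db.pass=CHANGE_ME

# PBS Pro / Torque log parser
qcg.pbs.home=/var/torque
qcg.pbs.max.days=7

# APEL SSM publisher (the benchmark value is a placeholder)
qcg.ssm.msg.dir=/var/spool/apel/outgoing/
qcg.ssm.benchmark.type=HEPSPEC
qcg.ssm.benchmark.value=10.0
qcg.ssm.max.records=1000
qcg.ssm.safe.mode=true
}}}
Because the file contains the database password, it is reasonable to keep it readable only by the account that runs the agent.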