Version 14 (modified by mmamonski, 11 years ago)
Installation
After installing the QCG repository, you can install the probe with a single command:
yum install qcg-comp-nagios-probe
Usage
Usage:
/usr/libexec/grid-monitoring/probes/org.qoscosgrid/computing/check_qcg_comp -H hostname -p port -x proxy -t timeout [-v 0-3 -j test-jsdl.xml]
  -H hostname      - QCG-Computing host
  -p port          - QCG-Computing port
  -x proxy         - path to the file containing a valid user X509 proxy
  -t timeout       - test timeout given in seconds
  -v 0-3           - verbosity (default: 0)
  -j test-jsdl.xml - JSDL document describing the job to be tested
Example:
/usr/libexec/grid-monitoring/probes/org.qoscosgrid/computing/check_qcg_comp -H qcg.inula.man.poznan.pl -p 19000 -x /tmp/x509up_u602 -t 60 -v 3
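If the probe is launched from a script rather than typed by hand, the command line above can be assembled programmatically. The following is a minimal sketch in Python; the helper name build_probe_command is hypothetical and not part of the probe, and the range check on the verbosity assumes the documented 0-3 range is enforced.

```python
import shlex
from typing import List, Optional

# Probe path as given in the Usage section.
PROBE = "/usr/libexec/grid-monitoring/probes/org.qoscosgrid/computing/check_qcg_comp"


def build_probe_command(host: str, port: int, proxy: str, timeout: int,
                        verbosity: int = 0, jsdl: Optional[str] = None) -> List[str]:
    """Return the probe command line as an argument list (hypothetical helper)."""
    # Assumption: the documented verbosity range 0-3 is the only valid one.
    if not 0 <= verbosity <= 3:
        raise ValueError("verbosity must be between 0 and 3")
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    cmd = [PROBE, "-H", host, "-p", str(port), "-x", proxy,
           "-t", str(timeout), "-v", str(verbosity)]
    # The JSDL document is optional, as in the usage synopsis.
    if jsdl is not None:
        cmd += ["-j", jsdl]
    return cmd


if __name__ == "__main__":
    # Print the command for the example host; pass the list to subprocess.run() to execute it.
    example = build_probe_command("qcg.inula.man.poznan.pl", 19000,
                                  "/tmp/x509up_u602", 60, verbosity=3)
    print(shlex.join(example))
```

Building an argument list instead of a single string avoids shell quoting problems when the proxy or JSDL path contains spaces.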
Exit Codes
- STATUS_OK (0) - Job finished successfully
- STATUS_WARNING (1) - Job finished with a non-zero exit code, or the job did not finish within the given timeout
- STATUS_CRITICAL (2) - Job submission failed, or the job ended with status Failed or Cancelled
- STATUS_UNKNOWN (3) - Internal probe error or configuration error
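A wrapper script that runs the probe can translate its exit code back into the status names listed above. The sketch below is illustrative only: the function name describe_exit_code is hypothetical, and reporting any unlisted code as STATUS_UNKNOWN is an assumption, not documented probe behaviour.

```python
# Exit codes and meanings as listed in the Exit Codes section.
EXIT_CODES = {
    0: ("STATUS_OK", "job finished successfully"),
    1: ("STATUS_WARNING", "job exited with a non-zero code or did not finish within the timeout"),
    2: ("STATUS_CRITICAL", "job submission failed, or the job ended as Failed or Cancelled"),
    3: ("STATUS_UNKNOWN", "internal probe error or configuration error"),
}


def describe_exit_code(code: int) -> str:
    """Return a one-line description of a probe exit code (hypothetical helper)."""
    # Assumption: codes outside 0-3 are reported as STATUS_UNKNOWN.
    name, meaning = EXIT_CODES.get(code, ("STATUS_UNKNOWN", "unexpected exit code"))
    return f"{name} ({code}): {meaning}"


if __name__ == "__main__":
    for code in (0, 1, 2, 3):
        print(describe_exit_code(code))
```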
Interpreting error messages
- "No CA certs provided" - check that the QCG-Comp service is registered in GOCDB with a URL starting with httpg (not https)
- "Failed to submit a job: com.sun.xml.ws.client.ClientTransportException: HTTP transport error: java.net.SocketTimeoutException: connect timed out" - the machine is either down or there is a firewall issue between the Nagios machine and the QCG-Computing machine (the QCG-Computing service listens on port 19000 by default)
- "Failed to submit a job: com.sun.xml.ws.client.ClientTransportException: HTTP transport error: java.net.ConnectException: Connection refused" - the host is up but the service is down (check that the qcg-comp service is running)
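The mapping from error message to likely cause can also be encoded as a small lookup, for example to annotate probe output in a log. This is a rough sketch: the names HINTS and hint_for are hypothetical, and matching on substrings assumes the probe prints the messages exactly as quoted above.

```python
from typing import Optional

# (substring, hint) pairs taken from the messages listed above.
HINTS = [
    ("No CA certs provided",
     "check that the service is registered in GOCDB with a URL starting with httpg (not https)"),
    ("connect timed out",
     "the machine is down or a firewall blocks the connection (default port 19000)"),
    ("Connection refused",
     "the host is up but the service is down; check that qcg-comp is running"),
]


def hint_for(message: str) -> Optional[str]:
    """Return a troubleshooting hint for a probe message, or None if it is not recognised."""
    for needle, hint in HINTS:
        if needle in message:
            return hint
    return None


if __name__ == "__main__":
    sample = ("Failed to submit a job: com.sun.xml.ws.client.ClientTransportException: "
              "HTTP transport error: java.net.ConnectException: Connection refused")
    print(hint_for(sample))
```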