Version 16 (modified by mmamonski, 8 years ago)


Installation

After installing the QCG repository, you can install the probe with a single command:

yum install qcg-comp-nagios-probe

Usage

Usage:

/usr/libexec/grid-monitoring/probes/org.qoscosgrid/computing/check_qcg_comp
 -H hostname -p port -x proxy -t timeout [-v 0-3] [-j test-jsdl.xml]

-H hostname - QCG-Computing host

-p port - QCG-Computing port

-x proxy - path to the file containing a valid user X509 proxy

-t timeout - test timeout given in seconds

-v 0-3 - verbosity (default: 0)

-j test-jsdl.xml - JSDL document describing the job to be tested

Example:

/usr/libexec/grid-monitoring/probes/org.qoscosgrid/computing/check_qcg_comp -H qcg.inula.man.poznan.pl -p 19000 -x /tmp/x509up_u602 -t 60 -v 3
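
Because the verbosity range is documented as 0-3, a small wrapper can catch out-of-range values before the probe is called. The following is only a sketch, assuming bash: the function name build_probe_cmd is hypothetical and not part of the probe package, it prints the command line instead of running it, and returning 3 on bad input is a choice made here to match STATUS_UNKNOWN.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the probe): prints the probe
# command line after basic argument checks.
PROBE=/usr/libexec/grid-monitoring/probes/org.qoscosgrid/computing/check_qcg_comp

build_probe_cmd() {
    local host=$1 port=$2 proxy=$3 timeout=$4 verbosity=${5:-0}
    local n

    # Verbosity must be in the documented range 0-3.
    case $verbosity in
        [0-3]) ;;
        *) echo "verbosity must be 0-3, got: $verbosity" >&2; return 3 ;;
    esac

    # Port and timeout must be non-empty strings of digits.
    for n in "$port" "$timeout"; do
        case $n in
            ''|*[!0-9]*) echo "port and timeout must be integers" >&2; return 3 ;;
        esac
    done

    echo "$PROBE -H $host -p $port -x $proxy -t $timeout -v $verbosity"
}

# Example: build_probe_cmd qcg.inula.man.poznan.pl 19000 /tmp/x509up_u602 60 3
```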

Exit Codes

  • STATUS_OK (0) - The job finished successfully.
  • STATUS_WARNING (1) - The job finished with a non-zero exit code, or it did not finish within the given timeout.
  • STATUS_CRITICAL (2) - Job submission failed, or the job ended with status Failed or Cancelled.
  • STATUS_UNKNOWN (3) - Internal probe error or configuration error.
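
When running the probe by hand, it can help to translate the numeric exit code back into the status name from the list above. A minimal sketch, assuming bash: the function name status_name is hypothetical, and mapping every code outside 0-2 to STATUS_UNKNOWN is an assumption made here, not documented probe behaviour.

```shell
#!/bin/bash
# Hypothetical helper: map a probe exit code to the status name listed above.
status_name() {
    case $1 in
        0) echo STATUS_OK ;;
        1) echo STATUS_WARNING ;;
        2) echo STATUS_CRITICAL ;;
        # Assumption: anything else is reported as unknown.
        *) echo STATUS_UNKNOWN ;;
    esac
}

# Usage (run right after the probe so that $? still holds its exit code):
#   check_qcg_comp -H hostname -p port -x proxy -t timeout
#   status_name $?
```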

Interpreting error messages

  • "No CA certs provided" - check that the QCG-Comp service is registered in GOCDB with a URL starting with httpg (not https)
  • "Failed to submit a job: com.sun.xml.ws.client.ClientTransportException: HTTP transport error: java.net.SocketTimeoutException: connect timed out" - the machine is either down or there is a firewall issue between the Nagios machine and the QCG-Computing machine (the QCG-Computing service listens on port 19000 by default)
  • "Failed to submit a job: com.sun.xml.ws.client.ClientTransportException: HTTP transport error: java.net.ConnectException: Connection refused" - the host is up but the service is down (check that the qcg-comp service is running)
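
The three messages above can be matched on short substrings, which is convenient when scanning probe output in a script. A minimal sketch, assuming bash: the function name hint_for_message and the hint wording are illustrative only, and only the messages listed above are covered.

```shell
#!/bin/bash
# Hypothetical helper: print a short troubleshooting hint for a probe error message.
hint_for_message() {
    case $1 in
        *"No CA certs provided"*)
            echo "check that the GOCDB URL starts with httpg, not https" ;;
        *"SocketTimeoutException"*)
            echo "host down or firewall blocking the service port (default 19000)" ;;
        *"Connection refused"*)
            echo "host is up but the qcg-comp service is not running" ;;
        *)
            echo "no hint available" ;;
    esac
}

# Example:
#   hint_for_message "java.net.ConnectException: Connection refused"
```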

Adding new tests