QosCosGrid Components and Architecture

The QosCosGrid stack consists of a set of components that play different roles in a grid or cloud computing environment. This page gives a general description of the main QosCosGrid components and of the basic relations between them in a typical QosCosGrid scenario. For more detailed information about the functionality, configuration and installation of a particular QCG component, visit its home page:

Component         Main function                                                 Home Page
QCG-Computing     Basic Execution Service (BES) supporting advance reservation  QCG-Computing Home Page
QCG-Broker        Resource management and brokering service                     QCG-Broker Home Page
QCG-Client        Text-based client for QCG                                     QCG-Client Home Page
QCG-Icon          Lightweight desktop client for QCG                            QCG-Icon Home Page
QCG-Now           New desktop client intended to replace QCG-Icon               QCG-Now Home Page
QCG-Notification  Notification capabilities based on WS-Notification            QCG-Notification Home Page
QCG-Coordinator   Supports QCG-Computing in cross-cluster execution of jobs     QCG-Coordinator Home Page
QCG-Tools         Various elements extending the QCG stack                      QCG-Tools Home Page
QCG-Nagios        Nagios probes for the QCG stack                               QCG-Nagios Home Page

Architecture

The following diagram presents the general architecture of the QosCosGrid middleware.

[Figure: QosCosGrid architecture diagram. Labelled elements: QCG-Broker, QCG-Computing (x2), QCG-Notification (x2), QCG Data Movement (x2), cross-cluster communication, end-user information.]

In a nutshell, the QosCosGrid middleware consists of two logical levels: the grid domain and the administrative domain. Grid-level services control, schedule and generally supervise the execution of end users' applications, which are spread across independent administrative domains. An administrative domain represents a single resource provider (e.g. an HPC center or a data center) that participates in a given Grid or Cloud environment by sharing its computational resources, such as computing clusters, with both local and external end users. The logical separation of administrative domains reflects the fact that they are owned by different institutions or resource owners. Each institution contributes its resources for the benefit of the entire Grid or Cloud, while retaining control over its own administrative domain and its own resource allocation and sharing policies.

The key component of every administrative domain in QosCosGrid is QCG-Computing, which provides remote access to the resources of queuing systems. QCG-Computing supports advance reservations; parallel execution environments (OpenMPI, ProActive and MUSCLE), with coordinators responsible for synchronizing cross-cluster executions (see Cross-cluster communication); and QCG Data Movement services for managing input and output data. Another relevant service in the administrative domain is QCG-Notification, which is in charge of the notification mechanism. These services are tightly integrated and connected to the grid-level services. The critical service at that level is QCG-Broker, a meta-scheduling framework that controls the execution of applications on top of queuing systems via QCG-Computing services.

Core Components

QCG-Computing

QCG-Computing (the successor of the OpenDSP project) is an open-architecture implementation of a SOAP Web service for multi-user access to, and policy-based control of, the job management routines offered by various queuing and batch systems managing local computational resources. This key service in QosCosGrid uses the Distributed Resource Management Application API (DRMAA) to communicate with the underlying queuing systems. QCG-Computing has been designed to support a variety of plugins and modules for external communication, as well as to handle a large number of concurrent requests from external clients and services. Consequently, it can be integrated with various authentication, authorization and accounting services, or used to extend the capabilities of existing e-infrastructures based on Unicore, gLite, Globus Toolkit and others. The QCG-Computing service is compliant with the OGF HPC Basic Profile specification, which serves as a profile over Open Grid Forum standards such as JSDL and the OGSA Basic Execution Service. In addition, it offers remote interfaces for advance reservation management and supports basic file transfer mechanisms.

QCG-Computing has been successfully tested with the following queuing systems: Sun Grid Engine (SGE), Platform LSF, Torque/Maui, PBS Pro, Condor, Apple XGrid and LoadLeveler. As a crucial component of QosCosGrid, it can therefore be set up easily on the majority of computing clusters and supercomputers running these queuing systems. Currently, advance reservation capabilities in QCG-Computing are exposed for SGE, Platform LSF and Maui (a scheduler that is typically used in conjunction with Torque). Moreover, generic extensions for advance reservation have been proposed for the next DRMAA standard release.
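To make the reference to JSDL more concrete, the following sketch builds a minimal JSDL 1.0 job description with Python's standard library. It is only an illustration of the kind of document a BES-style service consumes: the function name build_jsdl, the job name and the file names are assumptions made for this example, and the sketch says nothing about how QCG-Computing itself parses or extends JSDL.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the JSDL 1.0 specification and its POSIX application extension.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(job_name, executable, arguments, stdout_file):
    """Return a minimal JSDL job description as an XML string (hypothetical helper)."""
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    definition = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    description = ET.SubElement(definition, f"{{{JSDL_NS}}}JobDescription")

    identification = ET.SubElement(description, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(identification, f"{{{JSDL_NS}}}JobName").text = job_name

    application = ET.SubElement(description, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(application, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for argument in arguments:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = argument
    ET.SubElement(posix, f"{{{POSIX_NS}}}Output").text = stdout_file

    return ET.tostring(definition, encoding="unicode")


if __name__ == "__main__":
    # Example values only; they do not come from the QosCosGrid documentation.
    print(build_jsdl("hello", "/bin/echo", ["hello", "grid"], "hello.out"))
```

A real submission would wrap such a document in a BES CreateActivity SOAP request and would usually add resource requirements and data staging elements, which are omitted here.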

For more information, see the QCG-Computing Home Page.

QCG-Coordinator

QCG-Coordinator supports the spawning of parallel application processes on co-allocated computational resources. Because the standard mechanisms used in MPI or ProActive are designed to run jobs only on a single machine, an additional entity was needed to support multi-cluster runs.

For more information, see the QCG-Coordinator Home Page.

QCG-Notification

QCG-Notification is an open-source implementation of the WS-Notification family of standards (version 1.3). In the context of QosCosGrid, it is used mainly to extend the features provided by QCG-Computing and QCG-Broker with standards-based synchronous and asynchronous notifications. It also acts as an intermediary component for sending e-mail and XMPP notifications to QCG users. QCG-Notification supports the topic-based publish/subscribe pattern for asynchronous message exchange among Web services and other entities, in particular services or clients that need to be integrated with QosCosGrid. The main part of QCG-Notification is based on a highly efficient, extended version of the NotificationBroker, which manages all entities participating in notification events. Today, QCG-Notification offers sophisticated notification capabilities, such as filtering of notifications by topic and by message content, and both pull and push styles of message delivery. QCG-Notification has been integrated with a number of communication protocols as well as with various Web service security mechanisms. Its modular architecture makes it relatively straightforward to develop new extensions and plugins to meet new requirements.
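The topic-based publish/subscribe pattern with content filtering and push and pull delivery can be illustrated with a small in-memory sketch. This is a toy model of the pattern only, not the WS-Notification SOAP interfaces and not QCG-Notification's implementation; the class TopicBroker, its methods and the topic name "job/status" are hypothetical names chosen for this example.

```python
from collections import deque


class TopicBroker:
    """Toy in-memory broker: topic-based publish/subscribe with push and pull delivery."""

    def __init__(self):
        self._push = {}  # topic -> list of (callback, content filter)
        self._pull = {}  # topic -> list of (queue, content filter)

    def subscribe_push(self, topic, callback, content_filter=None):
        """Push style: the broker calls `callback` for every matching message."""
        self._push.setdefault(topic, []).append((callback, content_filter))

    def subscribe_pull(self, topic, content_filter=None):
        """Pull style: matching messages wait in the returned queue until collected."""
        queue = deque()
        self._pull.setdefault(topic, []).append((queue, content_filter))
        return queue

    def publish(self, topic, message):
        """Deliver `message` to every subscriber of `topic` whose filter accepts it."""
        for callback, accepts in self._push.get(topic, []):
            if accepts is None or accepts(message):
                callback(message)
        for queue, accepts in self._pull.get(topic, []):
            if accepts is None or accepts(message):
                queue.append(message)


if __name__ == "__main__":
    broker = TopicBroker()
    finished = []
    # Push subscriber that only wants messages about finished jobs (content filter).
    broker.subscribe_push("job/status", finished.append,
                          lambda m: m["state"] == "FINISHED")
    # Pull subscriber that collects every message on the topic.
    mailbox = broker.subscribe_pull("job/status")

    broker.publish("job/status", {"job": "J1", "state": "RUNNING"})
    broker.publish("job/status", {"job": "J1", "state": "FINISHED"})
    print(finished, list(mailbox))
```

In this sketch the broker is the only party that knows about subscribers, so publishers and consumers stay decoupled, which is the property that lets clients and services integrate through notifications rather than polling.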

For more information, see the QCG-Notification Home Page.

QCG-Broker

QCG-Broker is the successor of the Grid Resource Management System (GRMS) project. QCG-Broker was designed as an open-source meta-scheduling framework that allows developers to build and easily deploy resource management systems for controlling large-scale distributed computing infrastructures that run queuing or batch systems locally. Based on dynamic resource selection, advance reservation and various scheduling methodologies, combined with a feedback control architecture, QCG-Broker deals efficiently with various meta-scheduling challenges, e.g. co-allocation, load balancing among clusters, remote job control, file staging support and job migration. The main goal of QCG-Broker is to manage the whole process of remote job submission and advance reservation to various batch queuing systems, and subsequently to the underlying clusters and computational resources. It has been designed as an independent core component for resource management processes that can take advantage of various low-level core and grid services and existing technologies, such as QCG-Computing or QCG-Notification, as well as grid middleware such as gLite, Globus or Unicore. To address the demanding computational needs of large-scale complex simulations, which in many cases exceed the capabilities of a single cluster, QCG-Broker can flexibly distribute applications across many computing clusters or supercomputers and control them on behalf of end users. Moreover, owing to its built-in meta-scheduling procedures, it can optimize and efficiently run a wide range of applications while increasing the overall throughput of computing e-infrastructures. Advance reservation mechanisms are used to create, synchronize and simultaneously manage the co-allocation of computing resources located in different administrative domains.
The XML-based job definition language, Job Profile, makes it relatively easy to specify the requirements of large-scale parallel applications together with their complex parallel communication topologies. Consequently, application developers and end users are able to run their experiments in parallel over multiple clusters, as well as to perform various benchmark-based experiments, because alternative topologies are taken into account during meta-scheduling in QCG-Broker.
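The idea of co-allocation through advance reservation can be sketched as a search for a time slot that is free on every participating cluster at once. The example below is a deliberately simplified model under stated assumptions: each cluster reports a list of free (start, end) windows in arbitrary time units, and the function earliest_common_slot and the cluster names are hypothetical. It is not the scheduling algorithm used by QCG-Broker.

```python
def earliest_common_slot(free_windows, duration):
    """Return the earliest start time at which every cluster is free for `duration`.

    free_windows maps a cluster name to a list of (start, end) free intervals.
    Returns None when no common slot exists.
    """
    # The earliest feasible start always coincides with the start of some free
    # window, so only those instants need to be checked.
    candidates = sorted({start
                         for windows in free_windows.values()
                         for start, _ in windows})
    for t in candidates:
        if all(
            any(start <= t and t + duration <= end for start, end in windows)
            for windows in free_windows.values()
        ):
            return t
    return None


if __name__ == "__main__":
    # Hypothetical availability reported by three administrative domains.
    free = {
        "cluster-a": [(0, 4), (6, 12)],
        "cluster-b": [(2, 5), (7, 20)],
        "cluster-c": [(0, 10)],
    }
    print(earliest_common_slot(free, 3))  # prints 7
    print(earliest_common_slot(free, 4))  # prints None
```

Once such a slot is found, a broker would still have to create one reservation per cluster for that slot and roll back if any of them fails; that step is omitted here.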

For more information, see the QCG-Broker Home Page.

Data Movement

Like many other e-infrastructures controlled by middleware services, QosCosGrid takes advantage of the GridFTP protocol for large data transfer operations, in particular to stage files in and out for advanced simulations. GridFTP is a high-performance, secure and reliable data transfer protocol optimized for high-bandwidth wide area networks. It is a de facto standard for data transfers in grid and cloud environments and extends the standard FTP protocol with functions such as third-party transfer, parallel and striped data transfer, self-tuning capabilities, X.509 proxy certificate-based security, and support for reliable and restartable data transfers. The development of GridFTP is coordinated by the GridFTP Working Group under the auspices of the Open Grid Forum.
