Version 33 (modified by piontek, 13 years ago) (diff)


QosCosGrid Components and Architecture

The QosCosGrid stack consists of a set of components that play different roles in grid/cloud computing. This page gives a general description of the main QosCosGrid components and of the basic relations between them in a typical QosCosGrid scenario. For more detailed information about the functionality, configuration and installation of a particular QCG component, follow the links in the table:

Component | Main function | Home Page
QCG-Computing | Basic Execution Service (BES) with advance reservation support | link
QCG-Notification | Notification capabilities based on the brokered version of the OASIS WS-Notification standard | link
QCG-Broker | Resource management and brokering service | link
QCG-OpenMPI | Extended version of the OpenMPI library supporting cross-cluster job execution | link
QCG-Core | Common library for QosCosGrid elements | link
QCG-Tools | Various elements extending the QosCosGrid functionality | link

Architecture

The following diagram presents the general architecture of the QosCosGrid middleware.

[Architecture diagram. Elements shown: QCG Science Gateways, QCG-Broker, QCG-Computing, QCG-Notification, QCG Data Movement, Cross-cluster communication]

In a nutshell, the QosCosGrid middleware consists of two logical levels: the grid domain and the administrative domain. Grid-level services control, schedule and generally supervise the execution of end users' applications, which are spread across independent administrative domains. An administrative domain represents a single resource provider (e.g. an HPC center or datacenter) that participates in a Grid or Cloud environment by sharing its computational resources, e.g. computing clusters, with both local and external end users. The logical separation of administrative domains reflects the fact that they are owned by different institutions or resource owners. Each institution contributes its resources for the benefit of the entire Grid or Cloud, while retaining control over its own administrative domain and its own resource allocation and sharing policies.

The key component of every administrative domain in QosCosGrid is QCG-Computing, which provides remote access to the resources of queuing systems. QCG-Computing supports advance reservations, parallel execution environments (OpenMPI, ProActive and MUSCLE) with coordinators responsible for synchronizing cross-cluster executions (see Cross-cluster communication), and QCG Data Movement services for managing input and output data. Another relevant service in the administrative domain is QCG-Notification, which is in charge of the notification mechanism. These services are tightly integrated and connected to the Grid-level services. The critical service at that level is QCG-Broker, a meta-scheduling framework that controls the execution of applications on top of queuing systems via QCG-Computing services.

Components

QCG-Computing

QCG-Computing (the successor of the OpenDSP project) is an open-architecture implementation of a SOAP Web service that provides multi-user access to, and policy-based job control over, the queuing and batch systems that manage local computational resources. This key QosCosGrid service uses the Distributed Resource Management Application API (DRMAA) to communicate with the underlying queuing systems. QCG-Computing has been designed to support a variety of plugins and modules for external communication and to handle a large number of concurrent requests from external clients and services. Consequently, it can be integrated with various authentication, authorization and accounting services, or used to extend the capabilities of existing e-infrastructures based on Unicore, gLite, Globus Toolkit, and others. The QCG-Computing service is compliant with the OGF HPC Basic Profile specification, which serves as a profile over Open Grid Forum standards such as JSDL and the OGSA Basic Execution Service. In addition, it offers remote interfaces for advance reservation management and supports basic file transfer mechanisms. QCG-Computing has been successfully tested with the following queuing systems: Sun Grid Engine (SGE), Platform LSF, Torque/Maui, PBS Pro, Condor, Apple XGrid and LoadLeveler. As a crucial component of QosCosGrid, it can therefore be set up easily on the majority of computing clusters and supercomputers running these queuing systems. Currently, advance reservation capabilities in QCG-Computing are exposed for SGE, Platform LSF and Maui (a scheduler that is typically used in conjunction with Torque). Moreover, generic extensions for advance reservation have been proposed for the next DRMAA standard release.
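Since QCG-Computing accepts job descriptions through standards such as JSDL, a small example may help to show what such a description looks like. The sketch below builds a minimal JSDL document with a POSIX application section using only the Python standard library. The function name build_jsdl and the sample job values are hypothetical and chosen for illustration; the sketch does not contact any QCG-Computing service and says nothing about QCG-specific extensions.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OGF JSDL 1.0 specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(job_name, executable, arguments, output_file):
    """Return a minimal JSDL job definition as an XML string.

    Hypothetical helper for illustration only; it covers just the job
    name, the executable, its arguments and the standard output file.
    """
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    job_def = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job_def, f"{{{JSDL_NS}}}JobDescription")

    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name

    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX_NS}}}Output").text = output_file

    return ET.tostring(job_def, encoding="unicode")


if __name__ == "__main__":
    print(build_jsdl("hello", "/bin/echo", ["hello", "grid"], "out.txt"))
```

A document of this kind would normally be wrapped in a BES CreateActivity request by a client library; that step is left out here because it depends on the SOAP tooling in use.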

For more information see  QCG-Computing Home Page

QCG-Notification

QCG-Notification is an open-source implementation of the family of WS-Notification standards (version 1.3). In the context of QosCosGrid, it extends the features provided by QCG-Computing with standards-based synchronous and asynchronous notification. QCG-Notification supports the topic-based publish/subscribe pattern for asynchronous message exchange among Web services and other entities, in particular services or clients that are to be integrated with QosCosGrid. The main part of QCG-Notification is a highly efficient, extended version of the NotificationBroker, which manages all items participating in notification events. Today, QCG-Notification offers sophisticated notification capabilities, e.g. filtering of notifications by topic and by message content, and both pull and push styles of message delivery. QCG-Notification has been integrated with a number of communication protocols as well as with various Web services security mechanisms. Its modular architecture makes it relatively straightforward to develop new extensions and plugins to meet new requirements.
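The brokered, topic-based publish/subscribe pattern described above can be illustrated with a toy in-memory model. The sketch below is only an assumption-laden illustration of the pattern: the class ToyNotificationBroker and its methods are hypothetical names, not the QCG-Notification or WS-Notification API, and no SOAP messaging is involved. It shows topic filtering, optional message content filtering, and the difference between push delivery (the broker calls the consumer) and pull delivery (the consumer fetches messages from a pull point).

```python
from collections import deque


class ToyNotificationBroker:
    """In-memory model of a topic-based notification broker (illustrative)."""

    def __init__(self):
        self._push_subs = {}    # topic -> list of (callback, content_filter)
        self._pull_points = {}  # topic -> list of (queue, content_filter)

    def subscribe_push(self, topic, callback, content_filter=None):
        """Register a consumer that is called as soon as a message arrives."""
        self._push_subs.setdefault(topic, []).append((callback, content_filter))

    def create_pull_point(self, topic, content_filter=None):
        """Return a queue that accumulates messages until the consumer reads them."""
        queue = deque()
        self._pull_points.setdefault(topic, []).append((queue, content_filter))
        return queue

    def notify(self, topic, message):
        """Deliver a message to every subscription whose topic and filter match."""
        for callback, accepts in self._push_subs.get(topic, []):
            if accepts is None or accepts(message):
                callback(message)
        for queue, accepts in self._pull_points.get(topic, []):
            if accepts is None or accepts(message):
                queue.append(message)


if __name__ == "__main__":
    broker = ToyNotificationBroker()
    finished = []
    # Push consumer that only wants messages about finished jobs.
    broker.subscribe_push("job/status", finished.append,
                          lambda m: m["state"] == "FINISHED")
    # Pull consumer that receives every message on the topic.
    inbox = broker.create_pull_point("job/status")

    broker.notify("job/status", {"job": "j1", "state": "RUNNING"})
    broker.notify("job/status", {"job": "j1", "state": "FINISHED"})
    print(finished)
    print(list(inbox))
```

The pull point decouples the producer from slow or temporarily unreachable consumers, which is the usual reason for offering both delivery styles.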

For more information see  QCG-Notification Home Page

QCG-Broker

QCG-Broker is the successor of the Grid Resource Management System (GRMS) project. It was designed as an open-source meta-scheduling framework that allows developers to build and easily deploy resource management systems for controlling large-scale distributed computing infrastructures that run queuing or batch systems locally. Based on dynamic resource selection, advance reservation and various scheduling methodologies, combined with a feedback control architecture, QCG-Broker deals efficiently with various meta-scheduling challenges, e.g. co-allocation, load balancing among clusters, remote job control, file staging support and job migration. The main goal of QCG-Broker is to manage the whole process of remote job submission and advance reservation to various batch queuing systems and, through them, to the underlying clusters and computational resources. It has been designed as an independent core component for resource management processes that can take advantage of various low-level core and grid services and existing technologies, such as QCG-Computing or QCG-Notification, as well as grid middleware services such as gLite, Globus or Unicore. To address the demanding computational needs of large-scale complex simulations, which in many cases exceed the capabilities of a single cluster, QCG-Broker can flexibly distribute applications across many computing clusters or supercomputers and control them on behalf of end users. Moreover, owing to its built-in meta-scheduling procedures, it can optimize and efficiently run a wide range of applications while increasing the overall throughput of computing e-infrastructures. Advance reservation mechanisms are used to create, synchronize and simultaneously manage the co-allocation of computing resources located in different administrative domains.
The XML-based job definition language, Job Profile, makes it relatively easy to specify the requirements of large-scale parallel applications together with their complex parallel communication topologies. Consequently, application developers and end users are able to run their experiments in parallel over multiple clusters, as well as to perform various benchmark-based experiments, because alternative topologies are taken into account during the meta-scheduling processes in QCG-Broker.
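Co-allocation through advance reservation boils down to finding a time window in which all participating clusters are free at once. The following sketch shows one simple way to do this, assuming each cluster reports its free windows as (start, end) pairs in arbitrary integer time units. It is not QCG-Broker's actual scheduling algorithm; the function name earliest_common_start and the sample data are hypothetical.

```python
def earliest_common_start(free_windows, duration):
    """Return the earliest start time at which every cluster is free for
    `duration` time units, or None if no such time exists.

    free_windows maps a cluster name to a list of (start, end) pairs.
    The earliest feasible start always coincides with the start of some
    free window, so only those instants need to be checked.
    """
    candidates = sorted(
        {start for windows in free_windows.values() for start, _ in windows}
    )
    for t in candidates:
        fits_everywhere = all(
            any(start <= t and t + duration <= end for start, end in windows)
            for windows in free_windows.values()
        )
        if fits_everywhere:
            return t
    return None


if __name__ == "__main__":
    windows = {
        "clusterA": [(0, 4), (6, 12)],
        "clusterB": [(3, 10)],
        "clusterC": [(5, 9), (10, 20)],
    }
    print(earliest_common_start(windows, 3))  # prints 6
    print(earliest_common_start(windows, 5))  # prints None
```

Once a common start time is found, a broker would still have to create the reservations on each cluster and roll back if any of them fails; that coordination step is outside the scope of this sketch.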

For more information see  QCG-Broker Home Page

Data Movement

Like many other e-infrastructures controlled by middleware services, QosCosGrid uses the GridFTP protocol for large data transfer operations, in particular to stage files in and out for advanced simulations. GridFTP is a high-performance, secure and reliable data transfer protocol optimized for high-bandwidth wide-area networks. It is widely regarded as a de facto standard for data transfers in grid and cloud environments, and it extends the standard FTP protocol with functions such as third-party transfer, parallel and striped data transfer, self-tuning capabilities, X.509 proxy certificate-based security, and support for reliable and restartable data transfers. The development of GridFTP is coordinated by the GridFTP Working Group under the auspices of the Open Grid Forum community.
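Restartable transfers rely on knowing which parts of a file have already arrived, so that only the rest needs to be sent again. The sketch below illustrates that bookkeeping in a simplified form: given the total file size and the byte ranges received so far, possibly out of order and overlapping because of parallel streams, it computes the ranges still missing. It is a conceptual illustration under these assumptions, not the GridFTP wire protocol or any GridFTP client API; the function name missing_ranges is hypothetical.

```python
def missing_ranges(total_size, received):
    """Return the byte ranges of a file that have not been received yet.

    Ranges are half-open (start, end) pairs: `start` is included and
    `end` is excluded. `received` may be unsorted and may overlap.
    """
    missing = []
    cursor = 0  # first byte not yet known to be covered
    for start, end in sorted(received):
        if start > cursor:
            missing.append((cursor, start))  # gap before this range
        cursor = max(cursor, end)
    if cursor < total_size:
        missing.append((cursor, total_size))  # tail of the file
    return missing


if __name__ == "__main__":
    # Three streams delivered overlapping chunks of a 100-byte file.
    print(missing_ranges(100, [(0, 30), (50, 70), (20, 40)]))
    # prints [(40, 50), (70, 100)]
```

After an interruption, a client following this idea would request only the missing ranges instead of restarting the whole transfer.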
