Changeset 1065 for papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex
- Timestamp: 05/31/13 14:37:45
- File: 1 edited
papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex
r1062 → r1065

%% \fntext[label3]{}

\title{DCworms - a tool for simulation of energy efficiency in distributed computing infrastructures}

%% use optional labels to link authors explicitly to addresses:

…

%% Text of abstract

In recent years, the energy efficiency of computing infrastructures has gained considerable attention. For this reason, proper estimation and evaluation of the energy required to execute data center workloads has become an important research problem. In this paper we present the Data Center Workload and Resource Management Simulator (DCworms), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies.
We discuss methods of power usage modeling available in the simulator. To this end, we compare results of simulations to measurements of real servers.
To demonstrate DCworms capabilities, we evaluate the impact of several resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources.

\end{abstract}

…

\section{Introduction}

The rising popularity of large-scale computing infrastructures has caused rapid development of data centers. Nowadays, data centers are responsible for around 2\% of global energy consumption, a demand comparable to that of the aviation industry \cite{koomey}. Moreover, in many current data centers the actual IT equipment uses only half of the total energy, whereas most of the remaining part is required for cooling and air movement, resulting in poor Power Usage Effectiveness (PUE) \cite{pue} values. Owing to large energy needs and significant $CO_2$ emissions, issues related to cooling, heat transfer, and IT infrastructure location are studied more and more carefully during the planning and operation of data centers.
%Even if we take ecological and footprint issues aside, the amount of consumed energy can impose strict limits on data centers. First of all, energy bills may reach millions of euros, making computations expensive.
%Furthermore, available power supply is usually limited, so it may also reduce data center development capabilities, especially in light of the challenges related to the exascale computing breakthrough foreseen within this decade.

For these reasons many efforts have been undertaken to measure and study the energy efficiency of data centers. Some projects focus on data center monitoring and management \cite{games}\cite{fit4green}, whereas others address the energy efficiency of networks \cite{networks}. Additionally, vendors offer a wide spectrum of energy-efficient solutions for computing and cooling \cite{sgi}\cite{colt}\cite{ecocooling}. However, a variety of solutions and configuration options can be applied when planning new or upgrading existing data centers.
In order to optimize the design or configuration of a data center, we need a thorough study using appropriate metrics and tools that evaluate how much computation or data processing can be done within a given power and energy budget and how it affects temperatures, heat transfers, and airflows within the data center.
Therefore, there is a need for simulation tools and models that approach the problem from the perspective of end users and take into account all the factors that are critical to understanding and improving the energy efficiency of data centers, in particular hardware characteristics, applications, management policies, and cooling.

…

There are various tools that allow simulation of computing infrastructures. On the one hand, they include advanced packages for modeling heat transfer and energy consumption in data centers \cite{ff} or tools concentrating on their financial analysis \cite{DCD_Romonet}. On the other hand, there are simulators focusing on computations, such as CloudSim \cite{CloudSim}. The CoolEmAll project aims to integrate these approaches and enable advanced analysis of data center efficiency taking all these aspects into account \cite{e2dc12}\cite{coolemall}.

One of the results of the CoolEmAll project is the Data Center Workload and Resource Management Simulator (DCworms), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies.
We discuss methods of power usage modeling available in the simulator. To this end, we compare results of simulations to measurements of real servers.
To demonstrate DCworms capabilities, we evaluate the impact of several resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources.

The remaining part of this paper is organized as follows. In Section~2 we give a brief overview of the current state of the art concerning modeling and simulation of distributed systems, such as Grids and Clouds, in terms of energy efficiency. Section~3 discusses the main features of DCworms. In particular, it introduces our approach to workload and resource management, presents the concept of energy-efficiency modeling, and explains how to incorporate a specific application performance model into simulations. Section~4 discusses the energy models adopted within DCworms. In Section~5 we assess the energy models by comparing simulation results with real measurements. We also present experiments performed using DCworms that show various resource and scheduling techniques which decrease the total energy consumption of the execution of a set of tasks. In Section~6 we explain how to integrate workload and resource simulations with heat transfer simulations within the CoolEmAll project. Final conclusions and directions for future work are given in Section~7.

\section{Related Work}\label{sota}

…

GreenCloud is a C++ based simulation environment for studying the energy efficiency of cloud computing data centers. CloudSim is a simulation tool that allows modeling of cloud computing environments and evaluation of resource provisioning algorithms. Finally, the DCSG Simulator is a data center cost and energy simulator calculating the power and cooling schema of the data center equipment.

The scope of the aforementioned toolkits concerns data center environments. However, all of them, except DCworms presented in this paper, restrict the simulated architecture in terms of the types of modeled resources. In this way, they impose the use of predefined sets of resources and relations between them. GreenCloud defines switches, links and servers that are responsible for task execution and may contain different scheduling strategies. Contrary to what the GreenCloud name may suggest, it does not allow testing the impact of virtualization-based approaches. CloudSim allows creating a simple resource hierarchy consisting of machines and processors. To simulate a real cloud computing data center, it provides an extra virtualization layer responsible for the virtual machine (VM) provisioning process and for managing the VM life cycle. In the DCSG Simulator the user is able to take into account a variety of mechanical and electrical devices as well as the IT equipment, and to define numerous factors for each of them, including device capacity and efficiency as well as the data center conditions.

The general idea behind all of the analyzed tools is to enable studies concerning energy efficiency in distributed infrastructures. The GreenCloud approach enables simulation of the energy usage associated with computing servers and network components. For example, the server power consumption model implemented in GreenCloud depends on the server state as well as its utilization. The CloudSim framework provides basic models to evaluate energy-conscious provisioning policies. Each computing node can be extended with a power model that estimates the current power consumption. Within the DCSG Simulator, the performance of each piece of data center equipment (facility and IT) is determined by a combination of factors, including workload, local conditions, the manufacturer's specifications and the way in which it is utilized. In DCworms, a plugin concept has been introduced that allows emulating the behavior of computing resources in terms of power consumption. Additionally, DCworms delivers detailed information concerning the resource and application characteristics needed to define more sophisticated power draw models.

In order to emulate the behavior of real computing systems, a green computing simulator should also address energy-aware resource management. In this respect, GreenCloud captures the effects of both Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management schemes. At the level of links and switches, it supports downgrading the transmission rate and putting network equipment into a sleep mode. CloudSim comes with a set of predefined and extensible policies that manage the process of VM migrations in order to optimize power consumption. However, the proposed approach is not sufficient for modeling more sophisticated policies like frequency scaling techniques and managing resource power states. The DCSG Simulator is said to implement a set of basic energy-efficiency rules that have been developed on the basis of a detailed understanding of the data center as a system. The output of this simulation is a set of energy metrics, like PUE, and cost data representing the IT devices. DCworms introduces a dedicated interface that provides methods to obtain detailed information about the energy consumption of each resource and its components, and allows changing their current energy state. The availability of these interfaces in the scheduling plugin supports the implementation of various strategies such as centralized energy management, self-management of computing resources, and mixed models.

In terms of application modeling, all tools except the DCSG Simulator describe the application with a number of computational and communication requirements. In addition, GreenCloud and DCworms allow introducing QoS requirements by taking time constraints into account during the simulation. The DCSG Simulator, instead of modeling a single application, enables the definition of a workload that leads to a given utilization level. However, only DCworms supports application performance modeling, by not only incorporating simple requirements that are taken into account during scheduling, but also by allowing specification of the task execution time.

GreenCloud, CloudSim and DCworms are released as Open Source under the GPL. The DCSG Simulator is available under an OSL V3.0 open-source license; however, it can only be accessed by DCSG members.

Summarizing, DCworms stands out from other tools due to its flexibility in terms of data center equipment and structure definition.
Moreover, it allows associating the energy consumption not only with the current power state and resource utilization but also with the particular set of applications running on a resource, and it does not limit the user in defining various types of resource management policies. The main strength of CloudSim lies in its implementation of complex scheduling and task execution schemes involving resource virtualization techniques. However, its energy-efficiency aspect is limited to VM management. GreenCloud focuses on data center resources, with particular attention to the network infrastructure and the most popular energy management approaches. The DCSG simulator also allows taking non-computing devices into account; nevertheless, it seems hardly customizable to specific workloads and management policies.
\section{DCworms}

Figure~\ref{fig:arch} presents the overall architecture of the simulation tool.

The Data Center Workload and Resource Management Simulator (DCworms) is a simulation tool based on the GSSIM framework \cite{GSSIM} developed by Poznan Supercomputing and Networking Center (PSNC).
GSSIM was proposed to provide an automated tool for experimental studies of various resource management and scheduling strategies in distributed computing systems. DCworms extends its basic functionality and adds features related to energy-efficiency issues in data centers. In this section we introduce the functionality of the simulator in terms of modeling and simulation of large-scale distributed systems like Grids and Clouds.

…

\centering
\includegraphics[width = 12cm]{fig/arch.png}
\caption{\label{fig:arch} DCworms architecture}

DCworms is an event-driven simulation tool written in Java. In general, input data for DCworms consist of workload and resource descriptions. They can be provided by the user, read from real traces, or generated using the generator module. In this respect, DCworms benefits from the GSSIM workload generator tool, which allows creating synthetic workloads \cite{GSSIM}. However, the key elements of the presented architecture are plugins. They allow researchers to configure and adapt the simulation environment to the peculiarities of their studies, starting from modeling job performance, through energy estimations, up to the implementation of resource management and scheduling policies. Each plugin can be implemented independently and plugged into a specific experiment. Results of experiments are collected, aggregated, and visualized using the statistics module. Due to its modular and pluggable architecture, DCworms can be applied to specific resource management problems and address different users' requirements.
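To give the reader a feel for the plugin mechanism, the sketch below shows what such a pluggable model can boil down to: a single callback from the simulator into user code that maps the current resource state to a power value. The interface and class names are hypothetical illustrations, not the actual DCworms API.

\begin{verbatim}
// Hypothetical sketch of a DCworms-style energy estimation plugin.
// All names are illustrative; they are not the real DCworms interfaces.
public interface EnergyEstimationPlugin {
    // Returns the estimated power draw (watts) of one resource,
    // given its utilization in [0..1] and its CPU frequency in MHz.
    double estimatePowerUsage(double utilization, double frequencyMHz);
}

class LinearPowerModel implements EnergyEstimationPlugin {
    private final double idlePower; // power of an idle node, in watts
    private final double maxPower;  // power of a fully loaded node, in watts

    LinearPowerModel(double idlePower, double maxPower) {
        this.idlePower = idlePower;
        this.maxPower = maxPower;
    }

    @Override
    public double estimatePowerUsage(double utilization, double frequencyMHz) {
        // Simple load-proportional model; frequency is ignored here.
        return idlePower + utilization * (maxPower - idlePower);
    }
}
\end{verbatim}

Because each plugin is an independent class, an experiment can swap one model for another without touching the rest of the simulation setup.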
\subsection{Workload modeling}

As mentioned above, experiments performed in DCworms require a description of the applications that will be scheduled during the simulation. As a primary definition, DCworms uses files in the Standard Workload Format (SWF) or its extension, the Grid Workload Format (GWF) \cite{GWF}. In addition to the SWF file, a more detailed specification of jobs and tasks can be included in an auxiliary XML file. This form of description extends the basic one and provides the scheduler with more detailed information about the application profile, task requirements, user preferences and execution time constraints, which are unavailable in SWF/GWF files. To facilitate the process of adapting traces from real resource management systems, DCworms supports reading traces delivered from the most common ones, like SLURM \cite{SLURM} and Torque \cite{TORQUE}.
Since applications may differ in nature, in terms of both their requirements and their structure, DCworms gives the user flexibility in defining the application model. Thus, the considered workloads may have various shapes and levels of complexity, ranging from multiple independent jobs, through large-scale parallel applications, up to whole workflows containing time dependencies and precedence constraints between jobs and tasks.
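For readers unfamiliar with the SWF format mentioned above: it is a plain-text trace format with one job per line, a fixed order of whitespace-separated fields (job number, submit time, wait time, run time, number of allocated processors, and so on), and header lines starting with a semicolon. A minimal reader for the fields used most often here might look as follows; this is a sketch only, not the parser shipped with DCworms, and the auxiliary XML described above would be merged in separately.

\begin{verbatim}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Minimal SWF reader, shown only to illustrate the workload input format.
class SwfReader {
    record Job(long id, long submitTimeSec, long runTimeSec, int allocatedProcs) {}

    static List<Job> read(Path swfFile) throws IOException {
        List<Job> jobs = new ArrayList<>();
        for (String line : Files.readAllLines(swfFile)) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith(";")) continue; // skip headers
            String[] f = line.split("\\s+");
            // SWF fields: 1=job number, 2=submit time, 3=wait time,
            // 4=run time, 5=number of allocated processors, ...
            jobs.add(new Job(Long.parseLong(f[0]), Long.parseLong(f[1]),
                             Long.parseLong(f[3]), Integer.parseInt(f[4])));
        }
        return jobs;
    }
}
\end{verbatim}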
Each job may consist of one or more tasks, and these can be seen as groups of processes. Moreover, DCworms is able to handle rigid and moldable jobs, as well as pre-emptive ones. To model the application profile in more detail, DCworms follows the DNA approach proposed in \cite{Ghislain}. Accordingly, each task can be presented as a sequence of phases, which shows the impact of the task on the resources that run it. Phases are periods of time in which the system is stable (load, network, memory) given a certain threshold. Each phase is linked to values of the system that represent a resource consumption profile. Such a stage could, for example, be described as follows: ``60\% CPU, 30\% net, 10\% mem''.
Levels of information about incoming jobs are presented in Figure~\ref{fig:jobsStructure}.

…

This form of representation allows users to define a wide range of workloads: HPC (long jobs, computationally intensive, hard to migrate) or virtualization (short requests), both of which are typical for data center environments.
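Such a phase-based profile maps naturally onto a simple data structure. The classes below are an illustrative sketch (not the DCworms data model) encoding the example stage quoted above:

\begin{verbatim}
import java.util.List;

// Illustrative encoding of the DNA-style phase profile described above;
// these classes are a sketch, not the actual DCworms data model.
class Phase {
    final long durationSec;     // how long the system stays in this stable phase
    final double cpu, net, mem; // resource consumption shares, in [0..1]

    Phase(long durationSec, double cpu, double net, double mem) {
        this.durationSec = durationSec;
        this.cpu = cpu;
        this.net = net;
        this.mem = mem;
    }
}

class TaskProfile {
    final List<Phase> phases; // the task is a sequence of stable phases
    TaskProfile(List<Phase> phases) { this.phases = phases; }
}

class ProfileDemo {
    // Example: a 300 s stage described as "60% CPU, 30% net, 10% mem".
    static final TaskProfile EXAMPLE =
        new TaskProfile(List.of(new Phase(300, 0.60, 0.30, 0.10)));
}
\end{verbatim}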
\subsection{Resource modeling}

The main goal of DCworms is to enable researchers to evaluate various resource management policies in diverse computing environments. To this end, it supports flexible definition of simulated resources on both the physical (computing resources) and the logical (scheduling entities) level. This flexible approach allows modeling of various computing entities consisting of compute nodes, processors and cores. In addition, the detailed location of the given resources can be provided in order to group them and arrange them into physical structures such as racks and containers. Each of the components may be described by different parameters specifying available memory, storage capabilities, processor speed, etc. In this way, it is also possible to describe the power distribution system and cooling devices. Due to the extensible description, users are able to define a number of experiment-specific and visionary characteristics. Moreover, dedicated profiles can be associated with every component to determine, among others, its power, thermal and air throughput properties. An energy estimation plugin can be bundled with each resource; this allows defining various power models that can then be followed by different computing system components. Details concerning the approach to energy-efficiency modeling in DCworms can be found in the next sections.

Scheduling entities allow providing data related to the brokering or queuing system characteristics. Thus, information about available queues, the resources associated with them, and their parameters, like priority or availability of an advance reservation (AR) mechanism, can be defined. Moreover, the allocation policy and task scheduling strategy for each scheduling entity can be introduced in the form of a reference to an appropriate plugin. DCworms allows building a hierarchy of schedulers corresponding to the hierarchy of resource components over which the tasks may be distributed.
In this way, DCworms supports simulation of a wide scope of physical and logical architectural patterns that may span from a single computing resource up to whole data centers (even geographically distributed ones). In particular, it supports simulating complex distributed architectures containing models of whole data centers, containers, racks, nodes, etc. In addition, new resources and distributed computing entities can easily be added to the DCworms environment in order to enhance the functionality of the tool and address more sophisticated requirements. The granularity of such topologies may also range from coarse-grained to very fine-grained, modeling single cores, memory hierarchies and other hardware details.

\subsection{Energy management concept in DCworms}

DCworms allows researchers to take energy efficiency and thermal issues into account in distributed computing experiments. This can be achieved by means of appropriate models and profiles. In general, the main goal of the models is to emulate the behavior of the real computing resources, while profiles support the models by providing data essential for the energy usage calculations. Introducing particular models into the simulation environment is possible through choosing or implementing dedicated energy plugins that contain methods to calculate the power usage of resources, their temperature, and system air throughput values. The presence of detailed resource usage information, a description of the current resource energy and thermal state, and a functional energy management interface enable the implementation of energy-aware scheduling algorithms. Resource energy consumption and thermal metrics become in this context an additional criterion in the resource management process. Scheduling plugins are provided with dedicated interfaces, which allow them to collect detailed information about computing resource components and to affect their behavior.
The following subsection presents the general idea behind the power management concept in DCworms. A detailed description of the approach to thermal and air throughput simulations can be found in \cite{d2.2}.

\subsubsection{Power management}
The motivation behind introducing a power management concept in DCworms is to provide researchers with the means to define the energy efficiency of resources and the dependency of energy consumption on resource load and specific applications, and to manage the power modes of resources. The proposed solution extends the power management concept presented in GSSIM \cite{GSSIM_Energy} by offering a much more granular approach, with the possibility of plugging energy consumption models and power profiles into each resource level.

\paragraph{\textbf{Power profile}}

…

\paragraph{\textbf{Power consumption model}}
The main aim of these models is to emulate the behavior of the real computing resource and the way it consumes power. Due to its rich functionality and flexible environment description, DCworms can be used to verify a number of theoretical assumptions and to develop new power consumption models. Modeling of power consumption is realized by the energy estimation plugin, which calculates energy usage based on information about the resource power profile, resource utilization, and the application profile, including energy consumption and heat production metrics. The relation between model and power profile is illustrated in Figure~\ref{fig:powerModel}.

\begin{figure}[tbp]

…

\paragraph{\textbf{Power management interface}}
DCworms is complemented with an interface that allows scheduling plugins to collect detailed power information about computing resource components and to change their power states. It enables performing various operations on the given resources, including dynamically changing the frequency level of a single processor, turning computing resources off and on, etc. The activities performed through this interface are reflected in the total amount of energy consumed by the resource during the simulation.
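As an illustration of how a scheduling plugin might drive such an interface, consider the sketch below, which slows down or switches off idle nodes. All names are hypothetical; the real DCworms interface is not reproduced in this changeset.

\begin{verbatim}
import java.util.List;

// Hypothetical power management interface, sketched after the
// description above; not the literal DCworms API.
interface PowerInterface {
    double getPowerUsage();                 // current power draw, in watts
    List<Double> getAvailableFrequencies(); // supported CPU frequencies (MHz)
    void setFrequency(double frequencyMHz); // DVFS-style frequency change
    void powerOff();                        // switch the node off
    boolean isIdle();
}

class EnergyAwarePolicy {
    // Called by the scheduler after each (de)allocation event.
    void onSchedulingEvent(List<PowerInterface> nodes) {
        for (PowerInterface node : nodes) {
            if (node.isIdle()) {
                // Either slow the idle node down ...
                List<Double> freqs = node.getAvailableFrequencies();
                node.setFrequency(freqs.get(0)); // assumes ascending order
                // ... or, under a more aggressive policy, switch it off:
                // node.powerOff();
            }
        }
    }
}
\end{verbatim}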
The presence of detailed resource usage information, a description of the current resource energy state, and a functional energy management interface enable the implementation of energy-aware scheduling algorithms.
Resource energy consumption becomes in this context an additional criterion in the scheduling process, which can use various techniques to decrease energy consumption, e.g. workload consolidation, moving tasks between resources to reduce the number of running resources, dynamic power management, cutting down the CPU frequency, and others.

…

%\paragraph{\textbf{Air throughput management interface}}
%DCworms delivers interfaces that provide access to the air throughput profile data and allow acquiring detailed information concerning current air flow conditions and changes in air flow states. The availability of these interfaces supports evaluation of different cooling strategies.

…

%\subsubsection{Thermal management concept}

%The primary motivation behind the incorporation of thermal aspects in DCworms is to go beyond the commonly adopted energy use cases and apply more sophisticated scenarios. By means of dedicated profiles and interfaces, it is possible to perform experimental studies involving temperature-aware workload placement.

%\paragraph{\textbf{Thermal profile}}

…

\subsection{Application performance modeling}\label{sec:apps}

In general, DCworms implements user application models as objects describing the computational and communication requirements, as well as the energy requirements and profiles, of the task to be scheduled. Additionally, the simulator provides means to include complex and specific application performance models during simulations. These allow researchers to introduce specific ways of calculating the task execution time. Such models can be plugged into the simulation environment through a dedicated API and the implementation of an appropriate plugin (a sketch of such a model follows the parameter list below). To specify the execution time of a task, the user can apply a number of parameters, including:
\begin{itemize}
\item task length (number of CPU instructions)

…
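An application performance model of this kind ultimately reduces to a function from parameters such as those listed above to an execution time estimate. A minimal sketch, assuming an Amdahl-style speedup model; the names are illustrative, not the actual plugin API:

\begin{verbatim}
// Sketch of an application performance (execution time) plugin.
// Names are illustrative; the real DCworms API is not shown here.
interface ExecutionTimeEstimationPlugin {
    // taskLengthInstructions: task length as a CPU instruction count
    // allocatedCores: number of cores assigned to the task
    // coreSpeedMips: speed of one core, million instructions per second
    double estimateExecutionTimeSec(long taskLengthInstructions,
                                    int allocatedCores,
                                    double coreSpeedMips);
}

class AmdahlLikeModel implements ExecutionTimeEstimationPlugin {
    private final double serialFraction; // share that cannot be parallelized

    AmdahlLikeModel(double serialFraction) { this.serialFraction = serialFraction; }

    @Override
    public double estimateExecutionTimeSec(long len, int cores, double mips) {
        double seqTimeSec = len / (mips * 1e6); // time on a single core
        // Amdahl's law: only the parallel part scales with the core count.
        return seqTimeSec * (serialFraction + (1.0 - serialFraction) / cores);
    }
}
\end{verbatim}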
\section{Modeling of energy consumption in DCworms}

DCworms is an open framework in which various models and algorithms can be investigated, as presented in Section \ref{sec:apps}. In this section, we discuss possible approaches to modeling that can be applied to the simulation of the energy efficiency of distributed computing systems. In general, to facilitate the simulation process, DCworms provides basic implementations of power consumption, air throughput and thermal models. We introduce power consumption models as examples and validate some of them by experiments on a real computing system (in Section \ref{sec:experiments}). A description of the thermal models and the corresponding experiments was presented in \cite{e2dc13}.

The most common question explored by researchers who study the energy efficiency of distributed computing systems is how much energy $E$ these systems require to execute workloads. In order to obtain this value, the simulator must calculate the values of power $P_i(t)$ and load $L_i(t)$ over time for all $m$ computing nodes, $i=1..m$. The load function may depend on the specific load models applied. In more complex cases it can even be defined as a vector of different resource usages over time. In a simple case the load can be either idle or busy, but even then an estimation of the job processing times $p_j$ is needed to calculate the total energy consumption. The total energy consumption of the computing nodes is given by (\ref{eq:E}):

\begin{equation}
E=\sum_i^m{\int_t{P_i(t)}\,dt} \label{eq:E}
\end{equation}

…

The power function may depend on the load and states of resources, or even on specific applications, as explained in Section~\ref{sec:power}. The total energy can also be completed by adding the constant power usage of components that do not depend on the load or state of resources.
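Equation (\ref{eq:E}) is straightforward to evaluate numerically once a simulation run has produced a power trace per node. A sketch, assuming piecewise-constant power between simulation events (which matches an event-driven simulator):

\begin{verbatim}
// Numerical evaluation of E = sum_i integral P_i(t) dt for
// piecewise-constant power traces. Illustrative only.
class EnergyAccounting {
    // timesSec[k] are event timestamps; powerW[k] is the power level
    // that holds from timesSec[k] until timesSec[k+1].
    static double nodeEnergyJoules(double[] timesSec, double[] powerW) {
        double energy = 0.0;
        for (int k = 0; k + 1 < timesSec.length; k++) {
            energy += powerW[k] * (timesSec[k + 1] - timesSec[k]);
        }
        return energy; // joules; divide by 3.6e6 to obtain kWh
    }

    // Sums the per-node integrals over all m nodes.
    static double totalEnergyJoules(double[][] times, double[][] powers) {
        double total = 0.0;
        for (int i = 0; i < times.length; i++) {
            total += nodeEnergyJoules(times[i], powers[i]);
        }
        return total;
    }
}
\end{verbatim}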
In large computing systems, which are often characterized by high computational density, the total energy consumption of computing nodes is not the only result of interest to researchers. Temperature distribution is becoming more and more important, as it affects the energy consumption of cooling devices, which can reach up to half of the total data center energy use.
In order to obtain accurate values of temperatures, heat transfer simulations based on Computational Fluid Dynamics (CFD) methods have to be performed. These methods require as input (i.e. boundary conditions) the heat dissipated by the IT hardware and the air throughput generated by fans at the servers' outlets. Another approach is based on simplified thermal models that, without costly CFD calculations, provide rough estimations of temperatures. DCworms enables the use of both approaches. In the former, the output of simulations, including the power usage of computing nodes over time and the air throughput at node outlets, can be passed to a CFD solver. Details addressing these integration issues are introduced in \cite{d2.2}.
%This option is further elaborated in Section \ref{sec:coolemall}. Simplified thermal models required by the latter approach are proposed in \ref{sec:thermal}.

…

time. However, experiments performed on several HPC servers show that this dependency does not reflect the theoretical shape and is often close to linear, as presented in Figure \ref{fig:power_freq}. This phenomenon can be explained by the impact of components other than the CPU and by the narrow range of available voltages. A good example of the impact of other components is the power usage of servers with a visible influence of fans, as illustrated in Figure \ref{fig:fans_P}.

For these reasons, DCworms allows users to define dependencies between power usage and resource states (such as the CPU frequency) in the form of tables or arbitrary functions using energy estimation plugins.

The energy consumption models provided by default can be classified into the following groups, starting from the simplest model up to the more complex ones. Users can easily switch between the given models and incorporate new, visionary scenarios.

…

\end{equation}

Within DCworms we built in a static-approach model that uses the common resource states affecting power usage, namely the CPU power states. Hence, with each node power state, understood as a possible operating state (P-state), we associated a power consumption value derived from the averaged values of measurements obtained for different types of applications. We also distinguish an idle state. Therefore, the current power usage of the node can be expressed as $P = P_{idle} + P_{f}$, where $P$ denotes the power consumed by the node, $P_{idle}$ is the power usage of the node in the idle state, and $P_{f}$ stands for the power usage of a CPU operating at the given frequency level.
Additionally, node power states are taken into account to reflect no (or limited) power usage when a node is off.

\subsection{Resource load}

…

Unfortunately, to verify this model and adjust it to specific hardware, the power usage of particular subcomponents such as the CPU or memory must be measured. As this is usually difficult, other models, based on the total power use, can be applied.

An example is a model applied in DCworms based on real measurements (see Section \ref{sec:models} for more details):

\begin{equation}

…

\end{equation}

where $P$ denotes the power consumed by the node executing the given application, $P_{idle}$ is the power usage of the node in the idle state, $L$ is the current utilization level of the node, $P_{cpubase}$ stands for the power usage of a fully loaded CPU working at the lowest frequency, $c$ is a constant factor indicating the increase of power consumption with respect to the frequency increase, $f$ is the current frequency, $f_{base}$ is the lowest available frequency within the given CPU, and $P_{app}$ denotes the additional power usage derived from executing a particular application ($P_{app}$ is a constant appointed experimentally for each application in order to extract the part of the power consumption that is independent of the load and specific to the particular type of task).
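The equation body itself is elided in this hunk, so the following sketch must be read as a placeholder: it assumes, purely for illustration, a linear dependency of the CPU part on frequency, $P = P_{idle} + L \cdot (P_{cpubase} + c \cdot (f - f_{base})) + P_{app}$, wired together from the symbols defined above.

\begin{verbatim}
// Placeholder implementation of the measured power model described above.
// The actual DCworms formula is elided in this changeset; the linear form
// below is an assumption made only to show how the named inputs combine.
class MeasuredPowerModel {
    double pIdle;    // node power in the idle state [W]
    double pCpuBase; // power of a fully loaded CPU at the lowest frequency [W]
    double c;        // fitted increase of power per MHz of frequency increase
    double fBase;    // lowest available CPU frequency [MHz]

    // load in [0..1]; f = current frequency [MHz];
    // pApp = application-specific constant, fitted per application [W].
    double power(double load, double f, double pApp) {
        return pIdle + load * (pCpuBase + c * (f - fBase)) + pApp;
    }
}
\end{verbatim}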
…

%\subsection{Air throughput models}\label{sec:air}

%DCworms comes with the following air throughput models.
%By default, air throughput estimations are performed according to the first one.

…

\subsection{Testbed description}

To obtain values of power consumption that could later be used in the DCworms environment to build the model and to evaluate resource management policies, we ran a set of applications / benchmarks on a physical testbed. For experimental purposes we chose the Christmann high-density Resource Efficient Cluster Server (RECS) system \cite{e2dc12}. A single RECS unit consists of 18 single-CPU modules, each of which can be treated as an individual node of PC class. The configuration of our RECS unit is presented in Table~\ref{testBed}.

\begin{table}[h!]

…

Table~\ref{nodeBasePowerUsage} and Table~\ref{appPowerUsage} contain the values of $P_{cpubase}$ and $P_{app}$, respectively, obtained for the particular applications and resource architectures. The lack of a corresponding value means that the application did not run on the given type of node.

\begin{table}[h!]
\centering

…

\subsection{Methodology}

Every chosen application / benchmark was executed on each type of node, for all frequencies supported by the CPU and for different levels of parallelization (number of cores). To eliminate the problem of assessing which part of the power consumption comes from which application when more than one application is run on a node, the queuing system (SLURM) was configured to run jobs in exclusive mode (one job per node). Such a configuration is often used for at least a dedicated part of HPC resources. The advantage of the exclusive-mode scheduling policy is that a job gets all the resources of the assigned nodes for optimal parallel performance, and applications running on the same node do not influence each other. For every configuration of application, type of node and CPU frequency we measured the average power consumption of the node and the execution time. These values were used to configure the DCworms environment, providing the energy and execution time models.
Based on the models obtained for the considered set of resources and applications, we evaluated a set of resource management strategies in terms of the energy consumption needed to execute four workloads varying in load intensity (10\%, 30\%, 50\%, 70\%). The differences in load were obtained by applying various intervals (3000, 1200, 720 and 520 seconds, respectively) between the submission times of two successive tasks. In all cases the number of tasks was equal to 1000. Moreover, we differentiated the applications in terms of the number of cores allocated by them and their type.
591 To generate a workload we used the DCWoRMS workload generator tool with the aforementioned characteristics gathered in Table~\ref{workloadCharacteristics}.
590 To generate the workloads we used the DCworms workload generator tool with the aforementioned characteristics, gathered in Table~\ref{workloadCharacteristics}.
592 591
593 592 \begin {table}[ tp]
… …
623 622 \end {table}
624 623
625 In all cases we assumed that tasks are scheduled and served in order of their arrival (FIFO strategy) with easy backfilling approach.
624 In all cases we assumed that tasks are scheduled and served in order of their arrival (FIFO strategy) using a relaxed backfilling (RB) approach, which allows an indefinite delay of the highest-priority task. Moreover, tasks were assigned to nodes under the condition that they can be placed only on nodes of a type on which the application was able to run (in other words, for which we had the corresponding values of power consumption and execution time).
626 625
627 626 \subsection{Computational analysis}
628 627
629 In the following section we present the results obtained for the workload with load density equal to 70\% in the light of five resource management and scheduling strategies. The scheduling strategies were evaluated according to two criteria: total energy consumption and maximum completion time of all tasks (makespan). These evaluation criteria employed in our experiments represent interests of various groups of stakeholders present in clouds and grids.
630 Then we discusses the corresponding results received for workloads with other density level.
628 In the following section we present the results obtained for the workload with load density equal to 70\%, in the light of five resource management and scheduling strategies. The scheduling strategies were evaluated according to two criteria: total energy consumption and maximum completion time of all tasks (makespan). These evaluation criteria represent the interests of various groups of stakeholders present in data centers.
629 Then we discuss the corresponding results obtained for workloads with other density levels.
631 630
632 631 \subsubsection{Random approach}
633 632
634 The first considered by us policy was the Random (R) strategy in which tasks were assigned to nodes in a random manner with the condition that they can be assigned only to nodes of the type on which the application was able to run (in other words - we had the corresponding value of power consumption and execution time). The Random strategy is only the reference one and will be later used to compare benefits in terms of energy efficiency resulting from more sophisticated algorithms. Criteria values are as follows: \textbf{total energy usage}: 46.883 kWh, \textbf{workload completion time}: 533 820 s.
635 Figure~\ref{fig:70r} presents the energy consumption, load of the system and obtained schedule, respectively.
633 The first policy we considered was the Random (R) strategy, in which tasks were assigned to nodes in a random manner. The Random strategy serves only as a reference and will later be used to assess the energy-efficiency benefits of more sophisticated algorithms. The criteria values are as follows: \textbf{total energy usage}: 46.883 kWh, \textbf{workload completion time}: 533 820 s.
634 %Figure~\ref{fig:70r} presents the energy consumption, load of the system and obtained schedule, respectively.
635
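The assignment rule of the Random strategy, with the compatibility condition stated above, can be sketched as follows (the profiles map is a hypothetical stand-in for the measured data):

\begin{verbatim}
import random

# Random (R) policy sketch: a task may go only to a node type for which
# a measured (power, execution time) profile of its application exists.
# profiles: (app, node_type) -> (power_W, exec_time_s)
def random_assign(task, free_nodes, profiles):
    """free_nodes: list of (node_id, node_type); returns a node or None."""
    compatible = [n for n in free_nodes
                  if (task["app"], n[1]) in profiles]
    return random.choice(compatible) if compatible else None  # None: task waits
\end{verbatim}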
636 %\begin{figure}[h!]
637 %\centering
638 %\includegraphics[width = 12cm]{fig/70r.png}
639 %\caption{\label{fig:70r} Random strategy}
640 %\end{figure}
641
642 In the second version of this strategy, which is becoming more popular due to energy costs, we switched off unused nodes to reduce the total energy consumption. In the previous version, unused nodes were not switched off, which is still the primary mode of operation in many HPC centers. In this version of the experiment we neglected the additional cost and time necessary to change the power state of resources. As can be observed in Figure~\ref{fig:70r_rnpm}, switching off unused nodes led to a decrease of the total energy consumption.
643
644 %\begin{figure}[h!]
645 %\centering
646 %\includegraphics[width = 6cm]{fig/70rnpm.png}
647 %\caption{\label{fig:70rnpm} Random + switching off unused nodes strategy}
648 %\end{figure}
636 649
637 650
638 651 \begin{figure}[h!]
639 652 \centering
640 \includegraphics[width = 12cm]{fig/70r.png}
641 \caption{\label{fig:70r} Random strategy}
653 \includegraphics[width = 12cm]{fig/70r_rnpm.png}
654 \caption{\label{fig:70r_rnpm} Comparison of energy usage for the Random (left) and Random + switching off unused nodes (right) strategies}
642 655 \end{figure}
643 656
644 In the second version of this strategy, which is getting more popular due to energy costs, we switched off unused nodes to reduce the total energy consumption. In the previous one, unused nodes are not switched off, which case is still the primary one in many HPC centers.
645
646 \begin{figure}[h!]
647 \centering
648 \includegraphics[width = 6cm]{fig/70rnpm.png}
649 \caption{\label{fig:70rnpm} Random + switching off unused nodes strategy}
650
651 \end{figure}
652 In this version of experiment we neglected additional cost and time necessary to change the power state of resources. As can be observed in the power consumption chart in the Figure~\ref{fig:70rnpm}, switching off unused nodes led to decrease of the total energy consumption. As expected, with respect to the makespan criterion, both approaches perform equally reaching \textbf{workload completion time}: 533 820 s. However, the pure random strategy was significantly outperformed in terms of energy usage, by the policy with additional node power management with its \textbf{total energy usage}: 36.705 kWh. The overall energy savings reached 22\%.
657 As expected, with respect to the makespan criterion, both approaches perform equally, reaching a \textbf{workload completion time} of 533 820 s. However, the pure random strategy was significantly outperformed in terms of energy usage by the policy with additional node power management, with its \textbf{total energy usage} of 36.705 kWh. The overall energy savings reached 22\%.
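A back-of-the-envelope check of the reported saving (the totals below are the reported results, not a re-simulation; the per-node decomposition indicates where such savings come from in general):

\begin{verbatim}
# Powering down idle nodes removes their idle-state draw, i.e. roughly
#   E_saved = sum(P_idle[n] * idle_time[n] for n in nodes)
E_random     = 46.883   # kWh, unused nodes left idle
E_random_off = 36.705   # kWh, unused nodes switched off

saved = E_random - E_random_off
print(f"saved {saved:.3f} kWh = {saved / E_random:.1%}")
# -> saved 10.178 kWh = 21.7% (the ~22% quoted above)
\end{verbatim}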
658 659 \subsubsection{Energy optimization}
660 661
The next two evaluated resource management strategies try to decrease the total energy consumption (EO) caused by the execution of the whole workload. They take into account differences in applications and hardware profiles by trying to find the most energy efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes, thus for the whole time needed to execute workload nodes consume at least power for idle state. To obtain the minimal energy consumption, tasks has to be assigned to the nodes of type for which the difference between energy consumption for the node running the application and in the idle state is minimal. The power usage measured in idle state for three types of nodes is gathered in the Table~\ref{idlePower}.
661 The next two evaluated resource management strategies try to decrease the total energy consumption (EO) caused by the execution of the whole workload. They take into account differences in application and hardware profiles, trying to find the most energy-efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes; thus, for the whole time needed to execute the workload, the nodes consume at least the idle-state power. To obtain the minimal energy consumption, tasks have to be assigned to the nodes for which the difference between the energy usage of the node running the application and that of the node in the idle state is minimal. The power consumption measured in the idle state for the three types of nodes is gathered in Table~\ref{idlePower}.
657 662
658 663 \begin {table}[h!]
… …
660 665 \begin{tabular}{ll}
661 666 \hline
662 Type & Power usage in idle state [W] \\
667 Type of processor within the node & Power usage in idle state [W] \\
663 668 \hline
664 669 Intel i7 & 11.5 \\
… …
670 675 \end {table}
671 676
672 As mentioned, we assign tasks to nodes minimizing the value of expression: $(P-P_{idle})*exec\_time$, where $P$ denotes observed power of the node running the particular application and $exec\_time$ refers to the measured application running time. Based on the application and hardware profiles, we expected that Atom D510 would be the preferred node. Obtained schedule, that is presented in the Gantt chart in Figure~\ref{fig:70eo} along with the energy and system usage, confirmed our assumptions. Atom D510 nodes are nearly fully loaded, while the least energy-favourable AMD nodes are used only when other ones are busy.
677 As mentioned, we assign tasks to nodes minimizing the value of the expression $(P-P_{idle})*exec\_time$, where $P$ denotes the observed power of the node running the particular application and $exec\_time$ refers to the measured application running time. Based on the application and hardware profiles, we expected that Atom D510 would be the preferred node. The obtained schedule, presented in the Gantt chart in Figure~\ref{fig:70eo}, confirmed our assumptions: Atom D510 nodes are nearly fully loaded, while the least energy-favourable AMD nodes are used only when the other ones are busy.
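The selection rule can be sketched as below; the idle-power values other than the Intel i7 entry are placeholders, since the corresponding table rows are elided in this excerpt:

\begin{verbatim}
# Energy-optimization (EO) rule: among node types able to run the
# application, minimize (P - P_idle) * exec_time, i.e. the energy added
# on top of the idle baseline.
# profiles: (app, node_type) -> (power_W, exec_time_s)
P_IDLE = {"Intel i7": 11.5,          # from Table idlePower
          "AMD Fusion": 10.0,        # placeholder (row elided here)
          "Atom D510": 9.0}          # placeholder (row elided here)

def eo_choose_node_type(app, profiles, p_idle=P_IDLE):
    candidates = [(ntype, (p - p_idle[ntype]) * t)
                  for (a, ntype), (p, t) in profiles.items()
                  if a == app and ntype in p_idle]
    return min(candidates, key=lambda c: c[1])[0] if candidates else None
\end{verbatim}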
673 678
674 679 \begin{figure}[h!]
675 680 \centering
676 \includegraphics[width = 12cm]{fig/70eo.png}
681 \includegraphics[width = 10cm]{fig/70eoGantt.png}
677 682 \caption{\label{fig:70eo} Energy usage optimization strategy}
678 683 \end{figure}
… …
686 691 \begin{figure}[h!]
687 692 \centering
688 \includegraphics[width = 12cm]{fig/70eonpm.png}
693 \includegraphics[width = 10cm]{fig/70eonpmGantt.png}
689 694 \caption{\label{fig:70eonpm} Energy usage optimization + switching off unused nodes strategy}
690 695 \end{figure}
694 699 \subsubsection{Downgrading frequency}
695 700
696 The last case considered by us is modification of the random strategy. We assume that tasks do not have deadlines and the only criterion which is taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure for the lowest possible frequencies of CPUs (LF). The experiment was intended to check if the benefit of running the workload on less power-consuming frequency of CPU is not leveled by the prolonged time of execution of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77.109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload that requires almost twice more time, can not be compensate by the lower power draw. Moreover, as it can be observed on the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceeds the corresponding values on other servers.
701 The last case we considered is a modification of the random strategy. We assume that tasks do not have deadlines and that the only criterion taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure to use the lowest possible CPU frequencies (LF). The experiment was intended to check whether the benefit of running the workload at a less power-consuming CPU frequency is not leveled by the prolonged execution time of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77.109 kWh. As we can see, for the given load of the system (70\%), the cost of running a workload that requires almost twice as much time cannot be compensated by the lower power draw. Moreover, as can be observed in the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceed the corresponding values on the other servers.
697 702
698 703 \begin{figure}[h!]
… …
749 754 \end {table}
750 755
751 Referring to the Table~\ref{loadEnergy}, one should easily note that gain from switching off unused nodes decreases with the increasing workload density. In general, for the highly loaded system such policy does not find an application due to the cost related to this process and relatively small benefits. Another interesting conclusion, refers to the poor result for Random strategy with downgrading the frequency approach. The lack of improvement on the energy usage criterion for higher system load can be explained by the relatively small or no benefit obtained for prolonging the task execution, and thus, maintaining the node in working state. The cost of longer workload completion, can not be compensate by the very little energy savings derived from the lower operating state of node. The greater criteria values for the higher system load are the result of greater time space between submission of successive tasks, and thus longer workload execution.
752 We also demonstrated differences between power usage models. They span from rough static approach to accurate application specific models. However, the latter can be difficult or even infeasible to use as it requires real measurements for specific applications beforehand. This issue can be partially resolved by introducing application profiles and classification, which can deteriorate the accuracy though. This issue is begin studied more deeply within CoolEmAll project.
753
754
755
756 %\section{DCWoRMS application/use cases}\label{sec:coolemall}
757
758 %DCWoRMS in CoolEmAll, integration with CFD
756 Referring to Table~\ref{loadEnergy}, one can easily note that the gain from switching off unused nodes decreases with increasing workload density. In general, for a highly loaded system such a policy does not find an application, due to the cost related to this process and the relatively small benefits.
Another interesting conclusion refers to the poor result of the Random strategy combined with the frequency-downgrading approach. The lack of improvement on the energy usage criterion for higher system loads can be explained by the relatively small (or no) benefit obtained from prolonging the task execution and thus keeping the node in a working state. The cost of the longer workload completion cannot be compensated by the very small energy savings derived from the lower operating state of the nodes. The greater criteria values for the higher system loads are the result of the greater time span between submissions of successive tasks, and thus the longer workload execution. Based on Table~\ref{loadMakespan}, one should note that the differences in workload completion times are relatively small for all evaluated policies, except the Random + lowest frequency approach.
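To make the tradeoff concrete with the reported numbers: energy equals average power times time, so a lower power draw pays off only if the makespan grows by less than the power drops.

\begin{verbatim}
E_r,  T_r  = 46.883, 533820      # Random, nominal frequencies [kWh, s]
E_lf, T_lf = 77.109, 1065356     # Random, lowest frequencies  [kWh, s]

P_r  = E_r  / (T_r  / 3600)      # ~0.316 kW average draw
P_lf = E_lf / (T_lf / 3600)      # ~0.261 kW average draw

print(f"power {1 - P_lf/P_r:.0%} lower, time {T_lf/T_r - 1:.0%} longer, "
      f"energy {E_lf/E_r - 1:.0%} higher")
# -> power 18% lower, time 100% longer, energy 64% higher
\end{verbatim}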
757 We also demonstrated differences between power usage models. They span from a rough static approach to accurate application-specific models. However, the latter can be difficult or even infeasible to use, as it requires real measurements for specific applications beforehand. This issue can be partially resolved by introducing application profiles and classification, which can, however, deteriorate the accuracy. This issue is being studied more deeply within the CoolEmAll project.
758 759
760 761 %\section{DCworms application/use cases}\label{sec:coolemall}
762 763 %DCworms in CoolEmAll, integration with CFD
759 764
760 765 %...
761 766
762 %Being based on the GSSIM framework, that has been successfully applied in a substantial number of research projects and academic studies, DCWoRMS with its sophisticated energy extension has become an essential tool for studies of energy efficiency in distributed environments. For this reason, it has been adopted within the CoolEmAll project as a component of Simulation, Visualisation and Decision Support (SVD) Toolkit. In general the main goal of CoolEmAll is to provide advanced simulation, visualisation and decision support tools along with blueprints of computing building blocks for modular data centre environments. Once developed, these tools and blueprints should help to minimise the energy consumption, and consequently the CO2 emissions of the whole IT infrastructure with related facilities. The SVD Toolkit is designed to support the analysis and optimization of IT modern infrastructures. For the recent years the special attention has been paid for energy utilized by the data centers which considerable contributes to the data center operational costs. Actual power usage and effectiveness of energy saving methods heavily depends on available resources, types of applications and workload properties. Therefore, intelligent resource management policies are gaining popularity when considering the energy efficiency of IT infrastructures.
767 %Being based on the GSSIM framework, which has been successfully applied in a substantial number of research projects and academic studies, DCworms with its sophisticated energy extension has become an essential tool for studies of energy efficiency in distributed environments. For this reason, it has been adopted within the CoolEmAll project as a component of the Simulation, Visualisation and Decision Support (SVD) Toolkit. In general, the main goal of CoolEmAll is to provide advanced simulation, visualisation and decision support tools along with blueprints of computing building blocks for modular data centre environments. Once developed, these tools and blueprints should help to minimise the energy consumption, and consequently the CO2 emissions, of the whole IT infrastructure with related facilities. The SVD Toolkit is designed to support the analysis and optimization of modern IT infrastructures. In recent years, special attention has been paid to the energy utilized by data centers, which contributes considerably to data center operational costs. Actual power usage and the effectiveness of energy saving methods depend heavily on the available resources, types of applications and workload properties. Therefore, intelligent resource management policies are gaining popularity when considering the energy efficiency of IT infrastructures.
768 %Hence, the SVD Toolkit also integrates workload management and scheduling policies to support complex modeling and optimization of modern data centres.
764 769
765 %The main aim of DCWoRMS within CoolEmAll project is to enable studies of dynamic states of IT infrastructures, like power consumption and air throughput distribution, on the basis of changing workloads, resource model and energy-aware resource management policies.
766 %In this context, DCWoRMS takes into account the specific workload and application characteristics as well as detailed resource parameters. It will benefit from the CoolEmAll benchmarks and classification of applications and workloads. In particular various types of workload, including data centre workloads using virtualization and HPC applications, may be considered. The knowledge concerning their performance and properties as well as information about their energy consumption and heat production will be used in simulations to study their impact on thermal issues and energy efficiency. Detailed resource characteristics, will be also provided according to the CoolEmAll blueprints. Based on this data, workload simulation will support evaluation process of various resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamic switching off nodes, DVFS and thermal-aware methods. In addition to typical approaches minimizing energy consumption, policies that prevent too high temperatures in the presence of limited cooling (or no cooling) may also be analyzed. Moreover, apart from the set of predefined strategies, new approaches can easily be applied and examined.
770 %The main aim of DCworms within the CoolEmAll project is to enable studies of dynamic states of IT infrastructures, like power consumption and air throughput distribution, on the basis of changing workloads, resource models and energy-aware resource management policies.
771 %In this context, DCworms takes into account the specific workload and application characteristics as well as detailed resource parameters. It will benefit from the CoolEmAll benchmarks and classification of applications and workloads. In particular, various types of workload, including data centre workloads using virtualization and HPC applications, may be considered. The knowledge concerning their performance and properties, as well as information about their energy consumption and heat production, will be used in simulations to study their impact on thermal issues and energy efficiency. Detailed resource characteristics will also be provided according to the CoolEmAll blueprints. Based on this data, workload simulation will support the evaluation process of various resource management approaches.
These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamic switching off of nodes, DVFS and thermal-aware methods. In addition to typical approaches minimizing energy consumption, policies that prevent too high temperatures in the presence of limited cooling (or no cooling) may also be analyzed. Moreover, apart from the set of predefined strategies, new approaches can easily be applied and examined.
767 772 %The outcome of the workload and resource management simulation phase is a distribution of power usage and air throughput for the computing models specified within the SVD Toolkit. These statistics may be analyzed directly by data centre designers and administrators and/or provided as an input to the CFD simulation phase. The former case allows studying how the above metrics change over time, while the latter harnesses CFD simulations to identify temperature differences between the computing modules, called hot spots. The goal of this scenario is to visualise the behavior of the temperature distribution within a server room with a number of racks, for different types of executed workloads and for various policies used to manage these workloads.
768 773
… …
770 775 \section{Conclusions and future work}
771 776
772 In this paper we presented a Data Center Workload and Resource Management Simulator (DCWoRMS) which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies. DCWoRMS provides broad options of customization and combines detailed applications and workloads modeling with simulation of data center resources including their power usage and thermal behavior.
777 In this paper we presented the Data Center Workload and Resource Management Simulator (DCworms), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies. DCworms provides broad options of customization and combines detailed application and workload modeling with simulation of data center resources, including their power usage and thermal behavior.
773 778 We showed its energy-efficiency related features and proposed three methods of power usage modeling: static (fully defined by the resource state), dynamic (defined by a function of parameters such as CPU frequency and load), and mapping (based on the power usage of specific applications). We compared the results of simulations to measurements of real servers and showed differences in the accuracy and usability of these models.
774 We also demonstrated DCWoRMS capabilities to implement various resource management policies including workload scheduling and node power management. The experimental studies we conducted shown that their impact on overall energy-efficiency depends on a type of servers, their power usage in idle time, possibility of switching off nodes as well as level of load.
775 DCWoRMS is a part of the Simulation, Visualisation and Decision Support (SVD) Toolkit being developed within the CoolEmAll project. The aim of this toolkit is, based on data center building blocks defined by the project, to analyze energy-efficiency of data centers taking into account various aspects such as heterogenous hardware architectures, applications, management policies, and cooling. DCWoRMS will take as an input open models data center building blocks and application profiles.
DCWoRMS will be applied to evaluation of resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamic switching off nodes, DVFS and thermal-aware methods. Output of simulations will include distribution of servers' power usage in time along with estimations of server outlets air flow. These data will be passed to Computational Fluid Dynamics (CFD) simulations using OpenFOAM solver and to advanced 3D visualization. In this way users will be able to study energy-efficiency of a data center from a detailed analysis of workloads and policies to the impact on heat transfer and overall energy consumption.
779 We also demonstrated the DCworms capabilities to implement various resource management policies, including workload scheduling and node power management. The experimental studies we conducted showed that their impact on overall energy-efficiency depends on the type of servers, their power usage in idle time, the possibility of switching off nodes, as well as the level of load.
780 DCworms is a part of the Simulation, Visualisation and Decision Support (SVD) Toolkit being developed within the CoolEmAll project. The aim of this toolkit is, based on the data center building blocks defined by the project, to analyze the energy-efficiency of data centers taking into account various aspects such as heterogeneous hardware architectures, applications, management policies, and cooling. DCworms will take as an input the open models of the data center building blocks and application profiles. DCworms will be applied to the evaluation of resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation, dynamic switching off of nodes, DVFS and thermal-aware methods. The output of simulations will include the distribution of servers' power usage over time, along with estimations of server outlet air flow. These data will be passed to Computational Fluid Dynamics (CFD) simulations using the OpenFOAM solver and to advanced 3D visualization. In this way users will be able to study the energy-efficiency of a data center from a detailed analysis of workloads and policies down to the impact on heat transfer and overall energy consumption.
776 781
Thus, future work on DCWoRMS will focus on more precise power, air-throughput, and thermal models. Additional research directions will include modeling application execution phases, adding predefined common HPC and cloud management policies and application performance and resource power models.
782 Thus, future work on DCworms will focus on more precise power, air-throughput, and thermal models. Additional research directions will include modeling application execution phases, adding predefined common HPC and cloud management policies, and application performance and resource power models.
777 783
778 784 \section*{Acknowledgement}
… …
818 823 \bibitem{fit4green} A. Berl, E. Gelenbe, M. di Girolamo, G. Giuliani, H. de Meer, M.-Q. Dang, K. Pentikousis. Energy-Efficient Cloud Computing. The Computer Journal, 53(7), 2010.
819 824
825 \bibitem{e2dc13} M. vor dem Berge, G. Da Costa, M. Jarus, A. Oleksiak, W. Piatek, E. Volk. Modeling Data Center Building Blocks for Energy-efficiency and Thermal Simulations. 2nd International Workshop on Energy-Efficient Data Centres, Berkeley, 2013.
826
820 827 \bibitem{e2dc12} M. vor dem Berge, G. Da Costa, A. Kopecki, A. Oleksiak, J-M. Pierson, T. Piontek, E. Volk, S. Wesner.
Modeling and Simulation of Data Center Energy-Efficiency in CoolEmAll. Energy Efficient Data Centers, Lecture Notes in Computer Science, Volume 7396, 2012, pp. 25-36.
821 828
822 829 \bibitem{CloudSim} R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya. CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms. Software: Practice and Experience (SPE), Volume 41, Number 1, pp. 23-50, ISSN: 0038-0644, Wiley Press, New York, USA, January 2011.
823 830
824 834 \bibitem{DCSG} http://dcsg.bcs.org/welcome-dcsg-simulator
825 835
… …
846 856 \bibitem{fit4green_scheduler} O. M{\"a}mmel{\"a}, M. Majanen, R. Basmadjian, H. De Meer, A. Giesler, W. Homberg. Energy-aware job scheduler for high-performance computing. Computer Science - Research and Development, November 2012, Volume 27, Issue 4, pp. 265-275.
847 857
858 \bibitem{d2.2} U. Woessner, E. Volk, G. Gallizo, M. vor dem Berge, G. Da Costa, P. Domagalski, W. Piatek, J-M. Pierson. (2012) D2.2 Design of the CoolEmAll simulation and visualisation environment - CoolEmAll Deliverable, http://coolemall.eu
848 860 % web links
849 861