Changeset 1065 for papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex
- Timestamp: 05/31/13 14:37:45
- File: 1 edited
papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex
r1062 → r1065

%% \fntext[label3]{}

\title{DCworms - a tool for simulation of energy efficiency in distributed computing infrastructures}

%% use optional labels to link authors explicitly to addresses:

…

%% Text of abstract

In recent years, the energy efficiency of computing infrastructures has gained considerable attention. For this reason, proper estimation and evaluation of the energy required to execute data center workloads has become an important research problem. In this paper we present the Data Center Workload and Resource Management Simulator (DCworms), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies.
We discuss methods of power usage modeling available in the simulator. To this end, we compare results of simulations to measurements of real servers.
To demonstrate DCworms capabilities, we evaluate the impact of several resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources.

\end{abstract}

…

\section{Introduction}

The rising popularity of large-scale computing infrastructures has caused rapid development of data centers. Nowadays, data centers are responsible for around 2\% of global energy consumption, a demand comparable to that of the aviation industry \cite{koomey}. Moreover, in many current data centers the actual IT equipment uses only half of the total energy, whereas most of the remaining part is required for cooling and air movement, resulting in poor Power Usage Effectiveness (PUE) \cite{pue} values. Owing to large energy needs and significant $CO_2$ emissions, issues related to cooling, heat transfer, and IT infrastructure location are studied more and more carefully during the planning and operation of data centers.
%Even if we take ecological and footprint issues aside, the amount of consumed energy can impose strict limits on data centers. First of all, energy bills may reach millions of euros, making computations expensive.
%Furthermore, available power supply is usually limited, so it may also reduce data center development capabilities, especially in light of the challenges related to the exascale computing breakthrough foreseen within this decade.

For these reasons many efforts have been undertaken to measure and study the energy efficiency of data centers. Some projects focus on data center monitoring and management \cite{games}\cite{fit4green}, whereas others address the energy efficiency of networks \cite{networks}. Additionally, vendors offer a wide spectrum of energy-efficient solutions for computing and cooling \cite{sgi}\cite{colt}\cite{ecocooling}. However, a variety of solutions and configuration options can be applied when planning new or upgrading existing data centers.
In order to optimize the design or configuration of a data center, we need a thorough study using appropriate metrics and tools that evaluate how much computation or data processing can be done within a given power and energy budget and how it affects temperatures, heat transfers, and airflows within the data center.
Therefore, there is a need for simulation tools and models that approach the problem from the perspective of end users and take into account all the factors that are critical to understanding and improving the energy efficiency of data centers, in particular hardware characteristics, applications, management policies, and cooling.

…

There are various tools that allow simulation of computing infrastructures. On the one hand, they include advanced packages for modeling heat transfer and energy consumption in data centers \cite{ff} or tools concentrating on their financial analysis \cite{DCD_Romonet}. On the other hand, there are simulators focusing on computations, such as CloudSim \cite{CloudSim}. The CoolEmAll project aims to integrate these approaches and enable advanced analysis of data center efficiency taking all these aspects into account \cite{e2dc12}\cite{coolemall}.

One of the results of the CoolEmAll project is the Data Center Workload and Resource Management Simulator (DCworms), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies.
We discuss methods of power usage modeling available in the simulator. To this end, we compare results of simulations to measurements of real servers.
To demonstrate DCworms capabilities, we evaluate the impact of several resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources.

The remaining part of this paper is organized as follows. In Section~2 we give a brief overview of the current state of the art concerning modeling and simulation of distributed systems, such as Grids and Clouds, in terms of energy efficiency. Section~3 discusses the main features of DCworms. In particular, it introduces our approach to workload and resource management, presents the concept of energy-efficiency modeling, and explains how to incorporate a specific application performance model into simulations. Section~4 discusses the energy models adopted within DCworms. In Section~5 we assess the energy models by comparing simulation results with real measurements. We also present experiments performed using DCworms that show various resource and scheduling techniques which decrease the total energy consumption of the execution of a set of tasks. In Section~6 we explain how to integrate workload and resource simulations with heat transfer simulations within the CoolEmAll project. Final conclusions and directions for future work are given in Section~7.

\section{Related Work}\label{sota}

…

GreenCloud is a C++ based simulation environment for studying the energy efficiency of cloud computing data centers. CloudSim is a simulation tool that allows modeling of cloud computing environments and evaluation of resource provisioning algorithms. Finally, the DCSG Simulator is a data center cost and energy simulator calculating the power and cooling schema of the data center equipment.

The scope of the aforementioned toolkits concerns data center environments. However, all of them, except DCworms presented in this paper, restrict the simulated architecture in terms of the types of modeled resources. In this way, they impose the use of predefined sets of resources and relations between them. GreenCloud defines switches, links and servers that are responsible for task execution and may contain different scheduling strategies. Contrary to what the GreenCloud name may suggest, it does not allow testing the impact of virtualization-based approaches. CloudSim allows creating a simple resource hierarchy consisting of machines and processors. To simulate a real cloud computing data center, it provides an extra virtualization layer responsible for the virtual machine (VM) provisioning process and for managing the VM life cycle. In the DCSG Simulator the user is able to take into account a variety of mechanical and electrical devices as well as the IT equipment, and to define numerous factors for each of them, including device capacity and efficiency as well as the data center conditions.

The general idea behind all of the analyzed tools is to enable studies concerning energy efficiency in distributed infrastructures. The GreenCloud approach enables simulation of the energy usage associated with computing servers and network components. For example, the server power consumption model implemented in GreenCloud depends on the server state as well as its utilization. The CloudSim framework provides basic models to evaluate energy-conscious provisioning policies. Each computing node can be extended with a power model that estimates the current power consumption. Within the DCSG Simulator, the performance of each piece of data center equipment (facility and IT) is determined by a combination of factors, including workload, local conditions, the manufacturer's specifications and the way in which it is utilized. In DCworms, a plugin concept has been introduced that allows emulating the behavior of computing resources in terms of power consumption. Additionally, DCworms delivers detailed information concerning the resource and application characteristics needed to define more sophisticated power draw models.

In order to emulate the behavior of real computing systems, a green computing simulator should also address energy-aware resource management. In this respect, GreenCloud captures the effects of both Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management schemes. At the level of links and switches, it supports downgrading the transmission rate and putting network equipment into a sleep mode. CloudSim comes with a set of predefined and extensible policies that manage the process of VM migrations in order to optimize power consumption. However, the proposed approach is not sufficient for modeling more sophisticated policies like frequency scaling techniques and managing resource power states. The DCSG Simulator is said to implement a set of basic energy-efficiency rules that have been developed on the basis of a detailed understanding of the data center as a system. The output of this simulation is a set of energy metrics, like PUE, and cost data representing the IT devices. DCworms introduces a dedicated interface that provides methods to obtain detailed information about the energy consumption of each resource and its components, and allows changing their current energy state. The availability of these interfaces in the scheduling plugin supports the implementation of various strategies such as centralized energy management, self-management of computing resources, and mixed models.

In terms of application modeling, all tools except the DCSG Simulator describe the application with a number of computational and communication requirements. In addition, GreenCloud and DCworms allow introducing QoS requirements by taking time constraints into account during the simulation. The DCSG Simulator, instead of modeling a single application, enables the definition of a workload that leads to a given utilization level. However, only DCworms supports application performance modeling, by not only incorporating simple requirements that are taken into account during scheduling, but also by allowing specification of the task execution time.

GreenCloud, CloudSim and DCworms are released as Open Source under the GPL. The DCSG Simulator is available under an OSL V3.0 open-source license; however, it can only be accessed by DCSG members.

Summarizing, DCworms stands out from other tools due to its flexibility in terms of data center equipment and structure definition.
Moreover, it allows associating the energy consumption not only with the current power state and resource utilization but also with the particular set of applications running on a resource, and it does not limit the user in defining various types of resource management policies. The main strength of CloudSim lies in its implementation of complex scheduling and task execution schemes involving resource virtualization techniques. However, its energy-efficiency aspect is limited to VM management. GreenCloud focuses on data center resources, with particular attention to the network infrastructure and the most popular energy management approaches. The DCSG simulator also allows taking non-computing devices into account; nevertheless, it seems hardly customizable to specific workloads and management policies.
\section{DCworms}

Figure~\ref{fig:arch} presents the overall architecture of the simulation tool.

The Data Center Workload and Resource Management Simulator (DCworms) is a simulation tool based on the GSSIM framework \cite{GSSIM} developed by Poznan Supercomputing and Networking Center (PSNC).
GSSIM was proposed to provide an automated tool for experimental studies of various resource management and scheduling strategies in distributed computing systems. DCworms extends its basic functionality and adds features related to energy-efficiency issues in data centers. In this section we introduce the functionality of the simulator in terms of modeling and simulation of large-scale distributed systems like Grids and Clouds.

…

\centering
\includegraphics[width = 12cm]{fig/arch.png}
\caption{\label{fig:arch} DCworms architecture}

DCworms is an event-driven simulation tool written in Java. In general, input data for DCworms consist of workload and resource descriptions. They can be provided by the user, read from real traces, or generated using the generator module. In this respect, DCworms benefits from the GSSIM workload generator tool, which allows creating synthetic workloads \cite{GSSIM}. However, the key elements of the presented architecture are plugins. They allow researchers to configure and adapt the simulation environment to the peculiarities of their studies, starting from modeling job performance, through energy estimations, up to the implementation of resource management and scheduling policies. Each plugin can be implemented independently and plugged into a specific experiment. Results of experiments are collected, aggregated, and visualized using the statistics module. Due to its modular and pluggable architecture, DCworms can be applied to specific resource management problems and address different users' requirements.
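To give the reader a feel for the plugin mechanism, the sketch below shows what such a pluggable model can boil down to: a single callback from the simulator into user code that maps the current resource state to a power value. The interface and class names are hypothetical illustrations, not the actual DCworms API.

\begin{verbatim}
// Hypothetical sketch of a DCworms-style energy estimation plugin.
// All names are illustrative; they are not the real DCworms interfaces.
public interface EnergyEstimationPlugin {
    // Returns the estimated power draw (watts) of one resource,
    // given its utilization in [0..1] and its CPU frequency in MHz.
    double estimatePowerUsage(double utilization, double frequencyMHz);
}

class LinearPowerModel implements EnergyEstimationPlugin {
    private final double idlePower; // power of an idle node, in watts
    private final double maxPower;  // power of a fully loaded node, in watts

    LinearPowerModel(double idlePower, double maxPower) {
        this.idlePower = idlePower;
        this.maxPower = maxPower;
    }

    @Override
    public double estimatePowerUsage(double utilization, double frequencyMHz) {
        // Simple load-proportional model; frequency is ignored here.
        return idlePower + utilization * (maxPower - idlePower);
    }
}
\end{verbatim}

Because each plugin is an independent class, an experiment can swap one model for another without touching the rest of the simulation setup.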
\subsection{Workload modeling}

As mentioned above, experiments performed in DCworms require a description of the applications that will be scheduled during the simulation. As a primary definition, DCworms uses files in the Standard Workload Format (SWF) or its extension, the Grid Workload Format (GWF) \cite{GWF}. In addition to the SWF file, a more detailed specification of jobs and tasks can be included in an auxiliary XML file. This form of description extends the basic one and provides the scheduler with more detailed information about the application profile, task requirements, user preferences and execution time constraints, which are unavailable in SWF/GWF files. To facilitate the process of adapting traces from real resource management systems, DCworms supports reading traces delivered from the most common ones, like SLURM \cite{SLURM} and Torque \cite{TORQUE}.
Since applications may differ in nature, in terms of both their requirements and their structure, DCworms gives the user flexibility in defining the application model. Thus, the considered workloads may have various shapes and levels of complexity, ranging from multiple independent jobs, through large-scale parallel applications, up to whole workflows containing time dependencies and precedence constraints between jobs and tasks.
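For readers unfamiliar with the SWF format mentioned above: it is a plain-text trace format with one job per line, a fixed order of whitespace-separated fields (job number, submit time, wait time, run time, number of allocated processors, and so on), and header lines starting with a semicolon. A minimal reader for the fields used most often here might look as follows; this is a sketch only, not the parser shipped with DCworms, and the auxiliary XML described above would be merged in separately.

\begin{verbatim}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Minimal SWF reader, shown only to illustrate the workload input format.
class SwfReader {
    record Job(long id, long submitTimeSec, long runTimeSec, int allocatedProcs) {}

    static List<Job> read(Path swfFile) throws IOException {
        List<Job> jobs = new ArrayList<>();
        for (String line : Files.readAllLines(swfFile)) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith(";")) continue; // skip headers
            String[] f = line.split("\\s+");
            // SWF fields: 1=job number, 2=submit time, 3=wait time,
            // 4=run time, 5=number of allocated processors, ...
            jobs.add(new Job(Long.parseLong(f[0]), Long.parseLong(f[1]),
                             Long.parseLong(f[3]), Integer.parseInt(f[4])));
        }
        return jobs;
    }
}
\end{verbatim}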
Each job may consist of one or more tasks, and these can be seen as groups of processes. Moreover, DCworms is able to handle rigid and moldable jobs, as well as pre-emptive ones. To model the application profile in more detail, DCworms follows the DNA approach proposed in \cite{Ghislain}. Accordingly, each task can be presented as a sequence of phases, which shows the impact of the task on the resources that run it. Phases are periods of time in which the system is stable (load, network, memory) given a certain threshold. Each phase is linked to values of the system that represent a resource consumption profile. Such a stage could, for example, be described as follows: ``60\% CPU, 30\% net, 10\% mem''.
Levels of information about incoming jobs are presented in Figure~\ref{fig:jobsStructure}.

…

This form of representation allows users to define a wide range of workloads: HPC (long jobs, computationally intensive, hard to migrate) or virtualization (short requests), both of which are typical for data center environments.
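Such a phase-based profile maps naturally onto a simple data structure. The classes below are an illustrative sketch (not the DCworms data model) encoding the example stage quoted above:

\begin{verbatim}
import java.util.List;

// Illustrative encoding of the DNA-style phase profile described above;
// these classes are a sketch, not the actual DCworms data model.
class Phase {
    final long durationSec;     // how long the system stays in this stable phase
    final double cpu, net, mem; // resource consumption shares, in [0..1]

    Phase(long durationSec, double cpu, double net, double mem) {
        this.durationSec = durationSec;
        this.cpu = cpu;
        this.net = net;
        this.mem = mem;
    }
}

class TaskProfile {
    final List<Phase> phases; // the task is a sequence of stable phases
    TaskProfile(List<Phase> phases) { this.phases = phases; }
}

class ProfileDemo {
    // Example: a 300 s stage described as "60% CPU, 30% net, 10% mem".
    static final TaskProfile EXAMPLE =
        new TaskProfile(List.of(new Phase(300, 0.60, 0.30, 0.10)));
}
\end{verbatim}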
\subsection{Resource modeling}

The main goal of DCworms is to enable researchers to evaluate various resource management policies in diverse computing environments. To this end, it supports flexible definition of simulated resources on both the physical (computing resources) and the logical (scheduling entities) level. This flexible approach allows modeling of various computing entities consisting of compute nodes, processors and cores. In addition, the detailed location of the given resources can be provided in order to group them and arrange them into physical structures such as racks and containers. Each of the components may be described by different parameters specifying available memory, storage capabilities, processor speed, etc. In this way, it is also possible to describe the power distribution system and cooling devices. Due to the extensible description, users are able to define a number of experiment-specific and visionary characteristics. Moreover, dedicated profiles can be associated with every component to determine, among others, its power, thermal and air throughput properties. An energy estimation plugin can be bundled with each resource; this allows defining various power models that can then be followed by different computing system components. Details concerning the approach to energy-efficiency modeling in DCworms can be found in the next sections.

Scheduling entities allow providing data related to the brokering or queuing system characteristics. Thus, information about available queues, the resources associated with them, and their parameters, like priority or availability of an advance reservation (AR) mechanism, can be defined. Moreover, the allocation policy and task scheduling strategy for each scheduling entity can be introduced in the form of a reference to an appropriate plugin. DCworms allows building a hierarchy of schedulers corresponding to the hierarchy of resource components over which the tasks may be distributed.
In this way, DCworms supports simulation of a wide scope of physical and logical architectural patterns that may span from a single computing resource up to whole data centers (even geographically distributed ones). In particular, it supports simulating complex distributed architectures containing models of whole data centers, containers, racks, nodes, etc. In addition, new resources and distributed computing entities can easily be added to the DCworms environment in order to enhance the functionality of the tool and address more sophisticated requirements. The granularity of such topologies may also range from coarse-grained to very fine-grained, modeling single cores, memory hierarchies and other hardware details.

\subsection{Energy management concept in DCworms}

DCworms allows researchers to take energy efficiency and thermal issues into account in distributed computing experiments. This can be achieved by means of appropriate models and profiles. In general, the main goal of the models is to emulate the behavior of the real computing resources, while profiles support the models by providing data essential for the energy usage calculations. Introducing particular models into the simulation environment is possible through choosing or implementing dedicated energy plugins that contain methods to calculate the power usage of resources, their temperature, and system air throughput values. The presence of detailed resource usage information, a description of the current resource energy and thermal state, and a functional energy management interface enable the implementation of energy-aware scheduling algorithms. Resource energy consumption and thermal metrics become in this context an additional criterion in the resource management process. Scheduling plugins are provided with dedicated interfaces, which allow them to collect detailed information about computing resource components and to affect their behavior.
The following subsection presents the general idea behind the power management concept in DCworms. A detailed description of the approach to thermal and air throughput simulations can be found in \cite{d2.2}.

\subsubsection{Power management}
The motivation behind introducing a power management concept in DCworms is to provide researchers with the means to define the energy efficiency of resources and the dependency of energy consumption on resource load and specific applications, and to manage the power modes of resources. The proposed solution extends the power management concept presented in GSSIM \cite{GSSIM_Energy} by offering a much more granular approach, with the possibility of plugging energy consumption models and power profiles into each resource level.

\paragraph{\textbf{Power profile}}

…

\paragraph{\textbf{Power consumption model}}
The main aim of these models is to emulate the behavior of the real computing resource and the way it consumes power. Due to its rich functionality and flexible environment description, DCworms can be used to verify a number of theoretical assumptions and to develop new power consumption models. Modeling of power consumption is realized by the energy estimation plugin, which calculates energy usage based on information about the resource power profile, resource utilization, and the application profile, including energy consumption and heat production metrics. The relation between model and power profile is illustrated in Figure~\ref{fig:powerModel}.

\begin{figure}[tbp]

…

\paragraph{\textbf{Power management interface}}
DCworms is complemented with an interface that allows scheduling plugins to collect detailed power information about computing resource components and to change their power states. It enables performing various operations on the given resources, including dynamically changing the frequency level of a single processor, turning computing resources off and on, etc. The activities performed through this interface are reflected in the total amount of energy consumed by the resource during the simulation.
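As an illustration of how a scheduling plugin might drive such an interface, consider the sketch below, which slows down or switches off idle nodes. All names are hypothetical; the real DCworms interface is not reproduced in this changeset.

\begin{verbatim}
import java.util.List;

// Hypothetical power management interface, sketched after the
// description above; not the literal DCworms API.
interface PowerInterface {
    double getPowerUsage();                 // current power draw, in watts
    List<Double> getAvailableFrequencies(); // supported CPU frequencies (MHz)
    void setFrequency(double frequencyMHz); // DVFS-style frequency change
    void powerOff();                        // switch the node off
    boolean isIdle();
}

class EnergyAwarePolicy {
    // Called by the scheduler after each (de)allocation event.
    void onSchedulingEvent(List<PowerInterface> nodes) {
        for (PowerInterface node : nodes) {
            if (node.isIdle()) {
                // Either slow the idle node down ...
                List<Double> freqs = node.getAvailableFrequencies();
                node.setFrequency(freqs.get(0)); // assumes ascending order
                // ... or, under a more aggressive policy, switch it off:
                // node.powerOff();
            }
        }
    }
}
\end{verbatim}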
The presence of detailed resource usage information, a description of the current resource energy state, and a functional energy management interface enable the implementation of energy-aware scheduling algorithms.
Resource energy consumption becomes in this context an additional criterion in the scheduling process, which can use various techniques to decrease energy consumption, e.g. workload consolidation, moving tasks between resources to reduce the number of running resources, dynamic power management, cutting down the CPU frequency, and others.

…

%\paragraph{\textbf{Air throughput management interface}}
%DCworms delivers interfaces that provide access to the air throughput profile data and allow acquiring detailed information concerning current air flow conditions and changes in air flow states. The availability of these interfaces supports evaluation of different cooling strategies.

…

%\subsubsection{Thermal management concept}

%The primary motivation behind the incorporation of thermal aspects in DCworms is to go beyond the commonly adopted energy use cases and apply more sophisticated scenarios. By means of dedicated profiles and interfaces, it is possible to perform experimental studies involving temperature-aware workload placement.

%\paragraph{\textbf{Thermal profile}}

…

\subsection{Application performance modeling}\label{sec:apps}

In general, DCworms implements user application models as objects describing the computational and communication requirements, as well as the energy requirements and profiles, of the task to be scheduled. Additionally, the simulator provides means to include complex and specific application performance models during simulations. These allow researchers to introduce specific ways of calculating the task execution time. Such models can be plugged into the simulation environment through a dedicated API and the implementation of an appropriate plugin (a sketch of such a model follows the parameter list below). To specify the execution time of a task, the user can apply a number of parameters, including:
\begin{itemize}
\item task length (number of CPU instructions)

…
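An application performance model of this kind ultimately reduces to a function from parameters such as those listed above to an execution time estimate. A minimal sketch, assuming an Amdahl-style speedup model; the names are illustrative, not the actual plugin API:

\begin{verbatim}
// Sketch of an application performance (execution time) plugin.
// Names are illustrative; the real DCworms API is not shown here.
interface ExecutionTimeEstimationPlugin {
    // taskLengthInstructions: task length as a CPU instruction count
    // allocatedCores: number of cores assigned to the task
    // coreSpeedMips: speed of one core, million instructions per second
    double estimateExecutionTimeSec(long taskLengthInstructions,
                                    int allocatedCores,
                                    double coreSpeedMips);
}

class AmdahlLikeModel implements ExecutionTimeEstimationPlugin {
    private final double serialFraction; // share that cannot be parallelized

    AmdahlLikeModel(double serialFraction) { this.serialFraction = serialFraction; }

    @Override
    public double estimateExecutionTimeSec(long len, int cores, double mips) {
        double seqTimeSec = len / (mips * 1e6); // time on a single core
        // Amdahl's law: only the parallel part scales with the core count.
        return seqTimeSec * (serialFraction + (1.0 - serialFraction) / cores);
    }
}
\end{verbatim}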
\section{Modeling of energy consumption in DCworms}

DCworms is an open framework in which various models and algorithms can be investigated, as presented in Section \ref{sec:apps}. In this section, we discuss possible approaches to modeling that can be applied to the simulation of the energy efficiency of distributed computing systems. In general, to facilitate the simulation process, DCworms provides basic implementations of power consumption, air throughput and thermal models. We introduce power consumption models as examples and validate some of them by experiments on a real computing system (in Section \ref{sec:experiments}). A description of the thermal models and the corresponding experiments was presented in \cite{e2dc13}.

The most common question explored by researchers who study the energy efficiency of distributed computing systems is how much energy $E$ these systems require to execute workloads. In order to obtain this value, the simulator must calculate the values of power $P_i(t)$ and load $L_i(t)$ over time for all $m$ computing nodes, $i=1..m$. The load function may depend on the specific load models applied. In more complex cases it can even be defined as a vector of different resource usages over time. In a simple case the load can be either idle or busy, but even then an estimation of the job processing times $p_j$ is needed to calculate the total energy consumption. The total energy consumption of the computing nodes is given by (\ref{eq:E}):

\begin{equation}
E=\sum_i^m{\int_t{P_i(t)}\,dt} \label{eq:E}
\end{equation}

…

The power function may depend on the load and states of resources, or even on specific applications, as explained in Section~\ref{sec:power}. The total energy can also be completed by adding the constant power usage of components that do not depend on the load or state of resources.
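Equation (\ref{eq:E}) is straightforward to evaluate numerically once a simulation run has produced a power trace per node. A sketch, assuming piecewise-constant power between simulation events (which matches an event-driven simulator):

\begin{verbatim}
// Numerical evaluation of E = sum_i integral P_i(t) dt for
// piecewise-constant power traces. Illustrative only.
class EnergyAccounting {
    // timesSec[k] are event timestamps; powerW[k] is the power level
    // that holds from timesSec[k] until timesSec[k+1].
    static double nodeEnergyJoules(double[] timesSec, double[] powerW) {
        double energy = 0.0;
        for (int k = 0; k + 1 < timesSec.length; k++) {
            energy += powerW[k] * (timesSec[k + 1] - timesSec[k]);
        }
        return energy; // joules; divide by 3.6e6 to obtain kWh
    }

    // Sums the per-node integrals over all m nodes.
    static double totalEnergyJoules(double[][] times, double[][] powers) {
        double total = 0.0;
        for (int i = 0; i < times.length; i++) {
            total += nodeEnergyJoules(times[i], powers[i]);
        }
        return total;
    }
}
\end{verbatim}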
In large computing systems, which are often characterized by high computational density, the total energy consumption of computing nodes is not the only result of interest to researchers. Temperature distribution is becoming more and more important, as it affects the energy consumption of cooling devices, which can reach up to half of the total data center energy use.
In order to obtain accurate values of temperatures, heat transfer simulations based on Computational Fluid Dynamics (CFD) methods have to be performed. These methods require as input (i.e. boundary conditions) the heat dissipated by the IT hardware and the air throughput generated by fans at the servers' outlets. Another approach is based on simplified thermal models that, without costly CFD calculations, provide rough estimations of temperatures. DCworms enables the use of both approaches. In the former, the output of simulations, including the power usage of computing nodes over time and the air throughput at node outlets, can be passed to a CFD solver. Details addressing these integration issues are introduced in \cite{d2.2}.
%This option is further elaborated in Section \ref{sec:coolemall}. Simplified thermal models required by the latter approach are proposed in \ref{sec:thermal}.

…

time. However, experiments performed on several HPC servers show that this dependency does not reflect the theoretical shape and is often close to linear, as presented in Figure \ref{fig:power_freq}. This phenomenon can be explained by the impact of components other than the CPU and by the narrow range of available voltages. A good example of the impact of other components is the power usage of servers with a visible influence of fans, as illustrated in Figure \ref{fig:fans_P}.

For these reasons, DCworms allows users to define dependencies between power usage and resource states (such as the CPU frequency) in the form of tables or arbitrary functions using energy estimation plugins.

The energy consumption models provided by default can be classified into the following groups, starting from the simplest model up to the more complex ones. Users can easily switch between the given models and incorporate new, visionary scenarios.

…

\end{equation}

Within DCworms we built in a static-approach model that uses the common resource states affecting power usage, namely the CPU power states. Hence, with each node power state, understood as a possible operating state (P-state), we associated a power consumption value derived from the averaged values of measurements obtained for different types of applications. We also distinguish an idle state. Therefore, the current power usage of the node can be expressed as $P = P_{idle} + P_{f}$, where $P$ denotes the power consumed by the node, $P_{idle}$ is the power usage of the node in the idle state, and $P_{f}$ stands for the power usage of a CPU operating at the given frequency level.
Additionally, node power states are taken into account to reflect no (or limited) power usage when a node is off.

\subsection{Resource load}

…

Unfortunately, to verify this model and adjust it to specific hardware, the power usage of particular subcomponents such as the CPU or memory must be measured. As this is usually difficult, other models, based on the total power use, can be applied.

An example is a model applied in DCworms based on real measurements (see Section \ref{sec:models} for more details):

\begin{equation}

…

\end{equation}

where $P$ denotes the power consumed by the node executing the given application, $P_{idle}$ is the power usage of the node in the idle state, $L$ is the current utilization level of the node, $P_{cpubase}$ stands for the power usage of a fully loaded CPU working at the lowest frequency, $c$ is a constant factor indicating the increase of power consumption with respect to the frequency increase, $f$ is the current frequency, $f_{base}$ is the lowest available frequency within the given CPU, and $P_{app}$ denotes the additional power usage derived from executing a particular application ($P_{app}$ is a constant appointed experimentally for each application in order to extract the part of the power consumption that is independent of the load and specific to the particular type of task).
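The equation body itself is elided in this hunk, so the following sketch must be read as a placeholder: it assumes, purely for illustration, a linear dependency of the CPU part on frequency, $P = P_{idle} + L \cdot (P_{cpubase} + c \cdot (f - f_{base})) + P_{app}$, wired together from the symbols defined above.

\begin{verbatim}
// Placeholder implementation of the measured power model described above.
// The actual DCworms formula is elided in this changeset; the linear form
// below is an assumption made only to show how the named inputs combine.
class MeasuredPowerModel {
    double pIdle;    // node power in the idle state [W]
    double pCpuBase; // power of a fully loaded CPU at the lowest frequency [W]
    double c;        // fitted increase of power per MHz of frequency increase
    double fBase;    // lowest available CPU frequency [MHz]

    // load in [0..1]; f = current frequency [MHz];
    // pApp = application-specific constant, fitted per application [W].
    double power(double load, double f, double pApp) {
        return pIdle + load * (pCpuBase + c * (f - fBase)) + pApp;
    }
}
\end{verbatim}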
…

%\subsection{Air throughput models}\label{sec:air}

%DCworms comes with the following air throughput models.
%By default, air throughput estimations are performed according to the first one.

…

\subsection{Testbed description}

To obtain values of power consumption that could later be used in the DCworms environment to build the model and to evaluate resource management policies, we ran a set of applications / benchmarks on a physical testbed. For experimental purposes we chose the Christmann high-density Resource Efficient Cluster Server (RECS) system \cite{e2dc12}. A single RECS unit consists of 18 single-CPU modules, each of which can be treated as an individual node of PC class. The configuration of our RECS unit is presented in Table~\ref{testBed}.

\begin{table}[h!]

…

Table~\ref{nodeBasePowerUsage} and Table~\ref{appPowerUsage} contain the values of $P_{cpubase}$ and $P_{app}$, respectively, obtained for the particular applications and resource architectures. The lack of a corresponding value means that the application did not run on the given type of node.

\begin{table}[h!]
\centering

…

\subsection{Methodology}

Every chosen application / benchmark was executed on each type of node, for all frequencies supported by the CPU and for different levels of parallelization (number of cores). To eliminate the problem of assessing which part of the power consumption comes from which application when more than one application is run on a node, the queuing system (SLURM) was configured to run jobs in exclusive mode (one job per node). Such a configuration is often used for at least a dedicated part of HPC resources. The advantage of the exclusive-mode scheduling policy is that a job gets all the resources of the assigned nodes for optimal parallel performance, and applications running on the same node do not influence each other. For every configuration of application, type of node and CPU frequency we measured the average power consumption of the node and the execution time. These values were used to configure the DCworms environment, providing the energy and execution time models.
Based on the models obtained for the considered set of resources and applications, we evaluated a set of resource management strategies in terms of the energy consumption needed to execute four workloads varying in load intensity (10\%, 30\%, 50\%, 70\%). The differences in load were obtained by applying various intervals (3000, 1200, 720 and 520 seconds, respectively) between the submission times of two successive tasks. In all cases the number of tasks was equal to 1000. Moreover, we differentiated the applications in terms of the number of cores allocated by them and their type.
591 To generate a workload we used the DCWoRMS workload generator tool with the aforementioned characteristics gathered in Table~\ref{workloadCharacteristics}.
590 To generate the workloads we used the DCworms workload generator tool with the aforementioned characteristics, gathered in Table~\ref{workloadCharacteristics}.
592 591
593 592 \begin {table}[ tp]
… …
623 622 \end {table}
624 623
625 In all cases we assumed that tasks are scheduled and served in order of their arrival (FIFO strategy) with easy backfilling approach.
624 In all cases we assumed that tasks are scheduled and served in order of their arrival (FIFO strategy) using a relaxed backfilling (RB) approach, which allows an indefinite delay of the highest-priority task. Moreover, tasks were assigned to nodes under the condition that they can be placed only on nodes of a type on which the application was able to run (in other words, for which we had the corresponding values of power consumption and execution time).
626 625
627 626 \subsection{Computational analysis}
628 627
629 In the following section we present the results obtained for the workload with load density equal to 70\% in the light of five resource management and scheduling strategies. The scheduling strategies were evaluated according to two criteria: total energy consumption and maximum completion time of all tasks (makespan). These evaluation criteria employed in our experiments represent interests of various groups of stakeholders present in clouds and grids.
630 Then we discusses the corresponding results received for workloads with other density level.
628 In the following section we present the results obtained for the workload with load density equal to 70\%, in the light of five resource management and scheduling strategies. The scheduling strategies were evaluated according to two criteria: total energy consumption and maximum completion time of all tasks (makespan). These evaluation criteria represent the interests of various groups of stakeholders present in data centers.
629 Then we discuss the corresponding results obtained for workloads with other density levels.
631 630
632 631 \subsubsection{Random approach}
633 632
634 The first considered by us policy was the Random (R) strategy in which tasks were assigned to nodes in a random manner with the condition that they can be assigned only to nodes of the type on which the application was able to run (in other words - we had the corresponding value of power consumption and execution time). The Random strategy is only the reference one and will be later used to compare benefits in terms of energy efficiency resulting from more sophisticated algorithms. Criteria values are as follows: \textbf{total energy usage}: 46.883 kWh, \textbf{workload completion time}: 533 820 s.
635 Figure~\ref{fig:70r} presents the energy consumption, load of the system and obtained schedule, respectively.
633 The first policy we considered was the Random (R) strategy, in which tasks were assigned to nodes in a random manner. The Random strategy serves only as a reference and will later be used to assess the energy-efficiency benefits of more sophisticated algorithms. The criteria values are as follows: \textbf{total energy usage}: 46.883 kWh, \textbf{workload completion time}: 533 820 s.
634 %Figure~\ref{fig:70r} presents the energy consumption, load of the system and obtained schedule, respectively.
635
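The assignment rule of the Random strategy, with the compatibility condition stated above, can be sketched as follows (the profiles map is a hypothetical stand-in for the measured data):

\begin{verbatim}
import random

# Random (R) policy sketch: a task may go only to a node type for which
# a measured (power, execution time) profile of its application exists.
# profiles: (app, node_type) -> (power_W, exec_time_s)
def random_assign(task, free_nodes, profiles):
    """free_nodes: list of (node_id, node_type); returns a node or None."""
    compatible = [n for n in free_nodes
                  if (task["app"], n[1]) in profiles]
    return random.choice(compatible) if compatible else None  # None: task waits
\end{verbatim}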
636 %\begin{figure}[h!]
637 %\centering
638 %\includegraphics[width = 12cm]{fig/70r.png}
639 %\caption{\label{fig:70r} Random strategy}
640 %\end{figure}
641
642 In the second version of this strategy, which is becoming more popular due to energy costs, we switched off unused nodes to reduce the total energy consumption. In the previous version, unused nodes were not switched off, which is still the primary mode of operation in many HPC centers. In this version of the experiment we neglected the additional cost and time necessary to change the power state of resources. As can be observed in Figure~\ref{fig:70r_rnpm}, switching off unused nodes led to a decrease of the total energy consumption.
643
644 %\begin{figure}[h!]
645 %\centering
646 %\includegraphics[width = 6cm]{fig/70rnpm.png}
647 %\caption{\label{fig:70rnpm} Random + switching off unused nodes strategy}
648 %\end{figure}
636 649
637 650
638 651 \begin{figure}[h!]
639 652 \centering
640 \includegraphics[width = 12cm]{fig/70r.png}
641 \caption{\label{fig:70r} Random strategy}
653 \includegraphics[width = 12cm]{fig/70r_rnpm.png}
654 \caption{\label{fig:70r_rnpm} Comparison of energy usage for the Random (left) and Random + switching off unused nodes (right) strategies}
642 655 \end{figure}
643 656
644 In the second version of this strategy, which is getting more popular due to energy costs, we switched off unused nodes to reduce the total energy consumption. In the previous one, unused nodes are not switched off, which case is still the primary one in many HPC centers.
645
646 \begin{figure}[h!]
647 \centering
648 \includegraphics[width = 6cm]{fig/70rnpm.png}
649 \caption{\label{fig:70rnpm} Random + switching off unused nodes strategy}
650
651 \end{figure}
652 In this version of experiment we neglected additional cost and time necessary to change the power state of resources. As can be observed in the power consumption chart in the Figure~\ref{fig:70rnpm}, switching off unused nodes led to decrease of the total energy consumption. As expected, with respect to the makespan criterion, both approaches perform equally reaching \textbf{workload completion time}: 533 820 s. However, the pure random strategy was significantly outperformed in terms of energy usage, by the policy with additional node power management with its \textbf{total energy usage}: 36.705 kWh. The overall energy savings reached 22\%.
657 As expected, with respect to the makespan criterion, both approaches perform equally, reaching a \textbf{workload completion time} of 533 820 s. However, the pure random strategy was significantly outperformed in terms of energy usage by the policy with additional node power management, with its \textbf{total energy usage} of 36.705 kWh. The overall energy savings reached 22\%.
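A back-of-the-envelope check of the reported saving (the totals below are the reported results, not a re-simulation; the per-node decomposition indicates where such savings come from in general):

\begin{verbatim}
# Powering down idle nodes removes their idle-state draw, i.e. roughly
#   E_saved = sum(P_idle[n] * idle_time[n] for n in nodes)
E_random     = 46.883   # kWh, unused nodes left idle
E_random_off = 36.705   # kWh, unused nodes switched off

saved = E_random - E_random_off
print(f"saved {saved:.3f} kWh = {saved / E_random:.1%}")
# -> saved 10.178 kWh = 21.7% (the ~22% quoted above)
\end{verbatim}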
658 659 \subsubsection{Energy optimization}
660 661
The next two evaluated resource management strategies try to decrease the total energy consumption (EO) caused by the execution of the whole workload. They take into account differences in applications and hardware profiles by trying to find the most energy efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes, thus for the whole time needed to execute workload nodes consume at least power for idle state. To obtain the minimal energy consumption, tasks has to be assigned to the nodes of type for which the difference between energy consumption for the node running the application and in the idle state is minimal. The power usage measured in idle state for three types of nodes is gathered in the Table~\ref{idlePower}.
661 The next two evaluated resource management strategies try to decrease the total energy consumption (EO) caused by the execution of the whole workload. They take into account differences in application and hardware profiles, trying to find the most energy-efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes; thus, for the whole time needed to execute the workload, the nodes consume at least the idle-state power. To obtain the minimal energy consumption, tasks have to be assigned to the nodes for which the difference between the energy usage of the node running the application and that of the node in the idle state is minimal. The power consumption measured in the idle state for the three types of nodes is gathered in Table~\ref{idlePower}.
657 662
658 663 \begin {table}[h!]
… …
660 665 \begin{tabular}{ll}
661 666 \hline
662 Type & Power usage in idle state [W] \\
667 Type of processor within the node & Power usage in idle state [W] \\
663 668 \hline
664 669 Intel i7 & 11.5 \\
… …
670 675 \end {table}
671 676
672 As mentioned, we assign tasks to nodes minimizing the value of expression: $(P-P_{idle})*exec\_time$, where $P$ denotes observed power of the node running the particular application and $exec\_time$ refers to the measured application running time. Based on the application and hardware profiles, we expected that Atom D510 would be the preferred node. Obtained schedule, that is presented in the Gantt chart in Figure~\ref{fig:70eo} along with the energy and system usage, confirmed our assumptions. Atom D510 nodes are nearly fully loaded, while the least energy-favourable AMD nodes are used only when other ones are busy.
677 As mentioned, we assign tasks to nodes minimizing the value of the expression $(P-P_{idle})*exec\_time$, where $P$ denotes the observed power of the node running the particular application and $exec\_time$ refers to the measured application running time. Based on the application and hardware profiles, we expected that Atom D510 would be the preferred node. The obtained schedule, presented in the Gantt chart in Figure~\ref{fig:70eo}, confirmed our assumptions: Atom D510 nodes are nearly fully loaded, while the least energy-favourable AMD nodes are used only when the other ones are busy.
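The selection rule can be sketched as below; the idle-power values other than the Intel i7 entry are placeholders, since the corresponding table rows are elided in this excerpt:

\begin{verbatim}
# Energy-optimization (EO) rule: among node types able to run the
# application, minimize (P - P_idle) * exec_time, i.e. the energy added
# on top of the idle baseline.
# profiles: (app, node_type) -> (power_W, exec_time_s)
P_IDLE = {"Intel i7": 11.5,          # from Table idlePower
          "AMD Fusion": 10.0,        # placeholder (row elided here)
          "Atom D510": 9.0}          # placeholder (row elided here)

def eo_choose_node_type(app, profiles, p_idle=P_IDLE):
    candidates = [(ntype, (p - p_idle[ntype]) * t)
                  for (a, ntype), (p, t) in profiles.items()
                  if a == app and ntype in p_idle]
    return min(candidates, key=lambda c: c[1])[0] if candidates else None
\end{verbatim}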
673 678
674 679 \begin{figure}[h!]
675 680 \centering
676 \includegraphics[width = 12cm]{fig/70eo.png}
681 \includegraphics[width = 10cm]{fig/70eoGantt.png}
677 682 \caption{\label{fig:70eo} Energy usage optimization strategy}
678 683 \end{figure}
… …
686 691 \begin{figure}[h!]
687 692 \centering
688 \includegraphics[width = 12cm]{fig/70eonpm.png}
693 \includegraphics[width = 10cm]{fig/70eonpmGantt.png}
689 694 \caption{\label{fig:70eonpm} Energy usage optimization + switching off unused nodes strategy}
690 695 \end{figure}
694 699 \subsubsection{Downgrading frequency}
695 700
696 The last case considered by us is modification of the random strategy. We assume that tasks do not have deadlines and the only criterion which is taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure for the lowest possible frequencies of CPUs (LF). The experiment was intended to check if the benefit of running the workload on less power-consuming frequency of CPU is not leveled by the prolonged time of execution of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77.109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload that requires almost twice more time, can not be compensate by the lower power draw. Moreover, as it can be observed on the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceeds the corresponding values on other servers.
701 The last case we considered is a modification of the random strategy. We assume that tasks do not have deadlines and that the only criterion taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure to use the lowest possible CPU frequencies (LF). The experiment was intended to check whether the benefit of running the workload at a less power-consuming CPU frequency is not leveled by the prolonged execution time of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77.109 kWh. As we can see, for the given load of the system (70\%), the cost of running a workload that requires almost twice as much time cannot be compensated by the lower power draw. Moreover, as can be observed in the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceed the corresponding values on the other servers.
697 702
698 703 \begin{figure}[h!]
… …
749 754 \end {table}
750 755
751 Referring to the Table~\ref{loadEnergy}, one should easily note that gain from switching off unused nodes decreases with the increasing workload density. In general, for the highly loaded system such policy does not find an application due to the cost related to this process and relatively small benefits. Another interesting conclusion, refers to the poor result for Random strategy with downgrading the frequency approach. The lack of improvement on the energy usage criterion for higher system load can be explained by the relatively small or no benefit obtained for prolonging the task execution, and thus, maintaining the node in working state. The cost of longer workload completion, can not be compensate by the very little energy savings derived from the lower operating state of node. The greater criteria values for the higher system load are the result of greater time space between submission of successive tasks, and thus longer workload execution.
752 We also demonstrated differences between power usage models. They span from rough static approach to accurate application specific models. However, the latter can be difficult or even infeasible to use as it requires real measurements for specific applications beforehand. This issue can be partially resolved by introducing application profiles and classification, which can deteriorate the accuracy though. This issue is begin studied more deeply within CoolEmAll project.
753
754
755
756 %\section{DCWoRMS application/use cases}\label{sec:coolemall}
757
758 %DCWoRMS in CoolEmAll, integration with CFD
756 Referring to Table~\ref{loadEnergy}, one can easily note that the gain from switching off unused nodes decreases with increasing workload density. In general, for a highly loaded system such a policy does not find an application, due to the cost related to this process and the relatively small benefits.
Another interesting conclusion refers to the poor result of the Random strategy combined with the frequency-downgrading approach. The lack of improvement on the energy usage criterion for higher system loads can be explained by the relatively small (or no) benefit obtained from prolonging the task execution and thus keeping the node in a working state. The cost of the longer workload completion cannot be compensated by the very small energy savings derived from the lower operating state of the nodes. The greater criteria values for the higher system loads are the result of the greater time span between submissions of successive tasks, and thus the longer workload execution. Based on Table~\ref{loadMakespan}, one should note that the differences in workload completion times are relatively small for all evaluated policies, except the Random + lowest frequency approach.
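To make the tradeoff concrete with the reported numbers: energy equals average power times time, so a lower power draw pays off only if the makespan grows by less than the power drops.

\begin{verbatim}
E_r,  T_r  = 46.883, 533820      # Random, nominal frequencies [kWh, s]
E_lf, T_lf = 77.109, 1065356     # Random, lowest frequencies  [kWh, s]

P_r  = E_r  / (T_r  / 3600)      # ~0.316 kW average draw
P_lf = E_lf / (T_lf / 3600)      # ~0.261 kW average draw

print(f"power {1 - P_lf/P_r:.0%} lower, time {T_lf/T_r - 1:.0%} longer, "
      f"energy {E_lf/E_r - 1:.0%} higher")
# -> power 18% lower, time 100% longer, energy 64% higher
\end{verbatim}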
757 We also demonstrated differences between power usage models. They span from a rough static approach to accurate application-specific models. However, the latter can be difficult or even infeasible to use, as it requires real measurements for specific applications beforehand. This issue can be partially resolved by introducing application profiles and classification, which can, however, deteriorate the accuracy. This issue is being studied more deeply within the CoolEmAll project.
758 759
760 761 %\section{DCworms application/use cases}\label{sec:coolemall}
762 763 %DCworms in CoolEmAll, integration with CFD
759 764
760 765 %...
761 766
762 %Being based on the GSSIM framework, that has been successfully applied in a substantial number of research projects and academic studies, DCWoRMS with its sophisticated energy extension has become an essential tool for studies of energy efficiency in distributed environments. For this reason, it has been adopted within the CoolEmAll project as a component of Simulation, Visualisation and Decision Support (SVD) Toolkit. In general the main goal of CoolEmAll is to provide advanced simulation, visualisation and decision support tools along with blueprints of computing building blocks for modular data centre environments. Once developed, these tools and blueprints should help to minimise the energy consumption, and consequently the CO2 emissions of the whole IT infrastructure with related facilities. The SVD Toolkit is designed to support the analysis and optimization of IT modern infrastructures. For the recent years the special attention has been paid for energy utilized by the data centers which considerable contributes to the data center operational costs. Actual power usage and effectiveness of energy saving methods heavily depends on available resources, types of applications and workload properties. Therefore, intelligent resource management policies are gaining popularity when considering the energy efficiency of IT infrastructures.
767 %Being based on the GSSIM framework, which has been successfully applied in a substantial number of research projects and academic studies, DCworms with its sophisticated energy extension has become an essential tool for studies of energy efficiency in distributed environments. For this reason, it has been adopted within the CoolEmAll project as a component of the Simulation, Visualisation and Decision Support (SVD) Toolkit. In general, the main goal of CoolEmAll is to provide advanced simulation, visualisation and decision support tools along with blueprints of computing building blocks for modular data centre environments. Once developed, these tools and blueprints should help to minimise the energy consumption, and consequently the CO2 emissions, of the whole IT infrastructure with related facilities. The SVD Toolkit is designed to support the analysis and optimization of modern IT infrastructures. In recent years, special attention has been paid to the energy utilized by data centers, which contributes considerably to data center operational costs. Actual power usage and the effectiveness of energy saving methods depend heavily on the available resources, types of applications and workload properties. Therefore, intelligent resource management policies are gaining popularity when considering the energy efficiency of IT infrastructures.
768 %Hence, the SVD Toolkit also integrates workload management and scheduling policies to support complex modeling and optimization of modern data centres.
764 769
765 %The main aim of DCWoRMS within CoolEmAll project is to enable studies of dynamic states of IT infrastructures, like power consumption and air throughput distribution, on the basis of changing workloads, resource model and energy-aware resource management policies.
766 %In this context, DCWoRMS takes into account the specific workload and application characteristics as well as detailed resource parameters. It will benefit from the CoolEmAll benchmarks and classification of applications and workloads. In particular various types of workload, including data centre workloads using virtualization and HPC applications, may be considered. The knowledge concerning their performance and properties as well as information about their energy consumption and heat production will be used in simulations to study their impact on thermal issues and energy efficiency. Detailed resource characteristics, will be also provided according to the CoolEmAll blueprints. Based on this data, workload simulation will support evaluation process of various resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamic switching off nodes, DVFS and thermal-aware methods. In addition to typical approaches minimizing energy consumption, policies that prevent too high temperatures in the presence of limited cooling (or no cooling) may also be analyzed. Moreover, apart from the set of predefined strategies, new approaches can easily be applied and examined.
770 %The main aim of DCworms within the CoolEmAll project is to enable studies of dynamic states of IT infrastructures, like power consumption and air throughput distribution, on the basis of changing workloads, resource models and energy-aware resource management policies.
771 %In this context, DCworms takes into account the specific workload and application characteristics as well as detailed resource parameters. It will benefit from the CoolEmAll benchmarks and classification of applications and workloads. In particular, various types of workload, including data centre workloads using virtualization and HPC applications, may be considered. The knowledge concerning their performance and properties, as well as information about their energy consumption and heat production, will be used in simulations to study their impact on thermal issues and energy efficiency. Detailed resource characteristics will also be provided according to the CoolEmAll blueprints. Based on this data, workload simulation will support the evaluation process of various resource management approaches.
These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamic switching off of nodes, DVFS and thermal-aware methods. In addition to typical approaches minimizing energy consumption, policies that prevent too high temperatures in the presence of limited cooling (or no cooling) may also be analyzed. Moreover, apart from the set of predefined strategies, new approaches can easily be applied and examined.
767 772 %The outcome of the workload and resource management simulation phase is a distribution of power usage and air throughput for the computing models specified within the SVD Toolkit. These statistics may be analyzed directly by data centre designers and administrators and/or provided as an input to the CFD simulation phase. The former case allows studying how the above metrics change over time, while the latter harnesses CFD simulations to identify temperature differences between the computing modules, called hot spots. The goal of this scenario is to visualise the behavior of the temperature distribution within a server room with a number of racks, for different types of executed workloads and for various policies used to manage these workloads.
768 773
… …
770 775 \section{Conclusions and future work}
771 776
772 In this paper we presented a Data Center Workload and Resource Management Simulator (DCWoRMS) which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies. DCWoRMS provides broad options of customization and combines detailed applications and workloads modeling with simulation of data center resources including their power usage and thermal behavior.
777 In this paper we presented the Data Center Workload and Resource Management Simulator (DCworms), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies. DCworms provides broad options of customization and combines detailed application and workload modeling with simulation of data center resources, including their power usage and thermal behavior.
773 778 We showed its energy-efficiency related features and proposed three methods of power usage modeling: static (fully defined by the resource state), dynamic (defined by a function of parameters such as CPU frequency and load), and mapping (based on the power usage of specific applications). We compared the results of simulations to measurements of real servers and showed differences in the accuracy and usability of these models.
774 We also demonstrated DCWoRMS capabilities to implement various resource management policies including workload scheduling and node power management. The experimental studies we conducted shown that their impact on overall energy-efficiency depends on a type of servers, their power usage in idle time, possibility of switching off nodes as well as level of load.
775 DCWoRMS is a part of the Simulation, Visualisation and Decision Support (SVD) Toolkit being developed within the CoolEmAll project. The aim of this toolkit is, based on data center building blocks defined by the project, to analyze energy-efficiency of data centers taking into account various aspects such as heterogenous hardware architectures, applications, management policies, and cooling. DCWoRMS will take as an input open models data center building blocks and application profiles.
DCWoRMS will be applied to evaluation of resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamic switching off nodes, DVFS and thermal-aware methods. Output of simulations will include distribution of servers' power usage in time along with estimations of server outlets air flow. These data will be passed to Computational Fluid Dynamics (CFD) simulations using OpenFOAM solver and to advanced 3D visualization. In this way users will be able to study energy-efficiency of a data center from a detailed analysis of workloads and policies to the impact on heat transfer and overall energy consumption.
779 We also demonstrated the DCworms capabilities to implement various resource management policies, including workload scheduling and node power management. The experimental studies we conducted showed that their impact on overall energy-efficiency depends on the type of servers, their power usage in idle time, the possibility of switching off nodes, as well as the level of load.
780 DCworms is a part of the Simulation, Visualisation and Decision Support (SVD) Toolkit being developed within the CoolEmAll project. The aim of this toolkit is, based on the data center building blocks defined by the project, to analyze the energy-efficiency of data centers taking into account various aspects such as heterogeneous hardware architectures, applications, management policies, and cooling. DCworms will take as an input the open models of the data center building blocks and application profiles. DCworms will be applied to the evaluation of resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation, dynamic switching off of nodes, DVFS and thermal-aware methods. The output of simulations will include the distribution of servers' power usage over time, along with estimations of server outlet air flow. These data will be passed to Computational Fluid Dynamics (CFD) simulations using the OpenFOAM solver and to advanced 3D visualization. In this way users will be able to study the energy-efficiency of a data center from a detailed analysis of workloads and policies down to the impact on heat transfer and overall energy consumption.
776 781
Thus, future work on DCWoRMS will focus on more precise power, air-throughput, and thermal models. Additional research directions will include modeling application execution phases, adding predefined common HPC and cloud management policies and application performance and resource power models.
782 Thus, future work on DCworms will focus on more precise power, air-throughput, and thermal models. Additional research directions will include modeling application execution phases, adding predefined common HPC and cloud management policies, and application performance and resource power models.
777 783
778 784 \section*{Acknowledgement}
… …
818 823 \bibitem{fit4green} A. Berl, E. Gelenbe, M. di Girolamo, G. Giuliani, H. de Meer, M.-Q. Dang, K. Pentikousis. Energy-Efficient Cloud Computing. The Computer Journal, 53(7), 2010.
819 824
825 \bibitem{e2dc13} M. vor dem Berge, G. Da Costa, M. Jarus, A. Oleksiak, W. Piatek, E. Volk. Modeling Data Center Building Blocks for Energy-efficiency and Thermal Simulations. 2nd International Workshop on Energy-Efficient Data Centres, Berkeley, 2013.
826
820 827 \bibitem{e2dc12} M. vor dem Berge, G. Da Costa, A. Kopecki, A. Oleksiak, J-M. Pierson, T. Piontek, E. Volk, S. Wesner.
Modeling and Simulation of Data Center Energy-Efficiency in CoolEmAll. Energy Efficient Data Centers, Lecture Notes in Computer Science, Volume 7396, 2012, pp. 25-36.
821 828
822 829 \bibitem{CloudSim} R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya. CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms. Software: Practice and Experience (SPE), Volume 41, Number 1, pp. 23-50, ISSN: 0038-0644, Wiley Press, New York, USA, January 2011.
823 830
824 834 \bibitem{DCSG} http://dcsg.bcs.org/welcome-dcsg-simulator
825 835
… …
846 856 \bibitem{fit4green_scheduler} O. M{\"a}mmel{\"a}, M. Majanen, R. Basmadjian, H. De Meer, A. Giesler, W. Homberg. Energy-aware job scheduler for high-performance computing. Computer Science - Research and Development, November 2012, Volume 27, Issue 4, pp. 265-275.
847 857
858 \bibitem{d2.2} U. Woessner, E. Volk, G. Gallizo, M. vor dem Berge, G. Da Costa, P. Domagalski, W. Piatek, J-M. Pierson. (2012) D2.2 Design of the CoolEmAll simulation and visualisation environment - CoolEmAll Deliverable, http://coolemall.eu
848 860 % web links
849 861