source: papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex @ 683

Revision 683, 49.6 KB checked in by wojtekp, 12 years ago (diff)
RevLine 
[593]1%% This is file `elsarticle-template-1-num.tex',
2%%
3%% Copyright 2009 Elsevier Ltd
4%%
5%% This file is part of the 'Elsarticle Bundle'.
6%% ---------------------------------------------
7%%
8%% It may be distributed under the conditions of the LaTeX Project Public
9%% License, either version 1.2 of this license or (at your option) any
10%% later version.  The latest version of this license is in
11%%    http://www.latex-project.org/lppl.txt
12%% and version 1.2 or later is part of all distributions of LaTeX
13%% version 1999/12/01 or later.
14%%
15%% The list of all files belonging to the 'Elsarticle Bundle' is
16%% given in the file `manifest.txt'.
17%%
18%% Template article for Elsevier's document class `elsarticle'
19%% with numbered style bibliographic references
20%%
21%% $Id: elsarticle-template-1-num.tex 149 2009-10-08 05:01:15Z rishi $
22%% $URL: http://lenova.river-valley.com/svn/elsbst/trunk/elsarticle-template-1-num.tex $
23%%
24\documentclass[preprint,12pt]{elsarticle}
25
26%% Use the option review to obtain double line spacing
27%% \documentclass[preprint,review,12pt]{elsarticle}
28
29%% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
30%% for a journal layout:
31%% \documentclass[final,1p,times]{elsarticle}
32%% \documentclass[final,1p,times,twocolumn]{elsarticle}
33%% \documentclass[final,3p,times]{elsarticle}
34%% \documentclass[final,3p,times,twocolumn]{elsarticle}
35%% \documentclass[final,5p,times]{elsarticle}
36%% \documentclass[final,5p,times,twocolumn]{elsarticle}
37
38%% if you use PostScript figures in your article
39%% use the graphics package for simple commands
40%% \usepackage{graphics}
41%% or use the graphicx package for more complicated commands
42%% \usepackage{graphicx}
43%% or use the epsfig package if you prefer to use the old commands
44%% \usepackage{epsfig}
45
46%% The amssymb package provides various useful mathematical symbols
47\usepackage{amssymb}
48%% The amsthm package provides extended theorem environments
49%% \usepackage{amsthm}
50
51%% The lineno packages adds line numbers. Start line numbering with
52%% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
53%% for the whole article with \linenumbers after \end{frontmatter}.
54%% \usepackage{lineno}
55
56%% natbib.sty is loaded by default. However, natbib options can be
57%% provided with \biboptions{...} command. Following options are
58%% valid:
59
60%%   round  -  round parentheses are used (default)
61%%   square -  square brackets are used   [option]
62%%   curly  -  curly braces are used      {option}
63%%   angle  -  angle brackets are used    <option>
64%%   semicolon  -  multiple citations separated by semi-colon
65%%   colon  - same as semicolon, an earlier confusion
66%%   comma  -  separated by comma
67%%   numbers-  selects numerical citations
68%%   super  -  numerical citations as superscripts
69%%   sort   -  sorts multiple citations according to order in ref. list
70%%   sort&compress   -  like sort, but also compresses numerical citations
71%%   compress - compresses without sorting
72%%
73%% \biboptions{comma,round}
74
75% \biboptions{}
76
77
78\journal{Simulation Modelling Practice and Theory}
79
80\begin{document}
81
82\begin{frontmatter}
83
84%% Title, authors and addresses
85
86%% use the tnoteref command within \title for footnotes;
87%% use the tnotetext command for the associated footnote;
88%% use the fnref command within \author or \address for footnotes;
89%% use the fntext command for the associated footnote;
90%% use the corref command within \author for corresponding author footnotes;
91%% use the cortext command for the associated footnote;
92%% use the ead command for the email address,
93%% and the form \ead[url] for the home page:
94%%
95%% \title{Title\tnoteref{label1}}
96%% \tnotetext[label1]{}
97%% \author{Name\corref{cor1}\fnref{label2}}
98%% \ead{email address}
99%% \ead[url]{home page}
100%% \fntext[label2]{}
101%% \cortext[cor1]{}
102%% \address{Address\fnref{label3}}
103%% \fntext[label3]{}
104
105\title{DCWoRMS - a tool for simulation of energy efficiency in Grids and Clouds }
106
107%% use optional labels to link authors explicitly to addresses:
108%% \author[label1,label2]{<author name>}
109%% \address[label1]{<address>}
110%% \address[label2]{<address>}
111
[594]112\author{Krzysztof Kurowski, Ariel Oleksiak, Wojciech Piatek, Tomasz Piontek}
[593]113
114
115\begin{abstract}
116%% Text of abstract
117
118\end{abstract}
119
120\begin{keyword}
121%% keywords here, in the form: keyword \sep keyword
122
123%% MSC codes here, in the form: \MSC code \sep code
124%% or \MSC[2008] code \sep code (2000 is the default)
125
126\end{keyword}
127
128\end{frontmatter}
129
130%%
131%% Start line numbering here if you want
132%%
133% \linenumbers
134
135%% main text
136\section{Introduction}
137
[648]138TODO - Introduction
139
[632]140...
141
[657]142The remaining part of this paper is organized as follows. In Section~2 we give a brief overview of the current state of the art concerning modeling and simulation of distributed systems, like Grids and Clouds, in terms of energy efficiency. Section~3 discusses the main features of DCWoRMS. In particular, it introduces our approach to workload and resource management, presents the concept of energy efficiency modeling and explains how to incorporate a specific application performance model into simulations. Section~4 discusses energy models adopted within the DCWoRMS. In Section~5 we present some experiments that were performed using DCWoRMS utilizing real testbed node models to show various types of popular resource management and scheduling techniques that allow decreasing the total power consumption of the execution of a set of tasks. Section~6 focuses on the role of DCWoRMS within the CoolEmAll project. Final conclusions and directions for future work are given in Section~7.
[632]143
[593]144\section{Related Work}
145
[632]146The growing importance of energy efficiency in information technologies led to significant interest in energy saving methods for computing systems. Therefore, intelligent resource management policies are gaining popularity when considering the energy efficiency of IT infrastructures. Nevertheless, studies of impact of scheduling strategies on energy consumption require a large effort and are difficult to perform in real distributed environments. To overcome these issues extensive research has been conducted in the area of modeling and simulation tools. As a result, a wide variety of simulation tools emerged. The following section contains a short summary of existing simulators that address the green computing issues in distributed infrastructures.
[593]147
[632]148\subsection{GreenCloud}
149
[650]150GreenCloud \cite{GreenCloud} is a C++ based simulation environment for energy-aware cloud computing data centers. It was developed as an extension of the NS2 network simulator. GreenCloud allows researchers to observe and evaluate data centers performance and study their energy-efficiency, focusing mainly on the communications within a data center. Along with the workload distribution, it offers users a detailed, fine-grained modeling of the energy consumed by the elements of the data center.
[632]151
152To deliver information about the energy usage, GreenCloud distinguishes three energy consumption components: computing energy, communicational energy, and the energy component related to the physical infrastructure of a data center. This approach enables modeling energy usage associated with computations, network operations and cooling systems. In GreenCloud, the energy models are implemented for every simulated data center entity (computing servers, core and rack switches). Moreover, due to the advantage in the simulation resolution, energy models can operate at the network packet level as well. This allows updating the levels of energy consumption whenever a new packet leaves or arrives from the link, or whenever a new task execution is started or completed at the server.
[657]153Servers are modeled as single core nodes that are responsible for task execution and may contain different scheduling strategies.
154The server power consumption model implemented in GreenCloud depends on the server state as well as its utilization and allows capturing the effects of both of the Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM) schemes.
155At the links and switches level, GreenCloud supports Dynamic Voltage Scaling (DVS) and Dynamic Network Shutdown (DNS) techniques. The DVS method introduces a control element at each port of the switch that - depending on the traffic pattern and current levels of link utilization - could downgrade the transmission rate. The DNS approach allows putting some network equipment into a sleep mode.
[632]156
157To cover the vast majority of cloud computing applications, GreenCloud defines three types of workloads: computationally intensive workloads that load computing servers considerably, data-intensive workloads that require heavy data transfers, and finally balanced workloads which aim to model the applications having both computing and data transfer requirements.
158GreenCloud describes an application with a number of computational requirements. Moreover, it specifies communication requirements of the applications in terms of the amount of data to be transferred before and after a task completion. The execution of each application requires a successful completion of its two main components: computing and communicational.
[657]159In addition, time constraints can be taken into account during the simulation by adding a predefined execution deadline, which aims at introducing Quality of Service constraints specified in a Service Level Agreement. Nevertheless, GreenCloud does not support application performance modeling. The aforementioned capabilities allow only incorporating simple requirements that need to be satisfied before and during the task execution.
[632]160
161Contrary to what the GreenCloud name may suggest, it does not allow testing the impact of a virtualization-based approach on the resource management.
162GreenCloud simulator is released under the General Public License Agreement.
163
164
165\subsection{CloudSim}
166
167CloudSim \cite{CloudSim} is an event-based simulation tool written in Java. Initially CloudSim was based on the well-known GridSim framework, however since the last few releases it is an independent simulator and does not benefit from most of the GridSim functionality.
168
[657]169CloudSim allows creating a simple resources hierarchy containing computing resources that consist of machines and processors. Additionally, it may simulate the behavior of other components including storage and network resources. However, it focuses on computational resources and provides an extra virtualization layer that acts as an execution, management, and hosting environment for application services. It is responsible for the VM provisioning process as well as managing the VM life cycle such as: VM creation, VM destruction, and VM migration. It also enables evaluation of different economic policies by modeling the cost metrics related to the SaaS and IaaS models.
[632]170
[657]171The CloudSim framework provides basic models and entities to validate and evaluate energy-conscious provisioning of techniques and algorithms. Each computing node can be extended with a power model that simulates the power consumption. CloudSim offers example implementations of this component that characterize some popular server models. Needless to say, it can be easily extended for simulating user-defined power consumption models. That allows estimating the current power usage according to the utilization level or the host model. This capability enables the creation of energy-conscious provisioning policies that require real-time knowledge of power consumption by Cloud system components.
[632]172Furthermore, it allows an accounting of the total energy consumed by the system during the simulation period. CloudSim comes with a set of predefined and extendable policies that manage the process of VM migrations in order to optimize the power consumption. However, the proposed solution is not appropriate for more sophisticated power management policies. In particular, CloudSim is not sufficient for modeling frequency scaling techniques and managing resource power states.
173
174Similar to GreenCloud, CloudSim defines a simple application model that includes computational and data requirements. Although all these constraints are taken into account during scheduling, they do not affect the application execution. Thereby, a researcher is required to put a lot of effort to incorporate an application performance model into his experiments.
175On the other hand CloudSim offers modeling of utilization models that are used to estimate the current load of processor, bandwidth and memory and can be taken into account during the task allocation process.
[657]176Concerning workloads, the simulator is able to partially support SWF \cite{SWF} files and read data in a user-defined file format. Moreover, it can handle a wide variety of workload types, including parallel and pre-emptive jobs.
[632]177
178CloudSim is available as Open Source under GPL license.
179
180
181\subsection{DCSG Simulator}
182
[657]183DCSG Simulator \cite{DCSG} is a Data Centre Cost and Energy Simulator that has been developed under the Carbon Trust Low Carbon Collaborations program in conjunction with the BCS and Romonet Ltd. The simulator works at a data center infrastructure level where analysis of the achieved efficiency of the data center mechanical and electrical plant can be performed but also at the IT level. The simulator implements a set of basic rules that have been developed, based on a detailed understanding of the data center as a system, to allow cost and energy use to be usefully allocated to IT devices within the data center.
[632]184
[657]185As far as the data center infrastructure level is concerned, DCSG Simulator calculates the power and cooling schema of data center equipment with respect to their performance. The user is able to take into account a wide variety of mechanical and electrical devices like: transformers, power distribution units, power supply, cabling, computer room air conditioning units and chiller plant. For each of them numerous factors can be defined, including device capacity, efficiency, and load operating points. These data can be derived from a generic list as well as from the information given by particular manufacturers. There is a wide range of pre-defined models, but the user can easily extend them or create new ones.
[632]186
[651]187To perform the IT simulation, it is possible to extend the data center infrastructure by putting IT devices into that data center. That enables detailed simulation of the energy efficiency of devices across a specified time period.
[638]188In this case performance of each piece of equipment (facility and IT) within a data center is determined by a combination of factors, including workload, data center conditions, the manufacturer's specifications of the machine's components and the way in which the machine is utilized based on its provisioned IT load.
[657]189Users can bind operational characteristics, specific to particular geographic locations, with the simulation process. These characteristics may include the temperature profile as well as power costs that vary depending on the time and place. The output of this simulation is a set of energy and cost data representing the IT devices (including PUE and DCiE) and data center energy consumption, capital and operational costs.
[632]190
191
192According to the tool evaluation presented in \cite{DCD_Romonet} an accuracy of models delivered by Romonet is at the level of 95\% when compared with metered data. The simulator is available under an OSL V3.0 open-source license, however it can be only accessed by the DCSG Members.
193
194
[648]195\subsection{Summary}
[632]196
[648]197TODO - short summary of current SoTA
[632]198
[648]199
200
[593]201\section{DCWoRMS}
202
[632]203The following picture (Figure~\ref{fig:arch}) presents the overall architecture of the simulation tool.
204
205Data Center workload and resource management simulator (DCWoRMS) is a simulation tool based on the GSSIM framework \cite{GSSIM} developed by Poznan Supercomputing and Networking Center (PSNC).
[657]206GSSIM has been proposed to provide an automated tool for experimental studies of various resource management and scheduling strategies in distributed computing systems. DCWoRMS extends its basic functionality and adds some additional features related to the energy efficiency issues in data centers. In this section we will introduce the functionality of the simulator, in terms of modeling and simulation of large scale distributed systems like Grids and Clouds.
[632]207
208
209\subsection{Architecture}
210
211
212\begin{figure}[tbp]
213\centering
214\includegraphics[width = 12cm]{fig/arch.png}
215\caption{\label{fig:arch} DCWoRMS architecture}
216\end{figure}
217
[657]218DCWoRMS is an event-driven simulation tool written in Java. In general, input data for the DCWoRMS consist of workload and resources descriptions. They can be provided by the user, read from real traces or generated using the generator module. However, the key elements of the presented architecture are plugins. They allow the researchers to configure and adapt the simulation environment to the peculiarities of their studies, starting from modeling job performance, through energy estimations up to implementation of resource management and scheduling policies. Each plugin can be implemented independently and plugged into a specific experiment. Results of experiments are collected, aggregated, and visualized using the statistics module. Due to a modular and pluggable architecture DCWoRMS can be applied to specific resource management problems and address different users' requirements.
[632]219
220
221\subsection{Workload modeling}
222
223As it was said, experiments performed in DCWoRMS require a description of applications that will be scheduled during the simulation. As a primary definition, DCWoRMS uses files in the Standard Workload Format (SWF) or its extension the Grid Workload Format (GWF) \cite{GWF}. In addition to the SWF file, some more detailed specification of a job and tasks can be included in an auxiliary XML file. This form of description provides the scheduler with more detailed information about application profile, task requirements, user preferences and execution time constraints, which are unavailable in SWF/GWF files. To facilitate the process of adapting the traces from real resource management systems, DCWoRMS supports reading those delivered from the most common ones like SLURM \cite{SLURM} and Torque \cite{TORQUE}.
[657]224Since the applications may vary depending on their nature in terms of their requirements and structure, DCWoRMS provides users with flexibility in defining the application model. Thus, considered workloads may have various shapes and levels of complexity that range from multiple independent jobs, through large-scale parallel applications, up to whole workflows containing time dependencies and preceding constraints between jobs and tasks. Each job may consist of one or more tasks and these can be seen as groups of processes. Moreover, DCWoRMS is able to handle rigid and moldable jobs, as well as pre-emptive ones. To model the application profile in more detail,
225DCWoRMS follows the DNA approach proposed in \cite{Ghislain}. Accordingly, each task can be presented as a sequence of phases, which shows the impact of this task on the resources that run it. Phases are then periods of time where the system is stable (load, network, memory) given a certain threshold. Each phase is linked to values of the system that represent a resource consumption profile. Such a stage could be for example described as follows: “60\% CPU, 30\% net, 10\% mem.”
[632]226
227Levels of information about incoming jobs are presented in Figure~\ref{fig:jobsStructure}.
228
229
230\begin{figure}[tbp]
231\centering
232\includegraphics[width = 8cm]{fig/jobsStructure.png}
233\caption{\label{fig:jobsStructure} Levels of information about jobs}
234\end{figure}
235
236
237This form of representation allows users to define a wide range of workloads:
238HPC (long jobs, computational-intensive, hard to migrate) or virtualization (short requests) typical for cloud computing environments.
239Further, the DCWoRMS benefits from the GSSIM workload generator tool and extends it with new features that allow creating synthetic workloads.
240
241
242\subsection{Resource modeling}
243
[657]244The main goal of DCWoRMS is to enable researchers to evaluate various resource management policies in diverse computing environments. To this end, it supports flexible definition of simulated resources both on physical (computing resources) as well as on logical (scheduling entities) level. This flexible approach allows modeling of various computing entities consisting of compute nodes, processors and cores. In addition, detailed location of the given resources can be provided in order to group them and arrange into physical structures such as racks and containers. Each of the components may be described by different parameters specifying available memory, storage capabilities, processor speed etc. In this way, it is possible to describe power distribution system and cooling devices. Due to an extensible description, users are able to define a number of experiment-specific and visionary characteristics. Moreover, with every component, dedicated profiles can be associated that determine, among others, power, thermal and air throughput properties. The energy estimation plugin can be bundled with each resource. This allows defining various power models that can be then followed by different computing system components. Details concerning the approach to energy-efficiency modeling in DCWoRMS can be found in the next sections.
[632]245
246Scheduling entities allow providing data related to the brokering or queuing system characteristics. Thus, information about available queues, resources associated with them and their parameters like priority, availability of AR mechanism etc. can be defined. Moreover, allocation policy and task scheduling strategy for each scheduling entity can be introduced in form of the reference to an appropriate plugin. DCWoRMS allows building a hierarchy of schedulers corresponding to the hierarchy of resource components over which the task may be distributed.
247
248In this way, the DCWoRMS supports simulation of a wide scope of physical and logical architectural patterns that may span from a single computing resource up to whole data centers or geographically distributed grids and clouds. In particular, it supports simulating complex distributed architectures containing models of the whole data centers, containers, racks, nodes, etc. In addition, new resources and distributed computing entities can easily be added to the DCWoRMS environment in order to enhance the functionality of the tool and address more sophisticated requirements. Granularity of such topologies may also differ from coarse-grained to very fine-grained modeling single cores, memory hierarchies and other hardware details.
249
250
251\subsection{Energy management concept in DCWoRMS}
252
[657]253The DCWoRMS allows researchers to take into account energy efficiency and thermal issues in distributed computing experiments. That can be achieved by the means of appropriate models and profiles. In general, the main goal of the models is to emulate the behavior of the real computing resources, while profiles support models by providing data essential for the power consumption calculations. Introducing particular models into the simulation environment is possible through choosing or implementation of dedicated energy plugins that contain methods to calculate power usage of resources, their temperature and system air throughput values. Presence of detailed resource usage information, current resource energy and thermal state description and a functional energy management interface enables an implementation of energy-aware scheduling algorithms. Resource energy consumption and thermal metrics become in this context an additional criterion in the resource management process. Scheduling plugins are provided with dedicated interfaces, which allow them to collect detailed information about computing resource components and to affect their behavior.
[632]254The following subsections present the general idea behind the energy-efficiency simulations.
255
256
257\subsubsection{Power management}
[638]258
[632]259The motivation behind introducing a power management concept in DCWoRMS is providing researchers with the means to define the energy efficiency of resources, dependency of energy consumption on resource load and specific applications, and to manage power modes of resources. Proposed solution extends the power management concept presented in GSSIM \cite{GSSIM_Energy} by offering a much more granular approach with the possibility of plugging energy consumption models and power profiles into each resource level.
260
261\paragraph{\textbf{Power profile}}
[657]262In general, power profiles allow specifying the power usage of resources. Depending on the accuracy of the model, users may provide additional information about power states which are supported by the resources, amounts of energy consumed in these states, and other information essential to calculate the total energy consumed by the resource during runtime. In such a way each component of IT infrastructure may be described, including computing resources, system components and data center facilities. Moreover, it is possible to define any number of new, resource specific, states, for example so called P-states, in which processor can operate.
[632]263
[683]264\paragraph{\textbf{Power consumption model}}
[657]265The main aim of these models is to emulate the behavior of the real computing resource and the way it consumes energy. Due to a rich functionality and flexible environment description, DCWoRMS can be used to verify a number of theoretical assumptions and to develop new energy consumption models. Modeling of energy consumption is realized by the energy estimation plugin that calculates energy usage based on information about the resource power profile, resource utilization, and the application profile including energy consumption and heat production metrics. Relation between model and power profile is illustrated in Figure~\ref{fig:powerModel}.
[632]266
267\begin{figure}[tbp]
268\centering
269\includegraphics[width = 8cm]{fig/powerModel.png}
[683]270\caption{\label{fig:powerModel} Power consumption modeling}
[632]271\end{figure}
272
273\paragraph{\textbf{Power management interface}}
274DCWoRMS is complemented with an interface that allows scheduling plugins to collect detailed power information about computing resource components and to change their power states. It enables performing various operations on the given resources, including dynamically changing the frequency level of a single processor, turning off/on computing resources etc. The activities performed with this interface find a reflection in total amount of energy consumed by the resource during simulation.
275
[657]276Presence of detailed resource usage information, current resource energy state description and functional energy management interface enables an implementation of energy-aware scheduling algorithms. Resource energy consumption becomes in this context an additional criterion in the scheduling process, which use various techniques to decrease energy consumption, e.g. workload consolidation, moving tasks between resources to reduce a number of running resources, dynamic power management, cutting down CPU frequency, and others.
[632]277
278\subsubsection{Air throughput management concept}
[638]279
[632]280The presence of an air throughput concept addresses the issue of resource air-cooling facilities provisioning. Using the air throughput profiles and models allows anticipating the air flow level on output of the computing system component, resulting from air-cooling equipment management.
281
282\paragraph{\textbf{Air throughput profile}}
[683]283The air throughput profile, analogously to the power profile, allows specifying supported air flow states. Each air throughput state definition consists of an air flow value and a corresponding power draw. It can represent, for instance, a fan working state. In this way, associating the air throughput profile with the given computing resource, it is possible to describe mounted air-cooling devices.
[632]284Possibility of introducing additional parameters makes the air throughput description extensible for new specific characteristics.
285
286\paragraph{\textbf{Air throughput model}}
287Similar to energy consumption models, the user is provided with a dedicated interface that allows him to describe the resulting air throughput of the computing system components like cabinets or server fans. The general idea of the air throughput modeling is shown in Figure~\ref{fig:airModel}. Accordingly, air flow estimations are based on detailed information about the involved resources, including their air throughput states.
288
289\begin{figure}[tbp]
290\centering
291\includegraphics[width = 8cm]{fig/airModel.png}
292\caption{\label{fig:airModel} Air throughput modeling}
293\end{figure}
294
295\paragraph{\textbf{Air throughput management interface}}
296The DCWoRMS delivers interfaces that provide access to the air throughput profile data and allow acquiring detailed information concerning current air flow conditions and changes in air flow states. The availability of these interfaces supports evaluation of different cooling strategies.
297
[638]298
299
300\subsubsection{Thermal management concept}
301
302The primary motivation behind the incorporation of thermal aspects in the DCWoRMS is to exceed the commonly adopted energy use-cases and apply more sophisticated scenarios. By the means of dedicated profiles and interfaces, it is possible to perform experimental studies involving temperature-aware workload placement.
303
304\paragraph{\textbf{Thermal profile}}
[657]305Thermal profile expresses the thermal specification of resources. It consists of the definition of the thermal design power (TDP), thermal resistance and thermal states that describe how the temperature depends on dissipated heat. For the purposes of more complex experiments, the introduction of new, user-defined characteristics is supported. The aforementioned values may be provided for all computing system components distinguishing them, for instance, according to their material parameters and/or models.
[638]306
307\paragraph{\textbf{Temperature estimation model}}
[683]308Thermal profile, complemented with the temperature measurement model implementation may introduce temperature sensors simulation. In this way, users have means to approximately predict the temperature of the simulated objects by taking into account basic thermal characteristics as well as the estimated impact of cooling devices. However, the proposed approach assumes some simplifications that ignore heating and cooling dynamics understood as a heat flow process.
[638]309
310Figure~\ref{fig:tempModel} summarizes relation between model and profile and input data.
311
312\begin{figure}[tbp]
313\centering
314\includegraphics[width = 8cm]{fig/tempModel.png}
315\caption{\label{fig:tempModel} Temperature estimation modeling}
316\end{figure}
317
318\paragraph{\textbf{Thermal resource management interface}}
As the temperature is highly dependent on the dissipated heat and the cooling capacity, thermal resource management is performed via the power and air throughput interfaces. Nevertheless, the interface provides access to the thermal resource characteristics and the current temperature values.
320
321
[666]322\subsection{Application performance modeling}\label{sec:apps}
[638]323
In general, DCWoRMS implements user application models as objects describing computational, communicational as well as energy requirements and profiles of the task to be scheduled. Additionally, the simulator provides means to include complex and specific application performance models during simulations. They allow researchers to introduce specific ways of calculating task execution time. These models can be plugged into the simulation environment through a dedicated API and the implementation of an appropriate plugin. To specify the execution time of a task, the user can apply a number of parameters, including:
[632]325\begin{itemize}
326  \item  task length (number of CPU instructions)
327  \item task requirements
328  \item detailed description of allocated resources (processor type and
329parameters, available memory)
330  \item input data size
331  \item network parameters
332\end{itemize}
333Using these parameters developers can for instance take into account the architectures of the underlying systems, such as multi-core processors, or virtualization overheads, and their impact on the final performance of applications.
334
335
336
[666]337\section{Modeling of energy efficiency in DCWoRMS}
[593]338
DCWoRMS is an open framework in which various models and algorithms can be investigated, as presented in Section \ref{sec:apps}. In this section we discuss possible approaches to modeling that can be applied to the simulation of energy-efficiency of distributed computing systems. Additionally, to facilitate the simulation process, DCWoRMS provides some basic implementations of power consumption, air throughput and thermal models. We describe them as examples and validate some of them by experiments in a real computing system (in Section \ref{sec:experiments}).
[648]340
The most common question explored by researchers who study the energy-efficiency of distributed computing systems is how much energy $E$ these systems require to execute workloads. In order to obtain this value the simulator must calculate the values of power $P_i(t)$ and load $L_i(t)$ in time for all $m$ computing nodes, $i=1..m$. The load function may depend on the specific load models applied. In more complex cases it can even be defined as a vector of different resource usages in time. In a simple case the load can be either idle or busy, but even in this case an estimation of the job processing times $p_j$ is needed to calculate the total energy consumption. The total energy consumption of computing nodes is given by (\ref{eq:E}):
[638]342
\begin{equation}
E=\sum_{i=1}^{m}{\int_t{P_i(t)\,dt}} \label{eq:E}
\end{equation}
[638]346
[666]347
The power function may depend on the load and the states of resources, or even on specific applications, as explained in Section~\ref{sec:power}. The total energy can also be complemented by adding the constant power usage of components that do not depend on the load or the state of resources.
349
In large computing systems, which are often characterized by high computational density, the total energy consumption of computing nodes is not the only result interesting for researchers. Temperature distribution is getting more and more important as it affects the energy consumption of cooling devices, which can reach even half of the total data center energy use. In order to obtain accurate values of temperatures, heat transfer simulations based on the Computational Fluid Dynamics (CFD) methods have to be performed. These methods require as an input (i.e. boundary conditions) the heat dissipated by IT hardware and the air throughput generated by fans at servers' outlets. Another approach is based on simplified thermal models that, without costly CFD calculations, provide rough estimations of temperatures. DCWoRMS enables the use of either approach. In the former, the output of simulations, including the power usage of computing nodes in time and the air throughput at node outlets, can be passed to a CFD solver. This option is further elaborated in Section \ref{sec:coolemall}. Simplified thermal models required by the latter approach are proposed in Section \ref{sec:thermal}.
351
352
353\subsection{Power consumption models}\label{sec:power}
354
As stated above, the power usage of computing nodes depends on a number of factors.

Generally, the dynamic power consumption of a modern CPU is given by the following formula:
358%\cite{intel_speedstep}:
359
360\begin{equation}
361P=C\cdot V_{core}^{2}\cdot f\label{eq:ohm-law}
362\end{equation}
363
364with $C$ being the processor switching capacitance, $V_{core}$ the
365current P-State's core voltage and $f$ the frequency. Based on the
366above equation it is suggested that although the reduction of frequency
367causes an increase in the time of execution, the reduction of frequency
368also leads to the reduction of $V_{core}$ and thus the power savings
369from the $P\sim V_{core}^{2}$ relation outweigh the increased computation
time. However, experiments performed on several HPC servers have shown that this dependency does not reflect the theoretical shape and is often close to linear, as presented in Figure \ref{fig:power}. This phenomenon can be explained by the impact of components other than the CPU and by the narrow range of available voltages. A good example of the impact of other components is the power usage of servers with a visible influence of fans, as illustrated in Figure \ref{fig:fans_P}.
371
372%
\begin{figure}[tbp]
\centering
\includegraphics[width=6cm]{fig/power_default}
\caption{\label{fig:power} Average power usage with regard to CPU frequency\protect \\
\textbf{Tests}: Linpack (\emph{green}), Abinit (\emph{purple}),
Namd (\emph{blue}) and Cpuburn (\emph{red}).}
\end{figure}
382
383\begin{figure}[tbp]
384\centering
385\includegraphics[width = 8cm]{fig/power-fans.png}
386\caption{\label{fig:fans_P} Power in time for highest frequency}
387\end{figure}
388
389
390For these reasons, DCWoRMS allows users to define dependencies between power usage and resource states (such as CPU frequency) in the form of tables or arbitrary functions using energy estimation plugins.
391
[632]392The energy consumption models provided by default can be classified into the following groups, starting from the simplest model up to the more complex ones. Users can easily switch between the given models and incorporate new, visionary scenarios.
393
[648]394\textbf{Static approach} is based on a static definition of resource power usage. This model calculates the total amount of energy consumed by the computing resource system as a sum of energy, consumed by all its components (processors, disks, power adapters, etc.). More advanced versions of this approach assume definition of resource states along with corresponding power usage. This model follows changes of resource power states and sums up the amounts of energy defined for each state.
[666]395In this case, specific values of power usage are defined for all discrete $n$ states as shown in (\ref{eq:static}):
[648]396
[666]397\begin{equation}
S_1 \to P_1, S_2 \to P_2, ..., S_n \to P_n\label{eq:static}
399\end{equation}
400
[648]401\textbf{Resource load} model extends the static power state description and enhances it with real-time resource usage, most often simply the processor load. In this way it enables a dynamic estimation of power usage based on resource basic power usage and state (defined by the static resource description) as well as resource load. For instance, it allows distinguishing between the amount of energy used by idle processors and processors at full load. In this manner, energy consumption is directly connected with power state and describes average power usage by the resource working in a current state.
In this case, specific values of power usage are defined for all pairs of state and load values (discretized to $l$ values) as shown in (\ref{eq:load}):
[648]403
[666]404\begin{equation}
405(S_1, L_1) \to P_{11}, (S_1, L_2) \to P_{12}, ...,  (S_2, L_1) \to P_{21}, ..., (S_n, L_l) \to P_{nl}\label{eq:load}
406\end{equation}
407
[648]408\textbf{Application specific} model allows expressing differences in the amount of energy required for executing various types of applications at diverse computing resources. It considers all defined system elements (processors, memory, disk, etc.), which are significant in total energy consumption. Moreover, it also assumes that each of these components can be utilized in a different way during the experiment and thus have different impact on total energy consumption. To this end, specific characteristics of resources and applications are taken into consideration. Various approaches are possible including making the estimated power usage dependent on defined classes of applications, ratio between CPU-bound and IO-bound operations, etc.
[666]409In this case, power usage is an arbitrary function of state, load, and application characteristics as shown in (\ref{eq:app}):
[648]410
[666]411\begin{equation}
412f(S, L, A) \to P \label{eq:app}
413\end{equation}
[638]414
[666]415\subsection{Air throughput models}\label{sec:air}
416
[650]417The DCWoRMS comes with the following air throughput models.
[632]418By default, air throughput estimations are performed according to the first one.
419
[648]420\textbf{Static} model refers to a static definition of air throughput states. According to this approach, output air flow depends only on the present air cooling working state and the corresponding air throughput value. Each state change triggers the calculations and updates the current air throughput value. This strategy requires only a basic air throughput profile definition.
421
[683]422\textbf{Space} model allows taking into account a duct associated with the investigated air flow. On the basis of the given fan rotation speed and the obstacles before/behind the fans, the output air throughput can be roughly estimated. To this end, additional manufacturer's specifications will be required, including resulting air velocity values and fan duct dimensions. Thus, it is possible to estimate the air flow level not only referring to the current fan operating state but also with respect to the resource and its subcomponent placement. More advanced scenario may consider mutual impact of several air flows.
[648]423
[666]424\subsection{Thermal models}\label{sec:thermal}
[650]425
[666]426\begin{figure}[tbp]
427\centering
428\includegraphics[width = 8cm]{fig/temp-fans.png}
\caption{\label{fig:tempFans} Temperature in time for highest frequency}
430\end{figure}
431
432
[638]433The following models are supported natively. By default, the static strategy is applied.
[632]434
\textbf{Static} approach follows the changes in heat generated by the computing system components and matches the corresponding temperature according to the specified profile. Since it tracks the power consumption variations, corresponding values must be delivered, either from the power consumption model or on the basis of user data. Replacing the appropriate temperature values with a function based on the defined material properties and/or experimentally measured values can easily extend this model.
[593]436
[648]437\textbf{Ambient} model allows taking into account the surrounding cooling infrastructure. It calculates the device temperature as a function adopted from the static approach and extends it with the influence of cooling method. The efficiency of cooling system may be derived from the current air throughput value.
[638]438
[666]439\section{Experiments and evaluation}\label{sec:experiments}
[638]440
[593]441Results + RECS and MOP description
442
[638]443....
444
In this section, we present computational analyses that were conducted to emphasize the role of modelling and simulation in studying computing systems performance. We carried out two types of experiments. The former one aimed at demonstrating the capabilities of the simulator in terms of verifying the research hypotheses. The latter set of experiments was performed on the CoolEmAll testbed and then repeated using the DCWoRMS tool. The comparative analysis of the obtained results shows the reproducibility of experiments and proves the correctness of the adopted models and assumptions.
[638]446
447\subsection{Testbed description}
448
The RECS Cluster System is an 18 node computer system that has a monitoring and controlling mechanism integrated. Through the integrated novel monitoring approach of the RECS Cluster System the network load can be reduced, and the dependency on polling every single
compute node at the operating system layer can be avoided. Furthermore, this concept builds up a basis on which new monitoring and controlling concepts can be developed. Therefore, each compute node of the RECS Cluster Server is connected to an operating-system-independent microcontroller that collects the most important sensor data, like temperature, power consumption and the status (on/off), from every single node.
451
452
\begin{table}[tp]
\centering
\caption{CoolEmAll testbed}
\label{tab:testbed}
\begin{tabular}{llr}
\hline
\multicolumn{3}{c}{Nodes} \\
Type & Memory (RAM) & Count  \\
\hline
Intel i7 & 16 GB & 4 \\
AMD Fusion T40N 64 Bit & 4 GB & 6 \\
Atom D510 64 Bit & 2 GB & 4 \\
Atom Z510 VT & 2 GB & 4 \\
\hline
\multicolumn{3}{c}{Storage} \\
Type & Size & Connection  \\
\hline
Storage Head 520 & 16 x 300 GB SSD & 2 x 10 Gbit/s CX4 \\
\hline
\end{tabular}
\end{table}
473
[638]474%Node i7, 16 GB RAM     4
475%Node AMD Fusion T40N Dualcore, 1,0 Ghz, 4 GB (64 Bit)  6
476%Node Atom D510 64 Bit, 2 GB    4
477%Node Atom Z510 VT, 2 GB        4
478%RECS | Storage Head 520, 16 x 300 GB SSD, 2 x 10 Gbit/s CX4
479
[648]480\subsection{Computational analysis}
[638]481
[648]482TODO - experiments
[593]483
[666]484\section{DCWoRMS application/use cases}\label{sec:coolemall}
[639]485
[648]486DCWoRMS in CoolEmAll, integration with CFD
[593]487
[639]488...
489
Being based on the GSSIM framework, which has been successfully applied in a substantial number of research projects and academic studies, DCWoRMS with its sophisticated energy extension has become an essential tool for studies of energy efficiency in distributed environments. For this reason, it has been adopted within the CoolEmAll project as a component of the Simulation, Visualisation and Decision Support (SVD) Toolkit. In general, the main goal of CoolEmAll is to provide advanced simulation, visualisation and decision support tools along with blueprints of computing building blocks for modular data centre environments. Once developed, these tools and blueprints should help to minimise the energy consumption, and consequently the CO$_2$ emissions, of the whole IT infrastructure with related facilities. The SVD Toolkit is designed to support the analysis and optimization of modern IT infrastructures. In recent years, special attention has been paid to the energy utilized by data centres, which contributes considerably to data centre operational costs. Actual power usage and the effectiveness of energy saving methods heavily depend on available resources, types of applications and workload properties. Therefore, intelligent resource management policies are gaining popularity when considering the energy efficiency of IT infrastructures.
[639]491Hence, SVD Toolkit integrates also workload management and scheduling policies to support complex modeling and optimization of modern data centres.
492
The main aim of DCWoRMS within the CoolEmAll project is to enable studies of dynamic states of IT infrastructures, like power consumption and air throughput distribution, on the basis of changing workloads, resource models and energy-aware resource management policies.
In this context, DCWoRMS takes into account the specific workload and application characteristics as well as detailed resource parameters. It will benefit from the CoolEmAll benchmarks and classification of applications and workloads. In particular, various types of workload, including data centre workloads using virtualization and HPC applications, may be considered. The knowledge concerning their performance and properties as well as information about their energy consumption and heat production will be used in simulations to study their impact on thermal issues and energy efficiency. Detailed resource characteristics will also be provided according to the CoolEmAll blueprints. Based on this data, workload simulation will support the evaluation process of various resource management approaches. These policies may include a wide spectrum of energy-aware strategies such as workload consolidation/migration, dynamically switching off nodes, Dynamic Voltage and Frequency Scaling (DVFS), and thermal-aware methods. In addition to typical approaches minimizing energy consumption, policies that prevent too high temperatures in the presence of limited cooling (or no cooling) may also be analyzed. Moreover, apart from the set of predefined strategies, new approaches can easily be applied and examined.
The outcome of the workload and resource management simulation phase is a distribution of power usage and air throughput for the computing models specified within the SVD Toolkit. These statistics may be analyzed directly by data centre designers and administrators and/or provided as an input to the CFD simulation phase. The former case allows studying how the above metrics change over time, while the latter harnesses CFD simulations to identify temperature differences between the computing modules, called hot spots. The goal of this scenario is to visualise the behavior of the temperature distribution within a server room with a number of racks for different types of executed workloads and for various policies used to manage these workloads.
[639]496
497
[593]498\section{Conclusions and future work}
499
[648]500TODO - Conclusions and future research
[593]501
[594]502
[593]503\label{}
504
505%% The Appendices part is started with the command \appendix;
506%% appendix sections are then done as normal sections
507%% \appendix
508
509%% \section{}
510%% \label{}
511
512%% References
513%%
514%% Following citation commands can be used in the body text:
515%% Usage of \cite is as follows:
516%%   \cite{key}          ==>>  [#]
517%%   \cite[chap. 2]{key} ==>>  [#, chap. 2]
518%%   \citet{key}         ==>>  Author [#]
519
520%% References with bibTeX database:
521
522%%\bibliographystyle{model1-num-names}
523%%\bibliography{<your-bib-database>}
524
525%% Authors are advised to submit their bibtex database files. They are
526%% requested to list a bibtex style file in the manuscript if they do
527%% not want to use model1-num-names.bst.
528
529%% References without bibTeX database:
530
531 \begin{thebibliography}{00}
532
533%% \bibitem must have the following form:
534%%   \bibitem{key}...
535%%
536
537% \bibitem{}
538
[632]539\bibitem{CloudSim} Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, Cesar A. F. De Rose, and Rajkumar Buyya, CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms, Software: Practice and Experience (SPE), Volume 41, Number 1, Pages: 23-50, ISSN: 0038-0644, Wiley Press, New York, USA, January, 2011.
540
541\bibitem{DCSG} http://dcsg.bcs.org/welcome-dcsg-simulator
542
543\bibitem{DCD_Romonet} http://www.datacenterdynamics.com/blogs/ian-bitterlin/it-does-more-it-says-tin\%E2\%80\%A6
544
\bibitem{Ghislain} Ghislain Landry Tsafack Chetsa, Laurent Lef\`{e}vre, Jean-Marc Pierson, Patricia Stolf, Georges Da Costa. ``DNA-inspired Scheme for Building the Energy Profile of HPC Systems''. In: International Workshop on Energy-Efficient Data Centres, Madrid, Springer, 2012
546
547\bibitem{GreenCloud} D. Kliazovich, P. Bouvry, and S. U. Khan, A Packet-level Simulator of Energy- aware Cloud Computing Data Centers, Journal of Supercomputing, vol. 62, no. 3, pp. 1263-1283, 2012
548
549\bibitem{GSSIM} S. Bak, M. Krystek, K. Kurowski, A. Oleksiak, W. Piatek and J. Weglarz, GSSIM - a Tool for Distributed Computing Experiments, Scientific Programming Journal, vol. 19, no. 4, pp. 231-251, 2011.
550
551\bibitem{GSSIM_Energy} M. Krystek, K. Kurowski, A. Oleksiak, W. Piatek, Energy-aware simulations with GSSIM. Proceedings of the COST Action IC0804 on Energy Efficiency in Large Scale Distributed Systems, 2010, pp. 55-58.
552
553\bibitem{GWF} http://gwa.ewi.tudelft.nl/
554
555\bibitem{SLURM} https://computing.llnl.gov/linux/slurm/
556
557\bibitem{SWF} Parallel Workload Archive, http://www.cs.huji.ac.il/labs/parallel/workload/
558
559\bibitem{TORQUE} http://www.adaptivecomputing.com/products/open-source/torque/
560
561
[593]562 \end{thebibliography}
563
564
565\end{document}
566
567%%
568%% End of file `elsarticle-template-1-num.tex'.
Note: See TracBrowser for help on using the repository browser.