Timestamp: 12/28/12 09:18:29
Author: wojtekp
Message:
 
File: papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex (1 edited)

    r714 r715  
    118118 
    119119In recent years, the energy efficiency of computing infrastructures has gained great attention. In this paper we present a Data Center Workload and Resource Management Simulator (DCWoRMS), which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies. 
    120 We discuss methods of power usage modeling available in the simulator. To this end, we compare results of simulations to measurements from the real servers.  
    121 To demonstrate DCWoRMS capabilities we evaluate impact of several resource management policies on overall energy-efficiency of specific workloads on heterogeneous resources. 
     120We discuss methods of power usage modeling available in the simulator and validate them by comparing the results of simulations with measurements from real servers.  
     121To demonstrate the capabilities of DCWoRMS, we evaluate the impact of several resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources. 
    122122 
    123123\end{abstract} 
     
    419419TODO - correct, improve, refactor... 
    420420 
    421 In this section, we present computational analysis that were conducted to emphasize the role of modelling and simulation in studying computing systems performance. The experiments were first performed on the CoolEmAll testbed to collect all necessary data and then repeated using DCWoRMS tool. Based on the obtained results we studied the impact of popular energy-aware resource management policies on the energy consumption. The following sections contains description of the used system, tested application and the results of simulation experiments conducted for the evaluated strategies. 
     421In this section, we present the computational analysis that was conducted to emphasize the role of modelling and simulation in studying the performance of computing systems. To this end, we evaluate the impact of energy-aware resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources. The following sections contain a description of the system used, the tested applications and the results of the simulation experiments conducted for the evaluated strategies. 
    422422 
    423423\subsection{Testbed description} 
    424  
    425424 
    426425To obtain power consumption values that could later be used in the DCWoRMS environment to build the model and to evaluate resource management policies, we ran a set of applications / benchmarks on the physical testbed. For experimental purposes we chose the high-density Resource Efficient Cluster Server (RECS) system. A single RECS unit consists of 18 single-CPU modules, each of which can be treated as an individual node of PC class. The configuration of our RECS unit is presented in Table~\ref{testBed}.  
     
    443442\hline 
    444443\end{tabular} 
    445 \caption {\label{testBed} CoolEmAll testbed} 
     444\caption {\label{testBed} RECS system configuration} 
    446445\end {table} 
    447446 
     
    456455\textbf{Abinit} is a widely used computational physics application that calculates properties of systems made of electrons and nuclei within density functional theory. 
    457456 
    458 \textbf{C-Ray} is a ray-tracing benchmark that stresses floating point performance of a CPU. The test is configured with the 'scene' file at 16000x9000 resolution. 
     457\textbf{C-Ray} is a ray-tracing benchmark that stresses the floating-point performance of the CPU. Our test is configured with the 'scene' file at a resolution of 16000x9000. 
    459458 
    460459The \textbf{Linpack} benchmark is used to evaluate the floating-point performance of a system. It is based on Gaussian elimination and solves a dense N by N system of linear equations. 
    461460 
    462 \textbf{Tar} it is a widely used data archiving software [tar]. In the tests the task was to create one compressed file of Linux kernel, which is about 2,3GB size, using bzip2. 
    463  
    464 \textbf{FFTE} benchmark measures the floating-point arithmetic rate of double precision complex one-dimensional Discrete Fourier Transforms of 1-, 2-, and 3-dimensional sequences of length $2^{p} * 3^{q} * 5^{r}$. In our tests only one core was used to run the application 
    465  
     461\textbf{Tar} is a widely used data archiving tool. In our tests the task was to create one compressed archive of the Linux kernel sources (version 3.4), about 2.3 GB in size, using bzip2. 
     462 
     463The \textbf{FFTE} benchmark measures the floating-point arithmetic rate of double-precision complex Discrete Fourier Transforms of one-, two- and three-dimensional sequences of length $2^{p} \cdot 3^{q} \cdot 5^{r}$. In our tests only one core was used to run the application. 
    466464 
    467465 
    468466\subsection{Methodology} 
    469467 
    470 Every chosen application / benchmark was executed on each type of node, for all frequencies supported by the CPU and for different levels of parallelization (number of cores).  To eliminate the problem with assessing which part of the power consumption comes from which application, in case when more then one application is ran on the node, the queuing system (SLURM) was configured to run jobs in exclusive mode (one job per node). Such configuration is often used for at least dedicated part of HPC resources. The advantage of the exclusive mode scheduling policy consist in that the job gets all the resources of the assigned nodes for optimal parallel performance and applications running on the same node do not influence each other. For every configuration of application, type of node and CPU frequency we measure the average power consumption of the node and the execution time. The aforementioned values  were used to configure the DCWoRMS environment providing energy and time execution models. 
     468Every chosen application / benchmark was executed on each type of node, for all frequencies supported by the CPU and for different levels of parallelization (numbers of cores). To avoid the problem of assessing which part of the power consumption comes from which application when more than one application runs on a node, the queuing system (SLURM) was configured to run jobs in exclusive mode (one job per node). Such a configuration is often used for at least a dedicated part of HPC resources. The advantage of the exclusive-mode scheduling policy is that a job gets all the resources of the assigned nodes for optimal parallel performance and that applications running on the same node do not influence each other. For every combination of application, node type and CPU frequency we measured the average power consumption of the node and the execution time. These values were used to configure the DCWoRMS environment, providing the energy and execution time models. 
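As a minimal illustration of how these measurements feed the simulation models, the sketch below (hypothetical Python names, not the DCWoRMS plugin API) organizes the measured values into a lookup keyed by application, node type and CPU frequency and derives a per-run energy estimate from them.
\begin{verbatim}
# Illustrative sketch only: a measurement-driven power/time model keyed by
# (application, node type, CPU frequency).  Names are placeholders.
from collections import namedtuple

Measurement = namedtuple("Measurement", "avg_power_w exec_time_s")

# Filled from the testbed measurements described above, e.g.:
# measurements[("c-ray", "node-type-A", 1600)] = Measurement(...)
measurements = {}

def predicted_energy_joules(app, node_type, freq_mhz):
    """Energy of one run, estimated as average power times execution time."""
    m = measurements[(app, node_type, freq_mhz)]
    return m.avg_power_w * m.exec_time_s
\end{verbatim}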
    471469Based on the models obtained for the considered set of resources and applications, we evaluated a set of resource management strategies in terms of the energy consumption needed to execute workloads of varying load intensity (10\%, 30\%, 50\%, 70\%). 
    472 To generate a workload we benefited from the DCWoRMS workload generator tool using the following characteristics. 
     470To generate the workloads we used the DCWoRMS workload generator tool with the following characteristics; a simplified generation sketch is shown after the table. 
    473471 
    473471\begin{table}[tp] 
     
    495493 & \multicolumn{4}{c}{C-Ray} & uniform - 20\%\\ 
    496494 & \multicolumn{4}{c}{Tar} & uniform - 20\%\\ 
    497  & \multicolumn{4}{c}{Linpack} & uniform - 20\%\\ 
     495 & \multicolumn{4}{c}{Linpack - 3GB} & uniform - 10\%\\ 
     496 & \multicolumn{4}{c}{Linpack - tiny} & uniform - 10\%\\ 
    498497 & \multicolumn{4}{c}{FFTE} & uniform - 20\%\\ 
    499498 
     
    503502\end {table} 
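As noted above, the following sketch illustrates one way such a workload could be generated. The task-type shares follow the table (the Abinit share is assumed here to complete the distribution), while the Poisson arrival process and the mean service time parameter are assumptions of this illustration, not properties of the DCWoRMS generator itself.
\begin{verbatim}
# Illustrative workload generation sketch (not the DCWoRMS generator).
import random

TASK_SHARES = {"Abinit": 0.20,        # assumed share, for illustration
               "C-Ray": 0.20, "Tar": 0.20, "FFTE": 0.20,
               "Linpack-3GB": 0.10, "Linpack-tiny": 0.10}

def generate(n_tasks, load_intensity, mean_service_s, n_nodes, seed=0):
    """Return (arrival_time_s, task_type) pairs approximating the target load.

    load_intensity -- target average fraction of busy nodes,
                      e.g. 0.10, 0.30, 0.50 or 0.70 as in the experiments.
    """
    rng = random.Random(seed)
    arrival_rate = load_intensity * n_nodes / mean_service_s  # tasks per second
    t, workload = 0.0, []
    for _ in range(n_tasks):
        t += rng.expovariate(arrival_rate)                    # Poisson arrivals
        task = rng.choices(list(TASK_SHARES), list(TASK_SHARES.values()))[0]
        workload.append((t, task))
    return workload
\end{verbatim}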
    504503 
    505 Execution time of each application is based on the measurements collected within our testbed. 
    506504In all cases we assumed that tasks are scheduled and served in the order of their arrival (a FIFO strategy with easy backfilling).  
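A minimal sketch of this discipline (illustrative Python, not the DCWoRMS scheduling plugin): the queue head starts as soon as enough nodes are free, and a later job may be backfilled only if it fits on the currently free nodes and its estimated completion does not delay the reservation made for the head.
\begin{verbatim}
# Simplified FIFO + EASY backfilling rule.  In a full scheduler this
# selection is re-evaluated every time a job starts or finishes.
def pick_jobs_to_start(queue, free_nodes, now, head_reservation):
    """queue: waiting jobs in arrival order, each with .nodes and .walltime;
    head_reservation: earliest start time reserved for the queue head,
    derived from the expected end times of the running jobs."""
    started = []
    head = queue[0] if queue else None
    for job in list(queue):
        if job.nodes > free_nodes:
            continue                              # does not fit right now
        if job is head or now + job.walltime <= head_reservation:
            queue.remove(job)                     # start it: the FIFO head, or a
            free_nodes -= job.nodes               # backfilled job that ends
            started.append(job)                   # before the head's reservation
    return started
\end{verbatim}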
    507505 
    508506\subsection{Computational analysis} 
    509507 
    510 In the following section presents the results obtained for the workload with load density equal to 70\% in the light of five resource management and scheduling strategies. Then we discusses the corresponding results received for workloads with other density level. 
    511 The first considered by us policy was the strategy in which tasks were assigned to nodes in random manner with the reservation that they can be assigned only to nodes of the type which the application was possible to execute on and we have the corresponding value of power consumption and execution time. The random strategy is only the reference one and will be later used to compare benefits in terms of energy efficiency resulting from more sophisticated algorithms.  
     508Each scheduling strategy was evaluated according to two criteria: the total energy consumption and the total workload completion time. 
     509In the following section we present the results obtained for the workload with a load intensity of 70\% under five resource management and scheduling strategies. We then discuss the corresponding results obtained for the workloads with other intensity levels. 
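The two criteria, together with the mean power reported below, are related in the obvious way (with $T_C$ the workload completion time and $P_n(t)$ the instantaneous power of node $n$):
\[
E_{\mathrm{total}} = \sum_{n \in \mathrm{nodes}} \int_{0}^{T_C} P_n(t)\,dt,
\qquad
\bar{P} = \frac{E_{\mathrm{total}}}{T_C}.
\]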
     510 
     511The first policy we considered was the Random strategy, in which tasks were assigned to nodes at random, with the restriction that a task could be assigned only to a node type on which the application was able to execute and for which the corresponding power consumption and execution time values were available. The Random strategy serves only as a reference and is later used to quantify the energy-efficiency gains of more sophisticated algorithms.  
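The reference policy can be sketched as follows (illustrative Python, not DCWoRMS code; the compatibility filter reflects the restriction described above, reusing the measurement lookup from the methodology sketch).
\begin{verbatim}
# Random reference policy: pick an idle node at random among those on which
# the application can run and for which a measured profile exists.
import random

def random_assignment(task, idle_nodes, measurements, rng=random):
    """Return an idle node for `task`, or None if no compatible node is idle."""
    compatible = [n for n in idle_nodes
                  if (task.app, n.node_type, n.freq_mhz) in measurements]
    return rng.choice(compatible) if compatible else None
\end{verbatim}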
    512512 
    513513 
     
    518518\end{figure} 
    519519 
    520 \textbf{total energy usage [kWh]} : 46,883 
    521 \textbf{mean power consumption [W]} : 316,17 
    522 \textbf{workload completion [s]} : 533 820 
     520\textbf{total energy usage}: 46.883 kWh 
     521\textbf{mean power consumption}: 316.17 W 
     522\textbf{workload completion}: 533 820 s 
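As a consistency check, the three values satisfy $E_{\mathrm{total}} \approx \bar{P} \cdot T_C$:
\[
316.17\,\mathrm{W} \times 533\,820\,\mathrm{s} \approx 1.688\times 10^{8}\,\mathrm{J} \approx 46.9\,\mathrm{kWh}.
\]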
    523523 
    524524We also investigated a second version of this strategy, which is becoming more popular due to energy costs, in which unused nodes are switched off to reduce the total energy consumption. In the previous version unused nodes are not switched off, which is still the primary mode of operation in many HPC centers.  
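The difference between the two variants reduces to how idle periods are charged; a simple accounting sketch (illustrative Python, not the DCWoRMS power model, with a single busy power assumed per node for simplicity):
\begin{verbatim}
# Energy of one node over the schedule makespan.  With the switch-off variant
# idle periods contribute nothing; otherwise the node keeps drawing idle
# power.  Power-state transition costs are neglected in this sketch.
def node_energy_joules(busy_intervals, makespan_s, busy_power_w, idle_power_w,
                       switch_off_unused=False):
    busy_time = sum(end - start for start, end in busy_intervals)
    idle_time = makespan_s - busy_time
    idle_draw = 0.0 if switch_off_unused else idle_power_w
    return busy_power_w * busy_time + idle_draw * idle_time
\end{verbatim}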
     
    530530\end{figure} 
    531531 
    532 \textbf{total energy usage [kWh]} : 36,705 
    533 \textbf{mean power consumption [W]} : 247,53 
    534 \textbf{workload completion [s]} : 533 820 
     532\textbf{total energy usage}: 36.705 kWh 
     533\textbf{mean power consumption}: 247.53 W 
     534\textbf{workload completion}: 533 820 s 
    535535 
    536536In this version of the experiment we neglected the additional cost and time necessary to change the power state of resources. As expected, switching off unused nodes led to a significant decrease in the total energy consumption; the overall savings reached 22\%. 
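This figure follows directly from the two totals reported above:
\[
\frac{46.883 - 36.705}{46.883} \approx 0.217 \approx 22\%.
\]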
     
    545545\end{figure} 
    546546 
    547 \textbf{total energy usage [kWh]} : 46,305 
    548 \textbf{mean power consumption [W]} : 311,94 
    549 \textbf{workload completion [s]} : 534 400 
     547\textbf{total energy usage}: 46.305 kWh 
     548\textbf{mean power consumption}: 311.94 W 
     549\textbf{workload completion}: 534 400 s 
    550550 
    551551 
     
    560560\end{figure} 
    561561 
    562 \textbf{total energy usage [kWh]} : 30,568 
    563 \textbf{mean power consumption [W]} : 206,15 
    564 \textbf{workload completion [s]} : 533 820 
     562\textbf{total energy usage}: 30.568 kWh 
     563\textbf{mean power consumption}: 206.15 W 
     564\textbf{workload completion}: 533 820 s 
    565565 
    566566The last case we considered is a modification of one of the previous strategies that takes into account the energy efficiency of nodes. We assume that tasks do not have deadlines and that the only criterion taken into consideration is the total energy consumption. All the considered workloads were executed on the testbed configured for three different CPU frequencies: the lowest, a medium and the highest one. The experiment was intended to check whether the benefit of running the workload at a less power-consuming CPU frequency is not offset by the prolonged execution time of the workload. 
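The trade-off examined here amounts to comparing, per frequency, the product of average power and execution time; a minimal sketch (illustrative Python, reusing the measurement lookup introduced in the methodology section):
\begin{verbatim}
# A lower frequency wins only if the reduction in average power outweighs the
# longer execution time, since the energy of a run is power times time.
def best_frequency(app, node_type, frequencies, measurements):
    """Return (frequency, energy_joules) minimizing the energy of one run."""
    def energy(freq):
        m = measurements[(app, node_type, freq)]
        return m.avg_power_w * m.exec_time_s
    best = min(frequencies, key=energy)
    return best, energy(best)
\end{verbatim}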
     
    573573\end{figure} 
    574574 
    575 \textbf{total energy usage [kWh]} : 77,108 
    576 \textbf{mean power consumption [W]} : 260,57 
    577 \textbf{workload completion [s]} : 1 065 356 
     575\textbf{total energy usage}: 77.108 kWh 
     576\textbf{mean power consumption}: 260.57 W 
     577\textbf{workload completion}: 1 065 356 s 
    578578 
    579579 