Timestamp:
12/30/12 17:29:34 (12 years ago)
Author:
wojtekp
Message:
 
File:
1 edited

  • papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex

    r721 r722  
    419419TODO - correct, improve, refactor... 
    420420 
    421 In this section, we present computational analysis that were conducted to emphasize the role of modelling and simulation in studying computing systems performance. To this end we evaluate the impact of energy-aware resource management policies on overall energy-efficiency of specific workloads on heterogeneous resources. The following sections contains description of the used system, tested application and the results of simulation experiments conducted for the evaluated strategies. 
     421In this section, we present the computational analyses that were conducted to emphasize the role of modelling and simulation in studying the performance of computing systems. To this end we evaluate the impact of energy-aware resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources. The following sections contain a description of the system used, the tested applications and the results of the simulation experiments conducted for the evaluated strategies.
    422422 
    423423\subsection{Testbed description} 
     
    446446 
    447447 
    448 The RECS system was chosen due to its heterogeneous platform with very high density and energy efficiency that has a monitoring and controlling mechanism integrated. The built-in and additional sensors allow to monitor the complete testbed at a very fine granularity level without the negative impact of the computing- and network-resources.  
     448The RECS system was chosen because it is a heterogeneous platform of very high density and energy efficiency with an integrated monitoring and control mechanism. The built-in and additional sensors allow monitoring the complete testbed at a very fine level of granularity without negatively affecting the computing and network resources.
    449449 
    450450 
     
    457457\textbf{C-Ray} is a ray-tracing benchmark that stresses the floating-point performance of a CPU. Our test is configured with the 'scene' input file at a resolution of 16000x9000.
    458458 
    459 \textbf{Linpack} benchmark is used to evaluate system floating point performance. It is based on the Gaussian elimination methods that solves a dense N by N system of linear equations. 
     459The \textbf{Linpack} benchmark is used to evaluate the floating-point performance of a system. It is based on Gaussian elimination for solving a dense $N \times N$ system of linear equations.
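For context, the Linpack performance figure is conventionally derived from the nominal operation count of the factorization and solve (the standard convention for this benchmark, not a value measured in this work):
\[
R = \frac{\tfrac{2}{3}N^{3} + 2N^{2}}{t},
\]
where $N$ is the problem size and $t$ the measured execution time, so $R$ is the achieved floating-point rate.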
    460460 
    461461\textbf{Tar} is a widely used data archiving tool. In our tests the task was to create one compressed archive of the Linux kernel sources (version 3.4), which are about 2.3 GB in size, using bzip2.
     
    467467 
    468468Every chosen application/benchmark was executed on each type of node, for all frequencies supported by the CPU and for different levels of parallelization (number of cores). To eliminate the problem of assessing which part of the power consumption comes from which application when more than one application runs on a node, the queuing system (SLURM) was configured to run jobs in exclusive mode (one job per node). Such a configuration is often used for at least a dedicated part of HPC resources. The advantage of the exclusive-mode scheduling policy is that a job obtains all resources of the assigned nodes for optimal parallel performance and applications do not influence each other by sharing a node. For every combination of application, node type and CPU frequency we measured the average power consumption of the node and the execution time. These values were used to configure the DCWoRMS environment, providing the energy and execution time models.
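To illustrate how such measurements can parameterize a simulation, the following minimal Java sketch (hypothetical code; the class and method names are our assumptions and not the actual DCWoRMS plugin API) stores the measured average power and execution time per application, node type and CPU frequency, and derives the corresponding energy estimate via $E = P \cdot t$.
\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup-table model built from the measurements described
// above; names and structure are illustrative, not the DCWoRMS plugin API.
public class MeasuredModel {
    // One measured record: average power draw [W] and execution time [s].
    public record Measurement(double avgPowerWatts, double execTimeSeconds) {}

    private final Map<String, Measurement> table = new HashMap<>();

    private static String key(String app, String nodeType, int freqMHz) {
        return app + "|" + nodeType + "|" + freqMHz;
    }

    // Store one (application, node type, CPU frequency) measurement.
    public void put(String app, String nodeType, int freqMHz,
                    double avgPowerWatts, double execTimeSeconds) {
        table.put(key(app, nodeType, freqMHz),
                  new Measurement(avgPowerWatts, execTimeSeconds));
    }

    // Return the measured values used to parameterize the simulated node.
    public Measurement get(String app, String nodeType, int freqMHz) {
        return table.get(key(app, nodeType, freqMHz));
    }

    public static void main(String[] args) {
        MeasuredModel model = new MeasuredModel();
        // The numbers below are placeholders, not measurements from the paper.
        model.put("linpack", "Atom D510", 1667, 30.0, 1200.0);
        Measurement m = model.get("linpack", "Atom D510", 1667);
        double energyWh = m.avgPowerWatts() * m.execTimeSeconds() / 3600.0;
        System.out.printf("power=%.1f W, time=%.0f s, energy=%.2f Wh%n",
                m.avgPowerWatts(), m.execTimeSeconds(), energyWh);
    }
}
\end{verbatim}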
    469 Based on the models obtained for the considered set of resources and applications we evaluated a set of resource management strategies in terms of energy consumption needed to execute three workloads varying in load intensity (10\%, 30\%, 50\%, 70\% ). 
     469Based on the models obtained for the considered set of resources and applications, we evaluated a set of resource management strategies in terms of the energy consumption needed to execute workloads varying in load intensity (10\%, 30\%, 50\%, 70\%).
    470470To generate the workloads we used the DCWoRMS workload generator tool with the following characteristics.
    471471 
     
    521521\end{figure} 
    522522 
    523 In the second version of this strategy, which is getting more popular due to energy costs, we switched of unused nodes to reduce the total energy consumption. In the previous one, unused nodes are not switched off, which case is still the the primary one in many HPC centers.  
     523In the second version of this strategy, which is becoming more popular due to energy costs, we switched off unused nodes to reduce the total energy consumption. In the previous version unused nodes are not switched off, which is still the primary case in many HPC centers.
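A rough sketch of the difference between the two versions is given below (hypothetical code; the node interface is our simplification, not the DCWoRMS scheduling plugin API): after each scheduling step any node left without work is powered down, and a node is powered up again before a task is dispatched to it, whereas the baseline version simply omits the power-down step.
\begin{verbatim}
import java.util.List;

// Hypothetical sketch of the "switch off unused nodes" variant; the Node
// interface and method names are assumptions made for illustration only.
public class SwitchOffIdlePolicy {
    public interface Node {
        boolean hasRunningTasks();
        boolean isPoweredOn();
        void powerOff();   // idle node is turned off to save idle power
        void powerOn();    // node is woken up before a task is dispatched
    }

    // Called after every scheduling decision: nodes without work are
    // powered down; the first version of the strategy skips this step.
    public void afterScheduling(List<Node> nodes) {
        for (Node node : nodes) {
            if (node.isPoweredOn() && !node.hasRunningTasks()) {
                node.powerOff();
            }
        }
    }

    // Called before dispatching a task to the node chosen by the scheduler.
    public void beforeDispatch(Node target) {
        if (!target.isPoweredOn()) {
            target.powerOn();
        }
    }
}
\end{verbatim}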
    524524 
    525525\begin{figure}[h!] 
     
    573573\subsubsection{Frequency scaling} 
    574574 
    575 The last considered by us case is modification of the random strategy. We assume that tasks do not have deadlines and the only criterion which is taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure for the lowest possible frequencies of CPUs. The experiment was intended to check if the benefit of running the workload on less power-consuming frequency of CPU is not leveled by the prolonged time of execution of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77,109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload that require almost twice more time, can not be compensate by the lower power draw. Moreover, as it can be observed on the charts in Figure~\ref{fig:70dfs} the execution times on the slowest nodes (Atom D510) visibly exceeds the corresponding values on other servers 
     575The last case we consider is a modification of the random strategy. We assume that tasks do not have deadlines and the only criterion taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure to run at the lowest possible CPU frequencies. The experiment was intended to check whether the benefit of running the workload at a less power-consuming CPU frequency is not cancelled out by the prolonged execution time of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77.109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload, which requires almost twice as much time, cannot be compensated by the lower power draw. Moreover, as can be observed in the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceed the corresponding values on the other servers.
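The underlying trade-off can be made explicit with the identity $E = P \cdot t$ (a back-of-the-envelope relation, not a model taken from the measurements): running at the lowest frequency saves energy only if
\[
P_{\mathrm{low}}\, t_{\mathrm{low}} < P_{\mathrm{high}}\, t_{\mathrm{high}}
\quad\Longleftrightarrow\quad
\frac{P_{\mathrm{low}}}{P_{\mathrm{high}}} < \frac{t_{\mathrm{high}}}{t_{\mathrm{low}}},
\]
so with the execution time almost doubled ($t_{\mathrm{low}} \approx 2\, t_{\mathrm{high}}$) the average power draw at the lowest frequency would have to fall below roughly half of the nominal-frequency value, which was not the case here.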
    576576         
    577577\begin{figure}[h!] 
     
    628628\end {table} 
    629629 
    630 One should easily note that gain from switching off unused nodes decreases with the increasing workload density. In general, for the highly loaded system such policy does not find an application due to the cost related to this process and relatively small benefits. 
    631  
    632 ... 
     630Referring to Table~\ref{loadEnergy}, one can easily note that the gain from switching off unused nodes decreases with increasing workload density. In general, for a highly loaded system such a policy is of little use due to the cost related to switching nodes off and on and the relatively small benefits.
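This behaviour can be explained with a simple back-of-the-envelope relation (an approximation, not a model fitted to the results in Table~\ref{loadEnergy}): the energy that node switching can recover is bounded by the idle energy,
\[
\Delta E \;\approx\; N\,(1-\rho)\,T\,P_{\mathrm{idle}} \;-\; C_{\mathrm{switch}},
\]
where $N$ is the number of nodes, $\rho$ the load intensity, $T$ the workload makespan, $P_{\mathrm{idle}}$ the idle power of a node and $C_{\mathrm{switch}}$ the overhead of powering nodes off and on; as $\rho$ grows the recoverable idle term shrinks while the switching overhead remains.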
     631 
     632\subsubsection{Summary} 
     633 
    633634 
    634635\section{DCWoRMS application/use cases}\label{sec:coolemall} 