Timestamp:
12/31/12 14:33:01 (12 years ago)
Author:
wojtekp
Message:
 
File:
1 edited

  • papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex

    r731 r732  
    415415\section{Experiments and evaluation}\label{sec:experiments} 
    416416 
    417 TODO - correct, improve, refactor... 
    418  
    419417In this section, we present the computational analysis that was conducted to emphasize the role of modelling and simulation in studying the performance of computing systems. To this end, we evaluate the impact of energy-aware resource management policies on the overall energy efficiency of specific workloads running on heterogeneous resources. The following sections contain a description of the system used, the tested applications, and the results of the simulation experiments conducted for the evaluated strategies. 
    420418 
     
    449447\subsection{Evaluated applications} 
    450448 
    451 As mentioned, first we carried out a set of tests on the real hardware used as a CoolEmAll testbed to build the performance and energy profiles of applications. Then we applied this data into the simulation environment and used to investigate different approaches to energy-aware resource management. The following applications were taken into account: 
     449As mentioned, we first carried out a set of tests on the real hardware used as the CoolEmAll testbed to build the performance and energy profiles of the applications. The following applications were taken into account: 
    452450 
    453451\textbf{Abinit} is a widely used computational physics application that simulates systems made of electrons and nuclei within density functional theory. 
     
    460458 
    461459\textbf{FFTE} is a benchmark measuring the floating-point arithmetic rate of double-precision complex one-dimensional Discrete Fourier Transforms of 1-, 2-, and 3-dimensional sequences of length $2^{p} \cdot 3^{q} \cdot 5^{r}$. In our tests only one core was used to run the application. 
     460 
     461 
     462\subsection{Models} 
     463 
     464Based on the measured values, we evaluated three types of power models that can be applied, among others, in the simulation environment. 
     465 
     466\textbf{Static} 
     467This model refers to the static approach presented in Section~\ref{sec:power}. Based on the measured values, we created a resource power consumption model built on a static definition of resource power usage. With each node power state, understood as a possible operating state (p-state), we associated a power consumption value derived from the measurements averaged over the different types of applications. Therefore, the current power usage of the node can be expressed as $P = P_{idle} + P_{f}$, where $P$ denotes the power consumed by the node, $P_{idle}$ is the power usage of the node in the idle state, and $P_{f}$ stands for the power usage of the CPU operating at the given frequency level. 
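In a simulation, this static model reduces to a simple lookup keyed by the selected frequency level. The following Python sketch is only illustrative (it is not DCWoRMS code, and the numeric values are placeholders rather than our measurements):

\begin{verbatim}
# Illustrative sketch of the static model: the node power depends only on
# the selected operating state (frequency), not on the running application.
# All numeric values below are placeholders, not measured data.
P_IDLE = 20.0                      # assumed idle power of the node [W]
P_FREQ = {1600: 6.0, 2400: 10.0}   # assumed extra CPU power per frequency [W]

def static_power(frequency_mhz):
    # P = P_idle + P_f for the currently selected frequency level
    return P_IDLE + P_FREQ[frequency_mhz]
\end{verbatim}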
     468 
     469\textbf{Dynamic} 
     470This model is a combination of the Resource load and Application specific approaches presented in Section~\ref{sec:power}. Based on the measured values and referring to the existing models presented in the literature, we assumed the following equation: $P = P_{idle} + load \cdot P_{cpubase} \cdot c^{(f-f_{base})/100} + P_{app}$, where $P$ denotes the power consumed by the node executing the given application, $P_{idle}$ is the power usage of the node in the idle state, $load$ is the current utilization level of the node, $P_{cpubase}$ stands for the power usage of the fully loaded CPU operating at the lowest frequency, $c$ is a constant factor indicating the increase of power consumption with respect to the frequency increase, $f$ is the current frequency, $f_{base}$ is the lowest available frequency of the given CPU, and $P_{app}$ denotes the additional power usage derived from executing a particular application. 
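As an illustration, the dynamic model can be evaluated as in the sketch below. The code is not part of DCWoRMS; $P_{idle}$, $c$ and the frequencies are assumed placeholder values, while $P_{cpubase}$ and $P_{app}$ would be taken from Table~\ref{nodeBasePowerUsage} and Table~\ref{appPowerUsage}:

\begin{verbatim}
# Illustrative sketch of the dynamic model:
#   P = P_idle + load * P_cpubase * c**((f - f_base) / 100) + P_app
def dynamic_power(p_idle, load, p_cpubase, c, f, f_base, p_app):
    return p_idle + load * p_cpubase * c ** ((f - f_base) / 100.0) + p_app

# Example: a fully loaded Intel I7 node (P_cpubase = 8) running the FFT
# benchmark (P_app = 3.5), assuming P_idle = 20 W, c = 1.2 and
# frequencies expressed in MHz.
print(dynamic_power(p_idle=20.0, load=1.0, p_cpubase=8.0,
                    c=1.2, f=2400, f_base=1600, p_app=3.5))
\end{verbatim}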
     471 
     472 
     473Table~\ref{nodeBasePowerUsage} and Table~\ref{appPowerUsage} contain the values of $P_{cpubase}$ and $P_{app}$, respectively, obtained for the particular applications and resource architectures. A missing value means that the application did not run on the given type of node. 
     474\begin {table}[h!] 
     475\centering 
     476\begin{tabular}{lccc} 
     477\hline 
     478Intel I7 & AMD Fusion  & Atom D510  \\ 
     479\hline 
     480 8 & 2 & 1 \\ 
     481\hline 
     482\end{tabular} 
     483\caption {\label{nodeBasePowerUsage} $P_{cpubase}$ values} 
     484\end {table} 
     485 
     486 
     487\begin {table}[h!] 
     488\centering 
     489\begin{tabular}{l|ccc} 
     490\hline 
     491 & \multicolumn{3}{c} {Node type}\\  
     492Application & Intel I7 & AMD Fusion  & Atom D510  \\ 
     493\hline 
     494Abinit & 3.3 &  - &  - \\ 
     495Linpack tiny & 2.5 & - & 0.2 \\ 
     496Linpack 3GB &  6 &  -  & -  \\ 
     497C-Ray & 4 & 1 & 0.05 \\ 
     498FFTE & 3.5 & 2 & 0.1 \\ 
     499Tar & 3 & 2.5 & 0.5 \\ 
     500 
     501\hline 
     502\end{tabular} 
     503\caption {\label{appPowerUsage} $P_{app}$ values} 
     504\end {table} 
     505 
     506 
     507\textbf{Mapping} 
     508In this model we apply the measured values directly as the power model. Obviously, this model is affected only by the inaccuracy of the measurements. 
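In other words, the mapping model is simply a lookup of the measured values, as in the following illustrative sketch (not DCWoRMS code; the keys and numbers are placeholders, not the actual measurements):

\begin{verbatim}
# Illustrative sketch of the mapping model: the simulator returns the power
# measured for a given (application, node type, frequency) combination.
# The entries below are placeholders, not the actual measurements.
MEASURED_POWER = {
    ("fft",   "intel_i7",  2400): 57.0,
    ("c-ray", "atom_d510", 1600): 23.0,
}

def mapping_power(app, node_type, frequency_mhz):
    return MEASURED_POWER[(app, node_type, frequency_mhz)]
\end{verbatim}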
     509         
     510The following table (Table~\ref{expPowerModels}) contains the relative errors of the models with respect to the measured values. 
     511\begin {table}[h!] 
     512\centering 
     513\begin{tabular}{llr} 
     514\hline 
     515Static & Dynamic  & Mapping \\ 
     516\hline 
     51713.74 & 10.85 & 0 \\ 
     518\hline 
     519\end{tabular} 
     520\caption{\label{expPowerModels} Relative errors of the power models} 
     521\end {table} 
     522 
     523For the experiments we decided to use the last of these models (Mapping). Thus, we introduced into the simulation environment the exact values obtained within our testbed, to build both the power profiles of the applications and the application performance models denoting their execution times. 
    462524 
    463525 
     
    507569Then we discuss the corresponding results obtained for workloads with other density levels. 
    508570 
    509 \subsubsection{Random approach} 
    510  
    511 The first considered by us policy was the Random strategy in which tasks were assigned to nodes in random manner with the reservation that they can be assigned only to nodes of the type which the application was possible to execute on and we have the corresponding value of power consumption and execution time. The Random strategy is only the reference one and will be later used to compare benefits in terms of energy efficiency resulting from more sophisticated algorithms. Criteria values are as follows: \textbf{total energy usage}: 46,883 kWh, \textbf{workload completion time}: 533 820 s. 
     571\subsubsection{Random approach}  
     572 
     573The first policy we considered was the Random (R) strategy, in which tasks were assigned to nodes in a random manner, with the restriction that they could be assigned only to nodes of a type on which the application was able to execute and for which we had the corresponding power consumption and execution time values. The Random strategy serves only as a reference and will later be used to quantify the benefits, in terms of energy efficiency, resulting from more sophisticated algorithms. The criteria values are as follows: \textbf{total energy usage}: 46,883 kWh, \textbf{workload completion time}: 533 820 s. 
    512574Figure~\ref{fig:70r} presents the energy consumption, load of the system and obtained schedule, respectively. 
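A minimal sketch of this assignment rule is shown below (illustrative Python only, not the DCWoRMS scheduling plugin interface; \texttt{profiles} is an assumed structure listing, for each node type, the applications with known power and execution-time values):

\begin{verbatim}
# Illustrative sketch of the Random (R) policy: a task may only be placed on
# a node whose type has measured power and execution-time data for the
# task's application. Not the actual DCWoRMS plugin code.
import random

def random_assignment(task, nodes, profiles):
    compatible = [n for n in nodes
                  if task.application in profiles[n.node_type]]
    return random.choice(compatible) if compatible else None
\end{verbatim}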
    513575 
     
    529591In this version of the experiment we neglected the additional cost and time necessary to change the power state of resources. As can be observed in the power consumption chart in Figure~\ref{fig:70rnpm}, switching off unused nodes led to a decrease in the total energy consumption. As expected, with respect to the makespan criterion, both approaches perform equally, reaching a \textbf{workload completion time} of 533 820 s. However, the pure random strategy was significantly outperformed in terms of energy usage by the policy with additional node power management, with its \textbf{total energy usage} of 36,705 kWh. The overall energy savings reached 22\%.  
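The reported savings follow directly from the two totals:
\[
\frac{46{,}883 - 36{,}705}{46{,}883} \approx 22\%.
\]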
    530592 
    531 \subsubsection{Energy optimization} 
    532  
    533 The next two evaluate resource management strategies try to decrease the total energy consumption needed to execute the whole workload taking into account differences in applications and hardware profiles. We tried to match both profiles to find the more energy efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes, thus for the whole time needed to execute workload nodes consume at least power for idle state. To obtain the minimal energy consumption, tasks has to be assigned to the nodes of type for which the difference between energy consumption for the node running the application and in the idle state is minimal. The power usage measured in idle state for three types of nodes is gathered in the Table~\ref{idlePower}. 
     593\subsubsection{Energy optimization}  
     594 
     595The next two evaluated resource management strategies, denoted Energy Optimization (EO), try to decrease the total energy consumption needed to execute the whole workload, taking into account differences between application and hardware profiles. We tried to match both profiles to find a more energy-efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes; thus, for the whole time needed to execute the workload, nodes consume at least the idle-state power. To obtain the minimal energy consumption, tasks have to be assigned to the node type for which the difference between the energy consumed by the node running the application and in the idle state is minimal. The power usage measured in the idle state for the three types of nodes is gathered in Table~\ref{idlePower}. 
    534596 
    535597\begin {table}[h!] 
     
    569631The estimated \textbf{total energy usage} of the system is 30,568 kWh. As we can see, this approach significantly improved the value of this criterion compared to the previous policies. Moreover, the proposed allocation strategy does not worsen the \textbf{workload completion time} criterion, for which the resulting value equals 533 820 s. 
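A sketch of this selection rule is given below (illustrative Python only, not the DCWoRMS plugin code; \texttt{running\_power}, \texttt{idle\_power} and \texttt{exec\_time} are assumed lookup tables holding the measured power of the application on each node type, the idle power from Table~\ref{idlePower}, and the measured execution times):

\begin{verbatim}
# Illustrative sketch of the Energy Optimization (EO) rule: choose the node
# type that adds the least energy on top of the idle power that the node
# would consume anyway. Not the actual DCWoRMS plugin code.
def eo_choose_node_type(app, node_types, running_power, idle_power, exec_time):
    def extra_energy(nt):
        # energy above idle consumed while the application runs on type nt
        return (running_power[app][nt] - idle_power[nt]) * exec_time[app][nt]
    compatible = [nt for nt in node_types if nt in running_power.get(app, {})]
    return min(compatible, key=extra_energy) if compatible else None
\end{verbatim}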
    570632 
    571 \subsubsection{Frequency scaling} 
    572  
    573 The last considered by us case is modification of the random strategy. We assume that tasks do not have deadlines and the only criterion which is taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure for the lowest possible frequencies of CPUs. The experiment was intended to check if the benefit of running the workload on less power-consuming frequency of CPU is not leveled by the prolonged time of execution of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77,109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload that requires almost twice more time, can not be compensate by the lower power draw. Moreover, as it can be observed on the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceeds the corresponding values on other servers. 
     633\subsubsection{Frequency downgrading}  
     634 
     635The last case we considered is a modification of the random strategy. We assume that tasks do not have deadlines and that the only criterion taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure to run at the lowest possible CPU frequencies (LF). The experiment was intended to check whether the benefit of running the workload at a less power-consuming CPU frequency is not cancelled out by the prolonged execution time of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77,109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload, which requires almost twice as much time, cannot be compensated by the lower power draw. Moreover, as can be observed in the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceed the corresponding values on the other servers. 
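A back-of-the-envelope check based on the reported criteria values illustrates this conclusion. The average power draw indeed drops,
\[
\bar{P}_{R} = \frac{46{,}883~\mathrm{kWh}}{533\,820~\mathrm{s}} \approx 316~\mathrm{W}, \qquad
\bar{P}_{R+LF} = \frac{77{,}109~\mathrm{kWh}}{1\,065\,356~\mathrm{s}} \approx 261~\mathrm{W},
\]
i.e., by about 17\%, but the completion time grows by a factor of almost 2, so the total energy increases by roughly 64\%.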
    574636         
    575637\begin{figure}[h!] 
     
    580642 
    581643 
    582 As we were looking for the trade-off between total completion time and energy usage, we were searching for the workload load level that can benefit from the lower system performance in terms of energy-efficiency. For the frequency downgrading policy, we noticed the improvement on the energy usage criterion only for the workload resulting in 10\% system load. For this threshold we observed that slowdown in task execution does not affect the subsequent tasks in the system and thus total completion time of the whole workload. 
     644As we were looking for a trade-off between the total completion time and the energy usage, we searched for the workload level that could benefit, in terms of energy efficiency, from the lower system performance. For the frequency downgrading policy, we noticed an improvement on the energy usage criterion only for the workload resulting in a 10\% system load. At this threshold we observed that the slowdown in task execution does not affect the subsequent tasks in the system and thus the total completion time of the whole workload. 
    583645         
    584  
    585  
    586 Figure~\ref{fig:dfsComp} shows schedules obtained for Random and DFS strategy.  
     646Figure~\ref{fig:dfsComp} shows the schedules obtained for the Random and the Random + lowest frequency strategies.  
    587647 
    588648 
     
    590650\centering 
    591651\includegraphics[width = 12cm]{fig/dfsComp.png} 
    592 \caption{\label{fig:dfsComp} Schedules obtained for Random strategy (left) and DFS strategy (right) for 10\% of system load} 
    593 \end{figure} 
    594  
    595  
     652\caption{\label{fig:dfsComp} Schedules obtained for Random strategy (left) and Random + lowest frequency strategy (right) for 10\% of system load} 
     653\end{figure} 
     654 
     655\subsection{Discussion} 
    596656The following tables, Table~\ref{loadEnergy} and Table~\ref{loadMakespan}, contain the values of the evaluation criteria (total energy usage and makespan, respectively) gathered for all investigated workloads. 
    597657 
     
    601661\hline 
    602662&  \multicolumn{5}{c}{Strategy}\\ 
    603 Load  & R & R+NPM & EO & EO+NPM & DFS\\ 
     663Load  & R & R+NPM & EO & EO+NPM & R+LF\\ 
    604664\hline 
    60566510\% & 241,337 &        37,811 & 239,667 & 25,571 & 239,278 \\ 
     
    609669\hline 
    610670\end{tabular} 
    611 \caption {\label{loadEnergy} Energy usage [kWh] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, DFS - Dynamic Frequency Scaling} 
     671\caption {\label{loadEnergy} Energy usage [kWh] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, R+LF - Random + lowest frequency} 
    612672\end {table} 
    613673 
     
    617677\hline 
    618678&  \multicolumn{5}{c}{Strategy}\\ 
    619 Load  & R & R+NPM & EO & EO+NPM & DFS\\ 
     679Load  & R & R+NPM & EO & EO+NPM & R+LF\\ 
    620680\hline 
    62168110\% & 3 605 428 & 3 605 428 & 3 605 428 & 3 605 428 & 3 622 968 \\ 
     
    625685\hline 
    626686\end{tabular} 
    627 \caption {\label{loadMakespan} Makespan [s] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, DFS - Dynamic Frequency Scaling} 
     687\caption {\label{loadMakespan} Makespan [s] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, R+LF - Random + lowest frequency} 
    628688\end {table} 
    629689 
    630 Referring to the Table~\ref{loadEnergy}, one should easily note that gain from switching off unused nodes decreases with the increasing workload density. In general, for the highly loaded system such policy does not find an application due to the cost related to this process and relatively small benefits. 
     690Referring to Table~\ref{loadEnergy}, one can easily note that the gain from switching off unused nodes decreases with increasing workload density. In general, for a highly loaded system such a policy is not applicable due to the cost related to this process and the relatively small benefits. Another interesting conclusion refers to the poor result of the Random strategy combined with the frequency downgrading approach. The lack of improvement on the energy usage criterion for higher system loads can be explained by the small or nonexistent benefit obtained from prolonging the task execution and thus keeping the node in a working state. The cost of the longer workload completion cannot be compensated by the very small energy savings derived from the lower operating state of the nodes. 
    631691 
    632692 