Timestamp:
12/31/12 14:33:01 (12 years ago)
Author:
wojtekp
Message:
 
File:
1 edited

  • papers/SMPaT-2012_DCWoRMS/elsarticle-DCWoRMS.tex

    r731 r732  
    415415\section{Experiments and evaluation}\label{sec:experiments} 
    416416 
    417 TODO - correct, improve, refactor... 
    418  
    419417In this section, we present the computational analysis that was conducted to emphasize the role of modelling and simulation in studying the performance of computing systems. To this end, we evaluate the impact of energy-aware resource management policies on the overall energy efficiency of specific workloads running on heterogeneous resources. The following sections contain a description of the system used, the tested applications, and the results of the simulation experiments conducted for the evaluated strategies. 
    420418 
     
    449447\subsection{Evaluated applications} 
    450448 
    451 As mentioned, first we carried out a set of tests on the real hardware used as a CoolEmAll testbed to build the performance and energy profiles of applications. Then we applied this data into the simulation environment and used to investigate different approaches to energy-aware resource management. The following applications were taken into account: 
     449As mentioned, we first carried out a set of tests on the real hardware used as the CoolEmAll testbed to build the performance and energy profiles of the applications. The following applications were taken into account: 
    452450 
    453451\textbf{Abinit} is a widely used computational physics application that simulates systems made of electrons and nuclei within density functional theory. 
     
    460458 
    461459\textbf{FFTE} is a benchmark measuring the floating-point arithmetic rate of double-precision complex one-dimensional Discrete Fourier Transforms of 1-, 2-, and 3-dimensional sequences of length $2^{p} \cdot 3^{q} \cdot 5^{r}$. In our tests only one core was used to run the application. 
     460 
     461 
     462\subsection{Models} 
     463 
     464Based on the measured values, we evaluated three types of power models that can be applied, among others, in the simulation environment. 
     465 
     466\textbf{Static} 
     467This model refers to the static approach presented in Section~\ref{sec:power}. Based on the measured values, we created a resource power consumption model built on a static definition of resource power usage. With each node power state, understood as a possible operating state (p-state), we associated a power consumption value derived from the measurements averaged over the different types of applications. Therefore, the current power usage of the node can be expressed as $P = P_{idle} + P_{f}$, where $P$ denotes the power consumed by the node, $P_{idle}$ is the power usage of the node in the idle state, and $P_{f}$ stands for the power usage of the CPU operating at the given frequency level. 
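In a simulation, this static model reduces to a simple lookup keyed by the selected frequency level. The following Python sketch is only illustrative (it is not DCWoRMS code, and the numeric values are placeholders rather than our measurements):

\begin{verbatim}
# Illustrative sketch of the static model: the node power depends only on
# the selected operating state (frequency), not on the running application.
# All numeric values below are placeholders, not measured data.
P_IDLE = 20.0                      # assumed idle power of the node [W]
P_FREQ = {1600: 6.0, 2400: 10.0}   # assumed extra CPU power per frequency [W]

def static_power(frequency_mhz):
    # P = P_idle + P_f for the currently selected frequency level
    return P_IDLE + P_FREQ[frequency_mhz]
\end{verbatim}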
     468 
     469\textbf{Dynamic} 
     470This model is a combination of the Resource load and Application specific approaches presented in Section~\ref{sec:power}. Based on the measured values and referring to the existing models presented in the literature, we assumed the following equation: $P = P_{idle} + load \cdot P_{cpubase} \cdot c^{(f-f_{base})/100} + P_{app}$, where $P$ denotes the power consumed by the node executing the given application, $P_{idle}$ is the power usage of the node in the idle state, $load$ is the current utilization level of the node, $P_{cpubase}$ stands for the power usage of the fully loaded CPU operating at the lowest frequency, $c$ is a constant factor indicating the increase of power consumption with respect to the frequency increase, $f$ is the current frequency, $f_{base}$ is the lowest available frequency of the given CPU, and $P_{app}$ denotes the additional power usage derived from executing a particular application. 
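As an illustration, the dynamic model can be evaluated as in the sketch below. The code is not part of DCWoRMS; $P_{idle}$, $c$ and the frequencies are assumed placeholder values, while $P_{cpubase}$ and $P_{app}$ would be taken from Table~\ref{nodeBasePowerUsage} and Table~\ref{appPowerUsage}:

\begin{verbatim}
# Illustrative sketch of the dynamic model:
#   P = P_idle + load * P_cpubase * c**((f - f_base) / 100) + P_app
def dynamic_power(p_idle, load, p_cpubase, c, f, f_base, p_app):
    return p_idle + load * p_cpubase * c ** ((f - f_base) / 100.0) + p_app

# Example: a fully loaded Intel I7 node (P_cpubase = 8) running the FFT
# benchmark (P_app = 3.5), assuming P_idle = 20 W, c = 1.2 and
# frequencies expressed in MHz.
print(dynamic_power(p_idle=20.0, load=1.0, p_cpubase=8.0,
                    c=1.2, f=2400, f_base=1600, p_app=3.5))
\end{verbatim}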
     471 
     472 
     473Table~\ref{nodeBasePowerUsage} and Table~\ref{appPowerUsage} contain the values of $P_{cpubase}$ and $P_{app}$, respectively, obtained for the particular applications and resource architectures. A missing value means that the application did not run on the given type of node. 
     474\begin {table}[h!] 
     475\centering 
     476\begin{tabular}{lccc} 
     477\hline 
     478Intel I7 & AMD Fusion  & Atom D510  \\ 
     479\hline 
     480 8 & 2 & 1 \\ 
     481\hline 
     482\end{tabular} 
     483\caption {\label{nodeBasePowerUsage} $P_{cpubase}$ values} 
     484\end {table} 
     485 
     486 
     487\begin {table}[h!] 
     488\centering 
     489\begin{tabular}{l|ccc} 
     490\hline 
     491 & \multicolumn{3}{c} {Node type}\\  
     492Application & Intel I7 & AMD Fusion  & Atom D510  \\ 
     493\hline 
     494Abinit & 3.3 &  - &  - \\ 
     495Linpack tiny & 2.5 & - & 0.2 \\ 
     496Linpack 3GB &  6 &  -  & -  \\ 
     497C-Ray & 4 & 1 & 0.05 \\ 
     498FFTE & 3.5 & 2 & 0.1 \\ 
     499Tar & 3 & 2.5 & 0.5 \\ 
     500 
     501\hline 
     502\end{tabular} 
     503\caption {\label{appPowerUsage} $P_{app}$ values} 
     504\end {table} 
     505 
     506 
     507\textbf{Mapping} 
     508In this model we apply the measured values directly as the power model. Obviously, this model is affected only by the inaccuracy of the measurements. 
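In other words, the mapping model is simply a lookup of the measured values, as in the following illustrative sketch (not DCWoRMS code; the keys and numbers are placeholders, not the actual measurements):

\begin{verbatim}
# Illustrative sketch of the mapping model: the simulator returns the power
# measured for a given (application, node type, frequency) combination.
# The entries below are placeholders, not the actual measurements.
MEASURED_POWER = {
    ("fft",   "intel_i7",  2400): 57.0,
    ("c-ray", "atom_d510", 1600): 23.0,
}

def mapping_power(app, node_type, frequency_mhz):
    return MEASURED_POWER[(app, node_type, frequency_mhz)]
\end{verbatim}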
     509         
     510The following table (Table~\ref{expPowerModels}) contains the relative errors of the models with respect to the measured values. 
     511\begin {table}[h!] 
     512\centering 
     513\begin{tabular}{llr} 
     514\hline 
     515Static & Dynamic  & Mapping \\ 
     516\hline 
     51713.74 & 10.85 & 0 \\ 
     518\hline 
     519\end{tabular} 
     520\caption{\label{expPowerModels} Relative errors of the power models} 
     521\end {table} 
     522 
     523For the experiments we decided to use the last of these models (Mapping). Thus, we introduced into the simulation environment the exact values obtained within our testbed, to build both the power profiles of the applications and the application performance models denoting their execution times. 
    462524 
    463525 
     
    507569Then we discuss the corresponding results obtained for workloads with other density levels. 
    508570 
    509 \subsubsection{Random approach} 
    510  
    511 The first considered by us policy was the Random strategy in which tasks were assigned to nodes in random manner with the reservation that they can be assigned only to nodes of the type which the application was possible to execute on and we have the corresponding value of power consumption and execution time. The Random strategy is only the reference one and will be later used to compare benefits in terms of energy efficiency resulting from more sophisticated algorithms. Criteria values are as follows: \textbf{total energy usage}: 46,883 kWh, \textbf{workload completion time}: 533 820 s. 
     571\subsubsection{Random approach}  
     572 
     573The first policy we considered was the Random (R) strategy, in which tasks were assigned to nodes in a random manner, with the restriction that they could be assigned only to nodes of a type on which the application was able to execute and for which we had the corresponding power consumption and execution time values. The Random strategy serves only as a reference and will later be used to quantify the benefits, in terms of energy efficiency, resulting from more sophisticated algorithms. The criteria values are as follows: \textbf{total energy usage}: 46,883 kWh, \textbf{workload completion time}: 533 820 s. 
    512574Figure~\ref{fig:70r} presents the energy consumption, load of the system and obtained schedule, respectively. 
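A minimal sketch of this assignment rule is shown below (illustrative Python only, not the DCWoRMS scheduling plugin interface; \texttt{profiles} is an assumed structure listing, for each node type, the applications with known power and execution-time values):

\begin{verbatim}
# Illustrative sketch of the Random (R) policy: a task may only be placed on
# a node whose type has measured power and execution-time data for the
# task's application. Not the actual DCWoRMS plugin code.
import random

def random_assignment(task, nodes, profiles):
    compatible = [n for n in nodes
                  if task.application in profiles[n.node_type]]
    return random.choice(compatible) if compatible else None
\end{verbatim}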
    513575 
     
    529591In this version of the experiment we neglected the additional cost and time necessary to change the power state of resources. As can be observed in the power consumption chart in Figure~\ref{fig:70rnpm}, switching off unused nodes led to a decrease in the total energy consumption. As expected, with respect to the makespan criterion, both approaches perform equally, reaching a \textbf{workload completion time} of 533 820 s. However, the pure random strategy was significantly outperformed in terms of energy usage by the policy with additional node power management, with its \textbf{total energy usage} of 36,705 kWh. The overall energy savings reached 22\%.  
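The reported savings follow directly from the two totals:
\[
\frac{46{,}883 - 36{,}705}{46{,}883} \approx 22\%.
\]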
    530592 
    531 \subsubsection{Energy optimization} 
    532  
    533 The next two evaluate resource management strategies try to decrease the total energy consumption needed to execute the whole workload taking into account differences in applications and hardware profiles. We tried to match both profiles to find the more energy efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes, thus for the whole time needed to execute workload nodes consume at least power for idle state. To obtain the minimal energy consumption, tasks has to be assigned to the nodes of type for which the difference between energy consumption for the node running the application and in the idle state is minimal. The power usage measured in idle state for three types of nodes is gathered in the Table~\ref{idlePower}. 
     593\subsubsection{Energy optimization}  
     594 
     595The next two evaluated resource management strategies, denoted Energy Optimization (EO), try to decrease the total energy consumption needed to execute the whole workload, taking into account differences between application and hardware profiles. We tried to match both profiles to find a more energy-efficient assignment. In the first case we assumed that there is again no possibility to switch off unused nodes; thus, for the whole time needed to execute the workload, nodes consume at least the idle-state power. To obtain the minimal energy consumption, tasks have to be assigned to the node type for which the difference between the energy consumed by the node running the application and in the idle state is minimal. The power usage measured in the idle state for the three types of nodes is gathered in Table~\ref{idlePower}. 
    534596 
    535597\begin {table}[h!] 
     
    569631The estimated \textbf{total energy usage} of the system is 30,568 kWh. As we can see, this approach significantly improved the value of this criterion compared to the previous policies. Moreover, the proposed allocation strategy does not worsen the \textbf{workload completion time} criterion, for which the resulting value equals 533 820 s. 
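A sketch of this selection rule is given below (illustrative Python only, not the DCWoRMS plugin code; \texttt{running\_power}, \texttt{idle\_power} and \texttt{exec\_time} are assumed lookup tables holding the measured power of the application on each node type, the idle power from Table~\ref{idlePower}, and the measured execution times):

\begin{verbatim}
# Illustrative sketch of the Energy Optimization (EO) rule: choose the node
# type that adds the least energy on top of the idle power that the node
# would consume anyway. Not the actual DCWoRMS plugin code.
def eo_choose_node_type(app, node_types, running_power, idle_power, exec_time):
    def extra_energy(nt):
        # energy above idle consumed while the application runs on type nt
        return (running_power[app][nt] - idle_power[nt]) * exec_time[app][nt]
    compatible = [nt for nt in node_types if nt in running_power.get(app, {})]
    return min(compatible, key=extra_energy) if compatible else None
\end{verbatim}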
    570632 
    571 \subsubsection{Frequency scaling} 
    572  
    573 The last considered by us case is modification of the random strategy. We assume that tasks do not have deadlines and the only criterion which is taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure for the lowest possible frequencies of CPUs. The experiment was intended to check if the benefit of running the workload on less power-consuming frequency of CPU is not leveled by the prolonged time of execution of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77,109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload that requires almost twice more time, can not be compensate by the lower power draw. Moreover, as it can be observed on the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceeds the corresponding values on other servers. 
     633\subsubsection{Frequency downgrading}  
     634 
     635The last case we considered is a modification of the random strategy. We assume that tasks do not have deadlines and that the only criterion taken into consideration is the total energy consumption. In this experiment we configured the simulated infrastructure to run at the lowest possible CPU frequencies (LF). The experiment was intended to check whether the benefit of running the workload at a less power-consuming CPU frequency is not cancelled out by the prolonged execution time of the workload. The values of the evaluated criteria are as follows: \textbf{workload completion time}: 1 065 356 s and \textbf{total energy usage}: 77,109 kWh. As we can see, for the given load of the system (70\%), the cost of running the workload, which requires almost twice as much time, cannot be compensated by the lower power draw. Moreover, as can be observed in the charts in Figure~\ref{fig:70dfs}, the execution times on the slowest nodes (Atom D510) visibly exceed the corresponding values on the other servers. 
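A back-of-the-envelope check based on the reported criteria values illustrates this conclusion. The average power draw indeed drops,
\[
\bar{P}_{R} = \frac{46{,}883~\mathrm{kWh}}{533\,820~\mathrm{s}} \approx 316~\mathrm{W}, \qquad
\bar{P}_{R+LF} = \frac{77{,}109~\mathrm{kWh}}{1\,065\,356~\mathrm{s}} \approx 261~\mathrm{W},
\]
i.e., by about 17\%, but the completion time grows by a factor of almost 2, so the total energy increases by roughly 64\%.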
    574636         
    575637\begin{figure}[h!] 
     
    580642 
    581643 
    582 As we were looking for the trade-off between total completion time and energy usage, we were searching for the workload load level that can benefit from the lower system performance in terms of energy-efficiency. For the frequency downgrading policy, we noticed the improvement on the energy usage criterion only for the workload resulting in 10\% system load. For this threshold we observed that slowdown in task execution does not affect the subsequent tasks in the system and thus total completion time of the whole workload. 
     644As we were looking for a trade-off between the total completion time and the energy usage, we searched for the workload level that could benefit, in terms of energy efficiency, from the lower system performance. For the frequency downgrading policy, we noticed an improvement on the energy usage criterion only for the workload resulting in a 10\% system load. At this threshold we observed that the slowdown in task execution does not affect the subsequent tasks in the system and thus the total completion time of the whole workload. 
    583645         
    584  
    585  
    586 Figure~\ref{fig:dfsComp} shows schedules obtained for Random and DFS strategy.  
     646Figure~\ref{fig:dfsComp} shows the schedules obtained for the Random and the Random + lowest frequency strategies.  
    587647 
    588648 
     
    590650\centering 
    591651\includegraphics[width = 12cm]{fig/dfsComp.png} 
    592 \caption{\label{fig:dfsComp} Schedules obtained for Random strategy (left) and DFS strategy (right) for 10\% of system load} 
    593 \end{figure} 
    594  
    595  
     652\caption{\label{fig:dfsComp} Schedules obtained for Random strategy (left) and Random + lowest frequency strategy (right) for 10\% of system load} 
     653\end{figure} 
     654 
     655\subsection{Discussion} 
    596656The following tables, Table~\ref{loadEnergy} and Table~\ref{loadMakespan}, contain the values of the evaluation criteria (total energy usage and makespan, respectively) gathered for all investigated workloads. 
    597657 
     
    601661\hline 
    602662&  \multicolumn{5}{c}{Strategy}\\ 
    603 Load  & R & R+NPM & EO & EO+NPM & DFS\\ 
     663Load  & R & R+NPM & EO & EO+NPM & R+LF\\ 
    604664\hline 
    60566510\% & 241,337 &        37,811 & 239,667 & 25,571 & 239,278 \\ 
     
    609669\hline 
    610670\end{tabular} 
    611 \caption {\label{loadEnergy} Energy usage [kWh] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, DFS - Dynamic Frequency Scaling} 
     671\caption {\label{loadEnergy} Energy usage [kWh] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, R+LF - Random + lowest frequency} 
    612672\end {table} 
    613673 
     
    617677\hline 
    618678&  \multicolumn{5}{c}{Strategy}\\ 
    619 Load  & R & R+NPM & EO & EO+NPM & DFS\\ 
     679Load  & R & R+NPM & EO & EO+NPM & R+LF\\ 
    620680\hline 
    62168110\% & 3 605 428 & 3 605 428 & 3 605 428 & 3 605 428 & 3 622 968 \\ 
     
    625685\hline 
    626686\end{tabular} 
    627 \caption {\label{loadMakespan} Makespan [s] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, DFS - Dynamic Frequency Scaling} 
     687\caption {\label{loadMakespan} Makespan [s] for different level of system load. R - Random, R+NPM - Random + node power management, EO - Energy optimization, EO+NPM - Energy optimization + node power management, R+LF - Random + lowest frequency} 
    628688\end {table} 
    629689 
    630 Referring to the Table~\ref{loadEnergy}, one should easily note that gain from switching off unused nodes decreases with the increasing workload density. In general, for the highly loaded system such policy does not find an application due to the cost related to this process and relatively small benefits. 
     690Referring to Table~\ref{loadEnergy}, one can easily note that the gain from switching off unused nodes decreases with increasing workload density. In general, for a highly loaded system such a policy is not applicable due to the cost related to this process and the relatively small benefits. Another interesting conclusion refers to the poor result of the Random strategy combined with the frequency downgrading approach. The lack of improvement on the energy usage criterion for higher system loads can be explained by the small or nonexistent benefit obtained from prolonging the task execution and thus keeping the node in a working state. The cost of the longer workload completion cannot be compensated by the very small energy savings derived from the lower operating state of the nodes. 
    631691 
    632692 