Changes between Version 4 and Version 5 of benchmarks

Timestamp:
11/24/11 15:34:57 (10 years ago)
Author:
bartek
Comment:

--

  • benchmarks

    v4 v5  
    2020 * queueing system: Torque 2.4.12 + Maui 3.3, 
    2121 * about 800 nodes, 
    22  * about 3-4k tasks present in the system, 
     22 * about 3-4k jobs present in the system, 
    2323 * Maui „RMPOLLINTERVAL”: 3.5 minutes, 
    2424 * for the purpose of the tests, a special partition (WP4) was set aside: 64 cores / 8 nodes - 64 slots, 
     
    7878''Pros:'' 
    7979* The test reflects the natural situation in production environments: 
    80  * approximately constant number of tasks, 
    81  * "the task flow" (when one task is finished, another begins). 
     80 * approximately constant number of jobs, 
     81 * "the jobs flow" (when one job is finished, another begins). 
    8282* The program may be used to measure the overall capacity of the system. 
    8383 
     
    8686 
    8787==== Plan of the tests ==== 
    88 * 50 tasks x 10 users = 500 tasks, 30 minutes, SLEEP_COEF = 10 
    89 * 100 tasks x 10 users = 1000 tasks, 30 minutes, SLEEP_COEF = 10 
    90 * 200 tasks x 10 users = 2000 tasks, 30 minutes, SLEEP_COEF = 10 
    91 * 400 tasks x 10 users = 4000 tasks, 30 minutes, SLEEP_COEF = 10 
     88* 50 jobs x 10 users = 500 jobs, 30 minutes, SLEEP_COEF = 10 
     89* 100 jobs x 10 users = 1000 jobs, 30 minutes, SLEEP_COEF = 10 
     90* 200 jobs x 10 users = 2000 jobs, 30 minutes, SLEEP_COEF = 10 
     91* 400 jobs x 10 users = 4000 jobs, 30 minutes, SLEEP_COEF = 10 
    9292 
    9393==== Results ==== 
     
    114114== Test 2 - Throughput == 
    115115The test is based on the methodology described in the paper [[http://dl.acm.org/citation.cfm?id=1533455|Benchmarking of Integrated OGSA-BES with the Grid Middleware]] and measures, from the user's perspective, the finish time of the last of N jobs submitted at (almost) the same moment. In addition to what the paper describes, the presented test also used the following elements:  
    116 * submitting the tasks by N processes/users, 
     116* submitting the jobs by N processes/users, 
    117117* using consistent SDK, not the command-line clients, 
    118118* single test environment. 
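
The measurement described above (the finish time of the last of N jobs, seen from the user side) can be sketched roughly as follows. This is a minimal illustration, not the benchmark program itself: the `JobRecord` type, the function names and the sample timestamps are all hypothetical, and times are assumed to be seconds since the start of the test.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job record; times are seconds since the test started."""
    submit_time: float
    finish_time: float


def makespan(jobs):
    """Time from the first submission to the finish of the last job."""
    first_submit = min(j.submit_time for j in jobs)
    last_finish = max(j.finish_time for j in jobs)
    return last_finish - first_submit


def throughput_per_minute(jobs):
    """Completed jobs per minute, averaged over the whole makespan."""
    return len(jobs) / (makespan(jobs) / 60.0)


# Illustrative numbers only, not measured results.
sample = [JobRecord(0.0, 120.0), JobRecord(0.5, 240.0), JobRecord(1.0, 300.0)]
print(makespan(sample))               # 300.0
print(throughput_per_minute(sample))  # 0.6
```

Taking the earliest submission as the reference point keeps the figure meaningful when the N jobs are submitted by several processes that do not start at exactly the same instant.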
     
    129129 
    130130==== Results ==== 
    131 * 1 user, 1 thread, 500 tasks: 
     131* 1 user, 1 thread, 500 jobs: 
    132132[[Image(zeus-throughput-500x1-1.png, center, width=640px)]] 
    133 * 1 user, 10 threads, 500 tasks (50x10): 
     133* 1 user, 10 threads, 500 jobs (50x10): 
    134134[[Image(zeus-throughput-50x1-1.png, center, width=640px)]] 
    135 * 10 users, 10 threads, 500 tasks (50x10): 
     135* 10 users, 10 threads, 500 jobs (50x10): 
    136136[[Image(zeus-throughput-50x10-0.png, center, width=640px)]] 
    137 * 10 users, 10 threads, 1000 tasks (10x100): 
     137* 10 users, 10 threads, 1000 jobs (10x100): 
    138138[[Image(zeus-throughput-100-10-0.png, center, width=640px)]] 
    139139 
    140140=== Notes === 
    1411411. The machine where CREAM (gLite) was running had more resources (in particular CPU cores and virtual memory) than the machines with QCG and UNICORE. 
    142 2. ... however, this machine was additionally loaded by external tasks (about 500-2000 tasks - the tests were performed over 2 weeks). 
    143 3. QCG returns the job status when the job is already in the queueing system; gLite and UNICORE do not necessarily do so. Thus, e.g. in the throughput tests, new tasks appeared after the test finished. 
    144 4. The bottleneck (especially in the second group of tests) was the throughput of the WP4 partition and Maui, which meant that at most 64 tasks could be scheduled per scheduling cycle (at least 3.5 minutes). 
     1422. ... however, this machine was additionally loaded by external jobs (about 500-2000 jobs - the tests were performed over 2 weeks). 
     1433. QCG returns the job status when the job is already in the queueing system; gLite and UNICORE do not necessarily do so. Thus, e.g. in the throughput tests, new jobs appeared after the test finished. 
     1444. The bottleneck (especially in the second group of tests) was the throughput of the WP4 partition and Maui, which meant that at most 64 jobs could be scheduled per scheduling cycle (at least 3.5 minutes). 
    145145 
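
The scheduling limit from note 4 gives a rough ceiling on what any of the tested systems could achieve. The sketch below only restates that arithmetic; the constant and function names are hypothetical, and it assumes that all 64 slots are free again at every cycle and that a cycle takes exactly 3.5 minutes (the note says "at least"), so real figures can only be worse.

```python
import math

SLOTS_PER_CYCLE = 64   # WP4 partition: 64 slots
CYCLE_MINUTES = 3.5    # Maui RMPOLLINTERVAL, taken as the minimum cycle length


def min_cycles(total_jobs, slots=SLOTS_PER_CYCLE):
    """Fewest scheduling cycles needed to start all jobs."""
    return math.ceil(total_jobs / slots)


def peak_jobs_per_minute(slots=SLOTS_PER_CYCLE, cycle=CYCLE_MINUTES):
    """Upper bound on the job start rate imposed by the scheduler."""
    return slots / cycle


print(min_cycles(500))                    # 8
print(min_cycles(1000))                   # 16
print(round(peak_jobs_per_minute(), 1))   # 18.3
```

Under these assumptions a 500-job run needs at least 8 scheduling cycles and a 1000-job run at least 16, regardless of the middleware submitting the jobs.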
    146146[=#n]