ALEA 3 GridSim based Grid Scheduling Simulator

This work concentrates on the design of a system intended for study of advanced scheduling techniques for planning various types of jobs in Grid environment. The solution is able to deal with
common problems of job scheduling in Grids like heterogeneity of jobs and resources, and dynamic
runtime changes such as arrival of new jobs.
Alea Simulator is based on the latest GridSim 5 simulation toolkit which we extended
to provide a simulation
environment that supports simulation of varying Grid scheduling problems. To demonstrate the
features of the Alea environment, we implemented an experimental centralised Grid scheduler
which uses advanced scheduling techniques for schedule generation. By now local search based
algorithms and some policies were tested as well as "classical" queue-based algorithms such as FCFS or Easy Backfilling.
The scheduler is capable to handle dynamic situation when jobs appear in the system during simulation.
In this case generated schedule is changing through time as some jobs are already finished while
the new ones are arriving.
The sources are stored in the Netbeans IDE project format. You can download Netbeans IDE for free
from http://www.netbeans.org.
For the proper function you have to include simjava.jar (re-modified in Jan 2009) which is here.
You will also need gridsim.jar which you can download here.
Usage
Important
When using Alea in your paper or presentation, please use the following citation as an acknowledgement. Thank you!
Dalibor Klusáček and Hana Rudová.
Alea 2 - Job Scheduling Simulator. In proceedings of the 3rd International ICST Conference
on Simulation Tools and Techniques (SIMUTools 2010), ICST, 2010. [
download]
BibTeX format:
@inproceedings{alea2,
author = {Dalibor Klus\'{a}\v{c}ek and Hana Rudov\'{a}},
title = {Alea 2 -- Job Scheduling Simulator},
booktitle = {Proceedings of the 3rd International ICST Conference
on Simulation Tools and Techniques (SIMUTools 2010)},
publisher = {ICST},
year = {2010},
}
Software licence:
This software is the result of the research intent No. 0021622419 (Ministry of Education, Youth and Sports of the Czech Republic) and the grant No. 201/07/0205 (Grant Agency of
the Czech Republic) and this result is consistent with the expected objectives of these projects. The owner of the result is Masaryk University, a public high school, ID: 00216224. Masaryk
University allows other companies and individuals to use this software free of charge and without territorial restrictions under the terms of the
LGPL licence.
This permission is granted for the duration of property rights. This software is not subject to special information treatment according to Act No. 412/2005 Coll., as amended. In case
that a person who will use the software under this license offer violates the license terms, the permission to use the software terminates.
News
20th July 2011 - Alea 3.0 has been released. Newly added methods to simulate runtime estimates. New interfaces to easy add new scheduling algorithm.
2nd October 2009 - Alea 2.1 released. SWF, GWF and MetaCentrum workload format support, new algorithms, Visualization output, ready to use!
20th January 2009 - Alea 2.0 released. New design, better scaling, lots of bugs fixed, GWF support, ready to use!
29th December 2008 - Alea 1.9 available. Workload traces available
20th February 2008 - version 1.5 is GridSim 4.1 compatible
2nd February 2008 - version 1.5 released
28th October 2007 - make sure you use modified simjava.jar when running Alea! [download]
Versions
- Version 3.0
This version brings significantly improved version of Alea 2.1 with following features:
- Newly created interfaces allow straightforward development of new scheduling algorithms (see SchedulingPolicy and OptimizationAlgorithm in package xklusac.algorithms).
- Alea now allows to simulate variously inaccurate job runtime estimates. User or queue-based limits are accompanied by the Feitelson's f-model.
- Conservative Backfilling now supported, including schedule compression algorithm.
- Several minor bugs fixed.
- FAQ section of this webpage has been updated with new information on how to create your own scheduling algorithms.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Data Sets are either in GWF, SWF, MWF or PWF file format. Shortened versions of all possible formats
were included in previous (2.1) distribution.
- Version 2.1
This version brings significantly improved version of Alea 2.0 with following features:
- New,
more efficient design of simulation entities (AdvancedSpaceShared policy, JobLoader).
- Support of machine failures and restarts.
- Support of machine and job properties (realistic simulaion).
- New
visualization output (several objectives, direct export to several bitmap formats).
-
Grid Workloads Format (GWF), Satndard Workloads Format (SWF) and MetaCentrum workload format supported!
- New implementations of schedule-based algorithms (EDF, Local-search optimization).
-
Several prepared data-sets from GWA/PWA/MetaCentrum allowing immediate testing of implemented (or your own) scheduling algorithms.
- GridSim 5.0 compatible.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Data Sets are either in GWF, SWF, MWF or PWF file format. Also,
shortened versions of all possible formats are included for better understanding and immediate testing.
- Version 2.0
This version brings totally new version of Alea with following features:
- New, more efficient design of simulation entities (JobLoader, Scheduler).
- New simjava.jar fixing serious issues regarding simulation determinism (fixed simjava.jar is distributed with the sources).
- Enhanced scalability (much faster simulation execution).
- Machine co-allocation allowing to execute parallel jobs on more than one machine within one cluster (GridResource).
- Grid Workload Format (GWF) supported!
- New implementations of popular queue-based algorithms (FCFS, EDF, EASY Backfilling, EDF-Backfilling).
- Periodical generation of time-dependant results involving (current utilization, # of waiting or executing jobs).
- Two prepared data-sets from GWF allowing immediate testing of implemented/proposed scheduling algorithms.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Data Sets file format is described in the readme.txt.
- Version 1.5
This version fix couple of bugs from previous versions and introduces some new features:
- GridSim 4.1 compatible
- New "fair" mechanism for computing resource usage
- New queue based algorithm Easy Backfilling
- New method for filling gaps (holes) in the schedule to increase resource usage
- Multi-criteria objective functions - maximize the resource usage and the number of jobs that meet their deadline
- Results stored into text files - Copy&Paste into Excell to generate graphs
- 3 example data sets (each containing 2 experiments) are included - each data set (i.e. "Ta_XX" directory) represents different job inter-arrival time.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Data Sets file format is described in the readme.txt file so you can create your own.
- Version 1.4
This version fix couple of bugs from previous versions and introduces a lot of new features:
- Queue based algorithms (FCFS, EDF, Easy Backfilling)
- New objective - maximize number of jobs that meet their deadline
- Results stored into text files - easy to generate graphs
- Data sets stored in separate files and directories - each directory represents different job inter-arrival time.
To create your own data sets please read readme.txt file contained in the distribution.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Data Sets file format is described in the readme.txt file so you can create your own.
- Version 1.3
This version was used to test Tabu search when scheduling multi-CPU jobs on different nunmber of machines.
Data Sets can be downloaded here. (text file)
- Version 1.2
This version extends version 1.1 and modifies the GridletGenerator method to generate jobs with different
number of required CPUs. The sources contain complete experimental setup with all dispatching rules and algorithms implemented. Preliminary
results are available and shows the performance of implemented scheduling techniques.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Preliminary results when testing total tardiness minimalization can be seen here.
- Version 1.1
This version supports
single CPU and multi CPU jobs and multiple resources consisting of one machine with one or more CPUs.
Each job may have setted up a priority although it is not currently taken into account during scheduling process. This version does not use
job preemption or job migration during scheduling process. The
ResourceInfo.java class contains newly designed methods to
approximate resource makespan. Older methods designed for single-CPU jobs should NOT be used. Please see
documentation for further details.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
- Version 1.0
This version supports single CPU jobs and multiple resources consisting of one machine with one or more CPUs.
This version does not use job preemption or job migration during scheduling process.
Javadoc can be viewed here.
Sources can be downloaded here. (zip archive) | licence
Data Sets can be downloaded here. (text file)
FAQ - Frequently Asked Questions
How to run the simulator?
|
Open the project (i.e., the folder you have downloaded) in the Netbeans IDE and add gridsim.jar and simjava.jar
into the libraries tab. Please do not use standard simjava.jar but the one that is provided with this simulator,
which is here.
If you use the standard simjava.jar the simulation will hang. The new version of simjava.jar is neccessary if
you want to run multiple simulations by using only one run of your
program (1 program contains e.g. 6 different GridSim simulations). Original simjava.jar does not reset its internal variables
after the end of one simulation, so the next run cannot start properly.
Please see the screenshot that shows where to manage the libraries in the Netbeans IDE.
It is also important to make simjava.jar
appear "earlier" or "before" than gridsim.jar in the library folder.
The design and main entities of Alea 2/3 are shown on these two figures.
If you want to enable visualization output run the simulation with "-v" parameter or manually set the
visualize = true; in ExperimentSetup.java.
|
|
|
|
How to add new scheduling algorithm? (Alea 3.0)
This is generally very simple procedure. We use two interfaces for this:
SchedulingPolicy and optionally
OptimizationAlgorithm.
These interfaces are at package
xklusac.algorithms together with existing implemntations of popular scheduling algorithms.,br>
In Alea,
scheduling policy is used to handle job arrival and to select job for execution. Therefore it contains two methods:
addNewJob(GridletInfo gi) and
selectJob(). When you want to create your own algorithm you need to:
- create a new class that implements
SchedulingPolicy interface. Rather than describing it - take a look at FCFS.java and you will easily
see how it works. Please use javadoc documentation when facing problems.
For FCFS, the code is following (see the comments to understand the functionality):
public void addNewJob(GridletInfo gi) {
double runtime1 = new Date().getTime();
// add job at the end of job queue
Scheduler.queue.addLast(gi);
// update the internal counter that measures the total algorithm
// execution time (how long scheduling algorithm executed)
Scheduler.runtime += (new Date().getTime() - runtime1);
}
public int selectJob() {
int scheduled = 0;
ResourceInfo r_cand = null;
// main cycle that goes through the job queue
for (int i = 0; i < Scheduler.queue.size(); i++) {
// get the first job in the queue
GridletInfo gi = (GridletInfo) Scheduler.queue.get(i);
// test if some resource is free to run this job (gi)
for (int j = 0; j < Scheduler.resourceInfoList.size(); j++) {
ResourceInfo ri = (ResourceInfo) Scheduler.resourceInfoList.get(j);
// tests whether job gi can run now on resource ri.
// Job properties as well as number of CPUs have to be tested.
if (Scheduler.isSuitable(ri, gi) && ri.getNumFreePE() >= gi.getNumPE()) {
r_cand = ri;
// we have found a free resource so break the inner search cycle.
break;
}
}
if (r_cand != null) {
// remove job from queue
gi = (GridletInfo) Scheduler.queue.remove(i);
// update dynamic Alea's information concerning job-machine mapping
r_cand.addGInfoInExec(gi);
gi.setResourceID(r_cand.resource.getResourceID());
// submit job on the found resource
scheduler.submitJob(gi.getGridlet(), r_cand.resource.getResourceID());
// send the Scheduler information about succesfull job start
scheduler.sim_schedule(GridSim.getEntityId("Alea_3.0_scheduler"),
0.0, Scheduler.GridletWasSent, gi);
r_cand = null;
// decrease the counter of the outer loop so that first job
// is selected in the next iteration
i--;
} else {
return scheduled;
}
}
return scheduled;
}
- Optimization algorithms are used to optimize schedule (not queues) and there are currently two local
search-based procedures:
RandomSearch.java and GapSearch.java.
When you want to create your own method you need to implement the OptimizationAlgorithm interface.
- If you want to implement your own objective function you will probalbly have to add functionality into
CommonObjectives.java class.
It contains methods that are able to calculate expected tardiness, expected makespan, expected number of delayed jobs, machine usage, avg. slowdown, response time, wait time, etc.
These methods are usefull when implementing some complex objective function, e.g., when comparing two different schedules.
If you need different kind of information you have to implement your own methods in CommonObjectives.java.
- Finally, you have to add your new scheduling algorithm into the
ExperimentSetup.java class so that it loads properly during simulation.
Lets assume that your new policy is in MyPolicy.java and that it is identified as "alg == 3" in ExperimentSetup.java.
An example of how "MyPolicy" is added is shown here:
// MyPolicy is the fourth algorithm in the algorithms[] list
int algorithms[] = {0, 1, 2, 3, ...}
// select which algorithms from the algorithms[] list will be used.
for (int sel_alg = 3; sel_alg <= 3; sel_alg++) {
// get proper algorithm
int alg = algorithms[sel_alg];
// only fourth algorithm is selected now... and it is MyPolicy...
...
// this will set up the proper algorithm according to the algorithms[] list
if (alg == 0) {
policy = new FCFS(scheduler);
suff = "FCFS";
}
if (alg == 1) {
policy = new EDF(scheduler);
suff = "EDF";
}
...
if (alg == 3) {
// create instance of MyPolicy
policy = new MyPolicy(scheduler);
suff = "MyPolicy";
}
...
And that is all! Have fun.
How to use the ExperimentSetup.java?
ExperimentSetup.java is the main class where you specify all important parameters of the simulation.
The setup is manual - you have to update proper parameters (i.e. variables). But do not worry, all are explained in the comments.
The most important is to select data set ("metacentrum" preselected) and algorithms (0,..,11) preselected.
Next you can decide to use failure traces, runtime estimates, Feitelson's f-model, ... Just use the commented variables.
Be carefull when simulating failures. Alea uses real traces and does not support "synthetically on the run created" failures.
Therefore, there must be a file that ends with *.failures in the data-set directory. See e.g., the file for metacentrum.mwf data set (metacentrum.mwf.failures).
Its format is simple, each line has following meaning:
failure_start(epoch) cluster_name failure_duration(seconds) failed_machines(machine_IDs)
For example:
1230768000 cluster_12 10753754 0 1 2 3 4 5 6 7
1234254529 cluster_8 17768 2 3 4 5
Setting up system (machine description):
The system is described by *.machines file. It can look like this:
0 interactive 9 8 1
1 batch1 171 8 1
2 batch2 7 32 1
The meaning is following:
The system consists of 3 clusters (interactive, batch1, batch2)
Those clusters have following parameters
- first cluster: 9 machines, 8 CPUs per machine, speed 1 (depicted as "9 8 1")
- second cluster: 171 machines, 8 CPUs per machine, speed = 1
- third cluster: 7 machines, 32 CPUs per machine, speed = 1
Licence
This software is the result of the research intent No. 0021622419 (Ministry of Education, Youth and Sports of the Czech Republic) and the grant No. 201/07/0205 (Grant Agency of
the Czech Republic) and this result is consistent with the expected objectives of these projects. The owner of the result is Masaryk University, a public high school, ID: 00216224. Masaryk
University allows other companies and individuals to use this software free of charge and without territorial restrictions under the terms of the LGPL licence.
This permission is granted for the duration of property rights. This software is not subject to special information treatment according to Act No. 412/2005 Coll., as amended. In case
that a person who will use the software under this license offer violates the license terms, the permission to use the software terminates.
Original GridSim Toolkit is protected by the GPL licence.
2006 - 2011 | Dalibor Klusacek | xklusac(at)fi.muni.cz | Last modified:
|