Demos-3/AntPheromones2/Readme-drone.txt
---------------------------------------
This file describes how to use Drone to run experiments
with a program that uses the ModelParameters extension
of the Repast SimModelImpl class.
--------------------------------------------------------------------
Quick Start:
Copy the AntPheromones2 program from /users/rlr/RePast/Demos-3 in the usual
way (see the Readme.txt in that directory).
Make and go into the dir with the drone ctrl file you want to use:
cd /users/YOUR-PATH/AntPheromones2
mkdir Experiments
cp /users/rlr/RePast/Demos-3/AntPheromones2/exp01.ctrl Experiments
cp /users/rlr/RePast/Demos-3/AntPheromones2/exp01.in Experiments
cd Experiments
To see how to set up a drone experiment control file:
emacs exp01.ctrl &
Of course to use it you will want to change the rootDir, the programDir,
the email address and any base parameters and the sweep line
(as described in more detail below). Note that the file
exp01.in
has the "base parameters" section of the exp01.ctrl file. It's a good idea
to make such an X.in file by copy/pasting from your X.ctrl and test it by
bin/batchrun.sh iPFN=exp01.in T=100
to be sure it will run correctly, i.e., to be sure you don't have
mistakes in that part of the ctrl file (wrong parameter names, etc).
You can look in the report.00 file to be sure it ran for the full 100 steps.
Here is the little test I did (*** YOU WILL NEED TO CHANGE DIR NAMES ***):
Make a directory for output from experiments on a scratch disk:
mkdir /scratch/formica0/rlr/Experiments-AntPheromones2
Make a symbolic link from "expdata" in the current directory
to that scratch disk directory:
ln -s /scratch/formica0/rlr/Experiments-AntPheromones2 expdata
Now run the experiment (*** BE SURE to edit and change exp01.ctrl FIRST ***):
/common/scripts/drone-linux exp01.ctrl
Some simple analysis scripts:
Get average distance to source, averaged over the last 0.10 of each run,
and then averaged across the runs, for all the cases in experiment-1:
~rlr/Scripts/getEquilAvgsFromDirs.pl -c1 -esaveRun -s0.1 \
-f"report.\d\d" -d"expdata/experiment-1"
Same, but just for the prmm=0.10 case (and over the last 0.20 of each run):
~rlr/Scripts/getEquilAvgs.pl -c1 -v -esaveRun -s.2 \
expdata/experiment-1/prmm=0.10/report.??
Plot the average distance vs time, averaged over the runs, prmm=0.80 vs 1.00 cases:
~rlr/Scripts/getAvgColsOverFiles.pl -c1 -esaveRun \
expdata/experiment-1/prmm=0.80/report.?? | \
/common/scripts/xmgrace -pipe >& /dev/null &
~rlr/Scripts/getAvgColsOverFiles.pl -c1 -esaveRun \
expdata/experiment-1/prmm=1.00/report.?? | \
/common/scripts/xmgrace -pipe >& /dev/null &
Run the above scripts with just -h as a parameter to get more info about them.
==========================================================================
Running experiments
-------------------
Suppose we want to run some experiments with the AntPheromones2 model.
Since the model has stochastic elements, it will give different
results for different RNG (random number generator) seeds, so we will
want to average the results of several runs (10 for exploring, 30 or more
for real experiments).
Suppose we want to know how some aggregate variable (e.g., average
distance to source) varies with some model parameters.
These seem like likely ones to wonder about:
probRandMoveMean (prmm), probRandMoveVar (prmv)
randomMoveMethod (rmm)
diffusionK, evapRate
Let's focus on the first 3 for now, and assume all others
are set to some fixed values for all runs.
Call each combination of parameter values a "case",
e.g., (prmm=0.1,prmv=0,rmm=0) is one case.
Suppose we want to try ("sweep") all combinations of
prmm = 0.0 0.05 0.1 0.2 0.4 0.6 0.8 0.9 1.0
prmv = 0.0 0.05 0.1 0.15
rmm = 0 1
and run it 10 times for each combination. (Well...there is no point
in doing rmm=1 with prmm=prmv=0.)
That is 9*4*2*10 = 720 runs of the model!!
To do that "by hand" (i.e., start the model, set the parameter values,
run the model, get the final averages from the report file)
would be a lot of drudge work!! How can we make this a bit easier???
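To get a feel for the size of that sweep, the cases can be enumerated with
a few lines of Python. This is only a counting sketch and not part of drone;
the variable names are made up for illustration, and the values are the
ones listed above.

```python
from itertools import product

# Sweep values from the lists above.
prmm_values = [0.0, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0]
prmv_values = [0.0, 0.05, 0.1, 0.15]
rmm_values = [0, 1]
runs_per_case = 10

# Every combination of parameter values is one "case".
cases = list(product(prmm_values, prmv_values, rmm_values))

# Drop the case noted above as pointless: rmm=1 with prmm=prmv=0.
useful_cases = [
    (prmm, prmv, rmm)
    for (prmm, prmv, rmm) in cases
    if not (rmm == 1 and prmm == 0.0 and prmv == 0.0)
]

total_runs = len(cases) * runs_per_case
useful_runs = len(useful_cases) * runs_per_case

print(len(cases), "cases,", total_runs, "runs")
print(len(useful_cases), "cases,", useful_runs, "runs without the redundant case")
```

That gives 72 cases (720 runs), or 71 cases (710 runs) once the one
redundant combination is dropped.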
============================================================================
Drone -- a tool to run experiments
-----
Drone makes it easy to:
* sweep parameters (1, 2, or more)
* do N runs for each case
* organize data into output directories and files
* automatically log experiments
* replicate a whole experiment, case, or run
[* run on multiple machines at once -- currently not available]
From the CSCS Lab documents page, there are pointers to pages
about drone, both the full manual and a quick note.
Basically, you have to tell drone, through a "control file"
- what program to run
- how many runs to do
- where to put the output
- the basic (non-changing) parameters
- any parameters to be varied across cases
- comment for the log
Let's make a directory for all AntPheromones2 experiments, and
copy a drone "control" file into it (this assumes you already made an
AntPheromones2 directory on your account using the copyProject.pl script):
cd AntPheromones2
mkdir Experiments
cp /users/rlr/RePast/Demos-3/AntPheromones2/exp01.ctrl Experiments
cp /users/rlr/RePast/Demos-3/AntPheromones2/exp01.in Experiments
cd Experiments
ll
You should see the control file exp01.ctrl and the extract exp01.in from it.
Now let's edit that control file---the start should look like:
# These parameters MUST be set up by the user
param rootDir = /users/rlr/RePast/Demos-3/AntPheromones2/Experiments
param programDir = /users/rlr/RePast/Demos-3/AntPheromones2/bin
param programName = batchrun.sh
param seriesLogFile = AntPheromones2.log
param emailAddr = rlriolo@umich.edu
param mailUser = 1
The parts you will/may have to change include:
rootDir -- all your simulation output goes in directories
under this directory. So you should set it to
something like:
/users/YOURPATH/AntPheromones2/Experiments
programDir -- the directory that contains the program to run
- change only if you use your own program!
programName -- the program to run
- you don't need to change this, since your copy
of batchrun.sh should run your copy
of the AntPheromones2 program.
NB: check the PROJECTDIR in your batchrun.sh file!
emailAddr -- your email address
numRuns -- 10 runs (or whatever you are doing)
**
** BE SURE to change the rootDir above to point to
** the proper directory in your login space!!
** AND BE SURE to change the email address!
**
Next the file has a set of "base values" that act as defaults,
some of which will be changed for specific cases:
begin input file
5000
100
100
2
0.9
1.0
0.3
100
0.00
0.00
0
end input file
Note the exp01.in file contains just the above section of your ctrl file.
Thus you can test that section, to be sure the parameters and values
do not have misspellings, etc, with this command:
../bin/batchrun.sh iPFN=exp01.in T=100
It should produce report.00 and report.xml.00 files.
(NB: the above command assumes you are in AntPheromones2/Experiments/ dir.)
The next section of the ctrl file specifies the "sweep values"
on a line like this:
sweep probRandMoveMean 0.00 0.10 0.20 0.40 0.60 0.80 0.90 1.00
To do this as a collective experiment, we'll divide up the parameters
to be swept, rather than have each person do them all---the assignment of
what part of the sweep to do will be made at class time.
The basic changes to be made to the ctrl file:
0. Change the rootDir and emailAddr lines:
param rootDir = /users/yourUniqName/***AND-SO-ON***
param emailAddr = ***your email****
1. Change the number of runs to do:
param numRuns = 10
2. Some of you will run it with randomMoveMethod 0 and some with 1.
0
3. Some of you will run it with different values of probRandMoveVar
so you will be changing this line as needed:
0.00
The rest of the exp01.ctrl file you can ignore.
>> See the Experiments/experiment-plan.txt (in class) for cases to run.
After you have changed your exp01.ctrl file, you can make a link from your
Experiments/ directory to an area for your experimental output
(be sure you are in your Experiments directory when you do this):
mkdir -p /scratch/formica0/**YourUniqName**/AntPheromones2-expdata
ln -s /scratch/formica0/**YourUniqName**/AntPheromones2-expdata expdata
BE SURE to replace all of **YourUniqName**
with just your login/uniqName, without the **'s!!
If you then do the command:
ll
you should see something like this (all on one line):
lrwxrwxrwx 1 yourUniqName users 39 Jul 25 16:26 expdata ->
/scratch/formica0/yourUniqName/AntPheromones2-expdata
in the list of files in your AntPheromones2/Experiments directory.
(The technical details: That just means you have a "symbolic link" from the
directory expdata/ to some space in another part of the CSCS disk space.
However, you can treat that expdata/ directory as if it were just
another directory in your Experiments/ directory. For details on CSCS
scratch space, see
http://www.cscs.umich.edu/PmWiki/pmwiki.php/Site/GettingStartedScratchSpace )
Now you can start the drone run as follows:
/common/scripts/drone-linux exp01.ctrl
You should see something like:
Drone version 970829a
Running on host battistini
Started on Fri Jul 18 13:42:32 2003
Reading control file...
Initializing...
Running experiment experiment-1...
Going into the background...
If you now do the ll ("list files long") command
you should see something like:
total 16
drwxr-xr-x 2 yourUniqName users 4096 Jul 18 17:01 errorlogs
-rw-r--r-- 1 yourUniqName users 4000 Jul 18 17:01 exp1.ctrl
-rw-r--r-- 1 yourUniqName users 4035 Jul 18 16:59 exp1.ctrl~
lrwxrwxrwx 1 yourUniqName users 42 Jul 18 16:34 expdata ->
/scratch/formica0/yourUniqName/AntPheromones2-expdata
drwxr-xr-x 2 yourUniqName users 4096 Jul 18 17:01 logs
Note a few new directories in your Experiments/ directory:
errorlogs/ -- where drone error messages go
logs/ -- where experiment logs go
expdata/ -- where experiment output goes
To see what is inside the expdata/ directory, do
ll expdata/
You should see
experiment-1/ -- this first experiment you are running
if you run more, they will be experiment-2/, etc.
As the experiment proceeds you can see how it's doing by
ll expdata/experiment-1
You should see something like this:
total 12
-rw-r--r-- 1 yourUniqName users 225 Jul 26 11:18 experiment-1.in
drwxr-xr-x 2 yourUniqName users 1024 Jul 26 11:46 prmm=0.00/
drwxr-xr-x 2 yourUniqName users 1024 Jul 26 11:46 prmm=0.10/
(Exactly which prmm= directories you have depends on which probRandMoveMean
parameter values you put on the sweep line in the exp01.ctrl file.)
The above means so far it has finished the prmm=0.00 case, and is working
on the prmm=0.10 case. If you do this command:
ll expdata/experiment-1/prmm=0.00
you may see something like:
-rw-r--r-- 1 yourUniqName users 2750 Jul 26 11:46 case.log
-rw-r--r-- 1 yourUniqName users 18192 Jul 26 11:45 report.00
-rw-r--r-- 1 yourUniqName users 18192 Jul 26 11:45 report.01
...and more, with one report.** file for each run you asked it to do.
You also will see files like report.xml.00 -- those are the report
files in xml format.
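If you want to check progress without listing each case directory by hand,
a small script can count the report files per case. The following is a
sketch in Python, not a drone feature; the function name reports_by_case is
made up here, and it assumes only the layout shown above (one directory per
case, with files named report.00, report.01, ...).

```python
import re
from pathlib import Path

# Matches report.00, report.01, ... but not report.xml.00
REPORT_NAME = re.compile(r"^report\.\d\d$")


def reports_by_case(experiment_dir):
    """Return {case directory name: sorted list of report file names}."""
    result = {}
    for case_dir in sorted(Path(experiment_dir).iterdir()):
        if not case_dir.is_dir():
            continue  # skip plain files such as experiment-1.in
        names = sorted(
            p.name for p in case_dir.iterdir() if REPORT_NAME.match(p.name)
        )
        result[case_dir.name] = names
    return result


if __name__ == "__main__":
    exp_dir = Path("expdata/experiment-1")
    if exp_dir.is_dir():
        for case, names in reports_by_case(exp_dir).items():
            print(case, len(names), "report files")
```

Run from your Experiments/ directory, it prints one line per case, so you
can compare the counts against the number of runs you asked for.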
** NOTE:
If you see any files named stderr.00 (or 01 ...) in a case directory
for which all runs are done, that means you have an error in your ctrl file,
and your experiment is not running correctly.
You can see what error message you have by looking in
that stderr.* file (with cat, more, emacs, vi, ...).
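With many cases it is easy to miss a stray stderr file. A short Python
sketch like the one below can list them all at once. The function names are
hypothetical, and the script only finds and prints the files; deciding
whether a case has finished all its runs is still up to you.

```python
from pathlib import Path


def find_stderr_files(experiment_dir):
    """Return sorted paths of stderr.* files found in the case directories."""
    return sorted(Path(experiment_dir).glob("*/stderr.*"))


def show_errors(experiment_dir, max_lines=5):
    """Print the first few lines of every stderr.* file that was found."""
    for path in find_stderr_files(experiment_dir):
        print("==", path)
        with open(path, errors="replace") as handle:
            for i, line in enumerate(handle):
                if i >= max_lines:
                    break
                print("   ", line.rstrip())


if __name__ == "__main__":
    show_errors("expdata/experiment-1")
```

If it prints nothing, no stderr files were found under that experiment
directory.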
So...eventually...depending on what machine you are on, you will
have a full set of report files for each case.
Drone will send an email to you telling you the job is done
(if you changed the email line in the control file!).
Now....how do we get the data we want out of all those files???
What do we want? At the least, we'd like to know how the average distance
to the pheromone source varies with different parameter combinations.
But....over what part of the run? The beginning? The middle? The end?
For starters, let's ignore the transient dynamics and
just look at the averageD over the end of the runs.
Since the value will vary from step to step (if agents have
randomMoveProb > 0), we will
- get the average of averageD over the last 10% of each run
- average that over all the runs for each case
- compare those (and their StdDevs)
(Note also, the basic values are also the average D over the
populations -- so we have done a lot of aggregation here!!)
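The three averaging steps above can be sketched in Python as follows. This
is an illustration of the idea, not the Perl script described in this file:
the function names are made up, each run is assumed to be already loaded as
a plain list of averageD values (one per time step), and the use of the
sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def _sd(values):
    """Sample SD, or 0.0 when there are fewer than two values."""
    return stdev(values) if len(values) > 1 else 0.0


def tail_stats(values, fraction=0.1):
    """Mean and SD of the last `fraction` of one run's values."""
    count = max(1, round(len(values) * fraction))
    tail = values[-count:]
    return mean(tail), _sd(tail)


def case_summary(runs, fraction=0.1):
    """Summarise one case; `runs` is a list of per-run value lists.

    Returns (avg of in-run avgs, SD of in-run avgs,
             avg of in-run SDs, SD of in-run SDs).
    """
    stats = [tail_stats(run, fraction) for run in runs]
    avgs = [avg for avg, _ in stats]
    sds = [sd for _, sd in stats]
    return mean(avgs), _sd(avgs), mean(sds), _sd(sds)


# Two made-up runs of 20 steps each; only the last 2 steps are used.
example_runs = [[0] * 18 + [4, 6], [0] * 18 + [9, 11]]
print(case_summary(example_runs))
```

For the two invented runs, the in-run averages are 5 and 10, so the first
number printed is 7.5. Comparing cases then means comparing these four
numbers from one case to the next.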
How do we do that? There are many approaches, e.g., read the data
- into Excel (or OpenOffice's spreadsheet)
- into matlab, mathematica, R, ...
- "roll your own" scripts (perl, python,...) to process the data
I have written a little script (in perl)
to do the averaging and print it all out.
Here is how you run my script for experiment-1 (assuming you are
in your Experiments/ directory which has the expdata/ directory in it):
/users/rlr/Scripts/getEquilAvgsFromDirs.pl -c1 -esaveRun -s0.1 \
-f"report.\d\d" -d"expdata/experiment-1"
That says: for each run (i.e., each file that matches report.??),
extract the values from the averageD column (column 1, -c1) and
average those over the last 10% of the time steps (-s0.1),
for all the case directories in expdata/experiment-1 .
(The -esaveRun tells it to start reading data on the line after the one
that begins with "saveRun"; it ignores all lines that start with # ).
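If you would rather roll your own reader, the line selection described
above might look like this in Python. Treat it as a sketch under
assumptions: the report layout is not shown in this file, so the sample
lines are invented, columns are assumed to be whitespace-separated, and
counting columns from 0 (so that column 1 is the second field) is a guess
at what -c1 means in the Perl script.

```python
def column_after_marker(lines, marker="saveRun", column=1):
    """Extract one numeric column from report-style lines.

    Data starts on the line after the first line that begins with `marker`;
    blank lines and lines beginning with '#' are skipped.
    """
    values = []
    started = False
    for line in lines:
        text = line.strip()
        if not started:
            # Keep skipping until the marker line has been seen.
            started = text.startswith(marker)
            continue
        if not text or text.startswith("#"):
            continue
        fields = text.split()
        values.append(float(fields[column]))
    return values


# Invented sample; a real report file may be laid out differently.
sample = [
    "# header comment",
    "saveRun",
    "# step averageD",
    "0 12.5",
    "1 11.0",
    "2 9.5",
]
print(column_after_marker(sample))  # [12.5, 11.0, 9.5]
```

To use it on a real file, pass the open file object (or its list of lines)
and adjust the marker and column to match what your report files contain.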
You should see something like this:
Equilibrium data Wed Feb 2 09:23:07 EST 2005
From last 0.10 of the data.
From directory: expdata/experiment-1
FilePattern: report.??
Column numbers in files: 1
For each file, average (sd) over points in column
Then print average (sd) over averages across files,
and average (and sd) of StdDev's across files.
Overall: avg of (sd over avg of (sd over
in-RunAvgs those) inRunSDs those)
Directory Column 1
----------------------------- -----------------------------------
prmm=0.05 2372.142 (2496.468) 128.338 (135.265)
prmm=0.10 2372.153 (2496.456) 128.341 (135.262)
prmm=0.25 2372.195 (2496.412) 128.349 (135.252)
prmm=0.50 2372.343 (2496.257) 128.369 (135.232)
prmm=1.0 2389.692 (2477.969) 128.664 (134.921)
Note that we have, for each case:
- an in-run averageD value for each run, i.e., averages over the last 10%
of the time steps.
- an in-run SD associated with that average: this is an indication
of how much averageD jumps around from step to step in the run
- an average across runs of those in-Run averages (and an SD for those)
- an average across runs of those in-Run SDs (and an SD for those!)
One can then take these values and import them into some
graphing program, e.g., to get a plot of averageD vs probRandMoveMean.
On the CSCS machines, this could be done with xmgrace,
mathematica, matlab, scilab, and so on.
(There are pointers to xmgrace from the CSCS lab doc pages.)
You could also move the data to a different computer (e.g. a wintel
machine) and plot in Excel or whatever.
**** Note:
For the purposes of our collective experiment, please run the
analysis program above, and then send your results
to the CTNG email, so we can archive them all there.
Be sure to include values to indicate the case you have done
(i.e., prmv, rmm, and anything else you changed).
Given the data collected there, one (or more!) of us
could produce graphs of the data.
NOTE: if you carry out the assignment in AntPheromones2/Readme.txt
to print out the average counts of number of neighbors,
you could use the same tool to extract summaries for that
data, by just naming different columns (the -c parameter).
--------------------------------------------------------------------
Feel free to come up with other experiments.
Please don't run them for too many time steps (eg 5,000),
too many runs per case, or too many cases.
For example, you could
cp exp01.ctrl exp02.ctrl
and use emacs to edit exp02.ctrl to change some
other base parameters, sweep different values, etc.
In particular, you may find that a transition
occurs between two contiguous sweep values. In that situation,
you would want to sweep finer steps between those parameter
values, to more closely see where the transition occurs
and what its shape is.
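Typing out a finer set of sweep values by hand invites typos, so a tiny
helper can build the line for you. This is a sketch: the function name and
its arguments are made up, and the output simply follows the form of the
sweep line shown earlier in this file ("sweep" followed by the parameter
name and the values).

```python
def refined_sweep_line(name, low, high, intervals, decimals=2):
    """Build a sweep line with evenly spaced values from low to high."""
    step = (high - low) / intervals
    values = [low + i * step for i in range(intervals + 1)]
    formatted = [f"{v:.{decimals}f}" for v in values]
    return "sweep " + name + " " + " ".join(formatted)


# Example: a transition seems to occur between 0.80 and 0.90.
print(refined_sweep_line("probRandMoveMean", 0.80, 0.90, 5))
# sweep probRandMoveMean 0.80 0.82 0.84 0.86 0.88 0.90
```

Paste the printed line into your copy of the ctrl file in place of the
original sweep line.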
============================================================================
Sun Grid Engine -- SGE
SGE (Sun Grid Engine) can be used at CSCS to submit jobs (drone or
others), which are then load-balanced across CPUs on different
workstations, using dynamic algorithms to pick the most suitable
available processors. Jobs can be submitted and viewed from any
workstation in the computing cluster, so there is no need to log in to
more than one workstation. Please see this page for more details:
http://www.cscs.umich.edu/PmWiki/pmwiki.php/Site/UsingSunGridEngineSGE
=============================================================================
Drone is a tool that can be used with any program that meets
a few constraints with respect to how the program is started,
how parameters are sent to it, and so on.
We have also been able to use Drone with programs that don't directly
meet those constraints by writing "wrapper" scripts that
do process parameters as Drone requires, by transforming
those parameters into the form your program wants and then
running it. For details, see the Drone documentation
off the CSCS web pages:
http://cscs.umich.edu/old/lab/documentation/SwarmStuff/drone-quicknote.html
=============================================================================
In general, you will want to develop some kind of scripts
or use other tools to automate the process of collecting
and analyzing data "across cases" of an experiment.
An alternative approach is to do the multiple runs for the
varying cases all within one run of a program.
I don't favor that approach because:
- it makes the program with the model implementation more complex
- it means you do one long run, and can't divide it up into
runs on many machines
- it means you have to re-implement tools (eg averagers, etc)
within each model you implement
=============================================================================