Simple Plotting and Analysis

This page describes some simple scripts and tools for analyzing and plotting data, e.g., data produced by Swarm programs that are run using the drone experiment management tool.

Note well: These scripts may or may not be in the PATH set for your account on the CSCS machines, and if they are not, you will get an error like:

    command not found

If you have trouble running them as described below, you might need to prepend the following directory to the command names:

     /users/rlr/Scripts/

Suppose you have run drone to produce a number of report files (for different RNG seeds) for each of a number of cases (particular parameter settings) in an experiment. For example, suppose you have report files in the directory:

   expdata/experiment-1/NB=128-RMP=0.00 

Each of the report.* files in that case directory is the output from one run of the UM-HeatbugsPlus Swarm program (with a unique seed). They each have a format like:

# Program heatbugsPlus, Version 0.9, run on 02/07/98 at 16:04:12
@begin
reportFileName = report
reportFileNameSuffix = 00
...
@end
##############################################
#       unhappiness
# T    Min   Avg   Max
    0 0.522 0.723 0.947
    1 0.456 0.696 0.942

You can produce a file ("outfile" in this case) that has the average across the runs of the min/avg/max unhappiness for each time step by issuing the command:

getAvgColsOverFiles.pl -c1,2,3 -e@end -v \
       expdata/experiment-1/NB=128-RMP=0.00/report.* \
       > outfile

The -v parameter tells it to put a few lines of information at the start of outfile to tell you where the data came from. This data can then be read into your favorite plotter, for example.
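
For readers who want to see what this kind of averaging amounts to, here is a minimal Python sketch. It is not the getAvgColsOverFiles.pl source: the names parse_report and average_columns are made up for illustration, and it assumes that data rows start after the @end line, that lines beginning with # are comments, that columns are counted from 0 (so column 0 is T), and that every run has the same number of time steps.

```python
from statistics import mean


def parse_report(text, end_marker="@end"):
    """Return the numeric rows that follow end_marker, skipping '#' comments."""
    rows = []
    in_data = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_data:
            # Everything up to and including the marker is header material.
            in_data = stripped == end_marker
            continue
        if not stripped or stripped.startswith("#"):
            continue
        rows.append([float(field) for field in stripped.split()])
    return rows


def average_columns(reports, columns):
    """Average the chosen columns (0-based) across runs, time step by time step."""
    averaged = []
    for step_rows in zip(*reports):  # one row per run for the same time step
        averaged.append([mean(row[c] for row in step_rows) for c in columns])
    return averaged


# Two tiny made-up reports in the format shown above (values are illustrative).
run_a = """# Program heatbugsPlus
@begin
reportFileName = report
@end
# T    Min   Avg   Max
    0 0.5 0.7 0.9
    1 0.4 0.6 0.8
"""
run_b = run_a.replace("0.5 0.7 0.9", "0.7 0.9 1.1").replace("0.4 0.6 0.8", "0.6 0.8 1.0")

averages = average_columns([parse_report(run_a), parse_report(run_b)], columns=[1, 2, 3])
for step, values in enumerate(averages):
    print(step, " ".join(f"{v:.3f}" for v in values))
```

Each output line holds the time step followed by the across-run mean of the min, avg and max columns for that step.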

To get more information about the getAvgColsOverFiles.pl perl script:

 getAvgColsOverFiles.pl -h 

You can also directly produce a plot for this data using xmgrace:

getAvgColsOverFiles.pl -c1,2,3 -e@end \
     expdata/experiment-1/NB=128-RMP=0.00/report.* | \
     /common/scripts/xmgrace -pipe >& /dev/null &

See the local xmgrace User Guide www pages for more information about using it to enhance your graph, to specify how the graph should look (e.g., labels) via command and parameter files, etc.

In some cases you will want to produce a graph that has lines for a single column of output (i.e., a single measurement variable) from each of the several different runs, e.g., so you can see if there are large differences in the dynamics when the system is started with different RNG seeds. You can do this as follows:

getColsFromFiles.pl -c2 -e@end \
     expdata/experiment-1/NB=128-RMP=0.00/report.* | \
     /common/scripts/xmgrace -pipe >& /dev/null &

You can also use the getColsFromFiles.pl script to write the data to a file, e.g.,

getColsFromFiles.pl -c2 -e@end \
     expdata/experiment-1/NB=128-RMP=0.00/report.* > outfile

to write to the file named "outfile".

Note that by default getColsFromFiles.pl extracts data from the first 3 files that match the pattern (report.* above) in the specified directory. You can have it extract data from different files using the optional -n parameter, e.g.,

getColsFromFiles.pl -c2 -n0-1 -e@end \
     expdata/experiment-1/NB=128-RMP=0.00/report.* | \
     /common/scripts/xmgrace -pipe 

will extract data from the first 2 files (note they are counted from 0!), and

getColsFromFiles.pl -c2 -n4-6 -e@end \
     expdata/experiment-1/NB=128-RMP=0.00/report.* | \
     /common/scripts/xmgrace -pipe 

will extract data from the 5th through 7th files that match that pattern. Of course there must be that many files in the data directories for this to work!
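
The zero-based, inclusive counting used by -n is easy to get wrong by one, so here is a small Python sketch of the selection rule as described above. It only illustrates the indexing: select_files is a hypothetical helper, not part of the scripts, and it assumes the matching files are taken in the sorted order the shell produces for report.*.

```python
def select_files(matching_files, n_range):
    """Pick files by a 'first-last' range, counted from 0 and inclusive."""
    first, last = (int(part) for part in n_range.split("-"))
    if last < first:
        raise ValueError(f"bad range: {n_range!r}")
    if last >= len(matching_files):
        raise ValueError(
            f"range {n_range} needs {last + 1} files, found {len(matching_files)}"
        )
    return matching_files[first:last + 1]


# Hypothetical case directory holding report.00 .. report.07
files = sorted(f"report.{i:02d}" for i in range(8))

print(select_files(files, "0-1"))  # the first 2 files
print(select_files(files, "4-6"))  # the 5th through 7th files
```

Read this way, the default of "the first 3 files" corresponds to indices 0 through 2.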

NOTE WELL: For more information about any of the scripts described on this page, you can run them with just the "-h" parameter.
For example, enter: getColsFromFiles.pl -h
It's important to do this, as new features may be added and older features deprecated (and so some details on this page may become out-of-date).

Another common analysis is to get the average of some value over the last part of a run, e.g., the "equilibrium" value of average unhappiness from a heatbugs run. Note that if one does several runs with different RNG seeds, some simple-to-calculate values of possible interest are:

  1. Average within one run. This might be interpreted as the "equilibrium" value of the measure (assuming it really has settled down).
  2. Standard deviation (SD) associated with the average in one run. This gives an indication of how much the measure is varying within a run over the time period measured. A large SD might indicate the value has not settled down to an equilibrium, or it might indicate the value is varying a lot, perhaps around a steady mean.
  3. Average of the averages from different runs. This gives an estimate of the mean over runs with different RNG seeds (i.e., different initial conditions and/or orders of chance events).
  4. The SD of the average of the averages. This gives some indication as to whether the mean results are sensitive to different RNG seeds.
  5. Average of the SDs from the different runs. This gives some indication as to whether the in-run variance is sensitive to different RNG seeds.
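
The five measures in the list above can be sketched in a few lines of Python. This is an illustration only, not the code used by the scripts on this page: the function names are invented, the standard deviation is assumed to be the sample (n-1) form, and the rule for turning a fraction of the run into a whole number of points (truncation, with a minimum of 2) is likewise an assumption.

```python
from statistics import mean, stdev


def tail(values, fraction):
    """Return the last `fraction` of a series (at least 2 points, so an SD exists)."""
    count = max(2, int(len(values) * fraction))
    return values[-count:]


def equilibrium_summary(runs, fraction=0.2):
    """Compute the five measures for one column, given one series per run."""
    tails = [tail(series, fraction) for series in runs]
    run_avgs = [mean(t) for t in tails]   # measure 1, one value per run
    run_sds = [stdev(t) for t in tails]   # measure 2, one value per run
    return {
        "run_avgs": run_avgs,
        "run_sds": run_sds,
        "avg_of_avgs": mean(run_avgs),    # measure 3
        "sd_of_avgs": stdev(run_avgs),    # measure 4
        "avg_of_sds": mean(run_sds),      # measure 5
    }


# Three made-up runs of 8 time steps each; the last half is treated as "equilibrium".
runs = [
    [5.0, 4.0, 3.0, 2.0, 1.0, 3.0, 1.0, 3.0],
    [5.0, 4.0, 3.0, 3.0, 2.0, 4.0, 2.0, 4.0],
    [5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0],
]

summary = equilibrium_summary(runs, fraction=0.5)
for name, value in summary.items():
    print(name, value)
```

In this toy data the first two runs wiggle around their means while the third is flat, so the in-run SDs differ even though each run has "settled"; measures 4 and 5 summarize those two kinds of variation separately.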

These measures for a report file of the type UM-HeatbugsPlus produces are calculated by the getEquilAvgs.pl script. As with getColsFromFiles.pl and getAvgColsOverFiles.pl, the user must specify the columns to examine, the files to examine, and whether there is a special mark indicating where data starts. The user may also specify the fraction of the end of the run over which the calculations are made. For example:

getEquilAvgs.pl -c1,2,3 -e@end -v -s.2 \
     expdata/experiment-1/NB=16-RMP=0.00/report.*

will calculate the above measures for columns 1, 2 and 3 in the named report files, over the last 20% of the runs. Output will look like:

# Equilibrium data         Sun Feb 15 10:20:25 EST 1998
# From last 0.20 of the data (40 pts).
# From directory: expdata/experiment-1/NB=16-RMP=0.00
# FirstFile report.00, numfiles 3
# Column numbers in files: '1 2 3'
# For each file, average (sd) over points in column
# Then print average (sd) over averages across files,
# and average (and sd) of StdDev's across files.

  FileName               Column  1            Column  2            Column  3
--------------    -------------------  -------------------  -------------------
     report.00       0.2364 ( 0.0276)     0.4859 ( 0.0056)     0.7794 ( 0.0102)
     report.01       0.0956 ( 0.0520)     0.4965 ( 0.0086)     0.8185 ( 0.0092)
     report.02       0.1063 ( 0.0593)     0.5069 ( 0.0096)     0.7614 ( 0.0108)
--------------    -------------------  -------------------  -------------------
Avg over files       0.1461 ( 0.0463)     0.4964 ( 0.0079)     0.7864 ( 0.0101)
 SDs over Avgs       0.0784 ( 0.0166)     0.0105 ( 0.0021)     0.0291 ( 0.0008)
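
As a quick check on how to read the last two rows, the Column 1 figures can be recomputed from the three per-file rows. The short Python sketch below uses the sample (n-1) standard deviation, which is an assumption; it agrees with the printed Column 1 values to within rounding, but that says nothing definite about how getEquilAvgs.pl is implemented.

```python
from statistics import mean, stdev

# Column 1 values copied from the sample output: in-run average and in-run SD.
in_run_avgs = [0.2364, 0.0956, 0.1063]
in_run_sds = [0.0276, 0.0520, 0.0593]

# "Avg over files" row: mean of the averages, then mean of the SDs.
avg_over_files = (mean(in_run_avgs), mean(in_run_sds))

# "SDs over Avgs" row: SD of the averages, then SD of the SDs.
sds_over_files = (stdev(in_run_avgs), stdev(in_run_sds))

print("Avg over files  %.4f (%.4f)" % avg_over_files)
print(" SDs over Avgs  %.4f (%.4f)" % sds_over_files)
```

So in each summary row the first number is computed from the per-file averages and the number in parentheses from the per-file SDs.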

One can also get similar numbers for all the sub-directories (cases) in an experiment directory. For example:

~rlr/Scripts/getEquilAvgsFromDirs.pl -c2 -s0.99 -e@end -f"report.*" \
     -dexpdata/experiment-3

Equilibrium data         Wed Feb 18 08:17:24 EST 1998
From last 0.99 of the data.
From directory: expdata/experiment-3
FilePattern:  report.*
Column numbers in files: 2
For each file, average (sd) over points in column
Then print average (sd) over averages across files,
and average (and sd) of StdDev's across files.

Overall:                         avg of  (sd over   avg of (sd over
                               in-RunAvgs  those)   inRunSDs  those)
        Directory                          Column  2   
-----------------------------    -----------------------------------
e=0.99                               0.211 (  0.016)   0.138 ( 0.030) 
e=0.994                              0.198 (  0.015)   0.148 ( 0.130) 
e=0.996                              0.181 (  0.008)   0.150 ( 0.023) 
e=0.997                              0.179 (  0.007)   0.157 ( 0.071) 
e=0.998                              0.163 (  0.010)   0.154 ( 0.035) 
e=0.999                              0.154 (  0.008)   0.156 ( 0.050) 

Note that the "avg of in-RunAvgs" is calculated by taking the mean within each run (each report.* file), and then calculating the average over all those values, along with the standard deviation of those per-run means.

The "avg of inRunSDs" is calculated by first calculating the standard deviation within the part of each run used to calculate the in-RunAvg, and then calculating the average of those (as well as a standard deviation over those values). Thus the inRunSDs measure the "wiggliness" within the runs, and we get an average of those across the runs to see how variable the measure is within runs.

For more information on these scripts, run them with just the -h option.

These perl scripts can all be found with the UM-ExpTools-4 tools for use with Swarm programs. You can look at these scripts:

  getAvgColsOverFiles.pl
  getAvgColsOverDirs.pl
  getEquilAvgs.pl
  getEquilAvgsFromDirs.pl

just by poking them, too!

Plans for the future of these scripts include changing them so that one can specify the plot title, the x and y axis labels, and labels and symbols for the individual lines, as well as adding a direct-to-printer parameter. What I'd really like to see is a standard for the input report files, in the #-comment lines before the data, that defines names for the columns of data, including short names and long names for display on graphs, so the user doesn't have to specify them.


Updated September 1, 2005