This document gives some example analyze programs and explains how they function.
The following program will load in a genome sequence, run it through a test CPU, and output the information about it in a couple of formats.
    VERBOSE
    LOAD_SEQUENCE rmzavcgmciqqptqpqcpctletncogcbeamqdtqcptipqfpgqxutycuastttva
    RECALCULATE
    DETAIL detail_test.dat fitness merit gest_time length viable sequence
    TRACE
    PRINT
This program starts off with the "VERBOSE" command so that avida will print to the screen all of the details about what is going on as it runs the analyze script; I recommend you begin all of your programs this way for debugging purposes. The program then uses the LOAD_SEQUENCE command to allow the user to enter a specific genome sequence in its compressed format. This will translate the genome into the proper genotype as long as you are using the correct instruction set file (since that file determines the mappings of letters to instructions).
The RECALCULATE command places the genome sequence into a test CPU, and determines its fitness, merit, gestation time, etc. so that the DETAIL command that follows it can have access to all of this information as it prints it to the file "detail_test.dat" (its first argument). The TRACE and PRINT commands will then print individual files about this genome, the first tracing its execution line-by-line, and the second summarizing all sorts of statistics about it and displaying the genome. Since no directory was specified for these commands, "genebank/" is assumed, and the filenames are "org-S1.trace" and "org-S1.gen". If a genotype has a name when it is loaded, that name will be kept, but if it doesn't, it will be assigned a name starting at org-S1, then org-S2, and so on counting higher. The TRACE and PRINT commands add their own suffixes to the genome's name to determine the filename they will be printed as.
Often, you will want to run the same section of analyze code with different "inputs" each time through, or else you might simply want a single value to be easy to change throughout the code. To facilitate such programming practices, variables are available in analyze mode that can be altered for each repetition through the code.
There are actually several types of variables, all of which are a single letter or number. For a command that requires a variable name as an input, you simply put that variable where it is requested. For example, if you were going to set the variable i to be equal to the number 12, you would type:
SET i 12
But later on in the code, how does avida know when you type an i if you really want the letter 'i' there, or if you prefer the number 12 to be there? To distinguish these cases, you must put a dollar sign ("$") before a variable wherever you want it to be translated to its value instead of just using the variable name itself.
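As a minimal sketch of the difference, assume a batch of genotypes has already been loaded and recalculated. The filenames here are hypothetical and chosen only for illustration:

```
SET i 12
DETAIL run_$i.dat fitness length    # $i is replaced by its value: run_12.dat
DETAIL run_i.dat fitness length     # no dollar sign, so the letter stays: run_i.dat
```

Going by the rule described above, only the first DETAIL line should pick up the value 12; the second uses the plain letter 'i' as part of the filename.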
There are a few different commands that allow you to manipulate a variable's value, and sometimes execute a section of code multiple times based off of each of the possible values. Here is one example:
    FORRANGE i 100 199
      SET d /home/charles/dev/avida/runs/evo-neut/evo_neut_$i
      PURGE_BATCH
      LOAD_DETAIL_DUMP $d/detail_pop.100000
      RECALCULATE
      DETAIL $d/detail.dat update length fitness sequence
    END
The FORRANGE command runs the contents of the loop once for each possible value in the range, setting the variable i to each of these values in turn. Thus the first time through the loop, 'i' will be equal to the value '100', then '101', '102', all the way up to '199'. In this particular case, we have 100 runs (numbered 100 through 199) that we want to work with.
The first thing we do once we're inside the loop is set the value of the variable 'd' to be the name of the directory we're going to be working with. Since this is a long directory name, we don't want to have to type it over every time we need it. If we set it to the variable d, then all we need to do is type '$d' in the future, and it will be translated to the full name. Note that in this case we are setting a variable to a string instead of a number; that's just fine and avida will figure out how to handle it properly. Note also that the directory we are working with will change each time through the loop, and that it is no problem to use one variable as part of setting another.
After we know what directory we are using, we run a PURGE_BATCH to get rid of all of the genotypes from the last time through the loop (lest we just keep building up more and more genotypes in the current batch) and then we refill the batch by using LOAD_DETAIL_DUMP to load in all of the genotypes saved in the file "detail_pop.100000" within our chosen directory. The RECALCULATE command runs all of the genotypes through a test CPU so we have all the statistics we need, and finally DETAIL will print out the stats we want to the file "detail.dat", again placing it in the proper directory. The END command signifies the end of the FORRANGE loop.
Quite often, the portion of an avida run that we will be most interested in is the lineage from the final dominant genotype back to the original ancestor. As such, there are tools in avida to get at this information.
    FORRANGE i 100 199
      SET d /home/charles/dev/avida/runs/evo-neut/evo_neut_$i
      PURGE_BATCH
      LOAD_DETAIL_DUMP $d/detail_pop.100000
      LOAD_DETAIL_DUMP $d/historic_dump.100000
      FIND_LINEAGE num_cpus
      RECALCULATE
      DETAIL lineage.$i.html depth parent_dist length fitness html.sequence
    END
This program looks very similar to the last one. The first four lines are actually identical, but after loading the detail dump at update 100,000, we also want to load the historic dump from the same time point. A detail file contains all of the genotypes that were currently alive in the population at the time it was printed, while the historic files contain all of the genotypes that are direct ancestors of those that were still alive. The combination of these two files gives us the lineages of the entire population back to the original ancestor. Since we are only interested in a single lineage, the next thing we do is run the FIND_LINEAGE command to pick out a single genotype, and discard everything else except for its lineage. In this case, we pick the genotype with the highest abundance (the most virtual CPUs associated with it) at the time of printing.
As before, the RECALCULATE command gets us any additional information we may need about the genotypes, and then we print that information to a file using the DETAIL command. The filenames that we are using this time have the format "lineage.$i.html", so they are all being written to the current directory with filenames that incorporate the run number right in them. Also, because the filename ends in the suffix '.html', Avida knows to print the file in a proper html format. Note that the specific values that we choose to print take advantage of the fact that we have a lineage (and hence measured things like the genetic distance to the parent) and are in html mode (and thus can print the sequence using colors to specify where exactly mutations occurred).
In analyze mode, we can load genotypes into multiple batches and we then operate on a single batch at a time. So, for example, if we wanted to only consider the dominant genotypes at time points 100 updates apart, but all we had to work with were the detail files (containing all genotypes at each time point) we might write a program like:
    SET d /home/charles/avida/runs/mydir/here-it-is
    SET_BATCH 0
    FORRANGE u 100 100000 100             # Cycle through updates
      PURGE_BATCH                         # Purge current batch (0)
      LOAD_DETAIL_DUMP $d/detail_pop.$u   # Load in the population at this update
      FIND_GENOTYPE num_cpus              # Remove all but most abundant genotype
      DUPLICATE 0 1                       # Duplicate batch 0 into batch 1
    END
    SET_BATCH 1                           # Switch to batch 1
    RECALCULATE                           # Recalculate statistics...
    DETAIL dom.dat fitness sequence       # Print info for all dominants!
This program is slightly more complicated than the others, so I added in comments directly inside it. Basically, what we do here is use batch 0 as our staging area where we load the full detail dumps into, strip them down to only the single most abundant genotype, and then copy that genotype over into batch one. By the time we're done, we have all of the dominant genotypes inside of batch one, so we can print anything we need right from there.
One really useful feature that I have added to the analyze mode is the ability for the user to construct a variety of their own commands without modifying the source code. This is done with the FUNCTION command. For example, if you know you will always need a file called "lineage.html" with very specific information in it, you might write a helper command for yourself as follows:
    FUNCTION MY_HTML_LINEAGE    # arg1=run_directory
      PURGE_BATCH
      LOAD_DETAIL_DUMP $1/detail_pop.100000
      LOAD_DETAIL_DUMP $1/historic_dump.100000
      FIND_LINEAGE num_cpus
      RECALCULATE
      DETAIL $1/lineage.html depth parent_dist length fitness html.sequence
    END
This works identically to how we found lineages and printed their data in the section above. Only this time, it has created a new command called "MY_HTML_LINEAGE" that you can use anytime thereafter. Arguments to functions work similarly to variables, but they are numbers instead of letters. Thus $1 translates to the first argument, $2 becomes the second, and so on. You are limited to 9 arguments at this point, but that should be enough for most tasks. $0 is the name of the function you are running, in case you ever need to use that.
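As a rough sketch of how such a function might then be used, the loop below calls it once per run directory. It assumes a user-defined function is invoked like any built-in command, with its argument following the name, and that variables are expanded inside that argument; the directory is the one from the earlier lineage example:

```
FORRANGE i 100 199
  # The run directory becomes $1 inside MY_HTML_LINEAGE
  MY_HTML_LINEAGE /home/charles/dev/avida/runs/evo-neut/evo_neut_$i
END
```

Under those assumptions, each pass through the loop would write a "lineage.html" file into the corresponding run directory, since the function prints to $1/lineage.html.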
You may be interested in also using functions in conjunction with the SYSTEM command. Anything you type as arguments to this command gets run on the command line, so you can make functions to do anything that could otherwise be done were you at the shell prompt. For example, imagine that you were going to use a lot of compressed files in your analysis that you would first need to uncompress. You might write a function like:
    FUNCTION UNZIP    # Arg1=filename
      SYSTEM gunzip $1
    END
This is a shorter example than you might typically want to write a function for, but it does get the point across. This would allow you to just type "UNZIP" followed by the name of the file to uncompress.
Functions are particularly useful in conjunction with the INCLUDE command. You can create a file called something like "my_functions.cfg" in your avida work directory, define a bunch of functions there, and then start all of your analyze.cfg files with the line:
    INCLUDE my_functions.cfg
and you will have access to all of your functions thereafter. Ideally, as this language becomes more flexible, so will your ability to create functions within the language, so you will be able to develop flexible and useful libraries for yourself.
Try it Out...
Here are a couple of example problems you can try to see how well you can use analyze mode. These should get you used to working with it for future projects.
Problem 1. A detail file in avida contains one line associated with each genotype, in order from the most abundant to the least. Currently, the LOAD_DETAIL_DUMP command will load the entire file's worth of genotypes into the current batch, but what if you only wanted the top few? You should write a function called "LOAD_DETAIL_TOP" that takes two arguments. The first ($1) is the name of the file that needs to be loaded in (just as in the original command), and the second is the number of lines you want to load.
The easiest way to go about doing this is by using the SYSTEM command along with the Unix command "head", which will output the very top of a file. If you typed the line:
    head -42 detail_pop.1000 > my_temp_file
the file "my_temp_file" would be created, and its contents would be the first 42 lines of detail_pop.1000. So, what you need this function to do is create a temporary file with the proper number of lines from the detail file in it, load that temp file into the current batch, and then delete the file (using the rm command).
Warning: be very careful with the automated deletions -- you don't want to accidentally remove something that you really need! I recommend that you use the command "rm -i" until you finish debugging. This problem may end up being a little tricky for you, but you should be able to work your way through it.
Problem 2. Now that you have a working LOAD_DETAIL_TOP command, you
can run "LOAD_DETAIL_TOP