Configuring the arbtt categorizer

Once arbtt-capture is running, it records data without any further configuration. The configuration is only needed to analyse the recorded data. Thus, if you improve your categorization later, the improvement applies even to previously recorded samples!

The configuration file needs to be placed in ~/.arbtt/categorize.cfg. An example file is included in the source distribution and reproduced here as Example 1, “A complete categorize.cfg”, which should be more enlightening than the rather formal description that follows it.

Example 1. A complete categorize.cfg

-- This defines some aliases, to make the reports look nicer:
aliases (
	"sun-awt-X11-XFramePeer"  -> "java",
	"sun-awt-X11-XDialogPeer" -> "java",
	"sun-awt-X11-XWindowPeer" -> "java",
	"gramps.py"               -> "gramps"
	)

-- A rule that probably everybody wants. Being inactive for over a minute
-- causes this sample to be ignored by default.
$idle > 60 ==> tag inactive,

-- Simple rule that just tags the current program
tag Program:$current.program,

-- I'd like to know which evolution folders I'm working in. But when sending a mail,
-- the window title only contains the (not very helpful) subject. So I do not
-- necessarily tag by the active window's title, but by the title that contains the folder.
current window $program == "evolution" &&
any window ($program == "evolution" && $title =~ /^(.*) \([0-9]+/)
  ==> tag Evo-Folder:$1,

-- A general rule that works well with gvim and gnome-terminal and tells me what
-- project I'm currently working on
current window $title =~ m!(?:~|home/jojo)/projekte/(?:programming/(?:haskell/)?)?([^/)]*)!
  ==> tag Project:$1,
current window $title =~ m!(?:~|home/jojo)/debian!
  ==> tag Project:Debian,

-- My diploma thesis is in a different directory
current window $title =~ m!(?:~|home/jojo)/dokumente/Uni/DA!
  ==> tag Project:DA,
current window $title =~ m!Diplomarbeit.pdf!
  ==> tag Project:DA,
current window $title =~ m!LoopSubgroupPaper.pdf!
  ==> tag Project:DA,

-- Out of curiosity: what percentage of my time am I actually coding Haskell?
current window ($program == "gvim" && $title =~ /^[^ ]+\.hs \(/ )
  ==> tag Editing-Haskell,

-- To be able to match on the time of day, I introduce tags for that as well
$time >=  2:00 && $time <  8:00 ==> tag time-of-day:night,
$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning,
$time >= 12:00 && $time < 14:00 ==> tag time-of-day:lunchtime,
$time >= 14:00 && $time < 18:00 ==> tag time-of-day:afternoon,
$time >= 18:00 && $time < 22:00 ==> tag time-of-day:evening,
$time >= 22:00 || $time <  2:00 ==> tag time-of-day:late-evening,

-- This tag always refers to the last 24h
$sampleage <= 24:00 ==> tag last-day,

The syntax

The file categorize.cfg is a plain text file. Whitespace is insignificant and Haskell-style comments are allowed. A formal grammar is provided in Figure 1, “The formal grammar of categorize.cfg”.

Figure 1. The formal grammar of categorize.cfg

[1]  Rules             ::= [ AliasSpec ] Rule ( ( "," Rule )* | ( ";" Rule )* )
[2]  AliasSpec         ::= "aliases" "(" Alias ( "," Alias )* ")"
[3]  Alias             ::= Literal "->" Literal
[4]  Rule              ::= "{" Rules "}"
                         | Cond "==>" Rule
                         | "if" Cond "then" Rule "else" Rule
                         | "tag" Tag
[5]  Cond              ::= "(" Cond ")"
                         | "!" Cond
                         | Cond "&&" Cond
                         | Cond "||" Cond
                         | "$" StringVariable "==" String
                         | "$" StringVariable "/=" String
                         | "$" StringVariable "=~" RegEx
                         | "$" NumericVariable CmpOp Number
                         | "$" TimeVariable CmpOp TimeSpecification
                         | "$" BooleanVariable
                         | "current" "window" Cond
                         | "any" "window" Cond
[6]  Tag               ::= [ Literal ":" ] Literal
[7]  RegEx             ::= "/" Literal "/" | "m" c Literal c    /* where c can be any character */
[8]  CmpOp             ::= "<=" | "<" | "==" | ">" | ">="
[9]  StringVariable    ::= "title" | "program"
[10] NumericVariable   ::= "idle"
[11] BooleanVariable   ::= "active"
[12] TimeVariable      ::= "time" | "sampleage"
[13] TimeSpecification ::= [ Digit ] Digit ":" Digit Digit

Terminal symbols are shown in double quotes; [ x ] marks an optional part and ( x )* zero or more repetitions.
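
As a small illustration of productions [1], [2] and [4], the following hypothetical fragment (program names and tags are made up) starts with an alias block, separates its top-level rules with semicolons instead of commas, and groups two rules in braces under one condition. The if condition repeats current window because, as described under The semantics, a window is only in scope while a current window or any window condition is being evaluated, so a bare $title in the rule body would not refer to the gvim window.

```
-- Hypothetical fragment: program names and tags are made up.
aliases (
	"gnome-terminal" -> "terminal"
	)
current window $program == "gvim" ==> {
	tag Editor:gvim;
	if current window $title =~ /\.hs/
	  then tag Language:Haskell
	  else tag Language:other
};
$idle > 60 ==> tag inactive
```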

A String refers to a double-quoted string of characters, while a Literal is not quoted. Tags may only consist of letters, dashes and underscores, or variable interpolations. A Tag may optionally be prefixed with a category, separated by a colon. The category itself follows the same lexical rules as the tag. A variable interpolation can be one of the following:

$1, $2,...
will be replaced by the respective group in the last successfully applied regular expression in the conditions enclosing the current rule.
$current.title, $current.program
will be replaced by the title of the currently active window or by the name of the currently active program, respectively. If no window happens to be active, this tag will be ignored.
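
For instance, a rule along the following lines uses two groups of one regular expression and assigns a tag for each. This is a hypothetical sketch: the title format "user@host: directory" is an assumption about the terminal in use, and the User and Host categories are made up. The groups only match letters, so the resulting tags stay within the lexical rules above.

```
-- Hypothetical: a terminal title such as "jojo@laptop: ~/src" yields the
-- tags User:jojo and Host:laptop.
current window $title =~ /^([a-z]+)@([a-z]+):/ ==> {
	tag User:$1,
	tag Host:$2
},
```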

A regular expression is, as in Perl, either enclosed in forward slashes or, alternatively, in any character of your choice with an m (for match) in front. The latter form is handy for regular expressions that match directory names, since these contain forward slashes themselves. Apart from the delimiters, the syntax of the regular expressions is that of Perl-compatible regular expressions.
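
The two delimiter styles side by side, as a hypothetical sketch (the tags Reading-PDF and Language:Haskell and the directory layout are made up): a pattern without slashes can use the plain form, while a path-like pattern reads more easily in the m form.

```
-- The pattern contains no "/", so slashes work fine as delimiters:
current window $title =~ /\.pdf/ ==> tag Reading-PDF,
-- The pattern contains "/", so the m form with "!" as delimiter is used:
current window $title =~ m!/projekte/haskell/! ==> tag Language:Haskell,
```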

The semantics

A data sample consists of the time of recording, the time passed since the user’s last action and the list of windows. Per window, the following information is available:

  • the window title
  • the program name
  • whether the window was the active window

Based on this information, the categorizer assigns tags to each sample according to the rules laid out in categorize.cfg.

The keyword tag, usually wrapped in a condition, assigns the tag to the sample. If the tag also contains a category, it is only assigned if no other tag of that category is present. This means that for each sample and each category, there can be at most one tag of that category. Tags can contain references to groups matched by regular expressions in the enclosing conditions.
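
Because a categorized tag is only assigned when no tag of its category is present, a specific rule followed by a catch-all rule gives a simple fallback. The sketch below assumes that rules are applied in the order in which they appear in the file; the Activity category and its tags are hypothetical.

```
-- Hypothetical: assuming rules are applied in file order, a sample whose
-- active window is gvim gets Activity:editing; every other sample gets
-- Activity:other, since only one tag per category can be present.
current window $program == "gvim" ==> tag Activity:editing,
tag Activity:other,
```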

The variable $idle contains the idle time of the user, measured in seconds. Usually, it is used to assign the tag inactive, as in Example 1, “A complete categorize.cfg”; this tag is handled specially by arbtt-stats.
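
Since $idle counts seconds, the threshold used in Example 1 is easy to adjust. A hypothetical variant for someone who often reads for several minutes without any user action:

```
-- Hypothetical variant of the rule in Example 1: since $idle is measured
-- in seconds, this treats five minutes (300 s) without action as inactive.
$idle > 300 ==> tag inactive,
```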

While applying the conditions and rules, the categorizer keeps track of a window in scope, and the variables $title, $program and $active always refer to that window. Initially, no window is in scope; this changes only while evaluating a condition wrapped in current window or any window.

For current window, the currently active window is in scope. If there is no such window, the condition is false.

For any window, the condition is applied to each window in turn, and the result is true if any of the windows matches. If more than one window matches, it is undefined which match the variables $1, $2, ... are taken from.
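
The difference between the two scopes, sketched with hypothetical tags: the first rule looks only at the active window, the second at every window, and the third uses $active inside any window to find a window that is open but not active.

```
-- True only if the active window is a terminal:
current window $program == "gnome-terminal" ==> tag Terminal-focused,
-- True if any terminal window exists, whether active or not:
any window $program == "gnome-terminal" ==> tag Terminal-open,
-- True if evolution has a window that is not the active one:
any window ($program == "evolution" && ! $active) ==> tag Mail-in-background,
```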

The variable $time refers to the time of day of the sample (i.e. the time since 0:00 that day), while $sampleage refers to the time span from when the sample was recorded until now, i.e. the time at which the statistics are evaluated. The latter variable is especially useful in conditions passed to the --filter option of arbtt-stats. Both can be compared with expressions of the form "hh:mm", for example

$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning
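
Two further hypothetical rules along the same lines (the tags work-hours and last-two-days are made up): a working-hours range using a single-digit hour, and a $sampleage span longer than a day. The second assumes that, as the 24:00 in Example 1 suggests, the hour field of a time specification is read as a plain number of hours for $sampleage; the grammar allows at most two hour digits. The same kind of condition, e.g. $sampleage <= 48:00, is presumably what one would hand to the --filter option of arbtt-stats instead of defining a tag.

```
-- Hypothetical: tag samples recorded between 9:00 and 17:00.
$time >= 9:00 && $time < 17:00 ==> tag work-hours,
-- Hypothetical: tag samples recorded within the last 48 hours.
$sampleage <= 48:00 ==> tag last-two-days,
```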