This manual is for liblouisxml (version 1.9.1, 18 March 2009), an xml to Braille Translation Library.
This file may contain code borrowed from the Linux screenreader BRLTTY, Copyright © 1999-2009 by the BRLTTY Team.
Copyright © 2004-2009 ViewPlus Technologies, Inc. www.viewplus.com and Copyright © 2006,2009 Abilitiessoft, Inc. www.abilitiessoft.com.
This file is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser (or library) General Public License (LGPL) as published by the Free Software Foundation; either version 3, or (at your option) any later version. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser (or Library) General Public License (LGPL) for more details.
You should have received a copy of the GNU Lesser (or Library) General Public License (LGPL) along with this program; see the file COPYING. If not, write to the Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
liblouisxml is a software component which can be incorporated into software packages to provide the capability of translating any file in the computer lingua franca xml format into properly transcribed braille. This includes translation into grade two, if desired, mathematical codes, etc. It also includes formatting according to a built-in style sheet which can be modified by the user. The first program into which liblouisxml has been incorporated is xml2brl. This program will translate an xml or text file into an embosser-ready braille file. It is not necessary to know xml, because MSWord and other word processors can export files in this format. If the word processor has been used correctly, xml2brl will produce an excellent braille file.
There is a Mac GUI application incorporating liblouisxml called louis. For a link to it go to www.abilitiessoft.com/downloads. A similar Windows application is in the works.
Users who want to generate Braille using xml2brl will be interested in "Transcribing with the xml2brl program". Those who wish to change the output generated by liblouisxml should read "Customization Configuring liblouisxml". If you encounter a type of xml file with which liblouisxml is not familiar, you can learn how to tell it how to process that file by reading "Connecting with the xml Document". If you wish to implement a new braille mathematics code, read "Implementing Braille Mathematics Codes". Finally, computer programmers who wish to use liblouisxml in their software can find the information they need in "Programming with liblouisxml".
You will also find it advantageous to be acquainted with the companion library liblouis, which is a braille translator and back-translator (see Overview).
At the moment, actual transcription with liblouisxml is done with the command-line (or console) program xml2brl. The line to type is:
xml2brl [OPTIONS] [-f config-file] [infile] [outfile]
The brackets indicate that something is optional. You will see that nothing is required except the program name itself, xml2brl. The various optional parts control how the program will behave, as follows:
backFormat in the outputFormat section of the configuration file. Html files will contain page numbers and emphasis. To get good html, the liblouis table must have the entry ‘space \e 1b’ so that it will pass through escape characters. The html.sem file must also contain the line ‘pagenum pagenum’. Text output files simply have a blank line between paragraphs. Encoding of text files is controlled by the outputEncoding setting. Html files are always in UTF-8.
xml2brl is set up so that it can be used in a "pipe". To do this, omit both infile and outfile. Input is then taken from the standard input unit.
The first file name encountered (a word not preceded by a minus sign) is taken to be the input file and the second to be the output file. If you wish input to be taken from stdin and still want to specify an output file use two minus signs (‘--’) for the input file.
If only the program name is typed xml2brl assumes that the configuration file is default.cfg, input is from the standard input unit, and output is to the standard output unit.
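The file-name rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not the code xml2brl actually uses. The function name resolve_files is hypothetical, and the sketch assumes that options other than -f take no separate argument.

```python
def resolve_files(args):
    """Apply the documented rules to a list of command-line words.

    Returns (config, infile, outfile). None stands for the standard
    input or standard output unit. Hypothetical helper, for illustration.
    """
    config = "default.cfg"          # used unless -f names another file
    files = []
    words = iter(args)
    for word in words:
        if word == "-f":
            config = next(words, None)
            if config is None:
                raise ValueError("-f needs a configuration file name")
        elif word == "--":
            files.append(None)      # placeholder: take input from stdin
        elif word.startswith("-"):
            continue                # other options are outside this sketch
        else:
            files.append(word)      # first is input, second is output
    infile = files[0] if files else None
    outfile = files[1] if len(files) > 1 else None
    return config, infile, outfile
```

With no arguments the sketch yields default.cfg, standard input and standard output, which matches the behaviour described for typing the program name alone.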
msword2brl infile outfile
Infile must be a Microsoft Word file. The script first calls the antiword program, so you must have this installed on your machine. antiword is called with -x db, which causes the output to be in docbook format. This is piped to xml2brl. The output file from xml2brl contains much of the formatting, including emphasis, of the word file.
The operation of liblouisxml is controlled by two types of files: semantic-action files and configuration files. The former are discussed in the section Connecting with the xml Document - Semantic-Action Files. The latter are discussed in this section. A third type of file, braille translation tables, is discussed in the liblouis documentation (see Overview). Another section of the present document which may be of interest is Implementing Braille Mathematics Codes.
liblouisxml (with liblouis) can be used as the braille transcription component in any number of applications with different overall purposes and user interfaces. However, as of now the principal application is xml2brl, which is a console application for Mac and Linux. (There is also a Mac GUI application called louis.) The information below therefore applies to xml2brl as much as to liblouisxml.
Before discussing configuration files in detail it is worth noting that the application program has access to the information in the configuration files by calling the liblouisxml function lbx_initialize. This function returns a pointer to a data structure containing the configuration information.
xml2brl uses the configuration file default.cfg unless a different one is specified via the -f command-line option. The configuration file name may include a full path. In this case, liblouisxml will consider this to be the user path (see Files and Paths). If just a file name (or list) is given, liblouisxml will consider the current directory as the user path.
The configuration "file" specified with the -f option need not be a single filename. It can be several file names separated by commas. Only the first filename may have a path component. This path is taken as the user path, as discussed in the previous paragraph. This file-list feature is also found in liblouis. It enables you to combine configuration files on the command line. For example, a file list may consist of one file specifying the output format used in your establishment, a comma, and then the name of a stylesheet.
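As a rough illustration of the file-list rule, the following Python sketch splits a -f value into a user path and a list of file names. It is an assumption-laden reading of the two paragraphs above, not liblouisxml code. The function name split_config_list and the file names in the usage note are hypothetical, and rejecting a path on a later file with an error is this sketch's choice.

```python
import os


def split_config_list(spec):
    """Split a comma-separated -f value into (user_path, file_names).

    Only the first name may carry a path component. If it has none,
    the current directory is taken as the user path.
    """
    names = spec.split(",")
    user_path, first = os.path.split(names[0])
    for name in names[1:]:
        if os.path.dirname(name):
            # Assumption: this sketch treats a path on a later file as an error.
            raise ValueError("only the first file name may have a path: " + name)
    return (user_path or os.curdir), [first] + names[1:]
```

For example, a hypothetical list such as /home/me/cfg/house.cfg,styles.cfg would give the user path /home/me/cfg and the two file names house.cfg and styles.cfg.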
After the path, if any, has been evaluated, but before reading any of the files, liblouisxml reads in a file called canonical.cfg. This file specifies values for all possible settings. It is needed to complete the initialization of the program. You may alter the values in the distribution canonical.cfg, but you should not delete any settings. Do not specify canonical.cfg as your configuration file. This will lead to error messages and program termination. If a configuration file read in later contains a particular setting name, the value specified simply replaces the one specified in canonical.cfg.
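The replacement rule can be pictured as a simple overlay. The Python sketch below is a simplification: it treats settings as one flat mapping rather than grouping them by section, and the function name effective_settings is hypothetical. Names are lower-cased on the assumption, stated later in this section, that the program ignores letter case in setting names.

```python
def effective_settings(canonical, *later_files):
    """Overlay settings from later configuration files on canonical.cfg.

    Each argument is a mapping of setting name to value. A value read
    later simply replaces the one read earlier.
    """
    merged = {name.lower(): value for name, value in canonical.items()}
    for settings in later_files:
        for name, value in settings.items():
            merged[name.lower()] = value   # later value wins
    return merged
```

Settings that no later file mentions keep the value given in canonical.cfg, which is why that file must define every setting.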
As you will see by looking at canonical.cfg, it contains four main sections, outputFormat, translation, xml and styles. In addition, a configuration file can contain an include entry. This causes the file named on that line to be read in at the point where the line occurs. The sections need not follow each other in any particular order, nor is the order of settings within each section important. In this document and in the canonical.cfg file, where section and setting names consist of more than one word, the first letter of each word following the initial one is capitalized. This is merely for readability. The case of the letters in these names is ignored by the program. Section and setting names may not contain spaces.
Here, then, is an explanation of each section and setting in the canonical.cfg file. When you look at this file you will see that the section names start at the left margin, while the settings are indented one tab stop. This is done for readability. It has no effect on the meaning of the lines. You will also see lines beginning with a number sign (‘#’), which are comments. Blank lines can also be used anywhere in a configuration file. In general, a section name is a single word or combination of unspaced words. However, each style has a section of its own, so the word ‘style’ is followed by the name of the style. Setting lines begin with the name of the setting, followed by at least one space or tab, followed by the value of the setting. A few settings have two values.
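The line syntax just described is simple enough to sketch as a small parser. The Python below is an illustration only and is not how liblouisxml reads its files. It assumes that a line holding a single word is a section name, that a line starting with the word style opens a style section, and that every other line is a setting. The function name parse_config is hypothetical, and include entries are not handled.

```python
def parse_config(text):
    """Parse configuration text into {section: {setting: value}}.

    Names are lower-cased because the program ignores their case.
    Indentation is ignored, since it has no effect on meaning.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        parts = line.split(None, 1)       # name, then the rest of the line
        name = parts[0].lower()
        if len(parts) == 1:
            current = sections.setdefault(name, {})
        elif name == "style":
            current = sections.setdefault("style " + parts[1].lower(), {})
        elif current is None:
            raise ValueError("setting outside any section: " + line)
        else:
            current[name] = parts[1]      # two-value settings stay one string
    return sections
```

A setting with two values, such as the entity line in the xml section, is kept as a single string here. Splitting it further is left to whatever code uses the setting.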
This section specifies the format of the output file (or string, if no file name is given).
cellsPerLine 40
LinesPerPage 25
interpoint no
lineEnd \r\n
pageEnd \f
fileEnd ^z
printPages yes
braillePages yes
paragraphs yes
BeginingPageNumber 1
printPageNumberAt top
braillePageNumberAt bottom
hyphenate no
outputEncoding ascii8
inputTextEncoding ascii8
formatFor textDevice
backFormat plain
backLineLength 70
interline no
lineFill '
This section specifies the liblouis translation tables to be used for various purposes.
literaryTextTable en-us-g2.ctb
uncontractedTable en-us-g1.ctb
compbrailleTable en-us-compbrl.ctb
mathtextTable en-us-mathtext.ctb
MathexpTable nemeth.ctb
editTable nemeth_edit.ctb
interlineBackTable en-us-interline.ctb
This section provides various information for the processing of xml files.
semanticFiles *,nemeth.sem
xmlheader <?xml version='1.0' encoding='UTF8' standalone='yes'?>
entity nbsp ^1
internetAccess yes
newEntries yes
The following sections all deal with styles. Each style has its own section. Style section names are unlike other section names in that they consist of the word style, followed by a space, followed by a style name. More styles may be added as the software develops, and some may be dropped. New styles currently cannot be defined by the user, because the styles already defined appear to be adequate. This feature can be added if needed. There are, however, five utility styles, style1 through style5, which the user can employ in any way.
This section specifies the style of the whole document. The settings given in it are applied to all other styles. If a section for another style is given, the settings in it replace those from the document style for that section. Because the settings in the document style apply to all other styles, if a document style section is given it must precede the sections for all other styles. Since canonical.cfg contains a document style definition, the user may not use this style.
linesBefore 0
This setting gives the number of blank lines which should be left before the text to which this style applies. It is set to a non-zero value for some header styles.
linesAfter 0
The number of blank lines which should be left after the text to which this style applies.
leftMargin 0
The number of cells by which the left margin of all lines in the text should be indented. Used for hanging indents, among other things.
firstLineIndent 0
The number of cells by which the first line is to be indented relative to leftMargin. firstLineIndent may be negative. If the result is less than 0 it will be set to 0.
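The interaction of leftMargin and firstLineIndent can be written as one line of arithmetic. The Python sketch below assumes that the "result" mentioned above is the sum of the two settings. The function name first_line_start is hypothetical.

```python
def first_line_start(left_margin, first_line_indent):
    """Return the cell at which the first line of a block starts.

    firstLineIndent is relative to leftMargin and may be negative.
    A negative result is set to 0.
    """
    return max(0, left_margin + first_line_indent)
```

With leftMargin 2 and firstLineIndent -2, the first line starts at cell 0 and the following lines at cell 2, which is a hanging indent.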
translate contracted
This setting is currently inactive. It may be used in the future. This setting tells how text in this style should be translated. Possible values are ‘contracted’, ‘uncontracted’, ‘compbrl’, ‘mathtext’ and ‘mathexpr’.
skipNumberLines no
If this setting is ‘yes’ the top and bottom lines on the page will be skipped if they contain braille or print page numbers. This is useful in some of the mathematical and graphical styles.
format leftJustified
The format setting controls how the text in the style will be formatted. Valid values are ‘leftJustified’, ‘rightJustified’, ‘centered’, ‘computerCoded’, ‘alignColumnsLeft’, ‘alignColumnsRight’, ‘listColumns’, ‘listLines’ and ‘contents’. The first three are self-explanatory. ‘computerCoded’ is used for computer programs and similar material. The next three are used for tabular material. ‘alignColumnsLeft’ causes the left ends of columns to be aligned. ‘alignColumnsRight’ causes the right ends of columns to be aligned. ‘listColumns’ causes columns to be placed one after the other, separated by whatever separation character has been specified in the semantic-action file, followed by a space. An escape character (hex 1b) must also be specified to indicate the end of the column. Two escape characters must be specified to indicate the end of a row. Indentation of the lines in a row is controlled by the leftMargin and firstLineIndent settings. ‘listLines’ is similar except that it lists lines, as in poetry stanzas. The semantic-action file must specify two escape characters to indicate the end of a line. ‘contents’ is used only in styles specifically intended for tables of contents.
newPageBefore no
If this setting is ‘yes’, the text will begin on a new page. This is useful for certain mathematical and graphical styles. Page numbers are handled properly.
newPageAfter no
If this setting is ‘yes’ any remaining space on the page after the material covered by this style is handled is left blank, except for page numbers.
rightHandPage no
If this setting is ‘yes’ and interpoint is ‘yes’, the material covered by this style will start on a right-hand page. This may cause a left-hand page to be left blank except for page numbers. If interpoint is ‘no’, this setting is equivalent to newPageBefore.
This style is used for arithmetic examples in elementary math books. On recognizing this style, the translator formats the material in a special way. This style has no settings different from those of the document style at the moment. Nevertheless, the line ‘style arith’ must be included in canonical.cfg so that it will be set up properly.
This style is used for an attribution following a quotation.
format rightJustified
This style is used for bibliographies. Settings will be added later.
This style is used for picture captions.
This style is used for computer programs.
This style is used to specify where the contents should be placed and the title that should be given to it.
linesBefore 1
linesAfter 1
format centered
This style and the other contents styles are used for the table of contents and correspond to the four heading levels.
firstLineIndent -2
leftMargin 2
format contents
firstLineIndent -2
leftMargin 4
format contents
firstLineIndent -2
leftMargin 6
format contents
firstLineIndent -2
leftMargin 8
format contents
This style is for the dedication of a book.
This is for giving directions for exercises.
This is for showing mathematics that is set off from the text.
leftMargin 2
This is for text that is set off from the rest of the text.
This is the first level in a set of exercises where there are sublevels.
This is for the second level of exercises, such as exercise a following exercise 1.
This is for the third level of exercises.
firstLineIndent 2
Section: style graph
This style reserves space for a graph or other tactile material.
skipNumberLines yes
This style reserves space for the label of a graph.
This style is used for main headings, such as chapter titles.
The first level of subheadings after the main heading.
firstLineIndent 4
The fourth and final level of headings.
firstLineIndent 4
This style is used for indexes. The extra ‘x’ is not an error. It is there to prevent conflict with names elsewhere in the software.
This is for the individual items in a list.
This style causes its contents to be formatted in a way suitable for the representation of matrices.
format alignColumnsLeft
This style is used for braille music.
skipNumberLines yes
This style is used for footnotes.
Paragraph. This is ordinary body text.
firstLineIndent 2
This style is used for quotations that are set off from the rest of the text.
This style is used for a section with a section number.
firstLineIndent 4
This style is used for mathematical material that is arranged spatially, such as large fractions.
This style is used for stanzas in poetry.
This and the subsequent numbered styles can be used by the user for any purpose.
This style is used for subsections with a subsection number.
firstLineIndent 4
This style is used for ordinary tables.
This style is used to begin a title page.
newPageAfter yes
This style is used for transcriber's notes which are set off from the text.
This style is used to indicate the beginning of a braille volume.
When liblouisxml (or xml2brl) processes an xml document, it needs to be told how to use the information in that document to produce a properly translated and formatted braille document. These instructions are provided by a semantic-action file, so called because it explains the meaning, or semantics, of the various specifications in the xml document. To understand how this works, it is necessary to have a basic knowledge of the organization of an xml document.
An xml document is organized like a book, but with much finer detail. First there is the title of the whole book. Then there are various sections, such as author, copyright, table of contents, dedication, acknowledgments, preface, various chapters, bibliography, index, and so on. Each chapter may be divided into sections, and these in turn can be divided into subsections, subsubsections, etc. In a book the parts have names or titles distinguished by capitalization, type fonts, spacing, and so forth. In an xml document the names of the parts are enclosed in angle brackets (‘<>’). For example, if liblouisxml encounters <html> at the beginning of a document, it knows it is dealing with a document that conforms to the standards of the extensible hypertext markup language (xhtml) - at least we hope it does. When you see a book, you know it's a book. The computer can know only by being told. Something enclosed in angle brackets is called an "element" (more properly, a "tag") in xml parlance. (There may be more between the angle brackets than just the name of the element. More of this later). The first "element" in a document thus tells liblouisxml what kind of document it is dealing with. This element is called the "root element" because the document is visualized as branching out from it like a tree. Some examples of root elements are <html>, <math>, <book>, <dtbook3> and <wordDocument>. Whenever liblouisxml encounters a root element that it doesn't know about it creates a new file called a semantic-action file. The name of this file is formed by stripping the angle brackets from the root element and adding a period plus the letters ‘sem’. If you look in a directory containing semantic-action files you will see names like html.sem, dtbook3.sem, math.sem, and so on.
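The naming rule for semantic-action files can be sketched as follows. This Python fragment only mirrors the rule stated above and is not taken from liblouisxml. The function name semantic_file_name is hypothetical, and dropping anything after the element name (such as attributes) is an assumption of the sketch.

```python
def semantic_file_name(root_tag):
    """Form a semantic-action file name from a root element tag.

    The angle brackets are stripped and '.sem' is appended.
    """
    inner = root_tag.strip().lstrip("<").rstrip(">")
    name = inner.split()[0]        # assumption: ignore any attributes
    return name + ".sem"
```

Given the root element <dtbook3>, the sketch produces dtbook3.sem, in line with the file names listed above.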
Sometimes it is advantageous to preempt the creation of a semantic-action file for a new root element. For example, an article written according to the docbook specification may have the root element <article>. However, the specification itself has the root element <book>. In this case you can specify the book.sem file in the configuration file by writing, in the xml section:
semanticFiles book.sem
You will note that this setting uses the plural of "file". This is because you can actually specify a list of file names separated by commas. You might want to do this to specify the semantic-action file for the particular braille mathematical code to be used. For example:
semanticFiles book.sem,ukmath.sem
As you will see in the next section, different braille style conventions and different braille mathematical codes may require different semantic-action files.
liblouisxml records the names of all elements found in the document in the semantic-action file. The document has a multitude of elements, which can be thought of as describing the headings of various parts of the document. One element is used to denote a chapter heading. Another is used to denote a paragraph, still another to denote text in bold type, and so on. In other words, the elements take the place of the capitalization, changes in type font, spacing, etc. in a book. However, the computer still does not know what to do when it encounters an element. The semantic-action file tells it that.
Consider html.sem. A copy is included as part of this documentation with the name example_sem. It may differ from the file that liblouisxml is currently using. You will see that it begins with some lines about copyrights. Each line begins with a number sign (‘#’). This indicates that it is a "comment", intended for the human reader and the computer should ignore it. Then there is a blank line. Finally, there are two other comments explaining that the file must be edited to get proper output. This is because a human being must tell the computer what to do with each element. The semantic files for common types of documents have already been edited, so you generally don't have to worry about this. But if you encounter a new type of document or wish to specify special handling for styles or mathematics you may have to edit the semantic-action file or send it to the maintainer for editing. In any case the rest of this section is essential for understanding how liblouisxml handles documents and for making changes if the way it does so is not correct.
After another blank line you will see a table consisting of two, and sometimes three, columns. The first column contains a word which tells the computer to do something. For example, the first entry in the table is: ‘include nemeth.sem’. This tells liblouisxml to include the information in the nemeth.sem file when it is deciphering an html (actually xhtml) document (it may be preferable to use the semanticFiles setting in the configuration file rather than an include).
The second row of the table is:
no hr
‘hr’ is an element with the angle brackets removed. It means nothing in itself. However, the first column contains the word ‘no’. This tells liblouisxml "no do", that is, do nothing.
After a few more lines with ‘no’ in the first column, we see one that says:
softreturn br
This means that when the element <br> is encountered, liblouisxml is to do a soft return, that is, start a new line without starting a new paragraph.
The next line says:
heading1 h1
This tells liblouisxml that when it encounters the element <h1> it is to format the text which follows as a first-level braille heading, that is, the text will be centered and preceded and followed by blank lines. (You can change this by changing the definition of the heading1 style).
The next line says:
italicx em
This tells liblouisxml that when it encounters the element <em> it is to enclose the text which follows in braille italic indicators. The ‘x’ at the end of the semantic action name is there to prevent conflicts with names elsewhere in the software. Just where the italic indicators will be placed is controlled by the liblouis translation table in use.
The next line says:
skip style
This tells liblouisxml to simply skip ahead until it encounters the element </style>. Nothing in between will have any effect on the braille output. Note the slash (‘/’) before the ‘style’. This means the end of whatever the <style> element was referring to. Actually, it was referring to specifications of how things should be printed. If liblouisxml had not been told to skip these specifications, the braille output would have contained a lot of gobbledygook.
The next line says:
italicx strong
This tells liblouisxml to also use the italic braille indicators for the text between the <strong> and </strong> elements.
After a few more lines with ‘no’ in the first column we come to the line:
document html
This tells liblouisxml that everything between <html> and </html> is an entire document. <html> was the root element of this document, so this is logical.
After another ‘no’ line we come to:
para p
liblouisxml will consider everything between <p> and </p> to be a normal body text paragraph.
The next line is:
heading1 title
This causes the title of the document to also be treated as a braille level 1 heading.
Next we have the line:
list li
The xhtml <li> and </li> pair of elements is used to enclose an item in a list. liblouisxml will format this with its own list style. That is, the first line will begin at the left margin and subsequent lines will be indented two cells.
Next we have:
table table
You will note that the names of actions and elements are often identical. This is because they are both mnemonic. In any case, this line tells liblouisxml to format the table contained in the xhtml document according to the table formatting rules it has been given for braille output.
Next we have the line:
heading2 h2
This means that the text between <h2> and </h2> is to be formatted according to the liblouisxml style heading2. A blank line will be left before the heading and the first line will be indented four spaces.
After a few more lines we come to:
no table,cellpadding
Note the comma in the second column. This divides the column into two subcolumns. The first is the table element name. The second is called an "attribute" in xml. It gives further instructions about the material enclosed between the starting and ending "tags" of the element (<table> and </table>). Full information requires three subcolumns. The third is called the value and gives the actual information. The attribute is merely the name of the information.
Much further down we find:
no table,border,0
Here the element is table, the attribute is border and the value is 0. If liblouisxml were to interpret this, it would mean that the table was to have a border of 0 width. It is not told to do so because tables in braille do not have borders.
Now let's look at the file which is included at the beginning of the html.sem file. This is nemeth.sem. As with html.sem, a copy is included in the documentation directory with the name example_nemeth.sem, but it is not necessarily the one that liblouisxml is currently using. It illustrates several more things about how liblouisxml uses semantic-action files.
The first thing you will notice is that for quite a few lines the first and second columns are identical. This is because the MathML element and attribute names are part of a standard, and it was simplest to use the element names for the semantic actions as well.
The first line of real interest is:
math math
Every mathematical expression begins with the element <math> (which may have attributes and values), and ends with </math>. This is therefore the root element of a mathematical expression. However, mathematical expressions are usually part of a document, so it is not given the semantic action document. The math semantic action causes liblouisxml to carry out special interpretation actions. These will become clearer as we continue to look at the nemeth.sem file. You will note that this line has three columns. The meaning of the third column is discussed below.
After another uninteresting line we come to two that illustrate several more facts about semantic-action files:
mfrac mfrac ^?,/,^#
mfrac mfrac,linethickness,0 ^(,^;%,^)
Like the math entry above, the first line has three columns. While the first two columns must always be present, the third column is optional. Here, it is also divided into subcolumns by commas. The element <mfrac> indicates a fraction. A fraction has two parts, a numerator and a denominator. In xml, we call these parts children of <mfrac>. They may be represented in various ways, which need not concern us here. What is of real importance is that the third column tells liblouisxml to put the characters ‘~?’ before the numerator, ‘/’ between the numerator and denominator, and ‘~#’ after the denominator. Later on, liblouis will translate these characters into the proper representation of a fraction in the Nemeth Code of Braille Mathematics. (For other mathematical codes, see Implementing Braille Mathematics Codes).
The second line is of even greater interest. The first column is again ‘mfrac’, but this line is for the binomial coefficient. The second column contains three subcolumns: an element name, an attribute name and an attribute value. The attribute linethickness specifies the thickness of the line separating the numerator and denominator. Here it is 0, so there is no line. This is how the binomial coefficient is represented in print. The third column tells how to represent it in braille. liblouisxml will supply ‘~(’, upper number, ‘~%’, lower number, ‘~)’ to liblouis, which will then produce the proper braille representation for the binomial coefficient.
Returning to the line for the math element, we see that the third column begins with a backslash followed by an asterisk. The backslash is an escape character which gives a special meaning to the character which follows it. Here the asterisk means that what follows is to be placed at the very end of the mathematical expression, no matter how complex it is.
For further discussion of how the third column is used see Implementing Braille Mathematics Codes. The third column is not limited to mathematics. It can be used to add characters to anything enclosed by an xml tag.
Here is a complete list of the semantic actions which liblouisxml recognizes. Many of them are also the names of styles. These are listed first, preceded by an asterisk. For a discussion of these, see "Customization Configuring liblouisxml".
Generally the format of a semantic action is:
semanticAction elementSpecifier optionalArguments
elementSpecifier is the second-column value, which may be an element name, an element-attribute pair or an element-attribute-value triplet, separated by commas. This specifies where a semantic action is to be applied. If it is solely an element then the action is applied if this element is encountered. If it is an element-attribute pair then the action is applied if the given element also has the specified attribute. In the last case, with an element-attribute-value triplet, the action is only applied if the element has the specified attribute and the value of this attribute is equal to the specified value.
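The three matching cases can be illustrated with a short Python sketch. It follows the description above and makes no claim about how liblouisxml performs the test internally. The function name specifier_matches is hypothetical, and the element's attributes are assumed to be available as a mapping of names to values.

```python
def specifier_matches(specifier, element, attributes):
    """Decide whether an elementSpecifier applies to an element.

    specifier is 'element', 'element,attribute' or
    'element,attribute,value'. attributes maps names to values.
    """
    parts = specifier.split(",", 2)
    if parts[0] != element:
        return False
    if len(parts) == 1:
        return True                          # element name alone
    if parts[1] not in attributes:
        return False                         # attribute must be present
    if len(parts) == 2:
        return True                          # element-attribute pair
    return attributes[parts[1]] == parts[2]  # triplet: value must be equal
```

For instance, the specifier table,border,0 applies to a table element whose border attribute is 0, but not to one whose border attribute is 1.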
* arith
* attribution
* biblio
* blanklinebefore
* caption
* code
* contents
* dedication
* directions
* dispmath
* disptext
document elementSpecifier
Everything between <elementSpecifier> and </elementSpecifier> is an entire document.
* exercise1
* exercise2
* exercise3
* glossary
* graph
* graphlabel
heading1 elementSpecifier
heading2 elementSpecifier
heading3 elementSpecifier
heading4 elementSpecifier
* indexx
list elementSpecifier
Format elementSpecifier with list style. That is, the first line will begin at the left margin and subsequent lines will be indented two cells.
* matrix
* music
* note
para elementSpecifier
Everything between <elementSpecifier> and </elementSpecifier> is to be formatted as a normal body text paragraph.
* quotation
* section
* spatial
* stanza
* style1
* style2
* style3
* style4
* style5
* subsection
table elementSpecifier
Format the table contained in <elementSpecifier> according to the table formatting rules it has been given for braille output.
* titlepage
* trnote
* volume
acknowledge
author
blankline
bodymatter
boldx
booktitle
boxline
cdata
center
chemistry
changetable
compbrl
configfile elementSpecifier filename
The configfile, configstring and configtweak semantic actions enable the configuration of liblouisxml to be changed according to the contents of the document being transcribed. configfile and configstring take effect during the document analysis phase performed by examine_document.c. configtweak is effective during the transcription phase, performed by transcribe_document.c and the functions called in this module.
elementSpecifier is the usual second-column value, which may be an element name, an element-attribute pair or an element-attribute-value triplet, separated by commas. filename must be on one of the paths set in the paths.c module. The file may contain any configuration settings except those in the xml section. These would be ineffective, since the document has already been parsed.
configstring elementSpecifier setting1=value1;setting2=value2;...
Note that the setting=value pairs are separated by semicolons. Because
the string may be longer than a screen line, you can use a backslash
‘\’ followed immediately by a line ending ‘\n’ to continue to another
line. The string must not contain any blanks. Any setting which can be
specified in a file read with configfile can be specified in
configstring.
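The string format described above (semicolon-separated setting=value pairs, no blanks, optional backslash-newline continuation) can be sketched as a small C splitter. This is only an illustration of the format under those stated rules, not the parser inside liblouisxml; Setting and split_config_string are hypothetical names, and the table names in the usage note are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN 64

/* Hypothetical holder for one setting=value pair. */
typedef struct {
  char name[MAX_TOKEN];
  char value[MAX_TOKEN];
} Setting;

/* Split "a=b;c=d" into pairs. A backslash immediately followed by a
   newline is a line continuation and is skipped. Returns the number of
   pairs stored, or -1 for malformed input: a blank, a pair without
   '=', an empty name, an overlong token, or more than max_out pairs. */
int split_config_string(const char *s, Setting *out, int max_out)
{
  int count = 0;
  assert(s != NULL && out != NULL);
  while (*s != '\0') {
    size_t n = 0, v = 0;
    int seen_eq = 0;
    if (count >= max_out)
      return -1;
    while (*s != '\0' && *s != ';') {
      if (s[0] == '\\' && s[1] == '\n') {
        s += 2;                       /* continuation: drop both bytes */
        continue;
      }
      if (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
        return -1;                    /* blanks are not allowed */
      if (*s == '=' && !seen_eq) {
        seen_eq = 1;
        s++;
        continue;
      }
      if (!seen_eq) {
        if (n + 1 >= MAX_TOKEN)
          return -1;
        out[count].name[n++] = *s;
      } else {
        if (v + 1 >= MAX_TOKEN)
          return -1;
        out[count].value[v++] = *s;
      }
      s++;
    }
    out[count].name[n] = '\0';
    out[count].value[v] = '\0';
    if (!seen_eq || n == 0)
      return -1;
    count++;
    if (*s == ';')
      s++;
  }
  return count;
}
```

Given the two-line string literaryTextTable=fooTable;\ followed by mathExprTable=barTable, the function returns 2 and stores the two pairs in order. A string such as "a = b" is rejected because it contains blanks.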
configtweak elementSpecifier
configtweak is identical to configstring except that it is called in
the transcription phase. It should be used only for things like
changing translation tables. For example:
configtweak elementSpecifier literaryTextTable=fooTable;\
mathExprTable=barTable
configtweak is not a generalization of changetable. The latter changes
only the literaryTextTable and applies to a subtree. configtweak
remains in effect until changed by another configtweak.
contentsheader elementSpecifier
Replace the given element with a table of contents (see Table of
contents). Typically the elementSpecifier would occur at the end of
the information which you want to be at the head of the output, such
as a title page, dedication, etc.
contracted
copyright
endnotes
footer
frontmatter
generic
graphic
htmllink
htmltarget
italicx elementSpecifier
jacket
line
maction
maligngroup
malignmark
math elementSpecifier
A mathematical expression begins with <elementSpecifier> (which may
have attributes and values) and ends with </elementSpecifier>. This is
therefore the root element of a mathematical expression. However,
mathematical expressions are usually part of a document, so it is not
given the semantic action document. The math semantic action causes
liblouisxml to carry out special interpretation actions.
menclose
merror
mfenced
mfrac
mglyph
mi
mlabeledtr
mmultiscripts
mn
mo
mover
mpadded
mphantom
mprescripts
mroot
mrow
ms
mspace
msqrt
mstyle
msub
msubsup
msup
mtable
mtd
mtext
mtr
munder
munderover
newpage
no
none
notranslate elementSpecifier
pagenum
preface
rearmatter
reverse
righthandpage
runninghead
semantics
skip elementSpecifier
Skip everything between <elementSpecifier> and </elementSpecifier>.
Nothing in between will have any effect on the braille output.
softreturn elementSpecifier
tblbody
tblcol
tblhead
tblrow
tnpage
transcriber
uncontracted
A table of contents is produced for an xml file if the file contains a
tag which has been defined with the contentsheader semantic action
(see contentsheader) and also tags for the heading1, heading2,
heading3 or heading4 semantic actions (see heading1). The table of
contents will contain print and braille page numbers if these features
have been enabled. A sequence of fill characters will be inserted
before the page numbers, so that the latter are at the right margin.
The fill character can be specified in a configuration file with the
lineFill setting (see lineFill). The default fill character is an
apostrophe (dot 3).
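How fill characters push a page number to the right margin can be sketched as follows. This is a simplified illustration, not the liblouisxml formatter: it handles a single page number, works on plain characters rather than translated braille, and the blank placed on each side of the fill run is an assumption made here. The name contents_line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build one contents line that is exactly `cells` characters wide:
   title, blank, run of fill characters, blank, page number.
   Returns 0 on success, -1 if the pieces do not fit in `cells` or in
   the output buffer. */
int contents_line(const char *title, int page, int cells, char fill,
                  char *out, size_t out_size)
{
  char num[16];
  int title_len, num_len, fill_len, pos;
  assert(title != NULL && out != NULL);
  num_len = snprintf(num, sizeof num, "%d", page);
  title_len = (int) strlen(title);
  /* Two cells are reserved for the blanks around the fill run. */
  fill_len = cells - title_len - num_len - 2;
  if (fill_len < 1 || (size_t) cells + 1 > out_size)
    return -1;
  memcpy(out, title, (size_t) title_len);
  pos = title_len;
  out[pos++] = ' ';
  memset(out + pos, fill, (size_t) fill_len);
  pos += fill_len;
  out[pos++] = ' ';
  memcpy(out + pos, num, (size_t) num_len);
  pos += num_len;
  out[pos] = '\0';
  return 0;
}
```

Calling contents_line with the title "Intro", page 12, a width of 20 cells and the default apostrophe as fill produces a 20-character line whose last two characters are the page number.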
Five new styles have been defined for the table of contents. The first
is the contentsheader style (see contentsheader style), which is used
to specify how the contents should be placed and the title that should
be given to it. The others correspond to the four heading levels and
are contents1, contents2, contents3 and contents4. These styles are
chosen as appropriate while the table of contents is being made. Do
not declare them in a semantic-action file. See the canonical.cfg file
for the current default definitions of all these styles.
The table of contents will be placed where the xml tag is that you
declared in the contentsheader semantic action (see contentsheader).
It begins on a new page. After it is completed the braille page number
is reset to beginningBraillePageNumber and another new page is
started. This means that the xml tag with the contentsheader semantic
action should occur at the end of the information which you want to be
at the head of the output, such as a title page, dedication, etc.
It is not necessary that an xml file contain a tag with the
contentsheader semantic action. If the file contains headers you can
obtain a table of contents by specifying contents yes in a
configuration file or -Ccontents=yes on the command line of xml2brl.
In this case, the table of contents will appear at the beginning of
the output. Pages will be numbered beginning with 1. When the table of
contents is complete, the material in the file will start on a new
page and the page number will be the value given in
beginningBraillePageNumber.
The contents1, etc. styles all have the format contents setting. This
is a variant of the leftJustified format. It has been necessary to
change the way firstLineIndent is handled to accommodate multilevel
lists. Up till now, if firstLineIndent was negative, the first line
would start at the real left margin, regardless of the value of
leftMargin. Now the value of firstLineIndent is simply added to
leftMargin. This means that if it is negative it is really subtracted.
For example, if leftMargin is 4 and firstLineIndent is -2 the first
line will start in cell 2.
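The change in how firstLineIndent is applied can be written out as two small C functions. This is a sketch of the arithmetic only. The function names are hypothetical, and three points are assumptions made here rather than statements from the manual: that the real left margin is cell 0, that the earlier rule already added non-negative indents to leftMargin, and that a sum below 0 is clamped to 0.

```c
#include <assert.h>

/* Earlier behaviour: a negative firstLineIndent sent the first line to
   the real left margin (assumed to be cell 0), whatever leftMargin was. */
int first_line_start_old(int left_margin, int first_line_indent)
{
  assert(left_margin >= 0);
  if (first_line_indent < 0)
    return 0;
  return left_margin + first_line_indent;
}

/* Current behaviour: firstLineIndent is simply added to leftMargin, so
   a negative value is in effect subtracted. */
int first_line_start_new(int left_margin, int first_line_indent)
{
  int start;
  assert(left_margin >= 0);
  start = left_margin + first_line_indent;
  return start < 0 ? 0 : start;   /* clamping is an assumption */
}
```

For leftMargin 4 and firstLineIndent -2, first_line_start_new gives 2, matching the example above, whereas first_line_start_old gives 0.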
The Nemeth Code of Braille Mathematical and Science Notation has been implemented. Other braille mathematics codes can be implemented by following the same pattern. The Nemeth Code implementation is discussed as an example below.
Four tables are used to translate xml documents containing a mixture of text and mathematics into the Nemeth code. They can be found in the subdirectory lbx_files of the liblouisxml directory. First, the semantic-action file nemeth.sem is used to interpret the mathematical portions of the xml document. (The text portions are interpreted by another semantic-action file which will not be discussed here.) After the math and text have been interpreted, two liblouis tables, nemeth.ctb and en-mathtext.ctb, are used to translate them. Each piece of mathematics or text is translated separately and the pieces are strung together with blanks between them. This results in inaccuracies where mathematics meets text. The fourth table, also a liblouis table, is used to remove these inaccuracies. It is called edittable.ctb, and it does things like removing the multi-purpose indicator before a blank, inserting the punctuation indicator before a punctuation mark following a math expression, and removing extra spaces.
The general format and use of semantic-action files were discussed in the previous section (see Connecting with the xml Document - Semantic-Action Files). In this section we shall concentrate on the optional third column, which is used heavily in nemeth.sem. While the first two columns can be generated by liblouisxml but must be edited by a person, the third column must always be provided by a human.
As previously stated, the third column tells liblouisxml what characters to insert to inform liblouis how to translate the math expression. Look at the following line:
mfrac mfrac ^?,/,^#
You will see that the third column contains two commas. This means
that it has three subcolumns. A fraction has a numerator and a
denominator. These are called children of the mfrac element.
The first subcolumn specifies the characters that liblouisxml should
place in front of the numerator. The second subcolumn gives the
characters to be placed between the numerator and denominator.
Finally, the third subcolumn gives the characters to place after the
denominator. You will see that the first subcolumn contains a caret
followed by a question mark. The dot pattern for the question mark in
computer braille is the same as for the Nemeth start-fraction
indicator. The caret is used so that liblouis can tell this apart from
a question mark, which also has the same dot pattern in computer
braille. The second subcolumn contains a slash but no caret. This is
because there is no danger of confusion where the slash is concerned.
The third subcolumn does contain a caret, and it also contains a
number sign, which corresponds to the Nemeth end-fraction indicator.
When liblouisxml encounters the MathML representation of the fraction
one-half it produces the following string of characters:
‘^?1/2^#’. liblouis then removes the carets to get ‘?1/2#’.
As another example, consider the entry in nemeth.sem for a subscript.
msub msub ,^;,^"
Here the first subcolumn is blank, because nothing is to be placed before the subscripted symbol. The second subcolumn contains a caret and a semicolon (in computer braille). This corresponds to the Nemeth subscript indicator. The third subcolumn contains a caret and a quotation mark, corresponding to the Nemeth baseline indicator. liblouisxml translates the MathML expression for x subscript i into ‘x^;i^"’. liblouis subsequently produces ‘x;i"’. There are other steps if the subscript is numeric. These are handled by pass2 opcodes in the liblouis translation table, nemeth.ctb.
You will notice that the entries in nemeth.sem have various numbers of subcolumns in the third column. In general, the characters given in the first subcolumn are placed before the first child of the element given in the second column. The characters in the second subcolumn are placed before the second child, and so on, until the characters given in the last subcolumn are placed after the last child.
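The general rule for subcolumns and children can be sketched in C. This is an illustration of the rule as stated, for an element with a fixed number of children, and not the liblouisxml implementation. The names wrap_children and strip_carets are hypothetical, and strip_carets is only a stand-in for what the liblouis table does when it replaces a caret-prefixed character with the character itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append piece to the NUL-terminated string in out; -1 if it won't fit. */
static int append(char *out, size_t out_size, const char *piece)
{
  size_t used = strlen(out), need = strlen(piece);
  if (used + need + 1 > out_size)
    return -1;
  memcpy(out + used, piece, need + 1);
  return 0;
}

/* Place subcols[i] before children[i], and the last subcolumn after the
   last child. Requires exactly one more subcolumn than children.
   Returns 0 on success, -1 otherwise. */
int wrap_children(const char **subcols, int n_subcols,
                  const char **children, int n_children,
                  char *out, size_t out_size)
{
  int i;
  assert(subcols != NULL && children != NULL);
  assert(out != NULL && out_size > 0);
  if (n_subcols != n_children + 1)
    return -1;
  out[0] = '\0';
  for (i = 0; i < n_children; i++) {
    if (append(out, out_size, subcols[i]) != 0)
      return -1;
    if (append(out, out_size, children[i]) != 0)
      return -1;
  }
  return append(out, out_size, subcols[n_children]);
}

/* Stand-in for the later liblouis step: drop each caret and keep the
   character that follows it. Works in place. */
void strip_carets(char *s)
{
  char *dst = s;
  assert(s != NULL);
  while (*s != '\0') {
    if (s[0] == '^' && s[1] != '\0')
      s++;                      /* skip the caret, copy what follows */
    *dst++ = *s++;
  }
  *dst = '\0';
}
```

With the mfrac subcolumns ^? and / and ^# and the children 1 and 2, wrap_children builds ^?1/2^# and strip_carets then leaves ?1/2#, the two strings given for the fraction one-half.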
Sometimes an element or tag can have an indeterminate number of
children. This is true of <math> itself. Yet, it may be necessary to
place some characters after the very last element. Let us look at the
<math> entry.
math math \eb,\*\ee
First let us discuss escape sequences starting with a backslash. These are basically the same as in liblouis. The sequence ‘\e’ is shorthand for the escape character, which would otherwise be represented by ‘\x001b’. The beginning of a math expression is denoted by an escape character followed by the letter ‘b’ and the end by an escape character followed by the letter ‘e’. This enables the editing table to do such things as drop the baseline indicator at the end of a math expression and insert a number sign at the beginning, if needed.
Not found in liblouis is the sequence ‘\*’. This means to put what follows after the very last child of the math element, no matter how many there are.
As another example consider:
mtd mtd \*\ec
mtd is the MathML tag for a table cell, that is, one column entry in a
row. There may be many children of this tag. The entry says to put an
escape character (hex 1b), plus the letter ‘c’, after the very last of
them.
As a final example consider:
mtr mtr ^.^\,^(,\*^.^\,^)\er
mtr is the MathML tag for a row in a table, in this case a matrix.
Each row in a matrix must begin with the dot pattern ‘46-6-12356’ and
end with the dot pattern ‘46-6-12456’. As usual a caret is placed
before the corresponding characters. Since dot 6 is a comma, it must
be escaped. This is done by placing a backslash before the comma.
There are two subcolumns. The first contains the characters to be
placed at the beginning of each row. The second starts with ‘\*’,
signifying that the characters following it are to be placed at the
end of everything in this row. A subcolumn starting with ‘\*’ must be
the last (or only) subcolumn. Here this last subcolumn ends with an
escape character and the letter ‘r’, signifying the end of a row.
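The rules for dividing the third column (commas separate subcolumns, ‘\,’ is a literal comma, and ‘\*’ may start only the last subcolumn) can be sketched in C as below. This is an illustration under those rules, not the liblouisxml parser. ThirdColumn and split_third_column are hypothetical names, other escape sequences such as ‘\e’ are left untouched for a later step, and a backslash that is itself escaped before a comma is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBCOLS 8
#define MAX_SUBCOL_LEN 64

/* Hypothetical result type for this sketch. */
typedef struct {
  int count;              /* number of subcolumns found */
  int last_is_after_all;  /* 1 if the last subcolumn began with \* */
  char text[MAX_SUBCOLS][MAX_SUBCOL_LEN];
} ThirdColumn;

/* Returns the number of subcolumns, or -1 on overflow or if \* starts
   a subcolumn that is not the last one. */
int split_third_column(const char *col, ThirdColumn *tc)
{
  size_t len = 0;
  int i;
  assert(col != NULL && tc != NULL);
  memset(tc, 0, sizeof *tc);   /* also NUL-terminates every subcolumn */
  tc->count = 1;
  for (; *col != '\0'; col++) {
    if (col[0] == '\\' && col[1] == ',') {
      /* Escaped comma: keep a literal comma, do not split here. */
      if (len + 1 >= MAX_SUBCOL_LEN)
        return -1;
      tc->text[tc->count - 1][len++] = ',';
      col++;
    } else if (*col == ',') {
      if (tc->count >= MAX_SUBCOLS)
        return -1;
      tc->count++;
      len = 0;
    } else {
      if (len + 1 >= MAX_SUBCOL_LEN)
        return -1;
      tc->text[tc->count - 1][len++] = *col;
    }
  }
  for (i = 0; i < tc->count; i++) {
    if (strncmp(tc->text[i], "\\*", 2) == 0) {
      char *t = tc->text[i];
      if (i != tc->count - 1)
        return -1;             /* \* is only valid in the last subcolumn */
      tc->last_is_after_all = 1;
      memmove(t, t + 2, strlen(t + 2) + 1);   /* drop the \* marker */
    }
  }
  return tc->count;
}
```

Applied to the mtr entry, the function reports two subcolumns, the first holding the row-start characters with the comma unescaped and the second flagged as applying after the very last child.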
So much for the semantic-action file. Even though the characters in the third column were chosen to correspond with Nemeth characters, they may not have to be changed for other math codes. liblouis can replace them with anything needed.
This brings us to a consideration of the two tables used by liblouis to translate mathematics texts. The first, en-mathtext.ctb is used to translate text appearing outside math expressions. It is necessary because the Nemeth code requires modifications of Grade 2 braille. Other math codes may not have this requirement.
The table actually used to translate mathematics is nemeth.ctb. It includes two other tables, chardefs.cti and nemethdefs.cti. The first gives ordinary character definitions and is included by all the other tables. Note, however, that the unbreakable space, ‘\x00a0’, is translated by dot 9. This is used before and after the equal sign and other symbols in nemeth.ctb. The second table contains character definitions for special math symbols, most of which are Unicode characters greater than ‘\x00ff’. The Greek letters are here. So are symbols like the integral sign.
Most of the entries in nemeth.ctb should be familiar from other tables. The unfamiliar ones follow the comments ‘# Semantic pairs’ and ‘# pass2 corrections’. The first simply replace characters preceded by a caret with the character itself. The second make adjustments in the code generated directly from the nemeth.sem file. The pass2 opcode is discussed in the liblouis documentation (see Overview). Here are some comments on a few of the entries in nemeth.ctb.
pass2 @1456-1456 @6-1456
Replaces double start-fraction indicators with the start-complex-fraction indicator.
pass2 @3456-3456 @6-3456
Replaces double end-fraction indicators with the end-complex-fraction indicator.
pass2 @56[$d1-5]@5 *
Removes the subscript and baseline indicators from numeric subscripts.
pass2 @5-9 @9
Removes the baseline or multipurpose indicator before an unbreakable space generated by the translation of an equal sign, etc.
pass2 @45-3-5 @3
Replaces a superscript apostrophe with a simple prime symbol.
pass2 @9[]$d @3456
Puts a number sign before a digit preceded by a blank.
pass2 @9-0 @9
Removes a space following an unbreakable space.
We now come to the fourth and last table used for math translation, the editing table, edittable.ctb. As explained at the beginning, this table is used to remove inaccuracies where math translation butts up against text translation. For example, the Nemeth code puts numbers in the lower part of the cell. However, punctuation marks are also in the lower part of the cell. So Nemeth puts a punctuation indicator, dots ‘456’, in front of any lower-cell punctuation that immediately follows a mathematical expression. If this occurs inside MathML it is handled by nemeth.ctb. However, a MathML expression is often followed by a punctuation mark which is the first part of text. liblouisxml puts a blank between math and text, but this can result in a mathematical expression followed by a blank and then, say, a period, dots ‘256’. edittable.ctb replaces the blank with the punctuation indicator.
When you look at edittable.ctb you will see that it begins with an include of chardefs.cti. Most of the entries are ordinary, but some are interesting. For example,
always "\s 0
replaces the baseline or multipurpose indicator followed by a space with just a space.
Liblouisxml may contain code borrowed from the Linux screenreader BRLTTY, Copyright © 1999-2009 by the BRLTTY Team.
Copyright © 2004-2009 ViewPlus Technologies, Inc. www.viewplus.com.
Copyright © 2006,2009 Abilitiessoft, Inc. www.abilitiessoft.com.
Liblouisxml is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Liblouisxml is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with Liblouisxml. If not, see http://www.gnu.org/licenses/.
liblouisxml is an "extensible renderer", designed to translate a wide variety of xml and text documents into braille, but with a special emphasis on technical material. The overall operation of liblouisxml is controlled by a configuration file. The way in which a particular type of xml document is to be rendered is specified by a semantic-action file for that document type. Braille translation is done by the liblouis braille translation and back-translation library (see Overview). Its operation, in turn, is controlled by translation table files. All these files are plain text and can be created and edited in any text editor. Configuration settings can also be specified on the command line of the console-mode transcription program xml2brl.
The general operation of liblouisxml is as follows. It uses the
libxml2 library to construct a parse tree of the xml document. After
the parse tree is constructed, a function called examine_document
looks it over and determines whether math translation tables, etc. are
needed. examine_document also constructs a prototype semantic-action
file, if one does not exist already. When it is finished, another
function, called transcribe_document, does the actual braille
transcription. It calls transcribe_math to handle MathML subtrees,
transcribe_chemistry for chemical formula subtrees, transcribe_graphic
for SVG graphics, etc. Entities are translated to Unicode, if they are
not already. Sequences of symbols indicate superscripts, return to the
baseline, subscripts, start and end of fractions, etc. The Braille
translator and back-translator library liblouis is used to do the
braille translation.
The transcribe_math function works in conjunction with the latest
version of liblouis and a special math translation table to transcribe
most mathematical expressions into good Nemeth Code. Other Braille
mathematics codes can be handled by modifying the translation table
and semantic-action file.
The functions which are not needed at the moment, such as
transcribe_chemistry, are only skeletons. However, I hope that
transcribe_graphic can be expanded in the near future to use the
graphics capability of the Tiger tactile graphics embossers.
The latest versions of liblouisxml and liblouis can be downloaded from www.abilitiessoft.com. Note that liblouisxml will only work with the latest version of liblouis.
liblouisxml can be compiled to use either 16-bit or 32-bit Unicode internally. This is inherited from liblouis, so liblouis must be compiled first and then liblouisxml. Wherever 16 bits are mentioned in this document, read 32 if you have compiled the library for 32 bits.
As stated in the previous section, liblouisxml uses three kinds of files, configuration files, semantic-action files, and liblouis translation tables. The first two are discussed later in this documentation. liblouis translation tables are discussed in the liblouis documentation (see Overview) which is distributed with liblouis. These files can be placed on various paths, which are determined at compile time. One of these paths should be to the lbx_files directory provided by liblouisxml, which contains the principal configuration file (canonical.cfg) and the semantic-action files. Another should be to the tables directory in the liblouis distribution. Note that liblouisxml also generates some files, all of which are placed in the current directory. These files are new prototype semantic-action files, additions to old semantic-action files, temporary files, and log files. The first two can be used to extend the capability of liblouisxml to process xml documents. The latter two are useful for debugging.
Paths are set by changing a few lines of code in the paths.c module. If you are preparing liblouisxml for Windows, a function which finds the name of the "Program Files" directory for your locale is called automatically. You can then modify the line containing the term ‘yourSubDir’ as needed. Note that this line will produce a deliberate compiler error, so you can find it easily.
If you are preparing liblouisxml for a Unix-type system, look for the line that says ‘Set Unix Paths’. The following lines establish paths to the lbx_files directory and to the liblouis tables directory. If you are using the GNU autotooled versions of liblouis and liblouisxml these paths are set up automatically.
The function addPath takes care of adding a path to liblouisxml
properly. You can specify many more than two paths.
char *lbx_version (void)
This function returns a pointer to a character string containing the version of liblouisxml. Other information such as the release date and perhaps notable changes may be added later.
void * lbx_initialize ( const char *configFilelist, const char *logFileName, const char *settingsString)
This function initializes the libxml2 library, processes canonical.cfg
and configuration settings given in settingsString and the
configuration files given in configFilelist. This is a list of
configuration file names separated by commas. If the first character
is a comma it is taken to be a string containing configuration
settings and is processed like the settingsString string. Such a
string must conform to the format of a configuration file. Newlines
should be represented with ASCII 10. If logFileName is not null, a log
file is produced in the current directory. If it is null any messages
are printed on stderr. The function returns a pointer to the UserData
structure. This pointer is of type void * and must be cast to
(UserData *) in the calling program. To access the information in this
structure you must include louisxml.h. This function is used by
xml2brl.
int lbx_translateString ( const char *configfilelist, char * inbuf, widechar *outbuf, int *outlen, unsigned int mode)
This function takes a well-formed xml expression in inbuf and
translates it into a string of 16-bit (or 32-bit if this has been
specified in liblouis) braille characters in outbuf. The xml
expression must be immediately followed by a zero or null byte.
Leading whitespace is ignored. If it does not then begin with the
characters ‘<?xml’ an xml header is added. If it does not begin with
‘<’ it is assumed to be a text string and is translated accordingly.
The header is specified by the xmlHeader line in the configuration
file. If no such line is present, a default header specifying UTF-8
encoding is used. The mode parameter specifies whether you want the
library to be initialized. If it is 0 everything is reset, the
canonical.cfg file is processed and the configuration file and/or
string (see previous section) are processed. If mode is 1 liblouisxml
simply prepares to handle a new document. For more on the mode
parameter see the next section.
Which 16-bit character in outbuf represents which dot pattern is
indicated in the liblouis translation tables. The configfilelist
parameter points to a configuration file or string. Among other
things, this file specifies translation tables. It is these tables
which control just how the translation is made, whether in Grade 2,
Grade 1, the Nemeth Code of Braille Mathematics or something else.
Note that the *outlen parameter is a pointer to an integer. When the
function is called, this integer contains the maximum output length.
When it returns, it is set to the actual length used. The function
returns 1 if no errors were encountered and a negative number if a
complete translation could not be done.
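The calling convention for outlen (capacity on entry, length used on return) and the return value check can be sketched as follows. So that the sketch compiles and runs without the library, the translator is passed in as a function pointer with the same parameter list as lbx_translateString, and a stand-in called fake_translate is supplied. Both fake_translate and translate_into are hypothetical names, the widechar typedef assumes a 16-bit build, and the stand-in only copies bytes; it performs no braille translation.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short widechar;   /* assumption: 16-bit build */

/* Same parameter list as lbx_translateString in this manual. */
typedef int (*TranslateFn)(const char *configfilelist, char *inbuf,
                           widechar *outbuf, int *outlen,
                           unsigned int mode);

/* Set *outlen to the buffer capacity before the call, check for the
   success value 1, and hand back the length actually used.
   Returns -1 if the translation was not complete. */
int translate_into(TranslateFn translate, const char *configfilelist,
                   char *inbuf, widechar *outbuf, int capacity,
                   unsigned int mode)
{
  int outlen = capacity;           /* in: maximum output length */
  int status;
  assert(translate != NULL && inbuf != NULL && outbuf != NULL);
  assert(capacity > 0);
  status = translate(configfilelist, inbuf, outbuf, &outlen, mode);
  if (status != 1)
    return -1;
  return outlen;                   /* out: actual length used */
}

/* Stand-in so the sketch runs without liblouisxml. It copies each byte
   to one widechar and honours *outlen; it is NOT a translator. */
int fake_translate(const char *configfilelist, char *inbuf,
                   widechar *outbuf, int *outlen, unsigned int mode)
{
  int n = 0;
  (void) configfilelist;
  (void) mode;
  while (inbuf[n] != '\0') {
    if (n >= *outlen)
      return -1;                   /* output buffer too small */
    outbuf[n] = (widechar) (unsigned char) inbuf[n];
    n++;
  }
  *outlen = n;
  return 1;
}
```

In a real program the first argument would be lbx_translateString itself, with the library header included instead of the typedefs above. A fresh capacity must be stored in outlen before every call, because the previous call overwrote it with the length used.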
int lbx_translateFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode)
This function accepts a well-formed xml document in inputFileName and
produces a braille translation in outputFileName. As for
lbx_translateString, the mode parameter specifies whether the library
is to be initialized with new configuration information or simply
prepared to handle a new document. In addition, the mode parameter can
specify that a document is in html, not xhtml. liblouisxml.h contains
an enumeration type with the values dontInit and htmlDoc. These can be
combined with an or (‘|’) operator. The input file is assumed to be
encoded in UTF-8, unless otherwise specified in the xml header. The
encoding of the output file may be UTF-8, UTF-16, UTF-32 or Ascii-8.
This is specified by the outputEncoding line in the configuration
file, configfilelist. The function returns 1 if the translation was
successful.
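Building a mode value from the two flags can be sketched like this. The constants below are local to the sketch and deliberately not named dontInit and htmlDoc, so that they cannot be mistaken for the library's own definitions. The value 1 follows from the statement that a mode of 1 only prepares for a new document; the value 2 for the html flag is an assumption and should be checked against liblouisxml.h. The function name choose_mode is hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the enumeration in liblouisxml.h. */
enum {
  SKETCH_DONT_INIT = 1,  /* manual: mode 1 prepares for a new document */
  SKETCH_HTML_DOC = 2    /* ASSUMED value, verify in liblouisxml.h */
};

/* Combine the flags with the or operator, as the manual describes.
   A result of 0 requests full (re)initialization. */
unsigned int choose_mode(int already_configured, int input_is_html)
{
  unsigned int mode = 0;
  assert(already_configured == 0 || already_configured == 1);
  assert(input_is_html == 0 || input_is_html == 1);
  if (already_configured)
    mode |= (unsigned int) SKETCH_DONT_INIT;
  if (input_is_html)
    mode |= (unsigned int) SKETCH_HTML_DOC;
  return mode;
}
```

A typical pattern would be to pass a mode without the don't-initialize flag for the first file, so that the configuration is read, and to set that flag for later files that use the same configuration.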
int lbx_translateTextFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode)
This function accepts a text file in inputFileName and produces a
braille translation in outputFileName. The input file is assumed to be
encoded in Ascii8. However, utf-8 can be specified with the
configuration setting inputTextEncoding utf8. Blank lines indicate the
divisions between paragraphs. Two blank lines cause a blank line
between paragraphs (or headers). The output file may be in UTF-8,
UTF-16, or Ascii8, as specified by the outputEncoding line in the
configuration file, configfilelist. As for lbx_translateString, the
mode parameter specifies whether complete initialization is to be done
or simply initialization for a new document.
int lbx_backTranslateFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode)
This function accepts a braille file in inputFileName and produces a
back-translation in outputFileName. The input file is assumed to be
encoded in Ascii8. The output file is in either plain text or html,
according to the setting of backFormat in the configuration file. Html
files are encoded in UTF8. In plain text, blank lines are inserted
between paragraphs. The output file may be in UTF-8, UTF-16, or
Ascii8, as specified by the outputEncoding line in the configuration
file, configfilelist. The mode parameter specifies whether or not the
library is to be initialized with new configuration information, as
described in the section on lbx_translateString (see
lbx_translateString).
void lbx_free (void)
This function should be called at the end of the application to free
all memory allocated by liblouisxml and liblouis. If you wish to
change configuration files during your application, use a mode
parameter of 0 on the function call using the new configuration
information. This will call the function automatically.
backFormat: outputFormat
backLineLength: outputFormat
BeginingPageNumber: outputFormat
braillePageNumberAt: outputFormat
braillePages: outputFormat
cellsPerLine: outputFormat
center: style
compbrailleTable: translation
editTable: translation
entity: xml
fileEnd: outputFormat
firstLineIndent: style
format: style
formatFor: outputFormat
hyphenate: outputFormat
inputTextEncoding: outputFormat
interline: outputFormat
interlineBackTable: translation
internetAccess: xml
interpoint: outputFormat
leftMargin: style
lineEnd: outputFormat
lineFill: outputFormat
linesAfter: style
linesBefore: style
LinesPerPage: outputFormat
literaryTextTable: translation
MathexpTable: translation
mathtextTable: translation
newEntries: xml
newPageAfter: style
newPageBefore: style
outputEncoding: outputFormat
pageEnd: outputFormat
paragraphs: outputFormat
printPageNumberAt: outputFormat
printPages: outputFormat
rightHandPage: style
semanticFiles: xml
skipNumberLines: style
translate: style
uncontractedTable: translation
xmlheader: xml
lbx_backTranslateFile: lbx_backTranslateFile
lbx_free: lbx_free
lbx_initialize: lbx_initialize
lbx_translateFile: lbx_translateFile
lbx_translateString: lbx_translateString
lbx_translateTextFile: lbx_translateTextFile
lbx_version: lbx_version