Constructing A Braille Rule File
BrailleMaster represents an important breakthrough in computerized
Braille production. It makes it possible to adapt the
translation mechanism of a Braille translator to follow given Braille
rules precisely, and it also allows Braille users to construct their
own translation rules, regardless of language or application.
BrailleMaster does this with the introduction of a symbolic language called LOUIS. LOUIS is simple to use, yet powerful and flexible enough to
be able to define Braille rules for most national languages,
as well as for special applications, such as math Braille,
music Braille, etc.
The name LOUIS has been chosen as a tribute to Louis Braille.
Robotron has conceived LOUIS in order to produce better Braille
code and to provide the public with the ultimate tool in
Braille production and research. Such a universal tool has been
long overdue.
LOUIS Rule File Structure
The LOUIS representation of a Braille code consists of a sequence
of definition lines in a text file called a Rule File.
Each definition line represents a certain rule of the Braille code. The
sequence in which the lines appear in the Rule File is
important: The rules are always searched from top to bottom and
the higher the definition line is placed in the text, the higher
precedence it has over the rules which follow.
All LOUIS definitions which represent a complete description of
a particular Braille code are contained in a single text file called
Rule File. It is possible to have many Rule Files for different languages and applications. These can be selected even from within translated text, so that different Rule Files can be applied to different portions of the text.
The LOUIS Rule Files are plain text files, consisting of lines. There are three types of lines: Comment Lines, Rule Header Lines and Rule Definition Lines.
A Comment Line always starts with a semicolon and can contain
comments for user reference. A Comment Line is ignored during
the translation process.
Rule Headers serve to mark individual groups of rules. They
start with a single character, followed by "-rule". (A dash
followed immediately by the word "rule".) For example, all rules for letter "A" will start with a header "A-rule".
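The three line types can be told apart mechanically. The following Python sketch is only an illustration, not BrailleMaster's own code: the function name `classify_line` is hypothetical, and it assumes that blank lines may occur and that any other non-comment, non-header line is a Rule Definition Line.

```python
import re

# Assumption: a Rule Header is exactly one character followed by "-rule".
HEADER = re.compile(r"^.-rule$")

def classify_line(line: str) -> str:
    """Return 'blank', 'comment', 'header' or 'definition' for one line."""
    stripped = line.strip()
    if not stripped:
        return "blank"          # assumption: blank lines are allowed
    if stripped.startswith(";"):
        return "comment"        # ignored during translation
    if HEADER.match(stripped):
        return "header"         # e.g. "A-rule"
    return "definition"         # everything else is treated as a rule

sample = [
    "; rules for the letter A",
    "A-rule",
    "|{HERE}| 5 125",
]
kinds = [classify_line(s) for s in sample]
print(kinds)
```

A real Rule File may contain further line types (for example the symbol definitions described under Advanced LOUIS), which this sketch would report as definitions.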
The LOUIS Rule Definitions are divided into three major groups:
The Composition Rules, the Non-Letter Rules and the Letter Rules.
Composition Rules are always the first rules in a LOUIS Rule File. They have the highest priority because they determine the general behaviour
of the whole Braille system. For example, in the Standard
English Grade 2 Braille, the Composition Rules determine
such essential requirements as each number starting with a number sign, each upper-case character starting with a capital sign, etc.
In the Rule File, the Composition Rules are followed by the Non-Letter Rules.
Non-Letter Rules are preceded by the header "--rule". (Two
dashes immediately followed by the word "rule".)
This rule group defines the Braille translation of all printable
characters except for letters of the alphabet. These include
punctuation marks, special symbols and digits.
The Non-Letter Rules group is further subdivided into three
subgroups: The Space Rules, Punctuation Mark Rules and Numeral
Rules.
The Space Rules determine how a space character is translated
into Braille. This can be a very simple single-line group in case
the space is always represented by a single Braille code.
However, in some Braille codes, such as Standard English Grade 2 Braille, the space is sometimes removed from the corresponding Braille text, for example between the words "and", "for", "of", "the", "with",
etc. Such exceptions have to be defined in the Space Rules.
The Space Rules are followed by the Punctuation Marks group. Here
BrailleMaster's translator is instructed how to interpret
punctuation marks. All ambiguities in how the Braille code
represents punctuation marks must be defined in this section.
(For example a dot between digits can result in a different
Braille code compared to a dot terminating a sentence.)
The Punctuation Mark Rules are followed by a small group of
Numeral Rules which define a Braille representation of digits.
The composition of complete numbers is given by the Composition
Rules.
The Letter Rules contain rules for every letter of the alphabet.
There are as many Rule Headers in this group as there are characters
in the alphabet of the language these rules describe. For
example, the subgroup of rules for character A would start with
"A-rule", the subgroup of rules for character M would start with
"M-rule", etc.
If a character in the translated text does not have a
corresponding rule in the Letter Rules, it will be ignored. The
Letter Rules can sometimes be very simple, in instances where
a certain Braille code always corresponds to a certain
letter of the alphabet, such as in Computer Braille. On the
other hand, the Letter Rules can be quite complex if there are
many ambiguities or many abbreviations and contractions, such as
in English Grade 2 Braille.
LOUIS Rule Syntax
Each LOUIS rule consists of two parts: The text part on the
left and the corresponding Braille part on the right. A LOUIS rule
determines the relationship between the portion of translated text which is
matched by the left-hand part of the rule (also called "contextual part")
and the resulting Braille code, as specified by the right-hand Braille part of the rule.
For example, a simple rule for the word "here" goes like this:
|{HERE}| 5 125
The | symbols on the left determine the start and end of the context.
The { and } symbols denote the text which will be replaced by the Braille code shown on the right. On the right, there are two Braille signs, Dot 5 and Dot 1,2,5, which can be written either as dot numbers, as here, or as a series of asterisks and dashes (more on this later).
Indeed, the correct Braille code in English Grade 2 Braille for the word "here" is Dot 5, followed by Dot 1,2,5. In other words, the word "here", if it stands alone, is always contracted into a special two-character Braille sign.
But what about words such as "Hereford", where the initial "Here" is not supposed to be contracted to Dot 5 and Dot 1,2,5 but rather spelled character by character? That's why the "text" and "context" symbols are separate. In the word "Hereford" the "Here" doesn't stand alone. Its context is different: It "stands alone" from the left, but is followed by the word "ford" immediately on the right.
For this, we can attempt to construct an appropriate rule:
Note that the rule covers only the character "h" - only the character "h" is surrounded by the text delimiters {} and will be replaced by the Braille code for a single "h": Dot 1,2,5. The next characters will be handled by rules for "e", "r", then "e" again, etc. During translation, the imaginary translation "cursor" moves to the character immediately after the right text delimiter }, and invokes the rule for that particular character.
In order for the "Hereford" rule to take precedence over the more general "here" rule, the "Hereford" rule will be simply placed before the "here" rule. Like this:
Naturally, not all the words in a human language can be contained
in the rules. Therefore, BrailleMaster uses various symbols which
make it possible to generalize when constructing rules. For
example, to instruct the translator that every number has to
be preceded by a number sign, a general symbol for "any number"
can be used, without the need of specifying all the numbers in
the universe, one by one!
The Braille part of a LOUIS rule consists of a series of Braille
codes, either shown as digits or a sequence of dashes and
asterisks. For example, "hello" in English Braille can be
represented either as
125 15 123 123 135
or
-**-*- --*-*- ***--- ***--- *-*-*-
Both forms of Braille description can be freely mixed within
the same line.
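The relationship between the two notations can be sketched in Python. This is an illustration under one assumption: the order of positions in the six-character pattern (dots 3, 2, 1, 4, 5, 6) is inferred from the examples in this document, such as -**-*- for Dot 1,2,5. The function names are hypothetical.

```python
# Assumed position order, inferred from the examples in this document.
KEY_ORDER = "321456"

def dots_to_pattern(dots: str) -> str:
    """Convert dot numbers such as '125' into a dash/asterisk pattern."""
    return "".join("*" if key in dots else "-" for key in KEY_ORDER)

def pattern_to_dots(pattern: str) -> str:
    """Convert a six-character pattern back into ascending dot numbers."""
    dots = sorted(key for key, mark in zip(KEY_ORDER, pattern) if mark == "*")
    return "".join(dots)

# Standard Braille letters h, e, l, l, o as dot numbers.
hello = "125 15 123 123 135"
print(" ".join(dots_to_pattern(cell) for cell in hello.split()))
# prints: -**-*- --*-*- ***--- ***--- *-*-*-
```

An empty set of dots gives six dashes, which corresponds to an empty Braille cell.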
In the examples above, we have seen that the left-hand part of any LOUIS rule is always surrounded by special delimiters. These characters will be referred to as the "context delimiters". In the LOUIS examples shown here, we will assume that the context delimiters are vertical bar characters: |.
These context delimiters surround the whole contextual part of a
rule, so that BrailleMaster knows where this part begins and
where it ends.
Within each contextual part, there are two more delimiters, which
are referred to as Text Delimiters. The symbols we will be using here are curly brackets (braces): { and }. The text they embrace is always replaced by the Braille code on the right.
When processing a text file, the LOUIS translator scans the
rule lines one by one, comparing the original text to the entire
Rule Context. When a match is found, only that portion of the
original text which is matched by the Rule Text is converted
to the Braille code which is specified on the right-hand side
of the Rule.
Let us have another example to illustrate why we need the separate concepts of "Rule Text" and "Rule Context":
Firstly, let us establish the rule for contraction "one".
According to current English Grade 2 Braille rules, "one" should
be Brailled as Dot 5, Dot 1,3,5. The rule should look like this:
In these two rules, there is obviously no need to separate Text
and Context.
However, the contraction "one" should not be used in words such
as "colonel", "anemone", etc. In order to prevent contracting the
colonel, the following rule should precede the original one:
Recall that the scanning of a rule is done from left to right, character by
character, using an imaginary translation "cursor". When the translation
starts, the cursor points at the first text character and a scan
through the rules begins. As soon as a match is found, the
appropriate Braille code is generated and the "cursor" is placed
after the converted portion of the text, i.e. after that portion
which is matched by the text part of a rule.
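The scanning process described above can be sketched in Python. This is a simplified illustration, not BrailleMaster's implementation: it handles only literal characters (no wild cards), assumes upper-case text, and assumes every rule has a non-empty text part. The "Hereford" rule line and the single-letter fall-back rules are hypothetical examples written for this sketch; the names `Rule`, `parse_rule` and `translate` are likewise invented.

```python
from typing import List, NamedTuple

class Rule(NamedTuple):
    left: str            # context required before the text part
    text: str            # part between { and }, which gets replaced
    right: str           # context required after the text part
    braille: List[str]   # Braille cells as dot numbers

def parse_rule(line: str) -> Rule:
    """Split a line such as '|{HERE}| 5 125' into its parts."""
    _, context, braille = line.split("|", 2)
    left, rest = context.split("{", 1)
    text, right = rest.split("}", 1)
    return Rule(left, text, right, braille.split())

def translate(text: str, rules: List[Rule]) -> List[str]:
    cells: List[str] = []
    cursor = 0
    while cursor < len(text):
        for rule in rules:                      # top to bottom: earlier rules win
            start = cursor - len(rule.left)
            end = cursor + len(rule.text)
            if (start >= 0
                    and text[start:cursor] == rule.left
                    and text[cursor:end] == rule.text
                    and text[end:end + len(rule.right)] == rule.right):
                cells.extend(rule.braille)
                cursor = end                    # move past the replaced text only
                break
        else:
            cursor += 1                         # no rule matched: character ignored
    return cells

RULE_LINES = [
    "|{H}EREFORD| 125",   # hypothetical specific rule, placed first
    "|{HERE}| 5 125",     # general contraction quoted in this document
    "|{H}| 125", "|{E}| 15", "|{R}| 1235",
    "|{F}| 124", "|{O}| 135", "|{D}| 145",
]
rules = [parse_rule(line) for line in RULE_LINES]
print(translate("HERE", rules))       # ['5', '125']
print(translate("HEREFORD", rules))   # spelled letter by letter
```

Because the specific rule comes first, "HEREFORD" never reaches the "HERE" contraction, while a stand-alone "HERE" still does.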
Let us have yet another example, in which we will introduce another
concept: say we are converting the word
"can" into Braille. According to Standard English Grade 2
Braille rules, the word "can" needs to be converted to a single
Braille code, Dot 1,4. To facilitate this particular rule, the
Context Part of the Rule Definition could look like this:
The Text Part contains the whole word, since the whole word is
going to be replaced if the C,A and N characters are matched.
The Rule Context, however, also contains space characters on
each side of the Rule Text. This is consistent with the
English Grade 2 Braille requirements, which specify that only
a stand-alone word should be translated in this way. Now let us
consider the word "cannot": While the first three letters will
match the above Rule Text, the whole word will not match the
Rule Context and the rule will therefore fail for this word,
which is precisely what we need.
The Braille part of a LOUIS rule can be specified in two
ways: By a mixed sequence of dashes and asterisks, which
resembles the layout of a Braille keyboard, or by dot numbers.
Using the first method, a dash symbolizes a released key, while
an asterisk represents a key pressed down.
The code for "can" is Dot 1,4, therefore the Braille Part would
look like this:
--**--
The complete Rule Line for the "can" rule would then be:
| {CAN} | --**--
or
| {CAN} | 14
However, within a sentence, the word "can" can also be terminated
by a punctuation mark, rather than a space. It can also be
preceded by a punctuation mark if the word is at the start of
a line and the previous line ends with a punctuation mark.
To make the above rule infallible, we have to replace the space
characters in the Context Part by general punctuation mark
symbols. These symbols are similar in concept to wild cards. Any
punctuation mark (including space) in the converted text will be
matched against these wild card symbols.
Let's assume that the wild card for a punctuation mark is ~ (usually called "tilde"). So, for the purpose of Braille rule translation, a tilde will represent any punctuation mark.
The correct and final version of the "can" rule will then be:
|~{CAN}~| --**--
or
|~{CAN}~| 14
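How a wild card such as ~ might be matched against the text can be sketched in Python. This is an illustration only. It assumes that the start and the end of the text also count as punctuation, which the document does not state, and it uses Python's `string.punctuation` as a stand-in for whatever set of marks BrailleMaster really uses. The function names are hypothetical.

```python
import string

# Stand-in punctuation set; space is included, as the text above requires.
PUNCT = set(string.punctuation) | {" "}

def matches_symbol(symbol: str, ch: str) -> bool:
    """ch is one text character, or '' beyond either end of the text."""
    if symbol == "~":
        return ch == "" or ch in PUNCT   # assumption: text edges match too
    return ch == symbol                  # any other symbol is a literal

def rule_matches(left: str, body: str, right: str, text: str, pos: int) -> bool:
    """Check the whole context of a rule with its text part starting at pos."""
    pattern = left + body + right
    start = pos - len(left)
    for offset, symbol in enumerate(pattern):
        i = start + offset
        ch = text[i] if 0 <= i < len(text) else ""
        if not matches_symbol(symbol, ch):
            return False
    return True

# Context of the rule |~{CAN}~| split into left, text and right parts.
print(rule_matches("~", "CAN", "~", "I CAN.", 2))   # True
print(rule_matches("~", "CAN", "~", "CANNOT", 0))   # False
```

The second call fails because the character after "CAN" is the letter N, which the punctuation wild card does not accept.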
Composition Rules
The Composition Rules are exceptional because their Text Part
is empty. The Text Delimiters still exist but there is no text
between them. This is because the Composition Rules do not
replace any text by a Braille code. This task is left to the
other rules. Instead, the Composition Rules look at
transitions between parts of text.
The Composition Rules define relationships between various
groups of characters as a whole. This relationship in Braille
is usually specified by extra Braille codes inserted into text.
For example, the Composition Rules may be used to create a rule
for each whole number to be preceded by a number sign, but
not each digit within that number. Or a rule that each
upper case character should be preceded by a capital sign -
but not each upper case character within a whole word composed
of upper case letters.
To demonstrate the usage of Composition Rules, let us implement
the number sign convention for Standard English Grade 2
Braille. Let us start from the simplest requirement, that each
number should be preceded by a number sign:
Let us assume that the generalized LOUIS symbol for a digit is # (a "hash" symbol). A rule implementing this requirement will look like this:
(A Number Sign in Grade 2 Braille is Dot 3,4,5,6.)
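The idea of acting on a transition rather than on a character can be sketched in Python (this is not LOUIS syntax). The sketch covers only the simplest requirement, a sign before the first digit of each run of digits, and deliberately ignores the refinements for punctuation and decimal points discussed in this section. The function name is hypothetical.

```python
NUMBER_SIGN = "3456"   # Dot 3,4,5,6

def insert_number_signs(text: str) -> list:
    """Emit a number sign at each transition from non-digit to digit."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "   # treat text start as a space
        if ch.isdigit() and not prev.isdigit():
            out.append(NUMBER_SIGN)            # inserted, nothing is replaced
        out.append(ch)
    return out

print(insert_number_signs("ROOM 42B"))
# ['R', 'O', 'O', 'M', ' ', '3456', '4', '2', 'B']
```

Note that the sign appears once before "42", not before each digit, and that no character of the text is consumed by the check.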
We also need to cater for situations where a number
immediately follows a letter. Naturally, we do want a Number
Sign there:
Note that we have used "@" as a generalized LOUIS symbol for a letter.
If a punctuation mark, preceded by a letter, precedes a number,
we also want a number sign there. The following rule will cater
for this requirement:
But what happens if a number follows a dot, which follows a
space? This is obviously a decimal number, such as .5 and,
according to Standard English Braille needs to be translated as a
number sign, followed by a special sign for decimal point,
followed by the number.
The number sign must precede the sign for the dot. There is no
obvious way for the Composition Rules themselves to swap the
number sign and the dot sign. Indeed, what we need to do is
replace the dot with a decimal point code.
Since we are talking about replacement, we must move away from
the Composition Rules. And since we are talking about replacing a
punctuation mark, in our case, a dot, we have to add an extra
rule to the Punctuation Rules. A rule to cater for decimal
numbers starting with a decimal point, would read like this:
This rule should be included in the Punctuation Rules, preceding
a general rule for a dot.
Space Rules
Space Rules represent the first rule group of Non-Letter rules.
A space in practically all used Braille codes is represented by a
gap between Braille symbols. In LOUIS, this is expressed by
a sequence of six dashes. The simplest Space Rule will
therefore be:
However, in Standard English Grade 2 Braille, a space is
sometimes suppressed between certain words, such as between the
words "and", "for", "of", "the", "with", etc.
Say we wish to implement the rule that a space between "and" and
"for" will be suppressed. This is achieved by including the
following rule into the Space Rules:
Punctuation Rules
The Punctuation Rules can be very simple if a particular
punctuation mark is always replaced by a particular
Braille code. The situation gets more complicated if there
are context-dependent ambiguities.
For example, the rule for a colon is very straightforward indeed:
The comma, however, is slightly more complicated:
The second, general rule, accommodates any context which doesn't
satisfy the first rule.
The first rule looks at the text surrounding the comma: If the
comma is immediately preceded by a digit and immediately
followed by three digits, it is then interpreted as the comma
dividing thousands in long numbers and translated
accordingly. This happens only if there are exactly three
digits following the comma, as defined by the general
punctuation symbol following the three general digit symbols. In
the actual text, the punctuation symbol is matched against any
punctuation mark, which can be another comma or any other
punctuation mark, for example a dot at the end of the sentence;
but not another digit. Space is also considered to be a
punctuation mark.
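The context test described above can be sketched in Python. This is an illustration, not the rule itself: it assumes that the end of the text also counts as punctuation and uses `string.punctuation` plus the space as a stand-in punctuation set. The function name is hypothetical.

```python
import string

# Stand-in punctuation set; space counts as punctuation, as stated above.
PUNCT = set(string.punctuation) | {" "}

def is_thousands_comma(text: str, i: int) -> bool:
    """True if text[i] is a comma separating thousands in a long number."""
    if text[i] != ",":
        return False
    if i == 0 or not text[i - 1].isdigit():
        return False                      # must be preceded by a digit
    group = text[i + 1:i + 4]
    if len(group) != 3 or not group.isdigit():
        return False                      # must be followed by three digits
    after = text[i + 4:i + 5]
    return after == "" or after in PUNCT  # and then by punctuation, not a digit

print(is_thousands_comma("1,000 ", 1))   # True
print(is_thousands_comma("1,0000", 1))   # False: four digits follow
```

In "12,345,678." both commas pass the test, because the character after each group of three digits is itself a punctuation mark.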
The representation of punctuation marks in Braille can sometimes
provide interesting situations which need the co-operation of
Punctuation Rules and Composition Rules, in order to provide
correct translation. An example of this is the per cent sign: In
Standard English Grade 2 Braille, the per cent sign is shown as a
Braille code for a dash followed by a code for "p", before the
actual number. This requirement is satisfied by using the
Composition Rule
and the Punctuation Rule
The first rule instructs the translator to precede any number which is
followed by a per cent sign with three Braille codes, namely Dot 2,4; Dot
1,2,3,4 and Dot 3,4,5,6. The last one is the number sign. Also
note that the character \ has been used as a wild-card symbol for any
decimal number.
The co-operating Punctuation Rule contains no Braille part,
therefore the per cent sign itself will not produce any Braille
code. The rule simply means that a per cent sign will be
disregarded by the Punctuation Rules since it has been
already taken care of by the Composition Rules.
Letter Rules
The Letter Rules are divided into groups for each letter of
the alphabet. The starting letter of the Rule Text of every rule in a
letter group must match the letter in that group's Rule Header.
There are a few simple points to remember when constructing
Braille rules with LOUIS:
1. The sequence of the rule lines is important. For example,
"bb" in the middle of a word is defined as Dot 2,3 in
Standard English Grade 2 Braille. However, the "bb" in the
word "babble" should NOT be translated using the Dot 2,3
code. This is because the contraction for the following "ble"
is preferred. To cater for this situation, the general rule for
"bb" has to be preceded by a specific rule for "babble":
2. Every Letter Rule group must be terminated by a general character
rule, which represents a "fall back" position if all
preceding rules fail. For this last rule never to fail,
it must not have any left or right
context. For example, the letter "A" rules in Standard
English Grade 2 Braille must terminate with
Similarly, the letter "B" rules have to terminate with
etc.
3. The first character of the Rule Text of all rules that follow
a particular Rule Header (up to the next Rule Header or the end
of the Rule File), must be identical and match that Rule Header.
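Point 3 lends itself to an automatic check. The Python sketch below is a hypothetical helper, not part of BrailleMaster: it assumes default { and } text delimiters, upper-case rule text, and it skips everything before the first header as well as the Non-Letter Rules under "--rule".

```python
def check_headers(lines):
    """Return the 1-based numbers of rule lines whose text part does not
    start with the letter named in the preceding Rule Header."""
    errors = []
    current = None
    for number, line in enumerate(lines, start=1):
        s = line.strip()
        if not s or s.startswith(";"):
            continue                          # blank or comment line
        if len(s) == 6 and s.endswith("-rule"):
            current = s[0]                    # header such as "B-rule"
            continue
        if current is None or current == "-":
            continue                          # not inside a letter group
        if "{" not in s or "}" not in s:
            continue                          # not a rule line this sketch knows
        text = s[s.index("{") + 1:s.index("}")]
        if not text.startswith(current):
            errors.append(number)
    return errors

lines = [
    "; letter rules",
    "B-rule",
    "|@{BB}@| **----",
    "|{A}| 1",          # hypothetical misplaced rule
    "|{B}| 12",         # hypothetical fall-back rule for B
]
print(check_headers(lines))   # [4]
```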
A complete set of Braille rules constructed in LOUIS, usable for
English Grade 2 conversion, can be downloaded from here.
Advanced LOUIS
This chapter describes advanced features of the LOUIS
symbolic language. This information will be of interest to
those users who wish to develop serious Braille applications
beyond simple modifications of the factory-supplied rules.
User Definable Symbol Characters
In previous text, we have been using default delimiters and wild
card symbols, such as
LOUIS makes it possible to override these symbols with
user-defined symbols. Any character from the standard ANSI or IBM character
set can be used, except character 255.
The definition of LOUIS symbols must be done at the start of the
Rule file, using Symbol Definition instructions. The syntax of a
symbol definition instruction is as follows:
Symbol Name = Character Code
The Character Code is simply an ASCII character enclosed between
single quotes.
The following reserved symbol names can be used to replace the
default symbol definitions in basic LOUIS:
For example, in default LOUIS symbology, the rule for the word
"can" is
|~{CAN}~| --**--
Now let us add the following symbol definitions to the start of
the rule file:
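The effect of such definitions on how a rule has to be written can be sketched in Python. The sketch assumes the three definitions SymTxl='[', SymTxr=']' and SymPun='#' listed in this document and simply substitutes the new characters for the default ones in the context part of a rule. The function names are hypothetical, the result is an inference rather than output confirmed by BrailleMaster, and no check is made for clashes with other default symbols.

```python
# Default characters for the three symbols used in this sketch.
DEFAULTS = {"SymTxl": "{", "SymTxr": "}", "SymPun": "~"}

def parse_symbol_definition(line: str):
    """Split a line such as SymTxl='[' into ('SymTxl', '[')."""
    name, code = line.split("=", 1)
    return name.strip(), code.strip().strip("'")

def restate_rule(rule: str, definitions: list) -> str:
    """Rewrite the context part of a default-symbol rule with new symbols."""
    mapping = {}
    for line in definitions:
        name, char = parse_symbol_definition(line)
        mapping[DEFAULTS[name]] = char
    _, context, braille = rule.split("|", 2)
    new_context = "".join(mapping.get(ch, ch) for ch in context)
    return "|" + new_context + "|" + braille

definitions = ["SymTxl='['", "SymTxr=']'", "SymPun='#'"]
print(restate_rule("|~{CAN}~| --**--", definitions))
# prints: |#[CAN]#| --**--
```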
Word Repetition
For BrailleMaster to be the ultimate Braille production tool
for most languages, the rules must be flexible enough in
order to be able to generate perfect Braille code even in various
"exotic" Braille alphabets.
A special facility exists in LOUIS to incorporate contractions of
repetition words which occur especially in Malaysian and
Indonesian Braille. In basic LOUIS this option is not activated
and must be defined at the start of the Rule file, using the
reserved symbol name SymRpt, for example like this:
A rule, incorporating the Word Repetition symbol can then be
included in the punctuation rules:
When a repetition word is encountered in the text, for example
"orang-orang", which means "people" in the Malay language, the
translator will convert the word "orang" only once, append a
"repetition sign" to it (Dot 1,2,3,4,5,6 as defined in the above
rule) and reposition the translation cursor to the end of the
repetition word (i.e. not just after the hyphen as it would
normally). The hyphen between the two identical words will be
disregarded.
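The behaviour described above can be sketched in Python. This is an illustration of the idea only: it finds a word that is repeated across a hyphen with a regular expression, keeps the word once and appends the repetition sign. The function name is hypothetical and the word is left untranslated for brevity.

```python
import re

# A word, a hyphen, and exactly the same word again.
REPEATED = re.compile(r"\b([A-Za-z]+)-\1\b")
REPETITION_SIGN = "123456"   # Dot 1,2,3,4,5,6

def mark_repetitions(text: str) -> list:
    """Return text pieces, with each repeated word kept once plus the sign."""
    tokens = []
    cursor = 0
    for m in REPEATED.finditer(text):
        if m.start() > cursor:
            tokens.append(text[cursor:m.start()])
        tokens.append(m.group(1))        # the word, converted only once
        tokens.append(REPETITION_SIGN)   # replaces the hyphen and second word
        cursor = m.end()                 # cursor jumps past the whole repetition
    if cursor < len(text):
        tokens.append(text[cursor:])
    return tokens

print(mark_repetitions("orang-orang datang"))
# ['orang', '123456', ' datang']
```

A word that merely begins with the same letters, such as "orang-orangutan", is left untouched because the second half is not an exact repeat.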
User Definable Groups
This facility makes it possible to define not only symbol
characters, but also groups of text characters or words
represented by them.
For example, the fact that a space is disregarded between words
"and", "for", "of", "the", "with" and "a" in English Grade
2 Braille can be described as the following sequence of
seventeen rules:
There are five user-definable groups, defined by reserved symbols
SymUs1, SymUs2, SymUs3, SymUs4 and SymUs5.
Let us simplify the above group of rules by defining a group,
for example like this, using the character _ (Alt 225) as a
symbol:
The following single rule will then replace all the previous
seventeen:
The general syntax of the User Definable Group definition is as
follows:
Symbol Name = Character Code : Group
A "character code" is a character surrounded by single quotes. A
"group" is a sequence of words enclosed by single quotes and
separated by commas.
There is a beneficial side-effect in implementing user-defined
groups: Since the speed of translation depends on the number of
rules, the reduction of rules usually increases the translation
speed. This applies especially if the reduction occurs in
the Composition and Punctuation rules. (The Composition rules
add most time to the translation since they have to be
scanned at each position of the translation cursor, checking for
boundaries between words, digits, etc.)
Note that there is a limit upon the total number of characters in
each user defined group: The number is not allowed to exceed 255
(not including separating commas and quotation marks). If the
total number of characters is found to be greater, an error will
be announced by BrailleMaster during the rule initialization
process.
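The group syntax and the 255-character limit can be sketched in Python. This is an illustration under assumptions: each word of the group is taken to be individually enclosed in single quotes, the symbol character is assumed not to be a quote, colon or equals sign, and the example definition line is hypothetical. The function name is invented for the sketch.

```python
import re

MAX_GROUP_CHARS = 255   # commas and quotation marks are not counted

def parse_group_definition(line: str):
    """Parse 'Symbol Name = Character Code : Group' into its three parts."""
    name, rest = line.split("=", 1)
    code, group = rest.split(":", 1)
    symbol = code.strip().strip("'")
    words = re.findall(r"'([^']*)'", group)   # assumption: one quoted word each
    total = sum(len(word) for word in words)
    if total > MAX_GROUP_CHARS:
        raise ValueError(f"group has {total} characters, limit is {MAX_GROUP_CHARS}")
    return name.strip(), symbol, words

# Hypothetical definition line written for this sketch.
definition = "SymUs1='_':'AND','FOR','OF','THE','WITH','A'"
print(parse_group_definition(definition))
# ('SymUs1', '_', ['AND', 'FOR', 'OF', 'THE', 'WITH', 'A'])
```

A definition whose words add up to more than 255 characters raises an error here, mirroring the error the text says BrailleMaster announces during rule initialization.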
Copyright © 2004 Robotron Group
Before the arrival of BrailleMaster, programs which performed text-to-Braille conversion were based on a fixed set of rules, defined by the programmer. As Braille code is quite complex, and, like a living language, subject to evolutionary changes, it often happens that fixed-rule Braille translation programs do not follow precisely the particular Braille code which they are supposed to represent.
|{HERE}| ----*- -**-*-
|{,}| -*----
|@{BB}@| **----
SymTxl Left Text Delimiter ({ used so far)
SymTxr Right Text Delimiter (} used so far)
SymPun Punctuation (~ used so far)
SymDig Digit (# used so far)
SymNum Number (\ used so far)
SymLet Letter (@ used so far)
SymUcl Upper Case Letter (^ used so far)
SymLcl Lower Case Letter (not used yet)
SymTxl='['
SymTxr=']'
SymPun='#'
|~AND{ }FOR~|
|~AND{ }OF~|
|~AND{ }THE~|
|~AND{ }WITH~|
|~FOR{ }A~|
|~FOR{ }THE~|
|~FOR{ }WITH~|
|~FOR{ }OF~|
|~OF{ }A~|
|~OF{ }THE~|
|~OF{ }WITH~|
|~OF{ }FOR~|
|~WITH{ }A~|
|~WITH{ }THE~|
|~WITH{ }OF~|
|~WITH{ }FOR~|