An Gramadóir: Developers' Guide
No, or at least not without simulating a Unix-like environment with Cygwin. Although the end-user grammar checker Lingua::XX::Gramadoir is generated in pure Perl and will run under ActiveState Perl, the gramadoir scripts that generate it rely on bash, iconv, sed, and other standard Unix tools.
You can use whatever encoding you want. The end user will notice this choice in only one way: it becomes the default encoding for files input to the front-end script gram-xx.pl, and they need only specify the command-line option --incode to override that default. One other issue to be aware of is that the Perl regular expression engine for Unicode is two to three times slower than the 8-bit version. So if you are deciding between UTF-8 and, say, one of the ISO-8859 encodings, it is probably worth sticking with ISO-8859.
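Before settling on an 8-bit encoding, it can help to confirm that every character in your word lists is representable in it. The following is a minimal sketch in Python, not part of the gramadoir scripts; the function names `fits_encoding` and `transcode` are hypothetical and introduced only for illustration.

```python
def fits_encoding(text: str, encoding: str = "iso-8859-1") -> bool:
    """Return True if every character of text can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def transcode(data: bytes, src: str = "utf-8", dst: str = "iso-8859-1") -> bytes:
    """Re-encode bytes from src to dst.

    Raises UnicodeEncodeError if a character has no representation in dst,
    so lossy conversions fail loudly instead of silently dropping letters.
    """
    return data.decode(src).encode(dst, errors="strict")


# Irish text with acute accents fits in ISO-8859-1.
sample = "Tá an t-ainm anseo"
latin1_bytes = transcode(sample.encode("utf-8"))
```

If `fits_encoding` returns False for any entry in your lexicon, that particular 8-bit encoding is not an option and UTF-8 (or a different ISO-8859 part) is the safer choice.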
5.3. Six XML tags are reserved for use by An Gramadóir while checking grammar: B, E, F, X, Y, and Z. What do these mean?
Some of these can be seen in action in the extended example presented in Section 1.2.
E is used to mark up errors, something like this:
<E msg="PREFIXT"><T>an</T> <N pl="n" gnt="n" gnd="m">ainm</N></E>
B and Z are used to mark up ambiguous words; see Section 3.2.3 for examples.
F is used to mark up "rare" words, and should correspond to grammatical code 127 in your pos-xx.txt file.
X is used to mark up words that do not appear in the lexicon.
Y is used to mark up words that should be ignored, for instance if they appear in the user's ignore file.
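To see how the reserved tags might be picked out of marked-up text, here is a rough sketch in Python using the standard library XML parser. It is not how An Gramadóir itself processes its markup; the `summarize` function and the wrapping `<root>` element are assumptions made for illustration, and the only input taken from this guide is the E example above.

```python
import xml.etree.ElementTree as ET

# The six tags reserved by An Gramadóir while checking grammar.
RESERVED = {"B", "E", "F", "X", "Y", "Z"}


def summarize(marked_up: str) -> dict:
    """Collect the text and attributes of every reserved element.

    The fragment is wrapped in a hypothetical <root> element so that
    input with several top-level elements still parses.
    """
    root = ET.fromstring(f"<root>{marked_up}</root>")
    found = {tag: [] for tag in sorted(RESERVED)}
    for elem in root.iter():
        if elem.tag in RESERVED:
            # Join the text of the element and all of its descendants.
            text = "".join(elem.itertext())
            found[elem.tag].append((text, dict(elem.attrib)))
    return found


sample = '<E msg="PREFIXT"><T>an</T> <N pl="n" gnt="n" gnd="m">ainm</N></E>'
result = summarize(sample)
```

For the sample above, `result["E"]` holds one entry with the text "an ainm" and the attribute `msg="PREFIXT"`, while the lists for the other five tags stay empty.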