An Gramadóir: Developers' Guide
As described above in Section 1.2, An Gramadóir finds errors by first marking up the input text with grammatical information (ranging from simple part-of-speech tags to full phrase structure) and then performing pattern matching on the marked-up text. In other words, it is "rule-based", but without the limitations of a trivial pattern-matching approach like the one used by the venerable GNU diction package. The complexity of the errors that can be trapped and reported is limited only by the sophistication of the markup that is added. For Irish and the other Celtic languages, relatively little markup is required, because many of the common errors made in writing involve misuse of the initial mutations, which are determined almost entirely by local context (usually just the preceding word).
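To make the two-phase idea concrete, here is a minimal sketch in Python. It is an illustration only, not An Gramadóir's actual markup or rule format: the toy lexicon, the SGML-style tags, and the single rule (the possessive mo lenites a following noun, so "mo cara" should be "mo chara") are all simplifying assumptions.

```python
import re

# Toy lexicon mapping surface forms to part-of-speech tags.
# The real tagset and markup are far richer; this is a sketch.
LEXICON = {
    "mo": "POSS",   # possessive adjective "my"; triggers lenition
    "cara": "N",    # "friend", unlenited
    "chara": "N",   # "friend", lenited
}

def mark_up(text):
    """Phase 1: wrap each token in a simple SGML-style tag."""
    return " ".join(
        "<{t}>{w}</{t}>".format(t=LEXICON.get(w.lower(), "X"), w=w)
        for w in text.split()
    )

# Phase 2: error rules are patterns over the marked-up stream.
# This one flags a noun that fails to lenite after "mo" ("my");
# lenition inserts "h" after the initial consonant: cara -> chara.
RULES = [
    (re.compile(r"<POSS>mo</POSS> <N>([bcdfgmpst])(?!h)(\w*)</N>",
                re.IGNORECASE),
     "lenition expected after 'mo'"),
]

def check(text):
    marked = mark_up(text)
    return [(m.group(0), msg) for pat, msg in RULES
            for m in pat.finditer(marked)]

print(check("mo cara"))   # flags: lenition expected after 'mo'
print(check("mo chara"))  # no errors
```

Because the mutation is triggered by the immediately preceding word, the error rule never needs to look beyond a two-token window; this is exactly the "local context" property the paragraph above describes.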
For most other languages, creating a grammar checker with more than trivial coverage is a major undertaking, requiring syntactic analysis sophisticated enough to detect potentially "long-distance" errors like noun/verb disagreement. This is surely true for a language like English, and even more so for languages with free word order. Because of this, the traditional approach to grammar checking has been to attempt something approximating a full parse of the input text. The problem is that even for English, where there is a huge market-driven need for robust language-processing tools and huge amounts of money to be made developing them, the best parsers are right perhaps 80% of the time. This leads to brittle grammar checking and lots of false positives.
An Gramadóir is intended for use by minority and under-resourced language communities, where there is often little hope of assembling the resources (time, money, expertise) needed to tackle full-scale parsing. With this in mind, the grammar checking algorithm of An Gramadóir is designed in such a way that rules can be applied at various recursive "levels"; as a consequence, the resulting grammar checker will reflect precisely the amount of energy that is put into it. This is to be contrasted with a design requiring the construction of a complete parser, which might, if you're lucky, be correct 40-50% of the time, resulting in an essentially useless tool from the point of view of the end user. In other words, you can focus work on the parts of natural language processing generally regarded as "easy": morphology, part-of-speech tagging, noun phrase chunking, etc., postponing the "hard" parts: semantic disambiguation, prepositional phrase attachment, anaphora resolution, etc.
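The following sketch illustrates what such recursive "levels" might look like, again as a hypothetical Python illustration rather than An Gramadóir's actual machinery; the tag names, number features, and rules are invented for the example. Each level is a set of rewrites over the marked-up text, so an early level can fold a determiner and noun into a single NP chunk, and a later error rule can then treat "long-distance" noun/verb disagreement as a purely local pattern.

```python
import re

# A "level" is a list of rewrite rules applied over the marked-up
# text; the output of one level is the input of the next.  Tags,
# features, and rules here are hypothetical, for illustration only.
CHUNKING_LEVELS = [
    # Level 1: absorb an adjective into the noun ahead of chunking.
    [(re.compile(r"<ADJ>(\w+)</ADJ> <N num=(\w+)>"),
      r"<N num=\2>\1 ")],
    # Level 2: fold determiner + noun into a single NP carrying
    # the noun's number feature.
    [(re.compile(r"<DET>(\w+)</DET> <N num=(\w+)>([\w ]+)</N>"),
      r"<NP num=\2>\1 \3</NP>")],
]

# Error rules run on the fully chunked text.  Because chunking made
# the subject a single unit, disagreement between subject and verb
# is now an adjacency check: an NP immediately followed by a verb
# whose number feature differs.
ERROR_RULES = [
    (re.compile(r"<NP num=sg>[\w ]+</NP> <V num=pl>"),
     "singular subject with plural verb"),
    (re.compile(r"<NP num=pl>[\w ]+</NP> <V num=sg>"),
     "plural subject with singular verb"),
]

def apply_levels(marked):
    for level in CHUNKING_LEVELS:
        for pat, repl in level:
            marked = pat.sub(repl, marked)
    return marked

def check(marked):
    chunked = apply_levels(marked)
    return [msg for pat, msg in ERROR_RULES if pat.search(chunked)]

# "the big dogs runs" with toy POS markup already applied:
tagged = ("<DET>the</DET> <ADJ>big</ADJ> <N num=pl>dogs</N> "
          "<V num=sg>runs</V>")
print(check(tagged))  # ['plural subject with singular verb']
```

The point of the design is visible here: with no chunking levels at all, the checker still catches whatever the word-level rules cover, and each level added afterwards (adjective absorption, NP chunking, and so on) extends coverage incrementally without ever requiring a full parse.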