Assigning (tagging) the correct grammatical category to each word is a
nontrivial and very time-consuming task: the Czech language has about
5 000 000 word forms.
The DESAM corpus, which has been manually disambiguated, contains (as of
December 1997) more than 1 000 000 word forms (about 130 000 different word
forms and 1665 different tags, i.e. grammatical categories).
Our approach exploits the tagged DESAM corpus. The goal of the project is to
assist in the disambiguation process; we do not aim at fully automatic
disambiguation. Using ILP, we want to resolve only a part of the
ambiguities, though we hope a majority of them.
Given an annotated corpus, our task is to find rules for