A List by Author: Karel Pala
- home page:
Corpus-based Rules for Czech Verb Discontinuous Constituents
In this paper we present a method for extracting general structures of the verb groups from a tagged and fully disambiguated corpus and consecutive exploitation of these structures for the building a formal grammar in the Prolog DCG fashion. Our goal is to apply them as a rules for the analysis of the Czech verb groups in the non-disambiguated grammatically tagged Czech corpus texts. The problem of the recognition of verb discontinuous constituents in Czech is also approached and obtained statistical data are presented.
DESAM - Approaches to Desambiguation
This paper deals with Czech desambiguated corpus DESAM. It is a tagged corpus which was manually desambiguated and can be used in various applications. We discuss the structure of the corpus, tools used for its managing, linguistic applications, and also possible use of machine learning techniques relying on the desambiguated data. Possible ways of developing procedures for complete automatic desambiguation are considered.
Please install a newer browser for this site to function properly.