Technical Reports

A List by Author: Karel Pala


Corpus-based Rules for Czech Verb Discontinuous Constituents

by Eva Žáčková, Karel Pala. This is an adapted version of the paper accepted for publication in the Proceedings of TSD'99. August 1999, 6 pages.

FIMU-RS-99-03. Available as Postscript, PDF.


In this paper we present a method for extracting the general structures of verb groups from a tagged and fully disambiguated corpus, and the subsequent use of these structures for building a formal grammar in the Prolog DCG fashion. Our goal is to apply them as rules for the analysis of Czech verb groups in non-disambiguated, grammatically tagged Czech corpus texts. The problem of recognizing discontinuous verb constituents in Czech is also addressed, and the statistical data obtained are presented.

DESAM - Approaches to Disambiguation

by Karel Pala, Pavel Rychlý, Pavel Smrž, December 1997, 12 pages.

FIMU-RS-97-09. Available as Postscript, PDF.


This paper deals with DESAM, a disambiguated Czech corpus. It is a tagged corpus that was manually disambiguated and can be used in various applications. We discuss the structure of the corpus, the tools used to manage it, its linguistic applications, and the possible use of machine learning techniques that rely on the disambiguated data. Possible ways of developing procedures for fully automatic disambiguation are also considered.

Responsible contact: unix(atsign)fi(dot)muni(dot)cz