Technical Reports

A List by Author: Pavel Smrž

Automatic Processing of Czech Inflectional and Derivative Morphology

by Radek Sedláček, Pavel Smrž. This is an extended version of the paper to be published in the Proceedings of the Fourth International Conference TSD 2001, LNAI 1902, Pilsen, Czech Republic, September 2001, Springer-Verlag. June 2001, 12 pages.

FIMU-RS-2001-03. Available as Postscript, PDF.


This paper deals with the effective implementation of the new Czech morphological analyser ajka, which is based on an algorithmic description of Czech formal morphology. First, we present the two most important word-forming processes in Czech: inflection and derivation. We also briefly describe the data structures used for storing morphological information and discuss the efficient storage of lexical items (stem bases of Czech words). Then we describe the morphological analysis algorithm in detail, and finally we present some interesting features of the designed and implemented system ajka together with current statistical data.

Finding Semantically Related Words in Large Corpora

by Pavel Smrž, Pavel Rychlý. Slightly modified version of the paper published in the Proceedings of TSD 2001, Pilsen, Czech Republic. June 2001, 9 pages.

FIMU-RS-2001-02. Available as Postscript, PDF.


The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss measures of semantic relatedness of basic word forms and describe the treatment of collocations. Next, we present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show a form of output presentation that facilitates the inspection of the resulting word clusters.

Off-line Recognition of Cursive Handwritten Czech Text

by Pavel Smrž, Štěpán Hrbáček, Michal Martinásek, February 1998, 8 pages.

FIMU-RS-98-02. Available as Postscript, PDF.


In this paper, a part of a system for recognising off-line cursive Czech text is presented. Recently, various systems for the recognition of cursive English text have been developed; however, to our knowledge, no method has yet been presented for Czech, a language rich in diacritic marks. This paper deals with the preprocessing, which differs for Czech and English handwritten texts. To find the letter boundaries, a method based on minimising a heuristic cost function is used.

DESAM - Approaches to Desambiguation

by Karel Pala, Pavel Rychlý, Pavel Smrž, December 1997, 12 pages.

FIMU-RS-97-09. Available as Postscript, PDF.


This paper deals with the Czech disambiguated corpus DESAM. It is a tagged corpus that was manually disambiguated and can be used in various applications. We discuss the structure of the corpus, the tools used for managing it, linguistic applications, and the possible use of machine learning techniques relying on the disambiguated data. Possible ways of developing procedures for fully automatic disambiguation are considered.

Navigation and Information System for Visually Impaired People

by Ivan Kopeček, Pavel Smrž, May 1997, 7 pages.

FIMU-RS-97-05. Available as Postscript, PDF.


Orientation is one of the most important problems for visually impaired people. The aim of this paper is to suggest a contribution to the solution of this problem using computer technology. The basic idea is to detect motion and orientation using sensors and then to identify the position. The detected trajectory is compared with a map and corrected by means of the algorithm described in the paper. Some problems concerning sensor detection of human motion are also discussed. Based on the determined position, other relevant information is provided to the user of the system (information describing the neighbourhood of the current position, the optimal route to the chosen destination, possible warnings).

Word Hy-phen-a-tion by Neural Networks

by Pavel Smrž, Petr Sojka, August 1996, 10 pages.

FIMU-RS-96-04. Available as Postscript, PDF.


We discuss our experiments in training a feedforward neural network to find possible hyphenation points in all words of a given language. Neural networks prove to be a good tool for solving this difficult problem. The structure of the multilayer neural network used is given, together with a discussion of training sets, the influence of input coding, and the results of experiments done for the Czech language. We conclude with the pros and cons of the approach tested: a hybrid architecture suitable for a multilingual system.
