Lexithiras: Multi-lingual Corpus Lexicography on PCs
Evangelos Dermatas, George Kokkinakis (University of Patras)
A multi-lingual framework for corpus lexicography has been implemented on low-cost PCs, improving the lexicographer's productivity and the quality of the work through novel statistical processing and information retrieval tools which are applied to large, linguistically annotated corpora. An automatic algorithm has been used to annotate the entire corpus with Part-of-Speech (POS) tags and to lemmatize the corresponding lexicon entries through a semi-automatic stochastic clustering method. Off-line corpus processing tools assist lexicographers in defining the headwords of the dictionary, while the compilation process is performed with the aid of on-line corpus processing tools.
Design and Implementation of a Dialog Manager
Jana Krutisová, Václav Matousek, Jana Ocelíková (Faculty of Applied Sciences, University of West Bohemia)
This paper deals with a methodology for developing spoken dialog strategies used in multimodal information service systems. We briefly describe the initial conditions and resources of this development, which draws on the wide experience of the main European universities and institutes working in this problem area, and then offer some concluding remarks on the program implementation of a successfully developed generic dialog manager for the multilingual information retrieval dialog system.
Dialog Based Programming
Ivan Kopecek (Faculty of Informatics, Masaryk University)
The architecture and basic principles of the fully conversational programming system DIALOG are described in the paper. The conversational approach based on speech synthesis and recognition inspires new ideas and viewpoints for the programming language that is based on the natural tree structure of the program rather than on the usual linear structure. The language also substantially suppresses syntactical errors and fully integrates facilities for editing and debugging of the program.
Speech Synthesis Based on the Composed Syllable Segments
Ivan Kopecek (Faculty of Informatics, Masaryk University)
The idea of the composed syllable segments for syllable based concatenative speech synthesis is described in the paper. The use of the composed syllable segments essentially reduces the number of elementary segments that have to be extracted from samples to create the speech segment database. The experiments show that the quality of the synthesized speech that is obtained by using composed syllable segments is very good even in comparison to the speech synthesis that uses the full syllable segment database. Prosodic segments structure of the implementation is also briefly mentioned in the paper.
Using Dialog Programming for Developing the Speech Oriented Hypertext System AUDIS
Ludek Bártek (Masaryk University, Brno), Robert Batusek (Masaryk University, Brno), Marek Fikera (Masaryk University, Brno), Ivan Kopecek (Masaryk University, Brno)
The concept of the hypertext system AUDIS, oriented to blind users, is presented in the paper in connection with the conversational programming language Dialog, in which the hypertext system is programmed. The advantages of this architecture are discussed, also in connection with the tree structure of the hypertext document.
Mutual Information in Czech Corpus ESO
Karel Pala, Pavel Rychlý (Faculty of Informatics, Masaryk University)
The development of the corpus ESO containing at the present moment 61,001,371 Czech word forms as well as the existence of the lemmatizer (tagger) LEMMA has made it possible to obtain the basic data concerning MI score for Czech. In this paper the first results of this enterprise are presented. Though the evaluation of the obtained data is still preliminary it is not difficult to see that they can be very useful in several directions, especially in the area of the lexicographical explorations of Czech.
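As an illustration of the kind of score involved, the MI score of a word pair is commonly computed as log2(P(x, y) / (P(x) P(y))). The following is a minimal sketch under that assumption, using adjacent-word pairs and relative-frequency estimates; the function name `mi_scores` and its `min_count` parameter are hypothetical and are not taken from the paper or from the ESO tools.

```python
import math
from collections import Counter

def mi_scores(tokens, min_count=1):
    """Return {(w1, w2): MI} for adjacent word pairs in `tokens`.

    MI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))), with all
    probabilities estimated as relative frequencies.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs too rare to give a stable estimate
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

On a real corpus a `min_count` threshold above 1 is usually advisable, since MI tends to overrate pairs that occur only once or twice.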
Developing a Language Independent Interactive Bible Concordance on the World Wide Web
J. Gareth Evans (Swansea Institute of Higher Education)
This paper describes work in progress on the development of an interactive WWW-based concordance for Y Beibl Cymraeg Newydd (The New Welsh Bible). The system is designed so that any textual data is separate from program code, enabling bible concordances in other languages to be implemented easily. The concordance offers a high degree of functionality, including verse searches based on word forms as well as lemmata, search range specification, boolean searches, and immediate searches based on words in the output text. Although the system has been developed specifically for biblical texts, it can also be applied to a range of other literary works.
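The search functions listed above can be pictured with a small inverted index. This is only a sketch under assumptions: the function names `build_index` and `search_all`, the punctuation handling and the form-to-lemma dictionary are all hypothetical and do not describe the system's actual implementation.

```python
from collections import defaultdict

def build_index(verses, lemma_of=None):
    """Map each word form, and its lemma when known, to a set of verse ids.

    `verses` maps a verse id to its text; `lemma_of` is an assumed
    form-to-lemma dictionary (forms without an entry act as their own lemma).
    """
    lemma_of = lemma_of or {}
    index = defaultdict(set)
    for ref, text in verses.items():
        for form in text.lower().split():
            form = form.strip(".,;:!?")
            if not form:
                continue
            index[("form", form)].add(ref)
            index[("lemma", lemma_of.get(form, form))].add(ref)
    return index

def search_all(index, terms, by="form"):
    """Boolean AND search: ids of verses that contain every term."""
    sets = [index.get((by, term.lower()), set()) for term in terms]
    return set.intersection(*sets) if sets else set()
```

Keeping the texts and the lemma table as plain data passed into the functions mirrors the separation of textual data from program code that the abstract describes.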
Cepstral Trajectory Transformation for Subword Recognition
Tibor Fegyó, Péter Tatai (Technical University of Budapest)
In this paper we introduce the pattern matching part of an open vocabulary speech recognition system. We focus on the cepstral trajectory transformation (CTT) which is applied for speaker dependent, demi-syllable based recognition. The basic CTT method is used with linear time warping. We also introduce extensions, which use reference averaging and nonlinear time alignment to achieve further improvements in the recognition rate. These methods are implemented and compared with the conventional dynamic time warping method.
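For readers unfamiliar with the baseline, conventional dynamic time warping can be sketched as below. This illustrates only the comparison method, not CTT itself, and it assumes a Euclidean local distance and the basic three-way step pattern; the function name `dtw_distance` is illustrative.

```python
import math

def dtw_distance(ref, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each element of `ref` and `test` is one frame (a sequence of floats,
    e.g. cepstral coefficients); all frames must have the same dimension.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a test frame absorbs one more ref frame
                cost[i][j - 1],      # a ref frame absorbs one more test frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

In a template-based recognizer the test utterance would be compared with every stored reference and assigned the label of the reference with the lowest cost.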
The Romanian ROMVOX Text-to-Speech Synthesis System Using a Mixed Time Domain-LPC Pitch Alteration Technique
Attila Ferencz, István Nagy, Tünde-Csilla Kovács, Maria Ferencz, Teodora Ratiu (Software ITC), Diana Zaiu (Technical University of Cluj-Napoca)
Taking into account that waveform coding (time domain) methods assure a maximum level of intelligibility and naturalness of the synthesized speech, and that superimposing prosodic effects requires altering the pitch (frequency domain), we developed a hybrid time domain-LPC method. This paper presents some theoretical considerations, signal processing aspects and test results of the new synthesis method developed for the ROMVOX TTS system.
Pushing forward the Interface between Recognition and Understanding
How to Integrate Syntactic Structure into the Output of a Word Recognizer
F. Gallwitz, A. Batliner, J. Buckow, R. Huber, H. Niemann, E. Nöth (Universität Erlangen-Nürnberg)
In this paper we present an integrated approach for recognizing both the word sequence and the syntactic-prosodic structure of a spontaneous utterance. We take into account the fact that a spontaneous utterance is not merely an unstructured sequence of words by incorporating phrase boundary information into the language model and by providing HMMs to model boundaries. This allows for a distinction between word transitions across phrase boundaries and transitions within a phrase. During recognition, the syntactic-prosodic structure of the utterance is determined implicitly. Without any increase in computational effort, this leads to a 4% reduction of word error rate, and, at the same time, syntactic-prosodic boundary labels are provided for subsequent processing. The boundaries are recognized with a precision and recall rate of about 75% for both. They can be used to reduce drastically the computational effort for parsing spontaneous utterances, as has been shown in the German project.
Interfacing of CASA and Multistream Recognition
Hervé Glotin (ICP Grenoble, IDIAP Martigny), Frédéric Berthommier (ICP Grenoble), Emmanuel Tessier (ICP Grenoble), Hervé Bourlard (IDIAP Martigny)
In this paper we propose a running demonstration of the coupling between an intermediate processing step (named CASA), based on the harmonicity cue, and partial recognition, implemented with an HMM/ANN multistream technique. The model is able to recognise words corrupted with narrow band noise, either stationary or having variable center frequency. The principle is to identify, frame by frame, the noisiest of four subbands by analysing an SNR-dependent representation. A static partial recogniser is fed with the remaining subbands. We establish on NUMBERS93 the noisy-band identification (NBI) performance as well as the word error rate (WER), and alter the correlation between these two indexes by changing the distribution of the noise.
New Tools for Disambiguation of Czech Texts
Pavel Smrz, Eva Zácková (Faculty of Informatics, Masaryk University)
This paper deals with semi-automatic disambiguation of Czech annotated texts based on partial syntactic analysis. The strengths and weaknesses of implemented rule-based disambiguation tools are discussed and obtained results are presented together with frequency statistics of different types of analysed noun groups.
An Approach of Relevance Theory Modelling in Question Answering Systems
Guylaine Gonel (Universite Paris 13), Bernard Levrat (LERIA, Paris)
We present research aimed at modelling interactions in cooperative dialogues solely on the basis of a set of criteria that render some aspects of relevance as introduced in ``Relevance Theory''. Using this theory, originally introduced in the domain of human dialogue, makes it possible to take into account a measure of the relevance of answers based on complexity of treatment and resource utilisation: an answer to a question produces contextual effects on the questioner, at the price of some effort to obtain them. Modelling relevance should allow the selection of relevant answers on the basis of the best compromise between the effects of the information an answer conveys to the listener and the cost of processing it.
Tagging a Morphologically Rich Language
Marcelo Finger (Instituto de Matemática e Estatística, University of Sao Paulo)
Building large annotated corpora, such as is the case of the Tycho Brahe Corpus of Historical Portuguese, is only feasible if we use automatic methods for such tasks as part of speech tagging. The best automatic tools for part of speech tagging described in the literature were developed and tested for English.
However, the morphological richness of Portuguese forces us to use a number of tags several times larger than that used for English. An analysis of the complexity of the algorithm shows a prohibitive inefficiency resulting from the adoption of a much larger number of tags.
In this work, we propose a new, two-step approach for tagging texts of morphologically rich languages. We describe how the design of tags is affected by this method, and how the existing techniques must be adapted to deal with the greater number of tags found in morphologically rich languages.
A Concept for a Prosodically and Statistically Driven Chunky Semantic Parser
Jürgen Haas, Manuela Boros, Elmar Nöth, Volker Warnke, Heinrich Niemann (University of Erlangen-Nürnberg)
In spoken dialog systems, typically just a small set of predefined information has to be provided to the system in order to accomplish its task. We present here a concept for a partial (chunky) semantic parser, whose task is to detect the parts of a user utterance that contain needed information and to analyze these parts linguistically. In order to support the parser, we introduce statistical and prosodic methods to predict semantic concepts in an utterance and their location.
Prague Dependency Treebank: From analytic to tectogrammatical annotations
Eva Hajicová (Faculty of Mathematics and Physics, Charles University)
The Prague Dependency Treebank is conceived of as an annotated corpus of written Czech, comprising three layers of annotations: (i) morphemic layer with about 3000 morphemic tag values; a tag is assigned to each word form of the sentence in the corpus; (ii) analytic tree representations with every word form and punctuation mark explicitly represented as a node of a rooted tree, with no additional nodes added and with the edges of the tree corresponding to (surface) dependency relations, (iii) tectogrammatical trees (underlying sentence representations). A more detailed description of the structure and contents of the tectogrammatical annotations and a specification of the transition from the analytic tree to the tectogrammatical layer is given in the present contribution.
Partial Parsing and Repairs for Spoken Input
Ariane Halber (ENST Paris, Thomson-CSF)
Experiments point to the need for a robust grammatical analysis of recognition results. Robust parsing can, first, help identify the incorrect recognitions that are recoverable and, second, produce a relevant analysis for the interpretation process. We review existing robust parsing techniques and propose an approach based on Lexicalized Tree Grammars.
Partial Syntactic Analysis as a Tool for Semiautomatic Verb Valencies Acquisition And Checking
Pavel Smrz, Ales Horák (Faculty of Informatics, Masaryk University)
In our paper we discuss an approach to semiautomatic corpus processing aimed at verb valencies acquisition and validation of existing verb valency lists. The approach is based on the technique of partial syntactic analysis using a special kind of LALR grammar processing tool.
Epos -- A New Approach to the Speech Synthesis
Jirí Hanika (Faculty of Arts, Charles University), Petr Horák (Institute of Radio-Engineering and Electronics, Academy of Sciences)
Epos is a new language independent rule-driven Text-to-Speech (TTS) system primarily designed to serve as a research tool. We describe its modular structure and the philosophy of the Text-to-Speech Control Protocol used to coordinate the TTS processing.
Technical Insights into the Birth of a Corpus: A Few Words about Building the Czech National Corpus (CNC)
The CNC is a large collection of Czech texts in SGML format. These texts are processed by the software concordance tool CQP (Corpus Query Program) developed in Stuttgart. By building the CNC I mean the conversion of texts from their original format to the common format that is readable by CQP. Texts in the CNC are taken either from written or from spoken sources. Only the former will be described in what follows, because the spoken language has not been converted to the SGML format so far. There already exists quite a large collection of spoken Czech that has been transcribed into electronic form, but its conversion has not yet been solved satisfactorily.
The LPC Analysis and Synthesis of F0 Contour
Petr Horák (Institute of Radio-Engineering and Electronics, Academy of Sciences)
The naturalness of synthetic speech is determined mainly by the synthesis method, the unit inventory, the phonetic transcription and, last but not least, the modelling of prosody contours. Most present text-to-speech (TTS) systems model prosody contours by rules; some use neural nets. This contribution deals with linear prediction analysis and synthesis of the F0 contour and its use in improving the design of prosody rules and the neural-net modelling of prosody.
Textual Deep Structure
Viatcheslav Iatsko (State University of Khakasia)
Textual deep structure comprises diachronic, synchronic, and causative-consecutive logical relations constituting the relational aspect of discourse, which should be differentiated from the communicative aspect comprising lexical and grammatical manifestations of logical relations in the surface structure of discourse. Three types of correlation between the communicative and relational aspects of discourse are distinguished: 1) non-correspondence (contradiction), 2) correspondence, 3) inexplicability of logical relations between judgements in the communicative aspect. The notion of textual deep structure is connected with the notion of the standard structure of a proposition.
An Overview of the Slovenian Spoken Dialog System
Ivo Ipsic, France Mihelic, Simon Dobrisek, Jerneja Gros, Nikola Pavesic (Ljubljana University)
In this paper we give an overview of a Slovenian spoken dialog system, developed within a joint project in multilingual speech recognition and understanding. The aim of the project is the development of an information retrieval system, capable of having a dialog with a user. The paper presents the work on the development of the Slovenian spoken dialog system. Such a system has to be able to handle spontaneous speech, and to provide the user with correct information. The information system being developed for Slovenian speech is used for air flight information retrieval. The system has to answer questions about air flight connections and their time and date.
In the paper we present the developed modules of the Slovenian system and show some results with respect to word accuracy, semantic accuracy and dialog success rate.
Detection of Sentence Types by the Integrated Prosody Module
Jana Klecková, Václav Matousek (University of West Bohemia, Plzen)
The contributed paper describes one of the parts of the subsystem of the multi-lingual multi-functional information retrieval dialog system developed within the frames of the European research program Copernicus.
The described prosody module was developed especially for the first continuous Czech speech processing, recognition and understanding system. Its architecture is based on the application of the construction principles used in recognizers of West-European languages and on their generalization for the processing of Czech sentences, which, in contrast to German or English, do not keep any fixed word order.
Developing a Model of Dialog Strategy
Mare Koit, Haldur Õim (University of Tartu)
The paper gives a short survey of a dialog system we have implemented at the University of Tartu. Its work is based on a dialog model. The main attention has been paid to one block, the dialog manager. We consider dialogs in which the computer is the active participant of the interaction and its communicative goal is to bring the user to a decision to perform an action. The computer uses a partner model (its beliefs about the user's evaluations of various aspects of the action) and tries to influence the user's reasoning process by implementing a communicative strategy and the tactics of enticing, convincing and threatening, and by correcting the underlying model. A dialog is generated jointly by the user and the computer.
Automatic Generation of Instructions in a Multilingual Environment
Ivana Kruijff-Korbayová (Faculty of Mathematics and Physics, Charles University)
The paper describes work done in the context of an international project called AGILE (Automatic Generation of Instructions in Languages of Eastern Europe). The overall aim of the project is to develop a multilingual system for generating continuous instructional texts in Bulgarian, Czech and Russian. The project is concerned mainly with (i) the development or adaptation of linguistic resources for the chosen languages, and (ii) the investigation and specification of text structuring strategies employed in those languages for the given type of texts.
Generating Spoken Utterances from Concepts and Prosodic Schemes
Pierre Larrey, Nadine Vigouroux, Guy Pérennou (IRIT)
The utterances generated by current dialogue systems lack variability and portability. We present a two-level model for generating spoken utterances using prosodic commands: conceptual segments carrying local phonological prosodic schemes are instantiated in global conceptual schemes for the whole utterance. Prosodic operators allow modification of the segments' properties. The method generates speech under semantic and pragmatic constraints, allowing easy modification of prosody. Moreover, the mark-up languages introduced can serve as an interface between speech synthesis and natural language generation.
Users' Behaviours in Spontaneous Oral Dialogue Strategy Design
Carine-Alexia Lavelle, Martine de Calmès, Guy Pérennou (IRIT)
During the last evaluations of the DEMON automatic telephonic inquiry system for train schedule information, we analysed users' behaviour to see how these observations could help to redesign the dialogue strategy. We specifically focused on users' utterances when they had to correct statements of the system. We were interested in the behaviour of expert, as opposed to casual, users. Two experiments were therefore performed: a habituation test and an experiment with computer people aware of automatic inquiry systems. Results of both tests will be shown and compared to previous experiments performed with railway users. They will be discussed with regard to their implications for dialogue strategy design.
A Voice Dialing System for Mobile Phones
Bálint Lükö (Technical University of Budapest)
In this paper a voice dialer system is presented. With this system one can dial conveniently by saying the phone number or the name of the desired subscriber. The user communicates with the system in the form of a simple dialog, using isolated words only. The system is speaker dependent and needs a short initial training. The paper describes both the recognizer engine and the dialog procedures.
Integrating and Evaluating WSD in the Adaptation of a Lexical Database in Text Categorization Task
L. Alfonso Ureña (Universidad de Jaén), M. De Buenaga (Universidad Europea de Madrid), M. García (Universidad de Jaén), J.M. Gómez (Universidad Europea de Madrid)
Improvement in the accuracy of identifying the correct word sense (word sense disambiguation, WSD) will give better results for many natural language processing tasks. In this paper, we present a new approach using WSD as an aid for Text Categorization (TC). This approach integrates a set of linguistic resources as knowledge sources. Our approach to TC, which uses the Vector Space Model, integrates two different resources in text content analysis tasks: a lexical database (WORDNET) and training collections (Reuters-21578). We present the application of WSD to TC. Specifically, we apply WSD to resolve the ambiguity of categories in WORDNET, thereby complementing the training phases. We have carried out experiments to evaluate the improvements obtained by integrating the resources in the TC task and by applying WSD in this task, obtaining high accuracy in disambiguating the WORDNET senses of categories.
Connection Driven Parsing of Lexicalized TAG
Patrice Lopez (University of Nancy)
This paper presents a new kind of algorithm for parsing lexicalized TAG. It considers connected anchors in order to eliminate hypotheses as early as possible during the analysis. The algorithm proceeds bottom-up in a bidirectional fashion. It has been designed to be applied to natural language and to large grammars, and it permits local parsing. Even though it does not improve the worst-case time complexity of classical TAG parsers, in practice it offers good average-case efficiency and considerable flexibility.
Construction of Structural Thematic Summary of Text
Natalia V. Loukachevitch, Boris V. Dobrov (Scientific Research Computer Center of Moscow State University)
We propose a new form of text summarization: a structural thematic summary. A structural thematic summary represents the contents of a text by indicating its main topics, which are modelled by sets of terms corresponding to these topics. It comprises the most informative fragments of the thematic representation of a text, which contains all terms of the text divided into thematic nodes and the interrelations between the various topics and subtopics of the text. A structural thematic summary can represent the contents of documents of any size and of a great variety of genres. The language of the documents and of the corresponding structural thematic summaries can be Russian or English.
Speaker Verification Using Vector Quantisation
Eugen Lupu, Petre Pop, Gavril Toderean (Technical University of Cluj-Napoca)
The paper presents the results of some experiments on speaker verification using single section vector quantisation. In this approach we consider the speaker of a particular utterance as an information source that can be modeled using the standard source coding method called vector quantisation. In vector quantisation (VQ) each source vector is coded as the one of a prestored set of codewords that minimizes the distortion between itself and the source vector. For speech, a VQ codebook is obtained from a training sequence containing typical speech. In speaker verification we represent each speaker by a VQ codebook designed from a training sequence composed of repetitions of a particular utterance. The same utterance is used later by an unknown speaker who claims an identity. This test utterance is coded using the speaker codebook, and the resulting quantisation distortion is compared with a threshold. If the distortion is below the threshold, the speaker is accepted. For our experiments we developed an environment for speech analysis in order to extract speech parameters such as linear prediction coefficients, averaged spectrum in 17 bands or cepstral coefficients. In these experiments the LPC parameters were used. To obtain the codebooks for each speaker, the LBG (Linde - Buzo - Gray) algorithm was implemented. In our experiments codebooks of 4-8-16 dimension were used. The speech material consisted of utterances of Romanian vowels.
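The accept/reject step described above can be sketched as follows. This is a minimal illustration that assumes a squared Euclidean distortion measure and a codebook that has already been trained (the LBG training itself is omitted); the function names `avg_distortion` and `verify` are hypothetical, and the abstract does not state which distortion measure the authors used.

```python
def avg_distortion(vectors, codebook):
    """Mean squared Euclidean distance from each vector to its nearest codeword."""
    total = 0.0
    for vec in vectors:
        # Quantise the frame: pick the codeword with the smallest distortion.
        total += min(
            sum((a - b) ** 2 for a, b in zip(vec, codeword))
            for codeword in codebook
        )
    return total / len(vectors)

def verify(vectors, codebook, threshold):
    """Accept the claimed identity if the average distortion is below the threshold."""
    return avg_distortion(vectors, codebook) < threshold
```

Here `vectors` stands for the per-frame parameter vectors of the test utterance (for example LPC-derived parameters) and `codebook` for the codebook of the claimed speaker; the threshold would have to be tuned on held-out data.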
EULER: Multi-Lingual Text-to-Speech Project
T. Dutoit, F. Malfrère, V. Pagel (Faculte Polytechnique de Mons), P. Mertens (Kath. University Leuven), M. Bagein, A. Ruelle, A. Gilman (Faculte Polytechnique de Mons)
The aim of the project presented in this paper is to obtain a set of text-to-speech synthesizers for as many voices, languages and dialects as possible, free to use for noncommercial and non-military applications. This project is an extension of the MBROLA project. MBROLA is a speech synthesizer that is freely distributed for noncommercial purposes. A multi-lingual speech segmentation and prosody transplantation tool called MBROLIGN has also been developed and freely distributed. Other labs have also recently distributed important speech synthesis tools for free, such as Festival from the University of Edinburgh or the MULTEXT project of the University of Provence. The purpose of this paper is to present the EULER project, which will try to integrate all these results, to potential Eastern European partners, so as to increase the dissemination of the important results of the MBROLA and MBROLIGN projects and to stimulate East/West collaboration on TTS synthesis.
Text Understanding and Interpreting Methodologies Based on Hermeneutic and Linguistic Techniques
Sergey V. Chebanov (Russian Academy of Science), Gregory Y. Martynenko (St. Petersburg State University), Tatiana Y. Sherstinova (St. Petersburg State University)
Improving the quality of existing text interpreting methodologies demands a unified conception of the interpretation process that takes advantage of diverse approaches. Five general types of language conceptions are stated and briefly described. The evident methodological rapprochement currently observed between contemporary hermeneutics and applied linguistics suggests synthesizing both techniques in an advanced interpretation methodology. Basic theoretical principles for the development of such a methodology are proposed.
``Natural Speaking'' Information Retrieval Dialog System
This paper presents an approach to managing spoken dialogs in information service systems. We briefly describe how the approach based upon a tri-partite model of interaction addresses the problems of cooperativeness and portability across language and task domains as well as the semantic interpretation process of utterances in a spoken dialog system for train timetable inquiries. The described approach has been implemented in the generic dialog manager of the SQEL multilingual information retrieval dialog system.
Automatic Phonemic Segmentation of Brazilian Portuguese Speech Databases
Henrique F. Nunes (Zetax Tecnologia Brasil), Edson J. Nagle, Cairo H. da Silva, Fernando Runstein (Fundação Centro de Pesquisa e Desenvolvimento en Telecomunicações, Brasil)
This work describes an automatic phonemic segmentation and labeling system for continuous speech sentences spoken in Brazilian Portuguese. The system is based on Hidden Markov Models. The speech sentences were modeled using phones as basic units. A single model per phone was used regardless of its context. We used 37 basic phone models in all. Results obtained with this system will be shown in this paper.
Automatically Derived Speech Units: Applications to Very Low Rate Coding and Speaker Verification
Jan Cernocký (FEI VUT Brno), Geneviève Baudoin (ESIEE Paris), Dijana Petrovska-Delacrétaz (DE-CIRC, EPFL Lausanne), Jean Hennebert (DE-CIRC, EPFL Lausanne), Gérard Chollet (ENST Paris)
Current systems for recognition, synthesis, very low bit-rate (VLBR) coding and text-independent speaker verification rely on sub-word units determined using phonetic knowledge. This paper presents an alternative to this approach: determination of speech units using ALISP (Automatic Language Independent Speech Processing) tools. Experimental results for speaker-dependent VLBR coding are reported on two databases: an average rate of 120 bps for unit encoding was achieved. In verification, this approach was tested during 1998's NIST-NSA evaluation campaign with an MLP-based scoring system.
Using Local Grammars for Agreement Modeling in Highly Inflective Languages
Goran Nenadic, Dusko Vitas (Faculty of Mathematics, University of Belgrade)
In this paper we present an extended approach to the description and modeling of obligatory agreements in a highly inflective language in a way that can be used in NLP. The approach is demonstrated using obligatory correspondences (i.e. agreements) in noun phrases in Serbo-Croatian. We define the notion of a generic mark-up as an extension of the notion of local grammars. Using a standard regular expression mechanism, these local grammars can be applied to an initially tagged text corpus for checking agreements or for non-trivial lexical structure recognition (e.g. recognition of complex noun phrases). By defining generic mark-ups one can model the agreements in various lexical structures in a highly inflective language like Serbo-Croatian.
A Natural Language Web-based Dialogue System with a Talking Face
Anton Nijholt, Mathieu van den Berk, Arjan van Hessen (University of Twente)
In this paper we discuss our research on interactions in a virtual theatre that has been built using VRML and can therefore be accessed through Web pages. In the virtual environment we employ two agents. Presently, our WWW-based virtual theatre allows navigation input through keyboard and mouse. A navigation agent that allows speech input is in development. We also have an information agent which allows a natural language dialogue with the system, where the input is keyboard-driven and the output is both screen and (speech) synthesizer based. The system's spoken dialogue contribution is presented by visual speech; that is, a simple `talking face' on the screen mouths the system's questions and responses.
The Improvement of Common Statistical Measure
Pavel Rychlý (Faculty of Informatics, Masaryk University)
This paper proposes modifications of common statistical measures such as the frequency of word forms or bigrams and the mutual information ratio. These modifications give better results, especially for small corpora.
It also proposes an application of the modified frequency that can find specific words in a corpus which can be candidates for one-word terms.
Partial Parsing Method Applied to Rules Acquisition for Medical Expert System
Maciej Piasecki, Jerzy Sas (Wroclaw University of Technology)
The paper presents a variant of the partial parsing method (PPM) applied to the acquisition of expert rules from a Polish medical text. PPM is based on the premise that the knowledge domain is already defined by a knowledge engineer (i.e. names for classes, attributes, values etc.). The definitions are automatically translated from natural language into formal expressions stored partially in the knowledge base and partially in the semantic dictionary. PPM preserves the compositionality principle and is based on the sublanguage method. Subsequent sentences are scanned for occurrences of words belonging to subcategories. Parsing is used for the recognition of compound phrases.
Generating Intonation Contours Using Tonal Specifications
Hannes Pirker, Erhard Rank (ÖFAI Vienna), Harald Trost (University of Vienna)
We present a novel approach to intonation modelling for speech synthesis based on a two-layer technique. The generator component of a concept-to-speech system produces an abstract phonological representation of intonation based on GToBI interpreting the linguistic and discourse information available. This abstract representation must be translated into concrete acoustic parameters. The paper describes how this mapping is achieved with the use of stylized F0 contours.
Recognition of a Weather Forecast Transmitted by the Czech Radio Broadcasting
Ludek Müller, Josef Psutka (University of West Bohemia)
This paper presents a speech recognition system used to transcribe Czech weather forecasts broadcast on the radio. The system is based on HMMs with continuous Gaussian mixture densities and is designed to be speaker independent. A language model is used to decrease the speech recognition error rate. A good trade-off between speed and error rate is achieved through a Viterbi decoder with beam pruning. Recognition results are presented for speech collected from weather forecasts on Radio FM, and the contribution of the language model is discussed.
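The idea of Viterbi decoding with beam pruning can be sketched on a toy discrete HMM. This is only an illustration under stated assumptions: the two states, the probability tables and the names `viterbi_beam` and `to_log` are hypothetical, and the system described above uses Gaussian mixture densities rather than discrete emissions.

```python
import math

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: (to_log(v) if isinstance(v, dict) else math.log(v))
            for k, v in table.items()}

def viterbi_beam(observations, states, log_start, log_trans, log_emit, beam_width):
    """Return the best state path, pruning hypotheses that fall outside the beam."""
    # Each active hypothesis maps a state to (log score, path so far).
    active = {s: (log_start[s] + log_emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = max(score for score, _ in active.values())
        # Beam pruning: drop hypotheses scoring more than beam_width below the best.
        active = {s: v for s, v in active.items() if v[0] >= best - beam_width}
        expanded = {}
        for s in states:
            candidates = [
                (score + log_trans[prev][s] + log_emit[s][obs], path + [s])
                for prev, (score, path) in active.items()
            ]
            expanded[s] = max(candidates, key=lambda c: c[0])
        active = expanded
    return max(active.values(), key=lambda v: v[0])[1]

# Hypothetical toy model (not taken from the paper).
states = ["rain", "sun"]
log_start = to_log({"rain": 0.5, "sun": 0.5})
log_trans = to_log({"rain": {"rain": 0.7, "sun": 0.3},
                    "sun": {"rain": 0.3, "sun": 0.7}})
log_emit = to_log({"rain": {"wet": 0.9, "dry": 0.1},
                   "sun": {"wet": 0.2, "dry": 0.8}})
observations = ["wet", "wet", "dry"]

wide_path = viterbi_beam(observations, states, log_start, log_trans, log_emit, 100.0)
narrow_path = viterbi_beam(observations, states, log_start, log_trans, log_emit, 1.0)
```

With the narrow beam fewer hypotheses are expanded at each step, which is where the speed gain comes from; on this toy input both settings return the same path, but in general an overly tight beam can discard the path that would have won.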
An Interface Language for Dialogues
Violeta Quental (Pontifícia Universidade Católica do Rio de Janeiro), Maria Carmelita Padua Dias (Pontifícia Universidade Católica do Rio de Janeiro), Laura Sanchez Garcia (Universidade Federal do Paraná)
This paper describes an interface language, which enables users to cooperatively interact with a knowledge-based information system (KBIS). This interface language, called LINX, is designed as a means of providing a smooth interaction for users who search for information about data that is taxonomically organized and encoded in formal logic. The Lexicon and the Grammar modules provide the correspondence between natural language and logic and interact with one another, so that linguistic consistency is guaranteed. This language was designed to permit dialogues in Portuguese, but its assumptions are language-independent.
Confidence Measures in Hybrid HMM/ANN Speech Recognition
Giulia Bernardis (Dalle Molle Institute for Perceptual Artificial Intelligence), Hervé Bourlard (Swiss Federal Institute of Technology)
In this paper we define and investigate a set of confidence measures based on hybrid HMM/ANN acoustic models. These measures use the neural network to estimate the local phone posterior probabilities, which are then combined and normalized in different ways. Experimental results will show that the use of an appropriate duration normalization is very important to obtain good estimates of the phone and word confidences.
The different measures are evaluated at the phone and word levels on both isolated word (PHONEBOOK) and continuous speech (BREF) recognition tasks. It will be shown that one of those confidence measures is well suited for utterance verification, and that confidence measures at the word level perform better than those at the phone level. Finally, using the resulting approach on PHONEBOOK to rescore the N-best list is shown to yield a 34% decrease in word error rate.
Design of the Czech National Speech Corpus for Speech Recognition Applications with a Large Vocabulary
Vlasta Radová (University of West Bohemia)
The paper deals with some problems related to the design of the Czech speech corpus that is to be used to train large vocabulary continuous speech recognition systems. Particular attention is paid to an algorithm that allows the selection of a phonetically balanced speech database. Several variants of the algorithm are described and the results of some experiments are presented. Attention is also given to the choice of a suitable portable computer that will be used to record the speech corpus.
An Automated Speech Recognition System for Fast-food Restaurant Applications
L.J.M. Rothkrantz, F. Mohamed-Hoesein, A. Shirzad (Delft University of Technology)
This paper describes an application of Automated Speech Processing for a fast food drive-in restaurant. After describing the corpus of recorded human-human dialogues, we present a model for an automated system based on this corpus. A prototype has been developed using Lernout & Hauspie software as the speech recognition tool. The dialogue model and grammar will be described. The dialogue model was based on a Wizard of Oz experiment. The results of this experiment and the results of testing the developed prototype will be presented.
Dutch Automatic Speech Recognition
J. Schalken, L. J. M. Rothkrantz (Delft University of Technology, Knowledge Based Systems)
This paper will present some of the results of DASeR, a research group at the Technical University Delft. DASeR (Dutch Automatic Speech Recognition) researches the possibilities of automatic speech recognition for the Dutch language. At the moment research concentrates on recognition using neural network techniques. This paper will present research into the SOM and GSOMT (Growing SOM Tree, a hierarchical variant of the SOM) architectures used for speech recognition. The results of training a phonetic typewriter, both with SOM and with GSOMT, will be presented. To improve the training and recognition speed, the system was implemented on a parallel computer, an nCUBE2.
Development of Multilingual Acoustic Models in the GlobalPhone Project
Tanja Schultz and Alex Waibel (Interactive Systems Laboratories, Universität Karlsruhe, Carnegie Mellon University)
This paper describes our recent effort in developing the GlobalPhone recognizer for multilingual large vocabulary continuous speech. This project investigates LVCSR systems in 15 languages, namely Arabic, Chinese (Mandarin and Wu), Croatian, English, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Turkish. Based on five languages we developed a global phoneme set and built a multilingual speech recognizer by varying the method of acoustic model combination. Context dependent phoneme models are created using questions about languages and language groups. Results of a multilingual system which can handle five languages are presented.
Recent Improvements in Slovene Text-to-Speech System
Tomaz Sef, Ales Dobnikar, Matjaz Gams, Zeljko Khermayer (Jozef Stefan Institute, Ljubljana)
This paper presents a text-to-speech (TTS) system that is capable of synthesising continuous Slovenian speech. Input text is processed by a series of independent modules which are described in detail. In order to generate rules for our synthesis scheme, data was collected by analyzing the readings of ten speakers, five males and five females. A two-level approach has been used for duration modelling and a so-called superpositional approach for pitch modelling. The speech waveform is synthesized using a concatenative TD-PSOLA technique. Our system is used in several applications. It is built into an employment agent, EMA, that provides employment information through the Internet. Currently we are developing a system that will enable blind and partially sighted people to work in the Windows environment.
Remarks on Parsing Written and Spoken Discourse
Petr Sgall (Faculty of Mathematics and Physics, Charles University)
The two tasks of parsing written text and oral speech may be understood as two aspects of a single program; neither of them can be completely solved without paying attention to the other. An existing parser is briefly introduced that pays due attention to the anchoring of a sentence in context and, if combined with statistical and other methods of acoustic and phonemic analysis, can be used for analyzing spoken dialogue.
Phonetic Transcription for the Systems of Speech Synthesis and Recognition
Vladimir I. Kuznetsov, Tatiana Y. Sherstinova (St. Petersburg University)
The processing of speech data preserved in the Phonetic Fund of the Russian Language makes it possible to obtain the real characteristics of Russian speech and to compare them with the existing phonetic theory, upon which the transcription rules for the systems of speech synthesis and recognition are based. The research material (recordings of a phonetically representative text) contains words in different phrase positions, which may considerably modify their phonation. The detailed analysis of the actual vowel phonation reveals complicated relations between real phonetic transcription and ideal phonetic transcription and therefore prompts a review of the traditional transcription rules used for the systems of connected speech synthesis and recognition.
A Neural Network-Based Speaker-Independent System for Word Recognition in Romanian Language
Dragos Burileanu, Mihai Sima, Corneliu Burileanu, Victor Croitoru (University of Bucharest)
This paper describes the design philosophy of a speaker-independent system for isolated spoken word recognition in the Romanian language. First, two main methods utilized for speech recognition are briefly discussed. We then present the structure of the proposed system, based on a modified Learning Vector Quantization algorithm combined with a Dynamic Programming technique. The paper ends with a comparative set of experiments, conclusions and intended further work.
TENCO - Automatic Text Encoder
Milena Slavcheva, Boyanka Zaharieva (Bulgarian Academy of Sciences, Sofia)
TENCO (Text ENCOder) is a software system for automatic text encoding and for management of a textbase suitable for corpus-oriented research work in the NLP framework. The system produces text documents containing markup conformant to the Corpus Encoding Standard (CES), which is an application of SGML and is based on TEI. A special Text Type Model has been worked out to achieve the automation of the marking process. The most common structural variants of the main genres are covered by the system. The aim is to make the construction of a text corpus of Bulgarian a well-organised, efficient, and controlled process that requires the minimum possible human labour and effort.
Optimal Joint Procedure for Current Pitch Period Discrimination and Speech Signal Partition into Quasi-Periodic and Non-Periodic Segments
Taras K. Vintsiuk (NAS Institute of Cybernetics & UNESCO/IIP International Research-Training Centre for Information Technologies and Systems, Kyjiv)
Quasi-periodicity and non-periodicity signal models are proposed. Each hypothetical one-quasiperiod signal segment is considered as a random distortion of the previous or following one, taken with an unknown multiplying factor. The problem of optimal current pitch period discrimination and speech signal partition into quasiperiodic and non-periodic segments consists in 1) finding the best quasiperiod beginnings, or the one-quasiperiod segments, under restrictions on both the value and the rate of change of the current quasiperiod duration and multiplying factor, and 2) merging the optimal one-quasiperiod segments into large quasiperiodic and non-periodic segments. To solve this problem, an effective algorithm based on dynamic programming is proposed, and its application to speech signal analysis, recognition and synthesis is discussed.
Validity Criterion for Unsupervised Speaker Recognition
Itshak Voitovetsky, Hugo Guterman, Arnon Cohen (Ben-Gurion University of the Negev)
It is often required to perform automatic segmentation of multispeaker conversations without any prior knowledge of the speakers. Given the conversation signal, the aim is to estimate the number of speakers, segment the signal into single-speaker segments and label each segment. A method for determining the number of speakers participating in a dialogue is presented in this paper. Multiple Self-Organizing Maps (SOMs) are used for clustering, with each SOM representing a cluster. At the end of each stage, a validity criterion (to determine the number of speakers) is calculated for clusterings based on different numbers of SOMs. Several experiments with dialogues of two and three speakers were conducted. For high quality speech, the number of speakers was correctly estimated. In telephone quality speech, two out of eight files were estimated to have three (rather than two) speakers.
Harmonics Tracking and Sieving-Integrating Algorithm for Robust Speech Recognition
Zhang Bo (J3, Department of Electronic Engineering, City University of Hong Kong)
Many problems remain unsolved in robust speech recognition in noisy environments. This paper addresses two of them: robust speech end-point detection and robust isolated word recognition. The speech end-point detection algorithm is based on features specific to human speech, which are associated with the unique production mechanism of human speech. One is the speech harmonic structure that corresponds to the human phonation part; the other is the speech formant structure that corresponds to the human articulation part. The word recognition algorithm uses a sieving-integrating procedure to measure how much of the spectrum of the template signal is contained in the spectrum of the incoming signal. This differs from traditional recognizers, which measure the similarity between the spectral shapes of the clean speech signal in templates and the incoming signal. When tested on four noisy speech databases with SNR=10dB, the algorithm produced recognition accuracies of 100%, 90%, 90% and 72%.
Diphon Voice Synthesis Using Fast Fourier Transformation
Pavel Zikovský and Pavel Slavík (Czech Technical University, Faculty of Electrical Engineering, Dept. of Computer Science and Engineering)
Voice synthesis is becoming more and more important. However, there is still no method that delivers good-quality speech synthesis in real time. The goal of this work is to produce a text-to-speech algorithm that finds a compromise between the accuracy of the output and low computational cost. The paper describes a new method of voice synthesis relying on the Fast Fourier Transformation. This method has proved quite powerful in terms of quality and speed. It is based on diphon voice synthesis, but it uses more powerful connecting algorithms and different descriptions of diphons.
Classification of Medical Unstructured Text Documents Using the Naïve Bayes Algorithm
Jan Zizka (Faculty of Informatics, Masaryk University), Ales Bourek (Teaching Hospital Brno-Bohunice, Masaryk University)
This paper describes an application of naïve Bayesian learning to the classification of unstructured medical text documents obtained from Internet sources. The purpose was to provide physicians with a simple software tool that would be helpful in filtering large volumes of text data. This tool automatically splits documents into relevant and irrelevant classes according to previous learning based on the contents of positive and negative document examples classified by a human expert.
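The relevant/irrelevant split described above can be sketched with a small multinomial naïve Bayes classifier. This is a minimal illustration, not the authors' tool: the whitespace tokenisation, add-one smoothing, function names and the four training snippets are all assumptions.

```python
import math
from collections import Counter

def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (text, label). Returns the counts needed to classify."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical examples standing in for documents labelled by a human expert.
training_docs = [
    ("clinical trial of new therapy outcomes", "relevant"),
    ("patient therapy and clinical outcomes", "relevant"),
    ("cheap holiday offers and hotel deals", "irrelevant"),
    ("hotel booking and cheap flights", "irrelevant"),
]
model = train_naive_bayes(training_docs)
label_medical = classify("therapy outcomes in a clinical study", model)
label_other = classify("cheap hotel deals", model)
```

On this toy data the first query is assigned to the relevant class and the second to the irrelevant one; a realistic filter would need far more training examples and better text preprocessing.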
Simple Speech Classification Using Automatically Generated Rules
Jan Zizka (Faculty of Informatics, Masaryk University Brno), Zdenek Kratochvíl (Technical University, Brno), Lubos Popelínský (Faculty of Informatics, Masaryk University Brno)
This paper deals with a special architecture of artificial neural networks based on adaptive radial basis functions (RBF-ANN), which is applied to the problem of speech recognition. The proposed architecture and training algorithm allow a user to obtain a set of IF-THEN classification rules directly, as soon as the network finishes its training. The classification results are compared with results obtained from the decision tree generator known as C4.5.
Experience from the Field. Name Dialing en masse
Vladimir Bergl, Ken Davies, Abraham Ittycheriah, Andrzej Sakrajda (IBM Watson Research Centre)
This paper presents the development and deployment of name dialers within IBM, including a recent deployment covering over 200,000 employees in the US and Canada. The issues involved include acquiring the name/dept/phone number information, determining the pronunciation of many 'non-standard' names, iterating over the design of the dialog sequence to be used, building appropriate acoustic model(s) for name recognition, handling of multi-lingual issues (pronunciation differences of names across languages, multi-lingual dialog strategies), and performance tuning. Techniques will be presented that address the difficulties to be expected with such systems, and performance and operational data will be given to show the successes and remaining difficulties.
TreeTalk-D: a Machine Learning Approach to Dutch Word Pronunciation
Bertjan Busser (Tilburg University)
We present experimental results concerning the application of the IGTree decision-tree learning algorithm to Dutch word pronunciation. We evaluate four different Dutch word pronunciation systems configured to test the utility of modularization of grapheme to phoneme transcription (G) and stress prediction (S). Both training and testing data are extracted from the CELEX II lexical database. Experiments yield full word transcription accuracies (stressed and syllabified phonetic transcription) of roughly 75%, and 97% accuracy on G at the letter level. The best system performs G and S in sequence, using a context of four letters left and right per grapheme-phoneme mapping.
Speech Recognition in a Real Room and Multi-Simultaneous-Speaker Environment
Athanasios Koutras, Evangelos Dermatas, George Kokkinakis (Wire Communications Laboratory, Electrical & Computer Engineering Department, University of Patras)
In this paper we deal with the speaker separation and recognition problem in a multi-simultaneous-speaker environment. In particular, two speaker separation methods are tested, which are based on a) information maximization and b) output decorrelation filtering. The performance of the implemented algorithms is evaluated on the basis of phoneme recognition experiments performed on both the separated and the artificially mixed signals as well as on signals recorded in a real room environment. It is shown experimentally that the output decorrelation method works better than the information maximization method, reducing the "cocktail party effect" by as much as 44%.
Speech Recognition in Noisy Reverberant Rooms Using Frequency Domain Adaptive Filtering
George Nokas, Evangelos Dermatas, George Kokkinakis (Wire Communications Laboratory, Electrical & Computer Engineering Department, University of Patras)
In this paper we show that noise reduction based on the frequency domain adaptive filtering method (FDAF) improves the score of speech recognition systems in reverberant rooms when a non-stationary noise source is present. In extensive experiments, an FDAF method was employed to suppress the noise in the primary microphone's signal by using a noisy reference signal taken from a close-talk microphone positioned near the noise source. The performance of a speaker-independent IWSR system was measured and compared with that obtained by the time-domain adaptive filtering method. The recognition score increased up to 50% when the FDAF was used while a maximum of 30% was achieved by the time-domain LMS method.
Statistics of the Syllable Segments for Speech Synthesis of the Czech Language
Robert Batusek (Masaryk University, Brno)
Statistics on the number and occurrence of the syllable segments used for Czech speech synthesis are presented in the paper. The statistics were obtained by evaluating the data in a large Czech corpus.
Verb Valency and Semantic Classification of Verbs
Ales Horák (Faculty of Informatics, Masaryk University)
This paper deals with computer processing of lists of verb valencies for the Czech language. We describe the results of a semantic classification of Czech verbs according to their valency lists and of determining the construction of the logical entity denoted by the verb.
Using TIL for Semantic Analysis of Text
Leo Hadacz (Faculty of Science, Masaryk University)
In this paper we discuss the semantic analysis of a discourse. We use a particular logical system called Transparent Intensional Logic, which is suitable for representing the meaning of natural language expressions. For us, to analyze an expression means to translate it into a special term, which is called a construction. There are many ways to do the translation. With respect to further computer processing of the resultant construction, we propose one possibility, which we call the normal translation algorithm. Its functionality will be shown on several examples.
Multilingual Speech Recognition in the Context of Multilingual Information Retrieval Dialogues
Stefan Harbeck, Elmar Nöth, Heinrich Niemann (University of Erlangen-Nürnberg)
The multilingual speech recognizer implemented inside the Copernicus project is based on a combination of several monolingual recognizers within one recognizer using a special bigram grammar. Since transitions are allowed only between words from one language, each hypothesized word chain contains words from just one language, and language identification is an implicit by-product of the speech recognizer. Using this concept on a four-language task, the multilingual speech recognizer achieves nearly the same accuracy as monolingual speech recognizers, with error-free language identification. Additionally, a novel language identification module is presented which can be used for preselection of a subset of languages or for setting the a priori probabilities of the languages inside the multilingual speech recognizer. It achieves 76 percent accuracy on a 13-language task.
Czech National Corpus: Its Character, Goal and Background
Frantisek Cermák (Charles University, Prague)
Data-Driven Speech Analysis For ASR
Hynek Hermansky (Oregon Graduate Institute of Science & Technology, ICSI, Berkeley, TU Brno)
This article argues for attention to techniques which could extract reliable, reusable, and relevant knowledge from currently available large amounts of speech data. A discriminant technique for data-driven design of temporal RASTA filters is described and discussed.
Extraction of Intonation in Thai Language by Karhunen-Loeve Transformation
Rachada Kongkachandra, Kreingsak Tamee, Chom Kimpan (Computer Engineering Division, Faculty of Engineering, King Mongkut's Institute of Technology, Bangkok)
Thai is a tonal language in which pitch variations influence the meaning of an utterance. Recognizing Thai words is therefore a crucial task because of the vast number of meanings. In this paper, we present an improvement of our system in terms of data reduction. The eigenvectors, or principal components, extracted by the Karhunen-Loeve transformation (KLT) are exploited as word representatives. The data are reduced by 79% relative to the original, with a mean square error of 0.0003 and a signal/noise ratio of 19.18. The experiments are based on 15 different isolated Thai words, each pronounced three times by ten male speakers aged 20-35. Further, the paper illustrates the characteristics of each tone as obtained by the KLT.
You BEEP Machine - Emotion in Automatic Speech Understanding Systems
R. Huber, E. Nöth, A. Batliner, J. Buckow, V. Warnke, H. Niemann (University of Erlangen-Nuremberg)
In this paper we report on first experiments for the detection of emotion and the use of this information in a complex speech understanding system like . We do not look at lexical information like swear words but rather try to find emotional utterances with the use of acoustic prosodic cues. We only want to classify angry versus neutral speaking style. 20 speakers were asked to produce 50 neutral and 50 angry utterances. With this data set we created one training set and two test sets: one test set with seen speakers but new turns, the other with unseen speakers but seen turns. Each word of the emotional utterances was labeled as belonging to the class "emotional", each word in the neutral utterances as belonging to the class "neutral". For each word 276 prosodic features were calculated, and multi-layer perceptrons were trained for the two classes. When classifying turns as being either emotional or neutral, we achieved a precision of 87% and a recall of 92% for the one test set, and a precision of 94% and a recall of 84% for the other.
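The two figures reported above are standard evaluation measures; a minimal sketch of how they are computed for a two-class turn classifier follows. The label lists are invented for illustration and do not reproduce the paper's data; the function name is likewise an assumption.

```python
def precision_recall(predicted, actual, positive="emotional"):
    """Compute precision and recall for the `positive` class."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    false_pos = sum(p == positive and a != positive for p, a in pairs)
    false_neg = sum(p != positive and a == positive for p, a in pairs)
    # Precision: share of turns flagged as emotional that really are emotional.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of truly emotional turns that the classifier found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Hypothetical reference labels and classifier output for eight turns.
actual = ["emotional"] * 4 + ["neutral"] * 4
predicted = ["emotional", "emotional", "emotional", "neutral",
             "emotional", "emotional", "neutral", "neutral"]
precision, recall = precision_recall(predicted, actual)
```

In this invented example three of the five turns flagged as emotional are correct (precision 0.6) and three of the four emotional turns are found (recall 0.75).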
Using Rules of Consonant Distribution for Russian Continuous Speech Automatic Segmentation
P. Skrelin, K. Shalonova (Department of Phonetics, St. Petersburg State University)
The paper deals with the identification of word boundaries in Russian continuous speech. Once the sound stream has been converted into transcription symbols, methods for identifying word boundaries can be applied to the transcription symbol stream. The use of consonant distribution rules for obtaining reliable cues for word boundaries is proposed.
The Structure of the Multimedia Database STDB
Ralf Vollmann (Acoustics Research Dept. of the Austrian Academy of Sciences)
STDB (STOOLS Database) is a conceptual framework for an open systems approach to multimedia databases for scientific purposes, developed at the Acoustics Research Department of the Austrian Academy of Sciences in Vienna as part of the STOOLS acoustic analysis systems (Deutsch & Noll 1994). Here 'multimedia' means that digital signals are involved, typically sound events. Its main applications center around the (cumulative) development of speech corpora from running speech. Nevertheless, the sound database system and its associated acoustic I/O and sound analysis workstations may be (and are) extended to non-speech applications (e.g. noise research, experimental music, etc.).
How Much Training Data Is Required to Remove Data Sparseness in Statistical Language Learning?
Dan-Hee Yang, Mansuk Song (NLP Lab., Department of Computer Science, Yonsei University, Seoul)
There have been few attempts to estimate the corpus size for practical NLP. This study finds mathematical guess functions by a piecewise curve fitting algorithm with respect to the words that belong to four major parts of speech in a Korean dictionary. Hence, we estimate the size of a corpus necessary to enhance the reliability of corpus-based NLP by reducing the phenomenon of data sparseness to a reasonable degree.