
Accepted Papers


Papers accepted for TSD 2002, with abstracts


Paper ID: 3
Type: LP
Title: A Common Solution for Tokenization and Part-of-Speech Tagging
Contact author: Jorge Grana and Miguel A. Alonso and Manuel Vilares
Topic: Text - parsing and part-of-speech tagging

Abstract: Current taggers assume that input texts are already tokenized, i.e. correctly segmented into tokens or high-level information units that identify each individual component of the texts. This working hypothesis is unrealistic, due to the heterogeneous nature of the application texts and their sources. The greatest difficulties arise when this segmentation is ambiguous. The choice of the correct segmentation alternative depends on the context, which is precisely what taggers study. In this work, we develop a tagger able not only to decide the tag to be assigned to every token, but also to decide whether or not some of them form the same term, according to different segmentation alternatives. For this task, we design an extension of the Viterbi algorithm able to evaluate streams of tokens of different lengths over the same structure. We also compare its time and space complexities with those of the classic and iterative versions of the algorithm.

Paper ID: 7
Type: LP
Title: Rule Parser for Arabic Stemmer
Contact author: Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi
Topic: Text - automatic morphology

Abstract: The Arabic language exhibits a complex but very regular morphological structure that greatly affects its automation. Currently available morphological analysis techniques for the Arabic language are based on heavy computational processes and/or the existence of large amounts of associated data. Utilizing existing morphological techniques greatly degrades the efficiency of some natural language applications such as information retrieval systems. This paper proposes a new Arabic morphological analysis technique. The technique is based on the pattern similarity of words derived from different roots. Unique patterns are extended and coded as rules that encode morphological characteristics. The technique requires neither complex computation nor associated data, yet it is adjustable to maintain sufficient accuracy. It utilizes a very simple parser to scan the coded rules and decompose a given Arabic word into its morphological components. This paper provides an introduction to the Arabic language and its morphological characteristics, followed by an overview of currently available morphological techniques. The developed stemmer and its components, including the rule set and parser, are then explained. Experimental results and conclusions are provided at the end.

Paper ID: 9
Type: LP
Title: Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language
Contact author: Tomaž Šef and Maja Škrjanc and Matjaž Gams
Topic: Speech - text-to-speech synthesis

Abstract: This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words used in our text-to-speech system. First, each vowel (and the consonant 'r') is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and consonant 'r'). For this step we applied a machine-learning technique (decision trees or boosted decision trees). Then, some corrections are made at the word level, according to the number of stressed vowels and the length of the word. For data sets we used the MULTEXT-East Slovene Lexicon, which was supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in the Slovenian language is a very complex problem and that a solution in the form of relatively simple rules is not possible.

Paper ID: 11
Type: SP
Title: Kernel Springy Discriminant Analysis and its Application to a Phonological Awareness Teaching System
Contact author: András Kocsor and Kornél Kovács
Topic: Speech - other

Abstract: Making use of the ubiquitous kernel notion, we present a new nonlinear supervised feature extraction technique called Kernel Springy Discriminant Analysis. We demonstrate that this method can efficiently reduce the number of features and increase classification performance. The improvements obtained admittedly arise from the nonlinear nature of the extraction technique developed here. Since phonological awareness is of great importance in learning to read, a computer-aided training system could be most beneficial in teaching young learners. Naturally, our system employs an effective automatic phoneme recognizer based on the proposed feature extraction technique.

Paper ID: 14
Type: LP
Title: Achieving an Almost Correct PoS-Tagged Corpus
Contact author: Pavel Květoň and Karel Oliva
Topic: Text - parsing and part-of-speech tagging

Abstract: After some theoretical discussion on the issue of representativity of a corpus, this paper presents a simple yet very efficient technique serving for (semi-)automatic detection of those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on the idea of learning and later application of "negative bigrams", i.e. on the search for pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). Further, the paper describes the generalization of the "negative bigrams" into "extended negative bigrams of length n", for any natural n, which indeed provides a powerful tool for error detection in a corpus. The approach is illustrated throughout on the case of the NEGRA corpus (hence some command of German might be helpful, even though not really necessary). Finally, some general implications for statistical taggers are mentioned.
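The idea of negative bigrams can be illustrated with a minimal Python sketch. It assumes a toy tag set and a tiny hand-made training sample, and it simply treats every tag pair never observed in the training data as a negative bigram, which simplifies the learning procedure described in the paper. The function names `learn_negative_bigrams` and `suspect_positions` are hypothetical and are not taken from the paper.

```python
from itertools import product


def learn_negative_bigrams(tagged_sentences):
    """Return all pairs of known tags that never occur adjacently in the data."""
    tags = set()
    seen = set()
    for sentence in tagged_sentences:
        tags.update(sentence)
        # Collect every pair of adjacent tags that is attested in the data.
        seen.update(zip(sentence, sentence[1:]))
    return set(product(tags, repeat=2)) - seen


def suspect_positions(sentence, negative_bigrams):
    """Return each index i where (sentence[i], sentence[i + 1]) is a negative bigram."""
    return [
        i
        for i, pair in enumerate(zip(sentence, sentence[1:]))
        if pair in negative_bigrams
    ]


# Toy training sample (an assumption for illustration, not NEGRA data).
training = [
    ["ARTICLE", "NOUN", "FINITE_VERB"],
    ["ARTICLE", "ADJECTIVE", "NOUN", "FINITE_VERB", "ARTICLE", "NOUN"],
    ["NOUN", "FINITE_VERB", "NOUN"],
]
negative = learn_negative_bigrams(training)

# The pair ARTICLE - FINITE_VERB at index 0 is flagged as a suspected error.
suspects = suspect_positions(["ARTICLE", "FINITE_VERB", "NOUN"], negative)
```

With so little training data, many legitimate tag pairs would also be flagged, so a realistic setting would need a large trusted sample before unseen pairs can be treated as errors.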

Paper ID: 15
Type: LP
Title: Evaluation of a Japanese Sentence Compression Method Based on Phrase Significance and Inter-Phrase Dependency
Contact author: Rei Oguro and Hiromi Sekiya and Yuhei Morooka and Kazuyuki Takagi and Kazuhiko Ozeki
Topic: Text - text/topic summarization

Abstract: Sentence compression is a method of text summarisation, where each sentence in a text is shortened in such a way as to retain the original information and grammatical correctness as much as possible. In a previous paper, we formulated the problem of sentence compression as an optimisation problem of extracting a subsequence of phrases from the original sentence that maximises the sum of topical importance and grammatical correctness. Based on this formulation an efficient sentence compression algorithm was derived. This paper reports a result of subjective evaluation for the quality of sentences compressed by using the algorithm.

Paper ID: 17
Type: SP
Title: User query understanding by the InBASE system as a source for a multilingual NLG module (first step)
Contact author: Michael V. Boldasov and Elena G. Sokolova and Michael G. Malkovsky
Topic: Text - multi-lingual issues

Abstract: In this paper we consider the NL generation component of the InBASE system, a system for understanding NL queries to databases. This component generates a new NL query from the internal InBASE Q-representation of the user query. During the planning phase, a linearly positioned query representation is constructed, with positions bearing first conceptual and then syntactic information. The realization phase deals with the NL means of expressing the concepts (objects, attributes, values, relations between objects and attributes). The NL generation component is conceived as the first step in the direction from a one-way question-answering system, as InBASE is now, to a larger-scale information system able to communicate with the user in different areas.

Paper ID: 18
Type: LP
Title: Prosodic Classification of Offtalk: First Experiments
Contact author: Anton Batliner and Viktor Zeissler and Elmar Nöth and Heinrich Niemann
Topic: Dialogue - prosody and emotions in dialogues

Abstract: SmartKom is a multi-modal dialogue system which combines speech with mimics and gestures. In this paper, we deal with one of the phenomena that can be observed in such elaborate systems, which we call 'offtalk', i.e., speech that is not directed to the system (speaking to oneself, speaking aside). We report the classification results of first experiments which use a large prosodic feature vector in combination with part-of-speech information.

Paper ID: 20
Type: SP
Title: Large Vocabulary Speech Recognition of Slovenian Language Using Data-Driven Morphological Models
Contact author: Tomaž Rotovnik and Mirjam Sepesy Maučec and Bogomir Horvat and Zdravko Kačič
Topic: Speech - automatic speech recognition

Abstract: A system for large vocabulary continuous speech recognition of the Slovenian language is described. Two types of modelling units are examined: words and sub-words. A data-driven algorithm is used to automatically obtain word decompositions. The performances of one-pass and two-pass decoding strategies were compared. The new models gave promising results. The recognition accuracy was improved by 2.5% absolute at the same recognition time. On the other hand, we achieved a 30% increase in real-time performance at the same recognition error.

Paper ID: 22
Type: LP
Title: Statistical Decision Making applied to Text and Dialogue Corpora for Effective Plan Recognition
Contact author: Manolis Maragoudakis and Aristomenis Thanopoulos and Nikos Fakotakis
Topic: Dialogue - development of dialogue strategies

Abstract: In this paper, we introduce an architecture designed to achieve effective plan recognition using Bayesian Networks which encode the semantic representation of the user's utterances. The structure of the networks is determined from dialogue corpora, thus eliminating the high-cost process of hand-coding domain knowledge. The conditional probability distributions are learned during a training phase in which data are obtained by the same set of dialogue acts. Furthermore, we have incorporated a module that learns semantic similarities of words from raw text corpora and uses the extracted knowledge to resolve the issue of unknown terms, thus enhancing plan recognition accuracy and improving the quality of the discourse. We present experimental results of an implementation of our platform for a weather information system and compare its performance against a similar, commercial one. Results show significant improvement in identifying the goals of the user. Moreover, we claim that our framework could straightforwardly be updated with new elements from the same domain or adapted to other domains as well.

Paper ID: 24
Type: LP
Title: Natural Language Guided Dialogues for Accessing the Web
Contact author: Marta Gatius and Horacio Rodríguez
Topic: Dialogue - dialogue systems

Abstract: This paper proposes the use of ontologies representing domain and linguistic knowledge for guiding natural language (NL) communication about Web content. The proposal deals with the problem of accessing and processing the Web data required to answer user queries. Concepts and communication acts are represented in the conceptual ontology (CO). Domain-restricted grammars and lexicons are obtained automatically by adapting the general linguistic knowledge to cover the communication acts for a particular domain. The use of domain-restricted grammars and lexicons has proved to be efficient, especially when the user is guided in introducing the NL queries. Once the query has been processed, the system fires the appropriate wrappers to extract the data from the Web. The domain concepts described in the CO provide a unifying framework to represent the knowledge obtained from the various Web sources.

Paper ID: 26
Type: LP
Title: German and Czech Speech Synthesis Using HMM-Based Speech Segment Database
Contact author: Jindřich Matoušek and Daniel Tihelka and Josef Psutka and Jana Hesová
Topic: Speech - text-to-speech synthesis

Abstract: This paper presents an experimental German speech synthesis system. As in the case of the Czech text-to-speech system ARTIC, a statistical approach (using hidden Markov models) was employed to build a speech segment database. This approach was confirmed to be language independent, and it was shown to be capable of producing a quality database that led to intelligible synthetic speech of high quality. Some experiments with clustering similar speech contexts were performed to enhance the quality of the synthetic speech. Our results show the superiority of phoneme-level clustering over subphoneme-level clustering.

Paper ID: 27
Type: SP
Title: Valency Lexicon for Czech: from Verbs to Nouns
Contact author: Markéta Lopatková and Veronika Řezníčková and Zdeněk Žabokrtský
Topic: Text - other

Abstract: A valency lexicon of Czech verbs has been intensively worked on for more than a year, and we now have at our disposal a detailed description of the valency frames of several hundred verbs. The challenge that naturally arises now is to use the existing lexicon for capturing the valency of other word classes. In this paper, we focus on the valency of nouns derived from verbs. We propose an algorithm for automatic prediction of the valency frames of these nouns, and we test it on a sample of data.

Paper ID: 29
Type: LP
Title: Comparison and Combination of Confidence Measures
Contact author: Georg Stemmer and Stefan Steidl and Elmar Nöth and Heinrich Niemann and Anton Batliner
Topic: Speech - automatic speech recognition

Abstract: A set of features for word-level confidence estimation is developed. The features should be easy to implement and should require no additional knowledge beyond the information which is available from the speech recognizer and the training data. We compare a number of features based on a common scoring method, the normalized cross entropy. We also study different ways to combine the features. An artificial neural network leads to the best performance, and a recognition rate of 76% is achieved. The approach is extended not only to detect recognition errors but also to distinguish between insertion and substitution errors.
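The normalized cross entropy score mentioned above can be sketched in Python. The sketch assumes the commonly used NIST-style definition, in which the entropy of the confidence scores is compared with the entropy of a baseline that assigns every word the average correctness rate. The paper may use a variant, and the function name `normalized_cross_entropy` is hypothetical.

```python
import math


def normalized_cross_entropy(confidences, correct):
    """Compute NCE = (H_max - H_conf) / H_max for word-level confidence scores.

    confidences: scores strictly between 0 and 1, one per recognized word.
    correct: booleans, True where the recognized word was correct.
    Assumes at least one correct and at least one incorrect word, so that
    the baseline entropy H_max is well defined and non-zero.
    """
    n_total = len(correct)
    n_correct = sum(correct)
    p_c = n_correct / n_total  # average correctness rate (the baseline score)

    # Entropy of the baseline that tags every word with the same score p_c.
    h_max = -(
        n_correct * math.log2(p_c)
        + (n_total - n_correct) * math.log2(1.0 - p_c)
    )
    # Entropy of the actual confidence scores.
    h_conf = -sum(
        math.log2(c if ok else 1.0 - c) for c, ok in zip(confidences, correct)
    )
    return (h_max - h_conf) / h_max


labels = [True, True, True, False]
informative = normalized_cross_entropy([0.9, 0.9, 0.9, 0.1], labels)
baseline = normalized_cross_entropy([0.75, 0.75, 0.75, 0.75], labels)
```

Under this definition, scores that carry no more information than the average correctness rate give a value of about zero, informative scores give a positive value, and perfect scores approach one.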

Paper ID: 30
Type: SP
Title: Uniform Speech Recognition Platform for Evaluation of New Algorithms
Contact author: Andrej Žgank and Tomaž Rotovnik and Zdravko Kačič and Bogomir Horvat
Topic: Speech - automatic speech recognition

Abstract: This paper presents the development of a speech recognition platform whose main area of use is the evaluation of new and improved algorithms for speech recognition (noise reduction, feature extraction, language model generation, training of acoustic models, ...). To enable wide use of the platform, different test configurations were added, from alphabet spelling to large vocabulary continuous speech recognition. So far, the speech recognition platform has been implemented and evaluated with a studio (SNABI) and a fixed telephone (SpeechDat(II)) speech database.

Paper ID: 31
Type: LP
Title: Strategies for Developing a Real-Time Continuous Speech Recognition System for Czech Language
Contact author: Jan Nouza
Topic: Speech - automatic speech recognition

Abstract: This paper presents a set of 'strategies' that enabled the development of a real-time continuous speech recognition system for the Czech language. The optimization strategies include efficient computation of HMM probability densities, pruning schemes applied to HMM states, words and word hypotheses, a bigram compression technique as well as a parallel implementation of the real recognition system. In a series of off-line speaker-independent tests done with 1600 Czech sentences based on a 7033-word lexicon, we obtained a 65% recognition rate. Several on-line tests proved that similar rates can be achieved under real conditions and with a response time shorter than 1 second.

Paper ID: 32
Type: SP
Title: Voice Chat with a Virtual Character: The Good Soldier Svejk Case Project
Contact author: Jan Nouza and Petr Kolář and Josef Chaloupka
Topic: Dialogue - dialogue systems

Abstract: In this paper we present our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and an artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows for informal voice chatting with virtual characters. The legendary novel character Svejk is the first personality who can be interviewed in the recently implemented version.

Paper ID: 33
Type: SP
Title: Application of Spoken Dialogue Technology in a Medical Domain
Contact author: I. Azzini and T. Giorgino and D. Falavigna and R. Gretter
Topic: Dialogue - dialogue systems

Abstract: The paper describes the ITC-irst approach for handling spoken dialog interactions over the telephone network. We will specifically describe the usage of the dialog system within a tele-medicine application scenario. First, the system architecture will be summarized, then we will briefly describe our approach for evaluating confidence measures for each of the words in the "best path" provided by our recognizer. Finally, an automatic service for home monitoring of patients affected by hypertension pathology will be described. Patients must periodically introduce data into a database containing their personal medical data. The collected data are managed, according to well established medical guidelines, by an automatic system that can suggest therapies or alert doctors.

Paper ID: 34
Type: SP
Title: Term Clustering using a Corpus-Based Similarity Measure
Contact author: Goran Nenadić and Irena Spasić and Sophia Ananiadou
Topic: Text - knowledge representation and reasoning

Abstract: In this paper we present a method for automatic term clustering. The method uses a hybrid similarity measure to cluster terms automatically extracted from a corpus by applying the C/NC value method. The measure comprises contextual, functional and lexical similarity, and it is used to instantiate the cell values in a similarity matrix. The clustering algorithm uses either the nearest neighbour method or Ward's method to calculate the distance between clusters. The approach has been tested and evaluated in the domain of molecular biology and the results are presented.

Paper ID: 35
Type: LP
Title: Applying dialogue constraints to the understanding process in a Dialogue system
Contact author: Emilio Sanchis and Fernando García and Isabel Galiano and Encarna Segarra
Topic: Dialogue - dialogue systems

Abstract: In this paper, we present an approach to the estimation of a dialogue-dependent understanding component of a dialogue system. This work is developed in the framework of the BASURDE Spanish dialogue system, which answers queries about train timetables by telephone in Spanish. Modelling that is specific to the dialogue state is proposed to improve the behaviour of the understanding process. Some experimental results are presented.

Paper ID: 36
Type: LP
Title: Evaluating a Probabilistic Dialogue Model for a Railway Information Task
Contact author: Carlos D. Martínez-Hinarejos and Francisco Casacuberta
Topic: Dialogue - other

Abstract: Dialogue modelling attempts to determine the way in which a dialogue is developed. The dialogue strategy (i.e., the system behaviour) of an automatic dialogue system is determined by the dialogue model. Most dialogue systems use rule-based dialogue strategies, but recently, probabilistic models have become very promising. We present probabilistic models based on the dialogue act concept, which use user turns, dialogue history and semantic information. These models are evaluated as dialogue act labelers. The evaluation is carried out on a railway information task.

Paper ID: 37
Type: LP
Title: Comparative Study on Bigram Language Models for Spoken Czech Recognition
Contact author: Dana Nejedlová
Topic: Speech - automatic speech recognition

Abstract: The article deals with the problem of continuous speech recognition of Czech language. The main goal of this study is to compare various kinds of bigram language models with respect to the accuracy and speed of speech recognition. The main types of bigram language models are described here as well as multiple parameters that affect the performance of a speech recognition system. A comparison with a zerogram model is also made. Different models and various parameter settings are compared by means of the accuracy rate in extensive experiments done with a large test database of 1,600 Czech sentences recorded by 40 speakers.

Paper ID: 38
Type: LP
Title: Integration of speech recognition and automatic lipreading
Contact author: Pascal Wiggers and Leon J. M. Rothkrantz
Topic: Speech - automatic speech recognition

Abstract: At Delft University of Technology there is a project running on multimodal interfaces on the interaction of speech and lipreading. A large vocabulary speaker-independent speech recognizer for the Dutch language was developed using Hidden Markov Toolkit and the Polyphone database of recorded Dutch speech. To make the system more noise-robust, audio cues provided by an automatic lip-reading technique were integrated in the system. In this paper we give an outline of both systems and present results of experiments.

Paper ID: 42
Type: LP
Title: Heuristic and Statistical Methods for Speech/Non-speech Detector Design
Contact author: Michal Prcín and Luděk Müller
Topic: Speech - automatic speech recognition

Abstract: Speech/non-speech (S/NS) detection plays an important role in automatic speech recognition (ASR) systems, especially in the case of isolated word or command recognition. Even in continuous speech, an S/NS decision can be made at the beginning and at the end of a sequence, resulting in a "sleep mode" of the speech recognizer during silence and in a reduction of computational demands. It is very difficult, however, to precisely locate the endpoints of the input utterance because of unpredictable background noise. In the method proposed in this paper, we make use of the advantages of two approaches (i.e. finding the best set of heuristic features and applying a statistical induction method) to obtain the best S/NS decision.

Paper ID: 44
Type: LP
Title: Evaluation of prediction methods applied to an inflected language
Contact author: Nestor Garay-Vitoria and Julio Abascal and Luis Gardeazabal
Topic: Dialogue - assistive technologies based on speech and dialogue

Abstract: Prediction is one of the techniques that have been applied in Augmentative and Alternative Communication to help people enhance the quality and quantity of the text composed per unit of time. Most of the literature has focused on word prediction methods that may be easily applied to non-inflected languages. However, for inflected languages, other approaches that mainly distinguish roots and suffixes may enhance the results (in terms of keystroke savings and hit ratio) of predictive systems. In this paper we present the approaches we have applied to the Basque language (an inflected one) and the results they achieve with a particular text (that was not used while creating the initial lexicons the systems use for prediction). Based on this evaluation, one of the presented approaches is suggested as the best one.
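The two evaluation measures named above can be illustrated with a short Python sketch. It assumes common definitions from the text prediction literature: keystroke savings is the percentage of keystrokes avoided relative to typing every character, and hit ratio is the share of words that the user could pick from the prediction list. The paper may define them differently, and both function names are hypothetical.

```python
def keystroke_savings(total_chars, keystrokes_used):
    """Percentage of keystrokes saved compared with typing every character."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * (total_chars - keystrokes_used) / total_chars


def hit_ratio(words_predicted, total_words):
    """Percentage of words that were offered in the prediction list."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * words_predicted / total_words


# Illustrative numbers only (not results from the paper): a 200-character
# text composed with 120 keystrokes, with 30 of its 40 words predicted.
savings = keystroke_savings(200, 120)
hits = hit_ratio(30, 40)
```

For the illustrative numbers above, the sketch gives keystroke savings of 40% and a hit ratio of 75%.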

Paper ID: 45
Type: LP
Title: The Role of WSD for Multilingual Natural Language Applications
Contact author: Andrés Montoyo and Rafael Romero and Sonia Vázquez and Carmen Calle and Susana Soler
Topic: Text - word sense disambiguation

Abstract: Nowadays, the need for advanced free-text filtering in multilingual environments is increasing. Therefore, when searching for specific keywords in a multilingual information space, it is desirable to eliminate occurrences where the word or words of each language are used in an inappropriate sense. This task could be exploited in internet browsers, resource discovery systems, relational databases containing free-text fields, electronic document management systems, data warehouse and data mining systems, etc. To address this problem, in this paper we present a Word Sense Disambiguation interface that returns word senses in different languages and could be employed in multilingual natural language applications. This interface resolves the lexical ambiguity of nouns and verbs in input texts in some European languages (English, Spanish), using the taxonomy of the EuroWordNet lexical knowledge database, and returns a multilingual output of the word senses (English, Spanish, Catalan and Basque). In addition to the relations in WordNet 1.5, EuroWordNet includes cross-language and cross-category relations, which are directly useful for multilingual Word Sense Disambiguation. The interface has been implemented in C++ and provides a visual framework.

Paper ID: 47
Type: LP
Title: A Gibbsian Context-Free Grammar for Parsing
Contact author: Antoine Rozenknop
Topic: Text - parsing and part-of-speech tagging

Abstract: Probabilistic Context-Free Grammars can be used for speech recognition or syntactic analysis thanks to especially efficient algorithms. In this paper, we propose an instantiation of such a grammar, whose mathematical properties are intuitively more suitable for those tasks than those of SCFGs (Stochastic CFGs), without requiring specific analysis algorithms. Results on the Susanne text show that up to 33% of the analysis errors made by an SCFG can be avoided with this model.

Paper ID: 48
Type: SP
Title: Speech Enhancement Using Mixtures of Gaussians for Speech and Noise
Contact author: Ilyas Potamitis and Nikos Fakotakis and Nikos Liolios and George Kokkinakis
Topic: Speech - other

Abstract: In this article we approximate the clean speech spectral magnitude as well as noise spectral magnitude with a mixture of Gaussians pdfs using the Expectation-Maximization algorithm (EM). Subsequently, we apply the Bayesian inference framework to the degraded spectral coefficients and by employing Minimum Mean Square Error Estimation (MMSE), we derive a closed form solution for the spectral magnitude estimation task adapted to the spectral characteristics and noise variance of each band. We evaluate our algorithm using true, coloured, slowly and quickly varying noise types (Factory and aircraft noise) and demonstrate its robustness at very low SNRs.

Paper ID: 49
Type: LP
Title: Word Sense vs. Word Domain Disambiguation: a Maximum Entropy approach
Contact author: Armando Suárez and Manuel Palomar
Topic: Text - word sense disambiguation

Abstract: In this paper, a supervised learning system for word sense disambiguation is presented. It is based on maximum entropy conditional probability models. The system acquires linguistic knowledge from an annotated corpus, and this knowledge is represented in the form of features. The system was evaluated using both WordNet's senses and domains as the sets of classes of each word. Domain labels are obtained from the enrichment of WordNet with subject field codes, which produces a polysemy reduction. Several types of features have been analyzed for a few words selected from the DSO corpus. Currently, the system implementation does not support any smoothing technique or complex pre-processing, but its accuracy is good when compared with, for example, the systems at SENSEVAL-2. Using the domain enrichment of WordNet, an accuracy improvement of 14% is achieved.

Paper ID: 50
Type: SP
Title: From HTML to VoiceXML: A first approach.
Contact author: César González Ferreras and David Escudero Mancebo and Valentín Cardeñoso Payo
Topic: Dialogue - markup languages related to speech and dialogue

Abstract: In this work, we discuss the construction process of the voice portal counterpart of a departmental web site. VoiceXML has been used as the dialog modelling language. A prototypical system has been built using our own VoiceXML interpreter, which easily integrates different implementation platforms. A general discussion of VoiceXML advantages and disadvantages is also reported and a simple startup procedure is proposed as a means to build voice portals starting from legacy web sites.

Paper ID: 53
Type: LP
Title: Cross-Language Access to Recorded Speech in the MALACH project
Contact author: D.W. Oard and D. Demner-Fushman and J. Hajič and B. Ramabhadran and S. Gustman and W.J. Byrne and D. Soergel and B. Dorr and P. Resnik and M. Picheny
Topic: Text - information retrieval

Abstract: The MALACH project seeks to help users find information in a vast multilingual collection of untranscribed oral history interviews. This paper introduces the goals of the project and focuses on supporting access by users who are unfamiliar with the interview language. It begins with a review of the state of the art in cross-language speech retrieval; approaches that will be investigated in the project are then described. Czech was selected as the first non-English language to be supported, so results of an initial experiment with Czech/English cross-language retrieval are reported.

Paper ID: 54
Type: LP
Title: Utterance Verification based on the Likelihood Distance to Alternative Paths
Contact author: Gies Bouwman and Lou Boves
Topic: Speech - automatic speech recognition

Abstract: Utterance verification is the process where one tries to automatically reject incorrectly recognised utterances, while accepting as many correct results as possible. To this aim the probability of an error is often estimated by a one-dimensional confidence measure. In this paper we take a closer look at incorrect classification. We argue that errors stem from a number of different causes and that this observation must be reflected in the design of the utterance verifier. Therefore, we developed measures to detect either out-of-vocabulary (OOV) word errors or in-vocabulary substitution errors. To this aim, we compute confidence measures based on the distance between the likelihood of the first best output and two alternative hypotheses: one corresponding to the second best output, the other to the most likely free phone string. The paper reports on experiments on spoken Dutch city names for a directory assistance application. The results show that a 10% reduction in Confidence Error Rate can be achieved by using a classification and regression tree instead of a linear combination of the cues with a threshold value.

Paper ID: 55
Type: LP
Title: Rejection technique based on the mumble model
Contact author: Tomáš Bartoš and Luděk Müller
Topic: Speech - other

Abstract: In this paper a technique for detection and rejection of incorrectly recognized words is described. The speech recognition system used is based on a speaker-independent continuous density Hidden Markov Model recognizer and a so-called mumble model, whose structure and function are also described. An improved rejection technique is presented in comparison with the heuristic rejection method that we previously used. The new method is fully statistically based. Therefore, the selection of features for training and classification, procedures for statistical model parameter estimation and experimental results are reported. The improved rejection technique achieves an error rate of approximately 12% in the detection of incorrectly recognized words.

Paper ID: 56
Type: LP
Title: Efficient Noise Estimation and its Application for Robust Speech Recognition
Contact author: Petr Motlíček and Lukáš Burget
Topic: Speech - automatic speech recognition

Abstract: An investigation of some well-known noise estimation techniques is presented. The estimated noise is applied in our noise suppression system, which is generally used for speech recognition tasks. Moreover, the algorithms are developed to take part in the front-end of Distributed Speech Recognition (DSR). Therefore, we have proposed some modifications of noise estimation techniques that adapt quickly to varying noise and do not need much information from past segments. We also minimized the algorithmic delay. The robustness of the proposed algorithms was tested under several noisy conditions.

Paper ID: 58
Type: LP
Title: Synthesis in Serbian Language
Contact author: Milan Sečujski and Radovan Obradović and Darko Pekar and Ljubomir Jovanov and Vlado Delić
Topic: Speech - text-to-speech synthesis

Abstract: This paper presents some basic criteria for the conception of a concatenative text-to-speech synthesizer for the Serbian language. The paper describes the prosody generator which was used, and reflects upon several peculiarities of the Serbian language which led to its adoption. The paper also describes criteria for on-line selection of appropriate segments from a large speech corpus.

Paper ID: 61
Type: LP
Title: Using Salient Words to Perform Categorization of Web Sites
Contact author: Marek Trabalka and Mária Bieliková
Topic: Text - information retrieval

Abstract: In this paper we focus on the categorization task for web sites. We compare some quantitative characteristics of existing web directories, analyze the vocabulary used in descriptions of web sites in the Yahoo web directory and propose an approach to automatically categorize web sites. Our approach is based on the novel concept of salient words. Experimental evaluation compares two realizations of the proposed concept. The former uses words typical for just one category, while the latter uses words typical for one or a few categories. Results show that there is a limitation in using a single vocabulary-based method to properly categorize a space as heterogeneous as the World Wide Web.

Paper ID: 70
Type: LP
Title: Discourse-Semantic Analysis of Hungarian Sign Language
Contact author: Gábor Alberti and Helga M. Szabó
Topic: Text - lexical semantics and semantic networks

Abstract: not present

Paper ID: 71
Type: LP
Title: Speech Features Extraction Using Cone-shaped Kernel Distribution
Contact author: Janez Žibert and France Mihelič and Nikola Pavešić
Topic: Speech - automatic speech recognition

Abstract: The paper reviews two basic time-frequency distributions, the spectrogram and the cone-shaped kernel distribution, applied to speech signals. We propose a new modified method of speech feature extraction based on mel-frequency cepstral coefficients with use of the cone-shaped kernel distribution. We additionally explore several estimates of the time derivatives approximated by regression coefficients and coefficients determined by trigonometric functions. Analyses and tests are performed for different sets of speech features obtained from the spectrogram and the cone-shaped kernel distribution using a speech recognition system based on hidden Markov acoustic models. Our main goal has been to incorporate different time-frequency distributions into a speech feature extraction process and potentially find an alternative way of deriving speech features based on these distributions.

Paper ID: 72
Type: LP
Title: A Voice-Driven Web Browser for Blind People
Contact author: Simon Dobrišek and Jerneja Gros and Boštjan Vesnicer and France Mihelič and Nikola Pavešić
Topic: Dialogue - dialogue systems

Abstract: A specialised small Web browser with a voice-driven dialogue manager and a text-to-speech screen reader is presented. The Web browser was built from the GTK Web browser Dillo, which is a free software project under the terms of the GNU General Public License. The new built-in screen reader is now triggered by pointing the mouse and uses the text-to-speech module for its output. A dialogue module together with a spoken-command input was also introduced into the browser. It can be used for navigation through a structure of common Web pages. The developed browser is primarily intended to be used with the new Web portal, exclusively dedicated to blind and visually impaired users. All the Web pages at the portal or at sites that are linked from this portal are expected to be arranged as common HTML/XML pages, which complies with the basic recommendations set by the Web Accessibility Initiative.

Paper ID: 74
Type: LP
Title: Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments
Contact author: Josef Psutka and Pavel Ircing and Josef V. Psutka and Vlasta Radová and William J. Byrne and Jan Hajič and Samuel Gustman and Bhuvana Ramabhadran
Topic: Speech - automatic speech recognition

Abstract: In this paper we describe the initial stages of the ASR component of the MALACH (Multilingual Access to Large Spoken Archives) project. This project will attempt to provide improved access to the large multilingual spoken archives collected by the Survivors of the Shoah Visual History Foundation (VHF) by advancing the state of the art in automated speech recognition. In order to train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. We give a detailed description of the speech annotation process; show the specific properties of the spontaneous speech contained in the archives; and present baseline speech recognition results.

Paper ID: 80
Type: SP
Title: Word Sense Discrimination for Czech
Contact author: Robert Král
Topic: Text - word sense disambiguation

Abstract: This paper deals with the automatic discrimination of contexts of Czech ambiguous words. Schütze's methodology was used, modified and transformed for the Czech language. This algorithm is based on word space and clustering. Semantic discrimination can be understood as a subtask of word sense disambiguation. In this approach, the sense of a word is defined as a cluster of contexts of the ambiguous word. We show that Schütze's method is transportable into Czech. Our results are not as good as his because we have experimented with a highly ambiguous word.

Paper ID: 86
Type: SP
Title: Tools for Semi-Automatic Assignment of Czech Nouns to Declination Patterns
Contact author: Dita Bartůšková and Radek Sedláček
Topic: Text - automatic morphology

Abstract: In this paper, we present tools for the semi-automatic assignment of Czech nouns to declination patterns. First, we explain the reasons for the development of such tools and then we describe the structure of the system in detail. It is based on a decision tree that consists of questions and answers that make it possible to distinguish particular declination patterns. Finally, we provide basic statistical data that clarify the relation between the patterns we developed and the classical ones.

Paper ID: 87
Type: LP
Title: Dependency Analyser Configurable by Measures
Contact author: Tomáš Holan
Topic: Text - parsing and part-of-speech tagging

Abstract: In this paper we present a dependency analyser able to compute syntax recognition and analysis according to dependency grammars. The analyser is able to deal with nonprojective constructions; it has means to express the level of word-order freedom and its limitations. The level of word-order freedom and the level of robustness (correctness) of sentences can be given as parameters of the analysis. Data and grammar definition languages are also presented.

Paper ID: 88
Type: LP
Title: knowledge based speech interface for handhelds
Contact author: C.K. Yang and L.J.M. Rothkrantz
Topic: Dialogue - development of dialogue strategies

Abstract: This paper describes a project done at CMG Trade Transport & Industry BV. It is called SWAMP and is an example of the application of speech technology in human-computer interaction. The reasoning model behind the speech interface is based on the Belief Desire Intention (BDI) model for rational agents. Other important tools that were used to build the speech user interface are the Microsoft Speech API 5 and CLIPS.

Paper ID: 89
Type: SP
Title: A Flexible Framework for Evaluation of New Algorithms for Dialogue Systems
Contact author: Pavel Cenek
Topic: Dialogue - dialogue systems

Abstract: Research in the field of dialog systems often involves building a dialog system used for evaluation of algorithms, collection of data and various experiments. A significant amount of time is needed to create such a system. In order to facilitate this task, we created a flexible, extensible and easy to use framework which can be used as a base for experimenting with dialog systems. Major features of the framework are introduced in the paper together with possible ways of their practical use.

Paper ID: 90
Type: LP
Title: The Generation and Use of Layer Information in Multilayered Extended Semantic Networks
Contact author: Sven Hartrumpf and Hermann Helbig
Topic: Text - lexical semantics and semantic networks

Abstract: The paradigm of Multilayered Extended Semantic Networks (MultiNet) is one of the most thoroughly described knowledge representation systems along the line of semantic networks (Quillian 1968). The conceptual representation of MultiNet is characterized by embedding its nodes into a multidimensional space of layer attributes. These layer attributes and their values play an important part during the syntactico-semantic analysis of natural language texts and during the inferential answer finding in question answering systems. The paper demonstrates the automatic generation of complex layer information for conceptual nodes and their use in the phase of assimilation of knowledge pieces into a larger knowledge base.

Paper ID: 91
Type: SP
Title: ON THE FIRST GREEK-TTS BASED ON FESTIVAL SPEECH SYNTHESIS: ARCHITECTURE AND COMPONENTS DESCRIPTION
Contact author: Zervas P. and Potamitis I. and Fakotakis N. and Kokkinakis G.
Topic: Speech - text-to-speech synthesis

Abstract: In this article we describe the first Text To Speech (TTS) system for the Greek language based on the Festival architecture. We discuss practical implementation details and we capitalize on the preparation of the diphone database and on the phoneme duration prediction module implemented with the CART tree technique. Two male databases were used for two different speech synthesis engines, namely, residual LPC synthesis and the MBROLA technique.

Paper ID: 93
Type: LP
Title: Enhancing Best Analysis Selection and Parser Comparison
Contact author: Aleš Horák and Vladimír Kadlec and Pavel Smrž
Topic: Text - parsing and part-of-speech tagging

Abstract: This paper discusses methods enhancing the selection of a "best" parsing tree from the output of natural language syntactic analysis. It presents a method for cutting away redundant parse trees based on the information obtained from a dependency tree-bank corpus. The effectiveness of the enhanced parser is demonstrated by results of inter-system parser comparison. The tests were run on the standard evaluation grammars (ATIS, CT and PT); our system outperforms the reference implementations.

Paper ID: 94
Type: LP
Title: Exploiting Thesauri and Hierarchical Categories in Cross-Language Information Retrieval
Contact author: Fatiha Sadat and Masatoshi Yoshikawa and Shunsuke Uemura
Topic: Text - information retrieval

Abstract: As Internet resources become accessible to more and more countries, there is a need to develop efficient methods for information retrieval across languages. In the present paper, we focus on query expansion techniques to improve the effectiveness of information retrieval. A combination of dictionary-based translation and statistics-based disambiguation is indispensable to overcome translation ambiguity. We propose a model using multiple sources for query reformulation and expansion to select expansion terms and retrieve information needed by a user. Relevance feedback, thesaurus-based expansion, as well as a new feedback strategy, based on the extraction of domain keywords to expand the user's query, are introduced and evaluated. We evaluated the effectiveness of the proposed combined method by applying it to French-English information retrieval.

Paper ID: 97
Type: SP
Title: An Analysis of Limited Domains for Speech Synthesis
Contact author: Robert Batůšek
Topic: Speech - text-to-speech synthesis

Abstract: This paper deals with the problem of limited domain speech synthesis. Some experiments show that the segment variability is extremely large for unlimited speech synthesis. It seems that it is practically impossible to collect a text corpus large enough to cover all combinations of even very coarse features. A natural question arises whether restricting the synthesizer to a specific domain can help to increase segment coverage. This paper provides an analysis of several limited domain text corpora and evaluates their applicability to the problem of segment selection for speech synthesis.

Paper ID: 98
Type: LP
Title: Advances in Very Low Bit Rate Speech Coding using Recognition and Synthesis Techniques
Contact author: Genevieve Baudoin and François Capman and Jan Černocký and Fadi El Chami and Maurice Charbit and Gérard Chollet and Dijana Petrovska-Delacrétaz
Topic: Speech - speech coding

Abstract: ALISP (Automatic Language Independent Speech Processing) units are an alternative concept to using phoneme-derived units in speech processing. This article describes advances in very low bit rate coding using ALISP units. Results of speaker-independent experiments are reported and speaker clustering using vector quantization is proposed. The improvements of speech re-synthesis using Harmonic Noise Model and dynamic selection of units are discussed.

Paper ID: 99
Type: LP
Title: Different Approaches to Build Multilingual Conversational Systems
Contact author: Marion Mast and Thomas Ross and Henrik Schulz and Heli Harrikari
Topic: Dialogue - dialogue systems

Abstract: The paper describes developments and results of the work being carried out during the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki). The objective of the project is multi-modal, multi-lingual conversational access to information systems. This paper concentrates on issues of the multilingual telephony-based speech and natural language understanding components.

Paper ID: 100
Type: LP
Title: Strategies to Overcome Problematic Input in a Spanish Dialogue System
Contact author: Victoria Arranz and Núria Castell and Montserrat Civit
Topic: Dialogue - dialogue systems

Abstract: This paper focuses on the strategies adopted to tackle problematic input and ease communication between modules in a Spanish railway information dialogue system for spontaneous speech. The paper describes the design and tuning considerations followed by the understanding module, both from a language processing and semantic information extraction point of view. Such strategies aim to handle the problematic input received from the speech recogniser, which is due to spontaneous speech as well as recognition errors.

Paper ID: 101
Type: LP
Title: Fitting German into N-Gram Language Models
Contact author: Robert Hecht and Jürgen Riedler and Gerhard Backfried
Topic: Speech - automatic speech recognition

Abstract: We report on a series of experiments addressing the fact that German is less suited than English for word-based n-gram language models. Several systems were trained at different vocabulary sizes and various sets of lexical units. They were evaluated against a newly created corpus of German and Austrian broadcast news.

Paper ID: 102
Type: LP
Title: Dialogue systems and planning
Contact author: Guy Camilleri
Topic: Dialogue - other

Abstract: Planning processes are often used in dialogue systems to recognize the intentions conveyed in dialogue. The generation of utterances can also be achieved by a planning/execution mechanism. Some advantages of this kind of mechanism are: knowledge sharing, modular design, declarative description, etc. In this paper, we present some planning mechanisms and the related models enabling dialogue management (generation and understanding).

Paper ID: 103
Type: LP
Title: A Comparison of Different Approaches to Automatic Speech Segmentation
Contact author: Kris Demuynck and Tom Laureys
Topic: Speech - speech segmentation

Abstract: We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic information and the speech signal serve as input to a speech segmentation tool. We compare two automatic approaches to segmentation, based on the Viterbi and the Forward-Backward algorithm respectively. Further, we develop different techniques to cope with biases between automatic and manual segmentations. Experiments were performed to evaluate the generation of phonetic transcriptions as well as the different speech segmentation methods.

Paper ID: 104
Type: LP
Title: Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA
Contact author: Jan Žižka and Aleš Bourek
Topic: Text - information retrieval

Abstract: This paper describes a text-document-filtering software tool TEA (TExt Analyzer), which was originally developed for physicians to support selections from large numbers of unstructured medical text documents obtained from available Internet services. TEA learns interesting and relevant documents for individual users basically by the naive Bayes algorithm. Moreover, TEA provides a number of additional functions that improve its classification accuracy. The learning process of TEA is based on a set of labeled positive and negative examples of text documents, which obtain their labels from users interested in documents of certain, usually very specific topics. Experiments and real uses of TEA by physicians have demonstrated that a classification accuracy, in separating the documents between two classes (interesting and uninteresting), can be expected from 70% up to 97%, typically 85% and better.

Paper ID: 106
Type: LP
Title: KEYWORD SPOTTING USING SUPPORT VECTOR MACHINES
Contact author: Yassine Ben Ayed and Dominique Fohr and Jean Paul Haton and Gérard Chollet
Topic: Speech - automatic speech recognition

Abstract: Support Vector Machines are a new and promising technique in statistical learning theory. Recently, this technique produced very interesting results in pattern recognition. In this paper, one of the first applications of the Support Vector Machines (SVM) technique to the problem of keyword spotting is presented. It classifies the correct and the incorrect keywords by using linear and Radial Basis Function kernels. This is a first work proposing the use of SVM in keyword spotting in order to improve recognition and rejection accuracy. The obtained results are very promising.

Paper ID: 107
Type: LP
Title: Improved performances and automatic parameter estimation for a context-independent speech segmentation algorithm
Contact author: Guido Aversano and Anna Esposito
Topic: Speech - speech segmentation

Abstract: In the framework of a recently introduced algorithm for speech phoneme segmentation, a novel strategy has been elaborated for comparing different speech encoding methods and for finding parameters which are optimal for the algorithm. The automatic procedure that implements this strategy makes it possible to improve on previously reported performance and lays the basis for a more accurate comparison between the investigated segmentation system and other segmentation methods proposed in the literature.

Paper ID: 110
Type: LP
Title: Phoneme Lattice Based A* Search Algorithm for Speech Recognition
Contact author: Pascal Nocera and Georges Linares and Dominique Massonié and Loic Lefort
Topic: Speech - automatic speech recognition

Abstract: This paper presents the Speeral continuous speech recognition system developed at the LIA. Speeral uses a modified A* algorithm to find the best path in the search graph, taking into account acoustic and linguistic constraints. Rather than proceeding word by word, the A* used in Speeral is based on a previously generated phoneme lattice. To avoid backtracking problems, the system keeps for each frame the deepest nodes of the partially explored lexical tree starting at this frame. If a new hypothesis to explore ends with a word and the lexicon starting where this word finishes has already been developed, then the next hypothesis will "jump" directly to the deepest nodes. The decoding performance of Speeral is evaluated on the test set of the ARC B1 campaign of AUPELF'97. The experiments on this French database show the efficiency of the search strategy described in this paper.

Paper ID: 112
Type: SP
Title: Some like it Gaussian ...
Contact author: P. Matějka and P. Schwarz and M. Karafiát and J. Černocký
Topic: Speech - automatic speech recognition

Abstract: In Hidden Markov models, speech features are modeled by Gaussian distributions. In this paper, we propose to gaussianize the features to better fit this modeling. A distribution of the data is estimated and a transform function is derived. We have tested two methods of transform estimation (global and speaker based). The results are reported on recognition of isolated Czech words (SpeechDat-E) with CI and CD models and on a medium vocabulary continuous speech recognition task (SPINE). In all three cases, gaussianized data provided results superior to standard MFC coefficients, proving that gaussianization is a cheap way to increase the recognition accuracy.

Paper ID: 113
Type: LP
Title: Visualisation Techniques for Analysing Meaning
Contact author: Dominic Widdows and Scott Cederberg and Beate Dorow
Topic: Text - lexical semantics and semantic networks

Abstract: Many ways of dealing with large collections of linguistic information involve the general principle of mapping words, larger terms and documents into some sort of abstract space. Considerable effort has been devoted to applying such techniques for practical tasks such as information retrieval and word-sense disambiguation. However, the inherent structure of these spaces is often less well-understood. Visualisation tools can help to uncover the relationships between meanings in this space, giving a clearer picture of the natural structure of linguistic information. We present a variety of tools for visualising word-meanings in vector spaces and graph models, derived from co-occurrence information and local syntactic analysis. Our techniques suggest new solutions to standard problems such as automatic management of lexical resources, which perform well under evaluation. The tools presented in this paper are all available for public use on our website.

Paper ID: 115
Type: LP
Title: Part-of-Speech Tagging for Old Chinese
Contact author: Liang Huang and Yinan Peng and Huan Wang and Zhenyu Wu
Topic: Text - parsing and part-of-speech tagging

Abstract: Old Chinese is essentially different from Modern Chinese, in both grammar and morphology. While there has recently been a great deal of work on part-of-speech (POS) tagging for Modern Chinese, POS tagging of Old Chinese has been largely neglected. To the best of our knowledge, this is the first work in this area. Fortunately however, in terms of tagging, Old Chinese is easier than Modern Chinese in that most Old Chinese words are single-character-formed, requiring no segmentation. So in this paper, we propose and analyze a simple statistical approach for POS tagging of Old Chinese. We first designed a tagset for Old Chinese that is later shown to be accurate and efficient. Then we applied the hidden Markov model (HMM) together with the Viterbi algorithm and made several improvements, such as sparse data problem handling and unknown word guessing, both designed especially for Chinese. As the training set grows larger, the hit rate for bigram and trigram increases to 94.9% and 97.6%, respectively. The importance of our work lies in the previously unseen features that are special to Old Chinese, and we have developed successful techniques to deal with them. Although Old Chinese is now a dead language, this work still has many applications in such areas as Ancient-Modern Chinese Machine Translation.

Paper ID: 117
Type: LP
Title: Audio Collections of Endangered Arctic Languages in the Russian Federation
Contact author: Marina Lublinskaya and Tatiana Sherstinova
Topic: Speech - other

Abstract: In the Russian Federation 63 minority languages are mentioned in the "Red Book of the Languages of Russia", which means that they are practically dying out. Because of that it is highly important to make and preserve original recordings of these languages and prepare their documentation. Arctic peoples of Russia are demographically small and the number of speakers of their languages is decreasing dramatically. The paper describes three projects related to two Northern languages, Nenets and Nganasan: Nenets Audio Dictionary, Nganasan Audio Dictionary and Russian-Nenets Online Multimedia Phrase-book.

Paper ID: 122
Type: LP
Title: Spanish Natural Language Interface for a Relational Database Querying System
Contact author: Rodolfo A. Pazos R. and Alexander Gelbukh and J. Javier González B. and Erika Alarcón R. and Alejandro Mendoza M. and A. Patricia Domínguez S.
Topic: Text - other

Abstract: The fast growth of the Internet is creating a society where the demand for information storage, organization, access, and analysis services is continuously growing. This constantly increases the number of inexperienced users that need to access databases in a simple way. Together with the emergence of voice interfaces, such a situation foretells a promising future for database query systems using natural language interfaces. We describe the architecture of a relational database querying system using a natural language (Spanish) interface, giving a brief explanation of the implementation of each of the constituent modules: lexical parser, syntax checker, and semantic analyzer.

Paper ID: 128
Type: SP
Title: An Analysis of Conditional Responses in dialogue
Contact author: Elena Karagjosova and Ivana Kruijff-Korbayová
Topic: Speech - other

Abstract: In the context of collaborative dialogue, we analyze conditional responses of the form "Not (if) c/Yes if c" in reply to a question under discussion "q". A conditional response is used when the validity of "q" depends on a condition "c": when "c" is established in the context, the response indicates a possible need to revise "c", and thus opens negotiation; otherwise, the response raises the question whether "c". We discuss appropriateness conditions for conditional responses, and propose a uniform approach to their generation and interpretation.


Faculty of Informatics International Speech Communication Association Faculty of Applied Sciences