Abstracts

Paper ID: 3
Author: Jorge Graña
Title: A Common Solution for Tokenization and Part-of-Speech Tagging: One-Pass Viterbi Algorithm vs. Iterative Approaches
Topic: Text - parsing and part-of-speech tagging

Current taggers assume that input texts are already tokenized, i.e. correctly segmented into tokens or high-level information units that identify each individual component of the texts. This working hypothesis is unrealistic, given the heterogeneous nature of the application texts and their sources. The greatest difficulties arise when this segmentation is ambiguous. The choice of the correct segmentation alternative depends on the context, which is precisely what taggers study. In this work, we develop a tagger able not only to decide the tag to be assigned to every token, but also to decide whether or not some of them form the same term, according to different segmentation alternatives. For this task, we design an extension of the Viterbi algorithm able to evaluate streams of tokens of different lengths over the same structure. We also compare its time and space complexities with those of the classic and iterative versions of the algorithm.

Paper ID: 7
Author: Imad A. Alsughaiyer
Title: Rule Parser for Arabic Stemmer
Topic: Text - automatic morphology

The Arabic language exhibits a complex but very regular morphological structure that greatly affects its automation. Currently available morphological analysis techniques for the Arabic language are based on heavy computational processes and/or the existence of a large amount of associated data. Utilizing existing morphological techniques greatly degrades the efficiency of some natural language applications, such as information retrieval systems. This paper proposes a new Arabic morphological analysis technique. The technique is based on the pattern similarity of words derived from different roots. Unique patterns are extended and coded as rules that encode morphological characteristics. The technique requires neither complex computation nor associated data, yet it is adjustable to maintain sufficient accuracy. It utilizes a very simple parser to scan the coded rules and decompose a given Arabic word into its morphological components. This paper provides an introduction to the Arabic language and its morphological characteristics, followed by an overview of currently available morphological techniques. The developed stemmer and its components, including the rule set and the parser, are then explained. Experimental results and conclusions are provided at the end.

Paper ID: 9
Author: Tomaz Sef
Title: Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language
Topic: Speech - text-to-speech synthesis

This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words used in our text-to-speech system. First, each vowel (and the consonant 'r') is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and consonant 'r'). For this step we applied a machine-learning technique (decision trees or boosted decision trees). Then, some corrections are made on the word level, according to the number of stressed vowels and the length of the word. For data sets we used the MULTEXT-East Slovene Lexicon, which was supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in the Slovenian language is a very complex problem and that a solution in the form of relatively simple rules is not possible.

Paper ID: 11
Author: Andras Kocsor
Title: Kernel Springy Discriminant Analysis and its Application to a Phonological Awareness Teaching System
Topic: Speech - other

Making use of the ubiquitous kernel notion, we present a new nonlinear supervised feature extraction technique called Kernel Springy Discriminant Analysis. We demonstrate that this method can efficiently reduce the number of features and increase classification performance. The improvements obtained admittedly arise from the nonlinear nature of the extraction technique developed here. Since phonological awareness is of great importance in learning to read, a computer-aided training system could be most beneficial in teaching young learners. Naturally, our system employs an effective automatic phoneme recognizer based on the proposed feature extraction technique.

Paper ID: 14
Author: Pavel Kveton
Title: Achieving an Almost Correct PoS-Tagged Corpus
Topic: Text - parsing and part-of-speech tagging

After some theoretical discussion on the issue of representativity of a corpus, this paper presents a simple yet very efficient technique serving for (semi-)automatic detection of those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on the idea of learning and later application of "negative bigrams", i.e. on the search for pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). Further, the paper describes the generalization of the "negative bigrams" into "extended negative bigrams of length n", for any natural n, which indeed provides a powerful tool for error detection in a corpus. The approach is illustrated throughout on the case of the NEGRA corpus (hence some command of German might be helpful, even though not really necessary). Finally, some general implications for statistical taggers are mentioned.
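
The "negative bigram" idea can be illustrated with a minimal sketch. The abstract does not spell out the learning procedure, so the code below assumes the simplest possible one: any pair of tags that never occurs adjacently in a trusted training sample is treated as negative. The function names, the tag labels and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import product

def learn_negative_bigrams(tagged_sentences, tagset):
    """Return every tag pair that is never adjacent in the training data.

    Assumption: absence from a trusted sample is taken as evidence that
    the pair is an incorrect configuration.
    """
    seen = set()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        seen.update(zip(tags, tags[1:]))
    return set(product(tagset, repeat=2)) - seen

def suspect_positions(sentence, negative):
    """Return indices i where the tag pair (i, i + 1) is a negative bigram."""
    tags = [tag for _, tag in sentence]
    return [i for i, pair in enumerate(zip(tags, tags[1:])) if pair in negative]

# Hypothetical toy data: (word, tag) pairs.
train = [
    [("the", "ART"), ("dog", "NOUN"), ("barks", "VFIN")],
    [("a", "ART"), ("cat", "NOUN"), ("sleeps", "VFIN")],
]
tagset = {"ART", "NOUN", "VFIN"}
negative = learn_negative_bigrams(train, tagset)

# A sentence containing the ARTICLE - FINITE VERB configuration.
check = [("the", "ART"), ("barks", "VFIN"), ("dog", "NOUN")]
print(suspect_positions(check, negative))
```

With a sample this small almost every pair is unseen, so a realistic setting would need a much larger corpus before unseen pairs could be trusted as errors.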

Paper ID: 15
Author: Rei Oguro
Title: Evaluation of a Japanese Sentence Compression Method Based on Phrase Significance and Inter-Phrase Dependency
Topic: Text - text/topic summarization

Sentence compression is a method of text summarisation, where each sentence in a text is shortened in such a way as to retain the original information and grammatical correctness as much as possible. In a previous paper, we formulated the problem of sentence compression as an optimisation problem of extracting a subsequence of phrases from the original sentence that maximises the sum of topical importance and grammatical correctness. Based on this formulation an efficient sentence compression algorithm was derived. This paper reports a result of subjective evaluation for the quality of sentences compressed by using the algorithm.

Paper ID: 17
Author: Michael V. Boldasov
Title: User query understanding by the InBASE system as a source for a multilingual NLG module (first step)
Topic: Text - multi-lingual issues

In this paper we consider the NL generation component of the InBASE system, a system for understanding NL queries to databases. This component generates a new NL query from the internal InBASE Q-representation of the user query. During the planning phase a linear positioned query representation is constructed, with positions bearing first conceptual and then syntactic information. The realization phase deals with the NL means of expressing the concepts (objects, attributes, values, relations between objects and attributes). The NL generation component is conceived as the first step in moving from a one-way question-answering system, as InBASE is now, to a larger-scale information system able to communicate with the user in different areas.

Paper ID: 18
Author: Anton Batliner
Title: Prosodic Classification of Offtalk: First Experiments
Topic: Dialogue - prosody and emotions in dialogues

SmartKom is a multi-modal dialogue system which combines speech with facial expressions and gestures. In this paper, we deal with one of the phenomena that can be observed in such elaborate systems, which we call 'offtalk', i.e., speech that is not directed to the system (speaking to oneself, speaking aside). We report the classification results of first experiments which use a large prosodic feature vector in combination with part-of-speech information.

Paper ID: 20
Author: Rotovnik Tomaz
Title: Large Vocabulary Speech Recognition of Slovenian Language Using Data-Driven Morphological Models
Topic: Speech - automatic speech recognition

A system for large vocabulary continuous speech recognition of the Slovenian language is described. Two types of modelling units are examined: words and sub-words. A data-driven algorithm is used to automatically obtain word decompositions. The performances of one-pass and two-pass decoding strategies were compared. The new models gave promising results. The recognition accuracy was improved by 2.5% absolute at the same recognition time. On the other hand, we achieved a 30% increase in real-time performance at the same recognition error.

Paper ID: 22
Author: Manolis Maragoudakis
Title: Statistical Decision Making applied to Text and Dialogue Corpora for Effective Plan Recognition
Topic: Dialogue - development of dialogue strategies

In this paper, we introduce an architecture designed to achieve effective plan recognition using Bayesian Networks which encode the semantic representation of the user's utterances. The structure of the networks is determined from dialogue corpora, thus eliminating the high-cost process of hand-coding domain knowledge. The conditional probability distributions are learned during a training phase in which data are obtained by the same set of dialogue acts. Furthermore, we have incorporated a module that learns semantic similarities of words from raw text corpora and uses the extracted knowledge to resolve the issue of unknown terms, thus enhancing plan recognition accuracy and improving the quality of the discourse. We present experimental results of an implementation of our platform for a weather information system and compare its performance against a similar, commercial one. The results show significant improvement in identifying the goals of the user. Moreover, we claim that our framework could straightforwardly be updated with new elements from the same domain or adapted to other domains as well.

Paper ID: 24
Author: Marta Gatius
Title: NATURAL LANGUAGE GUIDED DIALOGUES FOR ACCESSING THE WEB
Topic: Dialogue - dialogue systems

This paper proposes the use of ontologies representing domain and linguistic knowledge for guiding natural language (NL) communication on Web content. This proposal deals with the problem of accessing and processing the Web data required to answer user queries. Concepts and communication acts are represented in the conceptual ontology (CO). Domain-restricted grammars and lexicons are obtained automatically by adapting the general linguistic knowledge to cover the communication acts for a particular domain. The use of domain-restricted grammars and lexicons has proved to be efficient, especially when the user is guided in introducing the NL queries. Once the query has been processed, the system fires the appropriate wrappers to extract the data from the Web. The domain concepts described in the CO provide a unifying framework to represent the knowledge obtained from the various Web sources.

Paper ID: 26
Author: Jindrich Matousek
Title: German and Czech Speech Synthesis Using HMM-Based Speech Segment Database
Topic: Speech - text-to-speech synthesis

This paper presents an experimental German speech synthesis system. As in the case of the Czech text-to-speech system ARTIC, a statistical approach (using hidden Markov models) was employed to build a speech segment database. This approach was confirmed to be language independent, and it was shown to be capable of producing a quality database that led to intelligible synthetic speech of high quality. Some experiments with clustering similar speech contexts were performed to enhance the quality of the synthetic speech. Our results show the superiority of phoneme-level clustering over subphoneme-level clustering.

Paper ID: 27
Author: Zdenek Zabokrtsky
Title: Valency Lexicon for Czech: from Verbs to Nouns
Topic: Text - other

A valency lexicon of Czech verbs has been intensively worked on for more than a year, and now we have at our disposal a detailed description of the valency frames of several hundred verbs. Presently, the challenge naturally arises to use the existing lexicon for capturing the valency of other word classes. In this paper, we focus on the valency of nouns derived from verbs. We propose an algorithm for automatic prediction of the valency frames of these nouns, and we test it on a sample of data.

Paper ID: 29
Author: Georg Stemmer
Title: Comparison and Combination of Confidence Measures
Topic: Speech - automatic speech recognition

A set of features for word-level confidence estimation is developed. The features should be easy to implement and should require no additional knowledge beyond the information which is available from the speech recognizer and the training data. We compare a number of features based on a common scoring method, the normalized cross entropy. We also study different ways to combine the features. An artificial neural network leads to the best performance, and a recognition rate of 76% is achieved. The approach is extended not only to detect recognition errors but also to distinguish between insertion and substitution errors.
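
As a hedged illustration of the scoring method, the sketch below computes normalized cross entropy under the definition commonly used in NIST evaluations of confidence measures; the abstract does not state which exact variant the paper uses, so this is an assumption. The score is 0 when every word simply receives the overall fraction of correct words as its confidence, and it approaches 1 as the confidences become more informative. The function and variable names are hypothetical.

```python
import math

def normalized_cross_entropy(confidences, correct):
    """NCE of per-word confidences against correct/incorrect labels.

    confidences: estimated probabilities in (0, 1) that each word is correct.
    correct:     booleans, True where the recognized word was correct.
    """
    n_total = len(confidences)
    n_correct = sum(correct)
    if n_correct == 0 or n_correct == n_total:
        raise ValueError("NCE is undefined when all words share one label")
    p_c = n_correct / n_total  # prior probability that a word is correct
    # Entropy of the baseline that assigns p_c to every word.
    h_max = -(n_correct * math.log2(p_c)
              + (n_total - n_correct) * math.log2(1 - p_c))
    # Cross entropy of the actual confidence values.
    h_conf = -sum(math.log2(p if ok else 1 - p)
                  for p, ok in zip(confidences, correct))
    return (h_max - h_conf) / h_max

labels = [True, True, True, False]
baseline = normalized_cross_entropy([0.75] * 4, labels)
informative = normalized_cross_entropy([0.9, 0.9, 0.9, 0.1], labels)
print(baseline, informative)
```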

Paper ID: 30
Author: Andrej Zgank
Title: Uniform Speech Recognition Platform for Evaluation of New Algorithms
Topic: Speech - automatic speech recognition

This paper presents the development of a speech recognition platform whose main area of use is the evaluation of different new and improved algorithms for speech recognition (noise reduction, feature extraction, language model generation, training of acoustic models, ...). To enable wide use of the platform, different test configurations were added, from alphabet spelling to large vocabulary continuous speech recognition. So far, the speech recognition platform has been implemented and evaluated with a studio (SNABI) and a fixed telephone (SpeechDat(II)) speech database.

Paper ID: 31
Author: Nouza
Title: Strategies for Developing a Real-Time Continuous Speech Recognition System for Czech Language
Topic: Speech - automatic speech recognition

This paper presents a set of 'strategies' that enabled the development of a real-time continuous speech recognition system for the Czech language. The optimization strategies include efficient computation of HMM probability densities, pruning schemes applied to HMM states, words and word hypotheses, a bigram compression technique, and a parallel implementation of the real recognition system. In a series of off-line speaker-independent tests done with 1600 Czech sentences based on a 7033-word lexicon, we obtained a 65% recognition rate. Several on-line tests proved that similar rates can be achieved under real conditions and with a response time shorter than 1 second.

Paper ID: 32
Author: Jan Nouza
Title: Voice Chat with a Virtual Character: The Good Soldier Svejk Case Project
Topic: Dialogue - dialogue systems

In this paper we present our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and an artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows for informal voice chatting with virtual characters. The legendary literary character Svejk is the first personality who can be interviewed in the recently implemented version.

Paper ID: 33
Author: Daniele Falavigna
Title: Application of the ITC-irst spoken dialog system in a medical domain
Topic: Dialogue - dialogue systems

The paper describes the ITC-irst approach for handling spoken dialog interactions over the telephone network. We will specifically describe the usage of the dialog system within a tele-medicine application scenario. First, the system architecture will be summarized, then we will briefly describe our approach for evaluating confidence measures for each of the words in the "best path" provided by our recognizer. Finally, an automatic service for home monitoring of patients affected by hypertension pathology will be described. Patients must periodically introduce data into a database containing their personal medical data. The collected data are managed, according to well established medical guidelines, by an automatic system that can suggest therapies or alert doctors.

Paper ID: 34
Author: Goran Nenadic
Title: Term Clustering using a Corpus-Based Similarity Measure
Topic: Text - knowledge representation and reasoning

In this paper we present a method for automatic term clustering. The method uses a hybrid similarity measure to cluster terms automatically extracted from a corpus by applying the C/NC value method. The measure comprises contextual, functional and lexical similarity, and it is used to instantiate the cell values in a similarity matrix. The clustering algorithm uses either the nearest neighbour method or Ward's method to calculate the distance between clusters. The approach has been tested and evaluated in the domain of molecular biology, and the results are presented.

Paper ID: 35
Author: SANCHIS, EMILIO
Title: Applying dialogue constraints to the understanding process in a Dialogue system
Topic: Dialogue - dialogue systems

In this paper, we present an approach to the estimation of a dialogue-dependent understanding component of a dialogue system. This work is developed in the framework of the BASURDE Spanish dialogue system, which answers queries about train timetables by telephone in Spanish. Modelling that is specific to the dialogue state is proposed to improve the behaviour of the understanding process. Some experimental results are presented.

Paper ID: 36
Author: Carlos D. Martinez-Hinarejos
Title: Evaluating a Probabilistic Dialogue Model for a Railway Information Task
Topic: Dialogue - other

Dialogue modelling attempts to determine the way in which a dialogue is developed. The dialogue strategy (i.e., the system behaviour) of an automatic dialogue system is determined by the dialogue model. Most dialogue systems use rule-based dialogue strategies, but recently probabilistic models have become very promising. We present probabilistic models based on the dialogue act concept, which use user turns, dialogue history and semantic information. These models are evaluated as dialogue act labelers. The evaluation is carried out on a railway information task.

Paper ID: 37
Author: Dana Nejedlova
Title: Comparative Study on Bigram Language Models for Spoken Czech Recognition
Topic: Speech - automatic speech recognition

The article deals with the problem of continuous speech recognition of Czech language. The main goal of this study is to compare various kinds of bigram language models with respect to the accuracy and speed of speech recognition. The main types of bigram language models are described here as well as multiple parameters that affect the performance of a speech recognition system. A comparison with a zerogram model is also made. Different models and various parameter settings are compared by means of the accuracy rate in extensive experiments done with a large test database of 1,600 Czech sentences recorded by 40 speakers.

Paper ID: 38
Author: Pascal Wiggers
Title: Integration of speech recognition and automatic lipreading
Topic: Speech - automatic speech recognition

At Delft University of Technology there is a project running on multimodal interfaces, focusing on the interaction of speech and lipreading. A large vocabulary speaker-independent speech recognizer for the Dutch language was developed using the Hidden Markov Toolkit and the Polyphone database of recorded Dutch speech. To make the system more noise robust, cues provided by an automatic lip-reading technique were integrated into the system. In this paper we give an outline of both systems and present results of experiments.

Paper ID: 39
Author: Dimitri Woei-A-Jin
Title: Anaphora Resolution in a speech recognition environment
Topic: Speech - other

In this paper the methods applied to resolve anaphora in the speech recognition application scenario are described. In these applications shallow parsing has been used, which provides no information about sentence structure and returns only relevant phrases. As syntactic information is an essential input to most resolution algorithms, a new method had to be developed. It is based on the model presented in [Str98]. In addition, a set of filters is employed to overcome the lack of syntactic information so that it is possible to determine some of the dependencies between different phrases, which are needed to successfully resolve anaphora.

Paper ID: 42
Author: Michal Prcin
Title: Heuristic and Statistical Methods for Speech/Non-speech Detector Design
Topic: Speech - automatic speech recognition

Speech/non-speech (S/NS) detection plays an important role in automatic speech recognition (ASR) systems, especially in the case of isolated word or command recognition. Even in continuous speech an S/NS decision can be made at the beginning and at the end of a sequence, resulting in a "sleep mode" of the speech recognizer during silence and in a reduction of computational demands. It is very difficult, however, to precisely locate the endpoints of the input utterance because of unpredictable background noise. The method proposed in this paper makes use of the advantages of two approaches (i.e. finding the best set of heuristic features and applying a statistical induction method) to reach the best S/NS decision.

Paper ID: 44
Author: Nestor Garay-Vitoria
Title: Evaluation of prediction methods applied to an inflected language
Topic: Dialogue - assistive technologies based on speech and dialogue

Prediction is one of the techniques that have been applied in Augmentative and Alternative Communication to help people enhance the quality and quantity of the text composed in a unit of time. Most of the literature has focused on word prediction methods that may be easily applied to non-inflected languages. However, for inflected languages, other approaches that mainly distinguish roots and suffixes may enhance the results (in terms of keystroke savings and hit ratio) of predictive systems. In this paper we present the approaches we have applied to the Basque language (an inflected one) and the results they achieve with a particular text (one that was not used while creating the initial lexicons the systems use for prediction). Starting from this evaluation, one of the presented approaches is suggested as the best one.
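
The two evaluation measures named above, keystroke savings and hit ratio, can be sketched as follows. This is a plain whole-word predictor used only to show how the measures might be computed; it is not one of the root-and-suffix approaches evaluated in the paper. The sketch assumes a frequency-ordered lexicon, a proposal list of k words, one keystroke to accept a proposal, and no counting of spaces. All names and the toy lexicon are hypothetical.

```python
def top_predictions(prefix, lexicon, k):
    """First k lexicon words starting with prefix.

    Assumption: lexicon is sorted by descending frequency.
    """
    return [word for word in lexicon if word.startswith(prefix)][:k]

def evaluate(words, lexicon, k=3):
    """Return (keystroke savings, hit ratio) for a sequence of words."""
    keystrokes = total_chars = hits = 0
    for word in words:
        total_chars += len(word)
        for n in range(len(word)):  # n letters typed so far
            if word in top_predictions(word[:n], lexicon, k):
                keystrokes += n + 1  # n letters plus one selection keystroke
                hits += 1
                break
        else:
            keystrokes += len(word)  # never proposed: typed in full
    return 1 - keystrokes / total_chars, hits / len(words)

lexicon = ["etxe", "etorri", "egun", "eskola"]  # toy frequency-ordered lexicon
text = ["egun", "etxe", "zubi"]                 # "zubi" is out of lexicon
savings, hit_ratio = evaluate(text, lexicon, k=2)
print(savings, hit_ratio)
```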

Paper ID: 45
Author: Andrés Montoyo
Title: The Role of WSD for Multilingual Natural Language Applications
Topic: Text - word sense disambiguation

Nowadays, the need for advanced free-text filtering in multilingual environments is increasing. Therefore, when searching for specific keywords in a multilingual information space, it is desirable to eliminate occurrences where the word or words of each language are used in an inappropriate sense. This task could be exploited in internet browsers and resource discovery systems, relational databases containing free-text fields, electronic document management systems, data warehouse and data mining systems, etc. To address this problem, in this paper we present a Word Sense Disambiguation interface, which returns the word senses in different languages and could be employed for multilingual natural language applications. This interface resolves the lexical ambiguity of nouns and verbs in input texts in some European languages (English, Spanish), using the taxonomy of the EuroWordNet lexical knowledge database, and returns a multilingual output of the word senses (English, Spanish, Catalan and Basque). In addition to the relations in WordNet 1.5, EuroWordNet includes cross-language and cross-category relations, which are directly useful for multilingual Word Sense Disambiguation. This interface has been implemented in the C++ programming language and provides a visual framework.

Paper ID: 47
Author: Antoine Rozenknop
Title: A Gibbsian Context-Free Grammar for Parsing
Topic: Text - parsing and part-of-speech tagging

Probabilistic Context-Free Grammars can be used for speech recognition or syntactic analysis thanks to especially efficient algorithms. In this paper, we propose an instantiation of such a grammar, whose mathematical properties are intuitively more suitable for those tasks than those of SCFGs (Stochastic CFGs), without requiring specific analysis algorithms. Results on the Susanne text show that up to 33% of the analysis errors made by an SCFG can be avoided with this model.

Paper ID: 48
Author: Potamitis Ilyas
Title: SPEECH ENHANCEMENT USING MIXTURES OF GAUSSIANS FOR SPEECH AND NOISE
Topic: Speech - other

In this article we approximate the clean speech spectral magnitude as well as the noise spectral magnitude with a mixture of Gaussian pdfs using the Expectation-Maximization (EM) algorithm. Subsequently, we apply the Bayesian inference framework to the degraded spectral coefficients and, by employing Minimum Mean Square Error (MMSE) estimation, we derive a closed-form solution for the spectral magnitude estimation task adapted to the spectral characteristics and noise variance of each band. We evaluate our algorithm using true, coloured, slowly and quickly varying noise types (factory and aircraft noise) and demonstrate its robustness at very low SNRs.

Paper ID: 49
Author: ARMANDO SUAREZ
Title: Word Sense vs. Word Domain Disambiguation: a Maximum Entropy approach
Topic: Text - word sense disambiguation

In this paper, a supervised learning system for word sense disambiguation is presented. It is based on maximum entropy conditional probability models. This system acquires linguistic knowledge from an annotated corpus, and this knowledge is represented in the form of features. The system was evaluated using both WordNet's senses and domains as the sets of classes of each word. Domain labels are obtained from the enrichment of WordNet with subject field codes, which produces a polysemy reduction. Several types of features have been analyzed for a few words selected from the DSO corpus. Currently, the system implementation does not support any smoothing technique or complex pre-processing, but its accuracy is good when compared with, for example, the systems at SENSEVAL-2. Using the domain enrichment of WordNet, a 14% accuracy improvement is achieved.

Paper ID: 50
Author: Cesar Gonzalez Ferreras
Title: From HTML to VoiceXML: A first approach.
Topic: Dialogue - markup languages related to speech and dialogue

In this work, we discuss the construction process of the voice portal counterpart of a departmental web site. VoiceXML has been used as the dialog modelling language. A prototypical system has been built using our own VoiceXML interpreter, which easily integrates different implementation platforms. A general discussion of VoiceXML advantages and disadvantages is also reported, and a simple startup procedure is proposed as a means to build voice portals starting from legacy web sites.

Paper ID: 53
Author: Douglas W. Oard
Title: Cross-Language Access to Recorded Speech in the MALACH project
Topic: Text - information retrieval

The MALACH project seeks to help users find information in vast multilingual collections of untranscribed oral history interviews. This paper introduces the goals of the project and focuses on supporting access by users who are unfamiliar with the interview language. It begins with a review of the state of the art in cross-language speech retrieval; approaches that will be investigated in the project are then described. Czech was selected as the first non-English language to be supported, so results of an initial experiment with Czech/English cross-language retrieval are reported.

Paper ID: 54
Author: Gies Bouwman
Title: Utterance Verification based on the Likelihood Distance to Alternative Paths
Topic: Speech - automatic speech recognition

Utterance verification is the process where one tries to automatically reject incorrectly recognised utterances, while accepting as many correct results as possible. To this aim the probability of an error is often estimated by a one-dimensional confidence measure. In this paper we take a closer look at incorrect classification. We argue that errors stem from a number of different causes and that this observation must be reflected in the design of the utterance verifier. Therefore, we developed measures to detect either out-of-vocabulary (OOV) word errors or in-vocabulary substitution errors. To this aim, we compute confidence measures based on the distance between the likelihood of the first best output and two alternative hypotheses: one corresponding to the second best output, the other to the most likely free phone string. The paper reports on experiments on spoken Dutch city names for a directory assistance application. The results show that a 10% reduction in Confidence Error Rate can be achieved by using a classification and regression tree instead of a linear combination of the cues with a threshold value.

Paper ID: 55
Author: Tomáš Bartoš
Title: Rejection technique based on the mumble model
Topic: Speech - other

In this paper a technique for detection and rejection of incorrectly recognized words is described. The speech recognition system used is based on a speaker-independent continuous density Hidden Markov Model recognizer and a so-called mumble model, whose structure and function are also described. An improved rejection technique is presented and compared with the heuristic rejection method that we previously used. The new method is fully statistically based. Therefore, the selection of features for training and classification, the procedures for estimating the parameters of the statistical models, and experimental results are reported. The improved rejection technique achieves an error rate of approximately 12% in the detection of incorrectly recognized words.

Paper ID: 56
Author: Petr Motlicek
Title: Efficient Noise Estimation and its Application for Robust Speech Recognition
Topic: Speech - automatic speech recognition

An investigation of some well-known noise estimation techniques is presented. The estimated noise is applied in our noise suppression system, which is generally used for speech recognition tasks. Moreover, the algorithms are developed to be part of the front-end of Distributed Speech Recognition (DSR). Therefore, we have proposed some modifications of the noise estimation techniques that adapt quickly to varying noise and do not need much information from past segments. We also minimized the algorithmic delay. The robustness of the proposed algorithms was tested under several noisy conditions.

Paper ID: 58
Author: Milan Secujski
Title: Synthesis in Serbian Language
Topic: Speech - text-to-speech synthesis

This paper presents some basic criteria for the conception of a concatenative text-to-speech synthesizer for the Serbian language. The paper describes the prosody generator which was used, and reflects upon several peculiarities of the Serbian language which led to its adoption. The paper also describes criteria for on-line selection of appropriate segments from a large speech corpus.

Paper ID: 61
Author: Marek Trabalka
Title: Using Salient Words to Perform Categorization of Web Sites
Topic: Text - information retrieval

In this paper we focus on the categorization task for web sites. We compare some quantitative characteristics of existing web directories, analyze the vocabulary used in descriptions of web sites in the Yahoo web directory, and propose an approach to automatically categorize web sites. Our approach is based on the novel concept of salient words. Experimental evaluation compares two realizations of the proposed concept. The former uses words typical for just one category, while the latter uses words typical for one or a few categories. Results show that there is a limitation in using a single vocabulary-based method to properly categorize a space as heterogeneous as the World Wide Web.

Paper ID: 68
Author: Nira B. Volskaya
Title: Pause duration at syntactic boundaries
Topic: Speech - other

Any speech synthesis or speech recognition system has an algorithm that defines pause duration. A pause is an important prosodic cue, used alongside others (pitch changes, prepausal lengthening, declination reset) for marking boundaries and thus for structuring the text into intonation units. We know that speakers use variable combinations of prosodic cues to mark boundaries between different syntactic or discourse units. Which prosodic cues are of primary importance for the speaker in marking boundaries of different strength and in highlighting the structural make-up of the sentences in the text has been the subject of intensive research. Some studies have shown that syntactic boundaries are characterized mostly by pauses [1,2,3] and lengthening [2,3,4,5,6], and sometimes also by pitch [7]; others claim that the interaction of prosodic cues is more complex [8]. In this study we investigate the role of the pause as one of the prosodic cues used in marking syntactic boundaries in Russian text.

Paper ID: 69
Author: Huang Ke, Ma Shaoping
Title: Text Categorization Based On Concept Indexing and Principal Component Analysis
Topic: Text - other

A major problem in text categorization is the high dimensionality of the feature vector space, which commonly has around ten thousand dimensions. Reducing the dimensionality of the space while maintaining categorization accuracy is useful for improving categorization effectiveness and for applying new categorization algorithms. Current feature selection methods for text categorization are only partially effective in reducing dimensionality. We put forward a new algorithm for reducing dimensionality, which combines concept indexing and principal component analysis. Our experiments show that this algorithm can effectively reduce dimensionality without sacrificing categorization accuracy.

Paper ID: 70
Author: GÁBOR ALBERTI
Title: Discourse-Semantic Analysis of Hungarian Sign Language
Topic: Text - lexical semantics and semantic networks

not present

Paper ID: 71
Author: Janez Zibert
Title: Speech Features Extraction Using Cone-shaped Kernel Distribution
Topic: Speech - automatic speech recognition

The paper reviews two basic time-frequency distributions, the spectrogram and the cone-shaped kernel distribution, applied to speech signals. We propose a new modified method of speech feature extraction based on mel-frequency cepstral coefficients and using the cone-shaped kernel distribution. We additionally explore several estimates of the time derivatives, approximated by regression coefficients and by coefficients determined by trigonometric functions. Analyses and tests are performed for different sets of speech features obtained from the spectrogram and the cone-shaped kernel distribution, using a speech recognition system based on hidden Markov acoustic models. Our main goal has been to incorporate different time-frequency distributions into the speech feature extraction process and potentially to find an alternative way of deriving speech features based on these distributions.
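As an illustration of time derivatives approximated by regression coefficients, the following minimal sketch uses the widely used delta-feature regression formula. It is not necessarily the estimator used in the paper: the function name `delta`, the window size of 2, and the edge handling (repeating the first and last frame) are assumptions made for this example.

```python
def delta(frames, n_window=2):
    """Regression-based time derivative of a feature sequence.

    frames: list of feature vectors (lists of floats), one per time frame.
    n_window: number of frames used on each side (assumed value: 2).
    Edges are padded by repeating the first/last frame (an assumption).
    """
    if not frames:
        return []
    # Normalising term of the regression formula: 2 * sum(n^2)
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, n_window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            vec.append(num / denom)
        out.append(vec)
    return out
```

For a feature that grows by one unit per frame, the interior frames receive a delta of 1.0, which is the slope of the sequence.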

Paper ID: 72
Author: Simon Dobrisek
Title: A Voice-Driven Web Browser for Blind People
Topic: Dialogue - dialogue systems

A specialised small Web browser with a voice-driven dialogue manager and a text-to-speech screen reader is presented. The Web browser was built from the GTK Web browser Dillo, which is a free software project under the terms of the GNU General Public License. The new built-in screen reader is triggered by pointing the mouse and uses the text-to-speech module for its output. A dialogue module together with spoken-command input was also introduced into the browser. It can be used for navigation through the structure of common Web pages. The developed browser is primarily intended to be used with the new Web portal dedicated exclusively to blind and visually impaired users. All the Web pages at the portal, or at sites linked from this portal, are expected to be arranged as common HTML/XML pages that comply with the basic recommendations set by the Web Access Initiative.

Paper ID: 74
Author: Josef Psutka
Title: Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments
Topic: Speech - automatic speech recognition

In this paper we describe the initial stages of the ASR component of the MALACH (Multilingual Access to Large Spoken Archives) project. This project will attempt to provide improved access to the large multilingual spoken archives collected by the Survivors of the Shoah Visual History Foundation (VHF) by advancing the state of the art in automated speech recognition. In order to train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. We give a detailed description of the speech annotation process, show the specific properties of the spontaneous speech contained in the archives, and present baseline speech recognition results.

Paper ID: 80
Author: Robert Kral
Title: Word Sense Discrimination for Czech
Topic: Text - word sense disambiguation

This paper deals with the automatic discrimination of contexts of ambiguous Czech words. Schutze's methodology was used, modified and adapted for the Czech language. The algorithm is based on word space and clustering. Semantic discrimination can be understood as a subtask of word sense disambiguation. In this approach, the sense of a word is defined as a cluster of contexts of the ambiguous word. We show that Schutze's method is transferable to Czech. Our results are not as good as his because we experimented with a highly ambiguous word.

Paper ID: 86
Author: Dita Bartuskova
Title: Tools for Semi-Automatic Assignment of Czech Nouns to Declination Patterns
Topic: Text - automatic morphology

In this paper, we present tools for the semi-automatic assignment of Czech nouns to declination patterns. First, we explain the reasons for developing such tools, and then we describe the structure of the system in detail. It is based on a decision tree consisting of questions and answers that make it possible to distinguish particular declination patterns. Finally, we provide basic statistical data that clarify the relation between the patterns we developed and the classical ones.

Paper ID: 87
Author: Tomáš Holan
Title: Dependency Analyser Configurable by Measures
Topic: Text - parsing and part-of-speech tagging

In this paper we present a dependency analyser able to compute syntax recognition and analysis according to dependency grammars. The analyser is able to deal with nonprojective constructions; it has means to express the level of word-order freedom and its limitations. The level of word-order freedom and the level of robustness (correctness) of sentences can be given as parameters of the analysis. Data and grammar definition languages are also presented.

Paper ID: 88
Author: Cheng-Ke Yang
Title: knowledge based speech interface for handhelds
Topic: Dialogue - development of dialogue strategies

This paper describes a project carried out at CMG Trade Transport & Industry BV. The project, called SWAMP, is an example of the application of speech technology in human-computer interaction. The reasoning model behind the speech interface is based on the Belief Desire Intention (BDI) model for rational agents. Other important tools used to build the speech user interface are the Microsoft Speech API 5 and CLIPS.

Paper ID: 89
Author: Pavel Cenek
Title: A Flexible Framework for Evaluation of New Algorithms for Dialogue Systems
Topic: Dialogue - dialogue systems

Research in the field of dialog systems often involves building a dialog system used for evaluation of algorithms, collection of data and various experiments. A significant amount of time is needed to create such a system. In order to facilitate this task, we created a flexible, extensible and easy to use framework which can be used as a base for experimenting with dialog systems. Major features of the framework are introduced in the paper together with possible ways of their practical use.

Paper ID: 90
Author: Sven Hartrumpf
Title: The Generation and Use of Layer Information in Multilayered Extended Semantic Networks
Topic: Text - lexical semantics and semantic networks

The paradigm of Multilayered Extended Semantic Networks (MultiNet) is one of the most thoroughly described knowledge representation systems along the line of semantic networks (Quillian 1968). The conceptual representation of MultiNet is characterized by embedding its nodes into a multidimensional space of layer attributes. These layer attributes and their values play an important part during the syntactico-semantic analysis of natural language texts and during inferential answer finding in question answering systems. The paper demonstrates the automatic generation of complex layer information for conceptual nodes and its use in the phase of assimilating pieces of knowledge into a larger knowledge base.

Paper ID: 91
Author: Zervas Panos
Title: ON THE FIRST GREEK-TTS BASED ON FESTIVAL SPEECH SYNTHESIS: ARCHITECTURE AND COMPONENTS DESCRIPTION
Topic: Speech - text-to-speech synthesis

In this article we describe the first Text To Speech (TTS) system for the Greek language based on the Festival architecture. We discuss practical implementation details and concentrate on the preparation of the diphone database and on the phoneme duration prediction module, implemented with the CART technique. Two male databases were used for two different speech synthesis engines, namely residual LPC synthesis and the MBROLA technique.

Paper ID: 93
Author: Ales Horak
Title: Enhancing Best Analysis Selection and Parser Comparison
Topic: Text - parsing and part-of-speech tagging

This paper discusses methods for enhancing the selection of a "best" parse tree from the output of natural language syntactic analysis. It presents a method for cutting away redundant parse trees based on information obtained from a dependency tree-bank corpus. The effectiveness of the enhanced parser is demonstrated by the results of an inter-system parser comparison. The tests were run on the standard evaluation grammars (ATIS, CT and PT); our system outperforms the reference implementations.

Paper ID: 94
Author: Fatiha SADAT, Masatoshi YOSHIKAWA, and Shunsuke UEMURA
Title: Exploiting Thesauri and Hierarchical Categories in Cross-Language Information Retrieval
Topic: Text - information retrieval

As Internet resources become accessible to more and more countries, there is a need to develop efficient methods for information retrieval across languages. In the present paper, we focus on query expansion techniques to improve the effectiveness of information retrieval. A combination of dictionary-based translation and statistics-based disambiguation is indispensable for overcoming translation ambiguity. We propose a model that uses multiple sources for query reformulation and expansion to select expansion terms and retrieve the information needed by a user. Relevance feedback, thesaurus-based expansion, and a new feedback strategy based on the extraction of domain keywords to expand the user’s query are introduced and evaluated. We evaluated the effectiveness of the proposed combined method by applying it to French-English information retrieval.

Paper ID: 97
Author: Robert Batusek
Title: An Analysis of Limited Domains for Speech Synthesis
Topic: Speech - text-to-speech synthesis

This paper deals with the problem of limited domain speech synthesis. Some experiments show that the segment variability is extremely large for unlimited speech synthesis. It seems practically impossible to collect a text corpus large enough to cover all combinations of even very coarse features. A natural question arises whether restricting the synthesizer to a specific domain can help to increase segment coverage. This paper provides an analysis of several limited domain text corpora and evaluates their applicability to the problem of segment selection for speech synthesis.

Paper ID: 98
Author: Dijana Petrovska-Delacretaz
Title: Advances in Very Low Bit Rate Speech Coding using Recognition and Synthesis Techniques
Topic: Speech - speech coding

ALISP (Automatic Language Independent Speech Processing) units are an alternative concept to using phoneme-derived units in speech processing. This article describes advances in very low bit rate coding using ALISP units. Results of speaker-independent experiments are reported and speaker clustering using vector quantization is proposed. The improvements of speech re-synthesis using Harmonic Noise Model and dynamic selection of units are discussed.

Paper ID: 99
Author: Marion Mast
Title: Different Approaches to Build Multilingual Conversational Systems
Topic: Dialogue - dialogue systems

The paper describes developments and results of the work carried out during the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki). The objective of the project is multi-modal, multi-lingual conversational access to information systems. This paper concentrates on issues of the multilingual telephony-based speech and natural language understanding components.

Paper ID: 100
Author: Victoria Arranz
Title: Strategies to Overcome Problematic Input in a Spanish Dialogue System
Topic: Dialogue - dialogue systems

This paper focuses on the strategies adopted to tackle problematic input and ease communication between modules in a Spanish railway information dialogue system for spontaneous speech. The paper describes the design and tuning considerations followed by the understanding module, both from a language processing and semantic information extraction point of view. Such strategies aim to handle the problematic input received from the speech recogniser, which is due to spontaneous speech as well as recognition errors.

Paper ID: 101
Author: Robert Hecht
Title: Fitting German into N-Gram Language Models
Topic: Speech - automatic speech recognition

We report on a series of experiments addressing the fact that German is less suited than English for word-based n-gram language models. Several systems were trained at different vocabulary sizes and various sets of lexical units. They were evaluated against a newly created corpus of German and Austrian broadcast news.

Paper ID: 102
Author: Guy Camilleri
Title: Dialogue systems and planning
Topic: Dialogue - other

Planning processes are often used in dialogue systems to recognize the intentions conveyed in dialogue. The generation of utterances can also be achieved by a planning/execution mechanism. Some advantages of this kind of mechanism are knowledge sharing, modular design, and declarative description. In this paper, we present some planning mechanisms and the related models that enable dialogue management (generation and understanding).

Paper ID: 103
Author: Kris Demuynck
Title: A Comparison of Different Approaches to Automatic Speech Segmentation
Topic: Speech - speech segmentation

We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic information and the speech signal serve as input to a speech segmentation tool. We compare two automatic approaches to segmentation, based on the Viterbi and the Forward-Backward algorithm respectively. Further, we develop different techniques to cope with biases between automatic and manual segmentations. Experiments were performed to evaluate the generation of phonetic transcriptions as well as the different speech segmentation methods.

Paper ID: 104
Author: Jan Zizka
Title: Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA
Topic: Text - information retrieval

This paper describes a text-document-filtering software tool, TEA (TExt Analyzer), which was originally developed for physicians to support selection from large numbers of unstructured medical text documents obtained from available Internet services. TEA learns which documents are interesting and relevant for individual users, basically by means of the naive Bayes algorithm. Moreover, TEA provides a number of additional functions that improve its classification accuracy. The learning process of TEA is based on a set of labeled positive and negative examples of text documents, which obtain their labels from users interested in documents on certain, usually very specific topics. Experiments and real use of TEA by physicians have demonstrated that the classification accuracy for separating documents into two classes (interesting and uninteresting) can be expected to range from 70% to 97%, and is typically 85% or better.
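The kind of naive Bayes filtering described above can be sketched as follows. This is a generic multinomial naive Bayes classifier with add-one smoothing, not TEA's actual implementation: the function names `train` and `classify`, the whitespace tokenization, and the label strings are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(docs):
    """Build a model from (text, label) pairs, e.g. label "interesting"."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()   # naive tokenization (assumption)
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)   # class prior
        for w in text.lower().split():
            if w in vocab:   # words never seen in training are ignored
                # Add-one (Laplace) smoothing of the word likelihood
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A user would label a handful of documents as interesting or uninteresting, call `train` on them, and then call `classify` on each new document.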

Paper ID: 106
Author: BEN AYED
Title: KEYWORD SPOTTING USING SUPPORT VECTOR MACHINES
Topic: Speech - automatic speech recognition

Support Vector Machines (SVMs) are a new and promising technique in statistical learning theory. Recently, this technique has produced very interesting results in pattern recognition. In this paper, one of the first applications of the SVM technique to the problem of keyword spotting is presented. It classifies correct and incorrect keywords by using linear and Radial Basis Function kernels. This is an early work proposing the use of SVMs in keyword spotting in order to improve recognition and rejection accuracy. The results obtained are very promising.

Paper ID: 107
Author: Guido Aversano
Title: Improved performances and automatic parameter estimation for a context-independent speech segmentation algorithm
Topic: Speech - speech segmentation

In the framework of a recently introduced algorithm for speech phoneme segmentation, a novel strategy has been developed for comparing different speech encoding methods and for finding the parameters that are optimal for the algorithm. The automatic procedure that implements this strategy makes it possible to improve on previously reported performance and lays the basis for a more accurate comparison between the investigated segmentation system and other segmentation methods proposed in the literature.

Paper ID: 110
Author: Pascal NOCERA
Title: Phoneme Lattice Based A* Search Algorithm for Speech Recognition
Topic: Speech - automatic speech recognition

This paper presents the Speeral continuous speech recognition system developed at the LIA. Speeral uses a modified A* algorithm to find the best path in the search graph, taking into account acoustic and linguistic constraints. Rather than proceeding word by word, the A* used in Speeral is based on a previously generated phoneme lattice. To avoid backtracking problems, the system keeps for each frame the deepest nodes of the partially explored lexical tree starting at this frame. If a new hypothesis to explore ends with a word and the lexicon starting where this word finishes has already been developed, then the next hypothesis will "jump" directly to the deepest nodes. The decoding performance of Speeral is evaluated on the test set of the ARC B1 campaign of AUPELF'97. The experiments on this French database show the efficiency of the search strategy described in this paper.

Paper ID: 112
Author: Pavel Matejka
Title: Some like it Gaussian ...
Topic: Speech - automatic speech recognition

In hidden Markov models, speech features are modeled by Gaussian distributions. In this paper, we propose to gaussianize the features so that they better fit this modeling. A distribution of the data is estimated and a transform function is derived. We have tested two methods of transform estimation (global and speaker-based). Results are reported on recognition of isolated Czech words (SpeechDat-E) with CI and CD models and on a medium-vocabulary continuous speech recognition task (SPINE). In all three cases, gaussianized data gave results superior to standard MFC coefficients, showing that gaussianization is a cheap way to increase recognition accuracy.
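One common way to gaussianize a one-dimensional feature is to map each value through its empirical cumulative distribution and then through the inverse standard normal CDF. The sketch below illustrates that idea only; it is not necessarily the transform estimated in the paper. The function name `gaussianize` and the `(rank - 0.5) / n` position rule are assumptions, and the transform is computed from the samples themselves rather than from a separately estimated global or speaker-based distribution.

```python
from statistics import NormalDist

def gaussianize(values):
    """Map 1-D samples to an approximately standard normal distribution.

    Each value is replaced by the standard normal quantile of its
    empirical CDF position (rank - 0.5) / n. Tied values keep their
    input order, so they receive neighbouring quantiles.
    """
    n = len(values)
    # Indices of the samples sorted by value (stable for ties)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()   # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return out
```

The mapping preserves the ordering of the samples, so a heavily skewed feature keeps its ranking while its histogram is reshaped towards a Gaussian. In practice it would be applied to each feature dimension separately.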

Paper ID: 113
Author: Dominic Widdows
Title: Visualisation Techniques for Analysing Meaning
Topic: Text - lexical semantics and semantic networks

Many ways of dealing with large collections of linguistic information involve the general principle of mapping words, larger terms and documents into some sort of abstract space. Considerable effort has been devoted to applying such techniques for practical tasks such as information retrieval and word-sense disambiguation. However, the inherent structure of these spaces is often less well-understood. Visualisation tools can help to uncover the relationships between meanings in this space, giving a clearer picture of the natural structure of linguistic information. We present a variety of tools for visualising word-meanings in vector spaces and graph models, derived from co-occurrence information and local syntactic analysis. Our techniques suggest new solutions to standard problems such as automatic management of lexical resources, which perform well under evaluation. The tools presented in this paper are all available for public use on our website.

Paper ID: 114
Author: Aldezabal
Title: Learning Argument/Adjunct distinction for Basque Verbs
Topic: Text - other

This paper presents experiments on lexical knowledge acquisition in the form of verbal subcategorization information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used three different statistical filters to acquire the subcategorization information: Mutual Information, Pearson’s Chi-square and Fisher’s Exact test. Due to the characteristics of agglutinative languages like Basque, the usual classification of arguments in terms of their syntactic category (such as NP or PP) is not suitable. For that reason, the arguments are classified into 48 different kinds of case markers, which makes the system fine-grained compared to equivalent systems that have been developed for other languages. This work addresses the problem of distinguishing arguments from adjuncts, which is one of the most significant sources of noise in subcategorization frame acquisition.

Paper ID: 115
Author: Liang HUANG
Title: Part-of-Speech Tagging for Old Chinese
Topic: Text - parsing and part-of-speech tagging

Old Chinese is essentially different from Modern Chinese in both grammar and morphology. While there has recently been a great deal of work on part-of-speech (POS) tagging for Modern Chinese, POS tagging of Old Chinese has been largely neglected. To the best of our knowledge, this is the first work in this area. Fortunately, in terms of tagging, Old Chinese is easier than Modern Chinese in that most Old Chinese words consist of a single character and require no segmentation. In this paper, we propose and analyze a simple statistical approach to POS tagging of Old Chinese. We first design a tagset for Old Chinese that is later shown to be accurate and efficient. We then apply the hidden Markov model (HMM) together with the Viterbi algorithm and make several improvements, such as sparse-data handling and unknown-word guessing, both designed especially for Chinese. As the training set grows larger, the hit rate for bigram and trigram increases to 94.9% and 97.6%, respectively. The importance of our work lies in the previously unseen features that are specific to Old Chinese, for which we have developed successful techniques. Although Old Chinese is now a dead language, this work still has many applications in areas such as Ancient-Modern Chinese machine translation.
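The combination of an HMM with the Viterbi algorithm mentioned above can be sketched for the bigram case as follows. This is a generic textbook implementation, not the authors' tagger: the function name `viterbi`, the dictionary layout of the probability tables, and the small `floor` probability (a crude stand-in for the paper's sparse-data handling and unknown-word guessing) are assumptions.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence under a bigram HMM, computed in log space.

    start_p[t]     : probability that a sentence starts with tag t
    trans_p[s][t]  : probability of tag t following tag s
    emit_p[t][w]   : probability of tag t emitting word w
    floor          : probability assigned to unseen events (assumption)
    """
    def lp(p):
        return math.log(max(p, floor))

    if not words:
        return []
    # scores[i][t]: best log-probability of any path ending in tag t at word i
    scores = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
               for t in tags}]
    back = []   # back[i][t]: best predecessor of tag t at word i + 1
    for w in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda s: prev[s] + lp(trans_p[s].get(t, 0.0)))
            cur[t] = (prev[best_prev] + lp(trans_p[best_prev].get(t, 0.0))
                      + lp(emit_p[t].get(w, 0.0)))
            ptr[t] = best_prev
        scores.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final tag
    tag = max(tags, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]
```

A trigram tagger follows the same scheme with pairs of tags as states. Since most Old Chinese words are single characters, the `words` list would typically be the characters of the sentence.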

Paper ID: 117
Author: Tatiana Sherstinova
Title: Audio Collections of Endangered Arctic Languages in the Russian Federation
Topic: Speech - other

In the Russian Federation, 63 minority languages are listed in the "Red Book of the Languages of Russia", which means that they are practically dying out. It is therefore highly important to make and preserve original recordings of these languages and to prepare their documentation. The Arctic peoples of Russia are demographically small, and the number of speakers of their languages is decreasing dramatically. The paper describes three projects related to two Northern languages, Nenets and Nganasan: the Nenets Audio Dictionary, the Nganasan Audio Dictionary and the Russian-Nenets Online Multimedia Phrase-book.

Paper ID: 122
Author: Rodolfo A. Pazos R.
Title: Spanish Natural Language Interface for a Relational Database Querying System
Topic: Text - other

The fast growth of the Internet is creating a society where the demand for information storage, organization, access, and analysis services is continuously growing. This constantly increases the number of inexperienced users who need to access databases in a simple way. Together with the emergence of voice interfaces, this situation foretells a promising future for database query systems with natural language interfaces. We describe the architecture of a relational database querying system with a natural language (Spanish) interface, giving a brief explanation of the implementation of each of the constituent modules: lexical parser, syntax checker, and semantic analyzer.

Paper ID: 128
Author: Elena Karagjosova
Title: An Analysis of Conditional Responses in dialogue
Topic: Speech - other

In the context of collaborative dialogue, we analyze conditional responses of the form "Not (if) c/Yes if c" in reply to a question under discussion "q". A conditional response is used when the validity of "q" depends on a condition "c": when "c" is established in the context, the response indicates a possible need to revise "c", and thus opens negotiation; otherwise, the response raises the question whether "c". We discuss appropriateness conditions for conditional responses, and propose a uniform approach to their generation and interpretation.