GWC 2004 Paper Abstracts

Why WordNet Should Not Include Figurative Language, and What Should Be Done Instead

Patrick Hanks (Berlin-Brandenburg Academy of Sciences, Germany)

Metaphor panel

Clustering of Word Senses

Eneko Agirre (University of the Basque Country, Donostia, Spain)

Metaphor panel

Building and Extending Knowledge Fragments

Wim Peters (University of Sheffield, UK)

Metaphor panel

The Heart of the Problem: How Shall We Represent Metaphors in Wordnets?

Antonietta Alonge (Università di Perugia, Italy), Birte Lönneker (University of Hamburg, Germany)

Metaphor panel

Implications of an AI Metaphor Understanding Project

John Barnden (University of Birmingham, UK)

Metaphor panel

Metaphors in the (Mental) Lexicon

Christiane Fellbaum (Princeton University, USA)

Metaphor panel

Sense Proximity versus Sense Relations

Julio Gonzalo (Universidad Nacional de Educación a Distancia, Madrid, Spain)

Metaphor panel

WordNet Has No 'Recycle Bin'

B. A. Sharada, P. M. Girish (Central Institute of Indian Languages, Mysore, India)

This paper gives an overview of compound words in WordNet, the miracle lexicon of the new millennium. Meanings are not expressed only by single words such as nouns and verbs; languages have many other ways to express content and concepts, and compound words are one of them. WordNet includes a wide range of words and expressions, which give a clear view of the concepts that exist in a language and culture. After careful verification, we found that some very frequent compound words are not included in the WordNet available online. This paper lists some of these frequent English compound words. As far as WordNet is concerned, this study is oriented more towards application than architecture. Algorithms followed in the development of Subject Heading lists are suggested.

Finding High-Frequent Synonyms of A Domain-Specific Verb in English...

Chun Xiao, Dietmar Rösner (Universität Magdeburg, Germany)

The task of binary relation extraction in IE [Cowie00] is based mainly on high-frequency verbs and patterns. During the extraction of a specific relation from MEDLINE English abstracts (PubMed offers free access to MEDLINE, with links to participating on-line journals and other related databases), we noticed that besides the high-frequency verb that represents the specific relation, other word forms, such as the nominal and adjectival forms of this verb and its synonyms, also play a very important role. Because of the characteristics of the sub-language in MEDLINE abstracts, the synonym information for the verb cannot be obtained directly from a lexicon such as WordNet [Fellbaum98]. In this paper, we propose an approach that uses both corpus information and WordNet synonym set (WN-synset) information to find the synonyms of a domain-specific verb in a sub-language. Given a gold standard synonym list obtained from the test corpus, this approach achieved a recall of 60% at a precision of 100%. The verbs corresponding to the 60% recall cover 93.05% of all occurrences of verbs in the gold standard synonym list.
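The two figures reported above measure different things: recall counts distinct verbs in the gold standard list, while the 93.05% figure counts corpus occurrences of those verbs. The following is a minimal sketch of that distinction. The verbs, counts, and function names are invented for illustration and do not come from the paper.

```python
def precision_recall(predicted, gold):
    """Type-level precision and recall of a predicted synonym set."""
    predicted, gold = set(predicted), set(gold)
    hits = predicted & gold
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


def occurrence_coverage(predicted, gold_counts):
    """Share of gold-standard verb occurrences covered by the predicted verbs."""
    predicted = set(predicted)
    total = sum(gold_counts.values())
    covered = sum(n for verb, n in gold_counts.items() if verb in predicted)
    return covered / total if total else 0.0


# Hypothetical gold standard: verb -> number of occurrences in a test corpus.
gold_counts = {"inhibit": 50, "block": 30, "suppress": 15, "abolish": 3, "repress": 2}
predicted = ["inhibit", "block", "suppress"]

precision, recall = precision_recall(predicted, gold_counts)
coverage = occurrence_coverage(predicted, gold_counts)
```

With these toy numbers, three of five gold verbs are found, yet they account for most occurrences, because the missed verbs are rare. This is the same pattern the abstract describes.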

EuroWordNet, Word Sense Disambiguation and CLIR

Paul Clough, Mark Stevenson (University of Sheffield, United Kingdom)

One of the aims of EuroWordNet (EWN) was to provide a resource for Cross-Language Information Retrieval (CLIR). In this paper we present experiments which test the usefulness of EWN for this purpose via a formal evaluation using the Spanish queries from the TREC6 CLIR test set. All CLIR systems using bilingual dictionaries must find a way of dealing with multiple translations, and we employ a WSD algorithm for this purpose. This algorithm achieved only around 50% correct disambiguation when compared with manual judgment; however, retrieval performance using the senses it returned was 90% of that recorded using manually disambiguated queries.

Towards Binding Spanish Senses to Wordnet Senses

Javier Farreres, Karina Gibert, Horacio Rodríguez (Universitat Politècnica de Catalunya, Barcelona, Spain)

This work tries to enrich the Spanish Wordnet using a Spanish taxonomy as a knowledge source. The Spanish taxonomy is composed of Spanish senses, while Wordnet is composed of synsets (English senses). A set of weighted associations between Spanish words and Wordnet synsets is used for inferring associations between the two taxonomies. (This research has been partially funded by Aliado (TIC2002-04447-c02-01).)

A Current Resource and Future Perspectives for Enriching WordNets...

Birte Lönneker, Carina Eilts (University of Hamburg, Germany)

This article deals with the question whether metaphors might be integrated into WordNets in a more systematic way. After outlining the advantages of having more information on metaphors in WordNets, it presents the Hamburg Metaphor Database and a possible method for integrating metaphors and corresponding equivalence relations into monolingual WordNets. Finally, problems are discussed that will have to be faced before more metaphor information could be included in WordNets.

WordNet for Lexical Cohesion Analysis

Elke Teich (Darmstadt University of Technology, Germany), Peter Fankhauser (Fraunhofer IPSI, Darmstadt, Germany)

This paper describes an approach to the analysis of lexical cohesion using WordNet. The approach automatically annotates texts with potential cohesive ties, and supports various thesaurus based and text based search facilities as well as different views on the annotated texts. The purpose is to be able to investigate large amounts of text in order to get a clearer idea to what extent semantic relations are actually used to make texts lexically cohesive and which patterns of lexical cohesion can be detected.

Statistical Overview of WordNet from 1.6 to 2.0

Jiangsheng Yu, Zhenshan Wen, Yang Liu, Zhihui Jin (Peking University, China)

We define several discrete random variables and compare their statistics across different versions of WordNet, in order to explore the macroscopic evolution of WordNet from 1.6 to 2.0. Examples of extreme data are enumerated in the course of the experimental analysis.

Automatic Lexicon Generation through WordNet

Nitin Verma, Pushpak Bhattacharyya (Indian Institute of Technology, Bombay, India)

A lexicon is the heart of any language processing system. Accurate words with grammatical and semantic attributes are essential or highly desirable for any application, be it machine translation, information extraction, various forms of tagging or text mining. However, good quality lexicons are difficult to construct, requiring an enormous amount of time and manpower. In this paper, we present a method for automatically generating the dictionary from an input document, making use of WordNet. The dictionary entries are in the form of Universal Words (UWs), which are language words (primarily English) concatenated with disambiguation information. The entries are associated with syntactic and semantic properties, most of which are also generated automatically. In addition to WordNet, the system uses a word sense disambiguator, an inferencer and the knowledge base (KB) of the Universal Networking Language, which is a recently proposed interlingua. The lexicon so constructed is sufficiently accurate and reduces the manual labour substantially.

Using WordNets in Teaching Virtual Courses of Computational Linguistics

Lothar Lemnitzer, Claudia Kunze (Universität Tübingen, Germany)

This paper focuses on wordnets, especially GermaNet, as topics of teaching and learning in the field of Computational Linguistics. We are aiming at two major goals: to use wordnets for the design of tasks in core modules of the Computational Linguistics curriculum on the one hand, and, on the other hand, to enhance the wordnet structure and its accessibility by the different student projects that have been defined and accomplished. These projects, coping with various structural and content-oriented issues of wordnets, have evolved from three virtual courses taught in Tübingen and Osnabrück. They will be presented in this paper. By establishing wordnets as teaching and learning contents, advanced students should be attracted to join the international wordnet research community.

Extending and Enriching WordNet with OntoLearn

Roberto Navigli, Paola Velardi (Università di Roma "La Sapienza", Italy), Alessandro Cucchiarelli, Francesca Neri (Università Politecnica delle Marche, Ancona, Italy)

OntoLearn is a system for word sense disambiguation, used to automatically enrich WordNet with domain concepts and to disambiguate WordNet glosses. We summarize the WSD algorithm used by OntoLearn, called structural semantic interconnection, and its main applications.

Soft Word Sense Disambiguation

Ganesh Ramakrishnan, B. P. Prithviraj, A. Deepa, Pushpak Bhattacharyya, Soumen Chakrabarti (Indian Institute of Technology, Bombay, India)

Word sense disambiguation is a core problem in many tasks related to language processing. In this paper, we introduce the notion of soft word sense disambiguation, which states that given a word, the sense disambiguation system should not commit to a particular sense, but rather to a set of senses which are not necessarily orthogonal or mutually exclusive. The senses of a word are expressed by its WordNet synsets, arranged according to their relevance. The relevance of these senses is probabilistically determined through a Bayesian Belief Network. The main contribution of the work is a completely probabilistic framework for word sense disambiguation with a semi-supervised learning technique utilising WordNet. WordNet can be customized to a domain using corpora from that domain. This idea, applied to question answering, has been evaluated on TREC data and the results are promising.

Approximating Synset Similarity using Topic Signatures

Eneko Agirre (University of the Basque Country, Donostia, Spain), Enrique Alfonseca (Universidad Autonoma de Madrid, Spain), Oier Lopez de Lacalle (University of the Basque Country, Donostia, Spain)

Topic signatures are context vectors built for concepts. They can be automatically acquired for any concept hierarchy using simple methods. This paper explores the correlation between a distributional semantic similarity based on topic signatures and several hierarchy-based similarities. We show that topic signatures can be used to approximate link distance in WordNet (0.88 correlation), which allows for various applications, e.g. classifying new concepts in existing hierarchies. We have evaluated two methods for building topic signatures (monosemous relatives vs. all relatives) and explored a number of different parameters for both methods.
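As an illustration of comparing concepts through their context vectors, the sketch below computes cosine similarity between sparse signatures. The abstract does not name the similarity measure, so cosine is an assumption here, and the concepts, terms, and weights are invented.

```python
import math


def cosine(sig_a, sig_b):
    """Cosine similarity between two sparse context vectors (term -> weight)."""
    dot = sum(w * sig_b.get(term, 0.0) for term, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical topic signatures; terms and weights are invented.
signatures = {
    "dog": {"bark": 3.0, "leash": 2.0, "pet": 2.0},
    "cat": {"purr": 3.0, "pet": 2.0, "leash": 0.5},
    "car": {"engine": 3.0, "road": 2.0},
}

sim_dog_cat = cosine(signatures["dog"], signatures["cat"])
sim_dog_car = cosine(signatures["dog"], signatures["car"])
```

In this toy data the two animal concepts share context terms and score higher than the unrelated pair. A distributional measure needs this property if it is to track distance in a hierarchy.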

Grounding the Ontology on the Semantic Interpretation Algorithm

Fernando Gomez (University of Central Florida, Orlando, USA)

Some reorganizations and modifications to the WordNet ontology are explained. These changes have been suggested by extensive testing of the ontological categories with an algorithm for semantic interpretation. The algorithm is based on predicates that have been defined for WordNet verb classes. The selectional restrictions of the predicates are WordNet ontological categories.

Morphosemantic Relations In and Across Wordnets

Orhan Bilgin, Özlem Çetinoğlu, Kemal Oflazer (Sabanci University, Istanbul, Turkey)

Morphological processes in a language can be effectively used to enrich individual wordnets with semantic relations. More importantly, morphological processes in a language can be used to discover less explicit semantic relations in other languages. This will both improve the internal connectivity of individual wordnets and also the overlap across different wordnets. Using morphology to improve the quality of wordnets and to automatically prepare synset glosses are two other possible applications.

VisDic - Wordnet Browsing and Editing Tool

Aleš Horák, Pavel Smrž (Masaryk University in Brno, Czech Republic)

This paper deals with wordnet development tools. It presents a system designed and developed for lexical database editing, which is currently employed in many national wordnet building projects. We discuss basic features of the tool as well as more elaborate functions that facilitate linguistic work in a multilingual environment.

English-Arabic Prototype WordNet

William J. Black, Sabri El-Kateb (UMIST, Manchester, UK)

We report on the design and partial implementation of a bilingual English-Arabic dictionary based on WordNet. A relational database is employed to store the lexical and conceptual relations, giving the database extensibility in either language. The data model is extended beyond an Arabic replication of the word↔sense relation to include the morphological roots and patterns of Arabic. The editing interface also deals with Arabic script (without requiring a localized operating system).

Use of Wordnet for Retrieving Words from Their Meanings

İlknur Durgar El-Kahlout, Kemal Oflazer (Sabanci University, Istanbul, Turkey)

This paper presents a Meaning to Word System (MTW) for the Turkish language that finds a set of words closely matching the definition entered by the user. The approach of extracting words from "meanings" is based on checking the similarity between the user's definition and each entry of the Turkish database, without considering any semantic or grammatical information. Results on unseen user queries indicate that in 66% of the queries the correct responses were among the first 50 words returned, while for queries selected from the word definitions in a different dictionary, the correct responses were among the first 50 words returned in 92% of the queries. Our system makes extensive use of various linguistic resources, including the Turkish WordNet.
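The general idea of ranking dictionary entries by surface similarity to a user's definition can be sketched as follows. The abstract does not specify the similarity measure; Jaccard overlap of word sets is assumed here purely for illustration, and the small English dictionary and the `rank_words` function are invented stand-ins for the Turkish database.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words; no linguistic analysis."""
    return set(text.lower().split())


def rank_words(query, dictionary, top_n=50):
    """Return up to top_n headwords, ordered by Jaccard overlap with the query."""
    q = tokens(query)
    scored = []
    for word, definition in dictionary.items():
        d = tokens(definition)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, word))
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]


# Hypothetical dictionary: headword -> definition.
dictionary = {
    "pen": "a tool for writing with ink",
    "knife": "a tool for cutting",
    "river": "a large natural stream of water",
}

ranking = rank_words("tool used for writing", dictionary)
```

The `top_n=50` default mirrors the "first 50 words returned" window used in the reported evaluation.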

Two Kinds of Hypernymy Faults in WordNet: the Cases of Ring and Isolator

Yang Liu, Jiangsheng Yu, Zhengshan Wen, Shiwen Yu (Peking University, China)

Hypernymy is the key relation that forms the ontology of the noun and verb concepts in WordNet, and it gives NLP researchers a common way of making inductions along the hypernymy tree. However, we find two kinds of abnormal hypernymy in WordNet 2.0, the cases of ring and isolator for short, which can seriously disrupt reasoning and eventually lead to errors.

Extension of the SpanishWordNet

Clara Soler (Universitat Ramon Llull, Barcelona, Spain)

WordNet basically divides adjectives into descriptive and relational ones, and represents them in an enumerative way. The category was not introduced in EuroWordNet, and Spanish adjectives in the SpanishWordNet are translations of the English synsets. This paper describes a proposal for organizing and incorporating adjectives into the SpanishWordNet in a way that represents their polymorphic behaviour. The new organization would follow the adjective taxonomy of the MikroKosmos ontology. It turns out that the ontological approach can be used to explain adjective polysemy. In the end, a new adjectival classification appears in EuroWordNet, in terms of the three types of entities of the Top Ontology.

Sociopolitical Domain As a Bridge from General Words to Terms of Specific Domains

Natalia Loukachevitch, Boris Dobrov (Moscow State University, Russia)

In this paper we argue that there exists a polythematic domain situated in an intermediate area between the senses of general language and specific domains. The concepts of this domain can be naturally added to general wordnets together with publicly known technical terms. Such enhanced wordnets can provide considerably better preliminary coverage of domain-specific texts and improve the efficiency of word sense disambiguation procedures.

Procedures and Problems in Korean-Chinese-Japanese Wordnet...

Key-Sun Choi, Hee-Sook Bae (KORTERM, Republic of Korea)

This paper introduces a Korean-Chinese-Japanese wordnet for nouns, verbs and adjectives. This wordnet is constructed on the basis of a hierarchy of shared semantic categories originating from NTT Goidaikei (Hierarchical Lexical System). The Korean wordnet has been constructed by mapping a semantic category to each Korean word sense in a way that maps the same semantic hierarchy to the meanings of nouns, verbs, and adjectives. The meaning of each verb searched in the corpus is compared with its Japanese equivalent. The Chinese wordnet has also been constructed on the basis of the same semantic hierarchy, in comparison with the Korean wordnet. In terms of argument structure, there is a semantic correspondence between Korean, Japanese and Chinese verbs.

ArchiWordNet: Integrating WordNet with Domain-Specific Knowledge

Luisa Bentivogli (ITC-irst, Trento, Italy), Andrea Bocco (Politecnico di Torino, Italy), Emanuele Pianta (ITC-irst, Trento, Italy)

Linguistic resources with domain-specific coverage are crucial for the development of concrete application systems, especially when integrated with domain-independent resources. In this paper we present our experience in the creation of ArchiWordNet, a specialized WordNet for the architecture and construction domain which is being created according to the WordNet model and integrated with WordNet itself. Problematic issues related to the creation of a domain-specific wordnet and its integration with a general language resource are discussed, and practical solutions adopted are described.

Using WordNet Predicates for Multilingual Named Entity Recognition

Matteo Negri, Bernardo Magnini (ITC-Irst, Trento, Italy)

WordNet predicates (WN-preds) establish relations between words in a certain language and concepts of a language-independent ontology. In this paper we show how WN-preds can be profitably used in the context of multilingual tasks where two or more wordnets are aligned. Specifically, we report on the extension to Italian of a previously developed Named Entity Recognition (NER) system for written English. Experimental results demonstrate the validity of the approach and confirm the suitability of WN-preds for a number of different NLP tasks.

Comparing Lexical Chain-based Summarisation Approaches Using an Extrinsic Evaluation

William Doran, Nicola Stokes, Joe Carthy, John Dunnion (University College Dublin, Ireland)

We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain-based summarisation systems. We also compare the chain scoring and extraction techniques of our system to those of several other baseline systems, including a random summarizer and one based on tf.idf statistics. We use a task-orientated summarisation evaluation scheme that determines summary quality based on TDT story link detection performance.

WordNet Exploitation through a Distributed Network of Servers

I. D. Koutsoubos (Patras University and Research Academic Computer Technology Institute, Patras, Greece), Vassilis Andrikopoulos (Patras University, Greece), Dimitris Christodoulakis (Patras University and Research Academic Computer Technology Institute, Patras, Greece)

The architecture of a lexical database in which multilingual semantic networks would be stored requires the incorporation of flexible mechanisms and services, which would enable the efficient navigation within and across lexical data. We report on WordNet Management System (WMS), a system that functions as the interconnection and communication link between a user and a number of interlinked WordNets. Semantic information is being accessed through a distributed network of servers, forming a large-scale multilingual semantic network.

WordNet Applications

Jorge Morato, Miguel Ángel Marzal, Juan Lloréns, José Moreiro (University Carlos III, Madrid, Spain)

This paper describes WordNet design and development, discussing its origins, the objectives it initially intended to reach and the subsequent use to which it has been put, the factor that has determined its structure and success. The emphasis in this description of the product is on its main applications, given the instrumental nature of WordNet, and on the improvements and upgrades of the tool itself, along with its use in natural language processing systems. The purpose of the paper is to identify the most significant recent trends with respect to this product, to provide a full and useful overview of WordNet for researchers working in the field of information retrieval. The existing literature is reviewed and present applications are classified to concur with the areas discussed at the First International WordNet Congress.

Multilingual Central Repository

J. Atserias, L. Villarejo (Universitat Politècnica de Catalunya, Barcelona, Catalonia), G. Rigau, E. Agirre (University of the Basque Country, Donostia, Spain), J. Carroll (University of Sussex, UK), B. Magnini (ITC-irst, Trento, Italy), P. Vossen (Irion Technologies B.V., Delft, The Netherlands)

This paper describes the first version of the Multilingual Central Repository (MCR), a lexical knowledge base developed in the framework of the MEANING project. Currently the MCR integrates into the EuroWordNet framework five local wordnets (including four versions of the English WordNet from Princeton), an upgraded version of the EuroWordNet Top Concept ontology, the MultiWordNet Domains, the Suggested Upper Merged Ontology (SUMO) and hundreds of thousands of new semantic relations and properties automatically acquired from corpora. We believe that the resulting MCR will be the largest and richest Multilingual Lexical Knowledge Base in existence.

Assignment of Domain Labels to WordNet

Mauro Castillo, Francis Real (Universitat Politècnica de Catalunya, Barcelona, Spain), German Rigau (University of the Basque Country, Donostia, Spain)

This paper describes a process to automatically assign domain labels to WordNet glosses. One of the main goals of this work is to show different ways to systematically and automatically enrich dictionary definitions (or glosses of new WordNet versions) with MultiWordNet domains. Finally, we show how this technique can be used to verify the consistency of the current version of MultiWordNet Domains.

Using a Lemmatizer to Support the Development and Validation of the Greek WordNet

Harry Kornilakis, Maria Grigoriadou (University of Athens, Greece), Eleni Galiotou (University of Athens and Technological Educational Institute of Athens, Greece), Evangelos Papakitsos (University of Athens, Greece)

In this paper we aim to give a description of the computational tools that have been designed and implemented to support the development and validation process of the Greek WordNet, which is currently being developed in the framework of the BalkaNet project. In particular, we focus on the description of a lemmatizer for the Greek language, which has been used as the basis for a number of tools supporting the linguists in their work of developing and validating the Greek WordNet.

Extending WordNet with Syntagmatic Information

Luisa Bentivogli, Emanuele Pianta (ITC-irst, Trento, Italy)

In this paper we present a proposal to extend WordNet-like lexical databases by adding information about the co-occurrence of word meanings in texts. More specifically, we propose to add phrasets, i.e. sets of free combinations of words which are recurrently used to express a concept (let's call them Recurrent Free Phrases). Phrasets are a useful source of information for different NLP tasks, particularly for managing lexical gaps in a multilingual environment. At least some recurrent free phrases can also be represented through a new set of syntagmatic (lexical and semantic) WordNet relations.

Text Categorization and Information Retrieval Using WordNet Senses

Paolo Rosso (Polytechnic University of Valencia, Spain), Edgardo Ferretti (National University of San Luis, Argentina), Daniel Jiménez, Vicente Vidal (Polytechnic University of Valencia, Spain)

In this paper we study the influence of semantics on the Text Categorization (TC) and Information Retrieval (IR) tasks. The K Nearest Neighbours (K-NN) method was used to perform the text categorization. The experimental results were obtained by representing each relevant term of a document by its corresponding WordNet synset. For the IR task, three techniques were investigated: the direct use of a weighted matrix, the Singular Value Decomposition (SVD) technique in the Latent Semantic Indexing (LSI) model, and the bisecting spherical k-means clustering technique. Taking the semantics of the documents into account improved performance for text categorization, whereas the results were not so promising for the IR task.
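A minimal sketch of K-NN categorization over documents represented as bags of synset identifiers is given below. It is not the authors' system. The synset identifiers are invented placeholders rather than real WordNet offsets, and the similarity (sum of the smaller count for each shared synset) is an assumption chosen for brevity.

```python
from collections import Counter


def overlap(doc_a, doc_b):
    """Weighted overlap: sum of the smaller count for each shared synset."""
    return sum(min(count, doc_b[syn]) for syn, count in doc_a.items() if syn in doc_b)


def knn_classify(doc, training, k=3):
    """Label doc by majority vote among its k most similar training documents."""
    neighbours = sorted(training, key=lambda ex: overlap(doc, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (bag of synset identifiers, category).
training = [
    (Counter({"syn:bank_institution": 2, "syn:money": 1}), "finance"),
    (Counter({"syn:money": 2, "syn:loan": 1}), "finance"),
    (Counter({"syn:bank_river": 2, "syn:water": 1}), "geography"),
    (Counter({"syn:water": 1, "syn:stream": 2}), "geography"),
]

label = knn_classify(Counter({"syn:money": 1, "syn:loan": 1}), training, k=3)
```

Because the two senses of "bank" map to different identifiers, a document about rivers shares nothing with the finance examples. This is the kind of benefit a synset-based representation is meant to bring to categorization.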


Maria Teresa Sagri, Daniela Tiscornia (Institute for Theory and Techniques for Legal Information, CNR, Firenze, Italy), Francesca Bertagna (Istituto di Linguistica Computazionale, CNR, Pisa, Italy)

The paper describes Jur-Wordnet, an extension for legal domain of the Italian ItalWordNet database, aimed at providing a knowledge base for the multilingual access to sources of legal information. Motivations and aims are discussed, together with details concerning the linguistic architecture and construction methodology.

Word Association Thesaurus As a Resource for Building WordNet

Anna Sinopalnikova (Masaryk University in Brno, Czech Republic and Saint-Petersburg State University, Russia)

The goal of the present paper is to report on ongoing research on applying psycholinguistic resources to building a WordNet-like lexicon of the Russian language. We survey different kinds of linguistic data that can be extracted from a Word Association Thesaurus, a resource representing the results of a large-scale free association test. In addition, we compare the Word Association Thesaurus with other language resources applied to wordnet construction (e.g. text corpora, explanatory dictionaries) from the viewpoint of the quality and quantity of information they supply to the researcher.

Exploiting ItalWordNet Taxonomies in a Question Classification Task

Francesca Bertagna (Istituto di Linguistica Computazionale, CNR, Pisa, Italy)

The paper presents a case-study about the exploitation of ItalWordNet for Question Answering. In particular, we will explore the access to ItalWordNet when trying to derive the information that is crucial for singling out the answers to Italian Wh-questions introduced by the interrogative elements Quale and Che.

Results and Evaluation of Hungarian Nominal WordNet v1.0

Márton Miháltz, Gábor Prószéky (MorphoLogic, Budapest, Hungary)

This paper presents recent results of the ongoing project aimed at creating the nominal database of the Hungarian WordNet. We present 9 different automatic methods, developed for linking Hungarian nouns to WN 1.6 synsets. Nominal entries are obtained from two different machine-readable dictionaries, a bilingual English-Hungarian and an explanatory monolingual (Hungarian). The results are evaluated against a manually disambiguated test set. The final version of the nominal database is produced by combining the verified result sets and their intersections when confidence scores exceeded certain threshold values.

Fighting Arbitrariness in WordNet-like Lexical Databases...

Shun Ha Sylvia Wong (Aston University, Birmingham, UK)

Motivated by doubts on how faithfully and accurately a lexical database models the complicated relations that exist naturally between real-world concepts, we have studied concept organisation in WordNet 1.5 and EuroWordNet 2. Based on the arbitrariness in concept classification observed in these wordnets, we argue that concept formation in natural languages is a plausible means to improve concept relatedness in lexical databases. We also illustrate that word formation in Chinese exhibits natural semantic relatedness amongst Chinese concepts which can be exploited to aid word sense disambiguation.

Roles: One Dead Armadillo on WordNet's Speedway to Ontology

Martin Trautwein, Pierre Grenon (University of Leipzig, Germany)

We assume that the ontological structure of the common-sense world, and thus of human knowledge about this world, is organized in networks rather than in hierarchies. Thus, using the taxonomies that semantic relations generate in WordNet as the only source for the reconstruction of ontological information must fail at some point. Comparing the ontological structures underlying roles to WordNet representations, we demonstrate that the power of lexical semantics to abstract over contexts distorts the taxonomic order of a conceivable ontology. Approaches trying to adjust the semantics of WordNet relations, in order to reach a higher ontological adequacy, unintentionally produce artifacts deriving from differences between the frequency of contexts, and from metonymy-like reference to ontological relations.

The Topology of WordNet: Some Metrics

Ann Devitt, Carl Vogel (Trinity College, Dublin, Ireland)

This paper outlines some different metrics intended for measuring node specificity in WordNet. Statistics are used to characterise topological properties of the overall network.

Language to Logic Translation with PhraseBank

Adam Pease (Articulate Software Inc, Mountain View, USA), Christiane Fellbaum (Princeton University, USA)

We discuss a restricted natural language understanding system and a proposed extension to it, which is a corpus of phrases. The Controlled English to Logic Translation (CELT) system allows users to make statements in a domain-independent, restricted English grammar that have a clear formal semantics and that are amenable to machine processing. CELT needs a large amount of linguistic and semantic knowledge. It is currently coupled with the Suggested Upper Merged Ontology, which has been mapped by hand to WordNet 1.6. We propose work on a new corpus of phrases (called PhraseBank) to be added to WordNet and linked to SUMO, which will catalog common English phrase forms, and their deep meaning in terms of the formal ontology. This addition should significantly expand the coverage and usefulness of CELT.

Automatic Word Sense Clustering Using Collocation for Sense Adaptation

Sa-Im Shin, Key-Sun Choi (KORTERM, KAIST, Daejeon, Korea)

A specific sense of a word can be determined by collocation of the words gathered from the large corpus that includes context patterns. However, homonym collocation often causes semantic ambiguity. Therefore, the results extracted from corpus should be classified according to every meaning of a word in order to ensure correct collocation. In this paper, K-means clustering is used to solve this problem. This paper reports collocation conditions as well as normalized algorithms actually adopted to address this problem. As a result of applying the proposed method to selected homonyms, the optimal number of semantic clusters showed similarity to those in the dictionary. This approach can disambiguate the sense of homonyms optimally using extracted texts, thus resolving the ambiguity of homonyms arising from collocation.

Corpus Based Validation of WordNet Using Frequency Parameters

Ivan Obradović, Cvetana Krstev, Gordana Pavlović-Lažetić, Duško Vitas (University of Belgrade, Serbia and Montenegro)

In this paper we define a set of frequency parameters to be used in synset validation based on corpora. These parameters indicate the coverage of the corpus by wordnet literals, the importance of one sense of a literal in comparison to the others, as well as the importance of one literal in a synset in comparison to other literals in the same synset. The obtained results can be used in synset refinement, as well as in information retrieval tasks.

Concerning the Difference ... in the Estonian WordNet

Heili Orav, Kadri Vider (University of Tartu, Estonia)

One source for the Estonian WordNet has been corpora of Estonian. In addition, we became interested in word sense disambiguation, and about 100,000 words in the corpora have been manually disambiguated according to Estonian WordNet senses. The aim of this paper is to explain some theoretical problems that "do not work well in practice". These include the differentiation of word senses, metaphors, and conceptual word combinations.

Quality Control for Wordnet Development

Pavel Smrž (Masaryk University in Brno, Czech Republic)

This paper deals with quality assurance procedures for general-purpose language resources. Special attention is paid to quality control in wordnet development. General issues of quality management are tackled; technical as well as methodological aspects are discussed. As a case study, the application of the described procedures is demonstrated on the quality evaluation techniques in the context of the BalkaNet project.

Creation of English and Hindi Verb Hierarchies and their Applications ...

Debasri Chakrabarti, Pushpak Bhattacharyya (Indian Institute of Technology, Mumbai, India)

Verbs form the pivots of sentences. However, they have not received as much attention as nouns have in ontology and lexical semantics research. Classifying verbs and placing them in a structure according to their selectional preferences and other semantic properties seems essential in most text information processing tasks, such as machine translation and information extraction. The present paper describes the construction of a verb hierarchy using Beth Levin's verb classes for English, the hypernymy hierarchy of WordNet, and the constructs and knowledge base of the Universal Networking Language (UNL), a recently proposed interlingua. These ideas have been translated into the building of a verb hierarchy for Hindi. The application of this hierarchy to the construction of the Hindi WordNet is discussed. The overall motivation for this work is the task of machine translation between English and Hindi.

Russian WordNet

Valentina Balkova (Russicon Company, Russia), Andrey Sukhonogov (Petersburg Transport University, Moscow, Russia), Sergey Yablonsky (Petersburg Transport University, Moscow and Russicon Company, Russia)

This paper deals with the development of the first public web version of Russian WordNet and of future parallel English-Russian and multilingual web versions of WordNet. It describes the use of Russian and English-Russian lexical language resources and software to process WordNet for the Russian language, and the design of a database management system for efficient storage and retrieval of the various kinds of lexical information needed to process WordNet. Relevant aspects of the UML data models, XML format and related technologies are surveyed. The pilot Internet/Intranet version of the described system, based on Oracle 9i DBMS and Java technology, is published at:

Cross-Lingual Validation of Multilingual Wordnets

Dan Tufiş, Radu Ion, Eduard Barbu, Verginica Barbu (Institute for Artificial Intelligence, Bucharest, Romania)

Incorporating WordNet or its monolingual followers in modern NLP-based systems already represents a general trend, motivated by numerous reports showing significant improvements in the overall performance of these systems. Multilingual wordnets, such as EuroWordNet or BalkaNet, represent one step further, with great promise in the domain of multilingual processing. The paper describes one possible way to check the quality (correctness and completeness) of the interlingual alignments of several wordnets and pinpoints the possible omissions or alignment errors.

Adjectives in RussNet

Irina Azarova (Saint-Petersburg State University, Russia), Anna Sinopalnikova (Saint-Petersburg State University, Russia and Masaryk University, Brno, Czech Republic)

This paper deals with the problem of structuring adjectives in a wordnet. We present several methods of dealing with this problem based on the use of different language resources: frequency lists, text corpora, word association norms, and explanatory dictionaries. The work has been developed within the framework of the RussNet project, which aims at building a wordnet for Russian. Three types of relations between descriptive adjectives are discussed in detail, and a technique for combining data from various resources is introduced.

Pathways to Creativity in Lexical Ontologies

Tony Veale (University College Dublin, Ireland)

Language is a highly creative medium, and lexicalized ontologies like WordNet are rich in implicit evidence of the conceptual innovations underlying lexical inventiveness. We argue that WordNet's overt linguistic influences make it far more conducive to the development of creative thinking systems than other, more formalized conceptual ontologies like Cyc.

A Corpus Based Approach to Near Synonymy of German Multi-Word Expressions

Christiane Hümmer (Berlin-Brandenburg Academy of Sciences, Germany)

The core of this paper is a detailed corpus-based analysis of the two nearly synonymous German idioms etw. liegt jmdm. im Blut and etw. ist jmdm. in die Wiege gelegt. The central conclusions drawn from this analysis are as follows. On the basis of the behaviour of the semantic arguments of the two idioms (their presence or absence as well as certain semantic properties), clear statements can be made about the context conditions under which the two idioms are interchangeable and those that allow the realisation of one of them while excluding the other. Furthermore, even in the contexts that allow both idioms, the choice of one or the other makes a subtle difference. This difference has to do with the metaphorical image encoded in the idiom. The prominent degree of prototypicality of certain traits demonstrates that speakers actively use these subtle differences. The paper thus constitutes an investigation on the level below WordNet synsets, discussing the concept of synonymy underlying WordNet organisation.

Extending the Italian WordNet with the Specialized Language of the Maritime Domain

Adriana Roventini, Rita Marinelli (Istituto di Linguistica Computazionale, CNR, Pisa, Italy)

In this paper we describe our ongoing creation of a specialized lexicon for the maritime domain (including the technical and commercial/maritime transport domain) and the linking of this lexicon to the generic lexicon of the ItalWordNet lexical database. The main characteristics of the lexical semantic database and the specific features of the specialized language are described, together with the coding performed according to the ItalWordNet semantic relations model and the approach adopted to connect the terminological database to the generic one. Some of the problems encountered and a few expected advantages are also considered.