A List by Author: Pavel Zezula


Employing Subsequence Matching in Audio Data Processing

by Petr Volny, David Novák, Pavel Zezula, September 2011, 29 pages.

FIMU-RS-2011-04. Available as Postscript, PDF.


We overview current problems of audio retrieval and time-series subsequence matching. We discuss the usage of subsequence matching approaches in audio data processing, especially in automatic speech recognition (ASR) area and we aim at improving performance of the retrieval process. To overcome the problems known from the time-series area like the occurrence of implementation bias and data bias we present a Subsequence Matching Framework as a tool for fast prototyping, building, and testing similarity search subsequence matching applications. The framework is build on top of MESSIF (Metric Similarity Search Implementation Framework) and thus the subsequence matching algorithms can exploit advanced similarity indexes in order to significantly increase their query processing performance. To prove our concept we provide a design of query-by-example spoken term detection type of application with the usage of phonetic posteriograms and subsequence matching approach.

Adaptive Approximate Similarity Searching through Metric Social Networks

by Jan Sedmidubský, Stanislav Bartoň, Vlastislav Dohnal, Pavel Zezula, A full version of the paper presented at ICDE 2008. November 2007, 22 pages.

FIMU-RS-2007-06. Available as Postscript, PDF.


Exploiting the concepts of social networking represents a novel approach to the approximate similarity query processing. We present an unstructured and dynamic P2P environment in which a metric social network is built. Social communities of peers giving similar results to specific queries are established and such ties are exploited for answering future queries. Based on the universal law of generalization, a new query forwarding algorithm is introduced and evaluated. The same principle is used to manage query histories of individual peers with the possibility to tune the tradeoff between the extent of the history and the level of the query-answer approximation. All proposed algorithms are tested on real data and medium-sized P2P networks consisting of tens of computers.

LOBS: Load Balancing for Similarity Peer-to-Peer Structures

by David Novák, Pavel Zezula, June 2007, 36 pages.

FIMU-RS-2007-04. Available as Postscript, PDF.


The real-life experience with the similarity search shows that this task is both difficult and very expensive in terms of processing time. The peer-to-peer structures seem to be a suitable solution for content-based retrieval in huge data collections. In these systems, the computational load generated by a query traffic is highly skewed which degrades the searching performance. Since no current load-balancing techniques are designed for this task, we propose LOBS -- a novel and general system for load-balancing in peer-to-peer structures with time-consuming searching. LOBS is based on the following principles: measuring the computational load of the peers, separation of the logical and the physical level of the system, and detailed analysis of the load source to exploit either data relocation or data replication.

This report contains detailed description of the fundamentals and specific algorithms of LOBS, a theoretical analysis of its behaviour, and results of extensive experiments we conducted using a prototype implementation of LOBS. We tested LOBS with the peer-to-peer structure \mchord{} having a various number of peers. We used a real-life dataset and query sets of various distributions. The results show that LOBS is able to cope with any query-distribution and that it improves both the utilization of resources and the system performance of query processing. The costs of balancing are reasonable compared to the level of imbalance and are very small if the system has time to adapt to a query-traffic. The behaviour of LOBS is independent of the size of the network.

rhoIndex, Designing and Evaluating an Indexing Structure for Graph Structured Data

by Stanislav Bartoň, Pavel Zezula, A full version of the paper presented at IEEE MCD 2006. September 2006, 22 pages.

FIMU-RS-2006-07. Available as Postscript, PDF.


An own design of an indexing structure for general graph structured data called rhoIndex that allows an effective processing of special path queries is presented. These special queries represent for example a search for all paths lying between two arbitrary vertices limited to a certain path length. The rhoIndex is a multilevel balanced tree structure where each node is created with a certain graph transformation and described by modified adjacency matrix. Hence, rhoIndex indexes all the paths to a predefined length linclusive. The search algorithm is then able to find all the paths shorter than or having the length land some of the paths longer then the predefined llying between any two vertices in the indexed graph. The designed search algorithm exploits a special graph structure, a transcription graph, to compute the result using the rhoIndex . We also present an experimental evaluation of the process of creating the rhoIndex on graphs with different sizes and also a complexity evaluation of the search algorithm that uses the rhoIndex.