A List by Author: David Novák


Employing Subsequence Matching in Audio Data Processing

by Petr Volny, David Novák, Pavel Zezula, September 2011, 29 pages.

FIMU-RS-2011-04. Available as Postscript, PDF.


We overview current problems of audio retrieval and time-series subsequence matching. We discuss the usage of subsequence matching approaches in audio data processing, especially in automatic speech recognition (ASR) area and we aim at improving performance of the retrieval process. To overcome the problems known from the time-series area like the occurrence of implementation bias and data bias we present a Subsequence Matching Framework as a tool for fast prototyping, building, and testing similarity search subsequence matching applications. The framework is build on top of MESSIF (Metric Similarity Search Implementation Framework) and thus the subsequence matching algorithms can exploit advanced similarity indexes in order to significantly increase their query processing performance. To prove our concept we provide a design of query-by-example spoken term detection type of application with the usage of phonetic posteriograms and subsequence matching approach.

LOBS: Load Balancing for Similarity Peer-to-Peer Structures

by David Novák, Pavel Zezula, June 2007, 36 pages.

FIMU-RS-2007-04. Available as Postscript, PDF.


The real-life experience with the similarity search shows that this task is both difficult and very expensive in terms of processing time. The peer-to-peer structures seem to be a suitable solution for content-based retrieval in huge data collections. In these systems, the computational load generated by a query traffic is highly skewed which degrades the searching performance. Since no current load-balancing techniques are designed for this task, we propose LOBS -- a novel and general system for load-balancing in peer-to-peer structures with time-consuming searching. LOBS is based on the following principles: measuring the computational load of the peers, separation of the logical and the physical level of the system, and detailed analysis of the load source to exploit either data relocation or data replication.

This report contains detailed description of the fundamentals and specific algorithms of LOBS, a theoretical analysis of its behaviour, and results of extensive experiments we conducted using a prototype implementation of LOBS. We tested LOBS with the peer-to-peer structure \mchord{} having a various number of peers. We used a real-life dataset and query sets of various distributions. The results show that LOBS is able to cope with any query-distribution and that it improves both the utilization of resources and the system performance of query processing. The costs of balancing are reasonable compared to the level of imbalance and are very small if the system has time to adapt to a query-traffic. The behaviour of LOBS is independent of the size of the network.