Advanced searching methods for digital data

Nowadays, the most dynamically changing area of computer data processing is multimedia. It is estimated that 93% of information is produced in digital form and that the volume of digital data produced in a single year will exceed 1 exabyte, i.e., 10¹⁸ bytes, with exponential growth expected. In addition, only 1% of the data volume on the Internet is in textual form; the rest is multimedia, which is also available as a product of radio and television services. Information systems have to deal with this situation, and the fundamental ability they need is searching. The traditional way of searching, based on the concept of exact match, is not usable for multimedia data. A suitable solution exploits the concept of similarity: given an example, it retrieves the data that are most similar to it.
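The difference between exact match and query by example can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the image names, the three-dimensional feature vectors, the Euclidean distance, and the helper functions `exact_match` and `knn_query` are hypothetical and are not taken from any particular system.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_match(collection, query):
    """Return the names of objects whose vector equals the query exactly."""
    return [name for name, vec in collection.items() if vec == query]


def knn_query(collection, query, k):
    """Return the names of the k objects closest to the query, nearest first."""
    return heapq.nsmallest(
        k, collection, key=lambda name: euclidean(collection[name], query)
    )


# Hypothetical feature vectors (e.g., coarse colour descriptors of images).
collection = {
    "sunset": (0.9, 0.4, 0.1),
    "beach": (0.8, 0.7, 0.3),
    "forest": (0.1, 0.6, 0.2),
    "ocean": (0.1, 0.3, 0.9),
}
query = (0.85, 0.45, 0.15)  # the example object; it is not stored in the collection

print(exact_match(collection, query))   # no stored vector equals the query
print(knn_query(collection, query, 2))  # the two most similar objects
```

Exact match finds nothing because no stored vector is identical to the example, while the nearest-neighbour query still returns the closest objects. This sketch scans the whole collection for every query, which is exactly what does not scale to large data volumes.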

We are interested in problems of efficient searching in large data volumes. Most of our activities are concentrated around the MUFIN Project, which also includes a demo for searching similar images. Standard approaches based on a global directory are not feasible from the scalability point of view. We focus especially on distributed systems, such as GRIDs or popular peer-to-peer networks, which provide appropriate infrastructure support. We concentrate both on structured distributed systems, which store data within the nodes of a network according to given rules, and on unstructured systems, whose advantage lies in self-organization: the rules for searching are created automatically by the network and are not known in advance.

A separate and very important area is the problem of similarity specification, i.e., the way of determining the proximity of data. The choice of similarity function influences not only the quality of search results but also the response time of searching. When an inappropriate function is applied, we may not be able to search large data volumes in real time.
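One reason the choice of function affects response time can be sketched as follows. If the distance function is a metric, the triangle inequality lets a search skip objects without computing their distance to the query. The example below shows pivot-based filtering for a range query, a common technique in metric-space searching; it is a sketch under stated assumptions, not a description of any specific index. The function name `range_query_with_pivot`, the data points, the pivot, and the radius are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance; a metric, so the triangle inequality holds."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query_with_pivot(data, pivot, query, radius, dist=euclidean):
    """Return (objects within radius of query, number of query-object distances computed)."""
    # In a real index these distances would be precomputed once, at insertion time.
    pivot_dists = [dist(obj, pivot) for obj in data]
    d_query_pivot = dist(query, pivot)

    results, evaluated = [], 0
    for obj, d_obj_pivot in zip(data, pivot_dists):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o).
        # If the lower bound already exceeds the radius, obj cannot qualify.
        if abs(d_query_pivot - d_obj_pivot) > radius:
            continue
        evaluated += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, evaluated


# Hypothetical two-dimensional data, pivot, and query.
data = [(0, 0), (1, 0), (5, 5), (9, 9), (10, 10)]
results, evaluated = range_query_with_pivot(data, pivot=(0, 0), query=(1, 1), radius=1.5)
print(results, evaluated)
```

In this toy data set only two of the five objects need a full distance computation; the other three are excluded by the lower bound alone. A function that does not satisfy the triangle inequality would not permit this kind of pruning, and every object would have to be compared with the query.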

Main research directions

Ranking and Relevance Feedback in Image Retrieval
Similarity Searching Architectures for WEB Databases
Approximation Techniques for Similarity Searching
Self-Organizing Search Networks
Computational Advertising
Clustering and Categorization in Metric Spaces
Applications of Similarity Searching: video, audio, music
Similarity Searching: Beyond the Metric Space
Data Cleaning and Integration
Collaborative Filtering

Information for students

Searching in digital data is an active and attractive research topic. Students in bachelor's and master's programmes are welcome. Advanced searching methods for digital data, e.g., the similarity searching technologies used in the MUFIN Project, are taught in the course PA128 Similarity Searching in Multimedia Data. For more detailed information, contact any member of our research team.

Laboratory of Data Intensive Systems and Applications (DISA)

The Laboratory of Data Intensive Systems and Applications focuses on index structures and similarity searching in very large collections of digital data. Similarity searching is a promising field of information processing technologies. The laboratory is open to all students and offers participation in various research and application projects.

Contacts

prof. Ing. Pavel Zezula, CSc.
Telephone: +420-549 49 7992
Email: zezula@fi.muni.cz