Data Processing

Subcategories:

Similarity Search

Annotation:
Similarity-based search is becoming an integral part of data processing tools, because more and more data collections cannot be totally ordered and the only way to compare pairs of objects is by their degree of similarity. The candidate will become familiar with modeling similarity through metric spaces, the basic types of similarity queries, the principles of partitioning metric spaces, and the theoretical foundations for building similarity search engines. An overview of existing tools is also included.

Syllabus:
Metric distance functions, similarity queries, metric space partitioning, metric search strategies, metric space transformations, approximate search; Overview of existing approaches; Indexing structures for large data collections; Approximate techniques; Scalable distributed architectures.
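As an illustration of the first topics in this list, the following is a minimal sketch, not taken from the study material, of a metric distance function, the two basic similarity queries (range and k-nearest-neighbor), and pivot-based filtering through the triangle inequality. The function names, the choice of the Euclidean metric, and the sample points are all assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance; satisfies the metric postulates
    (non-negativity, identity, symmetry, triangle inequality)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(data, q, radius, dist=euclidean):
    """Return all objects within `radius` of the query object q."""
    return [o for o in data if dist(q, o) <= radius]

def knn_query(data, q, k, dist=euclidean):
    """Return the k objects closest to q (sequential scan)."""
    return sorted(data, key=lambda o: dist(q, o))[:k]

def pivot_range_query(data, pivot_dists, pivot, q, radius, dist=euclidean):
    """Range query with pivot filtering.

    pivot_dists[i] holds the precomputed distance d(data[i], pivot).
    By the triangle inequality, |d(q, p) - d(o, p)| <= d(q, o), so an
    object whose lower bound exceeds the radius can be skipped without
    computing d(q, o).
    """
    d_qp = dist(q, pivot)
    result = []
    for o, d_op in zip(data, pivot_dists):
        if abs(d_qp - d_op) > radius:
            continue  # pruned: cannot lie within the query radius
        if dist(q, o) <= radius:
            result.append(o)
    return result

# Hypothetical sample data, for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0)]
pivot = points[0]
pivot_dists = [euclidean(o, pivot) for o in points]
```

The pivot-filtered query returns the same answer as the plain range query; it only reduces the number of distance computations, which matters when the distance function is expensive.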

Basic study material:
P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search: The Metric Space Approach. Advances in Database Systems, vol. 32, Springer, 2006. Chapters 1, 2, and 3 plus chapter 4 or 5.

Tutor: prof. Pavel Zezula, dr. Michal Batko, doc. Vlastislav Dohnal

Other Recommended Literature:
H. Samet, Foundations of Multimedia and Metric Data Structures, Morgan Kaufmann Publishers, 2006.

Information Retrieval

Annotation:
Search is currently considered the most widely used application of computer science. Its success rests on long-term technological development, which is continually being revised because of the exponential growth of data. The candidate will become familiar with the modern information retrieval methods used in contemporary practice.

Syllabus:
Retrieval models; Search engine scoring methods; Documents and queries; Indexing and searching; Parallel and distributed search; Web search; Multimedia search; Digital libraries.

Basic study material:
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 2011. Chapters 1, 3, 4 plus one other chapter of your choice.

Tutor: prof. Pavel Zezula, dr. Michal Batko, doc. Vlastislav Dohnal

Other Recommended Literature:
C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Contemporary themes of data processing research

Annotation:
Data processing is among the most rapidly evolving fields of informatics, driven by the growing variety of data types, a sharp increase in data volume, and the development of hardware infrastructure organized in networks. These topics are discussed every year at hundreds of specialized conferences, the most important of which include VLDB, ACM SIGMOD, ACM SIGIR, IEEE ICDE, and EDBT.

Syllabus:
The exam covers four articles from top conferences, selected jointly with the guarantor so that their content best reflects the needs of the student's PhD study. The articles should come from the latest conference proceedings. The specific content will be stated in the exam application.

Basic study material:
Conference proceedings:
VLDB - Very Large Data Bases
ACM SIGMOD - Management of Data
ACM SIGIR - Information Retrieval
IEEE ICDE - International Conference on Data Engineering
EDBT - Extending Data Base Technology

Tutor: prof. Pavel Zezula, dr. Michal Batko, doc. Vlastislav Dohnal

Machine learning

Annotation:
The candidate will learn the basics of inductive inference and the basic methods of machine learning (Chapters 2-5 of Mitchell's monograph) and then concentrate on selected parts, as a rule those corresponding to the focus of the doctoral thesis (three further chapters of Mitchell's monograph or other study literature, subject to the examiner's agreement).

Syllabus:
Inductive inference. Machine learning methods. Classification and regression tasks. Network learning models and genetic algorithms. Ensemble methods (multiple classifiers). Evaluating results. Clustering. Outlier detection. Inductive logic programming.

Basic study material:
Machine Learning / Tom M. Mitchell. - Boston: McGraw-Hill, c1997

Tutor: doc. Tomáš Brázdil, doc. Lubomír Popelínský

Other Recommended Literature:
Pattern Recognition and Machine Learning. Christopher M. Bishop. Springer 2006.
Data mining: concepts and techniques. Jiawei Han et al. 3rd ed. Morgan Kaufmann 2011.
Foundations of Inductive Logic Programming. Shan-Hwei Nienhuys-Cheng and Ronald de Wolf. Springer, 1997.
Review articles from the Machine Learning Journal (Springer) and other comparable periodicals (primarily ACM, IEEE, Springer)

Knowledge mining

Annotation:
The candidate will become acquainted with the data mining process and with pre-processing and mining methods (Chapters 3, 6, 8, and 10 of Han's monograph, 3rd edition). The candidate will then concentrate on selected parts corresponding to the focus of the doctoral thesis (three further chapters of Han's monograph or other study literature, subject to the examiner's agreement).

Syllabus:
Knowledge mining process. Models. Preprocessing methods, including for text. Mining from data (including multi-relational, network and graph, and spatio-temporal data). Learning frequent patterns and association rules. Text and web mining. Methods for visual analytics.

Basic study material:
Data mining: concepts and techniques / Jiawei Han, Micheline Kamber, Jian Pei. - 3rd ed. Morgan Kaufmann 2011.

Tutor: doc. Lubomír Popelínský

Other Recommended Literature:
Handbook of data visualization / Chun-houh Chen, Wolfgang Härdle, Antony Unwin, editors. - Berlin: Springer, c2008
Web data mining: exploring hyperlinks, content, and usage data / Bing Liu. - Berlin: Springer, c2007
Review articles from Data Mining and Knowledge Discovery (Springer) and other comparable periodicals (primarily ACM, IEEE, Springer)