This page has been machine-translated to make it more accessible to English-speaking readers.

Topic area: Data processing and machine learning

Subtopics:

Similarity search

Annotation:
Similarity search is becoming an integral part of data processing tools, as more and more data collections cannot be totally ordered, and the only way to compare pairs of objects is to measure their similarity. The candidate will become familiar with modeling similarity using metric spaces, the basic types of similarity queries, the principles of partitioning metric spaces, and the supporting theoretical foundations for building similarity search engines. An overview of existing tools is also included.

Syllabus:
Metric distance functions, similarity queries, principles of metric space partitioning, metric search strategies, metric transformations, approximate search; Overview of existing approaches; Indexing structures for large data collections; Approximate techniques; Scalable distributed architectures.

Basic study material:
P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search: The Metric Space Approach. Advances in Database Systems, volume 32. Springer, 2006. Chapters 1, 2, and 3, plus Chapter 4 or 5.

Examiners: prof. Ing. Pavel Zezula, CSc., RNDr. Michal Batko, Ph.D., doc. RNDr. Vlastislav Dohnal, Ph.D.

Other recommended literature:
H. Samet, Foundations of Multimedia and Metric Data Structures, Morgan Kaufmann Publishers, 2006.

Information retrieval

Annotation:
Search is currently considered the most widespread application of computer science. Its success rests on the long-term development of technology, which is constantly being revised in response to the exponential growth of data. The candidate will become familiar with modern data retrieval methods used in contemporary practice.

Syllabus:
Search data models; Search engine evaluation metrics; Documents and queries; Indexing and searching; Parallel and distributed search; Web search; Multimedia search; Digital libraries.

Basic study material:
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 2011. Chapters 1, 3, and 4, plus one other chapter of your choice.

Examiners: prof. Ing. Pavel Zezula, CSc., RNDr. Michal Batko, Ph.D., doc. RNDr. Vlastislav Dohnal, Ph.D.

Other recommended literature:
C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Contemporary topics of data processing research

Annotation:
Data processing methods are a rapidly developing field of informatics, driven by a fast-growing range of diverse data types, a sharp increase in the volume of data, and the development of networked hardware infrastructure. These topics are discussed every year at hundreds of professional conferences, the most important of which include VLDB, ACM SIGMOD, ACM SIGIR, IEEE ICDE, and EDBT.

Syllabus:
The exam is based on the study of four articles from top conferences, chosen so that their content matches the topic of the student's PhD study as closely as possible. The articles should come from the proceedings of the most recent years of these conferences. Their content should also be close to the subject of the examination.

Basic study material:
Conference proceedings
VLDB - Very Large Data Bases
ACM SIGMOD - Management of Data
ACM SIGIR - Information Retrieval
IEEE ICDE - International Conference on Data Engineering
EDBT - Extending Data Base Technology

Examiners: prof. Ing. Pavel Zezula, CSc., RNDr. Michal Batko, Ph.D., doc. RNDr. Vlastislav Dohnal, Ph.D.

Machine learning

Annotation:
Machine learning algorithms allow computers to learn an unknown function from data without the need for explicit programming. The candidate will become familiar with the basic methods of machine learning and then focus on technical parts of the study literature selected in agreement with the examiner.

Syllabus:
Inductive inference. Machine learning methods. Classification and regression problems. Network learning models and genetic algorithms. Multiple-classifier methods (ensemble methods). Evaluation of results. Clustering. Outlier detection. Inductive logic programming.

Basic study material:
Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Peter Flach. Cambridge University Press 2012.

Examiners: doc. RNDr. Tomáš Brázdil, Ph.D., doc. Mgr. Bc. Vít Nováček, PhD, doc. RNDr. Lubomír Popelínský, Ph.D.

Other recommended literature:
Pattern Recognition and Machine Learning. Christopher M. Bishop. Springer 2006.
Data mining: concepts and techniques. Jiawei Han et al. 3rd ed. Morgan Kaufmann 2011.
Review articles from Machine Learning Journal (Springer) and other comparable periodicals (especially ACM, IEEE, Springer).

Knowledge discovery

Annotation:
The candidate will become familiar with the data mining process and with methods of data preprocessing and data mining (Chapters 3, 6, 8, and 10 of Han's monograph, 3rd edition). They will then focus on selected parts, usually those corresponding to the focus of the doctoral thesis (a further three chapters from Han's monograph, or other study literature agreed with the examiner).

Syllabus:
The knowledge discovery process. Models. Data preprocessing methods, including for text. Data mining (including multi-relational, network, graph, and spatio-temporal data). Mining frequent patterns and association rules. Text and web mining. Methods for visual data analysis (visual analytics).

Basic study material:
Data mining: concepts and techniques / Jiawei Han, Micheline Kamber, Jian Pei. - 3rd ed. Morgan Kaufmann 2011.

Examiners: doc. Mgr. Bc. Vít Nováček, PhD, doc. RNDr. Lubomír Popelínský, Ph.D.

Other recommended literature:
Handbook of data visualization / Chun-houh Chen, Wolfgang Härdle, Antony Unwin, editors. - Berlin: Springer, c2008.
Web data mining: exploring hyperlinks, contents, and usage data / Bing Liu. - Berlin: Springer, c2007.
Review articles from Data Mining and Knowledge Discovery (Springer) and other comparable periodicals (especially ACM, IEEE, Springer).