PhD positions at the Department of Machine Learning and Data Processing

The Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, announces an open call for one PhD position starting in the Spring 2019 term. Applications are invited for the topics listed below.

The deadline for application is Dec 16, 2018.

General information

Prospective PhD students are expected to demonstrate their research background and skills in the area of the selected PhD topic. They should also be proficient in English; prior knowledge of Czech is not necessary.


The applications will be evaluated by the department committee, whose members will choose the best applicant. The announced PhD position is funded with an extra department stipend of 10 000 CZK per month (summing up to 28 700 CZK with the standard faculty stipend in the Czech study programme). The stipend is granted to the successful applicant for the first 2 years, with an expected renewal (after an evaluation) for another 2 years. The total length of study is 4 years.

Application procedure

Applicants are advised to contact their prospective supervisor (as listed below) directly for more specific details, well ahead of the deadline. The final application, consisting of a CV, a motivation letter, a description of the study time plan and expected outcomes, and possibly other relevant documents supporting the candidate's excellence, should be sent to the Head of the Department:

  • Aleš Horák, before the respective deadline for application (see above)

The candidates are still obliged to pass the standard admission procedure for doctoral study. The stipend can be awarded only after successfully completing the standard admission procedure.

Topic: Evaluation techniques for adaptive educational systems

Supervisor: doc. Mgr. Radek Pelánek, Ph.D.
Area: Adaptive Learning

Motivation: Adaptive educational systems aim to provide students with a personalized learning experience. The adaptive behaviour is based on student modeling -- utilizing techniques from artificial intelligence and machine learning to estimate the knowledge of students. Researchers have already proposed many student modeling techniques and specific algorithms for adaptive learning. However, it is difficult to tell which techniques and algorithms really work. Evaluation of educational systems is difficult because the main objective (student learning) is not directly observable. Moreover, open online learning systems that are used by students outside of school also have to take student engagement into account, i.e., the evaluation needs to consider multiple criteria.

Context: The topic is part of the research carried out within the Adaptive Learning research group. An applicant is expected to collaborate intensively with the rest of the group and to evaluate primarily techniques (algorithms) used in systems developed by the group (systems used by hundreds of students every day); it is also possible to develop a new educational system. The aim of the work is, however, to develop general techniques that are also applicable to other systems (e.g., Khan Academy, Duolingo).

Goals: The overall goal of the work is to develop general techniques for the evaluation of adaptive educational systems. A partial step is to evaluate specific student modeling techniques and algorithms for adaptive learning; this should serve as a basis for the development of general techniques. Examples of specific goals are: a) the analysis of mastery learning criteria, b) the development of evaluation methods for domain models (definitions of knowledge components), c) the use of automatic experimentation techniques (e.g., multi-armed bandits, Bayesian optimization).
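As a rough illustration of what automatic experimentation with a multi-armed bandit might look like, the following sketch simulates an epsilon-greedy strategy that chooses among several variants of an educational system. It is not a method used by the group; the function name, the success probabilities, and all parameters are hypothetical and chosen only for the example.

```python
import random


def epsilon_greedy(success_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over hypothetical system variants.

    success_probs holds the assumed (hidden) probability that a student
    succeeds under each variant. Returns how often each variant was chosen
    and the estimated success rate of each variant.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick a variant uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the variant with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Simulated binary outcome (1.0 = student succeeded).
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen variant.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Three hypothetical variants with made-up success probabilities.
counts, estimates = epsilon_greedy([0.50, 0.55, 0.70])
print(counts, [round(e, 3) for e in estimates])
```

In a real system the reward would have to combine several criteria (for instance learning and engagement), which is part of what makes the evaluation problem described above difficult.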


Topic: Computer-aided PoS tagger building

Supervisor: doc. Mgr. Pavel Rychlý, Ph.D.
Area: Computational linguistics. Part of Speech tagging.

Description: Part of speech (PoS) tagging is a long-standing and widely studied problem in the field of natural language processing. For many languages, PoS tagging is even considered a solved problem, since there are many taggers with relatively high accuracy (97% for English). On the other hand, most of the existing PoS taggers face problems in industrial-grade usage. Moreover, the task of building a tagger for a new language, or even training an existing tagger for a new language or a new language variant, is still difficult, especially for languages with rich morphology.

The aim of the thesis is to build both a methodology and a corresponding system supporting the creation of a PoS tagger for a new language, including the process of preparing an annotated corpus as the training data for the tagger. The system should guide both advanced and novice users in a crowdsourcing environment.

The system should support the following features:

  • automatic extraction of texts from public sources (Wikipedia) for annotation by users
  • tagset refinement based on annotator agreement
  • simple user interface for annotating small parts of the data by novice and/or anonymous users
  • exporting the tagger language model
  • a new tagger or adaptation of an existing one to support lemmatization and tagging of unseen word forms

The services of the system will be accessible via a web interface that navigates users through the process of building and refining the tagset, annotating the corpus, and configuring the tagger.

Publications relevant to the given topic can be found on the pages of the NLP Centre FI MU

Topic: Intelligent Multilingual Man Machine Communication

Supervisor: doc. RNDr. Aleš Horák, Ph.D.
Area: Natural Language Processing, Knowledge Representation, Dialogue Management

Communication between a man and a computer program is one of the long-term goals of the NLP field. Current systems for chatbot-style communication are able to respond at a satisfactory level to a broad range of questions; the background of such communication is, however, kept mostly on the lexical level. Recent results in the task of open-domain question answering promise to bring adequate background to such dialogues.

The aim of the thesis is to combine the two approaches (general discussion robots and question answering systems) in a new approach with a focus on multilingual environments. In the evaluation part, the thesis needs to offer new results (also) for languages other than the mainstream ones to prove its applicability to a broad spectrum of languages.

Publications relevant to the given topic can be found on the pages of the NLP Centre FI MU

Topic: Synthesis and Verification of Stochastic Systems Using Learning Methods

Supervisor: doc. RNDr. Tomáš Brázdil, Ph.D.
Area: Theoretical computer science, Formal Methods

We concentrate on analysis of systems that exhibit randomness and non-determinism. Randomness often stems from failures in physical components, unreliable communication, randomization, etc. Non-determinism naturally arises from underspecification, concurrency, etc. Such systems are typically modelled using Markov decision processes (MDPs), or stochastic games. These models have been widely studied for decades in various contexts such as engineering, artificial intelligence and machine learning (AI-ML), and, a bit more recently, formal verification. Recent results indicate that the field of formal verification may hugely benefit from results and methods developed in the framework of AI-ML. The aim of this PhD project is to further investigate the interplay between synthesis methods from AI-ML and formal verification of MDPs and stochastic games.
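For readers unfamiliar with MDPs, the following minimal sketch shows a typical verification question: the maximum probability of reaching a target state, computed by value iteration. The toy model, the state and action names, and the function name are all hypothetical and serve only as an illustration; they are not taken from the project.

```python
# Hypothetical toy MDP: transitions[state][action] = [(probability, next_state), ...].
# States without an entry ("goal", "fail") are treated as absorbing.
transitions = {
    "s0": {"safe": [(1.0, "s1")], "risky": [(0.5, "goal"), (0.5, "fail")]},
    "s1": {"try": [(0.6, "goal"), (0.3, "s0"), (0.1, "fail")]},
}


def max_reachability(transitions, target, tol=1e-10, max_iters=10_000):
    """Value iteration for the maximum probability of reaching `target`."""
    # Collect every state mentioned in the model.
    states = set(transitions)
    for actions in transitions.values():
        for outcomes in actions.values():
            states.update(nxt for _, nxt in outcomes)

    # Start from 0 everywhere except the target (iteration from below).
    values = {s: 1.0 if s == target else 0.0 for s in states}

    def action_value(outcomes):
        return sum(p * values[nxt] for p, nxt in outcomes)

    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            if s == target:
                continue
            # Non-determinism is resolved by taking the best action.
            best = max(action_value(o) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Extract a memoryless strategy that is greedy w.r.t. the final values.
    policy = {
        s: max(actions, key=lambda a: action_value(actions[a]))
        for s, actions in transitions.items()
    }
    return values, policy


values, policy = max_reachability(transitions, "goal")
print(values["s0"], policy)
```

In this toy model the randomness sits in the transition probabilities and the non-determinism in the choice between "safe" and "risky"; learning-based methods become relevant when the model is too large, or only partially known, for such exhaustive iteration.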