PhD positions at the Department of Machine Learning and Data Processing

The Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, announces an open call for one PhD position starting from the Autumn 2024 term, with applications in the areas listed below.

The deadline for application is May 12, 2024.

General information

Prospective PhD students are expected to demonstrate their research background and skills in the area of the selected PhD topic. They should also be proficient in English; prior knowledge of Czech is not necessary.

Stipend

The applications will be evaluated by the department committee, whose members will choose the best applicant. The announced PhD position is funded with an extra department stipend of 10 000 CZK per month (adding up to 45 000 CZK with the standard faculty stipend in the Czech study programme, provided the presented conditions are fulfilled). The stipend is granted to the successful applicant for the first 2 years, with an expected renewal (after an evaluation) for another 2 years. The total length of study is 4 years.

Application procedure

Applicants are advised to contact their prospective supervisor (as listed below) directly for more specific details, well ahead of the deadline. The final application, consisting of a CV, a motivation letter, a description of the study time plan and expected outcomes, and possibly other relevant documents supporting the candidate's excellence, should be sent to the Head of the Department.

The candidates are still obliged to pass the standard admission procedure for doctoral study. The stipend can be awarded only after successfully completing the standard admission procedure.


Topic - Intelligent Multilingual Man Machine Communication

Supervisor doc. RNDr. Aleš Horák, Ph.D.
Area - Natural Language Processing, Knowledge Representation, Dialogue Management

Communication between a human and a computer program is one of the long-term goals of the NLP field. Current chatbot-style systems are able to respond at a satisfactory level to a broad range of questions; however, the background of such communication is kept mostly at the lexical level. Recent results in the task of open-domain question answering promise to bring an adequate background to such dialogues.

The aim of the thesis is to combine the two approaches (general discussion robots and question answering systems) into a new approach with a focus on multilingual environments. In the evaluation part, the thesis needs to offer new results (also) for languages other than the mainstream ones to prove its applicability to a broad spectrum of languages.

Publications relevant to the given topic can be found on the pages of the NLP Centre FI MU.

Topic - Synthesis and Verification of Stochastic Systems Using Learning Methods

Supervisor doc. RNDr. Tomáš Brázdil, Ph.D.
Area - Theoretical computer science, Formal Methods

We concentrate on the analysis of systems that exhibit randomness and non-determinism. Randomness often stems from failures in physical components, unreliable communication, randomization, etc. Non-determinism naturally arises from underspecification, concurrency, etc. Such systems are typically modelled using Markov decision processes (MDPs) or stochastic games. These models have been widely studied for decades in various contexts such as engineering, artificial intelligence and machine learning (AI-ML), and, a bit more recently, formal verification. Recent results indicate that the field of formal verification may benefit greatly from results and methods developed in the framework of AI-ML. The aim of this PhD project is to further investigate the interplay between synthesis methods from AI-ML and formal verification of MDPs and stochastic games.

Topic - Anomaly detection and description

Supervisor doc. RNDr. Lubomír Popelínský, Ph.D.
Area - Anomaly detection

Anomaly (outlier, rare event) analysis aims at finding anomalous instances in data. Anomalous means potentially generated by a mechanism other than the one that generated the majority of the data. Detection, however, is only the first step of the analysis. The second step looks for a description and explanation of a found anomaly, and it can be tightly connected with feature selection for anomaly detection.

The goal of this PhD research is to develop new methods for the detection and description of anomalies. Research can be focused on a particular domain, e.g. text, structured data like graphs or sentences among others, and/or on a particular kind of anomaly, e.g. class-based anomalies or anomalies in logic.

Topic - Next Generation Clinical Decision Support System

Supervisor doc. Mgr. Bc. Vít Nováček, PhD
Area - Medical Informatics, Clinical Decision Support

In recent years, advanced machine learning (ML) techniques such as deep, representation, and relational learning have been instrumental in a number of medical informatics breakthroughs (e.g. Stanford's dermatologist-level model for predicting melanoma in 2017). These advances have been enabled by the growing maturity of various machine learning methods applicable to such use cases, and by the vast volumes and numbers of life science datasets that have become available for machine processing. However, many crucial challenges remain unsolved in the field of machine learning and artificial intelligence (AI) applications in medicine. One of the most important under-researched areas is, for instance, the explainability of AI-powered predictive models for clinical decision support. Machine-aided techniques for suggesting possible diagnoses or personalised therapies do exist nowadays, but their trustworthiness and applicability are often critically hampered by the inability of the models to explain their recommendations. Therefore, an advance in this area would not only be a possible academic breakthrough, but also a result with potentially vast societal and economic impact.

The aims of the prospective thesis are:

Topic - Tools and algorithms for a better understanding of the eukaryotic repeatome

Supervisor doc. Ing. Matej Lexa, Ph.D.
Area - Bioinformatics, Computational DNA sequence and genome analysis, Sequence repeats

Repetitive DNA in sequenced eukaryotic genomes often causes problems in analyses of these genomes; however, it is also a rich source of poorly understood evolutionary and functional information.

The aim of the prospective thesis would be to identify blind spots in our ability to find and interpret this kind of information and design algorithms and computational tools that would give us new information about the evolutionary history and function of repeats in genomes.

Examples of possible directions are:

Other related directions can be discussed.

Topic - Efficient Cross-Modal Retrieval in Human Motion Data

Supervisor doc. RNDr. Jan Sedmidubský, Ph.D.
Area - Information Retrieval, Computer Vision

Due to recent advances in pose-estimation methods, human motion can be extracted from a common video in the form of 3D skeleton sequences. However, effective and efficient content-based access to large volumes of extracted skeleton data still remains a challenging problem. Inspired by the recent success of cross-modal vision-language models for image and video domains, the objective of this PhD topic is to propose new text-to-motion retrieval techniques able to efficiently search for relevant motions based on a natural-language description. This topic includes several research problems that need to be tackled: adopting/building a large motion dataset that provides skeleton sequences along with textual descriptions, designing deep motion encoders to represent complex skeleton data by semantic features, indexing semantic features, and learning a common embedding space for motion and text modalities. The developed text-to-motion search techniques could find application opportunities in many areas, such as computer animation, security, or sports.

Topic - Personalized and explainable similarity search

Supervisor prof. Ing. Pavel Zezula, CSc.
Area - Data management, Information retrieval, Similarity search

Similarity search is a core operation of many data processing tasks, such as data cleaning and integration, outlier detection, frequent pattern mining, clustering, classification, and many others. Similarity search finds objects in diverse collections of data – such as text, images, audio, video, graphs, or deep network embeddings – according to some definition of sameness. Although current systems can undoubtedly reflect diversity in data and similarity measures, once the choice is made, a specific similarity search system uses the same notion of similarity for all its users and search objectives. This is in contrast with the natural understanding of similarity, which is subjective and context-dependent.

The proposed thesis aims to extend the functionality of existing systems with two important features: explainability and personalization. Explainability addresses the potential questions users may have about why the search system considers specific retrieved objects similar, thus fostering user trust and facilitating system refinement. Personalization enables users to use their perspective of similarity while utilizing the same global index structure. This approach minimizes irrelevant results and better aligns with individual user preferences. In essence, this thesis attempts to bridge the gap between static, one-size-fits-all similarity search systems and the dynamic, subjective nature of human understanding by incorporating explainability and personalization into existing systems.

Publications relevant to the topic can be found at https://scholar.google.com/citations?user=EHIdRAYAAAAJ or https://dblp.org/pid/z/PZezula.html.