Colloquium of Faculty of Informatics

The Informatics Colloquium takes place on Tuesdays at 14:30 during term time. The aim of the colloquia is to introduce state-of-the-art research from all areas of computer science to a wide audience at the Faculty of Informatics.

Time and place

Tuesday 14:30–15:30, lecture hall A217, Faculty of Informatics
(the speaker is available for informal discussion from around 14:00 in room A220)



Autumn 2023 - schedule overview

26/9 Ivan Zelinka (VSB - Technical University of Ostrava) Visualization and analysis of malware through fractal geometry
3/10 Jiří Šimša (Google) Data Pipelines for Machine Learning
10/10 Robert Ganian (TU Wien) Mapping the Complexity-Theoretic Landscapes of Artificial Intelligence
17/10 Ernest Cachia (University of Malta) Tools for improving data accessibility in Bioinformatics
24/10 Zdeněk Dvořák (Charles University) Flow-critical graphs
31/10 Jakub Gajarský (University of Warsaw) Designing efficient graph algorithms using logic
7/11 Louis Esperet (CNRS Grenoble) Universal graphs and labelling schemes
14/11 PhD Fest
21/11 Peer Kröger (University of Kiel) Learned Index Structures for Nearest Neighbor Search Problems
28/11 Phil Newton (Swansea University) Evidence-Based Learning, Teaching and Assessment
5/12 Lukáš Sekanina (Brno University of Technology) Evolutionary Design for Computer Engineering
12/12 Monica Dragoicea (University Politehnica of Bucharest) Smart services, explainable AI, and smart visualisations

On Tuesday, November 14, there will be a series of two talks given by our PhD students.


Ivan Zelinka
Visualization and analysis of malware through the lens of fractal geometry

Tuesday 26 September 2023, 14:30, lecture hall A217

To date, a large number of research papers have been written on malware identification, classification into different families, and the distinction between malware and goodware. These works were based on captured malware samples and sought to analyze malware and goodware using various techniques, including artificial intelligence. Some of them also approached malware analysis through visualization, typically converting malware samples into images that capture the structure of the malware and are then subjected to image processing. Visualizations of this type usually result in images showing black-and-white granularity, to which artificial intelligence methods for classification are then applied. In this lecture, we propose an unconventional new approach to malware visualization based on fractal geometry, where visually very interesting images are subsequently used to classify malware and goodware. Our approach opens up a vast topic for future discussion and provides many new directions for research in malware analysis and classification, as discussed in the conclusion. Above all, however, it opens up a new area of computer code visualization using fractal geometry, which offers a theoretically infinite number of different representations of the code, and thus also the "uniqueness" of the display. The results of the fractal conversion and subsequent classification experiments presented are based on a database of 6,589,997 goodware samples, 827,853 potentially unwanted applications, and 4,174,203 malware samples provided by ESET. This lecture is therefore not a comprehensive, compact study presenting results obtained from comparative experiments, but rather an attempt to show a new direction in the field of visualization using fractal geometry and its possible use in malware analysis.
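To make the conventional pipeline referenced above concrete, the short sketch below shows the usual byte-to-grayscale conversion step (this is an illustrative assumption on our part, not code from the talk; the file path and image width are hypothetical):

import numpy as np
from PIL import Image

def bytes_to_grayscale(sample_path, width=256):
    """Render a binary sample as a grayscale image -- the conventional
    visualization baseline that the talk contrasts with its
    fractal-geometry approach."""
    data = np.fromfile(sample_path, dtype=np.uint8)      # one pixel per byte
    height = int(np.ceil(len(data) / width))
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:len(data)] = data
    return Image.fromarray(padded.reshape(height, width), mode="L")

# Hypothetical usage: the resulting image would then be fed to an image classifier.
# bytes_to_grayscale("sample.bin").save("sample.png")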


Jiří Šimša
Data Pipelines for Machine Learning

Tuesday 3 October 2023, 14:30, lecture hall A217

In this talk we present Google's infrastructure for machine learning (ML) data pipelines. The talk is divided into two parts:

1) tf.data overview -- data pipelines for ML jobs are often challenging to implement efficiently, as they require reading large volumes of data, applying complex transformations, and transferring data to hardware accelerators while overlapping computation and communication to achieve optimal performance. We present tf.data, a framework for building and executing efficient data pipelines for ML jobs. The tf.data API provides operators that can be parameterized with user-defined computation, composed, and reused across different machine learning domains. These abstractions allow users to focus on the application logic of data processing, while tf.data's runtime ensures that pipelines run efficiently (an illustrative pipeline is sketched after this two-part overview).

2) tf.data service overview -- traditionally, data pipelines of ML jobs execute on the same host as the ML computation. Data processing can, however, become a bottleneck of the ML computation if there are insufficient resources (e.g. CPU and memory bandwidth) to process data fast enough. This can slow down the ML computation and waste valuable and scarce ML hardware (e.g. GPUs and TPUs) used by the ML computation. We present tf.data service, a disaggregated data processing service built on top of tf.data, along with (1) empirical evidence from production workloads for the need for disaggregation, together with a quantitative evaluation of the impact disaggregation has on the performance and cost of production workloads, (2) the benefits of disaggregation beyond horizontal scaling, and (3) an analysis of tf.data service's adoption at Google, the lessons learned while building and deploying the system, and potential future lines of research opened up by our work.
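As a minimal illustration of the composable operators described in part 1 (the file pattern, record schema, and batch size are placeholders of ours, not details from the talk), a typical tf.data input pipeline might look like this:

import tensorflow as tf

# Hypothetical file pattern and record schema; the structure (interleaved
# reads, parallel map, shuffle, batch, prefetch) is the point.
files = tf.data.Dataset.list_files("/data/train-*.tfrecord")

def parse_example(serialized):
    features = tf.io.parse_single_example(
        serialized,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)})
    image = tf.io.decode_jpeg(features["image"], channels=3)
    return tf.image.resize(image, [224, 224]) / 255.0, features["label"]

dataset = (files
           .interleave(tf.data.TFRecordDataset,
                       num_parallel_calls=tf.data.AUTOTUNE)
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(10_000)
           .batch(128)
           .prefetch(tf.data.AUTOTUNE))  # overlap input processing with training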
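To give a flavour of the disaggregation described in part 2, a pipeline like the one above could be offloaded to remote workers via the tf.data service distribute transformation; the dispatcher address below is a placeholder and this is a sketch rather than the configuration discussed in the talk:

# Continuing from the `dataset` defined in the previous sketch;
# "grpc://dispatcher:5000" is a hypothetical tf.data service dispatcher address.
distributed = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode="parallel_epochs",
        service="grpc://dispatcher:5000"))

# The ML job iterates over `distributed` as usual, while the data
# processing itself runs on remote tf.data service workers.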


Robert Ganian
Mapping the Complexity-Theoretic Landscapes of Artificial Intelligence

Tuesday 10 October 2023, 14:30, lecture hall A217

Over the past few decades, there has been a concentrated and highly successful scientific effort aimed at identifying the boundaries of theoretical tractability for classical computational problems such as Boolean Satisfiability, Model Checking, Constraint Satisfaction and a variety of fundamental graph problems. Indeed, today we can use the parameterized refinement of complexity theory along with a variety of other cutting-edge tools to draw detailed complexity-theoretic landscapes capturing when these problems are tractable and when they are, at least under well-established assumptions, "hard". However, the rapid ascent of Artificial Intelligence has led to the identification of entirely new kinds of important computational problems, and our understanding of when these can be solved efficiently is in many cases still in its infancy.

In this high-level talk, we will explore selected new developments in mapping the landscapes of tractability for fundamental problems in several subfields of Artificial Intelligence, including recommender systems and machine learning. The talk will include an introduction to the foundations of parameterized complexity theory and will cover new algorithms and lower bounds for matrix completion problems as well as the recently introduced parameterized refinement of PAC-learning.
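For readers unfamiliar with the parameterized refinement of complexity theory mentioned above, the standard textbook definition (not material specific to this talk) is: a problem with input size n and parameter k is fixed-parameter tractable (FPT) if it admits an algorithm running in time

\[ f(k) \cdot n^{O(1)} \]

for some computable function f depending only on k; the "landscapes" in the title chart for which parameters such algorithms exist and for which they are ruled out under standard assumptions.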


Ernest Cachia
Tools for improving data accessibility in Bioinformatics

Tuesday 17 October 2023, 14:30, lecture hall A217

Our research in bioinformatics focuses primarily on proteins. We have developed programs to predict protein function (related paper in progress) and to provide a unified view of proteins by integrating several databases into a one-stop-shop solution (SADIP). SADIP is being revised to optimise the architecture, and we are writing a paper on it. The idea behind SADIP is to offer a portal that provides as much information as possible about a protein, such as any identified domains (with data drawn from SCOP and CATH), disordered regions, moonlighting, etc. We also have a prototype, which requires further improvement, that analyses scientific literature and uses ontologies to mark up the retrieved information in order to map links between proteins, genes, diseases and drug interactions. Finally, our research also focuses on the FAIR assessment of resources (the paper is published, and the tool is available at https://autofair.research.um.edu.mt/portal/home/). We have strong collaborations with the Centre for Molecular Medicine & Biobanking, who are also spearheading Bioinformatics-related projects at UM.

Our future direction involves optimising and extending our current projects and applying our expertise to problems in the biomedical sphere. An example of one such project completed in the past is a tool that tracks the progress of Alzheimer's disease through an analysis of MRI images, an automated process that aids physicians in their diagnostic work. We are keen to collaborate in areas where datasets are available and where we can formulate a research problem in reasonable detail. We can work in several scenarios: small exploratory projects with a group of 3-4 undergraduate students (with the caveat that the problem needs to be very well defined), an undergraduate research student, or students at Master's level. This can be discussed on a case-by-case basis.


Zdeněk Dvořák
Flow-critical graphs

Tuesday 24 October 2023, 14:30, lecture hall A217

Many important results in graph coloring were obtained through the study of critical graphs, i.e., minimal obstructions to having a certain chromatic number. By a famous result of Tutte, nowhere-zero flows are dual to coloring, and it is natural to ask whether progress on some of the many open questions concerning nowhere-zero flows could be made through the study of suitably defined flow-critical graphs. The talk will survey recent results and open questions on this topic.
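For context, here is the standard definition underlying the abstract (a textbook statement, not taken from the talk): a nowhere-zero k-flow on a graph G is an orientation of G together with a map

\[ \varphi\colon E(G) \to \{1, \dots, k-1\} \quad\text{such that}\quad \sum_{e \in \delta^{+}(v)} \varphi(e) = \sum_{e \in \delta^{-}(v)} \varphi(e) \quad \text{for every } v \in V(G), \]

i.e. flow is conserved at every vertex and no edge carries zero. Tutte's duality then says that a planar graph admits a nowhere-zero k-flow if and only if its planar dual is properly k-colourable, which is the sense in which flows are dual to coloring.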


Jakub Gajarský
Designing efficient graph algorithms using logic

Tuesday 31 October 2023, 14:30, lecture hall A217

We will focus on a logic-based approach to designing efficient algorithms for many basic graph theoretic problems (such as k-dominating set, k-independent set, subgraph isomorphism, and many of their variants). The basic idea is simple – instead of manually designing the algorithm for the problem we are trying to solve, we derive the algorithm for the problem automatically from its definition (essentially, we can have an algorithm for producing algorithms).
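As a concrete (and standard) example of deriving an algorithm from a problem's definition, not necessarily the one used in the talk: the k-dominating set problem is expressed by the first-order sentence

\[ \exists x_1 \dots \exists x_k \; \forall y \; \bigvee_{i=1}^{k} \bigl( y = x_i \lor E(y, x_i) \bigr), \]

and a model-checking algorithm for first-order logic on the relevant graph class then solves the problem without any problem-specific algorithm design.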

This area of research has been intensively studied over the past 25 years and has recently undergone major developments. First, we will give a high-level overview of this research area, and then we will focus on a new result that tells us when we can improve upon the existing state-of-the-art results in this field, such as the seminal result of Grohe, Kreutzer and Siebertz. Along the way, we will give a brief introduction to the basic notions of the structural theory of sparse graphs, with the hope of making them accessible to a general computer science audience.


Louis Esperet
Universal graphs and labelling schemes

Tuesday 7 November 2023, 14:30, lecture hall A217

In this talk I will survey some recent developments in the area of universal graphs and labelling schemes. I will explain the construction of a graph on n^(1+o(1)) vertices and edges that contains all planar graphs on n vertices as induced subgraphs. This concludes a line of research initiated in the 1980s in theoretical computer science, and at the same time improves a classical result of Babai, Chung, Erdős, Graham, and Spencer (1982). The techniques combine recent advances in graph theory (the product structure theorem) with tools from data structures.

Joint work with V. Dujmović, C. Gavoille, G. Joret, P. Micek and P. Morin.
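The connection between the two notions in the title is worth spelling out (a standard observation rather than part of the new construction): an adjacency labelling scheme assigning each vertex a label of \ell(n) bits, from which adjacency can be decided by a fixed decoder, is essentially equivalent to an induced-universal graph on

\[ 2^{\ell(n)} \]

vertices (the vertices of the universal graph are the possible labels, with adjacency given by the decoder). In particular, the n^(1+o(1))-vertex universal graph for planar graphs corresponds to adjacency labels of (1+o(1)) log_2 n bits.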


PhD Fest - presentations by local PhD students

Tuesday 14 November 2023, 14:30, lecture hall A217

Oldřich Pecák
Acceleration of nuclear particle transport simulators

Nuclear particle transport simulators are crucial software used in developing modern nuclear reactor designs, in fusion power research, and in the design of modern particle accelerators. Due to the probabilistic nature of particle interactions, most modern simulators use a Monte Carlo approach to calculate the results. Although the method is, in theory, naturally parallelizable, modern simulators are mostly limited to running on CPUs and do not utilize the massively parallel architecture of GPGPUs and FPGAs. This causes them to underutilize the power of current state-of-the-art supercomputers, which rely primarily on GPGPUs.

In this short lecture, I will go through the history of these simulators and give an introduction to their inner workings. I will present the common algorithms and data structures and their limitations for parallelization on GPGPUs. We will look at several design changes that can be implemented to aid parallelization. Lastly, I will talk about the possibility of using custom hardware (implemented on an FPGA) to accelerate the simulators.
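To illustrate the kind of inner loop these simulators parallelize, here is a deliberately simplified, hypothetical sketch (real codes track energy, geometry, and many interaction types): the free-flight distance of each particle in a medium with total macroscopic cross-section sigma_total is sampled independently from an exponential distribution, which is what makes the method naturally parallel.

import numpy as np

rng = np.random.default_rng(seed=0)

def sample_free_flight(sigma_total, n_particles):
    """Sample the distance each particle travels before its next collision.

    Path lengths follow an exponential distribution with mean 1/sigma_total;
    each particle history is independent, hence the natural parallelism."""
    xi = 1.0 - rng.random(n_particles)   # uniform random numbers in (0, 1]
    return -np.log(xi) / sigma_total     # inverse-CDF sampling

# Hypothetical usage: one free-flight step for a batch of particle histories.
distances = sample_free_flight(sigma_total=0.5, n_particles=1_000_000)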

Dávid Halász
Trust Building via Adaptive Safety in Autonomous Ecosystems

The evolving landscape of autonomous cyber-physical systems is progressing towards cooperative ecosystems. These complex structures, characterized by dynamic interactions among various autonomous systems, offer heightened autonomy and adaptability but pose substantial challenges in ensuring safety. The swift development of autonomous driving underlines the urgency to address these issues. Existing safety assurance methods, while effective on an individual level, struggle to encompass the complexities of dynamic ecosystems. To bridge this gap, this lecture advocates for adaptive safety mechanisms informed by trust and trustworthiness between ecosystem members and proposes a method ensuring adaptive safety.


Peer Kröger
Learned Index Structures for Nearest Neighbor Search Problems

Tuesday 21 November 2023, 14:30, lecture hall A217

With the current efforts in representation learning and vector space embedding methods for processing big, complex data, vector representations (a.k.a. feature spaces) are experiencing a kind of renaissance. Consequently, efficiently querying such vector databases remains a challenge of utmost importance. In such feature spaces, similarity queries such as nearest neighbor search and its variants are prevalent rather than classical key search, and indexing potentially high-dimensional feature spaces is the key to efficient query processing. Recently, indexing research has been influenced by the concept of learned index structures: determining the results of a query can in general be seen as a learning task, and a machine learning model can potentially be trained to predict query results. While the learned indexing approach is already well researched for traditional key search (as is common in relational databases), research is still ongoing for similarity queries in multi-dimensional feature spaces.
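As a minimal sketch of the learned-index idea for the one-dimensional key-search case described above (a toy illustration of ours, not code from the talk): a simple model predicts a key's position in a sorted array, and a search window bounded by the model's maximum training error corrects the prediction.

import bisect
import numpy as np

class LearnedIndex:
    """Toy one-dimensional learned index: a linear model predicts a key's
    position in a sorted array, and the maximum training error bounds the
    local search that corrects the prediction."""

    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=float))
        positions = np.arange(len(self.keys))
        # Fit position ~ slope * key + intercept.
        self.slope, self.intercept = np.polyfit(self.keys, positions, deg=1)
        predicted = self.slope * self.keys + self.intercept
        self.max_err = int(np.ceil(np.max(np.abs(predicted - positions))))

    def lookup(self, key):
        guess = int(self.slope * key + self.intercept)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the error-bounded window.
        return lo + bisect.bisect_left(self.keys[lo:hi], key)

# Hypothetical usage:
idx = LearnedIndex(np.random.default_rng(0).uniform(0, 1e6, size=100_000))
position = idx.lookup(123456.0)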

In this talk, we briefly review traditional indexing approaches and the recently proposed idea of learned index structures. We discuss general challenges of indexing multi-dimensional feature spaces and highlight current research results for different variants of nearest neighbor queries, as well as open questions in the field.


Phil Newton
Evidence-Based Learning, Teaching and Assessment

Tuesday 28 November 2023, 14:30, lecture hall A217

In this session we will cover the basic principles of how humans learn, and how those principles can be translated into teaching and assessment strategies for academics and learning strategies for students. We will review some common teaching strategies and consider the evidence on whether or not they are effective, and how best to use them (or not!).


Lukáš Sekanina
Evolutionary Design for Computer Engineering

Tuesday 5 December 2023, 14:30, lecture hall A217

Evolutionary design, i.e., the use of evolutionary algorithms for the automated creation of programs, electronic circuits, antennas, robots, and other objects, has become a fruitful approach in computer science and engineering over the last two decades. This talk surveys the key ingredients of evolutionary design methods and presents several techniques for improving the scalability of this approach. Examples of evolved solutions (such as approximate arithmetic circuits, CNN architectures, and image filters) that show unique properties compared to conventional designs will be presented.
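As a generic illustration of the ingredients shared by such methods (representation, mutation, fitness evaluation, selection), below is a minimal (1 + lambda) evolutionary loop over bit strings; it is a hypothetical toy of ours, not one of the circuit-design techniques presented in the talk.

import random

def evolve(fitness, genome_length=64, lam=4, generations=1000, seed=0):
    """Minimal (1 + lambda) evolutionary algorithm over bit strings."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(genome_length)]
    best_fit = fitness(parent)
    for _ in range(generations):
        # Mutation: flip each bit independently with probability 1/genome_length.
        offspring = [[bit ^ (rng.random() < 1.0 / genome_length) for bit in parent]
                     for _ in range(lam)]
        scored = [(fitness(child), child) for child in offspring]
        child_fit, child = max(scored, key=lambda pair: pair[0])
        if child_fit >= best_fit:        # accept equal fitness (neutral drift)
            parent, best_fit = child, child_fit
    return parent, best_fit

# Hypothetical usage: maximise the number of ones (the classic OneMax toy problem).
best, score = evolve(fitness=sum)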


Monica Dragoicea
Smart services, explainable AI, and smart visualisations

Tuesday 12 December 2023, 14:30, lecture hall A217

Today, we have technology that must work for people's benefit. We must continue to learn how to intertwine science, technology, and innovation in our fast-growing service-oriented economy, where we deal with people-centric interactions. These interactions are increasingly mediated by technology; they can be described as complex actor networks, the service ecosystems, and they develop within service systems that integrate people, technology, and shared information. They are increasingly "informed" by rich data-driven processes, and citizens increasingly experience them through smart service ecosystems.

Therefore, we now have the incentive to better understand, describe, and innovate in complex, service-oriented systems such as healthcare, business organizations, government agencies, and cities. We also have the opportunity to formulate new value propositions for our society with the new tools that the human mind has created. While AI will always remain artificial, because it will always be a product of the human mind, we cannot deny the potential of these new technologies and algorithms to help people do their jobs better or to improve people's lives.

However, there is a growing need for these new technologies to be better explained in order to increase their acceptance at all levels of society and by all relevant stakeholders. We want to see better adoption of AI technologies, as this can bring many benefits to society, companies, and individuals. But we need to increase trust in their decision-making processes and provide explainability regarding legal compliance, robustness and reliability in operation, and trustworthiness.

In this respect, this presentation proposes a journey through the smart service design process, where AI explainability and smart visualization can bring decision-making closer to the most important recipient of this endeavor: the smart citizen.


Past colloquia