PhD Thesis Defenses

PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work being done by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.

Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.

2022


December

Michael HEDDERICH
(Advisor: Prof. Dietrich Klakow)

Weak Supervision and Label Noise Handling for Natural Language Processing in Low-Resource Scenarios
Tuesday, 20.12.22, 11:00h
Building C7 4, conference room

Also per video conference

Abstract:

The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully.
In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model’s quality from a theoretical perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling. But we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights into where classifiers fail.
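
One widely used way to realize such noise handling, sketched here under the assumption of a small clean set with paired gold and weak labels (an illustrative recipe, not necessarily the dissertation’s exact method): estimate a noise (confusion) matrix and use it as a channel between the model’s clean-label distribution and the observed noisy labels.

import numpy as np

def estimate_noise_matrix(gold_labels, weak_labels, num_classes):
    # counts[i, j]: how often gold class i received weak label j.
    counts = np.zeros((num_classes, num_classes))
    for g, w in zip(gold_labels, weak_labels):
        counts[g, w] += 1
    counts += 1e-8                       # avoid division by zero
    return counts / counts.sum(axis=1, keepdims=True)

# During training on weakly labeled data, the classifier's clean-label
# distribution p_clean (shape [batch, C]) is passed through the channel,
#     p_noisy = p_clean @ T,
# and the loss is computed between p_noisy and the observed weak labels.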

Alexander KAMPMANN
(Advisor: Prof. Andreas Zeller)

Leveraging Input Features for Testing and Debugging
Monday, 19.12.22, 11:00h
Building E9 1, room 0.01

Also per video conference

Abstract:

When debugging software, it is paramount to understand why the program failed in the first place. How can we aid developers in this task, and automate the process of understanding a program failure? The behavior of a program is determined by its inputs. A common step in debugging is to determine how the inputs influence the program behavior. In this thesis, I present Alhazen, a tool which combines a decision tree and a test generator into a feedback loop to create an explanation for which inputs lead to a specific program behavior. This can be used to generate a diagnosis for a bug automatically. Alhazen is evaluated for generating additional inputs and for recognizing bug-triggering inputs. The thesis also makes first advances towards a user study on whether Alhazen helps software developers in debugging. Results show that Alhazen is effective for both use cases, and indicate that machine-learning approaches can help to understand program behavior. Given Alhazen, the question of what else can be done with input features follows naturally. In the thesis I present Basilisk, a tool which focuses testing on a specific part of the input. Basilisk is evaluated on its own, and an integration with Alhazen is explored. The results show promise, but also some challenges. All in all, the thesis shows that there is some potential in considering input features.
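
The feedback loop can be sketched as follows; run_program, extract_features, and generate_inputs are hypothetical placeholders for the target program, the input feature extractor, and the generator, and the off-the-shelf decision tree merely stands in for Alhazen’s actual learner.

from sklearn.tree import DecisionTreeClassifier

def explanation_loop(initial_inputs, extract_features, run_program,
                     generate_inputs, rounds=10):
    """Iteratively refine a decision tree that explains which input
    features lead to a given program behavior (minimal sketch)."""
    inputs = list(initial_inputs)
    labels = [run_program(i) for i in inputs]   # e.g. True = bug triggered
    tree = DecisionTreeClassifier()
    for _ in range(rounds):
        tree.fit([extract_features(i) for i in inputs], labels)
        # Generate inputs that probe the tree's current hypothesis,
        # then label them by actually executing the program.
        new_inputs = generate_inputs(tree)
        inputs += new_inputs
        labels += [run_program(i) for i in new_inputs]
    return tree      # the final tree serves as the diagnosis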

Tobias ALT-VEIT
(Advisor: Prof. Joachim Weickert)

Connecting Mathematical Models for Image Processing and Neural Networks
Thursday, 15.12.22, 16:15h
Per video conference

Abstract:

This thesis deals with the connections between mathematical models for image processing and deep learning. While data-driven deep learning models such as neural networks are flexible and well-performing, they are often used as a black box. This makes it hard to provide theoretical model guarantees and scientific insights. On the other hand, more traditional, model-driven approaches such as diffusion, wavelet shrinkage, and variational models offer a rich set of mathematical foundations. Our goal is to transfer these foundations to neural networks. To this end, we pursue three strategies. First, we design trainable variants of traditional models and reduce their parameter set after training to obtain transparent and adaptive models. Moreover, we investigate the architectural design of numerical solvers for partial differential equations and translate them into building blocks of popular neural network architectures. This yields criteria for stable networks and inspires novel design concepts. Lastly, we present novel hybrid models for inpainting that rely on our theoretical findings. These strategies provide three ways for combining the best of the two worlds of model- and data-driven approaches. Our work contributes to the overarching goal of closing the gap between these worlds that still exists in performance and understanding.
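
One concrete instance of this solver-network correspondence (a generic textbook-style illustration, under the assumption of an explicit scheme for nonlinear diffusion; the thesis develops such connections in detail): a single explicit diffusion step

$$u^{k+1} \;=\; u^{k} \;-\; \tau\, K^{\top} \Phi\bigl(K u^{k}\bigr)$$

with discrete derivative operator $K$, nonlinearity $\Phi$, and time step size $\tau$, has the same skip-connection structure $x \mapsto x + F(x)$ as a residual block, which suggests translating stability criteria for the numerical scheme into criteria for the network.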


Heiko BECKER
(Advisor: Dr. Eva Darulová, now Uppsala)

Verified Compilation and Optimization of Floating-Point Kernels
Wednesday, 14.12.22, 10:15h
Building E1 5, room 002
Also per video conference

Abstract:

When verifying safety-critical code on the level of source code, we trust the compiler to produce machine code that preserves the behavior of the source code. Trusting a verified compiler is easy. A rigorous machine-checked proof shows that the compiler correctly translates source code into machine code. Modern verified compilers (e.g. CompCert and CakeML) have rich input languages, but only rudimentary support for floating-point arithmetic. In fact, state-of-the-art verified compilers only implement and verify an inflexible one-to-one translation from floating-point source code to machine code. This translation completely ignores that floating-point arithmetic is actually a discrete representation of the continuous real numbers.
This thesis presents two extensions improving floating-point arithmetic in CakeML. First, the thesis demonstrates verified compilation of elementary functions to floating-point code in: Dandelion, an automatic verifier for polynomial approximations of elementary functions; and libmGen, a proof-producing compiler relating floating-point machine code to the implemented real-numbered elementary function. Second, the thesis demonstrates verified optimization of floating-point code in: Icing, a floating-point language extending standard floating-point arithmetic with optimizations similar to those used by unverified compilers like GCC and LLVM; and RealCake, an extension of CakeML with Icing into the first fully verified optimizing compiler for floating-point arithmetic.
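
To see why such optimizations require verification in the first place: floating-point arithmetic is not associative, so a reordering that is valid over the real numbers can change the computed result. A quick check in any IEEE-754 double-precision environment:

# Python floats are IEEE-754 double precision.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 is absorbed by the huge magnitude of b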


Jana HOFMANN
(Advisor: Prof. Bernd Finkbeiner)

Logical Methods for the Hierarchy of Hyperlogics
Monday, 12.12.22, 16:00h
Building E1 1, room 407

Abstract:

In this thesis, we develop logical methods for reasoning about hyperproperties. Hyperproperties describe relations between multiple executions of a system. Unlike trace properties, hyperproperties comprise relational properties like noninterference, symmetry, and robustness. While trace properties have been studied extensively, hyperproperties form a relatively new concept that is far from fully understood. We study the expressiveness of various hyperlogics and develop algorithms for their satisfiability and synthesis problems.
In the first part, we explore the landscape of hyperlogics based on temporal logics, first-order and second-order logics, and logics with team semantics. We establish that first-order/second-order and temporal hyperlogics span a hierarchy of expressiveness, whereas team logics constitute a radically different way of specifying hyperproperties. Furthermore, we introduce the notion of temporal safety and liveness, from which we obtain fragments of HyperLTL (the most prominent hyperlogic) with a simpler satisfiability problem.
In the second part, we develop logics and algorithms for the synthesis of smart contracts. We introduce two extensions of temporal stream logic to express (hyper)properties of infinite-state systems. We study the realizability problem of these logics and define approximations of the problem in LTL and HyperLTL. Based on these approximations, we develop algorithms to construct smart contracts directly from their specifications.
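
For illustration, a classical hyperproperty expressible in HyperLTL is observational determinism, a form of noninterference relating every pair of traces $\pi, \pi'$ (a standard textbook example, with $i$ and $o$ denoting low-security inputs and observable outputs):

$$\forall \pi.\, \forall \pi'.\;\; \square\,(i_\pi = i_{\pi'}) \;\rightarrow\; \square\,(o_\pi = o_{\pi'})$$

No formula of LTL, which is evaluated over single traces, can express this requirement.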

Gerrit GROßMANN
(Advisor: Prof. Verena Wolf)

Stochastic Spreading on Complex Networks
Monday, 12.12.22, 10:00h
Building E1 5, room 029

Abstract:

Complex interacting systems are ubiquitous in nature and society. Computational modeling of these systems is of great relevance for science and engineering. Complex networks are common representations of these systems (e.g., friendship networks or road networks). Dynamical processes (e.g., virus spreading, traffic jams) that evolve on these networks are shaped and constrained by the underlying connectivity. This thesis provides numerical methods to study stochastic spreading processes on complex networks. We consider the processes as inherently probabilistic and analyze their behavior through the lens of probability theory. The contributions include methods for efficient simulations, controlling the spreading using interventions, and inferring the latent interaction graph.
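
As a minimal illustration of the kind of process studied (not code from the thesis), a Gillespie-style exact simulation of SIS spreading, where infection travels along edges at rate beta and nodes recover at rate gamma:

import random
import networkx as nx

def sis_gillespie(G, initially_infected, beta=0.3, gamma=0.1, t_max=100.0):
    """Exact stochastic simulation of SIS spreading on graph G."""
    infected = set(initially_infected)
    t = 0.0
    while infected and t < t_max:
        # Rates: recovery for each infected node, infection for each
        # susceptible node proportional to its infected neighbors.
        rates = {v: gamma for v in infected}
        for u in G:
            if u not in infected:
                k = sum(1 for w in G[u] if w in infected)
                if k:
                    rates[u] = beta * k
        total = sum(rates.values())
        t += random.expovariate(total)        # time until the next event
        x = random.uniform(0.0, total)        # pick an event by its rate
        for v, r in rates.items():
            x -= r
            if x <= 0.0:
                if v in infected:
                    infected.remove(v)        # recovery event
                else:
                    infected.add(v)           # infection event
                break
    return infected

# e.g. sis_gillespie(nx.erdos_renyi_graph(200, 0.05), initially_infected=[0])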

Janis KALOFOLIAS
(Advisor: Prof. Jilles Vreeken)

Subgroup Discovery for Structured Target Concepts
Thursday, 07.12.22, 10:30h
Building C9 3, conference room

Abstract:

In this work we study subgroup discovery, a theoretical framework for finding subgroups in data (i.e., named sub-populations) whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset. This is a powerful tool that conveys crucial information to a human audience, but despite past advances has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. Namely, we show how to efficiently discover representative subgroups that ensure fairness of the sub-population with regard to a sensitive trait; for entities with additional pairwise relational information we introduce a novel measure of connectedness beyond established measures, and use it to discover named robustly connected sub-populations. Further, we introduce kernelised subgroup discovery, which generalises this task to entities of arbitrary structure and with the additional flexibility of only requiring a Gramian appropriate for the given entities. To use within kernelised subgroup discovery, but also in any other kind of kernel method, we additionally introduce a novel random walk graph kernel, which allows fine-tuning the alignment between the vertices of the two compared graphs during the counting of the random walks, and we propose meaningful structure-aware vertex labels to utilise this new capability. With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel-based method.
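
To make "exceptional behaviour" concrete for the simplest case of a binary target: a classical quality measure scored for a candidate subgroup $S$ is weighted relative accuracy (one standard choice; the thesis generalizes to far richer targets),

$$\mathrm{WRAcc}(S) \;=\; \frac{n_S}{n}\,\bigl(p_S - p_0\bigr),$$

where $n_S/n$ is the coverage of the subgroup, $p_S$ the share of the target inside the subgroup, and $p_0$ its share in the entire dataset.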

Josenildo COSTA DA SILVA
(Advisor: PD Dr. Matthias Klusch)

Privacy-Preserving Distributed Data Mining
Wednesday, 07.12.22, 14:00h
Video conference

Abstract:

This thesis is concerned with privacy-preserving distributed data mining algorithms. The main challenges in this setting are inference attacks and the formation of collusion groups. The inference problem is the reconstruction of sensitive data by attackers from non-sensitive sources, such as intermediate results, exchanged messages, or public information. Moreover, in a distributed scenario, malicious insiders can organize collusion groups to deploy more effective inference attacks. This thesis shows that existing privacy measures do not adequately protect privacy against inference and collusion. Therefore, in this thesis, new measures based on information theory are developed to overcome the identified limitations. Furthermore, a new distributed data clustering algorithm is presented. The clustering approach is based on a kernel density estimate approximation that generates a controlled amount of ambiguity in the density estimates and provides privacy for the original data. Besides, this thesis also introduces the first privacy-preserving algorithms for frequent pattern discovery in distributed time series. Time series are transformed into a set of n-dimensional data points, and finding frequent patterns reduces to finding local maxima in the n-dimensional density space. The proposed algorithms are linear in the size of the dataset and have low communication costs, as validated by experimental evaluation on different datasets.
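
The density estimates in question have the standard kernel form; approximating the individual kernel contributions is what introduces the controlled ambiguity mentioned above:

$$\hat f(x) \;=\; \frac{1}{n\,h^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

with kernel $K$, bandwidth $h$, and data points $x_1, \dots, x_n \in \mathbb{R}^d$; frequent patterns then correspond to local maxima of $\hat f$.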

Marcel STEINMETZ
(Advisor: Prof. Jörg Hoffmann)

Conflict-Driven Learning in AI Planning State-Space Search
Monday, 05.12.22, 17:00h
Building E1 1, Room 4.01

Abstract:

Many combinatorial computation problems in computer science can be cast as a reachability problem in an implicitly described, potentially huge, graph: the state space. State-space search is a versatile and widespread method to solve such reachability problems, but it requires some form of guidance to prevent exploring that combinatorial space exhaustively. Conflict-driven learning is an indispensable search ingredient for solving constraint satisfaction problems (most prominently, Boolean satisfiability). It guides search towards solutions by identifying conflicts during the search, i.e., search branches not leading to any solution, and learning from them knowledge to avoid similar conflicts in the remainder of the search.
In this thesis, we adapt the conflict-driven learning methodology to goal-reachability objectives in classical and probabilistic planning. The canonical form of conflicts in this context are dead-end states, i.e., states from which the desired goal property cannot be reached. We pioneer methods for learning sound and generalizable dead-end knowledge from conflicts encountered during forward state-space search. We develop search methods to identify conflicts in deterministic and probabilistic state spaces, and we develop suitable refinement methods for the different unsolvability detectors so as to recognize these and similar dead-end states. Arranged in a depth-first search, our techniques approach the elegance of conflict-driven learning in constraint satisfaction, featuring the ability to learn to refute search subtrees, and intelligent backjumping to the root cause of a conflict. On the side, we contribute and explore a large design space of probabilistic state-space search methods, establishing new search algorithms, admissible state-space reduction techniques, and goal-probability bounds suitable for heuristic state-space search.
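
Schematically, the adapted conflict-driven loop looks as follows; is_goal, successors, and the detector interface are hypothetical placeholders for the planning model and the unsolvability detectors of the thesis.

def search(state, detector, is_goal, successors, path=()):
    """Depth-first search that learns from dead-end conflicts (sketch)."""
    if is_goal(state):
        return path
    if detector.is_dead_end(state):       # learned knowledge prunes here
        return None
    for action, succ in successors(state):
        plan = search(succ, detector, is_goal, successors, path + (action,))
        if plan is not None:
            return plan
    # All successors are dead ends, so this state is a conflict as well.
    # Refining the detector lets the search recognize this state and
    # similar ones later, enabling pruning and backjumping.
    detector.refine(state)
    return None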

November

Jie HUANG
(Advisor: Prof. Michael Backes)

Mitigating Security and Privacy Threats from Untrusted Application Components on Android
Thursday, 28.11.22, 12:00h
The defense will take place via video conference. Please inquire about the link from the research group.

Abstract:

While Android’s data-intensive and open-source nature, combined with its less-than-strict market approval process, has allowed the installation of flawed and even malicious apps, its coarse-grained security model and update bottleneck in the app ecosystem make the platform’s privacy and security situation more worrying.
This dissertation introduces a line of works that mitigate privacy and security threats from untrusted app components. The first work presents a compiler-based library compartmentalization solution that utilizes privilege separation to establish a strong trustworthy boundary between the host app and untrusted library components, thus protecting sensitive user data from being compromised by curious or malicious ad libraries. For vulnerable third-party libraries, our second work implements an API-compatibility-based library update framework that uses drop-in replacements of outdated libraries to minimize the open vulnerability window caused by libraries, and we perform multiple dynamic tests and case studies to investigate its feasibility. Our latest work focuses on the misuse of powerful accessibility (a11y) features in untrusted apps. We present a privacy-enhanced a11y framework that treats the a11y logic as a pipeline composed of multiple modules running in different sandboxes. We further enforce flow control over the communication between modules, thus reducing the attack surface from abusing a11y APIs while preserving the a11y benefits.

Patrick FERBER
(Advisors: Profs. Jörg Hoffmann, UdS, & Malte Helmert, Uni Basel)

Machine Learning for Classical Planning: Neural Network Heuristics, Online Portfolios, and State Space Topologies
Thursday, 17.11.22, 17:00h
The defense will take place in Basel. Attendance via video conference is also possible.

Abstract:

Many real-world problems can be solved by state space search. We use machine learning to improve the state of the art in planning in three directions. Informed heuristics are a key component of most state space searches. In the first part, we train neural networks to become well-informed heuristics for a group of similar problems. There exists no single best planning algorithm, and our trained heuristics likewise exhibit complementary strengths. A priori it is unknown which algorithm to use on a new problem. Thus, we present new state-of-the-art techniques for algorithm selection for state space search. Even if we select the right algorithm, some problems are too difficult for all available algorithms. The search space of a problem together with a heuristic forms a topology. Exploiting this topology greatly improves search, but the topology can only be computed after the search has terminated. We present a new method to generalize the topology from a simple problem to similar, but harder problems.
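
A minimal sketch of the first direction, learning a heuristic for a family of similar problems; encode_state and the (state, cost-to-goal) samples are hypothetical placeholders, and the small off-the-shelf regressor stands in for the thesis’ network architectures.

from sklearn.neural_network import MLPRegressor

def train_heuristic(samples, encode_state):
    """Fit a neural heuristic on (state, cost-to-goal) pairs collected
    from small instances of a problem family (illustrative sketch)."""
    X = [encode_state(s) for s, _ in samples]
    y = [cost for _, cost in samples]
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(X, y)
    # The result is used as the heuristic h(s) in greedy best-first
    # search on harder, unseen instances of the same family.
    return lambda s: max(0.0, float(model.predict([encode_state(s)])[0]))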

Maryam GHAREGHANI
(Advisor: Prof. Tobias Marschall, now Düsseldorf)

Single-cell strand sequencing for structural variant analysis and genome assembly
Wednesday, 02.11.22, 15:00h
Building E2 1, room 0.01

Abstract:

Rapid advances of DNA sequencing technologies and the development of computational tools to analyze sequencing data have started a revolution in the field of genetics. DNA sequencing has applications in medical research, disease diagnosis and treatment, and population genetic studies. Different sequencing techniques have their own advantages and limitations, and they can be used together to address genome assembly and genetic variant detection. The focus of this thesis is on a specific single-cell sequencing technology, called strand sequencing. With its chromosome- and haplotype-specific strand information, this technique provides very powerful signals for the discovery of genomic structural variations, haplotype phasing, and chromosome clustering. We developed statistical and computational tools to exploit this information from the strand sequencing technology. I first present a computational framework for detecting structural variations in single cells using strand sequencing data. These variations and genomic rearrangements have been observed in cancer; therefore, the discovery of such events within cell populations can lead to a more accurate picture of cancer genomes and help in diagnosis. In the remainder of this defense, I elaborate on two computational pipelines for clustering long DNA sequences by their original chromosome and haplotype in the absence of a reference genome. These pipelines are developed to facilitate genome assembly and de novo haplotype phasing in a fast and accurate manner. The resulting haplotype assemblies can be useful in studying genomic variations with no reference bias, gaining insights in population genetics, and detection of compound heterozygosity.

October

André ZENNER
(Advisor: Prof. Antonio Krüger)

Advancing Proxy-Based Haptic Feedback in Virtual Reality
Friday, 28.10.22, 15:00h
Building D3 2, "Reuse" seminar room

Abstract:

This thesis advances haptic feedback for Virtual Reality (VR). Our work is guided by Sutherland’s 1965 vision of the ultimate display, which calls for VR systems to control the existence of matter. To push towards this vision, we build upon proxy-based haptic feedback, a technique characterized by the use of passive tangible props. The goal of this thesis is to tackle the central drawback of this approach, namely its inflexibility, which still hinders it from fulfilling the vision of the ultimate display. Guided by four research questions, we first showcase the applicability of proxy-based VR haptics by employing the technique for data exploration. We then extend the VR system’s control over users’ haptic impressions in three steps. First, we contribute the class of Dynamic Passive Haptic Feedback (DPHF) alongside two novel concepts for conveying kinesthetic properties, like virtual weight and shape, through weight-shifting and drag-changing proxies. Conceptually orthogonal to this, we study how visual-haptic illusions can be leveraged to unnoticeably redirect the user’s hand when reaching towards props. Here, we contribute a novel perception-inspired algorithm for Body Warping-based Hand Redirection (HR), an open-source framework for HR, and psychophysical insights. The thesis concludes by proving that the combination of DPHF and HR can outperform the individual techniques in terms of the achievable flexibility of the proxy-based haptic feedback.

Vinh Thinh HO
(Advisor: Prof. Gerhard Weikum)

Entities with Quantities: Extraction, Search, and Ranking
Friday, 18.10.22, 10:30h
Building E1 4, Room 0.24

Abstract:

Quantities are more than numeric values. They denote measures of the world’s entities such as heights of buildings, running times of athletes, energy efficiency of car models or energy production of power plants, all expressed in numbers with associated units. Entity-centric search and question answering (QA) are well supported by modern search engines. However, they do not work well when the queries involve quantity filters, such as searching for athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 billion. State-of-the-art systems fail to understand the quantities, including the condition (less than, above, etc.), the unit of interest (seconds, dollar, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). QA systems based on structured knowledge bases (KBs) also fail as quantities are poorly covered by state-of-the-art KBs. In this dissertation, we developed new methods to advance the state of the art on quantity knowledge extraction and search.

Patrick SPEICHER
(Advisor: Prof. Michael Backes)

Simulated Penetration Testing and Mitigation Analysis
Tuesday, 18.10.22, 11:00h
Building E9 1, Room 0.01

Abstract:

As corporate networks and Internet services are becoming increasingly more complex, it is hard to keep an overview over all deployed software, their potential vulnerabilities, and all existing security protocols. Simulated penetration testing was proposed to extend regular penetration testing by transferring gathered information about a network into a formal model and simulating an attacker in this model. Having a formal model of a network enables us to add a defender trying to mitigate the capabilities of the attacker with their own actions. We name this two-player planning task Stackelberg planning. The goal behind this is to help administrators, penetration testing consultants, and the management level to find weak spots in large computer infrastructures and to suggest cost-effective mitigations to lower the security risk.
In this thesis, we first lay the formal and algorithmic foundations for Stackelberg planning tasks. By building it in a classical planning framework, we can benefit from well-studied heuristics, pruning techniques, and other approaches to speed up the search, for example symbolic search. We then design a theory for privilege escalation and demonstrate the applicability of our framework to local computer networks and to Internet-wide scenarios by investigating the robustness of both the email infrastructure and the web.

Attila KINALI-DOGAN
(Advisor: Dr. Christoph Lenzen)

On Time, Time Synchronization and Noise in Time Measurement Systems
Wednesday, 12.10.22, 10:00h
Building E1 4, Room 0.24

Abstract:

Time plays an important role in our modern lives. Having accurate time means having clocks synchronized to other clocks with as little time difference between them as possible. We will look into the properties of noise in time keeping and measurement, introducing a variant of the Fourier Transform for power-limited signals. We will then continue to noise propagation in electronics and the degradation of signal-to-noise ratio due to non-linearities. Finally, we will look into the effect of metastability in distributed clock synchronization systems.

Ahmed SALEM
(Advisor: Prof. Michael Backes)

Adversarial Inference and Manipulation of Machine Learning Models
Tuesday, 04.10.22, 11:00h

Abstract:

Machine learning (ML) has established itself as a core component for various critical applications. However, with this increasing adoption rate of ML models, multiple attacks have emerged targeting different stages of the ML pipeline. Abstractly, the ML pipeline is divided into three phases: training, updating, and inference.
In this thesis, we evaluate the privacy, security, and accountability risks of the three stages of the ML pipeline. Firstly, we explore the inference phase, where the adversary can only access the target model after deployment. In this setting, we explore one of the most severe attacks against ML models, namely the membership inference attack (MIA). We relax all the MIA’s key assumptions, thereby showing that such attacks are broadly applicable at low cost and thereby pose a more severe risk than previously thought. Secondly, we study the updating phase. To that end, we propose a new attack surface against ML models, i.e., the change in the output of an ML model before and after being updated. We then introduce four attacks, including data reconstruction attacks, against this setting. Thirdly, we explore the training phase, where the adversary interferes with the target model’s training. In this setting, we propose the model hijacking attack, in which the adversary can hijack the target model to perform their own illegal task. Finally, we propose different defense mechanisms to mitigate such identified risks.
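
For orientation, the basic shadow-model recipe for membership inference (the assumptions the thesis relaxes include exactly this need for shadow models and for data from the target distribution):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_mia(shadow_model, member_data, nonmember_data):
    """Train an attack model mapping a model's output distribution to a
    membership decision (minimal sketch of the classical approach)."""
    X = np.vstack([shadow_model.predict_proba(member_data),
                   shadow_model.predict_proba(nonmember_data)])
    y = np.concatenate([np.ones(len(member_data)),
                        np.zeros(len(nonmember_data))])
    return RandomForestClassifier().fit(X, y)

# Attack with query access to the deployed target model:
#   attack.predict(target_model.predict_proba(x))  ->  member / non-member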

September

Jonas SCHNEIDER-BENSCH
(Advisor: Prof. Michael Backes)

New approaches to Privacy Preserving Signatures
Monday, 26.09.22, 16:30h

Abstract:

In this thesis we advance the theory and practice of privacy-preserving digital signatures. Privacy-preserving signatures such as group and ring signatures enable signers to hide in groups of potential signers. We design a cryptographic primitive called signatures with flexible public keys, which allows for modular construction of privacy-preserving signatures. Its core is an equivalence relation between verification keys, such that key representatives can be transformed within their class to obscure their origin. The resulting constructions are more efficient than the state of the art, under the same or weaker assumptions. We show an extension of the security model of fully dynamic group signatures, which are those where members may join and leave the group over time. Our contribution here, which is facilitated by the new primitive, is the treatment of membership status as potentially sensitive information. In the theory of ring signatures, we show a construction of ring signatures which is the first in the literature with logarithmic signature size in the size of the ring without any trusted setup or reliance on non-standard assumptions. We show how to extend our techniques to the derived setting of linkable ring signatures, where different signatures of the same origin may be publicly linked. Here, we further revisit the notion of linkable anonymity, offering a significant strengthening compared to previous definitions.

Michael BACKENKÖHLER
(Advisor: Prof. Verena Wolf)

Analysis of Markovian Population Models
Tuesday, 20.09.22, 14:00h
Building E1 7, room 001

Abstract:

Markovian population models are a powerful paradigm to describe processes of stochastically interacting agents. Their dynamics are given by continuous-time Markov chains over the population sizes. The resulting large state spaces make their analysis challenging.
In this thesis we develop methods for this problem class leveraging their structure. We derive linear moment constraints on the expected occupation measure and exit probabilities. In combination with semi-definite constraints on moment matrices, we obtain a convex program. This way, we are able to provide bounds on mean first-passage times and reaching probabilities. We further use these linear constraints as control variates to improve Monte Carlo estimation of different quantities. Two different algorithms for the construction of efficient variate sets are presented and evaluated.
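
The linear constraints stem from the infinitesimal generator $Q$ of the chain: for a suitable moment function $f$ of the population vector, Dynkin's formula yields

$$\frac{d}{dt}\,\mathbb{E}\bigl[f(X_t)\bigr] \;=\; \mathbb{E}\bigl[(Qf)(X_t)\bigr],$$

and integrating such identities up to a stopping time produces linear relations between moments of the expected occupation measure and exit probabilities (shown here in generic form; the thesis derives the precise constraint systems).
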
Another set of contributions is based on a state-space lumping scheme that aggregates states in a grid structure. Based on the probabilities of these approximations we iteratively refine relevant and truncate irrelevant parts of the state-space. This way, the algorithm learns a well-justified finite-state projection for different scenarios.

August

Jozef HLADKÝ
(Advisor: Prof. Hans-Peter Seidel)

Latency Hiding through High Fidelity Novel View Synthesis on Thin Clients using Decoupled Streaming Rendering from Powerful Servers
Tuesday, 16.08.2022, 15:00 h
Building E1 4, Room 0.24

Abstract:

Highly responsive 3D applications with state-of-the-art visual fidelity have always been associated with heavy immobile workstation hardware. By offloading demanding computations to powerful servers in the cloud, streaming 3D content from the data center to a thin client can deliver a high-fidelity, responsive experience that is indistinguishable from content computed locally on a powerful workstation. We introduce methods suitable for this scenario that enable network latency hiding.
In the first part, we introduce a novel high-dimensional space, the camera offset space, and show how it can help to identify an analytical potentially visible set (PVS) of geometry valid for a range of camera translational and rotational offsets. We demonstrate an efficient parallel implementation of the visibility resolution algorithm, which leads to a first-ever method for computing a PVS that is valid for an analytical range of camera offsets, is computable in real-time without the need for pre-processing or spatial data structure construction, and requires only a raw triangle stream as input.
In the second part of the thesis, we focus on capturing the scene appearance into structures that enable efficient encoding and decoding, transmission, low memory footprint, and high-fidelity high-framerate reconstruction on the client. Multiple strategies for shading sample distribution and texture atlas packing layouts are presented and analyzed for shading reconstruction quality, packing and compression efficiency.
The third part of the thesis presents a data structure that jointly encodes both appearance and geometry into a texture atlas.
The scene G-Buffer is processed to construct coarse low-resolution geometric proxies which capture the scene appearance and simple planar surfaces. These proxies can be locally augmented with high resolution data to capture complex geometry in sufficient detail, achieving efficient sample distribution and allocation. Capturing the scene from multiple views enables disocclusion support and allows network latency hiding on a thin client device.

July

Jonas FISCHER
(Advisor: Prof. Jilles Vreeken)

More than the sum of its parts – pattern mining, neural networks and how they complement each other
Thursday, 28.07.2022, 13:00 h
Building E1 4, Room 0.24

Abstract:

In this thesis we explore pattern mining and deep learning. Often seen as orthogonal, we show that these fields complement each other and propose to combine them to gain from each other’s strengths. We first show how to efficiently discover succinct and non-redundant sets of patterns that provide insight into data beyond conjunctive statements. We leverage the interpretability of such patterns to unveil how and which information flows through neural networks, as well as what characterizes their decisions. Conversely, we show how to combine continuous optimization with pattern discovery, proposing a neural network that directly encodes discrete patterns, which allows us to apply pattern mining at a scale orders of magnitude larger than previously possible. Large neural networks are, however, exceedingly expensive to train, for which ‘lottery tickets’ – small, well-trainable sub-networks in randomly initialized neural networks – offer a remedy. We identify theoretical limitations of strong tickets and overcome them by equipping these tickets with the property of universal approximation. To analyze whether limitations in ticket sparsity are algorithmic or fundamental, we propose a framework to plant and hide lottery tickets. With novel ticket benchmarks we then conclude that the limitation is likely algorithmic, encouraging further developments for which our framework offers means to measure progress.

David STUTZ
(Advisors: Prof. Bernt Schiele and Prof. Matthias Hein, now Tübingen)

Understanding and Improving Robustness and Uncertainty Estimation in Deep Learning
Friday, 22.07.2022, 14:00 h
Building E1 4, Room 0.24 (and via video conference: https://zoom.us/j/92313014265?pwd=UUxaVXcvVEZVaUhucEt5dUlGVXVBdz09)

Abstract:

Deep learning is becoming increasingly relevant for many high-stakes applications such as medical diagnosis or autonomous driving, where wrong decisions can have a massive impact on human lives. Unfortunately, deep neural networks are typically assessed solely based on generalization, e.g., accuracy on a fixed test set. However, this is clearly insufficient for safe deployment, as potential malicious actors, distribution shifts, and the effects of quantization and unreliable hardware are disregarded. In such settings, it is also important to obtain reasonable estimates of the model’s confidence alongside its predictions.
This thesis studies robustness and uncertainty estimation in deep learning along three main directions: First, we consider so-called adversarial examples, slightly perturbed inputs causing severe drops in accuracy. Second, we study weight perturbations, focusing particularly on bit errors in quantized weights. This is relevant for deploying models on special-purpose hardware for efficient inference, so-called accelerators. Finally, we address uncertainty estimation to improve robustness and provide meaningful statistical performance guarantees for safe deployment.
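
As a concrete instance of the first direction, the classical fast gradient sign method (a standard construction from the literature, recalled here only to make the threat model tangible) perturbs an input $x$ with label $y$ inside an $\ell_\infty$-ball of radius $\epsilon$:

$$x' \;=\; x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_x\, \mathcal{L}(f(x), y)\bigr).$$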

Lukas LANGE
(Advisor: Prof. Dietrich Klakow)

Robust Input Representations for Low-Resource Information Extraction
Friday, 22.07.2022, 10:00 h
Building B3 1, Auditorium 0.11

Abstract:

Recent advances in the field of natural language processing were achieved with deep learning models. This led to a wide range of new research questions concerning the stability of such large-scale systems and their applicability beyond well-studied tasks and datasets, such as information extraction in non-standard domains and languages, in particular, in low-resource environments.
In this work, we address these challenges and make important contributions across fields such as representation learning and transfer learning by proposing novel model architectures and training strategies to overcome existing limitations, including a lack of training resources, domain mismatches and language barriers.
In particular, we propose solutions to close the domain gap between representation models. We explore domain-adaptive pre-training of multilingual language models. Furthermore, we present a novel meta-embedding architecture that creates joint representations of multiple embedding methods.
Our broad set of experiments demonstrates state-of-the-art performance of our methods for various sequence tagging and classification tasks and highlights their robustness in challenging low-resource settings across languages and domains.

Adwait SHARMA
(Advisor: Prof. Jürgen Steimle)

Design and Recognition of Microgestures for Always-Available Input
Thursday, 21.07.2022, 15:00 h
The defense will take place via video conference: https://cs-uni-saarland-de.zoom.us/j/82822036987

Abstract:

Gestural user interfaces for computing devices most commonly require the user to have at least one hand free to interact with the device, for example, moving a mouse, touching a screen, or performing mid-air gestures. Consequently, users find it difficult to operate computing devices while holding or manipulating everyday objects. This limits the users from interacting with the digital world during a significant portion of their everyday activities, such as using tools in the kitchen or workshop, carrying items, or working out with sports equipment.
This thesis pushes the boundaries towards the bigger goal of enabling always-available input. Microgestures have been recognized for their potential to facilitate direct and subtle interactions. However, it remains an open question how to interact using gestures with computing devices when both of the user’s hands are occupied holding everyday objects. We take a holistic approach and focus on three core contributions: i) To understand end-users’ preferences, we present an empirical analysis of users’ choice of microgestures when holding objects of diverse geometries. Instead of designing a gesture set for a specific object or geometry, and in order to identify gestures that generalize, this thesis leverages the taxonomy of grasp types established in prior research. ii) We tackle the critical problem of avoiding false activation by introducing a novel gestural input concept that leverages a single-finger movement which stands out from everyday finger motions during holding and manipulating objects. Through a data-driven approach, we also systematically validate the concept’s robustness with different everyday actions. iii) While full sensor coverage on the user’s hand would allow detailed hand-object interaction, minimal instrumentation is desirable for real-world use. This thesis addresses the problem of identifying sparse sensor layouts. We present the first rapid computational method, along with a GUI-based design tool that enables iterative design based on the designer’s high-level requirements. Furthermore, we demonstrate that minimal form-factor devices, like smart rings, can be used to effectively detect microgestures in hands-free and busy scenarios.
Overall, the presented findings will serve as both conceptual and technical foundations for enabling interaction with computing devices wherever and whenever users need them.

June

Andrea HORNAKOVA
(Advisor: Dr. Paul Swoboda)

Lifted Edges as Connectivity Priors for Multicut and Disjoint Paths
Wednesday, 29.06.2022, 14:00 h
Building E1 5, Room 0.29

Abstract:

We study two graph decomposition problems and their representation by 0/1 labelings of edges. The first is multicut (MC), which represents decompositions of undirected graphs (clusterings of nodes into connected components). The second is disjoint paths (DP) in directed acyclic graphs, where the clusters correspond to node-disjoint paths. Our main interest is to study connectivity priors represented by so-called lifted edges in the two problems. We call the resulting problems lifted multicut (LMC) and lifted disjoint paths (LDP). Our study of lifted multicut concentrates on partial LMC represented by labelings of a subset of (lifted) edges. Given a partial labeling, some NP-hard problems arise.
The main focus of the talk is the LDP problem. We prove that this problem is NP-hard and propose an optimal integer linear programming (ILP) solver. The solver uses linear inequalities that produce a high-quality LP relaxation. LDP is a convenient model for multiple object tracking (MOT) because DP naturally leads to trajectories of objects, and lifted edges help to prevent id switches and re-identify persons. Our tracker using the optimal LDP solver was a leading tracker on three benchmarks of the MOT challenge MOT15/16/17, improving significantly over the state of the art at the time of its publication. In order to solve even larger instances of the challenging dataset MOT20, we introduce an approximate LDP solver based on Lagrange decomposition. The new tracker achieved performance comparable to or better than state-of-the-art methods on all four standard MOT benchmarks (at the time of publication).
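
For reference, the standard 0/1 formulation underlying multicut, where $x_e = 1$ means edge $e$ is cut (the lifted variants add analogous constraints for lifted edges):

$$\min_{x \in \{0,1\}^{E}} \sum_{e \in E} c_e\, x_e \qquad \text{s.t.} \quad x_e \;\le\; \sum_{e' \in C \setminus \{e\}} x_{e'} \quad \text{for every cycle } C \text{ and } e \in C.$$

The cycle inequalities guarantee that the cut edges are consistent with a decomposition of the graph into connected components.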

Michaela KLAUCK
(Advisor: Prof. Holger Hermanns)

On the Connection of Probabilistic Model Checking, Planning, and Learning for System Verification
Tuesday, 28.06.2022, 11:30 h
Building E1 7, Room 0.01

Abstract:

This thesis presents approaches using techniques from the model checking, planning, and learning community to make systems more reliable and perspicuous. First, two heuristic search and dynamic programming algorithms are adapted to be able to check extremal reachability probabilities, expected accumulated rewards, and their bounded versions, on general Markov decision processes (MDPs). Thereby, the problem space originally solvable by these algorithms is enlarged considerably. In a comprehensive case study it is shown that the implementation, called Modysh, is competitive with state-of-the-art model checkers and even outperforms them on very large state spaces. Second, Deep Statistical Model Checking (DSMC) is introduced, usable for quality assessment and learning pipeline analysis of systems incorporating trained decision-making agents, like neural networks (NNs). The idea of DSMC is to use statistical model checking to assess NNs resolving nondeterminism in systems modeled as MDPs. In a comprehensive scalability study it is demonstrated that DSMC is a lightweight technique tackling the complexity of NN analysis in combination with the state space explosion problem.
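
The statistical core of DSMC fits in a few lines; policy, step, and is_goal are hypothetical stand-ins for the trained agent and the MDP model.

def dsmc_estimate(initial_state, policy, step, is_goal,
                  runs=10_000, horizon=1_000):
    """Monte Carlo estimate of the goal-reachability probability of an
    MDP whose nondeterminism is resolved by a trained policy (sketch)."""
    hits = 0
    for _ in range(runs):
        state = initial_state
        for _ in range(horizon):
            if is_goal(state):
                hits += 1
                break
            action = policy(state)       # the NN resolves nondeterminism
            state = step(state, action)  # sample the probabilistic outcome
    return hits / runs  # standard SMC confidence intervals apply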

Gilles NIES
(Advisor: Prof. Holger Hermanns)

Mastering Satellite Operation: On Model-based and Data-driven Optimal Battery-Aware Scheduling
Wednesday, 22.06.2022, 17:00 h
Building E1 7, Room 0.01

Abstract:

Rechargeable batteries as a power source have become omnipresent. Especially the lithium-ion variant powers virtually every contemporary portable device; it constitutes the central energy source in the domains of e-mobility, autonomous drones and robots, most earth-orbiting spacecraft like satellites, and many more.
In this work, we take the perspective of the kinetic battery model, an intuitive analytic model that keeps track of a battery’s charge as it is strained with a sequence of loads over time. We provide the mathematical foundation of the battery model, extend it with capacity limits and realistic charging behavior, as well as uncertainty in its initial state and the load it is strained with. We derive efficient energy budget analysis algorithms in the form of discretization and analytical bisection schemes, discuss their efficiency, empirically analyze their performance, and identify their individual strengths and weaknesses. We show how our techniques can be used during design time and in-flight, in the context of the two nanosatellite missions GomX-1 and GomX-3.
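
In its common formulation (recalled here for orientation; notation may differ from the thesis), the kinetic battery model splits the charge into available charge $a(t)$ and bound charge $b(t)$, with a fraction $c$ available and a diffusion rate $p$ between the two wells; under load $\ell(t)$:

$$\dot a(t) \;=\; -\ell(t) + p\left(\frac{b(t)}{1-c} - \frac{a(t)}{c}\right), \qquad \dot b(t) \;=\; p\left(\frac{a(t)}{c} - \frac{b(t)}{1-c}\right).$$

The battery is considered depleted once the available charge $a(t)$ reaches zero, even if bound charge remains.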

Ben WIEDERHAKE
(Advisor: Dr. Christoph Lenzen)

Pulse Propagation, Graph Cover, and Packet Forwarding
Tuesday, 14.06.2022, 10:00 h
Building E1 4, Room 0.24

Abstract:

We study distributed systems, with a particular focus on graph problems and fault tolerance.
Fault tolerance in a microprocessor or even a System-on-Chip can be improved by using a fault-tolerant pulse propagation design. The existing design TRIX achieves this goal by being a distributed system consisting of very simple nodes. We show that even in the typical mode of operation without faults, TRIX performs significantly better than a regular wire or clock tree: statistical evaluation of our simulated experiments shows that we achieve a skew with standard deviation of O(log log H), where H is the height of the TRIX grid.
The distance-r generalization of classic graph problems can give us insights on how distance affects the hardness of a problem. For the distance-r dominating set problem, we present both an algorithmic upper bound and an unconditional lower bound for any graph class with certain high-girth and sparseness criteria. In particular, our algorithm achieves an O(r · f(r))-approximation in time O(r), where f is the expansion function, which correlates with density. For constant r, this implies a constant approximation factor, in constant time. We also show that no algorithm can achieve a (2r + 1 − δ)-approximation for any δ > 0 in time O(r), not even on the class of cycles of girth at least 5r. Furthermore, we extend the algorithm to related graph cover problems and even to a different execution model.
Furthermore, we investigate the problem of packet forwarding, which addresses the question of how and when best to forward packets in a distributed system. These packets are injected by an adversary. We build on the existing algorithm OED to handle more than a single destination. In particular, we show that buffers of size O(log n) are sufficient for this algorithm, in contrast to O(n) for the naive approach.

Maximilian SCHWENGER
(Advisor: Prof. Bernd Finkbeiner)

Statically Analyzed Stream Monitoring for Cyber-Physical Systems
Monday, 13.06.2022, 16:00 h
Building E9 1, Room 0.01

Abstract:

Cyber-physical systems are digital systems interacting with the physical world. Even though this induces an inherent complexity, they are responsible for safety-critical tasks like governing nuclear power plants or controlling autonomous vehicles. To preserve trust into the safety of such systems, this thesis presents a runtime verification approach designed to generate trustworthy monitors from a formal specification. These monitors are responsible for observing the cyber-physical system during runtime and ensuring its safety. As underlying language, I present the asynchronous real-time specification language RTLola. It contains primitives for arithmetic properties and grants precise control over the timing of the monitor. With this, it enables specifiers to express properties relevant to cyber-physical systems. The thesis further presents a static analysis that identifies inconsistencies in the specification and provides insights into the dynamic behavior of the monitor. As a result, the resource consumption of the monitor becomes predictable. The generation of the monitor produces either a hardware description synthesizable onto programmable hardware, or Rust code with verification annotation. These annotations allow for proving the correctness of the monitor with respect to the semantics of RTLola. Last, I present the construction of a conservative hybrid model of the underlying system using information extracted from the specification. This model enables further verification steps.

Anna GUIMARAES
(Advisor: Prof. Gerhard Weikum)

Data Science Methods for the Analysis of Controversial Social Media Discussions
Friday, 10.06.2022, 13:00 h

Abstract:

Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contended debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate speech and "trolling", which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development.
This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. We start by building a feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships. Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus.

May

Preethi LAHOTI
(Advisor: Prof. Gerhard Weikum)

Operationalizing Fairness for Responsible Machine Learning
Friday, 20.05.2022, 16:15 h

Abstract:

As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a disconnect between the ML fairness literature and the needs to operationalize fairness in practice. This dissertation brings ML fairness closer to real-world applications by developing new models and methods that address key challenges in operationalizing fairness in practice.
First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront, and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups, without requiring access to the protected attributes in the dataset.
Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited as they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level, and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low dimensional latent representation of the data with two goals: to encode the data as well as possible, while removing any information about protected attributes in the transformed representation.
Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over observed feature space can be brittle, and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce a novel notion of fairness graphs to capture nuanced expert knowledge on which individuals should be treated similarly with respect to the ML objective. We cast the problem of individual fairness into graph embedding, and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data.
Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, in spite of the best efforts in training a fair model, ML systems can be prone to a variety of unforeseen failure risks. We propose Risk Advisor, a model-agnostic meta-learner to predict potential failure risks and to give guidance on the sources of uncertainty inducing the risks, by leveraging information-theoretic notions of aleatoric and epistemic uncertainty.
Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.

Lassaad CHEIKHROUHOU
(Advisor: PD Dr. Werner Stephan)

Inductive Verification of Cryptographic Protocols Based on Message Algebras – Trace and Indistinguishability Properties
Thursday, 19.05.2022, 16:30 h
Building E1 4, room 0.24

Abstract:

Since 1981, numerous formal methods for the analysis of cryptographic protocols have been developed and successfully applied. Despite many improvements, the scope of inductive methods in particular has remained limited to the simple enc-dec scenario (decryption cancels encryption) and to standard properties (confidentiality and authentication).
In this work, we extend the scope of the tool-supported inductive method to protocols with algebraically specified cryptographic primitives and to indistinguishability properties such as resistance against offline testing.
An axiomatization of message structures, derived from a constructive model (term rewriting), provides the basis for the definition of recursive functions and for inductive reasoning (partial orders, case distinctions). A new proof technique for confidentiality properties uses recursive test functions that are provably correct with respect to an inductively defined attacker model. The formalization of indistinguishability properties via generic derivations and a central theorem enable a reduction to trace properties.
The general aspects of our techniques are discussed together with two fully worked-out real-world case studies, PACE and TC-AMP, which were developed for the German identity card. TC-AMP is certainly among the most complex algebraically specified protocols that have been formally verified. In particular, we are not aware of any approaches that handle comparable cases.

André NUSSER
(Advisor: Prof. Karl Bringmann)

Fine-Grained Complexity and Algorithm Engineering of Geometric Similarity Measures
Friday, 13.05.2022, 16:00 h

Abstract:

Point sets and sequences are fundamental geometric objects that arise in any application that considers movement data, geometric shapes, and many more. A crucial task on these objects is to measure their similarity.
Therefore, this thesis presents results on algorithms, complexity lower bounds, and algorithm engineering of the most important point set and sequence similarity measures like the Fréchet distance, the Fréchet distance under translation, and the Hausdorff distance under translation. As an extension to the mere computation of similarity, also the approximate near neighbor problem for the continuous Fréchet distance on time series is considered and matching upper and lower bounds are shown.
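
For intuition, the classical quadratic-time recurrence for the discrete Fréchet distance between two point sequences (the continuous and under-translation variants treated in the thesis require substantially more machinery):

from functools import lru_cache
from math import dist

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between point sequences P and Q."""
    @lru_cache(maxsize=None)
    def d(i, j):
        # Smallest possible maximal leash length for jointly walking
        # the prefixes P[..i] and Q[..j].
        best = float("inf")
        if i == 0 and j == 0:
            best = 0.0
        if i > 0:
            best = min(best, d(i - 1, j))
        if j > 0:
            best = min(best, d(i, j - 1))
        if i > 0 and j > 0:
            best = min(best, d(i - 1, j - 1))
        return max(best, dist(P[i], Q[j]))
    return d(len(P) - 1, len(Q) - 1)

# e.g. discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (2, 1)])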

Nico HERBIG
(Advisor: Prof. Antonio Krüger)

Multi-Modal Post-Editing of Machine Translation
Thursday, 12.05.2022, 10:00 h

Abstract:

As machine translation (MT) quality continues to improve, more and more translators switch from traditional translation from scratch to post-editing (PE) of MT output, which has been shown to save time and reduce errors. Instead of mainly generating text, translators are now asked to correct errors within otherwise helpful translation proposals, where repetitive MT errors make the process tiresome, while hard-to-spot errors make PE a cognitively demanding activity. Our contribution is three-fold: first, we explore whether interaction modalities other than mouse and keyboard could well support PE by creating and testing the MMPE translation environment. MMPE allows translators to cross out or hand-write text, drag and drop words for reordering, use spoken commands or hand gestures to manipulate text, or to combine any of these input modalities. Second, our interviews revealed that translators see value in automatically receiving additional translation support when a high cognitive load (CL) is detected during PE. We therefore developed a sensor framework using a wide range of physiological and behavioral data to estimate perceived CL and tested it in three studies, showing that multi-modal, eye, heart, and skin measures can be used to make translation environments cognition-aware. Third, we present two multi-encoder Transformer architectures for automatic post-editing (APE) and discuss how these can adapt MT output to a domain and thereby avoid correcting repetitive MT errors.

Maximilian ALTMEYER
(Advisor: Prof. Antonio Krüger)

Understanding Personal and Contextual Factors to Increase Motivation in Gamified Systems
Tuesday, 03.05.2022, 15:00 h

Abstract:

Gamification, the use of game elements in non-game contexts, has been shown to help people reach their goals, affect people's behavior, and enhance the users' experience within interactive systems. However, past research has shown that gamification is not always successful. In fact, literature reviews revealed that almost half of the interventions were only partially successful or even unsuccessful. Therefore, understanding the factors that influence psychological measures and behavioral outcomes of gamified systems is much needed. In this thesis, we contribute to this by considering the context in which gamified systems are applied and by understanding personal factors of users interacting with the system. Guided by Self-Determination Theory, a major theory on human motivation, we investigate gamification and its effects on motivation and behavior in behavior change contexts, provide insights on contextual factors, contribute knowledge on the effect of personal factors on both the perception and effectiveness of gamification elements, and lay out ways of utilizing this knowledge to implement personalized gamified systems. Our contribution is manifold: We show that gamification affects motivation through need satisfaction and by evoking positive affective experiences, ultimately leading to changes in people's behavior. Moreover, we show that age, the intention to change behavior, and Hexad user types play an important role in explaining interpersonal differences in the perception of gamification elements, and that tailoring gamified systems based on these personal factors has beneficial effects on both psychological and behavioral outcomes. Lastly, we show that Hexad user types can be partially predicted by smartphone data and interaction behavior in gamified systems, and that they can be assessed in a gameful way, allowing our findings to be used in gamification practice. Finally, we propose a conceptual framework for increasing motivation in gamified systems, which builds upon our findings and outlines the importance of considering both contextual and personal factors. Based on these contributions, this thesis advances the field of gamification by contributing knowledge to the open questions of how and why gamification works and which factors play a role in this regard.

April

Cuong Xuan CHU
(Advisor: Prof. Gerhard Weikum)

Knowledge Extraction from Fictional Texts
Monday, 25.04.2022, 10:00 h

Abstract:

Knowledge extraction from text is a key task in natural language processing, which involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, knowledge canonicalization and so on. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities with Wikipedia and mainstream news articles as sources. The constructed knowledge bases, therefore, lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of our human culture, spanning from literature to movies, TV series, comics and video games. With thousands of fictional universes having been created, knowledge from fictional domains is the subject of search-engine queries – by fans as well as cultural analysts. Unlike the real-world domain, knowledge extraction on specific domains like fiction and fantasy has to tackle several key challenges:
– Training data: Sources for fictional domains mostly come from books and fan-built content, which is sparse and noisy and contains difficult text structures, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction is also not available.
– Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions on entity-class, subclass and entity-entity relations that are often invalid for fictional domains. With different genres of fictional domains, another requirement is to transfer models across domains.
– Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that are able to deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues.
This dissertation addresses the above challenges, by developing new methodologies that advance the state of the art on knowledge extraction in fictional domains.
– The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision.
– The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall.
– The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities. By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.

Ivan PRYVALOV
(Advisor: Prof. Michael Backes)

Cryptography with Anonymity in Mind
Monday, 11.04.2022, 14:00 h

Abstract:

Advances in information technologies gave rise to powerful ubiquitous computing devices, and digital networks have enabled new ways of fast communication, which immediately found numerous applications and resulted in large amounts of data being transmitted. For decades, cryptographic schemes and privacy-preserving protocols have been studied and researched in order to offer end users privacy of their data and implement useful functionalities at the same time, often trading security properties for cryptographic assumptions and efficiency. In this plethora of cryptographic constructions, anonymity properties play a special role, as they are important in many real-life scenarios. However, many useful cryptographic primitives lack anonymity properties or imply prohibitive costs to achieve them.
In this thesis, we expand the territory of cryptographic primitives with anonymity in mind. First, we define Anonymous RAM, a generalization of single-user Oblivious RAM to multiple mutually mistrusting users, and present two constructions thereof with different trade-offs between assumptions and efficiency. Second, we define an encryption scheme that allows users to establish chains of ciphertexts anonymously and to verify their integrity. Furthermore, the aggregatable version of the scheme allows us to build a Parallel Anonymous RAM, which enhances Anonymous RAM by supporting concurrent users. Third, we show our technique for constructing efficient non-interactive zero-knowledge proofs for statements that consist of both algebraic and arithmetic components. Finally, we show our framework for constructing efficient single secret leader election protocols, which have recently been identified as an important component in proof-of-stake cryptocurrencies.

Marten OLTROGGE
(Advisor: Prof. Michael Backes)

TLS on Android – Evolution over the last decade
Thursday, 07.04.2022, 10:30 h

Abstract:

Smart devices and mobile platforms are omnipresent. Android OS has evolved to become the most dominant mobile operating system on the market, with billions of devices and a platform with millions of apps. Apps increasingly offer solutions to everyday problems and have become an indispensable part of people's daily life. As a consequence, mobile apps carry and handle more and more personal and privacy-sensitive data, which also involves communication with backend or third-party services. This makes their network traffic an attractive target for man-in-the-middle attacks (MitMAs). In my talk, I will present a line of research addressing security on Android with a main focus on the network security of Android apps. This work covers various approaches for improving network security on Android, investigates their efficacy, and puts the findings into context with the general evolution of network security from a larger perspective.

March

Nikolas HAVRIKOV
(Advisor: Prof. Andreas Zeller)

Grammar-Based Fuzzing Using Input Features
Friday, 04.03.2022, 10:00 h

Abstract:

In grammar-based fuzz testing, a formal grammar is used to produce test inputs that are syntactically valid in order to reach the business logic of a program under test.
In this setting, it is advantageous to ensure a high diversity of inputs to test more of the program’s behavior. How can we characterize features that make inputs diverse and associate them with the execution of particular parts of the program?
We present a measure of input coverage called k-path coverage, which makes it possible to efficiently express, assess, and achieve input diversity. We demonstrate how to systematically attain k-path coverage and evaluate how it correlates with code coverage and can thus be used as its predictor. By automatically inferring explicit associations between k-path features and the coverage of individual methods, we further show how to generate inputs that specifically target the execution of given code locations.
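
To make the notion concrete, here is a minimal sketch of how k-paths could be extracted from derivation trees and turned into a coverage score; this is an illustrative reading of the metric, not the thesis implementation, and the tree encoding (symbol, children) is assumed:

    def k_paths(tree, k):
        """Collect all length-k symbol sequences occurring along
        root-to-leaf paths of a derivation tree (symbol, children)."""
        paths = set()

        def walk(node, prefix):
            symbol, children = node
            prefix = (prefix + (symbol,))[-k:]  # sliding window of size k
            if len(prefix) == k:
                paths.add(prefix)
            for child in children:
                walk(child, prefix)

        walk(tree, ())
        return paths

    def k_path_coverage(trees, all_grammar_k_paths, k):
        """Fraction of the grammar's k-paths exercised by a set of inputs."""
        covered = set()
        for t in trees:
            covered |= k_paths(t, k)
        return len(covered & all_grammar_k_paths) / len(all_grammar_k_paths)

    # Example derivation tree: <expr> -> <term> -> <digit> -> "1"
    tree = ("<expr>", [("<term>", [("<digit>", [("1", [])])])])
    print(k_paths(tree, 2))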

Franziska LICHTBLAU
(Advisor: Prof. Anja Feldmann)

From the Edge to the Core: Towards Informed Vantage Point Selection for Internet Measurement Studies
Tuesday, 01.03.2022, 11:00 h

Abstract:

Since the early days of the Internet, measurement scientists have been trying to keep up with its fast-paced development. As the Internet grew organically over time and without built-in measurability, this process requires many workarounds and due diligence. As a result, every measurement study is only as good as the data it relies on. Moreover, data quality is relative to the research question – a data set suitable to analyze one problem may be insufficient for another. This is entirely expected, as the Internet is decentralized, i.e., there is no single observation point from which we can assess the complete state of the Internet. Because of that, every measurement study needs specifically selected vantage points which fit the research question. In this thesis, we present three different vantage points across the Internet topology – from the edge to the Internet core. We discuss their specific features, their suitability for different kinds of research questions, and how to work with the corresponding data. The data sets obtained at the presented vantage points allow us to conduct three different measurement studies and shed light on the following aspects: (a) the prevalence of IP source address spoofing at a large European Internet Exchange Point (IXP), (b) the propagation distance of BGP communities, an optional transitive BGP attribute used for traffic engineering, and (c) the impact of the global COVID-19 pandemic on Internet usage behavior at a large Internet Service Provider (ISP) and three IXPs.

February

Harini P. HARIHARAN
(Advisor: Prof. Thorsten Herfet)

Non-Disruptive Use of Light Fields in Image and Video Processing
Wednesday, 23.02.2022, 10:00 h

Abstract:

In the age of computational imaging, cameras capture not only an image but also data. This additional captured data can best be used for photo-realistic renderings, facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-to-reality. It has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats and the incompatibility of tools to process and handle the data, light fields are not exploited to their full potential. This dissertation approaches the integration of light field data into image and video processing. Towards this goal, it addresses the representation of light fields using advanced file formats designed for 2D image assemblies to facilitate asset re-usability and interoperability between applications and devices. The novel 5D light field acquisition and the on-going research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, large amounts of data are captured that are highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.

Kireeti BODDUNA
(Advisor: Prof. Joachim Weickert)

Structure-aware Image Denoising, Super-resolution and Enhancement Methods
Tuesday, 22.02.2022, 16:15 h

Abstract:

Denoising, super-resolution and structure enhancement are classical image processing applications. The motive behind their existence is to aid our visual analysis of raw digital images. Despite tremendous progress in these fields, certain difficult problems are still open to research. For example, denoising and super-resolution techniques which possess all of the following properties are very scarce: they must preserve critical structures like corners, be robust to the type of noise distribution, avoid undesirable artefacts, and also be fast. The area of structure enhancement also has an unresolved issue: very little effort has been put into designing models that can tackle anisotropic deformations in the image acquisition process. In this thesis, we design novel methods in the form of partial differential equations, patch-based approaches and variational models to overcome the aforementioned obstacles. In most cases, our methods outperform the existing approaches in both quality and speed, despite being applicable to a broader range of practical situations.

Anna TIGUNOVA
(Advisor: Prof. Gerhard Weikum)

Extracting personal information from conversations
Friday, 18.02.2022, 11:00 h

Abstract:

Personal knowledge is a versatile resource that is valuable for a wide range of downstream applications, such as chatbot assistants, recommendation models and personalized search. A Personal Knowledge Base, populated with personal facts, such as demographic information, interests and interpersonal relationships, is a unique endpoint for storing and querying personal knowledge, providing users with full control over their personal information. To relieve users of the extensive manual effort needed to build such a personal knowledge base, we can leverage automated extraction methods applied to the textual content of the users, such as dialogue transcripts or social media posts. Such conversational data is abundant but challenging to process, and it requires specialized methods for the extraction of personal facts.
In this dissertation we address the acquisition of personal knowledge from conversational data. We propose several novel deep learning models for inferring speakers' personal attributes, such as demographic facts, hobbies and interpersonal relationships. Experiments with various conversational texts, including Reddit discussions and movie scripts, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.

Aditya NITTALA
(Advisor: Prof. Jürgen Steimle)

From Wearable Towards Epidermal Computing: Soft Wearable Devices for Rich Interaction on the Skin
Friday, 11.02.2022, 17:30 h

Abstract:

Human skin provides a large, always available, and easy-to-access real estate for interaction. Recent advances in new materials, electronics, and human-computer interaction have led to the emergence of electronic devices that reside directly on the user's skin. These conformal devices, referred to as Epidermal Devices, have mechanical properties compatible with human skin: they are very thin, often thinner than human hair; they elastically deform when the body is moving, and stretch with the user's skin.
Firstly, this thesis provides a conceptual understanding of Epidermal Devices in the HCI literature. We compare and contrast them with other technical approaches that enable novel on-skin interactions. Then, through a multi-disciplinary analysis of Epidermal Devices, we identify the design goals and challenges that need to be addressed for advancing this emerging research area in HCI. Following this, our fundamental empirical research investigated how epidermal devices of different rigidity levels affect passive and active tactile perception. Generally, a correlation was found between the device rigidity and tactile sensitivity thresholds as well as roughness discrimination ability. Based on these findings, we derive design recommendations for realizing epidermal devices.
Secondly, this thesis contributes novel Epidermal Devices that enable rich on-body interaction. SkinMarks contributes to the fabrication and design of novel Epidermal Devices that are highly skin-conformal and enable touch, squeeze, and bend sensing with co-located visual output. These devices can be deployed on highly challenging body locations, enabling novel interaction techniques and expanding the design space of on-body interaction. Multi-Touch Skin enables high-resolution multi-touch input on the body. We present the first non-rectangular and high-resolution multi-touch sensor overlays for use on skin and introduce a design tool that generates such sensors in custom shapes and sizes. Empirical results from two technical evaluations confirm that the sensor achieves a high signal-to-noise ratio on the body under various grounding conditions and has a high spatial accuracy even when subjected to strong deformations.
Thirdly, since Epidermal Devices are in contact with the skin, they offer opportunities for sensing rich physiological signals from the body. To leverage this unique property, this thesis presents rapid fabrication and computational design techniques for realizing Multi-Modal Epidermal Devices that can measure multiple physiological signals from the human body. Devices fabricated through these techniques can measure ECG (Electrocardiogram), EMG (Electromyogram), and EDA (Electro-Dermal Activity). We also contribute a computational design and optimization method based on underlying human anatomical models to create optimized device designs that provide an optimal trade-off between physiological signal acquisition capability and device size. The graphical tool allows designers to easily specify design preferences and to visually analyze the generated designs in real time, enabling designer-in-the-loop optimization. Experimental results show high quantitative agreement between the predictions of the optimizer and experimentally collected physiological data.
Finally, taking a multi-disciplinary perspective, we outline the roadmap for future research in this area by highlighting the next important steps, opportunities, and challenges. Taken together, this thesis contributes towards a holistic understanding of Epidermal Devices: it provides an empirical and conceptual understanding as well as technical insights through contributions in DIY (Do-It-Yourself), rapid fabrication, and computational design techniques.

Maximilian FICKERT
(Advisor: Prof. Jörg Hoffmann)

Adaptive Search Techniques in AI Planning and Heuristic Search
Friday, 11.02.2022, 16:00 h

Abstract:

Heuristic state-space search is a common approach to solve problems appearing in artificial intelligence and other subfields of computer science. This is typically viewed as a static process: the heuristic function guiding the search is assumed to be unchanged, and its resulting values are directly used for guidance without applying any further reasoning to them. Yet critical aspects of the task may only be discovered during the search, e.g., regions of the state space where the heuristic does not yield reliable values.
Our work here aims to make this process more dynamic, allowing the search to adapt to such observations, e.g., by detecting weaknesses in the heuristic function and refining it accordingly during the search. We also consider settings that inherently require adaptation: In online replanning, a plan that is being executed must be amended for changes in the environment. Similarly, in real-time search, an agent must act under strict time constraints with limited information.
Our results show that the flexibility of these algorithms makes them more robust than traditional approaches on a wide range of benchmarks, and they often yield substantial improvements over current state-of-the-art planners.

January

Simon MOLL
(Advisor: Prof. Sebastian Hack)

Vectorization System for Unstructured Codes with a Data-parallel Compiler IR
Tuesday, 25.01.2022, 10:00 h

Abstract:

Data-parallel programming is an established programming model for Single Instruction Multiple Data (SIMD) CPUs, which underlies loop vectorization, function vectorization and explicitly data-parallel languages. SIMD programs can only follow one path of execution, which makes the vectorization of data-parallel code with control flow challenging. For the issue of control flow, state-of-the-art vectorization systems either only operate on structured code or give no guarantees on unstructured code. This thesis specifically addresses the reliable vectorization of unstructured data-parallel code, with contributions to its representation, analysis and transformation. The novel P-LLVM intermediate representation provides a well-defined semantics to the data-parallel region at every stage of the vectorization pipeline.
Partial control-flow linearization is a new algorithm to remove divergent control flow; it runs in linear time and gives guarantees on the control flow retained. We present a new divergence analysis that considers all divergence effects of P-LLVM programs. Our new approach to identifying points of joining divergent control flow is optimal on directed acyclic graphs. We extend it further to a quadratic-time algorithm for arbitrary reducible CFGs. As we show, earlier approaches are either less precise or incorrect.
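
To give a flavor of why divergent control flow must be linearized for SIMD execution, the following generic if-conversion sketch (an illustration, not the partial linearization algorithm itself) replaces a per-element branch by a predicated select, so all lanes follow a single path:

    import numpy as np

    # Scalar, divergent control flow: each element takes its own branch.
    def scalar_branchy(x):
        return [xi * 2 if xi > 0 else -xi for xi in x]

    # Linearized SIMD form: both sides are computed for all lanes and
    # blended with a select, so there is only one path of execution.
    def simd_linearized(x):
        mask = x > 0                      # per-lane predicate
        return np.where(mask, x * 2, -x)  # select replaces the branch

    x = np.array([-3, 1, 4, -1])
    assert scalar_branchy(x) == list(simd_linearized(x))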
Finally, we generalize single-dimensional vectorization of outer loops to multi-dimensional tensorization of loop nests.  SIMD targets benefit from tensorization through more opportunities for re-use of loaded values and more efficient memory access behavior.
The techniques were implemented in the Region Vectorizer (RV) for vectorization and TensorRV for loop-nest tensorization.  Our evaluation validates that the general-purpose RV vectorization system matches the performance of more specialized approaches.  RV performs on par with the ISPC compiler, which only supports its structured domain-specific language, on a range of tree traversal codes with complex control flow.  RV is able to outperform the loop vectorizers of state-of-the-art compilers, as we show for the SPEC2017 nab_s benchmark and the XSBench proxy application.

2021

December

Johannes KRUPP
(Advisor: Prof. Christian Rossow)

Using Honeypots to Trace Back Amplification DDoS Attacks
Tuesday, 21.12.2021, 09:00 h

Abstract:

This thesis presents a line of work that enables practical DDoS attack traceback supported by honeypot reflectors. To this end, we investigate the tradeoffs between applicability, required a priori knowledge, and traceback granularity in three settings. First, we show how spoofed attack packets and non-spoofed scan packets can be linked using honeypot-induced fingerprints, which allows attributing attacks launched from the same infrastructures as scans. Second, we present a classifier-based approach to trace back attacks launched from booter services after collecting ground-truth data through self-attacks. Third, we propose to use BGP poisoning to locate the attacking network without prior knowledge and even when attack and scan infrastructures are disjoint. Finally, as all of our approaches rely on honeypot reflectors, we introduce an automated end-to-end pipeline to systematically find amplification vulnerabilities and synthesize corresponding honeypots.

Tobias LANGE
(Advisor: Prof. Thorsten Herfet)

Provision of Multi-Camera Footage for Professional Production and Consumer Environments
Thursday, 16.12.2021, 13:00 h

Abstract:

Multi-camera footage contains much more data than conventional video. While the additional data enables a number of new effects that previously required a large amount of CGI magic and manual labor to achieve, it can easily overpower consumer hardware and networks. We explore the necessary steps to create an interactive multiview streaming system, from the cameras via the compression and streaming of the material to the view interpolation that creates immersive perspective shifts for viewers. By using only freely available consumer hardware and making sure that all steps can run in real time in combination with the others, the benefits of multi-camera video are made available to a wider public.
With the construction of a modular camera array for lightfield recording, we highlight the most important properties of such an array to allow for good post-processing of the recorded data. This includes a flexible yet sturdy frame, the management of computational and storage resources, as well as the required steps to make the raw material ready for further processing. With the Unfolding scene we display the possibilities of lightfields when a good camera array is combined with the talent of professional visual effect artists for the creation of future cinematic movies. Furthermore, we explore the benefits of 5D-lightfield video for scenes with fast motion, using precisely controlled time delays between the shutters of different cameras in the capturing array.

Mohamed OMRAN
(Advisor: Prof. Bernt Schiele)

From Pixels to People: Recovering Location, Shape and Pose of Humans in Images
Wednesday, 15.12.2021, 12:00 h
Campus E1 4, room 24

Abstract:

Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. This includes bounding-box level pedestrian detection, different pixel-level recognition tasks, and 3D human shape and pose estimation. We conclude with a discussion of benchmarking practices in computer vision.

Christopher HAHN
(Advisor: Prof. Bernd Finkbeiner)

Logical and Deep Learning Methods for Temporal Reasoning
Monday, 13.12.2021, 17:15 h

Abstract:

In this thesis, we study logical and deep learning methods for the temporal reasoning of reactive systems.
In Part I, we determine decidability borders for the satisfiability and realizability problem of temporal hyperproperties. Temporal hyperproperties relate multiple computation traces to each other and are expressed in a temporal hyperlogic. In particular, we identify decidable fragments of the highly expressive hyperlogics HyperQPTL and HyperCTL*. As an application, we elaborate on an enforcement mechanism for temporal hyperproperties. We study explicit enforcement algorithms for specifications given as formulas in universally quantified HyperLTL.
In Part II, we train a (deep) neural network on the trace generation and realizability problem of linear-time temporal logic (LTL). We consider a method to generate large amounts of additional training data from practical specification patterns. The training data is generated with classical solvers, which provide one of many possible solutions to each formula. We demonstrate that it is sufficient to train on those particular solutions such that the neural network generalizes to the semantics of the logic. The neural network can predict solutions even for formulas from benchmarks from the literature on which the classical solver timed out.
Additionally, we show that it solves a significant portion of problems from the annual synthesis competition (SYNTCOMP) and even out-of-distribution examples from a recent case study.
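
For readers unfamiliar with the setting, the sketch below evaluates LTL-style formulas on finite traces; it is only an illustration with simplified finite-trace semantics (real LTL is interpreted over infinite traces), and the tuple encoding of formulas is an assumption of the example:

    # Formulas: ("ap", name), ("not", f), ("and", f, g), ("X", f),
    # ("F", f), ("G", f), ("U", f, g); a trace is a list of sets of
    # atomic propositions, one set per time step.
    def holds(f, trace, i=0):
        """Evaluate formula f on a finite trace at position i."""
        op = f[0]
        if op == "ap":
            return i < len(trace) and f[1] in trace[i]
        if op == "not":
            return not holds(f[1], trace, i)
        if op == "and":
            return holds(f[1], trace, i) and holds(f[2], trace, i)
        if op == "X":                      # next
            return holds(f[1], trace, i + 1)
        if op == "F":                      # eventually
            return any(holds(f[1], trace, j) for j in range(i, len(trace)))
        if op == "G":                      # always
            return all(holds(f[1], trace, j) for j in range(i, len(trace)))
        if op == "U":                      # f until g
            return any(holds(f[2], trace, j) and
                       all(holds(f[1], trace, k) for k in range(i, j))
                       for j in range(i, len(trace)))
        raise ValueError(op)

    # (a U b): a holds until b eventually holds
    print(holds(("U", ("ap", "a"), ("ap", "b")), [{"a"}, {"a"}, {"b"}]))  # True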

Jesko HECKING-HARBUSCH
(Advisor: Prof. Bernd Finkbeiner)

Synthesis of Asynchronous Distributed Systems from Global Specifications
Friday, 10.12.2021, 15:15 h

Abstract:

The synthesis problem asks whether there exists an implementation for a given formal specification and derives such an implementation if it exists. This approach enables engineers to think on a more abstract level about what a system should achieve instead of how it should accomplish its goal. The synthesis problem is often represented by a game between system players and environment players. Petri games define the synthesis problem for asynchronous distributed systems with causal memory. So far, decidability results for Petri games are mainly obtained for local winning conditions, which is limiting as global properties like mutual exclusion cannot be expressed.
In this thesis, we make two contributions. First, we present decidability and undecidability results for Petri games with global winning conditions. The global safety winning condition of bad markings defines markings that the players have to avoid. We prove that the existence of a winning strategy for the system players in Petri games with a bounded number of system players, at most one environment player, and bad markings is decidable. The global liveness winning condition of good markings defines markings that the players have to reach. We prove that the existence of a winning strategy for the system players in Petri games with at least two system players, at least three environment players, and good markings is undecidable.
Second, we present semi-decision procedures to find winning strategies for the system players in Petri games with global winning conditions and without restrictions on the distribution of players. The distributed nature of Petri games is employed by proposing encodings with true concurrency. We implement the semi-decision procedures in a corresponding tool.

Azin GHAZIMATIN
(Advisors: Prof. Gerhard Weikum and Dr. Rishiraj Saha Roy)

Enhancing Explainability and Scrutability of Recommender Systems
Thursday, 09.12.2021, 11:00 h

Abstract:

Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm’s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from the information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Besides, in the event of receiving undesirable content, explanations could possibly contain valuable information as to how the system’s behavior can be modified accordingly. In this thesis, we will present our contributions towards explainability and scrutability of recommender systems:
– We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms.
– We propose a method, PRINCE, to facilitate provider-side explainability through generating action-based, counterfactual and concise explanations.
– We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations.

Inken HAGESTEDT
(Advisor: Prof. Michael Backes)

Health Privacy: Methods for privacy-preserving data sharing of methylation, micro-biome and eye tracking data
Wednesday, 08.12.2021, 15:00 h

Abstract:

This thesis studies the privacy risks of biomedical data and develops mechanisms for privacy-preserving data sharing. The contribution of this work is two-fold: First, we demonstrate privacy risks of a variety of biomedical data types such as DNA methylation data, microbiome data and eye tracking data. Although these data types are less stable than well-studied genome data and more prone to environmental changes, well-known privacy attacks can be adapted to them and threaten the privacy of data donors. Nevertheless, data sharing is crucial to advance biomedical research, given that collecting the data of a sufficiently large population is complex and costly. Second, we therefore equip researchers with tools that enable privacy-preserving sharing of such biomedical data. These tools are mostly based on differential privacy, machine learning techniques and adversarial examples, and they are carefully tuned to the concrete use case to maintain data utility while preserving privacy.
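
Since the sharing tools largely build on differential privacy, a minimal sketch of its most basic building block, the Laplace mechanism, may be helpful; this is a generic textbook example, not code from the thesis:

    import numpy as np

    def private_count(true_count, epsilon, sensitivity=1.0):
        """Laplace mechanism: adding Laplace(0, sensitivity/epsilon) noise
        to a query answer yields epsilon-differential privacy."""
        return true_count + np.random.laplace(0.0, sensitivity / epsilon)

    # A counting query changes by at most 1 when one donor's record is
    # added or removed, so sensitivity 1.0 suffices.
    print(private_count(42, epsilon=0.5))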

Philip WELLNITZ
(Advisors: Profs. Kurt Mehlhorn and Karl Bringmann)

Counting Patterns in Strings and Graphs
Thursday, 02.12.2021, 15:00 h

Abstract:

We study problems related to finding and counting patterns in strings and graphs.
In the string-regime, we are interested in counting how many substrings of a text 𝑇 are at Hamming (or edit) distance at most 𝑘 to a pattern 𝑃. Among others, we are interested in the fully-compressed setting, where both 𝑇 and 𝑃 are given in a compressed representation. For both distance measures, we give the first algorithm that runs in (almost) linear time in the size of the compressed representations. We obtain the algorithms by new and tight structural insights into the solution structure of the problems.
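
For intuition, a brute-force counter for the Hamming variant looks as follows (an illustrative baseline only; the thesis targets the compressed setting with near-linear-time algorithms):

    def count_k_mismatch_occurrences(T, P, k):
        """Count length-|P| substrings of T within Hamming distance k of P
        by naive O(|T|*|P|) comparison."""
        m = len(P)
        count = 0
        for i in range(len(T) - m + 1):
            mismatches = sum(a != b for a, b in zip(T[i:i + m], P))
            if mismatches <= k:
                count += 1
        return count

    print(count_k_mismatch_occurrences("abracadabra", "abra", 1))  # 2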
In the graph-regime, we study problems related to counting homomorphisms between graphs. In particular, we study the parameterized complexity of the problem #IndSub(𝛷), where we are to count all 𝑘-vertex induced subgraphs of a graph that satisfy the property 𝛷. Based on a theory of Lovász and of Curticapean et al., we express #IndSub(𝛷) as a linear combination of graph homomorphism numbers to obtain #W[1]-hardness and almost tight conditional lower bounds for properties 𝛷 that are monotone or that depend only on the number of edges of a graph. Thereby, we prove a conjecture by Jerrum and Meeks.
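
Schematically, the underlying identity has the form

    #IndSub(𝛷, 𝑘; G) = Σ_H α_H · #Hom(H → G),

where the sum ranges over finitely many graphs H and the coefficients α_H depend only on 𝛷 and 𝑘; hardness results then follow by analyzing which coefficients are non-zero. (The notation here is schematic, conveying the spirit of the framework rather than the thesis's exact formulation.)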
In addition, we investigate the parameterized complexity of the problem #Hom(ℋ → 𝒢) for graph classes ℋ and 𝒢. In particular, we show that for any problem 𝑃 in the class #W[1], there are classes ℋ_𝑃 and 𝒢_𝑃 such that 𝑃 is equivalent to #Hom(ℋ_𝑃 → 𝒢_𝑃).

November

Pedro MERCADO-LOPEZ
(Advisor: Prof. Matthias Hein, now Tübingen)

Beyond the Arithmetic Mean: Extensions of Spectral Clustering and Semi-supervised Learning for Signed and Multilayer Graphs via Matrix Power Means
Tuesday, 30.11.2021, 12:00 h

Abstract:

In this thesis we present extensions of spectral clustering and semi-supervised learning to signed and multilayer graphs. These extensions are based on a one-parameter family of matrix functions called Matrix Power Means. In the scalar case, this family has the arithmetic, geometric and harmonic means as particular cases.
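
As a minimal numerical sketch of the central object (an illustration assuming symmetric positive definite inputs such as shifted Laplacians; the thesis develops matrix-free schemes instead of this dense computation):

    import numpy as np
    from scipy.linalg import fractional_matrix_power

    def matrix_power_mean(matrices, p):
        """Matrix power mean M_p = ((A_1^p + ... + A_T^p) / T)^(1/p).
        Assumes symmetric positive definite inputs; in the scalar case
        p = 1, 0, -1 recover the arithmetic, geometric and harmonic means."""
        T = len(matrices)
        S = sum(fractional_matrix_power(A, p) for A in matrices) / T
        return fractional_matrix_power(S, 1.0 / p)

    # Two toy SPD "layer" matrices
    A = np.array([[2.0, 0.0], [0.0, 1.0]])
    B = np.array([[1.0, 0.0], [0.0, 3.0]])
    print(matrix_power_mean([A, B], p=-1))  # harmonic mean of A and B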
We study the effectiveness of this family of matrix functions through suitable versions of the stochastic block model for signed and multilayer graphs. We provide provable properties in expectation and further identify regimes where the state of the art fails whereas our approach provably performs well. Some of the settings that we analyze are as follows: first, the case where each layer presents a reliable approximation to the overall clustering; second, the case where one single layer has information about the clusters whereas the remaining layers are potentially just noise; third, the case where each layer has only partial information but all layers together convey global information about the underlying clustering structure.
We present extensive numerical verifications of all our results and provide matrix-free numerical schemes. With these numerical schemes we are able to show that our proposed approach based on matrix power means is scalable to large sparse signed and multilayer graphs.
Finally, we evaluate our methods on real-world datasets. For instance, we show that our approach consistently identifies clustering structure in a real signed network where previous approaches failed. This further verifies that our methods are competitive with the state of the art.

Daniel GNAD
(Advisor: Prof. Jörg Hoffmann)

Star-Topology Decoupled State-Space Search in AI Planning and Model Checking
Monday, 22.11.2021, 16:00 h
Building E1 1, Room 4.07

Abstract:

State-space search is a widely employed concept in many areas of computer science.
The well-known state explosion problem, however, imposes a severe limitation to the effective implementation of search in state spaces that are exponential in the size of a compact system description. Decoupled state-space search is a novel approach to tackle the state explosion. It decomposes the system such that the dependencies between components take the form of a star topology with a center and several leaf components. Decoupled search exploits that the leaves in that topology are conditionally independent. We introduce decoupled search in the context of AI planning and formal verification. Building on common formalisms, we develop the concept of the decoupled state space and prove its correctness.
For several existing techniques, most prominently partial-order reduction, symmetry breaking, Petri-net unfolding, and symbolic state representations, we prove that decoupled search can be exponentially more efficient, confirming that it is indeed a novel concept that exploits model properties in a unique way. Empirically, decoupled search favorably compares to state-of-the-art AI planners, and outperforms well-established model checking tools.

Yannick FORSTER
(Advisor: Prof. Gert Smolka)

Computability in Constructive Type Theory
– new date – Thursday, 25.11.21, 15:15h
Building E1 3, HS 002

Abstract:

We give a formalised and machine-checked account of computability theory in the Calculus of Inductive Constructions (CIC), the constructive type theory underlying the Coq proof assistant.
We first develop synthetic computability theory, pioneered by Richman, Bridges, and Bauer, where one treats all functions as computable, eliminating the need for a model of computation. We assume a novel parametric axiom for synthetic computability and give proofs of results like Rice's theorem, the Myhill isomorphism theorem, and the existence of Post's simple and hypersimple predicates, relying on no other axioms such as Markov's principle or choice axioms.
As a second step, we introduce models of computation. We give a concise overview of definitions of various standard models and contribute machine-checked simulation proofs, posing a non-trivial engineering effort.
We identify a notion of synthetic undecidability relative to a fixed halting problem, allowing axiom-free machine-checked proofs of undecidability. We contribute such undecidability proofs for the historical foundational problems of computability theory which require the identification of invariants left out in the literature and now form the basis of the Coq Library of Undecidability Proofs.
We then identify the weak call-by-value λ-calculus L as a sweet spot for programming in a model of computation. We introduce a certifying extraction framework and analyse an axiom stating that every function of type N → N is L-computable.

Xiaoyu SHEN
(Advisors: Profs. Dietrich Klakow and Gerhard Weikum)

Deep Latent-Variable Models for Neural Text Generation
Friday, 05.11.2021, 10:00 h

Abstract:

Text generation aims to produce human-like natural language output for downstream tasks. It covers a wide range of applications like machine translation, document summarization, dialogue generation and so on. Recent deep neural network-based end-to-end architectures are known to be data-hungry, and text generated from them usually suffers from low diversity, interpretability and controllability. As a result, it is difficult to trust their output in real-life applications. Deep latent-variable models, by specifying the probabilistic distribution over an intermediate latent process, provide a potential way of addressing these problems while maintaining the expressive power of deep neural networks. This presentation will explain how deep latent-variable models can improve over the standard encoder-decoder model for text generation. We will start with an introduction to encoder-decoder and deep latent-variable models, then go over popular optimization strategies, and finally elaborate on how latent-variable models can help improve diversity, interpretability and data efficiency in different text generation tasks.
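
As background for the optimization strategies mentioned above, deep latent-variable models are commonly trained by maximizing the evidence lower bound, ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)). The sketch below (generic, assuming a diagonal Gaussian posterior and a standard normal prior, not code from the thesis) computes the KL term in closed form:

    import numpy as np

    def gaussian_kl(mu, log_var):
        """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the regularizer
        in the ELBO for a diagonal Gaussian posterior."""
        return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

    # Encoder output for one example with a 2-dimensional latent
    mu = np.array([0.3, -0.1])
    log_var = np.array([-0.5, 0.2])
    print(gaussian_kl(mu, log_var))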

October

Marc HABERMANN
(Advisor: Prof. Christian Theobalt)

Real-Time Human Performance Capture and Synthesis
Friday, 29.10.2021, 15:00 h

Abstract:

Most of the images one finds in the media, such as on the Internet or in textbooks and magazines, contain humans as the main point of attention. Thus, there is an inherent necessity for industry, society, and private persons to be able to thoroughly analyze and synthesize the human-related content in these images. One aspect of this analysis and subject of this thesis is to infer the 3D pose and surface deformation, using only visual information, which is also known as human performance capture.
This thesis proposes two monocular human performance capture methods, which for the first time allow the real-time capture of the dense deforming geometry as well as previously unseen 3D accuracy for pose and surface deformations. At the technical core, this work introduces novel GPU-based and data-parallel optimization strategies in conjunction with other algorithmic design choices that are all geared towards real-time performance at high accuracy. Moreover, this thesis presents a new weakly supervised multi-view training strategy combined with a fully differentiable character representation that shows superior 3D accuracy.
However, there is more to human-related Computer Vision than only the analysis of people in images. It is equally important to synthesize new images of humans in unseen poses and also from camera viewpoints that have not been observed in the real world. To this end, this thesis presents a method and ongoing work on character synthesis, which allow the synthesis of controllable photoreal characters that achieve motion- and view-dependent appearance effects as well as 3D consistency and which run in real time. This is technically achieved by a novel coarse-to-fine geometric character representation for efficient synthesis, which can be solely supervised on multi-view imagery.

September

Apratim BHATTACHARYYA
(Advisors: Profs. Bernt Schiele and Mario Fritz)

Long-term Future Prediction under Uncertainty and Multi-modality
Friday, 24.09.2021, 16:00 h

Abstract:

Humans have an innate ability to excel at activities that involve prediction of complex object dynamics such as predicting the possible trajectory of a billiard ball after it has been hit by the player or the prediction of motion of pedestrians while on the road. A key feature that enables humans to perform such tasks is anticipation. To advance the field of autonomous systems, particularly, self-driving agents, in this thesis, we focus on the task of future prediction in diverse real world settings, ranging from deterministic scenarios such as prediction of paths of balls on a billiard table to predicting the future of non-deterministic street scenes. Specifically, we identify certain core challenges for long-term future prediction: long-term prediction, uncertainty, multi-modality, and exact inference. To address these challenges, this thesis makes the following core contributions. Firstly, for accurate long-term predictions, we develop approaches that effectively utilize available observed information in the form of image boundaries in videos or interactions in street scenes. Secondly, as uncertainty increases into the future in case of non-deterministic scenarios, we leverage Bayesian inference frameworks to capture calibrated distributions of likely future events. Finally, to further improve performance in highly-multimodal non-deterministic scenarios such as street scenes, we develop deep generative models based on conditional variational autoencoders as well as normalizing flow based exact inference methods. Furthermore, we introduce a novel dataset with dense pedestrian-vehicle interactions to further aid the development of anticipation methods for autonomous driving applications in urban environments.

Kelvin CHELLI

High Mobility in OFDM based Wireless Communication Systems
(Advisor: Prof. Thorsten Herfet)
Wednesday, 15.09.2021, 13:30 h

Abstract:

Orthogonal Frequency Division Multiplexing (OFDM) has been adopted as the transmission scheme in most of the wireless systems we use on a daily basis. It brings with it several inherent advantages that make it an ideal waveform candidate in the physical layer. However, OFDM based wireless systems are severely affected in High Mobility scenarios. In this thesis, we investigate the effects of mobility on OFDM based wireless systems and develop novel techniques to estimate the channel and compensate its effects at the receiver. Compressed Sensing (CS) based channel estimation techniques like the Rake Matching Pursuit (RMP) and the Gradient Rake Matching Pursuit (GRMP) are developed to estimate the channel in a precise, robust and computationally efficient manner. In addition to this, a Cognitive Framework that can detect the mobility in the channel and configure an optimal estimation scheme is also developed and tested. The Cognitive Framework ensures a computationally optimal channel estimation scheme in all channel conditions. We also demonstrate that the proposed schemes can be adapted to other wireless standards easily. Accordingly, evaluation is done for three current broadcast, broadband and cellular standards. The results show the clear benefit of the proposed schemes in enabling high mobility in OFDM based wireless communication systems.
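
For context on the compressed-sensing estimators mentioned above, the following is a minimal sketch of plain matching pursuit for a sparse channel (a generic textbook variant, real-valued for simplicity although channel taps are complex in practice; the thesis's RMP and GRMP refine this greedy idea):

    import numpy as np

    def matching_pursuit(y, A, n_iter):
        """Greedy sparse recovery: repeatedly pick the unit-norm dictionary
        atom most correlated with the residual and subtract its contribution."""
        residual = y.astype(float).copy()
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            correlations = A.T @ residual          # atoms assumed unit-norm
            j = np.argmax(np.abs(correlations))
            x[j] += correlations[j]
            residual -= correlations[j] * A[:, j]
        return x

    # Toy setup: a 3-sparse "channel" observed through a random dictionary
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 128))
    A /= np.linalg.norm(A, axis=0)                 # normalize atoms
    x_true = np.zeros(128)
    x_true[[5, 40, 99]] = [1.0, -0.5, 0.8]
    x_hat = matching_pursuit(A @ x_true, A, n_iter=20)
    print(np.flatnonzero(np.abs(x_hat) > 0.1))     # ideally [5, 40, 99]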

Michael JACOBS

Design and Implementation of WCET Analyses – Including a Case Study on Multi-Core Processors with Shared Buses
(Advisor: Prof. Sebastian Hack)
Friday, 10.09.2021, 14:00 h

Abstract:

For safety-critical real-time embedded systems, worst-case execution time (WCET) analysis – determining an upper bound on the possible execution times of a program – is an important part of the system verification. Multi-core processors share resources (e.g. buses and caches) between multiple processor cores and thus complicate the WCET analysis, as the execution times of a program executed on one processor core significantly depend on the programs executed in parallel on the concurrent cores. We refer to this phenomenon as shared-resource interference.
This thesis proposes a novel way of modeling shared-resource interference during WCET analysis. It enables an efficient analysis – as it only considers one processor core at a time – and it is sound for hardware platforms exhibiting timing anomalies. Moreover, this thesis demonstrates how to realize a timing-compositional verification on top of the proposed modeling scheme. In this way, this thesis closes the gap between modern hardware platforms, which exhibit timing anomalies, and existing schedulability analyses, which rely on timing compositionality.
In addition, this thesis proposes a novel method for calculating an upper bound on the amount of interference that a given processor core can generate in any time interval of at most a given length. Our experiments demonstrate that the novel method is more precise than existing methods.

July

Rakshith SHETTY

Adversarial Content Manipulation for Analyzing and Improving Model Robustness
(Advisor: Prof. Bernt Schiele)
Tuesday, 27.07.2021, 14:30 h

Abstract:

Computer vision systems deployed in the real world will encounter inputs far from their training distribution. For example, a self-driving car might see a blue stop sign it has not seen before. To ensure safe operation when faced with such out-of-distribution corner cases, it is vital to quantify a model's robustness to such data before deployment. In this dissertation, we build generative models to create synthetic data variations at scale and leverage them to test the robustness of target computer vision systems to these variations. First, we build generative models which can controllably manipulate image and text data. This includes models to a) change visual context by removing objects, b) edit the appearance of an object in images, and c) change the writing style of text. Next, using these generative models, we create model-agnostic and model-targeted input variations to study the robustness of computer vision systems. While in the model-agnostic case the input image is manipulated based on design priors, in the model-targeted case the manipulation is directly guided by the target model's performance. With these methods, we measure and improve the robustness of various computer vision systems – specifically image classification, segmentation, object detection and visual question answering systems – to semantic input variations. Additionally, in the text domain, we deploy these generative models to improve the diversity of image captioning systems and perform writing style manipulation to obfuscate private attributes of the user.

Ayush TEWARI

Self-Supervised Reconstruction and Synthesis of Faces
(Advisor: Prof. Christian Theobalt)
Monday, 26.07.2021, 17:00 h

Abstract:

Photorealistic and semantically controllable digital models of human faces are important for a wide range of applications such as movies, virtual reality, and casual photography. Traditional approaches require expensive setups which capture the person from multiple cameras under different illumination conditions. Recent approaches have also explored digitizing faces under less constrained settings, even from a single image of the person. These approaches rely on priors, commonly known as 3D morphable models (3DMMs), which are learned from datasets of 3D scans.  This thesis pushes the state of the art in high-quality 3D reconstruction of faces from monocular images. A model-based face autoencoder architecture is introduced which integrates convolutional neural networks, 3DMMs, and differentiable rendering for self-supervised training on large image datasets. This architecture is extended to enable the  refinement of a pretrained 3DMM just from a dataset of monocular images, allowing for  higher-quality reconstructions. In addition, this thesis demonstrates the learning of the identity components of a 3DMM directly from videos without using any 3D data. Since videos are more readily available, this model can generalize better compared to the models learned from limited 3D scans.
This thesis also presents methods for the photorealistic editing of portrait images. In contrast to traditional approaches, the presented methods do not rely on any supervised training. Self-supervised editing is achieved by integrating the semantically meaningful 3DMM-based monocular reconstructions with a pretrained and fixed generative adversarial network. While this thesis presents several ideas which enable self-supervised learning for the reconstruction and synthesis of faces, several open challenges remain. These challenges, as well as an outlook for future work are also discussed.

Frederik MÖLLERS

On Privacy in Home Automation Systems
(Advisor: Prof. Christoph Sorge)
Monday, 12.07.2021, 10:00 h

Abstract:

Home Automation Systems (HASs) are becoming increasingly popular in newly built as well as existing properties. While offering increased living comfort, resource-saving features and other commodities, most current commercial systems do not protect sufficiently against passive attacks. In this thesis we investigate privacy aspects of Home Automation Systems. We analyse the threats of eavesdropping and traffic analysis attacks, demonstrating the risks of virtually undetectable privacy violations. By taking aspects of criminal and data protection law into account, we give an interdisciplinary overview of privacy risks and challenges in the context of HASs. We present the first framework to formally model privacy guarantees of Home Automation Systems and apply it to two different dummy traffic generation schemes. In a qualitative and quantitative study of these two algorithms, we show how provable privacy protection can be achieved and how privacy and energy efficiency are interdependent. This allows manufacturers to design and build secure Home Automation Systems which protect the users' privacy and which can be arbitrarily tuned to strike a compromise between privacy protection and energy efficiency.

Duc Cuong NGUYEN

Improving Android App Security and Privacy with Developers
(Advisor: Prof. Michael Backes)
Thursday, 08.07.2021, 14:00 h

Abstract:

Existing research has uncovered many security vulnerabilities in Android applications (apps) caused by inexperienced and unmotivated developers. In particular, the lack of tool support makes it hard for developers to avoid common security and privacy problems in Android apps. As a result, apps ship with security vulnerabilities that expose end users to a multitude of attacks.
This thesis presents a line of work that studies and supports Android developers in writing more secure code. We first studied to what extent tool support can help developers create more secure applications. To this end, we developed and evaluated an Android Studio extension that identifies common security problems of Android apps and provides developers with suggestions for more secure alternatives. Subsequently, we focused on the issue of outdated third-party libraries in apps, which is also the root cause of a variety of security vulnerabilities. Therefore, we analyzed all popular third-party libraries in the Android ecosystem and provided developers with feedback and guidance, in the form of tool support in their development environment, to fix such security problems. In the second part of this thesis, we empirically studied and measured the impact of user reviews on app security and privacy evolution. We built a review classifier to identify security- and privacy-related reviews and performed regression analysis to measure their impact on the evolution of security and privacy in Android apps. Based on our results, we propose several suggestions to improve the security and privacy of Android apps by leveraging user feedback to create incentives for developers to improve their apps.

Bhaskar RAY CHAUDHURY

Finding Fair and Efficient Allocations
(Advisor: Prof. Kurt Mehlhorn)
Monday, 05.07.2021, 18:00 h

Abstract:

Fair division has developed into a fundamental branch of mathematical economics over the last seven decades (since the seminal work of Hugo Steinhaus in the 1940s). In a classical fair division problem, the goal is to "fairly" allocate a set of items among a set of agents. Several real-life scenarios are paradigmatic of the problems in this domain.
In this defense, we explore some fundamental questions about fair division. In particular, we focus on settings in which the items to be divided are either indivisible goods or divisible bads. Despite their practical significance, both of these settings have been investigated far less than the divisible goods setting.
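
For a concrete feel for the indivisible-goods setting, the classical round-robin procedure is a useful illustration (a textbook example, not a contribution of the defense): agents take turns picking their most valued remaining item, and for additive valuations the resulting allocation is envy-free up to one good (EF1). A minimal sketch:

    # Hypothetical illustration: round-robin allocation of indivisible goods
    # under additive valuations; agents pick their favourite remaining item
    # in a fixed turn order.
    def round_robin(valuations):
        """valuations[i][g] = value of good g for agent i."""
        n, m = len(valuations), len(valuations[0])
        remaining, bundles = set(range(m)), [[] for _ in range(n)]
        for turn in range(m):
            agent = turn % n
            best = max(remaining, key=lambda g: valuations[agent][g])
            bundles[agent].append(best)
            remaining.remove(best)
        return bundles

    # Two agents, four goods: the result is envy-free up to one good (EF1).
    print(round_robin([[5, 3, 2, 1], [4, 4, 1, 1]]))  # [[0, 2], [1, 3]]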

June

Alexander MARX

Information-Theoretic Causal Discovery
(Advisor: Prof. Jilles Vreeken)
Tuesday, 29.06.2021, 14:00 h

Abstract:

It is well-known that correlation does not equal causation, but how can we infer causal relations from data? Causal discovery tries to answer precisely this question by rigorously analyzing under which assumptions it is feasible to infer directed causal networks from passively collected, so-called observational data. Classical approaches assume the data to be faithful to the causal graph, that is, independencies found in the distribution are assumed to be due to separations in the true graph. Under this assumption, so-called constraint-based methods can infer the correct Markov equivalence class of the true graph (i.e. the correct undirected graph and some edge directions), only using conditional independence tests.
In this dissertation, we aim to alleviate some of the weaknesses of constraint-based algorithms. In the first part, we investigate causal mechanisms, which cannot be detected when assuming faithfulness. We then suggest a weaker assumption based on triple interactions, which allows for recovering a broader spectrum of causal mechanisms. Subsequently, we focus on conditional independence testing, which is a crucial tool for causal discovery. In particular, we propose to measure dependencies through conditional mutual information, which we show can be consistently estimated even for the most general setup: discrete-continuous mixture random variables. Last, we focus on distinguishing Markov equivalent graphs (i.e. infer the complete DAG structure), which boils down to inferring the causal direction between two random variables. In this setting, we focus on continuous and mixed-type data and develop our methods based on an information-theoretic postulate, which states that the true causal graph can be compressed best, i.e. has the smallest Kolmogorov complexity.
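
As a minimal illustration of the kind of quantity the dissertation builds on, the textbook plug-in estimate of conditional mutual information I(X;Y|Z) for discrete samples looks as follows (the thesis develops estimators that remain consistent even for discrete-continuous mixtures; this sketch is not that estimator):

    # Hypothetical sketch: plug-in estimate of conditional mutual information
    # I(X; Y | Z) from discrete samples, using empirical frequencies.
    from collections import Counter
    from math import log2

    def cmi(xs, ys, zs):
        n = len(xs)
        pxyz = Counter(zip(xs, ys, zs))
        pxz, pyz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
        # I(X;Y|Z) = sum p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
        return sum((c / n) * log2(c * pz[z] / (pxz[x, z] * pyz[y, z]))
                   for (x, y, z), c in pxyz.items())

    xs = [0, 0, 1, 1, 0, 1, 0, 1]
    ys = [0, 0, 1, 1, 1, 0, 0, 1]
    zs = [0, 0, 0, 0, 1, 1, 1, 1]
    print(f"I(X;Y|Z) = {cmi(xs, ys, zs):.3f} bits")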

Leif BERGERHOFF

Evolutionary Models for Signal Enhancement and Approximation
(Advisor: Prof. Joachim Weickert)
Monday, 28.06.2021, 15:00 h

Abstract:

This thesis deals with nature-inspired evolution processes for the purpose of signal enhancement and approximation. The focus lies on mathematical models which originate from the description of swarm behaviour. We extend existing approaches and show the potential of swarming processes as a modelling tool in image processing. In our work, we discuss the use cases of grey scale quantisation, contrast enhancement, line detection, and coherence enhancement. Furthermore, we propose a new and purely repulsive model of swarming that turns out to describe a specific type of backward diffusion process. It is remarkable that our model provides extensive stability guarantees which even support the utilisation of standard numerics. In experiments, we demonstrate its applicability to global and local contrast enhancement of digital images. In addition, we study the problem of one-dimensional signal approximation with limited resources using an adaptive sampling approach including tonal optimisation. We suggest a direct energy minimisation strategy and validate its efficacy in experiments. Moreover, we show that our approximation model can outperform a method recently proposed by Dar and Bruckstein.

Marius STEFFENS

Understanding Emerging Client-Side Web Vulnerabilities using Dynamic Program Analysis
(Advisor: Dr. Ben Stock)
Monday, 28.06.2021, 14:00 h

Abstract:

Today’s Web heavily relies on JavaScript as it is the main driving force behind the plethora of Web applications that we enjoy daily. The complexity and amount of this client-side code have been steadily increasing over the years. At the same time, new vulnerabilities keep being uncovered, for which we mostly rely on manual analysis of security experts. Unfortunately, such manual efforts do not scale to the problem space at hand.
Therefore, in this thesis, we present techniques capable of automatically finding, at scale, vulnerabilities that originate from malicious inputs to postMessage handlers, polluted prototypes, and client-side storage mechanisms.
Our results highlight that the investigated vulnerabilities are prevalent even among the most popular sites, showing the need for automated systems that help developers uncover them in a timely manner. Using the insights gained during our empirical studies, we provide recommendations for developers and browser vendors to tackle the underlying problems in the future. Furthermore, we show that security mechanisms designed to mitigate such and similar issues cannot currently be deployed by first-party applications due to their reliance on third-party functionality. This leaves developers in a no-win situation, in which either functionality can be preserved or security enforced.

Sreyasi NAG CHOWDHURY

Text-Image Synergy for Multimodal Retrieval and Annotation
(Advisor: Prof. Gerhard Weikum)
Monday, 28.06.2021, 10:00 h

Abstract:

Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities, may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting avenues for future research involving text-image data understanding.

Anurag PANDEY

Variety Membership Testing in Algebraic Complexity
(Advisor: Prof. Markus Bläser)
Thursday, 17.06.2021, 17:00 h

Abstract:

We study some of the central problems in algebraic complexity theory through the lens of the variety membership testing problem. In the first part, we investigate whether separations between algebraic complexity classes can be phrased as instances of the variety membership testing problem. We make progress in the monotone commutative and the noncommutative settings. In the second part, we give efficient algorithms for the blackbox polynomial identity testing problem, and the rank computation problem of symbolic matrices, both phrasable as instances of the variety membership testing problem. For the former, we give a randomness-efficient algorithm and for the latter, we give an approximation algorithm. In the final part, we prove lower bounds, showing the NP-hardness of two problems on 3-tensors — the orbit closure containment problem and the tensor slice rank problem, both of which are instances of the variety membership testing problem.
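
For background on blackbox polynomial identity testing, the classical randomized test based on the Schwartz-Zippel lemma is sketched below (the textbook algorithm, not the randomness-efficient one from the thesis): a nonzero polynomial of total degree d vanishes at a uniformly random point of S^n with probability at most d/|S|, so a few random evaluations suffice to detect nonzeroness with high probability.

    # Hypothetical sketch of randomized blackbox PIT via Schwartz-Zippel:
    # evaluate the blackbox polynomial at random points; any nonzero value
    # is a definite witness that the polynomial is not identically zero.
    import random

    def probably_zero(blackbox, n_vars, trials=20, field_size=10**9 + 7):
        for _ in range(trials):
            point = [random.randrange(field_size) for _ in range(n_vars)]
            if blackbox(point) % field_size != 0:
                return False      # witness: definitely not the zero polynomial
        return True               # identically zero with high probability

    p = lambda v: (v[0] + v[1])**2 - v[0]**2 - 2*v[0]*v[1] - v[1]**2
    q = lambda v: v[0]*v[1] - 1
    print(probably_zero(p, 2), probably_zero(q, 2))   # True False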

Mahmoudreza BABAEI

Information Consumption on Social Media: Efficiency, Divisiveness, and Trust
(Advisor: Prof. Krishna Gummadi)
Monday, 14.06.2021, 09:00 h

Abstract:

Over the last decade, the advent of social media has profoundly changed the way people produce and consume information online. On these platforms, users themselves play a role in selecting the sources from which they consume information, overthrowing traditional journalistic gatekeeping. Moreover, advertisers can target users with news stories using users’ personal data. This new model has many advantages: the propagation of news is faster, the number of news sources is large, and the topics covered are diverse. However, in this new model, users are often overloaded with redundant information, and they can get trapped in filter bubbles by consuming divisive and potentially false information. To tackle these concerns, in my thesis, I address the following important questions:
(i) How efficient are users at selecting their information sources?
(ii) How can we break the filter bubbles that users get trapped in?
(iii) How can we design an efficient framework for fact-checking?

Oliver SCHRANZ

Towards Principled Dynamic Analysis on Android
(Advisor: Prof. Michael Backes)
Tuesday, 08.06.2021, 13:00 h

Abstract:

The vast amount of information and services accessible through mobile handsets running the Android operating system has led to the tight integration of such devices into our daily routines. However, their capability to capture and operate upon user data provides an unprecedented insight into our private lives that needs to be properly protected, which calls for comprehensive analysis and thorough testing. While dynamic analysis has been applied to these problems in the past, the corresponding literature consists of scattered work that often specializes on sub-problems and keeps on re-inventing the wheel, thus lacking a structured approach. To overcome this unsatisfactory situation, this dissertation introduces two major systems that advance the state of the art of dynamically analyzing the Android platform. First, we introduce a novel, fine-grained and non-intrusive compiler-based instrumentation framework that allows for precise and high-performance modification of Android apps and system components. Second, we present a unifying dynamic analysis platform with a special focus on Android's middleware in order to overcome the common challenges we identified from related work. Together, these two systems allow for a more principled approach to dynamic analysis on Android that enables comparability and composability of both existing and future work.

Mohamed GAD-ELRAB

Explainable Methods for Knowledge Graph Refinement and Exploration via Symbolic Reasoning
(Advisor: Prof. Gerhard Weikum)
Monday, 07.06.2021, 16:00 h

Abstract:

Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have led to the construction of large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine these KGs to enhance their coverage and accuracy via KG completion and KG validation. Both of these tasks are particularly challenging when performed under the open world assumption. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Moreover, exploring KGs, by search and browsing, is essential for users to understand the KG value and limitations towards downstream applications. However, enabling KG exploration is a challenging task given the large size of existing KGs.
This dissertation tackles the challenges of KG refinement and KG exploration by combining symbolic reasoning over the KG with other techniques such as KG embedding models and text mining. Through such a combination, the proposed methods handle the complex nature of KGs and provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used for accurately inferring missing links in the KG. Furthermore, we propose a framework for tracing evidence for supporting (refuting) KG facts from both KG and text. Extracted evidence is used to assess the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.

Dilip DURAI

Novel graph based algorithms for transcriptome sequence analysis
(Advisor: Prof. Marcel Schulz, now Uni Frankfurt)
Wednesday, 02.06.2021, 9:00 h

Abstract:

RNA-sequencing (RNA-seq) is one of the most widely used techniques in molecular biology. A key bioinformatics task in any RNA-seq workflow is assembling the reads. As the size of transcriptomics data sets is constantly increasing, scalable and accurate assembly approaches have to be developed. Here, we propose several approaches to improve the assembly of RNA-seq data generated by second-generation sequencing technologies. We demonstrate that the systematic removal of irrelevant reads from a high coverage dataset prior to assembly reduces runtime and improves the quality of the assembly. Further, we propose a novel RNA-seq assembly workflow comprised of read error correction, normalization, assembly with informed parameter selection, and transcript-level expression computation.
In recent years, the popularity of third-generation sequencing technologies increased as long reads allow for accurate isoform quantification and gene-fusion detection, which is essential for biomedical research. We present a sequence-to-graph alignment method to detect and to quantify transcripts for third-generation sequencing data. Also, we propose the first gene-fusion prediction tool which is specifically tailored towards long-read data and hence achieves accurate expression estimation even on complex data sets. Moreover, our method predicted experimentally verified fusion events along with some novel events, which can be validated in the future.
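
To illustrate the kind of coverage-based read filtering mentioned above, here is a simplified, diginorm-style sketch (hypothetical code, not the exact procedure from the thesis): a read is dropped once the median count of its k-mers suggests that its region of the transcriptome is already covered well enough.

    # Hypothetical sketch of coverage-based read filtering: drop a read if
    # the median count of its k-mers already exceeds a coverage threshold.
    from collections import defaultdict
    from statistics import median

    def normalize(reads, k=21, target_coverage=20):
        counts, kept = defaultdict(int), []
        for read in reads:
            kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
            if not kmers:
                continue
            if median(counts[km] for km in kmers) < target_coverage:
                kept.append(read)               # still informative
                for km in kmers:
                    counts[km] += 1             # update coverage estimate
        return kept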

May

 Mouhammad SAKR

Parameterized Verification and Repair of Concurrent Systems
(Advisor: PD Dr. Swen Jacobs)
Monday, 31.05.2021, 17:15 h

Abstract:

Concurrent systems are hard to get correct, and the problem becomes much more complex when the system is composed of an arbitrary number n of components. For such systems, methods such as parameterized model checking can provide correctness guarantees that hold regardless of n. Unfortunately, the parameterized model checking problem (PMCP) is undecidable in general; however, several methods decide the problem for specific classes of systems and properties. Furthermore, if a parameterized model checker detects an error in a given system, it does not provide a mechanism to repair the system such that it satisfies the specification. Hence, to repair the system, the user has to find out which behavior of a component causes the error and how it can be corrected. Both tasks may be nontrivial.
In this thesis, we present novel approaches for model checking, repair and synthesis of systems that may be parameterized in their number of components. We improve existing results for guarded protocols and we show that the PMCP of guarded protocols and token passing systems is decidable for specifications that add a quantitative aspect to LTL, called Prompt-LTL. Furthermore, we present, to our knowledge, the first parameterized repair algorithm. We show how this algorithm can be used on classes of systems that can be represented as well structured transition systems (WSTS).

Adrin JALALI

Interpretable Methods in Cancer Diagnostics
(Advisor: Prof. Nico Pfeifer, now Tübingen)
Friday, 28.05.2021, 14:00 h

Abstract:

Cancer is a hard problem. It is hard for the patients, for the doctors and nurses, and for the researchers working on understanding the disease and finding better treatments for it. The challenges faced by a pathologist diagnosing the disease for a patient are not necessarily the same as the ones faced by cell biologists working on experimental treatments and understanding the fundamentals of cancer. In this thesis we work on different challenges faced by both of the above groups.
This thesis first presents methods to improve the analysis of the flow cytometry data used frequently in the diagnosis process, specifically for the two subtypes of non-Hodgkin Lymphoma which are our focus: Follicular Lymphoma and Diffuse Large B Cell Lymphoma. With a combination of concepts from graph theory, dynamic programming, and machine learning, we present methods to improve the diagnosis process and the analysis of the abovementioned data. The interpretability of the method helps a pathologist to better understand a patient's disease, which in turn improves their choices for a treatment. In the second part, we focus on the analysis of DNA-methylation and gene expression data, both of which present the challenge of being very high dimensional while having comparatively few samples. We present an ensemble model which adapts to the different patterns seen in each given dataset, in order to cope with noise and batch effects. At the same time, the interpretability of our model helps a pathologist to better find and tune the treatment for the patient: a step further towards personalized medicine.

Aurore FASS

Studying JavaScript Security Through Static Analysis
(Advisor: Prof. Michael Backes)
Wednesday, 26.05.2021, 14:00 h

Abstract:

As the Internet keeps on growing, so does the interest of malicious actors. While the Internet has become widespread and popular to interconnect billions of people, this interconnectivity also simplifies the spread of malicious software. Specifically, JavaScript has become a popular attack vector, as it enables attackers to stealthily exploit bugs and further vulnerabilities to compromise the security and privacy of Internet users. In this thesis, we approach these issues by proposing several systems to statically analyze real-world JavaScript code at scale.
First, we focus on the detection of malicious JavaScript samples. To this end, we propose two learning-based pipelines, which leverage syntactic, control and data-flow based features to distinguish benign from malicious inputs. Subsequently, we evaluate the robustness of such static malicious JavaScript detectors in an adversarial setting. For this purpose, we introduce a generic camouflage attack, which consists of rewriting malicious samples to reproduce existing benign syntactic structures. Finally, we consider vulnerable browser extensions. In particular, we abstract an extension's source code at a semantic level, including control, data, and message flows, and pointer analysis, to detect suspicious data flows from and toward an extension's privileged context. Overall, we report on 185 unique Chrome extensions that attackers could exploit to, e.g., execute arbitrary code in a victim's browser.

Kathrin GROSSE

Why is Machine Learning so hard?
(Advisor: Prof. Michael Backes)
Thursday, 20.05.2021, 11:00 h

Abstract:

Security has always been a cat and mouse game. In some fields, solutions have emerged: cryptography, for example, found sound ways to represent attacker and defender, allowing one to formally reason about vulnerability. Recently, researchers have raised concerns about the security of learning algorithms. Such learning models are shown labelled instances during training (for example, malware and benign programs). At so-called test time, the model is able to predict for an unseen program whether it is benign or malicious. Although security in general is a hard problem, in Machine Learning (ML), many vulnerabilities are inherently linked to properties that are essential to the algorithms. As a result, ML security is still far from being solved.
In the first part of the talk, we will discuss the lottery ticket hypothesis and deep neural networks. Lottery tickets are small subnetworks that can be retrieved by an iterative training and pruning procedure. We show how randomness influences the resulting subnetwork: given a fixed set of initial weights, training with stochasticity will yield a different subnetwork each run. These different networks are truly different, and not an instance of weight space symmetry. This property illustrates the complexity of ML when it comes to the resulting classifiers.
However, works exist that link this diversity, even inside a single model, to the framework of Bayesian uncertainty. Bayesian uncertainty measures allow a classifier to quantify how certain it is about the class assignment it outputs. In this context, we investigate test-time vulnerability or evasion attacks, also called adversarial examples. In other words, we minimally alter inputs so they will be classified at high confidence by the Bayesian network. We further transfer these adversarial examples to a second classifier that relies on a different approximation of Bayesian uncertainty. Yet, the adversarial examples work equally well on both models. We conclude that although ML models are very diverse, they are not diverse enough to provide protection against test-time attacks.


April

Ezekiel Olamide SOREMEKUN

Evidence-driven Testing and Debugging of Software Systems
(Advisor: Prof. Andreas Zeller)
Thursday, 08.04.2021, 14:00 h

Abstract:

Automated debugging techniques systematically expose, reproduce, diagnose and fix software bugs. These techniques aim to support developers during software testing and debugging activities. A key challenge is that the state-of-the-art debugging techniques are hardly used or adopted in practice because they fail to address the actual needs of software developers. Thus, there is a gap between proposed debugging techniques and the actual state of software practice. To bridge this gap, we propose an evidence-driven approach that collects empirical evidence from software practice to guide, direct and automate software testing and debugging tasks. Applying this approach provides several empirical results and techniques in this work. Firstly, to elicit developers' debugging needs, we conduct an empirical study to evaluate the state of debugging in practice and provide our empirical data as a benchmark (called DBGBench) to aid the automated evaluation of debugging techniques. To build effective debugging techniques, we evaluate the effectiveness of the state-of-the-art automated fault localization (AFL) techniques using real bugs and apply our empirical results to develop a hybrid approach that outperforms the state-of-the-art AFL techniques. To repair invalid program inputs, we evaluate the prevalence and causes of invalid inputs in practice. We then build on our empirical results to develop a general-purpose algorithm that automatically diagnoses and repairs invalid inputs. Lastly, to generate inputs (dis)similar to real-world program inputs, we learn input distributions from real-world sample inputs using a probabilistic grammar. Then, we employ the learned probabilistic grammar to guide the test generation of inputs (dis)similar to the sample inputs found in practice. Overall, our evaluation shows that our evidence-driven approach allows one to elicit relevant debugging requirements from software practice and develop novel and effective debugging techniques. In summary, our proposed debugging techniques build on empirical evidence obtained from software practice and improve over the state-of-the-art techniques.
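
As a sketch of the final step, the snippet below samples inputs from a probabilistic grammar (the toy rules and weights are made up; in the thesis the weights are learned from real-world sample inputs):

    # Hypothetical sketch: sampling inputs from a probabilistic grammar.
    import random

    GRAMMAR = {
        "<expr>":  [(["<term>", " + ", "<expr>"], 0.3), (["<term>"], 0.7)],
        "<term>":  [(["<digit>", "<digit>"], 0.4), (["<digit>"], 0.6)],
        "<digit>": [([d], 0.1) for d in "0123456789"],
    }

    def produce(symbol):
        if symbol not in GRAMMAR:
            return symbol                        # terminal
        alternatives, weights = zip(*GRAMMAR[symbol])
        expansion = random.choices(alternatives, weights)[0]
        return "".join(produce(s) for s in expansion)

    for _ in range(3):
        print(produce("<expr>"))   # inputs distributed like the samples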

March

Michael SCHERER

Computational Solutions for Addressing Heterogeneity in DNA Methylation Data
(Advisor: Prof. Thomas Lengauer)
Tuesday, 30.03.2021, 11:00 h

Abstract:

DNA methylation is a cell-type-specific epigenetic modification with important roles in gene regulation and can be assayed genome-wide using various experimental techniques. The generated data enables in-depth characterization of heterogeneity in DNA methylation patterns at three levels: (i) between sample groups defined by a phenotype, (ii) between the samples sharing a phenotype, and (iii) within a sample. Within my thesis and in this talk, I will present novel computational solutions for addressing the three levels of heterogeneity. First, we extended the RnBeads software package facilitating DNA methylation data analysis with novel modules for epigenetic age prediction, missing value imputation, and differential variability analysis. Second, we developed a novel R-software package (named MAGAR) for associating DNA methylation alterations with genetic variations in so-called methylation quantitative trait loci (methQTL). We further discerned tissue-specific from shared methQTLs, which showed distinct properties and genomic location. To address between-sample heterogeneity, we developed a pipeline to deconvolve heterogeneous DNA methylation patterns into their cellular constituents. Lastly, we investigated within-sample heterogeneity (WSH), which can be quantified genome-wide using different scores. We developed a novel metric and compared the scores using simulated and experimental data. In this talk, I will discuss novel software tools for dissecting DNA methylation heterogeneity with an emphasis on the methQTL analysis and the WSH scores. Furthermore, I will show how these software packages can be used to further our understanding of epigenetic regulation.

Panagiotis MANDROS

Discovering robust dependencies from data
(Advisor: Prof. Jilles Vreeken)
Thursday, 04.03.2021, 11:00 h

Abstract:

Nowadays, scientific discoveries are fueled by data-driven approaches. As a motivating example from Materials Science, using appropriate feature selection techniques the physicochemical process of compound semiconductor crystallization can now be described with only a few atomic material properties. The dissertation investigates this direction further, that is, we aim to develop interpretable knowledge discovery techniques that can uncover and meaningfully summarize complex phenomena from data.
Such feature selection tasks are very demanding. The dependencies in data can be of any type (e.g., linear, non-linear, multivariate), while the data can be of any type as well (i.e., mixtures of discrete and continuous attributes). We have to accommodate all types in our analysis, because otherwise we can miss important interactions (e.g., linear methods cannot find non-linear relationships). For interpretability, the degree of dependence should be meaningfully summarized (e.g., “target Y depends 50% on X”, or “the variables in set X exhibit 80% dependence”). From a statistical perspective, the degree of dependence should be robustly measured from data samples of any size to minimize spurious discoveries. Lastly, the combinatorial optimization problem to discover the top dependencies out of all possible dependencies is NP-hard, and hence, we need efficient exhaustive algorithms. For situations where this is prohibitive (i.e., large dimensionalities), we need approximate algorithms with guarantees such that we can interpret the quality of the solution.
In this dissertation, we solve the aforementioned challenges and propose effective algorithms that discover dependencies in both supervised and unsupervised scenarios. In particular, we employ notions from Information Theory to meaningfully quantify all types of dependencies (e.g., Mutual Information and Total Correlation), which we then couple with notions from statistical learning theory to robustly estimate them from data. Lastly, we derive tight and efficient bounding functions that can be used by branch-and-bound, enabling exact solutions in large search spaces. Case studies on Materials Science data confirm that we can indeed discover high-quality dependencies from complex data.
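
For intuition on why robust estimation matters, the sketch below scores a dependency by mutual information corrected with a permutation baseline (a simplified stand-in, not the estimators developed in the dissertation): raw mutual information is biased upwards on small samples and would report spurious dependencies.

    # Hypothetical sketch: score the dependency of Y on X by mutual
    # information minus its expected value under random permutations of Y.
    import random
    from collections import Counter
    from math import log2

    def mi(xs, ys):
        n, pxy = len(xs), Counter(zip(xs, ys))
        px, py = Counter(xs), Counter(ys)
        return sum((c / n) * log2(c * n / (px[x] * py[y]))
                   for (x, y), c in pxy.items())

    def corrected_mi(xs, ys, permutations=100):
        baseline = sum(mi(xs, random.sample(ys, len(ys)))
                       for _ in range(permutations)) / permutations
        return mi(xs, ys) - baseline

    xs = [0, 0, 1, 1] * 25
    print(corrected_mi(xs, xs))                                 # close to 1 bit
    print(corrected_mi(xs, [random.randrange(2) for _ in xs]))  # close to 0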

February

Saskia METZLER

Structural Building Blocks in Graph Data: Characterised by Hyperbolic Communities and Uncovered by Boolean Tensor Clustering
(Advisor: Prof. Pauli Miettinen, now University of Eastern Finland)
Wednesday, 24.02.2021, 14:00 h

Abstract:

Graph data nowadays easily become so large that it is infeasible to study the underlying structures manually. Thus, computational methods are needed to uncover large-scale structural information. In this thesis, we present methods to understand and summarise large networks.
We propose the hyperbolic community model to describe groups of more densely connected nodes within networks using very intuitive parameters. The model accounts for a frequent connectivity pattern in real data: a few community members are highly interconnected; most members mainly have ties to this core. Our model fits real data much better than previously-proposed models. Our corresponding random graph generator, HyGen, creates graphs with realistic intra-community structure.
Using the hyperbolic model, we conduct a large-scale study of the temporal evolution of communities on online question–answer sites. We observe that the user activity within a community is constant with respect to its size throughout its lifetime, and a small group of users is responsible for the majority of the social interactions.
We propose an approach for Boolean tensor clustering. This special tensor factorisation is restricted to binary data and assumes that one of the tensor directions has only non-overlapping factors. These assumptions – valid for many real-world data, in particular time-evolving networks – enable the use of bitwise operators and lift much of the computational complexity from the task.

Andreas RAU

Topic-Driven Testing
(Advisor: Prof. Andreas Zeller)
Monday, 08.02.2021, 15:00 h

Abstract:

Modern interactive applications offer so many interaction opportunities that automated exploration and testing becomes practically impossible without some domain specific guidance towards relevant functionality. In this dissertation, we present a novel fundamental graphical user interface testing method called topic-driven testing. We mine the semantic meaning of interactive elements, guide testing, and identify core functionality of applications. The semantic interpretation is close to human understanding and allows us to learn specifications and transfer knowledge across multiple applications independent of the underlying device, platform, programming language, or technology stack—to the best of our knowledge a unique feature of our technique.

Our tool ATTABOY is able to take an existing Web application test suite, say from Amazon, and execute it on eBay, thus guiding testing to relevant core functionality. Tested on different application domains such as eCommerce, news pages, and mail clients, it can transfer on average sixty percent of the tested application behavior to new apps, without any human intervention. On top of that, topic-driven testing can work with even vaguer instructions such as how-to or use-case descriptions. Given an instruction, say "add item to shopping cart", it tests the specified behavior in an application, both in a browser as well as in mobile apps. It thus improves state-of-the-art UI testing frameworks, creates change-resilient UI tests, and lays the foundation for learning, transferring, and enforcing common application behavior. The prototype is up to five times faster than existing random testing frameworks and tests functions that are hard to cover by non-trained approaches.
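
A toy sketch of the underlying idea (hypothetical code, not the ATTABOY implementation): represent UI element labels and an instruction in a common space and steer testing towards the element most similar to the instruction. Plain bag-of-words cosine similarity stands in here for the richer semantic matching used in the thesis.

    # Hypothetical sketch: pick the UI element whose label best matches an
    # instruction, using bag-of-words cosine similarity.
    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        norm = sqrt(sum(c * c for c in va.values())) * \
               sqrt(sum(c * c for c in vb.values()))
        return dot / norm if norm else 0.0

    def guide(instruction, ui_labels):
        return max(ui_labels, key=lambda label: cosine(instruction, label))

    labels = ["proceed to checkout", "add item to cart", "contact support"]
    print(guide("add item to shopping cart", labels))  # "add item to cart"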

January

Philipp von STYP-REKOWSKY

Retrofitting Privacy Controls to Stock Android
(Advisor: Prof. Michael Backes)
Monday, 18.01.2021, 14:00 h

Abstract:

Android is the most popular operating system for mobile devices, making it a prime target for attackers. To counter these threats, Android's security concept uses app isolation and access control to critical system resources. However, Android gives users only limited options to restrict app permissions according to their privacy preferences but instead lets developers dictate the permissions users must grant. Moreover, Android's security model is not designed to be customizable by third-party developers, forcing users to rely on device manufacturers to address their privacy concerns. This thesis presents a line of work that retrofits comprehensive privacy controls to the Android OS to put the user back in charge of their device. It focuses on developing techniques that can be deployed to stock Android devices without firmware modifications or root privileges. The first part of this dissertation establishes fundamental policy enforcement on third-party apps using inlined reference monitors to enhance Android's permission system. This approach is then refined by introducing a novel technique for dynamic method hook injection on Android's Java VM. Finally, we present a system that leverages process-based privilege separation to provide a virtualized application environment that supports the enforcement of complex security policies. A systematic evaluation of our approach demonstrates its practical applicability, and over one million downloads of our solution confirm user demand for privacy-enhancing tools.
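
To illustrate the idea of an inlined reference monitor in a language-agnostic way (a toy Python sketch, not the Android implementation): policy checks are woven in front of security-sensitive calls, and a call proceeds only if the user's policy allows it.

    # Hypothetical sketch of an inlined reference monitor: every sensitive
    # call first consults a user-controlled policy.
    USER_POLICY = {"send_sms": False, "read_contacts": True}

    def monitored(permission):
        def wrap(func):
            def guarded(*args, **kwargs):
                if not USER_POLICY.get(permission, False):
                    raise PermissionError(f"{permission} denied by user policy")
                return func(*args, **kwargs)
            return guarded
        return wrap

    @monitored("send_sms")
    def send_sms(number, text):      # stands in for the real platform API
        print(f"SMS to {number}: {text}")

    try:
        send_sms("+49 ...", "hello")
    except PermissionError as err:
        print(err)                   # "send_sms denied by user policy"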

2020

December

Michael GERKE

Modeling and Verifying the FlexRay Physical Layer Protocol with Reachability Checking of Timed Automata
(Advisor: Prof. Bernd Finkbeiner)
Monday, 21.12.2020, 14:15 h

Abstract:

The FlexRay automotive bus protocol is used in many modern cars to facilitate communication between the various electronic control units (ECU) that control the car. FlexRay's physical layer protocol facilitates communicating a message from an ECU to other ECUs that each have their own local oscillator which is not synchronized to the other ECUs' local oscillators, even though they all run at roughly the same frequency. The physical layer protocol is thus the foundation that the upper protocol layers build upon, and it has to deal with a non-synchronized bit transfer over the bus' potentially long wire harness, with variances in delay that can be incurred while traversing the bus, with interference that might cause voltage values on the bus to change, thus effectively flipping bits, and with local oscillators that will deviate slightly from their nominal frequency. Verifying the reliability and the resilience of such a physical layer protocol against various potentially fault inducing scenarios under a variety of realistic assumptions on the performance of the hardware the protocol is deployed on, is nontrivial. To this end, the thesis presents models of the physical layer protocol and its hardware environment using Timed Automata, which include error models. Using a combination of Binary Decision Diagrams and Difference Bound Matrices, algorithmic approaches for fully symbolic model checking of Timed Automata, and the data structures enabling these approaches, are presented as well. This makes it possible to handle the considerable complexity of model checking such an industrially used protocol. Furthermore, the thesis presents the lessons learned on how to optimize such Timed Automata models to enable model checking while retaining enough complexity to provide relevant results concerning the reliability and resilience of the modeled protocol. The results of the verification effort confirm the resilience of the FlexRay physical layer protocol against specific patterns of bit flips and against specific degradations in the hardware performance, exceeding the limits put forth by the protocol's designers.

Tribhuvanesh OREKONDY

Understanding and Controlling Leakage in Machine Learning
(Advisor: Prof. Bernt Schiele)
Friday, 18.12.2020, 16:00 h

Abstract:

Machine learning models are being increasingly adopted in a variety of real-world scenarios. However, the privacy and confidentiality implications for users in such scenarios are not well understood. In this talk, I will present our work that focuses on better understanding and controlling leakage of information in three parts. The first part of the talk is motivated by the fact that a massive amount of personal photos are disseminated by individuals, many of which unintentionally reveal a wide range of private information. To address this, I will present our work on visual privacy, which attempts to identify privacy leakage in visual data. The second part of my talk analyzes privacy leakage during training of ML models, such as in the case of federated learning where individuals contribute towards the training of a model by sharing parameter updates. The third part of my talk focuses on leakage of information at inference time by presenting our work on model functionality stealing threats, where an adversary exploits black-box access to a victim ML model to inexpensively create a replica model. In each part, I also discuss our work to mitigate leakage of sensitive information to enable widespread adoption of ML techniques in real-world scenarios.
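
As a minimal illustration of the functionality-stealing threat (a hypothetical sketch using scikit-learn, not the attack or models from the talk): the adversary never sees the victim's training data; it only queries the black box and trains a replica on the answers.

    # Hypothetical sketch of model functionality stealing: train a replica
    # on black-box victim predictions. Data, models, and the query strategy
    # are all made up for illustration.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
    X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)
    victim = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # The attacker only observes victim outputs on its own query points.
    queries = np.random.RandomState(1).normal(size=(2000, 10))
    replica = LogisticRegression(max_iter=1000).fit(
        queries, victim.predict(queries))

    agreement = (replica.predict(X_test) == victim.predict(X_test)).mean()
    print(f"replica agrees with the victim on {agreement:.0%} of test inputs")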

Eldar INSAFUTDINOV

Towards Accurate Multi-Person Pose Estimation in the Wild
(Advisor: Prof. Bernt Schiele)
Wednesday, 16.12.2020, 16:00 h

Abstract:

Human pose estimation is a computer vision task of localising major joints of a human skeleton in natural images and is one of the most important visual recognition tasks in scenes containing humans, with numerous applications in robotics, virtual and augmented reality, gaming, and healthcare, among others. This thesis proposes methods for human pose estimation and articulated pose tracking in images and video sequences. Unlike most of the prior work, which is focused on pose estimation of single pre-localised humans, we address the case of multiple people in real-world images, which entails several challenges such as person-person overlaps in highly crowded scenes, an unknown number of people, or people entering and leaving video sequences. Our multi-person pose estimation algorithm is based on the bottom-up detection-by-grouping paradigm through an optimization of a multicut graph partitioning objective. Our approach is general and is also applicable to multi-target pose tracking in videos. In order to facilitate further research on this problem, we collect and annotate a large scale dataset and a benchmark for articulated multi-person tracking. We also propose a method for estimating 3D body pose using on-body wearable cameras: a pair of downward-facing, head-mounted cameras that capture the entire body. Our final contribution is a method for reconstructing 3D objects from weak supervision. Our approach represents objects as 3D point clouds and is able to learn them with 2D supervision only and without requiring camera pose information at training time. We design a differentiable renderer of point clouds as well as a novel loss formulation for dealing with camera pose ambiguity.

Daniel VAZ

Approximation Algorithms for Network Design and Cut Problems in Bounded-Treewidth
(Advisors: Prof. Kurt Mehlhorn and Prof. Parinya Chalermsook, now Aalto University, Finland)
Tuesday, 08.12.2020, 14:00 h

Abstract:

In this talk, we will present new approximation results for the group Steiner tree problem on bounded-treewidth graphs. In the group Steiner tree problem, the input is a graph together with sets of vertices called groups; the goal is to choose one representative vertex from each group and connect all the representatives with minimum cost. The problem is hard to approximate even when the input graph is a tree: it is unlikely that an approximation factor better than O(log^2 n) exists, and current techniques cannot achieve this ratio even on graphs with treewidth 2. We show an O(log^2 n)-approximation algorithm for bounded-treewidth graphs, which matches the known lower bound for trees, and improves upon previous techniques. These results imply new approximation algorithms for other related problems, such as a high-connectivity extension of group Steiner tree. We will also introduce the firefighter problem, which models the spread of fire in a graph, and briefly discuss our results for this problem on trees and bounded-treewidth graphs.

Franziska MÜLLER

Real-time 3D Hand Reconstruction in Challenging Scenes from a Single Color or Depth Camera
(Advisor: Prof. Christian Theobalt)
Wednesday, 02.12.2020, 16:00 h

Abstract:

Hands are one of the main enabling factors for performing complex tasks and humans naturally use them for interactions with their environment. Reconstruction and digitization of 3D hand motion opens up many possibilities for important applications, for example in human–computer interaction, augmented or virtual reality (AR/VR), automatic sign-language translation, activity recognition, or teaching robots. Different approaches for 3D hand motion capture have been actively researched in the past. Earlier methods have clear drawbacks like the need to wear intrusive markers or gloves or having to calibrate a stationary multi-camera setup. For interaction purposes, users need continuous control and immediate feedback. This means the algorithms have to run in real time and be robust in uncontrolled scenes. Thus, the majority of more recent methods uses a single color or depth camera which, however, makes the problem harder due to more ambiguities in the input. While recent research has shown promising results, current state-of-the-art methods still have strong limitations. Most approaches only track the motion of a single hand in isolation and do not take background-clutter or interactions with arbitrary objects or the other hand into account. The few methods that can handle more general and natural scenarios run far from real time or use complex multi-camera setups which makes them unusable for many of the aforementioned applications. This thesis pushes the state of the art for real-time 3D hand tracking and reconstruction in general scenes from a single RGB or depth camera. The presented approaches explore novel combinations of generative hand models and powerful cutting-edge machine learning techniques. In particular, this thesis proposes a novel method for hand tracking in the presence of strong occlusions and clutter, the first method for full global 3D hand tracking from in-the-wild RGB video, and a method for simultaneous pose and dense shape reconstruction of two interacting hands that, for the first time, combines a set of desirable properties previously unseen in the literature.

Vitalii AVDIIENKO

Program Analysis for Anomaly Detection
(Advisor: Prof. Andreas Zeller)
Wednesday, 02.12.2020, 10:30 h

Abstract:

When interacting with mobile applications, users may not always get what they expect. For instance, when users download Android applications from a market, they do not know much about their actual behavior. A brief description, a set of screenshots, and a list of permissions, which give a high-level intuition of what applications might be doing, form user expectations. However, applications do not always meet them. For example, a gaming application could intentionally send SMS messages to a premium number in the background without the user's knowledge. A less harmful scenario would be a wrong message confirming a successful action that was never executed.
Whatever the behavior of a mobile application (app) might be, in order to test and fully understand it, there needs to be a technique that can analyse the User Interface (UI) of the app and the code associated with it as a whole.
This thesis presents a static analysis framework called SAFAND that, given an Android app, performs the following analyses:
gathers information on how the application uses sensitive data;
identifies and analyses UI elements of the application;
binds UI elements with their corresponding behavior.
The thesis illustrates how results obtained from the framework can be used to identify problems ranging from small usability issues to malicious behavior of real-world applications.

November

Arsène PÉRARD-GAYOT

Generating Renderers
(Advisor: Prof. Philipp Slusallek)
Friday, 27.11.2020, 15:30 h

Abstract:

Physically-based renderers are large pieces of software, and require a lot of computing power to produce pictures from complex 3D scenes. Part of this is due to the fact that in these renderers, optimizing performance is often difficult because of the large configurability in the type of scene that is given as an input. For instance, optimizing for SIMD execution is difficult when the intersection functions need to perform dynamic dispatches to allow different primitive types. Another issue is that these renderers are written using currently available techniques and tools for software development, which are severely lacking when it comes to expressing parallelism or exploiting low-level hardware features. The direct consequence of that problem is that renderer implementations often rely on non-portable compiler-specific intrinsics, making them difficult to port and adapt to new architectures and platforms. The traditional solution to these problems is to design domain-specific languages that allow the programmer to encode domain-specific knowledge into the code. With that knowledge, a domain-specific compiler can then optimize that code and choose the best strategy for execution on a given piece of hardware. However, writing domain-specific compilers is an arduous task, as it requires both an extensive knowledge of the domain, and a deep understanding of compiler construction.
In this talk, we will present another way to encode domain-specific knowledge, and describe a rendering framework based on it. The general idea is to build generic, high-level and declarative rendering code in a language supporting partial evaluation. Then, such code can be optimized for a target machine and particular use-case by providing partial evaluation annotations. Those annotations instruct the compiler to specialize the entire code according to the available knowledge of the input: In the case of rendering, for instance, this means that shaders are specialized when their inputs are known, and in general, all redundant computations are eliminated. Additionally, partial evaluation makes it possible to use the knowledge of the target architecture to adapt the execution strategies of various parts of the renderer. The results of this technique show that the generated renderers are faster and more portable than renderers written with state-of-the-art competing libraries, and that in comparison, the presented work requires less implementation effort.
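
A toy sketch of the specialization idea in plain Python (the actual framework relies on a language with partial-evaluation annotations; this is only an analogy): once part of the input is known, all computation depending solely on that part can be evaluated once, leaving a lean residual function.

    # Hypothetical sketch of specialization by partial evaluation: material
    # parameters are known once the scene is loaded, so everything that
    # depends only on them is computed once, and a lean residual function
    # handles the per-pixel work.
    def specialize_shader(base_color, roughness):
        gloss = (1.0 - roughness) ** 4                 # "static" computation
        scaled = tuple(c * (0.5 + 0.5 * gloss) for c in base_color)

        def shade(light_intensity):                    # residual program
            return tuple(c * light_intensity for c in scaled)
        return shade

    shiny_red = specialize_shader((1.0, 0.1, 0.1), roughness=0.2)
    print(shiny_red(0.8))   # per-call cost: just the dynamic multiplications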

Nataniel P. BORGES Junior

Learning the Language of Apps
(Advisor: Prof. Andreas Zeller)
Thursday, 26.11.2020, 14:00 h

Abstract:

To explore the functionality of an app, automated test generators systematically identify and interact with its user interface (UI) elements. A key challenge is to synthesize inputs which effectively and efficiently cover app behavior. To do so, a test generator has to choose which elements to interact with, which interactions to perform on each element, and which input values to type. In summary, to better test apps, a test generator should know the app's language, that is, the language of its graphical interactions and the language of its textual inputs. In this work, we show how a test generator can learn the language of apps and how this knowledge is modeled to create tests. We demonstrate how to learn the language of the graphical input prior to testing by combining machine learning and static analysis, and how to refine this knowledge during testing using reinforcement learning. In our experiments, statically learned models resulted in 50% fewer ineffective actions and an average increase in test (code) coverage of 19%, while refining these through reinforcement learning resulted in an additional test (code) coverage of up to 20%. We learn the language of textual inputs by identifying the semantics of input fields in the UI and querying the web for real-world values. In our experiments, real-world values increase test (code) coverage by ≈10%. Finally, we show how to use context-free grammars to integrate both languages into a single representation (UI grammar), giving back control to the user. This representation can then be mined from existing tests, associated with the app source code, and used to produce new tests. 82% of the test cases produced by fuzzing our UI grammar can reach a UI element within the app, and 70% of them can reach a specific code location.
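
A small sketch of what such a UI grammar could look like (hypothetical rules, not a grammar mined by the presented approach): graphical actions and textual inputs live in one context-free representation, and fuzzing it yields new test cases.

    # Hypothetical sketch of a UI grammar: graphical actions and textual
    # inputs in one context-free representation, fuzzed to produce tests.
    import random

    UI_GRAMMAR = {
        "<test>":   [["<login>", "<search>"]],
        "<login>":  [["click(user_field)", "<type_email>", "click(submit)"]],
        "<type_email>": [["type(alice@example.com)"],
                         ["type(bob@example.com)"]],
        "<search>": [["type(shoes)", "click(search)"],
                     ["type(laptop)", "click(search)"]],
    }

    def fuzz(symbol):
        if symbol not in UI_GRAMMAR:
            return [symbol]                   # terminal: a concrete action
        expansion = random.choice(UI_GRAMMAR[symbol])
        return [action for s in expansion for action in fuzz(s)]

    print(fuzz("<test>"))   # one generated test case: a list of UI actions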


Philipp MÜLLER

Sensing, Interpreting, and Anticipating Human Social Behaviour in the Real World
(Advisor: Prof. Andreas Bulling, now Stuttgart)
Friday, 13.11.2020, 14:00 h

Abstract:

A prerequisite for the creation of social machines that are able to support humans in fields like education, psychotherapy, or human resources is the ability to automatically detect and analyse human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging.
This thesis moves closer to the vision of social machines in the real world, making fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, it advances the state of the art in human visual behaviour sensing, proposing a novel unsupervised method for eye contact detection in group interactions by exploiting the connection between gaze and speaking turns. Furthermore, the thesis makes use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, this thesis improves the interpretation of social signals by proposing datasets and methods for emotion recognition and low rapport detection in less constrained settings than previously studied. In addition, it for the first time investigates a cross-dataset evaluation setting for emergent leadership detection. Third, this thesis pioneers methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.

Aastha MEHTA

Ensuring Compliance with Data Privacy and Usage Policies in Online Services
(Advisors: Prof. Peter Druschel & Prof. Deepak Garg)
Tuesday, 03.11.2020, 14:30 h

Abstract:

Online services collect and process a variety of sensitive personal data that is subject to complex privacy and usage policies. Complying with the policies is critical and often legally binding for service providers, but it is challenging as applications are prone to many disclosure threats. In this thesis, I present two compliance systems, Qapla and Pacer, that ensure efficient policy compliance in the face of direct and side-channel disclosures, respectively.
Qapla prevents direct disclosures in database-backed applications (e.g., personnel management systems), which are subject to complex access control, data linking, and aggregation policies. Conventional methods inline policy checks with application code. Qapla instead specifies policies directly on the database and enforces them in a database adapter, thus separating compliance from the application code.
Pacer prevents network side-channel leaks in cloud applications. A tenant's secrets may leak via its network traffic shape, which can be observed at shared network links (e.g., network cards, switches). Pacer implements a cloaked tunnel abstraction, which hides secret-dependent variation in a tenant's traffic shape, but allows variations based on non-secret information, enabling secure and efficient use of network resources in the cloud.
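
A conceptual sketch of the traffic-shaping idea (made-up parameters, not Pacer's actual tunnel): transmissions are padded to a fixed packet size and sent on a fixed schedule, so the observable shape no longer depends on the payload.

    # Hypothetical sketch of a cloaked tunnel: packets leave at a fixed size
    # and a fixed rate, independent of the secret-dependent payload length.
    import time

    PACKET_SIZE, INTERVAL = 1024, 0.01       # made-up shape parameters

    def shaped_send(payload, send_packet):
        chunks = [payload[i:i + PACKET_SIZE]
                  for i in range(0, len(payload), PACKET_SIZE)] or [b""]
        for chunk in chunks:
            send_packet(chunk.ljust(PACKET_SIZE, b"\0"))  # pad to fixed size
            time.sleep(INTERVAL)                          # fixed timing

    shaped_send(b"secret-length response", lambda pkt: print(len(pkt)))  # 1024
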
Both systems require modest development efforts, and incur moderate performance overheads, thus demonstrating their usability.

September

Frederic RABER

Supporting Lay Users in Privacy Decisions When Sharing Sensitive Data
(Advisor: Prof. Antonio Krüger)
Wednesday, 23.09.2020, 16:00 h

Abstract:

The first part of the thesis focuses on assisting users in choosing their privacy settings, by using machine learning to derive the optimal set of privacy settings for the user. In contrast to other work, our approach uses context factors as well as individual factors to provide a personalized set of privacy settings. The second part consists of a set of intelligent user interfaces to assist users throughout the complete privacy journey: from defining friend groups that allow targeted information sharing; through user interfaces for selecting information recipients, finding possible errors or unusual settings, and refining them; up to mechanisms for gathering in-situ feedback on privacy incidents and investigating how to use this feedback to improve a user's privacy in the future. Our studies have shown that tailoring the privacy settings significantly increases the correctness of the predicted privacy settings, whereas the user interfaces have been shown to significantly decrease the amount of unwanted disclosures.

Thorsten WILL

From condition-specific interactions towards the differential complexome of proteins
(Advisor: Prof. Volkhard Helms)
Tuesday, 22.09.2020, 10:00 h

Abstract:

While capturing the transcriptomic state of a cell is a comparatively simple effort with modern sequencing techniques, mapping protein interactomes and complexomes in a sample-specific manner is currently not feasible on a large scale. To understand crucial biological processes, however, knowledge of the physical interplay between proteins can be more interesting than their mere expression. In my thesis, I present four software tools that unlock the cellular wiring in a condition-specific manner and promise a deeper understanding of what happens upon cell fate transitions.
PPIXpress allows exploiting the abundance of existing expression data to generate specific interactomes, which can even consider alternative splicing events when protein isoforms can be related to the presence of causative protein domain interactions of an underlying model. As an addition to this work, I developed the convenient differential analysis tool PPICompare to determine rewiring events and their causes within the inferred interaction networks between grouped samples.
Furthermore, I demonstrate a new implementation of the combinatorial protein complex prediction algorithm DACO that features a significantly reduced runtime. This improvement facilitates an application of the method for a large number of samples and the resulting sample-specific complexes can ultimately be assessed quantitatively with the novel differential protein complex analysis tool CompleXChange.

Dushyant MEHTA

Real-time 3D Human Body Pose Estimation from Monocular RGB Input
(Advisor: Prof. Christian Theobalt)
Monday, 14.09.2020, 14:00 h

Abstract:

Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture systems require extensive on-body, and/or external instrumentation, which is not only cumbersome but expensive as well, putting it out of the reach of most creators. The ubiquity and ease of deployment of RGB cameras makes monocular RGB based human motion capture an extremely useful problem to solve, which would lower the barrier-to-entry for content creators to employ motion capture tools, and enable newer applications of human motion capture. This thesis demonstrates the first real-time monocular RGB based motion-capture solutions that work in general scene settings. They are based on developing neural network based approaches to address the ill-posed problem of estimating 3D human pose from a single RGB image, in combination with model based fitting. In particular, the contributions of this work make advances towards three key aspects of real-time monocular RGB based motion capture, namely speed, accuracy, and the ability to work for general scenes.

Alexander LÜCK

Stochastic spatial modelling of DNA methylation patterns and moment-based parameter estimation
(Advisor: Prof. Verena Wolf)
Friday, 11.09.2020, 14:30 h

Abstract:

In this thesis, we introduce and analyze spatial stochastic models for DNA methylation, an epigenetic mark with an important role in development. The underlying mechanisms controlling methylation are only partly understood. Several mechanistic models of enzyme activities responsible for methylation have been proposed. Here, we extend existing hidden Markov models (HMMs) for DNA methylation by describing the occurrence of spatial methylation patterns with stochastic automata networks. We perform numerical analysis of the HMMs applied to (non-)hairpin bisulfite sequencing KO data and accurately predict the wild-type data from these results. We find evidence that the activities of Dnmt3a/b responsible for de novo methylation depend on the left but not on the right CpG neighbors.

Felix KLEIN

Synthesizing Stream Control
(Advisor: Prof. Bernd Finkbeiner)
Tuesday, 08.09.2020, 14:00 h

Abstract:

The management of reactive systems requires controllers which coordinate time, data streams, and data transformations, all joined by the high-level perspective of their control flow. This flow is required to drive the system correctly and continuously, which turns the development into a challenge. The process is error-prone, time-consuming, unintuitive, and costly. An attractive alternative is to synthesize the system instead, where only the desired behavior needs to be specified. We introduce Temporal Stream Logic (TSL), a logic which exclusively argues about the system's control, while treating data and functional transformations as interchangeable black boxes. In TSL it is possible to specify control flow properties independently of the complexity of the handled data. Furthermore, with TSL a synthesis engine can check for realizability, even without a concrete implementation of the data transformations at hand. We present a modular development framework that uses synthesis to identify the high-level control. If successful, the created control flow is extended with concrete data transformations in order to be compiled into a final executable.
Nevertheless, current synthesis approaches cannot replace existing manual development work flows immediately. The developer still may create incomplete or faulty specifications at first, that need the be refined after a subsequent inspection. In the worst case, require-ments are contradictory or miss important assumptions resulting in unrealizable specifica-tions. In both scenarios, additional feedback from the synthesis engine is needed to debug the errors. To this end, we also explore two further improvements. On the one hand, we consider output sensitive synthesis metrics. On the other hand, we analyse the extension with delay, whose requirement is a frequent reason for unrealizability.

Fabian BENDUN

Privacy Enhancing Technologies: Protocol verification, implementation and specification
(Advisor: Prof. Michael Backes)
Monday, 07.09.2020, 11:00 h

Abstract:

In this thesis, we present novel methods for verifying, implementing, and specifying protocols. In particular, we focus on properties modeling data protection and the protection of privacy. In the first part of the thesis, we introduce protocol verification and present a model for verification that encompasses so-called Zero-Knowledge (ZK) proofs. These ZK proofs are a cryptographic primitive that is particularly suited for hiding information and hence serves the protection of privacy. The model presented here gives a list of criteria that allow the transfer of verification results from the model to the implementation if the criteria are met by the implementation. In particular, the criteria are less demanding than those of previous work on ZK proofs. The second part of the thesis contributes to the area of protocol implementations. Here, ZK proofs are used to improve multi-party computations. The third and last part of the thesis explains a novel approach for specifying data protection policies. Instead of relying on policies, this approach relies on actual legislation. The advantage of relying on legislation is that it often introduces a fair balancing that is typically not contained in regulations or policies.

August

Mikko RAUTIAINEN

Sequence to Graph Alignment: Theory, Practice and Applications
(Advisor: Prof. Tobias Marschall, now Uni Düsseldorf)
Tuesday, 25.08.2020, 10:00 h

Abstract:

Genome graphs, also called sequence graphs, provide a way of explicitly representing genomic variation and are used in several applications in bioinformatics. Due to the growing importance of sequence graphs, methods for handling graph-based data structures are becoming more important. In this work I examine the generalization of sequence alignment to graphs. First, I present a theoretical basis for quick bit-parallel sequence-to-graph alignment. The bit-parallel method outperforms previous algorithms by a factor of 3-21x in runtime depending on graph topology. Next, I present GraphAligner, a practical tool for aligning sequences to graphs. GraphAligner enables sequence-to-graph alignment to scale to mammalian-sized genomes and is an order of magnitude faster than previous graph alignment tools. Finally, I present two applications where GraphAligner is an essential part. First, AERON is a tool for quantifying RNA expression and detecting gene fusion events with long reads. Second, I present a hybrid graph-based genome assembly pipeline that uses novel methods to combine short and long read sequencing technologies.
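
To give a flavour of the underlying problem, the following toy dynamic program aligns a query string against a node-labeled DAG under unit edit costs. The node numbering, labels, and the semi-global scoring are illustrative assumptions; the bit-parallel algorithm of the thesis is far faster and also handles cyclic graphs.

    # Toy sequence-to-DAG alignment under unit edit costs. Nodes are assumed
    # to be numbered in topological order; the alignment may end at any node.
    def align_to_dag(query, labels, edges):
        n, m = len(labels), len(query)
        preds = {v: [] for v in range(n)}
        for u, v in edges:
            preds[v].append(u)
        INF = float("inf")
        virt = list(range(m + 1))            # virtual start: j deletions so far
        dp = [[INF] * (m + 1) for _ in range(n)]
        for v in range(n):
            rows = [dp[u] for u in preds[v]] or [virt]
            for j in range(m + 1):
                best = min(r[j] for r in rows) + 1           # insert node v
                if j > 0:
                    sub = 0 if labels[v] == query[j - 1] else 1
                    best = min(best, min(r[j - 1] for r in rows) + sub)
                    best = min(best, dp[v][j - 1] + 1)       # delete query char
                dp[v][j] = best
        return min(dp[v][m] for v in range(n))

    # A linear graph spelling "GATTACA" matches the identical query at cost 0.
    print(align_to_dag("GATTACA", list("GATTACA"), [(i, i + 1) for i in range(6)]))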

Ralf JUNG

Understanding and Evolving the Rust Programming Language
(Advisor: Prof. Derek Dreyer)
Friday, 21.08.2020, 11:00 h

Abstract:

Rust is a young systems programming language that aims to fill the gap between high-level languages with strong safety guarantees and low-level languages that give up on safety in exchange for more control. Rust promises to achieve memory and thread safety while at the same time giving the programmer control over data layout and memory management. This dissertation presents two projects establishing the first formal foundations for Rust, enabling us to better understand and evolve this language: RustBelt and Stacked Borrows. RustBelt is a formal model of Rust’s type system, together with a soundness proof establishing memory and thread safety. The model is also able to verify some intricate APIs from the Rust standard library that internally encapsulate the use of unsafe language features. Stacked Borrows is a proposed extension of the Rust specification which enables the compiler to use Rust types to better analyze and optimize the code it is compiling. The adequacy of this specification is not only evaluated formally, but also by running real Rust code in an instrumented version of Rust’s Miri interpreter that implements the Stacked Borrows semantics. RustBelt is built on top of Iris, a language-agnostic framework for building higher-order concurrent separation logics. This dissertation gives an introduction to Iris and explains how to build complex high-level reasoning principles from a few simple ingredients. In RustBelt, this technique is used to introduce the lifetime logic, a separation logic account of borrowing, which plays a key role in the Rust type system.

July

Alexander GRESS

Integration of protein three-dimensional structure into the workflow of interpretation of genetic variants
(Advisor: Prof. Olga Kalinina)
Friday, 24.07.2020, 12:00 h

Abstract:

In this thesis I present my work on the integration of protein 3D structure information into the methodological workflow of computational biology. We developed an algorithmic pipeline that is able to map protein sequences to protein structures, providing an additional source of information. We used this pipeline in order to analyze the effects of genetic variants from the perspective of protein 3D structures. We analyzed genetic variants associated with diseases and compared their structural arrangements to those of neutral variants. Additionally, we discussed how structural information can improve methods that aim to predict the consequences of genetic variants.

Fatemeh BEHJATI ARDAKANI

Computational models of gene expression regulation
(Advisor: Prof. Marcel Schulz, now Frankfurt)
Tuesday, 21.07.2020, 10:00 h

Abstract:

To date, many efforts have been put into elucidating the genetic or epigenetic defects that result in various diseases. Even though there has been a lot achieved through these efforts, there are still many unknowns about gene regulation. In this thesis, I describe our work on various problems about understanding the gene regulation. In the first project we proposed a new approach for predicting transcription factor binging across different tissues and cell lines. Next, we focused on a special group of genes, namely bidirectional genes. Through adopting appropriate regularization incorporated into our statistical model, we were able to achieve interpretable predictive models linking the histone modification data with gene expression. The two subsequent projects aimed at taking advantage of single cell sequencing data. By investigating the single cell expression patterns of bidirectional genes, we proposed a novel hypothetical model describing several transcription states existing in bidirectional genes. Finally, we developed a multi-task learning framework suitable for analyzing single cell data from which we were able to derive potential new discoveries through deciphering the complexities of the regulation mechanism.
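
As an illustration of the kind of interpretable, regularized model mentioned above, the sketch below fits a sparse linear (Lasso) model linking per-gene histone modification signals to expression. The data, mark count, and regularization strength are invented for the example and do not come from the thesis.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n_genes, n_marks = 500, 10                      # hypothetical histone marks
    X = rng.normal(size=(n_genes, n_marks))         # histone signal per gene
    true_w = np.zeros(n_marks)
    true_w[[0, 3]] = [2.0, -1.5]                    # only two marks matter here
    y = X @ true_w + 0.1 * rng.normal(size=n_genes) # simulated log expression

    model = Lasso(alpha=0.05).fit(X, y)
    # The L1 penalty drives irrelevant coefficients to zero, which is what
    # makes the fitted model interpretable.
    print(np.round(model.coef_, 2))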

Sabine MÜLLER

Nephroblastoma in MRI Data
(Advisor: Prof. Joachim Weickert)
Thursday, 09.07.2020, 16:15 h

Abstract:

The main objective of this work is the mathematical analysis of nephroblastoma in MRI sequences. At the beginning we provide two different datasets for segmentation and classification. Based on the first dataset, we analyze the current clinical practice regarding therapy planning on the basis of annotations of a single radiologist. We can show with our benchmark that this approach is not optimal and that there may be significant differences between human annotators and even radiologists. In addition, we demonstrate that the approximation of the tumor shape currently used is too coarse-grained and thus prone to errors. We address this problem and develop a method for interactive segmentation that allows an intuitive and accurate annotation of the tumor.
While the first part of this thesis is mainly concerned with the segmentation of Wilms’ tumors, the second part deals with the reliability of diagnosis and the planning of the course of therapy. The second dataset we compiled allows us to develop a method that dramatically improves the differential diagnosis between nephroblastoma and its precursor lesion nephroblastomatosis. Finally, we can show that even the standard MRI modality for Wilms’ tumors is sufficient to estimate the developmental tendencies of nephroblastoma under chemotherapy.

Yongqin XIAN

Learning from Limited Labeled Data – Zero-Shot and Few-Shot Learning
(Advisor: Prof. Bernt Schiele)
Tuesday, 07.07.2020, 16:00 h

Abstract:

Human beings have the remarkable ability to recognize novel visual concepts after observing only a few or zero examples of them. Deep learning, however, often requires a large amount of labeled data to achieve a good performance. Labeled instances are expensive, difficult and even infeasible to obtain because the distribution of training instances among labels naturally exhibits a long tail. Therefore, it is of great interest to investigate how to learn efficiently from limited labeled data. This thesis concerns an important subfield of learning from limited labeled data, namely, low-shot learning. The setting assumes the availability of many labeled examples from known classes and the goal is to learn novel classes from only a few (few-shot learning) or zero (zero-shot learning) training examples of them. To this end, we have developed a series of multi-modal learning approaches to facilitate the knowledge transfer from known classes to novel classes for a wide range of visual recognition tasks including image classification, semantic image segmentation and video action recognition. More specifically, this thesis mainly makes the following contributions. First, as there is no agreed upon zero-shot image classification benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets. Second, in order to tackle the labeled data scarcity, we propose feature generation frameworks that synthesize data in the visual feature space for novel classes. Third, we extend zero-shot learning and few-shot learning to the semantic segmentation task and propose a challenging benchmark for it. We show that incorporating semantic information into a semantic segmentation network is effective in segmenting novel classes. Finally, we develop better video representation for the few-shot video classification task and leverage weakly-labeled videos by an efficient retrieval method.
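
A minimal sketch of the feature-generation idea for zero-shot learning: learn a map from class attributes to visual features on seen classes, synthesize features for unseen classes, and train an ordinary classifier on the synthetic data. The linear "generator", the dimensions, and the noise levels are toy assumptions; the thesis uses learned generative models.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, Ridge

    rng = np.random.default_rng(1)
    d_feat, d_attr = 64, 16
    # Hypothetical attribute vectors for 5 seen and 2 novel (unseen) classes.
    A_seen = rng.normal(size=(5, d_attr))
    A_unseen = rng.normal(size=(2, d_attr))
    W_true = rng.normal(size=(d_attr, d_feat))      # hidden attribute->feature map
    X_seen = np.vstack([a @ W_true + 0.3 * rng.normal(size=(100, d_feat))
                        for a in A_seen])

    # 1) Fit a crude attribute -> feature "generator" on the seen classes.
    M = Ridge(alpha=1.0).fit(np.repeat(A_seen, 100, axis=0), X_seen)
    # 2) Synthesize features for unseen classes, then train a normal classifier.
    X_fake = np.vstack([M.predict(a[None]) + 0.3 * rng.normal(size=(100, d_feat))
                        for a in A_unseen])
    y_fake = np.repeat(np.arange(2), 100)
    clf = LogisticRegression(max_iter=1000).fit(X_fake, y_fake)

    # 3) Evaluate on "real" unseen-class features.
    X_test = np.vstack([a @ W_true + 0.3 * rng.normal(size=(50, d_feat))
                        for a in A_unseen])
    y_test = np.repeat(np.arange(2), 50)
    print("unseen-class accuracy:", clf.score(X_test, y_test))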

Kailash BUDHATHOKI

Causal Inference on Discrete Data
(Advisor: Prof. Jilles Vreeken)
Monday, 06.07.2020, 14:00 h

Abstract:

Causal inference is one of the fundamental problems in science. A particularly interesting setting is to tell cause from effect between a pair of random variables X and Y given a sample from their joint distribution. For a long period of time, it was thought to be impossible to distinguish between the causal structures X → Y and Y → X from observational data, as the factorisation of the joint distribution is the same in both directions. In the past decade, researchers have made great strides in this direction by exploiting sophisticated properties of the joint distribution. Most of the existing methods, however, are for continuous real-valued data.
In the first part of the thesis, we consider bivariate causal inference on different discrete data settings—univariate i.i.d., univariate non-i.i.d., and multivariate i.i.d. pairs. To this end, we build upon the principle of algorithmic independence of conditionals (AIC), which states that the marginal distribution of the cause is algorithmically independent of the conditional distribution of the effect given the cause. However, as Kolmogorov complexity is not computable, we approximate the AIC from above through the statistically sound Minimum Description Length (MDL) principle. On univariate i.i.d. and non-i.i.d. pairs, where causal mechanisms are simple, we use refined MDL codes that are minimax optimal w.r.t. a model class. We resort to crude MDL codes on a pair of multivariate i.i.d. variables.
Although useful, saying that there exists a causal relationship from a set of variables towards a certain variable of interest does not always fully satisfy one’s curiosity; for a domain expert it is of particular interest to know those conditions that are most effective, such as the combinations of drugs and their dosages that are most effective towards recovery. Motivated by this problem, in the second part of this thesis, we consider discovering statistically reliable causal rules from observational data. Overall, extensive evaluations show that methods proposed in this thesis are highly accurate, and discover meaningful causations from real-world data.
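
The following toy illustrates the flavour of the MDL-based decision rule for a discrete pair: encode X and then Y given X, and compare with the opposite direction. Since the plug-in entropy terms alone are symmetric, the toy charges a BIC-style model cost of half a log per parameter, per conditioning block; this is a crude stand-in for the refined codes used in the thesis, and the sample is synthetic.

    import math
    import random
    from collections import Counter

    def code_length(xs, ys):
        """Toy two-part code length (bits) for X followed by Y given X:
        data cost = empirical (conditional) entropies, model cost = half a
        log2 per free parameter, per conditioning block (BIC-style). A crude
        stand-in for the refined MDL codes used in the thesis."""
        n = len(xs)
        bits = 0.0
        cx = Counter(xs)
        for c in cx.values():
            bits += -c * math.log2(c / n)                # data cost for X
        bits += 0.5 * (len(cx) - 1) * math.log2(n)       # model cost for P(X)
        ny = len(set(ys))
        for x, cnt in cx.items():
            sub = Counter(y for xi, y in zip(xs, ys) if xi == x)
            for c in sub.values():
                bits += -c * math.log2(c / cnt)          # data cost for Y | X=x
            bits += 0.5 * (ny - 1) * math.log2(cnt)      # model cost for P(Y|X=x)
        return bits

    random.seed(0)
    X = [random.randrange(8) for _ in range(2000)]
    Y = [(x % 2) ^ (random.random() < 0.05) for x in X]  # Y: noisy function of X
    print("X -> Y" if code_length(X, Y) < code_length(Y, X) else "Y -> X")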

Jan-Hendrik LANGE

Multicut Optimization Guarantees & Geometry of Lifted Multicuts
(Advisor: Prof. Björn Andres, now Dresden)
Friday, 03.07.2020, 16:30 h

Abstract:

The multicut problem, a graph partitioning model purely based on similarities and dissimilarities defined on the edges, has gained attention in recent years in areas such as network analysis and computer vision. Its key feature is that the number of components of the partition need not be specified, but is determined as part of the optimization process given the edge information. In this thesis talk we address two challenges associated with this NP-hard combinatorial optimization problem.
Firstly, the methods that are commonly employed to solve practical multicut problems are fast heuristics, due to the large size of the instances. While the provided solutions are of high quality empirically, they come without any (non-trivial) guarantees, neither in terms of the objective value nor in terms of the variable values. In the first part of the talk we develop efficient algorithms as well as theory that both explain and exploit the structural simplicity of practical instances in order to compute non-trivial guarantees. Our results include lower bounds and parts of optimal solutions that are obtained efficiently as well as theoretical insights into the tightness of linear programming relaxations.
Secondly, multicuts only encode the same-cluster relationship for pairs of nodes that are adjacent in the underlying graph. A generalization of the multicut problem that makes it possible to model such inter-cluster connectivity for arbitrary pairs of nodes is known as the lifted multicut problem. In the second part of the talk, we study the (convex hull of the) feasible set, the lifted multicut polytope. We investigate under which conditions the fundamental inequalities associated with lifted multicuts define facets of the lifted multicut polytope. For the special case of trees, we further draw a connection to pseudo-Boolean optimization and give a tighter description of the lifted multicut polytope, which is exact when the tree is a simple path. For the moral lineage tracing problem, which is a closely related clustering formulation for joint segmentation and tracking of biological cells in images, we present a tighter description of the associated polytope and an improved branch-and-cut method.

Prabhav KALAGHATGI

Inferring phylogenetic trees under the general Markov model via a minimum spanning tree backbone
(Advisor: Prof. Thomas Lengauer)
Thursday, 02.07.2020, 10:00 h

Abstract:

Phylogenetic trees are models of the evolutionary history of species. Phylogenetic trees are typically leaf-labeled with all species placed on leaves, a model that may not be appropriate for rapidly evolving pathogens that may share ancestor-descendant relationships. In this thesis we model evolutionary relationships using so-called generally labeled phylogenetic trees that allow sampled species to be at interior nodes of the tree. The widely adopted strategy to infer phylogenetic trees involves computationally intensive optimization of continuous-time hidden Markov models (HMM) that make unrealistic assumptions about gene evolution. We present a method called SEM-GM that performs phylogeny inference under a more realistic discrete-time HMM on trees known as the general Markov model. Additionally, we present a minimum spanning tree (MST) framework called MST-backbone which improves the scalability of SEM-GM. MST-backbone (SEM-GM) scales linearly with the number of leaves. However, the location of the root as inferred under the GM model is not realistic on empirical data, suggesting that the GM model may be overtrained. MST-backbone was inspired by the topological relationship between MSTs and phylogenetic trees. We discovered that the topological relationship does not necessarily hold if there is no unique MST. We propose so-called vertex-order based MSTs (VMSTs) that guarantee a topological relationship with phylogenetic trees.

June

Andreas ABEL

Automatic Generation of Models of Microarchitecture
(Advisor: Prof. Jan Reineke)
Friday, 12.06.2020, 10:00 h

Abstract:

Detailed microarchitectural models are necessary to predict, explain, or optimize the performance of software running on modern microprocessors. Building such models often requires a significant manual effort, as the documentation provided by hardware manufacturers is typically not precise enough. The goal of this thesis is to develop techniques for generating microarchitectural models automatically.
In the first part, we focus on recent x86 microarchitectures. We implement a tool to accurately evaluate small microbenchmarks using hardware performance counters. We then describe techniques to automatically generate microbenchmarks for measuring the performance of individual instructions and for characterizing cache architectures. We apply our implementations to more than a dozen different microarchitectures. In the second part of the thesis, we study more general techniques to obtain models of hardware components. In particular, we propose the concept of gray-box learning, and we develop a learning algorithm for Mealy machines that exploits prior knowledge about the system to be learned. Finally, we show how this algorithm can be adapted to minimize incompletely specified Mealy machines, a well-known NP-complete problem. Our implementation outperforms existing exact minimization techniques by several orders of magnitude on a number of hard benchmarks; it is even competitive with state-of-the-art heuristic approaches.

Lara SCHNEIDER

Multi-omics integrative analyses for decision support systems in personalized cancer treatment
(Advisor: Prof. Hans-Peter Lenhof)
Wednesday, 10.06.2020, 09:30 h

Abstract:

Cancer is a heterogeneous class of diseases caused by the complex interplay of (epi-)genetic aberrations and is difficult to treat due to its heterogeneity. In this thesis, we present a tool suite of novel methods and computational tools for the genetic and molecular characterization of tumors to support decision making in personalized cancer treatments. The first tool included in this tool suite is GeneTrail2, a web service for the statistical analysis of molecular signatures. While GeneTrail2 focuses on the evaluation of aberrant biological pathways and signal cascades, RegulatorTrail identifies those transcriptional regulators that appear to have a high impact on these pathogenic processes. With DrugTargetInspector (DTI), we focus specifically on personalized medicine and translational research. DTI offers comprehensive functions for the evaluation of target molecules, the associated signaling cascades, and the corresponding drugs. Finally, we present ClinOmicsTrailbc, an analysis tool for stratification of breast cancer treatments. ClinOmicsTrailbc supports clinicians with a comprehensive evaluation of on- and off-label drugs as well as immune therapies. In summary, this work presents novel methods and computational tools for the integrative analysis of multi-omics data for translational research and clinical decision support, assisting researchers and clinicians in finding the best possible treatment options in a deeply personalized way.

Yuliya BUTKOVA

Towards Efficient Analysis of Markov Automata
(Advisor: Prof. Holger Hermanns)
Thursday, 04.06.2020, 17:00 h

Abstract:

One of the most expressive formalisms to model concurrent systems is Markov automata. They serve as a semantics for many higher-level formalisms, such as generalised stochastic Petri nets and dynamic fault trees. Two of the most challenging problems for Markov automata to date are (i) the optimal time-bounded reachability probability and (ii) the optimal long-run average rewards. In this thesis, we aim at designing efficient sound techniques to analyse them. We approach the problem of time-bounded reachability from two different angles. First, we study the properties of the optimal solution and exploit this knowledge to construct an efficient algorithm that approximates the optimal values up to a guaranteed error bound. This algorithm is exhaustive, i.e., it computes values for each state of the Markov automaton. This may be a limitation for very large or even infinite Markov automata. To address this issue we design a second algorithm that approximates the optimal solution by only working with part of the total state-space. For the problem of long-run average rewards there exists a polynomial algorithm based on linear programming. Instead of chasing a better theoretical complexity bound we search for a practical solution based on an iterative approach. We design a value iteration algorithm that in our empirical evaluation turns out to scale several orders of magnitude better than the linear programming based approach.
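
The iterative flavour of such an approach can be seen in a few lines of value iteration, here for the simpler textbook problem of maximal (unbounded) reachability in a small MDP. The automaton, the convergence threshold, and the plain stopping criterion are toy assumptions; the thesis handles Markov automata, time bounds, and long-run averages with guaranteed error bounds.

    # Toy value iteration: maximal probability of reaching "goal" in an MDP.
    # state -> list of actions; each action is a list of (probability, successor).
    mdp = {
        "s0": [[(0.5, "s1"), (0.5, "s2")], [(1.0, "s2")]],
        "s1": [[(1.0, "goal")]],
        "s2": [[(0.9, "s0"), (0.1, "sink")]],
        "goal": [], "sink": [],
    }

    v = {s: (1.0 if s == "goal" else 0.0) for s in mdp}
    for _ in range(1000):                       # fixed-point iteration
        new = dict(v)
        for s, actions in mdp.items():
            if actions:                         # maximize over actions
                new[s] = max(sum(p * v[t] for p, t in a) for a in actions)
        if max(abs(new[s] - v[s]) for s in v) < 1e-10:
            break
        v = new
    print({s: round(x, 4) for s, x in v.items()})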

Ankur SHARMA

Snapshot: Friend or Foe of Data Management — On Optimizing Transaction Processing in Database and Blockchain Systems
(Advisor: Prof. Jens Dittrich)
Wednesday, 03.06.2020, 14:00 h

Abstract:

Data management is a complicated task. Due to a wide range of data management tasks, businesses often need a sophisticated data management infrastructure with a plethora of distinct systems to fulfill their requirements. Moreover, since snapshots are an essential ingredient in solving many data management tasks such as checkpointing and recovery, they have been widely exploited in almost all major data management systems that have appeared in recent years. However, snapshots do not always guarantee exceptional performance. In this dissertation, we will see two different faces of the snapshot: one where it has a tremendous positive impact on the performance and usability of the system, and another where an incorrect usage of the snapshot might have a significant negative impact on the performance of the system. This dissertation consists of three loosely-coupled parts that represent three distinct projects that emerged during this doctoral research.
In the first part, we analyze the importance of utilizing snapshots in relational database systems. We identify the bottlenecks in state-of-the-art snapshotting algorithms, propose two snapshotting techniques, and optimize the multi-version concurrency control for handling hybrid workloads effectively. Our snapshotting algorithm is up to 100x faster and reduces the latency of analytical queries by up to 4x in comparison to the state-of-the-art techniques. In the second part, we recognize strict snapshotting used by Fabric as a critical bottleneck, and replace it with MVCC and propose some additional optimizations to improve the throughput of the permissioned-blockchain system by up to 12x under highly contended workloads. In the last part, we propose ChainifyDB, a platform that transforms an existing database infrastructure into a blockchain infrastructure. ChainifyDB achieves up to 6x higher throughput in comparison to another state-of-the-art permissioned blockchain system. Furthermore, its external concurrency control protocol outperforms the internal concurrency control protocol of PostgreSQL and MySQL, achieving up to 2.6x higher throughput in a blockchain setup in comparison to a standalone isolated setup. We also utilize snapshots in ChainifyDB to support recovery, which has been missing so far from the permissioned-blockchain world.
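
A toy version of the snapshot mechanism that makes the "friend" side work: a multi-version store in which readers pin a snapshot timestamp and never block writers. All names here are made up, and aborts, garbage collection, and the concurrency control that the thesis actually optimizes are omitted.

    import itertools

    class MVStore:
        """Toy multi-version store: a reader sees, for each key, the latest
        version committed at or before its snapshot timestamp, so long-running
        analytical reads never block concurrent writers."""
        def __init__(self):
            self.clock = itertools.count(1)
            self.versions = {}              # key -> [(commit_ts, value), ...]

        def write(self, key, value):
            ts = next(self.clock)           # commit timestamp
            self.versions.setdefault(key, []).append((ts, value))
            return ts

        def snapshot(self):
            return next(self.clock)         # pin a point in logical time

        def read(self, key, snap_ts):
            chain = self.versions.get(key, [])
            visible = [v for ts, v in chain if ts <= snap_ts]
            return visible[-1] if visible else None

    db = MVStore()
    db.write("x", 1)
    snap = db.snapshot()        # analytical query starts here
    db.write("x", 2)            # concurrent update
    print(db.read("x", snap))   # -> 1: the snapshot is unaffected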

May

Alexander HEWER

Registration and Statistical Analysis of the Tongue Shape during Speech Production
(Advisor: Dr. Ingmar Steiner)
Wednesday, 27.05.2020, 16:15 h

Abstract:

The human tongue plays an important role in everyday life. In particular, it contributes to speech production by interacting with other articulators, like the palate and the teeth. Thus, it is of great interest to understand its shape and motion during this process. Magnetic resonance imaging (MRI) is nowadays regarded as the state-of-the-art modality for analyzing the vocal tract during speech production.
Results of such an analysis can be used to construct statistical tongue models that are able to synthesize realistic tongue motion and shape.
In my defense talk, I will present some highlights of my research that focused on deriving such models from static MRI data. For example, I will show how MRI scans can be denoised for further processing by using a diffusion based approach. We will also learn how image segmentation and template matching can help to estimate the tongue shape information from such data.
Finally, we will see how a statistical tongue model can be utilized to reconstruct the whole three-dimensional tongue shape from sparse motion capture data.

April

Vineet RAJANI

A type-theory for higher-order amortized analysis
(Advisor: Prof. Deepak Garg)
Wednesday, 15.04.2020, 14:00 h

Abstract:

Amortized analysis is a popular algorithmic technique used to analyze the resource consumption of a program. It works by using a ghost state called the potential and showing that the available potential is sufficient to account for the resource consumption of the program.
In this thesis we present λ-amor, a type-theory for amortized resource analysis of higher-order functional programs. Verification in λ-amor is based on the use of a novel type theoretic construct to represent potential at the level of types and then building an affine type-theory around it. With λ-amor we show that type-theoretic amortized analysis can be performed using well-understood concepts from sub-structural and modal type theories. Yet, it yields an extremely expressive framework which can be used for resource analysis of higher-order programs, both in a strict and lazy setting. We show embeddings of two very different styles (one based on effects and the other on coeffects) of type-theoretic resource analysis frameworks into λ-amor. We show that λ-amor is sound (using a logical relations model) and complete for cost analysis of PCF programs (using one of the embeddings).
Next, we apply ideas from λ-amor to develop another type theory (called λ-cg) for a very different domain — Information Flow Control (IFC). λ-cg uses a similar type-theoretic construct (which λ-amor uses for the potential) to represent confidentiality label (the ghost state for IFC).
Finally, we abstract away from the specific ghost states (potential and confidentiality label) and describe how to develop a type-theory for a general ghost state with a monoidal structure.
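
For reference, the classic potential-method accounting that λ-amor lifts to the level of types reads as follows (standard textbook notation, not the notation of the thesis): the amortized cost of the i-th operation is its actual cost plus the change in potential, and if the potential is non-negative and initially zero, the total amortized cost bounds the total actual cost.

    c_i^{\mathrm{amortized}} = c_i^{\mathrm{actual}} + \Phi(s_i) - \Phi(s_{i-1})

    \sum_{i=1}^{n} c_i^{\mathrm{actual}}
        = \sum_{i=1}^{n} c_i^{\mathrm{amortized}} + \Phi(s_0) - \Phi(s_n)
        \le \sum_{i=1}^{n} c_i^{\mathrm{amortized}}
        \qquad \text{if } \Phi \ge 0 \text{ and } \Phi(s_0) = 0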

Andreas SCHMIDT

Cross-layer Latency-Aware and -Predictable Data Communication
(Advisor: Prof. Thorsten Herfet)
Tuesday, 07.04.2020, 14:00 h

Abstract:

Cyber-physical systems are making their way into more aspects of everyday life. These systems are increasingly distributed and hence require networked communication to coordinatively fulfil control tasks. Providing this in a robust and resilient manner demands latency-awareness and -predictability at all layers of the communication and computation stack. This thesis addresses how these two latency-related properties can be implemented at the transport layer to serve control applications in ways that traditional approaches such as TCP or RTP cannot. To this end, the Predictably Reliable Real-time Transport (PRRT) protocol is presented, including its unique features (e.g. partially reliable, ordered, in-time delivery, and latency-avoiding congestion control) and unconventional APIs. This protocol has been intensively evaluated using the X-Lap toolkit that has been specifically developed to support protocol designers in improving latency, timing, and energy characteristics of protocols in a cross-layer, intra-host fashion. PRRT effectively circumvents latency-inducing bufferbloat using X-Pace, an implementation of the cross-layer pacing approach presented in this thesis. This is shown using experimental evaluations on real Internet paths. Apart from PRRT, this thesis presents means to make TCP-based transport aware of individual link latencies and increases the predictability of the end-to-end delays using Transparent Transmission Segmentation.

February

Kathrin STARK

Mechanising Syntax with Binders in Coq
(Advisor: Prof. Gert Smolka)
Friday, 14.02.2020, 15:00 h, in building E1 7, room 0.01

Abstract:

Mechanising binders in general-purpose proof assistants such as Coq is cumbersome and difficult. Yet binders, substitutions, and instantiation of terms with substitutions are a critical ingredient of many programming languages. Any practicable mechanisation of the meta-theory of the latter hence requires a lean formalisation of the former. We investigate the topic from three angles:
First, we realise formal systems with binders based on both pure and scoped de Bruijn algebras together with basic syntactic rewriting lemmas and automation. We automate this process in a compiler called Autosubst; our final tool supports many-sorted, variadic, and modular syntax. Second, we justify our choice of realisation and mechanise a proof of convergence of the sigma-calculus, a calculus of explicit substitutions that is complete for equality of the de Bruijn algebra corresponding to the lambda-calculus.
Third, to demonstrate the practical usefulness of our approach, we provide concise, transparent, and accessible mechanised proofs for a variety of case studies refined to de Bruijn substitutions.
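
To make the boilerplate concrete, here is a toy Python rendition of the de Bruijn operations (shift, substitution, beta reduction) whose definitions and rewriting lemmas Autosubst generates and proves inside Coq; the term representation is illustrative and not Autosubst's.

    # Untyped lambda terms with de Bruijn indices: Var(k), Lam(body), App(f, a).
    from dataclasses import dataclass

    @dataclass
    class Var:
        k: int

    @dataclass
    class Lam:
        body: object

    @dataclass
    class App:
        f: object
        a: object

    def shift(t, d, cutoff=0):
        """Add d to every free variable (index >= cutoff) in t."""
        if isinstance(t, Var):
            return Var(t.k + d) if t.k >= cutoff else t
        if isinstance(t, Lam):
            return Lam(shift(t.body, d, cutoff + 1))
        return App(shift(t.f, d, cutoff), shift(t.a, d, cutoff))

    def subst(t, j, s):
        """Capture-avoiding substitution t[j := s]."""
        if isinstance(t, Var):
            return s if t.k == j else t
        if isinstance(t, Lam):
            return Lam(subst(t.body, j + 1, shift(s, 1)))
        return App(subst(t.f, j, s), subst(t.a, j, s))

    def beta(redex):
        """One beta step on App(Lam(b), a)."""
        return shift(subst(redex.f.body, 0, shift(redex.a, 1)), -1)

    # (\x. \y. x) z  -->  \y. z   (z is Var(0) in the outer context)
    print(beta(App(Lam(Lam(Var(1))), Var(0))))   # Lam(body=Var(k=1))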

Abhimitra MEKA

Live Inverse Rendering
(Advisor: Prof. Christian Theobalt)
Monday, 03.02.2020, 13:45 h, in building E1 4, room 0.19

Abstract:

Seamlessly integrating graphics into real environments requires the estimation of the fundamental light transport components of a scene – geometry, reflectance and illumination. While the theory and practice of estimating environmental geometry and self-localization on mobile devices have progressed rapidly, the task of estimating scene reflectance and illumination from monocular images or videos in real-time (termed live inverse rendering) is still at a nascent stage. The challenge is that of designing efficient representations and models for these appearance parameters and solving the resulting high-dimensional, non-linear and under-constrained system of equations at frame rate.
This thesis talk will comprehensively explore various representations, formulations, algorithms and systems for addressing these challenges in monocular inverse rendering. Starting with simple assumptions on the light transport model – of Lambertian surface reflectance and a single light bounce scenario – the talk will expand in several directions by including 3D geometry, multiple light bounces, non-Lambertian isotropic surface reflectance and data-driven reflectance representation to address various facets of this problem. In the first part, the talk will explore the design of fast parallel non-linear GPU optimization schemes for solving both sparse and dense sets of equations underlying the inverse rendering problem. In the next part, the current advances in machine learning methods will be applied to design novel formulations and loss-energies to give a significant push to the state-of-the-art of reflectance and illumination estimation. Several real-time applications of illumination-aware scene editing, including relighting and material-cloning, will also be shown to be possible for the first time by the new models proposed in this thesis.
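
The starting point mentioned above, single-bounce Lambertian image formation, fits in a few lines: pixel intensity is albedo times the clamped dot product of surface normal and light direction. Inverse rendering must recover the factors on the right-hand side from the image alone; the resolution and the random scene below are stand-ins.

    import numpy as np

    # Forward Lambertian shading with one directional light:
    # intensity = albedo * max(0, normal . light).
    H, W = 4, 4
    rng = np.random.default_rng(0)
    albedo = rng.uniform(0.2, 0.9, size=(H, W))
    normals = rng.normal(size=(H, W, 3))
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    light = np.array([0.3, 0.5, 0.8])
    light /= np.linalg.norm(light)

    image = albedo * np.clip(normals @ light, 0.0, None)
    print(image.round(3))   # inverse rendering recovers the factors from this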

January

Mathias FLEURY

Formalization of Logical Calculi in Isabelle/HOL
(Advisor: Prof. Christoph Weidenbach)
Tuesday, 28.01.2020, 10:00 h, in building E1 4, room 0.24

Abstract:

I develop a formal framework for propositional satisfiability with the conflict-driven clause learning (CDCL) procedure using the Isabelle/HOL proof assistant. The framework offers a convenient way to prove metatheorems and experiment with variants, including the Davis-Putnam-Logemann-Loveland procedure. The most noteworthy aspects of my work are the inclusion of rules for forget and restart, and the refinement approach.
Through a chain of refinements, I connect the abstract CDCL calculus first to a more concrete calculus, then to a SAT solver expressed in a simple functional programming language, and finally to a SAT solver in an imperative language, with total correctness guarantees. The imperative version relies on the two-watched-literal data structure and other optimizations found in modern solvers.
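
The skeleton that CDCL refines is the classic DPLL procedure: unit propagation plus branching. The toy solver below (plain Python, clauses as lists of integer literals) shows that skeleton only; clause learning, forget, restart, and the two-watched-literal structure of the verified solver are all absent.

    def dpll(clauses, assignment=()):
        """Tiny DPLL solver; clauses are lists of non-zero integers, where -v
        means "not v". Returns a satisfying list of literals, or None."""
        assignment = list(assignment)
        changed = True
        while changed:                                   # unit propagation
            changed = False
            remaining = []
            for clause in clauses:
                if any(lit in assignment for lit in clause):
                    continue                             # clause already satisfied
                rest = [lit for lit in clause if -lit not in assignment]
                if not rest:
                    return None                          # conflict
                if len(rest) == 1:
                    assignment.append(rest[0])           # forced literal
                    changed = True
                remaining.append(rest)
            clauses = remaining
        free = {abs(l) for c in clauses for l in c}
        if not free:
            return assignment                            # every clause satisfied
        v = min(free)                                    # branch on a variable
        for lit in (v, -v):
            result = dpll(clauses, assignment + [lit])
            if result is not None:
                return result
        return None

    # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
    print(dpll([[1, 2], [-1, 3], [-2, -3]]))             # e.g. [1, 3, -2]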

Maximilian JOHN

Of Keyboards and Beyond – Optimization in Human-Computer-Interaction
(Advisor: Dr. Andreas Karrenbauer)
Wednesday, 15.01.2020, 16:30 h, in building E1 4, room 0.24

Abstract:

This talk covers optimization challenges in the context of human-computer interaction, specifically in UI design. In particular, we focus on the design of the new French keyboard layout, discuss the special challenges of this national-scale project and our algorithmic contributions. Exploiting the special structure of this design problem, we propose an optimization framework that efficiently computes keyboard layouts and provides very good optimality guarantees in the form of tight lower bounds. The optimized layout that we showed to be nearly optimal was the basis of the new French keyboard standard recently published in the National Assembly in Paris. Finally, we introduce a modeling language for mixed integer programs that especially focuses on the challenges and features that appear in participatory optimization problems similar to the French keyboard design process.
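
When the pairwise (bigram) terms are dropped, the core of such a layout problem becomes a linear assignment problem that standard solvers handle exactly. The characters, frequencies, and key costs below are invented, and the real French-layout model is quadratic and much richer.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Toy layout problem: assign characters to key slots so that frequent
    # characters land on cheap (easy-to-reach) keys.
    chars = ["e", "a", "s", "@", "#"]
    freq = np.array([0.40, 0.30, 0.20, 0.07, 0.03])   # hypothetical frequencies
    key_cost = np.array([1.0, 1.2, 1.5, 2.0, 2.5])    # effort per key slot

    cost = np.outer(freq, key_cost)     # cost[i, j] = freq_i * effort_j
    rows, cols = linear_sum_assignment(cost)
    for i, j in zip(rows, cols):
        print(f"char {chars[i]!r} -> key slot {j} (effort {key_cost[j]})")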

Patrick TRAMPERT

3D Exemplar-based Image Inpainting in Electron Microscopy
(Advisor: Prof. Philipp Slusallek)
Monday, 13.01.2020, 14:00 h, in building D3 4 (DFKI), VisCenter room

Abstract:

In electron microscopy (EM) a common problem is the non-availability of data, which causes artefacts in reconstructions. In this thesis the goal is to generate artificial data where missing in EM by using exemplar-based inpainting (EBI). We implement an accelerated 3D version tailored to applications in EM, which reduces reconstruction times from days to minutes.
We develop intelligent sampling strategies to find optimal data as input for reconstruction methods. Further, we investigate approaches to reduce electron dose and acquisition time. Sparse sampling followed by inpainting is the most promising approach. As common evaluation measures may lead to misinterpretation of results in EM and falsify a subsequent analysis, we propose to use application driven metrics and demonstrate this in a segmentation task.
A further application of our technique is the artificial generation of projections in tilt- based EM. EBI is used to generate missing projections, such that the full angular range is covered. Subsequent reconstructions are significantly enhanced in terms of resolution, which facilitates further analysis of samples.
In conclusion, EBI proves promising when used as an additional data generation step to tackle the non-availability of data in EM, which is evaluated in selected applications. Enhancing adaptive sampling methods and refining EBI, especially considering the mutual influence, promotes higher throughput in EM using less electron dose while not lessening quality.

2019

December

Martin WEIER

Perception-driven Rendering: Techniques for the Efficient Visualization of 3D Scenes including View- and Gaze-contingent Approaches
(Advisor: Prof. Philipp Slusallek)
Thursday, 19.12.2019, 15:00 h, in building D3 4 (DFKI), VisCenter room

Abstract:

Modern image synthesis approaches enable to create digital images of astonishing complexity and beauty, yet processing resources remain a limiting factor. At the same time, advances in display technologies drive the development of novel display devices, thereupon, further tightening the requirements of rendering approaches. Rendering efficiency is a central challenge involving a trade-off between visual fidelity and interactivity. However, the human visual system has some limitations and the highest possible visual quality is not always necessary. The knowledge of those limitations can be used to develop better and more efficient rendering systems, a field known as perception-driven rendering.  The central question in this thesis is how to exploit the limitations or use the potentials of perception to enhance the quality of different rendering techniques whilst maintaining their performance and vice versa. To this end, this thesis presents some state-of-the-art research and models that exploit the limitations of perception in order to increase visual quality but also to reduce workload alike. This research results in several practical rendering approaches that tackle some of the fundamental challenges of computer graphics. Here, different sampling, filtering, and reconstruction techniques aid the visual quality of the synthesized images. An in-depth evaluation of the presented systems including benchmarks, comparative examination with image metrics as well as user studies and experiments demonstrated that the methods introduced are visually superior or on the same qualitative level as ground truth, whilst having a significantly reduced run time.

Hyeongwoo KIM
Learning-based Face Reconstruction and Editing
(Advisor: Prof. Christian Theobalt)
Thursday, 12.12.2019, 16:15 h, in building E1 4, room 0.19

Abstract:

Photo-realistic face editing – an important basis for a wide range of applications in movie and game productions, and applications for mobile devices – is based on computationally expensive algorithms that often require many tedious time-consuming manual steps. This thesis advances state-of-the-art face performance capture and editing pipelines by proposing machine learning-based algorithms for high-quality inverse face rendering in real time and highly realistic neural face rendering, and a video-based refocusing method for faces and general videos. In particular, the proposed contributions address fundamental open challenges towards real-time and highly realistic face editing. The first contribution addresses face reconstruction and introduces a deep convolutional inverse rendering framework that jointly estimates all facial rendering parameters from a single image in real time. The proposed method is based on a novel boosting process that iteratively updates the synthetic training data to better reflect the distribution of real-world images. Second, the thesis introduces a method for face video editing at previously unseen quality. It is based on a generative neural network with a novel space-time architecture, which enables photo-realistic re-animation of portrait videos using an input video. It is the first method to transfer the full 3D head position, head rotation, face expression, eye gaze and eye blinking from a source actor to a portrait video of a target actor. Third, the thesis contributes a new refocusing approach for faces and general videos in postprocessing. The proposed algorithm is based on a new depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video and the focus distance for each frame. The high-quality results shown with various applications and challenging scenarios demonstrate the contributions presented in the thesis, and also show potential for machine learning-driven algorithms to solve various open problems in computer graphics.

Dhruv GUPTA
Search and Analytics Using Semantic Annotations
(Advisor: Dr. Klaus Berberich)
Thursday, 12.12.2019, 14:00 h, in building E1 4, room 0.24

Abstract:

Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user’s information need. To discern the user’s intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy.
This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections.
This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.

Martin BROMBERGER
Decision Procedures for Linear Arithmetic
(Advisors: Prof. Christoph Weidenbach and Dr. Thomas Sturm, now LORIA, Nancy)
Tuesday, 10.12.2019, 14:00 h, in building E1 5, room 002

Abstract:

In this thesis, we present new decision procedures for linear arithmetic in the context of SMT solvers and theorem provers:
1) CutSAT++, a calculus for linear integer arithmetic that combines techniques from SAT solving and quantifier elimination in order to be sound, terminating, and complete.
2) The largest cube test and the unit cube test, two sound (although incomplete) tests that find integer and mixed solutions in polynomial time (the unit cube test is sketched below). The tests are especially efficient on absolutely unbounded constraint systems, which are difficult to handle for many other decision procedures.
3) Techniques for the investigation of equalities implied by a constraint system. Moreover, we present several applications for these techniques.
4) The Double-Bounded reduction and the Mixed-Echelon-Hermite transformation, two transformations that reduce any constraint system in polynomial time to an equisatisfiable constraint system that is bounded. The transformations are beneficial because they turn branch-and-bound into a complete and efficient decision procedure for unbounded constraint systems.
We have implemented the above decision procedures (except for CutSAT++) as part of our linear arithmetic theory solver SPASS-IQ and as part of our CDCL(LA) solver SPASS-SATT. We also present various benchmark evaluations that confirm the practical efficiency of our new decision procedures.
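
A sketch of the unit cube test from point 2, under the simplifying assumption that the system is given as A x <= b over the reals: the system contains a unit cube, and hence an integer point, exactly if the shrunken system A x <= b - ||A_i||_1 / 2 has a real solution, which an off-the-shelf LP solver can check.

    import numpy as np
    from scipy.optimize import linprog

    def unit_cube_test(A, b):
        """If A x <= b - ||A_i||_1 / 2 has a real solution x*, then rounding
        x* componentwise yields an integer solution of A x <= b. Sound but
        incomplete: failure of the test proves nothing."""
        shrink = b - 0.5 * np.abs(A).sum(axis=1)
        n = A.shape[1]
        res = linprog(np.zeros(n), A_ub=A, b_ub=shrink,
                      bounds=[(None, None)] * n, method="highs")
        return np.round(res.x) if res.success else None

    # x + y <= 10, -x <= 0, -y <= 0: a fat triangle, so the test succeeds.
    A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    b = np.array([10.0, 0.0, 0.0])
    print(unit_cube_test(A, b))     # some integer point inside the triangle

Because every point within distance 1/2 (per coordinate) of the cube center is feasible, any componentwise rounding of the LP solution stays inside the original system, which is what makes the test a cheap filter in front of complete procedures.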

Felipe Fernandes ALBRECHT
Analyzing Epigenomic Data in a Large-Scale Context
(Advisor: Prof. Thomas Lengauer)
Monday, 09.12.2019, 16:15 h, in building E1 4, room 0.23

Abstract:

While large amounts of epigenomic data are publicly available, their retrieval in a form suitable for downstream analysis is a bottleneck in current research. In a typical analysis, users are required to download huge files that span the entire genome, even if they are only interested in a small subset or an aggregation thereof. The DeepBlue Epigenomic Data Server mitigates this issue by providing a robust server that affords a powerful API for searching, filtering, transforming, aggregating, enriching, and downloading data from several epigenomic consortia. This work also presents companion tools that utilize the DeepBlue API to enable users not proficient in programming to analyze epigenomic data: (i) an R/Bioconductor package that integrates DeepBlue into the R analysis workflow; (ii) a web portal that enables users to search, select, filter and download the epigenomic data available in the DeepBlue Server; (iii) DIVE, a web data analysis tool that allows researchers to perform large-scale epigenomic data analysis in a programming-free environment. Furthermore, these tools are integrated, being capable of sharing results among themselves, creating a powerful large-scale epigenomic data analysis environment.

Hazem TORFAH
Model Counting for Reactive Systems
(Advisor: Prof. Bernd Finkbeiner)
Thursday, 05.12.2019, 15:00 h, in building E1 7, room 0.01

Abstract:

Model counting is the problem of computing the number of solutions for a logical formula. In the last few years, it has been primarily studied for propositional logic, and has been shown to be useful in many applications. In planning, for example, propositional model counting has been used to compute the robustness of a plan in an incomplete domain. In information-flow control, model counting has been applied to measure the amount of information leaked by a security-critical system.
In this thesis, we introduce the model counting problem for linear-time properties, and show its applications in formal verification. In the same way propositional model counting generalizes the satisfiability problem for propositional logic, counting models for linear-time properties generalizes the emptiness problem for languages over infinite words to one that asks for the number of words in a language. The model counting problem, thus, provides a foundation for quantitative extensions of model checking, where not only the existence of computations that violate the specification is determined, but also the number of such violations.
We solve the model counting problem for the prominent class of omega-regular properties. We present algorithms for solving the problem for different classes of properties, and show the advantages of our algorithms in comparison to indirect approaches based on encodings into propositional logic.
We further show how model counting can be used for solving a variety of quantitative problems in formal verification, including quantitative information-flow in security-critical systems, and the synthesis of approximate implementations for reactive systems.
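
For contrast with the linear-time setting of the thesis, propositional model counting itself can be stated in a few brute-force lines: enumerate all assignments and count the satisfying ones, where satisfiability merely asks whether that count is non-zero. The clause encoding below is ad hoc for the example.

    from itertools import product

    def count_models(clauses, variables):
        """Brute-force propositional model counting (#SAT): how many total
        assignments satisfy every clause?"""
        count = 0
        for bits in product([False, True], repeat=len(variables)):
            val = dict(zip(variables, bits))
            if all(any(val[v] if pos else not val[v] for v, pos in clause)
                   for clause in clauses):
                count += 1
        return count

    # (a or b) and (not a or c), clauses as lists of (variable, polarity)
    clauses = [[("a", True), ("b", True)], [("a", False), ("c", True)]]
    print(count_models(clauses, ["a", "b", "c"]))   # -> 4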

Yang HE
Improved Methods and Analysis for Semantic Image Segmentation
(Advisor: Prof. Mario Fritz)
Tuesday, 03.12.2019, 15:00 h, in building E1 4, room 0.24

Abstract:

Modern deep learning has enabled amazing developments in computer vision in recent years. As a fundamental task, semantic segmentation aims to predict class labels for each pixel of an image, which empowers machine perception of the visual world. In spite of recent successes of fully convolutional networks, several challenges remain to be addressed. In this work, we focus on this topic, under different kinds of input formats and various types of scenes. Specifically, our study contains two aspects:
(1) Data-driven neural modules for improved performance.
(2) Leveraging datasets to train systems with higher performance and better data privacy guarantees.

Daniel GRÖGER
Digital Fabrication of Custom Interactive Objects with Rich Materials
(Advisor: Prof. Jürgen Steimle)
Tuesday, 03.12.2019, 14:00 h, in building E1 7, room 0.01

Abstract:

This thesis investigates digital fabrication as a key technology for prototyping physical user interfaces with custom interactivity. It contributes four novel design and fabrication approaches for interactive objects that leverage rich materials of everyday objects. The contributions enable easy, accessible, and versatile design and fabrication of interactive objects with custom stretchability, input and output on complex geometries and diverse materials, tactile output on 3D-object geometries, and capabilities of changing their shape and material properties.
Together, the contributions of this thesis advance the fields of digital fabrication, rapid prototyping, and ubiquitous computing towards the bigger goal of exploring interactive objects with rich materials as a new generation of physical interfaces.

Andreas SCHMID
Plane and Sample: Using Planar Subgraphs for Efficient Algorithms
(Advisor: Prof. Kurt Mehlhorn)
Monday, 02.12.2019, 11:00 h, in building E1 4, room 0.24

Abstract:

In my thesis, we showcased how planar subgraphs with special structural properties can be used to find efficient algorithms for two NP-hard problems in combinatorial optimization.
In this talk I will describe how we developed algorithms for the computation of Tutte paths and show how these special subgraphs can be used to efficiently compute long cycles and other relaxations of Hamiltonicity if we restrict the input to planar graphs.
In addition I will give an introduction to the Maximum Planar Subgraph Problem MPS and show how dense planar subgraphs can be used to develop new approximation algorithms for this NP-hard problem. All new algorithms and arguments I present are based on a novel approach that focuses on maximizing the number of triangular faces in the computed subgraph.

November

Kashyap POPAT
Credibility Analysis of Textual Claims with Explainable Evidence
(Advisor: Prof. Gerhard Weikum)
Tuesday, 26.11.2019, 16:00 h, in building E1 5, room 0.29

Abstract:

Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not.
To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources.
We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and to dissect the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim.

Tim RUFFING
Cryptography for bitcoin and friends
(Advisor: Prof. Aniket Kate, now Purdue)
Thursday, 21.11.2019, 16:00 h, in building E1 7, room 0.01

Abstract:

Numerous cryptographic extensions to Bitcoin have been proposed since Satoshi Nakamoto introduced the revolutionary design in 2008. However, only a few proposals have been adopted in Bitcoin and other prevalent cryptocurrencies, whose resistance to fundamental changes has proven to grow with their success. In this dissertation, we introduce four cryptographic techniques that advance the functionality and privacy provided by Bitcoin and similar cryptocurrencies without requiring fundamental changes in their design: First, we realize smart contracts that disincentivize parties in distributed systems from making contradicting statements by penalizing such behavior by the loss of funds in a cryptocurrency. Second, we propose CoinShuffle++, a coin mixing protocol which improves the anonymity of cryptocurrency users by combining their transactions and thereby making it harder for observers to trace those transactions. The core of CoinShuffle++ is DiceMix, a novel and efficient protocol for broadcasting messages anonymously without the help of any trusted third-party anonymity proxies and in the presence of malicious participants. Third, we combine coin mixing with the existing idea to hide payment values in homomorphic commitments to obtain the ValueShuffle protocol, which enables us to overcome major obstacles to the practical deployment of coin mixing protocols. Fourth, we show how to prepare the aforementioned homomorphic commitments for a safe transition to post-quantum cryptography.
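
The anonymity trick underlying anonymous broadcast protocols of this kind is the classical DC-net: every pair of participants shares a random pad, each participant announces the XOR of its pads (plus its message, if it is the sender), and combining all announcements cancels the pads without revealing who sent the message. The single-sender round below is a toy; DiceMix's contribution lies in making such rounds efficient and robust against malicious participants, which this sketch ignores.

    import secrets

    n, msg_len = 4, 8
    sender, message = 2, secrets.token_bytes(msg_len)

    # Each unordered pair of participants shares a random pad.
    pads = {}
    for i in range(n):
        for j in range(i + 1, n):
            pads[(i, j)] = secrets.token_bytes(msg_len)

    def announce(i):
        out = message if i == sender else bytes(msg_len)
        for (a, b), pad in pads.items():
            if i in (a, b):                       # XOR in every shared pad
                out = bytes(x ^ y for x, y in zip(out, pad))
        return out

    combined = bytes(msg_len)
    for i in range(n):
        combined = bytes(x ^ y for x, y in zip(combined, announce(i)))
    assert combined == message                    # pads cancel pairwise
    print(combined.hex())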

Giorgi MAISURADZE
Assessing the Security of Hardware-Assisted Isolation Techniques
(Advisor: Prof. Christian Rossow)
Thursday, 21.11.2019, 11:00 h, in building E9 1, room 0.01

Abstract:

Modern computer systems execute a multitude of different processes on the same hardware. To guarantee that these processes do not interfere with one another, either accidentally or maliciously, modern processors implement various isolation techniques, such as process, privilege, and memory isolation. Consequently, these isolation techniques constitute a fundamental requirement for a secure computer system.
This dissertation investigates the security guarantees of various isolation techniques that are used by modern computer systems. To this end, we present the isolation challenges from three different angles. First, we target fine-grained memory isolation that is used by code-reuse mitigation techniques to thwart attackers with arbitrary memory read vulnerabilities. We demonstrate that dynamically generated code can be used to undermine such memory isolation techniques and, ultimately, propose a way to improve the mitigation. Second, we study side effects of legitimate hardware features, such as speculative execution, and show how they can be leveraged to leak sensitive data across different security contexts, thereby breaking process and memory isolation techniques. Finally, we demonstrate that in the presence of a faulty hardware implementation, all isolation guarantees can be broken. We do this by using a novel microarchitectural issue—discovered by us—to leak arbitrary memory contents across all security contexts. 

Leander TENTRUP
Symbolic Reactive Synthesis
(Advisor: Prof. Bernd Finkbeiner)
Friday, 15.11.2019, 16:00 h, in building E1 7, room 0.01

Abstract:

In this thesis, we develop symbolic algorithms for the synthesis of reactive systems. Synthesis, that is the task of deriving correct-by-construction implementations from formal specifications, has the potential to eliminate the need for the manual—and error-prone—programming task. The synthesis problem can be formulated as an infinite two-player game, where the system player has the objective to satisfy the specification against all possible actions of the environment player. The standard synthesis algorithms represent the underlying synthesis game explicitly and, thus, they scale poorly with respect to the size of the specification.
We provide an algorithmic framework to solve the synthesis problem symbolically. In contrast to the standard approaches, we use a succinct representation of the synthesis game which leads to improved scalability in terms of the symbolically represented parameters. Our algorithm reduces the synthesis game to the satisfiability problem of quantified Boolean formulas (QBF) and dependency quantified Boolean formulas (DQBF). In the encodings, we use propositional quantification to succinctly represent different parts of the implementation, such as the state space and the transition function.
We develop highly optimized satisfiability algorithms for QBF and DQBF. Based on a counterexample-guided abstraction refinement (CEGAR) loop, our algorithms avoid an exponential blow-up by using the structure of the underlying symbolic encodings. Further, we extend the solving algorithms to extract certificates in the form of Boolean functions, from which we construct implementations for the synthesis problem. Our empirical evaluation shows that our symbolic approach significantly outperforms previous explicit synthesis algorithms with respect to scalability and solution quality.

Nora SPEICHER
Unsupervised multiple kernel learning approaches for integrating molecular cancer patient data
(Advisor: Prof. Nico Pfeifer, now Tübingen)
Monday, 11.11.2019, 14:00 h, in building E1 7, room 0.01

Abstract:

Cancer is the second leading cause of death worldwide. A characteristic of this disease is its complexity, leading to a wide variety of genetic and molecular aberrations in tumors. This heterogeneity necessitates personalized therapies for the patients. However, the cancer subtypes currently defined and used in clinical practice for treatment decision-making are based on relatively few selected markers and thus provide only a coarse classification of tumors. The increased availability of multi-omics data measured for cancer patients now offers the possibility of defining more informed cancer subtypes. Such a finer-grained characterization of cancer subtypes harbors the potential of substantially expanding treatment options in personalized cancer therapy.
In this work, we identify comprehensive cancer subtypes using multidimensional data. For this purpose, we apply and extend unsupervised multiple kernel learning methods. Three challenges of unsupervised multiple kernel learning are addressed: robustness, applicability, and interpretability. First, we show that regularization of the multiple kernel graph embedding framework, which enables the implementation of dimensionality reduction techniques, can increase the stability of the resulting patient subgroups. This improvement is especially beneficial for data sets with a small number of samples. Second, we adapt the objective function of kernel principal component analysis to enable the application of multiple kernel learning in combination with this widely used dimensionality reduction technique. Third, we improve the interpretability of kernel learning procedures by performing feature clustering prior to integrating the data via multiple kernel learning. On the basis of these clusters, we derive a score indicating the impact of a feature cluster on a patient cluster, thereby facilitating further analysis of the cluster-specific biological properties. All three procedures are successfully tested on real-world cancer data. 
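To make the setting concrete, the following minimal Python sketch combines per-data-type kernels with fixed weights and projects patients with kernel PCA; in the thesis the kernel weights are learned inside regularized graph-embedding or kernel PCA objectives rather than fixed by hand, and the data matrices here are random stand-ins.

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared Euclidean distances -> RBF (Gaussian) kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def center_kernel(K):
    # Double-center the kernel matrix, as required for kernel PCA.
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def combined_kernel_pca(views, betas, n_components=2):
    # views: list of (samples x features) matrices, one per omics data type.
    # betas: non-negative kernel weights (here fixed, in the thesis learned).
    K = sum(b * rbf_kernel(X) for b, X in zip(betas, views))
    K = center_kernel(K)
    vals, vecs = np.linalg.eigh(K)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    # Project samples onto the leading kernel principal components.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Random stand-ins for two omics data types measured on 50 patients:
rng = np.random.default_rng(0)
expr, meth = rng.normal(size=(50, 100)), rng.normal(size=(50, 40))
embedding = combined_kernel_pca([expr, meth], betas=[0.5, 0.5])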

Gorav JINDAL
On Approximate Polynomial Identity Testing and Real Root Finding
(Advisor: Prof. Markus Bläser)
Monday, 11.11.2019, 10:00 h, in building E1 7, room 0.01

Abstract:

In this thesis we study the following three themes, which share a connection through the (arithmetic) circuit complexity of polynomials.
1. Rank of symbolic matrices: While there exist simple randomized polynomial-time algorithms for computing the commutative rank of symbolic matrices, finding deterministic polynomial-time algorithms remains elusive. We first demonstrate a deterministic polynomial-time approximation scheme (PTAS) for computing the commutative rank. Prior to this work, deterministic polynomial-time algorithms were known only for computing a 1/2-approximation of the commutative rank. We give two distinct proofs that our algorithm is a PTAS. We also give a min-max characterization of the commutative and non-commutative ranks of symbolic matrices.
2. Computation of real roots of real sparse polynomials: It is known that solving a system of polynomial equations reduces to solving a univariate polynomial equation. We describe a polynomial-time algorithm for sparse polynomials that computes approximations of all the real roots (although it may also compute approximations of some complex roots). Moreover, we show that the roots of integer trinomials are well-separated.
3. Complexity of symmetric polynomials: It is known that symmetric Boolean functions are easy to compute. In contrast, we show that the assumption of inequality of VP and VNP implies that there exist hard symmetric polynomials. To prove this result, we use an algebraic analogue of the classical Newton’s iteration.

Julian STEIL
Mobile Eye Tracking for Everyone
(Advisor: Prof. Andreas Bulling, now Stuttgart)
Friday, 08.11.2019, 12:00 h, in building E1 4, room 0.24

Abstract:

Mobile eye tracking has significant potential to overcome fundamental limitations of remote systems in terms of mobility and practical usefulness. However, mobile eye tracking currently faces two fundamental challenges that prevent it from being practically usable and that, consequently, have to be addressed before mobile eye tracking can truly be used by everyone: mobile eye tracking needs to be advanced and made fully functional in unconstrained environments, and it needs to be made socially acceptable. Nevertheless, solving technical challenges and social obstacles alone is not sufficient to make mobile eye tracking attractive for the masses. The key to success is the development of convincingly useful, innovative, and essential applications. This thesis provides solutions for all of these challenges and paves the way for the development of socially acceptable, privacy-aware, but highly functional mobile eye tracking devices and novel applications, so that mobile eye tracking can develop its full potential to become an everyday technology for everyone.

October

Quynh NGUYEN
Optimization Landscape of Deep Neural Networks
(Advisor: Prof. Matthias Hein, now Tübingen)
Friday, 18.10.2019, 13:00 h, in building E1 7, room 0.01

Abstract:

Deep learning has drawn increasing attention in artificial intelligence due to its superior empirical performance in various fields of applications, including computer vision, natural language processing and speech recognition. This raises important and fundamental questions on why these methods work so well in practice.
The goal of this thesis is to improve our theoretical understanding of the optimization landscape of deep neural nets. We show that under certain over-parameterization conditions, in particular when the width of one of the hidden layers exceeds the number of training samples, every stationary point of the loss with a certain full-rank structure is a global minimum with zero training error. Moreover, every sub-level set of the loss is connected and unbounded. This suggests that by increasing over-parameterization, the loss function of deep neural nets can become much more favorable to local search algorithms than other "wildly" non-convex functions. In an attempt to bring our results closer to practice, we propose a new class of network architectures that provably have a benign optimization surface while empirically achieving good generalization performance on standard benchmark datasets. Lastly, we complement our results by identifying an intuitive function class that neural networks of arbitrary depth might not be able to learn if their hidden layers are not wide enough.
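Schematically, results of this flavour can be phrased as follows (a paraphrase under simplifying assumptions, not the exact theorems of the thesis): for a training loss L over N samples,

n_k \ge N \text{ for some hidden layer } k \quad\wedge\quad \operatorname{rank} F_k(\theta^\ast) = N \quad\Longrightarrow\quad L(\theta^\ast) = 0

for every stationary point \theta^\ast, where n_k is the width of layer k and F_k(\theta) \in \mathbb{R}^{N \times n_k} is the matrix of feature representations that the N training samples induce at layer k.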

Scott KILPATRICK
Non-Reformist Reform for Haskell Modularity
(Advisor: Prof. Derek Dreyer)
Tuesday, 15.10.2019, 15:00 h, in building E1 5, room 0.29

Abstract:

In this thesis, I present Backpack, a new language for building separately typecheckable packages on top of a weak module system like Haskell's. The design of Backpack is the first to bring the rich world of type systems to the practical world of packages via mixin modules. It is inspired by the MixML module calculus of Rossberg and Dreyer, but by choosing practicality over expressivity, Backpack both simplifies that semantics and supports a flexible notion of applicative instantiation. Moreover, this design is motivated less by foundational concerns and more by the practical concern of integration into Haskell. The result is a new approach to writing modular software at the scale of packages.

Yusra IBRAHIM
Understanding Quantities in Web Tables and Text
(Advisor: Prof. Gerhard Weikum)
Tuesday, 08.10.2019, 15:00 h, in building E1 4, room 024

Abstract:

There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed, rather ambiguously, in these tables. This information can be disambiguated only with the help of the accompanying text.
In the process of understanding quantity mentions in tables and text, we are faced with the following challenges: First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc without a standard schema and with ambiguous header names; table cells also usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure; for example, “48 km/h” is equivalent to “30 mph”. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales.
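The third challenge amounts to normalizing every mention to a canonical unit before comparison. A minimal Python illustration (the conversion table and function name are hypothetical, chosen only for this example):

# Map surface forms to a canonical unit so that "48 km/h" and "30 mph"
# become comparable. Conversion factors to metres per second.
TO_MPS = {"km/h": 1000.0 / 3600.0, "mph": 1609.344 / 3600.0, "m/s": 1.0}

def normalize(value, unit):
    # Return the measure expressed in the canonical unit (m/s here).
    return value * TO_MPS[unit]

assert abs(normalize(48, "km/h") - normalize(30, "mph")) < 0.1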
In this thesis, we target these challenges through the following contributions:
We present the Quantity Knowledge Base (QKB), a knowledge base for representing Quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources.
We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base.
We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. We then cast the inference problem into an efficient algorithm based on random walks over weighted graphs.
We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities.
We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.

September

Cornelius BRAND
Paths and Walks, Forests and Planes – Arcadian Algorithms and Complexity
(Advisor: Prof. Holger Dell, now ITU Copenhagen)
Thursday, 26.09.2019, 10:00 h, in building E1 7, room 001

Abstract:

We present new results on parameterized algorithms for hard graph problems. We generalize and improve established methods (among others, color-coding and representative families) through an algebraic formulation of the problems in the language of exterior algebra. Our method gives (1) a faster algorithm to estimate the number of simple paths of given length in directed graphs, (2) faster deterministic algorithms for finding such paths if the input graph contains only few of them, and (3) faster deterministic algorithms to find spanning trees with few leaves. We also consider the algebraic foundations of our new method in exterior and commutative algebra.
Additionally, we investigate the fine-grained complexity of determining the number of forests with a given number of edges in a given undirected graph.
We (1) complete the complexity classification of the Tutte plane under the exponential time hypothesis, and (2) prove that counting forests with a given number of edges is at least as hard as counting cliques of a given size.

Milivoj SIMEONOVSKI
Accountable Infrastructure and Its Impact on Internet Security and Privacy
(Advisor: Prof. Michael Backes)
Tuesday, 17.09.2019, 14:00 h, in building E9 1, room 001

Abstract:

The Internet infrastructure relies on the correct functioning of the basic underlying protocols, which were designed for functionality. Security and privacy have been added post hoc, mostly by applying cryptographic means to different layers of communication. In the absence of accountability as a fundamental property, the Internet infrastructure has no built-in ability to associate an action with the responsible entity, nor to detect or prevent misbehavior.
In this thesis, we study accountability from several different perspectives. First, we study the need for accountability in anonymous communication networks as a mechanism that provides repudiation for the proxy nodes by tracing back selected outbound traffic in a provable manner. Second, we design a framework that provides a foundation to support the enforcement of the right-to-be-forgotten law in a scalable and automated manner. The framework provides a technical means for users to prove their eligibility for content removal from search results. Third, we analyze the Internet infrastructure, determining potential security risks and threats imposed by dependencies among entities on the Internet. Finally, we evaluate the feasibility of using hop count filtering as a mechanism for mitigating Distributed Reflective Denial-of-Service attacks, and conceptually show that it cannot work to prevent these attacks.
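For context, hop count filtering rests on a simple inference from the IP TTL field; the following Python fragment shows the standard inference step (a generic illustration of the mechanism under evaluation, not code from the thesis):

# Operating systems use a few well-known initial TTLs, so the hop count
# is the gap between the observed TTL and the next initial TTL above it.
INITIAL_TTLS = (32, 64, 128, 255)

def infer_hop_count(observed_ttl):
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

# A packet arriving with TTL 57 most likely started at 64, i.e. 7 hops away.
assert infer_hop_count(57) == 7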

Davis ISSAC
On some covering, partition, and connectivity problems in graphs
(Advisor: Dr. Andreas Karrenbauer)
Monday, 09.09.2019, 14:00 h, in building E1 4, room 0.24

Abstract:

We look at some graph problems related to covering, partition, and connectivity. First, we study the problems of covering and partitioning edges with bicliques, especially from the viewpoint of parameterized complexity. For the partition problem, we develop much more efficient algorithms than the ones previously known. In contrast, for the cover problem, our lower bounds show that the known algorithms are probably optimal. Next, we move on to graph coloring, which is probably the most extensively studied partition problem in graphs. Hadwiger’s conjecture is a long-standing open problem related to vertex coloring. We prove the conjecture for a special class of graphs, namely squares of 2-trees, and show that square graphs are important in connection with Hadwiger’s conjecture. Then, we study a coloring problem that has been emerging recently, called rainbow coloring. This problem lies in the intersection of coloring and connectivity. We study different variants of rainbow coloring and present bounds and complexity results on them. Finally, we move on to another parameter related to connectivity called spanning tree congestion (STC). We give tight bounds for STC in general graphs and random graphs. While proving the results on STC, we also make some contributions to the related area of connected partitioning.

August

Florian SCHMIDT
Applications, challenges and new perspectives on the analysis of transcriptional regulation using epigenomic and transcriptomic data
(Advisor: Dr. Marcel Schulz, now Frankfurt)
Monday, 26.08.2019, 10:00 h, in building E1 4, room 0.24

Abstract:

The integrative analysis of epigenomics and transcriptomics data is an active research field in Bioinformatics. New methods are required to interpret and process large omics data sets, as generated within consortia such as the International Human Epigenomics Consortium. In this thesis, we present several approaches illustrating how combined epigenomics and transcriptomics datasets, e.g. for differential or time series analysis, can be used to derive new biological insights on transcriptional regulation. In my work, I focused on regulatory proteins called transcription factors (TFs), which are essential for orchestrating cellular processes.
I present novel approaches that combine epigenomics data, such as DNaseI-seq, predicted TF binding scores, and gene-expression measurements in interpretable machine learning models. In joint work with our collaborators within and outside IHEC, we have shown that our methods lead to biologically meaningful results, which could be validated with wet-lab experiments.
Aside from providing the community with new tools to perform integrative analysis of epigenomics and transcriptomics data, we have studied the characteristics of chromatin accessibility data and its relation to gene-expression in detail to better understand the implications of both computational processing and of different experimental methods on data interpretation.

July

Marco VOIGT
Decidable Fragments of First-Order Logic and of First-Order Linear Arithmetic with Uninterpreted Predicates
(Advisor: Prof. Christoph Weidenbach)
Wednesday, 31.07.2019, 14:00 h, in building E1 4, room 0.24

Abstract:

First-order logic (FOL) is one of the most prominent formalisms in computer science. Unfortunately, FOL satisfiability is not solvable algorithmically in full generality. The classical decision problem is the quest for a delineation between the decidable and the undecidable parts of FOL. This talk aims to shed new light on the boundary. To this end, recently discovered structure on the decidable side of the first-order landscape shall be discussed.
The primary focus will be on the syntactic concept of separateness of variables and its applicability to the classical decision problem and beyond. Two disjoint sets of first-order variables are separated in a given formula if each atom in that formula contains variables from at most one of the two sets. This simple notion facilitates the definition of decidable extensions of many well-known decidable FOL fragments. Although the extensions exhibit the same expressive power as the respective originals, certain logical properties can be expressed much more succinctly. In at least two cases the succinctness gap cannot be bounded using any elementary function.
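A small example of my own for illustration: in \forall x\, \exists y\, (P(x) \lor Q(y)), the variable sets \{x\} and \{y\} are separated, since each atom contains variables from at most one of the two sets; in \forall x\, \exists y\, R(x, y) they are not, because the atom R(x, y) mixes them.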
The secondary focus will be on linear first-order arithmetic over the rationals with uninterpreted predicates. Novel decidable fragments of this logical language shall be discussed briefly.

Marc ROTH
Counting Problems on Quantum Graphs
(Advisor: Dr. Holger Dell)
Monday, 15.07.2019, 12:00 h, in building E1 7, room 001

Abstract:

Quantum graphs, as defined by Lovász in the late 60s, are formal linear combinations of simple graphs with finite support. They allow for the complexity analysis of the problem of computing finite linear combinations of homomorphism counts, the latter of which constitute the foundation of the structural hardness theory for parameterized counting problems: the framework of parameterized counting complexity was introduced by Flum and Grohe, and by McCartin, in 2002, and forms a hybrid between the classical field of computational counting, as founded by Valiant in the late 70s, and the paradigm of parameterized complexity theory due to Downey and Fellows, which originated in the early 90s.
The problem of computing homomorphism numbers of quantum graphs subsumes general motif counting problems, and its complexity-theoretic implications became apparent only recently, in a breakthrough regarding the parameterized subgraph counting problem by Curticapean, Dell and Marx in 2017.
We study the problems of counting partially injective and edge-injective homomorphisms, counting induced subgraphs, as well as counting answers to existential first-order queries. We establish novel combinatorial, algebraic and even topological properties of quantum graphs that allow us to provide exhaustive parameterized and exact complexity classifications, including necessary, sufficient and mostly explicit tractability criteria, for all of the previous problems.

Steven SCHÄFER
Engineering Formal Systems in Constructive Type Theory
(Advisor: Prof. Gert Smolka)
Friday, 12.07.2019, 14:15 h, in building E1 7, room 001

Abstract:

This thesis presents a practical methodology for formalizing the meta-theory of formal systems with binders and coinductive relations in constructive type theory. While constructive type theory offers support for reasoning about formal systems built out of inductive definitions, support for syntax with binders and coinductive relations is lacking. We provide this support. We implement syntax with binders using well-scoped de Bruijn terms and parallel substitutions. We solve substitution lemmas automatically using the rewriting theory of the σ-calculus. We present the Autosubst library to automate our approach in the proof assistant Coq. Our approach to coinductive relations is based on an inductive tower construction, which is a type-theoretic form of transfinite induction. The tower construction allows us to reduce coinduction to induction. This leads to a symmetric treatment of induction and coinduction and allows us to give a novel construction of the companion of a monotone function on a complete lattice. We demonstrate our methods with a series of case studies. In particular, we present a proof of type preservation for CCω, a proof of weak and strong normalization for System F, a proof that systems of weakly guarded equations have unique solutions in CCS, and a compiler verification for a compiler from a non-deterministic language into a deterministic language. All technical results in the thesis are formalized in Coq.
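To give a flavour of the de Bruijn approach, here is a minimal Python sketch of untyped de Bruijn terms with parallel substitution and the σ-calculus "up" operation; it loosely mirrors the kind of definitions the Autosubst library generates and reasons about in Coq, but enforces no well-scopedness and proves nothing:

from dataclasses import dataclass

# Untyped lambda terms in de Bruijn representation.
@dataclass
class Var:
    idx: int

@dataclass
class App:
    fn: object
    arg: object

@dataclass
class Lam:
    body: object

def inst(t, sigma):
    # Apply the parallel substitution sigma (a map from indices to terms).
    if isinstance(t, Var):
        return sigma(t.idx)
    if isinstance(t, App):
        return App(inst(t.fn, sigma), inst(t.arg, sigma))
    # Going under a binder: index 0 stays bound, the rest are shifted.
    return Lam(inst(t.body, up(sigma)))

def up(sigma):
    # The sigma-calculus 'up' operation: 0 . (sigma ; shift).
    return lambda n: Var(0) if n == 0 else inst(sigma(n - 1), lambda m: Var(m + 1))

def beta(redex):
    # One beta step: (Lam b) a  ~>  b[a . id]
    b, a = redex.fn.body, redex.arg
    return inst(b, lambda n: a if n == 0 else Var(n - 1))

# (\x. \y. x) a  ~>  \y. a, with 'a' the free variable 5:
print(beta(App(Lam(Lam(Var(1))), Var(5))))   # Lam(body=Var(idx=6))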

Jonas KAISER
Formal Verification of the Equivalence of System F and the Pure Type System L2
(Advisor: Prof. Gert Smolka)
Thursday, 11.07.2019, 16:15 h, in building E1 7, room 001

Abstract:

We develop a formal proof of the equivalence of two different variants of System F. The first is close to the original presentation where expressions are separated into distinct syntactic classes of types and terms. The second, L2, is a particular pure type system (PTS) where the notions of types and terms, and the associated expressions are unified in a single syntactic class. The employed notion of equivalence is a bidirectional reduction of the respective typing relations. A machine-verified proof of this result turns out to be surprisingly intricate, since the two variants noticeably differ in their expression languages, their type systems and the binding of local variables.
Most of this work is executed in the Coq theorem prover and encompasses a general development of the PTS metatheory, an equivalence result for a stratified and a PTS variant of the simply typed lambda-calculus as well as the subsequent extension to the full equivalence result for System F. We utilise nameless de Bruijn syntax with parallel substitutions for the representation of variable binding and develop an extended notion of context morphism lemmas as a structured proof method for this setting.
We also provide two developments of the equivalence result in the proof systems Abella and Beluga, where we rely on higher-order abstract syntax (HOAS). This allows us to compare the three proof systems, as well as HOAS and de Bruijn for the purpose of developing formal metatheory.

Marco SPEICHER
Measuring User Experience for Virtual Reality
(Advisor: Prof. Antonio Krüger)
Wednesday, 10.07.2019, 13:00 h, in building D3 2, Reuse seminar room (-2.17)

Abstract:

In recent years, Virtual Reality (VR) and 3D User Interfaces (3DUI) have seen a drastic increase in popularity, especially in terms of consumer-ready hardware and software. These technologies have the potential to create new experiences that combine the advantages of reality and virtuality. While the technology for input as well as output devices is market-ready, only a few solutions for everyday VR – online shopping, games, or movies – exist, and empirical knowledge about performance and user preferences is lacking. All this makes the development and design of human-centered user interfaces for VR a great challenge.
This thesis investigates the evaluation and design of interactive VR experiences. We introduce the Virtual Reality User Experience (VRUX) model based on VR-specific external factors and evaluation metrics such as task performance and user preference. Based on our novel UX evaluation approach, we contribute by exploring the following directions: shopping in virtual environments, as well as text entry and menu control in the context of everyday VR. Along with this, we summarize our findings by design spaces and guidelines for choosing optimal interfaces and controls in VR.

Sanjar KARAEV
Matrix Factorization over Dioids and its Applications in Data Mining
(Advisor: Prof. Pauli Miettinen, now University of Eastern Finland)
Wednesday, 10.07.2019, 10:30 h, in building E1 5, room 0.29

Abstract:

While classic matrix factorization methods, such as NMF and SVD, are known to be highly effective at finding latent patterns in data, they are limited by the underlying algebraic structure. In particular, it is often difficult to distinguish heavily overlapping patterns because they interfere with each other. To deal with this problem, we study matrix factorization over algebraic structures known as dioids, which are characterized by the idempotency of addition (a + a = a). Idempotency ensures that at every data point only a single pattern contributes, which makes the patterns easier to distinguish. In this thesis, we consider different types of dioids, ranging from continuous (subtropical and tropical algebras) to discrete (Boolean algebra).
The Boolean algebra is, perhaps, the most well known dioid, and there exist Boolean matrix factorization algorithms that produce low reconstruction errors. In this thesis, however, a different objective function is used — the description length of the data, which enables us to obtain more compact and highly interpretable results.
The tropical and subtropical algebras are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating most prominent features of the data.
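As a concrete example of a dioid product, a minimal Python sketch of the subtropical (max-times) matrix product, used in place of the usual sum-of-products in NMF-style reconstructions (the matrices are random stand-ins):

import numpy as np

def max_times(A, B):
    # Subtropical (max-times) matrix product over nonnegative entries:
    # (A x B)[i, j] = max_k A[i, k] * B[k, j].
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

# Addition in this dioid is the maximum, hence idempotent: max(a, a) = a.
# Each entry of the product therefore stems from a single dominant pattern,
# not from a sum of overlapping patterns as in NMF.
rng = np.random.default_rng(1)
W, H = rng.uniform(size=(6, 2)), rng.uniform(size=(2, 8))
X = max_times(W, H)
dominant = np.argmax(W[:, :, None] * H[None, :, :], axis=1)  # pattern per entry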

Hosnieh SATTAR
Intents and Preferences Prediction Based on Implicit Human Cues
(Advisor: Dr. Mario Fritz)
Tuesday, 02.07.2019, 13:00 h, in building E1 4, room 0.22

Abstract:

Understanding implicit human cues is a key factor in building advanced human-machine interfaces. Modern interfaces should be able to understand and predict human intents and assist users based on implicit data rather than waiting for explicit input. Eye movements, electroencephalogram (EEG), electrocardiogram (ECG), and body shape are several sources of implicit human cues. In this thesis, we focus on eye data and body shape.
The eye is known as a window to the mind and can reflect a lot about the ongoing processes in the mind. Thus, in the first part of the thesis, we focus on advancing techniques for search target prediction from human gaze data. We propose a series of techniques for inferring the search target from human fixation data recorded during visual search, broadening the scope of search target prediction to categorical classes, such as object categories and attributes.
Moreover, body shape can reveal a lot of information about a person's choice of outfit, food, sport, or living style. Hence, in the second part of this thesis, we study how body shape influences people's outfit preferences. We propose a novel and robust multi-photo approach to estimate the body shape of each user and build a conditional model of clothing categories given body shape. However, an accurate depiction of the naked body is considered highly private and therefore might not be consented to by most people. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask whether the automatic extraction of such information can be effectively evaded.

June

Sarvesh NIKUMBH
Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D
(Advisor: Prof. Nico Pfeifer, now Tübingen)
Monday, 24.06.2019, 16:00 h, in building E1 4, room 0.24

Abstract:

With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multi-pronged regulatory mechanisms of the genome. In my colloquium, I will present interpretable machine learning-based approaches for prediction and analysis of long-range chromatin interactions and how they can help gain novel, fine-grained insights into genome regulatory mechanisms. Specifically, I will first present our string kernels-based support vector classification method for prediction of long-range chromatin interactions. This method was one of the first bioinformatic approaches to suggest a potential general role of short tandem repeat sequences in long-range genome regulation and its 3D organization. I will then present CoMIK, a method we developed for handling variable-length DNA sequences in classification scenarios. CoMIK can not only identify the features important for classification but also locate them within the individual sequences, thus enabling the possibility of more fine-grained insights.

Thomas LEIMKÜHLER
Artificial Intelligence for Efficient Image-based View Synthesis
(Advisor: Prof. Hans-Peter Seidel and Prof. Tobias Ritschel, now University College London)
Monday, 24.06.2019, 9:00 h, in building E1 4, room 0.19

Abstract:

Synthesizing novel views from image data is a widely investigated topic in both computer graphics and computer vision, and has many applications like stereo or multi-view rendering for virtual reality, light field reconstruction, and image post-processing. While image-based approaches have the advantage of reduced computational load compared to classical model-based rendering, efficiency is still a major concern. We demonstrate how concepts and tools from artificial intelligence can be used to increase the efficiency of image-based view synthesis algorithms. In particular, it is shown how machine learning can help to generate point patterns useful for a variety of computer graphics tasks, how path planning can guide image warping, how sparsity-enforcing optimization can lead to significant speedups in interactive distribution effect rendering, and how probabilistic inference can be used to perform real-time 2D-to-3D conversion.

Azim Dehghani Amirabad
From genes to transcripts: Integrative modeling and analysis of regulatory networks
(Advisor: Dr. Marcel Schulz, now Frankfurt)
Tuesday, 11.06.2019, 10:00 h, in building E1 7, room 002

Abstract:

Although all the cells in an organism possess the same genome, regulatory mechanisms lead to highly specific cell types. Elucidating these regulatory mechanisms is a great challenge in systems biology research. While it is known that a large fraction of our genome is comprised of regulatory elements, the precise mechanisms by which different combinations of regulatory elements are involved in controlling gene expression and cell identity are poorly understood. This thesis describes algorithms and approaches for modeling and analyzing different modes of gene regulation. We present POSTIT, a novel algorithm for modeling and inferring transcript isoform regulation from transcriptomics and epigenomics data. POSTIT uses multi-task learning with a structured-sparsity-inducing regularizer to share regulatory information between the isoforms of a gene, which is shown to lead to accurate isoform expression prediction and inference of regulators.
We developed a novel statistical method, called SPONGE, for large-scale inference of ceRNA networks. In this framework, we designed an efficient empirical p-value computation approach, by sampling from derived null models, which addresses important confounding factors such as sample size, number of involved regulators and strength of correlation.
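The last step of such a scheme is the standard empirical p-value estimator; a minimal Python sketch (the derived null models of SPONGE are specific to the ceRNA setting and are not reproduced here):

import numpy as np

def empirical_p_value(observed, null_samples):
    # Standard add-one empirical p-value: the rank of the observed statistic
    # among draws from the null model (here: larger = more extreme).
    null_samples = np.asarray(null_samples)
    return (1 + np.sum(null_samples >= observed)) / (1 + null_samples.size)

rng = np.random.default_rng(2)
null = rng.normal(size=10_000)        # stand-in for a sampled null model
print(empirical_p_value(3.0, null))   # small p-value for an extreme statistic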
Finally, we present an integrative analysis of miRNA and protein-based post-transcriptional regulation. We postulate a competitive regulation of the RNA-binding protein IMP2 with miRNAs binding the same RNAs using expression and RNA binding data. This function of IMP2 is relevant in the contribution to disease in the context of adult cellular metabolism.

May

Nadia ROBERTINI
Model-based Human Performance Capture in Outdoor Scenes
(Advisor: Prof. Christian Theobalt)
Tuesday, 21.05.2019, 10:00 h, in building E1 4, room 019

Abstract:

Technologies for motion and performance capture of real actors have enabled the creation of realistic-looking virtual humans through detail and deformation transfer at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios.
Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and fine-scale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans.

Peter FAYMONVILLE
Monitoring with Parameters
(Advisor: Prof. Bernd Finkbeiner)
Friday, 17.05.2019, 14:15 h, in building E1 7, room 0.01

Abstract:

Runtime monitoring of embedded systems is a method to safeguard their reliable operation by detecting runtime failures within the system and recognizing unexpected environment behavior during system development and operation. Specification languages for runtime monitoring aim to provide the succinct, understandable, and formal specification of system and component properties.
Starting from temporal logics, this thesis extends the expressivity of specification languages for runtime monitoring in three key aspects while maintaining the predictability of resource usage. First, we provide monitoring algorithms for linear-time temporal logic with parameters (PLTL), where the parameters bound the number of steps until an eventuality is satisfied. Second, we introduce Lola 2.0, which adds data parameterization to the stream specification language Lola. Third, we integrate real-time specifications in RTLola and add real-time sliding windows, which aggregate data over real-time intervals. For the combination of these extensions, we present a design-time specification analysis which provides resource guarantees.
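To illustrate real-time sliding windows in ordinary code, here is a naive Python monitor that averages values over the last few seconds; note that RTLola evaluates such aggregations with statically bounded memory, whereas this sketch simply buffers all readings in the window:

from collections import deque

class SlidingAverage:
    # Aggregates values over a real-time window of fixed duration,
    # e.g. the average sensor reading over the last 5 seconds.
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.buf = deque()            # (timestamp, value) pairs

    def update(self, timestamp, value):
        self.buf.append((timestamp, value))
        # Evict readings that have fallen out of the window.
        while self.buf and self.buf[0][0] <= timestamp - self.window:
            self.buf.popleft()
        return sum(v for _, v in self.buf) / len(self.buf)

mon = SlidingAverage(window_seconds=5.0)
for t, v in [(0.0, 10.0), (2.0, 12.0), (6.5, 8.0)]:
    print(mon.update(t, v))   # the first reading expires before t = 6.5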
We report on a case study on the application of the language in an autonomous UAS. Component and system properties were specified together with domain experts in the developed stream specification language and evaluated in a real-time hardware-in-the-loop testbed with a complex environment simulation.

April

Matthias DÖRING
Computational Approaches for Improving Treatment and Prevention of Viral Infections
(Advisor: Prof. Nico Pfeifer, now Tübingen)
Tuesday, 30.04.2019, 14:00 h, in building E2 1, room 0.01

Abstract:

The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments.
geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and accessing a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR on human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.
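As an aside on the set cover formulation: a classic greedy heuristic for covering a set of template sequences with as few primers as possible can be sketched in a few lines of Python (an illustration of the problem type, not necessarily the solving strategy implemented in openPrimeR; all names are hypothetical):

def greedy_set_cover(universe, candidates):
    # universe: set of template sequences to amplify.
    # candidates: dict mapping each primer to the set of templates it binds.
    uncovered, chosen = set(universe), []
    while uncovered:
        # Pick the primer covering the most still-uncovered templates.
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        if not candidates[best] & uncovered:
            raise ValueError("templates left that no primer covers")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

primers = {"p1": {"t1", "t2"}, "p2": {"t2", "t3"}, "p3": {"t3"}}
print(greedy_set_cover({"t1", "t2", "t3"}, primers))   # e.g. ['p1', 'p2']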

Charalampos KYRIAKOPOULOS
Stochastic Modeling of DNA Demethylation Dynamics in ESCs
(Advisor: Prof. Verena Wolf)
Tuesday, 30.04.2019, 11:00 h, in building E1 5, room 0.29

Abstract:

DNA methylation and demethylation are opposing processes that, when in balance, create stable patterns of epigenetic memory. The control of DNA methylation pattern formation in replication-dependent and -independent demethylation processes has been suggested to be influenced by Tet-mediated oxidation of methylated cytosine, 5mC, to hydroxylated cytosine, 5hmC. Based only on in vitro experiments, several alternative mechanisms have been proposed for how 5hmC influences replication-dependent maintenance of DNA methylation and replication-independent processes of active demethylation. In this thesis we design an extended and easily generalizable hidden Markov model that takes hairpin (oxidative-)bisulfite sequencing data as input to precisely determine the dynamics of 5mC and 5hmC over time, as well as to infer the activities of the involved enzymes at single-CpG resolution. Developing the appropriate statistical and computational tools, we apply the model to discrete, deeply sequenced genomic loci, and on a whole-genome scale with a much smaller sequencing depth. Analysing the model's output on mESC data, we show that the presence of Tet enzymes and 5hmC has a very strong impact on replication-dependent demethylation by establishing a passive demethylation mechanism, implicitly impairing methylation maintenance, but also down-regulating the de novo methylation activity.

Tomas BASTYS
Analysis of the Protein-Ligand and Protein-Peptide Interactions Using a Combined Sequence- and Structure-Based Approach
(Advisor: Prof. Olga Kalinina)
Wednesday, 17.04.2019, 15:00 h, in building E1 4, room 0.24

Abstract:

Proteins participate in most of the important processes in cells, and their ability to perform their function ultimately depends on their three-dimensional structure. They usually act in these processes through interactions with other molecules. Because of the importance of their role, proteins are also the common target for small molecule drugs that inhibit their activity, which may include targeting protein interactions. Understanding protein interactions and how they are affected by mutations is thus crucial for combating drug resistance and aiding drug design.
This dissertation combines bioinformatics studies of protein interactions at both the primary sequence and the structural level. We analyse protein-protein interactions through linear motifs, as well as protein-small molecule interactions, and study how mutations affect them. This is done in the context of two systems. In the first study, of drug resistance mutations in the protease of the human immunodeficiency virus type 1, we successfully apply molecular dynamics simulations to estimate the effects of known resistance-associated mutations on the binding free energy, also revealing molecular mechanisms of resistance. In the second study, we analyse consensus profiles of the linear motifs that mediate the recognition of target proteins by the mitogen-activated protein kinases. We thus gain insights into the cellular processes these proteins are involved in.

Abdalghani ABUJABAL
Question Answering over Knowledge Bases with Continuous Learning
(Advisor: Prof. Gerhard Weikum)
Friday, 12.04.2019, 11:00 h, in building E1 4, room 0.24

Abstract:

Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, two research issues arise:
(i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, and
(ii) how to design and curate datasets to advance research in question answering. In this dissertation, we make the following contributions in the area of question answering (QA).
For issue (i), we present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. We also introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function.
For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied by a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets.
For issue (ii), we publish ComQA, a large-scale manually curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties, such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically equivalent questions that can be exploited by QA systems.

Evgeny LEVINKOV
Generalizations of the Multicut Problem for Computer Vision
(Advisor: Prof. Bernt Schiele)
Monday, 08.04.2019, 11:30 h, in building E1 4, room 0.24

Abstract:

Graph decompositions have always been an essential part of computer vision as a large variety of tasks can be formulated in this framework. One way to optimize for a decomposition of a graph is called the multicut problem, also known as correlation clustering. Its particular feature is that the number of clusters is not required to be known a priori, but is rather deduced from the optimal solution. On the downside, it is NP-hard to solve.
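For reference, the minimum cost multicut problem is commonly stated as the following integer linear program (a standard formulation from the literature):

\min_{x \in \{0,1\}^E} \sum_{e \in E} c_e x_e
\qquad \text{subject to} \qquad
x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'} \quad \text{for every cycle } C \subseteq E \text{ and every } e \in C

Here x_e = 1 encodes that edge e is cut, and the cycle inequalities ensure that the set of cut edges is exactly the set of edges running between distinct components of a decomposition. Since the costs c_e may be negative, the number of components emerges from the optimal solution instead of being fixed in advance.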
In this thesis we present several generalizations of the multicut problem that allow modeling richer dependencies between the nodes of a graph. We also propose efficient local search algorithms that, in practice, find good solutions in reasonable time, and we show applications to image segmentation, multi-human pose estimation, multi-object tracking, and many others.

Sebastian HAHN
On Static Execution-Time Analysis – Compositionality, Pipeline Abstraction, and Predictable Hardware
(Advisor: Prof. Reinhard Wilhelm)
Thursday, 04.04.2019, 16:00 h, in building E1 1, room 407

Abstract:

Proving timeliness is an integral part of the verification of safety-critical real-time systems. To this end, timing analysis computes upper bounds on the execution times of programs that execute on a given hardware platform.
At least since multi-core processors have emerged, timing analysis separates concerns by analysing different aspects of the system’s timing behaviour individually. In this work, we validate the underlying assumption that a timing bound can be soundly composed from individual contributions. We show that even simple processors exhibit counter-intuitive behaviour – a locally slow execution can lead to an even slower overall execution – that impedes the soundness of the composition. We present the compositional base bound analysis that accounts for any such amplifying effects within its timing contribution. This enables a sound although expensive compositional analysis even for complex contemporary processors.
In addition, we discuss future hardware designs that enable efficient compositional analyses. We introduce a hardware design, the strictly in-order pipeline, that behaves monotonically with respect to the progress of a program’s execution. Monotonicity enables us to formally reason about properties such as compositionality.

Joanna (Asia) BIEGA
Enhancing Privacy and Fairness in Search Systems
(Advisor: Prof. Gerhard Weikum)
Monday, 01.04.2019, 15:15 h, in building E1 4, room 024

Abstract:

Search systems have substantial potential to harm individuals and society. First, since they collect vast amounts of data about their users, they have the potential to violate user privacy. Second, in applications where rankings influence people's economic livelihood outside the platform, such as sharing economy or hiring support websites, search engines have immense economic power over their users in that they control user exposure in ranked results.
This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects:
* We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility.
* We propose a model for computing sensitive search exposure, where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly supervised learning-to-rank model ordering the queries by privacy-sensitivity.
* We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions (a small sketch of this risk aggregation follows after this list).
* Last but not least, we propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles.
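A minimal Python sketch of the risk aggregation mentioned in the third bullet above (function and variable names are hypothetical; the thesis's relevance measures are more nuanced than a plain weighted sum):

def privacy_risk(user_topic_relevance, topic_sensitivity):
    # Illustrative aggregation: a user's privacy risk as topic-model
    # relevance weighted by crowdsourced topic sensitivity scores.
    return sum(rel * topic_sensitivity[t]
               for t, rel in user_topic_relevance.items())

sensitivity = {"health": 0.9, "sports": 0.1}   # crowdsourced scores
user = {"health": 0.3, "sports": 0.7}          # topic-model relevance
print(privacy_risk(user, sensitivity))         # 0.34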

Yecheng GU
Intelligent Tutoring in Virtual Reality for Highly Dynamic Pedestrian Safety Training
(Advisor: Prof. Jörg Siekmann)
Monday, 01.04.2019, 14:00 h, in building E1 7, room 0.01

Abstract:

This thesis presents the design, implementation, and evaluation of an Intelligent Tutoring System (ITS) with a Virtual Reality (VR) interface for child pedestrian safety training. This system enables children to train practical skills in a safe and realistic virtual environment without the time and space dependencies of traditional roadside training. This system also employs Domain and Student Modelling techniques to analyze user data during training automatically and to provide appropriate instructions and feedback.
Thus, the traditional requirement of constant monitoring by teaching personnel is greatly reduced. Compared to previous work, especially the second aspect is a principal novelty for this domain. To achieve this, a novel Domain and Student Modeling method was developed, in addition to a modular and extensible virtual environment for the target domain. While the Domain and Student Modeling framework is designed to handle the highly dynamic nature of training in traffic and the ill-defined characteristics of pedestrian tasks, the modular virtual environment supports different interaction methods and a simple and efficient way to create and adapt exercises. The thesis is complemented by two user studies with elementary school children. These studies attest to great overall user acceptance and the system's potential for improving key pedestrian skills through autonomous learning. Last but not least, the thesis presents experiments with different forms of VR input and provides directions for future work.

March

Praveen MANOHARAN
Novel Approaches to Anonymity and Privacy in Decentralized, Open Settings
(Advisor: Prof. Michael Backes)
Friday, 29.03.2019, 11:30 h, in building E9 1, room 0.01

Abstract:

The Internet has undergone dramatic changes in the last two decades, evolving from a mere communication network to a global multimedia platform in which billions of users actively exchange information. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy that existing technology is failing to keep pace with.
In this dissertation, we present the results of two lines of research that developed two novel approaches to anonymity and privacy in decentralized, open settings. First, we examine the issue of attribute and identity disclosure in open settings and develop the novel notion of (k,d)-anonymity for open settings, which we extensively study and validate experimentally. Furthermore, we investigate the relationship between anonymity and linkability using the notion of (k,d)-anonymity and show that, in contrast to the traditional closed setting, anonymity within one online community does not necessarily imply unlinkability across different online communities in the decentralized, open setting.
Second, we consider the transitive diffusion of information that is shared in social networks and spread through pairwise interactions of users connected in the social network. We develop the novel approach of exposure minimization to control the diffusion of information within an open network, allowing the owner to minimize its exposure by suitably choosing whom they share their information with. We implement our algorithms and investigate the practical limitations of user-side exposure minimization in large social networks.
At their core, both of these approaches present a departure from the provable privacy guarantees that we can achieve in closed settings and a step towards sound assessments of privacy risks in decentralized, open settings.

Peter EBERT
What we leave behind: reproducibility in chromatin analysis within and across species
(Advisor: Prof. Thomas Lengauer)
Thursday, 14.03.2019, 11:30 h, in building E2 1, room 007

Abstract:

Epigenetics is the field of biology that investigates heritable factors regulating gene expression without being directly encoded in the genome of an organism. In my colloquium, I am going to present approaches dealing with three challenging problems in computational epigenomics: First, we designed a solution strategy for improving the reproducibility of computational analyses as part of a large national epigenetics research consortium. Second, we developed a software tool for the comparative epigenome analysis of cell types within a species. For this project, we relied on a well-established and compact representation of epigenetic information known as chromatin states, which enabled us to realize fast and interpretable comparative chromatin analysis even for small sample numbers. Third, we extended comparative epigenomics beyond canonical model organisms to also include species for which little to no epigenetic data is available. In a proof-of-concept study, we implemented a computational pipeline for predicting biologically relevant information, e.g., cell-type specific gene expression profiles, in non-model organisms based only on reference epigenomes from well-studied species such as human or mouse.

February

Viktor ERDÉLYI
Scalable positioning of commodity mobile devices using audio signals
(Advisor: Prof. Peter Druschel)
Thursday, 14.02.2019, 9:00 h, in building E1 5, room 0.29

Abstract:

This thesis explores the problem of computing a position map for co-located mobile devices. The positioning should happen in a scalable manner without requiring specialized hardware and without requiring specialized infrastructure (except basic Wi-Fi or cellular access). At events like meetings, talks, or conferences, a position map can aid spontaneous communication among users based on their relative position in two ways. First, it enables users to choose message recipients based on their relative position, which also enables the position-based distribution of documents. Second, it enables senders to attach their position to messages, which can facilitate interaction between speaker and audience in a lecture hall and enables the collection of feedback based on users' location.
In this thesis, we present Sonoloc, a mobile app and system that, by relying on acoustic signals, allows a set of commodity smart devices to determine their _relative_ positions. Sonoloc can position any number of devices within acoustic range with a constant number of acoustic signals emitted by a subset of devices. Our experimental evaluation with up to 115 devices in real rooms shows that — despite substantial background noise — the system can locate devices with an accuracy of tens of centimeters using no more than 15 acoustic signals.

Anjo VAHLDIEK-OBERWAGNER
Techniques to Protect Confidentiality and Integrity of Persistent and In-Memory Data
(Advisor: Prof. Peter Druschel)
Tuesday, 05.02.2019, 17:30 h, in building E1 5, room 0.29

Abstract:

Today, computers store and analyze valuable and sensitive data. As a result, we need to protect this data against confidentiality and integrity violations that can result in the illicit release, loss, or modification of a user's and an organization's sensitive data, such as personal media content or client records. Existing techniques for protecting confidentiality and integrity either lack efficiency or are vulnerable to malicious attacks. In this thesis we suggest two techniques, Guardat and ERIM, to efficiently and robustly protect persistent and in-memory data.
To protect the confidentiality and integrity of persistent data, clients specify per-file policies to Guardat declaratively, concisely and separately from code. Guardat enforces policies by mediating I/O in the storage layer. In contrast to prior techniques, we protect against accidental or malicious circumvention of higher software layers. We present the design and prototype implementation, and demonstrate that Guardat efficiently enforces example policies in a web server.
To protect the confidentiality and integrity of in-memory data, ERIM isolates sensitive data using Intel Memory Protection Keys (MPK), a recent x86 extension for partitioning the address space. However, MPK does not protect against malicious attacks by itself. We prevent malicious attacks by combining MPK with call gates to trusted entry points and ahead-of-time binary inspection. In contrast to existing techniques, ERIM efficiently protects frequently used session keys of web servers, an in-memory reference monitor's private state, and managed runtimes from native libraries. These use cases result in high switch rates on the order of 10^5-10^6 switches/s. Our experiments demonstrate less than 1% runtime overhead per 100,000 switches/s, thus outperforming existing techniques.

Muhammad Bilal ZAFAR
Discrimination in Algorithmic Decision Making: From Principles to Measures and Mechanisms
(Advisor: Prof. Krishna Gummadi)
Monday, 04.02.2019, 18:00 h, in building E1 5, room 0.29

Abstract:

The rise of algorithmic decision making in a variety of applications has also raised concerns about its potential for discrimination against certain social groups. However, incorporating nondiscrimination goals into the design of algorithmic decision making systems (or, classifiers) has proven to be quite challenging. These challenges arise mainly due to the computational complexities involved in the process, and the inadequacy of existing measures to computationally capture discrimination in various situations. The goal of this thesis is to tackle these problems. First, with the aim of incorporating existing measures of discrimination (namely, disparate treatment and disparate impact) into the design of well-known classifiers, we introduce a mechanism of decision boundary covariance that can be included in the formulation of any convex boundary-based classifier in the form of convex constraints. Second, we propose alternative measures of discrimination. Our first proposed measure, disparate mistreatment, is useful in situations when unbiased ground truth training data is available. The other two measures, preferred treatment and preferred impact, are useful in situations when feature and class distributions of different social groups are significantly different, and can additionally help reduce the cost of nondiscrimination (as compared to the existing measures). We also design mechanisms to incorporate these new measures into the design of convex boundary-based classifiers.
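In published form, the decision boundary covariance constraint has roughly the following shape (a paraphrase; notation mine): for a convex loss L(\theta), sensitive attribute values z_i, and the signed distance d_\theta(x_i) of point x_i to the decision boundary, one solves

\min_{\theta} \; L(\theta) \qquad \text{subject to} \qquad \Big| \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})\, d_\theta(x_i) \Big| \le c

which remains convex for boundary-based classifiers whose d_\theta(x) is linear in \theta (e.g., logistic regression or linear SVMs); the threshold c trades the strength of the nondiscrimination requirement against accuracy.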

 

January

Christian LANDER
Ubiquitous Head-Mounted Gaze Tracking
(Advisor: Prof. Antonio Krüger)
Thursday, 24.01.2019, 15:00 h, in building D3 2, "Reuse" meeting room

Abstract:

Recently, gaze-based interfaces in stationary settings have become a viable option, as remote eye trackers are easily available at a low price. However, making gaze-based interfaces ubiquitous remains an open challenge that cannot be resolved with remote approaches. The miniaturization of camera technology allows for gaze estimation with head-mounted devices that are similar to a pair of glasses. A crucial step towards gaze estimation with such devices is calibration. Although the development is mainly technology-driven, even a hypothetical fully calibrated system in which all parameters are known and valid is affected by calibration drift. In addition, attributes like minimal intrusiveness and obstruction, easy and flexible setup, and high accuracy are not present in the latest commercial devices. Hence, their applicability to spontaneous interaction or long-lasting research experiments is questionable.
In this thesis we enable spontaneous, calibration-free and accurate mobile gaze estimation. We contribute by investigating the following three areas: Efficient Long-Term Usage of a head-mounted eye tracker; Location-, Orientation- & Target-Independent head-mounted eye tracking; and Mobile & Accurate Calibration-Free gaze estimation. Through the elaboration of the theoretical principles, we investigate novel concepts for each aspect; these are implemented and evaluated, converging into a final device, to show how to overcome current limitations.

2018

December

Juhi KULSHRESTHA
Quantifying & Characterizing Information Diets of Social Media Users
(Advisor: Prof. Krishna Gummadi)
Thursday, 20.12.2018, 16:00 h, in building E1 5, room 0.29

Abstract:

An increasing number of people are relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in the way news and information is exchanged in our society – from traditional mass media to online social media.
With this changing environment, it is essential to study the information consumption of social media users and to audit how automated algorithms (like search and recommendation systems) modify the information that social media users consume. In this thesis, we fulfill this high-level goal with a two-fold approach. First, we propose the concept of information diets as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word-of-mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to the users, and (c) search systems that users employ to find interesting information on these platforms. We measure the information diets of social media users along three dimensions: topics, geographic sources, and political perspectives.
Our work is aimed at making social media users aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effects of these biases.
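As an illustration of the core concept (not code from the thesis), an information diet along one dimension can be computed as the fractional composition of a user’s consumed items; the topic annotation below is a hypothetical stand-in:

    from collections import Counter

    def information_diet(consumed_items):
        # The information diet along one dimension (here: topic) is the
        # fractional composition of the items a user consumed.
        counts = Counter(item["topic"] for item in consumed_items)
        total = sum(counts.values())
        return {topic: c / total for topic, c in counts.items()}

    timeline = [{"topic": "politics"}, {"topic": "sports"}, {"topic": "politics"}]
    print(information_diet(timeline))  # {'politics': 0.66..., 'sports': 0.33...}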

Johannes DOERFERT
Applicable and Sound Polyhedral Optimization of Low-Level Programs
(Advisor: Prof. Sebastian Hack)
Wednesday, 19.12.2018, 10:15 h, in building E1 7, room 0.01

Abstract:

Computers are becoming increasingly complex. Current and future systems feature configurable hardware, multiple cores with different capabilities, as well as accelerators. The memory subsystem is diversifying as well: the cache hierarchy grows deeper and is augmented with scratchpads, low-latency memory, and high-bandwidth memory. The programmer alone cannot utilize this enormous potential. Compilers have to provide insight into the program behavior, or even arrange computations and data themselves. Either way, they need a more holistic view of the program. Local transformations, which treat the iteration order, computation unit, and data layout as fixed, will not be able to fully utilize a diverse system.
The polyhedral model, a high-level program representation and transformation framework, has shown great success in tackling various problems in the context of diverse systems. While it is widely acknowledged for its analytical power and transformation capabilities, it is also widely assumed to be too restrictive and fragile for real-world programs.
In this thesis, we improve the applicability and profitability of polyhedral-model-based techniques. Our efforts guarantee a sound polyhedral representation and extend the applicability to a wider range of programs. In addition, we introduce new applications to utilize the information available in the polyhedral program representation, including standalone optimizations and techniques to derive high-level properties.
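To illustrate the underlying idea (a generic example, not taken from the thesis): the polyhedral model describes a loop nest as an integer set bounded by affine constraints, and transformations such as loop interchange become affine schedules over that set:

    # Iteration domain of "for i in range(n): for j in range(i): S(i, j)"
    # as the integer set { [i, j] : 0 <= j < i < n }, and loop interchange
    # expressed as the affine schedule (i, j) -> (j, i).
    n = 5
    domain = [(i, j) for i in range(n) for j in range(n) if 0 <= j < i]
    interchanged = sorted(domain, key=lambda ij: (ij[1], ij[0]))
    print(interchanged)  # same statement instances, new execution order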

Alexander WEINERT
Optimality and Resilience in Parity Games
(Advisor: Dr. Martin Zimmermann)
Friday, 14.12.2018, 14:15 h, in building E1 7, room 0.01

Abstract:

Modeling reactive systems as infinite games has yielded a multitude of results in the fields of program verification and program synthesis. The canonical parity condition used for such games, however, neither suffices to express non-functional requirements on the modeled system, nor to capture malfunctions of the deployed system.
In this work we address these issues by investigating quantitative games in which the above characteristics can be expressed. Parity games with costs are a variant of parity games in which traversing an edge incurs some nonnegative cost. The cost of a play is the limit of the costs incurred between an odd color occurring and the larger even color that answers it. We extend this model by allowing for integer costs, obtaining parity games with weights, and determine both the computational complexity of solving the resulting games and the memory required by the protagonist to win them.
Our results show that the increase in expressiveness comes at a steep price in terms of both measures. Subsequently, we address the optimality question, i.e., whether the protagonist is able to ensure a given bound on the cost, and show that answering this question is harder than just solving the underlying game, even for special cases of parity games with weights.
Moreover, we further extend this model to allow for modeling disturbances that may occur during the operation of the resulting strategy, and show how to compute a strategy that is resilient against as many such disturbances as possible. Finally, we show that the protagonist is able to trade memory, quality, and resilience of strategies for one another, and how to compute the possible tradeoffs between these three values.

Paarijaat ADITYA
Towards Privacy-Compliant Mobile Computing
(Advisor: Prof. Peter Druschel)
Monday, 03.12.2018, 16:30 h, in building E1 5, room 0.29

Abstract:

Sophisticated mobile computing, sensing and recording devices like smartphones, smartwatches, and wearable cameras are carried by their users virtually around the clock, blurring the distinction between the online and offline worlds. While these devices enable transformative new applications and services, they also introduce entirely new threats to users’ privacy, because they can capture a complete record of the user’s location, online and offline activities, and social encounters, including an audiovisual record. Such a record of users’ personal information is highly sensitive and is subject to numerous privacy risks. In this thesis, we have investigated and built systems to mitigate two such privacy risks:
1) privacy risks due to ubiquitous digital capture, where bystanders may inadvertently be captured in photos and videos recorded by other nearby users,
2) privacy risks to users’ personal information introduced by a popular class of apps called ‘mobile social apps’. In this thesis, we present two systems, called I-Pic and EnCore, built to mitigate these two privacy risks.
Both systems aim to put the users back in control of what personal information is being collected and shared, while still enabling innovative new applications. We built working prototypes of both systems and evaluated them through actual user deployments. Overall we demonstrate that it is possible to achieve privacy-compliant digital capture and it is possible to build privacy-compliant mobile social apps, while preserving their intended functionality and ease-of-use. Furthermore, we also explore how the two solutions can be merged into a powerful combination, one which could enable novel workflows for specifying privacy preferences in image capture that do not currently exist.

November

Pavel KOLEV
Algorithmic Results for Clustering and Refined Physarum Analysis
(Advisor: Prof. Kurt Mehlhorn)
Tuesday, 27.11.2018, 16:15 h, in building E1 5, room 0.29

Abstract:

In the first part of this thesis, we study the Binary $\ell_0$-Rank-$k$ problem which, given a binary matrix $A$ and a positive integer $k$, seeks a rank-$k$ binary matrix $B$ minimizing the number of non-zero entries of $A-B$. A central open question is whether this problem admits a polynomial time approximation scheme. We give an affirmative answer to this question by designing the first randomized almost-linear time approximation scheme for constant $k$ over the reals, $\mathbb{F}_2$, and the Boolean semiring. In addition, we give novel algorithms for important variants of $\ell_0$-low rank approximation. The second part of this dissertation studies a popular and successful heuristic, known as Approximate Spectral Clustering (ASC), for partitioning the nodes of a graph $G$ into clusters with small conductance. We give a comprehensive analysis, showing that ASC runs efficiently and yields a good approximation of an optimal $k$-way node partition of $G$. In the final part of this thesis, we present two results on slime mold computations: i) the continuous undirected Physarum dynamics converges for undirected linear programs with a non-negative cost vector; and ii) for the discrete directed Physarum dynamics, we give a refined analysis that yields strengthened and close to optimal convergence rate bounds, and shows that the model can be initialized with any strongly dominating point.
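For intuition about the problem statement (an illustrative brute force, unrelated to the thesis’s almost-linear time scheme): for $k=1$ a rank-1 binary matrix is an outer product $uv^T$, so on tiny instances the optimum can be found exhaustively:

    import itertools
    import numpy as np

    def l0_rank1_bruteforce(A):
        # For k = 1, a rank-1 binary matrix is an outer product u v^T, so we
        # can enumerate all binary u, v and count mismatching entries of A.
        m, n = A.shape
        best = A.size
        for u in itertools.product([0, 1], repeat=m):
            for v in itertools.product([0, 1], repeat=n):
                best = min(best, int((A != np.outer(u, v)).sum()))
        return best

    A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
    print(l0_rank1_bruteforce(A))  # 2: flipping two entries makes A rank-1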

Sigurd SCHNEIDER
A Verified Compiler for a Linear Imperative/Functional Intermediate Language
(Advisors: Prof. Sebastian Hack & Prof. Gert Smolka)
Friday, 16.11.2018, 12:00 h, in building E1 7, room 0.01

Abstract:

This thesis describes the design of the verified compiler LVC. LVC’s main novelty is the way its first-order, term-based intermediate language IL realizes the advantages of static single assignment (SSA) for verified compilation. IL is not based on a control-flow graph (CFG) but defined in terms of an inductively defined syntax with lexically scoped, mutually recursive function definitions. IL replaces the usual dominance-based SSA definition found in unverified and verified compilers with the novel notion of coherence. The main research question this thesis studies is whether IL with coherence offers a faithful implementation of SSA, and how the design influences the correctness invariants and the proofs in the verified compiler LVC. To study this question, we verify dead code elimination, several SSA-based value optimizations including sparse conditional constant propagation, and an SSA-based register allocation approach including spilling. In these case studies, IL with coherence provides the usual advantages of SSA and improves the modularity of proofs. Furthermore, we propose a novel SSA construction algorithm based on coherence, and leverage the term structure of IL to obtain an inductive proof method for simulation proofs. LVC is implemented and verified with over 50,000 lines of code using the proof assistant Coq. To underline the practicability of our approach, we integrate LVC with CompCert to obtain an executable compiler that generates PowerPC assembly code.

Pascal LESSEL
Supporting User’s Influence in Gamification Settings and Game Live-Streams
(Advisor: Prof. Antonio Krüger)
Tuesday, 06.11.2018, 15:00 h, in building D3 2 (DFKI), room -2.17 (Reuse)

Abstract:

Playing games has long been important to mankind. One reason for this is the associated autonomy, as players can decide on many aspects on their own and can shape the experience. Game-related sub-fields have appeared in Human-Computer Interaction where this autonomy is questionable: in this thesis, we consider gamification and game live-streams and here, we support the users’ influence at runtime. We hypothesize that this should affect the perception of autonomy and should lead to positive effects overall. Our contribution is three-fold: first, we investigate crowd-based, self-sustaining systems in which the user’s influence directly impacts the outcome of the system’s service. We show that users are willing to expend effort in such systems even without additional motivation, but that gamification is still beneficial here. Second, we introduce "bottom-up" gamification, i.e., the idea of self-tailored gamification. Here, users have full control over the gamification used in a system, i.e., they can set it up as they see fit at the system’s runtime. Through user studies, we show that this has positive behavioral effects and thus adds to the ongoing efforts to move away from "one-size-fits-all" solutions. Third, we investigate how to make gaming live-streams more interactive, and how viewers perceive this. We also consider shared game control settings in live-streams, in which viewers have full control, and we contribute options to support viewers’ self-administration here.

October

Konrad JAMROZIK
Mining Sandboxes
(Advisor: Prof. Andreas Zeller)
Monday, 22.10.2018, 11:00 h, in building E9 1, room 0.01

Abstract:

Modern software is ubiquitous, yet insecure. It has the potential to expose billions of humans to serious harm, up to and including losing fortunes and taking lives. Existing approaches for securing programs are either exceedingly hard and costly to apply, significantly decrease usability, or simply do not work well enough against a determined attacker.
In this thesis we propose a new solution that significantly increases application security, yet is cheap, easy to deploy, and has minimal usability impact. We combine in a novel way the best of what existing techniques of test generation, dynamic program analysis and runtime enforcement have to offer: we introduce the concept of sandbox mining. First, in a phase called mining, we use automatic test generation to discover application behavior. Second, we apply a sandbox to limit any behavior during normal usage to the one discovered during mining. Users of an application running in a mined sandbox are thus protected from the application suddenly changing its behavior compared to the one observed during automatic test generation. As a consequence, backdoors, advanced persistent threats and other kinds of attacks based on the passage of time become exceedingly hard to conduct covertly. They are either discovered in the secure mining phase, where they can do no damage, or are blocked altogether. Mining is cheap because we leverage fully automated test generation to provide baseline behavior. Usability is not degraded: the sandbox runtime enforcement impact is negligible; the mined behavior is comprehensive and presented in a human-readable format, so any unexpected behavior changes are rare and easy to reason about. Our BOXMATE prototype for Android applications shows the approach is technically feasible, has an easy setup process, and is widely applicable to existing apps. Experiments conducted with BOXMATE show that less than one hour is required to mine sandboxes for Android applications, requiring few to no confirmations for frequently used functionality.
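The two-phase idea can be caricatured in a few lines (a toy model, not BOXMATE’s implementation; the resource names are made up):

    class MinedSandbox:
        # Toy two-phase sandbox: record resources during mining, enforce later.
        def __init__(self):
            self.allowed, self.mining = set(), True

        def access(self, resource):
            if self.mining:
                self.allowed.add(resource)   # mining phase: observe and allow
                return True
            return resource in self.allowed  # enforcement: block unseen behavior

    box = MinedSandbox()
    box.access("contacts.read")         # observed during automatic test generation
    box.mining = False
    print(box.access("contacts.read"))  # True:  seen during mining
    print(box.access("sms.send"))       # False: new behavior is blocked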

September

Petro LUTSYK
Correctness of Multi-Core Processors with Operating System Support
(Advisor: Prof. Wolfgang Paul)
Thursday, 27.09.2018, 10:00 h, in building E1 5, room 0.29

Abstract:

In the course of adding hardware support for operating systems and hypervisors (like Microsoft Hyper-V), we verify a realistic multi-core processor with integrated mechanisms for two levels of virtual address translation. The second level of address translation (SLAT) is required to allow guests of a hypervisor (typically operating systems) to execute their programs in translated mode. In hardware, the SLAT scheme is implemented in a dedicated processor component, called the two-level memory management unit (MMU). Both the specification and the implementation of the two-level MMU are presented in full detail. The main contribution is the complete paper-and-pencil correctness proof for the pipelined multi-core implementation of the MIPS-86 ISA with SLAT. MIPS-86 combines the instruction set of MIPS with the memory model of x86. First, we consider the sequential implementation, which serves to demonstrate the integration of the two-level MMU into the MIPS processor. In the proof of our main result — correctness of the pipelined implementation — we refer to the sequential case to show correctness of the MMU operation. This allows us to shift the focus towards the problems of pipelining a machine with speculative execution and interrupts, which must be considered in combination with address translation.

Xucong ZHANG
Gaze Estimation and Interaction in Real-World Environments
(Advisor: Dr. Andreas Bulling)
Wednesday, 26.09.2018, 11:00 h, in building E1 4, room 0.19

Abstract:

Human eye gaze has been widely used in human-computer interaction, as it is a promising modality for natural, fast, pervasive, and non-verbal interaction between humans and computers. Appearance-based gaze estimation methods directly regress from eye images to gaze targets without eye feature detection, and therefore have great potential to work with ordinary devices. However, these methods require a large amount of domain-specific training data to cover the significant variability in eye appearance. In this thesis, we focus on developing appearance-based gaze estimation methods and corresponding attentive user interfaces with a single webcam for challenging real-world environments. We collected a large-scale gaze estimation dataset in a real-world setting, proposed a full-face gaze estimation method, and studied data normalization. We applied our gaze estimation methods to real-world interactive applications, including eye contact detection and gaze estimation with multiple personal devices.
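In its simplest form, appearance-based gaze estimation is direct regression from pixels to gaze angles. The sketch below is illustrative only, with synthetic stand-in data and a plain ridge regressor rather than the learned models of the thesis:

    import numpy as np
    from sklearn.linear_model import Ridge

    # Synthetic stand-in data: 200 grayscale eye patches (36x60 pixels)
    # paired with 2D gaze angles (yaw, pitch).
    rng = np.random.default_rng(0)
    images = rng.random((200, 36, 60))
    gaze = rng.random((200, 2))

    # Regress from raw pixels to gaze, with no explicit eye-feature detection.
    model = Ridge(alpha=1.0).fit(images.reshape(200, -1), gaze)
    prediction = model.predict(images[:1].reshape(1, -1))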

Yongtao SHUAI
Dynamic Adaptive Video Streaming with Minimal Buffer Sizes
(Advisor: Prof. Thorsten Herfet)
Wednesday, 19.09.2018, 16:00 h, in building E1 1, room 407

Abstract:

Recently, adaptive streaming has been widely adopted in video streaming services to improve the quality-of-experience (QoE) of video delivery over the Internet. However, state-of-the-art bitrate adaptation achieves satisfactory performance only with extensive buffering of several tens of seconds. This leads to high playback latency in video delivery, which is especially undesirable in the context of live content and interactive features with a low upper bound on the latency. Therefore, this thesis aims at pushing the application of adaptive streaming to its limit with respect to the buffer size, which is the dominant factor of the streaming latency.
In this work, we first address the minimum buffer size required in adaptive streaming, which provides us with guidelines to determine a reasonably low latency for streaming systems. Then, we tackle the fundamental challenge of achieving such low-latency streaming by developing a novel adaptation algorithm that stabilizes buffer dynamics despite a small buffer size. We also present further improvements by designing a novel adaptation architecture with low-delay feedback for the bitrate selection and by optimizing the underlying transport layer to offer efficient real-time streaming. Experimental evaluations demonstrate that our approach achieves superior QoE in adaptive video streaming, especially in the particularly challenging case of low-latency streaming.
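For context, the classic buffer-based family of adaptation heuristics that such work builds on can be sketched as follows (a generic illustration, not the thesis’s algorithm; the thresholds and bitrate ladder are made up):

    def select_bitrate(buffer_s, ladder, reservoir=2.0, cushion=8.0):
        # Map current buffer occupancy (seconds) onto the bitrate ladder:
        # lowest rate below the reservoir, highest above the cushion,
        # and a linear ramp in between.
        if buffer_s <= reservoir:
            return ladder[0]
        if buffer_s >= cushion:
            return ladder[-1]
        frac = (buffer_s - reservoir) / (cushion - reservoir)
        return ladder[int(frac * (len(ladder) - 1))]

    ladder = [300, 750, 1500, 3000, 6000]  # kbit/s
    print(select_bitrate(1.5, ladder))     # 300: protect a nearly empty buffer
    print(select_bitrate(5.0, ladder))     # 1500: mid rate on the linear ramp

The tension the thesis addresses is visible even here: with a small buffer, almost every rate decision sits near the reservoir, so naive heuristics oscillate.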

August

Erik DERR
Understanding and Assessing Security on Android via Static Code Analysis
(Advisor: Prof. Michael Backes)
Tuesday, 28.08.2018, 14:00 h, in building E9 1, room 0.05

Abstract:

Smart devices have become a rich source of sensitive information, including personal data (contacts and account data) and context information like GPS data that is continuously aggregated by onboard sensors. As a consequence, mobile platforms have become a prime target for malicious and over-curious applications. The growing complexity and the quickly rising number of mobile apps have further reinforced the demand for comprehensive application security vetting. This dissertation presents a line of work that advances security testing on Android via static code analysis. In the first part of this dissertation, we build an analysis framework that statically models the complex runtime behavior of apps and Android’s application framework (upon which apps are built) to extract privacy- and security-relevant data flows. We provide the first classification of Android’s protected resources within the framework and generate precise API-to-permission mappings that improve over prior work. We then propose a third-party library detector for apps that is resilient against common code obfuscations, to measure the outdatedness of libraries in apps and to attribute vulnerabilities to the correct software component. Based on these results, we identify root causes of app developers not updating their dependencies and propose actionable items to remedy the current status quo. Finally, we measure to which extent libraries can be updated automatically without modifying the application code.

Seong Joon OH
Image Manipulation against Learned Models: Privacy and Security Implications
(Advisor: Prof. Bernt Schiele)
Monday, 06.08.2018, 16:00 h, in building E1 4, room 0.24

Abstract:

Machine learning is transforming the world. Its application areas span privacy-sensitive and security-critical tasks such as human identification and self-driving cars. These applications raise privacy- and security-related questions that are not fully understood or answered yet: Can automatic person recognisers identify people in photos even when their faces are blurred? How easy is it to find an adversarial input for a self-driving car that makes it drive off the road?
This thesis contributes one of the first steps towards a better understanding of such concerns in the presence of data manipulation. From the point of view of users’ privacy, we show the inefficacy of common obfuscation methods like face blurring, and propose more advanced techniques based on head inpainting and adversarial examples. We discuss the duality of model security and user privacy problems and describe the implications of research in one area for the other. Finally, we study the knowledge aspect of the data manipulation problem: the more one knows about the target model, the more effective manipulations one can craft. We propose a game-theoretic framework to systematically represent the partial knowledge of the target model and derive privacy and security guarantees. We also demonstrate that one can reveal architectural details and training hyperparameters of a model only by querying it, leading to even more effective data manipulations against it.
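As a reference point for what an adversarial input is, the standard fast gradient sign method (Goodfellow et al.; shown as a generic baseline, not a technique from the thesis) perturbs an input in the direction that increases the model’s loss:

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.03):
        # Fast Gradient Sign Method: shift each input value by +/- eps in
        # the direction that maximally increases the classification loss.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).detach()

Applied to any differentiable classifier, fgsm(model, images, labels) tends to produce misclassified inputs even though the perturbation is visually negligible.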

July

Pascal BERRANG
Quantifying and Mitigating Privacy Risks in Biomedical Data
(Advisor: Prof. Michael Backes)
Wednesday, 25.07.2018, 14:00 h, in building E9 1 (CISPA), room 001

Abstract:

The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, allowing for a breakthrough towards a more precise and personalized medicine. However, the release of these intrinsically highly sensitive, interdependent data poses a new severe privacy threat.
In this thesis, we provide means to quantify and protect the privacy of individuals’ biomedical data. Besides the genome, we specifically focus on two of the most important epigenetic elements influencing human health: microRNAs and DNA methylation. We quantify the privacy risk for multiple realistic attack scenarios. Our results underline that the privacy risks inherent to biomedical data have to be taken seriously. Moreover, we present and evaluate solutions to preserve the privacy of individuals. Our mitigation techniques stretch from the differentially private release of epigenetic data, considering its utility, up to cryptographic constructions to securely and privately evaluate a random forest on a patient’s data.

Ruben BECKER
On Flows, Paths, Roots, and Zeros
(Advisor: Prof. Kurt Mehlhorn)
Monday, 23.07.2018, 14:00 h, in building E1 7, room 001

Abstract:

This thesis has two parts. In the first, we give new results for various network flow problems. (1) We present a novel dual ascent algorithm for min-cost flow and show that an implementation of it is very efficient on certain instance classes. (2) We approach the problem of numerical stability of interior point network flow algorithms by giving a path-following method that works with integer arithmetic solely and is thus guaranteed to be free of any numerical instabilities. (3) We present a gradient descent approach for the undirected transshipment problem and its special case, the single source shortest path problem (SSSP). For distributed computation models this yields the first SSSP algorithm with a near-optimal number of communication rounds.
The second part deals with fundamental topics from algebraic computation. (1) We give an algorithm for computing the complex roots of a complex polynomial. While achieving a bit complexity comparable to previous best results, our algorithm is simple and promises to be of practical impact. It uses a test for counting the roots of a polynomial in a region that is based on Pellet’s theorem. (2) We extend this test to polynomial systems, i.e., we develop an algorithm that can certify the existence of a k-fold zero of a zero-dimensional polynomial system within a given region. For bivariate systems, we show experimentally that this approach yields significant improvements when used as an inclusion predicate in an elimination method.
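The Pellet-based counting test itself is simple to state and check (a minimal sketch of the classical theorem, not the thesis’s algorithm):

    import numpy as np

    def pellet_certifies(coeffs, k, r):
        # coeffs[j] is the coefficient of z^j. Pellet's theorem: if the k-th
        # term dominates the sum of all other terms on the circle |z| = r,
        # the polynomial has exactly k roots (with multiplicity) in |z| < r.
        terms = np.abs(np.asarray(coeffs, dtype=complex)) * r ** np.arange(len(coeffs))
        return terms[k] > terms.sum() - terms[k]

    # z^2 - 3z + 1: the dominant middle term certifies exactly one root in |z| < 1.
    print(pellet_certifies([1, -3, 1], k=1, r=1.0))  # True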

Bojana KODRIC
Incentives in Dynamic Markets
(Advisor: Prof. Martin Hoefer, now Uni Frankfurt)
Monday, 23.07.2018, 11:00 h, in building E1 7, room 001

Abstract:

In this thesis, we consider a variety of combinatorial optimization problems within a common theme of uncertainty and selfish behavior.
In our first scenario, the input is collected from selfish players. Here, we study extensions of the so-called smoothness framework for mechanisms, a very useful technique for bounding the inefficiency of equilibria, to the cases of varying mechanism availability and participation of risk-averse players. In both of these cases, our main results are general theorems for the class of (λ,μ)-smooth mechanisms. We show that these mechanisms guarantee at most a (small) constant factor performance loss in the extended settings.
In our second scenario, we do not have access to the exact numerical input. Within this context, we explore combinatorial extensions of the well-known secretary problem under the assumption that the incoming elements only reveal their ordinal position within the set of previously arrived elements. We first observe that many existing algorithms for special matroid structures maintain their competitive ratio in the ordinal model. In contrast, we provide a lower bound for algorithms that are oblivious to the matroid structure. Finally, we design new algorithms that obtain constant competitive ratios for a variety of combinatorial problems.
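For reference, the classical secretary rule that these combinatorial extensions generalize looks as follows; note that it only ever compares elements that have already arrived, so it works unchanged in the ordinal model (an illustrative baseline, not an algorithm from the thesis):

    import math

    def classical_secretary(stream):
        # Observe the first n/e elements without accepting, then take the
        # first element that beats everything seen so far.
        n = len(stream)
        cutoff = int(n / math.e)
        best_seen = max(stream[:cutoff], default=None)
        for x in stream[cutoff:]:
            if best_seen is None or x > best_seen:
                return x
        return stream[-1]  # forced acceptance of the last element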

June

Manuel REINERT
Cryptographic Techniques for Privacy and Access Control in Cloud-Based Applications
(Advisor: Prof. Matteo Maffei, now Uni Wien)
Friday, 29.06.2018, 15:30 h, in building E9 1 (CISPA), room 001

Abstract:

Digitization is one of the key challenges for today’s industries and society. It affects more and more business areas as well as user data and, in particular, sensitive information. Due to its sensitivity, it is important to treat personal information as securely and privately as possible, yet still enable cloud-based software to use that information when requested by the user.
In this thesis, we focus on the privacy-preserving outsourcing and sharing of data, the querying of outsourced protected data, and the usage of personal information as an access control mechanism for rating platforms that should be protected from coercion attacks. In these three categories, we present cryptographic techniques and protocols that push the state of the art. In particular, we first present multi-client oblivious RAM (ORAM), which augments standard ORAM with selective data sharing through access control, confidentiality, and integrity. Second, we investigate recent work on frequency-hiding order-preserving encryption and show that the state of the art lacks rigorous treatment, allowing for simple attacks against the security of the existing scheme. As a remedy, we show how to fix the security definition and that the existing scheme, slightly adapted, fulfills it. Finally, we design and develop a coercion-resistant rating platform. Coercion-resistance has been dealt with mainly in the context of electronic voting, yet it also affects other areas of digital life such as rating platforms.

Maryam NAZARIEH
Understanding Regulatory Mechanisms Underlying Stem Cells Helps to Identify Cancer Biomarkers
(Advisor: Prof. Volkhard Helms)
Thursday, 28.06.2018, 16:30 h, in building E2 1, room 001

Abstract:

In this thesis, we present novel algorithms to unravel regulatory mechanisms underlying stem cell differentiation and carcinogenesis. Inspired by the tightly interwoven topology of the regulators controlling the pluripotency network in mouse embryonic stem cells, we formulated the problem of identifying a set of master regulatory genes in a regulatory network as solving either of two combinatorial optimization problems, namely minimum dominating set or minimum connected dominating set, in weakly and strongly connected components. We then applied the developed methods to regulatory cancer networks to identify disease-associated genes and anti-cancer drug targets in breast cancer and hepatocellular carcinoma. As not all the nodes in the solutions are critical, we developed a prioritization method to rank a set of candidate genes related to a certain disease, based on a systematic analysis of the genes that are differentially expressed in tumor and normal conditions. Moreover, we demonstrated that the topological features in regulatory networks surrounding differentially expressed genes are highly consistent across the output of several analysis tools for identifying differentially expressed genes. We compared two randomization strategies for TF-miRNA co-regulatory networks to infer significant network motifs underlying cellular identity. We showed that the edge-type conserving method surpasses the non-conserving method in terms of biological relevance and centrality overlap. We presented several web servers and software packages that were made publicly available at no cost. The Cytoscape plugin of minimum connected dominating set identifies a set of key regulatory genes in a user-provided regulatory network based on a heuristic approach. The ILP formulations of minimum dominating set and minimum connected dominating set return the optimal solutions for the aforementioned problems. The web servers TFmiR and TFmiR2 construct disease-, tissue-, and process-specific networks for the sets of deregulated genes and miRNAs provided by a user. They highlight topological hotspots and offer detection of three- and four-node FFL motifs as a separate web service for mouse and human.
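The minimum dominating set formulation mentioned above has a compact ILP: minimize the number of chosen nodes such that every node is chosen or adjacent to a chosen node. A minimal sketch (on a toy graph, not a regulatory network, and assuming the PuLP modeling library is available):

    import pulp

    def minimum_dominating_set(nodes, edges):
        # ILP: every node must be chosen or have a chosen neighbor.
        nbrs = {v: set() for v in nodes}
        for u, v in edges:
            nbrs[u].add(v)
            nbrs[v].add(u)
        prob = pulp.LpProblem("min_dominating_set", pulp.LpMinimize)
        x = {v: pulp.LpVariable(f"x_{v}", cat="Binary") for v in nodes}
        prob += pulp.lpSum(x.values())
        for v in nodes:
            prob += x[v] + pulp.lpSum(x[u] for u in nbrs[v]) >= 1
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [v for v in nodes if x[v].value() > 0.5]

    # Path a-b-c-d: an optimal dominating set has size 2, e.g. {b, d}.
    print(minimum_dominating_set(["a", "b", "c", "d"],
                                 [("a", "b"), ("b", "c"), ("c", "d")]))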

Alexander ANDREYCHENKO
Model Reconstruction for Moment-based Stochastic Chemical Kinetics
(Advisor: Prof. Verena Wolf)
Monday, 25.06.2018, 12:00 h, in building E1 7, room 0.01

Abstract:

Based on the theory of stochastic chemical kinetics, the inherent randomness and stochasticity of biochemical reaction networks can be accurately described by discrete-state continuous-time Markov chains, where each chemical reaction corresponds to a state transition of the process. However, the analysis of such processes is computationally expensive and sophisticated numerical methods are required. The main complication is the largeness of the state space, so that analysis techniques based on an exploration of the state space are often not feasible, and integrating the moments of the underlying probability distribution has become a very popular alternative. In this thesis we propose an analysis framework in which we integrate a number of moments of the process instead of the state probabilities. This results in a more time-efficient simulation of the time evolution of the process. In order to regain the state probabilities from the moment representation, we combine the moment-based simulation (MM) with a maximum entropy approach: the maximum entropy principle is applied to derive a distribution that best fits a given sequence of moments. We further extend this approach by incorporating the conditional moments (MCM), which allows us not only to reconstruct the distribution of the species present in high amounts in the system, but also to approximate the probabilities of species with low molecular counts. For the given distribution reconstruction framework, we investigate the numerical accuracy and stability using case studies from systems biology, compare the two moment approximation methods (MM and MCM), examine whether the framework can be used for the reaction rate estimation problem, and describe possible future applications.
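The maximum entropy reconstruction step can be illustrated on a finite support (a simplified sketch, not the thesis’s implementation; it assumes scipy and solves the primal problem directly rather than the dual):

    import numpy as np
    from scipy.optimize import minimize

    def maxent_from_moments(support, moments):
        # Find the distribution p on `support` that maximizes entropy
        # subject to matching the raw moments E[X^k], k = 1..len(moments).
        support = np.asarray(support, dtype=float)
        n = len(support)
        constraints = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
        for k, m in enumerate(moments, start=1):
            constraints.append(
                {"type": "eq", "fun": lambda p, k=k, m=m: p @ support**k - m})
        neg_entropy = lambda p: np.sum(p * np.log(np.clip(p, 1e-12, None)))
        result = minimize(neg_entropy, np.full(n, 1.0 / n),
                          bounds=[(0.0, 1.0)] * n,
                          constraints=constraints, method="SLSQP")
        return result.x

    # Reconstruct a distribution on {0,...,10} from mean 4 and E[X^2] = 18.
    p = maxent_from_moments(range(11), [4.0, 18.0])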

Abhishek BICHHAWAT
Practical Dynamic Information Flow Control
(Advisor: Prof. Christian Hammer, now Uni Potsdam)
Friday, 22.06.2018, 16:30 h, in building E2 1, room 007

Abstract:

Computer systems and applications handle a plethora of private user information, and they often contain bugs, with both malicious and non-malicious intent, that lead to information leaks. Information flow control has been studied extensively as an approach to mitigate such leaks, but has focused mainly on static analyses. However, some applications are developed in dynamic languages like JavaScript, which render static analysis techniques ineffective. Although there has been growing interest in developing dynamic analysis techniques, they are still not at the helm of information flow security in such settings; the prime reason being that these techniques either over-approximate or are too restrictive in most cases. This thesis focuses on improving the usability of dynamic information flow analyses. It begins by presenting a sound improvement and enhancement of the permissive-upgrade strategy. It then presents a sound and precise dynamic information flow analysis for handling complex features like exceptions in higher-order languages. To address the issue of permissiveness, this thesis develops a sound approach to bound leaks dynamically while allowing information release in accordance with a developer-specified budget. It then applies these techniques to a Web browser and explores a policy specification mechanism for Web applications.
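The core of dynamic information flow control is label propagation at runtime. The toy below is a deliberately minimal sketch that tracks only explicit flows (the hard cases the thesis addresses, such as implicit flows and exceptions, are exactly what it omits):

    class Labeled:
        # A value tagged with a security label ("low" or "high").
        def __init__(self, value, label):
            self.value, self.label = value, label

    def join(a, b):
        # A result is as secret as the more secret of its inputs.
        return "high" if "high" in (a, b) else "low"

    def add(x, y):
        return Labeled(x.value + y.value, join(x.label, y.label))

    def output_low(x):
        # A low (public) sink may only receive low-labeled data.
        if x.label == "high":
            raise PermissionError("information flow violation")
        print(x.value)

    secret = Labeled(42, "high")
    public = Labeled(1, "low")
    output_low(add(public, public))          # fine: low data to a low sink
    try:
        output_low(add(secret, public))      # high data reaching a public sink
    except PermissionError as e:
        print("blocked:", e)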

Shilpa GARG
Computational Haplotyping: theory and practice
(Advisor: Prof. Tobias Marschall)
Thursday, 21.06.2018, 16:30 h, in building E2 1, room 007

Abstract:

Genomics has paved a new way to comprehend life and its evolution, and also to investigate causes of diseases and their treatment. One of the important problems in genomic analyses is haplotype assembly. Constructing complete and accurate haplotypes plays an essential role in understanding population genetics and how species evolve. In this thesis, we focus on computational approaches to haplotype assembly from third-generation sequencing technologies. This involves huge amounts of sequencing data, and such data contain errors due to the single-molecule sequencing protocols employed. Taking advantage of combinatorial formulations helps to correct these errors when solving the haplotyping problem. Various computational techniques such as dynamic programming, parameterized algorithms, and graph algorithms are used to solve this problem.
This thesis presents several contributions to the area of haplotyping. First, a novel algorithm based on dynamic programming is proposed to provide approximation guarantees for phasing a single individual. Second, an integrative approach is introduced to combine multiple sequencing datasets to generate complete and accurate haplotypes. The effectiveness of this integrative approach is demonstrated on a real human genome. Third, we provide a novel efficient approach to phasing pedigrees and demonstrate its advantages in comparison to phasing a single individual. Fourth, we present a generalized graph-based framework for performing haplotype-aware de novo assembly. Specifically, this generalized framework consists of a hybrid pipeline for generating accurate and complete haplotypes from data stemming from multiple sequencing technologies, one that provides accurate reads and another that provides long reads.

Sandy HEYDRICH
A tale of two packing problems: Improved algorithms and tighter bounds for online bin packing and the geometric knapsack problem
(Advisor: Prof. Rob van Stee, now Uni Siegen)
Tuesday, 12.06.2018, 16:15 h, in building E1 4, room 024

Abstract:

In this thesis, we deal with two packing problems: online bin packing and the geometric knapsack problem. In online bin packing, the aim is to pack a given number of items of different sizes into a minimal number of containers. The items need to be packed one by one, without knowing future items. For online bin packing in one dimension, we present a new family of algorithms that constitutes the first improvement over the previously best algorithm in almost 15 years. While the algorithmic ideas are intuitive, an elaborate analysis is required to prove its competitive ratio. We also give a lower bound for the competitive ratio of this family of algorithms. For online bin packing in higher dimensions, we discuss lower bounds for the competitive ratio and show that the ideas from the one-dimensional case cannot be easily transferred to obtain better two-dimensional algorithms. In the geometric knapsack problem, one aims to pack a maximum-weight subset of given rectangles into one square container. For this problem, we consider offline approximation algorithms. For geometric knapsack with square items, we improve the running time of the best known PTAS and obtain an EPTAS. This shows that the large running times caused by some standard techniques for geometric packing problems are not always necessary and can be improved. Finally, we show how to use resource augmentation to compute optimal solutions in EPTAS-time, thereby improving upon the known PTAS for this case.
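To make the online setting concrete, the textbook First Fit heuristic below (a classic baseline, not one of the thesis’s algorithms) commits each item to a bin the moment it arrives:

    def first_fit(items, capacity=1.0):
        # Place each arriving item into the first bin with enough residual
        # capacity, opening a new bin if none fits. Decisions are made
        # online, without knowledge of future items.
        bins = []
        for size in items:
            for b in range(len(bins)):
                if bins[b] + size <= capacity:
                    bins[b] += size
                    break
            else:
                bins.append(size)
        return len(bins)

    print(first_fit([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5]))  # 4 bins used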

Georg NEIS
Compositional Compiler Correctness Via Parametric Simulations
(Advisor: Prof. Derek Dreyer)
Thursday, 07.06.2018, 14:00 h, in building E1 5, room 0.29

Abstract:

Compiler verification is essential for the construction of fully verified software, but most prior work (such as CompCert) has focused on verifying whole-program compilers. To support separate compilation and to enable linking of results from different verified compilers, it is important to develop a compositional notion of compiler correctness that is modular (preserved under linking), transitive (supports multi-pass compilation), and flexible (applicable to compilers that use different intermediate languages or employ non-standard program transformations). In this thesis, we formalize such a notion of correctness based on parametric simulations, thus developing a novel approach to compositional compiler verification.

May

Andreas TEUCKE
An Approximation and Refinement Approach to First-Order Automated Reasoning
(Advisor: Prof. Christoph Weidenbach)
Wednesday, 16.05.2018, 14:00 h, in building E1 5, room 0.02

Abstract:

With the goal of lifting model-based guidance from the propositional setting to first-order logic, I have developed an approximation theorem proving approach based on counterexample-guided abstraction refinement. A given clause set is transformed into a simplified form where satisfiability is decidable. This approximation extends the signature and preserves unsatisfiability: if the simplified clause set is satisfiable, so is the original clause set. A resolution refutation generated by a decision procedure on the simplified clause set can then either be lifted to a refutation in the original clause set, or it guides a refinement excluding the previously found unliftable refutation. This way the approach is refutationally complete. I have implemented my approach on top of the SPASS theorem prover. On certain satisfiable problems, the implementation shows the ability to beat established provers such as SPASS, iProver, and Vampire.

Georg TAMM
Architectures for Ubiquitous 3D on Heterogeneous Computing Platforms
(Advisor: Prof. Philipp Slusallek)
Wednesday, 09.05.2018, 14:00 h, in building D3 4, VisCenter DFKI

Abstract:

Today, 3D graphics applications span a wide range of domains, including scientific visualization, 3D-enabled web pages, and entertainment. At the same time, the devices and platforms that run and display these applications are more heterogeneous than ever. Display environments range from mobile devices to desktop systems and ultimately to distributed displays. While the capabilities of the client devices may vary considerably, the visualization experiences running on them should be consistent. The field of application should dictate how and on what devices users access the application, not the technical requirements to realize the 3D output.
The goal of this thesis is to examine the diverse challenges involved in providing consistent and scalable visualization experiences to heterogeneous computing platforms and display setups. We developed a comprehensive set of rendering architectures in the major domains of scientific and medical visualization, web-based 3D applications, and movie virtual production. To provide the required service quality, performance, and scalability for different client devices and displays, our architectures focus on the efficient utilization and combination of the available client, server, and network resources. We present innovative solutions that incorporate methods for hybrid and distributed rendering as well as means to stream rendering results. We establish the browser as a promising platform for accessible and portable visualization services. We collaborated with experts from the medical field and the movie industry to evaluate the usability of our technology in real-world scenarios.

Pascal MONTAG
Integrated Timing Verification for Distributed Embedded Real-Time Systems
(Advisor: Prof. Reinhard Wilhelm)
Tuesday, 08.05.2018, 14:00 h, in building E1 1, room 407

Abstract:

The verification of timing properties with commercial tools based on static analysis has been available for safety-critical automotive systems for more than a decade. Although these tools offer many other advantages in addition to safety guarantees, they are rarely applied in the automotive industry. This thesis analyses different possible reasons for this lack of adoption and suggests solutions where possible. Among the results of this thesis are technical findings, like the further increase of automation (e.g. for loop-bound detection) or the automatic reduction of WCET overestimation for highly variant systems. With regard to the development process, technical as well as non-technical solutions are presented that target the actual users of timing verification, who range from software developers to system architects. The central result of this thesis is the integration of existing and newly developed timing verification methods and tools into a holistic development process approach called Timing Oriented Application Development (TOAD).

April

Wenbin LI
From Perception over Anticipation to Manipulation
(Advisor: Dr. Mario Fritz)
Wednesday, 25.04.2018, 16:00 h, in building E1 4, room 019

Abstract:

From self-driving cars to surgical robots, robotic systems have enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution of related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic systems. In this talk, we summarize our efforts to explore different ways of such integration to extend the capabilities of a robotic system to take on more challenging real-world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular, we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. We first explore how to guide a robot to place a single block into the scene without collapsing the existing structure, and later introduce the target stacking task, where the agent stacks blocks to reproduce a tower shown in an image. We validate our model in both synthetic and real-world settings.

Kerstin NEININGER
Towards the understanding of transcriptional and translational regulatory complexity
(Advisor: Prof. Dr. Volkhard Helms)
Wednesday, 11.04.2018, 10:00 h, in building E2 1, room 001

Abstract:

Since every cell contains the same genome, the observed phenotypic diversity can only arise from highly regulated mechanisms beyond the encoded DNA sequence. We investigated several mechanisms of protein biosynthesis and analyzed DNA methylation patterns, alternative translation sites, and genomic mutations. As chromatin states are determined by epigenetic modifications and nucleosome occupancy, we conducted a structural superimposition approach between DNA methyltransferase 1 (DNMT1) and the nucleosome, which suggests that DNA methylation is dependent on the accessibility of DNMT1 to nucleosome-bound DNA. Considering translation, alternative non-AUG translation initiation was observed. We developed reliable prediction models to detect these alternative start sites in a given mRNA sequence. Our tool PreTIS provides initiation confidences for all frame-independent non-cognate and AUG starts. Beyond these innate factors, specific sequence variations can additionally affect a phenotype. We conducted a genome-wide analysis with millions of mutations and found an accumulation of SNPs next to transcription starts that could relate to a gene-specific regulatory signal. We also report similar conservation of canonical and alternative translation sites, highlighting the relevance of alternative mechanisms. Finally, our tool MutaNET automates variation analysis by scoring the impact of individual mutations on cell function while also integrating a gene regulatory network.

March

Arunav MISHRA
Leveraging Semantic Annotations for Event-focused Search & Summarization
(Advisor: Prof. Dr. Klaus Berberich, now HTW Saar)
Monday, 12.03.2018, 10:00 h, in building E1 4, room 024

Abstract:

Today, in this Big Data era, overwhelming amounts of textual information across different sources, with a high degree of redundancy, have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
• We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
• We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of the information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event.
• To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.

Patrick ERNST
Biomedical Knowledge Base Construction from Text and its Application in Knowledge-based Systems
(Advisor: Prof. Dr. Gerhard Weikum)
Tuesday, 06.03.2018, 10:00 h, in building E1 4, room 024

Abstract:

While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions:
– To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas.
– To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts, and by reasoning over the composability of partial facts.
– To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration.
Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.

February

Roland LEIßA
Language Support for Programming High-Performance Code
(Advisor: Prof. Dr. Sebastian Hack)
Thursday, 22.02.2018, 14:00 h, in building E1 7, room 001

Abstract:

Nowadays, the computing landscape is becoming increasingly heterogeneous and this trend currently shows no signs of turning around. In particular, hardware becomes more and more specialized and exhibits different forms of parallelism. For performance-critical codes it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD).
In this talk, we first introduce Sierra: an extension for C++ that facilitates portable and effective SIMD programming. Second, we present AnyDSL. This framework allows embedding a so-called domain-specific language (DSL) into a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. In order to implement a DSL, one usually has to write or modify a compiler. With AnyDSL, though, the DSL constructs are implemented directly in the host language, while a partial evaluator removes any abstractions that are required in the implementation of the DSL.

Mohammad MEHDI MONIRI
Gaze and Peripheral Vision Analysis for Human-Environment Interaction: Applications in Automotive and Mixed-Reality Scenarios
(Advisor: Prof. Dr. Wolfgang Wahlster)
Wednesday, 14.02.2018, 16:00 h, in building D3 2, Reuse seminar room -2.17

Abstract:

This thesis studies eye-based user interfaces which integrate information about the user’s perceptual focus-of-attention into multimodal systems to enrich the interaction with the surrounding environment. We examine two new modalities: gaze input and output in the peripheral field of view. All modalities are considered across the whole spectrum of the mixed-reality continuum. We show the added value of these new forms of multimodal interaction in two important application domains: Automotive User Interfaces and Human-Robot Collaboration. We present experiments that analyze gaze under various conditions and help to design a 3D model for peripheral vision. Furthermore, this work presents several new algorithms for eye-based interaction, such as deictic reference in mobile scenarios, non-intrusive user identification, and exploiting the peripheral field of view for advanced multimodal presentations. These algorithms have been integrated into a number of software tools for eye-based interaction, which are used to implement 15 use cases for intelligent environment applications. These use cases cover a wide spectrum of applications, from spatial interaction with a rapidly changing environment from within a moving vehicle to mixed-reality interaction between teams of humans and robots.

Jonas OBERHAUSER
Justifying The Strong Memory Semantics of Concurrent High-Level Programming Languages for System Programming
(Advisor: Prof. Dr. Wolfgang Paul)
Wednesday, 07.02.2018, 15:00 h, in E1 7, room 001

Abstract:

We justify the strong memory semantics of high-level concurrent languages, such as C11 or Java, when compiled to x86-like machines. Languages such as C11 guarantee that programs that obey a specific software discipline behave as if they were executed on a simple coherent shared memory multi-processor. Real x86 machines, on the other hand, do not provide a coherent shared memory, and compilers add slow synchronizing instructions to provide the illusion of coherency. We show that one software discipline that closely matches software disciplines of languages such as C11 and Java – in a nutshell, that racing memory accesses are annotated as such by the programmer – and one particular way to add relatively few synchronization instructions – in a nutshell, between an annotated store and an annotated load in the same thread – suffice to create this illusion.
The software discipline has to be obeyed by the program in the semantics of the high-level language, therefore allowing the verification effort to take place completely in the strong memory semantics of the high-level language. We treat a machine model with operating system support, and accordingly our theorems can be used to verify interruptible multi-core operating systems, including those where users can run unverified programs that do not obey the software discipline.

David HAFNER
Variational Image Fusion
(Advisor: Prof. Dr. Joachim Weickert)
Monday, 05.02.2018, 10:15 h, in E1 1, room 407

Abstract:

The main goal of this work is the fusion of multiple images to a single composite that offers more information than the individual input images. We approach those fusion tasks within a variational framework.
First, we present iterative schemes that are well-suited for such variational problems and related tasks. They lead to efficient algorithms that are simple to implement and well-parallelisable. Next, we design a general fusion technique that aims for an image with optimal local contrast. This is the key for a versatile method that performs well in many application areas such as multispectral imaging, decolourisation, and exposure fusion. To handle motion within an exposure set, we present the following two-step approach: First, we introduce the complete rank transform to design an optic flow approach that is robust against severe illumination changes. Second, we eliminate remaining misalignments by means of brightness transfer functions that relate the brightness values between frames. Additional knowledge about the exposure set enables us to propose the first fully coupled method that jointly computes an aligned high dynamic range image and dense displacement fields.
Finally, we present a technique that infers depth information from differently focused images. In this context, we additionally introduce a novel second order regulariser that adapts to the image structure in an anisotropic way.
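As a point of reference for the fusion goal (a naive, non-variational sketch, not the thesis’s method), differently exposed images can be fused by weighting each pixel with a simple local contrast measure:

    import numpy as np

    def contrast_weighted_fusion(images, eps=1e-6):
        # Fuse grayscale exposures by weighting each pixel with its local
        # contrast (gradient magnitude) and averaging across the stack.
        stack = np.stack(images).astype(float)          # shape (k, h, w)
        gy, gx = np.gradient(stack, axis=(1, 2))
        weights = np.sqrt(gx**2 + gy**2) + eps          # favor high contrast
        weights /= weights.sum(axis=0, keepdims=True)   # normalize per pixel
        return (weights * stack).sum(axis=0)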

January

Linar MIKEEV
Numerical Analysis of Stochastic Biochemical Reaction Networks
(Advisor: Prof. Dr. Verena Wolf)
Wednesday, 31.01.2018, 14:00 h, in E1 7, room 001

Abstract:

Numerical solution of the chemical master equation for stochastic reaction networks typically suffers from the state space explosion problem due to the curse of dimensionality, and from stiffness due to multiple time scales. The dimension of the state space equals the number of molecular species involved in the reaction network, and the size of the system of differential equations equals the number of states in the corresponding continuous-time Markov chain, which is usually enormous and often even infinite. Thus, efficient numerical solution approaches must be able to handle huge, possibly infinite, and stiff systems of differential equations.
In this thesis, we present efficient techniques for the numerical analysis of biochemical reaction networks. We present an approximate numerical integration approach that combines a dynamical state space truncation procedure with efficient numerical integration schemes for systems of ordinary differential equations, including adaptive step size selection based on local error estimates. We combine our dynamical state space truncation with the method of conditional moments, and present the implementation details and numerical results. We also incorporate ideas from importance sampling simulations into a non-simulative numerical method that approximates transient rare event probabilities based on a dynamical truncation of the state space.
Finally, we present a maximum likelihood method for the estimation of model parameters given noisy time series measurements of molecular counts. All approaches presented in this thesis are implemented as part of the tool STAR, which allows one to model and simulate biochemical reaction networks. Their efficiency and accuracy are demonstrated by numerical examples.
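To make the setting concrete, here is a minimal truncated master equation for a birth-death process, integrated with a stiffness-aware ODE solver (a generic textbook example with a fixed truncation, unlike the dynamical truncation developed in the thesis; the rates are made up):

    import numpy as np
    from scipy.integrate import solve_ivp

    # Chemical master equation of a birth-death process (birth rate k,
    # degradation rate g per molecule), truncated at N molecules.
    k, g, N = 10.0, 1.0, 60

    def cme_rhs(t, p):
        dp = np.zeros_like(p)
        for n in range(N + 1):
            inflow = ((k * p[n - 1] if n > 0 else 0.0)
                      + (g * (n + 1) * p[n + 1] if n < N else 0.0))
            outflow = (k if n < N else 0.0) + g * n
            dp[n] = inflow - outflow * p[n]
        return dp

    p0 = np.zeros(N + 1)
    p0[0] = 1.0                           # start with zero molecules
    sol = solve_ivp(cme_rhs, (0.0, 5.0), p0, method="BDF")  # stiff solver
    print(sol.y[:, -1] @ np.arange(N + 1))  # mean count approaches k/g = 10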

Ezgi CICEK
Relational Cost Analysis
(Advisor: Dr. Deepak Garg)
Monday, 22.01.2018, 16:00 h, in E1 5, room 029

Abstract:

Programming languages research has made great progress towards statically estimating the execution cost of a program. However, when one is interested in how the execution costs of two programs compare to each other (i.e., relational cost analysis), the use of unary techniques does not work well in many cases. In order to support a relational cost analysis, we must ultimately support reasoning about not only the executions of a single program, but also the executions of two programs, taking into account their similarities. This dissertation makes several contributions to the understanding and development of such a relational cost analysis. It shows how:
• Refinement types and effect systems can express functional and relational quantitative properties of pairs of programs, including the difference in execution costs.
• Relational cost analysis can be adapted to reason about dynamic stability, a measure of the update times of incremental programs as their inputs change.
• A sound and complete bidirectional type system can be developed (and implemented) for relational cost analysis.
