PhD Thesis Defenses 2021
PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work being done by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.
Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.
(Advisor: Prof. Christian Rossow)
Using Honeypots to Trace Back Amplification DDoS Attacks
Tuesday, 21.12.2021, 09:00 h
This thesis presents a line of work that enables practical DDoS attack traceback supported by honeypot reflectors. To this end, we investigate the tradeoffs between applicability, required a priori knowledge, and traceback granularity in three settings. First, we show how spoofed attack packets and non-spoofed scan packets can be linked using honeypot-induced fingerprints, which allows attributing attacks launched from the same infrastructures as scans. Second, we present a classifier-based approach to trace back attacks launched from booter services after collecting ground-truth data through self-attacks. Third, we propose to use BGP poisoning to locate the attacking network without prior knowledge and even when attack and scan infrastructures are disjoint. Finally, as all of our approaches rely on honeypot reflectors, we introduce an automated end-to-end pipeline to systematically find amplification vulnerabilities and synthesize corresponding honeypots.
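For intuition (not part of the thesis itself), the severity of an amplification vulnerability is commonly quantified by the bandwidth amplification factor (BAF): the ratio of reflected response traffic to the spoofed request traffic an attacker must send. A minimal sketch, with hypothetical packet sizes:

```python
# Illustrative sketch: the bandwidth amplification factor (BAF) relates
# the payload bytes a reflector sends in response to the payload bytes
# an attacker must spoof in the request.

def bandwidth_amplification_factor(request_bytes, response_bytes):
    """BAF = total response payload bytes / total request payload bytes."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    return response_bytes / request_bytes

# Hypothetical example: a 64-byte request that triggers 3,200 bytes of
# response traffic yields a 50x amplification.
print(bandwidth_amplification_factor(64, 3200))  # 50.0
```

Honeypot reflectors observe exactly these request/response pairs, which is what makes them a useful vantage point for the traceback techniques described above.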
(Advisor: Prof. Thorsten Herfet)
Provision of Multi-Camera Footage for Professional Production and Consumer Environments
Thursday, 16.12.2021, 13:00 h
Multi-camera footage contains much more data than conventional video. While the additional data enables a number of new effects that previously required a large amount of CGI magic and manual labor to achieve, it can easily overpower consumer hardware and networks. We explore the necessary steps to create an interactive multiview streaming system, from the cameras via the compression and streaming of the material, to the view interpolation that creates immersive perspective shifts for viewers. By using only freely available consumer hardware, and by ensuring that all steps can run in real time in combination with one another, the benefits of multi-camera video are made available to a wider public.
With the construction of a modular camera array for lightfield recording, we highlight the most important properties of such an array for good post-processing of the recorded data. This includes a flexible yet sturdy frame, the management of computational and storage resources, as well as the required steps to make the raw material ready for further processing. With the Unfolding scene, we demonstrate the possibilities of lightfields when a good camera array is combined with the talent of professional visual effects artists in the creation of future cinematic movies. Furthermore, we explore the benefits of 5D-lightfield video for scenes with fast motion, using precisely controlled time delays between the shutters of different cameras in the capturing array.
(Advisor: Prof. Bernt Schiele)
From Pixels to People: Recovering Location, Shape and Pose of Humans in Images
Wednesday, 15.12.2021, 12:00 h
Campus E1 4, room 24
Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. This includes bounding-box level pedestrian detection, different pixel-level recognition tasks, and 3D human shape and pose estimation. We conclude with a discussion of benchmarking practices in computer vision.
(Advisor: Prof. Bernd Finkbeiner)
Logical and Deep Learning Methods for Temporal Reasoning
Monday, 13.12.2021, 17:15 h
In this thesis, we study logical and deep learning methods for the temporal reasoning of reactive systems.
In Part I, we determine decidability borders for the satisfiability and realizability problem of temporal hyperproperties. Temporal hyperproperties relate multiple computation traces to each other and are expressed in a temporal hyperlogic. In particular, we identify decidable fragments of the highly expressive hyperlogics HyperQPTL and HyperCTL*. As an application, we elaborate on an enforcement mechanism for temporal hyperproperties. We study explicit enforcement algorithms for specifications given as formulas in universally quantified HyperLTL.
In Part II, we train a (deep) neural network on the trace generation and realizability problem of linear-time temporal logic (LTL). We consider a method to generate large amounts of additional training data from practical specification patterns. The training data is generated with classical solvers, which provide one of many possible solutions to each formula. We demonstrate that it is sufficient to train on those particular solutions such that the neural network generalizes to the semantics of the logic. The neural network can predict solutions even for formulas from benchmarks from the literature on which the classical solver timed out.
Additionally, we show that it solves a significant portion of problems from the annual synthesis competition (SYNTCOMP) and even out-of-distribution examples from a recent case study.
(Advisor: Prof. Bernd Finkbeiner)
Synthesis of Asynchronous Distributed Systems from Global Specifications
Friday, 10.12.2021, 15:15 h
The synthesis problem asks whether there exists an implementation for a given formal specification and derives such an implementation if it exists. This approach enables engineers to think on a more abstract level about what a system should achieve instead of how it should accomplish its goal. The synthesis problem is often represented by a game between system players and environment players. Petri games define the synthesis problem for asynchronous distributed systems with causal memory. So far, decidability results for Petri games are mainly obtained for local winning conditions, which is limiting as global properties like mutual exclusion cannot be expressed.
In this thesis, we make two contributions. First, we present decidability and undecidability results for Petri games with global winning conditions. The global safety winning condition of bad markings defines markings that the players have to avoid. We prove that the existence of a winning strategy for the system players in Petri games with a bounded number of system players, at most one environment player, and bad markings is decidable. The global liveness winning condition of good markings defines markings that the players have to reach. We prove that the existence of a winning strategy for the system players in Petri games with at least two system players, at least three environment players, and good markings is undecidable.
Second, we present semi-decision procedures to find winning strategies for the system players in Petri games with global winning conditions and without restrictions on the distribution of players. The distributed nature of Petri games is employed by proposing encodings with true concurrency. We implement the semi-decision procedures in a corresponding tool.
(Advisors: Prof. Gerhard Weikum and Dr. Rishiraj Saha Roy)
Enhancing Explainability and Scrutability of Recommender Systems
Thursday, 09.12.2021, 11:00 h
Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm’s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Moreover, in the event of receiving undesirable content, explanations could contain valuable information as to how the system’s behavior can be modified accordingly. In this thesis, we will present our contributions towards explainability and scrutability of recommender systems:
– We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms.
– We propose a method, PRINCE, to facilitate provider-side explainability through generating action-based, counterfactual and concise explanations.
– We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations.
(Advisor: Prof. Michael Backes)
Health Privacy: Methods for privacy-preserving data sharing of methylation, micro-biome and eye tracking data
Wednesday, 08.12.2021, 15:00 h
This thesis studies the privacy risks of biomedical data and develops mechanisms for privacy-preserving data sharing. The contribution of this work is two-fold: First, we demonstrate privacy risks for a variety of biomedical data types such as DNA methylation data, microbiome data and eye tracking data. Although these data are less stable than well-studied genome data and more prone to environmental changes, well-known privacy attacks can be adapted to them and threaten the privacy of data donors. Nevertheless, data sharing is crucial to advancing biomedical research, given that collecting data from a sufficiently large population is complex and costly. Second, we therefore equip researchers with tools that enable privacy-preserving sharing of such biomedical data. These tools are mostly based on differential privacy, machine learning techniques and adversarial examples, and are carefully tuned to the concrete use case to maintain data utility while preserving privacy.
(Advisors: Profs. Kurt Mehlhorn and Karl Bringmann)
Counting Patterns in Strings and Graphs
Thursday, 02.12.2021, 15:00 h
We study problems related to finding and counting patterns in strings and graphs.
In the string-regime, we are interested in counting how many substrings of a text 𝑇 are at Hamming (or edit) distance at most 𝑘 to a pattern 𝑃. Among others, we are interested in the fully-compressed setting, where both 𝑇 and 𝑃 are given in a compressed representation. For both distance measures, we give the first algorithms that run in (almost) linear time in the size of the compressed representations. We obtain these algorithms through new and tight structural insights into the solution structure of the problems.
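For intuition, a naive quadratic-time baseline for the Hamming-distance variant of this counting problem can be sketched as follows (the thesis's contribution is to solve it in almost linear time on compressed representations, which this sketch does not attempt):

```python
# Naive baseline: count the substrings of text T within Hamming
# distance at most k of pattern P. Runs in O(|T| * |P|) time.

def count_k_mismatch_occurrences(text, pattern, k):
    m = len(pattern)
    count = 0
    for i in range(len(text) - m + 1):
        # Number of mismatching positions between the window and P.
        mismatches = sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        if mismatches <= k:
            count += 1
    return count

print(count_k_mismatch_occurrences("abbaabab", "aba", 1))  # 3
```

With k = 0 this degenerates to exact pattern matching; the fully-compressed setting replaces both strings with grammar-compressed representations, where even reading the input explicitly would be too slow.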
In the graph-regime, we study problems related to counting homomorphisms between graphs. In particular, we study the parameterized complexity of the problem #IndSub(𝛷), where we are to count all 𝑘-vertex induced subgraphs of a graph that satisfy the property 𝛷. Based on a theory of Lovász, Curticapean et al., we express #IndSub(𝛷) as a linear combination of graph homomorphism numbers to obtain #W-hardness and almost tight conditional lower bounds for properties 𝛷 that are monotone or that depend only on the number of edges of a graph. Thereby, we prove a conjecture by Jerrum and Meeks.
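A brute-force reference implementation of #IndSub(𝛷) is straightforward but exponential in k; the hardness results above explain why substantially faster algorithms are unlikely for the properties studied. A sketch, here with a hypothetical property 𝛷 that depends only on the number of edges:

```python
from itertools import combinations

# Brute-force #IndSub(phi): count the k-vertex induced subgraphs of a
# graph that satisfy a property phi. Exponential in k.

def count_induced_subgraphs(vertices, edges, k, phi):
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for subset in combinations(vertices, k):
        # Edges of the subgraph induced by this vertex subset.
        induced = [e for e in combinations(subset, 2)
                   if frozenset(e) in edge_set]
        if phi(subset, induced):
            count += 1
    return count

# Example: in the 4-cycle, every 3-vertex induced subgraph is a path
# with exactly 2 edges, so this edge-count property is satisfied 4 times.
c4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_induced_subgraphs([0, 1, 2, 3], c4_edges, 3,
                              lambda vs, es: len(es) == 2))  # 4
```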
In addition, we investigate the parameterized complexity of the problem #Hom(ℋ → 𝒢) for graph classes ℋ and 𝒢. In particular, we show that for any problem 𝑃 in the class #W, there are classes ℋ_𝑃 and 𝒢_𝑃 such that 𝑃 is equivalent to #Hom(ℋ_𝑃 → 𝒢_𝑃).
(Advisor: Prof. Matthias Hein, now Tübingen)
Beyond the Arithmetic Mean: Extensions of Spectral Clustering and Semi-supervised Learning for Signed and Multilayer Graphs via Matrix Power Means
Tuesday, 30.11.2021, 12:00 h
In this thesis we present extensions of spectral clustering and semi-supervised learning to signed and multilayer graphs. These extensions are based on a one-parameter family of matrix functions called Matrix Power Means. In the scalar case, this family has the arithmetic, geometric and harmonic means as particular cases.
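As a scalar sketch of the family in question (the thesis applies its matrix analogue to graphs), the p-th power mean recovers the arithmetic, geometric, and harmonic means for p = 1, p → 0, and p = -1:

```python
import math

# Scalar p-th power mean: ((x_1^p + ... + x_n^p) / n)^(1/p).
# p = 1 gives the arithmetic mean, p -> 0 the geometric mean,
# and p = -1 the harmonic mean.

def power_mean(values, p):
    if p == 0:  # limit case: geometric mean
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

xs = [1.0, 2.0, 4.0]
print(power_mean(xs, 1))   # arithmetic mean, ~2.333
print(power_mean(xs, 0))   # geometric mean, ~2.0
print(power_mean(xs, -1))  # harmonic mean, 12/7 ~ 1.714
```

Small p emphasizes small inputs, which is one intuition for why tuning p helps when some graph layers are informative and others are noise.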
We study the effectiveness of this family of matrix functions through suitable extensions of the stochastic block model to signed and multilayer graphs. We provide provable properties in expectation and further identify regimes where the state of the art fails whereas our approach provably performs well. Some of the settings that we analyze are as follows: first, the case where each layer presents a reliable approximation to the overall clustering; second, the case where one single layer has information about the clusters whereas the remaining layers are potentially just noise; third, the case where each layer has only partial information but all layers together reveal the global underlying clustering structure.
We present extensive numerical verifications of all our results and provide matrix-free numerical schemes. With these numerical schemes we are able to show that our proposed approach based on matrix power means is scalable to large sparse signed and multilayer graphs.
Finally, we evaluate our methods in real world datasets. For instance, we show that our approach consistently identifies clustering structure in a real signed network where previous approaches failed. This further verifies that our methods are competitive to the state of the art.
(Advisor: Prof. Jörg Hoffmann)
Star-Topology Decoupled State-Space Search in AI Planning and Model Checking
Monday, 22.11.2021, 16:00 h
Building E1 1, Room 4.07
State-space search is a widely employed concept in many areas of computer science.
The well-known state explosion problem, however, imposes a severe limitation to the effective implementation of search in state spaces that are exponential in the size of a compact system description. Decoupled state-space search is a novel approach to tackle the state explosion. It decomposes the system such that the dependencies between components take the form of a star topology with a center and several leaf components. Decoupled search exploits that the leaves in that topology are conditionally independent. We introduce decoupled search in the context of AI planning and formal verification. Building on common formalisms, we develop the concept of the decoupled state space and prove its correctness.
For several existing techniques, most prominently partial-order reduction, symmetry breaking, Petri-net unfolding, and symbolic state representations, we prove that decoupled search can be exponentially more efficient, confirming that it is indeed a novel concept that exploits model properties in a unique way. Empirically, decoupled search compares favorably to state-of-the-art AI planners and outperforms well-established model checking tools.
(Advisor: Prof. Gert Smolka)
Computability in Constructive Type Theory
– new date – Thursday, 25.11.21, 15:15h
Building E1 3, HS 002
We give a formalised and machine-checked account of computability theory in the Calculus of Inductive Constructions (CIC), the constructive type theory underlying the Coq proof assistant.
We first develop synthetic computability theory, pioneered by Richman, Bridges, and Bauer, where one treats all functions as computable, eliminating the need for a model of computation. We assume a novel parametric axiom for synthetic computability and give proofs of results like Rice’s theorem, the Myhill isomorphism theorem, and the existence of Post’s simple and hypersimple predicates without relying on additional axioms such as Markov’s principle or choice axioms.
As a second step, we introduce models of computation. We give a concise overview of definitions of various standard models and contribute machine-checked simulation proofs, which constitute a non-trivial engineering effort.
We identify a notion of synthetic undecidability relative to a fixed halting problem, allowing axiom-free machine-checked proofs of undecidability. We contribute such undecidability proofs for the historical foundational problems of computability theory; these proofs require the identification of invariants left implicit in the literature and now form the basis of the Coq Library of Undecidability Proofs.
We then identify the weak call-by-value λ-calculus L as a sweet spot for programming in a model of computation. We introduce a certifying extraction framework and analyse an axiom stating that every function of type N → N is L-computable.
(Advisors: Profs. Dietrich Klakow and Gerhard Weikum)
Deep Latent-Variable Models for Neural Text Generation
Friday, 05.11.2021, 10:00 h
Text generation aims to produce human-like natural language output for downstream tasks. It covers a wide range of applications like machine translation, document summarization, dialogue generation and so on. Deep neural network-based end-to-end architectures are known to be data-hungry, and the text generated from them usually suffers from low diversity, interpretability and controllability. As a result, it is difficult to trust their output in real-life applications. Deep latent-variable models, by specifying the probabilistic distribution over an intermediate latent process, provide a potential way of addressing these problems while maintaining the expressive power of deep neural networks. This presentation will explain how deep latent-variable models can improve over the standard encoder-decoder model for text generation. We will start with an introduction to encoder-decoder and deep latent-variable models, then go over popular optimization strategies, and finally elaborate on how latent-variable models can help improve the diversity, interpretability and data efficiency in different applications of text generation tasks.
(Advisor: Prof. Christian Theobalt)
Real-Time Human Performance Capture and Synthesis
Friday, 29.10.2021, 15:00 h
Most of the images one finds in the media, such as on the Internet or in textbooks and magazines, contain humans as the main point of attention. Thus, there is an inherent necessity for industry, society, and private persons to be able to thoroughly analyze and synthesize the human-related content in these images. One aspect of this analysis and subject of this thesis is to infer the 3D pose and surface deformation, using only visual information, which is also known as human performance capture.
This thesis proposes two monocular human performance capture methods, which for the first time allow the real-time capture of the dense deforming geometry as well as an unseen 3D accuracy for pose and surface deformations. At the technical core, this work introduces novel GPU-based and data-parallel optimization strategies in conjunction with other algorithmic design choices that are all geared towards real-time performance at high accuracy. Moreover, this thesis presents a new weakly supervised multi-view training strategy combined with a fully differentiable character representation that shows superior 3D accuracy.
However, there is more to human-related Computer Vision than only the analysis of people in images. It is equally important to synthesize new images of humans in unseen poses and also from camera viewpoints that have not been observed in the real world. To this end, this thesis presents a method and ongoing work on character synthesis, which allow the synthesis of controllable photoreal characters that achieve motion- and view-dependent appearance effects as well as 3D consistency and which run in real time. This is technically achieved by a novel coarse-to-fine geometric character representation for efficient synthesis, which can be solely supervised on multi-view imagery.
(Advisors: Profs. Bernt Schiele and Mario Fritz)
Long-term Future Prediction under Uncertainty and Multi-modality
Friday, 24.09.2021, 16:00 h
Humans have an innate ability to excel at activities that involve prediction of complex object dynamics such as predicting the possible trajectory of a billiard ball after it has been hit by the player or the prediction of motion of pedestrians while on the road. A key feature that enables humans to perform such tasks is anticipation. To advance the field of autonomous systems, particularly, self-driving agents, in this thesis, we focus on the task of future prediction in diverse real world settings, ranging from deterministic scenarios such as prediction of paths of balls on a billiard table to predicting the future of non-deterministic street scenes. Specifically, we identify certain core challenges for long-term future prediction: long-term prediction, uncertainty, multi-modality, and exact inference. To address these challenges, this thesis makes the following core contributions. Firstly, for accurate long-term predictions, we develop approaches that effectively utilize available observed information in the form of image boundaries in videos or interactions in street scenes. Secondly, as uncertainty increases into the future in case of non-deterministic scenarios, we leverage Bayesian inference frameworks to capture calibrated distributions of likely future events. Finally, to further improve performance in highly multimodal non-deterministic scenarios such as street scenes, we develop deep generative models based on conditional variational autoencoders as well as normalizing flow based exact inference methods. Furthermore, we introduce a novel dataset with dense pedestrian-vehicle interactions to further aid the development of anticipation methods for autonomous driving applications in urban environments.
High Mobility in OFDM based Wireless Communication Systems
(Advisor: Prof. Thorsten Herfet)
Wednesday, 15.09.2021, 13:30 h
Orthogonal Frequency Division Multiplexing (OFDM) has been adopted as the transmission scheme in most of the wireless systems we use on a daily basis. It brings with it several inherent advantages that make it an ideal waveform candidate in the physical layer. However, OFDM based wireless systems are severely affected in High Mobility scenarios. In this thesis, we investigate the effects of mobility on OFDM based wireless systems and develop novel techniques to estimate the channel and compensate for its effects at the receiver. Compressed Sensing (CS) based channel estimation techniques like the Rake Matching Pursuit (RMP) and the Gradient Rake Matching Pursuit (GRMP) are developed to estimate the channel in a precise, robust and computationally efficient manner. In addition, a Cognitive Framework that can detect mobility in the channel and configure an optimal estimation scheme is developed and tested. The Cognitive Framework ensures a computationally optimal channel estimation scheme in all channel conditions. We also demonstrate that the proposed schemes can easily be adapted to other wireless standards. Accordingly, evaluation is done for three current broadcast, broadband and cellular standards. The results show the clear benefit of the proposed schemes in enabling high mobility in OFDM based wireless communication systems.
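A minimal, idealized OFDM round trip (illustrative only; it omits the channel, noise, and any of the thesis's estimation schemes) shows the DFT-based structure that makes OFDM attractive:

```python
import cmath

# Idealized OFDM sketch: subcarrier symbols are modulated with an
# inverse DFT, a cyclic prefix absorbs multipath delay, and the
# receiver strips the prefix and demodulates with a DFT.

def idft(symbols):
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n)
                for k, s in enumerate(symbols)) / n
            for t in range(n)]

def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]

symbols = [1, -1, 1j, -1j]     # QPSK symbols on 4 subcarriers
time_signal = idft(symbols)    # one OFDM symbol in the time domain
cp = time_signal[-1:]          # cyclic prefix (1 sample here)
tx = cp + time_signal          # transmitted samples
rx = tx[len(cp):]              # receiver strips the prefix
recovered = dft(rx)            # ideal channel: symbols are recovered
```

Mobility breaks the orthogonality this sketch relies on (Doppler spreads energy across subcarriers), which is why the channel must be estimated and compensated at the receiver.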
Design and Implementation of WCET Analyses – Including a Case Study on Multi-Core Processors with Shared Buses
(Advisor: Prof. Sebastian Hack)
Friday, 10.09.2021, 14:00 h
For safety-critical real-time embedded systems, the worst-case execution time (WCET) analysis – determining an upper bound on the possible execution times of a program – is an important part of the system verification. Multi-core processors share resources (e.g. buses and caches) between multiple processor cores and, thus, complicate the WCET analysis as the execution times of a program executed on one processor core significantly depend on the programs executed in parallel on the concurrent cores. We refer to this phenomenon as shared-resource interference.
This thesis proposes a novel way of modeling shared-resource interference during WCET analysis. It enables an efficient analysis – as it only considers one processor core at a time – and it is sound for hardware platforms exhibiting timing anomalies. Moreover, this thesis demonstrates how to realize a timing-compositional verification on top of the proposed modeling scheme. In this way, this thesis closes the gap between modern hardware platforms, which exhibit timing anomalies, and existing schedulability analyses, which rely on timing compositionality.
In addition, this thesis proposes a novel method for calculating an upper bound on the amount of interference that a given processor core can generate in any time interval of at most a given length. Our experiments demonstrate that the novel method is more precise than existing methods.
Adversarial Content Manipulation for Analyzing and Improving Model Robustness
(Advisor: Prof. Bernt Schiele)
Tuesday, 27.07.2021, 14:30 h
Computer vision systems deployed in the real world will encounter inputs far from their training distribution. For example, a self-driving car might see a blue stop sign it has not seen before. To ensure safe operation when faced with such out-of-distribution corner cases, it is vital to quantify a model’s robustness to such data before deployment. In this dissertation, we build generative models to create synthetic data variations at scale and leverage them to test the robustness of target computer vision systems to these variations. First, we build generative models which can controllably manipulate image and text data. This includes models to a) change visual context by removing objects, b) edit the appearance of an object in images, and c) change the writing style of text. Next, using these generative models we create model-agnostic and model-targeted input variations, to study the robustness of computer vision systems. While in the model-agnostic case, the input image is manipulated based on design priors, in the model-targeted case the manipulation is directly guided by the target model performance. With these methods, we measure and improve the robustness of various computer vision systems — specifically image classification, segmentation, object detection and visual question answering systems — to semantic input variations. Additionally, in the text domain, we deploy these generative models to improve diversity of image captioning systems and perform writing style manipulation to obfuscate private attributes of the user.
Self-Supervised Reconstruction and Synthesis of Faces
(Advisor: Prof. Christian Theobalt)
Monday, 26.07.2021, 17:00 h
Photorealistic and semantically controllable digital models of human faces are important for a wide range of applications such as movies, virtual reality, and casual photography. Traditional approaches require expensive setups which capture the person from multiple cameras under different illumination conditions. Recent approaches have also explored digitizing faces under less constrained settings, even from a single image of the person. These approaches rely on priors, commonly known as 3D morphable models (3DMMs), which are learned from datasets of 3D scans. This thesis pushes the state of the art in high-quality 3D reconstruction of faces from monocular images. A model-based face autoencoder architecture is introduced which integrates convolutional neural networks, 3DMMs, and differentiable rendering for self-supervised training on large image datasets. This architecture is extended to enable the refinement of a pretrained 3DMM just from a dataset of monocular images, allowing for higher-quality reconstructions. In addition, this thesis demonstrates the learning of the identity components of a 3DMM directly from videos without using any 3D data. Since videos are more readily available, this model can generalize better compared to the models learned from limited 3D scans.
This thesis also presents methods for the photorealistic editing of portrait images. In contrast to traditional approaches, the presented methods do not rely on any supervised training. Self-supervised editing is achieved by integrating the semantically meaningful 3DMM-based monocular reconstructions with a pretrained and fixed generative adversarial network. While this thesis presents several ideas which enable self-supervised learning for the reconstruction and synthesis of faces, several open challenges remain. These challenges, as well as an outlook for future work are also discussed.
On Privacy in Home Automation Systems
(Advisor: Prof. Christoph Sorge)
Monday, 12.07.2021, 10:00 h
Home Automation Systems (HASs) are becoming increasingly popular in newly built as well as existing properties. While offering increased living comfort, resource saving features and other commodities, most current commercial systems do not protect sufficiently against passive attacks. In this thesis we investigate privacy aspects of Home Automation Systems. We analyse the threats of eavesdropping and traffic analysis attacks, demonstrating the risks of virtually undetectable privacy violations. By taking aspects of criminal and data protection law into account, we give an interdisciplinary overview of privacy risks and challenges in the context of HASs. We present the first framework to formally model privacy guarantees of Home Automation Systems and apply it to two different dummy traffic generation schemes. In a qualitative and quantitative study of these two algorithms, we show how provable privacy protection can be achieved and how privacy and energy efficiency are interdependent. This allows manufacturers to design and build secure Home Automation Systems which protect the users' privacy and which can be arbitrarily tuned to strike a compromise between privacy protection and energy efficiency.
Duc Cuong NGUYEN
Improving Android App Security and Privacy with Developers
(Advisor: Prof. Michael Backes)
Thursday, 08.07.2021, 14:00 h
Existing research has uncovered many security vulnerabilities in Android applications (apps) caused by inexperienced and unmotivated developers. In particular, the lack of tool support makes it hard for developers to avoid common security and privacy problems in Android apps. As a result, this leads to apps with security vulnerabilities that expose end users to a multitude of attacks.
This thesis presents a line of work that studies and supports Android developers in writing more secure code. We first studied to what extent tool support can help developers create more secure applications. To this end, we developed and evaluated an Android Studio extension that identifies common security problems of Android apps and suggests more secure alternatives to developers. Subsequently, we focused on the issue of outdated third-party libraries in apps, which is also the root cause of a variety of security vulnerabilities. Therefore, we analyzed all popular third-party libraries in the Android ecosystem, and provided developers feedback and guidance in the form of tool support in their development environment to fix such security problems. In the second part of this thesis, we empirically studied and measured the impact of user reviews on app security and privacy evolution. To this end, we built a review classifier to identify security- and privacy-related reviews and performed regression analysis to measure their impact on the evolution of security and privacy in Android apps. Based on our results, we proposed several suggestions to improve the security and privacy of Android apps by leveraging user feedback to create incentives for developers to improve their apps.
Bhaskar RAY CHAUDHURY
Finding Fair and Efficient Allocations
(Advisor: Prof. Kurt Mehlhorn)
Monday, 05.07.2021, 18:00 h
Fair division has developed into a fundamental branch of mathematical economics over the last seven decades (since the seminal work of Hugo Steinhaus in the 1940s). In a classical fair division problem, the goal is to "fairly" allocate a set of items among a set of agents. Several real-life scenarios are paradigmatic of the problems in this domain.
In this defense, we explore some fundamental questions about fair division. In particular, we focus on settings in which the items to be divided are either indivisible goods or divisible bads. Despite their practical significance, both these settings have been relatively less investigated than the divisible goods setting.
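As a small illustration of the indivisible-goods setting (a textbook sketch, not a method from the thesis, which studies stronger fairness notions and divisible bads): the classic round-robin procedure, in which agents take turns picking their most-valued remaining good, always produces an allocation that is envy-free up to one good (EF1) under additive valuations.

```python
# Round-robin allocation for indivisible goods. Illustrative sketch only;
# the valuation numbers below are made up for demonstration.

def round_robin(valuations):
    """valuations[i][g] = agent i's value for good g."""
    n, m = len(valuations), len(valuations[0])
    remaining = set(range(m))
    bundles = [[] for _ in range(n)]
    for turn in range(m):
        agent = turn % n
        # Current agent picks their most-valued remaining good.
        best = max(remaining, key=lambda g: valuations[agent][g])
        bundles[agent].append(best)
        remaining.remove(best)
    return bundles

def is_ef1(valuations, bundles):
    """Check envy-freeness up to one good for additive valuations."""
    for i, vi in enumerate(valuations):
        own = sum(vi[g] for g in bundles[i])
        for j, other in enumerate(bundles):
            if i == j or not other:
                continue
            envy = sum(vi[g] for g in other)
            # Removing j's single good that i values most must
            # eliminate i's envy towards j.
            if own < envy - max(vi[g] for g in other):
                return False
    return True
```

For example, with valuations `[[5, 1, 3], [2, 4, 6]]`, agent 0 first takes good 0, agent 1 takes good 2, and agent 0 takes good 1; the resulting allocation is EF1.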
Information-Theoretic Causal Discovery
(Advisor: Prof. Jilles Vreeken)
Tuesday, 29.06.2021, 14:00 h
It is well-known that correlation does not equal causation, but how can we infer causal relations from data? Causal discovery tries to answer precisely this question by rigorously analyzing under which assumptions it is feasible to infer directed causal networks from passively collected, so-called observational data. Classical approaches assume the data to be faithful to the causal graph, that is, independencies found in the distribution are assumed to be due to separations in the true graph. Under this assumption, so-called constraint-based methods can infer the correct Markov equivalence class of the true graph (i.e. the correct undirected graph and some edge directions), only using conditional independence tests.
In this dissertation, we aim to alleviate some of the weaknesses of constraint-based algorithms. In the first part, we investigate causal mechanisms, which cannot be detected when assuming faithfulness. We then suggest a weaker assumption based on triple interactions, which allows for recovering a broader spectrum of causal mechanisms. Subsequently, we focus on conditional independence testing, which is a crucial tool for causal discovery. In particular, we propose to measure dependencies through conditional mutual information, which we show can be consistently estimated even for the most general setup: discrete-continuous mixture random variables. Last, we focus on distinguishing Markov equivalent graphs (i.e. infer the complete DAG structure), which boils down to inferring the causal direction between two random variables. In this setting, we focus on continuous and mixed-type data and develop our methods based on an information-theoretic postulate, which states that the true causal graph can be compressed best, i.e. has the smallest Kolmogorov complexity.
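For intuition on the role of conditional mutual information (an illustrative plug-in computation for discrete data, not the consistent mixture estimator developed in the dissertation): I(X;Y|Z) can be estimated from empirical counts, and it is zero exactly when X is independent of Y given Z in the empirical distribution.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in bits for discrete samples:
    I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in pxyz.items():
        # Counts cancel the 1/n factors inside the logarithm.
        cmi += (c / n) * log2((pz[z] * c) / (pxz[(x, z)] * pyz[(y, z)]))
    return cmi
```

For instance, if X and Y are identical coin flips and Z is constant, the estimate is 1 bit; if X and Y are both copies of Z, conditioning on Z removes all dependence and the estimate is 0.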
Evolutionary Models for Signal Enhancement and Approximation
(Advisor: Prof. Joachim Weickert)
Monday, 28.06.2021, 15:00 h
This thesis deals with nature-inspired evolution processes for the purpose of signal enhancement and approximation. The focus lies on mathematical models which originate from the description of swarm behaviour. We extend existing approaches and show the potential of swarming processes as a modelling tool in image processing. In our work, we discuss the use cases of grey scale quantisation, contrast enhancement, line detection, and coherence enhancement. Furthermore, we propose a new and purely repulsive model of swarming that turns out to describe a specific type of backward diffusion process. It is remarkable that our model provides extensive stability guarantees which even support the utilisation of standard numerics. In experiments, we demonstrate its applicability to global and local contrast enhancement of digital images. In addition, we study the problem of one-dimensional signal approximation with limited resources using an adaptive sampling approach including tonal optimisation. We suggest a direct energy minimisation strategy and validate its efficacy in experiments. Moreover, we show that our approximation model can outperform a method recently proposed by Dar and Bruckstein.
Understanding Emerging Client-Side Web Vulnerabilities using Dynamic Program Analysis
(Advisor: Dr. Ben Stock)
Monday, 28.06.2021, 14:00 h
In this thesis, we present techniques capable of automatically finding, at scale, vulnerabilities that originate from malicious inputs to postMessage handlers, polluted prototypes, and client-side storage mechanisms.
Our results highlight that the investigated vulnerabilities are prevalent even among the most popular sites, showing the need for automated systems that help developers uncover them in a timely manner. Using the insights gained during our empirical studies, we provide recommendations for developers and browser vendors to tackle the underlying problems in the future. Furthermore, we show that security mechanisms designed to mitigate such and similar issues cannot currently be deployed by first-party applications due to their reliance on third-party functionality. This leaves developers in a no-win situation, in which either functionality can be preserved or security enforced.
Sreyasi NAG CHOWDHURY
Text-Image Synergy for Multimodal Retrieval and Annotation
(Advisor: Prof. Gerhard Weikum)
Monday, 28.06.2021, 10:00 h
Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scopes for future research involving text-image data understanding.
Variety Membership Testing in Algebraic Complexity
(Advisor: Prof. Markus Bläser)
Thursday, 17.06.2021, 17:00 h
We study some of the central problems in algebraic complexity theory through the lens of the variety membership testing problem. In the first part, we investigate whether separations between algebraic complexity classes can be phrased as instances of the variety membership testing problem. We make progress in the monotone commutative and the noncommutative settings. In the second part, we give efficient algorithms for the blackbox polynomial identity testing problem, and the rank computation problem of symbolic matrices, both phrasable as instances of the variety membership testing problem. For the former, we give a randomness-efficient algorithm and for the latter, we give an approximation algorithm. In the final part, we prove lower bounds, showing the NP-hardness of two problems on 3-tensors — the orbit closure containment problem and the tensor slice rank problem, both of which are instances of the variety membership testing problem.
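As background for the identity-testing part (a textbook sketch of the standard randomized approach, not the thesis's randomness-efficient algorithm): by the Schwartz–Zippel lemma, a nonzero polynomial of total degree d evaluates to zero at a uniformly random point of a set S with probability at most d/|S|, which yields a simple blackbox test.

```python
import random

def blackbox_pit(poly, num_vars, degree, trials=20, field_size=10**9 + 7):
    """Randomized blackbox polynomial identity test via Schwartz-Zippel.
    `poly` is a blackbox mapping a point (list of ints) to the polynomial's
    value. If the polynomial is nonzero with total degree <= degree, each
    trial detects this with probability >= 1 - degree/field_size."""
    for _ in range(trials):
        point = [random.randrange(field_size) for _ in range(num_vars)]
        if poly(point) % field_size != 0:
            return False  # witnessed a nonzero value: not identically zero
    return True  # all trials zero: identically zero with high probability

# Hypothetical example blackboxes for demonstration:
# (x+y)^2 - (x^2 + 2xy + y^2) is identically zero; xy - 1 is not.
zero_poly = lambda p: (p[0] + p[1]) ** 2 - (p[0] ** 2 + 2 * p[0] * p[1] + p[1] ** 2)
nonzero_poly = lambda p: p[0] * p[1] - 1
```

The randomness cost here is one field element per variable per trial; derandomizing or reducing this randomness is precisely where the thesis's contribution lies.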
Information Consumption on Social Media: Efficiency, Divisiveness, and Trust
(Advisor: Prof. Krishna Gummadi)
Monday, 14.06.2021, 09:00 h
Over the last decade, the advent of social media has profoundly changed the way people produce and consume information online. On these platforms, users themselves play a role in selecting the sources from which they consume information, overthrowing traditional journalistic gatekeeping. Moreover, advertisers can target users with news stories using users’ personal data. This new model has many advantages: the propagation of news is faster, the number of news sources is large, and the topics covered are diverse. However, in this new model, users are often overloaded with redundant information, and they can get trapped in filter bubbles by consuming divisive and potentially false information. To tackle these concerns, in my thesis, I address the following important questions:
(i) How efficient are users at selecting their information sources?
(ii) How can we break the filter bubbles that users get trapped in?
(iii) How can we design an efficient framework for fact-checking?
Towards Principled Dynamic Analysis on Android
(Advisor: Prof. Michael Backes)
Tuesday, 08.06.2021, 13:00 h
The vast amount of information and services accessible through mobile handsets running the Android operating system has led to the tight integration of such devices into our daily routines. However, their capability to capture and operate upon user data provides an unprecedented insight into our private lives that needs to be properly protected, which demands comprehensive analysis and thorough testing. While dynamic analysis has been applied to these problems in the past, the corresponding literature consists of scattered work that often specializes on sub-problems and keeps on re-inventing the wheel, thus lacking a structured approach. To overcome this unsatisfactory situation, this dissertation introduces two major systems that advance the state-of-the-art of dynamically analyzing the Android platform. First, we introduce a novel, fine-grained and non-intrusive compiler-based instrumentation framework that allows for precise and high-performance modification of Android apps and system components. Second, we present a unifying dynamic analysis platform with a special focus on Android’s middleware in order to overcome the common challenges we identified from related work. Together, these two systems allow for a more principled approach for dynamic analysis on Android that enables comparability and composability of both existing and future work.
Explainable Methods for Knowledge Graph Refinement and Exploration via Symbolic Reasoning
(Advisor: Prof. Gerhard Weikum)
Monday, 07.06.2021, 16:00 h
Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have led to the construction of large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine these KGs to enhance their coverage and accuracy via KG completion and KG validation. Both of these tasks are particularly challenging when performed under the open world assumption. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Moreover, exploring KGs, by search and browsing, is essential for users to understand the KG value and limitations towards downstream applications. However, enabling KG exploration is a challenging task given the large size of existing KGs.
This dissertation tackles the challenges of KG refinement and KG exploration by combining symbolic reasoning over the KG with other techniques such as KG embedding models and text mining. Through such a combination, the proposed methods handle the complex nature of KGs and provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used for accurately inferring missing links in the KG. Furthermore, we propose a framework for tracing evidence for supporting (refuting) KG facts from both KG and text. Extracted evidence is used to assess the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.
Novel graph based algorithms for transcriptome sequence analysis
(Advisor: Prof. Marcel Schulz, now Uni Frankfurt)
Wednesday, 02.06.2021, 9:00 h
RNA-sequencing (RNA-seq) is one of the most widely used techniques in molecular biology. A key bioinformatics task in any RNA-seq workflow is assembling the reads. As the size of transcriptomics data sets is constantly increasing, scalable and accurate assembly approaches have to be developed. Here, we propose several approaches to improve the assembly of RNA-seq data generated by second-generation sequencing technologies. We demonstrate that the systematic removal of irrelevant reads from a high-coverage dataset prior to assembly reduces runtime and improves the quality of the assembly. Further, we propose a novel RNA-seq assembly workflow comprising read error correction, normalization, assembly with informed parameter selection, and transcript-level expression computation.
In recent years, the popularity of third-generation sequencing technologies increased as long reads allow for accurate isoform quantification and gene-fusion detection, which is essential for biomedical research. We present a sequence-to-graph alignment method to detect and to quantify transcripts for third-generation sequencing data. Also, we propose the first gene-fusion prediction tool which is specifically tailored towards long-read data and hence achieves accurate expression estimation even on complex data sets. Moreover, our method predicted experimentally verified fusion events along with some novel events, which can be validated in the future.
Parameterized Verification and Repair of Concurrent Systems
(Advisor: PD Dr. Swen Jacobs)
Monday, 31.05.2021, 17:15 h
Concurrent systems are hard to get correct, and the problem becomes much more complex when the system is composed of an arbitrary number n of components. For such systems, methods such as parameterized model checking can provide correctness guarantees that hold regardless of n. Unfortunately, the parameterized model checking problem (PMCP) is in general undecidable; however, there exist several methods that decide the problem for specific classes of systems and properties. Furthermore, if a parameterized model checker detects an error in a given system, it does not provide a mechanism to repair the system such that it satisfies the specification. Hence, to repair the system, the user has to find out which behavior of a component causes the error and how it can be corrected. Both tasks may be nontrivial.
In this thesis, we present novel approaches for model checking, repair and synthesis of systems that may be parameterized in their number of components. We improve existing results for guarded protocols and we show that the PMCP of guarded protocols and token passing systems is decidable for specifications that add a quantitative aspect to LTL, called Prompt-LTL. Furthermore, we present, to our knowledge, the first parameterized repair algorithm. We show how this algorithm can be used on classes of systems that can be represented as well structured transition systems (WSTS).
Interpretable Methods in Cancer Diagnostics
(Advisor: Prof. Nico Pfeifer, now Tübingen)
Friday, 28.05.2021, 14:00 h
Cancer is a hard problem. It is hard for the patients, for the doctors and nurses, and for the researchers working on understanding the disease and finding better treatments for it. The challenges faced by a pathologist diagnosing the disease for a patient are not necessarily the same as those faced by cell biologists working on experimental treatments and understanding the fundamentals of cancer. In this thesis we work on different challenges faced by both of the above groups.
This thesis first presents methods to improve the analysis of the flow cytometry data used frequently in the diagnosis process, specifically for the two subtypes of non-Hodgkin Lymphoma which are our focus: Follicular Lymphoma and Diffuse Large B Cell Lymphoma. With a combination of concepts from graph theory, dynamic programming, and machine learning, we present methods to improve the diagnosis process and the analysis of the abovementioned data. The interpretability of the method helps a pathologist to better understand a patient’s disease, which in turn improves their choice of treatment. In the second part, we focus on the analysis of DNA-methylation and gene expression data, both of which present the challenge of being very high-dimensional yet having comparatively few samples. We present an ensemble model which adapts to the different patterns seen in each given dataset, in order to cope with noise and batch effects. At the same time, the interpretability of our model helps a pathologist to better find and tune the treatment for the patient: a step further towards personalized medicine.
(Advisor: Prof. Michael Backes)
Wednesday, 26.05.2021, 14:00 h
Why is Machine Learning so hard?
(Advisor: Prof. Michael Backes)
Thursday, 20.05.2021, 11:00 h
Security has always been a cat-and-mouse game. In some fields, solutions have emerged: for example, cryptography found sound ways to represent attacker and defender, allowing one to formally reason about vulnerability. Recently, researchers have raised concerns about the security of learning algorithms. Such learning models are shown labelled instances during training (for example, malware and benign programs). At so-called test time, the model is able to predict for an unseen program whether it is benign or malicious. Although security in general is a hard problem, in Machine Learning (ML) many vulnerabilities are inherently linked to properties that are essential to the algorithms. As a result, ML security is still far from being solved.
In the first part of the talk, we will discuss the lottery ticket hypothesis and deep neural networks. Lottery tickets are small subnetworks that can be retrieved by an iterative training and pruning procedure. We show how randomness influences the resulting subnetwork: given a fixed set of initial weights, training with stochasticity will yield a different subnetwork each run. These different networks are truly different, and not an instance of weight-space symmetry. This property illustrates the complexity of ML when it comes to the resulting classifiers.
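The iterative train–prune–rewind procedure behind lottery tickets can be summarised as follows (a schematic sketch with a made-up toy objective standing in for network training, not the experimental setup of the talk):

```python
import numpy as np

def find_lottery_ticket(w0, train_step, prune_frac=0.2, rounds=2, iters=50):
    """Iterative magnitude pruning: train, prune the smallest surviving
    weights, rewind the rest to their initial values w0, and repeat.
    Returns the mask identifying the candidate 'winning ticket'."""
    mask = np.ones_like(w0)
    for _ in range(rounds):
        w = w0 * mask                     # rewind survivors to init
        for _ in range(iters):            # train the unpruned weights
            w = (w - train_step(w)) * mask
        alive = np.abs(w[mask == 1])
        cutoff = np.quantile(alive, prune_frac)
        mask = mask * (np.abs(w) > cutoff)  # prune smallest magnitudes
    return mask, w0 * mask

# Toy stand-in for SGD: gradient step pulling weights towards a sparse
# target t (hypothetical numbers, purely for illustration).
t = np.array([3.0, 0.0, -2.0, 0.0, 1.0, 0.0])
rng = np.random.default_rng(0)
w_init = rng.normal(size=t.shape)
mask, ticket = find_lottery_ticket(w_init, lambda w: 0.1 * (w - t))
```

On this toy objective the surviving mask concentrates on the coordinates where the target is nonzero; in the actual experiments the stochasticity of training makes the surviving subnetwork differ from run to run even for fixed `w_init`.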
However, works exist that link this diversity, even inside a single model, to the framework of Bayesian uncertainty. Bayesian uncertainty measures allow a classifier to quantify how certain it is about the class assignment it outputs. In this context, we investigate test-time vulnerabilities, or evasion attacks, also called adversarial examples. In other words, we minimally alter inputs so that they will be classified at high confidence by the Bayesian network. We further transfer these adversarial examples to a second classifier that relies on a different approximation of Bayesian uncertainty. Yet, the adversarial examples work equally well on both models. We conclude that although ML models are very diverse, they are not diverse enough to provide protection against test-time attacks.
Ezekiel Olamide SOREMEKUN
Evidence-driven Testing and Debugging of Software Systems
(Advisor: Prof. Andreas Zeller)
Thursday, 08.04.2021, 14:00 h
Automated debugging techniques systematically expose, reproduce, diagnose and fix software bugs. These techniques aim to support developers during software testing and debugging activities. A key challenge is that state-of-the-art debugging techniques are hardly used or adopted in practice because they fail to address the actual needs of software developers. Thus, there is a gap between proposed debugging techniques and the actual state of software practice. To bridge this gap, we propose an evidence-driven approach that collects empirical evidence from software practice to guide, direct and automate software testing and debugging tasks. Applying this approach yields several empirical results and techniques in this work. Firstly, to elicit developers' debugging needs, we conduct an empirical study to evaluate the state of debugging in practice and provide our empirical data as a benchmark (called DBGBench) to aid the automated evaluation of debugging techniques. To build effective debugging techniques, we evaluate the effectiveness of the state-of-the-art automated fault localization (AFL) techniques using real bugs and apply our empirical results to develop a hybrid approach that outperforms the state-of-the-art AFL techniques. To repair invalid program inputs, we evaluate the prevalence and causes of invalid inputs in practice. We then build on our empirical results to develop a general-purpose algorithm that automatically diagnoses and repairs invalid inputs. Lastly, to generate inputs (dis)similar to real-world program inputs, we learn input distributions from real-world sample inputs using a probabilistic grammar. Then, we employ the learned probabilistic grammar to guide the generation of test inputs (dis)similar to the sample inputs found in practice. Overall, our evaluation shows that our evidence-driven approach allows one to elicit relevant debugging requirements from software practice and develop novel and effective debugging techniques.
In summary, our proposed debugging techniques build on empirical evidence obtained from software practice and improve over the state-of-the-art techniques.
Computational Solutions for Addressing Heterogeneity in DNA Methylation Data
(Advisor: Prof. Thomas Lengauer)
Tuesday, 30.03.2021, 11:00 h
DNA methylation is a cell-type-specific epigenetic modification with important roles in gene regulation and can be assayed genome-wide using various experimental techniques. The generated data enables in-depth characterization of heterogeneity in DNA methylation patterns at three levels: (i) between sample groups defined by a phenotype, (ii) between the samples sharing a phenotype, and (iii) within a sample. Within my thesis and in this talk, I will present novel computational solutions for addressing the three levels of heterogeneity. First, we extended the RnBeads software package facilitating DNA methylation data analysis with novel modules for epigenetic age prediction, missing value imputation, and differential variability analysis. Second, we developed a novel R software package (named MAGAR) for associating DNA methylation alterations with genetic variations in so-called methylation quantitative trait loci (methQTL). We further discerned tissue-specific from shared methQTLs, which showed distinct properties and genomic locations. To address between-sample heterogeneity, we developed a pipeline to deconvolve heterogeneous DNA methylation patterns into their cellular constituents. Lastly, we investigated within-sample heterogeneity (WSH), which can be quantified genome-wide using different scores. We developed a novel metric and compared the scores using simulated and experimental data. In this talk, I will discuss novel software tools for dissecting DNA methylation heterogeneity with an emphasis on the methQTL analysis and the WSH scores. Furthermore, I will show how these software packages can be used to further our understanding of epigenetic regulation.
Discovering robust dependencies from data
(Advisor: Prof. Jilles Vreeken)
Thursday, 04.03.2021, 11:00 h
Nowadays, scientific discoveries are fueled by data-driven approaches. As a motivating example from Materials Science, using appropriate feature selection techniques the physicochemical process of compound semiconductor crystallization can now be described with only a few atomic material properties. The dissertation investigates this direction further, that is, we aim to develop interpretable knowledge discovery techniques that can uncover and meaningfully summarize complex phenomena from data.
Such feature selection tasks are very demanding. The dependencies in data can be of any type (e.g., linear, non-linear, multivariate), while the data can be of any type as well (i.e., mixtures of discrete and continuous attributes). We have to accommodate all types in our analysis, because otherwise we can miss important interactions (e.g., linear methods cannot find non-linear relationships). For interpretability, the degree of dependence should be meaningfully summarized (e.g., “target Y depends 50% on X”, or “the variables in set X exhibit 80% dependence”). From a statistical perspective, the degree of dependence should be robustly measured from data samples of any size to minimize spurious discoveries. Lastly, the combinatorial optimization problem to discover the top dependencies out of all possible dependencies is NP-hard, and hence, we need efficient exhaustive algorithms. For situations where this is prohibitive (i.e., large dimensionalities), we need approximate algorithms with guarantees such that we can interpret the quality of the solution.
In this dissertation, we solve the aforementioned challenges and propose effective algorithms that discover dependencies in both supervised and unsupervised scenarios. In particular, we employ notions from Information Theory to meaningfully quantify all types of dependencies (e.g., Mutual Information and Total Correlation), which we then couple with notions from statistical learning theory to robustly estimate them from data. Lastly, we derive tight and efficient bounding functions that can be used by branch-and-bound, enabling exact solutions in large search spaces. Case studies on Materials Science data confirm that we can indeed discover high-quality dependencies from complex data.
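To make the information-theoretic quantities concrete (an illustrative plug-in computation for small discrete samples, not the robust estimators the dissertation develops): total correlation quantifies the overall dependence within a set of variables as the sum of the marginal entropies minus the joint entropy, and is zero exactly when the empirical distribution factorises.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy in bits of a sequence of symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def total_correlation(columns):
    """TC(X1,...,Xk) = sum_i H(Xi) - H(X1,...,Xk).
    `columns` is a list of equally long sample columns, one per variable."""
    joint = list(zip(*columns))  # joint samples as tuples
    return sum(entropy(col) for col in columns) - entropy(joint)
```

For two perfectly dependent binary variables the plug-in estimate is 1 bit; for two independent ones it is 0. On finite samples the plug-in estimate is biased upwards, which is exactly the kind of spurious dependence the robust estimation techniques above are designed to avoid.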
Structural Building Blocks in Graph Data: Characterised by Hyperbolic Communities and Uncovered by Boolean Tensor Clustering
(Advisor: Prof. Pauli Miettinen, now University of Eastern Finland)
Wednesday, 24.02.2021, 14:00 h
Graph data nowadays easily become so large that it is infeasible to study the underlying structures manually. Thus, computational methods are needed to uncover large-scale structural information. In this thesis, we present methods to understand and summarise large networks.
We propose the hyperbolic community model to describe groups of more densely connected nodes within networks using very intuitive parameters. The model accounts for a frequent connectivity pattern in real data: a few community members are highly interconnected; most members mainly have ties to this core. Our model fits real data much better than previously-proposed models. Our corresponding random graph generator, HyGen, creates graphs with realistic intra-community structure.
Using the hyperbolic model, we conduct a large-scale study of the temporal evolution of communities on online question–answer sites. We observe that the user activity within a community is constant with respect to its size throughout its lifetime, and a small group of users is responsible for the majority of the social interactions.
We propose an approach for Boolean tensor clustering. This special tensor factorisation is restricted to binary data and assumes that one of the tensor directions has only non-overlapping factors. These assumptions – valid for many real-world data, in particular time-evolving networks – enable the use of bitwise operators and lift much of the computational complexity from the task.
(Advisor: Prof. Andreas Zeller)
Monday, 08.02.2021, 15:00 h
Modern interactive applications offer so many interaction opportunities that automated exploration and testing becomes practically impossible without some domain specific guidance towards relevant functionality. In this dissertation, we present a novel fundamental graphical user interface testing method called topic-driven testing. We mine the semantic meaning of interactive elements, guide testing, and identify core functionality of applications. The semantic interpretation is close to human understanding and allows us to learn specifications and transfer knowledge across multiple applications independent of the underlying device, platform, programming language, or technology stack—to the best of our knowledge a unique feature of our technique.
Our tool ATTABOY is able to take an existing web application test suite, say from Amazon, execute it on eBay, and thus guide testing to relevant core functionality. Tested on different application domains such as eCommerce, news pages, and mail clients, it can transfer on average sixty percent of the tested application behavior to new apps—without any human intervention. On top of that, topic-driven testing can work with even vaguer instructions such as how-to descriptions or use-case descriptions. Given an instruction, say “add item to shopping cart”, it tests the specified behavior in an application—both in a browser as well as in mobile apps. It thus improves on state-of-the-art UI testing frameworks, creates change-resilient UI tests, and lays the foundation for learning, transferring, and enforcing common application behavior. The prototype is up to five times faster than existing random testing frameworks and tests functions that are hard to cover by non-trained approaches.
Philipp von STYP-REKOWSKY
Retrofitting Privacy Controls to Stock Android
(Advisor: Prof. Michael Backes)
Monday, 18.01.2021, 14:00 h
Android is the most popular operating system for mobile devices, making it a prime target for attackers. To counter such attacks, Android’s security concept uses app isolation and access control to critical system resources. However, Android gives users only limited options to restrict app permissions according to their privacy preferences, and instead lets developers dictate the permissions users must grant. Moreover, Android’s security model is not designed to be customizable by third-party developers, forcing users to rely on device manufacturers to address their privacy concerns. This thesis presents a line of work that retrofits comprehensive privacy controls to the Android OS to put the user back in charge of their device. It focuses on developing techniques that can be deployed to stock Android devices without firmware modifications or root privileges. The first part of this dissertation establishes fundamental policy enforcement on third-party apps using inlined reference monitors to enhance Android’s permission system. This approach is then refined by introducing a novel technique for dynamic method hook injection on Android’s Java VM. Finally, we present a system that leverages process-based privilege separation to provide a virtualized application environment that supports the enforcement of complex security policies. A systematic evaluation of our approach demonstrates its practical applicability, and over one million downloads of our solution confirm user demand for privacy-enhancing tools.