PhD Thesis Defenses

PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work being done by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.

Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.



Ezekiel Olamide SOREMEKUN

Evidence-driven Testing and Debugging of Software Systems
(Advisor: Prof. Andreas Zeller)

Interested parties should ask the Dean’s Office or their group leaders for access details on the day prior.
Thursday, 08.04.2021, 14:00 h


Automated debugging techniques systematically expose, reproduce, diagnose and fix software bugs. These techniques aim to support developers during software testing and debugging activities. A key challenge is that state-of-the-art debugging techniques are hardly used or adopted in practice because they fail to address the actual needs of software developers. Thus, there is a gap between proposed debugging techniques and the actual state of software practice. To bridge this gap, we propose an evidence-driven approach that collects empirical evidence from software practice to guide, direct and automate software testing and debugging tasks. Applying this approach provides several empirical results and techniques in this work. Firstly, to elicit developers’ debugging needs, we conduct an empirical study to evaluate the state of debugging in practice and provide our empirical data as a benchmark (called DBGBench) to aid the automated evaluation of debugging techniques. To build effective debugging techniques, we evaluate the effectiveness of state-of-the-art automated fault localization (AFL) techniques using real bugs and apply our empirical results to develop a hybrid approach that outperforms the state-of-the-art AFL techniques. To repair invalid program inputs, we evaluate the prevalence and causes of invalid inputs in practice. We then build on our empirical results to develop a general-purpose algorithm that automatically diagnoses and repairs invalid inputs. Lastly, to generate inputs (dis)similar to real-world program inputs, we learn input distributions from real-world sample inputs using probabilistic grammars. Then, we employ the learned probabilistic grammars to guide the generation of test inputs (dis)similar to the sample inputs found in practice. Overall, our evaluation shows that our evidence-driven approach allows one to elicit relevant debugging requirements from software practice and develop novel and effective debugging techniques.
In summary, our proposed debugging techniques build on empirical evidence obtained from software practice and improve over the state-of-the-art techniques.
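The probabilistic-grammar idea described above can be sketched in miniature: expansions are sampled according to weights as they might be learned from sample inputs. The grammar and weights below are invented for illustration and are not from the thesis.

```python
import random

# Illustrative probabilistic grammar: each nonterminal maps to weighted
# expansions; the weights stand in for probabilities learned from samples.
GRAMMAR = {
    "<number>": [(["<digits>"], 0.7), (["-", "<digits>"], 0.3)],
    "<digits>": [(["<digit>"], 0.5), (["<digit>", "<digits>"], 0.5)],
    "<digit>": [([d], 0.1) for d in "0123456789"],
}

def produce(symbol="<number>", rng=random):
    """Expand a nonterminal by sampling expansions with the learned weights."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    expansions = [e for e, _ in GRAMMAR[symbol]]
    weights = [w for _, w in GRAMMAR[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    return "".join(produce(s, rng) for s in chosen)
```

Raising the weight of the recursive `<digits>` rule would bias generation toward longer numbers, which is how learned weights make generated inputs resemble (or deliberately differ from) the samples.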



Computational Solutions for Addressing Heterogeneity in DNA Methylation Data
(Advisor: Prof. Thomas Lengauer)

Interested parties should ask the Dean’s Office or their group leaders for access details on the day prior.
Tuesday, 30.03.2021, 11:00 h


DNA methylation is a cell-type-specific epigenetic modification with important roles in gene regulation and can be assayed genome-wide using various experimental techniques. The generated data enables in-depth characterization of heterogeneity in DNA methylation patterns at three levels: (i) between sample groups defined by a phenotype, (ii) between the samples sharing a phenotype, and (iii) within a sample. Within my thesis and in this talk, I will present novel computational solutions for addressing the three levels of heterogeneity. First, we extended the RnBeads software package facilitating DNA methylation data analysis with novel modules for epigenetic age prediction, missing value imputation, and differential variability analysis. Second, we developed a novel R software package (named MAGAR) for associating DNA methylation alterations with genetic variations in so-called methylation quantitative trait loci (methQTL). We further discerned tissue-specific from shared methQTLs, which showed distinct properties and genomic locations. To address between-sample heterogeneity, we developed a pipeline to deconvolve heterogeneous DNA methylation patterns into their cellular constituents. Lastly, we investigated within-sample heterogeneity (WSH), which can be quantified genome-wide using different scores. We developed a novel metric and compared the scores using simulated and experimental data. In this talk, I will discuss novel software tools for dissecting DNA methylation heterogeneity, with an emphasis on the methQTL analysis and the WSH scores. Furthermore, I will show how these software packages can be used to further our understanding of epigenetic regulation.

Panagiotis MANDROS

Discovering robust dependencies from data
(Advisor: Prof. Jilles Vreeken)
Thursday, 04.03.2021, 11:00 h


Nowadays, scientific discoveries are fueled by data-driven approaches. As a motivating example from Materials Science, using appropriate feature selection techniques, the physicochemical process of compound semiconductor crystallization can now be described with only a few atomic material properties. The dissertation investigates this direction further; that is, we aim to develop interpretable knowledge discovery techniques that can uncover and meaningfully summarize complex phenomena from data.
Such feature selection tasks are very demanding. The dependencies in data can be of any type (e.g., linear, non-linear, multivariate), while the data can be of any type as well (i.e., mixtures of discrete and continuous attributes). We have to accommodate all types in our analysis, because otherwise we can miss important interactions (e.g., linear methods cannot find non-linear relationships). For interpretability, the degree of dependence should be meaningfully summarized (e.g., “target Y depends 50% on X”, or “the variables in set X exhibit 80% dependence”). From a statistical perspective, the degree of dependence should be robustly measured from data samples of any size to minimize spurious discoveries. Lastly, the combinatorial optimization problem to discover the top dependencies out of all possible dependencies is NP-hard, and hence, we need efficient exhaustive algorithms. For situations where this is prohibitive (i.e., large dimensionalities), we need approximate algorithms with guarantees such that we can interpret the quality of the solution.
In this dissertation, we solve the aforementioned challenges and propose effective algorithms that discover dependencies in both supervised and unsupervised scenarios. In particular, we employ notions from Information Theory to meaningfully quantify all types of dependencies (e.g., Mutual Information and Total Correlation), which we then couple with notions from statistical learning theory to robustly estimate them from data. Lastly, we derive tight and efficient bounding functions that can be used by branch-and-bound, enabling exact solutions in large search spaces. Case studies on Materials Science data confirm that we can indeed discover high-quality dependencies from complex data.
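Statements like “target Y depends 50% on X” can be grounded in information theory as the share of the uncertainty in Y removed by knowing X. A minimal plug-in sketch of such a score follows; the thesis additionally corrects estimates of this kind for small-sample bias, which this sketch omits.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def fraction_of_information(xs, ys):
    """Plug-in estimate of I(X;Y)/H(Y) in [0, 1]: the share of the
    uncertainty in Y explained by X. 1.0 means Y is a function of X;
    0.0 means X tells us nothing about Y."""
    h_y = entropy(ys)
    if h_y == 0:
        return 0.0
    # H(Y|X) = sum over x of p(x) * H(Y | X = x)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    h_y_given_x = sum(len(g) / len(ys) * entropy(g) for g in groups.values())
    return (h_y - h_y_given_x) / h_y
```

For sets of variables, the same construction extends to scores such as total correlation, which the dissertation couples with statistical corrections and branch-and-bound search.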



Structural Building Blocks in Graph Data: Characterised by Hyperbolic Communities and Uncovered by Boolean Tensor Clustering
(Advisor: Prof. Pauli Miettinen, now University of Eastern Finland)
Wednesday, 24.02.2021, 14:00 h


Graph data nowadays easily become so large that it is infeasible to study the underlying structures manually. Thus, computational methods are needed to uncover large-scale structural information. In this thesis, we present methods to understand and summarise large networks.
We propose the hyperbolic community model to describe groups of more densely connected nodes within networks using very intuitive parameters. The model accounts for a frequent connectivity pattern in real data: a few community members are highly interconnected; most members mainly have ties to this core. Our model fits real data much better than previously-proposed models. Our corresponding random graph generator, HyGen, creates graphs with realistic intra-community structure.
Using the hyperbolic model, we conduct a large-scale study of the temporal evolution of communities on online question–answer sites. We observe that the user activity within a community is constant with respect to its size throughout its lifetime, and a small group of users is responsible for the majority of the social interactions.
We propose an approach for Boolean tensor clustering. This special tensor factorisation is restricted to binary data and assumes that one of the tensor directions has only non-overlapping factors. These assumptions – valid for many real-world data, in particular time-evolving networks – enable the use of bitwise operators and lift much of the computational complexity from the task.
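The benefit of bitwise operators in the Boolean setting can be illustrated with a toy example, Boolean matrix multiplication over rows packed into integer bitmasks; this is an illustration of the general idea, not the thesis algorithm.

```python
def boolean_matmul(rows_a, cols_b):
    """Boolean matrix product where each row of A and each column of B is
    packed into an integer bitmask: C[i][j] = 1 iff they share a set bit,
    so a single AND replaces a whole inner loop of multiply-adds."""
    return [[int(a & b != 0) for b in cols_b] for a in rows_a]
```

Packing binary data this way lets one machine word carry up to 64 matrix entries, which is one source of the computational savings the abstract alludes to.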

Andreas RAU

Topic-Driven Testing
(Advisor: Prof. Andreas Zeller)
Monday, 08.02.2021, 15:00 h


Modern interactive applications offer so many interaction opportunities that automated exploration and testing becomes practically impossible without some domain specific guidance towards relevant functionality. In this dissertation, we present a novel fundamental graphical user interface testing method called topic-driven testing. We mine the semantic meaning of interactive elements, guide testing, and identify core functionality of applications. The semantic interpretation is close to human understanding and allows us to learn specifications and transfer knowledge across multiple applications independent of the underlying device, platform, programming language, or technology stack—to the best of our knowledge a unique feature of our technique.

Our tool ATTABOY is able to take an existing web application test suite, say from Amazon, execute it on eBay, and thus guide testing to relevant core functionality. Tested on different application domains such as e-commerce, news pages, and mail clients, it can transfer on average sixty percent of the tested application behavior to new apps, without any human intervention. On top of that, topic-driven testing can work with even vaguer instructions such as how-to descriptions or use-case descriptions. Given an instruction, say “add item to shopping cart”, it tests the specified behavior in an application, both in a browser as well as in mobile apps. It thus improves state-of-the-art UI testing frameworks, creates change-resilient UI tests, and lays the foundation for learning, transferring, and enforcing common application behavior. The prototype is up to five times faster than existing random testing frameworks and tests functions that are hard to cover by non-trained approaches.



Retrofitting Privacy Controls to Stock Android
(Advisor: Prof. Michael Backes)
Monday, 18.01.2021, 14:00 h


Android is the most popular operating system for mobile devices, making it a prime target for attackers. To counter these threats, Android’s security concept uses app isolation and access control for critical system resources. However, Android gives users only limited options to restrict app permissions according to their privacy preferences and instead lets developers dictate the permissions users must grant. Moreover, Android’s security model is not designed to be customizable by third-party developers, forcing users to rely on device manufacturers to address their privacy concerns. This thesis presents a line of work that retrofits comprehensive privacy controls to the Android OS to put the user back in charge of their device. It focuses on developing techniques that can be deployed to stock Android devices without firmware modifications or root privileges. The first part of this dissertation establishes fundamental policy enforcement on third-party apps using inlined reference monitors to enhance Android’s permission system. This approach is then refined by introducing a novel technique for dynamic method hook injection on Android’s Java VM. Finally, we present a system that leverages process-based privilege separation to provide a virtualized application environment that supports the enforcement of complex security policies. A systematic evaluation of our approach demonstrates its practical applicability, and over one million downloads of our solution confirm user demand for privacy-enhancing tools.



Michael GERKE

Modeling and Verifying the FlexRay Physical Layer Protocol with Reachability Checking of Timed Automata
(Advisor: Prof. Bernd Finkbeiner)
Monday, 21.12.2020, 14:15 h


The FlexRay automotive bus protocol is used in many modern cars to facilitate communication between the various electronic control units (ECUs) that control the car. FlexRay’s physical layer protocol facilitates communicating a message from an ECU to other ECUs that each have their own local oscillator, which is not synchronized to the other ECUs’ local oscillators, even though they all run at roughly the same frequency. The physical layer protocol is thus the foundation that the upper protocol layers build upon, and it has to deal with a non-synchronized bit transfer over the bus’s potentially long wire harness, with variances in delay that can be incurred while traversing the bus, with interference that might cause voltage values on the bus to change, thus effectively flipping bits, and with local oscillators that deviate slightly from their nominal frequency. Verifying the reliability and resilience of such a physical layer protocol against various potentially fault-inducing scenarios, under a variety of realistic assumptions on the performance of the hardware the protocol is deployed on, is nontrivial. To this end, the thesis presents models of the physical layer protocol and its hardware environment using timed automata, which include error models. The thesis also presents algorithmic approaches for fully symbolic model checking of timed automata, based on a combination of binary decision diagrams and difference bound matrices, along with the data structures enabling these approaches. This makes it possible to handle the considerable complexity of model checking such an industrially used protocol. Furthermore, the thesis presents the lessons learned on how to optimize such timed automata models to enable model checking while retaining enough complexity to provide relevant results concerning the reliability and resilience of the modeled protocol.
The results of the verification effort confirm the resilience of the FlexRay physical layer protocol against specific patterns of bit flips and against specific degradations in the hardware performance, exceeding the limits put forth by the protocol’s designers.
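A difference bound matrix, one of the data structures mentioned in this abstract, stores clock constraints of the form x_i - x_j <= d and is brought into canonical form by an all-pairs shortest-path closure. A minimal sketch of that closure, not the thesis implementation, looks like this:

```python
INF = float("inf")  # entry for "no constraint"

def canonicalize(dbm):
    """Tighten a difference bound matrix in place: dbm[i][j] bounds
    x_i - x_j, and chaining two constraints through any clock x_k
    may yield a tighter bound (a Floyd-Warshall-style closure)."""
    n = len(dbm)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dbm[i][k] + dbm[k][j] < dbm[i][j]:
                    dbm[i][j] = dbm[i][k] + dbm[k][j]
    return dbm
```

For example, the constraints x0 - x1 <= 5 and x1 - x2 <= 3 together imply x0 - x2 <= 8, which the closure makes explicit.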

Tribhuvanesh OREKONDY

Understanding and Controlling Leakage in Machine Learning
(Advisor: Prof. Bernt Schiele)
Friday, 18.12.2020, 16:00 h


Machine learning models are being increasingly adopted in a variety of real-world scenarios. However, the privacy and confidentiality implications for users in such scenarios are not well understood. In this talk, I will present our work on better understanding and controlling leakage of information, in three parts. The first part of the talk is motivated by the fact that a massive amount of personal photos is disseminated by individuals, many of which unintentionally reveal a wide range of private information. To address this, I will present our work on visual privacy, which attempts to identify privacy leakage in visual data. The second part of my talk analyzes privacy leakage during training of ML models, such as in the case of federated learning, where individuals contribute towards the training of a model by sharing parameter updates. The third part of my talk focuses on leakage of information at inference time by presenting our work on model functionality stealing threats, where an adversary exploits black-box access to a victim ML model to inexpensively create a replica model. In each part, I also discuss our work to mitigate leakage of sensitive information to enable wide-spread adoption of ML techniques in real-world scenarios.


Towards Accurate Multi-Person Pose Estimation in the Wild
(Advisor: Prof. Bernt Schiele)
Wednesday, 16.12.2020, 16:00 h


Human pose estimation is the computer vision task of localising major joints of a human skeleton in natural images and is one of the most important visual recognition tasks in scenes containing humans, with numerous applications in robotics, virtual and augmented reality, gaming, and healthcare, among others. This thesis proposes methods for human pose estimation and articulated pose tracking in images and video sequences. Unlike most prior work, which focuses on pose estimation of single pre-localised humans, we address the case of multiple people in real-world images, which entails several challenges such as person-person overlaps in highly crowded scenes, an unknown number of people, or people entering and leaving video sequences. Our multi-person pose estimation algorithm is based on the bottom-up detection-by-grouping paradigm, through the optimization of a multicut graph partitioning objective. Our approach is general and is also applicable to multi-target pose tracking in videos. In order to facilitate further research on this problem, we collect and annotate a large-scale dataset and a benchmark for articulated multi-person tracking. We also propose a method for estimating 3D body pose from on-body wearable cameras, using a pair of downward-facing, head-mounted cameras that capture the entire body. Our final contribution is a method for reconstructing 3D objects from weak supervision. Our approach represents objects as 3D point clouds and is able to learn them with 2D supervision only, without requiring camera pose information at training time. We design a differentiable renderer of point clouds as well as a novel loss formulation for dealing with camera pose ambiguity.

Daniel VAZ

Approximation Algorithms for Network Design and Cut Problems in Bounded-Treewidth
(Advisors: Prof. Kurt Mehlhorn and Prof. Parinya Chalermsook, now Aalto University, Finland)
Tuesday, 08.12.2020, 14:00 h


In this talk, we will present new approximation results for the group Steiner tree problem on bounded-treewidth graphs. In the group Steiner tree problem, the input is a graph together with sets of vertices called groups; the goal is to choose one representative vertex from each group and connect all the representatives with minimum cost. The problem is hard to approximate even when the input graph is a tree: it is unlikely that an approximation factor better than O(log^2 n) exists, and current techniques cannot achieve this ratio even on graphs with treewidth 2. We show an O(log^2 n)-approximation algorithm for bounded-treewidth graphs, which matches the known lower bound for trees, and improves upon previous techniques. These results imply new approximation algorithms for other related problems, such as a high-connectivity extension of group Steiner tree. We will also introduce the firefighter problem, which models the spread of fire in a graph, and briefly discuss our results for this problem on trees and bounded-treewidth graphs.

Franziska MÜLLER

Real-time 3D Hand Reconstruction in Challenging Scenes from a Single Color or Depth Camera
(Advisor: Prof. Christian Theobalt)
Wednesday, 02.12.2020, 16:00 h


Hands are one of the main enabling factors for performing complex tasks and humans naturally use them for interactions with their environment. Reconstruction and digitization of 3D hand motion opens up many possibilities for important applications, for example in human–computer interaction, augmented or virtual reality (AR/VR), automatic sign-language translation, activity recognition, or teaching robots. Different approaches for 3D hand motion capture have been actively researched in the past. Earlier methods have clear drawbacks like the need to wear intrusive markers or gloves or having to calibrate a stationary multi-camera setup. For interaction purposes, users need continuous control and immediate feedback. This means the algorithms have to run in real time and be robust in uncontrolled scenes. Thus, the majority of more recent methods uses a single color or depth camera which, however, makes the problem harder due to more ambiguities in the input. While recent research has shown promising results, current state-of-the-art methods still have strong limitations. Most approaches only track the motion of a single hand in isolation and do not take background-clutter or interactions with arbitrary objects or the other hand into account. The few methods that can handle more general and natural scenarios run far from real time or use complex multi-camera setups which makes them unusable for many of the aforementioned applications. This thesis pushes the state of the art for real-time 3D hand tracking and reconstruction in general scenes from a single RGB or depth camera. The presented approaches explore novel combinations of generative hand models and powerful cutting-edge machine learning techniques. 
In particular, this thesis proposes a novel method for hand tracking in the presence of strong occlusions and clutter, the first method for full global 3D hand tracking from in-the-wild RGB video, and a method for simultaneous pose and dense shape reconstruction of two interacting hands that, for the first time, combines a set of desirable properties previously unseen in the literature.


Program Analysis for Anomaly Detection
(Advisor: Prof. Andreas Zeller)
Wednesday, 02.12.2020, 10:30 h


When interacting with mobile applications, users may not always get what they expect. For instance, when users download Android applications from a market, they do not know much about their actual behavior. A brief description, a set of screenshots and a list of permissions, which give a high-level intuition of what applications might be doing, form user expectations. However, applications do not always meet them. For example, a gaming application could intentionally send SMS messages to a premium number in the background without the user’s knowledge. A less harmful scenario would be a wrong message confirming a successful action that was never executed.
Whatever the behavior of a mobile application (app) might be, in order to test and fully understand it, there needs to be a technique that can analyse the user interface (UI) of the app and the code associated with it as a whole.
This thesis presents a static analysis framework called SAFAND that, given an Android app:
gathers information on how the application uses sensitive data;
identifies and analyses the UI elements of the application;
binds UI elements to their corresponding behavior.
The thesis illustrates how results obtained from the framework can be used to identify problems ranging from small usability issues to malicious behavior of real-world applications.



Generating Renderers
(Advisor: Prof. Philipp Slusallek)
Friday, 27.11.2020, 15:30 h


Physically-based renderers are large pieces of software, and require a lot of computing power to produce pictures from complex 3D scenes. Part of this is due to the fact that in these renderers, optimizing performance is often difficult because of the large configurability in the type of scene that is given as an input. For instance, optimizing for SIMD execution is difficult when the intersection functions need to perform dynamic dispatches to allow different primitive types. Another issue is that these renderers are written using currently available techniques and tools for software development, which are severely lacking when it comes to expressing parallelism or exploiting low-level hardware features. The direct consequence of that problem is that renderer implementations often rely on non-portable compiler-specific intrinsics, making them difficult to port and adapt to new architectures and platforms. The traditional solution to these problems is to design domain-specific languages that allow the programmer to encode domain-specific knowledge into the code. With that knowledge, a domain-specific compiler can then optimize that code and choose the best strategy for execution on a given piece of hardware. However, writing domain-specific compilers is an arduous task, as it requires both an extensive knowledge of the domain, and a deep understanding of compiler construction.
In this talk, we will present another way to encode domain-specific knowledge, and describe a rendering framework based on it. The general idea is to write generic, high-level and declarative rendering code in a language supporting partial evaluation. Such code can then be optimized for a target machine and a particular use case by providing partial evaluation annotations. Those annotations instruct the compiler to specialize the entire code according to the available knowledge of the input: in the case of rendering, for instance, this means that shaders are specialized when their inputs are known, and in general, all redundant computations are eliminated. Additionally, partial evaluation makes it possible to use knowledge of the target architecture to adapt the execution strategies of various parts of the renderer. The results of this technique show that the generated renderers are faster and more portable than renderers written with state-of-the-art competing libraries, and that in comparison, the presented work requires less implementation effort.
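The effect of specialization can be illustrated in miniature with a closure standing in for compiler-generated residual code: when part of a shader's input is known at specialization time, the corresponding computation happens once instead of per sample. Names and the toy light model below are invented for illustration.

```python
def specialize_shader(lights):
    """'Partially evaluate' a trivial shader for a known light setup:
    the loop over lights and their constant intensities is folded into
    one coefficient, so the returned per-sample shader is a single
    multiply with no loop and no dynamic dispatch."""
    total_intensity = sum(light["intensity"] for light in lights)  # done once
    def shade(albedo):
        return albedo * total_intensity  # residual per-sample code
    return shade
```

A partial evaluator performs this kind of folding across an entire renderer, guided by annotations marking which inputs are static.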

Nataniel P. BORGES Junior

Learning the Language of Apps
(Advisor: Prof. Andreas Zeller)
Thursday, 26.11.2020, 14:00 h


To explore the functionality of an app, automated test generators systematically identify and interact with its user interface (UI) elements. A key challenge is to synthesize inputs which effectively and efficiently cover app behavior. To do so, a test generator has to choose which elements to interact with, which interactions to perform on each element, and which input values to type. In summary, to better test apps, a test generator should know the app’s language, that is, the language of its graphical interactions and the language of its textual inputs. In this work, we show how a test generator can learn the language of apps and how this knowledge is modeled to create tests. We demonstrate how to learn the language of the graphical input prior to testing by combining machine learning and static analysis, and how to refine this knowledge during testing using reinforcement learning. In our experiments, statically learned models resulted in 50% fewer ineffective actions and an average increase in test (code) coverage of 19%, while refining them through reinforcement learning resulted in an additional test (code) coverage of up to 20%. We learn the language of textual inputs by identifying the semantics of input fields in the UI and querying the web for real-world values. In our experiments, real-world values increase test (code) coverage by ≈10%. Finally, we show how to use context-free grammars to integrate both languages into a single representation (UI grammar), giving back control to the user. This representation can then be mined from existing tests, associated to the app source code, and used to produce new tests. 82% of the test cases produced by fuzzing our UI grammar can reach a UI element within the app, and 70% of them can reach a specific code location.
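The UI-grammar idea can be sketched with a toy context-free grammar whose terminals are UI actions; fuzzing it yields action sequences that serve as test cases. The grammar and action names below are invented for illustration, not taken from the thesis.

```python
import random

# Toy UI grammar: nonterminals expand to sequences of UI actions.
UI_GRAMMAR = {
    "<session>": [["<login>", "<browse>", "tap_logout"]],
    "<login>": [["enter_user", "enter_password", "tap_login"]],
    "<browse>": [["open_item"], ["open_item", "add_to_cart", "<browse>"]],
}

def derive(symbol, rng):
    """Derive one action sequence (a test case) from the grammar."""
    if symbol not in UI_GRAMMAR:
        return [symbol]  # terminal action
    expansion = rng.choice(UI_GRAMMAR[symbol])
    return [action for part in expansion for action in derive(part, rng)]
```

Each derivation from `<session>` is a complete, well-formed interaction: login actions first, a variable amount of browsing, then logout, which is how a grammar keeps fuzzed UI tests within valid interaction sequences.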


Philipp MÜLLER

Sensing, Interpreting, and Anticipating Human Social Behaviour in the Real World
(Advisor: Prof. Andreas Bulling, now Stuttgart)
Friday, 13.11.2020, 14:00 h


A prerequisite for the creation of social machines that are able to support humans in fields like education, psychotherapy, or human resources is the ability to automatically detect and analyse human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging.
This thesis moves closer to the vision of social machines in the real world, making fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, it advances the state of the art in human visual behaviour sensing, proposing a novel unsupervised method for eye contact detection in group interactions by exploiting the connection between gaze and speaking turns. Furthermore, the thesis makes use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, this thesis improves the interpretation of social signals by proposing datasets and methods for emotion recognition and low rapport detection in less constrained settings than previously studied. In addition, it for the first time investigates a cross-dataset evaluation setting for emergent leadership detection. Third, this thesis pioneers methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.

Aastha MEHTA

Ensuring Compliance with Data Privacy and Usage Policies in Online Services
(Advisors: Prof. Peter Druschel & Prof. Deepak Garg)
Tuesday, 03.11.2020, 14:30 h


Online services collect and process a variety of sensitive personal data that is subject to complex privacy and usage policies. Complying with the policies is critical and often legally binding for service providers, but it is challenging as applications are prone to many disclosure threats. In this thesis, I present two compliance systems, Qapla and Pacer, that ensure efficient policy compliance in the face of direct and side-channel disclosures, respectively.
Qapla prevents direct disclosures in database-backed applications (e.g., personnel management systems), which are subject to complex access control, data linking, and aggregation policies. Conventional methods inline policy checks with application code. Qapla instead specifies policies directly on the database and enforces them in a database adapter, thus separating compliance from the application code.
Pacer prevents network side-channel leaks in cloud applications. A tenant’s secrets may leak via its network traffic shape, which can be observed at shared network links (e.g., network cards, switches). Pacer implements a cloaked tunnel abstraction, which hides secret-dependent variation in a tenant’s traffic shape but allows variations based on non-secret information, enabling secure and efficient use of network resources in the cloud.
Both systems require modest development efforts, and incur moderate performance overheads, thus demonstrating their usability.


Frederic RABER

Supporting Lay Users in Privacy Decisions When Sharing Sensitive Data
(Advisor: Prof. Antonio Krüger)
Wednesday, 23.09.2020, 16:00 h


The first part of the thesis focuses on assisting users in choosing their privacy settings, by using machine learning to derive the optimal set of privacy settings for the user. In contrast to other work, our approach uses context factors as well as individual factors to provide a personalized set of privacy settings. The second part consists of a set of intelligent user interfaces that assist users throughout the complete privacy journey: from defining friend groups that allow targeted information sharing; through user interfaces for selecting information recipients, finding possible errors or unusual settings, and refining them; up to mechanisms for gathering in-situ feedback on privacy incidents and investigating how to use this feedback to improve a user’s privacy in the future. Our studies have shown that tailoring the privacy settings significantly increases the correctness of the predicted privacy settings, whereas the user interfaces significantly decrease the amount of unwanted disclosures.

Thorsten WILL

From condition-specific interactions towards the differential complexome of proteins
(Advisor: Prof. Volkhard Helms)
Tuesday, 22.09.2020, 10:00 h


While capturing the transcriptomic state of a cell is a comparatively simple effort with modern sequencing techniques, mapping protein interactomes and complexomes in a sample-specific manner is currently not feasible on a large scale. To understand crucial biological processes, however, knowledge on the physical interplay between proteins can be more interesting than their mere expression. In my thesis, I present four software tools that unlock the cellular wiring in a condition-specific manner and promise a deeper understanding of what happens upon cell fate transitions.
PPIXpress makes it possible to exploit the abundance of existing expression data to generate condition-specific interactomes, which can even account for alternative splicing events when protein isoforms can be related to the presence of causative protein domain interactions in an underlying model. Building on this work, I developed the convenient differential analysis tool PPICompare to determine rewiring events and their causes within the inferred interaction networks between grouped samples.
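The core contextualisation step can be pictured as follows (a deliberately simplified sketch with made-up gene names; PPIXpress additionally resolves protein isoforms and domain interactions):

```python
def condition_specific_network(reference_ppi, expressed):
    """Restrict a reference protein-protein interaction network to one
    sample: keep an interaction only if both partners are expressed."""
    return [(a, b) for a, b in reference_ppi
            if a in expressed and b in expressed]

# hypothetical toy network and expression calls for one sample
ppi = [("TP53", "MDM2"), ("TP53", "EP300"), ("MDM2", "UBE2D1")]
sample = {"TP53", "MDM2", "UBE2D1"}
print(condition_specific_network(ppi, sample))
```

Comparing the networks obtained for two groups of samples is then the starting point for differential (rewiring) analysis as in PPICompare.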
Furthermore, I demonstrate a new implementation of the combinatorial protein complex prediction algorithm DACO that features a significantly reduced runtime. This improvement facilitates an application of the method for a large number of samples and the resulting sample-specific complexes can ultimately be assessed quantitatively with the novel differential protein complex analysis tool CompleXChange.

Dushyant MEHTA

Real-time 3D Human Body Pose Estimation from Monocular RGB Input
(Advisor: Prof. Christian Theobalt)
Monday, 14.09.2020, 14:00 h


Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture systems require extensive on-body, and/or external instrumentation, which is not only cumbersome but expensive as well, putting it out of the reach of most creators. The ubiquity and ease of deployment of RGB cameras makes monocular RGB based human motion capture an extremely useful problem to solve, which would lower the barrier-to-entry for content creators to employ motion capture tools, and enable newer applications of human motion capture. This thesis demonstrates the first real-time monocular RGB based motion-capture solutions that work in general scene settings. They are based on developing neural network based approaches to address the ill-posed problem of estimating 3D human pose from a single RGB image, in combination with model based fitting. In particular, the contributions of this work make advances towards three key aspects of real-time monocular RGB based motion capture, namely speed, accuracy, and the ability to work for general scenes.

Alexander LÜCK

Stochastic spatial modelling of DNA methylation patterns and moment-based parameter estimation
(Advisor: Prof. Verena Wolf)
Friday, 11.09.2020, 14:30 h


In this thesis, we introduce and analyze spatial stochastic models for DNA methylation, an epigenetic mark with an important role in development. The underlying mechanisms controlling methylation are only partly understood. Several mechanistic models of enzyme activities responsible for methylation have been proposed. Here, we extend existing hidden Markov models (HMMs) for DNA methylation by describing the occurrence of spatial methylation patterns with stochastic automata networks. We perform numerical analysis of the HMMs applied to (non-)hairpin bisulfite sequencing KO data and accurately predict the wild-type data from these results. We find evidence that the activities of Dnmt3a/b responsible for de novo methylation depend on the left but not on the right CpG neighbors.


Synthesizing Stream Control
(Advisor: Prof. Bernd Finkbeiner)
Tuesday, 08.09.2020, 14:00 h


The management of reactive systems requires controllers, which coordinate time, data streams, and data transformations, all joined by the high-level perspective of their control flow. This flow is required to drive the system correctly and continuously, which turns the development into a challenge. The process is error-prone, time-consuming, unintuitive, and costly. An attractive alternative is to synthesize the system instead, where only the desired behavior needs to be specified. We introduce Temporal Stream Logic (TSL), a logic which exclusively argues about the system’s control, while treating data and functional transformations as interchangeable black boxes. In TSL it is possible to specify control flow properties independently of the complexity of the handled data. Furthermore, with TSL a synthesis engine can check for realizability, even without a concrete implementation of the data transformations at hand. We present a modular development framework that uses synthesis to identify the high-level control. If successful, the created control flow is extended with concrete data transformations in order to be compiled into a final executable.
Nevertheless, current synthesis approaches cannot replace existing manual development workflows immediately. The developer still may create incomplete or faulty specifications at first, which need to be refined after a subsequent inspection. In the worst case, requirements are contradictory or miss important assumptions, resulting in unrealizable specifications. In both scenarios, additional feedback from the synthesis engine is needed to debug the errors. To this end, we also explore two further improvements. On the one hand, we consider output-sensitive synthesis metrics. On the other hand, we analyse the extension with delay, whose requirement is a frequent reason for unrealizability.


Privacy Enhancing Technologies: Protocol verification, implementation and specification
(Advisor: Prof. Michael Backes)
Monday, 07.09.2020, 11:00 h


In this thesis, we present novel methods for verifying, implementing, and specifying protocols. In particular, we focus on properties modeling data protection and the protection of privacy. In the first part of the thesis, the author introduces protocol verification and presents a model for verification that encompasses so-called Zero-Knowledge (ZK) proofs. These ZK proofs are a cryptographic primitive that is particularly suited for hiding information and hence serves the protection of privacy. The model presented here gives a list of criteria that allows the transfer of verification results from the model to the implementation if the criteria are met by the implementation. In particular, the criteria are less demanding than those of previous work on ZK proofs. The second part of the thesis contributes to the area of protocol implementations. Here, ZK proofs are used to improve multi-party computations. The third and last part of the thesis explains a novel approach for specifying data protection policies. Instead of relying on policies, this approach relies on actual legislation. The advantage of relying on legislation is that it often introduces a fair balancing of interests, which is typically not contained in regulations or policies.



Sequence to Graph Alignment: Theory, Practice and Applications
(Advisor: Prof. Tobias Marschall, now Uni Düsseldorf)
Tuesday, 25.08.2020, 10:00 h


Genome graphs, also called sequence graphs, provide a way of explicitly representing genomic variation and are used in several applications in bioinformatics. Due to the growing importance of sequence graphs, methods for handling graph-based data structures are becoming more important. In this work I examine the generalization of sequence alignment to graphs. First, I present a theoretical basis for quick bit-parallel sequence-to-graph alignment. The bit-parallel method outperforms previous algorithms by a factor of 3-21x in runtime, depending on graph topology. Next, I present GraphAligner, a practical tool for aligning sequences to graphs. GraphAligner enables sequence-to-graph alignment to scale to mammalian-sized genomes and is an order of magnitude faster than previous graph alignment tools. Finally, I present two applications in which GraphAligner is an essential component. First, AERON is a tool for quantifying RNA expression and detecting gene fusion events with long reads. Second, I present a hybrid graph-based genome assembly pipeline that uses novel methods to combine short and long read sequencing technologies.
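To make concrete what sequence-to-graph alignment computes, here is a plain dynamic-programming sketch for acyclic graphs (node labels and edges are hypothetical toy data; the thesis' bit-parallel method packs such DP cells into machine words and also handles cyclic graphs):

```python
from collections import defaultdict

def align_to_dag(query, node_labels, edges):
    """Semi-global alignment of `query` against the paths of a labelled DAG.
    node_labels: one character per node; edges: (u, v) pairs with u < v,
    i.e. the nodes are already topologically sorted. Returns the minimum
    edit distance between the query and the best-matching path."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    n, m = len(query), len(node_labels)
    INF = float("inf")
    prev = [0.0] * m          # row 0: free start, nothing consumed yet
    prev_start = 0.0          # virtual start node preceding every source
    for i in range(1, n + 1):
        cur = [INF] * m
        cur_start = float(i)  # virtual start: all i query chars inserted
        for v in range(m):    # topological order: predecessors are final
            sub = 0 if query[i - 1] == node_labels[v] else 1
            best = prev[v] + 1                       # insertion in query
            if preds[v]:
                for p in preds[v]:
                    best = min(best, prev[p] + sub,  # (mis)match
                               cur[p] + 1)           # node deletion
            else:
                best = min(best, prev_start + sub, cur_start + 1)
            cur[v] = best
        prev, prev_start = cur, cur_start
    return min(prev)          # free end: path may stop at any node
```

Free start and end make this a substring-style alignment, which is the mode used for mapping reads onto variation graphs.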


Understanding and Evolving the Rust Programming Language
(Advisor: Prof. Derek Dreyer)

interested parties should ask the Dean’s Office or their group leaders for details on day prior
Friday, 21.08.2020, 11:00 h


Rust is a young systems programming language that aims to fill the gap between high-level languages with strong safety guarantees and low-level languages that give up on safety in exchange for more control. Rust promises to achieve memory and thread safety while at the same time giving the programmer control over data layout and memory management. This dissertation presents two projects establishing the first formal foundations for Rust, enabling us to better understand and evolve this language: RustBelt and Stacked Borrows. RustBelt is a formal model of Rust’s type system, together with a soundness proof establishing memory and thread safety. The model is also able to verify some intricate APIs from the Rust standard library that internally encapsulate the use of unsafe language features. Stacked Borrows is a proposed extension of the Rust specification which enables the compiler to use Rust types to better analyze and optimize the code it is compiling. The adequacy of this specification is not only evaluated formally, but also by running real Rust code in an instrumented version of Rust’s Miri interpreter that implements the Stacked Borrows semantics. RustBelt is built on top of Iris, a language-agnostic framework for building higher-order concurrent separation logics. This dissertation gives an introduction to Iris and explains how to build complex high-level reasoning principles from a few simple ingredients. In RustBelt, this technique is used to introduce the lifetime logic, a separation logic account of borrowing, which plays a key role in the Rust type system.


Alexander GRESS

Integration of protein three-dimensional structure into the workflow of interpretation of genetic variants
(Advisor: Prof. Olga Kalinina)
Friday, 24.07.2020, 12:00 h


In this thesis I present my work on the integration of protein 3D structure information into the methodological workflow of computational biology. We developed an algorithmic pipeline that is able to map protein sequences to protein structures, providing an additional source of information. We used this pipeline to analyze the effects of genetic variants from the perspective of protein 3D structures. We analyzed genetic variants associated with diseases and compared their structural arrangements to those of neutral variants. Additionally, we discussed how structural information can improve methods that aim to predict the consequences of genetic variants.


Computational models of gene expression regulation
(Advisor: Prof. Marcel Schulz, now Frankfurt)
Tuesday, 21.07.2020, 10:00 h


To date, many efforts have been put into elucidating the genetic or epigenetic defects that result in various diseases. Even though much has been achieved through these efforts, there are still many unknowns about gene regulation. In this thesis, I describe our work on various problems in understanding gene regulation. In the first project we proposed a new approach for predicting transcription factor binding across different tissues and cell lines. Next, we focused on a special group of genes, namely bidirectional genes. By incorporating appropriate regularization into our statistical model, we were able to achieve interpretable predictive models linking histone modification data with gene expression. The two subsequent projects aimed at taking advantage of single-cell sequencing data. By investigating the single-cell expression patterns of bidirectional genes, we proposed a novel hypothetical model describing several transcription states existing in bidirectional genes. Finally, we developed a multi-task learning framework suitable for analyzing single-cell data, from which we were able to derive potential new discoveries by deciphering the complexities of the regulation mechanism.


Nephroblastoma in MRI Data
(Advisor: Prof. Joachim Weickert)
Thursday, 09.07.2020, 16:15 h


The main objective of this work is the mathematical analysis of nephroblastoma in MRI sequences. At the beginning we provide two different datasets for segmentation and classification. Based on the first dataset, we analyze the current clinical practice regarding therapy planning on the basis of annotations of a single radiologist. We can show with our benchmark that this approach is not optimal and that there may be significant differences between human annotators, even between radiologists. In addition, we demonstrate that the approximation of the tumor shape currently used is too coarse-grained and thus prone to errors. We address this problem and develop a method for interactive segmentation that allows an intuitive and accurate annotation of the tumor.
While the first part of this thesis is mainly concerned with the segmentation of Wilms’ tumors, the second part deals with the reliability of diagnosis and the planning of the course of therapy. The second dataset we compiled allows us to develop a method that dramatically improves the differential diagnosis between nephroblastoma and its precursor lesion nephroblastomatosis. Finally, we can show that even the standard MRI modality for Wilms’ tumors is sufficient to estimate the developmental tendencies of nephroblastoma under chemotherapy.

Yongqin XIAN

Learning from Limited Labeled Data – Zero-Shot and Few-Shot Learning
(Advisor: Prof. Bernt Schiele)
Tuesday, 07.07.2020, 16:00 h


Human beings have the remarkable ability to recognize novel visual concepts after observing only a few or even zero examples of them. Deep learning, however, often requires a large amount of labeled data to achieve good performance. Labeled instances are expensive, difficult and even infeasible to obtain because the distribution of training instances among labels naturally exhibits a long tail. Therefore, it is of great interest to investigate how to learn efficiently from limited labeled data. This thesis concerns an important subfield of learning from limited labeled data, namely, low-shot learning. The setting assumes the availability of many labeled examples from known classes and the goal is to learn novel classes from only a few (few-shot learning) or zero (zero-shot learning) training examples of them. To this end, we have developed a series of multi-modal learning approaches to facilitate the knowledge transfer from known classes to novel classes for a wide range of visual recognition tasks, including image classification, semantic image segmentation and video action recognition. More specifically, this thesis makes the following contributions. First, as there is no agreed-upon zero-shot image classification benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets. Second, in order to tackle the labeled-data scarcity, we propose feature generation frameworks that synthesize data in the visual feature space for novel classes. Third, we extend zero-shot learning and few-shot learning to the semantic segmentation task and propose a challenging benchmark for it. We show that incorporating semantic information into a semantic segmentation network is effective in segmenting novel classes. Finally, we develop better video representations for the few-shot video classification task and leverage weakly-labeled videos by an efficient retrieval method.
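In its simplest attribute-based form, zero-shot classification can be sketched as follows (all vectors here are invented toy data; the thesis' feature-generating frameworks go well beyond this compatibility baseline): an image feature, mapped into attribute space by some learned projection, is assigned to the unseen class whose attribute vector it is most compatible with.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def zero_shot_predict(projected_feature, class_attributes):
    """Pick the unseen class whose attribute vector has the highest
    dot-product compatibility with the projected image feature."""
    return max(class_attributes,
               key=lambda c: dot(projected_feature, class_attributes[c]))

# hypothetical attribute dimensions: (striped, four-legged, aquatic)
attrs = {"zebra": [1.0, 1.0, 0.0], "whale": [0.0, 0.0, 1.0]}
print(zero_shot_predict([0.9, 0.8, 0.1], attrs))
```

No labeled image of either class is needed at training time; the class-level attribute descriptions carry the knowledge transfer.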


Causal Inference on Discrete Data
(Advisor: Prof. Jilles Vreeken)
Monday, 06.07.2020, 14:00 h


Causal inference is one of the fundamental problems in science. A particularly interesting setting is to tell cause from effect between a pair of random variables X and Y given a sample from their joint distribution. For a long time, it was thought to be impossible to distinguish between the causal structures X → Y and Y → X from observational data, as the factorisation of the joint distribution is the same in both directions. In the past decade, researchers have made great strides in this direction by exploiting sophisticated properties of the joint distribution. Most of the existing methods, however, are for continuous real-valued data.
In the first part of the thesis, we consider bivariate causal inference on different discrete data settings: univariate i.i.d., univariate non-i.i.d., and multivariate i.i.d. pairs. To this end, we build upon the principle of algorithmic independence of conditionals (AIC), which states that the marginal distribution of the cause is algorithmically independent of the conditional distribution of the effect given the cause. However, as Kolmogorov complexity is not computable, we approximate the AIC from above through the statistically sound Minimum Description Length (MDL) principle. On univariate i.i.d. and non-i.i.d. pairs, where causal mechanisms are simple, we use refined MDL codes that are minimax optimal with respect to a model class. We resort to crude MDL codes on a pair of multivariate i.i.d. variables.
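For univariate i.i.d. pairs, the MDL instantiation of this idea can be sketched roughly as follows (a simplified illustration, not the thesis' exact codes or decision rules): compress X and then Y given X, compress Y and then X given Y, and prefer the direction with the shorter total codelength, using the multinomial stochastic complexity as a refined (NML) code.

```python
import math
from collections import Counter

def multinomial_complexity(n, k):
    """Normalizing constant of the multinomial NML code, computed with
    the Kontkanen-Myllymaki linear-time recurrence."""
    if n == 0 or k == 1:
        return 1.0
    c_prev = 1.0                                       # C(n, 1)
    c_curr = sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                 for h in range(n + 1))                # C(n, 2)
    for j in range(3, k + 1):                          # C(n, j)
        c_prev, c_curr = c_curr, c_curr + n * c_prev / (j - 2)
    return c_curr

def nml_codelen(values, k):
    """Stochastic-complexity codelength (in bits) of a discrete sequence
    over an alphabet of size k: empirical log-loss plus regret."""
    n = len(values)
    if n == 0:
        return 0.0
    loglik = -sum(c * math.log2(c / n) for c in Counter(values).values())
    return loglik + math.log2(multinomial_complexity(n, k))

def infer_direction(xs, ys):
    """Prefer the direction whose two-part description is shorter."""
    kx, ky = len(set(xs)), len(set(ys))
    def conditional(cause, effect, k_effect):
        groups = {}
        for c, e in zip(cause, effect):
            groups.setdefault(c, []).append(e)
        return sum(nml_codelen(g, k_effect) for g in groups.values())
    x_to_y = nml_codelen(xs, kx) + conditional(xs, ys, ky)
    y_to_x = nml_codelen(ys, ky) + conditional(ys, xs, kx)
    if x_to_y < y_to_x:
        return "X->Y"
    if y_to_x < x_to_y:
        return "Y->X"
    return "undecided"
```

Note that with plain empirical-entropy codes the two directions would always tie; the regret term of the refined code is what breaks the symmetry.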
Although useful, saying that there exists a causal relationship from a set of variables towards a certain variable of interest does not always fully satisfy one’s curiosity; for a domain expert it is of particular interest to know those conditions that are most effective, such as the combinations of drugs and their dosages that are most effective towards recovery. Motivated by this problem, in the second part of this thesis, we consider discovering statistically reliable causal rules from observational data. Overall, extensive evaluations show that methods proposed in this thesis are highly accurate, and discover meaningful causations from real-world data.

Jan-Hendrik LANGE

Multicut Optimization Guarantees & Geometry of Lifted Multicuts
(Advisor: Prof. Björn Andres, now Dresden)
Friday, 03.07.2020, 16:30 h


The multicut problem, a graph partitioning model purely based on similarities and dissimilarities defined on the edges, has gained attention in recent years in areas such as network analysis and computer vision. Its key feature is that the number of components of the partition need not be specified, but is determined as part of the optimization process given the edge information. In this thesis talk we address two challenges associated with this NP-hard combinatorial optimization problem.
Firstly, the methods that are commonly employed to solve practical multicut problems are fast heuristics, due to the large size of the instances. While the provided solutions are of high quality empirically, they come without any (non-trivial) guarantees, neither in terms of the objective value nor in terms of the variable values. In the first part of the talk we develop efficient algorithms as well as theory that both explain and exploit the structural simplicity of practical instances in order to compute non-trivial guarantees. Our results include lower bounds and parts of optimal solutions that are obtained efficiently as well as theoretical insights into the tightness of linear programming relaxations.
Secondly, multicuts only encode the same-cluster relationship for pairs of nodes that are adjacent in the underlying graph. A generalization of the multicut problem that makes it possible to model such inter-cluster connectivity for arbitrary pairs of nodes is known as the lifted multicut problem. In the second part of the talk, we study the (convex hull of the) feasible set, the lifted multicut polytope. We investigate under which conditions the fundamental inequalities associated with lifted multicuts define facets of the lifted multicut polytope. For the special case of trees, we further draw a connection to pseudo-Boolean optimization and give a tighter description of the lifted multicut polytope, which is exact when the tree is a simple path. For the moral lineage tracing problem, which is a closely related clustering formulation for joint segmentation and tracking of biological cells in images, we present a tighter description of the associated polytope and an improved branch-and-cut method.
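To make the model concrete, the following sketch (with hypothetical edge data) checks feasibility of an edge labelling and evaluates the multicut objective: after contracting all uncut edges, every cut edge must still join two different components, i.e. no cycle may cross the cut exactly once.

```python
def multicut_cost(edges, weights, cut):
    """Return the cost of a 0/1 edge labelling if it is a valid multicut
    of the graph, and None otherwise. weights[e] is the cost incurred
    when edge e is cut; cut[e] == 1 marks edge e as cut."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:          # path compression
            parent[x], x = root, parent[x]
        return root
    for (u, v), c in zip(edges, cut):
        if not c:
            parent[find(u)] = find(v)     # contract uncut edge
    if any(c and find(u) == find(v) for (u, v), c in zip(edges, cut)):
        return None                       # some cycle crosses the cut once
    return sum(w for w, c in zip(weights, cut) if c)
```

On a triangle, cutting a single edge is infeasible (its endpoints stay connected through the other two edges), while cutting two edges validly isolates one node.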


Inferring phylogenetic trees under the general Markov model via a minimum spanning tree backbone
(Advisor: Prof. Thomas Lengauer)
Thursday, 02.07.2020, 10:00 h


Phylogenetic trees are models of the evolutionary history of species. Phylogenetic trees are typically leaf-labeled, with all species placed on leaves, a model that may not be appropriate for rapidly evolving pathogens that may share ancestor-descendant relationships. In this thesis we model evolutionary relationships using so-called generally labeled phylogenetic trees that allow sampled species to be at interior nodes of the tree. The widely adopted strategy to infer phylogenetic trees involves computationally intensive optimization of continuous-time hidden Markov models (HMMs) that make unrealistic assumptions about gene evolution. We present a method called SEM-GM that performs phylogeny inference under a more realistic discrete-time HMM on trees known as the general Markov model. Additionally, we present a minimum spanning tree (MST) framework called MST-backbone that improves the scalability of SEM-GM and scales linearly with the number of leaves. However, the location of the root as inferred under the GM model is not realistic on empirical data, suggesting that the GM model may be overtrained. MST-backbone was inspired by the topological relationship between MSTs and phylogenetic trees. We discovered that this relationship does not necessarily hold if there is no unique MST. We therefore propose so-called vertex-order-based MSTs (VMSTs) that guarantee a topological relationship with phylogenetic trees.
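The backbone construction starts from a standard MST over pairwise distances, which can be sketched with Prim's algorithm (toy distance matrix; the vertex-order-based variant proposed in the thesis additionally constrains which of the possibly many MSTs is grown):

```python
def prim_mst(dist):
    """Grow a minimum spanning tree over points 0..n-1 given a full
    symmetric distance matrix; returns the tree as (u, v, weight) edges."""
    n = len(dist)
    edges = []
    best = {v: (dist[0][v], 0) for v in range(1, n)}  # cheapest link into tree
    while best:
        v = min(best, key=lambda u: best[u][0])
        w, u = best.pop(v)
        edges.append((u, v, w))
        for x in best:
            if dist[v][x] < best[x][0]:
                best[x] = (dist[v][x], v)
    return edges
```

When several edges tie in weight, the MST is not unique, which is exactly the situation in which the topological relationship to phylogenetic trees can break down.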


Andreas ABEL

Automatic Generation of Models of Microarchitecture
(Advisor: Prof. Jan Reineke)
Friday, 12.06.2020, 10:00 h


Detailed microarchitectural models are necessary to predict, explain, or optimize the performance of software running on modern microprocessors. Building such models often requires a significant manual effort, as the documentation provided by hardware manufacturers is typically not precise enough. The goal of this thesis is to develop techniques for generating microarchitectural models automatically.
In the first part, we focus on recent x86 microarchitectures. We implement a tool to accurately evaluate small microbenchmarks using hardware performance counters. We then describe techniques to automatically generate microbenchmarks for measuring the performance of individual instructions and for characterizing cache architectures. We apply our implementations to more than a dozen different microarchitectures. In the second part of the thesis, we study more general techniques to obtain models of hardware components. In particular, we propose the concept of gray-box learning, and we develop a learning algorithm for Mealy machines that exploits prior knowledge about the system to be learned. Finally, we show how this algorithm can be adapted to minimize incompletely specified Mealy machines, a well-known NP-complete problem. Our implementation outperforms existing exact minimization techniques by several orders of magnitude on a number of hard benchmarks; it is even competitive with state-of-the-art heuristic approaches.
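For completely specified machines, the classical polynomial-time counterpart of this minimization problem is partition refinement, sketched below on hypothetical data (the incompletely specified case tackled in the thesis is NP-complete and requires different machinery):

```python
def count_minimal_states(states, inputs, delta, out):
    """Minimise a completely specified Mealy machine by partition
    refinement: start from states with identical output rows, then keep
    splitting classes until successor classes agree too. Returns the
    number of states of the minimal machine. delta[(q, a)] is the
    successor of q on input a; out[(q, a)] is the produced output."""
    part = {q: tuple(out[(q, a)] for a in inputs) for q in states}
    while True:
        refined = {q: (part[q], tuple(part[delta[(q, a)]] for a in inputs))
                   for q in states}
        if len(set(refined.values())) == len(set(part.values())):
            return len(set(part.values()))
        part = refined
```

Gray-box learning speeds up the learning side of this picture by seeding the hypothesis with prior knowledge instead of starting from scratch.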


Multi-omics integrative analyses for decision support systems in personalized cancer treatment
(Advisor: Prof. Hans-Peter Lenhof)
Wednesday, 10.06.2020, 09:30 h


Cancer is a heterogeneous class of diseases caused by the complex interplay of (epi-)genetic aberrations and is difficult to treat due to its heterogeneity. In this thesis, we present a tool suite of novel methods and computational tools for the genetic and molecular characterization of tumors to support decision making in personalized cancer treatments. The first tool included in this tool suite is GeneTrail2, a web service for the statistical analysis of molecular signatures. While GeneTrail2 focuses on the evaluation of aberrant biological pathways and signal cascades, RegulatorTrail identifies those transcriptional regulators that appear to have a high impact on these pathogenic processes. With DrugTargetInspector (DTI), we focus specifically on personalized medicine and translational research. DTI offers comprehensive functions for the evaluation of target molecules, the associated signaling cascades, and the corresponding drugs. Finally, we present ClinOmicsTrailbc, an analysis tool for stratification of breast cancer treatments. ClinOmicsTrailbc supports clinicians with a comprehensive evaluation of on- and off-label drugs as well as immune therapies. In summary, this work presents novel methods and computational tools for the integrative analysis of multi-omics data for translational research and clinical decision support, assisting researchers and clinicians in finding the best possible treatment options in a deeply personalized way.


Towards Efficient Analysis of Markov Automata
(Advisor: Prof. Holger Hermanns)
Thursday, 04.06.2020, 17:00 h


One of the most expressive formalisms for modeling concurrent systems is that of Markov automata. They serve as a semantics for many higher-level formalisms, such as generalised stochastic Petri nets and dynamic fault trees. Two of the most challenging problems for Markov automata to date are (i) the optimal time-bounded reachability probability and (ii) the optimal long-run average rewards. In this thesis, we aim at designing efficient sound techniques to analyse them. We approach the problem of time-bounded reachability from two different angles. First, we study the properties of the optimal solution and exploit this knowledge to construct an efficient algorithm that approximates the optimal values up to a guaranteed error bound. This algorithm is exhaustive, i.e., it computes values for each state of the Markov automaton. This may be a limitation for very large or even infinite Markov automata. To address this issue we design a second algorithm that approximates the optimal solution by only working with part of the total state space. For the problem of long-run average rewards there exists a polynomial algorithm based on linear programming. Instead of chasing a better theoretical complexity bound, we search for a practical solution based on an iterative approach. We design a value iteration algorithm that, in our empirical evaluation, turns out to scale several orders of magnitude better than the linear-programming-based approach.
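The flavour of such sound iterative analysis can be conveyed by a two-sided value iteration for reachability in a plain Markov chain (a toy setting; the thesis handles the much richer Markov automata case): iterate a lower bound from 0 and an upper bound from 1, and stop only once the gap certifies the error.

```python
def reach_prob_with_bounds(trans, goal, sink, eps=1e-9):
    """Two-sided value iteration on a finite Markov chain.
    trans: {state: [(prob, successor), ...]}, with empty lists for
    absorbing states; goal/sink are the absorbing target/losing sets.
    Assumes every state eventually reaches goal or sink, so both bounds
    converge. Returns (lower, upper) maps with upper - lower <= eps."""
    absorbing = goal | sink
    lo = {s: 1.0 if s in goal else 0.0 for s in trans}
    hi = {s: 0.0 if s in sink else 1.0 for s in trans}
    while max(hi[s] - lo[s] for s in trans) > eps:
        lo = {s: lo[s] if s in absorbing else
              sum(p * lo[t] for p, t in trans[s]) for s in trans}
        hi = {s: hi[s] if s in absorbing else
              sum(p * hi[t] for p, t in trans[s]) for s in trans}
    return lo, hi
```

Unlike plain value iteration from below, the pair of bounds yields a guaranteed error bound at termination, which is the soundness property the thesis is after.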


Snapshot: Friend or Foe of Data Management — On Optimizing Transaction Processing in Database and Blockchain Systems
(Advisor: Prof. Jens Dittrich)
Wednesday, 03.06.2020, 14:00 h


Data management is a complicated task. Due to the wide range of data management tasks, businesses often need a sophisticated data management infrastructure with a plethora of distinct systems to fulfill their requirements. Moreover, since snapshots are an essential ingredient in solving many data management tasks such as checkpointing and recovery, they have been widely exploited in almost all major data management systems that have appeared in recent years. However, snapshots do not always guarantee exceptional performance. In this dissertation, we will see two different faces of the snapshot: one where it has a tremendous positive impact on the performance and usability of the system, and another where incorrect usage of the snapshot can have a significant negative impact on the performance of the system. This dissertation consists of three loosely-coupled parts that represent three distinct projects that emerged during this doctoral research.
In the first part, we analyze the importance of utilizing snapshots in relational database systems. We identify the bottlenecks in state-of-the-art snapshotting algorithms, propose two snapshotting techniques, and optimize the multi-version concurrency control for handling hybrid workloads effectively. Our snapshotting algorithm is up to 100x faster and reduces the latency of analytical queries by up to 4x in comparison to the state-of-the-art techniques. In the second part, we recognize strict snapshotting used by Fabric as a critical bottleneck, and replace it with MVCC and propose some additional optimizations to improve the throughput of the permissioned-blockchain system by up to 12x under highly contended workloads. In the last part, we propose ChainifyDB, a platform that transforms an existing database infrastructure into a blockchain infrastructure. ChainifyDB achieves up to 6x higher throughput in comparison to another state-of-the-art permissioned blockchain system. Furthermore, its external concurrency control protocol outperforms the internal concurrency control protocol of PostgreSQL and MySQL, achieving up to 2.6x higher throughput in a blockchain setup in comparison to a standalone isolated setup. We also utilize snapshots in ChainifyDB to support recovery, which has been missing so far from the permissioned-blockchain world.
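The positive face of snapshots, readers that never block writers, can be sketched with a toy multi-version store (a hypothetical API; real MVCC engines add garbage collection, write-conflict handling, and much more):

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writers append timestamped versions and
    readers see a consistent snapshot as of their start time, so long
    analytical reads proceed without blocking concurrent updates."""

    def __init__(self):
        self.versions = {}                # key -> [(ts, value), ...]
        self.clock = itertools.count(1)   # monotonically increasing time

    def write(self, key, value):
        ts = next(self.clock)
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        return next(self.clock)           # read timestamp

    def read(self, key, ts):
        # newest version that is not younger than the snapshot
        for vts, val in reversed(self.versions.get(key, [])):
            if vts <= ts:
                return val
        return None
```

A reader holding an old snapshot keeps seeing the old value even after a later write, which is exactly the isolation property the analytical queries in the first part rely on.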


Alexander HEWER

Registration and Statistical Analysis of the Tongue Shape during Speech Production
(Advisor: Dr. Ingmar Steiner)
Wednesday, 27.05.2020, 16:15 h


The human tongue plays an important role in everyday life. In particular, it contributes to speech production by interacting with other articulators, like the palate and the teeth. Thus, it is of great interest to understand its shape and motion during this process. Magnetic resonance imaging (MRI) is nowadays regarded as the state-of-the-art modality for analyzing the vocal tract during speech production.
Results of such an analysis can be used to construct statistical tongue models that are able to synthesize realistic tongue motion and shape.
In my defense talk, I will present some highlights of my research that focused on deriving such models from static MRI data. For example, I will show how MRI scans can be denoised for further processing by using a diffusion based approach. We will also learn how image segmentation and template matching can help to estimate the tongue shape information from such data.
Finally, we will see how a statistical tongue model can be utilized to reconstruct the whole three-dimensional tongue shape from sparse motion capture data.



A type-theory for higher-order amortized analysis
(Advisor: Prof. Deepak Garg)
Wednesday, 15.04.2020, 14:00 h


Amortized analysis is a popular algorithmic technique used to analyze the resource consumption of a program. It works by maintaining a ghost state called the "potential" and showing that the available potential is sufficient to account for the resource consumption of the program.
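A classic worked instance of the potential method, phrased as executable bookkeeping (the doubling array is a standard textbook example, not one taken from the thesis): with potential Phi = 2*size - capacity, every push of a doubling dynamic array has amortized cost exactly 3, even though an individual push may copy the whole array.

```python
def push_costs(n):
    """Potential-method bookkeeping for a doubling dynamic array.
    Actual cost of a push is 1, plus `cap` copies on overflow; with
    Phi = 2 * size - cap, the amortized cost (actual + delta Phi) of
    every push works out to exactly 3."""
    size, cap = 0, 1
    phi = 2 * size - cap
    amortized = []
    for _ in range(n):
        actual = 1
        if size == cap:          # overflow: copy everything, then double
            actual += cap
            cap *= 2
        size += 1
        new_phi = 2 * size - cap
        amortized.append(actual + new_phi - phi)
        phi = new_phi
    return amortized
```

Type-theoretic amortized analysis, as in λ-amor, tracks exactly this kind of potential, but statically, at the level of types.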
In this thesis we present λ-amor, a type theory for amortized resource analysis of higher-order functional programs. Verification in λ-amor is based on a novel type-theoretic construct for representing potential at the level of types, around which an affine type theory is built. With λ-amor we show that type-theoretic amortized analysis can be performed using well-understood concepts from sub-structural and modal type theories. Yet, it yields an extremely expressive framework which can be used for resource analysis of higher-order programs, in both a strict and a lazy setting. We show embeddings of two very different styles (one based on "effects" and the other on "coeffects") of type-theoretic resource analysis frameworks into λ-amor. We show that λ-amor is sound (using a logical relations model) and complete for cost analysis of PCF programs (using one of the embeddings).
Next, we apply ideas from λ-amor to develop another type theory (called λ-cg) for a very different domain: Information Flow Control (IFC). λ-cg uses a similar type-theoretic construct (which λ-amor uses for the potential) to represent the confidentiality label (the ghost state for IFC).
Finally, we abstract away from the specific ghost states (potential and confidentiality label) and describe how to develop a type-theory for a general ghost state with a monoidal structure.


Cross-layer Latency-Aware and -Predictable Data Communication
(Advisor: Prof. Thorsten Herfet)

Interested parties should ask the Dean’s Office or their group leaders for details on the day prior.
Tuesday, 07.04.2020, 14:00 h


Cyber-physical systems are making their way into more aspects of everyday life. These systems are increasingly distributed and hence require networked communication to fulfil control tasks cooperatively. Providing this in a robust and resilient manner demands latency-awareness and -predictability at all layers of the communication and computation stack. This thesis addresses how these two latency-related properties can be implemented at the transport layer to serve control applications in ways that traditional approaches such as TCP or RTP cannot. To this end, the Predictably Reliable Real-time Transport (PRRT) protocol is presented, including its unique features (e.g. partially reliable, ordered, in-time delivery, and latency-avoiding congestion control) and unconventional APIs. This protocol has been intensively evaluated using the X-Lap toolkit, which has been specifically developed to support protocol designers in improving the latency, timing, and energy characteristics of protocols in a cross-layer, intra-host fashion. PRRT effectively circumvents latency-inducing bufferbloat using X-Pace, an implementation of the cross-layer pacing approach presented in this thesis. This is shown using experimental evaluations on real Internet paths. Apart from PRRT, this thesis presents means to make TCP-based transport aware of individual link latencies and to increase the predictability of end-to-end delays using Transparent Transmission Segmentation.


Kathrin STARK

Mechanising Syntax with Binders in Coq
(Advisor: Prof. Gert Smolka)

Friday, 14.02.2020, 15:00 h, in building E1 7, room 0.01


Mechanising binders in general-purpose proof assistants such as Coq is cumbersome and difficult. Yet binders, substitutions, and instantiation of terms with substitutions are a critical ingredient of many programming languages. Any practicable mechanisation of the meta-theory of the latter hence requires a lean formalisation of the former. We investigate the topic from three angles:
First, we realise formal systems with binders based on both pure and scoped de Bruijn algebras together with basic syntactic rewriting lemmas and automation. We automate this process in a compiler called Autosubst; our final tool supports many-sorted, variadic, and modular syntax. Second, we justify our choice of realisation and mechanise a proof of convergence of the sigma-calculus, a calculus of explicit substitutions that is complete for equality of the de Bruijn algebra corresponding to the lambda-calculus.
Third, to demonstrate the practical usefulness of our approach, we provide concise, transparent, and accessible mechanised proofs for a variety of case studies refined to de Bruijn substitutions.
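For readers unfamiliar with de Bruijn representations, here is a minimal illustrative sketch (not Autosubst itself, which targets Coq) of pure de Bruijn terms for the untyped lambda-calculus, together with the shift and substitution operations that such tools generate and prove rewriting lemmas about:

```python
# De Bruijn terms: variables are indices counting enclosing binders.
from dataclasses import dataclass

@dataclass
class Var: idx: int
@dataclass
class Lam: body: object
@dataclass
class App: fn: object; arg: object

def shift(t, d, cutoff=0):
    """Shift free variables (indices >= cutoff) by d."""
    if isinstance(t, Var):
        return Var(t.idx + d) if t.idx >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fn, d, cutoff), shift(t.arg, d, cutoff))

def subst(t, j, s):
    """Substitute s for variable j in t, adjusting s under binders."""
    if isinstance(t, Var):
        return s if t.idx == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    return App(subst(t.fn, j, s), subst(t.arg, j, s))

def beta(redex):
    """One beta step: (Lam b) a  -->  b[0 := a], with index bookkeeping."""
    assert isinstance(redex, App) and isinstance(redex.fn, Lam)
    return shift(subst(redex.fn.body, 0, shift(redex.arg, 1)), -1)

# (\x. \y. x) z  -->  \y. z,  where z is the free variable with index 0
k = Lam(Lam(Var(1)))
assert beta(App(k, Var(0))) == Lam(Var(1))  # z is index 1 under the binder
```

The delicate index bookkeeping in `shift` and `subst` is exactly the kind of boilerplate that a mechanisation must get right once and for all, which motivates generating it automatically.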

Abhimitra MEKA

Live Inverse Rendering
(Advisor: Prof. Christian Theobalt)
Monday, 03.02.2020, 13:45 h, in building E1 4, room 0.19


Seamlessly integrating graphics into real environments requires the estimation of the fundamental light transport components of a scene – geometry, reflectance and illumination. While the theory and practice of estimating environmental geometry and self-localization on mobile devices have progressed rapidly, the task of estimating scene reflectance and illumination from monocular images or videos in real time (termed live inverse rendering) is still at a nascent stage. The challenge is that of designing efficient representations and models for these appearance parameters and solving the resulting high-dimensional, non-linear and under-constrained system of equations at frame rate.
This thesis talk will comprehensively explore various representations, formulations, algorithms and systems for addressing these challenges in monocular inverse rendering. Starting with simple assumptions on the light transport model – of Lambertian surface reflectance and a single light bounce scenario – the talk will expand in several directions by including 3D geometry, multiple light bounces, non-Lambertian isotropic surface reflectance and data-driven reflectance representation to address various facets of this problem. In the first part, the talk will explore the design of fast parallel non-linear GPU optimization schemes for solving both sparse and dense sets of equations underlying the inverse rendering problem. In the next part, current advances in machine learning methods will be applied to design novel formulations and loss-energies to give a significant push to the state of the art of reflectance and illumination estimation. Several real-time applications of illumination-aware scene editing, including relighting and material-cloning, will also be shown to be possible for the first time by the new models proposed in this thesis.


Mathias FLEURY

Formalization of Logical Calculi in Isabelle/HOL
(Advisor: Prof. Christoph Weidenbach)
Tuesday, 28.01.2020, 10:00 h, in building E1 4, room 0.24


I develop a formal framework for propositional satisfiability with the conflict-driven clause learning (CDCL) procedure using the Isabelle/HOL proof assistant. The framework offers a convenient way to prove metatheorems and experiment with variants, including the Davis-Putnam-Logemann-Loveland procedure. The most noteworthy aspects of my work are the inclusion of rules for forget and restart, and the refinement approach.
Through a chain of refinements, I connect the abstract CDCL calculus first to a more concrete calculus, then to a SAT solver expressed in a simple functional programming language, and finally to a SAT solver in an imperative language, with total correctness guarantees. The imperative version relies on the two-watched-literal data structure and other optimizations found in modern solvers.

Maximilian JOHN

Of Keyboards and Beyond – Optimization in Human-Computer-Interaction
(Advisor: Dr. Andreas Karrenbauer)
Wednesday, 15.01.2020, 16:30 h, in building E1 4, room 0.24


This talk covers optimization challenges in the context of human-computer interaction, specifically in UI design. In particular, we focus on the design of the new French keyboard layout, discuss the special challenges of this national-scale project and our algorithmic contributions. Exploiting the special structure of this design problem, we propose an optimization framework that efficiently computes keyboard layouts and provides very good optimality guarantees in the form of tight lower bounds. The optimized layout, which we showed to be nearly optimal, was the basis of the new French keyboard standard recently published in the National Assembly in Paris. Finally, we introduce a modeling language for mixed integer programs that especially focuses on the challenges and features that appear in participatory optimization problems similar to the French keyboard design process.


3D Exemplar-based Image Inpainting in Electron Microscopy
(Advisor: Prof. Philipp Slusallek)
Monday, 13.01.2020, 14:00 h, in building D3 4 (DFKI), VisCenter room


In electron microscopy (EM), a common problem is the non-availability of data, which causes artefacts in reconstructions. The goal of this thesis is to generate artificial data where it is missing in EM by using exemplar-based inpainting (EBI). We implement an accelerated 3D version tailored to applications in EM, which reduces reconstruction times from days to minutes.
We develop intelligent sampling strategies to find optimal data as input for reconstruction methods. Further, we investigate approaches to reduce electron dose and acquisition time. Sparse sampling followed by inpainting is the most promising approach. As common evaluation measures may lead to misinterpretation of results in EM and falsify a subsequent analysis, we propose to use application driven metrics and demonstrate this in a segmentation task.
A further application of our technique is the artificial generation of projections in tilt-based EM. EBI is used to generate missing projections, such that the full angular range is covered. Subsequent reconstructions are significantly enhanced in terms of resolution, which facilitates further analysis of samples.
In conclusion, EBI proves promising when used as an additional data-generation step to tackle the non-availability of data in EM, as evaluated in selected applications. Enhancing adaptive sampling methods and refining EBI, especially considering their mutual influence, enables higher throughput in EM using a lower electron dose while preserving quality.



Martin WEIER

Perception-driven Rendering: Techniques for the Efficient Visualization of 3D Scenes including View- and Gaze-contingent Approaches
(Advisor: Prof. Philipp Slusallek)

Thursday, 19.12.2019, 15:00 h, in building D3 4 (DFKI), VisCenter room


Modern image synthesis approaches make it possible to create digital images of astonishing complexity and beauty, yet processing resources remain a limiting factor. At the same time, advances in display technologies drive the development of novel display devices, thereby further tightening the requirements on rendering approaches. Rendering efficiency is a central challenge involving a trade-off between visual fidelity and interactivity. However, the human visual system has some limitations, and the highest possible visual quality is not always necessary. Knowledge of those limitations can be used to develop better and more efficient rendering systems, a field known as perception-driven rendering. The central question in this thesis is how to exploit the limitations or use the potentials of perception to enhance the quality of different rendering techniques whilst maintaining their performance, and vice versa. To this end, this thesis presents state-of-the-art research and models that exploit the limitations of perception both to increase visual quality and to reduce workload. This research results in several practical rendering approaches that tackle some of the fundamental challenges of computer graphics. Here, different sampling, filtering, and reconstruction techniques aid the visual quality of the synthesized images. An in-depth evaluation of the presented systems, including benchmarks, comparative examination with image metrics, as well as user studies and experiments, demonstrates that the methods introduced are visually superior to or on the same qualitative level as ground truth, whilst having a significantly reduced run time.

Hyeongwoo KIM
Learning-based Face Reconstruction and Editing
(Advisor: Prof. Christian Theobalt)

Thursday, 12.12.2019, 16:15 h, in building E1 4, room 0.19


Photo-realistic face editing – an important basis for a wide range of applications in movie and game productions, and applications for mobile devices – is based on computationally expensive algorithms that often require many tedious time-consuming manual steps. This thesis advances state-of-the-art face performance capture and editing pipelines by proposing machine learning-based algorithms for high-quality inverse face rendering in real time and highly realistic neural face rendering, and a video-based refocusing method for faces and general videos. In particular, the proposed contributions address fundamental open challenges towards real-time and highly realistic face editing. The first contribution addresses face reconstruction and introduces a deep convolutional inverse rendering framework that jointly estimates all facial rendering parameters from a single image in real time. The proposed method is based on a novel boosting process that iteratively updates the synthetic training data to better reflect the distribution of real-world images. Second, the thesis introduces a method for face video editing at previously unseen quality. It is based on a generative neural network with a novel space-time architecture, which enables photo-realistic re-animation of portrait videos using an input video. It is the first method to transfer the full 3D head position, head rotation, face expression, eye gaze and eye blinking from a source actor to a portrait video of a target actor. Third, the thesis contributes a new refocusing approach for faces and general videos in postprocessing. The proposed algorithm is based on a new depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video and the focus distance for each frame. 
The high-quality results shown with various applications and challenging scenarios demonstrate the contributions presented in the thesis, and also show potential for machine learning-driven algorithms to solve various open problems in computer graphics.

Search and Analytics Using Semantic Annotations
(Advisor: Dr. Klaus Berberich)

Thursday, 12.12.2019, 14:00 h, in building E1 4, room 0.24


Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user’s information need. To discern the user’s intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy.
This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections.
This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.

Decision Procedures for Linear Arithmetic
(Advisors: Prof. Christoph Weidenbach and Dr. Thomas Sturm, now LORIA, Nancy)

Tuesday, 10.12.2019, 14:00 h, in building E1 5, room 002


In this thesis, we present new decision procedures for linear arithmetic in the context of SMT solvers and theorem provers:
1) CutSAT++, a calculus for linear integer arithmetic that combines techniques from SAT solving and quantifier elimination in order to be sound, terminating, and complete.
2) The largest cube test and the unit cube test, two sound (although incomplete) tests that find integer and mixed solutions in polynomial time. The tests are especially efficient on absolutely unbounded constraint systems, which are difficult to handle for many other decision procedures.
3) Techniques for the investigation of equalities implied by a constraint system. Moreover, we present several applications for these techniques.
4) The Double-Bounded reduction and the Mixed-Echelon-Hermite transformation, two transformations that reduce any constraint system in polynomial time to an equisatisfiable constraint system that is bounded. The transformations are beneficial because they turn branch-and-bound into a complete and efficient decision procedure for unbounded constraint systems.
We have implemented the above decision procedures (except for CutSAT++) as part of our linear arithmetic theory solver SPASS-IQ and as part of our CDCL(LA) solver SPASS-SATT. We also present various benchmark evaluations that confirm the practical efficiency of our new decision procedures.
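The key fact behind the unit cube test can be checked concretely (a hedged illustration, not the SPASS-IQ implementation, with hypothetical numbers): if a rational point satisfies every constraint a·x ≤ b_i with an extra slack of half the 1-norm of the coefficient row a, then a unit cube centered at that point fits inside the polyhedron, so rounding the point coordinate-wise yields an integer solution of the original system.

```python
# Checking the rounding lemma of the unit cube test on a toy system.
from fractions import Fraction

def satisfies(A, b, x, margin=False):
    """Does x satisfy A x <= b (optionally shrunk by half the 1-norm of each row)?"""
    for a, bi in zip(A, b):
        bound = bi - Fraction(sum(abs(c) for c in a), 2) if margin else bi
        if sum(c * xi for c, xi in zip(a, x)) > bound:
            return False
    return True

def round_nearest(x):
    return [Fraction(round(xi)) for xi in x]

# Hypothetical system: x + y <= 4, -x <= 0, -y <= 0
A = [[1, 1], [-1, 0], [0, -1]]
b = [Fraction(4), Fraction(0), Fraction(0)]
x = [Fraction(3, 2), Fraction(3, 2)]          # a rational interior point

assert satisfies(A, b, x, margin=True)        # x fits with the cube margin...
assert satisfies(A, b, round_nearest(x))      # ...so its rounding is an integer solution
```

Rounding moves each coordinate by at most 1/2, so the value of a·x changes by at most half the 1-norm of a, which is exactly the slack the shrunk system reserves.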

Felipe Fernandes ALBRECHT
Analyzing Epigenomic Data in a Large-Scale Context
(Advisor: Prof. Thomas Lengauer)

Monday, 09.12.2019, 16:15 h, in building E1 4, room 0.23


While large amounts of epigenomic data are publicly available, their retrieval in a form suitable for downstream analysis is a bottleneck in current research. In a typical analysis, users are required to download huge files that span the entire genome, even if they are only interested in a small subset or an aggregation thereof. The DeepBlue Epigenomic Data Server mitigates this issue by providing a robust server that affords a powerful API for searching, filtering, transforming, aggregating, enriching, and downloading data from several epigenomic consortia. This work also presents companion tools that utilize the DeepBlue API to enable users not proficient in programming to analyze epigenomic data: (i) an R/Bioconductor package that integrates DeepBlue into the R analysis workflow; (ii) a web portal that enables users to search, select, filter, and download the epigenomic data available in the DeepBlue Server; (iii) DIVE, a web data analysis tool that allows researchers to perform large-scale epigenomic data analysis in a programming-free environment. Furthermore, these tools are integrated and capable of sharing results among themselves, creating a powerful large-scale epigenomic data analysis environment.

Model Counting for Reactive Systems
(Advisor: Prof. Bernd Finkbeiner)

Thursday, 05.12.2019, 15:00 h, in building E1 7, room 0.01


Model counting is the problem of computing the number of solutions for a logical formula. In the last few years, it has been primarily studied for propositional logic, and has been shown to be useful in many applications. In planning, for example, propositional model counting has been used to compute the robustness of a plan in an incomplete domain. In information-flow control, model counting has been applied to measure the amount of information leaked by a security-critical system.
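As a toy illustration (not from the thesis), propositional model counting can be realized by brute-force enumeration of assignments; real #SAT solvers are vastly more sophisticated, but the problem statement is just this:

```python
# A brute-force propositional model counter over CNF formulas.
from itertools import product

def count_models(clauses, n):
    """clauses: list of clauses; literal +i / -i means variable i true / false."""
    count = 0
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

# (x1 or x2) and (not x1 or x3) over 3 variables has 4 models:
# 010, 011, 101, 111
assert count_models([[1, 2], [-1, 3]], 3) == 4
```

Counting generalizes satisfiability: the formula is satisfiable exactly when the count is nonzero, and the count itself supports the quantitative applications mentioned above.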
In this thesis, we introduce the model counting problem for linear-time properties, and show its applications in formal verification. In the same way propositional model counting generalizes the satisfiability problem for propositional logic, counting models for linear-time properties generalizes the emptiness problem for languages over infinite words to one that asks for the number of words in a language. The model counting problem, thus, provides a foundation for quantitative extensions of model checking, where not only the existence of computations that violate the specification is determined, but also the number of such violations.
We solve the model counting problem for the prominent class of omega-regular properties. We present algorithms for solving the problem for different classes of properties, and show the advantages of our algorithms in comparison to indirect approaches based on encodings into propositional logic.
We further show how model counting can be used for solving a variety of quantitative problems in formal verification, including quantitative information-flow in security-critical systems, and the synthesis of approximate implementations for reactive systems.

Yang HE
Improved Methods and Analysis for Semantic Image Segmentation
(Advisor: Prof. Mario Fritz)

Tuesday, 03.12.2019, 15:00 h, in building E1 4, room 0.24


Modern deep learning has enabled amazing developments in computer vision in recent years. As a fundamental task, semantic segmentation aims to predict class labels for each pixel of an image, which empowers machine perception of the visual world. In spite of the recent successes of fully convolutional networks, several challenges remain to be addressed. In this work, we focus on this topic under different kinds of input formats and various types of scenes. Specifically, our study covers two aspects:
(1) Data-driven neural modules for improved performance.
(2) Leveraging datasets to train systems with higher performance and better data-privacy guarantees.

Digital Fabrication of Custom Interactive Objects with Rich Materials
(Advisor: Prof. Jürgen Steimle)

Tuesday, 03.12.2019, 14:00 h, in building E1 7, room 0.01


This thesis investigates digital fabrication as a key technology for prototyping physical user interfaces with custom interactivity. It contributes four novel design and fabrication approaches for interactive objects that leverage rich materials of everyday objects. The contributions enable easy, accessible, and versatile design and fabrication of interactive objects with custom stretchability, input and output on complex geometries and diverse materials, tactile output on 3D-object geometries, and capabilities of changing their shape and material properties.
Together, the contributions of this thesis advance the fields of digital fabrication, rapid prototyping, and ubiquitous computing towards the bigger goal of exploring interactive objects with rich materials as a new generation of physical interfaces.

Andreas SCHMID
Plane and Sample: Using Planar Subgraphs for Efficient Algorithms
(Advisor: Prof. Kurt Mehlhorn)

Monday, 02.12.2019, 11:00 h, in building E1 4, room 0.24


In my thesis, we showcased how planar subgraphs with special structural properties can be used to find efficient algorithms for two NP-hard problems in combinatorial optimization.
In this talk I will describe how we developed algorithms for the computation of Tutte paths and show how these special subgraphs can be used to efficiently compute long cycles and other relaxations of Hamiltonicity if we restrict the input to planar graphs.
In addition, I will give an introduction to the Maximum Planar Subgraph Problem (MPS) and show how dense planar subgraphs can be used to develop new approximation algorithms for this NP-hard problem. All new algorithms and arguments I present are based on a novel approach that focuses on maximizing the number of triangular faces in the computed subgraph.


Kashyap POPAT
Credibility Analysis of Textual Claims with Explainable Evidence
(Advisor: Prof. Gerhard Weikum)

Tuesday, 26.11.2019, 16:00 h, in building E1 5, room 0.29


Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not.
To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources.
We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and to dissect the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment supports or opposes the claim.

Cryptography for bitcoin and friends
(Advisor: Prof. Aniket Kate, now Purdue)

Thursday, 21.11.2019, 16:00 h, in building E1 7, room 0.01


Numerous cryptographic extensions to Bitcoin have been proposed since Satoshi Nakamoto introduced the revolutionary design in 2008. However, only a few proposals have been adopted in Bitcoin and other prevalent cryptocurrencies, whose resistance to fundamental changes has proven to grow with their success. In this dissertation, we introduce four cryptographic techniques that advance the functionality and privacy provided by Bitcoin and similar cryptocurrencies without requiring fundamental changes in their design: First, we realize smart contracts that disincentivize parties in distributed systems from making contradicting statements by penalizing such behavior with the loss of funds in a cryptocurrency. Second, we propose CoinShuffle++, a coin mixing protocol which improves the anonymity of cryptocurrency users by combining their transactions, thereby making it harder for observers to trace those transactions. The core of CoinShuffle++ is DiceMix, a novel and efficient protocol for broadcasting messages anonymously without the help of any trusted third-party anonymity proxies and in the presence of malicious participants. Third, we combine coin mixing with the existing idea of hiding payment values in homomorphic commitments to obtain the ValueShuffle protocol, which enables us to overcome major obstacles to the practical deployment of coin mixing protocols. Fourth, we show how to prepare the aforementioned homomorphic commitments for a safe transition to post-quantum cryptography.

Assessing the Security of Hardware-Assisted Isolation Techniques
(Advisor: Prof. Christian Rossow)

Thursday, 21.11.2019, 11:00 h, in building E9 1, room 0.01


Modern computer systems execute a multitude of different processes on the same hardware. To guarantee that these processes do not interfere with one another, either accidentally or maliciously, modern processors implement various isolation techniques, such as process, privilege, and memory isolation. Consequently, these isolation techniques constitute a fundamental requirement for a secure computer system.
This dissertation investigates the security guarantees of various isolation techniques that are used by modern computer systems. To this end, we present the isolation challenges from three different angles. First, we target fine-grained memory isolation that is used by code-reuse mitigation techniques to thwart attackers with arbitrary memory read vulnerabilities. We demonstrate that dynamically generated code can be used to undermine such memory isolation techniques and, ultimately, propose a way to improve the mitigation. Second, we study side effects of legitimate hardware features, such as speculative execution, and show how they can be leveraged to leak sensitive data across different security contexts, thereby breaking process and memory isolation techniques. Finally, we demonstrate that in the presence of a faulty hardware implementation, all isolation guarantees can be broken. We do this by using a novel microarchitectural issue—discovered by us—to leak arbitrary memory contents across all security contexts.


Symbolic Reactive Synthesis
(Advisor: Prof. Bernd Finkbeiner)

Friday, 15.11.2019, 16:00 h, in building E1 7, room 0.01


In this thesis, we develop symbolic algorithms for the synthesis of reactive systems. Synthesis, that is the task of deriving correct-by-construction implementations from formal specifications, has the potential to eliminate the need for the manual—and error-prone—programming task. The synthesis problem can be formulated as an infinite two-player game, where the system player has the objective to satisfy the specification against all possible actions of the environment player. The standard synthesis algorithms represent the underlying synthesis game explicitly and, thus, they scale poorly with respect to the size of the specification.
We provide an algorithmic framework to solve the synthesis problem symbolically. In contrast to the standard approaches, we use a succinct representation of the synthesis game which leads to improved scalability in terms of the symbolically represented parameters. Our algorithm reduces the synthesis game to the satisfiability problem of quantified Boolean formulas (QBF) and dependency quantified Boolean formulas (DQBF). In the encodings, we use propositional quantification to succinctly represent different parts of the implementation, such as the state space and the transition function.
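To make the target problem concrete, here is a toy brute-force evaluator for prenex QBF (illustrative only; the thesis develops highly optimized solving algorithms, and real solvers never enumerate like this):

```python
# Evaluating a prenex quantified Boolean formula by exhaustive branching.
def qbf_true(prefix, matrix, env=()):
    """prefix: list of ('A'|'E', var_name); matrix: dict-of-assignments -> bool."""
    if not prefix:
        return matrix(dict(env))
    q, v = prefix[0]
    branches = (qbf_true(prefix[1:], matrix, env + ((v, b),))
                for b in (False, True))
    # 'A' (forall) needs both branches; 'E' (exists) needs at least one.
    return all(branches) if q == 'A' else any(branches)

# forall x exists y. (x xor y)  is true: y can always be chosen as not x.
assert qbf_true([('A', 'x'), ('E', 'y')], lambda e: e['x'] != e['y'])
# exists y forall x. (x xor y)  is false: no single y works for both x.
assert not qbf_true([('E', 'y'), ('A', 'x')], lambda e: e['x'] != e['y'])
```

The example also shows why quantifier order matters, which is exactly what makes propositional quantification powerful enough to encode the alternation between system and environment moves.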
We develop highly optimized satisfiability algorithms for QBF and DQBF. Based on a counterexample-guided abstraction refinement (CEGAR) loop, our algorithms avoid an exponential blow-up by using the structure of the underlying symbolic encodings. Further, we extend the solving algorithms to extract certificates in the form of Boolean functions, from which we construct implementations for the synthesis problem. Our empirical evaluation shows that our symbolic approach significantly outperforms previous explicit synthesis algorithms with respect to scalability and solution quality.

Unsupervised multiple kernel learning approaches for integrating molecular cancer patient data
(Advisor: Prof. Nico Pfeifer, now Tübingen)

Monday, 11.11.2019, 14:00 h, in building E1 7, room 0.01


Cancer is the second leading cause of death worldwide. A characteristic of this disease is its complexity, leading to a wide variety of genetic and molecular aberrations in the tumors. This heterogeneity necessitates personalized therapies for the patients. However, currently defined cancer subtypes used in clinical practice for treatment decision-making are based on relatively few selected markers and thus provide only a coarse classification of tumors. The increased availability of multi-omics data measured for cancer patients now offers the possibility of defining more informed cancer subtypes. Such a more fine-grained characterization of cancer subtypes harbors the potential of substantially expanding treatment options in personalized cancer therapy.
In this work, we identify comprehensive cancer subtypes using multidimensional data. For this purpose, we apply and extend unsupervised multiple kernel learning methods. Three challenges of unsupervised multiple kernel learning are addressed: robustness, applicability, and interpretability. First, we show that regularization of the multiple kernel graph embedding framework, which enables the implementation of dimensionality reduction techniques, can increase the stability of the resulting patient subgroups. This improvement is especially beneficial for data sets with a small number of samples. Second, we adapt the objective function of kernel principal component analysis to enable the application of multiple kernel learning in combination with this widely used dimensionality reduction technique. Third, we improve the interpretability of kernel learning procedures by performing feature clustering prior to integrating the data via multiple kernel learning. On the basis of these clusters, we derive a score indicating the impact of a feature cluster on a patient cluster, thereby facilitating further analysis of the cluster-specific biological properties. All three procedures are successfully tested on real-world cancer data. 

On Approximate Polynomial Identity Testing and Real Root Finding
(Advisor: Prof. Markus Bläser)

Monday, 11.11.2019, 10:00 h, in building E1 7, room 0.01


In this thesis we study the following three themes, which share a connection through the (arithmetic) circuit complexity of polynomials.
1. Rank of symbolic matrices: While there exist easy randomized polynomial time algorithms for computing the commutative rank of symbolic matrices, finding deterministic polynomial time algorithms remains elusive. We first demonstrate a deterministic polynomial time approximation scheme (PTAS) for computing the commutative rank. Prior to this work, deterministic polynomial time algorithms were known only for computing a 1/2-approximation of the commutative rank. We give two distinct proofs that our algorithm is a PTAS. We also give a min-max characterization of commutative and non-commutative ranks of symbolic matrices.
2. Computation of real roots of real sparse polynomials: It is known that solving a system of polynomial equations reduces to solving a univariate polynomial equation. We describe a polynomial time algorithm for sparse polynomials, which computes approximations of all the real roots (even though it may also compute approximations of some complex roots). Moreover, we also show that the roots of integer trinomials are well-separated.
3. Complexity of symmetric polynomials: It is known that symmetric Boolean functions are easy to compute. In contrast, we show that the assumption of inequality of VP and VNP implies that there exist hard symmetric polynomials. To prove this result, we use an algebraic analogue of the classical Newton’s iteration.
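The "easy randomized" approach mentioned in item 1 can be sketched in a few lines: by the Schwartz-Zippel lemma, the commutative rank of a symbolic matrix x1*A1 + … + xk*Ak equals, with high probability, the rank of the matrix evaluated at a random integer point. The following is an illustrative sketch, not the thesis' deterministic PTAS; function names are made up, and the exact rank is computed by Gaussian elimination over the rationals.

```python
import random
from fractions import Fraction

def rank(M):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r

def randomized_commutative_rank(matrices, trials=5, bound=10**6):
    """Evaluate x1*A1 + ... + xk*Ak at random integer points and take the
    largest rank seen; correct with high probability by Schwartz-Zippel."""
    n, m = len(matrices[0]), len(matrices[0][0])
    best = 0
    for _ in range(trials):
        xs = [random.randrange(1, bound) for _ in matrices]
        M = [[sum(x * A[i][j] for x, A in zip(xs, matrices))
              for j in range(m)] for i in range(n)]
        best = max(best, rank(M))
    return best

A1 = [[1, 0], [0, 0]]
A2 = [[0, 1], [1, 0]]
print(randomized_commutative_rank([A1, A2]))  # 2 (det = -x2^2 is nonzero)
```

Derandomizing exactly this kind of evaluation, without losing polynomial running time, is the difficulty the PTAS result addresses.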

Julian STEIL
Mobile Eye Tracking for Everyone
(Advisor: Prof. Andreas Bulling, now Stuttgart)

Friday, 08.11.2019, 12:00 h, in building E1 4, room 0.24


Mobile eye tracking has significant potential to overcome fundamental limitations of remote systems in terms of mobility and practical usefulness. However, mobile eye tracking currently faces two fundamental challenges that prevent it from being practically usable and that, consequently, have to be addressed before mobile eye tracking can truly be used by everyone: Mobile eye tracking needs to be advanced and made fully functional in unconstrained environments, and it needs to be made socially acceptable. Nevertheless, solving technical challenges and overcoming social obstacles alone is not sufficient to make mobile eye tracking attractive for the masses. The key to success is the development of convincingly useful, innovative, and essential applications. This thesis provides solutions for all of these challenges and paves the way for the development of socially acceptable, privacy-aware, but highly functional mobile eye tracking devices and novel applications, so that mobile eye tracking can develop its full potential to become an everyday technology for everyone.


Optimization Landscape of Deep Neural Networks
(Advisor: Prof. Matthias Hein, now Tübingen)

Friday, 18.10.2019, 13:00 h, in building E1 7, room 0.01


Deep learning has drawn increasing attention in artificial intelligence due to its superior empirical performance in various fields of applications, including computer vision, natural language processing and speech recognition. This raises important and fundamental questions on why these methods work so well in practice.
The goal of this thesis is to improve our theoretical understanding of the optimization landscape of deep neural nets. We show that under some over-parameterization conditions, in particular when the width of one of the hidden layers exceeds the number of training samples, every stationary point of the loss with some full-rank structure is a global minimum with zero training error. Moreover, every sub-level set of the loss is connected and unbounded. This suggests that by increasing over-parameterization, the loss function of deep neural nets can become much more favorable to local search algorithms than any other “wildly” non-convex function. In an attempt to bring our results closer to practice, we come up with a new class of network architectures which provably have a nice optimization surface while empirically achieving good generalization performance on standard benchmark datasets. Lastly, we complement our results by identifying an intuitive function class which neural networks of arbitrary depth might not be able to learn if their hidden layers are not wide enough.

Non-Reformist Reform for Haskell Modularity
(Advisor: Prof. Derek Dreyer)

Tuesday, 15.10.2019, 15:00 h, in building E1 5, room 0.29


In this thesis, I present Backpack, a new language for building separately-typecheckable packages on top of a weak module system like Haskell’s. The design of Backpack is the first to bring the rich world of type systems to the practical world of packages via mixin modules. It is inspired by the MixML module calculus of Rossberg and Dreyer, but by choosing practicality over expressivity, Backpack both simplifies that semantics and supports a flexible notion of applicative instantiation. Moreover, this design is motivated less by foundational concerns and more by the practical concern of integration into Haskell. The result is a new approach to writing modular software at the scale of packages.

Understanding Quantities in Web Tables and Text
(Advisor: Prof. Gerhard Weikum)

Tuesday, 08.10.2019, 15:00 h, in building E1 4, room 024


There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text.
In the process of understanding quantity mentions in tables and text, we are faced with the following challenges: First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc without a standard schema and with ambiguous header names; table cells also usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure; for example, “48 km/h” is equivalent to “30 mph”. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales.
In this thesis, we target these challenges through the following contributions:
We present the Quantity Knowledge Base (QKB), a knowledge base for representing Quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources.
We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base.
We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs.
We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities.
We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.
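The unit-equivalence challenge above (“48 km/h” vs. “30 mph”) can be illustrated with a minimal normalization sketch. The conversion table, function names, and tolerance are hypothetical, not the QKB's actual representation.

```python
# Hypothetical conversion table mapping speed units to meters per second;
# illustrative only, not the QKB's actual unit representation.
TO_METERS_PER_SECOND = {
    "km/h": 1000.0 / 3600.0,
    "mph": 1609.344 / 3600.0,
    "m/s": 1.0,
}

def normalize(value, unit):
    """Convert a (value, unit) quantity mention to meters per second."""
    return value * TO_METERS_PER_SECOND[unit]

def roughly_equal(a, b, tol=0.05):
    """Compare normalized quantities up to a relative tolerance, absorbing
    the rounding found in textual mentions ("48 km/h" vs. "30 mph")."""
    return abs(a - b) <= tol * max(abs(a), abs(b))

print(roughly_equal(normalize(48, "km/h"), normalize(30, "mph")))  # True
```

Normalizing to a canonical unit before comparison is what makes approximate alignment between table cells and textual mentions tractable at all.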



Cornelius BRAND
Paths and Walks, Forests and Planes – Arcadian Algorithms and Complexity
(Advisor: Prof. Holger Dell, now ITU Copenhagen)

Thursday, 26.09.2019, 10:00 h, in building E1 7, room 001


We present new results on parameterized algorithms for hard graph problems. We generalize and improve established methods (such as Color-Coding and representative families) by an algebraic formulation of the problems in the language of exterior algebra. Our method gives (1) a faster algorithm to estimate the number of simple paths of given length in directed graphs, (2) faster deterministic algorithms for finding such paths if the input graph contains only few of them, and (3) faster deterministic algorithms to find spanning trees with few leaves. We also consider the algebraic foundations of our new method in exterior and commutative algebra.
Additionally, we investigate the fine-grained complexity of determining the number of forests with a given number of edges in a given undirected graph.
We (1) complete the complexity classification of the Tutte plane under the exponential time hypothesis, and (2) prove that counting forests with a given number of edges is at least as hard as counting cliques of a given size.

Accountable Infrastructure and Its Impact on Internet Security and Privacy
(Advisor: Prof. Michael Backes)

Tuesday, 17.09.2019, 14:00 h, in building E9 1, room 001


The Internet infrastructure relies on the correct functioning of the basic underlying protocols, which were designed for functionality. Security and privacy have been added post hoc, mostly by applying cryptographic means to different layers of communication. Lacking accountability as a fundamental property, the Internet infrastructure has no built-in ability to associate an action with the responsible entity, nor to detect or prevent misbehavior.
In this thesis, we study accountability from a few different perspectives. First, we study the need for accountability in anonymous communication networks as a mechanism that provides repudiation for the proxy nodes by tracing back selected outbound traffic in a provable manner. Second, we design a framework that provides a foundation to support the enforcement of the right-to-be-forgotten law in a scalable and automated manner. The framework provides a technical means for users to prove their eligibility for content removal from search results. Third, we analyze the Internet infrastructure, determining potential security risks and threats imposed by dependencies among entities on the Internet. Finally, we evaluate the feasibility of using hop count filtering as a mechanism for mitigating Distributed Reflective Denial-of-Service attacks, and conceptually show that it cannot work to prevent these attacks.
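The hop-count-filtering idea evaluated above (and ultimately shown insufficient against reflective attacks) can be sketched as follows. The initial-TTL table and the slack threshold are illustrative assumptions, not values from the thesis.

```python
# Sketch of hop-count filtering: estimate a packet's hop count from its
# observed TTL, assuming the sender started from one of a few common
# initial TTL values. Table and threshold are illustrative.
COMMON_INITIAL_TTLS = (64, 128, 255)

def infer_hop_count(observed_ttl):
    # The smallest common initial TTL at or above the observed value
    # gives the most plausible hop estimate.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

def looks_spoofed(observed_ttl, expected_hops, slack=2):
    """Flag a packet whose inferred hop count deviates from the hop
    count previously recorded for the claimed source IP."""
    return abs(infer_hop_count(observed_ttl) - expected_hops) > slack

print(infer_hop_count(52))  # 12, assuming an initial TTL of 64
```

In a reflective attack the reflector, not the spoofing attacker, sends the flood traffic, which is one intuition for why a filter of this kind cannot prevent such attacks.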

On some covering, partition, and connectivity problems in graphs
(Advisor: Dr. Andreas Karrenbauer)

Monday, 09.09.2019, 14:00 h, in building E1 4, room 0.24


We look at some graph problems related to covering, partition, and connectivity. First, we study the problems of covering and partitioning edges with bicliques, especially from the viewpoint of parameterized complexity. For the partition problem, we develop much more efficient algorithms than the ones previously known. In contrast, for the cover problem, our lower bounds show that the known algorithms are probably optimal. Next, we move on to graph coloring, which is probably the most extensively studied partition problem in graphs. Hadwiger’s conjecture is a long-standing open problem related to vertex coloring. We prove the conjecture for a special class of graphs, namely squares of 2-trees, and show that square graphs are important in connection with Hadwiger’s conjecture. Then, we study a coloring problem that has been emerging recently, called rainbow coloring. This problem lies in the intersection of coloring and connectivity. We study different variants of rainbow coloring and present bounds and complexity results on them. Finally, we move on to another parameter related to connectivity called spanning tree congestion (STC). We give tight bounds for STC in general graphs and random graphs. While proving the results on STC, we also make some contributions to the related area of connected partitioning.


Applications, challenges and new perspectives on the analysis of transcriptional regulation using epigenomic and transcriptomic data
(Advisor: Dr. Marcel Schulz, now Frankfurt)

Monday, 26.08.2019, 10:00 h, in building E1 4, room 0.24


The integrative analysis of epigenomics and transcriptomics data is an active research field in Bioinformatics. New methods are required to interpret and process large omics data sets, as generated within consortia such as the International Human Epigenomics Consortium. In this thesis, we present several approaches illustrating how combined epigenomics and transcriptomics datasets, e.g. for differential or time series analysis, can be used to derive new biological insights on transcriptional regulation. In my work, I focused on regulatory proteins called transcription factors (TFs), which are essential for orchestrating cellular processes.
I present novel approaches, which combine epigenomics data, such as DNaseI-seq, predicted TF binding scores, and gene-expression measurements in interpretable machine learning models. In joint work with our collaborators within and outside IHEC, we have shown that our methods lead to biologically meaningful results, which could be validated with wet-lab experiments.
Aside from providing the community with new tools to perform integrative analysis of epigenomics and transcriptomics data, we have studied the characteristics of chromatin accessibility data and its relation to gene-expression in detail to better understand the implications of both computational processing and of different experimental methods on data interpretation.


Decidable Fragments of First-Order Logic and of First-Order Linear Arithmetic with Uninterpreted Predicates
(Advisor: Prof. Christoph Weidenbach)

Wednesday, 31.07.2019, 14:00 h, in building E1 4, room 0.24


First-order logic (FOL) is one of the most prominent formalisms in computer science. Unfortunately, FOL satisfiability is not solvable algorithmically in full generality. The classical decision problem is the quest for a delineation between the decidable and the undecidable parts of FOL. This talk aims to shed new light on the boundary. To this end, recently discovered structure on the decidable side of the first-order landscape shall be discussed.
The primary focus will be on the syntactic concept of separateness of variables and its applicability to the classical decision problem and beyond. Two disjoint sets of first-order variables are separated in a given formula if each atom in that formula contains variables from at most one of the two sets. This simple notion facilitates the definition of decidable extensions of many well-known decidable FOL fragments. Although the extensions exhibit the same expressive power as the respective originals, certain logical properties can be expressed much more succinctly. In at least two cases the succinctness gap cannot be bounded using any elementary function.
The secondary focus will be on linear first-order arithmetic over the rationals with uninterpreted predicates. Novel decidable fragments of this logical language shall be discussed briefly.

Counting Problems on Quantum Graphs
(Advisor: Dr. Holger Dell)

Monday, 15.07.2019, 12:00 h, in building E1 7, room 001


Quantum graphs, as defined by Lovász in the late 60s, are formal linear combinations of simple graphs with finite support. They allow for the complexity analysis of the problem of computing finite linear combinations of homomorphism counts, the latter of which constitute the foundation of the structural hardness theory for parameterized counting problems: The framework of parameterized counting complexity was introduced by Flum and Grohe, and McCartin in 2002 and forms a hybrid between the classical field of computational counting as founded by Valiant in the late 70s and the paradigm of parameterized complexity theory due to Downey and Fellows which originated in the early 90s.
The problem of computing homomorphism numbers of quantum graphs subsumes general motif counting problems, and its complexity-theoretic implications only became apparent recently, in a breakthrough on the parameterized subgraph counting problem by Curticapean, Dell and Marx in 2017.
We study the problems of counting partially injective and edge-injective homomorphisms, counting induced subgraphs, as well as counting answers to existential first-order queries. We establish novel combinatorial, algebraic and even topological properties of quantum graphs that allow us to provide exhaustive parameterized and exact complexity classifications, including necessary, sufficient and mostly explicit tractability criteria, for all of the previous problems.

Engineering Formal Systems in Constructive Type Theory
(Advisor: Prof. Gert Smolka)

Friday, 12.07.2019, 14:15 h, in building E1 7, room 001


This thesis presents a practical methodology for formalizing the meta-theory of formal systems with binders and coinductive relations in constructive type theory. While constructive type theory offers support for reasoning about formal systems built out of inductive definitions, support for syntax with binders and coinductive relations is lacking. We provide this support. We implement syntax with binders using well-scoped de Bruijn terms and parallel substitutions. We solve substitution lemmas automatically using the rewriting theory of the σ-calculus. We present the Autosubst library to automate our approach in the proof assistant Coq. Our approach to coinductive relations is based on an inductive tower construction, which is a type-theoretic form of transfinite induction. The tower construction allows us to reduce coinduction to induction. This leads to a symmetric treatment of induction and coinduction and allows us to give a novel construction of the companion of a monotone function on a complete lattice. We demonstrate our methods with a series of case studies. In particular, we present a proof of type preservation for CCω, a proof of weak and strong normalization for System F, a proof that systems of weakly guarded equations have unique solutions in CCS, and a compiler verification for a compiler from a non-deterministic language into a deterministic language. All technical results in the thesis are formalized in Coq.
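The well-scoped de Bruijn representation with parallel substitutions described above can be sketched for plain untyped lambda terms. This is a didactic sketch in the spirit of the σ-calculus, not the Autosubst implementation; the term encoding and helper names are made up.

```python
# Terms: ("var", n) | ("app", s, t) | ("lam", s), with de Bruijn indices.

def rename(t, f):
    """Apply a renaming f: index -> index, adjusting f under binders."""
    tag = t[0]
    if tag == "var":
        return ("var", f(t[1]))
    if tag == "app":
        return ("app", rename(t[1], f), rename(t[2], f))
    # "lam": index 0 is bound; shift the renaming under the binder.
    return ("lam", rename(t[1], lambda n: 0 if n == 0 else f(n - 1) + 1))

def subst(t, sigma):
    """Apply a parallel substitution sigma: index -> term."""
    tag = t[0]
    if tag == "var":
        return sigma(t[1])
    if tag == "app":
        return ("app", subst(t[1], sigma), subst(t[2], sigma))
    # Under a binder: 0 maps to itself; other images are shifted up by one.
    def up(n):
        return ("var", 0) if n == 0 else rename(sigma(n - 1), lambda k: k + 1)
    return ("lam", subst(t[1], up))

def beta(s, t):
    """Beta-reduce (lam s) t: substitute t for index 0 in s."""
    return subst(s, lambda n: t if n == 0 else ("var", n - 1))

# (\x. \y. x) applied to the free variable 0 yields \y. (var 1).
print(beta(("lam", ("var", 1)), ("var", 0)))  # ('lam', ('var', 1))
```

The substitution lemmas relating `rename`, `subst`, and their compositions are exactly the boilerplate that the σ-calculus rewriting theory discharges automatically in Coq.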

Formal Verification of the Equivalence of System F and the Pure Type System L2
(Advisor: Prof. Gert Smolka)

Thursday, 11.07.2019, 16:15 h, in building E1 7, room 001


We develop a formal proof of the equivalence of two different variants of System F. The first is close to the original presentation where expressions are separated into distinct syntactic classes of types and terms. The second, L2, is a particular pure type system (PTS) where the notions of types and terms, and the associated expressions are unified in a single syntactic class. The employed notion of equivalence is a bidirectional reduction of the respective typing relations. A machine-verified proof of this result turns out to be surprisingly intricate, since the two variants noticeably differ in their expression languages, their type systems and the binding of local variables.
Most of this work is executed in the Coq theorem prover and encompasses a general development of the PTS metatheory, an equivalence result for a stratified and a PTS variant of the simply typed lambda-calculus as well as the subsequent extension to the full equivalence result for System F. We utilise nameless de Bruijn syntax with parallel substitutions for the representation of variable binding and develop an extended notion of context morphism lemmas as a structured proof method for this setting.
We also provide two developments of the equivalence result in the proof systems Abella and Beluga, where we rely on higher-order abstract syntax (HOAS). This allows us to compare the three proof systems, as well as HOAS and de Bruijn for the purpose of developing formal metatheory.

Measuring User Experience for Virtual Reality
(Advisor: Prof. Antonio Krüger)

Wednesday, 10.07.2019, 13:00 h, in building D3 2,  Reuse seminar room (-2.17)


In recent years, Virtual Reality (VR) and 3D User Interfaces (3DUI) have seen a drastic increase in popularity, especially in terms of consumer-ready hardware and software. These technologies have the potential to create new experiences that combine the advantages of reality and virtuality. While the technology for input as well as output devices is market-ready, only a few solutions for everyday VR – online shopping, games, or movies – exist, and empirical knowledge about performance and user preferences is lacking. All this makes the development and design of human-centered user interfaces for VR a great challenge.
This thesis investigates the evaluation and design of interactive VR experiences. We introduce the Virtual Reality User Experience (VRUX) model based on VR-specific external factors and evaluation metrics such as task performance and user preference. Based on our novel UX evaluation approach, we contribute by exploring the following directions: shopping in virtual environments, as well as text entry and menu control in the context of everyday VR. Along with this, we summarize our findings by design spaces and guidelines for choosing optimal interfaces and controls in VR.

Matrix Factorization over Dioids and its Applications in Data Mining
(Advisor: Prof. Pauli Miettinen, now University of Eastern Finland)

Wednesday, 10.07.2019, 10:30 h, in building E1 5, room 0.29


While classic matrix factorization methods, such as NMF and SVD, are known to be highly effective at finding latent patterns in data, they are limited by the underlying algebraic structure. In particular, it is often difficult to distinguish heavily overlapping patterns because they interfere with each other. To deal with this problem, we study matrix factorization over algebraic structures known as dioids, which are characterized by the idempotency of addition (a + a = a). Idempotency ensures that at every data point only a single pattern contributes, which makes it easier to distinguish them. In this thesis, we consider different types of dioids, ranging from continuous (subtropical and tropical algebras) to discrete (Boolean algebra).
The Boolean algebra is, perhaps, the most well known dioid, and there exist Boolean matrix factorization algorithms that produce low reconstruction errors. In this thesis, however, a different objective function is used — the description length of the data, which enables us to obtain more compact and highly interpretable results.
The tropical and subtropical algebras are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating most prominent features of the data.
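The effect of idempotent addition (a + a = a) can be seen by instantiating one generic matrix product with the three structures discussed above: the standard algebra, the Boolean dioid, and the (max, +) tropical dioid. This is a didactic sketch, not the factorization algorithms themselves.

```python
# Generic matrix product parameterized by "addition" and "multiplication".
def matmul(A, B, add, mul, zero):
    n, k, m = len(A), len(B), len(B[0])
    C = [[zero] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for l in range(k):
                acc = add(acc, mul(A[i][l], B[l][j]))
            C[i][j] = acc
    return C

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

standard = matmul(A, B, lambda a, b: a + b, lambda a, b: a * b, 0)
boolean  = matmul(A, B, lambda a, b: a or b, lambda a, b: a and b, 0)
tropical = matmul(A, B, max, lambda a, b: a + b, float("-inf"))

print(standard)  # [[2, 1], [1, 1]]
print(boolean)   # [[1, 1], [1, 1]] -- 1 + 1 = 1 under idempotent addition
print(tropical)  # [[2, 2], [2, 2]] -- max(a, a) = a is also idempotent
```

In the standard product the top-left entry accumulates to 2 because two patterns overlap there; in the dioid products the overlap leaves no trace, which is why overlapping patterns become easier to tell apart.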

Hosnieh SATTAR
Intents and Preferences Prediction Based on Implicit Human Cues
(Advisor: Dr. Mario Fritz)

Tuesday, 02.07.2019, 13:00 h, in building E1 4, room 0.22


Understanding implicit human cues is a key factor to build advanced human-machine interfaces. Modern interfaces should be able to understand and predict human intents from implicit data and assist users, rather than waiting for explicit input. Eye movements, electroencephalograms (EEG), electrocardiograms (ECG), and body shape are several sources of implicit human cues. In this thesis, we focused on eye data and body shape.
The eye is known as a window to the mind and can reveal a lot about ongoing mental processes. Thus, in the first part of the thesis, we focus on advancing techniques for search target prediction from human gaze data. We propose a series of search target inference techniques based on human fixation data recorded during visual search, which broaden the scope of search target prediction to categorical classes, such as object categories and attributes.
Moreover, body shape can reveal a lot of information about a person’s choice of outfit, food, sport, or living style. Hence, in the second part of this thesis, we studied how body shape influences people’s outfit preferences. We propose a novel and robust multi-photo approach to estimate the body shape of each user and build a conditional model of clothing categories given body shape. However, an accurate depiction of the naked body is considered highly private and therefore most people might not consent to it. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask whether the automatic extraction of such information can be effectively evaded.


Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D
(Advisor: Prof. Nico Pfeifer, now Tübingen)

Monday, 24.06.2019, 16:00 h, in building E1 4, room 0.24


With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multi-pronged regulatory mechanisms of the genome. In my colloquium, I will present interpretable machine learning-based approaches for prediction and analysis of long-range chromatin interactions and how they can help gain novel, fine-grained insights into genome regulatory mechanisms. Specifically, I will first present our string kernels-based support vector classification method for prediction of long-range chromatin interactions. This method was one of the first bioinformatic approaches to suggest a potential general role of short tandem repeat sequences in long-range genome regulation and its 3D organization. I will then present CoMIK, a method we developed for handling variable-length DNA sequences in classification scenarios. CoMIK can not only identify the features important for classification but also locate them within the individual sequences, thus enabling the possibility of more fine-grained insights.
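The string-kernel idea underlying the classification method above can be illustrated with a k-spectrum kernel, a simple member of that kernel family: two sequences are compared via their shared k-mer counts. The thesis' kernels are richer, and the sequences here are toy data.

```python
from collections import Counter

def spectrum(seq, k):
    """Multiset of all length-k substrings (k-mers) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k=3):
    """Inner product of the two k-mer count vectors."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    return sum(sx[kmer] * sy[kmer] for kmer in sx)

print(spectrum_kernel("GATTACA", "GATTTACA", k=3))  # 5 shared 3-mers
```

Because the kernel decomposes into per-k-mer contributions, one can read off which subsequences drive a prediction, which is the kind of interpretability the colloquium emphasizes.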

Artificial Intelligence for Efficient Image-based View Synthesis
(Advisor: Prof. Hans-Peter Seidel and Prof. Tobias Ritschel, now University College London)

Monday, 24.06.2019, 9:00 h, in building E1 4, room 0.19


Synthesizing novel views from image data is a widely investigated topic in both computer graphics and computer vision, and has many applications like stereo or multi-view rendering for virtual reality, light field reconstruction, and image post-processing. While image-based approaches have the advantage of reduced computational load compared to classical model-based rendering, efficiency is still a major concern. We demonstrate how concepts and tools from artificial intelligence can be used to increase the efficiency of image-based view synthesis algorithms. In particular, it is shown how machine learning can help to generate point patterns useful for a variety of computer graphics tasks, how path planning can guide image warping, how sparsity-enforcing optimization can lead to significant speedups in interactive distribution effect rendering, and how probabilistic inference can be used to perform real-time 2D-to-3D conversion.

Azim Dehghani Amirabad
From genes to transcripts: Integrative modeling and analysis of regulatory networks
(Advisor: Dr. Marcel Schulz, now Frankfurt)

Tuesday, 11.06.2019, 10:00 h, in building E1 7, room 002


Although all the cells in an organism possess the same genome, regulatory mechanisms lead to highly specific cell types. Elucidating these regulatory mechanisms is a great challenge in systems biology research. Although it is known that a large fraction of our genome consists of regulatory elements, the precise mechanisms by which different combinations of regulatory elements are involved in controlling gene expression and cell identity are poorly understood. This thesis describes algorithms and approaches for modeling and analysis of different modes of gene regulation. We present POSTIT, a novel algorithm for modeling and inferring transcript isoform regulation from transcriptomics and epigenomics data. POSTIT uses multi-task learning with a structured-sparsity-inducing regularizer to share regulatory information between isoforms of a gene, which is shown to lead to accurate isoform expression prediction and inference of regulators.
We developed a novel statistical method, called SPONGE, for large-scale inference of ceRNA networks. In this framework, we designed an efficient empirical p-value computation approach, by sampling from derived null models, which addresses important confounding factors such as sample size, number of involved regulators and strength of correlation.
Finally, we present an integrative analysis of miRNA and protein-based post-transcriptional regulation. Using expression and RNA binding data, we postulate a competitive regulation between the RNA-binding protein IMP2 and miRNAs binding the same RNAs. This function of IMP2 is relevant to disease in the context of adult cellular metabolism.


Model-based Human Performance Capture in Outdoor Scenes
(Advisor: Prof. Christian Theobalt)

Tuesday, 21.05.2019, 10:00 h, in building E1 4, room 019


Technologies for motion and performance capture of real actors have enabled the creation of realistic-looking virtual humans through detail and deformation transfer at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios.
Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and fine-scale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans.

Monitoring with Parameters
(Advisor: Prof. Bernd Finkbeiner)

Friday, 17.05.2019, 14:15 h, in building E1 7, room 0.01


Runtime monitoring of embedded systems is a method to safeguard their reliable operation by detecting runtime failures within the system and recognizing unexpected environment behavior during system development and operation. Specification languages for runtime monitoring aim to provide the succinct, understandable, and formal specification of system and component properties.
Starting from temporal logics, this thesis extends the expressivity of specification languages for runtime monitoring in three key aspects while maintaining the predictability of resource usage. First, we provide monitoring algorithms for linear-time temporal logic with parameters (PLTL), where the parameters bound the number of steps until an eventuality is satisfied. Second, we introduce Lola 2.0, which adds data parameterization to the stream specification language Lola. Third, we integrate real-time specifications in RTLola and add real-time sliding windows, which aggregate data over real-time intervals. For the combination of these extensions, we present a design-time specification analysis which provides resource guarantees.
We report on a case study on the application of the language in an autonomous UAS. Component and system properties were specified together with domain experts in the developed stream specification language and evaluated in a real-time hardware-in-the-loop testbed with a complex environment simulation.
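The real-time sliding windows added in RTLola can be sketched with a minimal aggregation loop: events carry timestamps, and the monitor aggregates over the last `width` time units. Class and trigger names are illustrative, not RTLola syntax; unlike RTLola, this naive version stores every event in the window instead of guaranteeing constant memory.

```python
from collections import deque

class SlidingWindow:
    def __init__(self, width):
        self.width = width
        self.events = deque()  # (timestamp, value), oldest first

    def update(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict events that fell out of the real-time window.
        while self.events and self.events[0][0] <= timestamp - self.width:
            self.events.popleft()

    def sum(self):
        return sum(v for _, v in self.events)

# Illustrative trigger: alarm if more than 2 errors occurred within the
# last 10 time units.
win = SlidingWindow(width=10.0)
for t, errors in [(1.0, 1), (4.0, 1), (12.0, 1), (13.0, 1)]:
    win.update(t, errors)
    if win.sum() > 2:
        print(f"alarm at t={t}")  # fires at t=13.0
```

RTLola's design-time specification analysis bounds the memory of such windows by pre-aggregating into fixed-size panes, which is what makes the resource guarantees mentioned above possible.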


Matthias DÖRING
Computational Approaches for Improving Treatment and Prevention of Viral Infections
(Advisor: Prof. Nico Pfeifer, now Tübingen)

Tuesday, 30.04.2019, 14:00 h, in building E2 1, room 0.01


The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments.
geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and accessing a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR on human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.
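The set cover formulation behind openPrimeR's primer selection can be sketched with the standard greedy approximation (the primer names and coverage sets below are invented for illustration; openPrimeR itself solves a richer constrained formulation):

```python
# Greedy set-cover sketch: repeatedly pick the primer that amplifies the
# most still-uncovered templates. Primer/template data is hypothetical.
def greedy_primer_cover(coverage, templates):
    """coverage: primer -> set of template ids it amplifies."""
    uncovered = set(templates)
    chosen = []
    while uncovered:
        # primer covering the most still-uncovered templates
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:          # remaining templates cannot be covered
            break
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

coverage = {
    "P1": {1, 2, 3},
    "P2": {3, 4},
    "P3": {4, 5, 6},
    "P4": {1, 6},
}
chosen, missed = greedy_primer_cover(coverage, {1, 2, 3, 4, 5, 6})
```

Here two primers suffice to cover all six templates; the greedy rule gives the classical logarithmic approximation guarantee for set cover.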

Stochastic Modeling of DNA Demethylation Dynamics in ESCs
(Advisor: Prof. Verena Wolf)

Tuesday, 30.04.2019, 11:00 h, in building E1 5, room 0.29


DNA methylation and demethylation are opposing processes that, when in balance, create stable patterns of epigenetic memory. The control of DNA methylation pattern formation in replication-dependent and -independent demethylation processes has been suggested to be influenced by Tet-mediated oxidation of a methylated cytosine, 5mC, to a hydroxylated cytosine, 5hmC. Based only on in vitro experiments, several alternative mechanisms have been proposed for how 5hmC influences replication-dependent maintenance of DNA methylation and replication-independent processes of active demethylation. In this thesis we design an extended and easily generalizable hidden Markov model that takes hairpin (oxidative-)bisulfite sequencing data as input to precisely determine the dynamics of 5mC and 5hmC over time, as well as to infer the activities of the involved enzymes at single-CpG resolution. Developing the appropriate statistical and computational tools, we apply the model to discrete, high-depth sequenced genomic loci, and on a whole-genome scale with a much smaller sequencing depth. Performing the analysis of the model’s output on mESC data, we show that the presence of Tet enzymes and 5hmC has a very strong impact on replication-dependent demethylation by establishing a passive demethylation mechanism, implicitly impairing methylation maintenance, but also down-regulating the de novo methylation activity.
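As a generic illustration of the hidden-Markov machinery underlying such a model, the sketch below runs the standard forward algorithm on a toy two-state methylation HMM (the states, probabilities, and read symbols are invented for the example and are not the thesis's actual parameters):

```python
# Toy forward algorithm for a discrete HMM: compute the likelihood of an
# observed bisulfite read sequence by summing over hidden state paths.
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under the HMM."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

states = ["5mC", "C"]                       # methylated vs. unmethylated
start_p = {"5mC": 0.5, "C": 0.5}
trans_p = {"5mC": {"5mC": 0.9, "C": 0.1},   # methylation is largely maintained
           "C":   {"5mC": 0.2, "C": 0.8}}
emit_p = {"5mC": {"T": 0.1, "C": 0.9},      # bisulfite read: 5mC stays C
          "C":   {"T": 0.9, "C": 0.1}}      # unmethylated C reads as T

likelihood = forward(["C", "C", "T"], states, start_p, trans_p, emit_p)
```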

Analysis of the Protein-Ligand and Protein-Peptide Interactions Using a Combined Sequence- and Structure-Based Approach
(Advisor: Prof. Olga Kalinina)

Wednesday, 17.04.2019, 15:00 h, in building E1 4, room 0.24


Proteins participate in most of the important processes in cells, and their ability to perform their function ultimately depends on their three-dimensional structure. They usually act in these processes through interactions with other molecules. Because of the importance of their role, proteins are also the common target for small molecule drugs that inhibit their activity, which may include targeting protein interactions. Understanding protein interactions and how they are affected by mutations is thus crucial for combating drug resistance and aiding drug design.
This dissertation combines bioinformatics studies of protein interactions at both primary sequence and structural level. We analyse protein-protein interactions through linear motifs, as well as protein-small molecule interactions, and study how mutations affect them. This is done in the context of two systems. In the first study of drug resistance mutations in the protease of the human immunodeficiency virus type 1, we successfully apply molecular dynamics simulations to estimate the effects of known resistance-associated mutations on the free binding energy, also revealing molecular mechanisms of resistance. In the second study, we analyse consensus profiles of linear motifs that mediate the recognition by the mitogen-activated protein kinases of their target proteins. We thus gain insights into the cellular processes these proteins are involved in.

Abdalghani ABUJABAL
Question Answering over Knowledge Bases with Continuous Learning
(Advisor: Prof. Gerhard Weikum)

Friday, 12.04.2019, 11:00 h, in building E1 4, room 0.24


Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, two research issues arise:
(i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, and
(ii) how to design and curate datasets to advance research in question answering. In this dissertation, we make the following contributions in the area of question answering (QA). For issue (i), we present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. We also introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied by a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems.
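The decomposition idea can be illustrated with a deliberately simplified rule (the connectives and splitting logic below are ours for illustration; TEQUILA's actual rule set for detecting and decomposing temporal questions is considerably richer):

```python
# Simplified rule-based decomposition of a temporal question into a main
# question and a temporal constraint (illustrative, not TEQUILA's rules).
TEMPORAL_CONNECTIVES = ["before", "after", "during"]

def decompose(question):
    """Split a question at its first temporal connective.
    Returns (main_question, temporal_constraint, connective) or None."""
    lowered = question.lower().rstrip("?")
    for conn in TEMPORAL_CONNECTIVES:
        marker = f" {conn} "
        if marker in lowered:
            main, constraint = lowered.split(marker, 1)
            return main.strip() + "?", constraint.strip(), conn
    return None  # not a temporal question

result = decompose("Where did Einstein work before he moved to the US?")
```

Each sub-question can then be answered by a standard KB-QA system, and the connective tells the reconciliation step how to combine the answers.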

Generalizations of the Multicut Problem for Computer Vision
(Advisor: Prof. Bernt Schiele)

Monday, 08.04.2019, 11:30 h, in building E1 4, room 0.24


Graph decompositions have always been an essential part of computer vision, as a large variety of tasks can be formulated in this framework. One way to obtain a decomposition of a graph is to solve the multicut problem, also known as correlation clustering. Its particular feature is that the number of clusters is not required to be known a priori, but is instead deduced from the optimal solution. On the downside, the problem is NP-hard to solve.
In this thesis we present several generalizations of the multicut problem that allow modeling richer dependencies between the nodes of a graph. We also propose efficient local search algorithms that in practice find good solutions in reasonable time, and show applications to image segmentation, multi-human pose estimation, multi-object tracking, and many others.
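As a toy illustration of the correlation-clustering objective and of a node-relocation local-search move (the edge weights are invented, and the thesis develops considerably more sophisticated moves and generalized objectives):

```python
# Toy correlation clustering: positive edge weights prefer joining the
# endpoints, negative weights prefer separating them; the number of
# clusters is not fixed in advance.
def cost(labels, edges):
    """Total weight of violated preferences under the current labeling."""
    c = 0.0
    for (u, v), w in edges.items():
        same = labels[u] == labels[v]
        if (w > 0 and not same) or (w < 0 and same):
            c += abs(w)
    return c

def local_search(labels, edges):
    """Greedily move single nodes (possibly into a fresh cluster)
    while any move strictly lowers the cost."""
    labels = dict(labels)
    improved = True
    while improved:
        improved = False
        for u in labels:
            current = cost(labels, edges)
            targets = set(labels.values()) | {max(labels.values()) + 1}
            for target in targets:
                old = labels[u]
                labels[u] = target
                if cost(labels, edges) < current:
                    improved = True
                    break
                labels[u] = old
    return labels

edges = {("a", "b"): 2.0, ("b", "c"): 1.5, ("a", "c"): -3.0}
labels = local_search({"a": 0, "b": 0, "c": 0}, edges)
```

On this tiny instance the search separates node c from a and b, which is the optimal decomposition with cost 1.5.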

Sebastian HAHN
On Static Execution-Time Analysis – Compositionality, Pipeline Abstraction, and Predictable Hardware
(Advisor: Prof. Reinhard Wilhelm)

Thursday, 04.04.2019, 16:00 h, in building E1 1, room 407


Proving timeliness is an integral part of the verification of safety-critical real-time systems. To this end, timing analysis computes upper bounds on the execution times of programs that execute on a given hardware platform.
At least since multi-core processors have emerged, timing analysis separates concerns by analysing different aspects of the system’s timing behaviour individually. In this work, we validate the underlying assumption that a timing bound can be soundly composed from individual contributions. We show that even simple processors exhibit counter-intuitive behaviour – a locally slow execution can lead to an even slower overall execution – that impedes the soundness of the composition. We present the compositional base bound analysis that accounts for any such amplifying effects within its timing contribution. This enables a sound although expensive compositional analysis even for complex contemporary processors.
In addition, we discuss future hardware designs that enable efficient compositional analyses. We introduce a hardware design, the strictly in-order pipeline, that behaves monotonically with respect to the progress of a program’s execution. Monotonicity enables us to formally reason about properties such as compositionality.

Joanna BIEGA
Enhancing Privacy and Fairness in Search Systems
(Advisor: Prof. Gerhard Weikum)

Monday, 01.04.2019, 15:15 h, in building E1 4, room 024


Search systems have a substantial potential to harm individuals and the society. First, since they collect vast amounts of data about their users, they have the potential to violate user privacy. Second, in applications where rankings influence people’s economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results.
This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects:
* We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility.
* We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning-to-rank model ordering the queries by privacy-sensitivity.
* We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user’s relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions.
* Last but not least, we propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles.

Yecheng GU
Intelligent Tutoring in Virtual Reality for Highly Dynamic Pedestrian Safety Training
(Advisor: Prof. Jörg Siekmann)

Monday, 01.04.2019, 14:00 h, in building E1 7, room 0.01


This thesis presents the design, implementation, and evaluation of an Intelligent Tutoring System (ITS) with a Virtual Reality (VR) interface for child pedestrian safety training. This system enables children to train practical skills in a safe and realistic virtual environment without the time and space dependencies of traditional roadside training. This system also employs Domain and Student Modelling techniques to analyze user data during training automatically and to provide appropriate instructions and feedback.
Thus, the traditional requirement of constant monitoring by teaching personnel is greatly reduced. Compared to previous work, the second aspect in particular is a principal novelty for this domain. To achieve this, a novel Domain and Student Modeling method was developed in addition to a modular and extensible virtual environment for the target domain. While the Domain and Student Modeling framework is designed to handle the highly dynamic nature of training in traffic and the ill-defined characteristics of pedestrian tasks, the modular virtual environment supports different interaction methods and a simple and efficient way to create and adapt exercises. The thesis is complemented by two user studies with elementary school children. These studies attest to great overall user acceptance and the system’s potential for improving key pedestrian skills through autonomous learning. Last but not least, the thesis presents experiments with different forms of VR input and provides directions for future work.


Novel Approaches to Anonymity and Privacy in Decentralized, Open Settings
(Advisor: Prof. Michael Backes)

Friday, 29.03.2019, 11:30 h, in building E9 1, room 0.01


The Internet has undergone dramatic changes in the last two decades, evolving from a mere communication network to a global multimedia platform in which billions of users actively exchange information. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy that existing technology is failing to keep pace with.
In this dissertation, we present the results of two lines of research that developed two novel approaches to anonymity and privacy in decentralized, open settings. First, we examine the issue of attribute and identity disclosure in open settings and develop the novel notion of (k,d)-anonymity for open settings, which we extensively study and validate experimentally. Furthermore, we investigate the relationship between anonymity and linkability using the notion of (k,d)-anonymity and show that, in contrast to the traditional closed setting, anonymity within one online community does not necessarily imply unlinkability across different online communities in the decentralized, open setting.
Second, we consider the transitive diffusion of information that is shared in social networks and spread through pairwise interactions of users connected in this social network. We develop the novel approach of exposure minimization to control the diffusion of information within an open network, allowing the owner to minimize its exposure by suitably choosing whom they share their information with. We implement our algorithms and investigate the practical limitations of user-side exposure minimization in large social networks.
At their core, both of these approaches present a departure from the provable privacy guarantees that we can achieve in closed settings and a step towards sound assessments of privacy risks in decentralized, open settings.

What we leave behind: reproducibility in chromatin analysis within and across species
(Advisor: Prof. Thomas Lengauer)

Thursday, 14.03.2019, 11:30 h, in building E2 1, room 007


Epigenetics is the field of biology that investigates heritable factors regulating gene expression without being directly encoded in the genome of an organism. In my colloquium, I am going to present approaches dealing with three challenging problems in computational epigenomics: first, we designed a solution strategy for improving reproducibility of computational analysis as part of a large national epigenetics research consortium. Second, we developed a software tool for the comparative epigenome analysis of cell types within a species. For this project, we relied on a well-established and compact representation of epigenetic information known as chromatin states, which enabled us to realize fast and interpretable comparative chromatin analysis even for small sample numbers. Third, we extended comparative epigenomics beyond canonical model organisms to also include species for which little to no epigenetic data is available. In a proof of concept study, we implemented a computational pipeline for predicting biologically relevant information, e.g., cell-type specific gene expression profiles, in non-model organisms based only on reference epigenomes from well-studied species such as human or mouse.


Scalable positioning of commodity mobile devices using audio signals
(Advisor: Prof. Peter Druschel)

Thursday, 14.02.2019, 9:00 h, in building E1 5, room 0.29


This thesis explores the problem of computing a position map for co-located mobile devices. The positioning should happen in a scalable manner without requiring specialized hardware and without requiring specialized infrastructure (except basic Wi-Fi or cellular access). At events like meetings, talks, or conferences, a position map can aid spontaneous communication among users based on their relative position in two ways. First, it enables users to choose message recipients based on their relative position, which also enables the position-based distribution of documents. Second, it enables senders to attach their position to messages, which can facilitate interaction between speaker and audience in a lecture hall and enables the collection of feedback based on users’ location.
In this thesis, we present Sonoloc, a mobile app and system that, by relying on acoustic signals, allows a set of commodity smart devices to determine their _relative_ positions. Sonoloc can position any number of devices within acoustic range with a constant number of acoustic signals emitted by a subset of devices. Our experimental evaluation with up to 115 devices in real rooms shows that — despite substantial background noise — the system can locate devices with an accuracy of tens of centimeters using no more than 15 acoustic signals.
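The basic physics exploited by acoustic positioning can be sketched in a back-of-the-envelope calculation (this is not Sonoloc's actual algorithm, and the arrival times below are invented):

```python
# Acoustic ranging sketch: the difference in arrival times of one sound
# at two devices, multiplied by the speed of sound, gives the difference
# of their distances to the source.
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def range_difference(t_arrival_a, t_arrival_b):
    """Difference (meters) between source-to-A and source-to-B distances."""
    return SPEED_OF_SOUND * (t_arrival_a - t_arrival_b)

diff = range_difference(0.010, 0.004)  # arrival times in seconds
```

Combining such range differences from several emitted signals constrains the devices' relative positions, which is why a constant number of signals can position many listening devices at once.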

Techniques to Protect Confidentiality and Integrity of Persistent and In-Memory Data
(Advisor: Prof. Peter Druschel)

Tuesday, 05.02.2019, 17:30 h, in building E1 5, room 0.29


Today, computers store and analyze valuable and sensitive data. As a result, we need to protect this data against confidentiality and integrity violations, which can result in the illicit release, loss, or modification of a user’s and an organization’s sensitive data such as personal media content or client records. Existing techniques for protecting confidentiality and integrity either lack efficiency or are vulnerable to malicious attacks. In this thesis we propose two techniques, Guardat and ERIM, to efficiently and robustly protect persistent and in-memory data.
To protect the confidentiality and integrity of persistent data, clients specify per-file policies to Guardat declaratively, concisely and separately from code. Guardat enforces policies by mediating I/O in the storage layer. In contrast to prior techniques, we protect against accidental or malicious circumvention of higher software layers. We present the design and prototype implementation, and demonstrate that Guardat efficiently enforces example policies in a web server.
To protect the confidentiality and integrity of in-memory data, ERIM isolates sensitive data using Intel Memory Protection Keys (MPK), a recent x86 extension to partition the address space. However, MPK does not protect against malicious attacks by itself. We prevent malicious attacks by combining MPK with call gates to trusted entry points and ahead-of-time binary inspection. In contrast to existing techniques, ERIM efficiently protects frequently-used session keys of web servers, an in-memory reference monitor’s private state, and managed runtimes from native libraries. These use cases result in high switch rates on the order of 10^5–10^6 switches/s. Our experiments demonstrate less than 1% runtime overhead per 100,000 switches/s, thus outperforming existing techniques.

Muhammad Bilal ZAFAR
Discrimination in Algorithmic Decision Making: From Principles to Measures and Mechanisms
(Advisor: Prof. Krishna Gummadi)

Monday, 04.02.2019, 18:00 h, in building E1 5, room 0.29


The rise of algorithmic decision making in a variety of applications has also raised concerns about its potential for discrimination against certain social groups. However, incorporating nondiscrimination goals into the design of algorithmic decision making systems (or, classifiers) has proven to be quite challenging. These challenges arise mainly due to the computational complexities involved in the process, and the inadequacy of existing measures to computationally capture discrimination in various situations. The goal of this thesis is to tackle these problems. First, with the aim of incorporating existing measures of discrimination (namely, disparate treatment and disparate impact) into the design of well-known classifiers, we introduce a mechanism of decision boundary covariance that can be included in the formulation of any convex boundary-based classifier in the form of convex constraints. Second, we propose alternative measures of discrimination. Our first proposed measure, disparate mistreatment, is useful in situations when unbiased ground truth training data is available. The other two measures, preferred treatment and preferred impact, are useful in situations when feature and class distributions of different social groups are significantly different, and can additionally help reduce the cost of nondiscrimination (as compared to the existing measures). We also design mechanisms to incorporate these new measures into the design of convex boundary-based classifiers.
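The decision boundary covariance idea can be sketched as follows (the data points are invented; in the actual formulation this covariance term is bounded by convex constraints during classifier training rather than merely measured):

```python
# Decision-boundary covariance: the covariance between the sensitive
# attribute and the signed distance to the decision boundary.
def boundary_covariance(z, distances):
    """(1/N) * sum_i (z_i - mean(z)) * d_i  -- a value near 0 means the
    boundary places both groups similarly (a proxy for low disparate impact)."""
    n = len(z)
    z_mean = sum(z) / n
    return sum((zi - z_mean) * di for zi, di in zip(z, distances)) / n

z = [0, 0, 1, 1]                 # sensitive group membership per example
d = [1.0, 0.5, -1.2, -0.7]       # signed distances to the boundary
cov = boundary_covariance(z, d)
```

Here group 1 lies systematically on the negative side of the boundary, so the covariance is far from zero, signaling the kind of group-dependent treatment the constraint is designed to suppress.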



Christian LANDER
Ubiquitous Head-Mounted Gaze Tracking
(Advisor: Prof. Antonio Krüger)

Thursday, 24.01.2019, 15:00 h, in building D3 2, "Reuse" meeting room


Recently, gaze-based interfaces in stationary settings have become a viable option, as remote eye trackers are easily available at a low price. However, making gaze-based interfaces ubiquitous remains an open challenge that cannot be resolved using remote approaches. The miniaturization in camera technology allows for gaze estimation with head-mounted devices, which are similar to a pair of glasses. A crucial step towards gaze estimation using such devices is calibration. Although the development is mainly technology driven, a hypothetical fully calibrated system in which all parameters are known and valid is affected by calibration drift. In addition, attributes like minimal intrusiveness and obstruction, easy and flexible setup and high accuracy are not present in the latest commercial devices. Hence their applicability to spontaneous interaction or long-lasting research experiments is questionable.
In this thesis we enable spontaneous, calibration-free and accurate mobile gaze estimation. We contribute by investigating the following three areas: Efficient Long-Term Usage of a head-mounted eye tracker; Location, Orientation & Target Independent head-mounted eye tracking; and Mobile & Accurate Calibration-Free gaze estimation. Through the elaboration of the theoretical principles, we investigate novel concepts for each aspect; these are implemented and evaluated, converging into a final device, to show how to overcome current limitations.



Quantifying & Characterizing Information Diets of Social Media Users
(Advisor: Prof. Krishna Gummadi)

Thursday, 20.12.2018, 16:00 h, in building E1 5, room 0.29


An increasing number of people are relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in the way news and information is exchanged in our society – from traditional mass media to online social media.
With the changing environment, it’s essential to study the information consumption of social media users and to audit how automated algorithms (like search and recommendation systems) are modifying the information that social media users consume. In this thesis, we fulfill this high-level goal with a two-fold approach. First, we propose the concept of information diets as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to the users, and (c) search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions of topics, geographic sources, and political perspectives.
Our work is aimed at making social media users aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effects of these biases.
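As a toy numerical illustration of an information diet (the topic labels, items, and the choice of half-L1 distance as a bias score are ours, for illustration only; the thesis measures diets along topics, geographic sources, and political perspectives):

```python
# An "information diet" as the fraction of a user's consumed items per
# topic, compared against a reference diet (e.g., platform-wide production).
from collections import Counter

def diet(items):
    counts = Counter(items)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def diet_bias(consumed, reference):
    """Half the L1 distance between two diets, in [0, 1]."""
    topics = set(consumed) | set(reference)
    return sum(abs(consumed.get(t, 0) - reference.get(t, 0)) for t in topics) / 2

consumed = diet(["politics", "politics", "politics", "sports"])
reference = diet(["politics", "sports", "science", "sports"])
bias = diet_bias(consumed, reference)
```

A score of 0 means the consumed diet matches the reference exactly; a score of 1 means the two diets share no probability mass at all.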

Applicable and Sound Polyhedral Optimization of Low-Level Programs
(Advisor: Prof. Sebastian Hack)

Wednesday, 19.12.2018, 10:15 h, in building E1 7, room 0.01


Computers are becoming increasingly complex. Current and future systems feature configurable hardware, multiple cores with different capabilities, and accelerators. In addition, the memory subsystem is becoming more diverse: the cache hierarchy grows deeper and is augmented with scratchpads, low-latency memory, and high-bandwidth memory. The programmer alone cannot utilize this enormous potential. Compilers have to provide insight into program behavior, or even arrange computations and data themselves. Either way, they need a more holistic view of the program. Local transformations, which treat the iteration order, computation unit, and data layout as fixed, will not be able to fully utilize a diverse system.
The polyhedral model, a high-level program representation and transformation framework, has shown great success tackling various problems in the context of diverse systems. While it is widely acknowledged for its analytical powers and transformation capabilities, it is also widely assumed to be too restrictive and fragile for real-world programs.
In this thesis, we improve the applicability and profitability of polyhedral-model-based techniques. Our efforts guarantee a sound polyhedral representation and extend the applicability to a wider range of programs. In addition, we introduce new applications to utilize the information available in the polyhedral program representation, including standalone optimizations and techniques to derive high-level properties.

Alexander WEINERT
Optimality and Resilience in Parity Games
(Advisor: Dr. Martin Zimmermann)

Friday, 14.12.2018, 14:15 h, in building E1 7, room 0.01


Modeling reactive systems as infinite games has yielded a multitude of results in the fields of program verification and program synthesis. The canonical parity condition used for such games, however, neither suffices to express non-functional requirements on the modeled system, nor to capture malfunctions of the deployed system.
In this work we address these issues by investigating quantitative games in which the above characteristics can be expressed. Parity games with costs are a variant of parity games in which traversing an edge incurs some nonnegative cost. The cost of a play is the limit of the cost incurred between answering odd colors by larger even ones. We extend that model by allowing for integer costs, obtaining parity games with weights, and determine both the computational complexity of solving the resulting games as well as the memory required by the protagonist to win such games.
Our results show that the increase in expressiveness comes at a steep cost in terms of both measures. Subsequently, we address the optimality question, i.e., the question asking whether the protagonist is able to ensure a given bound on the cost, and show that answering this question is harder than just solving the underlying game, even for special cases of parity games with weights.
Moreover, we further extend this model to allow for modeling disturbances that may occur during the operation of the resulting strategy and show how to compute a strategy that is resilient against as many such disturbances as possible. Finally, we show that the protagonist is able to trade memory, quality, and resilience of strategies for one another and we show how to compute the possible tradeoffs between these three values.

Paarijaat ADITYA
Towards Privacy-Compliant Mobile Computing
(Advisor: Prof. Peter Druschel)

Monday, 03.12.2018, 16:30 h, in building E1 5, room 0.29


Sophisticated mobile computing, sensing and recording devices like smartphones, smartwatches, and wearable cameras are carried by their users virtually around the clock, blurring the distinction between the online and offline worlds. While these devices enable transformative new applications and services, they also introduce entirely new threats to users’ privacy, because they can capture a complete record of the user’s location, online and offline activities, and social encounters, including an audiovisual record. Such a record of users’ personal information is highly sensitive and is subject to numerous privacy risks. In this thesis, we have investigated and built systems to mitigate two such privacy risks:
1) privacy risks due to ubiquitous digital capture, where bystanders may inadvertently be captured in photos and videos recorded by other nearby users,
2) privacy risks to users’ personal information introduced by a popular class of apps called ‘mobile social apps’. In this thesis, we present two systems, called I-Pic and EnCore, built to mitigate these two privacy risks.
Both systems aim to put the users back in control of what personal information is being collected and shared, while still enabling innovative new applications. We built working prototypes of both systems and evaluated them through actual user deployments. Overall we demonstrate that it is possible to achieve privacy-compliant digital capture and it is possible to build privacy-compliant mobile social apps, while preserving their intended functionality and ease-of-use. Furthermore, we also explore how the two solutions can be merged into a powerful combination, one which could enable novel workflows for specifying privacy preferences in image capture that do not currently exist.


Algorithmic Results for Clustering and Refined Physarum Analysis
(Advisor: Prof. Kurt Mehlhorn)

Tuesday, 27.11.2018, 16:15 h, in building E1 5, room 0.29


In the first part of this thesis, we study the Binary $\ell_0$-Rank-$k$ problem which, given a binary matrix $A$ and a positive integer $k$, seeks to find a rank-$k$ binary matrix $B$ minimizing the number of non-zero entries of $A-B$. A central open question is whether this problem admits a polynomial time approximation scheme. We give an affirmative answer to this question by designing the first randomized almost-linear time approximation scheme for constant $k$ over the reals, $\mathbb{F}_2$, and the Boolean semiring. In addition, we give novel algorithms for important variants of $\ell_0$-low rank approximation. The second part of this dissertation studies a popular and successful heuristic, known as Approximate Spectral Clustering (ASC), for partitioning the nodes of a graph $G$ into clusters with small conductance. We give a comprehensive analysis, showing that ASC runs efficiently and yields a good approximation of an optimal $k$-way node partition of $G$. In the final part of this thesis, we present two results on slime mold computations: i) the continuous undirected Physarum dynamics converges for undirected linear programs with a non-negative cost vector; and ii) for the discrete directed Physarum dynamics, we give a refined analysis that yields strengthened and close to optimal convergence rate bounds, and shows that the model can be initialized with any strongly dominating point.
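For intuition, the Binary $\ell_0$-Rank-$k$ objective for $k=1$ can be solved exhaustively on a tiny matrix (this brute force is purely illustrative and exponential in the matrix dimensions; the thesis contributes an almost-linear-time approximation scheme instead):

```python
# Exhaustive sketch of Binary l0-Rank-1 approximation: find binary
# vectors u, v such that B = u v^T minimizes the number of entries
# where A and B differ.
from itertools import product

def best_rank1(A):
    m, n = len(A), len(A[0])
    best_err, best_B = m * n + 1, None
    for u in product([0, 1], repeat=m):
        for v in product([0, 1], repeat=n):
            B = [[u[i] * v[j] for j in range(n)] for i in range(m)]
            err = sum(A[i][j] != B[i][j] for i in range(m) for j in range(n))
            if err < best_err:
                best_err, best_B = err, B
    return best_err, best_B

A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
err, B = best_rank1(A)
```

Here the isolated 1 in the bottom-right corner cannot be captured by any rank-1 matrix that also covers the top-left block, so the best achievable $\ell_0$ error is 1.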

A Verified Compiler for a Linear Imperative/Functional Intermediate Language
(Advisors: Prof. Sebastian Hack & Prof. Gert Smolka)

Friday, 16.11.2018, 12:00 h, in building E1 7, room 0.01


This thesis describes the design of the verified compiler LVC. LVC’s main novelty is the way its first-order, term-based intermediate language IL realizes the advantages of static single assignment (SSA) for verified compilation. IL is not based on a control-flow graph (CFG) but defined in terms of an inductively defined syntax with lexically scoped mutually recursive function definitions. IL replaces the usual dominance-based SSA definition found in unverified and verified compilers with the novel notion of coherence. The main research question this thesis studies is whether IL with coherence offers a faithful implementation of SSA, and how the design influences the correctness invariants and the proofs in the verified compiler LVC. To study this question, we verify dead code elimination, several SSA-based value optimizations including sparse conditional constant propagation, and an SSA-based register allocation approach including spilling. In these case studies, IL with coherence provides the usual advantages of SSA and improves modularity of proofs. Furthermore, we propose a novel SSA construction algorithm based on coherence, and leverage the term structure of IL to obtain an inductive proof method for simulation proofs. LVC is implemented and verified with over 50,000 lines of code using the proof assistant Coq. To underline the practicability of our approach, we integrate LVC with CompCert to obtain an executable compiler that generates PowerPC assembly code.

Supporting User’s Influence in Gamification Settings and Game Live-Streams
(Advisor: Prof. Antonio Krüger)

Tuesday, 06.11.2018, 15:00 h, in building D3 2 (DFKI), room -2.17 (Reuse)


Playing games has long been important to mankind. One reason for this is the associated autonomy, as players can decide on many aspects on their own and can shape the experience. Game-related sub-fields have appeared in Human-Computer Interaction where this autonomy is questionable: in this thesis, we consider gamification and game live-streams and here, we support the users’ influence at runtime. We hypothesize that this should affect the perception of autonomy and should lead to positive effects overall. Our contribution is three-fold: first, we investigate crowd-based, self-sustaining systems in which the user’s influence directly impacts the outcome of the system’s service. We show that users are willing to expend effort in such systems even without additional motivation, but that gamification is still beneficial here. Second, we introduce “bottom-up” gamification, i.e., the idea of self-tailored gamification. Here, users have full control over the gamification used in a system, i.e., they can set it up as they see fit at the system’s runtime. Through user studies, we show that this has positive behavioral effects and thus adds to the ongoing efforts to move away from “one-size-fits-all” solutions. Third, we investigate how to make gaming live-streams more interactive, and how viewers perceive this. We also consider shared game control settings in live-streams, in which viewers have full control, and we contribute options to support viewers’ self-administration here.


Mining Sandboxes
(Advisor: Prof. Andreas Zeller)

Monday, 22.10.2018, 11:00 h, in building E9 1, room 0.01


Modern software is ubiquitous, yet insecure. It has the potential to expose billions of humans to serious harm, up to and including losing fortunes and taking lives. Existing approaches for securing programs are either exceedingly hard and costly to apply, significantly decrease usability, or just don’t work well enough against a determined attacker.
In this thesis we propose a new solution that significantly increases application security, yet is cheap, easy to deploy, and has minimal usability impact. We combine in a novel way the best of what existing techniques of test generation, dynamic program analysis and runtime enforcement have to offer: We introduce the concept of sandbox mining. First, in a phase called mining, we use automatic test generation to discover application behavior. Second, we apply a sandbox to limit any behavior during normal usage to the one discovered during mining. Users of an application running in a mined sandbox are thus protected from the application suddenly changing its behavior, as compared to the one observed during automatic test generation. As a consequence, backdoors, advanced persistent threats and other kinds of attacks based on the passage of time become exceedingly hard to conduct covertly. They are either discovered in the secure mining phase, where they can do no damage, or are blocked altogether. Mining is cheap because we leverage fully automated test generation to provide baseline behavior. Usability is not degraded: the sandbox runtime enforcement impact is negligible; the mined behavior is comprehensive and presented in a human-readable format, thus any unexpected behavior changes are rare and easy to reason about. Our BOXMATE prototype for Android applications shows the approach is technically feasible, has an easy setup process, and is widely applicable to existing apps. Experiments conducted with BOXMATE show less than one hour is required to mine sandboxes for Android applications, requiring few to no confirmations for frequently used functionality.
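The two-phase idea of sandbox mining can be sketched conceptually as follows; the class and method names here are hypothetical illustrations, not the BOXMATE API.

```python
class MinedSandbox:
    """Conceptual sketch of sandbox mining (names are illustrative,
    not BOXMATE's API): during the mining phase, record every resource
    the app accesses; afterwards, enforce that only mined resources
    may be used, so later behavior changes are blocked."""

    def __init__(self):
        self.allowed = set()
        self.mining = True

    def access(self, resource):
        if self.mining:
            self.allowed.add(resource)  # learn baseline behavior
            return True
        if resource not in self.allowed:
            raise PermissionError(f"blocked: {resource}")  # behavior change
        return True

    def seal(self):
        self.mining = False  # switch from mining to enforcement

box = MinedSandbox()
box.access("CAMERA")    # observed during automatic test generation
box.access("CONTACTS")
box.seal()
box.access("CAMERA")    # allowed: matches mined behavior
try:
    box.access("SEND_SMS")  # never seen during mining -> blocked
except PermissionError as e:
    print(e)
```

In the real system, the mined behavior would be resource and permission usage observed by dynamic analysis during test generation, rather than explicit calls as shown here.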


Correctness of Multi-Core Processors with Operating System Support
(Advisor: Prof. Wolfgang Paul)

Thursday, 27.09.2018, 10:00 h, in building E1 5, room 0.29


In the course of adding hardware support for operating systems and hypervisors (like Microsoft Hyper-V), we verify a realistic multi-core processor with integrated mechanisms for two levels of virtual address translation. The second level of address translation (SLAT) is required to allow guests of a hypervisor (typically operating systems) to execute their programs in translated mode. In hardware, the scheme of SLAT is implemented in a dedicated processor component, called the two-level memory management unit (MMU). Both the specification and the implementation of the two-level MMU are presented in full detail. The main contribution is the complete paper and pencil correctness proof for the pipelined multi-core implementation of the MIPS-86 ISA with SLAT. MIPS-86 combines the instruction set of MIPS with the memory model of x86. First, we consider the sequential implementation, which serves to demonstrate integration of the two-level MMU into the MIPS processor. In the proof of our main result — correctness of the pipelined implementation — we refer to the sequential case to show correctness of the MMU operation. This allows us to shift the focus towards the problems of pipelining a machine with speculative execution and interrupts, which are necessary to consider in combination with address translation.

Xucong ZHANG
Gaze Estimation and Interaction in Real-World Environments
(Advisor: Dr. Andreas Bulling)

Wednesday, 26.09.2018, 11:00 h, in building E1 4, room 0.19


Human eye gaze has been widely used in human-computer interaction, as it is a promising modality for natural, fast, pervasive, and non-verbal interaction between humans and computers. Appearance-based gaze estimation methods directly regress from eye images to gaze targets without eye feature detection, and therefore have great potential to work with ordinary devices. However, these methods require a large amount of domain-specific training data to cover the significant variability in eye appearance. In this thesis, we focus on developing appearance-based gaze estimation methods and corresponding attentive user interfaces with a single webcam for challenging real-world environments. We collected a large-scale gaze estimation dataset from a real-world setting, proposed a full-face gaze estimation method, and studied data normalization. We applied our gaze estimation methods to real-world interactive applications including eye contact detection and gaze estimation with multiple personal devices.

Yongtao SHUAI
Dynamic Adaptive Video Streaming with Minimal Buffer Sizes

(Advisor: Prof. Thorsten Herfet)

Wednesday, 19.09.2018, 16:00 h, in building E1 1, room 407


Recently, adaptive streaming has been widely adopted in video streaming services to improve the quality-of-experience (QoE) of video delivery over the Internet. However, state-of-the-art bitrate adaptation achieves satisfactory performance only with extensive buffering of several tens of seconds. This leads to high playback latency in video delivery, which is undesirable especially in the context of live content and interactive features with a low upper bound on the latency. Therefore, this thesis aims at pushing the application of adaptive streaming to its limit with respect to the buffering size, which is the dominant factor of the streaming latency.
In the current work, we first address the minimum buffering size required in adaptive streaming, which provides us with guidelines to determine a reasonable low latency for streaming systems. Then, we tackle the fundamental challenge of achieving such a low latency streaming by developing a novel adaptation algorithm that stabilizes buffer dynamics despite a small buffering size. We also present advanced improvements by designing a novel adaptation architecture with low-delay feedback for the bitrate selection and optimizing the underlying transport layer to offer efficient real-time streaming. Experimental evaluations demonstrate that our approach achieves superior QoE in adaptive video streaming, especially in the particularly challenging case of low latency streaming.
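A toy buffer-aware bitrate rule can illustrate the trade-off this abstract describes; this is a hedged sketch of the general idea, not the adaptation algorithm developed in the thesis.

```python
def select_bitrate(buffer_level, throughput, bitrates, target_buffer=2.0):
    """Toy bitrate-adaptation rule (illustrative only, not the thesis's
    algorithm): pick the highest bitrate sustainable at the estimated
    throughput, but scale the budget down when the playback buffer falls
    below a small target level, trading quality for buffer stability."""
    safety = min(1.0, buffer_level / target_buffer)  # shrink when buffer low
    budget = throughput * safety
    feasible = [b for b in sorted(bitrates) if b <= budget]
    return feasible[-1] if feasible else min(bitrates)

ladder = [300, 750, 1500, 3000]  # kbit/s (hypothetical bitrate ladder)
print(select_bitrate(2.0, 1600, ladder))  # full buffer -> 1500
print(select_bitrate(0.5, 1600, ladder))  # low buffer  -> 300
```

With buffers of only a few seconds, such a rule reacts aggressively to buffer drain; stabilizing these dynamics without oscillation is the challenge the thesis addresses.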


Understanding and Assessing Security on Android via Static Code Analysis
(Advisor: Prof. Michael Backes)

Tuesday, 28.08.2018, 14:00 h, in building E9 1, room 0.05


Smart devices have become a rich source of sensitive information including personal data (contacts and account data) and context information like GPS data that is continuously aggregated by onboard sensors. As a consequence, mobile platforms have become a prime target for malicious and over-curious applications. The growing complexity and the quickly rising number of mobile apps have further reinforced the demand for comprehensive application security vetting. This dissertation presents a line of work that advances security testing on Android via static code analysis. In the first part of this dissertation, we build an analysis framework that statically models the complex runtime behavior of apps and Android’s application framework (on which apps are built) to extract privacy- and security-relevant data-flows. We provide the first classification of Android’s protected resources within the framework and generate precise API-to-permission mappings that improve over prior work. We then propose a third-party library detector for apps that is resilient against common code obfuscations to measure the outdatedness of libraries in apps and to attribute vulnerabilities to the correct software component. Based on these results, we identify root causes of app developers not updating their dependencies and propose actionable items to remedy the current status quo. Finally, we measure to which extent libraries can be updated automatically without modifying the application code.

Seong Joon OH
Image Manipulation against Learned Models: Privacy and Security Implications
(Advisor: Prof. Bernt Schiele)

Monday, 06.08.2018, 16:00 h, in building E1 4, room 0.24


Machine learning is transforming the world. Its application areas span privacy sensitive and security critical tasks such as human identification and self-driving cars. These applications raise privacy and security related questions that are not fully understood or answered yet: Can automatic person recognisers identify people in photos even when their faces are blurred? How easy is it to find an adversarial input for a self-driving car that makes it drive off the road?
This thesis contributes one of the first steps towards a better understanding of such concerns in the presence of data manipulation. From the point of view of user’s privacy, we show the inefficacy of common obfuscation methods like face blurring, and propose more advanced techniques based on head inpainting and adversarial examples. We discuss the duality of model security and user privacy problems and describe the implications of research in one area for the other. Finally, we study the knowledge aspect of the data manipulation problem: the more one knows about the target model, the more effective manipulations one can craft. We propose a game theoretic framework to systematically represent the partial knowledge on the target model and derive privacy and security guarantees. We also demonstrate that one can reveal architectural details and training hyperparameters of a model only by querying it, leading to even more effective data manipulations against it.


Quantifying and Mitigating Privacy Risks in Biomedical Data
(Advisor: Prof. Michael Backes)

Wednesday, 25.07.2018, 14:00 h, in building E9 1 (CISPA), room 001


The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, allowing for a breakthrough towards a more precise and personalized medicine. However, the release of these intrinsically highly sensitive, interdependent data poses a new severe privacy threat.
In this thesis, we provide means to quantify and protect the privacy of individuals’ biomedical data. Besides the genome, we specifically focus on two of the most important epigenetic elements influencing human health: microRNAs and DNA methylation. We quantify the privacy for multiple realistic attack scenarios. Our results underline that the privacy risks inherent to biomedical data have to be taken seriously. Moreover, we present and evaluate solutions to preserve the privacy of individuals. Our mitigation techniques stretch from the differentially private release of epigenetic data, considering its utility, up to cryptographic constructions to securely, and privately evaluate a random forest on a patient’s data.

On Flows, Paths, Roots, and Zeros
(Advisor: Prof. Kurt Mehlhorn)

Monday, 23.07.2018, 14:00 h, in building E1 7, room 001


This thesis has two parts. In the first, we give new results for various network flow problems. (1) We present a novel dual ascent algorithm for min-cost flow and show that an implementation of it is very efficient on certain instance classes. (2) We approach the problem of numerical stability of interior point network flow algorithms by giving a path following method that works solely with integer arithmetic and is thus guaranteed to be free of any numerical instabilities. (3) We present a gradient descent approach for the undirected transshipment problem and its special case, the single source shortest path problem (SSSP). For distributed computation models this yields the first SSSP-algorithm with a near-optimal number of communication rounds.
The second part deals with fundamental topics from algebraic computation. (1) We give an algorithm for computing the complex roots of a complex polynomial. While achieving a bit complexity comparable to previous best results, our algorithm is simple and promising to be of practical impact. It uses a test for counting the roots of a polynomial in a region that is based on Pellet’s theorem. (2) We extend this test to polynomial systems, i.e., we develop an algorithm that can certify the existence of a k-fold zero of a zero-dimensional polynomial system within a given region. For bivariate systems, we show experimentally that this approach yields significant improvements when used as inclusion predicate in an elimination method.

Incentives in Dynamic Markets
(Advisor: Prof. Martin Hoefer, now Uni Frankfurt)

Monday, 23.07.2018, 11:00 h, in building E1 7, room 001


In this thesis, we consider a variety of combinatorial optimization problems within a common theme of uncertainty and selfish behavior.
In our first scenario, the input is collected from selfish players. Here, we study extensions of the so-called smoothness framework for mechanisms, a very useful technique for bounding the inefficiency of equilibria, to the cases of varying mechanism availability and participation of risk-averse players. In both of these cases, our main results are general theorems for the class of (λ,μ)-smooth mechanisms. We show that these mechanisms guarantee at most a (small) constant factor performance loss in the extended settings.
In our second scenario, we do not have access to the exact numerical input. Within this context, we explore combinatorial extensions of the well-known secretary problem under the assumption that the incoming elements only reveal their ordinal position within the set of previously arrived elements. We first observe that many existing algorithms for special matroid structures maintain their competitive ratio in the ordinal model. In contrast, we provide a lower bound for algorithms that are oblivious to the matroid structure. Finally, we design new algorithms that obtain constant competitive ratios for a variety of combinatorial problems.
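The classic single-choice secretary rule already uses only ordinal information, which makes the ordinal model described above natural; the sketch below simulates it (this is the textbook 1/e-rule, not one of the thesis's new combinatorial algorithms).

```python
import math
import random

def ordinal_secretary(ranks):
    """Classic secretary rule using only ordinal comparisons:
    observe and reject the first n/e elements, then accept the first
    element that beats everything seen so far (lower rank = better)."""
    n = len(ranks)
    cutoff = int(n / math.e)
    best_seen = min(ranks[:cutoff]) if cutoff else float("inf")
    for r in ranks[cutoff:]:
        if r < best_seen:
            return r
    return ranks[-1]  # forced to take the last element

random.seed(0)
trials = 10000
wins = sum(ordinal_secretary(random.sample(range(100), 100)) == 0
           for _ in range(trials))
print(wins / trials)  # success probability, roughly 1/e ~ 0.37
```

The algorithm never inspects numeric values, only whether each arrival beats the best so far, mirroring the ordinal position information assumed in the thesis's model.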


Cryptographic Techniques for Privacy and Access Control in Cloud-Based Applications
(Advisor: Prof. Matteo Maffei, now Uni Wien)

Friday, 29.06.2018, 15:30 h, in building E9 1 (CISPA), room 001


Digitization is one of the key challenges for today’s industries and society. It affects more and more business areas and also user data and, in particular, sensitive information. Due to its sensitivity, it is important to treat personal information as securely and privately as possible, yet still enable cloud-based software to use that information when requested by the user.
In this thesis, we focus on the privacy-preserving outsourcing and sharing of data, the querying of outsourced protected data, and the usage of personal information as an access control mechanism for rating platforms, which should be protected from coercion attacks. In those three categories, we present cryptographic techniques and protocols that push the state of the art. In particular, we first present multi-client oblivious RAM (ORAM), which augments standard ORAM with selective data sharing through access control, confidentiality, and integrity. Second, we investigate recent work in frequency-hiding order-preserving encryption and show that the state of the art lacks rigorous treatment, allowing for simple attacks against the security of the existing scheme. As a remedy, we show how to fix the security definition and that the existing scheme, slightly adapted, fulfills it. Finally, we design and develop a coercion-resistant rating platform. Coercion-resistance has been dealt with mainly in the context of electronic voting yet also affects other areas of digital life such as rating platforms.

Understanding Regulatory Mechanisms Underlying Stem Cells Helps to Identify Cancer Biomarkers
(Advisor: Prof. Volkhard Helms)

Thursday, 28.06.2018, 16:30 h, in building E2 1, room 001


In this thesis, we present novel algorithms to unravel regulatory mechanisms underlying stem cell differentiation and carcinogenesis. Inspired by the tightly interwoven topology of the regulators controlling the pluripotency network in mouse embryonic stem cells, we formulated the problem of identifying a set of master regulatory genes in a regulatory network as solving either of two combinatorial optimization problems, namely a minimum dominating set or a minimum connected dominating set, in weakly and strongly connected components. Then we applied the developed methods to regulatory cancer networks to identify disease-associated genes and anti-cancer drug targets in breast cancer and hepatocellular carcinoma. As not all the nodes in the solutions are critical, we developed a prioritization method to rank a set of candidate genes related to a certain disease based on systematic analysis of the genes that are differentially expressed in tumor and normal conditions. Moreover, we demonstrated that the topological features in regulatory networks surrounding differentially expressed genes are highly consistent across the output of several analysis tools for identifying differentially expressed genes. We compared two randomization strategies for TF-miRNA co-regulatory networks to infer significant network motifs underlying cellular identity. We showed that the edge-type conserving method surpasses the non-conserving method in terms of biological relevance and centrality overlap. We presented several web servers and software packages that were made publicly available at no cost. The Cytoscape plugin for minimum connected dominating sets identifies a set of key regulatory genes in a user-provided regulatory network based on a heuristic approach. The ILP formulations of minimum dominating set and minimum connected dominating set return the optimal solutions for the aforementioned problems.
The web servers TFmiR and TFmiR2 construct disease-, tissue-, process-specific networks for the sets of deregulated genes and miRNAs provided by a user. They highlight topological hotspots and offer detection of three- and four-node FFL motifs as a separate web service for mouse and human.
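To make the underlying combinatorial problem concrete, the sketch below runs a standard greedy heuristic for minimum dominating set on a tiny graph; it illustrates the problem the thesis's ILP formulations solve exactly, not the thesis's own method.

```python
def greedy_dominating_set(adj):
    """Standard greedy heuristic for minimum dominating set
    (illustration only): repeatedly pick the node that dominates
    the most currently undominated nodes. A node dominates itself
    and all of its neighbors."""
    undominated = set(adj)
    dom_set = set()
    while undominated:
        best = max(adj, key=lambda v: len(({v} | adj[v]) & undominated))
        dom_set.add(best)
        undominated -= {best} | adj[best]  # mark newly dominated nodes
    return dom_set

# Star graph: the hub alone dominates every leaf.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_dominating_set(star))  # -> {0}
```

In the regulatory-network setting, nodes are genes and edges are regulatory interactions, so a small dominating set corresponds to a compact set of candidate master regulators controlling the rest of the network.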

Model Reconstruction for Moment-based Stochastic Chemical Kinetics
(Advisor: Prof. Verena Wolf)

Monday, 25.06.2018, 12:00 h, in building E1 7, room 0.01


Based on the theory of stochastic chemical kinetics, the inherent randomness and stochasticity of biochemical reaction networks can be accurately described by discrete-state continuous-time Markov chains, where each chemical reaction corresponds to a state transition of the process. However, the analysis of such processes is computationally expensive and sophisticated numerical methods are required. The main complication is the largeness of the state space, so that analysis techniques based on an exploration of the state space are often not feasible, and the integration of the moments of the underlying probability distribution has become a very popular alternative. In this thesis we propose an analysis framework in which we integrate a number of moments of the process instead of the state probabilities. This results in a more time-efficient simulation of the time evolution of the process. In order to regain the state probabilities from the moment representation, we combine the moment-based simulation (MM) with a maximum entropy approach: the maximum entropy principle is applied to derive a distribution that fits best to a given sequence of moments. We further extend this approach by incorporating the conditional moments (MCM), which allows us not only to reconstruct the distribution of the species present in high amounts in the system, but also to approximate the probabilities of species with low molecular counts. For the given distribution reconstruction framework, we investigate the numerical accuracy and stability using case studies from systems biology, compare two different moment approximation methods (MM and MCM), examine whether it can be used for the reaction rate estimation problem, and describe possible future applications.
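The maximum entropy reconstruction step can be illustrated in its simplest form: among all distributions on a finite support matching a single given moment (the mean), the entropy maximizer has exponential form $p_i \propto e^{\lambda i}$, and $\lambda$ can be found by bisection. This is a minimal sketch of the principle, far simpler than the multi-moment reconstruction used in the thesis.

```python
import math

def maxent_given_mean(mean, N):
    """Maximum-entropy sketch: among distributions on {0,...,N} with
    the given mean, the entropy maximizer is p_i proportional to
    exp(lam * i). Solve for lam by bisection, since the mean is a
    monotone increasing function of lam."""
    def dist(lam):
        w = [math.exp(lam * i) for i in range(N + 1)]
        Z = sum(w)
        return [x / Z for x in w]

    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        m = sum(i * p for i, p in enumerate(dist(mid)))
        if m < mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

p = maxent_given_mean(5.0, 10)
print(round(sum(i * q for i, q in enumerate(p)), 3))  # -> 5.0
```

With more moments as constraints, the exponent becomes a polynomial in $i$ and the fitting problem requires numerical optimization, which is where accuracy and stability questions such as those studied in the thesis arise.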

Practical Dynamic Information Flow Control
(Advisor: Prof. Christian Hammer, now Uni Potsdam)

Friday, 22.06.2018, 16:30 h, in building E2 1, room 007


Computer systems and applications handle a plethora of private user information. They often also contain bugs, introduced with both malicious and non-malicious intent, that lead to information leaks. Information flow control has been studied extensively as an approach to mitigate such leaks, but has focused mainly on static analyses. However, some of these applications are developed using dynamic languages like JavaScript that render static analysis techniques ineffective. Although there has been growing interest in developing dynamic analysis techniques, they have not yet become the standard for information flow security in such settings; the prime reason being that these techniques either over-approximate or are too restrictive in most cases. This thesis focuses on improving the usability of dynamic information flow analyses. It begins by presenting a sound improvement and enhancement of the permissive-upgrade strategy. It then presents a sound and precise dynamic information flow analysis for handling complex features like exceptions in higher-order languages. To address the issue of permissiveness, this thesis develops a sound approach to bound leaks dynamically while allowing information release in accordance with a developer-specified budget. It then applies these techniques to a Web browser and explores a policy specification mechanism for Web applications.

Shilpa GARG
Computational Haplotyping: theory and practice
(Advisor: Prof. Tobias Marschall)

Thursday, 21.06.2018, 16:30 h, in building E2 1, room 007


Genomics has paved a new way to comprehend life and its evolution, and also to investigate causes of diseases and their treatment. One of the important problems in genomic analyses is haplotype assembly. Constructing complete and accurate haplotypes plays an essential role in understanding population genetics and how species evolve. In this thesis, we focus on computational approaches to haplotype assembly from third-generation sequencing technologies. This involves huge amounts of sequencing data, and such data contain errors due to the single-molecule sequencing protocols employed. Taking advantage of combinatorial formulations helps to correct for these errors when solving the haplotyping problem. Various computational techniques such as dynamic programming, parameterized algorithms, and graph algorithms are used to solve this problem.
This thesis presents several contributions concerning the area of haplotyping. First, a novel algorithm based on dynamic programming is proposed to provide approximation guarantees for phasing a single individual. Second, an integrative approach is introduced that combines multiple sequencing datasets to generate complete and accurate haplotypes. The effectiveness of this integrative approach is demonstrated on a real human genome. Third, we provide a novel efficient approach to phasing pedigrees and demonstrate its advantages in comparison to phasing a single individual. Fourth, we present a generalized graph-based framework for performing haplotype-aware de novo assembly. Specifically, this generalized framework consists of a hybrid pipeline for generating accurate and complete haplotypes from data stemming from multiple sequencing technologies, one providing accurate reads and the other providing long reads.

A tale of two packing problems: Improved algorithms and tighter bounds for online bin packing and the geometric knapsack problem
(Advisor: Prof. Rob van Stee, now Uni Siegen)

Tuesday, 12.06.2018, 16:15 h, in building E1 4, room 024


In this thesis, we deal with two packing problems: the online bin packing and the geometric knapsack problem. In online bin packing, the aim is to pack a given number of items of different size into a minimal number of containers. The items need to be packed one by one without knowing future items. For online bin packing in one dimension, we present a new family of algorithms that constitutes the first improvement over the previously best algorithm in almost 15 years. While the algorithmic ideas are intuitive, an elaborate analysis is required to prove its competitive ratio. We also give a lower bound for the competitive ratio of this family of algorithms. For online bin packing in higher dimensions, we discuss lower bounds for the competitive ratio and show that the ideas from the one-dimensional case cannot be easily transferred to obtain better two-dimensional algorithms. In the geometric knapsack problem, one aims to pack a maximum weight subset of given rectangles into one square container. For this problem, we consider offline approximation algorithms. For geometric knapsack with square items, we improve the running time of the best known PTAS and obtain an EPTAS. This shows that large running times caused by some standard techniques for geometric packing problems are not always necessary and can be improved. Finally, we show how to use resource augmentation to compute optimal solutions in EPTAS-time, thereby improving upon the known PTAS for this case.
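As a concrete baseline for the online setting described above, the sketch below implements the classic First Fit rule, which packs each arriving item without knowledge of future items; it is an illustration of the problem model, not the improved algorithm family from the thesis.

```python
def first_fit(items, capacity=1.0):
    """First Fit for online bin packing (classic baseline, not the
    thesis's algorithm): place each arriving item into the first open
    bin with enough remaining room, opening a new bin if none fits.
    Decisions are made one item at a time, with no lookahead."""
    bins = []
    for size in items:
        for b in bins:
            if sum(b) + size <= capacity + 1e-9:
                b.append(size)
                break
        else:
            bins.append([size])  # no open bin fits: open a new one
    return bins

items = [0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1]
packing = first_fit(items)
print(len(packing))  # -> 4 bins for this sequence
```

Competitive analysis compares the number of bins such an online rule opens against the offline optimum over worst-case item sequences, which is the ratio the thesis's new algorithm family improves.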

Georg NEIS
Compositional Compiler Correctness Via Parametric Simulations
(Advisor: Prof. Derek Dreyer)

Thursday, 07.06.2018, 14:00 h, in building E1 5, room 0.29


Compiler verification is essential for the construction of fully verified software, but most prior work (such as CompCert) has focused on verifying whole-program compilers. To support separate compilation and to enable linking of results from different verified compilers, it is important to develop a compositional notion of compiler correctness that is modular (preserved under linking), transitive (supports multi-pass compilation), and flexible (applicable to compilers that use different intermediate languages or employ non-standard program transformations). In this thesis, we formalize such a notion of correctness based on parametric simulations, thus developing a novel approach to compositional compiler verification.


Andreas TEUCKE
An Approximation and Refinement Approach to First-Order Automated Reasoning
(Advisor: Prof. Christoph Weidenbach)

Wednesday, 16.05.2018, 14:00 h, in building E1 5, room 0.02


With the goal of lifting model-based guidance from the propositional setting to first-order logic, I have developed an approximation theorem proving approach based on counterexample-guided abstraction refinement. A given clause set is transformed into a simplified form where satisfiability is decidable. This approximation extends the signature and preserves unsatisfiability: if the simplified clause set is satisfiable, so is the original clause set. A resolution refutation generated by a decision procedure on the simplified clause set can then either be lifted to a refutation in the original clause set, or it guides a refinement excluding the previously found unliftable refutation. This way the approach is refutationally complete. I have implemented my approach based on the SPASS theorem prover. On certain satisfiable problems, the implementation shows the ability to beat established provers such as SPASS, iProver, and Vampire.

Georg TAMM
Architectures for Ubiquitous 3D on Heterogeneous Computing Platforms
(Advisor: Prof. Philipp Slusallek)

Wednesday, 09.05.2018, 14:00 h, in building D3 4,  VisCenter DFKI


Today, a wide scope for 3D graphics applications exists, including domains such as scientific visualization, 3D-enabled web pages, and entertainment. At the same time, the devices and platforms that run and display the applications are more heterogeneous than ever. Display environments range from mobile devices to desktop systems and ultimately to distributed displays. While the capability of the client devices may vary considerably, the visualization experiences running on them should be consistent. The field of application should dictate how and on what devices users access the application, not the technical requirements to realize the 3D output.
The goal of this thesis is to examine the diverse challenges involved in providing consistent and scalable visualization experiences to heterogeneous computing platforms and display setups. We developed a comprehensive set of rendering architectures in the major domains of scientific and medical visualization, web-based 3D applications, and movie virtual production. To provide the required service quality, performance, and scalability for different client devices and displays, our architectures focus on the efficient utilization and combination of the available client, server, and network resources. We present innovative solutions that incorporate methods for hybrid and distributed rendering as well as means to stream rendering results. We establish the browser as a promising platform for accessible and portable visualization services. We collaborated with experts from the medical field and the movie industry to evaluate the usability of our technology in real-world scenarios.

Integrated Timing Verification for Distributed Embedded Real-Time Systems
(Advisor: Prof. Reinhard Wilhelm)

Tuesday, 08.05.2018, 14:00 h, in building E1 1,  room 407


The verification of timing properties with commercial tools based on static analysis has been available for safety-critical automotive systems for more than a decade. Although they offer many other advantages in addition to safety guarantees, these tools are rarely applied in the automotive industry. This thesis analyses different possible reasons for this lack of application and suggests solutions where possible. Amongst the results of this thesis are technical findings, such as the further increase of automation (e.g. for loop-bound detection) or the automatic reduction of WCET overestimation for highly variant systems. With regard to the development process, technical as well as non-technical solutions are presented that target the actual users of the timing verification, who range from software developers to system architects. The central result of this thesis is the integration of existing and newly developed timing verification methods and tools into a holistic development process approach called Timing Oriented Application Development (TOAD).


Wenbin LI
From Perception over Anticipation to Manipulation
(Advisor: Dr. Mario Fritz)

Wednesday, 25.04.2018, 16:00 h, in building E1 4,  room 019


From autonomous driving cars to surgical robots, robotic systems have enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution of related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic systems. In this talk, we summarize our efforts to explore different ways of such integration to extend the capabilities of a robotic system to take on more challenging real-world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular, we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. We first explore how to guide a robot to place a single block into the scene without collapsing the existing structure and later introduce the target stacking task where the agent stacks blocks to reproduce a tower shown in an image. We validate our model in both synthetic and real-world settings.

Towards the understanding of transcriptional and translational regulatory complexity
(Advisor: Prof. Dr. Volkhard Helms)

Wednesday, 11.04.2018, 10:00 h, in building E2 1,  room 001


Since every cell contains the same genome, the observed phenotypic diversity can only arise from highly regulated mechanisms beyond the encoded DNA sequence. We investigated several mechanisms of protein biosynthesis and analyzed DNA methylation patterns, alternative translation sites, and genomic mutations. As chromatin states are determined by epigenetic modifications and nucleosome occupancy, we conducted a structural superimposition approach between DNA methyltransferase 1 (DNMT1) and the nucleosome, which suggests that DNA methylation depends on the accessibility of DNMT1 to nucleosome-bound DNA. Considering translation, alternative non-AUG translation initiation was observed. We developed reliable prediction models to detect these alternative start sites in a given mRNA sequence. Our tool PreTIS provides initiation confidences for all frame-independent non-cognate and AUG starts. Beyond these innate factors, specific sequence variations can additionally affect a phenotype. We conducted a genome-wide analysis with millions of mutations and found an accumulation of SNPs next to transcription starts that could relate to a gene-specific regulatory signal. We also report similar conservation of canonical and alternative translation sites, highlighting the relevance of alternative mechanisms. Finally, our tool MutaNET automates variation analysis by scoring the impact of individual mutations on cell function while also integrating a gene regulatory network.


Biomedical Knowledge Base Construction from Text and its Application in Knowledge-based Systems
(Advisor: Prof. Dr. Klaus Berberich, now HTW Saar)

Monday, 12.03.2018, 10:00 h, in building E1 4,  room 024


Today, in this Big Data era, overwhelming amounts of textual information across different sources, with a high degree of redundancy, have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
• We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
• We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event.
• To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
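The global-inference step in the second problem above selects sentences jointly rather than greedily. As a minimal illustration of that kind of 0/1 selection objective, the sketch below brute-forces the optimum on toy data; the sentences, concepts, weights, and budget are all invented here, and a real system would solve this as an ILP over text, time, geolocations, and entities:

```python
from itertools import combinations

# Hypothetical data: sentence -> (length, covered concepts)
sentences = {
    "s1": (10, {"quake", "2011"}),
    "s2": (8,  {"quake", "japan"}),
    "s3": (12, {"tsunami", "japan", "2011"}),
    "s4": (6,  {"aid"}),
}
weights = {"quake": 3, "tsunami": 3, "japan": 2, "2011": 1, "aid": 1}
budget = 20  # maximum total length of the digest

def coverage(selection):
    """Total weight of concepts covered by the selected sentences."""
    covered = set().union(*(sentences[s][1] for s in selection)) if selection else set()
    return sum(weights[c] for c in covered)

# Global inference: best subset within the budget (brute force on toy data)
best = max(
    (sel for r in range(len(sentences) + 1)
         for sel in combinations(sentences, r)
         if sum(sentences[s][0] for s in sel) <= budget),
    key=coverage,
)
print(sorted(best), coverage(best))  # → ['s2', 's3'] 9
```

Note that a greedy pick of the individually best sentence can miss this optimum; solving the selection globally is exactly what the ILP formulation buys.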

Patrick ERNST
Biomedical Knowledge Base Construction from Text and its Application in Knowledge-based Systems
(Advisor: Prof. Dr. Gerhard Weikum)

Tuesday, 06.03.2018, 10:00 h, in building E1 4,  room 024


While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions:
– To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas.
– To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments: we extend our framework to partial patterns and facts and reason over the composability of partial facts.
– To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration.
Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.
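The composition of partial facts mentioned above can be sketched as merging two extractions of the same relation whose filled arguments do not conflict. This is a hypothetical illustration; the relation name, argument slots, and values are invented and do not come from the thesis:

```python
def compose(f1, f2):
    """Merge two partial facts of the same relation if their filled
    arguments do not conflict; return the combined fact or None."""
    if f1["rel"] != f2["rel"]:
        return None
    merged = {}
    for slot in f1["args"].keys() | f2["args"].keys():
        a, b = f1["args"].get(slot), f2["args"].get(slot)
        if a is not None and b is not None and a != b:
            return None                    # conflicting evidence: not composable
        merged[slot] = a if a is not None else b
    return {"rel": f1["rel"], "args": merged}

# Two partial extractions about the same (hypothetical) ternary relation:
p1 = {"rel": "treats", "args": {"drug": "aspirin", "disease": "fever", "dosage": None}}
p2 = {"rel": "treats", "args": {"drug": "aspirin", "disease": None, "dosage": "500mg"}}
full = compose(p1, p2)
print(full["args"]["disease"], full["args"]["dosage"])  # → fever 500mg
```

Each partial extraction alone would be too incomplete to store; composing compatible partial facts recovers the full higher-arity fact.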


Roland LEIßA
Language Support for Programming High-Performance Code
(Advisor: Prof. Dr. Sebastian Hack)

Thursday, 22.02.2018, 14:00 h, in building E1 7,  room 001


Nowadays, the computing landscape is becoming increasingly heterogeneous and this trend currently shows no signs of turning around. In particular, hardware becomes more and more specialized and exhibits different forms of parallelism. For performance-critical codes it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD).
In this talk, we first introduce Sierra: an extension for C++ that facilitates portable and effective SIMD programming. Second, we present AnyDSL. This framework allows one to embed a so-called domain-specific language (DSL) into a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. In order to implement a DSL, one usually has to write or modify a compiler. With AnyDSL, however, the DSL constructs are implemented directly in the host language, while a partial evaluator removes any abstractions that are required in the implementation of the DSL.
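The abstraction-removal idea behind the partial evaluator can be shown with a classic toy example. This is a sketch in Python, not AnyDSL's actual host language: a generic power function is specialized for a known exponent, so the recursion over the exponent disappears before "run time":

```python
def specialize_power(n):
    """Partially evaluate pow(x, n) for a fixed n: the recursion over n
    is unrolled now, leaving only multiplications at run time."""
    if n == 0:
        return lambda x: 1
    rest = specialize_power(n - 1)
    return lambda x: x * rest(x)

cube = specialize_power(3)   # specialization happens here, once
print(cube(2), cube(5))      # → 8 125
```

The generic, readable abstraction (recursion over `n`) costs nothing in the residual program; this is the same trade-off AnyDSL exploits when removing DSL-implementation abstractions.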

Gaze and Peripheral Vision Analysis for Human-Environment Interaction: Applications in Automotive and Mixed-Reality Scenarios
(Advisor: Prof. Dr. Wolfgang Wahlster)

Wednesday, 14.02.2018, 16:00 h, in building D3 2, Reuse seminar room -2.17


This thesis studies eye-based user interfaces which integrate information about the user’s perceptual focus-of-attention into multimodal systems to enrich the interaction with the surrounding environment. We examine two new modalities: gaze input and output in the peripheral field of view. All modalities are considered in the whole spectrum of the mixed-reality continuum. We show the added value of these new forms of multimodal interaction in two important application domains: Automotive User Interfaces and Human-Robot Collaboration. We present experiments that analyze gaze under various conditions and help to design a 3D model for peripheral vision. Furthermore, this work presents several new algorithms for eye-based interaction, such as deictic reference in mobile scenarios, non-intrusive user identification, or exploiting the peripheral field of view for advanced multimodal presentations. These algorithms have been integrated into a number of software tools for eye-based interaction, which are used to implement 15 use cases for intelligent environment applications. These use cases cover a wide spectrum of applications, from spatial interactions with a rapidly changing environment from within a moving vehicle, to mixed-reality interaction between teams of humans and robots.

Justifying The Strong Memory Semantics of Concurrent High-Level Programming Languages for System Programming
(Advisor: Prof. Dr. Wolfgang Paul)

Wednesday, 07.02.2018, 15:00 h, in E1 7, room 001


We justify the strong memory semantics of high-level concurrent languages, such as C11 or Java, when compiled to x86-like machines. Languages such as C11 guarantee that programs that obey a specific software discipline behave as if they were executed on a simple coherent shared memory multi-processor. Real x86 machines, on the other hand, do not provide a coherent shared memory, and compilers add slow synchronizing instructions to provide the illusion of coherency. We show that one software discipline that closely matches software disciplines of languages such as C11 and Java – in a nutshell, that racing memory accesses are annotated as such by the programmer – and one particular way to add relatively few synchronization instructions – in a nutshell, between an annotated store and an annotated load in the same thread – suffice to create this illusion.
The software discipline has to be obeyed by the program in the semantics of the high-level language, therefore allowing the verification effort to take place completely in the strong memory semantics of the high-level language. We treat a machine model with operating system support, and accordingly our theorems can be used to verify interruptible multi-core operating systems, including those where users can run unverified programs that do not obey the software discipline.
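The fencing rule stated above ("between an annotated store and an annotated load in the same thread") can be sketched as a simple per-thread pass over an instruction stream. The IR below is invented for illustration and is far simpler than the thesis's machine model:

```python
# Sketch: insert a synchronizing fence between an annotated (racing)
# store and a later annotated load in the same thread, and nowhere else.

def insert_fences(thread):
    """thread: list of (op, annotated) pairs, op in {'store', 'load', ...}."""
    out, pending_store = [], False
    for op, annotated in thread:
        if op == "load" and annotated and pending_store:
            out.append(("fence", False))   # flush the store buffer first
            pending_store = False
        out.append((op, annotated))
        if op == "store" and annotated:
            pending_store = True
    return out

prog = [("store", True), ("load", False), ("load", True), ("store", False)]
print(insert_fences(prog))
```

Only one fence is emitted: unannotated (non-racing) accesses never trigger one, which is why annotating races lets the compiler add "relatively few" synchronizing instructions.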

Variational Image Fusion
(Advisor: Prof. Dr. Joachim Weickert)

Monday, 05.02.2018, 10:15 h, in E1 1, room 407


The main goal of this work is the fusion of multiple images to a single composite that offers more information than the individual input images. We approach those fusion tasks within a variational framework.
First, we present iterative schemes that are well-suited for such variational problems and related tasks. They lead to efficient algorithms that are simple to implement and well-parallelisable. Next, we design a general fusion technique that aims for an image with optimal local contrast. This is the key to a versatile method that performs well in many application areas such as multispectral imaging, decolourisation, and exposure fusion. To handle motion within an exposure set, we present the following two-step approach: First, we introduce the complete rank transform to design an optic flow approach that is robust against severe illumination changes.
Second, we eliminate remaining misalignments by means of brightness transfer functions that relate the brightness values between frames. Additional knowledge about the exposure set enables us to propose the first fully coupled method that jointly computes an aligned high dynamic range image and dense displacement fields.
Finally, we present a technique that infers depth information from differently focused images. In this context, we additionally introduce a novel second order regulariser that adapts to the image structure in an anisotropic way.
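The complete rank transform used in the optic flow step replaces every pixel in a patch by its rank within that patch, which makes the descriptor invariant under monotonically increasing brightness changes. A minimal sketch of this idea (patch size and values invented for illustration):

```python
import numpy as np

def complete_rank_transform(patch):
    """Replace each value in the patch by its rank within the patch."""
    flat = patch.ravel()
    # rank of a value = number of strictly smaller values in the patch
    ranks = np.array([(flat < v).sum() for v in flat])
    return ranks.reshape(patch.shape)

patch = np.array([[10, 30],
                  [20, 40]])
brightened = 2 * patch + 5            # monotone illumination change
print(complete_rank_transform(patch))  # ranks, e.g. [[0 2] [1 3]]

# The descriptor is unchanged under the illumination change:
assert np.array_equal(complete_rank_transform(patch),
                      complete_rank_transform(brightened))
```

Matching such rank descriptors instead of raw brightness values is what makes the flow estimation robust to severe illumination changes between exposures.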


Numerical Analysis of Stochastic Biochemical Reaction Networks
(Advisor: Prof. Dr. Verena Wolf)

Wednesday, 31.01.2018, 14:00 h, in E1 7, room 001


Numerical solution of the chemical master equation for stochastic reaction networks typically suffers from the state space explosion problem due to the curse of dimensionality and from stiffness due to multiple time scales. The dimension of the state space equals the number of molecular species involved in the reaction network, and the size of the system of differential equations equals the number of states in the corresponding continuous-time Markov chain, which is usually enormous and often even infinite. Thus, efficient numerical solution approaches must be able to handle huge, possibly infinite, and stiff systems of differential equations.
In this thesis, we present efficient techniques for the numerical analysis of biochemical reaction networks. We present an approximate numerical integration approach that combines a dynamical state space truncation procedure with efficient numerical integration schemes for systems of ordinary differential equations, including adaptive step size selection based on local error estimates. We combine our dynamical state space truncation with the method of conditional moments, and present the implementation details and numerical results. We also incorporate ideas from importance sampling simulations into a non-simulative numerical method that approximates transient rare event probabilities based on a dynamical truncation of the state space.
Finally, we present a maximum likelihood method for the estimation of the model parameters given noisy time series measurements of molecular counts. All approaches presented in this thesis are implemented as part of the tool STAR, which allows one to model and simulate biochemical reaction networks. The efficiency and accuracy are demonstrated by numerical examples.
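The dynamical truncation idea can be illustrated on the simplest possible example: transient analysis of a birth-death process (constant production rate k, degradation rate g·x) by forward Euler on the master equation, dropping states whose probability falls below a threshold. All parameters here are illustrative, and a real solver would use adaptive step sizes as described above:

```python
import numpy as np

def transient(k=10.0, g=1.0, t_end=2.0, dt=1e-4, eps=1e-12, nmax=200):
    """Transient distribution of a birth-death process with truncation."""
    p = np.zeros(nmax)
    p[0] = 1.0                            # start with zero molecules
    x = np.arange(nmax)
    for _ in range(int(t_end / dt)):
        birth, death = k * p, g * x * p   # outgoing probability fluxes
        dp = -(birth + death)
        dp[1:] += birth[:-1]              # inflow from x-1 via birth
        dp[:-1] += death[1:]              # inflow from x+1 via death
        p += dt * dp
        p[p < eps] = 0.0                  # dynamical truncation step
    return p

p = transient()
mean = (np.arange(p.size) * p).sum() / p.sum()
print(round(mean, 2))  # close to k/g * (1 - exp(-g*t_end)) ≈ 8.65
```

The truncated probability mass bounds the approximation error, which is what makes the dynamically truncated state space a controlled approximation rather than an ad-hoc cutoff.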

Relational Cost Analysis
(Advisor: Dr. Deepak Garg)

Monday, 22.01.2018, 16:00 h, in E1 5, room 029


Programming languages research has made great progress towards statically estimating the execution cost of a program. However, when one is interested in how the execution costs of two programs compare to each other (i.e., relational cost analysis), the use of unary techniques does not work well in many cases. In order to support a relational cost analysis, we must ultimately support reasoning about not only the executions of a single program, but also the executions of two programs, taking into account their similarities. This dissertation makes several contributions to the understanding and development of such a relational cost analysis. It shows how:
• Refinement types and effect systems can express functional and relational quantitative properties of pairs of programs, including the difference in execution costs.
• Relational cost analysis can be adapted to reason about dynamic stability, a measure of the update times of incremental programs as their inputs change.
• A sound and complete bidirectional type system can be developed (and implemented) for relational cost analysis.
