PhD Thesis Defenses

PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work going on by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.

Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.



Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D
(Advisor: Prof. Nico Pfeifer, now Tübingen)

Monday, 24.06.2019, 16:00 h, in building E1 4, room 0.24


With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multi-pronged regulatory mechanisms of the genome. In my colloquium, I will present interpretable machine learning-based approaches for prediction and analysis of long-range chromatin interactions and how they can help gain novel, fine-grained insights into genome regulatory mechanisms. Specifically, I will first present our string kernels-based support vector classification method for prediction of long-range chromatin interactions. This method was one of the first bioinformatic approaches to suggest a potential general role of short tandem repeat sequences in long-range genome regulation and its 3D organization. I will then present CoMIK, a method we developed for handling variable-length DNA sequences in classification scenarios. CoMIK can not only identify the features important for classification but also locate them within the individual sequences, thus enabling the possibility of more fine-grained insights.

Artificial Intelligence for Efficient Image-based View Synthesis
(Advisor: Prof. Hans-Peter Seidel and Prof. Tobias Ritschel, now University College London)

Monday, 24.06.2019, 9:00 h, in building E1 4, room 0.19


Synthesizing novel views from image data is a widely investigated topic in both computer graphics and computer vision, and has many applications like stereo or   multi-view rendering for virtual reality, light field reconstruction, and image post-processing. While image-based approaches have the advantage of reduced computational load compared to classical model-based rendering, efficiency is still a major concern. We demonstrate how concepts and tools from artificial intelligence can be used to increase the efficiency of image-based view synthesis algorithms. In particular it is shown how machine learning can help to generate point patterns useful for a variety of computer graphics tasks, how path planning can guide image warping, how sparsity-enforcing optimization can lead to significant speedups in interactive distribution effect rendering, and how probabilistic inference can be used to perform real-time 2D-to-3D conversion.

Azim Dehghani Amirabad
From genes to transcripts: Integrative modeling and analysis of regularity networks
(Advisor: Dr. Marcel Schulz, now Frankfurt)

Tuesday, 11.06.2019, 10:00 h, in building E1 7, room 002


Although all the cells in an organism posses the same genome, the regulatory mechanisms lead to highly specic cell types. Elucidating these regulatory mechanisms is a great challenge in systems biology research. Nonetheless, it is known that a large fraction of our genome is comprised of regulatory elements, the precise mechanisms by which different combinations of regulatory elements are involved in controlling gene expression and cell identity are poorly understood. This thesis describes algorithms and approaches for modeling and analysis of different modes of gene regulation. We present POSTIT a novel algorithm for modeling and inferring transcript isoform regulation from transcriptomics and epigenomics data. POSTIT uses multi-task learning with structured-sparsity inducing regularizer to share the regulatory information between isoforms of a gene, which is shown to lead to accurate isoform expression prediction and inference of regulators.
We developed a novel statistical method, called SPONGE, for large-scale inference of ceRNA networks. In this framework, we designed an efficient empirical p-value computation approach, by sampling from derived null models, which addresses important confounding factors such as sample size, number of involved regulators and strength of correlation.
Finally, we present an integrative analysis of miRNA and protein-based post-transcriptional regulation. We postulate a competitive regulation of the RNA-binding protein IMP2 with miRNAs binding the same RNAs using expression and RNA binding data. This function of IMP2 is relevant in the contribution to disease in the context of adult cellular metabolism.


Model-based Human Performance Capture in Outdoor Scenes
(Advisor: Prof. Christian Theobalt)

Tuesday, 21.05.2019, 10:00 h, in building E1 4, room 019


Technologies for motion and performance capture of real actors have enabled the creation of realistic-looking virtual humans through detail and deformation transfer at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios.
Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and fine-scale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans.

Monitoring with Parameters
(Advisor: Prof. Bernd Finkbeiner)

Friday, 17.05.2019, 14:15 h, in building E1 7, room 0.01


Runtime monitoring of embedded systems is a method to safeguard their reliable operation by detecting runtime failures within the system and recognizing unexpected environment behavior during system development and operation. Specification languages for runtime monitoring aim to provide the succinct, understandable, and formal specification of system and component properties.
Starting from temporal logics, this thesis extends the expressivity of specification languages for runtime monitoring in three key aspects while maintaining the predictability of resource usage. First, we provide monitoring algorithms for linear-time temporal logic with parameters (PLTL), where the parameters bound the number of steps until an eventuality is satisfied. Second, we introduce Lola 2.0, which adds data parameterization to the stream specification language Lola. Third, we integrate real-time specifications in RTLola and add real-time sliding windows, which aggregate data over real-time intervals. For the combination of these extensions, we present a design-time specification analysis which provides resource guarantees.
We report on a case study on the application of the language in an autonomous UAS. Component and system properties were specified together with domain experts in the developed stream specification language and evaluated in a real-time hardware-in-the-loop testbed with a complex environment simulation.


Matthias DÖRING
Computational Approaches for Improving Treatment and Prevention of Viral Infections
(Advisor: Prof. Nico Pfeifer, now Tübingen)

Tuesday, 30.04.2019, 15:00 h, in building E2 1, room 0.01


The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments.
geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and accessing a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR on human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.

Stochastic Modeling of DNA Demethylation Dynamics in ESCs
(Advisor: Prof. Verena Wolf)

Tuesday, 30.04.2019, 11:00 h, in building E1 5, room 0.29


DNA methylation and demethylation are opposing processes that when in balance create stable patterns of epigenetic memory. The control of DNA methylation pattern formation in replication dependent and independent demethylation processes has been suggested to be influenced by Tet mediated oxidation of a methylated cytosine, 5mC, to a hydroxylated cytosine, 5hmC. Based only on in vitro experiments, several alternative mechanisms have been proposed on how 5hmC influences replication dependent maintenance of DNA methylation and replication independent processes of active demethylation. In this thesis we design an extended and easily generalizable hidden Markov model that uses as input hairpin (oxidative-)bisulfite sequencing data to precisely determine the over time dynamics of 5mC and 5hmC, as well as to infer the activities of the involved enzymes at a single CpG resolution. Developing the appropriate statistical and computational tools, we apply the model to discrete high-depth sequenced genomic loci, and on a whole genome scale with a much smaller sequencing depth. Performing the analysis of the model’s output on mESCs data, we show that the presence of Tet enzymes and 5hmC has a very strong impact on replication dependent demethylation by establishing a passive demethylation mechanism, implicitly impairing methylation maintenance, but also down-regulating the de novo methylation activity.

Analysis of the Protein-Ligand and Protein-Peptide Interactions Using a Combined Sequence- and Structure-Based Approach
(Advisor: Prof. Olga Kalinina)

Wednesday, 17.04.2019, 15:00 h, in building E1 4, room 0.24


Proteins participate in most of the important processes in cells, and their ability to perform their function ultimately depends on their three-dimensional structure. They usually act in these processes through interactions with other molecules. Because of the importance of their role, proteins are also the common target for small molecule drugs that inhibit their activity, which may include targeting protein interactions. Understanding protein interactions and how they are affected by mutations is thus crucial for combating drug resistance and aiding drug design.
This dissertation combines bioinformatics studies of protein interactions at both primary sequence and structural level. We analyse protein-protein interactions through linear motifs, as well as protein-small molecule interactions, and study how mutations affect them. This is done in the context of two systems. In the first study of drug resistance mutations in the protease of the human immunodeficiency virus type 1, we successfully apply molecular dynamics simulations to estimate the effects of known resistance-associated mutations on the free binding energy, also revealing molecular mechanisms of resistance. In the second study, we analyse consensus profiles of linear motifs that mediate the recognition by the mitogen-activated protein kinases of their target proteins. We thus gain insights into the cellular processes these proteins are involved in.

 Abdalghani ABUJABAL
Question Answering over Knowledge Bases with Continuous Learning
(Advisor: Prof. Gerhard Weikum)

Friday, 12.04.2019, 11:00 h, in building E1 4, room 0.24


Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, two research issues arise:
(i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, and
(ii) how to design and curate datasets to advance research in question answering. In this dissertation, we make the following contributions in the area of question answering (QA). For issue (i), we present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. We also introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems.

Generalizations of the Multicut Problem for Computer Vision
(Advisor: Prof. Bernt Schiele)

Monday, 08.04.2019, 11:30 h, in building E1 4, room 0.24


Graph decompositions have always been an essential part of computer vision as a large variety of tasks can be formulated in this framework. One way to optimize for a decomposition of a graph is called the multicut problem, also known as correlation clustering. Its particular feature is that the number of clusters is not required to be known a priori, but is rather deduced from the optimal solution. On the downside, it is NP-hard to solve.
In this thesis we present several generalizations of the multicut problem, that allow to model richer dependencies between nodes of a graph. We also propose efficient local search algorithms, that in practice find good solution in reasonable time, and show applications to image segmentation, multi-human pose estimation, multi-object tracking and many others.

Sebastian HAHN
On Static Execution-Time Analysis – Compositionality, Pipeline Abstraction, and Predictable Hardware
(Advisor: Prof. Reinhard Wilhelm)

Thursday, 04.04.2019, 16:00 h, in building E1 1, room 407


Proving timeliness is an integral part of the verification of safety-critical real-time systems. To this end, timing analysis computes upper bounds on the execution times of programs that execute on a given hardware platform.
At least since multi-core processors have emerged, timing analysis separates concerns by analysing different aspects of the system’s timing behaviour individually. In this work, we validate the underlying assumption that a timing bound can be soundly composed from individual contributions. We show that even simple processors exhibit counter-intuitive behaviour – a locally slow execution can lead to an even slower overall execution – that impedes the soundness of the composition. We present the compositional base bound analysis that accounts for any such amplifying effects within its timing contribution. This enables a sound although expensive compositional analysis even for complex contemporary processors.
In addition, we discuss future hardware designs that enable efficient compositional analyses. We introduce a hardware design, the strictly in-order pipeline, that behaves monotonically with respect to the progress of a program’s execution. Monotonicity enables us to formally reason about properties such as compositionality.

Joanna BIEGA
Enhancing Privacy and Fairness in Search Systems
(Advisor: Prof. Gerhard Weikum)

Monday, 01.04.2019, 15:15 h, in building E1 4, room 024


Search systems have a substantial potential to harm individuals and the society. First, since they collect vast amounts of data about their users, they have the potential to violate user privacy. Second, in applications where rankings influence people’s economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results.
This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects:
* We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility.
* We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weekly-supervised learning to rank model ordering the queries by privacy-sensitivity.
* We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user’s relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions.
* Last but not least, we propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles.

Yecheng GU
Intelligent Tutoring in Virtual Reality for Highly Dynamic Pedestrian Safety Training
(Advisor: Prof. Jörg Siekmann)

Monday, 01.04.2019, 14:00 h, in building E1 7, room 0.01


This thesis presents the design, implementation, and evaluation of an Intelligent Tutoring System (ITS) with a Virtual Reality (VR) interface for child pedestrian safety training. This system enables children to train practical skills in a safe and realistic virtual environment without the time and space dependencies of traditional roadside training. This system also employs Domain and Student Modelling techniques to analyze user data during training automatically and to provide appropriate instructions and feedback.
Thus, the traditional requirement of constant monitoring from teaching personnel is greatly reduced. Compared to previous work, especially the second aspect is a principal novelty for this domain. To achieve this, a novel Domain and Student Modeling method was developed in addition to a modular and extensible virtual environment for the target domain. While the Domain and Student Modeling framework is designed to handle the highly dynamic nature of training in traffic and the ill-defined characteristics of pedestrian tasks, the modular virtual environment supports different interaction methods and a simple and efficient way to create and adapt exercises. The thesis is complemented by two user studies with elementary school children. These studies testify great overall user acceptance and the system’s potential for improving key pedestrian skills through autonomous learning. Last but not least, the thesis presents experiments with different forms of VR input and provides directions for future work.


Novel Approaches to Anonymity and Privacy in Decentralized, Open Settings
(Advisor: Prof. Michael Backes)

Friday, 29.03.2019, 11:30 h, in building E9 1, room 0.01


The Internet has undergone dramatic changes in the last two decades, evolving from a mere communication network to a global multimedia platform in which billions of users actively exchange information. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy that existing technology is failing to keep pace with.
In this dissertation, we present the results of two lines of research that developed two novel approaches to anonymity and privacy in decentralized, open settings. First, we examine the issue of attribute and identity disclosure in open settings and develop the novel notion of (k,d)-anonymity for open settings that we extensively study and validate experimentally. Furthermore, we investigate the relationship between anonymity and linkability using the notion of (k,d)-anonymity and show that, in contrast to the traditional closed setting, anonymity within one online community does necessarily imply unlinkability across different online communities in the decentralized, open setting.
Secondly, we consider the transitive diffusion of information that is shared in social networks and spread through pairwise interactions of user connected in this social network. We develop the novel approach of exposure minimization to control the diffusion of information within an open network, allowing the owner to minimize its exposure by suitably choosing who they share their information with. We implement our algorithms and investigate the practical limitations of user side exposure minimization in large social networks.
At their core, both of these approaches present a departure from the provable privacy guarantees that we can achieve in closed settings and a step towards sound assessments of privacy risks in decentralized, open settings.

What we leave behind: reproducibility in chromatin analysis within and across species
(Advisor: Prof. Thomas Lengauer)

Thursday, 14.03.2019, 11:30 h, in building E2 1, room 007


Epigenetics is the field of biology that investigates heritable factors regulating gene expression without being directly encoded in the genome of an organism. In my colloquium, I am going to present approaches dealing with three challenging problems in computational epigenomics: first, we designed a solution strategy for improving reproducibility of computational analysis as part of a large national epigenetics research consortium. Second, we developed a software tool for the comparative epigenome analysis of cell types within a species. For this project, we relied on a well-established and compact representation of epigenetic information known as chromatin states, which enabled us to realize fast and interpretable comparative chromatin analysis even for small sample numbers. Third, we extended comparative epigenomics beyond canonical model organisms to also include species for which little to no epigenetic data is available. In a proof of concept study, we implemented a computational pipeline for predicting biologically relevant information, e.g., cell-type specific gene expression profiles, in non-model organisms based only on reference epigenomes from well-studied species such as human or mouse.


Scalable positioning of commodity mobile devices using audio signals
(Advisor: Prof. Peter Druschel)

Thursday, 14.02.2019, 9:00 h, in building E1 5, room 0.29


This thesis explores the problem of computing a position map for co-located mobile devices. The positioning should happen in a scalable manner without requiring specialized hardware and without requiring specialized infrastructure (except basic Wi-Fi or cellular access). At events like meetings, talks, or conferences, a position map can aid spontaneous communication among users based on their relative position in two ways. First, it enables users to choose message recipients based on their relative position, which also enables the position-based distribution of documents. Second, it enables senders to attach their position to messages, which can facilitate interaction between speaker and audience in a lecture hall and enables the collection of feedback based on users‘ location.
In this thesis, we present Sonoloc, a mobile app and system that, by relying on acoustic signals, allows a set of commodity smart devices to determine their _relative_ positions. Sonoloc can position any number of devices within acoustic range with a constant number of acoustic signals emitted by a subset of devices. Our experimental evaluation with up to 115 devices in real rooms shows that — despite substantial background noise — the system can locate devices with an accuracy of tens of centimeters using no more than 15 acoustic signals.

Techniques to Protect Confidentiality and Integrity of Persistent and In-Memory Data
(Advisor: Prof. Peter Druschel)

Tuesday, 05.02.2019, 17:30 h, in building E1 5, room 0.29


Today computers store and analyze valuable and sensitive data. As a result we need to protect this data against confidentiality and integrity violations that can result in the illicit release, loss, or modification of a user’s and an organization’s sensitive data such as personal media content or client records. Existing techniques protecting confidentiality and integrity lack either efficiency or are vulnerable to malicious attacks. In this thesis we suggest techniques, Guardat and ERIM, to efficiently and robustly protect persistent and in-memory data.
To protect the confidentiality and integrity of persistent data, clients specify per-file policies to Guardat declaratively, concisely and separately from code. Guardat enforces policies by mediating I/O in the storage layer. In contrast to prior techniques, we protect against accidental or malicious circumvention of higher software layers. We present the design and prototype implementation, and demonstrate that Guardat efficiently enforces example policies in a web server.
To protect the confidentiality and integrity of in-memory data, ERIM isolates sensitive data using Intel Memory Protection Keys (MPK), a recent x86 extension to partition the address space. However, MPK does not protect against malicious attacks by itself. We prevent malicious attacks by combining MPK with call gates to trusted entry points and ahead-of-time binary inspection. In contrast to existing techniques, ERIM efficiently protects frequently-used session keys of web servers, an in-memory reference monitor’s private state, and managed runtimes from native libraries. These use cases result in high switch rates of the order of 10 5 –10 6 switches/s. Our experiments demonstrate less then 1% runtime overhead per 100,000 switches/s, thus outperforming existing techniques.

Muhammad Bilal ZAFAR
Discrimination in Algorithmic Decision Making: From Principles to Measures and Mechanisms
(Advisor: Prof. Krishna Gummadi)

Monday, 04.02.2019, 18:00 h, in building E1 5, room 0.29


The rise of algorithmic decision making in a variety of applications has also raised concerns about its potential for discrimination against certain social groups. However, incorporating nondiscrimination goals into the design of algorithmic decision making systems (or, classifiers) has proven to be quite challenging. These challenges arise mainly due to the computational complexities involved in the process, and the inadequacy of existing measures to computationally capture discrimination in various situations. The goal of this thesis is to tackle these problems. First, with the aim of incorporating existing measures of discrimination (namely, disparate treatment and disparate impact) into the design of well-known classifiers, we introduce a mechanism of decision boundary covariance, that can be included in the formulation of any convex boundary-based classifier in the form of convex constraints. Second, we propose alternative measures of discrimination. Our first proposed measure, disparate mistreatment, is useful in situations when unbiased ground truth training data is available. The other two measures, preferred treatment and preferred impact, are useful in situations when feature and class distributions of different social groups are significantly different, and can additionally help reduce the cost of nondiscrimination (as compared to the existing measures). We also design mechanisms to incorporate these new measures into the design of convex boundary-based classifiers.



Christian LANDER
Ubiquitous Head-Mounted Gaze Tracking
(Advisor: Prof. Antonio Krüger)

Thursday, 24.01.2019, 15:00 h, in building D3 2,  „Reuse“ meeting room


Recently, gaze-based interfaces in stationary settings became a valid option, as remote eye trackers are easily available at a low price. However, making gaze-based interfaces ubiquitous remains an open challenge that cannot be resolved using remote approaches. The miniaturization in camera technology allows for gaze estimation with head-mounted devices, which are similar to a pair of glasses. A crucial step towards gaze estimation using such devices is calibration. Although the development is mainly technology driven, a hypothetical fully calibrated system in which all parameters are known and valid is affected by calibration drift. In addition, attributes like minimal intrusiveness and obstruction, easy and flexible setup and high accuracy are not present in the latest commercial devices. Hence their applicability to spontaneous interaction or long-lasting research experiments is questionable.
In this thesis we enable spontaneous, calibration-free and accurate mobile gaze estimation. We contribute by investigating the following three areas: Efficient Long-Term Usage of a head-mounted eye tracker; Location, Orientation & Target Independent head mounted eye tracking; and Mobile & Accurate Calibration-Free gaze estimation. Through the elaboration of the theoretical principles, we investigate novel concepts for each aspect; these are implemented and evaluated, converging into a final device, to show how to overcome current limitations.



Quantifying & Characterizing Information Diets of Social Media Users
(Advisor: Prof. Krishna Gummadi)

Thursday, 20.12.2018, 16:00 h, in building E1 5, room 0.29


An increasing number of people are relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in the way news and information is exchanged in our society – from traditional mass media to online social media.
With the changing environment, it’s essential to study the information consumption of social media users and to audit how automated algorithms (like search and recommendation systems) are modifying the information that social media users consume. In this thesis, we fulfill this high-level goal with a two-fold approach. First, we propose the concept of information diets as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to the users, and (c) search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions of topics, geographic sources, and political perspectives.
Our work is aimed at making social media users aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effects of these biases.

Applicable and Sound Polyhedral Optimization of Low-Level Programs
(Advisor: Prof. Sebastian Hack)

Wednesday, 19.12.2018, 10:15 h, in building E1 7, room 0.01


Computers become increasingly complex. Current and future systems feature configurable hardware, multiple cores with different capabilities, as well as accelerators. In addition, the memory subsystem becomes diversified too. The cache hierarchy grows deeper, is augmented with scratchpads, low-latency memory, and high-bandwidth memory. The programmer alone cannot utilize this enormous potential. Compilers have to provide insight into the program behavior, or even arrange computations and data themselves. Either way, they need a more holistic view of the program. Local transformations, which treat the iteration order, computation unit, and data layout as fixed, will not be able to fully utilize a diverse system.
The polyhedral model, a high-level program representation and transformation framework, has shown great success tackling various problems in the context of diverse systems. While it is widely acknowledged for its analytical powers and transformation capabilities, it is also widely assumed to be too restrictive and fragile for real-world programs.
In this thesis, we improve the applicability and profitability of polyhedral-model-based techniques. Our efforts guarantee a sound polyhedral representation and extend the applicability to a wider range of programs. In addition, we introduce new applications to utilize the information available in the polyhedral program representation, including standalone optimizations and techniques to derive high-level properties.

Alexander WEINERT
Optimality and Resilience in Parity Games
(Advisor: Dr. Martin Zimmermann)

Friday, 14.12.2018, 14:15 h, in building E1 7, room 0.01


Modeling reactive systems as infinite games has yielded a multitude of results in the fields of program verification and program synthesis. The canonical parity condition used for such games, however, neither suffices to express non-functional requirements on the modeled system, nor to capture malfunctions of the deployed system.
In this work we address these issues by investigating quantitative games in which the above characteristics can be expressed. Parity games with costs are a variant of parity games in which traversing an edge incurs some nonnegative cost. The cost of a play is the limit of the cost incurred between answering odd colors by larger even ones. We extend that model by allowing for integer costs, obtaining parity games with weights, and determine both the computational complexity of solving the resulting games as well as the memory required by the protagonist to win such games.
Our results show that the increase in expressiveness comes at a steep cost in terms of both measures. Subsequently, we address the optimality question, i.e., the question asking whether the protagonist is able to ensure a given bound on the cost, and show that answering this question is harder than just solving the underlying game, even for special cases of parity games with weights.
Moreover, we further extend this model to allow for modeling disturbances that may occur during the operation of the resulting strategy and show how to compute a strategy that is resilient against as many such disturbances as possible. Finally, we show that the protagonist is able to trade memory, quality, and resilience of strategies for one another and we show how to compute the possible tradeoffs between these three values.

Paarijaat ADITYA
Towards Privacy-Compliant Mobile Computing
(Advisor: Prof. Peter Druschel)

Monday, 03.12.2018, 16:30 h, in building E1 5, room 0.29


Sophisticated mobile computing, sensing and recording devices like smartphones, smartwatches, and wearable cameras are carried by their users virtually around the clock, blurring the distinction between the online and offline worlds. While these devices enable transformative new applications and services, they also introduce entirely new threats to users‘ privacy, because they can capture a complete record of the user’s location, online and offline activities, and social encounters, including an audiovisual record. Such a record of users‘ personal information is highly sensitive and is subject to numerous privacy risks. In this thesis, we have investigated and built systems to mitigate two such privacy risks:
1) privacy risks due to ubiquitous digital capture, where bystanders may inadvertently be captured in photos and videos recorded by other nearby users,
2) privacy risks to users‘ personal information introduced by a popular class of apps called ‘mobile social apps’. In this thesis, we present two systems, called I-Pic and EnCore, built to mitigate these two privacy risks.
Both systems aim to put the users back in control of what personal information is being collected and shared, while still enabling innovative new applications. We built working prototypes of both systems and evaluated them through actual user deployments. Overall we demonstrate that it is possible to achieve privacy-compliant digital capture and it is possible to build privacy-compliant mobile social apps, while preserving their intended functionality and ease-of-use. Furthermore, we also explore how the two solutions can be merged into a powerful combination, one which could enable novel workflows for specifying privacy preferences in image capture that do not currently exist.


Algorithmic Results for Clustering and Refined Physarum Analysis
(Advisor: Prof. Kurt Mehlhorn)

Tuesday, 27.11.2018, 16:15 h, in building E1 5, room 0.29


In the first part of this thesis, we study the Binary $\ell_0$-Rank-$k$ problem which given a binary matrix $A$ and a positive integer $k$, seeks to find a rank-$k$ binary matrix $B$ minimizing the number of non-zero entries of $A-B$. A central open question is whether this problem admits a polynomial time approximation scheme. We give an affirmative answer to this question by designing the first randomized almost-linear time approximation scheme for constant $k$ over the reals, $\mathbb{F}_2$, and the Boolean semiring. In addition, we give novel algorithms for important variants of $\ell_0$-low rank approximation. The second part of this dissertation, studies a popular and successful heuristic, known as Approximate Spectral Clustering (ASC), for partitioning the nodes of a graph $G$ into clusters with small conductance. We give a comprehensive analysis, showing that ASC runs efficiently and yields a good approximation of an optimal $k$-way node partition of $G$. In the final part of this thesis, we present two results on slime mold computations: i) the continuous undirected Physarum dynamics converges for undirected linear programs with a non-negative cost vector; and ii) for the discrete directed Physarum dynamics, we give a refined analysis that yields strengthened and close to optimal convergence rate bounds, and shows that the model can be initialized with any strongly dominating point.

A Verified Compiler for a Linear Imperative/Functional Intermediate Language
(Advisors: Prof. Sebastian Hack & Prof. Gert Smolka)

Friday, 16.11.2018, 12:00 h, in building E1 7, room 0.01


This thesis describes the design of the verified compiler LVC. LVC’s main novelty is the way its first-order, term-based intermediate language IL realizes the advantages of static single assignment (SSA) for verified compilation. IL is a term-based language not based on a control-flow graph (CFG) but defined in terms of an inductively defined syntax with lexically scoped mutually recursive function definitions. IL replaces the usual dominance-based SSA definition found in unverified and verified compilers with the novel notion of coherence. The main research question this thesis studies is whether IL with coherence offers a faithful implementation of SSA, and how the design influences the correctness invariants and the proofs in the verified compiler LVC. To study this question, we verify dead code elimination, several SSA-based value optimizations including sparse conditional constant propagation and SSA-based register allocation approach including spilling. In these case studies, IL with coherence provides the usual advantages of SSA and improves modularity of proofs. Furthermore, we propose a novel SSA construction algorithm based on coherence, and leverage the term structure of IL to obtain an inductive proof method for simulation proofs. LVC is implemented and verified with over 50,000 lines of code using the proof assistant Coq. To underline practicability of our approach, we integrate LVC with CompCert to obtain an executable compiler that generates PowerPC assembly code.

Supporting User’s Influence in Gamification Settings and Game Live-Streams
(Advisor: Prof. Antonio Krüger)

Tuesday, 06.11.2018, 15:00 h, in building D3 2 (DFKI), room -2.17 (Reuse)


Playing games has long been important to mankind. One reason for this is the associated autonomy, as players can decide on many aspects on their own and can shape the experience. Game-related sub-fields have appeared in Human-Computer Interaction where this autonomy is questionable: in this thesis, we consider gamification and game live-streams and here, we support the users‘ influence at runtime. We hypothesize that this should affect the perception of autonomy and should lead to positive effects overall. Our contribution is three-fold: first, we investigate crowd-based, self-sustaining systems in which the user’s influence directly impacts the outcome of the system’s service. We show that users are willing to expend effort in such systems even without additional motivation, but that gamification is still beneficial here. Second, we introduce „bottom-up“ gamification, i.e., the idea of self-tailored gamification. Here, users have full control over the gamification used in a system, i.e., they can set it up as they see fit at the system’s runtime. Through user studies, we show that this has positive behavioral effects and thus adds to the ongoing efforts to move away from „one-size-fits-all“ solutions. Third, we investigate how to make gaming live-streams more interactive, and how viewers perceive this. We also consider shared game control settings in live-streams, in which viewers have full control, and we contribute options to support viewers‘ self-administration here.


Mining Sandboxes
(Advisor: Prof. Andreas Zeller)

Monday, 22.10.2018, 11:00 h, in building E9 1, room 0.01


Modern software is ubiquitous, yet insecure. It has the potential to expose billions of humans to serious harm, up to and including losing fortunes and taking lives. Existing approaches for securing programs are either exceedingly hard and costly to apply, significantly decrease usability, or just don’t work well enough against a determined attacker.
In this thesis we propose a new solution that significantly increases application security yet it is cheap, easy to deploy, and has minimal usability impact. We combine in a novel way the best of what existing techniques of test generation, dynamic program analysis and runtime enforcement have to offer: We introduce the concept of sandbox mining. First, in a phase called mining, we use automatic test generation to discover application behavior. Second, we apply a sandbox to limit any behavior during normal usage to the one discovered during mining. Users of an application running in a mined sandbox are thus protected from the application suddenly changing its behavior, as compared to the one observed during automatic test generation. As a consequence, backdoors, advanced persistent threats and other kinds of attacks based on the passage of time become exceedingly hard to conduct covertly. They are either discovered in the secure mining phase, where they can do no damage, or are blocked altogether. Mining is cheap because we leverage fully automated test generation to provide baseline behavior. Usability is not degraded: the sandbox runtime enforcement impact is negligible; the mined behavior is comprehensive and presented in a human readable format, thus any unexpected behavior changes are rare and easy to reason about. Our BOXMATE prototype for Android applications shows the approach is technically feasible, has an easy setup process, and is widely applicable to existing apps. Experiments conducted with BOXMATE show less than one hour is required to mine Android applications sandboxes, requiring few to no confirmations for frequently used functionality.


Correctness of Multi-Core Processors with Operating System Support
(Advisor: Prof. Wolfgang Paul)

Thursday, 27.09.2018, 10:00 h, in building E1 5, room 0.29


In the course of adding hardware support for operating systems and hypervisors (like Microsoft Hyper-V), we verify a realistic multi-core processor with integrated mechanisms for two levels of virtual address translation. The second level of address translation (SLAT) is required to allow guests of a hypervisor (typically operating systems) to execute their programs in translated mode. In hardware, the scheme of SLAT is implemented in a dedicated processor component, called the two-level memory management unit (MMU). Both the specification and the implementation of the two-level MMU are presented in full detail. The main contribution is the complete paper and pencil correctness proof for the pipelined multi-core implementation of the MIPS-86 ISA with SLAT. MIPS-86 combines the instruction set of MIPS with the memory model of x86. First, we consider the sequential implementation, which serves to demonstrate integration of the two-level MMU into the MIPS processor. In the proof of our main result — correctness of the pipelined implementation — we refer to the sequential case to show correctness of the MMU operation. This allows us to shift the focus towards the problems of pipelining a machine with speculative execution and interrupts, which are necessary to consider in combination with address translation.

Xucong ZHANG
Gaze Estimation and Interaction in Real-World Environments
(Advisor: Dr. Andreas Bulling)

Wednesday, 26.09.2018, 11:00 h, in building E1 4, room 0.19


Human eye gaze has been widely used in human-computer interaction, as it is a promising modality for natural, fast, pervasive, and non-verbal interaction between humans and computers. Appearance-based gaze estimation methods directly regress from eye images to gaze targets without eye feature detection, and therefore have great potential to work with ordinary devices. However, these methods require a large amount of domain-specific training data to cover the significant variability in eye appearance. In this thesis, we focus on developing appearance-based gaze estimation methods and corresponding attentive user interfaces with a single webcam for challenging real-world environments. We collected a large-scale gaze estimation dataset from real-world setting, proposed a full-face gaze estimation method, and studied data normalization. We applied our gaze estimation methods to real-world interactive application including eye contact detection and gaze estimation with multiple personal devices.

Yongtao SHUAI
Dynamic Adaptive Video Streaming with Minimal Buffer Sizes

(Advisor: Prof. Thorsten Herfet)

Wednesday, 19.09.2018, 16:00 h, in building E1 1, room 407


Recently, adaptive streaming has been widely adopted in video streaming services to improve the quality-of-experience (QoE) of video delivery over the Internet. However, state-of-the-art bitrate adaptation achieves satisfactory performance only with extensive buffering of several tens of seconds. This leads to high playback latency in video delivery, which is undesirable especially in the context of live content and interactive features with a low upper bound on the latency. Therefore, this thesis aims at pushing the application of adaptive streaming to its limit with respect to the buffering size, which is the dominant factor of the streaming latency.
In the current work, we first address the minimum buffering size required in adaptive streaming, which provides us with guidelines to determine a reasonable low latency for streaming systems. Then, we tackle the fundamental challenge of achieving such a low latency streaming by developing a novel adaptation algorithm that stabilizes buffer dynamics despite a small buffering size. We also present advanced improvements by designing a novel adaptation architecture with low-delay feedback for the bitrate selection and optimizing the underlying transport layer to offer efficient real-time streaming. Experimental evaluations demonstrate that our approach achieves superior QoE in adaptive video streaming, especially in the particularly challenging case of low latency streaming.


Understanding and Assessing Security on Android via Static Code Analysis
(Advisor: Prof. Michael Backes)

Tuesday, 28.08.2018, 14:00 h, in building E9 1, room 0.05


Smart devices have become a rich source of sensitive information including personal data (contacts and account data) and context information like GPS data that is continuously aggregated by onboard sensors. As a consequence, mobile platforms have become a prime target for malicious and over-curious applications. The growing complexity and the quickly rising number of mobile apps have further reinforced the demand for comprehensive application security vetting. This dissertation presents a line of work that advances security testing on Android via static code analysis. In the first part of this dissertation, we build an analysis framework that statically models the complex runtime behavior of apps and Android’s application framework (on which apps are built upon) to extract privacy and security-relevant data-flows. We provide the first classification of Android’s protected resources within the framework and generate precise API-to-permission mappings that excel over prior work. We then propose a third-party library detector for apps that is resilient against common code obfuscations to measure the outdatedness of libraries in apps and to attribute vulnerabilities to the correct software component. Based on these results, we identify root causes of app developers not updating their dependencies and propose actionable items to remedy the current status quo. Finally, we measure to which extent libraries can be updated automatically without modifying the application code.

Seong Joon OH
Image Manipulation against Learned Models: Privacy and Security Implications
(Advisor: Prof. Bernt Schiele)

Monday, 06.08.2018, 16:00 h, in building E1 4, room 0.24


Machine learning is transforming the world. Its application areas span privacy sensitive and security critical tasks such as human identification and self-driving cars. These applications raise privacy and security related questions that are not fully understood or answered yet: Can automatic person recognisers identify people in photos even when their faces are blurred? How easy is it to find an adversarial input for a self-driving car that makes it drive off the road?
This thesis contributes one of the first steps towards a better understanding of such concerns in the presence of data manipulation. From the point of view of user’s privacy, we show the inefficacy of common obfuscation methods like face blurring, and propose more advanced techniques based on head inpainting and adversarial examples. We discuss the duality of model security and user privacy problems and describe the implications of research in one area for the other. Finally, we study the knowledge aspect of the data manipulation problem: the more one knows about the target model, the more effective manipulations one can craft. We propose a game theoretic framework to systematically represent the partial knowledge on the target model and derive privacy and security guarantees. We also demonstrate that one can reveal architectural details and training hyperparameters of a model only by querying it, leading to even more effective data manipulations against it.


Quantifying and Mitigating Privacy Risks in Biomedical Data
(Advisor: Prof. Michael Backes)

Wednesday, 25.07.2018, 14:00 h, in building E9 1 (CISPA), room 001


The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, allowing for a breakthrough towards a more precise and personalized medicine. However, the release of these intrinsically highly sensitive, interdependent data poses a new severe privacy threat.
In this thesis, we provide means to quantify and protect the privacy of individuals’ biomedical data. Besides the genome, we specifically focus on two of the most important epigenetic elements influencing human health: microRNAs and DNA methylation. We quantify the privacy for multiple realistic attack scenarios. Our results underline that the privacy risks inherent to biomedical data have to be taken seriously. Moreover, we present and evaluate solutions to preserve the privacy of individuals. Our mitigation techniques stretch from the differentially private release of epigenetic data, considering its utility, up to cryptographic constructions to securely, and privately evaluate a random forest on a patient’s data.

On Flows, Paths, Roots, and Zeros
(Advisor: Prof. Kurt Mehlhorn)

Monday, 23.07.2018, 14:00 h, in building E1 7, room 001


This thesis has two parts; in the first of which we give new results for various network flow problems. (1) We present a novel dual ascent algorithm for min-cost flow and show that an implementation of it is very efficient on certain instance classes. (2) We approach the problem of numerical stability of interior point network flow algorithms by giving a path following method that works with integer arithmetic solely and is thus guaranteed to be free of any nu-merical instabilities. (3) We present a gradient descent approach for the undirected transship-ment problem and its special case, the single source shortest path problem (SSSP). For distrib-uted computation models this yields the first SSSP-algorithm with near-optimal number of communication rounds.
The second part deals with fundamental topics from algebraic computation. (1) We give an algorithm for computing the complex roots of a complex polynomial. While achieving a com-parable bit complexity as previous best results, our algorithm is simple and promising to be of practical impact. It uses a test for counting the roots of a polynomial in a region that is based on Pellet’s theorem. (2) We extend this test to polynomial systems, i.e., we develop an algorithm that can certify the existence of a k-fold zero of a zero-dimensional polynomial system within a given region. For bivariate systems, we show experimentally that this approach yields signifi-cant improvements when used as inclusion predicate in an elimination method.

Incentives in Dynamic Markets
(Advisor: Prof. Martin Hoefer, now Uni Frankfurt)

Monday, 23.07.2018, 11:00 h, in building E1 7, room 001


In this thesis, we consider a variety of combinatorial optimization problems within a common theme of uncertainty and selfish behavior.
In our first scenario, the input is collected from selfish players. Here, we study extensions of the so-called smoothness framework for mechanisms, a very useful technique for bounding the inefficiency of equilibria, to the cases of varying mechanism availability and participation of risk-averse players. In both of these cases, our main results are general theorems for the class of (λ,μ)-smooth mechanisms. We show that these mechanisms guarantee at most a (small) constant factor performance loss in the extended settings.
In our second scenario, we do not have access to the exact numerical input. Within this context, we explore combinatorial extensions of the well-known secretary problem under the assumption that the incoming elements only reveal their ordinal position within the set of previously arrived elements. We first observe that many existing algorithms for special matroid structures maintain their competitive ratio in the ordinal model. In contrast, we provide a lower bound for algorithms that are oblivious to the matroid structure. Finally, we design new algorithms that obtain constant competitive ratios for a variety of combinatorial problems.


Cryptographic Techniques for Privacy and Access Control in Cloud-Based Applications
(Advisor: Prof. Matteo Maffei, now Uni Wien)

Friday, 29.06.2018, 15:30 h, in building E9 1 (CISPA), room 001


Digitization is one of the key challenges for today’s industries and society. It affects more and more business areas and also user data and, in particular, sensitive information. Due to its sensitivity, it is important to treat personal information as secure and private as possible yet enabling cloud-based software to use that information when requested by the user.
In this thesis, we focus on the privacy-preserving outsourcing and sharing of data, the querying of outsourced protected data, and the usage of personal information as an access control mechanism for rating platforms, which should be protected from coercion attacks. In those three categories, we present cryptographic techniques and protocols that push the state of the art. In particular, we first present multi-client oblivious RAM (ORAM), which augments standard ORAM with selective data sharing through access control, confidentiality, and integrity. Second, we investigate on recent work in frequency-hiding order-preserving encryption and show that the state of the art misses rigorous treatment, allowing for simple attacks against the security of the existing scheme. As a remedy, we show how to fix the security definition and that the existing scheme, slightly adapted, fulfills it. Finally, we design and develop a coercion-resistant rating platform. Coercion-resistance has been dealt with mainly in the context of electronic voting yet also affects other areas of digital life such as rating platforms.

Understanding Regulatory Mechanisms Underlying Stem Cells Helps to Identify Cancer Biomarks
(Advisor: Prof. Volkhard Helms)

Thursday, 28.06.2018, 16:30 h, in building E2 1, room 001


In this thesis, we present novel algorithms to unravel regulatory mechanisms underlying stem cell differentiation and cancerogenesis. Inspired by the tightly interwoven topology of the regulators controlling the pluripotency network in mouse embryonic stem cells, we formulated the problem where a set of master regulatory genes in a regulatory network is identified by solving either of two combinatorial optimization problems namely a minimum dominating set or a minimum connected dominating set in weakly and strongly connected components. Then we applied the developed methods to regulatory cancer networks to identify disease-associated genes and anti-cancer drug targets in breast cancer and hepatocellular carcinoma. As not all the nodes in the solutions are critical, we developed a prioritization method to rank a set of candidate genes which are related to a certain disease based on systematic analysis of the genes that are differentially expressed in tumor and normal conditions. Moreover, we demonstrated that the topological features in regulatory networks surrounding differentially expressed genes are highly consistent in terms of using the output of several analysis tools for identifying differentially expressed genes. We compared two randomization strategies for TF-miRNA co-regulatory networks to infer significant network motifs underlying cellular identity. We showed that the edge-type conserving method surpasses the non-conserving method in terms of biological relevance and centrality overlap. We presented several web servers and software packages that were made publicly available at no cost. The Cytoscape plugin of minimum connected dominating set identifies a set of key regulatory genes in a user-provided regulatory network based on a heuristic approach. The ILP formulations of minimum dominating set and minimum connected dominating set return the optimal solutions for the aforementioned problems. The web servers TFmiR and TFmiR2 construct disease-, tissue-, process-specific networks for the sets of deregulated genes and miRNAs provided by a user. They highlight topological hotspots and offer detection of three- and four-node FFL motifs as a separate web service for mouse and human.

Model Reconstruction for Moment-based Stochastic Chemical Kinetics
(Advisor: Prof. Verena Wolf)

Monday, 25.06.2018, 12:00 h, in building E1 7, room 0.01


Based on the theory of stochastic chemical kinetics, the inherent randomness and stochasticity of biochemical reaction networks can be accurately described by discrete-state continuous-time Markov chains, where each chemical reaction corresponds to a state transition of the process. However, the analysis of such processes is computationally expensive and sophisticated numerical methods are required. The main complication comes due to the largeness problem of the state space, so that analysis techniques based on an exploration of the state space are often not feasible and the integration of the moments of the underlying probability distribution has become a very popular alternative. In this thesis we propose an analysis framework in which we integrate a number of moments of the process instead of the state probabilities. This results in a more time-efficient simulation of the time evolution of the process. In order to regain the state probabilities from the moment representation, we combine the moment-based simulation (MM) with a maximum entropy approach: the maximum entropy principle is applied to derive a distribution that fits best to a given sequence of moments. We further extend this approach by incorporating the conditional moments (MCM) which allows not only to reconstruct the distribution of the species present in high amount in the system, but also to approximate the probabilities of species with low molecular counts. For the given distribution reconstruction framework, we investigate the numerical accuracy and stability using case studies from systems biology, compare two different moment approximation methods (MM and MCM), examine if it can be used for the reaction rates estimation problem and describe the possible future applications.

Practical Dynamic Information Flow Control
(Advisor: Prof. Christian Hammer, now Uni Potsdam)

Friday, 22.06.2018, 16:30 h, in building E2 1, room 007


Computer systems and applications handle a plethora of private user information. These are often assisted by a set of bugs with both malicious and non-malicious intent leading to information leaks. Information flow control has been studied extensively as an approach to mitigate such leaks but has focussed mainly on static analyses. However, some of the applications are developed using dynamic languages like JavaScript that make the static analyses techniques ineffective. Although there has been a growing interest to develop dynamic analysis techniques, it still isn’t at the helm of information flow security in such settings; the prime reason being that these techniques either over-approximate or are too restrictive in most cases. This thesis focuses on improving the usability of dynamic information flow analyses. It begins by presenting a sound improvement and enhancement of the permissive-upgrade strategy. It, then, presents a sound and precise dynamic information flow analysis for handling complex features like exceptions in higher-order languages. To address the issue of permissiveness, this thesis develops a sound approach to bound leaks dynamically while allowing information release in accordance to a developer-specified budget. It then applies these techniques to a Web browser and explores a policy specification mechanism for Web applications.

Shilpa GARG
Computational Haplotyping: theory and practice
(Advisor: Prof. Tobias Marschall)

Thursday, 21.06.2018, 16:30 h, in building E2 1, room 007


Genomics has paved a new way to comprehend life and its evolution, and also to investigate causes of diseases and their treatment. One of the important problems in genomic analyses is haplotype assembly. Constructing complete and accurate haplotypes plays an essential role in understanding population genetics and how species evolve.In this thesis, we focus on computational approaches to haplotype assembly from third generation sequencing technologies. This involves huge amounts of sequencing data, and such data contain errors due to the single molecule sequencing protocols employed. Taking advantage of combinatorial formulations helps  to correct for these errors to solve the haplotyping problem. Various computational techniques such as dynamic programming, parameterized algorithms, and graph algorithms are used to solve this problem.
This thesis presents several contributions concerning the area of haplotyping. First, a novel algorithm based on dynamic programming is proposed to provide approximation guarantees for phasing a single individual. Second, an integrative approach is introduced to combining multiple sequencing datasets to generating complete and accurate haplotypes. The effectiveness of this integrative approach is demonstrated on a real human genome. Third, we provide a novel efficient approach to phasing pedigrees and demonstrate its advantages in comparison to phasing a single individual.Fourth, we present a generalized graph-based framework for performing haplotype-aware de novo assembly. Specifically, this generalized framework consists of a hybrid pipeline for generating accurate and complete haplotypes from data stemming from multiple sequencing technologies, one that provides accurate reads and other that provides long reads.

A tale of two packing problems: Improved algorithms and tighter bounds for online bin packing and the geometric knapsack problem
(Advisor: Prof. Rob van Stee, now Uni Siegen)

Tuesday, 12.06.2018, 16:15 h, in building E1 4, room 024


In this thesis, we deal with two packing problems: the online bin packing and the geometric knapsack problem. In online bin packing, the aim is to pack a given number of items of different size into a minimal number of containers. The items need to be packed one by one without knowing future items. For online bin packing in one dimension, we present a new family of algorithms that constitutes the first improvement over the previously best algorithm in almost 15 years. While the algorithmic ideas are intuitive, an elaborate analysis is required to prove its competitive ratio. We also give a lower bound for the competitive ratio of this family of algorithms. For online bin packing in higher dimensions, we discuss lower bounds for the competitive ratio and show that the ideas from the one-dimensional case cannot be easily transferred to obtain better two-dimensional algorithms. In the geometric knapsack problem, one aims to pack a maximum weight subset of given rectangles into one square container. For this problem, we consider offline approximation algorithms. For geometric knapsack with square items, we improve the running time of the best known PTAS and obtain an EPTAS. This shows that large running times caused by some standard techniques for geometric packing problems are not always necessary and can be improved. Finally, we show how to use resource augmentation to compute optimal solutions in EPTAS-time, thereby improving upon the known PTAS for this case.

Georg NEIS
Compositional Compiler Correctness Via Parametric Simulations
(Advisor: Prof. Derek Dreyer)

Thursday, 07.06.2018, 14:00 h, in building E1 5, room 0.29


Compiler verification is essential for the construction of fully verified software, but most prior work (such as CompCert) has focused on verifying whole-program compilers. To support separate compilation and to enable linking of results from different verified compilers, it is important to develop a compositional notion of compiler correctness that is modular (preserved under linking), transitive (supports multi-pass compilation), and flexible (applicable to compilers that use different intermediate languages or employ non-standard program transformations). In this thesis, we formalize such a notion of correctness based on parametric simulations, thus developing a novel approach to compositional compiler verification.


Andreas TEUCKE
An Approximation and Refinement Approach to First-Order Automated Reasoning
(Advisor: Prof. Christoph Weidenbach)

Wednesday, 16.05.2018, 14:00 h, in building E1 5, room 0.02


With the goal of lifting model-based guidance from the propositional setting to first-order logic, I have developed an approximation theorem proving approach based on counterexample-guided abstraction refinement. A given clause set is transformed into a simplified form where satisfiability is decidable.This approximation extends the signature and preserves unsatisfiability:if the simplified clause set is satisfiable, so is the original clause set. A resolution refutation generated by a decision procedure on the simplified clause set can then either be lifted to a refutation in the original clause set, or it guides a refinement excluding the previously found unliftable refutation. This way the approach is refutationally complete. I have implemented my approach based on the SPASS theorem prover. On certain satisfiable problems, the implementation shows the ability to beat established provers such as SPASS, iProver, and Vampire.

Georg TAMM
Architectures for Ubiquitous 3D on Heterogeneous Computing Platforms
(Advisor: Prof. Philipp Slusallek)

Wednesday, 09.05.2018, 14:00 h, in building D3 4,  VisCenter DFKI


Today, a wide scope for 3D graphics applications exists, including domains such as scientific visualization, 3D-enabled web pages, and entertainment. At the same time, the devices and platforms that run and display the applications are more heterogeneous than ever. Display environments range from mobile devices to desktop systems and ultimately to distributed displays. While the capability of the client devices may vary considerably, the visualization experiences running on them should be consistent. The field of application should dictate how and on what devices users access the application, not the technical requirements to realize the 3D output.
The goal of this thesis is to examine the diverse challenges involved in providing consistent and scalable visualization experiences to heterogeneous computing platforms and display setups. We developed a comprehensive set of rendering architectures in the major domains of scientific and medical visualization, web-based 3D applications, and movie virtual production. To provide the required service quality, performance, and scalability for different client devices and displays, our architectures focus on the efficient utilization and combination of the available client, server, and network resources. We present innovative solutions that incorporate methods for hybrid and distributed rendering as well as means to stream rendering results. We establish the browser as a promising platform for accessible and portable visualization services. We collaborated with experts from the medical field and the movie industry to evaluate the usability of our technology in real-world scenarios.

Integrated Timing Verification for Distributed Embedded Real-Time Systems
(Advisor: Prof. Reinhard Wilhelm)

Tuesday, 08.05.2018, 14:00 h, in building E1 1,  room 407


The verification of timing properties with commercial tools based on static analysis has been available for safety-critical automotive systems for more than a decade. Although they offer many other advantagesin addition to safety guarantees, these tools are rarely applied in the automotive industry. This thesis analyses different possible reasons for this lack of application and suggests solutions where possible. Amongst the results of this thesis are technical findings, like the further increase of automation (e.g. for loop-bound detection) or the automatic reduction of WCET overestimation for highly variant systems. With regard to the development process, technical as well as non-technical solutions are presented that target the actual users of the timing verification, which range from software developers to system architects. The central result of this thesis is the integration of existing and newly developed timing verification methods and tools into a holistic development process approach called Timing Oriented Application Development (TOAD).


Wenbin LI
From Perception over Anticipation to Manipulation
(Advisor: Dr. Mario Fritz)

Wednesday, 25.04.2018, 16:00 h, in building E1 4,  room 019


From autonomous driving cars to surgical robots, robotic system has enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution in the related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic system. In this talk, we summarize our efforts of exploring different ways of such integration to extend the capabilities of a robotic system to take on more challenging real world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. We first explore how to guide robot to place a single block into the scene without collapsing the existing structure and later introduce the target stacking task where the agent stacks blocks to reproduce a tower shown in an image. We validate our model on both synthetic and real-world settings.

Towards the understanding of transcriptional and translational regulatory complexity
(Advisor: Prof. Dr. Volkhard Helms)

Wednesday, 11.04.2018, 10:00 h, in building E2 1,  room 001


Considering the same genome within every cell, the observed phenotypic diversity can only arise from highly regulated mechanisms beyond the encoded DNA sequence. We investigated several mechanisms of protein biosynthesis and analyzed DNA methylation patterns, alternative translation sites, and genomic mutations. As chromatin states are determined by epigenetic modifications and nucleosome occupancy,we conducted a structural superimposition approach between DNA methyltransferase 1 (DNMT1) and the nucleosome, which suggests that DNA methylation is dependent on accessibility of DNMT1 to nucleosome–bound DNA. Considering translation, alternative non–AUG translation initiation was observed. We developed reliable prediction models to detect these alternative start sites in a given mRNA sequence. Our tool PreTIS provides initiation confidences for all frame–independent non–cognate and AUG starts. Despite these innate factors, specific sequence variations can additionally affect a phenotype. We conduced a genome–wide analysis with millions of mutations and found an accumulation of SNPs next to transcription starts that could relate to a gene–specific regulatory signal. We also report similar conservation of canonical and alternative translation sites, highlighting the relevance of alternative mechanisms. Finally, our tool MutaNET automates variation analysis by scoring the impact of individual mutations on cell function while also integrating a gene regulatory network.


Biomedical Knowledge Base Construction form Text and its Application in Knowledge-based Systems
(Advisor: Prof. Dr. Klaus Berberich, now HTW Saar)

Monday, 12.03.2018, 10:00 h, in building E1 4,  room 024


Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy has made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
• We address a linking problem to  connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents thatcan be linked to a given excerpt.
• We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event.
• To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.

Patrick ERNST
Biomedical Knowledge Base Construction form Text and its Application in Knowledge-based Systems
(Advisor: Prof. Dr. Gerhard Weikum)

Tuesday, 06.03.2018, 10:00 h, in building E1 4,  room 024


While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions:
– To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas.
– To consider higher-arity relations, crucial for proper knowledge representation in advanced domain such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning overthe composability of partial facts.
– To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics andfor entity-relationship-oriented exploration.
Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.


Roland LEIßA
Language Support for Programming High-Performance Code
(Advisor: Prof. Dr. Sebastian Hack)

Thursday, 22.02.2018, 14:00 h, in building E1 7,  room 001


Nowadays, the computing landscape is becoming increasingly heterogeneous and this trend is currently showingno signs of turning around. In particular, hardware becomes more and more specialized and exhibits different formsof parallelism. For performance-critical codes it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD).
In this talk, we first introduce Sierra: an extension for C++ that facilitates portable and effective SIMD programming. Second, we present AnyDSL. This framework allows to embed a so-called domain-specific language (DSL) into a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. In order to implement a DSL, one usually has to write or modify a compiler. With AnyDSL though, the DSL constructs are directly implemented in the host language while a partial evaluator removes any abstractions that are required in the implementation of the DSL.

Gaze and Peripheral Vision Analysis for Human-Environment Interaction: Applications in Automotive and Mixed-Reality Scenarios
(Advisor: Prof. Dr. Wolfgang Wahlster)

Wednesday, 14.02.2018, 16:00 h, in building D3 2, Reuse seminar room -2.17


This thesis studies eye-based user interfaces which integrate information about the user’s perceptual focus-of-attention into multimodal systems to enrich the interaction with the surrounding environment. We examine two new modalities:gaze input and output in the peripheral field of view. All modalities are considered in the whole spectrum of the mixed-reality continuum. We show the added value of these new forms of multimodal interaction in two important application domains: Automotive User Interfaces and Human-Robot Collaboration. We present experiments that analyze gaze under various conditions and help to design a 3D model for peripheral vision. Furthermore, this work presents several new algorithms for eye-based interaction, like deictic reference in mobile scenarios, for non-intrusive user identification, or exploiting the peripheral field view for advanced multimodal presentations. These algorithms have been integrated into a number of software tools for eye-based interaction, which are used to implement 15 use cases for intelligent environment applications. These use cases cover a wide spectrum of applications, from spatial interactions with a rapidly changing environment from within a moving vehicle, to mixed-reality interaction between teams of human and robots.

Justifying The Strong Memory Semantics of Concurrent High-Level Programming Languages for System Programming
(Advisor: Prof. Dr. Wolfgang Paul)

Wednesday, 07.02.2018, 15:00 h, in E1 7, room 001


We justify the strong memory semantics of high-level concurrent languages, such as C11 or Java, when compiled to x86-like machines. Languages such as C11 guarantee that programs that obey a specific software discipline behave as if they were executed on a simple coherent shared memory multi-processor. Real x86 machines, on the other hand, do not provide a coherent shared memory, and compilers add slow synchronizing instructions to provide the illusion of coherency. We show that one software discipline that closely matches software disciplines of languages such as C11 and Java – in a nutshell, that racing memory accesses are annotated as such by the programmer – and one particular way to add relatively few synchronization instructions – in a nutshell, between an annotated store and an annotated load in the same thread – suffice to create this illusion.
The software discipline has to be obeyed by the program in the semantics of the high-level language, therefore allowing the verification effort to take place completely in the strong memory semantics of the high-level language. We treat a machine model with operating system support, and accordingly our theorems can be used to verify interruptible multi-core operating systems, including those where users can run unverified programs that do not obey the software discipline.

Variational Image Fusion
(Advisor: Prof. Dr. Joachim Weickert)

Monday, 05.02.2018, 10:15 h, in E1 1, room 407


The main goal of this work is the fusion of multiple images to a single composite that offers more information than the individual input images. We approach those fusion tasks within a variational framework.
First, we present iterative schemes that are well-suited for such variational problems and related tasks. They lead to efficient algorithms that are simple to implement and well-parallelisable. Next, we design a general fusion technique that aims for an image with optimal local contrast. This is the key for a versatile method that performs well in many application areas such as multispectral imaging, decolourisation, and exposure fusion. To handle motion within an exposure set, we present the following two-step approach: First, we introduce the complete rank transform to design an optic flow approach that is robust against severe illumination changes.
Second, we eliminate remaining misalignments by means of brightness transfer functions that relate the brightness values between frames. Additional knowledge about the exposure set enables us to propose the first fully coupled method that jointly computes an aligned high dynamic range image and dense displacement fields.
Finally, we present a technique that infers depth information from differently focused images. In this context, we additionally introduce a novel second order regulariser that adapts to the image structure in an anisotropic way.


Numerical Analysis of Stochastic Biochemical Reaction Networks
(Advisor: Prof. Dr. Verena Wolf)

Wednesday, 31.01.2018, 14:00 h, in E1 7, room 001


Numerical solution of the chemical master equation for stochastic reaction networks typically suffers from the state space explosion problem due to the curse of dimensionality and from stiffness due to multiple time scales.The dimension of the state space equals the number of molecular species involved in the reaction network and the size of the system of differential equations equals the number of states in the corresponding continuous-time Markov chain, which is usually enormously huge and often even infinite. Thus, efficient numerical solution approaches must be able to handle huge, possibly infinite and stiff systems of differential equations efficiently.
In this thesis, we present efficient techniques for the numerical analysis of the biochemical reaction networks. We present an approximate numerical integration approach that combines a dynamical state space truncation procedure with efficient numerical integration schemes for systems of ordinary differential equations including adaptive step size selection based on local error estimates. We combine our dynamical state space truncation with the method of conditional moments, and present the implementation details and numerical results. We also incorporate ideas from importance sampling simulations into a non-simulative numerical method that approximates transient rare event probabilities based on a dynamical truncation of the state space.
Finally, we present a maximum likelihood method for the estimation of the model parameters given noisy time series measurements of molecular counts. All approaches presented in this thesis are implemented as part of the tool STAR, which allows to model and simulate the biochemical reaction networks. The efficiency and accuracy is demonstrated by numerical examples.

Relational Cost Analysis
(Advisor: Dr. Deepak Garg)

Monday, 22.01.2018, 16:00 h, in E1 5, room 029


Programming languages research has made great progress towards statically estimating the execution cost of a program. However, when one is interested in how the execution costs of two programs compare to each other (i.e., relational cost analysis), the use of unary techniques does not work well in many cases. In order to support a relational cost analysis, we must ultimately support reasoning about not only the executions of a single program, but also the executions of two programs, taking into account their similarities. This dissertation makes several contributions to the understanding and development of such a relational cost analysis. It shows how:
• Refinement types and effect systems can express functional and relational quantitative properties of pairs of programs, including the difference in execution costs.
• Relational cost analysis can be adapted to reason about dynamic stability, a measure of the update times of incremental programs as their inputs change.
• A sound and complete bidirectional type system can be developed (and implemented) for relational cost analysis.

Thesis defenses