PhD Thesis Defenses

PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work being done by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.

Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.

2024

April

Johnnatan Messias Peixoto AFONSO
On Fairness Concerns in the Blockchain Ecosystem
(Advisor: Prof. Krishna Gummadi)
Thursday, 25.04.24 13:00 h , building E1 5, room 0.29

Blockchains revolutionized centralized sectors like banking and finance by promoting decentralization and transparency. In a blockchain, information is transmitted through transactions issued by participants or applications. Miners crucially select, order, and validate pending transactions for block inclusion, prioritizing those with higher incentives or fees. The order in which transactions are included can impact the blockchain’s final state.
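For illustration, here is a minimal Python sketch of the fee-based prioritization norm described above; the class and field names are hypothetical, not taken from any real client:

    from dataclasses import dataclass

    @dataclass
    class Tx:
        tx_id: str
        fee_per_gas: float  # the incentive miners are commonly assumed to maximize

    def select_for_block(mempool, capacity):
        # Greedy fee-based norm: include the highest-paying pending transactions first.
        return sorted(mempool, key=lambda tx: tx.fee_per_gas, reverse=True)[:capacity]

    # select_for_block([Tx("a", 2.0), Tx("b", 9.5), Tx("c", 4.1)], 2) picks b, then c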
Moreover, applications running on top of a blockchain often rely on governance protocols to decentralize the decision-making power to make changes to their core functionality. These changes can affect how participants interact with these applications. Since one token equals one vote, participants holding multiple tokens have a higher voting power to support or reject the proposed changes. The extent to which this voting power is distributed is questionable: if it is highly concentrated among a few holders, it can enable governance attacks.
In this thesis, we audit the Bitcoin and Ethereum blockchains to investigate the norms followed by miners in determining transaction prioritization. We also audit decentralized governance protocols such as Compound to evaluate whether the voting power is fairly distributed among the participants. Our findings have significant implications for future developments of blockchains and decentralized applications.

Clayton M. GREENBERG
Evaluating Humanness in Language Models
(Advisor: Prof. Dietrich Klakow)
Wednesday, 24.04.24 16:15 h , building E1 7, room 008

Advances in language models, systems that predict upcoming words in context, have enabled an era in which people sometimes cannot distinguish between human-written and artificially created text. Perplexity, the simplest and most popular way to evaluate the quality of a language model, rewards any pattern captured by the system as long as it robustly constrains the upcoming possibilities. By capturing patterns that humans do not use, optimizing a language model for minimal perplexity could trigger a divergence between the most probable text and the most human-like text.
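For reference, perplexity has a standard definition (textbook background, not a formula specific to this thesis): for a held-out text $w_1, \dots, w_N$,

$\mathrm{PPL}(w_1,\dots,w_N) = \exp\Big(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_1,\dots,w_{i-1})\Big)$,

so minimizing perplexity is equivalent to maximizing the average log-probability the model assigns to upcoming words; nothing in this objective requires the captured patterns to be ones humans actually use.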
In this thesis, I argue that this divergence has happened for state-of-the-art language models. Part I characterizes the kinds of knowledge captured by language models. First, I present three novel language model architectures whose neural connections were inspired by human behavior. Then, I discuss novel morphology- and sentiment-based paradigms that capture human knowledge quantitatively. Part II establishes several methods for evaluating language models by comparison against human behavior measures. I consider the suitability and potential confounds for offline ratings and two paradigms of online reading times: eye-tracking and G-Maze. Then, I use a novel dataset of G-Maze response times to show computational and linguistic evidence of the divergence.

Dingfan CHEN
Towards Privacy-preserving Machine Learning: Generative Modeling and Discriminative Analysis
(Advisor: Prof. Mario Fritz)
Tuesday, 23.04.24 11:00 h , building E9 1, room 0.05

The digital era is characterized by the widespread availability of rich data, which has fueled the growth of machine learning applications across diverse fields. Nevertheless, data sharing often raises serious privacy and ethical issues. The sensitive nature of personal information necessitates careful handling and adherence to stringent regulations like GDPR and HIPAA. Addressing such privacy challenges is pivotal for maintaining public trust and ensuring sustainable technological progress.
This talk presents several projects on data privacy in machine learning completed during the speaker’s Ph.D. studies, including exploration of privacy-preserving generative modeling, privacy attack and defense mechanisms, and practical applications for responsible data sharing within real-world sensitive domains.

Florian SATTLER
Understanding Variability in Space and Time – Analyzing Features and Revisions in Concert
(Advisor: Prof. Sven Apel)
Monday, 15.04.24 14:00 h , building E1 1, room 2.06

The static or dynamic analysis of configurable software systems imposes significant challenges regarding complexity and computation time due to the software systems’ large configuration spaces. These are aggravated further by the evolution of software systems: developers frequently produce new revisions, adapting and modifying the system. Thereby, analysis results can quickly become outdated or difficult to interpret. The key problem is that current analyses, even when already specialized for configurable software systems, cannot contextualize their findings within the development context of the software project in question.
We address this problem by empowering existing program analyses through a unified abstraction of code regions that incorporates information about the configurability of the system as well as the evolutionary context into the analysis. This way, we enable existing program analyses to relate and interpret their results in the context of variability. In this thesis, we demonstrate the applicability of a uniform abstraction of code regions by addressing two novel research problems:
First, we combine evolutionary information, mined from software repositories, with an inter-procedural data-flow analysis to determine how evolutionary changes interact within a software project, revealing new and interesting connections between changes and developers.
Second, we combine different automated localization approaches that detect configuration-specific code with state-of-the-art performance profilers to enable configuration-aware performance profiling.
Our results show that this enables performance profilers to attribute performance regressions directly to configuration options without introducing unnecessary overhead. In summary, this thesis bridges the gap between variability information and precise program analysis.

Soshi SHIMADA
Physically plausible 3D human motion capture and synthesis with interactions
(Advisor: Prof. Christian Theobalt)
Thursday, 04.04.24 10:30 h , building E1 4, room 0.24

Capturing 3D human motion realistically from a minimal setup, such as a single RGB camera, is challenging and important for downstream applications like AR/VR, avatar communications, and character animations. The problem becomes more challenging when the person in the scene interacts with a complex environment or when interactions lead to non-rigid deformations. This thesis addresses these challenges by explicitly introducing 1) physics-based equations and/or 2) modeling of rigid/non-rigid interactions with the environment, thereby enhancing the realism of the reconstructed 3D motions. Moreover, the thesis expands its focus to include the synthesis of 3D hand-object interaction motions, which are conditioned by the physical properties of the objects for improved realism and greater control over the generated motions.

March

Debasmita LOHAR
Expanding the Horizons of Finite-Precision Analysis
(Advisor: Prof. Eva Darulova, now Uppsala Univ.)
Wednesday, 27.03.24 15:00 h , building E1 5, room 0.29

Finite-precision programs, prevalent in embedded systems, scientific computing, and machine learning, inherently introduce numerical uncertainties stemming from noise in the inputs and finite-precision errors. Furthermore, implementing these programs on hardware necessitates a trade-off between accuracy and efficiency. Therefore, it is crucial to ensure that numerical uncertainties remain acceptably small and to optimize implementations for accurate results tailored to specific applications. Existing analysis and optimization techniques for finite-precision programs face challenges in scalability and applicability to real-world scenarios. In this work, we expand the individual capabilities of these techniques by capturing the impact of uncertain inputs on discrete decisions and roundoff errors, by scaling floating-point verification for larger programs, and by specializing optimization for feed-forward deep neural networks.
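A minimal Python illustration of the underlying phenomenon (standard IEEE 754 behavior, not an example drawn from the thesis): a tiny roundoff error can flip a discrete decision.

    a = 0.1 + 0.2                # in double precision this is 0.30000000000000004
    print(a == 0.3)              # False: the roundoff error flips a discrete comparison
    print(abs(a - 0.3) < 1e-12)  # True: robust code must budget for roundoff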

Mang ZHAO
Provable Security and Real-World Protocols: Theory and Practice
(Advisor: Prof. Cas Cremers)
Monday, 18.03.24 13:00 h , building E9 1, room 0.05

In our modern life, network communication has become one of the primary mediums for information transmission, e.g., instant messaging, online shopping, and video conferencing. In order to protect the security of information transmitted over networks, real-world applications are often equipped with cryptographic communication protocols, whose provable security analyses are however often missing. A natural question arises: are these protocols really secure?
This talk presents five projects that the speaker has completed during his Ph.D. studies, with particular focus on two of them: the theoretical analysis of authenticated encryption with associated data and the provable security analysis of the real-world Zoom video-conferencing protocol. Moreover, this talk addresses common obstacles to (large-scale) protocol designs and provable security analyses, provides intuition on their feasibility, and presents his future plans.

Gustavo ANDRADE DO VALE
Investigating the Merge Conflict Life-Cycle Taking the Social Dimension into Account
(Advisor: Prof. Sven Apel)
Monday, 11.03.24 16:00 h , building E1 1, room 206

Merge conflicts arise when developers integrate concurrent code changes. While merge conflicts are common, they bring several issues to software projects. For instance, merge conflicts distract developers from their workflow, and resolving them is a difficult, time-consuming, and often error-prone task. Despite a substantial number of studies investigating merge conflicts, the social dimension of the problem is often ignored. In this thesis, we set out to understand the role the social dimension plays in the merge conflict life-cycle. To reach our goals, we conducted a series of empirical studies investigating the merge conflict life-cycle. In one of these studies, we found that active GitHub communication is not associated with the emergence or avoidance of merge conflicts, even though developers communicate with each other. In another study, we moved to the end of the merge conflict life-cycle, investigating the challenges and factors related to merge conflict resolution. Our results show that measures indirectly related to merge conflicts (i.e., measures related to the merge scenario changes) are more strongly correlated with merge conflict resolution time than measures directly related to merge conflicts (i.e., merge conflict characteristics). In this thesis, we call the attention of researchers, tool builders, and practitioners to the importance of including the social dimension when investigating merge conflicts. Our findings also provide evidence that they should look at the technical dimension more closely.

Mohamed ALZAYAT
Efficient Request Isolation in Function-as-a-Service
(Advisors: Prof. Peter Druschel & Prof. Deepak Garg)
Friday, 08.03.24 14:00 h , building E1 5, room 0.02

As cloud applications become increasingly event-driven, Function-as-a-Service (FaaS) is emerging as an important abstraction. FaaS allows tenants to state their application logic as stateless functions without managing the underlying infrastructure that runs and scales their applications. FaaS providers ensure the confidentiality of tenants’ data, to a limited extent, by isolating function instances from one another. However, for performance considerations, the same degree of isolation does not apply to sequential requests activating the same function instance. This compromise can lead to confidentiality breaches since bugs in a function implementation or its dependencies may retain state and leak data across activations. Moreover, platform optimizations that assume function statelessness may introduce unexpected behavior if the function retains state, jeopardizing correctness.
This dissertation presents two complementary systems: Groundhog and CtxTainter. Groundhog is a black-box and programming-language-agnostic solution that enforces confidentiality by efficiently rolling back changes to a function’s state after each function activation, effectively enforcing statelessness by breaking all data flows at the request boundary. CtxTainter is a development-phase dynamic data flow analysis tool that detects data flows that violate the statelessness assumption and reports them to the developer for reviewing and fixing.
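A conceptual Python sketch of the rollback idea (Groundhog itself operates on process memory and is language-agnostic; this toy version only conveys the request-boundary semantics, and all names are illustrative):

    import copy
    import functools

    STATE = {"cache": {}}  # mutable state a buggy handler might retain across requests

    def rollback_after_activation(func):
        # Restore STATE after each activation, breaking data flows at the request boundary.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            global STATE
            snapshot = copy.deepcopy(STATE)
            try:
                return func(*args, **kwargs)
            finally:
                STATE = snapshot  # discard anything this activation leaked into STATE
        return wrapper

    @rollback_after_activation
    def handler(request):
        STATE["cache"][request] = "per-request secret"  # would otherwise leak onward
        return "ok"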

February

Toghrul KARIMOV
Algorithmic Verification of Linear Dynamical Systems
(Advisor: Prof. Joël Ouaknine)
Thursday, 08.02.24 15:00 h , building E1 5, room 0.29

Linear dynamical systems (LDS) are mathematical models widely used in engineering and science to describe systems that evolve over time. In this thesis, we study algorithms for various decision problems of discrete-time linear dynamical systems. Our main focus is the Model-Checking Problem, which is to decide, given a linear dynamical system and an omega-regular specification, whether the trajectory of the LDS satisfies the specification. Using tools from various mathematical disciplines, most notably algebraic number theory, Diophantine approximation, automata theory, and combinatorics on words, we prove decidability of the Model-Checking Problem for large classes of linear dynamical systems and omega-regular properties. We further exploit deep connections between linear dynamical systems and contemporary number theory to show that improving any of our decidability results would amount to major mathematical breakthroughs. Our results delineate the boundaries of decision problems of linear dynamical systems that, at the present time, can be solved algorithmically.
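In the standard formulation (background for orientation, paraphrased from the literature rather than quoted from the thesis), a discrete-time LDS is a matrix $M \in \mathbb{Q}^{d \times d}$ with an initial point $s \in \mathbb{Q}^d$, and its trajectory is the orbit $(s, Ms, M^2s, \ldots)$. Given additionally target sets $T_1, \ldots, T_k \subseteq \mathbb{R}^d$ and an omega-regular language $L$ over the alphabet $2^{\{1,\ldots,k\}}$, the Model-Checking Problem asks whether the infinite word $\sigma_0\sigma_1\sigma_2\cdots$ with $\sigma_t = \{\, i : M^t s \in T_i \,\}$ belongs to $L$.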

Pascal GRITTMANN
Rethinking multiple importance sampling for general and efficient Monte Carlo rendering
(Advisor: Prof. Philipp Slusallek)
Tuesday, 06.02.24 9:00 h , building D3 2, room -1.63 (VisCenter)

Computer-generated images are essential for many applications from art to engineering. Unfortunately, rendering such images is costly, with render times easily reaching hours, days, or even weeks. On top of that, the demands regarding complexity and visual fidelity are ever rising. Consequently, there is an insatiable need for faster rendering. Efficient render times are often achieved through user intervention. For example, modifying the scene and removing difficult lighting effects can keep render times below an acceptable threshold. Also, algorithm parameters can be tuned manually. For instance, diffuse outdoor scenes are best rendered by unidirectional path tracing, while interiors featuring caustics benefit greatly from bidirectional sampling. Such manual tuning, however, is unfortunate, as it puts much burden on the user and poses a hurdle for novices. In this thesis, we pave the way for more universal rendering algorithms with less need for user intervention. To that end, we revisit multiple importance sampling (MIS), an essential tool for universalizing rendering algorithms by combining diverse sampling techniques. We identify hitherto unknown shortcomings of MIS and propose practical solutions and improvements. As a tangible result, we achieve adaptive bidirectional rendering with performance never worse than unidirectional path tracing.
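For orientation, the classical MIS estimator (standard background due to Veach, not this thesis’s contribution) combines $n$ sampling techniques with densities $p_1, \ldots, p_n$ and sample counts $N_1, \ldots, N_n$ into

$\langle F \rangle = \sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} w_i(x_{i,j})\, \frac{f(x_{i,j})}{p_i(x_{i,j})}$, with balance-heuristic weights $w_i(x) = \frac{N_i\, p_i(x)}{\sum_{k} N_k\, p_k(x)}$.

The weights decide how much each technique contributes per sample, and it is precisely such weighting choices whose shortcomings this thesis revisits.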

Sanem GHORBANI LYASTANI
Studying User Experience and Acceptance of Web Authentication Solutions
(Advisor: Prof. Michael Backes)
Monday, 05.02.24 14:00 h , building E9 1, room 0.01

To improve the security of their web authentication, users can employ password managers, set up two-factor authentication, or replace passwords with FIDO2 authenticator devices. However, for those solutions to be accepted by the user, their user experience must match the users’ mental models. This thesis contributes the novel methodologies and results of three studies that measured the user experience and acceptance of three web authentication solutions. Our results show that a) whether password managers are beneficial for security or aggravate existing problems depends on the users’ strategies and how well the manager supports the users’ individual password management right from the time of password creation; b) users consider FIDO2 passwordless authentication as more usable and more acceptable than password-based authentication, but also that impeding concerns remain that are rooted in a gap between the user’s personal perspective on this new technology and the global view of the FIDO2 designers; c) there is a lack of consistency between the two-factor authentication user journeys of top websites, and the more consistent design patterns are problematic for usability, which could increase users’ cognitive friction and lead to rejection. Based on those results, we make suggestions for further research into understanding and improving the users’ experience of web authentication.

January

Marcel KÖSTER
Improving Performance of Simulations and Heuristic Optimization on GPUs
(Advisor: Prof. Antonio Krüger)
Tuesday, 30.01.24 13:00 h , building D3 2, Reuse meeting room

Parallelization is a ubiquitous technique for improving the runtime performance of algorithms. Although parallelization is generally challenging and often leads to programming bugs, it is a leading method for processing growing amounts of data today. Due to the ongoing trend of exploring the unexplored, known methods are reaching their limits in terms of scalability and thus applicability. Particularly challenging is the use of graphics processing units (GPUs), which require specially optimized algorithms but feature impressive compute power. Unfortunately, the term "optimized" usually refers to newly developed algorithms that exploit the peculiarities of the underlying GPUs or at least follow their specific programming methodologies. The list of tweaked algorithms available for GPUs is already quite long and touches a wide range of domains. These include the well-known fields of massively parallel simulations and the solving of optimization problems. Prominent examples in this context include particle simulations of physical processes (like molecular-dynamics simulations) and machine-learning-based optimizers. However, existing approaches from these two domains often suffer from severe runtime, memory consumption, and applicability limitations. In this thesis, we present new approaches for both domains. Our methods considerably outperform the current state of the art in terms of runtime and memory consumption. We were able to achieve runtime speedups of up to several orders of magnitude while reducing the amount of memory required compared to existing methods. Regarding applicability, our algorithms are designed to fit seamlessly into existing simulation programs and optimizers. This makes them a particularly valuable contribution to real-world applications as well.

Edith TRETSCHK
Representing and Reconstructing General Non-Rigid Objects with Neural Models
(Advisor: Prof. Christian Theobalt)
Monday, 29.01.24 15:00 h , building E1 4, room 0.23

Despite a lot of effort, creating virtual clones of real-world objects remains an unsolved scientific challenge. While human-centered approaches are already advanced, the handling of general deformable objects is far less explored and is the topic of this thesis. To digitize an object, it first needs to be reconstructed from sensor observations and then represented in a suitable manner for downstream tasks. Over the past decade, neural techniques have led to great advancement in both areas.
This thesis contributes to both areas. In the first part, it focuses on representing deformations and geometry. In particular, it introduces a low-dimensional deformation model. Unlike prior work that hand-crafts these for specific categories, it can be trained for any general non-rigid object category via mesh auto-encoding using graph convolutions. Next, coordinate-based networks model geometry at infinite resolution but they do not generalize due to their global representation. This thesis makes them generalizable, thereby making these new models much easier to apply to general objects where training data is lacking.
In the second part, this thesis advances the reconstruction side. It extends neural radiance fields, which were previously restricted to static scenes, to deformable objects. Finally, this thesis extends the previous method to handle large motions, a non-trivial endeavor due to backwards deformation modeling.

Christian KALTENECKER
Black-Box Performance Modeling of Configurable Software Systems
(Advisor: Prof. Sven Apel)
Monday, 29.01.24 13:00 h , building E1 1, room 2.06

Configurable software systems provide a multitude of configuration options to adjust and optimize the performance of the software. However, it is often unclear which configuration options influence the performance of the system, and measuring every configuration of a system to find out is intractable for many configurable systems due to the sheer number of configurations. In a first step, we propose a sampling strategy used in combination with statistical machine learning to identify the influence of configuration options on performance (see the sketch below). This way, our approach overcomes multiple disadvantages of existing approaches.
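A minimal sketch of the underlying idea, fitting a linear performance-influence model to a handful of measured configurations (the thesis’s sampling strategy and learner are more sophisticated; the data and option names below are made up):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Rows: sampled configurations (1 = option enabled); columns: options A, B, C.
    X = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
    y = np.array([10.2, 14.1, 25.3, 29.8])  # measured runtimes in seconds

    model = LinearRegression().fit(X, y)
    for option, influence in zip("ABC", model.coef_):
        print(f"option {option}: {influence:+.2f}s")  # learned per-option influence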
Furthermore, the performance influence of configuration options can change over time, for instance, by introducing performance regressions, and these performance regressions can, in some cases, be detected only in certain workloads.
However, it is often unclear which configuration options are affected by performance changes. In a second step, to address this gap, we propose an approach to pinpoint such performance changes over time and workloads. Among other findings, we found that developers mention the configuration options affected by performance changes, although performance regressions themselves are only rarely reported directly.

Marius MOSBACH
Analyzing Pre-trained and Fine-tuned Language Models
(Advisor: Prof. Dietrich Klakow)
Thursday, 18.01.24 17:30 h , building C9 3 (Graduate Center)

Since the introduction of transformer-based language models in 2018, the current generation of natural language processing (NLP) models continues to demonstrate impressive capabilities on a variety of academic benchmarks and real-world applications. This progress is based on a simple but general pipeline which consists of pre-training neural language models on large quantities of text, followed by an adaptation step that fine-tunes the pre-trained model to perform a specific NLP task of interest. However, despite the impressive progress on academic benchmarks and the widespread deployment of pre-trained and fine-tuned language models in industry, we still lack a fundamental understanding of how and why pre-trained and fine-tuned language models work as well as they do. My PhD thesis makes several contributions towards improving our understanding of pre-trained and fine-tuned language models, ranging from analyzing the linguistic knowledge of pre-trained language models and how it is affected by fine-tuning, to a rigorous analysis of the fine-tuning process itself and how the choice of adaptation technique affects the generalization of models. We thereby provide new insights about previously unexplained phenomena and the capabilities of pre-trained and fine-tuned language models.

Markus BAUER
Compiler-based Defenses against Code Execution Attacks
(Advisor: Prof. Christian Rossow)
Thursday, 11.01.24 13:00 h , building E9 1, room 0.01

Memory corruption attacks have haunted computer systems for decades. Attackers abuse subtle bugs in an application’s memory management, corrupting data and executing arbitrary code and, consequently, taking over systems. In particular, C and C++ applications are at risk, while developers often fail or lack time to identify or rewrite risky parts of their software.
In this thesis, we approach this problem with compilers that protect applications without requiring code changes or developer effort. We cover the most treated aspects in legacy applications: indirect forward jumps in both C and C++ and immutable libraries. First, we protect virtual dispatch in C++ applications from hijacking. We employ a type analysis and a compiler transformation that implements virtual dispatch efficiently without hijackable pointers. Second, we protect indirect calls to function pointers in C applications. We use a new type-based analysis to find indirect call targets and transform indirect calls into a secure and fast version with limited targets. Finally, we propose a method to isolate potentially vulnerable code, particularly unprotected closed-source libraries, into compartments with restricted access to its environment.

2023

December

Charilaos ZISOPOULOS
On the expected number of zeros of polynomials and the real tau-conjecture
(Advisor: Prof. Markus Bläser)
Tuesday, 19.12.23 14:00 h , building E1 7, room 0.01

The central open problem in Algebraic Complexity Theory is the VP versus VNP question, which can be thought of as the algebraic analogue of the classical P versus NP question. In fact, settling the former question is considered as a first step towards the latter. One approach towards resolving the VP versus VNP question is the real tau-conjecture, which states that if every sum of products of k-sparse univariate polynomials has a number of real roots bounded polynomially by the number of summands, factors, and the sparsity k, then VP is not equal to VNP. Unsurprisingly, solving the real tau-conjecture has proven difficult, thus research has shifted towards randomized versions of the conjecture. Our contribution is studying the expected number of real zeros of random k-sparse polynomials. In particular, we show that for k-sparse univariate polynomials whose coefficients are standard normal random variables, the expected number of real roots is upper bounded asymptotically by the square root of the sparsity k. This result is complemented by an asymptotically matching lower bound, which completely settles this question in the univariate setting, as well as observations about the distribution of zeros of such polynomials. In addition, we show that previous work can be adapted to the sparse case, thus providing an upper bound for the case where the coefficients of the polynomial follow the Rademacher distribution. This work also surveys previously known results on the number of real zeros of fixed and random polynomials, while also presenting a detailed analysis of results due to Descartes and Laguerre.
We also generalize these techniques to random k-sparse polynomials following absolutely continuous distributions, and we propose research questions and goals that we believe merit further investigation, with the aim of identifying the next steps towards resolving the conjecture.

Aniss MAGHSOUDLOU
Towards Uncovering Hidden Internet Traffic Characteristics
(Advisor: Prof. Anja Feldmann)
Thursday, 14.12.23 14:00 h , building E1 5, room 0.02

With the growing digitization of human life, the Internet has become an inevitable utility. Since the Internet is designed in a non-centralized manner with a best-effort mindset, it is essential to measure different aspects of the Internet, including security, performance, and scalability. The rise of remote work has emphasized the need for measuring the security of Internet traffic. In this thesis, we first address the need for measuring large-scale Internet traffic to gain useful insights into the security and traffic trends in large Internet Service Providers (ISPs) and Internet eXchange Points (IXPs) by designing a system called Flowyager for querying network-wide flow data in a near real-time manner. Next, we propose FlowDNS to augment flow data with domain names to infer the actual service/domain to which the traffic belongs. This system lays the foundation for monitoring the services that are being used and gives network operators the chance to predict their bandwidth demands. To gain a more comprehensive picture, we need to combine the results from the above-mentioned systems with active measurement techniques. This gives us the chance to discover the existence and origin of hidden characteristics of the Internet traffic. For instance, in a large European ISP, we detect a large amount of Internet traffic using port number 0 when querying Flowyager. Complementing passive measurement results with active measurement techniques, we find that this traffic is mostly caused by fragmentation, scanning, and misconfigured devices.
Finally, given the widespread usage of Virtual Private Networks (VPNs) during the COVID-19 pandemic for remote work, we strive to characterize VPN traffic in the Internet. We use active measurement techniques to detect VPN servers and analyze their security aspects. Then, with the help of FlowDNS, we detect VPN traffic on the Internet to provide insights about the VPN traffic patterns in the Internet.
This dissertation helps researchers and network operators to gain insights about some hidden characteristics of Internet traffic, and also provides the means to look for specific traffic patterns in the network flow data and investigate its characteristics.

Torsten SPIELDENNER
Linked Data as Medium for distributed Multi-Agent Systems
(Advisor: Prof. Philipp Slusallek)
Wednesday, 13.12.23 09:15 h , building D3 4, VisRoom (-1.63)

The conceptual design and discussion of multi-agent systems (MAS) typically focuses on agents and their models, and on the elements and effects in the environment which they perceive. This view, however, leaves out potential pitfalls in the later implementation of the system that may stem from limitations in the data models, interfaces, or protocols by which agents and environments exchange information. By today, the research community agrees that the environment should also be understood as an abstraction layer by which agents access, interpret, and modify elements within the environment. This, however, blurs the line between the environment as the sum of interactive elements and phenomena perceivable by agents, and the underlying technology by which this information and these interactions are offered to agents.
As a remedy, this thesis proposes to consider, as a third component of multi-agent systems besides agents and environments, the digital medium by which the environment is provided to agents. "Medium" then refers to exactly this technological component via which environment data is published interactively towards the agents, and via which agents perceive, interpret, and finally modify the underlying environment data. Furthermore, this thesis details how MAS may use the capabilities of a properly chosen medium to achieve coordinating system behaviors. A suitable candidate technology for digital agent media comes from the Semantic Web in the form of Linked Data. In addition to conceptual discussions about the notion of digital agent media, this thesis provides a detailed specification of a Linked Data agent medium and details means to implement MAS around Linked Data media technologies.

Michael SAMMLER
Automated and Foundational Verification of Low-Level Programs
(Advisors: Prof. Derek Dreyer and Prof. Deepak Garg)
Monday, 04.12.23 17:00 h , building E1 5, room 0.29

Formal verification is a promising technique to ensure the reliability of low-level programs like operating systems and hypervisors, since it can show the absence of whole classes of bugs and prevent critical vulnerabilities.
To realize the full potential of formal verification for real-world low-level programs, however, one has to overcome several challenges, including:
(1) dealing with the complexities of realistic models of real-world programming languages;
(2) ensuring the trustworthiness of the verification, ideally by providing foundational proofs (i.e., proofs that can be checked by a general-purpose proof assistant);
and (3) minimizing the manual effort required for verification by providing a high degree of automation.
This dissertation presents multiple projects that advance formal verification along these three axes:
RefinedC provides the first approach for verifying C code that combines foundational proofs with a high degree of automation via a novel refinement and ownership type system.
Islaris shows how to scale verification of assembly code to realistic models of modern instruction set architectures, in particular Armv8-A and RISC-V.
DimSum develops a decentralized approach for reasoning about programs that consist of components written in multiple different languages (e.g., assembly and C), as is common for low-level programs.
RefinedC and Islaris rest on Lithium, a novel proof engine for separation logic that combines automation with foundational proofs.

Krzysztof WOLSKI
Design and Applications of Perception-Based Mesh, Image, and Display-Related Quality Metrics
(Advisors: Dr.-Ing. habil. Karol Myszkowski and Prof. Hans-Peter Seidel)
Monday, 04.12.23 10:00 h , building E1 4, room 0.19

Computer graphics have become an integral part of our daily lives, enabling immersive experiences in movies, video games, virtual reality, and augmented reality. However, the various stages of the computer graphics pipeline, from content generation to rendering and display, present their own challenges that can reduce visual quality and thus degrade the overall experience.
Perceptual metrics are crucial for evaluating visual quality. However, many existing methods have limitations in reproducing human perception accurately, as they must account for the complexities of the human visual system. This dissertation aims to tackle these issues by proposing innovative advancements across different pipeline stages.
Firstly, it introduces a novel neural-based visibility metric to improve the assessment of near-threshold image distortions. Secondly, it addresses shortcomings of the mesh quality metrics, vital for enhancing the integrity of three-dimensional models. Thirdly, the dissertation focuses on optimizing the visual quality of animated content while considering display characteristics and a limited rendering budget. Finally, the work delves into the challenges specific to stereo vision in a virtual reality setting.
The ultimate objective is to enable the creation of more efficient and automated designs for virtual experiences, benefiting fields like entertainment and education. Through these contributions, this research seeks to elevate the standard of visual quality in computer graphics, enriching the way we interact with virtual worlds.

November

Denise KAHL
Visual-haptic Perception in the Digitally Augmented World
(Advisor: Prof. Antonio Krüger)
Wednesday, 29.11.23 14:30 h , building D3 2 (DFKI), Reuse Room

In everyday life, we are confronted with a growing amount of digital content that is integrated into our surroundings. Visual elements, such as digital advertising or information boards, change our perception of the environment and make it increasingly difficult to perceive personally meaningful information.
In this work, we investigate how visual augmentations of the environment affect our visual and haptic perception of reality and explore how visual attention can be directed as subtly as possible toward personally relevant information in real-world environments.
We present a concept to evaluate visual stimuli for gaze guidance in instrumented environments and explore stimuli suitable for gaze guidance in real-world settings using a prototypical implementation of it. Moreover, we explore the potential of using overlays displayed in Optical See-through Augmented Reality glasses to guide visual attention using subtle visual cue stimuli.
Additionally, we introduce a concept to investigate perceptual changes in physical objects that may result from overlaying them with digital augmentations during interaction. We investigate the extent to which the overlying virtual model can differ from the underlying physical object without significantly affecting the feeling of presence, the usability, and the performance. We provide results in terms of shape and size differences and demonstrate the influence of environmental lighting conditions.

Sebastian DALLEIGER
Characteristics and Commonalities – Differentially Describing Datasets with Insightful Patterns
(Advisor: Prof. Jilles Vreeken)
Thursday, 16.11.23 14:00 h , building E1 4, room 0.24

Empirical science revolves around gaining insights from complex data. With the advent of computational science, increasingly more, larger, and richer datasets are becoming available to expand our scientific knowledge. However, the analysis of these datasets by domain experts is often impaired by a lack of suitable computational tools. In particular, there is a shortage of methods identifying insightful patterns, i.e., sets of strongly associated feature values that are informative, contrasting, probabilistically sound, statistically sound, and discoverable using scalable algorithms. This thesis leverages ideas and concepts from pattern-set mining, maximum-entropy modeling, statistical testing, and matrix factorization to develop methods for discovering insightful patterns.

October

Sihang PU
Towards Compact Bandwidth and Efficient Privacy-Preserving Computation
(Advisor: Dr. Nico Döttling)
Monday, 30.10.23 16:00 h , building E9 1, room 0.01

In traditional cryptographic applications, cryptographic mechanisms are employed to ensure the security and integrity of communication or storage. In these scenarios, the primary threat is usually an external adversary trying to intercept or tamper with the communication between two parties. On the other hand, in the context of privacy-preserving computation or secure computation, the cryptographic techniques are developed with a different goal in mind: to protect the privacy of the participants involved in a computation from each other. Specifically, privacy-preserving computation allows multiple parties to jointly compute a function without revealing their inputs and it has numerous applications in various fields, including finance, healthcare, and data analysis. It allows for collaboration and data sharing without compromising the privacy of sensitive data, which is becoming increasingly important in today’s digital age. While privacy-preserving computation has gained significant attention in recent times due to its strong security and numerous potential applications, its efficiency remains its Achilles’ heel. Privacy-preserving protocols require significantly higher computational overhead and bandwidth when compared to baseline (i.e., insecure) protocols. Therefore, finding ways to minimize the overhead, whether it be in terms of computation or communication, asymptotically or concretely, while maintaining security in a reasonable manner remains an exciting problem to work on.

Hiba ARNAOUT
Enriching Open-world Knowledge Graphs with Expressive Negative Statements
(Advisor: Prof. Gerhard Weikum)
Friday, 27.10.23 09:00 h , building E1 4, room 0.24

Machine knowledge about entities and their relationships has been a long-standing goal for AI researchers. Over the last 15 years, thousands of public knowledge graphs have been automatically constructed from various web sources. They are crucial for use cases such as search engines. Yet, existing web-scale knowledge graphs focus on collecting positive statements, and store very little to no negatives. Due to their incompleteness, the truth of absent information remains unknown, which compromises the usability of the knowledge graph. In this dissertation: First, I make the case for selective materialization of salient negative statements in open-world knowledge graphs. Second, I present our methods to automatically infer them from encyclopedic and commonsense knowledge graphs, by locally inferring closed-world topics from reference comparable entities. I then discuss our evaluation findings on metrics such as correctness and salience. Finally, I conclude with open challenges and future opportunities.
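A toy Python sketch of the peer-based inference idea (illustrative only; the actual methods additionally rank candidates by salience and correctness): a property that most comparable peer entities have but the target entity lacks becomes a candidate negative statement.

    from collections import Counter

    def candidate_negatives(entity_props, peer_props, support=0.7):
        # entity_props: set of properties of the target entity;
        # peer_props: list of property sets, one per comparable peer entity.
        counts = Counter(p for props in peer_props for p in props)
        n = len(peer_props)
        return [p for p, c in counts.items()
                if c / n >= support and p not in entity_props]

    # If most peer physicists have "won: Nobel Prize" and the target does not,
    # that statement becomes a candidate salient negative.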

Corinna COUPETTE
Beyond Flatland: Exploring Graphs in Many Dimensions
(Advisors: Dr. Christoph Lenzen and Dr. Bastian Rieck)
Monday, 23.10.23 09:00 h , Video conference

Societies, technologies, economies, ecosystems, organisms... Our world is composed of complex networks: systems with many elements that interact in nontrivial ways. Graphs are natural models of these systems, and scientists have made tremendous progress in developing tools for their analysis. However, research has long focused on relatively simple graph representations and problem specifications, often discarding valuable real-world information in the process. In recent years, the limitations of this approach have become increasingly apparent, but we are just starting to comprehend how more intricate data representations and problem formulations might benefit our understanding of relational phenomena. Against this background, our thesis sets out to explore graphs in five dimensions:
descriptivity, multiplicity, complexity, expressivity, and responsibility.
Leveraging tools from graph theory, information theory, probability theory, geometry, and topology, we develop methods to (1) descriptively compare individual graphs, (2) characterize similarities and differences between groups of multiple graphs, (3) critically assess the complexity of relational data representations and their associated scientific culture, (4) extract expressive features from and for hypergraphs, and (5) responsibly mitigate the risks induced by graph-structured content recommendations. Thus, our thesis is naturally situated at the intersection of graph mining, graph learning, and network analysis.

Lukas FLOHR
Context-Based Prototyping of Human-Machine Interfaces for Autonomous Vehicles
(Advisor: Prof. Antonio Krüger)
Friday, 13.10.23, 15:00 h, building D3 2, DFKI, VisRoom NB – 1.63

Autonomous vehicles (AVs; SAE levels 4 and 5) face substantial challenges regarding acceptance, human factors, and user experience. Human-machine interfaces (HMIs) offer the potential to account for those and facilitate AV adoption. Since AVs’ capabilities and availability are still limited, suitable prototyping methods are required to create, evaluate, and optimize novel HMI concepts from early development phases. In all human-centered design activities, physical and social contexts are vital. This thesis argues for applying context-based interface prototyping of human-AV interactions to account for their interrelation with contextual factors. We adopt a ‘research in and through design’ approach and explore two intertwined areas: design and prototyping. Regarding the latter, we concentrate on straightforward methods. We demonstrate an immersive video-based approach for lab simulation of AVs and a Wizard-of-Oz-based method for on-road AV simulation and prototyping of HMIs providing real-time information. We apply these methods in empirical studies to assess their suitability and explore HMI concepts created to counter the aforementioned challenges. Thereby, we investigate the potential of (AR-based) object detection visualization and concepts for mobile and in-vehicle interaction with (shared) AVs. Based on the findings, we provide design and prototyping recommendations that will aid researchers and practitioners in creating suitable human-AV interactions.

Zheng LI
On the Privacy Risks of Machine Learning Models
(Advisor: Dr. Yang Zhang)
Thursday, 05.10.23, 15:00 h, building E9 1, Room 0.01

In this dissertation, we investigate the significant privacy risks in the era of advancing machine learning (ML) from two perspectives. Firstly, we explore vulnerabilities within ML models, with a specific focus on membership inference attacks (MIA). Through two studies, we unveil the severity of MIA by introducing a novel label-only attack and assessing the susceptibility of multi-exit networks. Secondly, we examine the misuse of ML models that compromise privacy, particularly in the context of deepfake face manipulation. To counter GAN-based face manipulation effectively, an innovative defense system called UnGANable is proposed to disrupt the crucial GAN inversion process. These findings provide valuable insights into privacy risks associated with ML models and emphasize the necessity for ongoing research vigilance in this rapidly evolving ML landscape.
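For background, the simplest membership-inference baseline (a standard confidence-threshold attack; the label-only setting studied in the thesis is strictly harder because the adversary sees no confidence scores at all) fits in a few lines:

    def membership_inference(confidences, threshold=0.9):
        # Models tend to be more confident on training (member) samples,
        # so predict "member" whenever top-class confidence exceeds a threshold.
        return [c > threshold for c in confidences]

    print(membership_inference([0.99, 0.55]))  # [True, False]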

August

Nick FISCHER
Algorithms for Sparse Convolution and Sublinear Edit Distance
(Advisor: Prof. Karl Bringmann)
Tuesday, 29.08.23, 15:00 h, building E1 4, Room 0.24

In this PhD thesis on fine-grained algorithm design and complexity, we investigate output-sensitive and sublinear-time algorithms for two important problems.
* Sparse Convolution: Computing the convolution of two vectors is a fundamental algorithmic primitive. In the sparse convolution problem we assume that the input and output vectors have at most $t$ nonzero entries, and the goal is to design algorithms with running times dependent on $t$. For the special case where all entries are nonnegative, which is particularly important for algorithm design, it has been known for twenty years that sparse convolutions can be computed in near-linear randomized time $O(t \log^2 n)$. In this thesis we develop a randomized algorithm with running time $O(t \log t)$ which is optimal (under some mild assumptions), and the first near-linear deterministic algorithm for sparse nonnegative convolution (a naive baseline is sketched after this list). We also present an application of these results, leading to seemingly unrelated fine-grained lower bounds against distance oracles in graphs.
* Sublinear Edit Distance: The edit distance of two strings is a well-studied similarity measure with numerous applications in computational biology. While computing the edit distance exactly provably requires quadratic time, a long line of research has led to a constant-factor approximation algorithm in almost-linear time. Perhaps surprisingly, it is also possible to approximate the edit distance $k$ within a large factor $O(k)$ in sublinear time $\widetilde{O}(\frac nk + k^{O(1)})$. We drastically improve the approximation factor of the known sublinear algorithms from $O(k)$ to $k^{o(1)}$ while preserving the $\widetilde{O}(\frac nk + k^{O(1)})$ running time.
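As a point of reference for the sparse convolution bullet above, here is the naive $O(t^2)$ output-sensitive baseline that the near-linear algorithms improve upon (a Python sketch with dictionaries standing in for sparse vectors):

    def sparse_convolve(a: dict, b: dict) -> dict:
        # a and b map indices to nonzero values; the result does too.
        out = {}
        for i, x in a.items():
            for j, y in b.items():
                out[i + j] = out.get(i + j, 0) + x * y
        return out

    print(sparse_convolve({0: 1, 5: 2}, {1: 3}))  # {1: 3, 6: 6}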

Xinlei HE
Privacy Risk Assessment of Emerging Machine Learning Paradigms
(Advisor: Dr. Yang Zhang)
Wednesday, 16.08.23, 14:00 h, building E9 1, Room 0.01

Machine learning (ML) has progressed tremendously, and data is the key factor driving such development. However, there are two main challenges regarding collecting the data and handling it with ML models. First, the acquisition of high-quality labeled data can be difficult and expensive due to the need for extensive human annotation. Second, to model the complex relationships between entities, e.g., social networks or molecule structures, graphs have been leveraged. However, conventional ML models may not effectively handle graph data due to the non-linear and complex nature of the relationships between nodes. To address these challenges, recent developments in semi-supervised learning and self-supervised learning have been introduced to leverage unlabeled data for ML tasks. In addition, a new family of ML models known as graph neural networks has been proposed to tackle the challenges associated with graph data. Despite being powerful, these paradigms also carry potential privacy risks that should be taken into account. In this dissertation, we perform a privacy risk assessment of these emerging machine learning paradigms. Firstly, we investigate the membership privacy leakage stemming from semi-supervised learning. Concretely, we propose the first data augmentation-based membership inference attack that is tailored to the training paradigm of semi-supervised learning methods. Secondly, we quantify the privacy leakage of self-supervised learning through the lens of membership inference attacks and attribute inference attacks. Thirdly, we study the privacy implications of training GNNs on graphs. In particular, we propose the first attack to steal a graph from the outputs of a GNN model that is trained on the graph. Finally, we also explore potential defense mechanisms to mitigate these attacks.

July

Trung Tin NGUYEN
Understanding and Measuring Privacy Violations in Android Apps
(Advisor: Prof. Michael Backes)
Tuesday, 25.07.23, 13:00 h, building E9 1, Room 0.01

Increasing data collection and tracking of consumers by today’s online services is becoming a major problem for individuals’ rights. It raises a serious question about whether such data collection can be legally justified under legislation around the globe. Unfortunately, the community lacks insight into such violations in the mobile ecosystem.
In this dissertation, we approach these problems by presenting a line of work that provides a comprehensive understanding of privacy violations in Android apps in the wild and automatically measures such violations at scale. First, we build an automated tool that detects unexpected data access based on user perception when interacting with the apps’ user interface. Subsequently, we perform a large-scale study on Android apps to understand how prevalent violations of GDPR’s explicit consent requirement are in the wild. Finally, until now, no study has systematically analyzed currently implemented consent notices in mobile apps and whether they conform to GDPR. Therefore, we propose a mostly automated and scalable approach to identify the current practices of implemented consent notices. We then develop an automatic tool that detects data sent out to the Internet under different consent conditions.
Our results show the urgent need for more transparent user interface designs to better inform users of data access, and they call for new tools to support app developers in this endeavor.

Mojtaba BEMANA
Efficient Image-Based Rendering
(Advisors: Dr. habil. Karol Myszkowski & Prof. Hans-Peter Seidel)
Wednesday, 12.07.23, 11:00 h, building E1 4, Room 0.19

Despite recent advancements in real-time ray tracing and deep learning for producing photo-realistic computer-generated images (CGI), the creation of CGI remains time-consuming and resource-intensive. Image-based rendering (IBR) provides an alternative by using pre-captured real-world images to generate realistic images in real-time, eliminating the need for extensive modeling. However, achieving faithful IBR reconstruction often requires dense scene sampling, leading to storage, capture, and processing challenges. Furthermore, IBR still struggles to offer the same level of control over scene attributes as traditional CG pipelines or to accurately reproduce complex scenes and objects with materials like transparent objects. This thesis endeavors to address these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and human perception. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically-driven approach that enables accurate novel view synthesis of refractive objects. Additionally, this thesis develops a visibility metric that can identify artifacts in the reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.

Lars PREHN
Routegazing: Analysing the Evolving Internet Routing Ecosystem
(Advisor: Prof. Anja Feldmann)
Thursday, 06.07.23, 13:00 h, building E1 5, Room 002

The Internet’s routing ecosystem constantly evolves to meet the needs of its stakeholders and users. Tracking this evolution is essential, e.g., to identify business opportunities, address security challenges, or inform protocol design. However, most Internet protocols were designed without measurability in mind; hence, many measurements and inference methods rely on exploiting protocol-specific side effects.
This dissertation first assesses the limitations of our deployed observation infrastructures and commonly used inference methods via three orthogonal contributions: a case study on a European Internet Exchange Point to assess our visibility into the Internet’s AS topology; a framework to identify and measure biases in the placement of our vantage points across multiple dimensions; and a systematic analysis of the biases and sensitivity of AS relationship inference algorithms. We found that our view of the Internet’s AS topology diminishes over time, and that our AS relationship models are more biased and sensitive to short-term routing dynamics than previously assumed.
With these limitations in mind, we focused on one of the most critical routing ecosystem changes, IPv4 exhaustion, and two ways network operators can deal with it. First, we explored the IPv4 buying and leasing markets, identified market trends, and discussed the viability of these markets for different network types. Second, we analyzed the benefits, usage patterns, and disadvantages of announcing tiny address blocks, which we call "hyper-specific." We argue that a combination of leased IPv4 addresses and hyper-specific prefix announcements likely suffices for many networks to bridge the gap until full IPv6 adoption.
Besides IPv6 adoption, the routing ecosystem also evolved in other dimensions. We first studied AS path prepending to assess the security implications of these changes. We found that a typical configuration provides no benefits yet increases an AS’s vulnerability to prefix hijacks. Infrastructural changes led to an overall decrease in prepending sizes over time and hence a safer use of the technique. However, we demonstrated that we can exploit the same changes to re-orchestrate prefix de-aggregation attacks that overcome widely deployed prevention mechanisms. We validated our assumptions and attack model using a real-world testbed and proposed updates to existing prevention mechanisms. Our two-stage disclosure campaign contributed to a safer routing ecosystem.

Noemi PASSING
Compositional Synthesis of Reactive Systems
(Advisor: Prof. Bernd Finkbeiner)
Tuesday, 04.07.23, 16:30 h, building E1 7, Room 0.01

Synthesis is the task of automatically deriving correct-by-construction implementations from formal specifications. While it is a promising path toward developing verified programs, it is infamous for being hard to solve. Compositionality is recognized as a key technique for reducing the complexity of synthesis. So far, compositional approaches require extensive manual effort. In this thesis, we introduce algorithms that automate these steps.
In the first part, we develop compositional synthesis techniques for distributed systems. Providing assumptions on other processes’ behavior is fundamental in this setting due to inter-process dependencies. We establish delay-dominance, a new requirement for implementations that allows for implicitly assuming that other processes will not maliciously violate the shared goal. Furthermore, we present an algorithm that computes explicit assumptions on process behavior to address more complex dependencies.
In the second part, we transfer the concept of compositionality from distributed to single-process systems. We present a preprocessing technique for synthesis that identifies independently synthesizable system components. We extend this approach to an incremental synthesis algorithm, resulting in more fine-grained decompositions. Our experimental evaluation shows that our techniques automate the required manual efforts, resulting in fully automated compositional synthesis algorithms for both distributed and single-process systems.

Tobias STARK
Real-Time Execution Management in the ROS 2 Framework
(Advisor: Dr. Björn Brandenburg)
Tuesday, 04.07.23, 10:00 h, building E1 7, Room 0.01

Over the past decade, the ROS ecosystem has emerged as the most popular repository of open-source robotics software. As a result, many robots rely on ROS-based software to make timing-critical decisions in real time. However, there is little evidence that real-time theory is used to analytically bound or control worst-case response times in ROS components.
This dissertation identifies three main hurdles to adopt real-time theory in the context of ROS 2: first, the complex and non-obvious timing effects introduced by the ROS 2 frame-work; second, the expertise required to use real-time scheduling mechanisms correctly; and third, the inherent unpredictability of typical robotics workloads, which defy static provisioning.
To overcome these hurdles, the dissertation introduces a timing model for ROS 2 applications, together with a response-time analysis that allows robotics developers to bound the worst-case response time of individual components and multi-component processing chains.
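As a point of reference for what a response-time analysis computes, the sketch below implements the classical fixed-priority iteration R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j from textbook real-time theory. It is only the standard uniprocessor baseline, not the ROS 2-specific analysis developed in the dissertation.

    import math

    def response_times(tasks):
        """Classic fixed-priority response-time iteration.
        tasks: list of (WCET, period), highest priority first;
        deadlines are assumed equal to periods (a textbook simplification)."""
        results = []
        for i, (C_i, T_i) in enumerate(tasks):
            R = C_i
            while True:
                R_next = C_i + sum(math.ceil(R / T_j) * C_j
                                   for C_j, T_j in tasks[:i])
                if R_next == R:
                    break                 # fixed point: worst-case response time
                R = R_next
                if R > T_i:               # exceeds the deadline: unschedulable
                    R = None
                    break
            results.append(R)
        return results

    print(response_times([(1, 4), (2, 6), (3, 12)]))  # -> [1, 3, 10]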
However, modeling and provisioning ROS 2 systems remains a cumbersome and error-prone task. In a second step, the dissertation hence proposes ROS-Llama, an automatic latency manager for ROS 2. ROS-Llama automatically controls the latency of a ROS 2 system through real-time scheduling, requiring little effort and no real-time scheduling expertise from the user. It runs in parallel with the deployed application and can therefore measure all required information without user involvement and adapt to changes at runtime. As part of ROS-Llama's design, the dissertation discusses the conceptual and practical challenges in developing such an automatic tool, identifying relevant properties of ROS 2 and essential requirements of the robotics domain.

June

David I. ADELANI
Natural Language Processing for African Languages
(Advisor: Prof. Dietrich Klakow)
Tuesday, 27.06.23, 13:00 h, building C7 4, Conference room

Recent advances in pre-training of word embeddings and language models leverage large amounts of unlabelled texts and self-supervised learning to learn distributed representations that have significantly improved the performance of deep learning models on a large variety of natural language processing tasks. Similarly, multilingual variants of these models have been developed from web-crawled multilingual resources like Wikipedia and Common Crawl. However, building these multilingual representation models from web texts has drawbacks. First, the models include only a few low-resource languages in the training corpus, and the texts of these languages are often noisy or of low quality. Second, their performance on downstream NLP tasks is difficult to evaluate because of the absence of labelled datasets; they are therefore typically evaluated only on English and other high-resource languages.
In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all indigenous languages can be regarded as low-resourced in terms of the availability of labelled data for NLP tasks and of unlabelled data found on the web. We analyse the noise in the publicly available corpora and curate a high-quality corpus, demonstrating that the quality of the semantic representations learned by word embeddings depends not only on the amount of pre-training data but also on its quality. We demonstrate empirically the limitations of word embeddings and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in multilingual evaluations, we developed large-scale human-annotated datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
To advance the progress of NLP for African languages, future work should focus on expanding benchmark datasets for African languages in other important NLP tasks like part-of-speech tagging, sentiment analysis, hate speech detection, and question answering. Another direction is the development of Africa-centric PLMs. Lastly, research on speech, involving the development of corpora and techniques that require zero or few paired speech-text data, is essential for the survival of many under-resourced African languages.

Elena ARABADZHIYSKA-KOLEVA
Perceptually driven methods for improved gaze-contingent rendering
(Advisor: Prof. Piotr Didyk, now Univ. della Svizzera Italiana)
Wednesday, 21.06.23, 15:00 h, building E1 5, Room 0.29

Computer graphics is responsible for the creation of beautiful and realistic content. However, visually pleasing results often come at an immense computational cost, especially for new display devices such as virtual reality headsets. A promising solution to this problem is foveated rendering, which exploits the limitations of the human visual system with the help of eye trackers. Visual acuity is not uniform across the visual field: it peaks at the center and declines rapidly towards the periphery. Foveated rendering takes advantage of this by displaying high-quality content only at the gaze location and gradually decreasing quality towards the periphery. While this method is effective, it is subject to some limitations. One example is system latency, which becomes noticeable during rapid eye movements, when the central vision is exposed to low-resolution content reserved for the peripheral vision. Another is the prediction of the allowed quality degradation, which is typically based solely on visual eccentricity; the loss of peripheral acuity is more complex, however, and depends on the image content as well.
This thesis addresses these limitations by designing new, perceptually-driven methods for gaze-contingent rendering. The first part introduces a new model for saccade landing position prediction to combat system latency during rapid eye movements. The method extrapolates gaze information from delayed eye-tracking samples and predicts the saccade's landing position. The new gaze estimate is then used in the rendering pipeline to compensate for system latency. The model is further refined by considering the idiosyncratic characteristics of saccades. The second part of this thesis introduces a new luminance-contrast-aware foveated rendering technique, which models the allowed peripheral quality degradation as a function of both visual eccentricity and local luminance contrast. The advantage of this model lies in its ability to predict the perceived quality loss due to foveated rendering without a full-resolution reference. As a consequence, it can be applied to foveated rendering to achieve better computational savings.
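For intuition, the sketch below shows the eccentricity-only baseline that such techniques build on: the minimum resolvable detail is commonly modeled as growing linearly with eccentricity, so rendering density may drop accordingly. The parameters are illustrative assumptions, and the thesis's contribution is precisely to extend such a model with local luminance contrast.

    def allowed_detail(ecc_deg, w0=1.0, slope=0.3):
        """Minimum resolvable feature size (relative units) as a linear
        function of eccentricity, the standard acuity falloff model
        behind foveated rendering. Parameters here are illustrative."""
        return w0 + slope * ecc_deg

    def sample_spacing(ecc_deg):
        # How much coarser we may sample at this eccentricity than at
        # the fovea (eccentricity 0).
        return allowed_detail(ecc_deg) / allowed_detail(0.0)

    for e in (0, 10, 30, 60):
        print(f"{e:2d} deg: sample spacing {sample_spacing(e):.1f}x foveal")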

Narges POURJAFARIAN
Physical Sketching Tools and Techniques for Customized Sensate Surfaces
(Advisor: Prof. Jürgen Steimle)
Wednesday, 07.06.23, 16:00 h, building E1 1, Room 2.06

Sensate surfaces are a promising avenue for enhancing human interaction with digital systems due to their inherent intuitiveness and natural user interface. Recent technological advancements have enabled sensate surfaces to surpass the constraints of conventional touchscreens by integrating them into everyday objects, creating interactive interfaces that can detect various inputs such as touch, pressure, and gestures. This allows for more natural and intuitive control of digital systems. However, prototyping interactive surfaces that are customized to users' requirements using conventional techniques remains technically challenging due to limitations in accommodating complex geometric shapes and varying sizes. Furthermore, it is crucial to consider the context in which customized surfaces are utilized, as relocating them to fabrication labs may lead to the loss of their original design context. Additionally, prototyping high-resolution sensate surfaces presents challenges due to the complex signal processing requirements involved. This work investigates the design and fabrication of customized sensate surfaces that meet the diverse requirements of different users and contexts. The research aims to develop novel tools and techniques that overcome the technical limitations of current methods and enable the creation of sensate surfaces that enhance human interaction with digital systems.

May

Sebastian BIEWER
Software Doping – Theory and Detection
(Advisor: Prof. Holger Hermanns)
Thursday, 25.05.23, 15:15 h, building E1 7, Room 0.01

Software is doped if it contains a hidden functionality that is intentionally included by the manufacturer and is not in the interest of the user or society. This thesis complements this informal definition by a set of formal cleanness definitions that characterise the absence of software doping. These definitions reflect common expectations on clean software behaviour and are applicable to many types of software, from printers to cars to discriminatory AI systems. We use these definitions to propose white-box and black-box analysis techniques to detect software doping. In particular, we present a provably correct, model-based testing algorithm that is intertwined with a probabilistic-falsification-based test input selection technique. We identify and explain how to overcome the challenges that are specific to real-world software doping tests and analyses. We use the Diesel Emissions Scandal to demonstrate the strength of our cleanness definitions and analysis techniques by applying them to emission cleaning systems of diesel cars.
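As a toy illustration of the idea (not one of the thesis's formal definitions), consider a doped emission controller that cleans exhaust fully only when it detects the standardized test cycle. A robust-cleanness-style check compares outputs on inputs that should be treated alike; the controller, threshold, and loads below are all hypothetical.

    def emission_control(on_test_cycle: bool, load: float) -> float:
        """Toy doped controller: strong exhaust cleaning only under the
        standardized test cycle (purely illustrative)."""
        return 0.1 * load if on_test_cycle else 1.0 * load

    def looks_clean(ctrl, loads, kappa_out=0.2):
        """Sketch of a robust-cleanness-style test: inputs that differ
        only in the (hidden) test-cycle flag must yield outputs that
        differ by at most kappa_out."""
        return all(abs(ctrl(True, l) - ctrl(False, l)) <= kappa_out
                   for l in loads)

    print(looks_clean(emission_control, [0.2, 0.5, 1.0]))  # False: doped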

Ikhsanul HABIBIE
Learning-based 3D Human Motion Capture and Animation Synthesis
(Advisor: Prof. Christian Theobalt)
Tuesday, 16.05.23, 10:00 h, building E1 4, Room 0.24

Realistic virtual human avatars are a crucial element in a wide range of applications, from 3D animated movies to emerging AR/VR technologies. However, producing believable 3D motion for such avatars is widely known to be a challenging task. A traditional 3D human motion generation pipeline consists of several stages, each requiring expensive equipment and skilled human labor, limiting its usage beyond the entertainment industry despite its massive potential benefits.
This thesis attempts to explore some alternative solutions to reduce the complexity of the traditional 3D animation pipeline. To this end, it presents several novel ways to perform 3D human motion capture, synthesis, and control.
Specifically, it focuses on using learning-based methods to bypass the critical bottlenecks of the classical animation approach. First, a new 3D pose estimation method from in-the-wild monocular images is proposed, eliminating the need for a multi-camera setup in the traditional motion capture system. Second, it explores several data-driven designs to achieve a believable 3D human motion synthesis and control that can potentially reduce the need for manual animation. In particular, the problem of speech-driven 3D gesture synthesis is chosen as the case study due to its uniquely ambiguous nature. The improved motion generation quality is achieved by introducing a novel adversarial objective that rates the difference between real and synthetic data. A novel motion generation strategy is also introduced by combining a classical database search algorithm with a powerful deep learning method, resulting in a greater motion control variation than the purely predictive counterparts.
Furthermore, this thesis also contributes a new way of collecting a large-scale 3D motion dataset through the use of learning-based monocular estimation methods. This result demonstrates the promising capability of learning-based monocular approaches and shows the prospect of combining these learning-based modules into an integrated 3D animation framework.
The presented learning-based solutions open the possibility of democratizing the traditional 3D animation system that can be enabled using low-cost equipment, e.g., a single RGB camera. Finally, this thesis also discusses the potential further integration of these learning-based approaches to enhance 3D animation technology.

Maurice REKRUT
Leveraging EEG-based Speech Imagery Brain-Computer Interfaces
(Advisor: Prof. Antonio Krüger)
Friday, 04.05.23, 15:00 h, building D3 2, Room -2.17 (Reuse)

Speech Imagery Brain-Computer Interfaces (BCIs) provide an intuitive and flexible way of interaction via brain activity recorded during imagined speech. Imagined speech can be decoded in the form of syllables or words and captured even with non-invasive measurement methods such as electroencephalography (EEG). Over the last decade, research in this field has made tremendous progress, and prototypical implementations of EEG-based Speech Imagery BCIs are numerous. However, most work is still conducted in controlled laboratory environments with offline classification and rarely finds its way to real online scenarios.
Within this thesis, we identify three main reasons for these circumstances: the mentally and physically exhausting training procedures, insufficient classification accuracies, and cumbersome EEG setups with usually high-resolution headsets. We elaborate on possible solutions to these problems and present and evaluate new methods in each of the domains. In detail, we introduce two new training concepts for imagined speech BCIs, one based on EEG activity recorded during silent reading and the other recorded during overtly speaking certain words. Insufficient classification accuracies are addressed by introducing the concept of a Semantic Speech Imagery BCI, which classifies the semantic category of an imagined word prior to the word itself to increase the performance of the system. Finally, we investigate different techniques for electrode reduction in Speech Imagery BCIs and aim at finding a suitable subset of electrodes for EEG-based imagined speech detection, thereby simplifying the cumbersome setups. All of our results, together with general remarks on experiences and best practices for imagined speech study setups, are summarized and intended to serve as guidelines for further research in the field, thereby leveraging Speech Imagery BCIs towards real-world application.
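A minimal sketch of the kind of decoding pipeline such BCIs rely on, using scikit-learn on synthetic stand-in features; the electrode count, band-power features, and five-word vocabulary are assumptions for illustration, not the thesis's setup.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64 * 5))   # 200 trials: 64 electrodes x 5 band powers
    y = rng.integers(0, 5, size=200)     # 5 imagined words

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    print(cross_val_score(clf, X, y, cv=5).mean())  # ~0.2 (chance) on pure noise

Electrode reduction, as studied in the thesis, would correspond to restricting X to the columns of a selected channel subset before training.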

Han DU
Modeling Variation of Human Motion
(Advisor: Prof. Philipp Slusallek)
Thursday, 04.05.23, 08:30 h, building D3 2, Room -2.17 (Reuse)

This thesis presents a series of works that analyze and model the variations of human motion data. The goal is to learn statistical generative models to create any number of new human animations with rich variations and styles. The synthesis of realistic human motion with large variations and different styles is of growing interest in simulation applications such as the game industry, psychological experiments, and ergonomic analysis. The statistical generative models are used by motion controllers in our motion synthesis framework to create new animations for different scenarios.

Jiayi WANG
3D Hand Reconstruction from Monocular Camera with Model-Based Priors
(Advisor: Prof. Christian Theobalt)
Wednesday, 03.05.23, 14:00 h, building E1 4, Room 0.24

As virtual and augmented reality (VR/AR) technology gains popularity, facilitating intuitive digital interactions in 3D is of crucial importance. Tools such as VR controllers exist, but such devices support only a limited range of interactions, mapped onto complex sequences of button presses that can be intimidating to learn. In contrast, users already have an instinctive understanding of manual interactions in the real world, which is readily transferable to the virtual world. For enabling these interactions, hand-tracking systems using monocular images are desirable since they do not constrain articulation, unlike gloves or markers, and suitable input devices are pervasive in everyday life.
However, existing learning-based methods have many limitations, such as their requirement for vast amounts of 3D annotations, the assumption that only one hand appears in the scene, and their inability to characterize the 3D ambiguities in the input. Existing methods have also focused primarily on modeling geometry while neglecting hand appearance. To tackle these shortcomings, this thesis advances the state of the art through the novel use of model-based priors to incorporate hand-specific knowledge. In particular, this thesis presents a training method that reduces the amount of annotations required and is robust to systemic biases; it presents the first tracking method that addresses the challenging two-hand-interaction scenario using monocular RGB video, as well as the first probabilistic method to model image ambiguity for two-hand interactions. Additionally, this thesis contributes the first parametric hand texture model, with example applications in hand personalization.

April

Sebastian ROTH
How to Deploy Security Mechanisms Online (Consistently)
(Advisor: Dr. Ben Stock)
Friday, 28.04.23, 10:00 h, building E9 1, Room 0.01

To mitigate a myriad of Web attacks, modern browsers support client-side security policies shipped through HTTP response headers: the operator sets response headers that the server then communicates to the client. We have shown that one of those, namely the Content Security Policy (CSP), requires massive engineering effort to be deployed in a non-trivially bypassable way. Thus, many policies deployed on Web sites are misconfigured. Because CSP can also defend against framing-based attacks, its functionality overlaps with the X-Frame-Options header. We have shown that this overlap leads to inconsistent behavior of browsers, but also to inconsistent deployment on real-world Web applications. Overloaded defense mechanisms are not the only source of security inconsistencies: we showed that, due to the structure of the Web itself, misconfigured origin servers or geolocation-based CDN caches can cause unwanted security inconsistencies. Given the high number of CSP misconfigurations, we also took a closer look at the deployment process of the mechanism. By conducting a semi-structured interview, including a coding task, we were able to shed light on motivations, strategies, and roadblocks of CSP deployment. However, due to the wide usage of CSP, drastic changes are generally considered impractical. Therefore, we also evaluated whether one of the newest Web security features, namely Trusted Types, can be improved.
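For illustration, the two standardized headers whose overlap is discussed above can be deployed together as follows; CSP's frame-ancestors directive subsumes X-Frame-Options, and per the CSP Level 2 specification, browsers that support frame-ancestors ignore the older header when both are present.

    Content-Security-Policy: script-src 'self'; object-src 'none'; frame-ancestors 'none'
    X-Frame-Options: DENY

A policy such as script-src 'unsafe-inline', by contrast, is a typical example of a trivially bypassable deployment, since it permits exactly the injected inline scripts CSP is meant to block.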

Edgar SCHÖNFELD
Improving Quality and Controllability in GAN-based Image Synthesis
(Advisor: Prof. Bernt Schiele)
Tuesday, 18.04.23, 09:30 h, building E1 4, DFKI, Room 0.24

The goal of the field of deep learning-based image synthesis is to achieve perfect visual realism, and to let users precisely control the content of the synthetic images. Generative adversarial networks (GANs) have been the most popular image synthesis framework until recently, due to their unrivaled image quality. Yet, there is still much room for improvement regarding synthesis quality and precisely controlling the image content. For this reason, this thesis introduces methods that improve both the synthesis quality and controllability of GANs. Specifically, we address the following subproblems. First, we propose the idea of segmentation-based discriminator networks and segmentation-based regularizations for GANs. The new approach improves the quality of conditional and unconditional image synthesis. Second, we show that this approach is naturally well-suited for semantic image synthesis. Centered around the idea of segmentation-based discriminators, this thesis introduces techniques that strongly improve image quality and multi-modality. Additionally, the methods result in better modeling of long-tailed data and new possibilities for global and local image editing. Finally, the improvements in multi-modality and image editing in semantic image synthesis open the door for controlling the image content via the latent space of the GAN generator. Therefore, this thesis introduces a method for finding interpretable directions in the latent space of semantic image synthesis GANs, which enables an additional form of control over the image content next to the semantic layouts.
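A minimal sketch of the segmentation-based idea in PyTorch: the discriminator outputs a real/fake logit per pixel instead of a single logit per image, giving the generator spatially detailed feedback. The architecture and tensor sizes below are illustrative assumptions, not the networks used in the thesis.

    import torch
    import torch.nn as nn

    # Per-pixel ("segmentation-style") discriminator: one logit per pixel.
    disc = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 3, padding=1),
    )

    bce = nn.BCEWithLogitsLoss()
    real = torch.randn(4, 3, 64, 64)             # stand-in real images
    fake = torch.randn(4, 3, 64, 64)             # stand-in generator output
    d_loss = bce(disc(real), torch.ones(4, 1, 64, 64)) \
           + bce(disc(fake), torch.zeros(4, 1, 64, 64))
    d_loss.backward()                            # spatially dense feedback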

March

Nicklas LINZ
Automatic Detection of Dementia and related Affective Disorders through Processing of Speech and Language
(Advisor: Prof. Antonio Krüger)
Friday, 24.03.23, 15:00 h, building D3 2, DFKI, ViS Room (SB-1.61)

In 2019, dementia became a trillion-dollar disorder. Alzheimer's disease (AD) is a type of dementia in which the main observable symptom is a decline in cognitive functions, notably memory, as well as language and problem-solving. Experts agree that early detection is crucial to effectively develop and apply interventions and treatments, underlining the need for effective and pervasive assessment and screening tools. This thesis explores how computational techniques can be used to process speech and language samples produced by patients suffering from dementia or related affective disorders, with the goal of automatically detecting these conditions in large populations using machine learning models. A strong focus is laid on the detection of early-stage dementia (mild cognitive impairment, MCI), as most clinical trials today focus on intervention at this level. To this end, novel automatic and semi-automatic analysis schemes for a speech-based cognitive task, verbal fluency, are explored and evaluated as an appropriate screening task. Due to a lack of available patient data in most languages, world-first multilingual approaches to detecting dementia are introduced in this thesis. Results are encouraging, and clear benefits become visible on a small French dataset. Lastly, the task of detecting those people with dementia who also suffer from an affective disorder called apathy is explored. Since they are likely to progress to later stages of dementia faster, it is crucial to identify them. These are the first experiments that consider this task using solely speech and language as inputs. Results are again encouraging, both using speech alone and using language data elicited with emotional questions. Overall, strong results encourage further research into establishing speech-based biomarkers for early detection and monitoring of these disorders to improve patients' lives.
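To give a flavor of the semi-automatic analysis of verbal fluency (naming as many category members as possible within a minute), the sketch below computes two standard features, productivity and perseverations, over a hypothetical transcript; clinical pipelines additionally use timing and semantic-clustering measures.

    transcript = ["dog", "cat", "cow", "cat", "lion", "tiger", "dog"]

    n_total = len(transcript)                  # productivity
    n_unique = len(set(transcript))
    n_repetitions = n_total - n_unique         # perseverations
    print({"words": n_total, "unique": n_unique,
           "repetitions": n_repetitions})      # {'words': 7, 'unique': 5, 'repetitions': 2}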

Donald DEGRAEN
Designing Tactile Experiences for Immersive Virtual Environments
(Advisor: Prof. Antonio Krüger)
Tuesday, 21.03.23, 14:00 h, building D3 2, Reuse meeting room

Designing for the sense of touch is essential in creating convincing and realistic experiences in Virtual Reality (VR). Currently, a variety of methods exist for simulating touch experiences. However, developing effective and convincing haptic feedback still remains challenging. In this work, we study how real-world touch experiences can inform haptic design processes for VR. Firstly, we investigate the reproduction of haptic features by capturing and fabricating surface microgeometry. We show that haptic reproduction is able to create a wide range of feel aesthetics. Furthermore, we build upon procedural design by generating and fabricating haptically-varying surface structures. We show that digital design processes are able to generate flexible and universal structures that directly influence tactile dimensions, such as roughness and hardness. Lastly, we investigate correspondences between different sensory modalities to enhance the design of tactile experiences. We show that vocal expressions can translate a designer’s intent into effective haptic feedback, while providing a rapid in-situ design process. This thesis advances the fields of VR, haptic design, and fabrication by contributing knowledge to the question of how effective tactile experiences can be designed.

Anna HAKE (née Feldmann)
Predicting and analyzing HIV-1 adaptation to broadly neutralizing antibodies and the host immune system using machine learning
(Advisor: Prof. Nico Pfeifer, now Uni Tübingen)
Monday, 20.03.23, 14:00 h, building E1 4, Rm 0.24

With neither a cure nor a vaccine at hand, infection with the human immunodeficiency virus type 1 (HIV-1) is still a major global health threat. Viral control is usually gained through lifelong therapy with antiretroviral drugs and rarely by the immune system alone. Without drug exposure, interindividual differences in viral control are partly influenced by host genetic factors like the human leukocyte antigen (HLA) system and viral genetic factors like the predominant coreceptor usage of the virus. Thanks to its extraordinarily high mutation and replication rate, however, HIV-1 is able to rapidly adapt to the selection pressure imposed by the host immune system or antiretroviral drug exposure.
For successful control of the virus, it is thus vital to have fast and reliable methods in place that assess the viral adaptation to drugs of interest prior to their (further) administration. For a better assessment of our ability to control the virus, it is also important to estimate the viral adaptation to the host immune system.
In this talk, I will present four studies, all aiming to further our understanding of HIV-1 adaptation and our ability to reliably predict it. In particular, we present an SVM approach to predict HIV-1 adaptation to broadly neutralizing antibodies (bNAbs), a promising new treatment option. In addition, we use statistical learning to further characterize antibody-mediated therapy with the promising bNAb 3BNC117 by investigating its ability (i) to suppress the virus and (ii) to boost the immune system. Finally, I will introduce a novel way to predict HIV-1 adaptation to the host immune system using Bayesian generalized linear mixed models, which allowed us to investigate the relationship between HIV-1 coreceptor usage and its adaptation to the host HLA system.
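A minimal sketch of the SVM idea on sequence data: represent each viral sequence by its k-mer counts and learn a linear separator. The sequences and labels below are fabricated placeholders for illustration; the study's actual features and cohorts are far richer.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline

    seqs = ["MRVKGIRKNYQ", "MRVKGTRKNYQ", "MKVKGIRRNYQ", "MRAKGIRKSYQ"]
    y    = [1, 1, 0, 0]                     # 1 = sensitive to the bNAb (toy labels)

    kmers = CountVectorizer(analyzer="char", ngram_range=(3, 3))  # 3-mer counts
    model = make_pipeline(kmers, SVC(kernel="linear"))
    model.fit(seqs, y)
    print(model.predict(["MRVKGIRKNYQ"]))   # -> [1]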

Bharat Lal BHATNAGAR
Modelling 3D Humans: Pose, Shape, Clothing and Interactions
(Advisor: Prof. Gerard Pons-Moll, now Uni Tübingen)
Thursday, 16.03.23, 18:00 h, building E1 4, Rm 0.24

Digital humans are increasingly becoming a part of our lives with applications like animation, gaming, virtual try-on, the Metaverse, and much more. In recent years there has been a great push to make our models of digital humans as real as possible. In this thesis we present methodologies to model two key characteristics of real humans: their "appearance" and their "actions". To this end, we discuss which representations are best suited for humans, clothing, and their interactions with their surroundings; how we can extract human appearance cues like pose, shape, and clothing from scans, point clouds, and images; and how we can capture and, in turn, model human-object interaction.

Fajar HAIFANI
On a Notion of Abduction and Relevance for First-Order Logic Clause Sets
(Advisors: Prof. Christoph Weidenbach and Dr. Sophie Tourret)
Thursday, 09.03.23, 14:00 h, building E1 4, Rm 0.24

I propose techniques to help explain entailment and non-entailment in first-order logic. For entailment, I classify clauses as necessary for any possible deduction (syntactically relevant), usable for some deduction (syntactically semi-relevant), or unusable (syntactically irrelevant), along with a semantic characterization via conflict literals (contradictory simple facts). This offers a novel insight beyond the existing notion of a minimal unsatisfiable set. The need to test whether a clause is syntactically semi-relevant leads to a generalization of the completeness result of a well-known resolution strategy: resolution with the set-of-support (SOS) strategy is refutationally complete on a clause set N and SOS M if and only if there is a resolution refutation from N ∪ M using a clause in M. For non-entailment, abductive reasoning helps find extensions of a knowledge base that entail some missing consequence. I focus on EL TBox abduction, which is lightweight but prevalent in practice. The solution space can be huge, so, to help separate the wheat from the chaff, I introduce connection-minimality, a criterion such that accepted hypotheses always immediately relate the observation to the given axioms. I show that such hypotheses are computable using prime-implicate-based abduction in first-order logic. I evaluate this notion on ontologies from the medical domain using an implementation with SPASS as a prime implicate generation engine.
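To make the SOS strategy concrete, here is a toy propositional version: every resolution step must use at least one clause descending from the set of support M. This is only a sketch under simplifying assumptions (ground clauses, no subsumption) and bears no resemblance to SPASS's first-order machinery.

    from itertools import product

    def resolve(c1, c2):
        """All resolvents of two propositional clauses, represented as
        frozensets of literals; negation is a leading '~'."""
        out = []
        for lit in c1:
            comp = lit[1:] if lit.startswith("~") else "~" + lit
            if comp in c2:
                out.append((c1 - {lit}) | (c2 - {comp}))
        return out

    def sos_refute(n, sos, limit=1000):
        """Set-of-support resolution: each step resolves an SOS descendant
        against any known clause. Returns True on deriving the empty clause."""
        all_clauses = set(n) | set(sos)
        sos = set(sos)
        for _ in range(limit):
            new = set()
            for c1, c2 in product(sos, all_clauses):
                for r in resolve(c1, c2):
                    if not r:
                        return True          # empty clause: refutation found
                    if r not in all_clauses:
                        new.add(r)
            if not new:
                return False                 # saturated without refutation
            sos |= new
            all_clauses |= new
        return False

    N   = [frozenset({"~p", "q"}), frozenset({"p"})]
    SOS = [frozenset({"~q"})]
    print(sos_refute(N, SOS))   # True: N with SOS {~q} is refutable via SOS steps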

Mirko PALMER
Towards Enabling Cross-layer Information Sharing to Improve Today’s Content Delivery Systems
(Advisor: Prof. Anja Feldmann)
Thursday, 02.03.23, 15:00 h, building E1 4, Rm 0.24

Content is omnipresent, and without content the Internet would not be what it is today. End users consume content throughout the day: checking the latest news on Twitter in the morning, streaming music in the background while working, streaming movies or playing online games in the evening, and using apps (e.g., sleep trackers) even while sleeping at night. These different kinds of content have very specific and different transport requirements: online gaming often requires a low-latency connection but little throughput, while video streaming requires high throughput but performs poorly under packet loss. Yet all content is transferred opaquely over the same transport, adhering to a strict separation of network layers. Even a modern transport protocol such as Multi-Path TCP, which is capable of utilizing multiple paths, cannot take these content requirements into account for its path selection. In this work we challenge the layer separation and show that sharing information across the layers is beneficial for consuming web and video content. To this end, we created an event-based simulator for evaluating how applications can make informed decisions about which interfaces to use for delivering different content, based on a set of pre-defined policies that encode the (performance) requirements of that content. Our policies achieve speedups of a factor of two in 20% of our cases, have benefits in more than 50%, and create no overhead in any of the cases. For video content we created a full streaming system that allows even finer-grained information sharing between the transport and the application. Our streaming system, called VOXEL, enables applications to select dynamically, at frame granularity, which video data to transfer based on the current network conditions. VOXEL drastically reduces video stalls at the 90th percentile by up to 97% while not sacrificing the stream's visual fidelity. We confirmed our performance improvements in a real-user study in which 84% of the participants clearly preferred watching videos streamed with VOXEL over the state of the art.
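A sketch of the policy idea under stated assumptions: a table encodes per-content transport requirements, and the application picks an interface whose measured characteristics satisfy them. The content classes, interface names, and numbers are illustrative, not taken from the simulator in the thesis.

    POLICIES = {
        "gaming":   {"max_rtt_ms": 30,  "min_mbps": 1},
        "video":    {"max_rtt_ms": 200, "min_mbps": 5},
        "download": {"max_rtt_ms": 500, "min_mbps": 10},
    }
    INTERFACES = {                      # hypothetical measured characteristics
        "wifi":     {"rtt_ms": 15, "mbps": 40},
        "cellular": {"rtt_ms": 60, "mbps": 8},
    }

    def pick_interface(content):
        req = POLICIES[content]
        ok = [(name, c) for name, c in INTERFACES.items()
              if c["rtt_ms"] <= req["max_rtt_ms"] and c["mbps"] >= req["min_mbps"]]
        # Among the feasible interfaces, prefer the lowest latency.
        return min(ok, key=lambda nc: nc[1]["rtt_ms"])[0] if ok else None

    print(pick_interface("gaming"))   # -> 'wifi'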

February

Johannes BUND
Hazard-Free Clock Synchronization
(Advisor: Dr. Christoph Lenzen)
Tuesday, 28.02.23, 13:00 h, building E1 4, Rm 0.24

The growing complexity of microprocessors makes it infeasible to distribute a single clock source over the whole processor with small clock skew. Hence, chips are split into multiple clock regions, each covered by a single clock source. This poses a problem for communication between these clock regions. Clock synchronization algorithms promise an advantage over state-of-the-art solutions such as GALS systems: when clock regions are synchronized, communication latency improves significantly over handshake-based solutions. We focus on the implementation of clock synchronization algorithms.
A major obstacle when implementing circuits on clock domain crossings is hazardous signals. Extending Boolean logic with a third value 'u', we can formally define hazards. In this thesis we describe a theory for the design and analysis of hazard-free circuits. We develop strategies for hazard-free encoding and for the construction of hazard-free circuits from finite state machines. Furthermore, we discuss clock synchronization algorithms and a possible combination of them.
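A small sketch of the ternary-logic idea: extending AND/OR/NOT with a third value 'u' (unstable) lets one detect hazards by symbolic evaluation, as in the classic multiplexer glitch below. This is the textbook three-valued semantics, not the thesis's full theory.

    # Kleene-style three-valued gates: 0, 1, or 'u' (unstable/unknown).
    def t_and(a, b):
        if a == 0 or b == 0:
            return 0                      # a stable 0 forces the output
        if a == 1 and b == 1:
            return 1
        return "u"

    def t_not(a):
        return {0: 1, 1: 0, "u": "u"}[a]

    def t_or(a, b):
        return t_not(t_and(t_not(a), t_not(b)))

    # Classic hazard: mux out = (s AND a) OR (NOT s AND b) with a = b = 1.
    # While the select signal s toggles (modeled as 'u'), the output may glitch
    # even though both data inputs are a stable 1:
    s, a, b = "u", 1, 1
    print(t_or(t_and(s, a), t_and(t_not(s), b)))   # -> 'u' (hazard)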

Said Jawad SAIDI
Characterizing the IoT Ecosystem at Scale
(Advisor: Prof. Anja Feldmann)
Friday, 24.02.23, 15:00 h, building E1 4, Rm 0.24

Internet of Things (IoT) devices are extremely popular with home, business, and industrial users. To provide their services, they typically rely on a backend server infrastructure on the Internet, which collectively forms the IoT ecosystem. This ecosystem is rapidly growing and offers users an increasing number of services. It has also been a source and target of significant security and privacy risks. Notable examples are the recent large-scale coordinated global attacks, such as Mirai, which disrupted large service providers. Thus, characterizing this ecosystem yields insights that help end-users, network operators, policymakers, and researchers better understand it, obtain a detailed view, and keep track of its evolution. In addition, they can use these insights to inform their decision-making process for mitigating this ecosystem's security and privacy risks. In this dissertation, we characterize the IoT ecosystem at scale by (i) detecting IoT devices in the wild, (ii) conducting a case study to measure how deployed IoT devices can affect users' privacy, and (iii) detecting and measuring the IoT backend infrastructure.
To conduct our studies, we collaborated with a large European Internet Service Provider (ISP) and a major European Internet eXchange Point (IXP). They routinely collect large volumes of passive, sampled data, e.g., NetFlow and IPFIX, for their operational purposes. These data sources help providers obtain insights about their networks, and we used them to characterize the IoT ecosystem at scale.
We start with IoT devices and study how to track and trace their activity in the wild. We developed and evaluated a scalable methodology to accurately detect and monitor IoT devices with limited, sparsely sampled data in the ISP and IXP.
Next, we conduct a case study to measure how a myriad of deployed devices can affect the privacy of ISP subscribers. Unfortunately, we found that the privacy of a substantial fraction of IPv6 end-users is at risk. We noticed that a single device at home that encodes its MAC address into the IPv6 address could be utilized as a tracking identifier for the entire end-user prefix—even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage.
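The leakage stems from the legacy EUI-64 scheme, which embeds the device's MAC address into the IPv6 interface identifier (inserting the bytes ff:fe in the middle and flipping one bit). A minimal sketch of the reverse mapping a tracker could apply; the example address is a documentation-range placeholder.

    import ipaddress

    def mac_from_eui64(addr: str):
        """Recover the MAC address from an EUI-64-derived IPv6 interface
        identifier (the leakage pattern described above), or None."""
        iid = ipaddress.IPv6Address(addr).packed[8:]       # lower 64 bits
        if iid[3:5] != b"\xff\xfe":
            return None                                    # not EUI-64 derived
        mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:]  # flip the U/L bit
        return ":".join(f"{b:02x}" for b in mac)

    print(mac_from_eui64("2001:db8::0211:22ff:fe33:4455"))  # -> 00:11:22:33:44:55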
Finally, we focus on the backend server infrastructure and propose a methodology to identify and locate IoT backend servers operated by cloud services and IoT vendors. We analyzed their IoT traffic patterns as observed in the ISP. Our analysis sheds light on their diverse operational and deployment strategies.
The need for issuing a priori unknown network-wide queries against large volumes of network flow capture data, which we used in our studies, motivated us to develop Flowyager. It is a system built on top of existing traffic capture utilities, and it relies on flow summarization techniques to reduce (i) the storage and transfer cost of flow captures and (ii) query response time. We deployed a prototype of Flowyager at both the IXP and ISP.

January

Yaoyao LIU
Learning from Imperfect Data: Incremental Learning and Few-shot Learning
(Advisor: Prof. Bernt Schiele)
Friday, 27.01.23, 16:30 h, building E1 4, Rm 0.24

In recent years, artificial intelligence (AI) has achieved great success in many fields. Although impressive advances have been made, AI algorithms still suffer from an important limitation: they rely on static and large-scale datasets. In contrast, human beings naturally possess the ability to learn novel knowledge from imperfect real-world data such as a small number of samples or a non-static continual data stream. Attaining such an ability is particularly appealing and will push AI models one step further toward human-level intelligence. In this talk, I will present my work on addressing these challenges in the context of class-incremental learning and few-shot learning. Specifically, I will first discuss how to obtain better exemplars for class-incremental learning based on optimization: I parameterize exemplars and optimize them in an end-to-end manner to obtain high-quality, memory-efficient exemplars. I will then present my work on applying incremental techniques to a more challenging and realistic scenario, object detection, and provide the design of a transformer-based incremental object detection framework. I will briefly mention my work on addressing other challenges and discuss future research directions.

Dominik KIRST
Mechanised Metamathematics: An Investigation of First-Order Logic and Set Theory in Constructive Type Theory
(Advisor: Prof. Gert Smolka)
Friday, 27.01.23, 15:15 h, building E1 1, Rm 4.07

In this thesis, we investigate several key results in the canon of metamathematics, applying the contemporary perspective of formalisation in constructive type theory and mechanisation in the Coq proof assistant. Concretely, we consider the central completeness, undecidability, and incompleteness theorems of first-order logic as well as properties of the axiom of choice and the continuum hypothesis in axiomatic set theory. Due to their fundamental role in the foundations of mathematics and their technical intricacies, these results have a long tradition in the codification as standard literature and, in more recent investigations, increasingly serve as a benchmark for computer mechanisation.
With the present thesis, we continue this tradition by uniformly analysing the aforementioned cornerstones of metamathematics in the formal framework of constructive type theory. This programme offers novel insights into the constructive content of completeness, a synthetic approach to undecidability and incompleteness that largely eliminates the notorious tedium obscuring the essence of their proofs, as well as natural representations of set theory in the form of a second-order axiomatisation and of a fully type-theoretic account. The mechanisation concerning first-order logic is organised as a comprehensive Coq library open to usage and contribution by external users.

Tim KEHL
Following the trail of cellular signatures: Computational methods for the analysis of molecular high-throughput profiles
(Advisor: Prof. Hans-Peter Lenhof)
Friday, 13.01.23, 11:00 h, building E2 1, Rm 007

Over the last three decades, high-throughput techniques, such as next-generation sequencing, microarrays, or mass spectrometry, have revolutionized biomedical research by enabling scientists to generate detailed molecular profiles of biological samples on a large scale. These profiles are usually complex, high-dimensional, and often prone to technical noise, which makes manual inspection practically impossible. Hence, powerful computational methods are required that enable the analysis and exploration of these data sets and thereby help researchers to gain novel insights into the underlying biology.
In this thesis, we present a comprehensive collection of algorithms, tools, and databases for the integrative analysis of molecular high-throughput profiles. We developed these tools with two primary goals in mind: the detection of deregulated biological processes in complex diseases, like cancer, and the identification of the driving factors within those processes.
Our first contribution in this context is a set of major extensions of the GeneTrail web service that make it one of the most comprehensive toolboxes for the analysis of deregulated biological processes and signaling pathways. GeneTrail offers a collection of powerful enrichment and network analysis algorithms that can be used to examine genomic, epigenomic, transcriptomic, miRNomic, and proteomic data sets. In addition to approaches for the analysis of individual -omics types, our framework also provides functionality for the integrative analysis of multi-omics data sets, the investigation of time-resolved expression profiles, and the exploration of single-cell experiments. Besides the analysis of deregulated biological processes, we also focus on the identification of driving factors within those processes, in particular, miRNAs and transcriptional regulators.
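To give a flavor of the simplest analysis in such a toolbox, an over-representation test asks whether deregulated genes hit a pathway more often than chance would predict, via the hypergeometric tail. The counts below are made up for illustration and are not results from the thesis.

    from scipy.stats import hypergeom

    # N measured genes, K pathway members, n deregulated genes, k of which
    # fall into the pathway.
    N, K, n, k = 20000, 150, 400, 12
    p_value = hypergeom.sf(k - 1, N, K, n)   # P(X >= k)
    print(f"p = {p_value:.3g}")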
For miRNAs, we created the miRNA pathway dictionary database miRPathDB, which compiles links between miRNAs, target genes, and target pathways. Furthermore, it provides a variety of tools that help to study associations between them. For the analysis of transcriptional regulators, we developed REGGAE, a novel algorithm for the identification of key regulators that have a significant impact on deregulated genes, e.g., genes that show large expression differences in a comparison between disease and control samples. To analyze the influence of transcriptional regulators on deregulated biological processes we also created the RegulatorTrail web service. In addition to REGGAE, this tool suite compiles a range of powerful algorithms that can be used to identify key regulators in transcriptomic, proteomic, and epigenomic data sets.
Moreover, we evaluate the capabilities of our tool suite through several case studies that highlight the versatility and potential of our framework. In particular, we used our tools to conduct a detailed analysis of a Wilms' tumor data set. Here, we could identify a circuitry of regulatory mechanisms, including new potential biomarkers, that might contribute to the blastemal subtype's increased malignancy and could potentially lead to new therapeutic strategies for Wilms' tumors.
In summary, we present and evaluate a comprehensive framework of powerful algorithms, tools, and databases to analyze molecular high-throughput profiles. The provided methods are of broad interest to the scientific community and can help to elucidate complex pathogenic mechanisms.