PhD Thesis Defenses 2024

PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work being done by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.

Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.

 

December

Jordan ETHAN
Provable Security of Symmetric-Key Cryptographic Schemes in Classical and Quantum Frameworks
(Advisor: Prof. Antoine Joux)
Thursday, 19.12.24 14:30 h , building E9 1, room 0.02

In symmetric-key cryptography, most well-known algorithms are built as modes of operation on fixed $n$-bit primitives. These primitives are typically based on pseudorandom functions (PRFs) or pseudorandom permutations (PRPs), which are efficiently computable keyed functions or permutations indistinguishable from random ones. In particular, $2n$-bit-to-$n$-bit PRFs are key for constructing secure classical schemes, such as deterministic MACs and authenticated encryption with the SIV construction. While these results do not directly apply to the quantum setting, constructing a quantum-secure compressing PRF from smaller PRFs is a crucial step in developing sophisticated quantum cryptographic algorithms.
In this talk, we study a class of $2n$-bit-to-$n$-bit compression functions, featuring a non-linear "call" between every two linear layers. We analyze the security of two- and three-call constructions, assessing their vulnerabilities to both classical and quantum attacks. Using a new framework for proving indistinguishability in the quantum setting, we show the quantum PRF (qPRF) security of two three-call constructions, TNT and LRWQ, introduced by Hosoyamada and Iwata. We also identify a flaw in the security proof of the Luby-Rackoff construction, proposed by Hosoyamada and Iwata, and show how it can be fixed by focusing on non-adaptive adversaries.
Additionally, this dissertation includes results in the classical setting, such as the design of efficient tweakable block ciphers with public permutations achieving beyond-birthday-bound security, and the analysis of authenticated encryption schemes through vulnerabilities in protocols like Telegram’s MTProto, as well as leakage resilience and context commitment in single-pass schemes like Triplex.

Sahar ABDELNABI
On the Opportunities and Risks of Machine Learning to Online and Societal Safety
(Advisor: Prof. Mario Fritz)
Thursday, 19.12.24 10:00 h , building E9 1, room 0.01

Machine Learning (ML), with its continuous and significant advances, has great potential to accelerate decision-making, alleviate some of our societal problems, and reshape and facilitate our daily lives. However, ML has inherent security vulnerabilities and limitations and can itself be exploited and misused to exacerbate such societal problems, which calls for a thorough evaluation of capabilities, attacks, and countermeasures.
In this thesis, we evaluate the interplay between ML, security, and online and societal safety aspects, such as misinformation and risks imposed by the use of Large Language Models (LLMs). To counter risks imposed by LLMs and generative models and help identify the context and provenance of information, we propose watermarking as an active defence against deep-fakes and model abuse. To exemplify ML opportunities to promote online safety, we leverage ML to automate multi-modal fact-checking and identify the underlying context of images that might be used out of context. On the other hand, to evaluate the risk of how ML can exacerbate misinformation and cause information contamination and poisoning, we comprehensively study attacks against fact-checking models and possible ones against real-world deployed LLM-integrated search engines. Besides that, we broadly discuss LLM-integrated applications and their potential security risks induced by the indirect prompt injection vulnerability that we uncover. Finally, to proactively evaluate LLMs in interactive setups that better match real-world use cases, such as customer service chatbots, we propose a new benchmark of complex text-based negotiation games to examine LLMs’ performance and reasoning in multi-agent setups, including adversarial ones that assume attacks between agents.

Immanuel HAFFNER
Optimization and Processing of Relational Database Queries
(Advisor: Prof. Jens Dittrich)
Wednesday, 18.12.24 14:00 h , building E1 1, room 407

The purpose of this Ph.D. thesis is to advance the state of the art of query processing in relational database systems. As a first contribution, I present a reduction of the classical, NP-hard join order optimization problem to a shortest path problem. Consequently, I develop a heuristic search algorithm for solving this reduced problem. I provide a strong theoretical foundation for the reduction and the search. A thorough evaluation shows improvements in optimization time of orders of magnitude. My second contribution is a simplified design for query execution by just-in-time compilation to machine code. Architecturally simple solutions such as query compilation with LLVM suffer from unacceptably high compilation times. Modern just-in-time query compilers with significantly reduced compilation times, on the other hand, are extravagantly hand-crafted. Rather than reinventing compiler technology in a DBMS, I propose to embed an off-the-shelf just-in-time compiling engine that is designed, built, and tested by compiler experts. I am able to achieve the lowest compilation times and competitive execution performance in all experiments. As my third contribution, I design a relational DBMS as a software system that is composed of individual components, each implementing an isolated logical task. For example, join ordering is one such component of a composable DBMS. I describe the design principles that guide my and my colleagues’ efforts towards implementing such a DBMS. Most importantly, my thesis is accompanied by the release of our open-source research DBMS mutable, which implements this design of composition.

Björn MATHIS
Strategic Parser Fuzzing: Leveraging Domain Knowledge For Input Generation
(Advisor: Prof. Andreas Zeller)
Monday, 16.12.24 14:30 h , building E9 1, room 0.01

Testing software is one of the most important parts of the development process. Without tests, programs would often crash or contain security vulnerabilities. Since testing is time-consuming and complex, techniques were developed to simplify this process. For instance, fuzzers create mostly random inputs and test how a program reacts to them. Software that uses parsers for input processing is of particular interest to us, but it is also hard to test automatically with general-purpose techniques, as they cannot generate the syntactically correct inputs needed to exercise the program logic. Thus, we present a new approach specifically for analyzing software with recursive descent parsers. A central feature of such parsers is the iterative comparison of parts of the input against terminals of a respective context-free grammar. We show how these comparisons, recorded during a program execution, can be combined with other parser-specific features to infer syntactically valid and diverse inputs. In our evaluation we combine our techniques with the fuzzer AFL. Our generated inputs contain on average 77.7% of all possible lexemes with more than three characters. We obtain on average 2.9 and in the best case up to 17 percentage points higher branch coverage compared to running AFL alone.

Max EISELE
Debugger-driven Embedded Fuzzing
(Advisor: Prof. Andreas Zeller)
Thursday, 12.12.24 13:00 h , building E1 1, room 407

Embedded Systems – hidden computers – are all around. Programmed by humans or, eventually, artificial intelligences, embedded systems run software to enrich, entertain, and evaluate our lives on all imaginable grounds. Safety-critical systems, such as vehicles or industrial production plants, feature numerous bespoke embedded computers whose software monitors, measures, and manipulates things in their environment. Since programmers and artificial intelligences make mistakes, software contains errors. Software testing is therefore indispensable for finding programming flaws, ideally before deployment.
Automated software testing methods can automatically generate test data for programs to assist test engineers with the task of crafting test cases. However, available techniques are primarily established for testing applications on personal computers and servers. Deploying automated software testing techniques on embedded systems is subject to additional challenges, mainly arising from the low number of shared commonalities in terms of interfaces, peripherals, as well as hardware and software architectures. This thesis examines obstacles to fuzzing embedded systems and defects of state-of-the-art approaches. Furthermore, it introduces two new methods for fuzz testing embedded systems, overcoming the distilled defects. Also, both methods leverage only generic and widespread features for analyzing embedded programs during runtime and thus are applicable on a variety of devices in practice.

Rahul Mohideen Kaja MOHIDEEN
Efficient Methods for Inpainting-Based Image Compression
(Advisor: Prof. Joachim Weickert)
Monday, 09.12.24 16:15 h , building E1 7, room 0.02

This thesis is dedicated to image compression methods that optimise and store a small subset of the image pixels, called the mask, and reconstruct the rest of the image through inpainting. However, saving fully optimised pixel positions is expensive and has, until now, received little attention. We propose two new families of codecs specifically addressing this problem that offer better performance or a better trade-off between speed and performance. Inpainting-based image compression methods have long employed partial differential equations for their inpainting method. However, naive implementations are very slow and require significant effort to speed up. In this thesis, we explore Shepard inpainting as a simple and efficient alternative. By considering data selection and extensions such as anisotropy, we propose our own Shepard-inpainting-based compression methods that offer a good mix of simplicity, efficiency, and quality. Multi-channel images are more common than single-channel images today, yet very few inpainting-based codecs have dedicated colour modes. We propose colour extensions for Shepard inpainting, including a luma-preference mode and a previously unexplored vector quantisation mode. Overall, this thesis presents efficient methods to compress fully optimised masks and explores how far one can go with a simple but efficient operator such as Shepard inpainting.
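As a rough illustration of the operator at the heart of these codecs (a minimal sketch under simplifying assumptions, not code from the thesis): basic Shepard inpainting reconstructs each unknown pixel as an inverse-distance-weighted average of the stored mask pixels.

```python
import numpy as np

def shepard_inpaint(coords, values, shape, p=2, eps=1e-12):
    """Reconstruct an image from sparse known pixels by inverse
    distance weighting (basic Shepard interpolation)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    pts = np.asarray(coords, dtype=float)    # (k, 2) stored mask positions
    vals = np.asarray(values, dtype=float)   # (k,)  stored grey values
    # squared distance from every pixel to every mask point
    d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    w = 1.0 / (d2 ** (p / 2) + eps)          # inverse-distance weights
    return ((w @ vals) / w.sum(axis=1)).reshape(shape)
```

At a mask position the weight dominates, so stored pixels are reproduced (up to `eps`); all other pixels blend the mask values smoothly.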

Anupama CHINGACHAM
Exploring Paraphrasing for Enhancing Speech Perception in Noisy Environments
(Advisor: Prof. Dietrich Klakow)
Monday, 09.12.24 14:00 h , building C7 4, room 1.17

This thesis addresses the challenge of speech perception in noisy environments, where echoes, reverberations and background noise can distort communication. It proposes using paraphrases, instead of acoustic modifications, to improve speech intelligibility in noise without causing signal distortions. The first study investigates the effectiveness of replacing words with synonyms in noisy conditions, finding that it can enhance word recognition by up to 37% in high-noise environments. The second study expands on this by exploring sentential paraphrases, showing that choosing the right paraphrase can improve intelligibility by 33%. It also develops a paraphrase ranking model that outperforms baseline models in identifying the most intelligible paraphrases. The final study examines how Large Language Models (LLMs) can generate both semantically equivalent and acoustically intelligible paraphrases. It reveals that while LLMs struggle to improve acoustic intelligibility in standard setups, a post-processing approach called "prompt-and-select" yields better results. Overall, the thesis contributes two new datasets and a novel framework for generating noise-robust speech, offering a promising direction for developing spoken dialogue systems that adapt to noisy environments.

Sanjay Kumar SRIKAKULAM
Simulations and data structures to study drug resistance
(Advisor: Prof. Olga Kalinina)
Tuesday, 03.12.24 13:30 h , building E2 1, room 0.01

Drug resistance critically hinders treatment success across diseases, often causing relapse and failure. It poses a significant therapeutic barrier, primarily driven by genetic mutations that alter the structure and function of key proteins. Addressing this challenge requires a comprehensive approach that integrates advanced computational techniques with molecular insights for effective intervention. In this thesis, we address the challenge of drug resistance by exploring the impact of mutations through modern computational techniques. We first use molecular dynamics simulations to study how specific mutations influence protein dynamics, structure, and function, particularly relevant in diseases like cancer and viral infections. Studies on the receptor tyrosine kinase KIT and the NS3 protease in the hepatitis C virus illustrate mutation-induced stabilization that reduces drug efficacy. To complement these molecular insights, the second project introduces MetaProFi, a novel, state-of-the-art tool for large-scale genomic analysis. It uniquely indexes nucleotide and amino acid sequences, with the added capability of querying amino acid indexes using nucleotide sequences, which is crucial for accurately identifying functionally relevant genetic variants, as protein sequences are more conserved across evolutionary distances. MetaProFi incorporates advanced optimizations, including data chunking, shared memory systems, and compression algorithms, providing a scalable and efficient solution for handling extensive genomic datasets. This thesis provides tools to advance targeted, personalized therapeutic strategies against drug resistance by integrating molecular dynamics with large-scale genomic analysis.

November

Zeyang SHA
Towards Comprehensive Security Assessment in Machine Learning Pipelines
(Advisor: Dr. Yang Zhang)
Tuesday, 26.11.24 13:00 h , building E9 1, room 0.07

Machine learning (ML) has become essential in various critical applications. As ML models are increasingly deployed, they also face a rising number of attacks targeting different stages of the ML pipeline. This pipeline can broadly be divided into three stages: the data used for training, the model parameters, and the outputs of the trained model.
This thesis presents a comprehensive security assessment of machine learning pipelines. We begin by addressing data security, focusing particularly on backdoor attacks through data poisoning. We demonstrate that such attacks can be effectively mitigated with simple fine-tuning methods. Next, we explore model security by investigating model stealing attacks. We introduce novel model stealing techniques targeting contrastive learning models and develop adaptive defenses to counteract these threats. Lastly, we turn our attention to the outputs of the model. Here, we delve into detecting and attributing fake images, proposing innovative detection methods that utilize prompts to enhance performance. These multifaceted approaches allow us to tackle machine learning security comprehensively, spanning all the ML pipeline’s critical stages.

David KALTENPOTH
Don’t Confound Yourself: Causality from Biased Data
(Advisor: Prof. Jilles Vreeken)
Monday, 25.11.24 14:00 h , building E1 4, room 0.24

Machine Learning has achieved remarkable success in predictive tasks across diverse domains, from autonomous cars to LLMs. Yet this predictive prowess masks a fundamental limitation: ML systems excel at capturing statistical associations in observational data but fail to uncover the underlying causal mechanisms that generate these patterns. While machine learning models may accurately predict patient outcomes or identify tumors in medical imaging, they cannot answer crucial counterfactual questions regarding how these systems would respond to novel actions or policy changes.
A fundamental obstacle to understanding causation is the pervasive influence of unmeasured confounding and selection bias in observational data. Unmeasured confounding occurs when hidden variables influence both our observed predictors and outcomes, creating spurious correlations that ML models eagerly learn but that do not represent genuine causal relationships. Selection bias further compounds this problem by systematically distorting our sample in ways that may make generalization to a broader class of instances impossible.
These challenges cannot be overcome by simply collecting more data or building more sophisticated predictive models. They require a formal framework for reasoning about the conditions under which we can expect our models to recover the underlying causal graph. In this thesis we provide one such framework, allowing us to derive conditions under which accurate causal networks and effects may be discovered, and to deal with partially observed systems under novel conditions.

Marcel MALTRY
From Component to System: Evaluating Recursive Model Indexes and Index Scan Execution Strategies
(Advisor: Prof. Jens Dittrich)
Wednesday, 20.11.24 14:00 h , building E1 1, room 4.07

Indexes are essential for efficient data retrieval and peak query performance in database systems. Traditional indexes are general-purpose data structures, agnostic to data distributions. In contrast, learned indexes, like the recursive model index (RMI), leverage machine-learned models to exploit data-specific patterns, achieving superior lookup performance and space efficiency.
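To illustrate the idea behind learned indexes (a minimal, hypothetical one-model sketch, not the RMI variants evaluated in this thesis): a model learns the mapping from keys to positions in the sorted array, and lookups correct the model's prediction with a search bounded by the model's maximum error.

```python
import bisect
import numpy as np

def fit_learned_index(keys):
    """Fit a single linear model from key to position in the sorted array
    and record its maximum prediction error (a one-model 'RMI')."""
    keys = np.asarray(keys, dtype=float)
    pos = np.arange(len(keys), dtype=float)
    slope, intercept = np.polyfit(keys, pos, 1)
    pred = np.clip(np.round(slope * keys + intercept), 0, len(keys) - 1)
    err = int(np.max(np.abs(pred - pos)))  # worst-case error bound
    return slope, intercept, err

def lookup(keys, model, key):
    """Predict the position, then binary-search within the error bound."""
    slope, intercept, err = model
    guess = int(np.clip(round(slope * key + intercept), 0, len(keys) - 1))
    lo, hi = max(0, guess - err), min(len(keys), guess + err + 1)
    i = lo + bisect.bisect_left(list(keys[lo:hi]), key)
    return i if i < len(keys) and keys[i] == key else -1
```

The closer the data follow the learned model, the smaller the error bound and the cheaper the final search; this is the data-specific pattern exploitation the abstract refers to.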
The first part of this thesis focuses on analyzing the hyperparameters of RMIs, providing key insights into their influence on performance and leading to a practical guideline for their configuration. This guideline achieves competitive lookup performance with minimal tuning effort compared to previous approaches. Its effectiveness is validated through comparison against both learned and traditional state-of-the-art indexes.
The second part of the thesis moves beyond the isolated analysis of RMIs and examines their integration into a database system with a compiling query engine. We focus on the index scan operator and develop three strategies for its execution, which differ in the extent to which they precompute parts of the index scan during code generation. While we identify the optimal workload characteristics for each strategy, our experiments show that RMIs do not improve index scan performance compared to a simple baseline.

Jian WANG
Egocentric Human Motion Capture
(Advisor: Prof. Christian Theobalt)
Friday, 15.11.24 12:00 h , building E1 5, room 630

Human motion capture (mocap) technology has wide applications, especially in entertainment, sports analysis, and human-computer interaction. Among motion capture techniques, egocentric motion capture provides a unique perspective from the individual’s point of view. Being able to capture human motion in an unconstrained environment, egocentric motion capture is crucial for AR/VR applications.
This thesis focuses on the task of egocentric motion capture with a single, head-mounted, downward-facing fisheye camera. This setup can provide a broad field of view, which enables the capture of both body movements and interactions within the environment. Despite the advantages of egocentric cameras, this setup suffers from several challenges, which are discussed in this thesis. These challenges include global motion estimation, self-occlusion, fisheye lens distortion, and the lack of large-scale training datasets. This thesis addresses these challenges by introducing new datasets and technical contributions: To address the lack of large-scale training datasets, the thesis presents new datasets, including EgoPW, EgoGTA, and EgoWholeBody. These datasets cover a wide range of motions and environments, containing detailed annotations for human motion and scene geometry. By proposing new datasets, this thesis also reduces the gap between synthetic and real-world data. To capture global human motion, the thesis employs the SLAM method to obtain the global camera poses. The camera poses and the initial local human motion estimations are simultaneously optimized with the motion prior. The thesis also presents methods to overcome the issue of self-occlusion. These include leveraging temporal information, applying human motion priors, and incorporating scene geometry information. To mitigate the fisheye distortion issue, this thesis introduces FisheyeViT. It rectifies fisheye distortion with image patches and employs a Vision Transformer (ViT) network for feature extraction.
All of the methods in this thesis provide new solutions to some of the main challenges of egocentric motion capture with different technical and dataset contributions. These contributions enhance the capability to capture human motion under unconstrained scenarios, which offers new possibilities for applications in VR, AR, interactive gaming, and more.

Georges P. SCHMARTZ
Addressing Microbial Threats through Genomics, Metagenomics, and Bioinformatics
(Advisor: Prof. Andreas Keller)
Monday, 11.11.24 13:00 h , building E2 1, room 0.01

This thesis begins by highlighting the detrimental impact of multi-resistant bacterial infections through a clinical case study from war-torn Ukraine, where whole genome sequencing was used to explain infection spread. It then presents a series of projects demonstrating how metagenomic sequencing at scale can be leveraged to identify biosynthetic gene clusters with the potential to inspire new antibiotic compounds. Finally, the thesis showcases Mibianto, a web-based solution designed for efficient and user-friendly analysis of metagenomic data with a focus on microbial community composition analysis.

Maximilian Alexander KÖHL
Provably Accurate Verdictors: Foundations and Applications
(Advisor: Prof. Holger Hermanns)
Tuesday, 05.11.24 10:00 h , building E1 7, room 0.01

The thesis introduces verdictors and their provable accuracy. Provably accurate verdictors provide answers to critical operational questions, derived from observations of a system’s behavior. They can detect issues like aircraft sensor faults or incorrect configurations in manufacturing equipment, which in turn enables timely and targeted interventions. Using a model-based approach, we develop the foundational basis and the algorithmic machinery for synthesizing and implementing verdictors that are not only provably accurate but also robust in the presence of observational imperfections. The approach unifies and generalizes a thus far fragmented research area, enables entirely novel applications, and is supported by an efficient software toolbox, upon which the empirical evaluation of our work rests.

October

Uğur ÇOĞALAN
Machine Learning Solutions for High Dynamic Range Video Processing and Image Quality Evaluation
(Advisors: Dr. habil. Karol Myszkowski and Prof. Hans-Peter Seidel)
Monday, 21.10.24 15:30 h , building E1 4, room 0.19

This thesis explores the use of multi-exposure sensors to improve computational imaging tasks like denoising, motion blur removal, and high-dynamic range (HDR) image reconstruction. Unlike conventional sensors, multi-exposure sensors capture neighboring pixels with different exposures, providing reference points that simplify these tasks while preserving image detail. In video, the motion blur from longer-exposed pixels enhances optical flow computation and enables higher-quality video frame interpolation (VFI) for smoother slow-motion playback, even in HDR scenes. Additionally, the thesis proposes improving traditional and learning-based image quality metrics (PSNR, SSIM, LPIPS, DISTS) by incorporating visual masking models that better reflect human visual perception, resulting in more accurate and perceptually aligned evaluations.

Mohamad HOSEINI
Analyzing Information Propagation in Online Messaging Platforms
(Advisor: Prof. Anja Feldmann)
Thursday, 17.10.24 16:00 h , building E1 4, room 0.24

This thesis explores the spread of misinformation and conspiracy theories across online messaging platforms like Telegram, WhatsApp, and Discord. It addresses the lack of research on these platforms by analyzing public groups discovered on Twitter and examining the exposure of personal information. A detailed study of QAnon-related Telegram groups reveals the multilingual spread of conspiracies. Additionally, the research investigates message forwarding patterns and the lifespan of content within fringe communities, offering insights into misinformation dissemination.

Fabian RITTER
Inferring and Analyzing Microarchitectural Performance Models
(Advisor: Prof. Sebastian Hack)
Wednesday, 09.10.24 10:00 h , building E1 1, room 407

Modern processors are complex systems that employ a wide range of techniques to execute programs as fast and efficiently as possible. However, these hardware intricacies make reasoning about the efficiency of code for the processor difficult. Microarchitectural performance models are therefore indispensable for estimating and improving how efficiently software takes advantage of the hardware. This dissertation presents several advancements in the field of microarchitectural performance modeling.
The first part of this thesis proposes techniques to characterize how a processor exploits instruction-level parallelism. Based on a formal model, we explore ways to infer a processor’s port mapping from throughput measurements, i.e., how it splits instructions into micro-operations and how these are executed on the processor’s functional units. Our techniques enable accurate port mapping inference for processors that prior methods could not reason about.
In the second part, we introduce AnICA, a method to analyze inconsistencies between performance models. AnICA takes inspiration from differential testing and abstract interpretation to systematically characterize differences in the outputs of basic block throughput predictors. It can summarize thousands of inconsistencies in a few dozen descriptions that provide high-level insights into the differing behaviors of such predictors. These results have led to improvements in the scheduling models of the widely used LLVM compiler infrastructure.
The defense talk will mainly focus on the first part of the thesis, the inference of processor port mappings.

September

Shrestha GHOSH
Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web
(Advisor: Prof. Simon Razniewski, now TU Dresden)
Monday, 30.09.24 10:00 h , building E1 4, room 0.19

Extracting information from the Web remains a critical component in knowledge harvesting systems for building curated knowledge structures, such as Knowledge Bases (KBs), and satisfying evolving user needs, which require operations such as aggregation and reasoning. Estimating the cardinality of a set of entities on the Web to fulfill the information need of questions of the form “how many ..?” is a challenging task. While, intuitively, cardinality can be estimated by explicitly enumerating the constituent entities, this is usually not possible due to the low recall of entities on the Web. In this dissertation we present our contributions towards retrieving and estimating cardinalities of entity sets on the Web.

Alejandro CASSIS
Algorithms for Knapsacks, Paths and Strings
(Advisor: Prof. Karl Bringmann)
Tuesday, 17.09.24 18:00 h , building E1 4, room 0.24

In this thesis we study three problems:
1. Knapsack: The Knapsack problem is a classic combinatorial optimization problem. We give a collection of improved exact and approximation algorithms for Knapsack and some of its variants. Our study is guided by the connection between Knapsack and min-plus convolution, a central problem in fine-grained complexity.
2. Sublinear Edit Distance: The edit distance is a popular and practically motivated measure of similarity between texts. We focus on sublinear-time algorithms, which aim to approximate the edit distance $k$ of two texts without reading them entirely. Our main result is a $k^{o(1)}$-approximation in time $O(n/k + k^{2+o(1)})$. This constitutes a quadratic improvement over the previous state of the art, and matches an unconditional lower bound for small $k$ (up to subpolynomial factors in the running time and approximation factor).
3. Negative Weight Single-Source Shortest Paths: Computing shortest paths from a source in a weighted directed graph is a fundamental problem. When all edge weights are non-negative, the classic Dijkstra’s algorithm solves this problem in near-linear time. It has been a long-standing open question to obtain a near-linear time algorithm when the graph contains negative edge weights. This has been solved recently in a breakthrough by Bernstein, Nanongkai and Wulff-Nilsen, who presented an algorithm running in time $O(m \log^8 n \log W)$. Our contribution is an improvement by nearly six log factors.
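For readers unfamiliar with the min-plus convolution mentioned in the Knapsack item above, here is a quick sketch (illustrative, not from the thesis): it combines two sequences by minimizing the sum over all splits of each index, and its conjectured quadratic-time hardness underlies many fine-grained lower bounds for Knapsack.

```python
def min_plus_convolution(a, b):
    """c[k] = min over i + j = k of a[i] + b[j] (naive O(n*m) algorithm)."""
    n, m = len(a), len(b)
    c = [float("inf")] * (n + m - 1)
    for i in range(n):
        for j in range(m):
            c[i + j] = min(c[i + j], a[i] + b[j])
    return c
```

In a Knapsack setting, `a` and `b` can be cost profiles indexed by weight, and `c` is the combined profile; whether the naive quadratic time can be beaten substantially is the open fine-grained question.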

Hoang-Hai DANG
Scaling Up Relaxed Memory Verification with Separation Logics
(Advisor: Prof. Derek Dreyer)
Tuesday, 10.09.24 15:00 h , building E1 5, room 0.05

Reasoning about concurrency in a realistic, non-toy language like C/C++ or Rust, which encompasses many interweaving complex features, is very hard. Yet, realistic concurrency involves relaxed memory models, which are significantly harder to reason about than the simple, traditional concurrency model that is sequential consistency. To scale up verifications to realistic concurrency, we need a few ingredients:
(1) strong but abstract reasoning principles so that we can avoid the too tedious details of the underlying concurrency model;
(2) modular reasoning so that we can compose smaller verification results into larger ones;
(3) reasoning extensibility so that we can derive new reasoning principles for both complex language features and algorithms without rebuilding our logic from scratch; and
(4) machine-checked proofs so that we do not miss potential unsoundness in our verifications. With these ingredients in hand, a logic designer can flexibly accommodate the intricacy of relaxed memory features and the ingenuity of programmers who exploit those features.
In this dissertation, I present how to develop strong, abstract, modular, extensible, and machine-checked separation logics for realistic relaxed memory concurrency in the Iris framework, using multiple layers of abstractions.
I report two main applications of such logics:
(i) the verification of the Rust type system with a relaxed memory model, where relaxed memory effects are encapsulated behind the safe interface of libraries and thus are not visible to clients, and
(ii) the compositional specification and verification of relaxed memory libraries, in which relaxed memory effects are exposed to clients.

Ilkan ESIYOK
Three modest proposals for building trust in social media discourse, in the software that powers it and in the browsers that run the software
(Advisor: Prof. Michael Backes)
Tuesday, 03.09.24 14:00 h , building E9 1, room 0.01

The web has become the primary platform for information exchange, yet the increasing complexity of web applications and interactions has introduced significant trust challenges. This thesis addresses these challenges with three modest proposals aimed at enhancing trust in web-based interactions. The first proposal, TrollThrottle, strengthens the authenticity of social media discourse by mitigating the impact of trolls using cryptographic techniques. The second, Accountable JavaScript, ensures the integrity and accountability of web application software by verifying that users receive unaltered code. The third proposal introduces a formal framework for model-based browser testing, enabling the detection of vulnerabilities in browser components and their interactions. Collectively, these proposals provide an end-to-end approach for improving trust on the web, offering practical solutions for mitigating security risks and ensuring reliable web experiences.

August

Soheil KHODAYARI
Security Testing at Scale: Studying Emerging Client-side Vulnerabilities in the Modern Web
(Advisor: Dr. Giancarlo Pellegrino)
Wednesday, 28.08.24 14:00 h , building E9 1, room 0.05

The recent rapid evolution of client-side technologies has introduced new variants of traditional security issues that now manifest exclusively in client-side JavaScript programs. We have little-to-no knowledge of these new emerging threats, and exploratory security evaluations of JavaScript-based web applications are impeded by the scarcity of reliable and scalable testing techniques. In this thesis, we address these challenges by presenting JAW, an open-source, static-dynamic framework to study client-side vulnerabilities at scale, focusing particularly on client-side request hijacking and DOM Clobbering vulnerabilities, for which we investigate the patterns, prevalence, and impact in the wild. We instantiate JAW on over half a million pages of the top 10K sites, processing over 56B lines of code in total, and show that these new variants are ubiquitous on the Web. We demonstrate the impact of these vulnerabilities by constructing proof-of-concept exploits that mount arbitrary code execution, information leakage, open redirections, and CSRF, even against popular websites that were not reachable through traditional attack vectors. Finally, we review and evaluate the adoption and efficacy of existing countermeasures against these attacks, including input validation and browser-based solutions like SameSite cookies and Content Security Policy.

Fabian F. SCHWARZ
TEE-based Designs for Network Gateways, Web Authentication, and VM Introspection
(Advisor: Prof. Christian Rossow)
Wednesday, 28.08.24 11:00 h , building E9 1, room 0.01

The complexity of client, server, and network devices has drastically increased — and so has the number of sophisticated attacks against them, including system-level exploits undermining their software defenses. As a potential solution, architectural extensions for trusted execution environments (TEEs) have been designed that enable a strong hardware-based isolation of critical software components from a compromised system. However, the benefits of TEEs have still not been considered for many security-critical network and web authentication services.
Therefore, this dissertation explores how such services can benefit from TEEs. In particular, we propose three TEE-based designs for network firewalls and web authentication. First, we show how client-side TEEs can enable gateway firewalls to enforce trusted per-application network policies. Second, we extend TEE protection to the gateway itself by designing a TEE-based router architecture with isolated network paths and policy enforcement. Third, we combine electronic IDs with TEE-protected cloud services to solve the recovery and cost issues of FIDO2 web authentication. Finally, we design secure remote forensics of services protected by TEE virtual machines while preserving their strong hardware protection.

Ahmed ABBAS
Efficient and differentiable combinatorial optimization for visual computing
(Advisor: Prof. Paul Swoboda, now Uni Düsseldorf)
Friday, 02.08.24 15:00 h , building E1 4, room 0.24

Many visual computing tasks involve reasoning over structured domains and discrete objects that can be modeled as combinatorial optimization (CO) problems. Tremendous speed-ups in general-purpose CO solvers have made it possible to tackle these problems in many cases despite their NP-hardness. Approaching large-scale structured prediction problems from a combinatorial optimization standpoint, however, has been challenging for a variety of reasons. These include near real-time performance requirements and the lack of differentiability of CO approaches. The latter makes it difficult to harness machine learning for problem specification and to develop learnable CO algorithms. We aim to address these shortcomings along multiple avenues.
(i) We focus on a specific CO problem for clustering known as the multicut problem with many applications in a variety of visual computing tasks. We aim towards reducing human effort in multicut model specification by utilizing neural networks and address scalability issues by devising parallel algorithms.
(ii) Next, we shift our focus to general-purpose CO solvers and tackle challenges related to their scalability. As a first step, we devise parallel algorithms that can harness GPUs, gaining up to an order of magnitude speed-up over sequential CPU counterparts. As a second step, we exploit machine learning in solving relaxations of CO problems. Given a few problem instances from a task of interest, our general-purpose solver can be adapted by learning instead of task-specific solver development. Empirically, our learned approach achieves significantly better performance than its non-learned version, better solutions than task-specific solvers, and better anytime performance than a commercial solver (Gurobi). In summary, we study methods to make CO for visual computing more practical by devising differentiable, massively parallel, and data-driven methods.

Anna KUKLEVA
Advancing Image and Video Recognition with Less Supervision
(Advisor: Prof. Bernt Schiele)
Thursday, 01.08.24 14:00 h , building E1 4, room 0.24

Deep learning has become an essential component of modern life, transforming various tasks across multiple domains such as entertainment, education, and autonomous driving. However, the increasing demand for data to train models for emerging tasks poses significant challenges. Deep learning models heavily rely on high-quality labeled datasets, yet obtaining comprehensive supervision is resource-intensive and can introduce biases. Therefore, we explore strategies to mitigate the need for full supervision and reduce data acquisition costs. The first part of the discussion focuses on self-supervised and unsupervised learning methods, which enable learning without explicit labels by leveraging inherent data structures and injecting prior knowledge for robust data representations. The second part of the presentation discusses strategies such as minimizing precise annotations in multimodal learning, allowing for effective utilization of correlated information across different modalities. Moreover, we discuss open-world scenarios, proposing a novel setup and method to adapt vision-language models to new domains. Overall, this research contributes to understanding learning dynamics and biases present in data, advancing training methods that require less supervision.

July

Kerstin LENHOF
Machine learning-based anti-cancer drug treatment optimization
(Advisor: Prof. Hans-Peter Lenhof)
Monday, 22.07.24 14:30 h , building E2 1, room 001

Machine learning (ML) systems are about to expand into every area of our lives, including human healthcare. Thus, ensuring their trustworthiness represents one of today’s most pressing scientific and societal issues.
In this thesis, we present novel ML-based decision support tools for one of the most complex, prevalent, and mortal diseases of our time: cancer. In particular, we focus on developing trustworthy ML methods for predicting anti-cancer drug responses from personalized multi-omics data. Our methods encompass strategies to minimize the effect of data-related issues such as class or regression imbalance, to achieve the interpretability of the models, and to increase the reliability of the models.
Our first approach, MERIDA, is dedicated to interpretability: it delivers Boolean rules as output and considers a priori pharmacogenomic knowledge to a previously unconsidered extent. With SAURON-RF, we devised a simultaneous classification and regression method that improved the statistical performance for the under-represented yet essential group of drug-sensitive samples, whose performance has mainly been neglected in the scientific literature. Its successor, reliable SAURON-RF, provides a conformal prediction framework, which, for the first time, ensures the reliability of classification and regression with certainty guarantees. Moreover, we propose a novel drug sensitivity measure that addresses the shortcomings of the commonly used measures.

Matthias FASSL
Averting Security Theater: Methods to Investigate and Integrate Secure Experience in a User-Centered Security Design Process
(Advisor: Dr. Katharina Krombholz)
Monday, 15.07.24 14:00 h , building E9 1, room 0.01

End users' interaction with computer security mechanisms can make them feel protected — resulting in a so-called secure experience. However, these secure experiences can be deceiving. First, the lack of secure experience reduces end users' ability to make informed decisions about their security practices. Second, unjustified secure experiences — that make end users feel protected while they are not — are dangerous. Such a security theater may result in users forgoing effective security practices or engaging in riskier behavior.
This thesis tackles both types of deceptive secure experiences by adapting existing methods from Human-Computer Interaction to Security and proposing new design methods. The presentation will give an overview of the thesis work before presenting a case study on secure experience and an adapted design method.

Boris WEIGAND
Understanding, Predicting, Optimizing Business Processes Using Data
(Advisors: Prof. Dietrich Klakow and Prof. Jilles Vreeken)
Thursday, 11.07.24 14:15 h , building C9 3, conference room

Companies face a disruptive digital transformation, which forces them to adapt their business models and innovate faster than ever before. Companies that do not transform rapidly enough risk falling behind and failing in competition. On the other hand, changing existing business processes with complex behavior is highly risky. Analyzing process event logs promises to facilitate understanding, predicting, and optimizing processes, and thus supports a successful transformation. As wrong transformation decisions pose an existential threat, understandable models in each of these steps are non-negotiable.
In this thesis, we propose novel approaches to discover inherently interpretable models from process event data. First, we explore how to summarize the actual behavior of complex processes in terms of control-flow and how event data changes throughout a process. Second, we study accurate yet interpretable event sequence prediction and learning queuing behavior. Third, we alleviate the effort of modelling optimization and AI planning problems by learning constraints from exemplary solutions.

June

Pablo Gil PEREIRA
Predictable Data Transport: A Delay and Energy Perspective
(Advisor: Prof. Thorsten Herfet)
Monday, 24.06.24 14:00 h , building C6 3, room 9.05

Cyber-physical systems extend the digital revolution to almost every aspect of our lives by bridging the gap between the digital and physical worlds. These systems demand unprecedented timeliness and reliability guarantees that the current operating and network systems do not provide. Transport layer protocols are the direct communication interface for the application layer and, hence, are key to providing end-to-end guarantees to the application. This thesis addresses how transport layer protocols should be designed to support cyber-physical systems. A clear candidate is the Predictably Reliable Real-time Transport (PRRT) protocol, which provides the application with a predictably reliable service within the specified time budget. This thesis makes original contributions to PRRT’s error control function, which decides when and how much redundancy must be transmitted to meet the reliability and delay requirements of the application. The main contributions of this thesis are threefold: i) the SHARQ algorithm, which obtains the optimal error control configuration meeting the application constraints, and has been optimized to achieve predictably quick reactions to channel changes, ii) the DeepSHARQ algorithm, which leverages neural networks and a novel output regularization method to bring this predictability to resource-constrained devices, and iii) a systematic analysis of binary codes as an energy-efficient alternative for error coding at the transport layer, questioning the long-held belief that Vandermonde codes are a more suitable alternative due to their better error correction capabilities.

Thomas BOCK
Emerging Organizational Patterns in Evolving Open-Source Software Projects: An Analysis of Developer Activity and Coordination
(Advisor: Prof. Sven Apel)
Tuesday, 04.06.24 14:00 h , building E1 1, room 206

Many popular and widely-used software projects are organized and developed as open-source software (OSS) projects, which are able to attract a high number of contributors who develop source code or participate in public communication channels. When multiple developers contribute to the source code of a project simultaneously, proper coordination among them is necessary to avoid unexpected interactions and to reduce the risk of introducing bugs, which are often caused by a lack of coordination or by problems in the organizational structure of the project. Consequently, to improve developer coordination in OSS projects, it is essential to understand the organizational structure of these projects, which sometimes are seen as self-organizing communities. For that reason, we analyze the organizational structure of OSS projects and how it evolves over time, by means of five different empirical studies on widely-used and well-known OSS projects. With our studies, we address three different aspects: the evolution of developer collaboration and communication, the identification of developer roles, and the relation between organizational events and developer-network characteristics. We devise multiple methods to comprehensively analyze developers' programming and communication activities in OSS projects and demonstrate the applicability of the proposed methods. In summary, we obtain insights into the organizational structure of OSS projects that shall serve as a foundation for future improvement of coordination processes.

Daniel WAGNER
Improving Reactive Capabilities of Internet Peering Infrastructure in Stressful Situations
(Advisor: Prof. Anja Feldmann)
Monday, 03.06.24 16:00 h , building E1 4, room 0.24

The Internet has revolutionized communication, entertainment, and access to information since its inception. All of this depends on a resilient Internet infrastructure. Resilience is threatened by both analog and digital stress situations. Analog stress situations include, e.g., a global pandemic, and digital stress situations include, e.g., attacks on the Internet.
In this thesis, we investigate how the Internet can handle both analog and digital stress situations. We use a diverse set of Internet vantage points to measure the stress situations and propose conceptual, reactive, and proactive solutions to improve the overall resilience of the Internet.

May

Min CHEN
Understanding and Assessment of Privacy Risks in Machine Learning Systems
(Advisor: Prof. Michael Backes)
Wednesday, 22.05.24 14:00 h , building E9 1, room 0.07

Data is one of the major factors driving the rapid development of machine learning systems. While achieving state-of-the-art performance in many tasks, they are often trained on sensitive data, which may compromise privacy. In this talk, Min will first briefly introduce her research on machine unlearning techniques, which aim to understand the privacy implications of machine learning systems through the lens of privacy laws. She will then discuss her recent findings regarding privacy risks inherent in different machine learning models and explain how to mitigate these privacy risks. Finally, she will conclude the talk by discussing emerging and future research directions for designing privacy-preserving machine learning systems.

Tahleen A. RAHMAN
An analysis of Privacy and Group Fairness issues of Online Social Network and Fitness Tracker users
(Advisor: Prof. Michael Backes)
Tuesday, 14.05.24 14:00 h , building E9 1, room 2.22

The advent of smart handheld and wearable device technologies, along with advances in the fields of Artificial Intelligence and Machine Learning, has immensely changed people’s lifestyles over the last decade. Among these technologies, Online Social Networks (OSNs) and fitness trackers have become integral parts of the daily lives of millions of people around the world. In addition to offering a wide range of functionalities, these tools allow people to share huge amounts of data every day, which becomes a serious concern from the privacy point of view. Moreover, the machine learning algorithms at the core of many features offered by OSNs and fitness trackers are quite sensitive to bias that is prevalent in the training data.
This dissertation presents an analysis of privacy and fairness issues that arise from downstream processing of such user-generated data. We first present an analysis of privacy in multiple scenarios: inference of user location, social relationships, and sensitive user attributes (gender, age, and education), as well as user linkability along the temporal dimension. We demonstrate the severity of the privacy attacks through an extensive evaluation on real-life datasets and derive key insights. We introduce a system called Tagvisor that uses various obfuscation techniques to thwart one of these attacks, namely location inference from hashtags. Secondly, we analyze the fairness of node2vec, a popular graph embedding method, which we use in our analysis of OSNs. Our experiments demonstrate the existence of bias in node2vec when it is used for friendship recommendation. We propose a fairness-aware embedding method, namely Fairwalk, which extends node2vec, and demonstrate that Fairwalk reduces bias under multiple fairness metrics while still preserving utility.

Moritz BÖHLE
Towards Designing Inherently Interpretable Deep Neural Networks for Image Classification
(Advisor: Prof. Bernt Schiele)
Friday, 03.05.24 14:15 h , building E1 4, room 0.24

Over the last decade, Deep Neural Networks (DNNs) have proven successful in a wide range of applications and hold the promise to have a positive impact on our lives. However, especially in high-stakes situations in which a wrong decision can be disastrous, it is imperative that we can understand and obtain an explanation for a model’s ‘decision’. This thesis studies this problem for image classification models from three directions. First, we evaluate methods that explain DNNs in a post-hoc fashion and highlight promises and shortcomings of existing approaches. Second, we study how to design inherently interpretable DNNs. In contrast to explaining the models post hoc, this approach not only takes the training procedure and the DNN architecture into account, but also modifies them to ensure that the decision process becomes inherently more transparent. In particular, two novel DNN architectures are introduced: the CoDA and the B-cos Networks. For every prediction, the computations of those models can be expressed by an equivalent linear transformation. As the corresponding linear matrix is optimised during training to align with task-relevant input patterns, it is shown to localise relevant input features well and thus lends itself to be used as an explanation for humans. Finally, we investigate how to leverage explanations to guide models during training, e.g., to suppress reliance on spuriously correlated features or to increase the fidelity of knowledge distillation approaches.

Benedikt WAGNER
Techniques for Highly Efficient and Secure Interactive Signatures
(Advisor: Dr. Julian Loss)
Thursday, 02.05.24 16:00 h , building E9 1, room 0.05

In this dissertation, we develop new techniques to construct variants of digital signatures, specifically blind signatures, multi-signatures, and threshold signatures. Our constructions are efficient and achieve strong security notions based on well-studied non-interactive assumptions. To mention one example, existing constructions of blind signatures are inefficient, insecure for many concurrent signing interactions, or rely on non-standard assumptions. This is true even in the random oracle model. In our work, we design the first blind signature schemes based on conservative assumptions that are fully secure and concretely efficient. Other results include tightly-secure two-round multi-signatures in pairing-free groups, and threshold signatures with full security against an adaptive adversary. In the talk, I will give an overview of these results, with a special focus on the results on threshold signatures, and conclude with open problems for future work.

April

Alessandro ERBA
Security Aspects of Anomaly Detection for Cyber-Physical Systems
(Advisor: Dr. Nils Ole Tippenhauer)
Tuesday, 29.04.24 14:00 h , building E9 1, room 0.01

Cyber-Physical Systems (CPS) autonomously accomplish tasks in the physical environment. CPS employ computational resources, sensors, actuators, and communication protocols. Examples of such systems are Industrial Control Systems (ICS) and Unmanned Aerial Vehicles (UAVs). Their security is of primary importance in our society. Attacks on such systems have occurred in the past, harming humans and causing environmental pollution and economic losses. To mitigate the risk of attacks on CPS, anomaly detection (AD) techniques have been proposed. In this thesis, we systematically assess the security of CPS and the security aspects of anomaly detection. First, we explore the security of modern ICS protocols and show how deploying secure systems is often impossible. Second, we investigate the security of UAVs and propose Sensor Deprivation Attacks, a novel attack vector that enables adversarial control of the victim vehicle; the proposed attacks remain stealthy to state-of-the-art UAV AD. Third, we investigate the security aspects of anomaly detection for CPS when targeted with concealment attacks for classifier evasion. We evaluate the robustness of process-based AD proposed in the literature against attacks that aim to conceal process anomalies. We propose three frameworks to assess the security of AD by exploring attacker constraints, detection properties, and minimal perturbation boundaries. Our proposed frameworks enable the first systematic security analysis of CPS anomaly detectors.

Johnnatan Messias Peixoto AFONSO
On Fairness Concerns in the Blockchain Ecosystem
(Advisor: Prof. Krishna Gummadi)
Thursday, 25.04.24 13:00 h , building E1 5, room 0.29

Blockchains revolutionized centralized sectors like banking and finance by promoting decentralization and transparency. In a blockchain, information is transmitted through transactions issued by participants or applications. Miners crucially select, order, and validate pending transactions for block inclusion, prioritizing those with higher incentives or fees. The order in which transactions are included can impact the blockchain's final state.
Moreover, applications running on top of a blockchain often rely on governance protocols to decentralize the decision-making power to make changes to their core functionality. These changes can affect how participants interact with these applications. Since one token equals one vote, participants holding multiple tokens have higher voting power to support or reject the proposed changes. How widely this voting power is distributed is questionable; if it is highly concentrated among a few holders, it can lead to governance attacks.
In this thesis, we audit the Bitcoin and Ethereum blockchains to investigate the norms followed by miners in determining transaction prioritization. We also audit decentralized governance protocols such as Compound to evaluate whether the voting power is fairly distributed among the participants. Our findings have significant implications for future developments of blockchains and decentralized applications.

Clayton M. GREENBERG
Evaluating Humanness in Language Models
(Advisor: Prof. Dietrich Klakow)
Wednesday, 24.04.24 16:15 h , building E1 7, room 008

Advances with language models, systems that predict upcoming words in context, have enabled an era in which people sometimes cannot distinguish between human-written and artificially created text. Perplexity, the simplest and most popular way to evaluate the quality of a language model, rewards any pattern captured by the system as long as it robustly constrains the upcoming possibilities. By capturing patterns that humans do not use, optimizing a language model for minimal perplexity could trigger a divergence between the most probable text and the most human-like text.
In this thesis, I argue that this divergence has happened for state-of-the-art language models. Part I characterizes the kinds of knowledge captured by language models. First, I present three novel language model architectures whose neural connections were inspired by human behavior. Then, I discuss novel morphology- and sentiment-based paradigms that capture human knowledge quantitatively. Part II establishes several methods for evaluating language models by comparison against human behavior measures. I consider the suitability and potential confounds for offline ratings and two paradigms of online reading times: eye-tracking and G-Maze. Then, I use a novel dataset of G-Maze response times to show computational and linguistic evidence of the divergence.

Dingfan CHEN
Towards Privacy-preserving Machine Learning: Generative Modeling and Discriminative Analysis
(Advisor: Prof. Mario Fritz)
Tuesday, 23.04.24 11:00 h , building E9 1, room 0.05

The digital era is characterized by the widespread availability of rich data, which has fueled the growth of machine learning applications across diverse fields. Nevertheless, data sharing is often at odds with serious privacy and ethical issues. The sensitive nature of personal information necessitates careful handling and adherence to stringent regulations like GDPR and HIPAA. Addressing such privacy challenges is pivotal for maintaining public trust and ensuring sustainable technological progress.
This talk presents several projects on data privacy in machine learning completed during the speaker’s Ph.D. studies, including exploration of privacy-preserving generative modeling, privacy attack and defense mechanisms, and practical applications for responsible data sharing within real-world sensitive domains.

Florian SATTLER
Understanding Variability in Space and Time – Analyzing Features and Revisions in Concert
(Advisor: Prof. Sven Apel)
Monday, 15.04.24 14:00 h , building E1 1, room 2.06

The static or dynamic analysis of configurable software systems imposes significant challenges regarding complexity and computation time due to the software systems’ large configuration spaces. These are aggravated further by the evolution of software systems: developers frequently produce new revisions, adapting and modifying the system. As a result, analysis results can quickly become outdated or difficult to interpret. The key problem is that current analyses, even when already specialized for configurable software systems, cannot contextualize their findings within the development context of the software project in question.
We address this problem by empowering existing program analyses through a unified abstraction of code regions that incorporates information about the configurability of the system as well as the evolutionary context into the analysis. This way, we enable existing program analyses to relate and interpret their results in the context of variability. In this thesis, we demonstrate the applicability of a uniform abstraction of code regions by addressing two novel research problems:
First, we combine evolutionary information, mined from software repositories, with an inter-procedural data-flow analysis to determine how evolutionary changes interact within a software project, revealing new and interesting connections between changes and developers.
Second, we combine different automated localization approaches that detect configuration-specific code with state-of-the-art performance profilers to enable configuration-aware performance profiling.
Our results show that this enables performance profilers to attribute performance regressions directly to configuration options without introducing unnecessary overhead. In summary, this thesis bridges the gap between variability information and precise program analysis.

Soshi SHIMADA
Physically plausible 3D human motion capture and synthesis with interactions
(Advisor: Prof. Christian Theobalt)
Thursday, 04.04.24 10:30 h , building E1 4, room 0.24

Capturing 3D human motion realistically from a minimal setup, such as a single RGB camera, is challenging and important for downstream applications like AR/VR, avatar communications, and character animations. The problem becomes more challenging when the person in the scene interacts with a complex environment or when interactions lead to non-rigid deformations. This thesis addresses these challenges by explicitly introducing 1) physics-based equations and/or 2) modeling of rigid/non-rigid interactions with the environment, thereby enhancing the realism of the reconstructed 3D motions. Moreover, the thesis expands its focus to include the synthesis of 3D hand-object interaction motions, which are conditioned by the physical properties of the objects for improved realism and greater control over the generated motions.

March

Debasmita LOHAR
Expanding the Horizons of Finite-Precision Analysis
(Advisor: Prof. Eva Darulova, now Uppsala Univ.)
Wednesday, 27.03.24 15:00 h , building E1 5, room 0.29

Finite-precision programs, prevalent in embedded systems, scientific computing, and machine learning, inherently introduce numerical uncertainties stemming from noise in the inputs and from finite-precision errors. Furthermore, implementing these programs on hardware necessitates a trade-off between accuracy and efficiency. Therefore, it is crucial to ensure that numerical uncertainties remain acceptably small and to optimize implementations for accurate results tailored to specific applications. Existing analysis and optimization techniques for finite-precision programs face challenges in scalability and applicability to real-world scenarios. In this work, we expand the individual capabilities of these techniques by capturing the impact of uncertain inputs on discrete decisions and roundoff errors, by scaling floating-point verification for larger programs, and by specializing optimization for feed-forward deep neural networks.

Mang ZHAO
Provable Security and Real-World Protocols: Theory and Practice
(Advisor: Prof. Cas Cremers)
Monday, 18.03.24 13:00 h , building E9 1, room 0.05

In our modern life, network communication has become one of the primary mediums for information transmission, e.g., instant messaging, online shopping, and video conferencing. In order to protect the security of information transmitted over networks, real-world applications are often equipped with cryptographic communication protocols, the provable security analyses of which, however, are often missing. A natural question arises: are these protocols really secure?
This talk presents five projects that the speaker has completed during his Ph.D. studies, with a focus on two of them: the theoretical analysis of authenticated encryption with associated data and the provable security analysis of the real-world video-conferencing protocol Zoom. Moreover, this talk addresses common obstacles to (large-scale) protocol designs and provable security analyses, provides intuition on their feasibility, and presents his future plans.

Gustavo ANDRADE DO VALE
Investigating the Merge Conflict Life-Cycle Taking the Social Dimension into Account
(Advisor: Prof. Sven Apel)
Monday, 11.03.24 16:00 h , building E1 1, room 206

Merge conflicts arise when developers integrate concurrent code changes. While merge conflicts are commonly introduced, they bring several issues to software projects. For instance, merge conflicts distract developers from their workflow, and resolving them is a difficult, time-consuming, and often error-prone task. Despite a substantial number of studies investigating merge conflicts, the social dimension of the problem is often ignored. In this thesis, we seek to understand the role the social dimension plays in the merge conflict life-cycle. To reach our goals, we conducted a series of empirical studies investigating the merge conflict life-cycle. In one of these studies, we found that active GitHub communication is not associated with the emergence or avoidance of merge conflicts, even though developers communicate with each other. In another study, we moved to the end of the merge conflict life-cycle, investigating the challenges and factors related to merge conflict resolution. Our results show that measures indirectly related to merge conflicts (i.e., measures related to the merge scenario changes) are more strongly correlated with merge conflict resolution time than measures directly related to merge conflicts (i.e., merge conflict characteristics). In this thesis, we call the attention of researchers, tool builders, and practitioners to the importance of including the social dimension when investigating merge conflicts. Our findings also provide evidence that they should look at the technical dimension more closely.

Mohamed ALZAYAT
Efficient Request Isolation in Function-as-a-Service
(Advisors: Prof. Peter Druschel & Prof. Deepak Garg)
Friday, 08.03.24 14:00 h , building E1 5, room 0.02

As cloud applications become increasingly event-driven, Function-as-a-Service (FaaS) is emerging as an important abstraction. FaaS allows tenants to state their application logic as stateless functions without managing the underlying infrastructure that runs and scales their applications. FaaS providers ensure the confidentiality of tenants’ data, to a limited extent, by isolating function instances from one another. However, for performance considerations, the same degree of isolation does not apply to sequential requests activating the same function instance. This compromise can lead to confidentiality breaches since bugs in a function implementation or its dependencies may retain state and leak data across activations. Moreover, platform optimizations that assume function statelessness may introduce unexpected behavior if the function retains state, jeopardizing correctness.
This dissertation presents two complementary systems: Groundhog and CtxTainter. Groundhog is a black-box and programming-language-agnostic solution that enforces confidentiality by efficiently rolling back changes to a function’s state after each function activation, effectively enforcing statelessness by breaking all data flows at the request boundary. CtxTainter is a development-phase dynamic data flow analysis tool that detects data flows that violate the statelessness assumption and reports them to the developer for reviewing and fixing.
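
The rollback idea behind Groundhog can be sketched in a few lines of Python. This toy deep-copies a module-level dict around each activation, whereas Groundhog itself rolls back process state efficiently below the language level; all names here are illustrative, not the system's API.

```python
import copy

cache = {}  # module-level state through which a "stateless" function can leak

def handler(user, secret=None):
    # Bug: the handler caches data that survives the request.
    if secret is not None:
        cache['last'] = secret
    return cache.get('last', 'none')

def activate_with_rollback(fn, *args, **kwargs):
    # Snapshot mutable state, run the activation, then roll back,
    # so no data flow crosses the request boundary.
    snapshot = copy.deepcopy(cache)
    try:
        return fn(*args, **kwargs)
    finally:
        cache.clear()
        cache.update(snapshot)

activate_with_rollback(handler, 'alice', secret='hunter2')
print(handler('bob'))   # 'none': alice's secret did not leak to bob
```

Without the rollback wrapper, the second call would return `'hunter2'`, which is precisely the cross-activation flow that CtxTainter is designed to flag during development.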

February

Toghrul KARIMOV
Algorithmic Verification of Linear Dynamical Systems
(Advisor: Prof. Joël Ouaknine)
Thursday, 08.02.24 15:00 h , building E1 5, room 0.29

Linear dynamical systems (LDS) are mathematical models widely used in engineering and science to describe systems that evolve over time. In this thesis, we study algorithms for various decision problems of discrete-time linear dynamical systems. Our main focus is the Model-Checking Problem, which is to decide, given a linear dynamical system and an omega-regular specification, whether the trajectory of the LDS satisfies the specification. Using tools from various mathematical disciplines, most notably algebraic number theory, Diophantine approximation, automata theory, and combinatorics on words, we prove decidability of the Model-Checking Problem for large classes of linear dynamical systems and omega-regular properties. We further exploit deep connections between linear dynamical systems and contemporary number theory to show that improving any of our decidability results would amount to major mathematical breakthroughs. Our results delineate the boundaries of decision problems of linear dynamical systems that, at the present time, can be solved algorithmically.
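
For intuition, a discrete-time LDS orbit x_{t+1} = A x_t and a finite prefix of the 0/1 word that an omega-regular specification inspects can be sketched as follows. This is a bounded-horizon illustration only; the thesis decides such properties over the infinite trajectory, and the function names are ours.

```python
def step(A, x):
    # One step of the discrete-time LDS: x_{t+1} = A x_t.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def orbit_signature(A, x0, horizon):
    """Record, per step, whether the first coordinate is non-negative."""
    word, x = [], x0
    for _ in range(horizon):
        word.append(1 if x[0] >= 0 else 0)
        x = step(A, x)
    return word

# Rotation by 90 degrees: the orbit of (1,0) cycles through
# (1,0) -> (0,1) -> (-1,0) -> (0,-1) -> (1,0) -> ...
A = [[0, -1], [1, 0]]
print(orbit_signature(A, [1, 0], 8))   # [1, 1, 0, 1, 1, 1, 0, 1]
```

Here the sign word is eventually periodic, so any omega-regular property of it is easy to decide; the hard (and, for some classes, number-theoretically deep) cases arise when the orbit never settles into such a pattern.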

Pascal GRITTMANN
Rethinking multiple importance sampling for general and efficient Monte Carlo rendering
(Advisor: Prof. Philipp Slusallek)
Tuesday, 06.02.24 9:00 h , building D3 2, room -1.63 (VisCenter)

Computer generated images are essential for many applications from art to engineering. Unfortunately, rendering such images is costly, with render times easily in the hours, days, or even weeks. On top of that, the demands regarding complexity and visual fidelity are ever rising. Consequently, there is an insatiable need for faster rendering. Efficient render times are often achieved through user intervention. For example, modifying the scene and removing difficult lighting effects can keep render times below an acceptable threshold. Also, algorithm parameters can be tuned manually. For instance, diffuse outdoor scenes are best rendered by unidirectional path tracing, while interiors featuring caustics benefit greatly from bidirectional sampling. Such manual tuning, however, is unfortunate as it puts much burden on the user and poses a hurdle for novices. In this thesis, we pave the way for more universal rendering algorithms with less need of user intervention. For that, we revisit multiple importance sampling (MIS), an essential tool to universalize rendering algorithms by combining diverse sampling techniques. We identify hitherto unknown shortcomings of MIS and propose practical solutions and improvements. As a tangible result, we achieve adaptive bidirectional rendering with performance never worse than unidirectional path tracing.
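
Multiple importance sampling combines several sampling techniques by weighting each technique's samples; the standard balance heuristic sets w_i(x) = p_i(x) / Σ_j p_j(x). A minimal sketch on a 1D toy integral follows (the function names and the test integrand are ours, not the thesis's renderer):

```python
import random

def balance_weight(p_i, pdfs):
    # Veach's balance heuristic: the weights over all techniques sum to one.
    return p_i / sum(pdfs)

def mis_estimate(f, samplers, n):
    """One-sample-per-technique MIS; samplers is a list of (draw, pdf) pairs."""
    total = 0.0
    for _ in range(n):
        for draw, pdf in samplers:
            x = draw()
            ps = [p(x) for _, p in samplers]
            total += balance_weight(pdf(x), ps) * f(x) / pdf(x)
    return total / n

# Two (deliberately identical) uniform techniques on [0, 1];
# the MIS estimate of the integral of f(x) = x approaches 1/2.
random.seed(42)
uniform = (random.random, lambda x: 1.0)
print(mis_estimate(lambda x: x, [uniform, uniform], 20000))
```

With distinct pdfs, the weights automatically favor whichever technique samples a region more densely; the thesis analyzes where this combination loses efficiency and how to repair it.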

Sanem GHORBANI LYASTANI
Studying User Experience and Acceptance of Web Authentication Solutions
(Advisor: Prof. Michael Backes)
Monday, 05.02.24 14:00 h , building E9 1, room 0.01

To improve the security of their web authentication, users can employ password managers, set up two-factor authentication, or replace passwords with FIDO2 authenticator devices. However, for those solutions to be accepted by the user, their user experience must match the users' mental models. This thesis contributes the novel methodologies and results of three studies that measured the user experience and acceptance of three web authentication solutions. Our results show that a) whether password managers are beneficial for security or aggravate existing problems depends on the users' strategies and how well the manager supports the users' individual password management right from the time of password creation; b) users consider FIDO2 passwordless authentication as more usable and more acceptable than password-based authentication, but impeding concerns remain that are rooted in a gap between the user's personal perspective on this new technology and the global view of the FIDO2 designers; c) there is a lack of consistency between the two-factor authentication user journeys of top websites, and the more consistent design patterns are problematic for usability, which could increase users' cognitive friction and lead to rejection. Based on those results, we make suggestions for further research into understanding and improving the users' experience of web authentication.

January

Marcel KÖSTER
Improving Performance of Simulations and Heuristic Optimization on GPUs
(Advisor: Prof. Antonio Krüger)
Tuesday, 30.01.24 13:00 h , building D3 2, Reuse meeting room

Parallelization is a ubiquitous technique for improving the runtime performance of algorithms. Although parallelization is generally challenging and often leads to programming bugs, it is a leading method for processing growing amounts of data today. As problem sizes keep growing, known methods are reaching their limits in terms of scalability and thus applicability. Particularly challenging is the use of graphics processing units (GPUs), which require specially optimized algorithms but feature impressive compute power. Unfortunately, the term "optimized" usually refers to newly developed algorithms that exploit the peculiarities of the underlying GPUs or at least follow their specific programming methodologies. The list of tweaked algorithms available for GPUs is already quite long and touches a wide range of domains. These include the well-known fields of massively parallel simulations and the solving of optimization problems. Prominent examples in this context include particle simulations of physical processes (like molecular-dynamics simulations) and machine-learning-based optimizers. However, existing approaches from these two domains often suffer from severe runtime, memory-consumption, and applicability limitations. In this thesis, we present new approaches for both domains. Our methods considerably outperform the current state of the art in terms of runtime and memory consumption. We were able to achieve runtime speedups of up to several orders of magnitude while reducing the amount of memory required compared to existing methods. Regarding applicability, our algorithms are designed to fit seamlessly into existing simulation programs and optimizers. This makes them a particularly valuable contribution to real-world applications as well.

Edith TRETSCHK
Representing and Reconstructing General Non-Rigid Objects with Neural Models
(Advisor: Prof. Christian Theobalt)
Monday, 29.01.24 15:00 h , building E1 4, room 0.23

Despite much effort, creating virtual clones of real-world objects remains an unsolved scientific challenge. While human-centered approaches are already advanced, the handling of general deformable objects is far less explored and is the topic of this thesis. To digitize an object, it first needs to be reconstructed from sensor observations and then represented in a suitable manner for downstream tasks. Over the past decade, neural techniques have led to great advances in both areas.
This thesis contributes to both areas. In the first part, it focuses on representing deformations and geometry. In particular, it introduces a low-dimensional deformation model. Unlike prior work that hand-crafts these for specific categories, it can be trained for any general non-rigid object category via mesh auto-encoding using graph convolutions. Next, coordinate-based networks model geometry at infinite resolution but they do not generalize due to their global representation. This thesis makes them generalizable, thereby making these new models much easier to apply to general objects where training data is lacking.
In the second part, this thesis advances the reconstruction side. It extends neural radiance fields, which were previously restricted to static scenes, to deformable objects. Finally, this thesis extends the previous method to handle large motions, a non-trivial endeavor due to backwards deformation modeling.

Christian KALTENECKER
Black-Box Performance Modeling of Configurable Software Systems
(Advisor: Prof. Sven Apel)
Monday, 29.01.24 13:00 h , building E1 1, room 2.06

Configurable software systems provide a multitude of configuration options to adjust and optimize the performance of the software. However, it is often unclear which configuration options influence the performance of the system, and identifying them by measuring every configuration is intractable for many configurable systems due to the sheer number of configurations. In a first step, we propose a sampling strategy, used in combination with statistical machine learning, to identify the influence of configuration options on performance. This way, our approach overcomes multiple disadvantages of existing approaches.
Furthermore, the performance influence of configuration options can change over time, for instance, by introducing performance regressions, and these performance regressions can, in some cases, be detected only in certain workloads.
However, it is often unclear which configuration options are affected by performance changes. In a second step, to address this gap, we propose an approach to pinpoint such performance changes over time and workloads. Among other findings, we found that developers mentioned the configuration options that are affected by performance changes, although performance regressions are only rarely directly reported.
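
The idea of learning option influences from measured configurations can be sketched on a synthetic system. The option names and the ground-truth model below are invented for illustration; the thesis's sampling strategy and learning method are more sophisticated.

```python
from itertools import product

OPTIONS = ['cache', 'compress']

def measure(config):
    # Hypothetical ground truth: caching saves 5s, compression costs 2s.
    return 10.0 - 5.0 * config['cache'] + 2.0 * config['compress']

# Here we can afford the full configuration space (2 options -> 4 configs);
# real systems need a sampling strategy instead.
configs = [dict(zip(OPTIONS, bits)) for bits in product([0, 1], repeat=2)]

def influence(option):
    # Average performance with the option enabled minus disabled.
    on = [measure(c) for c in configs if c[option] == 1]
    off = [measure(c) for c in configs if c[option] == 0]
    return sum(on) / len(on) - sum(off) / len(off)

print({o: influence(o) for o in OPTIONS})   # {'cache': -5.0, 'compress': 2.0}
```

Tracking these learned influences across software versions and workloads is what makes it possible to pinpoint which option a performance regression hides behind.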

Marius MOSBACH
Analyzing Pre-trained and Fine-tuned Language Models
(Advisor: Prof. Dietrich Klakow)
Thursday, 18.01.24 17:30 h , building C9 3 (Graduate Center)

Since the introduction of transformer-based language models in 2018, the current generation of natural language processing (NLP) models continues to demonstrate impressive capabilities on a variety of academic benchmarks and real-world applications. This progress is based on a simple but general pipeline which consists of pre-training neural language models on large quantities of text, followed by an adaptation step that fine-tunes the pre-trained model to perform a specific NLP task of interest. However, despite the impressive progress on academic benchmarks and the widespread deployment of pre-trained and fine-tuned language models in industry, we still lack a fundamental understanding of how and why pre-trained and fine-tuned language models work as well as they do. My PhD thesis makes several contributions towards improving our understanding of pre-trained and fine-tuned language models, ranging from analyzing the linguistic knowledge of pre-trained language models and how it is affected by fine-tuning, to a rigorous analysis of the fine-tuning process itself and how the choice of adaptation technique affects the generalization of models. We thereby provide new insights about previously unexplained phenomena and the capabilities of pre-trained and fine-tuned language models.

Markus BAUER
Compiler-based Defenses against Code Execution Attacks
(Advisor: Prof. Christian Rossow)
Thursday, 11.01.24 13:00 h , building E9 1, room 0.01

Memory corruption attacks have haunted computer systems for decades. Attackers abuse subtle bugs in an application’s memory management, corrupting data and executing arbitrary code and, consequently, taking over systems. In particular, C and C++ applications are at risk, while developers often fail or lack time to identify or rewrite risky parts of their software.
In this thesis, we approach this problem with compilers that protect applications without requiring code changes or developer effort. We cover the most frequently targeted aspects of legacy applications: indirect forward jumps in both C and C++, and immutable libraries. First, we protect virtual dispatch in C++ applications from hijacking. We employ a type analysis and a compiler transformation that implements virtual dispatch efficiently without hijackable pointers. Second, we protect indirect calls to function pointers in C applications. We use a new type-based analysis to find indirect call targets and transform indirect calls into a secure and fast version with limited targets. Finally, we propose a method to isolate potentially vulnerable code, particularly unprotected closed-source libraries, into compartments with restricted access to their environment.
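
The effect of restricting indirect calls to type-compatible targets can be mimicked in a few lines of Python. This is a conceptual sketch only: the thesis implements the defense as a compiler transformation for C/C++, and the grouping by arity below stands in for the real type analysis.

```python
def add(a, b): return a + b
def mul(a, b): return a * b
def fmt(s):    return s.upper()

# "Type analysis": group legitimate targets by arity, as a stand-in
# for grouping C functions by their full type signature.
ALLOWED = {2: {add, mul}, 1: {fmt}}

def guarded_call(fn, *args):
    # The check the compiler would emit at every indirect call site:
    # only targets compatible with the call site's type may be invoked.
    if fn not in ALLOWED.get(len(args), set()):
        raise RuntimeError("CFI violation: illegal indirect-call target")
    return fn(*args)

print(guarded_call(add, 2, 3))   # 5: a legitimate target
try:
    guarded_call(fmt, 2, 3)      # wrong type for this call site
except RuntimeError as e:
    print(e)
```

The smaller the allowed-target sets produced by the type analysis, the less room an attacker has to redirect a corrupted function pointer, which is why the precision of that analysis matters.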