PhD Thesis Defenses

PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work going on by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.

Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.



Michael  SAMMLER
Automated and Foundational Verification of Low-Level Programs
(Advisors: Prof. Derek Dreyer and Prof. Deepak Garg)
Monday, 04.12.23 17:00 h , building E1 5, room 0.29

Formal verification is a promising technique to ensure the reliability of low-level programs like operating systems and hypervisors, since it can show the absence of whole classes of bugs and prevent critical vulnerabilities.
To realize the full potential of formal verification for real-world low-level programs, however one has to overcome several challenges, including:
(1) dealing with the complexities of realistic models of real-world programming languages;
(2) ensuring the trustworthiness of the verification, ideally by providing foundational proofs (i.e., proofs that can be checked by a general-purpose proof assistant);
and (3) minimizing the manual effort required for verification by providing a high degree of automation.
This dissertation presents multiple projects that advance formal verification along these three axes:
RefinedC provides the first approach for verifying C code that combines foundational proofs with a high degree of automation via a novel refinement and ownership type system.
Islaris shows how to scale verification of assembly code to realistic models of modern instruction set architectures-in particular, Armv8-A and RISC-V.
DimSum develops a decentralized approach for reasoning about programs that consist of components written in multiple different languages (e.g., assembly and C), as is common for low-level programs.
RefinedC and Islaris rest on Lithium, a novel proof engine for separation logic that combines automation with foundational proofs.

Krzysztof WOLSKI
Design and Applications of Perception-Based Mesh, Image, and Display-Related Quality Metrics
(Advisors: Dr.-Ing. habil. Karol Myszkowski and Prof. Hans-Peter Seidel)
Monday, 04.12.23 10:00 h , building E1 4, room 0.19

Computer graphics have become an integral part of our daily lives, enabling immersive experiences in movies, video games, virtual reality, and augmented reality. However, the various stages of the computer graphics pipeline, from content generation to rendering and display, present their own challenges that can reduce visual quality and thus degrade the overall experience.
Perceptual metrics are crucial for evaluating visual quality. However, many existing methods have limitations in reproducing human perception accurately, as they must account for the complexities of the human visual system. This dissertation aims to tackle these issues by proposing innovative advancements across different pipeline stages.
Firstly, it introduces a novel neural-based visibility metric to improve the assessment of near-threshold image distortions. Secondly, it addresses shortcomings of the mesh quality metrics, vital for enhancing the integrity of three-dimensional models. Thirdly, the dissertation focuses on optimizing the visual quality of animated content while considering display characteristics and a limited rendering budget. Finally, the work delves into the challenges specific to stereo vision in a virtual reality setting.
The ultimate objective is to enable the creation of more efficient and automated designs for virtual experiences, benefiting fields like entertainment and education. Through these contributions, this research seeks to elevate the standard of visual quality in computer graphics, enriching the way we interact with virtual worlds.


Denise KAHL
Visual-haptic Perception in the Digitally Augmented World
(Advisor: Prof. Antonio Krüger)
Wednesday, 29.11.23 14:30 h , building D3 2 (DFKI), Reuse Room

In everyday life, we are confronted with a growing amount of digital content that is integrated into our surroundings. Visual elements, such as digital advertising or information boards, change our perception of the environment and make it increasingly difficult to perceive personally meaningful information.
In this work, we investigate how visual augmentations of the environment affect our visu-al and haptic perception of reality and explore how visual attention can be directed as subtly as possible toward personally relevant information in real-world environments.
We present a concept to evaluate visual stimuli for gaze guidance in instrumented environments and explore stimuli suitable for gaze guidance in real-world settings using a prototypical implementation of it. Moreover, we explore the potential of using overlays displayed in Optical See-through Augmented Reality glasses to guide visual attention u-sing subtle visual cue stimuli.
Additionally, we introduce a concept to investigate perceptual changes in physical objects interacted with that may result from overlaying them with digital augmentations. We investigate the extent to which the overlying virtual model can differ from the underlying physical object without significantly affecting the feeling of presence, the usability, and the performance. We provide results in terms of shape and size differences and demonst-rate the influence of environmental lighting conditions.

Characteristics and Commonalities – Differentially Describing Datasets with Insightful Patterns
(Advisor: Prof. Jilles Vreeken)
Thursday, 16.11.23 14:00 h , building E1 4, room 0.24

Empirical science revolves around gaining insights from complex data. With the advent of computational science, increasingly more, larger, and richer datasets are becoming avail-able to expand our scientific knowledge. However, the analysis of these datasets by domain experts is often impaired by a lack of suitable computational tools. In particular, there is a shortage of methods identifying insightful patterns, i.e., sets of strongly associated feature values that are informative, contrasting, probabilistically sound, statistically sound, and discoverable using scalable algorithms. This thesis leverages ideas and concepts from pattern-set mining, maximum-entropy modeling, statistical testing, and matrix factorization to develop methods for discovering insightful patterns.


Sihang PU
Towards Compact Bandwidth and Efficient Privacy-Preserving Computation
(Advisor: Dr. Nico Döttling)
Monday, 30.10.23 16:00 h , building E9 1, room 0.01

In traditional cryptographic applications, cryptographic mechanisms are employed to ensure the security and integrity of communication or storage. In these scenarios, the primary threat is usually an external adversary trying to intercept or tamper with the communication between two parties. On the other hand, in the context of privacy-preserving computation or secure computation, the cryptographic techniques are developed with a different goal in mind: to protect the privacy of the participants involved in a computation from each other. Specifically, privacy-preserving computation allows multiple parties to jointly compute a function without revealing their inputs and it has numerous applications in various fields, including finance, healthcare, and data analysis. It allows for collaboration and data sharing without compromising the privacy of sensitive data, which is becoming increasingly important in today’s digital age. While privacy-preserving computation has gained significant attention in recent times due to its strong security and numerous potential applications, its efficiency remains its Achilles’ heel. Privacy-preserving protocols require significantly higher computational overhead and bandwidth when compared to baseline (i.e., insecure) protocols. Therefore, finding ways to minimize the overhead, whether it be in terms of computation or communication, asymptotically or concretely, while maintaining security in a reasonable manner remains an exciting problem to work on.

Enriching Open-world Knowledge Graphs with Expressive Negative Statements
(Advisor: Prof. Gerhhard Weikum)
Friday, 27.10.23 09:00 h , building E1 4, room 0.24

Machine knowledge about entities and their relationships has been a long-standing goal for AI researchers. Over the last 15 years, thousands of public knowledge graphs have been automatically constructed from various web sources. They are crucial for use cases such as search engines. Yet, existing web-scale knowledge graphs focus on collecting positive statements, and store very little to no negatives. Due to their incompleteness, the truth of absent information remains unknown, which compromises the usability of the knowledge graph. In this dissertation: First, I make the case for selective materialization of salient negative statements in open-world knowledge graphs. Second, I present our methods to automatically infer them from encyclopedic and commonsense knowledge graphs, by locally inferring closed-world topics from reference comparable entities. I then discuss our evaluation findings on metrics such as correctness and salience. Finally, I conclude with open challenges and future opportunities.

Beyond Flatland: Exploring Graphs in Many Dimensions
(Advisors: Dr. Christoph Lenzen and Dr. Bastian Rieck)
Monday, 23.10.23 09:00 h , Video conference

Societies, technologies, economies, ecosystems, organisms, . . . Our world is composed of complex networks—systems with many elements that interact in nontrivial ways. Graphs are natural models of these systems, and scientists have made tremendous progress in developing tools for their analysis. However, research has long focused on relatively simple graph representations and problem specifications, often discarding valuable real-world information in the process. In recent years, the limitations of this approach have become increasingly apparent, but we are just starting to comprehend how more intricate data representations and problem formulations might benefit our understanding of relational phenomena. Against this background, our thesis sets out to explore graphs in five dimensions:
descriptivity, multiplicity, complexity, expressivity, and responsibility.
Leveraging tools from graph theory, information theory, probability theory, geometry, and topology, we develop methods to (1) descriptively compare individual graphs, (2) characterize similarities and differences between groups of multiple graphs, (3) critically assess the complexity of relational data representations and their associated scientific culture, (4) extract expressive features from and for hypergraphs, and (5) responsibly mitigate the risks induced by graph-structured content recommendations. Thus, our thesis is naturally situated at the intersection of graph mining, graph learning, and network analysis.

Context-Based Prototyping of Human-Machine Interfaces for Autonomous Vehicles
(Advisor: Prof. Antonio Krüger)
Friday, 13.10.23, 15:00 h, building D3 2, DFKI, VisRoom NB – 1.63

Autonomous vehicles (AVs; SAE levels 4 and 5) face substantial challenges regarding acceptance, human factors, and user experience. Human-machine interfaces (HMIs) offer the potential to account for those and facilitate AV adoption. Since AVs‘ capabilities and availability are still limited, suitable prototyping methods are required to create, evaluate, and optimize novel HMI concepts from early development phases. In all human-centered design activities, physical and social contexts are vital. This thesis argues for applying context-based interface prototyping of human-AV interactions to account for their interrelation with contextual factors. We adopt a ‚research in and through design‘ approach and explore the two intertwined areas: design and prototyping. Regarding the latter, we concentrate on straightforward methods. We demonstrate an immersive video-based approach for lab simulation of AVs and a wizard-of-oz-based method for on-road AV simulation and prototyping of HMIs providing real-time information. We apply these methods in empirical studies to assess their suitability and explore HMI concepts created to counter the aforementioned challenges. Thereby we investigate the potential of (AR-based) object detection visualization and concepts for mobile and in-vehicle interaction with (shared) AVs. Based on the findings, we provide design and prototyping recommendations that will aid researchers and practitioners in creating suitable human-AV interactions.

Zheng LI
On the Privacy Risks of Machine Learning Models
(Advisor: Dr. Yang Zhang)
Thursday, 05.10.23, 15:00 h, building E9 1, Room 0.01

In this dissertation, we investigate the significant privacy risks in the era of advancing machine learning (ML) from two perspectives. Firstly, we explore vulnerabilities within ML models, with a specific focus on membership inference attacks (MIA). Through two studies, we unveil the severity of MIA by introducing a novel label-only attack and assessing the susceptibility of multi-exit networks. Secondly, we examine the misuse of ML models that compromise privacy, particularly in the context of deepfake face manipulation. To counter GAN-based face manipulation effectively, an innovative defense system called UnGANable is proposed to disrupt the crucial GAN inversion process. These findings provide valuable insights into privacy risks associated with ML models and emphasize the necessity for ongoing research vigilance in this rapidly evolving ML landscape.


Algorithms for Sparse Convolution and Sublinear Edit Distance
(Advisor: Prof. Karl Bringmann)
Tuesday, 29.08.23, 15:00 h, building E1 4, Room 0.24

In this PhD thesis on fine-grained algorithm design and complexity, we investigate output-sensitive and sublinear-time algorithms for two important problems.
* Sparse Convolution: Computing the convolution of two vectors is a funda-mental algorithmic primitive. In the sparse convolution problem we assume that the input and output vectors have at most $t$ nonzero entries, and the goal is to design algorithms with running times dependent on $t$. For the special case where all entries are nonnegative, which is particularly important for algorithm design, it is known since twenty years that sparse convolutions can be computed in near-linear randomized time $O(t \log^2 n)$. In this thesis we develop a randomized algorithm with running time $O(t \log t)$ which is optimal (under some mild assumptions), and the first near-linear deterministic algorithm for sparse nonnegative convolution. We also present an application of these results, leading to seemingly unrelated fine-grained lower bounds against distance oracles in graphs.
* Sublinear Edit Distance: The edit distance of two strings is a well-studied si-milarity measure with numerous applications in computational biology. While computing the edit distance exactly provably requires quadratic time, a long line of research has lead to a constant-factor approximation algorithm in almost-linear time. Perhaps surprisingly, it is also possible to approximate the edit distance $k$ within a large factor $O(k)$ in sublinear time $\widetilde{O}(\frac nk + k^{O(1)})$. We drastically improve the approximation factor of the known sublinear algorithms from $O(k)$ to $k^{o(1)}$ while pre-serving the $O(\frac nk + k^{O(1)})$ running time.

Xinlei HE
Privacy Risk Assessment of Emerging Machine Learning Paradigms
(Advisor: Dr. Yang Zhang)
Wednesday, 16.08.23, 14:00 h, building E9 1, Room 0.01

Machine learning (ML) has progressed tremendously, and data is the key factor to drive such development. However, there are two main challenges regarding collecting the data and handling it with ML models. First, the acquisition of high-quality labeled data can be difficult and expensive due to the need for extensive human annotation. Second, to model the complex relationship between entities, e.g., social networks or molecule structures, graphs have been leveraged. However, conventional ML models may not effectively handle graph data due to the non-linear and complex nature of the relationships between nodes. To address these challenges, recent developments in semi-supervised learning and self-supervised learning have been introduced to leverage unlabeled data for ML tasks. In addition, a new family of ML models known as graph neural networks has been proposed to tackle the challenges associated with graph data. Despite being powerful, the potential privacy risk stemming from these paradigms should also be taken into account. In this dissertation, we perform the privacy risk assessment of the emerging machine learning paradigms. Firstly, we investigate the membership privacy leakage stemming from semi-supervised learning. Concretely, we propose the first data augmentation-based membership inference attack that is tailored to the training paradigm of semi-supervised learning methods. Secondly, we quantify the privacy leakage of self-supervised learning through the lens of membership inference attacks and attribute inference attacks. Thirdly, we study the privacy implications of training GNNs on graphs. In particular, we propose the first attack to steal a graph from the outputs of a GNN model that is trained on the graph. Finally, we also explore potential defense mechanisms to mitigate these attacks.


Trung Tin NGUYEN
Understanding and Measuring Privacy Violations in Android Apps
(Advisor: Prof. Michael Backes)
Tuesday, 25.07.23, 13:00 h, building E9 1, Room 0.01

Increasing data collection and tracking of consumers by today’s online services is becoming a major problem for individuals‘ rights. It raises a serious question about whether such data collection can be legally justified under legislation around the globe. Unfortunately, the community lacks insight into such violations in the mobile ecosystem.
In this dissertation, we approach these problems by presenting a line of work that provides a comprehensive understanding of privacy violations in Android apps in the wild and automatically measures such violations at scale. First, we build an automated tool that detects unexpected data access based on user perception when interacting with the apps‘ user interface. Subsequently, we perform a large-scale study on Android apps to understand how prevalent violations of GDPR’s explicit consent requirement are in the wild. Finally, un-til now, no study has systematically analyzed the currently implemented consent notices and whether they conform to GDPR in mobile apps. Therefore, we propose a mostly automated and scalable approach to identify the current practices of implemented consent notices. We then develop an automatic tool that detects data sent out to the Internet with different consent conditions.
Our result shows the urgent need for more transparent user interface designs to better inform users of data access and call for new tools to support app developers in this endeavor.

Mojtaba BEMANA
Efficient Image-Based Rendering
(Advisors: Dr. habil. Karol Myszkowski & Prof. Hans-Peter Seidel)
Wednesday, 12.07.23, 11:00 h, building E1 4, Room 0.19

Despite recent advancements in real-time ray tracing and deep learning for producing photo-realistic computer-generated images (CGI), the creation of CGI remains time-consuming and resource-intensive. Image-based rendering (IBR) provides an alternative by using pre-captured real-world images to generate realistic images in real-time, elimi-nating the need for extensive modeling. However, achieving faithful IBR reconstruction often requires dense scene sampling, leading to storage, capture, and processing challenges. Furthermore, IBR still struggles to offer the same level of control over scene attributes as traditional CG pipelines or accurately reproduce complex scenes and objects with materials like transparent objects. This thesis endeavors to address these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and human perception. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically-driven approach that enables accurate novel view synthesis of the refractive objects. Additionally, this thesis develops a visibility metric that can identify artifacts in the reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.

Routegazing: Analysing the Evolving Internet Routing Ecosystem
(Advisor: Prof. Anja Feldmann)
Thursday, 06.07.23, 13:00 h, building E1 5, Room 002

The Internet’s routing ecosystem constantly evolves to meet the needs of its stakeholders and users. Tracking this evolution is essential, e.g., to identify business opportunities, address security challenges, or inform protocol design. However, most Internet protocols were designed without measurability in mind; hence, many measurements and inference methods rely on exploiting protocol-specific side effects.
This dissertation first assesses the limitations of our deployed observation infrastructures and commonly used inference methods via three orthogonal contributions: a case study on a European Internet Exchange Point to assess our visibility into the Internet’s AS topology; a framework to identify and measure biases in the placement of our vantage points across multiple dimensions; and a systematic analysis of the biases and sensitivity of AS relationship inference algorithms. We found that our view of the Internet’s AS topology diminishes over time, and that our AS relationship models are more biased and sensitive to short-term routing dynamics than previously assumed.
With these limitations in mind, we focused on one of the most critical routing ecosystem changes, IPv4 exhaustion, and two ways network operators can deal with it. First, we explored the IPv4 buying and leasing markets, identified market trends, and discussed the viability of these markets for different network types. Second, we analyzed the benefits, usage patterns, and disadvantages of announcing tiny address blocks—which we call „hyper-specific.“ We argue that a combination of leased IPv4 addresses and hyper-specific prefix announcements likely suffice for many networks to bridge the gap until full IPv6 adoption.
Besides its IPv6 adoption, the routing ecosystem also evolved in other dimensions. We first studied AS path prepending to assess the security implication of these changes. We found a typical configuration with no benefits yet an increase of an AS’s vulnerability to prefix hijacks. Infrastructural changes led to an overall decrease in prepending sizes over time and hence a safer use of the technique. However, we demonstrated that we can exploit the same changes to re-orchestrate prefix de-aggregation attacks to overcome widely deployed prevention mechanisms. We validated our assumptions and attack model using a real-world testbed and proposed updates to existing prevention mechanisms. Our two-stage disclosure campaign contributed to a safer routing ecosystem.

Compositional Synthesis of Reactive Systems
(Advisor: Prof. Bernd Finkbeiner)
Tuesday, 04.07.23, 16:30 h, building E1 7, Room 0.01

Synthesis is the task of automatically deriving correct-by-construction implementations from formal specifications. While it is a promising path toward developing verified programs, it is infamous for being hard to solve. Compositionality is recognized as a key technique for reducing the complexity of synthesis. So far, compositional approaches require extensive manual effort. In this thesis, we introduce algorithms that automate these steps.
In the first part, we develop compositional synthesis techniques for distributed systems. Providing assumptions on other processes‘ behavior is fundamental in this setting due to inter-process dependencies. We establish delay-dominance, a new requirement for im-plementations that allows for implicitly assuming that other processes will not maliciously violate the shared goal. Furthermore, we present an algorithm that computes explicit assumptions on process behavior to address more complex dependencies.
In the second part, we transfer the concept of compositionality from distributed to single-process systems. We present a preprocessing technique for synthesis that identifies independently synthesizable system components. We extend this approach to an incremental synthesis algorithm, resulting in more fine-grained decompositions. Our experimental evaluation shows that our techniques automate the required manual efforts, resulting in fully automated compositional synthesis algorithms for both distributed and single-process systems.

Tobias STARK
Real-Time Execution Management in the ROS 2 Framework
(Advisor: Dr. Björn Brandenburg)
Tuesday, 04.07.23, 10:00 h, building E1 7, Room 0.01

Over the past decade, the ROS ecosystem has emerged as the most popular repository of opensource robotics software. As a result, many robots rely on ROS-based software to make timing-critical decisions in real time. However, there is little evidence that real-time theory is used to analytically bound or control the worst-case response time in ROS components.
This dissertation identifies three main hurdles to adopt real-time theory in the context of ROS 2: first, the complex and non-obvious timing effects introduced by the ROS 2 frame-work; second, the expertise required to use real-time scheduling mechanisms correctly; and third, the inherent unpredictability of typical robotics workloads, which defy static provisioning.
To overcome these hurdles, the dissertation introduces a timing model for ROS 2 application, together with a response-time analysis that allows robotics developers to bound the worst-case response time of individual components and multi-component processing chains.
However, modeling and provisioning ROS 2 systems remains a cumbersome and error-prone task. In a second step, the dissertation hence proposes ROS-Llama, an automatic latency manager for ROS 2. ROS-Llama automatically controls the latency of a ROS 2 system through real-time scheduling, while requiring only little effort and no real-time scheduling expertise by the user. It runs in parallel with the deployed application and can therefore measure all required information without user involvement and adapt to changes at runtime. As part of ROS-Llama’s design, the dissertation discusses the conceptual and practical challenges in developing such an automatic tool, identifying relevant properties of ROS 2 and essential requirements of the robotics domain.


Natural Language Processing for African Languages
(Advisor: Prof. Dietrich Klakow)
Tuesday, 27.06.23, 13:00 h, building C7 4, Conference room

Recent advances in pre-training of word embeddings and language models leverage large amounts of unlabelled texts and self-supervised learning to learn distributed representations that have significantly improved the performance of deep learning models on a large variety of natural language processing tasks. Similarly, multilingual variants of these models have been developed from web-crawled multilingual resources like Wikipedia and Common crawl. However, there are some drawbacks to building these multilingual representation models using these web texts. First, the models only include few low-resource languages in the training corpus, and additionally, the texts of these languages are often noisy or of low quality texts. Second, their performance on downstream NLP tasks is difficult to evaluate because of the absence of labelled datasets, therefore, they are typically only evaluated on English and other high-resource languages.
In this dissertation, we focus on languages spoken in Sub-Saharan Africa where all the indigenous languages in this region can be regarded as low-resourced in terms of the availability of labelled data for NLP tasks and unlabelled data found on the web. We analyse the noise in the publicly available corpora, and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings does not only depend on the amount of data but on the quality of pre-training data. We demonstrate empirically the limitations of word embeddings, and the opportunities the multilingual pre-trained language model (PLM) offers especially for languages unseen during pre-training and low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual texts. To address the under-representation of the African languages in multilingual evaluations, we developed large scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
In order to advance the progress of NLP for African languages, future work should focus on expanding benchmark datasets for African languages in other important NLP tasks like part of speech tagging, sentiment analysis, hate speech detection, and question answering. Another direction is to focus on development of Africa-centric PLMs. Lastly, research on speech that involves developing corpora and techniques that require zero or few paired speech-text data would be very essential for the survival of many under-resourced African languages.

Perceptually driven methods for improved gaze-contingent rendering
(Advisor: Prof. Piotr Didyk, now Univ. della Svizzera Italiana)
Wednesday, 21.06.23, 15:00 h, building E1 5, Room 0.29

Computer graphics is responsible for the creation of beautiful and realistic content. However, visually pleasing results often come at an immense computational cost, especially for new display devices such as virtual reality headsets. A promising solution to overcome this problem is to use foveated rendering, which exploits the limitations of the human visual system with the help of eye trackers. In particular, visual acuity is not uniform across the visual field but it is rather focused in its center and it is rapidly declining towards the periphery. Foveated rendering takes advantage of this feature by displaying high-quality content only at the gaze location, gradually decreasing it towards the periphery. While this method is effective, it is subject to some limitations. An example of such limitation is the system latency, which becomes noticeable during rapid eye movements when the central vision is exposed to low-resolution content, reserved only for the peripheral vision. Another example is the prediction of the allowed quality degradation, which is based solely on the visual eccentricity; however, the loss of the peripheral acuity is more complex and it relies on the image content as well.
This thesis addresses these limitations by designing new, perceptually-driven methods for gaze-contingent rendering. The first part introduces a new model for saccade landing position prediction to combat system latency during rapid eye movements. This method extrapolates the gaze information from delayed eye-tracking samples and predicts the saccade’s landing position. The new gaze estimate is then used in the rendering pipeline in order to forestall the system latency. The model is further refined by considering the idiosyncratic characteristics of the saccades. The second part of this thesis introduces a new luminance-contrast-aware foveated rendering technique, which models the allowed peripheral quality degradation as a function of both visual eccentricity and local luminance contrast. The advantage of this model lies in its prediction of the perceived quality loss due to foveated rendering without full-resolution reference. As a consequence, it can be applied to foveated rendering to achieve better computational savings.

Physical Sketching Tools and Techniques for Customized Sensate Surfaces
(Advisor: Prof. Jürgen Steimle)
Wednesday, 07.06.23, 16:00 h, building E1 1, Room 2.06

Sensate surfaces are a promising avenue for enhancing human interaction with digital systems due to their inherent intuitiveness and natural user interface. Recent technological advancements have enabled sensate surfaces to surpass the constraints of conventional touchscreens by integrating them into everyday objects, creating interactive interfaces that can detect various inputs such as touch, pressure, and gestures. This allows for more natu-ral and intuitive control of digital systems. However, prototyping interactive surfaces that are customized to users‘ requirements using conventional techniques remains technically challenging due to limitations in accommodating complex geometric shapes and varying sizes. Furthermore, it is crucial to consider the context in which customized surfaces are utilized, as relocating them to fabrication labs may lead to the loss of their original design context. Additionally, prototyping high-resolution sensate surfaces presents challenges due to the complex signal processing requirements involved. This work investigates the design and fabrication of customized sensate surfaces that meet the diverse requirements of different users and contexts. The research aims to develop novel tools and techniques that overcome the technical limitations of current methods and enable the creation of sensate surfaces that enhance human interaction with digital systems.


Sebastian BIEWER
Software Doping – Theory and Detection
(Advisor: Prof. Holger Hermanns)
Thursday, 25.05.23, 15:15 h, building E1 7, Room 0.01

Software is doped if it contains a hidden functionality that is intentionally included by the manufacturer and is not in the interest of the user or society. This thesis complements this informal definition by a set of formal cleanness definitions that characterise the absence of software doping. These definitions reflect common expectations on clean software behaviour and are applicable to many types of software, from printers to cars to discriminatory AI systems. We use these definitions to propose white-box and black-box analysis techniques to detect software doping. In particular, we present a provably correct, model-based testing algorithm that is intertwined with a probabilistic-falsification-based test input selection technique. We identify and explain how to overcome the challenges that are specific to real-world software doping tests and analyses. We use the Diesel Emissions Scandal to demonstrate the strength of our cleanness definitions and analysis techniques by applying them to emission cleaning systems of diesel cars.

Ikhansul HABIBIE
Leveraging EEG-based Speech Imagery Brain-Computer Interfaces
(Advisor: Prof. Christian Theobalt)
Tuesday, 16.05.23, 10:00 h, building E14, Room 0.24

Realistic virtual human avatar is a crucial element in a wide range of applications, from 3D animated movies to emerging AR/VR technologies. However, producing a believable 3D motion for such avatars is widely known to be a challenging task. A traditional 3D human motion generation pipeline consists of several stages, each requiring expensive equipment and skilled human labor to perform, limiting its usage beyond the entertainment industry despite its massive potential benefits.
This thesis attempts to explore some alternative solutions to reduce the complexity of the traditional 3D animation pipeline. To this end, it presents several novel ways to perform 3D human motion capture, synthesis, and control.
Specifically, it focuses on using learning-based methods to bypass the critical bottlenecks of the classical animation approach. First, a new 3D pose estimation method from in-the-wild monocular images is proposed, eliminating the need for a multi-camera setup in the traditional motion capture system. Second, it explores several data-driven designs to achieve a believable 3D human motion synthesis and control that can potentially reduce the need for manual animation. In particular, the problem of speech-driven 3D gesture synthesis is chosen as the case study due to its uniquely ambiguous nature. The improved motion generation quality is achieved by introducing a novel adversarial objective that rates the difference between real and synthetic data. A novel motion generation strategy is also introduced by combining a classical database search algorithm with a powerful deep learning method, resulting in a greater motion control variation than the purely predictive counterparts.
Furthermore, this thesis also contributes a new way of collecting a large-scale 3D motion dataset through the use of learning-based monocular estimations methods. This result demonstrates the promising capability of learning-based monocular approaches and shows the prospect of combining these learning-based modules into an integrated 3D animation framework.
The presented learning-based solutions open the possibility of democratizing the traditional 3D animation system that can be enabled using low-cost equipment, e.g., a single RGB camera. Finally, this thesis also discusses the potential further integration of these learning-based approaches to enhance 3D animation technology.

Maurice REKRUT
Leveraging EEG-based Speech Imagery Brain-Computer Interfaces
(Advisor: Prof. Antonio Krüger)
Friday, 04.05.23, 15:00 h, building D3 2, Room -2.17 (Reuse)

Speech Imagery Brain-Computer Interfaces (BCIs) provide an intuitive and flexible way of interaction via brain activity recorded during imagined speech. Imagined speech can be decoded in form of syllables or words and captured even with non-invasive measurement methods as for example the Electroencephalography (EEG). Over the last decade, research in this field has made tremendous progress and prototypical implementations of EEG-based Speech Imagery BCIs are numerous. However, most work is still conducted in controlled laboratory environments with offline classification and does not find its way to real online scenarios.
Within this thesis we identify three main reasons for these circumstances, namely, the mentally and physically exhausting training procedures, insufficient classification accuracies and cumbersome EEG setups with usually high-resolution headsets. We furthermore elaborate on possible solutions to overcome the aforementioned problems and present and evaluate new methods in each of the domains. In detail we introduce two new training concepts for imagined speech BCIs, one based on EEG activity during silently reading and the other recorded during overtly speaking certain words. Insufficient classification accuracies are addressed by introducing the concept of a Semantic Speech Imagery BCI, which classifies the semantic category of an imagined word prior to the word itself to increase the performance of the system. Finally, we investigate on different techniques for electrode reduction in Speech Imagery BCIs and aim at finding a suitable subset of electrodes for EEG-based imagined speech detection, therefore facilitating the cumbersome setups. All of our presented results together with general remarks on experiences and best practice for study setups concerning imagined speech are summarized and supposed to act as guidelines for further research in the field, thereby leveraging Speech Imagery BCIs towards real-world application.

Han DU
Modeling Variation of Human Motion
(Advisor: Prof. Philipp Slusallek)
Thursday, 04.05.23, 08:30 h, building D3 2, Room -2.17 (Reuse)

This thesis presents a series of works that analyze and model the variations of human motion data. The goal is to learn statistical generative models to create any number of new human animations with rich variations and styles. The synthesis of realistic human motion with large variations and different styles has a growing interest in simulation applications such as the game industry, psychological experiments, and ergonomic analysis. The statistical generative models are used by motion controllers in our motion synthesis framework to create new animations for different scenarios.

Jiayi WANG
3D Hand Reconstruction from Monocular Camera with Model-Based Priors
(Advisor: Prof. Christian Theobalt)
Wednesday, 03.05.23, 14:00 h, building E1 4, Room 0.24

As virtual and augmented reality (VR/AR) technology gains popularity, facilitating intuitive digital interactions in 3D is of crucial importance. Tools such as VR controllers exist, but such devices support only a limited range of interactions, mapped onto complex sequences of button presses that can be intimidating to learn. In contrast, users already have an instinctive understanding of manual interactions in the real world, which is readily transferable to the virtual world. For enabling these interactions, hand-tracking systems using monocular images are desirable since they do not constrain articulation, unlike gloves or markers, and suitable input devices are pervasive in everyday life.
However existing learning-based methods have many limitations, such as their requirement for vast amounts of 3D annotations, the assumption only one hand appears in the scene, and their inability to characterize the 3D ambiguities in the input. Existing methods also focused primarily on modeling geometry while neglecting hand appearance. To tackle the aforementioned shortcomings of previous methods, this thesis advances the state-of-the-art through the novel use of model-based priors to incorporate hand-specific knowledge. In particular, this thesis presents a training method that reduces the amount of annotations required and is robust to systemic biases; it presents the first tracking method that addresses the challenging two-hand-interaction scenario using monocular RGB video, and also the first probabilistic method to model image ambiguity for two-hand interactions. Additionally, this thesis also contributes the first parametric hand texture model with example applications in hand personalization.


Sebastian ROTH
How to Deploy Security Mechanisms Online (Consistently)
(Advisor: Dr. Ben Stock)
Friday, 28.04.23, 10:00 h, building E9 1, Room 0.01

To mitigate a myriad of Web attacks, modern browsers support client-side security policies shipped through HTTP response headers. To enforce these policies, the operator can set response headers that the server then communicates to the client. We have shown that one of those, namely the Content Security Policy (CSP), requires massive engineering effort to be deployed in a non-trivially bypassable way. Thus, many policies deployed on Web sites are misconfigured. Due to the capability of CSP to also defend against framing-based attacks, it has a functionality-wise overlap with the X-Frame-Options header. We have shown that this overlap leads to inconsistent behavior of browsers, but also inconsistent deployment on real-world Web applications. Not only overloaded defense mechanisms are prone to security inconsistencies. We investigated that due to the structure of the Web itself, misconfigured origin servers or geolocation-based CDN caches can cause unwanted security inconsistencies. To not disregard the high number of misconfigurations of CSP, we also took a closer look at the deployment process of the mechanism. By conducting a semi-structured interview, including a coding task, we were able to shed light on motivations, strategies, and roadblocks of CSP deployment. However, due to the wide usage of CSP, drastic changes are generally considered impractical. Therefore, we also evaluated if one of the newest Web security features, namely Trusted Types, can be improved.

Improving Quality and Controllability in GAN-based Image Synthesis
(Advisor: Prof. Bernt Schiele)
Tuesday, 18.04.23, 09:30 h, building E1 4, DFKI, Room 0.24

The goal of the field of deep learning-based image synthesis is to achieve perfect visual realism, and to let users precisely control the content of the synthetic images. Generative adversarial networks (GANs) have been the most popular image synthesis framework until recently, due to their unrivaled image quality. Yet, there is still much room for improve-ment regarding synthesis quality and precisely controlling the image content. For this reason, this thesis introduces methods that improve both the synthesis quality and control-lability of GANs. Specifically, we address the following subproblems. First, we propose the idea of segmentation-based discriminator networks and segmentation-based regularizations for GANs. The new approach improves the quality of conditional and unconditional image synthesis. Second, we show that this approach is naturally well-suited for semantic image synthesis. Centered around the idea of segmentation-based discriminators, this thesis introduces techniques that strongly improve image quality and multi-modality. Additionally, the methods result in better modeling of long-tailed data and new possibilities for global and local image editing. Finally, the improvements in multi-modality and image editing in semantic image synthesis open the door for controlling the image content via the latent space of the GAN generator. Therefore, this thesis introduces a method for finding interpretable directions in the latent space of semantic image synthesis GANs, which enables an additional form of control over the image content next to the semantic layouts.


Nicklas LINZ
Automatic Detection of Dementia and related Affective Disorders through Processing of Speech and Language
(Advisor: Prof. Antonio Krüger)
Friday, 24.03.23, 15:00 h, building D3 2, DFKI, ViS Room (SB-1.61)

In 2019, dementia is has become a trillion dollar disorder. Alzheimer’s disease (AD) is a type of dementia in which the main observable symptom is a decline in cognitive functions, notably memory, as well as language and problem-solving. Experts agree that early detection is crucial to effectively develop and apply interventions and treatments, underlining the need for effective and pervasive assessment and screening tools. The goal of this thesis is to explores how computational techniques can be used to process speech and language samples produced by patients suffering from dementia or related affective disorders, to the end of automatically detecting them in large populations using machine learning models. A strong focus is laid on the detection of early stage dementia (MCI), as most clinical trials today focus on intervention at this level. To this end, novel automatic and semi-automatic analysis schemes for a speech-based cognitive task, i.e., verbal fluency, are explored and evaluated to be an appropriate screening task. Due to a lack of available patient data in most languages, world-first multilingual approaches to detecting dementia are introduced in this thesis. Results are encouraging and clear benefits on a small French dataset become visible. Lastly, the task of detecting these people with dementia who also suffer from an affective disorder called apathy is explored. Since they are more likely to convert into later stage of dementia faster, it is crucial to identify them. These are the first experiments that consider this task using solely speech and language as inputs. Results are again encouraging, both using only speech or language data elicited using emotional questions. Overall, strong results encourage further research in establishing speech-based biomarkers for early detection and monitoring of these disorders to better patients’ lives.

Designing Tactile Experiences for Immersive Virtual Environments
(Advisor: Prof. Antonio Krüger)
Tuesday, 21.03.23, 14:00 h, building D3 2, Reuse meeting room

Designing for the sense of touch is essential in creating convincing and realistic experiences in Virtual Reality (VR). Currently, a variety of methods exist for simulating touch experiences. However, developing effective and convincing haptic feedback still remains challenging. In this work, we study how real-world touch experiences can inform haptic design processes for VR. Firstly, we investigate the reproduction of haptic features by capturing and fabricating surface microgeometry. We show that haptic reproduction is able to create a wide range of feel aesthetics. Furthermore, we build upon procedural design by generating and fabricating haptically-varying surface structures. We show that digital design processes are able to generate flexible and universal structures that directly influence tactile dimensions, such as roughness and hardness. Lastly, we investigate correspondences between different sensory modalities to enhance the design of tactile experiences. We show that vocal expressions can translate a designer’s intent into effective haptic feedback, while providing a rapid in-situ design process. This thesis advances the fields of VR, haptic design, and fabrication by contributing knowledge to the question of how effective tactile experiences can be designed.

Anna HAKE (née Feldmann)
Predicting and analyzing HIV-1 adaptation to broadly neutralizing antibodies and the host immune system using machine learning
(Advisors: Prof. Nico Pfeifer, now Uni Tübingen)
Monday, 20.03.23, 14:00 h, building E1 4, Rm 0.24

With neither a cure nor a vaccine at hand, infection with the human immunode-ficiency virus type 1 (HIV-1) is still a major global health threat. Viral control is usually gained using lifelong therapy with antiretroviral drugs and rarely by the immune system alone. Without drug exposure, interindividual differences in viral control are partly influenced by host genetic factors like the human leukocyte antigen (HLA) system, and viral genetic factors like the predominant coreceptor usage of the virus. Thanks to its extraordinarily high mutation and replication rate, HIV-1 is however able to rapidly adapt to the selection pressure imposed by the host immune system or antiretroviral drug exposure.
For a successful control of the virus, it is thus vital to have fast and reliable methods in place that assess the viral adaptation to drugs of interest prior to their (further) administration. For a better assessment of our ability to control the virus, it is also important to estimate the viral adaptation to the host immune system.
In this talk, I will present four studies all aiming to further our understanding of HIV-1 adaptation and our ability to reliably predict it. In particular, we present a SVM approach to predict HIV adaptation to broadly neutralizing antibodies (bNAbs), a promising new treatment option. In addition, we use statistical learn-ing to further characterize antibody-mediated therapy with the promising bNAb 3BNC177 by investigating its ability (i) to suppress the virus and (ii) to boost the immune system. Finally, I will introduce a novel way to predict HIV-1 adaptation to the host immune system using Bayesian generalized linear mixed models, which allowed us to investigate the relationship between HIV-1 coreceptor usage and its adaptation to the host HLA system.

Bharat Lal Bhatnagar
Modelling 3D Humans: Pose, Shape, Clothing and Interactions
(Advisors: Prof. Gerard Pons-Moll, now Uni Tübingen)
Thursday, 16.03.23, 18:00 h, building E1 4, Rm 0.24

Digital humans are increasingly becoming a part of our lives with applications like animation, gaming, virtual try-on, Metaverse and much more. In recent years there has been a great push to make our models of digital humans as real as possible. In this thesis we present methodologies to model two key characteristics of real humans, their „appearance“ and „actions“. To this end, we discuss what are the best representations for humans, clothing and their interactions with their surroundings? How can we extract human appearance cues like pose, shape and clothing from scans, point clouds and images? How can we capture and in-turn model human-object interaction? and more

On a Notion of Abduction and Relevance for First-Order Logic Clause Sets
(Advisors: Prof. Christoph Weidenbach and Dr. Sophie Tourret)
Thursday, 09.03.23, 14:00 h, building E1 4, Rm 0.24

I propose techniques to help explain entailment and non-entailment in first-order logic. For entailment, I classify clauses necessary for any possible deduction (syntactically relevant), usable for some deduction (syntactically semi-relevant), or unusable (syntactically irrelevant) along with a semantic characterization via conflict literals (contradictory simple facts). This offers a novel insight beyond the existing notion of minimal unsatisfiable set. The need to test if a clause is syntactically semi-relevant leads to a generalization of the completeness result of a well-known resolution strategy: resolution with the set-of-support (SOS) strategy is refutationally complete on a clause set N and SOS M if and only if there is a resolution refutation from N ∪ M using a clause in M. For non-entailment, abductive reasoning helps find extensions of a knowledge base to obtain an entailment of some missing consequence. I focus on EL TBox abduction that is lightweight but prevalent in practice. The solution space can be huge so, to help sort the chaff from the grain, I introduce connection-minimality, a criterion such that accepted hypotheses always immediately relate the observation to the given axioms. I show that such hypotheses are computable using prime implicate-based abduction in first-order logic. I evaluate this notion on ontologies from the medical domain using an implementation with SPASS as a prime implicate generation engine.

Towards Enabling Cross-layer Information Sharing to Improve Today’s Content Delivery Systems
(Advisor: Prof. Anja Feldmann)
Thursday, 02.03.23, 15:00 h, building E1 4, Rm 0.24

Content is omnipresent and without content the Internet would not be what it is today. End users consume content throughout the day, from checking the latest news on Twitter in the morning, to streaming music in the background (while working), to streaming movies or playing online games in the evening, and to using apps (e.g., sleep trackers) even while we sleep in the night. All of these different kinds of content have very specific and different requirements on a transport—on one end, online gaming often requires a low latency connection but needs little throughput, and, on the other, streaming a video requires high throughput, but it performs quite poorly under packet loss. Yet, all content is transferred opaquely over the same transport, adhering to a strict separation of network layers. Even a modern transport protocol such as Multi-Path TCP, which is capable of utilizing multiple paths, cannot take the (above) requirements or needs of that content into account for its path selection. In this work we challenge the layer separation and show that sharing information across the layers is beneficial for consuming web and video content. To this end, we created an event-based simulator for evaluating how applications can make informed decisions about which interfaces to use delivering different content based on a set of pre-defined policies that encode the (performance) requirements or needs of that content. Our policies achieve speedups of a factor of two in 20% of our cases, have benefits in more than 50%, and create no overhead in any of the cases. For video content we created a full streaming system that allows an even finer grained information sharing between the transport and the application. Our streaming system, called VOXEL, enables applications to select dynamically and on a frame granularity which video data to transfer based on the current network conditions. VOXEL drastically reduces video stalls in the 90th-percentile by up to 97% while not sacrificing the stream’s visual fidelity. We confirmed our performance improvements in a real-user study where 84% of the participants clearly preferred watching videos streamed with VOXEL over the state-of-the-art.


Johannes BUND
Hazard-Free Clock Synchronization
(Advisor: Dr. Christoph Lenzen)
Tuesday, 28.02.23, 13:00 h, building E1 4, Rm 0.24

The growing complexity of microprocessors makes it infeasible to distribute a single clock source over the whole processor with small clock skew. Hence, chips are split into multiple clock regions, which are each covered by a single clock source. This poses a problem for communication between these clock regions. Clock synchronization algorithms promise an advantage over state-of-the-art solutions, such as GALS systems. When clock regions are synchronous the communication latency improves significantly over handshake-based solutions. We focus on implementation of clock synchronization algorithms.
A major obstacle when implementing circuits on clock domain crossings are hazardous signals. Extending the Boolean logic by a third value ‚u‘ we can formally define hazards. In this thesis we describe a theory for design and analysis of hazard-free circuits. We develop strategies for hazard-free encoding and construction of hazard-free circuits from finite state machines. Furthermore, we discuss clock synchronization algorithms and a possible combination of them.

Said Jawad SAIDI
Characterizing the IoT Ecosystem at Scale
(Advisor: Prof. Anja Feldmann)
Friday, 24.02.23, 15:00 h, building E1 4, Rm 0.24

Internet of Things (IoT) devices are extremely popular with home, business, and industrial users. To provide their services, they typically rely on a backend server infrastructure on the Internet, which collectively form the IoT Ecosystem. This ecosystem is rapidly growing and offers users an increasing number of services. It also has been a source and target of significant security and privacy risks. One notable example is the recent large-scale coordinated global attacks, like Mirai, which disrupted large service providers. Thus, characterizing this ecosystem yields insights that help end-users, network operators, policymakers, and researchers better understand it, obtain a detailed view, and keep track of its evolution. In addition, they can use these insights to inform their decision-making process for mitigating this ecosystem’s security and privacy risks. In this dissertation, we characterize the IoT ecosystem at scale by (i) detecting the IoT devices in the wild, (ii) conducting a case study to measure how deployed IoT devices can affect users’ privacy, and (iii) detecting and measuring the IoT backend infrastructure.
To conduct our studies, we collaborated with a large European Internet Service Provider (ISP) and a major European Internet eXchange Point (IXP). They routinely collect large volumes of passive, sampled data, e.g., NetFlow and IPFIX, for their operational purposes. These data sources help providers obtain insights about their networks, and we used them to characterize the IoT ecosystem at scale.
We start with IoT devices and study how to track and trace their activity in the wild. We developed and evaluated a scalable methodology to accurately detect and monitor IoT devices with limited, sparsely sampled data in the ISP and IXP.
Next, we conduct a case study to measure how a myriad of deployed devices can affect the privacy of ISP subscribers. Unfortunately, we found that the privacy of a substantial fraction of IPv6 end-users is at risk. We noticed that a single device at home that encodes its MAC address into the IPv6 address could be utilized as a tracking identifier for the entire end-user prefix—even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage.
Finally, we focus on the backend server infrastructure and propose a methodology to identify and locate IoT backend servers operated by cloud services and IoT vendors. We analyzed their IoT traffic patterns as observed in the ISP. Our analysis sheds light on their diverse operational and deployment strategies.
The need for issuing a priori unknown network-wide queries against large volumes of network flow capture data, which we used in our studies, motivated us to develop Flowyager. It is a system built on top of existing traffic capture utilities, and it relies on flow summarization techniques to reduce (i) the storage and transfer cost of flow captures and (ii) query response time. We deployed a prototype of Flowyager at both the IXP and ISP.


Yaoyao LIU
Learning from Imperfect Data: Incremental Learning and Few-shot Learning
(Advisor: Prof. Bernt Schiele)
Friday, 27.01.23, 16:30 h, building E1 4, Rm 0.24

In recent years, artificial intelligence (AI) has achieved great success in many fields. Although impressive advances have been made, AI algorithms still suffer from an important limitation: they rely on static and large-scale datasets. In contrast, human beings naturally possess the ability to learn novel knowledge from imperfect real-world data such as a small number of samples or a non-static continual data stream. Attaining such an ability is particularly appealing and will push the AI models one step further toward human-level Intelligence. In this talk, I will present my work on addressing these challenges in the context of class-incremental learning and few-shot learning. Specifically, I will first discuss how to get better exemplars for class-incremental learning based on optimization. I parameterize exemplars and optimize them in an end-to-end manner to obtain high-quality memory-efficient exemplars. I will present my work on how to apply incremental techniques to a more challenging and realistic scenario, object detection. I will provide algorithm design on a transformer-based incremental object detection framework. I will briefly mention my work on addressing other challenges and discuss future research directions.

Dominik KIRST
Mechanised Metamathematics: An Investigation of First-Order Logic and Set Theory in Constructive Type Theory
(Advisor: Prof. Gert Smolka)
Friday, 27.01.23, 15:15 h, building E1 1, Rm 4.07

In this thesis, we investigate several key results in the canon of metamathematics, applying the contemporary perspective of formalisation in constructive type theory and mechanisation in the Coq proof assistant. Concretely, we consider the central completeness, undecidability, and incompleteness theorems of first-order logic as well as properties of the axiom of choice and the continuum hypothesis in axiomatic set theory. Due to their fundamental role in the foundations of mathematics and their technical intricacies, these results have a long tradition in the codification as standard literature and, in more recent investigations, increasingly serve as a benchmark for computer mechanisation.
With the present thesis, we continue this tradition by uniformly analysing the aforementioned cornerstones of metamathematics in the formal framework of constructive type theory. This programme offers novel insights into the constructive content of completeness, a synthetic approach to undecidability and incompleteness that largely eliminates the notorious tedium obscuring the essence of their proofs, as well as natural representations of set theory in the form of a second-order axiomatisation and of a fully type-theoretic account. The mechanisation concerning first-order logic is organised as a com-prehensive Coq library open to usage and contribution by external users.

Following the trail of cellular signatures: Computational methods for the analysis of molecular high-throughput profiles
(Advisor: Prof. Hans-Peter Lenhof)
Friday, 13.01.23, 11:00 h, building E2 1, Rm 007

Over the last three decades, high-throughput techniques, such as next- generation sequencing, microarrays, or mass spectrometry, have revolutionized biomedical research by enabling scientists to generate detailed molecular profiles of biological samples on a large scale. These profiles are usually complex, high-dimensional, and often prone to technical noise, which makes a manual inspection practically impossible. Hence, powerful computational methods are required that enable the analysis and exploration of these data sets and thereby help researchers to gain novel insights into the underlying biology.
In this thesis, we present a comprehensive collection of algorithms, tools, and databases for the integrative analysis of molecular high- throughput profiles. We developed these tools with two primary goals in mind. The detection of deregulated biological processes in complex diseases, like cancer, and the identification of driving factors within those processes.
Our first contribution in this context are several major extensions of the GeneTrail web service that make it one of the most comprehen- sive toolboxes for the analysis of deregulated biological processes and signaling pathways. GeneTrail offers a collection of powerful enrichment and network analysis algorithms that can be used to examine genomic, epigenomic, transcriptomic, miRNomic, and proteomic data sets. In addition to approaches for the analysis of individual -omics types, our framework also provides functionality for the integrative analysis of multi-omics data sets, the investigation of time-resolved expression profiles, and the exploration of single-cell experiments. Besides the analysis of deregulated biological processes, we also focus on the identification of driving factors within those processes, in particular, miRNAs and transcriptional regulators.
For miRNAs, we created the miRNA pathway dictionary database miRPathDB, which compiles links between miRNAs, target genes, and target pathways. Furthermore, it provides a variety of tools that help to study associations between them. For the analysis of transcriptional regulators, we developed REGGAE, a novel algorithm for the identification of key regulators that have a significant impact on deregulated genes, e.g., genes that show large expression differences in a comparison between disease and control samples. To analyze the influence of transcriptional regulators on deregulated biological processes we also created the RegulatorTrail web service. In addition to REGGAE, this tool suite compiles a range of powerful algorithms that can be used to identify key regulators in transcriptomic, proteomic, and epigenomic data sets.
Moreover, we evaluate the capabilities of our tool suite through several case studies that highlight the versatility and potential of our framework. In particular, we used our tools to conducted a detailed analysis of a Wilms’ tumor data set. Here, we could identify a circuitry of regulatory mechanisms, including new potential biomarkers, that might contribute to the blastemal subtype’s increased malignancy, which could potentially lead to new therapeutic strategies for Wilms’ tumors.
In summary, we present and evaluate a comprehensive framework of powerful algorithms, tools, and databases to analyze molecular high-throughput profiles. The provided methods are of broad inter- est to the scientific community and can help to elucidate complex pathogenic mechanisms.