Research | TRUE-AI Lab

Trustworthy Machine Learning

We study how machine learning systems can be made robust, reliable, and secure. Our work covers adversarial examples, data poisoning attacks and defenses, backdoor attacks, and hardware fault attacks on neural networks.

Handcrafted Backdoors in Deep Neural Networks — NeurIPS 2022 ★ Oral
A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference — ICLR 2021 ★ Spotlight
Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks — USENIX Sec. 2019

Privacy in Machine Learning

We investigate privacy risks in foundation models and machine learning pipelines, including membership inference attacks, data extraction from LLMs, and privacy-preserving training methods.

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models — NeurIPS 2024
Private Investigator: Extracting Personally Identifiable Information from Large Language Models Using Optimized Prompts — USENIX Sec. 2025
Evaluating Memorization in Parameter-Efficient Fine-tuning — ICML-W 2025 ★ Oral

LLM and Agentic AI Security

As large language models and AI agents become more capable and widely deployed, we study how they can be attacked and how to defend them. Topics include jailbreaking, indirect prompt injection, and time-of-check to time-of-use (TOCTOU) vulnerabilities in agents.

IF-Guide: Influence Function-Guided Detoxification of LLMs — NeurIPS 2025
PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
AgentBreaker: A Framework for Evaluating Context-Aware Indirect Prompt Injection Risks in Modern Web Agents — ISSTA 2026
Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents — NeurIPS-W 2025

Efficient and Reliable AI

We explore the intersection of efficiency and reliability in AI systems, studying how model compression (quantization, early-exit, pruning) affects robustness, and how AI systems behave under hardware faults.

Harnessing Input-adaptive Inference for Efficient VLN — ICCV 2025
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions — ICML 2024
Qu-ANTI-zation: Exploiting Neural Network Quantization for Achieving Adversarial Outcomes — NeurIPS 2021