Research
We study the security, privacy, and reliability of modern AI systems, combining theoretical analysis with practical attacks and defenses.
Trustworthy Machine Learning
We study how machine learning systems can be made robust, reliable, and secure. Our work covers adversarial examples, data poisoning attacks and defenses, backdoor attacks, and hardware fault attacks on neural networks.
- Handcrafted Backdoors in Deep Neural Networks — NeurIPS 2022 ★ Oral
- A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference — ICLR 2021 ★ Spotlight
- Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks — USENIX Sec. 2019
Privacy in Machine Learning
We investigate privacy risks in foundation models and machine learning pipelines, including membership inference attacks, data extraction from LLMs, and privacy-preserving training methods.
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models — NeurIPS 2024
- Private Investigator: Extracting Personally Identifiable Information from Large Language Models Using Optimized Prompts — USENIX Sec. 2025
- Evaluating Memorization in Parameter-Efficient Fine-tuning — ICML-W 2025 ★ Oral
LLM and Agentic AI Security
As large language models and AI agents become more capable and widely deployed, we study how they can be attacked and how to defend them. Topics include jailbreaking, indirect prompt injection, and time-of-check to time-of-use (TOCTOU) vulnerabilities in agents.
- IF-Guide: Influence Function-Guided Detoxification of LLMs — NeurIPS 2025
- PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
- AgentBreaker: A Framework for Evaluating Context-Aware Indirect Prompt Injection Risks in Modern Web Agents — ISSTA 2026
- Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents — NeurIPS-W 2025
Efficient and Reliable AI
We explore the intersection of efficiency and reliability in AI systems: how model compression (quantization, early exits, pruning) affects robustness, and how AI systems behave under hardware faults.