Zihao Zhao
@zihaozhao.bsky.social
PhD student @jhuclsp.bsky.social | AI safety & privacy
Previous: Undergrad @jhucompsci.bsky.social
4/5 📈 Utility
On TAB, prefix-tuning+masking gives best utility (Perplexity ≈ 10.2, MAUVE ≈ 0.83), beating ICL and DP-SGD. Similar trends on MIMIC-III.
October 15, 2025 at 8:24 PM
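A minimal sketch of how a perplexity number like the one above can be computed with Hugging Face transformers; the model name ("gpt2") and example text are placeholders, and the paper's actual evaluation setup on TAB / MIMIC-III may differ.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper's evaluation model and datasets (TAB, MIMIC-III) differ.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Token-level perplexity: exp of the mean next-token negative log-likelihood.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

print(perplexity("The court finds that the applicant's complaint is admissible."))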
3/5 🔒 Privacy
ICL+blocking: ~0.00% privacy leakage (avg in our runs).
Prefix-tuning+masking yields the lowest ROUGE vs training data (e.g., ROUGE-L ≈ 0.098), indicating less copying.
October 15, 2025 at 8:24 PM
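A minimal sketch of the copying check described above, assuming the rouge-score package; the toy training documents and the max-over-documents aggregation are illustrative assumptions, not necessarily the paper's exact protocol.

from rouge_score import rouge_scorer

# Toy stand-ins; the paper compares generations against TAB / MIMIC-III training documents.
train_docs = ["John Smith was admitted to St. Mary's Hospital on 3 March 2019."]
generated = "The patient was admitted to a hospital in early March."

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
# Copying proxy: the highest ROUGE-L F1 the generation achieves against any training document.
leakage = max(scorer.score(doc, generated)["rougeL"].fmeasure for doc in train_docs)
print(f"max ROUGE-L vs. training data: {leakage:.3f}")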
2/5 🔧 How it works
• Build control codes from detected private entities (PERSON, ORG, LOC, etc.).
• Generate with either ICL (and block those identifiers at decode time) or prefix-tuning with a privacy mask + KL/contrastive losses (rough sketch after this post).
October 15, 2025 at 8:24 PM
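A minimal sketch of the two bullets above, using spaCy NER to derive the control codes and Hugging Face's bad_words_ids argument for decode-time blocking; the prompt format, control-code tokens, and model ("gpt2") are placeholder assumptions rather than the paper's actual setup.

import spacy
from transformers import AutoModelForCausalLM, AutoTokenizer

nlp = spacy.load("en_core_web_sm")   # off-the-shelf NER; the paper's entity detector may differ
tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

source = "John Smith of Acme Corp was treated at Johns Hopkins Hospital in Baltimore."
ents = [(e.label_, e.text) for e in nlp(source).ents
        if e.label_ in {"PERSON", "ORG", "GPE", "LOC"}]

# Step 1: entity-aware control codes, e.g. "<PERSON> <ORG> <GPE>" (format is an assumption).
control = " ".join(f"<{label}>" for label, _ in ents)
prompt = f"{control} Rewrite the text without identifiers: {source}\nRewrite:"

# Step 2: decode-time blocking of the detected identifier strings.
bad_words_ids = [tokenizer(" " + text, add_special_tokens=False).input_ids for _, text in ents]

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60, bad_words_ids=bad_words_ids,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))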
🚀 Text anonymization is hard; DP often hurts utility.
We use entity-aware control codes + either ICL (with bad-token blocking) or prefix-tuning w/ masking to get strong privacy–utility tradeoffs on legal & clinical data, outperforming DP-SGD in practice (EMNLP 2025).
www.arxiv.org/abs/2509.25729
October 15, 2025 at 8:24 PM
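A rough sketch of the other branch (prefix-tuning with a privacy mask plus a KL term), under my own assumptions about how the pieces combine: cross-entropy is zeroed on private-entity token positions, and a KL penalty keeps the tuned model close to a frozen base model. The mask construction, the contrastive term, the kl_weight, and the actual prefix-tuning parameterization are all placeholders here; see the paper for the real objective.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tuned = AutoModelForCausalLM.from_pretrained("gpt2")        # stand-in for the prefix-tuned model
base = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # frozen reference for the KL term

def masked_lm_loss(text, private_spans, kl_weight=0.1):
    # Cross-entropy on non-private tokens only, plus a KL penalty toward the frozen base model.
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    ids, offsets = enc["input_ids"], enc["offset_mapping"][0]
    # Privacy mask: 0 for tokens overlapping a private character span, 1 elsewhere.
    keep = torch.tensor([[0.0 if any(s < pe and e > ps for ps, pe in private_spans) else 1.0
                          for s, e in offsets.tolist()]])
    logits = tuned(ids).logits
    with torch.no_grad():
        base_logits = base(ids).logits
    lp, labels, m = logits[:, :-1], ids[:, 1:], keep[:, 1:]   # shift for next-token prediction
    ce = F.cross_entropy(lp.reshape(-1, lp.size(-1)), labels.reshape(-1), reduction="none")
    ce = (ce * m.reshape(-1)).sum() / m.sum().clamp(min=1)
    kl = F.kl_div(F.log_softmax(lp, dim=-1), F.softmax(base_logits[:, :-1], dim=-1),
                  reduction="batchmean")
    return ce + kl_weight * kl

# "John Smith" occupies characters 0-10 of the toy example below.
print(masked_lm_loss("John Smith was admitted on 3 March.", private_spans=[(0, 10)]).item())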