We propose an architecture with a "scratchpad" that:
✅ Persists across turns
✅ Is readable/writable by the agent
✅ Is NEVER shown to the user
This Generative-Retention Loop allows agents to finally play Hangman correctly.
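The scratchpad idea can be pictured with a tiny sketch. This is purely illustrative (the class and method names are ours, not the paper's API): the agent keeps the secret in private state it can read and write across turns, while the user only ever sees the masked word.

```python
# Illustrative sketch of a private working memory ("scratchpad") for Hangman.
# All names are hypothetical, not the paper's actual implementation.

class HangmanAgent:
    def __init__(self, secret: str):
        # Private scratchpad: persists across turns, never shown to the user.
        self._scratchpad = {"secret": secret, "guessed": set()}

    def guess(self, letter: str) -> str:
        # The agent updates its private state but only reveals the masked word,
        # so the secret stays consistent without ever entering the visible context.
        self._scratchpad["guessed"].add(letter)
        secret = self._scratchpad["secret"]
        return "".join(
            c if c in self._scratchpad["guessed"] else "_" for c in secret
        )

agent = HangmanAgent("cad")
print(agent.guess("a"))  # _a_
print(agent.guess("c"))  # ca_
```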
1️⃣ If the secret is in the context, then it’s leaked (violates Secrecy).
2️⃣ If it’s NOT in the context, then the model forgets it (violates Consistency).
Check out “LLMs Can’t Play Hangman: On the Necessity of a Private Working Memory for Language Agents”, led by Davide Baldelli, Ali Parviz, Amal Zouaq and Sarath Chandar.
📝 Paper: arxiv.org/abs/2507.09792
💻 Code: github.com/chandar-lab/...
🤗 Models & Dataset: huggingface.co/collections/...
🌐 Website: chandar-lab.github.io/cadmium-webs...
📋 Blog post: t.co/c3b6U3bIWl
⬆️ Higher F1 on lines, circles, extrusions
⬆️ Better Chamfer & curvature metrics
⬆️ Stronger structural similarity
We avoid extensive pre-training, customized embeddings, and other domain-specific strategies.
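For readers unfamiliar with the geometric metrics above, here is one common formulation of the symmetric Chamfer distance between two point clouds sampled from the predicted and ground-truth shapes. This is a generic sketch; the exact variant and sampling used in the paper may differ.

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3).

    A common formulation: for each point, find its nearest neighbor in the
    other cloud, then average both directions. Lower is better.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
chamfer_distance(a, b)  # 0.0 for identical clouds
```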
📈 Shorter, clearer, and more natural-sounding
📈 Richer vocabulary (2× more unique words)
📈 Preferred by humans for readability & accuracy
📊 Plots illustrating better lexical diversity and conciseness!
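Vocabulary richness claims like “2× more unique words” are typically measured by counting token types versus total tokens. A minimal sketch, assuming simple whitespace tokenization (the paper’s evaluation pipeline may tokenize differently):

```python
def lexical_stats(text: str) -> tuple[int, int]:
    """Return (total tokens, unique tokens) for a description.

    Simple lowercase whitespace tokenization, for illustration only.
    """
    tokens = text.lower().split()
    return len(tokens), len(set(tokens))

n_tokens, n_unique = lexical_stats("A flat circular plate with a central hole.")
print(n_tokens, n_unique)  # 8 7  ("a" appears twice)
```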
✅ GPT-4.1 generates concise, expert-level, human-like geometric descriptions of 176k models.
✅ Qwen2.5-Coder (fine-tuned with LoRA) translates them back into CAD sequences in JSON format.
✅ New metrics to evaluate the structural and topological characteristics.
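To make the pipeline concrete, here is a hypothetical training pair in the spirit of this setup: a natural-language description paired with a JSON CAD sequence. The schema below is invented for illustration; the real CADmium format is defined in the repository.

```python
import json

# Hypothetical description -> CAD-sequence pair (schema is illustrative only).
description = "Sketch a 10 mm circle on the XY plane and extrude it 5 mm."
cad_sequence = {
    "sketch": {"plane": "XY", "curves": [{"type": "circle", "radius": 10.0}]},
    "extrude": {"distance": 5.0, "operation": "new_body"},
}

# Serialize as one JSONL training record: text in, CAD sequence out.
record = json.dumps({"text": description, "cad": cad_sequence})
```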
🤖 Meet CADmium - a new dataset and fine-tuning framework for solving the text-to-CAD problem.