https://daviddebot.github.io/
Site: dtai.cs.kuleuven.be/projects/nes...
Video: youtu.be/3uLVxwlcSQc?...
@daviddebot.bsky.social, @gabventurato.bsky.social, @giuseppemarra.bsky.social, @lucderaedt.bsky.social
#ReinforcementLearning #AI #MachineLearning #NeurosymbolicAI
(8/8)
🔷 Code: github.com/ML-KULeuven/...
🔷 Based on MiniHack & Stable Baselines3
🔷 Define new shields in just a few lines of code!
🚀 Let’s make RL safer & smarter, together!
(7/8)
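As a rough illustration of the shielding idea behind the repo (this is a hand-rolled sketch, not the actual API of the released code): a probabilistic logic shield can be viewed as reweighting the policy's action probabilities by each action's probability of being safe, in the spirit of Yang et al.'s probabilistic logic shielding. The function name `shield` and the array layout are assumptions for illustration.

```python
import numpy as np

def shield(policy_probs, safety_probs):
    """Reweight action probabilities by each action's probability of being safe.

    Sketch of the probabilistic-logic-shielding idea (Yang et al.):
    pi_shielded(a|s) is proportional to P(safe | s, a) * pi(a|s).
    Names and shapes here are illustrative, not the repo's API.
    """
    weighted = policy_probs * safety_probs
    total = weighted.sum()
    if total == 0.0:  # no action deemed safe: fall back to the base policy
        return policy_probs
    return weighted / total

# Example: action 2 is almost certainly unsafe (e.g. it steps into lava),
# so its mass is redistributed over the remaining safe actions.
probs = shield(np.array([0.25, 0.25, 0.25, 0.25]),
               np.array([1.0, 1.0, 0.0, 1.0]))
```

"Defining a new shield" then amounts to supplying the logical rules that produce `safety_probs` for a given state.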
Use our interactive web demo!
🔷 Modify environments (add lava, monsters!)
🔷 Test shielded vs. non-shielded agents
🖥️ Play with it here: dtai.cs.kuleuven.be/projects/nes...
(6/8)
🔷 Faster training ⌛
🔷 Safer exploration 🔒
🔷 Better generalization 🌍
(5/8)
The shield:
✅ Exploits symbolic data from sensors 🌍
✅ Uses logical rules 📜
✅ Prevents unsafe actions 🚫
✅ Still allows flexible learning 🤖
A perfect blend of symbolic reasoning & deep learning!
(4/8)
There, RL agents face:
✅ Lava cliffs & slippery floors
✅ Chasing monsters
✅ Locked doors needing keys
Findings:
🔷 Standard RL struggles to find an optimal, safe policy.
🔷 Shielded RL agents stay safe & learn faster!
(3/8)
⚠️ It can take dangerous actions
⚠️ It lacks safety guarantees
⚠️ It struggles with real-world constraints
Yang et al.'s probabilistic logic shields fix this, enforcing safety without breaking learning efficiency! 🚀
(2/8)
1) Predict concepts from the input
2) Neurally select a rule from a memory of learned logic rules ➨ Accuracy
3) Evaluate the selected rule with the concepts to make a final prediction ➨ Interpretability (4/7)
⚡ State-of-the-art accuracy that rivals black-box models
🚀 Pure probabilistic semantics with linear-time exact inference
👁️ Transparent decision-making so human users can interpret model behavior
🛡️ Pre-deployment verifiability of model properties (3/7)