Turan Orujlu
@turanorujlu.bsky.social
PhD Student @unituebingen.bsky.social. Interested in intuitive physics, world models, causality, and reinforcement learning.
We are excited to share that this work has been accepted for an oral presentation at the Causal Reinforcement Learning Workshop at RLC 2025.
@thecharleywu.bsky.social

Read the full pre-print here:
arxiv.org/abs/2507.13920
Reframing attention as a reinforcement learning problem for causal discovery
July 22, 2025 at 6:59 PM
We also tested CPM's usefulness as a world model for a model-based RL agent. The agent's task was to move an object to a target location. Our CPM-based agent (red) broadly outperformed baselines (especially in the challenging "Unobserved" setting), achieving higher mean rewards.
July 22, 2025 at 6:59 PM
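For readers unfamiliar with how a learned dynamics model is used for control, here is a minimal random-shooting planner sketch: imagine rollouts with the model, score them by distance to the target, and execute the first action of the best candidate. The `world_model` interface, the distance-based reward, and the toy dynamics are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def plan_with_world_model(world_model, state, target, n_candidates=256,
                          horizon=10, action_dim=2, rng=None):
    """Pick the first action of the candidate sequence whose imagined
    rollout stays closest to the target (random-shooting MPC sketch)."""
    rng = rng or np.random.default_rng()
    # Sample candidate action sequences: (n_candidates, horizon, action_dim)
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))

    returns = np.zeros(n_candidates)
    for i in range(n_candidates):
        s = state.copy()
        for t in range(horizon):
            s = world_model(s, actions[i, t])         # imagined next state
            returns[i] -= np.linalg.norm(s - target)  # reward = -distance
    best = np.argmax(returns)
    return actions[best, 0]  # execute only the first action (MPC style)

# Toy stand-in for a learned model: the state is an (x, y) position and
# the action is a force that nudges it (purely illustrative dynamics).
toy_model = lambda s, a: s + 0.1 * a

state, target = np.zeros(2), np.array([1.0, 1.0])
for _ in range(50):
    a = plan_with_world_model(toy_model, state, target, horizon=5)
    state = toy_model(state, a)
print("final distance to target:", np.linalg.norm(state - target))
```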
We tested our model in a simple physics environment with 'Observed' & 'Unobserved' settings. The plots show CPM (red) has higher prediction accuracy (H@1) than GNN & Modular (separate transition MLP per slot) baselines. The performance gap widens over longer prediction horizons.
July 22, 2025 at 6:59 PM
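As a concrete reading of the H@1 (Hits-at-1) metric: a prediction counts as a hit when, among a pool of candidate ground-truth states, its own ground truth is the nearest neighbour. The Euclidean distance, shapes, and toy rollout below are illustrative assumptions, not the paper's exact evaluation code.

```python
import numpy as np

def hits_at_1(pred_states, true_states):
    """pred_states, true_states: (batch, state_dim) arrays.
    A prediction is a 'hit' when its nearest neighbour among all
    ground-truth states in the batch is its own ground truth."""
    dists = np.linalg.norm(pred_states[:, None, :] - true_states[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)  # index of the closest candidate
    return (nearest == np.arange(len(pred_states))).mean()

# Evaluate accuracy at increasing prediction horizons with a toy rollout.
rng = np.random.default_rng(0)
batch, dim, max_horizon = 64, 4, 10
state = rng.normal(size=(batch, dim))
pred = state.copy()
for h in range(1, max_horizon + 1):
    state = state + 0.05 * rng.normal(size=state.shape)      # "true" dynamics
    pred = pred + 0.05 * rng.normal(size=pred.shape) * 1.2   # imperfect model
    print(f"H@1 at horizon {h}: {hits_at_1(pred, state):.2f}")
```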
How does the CPM build its causal graph? We treat causal discovery as a multi-agent RL problem. As shown in the Causal MDP diagram, controller agents make sequential decisions to add edges to the graph, determining which objects interact.
July 22, 2025 at 6:59 PM
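To make the Causal MDP idea concrete, here is a toy sketch of the edge-building loop: at each step a controller picks a directed edge (i, j) to add to the adjacency matrix and is rewarded by how much the resulting graph improves a prediction score. The reward definition, the stand-in scorer, and the episode termination are illustrative assumptions rather than the paper's formulation.

```python
import numpy as np

class CausalGraphEnv:
    """Toy Causal-MDP sketch: the state is the current adjacency matrix,
    an action adds one directed edge, and the reward is the improvement
    in a (stand-in) prediction score obtained with the new graph."""

    def __init__(self, n_objects, score_fn, max_edges=None):
        self.n = n_objects
        self.score_fn = score_fn            # maps adjacency -> prediction score
        self.max_edges = max_edges or n_objects
        self.reset()

    def reset(self):
        self.adj = np.zeros((self.n, self.n), dtype=int)
        self.prev_score = self.score_fn(self.adj)
        self.steps = 0
        return self.adj.copy()

    def step(self, action):
        i, j = action                       # controller proposes edge i -> j
        self.adj[i, j] = 1
        score = self.score_fn(self.adj)
        reward = score - self.prev_score    # reward = marginal improvement
        self.prev_score = score
        self.steps += 1
        done = self.steps >= self.max_edges
        return self.adj.copy(), reward, done

# Stand-in scorer: pretend objects 0->1 and 1->2 are the true interactions.
true_adj = np.zeros((3, 3)); true_adj[0, 1] = true_adj[1, 2] = 1
score = lambda adj: -np.abs(adj - true_adj).sum()

env = CausalGraphEnv(3, score)
state = env.reset()
for edge in [(0, 1), (1, 2), (2, 0)]:       # a hand-picked "policy"
    state, r, done = env.step(edge)
    print(f"added edge {edge}, reward {r:+.0f}")
```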
Our model (see diagram) has an object-centric vision encoder to instantiate object representations and an action encoder for force representations. The core of the architecture is the CPM, which acts as a dynamic transition function, using a causal graph to predict object dynamics.
July 22, 2025 at 6:59 PM
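A minimal sketch of how these pieces could fit together in a forward pass: object slots from a vision encoder, a force embedding from an action encoder, and a transition module that only lets information flow along edges of the causal graph. The module names, sizes, and masked message passing are assumptions for illustration; they are not the paper's architecture.

```python
import torch
import torch.nn as nn

class CPMSketch(nn.Module):
    """Illustrative object-centric transition: predicts next-step slot
    states, with interactions masked by a per-example causal graph."""

    def __init__(self, slot_dim=32, action_dim=8):
        super().__init__()
        self.action_enc = nn.Linear(action_dim, slot_dim)   # force representation
        self.msg = nn.Linear(2 * slot_dim, slot_dim)        # pairwise message
        self.update = nn.GRUCell(2 * slot_dim, slot_dim)    # per-slot update

    def forward(self, slots, action, adj):
        # slots: (B, K, D) object representations from a vision encoder
        # action: (B, action_dim) applied force; adj: (B, K, K) causal graph
        B, K, D = slots.shape
        a = self.action_enc(action).unsqueeze(1).expand(B, K, D)

        # Build all pairwise messages and keep only edges present in adj.
        senders = slots.unsqueeze(2).expand(B, K, K, D)      # slot i -> ...
        receivers = slots.unsqueeze(1).expand(B, K, K, D)    # ... -> slot j
        msgs = self.msg(torch.cat([senders, receivers], dim=-1))
        msgs = (adj.unsqueeze(-1) * msgs).sum(dim=1)         # aggregate over senders

        inp = torch.cat([msgs, a], dim=-1).reshape(B * K, 2 * D)
        next_slots = self.update(inp, slots.reshape(B * K, D))
        return next_slots.reshape(B, K, D)

# Usage with random tensors standing in for encoder outputs.
model = CPMSketch()
slots = torch.randn(2, 4, 32)           # 2 scenes, 4 object slots
action = torch.randn(2, 8)
adj = torch.randint(0, 2, (2, 4, 4)).float()
print(model(slots, action, adj).shape)  # torch.Size([2, 4, 32])
```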