hadasorgad.bsky.social
• Model Innovation – Designs and training inspired by interpretability.
• Impact Measurement – Benchmarks for real-world effectiveness.
• Critical Perspectives – Feasibility, limits, and future directions.

Website >>> actionable-interpretability.github.io
General Information
ICML 2025 - Vancouver
March 31, 2025 at 5:06 PM
• Real-world Applications – Tackling bias, hallucinations, adversarial threats, and use in critical domains like healthcare, finance, and cybersecurity.
• Method Comparison – Interpretability vs. alternative methods such as fine-tuning and prompting.
March 31, 2025 at 5:05 PM
We aim to foster discussions on how interpretability research can inform concrete improvements in model design, safety, and robustness.

Topics of interest: ⬇️
March 31, 2025 at 5:05 PM