Lightnews — Scholar-powered news

Arduin Findeis

@arduin.io

Working on evaluation of AI models (via human and AI feedback) | PhD candidate @cst.cam.ac.uk

Web: https://arduin.io
Github: https://github.com/rdnfn
Latest project: https://app.feedbackforensics.com

Pinned

Arduin Findeis @arduin.io · Mar 17

🕵🏻💬 Introducing Feedback Forensics: a new tool to investigate pairwise preference data.

Feedback data is notoriously difficult to interpret and has many known issues – our app aims to help!

Try it at app.feedbackforensics.com

Three example use-cases 👇🧵

Reposted by Arduin Findeis

Tiancheng Hu

@tiancheng.bsky.social

Can AI simulate human behavior? 🧠
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)

October 28, 2025 at 4:54 PM

Arduin Findeis

@arduin.io

👋 I'll be at #ACL2025 presenting research from my Apple internship! Our poster is titled: "Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?"

☞ Let's meet: come by our poster on Tuesday (29/7), 10:30 - 12:00, Hall 4/5, or DM me to set up a meeting!

✍︎ Paper link below ↓

Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?

Pairwise preferences over model responses are widely collected to evaluate and provide feedback to large language models (LLMs). Given two…

machinelearning.apple.com

July 27, 2025 at 3:22 PM

Arduin Findeis

@arduin.io

Excited to be in Singapore for ICLR! Keen to chat about interpreting feedback data and detecting model characteristics ⚖️

Reach out or come by our poster on Inverse Constitutional AI on Friday 25 April from 10am-12.30pm (#520 in Hall 2B) - @timokauf.bsky.social and I will be there!

April 24, 2025 at 3:47 PM

Arduin Findeis

@arduin.io

How exactly was the initial Chatbot Arena version of Llama 4 Maverick different from the public HuggingFace version?🕵️

I used our Feedback Forensics app to quantitatively analyse how exactly these two models differ. An overview…👇🧵

April 17, 2025 at 1:55 PM

Arduin Findeis

@arduin.io

March 17, 2025 at 6:12 PM
