Have a look :)
arxiv.org/abs/2505.20209
LLM agents performing real-world tasks should be able to combine these different types of reasoning, but are they fit for the job? 🤔
🧵⬇️
The good:
- Americans are the most charming, friendly and hospitable people
- it’s super fun how the country is split into states that all have different laws and stuff, with different vibes state to state
If you have any UK-based collaborations, their productivity is about to increase like 10-fold
It will improve the paper for sure, but it will probably also make the tone a whole lot more annoying
I will be presenting this work on Wednesday at the 11-12:30 poster session on Interpretability & analysis for language models (Hall 3).
aclanthology.org/2025.naacl-l...
Gonna go binge-watch the 13 seasons now 😍
Is it a good thing for authors or reviewers that the responses can be so long? I feel like it’s a bit sub-optimal for both at the moment
And loved visiting London+Edinburgh this week, hope to be back soon! 🙏
When LLMs learn from previous incorrect answers, they typically observe corrective feedback in the form of rationales explaining each mistake. In our new preprint, we find these rationales do not help; in fact, they hurt performance!
🧵
www.genai.ac.uk
E.g. for an entailment NLI example, each hypothesis atom should also be entailed by the premise.
Very nice idea 👏👏
Crossed into Aqaba (Jordan) yesterday, so now onto Saudi 🙂
I’ve got that feeling of nervous excitement I always get before a trip 😬😁
No offence to Vienna, but Albuquerque sounds way more fun 😉
"ModernBERT-base is the first encoder to beat DeBERTaV3-base since its release in 2021" 🤯- arxiv.org/pdf/2412.13663
Pretty amazing how successful DeBERTa has been!
I'll try my best and see if I can get 100% of my reviews to be 'great' this round.
If you didn't see it already, ARR publishes how many of your reviews are considered to be 'great': stats.aclrollingreview.org
Join me for the challenge :)
Here are my top ten train journeys so far.
Here's a little thread about why you should consider applying :)