haokunliu.com
It's completely free and we'll try out ideas for you!
Humans pass the mirror test at ~18 months 👶
But what about LLMs? Can they recognize their own writing—or even admit authorship at all?
In our new paper, we put 10 state-of-the-art models to the test. Read on 👇
1/n 🧵
The better path is collaborative science. Let AI explore the ideas, draft hypotheses, surface evidence, and propose checks. Let humans decide what matters, set standards, and judge what counts as discovery.
The dream of “autonomous AI scientists” is tempting:
machines that generate hypotheses, run experiments, and write papers. But science isn’t just automation.
cichicago.substack.com/p/the-mirage...
🧵
This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.
ai-scientific-discovery.github.io
This is holding us back. 🧵 and new paper with @ari-holtzman.bsky.social.
1. Who’s working an overnight shift (in our data + external validation in MIMIC)
2. Who’s working on a disruptive circadian schedule
3. How many patients has the doc seen *on the current shift*
Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️
1/n 🧵
There’s a lot of excitement around using LLMs for automated evaluation, but many methods fall short on alignment or explainability — let’s dive in! 🌊
Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
@chachachen.bsky.social GPT ❌ x-rays (Friday 9-10:30)
@mheddaya.bsky.social CaseSumm and LLM 🧑⚖️ (Thursday 2-3:30)
@haokunliu.bsky.social @qiaoyu-rosa.bsky.social hypothesis generation 🔬 (Saturday at 4pm)
There is much excitement about leveraging LLMs for scientific hypothesis generation, but principled evaluations are missing - let’s dive into HypoBench together.
We are also actively looking for sponsors. Reach out if you are interested!
Please repost! Help spread the word!
You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard.
We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link: bit.ly/4i3PGm3
#NLP #NLProc #polisky #polcom #compsocialsci
🐦🐦
Here are the slides for my talk titled "Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI": chenhaot.com/talks/alignm...
👉 chicagohai.github.io/hypogenic-de...
(Powered by NotebookLM)
Excited to share: Literature Meets Data: A Synergistic Approach to Hypothesis Generation 📚📊!
We propose a novel framework combining literature insights & observational data with LLMs for hypothesis generation. Here’s how and why it matters.