Peng Qi
qi2peng2.bsky.social
Multimodal Agents Research @ Orby AI. Ex-AWS AI, JD AI. PhD from @stanfordnlp.bsky.social, UG Tsinghua U. He/him. Opinions my own.
How do we prove that #AI can't do #maths?

Real Mathematics (yes, "real" is a pun here):

a+b+c = (a+b)+c = a+(b+c)

AI Mathematics (well, floating point maths, really):

>>> 0.1+0.2+0.3
0.6000000000000001
>>> 0.1+(0.2+0.3)
0.6

QED.
July 25, 2025 at 11:59 PM
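The REPL session in the post above can be reproduced and unpacked with a short script. This is a minimal sketch, not part of the original post: the variable names `left` and `right` are illustrative, and `math.fsum` and `math.isclose` are brought in here only as common ways to cope with rounding in the Python standard library.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Python evaluates a + b + c left to right, i.e. as (a + b) + c.
left = (a + b) + c
right = a + (b + c)

# Each addition rounds to the nearest representable double, so the
# grouping changes which rounding errors accumulate.
print(left)           # 0.6000000000000001, as in the post
print(right)          # 0.6, as in the post
print(left == right)  # False: float addition is not associative

# math.fsum tracks the intermediate rounding error and rounds only once
# at the end, so its result does not depend on the grouping.
print(math.fsum([a, b, c]))

# For comparisons, a tolerance-based check is usually more appropriate
# than exact equality.
print(math.isclose(left, right))  # True
```

The two groupings differ by one unit in the last place, which is why exact equality fails while the tolerance-based comparison succeeds.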
As 🔎 AI deep research agents 🔎 become an essential part of many people's day-to-day work, it is more essential than ever before that we can trust what they produce.

When these agents cite sources they claim the report is based on, how much can we actually trust them? In our new #ACL2025 paper, ...
July 16, 2025 at 12:40 AM
Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond.

In my new blog post, I revisit the brief history of 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔, why it defined ...
Why You Should Stop Using HotpotQA for AI Agents Evaluation in 2025 | Peng Qi
We published HotpotQA, a groundbreaking multi-step question answering dataset in 2018, which has since motivated and facilitated numerous AI agent research works. But you should probably reconsider…
qipeng.me
July 2, 2025 at 6:39 PM
When making great #hiring decisions, we often look for growth potential in a candidate. Will they rise to the occasion when unforeseen challenges arise? Will they grow in the role, and lift up others in the team? Will they still be able to contribute if business direction changes?
1/
May 22, 2025 at 3:01 PM
#AI 𝘄𝗿𝗼𝘁𝗲 𝟵𝟵% 𝗼𝗳 𝗺𝘆 𝗰𝗼𝗱𝗲, 𝗻𝗼𝘄 𝘄𝗵𝗮𝘁?

Big tech executives and business analysts are racing to share eye-catching statements like "AI will write XX% of the code at MetaCorp by 20YY." How much truth is there to these, and what implications might this have?

🧵
May 6, 2025 at 1:27 AM
Non-native speakers sometimes have a unique advantage in language-based humor stemming from their unfamiliarity with idiomatic expressions. I saw an “assembly of god” on the road and thought to myself, “wait, they have a factory to build gods here?”
March 28, 2025 at 9:50 PM
Reposted by Peng Qi
Is #AI the new #RocketScience? In my new blog post, I explore the similarities and connections between the two seemingly distant relatives, and reflect on what today's AI scientists can learn from their rocket cousins, plus what makes AI science unique:
AI is the New Rocket Science | Peng Qi
AI science of today has astonishing similarities to rocket science in its prime days, if one pays close attention to history. What are some of these, and what can the history of rocket science tell…
qipeng.me
March 13, 2025 at 6:39 PM
#Reflection The need for tooling appears everywhere. While oftentimes unmet, it can significantly scale and improve the productivity of many people when fulfilled, especially in the digital world. Kudos to whoever wrote the AC checklist notebook for ACL 2025 Senior Area Chairs, which reduced ...
March 12, 2025 at 6:39 PM
When people ask me about good intro audiobooks, I've always recommended The Martian, which is one of my personal sci-fi favorites accompanied by a voice performance that truly brought the story to life. Now I'll be adding James (by Percival Everett, performed by Dominic Hoffman) to my personal rec
February 28, 2025 at 7:39 PM
Top 6 jobs LinkedIn recommended for me:
* VP of AI at a unicorn public company
* Applied AI Engineer II at a F50 company
* Applied Scientist II at a F10 co.
* Research Intern at a F50 co.
* Senior Principal Applied Scientist at a F500 co.
* Senior Director of Applied Science at a F500 co.
Who am I?!
February 21, 2025 at 10:59 PM
Thanks to Google, my name is now Robin (@robinjia.bsky.social) and I work at USC as a CS professor. (Or that Robin took my name and worked at Amazon 🤣 )
February 7, 2025 at 9:09 PM
We are looking for strong candidates for Research Scientist and Research Engineering roles at Orby AI. Join us to define the next generation of foundation models and AI agents for GUI automation! As an RS, you'll work towards defining groundbreaking research problems and leading the team in ...
Research Scientist
Research Scientist (Senior)
buff.ly
January 23, 2025 at 7:39 PM
Reposted by Peng Qi
Happy New Year everyone! Jim and I just put up our January 2025 release of Speech and Language Processing! Check it out here: web.stanford.edu/~jurafsky/sl...
Speech and Language Processing
web.stanford.edu
January 12, 2025 at 8:44 PM
Chinese #languagelearners of Korean should srsly consider detouring to Japanese first: some basic pronunciation and Kanji vocabulary can go a long way in helping you associate Korean words with their underlying Hanja. I'd never have remembered 일 월 화 수 목 금 토 without learning about 日 月 火 水 木 金 土 first.
January 10, 2025 at 4:37 PM
#NLProc is hard. I read this trending topic on Twitter, and it took me a full minute before I think I got the correct parse. Is this what #LLMs feel when they read technical stuff outside of their training data?
January 9, 2025 at 12:59 AM
@kyunghyuncho.bsky.social's blog post on anxiety and frustration at #NeurIPS (https://buff.ly/4fN4Hah) has provided deep insight into the recent history of the AI boom and the angst many PhD students are facing. I'd like to echo 2 points on which I find myself agreeing a lot with Kyunghyun, and expand a bit.
i sensed anxiety and frustration at NeurIPS’24 – Kyunghyun Cho
Kyunghyun Cho
kyunghyuncho.me
January 8, 2025 at 7:39 PM
A few months ago, I started a series of blog posts to explain what it is like to work as a researcher in the industry in the hope of documenting some useful information for those facing this potential transition (especially junior folks). Today, I've published the last part

🧵
What do industry researchers do, anyway? Part 2 -- What do they do when they are not publishing
Many people seem to
qipeng.me
January 3, 2025 at 7:39 PM
Reflection: Building intelligent #AIAgents has a lot in common with developing an exciting professional career (or maybe even personal life), where the tasks you are exposed to today matter a lot to your growth and your ability to take on more challenging tasks in the future.

1/n
January 3, 2025 at 12:59 AM
As 2024 wraps up, I thought I'd share one of my favorite non-fiction books this year for whoever is looking for a good read for the Holiday Season.

🧵
December 23, 2024 at 4:18 PM
Mathematicians since the dawn of numbers: "a + b = b + a" regardless of what "a" and "b" are, simple as that. We build entire fields (pun intended) from simple facts like this.

Computer scientists in the era of #AI boom: I'm not so sure about that...
December 11, 2024 at 5:17 PM
Reposted by Peng Qi
We are looking for highly motivated junior researchers to work with @OrbyAI on GUI #Agent related research problems in Summer 2025! We are excited to tackle cutting-edge problems concerning agents, multimodal models, interactive assistants, and beyond, eyeing top-tier research publications
1/n
November 26, 2024 at 12:59 AM
As the Greater #Seattle Area enters its fourth day of massive electrical/utilities #outages from the #BombCyclone that's still affecting 100ks of people, taking a moment to reflect on the engineering marvels that are the modern infrastructure we have come to rely on day-to-day without thinking.
1/n
November 22, 2024 at 7:39 PM
𝘿𝙤 𝙒𝙚𝙗 𝘼𝙜𝙚𝙣𝙩𝙨 𝘿𝙧𝙚𝙖𝙢 𝙤𝙛 𝘾𝙖𝙨𝙘𝙖𝙙𝙞𝙣𝙜 𝙎𝙩𝙮𝙡𝙚 𝙎𝙝𝙚𝙚𝙥?

A core challenge in developing a reliable #AIAgent is the ability to simulate potential outcomes of agent actions, especially at inference time, to guide robust planning and search.
1/n
November 21, 2024 at 7:39 PM