Yijia Shao
@echoshao8899.bsky.social
CS PhD student @StanfordNLP
https://cs.stanford.edu/~shaoyj/
Mapping tasks to skills, and comparing currently high-paid skills with the human agency required as AI agents enter the workforce, we see core human strengths shifting from data processing toward interpersonal and organizational skills.

Read our blog post: futureofwork.saltlab.stanford.edu
June 12, 2025 at 4:39 PM
The study also reveals insights on the future of HUMAN work.

Mapping the Human Agency Scale across jobs shows which roles AI can’t replace. Currently, Mathematicians & Aerospace Engineers are the only occupations where most AI-expert ratings fall into H5 (Human Involvement Essential).
June 12, 2025 at 4:39 PM
Despite the buzz around "AI software engineers," "AI journalists," etc., our Human Agency Scale uncovers task-level nuances within every occupation.

We suggest that AI agent R&D and products account for these nuances to enable more responsible, higher-quality adoption.
June 12, 2025 at 4:36 PM
Workers generally prefer higher levels of human agency, hinting at friction as AI capabilities advance.

From transcript analysis, the top collaboration model envisioned by workers is “role-based” AI support (23.1%), i.e., AI systems that embody specific roles.
June 12, 2025 at 4:35 PM
The impact of AI agents on work isn’t just a binary “automate or not.”

We introduce the Human Agency Scale: a 5-level scale that captures the spectrum between automation and augmentation, where technology complements and enhances human capabilities.
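A minimal sketch of how task-level ratings on such a scale might be aggregated. Only the H5 label is quoted from this thread; the other level descriptions and the majority-vote aggregation are illustrative assumptions, not the paper’s definitions:

```python
from collections import Counter

# Hypothetical labels for the five Human Agency Scale (HAS) levels.
# Only H5 ("Human Involvement Essential") is quoted from this thread;
# the other descriptions are placeholders for illustration.
HAS_LEVELS = {
    "H1": "minimal human involvement needed",
    "H2": "some human involvement needed",
    "H3": "equal human-agent partnership",
    "H4": "substantial human involvement needed",
    "H5": "human involvement essential",
}

def dominant_has_level(expert_ratings: list[str]) -> str:
    """Aggregate per-task expert ratings (e.g., ['H4', 'H5', 'H5']) by majority vote."""
    return Counter(expert_ratings).most_common(1)[0][0]

print(dominant_has_level(["H5", "H5", "H4"]))  # -> 'H5'
```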
June 12, 2025 at 4:35 PM
Jointly considering worker desire and technological capability allows us to classify tasks into four zones to guide AI agent deployment and development.

Alarmingly, 41.0% of YC companies are mapped to the Low Priority and Automation “Red Light” Zones.
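A rough sketch of the zoning idea. The midpoint cutoff, and the “Green Light” and “R&D Opportunity” zone names, are my assumptions extrapolated from the two zones named above, not the paper’s exact calibration:

```python
def deployment_zone(worker_desire: float, ai_capability: float, cutoff: float = 3.0) -> str:
    """Map a task's mean worker-desire and AI-capability ratings (1-5 scales)
    to one of four deployment zones."""
    want, can = worker_desire >= cutoff, ai_capability >= cutoff
    if want and can:
        return "Automation 'Green Light' Zone"   # desired and feasible
    if can:
        return "Automation 'Red Light' Zone"     # feasible but not desired
    if want:
        return "R&D Opportunity Zone"            # desired but not yet feasible
    return "Low Priority Zone"                   # neither desired nor feasible

print(deployment_zone(worker_desire=2.1, ai_capability=4.3))  # -> Automation 'Red Light' Zone
```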
June 12, 2025 at 4:35 PM
We rank tasks by worker desire for automation. 46.1% of tasks receive a positive attitude (>3/5), with notable variation across sectors.

Transcript analysis reveals the top concerns: (1) lack of trust (45%), (2) fear of job replacement (23%), and (3) loss of human touch (16.3%).
June 12, 2025 at 4:34 PM
In our new paper: arxiv.org/abs/2506.06576

We collaborate with economists to develop an audio-enhanced auditing framework.

- 1500 domain workers from 104 occupations shared their desires.
- 52 AI agent researchers & developers evaluated today’s technological capabilities.
June 12, 2025 at 4:34 PM
🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody’s asking them what they want.

While AI R&D races to automate everything, we took a different approach: auditing what workers want vs. what AI can deliver across the US workforce.🧵
June 12, 2025 at 4:34 PM
Try it out today at cogym.saltlab.stanford.edu!
Read our preprint to learn more details: arxiv.org/abs/2412.15701
February 12, 2025 at 7:25 PM
🎉 For the first time ever: Collaborate with AI agents in real time! The Collaborative Gym UI is now IRB-approved and live at cogym.saltlab.stanford.edu!

A group of agents is eager to work with you. By providing feedback, you will see each agent’s identity and its response to your feedback!
February 12, 2025 at 7:24 PM
[6/8] We conducted a detailed error analysis by having authors annotate 300 trajectories. Collaborative agents expose significant limitations in current LMs and agent scaffolds, with communication and situational-awareness failures occurring in 65% and 40% of real trajectories, respectively.
January 17, 2025 at 5:48 PM
[5/8] We built a user simulator and web UI to instantiate Co-Gym in simulated and real settings.

Experiments reveal human-like patterns: collaborative inertia, where poor communication hinders delivery; and collaborative advantage, where human-agent teams outperform autonomous agents.
January 17, 2025 at 5:48 PM
[4/8] Our vision builds on a long-standing dream in AI: to develop machines that act as teammates, not mere tools.

This demands situational intelligence to take initiative, communicate, and adapt. Co-Gym offers an evaluation framework that assesses both collaboration outcomes and processes.
January 17, 2025 at 5:47 PM
[3/8] How does Co-Gym enable collaborative agents? Our infra (1) focuses on environment design and (2) supports async interaction beyond turn-taking.

We define primitives for public/private components in the shared env, as well as collaboration actions and a notification protocol.
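A toy sketch of what such primitives could look like; the class and field names below are illustrative, not Co-Gym’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class SharedEnvironment:
    """Shared workspace split into public and private components (illustrative)."""
    public: dict = field(default_factory=dict)   # visible to both the human and the agent
    private: dict = field(default_factory=dict)  # e.g., the agent's own scratchpad

@dataclass
class SendMessage:
    """A collaboration action: communicate with the teammate instead of acting on the env."""
    content: str

@dataclass
class Notification:
    """Async notification so either party can react without strict turn-taking."""
    source: str  # "human" or "agent"
    event: str   # e.g., "public_component_updated"
```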
January 17, 2025 at 5:47 PM
[2/8] Excitingly, collaborative agents consistently outperform their fully autonomous counterparts in terms of task performance, achieving win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users.
January 17, 2025 at 5:47 PM
[1/8] While several HITL systems exist (e.g. OpenAI Canvas, our Collaborative STORM), what makes human-agent collab special? Agents need autonomy to be useful, yet the goal is empowering humans.

We start with three tasks: travel planning, surveying related work, and tabular analysis.
January 17, 2025 at 5:46 PM
LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤

Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I’ve gotten used to agents proactively seeking my confirmation or deeper thinking. (🧵 with video)
January 17, 2025 at 5:44 PM
In our paper, we explore the impact of prompting. Unfortunately, simple prompt engineering does little to mitigate privacy leakage in LM agents’ actions. We also examine the safety-helpfulness trade-off and conduct qualitative analysis to uncover more insights.
December 6, 2024 at 6:20 PM
We collected 493 negative privacy norms to seed PrivacyLens. Our results reveal a discrepancy between QA probing results and LMs’ actions in task execution. GPT-4 and Claude-3-Sonnet answer nearly all questions correctly, but they leak information in 26% and 38% of cases, respectively!
December 6, 2024 at 6:20 PM
With negative privacy norms, vignettes, and trajectories, PrivacyLens conducts a multi-level evaluation by (1) assessing LMs on their ability to identify sensitive data transmission through QA probing, and (2) evaluating whether LM agents’ final actions leak the sensitive information.
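Conceptually, the two levels might look like this. Everything here (the field names, the substring check standing in for leakage judgment, the toy model) is a sketch, not PrivacyLens’s real interface:

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One illustrative PrivacyLens-style test case (field names are assumptions)."""
    probing_question: str   # multiple-choice probe about the sensitive transmission
    correct_choice: str
    task_instruction: str   # the agent task seeded with the trajectory
    sensitive_info: str     # information that should not appear in the final action

def evaluate_case(lm, case: Case) -> dict:
    # Level 1: QA probing -- can the model identify the sensitive transmission?
    probe_correct = lm(case.probing_question).strip() == case.correct_choice
    # Level 2: action evaluation -- does the agent's final action leak the data?
    # (A substring check stands in for the paper's leakage judgment.)
    final_action = lm(case.task_instruction)
    leaked = case.sensitive_info.lower() in final_action.lower()
    return {"probe_correct": probe_correct, "leaked": leaked}

# Toy usage with a stand-in "model" that just echoes its prompt.
toy_lm = lambda prompt: prompt
case = Case("Is sharing the diagnosis appropriate? (A) Yes (B) No", "B",
            "Draft a reply to the team email thread.", "diagnosis")
print(evaluate_case(toy_lm, case))  # -> {'probe_correct': False, 'leaked': False}
```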
December 6, 2024 at 6:20 PM
Evaluating LMs’ actions in applications is more contextualized. But how do we create test cases? PrivacyLens offers a data construction pipeline that procedurally converts each norm into a vignette and then into an agent trajectory via template-based generation and sandbox simulation.
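At a high level, the pipeline might be sketched like this; the template wording and the stubbed tool step are placeholders, not the paper’s implementation:

```python
def norm_to_vignette(norm: dict) -> str:
    """Expand a privacy-norm tuple into a short concrete story (template is illustrative)."""
    return (f"{norm['sender']} knows {norm['subject']}'s {norm['data_type']} and is about to "
            f"write to {norm['recipient']}; the norm is to {norm['principle']}.")

def vignette_to_trajectory(vignette: str) -> list[dict]:
    """Seed an agent trajectory from the vignette. PrivacyLens does this with template-based
    generation plus sandbox simulation of tool calls; here the tool step is just stubbed."""
    return [
        {"role": "user", "content": vignette + " Please help me draft the message."},
        {"role": "tool", "content": "search_emails() -> [a note mentioning the sensitive detail]"},
    ]

norm = {"data_type": "health condition", "subject": "a colleague", "sender": "the user",
        "recipient": "the whole team", "principle": "share it only with the colleague's consent"}
print(vignette_to_trajectory(norm_to_vignette(norm)))
```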
December 6, 2024 at 6:20 PM
Once we collect these privacy norms, a direct way to evaluate is to use a template to turn each tuple into a multiple-choice question. However, how LMs perform when answering probing questions may not be consistent with how they act in agentic applications.
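For instance, a template of that kind might look like this; the wording and answer options are illustrative, not the paper’s:

```python
def tuple_to_mcq(data_type: str, subject: str, sender: str,
                 recipient: str, principle: str) -> str:
    """Render a privacy-norm tuple as a multiple-choice probing question (illustrative)."""
    return (f"{sender} is considering sharing {subject}'s {data_type} with {recipient}.\n"
            f"The governing norm: {principle}.\n"
            "Is this transmission appropriate?\n(A) Yes\n(B) No")

print(tuple_to_mcq("medical history", "a friend", "the user",
                   "an insurance agent", "share only for treatment purposes"))
```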
December 6, 2024 at 6:20 PM
Humans protect privacy not by always avoiding sharing sensitive data, but by adhering to norms during data use and communication with others. A well-established framework for privacy norms is Contextual Integrity theory, which expresses data transmission as a 5-tuple.
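Contextual Integrity describes each data flow with five parameters (data subject, sender, recipient, information type, transmission principle); a minimal encoding, with field names chosen for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransmissionNorm:
    """Contextual Integrity's five parameters for a single data flow."""
    data_subject: str            # whom the information is about
    sender: str                  # who transmits it
    recipient: str               # who receives it
    information_type: str        # what kind of information flows
    transmission_principle: str  # the condition making the flow (in)appropriate

norm = TransmissionNorm(
    data_subject="a patient",
    sender="their physician",
    recipient="an employer",
    information_type="diagnosis",
    transmission_principle="only with the patient's explicit consent",
)
```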
December 6, 2024 at 6:20 PM
Why is this important? While many studies have investigated LMs memorizing training data, a lot of private data or sensitive information is actually exposed to LMs at inference time, especially when we are using them for daily assistance.
December 6, 2024 at 6:20 PM