Junhong Shen
junhongshen1.bsky.social
PhD Student in Machine Learning @CMU | BS @UCLA | Interning @Meta | Interned @MSFTResearch @DeterminedAI
7/ Beyond performance, ScribeAgent models also provide efficiency gains relative to most proprietary baselines, which are typically larger in size and slower at inference time. This makes ScribeAgent an attractive option in terms of accuracy, latency, and cost.
December 3, 2024 at 5:21 PM
6/ Our results? ScribeAgent outperforms GPT-4o on our internal dataset and achieves state-of-the-art direct generation performance on the public benchmark Mind2Web. On WebArena, our multi-agent system integrating GPT-4o also improves on the previous best task success rate for text-only agents by 14.1%.
December 3, 2024 at 5:21 PM
4/ Data is the key! We leverage Scribe (scribehow.com/), an AI documentation tool that streamlines the creation of step-by-step guides for web tasks, to collect large-scale action data executed by real users on over 250 web domains. See scribehow.com/shared for example workflows.
December 3, 2024 at 5:21 PM
3/ Most existing web agents rely heavily on prompting general-purpose proprietary models like GPT-4. However, LLMs like GPT-4 are not specifically trained to parse markup languages like HTML, which limits the agent's ability to plan and reason. In contrast, ScribeAgent adapts the LLM itself for web navigation.
December 3, 2024 at 5:20 PM
2/ Web agents navigate through websites to solve real-world tasks. After the user defines a high-level objective, the agent outputs step-by-step actions based on the objective, observation, and interaction history. For text-based agents, the observation typically includes the website's URL and HTML.
December 3, 2024 at 5:20 PM
1/ Introducing ScribeAgent 🤖! Using 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝘄𝗲𝗯 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝗱𝗮𝘁𝗮, we at @scsatcmu.bsky.social and Scribe scribehow.com/ have adapted 𝗴𝗲𝗻𝗲𝗿𝗮𝗹-𝗽𝘂𝗿𝗽𝗼𝘀𝗲 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 𝗟𝗟𝗠𝘀 into 𝘀𝗽𝗲𝗰𝗶𝗮𝗹𝗶𝘇𝗲𝗱 𝘄𝗲𝗯 𝗮𝗴𝗲𝗻𝘁𝘀, outperforming agents that rely on proprietary models like GPT-4 and o1-preview. More in 🧵.
December 3, 2024 at 5:20 PM