Shikhar Murty
shikharmurty.bsky.social
Final year PhD Student in Computer Science @Stanford

Work on:
- Compositionality, syntax (language structure)
- Web Agents: Synthetic data, tree search, exploration (language interpretation)
“causal intervention” as defined in \citep{}…
February 14, 2025 at 11:41 PM
controlling a browser / computer!
but requires a bit more tooling to set it up.
February 6, 2025 at 7:00 PM
Please check out our paper for more details: arxiv.org/pdf/2410.02907

And our code if you want a NNetNav-ed model for your own domain:
github.com/MurtyShikhar...

Done with collaborators: @zhuhao.me, Dzmitry Bahdanau and @chrmanning.bsky.social
February 6, 2025 at 5:43 PM
We find that cross-website robustness is limited, and performance almost always improves when we incorporate in-domain NNetNav data. This makes it even more important to work on unsupervised learning for agents - how are you going to collect human data for *any* website? [6/n]
February 6, 2025 at 5:43 PM
We use this data for SFT-ing Llama-3.1-8B. Our best models outperform zero-shot GPT-4 on both WebArena and WebVoyager, and reach SoTA performance among unsupervised methods for both datasets. [5/n]
February 6, 2025 at 5:43 PM
We use NNetNav to collect around 10k workflows across 20 websites (15 live, 5 self-hosted).

Data is available on 🤗: huggingface.co/datasets/sta...
[4/n]
February 6, 2025 at 5:43 PM
Main ideas behind NNetNav exploration:
1. Complex goals have intermediate subgoals, so complex trajectories must have meaningful sub-trajectories.
2. Use an LM instruction relabeler + judge to test if the trajectory-so-far is meaningful. If yes, continue exploring; otherwise, prune. [3/n]
February 6, 2025 at 5:43 PM
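The prune-as-you-go idea above can be sketched in a few lines. This is a toy, self-contained sketch, not the paper's implementation: `sample_action`, `relabel`, and `judge` are hypothetical stand-ins for the browser environment, the LM instruction relabeler, and the LM judge.

```python
import random

# Toy stand-ins (hypothetical names): the real system samples browser
# actions and uses an LM to relabel/judge trajectory prefixes.
def sample_action(trajectory):
    return trajectory + [random.choice(["click", "type", "scroll"])]

def relabel(trajectory):
    # Real version: an LM writes an instruction the trajectory could satisfy.
    return "do " + " then ".join(trajectory) if trajectory else ""

def judge(trajectory, instruction):
    # Real version: an LM scores whether the prefix is meaningful progress.
    # Toy rule here: prefixes longer than 3 steps are "not meaningful".
    return len(trajectory) <= 3

def nnetnav_explore(max_steps=10, seed=0):
    """Grow a trajectory step by step; prune as soon as the
    relabeled prefix stops looking meaningful to the judge."""
    random.seed(seed)
    trajectory = []
    for _ in range(max_steps):
        candidate = sample_action(trajectory)
        if judge(candidate, relabel(candidate)):
            trajectory = candidate  # keep exploring this prefix
        else:
            break                   # prune this branch of the search
    # Retroactively label the surviving trajectory with an instruction.
    return trajectory, relabel(trajectory)

traj, instr = nnetnav_explore()
```

The pruning check runs on every prefix, so unpromising branches are cut early instead of being fully rolled out and discarded.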
NNetNav uses a structured exploration method to efficiently search and collect traces on live websites. These traces are retroactively labeled into instructions, yielding a strikingly diverse set of workflows for any website (e.g., this plot). [2/n]
February 6, 2025 at 5:43 PM
Now, reviewers are upset if we only finetune sub-10B-parameter models!
November 26, 2024 at 10:28 PM
for more context: we are training the probe on sentences from PTB / BLiMP
November 25, 2024 at 5:52 AM
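For readers unfamiliar with the setup in the replies above: a probe is typically a small classifier trained on a frozen model's hidden states. A minimal sketch with synthetic data (in the posts' setting, the sentences would come from PTB / BLiMP and the features from a pretrained LM; everything here is a toy stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frozen hidden states" and a linearly separable toy label
# standing in for a syntactic property of each sentence.
n, d = 200, 16
hidden_states = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
labels = (hidden_states @ w_true > 0).astype(float)

# Train a logistic-regression probe with plain gradient descent,
# leaving the "hidden states" themselves untouched.
w = np.zeros(d)
for _ in range(500):
    p = 1 / (1 + np.exp(-(hidden_states @ w)))
    w -= 0.5 * (hidden_states.T @ (p - labels)) / n

acc = ((hidden_states @ w > 0) == labels).mean()
```

Because the probe is linear and the representations are frozen, its accuracy is read as evidence about what the representations encode, which is exactly why the choice of probing task (SRL vs. something syntax-agnostic) matters in the thread above.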
thx for sharing, though semantic parsing almost certainly benefits from modeling syntax :)
November 25, 2024 at 3:49 AM
SRL probe still rewards hidden states that model dependency relations, no? would like a probe that's agnostic to how well the underlying network models syntax
November 24, 2024 at 10:38 PM
could i get added? thx for making this!!
November 24, 2024 at 5:25 AM