Tom Dupuis
@tomdupuis.bsky.social
PhD student @ENSTAParis🇫🇷, TD Learning and deep RLing, representation matters
MVA/CentraleSupélec alumni
Oh don't get me wrong, blocking individuals on sight whenever they get aggressive and toxic is healthy imo.
I was talking about some lists I saw, like "hey guys I made a list of all HF employees, block them all". They don't live in the real world
November 28, 2024 at 6:00 PM
Block lists = echo chamber any% speedrun
November 28, 2024 at 2:32 PM
Oh I def agree with you that more data is necessary but not sufficient, sorry for any misunderstanding
November 27, 2024 at 8:19 PM
Very neat result and method!
But I'd argue your pretraining dataset is *big* compared to the diversity of the simulation tasks (low). Real-world robotics is so much more complex/continuous/high-dim that I'm afraid we may need wayyyy too many tokens for now to be able to train something like this
November 27, 2024 at 8:17 PM
Emphasis on the "well tuned". It's more a practical choice than a fact about strict superiority. Same thing in RL: if 2 methods are equivalent but you have to do a big HP search for the first one while the second works out of the box, you choose the latter any day (Octo is like this)
November 27, 2024 at 7:53 PM
Tbh I think the biggest added values of Octo were:
- an open-source implementation already coupled to a cheap table-top arm I could buy (ViperX); integration is so hard, and it saved me a lot of time
- a proof of existence of a good robot policy with few parameters (<100M)
- a good init for finetuning
November 27, 2024 at 7:14 PM
Maybe overselling is a strong word, but a lot of papers sell on generalization instead of stability/training difficulty, when most methods actually share the same common generalization issues.
I do agree that Diffusion Policy, Octo and such made it 100x easier to get something working
November 27, 2024 at 2:42 PM
This is a shame because I genuinely believe the Octo-type architecture is the way to go (basically a low-level visuo-motor controller with early fusion of sensors), but real generalization capabilities will only appear with multiple OOMs more data
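To make "early fusion" concrete, here is a minimal PyTorch sketch of what I mean; all names and dims are mine, not Octo's actual code:

    import torch
    import torch.nn as nn

    class EarlyFusionPolicy(nn.Module):
        # Hypothetical Octo-style low-level controller: every modality is
        # tokenized first, then a single transformer attends over all
        # tokens jointly ("early fusion"), instead of fusing late.
        def __init__(self, d=256, proprio_dim=14, n_actions=7):
            super().__init__()
            self.img_tok = nn.Conv2d(3, d, kernel_size=16, stride=16)  # ViT-style patch tokens
            self.proprio_tok = nn.Linear(proprio_dim, d)               # proprioception -> 1 token
            self.backbone = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=6)
            self.readout = nn.Linear(d, n_actions)

        def forward(self, image, proprio):
            patches = self.img_tok(image).flatten(2).transpose(1, 2)   # (B, N, d)
            tokens = torch.cat([patches, self.proprio_tok(proprio)[:, None]], dim=1)
            # action is read out from the proprio token after joint attention
            return self.readout(self.backbone(tokens)[:, -1])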
November 27, 2024 at 2:38 PM
Agree 100% with you on this. You either have bottlenecked perf with a frozen backbone, or no generalization; no other choice (except maybe adapters, but that's a patch)
I'd guess we need to work on continual pretraining / neuron-level finetuning. Maybe we need to go beyond SGD on dense layers...
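For context, the adapter "patch" is roughly this; a generic bottleneck-adapter sketch, not any specific library:

    import torch.nn as nn

    class Adapter(nn.Module):
        # Small trainable residual MLP slotted on top of frozen features.
        # It can only reshape what the backbone already computes, not
        # learn genuinely new features -- hence "a patch".
        def __init__(self, d=768, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(d, bottleneck)
            self.up = nn.Linear(bottleneck, d)

        def forward(self, h):
            return h + self.up(self.down(h).relu())

    # usage: freeze the backbone, train only the adapters
    # for p in backbone.parameters(): p.requires_grad = False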
November 27, 2024 at 2:32 PM
For OOD generalization, most of the degradation comes from lack of VISUAL generalization, which is entirely to be expected currently... We need better vision backbones, and those don't exist yet (closest I can think of is Prismatic models = SigLIP + DINOv2, for both semantic and geometric info)
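The Prismatic-style fusion is basically channel-wise concatenation of the two backbones' patch features; a sketch, with siglip/dinov2 as stand-ins for the real (frozen) encoders:

    import torch

    def fused_features(image, siglip, dinov2):
        # Hypothetical Prismatic-style fusion: concatenate patch tokens
        # channel-wise so downstream layers see SigLIP's semantics and
        # DINOv2's geometric features at once.
        with torch.no_grad():                  # both backbones stay frozen
            sem = siglip(image)                # (B, N, d1) semantic tokens
            geo = dinov2(image)                # (B, N, d2) geometric tokens
        return torch.cat([sem, geo], dim=-1)   # (B, N, d1 + d2)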
November 27, 2024 at 2:17 PM
1. Sure, but that's true for any robotics model honestly... Too many moving parts
2. Define too much. In my experience, with a few 100s of demos (takes a day) you can solve some nice non-trivial tasks
But we are using a secret sauce for finetuning that I can't spill yet
3. Depends on the generalization type
November 27, 2024 at 2:15 PM
What do you mean by that?
If you try zero-shot on new objects, sure, I'm not surprised; the generalization capabilities are oversold, but it's not hard to figure out why: domain gap.
But with proper finetuning Octo-Small is very good on every task we tried, and Octo-Base even better.
November 27, 2024 at 2:03 PM
Domain adaptation? Otherwise pre-finetuning comes to mind
November 26, 2024 at 1:16 AM
Ok I get it now. But that's a different problem definition from the paper's, where streaming RL = infinite-horizon online RL + 1 policy update per env step. I think it's reasonable to call that "streaming", because it's more specific than infinite-horizon online RL, and it emphasizes the instant update / no memory.
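Concretely, the loop I'm describing (a minimal gymnasium sketch; policy.update is a placeholder for the learner, not the paper's code):

    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, _ = env.reset()
    while True:                                # infinite-horizon interaction
        action = env.action_space.sample()     # stand-in for policy.act(obs)
        next_obs, reward, term, trunc, _ = env.step(action)
        # streaming constraint: exactly ONE update per env step, from this
        # single transition, which is then discarded (no replay buffer)
        # policy.update(obs, action, reward, next_obs)
        obs = next_obs
        if term or trunc:
            obs, _ = env.reset()               # episode ends, the stream continues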
November 25, 2024 at 6:57 PM
I agree that being clear is important. What is the canonical setting called? "Online RL" means anything interacting with an environment to deep RLers... so that's not very specific. And I don't see the connection with infinite horizon at all here; it seems unrelated, but I may be wrong
November 25, 2024 at 6:22 PM