Maxim
@backpropaganda.bsky.social
Doing Computer Vision stuff with ML.
This is just your standard gradient accumulation?

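for i, (input, target) in enumerate(data):
    output = model(input)
    loss = loss_fn(output, target)
    # Scale so the accumulated gradient matches one large-batch step
    loss = loss / iters_to_accumulate
    loss.backward()

    if (i + 1) % iters_to_accumulate == 0:
        optimizer.step()  # apply the accumulated gradients before clearing them
        optimizer.zero_grad()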
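
To see why the loop above divides the loss by iters_to_accumulate, here is a minimal, dependency-free sketch of the same arithmetic. It is not PyTorch: it assumes a one-parameter model y = w * x with a hand-derived MSE gradient and plain SGD, and every name in it (mse_grad, accumulated_step, the toy data) is made up for illustration.

```python
# Dependency-free sketch of gradient accumulation for a 1-parameter model
# y = w * x with mean squared error. All names here are illustrative.

def mse_grad(w, batch):
    """Derivative w.r.t. w of mean((w * x - y) ** 2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, micro_batches, lr):
    """One SGD step built from several equally sized micro-batches."""
    iters_to_accumulate = len(micro_batches)
    grad = 0.0  # plays the role of the .grad buffer right after zero_grad()
    for batch in micro_batches:
        # Scaling each micro-batch gradient mirrors `loss / iters_to_accumulate`.
        grad += mse_grad(w, batch) / iters_to_accumulate
    return w - lr * grad  # the optimizer.step() equivalent for plain SGD

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]  # two micro-batches of size 2

w_accumulated = accumulated_step(0.0, micro_batches, lr=0.01)
w_full_batch = 0.0 - 0.01 * mse_grad(0.0, data)
print(w_accumulated, w_full_batch)
```

With equally sized micro-batches, the two printed values should agree up to floating-point error, which is the whole point of the scaling. With unequal micro-batch sizes the simple 1/k scaling would no longer reproduce the full-batch mean.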
December 19, 2024 at 7:02 PM
Still a bit confused on when to use Illuminate vs NotebookLM for getting an audio overview of papers. Currently using Illuminate.
December 14, 2024 at 10:51 PM
I find 'uv sync' so fast that I would just change the version in pyproject.toml and .python-version and sync again. The fact that the env is tied to a directory may sometimes be a drawback, but it also ensures all package versions are tracked in git.
December 4, 2024 at 6:27 AM
What currently seems to be the best approach for depth estimation: diffusion models or "old-school" discriminative models? Both seem to claim SOTA results nowadays.
December 2, 2024 at 2:33 PM
Every once in a while I re-read Joseph Redmon's YOLOv3 paper. That was really a work of art.
"Sometimes you just kinda phone it in for a year, you know? I didn’t do a whole lot of research this year. [...] I managed to make some improvements to YOLO. But, honestly, nothing like super interesting"
November 29, 2024 at 3:39 PM
Would be interesting to see how it would perform for BOP dynamic onboarding.

As you can tell, I’ve started sharing interesting 6D pose estimation papers I come across. I already track these for myself, so why not share them with all of you?
November 26, 2024 at 2:02 PM
Authors: Vincent van der Brugge, Marc Pollefeys, Joshua B. Tenenbaum, Ayush Tewari, Krishna Murthy Jatavallabhula

Arxiv: arxiv.org/abs/2411.1...
Code: github.com/vincentva...
PickScan: Object discovery and reconstruction from handheld interactions
Reconstructing compositional 3D representations of scenes, where each object is represented with its own 3D model, is a highly desirable capability in robotics and augmented reality. However, most...
arxiv.org
November 26, 2024 at 2:02 PM
As you can tell, I’ve started sharing interesting 6D pose estimation papers I come across. I already track these for myself, so why not share them with all of you?
November 25, 2024 at 2:05 PM
Authors: Kai Chen, Yiyao Ma, Xingyu Lin, Stephen James, Jianshu Zhou, Yun-Hui Liu, Pieter Abbeel, Qi Dou

Openreview: https://openreview.net/forum?id=FTpKGuxEfy
Project page: https://vfm-6d.github.io/
Vision Foundation Model Enables Generalizable Object Pose Estimation
Object pose estimation plays a crucial role in robotic manipulation, however, its practical applicability still suffers from limited generalizability. This paper addresses the challenge of...
openreview.net
November 25, 2024 at 2:05 PM
The GPU poor do not have it easy ;). But usually it is just multiple notebooks, and nvidia-smi is handy for seeing how much memory each notebook is taking up.
November 21, 2024 at 3:04 PM
Haven't switched yet. Is there an easy way to see how much GPU memory each program takes up, like in nvidia-smi?
November 21, 2024 at 9:16 AM
That's why all my PyTorch code looks like:
```
from torchvision.transforms.v2.functional import to_dtype, to_image
img_tensor = to_dtype(to_image(image), scale=True)
```
November 20, 2024 at 7:19 PM