Chris Offner
@chrisoffner3d.bsky.social
Student Researcher @ RAI Institute, MSc CS Student @ ETH Zurich
visual computing, 3D vision, spatial AI, machine learning, robot perception.
📍Zurich, Switzerland
Reposted by Chris Offner
There is a lot to hate about the politics of the Silicon Valley right, but they do actually want to build stuff, and I would prefer if the left didn't cede "we should be able to build stuff" to the right.
September 5, 2025 at 3:19 PM
People often use "smart" when they mean "wise" and I don't think it's too controversial to doubt the wisdom of some tech elites. Other than that I certainly agree with you.
September 5, 2025 at 10:39 AM
I love both.
August 23, 2025 at 1:55 PM
I'd also welcome a Bayesian framing. I know Andrew Davison's group has done work on Gaussian belief propagation for SLAM factor graphs (gaussianbp.github.io) but other than that and arxiv.org/abs/1703.04977, I'm not aware of much Bayesian (deep) learning in (3D) vision right now.
August 22, 2025 at 4:45 PM
Reposted by Chris Offner
In general I think 3D vision would do well to take some inspiration from Bayesians. I guess these days they lost their glamour, but imo it's a very nice way of thinking that feels somewhat lost currently.
August 22, 2025 at 3:18 PM
You follow him. Andrew Davison from Imperial College London.
August 22, 2025 at 11:21 AM
Sort of, but DINOv3 also seems to (inadvertently?) point towards the limits of pure scaling.
x.com/chrisoffner3...
August 19, 2025 at 7:34 PM
If you maximize cosine similarity, aren't you left with only a single dimension (i.e. scaling the vector norm) as CosSim-invariant "wiggle room" to encode geometric information that isn't also captured by the language?
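A quick numerical sketch of this point (illustrative only; arbitrary 512-dimensional random vectors stand in for pixel and text features): cosine similarity is invariant to rescaling either vector, so the norm is the one scalar degree of freedom that alignment training leaves untouched.

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity: depends only on direction, not on vector norm.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
v = rng.standard_normal(512)   # stand-in for a pixel feature
t = rng.standard_normal(512)   # stand-in for a text feature

# Rescaling v changes its norm but not its cosine similarity to t,
# so the norm is the only "CosSim-invariant" slot left to encode
# information that the language-aligned direction does not carry.
print(np.isclose(cos_sim(v, t), cos_sim(3.7 * v, t)))
```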
August 15, 2025 at 8:54 AM
Yes, but that's an additional training objective beyond merely maximizing cosine similarity. You'd need to introduce something, via some auxiliary task, that ensures pixel features don't just collapse to language semantics, no?
August 15, 2025 at 8:43 AM
It just seems to me that mapping pixels and language to highly similar internal representations means that you'll drop a lot of information that is not (or cannot be) accurately described by language.
August 15, 2025 at 8:04 AM
If we try to perfectly reconstruct, e.g., a complex 3D mesh from a natural language description, we'll find that the two modalities operate on very different levels of precision and abstraction.
August 15, 2025 at 7:55 AM
My concern is that language as a modality inherently biases the data towards coarser labels/concepts. You won't perfectly describe per-pixel normals and depth in natural language. Geometry is continuous and "raw"; language is discrete and abstract.
August 15, 2025 at 7:55 AM
Oh, interesting. I'll check that out!
August 15, 2025 at 7:48 AM