Lightnews — Scholar-powered news

The full-body model really excels in exo views and is worth using if one can get a good view of the upper body, and the hands-only model works great given a good bounding box from projecting 3D exo keypoints into egocentric views.
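As a rough illustration of the bounding-box step mentioned above, here is a minimal sketch that projects 3D hand keypoints into a camera with a plain pinhole model and wraps a padded box around them. The function names, the 20% padding, and the world-to-camera convention (x_cam = R @ x_world + t) are assumptions for illustration, not taken from the actual repo.

```python
import numpy as np


def project_points(points_world, K, R, t):
    """Project Nx3 world points to pixels with a pinhole model.

    Assumes R (3x3) and t (3,) map world coordinates into the camera frame
    and K is a 3x3 intrinsics matrix. Returns Nx2 pixels plus a mask of
    points that lie in front of the camera.
    """
    points_cam = points_world @ R.T + t
    in_front = points_cam[:, 2] > 1e-6
    uvw = points_cam @ K.T
    # Avoid dividing by zero or negative depth; masked-out rows are meaningless.
    depth = np.where(in_front, uvw[:, 2], 1.0)
    pixels = uvw[:, :2] / depth[:, None]
    return pixels, in_front


def bbox_from_keypoints(pixels, valid, image_size, pad=0.2):
    """Padded (x_min, y_min, x_max, y_max) around valid keypoints, or None."""
    if not valid.any():
        return None
    pts = pixels[valid]
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    w, h = x_max - x_min, y_max - y_min
    box = np.array([x_min - pad * w, y_min - pad * h,
                    x_max + pad * w, y_max + pad * h])
    width, height = image_size
    # Clip to the image so the crop stays inside the ego frame.
    box = np.clip(box, [0, 0, 0, 0],
                  [width - 1, height - 1, width - 1, height - 1])
    return tuple(float(v) for v in box)


# Hypothetical ego camera at the world origin looking down +z.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
keypoints_3d = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0],
                         [0.0, 0.1, 1.0], [0.0, 0.0, -1.0]])
pixels, valid = project_points(keypoints_3d, K, R, t)
ego_box = bbox_from_keypoints(pixels, valid, image_size=(640, 480))
```

The last keypoint sits behind the camera and is dropped by the mask, so it does not distort the box.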

October 10, 2025 at 8:25 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

I've made lots of improvements to the calibration code and ended up merging the full-body estimator with the hands-only one. Also FINALLY got ego synced and working in the full thing.

October 10, 2025 at 8:25 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

and the code that made it can be found here - <github.com/rerun-io/an...>

GitHub - rerun-io/annotation-example at dbt

Contribute to rerun-io/annotation-example development by creating an account on GitHub.

github.com

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

View it directly in the @rerundotio webviewer here (I promise it's worth it) - <app.rerun.io/version/0.2...>

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

Still, I'm quite happy with how it's going so far. Currently, I have a reasonable set of datasets to validate, a performant baseline, and an annotation app to correct inaccurate predictions.

From here, the focus will be more on the egocentric side!

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

3. Interacting hands cause lots of issues, and the pipeline is very fragile when there's no clear delineation between hands

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

Really happy with how it looks so far, but this is far from ideal.

1. Not even close to real time: this 30-second 8-view sequence took nearly 5 minutes to process on my 5090 GPU
2. 8 views is WAY too many and unscalable; I'm convinced this can be done with far fewer (2 exo + 1 stereo ego)
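To put the timing above in perspective, here is the arithmetic as a tiny sketch. It treats "nearly 5 minutes" as exactly 300 seconds, which is an assumption and therefore a slight upper bound; the helper name is hypothetical.

```python
def real_time_factor(processing_seconds, sequence_seconds):
    """How many seconds of compute each second of footage costs."""
    return processing_seconds / sequence_seconds


sequence_seconds = 30.0
processing_seconds = 5 * 60.0  # assumption: "nearly 5 minutes" rounded up
num_views = 8

rtf = real_time_factor(processing_seconds, sequence_seconds)
# Compute spent per second of footage from a single view.
per_view_cost = processing_seconds / (sequence_seconds * num_views)

print(f"{rtf:.1f}x slower than real time, "
      f"{per_view_cost:.2f} s of compute per view-second")
```

So the run is roughly 10x slower than real time overall, or about 1.25 seconds of compute per second of single-view video.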

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

3. Per View 2D keypoint estimation
4. Hand Pose Optimization

At the end of it all, I have a pipeline where you input synchronized videos and it outputs fully tracked per-view 2D keypoints, bounding boxes, 3D keypoints, MANO joint angles + hand shape!
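One way to picture those outputs is as a per-frame, per-hand record. The sketch below is only an illustration: the class name, field names, and array shapes are assumptions (21 keypoints, 48 pose values as 3 global rotation + 15 joints x 3 axis-angle, and 10 shape coefficients are common MANO conventions, but the repo may lay things out differently).

```python
from dataclasses import dataclass

import numpy as np

NUM_KEYPOINTS = 21     # assumption: common 21-keypoint hand layout
NUM_POSE_PARAMS = 48   # assumption: 3 global rotation + 15 joints x 3
NUM_SHAPE_PARAMS = 10  # assumption: usual MANO shape coefficient count


@dataclass
class HandFrameResult:
    """Hypothetical container for one hand in one synchronized frame."""

    keypoints_2d: np.ndarray  # (num_views, 21, 2) pixel coordinates
    bboxes: np.ndarray        # (num_views, 4) as x_min, y_min, x_max, y_max
    keypoints_3d: np.ndarray  # (21, 3) in the shared world frame
    mano_pose: np.ndarray     # (48,) joint angles
    mano_shape: np.ndarray    # (10,) shape coefficients

    def __post_init__(self):
        # Fail early if any array does not match the assumed layout.
        num_views = self.keypoints_2d.shape[0]
        expected = {
            "keypoints_2d": (num_views, NUM_KEYPOINTS, 2),
            "bboxes": (num_views, 4),
            "keypoints_3d": (NUM_KEYPOINTS, 3),
            "mano_pose": (NUM_POSE_PARAMS,),
            "mano_shape": (NUM_SHAPE_PARAMS,),
        }
        for name, shape in expected.items():
            actual = getattr(self, name).shape
            if actual != shape:
                raise ValueError(f"{name}: expected {shape}, got {actual}")


# Placeholder values for an 8-view frame; real values come from the pipeline.
result = HandFrameResult(
    keypoints_2d=np.zeros((8, NUM_KEYPOINTS, 2)),
    bboxes=np.zeros((8, 4)),
    keypoints_3d=np.zeros((NUM_KEYPOINTS, 3)),
    mano_pose=np.zeros(NUM_POSE_PARAMS),
    mano_shape=np.zeros(NUM_SHAPE_PARAMS),
)
```

Checking shapes at construction keeps a mismatched view count from silently propagating into later stages.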

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

I want to emphasize that these are not the ground-truth values provided by the wonderful HOCap dataset, but rather from my pipeline that was written from the ground up!

For context, it consists of 4 parts

1. Exo/Ego camera estimation
2. Hand Shape Calibration

September 29, 2025 at 1:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

Code is here for you to follow along! <github.com/rerun-io/an...>

GitHub - rerun-io/annotation-example at dbt

September 19, 2025 at 5:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

This tight integration between visuals, predictions, and data is crucial to ensure your data is precisely what you expect it to be.

September 19, 2025 at 5:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

The next step involves leveraging Rerun's recent updates, particularly the multisink support. Changes are saved directly to a file in .rrd format, easily extractable since the underlying representation is PyArrow. This can be converted to Pandas, Polars, or DuckDB.

September 19, 2025 at 5:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

Networks will occasionally make mistakes, so having the ability to correct them manually is crucial. This is a significant step towards robust and powerful hand tracking, which will provide excellent training data for robot dexterous manipulation.

September 19, 2025 at 5:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

The only input required is a zip file containing two or more multiview MP4 files. I handle everything else automatically. This application works with both egocentric (first-person) and exocentric (third-person) videos.
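A small sketch of what checking that input could look like, using only the standard library. The function name and the rule for skipping hidden files are assumptions; the app itself may validate uploads differently.

```python
import zipfile
from pathlib import PurePosixPath


def list_video_members(zip_path, min_videos=2):
    """Return the sorted MP4 entries of a zip, requiring at least min_videos."""
    with zipfile.ZipFile(zip_path) as archive:
        videos = sorted(
            name
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
            and PurePosixPath(name).suffix.lower() == ".mp4"
            # Skip hidden files such as macOS "._clip.mp4" resource forks.
            and not PurePosixPath(name).name.startswith(".")
        )
    if len(videos) < min_videos:
        raise ValueError(
            f"expected at least {min_videos} MP4 files, found {len(videos)}"
        )
    return videos
```

Calling `list_video_members("views.zip")` on a hypothetical upload would return the video paths inside the archive, or raise if fewer than two are present.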

September 19, 2025 at 5:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

The combination of Rerun's callback system and Gradio integration enables a highly customizable and powerful labeling app. It supports multiple views, 2D and 3D, and maintains time synchronization!

September 19, 2025 at 5:00 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

Find the current code here <github.com/rerun-io/an...>

GitHub - rerun-io/annotation-example at dbt

September 15, 2025 at 5:01 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

The complexity of this is really starting to stack up, and I hope in the longer term to have the compute + data to build a fully end-to-end network!
x.com/pablovelago...

September 15, 2025 at 5:01 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

Upload multiview video zip -> calibrate cameras (VGGT + MoGe) -> perform 2D keypoint estimation (WiLoR)

Now I need to add reactivity for every frame and timestamp to address any failures in the network!

September 15, 2025 at 5:01 PM

pablovelagomez.bsky.social

@pablovelagomez.bsky.social

None of the off-the-shelf annotation solutions I've tried provides nearly enough flexibility, so it was a no-brainer to build my own with Rerun and Gradio.

So far, I have the bare-bones implementation:

September 15, 2025 at 5:01 PM
