Michael J. Black
@michael-j-black.bsky.social
Director, Max Planck Institute for Intelligent Systems; Chief Scientist, Meshcapade; Speaker, Cyber Valley.
Building 3D humans.
https://ps.is.mpg.de/person/black
https://meshcapade.com/
https://scholar.google.com/citations?user=6NjbexEAAAAJ&hl=en&oi=ao
Here are all the CVPR projects that I’m part of in one thread.

Conference papers:

PromptHMR: Promptable Human Mesh Recovery
yufu-wang.github.io/phmr-page/

DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
radualexandru.github.io/difflocks/
June 9, 2025 at 8:05 AM
We iterate between training CameraHMR and running CamSMPLify on the training set, with each fit initialized from CameraHMR's predictions. This results in much-improved pGT for 4DHumans and a SOTA single-image HMR method.
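For intuition, here is a minimal Python sketch of such an alternating refinement loop; `train_fn` and `fit_fn` stand in for CameraHMR training and CamSMPLify, and all names are illustrative rather than the released code:

```python
def refine_pseudo_ground_truth(images, pgt, train_fn, fit_fn, num_rounds=3):
    """Illustrative alternating refinement: retrain the regressor on the
    current pseudo ground truth, then re-fit each image initialized from
    the regressor's prediction to produce improved pseudo ground truth."""
    for _ in range(num_rounds):
        model = train_fn(images, pgt)                      # e.g. train CameraHMR
        pgt = [fit_fn(img, model(img)) for img in images]  # e.g. run CamSMPLify
    return pgt
```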
January 21, 2025 at 9:57 AM
4. But SMPLify only uses sparse 2D keypoints, which do not capture body shape. So we train a dense surface keypoint detector, DenseKP, on BEDLAM and run it on 4DHumans, resulting in improved body shape. The resulting method is CamSMPLify.
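Roughly, a dense surface term adds the reprojection error of selected mesh vertices against the detected dense keypoints. A minimal sketch assuming a pinhole camera; the function and its arguments are illustrative, not the released CamSMPLify code:

```python
import numpy as np

def dense_surface_loss(verts_cam, vert_idx, dense_kp_2d, conf, focal_px, pp):
    """Illustrative dense-keypoint fitting term: reproject a chosen subset
    of body-mesh vertices and penalize the confidence-weighted distance to
    the detected dense 2D surface keypoints."""
    v = verts_cam[vert_idx]                        # (K, 3) camera-frame vertices
    proj = focal_px * v[:, :2] / v[:, 2:3] + pp    # pinhole projection to pixels
    return np.sum(conf[:, None] * (proj - dense_kp_2d) ** 2)
```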
January 21, 2025 at 9:57 AM
2. We introduce CameraHMR, which integrates HumanFOV into HMR2.0 to exploit the estimated focal length.

3. To get accurate pseudo ground truth (pGT) training data, we compute the focal length for images in the 4DHumans dataset and modify SMPLify to take this into account.
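For intuition, here is how a known focal length enters the fitting: 3D joints are projected with the image-specific camera and compared to detected 2D keypoints. A minimal sketch with illustrative names, not the paper's code:

```python
import numpy as np

def project_points(points_3d, focal_px, princ_pt):
    """Pinhole projection of camera-frame 3D points (N, 3) to pixels,
    using the image-specific focal length and principal point."""
    xy = points_3d[:, :2] / points_3d[:, 2:3]
    return focal_px * xy + princ_pt

def keypoint_loss(joints_3d, keypoints_2d, conf, focal_px, princ_pt):
    """SMPLify-style 2D term: confidence-weighted squared reprojection error."""
    proj = project_points(joints_3d, focal_px, princ_pt)
    return np.sum(conf[:, None] * (proj - keypoints_2d) ** 2)
```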
January 21, 2025 at 9:57 AM
There are 4 key contributions that make it so accurate and robust:

1. To get accurate 3D shape and pose as well as good alignment to image features, you need to know the focal length of the camera. To solve this, we train HumanFOV to compute the field of view.
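Converting an estimated field of view into a focal length in pixels is the standard pinhole relation; a minimal sketch (the helper name is mine, not from the paper):

```python
import math

def fov_to_focal_length(fov_deg: float, image_dim_px: int) -> float:
    """Convert a field of view (degrees) to a focal length in pixels for a
    pinhole camera, measured along the same image dimension."""
    return (image_dim_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

# Example: a 60-degree vertical FOV on a 1080-pixel-tall image.
f_px = fov_to_focal_length(60.0, 1080)
print(f"focal length ~ {f_px:.1f} px")
```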
January 21, 2025 at 9:57 AM
Code and data are now online for CameraHMR, our state-of-the-art parametric 3D human pose and shape (HPS) estimation method that will appear at #3DV2025.
github.com/pixelite1201...
January 21, 2025 at 9:57 AM
Remember Photo Tourism? It reconstructed rigid scenes from a photo collection. Can we do the same for humans, creating an avatar from your personal photo album? Come see Yuliang Xiu present PuzzleAvatar today #SIGGRAPHAsia2024, 2:45-3:55 PM, Hall B7 (1). puzzleavatar.is.tue.mpg.de
December 5, 2024 at 11:52 PM
Who: Nikos Athanasiou, Alpár Cseke, Markos Diomataris, me, @gulvarol.bsky.social

MotionFix: Text-Driven 3D Human Motion Editing
arXiv: arxiv.org/abs/2408.00712
Demo: huggingface.co/spaces/atnik...
Project: motionfix.is.tue.mpg.de
Dataset: motionfix.is.tue.mpg.de/explore.php
December 3, 2024 at 10:49 PM
Using the dataset we train TMED, a diffusion model that takes a source motion and text and generates an edited motion based on the text.
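Schematically, such a model denoises a random sequence step by step, with the denoiser conditioned on both the source motion and the text. A generic DDPM-style sampling sketch, not the released TMED code:

```python
import torch

@torch.no_grad()
def sample_edited_motion(denoiser, source, text_emb, steps, betas):
    """Generic conditional diffusion sampling loop: the denoiser predicts
    the noise at each step given the noisy motion, the timestep, the source
    motion, and the text embedding."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(source)                 # start from noise, motion-shaped
    for t in reversed(range(steps)):
        eps = denoiser(x, t, source, text_emb)   # noise prediction, both conditions
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```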
December 3, 2024 at 10:49 PM
We then manually label these motion pairs with text that describes how to transform the source motion into the target motion. This gives the triplets necessary for training. You can explore the dataset with our online data-exploration tool:
motionfix.is.tue.mpg.de/explore.php
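Concretely, each training example is a (source motion, edit text, target motion) triplet. A toy sketch of one example; the field names and the (T, D) pose layout are assumptions, not the dataset's actual schema:

```python
import numpy as np

T, D = 120, 135  # e.g. 120 frames of SMPL body-pose parameters per motion
triplet = {
    "source_motion": np.zeros((T, D), dtype=np.float32),  # original motion
    "target_motion": np.zeros((T, D), dtype=np.float32),  # edited motion
    "edit_text": "kick higher with your left leg",        # how to transform it
}
```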
December 3, 2024 at 10:49 PM
Such models embed human motions in a latent space such that neighboring points in this space have similar motions. We sample nearby points in this space to get “edit pairs” – these are motions that are similar in some way but different in others. Here we use TMR.
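A minimal sketch of this pair mining, assuming precomputed TMR embeddings and cosine similarity; the thresholds are invented for illustration:

```python
import numpy as np

def mine_edit_pairs(embeddings, low=0.80, high=0.95):
    """Illustrative pair mining in a motion latent space: keep motion pairs
    whose embeddings are similar (same rough action) but not near-duplicates."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T                                 # cosine similarity matrix
    pairs = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            if low <= sim[i, j] <= high:          # similar, but different
                pairs.append((i, j))
    return pairs
```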
December 3, 2024 at 10:49 PM
If you had triplets of source motions, text edits, and edited motions, this problem would be “easy”. But how can we get such data? The key idea is to exploit a large dataset of human motion like #AMASS and existing human motion generation models.
December 3, 2024 at 10:49 PM
Today at @SIGGRAPHAsia we present MotionFix – given a 3D human motion, we edit it using text. The edits are varied and include timing, pose, semantics, etc. For example, “kick higher”, “do it faster”, “don’t bend over”.
December 3, 2024 at 10:49 PM
Project: mosh.is.tue.mpg.de
Paper: files.is.tue.mpg.de/black/papers...
Video: youtu.be/Uidbr2fQor0?...

The blog is also on Medium:
medium.com/@black_51980...

Congratulations to my amazing co-authors Matthew Loper
and @naureenm.bsky.social!
December 3, 2024 at 12:59 AM
MoSh has won the 2024 SIGGRAPH Asia Test-of-Time Award. What’s MoSh? It takes motion capture markers and returns the animation of a realistic 3D human body in #SMPL-X format. I wrote a blog post to explain why MoSh is still relevant after 10 years.
perceiving-systems.blog/en/news/moti...
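At its core, this is an optimization: find body shape and pose parameters whose predicted marker locations best explain the observed mocap markers. A minimal sketch of such an objective, with `marker_fn` as a hypothetical stand-in for the body-model marker predictor:

```python
import numpy as np

def marker_loss(body_params, observed_markers, marker_fn):
    """Illustrative MoSh-style objective: marker_fn maps body parameters to
    predicted (M, 3) marker locations on the body surface; we minimize the
    squared distance to the observed markers."""
    predicted = marker_fn(body_params)
    return np.sum((predicted - observed_markers) ** 2)
```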
December 3, 2024 at 12:59 AM
Congratulations, Marilyn Keller, on a brilliant defense & breakthrough thesis that infers the inside of the body from the outside. OSSO (CVPR22) infers the skeleton from the surface, SKEL (SIGAsia23) adds accurate skeletal motion, and HIT (CVPR24) infers the soft tissues. Also, best hat of the year.
November 30, 2024 at 8:32 AM
I'm going to attend #SIGGRAPHAsia in Tokyo. We have two papers:

MotionFix: Text-Driven 3D Human Motion Editing
motionfix.is.tue.mpg.de/index.html

PuzzleAvatar: Assembly of Avatar from Unconstrained Photo Collections
puzzleavatar.is.tue.mpg.de

I look forward to seeing people there!
November 26, 2024 at 5:32 PM