Our baseline method, a simple multi-view video pose transformer, directly regresses full-body 3D poses from 6 egocentric videos with strong accuracy, demonstrating the effectiveness of EgoSim's data generation.
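A minimal sketch of what such a multi-view baseline could look like: each of the 6 egocentric views is encoded into a token, a transformer fuses the view tokens, and a head regresses per-joint rotations. All module names, dimensions, and the linear frame encoder are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class MultiViewPoseTransformer(nn.Module):
    """Toy multi-view pose transformer (hypothetical names and sizes)."""
    def __init__(self, num_views=6, feat_dim=256, num_joints=24):
        super().__init__()
        # stand-in per-view frame encoder; a real model would use a CNN/ViT backbone
        self.view_encoder = nn.Linear(3 * 224 * 224, feat_dim)
        # learned embedding telling the model which body-worn camera a token came from
        self.view_embed = nn.Parameter(torch.zeros(num_views, feat_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(feat_dim, num_joints * 3)  # e.g., axis-angle per joint

    def forward(self, frames):
        # frames: (B, V, 3, 224, 224), one frame per egocentric view
        B, V = frames.shape[:2]
        tokens = self.view_encoder(frames.flatten(2)) + self.view_embed  # (B, V, D)
        fused = self.fusion(tokens).mean(dim=1)                          # (B, D)
        return self.head(fused).view(B, -1, 3)                          # (B, J, 3)
```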
EgoSim takes real mocap data (e.g., from AMASS) and synthesizes multi-modal egocentric videos.
Plus: MultiEgoView, a real-world dataset with footage from 6 body-worn GoPro cameras and ground-truth 3D poses from 13 participants performing several activities.
Our SlowFast feature fusion samples input signals both sparsely and densely, extending the input context with no additional computational overhead.
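The idea in a few lines: keep the most recent frames at full rate (fast) and subsample the older history with a stride (slow), so the fused sequence covers a much longer window at the same token count. A minimal sketch, assuming hypothetical window/stride values rather than the paper's settings:

```python
import torch

def slowfast_sample(buffer: torch.Tensor, fast_len: int = 40, slow_stride: int = 10) -> torch.Tensor:
    """Fuse a long per-frame input buffer into a fixed-length context.

    buffer: (T, D) per-frame input signals.
    Returns sparse strided samples of the older history ("slow")
    concatenated with the dense recent tail ("fast").
    """
    fast = buffer[-fast_len:]                 # dense: most recent frames
    slow = buffer[:-fast_len][::slow_stride]  # sparse: strided older frames
    return torch.cat([slow, fast], dim=0)

# usage: a 600-frame buffer collapses to 56 slow + 40 fast = 96 tokens
x = torch.randn(600, 18)
print(slowfast_sample(x).shape)  # torch.Size([96, 18])
```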
#ECCV2024
EgoPoser supports diverse body shapes & remains robust even when users move in large environments
@jiaxijiang.bsky.social @paulstreli.bsky.social