https://www.cs.cornell.edu/~ruojin/
🔑 Solution: We generate multiple videos and use a self-consistency metric to select the most visually consistent sample.
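A minimal sketch of the "generate many, keep the most consistent" idea, for illustration only. The post does not say how the self-consistency metric is computed, so `temporal_inconsistency` below is a hypothetical stand-in (mean pixel change between consecutive frames), not the authors' metric. The toy frames are flat lists of intensities rather than real video model output.

```python
from typing import Callable, Sequence

Frame = Sequence[float]   # one frame, flattened to pixel intensities
Video = Sequence[Frame]   # an ordered sequence of frames


def temporal_inconsistency(video: Video) -> float:
    """Hypothetical placeholder score, NOT the metric from the post.

    Mean absolute pixel change between consecutive frames.
    Lower means the video changes more smoothly over time.
    """
    total = 0.0
    count = 0
    for prev, curr in zip(video, video[1:]):
        for a, b in zip(prev, curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def select_most_consistent(
    videos: Sequence[Video],
    score: Callable[[Video], float] = temporal_inconsistency,
) -> int:
    """Return the index of the candidate with the lowest inconsistency score."""
    if not videos:
        raise ValueError("need at least one candidate video")
    scores = [score(v) for v in videos]
    return min(range(len(videos)), key=scores.__getitem__)


# Toy candidates: two pixels per frame, three frames each.
smooth = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
flicker = [[0.0, 0.0], [0.9, 0.9], [0.1, 0.1]]
best = select_most_consistent([flicker, smooth])  # index 1, the smooth candidate
```

Passing the score as a parameter keeps the selection loop unchanged if a different consistency measure is swapped in.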
✅ Yes!
We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
🔗 inter-pose.github.io