TL;DR: ViPE is a useful open-source spatial AI tool for annotating camera poses and dense depth maps from raw videos! Contributors: NVIDIA (Spatial Intelligence Lab, Dynamic Vision Lab, NVIDIA Isaac, ...
Outperforms Qwen2.5-Omni-7B and Kimi-Audio-Instruct-7B on multiple key audio understanding tasks. Although MiDashengLM demonstrates superior audio understanding performance and efficiency compared to ...