Abstract: Current 3D Large Multimodal Models (3D LMMs) have shown tremendous potential in 3D-vision-based dialogue and reasoning. However, how to further enhance 3D LMMs to achieve fine-grained scene ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results