In the paper, researchers used 15 example images and asked Gemini to begin categorizing the night sky using that base. Gemini ...
Abstract: Recently, many multimodal trackers have prioritized RGB as the dominant modality, treating other modalities as auxiliary and fine-tuning separately for each multimodal task. This imbalance ...
Abstract: Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360$^{\circ}$ images. To alleviate ...