Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) ...
Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate with low-level actions following natural language instructions in 3D environments. Most existing ...