Abstract: This paper explores zero-shot Vision-and-Language Navigation (VLN), enabling agents to generalize navigation to unseen data classes. Most current approaches rely on large models, but these ...