Abstract: Navigating in continuous environments with vision-language cues presents critical challenges, particularly in the accuracy of waypoint prediction and the quality of navigation ...