Abstract: Current cross-modal retrieval methods heavily rely on accurate semantic labels or sample similarity measurements, and need to search for the nearest samples among all samples in the huge ...