Abstract: In multi-label image classification, existing methods exhibit significant shortcomings both in encoding semantic features from the textual modality and in cross-modal feature fusion between image ...