Abstract: Capitalizing on image-level pre-trained models for various downstream tasks has recently shown promising performance. However, the paradigm of “image pre-training followed by video ...