Abstract: Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose TransMOT, which leverages powerful graph transformers to ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results