Abstract: End-to-end DETR-based cross-modal fusion in 3-D object detection has achieved promising performance on many benchmarks. However, these methods either implement cross-modal fusion in a single ...