Abstract: For on-policy reinforcement learning (RL), discretizing the action space for continuous control makes it easy to express multiple modes and is straightforward to optimize. However, without considering ...
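
To make the idea of a discretized action space concrete, the following is a minimal sketch and not the method of this paper. It assumes each action dimension is bounded, is split into a fixed number of evenly spaced bins, and is sampled from its own independent categorical distribution. The helper names (`make_bins`, `softmax`, `sample_action`), the bin count of 5, and the bounds of [-1, 1] are illustrative assumptions, not details from the source.

```python
import math
import random


def make_bins(low, high, num_bins):
    """Evenly spaced bin values covering [low, high] for one action dimension."""
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits_per_dim, bins_per_dim, rng):
    """Sample one bin per dimension; return the action and its joint log-probability.

    Assumption: dimensions are treated as independent, so the joint
    log-probability is the sum of the per-dimension log-probabilities.
    """
    action, log_prob = [], 0.0
    for logits, bins in zip(logits_per_dim, bins_per_dim):
        probs = softmax(logits)
        idx = rng.choices(range(len(bins)), weights=probs, k=1)[0]
        action.append(bins[idx])
        log_prob += math.log(probs[idx])
    return action, log_prob


# Hypothetical 2-D action space, each dimension bounded in [-1, 1], 5 bins each.
bins_per_dim = [make_bins(-1.0, 1.0, 5) for _ in range(2)]

# Dimension 0 is bimodal (mass at both extremes); dimension 1 is unimodal.
logits_per_dim = [[4.0, 0.0, 0.0, 0.0, 4.0], [0.0, 0.0, 4.0, 0.0, 0.0]]

rng = random.Random(0)
action, log_prob = sample_action(logits_per_dim, bins_per_dim, rng)
print(action, log_prob)
```

The bimodal logits for the first dimension show why a categorical policy can represent multiple modes, which a single Gaussian over the same dimension cannot. The summed log-probability is the quantity a policy-gradient objective would typically differentiate.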