Abstract: Activation functions are essential components of neural network models; they play a crucial role in determining a network's expressive power through the non-linearity they introduce. Nonlinear functions (NFs) in Transformers require high-precision computation that consumes significant time and energy, even when other components are aggressively quantized.