Vish Sangale
NOTE · April 2020

Activation Functions

Activation functions are the mathematical “gates” that decide whether a neuron should fire or stay dormant. They introduce non-linearity into a neural network, allowing it to learn complex, non-linear relationships in data.
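To make the non-linearity point concrete, here is a minimal NumPy sketch (the shapes and values are arbitrary placeholders, not from the original note): without an activation between them, two linear layers collapse into a single linear map, while inserting even a simple ReLU breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                    # a small batch of inputs
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

stacked_linear = (x @ W1) @ W2                 # two linear layers, no activation
single_linear = x @ (W1 @ W2)                  # one equivalent linear layer
print(np.allclose(stacked_linear, single_linear))   # True: depth bought us nothing

with_relu = np.maximum(x @ W1, 0) @ W2         # ReLU "gate" between the layers
print(np.allclose(with_relu, single_linear))   # False: the network is now non-linear
```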

Sigmoid
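The classic squashing function: σ(x) = 1 / (1 + e^(-x)) maps any input into (0, 1), but its gradient shrinks toward zero for large |x|, which is what makes it prone to vanishing gradients in deep stacks. A minimal NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)); output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma(x) * (1 - sigma(x)); close to 0 for large |x|
    s = sigmoid(x)
    return s * (1.0 - s)
```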

Tanh (Hyperbolic Tangent)
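Tanh is the zero-centered cousin of the sigmoid: it maps inputs into (-1, 1), which tends to help optimization, but it still saturates at both ends. A minimal NumPy sketch:

```python
import numpy as np

def tanh(x):
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x); output lies in (-1, 1), zero-centered
    return np.tanh(x)

def tanh_grad(x):
    # derivative: 1 - tanh(x)^2; still near 0 for large |x|
    return 1.0 - np.tanh(x) ** 2
```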

ReLU (Rectified Linear Unit)
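The workhorse: ReLU(x) = max(0, x) is cheap to compute and does not saturate for positive inputs, but neurons that end up with x < 0 get a zero gradient and can "die". A minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x); no saturation for x > 0
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative is 1 for x > 0 and 0 elsewhere; units stuck at 0 are "dead"
    return np.where(x > 0, 1.0, 0.0)
```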

Leaky ReLU
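Leaky ReLU is the simple fix for dead neurons: it keeps a small slope on the negative side so some gradient always flows. A minimal NumPy sketch (alpha = 0.01 is a common default, not a value from the note):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # LeakyReLU(x) = x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)
```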

ELU (Exponential Linear Unit)
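ELU behaves like ReLU for positive inputs but curves smoothly down to -α for negative ones, which keeps activations closer to zero-mean at the cost of an exponential. A minimal NumPy sketch (alpha = 1.0 is the usual default, not a value from the note):

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU(x) = x for x > 0, alpha * (e^x - 1) otherwise; saturates to -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```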

Modern Functions: Swish and GELU
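Swish (x · sigmoid(βx)) and GELU (x · Φ(x), where Φ is the standard normal CDF) are smooth, slightly non-monotonic functions that have become the default in Transformer-style models. A minimal NumPy sketch, using the widely used tanh approximation for GELU:

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x); beta = 1.0 is the usual default
    return x / (1.0 + np.exp(-beta * x))

def gelu(x):
    # GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    # (the common tanh approximation of x * Phi(x))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```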

Which one should I use?

  1. Always start with ReLU. It’s the standard for a reason: it’s fast and effective.
  2. If you run into "dead neurons" (ReLU units stuck at zero output): try Leaky ReLU or ELU.
  3. For Transformers and SOTA LLMs: Use GELU or Swish.
  4. Never use Sigmoid or Tanh in hidden layers unless you have a very specific reason. Save them for output layers: Sigmoid for binary classification, Tanh for GAN generator outputs (see the sketch after this list).
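Putting the rules together, here is a hedged PyTorch sketch of what they typically look like in practice (the layer sizes are arbitrary placeholders, not from the note):

```python
import torch.nn as nn

# Arbitrary layer sizes, chosen only for illustration.
binary_classifier = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),    # rule 1: default to ReLU in hidden layers
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),  # rule 4: Sigmoid only at the binary-classification output
)
# For a Transformer-style block you would swap nn.ReLU() for nn.GELU() (rule 3),
# and for dead-neuron trouble, nn.LeakyReLU() or nn.ELU() (rule 2).
```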