Vish Sangale
NOTE · April 2020

Loss functions in machine learning

Loss functions are the heartbeat of machine learning models. They quantify the discrepancy between a model’s predictions and the ground truth, guiding the optimization process. Choosing the right loss function is often as critical as the choice of model architecture.

In this guide, we’ll explore the essential loss functions used in modern ML, categorized by their primary use cases.

1. Regression Losses

Used when the target variable is continuous.

Mean Squared Error (MSE) / L2 Loss

The most common loss for regression. It penalizes larger errors more heavily by squaring them.
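
A rough NumPy sketch (the function name and signature are just for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared differences; squaring makes large errors dominate the loss.
    return np.mean((y_true - y_pred) ** 2)
```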

Mean Absolute Error (MAE) / L1 Loss

Measures the average magnitude of errors without considering their direction.
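
A minimal sketch along the same lines:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean of absolute differences; every unit of error is weighted equally.
    return np.mean(np.abs(y_true - y_pred))
```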

Huber Loss

A hybrid approach that’s quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than MSE.
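
A sketch, with delta = 1.0 as an illustrative threshold between the quadratic and linear regimes:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    # Quadratic inside the delta threshold, linear outside it.
    err = y_true - y_pred
    small = np.abs(err) <= delta
    return np.mean(np.where(small,
                            0.5 * err ** 2,
                            delta * (np.abs(err) - 0.5 * delta)))
```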

Log-Cosh Loss

The logarithm of the hyperbolic cosine of the prediction error. It behaves roughly like MSE for small errors and like MAE for large ones, while staying smooth everywhere.
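
A naive sketch (np.cosh can overflow for very large errors, so production code usually uses a numerically stabler rewrite):

```python
import numpy as np

def log_cosh(y_true, y_pred):
    # log(cosh(x)) is roughly x**2 / 2 for small x and |x| - log(2) for large x.
    return np.mean(np.log(np.cosh(y_pred - y_true)))
```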


2. Classification Losses

Used for predicting categorical outcomes.

Binary Cross-Entropy (BCE) / Log Loss

The standard loss for binary classification. It measures the performance of a model whose output is a probability between 0 and 1.
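
A sketch that assumes the model already outputs probabilities, with clipping to keep the logarithm finite:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true in {0, 1}; clip predicted probabilities to avoid log(0).
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```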

Categorical Cross-Entropy Loss

Used for multi-class classification where each example belongs to exactly one class.
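
A sketch assuming one-hot targets and a row of predicted class probabilities per example:

```python
import numpy as np

def categorical_cross_entropy(y_true_onehot, p_pred, eps=1e-12):
    # Negative log-probability assigned to the correct class, averaged over examples.
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(p), axis=1))
```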

Hinge Loss (SVM Loss)

Used for training maximum-margin classifiers, most notably Support Vector Machines (SVMs).
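
A sketch assuming labels in {-1, +1} and raw (unsquashed) decision scores:

```python
import numpy as np

def hinge(y_true, scores):
    # Zero loss once the margin y * score reaches 1; linear penalty otherwise.
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))
```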

Focal Loss

An evolution of Cross-Entropy designed to address extreme class imbalance (e.g., in object detection where background examples far outnumber objects).
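
A sketch of the binary form; gamma = 2 and alpha = 0.25 are the commonly quoted defaults, used here purely for illustration:

```python
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    # (1 - p_t) ** gamma down-weights examples the model already classifies well.
    p = np.clip(p_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, p, 1 - p)
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```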


3. Probability and Similarity Losses

KL Divergence (Kullback-Leibler)

Measures how one probability distribution differs from a second, reference distribution.
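
A sketch for two discrete distributions represented as probability vectors:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); note it is not symmetric in p and q.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))
```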

Dice Loss

Based on the Dice coefficient, it measures the overlap between the predicted and ground-truth masks. Frequently used in medical image segmentation to handle class imbalance between foreground and background.
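
A sketch of the soft Dice loss, with a small smoothing term to keep it defined on empty masks:

```python
import numpy as np

def dice_loss(y_true, p_pred, smooth=1.0):
    # 1 minus the (soft) Dice coefficient between the target mask and predicted probabilities.
    intersection = np.sum(y_true * p_pred)
    return 1.0 - (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(p_pred) + smooth)
```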


4. Ranking and Retrieval Losses

Crucial for recommendation systems and search where relative order matters more than absolute scores.

Triplet Loss

Learns an embedding in which an anchor example ends up closer to a positive (matching) example than to a negative one by at least a margin. Commonly used in face recognition and metric learning.
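
A sketch using squared Euclidean distances between batches of embeddings, with 0.2 as an illustrative margin:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Each argument is a (batch, embedding_dim) array of embeddings.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Loss is zero once the negative is farther away than the positive by at least `margin`.
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```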

Quantile Loss

Strictly a regression loss rather than a ranking one: it is used when we are interested in predicting a specific quantile (e.g., the 90th percentile) rather than the mean. Useful for estimating uncertainty and prediction intervals.
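
A sketch of the pinball form, with q = 0.9 as an illustrative quantile:

```python
import numpy as np

def quantile_loss(y_true, y_pred, q=0.9):
    # Under-prediction is penalized by q, over-prediction by (1 - q).
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1) * err))
```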


Choosing the right loss function is an iterative process. For large-scale systems, we often see hybrid losses that combine multiple objectives to balance precision, recall, and specific business constraints.