[1]. Cross-entropy and BCE have dedicated fused GPU kernels with built-in backward support. MSE and L1 are composed from primitive ops and differentiate through the standard autodiff engine.
g.cross_entropy_loss
Computes the mean cross-entropy loss between predicted logits and target labels. Internally applies log-softmax over the class dimension and then computes the mean negative log-likelihood.
Both inputs must have the same shape [batch, classes].
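The computation described above can be sketched in plain Python as a reference for checking values by hand. This is not the library's kernel: `log_softmax` and `cross_entropy_ref` are hypothetical helper names, nested lists stand in for tensors, and the mean is assumed to be taken over the batch dimension.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)  # subtract the row max so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def cross_entropy_ref(logits, targets):
    """Batch mean of -sum_c targets[b][c] * log_softmax(logits[b])[c].

    Assumption: the mean is over the batch dimension.
    """
    total = 0.0
    for logit_row, target_row in zip(logits, targets):
        log_probs = log_softmax(logit_row)
        total -= sum(t * lp for t, lp in zip(target_row, log_probs))
    return total / len(logits)

# Two samples, three classes; targets are one-hot rows.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
loss = cross_entropy_ref(logits, targets)
```

With all-zero logits the predicted distribution is uniform, so the second sample contributes exactly log(3) before averaging. Soft labels work the same way, as long as each target row sums to 1.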
Raw (pre-softmax) predictions of shape [batch, classes].
Target distribution of shape [batch, classes]. Typically one-hot encoded or soft labels.
Scalar loss of shape [1].
cross_entropy_loss has a native GPU kernel with a fused backward pass. It is the recommended loss for multi-class classification.
g.bce_loss
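Binary cross-entropy loss for binary or multi-label classification. In its standard form, assuming a mean reduction over all elements, the formula is: loss = -mean(labels · log(pred) + (1 - labels) · log(1 - pred)).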
pred must contain values in the range (0, 1). Apply g.sigmoid to raw logits before passing them here.
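As a rough illustration of the sigmoid-then-BCE order, the sketch below computes the standard binary cross-entropy in plain Python. The names `sigmoid` and `bce_ref` are hypothetical stand-ins, not the library's functions, and a mean reduction over all elements is assumed.

```python
import math

def sigmoid(x):
    """Logistic function, mapping a raw logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_ref(pred, labels):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all elements.

    Assumption: mean reduction. Every p must lie strictly in (0, 1).
    """
    terms = [
        -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        for p, y in zip(pred, labels)
    ]
    return sum(terms) / len(terms)

raw_logits = [2.0, -1.0, 0.0]
labels = [1.0, 0.0, 1.0]

# Squash the logits first; passing raw logits would hit log() of a
# non-positive number as soon as a value falls outside (0, 1).
pred = [sigmoid(z) for z in raw_logits]
loss = bce_ref(pred, labels)
```

A prediction of exactly 0 or 1 makes one of the logarithms diverge, which is why the open interval matters.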
Predicted probabilities in (0, 1), typically after sigmoid. Must have the same shape as labels.
Binary target values in {0, 1} (or soft labels in [0, 1]). Same shape as pred.
Scalar loss of shape [1].
bce_loss has a native GPU kernel with a fused backward pass. It is the recommended loss for binary and multi-label classification tasks.
g.mse_loss
Mean squared error: loss = mean((pred - target)²).
Implemented as a composition of primitive ops (neg, add, mul, mean_all) and differentiates through the standard autodiff engine.
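The composition can be sketched with list-based stand-ins for the four primitives. The helpers below only borrow the op names; they are illustrative and not the library's API. The gradient function shows what the chain rule yields through this composition, not a documented call.

```python
def neg(a):
    return [-x for x in a]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def mean_all(a):
    return sum(a) / len(a)

def mse_ref(pred, target):
    diff = add(pred, neg(target))     # pred - target, built from add and neg
    return mean_all(mul(diff, diff))  # mean of the squared differences

def mse_grad_ref(pred, target):
    """d(loss)/d(pred): applying the chain rule to the ops above gives 2*diff/N."""
    diff = add(pred, neg(target))
    n = len(diff)
    return [2.0 * d / n for d in diff]

pred = [1.0, 2.0, 4.0]
target = [1.0, 0.0, 1.0]
loss = mse_ref(pred, target)       # (0 + 4 + 9) / 3
grad = mse_grad_ref(pred, target)  # 2 * [0, 2, 3] / 3
```

Subtraction is expressed as `add(pred, neg(target))`, which is why `neg` and `add` appear in the op list instead of a dedicated subtract op.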
Predicted values. Any shape, must match target.
Ground truth values. Same shape as pred.
Scalar loss of shape [1].
g.l1_loss
Mean absolute error: loss = mean(|pred - target|).
Implemented as a composition of primitive ops (neg, add, abs, mean_all) and differentiates through the standard autodiff engine.
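A comparable sketch for the L1 composition, again with hypothetical list-based helpers in place of the real primitives. Where `pred` equals `target` the derivative of `abs` is undefined; the sketch returns 0 there, which is a common convention and an assumption here, not documented library behavior.

```python
def neg(a):
    return [-x for x in a]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def abs_all(a):
    return [abs(x) for x in a]

def mean_all(a):
    return sum(a) / len(a)

def l1_ref(pred, target):
    diff = add(pred, neg(target))  # pred - target
    return mean_all(abs_all(diff))

def l1_grad_ref(pred, target):
    """Subgradient sign(diff)/N, choosing 0 where diff == 0 (assumed convention)."""
    diff = add(pred, neg(target))
    n = len(diff)
    return [((d > 0) - (d < 0)) / n for d in diff]

pred = [1.0, 2.0, 4.0]
target = [1.0, 3.0, 1.0]
loss = l1_ref(pred, target)       # (0 + 1 + 3) / 3
grad = l1_grad_ref(pred, target)  # signs [0, -1, 1] divided by 3
```

Unlike MSE, the gradient magnitude does not shrink as the error shrinks: every non-zero element contributes ±1/N regardless of how far off it is.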
Predicted values. Any shape, must match target.
Ground truth values. Same shape as pred.
Scalar loss of shape [1].
Backward support summary
| Loss function | Backward support | Notes |
|---|---|---|
| cross_entropy_loss | Native fused kernel | Preferred for multi-class classification. |
| bce_loss | Native fused kernel | Preferred for binary / multi-label classification. |
| mse_loss | Via primitive autodiff | Composed from neg, add, mul, mean_all. |
| l1_loss | Via primitive autodiff | Composed from neg, add, abs, mean_all. Gradient is undefined where pred equals target. |