Multilayer Perceptron for Binary Classification: Threshold Learning

Ahua AIGC Lab (阿华AIGC实验室)

2026-5-19

Is This MLP Cross-Entropy Loss Setup Reasonable?

Great question—let’s unpack this step by step, because there’s a subtle mix-up between label encoding, model outputs, and how cross-entropy loss is designed to work.

First, let’s clarify key concepts to set the stage:

A sigmoid activation outputs a value s ∈ (0,1), which is conventionally interpreted as the predicted probability of the positive class in binary classification.
Cross-entropy loss (specifically binary cross-entropy, BCE, for binary tasks) is built to compare this continuous probability output to a true label that represents the ground-truth class distribution.
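To make these two definitions concrete, here is a minimal NumPy sketch. The helper names `sigmoid` and `binary_cross_entropy`, the `eps` clipping constant, and the sample logits and labels are all illustrative assumptions, not something taken from the original question.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued logit z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y01, p, eps=1e-12):
    """Mean BCE between 0/1 labels y01 and predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite
    return float(np.mean(-y01 * np.log(p) - (1.0 - y01) * np.log(1.0 - p)))

logits = np.array([2.0, -1.0, 0.0])   # hypothetical network outputs before the sigmoid
labels = np.array([1.0, 0.0, 1.0])    # ground truth in {0, 1}
probs = sigmoid(logits)               # continuous probabilities
loss = binary_cross_entropy(labels, probs)
```

Note that the loss is computed on `probs`, the continuous output, never on a thresholded version of it.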

Now let’s break down your setup and its validity:

1. The core confusion: Label encoding vs. prediction thresholding

You mention setting "label ŷ" to +1 if the network's sigmoid output ≥0.5, else -1. Wait—are we talking about true labels or predicted labels here? That changes everything:

If this is about predicted labels (for inference/evaluation): Thresholding a sigmoid output at 0.5 to get a discrete class (+1/-1 or 1/0) is totally standard for binary classification. This is how you turn a probability into a hard classification decision.
If this is about using +1/-1 as the true labels for cross-entropy loss calculation: That's where we need to adjust, but it's still workable with a small tweak.
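For the first case (predicted labels at inference time), the thresholding step might look like the sketch below. The function name `predict_pm1` and the sample arrays are hypothetical, chosen only for illustration.

```python
import numpy as np

def predict_pm1(probs, threshold=0.5):
    """Turn sigmoid probabilities into hard +1/-1 predictions."""
    return np.where(probs >= threshold, 1, -1)

probs = np.array([0.91, 0.50, 0.49, 0.08])   # hypothetical sigmoid outputs
y_true = np.array([1, -1, -1, -1])           # hypothetical ground-truth labels
y_hat = predict_pm1(probs)                   # hard decisions, used only for evaluation
accuracy = float(np.mean(y_hat == y_true))
```

Here the hard predictions feed an evaluation metric (accuracy), not the training loss.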

2. Using +1/-1 true labels with sigmoid & cross-entropy

Standard BCE loss expects true labels to be in {0,1} (matching the sigmoid's probability interpretation). But if your true labels are encoded as +1/-1, you can easily map them to 0/1 first:

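import numpy as np

# Convert +1/-1 labels to 0/1 format
y_true_01 = (y_true + 1) / 2
# Clip probabilities away from 0 and 1 so the logs stay finite
p = np.clip(sigmoid_output, 1e-12, 1 - 1e-12)
# Compute standard binary cross-entropy loss (one value per example)
loss = -y_true_01 * np.log(p) - (1 - y_true_01) * np.log(1 - p)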

This is mathematically equivalent to using 0/1 labels, so it’s completely reasonable. The encoding is just a convention—no impact on the loss's ability to minimize error and update weights.
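As a quick sanity check on that equivalence: with +1/-1 labels, the same loss can also be written directly on the pre-sigmoid logit z as log(1 + exp(-y·z)), often called the logistic loss. The sketch below compares the two routes numerically. The random logits and the label vector are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=5)                       # hypothetical pre-sigmoid logits
y_pm1 = np.array([1, -1, 1, 1, -1])          # labels in {+1, -1}

# Route 1: map labels to {0, 1} and apply the usual BCE formula.
p = 1.0 / (1.0 + np.exp(-z))
y01 = (y_pm1 + 1) / 2
bce = -y01 * np.log(p) - (1 - y01) * np.log(1 - p)

# Route 2: logistic loss written directly on +1/-1 labels and logits.
logistic = np.logaddexp(0.0, -y_pm1 * z)     # log(1 + exp(-y * z))

same = np.allclose(bce, logistic)
```

Both routes give the same per-example losses, which is why the label encoding is only a convention.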

3. The problematic scenario: Using thresholded +1/-1 predictions in cross-entropy

If you’re taking the sigmoid output, thresholding it to get a discrete +1/-1, and then using that discrete value to compute cross-entropy loss—this is not reasonable, and here’s why:

Gradient-based training needs a loss that varies smoothly with the weights. Thresholding is a step function whose derivative is zero almost everywhere, so a loss computed on a discrete +1/-1 prediction passes no gradient back to the weights and the network cannot learn.
Cross-entropy compares the true label with a predicted probability. A hard prediction collapses that probability to all-or-nothing, discarding the confidence information in the sigmoid's continuous output. Once mapped to 0/1, a wrong hard prediction also makes the log term -log(0), which is infinite.
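The vanishing-gradient point can be checked numerically. The sketch below uses a deliberately tiny one-weight model (an assumption made to keep things short, not the MLP from the question) and estimates the slope of each loss with a central finite difference. The names `soft_loss` and `hard_loss` and all data values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 2.0, 0.3])          # hypothetical inputs
y01 = np.array([1.0, 0.0, 1.0, 0.0])         # ground truth in {0, 1}

def soft_loss(w):
    """BCE computed on the continuous sigmoid output."""
    p = sigmoid(w * x)
    return np.mean(-y01 * np.log(p) - (1 - y01) * np.log(1 - p))

def hard_loss(w, eps=1e-12):
    """BCE computed on the thresholded (0/1) prediction, clipped to stay finite."""
    y_hat = (sigmoid(w * x) >= 0.5).astype(float)
    q = np.clip(y_hat, eps, 1 - eps)
    return np.mean(-y01 * np.log(q) - (1 - y01) * np.log(1 - q))

w, h = 0.8, 1e-4
soft_grad = (soft_loss(w + h) - soft_loss(w - h)) / (2 * h)   # finite-difference slope
hard_grad = (hard_loss(w + h) - hard_loss(w - h)) / (2 * h)
```

A small change in `w` moves `soft_loss` but leaves `hard_loss` exactly where it was, so `hard_grad` comes out as zero and gradient descent would have nothing to follow.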

Final Takeaway

Thresholding sigmoid outputs to +1/-1 for inference/evaluation is standard and reasonable.
Using +1/-1 as true labels is reasonable if you convert them to 0/1 before computing BCE loss.
Using thresholded +1/-1 predictions as part of the cross-entropy loss calculation is not reasonable: the thresholding step has zero gradient almost everywhere, so no learning signal reaches the weights.
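Putting the three takeaways together, a minimal end-to-end sketch could look like the following. Everything here is an assumption for illustration: the toy two-blob dataset, the 2-4-1 architecture with tanh hidden units, the learning rate, and the number of steps. It trains on the continuous sigmoid output with BCE and thresholds only at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs with +1/-1 ground-truth labels (hypothetical).
n = 100
X = np.vstack([rng.normal(+2.0, 1.0, size=(n, 2)),
               rng.normal(-2.0, 1.0, size=(n, 2))])
y_pm1 = np.concatenate([np.ones(n), -np.ones(n)])
y01 = (y_pm1 + 1) / 2                       # 0/1 targets used by the loss

# One hidden layer with tanh units, one sigmoid output unit.
W1 = rng.normal(0.0, 0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 0.5, size=(4, 1)); b2 = np.zeros(1)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
    return H, p.ravel()

def bce(y01, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-y01 * np.log(p) - (1 - y01) * np.log(1 - p)))

initial_loss = bce(y01, forward(X)[1])
lr = 0.1
for _ in range(200):
    H, p = forward(X)
    dz2 = (p - y01)[:, None] / len(X)       # gradient of mean BCE w.r.t. the output logit
    dW2 = H.T @ dz2; db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - H**2)         # backprop through tanh
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

final_loss = bce(y01, forward(X)[1])
y_hat = np.where(forward(X)[1] >= 0.5, 1, -1)   # threshold only at inference
accuracy = float(np.mean(y_hat == y_pm1))
```

The +1/-1 labels appear in exactly two places: converted to 0/1 for the loss, and compared against thresholded predictions for the accuracy score.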

The question discussed here comes from Stack Exchange; it was asked by Nikaido.