Inquiry: What torch.addmm Does in PyTorch and Why to Prefer It

阿华AIGC实验室 (A-Hua AIGC Lab)

2026-5-21

Understanding PyTorch's torch.addmm(): Purpose and Advantages Over Manual Calculation

Great question! Let’s break down what torch.addmm() does and why it’s often a better choice than writing the equivalent operation manually.

What does `torch.addmm()` do?

At its core, this function computes a scaled matrix product plus a scaled input matrix. Mathematically:

out = beta * input + alpha * (mat1 @ mat2)

The full signature is torch.addmm(input, mat1, mat2, *, beta=1, alpha=1, out=None). Here is what each parameter does:

input: The base matrix that is scaled by beta and added to the product. Its shape must be broadcastable to (n, p), so a 1-D bias of length p also works
mat1: The first matrix to multiply, of shape (n, m)
mat2: The second matrix to multiply, of shape (m, p)
beta (keyword-only): Scaling factor for input (defaults to 1)
alpha (keyword-only): Scaling factor for the product mat1 @ mat2 (defaults to 1)
out (optional, keyword-only): A pre-allocated tensor to store the result, avoiding a new allocation for the output
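To make the shapes concrete, here is a minimal sketch. torch.addmm() broadcasts input to the (n, p) shape of the product, so a 1-D bias is added to every row. The sizes n, m, p and the name bias are arbitrary choices for illustration:

```python
import torch

torch.manual_seed(0)
n, m, p = 2, 4, 3             # arbitrary sizes for illustration
mat1 = torch.randn(n, m)      # shape (n, m)
mat2 = torch.randn(m, p)      # shape (m, p)
bias = torch.randn(p)         # 1-D input, broadcast to (n, p)

# input comes first and is positional; beta and alpha are keyword-only
result = torch.addmm(bias, mat1, mat2, beta=1.0, alpha=1.0)

print(result.shape)                                          # torch.Size([2, 3])
print(torch.allclose(result, bias + mat1 @ mat2, atol=1e-6))  # True
```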

Here’s a quick code example to confirm the equivalence:

import torch

# Sample tensors
mat = torch.ones(2, 3)
mat1 = torch.randn(2, 4)
mat2 = torch.randn(4, 3)

# Using torch.addmm (input is positional; beta and alpha are keyword-only)
addmm_result = torch.addmm(mat, mat1, mat2, beta=0.5, alpha=2.0)

# Equivalent manual calculation
manual_result = 0.5 * mat + 2.0 * (mat1 @ mat2)

# Verify they're the same
print(torch.allclose(addmm_result, manual_result))  # Output: True

Why prioritize `torch.addmm()` over manual calculation?

You might be thinking, "Why not just write the math directly?" Here are the key reasons:

Better performance
PyTorch’s built-in functions are optimized for both CPU and GPU. torch.addmm() dispatches to a BLAS GEMM routine (cuBLAS on CUDA, a CPU BLAS such as MKL or OpenBLAS on CPU), whose standard form C = alpha * A @ B + beta * C applies both scaling factors and the addition inside the same call. The manual version also uses GEMM for mat1 @ mat2, but it then launches separate elementwise kernels for the two scalings and the addition, and materializes an intermediate tensor at each step (the product, its scaled copy, and the scaled input). Skipping those extra passes over memory saves bandwidth and kernel-launch overhead, which adds up for large matrices, for small matrices called in a hot loop, and especially on GPUs.
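You can measure the gap on your own hardware. This is a rough sketch using torch.utils.benchmark.Timer, which performs warm-up and CUDA synchronization for you. The matrix size and iteration count are arbitrary assumptions, and no particular speedup is implied: results vary widely with device, dtype, and shape.

```python
import torch
from torch.utils import benchmark

device = "cuda" if torch.cuda.is_available() else "cpu"
n = 256  # arbitrary size; try larger or smaller matrices on your own hardware
inp = torch.randn(n, n, device=device)
m1 = torch.randn(n, n, device=device)
m2 = torch.randn(n, n, device=device)
env = {"torch": torch, "inp": inp, "m1": m1, "m2": m2}

# Fused call: one GEMM that applies alpha and beta itself
fused = benchmark.Timer(
    stmt="torch.addmm(inp, m1, m2, beta=0.5, alpha=2.0)", globals=env
).timeit(50)

# Manual chain: GEMM plus separate scale, scale, and add kernels
manual = benchmark.Timer(
    stmt="0.5 * inp + 2.0 * (m1 @ m2)", globals=env
).timeit(50)

print(f"{device} addmm : {fused.mean * 1e6:.1f} us per call")
print(f"{device} manual: {manual.mean * 1e6:.1f} us per call")
```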
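Numerical behavior
Because the scaled product is not written out as a separate tensor, the fused call can avoid rounding that intermediate result to the output dtype, which matters mostly for reduced-precision types such as float16 and bfloat16. Don’t expect a large stability gain, though: for float32 the two versions usually agree to within ordinary rounding error. The one behavioral difference PyTorch documents is for beta = 0: addmm() then ignores input entirely, so NaN or inf values in it are not propagated, whereas the manual expression computes 0 * nan = nan.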
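The torch.addmm documentation states that when beta is 0, input is ignored and NaN or inf in it is not propagated. A minimal sketch to check this, where a NaN-filled tensor stands in for a hypothetical uninitialized or corrupted buffer:

```python
import torch

torch.manual_seed(0)
mat1 = torch.randn(2, 4)
mat2 = torch.randn(4, 3)
stale = torch.full((2, 3), float("nan"))  # hypothetical garbage buffer

# beta=0: input is ignored, so its NaNs never reach the result
fused = torch.addmm(stale, mat1, mat2, beta=0.0)

# Manual version: 0 * nan is nan, so every entry becomes NaN
manual = 0.0 * stale + (mat1 @ mat2)

print(torch.isnan(fused).any().item())   # False
print(torch.isnan(manual).all().item())  # True
```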
Memory efficiency
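Using the out parameter lets you write the result directly into a pre-allocated tensor, so repeated calls (for example, inside an inference loop) reuse one buffer instead of allocating a new result tensor every time. Avoiding the intermediate tensors comes from the fused call itself, not from out. One caveat: out= does not support autograd, and PyTorch raises an error if any argument requires grad, so out= is mainly useful for inference or code running under torch.no_grad(), not for the training step itself.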
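A minimal sketch of buffer reuse with out=, wrapped in torch.no_grad() as it would be in an inference loop (out= does not support autograd). The batch loop, the names buf and acc, and the sizes are hypothetical. The sketch also shows Tensor.addmm_(), the in-place method that accumulates into an existing tensor:

```python
import torch

torch.manual_seed(0)
weight = torch.randn(4, 3)
bias = torch.randn(3)
buf = torch.empty(8, 3)          # pre-allocated output buffer, reused below
ptr = buf.data_ptr()             # remember where its storage lives

with torch.no_grad():
    for _ in range(5):           # hypothetical stream of input batches
        x = torch.randn(8, 4)
        torch.addmm(bias, x, weight, out=buf)  # result written into buf
        assert buf.data_ptr() == ptr           # same storage, no new result tensor

    # In-place variant: acc <- beta * acc + alpha * (x @ weight), defaults beta=alpha=1
    acc = torch.zeros(8, 3)
    acc.addmm_(x, weight)

print(torch.allclose(buf, bias + x @ weight, atol=1e-6))  # True
print(torch.allclose(acc, x @ weight, atol=1e-6))         # True
```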
Clearer, more maintainable code
torch.addmm() explicitly signals that you’re performing a "matrix multiplication plus scaled add" operation. Other developers reading your code will immediately understand the intent, whereas manual calculations require double-checking the math to confirm it’s doing the same thing.
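A familiar case of this pattern is the affine map of a linear layer, y = x @ W.T + b. The sketch below shows that torch.addmm(b, x, W.t()) gives the same result as torch.nn.functional.linear(x, W, b) for 2-D x. The shapes follow nn.Linear’s (out_features, in_features) weight layout; the sizes themselves are arbitrary:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 4)        # batch of 5 inputs with 4 features
weight = torch.randn(3, 4)   # (out_features, in_features), as in nn.Linear
bias = torch.randn(3)        # broadcast across the batch

# "bias + x @ weight.T" expressed as one intention-revealing call
y_addmm = torch.addmm(bias, x, weight.t())
y_linear = F.linear(x, weight, bias)

print(torch.allclose(y_addmm, y_linear, atol=1e-6))  # True
```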
Optimized autograd support
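While PyTorch’s autograd can differentiate the manual expression, it records a separate node for the matrix product, for each of the two scalings, and for the addition, and the backward pass visits each of those nodes in turn. torch.addmm() records a single node (shown as AddmmBackward0 in recent releases) whose formula produces all three gradients directly: beta * grad for input (summed over any broadcast dimensions), alpha * grad @ mat2.T for mat1, and alpha * mat1.T @ grad for mat2. Fewer nodes means less graph bookkeeping and fewer kernel launches during the backward pass.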
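To see the difference in the recorded graph, the sketch below counts autograd nodes by walking grad_fn.next_functions. count_graph_nodes is a hypothetical helper written for this illustration, not a PyTorch API; it skips AccumulateGrad nodes, which only deposit gradients into leaf tensors. It also checks that both versions produce the same gradients:

```python
import torch

def count_graph_nodes(grad_fn):
    """Count autograd nodes reachable from grad_fn, excluding leaf accumulators."""
    seen, stack = set(), [grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return sum(1 for fn in seen if type(fn).__name__ != "AccumulateGrad")

torch.manual_seed(0)
mat = torch.randn(2, 3, requires_grad=True)
mat1 = torch.randn(2, 4, requires_grad=True)
mat2 = torch.randn(4, 3, requires_grad=True)

fused = torch.addmm(mat, mat1, mat2, beta=0.5, alpha=2.0)
manual = 0.5 * mat + 2.0 * (mat1 @ mat2)

fused_nodes = count_graph_nodes(fused.grad_fn)
manual_nodes = count_graph_nodes(manual.grad_fn)
print(type(fused.grad_fn).__name__, fused_nodes, manual_nodes)

# Both graphs yield the same gradients for input, mat1, and mat2
g_fused = torch.autograd.grad(fused.sum(), (mat, mat1, mat2))
g_manual = torch.autograd.grad(manual.sum(), (mat, mat1, mat2))
grads_match = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(g_fused, g_manual))
print(grads_match)  # True
```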