Identifying TensorFlow XLA dump files for Graphcore IPUs and what their filenames mean
Great questions—let's break these down clearly, since navigating XLA dumps for Graphcore IPUs can feel a bit overwhelming at first.
1. Target XLA Graph Files from TensorFlow-for-IPU Exports
When you export an XLA graph from a TensorFlow program optimized for Graphcore IPUs (usually by calling `ipu.export_xla()` on a function wrapped with `ipu.compile()`), the file containing your fully optimized target graph is the one with a `.xla_pb` extension (protobuf format).
Here's what the filename components mean, using a real-world example (`train_step.xla_pb`):
- `train_step`: This matches the name of the function you compiled with `ipu.compile()`; it's a human-readable identifier for your computation graph.
- `.xla_pb`: The suffix tells you this is a serialized XLA computation graph. It includes all IPU-specific optimizations: operator sharding across tiles, memory allocation strategies, and mappings to IPU-native operators, i.e. everything needed to run the graph on IPU hardware.
You might also see a `.hlo.txt` file alongside it: this is a human-readable text dump of the HLO (High-Level Optimizer) instructions, useful for debugging, but the `.xla_pb` file is the actual executable graph.
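To make the distinction between the two artifacts concrete, here is a small sketch. The helper name and the handling of unrelated files are my own; this only encodes the extension convention described above, not any Graphcore API:

```python
# Hypothetical helper: classify a TensorFlow-for-IPU export artifact by extension.
# ".xla_pb" is the serialized, IPU-optimized executable graph;
# ".hlo.txt" is the human-readable HLO dump meant for debugging.
def classify_export_file(filename):
    if filename.endswith(".xla_pb"):
        return "executable XLA graph (protobuf)"
    if filename.endswith(".hlo.txt"):
        return "human-readable HLO text dump"
    return "other"

if __name__ == "__main__":
    for name in ["train_step.xla_pb", "train_step.hlo.txt", "notes.md"]:
        print(f"{name}: {classify_export_file(name)}")
```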
2. Identifying the Target .dot File in Debug Dumps
Your flags are dumping HLO graphs at specific optimization passes, and the changing file count makes sense: `ipu.compile()` triggers IPU-exclusive optimization steps that don't run in standard TensorFlow XLA compilation.
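For reference, dump flags like these are typically passed through the `XLA_FLAGS` environment variable, set before TensorFlow is imported so the XLA runtime picks them up at initialization. A minimal sketch (the dump directory path is a placeholder):

```python
import os

# Set XLA dump flags before importing TensorFlow. "/tmp/xla_dump" is a
# placeholder path; --xla_dump_hlo_as_dot requests Graphviz .dot output,
# and --xla_dump_hlo_pass_re restricts dumps to matching pass names.
os.environ["XLA_FLAGS"] = " ".join([
    "--xla_dump_to=/tmp/xla_dump",
    "--xla_dump_hlo_as_dot",
    "--xla_dump_hlo_pass_re=forward-allocation",
])
```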
Here’s how to find your target graph:
- Look for files with `forward-allocation` in the name: You specified `--xla_dump_hlo_pass_re=forward-allocation`, which tells XLA to dump the graph after the pass that finalizes memory allocation for IPU tiles and locks in operator placements. This is the closest representation of the graph that will actually execute on the IPU.
- Filename breakdown (example: `hlo_forward-allocation_0001_fn_0.dot`):
  - `hlo_`: Marks this as an HLO graph dump.
  - `forward-allocation`: The name of the optimization pass you targeted; this is the key string to search for.
  - `0001`: A sequential number for the pass execution step (useful if the pass runs multiple times).
  - `fn_0`: A unique identifier for your compiled function (increments if you have multiple compiled functions in your program).
  - `.dot`: The Graphviz format extension, which you can visualize with browser-based tools.
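The naming scheme above can also be parsed mechanically. The regex and the field labels (`pass_name`, `step`, `fn_id`) below are my own sketch, matched against the example filename rather than any documented Graphcore convention:

```python
import re

# Sketch of a parser for dump names like "hlo_forward-allocation_0001_fn_0.dot".
# Field labels are my own names for the components described above.
DUMP_NAME = re.compile(
    r"hlo_(?P<pass_name>.+?)_(?P<step>\d+)_fn_(?P<fn_id>\d+)\.dot$"
)

def parse_dump_name(filename):
    m = DUMP_NAME.match(filename)
    if m is None:
        return None  # not an HLO .dot dump
    return {
        "pass_name": m.group("pass_name"),   # optimization pass name
        "step": int(m.group("step")),        # pass execution step
        "fn_id": int(m.group("fn_id")),      # compiled-function index
    }
```

For example, `parse_dump_name("hlo_forward-allocation_0001_fn_0.dot")` yields pass name `forward-allocation`, step 1, and function index 0.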
Why file counts vary:
- Without `ipu.compile()`, XLA only runs its generic CPU/GPU optimization passes, so you'll see fewer dump files.
- With `ipu.compile()`, Graphcore's XLA backend runs additional IPU-specific passes (such as tile sharding, memory reuse, and IPU operator lowering), which produces more dump files. Only the `forward-allocation` files show you the final IPU-optimized graph.
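To see at a glance which passes produced how many dumps, you could group the `.dot` files in your dump directory by pass name. This is a sketch under the filename convention above; the directory path and helper names are my own:

```python
import re
from collections import Counter
from pathlib import Path

# Extract the pass name from dump basenames like
# "hlo_<pass-name>_<step>_fn_<n>.dot".
PASS_RE = re.compile(r"hlo_(?P<pass_name>.+?)_\d+_fn_\d+\.dot$")

def count_dumps_by_pass(filenames):
    # Tally how many dump files each optimization pass produced.
    counts = Counter()
    for name in filenames:
        m = PASS_RE.match(name)
        if m:
            counts[m.group("pass_name")] += 1
    return counts

def dumps_in(dump_dir):
    # Collect the basenames of all .dot files in a dump directory.
    return [p.name for p in Path(dump_dir).glob("*.dot")]
```

Running `count_dumps_by_pass(dumps_in("/tmp/xla_dump"))` (placeholder path) shows the per-pass counts; the `forward-allocation` bucket holds the final IPU-optimized graphs.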
The question comes from Stack Exchange; its author is Niels.