How do you convert the Python disassembly produced by dis.dis back into a code object?

阿华AIGC实验室

2026-5-13

Can you create a code object from disassembly output from dis.dis()?

The short answer: Python's built-in dis module has no way to "recompile" its human-readable disassembly back into a code object, but it can be done with some extra work, either by working from structured disassembly data or by using a third-party library to reconstruct the code object.

Why the built-in `dis` can't do this alone

When you run dis.dis(co) on a code object, you get a human-friendly text listing of the bytecode instructions, their arguments, and some context. But a code object carries far more metadata than that listing shows: the constant pool, local variable names, stack size, flags, line number table, free variables, and more. Some of these appear only indirectly (a constant shows up only next to an instruction that loads it), and others, such as the stack size and flags, do not appear at all, so the text alone is not enough to rebuild a valid code object.
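To make the gap concrete, here is a minimal sketch that compiles the print("lol") example, captures the dis.dis() listing, and prints the code object attributes that any reconstruction has to supply. The variable names (co, listing, metadata) are only for illustration, and the exact values vary by Python version.

```python
import dis
import io

# Compile the one-liner from the question into a module-level code object.
co = compile('print("lol")', "<string>", "exec")

# Capture the text that dis.dis() would normally print to stdout.
buf = io.StringIO()
dis.dis(co, file=buf)
listing = buf.getvalue()
print(listing)

# Metadata stored on the code object. Most of these fields have no
# line of their own in the listing above.
metadata = {
    "co_consts": co.co_consts,
    "co_names": co.co_names,
    "co_varnames": co.co_varnames,
    "co_freevars": co.co_freevars,
    "co_stacksize": co.co_stacksize,
    "co_flags": co.co_flags,
    "co_filename": co.co_filename,
    "co_name": co.co_name,
    "co_firstlineno": co.co_firstlineno,
}
for key, value in metadata.items():
    print(f"{key:15} = {value!r}")
```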

How to reconstruct a code object from disassembly

Here are two practical approaches to make this work:

1. Use structured disassembly with `dis.get_instructions()`

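Instead of parsing the plain text output from dis.dis(), use dis.get_instructions(co), which returns structured Instruction objects. These objects have attributes like opname, arg, argval, and offset, making it easier to extract the data needed to rebuild the code object. This avoids the messy work of parsing raw text. Note that get_instructions() takes the code object itself (or a function or source string) as input, so it only helps when you still have the original object and not just its printed listing.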
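As a rough illustration of the structured form, the sketch below collects the offset, opname, arg, and argval of each instruction for the same print("lol") example. The rows list is just a name chosen here, and the opcodes you see depend on your Python version.

```python
import dis

# Module-level code object for the example from the question.
co = compile('print("lol")', "<string>", "exec")

# One tuple per instruction: (offset, opname, raw arg, resolved argval).
rows = [
    (ins.offset, ins.opname, ins.arg, ins.argval)
    for ins in dis.get_instructions(co)
]

for offset, opname, arg, argval in rows:
    print(f"{offset:4} {opname:20} {arg!s:>5} {argval!r}")
```

The arg column holds the raw integer operand (an index into co_names or co_consts for the load instructions), while argval holds the resolved value, which is the part shown in parentheses in the text listing.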

2. Use a third-party bytecode manipulation library

Libraries like bytecode simplify the process of constructing and converting bytecode into valid code objects. Let's walk through an example using your original code:

First, install the library:

pip install bytecode

Then, we'll reconstruct the code object from the disassembly of print("lol"):

from bytecode import ConcreteBytecode, ConcreteInstr

# Step 1: Extract metadata and instructions from the original disassembly
# Original disassembly output (Python 3.10 or earlier; CALL_FUNCTION was removed in 3.11):
#   1           0 LOAD_NAME                0 (print)
#               2 LOAD_CONST               0 ('lol')
#               4 CALL_FUNCTION            1
#               6 POP_TOP
#               8 LOAD_CONST               1 (None)
#              10 RETURN_VALUE

consts = ('lol', None)
names = ('print',)
# ConcreteInstr takes the raw integer arguments exactly as dis prints them
# (indexes into consts/names, or an argument count for CALL_FUNCTION)
instructions = [
    ConcreteInstr("LOAD_NAME", 0),      # names[0] -> print
    ConcreteInstr("LOAD_CONST", 0),     # consts[0] -> 'lol'
    ConcreteInstr("CALL_FUNCTION", 1),  # call with 1 positional argument
    ConcreteInstr("POP_TOP"),
    ConcreteInstr("LOAD_CONST", 1),     # consts[1] -> None
    ConcreteInstr("RETURN_VALUE"),
]

# Step 2: Build a ConcreteBytecode object with all required metadata
concrete_bytecode = ConcreteBytecode(instructions)
concrete_bytecode.consts = list(consts)
concrete_bytecode.names = list(names)
concrete_bytecode.argcount = 0  # No positional arguments
concrete_bytecode.flags = 0x00  # No compiler flags set
concrete_bytecode.filename = '<string>'
concrete_bytecode.name = '<module>'

# Step 3: Convert to a Python code object
# stacksize=2 is the maximum stack depth this code needs;
# leave it out to let the library compute it
rebuilt_code_obj = concrete_bytecode.to_code(stacksize=2)

# Test it out!
exec(rebuilt_code_obj)  # This will print "lol"
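If the original code object is still available (and not only its text listing), the standard library can also produce a modified copy without any third-party package. The sketch below assumes Python 3.8+, where code objects have a replace() method, and swaps the 'lol' constant for 'hello'. The captured list and the replacement print function are hypothetical helpers used only to observe the result.

```python
# Assumes Python 3.8+ for CodeType.replace().
original = compile('print("lol")', "<string>", "exec")

# Build a new constant pool with 'lol' swapped for 'hello'.
new_consts = tuple("hello" if c == "lol" else c for c in original.co_consts)

# replace() copies the code object, overriding only the named fields.
patched = original.replace(co_consts=new_consts)

# Run the patched code with a stand-in print that records its argument.
captured = []
namespace = {"print": captured.append}
exec(patched, namespace)
print(captured)  # ['hello']
```

This keeps the bytecode and every other field untouched, which sidesteps most of the metadata problems, but it is no help when all you have is the printed disassembly.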

Key things to watch out for

Metadata completeness: To build a valid code object, you need to set every required field correctly (for example co_nlocals, co_freevars, and the line number table, which is co_lnotab or co_linetable depending on the Python version). You'll need to extract these from the original code object separately, since dis.dis() doesn't show every detail.
Instruction accuracy: Even a single wrong opcode or argument will break the code object, so double-check that your parsed instructions match the original disassembly exactly.
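Python version differences: Bytecode formats change between Python versions, so make sure your tooling and your instruction names match the Python version you're working with. For example, the CALL_FUNCTION opcode used above exists only up to Python 3.10; Python 3.11 replaced it with a different call sequence, so the same listing cannot be rebuilt there as-is.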
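One way to guard against the version issue is to check which opcodes the running interpreter actually defines before building any instructions. The helper below is a hypothetical sketch, not part of any library: it looks up the call opcode name in dis.opmap.

```python
import dis
import sys

def call_opname():
    """Return the name of the call opcode defined by the running interpreter."""
    # CALL_FUNCTION exists up to Python 3.10; 3.11 and later define CALL.
    for candidate in ("CALL_FUNCTION", "CALL"):
        if candidate in dis.opmap:
            return candidate
    return None

print(sys.version_info[:2], call_opname())
```

A disassembly that mentions an opcode missing from dis.opmap was produced by a different Python version and has to be rebuilt with that version's interpreter.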

The question comes from Stack Exchange; it was asked by pomo_mondreganto.