How do I convert the Python disassembly produced by dis.dis back into a code object?
The short answer: Python's built-in dis module cannot "recompile" human-readable disassembly back into a code object, but it is possible with some extra work, either by parsing structured disassembly data or by using a third-party library to reconstruct the code object.
Why the built-in dis can't do this alone
When you run dis.dis(co) on a code object, you get human-friendly text output that shows the bytecode instructions, their arguments, and some context. But a code object contains far more metadata than that output shows: the constant pool, local variable names, stack size, flags, the line number table, free variables, and more. Some of these details are omitted from the text disassembly, and others are presented in a format that is hard to parse programmatically back into a valid code object.
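To make that gap concrete, here is a minimal sketch that collects some of the attributes a rebuilt code object would also have to carry. It assumes the same print("lol") snippet used later in this answer; the metadata dictionary is just an illustrative name, not part of any API.

```python
# Compile the same snippet used later in this answer.
co = compile('print("lol")', "<string>", "exec")

# A selection of the attributes stored on the code object.
# A reconstruction has to supply values for these as well as the bytecode.
metadata = {
    "co_consts": co.co_consts,
    "co_names": co.co_names,
    "co_varnames": co.co_varnames,
    "co_freevars": co.co_freevars,
    "co_argcount": co.co_argcount,
    "co_nlocals": co.co_nlocals,
    "co_stacksize": co.co_stacksize,
    "co_flags": co.co_flags,
    "co_filename": co.co_filename,
    "co_name": co.co_name,
    "co_firstlineno": co.co_firstlineno,
}

for key, value in metadata.items():
    print(f"{key:15} = {value!r}")
```

The standard library's dis.code_info(co) returns a similar summary as formatted text, which can be handy for a quick look.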
How to reconstruct a code object from disassembly
Here are two practical approaches to make this work:
1. Use structured disassembly with dis.get_instructions()
Instead of parsing the plain text output from dis.dis(), use dis.get_instructions(co) which returns structured Instruction objects. These objects have attributes like opname, arg, argval, and offset, making it easier to extract the data needed to rebuild the code object. This avoids the messy work of parsing raw text.
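As a rough illustration of this approach, the sketch below turns each Instruction into a plain tuple. The records name and the choice of fields are assumptions made for this example; the exact opcodes you see depend on the Python version running the code.

```python
import dis

co = compile('print("lol")', "<string>", "exec")

# One (offset, opname, arg, argval) record per instruction.
# arg is the raw integer operand (or None); argval is the resolved value,
# such as the constant or name the operand refers to.
records = [
    (ins.offset, ins.opname, ins.arg, ins.argval)
    for ins in dis.get_instructions(co)
]

for offset, opname, arg, argval in records:
    print(f"{offset:>4} {opname:<20} {arg!s:<6} {argval!r}")
```

These records cover only the instruction stream. The metadata (constants, names, flags, and so on) still has to be read from the code object itself.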
2. Use a third-party bytecode manipulation library
Libraries like bytecode simplify the process of constructing and converting bytecode into valid code objects. Let's walk through an example using your original code:
First, install the library:
pip install bytecode
Then, we'll reconstruct the code object from the disassembly of print("lol"):
from bytecode import ConcreteBytecode, ConcreteInstr

# Step 1: Extract metadata and instructions from the original disassembly.
# Original disassembly output (CPython 3.6 to 3.10; CALL_FUNCTION was removed in 3.11):
#   1           0 LOAD_NAME                0 (print)
#               2 LOAD_CONST               0 ('lol')
#               4 CALL_FUNCTION            1
#               6 POP_TOP
#               8 LOAD_CONST               1 (None)
#              10 RETURN_VALUE
consts = ('lol', None)
names = ('print',)
# ConcreteBytecode holds raw integer arguments, so it takes ConcreteInstr.
# (Instr belongs to the abstract Bytecode class, which uses names and values.)
instructions = [
    ConcreteInstr("LOAD_NAME", 0),
    ConcreteInstr("LOAD_CONST", 0),
    ConcreteInstr("CALL_FUNCTION", 1),
    ConcreteInstr("POP_TOP"),
    ConcreteInstr("LOAD_CONST", 1),
    ConcreteInstr("RETURN_VALUE"),
]

# Step 2: Build a ConcreteBytecode object with all required metadata.
concrete_bytecode = ConcreteBytecode(instructions)
concrete_bytecode.consts = list(consts)
concrete_bytecode.names = list(names)
concrete_bytecode.argcount = 0  # No positional arguments
concrete_bytecode.flags = 0x00  # No special flags for module-level code
concrete_bytecode.filename = '<string>'
concrete_bytecode.name = '<module>'

# Step 3: to_code() returns a Python code object directly.
# The stack size is computed by the library when none is given.
rebuilt_code_obj = concrete_bytecode.to_code()

# Test it out!
exec(rebuilt_code_obj)  # This will print "lol"
Key things to watch out for
- Metadata completeness: To build a valid code object, you need to correctly set all required parameters (like nlocals, freevars, or lnotab for line numbers). If the original code object had these, you'll need to extract them separately since dis.dis() doesn't show every detail.
- Instruction accuracy: Even a single wrong opcode or argument will break the code object, so double-check that your parsed instructions match the original disassembly exactly.
- Python version differences: Bytecode formats can change between Python versions, so make sure your tooling matches the Python version you're working with.
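One way to sidestep part of the metadata problem, not mentioned above and offered only as a sketch, is to start from a template code object and override just the fields you have recovered. This relies on the replace() method of code objects, which exists on Python 3.8 and later. The template, new_consts, and rebuilt names are illustrative.

```python
import contextlib
import io

# Template: a code object with the same shape as the one being rebuilt.
template = compile('print("lol")', "<string>", "exec")

# Swap one constant; every other field is inherited from the template.
new_consts = tuple("hi" if c == "lol" else c for c in template.co_consts)
rebuilt = template.replace(co_consts=new_consts)

# Run the rebuilt code object and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(rebuilt)
output = buffer.getvalue()
print(output, end="")
```

Because the template is compiled by the running interpreter, the inherited fields automatically match that Python version. The fields you override still have to be consistent with the bytecode.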
This question originates from Stack Exchange; it was asked by pomo_mondreganto.




