Python bytecode disassembler

References:

Start with the following example function:

def myfunc(alist):
    return len(alist)

The following command displays the disassembly of myfunc():

import dis

dis.dis(myfunc)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + len)
             12 LOAD_FAST                0 (alist)
             14 CALL                     1
             22 RETURN_VALUE

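The numbers in the leftmost column are source line numbers: 1 is the def line (where RESUME is placed) and 2 is the return statement. The second column is the byte offset of each instruction. The opcodes shown match CPython 3.12; other Python versions produce different listings.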
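The line-number column comes from the code object's line table. The sketch below reads that table directly with dis.findlinestarts. It compiles the same function from a source string so that the line numbers are predictable. The names src and ns are illustrative only.

```python
import dis

# Compile myfunc from a string so that "def" is line 1 and "return" is line 2
src = "def myfunc(alist):\n    return len(alist)\n"
ns = {}
exec(compile(src, "<example>", "exec"), ns)
myfunc = ns["myfunc"]

# findlinestarts yields (bytecode offset, source line) pairs; these are the
# offsets at which the dis.dis() listing prints a new line number
starts = list(dis.findlinestarts(myfunc.__code__))
for offset, line in starts:
    print(offset, line)

# Collect the distinct source lines that own some bytecode
lines = {line for _, line in starts if line is not None}
print(sorted(lines))
```

On Python 3.11 and later, line 1 appears because RESUME is attributed to the def line. Older versions have no RESUME, so only line 2 shows up.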

Bytecode analysis

The bytecode analysis API wraps a piece of Python code in a Bytecode object, which gives easy access to the details of the compiled code.

bytecode = dis.Bytecode(myfunc)
for instr in bytecode:
    print(instr.opname)
RESUME
LOAD_GLOBAL
LOAD_FAST
CALL
RETURN_VALUE
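Each item yielded by dis.Bytecode is a dis.Instruction. Besides opname, it carries the offset, the resolved argument (argval), and the text that dis.dis() prints (argrepr). The sketch below uses these fields to list which global and local names myfunc loads. Matching on the LOAD_FAST prefix is an assumption meant to tolerate variants that newer CPython versions may emit.

```python
import dis

def myfunc(alist):
    return len(alist)

bc = dis.Bytecode(myfunc)

# argval is the resolved argument (a name, constant, or jump target);
# argrepr is the human-readable text shown by dis.dis()
for instr in bc:
    print(instr.offset, instr.opname, instr.argval, instr.argrepr)

# Which globals and locals does the function load?
global_loads = [i.argval for i in bc if i.opname == "LOAD_GLOBAL"]
# Prefix match: newer versions may use variants such as LOAD_FAST_BORROW
local_loads = [i.argval for i in bc if i.opname.startswith("LOAD_FAST")]
print(global_loads, local_loads)

# Line number of the def statement
print(bc.first_line)
```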

The bytecode library

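The following sections use the third-party library bytecode, a Python module for generating and modifying bytecode.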

Installation:

pip install bytecode

Abstract bytecode

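The following example uses abstract bytecode, in which instruction arguments are plain values such as names and constants, to execute print('Hello World!'). Note that this example and the ones after it use CALL_FUNCTION (and, in the loop example, JUMP_ABSOLUTE). These opcodes only exist up to Python 3.10 because CPython 3.11 replaced them, so the instruction lists must be adapted for newer interpreters.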

from bytecode import Instr, Bytecode

bytecode = Bytecode([Instr("LOAD_NAME", 'print'),
                     Instr("LOAD_CONST", 'Hello World!'),
                     Instr("CALL_FUNCTION", 1),
                     Instr("POP_TOP"),
                     Instr("LOAD_CONST", None),
                     Instr("RETURN_VALUE")])
code = bytecode.to_code()
exec(code)
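The opcode set changes between CPython versions. Before writing instructions by hand, it therefore helps to check what the running interpreter's own compiler emits for the same statement. This sketch uses only the standard library.

```python
import dis
import sys

code = compile("print('Hello World!')", "<example>", "exec")

# The instruction names the running CPython emits for the same statement
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(sys.version_info[:2], opnames)

exec(code)  # prints: Hello World!
```

On Python 3.10 the list should match the hand-written one. On 3.11 and later it contains extra instructions such as RESUME, and the call is encoded with a different sequence ending in CALL.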

Concrete bytecode

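An example that executes print('Hello World!') with concrete bytecode. Here instruction arguments are integer indices into the names and consts tables: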

from bytecode import ConcreteInstr, ConcreteBytecode

bytecode = ConcreteBytecode()
bytecode.names = ['print']
bytecode.consts = ['Hello World!', None]
bytecode.extend([ConcreteInstr("LOAD_NAME", 0),
                 ConcreteInstr("LOAD_CONST", 0),
                 ConcreteInstr("CALL_FUNCTION", 1),
                 ConcreteInstr("POP_TOP"),
                 ConcreteInstr("LOAD_CONST", 1),
                 ConcreteInstr("RETURN_VALUE")])
code = bytecode.to_code()
exec(code)
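A real code object has the same two-level structure. The tables live in co_names and co_consts. dis exposes both views of each argument: instr.arg is the concrete index, and instr.argval is the abstract value that the index refers to. The sketch below checks that correspondence for LOAD_NAME and LOAD_CONST.

```python
import dis

code = compile("print('Hello World!')", "<example>", "exec")
print(code.co_names)   # names table, e.g. ('print',)
print(code.co_consts)  # constants table

# Concrete view: instr.arg is an index into co_names / co_consts.
# Abstract view: instr.argval is the value that index refers to.
checked = []
for instr in dis.get_instructions(code):
    if instr.opname == "LOAD_NAME":
        assert code.co_names[instr.arg] == instr.argval
        checked.append((instr.opname, instr.arg, instr.argval))
    elif instr.opname == "LOAD_CONST":
        assert code.co_consts[instr.arg] == instr.argval
        checked.append((instr.opname, instr.arg, instr.argval))
print(checked)
```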

Setting compiler flags

Bytecode, ConcreteBytecode and ControlFlowGraph instances all have a flags attribute, which is an instance of the CompilerFlags enum. The value can be manipulated like any set of binary flags.
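CPython keeps the corresponding bits in a code object's co_flags attribute, and the standard inspect module names them (inspect.CO_OPTIMIZED, inspect.CO_GENERATOR, and so on). The sketch below shows that a function body is compiled with OPTIMIZED (fast locals) while module-level code is not. It reads the plain integer co_flags, not the bytecode library's CompilerFlags.

```python
import inspect

def func(x):
    return x

module_code = compile("x = 1", "<example>", "exec")

# co_flags is a plain int; inspect exposes the bit values as CO_* constants
print(bool(func.__code__.co_flags & inspect.CO_OPTIMIZED))  # True
print(bool(module_code.co_flags & inspect.CO_OPTIMIZED))    # False
```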

Setting the OPTIMIZED flag:

from bytecode import Bytecode, CompilerFlags

bytecode = Bytecode()
bytecode.flags |= CompilerFlags.OPTIMIZED

Clearing the OPTIMIZED flag:

from bytecode import Bytecode, CompilerFlags

bytecode = Bytecode()
bytecode.flags &= ~CompilerFlags.OPTIMIZED  # ^= would toggle the bit rather than clear it
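Clearing and toggling a flag are different operations. XOR flips a bit, so it only clears OPTIMIZED when the flag is already set. AND with the complement clears it unconditionally. The sketch below demonstrates this with plain integers and inspect.CO_OPTIMIZED. The helpers clear_flag and toggle_flag are hypothetical and only illustrate the bit arithmetic.

```python
import inspect

OPT = inspect.CO_OPTIMIZED

def clear_flag(flags, bit):
    """Unset `bit` whatever its current state."""
    return flags & ~bit

def toggle_flag(flags, bit):
    """Flip `bit`: a set bit becomes unset, an unset bit becomes set."""
    return flags ^ bit

flags_off = 0
flags_on = OPT | inspect.CO_NEWLOCALS

print(clear_flag(flags_on, OPT) & OPT)    # 0
print(clear_flag(flags_off, OPT) & OPT)   # 0
print(toggle_flag(flags_off, OPT) & OPT)  # 1: XOR turned the bit on
```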

The update_flags method recomputes flags from the instructions stored in the bytecode object.
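The idea of deriving flags from instructions can be illustrated for a single flag with the standard library. The hypothetical helper looks_like_generator below infers the GENERATOR flag from the presence of YIELD_VALUE and compares the result with the flag CPython actually set. This is not how update_flags is implemented, only a sketch of the principle.

```python
import dis
import inspect

def looks_like_generator(func):
    """Hypothetical helper: infer the GENERATOR flag from the instructions."""
    return any(instr.opname == "YIELD_VALUE" for instr in dis.get_instructions(func))

def gen():
    yield 1

def plain():
    return 1

for f in (gen, plain):
    inferred = looks_like_generator(f)
    actual = bool(f.__code__.co_flags & inspect.CO_GENERATOR)
    print(f.__name__, inferred, actual)
```

This one-opcode rule is deliberately naive. On recent CPython versions, coroutines that use await also contain YIELD_VALUE, so a complete flag computation needs more rules than this.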

A simple loop

The following builds bytecode equivalent to this loop:

for x in (1, 2, 3): print(x)

from bytecode import Label, Instr, Bytecode

loop_start = Label()
loop_exit = Label()
code = Bytecode(
    [
        # Python 3.8 removed SETUP_LOOP
        Instr("LOAD_CONST", (1, 2, 3)),
        Instr("GET_ITER"),
        loop_start,
            Instr("FOR_ITER", loop_exit),
            Instr("STORE_NAME", "x"),
            Instr("LOAD_NAME", "print"),
            Instr("LOAD_NAME", "x"),
            Instr("CALL_FUNCTION", 1),
            Instr("POP_TOP"),
            Instr("JUMP_ABSOLUTE", loop_start),
        # Python 3.8 removed the need to manually manage blocks in loops
        # This is now handled internally by the interpreter
        loop_exit,
            Instr("LOAD_CONST", None),
            Instr("RETURN_VALUE"),
    ]
)

# Converting to a Python code object resolves jump targets:
# abstract labels are replaced with concrete offsets
code = code.to_code()
exec(code)
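For comparison, the sketch below lets CPython compile the same loop and lists the result with dis. In a real code object the jump targets are already concrete byte offsets, which is what the labels are resolved into by the conversion to a code object. The backward jump is JUMP_ABSOLUTE up to Python 3.10 and JUMP_BACKWARD from 3.11 on, so the sketch accepts either name.

```python
import dis

code = compile("for x in (1, 2, 3): print(x)", "<example>", "exec")
instrs = list(dis.get_instructions(code))
for instr in instrs:
    # ">>" marks instructions that some jump targets, as dis.dis() does
    marker = ">>" if instr.is_jump_target else "  "
    print(marker, instr.offset, instr.opname, instr.argrepr)

# Jump arguments have already been resolved to byte offsets
for_iter = next(i for i in instrs if i.opname == "FOR_ITER")
back_jump = next(i for i in instrs if i.opname in ("JUMP_ABSOLUTE", "JUMP_BACKWARD"))
print("FOR_ITER exits to offset", for_iter.argval)
print("loop jumps back to offset", back_jump.argval)
```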

Conditional jump

The following builds bytecode equivalent to this conditional expression:

print('yes' if test else 'no')

from bytecode import Label, Instr, Bytecode

label_else = Label()
label_print = Label()
bytecode = Bytecode([Instr('LOAD_NAME', 'print'),
                     Instr('LOAD_NAME', 'test'),
                     Instr('POP_JUMP_IF_FALSE', label_else),
                         Instr('LOAD_CONST', 'yes'),
                         Instr('JUMP_FORWARD', label_print),
                     label_else,
                         Instr('LOAD_CONST', 'no'),
                     label_print,
                         Instr('CALL_FUNCTION', 1),
                         Instr('POP_TOP'),  # discard print()'s return value
                     Instr('LOAD_CONST', None),
                     Instr('RETURN_VALUE')])
code = bytecode.to_code()

test = 0
exec(code)

test = 1
exec(code)
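The same expression compiled by CPython can be checked against the hand-written version. The sketch below uses only the standard library. It lists the jump instructions and runs the code object with test = 0 and test = 1, passing test through the globals dictionary instead of assigning a module-level variable.

```python
import contextlib
import dis
import io

code = compile("print('yes' if test else 'no')", "<example>", "exec")

# The conditional expression compiles to a conditional jump
jumps = [i.opname for i in dis.get_instructions(code) if "JUMP" in i.opname]
print(jumps)

# Run the code object with different values of the global name "test"
outputs = []
for test_value in (0, 1):
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {"test": test_value})
    outputs.append(buf.getvalue().strip())
print(outputs)  # ['no', 'yes']
```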

Control Flow Graph (CFG)

To analyze or optimize existing code, bytecode provides the ControlFlowGraph class, which represents a control flow graph (CFG).

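The control flow graph is used for stack depth analysis when converting to code. Because it detects dead code better than CPython does, it can compute a smaller stack size.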

Dumping the control flow graph of the conditional jump example:

from bytecode import Label, Instr, Bytecode, ControlFlowGraph, dump_bytecode

label_else = Label()
label_print = Label()
bytecode = Bytecode([Instr('LOAD_NAME', 'print'),
                     Instr('LOAD_NAME', 'test'),
                     Instr('POP_JUMP_IF_FALSE', label_else),
                         Instr('LOAD_CONST', 'yes'),
                         Instr('JUMP_FORWARD', label_print),
                     label_else,
                         Instr('LOAD_CONST', 'no'),
                     label_print,
                         Instr('CALL_FUNCTION', 1),
                         Instr('POP_TOP'),  # discard print()'s return value
                     Instr('LOAD_CONST', None),
                     Instr('RETURN_VALUE')])

blocks = ControlFlowGraph.from_bytecode(bytecode)
dump_bytecode(blocks)

Notes

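  • block #1 is the entry block. It ends with the conditional jump POP_JUMP_IF_FALSE and is followed by block #2

  • block #2 ends with the unconditional jump JUMP_FORWARD

  • block #3 contains no jump and falls through to block #4

  • block #4 is the final block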
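The block boundaries in these notes follow the usual basic-block rule: a block starts at a jump target or right after a jump, and ends at a jump or a return. The sketch below applies that rule to CPython's own compilation of the same expression, using only dis. The functions is_terminator and basic_blocks are hypothetical helpers, not part of bytecode or dis. The terminator list is a heuristic that covers this example, not every opcode.

```python
import dis

def is_terminator(instr):
    """Heuristic: jumps, FOR_ITER, returns and raises end a basic block."""
    name = instr.opname
    return ("JUMP" in name or name == "FOR_ITER"
            or name in ("RETURN_VALUE", "RETURN_CONST", "RAISE_VARARGS", "RERAISE"))

def basic_blocks(code):
    """Split a code object into basic blocks (lists of dis.Instruction)."""
    blocks, current = [], []
    for instr in dis.get_instructions(code):
        # Something jumps here, so a new block must start
        if instr.is_jump_target and current:
            blocks.append(current)
            current = []
        current.append(instr)
        # A jump or return closes the current block
        if is_terminator(instr):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

code = compile("print('yes' if test else 'no')", "<example>", "exec")
blocks = basic_blocks(code)
for n, block in enumerate(blocks, 1):
    print(f"block #{n}:", [instr.opname for instr in block])
```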