Building a Python code-injection and recompilation tool: how do you detect a source file's indentation style?

阿华AIGC实验室

2026-5-11

Hey, I ran into exactly this requirement while building a code-generation tool. Here are a couple of simple, reliable ways to do it:

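Approach 1: the standard-library tokenize module (recommended)

Python's tokenize module is built for splitting source code into tokens. It tells real block indentation apart from whitespace inside strings, comments, and bracketed continuation lines, and it follows the same indentation rules as the Python parser, so it avoids the usual false positives.

The idea is to iterate over the tokens, find the first INDENT token, and check whether its text consists of spaces or tabs. Its length also gives you the indent width (for example, 4 spaces versus 2).

Example code: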

import tokenize

def detect_indentation_style(file_path):
    with open(file_path, 'rb') as source_file:
        # tokenize.tokenize expects a readline callable that returns bytes,
        # so the file is opened in binary mode
        tokens = tokenize.tokenize(source_file.readline)
        for token_type, token_str, _, _, _ in tokens:
            # Stop at the first INDENT token
            if token_type == tokenize.INDENT:
                if token_str.startswith('\t'):
                    return ("tab", len(token_str))
                elif token_str.startswith(' '):
                    return ("space", len(token_str))
        # No indentation found (e.g. an empty file or top-level code only):
        # fall back to the Python community convention of 4 spaces
        return ("space", 4)

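The advantage of this approach: it sidesteps whitespace inside strings and comments entirely and reports the indentation that Python itself recognizes when tokenizing the file, which makes it very reliable.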
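To see that robustness in action, here is a minimal sketch that applies the same first-INDENT idea to an in-memory string. The helper name detect_indent_from_source and the SAMPLE text are made up for illustration; they are not part of the original answer.

```python
import io
import tokenize

def detect_indent_from_source(source):
    """Return (style, width) from the first INDENT token, or ("space", 4)."""
    # generate_tokens works on a readline callable that returns str
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.INDENT:
            style = "tab" if tok.string.startswith("\t") else "space"
            return (style, len(tok.string))
    return ("space", 4)

# The triple-quoted string holds a tab-indented line and the list uses
# 8-space continuation lines, but neither produces an INDENT token.
SAMPLE = (
    'TEXT = """\n'
    '\tnot real indentation\n'
    '"""\n'
    'ITEMS = [\n'
    '        1,\n'
    ']\n'
    'def f():\n'
    '  return ITEMS\n'
)

print(detect_indent_from_source(SAMPLE))  # ('space', 2)
```

A naive "first line that starts with whitespace" scan would report a tab for this sample, while the token-based version reports the 2-space block indentation.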

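Approach 2: detecting indentation for a single function

If you only need the indentation used inside one specific function, you can locate the function with the ast module and then read the leading whitespace of the corresponding source line: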

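import ast

def detect_function_indent(file_path, target_func_name):
    # Locate the target function through the AST
    with open(file_path, 'r', encoding='utf-8') as f:
        source_code = f.read()
    tree = ast.parse(source_code)

    target_function = None
    for node in ast.walk(tree):
        # Match async functions as well as regular ones
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == target_func_name:
            target_function = node
            break

    if target_function is None:
        raise ValueError(f"Function {target_func_name} was not found in the file")

    # Split on '\n' only: str.splitlines() also breaks on characters such as
    # form feed, which can shift line numbers relative to the AST
    lines = source_code.split('\n')

    def leading_whitespace(line):
        return line[:len(line) - len(line.lstrip())]

    # Use the first body statement's own line number (1-based) instead of
    # assuming the body starts on the line after "def": the signature may span
    # several lines, or be followed by comments or blank lines
    first_stmt = target_function.body[0]
    body_ws = leading_whitespace(lines[first_stmt.lineno - 1])
    if first_stmt.col_offset != len(body_ws):
        # The statement does not start its line (e.g. "def f(): pass"),
        # so there is no indentation to measure
        return ("space", 4)

    # Drop the function's own indentation so that methods and nested
    # functions report a single level rather than the absolute column
    def_ws = leading_whitespace(lines[target_function.lineno - 1])
    if body_ws.startswith(def_ws):
        body_ws = body_ws[len(def_ws):]

    if body_ws.startswith('\t'):
        return ("tab", len(body_ws))
    elif body_ws.startswith(' '):
        return ("space", len(body_ws))
    return ("space", 4)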

This reports the indentation used inside the target function, which suits cases where you only care about one specific function.
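Once you have a (style, width) pair, the injection side of the tool still has to apply it. Here is a rough sketch of that step, assuming the snippet to inject is a flat block of statements. The helper reindent_snippet and its depth parameter are hypothetical names introduced for illustration.

```python
import textwrap

def reindent_snippet(snippet, style, width, depth=1):
    """Re-indent `snippet` to `depth` levels of the detected indent unit.

    Assumption: `style` and `width` come from one of the detectors above.
    Nested blocks inside the snippet keep their own relative indentation.
    """
    unit = ("\t" if style == "tab" else " ") * width
    # Remove whatever common margin the snippet was written with
    flat = textwrap.dedent(snippet).strip("\n")
    # textwrap.indent leaves whitespace-only lines untouched
    return textwrap.indent(flat, unit * depth) + "\n"

snippet = """
    print("enter")
    counter += 1
"""

# Re-indent for a function body in a file that uses 2-space indentation
print(reindent_snippet(snippet, "space", 2), end="")
```

For a method or nested function, pass a larger depth so the prefix matches the absolute nesting level of the insertion point.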

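Extra tips

When a file mixes tabs and spaces inconsistently, Python itself raises TabError (a subclass of IndentationError); files that use tabs in some blocks and spaces in others can still be valid. Your tool can either raise an error or go with the style of the first valid indent;
For files with no indentation at all, default to 4 spaces (the common convention in the Python community);
Avoid scanning the file line by line yourself (for example, matching leading whitespace with a regex). Whitespace inside strings and comments trips that up easily, so the standard-library tools are the safer choice.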
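If you would rather detect mixed styles up front than trust the first indent, one option is to survey every INDENT token. The sketch below is only one possible way to do that; survey_indent_styles and is_consistent are hypothetical helper names.

```python
import io
import tokenize

def survey_indent_styles(source):
    """Count INDENT tokens by the kind of whitespace they contain."""
    counts = {"tab": 0, "space": 0, "mixed": 0}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.INDENT:
            continue
        # An INDENT token's text is the full leading whitespace of its line
        chars = set(tok.string)
        if chars == {"\t"}:
            counts["tab"] += 1
        elif chars == {" "}:
            counts["space"] += 1
        else:
            counts["mixed"] += 1
    return counts

def is_consistent(counts):
    """True when at most one pure style is used and no indent mixes both."""
    used = [name for name, n in counts.items() if n]
    return counts["mixed"] == 0 and len(used) <= 1

# One block indented with a tab, another with four spaces
source = "if a:\n\tx = 1\nif b:\n    y = 2\n"
counts = survey_indent_styles(source)
print(counts, is_consistent(counts))
```

A tool could raise its own error when is_consistent returns False, or pick whichever style has the higher count.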

The question comes from Stack Exchange; it was asked by Aviv Cohn.
