How to compare index values in Python without writing repeated if statements?

阿华AIGC实验室

2026-5-20

A Python approach to generating subtitles from IBM Watson transcription data

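Hey, let me help you generate subtitles from IBM Watson transcription data! The list you have, with each word plus its start and end timestamps, is exactly what you need to stitch words into subtitle segments that are comfortable to read. Here are a few practical approaches with code examples: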
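If you are starting from the raw JSON response rather than a ready-made list, a minimal sketch like the one below flattens it into the `[word, start, end]` entries used throughout this article. It assumes you requested word timestamps and that the response nests them as results → alternatives → timestamps; the helper name `extract_word_timings` is hypothetical, so compare the structure against your own response before relying on it.

```python
import json

def extract_word_timings(response):
    """Flatten a Watson-style STT response into [word, start, end] entries.

    Assumes response["results"][i]["alternatives"][0]["timestamps"] holds
    [word, start_seconds, end_seconds] triples (hypothetical helper).
    """
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Use the top-ranked alternative of each result segment
        for word, start, end in alternatives[0].get("timestamps", []):
            words.append([word, float(start), float(end)])
    return words

# Usage with a small hand-written response in the assumed shape
raw = '''{"results": [{"final": true, "alternatives": [{
    "transcript": "for example ",
    "timestamps": [["for", 5.77, 5.92], ["example", 5.93, 6.21]]}]}]}'''
print(extract_word_timings(json.loads(raw)))
```

Only the first alternative is used here because it is the most likely transcription; if you request several alternatives, you may want to pick by confidence instead.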

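Merge adjacent words into subtitle blocks
Subtitles don't show one word at a time, so set a duration or word-count threshold and merge consecutive words into suitably sized subtitle blocks. For example, limit each subtitle to at most 2 seconds on screen and at most 8 words: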

def generate_subtitles(transcript_section, max_duration=2.0, max_words=8):
    subtitles = []
    current_words = []
    current_start = None
    current_end = None

    for word, start, end in transcript_section:
        if not current_words:
            current_start = start
            current_end = end
            current_words.append(word)
        else:
            duration = end - current_start
            # Extend the block only while it stays within both the duration and word-count limits
            if duration <= max_duration and len(current_words) < max_words:
                current_words.append(word)
                current_end = end
            else:
                # Emit the finished subtitle
                subtitle_text = ' '.join(current_words)
                subtitles.append({
                    'start': current_start,
                    'end': current_end,
                    'text': subtitle_text
                })
                # Start a new subtitle block with the current word
                current_words = [word]
                current_start = start
                current_end = end
    # Flush the last group of words
    if current_words:
        subtitle_text = ' '.join(current_words)
        subtitles.append({
            'start': current_start,
            'end': current_end,
            'text': subtitle_text
        })
    return subtitles

# Example call
section = [['for', 5.77, 5.92], ['example', 5.93, 6.21], ['this', 6.22, 6.35], ['is', 6.36, 6.42], ['a', 6.43, 6.48], ['test', 6.49, 6.75]]
subtitles = generate_subtitles(section)
for sub in subtitles:
    print(f"[{sub['start']:.2f} - {sub['end']:.2f}] {sub['text']}")
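Since the question is about not piling up if statements, one option is to express each split condition as a small function and check them all with `any()`. The sketch below reproduces the 2-second / 8-word limits as rules; the `long_pause` rule with its `max_gap` threshold, and the names `SPLIT_RULES` and `group_words`, are hypothetical additions for illustration, so adjust or drop them to suit your data.

```python
# Each rule receives the words already in the current block plus the next
# [word, start, end] entry, and returns True if the block should close first.
def too_long(block, nxt, max_duration=2.0):
    return nxt[2] - block[0][1] > max_duration

def too_many_words(block, nxt, max_words=8):
    return len(block) >= max_words

def long_pause(block, nxt, max_gap=0.5):
    # Hypothetical rule: a silence longer than max_gap suggests a natural break
    return nxt[1] - block[-1][2] > max_gap

SPLIT_RULES = [too_long, too_many_words, long_pause]

def group_words(words, rules=SPLIT_RULES):
    blocks = []
    for entry in words:
        # Open a new block if there is none yet or if any rule fires
        if not blocks or any(rule(blocks[-1], entry) for rule in rules):
            blocks.append([entry])
        else:
            blocks[-1].append(entry)
    # Turn each block of [word, start, end] entries into a subtitle dict
    return [
        {'start': b[0][1], 'end': b[-1][2], 'text': ' '.join(w for w, _, _ in b)}
        for b in blocks
    ]

sample = [['hello', 0.0, 0.3], ['world', 0.35, 0.6], ['again', 1.3, 1.5]]
for sub in group_words(sample):
    print(f"[{sub['start']:.2f} - {sub['end']:.2f}] {sub['text']}")
```

Adding another condition then means appending one function to `SPLIT_RULES` rather than growing the if chain, and the first-word and final-flush special cases disappear because the blocks are built first and converted at the end.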

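Export to the standard SRT subtitle format
Once you have the subtitle blocks, you can convert them to SRT, a format nearly every player supports: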

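def format_time(seconds):
    # Convert seconds to the SRT timestamp format HH:MM:SS,mmm.
    # Working in whole milliseconds keeps a value such as 59.9996 s
    # from being rendered as an invalid "60,000" seconds field.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def subtitles_to_srt(subtitles):
    cues = []
    for idx, sub in enumerate(subtitles, 1):
        start_time = format_time(sub['start'])
        end_time = format_time(sub['end'])
        # Each SRT cue: sequence number, time range, text, then a blank line
        cues.append(f"{idx}\n{start_time} --> {end_time}\n{sub['text']}\n\n")
    return "".join(cues)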

# Example call: save the result to a local SRT file
srt = subtitles_to_srt(subtitles)
with open('output.srt', 'w', encoding='utf-8') as f:
    f.write(srt)

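Keep each subtitle semantically complete
To avoid splitting a complete phrase across two subtitles: if Watson's full response includes phrase or punctuation information, split on those markers first; if you only have individual words, a lightweight NLP tool such as spaCy can do a simple syntactic analysis so that each subtitle is a complete semantic unit.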
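If your words do carry punctuation, a simple heuristic can prefer closing a subtitle at sentence or clause ends without pulling in an NLP library. The sketch below is that punctuation heuristic, swapped in for illustration; it is not spaCy-based analysis. The function name `split_at_natural_breaks` and the `soft_limit` parameter are hypothetical, and whether your transcript contains punctuation at all depends on your language and request options, so check your output first.

```python
SENTENCE_END = ('.', '?', '!')
CLAUSE_END = (',', ';', ':')

def split_at_natural_breaks(words, soft_limit=4, max_words=8):
    """Group [word, start, end] entries, preferring breaks after punctuation.

    - Always break after sentence-ending punctuation.
    - Once a block has soft_limit words, also break after clause punctuation.
    - Never put more than max_words words in one subtitle.
    """
    subtitles, block = [], []

    def flush():
        # Emit the current block as a subtitle dict and empty it in place
        if block:
            subtitles.append({'start': block[0][1], 'end': block[-1][2],
                              'text': ' '.join(w for w, _, _ in block)})
            block.clear()

    for entry in words:
        block.append(entry)
        token = entry[0]
        if (token.endswith(SENTENCE_END)
                or (len(block) >= soft_limit and token.endswith(CLAUSE_END))
                or len(block) >= max_words):
            flush()
    flush()
    return subtitles

words = [['Hello', 0.0, 0.3], ['there.', 0.3, 0.6], ['This', 0.8, 1.0],
         ['is', 1.0, 1.1], ['a', 1.1, 1.2], ['test,', 1.2, 1.5], ['okay', 1.6, 1.9]]
for sub in split_at_natural_breaks(words):
    print(f"[{sub['start']:.2f} - {sub['end']:.2f}] {sub['text']}")
```

The `max_words` cap still applies, so a long unpunctuated stretch is cut at a fixed length rather than producing an oversized subtitle.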

This question comes from Stack Exchange; the original asker is Brendan Carlin.