Speech Quality Scoring (SpeechScore)

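Operator Overview

Description

An audio speech-quality evaluation operator that computes multiple audio quality metrics based on ClearerVoice-Studio/speechscore.

Core Features

  • Wraps SpeechScore behind a single interface and supports both no-reference and reference-based metrics
  • The set of metrics to compute is configurable through the metrics parameter; by default, all supported metrics are computed
  • Accepts local, TOS, HTTP, and S3 paths as input; the reference audio column reference_audio_paths is optional
  • Returns a structured result containing common metrics such as BSSEval, DNSMOS, NISQA, PESQ, STOI, SNR, and SRMR

Metric Examples

  • No-reference: DNSMOS (OVRL/SIG/BAK/P808_MOS), NISQA (mos_pred, etc.)
  • Reference-based: BSSEval (ISR/SAR/SDR), PESQ, STOI, SISDR, etc.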
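
The two metric groups can be sketched as a small lookup. This is a minimal illustration that classifies only the metrics named in the examples on this page; `split_by_mode` is a hypothetical helper, not part of the operator, and any metric the page does not classify is reported as unclassified rather than guessed.

```python
# Groups copied from the metric examples on this page only.
NO_REFERENCE = {"DNSMOS", "NISQA"}
REFERENCE_BASED = {"BSSEval", "PESQ", "STOI", "SISDR"}


def split_by_mode(requested):
    """Split metric names into (no_reference, reference_based, unclassified).

    Hypothetical helper for illustration; it does not call the operator.
    """
    no_ref, with_ref, unclassified = [], [], []
    for name in requested:
        if name in NO_REFERENCE:
            no_ref.append(name)
        elif name in REFERENCE_BASED:
            with_ref.append(name)
        else:
            # This page does not say which group the metric belongs to.
            unclassified.append(name)
    return no_ref, with_ref, unclassified


no_ref, with_ref, unclassified = split_by_mode(["DNSMOS", "PESQ", "SRMR", "STOI"])
print("no-reference:", no_ref)
print("reference-based:", with_ref)
print("unclassified:", unclassified)
```

A split like this can help decide whether a reference_audio_paths column is worth preparing for the metrics you care about.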

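Daft Invocation

Operator Parameters

Input

| Input column | Description |
|---|---|
| audio_paths | Array of audio file paths (strings). Supported: TOS URL, http(s) URL, S3 URL, local file path |
| reference_audio_paths | Optional. Array of reference audio paths (strings), one per input audio. When omitted, the operator runs in no-reference scoring mode |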
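
As a rough sketch of the two input shapes, the helper below builds the column dictionary with or without the reference column. `build_samples` and the file paths are hypothetical, and the length check reflects the "one per input audio" description above rather than any documented validation in the operator.

```python
def build_samples(audio_paths, reference_audio_paths=None):
    """Build the input column dict (hypothetical helper, not part of the operator)."""
    samples = {"audio_paths": list(audio_paths)}
    if reference_audio_paths is not None:
        reference_audio_paths = list(reference_audio_paths)
        # Each audio path is expected to have exactly one reference path.
        if len(reference_audio_paths) != len(samples["audio_paths"]):
            raise ValueError("reference_audio_paths must have one entry per audio path")
        samples["reference_audio_paths"] = reference_audio_paths
    return samples


# Placeholder paths for illustration only.
no_reference = build_samples(["/data/noisy_1.wav", "/data/noisy_2.wav"])
with_reference = build_samples(["/data/noisy_1.wav"], ["/data/clean_1.wav"])

print(sorted(no_reference))
print(sorted(with_reference))
```

The resulting dictionary has the same shape as the one passed to daft.from_pydict in the usage example on this page.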

Output

An array of structured results. Each element is a struct holding multiple metrics, with the following fields:

  • BSSEval: struct
      • ISR: float | null
      • SAR: float | null
      • SDR: float | null
  • CBAK: float | null
  • COVL: float | null
  • CSIG: float | null
  • DISTILL_MOS: float | null
  • DNSMOS: struct
      • BAK: float | null
      • OVRL: float | null
      • P808_MOS: float | null
      • SIG: float | null
  • FWSEGSNR: float | null
  • LLR: float | null
  • LSD: float | null
  • MCD: float | null
  • NB_PESQ: float | null
  • NISQA: struct
      • col_pred: float | null
      • dis_pred: float | null
      • loud_pred: float | null
      • mos_pred: float | null
      • noi_pred: float | null
  • PESQ: float | null
  • SISDR: float | null
  • SNR: float | null
  • SRMR: float | null
  • SSNR: float | null
  • STOI: float | null

Audio that fails to process returns a struct whose values are null.
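
Because the result is nested and any field may be null, downstream code usually needs to flatten it and check for missing values. The sketch below assumes one result element has already been converted to a plain Python dict; `flatten_scores` is a hypothetical helper and the numbers are made-up placeholders, not real operator output.

```python
def flatten_scores(score, prefix=""):
    """Flatten a nested score dict into dotted keys such as 'BSSEval.SDR'."""
    flat = {}
    for key, value in score.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested struct (BSSEval, DNSMOS, NISQA): recurse with a dotted prefix.
            flat.update(flatten_scores(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Illustrative values only; a subset of the fields listed above.
example = {
    "BSSEval": {"ISR": 15.0, "SAR": 12.5, "SDR": 11.0},
    "DNSMOS": {"BAK": 3.9, "OVRL": 3.1, "P808_MOS": 3.4, "SIG": 3.5},
    "PESQ": None,
    "STOI": 0.91,
}

flat = flatten_scores(example)
missing = sorted(name for name, value in flat.items() if value is None)

print(flat["BSSEval.SDR"])
print("null fields:", missing)
```

Treating null as "not computed" rather than zero avoids skewing averages when some audio files fail or a metric is not requested.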

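Parameters

A parameter without a default value is required.

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | /opt/las/models | Local root path for models |
| model_name | str | ClearerVoice-Studio/speechscore | SpeechScore model name or directory |
| device | Optional[str] | None | Device to run on; None selects automatically (cuda preferred, otherwise cpu) |
| metrics | Optional[list[str]] | None | List of metric names to compute (case-insensitive). By default, all supported metrics are computed: ["BSSEval","CBAK","COVL","CSIG","DISTILL_MOS","DNSMOS","FWSEGSNR","LLR","LSD","MCD","NB_PESQ","NISQA","PESQ","SISDR","SNR","SRMR","SSNR","STOI"] |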
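
To make the case-insensitive metrics parameter concrete, here is a minimal sketch that maps user-supplied names onto the supported list before building constructor arguments. `normalize_metrics` is a hypothetical helper, and passing these parameter names as keys of construct_args is an assumption based on the usage example on this page; the operator's own validation may differ.

```python
SUPPORTED_METRICS = [
    "BSSEval", "CBAK", "COVL", "CSIG", "DISTILL_MOS", "DNSMOS",
    "FWSEGSNR", "LLR", "LSD", "MCD", "NB_PESQ", "NISQA",
    "PESQ", "SISDR", "SNR", "SRMR", "SSNR", "STOI",
]


def normalize_metrics(metrics=None):
    """Map case-insensitive metric names to their canonical spelling.

    Hypothetical helper: None means "all supported metrics", mirroring the
    documented default.
    """
    if metrics is None:
        return list(SUPPORTED_METRICS)
    by_lower = {name.lower(): name for name in SUPPORTED_METRICS}
    normalized = []
    for name in metrics:
        key = name.lower()
        if key not in by_lower:
            raise ValueError(f"unsupported metric: {name!r}")
        canonical = by_lower[key]
        if canonical not in normalized:  # drop duplicates, keep order
            normalized.append(canonical)
    return normalized


# Assumption: construct_args keys match the parameter names in the table above.
construct_args = {
    "device": "cpu",
    "metrics": normalize_metrics(["pesq", "Stoi", "dnsmos", "PESQ"]),
}
print(construct_args)
```

Requesting only the metrics you need is likely to shorten processing time, since the default computes all eighteen.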

Usage Example

The following code shows how to run the operator with daft to score the speech quality of audio files.

from __future__ import annotations

import logging
import os

import ray
import daft
from daft import col
from daft.las.functions.audio import AudioSpeechScore
from daft.las.functions.udf import las_udf

def configure_logging():
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        # strftime pattern; the stray ".%s" and no-op .format() call were removed
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    logging.getLogger("tracing.span").setLevel(logging.WARNING)
    logging.getLogger("daft_io.stats").setLevel(logging.WARNING)
    logging.getLogger("DaftStatisticsManager").setLevel(logging.WARNING)
    logging.getLogger("DaftFlotillaScheduler").setLevel(logging.WARNING)
    logging.getLogger("DaftFlotillaDispatcher").setLevel(logging.WARNING)

configure_logging()

if __name__ == "__main__":
    TOS_INPUT_DIR_URL = os.getenv("TOS_INPUT_DIR_URL", "las-cn-beijing-public-online.tos-cn-beijing.volces.com")

    ray.init(dashboard_host="0.0.0.0", runtime_env={"worker_process_setup_hook": configure_logging})
    daft.set_runner_ray()

    # Build URLs with plain string formatting; os.path.join is meant for filesystem paths
    samples = {
        "audio_paths": [f"https://{TOS_INPUT_DIR_URL}/public/shared_audio_dataset/video_demo.mp3"],
        "reference_audio_paths": [f"https://{TOS_INPUT_DIR_URL}/public/shared_audio_dataset/video_demo_denoised.mp3"],
    }
    df = daft.from_pydict(samples)
    df = df.with_column(
        "audio_speech_score",
        las_udf(
            AudioSpeechScore,
            construct_args={},
            num_gpus=1,
            batch_size=8,
            concurrency=1,
        )(col("audio_paths"), col("reference_audio_paths")),
    )

    df.show()
    #     ╭────────────────────────────────┬────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    # │ audio_paths                    ┆ reference_audio_paths          ┆ audio_speech_score                                                                                                                                                                                                            │
    # │ ---                            ┆ ---                            ┆ ---                                                                                                                                                                                                                           │
    # │ String                         ┆ String                         ┆ Struct[BSSEval: Struct[ISR: Float64, SAR: Float64, SDR: Float64], CBAK: Float64, COVL: Float64, CSIG: Float64, DISTILL_MOS: Float64, DNSMOS: Struct[BAK: Float64, OVRL: Float64, P808_MOS: Float64, SIG: Float64], FWSEGSNR:  │
    # │                                ┆                                ┆ Float64, LLR: Float64, LSD: Float64, MCD: Float64, NB_PESQ: Float64, NISQA: Struct[col_pred: Float64, dis_pred: Float64, loud_pred: Float64, mos_pred: Float64, noi_pred: Float64], PESQ: Float64, SISDR: Float64, SNR:       │
    # │                                ┆                                ┆ Float64, SRMR: Float64, SSNR: Float64, STOI: Float64]                                                                                                                                                                         │
    # ╞════════════════════════════════╪════════════════════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
    # │ https://las-public-data-qa.to… ┆ https://las-public-data-qa.to… ┆ {BSSEval: {ISR: 15.0130389824…                                                                                                                                                                                                │
    # ╰────────────────────────────────┴────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Last updated: 2026.03.18 11:16:19