Audio speech quality assessment operator - computes multiple audio quality metrics based on ClearerVoice-Studio/speechscore
The `metrics` parameter configures the set of metrics to compute; by default, all supported metrics are computed. `reference_audio_paths` is an optional input.

| Input column | Description |
|---|---|
| audio_paths | Array of audio file paths (string type). Supports TOS URLs, http(s) URLs, S3 URLs, and local file paths |
| reference_audio_paths | Optional. Array of reference audio paths (string type), one per audio. If omitted, the operator runs in reference-free scoring mode |
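The supported path schemes above can be illustrated with a small classifier. This is a hypothetical helper for illustration only, not the operator's actual code; note that TOS buckets are addressed over http(s) URLs as in the example below.

```python
from urllib.parse import urlparse

def classify_audio_path(path: str) -> str:
    """Classify a path into one of the schemes the operator accepts.

    Hypothetical helper for illustration only.
    """
    scheme = urlparse(path).scheme
    if scheme in ("http", "https"):
        # TOS object storage is typically reached via an https URL as well
        return "tos/http(s)"
    if scheme == "s3":
        return "s3"
    # No scheme: treat as a local file path
    return "local"

print(classify_audio_path("https://las-cn-beijing-public-online.tos-cn-beijing.volces.com/a.mp3"))  # tos/http(s)
print(classify_audio_path("s3://bucket/audio/b.wav"))  # s3
print(classify_audio_path("/data/audio/c.flac"))  # local
```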
The output is an array of structured results, where each element is a struct containing one field per computed metric.
Audio files that fail to process return a struct whose values are null.
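The null-on-failure behavior described above can be sketched in plain Python. The `score_one` callback and the abbreviated metric set are assumptions for illustration, not the operator's internals:

```python
SUPPORTED_METRICS = ["PESQ", "STOI", "SISDR"]  # abbreviated subset for illustration

def score_batch(paths, score_one):
    """Score each path; on failure, emit a struct of null (None) values."""
    results = []
    for path in paths:
        try:
            results.append(score_one(path))
        except Exception:
            # Failed audio: keep row alignment by returning an all-null struct
            results.append({m: None for m in SUPPORTED_METRICS})
    return results

def fake_score_one(path):
    # Stand-in scorer: fails on ".bad" files, returns fixed scores otherwise
    if path.endswith(".bad"):
        raise ValueError("decode error")
    return {"PESQ": 3.2, "STOI": 0.91, "SISDR": 12.5}

out = score_batch(["a.wav", "b.bad"], fake_score_one)
print(out[1])  # {'PESQ': None, 'STOI': None, 'SISDR': None}
```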
Parameters without a default value are required.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | /opt/las/models | Local model root path |
| model_name | str | ClearerVoice-Studio/speechscore | SpeechScore model name or directory |
| device | Optional[str] | None | Device to run on; None means auto-select (cuda if available, otherwise cpu) |
| metrics | Optional[list[str]] | None | List of metric names to compute (case-insensitive). Defaults to all supported metrics: ["BSSEval","CBAK","COVL","CSIG","DISTILL_MOS","DNSMOS","FWSEGSNR","LLR","LSD","MCD","NB_PESQ","NISQA","PESQ","SISDR","SNR","SRMR","SSNR","STOI"] |
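The case-insensitive `metrics` matching and the `device` auto-selection described above can be sketched as follows. The helper names are assumptions for illustration, not the operator's API:

```python
SUPPORTED_METRICS = ["BSSEval", "CBAK", "COVL", "CSIG", "DISTILL_MOS", "DNSMOS",
                     "FWSEGSNR", "LLR", "LSD", "MCD", "NB_PESQ", "NISQA",
                     "PESQ", "SISDR", "SNR", "SRMR", "SSNR", "STOI"]

def resolve_metrics(metrics):
    """Map user-supplied names (any case) onto the canonical supported names."""
    if metrics is None:
        return list(SUPPORTED_METRICS)  # default: compute everything
    canonical = {m.lower(): m for m in SUPPORTED_METRICS}
    resolved = []
    for name in metrics:
        key = name.lower()
        if key not in canonical:
            raise ValueError(f"unsupported metric: {name}")
        resolved.append(canonical[key])
    return resolved

def resolve_device(device, cuda_available):
    """None means auto-select: prefer cuda, fall back to cpu."""
    if device is not None:
        return device
    return "cuda" if cuda_available else "cpu"

print(resolve_metrics(["pesq", "Stoi"]))  # ['PESQ', 'STOI']
print(resolve_device(None, cuda_available=False))  # cpu
```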
The following code shows how to run the operator with daft to score speech quality on audio files.
```python
from __future__ import annotations

import logging
import os

import ray

import daft
from daft import col
from daft.las.functions.audio import AudioSpeechScore
from daft.las.functions.udf import las_udf


def configure_logging():
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    logging.getLogger("tracing.span").setLevel(logging.WARNING)
    logging.getLogger("daft_io.stats").setLevel(logging.WARNING)
    logging.getLogger("DaftStatisticsManager").setLevel(logging.WARNING)
    logging.getLogger("DaftFlotillaScheduler").setLevel(logging.WARNING)
    logging.getLogger("DaftFlotillaDispatcher").setLevel(logging.WARNING)


configure_logging()

if __name__ == "__main__":
    TOS_INPUT_DIR_URL = os.getenv(
        "TOS_INPUT_DIR_URL", "las-cn-beijing-public-online.tos-cn-beijing.volces.com"
    )

    ray.init(dashboard_host="0.0.0.0", runtime_env={"worker_process_setup_hook": configure_logging})
    daft.set_runner_ray()

    samples = {
        "audio_paths": [
            os.path.join(f"https://{TOS_INPUT_DIR_URL}", "public/shared_audio_dataset/video_demo.mp3")
        ],
        "reference_audio_paths": [
            os.path.join(
                f"https://{TOS_INPUT_DIR_URL}", "public/shared_audio_dataset/video_demo_denoised.mp3"
            )
        ],
    }
    df = daft.from_pydict(samples)
    df = df.with_column(
        "audio_speech_score",
        las_udf(
            AudioSpeechScore,
            construct_args={},
            num_gpus=1,
            batch_size=8,
            concurrency=1,
        )(col("audio_paths"), col("reference_audio_paths")),
    )
    df.show()
    # ╭────────────────────────────────┬────────────────────────────────┬────────────────────────────────╮
    # │ audio_paths                    ┆ reference_audio_paths          ┆ audio_speech_score             │
    # │ ---                            ┆ ---                            ┆ ---                            │
    # │ String                         ┆ String                         ┆ Struct[…full schema below…]    │
    # ╞════════════════════════════════╪════════════════════════════════╪════════════════════════════════╡
    # │ https://las-public-data-qa.to… ┆ https://las-public-data-qa.to… ┆ {BSSEval: {ISR: 15.0130389824… │
    # ╰────────────────────────────────┴────────────────────────────────┴────────────────────────────────╯
    #
    # Full schema of the audio_speech_score column:
    # Struct[
    #   BSSEval: Struct[ISR: Float64, SAR: Float64, SDR: Float64],
    #   CBAK: Float64, COVL: Float64, CSIG: Float64, DISTILL_MOS: Float64,
    #   DNSMOS: Struct[BAK: Float64, OVRL: Float64, P808_MOS: Float64, SIG: Float64],
    #   FWSEGSNR: Float64, LLR: Float64, LSD: Float64, MCD: Float64, NB_PESQ: Float64,
    #   NISQA: Struct[col_pred: Float64, dis_pred: Float64, loud_pred: Float64, mos_pred: Float64, noi_pred: Float64],
    #   PESQ: Float64, SISDR: Float64, SNR: Float64, SRMR: Float64, SSNR: Float64, STOI: Float64,
    # ]
```
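Since some metrics (BSSEval, DNSMOS, NISQA) return nested structs, downstream analysis often wants flat per-metric columns. A minimal, generic flattening sketch in plain Python, as an assumed post-processing step rather than part of the operator:

```python
def flatten_scores(scores: dict, prefix: str = "") -> dict:
    """Flatten nested metric structs into dotted keys, e.g. DNSMOS.OVRL."""
    flat = {}
    for key, value in scores.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested structs such as DNSMOS or NISQA
            flat.update(flatten_scores(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Illustrative result row with one nested and one scalar metric
row = {
    "DNSMOS": {"BAK": 3.9, "OVRL": 3.1, "P808_MOS": 3.4, "SIG": 3.3},
    "PESQ": 2.7,
}
print(flatten_scores(row))
# {'DNSMOS.BAK': 3.9, 'DNSMOS.OVRL': 3.1, 'DNSMOS.P808_MOS': 3.4, 'DNSMOS.SIG': 3.3, 'PESQ': 2.7}
```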