Voice Cloning API

Last updated: 2024.01.08 14:22:58

First published: 2023.11.27 21:19:18

Creating a voice

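1. Request method

Domain: https://openspeech.bytedance.com
See the sample code in section 5 for the exact request format.

2. Training

Endpoint: POST /api/v1/mega_tts/audio/upload
Description: submits audio to train a voice.
Authentication uses a Bearer token.
The request header must include "Authorization": "Bearer;${token}" (the scheme and the token are joined by a semicolon).
The AppID and Token are listed in the Volcengine Speech Technology console. AppID corresponds to the console's APP ID, and Token to its Access Token.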

Request parameters

Header:

Parameter      Type    Required  Notes
Authorization  string  yes       Bearer;${Access Token}
Resource-Id    string  yes       Set to volc.megatts.voiceclone
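
The header rules above can be sketched as a small helper. This is only an illustration: the function name `build_headers` is hypothetical and not part of the API, while the header names and values are taken from this document.

```python
def build_headers(token: str) -> dict:
    """Return the HTTP headers for the voice cloning endpoints.

    `build_headers` is a hypothetical helper name; the header names and
    values follow the header table in this document.
    """
    if not token:
        raise ValueError("token must not be empty")
    return {
        "Content-Type": "application/json",
        # The scheme and the token are joined by a semicolon, not a space.
        "Authorization": "Bearer;" + token,
        "Resource-Id": "volc.megatts.voiceclone",
    }


headers = build_headers("my-access-token")
print(headers["Authorization"])
```

The same headers apply to both the training and the status query endpoints.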

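Body:

Parameter     Level  Type    Required     Notes
appid         1      string  yes
speaker_id    1      string  yes          Unique voice ID
audios        1      list    yes          Supported audio formats: wav, mp3, ogg, m4a, aac, pcm (pcm only as 24 kHz mono). A single file may be at most 20 MB. At most 1 audio file per upload.
audio_bytes   2      string  yes          Binary audio bytes, base64-encoded
audio_format  2      string  conditional  Audio format; required for pcm and m4a, optional otherwise
text          2      string  no           Text read aloud in the audio; if set, WER wrong-word detection is enabled by default
source        1      int     yes          Fixed value: 2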

JSON example

{
    "speaker_id": "S_*******",
    "appid": "your appid",
    "audios": [{
        "audio_bytes": "base64-encoded audio",
        "audio_format": "wav"
    }],
    "source": 2
}
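
As a minimal sketch of how the request body could be assembled with the documented constraints checked up front, consider the helper below. The names `build_upload_body`, `MAX_AUDIO_BYTES`, and `SUPPORTED_FORMATS` are hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption.

```python
import base64

# Assumption: "20 MB" is interpreted as 20 MiB; the source does not say which.
MAX_AUDIO_BYTES = 20 * 1024 * 1024
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "m4a", "aac", "pcm"}


def build_upload_body(appid, speaker_id, audio_bytes, audio_format, text=None):
    """Build the JSON body for POST /api/v1/mega_tts/audio/upload.

    Hypothetical helper: validates the documented limits locally before
    any request is sent.
    """
    audio_format = audio_format.lower()
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported audio format: %r" % audio_format)
    if len(audio_bytes) > MAX_AUDIO_BYTES:
        raise ValueError("audio exceeds the single-file size limit")
    audio = {
        "audio_bytes": base64.b64encode(audio_bytes).decode("ascii"),
        # Always sent here; the table only requires it for pcm and m4a.
        "audio_format": audio_format,
    }
    if text:
        # Per the table, a non-empty text enables WER detection.
        audio["text"] = text
    # Only one audio file is allowed per upload, so the list has one entry.
    return {
        "appid": appid,
        "speaker_id": speaker_id,
        "audios": [audio],
        "source": 2,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.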

Response data

Body:

Parameter      Level  Type    Required  Notes
BaseResp       1      object  yes
StatusCode     2      int     yes       0 on success
StatusMessage  2      string            Error message
speaker_id     1      string  yes       Unique voice ID

JSON example

{
    "BaseResp":{
        "StatusCode":0,
        "StatusMessage":""
    },
    "speaker_id":"S_*******"
}
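
A caller would typically inspect BaseResp.StatusCode before trusting the rest of the payload. The sketch below assumes the response has already been parsed into a dictionary; `VoiceCloneError` and `check_base_resp` are hypothetical names introduced for illustration.

```python
class VoiceCloneError(Exception):
    """Hypothetical exception carrying a non-zero BaseResp.StatusCode."""

    def __init__(self, code, message):
        super().__init__("StatusCode=%s: %s" % (code, message))
        self.code = code
        self.message = message


def check_base_resp(payload: dict) -> dict:
    """Return the payload unchanged if StatusCode is 0, otherwise raise."""
    base = payload.get("BaseResp") or {}
    code = base.get("StatusCode")
    if code != 0:
        raise VoiceCloneError(code, base.get("StatusMessage", ""))
    return payload
```

A missing BaseResp is treated as a failure here, since the table marks it as required.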

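3. Status query

Endpoint: POST /api/v1/mega_tts/status
Description: queries the training status of a voice.

Request parameters

Header:

Parameter      Type    Required  Notes
Authorization  string  yes       Bearer;${Access Token}
Resource-Id    string  yes       Set to volc.megatts.voiceclone

Body:

Parameter   Level  Type    Required  Notes
appid       1      string  yes
speaker_id  1      string  yes       Unique voice ID

JSON example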

{
    "appid": "your appid",
    "speaker_id": "S_*******"
}

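Response data

Body:

Parameter      Level  Type    Required  Notes
BaseResp       1      object  yes
StatusCode     2      int     yes       0 on success
StatusMessage  2      string            Error message
speaker_id     1      string  yes       Unique voice ID
status         1      enum    yes       Training status: NotFound = 0, Training = 1, Success = 2, Failed = 3, Active = 4. TTS synthesis can be called once the status is 4 (Active).
create_time    1      int     yes       Creation time
version        1      string  no        Training version
demo_audio     1      string  no        Returned in the Success state; valid for one hour, so download it if you need to keep it

JSON example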

{
    "BaseResp":{
        "StatusCode":0,
        "StatusMessage":""
    },
    "create_time":1701055304000,
    "version": "V1",
    "demo_audio": "http://**********.wav",
    "speaker_id":"S_*******",
    "status":2
}
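
Because training is asynchronous, a client usually polls the status endpoint until the voice leaves the Training state. The sketch below shows one way to do that. `TrainStatus` mirrors the enum above, while `wait_for_training`, its `fetch_status` callable (expected to return the parsed response body as a dictionary), and the default interval and timeout values are assumptions made for illustration.

```python
import enum
import time


class TrainStatus(enum.IntEnum):
    """Numeric values of the `status` field, as listed in the table above."""
    NOT_FOUND = 0
    TRAINING = 1
    SUCCESS = 2
    FAILED = 3
    ACTIVE = 4


def wait_for_training(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the status is no longer Training.

    fetch_status: hypothetical callable returning the parsed status response.
    interval/timeout: assumed defaults in seconds, not documented values.
    Returns (TrainStatus, payload); raises TimeoutError if still training.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        status = TrainStatus(payload["status"])
        if status != TrainStatus.TRAINING:
            return status, payload
        if time.monotonic() >= deadline:
            raise TimeoutError("voice is still training after %.0f s" % timeout)
        sleep(interval)
```

Passing the fetch function in as a parameter keeps the loop independent of the HTTP client, which also makes it easy to exercise without network access.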

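4. Status codes

Name                       Code  Description
Success                    0     Success
BadRequestError            1001  Invalid request parameters
AudioUploadError           1101  Audio upload failed
ASRError                   1102  ASR transcription failed
SIDError                   1103  SID voiceprint check failed
SIDFailError               1104  Voiceprint check did not pass
GetAudioDataError          1105  Failed to fetch audio data
SpeakerIDDuplicationError  1106  Duplicate SpeakerID
SpeakerIDNotFoundError     1107  SpeakerID not found
AudioConvertError          1108  Audio transcoding failed
WERError                   1109  WER check error
DelSpeakerError            1110  Failed to delete the voice
AEDError                   1111  AED check error
SNRError                   1112  SNR check error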
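
For logging or error reporting, the table above can be kept as a lookup from numeric code to name. This is a small sketch: the dictionary contents come from the table, while `STATUS_CODE_NAMES`, `describe_status_code`, and the fallback label are hypothetical.

```python
# Numeric status codes and their names, copied from the table above.
STATUS_CODE_NAMES = {
    0: "Success",
    1001: "BadRequestError",
    1101: "AudioUploadError",
    1102: "ASRError",
    1103: "SIDError",
    1104: "SIDFailError",
    1105: "GetAudioDataError",
    1106: "SpeakerIDDuplicationError",
    1107: "SpeakerIDNotFoundError",
    1108: "AudioConvertError",
    1109: "WERError",
    1110: "DelSpeakerError",
    1111: "AEDError",
    1112: "SNRError",
}


def describe_status_code(code: int) -> str:
    """Return the documented name for a code, or a fallback for unlisted ones."""
    return STATUS_CODE_NAMES.get(code, "UnknownStatusCode(%s)" % code)


print(describe_status_code(1106))
```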

5. Sample code

import base64
import os
import requests


host = "https://openspeech.bytedance.com"


def train(appid, token, audio_path, spk_id):
    url = host + "/api/v1/mega_tts/audio/upload"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer;" + token,
        "Resource-Id": "volc.megatts.voiceclone",
    }
    encoded_data, audio_format = encode_audio_file(audio_path)
    audios = [{"audio_bytes": encoded_data, "audio_format": audio_format}]
    data = {"appid": appid, "speaker_id": spk_id, "audios": audios, "source": 2}
    response = requests.post(url, json=data, headers=headers)
    print("status code = ", response.status_code)
    if response.status_code != 200:
        raise Exception("train request failed: " + response.text)
    print("headers = ", response.headers)
    print(response.json())


def get_status(appid, token, spk_id):
    url = host + "/api/v1/mega_tts/status"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer;" + token,
        "Resource-Id": "volc.megatts.voiceclone",
    }
    body = {"appid": appid, "speaker_id": spk_id}
    response = requests.post(url, headers=headers, json=body)
    print(response.json())


def encode_audio_file(file_path):
    with open(file_path, 'rb') as audio_file:
        audio_data = audio_file.read()
        encoded_data = str(base64.b64encode(audio_data), "utf-8")
        audio_format = os.path.splitext(file_path)[1][1:]  # use the file extension as the audio format
        return encoded_data, audio_format


if __name__ == "__main__":
    appid = "your appid"
    token = "your access token"
    spk_id = "your speaker ID"
    train(appid=appid, token=token, audio_path="path to your audio file", spk_id=spk_id)
    get_status(appid=appid, token=token, spk_id=spk_id)


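TTS speech synthesis (WS/HTTP)

The interface is the same as the TTS API, except that the cluster name (cluster) must be set to volcano_mega.

Websocket

Call it with the appid and access_token obtained during account application.
The text is sent in a single request, and the backend returns audio data while it synthesizes.

https://www.volcengine.com/docs/6561/79821

HTTP

Call it with the appid and access_token obtained during account application.
All audio data is returned at once, after the whole text has been synthesized.

https://www.volcengine.com/docs/6561/79820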
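
To illustrate the cluster change, the sketch below builds a request payload for the HTTP variant. Only the cluster value volcano_mega comes from this document. The other field names (`app`, `user`, `audio`, `request`, `voice_type`, `reqid`, `operation`, and so on), the idea of passing the trained speaker_id as `voice_type`, and the helper name `build_tts_payload` are assumptions based on the general TTS API, and should be checked against the linked TTS documentation before use.

```python
import uuid


def build_tts_payload(appid, token, speaker_id, text, encoding="mp3"):
    """Build a TTS request payload that targets the voice cloning cluster.

    Hypothetical helper. Every field except `cluster` is an assumption
    about the TTS API and is not specified in this document.
    """
    return {
        "app": {
            "appid": appid,
            "token": token,
            "cluster": "volcano_mega",  # the one change this document requires
        },
        "user": {"uid": "example-user"},  # assumed placeholder user id
        "audio": {
            "voice_type": speaker_id,  # assumed: the trained voice's speaker_id
            "encoding": encoding,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # assumed: unique id per request
            "text": text,
            "text_type": "plain",
            "operation": "query",  # assumed value for the HTTP variant
        },
    }
```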

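Batch query and voice activation (enable) APIs

API access guide

Authentication

  1. Authentication method: see "Common parameters -- API signing guide - Volcengine (volcengine.com)".
    Production request domain: open.volcengineapi.com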

  2. Fixed common parameters

    Region "cn-north-1"
    Service "speech_saas_prod"
    Version "2023-11-07"
    
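  3. Obtaining the AK/SK: see "Access control - Volcengine (volcengine.com)".
    Details: "Access Key management -- API access keys (Access Key) - Volcengine (volcengine.com)"

  4. Calling methods

    1. SDK: see "SDK overview -- API signing guide - Volcengine (volcengine.com)".

    2. Sign the request yourself and call directly

      1. Example that calls ListMegaTTSTrainStatus following the API description in this document (*for other languages and for SDK-based calls, see the Volcengine authentication source code notes, part 1)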

        import binascii
        import datetime
        import hashlib
        import hmac
        import json
        import requests
        import urllib.parse
        
        domain = "open.volcengineapi.com"
        region = "cn-north-1"
        service = "speech_saas_prod"
        contentType = "application/json; charset=utf-8"
        
        def list_megatts_train_status(
            app_id: str,
            ak: str,
            sk: str,
        ) -> requests.Response:
            params_body = {
                "AppID": app_id,
                "SpeakerIDs": ["TODO"],  # omit this field to fetch all speaker IDs
            }
            canonical_query_string = "Action=ListMegaTTSTrainStatus&Version=2023-11-07"
            url = "https://" + domain + "/?" + canonical_query_string
            content_type = "application/json; charset=utf-8"
            payload_sign = get_hmac_encode16(json.dumps(params_body))
            headers = get_hashmac_headers(
                domain,
                region,
                service,
                canonical_query_string,
                "POST",
                "/",
                content_type,
                payload_sign,
                ak,
                sk,
            )
        
            submit_resp = requests.post(url=url, headers=headers, data=json.dumps(params_body))
            return submit_resp
        
        def get_canonical_query_string(param_dict):
            target = sorted(param_dict.items(), key=lambda x: x[0], reverse=False)
            canonicalQueryString = urllib.parse.urlencode(target)
            return canonicalQueryString
        
        def get_hmac_encode16(data):
            return binascii.b2a_hex(hashlib.sha256(data.encode("utf-8")).digest()).decode(
                "ascii"
            )
        
        def get_volc_signature(secret_key, data):
            return hmac.new(secret_key, data.encode("utf-8"), digestmod=hashlib.sha256).digest()
        
        def get_hashmac_headers(
            domain,
            region,
            service,
            canonicalquerystring,
            httprequestmethod,
            canonicaluri,
            contenttype,
            payloadsign,
            ak,
            sk,
        ):
            # Take one timestamp so the date and time parts cannot straddle midnight.
            now = datetime.datetime.now(datetime.timezone.utc)
            utc_time_sencond = now.strftime("%Y%m%dT%H%M%SZ")
            utc_time_day = now.strftime("%Y%m%d")
            credentialScope = utc_time_day + "/" + region + "/" + service + "/request"
            headers = {
                "content-type": contenttype,
                "x-date": utc_time_sencond,
            }
            canonicalHeaders = (
                "content-type:"
                + contenttype
                + "\n"
                + "host:"
                + domain
                + "\n"
                + "x-content-sha256:"
                + "\n"
                + "x-date:{}".format(utc_time_sencond)
                + "\n"
            )
            signedHeaders = "content-type;host;x-content-sha256;x-date"
            canonicalRequest = (
                httprequestmethod
                + "\n"
                + canonicaluri
                + "\n"
                + canonicalquerystring
                + "\n"
                + canonicalHeaders
                + "\n"
                + signedHeaders
                + "\n"
                + payloadsign
            )
            stringToSign = (
                "HMAC-SHA256"
                + "\n"
                + utc_time_sencond
                + "\n"
                + credentialScope
                + "\n"
                + get_hmac_encode16(canonicalRequest)
            )
            signingkey = get_volc_signature(
                get_volc_signature(
                    get_volc_signature(
                        get_volc_signature(sk.encode("utf-8"), utc_time_day), region
                    ),
                    service,
                ),
                "request",
            )
            signature = binascii.b2a_hex(get_volc_signature(signingkey, stringToSign)).decode(
                "ascii"
            )
            headers[
                "Authorization"
            ] = "HMAC-SHA256 Credential={}/{}, SignedHeaders={}, Signature={}".format(
                ak, credentialScope, signedHeaders, signature
            )
            return headers
        
        if __name__ == "__main__":
            print(
                json.dumps(
                    list_megatts_train_status(
                        app_id="",
                        ak="",
                        sk="",
                    ).text
                )
            )
        

Error codes

  1. Any HTTP status code other than 2xx can be treated as an error.

  2. An error response has the following structure:

    {
        "ResponseMetadata": {
            "RequestId": "20220214145719010211209131054BC103", // value of the X-Top-Request-Id header
            "Action": "ListMegaTTSTrainStatus",
            "Version": "2023-11-07",
            "Service": "{Service}", // value of the X-Top-Service header
            "Region": "{Region}", // value of the X-Top-Region header
            "Error": {
                "Code": "InternalError.NotCaptured",
                "Message": "xxx"
            }
        }
    }
    
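  3. Clients can use "ResponseMetadata.Error.Code" to determine the kind of error. The known codes and their meanings are:

    Code                              Description
    OperationDenied.InvalidSpeakerID  The account or AppID is not permitted, or is unable, to operate on one or more instances in the SpeakerID list
    OperationDenied.InvalidParameter  A request body field is invalid (missing required field, wrong type, etc.)
    InternalError.NotCaptured         Unknown internal service error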
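
The error structure above could be handled with a helper along these lines. It is a sketch under two assumptions: that any non-2xx HTTP status indicates an error, and that the body follows the ResponseMetadata layout shown above. The name `extract_error` and the fallback labels "Unparseable" and "Unknown" are hypothetical.

```python
import json


def extract_error(status_code: int, body_text: str):
    """Return (code, message) for an error response, or None on success.

    Assumes a non-2xx HTTP status means the call failed.
    """
    if 200 <= status_code < 300:
        return None
    try:
        metadata = json.loads(body_text).get("ResponseMetadata") or {}
    except (ValueError, AttributeError):
        # Body is not JSON, or not a JSON object.
        return ("Unparseable", body_text)
    error = metadata.get("Error") or {}
    return (error.get("Code", "Unknown"), error.get("Message", ""))
```

With a `requests.Response` named `resp`, this would be called as `extract_error(resp.status_code, resp.text)`.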

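API list

  1. Query SpeakerID status

    1. Description

      Queries the status of purchased voices. If SpeakerIDs is empty, the list of all voices under the account's AppID is returned (subject to a maximum). If SpeakerIDs is not empty, the matching results are returned, and the result always includes every SpeakerID from the input, even those that cannot be found.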

    2. Method: POST

    3. Request

      Parameter     Type      Required  Location  Description
      Content-Type  string    Y         header    Constant string: application/json; charset=utf-8
      Action        string    Y         query     ListMegaTTSTrainStatus
      Version       string    Y         query     2023-11-07
      AppID         string    Y         body      AppID of the application
      SpeakerIDs    []string  N         body      List of speaker IDs; if empty, the response returns all speaker IDs under the given AppID
    4. Response

      {
          "ResponseMetadata": {
              "RequestId": "20220214145719010211209131054BC103", // value of the X-Top-Request-Id header
              "Action": "ListMegaTTSTrainStatus",
              "Version": "2023-11-07",
              "Service": "{Service}", // value of the X-Top-Service header
              "Region": "{Region}" // value of the X-Top-Region header
          },
          "Result": {
              "AppID": "xxx",
              "Total": 2, // number of SpeakerID statuses returned; equals the length of the input SpeakerIDs when that list is non-empty
              "Statuses": [
                  {
                      "CreateTime": 1700727790000, // creation time, Unix epoch in milliseconds
                      "DemoAudio": "https://example.com", // HTTP link to the demo audio
                      "InstanceNO": "Model_storage_meUQ8YtIPm", // Volcengine instance number
                      "IsActivable": true, // whether this SpeakerID can be activated (updated) later
                      "SpeakerID": "S_VYBmqB0A", // SpeakerID
                      "State": "Success", // state of the SpeakerID
                      "Version": "V1" // version of the SpeakerID
                  },
                  {
                      "SpeakerID": "S_VYBmqB0B", // SpeakerID
                      "State": "Unknown" // state of the SpeakerID
                  }
              ]
          }
      }
      
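      1. The State of a SpeakerID is an enum with the following possible values:

        State      Description
        Unknown    No record found for the SpeakerID
        Training   Voice cloning in progress (if it stays in this state for a long time, contact TODO)
        Success    Voice cloning succeeded; the voice can now be activated (update)
        Active     In use
        Expired    The Volcengine console instance has expired or the account is overdue
        Reclaimed  The Volcengine console instance has been reclaimed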
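
When a query returns many voices, it can help to group them by State before deciding what to do with each. The sketch below assumes the "Result" object of a ListMegaTTSTrainStatus response has already been parsed into a dictionary; `group_by_state` is a hypothetical helper name.

```python
def group_by_state(result: dict) -> dict:
    """Map each State value to the list of SpeakerIDs currently in it.

    `result` is the parsed "Result" object of a ListMegaTTSTrainStatus
    response. Entries without a State are counted as "Unknown".
    """
    groups = {}
    for entry in result.get("Statuses", []):
        state = entry.get("State", "Unknown")
        groups.setdefault(state, []).append(entry["SpeakerID"])
    return groups
```

For the sample response above, this would place S_VYBmqB0A under "Success" and S_VYBmqB0B under "Unknown".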
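  2. Activate SpeakerID

    1. Description

      Corresponds to starting a voice on the Volcengine console page. If one or more voices in the input list cannot be activated, or no record is found for them, none of the voices is activated and the error OperationDenied.InvalidSpeakerID is returned. If the input list is empty, the invalid-field error OperationDenied.InvalidParameter is returned. It may take a few minutes before an activated voice becomes accessible.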

    2. Method: POST

    3. Request

      Parameter     Type      Required  Location  Description
      Content-Type  string    Y         header    Constant string: application/json; charset=utf-8
      Action        string    Y         query     ActivateMegaTTSTrainStatus
      Version       string    Y         query     2023-11-07
      AppID         string    Y         body      AppID of the application
      SpeakerIDs    []string  Y         body      List of speaker IDs
    4. Response

      {
          "ResponseMetadata": {
              "RequestId": "20220214145719010211209131054BC103", // value of the X-Top-Request-Id header
              "Action": "ActivateMegaTTSTrainStatus",
              "Version": "2023-11-07",
              "Service": "{Service}", // value of the X-Top-Service header
              "Region": "{Region}" // value of the X-Top-Region header
          },
          "Result": {
              "AppID": "xxx",
              "Statuses": [
                  {
                      "CreateTime": 1700727790000, // creation time, Unix epoch in milliseconds
                      "DemoAudio": "https://example.com", // HTTP link to the demo audio
                      "InstanceNO": "Model_storage_meUQ8YtIPm", // Volcengine instance number
                      "IsActivable": false, // whether this SpeakerID can be activated (updated)
                      "SpeakerID": "S_VYBmqB0A", // SpeakerID
                      "State": "Active", // state of the SpeakerID
                      "Version": "V1" // version of the SpeakerID
                  }
              ]
          }
      }
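
Since a single invalid SpeakerID causes the whole activation call to fail, it may be worth filtering the list locally first. The sketch below picks candidates from a parsed ListMegaTTSTrainStatus "Result" object and prepares the query string and body for ActivateMegaTTSTrainStatus. Both function names are hypothetical, and requiring State "Success" together with IsActivable is an assumption about which voices are eligible. Signing and sending could then reuse get_hashmac_headers from the sample code earlier in this document.

```python
def activatable_speaker_ids(list_result: dict) -> list:
    """Return SpeakerIDs that look eligible for activation.

    Assumption: a voice is eligible when State is "Success" and
    IsActivable is true.
    """
    return [
        entry["SpeakerID"]
        for entry in list_result.get("Statuses", [])
        if entry.get("State") == "Success" and entry.get("IsActivable")
    ]


def build_activate_request(app_id: str, speaker_ids: list):
    """Return (canonical_query_string, body) for ActivateMegaTTSTrainStatus."""
    if not speaker_ids:
        # The service would reject this with OperationDenied.InvalidParameter.
        raise ValueError("speaker_ids must not be empty")
    query = "Action=ActivateMegaTTSTrainStatus&Version=2023-11-07"
    body = {"AppID": app_id, "SpeakerIDs": list(speaker_ids)}
    return query, body
```

Failing fast on an empty list avoids a round trip that the description above says will always be rejected.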