
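API Documentation

Last updated: 2023.03.14 17:03:51

First published: 2023.03.14 17:03:51

Spoken Language Assessment HTTP API Documentation
1. Protocol

Service address: openspeech.bytedance.com
Transport protocol: https
Character encoding: utf-8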

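2. Common headers

2.1. Request

| Header | Required | Description |
| --- | --- | --- |
| Host | | Service address; always openspeech.bytedance.com |
| Authorization | | Used for authentication. The value is "Bearer; <token>", where <token> must be replaced with the token you applied for |
| Content-Type | | Always application/json |

2.2. Response

| Header | Required | Description |
| --- | --- | --- |
| Content-Type | | Always application/json |

3. Response information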

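3.1. HTTP status code

A non-200 status code means the server did not process the request.
A 200 status code means the server has accepted the request for processing. In that case, parse the data in the body to determine the processing result.

3.2. Results returned in the body

Parse the results returned in the body according to the API description below.

3.3. Error codes in the body

When the code field in the response body is not 1000, an error occurred while handling the request.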
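
The two-level check described in section 3 (HTTP status first, then the code field in the body) could be sketched as follows. `MddError` and `parse_mdd_response` are hypothetical names introduced for illustration; only the 200 status and the 1000 success code come from this document.

```python
import json


class MddError(Exception):
    """Raised when the service reports a failure (hypothetical helper type)."""


def parse_mdd_response(http_status: int, body_text: str) -> dict:
    """Return the decoded body if the request succeeded, else raise MddError."""
    # A non-200 status means the server did not process the request at all.
    if http_status != 200:
        raise MddError(f"HTTP {http_status}: request was not processed")
    body = json.loads(body_text)
    # With a 200 status, success is signalled by code == 1000 in the body.
    if body.get("code") != 1000:
        raise MddError(f"code={body.get('code')}: {body.get('message', '')}")
    return body
```

Raising a single exception type for both failure levels keeps the caller's happy path free of status checks.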

4. API description

4.1. API path

/api/v1/mdd

4.2. Request method

POST

4.3. Content-Type

Both the request and the response use Content-Type: application/json.

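4.4. Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| app.appid | string | required | Identifies a specific application |
| app.token | string | required | The token you applied for, used for authentication |
| app.cluster | string | optional | Backend cluster |
| user.uid | string | optional | Identifies the device user. It can be used to trace issues reported by a specific user; if the caller does not need this debugging aid, leave it empty or fill in any string. |
| audio.format | string | required | Audio file format. Supported: "ogg |
| audio.url | string | required* | URL of the audio file (http). At least one of audio.url and audio.data must be set. If both are set, audio.url is ignored. |
| audio.data | string | required* | Base64-encoded audio file (for a chunked request, this may be one chunk). At least one of audio.url and audio.data must be set. If both are set, audio.url is ignored. |
| audio.rate | int | optional | Sample rate of the audio file. Supported values: 16000 |
| audio.codec | string | optional | Audio codec. When format is a container format, codec must not be empty and must be a codec the system supports (for example, ogg currently supports only opus). When format is a non-container format (such as pcm, wav, mp3), codec should be omitted or set to raw. Supported values: "raw |
| request.reqid | string | required | Unique identifier of the request or session (sessions are introduced below). Single request: each request must carry a different value. Chunked requests: each session must carry a different value, and all requests within one session carry the same value. Must be specified by the caller. |
| request.sequence | int | required | -1 for a single request; for chunked requests, see below |
| request.core_type | string | required | Scoring model to use. Supported values: cn.sent.raw (Chinese model), en.sent.score (English model) |
| request.ref_text | string | required | The text to score against |
| request.difficulty | int | optional | Difficulty level: 1 - easy, 2 - medium, 3 - hard. Default: 2 |
| request.response_mode | string | optional | Response mode, described below. Supported values: once (single response), streaming (streaming response). Default: once |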

Example:

{
        "app": {
                "appid": "xxx", 
                "token": "xxx", 
                "cluster": "xxxx" 
        },
        "user": {
                "uid": "xxx" 
        },
        "audio": {
                "format": "wav",
                "codec": "raw",
                "rate": 16000,
                "data": "xxx"
        },
        "request": {
                "reqid": "64afdeb1-fc1c-4d06-8510-3ef8ef6adc2c", 
                "sequence": 1, 
                "core_type": "en.sent.score", 
                "ref_text": "When you want to give up of that a moment, think about why at the beginning insist on here.", 
                "difficulty": 2 
        }
}
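
The rules for the audio object (at least one of audio.url and audio.data, and the codec constraints for container and non-container formats) can be enforced before sending. The following is a minimal sketch, not an official client: `build_audio` and `CONTAINER_CODECS` are hypothetical names, and treating ogg as the only container format is an assumption based on the single container named in the field table.

```python
import base64
from typing import Optional

# Assumption: ogg (with opus) is the only container format handled here,
# because it is the only one the field table names explicitly.
CONTAINER_CODECS = {"ogg": {"opus"}}


def build_audio(fmt: str, *, url: Optional[str] = None,
                data: Optional[bytes] = None,
                codec: Optional[str] = None, rate: int = 16000) -> dict:
    """Build the "audio" object of the request body."""
    if url is None and data is None:
        raise ValueError("set at least one of audio.url or audio.data")
    audio = {"format": fmt, "rate": rate}
    if fmt in CONTAINER_CODECS:
        # Container formats need an explicit, supported codec.
        if codec not in CONTAINER_CODECS[fmt]:
            raise ValueError(f"format {fmt!r} requires one of {sorted(CONTAINER_CODECS[fmt])}")
        audio["codec"] = codec
    else:
        # Non-container formats: codec must be omitted or "raw".
        if codec not in (None, "raw"):
            raise ValueError(f"format {fmt!r} expects codec to be omitted or 'raw'")
        audio["codec"] = "raw"
    if data is not None:
        # audio.url is ignored when both are set, so only data is sent.
        audio["data"] = base64.b64encode(data).decode("ascii")
    else:
        audio["url"] = url
    return audio
```

For example, `build_audio("wav", data=wav_bytes)` yields an object with `"codec": "raw"` and the Base64 payload in `"data"`.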

4.5. Single request

The audio is sent to the server in one request.
request.sequence in the request body must be -1.

Request examples:

Passing the audio by URL:

POST /api/v1/mdd HTTP/1.1
Host: openspeech.bytedance.com
Content-Type: application/json
Authorization: Bearer; exampletoken

{
    "app": {
        "appid": "<your-application-id>",
        "token": "exampletoken",
        "cluster": "<cluster>"
    },
    "user": {  
        "uid": "388808087185088"
    },
    "audio": {
        "format": "wav",
        "url": "http://example.com/mdd_audio.wav",
        "rate": 16000,
        "codec": "raw"
    },
    "request": {
        "reqid": "<unique-reqid>",
        "sequence": -1,
        "core_type": "en.sent.score",
        "ref_text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
        "difficulty": 2
    }
}

Embedding the audio data in the request body:

POST /api/v1/mdd HTTP/1.1
Host: openspeech.bytedance.com
Content-Type: application/json
Authorization: Bearer; exampletoken

{
    "app": {
        "appid": "<your-application-id>",
        "token": "exampletoken",
        "cluster": "<cluster>"
    },
    "user": {  
        "uid": "388808087185088"
    },
    "audio": {
        "format": "wav",
        "data":"UklGRnDVAgBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAATElTVBoAAABJTkZPSVNGVA4AAABMYXZmNTguMTIuMTAwAGRhdGEq1QIA/f4U/97+Jf+P/0D/WP9m/4P/3f/N/9n/JQB1AI4AXwCFAM4A/QAKAT4B5wBtAD8AUQBiALMAtQCrAPAA9gC3AKgA7ADoAKIA9AAfASABQQF2AY8BWgFEAYcBfQFOAQgBvgCbAHkAYwCAAIAARgCVAHYAGQCVALAAfAAWAYkAfQC9AGkAkAB2AOf/NgBWACUA/P+1/+r/UQCpAPUA5wCZALMACgH2AEwBDQEDAd4AFgE5AVIBoAEoAf8AwgDMALYAuQCTABsAbABdAKgA7wDOABEBmQFzAQkByQBkAIUAtwA7AYYBmgHHAS0CLAIvAn4CsgE4AQsB0gA1AVoBcwFaAX8BnAFmAacBjAF5AUoBBQFVAeABygGCAS8B9ACxAKQAfwC...wUAOwA0ADkANwC",
        "rate": 16000,
        "codec": "raw"
    },
    "request": {
        "reqid": "<unique-reqid>",
        "sequence": -1,
        "core_type": "en.sent.score",
        "ref_text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
        "difficulty": 2
    }
}
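
A single request like the ones above could be assembled in Python roughly as follows. This is a sketch using only the standard library; `build_single_request` is a hypothetical helper, and generating reqid with `uuid4` is an assumption (the API only requires that the caller supplies a unique value).

```python
import json
import urllib.request
import uuid
from typing import Optional

API_URL = "https://openspeech.bytedance.com/api/v1/mdd"


def build_single_request(appid: str, token: str, audio: dict, ref_text: str,
                         core_type: str = "en.sent.score",
                         cluster: Optional[str] = None,
                         uid: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the whole audio."""
    app = {"appid": appid, "token": token}
    if cluster is not None:
        app["cluster"] = cluster
    body = {
        "app": app,
        "user": {"uid": uid},
        "audio": audio,
        "request": {
            "reqid": str(uuid.uuid4()),  # must differ between requests
            "sequence": -1,              # -1 marks a single request
            "core_type": core_type,
            "ref_text": ref_text,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer; {token}",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`, which sets the Host header from the URL; the response body then needs the status and code checks described in section 3.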

4.6. Chunked requests

The audio can be split into multiple chunks and sent to the server; together, these chunked requests are called a session.
Requests within a session are controlled by request.reqid and request.sequence.

  • request.reqid - the unique identifier of the session; it must be the same for every request within a session

  • request.sequence - the sequence number of the chunked request

For the last packet, request.sequence must be less than 0.

Request examples:

Packet 1:

{
  "app": {..},
  "user": {..},
  "audio": {..},
  "request": {
        "reqid": "abc1238ef",
        "sequence": 1,
        "core_type": "en.sent.score",
        "ref_text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
  }
}

Packet 2:

{
  "app": {..},
  "user": {..},
  "audio": {..},
  "request": {
        "reqid": "abc1238ef",   <-- same request.reqid
        "sequence": 2,          <-- next request.sequence
        "core_type": "en.sent.score"
  }
}

Packet 3:

{
  "app": {..},
  "user": {..},
  "audio": {..},
  "request": {
        "reqid": "abc1238ef",   <-- same request.reqid
        "sequence": 3,          <-- next request.sequence
        "core_type": "en.sent.score"
  }
}

Last packet:

{
  "app": {..},
  "user": {..},
  "audio": {..},
  "request": {
        "reqid": "abc1238ef",   <-- same request.reqid
        "sequence": -4,         <-- request.sequence < 0 marks the last packet
        "core_type": "en.sent.score"
  }
}
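
The sequence numbering of a session can be generated mechanically. The sketch below yields only the parts of each packet that vary; `session_requests` is a hypothetical helper, the app, user, and remaining audio fields are left to the caller, and using the negated packet index for the last packet is an assumption taken from the -4 in the example above (the API only requires a value below 0).

```python
import base64
import uuid
from typing import Iterator, List


def session_requests(chunks: List[bytes], core_type: str,
                     ref_text: str) -> Iterator[dict]:
    """Yield the varying parts of each packet in one chunked session."""
    reqid = str(uuid.uuid4())  # shared by every packet of the session
    last = len(chunks)
    for index, chunk in enumerate(chunks, start=1):
        # Assumption: the last packet carries the negated index, as in the
        # example above; any negative value satisfies the documented rule.
        sequence = -index if index == last else index
        yield {
            "audio": {"data": base64.b64encode(chunk).decode("ascii")},
            "request": {
                "reqid": reqid,
                "sequence": sequence,
                "core_type": core_type,
                # ref_text is listed as required, so it is sent every time.
                "ref_text": ref_text,
            },
        }
```

With a single chunk the helper produces sequence -1, which matches the single-request case.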

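4.7. Response body

  1. Data structure:

English scoring:

Structure:

| Field | Type | Description |
| --- | --- | --- |
| reqid | string | reqid (same as request.reqid in the request body) |
| code | int | Status code; 1000 means success |
| message | string | Additional information corresponding to the status code |
| sequence | int | Request sequence number (same as request.sequence in the request body) |
| version | string | Version number |
| accuracy_details | []array | Array of per-word scoring details; each item represents one recognized word |
| accuracy_details[N].ref_word | string | The recognized word |
| accuracy_details[N].score | float64 | Score of this word, in the range 0 to 1. If needed, users can convert the raw score to the range [0-100] themselves using the following formula |
| accuracy_details[N].start_index | int | Start position of the Nth word in ref_text |
| accuracy_details[N].end_index | int | End position of the Nth word in ref_text |
| accuracy_details[N].start_time | int | Start time of the Nth word in the audio, in milliseconds |
| accuracy_details[N].end_time | int | End time of the Nth word in the audio, in milliseconds |
| accuracy_details[N].phones | []array | Scoring details for each phoneme; each item represents one phoneme within the word |
| accuracy_details[N].phones[M].ref_phone | string | The expected phoneme |
| accuracy_details[N].phones[M].recognition_result | string | The competing phoneme |
| accuracy_details[N].phones[M].type | string | Whether the pronunciation is correct: Correct or Wrong. Suppose the expected phoneme is /A/ (ref_phone) and the most competitive phoneme, the one with the highest posterior, is /B/ (recognition_result); P(A) and P(B) are their respective phoneme-level scores. |
| accuracy_details[N].phones[M].score | float64 | Score of ref_phone, in the range [0-1]. To convert it to [0-100], refer to the conversion formula for accuracy_details[N].score |
| accuracy_details[N].phones[M].recognition_score | float64 | Score of the recognized phoneme (recognition_result) |
| accuracy_details[N].phones[M].proficiency_score | float64 | Score of the expected phoneme (ref_phone), in the range [0-100] (Recommended) |
| scores.accuracy | float64 | Accuracy |
| scores.fluency | float64 | Fluency |
| scores.integrity | float64 | How well the provided audio matches the reference text |
| integrity_details.ref | string | Reference text (uppercase) |
| integrity_details.recognition_result | string | Words and their scores, e.g. "COULD 1.000000 YOU 1.500000". An OOV word has a # appended, e.g. "COULD#". |
| addition.audio_url | string | Link to the audio uploaded by the user |
| addition.req.request.core_type | string | Type of the scoring model, taken from the request body |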
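
The integrity_details.recognition_result string packs words and scores into one space-separated value, so it usually needs to be split before use. A minimal sketch follows, assuming the string strictly alternates word and score and that an OOV word is still followed by a score; `WordScore` and `parse_recognition_result` are hypothetical names.

```python
from typing import List, NamedTuple


class WordScore(NamedTuple):
    word: str
    score: float
    oov: bool  # True when the word carried the "#" out-of-vocabulary marker


def parse_recognition_result(text: str) -> List[WordScore]:
    """Split "WORD score WORD score ..." into structured entries."""
    tokens = text.split()
    # Assumption: tokens strictly alternate between a word and its score.
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating word/score tokens")
    result = []
    for word, score in zip(tokens[0::2], tokens[1::2]):
        oov = word.endswith("#")
        result.append(WordScore(word.rstrip("#"), float(score), oov))
    return result
```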

Example:

{
   "reqid":"1163e15e-5a1f-4107-8973-9e25928932ca",
   "code":1000,
   "message":"Success",
   "sequence":-1,
   "accuracy_details":[
      {
         "ref_word":"COULD",
         "score":1,
         "phones":[
            {
               "ref_phone":"K",
               "recognition_result":"G",
               "type":"Correct",
               "score":0.8953,
               "recognition_score":0.0068,
               "proficiency_score":100
            },
            {
               "ref_phone":"UH",
               "recognition_result":"D",
               "type":"Correct",
               "score":0.9902,
               "recognition_score":0.0044,
               "proficiency_score":100
            },
            {
               "ref_phone":"D",
               "recognition_result":"UH",
               "type":"Correct",
               "score":0.9201,
               "recognition_score":0.0719,
               "proficiency_score":100
            }
         ],
         "start_index":0,
         "end_index":4,
         "start_time":440,
         "end_time":710,
         "turbidity":2,
         "pause":2,
         "pitch":2,
         "focus_word":2,
         "stress":2,
         "linking":1
      },
      {
         "ref_word":"YOU",
         "score":1.5,
         "phones":[
            {
               "ref_phone":"Y",
               "recognition_result":"D",
               "type":"Correct",
               "score":0.9917,
               "recognition_score":0.0032,
               "proficiency_score":100
            },
            {
               "ref_phone":"UW",
               "recognition_result":"Y",
               "type":"Correct",
               "score":0.9429,
               "recognition_score":0.0264,
               "proficiency_score":100
            }
         ],
         "start_index":6,
         "end_index":8,
         "start_time":710,
         "end_time":840,
         "turbidity":2,
         "pause":2,
         "pitch":2,
         "focus_word":2,
         "stress":2,
         "linking":2
      },
      {
         "ref_word":"PLEASE",
         "score":1,
         "phones":[
            {
               "ref_phone":"P",
               "recognition_result":"L",
               "type":"Correct",
               "score":0.9933,
               "recognition_score":0.0013,
               "proficiency_score":100
            },
            {
               "ref_phone":"L",
               "recognition_result":"IY",
               "type":"Correct",
               "score":0.7571,
               "recognition_score":0.1099,
               "proficiency_score":100
            },
            {
               "ref_phone":"IY",
               "recognition_result":"EY",
               "type":"Correct",
               "score":0.9318,
               "recognition_score":0.0403,
               "proficiency_score":100
            },
            {
               "ref_phone":"Z",
               "recognition_result":"P",
               "type":"Correct",
               "score":0.9368,
               "recognition_score":0.0458,
               "proficiency_score":100
            }
         ],
         "start_index":10,
         "end_index":15,
         "start_time":840,
         "end_time":1230,
         "turbidity":2,
         "pause":2,
         "pitch":2,
         "focus_word":2,
         "stress":2,
         "linking":2
      },
      {
         "ref_word":"PASS",
         "score":1,
         "phones":[
            {
               "ref_phone":"P",
               "recognition_result":"Z",
               "type":"Correct",
               "score":0.9897,
               "recognition_score":0.0044,
               "proficiency_score":100
            },
            {
               "ref_phone":"AE",
               "recognition_result":"AA",
               "type":"Correct",
               "score":0.9754,
               "recognition_score":0.0121,
               "proficiency_score":100
            },
            {
               "ref_phone":"S",
               "recognition_result":"AE",
               "type":"Correct",
               "score":0.9696,
               "recognition_score":0.0257,
               "proficiency_score":100
            }
         ],
         "start_index":17,
         "end_index":20,
         "start_time":1230,
         "end_time":1570,
         "turbidity":2,
         "pause":2,
         "pitch":2,
         "focus_word":2,
         "stress":2,
         "linking":2
      },
      {
         "ref_word":"IT",
         "score":1.2,
         "phones":[
            {
               "ref_phone":"IH",
               "recognition_result":"S",
               "type":"Correct",
               "score":0.0025,
               "recognition_score":0.8984,
               "proficiency_score":25
            },
            {
               "ref_phone":"T",
               "recognition_result":"S",
               "type":"Correct",
               "score":0.341,
               "recognition_score":0.4537,
               "proficiency_score":92
            }
         ],
         "start_index":22,
         "end_index":23,
         "start_time":1570,
         "end_time":1630,
         "turbidity":2,
         "pause":2,
         "pitch":2,
         "focus_word":2,
         "stress":2,
         "linking":0
      },
      {
         "ref_word":"TO",
         "score":1.2,
         "phones":[
            {
               "ref_phone":"T",
               "recognition_result":"AH",
               "type":"Correct",
               "score":0.6687,
               "recognition_score":0.1504,
               "proficiency_score":100
            },
            {
               "ref_phone":"UW",
               "recognition_result":"UH",
               "type":"Correct",
               "score":0.8434,
               "recognition_score":0.0769,
               "proficiency_score":100
            }
         ],
         "start_index":25,
         "end_index":26,
         "start_time":1630,
         "end_time":1920,
         "turbidity":2,
         "pause":2,
         "pitch":2,
         "focus_word":2,
         "stress":2,
         "linking":2
      },
      {
         "ref_word":"ME",
         "score":1.2,
         "phones":[
            {
               "ref_phone":"M",
               "recognition_result":"UW",
               "type":"Correct",
               "score":0.9519,
               "recognition_score":0.0102,
               "proficiency_score":100
            },
            {
               "ref_phone":"IY",
               "recognition_result":"M",
               "type":"Correct",
               "score":0.4895,
               "recognition_score":0.0008,
               "proficiency_score":100
            }
         ],
         "start_index":28,
         "end_index":29,
         "start_time":1920,
         "end_time":2820,
         "turbidity":2,
         "pause":2,
         "pitch":1,
         "focus_word":2,
         "stress":2,
         "linking":2
      }
   ],
   "integrity_details":[
      {
         "ref":"COULD YOU PLEASE PASS IT TO ME",
         "recognition_result":"COULD 1.000000 YOU 1.500000 PLEASE 1.000000 PASS 1.000000 IT 1.200000 TO 1.200000 ME 1.200000 "
      }
   ],
   "scores":{
      "accuracy":96.6746,
      "fluency":100,
      "integrity":100
   },
   "addition":{
      "audio_url":"https://lf26-labspeech-sign.talk-guru.com/lab-speech-capt-audio/1163e15e-5a1f-4107-8973-9e25928932ca.mp3?x-expires=1670404194\u0026x-signature=c1O24%2Fb%2FfE12abv6HYIvhn%2BLL94%3D",
      "req.request.core_type":"en.sent.score"
   },
   "version":"1.10.14"
}
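
One common use of a response like this is to pick out the phonemes a learner should practise. The sketch below walks accuracy_details and collects phonemes whose proficiency_score falls below a cutoff; `weak_phones` and the default threshold of 60 are illustrative assumptions, not values defined by the API.

```python
from typing import List, Tuple


def weak_phones(response: dict,
                threshold: float = 60.0) -> List[Tuple[str, str, float]]:
    """Return (word, phoneme, proficiency_score) for low-scoring phonemes."""
    weak = []
    for word in response.get("accuracy_details", []):
        for phone in word.get("phones", []):
            score = phone.get("proficiency_score")
            # Skip phonemes that carry no proficiency_score at all.
            if score is not None and score < threshold:
                weak.append((word["ref_word"], phone["ref_phone"], score))
    return weak
```

Applied to the example response above with the default threshold, only the IH phoneme of "IT" (proficiency_score 25) is reported.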

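Chinese scoring:

Structure:

| Field | Type | Description |
| --- | --- | --- |
| reqid | string | reqid (same as request.reqid in the request body) |
| code | int | Status code; 1000 means success |
| message | string | Additional information corresponding to the status code |
| sequence | int | Request sequence number (same as request.sequence in the request body) |
| version | string | Version number |
| accuracy_details | []array | Array of per-character and pinyin scoring details; each item represents one recognized character |
| accuracy_details[N].ref_character | string | The recognized Chinese character |
| accuracy_details[N].ref_pinyin | string | The recognized pinyin |
| accuracy_details[N].pinyin_score | float64 | Pronunciation score of the pinyin, computed from syllable_score and tone_score as: pinyin_score = syllable_score * 0.8 + tone_score * 0.2 |
| accuracy_details[N].syllable_score | float64 | Pronunciation score of the syllable, covering the phoneme scores within the syllable |
| accuracy_details[N].tone_score | float64 | Pronunciation score of the syllable's tone (prosody score) |
| accuracy_details[N].start_index | int | Start position of the Nth pinyin in the text (ref_text) |
| accuracy_details[N].end_index | int | End position of the Nth pinyin in the text (ref_text) |
| accuracy_details[N].start_time | int | Start time of the Nth pinyin in the audio, in milliseconds |
| accuracy_details[N].end_time | int | End time of the Nth pinyin in the audio, in milliseconds |
| accuracy_details[N].phones | []array | Detailed score for each phoneme; each item represents one phoneme of the pinyin |
| accuracy_details[N].phones[M].ref_phone | string | The expected phoneme |
| accuracy_details[N].phones[M].recognition_result | string | The competing phoneme |
| accuracy_details[N].phones[M].type | string | Whether the pronunciation is correct: Correct or Wrong. Suppose the expected phoneme is /A/ (ref_phone) and the most competitive phoneme, the one with the highest posterior, is /B/ (recognition_result); P(A) and P(B) are their respective phoneme-level scores. |
| accuracy_details[N].phones[M].score | float64 | Score of ref_phone, in the range [0-1]. To convert it to [0-100], refer to the conversion formula for accuracy_details[N].score |
| accuracy_details[N].phones[M].recognition_score | float64 | Score of the recognized phoneme (recognition_result) |
| accuracy_details[N].phones[M].proficiency_score | float64 | Score of the expected phoneme (ref_phone), in the range [0-100] (Recommended) |
| scores.accuracy | float64 | Accuracy |
| scores.fluency | float64 | Fluency |
| scores.integrity | float64 | How well the provided audio matches the reference text |
| integrity_details.ref_characters | string | Reference text |
| integrity_details.ref_pinyins | string | The set of pinyin |
| integrity_details.recognition_result | string | Pinyin and their scores, e.g. "ni3 0 你 hao3 0.996707 好". |
| addition.audio_url | string | Link to the audio uploaded by the user |
| addition.req.request.core_type | string | Type of the scoring model, taken from the request body |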
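
The weighting behind pinyin_score (80% syllable, 20% tone) is easy to reproduce on the client side, for instance to sanity-check a response. A small sketch, where the function name mirrors the field and is otherwise hypothetical:

```python
def pinyin_score(syllable_score: float, tone_score: float) -> float:
    """Combine syllable and tone scores using the documented weights."""
    return syllable_score * 0.8 + tone_score * 0.2
```

For a syllable_score of 0.8 and a tone_score of 0.8 this gives 0.8 (up to floating-point rounding); a perfect syllable with a tone_score of 0 is capped at 0.8 as well, which shows how little the tone contributes on its own.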

Example:

{
    "reqid": "CB11023E-A7A2-44F1-917B-E8858A90896C",
    "code": 1000,
    "message": "Success",
    "sequence": -1,
    "version": "1.3.0",
    "accuracy_details": [
        {
            "ref_character": "你",
            "ref_pinyin":    "ni3",
            "pinyin_score":  0.8,
            "syllable_score":  0.8,
            "tone_score":      0.8,        
            "start_index": 0,
            "end_index": 5,
            "start_time": 1130,
            "end_time": 1790,
            "phones": [
                {
                    "ref_phone": "n",
                    "recognition_result": "l",
                    "type": "Correct",
                    "score": 0.7316722273826599,
                    "recognition_score": 0.004821970127522945
                },
                {
                    "ref_phone": "i",
                    "recognition_result": "u",
                    "type": "Correct",
                    "score": 0.5287531614303589,
                    "recognition_score": 0.09780968725681304
                }
           ]
        },
        {
            "ref_character": "好",
            "ref_pinyin":    "hao3",
            "pinyin_score":  1,
            "syllable_score":  1,
            "tone_score":  1,
            "start_index": 16,
            "end_index": 26,
            "start_time": 1790,
            "end_time": 2260,
            "phones": [..]       # collapsed for readability
        }
    ],
    "integrity_details": [
        {
            "recognition_result":"ni3 0.8 你 hao3 1 好",
            "ref_characters":"你好",
            "ref_pinyins":"ni3 hao3"
        }
    ],
    "scores": {
        "accuracy": 72.7,
        "fluency": 75.5,
        "integrity": 50
    },
    "addition": {
        "audio_url": "https://lf6-labspeech-sign.talk-guru.com/lab-speech-capt-audio/c6a131a5-d73a-4eda-904b-f56bed552063.mp3?x-expires=1669277694&x-signature=1%2FTqiRiiUEaSpkC2n4AbCbLFEfI%3D",
        "req.request.core_type": "cn.sent.raw"
    }
}
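
For Chinese scoring, integrity_details.recognition_result groups its tokens in threes: pinyin, score, character. A minimal parsing sketch, assuming that grouping always holds; `PinyinScore` and `parse_pinyin_result` are hypothetical names.

```python
from typing import List, NamedTuple


class PinyinScore(NamedTuple):
    pinyin: str
    score: float
    character: str


def parse_pinyin_result(text: str) -> List[PinyinScore]:
    """Split "pinyin score character ..." into structured entries."""
    tokens = text.split()
    # Assumption: every entry is exactly three space-separated tokens.
    if len(tokens) % 3 != 0:
        raise ValueError("expected pinyin/score/character triples")
    return [
        PinyinScore(tokens[i], float(tokens[i + 1]), tokens[i + 2])
        for i in range(0, len(tokens), 3)
    ]
```
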
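  2. Response modes

There are two response modes, single response and streaming response, selected by the request.response_mode field in the request body.

Single response (once)

Only the last packet returns a valid scoring result. Responses to intermediate packets have empty values in every field.

Streaming response (streaming)

Intermediate packets also return valid scoring results. The result is updated as the packet number increases, until the last packet produces the final result.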
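
When consuming responses from a chunked session, the client has to decide which responses carry usable scores. The sketch below encodes the two modes; `has_valid_result` is a hypothetical helper, and it relies on the response sequence mirroring the request sequence, so a negative value identifies the last packet.

```python
def has_valid_result(response: dict, response_mode: str = "once") -> bool:
    """Tell whether a response is expected to carry usable scores."""
    if response.get("code") != 1000:
        return False
    if response_mode == "streaming":
        # Every packet returns scores; later packets refine earlier ones.
        return True
    if response_mode == "once":
        # Only the last packet (negative sequence) returns scores.
        return response.get("sequence", 0) < 0
    raise ValueError(f"unknown response_mode: {response_mode!r}")
```

In streaming mode a caller would typically overwrite its displayed scores with each new response and treat the one with a negative sequence as final.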