You need to enable JavaScript to run this app.
导航

评测数据集格式说明

最近更新时间2023.11.30 17:25:55

首次发布时间2023.09.22 14:50:48

评测集格式目前支持JSONL,一行一个评估样本

单轮对话

格式说明

# 输入文件eval_math.jsonl
{"prompt":"计算1+1","answer":"2"}
{"system":"请完成下面的计算题","prompt":"1+1","parameters":{"top_k":1},"answer":"2"}
{"prompt_error":"This is an error prompt"} # 格式错误的示例样本,上传后无法进行评测

# 输出文件eval_math_result.json
{"prompt":"计算1+1","answer":"2","response":"2","usage":{"prompt_tokens":6,"completion_tokens":2,"total_tokens":8}}
{"system":"请完成下面的计算题","prompt":"1+1","parameters":{"top_k":1},"answer":"2","response":"2","usage":{"prompt_tokens":13,"completion_tokens":2,"total_tokens":15}}
{"prompt_error":"This is an error prompt","error":"Invalid input"} # 若因数据或平台问题导致推理无法完成,将展示error

输入字段说明

字段类型选填备注
promptstr必填输入指令,作为向模型的提问
answerstr必填评测的参考答案,用于对模型生成的回答进行验证
systemstr选填引入角色的输入指令
parametersdict选填请求参数,参考API调用指南API Specification一节的input中的parameters描述

输出字段说明

字段类型备注
responsestr模型生成的回答
usagedicttoken使用信息,参考API调用指南API Specification一节的ouput中的usage描述
errorstr若因数据或平台问题导致推理无法完成,将展示error

多轮对话(推理)

# 输入文件test.jsonl 
{"messages":[{"role":"system","content":"请完成下面的计算题"},{"role":"user","content":"1+1"}],"parameters":{"top_k":1}} 
{"messages":[{"role":"system","content":"请完成下面的计算题"},{"role":"user","content":"2+1"},{"role":"user","content":"2+2"}],"parameters":{"top_k":1}} 
{"messages":[{"role":"system","content":"请完成下面的计算题"},{"role":"user","content":"3+1"},{"role":"assistant","content":"999"},{"role":"user","content":"3+2"}],"parameters":{"top_k":1}} 
{"messages":[{"role":"someone","content":"error role"}]}# 自动识别多轮推理格式,格式错误的示例样本,上传后无法进行推理;

# 输出文件test_result.json 
{"messages":[{"role":"system","content":"请完成下面的计算题"},{"role":"user","content":"1+1"},{"role":"assistant","content":"2"}],"parameters":{"top_k":1},"usage":{"prompt_tokens":13,"completion_tokens":2,"total_tokens":15}} 
{"messages":[{"role":"system","content":"请完成下面的计算题"},{"role":"user","content":"2+1"},{"role":"assistant","content":"3"},{"role":"user","content":"2+2"},{"role":"assistant","content":"4"}],"parameters":{"top_k":1},"usage":{"prompt_tokens":33,"completion_tokens":4,"total_tokens":37}} 
{"messages":[{"role":"system","content":"请完成下面的计算题"},{"role":"user","content":"3+1"},{"role":"assistant","content":"999"},{"role":"user","content":"3+2"},{"role":"assistant","content":"5"}],"parameters":{"top_k":1},"usage":{"prompt_tokens":22,"completion_tokens":2,"total_tokens":24}} 
{"messages":[{"role":"someone","content":"error role"}],"error":"Invalid role: {"someone"}}# 若因数据或平台问题导致推理无法完成,将展示error

输入字段说明

字段类型选填备注
messageslist必填多轮输入输出,参考API调用指南API Specification一节的input中的messages描述,同时需要注意,messages字段不能以assistant的回复结束
parametersstr选填请求参数,参考API调用指南API Specification一节的input中的parameters描述

输出字段说明

字段类型备注
messagesstr模型会在messages字段中,所有`{role:"user"}`后插入`{role:"assistant","content":"xxx"}`对象作为模型的输出
usagedicttoken使用信息,参考API调用指南API Specification一节的ouput中的usage描述
errorstr若因数据或平台问题导致推理无法完成,将展示error