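Last updated: 2024.05.10 19:20:48
First published: 2024.04.17 14:21:10
This section explains how to run online retrieval against a knowledge base that has already been created.
Description
The /api/knowledge/collection/search endpoint searches a knowledge base. By default, it currently searches the knowledge content produced by processing the original text.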
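Item | Value | Description |
---|---|---|
URI | http://api-knowledgebase.ml_platform.cn-beijing.volces.com/api/knowledge/collection/search | Uniform resource identifier |
Request method | POST | Type of operation the client requests from the vector database server |
Request header | Content-Type: application/json | Request message type |
Request header | Authorization: HMAC-SHA256 *** | Authentication |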
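As a minimal sketch, the URI, method, and headers listed above could be assembled in Python with the standard library. The function name `build_search_request` is hypothetical, and the `Authorization` value is kept as the same `***` placeholder, because the signing procedure is not described in this section. The request is only built here, not sent.

```python
import json
import urllib.request

SEARCH_URL = (
    "http://api-knowledgebase.ml_platform.cn-beijing.volces.com"
    "/api/knowledge/collection/search"
)


def build_search_request(body: dict, authorization: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the search endpoint.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Placeholder: a real call needs a valid HMAC-SHA256 signature here.
            "Authorization": authorization,
        },
    )


req = build_search_request(
    {"name": "test_name", "query": "introduce a new document level structure"},
    authorization="HMAC-SHA256 ***",
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it, provided the placeholder is replaced with a real signature.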
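Parameter | Type | Required | Default | Description |
---|---|---|---|---|
name | string | Yes | | Name of the knowledge base. |
query | string | Yes | | Text to search for; maximum length 65535. |
limit | int | No | 10 | Number of results to return; minimum 1, maximum 200. |
query_param | json | No | | Filter conditions for the search; supports filtering on the meta information of a doc. |
rerank_switch | bool | No | false | Whether to automatically rerank the results. |
dense_weight | float | No | 0.5 | Weight of the dense vector in hybrid retrieval: 1 means pure dense retrieval, 0 means pure lexical retrieval. The value must be in the range [0.2, 1]; otherwise an error is raised. |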
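The constraints in the parameter table can be checked on the client before a request is sent. The following is a sketch only: `build_search_body` is a hypothetical helper, and it assumes that the 65535 limit on `query` is counted in characters, which the table does not state.

```python
from typing import Optional


def build_search_body(
    name: str,
    query: str,
    limit: int = 10,
    query_param: Optional[dict] = None,
    rerank_switch: bool = False,
    dense_weight: float = 0.5,
) -> dict:
    """Validate the documented constraints and return the JSON body as a dict."""
    if not name:
        raise ValueError("name is required")
    if not query:
        raise ValueError("query is required")
    # Assumption: the 65535 limit is counted in characters.
    if len(query) > 65535:
        raise ValueError("query exceeds the maximum length of 65535")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    if not 0.2 <= dense_weight <= 1:
        raise ValueError("dense_weight must be in the range [0.2, 1]")

    body = {
        "name": name,
        "query": query,
        "limit": limit,
        "rerank_switch": rerank_switch,
        "dense_weight": dense_weight,
    }
    # Optional parameter: only include the filter when one is given.
    if query_param is not None:
        body["query_param"] = query_param
    return body


body = build_search_body(
    "test_name", "introduce a new document level structure", limit=2
)
print(body)
```

Validating locally avoids a round trip that would otherwise end in an invalid request error.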
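Parameter | Description |
---|---|
code | Status code |
message | Response message |
request_id | Unique identifier of the request |
data | Result payload (JSON object); its structure is shown in the successful response example |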
Status code | HTTP status code | Message | Description |
---|---|---|---|
0 | 200 | success | Success |
1000001 | 401 | unauthorized | Missing authentication information |
1000002 | 403 | no permission | Insufficient permissions |
1000003 | 400 | invalid request:%s | Invalid parameter |
1000005 | 400 | collection not exist | The collection does not exist |
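A client might turn the status codes above into exceptions along the following lines. This is a sketch under stated assumptions: `KnowledgeBaseError` and `unwrap_response` are hypothetical names, and the code assumes that every response body carries the `code`, `message`, and `request_id` fields described for the response.

```python
class KnowledgeBaseError(Exception):
    """Raised when the response body carries a non-zero status code."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"[{code}] {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


# Descriptions taken from the status code table.
CODE_DESCRIPTIONS = {
    1000001: "missing authentication information",
    1000002: "insufficient permissions",
    1000003: "invalid parameter",
    1000005: "the collection does not exist",
}


def unwrap_response(payload: dict) -> dict:
    """Return the `data` object on success, otherwise raise KnowledgeBaseError."""
    code = payload.get("code")
    if code == 0:
        return payload.get("data", {})
    # Fall back to the table description when the server sends no message.
    message = payload.get("message") or CODE_DESCRIPTIONS.get(code, "unknown error")
    raise KnowledgeBaseError(code, message, payload.get("request_id"))


try:
    unwrap_response({"code": 1000005, "message": "collection not exist"})
except KnowledgeBaseError as err:
    print(err.code, err.message)
```

Keeping `request_id` on the exception makes it easier to report a failed call for troubleshooting.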
curl -i -X POST \
  -H 'Content-Type: application/json' \
  -H 'Authorization: HMAC-SHA256 ***' \
  http://api-knowledgebase.ml_platform.cn-beijing.volces.com/api/knowledge/collection/search \
  -d '{
    "name": "test_name",
    "query": "introduce a new document level structure",
    "retrieve_count": 25,
    "limit": 2,
    "query_param": {
      "filter": {
        "op": "must",
        "field": "doc_id",
        "conds": ["tos_doc_id_123", "tos_doc_id_456"]
      }
    },
    "rerank_switch": true,
    "dense_weight": 0.5
  }'
Successful response:
HTTP/1.1 200 OK
Content-Length: 43
Content-Type: application/json

{
  "code": 0,
  "data": {
    "collection_name": "test_name",
    "count": 2,
    "result_list": [
      {
        "id": "tos_doc_id_123",
        "chunk_title": "Conclusion",
        "content": "In this paper, we discussed the task of document level structure parsing. This task is more intricate compared to the traditional page level scenario. This complexity arises because we need to consider connecting paragraphs across pages and linking paragraphs into sections. To address these challenges, we introduced a transition-based parser as a solution. Alongside this, we introduced a new dataset called DocTree to support this task.",
        "score": 0.7119365930557251,
        "recall_position": 1,
        "point_id": "tos_doc_id_2_1-217-6834848478902922598",
        "process_time": 1709097567,
        "rerank_score": 0.877777,
        "rerank_position": 1,
        "doc_info": {
          "doc_id": "tos_doc_id_123",
          "doc_name": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents.pdf",
          "create_time": 1677561567,
          "doc_type": "pdf",
          "doc_meta": "[{\"field_name\": \"author\", \"field_type\": \"string\", \"field_value\": \"Mike\"}, {\"field_name\": \"category\", \"field_type\": \"string\", \"field_value\": \"Mike\"}]",
          "source": "tos",
          "title": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents"
        }
      },
      {
        "id": "tos_doc_id_456",
        "chunk_title": "Conclusion",
        "content": "We also introduce a new document level structure parsing dataset called DocTree. It comprises 1,298 manually annotated documents with document level structural information. In contrast to previous datasets focusing on single page, the maximum page number in DocTree reaches 85 while the average is 7.2.",
        "score": 0.711473822593689,
        "recall_position": 1,
        "point_id": "tos_doc_id_2_1-37-3242137170643999406",
        "process_time": 1709097567,
        "rerank_score": 0.5874546,
        "rerank_position": 2,
        "doc_info": {
          "doc_id": "tos_doc_id_123",
          "doc_name": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents.pdf",
          "create_time": 1677561593,
          "doc_type": "pdf",
          "doc_meta": "[{\"field_name\": \"author\", \"field_type\": \"string\", \"field_value\": \"Mike\"}, {\"field_name\": \"category\", \"field_type\": \"string\", \"field_value\": \"Mike\"}]",
          "source": "tos",
          "title": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents"
        }
      }
    ]
  },
  "message": "success",
  "request_id": "02170910041086600000000000000000000ffff0a00609d26d25e"
}
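To illustrate how a successful response might be consumed, the sketch below flattens `result_list` into simple rows. It assumes, based only on the sample response, that `doc_meta` is a JSON-encoded string holding a list of field descriptors and that `rerank_position` is present when reranking is enabled. The function name `summarize_results` is hypothetical.

```python
import json


def summarize_results(data: dict) -> list:
    """Flatten the `data` object of a search response into simple rows."""
    rows = []
    for item in data.get("result_list", []):
        doc_info = item.get("doc_info", {})
        # Assumption: doc_meta is a JSON-encoded string, as in the sample.
        meta_list = json.loads(doc_info.get("doc_meta") or "[]")
        rows.append(
            {
                "id": item.get("id"),
                "chunk_title": item.get("chunk_title"),
                "score": item.get("score", 0.0),
                "rerank_position": item.get("rerank_position"),
                "doc_name": doc_info.get("doc_name"),
                "meta": {m["field_name"]: m["field_value"] for m in meta_list},
            }
        )

    def sort_key(row):
        pos = row["rerank_position"]
        # Reranked rows first, by position; the rest by descending score.
        return (0, pos) if pos is not None else (1, -row["score"])

    return sorted(rows, key=sort_key)


sample = {
    "result_list": [
        {
            "id": "b",
            "score": 0.71,
            "rerank_position": 2,
            "doc_info": {"doc_name": "b.pdf", "doc_meta": "[]"},
        },
        {
            "id": "a",
            "score": 0.70,
            "rerank_position": 1,
            "doc_info": {
                "doc_name": "a.pdf",
                "doc_meta": '[{"field_name": "author", "field_type": "string", "field_value": "Mike"}]',
            },
        },
    ]
}
for row in summarize_results(sample):
    print(row["id"], row["rerank_position"], row["meta"])
```

Sorting on `rerank_position` rather than `score` reflects that the two can disagree once reranking is switched on.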
Failed response:
HTTP/1.1 400 Bad Request
Content-Length: 43
Content-Type: application/json

{
  "code": 1000003,
  "message": "invalid request:%s",
  "request_id": "021695029757920fd001de6666600000000000000000002569b8f"
}