
search

Last updated: 2024.04.16 15:42:20

First published: 2024.03.27 17:09:24

This section explains how to run online retrieval against a knowledge base that has already been created.

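Note

Online retrieval is available once the knowledge base has been created and its documents have been imported and fully processed.
Before calling the API, complete account registration, real-name verification, AK/SK key retrieval, and signature generation as described on the "Integration Guide" page.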

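Overview

The /api/knowledge/collection/search endpoint searches a knowledge base. By default, it currently searches the knowledge content produced by processing the original text.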

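Prerequisites

The knowledge base has been created.
Documents have been imported and fully processed.
Account registration, real-name verification, AK/SK key retrieval, and signature generation have been completed as described on the "Integration Guide" page. After that, the API can be called to search and query the knowledge base.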

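Request interface

URI	http://api-knowledgebase.ml_platform.cn-beijing.volces.com/api/knowledge/collection/search	Uniform resource identifier
Request method	POST	Type of operation the client requests from the vector database server
Request header	Content-Type: application/json	Request message type
Request header	Authorization: HMAC-SHA256 ***	Authentication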
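
The request described in the table above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the helper name `build_search_request` is hypothetical, and the `Authorization` value is passed in by the caller because the signing procedure belongs to the "Integration Guide" page and is not reproduced here.

```python
import json
import urllib.request

# Endpoint taken from the request interface table.
SEARCH_URL = (
    "http://api-knowledgebase.ml_platform.cn-beijing.volces.com"
    "/api/knowledge/collection/search"
)


def build_search_request(payload: dict, authorization: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the search endpoint.

    `authorization` is the full header value, e.g. "HMAC-SHA256 ...",
    computed elsewhere according to the Integration Guide.
    """
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
    )


# Placeholder signature; a real call needs a valid one.
req = build_search_request(
    {"name": "test_name", "query": "introduce a new document level structure"},
    "HMAC-SHA256 ***",
)
```

Passing `req` to `urllib.request.urlopen` would send it. The sketch stops before that step so it can be inspected without network access or credentials.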

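Request parameters

Parameter	Type	Required	Default	Description
name	string	Yes		Name of the knowledge base.
query	string	Yes		Text to search for. Maximum length is 65535.
limit	int	No	10	Number of results to return. Maximum is 200, minimum is 1.
query_param	json	No		Filter conditions for retrieval; supports filtering on the doc's meta information. { "doc_filter": map, a filter on the doc's meta information; see the filter expression description for usage and supported fields; filtering on doc_id is supported } Note: fields used for filtering here must be added to the index's fields at collection/create time.
rerank_switch	bool	No	false	Whether to rerank the results automatically. When enabled, the rerank model is called automatically to reorder the results.
dense_weight	float	No	0.5	Weight of the dense vector in hybrid retrieval. 1 means pure dense retrieval, 0 means pure lexical retrieval. The accepted range is [0.2, 1]; values outside it raise an error. Only takes effect when the requested knowledge base uses hybrid retrieval, i.e. its index algorithm is hnsw_hybrid.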
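
The constraints in the parameter table can be checked on the client before a request is sent. The following is a sketch under stated assumptions: `build_search_payload` is a hypothetical helper, and the query length limit is counted in characters, since the table does not say whether the limit is in characters or bytes.

```python
from typing import Any, Dict, Optional


def build_search_payload(
    name: str,
    query: str,
    limit: int = 10,
    doc_filter: Optional[Dict[str, Any]] = None,
    rerank_switch: bool = False,
    dense_weight: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate arguments against the documented limits and build the JSON body."""
    if not name:
        raise ValueError("name is required")
    if not query:
        raise ValueError("query is required")
    # Assumption: the 65535 limit is measured in characters.
    if len(query) > 65535:
        raise ValueError("query must be at most 65535 characters")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")

    payload: Dict[str, Any] = {
        "name": name,
        "query": query,
        "limit": limit,
        "rerank_switch": rerank_switch,
    }
    if doc_filter is not None:
        # The filter lives under query_param.doc_filter per the table above.
        payload["query_param"] = {"doc_filter": doc_filter}
    if dense_weight is not None:
        # Only meaningful for knowledge bases indexed with hnsw_hybrid.
        if not 0.2 <= dense_weight <= 1:
            raise ValueError("dense_weight must be within [0.2, 1]")
        payload["dense_weight"] = dense_weight
    return payload
```

Leaving `dense_weight` out of the body when it is not given lets the service apply its own default of 0.5 rather than hard-coding it on the client.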

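Response message

Parameter	Description
code	Status code
message	Returned message
request_id	Unique identifier of the request
data	Result payload, structured as follows:
{
    "collection_name": name of the knowledge base
    "count": number of results
    "result_list": [
        {
            "id": primary_key of the index
            "chunk_title": parent heading of this text chunk, i.e. the heading one level up as recognized by the parsing model; empty if there is no parent heading
            "content": knowledge content produced by processing the original text; for FAQ, the answer to the matched question
            "score": retrieval score
            "point_id": knowledge point ID, i.e. the ID of the document chunk
            "original_question": for FAQ, the original question that was matched
            "process_time": time at which knowledge processing completed
            "rerank_score": rerank score
            "doc_info": {
                "doc_id": document ID
                "doc_name": document name
                "create_time": creation time of the document
                "doc_type": type of the original document the knowledge belongs to
                "doc_meta": original meta information of the document
                "source": knowledge source, i.e. the source of the owning document (for url, the URL link; for tos, the TOS directory)
                "title": title of the document the knowledge belongs to
            }
        }
    ]
}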
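
To illustrate how the `data` structure above might be consumed, the sketch below flattens each entry of `result_list` into a small summary record. The function name `summarize_results` is hypothetical, and every field is read with `.get` on the assumption that optional fields such as `rerank_score` or `original_question` may be absent from a given result.

```python
from typing import Any, Dict, List


def summarize_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract the most commonly used fields from a parsed search response."""
    data = response.get("data") or {}
    summaries: List[Dict[str, Any]] = []
    for item in data.get("result_list") or []:
        doc_info = item.get("doc_info") or {}
        summaries.append(
            {
                "point_id": item.get("point_id"),
                "score": item.get("score"),
                # None when the key is missing from the result.
                "rerank_score": item.get("rerank_score"),
                "doc_id": doc_info.get("doc_id"),
                "doc_name": doc_info.get("doc_name"),
                "content": item.get("content"),
            }
        )
    return summaries
```

The input is the response body after JSON decoding, for example the output of `json.loads` on the HTTP response text.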

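Status codes

Status code	HTTP status code	Returned message	Description
0	200	success	Success
1000001	401	unauthorized	Authentication information is missing
1000002	403	no permission	Insufficient permissions
1000003	400	invalid request：%s	Invalid parameter
1000005	400	collection not exist	The collection does not exist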
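
One way to act on the status codes above is to turn any non-zero `code` into an exception. This is an illustrative sketch only: `KnowledgeBaseError`, `STATUS_DESCRIPTIONS`, and `check_response` are hypothetical names, and the descriptions are copied from the table.

```python
from typing import Any, Dict, Optional

# Descriptions taken from the status code table.
STATUS_DESCRIPTIONS = {
    1000001: "authentication information is missing",
    1000002: "insufficient permissions",
    1000003: "invalid parameter",
    1000005: "the collection does not exist",
}


class KnowledgeBaseError(Exception):
    """Raised when the service returns a non-zero status code."""

    def __init__(self, code: Any, message: str, request_id: Optional[str] = None):
        self.code = code
        self.message = message
        self.request_id = request_id
        description = STATUS_DESCRIPTIONS.get(code, "unknown status code")
        super().__init__(f"{code} ({description}): {message}")


def check_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the `data` payload on success, raise KnowledgeBaseError otherwise."""
    code = response.get("code")
    if code == 0:
        return response.get("data") or {}
    raise KnowledgeBaseError(
        code, response.get("message", ""), response.get("request_id")
    )
```

Keeping `request_id` on the exception makes it easier to quote the identifier when reporting a failed call.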

Complete example

Request message

curl -i -X POST \
  -H 'Content-Type: application/json' \
  -H 'Authorization: HMAC-SHA256 ***' \
  http://api-knowledgebase.ml_platform.cn-beijing.volces.com/api/knowledge/collection/search \
  -d '{
    "name": "test_name",
    "query": "introduce a new document level structure",
    "limit": 2,
    "query_param": {
        "doc_filter": {
            "op": "must",
            "field": "doc_id",
            "conds": ["tos_doc_id_123", "tos_doc_id_456"]
        }
    },
    "rerank_switch": false
}'

Response message

Successful execution returns:

HTTP/1.1 200 OK
Content-Length: 43
Content-Type: application/json

{
    "code": 0,
    "data": {
        "collection_name": "test_name",
        "count": 2,
        "result_list": [
            {
                "id": "tos_doc_id_123",
                "chunk_title": "Conclusion",
                "content": "In this paper, we discussed the task of document level structure parsing. This task is more intricate compared to the traditional page level scenario. This complexity arises because we need to consider connecting paragraphs across pages and linking paragraphs into sections. To address these challenges, we introduced a transition-based parser as a solution. Alongside this, we introduced a new dataset called DocTree to support this task.",
                "score": 0.7119365930557251,
                "point_id": "tos_doc_id_2_1-217-6834848478902922598",
                "process_time": 1709097567,
                "doc_info": {
                    "doc_id": "tos_doc_id_123",
                    "doc_name": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents.pdf",
                    "create_time": 1677561567,
                    "doc_type": "pdf",
                    "doc_meta": "[{\"field_name\": \"author\", \"field_type\": \"string\", \"field_value\": \"Mike\"}, {\"field_name\": \"category\", \"field_type\": \"string\", \"field_value\": \"Mike\"}]",
                    "source": "tos",
                    "title": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents"
                }
            },
            {
                "id": "tos_doc_id_456",
                "chunk_title": "Conclusion",
                "content": "We also introduce a new document level structure parsing dataset called DocTree. It comprises 1,298 manually annotated documents with document level structural information. In contrast to previous datasets focusing on single page, the maximum page number in DocTree reaches 85 while the average is 7.2.",
                "score": 0.711473822593689,
                "point_id": "tos_doc_id_2_1-37-3242137170643999406",
                "process_time": 1709097567,
                "rerank_score": 
                "doc_info": {
                    "doc_id": "tos_doc_id_123",
                    "doc_name": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents.pdf",
                    "create_time": 1677561593,
                    "doc_type": "pdf",
                    "doc_meta": "[{\"field_name\": \"author\", \"field_type\": \"string\", \"field_value\": \"Mike\"}, {\"field_name\": \"category\", \"field_type\": \"string\", \"field_value\": \"Mike\"}]",
                    "source": "tos",
                    "title": "DLSP: A Document Level Structure Parser for Multi-Page Digital Documents"
                }
            }
        ]
    },
    "message": "success",
    "request_id": "02170910041086600000000000000000000ffff0a00609d26d25e"
}

Failed execution returns:

HTTP/1.1 400 Bad Request
Content-Length: 43
Content-Type: application/json

{"code": 1000003, "message": "invalid request：%s", "request_id": "021695029757920fd001de6666600000000000000000002569b8f"}
