Common Search Parameters--Vector Database VikingDB-Volcengine

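Common Search Parameters

The search feature consists of several interfaces, each corresponding to a different search mode and business scenario.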

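Interface upgrade notes

The corresponding V1 interface is documented at https://www.volcengine.com/docs/84313/1580544. The V1 interface places all search modes in a single interface and distinguishes them by parameter combinations. The V2 interface maps each search mode to its own interface, which is clearer and makes it easier to choose the search method that fits your business scenario.
Differences in usage: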

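	V2 interface	V1 interface
Vector search	SearchByVector, set the dense_vector parameter	https://www.volcengine.com/docs/84313/1419285 set the order_by_vector.vectors parameter
Dense-sparse hybrid vector search	SearchByVector, set the sparse_vector parameter	https://www.volcengine.com/docs/84313/1419286 set the order_by_vector.sparse_vectors parameter
Tensor rerank under vector search	SearchByVector, set the tensor_rerank parameter; see: Tensor Rerank structure	Not supported
Multimodal search	SearchByMultiModal. For models that support instruction, you must explicitly set whether to add the instruction, to enhance awareness. The response returns the query statement with the instruction concatenated.	https://www.volcengine.com/docs/84313/1419288 set the order_by_raw parameter
Tensor rerank under multimodal search	SearchByMultiModal, set the tensor_rerank parameter; see: Tensor Rerank structure	Not supported
Primary-key search	SearchByID	https://www.volcengine.com/docs/84313/1419285 set the order_by_vector.primary_keys parameter
Scalar-sorted search	SearchByScalar	https://www.volcengine.com/docs/84313/1419287 set the order_by_scalar parameter
Random search	SearchByRandom	https://www.volcengine.com/docs/84313/1578505 do not set any order_by_xxx parameter
Keyword search	SearchByKeywords	Not supported
Primary-key filtering	ids_in and ids_not_in parameters	primary_key_in and primary_key_not_in parameters
Sub-index	Sharding with better performance and usability is under development; stay tuned.	Specified with the partition parameter
Response fields	The primary key value is listed separately, making data easier to locate	All fields are at the same parameter level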

Common request body parameters for search interfaces

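Parameter	Type	Required	Sub-parameter	Type	Description
resource_id	string	One of the two			Resource ID
collection_name	string	One of the two			Collection name
index_name	string	Yes			Index name
output_fields	list	No			List of scalar fields to return. If output_fields is omitted, all scalar fields are returned. If an empty list is passed, no scalar fields are returned. If output_fields is malformed or names a field that is not in the collection, the interface returns an error.
filter	map	No			Filter condition; see the format below. Empty by default, meaning no filtering.
limit	int	No			Number of search results. Default 10, upper bound 100000.
offset	int	No			Offset. Use only for pagination; avoid large values, which incur deep-pagination cost. Default 0. The value must be at least 0, and the semantics match MySQL's offset.
advance (advanced parameters; not needed in ordinary scenarios)	map	No	dense_weight	float32	If the collection has a sparse vector field, this sets the weights of dense and sparse during search. Default 0.5, allowed range [0.2, 1].
			ids_in	list<string> or list<int64>	When set, search is restricted to the given list of primary keys. Empty by default.
			ids_not_in	list<string> or list<int64>	When set, data whose primary key is in the list is excluded from the search results. Empty by default.
			post_process_ops	list	List of post-processing operators. After the vector or scalar recall stage, candidates are filtered further, for example by string matching or frequency control. Each operator is a map, and operators run serially. Empty by default. See the post-processing operators below.
			post_process_input_limit	int	When post_process_ops is set, the number of candidates that enter the post-processing stage. Defaults to 3 times the limit parameter above. If set explicitly, it should not be less than limit and must not exceed 100000.
			scale_k	float64	The search-time sef is scale_k times the index parameter sef. Default 1, range [0.1, 100]. Larger values give more accurate results at the cost of performance.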
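
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official SDK call: `build_search_body` and `filter_cond` are hypothetical names, and treating "one of the two" as "exactly one" is an assumption, since the table does not say what happens when both identifiers are sent.

```python
from typing import Any, Dict, List, Optional

MAX_LIMIT = 100000  # documented upper bound for limit and post_process_input_limit


def build_search_body(
    index_name: str,
    collection_name: Optional[str] = None,
    resource_id: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
    filter_cond: Optional[Dict[str, Any]] = None,
    limit: int = 10,
    offset: int = 0,
    advance: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the common request body (hypothetical helper)."""
    # Assumption: exactly one of the two identifiers is sent.
    if (collection_name is None) == (resource_id is None):
        raise ValueError("set exactly one of collection_name or resource_id")
    if limit > MAX_LIMIT:
        raise ValueError("limit must not exceed 100000")
    if offset < 0:
        raise ValueError("offset must be at least 0")

    body: Dict[str, Any] = {"index_name": index_name, "limit": limit, "offset": offset}
    if collection_name is not None:
        body["collection_name"] = collection_name
    else:
        body["resource_id"] = resource_id
    # None omits the key (all scalar fields are returned); [] returns none.
    if output_fields is not None:
        body["output_fields"] = output_fields
    if filter_cond is not None:
        body["filter"] = filter_cond

    if advance:
        dense_weight = advance.get("dense_weight")
        if dense_weight is not None and not 0.2 <= dense_weight <= 1:
            raise ValueError("dense_weight must be in [0.2, 1]")
        scale_k = advance.get("scale_k")
        if scale_k is not None and not 0.1 <= scale_k <= 100:
            raise ValueError("scale_k must be in [0.1, 100]")
        input_limit = advance.get("post_process_input_limit")
        if input_limit is not None and not limit <= input_limit <= MAX_LIMIT:
            raise ValueError("post_process_input_limit must be in [limit, 100000]")
        body["advance"] = dict(advance)
    return body


if __name__ == "__main__":
    print(build_search_body("my_index", collection_name="my_collection", limit=20))
```

Interface-specific parameters such as dense_vector would be merged into the returned dict before sending.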

filter structure

To use filter on a field, that field must be configured with a scalar index (ScalarIndex is set).

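Operator	Applicable field types	Description	Example
must	string, int64, bool, list<string>, list<int64>	Applies to the specified field name. The value must be in [...], i.e. "must in". The conditions in "conds" are in an OR relationship.	`{ "op": "must", "field": "region", "conds": ["cn", "sg"] }`
must_not	string, int64, bool, list<string>, list<int64>	Applies to the specified field name. The value must not be in [...], i.e. "must not in". The conditions in "conds" are in an OR relationship.	`{ "op": "must_not", "field": "data_type", "conds": [1,2,3] }`
and	/	Logical operator that takes the intersection of multiple conditions. "conds" is the condition list; it supports nested logical operators and must/must_not operators, and any number (>= 1) of conditions can be combined.	`{ "op": "and", "conds": [ { "op": "must", "field": "type", "conds": [1] }, { ... } ] }`
or	/	Logical operator that takes the union of multiple conditions. "conds" is the condition list; it supports nested logical operators and must/must_not operators, and any number (>= 1) of conditions can be combined.	`{ "op": "or", "conds": [ { "op": "must", "field": "type", "conds": [1] }, { ... } ] }`
range	int64, float32	Applies to the specified field name. The value must lie within the given range. Use `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), and `lt` (less than) to bound a one-dimensional range.	price in [100.0, 500.0): `{ "op": "range", "field": "price", "gte": 100.0, "lt": 500.0 }`; price >= 100.0: `{ "op": "range", "field": "price", "gte": 100.0 }`
time_range	date_time	Point-in-time filtering. Use `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), and `lt` (less than) to bound a one-dimensional range.	Data between 2025-08-12 00:00:00 and 2025-08-13 00:00:00 Beijing time: `{ "collection_name": "test", "index_name": "test", "filter": { "op": "and", "conds": [ { "op": "time_range", "field": "f_date_time", "gt": "2025-08-12T00:00:00+08:00", "lt": "2025-08-13T00:00:00+08:00" } ] } }`
geo_range	geo_point	Geographic distance filtering, with `center` as the center point and `radius` as the distance radius.	`{ "collection_name": "test", "index_name": "test", "filter": { "op": "and", "conds": [ { "op": "geo_range", "field": "f_geo_point", "center": "116.412138,39.914912", "radius": "10000m" } ] } }`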
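
Nested filters are easier to read when assembled from small helpers. The sketch below builds the same map shapes as the table examples. The helper names (`must`, `must_not`, `range_cond`, `and_`, `or_`) are hypothetical and are not part of any VikingDB SDK.

```python
import json
from typing import Any, Dict, Iterable, Optional

Cond = Dict[str, Any]


def must(field: str, conds: Iterable[Any]) -> Cond:
    """Field value must be one of conds ("must in")."""
    return {"op": "must", "field": field, "conds": list(conds)}


def must_not(field: str, conds: Iterable[Any]) -> Cond:
    """Field value must not be any of conds ("must not in")."""
    return {"op": "must_not", "field": field, "conds": list(conds)}


def range_cond(
    field: str,
    gte: Optional[float] = None,
    gt: Optional[float] = None,
    lte: Optional[float] = None,
    lt: Optional[float] = None,
) -> Cond:
    """One-dimensional range; only the bounds that are given are emitted."""
    cond: Cond = {"op": "range", "field": field}
    for key, value in (("gte", gte), ("gt", gt), ("lte", lte), ("lt", lt)):
        if value is not None:
            cond[key] = value
    return cond


def _logical(op: str, conds: tuple) -> Cond:
    # The table states that at least one condition is combined.
    if not conds:
        raise ValueError(f"'{op}' needs at least one condition")
    return {"op": op, "conds": list(conds)}


def and_(*conds: Cond) -> Cond:
    """Intersection of the given conditions."""
    return _logical("and", conds)


def or_(*conds: Cond) -> Cond:
    """Union of the given conditions."""
    return _logical("or", conds)


if __name__ == "__main__":
    # region in {cn, sg} AND price in [100.0, 500.0)
    flt = and_(must("region", ["cn", "sg"]), range_cond("price", gte=100.0, lt=500.0))
    print(json.dumps(flt, indent=2))
```

The resulting dict is what goes under the `filter` key of the request body.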

Post-processing structure

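Operator	Applicable field types	Description	Example
score_fusion	string	Score fusion operator, used to fuse in scores from the specified spatio-temporal scalars. In the example, the additional score has weight 0.6 and the ANN score has weight 1 - 0.6 = 0.4; f_complain has a negative factor because products with many complaints should score lower.	`{ "op": "score_fusion", "fusion_by": "add", "addition_score_weight": 0.6, "addition_score": [ { "factor": 1.0, "base_value_from": "scalar_field", "field": "f_sales" }, { "factor": -10.0, "base_value_from": "scalar_field", "field": "f_complain" } ] }`
string_contain	string	Keyword-match filter operator. Keeps data whose field content contains pattern.	`{ "op": "string_contain", "field": "name", "pattern": "bar" }`
string_match	string	Regular-expression match filter operator.	`{ "op": "string_match", "field": "name", "pattern": "^[0-9A-Za-z_]+$" }`
enum_freq_limiter	string	Frequency-control operator. Ensures that within one recall result, any single value appears no more than `threshold` times in total.	`{ "op": "enum_freq_limiter", "field": "city", "threshold": 5 }`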
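
To make the serial semantics concrete, the sketch below emulates two of the operators locally on an in-memory candidate list. It illustrates the documented behavior only and is not the service's implementation. In particular, which items the service keeps once a value exceeds `threshold` is not stated here; this sketch assumes the earliest (highest-ranked) ones are kept. All function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]
Op = Dict[str, Any]


def string_contain(items: List[Item], op: Op) -> List[Item]:
    """Keep items whose field content contains the pattern."""
    return [it for it in items if op["pattern"] in str(it.get(op["field"], ""))]


def enum_freq_limiter(items: List[Item], op: Op) -> List[Item]:
    """Keep at most `threshold` items per distinct field value.

    Assumption: items arrive in rank order and earlier ones win.
    """
    seen: Counter = Counter()
    kept: List[Item] = []
    for it in items:
        value = it.get(op["field"])
        if seen[value] < op["threshold"]:
            seen[value] += 1
            kept.append(it)
    return kept


HANDLERS: Dict[str, Callable[[List[Item], Op], List[Item]]] = {
    "string_contain": string_contain,
    "enum_freq_limiter": enum_freq_limiter,
}


def apply_post_process(items: List[Item], ops: List[Op]) -> List[Item]:
    """Run the operators one after another, each on the previous output."""
    for op in ops:
        items = HANDLERS[op["op"]](items, op)
    return items


if __name__ == "__main__":
    candidates = [
        {"name": "foobar", "city": "bj"},
        {"name": "bar1", "city": "bj"},
        {"name": "baz", "city": "sh"},
        {"name": "barbar", "city": "bj"},
        {"name": "rebar", "city": "sh"},
    ]
    ops = [
        {"op": "string_contain", "field": "name", "pattern": "bar"},
        {"op": "enum_freq_limiter", "field": "city", "threshold": 2},
    ]
    print([it["name"] for it in apply_post_process(candidates, ops)])
```

Because operators run serially, their order matters: filtering by string first means the frequency limit is applied only to items that already matched.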

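Tensor Rerank structure

Note: this parameter is supported only in the searchByVector and searchByMultiModal interfaces.

Parameter	Type	Required	Description
tensor	list<list>	Required in searchByVector; not allowed in searchByMultiModal	Tensor query data: the input used to compute similarity against the tensors already stored in the database. It must be supplied in searchByVector; in searchByMultiModal it is generated automatically by the model.
input_limit	int	Yes	Number of candidates that enter tensor rerank, range: [1, 1000]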
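
The two rules in the table (the `input_limit` range, and `tensor` being required in one interface but forbidden in the other) can be captured in a small builder. This is a sketch under stated assumptions: `build_tensor_rerank` and its `multimodal` flag are hypothetical, and the inner element type of `tensor` is assumed to be float because the table only says list<list>.

```python
from typing import Any, Dict, List, Optional


def build_tensor_rerank(
    input_limit: int,
    tensor: Optional[List[List[float]]] = None,
    multimodal: bool = False,
) -> Dict[str, Any]:
    """Build the tensor_rerank map (hypothetical helper).

    multimodal=False targets searchByVector, where tensor is required.
    multimodal=True targets searchByMultiModal, where tensor must not be set.
    """
    if not 1 <= input_limit <= 1000:
        raise ValueError("input_limit must be in [1, 1000]")
    if multimodal:
        if tensor is not None:
            raise ValueError("tensor must not be set for searchByMultiModal")
        return {"input_limit": input_limit}
    if not tensor:
        raise ValueError("tensor is required for searchByVector")
    return {"tensor": tensor, "input_limit": input_limit}


if __name__ == "__main__":
    # Illustrative two-row tensor; real rows come from your embedding model.
    print(build_tensor_rerank(100, tensor=[[0.1, 0.2], [0.3, 0.4]]))
    print(build_tensor_rerank(50, multimodal=True))
```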

Common response body parameters for search interfaces

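Parameter	Type	Sub-parameter	Type	Description
result	map	data	list	List of retrieved data; see the SearchItem structure below.
		total_return_count	int	Number of search results actually returned
		token_usage	map	Contains prompt_tokens, completion_tokens, image_tokens, and total_tokens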

SearchItem

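Parameter	Type	Description
id	string/int64	Value of the data's primary key field.
fields	map<string,any>	Key is the field name; value is the field value.
score	float	Always returned. The final score. Results returned by the search interfaces are sorted by score.
ann_score	float	Not always returned. If the result was processed by operators such as full-text search or score computation, the original vector search (ANN) score is returned here.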
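
On the client side, the response can be mapped onto a small typed structure so that the optional `ann_score` is handled explicitly. This is a minimal sketch: the `SearchItem` dataclass and `parse_search_result` are hypothetical names, `result` is assumed to sit at the top level of the decoded response, and the sample payload is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class SearchItem:
    id: Union[str, int]               # primary key value, string or int64
    fields: Dict[str, Any]            # field name -> field value
    score: float                      # final score, always present
    ann_score: Optional[float] = None  # original ANN score, only sometimes present


def parse_search_result(response: Dict[str, Any]) -> List[SearchItem]:
    """Convert result.data into SearchItem objects, preserving order."""
    result = response.get("result") or {}
    return [
        SearchItem(
            id=raw["id"],
            fields=raw.get("fields", {}),
            score=raw["score"],
            ann_score=raw.get("ann_score"),
        )
        for raw in result.get("data", [])
    ]


if __name__ == "__main__":
    # Made-up payload for illustration only.
    sample = {
        "result": {
            "data": [
                {"id": "a1", "fields": {"region": "cn"}, "score": 0.92, "ann_score": 0.88},
                {"id": 42, "fields": {}, "score": 0.75},
            ],
            "total_return_count": 2,
        }
    }
    for item in parse_search_result(sample):
        print(item.id, item.score, item.ann_score)
```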

Last updated: 2026.01.27 21:10:01
