Common Search Parameters - Vector Database VikingDB - Volcengine

Common Search Parameters

The search feature consists of several APIs, each corresponding to a different search mode and business scenario.

API description

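The V2 API maps each search mode to its own endpoint. This makes the API clearer and helps you choose the search method that fits your business scenario.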

Common request body parameters for search APIs

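Name	Type	Required	Sub-parameter	Sub-parameter type	Description
resource_id	str	One of the two			Resource ID of the collection.
collection_name	str	One of the two			Name of the collection.
index_name	str	Yes			Name of the index.
output_fields	List[str]	No			List of scalar fields to return. If unset, all scalar fields in the collection are returned. An empty list returns no scalar fields. Every field name must exist in the collection schema; otherwise the request fails.
filter	Dict[str, Any]	No			Scalar filter condition; see the format below. Unset by default.
limit	int	No			Maximum number of results to return. Default 10, maximum 100000.
offset	int	No			Pagination offset. Default 0. Large offsets incur the performance cost of deep pagination.
Advance	SearchAdvance	No	dense_weight	Optional[float]	Dense / sparse weight in hybrid-vector scenarios. Default 0.5, range [0.2, 1].
			ids_in	Optional[List[Any]]	Search only within this set of primary keys.
			ids_not_in	Optional[List[Any]]	Exclude the listed primary keys.
			post_process_ops	Optional[List[Dict[str, Any]]]	List of post-processing operators, executed serially. See "Search Post-Processing Operators - PostProcess".
			post_process_input_limit	Optional[int]	Number of candidates passed into the post-processing stage.
			scale_k	Optional[float]	Candidate multiplier allocated to the subsequent rerank.
			filter_pre_ann_limit	Optional[int]	Maximum number of candidates when scalar filtering runs first.
			filter_pre_ann_ratio	Optional[float]	Candidate ratio when scalar filtering runs first.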
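
The parameters above can be assembled into a request body roughly as follows. This is a minimal sketch, not SDK code: `build_search_body` is a hypothetical helper that only encodes the constraints listed in the table. Treating `resource_id` / `collection_name` as mutually exclusive, requiring `limit >= 1`, and rejecting a negative `offset` are assumptions, and the `Advance` key is spelled exactly as the table writes it.

```python
from typing import Any, Dict, List, Optional

MAX_LIMIT = 100000               # upper bound stated in the table
DENSE_WEIGHT_RANGE = (0.2, 1.0)  # range stated in the table


def build_search_body(
    index_name: str,
    resource_id: Optional[str] = None,
    collection_name: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
    filter_cond: Optional[Dict[str, Any]] = None,
    limit: int = 10,
    offset: int = 0,
    advance: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the common request body as a plain dict."""
    # Assumption: "one of the two" is read as "exactly one".
    if (resource_id is None) == (collection_name is None):
        raise ValueError("provide exactly one of resource_id or collection_name")
    # The maximum comes from the table; the lower bound of 1 is an assumption.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:  # assumption: negative offsets are not meaningful
        raise ValueError("offset must be >= 0")

    body: Dict[str, Any] = {"index_name": index_name, "limit": limit, "offset": offset}
    if resource_id is not None:
        body["resource_id"] = resource_id
    else:
        body["collection_name"] = collection_name

    # None means "unset" (all scalar fields are returned);
    # an empty list means "return no scalar fields".
    if output_fields is not None:
        body["output_fields"] = output_fields
    if filter_cond is not None:
        body["filter"] = filter_cond

    if advance:
        weight = advance.get("dense_weight")
        low, high = DENSE_WEIGHT_RANGE
        if weight is not None and not low <= weight <= high:
            raise ValueError(f"dense_weight must be within [{low}, {high}]")
        body["Advance"] = advance  # key casing copied from the table
    return body


example_body = build_search_body(
    index_name="my_index",            # placeholder name
    collection_name="my_collection",  # placeholder name
    output_fields=[],
    advance={"dense_weight": 0.5},
)
```

Keeping `None` and `[]` distinct for `output_fields` matters because the table gives them different meanings.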

filter structure

To use filter, the corresponding scalar field must already be configured as a scalar index (scalar_index).

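Operator	Applicable field types	Description	Example
must	string, int64, bool, list<string>, list<int64>	Applies to the specified field. The value must be in [...], i.e. "must in".	`{ "op": "must", "field": "region", "conds": ["cn", "sg"] }`
must_not	string, int64, bool, list<string>, list<int64>	Applies to the specified field. The value must not be in [...], i.e. "must not in".	`{ "op": "must_not", "field": "data_type", "conds": [1,2,3] }`
and	/	Logical operator that takes the intersection of multiple conditions. `conds` is the condition list; it accepts any number (>= 1) of conditions and supports nested logical operators as well as must/must_not operators.	`{ "op": "and", "conds": [ { "op": "must", "field": "type", "conds": [1] }, { ... } ] }`
or	/	Logical operator that takes the union of multiple conditions. `conds` is the condition list; it accepts any number (>= 1) of conditions and supports nested logical operators as well as must/must_not operators.	`{ "op": "or", "conds": [ { "op": "must", "field": "type", "conds": [1] }, { ... } ] }`
range	int64, float32	Applies to the specified field. The value must fall within the given range. Use `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), and `lt` (less than) to delimit a one-dimensional range.	price in [100.0, 500.0): `{ "op": "range", "field": "price", "gte": 100.0, "lt": 500.0 }`; price >= 100.0: `{ "op": "range", "field": "price", "gte": 100.0 }`
time_range	date_time	Filters by point in time. Use `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), and `lt` (less than) to delimit a one-dimensional range.	Data between 2025-08-12 00:00:00 and 2025-08-13 00:00:00 Beijing time: `{ "op": "time_range", "field": "f_date_time", "gt": "2025-08-12T00:00:00+08:00", "lt": "2025-08-13T00:00:00+08:00" }`
geo_range	geo_point	Filters by geographic distance, with `center` as the center point and `radius` as the distance radius.	`{ "op": "geo_range", "field": "f_geo_point", "center": "116.412138,39.914912", "radius": "10000m" }`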
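
The operators in the table compose into nested dictionaries. The sketch below shows one way to build them in Python and, purely for illustration, evaluates `must`, `must_not`, `range`, `and`, and `or` against an in-memory record. All helper names (`must`, `must_not`, `all_of`, `any_of`, `in_range`, `matches`) are hypothetical and not part of any SDK, and `matches` reflects only the semantics described in the table for non-list fields, not the server's implementation.

```python
import json
from typing import Any, Dict, Iterable

Filter = Dict[str, Any]
RANGE_KEYS = {"gte", "gt", "lte", "lt"}  # bound names listed in the table


def must(field: str, values: Iterable[Any]) -> Filter:
    return {"op": "must", "field": field, "conds": list(values)}


def must_not(field: str, values: Iterable[Any]) -> Filter:
    return {"op": "must_not", "field": field, "conds": list(values)}


def all_of(*conds: Filter) -> Filter:
    return {"op": "and", "conds": list(conds)}


def any_of(*conds: Filter) -> Filter:
    return {"op": "or", "conds": list(conds)}


def in_range(field: str, **bounds: float) -> Filter:
    if not bounds or set(bounds) - RANGE_KEYS:
        raise ValueError("bounds must be a non-empty subset of gte/gt/lte/lt")
    return {"op": "range", "field": field, **bounds}


def matches(record: Dict[str, Any], flt: Filter) -> bool:
    """Illustrative local evaluation for scalar (non-list) fields only."""
    op = flt["op"]
    if op == "and":
        return all(matches(record, cond) for cond in flt["conds"])
    if op == "or":
        return any(matches(record, cond) for cond in flt["conds"])
    value = record[flt["field"]]
    if op == "must":
        return value in flt["conds"]
    if op == "must_not":
        return value not in flt["conds"]
    if op == "range":
        checks = {
            "gte": lambda v, bound: v >= bound,
            "gt": lambda v, bound: v > bound,
            "lte": lambda v, bound: v <= bound,
            "lt": lambda v, bound: v < bound,
        }
        return all(check(value, flt[key]) for key, check in checks.items() if key in flt)
    raise ValueError(f"unsupported op in this sketch: {op}")


# region in {"cn", "sg"} AND price in [100.0, 500.0)
example_filter = all_of(
    must("region", ["cn", "sg"]),
    in_range("price", gte=100.0, lt=500.0),
)
print(json.dumps(example_filter, indent=2))
```

Building filters through small functions keeps the nesting readable and catches misspelled range bounds before a request is sent.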

Post-processing structure

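Operator	Applicable field types	Description	Example
score_fusion	string	Score fusion operator for fusing the scores of the specified spatio-temporal scalar fields. In the example, `addition_score_weight` is 0.6, so the additional score has weight 0.6 and the ANN score has weight 1 - 0.6 = 0.4; the negative `factor` on `f_complain` lowers the score of products with many complaints.	`{ "op": "score_fusion", "fusion_by": "add", "addition_score_weight": 0.6, "addition_score": [ { "factor": 1.0, "base_value_from": "scalar_field", "field": "f_sales" }, { "factor": -10.0, "base_value_from": "scalar_field", "field": "f_complain" } ] }`
string_contain	string	Keyword-match filter operator. Matches when the field content contains `pattern`.	`{ "op": "string_contain", "field": "name", "pattern": "bar" }`
string_match	string	Regular-expression match filter operator.	`{ "op": "string_match", "field": "name", "pattern": "^[0-9A-Za-z_]+$" }`
enum_freq_limiter	string	Frequency-control operator. Ensures that, within the results of a single recall, any specific value appears no more than `threshold` times.	`{ "op": "enum_freq_limiter", "field": "city", "threshold": 5 }`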
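
To make the serial execution of post-processing operators concrete, the sketch below builds a `post_process_ops` list and simulates two of the operators locally. It is illustrative only: `limit_enum_frequency` and `apply_ops` are hypothetical helpers, and the simulation assumes that `string_contain` is a plain substring test and that `enum_freq_limiter` keeps the earliest hits for each value. The service's actual behavior may differ.

```python
from collections import Counter
from typing import Any, Dict, List

Hit = Dict[str, Any]

# Operators as they would appear in the post_process_ops list.
post_process_ops: List[Dict[str, Any]] = [
    {"op": "string_contain", "field": "name", "pattern": "bar"},
    {"op": "enum_freq_limiter", "field": "city", "threshold": 1},
]


def limit_enum_frequency(hits: List[Hit], field: str, threshold: int) -> List[Hit]:
    """Keep hits in order; drop a hit once its field value was kept `threshold` times."""
    kept_per_value: Counter = Counter()
    kept: List[Hit] = []
    for hit in hits:
        value = hit[field]
        if kept_per_value[value] < threshold:
            kept_per_value[value] += 1
            kept.append(hit)
    return kept


def apply_ops(hits: List[Hit], ops: List[Dict[str, Any]]) -> List[Hit]:
    """Run operators serially: each one consumes the previous one's output."""
    for op in ops:
        if op["op"] == "string_contain":
            hits = [h for h in hits if op["pattern"] in h[op["field"]]]
        elif op["op"] == "enum_freq_limiter":
            hits = limit_enum_frequency(hits, op["field"], op["threshold"])
        else:
            raise ValueError(f"operator not covered by this sketch: {op['op']}")
    return hits


sample_hits: List[Hit] = [
    {"name": "foobar", "city": "a"},
    {"name": "bar1", "city": "a"},
    {"name": "baz", "city": "a"},
    {"name": "barx", "city": "b"},
]
result = apply_ops(sample_hits, post_process_ops)
print([hit["name"] for hit in result])
```

Because the operators run in order, placing the frequency limiter after the keyword filter means the per-city quota is spent only on hits that already matched the keyword.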

Last updated: 2026.04.14 15:03:25
