使用记忆库算子实现画像精准抽取--向量数据库VikingDB-火山引擎

文档中心

向量数据库VikingDB

最佳实践

使用记忆库算子实现画像精准抽取

为什么需要算子？

在使用大语言模型做记忆抽取时，模型擅长语义理解和推理，但在数值计算（如计数、平均值、最大值等）方面往往容易产生幻觉，导致结果不够精准。
为了解决这一问题，记忆库提供了算子（Operator）机制。算子可以在模型调用前后，对记忆数据进行可靠的数值处理或智能聚合，确保记忆结果既准确又有洞察力。

算子能力概览

记忆库的算子主要分为两类：

数值计算类算子

这些算子保证数值计算结果的准确性，避免依赖模型的“心算”。

SUM：对数值字段求和，返回总和结果。
COUNT：统计记录数量，返回计数结果。
MAX：获取数值字段中的最大值，返回最大结果。
AVG：对数值字段求平均值，返回平均结果。

应用示例：在教育场景中，利用 SUM 和 COUNT 可以统计某个用户对知识点的答对次数、答错次数，从而评估掌握情况。

大模型处理类算子

这类算子发挥大模型的语义理解能力，对非结构化或复杂数据进行聚合。

LLM_MERGE：通过大模型对画像字段进行智能合并与归纳。利用语义压缩、思维链等方法，对记忆进行修订和融合。

应用示例：在用户兴趣画像中，LLM_MERGE 可以将用户关于“数学兴趣”的不同表达方式（如“喜欢几何”“对代数有兴趣”）合并为更统一、更有洞察力的描述。

算子能力最佳实践

下面以一个完整的教育场景案例为例，演示如何记录用户的学习行为，并通过 MAX 与 AVG 算子评估学生在某知识点上的平均得分和最高得分，从而实现对知识点掌握情况的个性化记录。

创建记忆库

您可以通过控制台或数据面 API 创建记忆库，具体操作步骤如下：

控制台

API

创建一个名为 education_memory_base 的记忆库。

请参考下图，在记忆抽取配置中编辑事件规则，以确保能够准确捕获学习行为数据。

在记忆抽取配置中，请参考下图编辑画像规则。添加“最高分”和“平均分”字段时，点击更多配置，将关联事件 english_study 中的 rating_score 作为聚合对象，并分别选择 MAX 和 AVG 算子作为聚合操作符。

配置好事件和画像抽取规则后，点击“创建记忆库”即可完成创建。

创建一个名为 education_memory_base 的记忆库，并自定义画像配置，用于记录用户在英语学习会话中的助教—学生问答及评分情况。

import json
import requests
from volcengine.base.Request import Request
from volcengine.Credentials import Credentials
from volcengine.auth.SignerV4 import SignerV4
import os

AK = os.getenv("VOLC_ACCESSKEY")
SK = os.getenv("VOLC_SECRETKEY")
Domain = "api-knowledgebase.mlp.cn-beijing.volces.com"

def prepare_request(method, path, ak, sk, data=None):
  r = Request()
  r.set_shema("http")
  r.set_method(method)
  r.set_host(Domain)
  r.set_path(path)

  if data is not None:
    r.set_body(json.dumps(data))
  credentials = Credentials(ak, sk, 'air', 'cn-north-1')
  SignerV4.sign(r, credentials)
  return r

def internal_request(method, api, payload, params=None):
  req = prepare_request(
                        method = method,
                        path = api,
                        ak = AK,
                        sk = SK,
                        data = payload)

  r = requests.request(method=req.method,
          url="{}://{}{}".format(req.schema, req.host, req.path),
          headers=req.headers,
          data=req.body,
          params=params,
      )
  return r
  
path = '/api/memory/collection/create'
playload = {
    "CollectionName": "education_memory_base",
    "Description": "test description",
    "CustomEventTypeSchemas": [
        {
            "EventType": "english_study",
            "Description": "记录一次英语学习会话中助教与学生的问答及评分",
            "Properties": [
                {
                    "PropertyName": "knowledge_point_name",
                    "PropertyValueType": "string",
                    "Description": "当前对话涉及的知识点名称"
                },
                {
                    "PropertyName": "question",
                    "PropertyValueType": "string",
                    "Description": "助教提出的问题"
                },
                {
                    "PropertyName": "answer",
                    "PropertyValueType": "string",
                    "Description": "学生的回答"
                },
                {
                    "PropertyName": "rating_score",
                    "PropertyValueType": "float32",
                    "Description": "对学生回答的数值评分，满分为10分"
                }
            ]
        }
    ],
    "CustomProfileTypeSchemas": [
        {
            "ProfileType": "english_knowledge_point",
            "Description": "用于追踪学生在特定英语知识点上的学习进展",
            "Properties": [
                {
                    "PropertyName": "id",
                    "PropertyValueType": "int64",
                    "Description": "知识点的唯一主键ID",
                    "IsPrimaryKey": True
                },
                {
                    "PropertyName": "knowledge_point_name",
                    "PropertyValueType": "string",
                    "Description": "知识点具体名称 (例如: "天气相关词汇")"
                },
                {
                    "PropertyName": "good_evaluation_criteria",
                    "PropertyValueType": "string",
                    "Description": "判断回答为'好'的评判标准"
                },
                {
                    "PropertyName": "rating_score_max",
                    "PropertyValueType": "float32",
                    "Description": "在该知识点上获得过的最高数值评分",
                    "AggregateExpression": {
                        "Op": "MAX",
                        "EventType": "english_study",
                        "EventPropertyName": "rating_score"
                    }
                },
                {
                    "PropertyName": "rating_score_avg",
                    "PropertyValueType": "float32",
                    "Description": "在该知识点上获得过的平均数值评分",
                    "AggregateExpression": {
                        "Op": "AVG",
                        "EventType": "english_study",
                        "EventPropertyName": "rating_score"
                    }
                }                            
            ]
        }
    ]
}
rsp = internal_request('POST', path, playload)
print(rsp.json())

添加会话

在添加会话时，需预先定义 profile_scope 字段。这样，大模型就能将用户的学习行为精准归属到对应的知识点，确保画像更新的准确性与可追溯性。

import json
import requests
from volcengine.base.Request import Request
from volcengine.Credentials import Credentials
from volcengine.auth.SignerV4 import SignerV4
import os

AK = os.getenv("VOLC_ACCESSKEY")
SK = os.getenv("VOLC_SECRETKEY")
Domain = "api-knowledgebase.mlp.cn-beijing.volces.com"

def prepare_request(method, path, ak, sk, data=None):
  r = Request()
  r.set_shema("http")
  r.set_method(method)
  r.set_host(Domain)
  r.set_path(path)

  if data is not None:
    r.set_body(json.dumps(data))
  credentials = Credentials(ak, sk, 'air', 'cn-north-1')
  SignerV4.sign(r, credentials)
  return r

def internal_request(method, api, payload, params=None):
  req = prepare_request(
                        method = method,
                        path = api,
                        ak = AK,
                        sk = SK,
                        data = payload)

  r = requests.request(method=req.method,
          url="{}://{}{}".format(req.schema, req.host, req.path),
          headers=req.headers,
          data=req.body,
          params=params,
      )
  return r

messages = [
  {"role": "assistant", "content": "Let’s practice comparatives. How do you say: “我的房间比你的大”？"},
  {"role": "user", "content": "My room more big than yours."},
  {"role": "assistant", "content": "Almost! Correct is: “My room is bigger than yours.”"},

  {"role": "assistant", "content": "How would you say: “她比我聪明”？"},
  {"role": "user", "content": "She is more smart than me."},
  {"role": "assistant", "content": "Good try! Correct is: “She is smarter than me.”"},

  {"role": "assistant", "content": "Translate: “今天比昨天冷”。"},
  {"role": "user", "content": "Today is colder than yesterday."},
  {"role": "assistant", "content": "Perfect! That’s correct."},

  {"role": "assistant", "content": "How do you say: “这是我最喜欢的电影”？"},
  {"role": "user", "content": "This is my favorite movie."},
  {"role": "assistant", "content": "Excellent! That’s the superlative form."},

  {"role": "assistant", "content": "Translate: “这是我们班里最高的男孩”。"},
  {"role": "user", "content": "He is the most tall boy in our class."},
  {"role": "assistant", "content": "Almost! Better is: “He is the tallest boy in our class.”"},

  {"role": "assistant", "content": "Make your own sentence with a superlative."},
  {"role": "user", "content": "Mount Everest is the highest mountain in the world."},
  {"role": "assistant", "content": "Perfect! Very natural usage."},

  {"role": "assistant", "content": "Now let’s practice articles. How do you say: “我有一只狗”？"},
  {"role": "user", "content": "I have dog."},
  {"role": "assistant", "content": "Almost! It should be: “I have a dog.”"},

  {"role": "assistant", "content": "Translate: “月亮在天上”。"},
  {"role": "user", "content": "The moon is in the sky."},
  {"role": "assistant", "content": "Perfect! We use 'the' for unique things."},

  {"role": "assistant", "content": "How would you say: “我需要一杯水”？"},
  {"role": "user", "content": "I need water."},
  {"role": "assistant", "content": "Correct! No article with uncountable nouns like 'water'."},

  {"role": "assistant", "content": "How do you say: “她是老师”？"},
  {"role": "user", "content": "She is teacher."},
  {"role": "assistant", "content": "Almost! Correct is: “She is a teacher.”"},

  {"role": "assistant", "content": "Translate: “太阳正在照耀”。"},
  {"role": "user", "content": "Sun is shining."},
  {"role": "assistant", "content": "Close! Correct is: “The sun is shining.”"},

  {"role": "assistant", "content": "Make your own sentence with 'an'."},
  {"role": "user", "content": "I ate an apple."},
  {"role": "assistant", "content": "Perfect! That’s exactly right."},

  {"role": "assistant", "content": "Let’s try countable vs uncountable nouns. Translate: “我买了一些米”。"},
  {"role": "user", "content": "I bought some rices."},
  {"role": "assistant", "content": "Not quite. 'Rice' is uncountable: “I bought some rice.”"},

  {"role": "assistant", "content": "How do you say: “我需要一些信息”？"},
  {"role": "user", "content": "I need some informations."},
  {"role": "assistant", "content": "Almost! Correct is: “I need some information.”"},

  {"role": "assistant", "content": "Translate: “桌子上有三本书”。"},
  {"role": "user", "content": "There are three books on the table."},
  {"role": "assistant", "content": "Perfect! 'Books' are countable."},

  {"role": "assistant", "content": "How do you say: “我喝了两杯水”？"},
  {"role": "user", "content": "I drank two water."},
  {"role": "assistant", "content": "Close! Correct is: “I drank two glasses of water.”"},

  {"role": "assistant", "content": "Make your own sentence with an uncountable noun."},
  {"role": "user", "content": "I have some money."},
  {"role": "assistant", "content": "Excellent! That’s correct."},

  {"role": "assistant", "content": "Now let’s practice passive voice. Translate: “这本书是他写的”。"},
  {"role": "user", "content": "This book he wrote."},
  {"role": "assistant", "content": "Almost! In passive voice: “This book was written by him.”"},

  {"role": "assistant", "content": "How would you say: “英语在全世界被使用”？"},
  {"role": "user", "content": "English is used all over the world."},
  {"role": "assistant", "content": "Perfect! That’s correct."},

  {"role": "assistant", "content": "Translate: “这些苹果已经被吃掉了”。"},
  {"role": "user", "content": "These apples are eaten."},
  {"role": "assistant", "content": "Close! Correct is: “These apples have been eaten.”"},

  {"role": "assistant", "content": "How do you say: “这座桥是去年建的”？"},
  {"role": "user", "content": "This bridge built last year."},
  {"role": "assistant", "content": "Almost! Correct is: “This bridge was built last year.”"},

  {"role": "assistant", "content": "Make your own passive sentence."},
  {"role": "user", "content": "The homework was done by me."},
  {"role": "assistant", "content": "Correct! That’s a good example."},

  {"role": "assistant", "content": "Finally, let’s try conditionals. How would you say: “如果下雨，我就带伞”。"},
  {"role": "user", "content": "If it rains, I will take an umbrella."},
  {"role": "assistant", "content": "Perfect! That’s a first conditional."},

  {"role": "assistant", "content": "Translate: “如果我是你，我就会去”。"},
  {"role": "user", "content": "If I was you, I would go."},
  {"role": "assistant", "content": "Almost! Better is: “If I were you, I would go.”"},

  {"role": "assistant", "content": "How do you say: “如果他努力，他就会成功”。"},
  {"role": "user", "content": "If he works hard, he will succeed."},
  {"role": "assistant", "content": "Excellent! That’s correct."},

  {"role": "assistant", "content": "Translate: “如果她昨天来了，我们会很开心”。"},
  {"role": "user", "content": "If she came yesterday, we will happy."},
  {"role": "assistant", "content": "Close! Correct is: “If she had come yesterday, we would have been happy.”"},

  {"role": "assistant", "content": "Make your own conditional sentence."},
  {"role": "user", "content": "If I study hard, I will pass the exam."},
  {"role": "assistant", "content": "Perfect! Very natural usage."}
]

path = "/api/memory/messages/add"
playload = {
  "collection_name": "education_memory_base",
  "session_id": "message0",
  "messages": messages,
  "metadata": {
    "default_user_id": "user1",
    "default_assistant_id": "assistant1",
    "time": 1678886400000,
    "group_id": "english_class_101"
  },
  "profiles": [
    {
      "profile_type": "english_knowledge_point",
      "profile_scope": [
        {
          "id": 301,
          "knowledge_point_name": "比较级与最高级",
          "good_evaluation_criteria": "能够正确使用比较级和最高级表达进行比较和描述"
        },
        {
          "id": 302,
          "knowledge_point_name": "冠词用法",
          "good_evaluation_criteria": "能够正确使用a、an和the表达泛指、特指以及唯一事物"
        },
        {
          "id": 303,
          "knowledge_point_name": "可数与不可数名词",
          "good_evaluation_criteria": "能够区分可数和不可数名词并正确搭配量词和数词"
        },
        {
          "id": 304,
          "knowledge_point_name": "被动语态",
          "good_evaluation_criteria": "能够正确使用被动语态描述动作的承受者和动作的执行情况"
        },
        {
          "id": 305,
          "knowledge_point_name": "条件句",
          "good_evaluation_criteria": "能够根据语境正确使用不同类型的条件句（零条件句、第一、第二和第三条件句）"
        }
      ]
    }
  ]
}
rsp = internal_request('POST', path, playload)
print(rsp.json())

结果查看

在记忆库控制台中，可以直接查看经过算子计算后的结果，包括用户在某知识点上的平均得分和最高得分。同时，也可以在事件记录中查看每一次问答的具体得分，便于溯源和分析。

附：算子的字段限制

算子	int64事件字段	float32事件字段	string事件字段	bool事件字段	list事件字段	list事件字段
SUM	✅	✅（画像字段类型=int64时，有精度损失）	❌	❌	❌	❌
COUNT	✅	✅	✅	✅	✅	✅
MAX	✅	✅（画像字段类型=int64时，有精度损失）	❌	❌	❌	❌
AVG	✅（画像字段类型=int64时，有精度损失）	✅（画像字段类型=int64时，有精度损失）	❌	❌	❌	❌
LLM_MERGE	✅	✅	✅	✅	✅	✅

最近更新时间：2025.09.30 15:22:30

这个页面对您有帮助吗？

有用

无用

向量数据库VikingDB

为什么需要算子？ #

算子能力概览 #

算子能力最佳实践 #

创建记忆库 #

添加会话 #

结果查看 #