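This article describes how to use the object detection agent preset on the Edge Large Model Gateway platform. The agent can recognize a wide range of target objects across different categories, shapes, sizes, and colors.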
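To use the object detection agent:
1. Create a gateway access key and bind the object detection agent to it. For details, see Calling platform-preset agents.
2. Get the API key of the gateway access key. For details, see Viewing keys (API Key).
3. Call the object detection agent API to run detection tasks. For API usage instructions, see API usage.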
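The object detection agent's API largely follows the OpenAI standard Chat interface, with only minor differences, so you can call it by following the OpenAI documentation. For the specific differences, see Differences from OpenAI.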
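The following example runs detection on a single image. The prompt 请检测出其中的苹果 asks the agent to detect the apples in the image, and b64_img_url is a placeholder for the image, which the Python example later in this article passes as a base64 data URL (data:image/jpeg;base64,...):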
```bash
curl "https://ai-gateway.vei.volces.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -d '{
    "model": "AG-object-detection-agent",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "请检测出其中的苹果"
          },
          {
            "type": "image_url",
            "image_url": {"url": "b64_img_url"}
          }
        ]
      }
    ],
    "stream": false
  }'
```
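The curl request above leaves the image as the placeholder b64_img_url. As a minimal sketch, assuming a JPEG image and using only the Python standard library, the hypothetical helper build_detection_payload below builds the same request body with the image embedded as a base64 data URL, the form used by the Python SDK example in this article. The mime_type parameter is likewise an illustrative assumption.

```python
import base64
import json


def build_detection_payload(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Build the JSON request body for the object detection agent (hypothetical helper).

    The image is embedded inline as a data URL: data:<mime_type>;base64,<data>.
    mime_type is an assumption; set it to match the actual image format.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    body = {
        "model": "AG-object-detection-agent",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{b64}"},
                    },
                ],
            }
        ],
        "stream": False,
    }
    # ensure_ascii=False keeps Chinese prompts human-readable in the output
    return json.dumps(body, ensure_ascii=False)
```

For instance, you could write the result of build_detection_payload("请检测出其中的苹果", image_bytes) to a file such as body.json (a hypothetical name) and send it with curl's `-d @body.json` instead of the inline body.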
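The target description in a request sent to the object detection agent can take several forms. The examples below cover three of them: an attribute-qualified target (people wearing red safety helmets), a counting question about attribute-qualified targets (how many red apples there are), and a plain object name (apple).
Example: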
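```python
# pip install openai
# https://platform.openai.com/docs/api-reference
import base64

import httpx  # installed as a dependency of the openai package
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-gateway.vei.volces.com/v1",
    api_key="YOUR_API_KEY",
)


def img_to_base64(img_path):
    """Read an image from a URL or a local path and return it base64-encoded."""
    if img_path.startswith(("http://", "https://")):
        # Remote image: download it first
        resp = httpx.get(img_path)
        resp.raise_for_status()
        data = resp.content
    else:
        # Local image: read the raw bytes
        with open(img_path, "rb") as f:
            data = f.read()
    return base64.b64encode(data).decode("utf-8")


# Example 1: attribute-qualified object detection with a Chinese description
# ("detect the people wearing red safety helmets")
text = "检测出其中戴红色安全帽的人"
image_fn = "./test_data/dod_helmets.jpg"

# Example 2: counting attribute-qualified objects with a Chinese description
# ("how many red apples are there")
# text = "有几个红苹果"
# image_fn = "./test_data/dod_apples.jpg"

# Example 3: plain object detection with a Chinese description ("apple")
# text = "苹果"
# image_fn = "./test_data/dod_apples.jpg"

b64 = img_to_base64(image_fn)

completion = client.chat.completions.create(
    model="AG-object-detection-agent",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {
                    "type": "image_url",
                    # The image is sent inline as a base64 data URL
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        },
    ],
    max_tokens=300,
)
print(completion)
```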
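The object detection agent returns detailed detection results. They are stored in jdata["choices"][0]["message"]["content"], where jdata is the parsed JSON response, and contain three fields: "scores", "labels", and "boxes".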
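The three fields are parallel lists. A minimal sketch of turning them into one record per detection, assuming that scores[i], labels[i], and boxes[i] describe the same object and that each box is [x1, y1, x2, y2] in pixels (the box format is an assumption, not stated in this article). parse_detections and its min_score threshold are hypothetical names.

```python
import json
from typing import Any, Dict, List, Union


def parse_detections(content: Union[str, Dict[str, Any]], min_score: float = 0.0) -> List[Dict[str, Any]]:
    """Zip the parallel scores/labels/boxes lists into one dict per detection.

    Assumption: index i of each list refers to the same detected object,
    and boxes are [x1, y1, x2, y2] pixel coordinates.
    """
    # Depending on the client, content may arrive as a dict or as a JSON string
    if isinstance(content, str):
        content = json.loads(content)
    scores, labels, boxes = content["scores"], content["labels"], content["boxes"]
    if not (len(scores) == len(labels) == len(boxes)):
        raise ValueError("scores, labels and boxes must have the same length")
    detections = [
        {"score": score, "label": label, "box": box}
        for score, label, box in zip(scores, labels, boxes)
        if score >= min_score  # drop low-confidence detections
    ]
    # Highest-confidence detections first
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections
```

Applied to the content of the complete response example below with min_score=0.4, this would keep the two detections scored about 0.437 and 0.429, both with label 1.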
Example of a complete response:
```json
{
  "id": "AG-object-detection-agent-1740727485120",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": {
          "scores": [0.43657687306404114,0.42889276146888733,0.3822806775569916,0.3763623833656311,0.3696218729019165,0.3690265417098999],
          "labels": [1,1,1,0,1,1],
          "boxes": [[878.8975811004639,258.3551917076111,1130.8835220336914,891.4355516433716],
                    [826.282527923584,279.6627961397171,962.5131340026855,699.0053584575653],
                    [486.19521975517273,273.1498453617096,613.9511375427246,782.4513545036316],
                    [77.88547253608704,268.2019966840744,293.77442049980164,877.022209405899],
                    [225.4987235069275,298.51445257663727,285.8189606666565,656.4610722064972],
                    [235.5941677093506,268.90106761455536,555.3854942321777,887.1528959274292]]
        },
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1740727485120,
  "model": "AG-object-detection-agent",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "",
  "usage": {
    "completion_tokens": 57,
    "prompt_tokens": 1503,
    "total_tokens": 1560,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
```
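For counting prompts such as 有几个红苹果 ("how many red apples are there"), one possible approach is to count the returned detections per label. The sketch below assumes the raw response body is available as a string (for example, the output of the curl request) and follows the access path jdata["choices"][0]["message"]["content"]. count_labels and its min_score threshold are hypothetical, and what each label value means is not documented in this article.

```python
import json
from collections import Counter
from typing import Dict


def count_labels(response_text: str, min_score: float = 0.0) -> Dict[int, int]:
    """Count detections per label in a raw chat.completion response body.

    min_score is a hypothetical confidence threshold; no recommended
    value is given in this article.
    """
    jdata = json.loads(response_text)
    content = jdata["choices"][0]["message"]["content"]
    # Tolerate the content being serialized as a JSON string
    if isinstance(content, str):
        content = json.loads(content)
    counts = Counter(
        label
        for label, score in zip(content["labels"], content["scores"])
        if score >= min_score
    )
    return dict(counts)
```

For the complete response example above, count_labels returns {1: 5, 0: 1} with the default threshold.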