Quickly deploy the quantized version of DeepSeek-R1 based on TensorRT-LLM

Last updated: 2026.03.16 14:55:26
