Why Does Flink Default to YARN as Its Resource Manager? Was Kubernetes Ever Considered?

阿华AIGC实验室

2026-5-20

Why Flink Defaults to YARN (and Its Kubernetes Journey)

Great question. The answer ties into both Flink's early ecosystem roots and the evolution of container orchestration tools. Let's unpack it:

Timing and Ecosystem Alignment
When Flink was gaining traction in the mid-2010s, YARN was already the de facto resource manager for big data workloads. Most enterprises running big data had existing Hadoop clusters and used YARN to manage everything from MapReduce to Spark jobs. By defaulting to YARN, Flink lowered the barrier to entry: users could run Flink on their existing infrastructure without standing up a new resource management layer. That compatibility was critical to winning adoption within the then-dominant Hadoop ecosystem.
Kubernetes Wasn't Ready for Big Data Back Then
Kubernetes was open-sourced in 2014 and reached version 1.0 in July 2015, around the same time Flink was maturing. But early K8s lacked the features needed for robust big data workloads:
- Weak support for stateful, long-running streaming jobs (Flink's bread and butter)
- Less granular resource scheduling compared to YARN, which was built for Hadoop's data-processing workloads
- Limited integration with big data storage systems like HDFS or HBase
Back then, K8s was primarily focused on microservices, not resource-heavy, long-running data pipelines.
Flink Has Fully Embraced Kubernetes Now
Don't let the default setting fool you: Kubernetes is now a first-class citizen among Flink's deployment options. The community started adding native K8s support around version 1.10, and today it's one of the most popular ways to run Flink, especially in cloud-native environments. You can:
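- Submit jobs directly to K8s in application mode with commands like flink run -t kubernetes-application (older 1.x releases use flink run-application -t kubernetes-application instead)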
- Use the official Flink Kubernetes Operator to manage cluster lifecycle, scaling, and recovery
- Leverage K8s' strengths like auto-scaling, self-healing, and integration with other cloud-native tools (monitoring, logging, etc.)
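
To make the first option above concrete, here is a minimal Python sketch that assembles such a submission command. It is an illustration under assumptions, not the official workflow: the helper name build_submit_command, the cluster ID, the image name, and the jar path are all made up, and the config keys kubernetes.cluster-id and kubernetes.container.image.ref should be checked against the documentation for your Flink version. The script only prints the command; it does not contact a cluster.

```python
import shlex


def build_submit_command(cluster_id, image, jar_uri, flink_bin="./bin/flink"):
    """Return the argv list for an application-mode submission to Kubernetes.

    Hypothetical helper for illustration. The config keys below are
    assumptions to verify against your Flink version's documentation.
    """
    return [
        flink_bin,
        "run",
        "-t", "kubernetes-application",  # deployment target named in the text
        f"-Dkubernetes.cluster-id={cluster_id}",         # assumed config key
        f"-Dkubernetes.container.image.ref={image}",     # assumed config key
        jar_uri,  # job jar, typically baked into the container image
    ]


if __name__ == "__main__":
    cmd = build_submit_command(
        cluster_id="my-flink-app",                # placeholder value
        image="my-registry/my-flink-job:latest",  # placeholder value
        jar_uri="local:///opt/flink/usrlib/my-flink-job.jar",  # placeholder path
    )
    # Dry run: print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

To actually submit, the same list could be handed to subprocess.run(cmd, check=True) on a machine with the Flink CLI installed and access to the target cluster.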

If you're working with Kubernetes today, you're in good shape: Flink's K8s support is mature, stable, and designed to take advantage of cloud-native benefits. There's no need to stick with YARN unless you have existing Hadoop infrastructure to integrate with.

The question originally appeared on Stack Exchange and was asked by Anu.