Why Do Actor Systems Default to At-Most-Once Message Delivery?

阿华AIGC实验室 (Ahua AIGC Lab)

2026-4-28

Why Actor Models Default to At-Most-Once Delivery (Even for Event-Sourced Systems)

Great question. This is a common point of confusion when moving from intuitive "reliable first" thinking to the tradeoffs baked into Actor model design, especially in Akka. Let’s break it down, using your document word-count example as a reference.

Core Reasons for the At-Most-Once Default

1. Performance & Minimalism Are Foundational

At-most-once is the lightest possible message delivery mechanism: no message persistence, no tracking of delivery states, no retry logic, no coordination between sender and receiver beyond a basic send. For frameworks like Akka, built to scale to massive distributed systems with high throughput and low latency, keeping the core Actor runtime as lean as possible is non-negotiable.

Making at-least-once the default would force every Actor to carry overhead that many use cases do not need:

Storing messages until confirmation is received
Managing retry queues and backoff policies
Handling duplicate messages (requiring idempotency logic)

These costs make no sense for scenarios like real-time metrics collection (a lost heartbeat doesn’t break the system) or transient notifications (a dropped one is simply superseded by the next). The framework defaults to the lowest-overhead foundation, letting you layer in reliability when your workflow demands it.
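To make the "basic send" concrete, here is a minimal Python sketch of fire-and-forget delivery. It is a toy model, not Akka code: `Mailbox`, `tell`, and the `drop_every` knob are hypothetical names, and message loss is simulated deterministically so the result is reproducible.

```python
from collections import deque


class Mailbox:
    """Toy in-memory inbox standing in for an actor's mailbox (hypothetical, not Akka)."""

    def __init__(self, drop_every=0):
        self.queue = deque()
        self.drop_every = drop_every  # drop every Nth message to simulate loss; 0 = never
        self._sent = 0

    def tell(self, message):
        # Fire-and-forget: no ack, no retry, no sender-side bookkeeping.
        self._sent += 1
        if self.drop_every and self._sent % self.drop_every == 0:
            return  # the message is silently lost
        self.queue.append(message)


heartbeats = Mailbox(drop_every=3)
for tick in range(1, 7):
    heartbeats.tell({"type": "heartbeat", "tick": tick})

received_ticks = [m["tick"] for m in heartbeats.queue]
print(received_ticks)
```

In this run ticks 3 and 6 never arrive, and nothing on the sending side notices. For heartbeats that is acceptable, and the sender holds no state at all, which is what keeps this path cheap.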

2. The Actor Model’s Philosophy of "Choice Over Mandates"

Actor models are designed to give developers granular control over system behavior, not to enforce one-size-fits-all reliability. The runtime handles the core Actor lifecycle, message routing, and concurrency; you decide how much reliability your specific workflow needs.

The Akka manual’s line about "occasional message loss not causing inconsistency" describes tolerance at the level of a whole scenario. It does not claim that any particular message is unimportant. For example:

In a real-time chat system, losing a single "typing" indicator is irrelevant
In a sensor data pipeline, one missing reading might be averaged out by subsequent data

Your word-count system is a case where loss does matter, but that’s a business-specific requirement, not a universal one. The framework doesn’t assume every message is critical; it lets you build the safeguards you need on top.

3. At-Least-Once Isn’t "Free" (Even With Idempotency)

You’re right that idempotency (via message versioning or IDs) solves the duplicate problem, but that’s still extra work, and it doesn’t eliminate all of the overhead. Mapping this to your example:

Actor M would need to track which line-count messages it’s received (storing IDs or sequence numbers)
Sender Actors (C) would need to persist their count messages until they get an acknowledgment from M, or retry if no ack arrives
You’d have to handle edge cases like retries arriving after M has already completed the total count, or network partitions delaying acks indefinitely

These are solvable problems, but they’re not trivial, and they shouldn’t be forced on every Actor in the system. Akka provides tools like Akka Reliable Delivery and Akka Persistence to handle exactly these scenarios, but they’re optional extensions, not part of the core runtime.
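The bookkeeping described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Akka Reliable Delivery is implemented: `CounterM`, `SenderC`, and the `ack_lost_on_attempts` parameter are hypothetical, delivery is a direct method call, and a "lost ack" is simulated by ignoring the return value on chosen attempts.

```python
class CounterM:
    """Receiver: remembers message IDs so that a retry is not counted twice."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def receive(self, msg_id, count):
        if msg_id not in self.seen_ids:
            self.seen_ids.add(msg_id)
            self.total += count
        return msg_id  # acknowledge, even for duplicates


class SenderC:
    """Sender: keeps every message until its ack arrives, retrying otherwise."""

    def __init__(self, target, ack_lost_on_attempts=()):
        self.target = target
        self.pending = {}  # msg_id -> count, held until acknowledged
        self.attempts = 0
        self.ack_lost_on_attempts = set(ack_lost_on_attempts)  # simulated ack loss

    def send(self, msg_id, count):
        self.pending[msg_id] = count

    def flush(self, max_rounds=5):
        for _ in range(max_rounds):
            if not self.pending:
                return True
            for msg_id, count in list(self.pending.items()):
                self.attempts += 1
                ack = self.target.receive(msg_id, count)
                if self.attempts in self.ack_lost_on_attempts:
                    continue  # ack never arrived: message stays pending
                self.pending.pop(ack, None)
        return not self.pending


m = CounterM()
c = SenderC(m, ack_lost_on_attempts={1})  # the first ack is lost
c.send(("doc-1", 1), 7)
c.send(("doc-1", 2), 5)
all_acked = c.flush()
print(m.total, c.attempts, all_acked)
```

The first message is delivered twice because its ack was lost, yet the total stays correct because M ignores the repeated ID. Even this toy version needs sender-side storage, a retry loop with a bound, and receiver-side ID tracking, which is the overhead the default avoids.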

Addressing Your Word Count Example

In your scenario, at-most-once delivery does create a risk of silent failure. But the answer isn’t to reject the default; it’s to use Akka’s built-in tools to add the reliability you need:

Use Akka Persistence for Actor M’s state: persist each line-count event, so if M restarts, it can rebuild the total count from the event log.
Use Akka Reliable Delivery between C and M: this component handles at-least-once delivery with automatic retries and duplicate detection, so you don’t have to build tracking logic yourself.
For idempotency, assign each line-count message a unique ID (like document ID + line number) and have M ignore duplicates when processing.

This way, you’re building a reliable event-sourced system on top of the lightweight Actor runtime, rather than fighting against the default.
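As a rough illustration of the persistence and idempotency points, the sketch below models Actor M as an event-sourced counter in plain Python. It makes several assumptions: the `journal` is an in-memory list standing in for a durable event log, `EventSourcedCounter` and `handle` are hypothetical names, and nothing here uses the Akka Persistence API.

```python
class EventSourcedCounter:
    """Toy event-sourced version of Actor M (hypothetical, not Akka Persistence)."""

    def __init__(self, journal):
        self.journal = journal  # append-only list standing in for a durable log
        self.seen = set()       # (doc_id, line_no) pairs already counted
        self.total = 0
        for event in journal:   # recovery: rebuild state by replaying stored events
            self._apply(event)

    def _apply(self, event):
        doc_id, line_no, count = event
        self.seen.add((doc_id, line_no))
        self.total += count

    def handle(self, doc_id, line_no, count):
        if (doc_id, line_no) in self.seen:
            return False  # duplicate delivery: ignore it
        event = (doc_id, line_no, count)
        self.journal.append(event)  # persist first, then update in-memory state
        self._apply(event)
        return True


journal = []
m = EventSourcedCounter(journal)
m.handle("doc-1", 1, 7)
m.handle("doc-1", 2, 5)
duplicate_accepted = m.handle("doc-1", 1, 7)  # a retried message

restarted = EventSourcedCounter(journal)  # simulate M restarting
print(restarted.total, duplicate_accepted)
```

Because the deduplication set is rebuilt from the same events as the total, a retry that arrives after a restart is still recognized as a duplicate.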

Final Takeaway

The at-most-once default isn’t an oversight. It’s a deliberate tradeoff between performance, flexibility, and simplicity. Actor frameworks prioritize giving you a fast, scalable foundation, then let you layer in reliability guarantees when your business logic demands it. For event-sourced systems specifically, Akka’s persistence and reliable delivery tools are built on top of the core Actor model to solve exactly the problems you’re facing.

The question comes from Stack Exchange; it was asked by Nimrod Sadeh.