Async Design
Cerulion Core uses a dual-path architecture where local publishing is synchronous and fast, while network publishing happens asynchronously on a background thread. This design keeps local latency low and keeps local communication available even when the network is slow or down.
What is Async Design?
Think of Cerulion Core’s async design like a restaurant with two service counters: a fast walk-up counter for customers already in the building, and a delivery service that handles orders for customers across town. The walk-up counter serves customers immediately (like local communication), while the delivery service works in the background without slowing down the fast counter (like network communication).
When you publish a message in Cerulion Core, it takes two paths simultaneously:
- Local path: Your message is delivered instantly to subscribers on the same machine using shared memory - like handing a package directly to someone in the same room. This happens in less than 1 microsecond.
- Network path: Your message is also queued for delivery to subscribers on other machines over the network - like putting a package in a delivery truck that drives away. This happens in the background without blocking your code.
Async means operations happen independently without blocking each other. In Cerulion Core, network publishing happens asynchronously on a background thread, so it never slows down your local publishing operations.
Why Use This Design?
Ultra-Low Latency
Local subscribers receive messages in less than 1 microsecond. The fast path is never blocked by slow network operations.
Non-Blocking Operations
Your code never waits for network I/O. Publishing returns immediately, and network sends happen in the background.
High Availability
Network failures don’t affect local communication. Your system keeps working even when the network is down or slow.
Simple API
No async/await complexity. Just call publisher.send(data) and it works - the system handles threading automatically.
Latest-Message Semantics
Network subscribers always get the most current data. If messages arrive faster than they can be sent, only the latest is kept.
Automatic Thread Management
Background threads are created and cleaned up automatically. You don’t need to manage thread lifecycles yourself.
How It Works
Cerulion Core’s Publisher implements a dual-path architecture that optimizes for both local and network communication.
The Two Paths
Local Path (Fast)
- Uses zero-copy shared memory (iceoryx2)
- Synchronous operation - returns immediately
- Latency: < 1 μs for small messages
- Perfect for same-machine communication
Network Path (Async)
- Uses Zenoh for network transport
- Asynchronous operation - happens on a background thread
- Latency: 1-10 ms (network dependent)
- Never blocks the local path
Zero-copy means data is shared directly in memory without making copies. This is why local communication is so fast - there’s no serialization or copying overhead.
Architecture Diagram
Here’s how the dual-path architecture works when you call publisher.send(data).
Step-by-Step Flow
When you call publisher.send(data), here’s exactly what happens:
1
Local Zero-Copy Send (Fast Path)
Data is written directly to shared memory via iceoryx2. This is synchronous and returns immediately.
Latency: < 1 μs
Blocking: Minimal (only memory allocation)
2
Queue for Network (Non-Blocking)
Message is queued to a background thread using a Mutex with latest-message semantics. The queue only holds one message (the latest).
Operation: Non-blocking
Queue Size: 1 (only latest message kept)
Time: ~30 ns (negligible)
Mutex is a synchronization primitive that ensures only one thread accesses shared data at a time. In this case, it protects the message queue.
3
Background Thread Serialization
A background thread takes the message from the queue and serializes it to bytes. This happens outside the lock, so it doesn’t block anything.
Operation: Happens outside the lock
Time: 1-10 μs (depends on message size)
4
Network Send
Serialized bytes are sent over the Zenoh network to remote subscribers.
Operation: Asynchronous
Time: 1-10 ms (network dependent)
Network latency varies based on network conditions. The important thing is this never blocks your local publishing.
The key insight is that serialization and network I/O happen outside the lock. This means the fast path (local send) is never blocked by slow network operations.
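To make this concrete, here is a minimal sketch of the pattern in plain standard-library Rust. The type names, message struct, and placeholder functions are illustrative assumptions, not Cerulion Core’s actual internals; the point is only to show where the lock is (and is not) held.

```rust
use std::sync::{Arc, Mutex};

// One-slot "queue": `Some(msg)` holds the latest message not yet sent.
type Slot<T> = Arc<Mutex<Option<T>>>;

struct Sample {
    value: f64,
}

// Fast-path side: overwrite the slot; the lock is held only for the swap.
fn queue_for_network(slot: &Slot<Sample>, msg: Sample) {
    *slot.lock().unwrap() = Some(msg);
}

// One iteration of the background thread.
fn network_iteration(slot: &Slot<Sample>) {
    // The guard is a temporary, so the lock is released at the end of
    // this statement -- before any slow work starts.
    let msg = slot.lock().unwrap().take();

    if let Some(msg) = msg {
        let bytes = serialize(&msg); // 1-10 us, no lock held
        send_over_network(&bytes);   // 1-10 ms, no lock held
    }
}

// Placeholders standing in for real serialization and Zenoh I/O.
fn serialize(msg: &Sample) -> Vec<u8> {
    msg.value.to_le_bytes().to_vec()
}
fn send_over_network(_bytes: &[u8]) {}

fn main() {
    let slot: Slot<Sample> = Arc::new(Mutex::new(None));
    queue_for_network(&slot, Sample { value: 1.0 }); // fast path (~30 ns)
    network_iteration(&slot);                        // background thread
}
```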
Latest-Message Semantics
What Are Latest-Message Semantics?
Latest-message semantics means that if multiple messages arrive faster than they can be sent over the network, only the most recent message is kept. Think of it like a news ticker that always shows the latest headline - older headlines are replaced by newer ones. This design choice ensures that network subscribers always receive the most current data, even if the network is slow or messages are arriving very quickly.
Why This Design?
The network path uses a queue with capacity of 1, implemented using a Mutex:
- ✅ True latest-message: Only the most recent message is queued
- ✅ Minimal lock contention: Lock is held for only ~30 ns
- ✅ No blocking: Fast path never waits for network
- ✅ Automatic cleanup: Old messages are automatically replaced
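To see these semantics in isolation, here is a tiny self-contained sketch using the same one-slot Mutex idea (plain std Rust, illustrative only, not Cerulion Core source):

```rust
use std::sync::Mutex;

fn main() {
    // Capacity-1 "queue": replacing the contents drops any older message.
    let slot: Mutex<Option<u64>> = Mutex::new(None);

    // Two messages arrive before the background thread gets to run...
    *slot.lock().unwrap() = Some(1);
    *slot.lock().unwrap() = Some(2); // message 1 is silently replaced

    // ...so the network side only ever sees the latest one.
    assert_eq!(slot.lock().unwrap().take(), Some(2));

    // The slot is now empty until the next publish.
    assert_eq!(slot.lock().unwrap().take(), None);
}
```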
Alternative Designs Considered
We evaluated several alternatives before choosing this design, including a synchronous network send (rejected). The chosen design (Mutex&lt;Option&lt;T&gt;&gt; with a background thread) provides the best balance of simplicity, performance, and true latest-message semantics.
Performance Characteristics
Fast Path (Local Send)
The local path uses zero-copy shared memory for maximum performance:
The Rust implementation uses iceoryx2 for zero-copy local transport. Python and C++ implementations will use Zenoh for network transport until local zero-copy support is added.
- Shared memory allocation: ~100-500 ns
- Memory write: ~10-50 ns
- Total: < 1 μs
Async Path (Network Send)
The network path adds minimal overhead to the fast path:
Sending side (fast path):
- Mutex lock: ~10 ns
- Memory write: ~10 ns
- Mutex unlock: ~10 ns
- Total added to fast path: ~30 ns
Background thread:
- Take message from mutex: ~30 ns (with lock)
- Serialization: ~1-10 μs (NO LOCK HELD)
- Network send: ~1-10 ms (NO LOCK HELD)
- Sleep if no message: 100 μs (prevents busy-wait)
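If you want to sanity-check the "~30 ns added to the fast path" figure on your own hardware, a rough micro-benchmark of the lock-and-replace step looks like this. It is an illustrative sketch, not Cerulion Core’s benchmark suite; results vary with CPU, contention, and message size.

```rust
use std::hint::black_box;
use std::sync::Mutex;
use std::time::Instant;

fn main() {
    // Stand-in for the one-slot network queue.
    let slot: Mutex<Option<[u8; 64]>> = Mutex::new(None);
    let msg = [0u8; 64];
    let iterations = 1_000_000u32;

    let start = Instant::now();
    for _ in 0..iterations {
        // The only work the network path adds to the fast path:
        // lock, replace the slot contents, unlock.
        *slot.lock().unwrap() = Some(black_box(msg));
    }
    let elapsed = start.elapsed();

    println!(
        "average queue cost: {:.1} ns",
        elapsed.as_nanos() as f64 / f64::from(iterations)
    );
}
```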
Performance Summary
| Operation | Latency | Blocking |
|---|---|---|
| Local send | < 1 μs | Minimal (memory alloc) |
| Network send (queue) | ~30 ns | Non-blocking |
| Network send (serialization) | 1-10 μs | Non-blocking (background thread) |
| Network send (zenoh) | 1-10 ms | Non-blocking (background thread) |
| Local receive | < 1 μs | Non-blocking |
| Network receive | 1-10 ms | Non-blocking |
The network path adds only ~30 ns to the fast path. All slow operations (serialization, network I/O) happen on a background thread and don’t block local publishing.
Thread Lifecycle
Thread Creation
When you create a Publisher, a background thread is automatically spawned to handle network publishing.
Thread Shutdown
When the Publisher is dropped, the background thread is cleanly shut down.
Clean shutdown ensures no messages are lost on exit. The background thread processes any remaining messages before terminating.
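The actual implementation isn’t reproduced here, but a sketch of how this lifecycle is commonly wired up in Rust looks roughly like the following. All names, the AtomicBool shutdown flag, and the placeholder network call are illustrative assumptions rather than Cerulion Core’s code; what matters is the shape: spawn in the constructor, signal and join in Drop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

struct Publisher {
    slot: Arc<Mutex<Option<Vec<u8>>>>,
    shutdown: Arc<AtomicBool>,
    worker: Option<JoinHandle<()>>,
}

impl Publisher {
    fn new() -> Self {
        let slot = Arc::new(Mutex::new(None));
        let shutdown = Arc::new(AtomicBool::new(false));

        // Thread creation: one background worker per Publisher.
        let worker = {
            let slot = Arc::clone(&slot);
            let shutdown = Arc::clone(&shutdown);
            thread::spawn(move || {
                while !shutdown.load(Ordering::Relaxed) {
                    // Take the latest message; the lock is released
                    // before any slow work starts.
                    let msg = slot.lock().unwrap().take();
                    match msg {
                        Some(bytes) => network_send(&bytes),
                        // Sleep briefly to avoid busy-waiting.
                        None => thread::sleep(Duration::from_micros(100)),
                    }
                }
                // Drain any remaining message before terminating.
                let leftover = slot.lock().unwrap().take();
                if let Some(bytes) = leftover {
                    network_send(&bytes);
                }
            })
        };

        Publisher { slot, shutdown, worker: Some(worker) }
    }

    fn send(&self, bytes: Vec<u8>) {
        // Fast path: replace the slot contents and return immediately.
        *self.slot.lock().unwrap() = Some(bytes);
    }
}

impl Drop for Publisher {
    // Thread shutdown: signal the worker, then wait for it to finish.
    fn drop(&mut self) {
        self.shutdown.store(true, Ordering::Relaxed);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

// Placeholder standing in for serialization plus the Zenoh publish.
fn network_send(_bytes: &[u8]) {}

fn main() {
    let publisher = Publisher::new();
    publisher.send(b"hello".to_vec());
    // Dropping the Publisher joins the background thread cleanly.
    drop(publisher);
}
```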
Error Handling
Errors in the network path are logged but do not propagate to the caller. This ensures that network failures never affect local communication.
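A minimal sketch of what this looks like inside the background loop follows; the function names, the eprintln-based logging, and the Zenoh placeholder are illustrative assumptions, not the actual implementation.

```rust
use std::io;

// Called from the background network thread for each queued message.
// Failures are logged and swallowed; nothing propagates back to the
// thread that called publisher.send().
fn try_network_send(bytes: &[u8]) {
    if let Err(err) = zenoh_publish(bytes) {
        eprintln!("network publish failed (local path unaffected): {err}");
    }
}

// Placeholder standing in for the real Zenoh publish call.
fn zenoh_publish(_bytes: &[u8]) -> io::Result<()> {
    Ok(())
}

fn main() {
    try_network_send(b"hello");
}
```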
Trade-offs
Advantages
- ✅ Ultra-low latency for local subscribers
- ✅ Non-blocking send operation
- ✅ Resilient to network failures
- ✅ Simple API - no async/await complexity
- ✅ Automatic thread management
- ✅ Latest-message semantics - network subscribers always get current data
Considerations
- ⚠️ Eventual consistency: Network subscribers may see messages slightly delayed
- ⚠️ Message dropping: If network is very slow, some messages may be skipped (only latest kept)
- ⚠️ No backpressure: Publisher doesn’t slow down for slow network subscribers
Backpressure is when a slow consumer causes the producer to slow down. Cerulion Core doesn’t implement backpressure - it prioritizes local performance over network delivery guarantees.
When to Use This Design
Good fit:
- High-frequency sensor data (e.g., 1 kHz+)
- Time-critical local processing
- Monitoring/telemetry where occasional drops are acceptable
- Systems where local latency is paramount
Not a good fit:
- Critical commands that must reach network subscribers
- Low-frequency high-value messages
- Systems requiring delivery guarantees
- Transactions requiring acknowledgments
Implementation Details
Lock Contention
The mutex is held for only ~30 ns per operation, ensuring minimal contention:
Sending side:
- Acquire lock (~10 ns)
- Replace message (~10 ns)
- Release lock (~10 ns)
- Total: ~30 ns holding lock
Receiving side (background thread):
- Acquire lock (~10 ns)
- Take message with .take() (~10 ns)
- Release lock (~10 ns)
- Then serialize outside the lock (1-10 μs, doesn’t block anything)
- Then send over network (1-10 ms, doesn’t block anything)
Serialization and network I/O happen outside the lock. This means the fast path is never blocked by slow serialization or network latency.
Background Thread Behavior
The background thread follows this pattern:
- Checks for new messages periodically
- Takes message from mutex (if available)
- Serializes message (outside lock)
- Sends over network (outside lock)
- Sleeps briefly if no message (prevents busy-wait)
The thread sleeps for 100 μs when no message is available. This prevents busy-waiting while still providing low latency for new messages.