
Async Design

Cerulion Core uses a dual-path architecture where local publishing is synchronous and fast, while network publishing happens asynchronously on a background thread. This design ensures high availability and optimal performance.

What is Async Design?

Think of Cerulion Core’s async design like a restaurant with two service counters: a fast walk-up counter for customers already in the building, and a delivery service that handles orders for customers across town. The walk-up counter serves customers immediately (like local communication), while the delivery service works in the background without slowing down the fast counter (like network communication). When you publish a message in Cerulion Core, it takes two paths simultaneously:
  • Local path: Your message is delivered instantly to subscribers on the same machine using shared memory - like handing a package directly to someone in the same room. This happens in less than 1 microsecond.
  • Network path: Your message is also queued for delivery to subscribers on other machines over the network - like putting a package in a delivery truck that drives away. This happens in the background without blocking your code.
The key insight is that local communication never waits for network operations. Even if the network is slow or unavailable, your local subscribers still get messages instantly. This dual-path architecture ensures high availability and optimal performance.
Async means operations happen independently without blocking each other. In Cerulion Core, network publishing happens asynchronously on a background thread, so it never slows down your local publishing operations.
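To make "non-blocking" concrete, here is a tiny standard-library sketch (not Cerulion Core code): slow work is handed to a background thread and the caller continues immediately, just as publisher.send hands network delivery to a background thread.
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let start = Instant::now();

    // "Network path": slow work runs on a background thread.
    let background = thread::spawn(|| {
        thread::sleep(Duration::from_millis(5)); // stand-in for network I/O
        println!("background: network delivery finished");
    });

    // "Local path": the caller is not blocked by the background work.
    println!("caller continues after {:?}", start.elapsed()); // microseconds, not milliseconds

    background.join().unwrap();
}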

Why Use This Design?

Ultra-Low Latency

Local subscribers receive messages in less than 1 microsecond. The fast path is never blocked by slow network operations.

Non-Blocking Operations

Your code never waits for network I/O. Publishing returns immediately, and network sends happen in the background.

High Availability

Network failures don’t affect local communication. Your system keeps working even when the network is down or slow.

Simple API

No async/await complexity. Just call publisher.send(data) and it works - the system handles threading automatically.

Latest-Message Semantics

Network subscribers always get the most current data. If messages arrive faster than they can be sent, only the latest is kept.

Automatic Thread Management

Background threads are created and cleaned up automatically. You don’t need to manage thread lifecycles yourself.

How It Works

Cerulion Core’s Publisher implements a dual-path architecture that optimizes for both local and network communication:

The Two Paths

Local Path (Fast)
  • Uses zero-copy shared memory (iceoryx2)
  • Synchronous operation - returns immediately
  • Latency: < 1 μs for small messages
  • Perfect for same-machine communication
Network Path (Background)
  • Uses Zenoh for network transport
  • Asynchronous operation - happens on a background thread
  • Latency: 1-10 ms (network dependent)
  • Never blocks the local path
Zero-copy means data is shared directly in memory without making copies. This is why local communication is so fast - there’s no serialization or copying overhead.

Architecture Diagram

[Diagram: publisher.send(data) fans out into the local zero-copy path (iceoryx2 shared memory) and the background network path (Zenoh).]

Step-by-Step Flow

When you call publisher.send(data), here’s exactly what happens:
Step 1: Local Zero-Copy Send (Fast Path)

Data is written directly to shared memory via iceoryx2. This is synchronous and returns immediately.
Latency: < 1 μs
Blocking: Minimal (only memory allocation)
This step happens first; your code continues without waiting for the network.
Step 2: Queue for Network (Non-Blocking)

The message is queued for a background thread using a Mutex with latest-message semantics. The queue holds only one message (the latest).
Operation: Non-blocking
Queue Size: 1 (only latest message kept)
Time: ~30 ns (negligible)
A Mutex is a synchronization primitive that ensures only one thread accesses shared data at a time. Here, it protects the one-slot message queue.
Step 3: Background Thread Serialization

A background thread takes the message from the queue and serializes it to bytes. This happens outside the lock, so it doesn't block anything.
Operation: Happens outside the lock
Time: 1-10 μs (depends on message size)
Serialization converts your data structure into bytes that can be sent over the network. It happens in the background, not on your main thread (see the serialization sketch after these steps).
Step 4: Network Send

The serialized bytes are sent over the Zenoh network to remote subscribers.
Operation: Asynchronous
Time: 1-10 ms (network dependent)
Network latency varies with network conditions; the important thing is that this step never blocks your local publishing.
The key insight is that serialization and network I/O happen outside the lock. This means the fast path (local send) is never blocked by slow network operations.
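This page doesn't show how to_bytes is implemented, but to make "serialization" concrete, here is one common way to turn a message struct into bytes in Rust using serde and bincode. This is an illustrative assumption, not Cerulion Core's actual code, and it assumes serde (with the derive feature) and bincode 1.x as dependencies.
use serde::Serialize;

// Hypothetical message type, shown for illustration only.
#[derive(Serialize)]
struct Imu {
    accel: [f32; 3],
    gyro: [f32; 3],
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let sample = Imu { accel: [0.0, 0.0, 9.81], gyro: [0.0; 3] };

    // Turn the struct into a byte buffer that the network transport can send.
    let bytes: Vec<u8> = bincode::serialize(&sample)?;
    println!("serialized {} bytes", bytes.len());
    Ok(())
}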

Latest-Message Semantics

What Are Latest-Message Semantics?

Latest-message semantics means that if multiple messages arrive faster than they can be sent over the network, only the most recent message is kept. Think of it like a news ticker that always shows the latest headline - older headlines are replaced by newer ones. This design choice ensures that network subscribers always receive the most current data, even if the network is slow or messages are arriving very quickly.
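A tiny standard-library demo of this behavior: three messages are written before the "network" side gets to read, and only the last one survives.
use std::sync::{Arc, Mutex};

fn main() {
    // One-slot "queue": None means nothing is pending.
    let latest: Arc<Mutex<Option<u32>>> = Arc::new(Mutex::new(None));

    // Three messages arrive before the network side reads...
    for msg in [1, 2, 3] {
        *latest.lock().unwrap() = Some(msg); // each send overwrites the previous one
    }

    // ...so the reader only ever sees the most recent one.
    println!("network path sends: {:?}", latest.lock().unwrap().take()); // Some(3)
    println!("slot afterwards:    {:?}", *latest.lock().unwrap());       // None
}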

Why This Design?

The network path uses a queue with a capacity of 1, implemented with a Mutex:
// Shared Mutex holding the latest message (or None if nothing pending)
let latest_message = Arc::new(Mutex::new(None));

// On each send, replace whatever was there with the new message
{
    let mut guard = self.latest_message.lock().unwrap();
    *guard = Some(data);  // Always overwrites with latest
}
Benefits of this approach:
  • True latest-message: Only the most recent message is queued
  • Minimal lock contention: Lock is held for only ~30 ns
  • No blocking: Fast path never waits for network
  • Automatic cleanup: Old messages are automatically replaced

Alternative Designs Considered

We evaluated several alternatives before choosing this design:

1. Synchronous Network Send (Rejected)
// ❌ Blocks fast path
local_sample.send()?;
network_publisher.put(bytes).wait()?;  // 1-10 ms latency!
This would make every publish wait for network I/O, destroying the performance benefits.

2. Channel with try_send (Rejected)
// ❌ Drops NEW message if channel is full (wrong semantics!)
let (tx, rx) = mpsc::sync_channel::<T>(1);
let _ = tx.try_send(data);  // Fails if full, drops the NEW message!
This has the wrong semantics - it would drop new messages instead of old ones.

3. Larger Channel Buffer (Rejected)
// ❌ Could send very old messages when network is slow
let (tx, rx) = mpsc::sync_channel::<T>(100);
This could send very stale data when the network is slow, defeating the purpose of latest-message semantics.

4. Tokio Async Runtime (Rejected)
// ❌ Adds a heavy dependency and complexity
tokio::spawn(async move {
    network_publisher.put(bytes).await
});
This adds unnecessary complexity and a heavy dependency for a simple use case.
The chosen design (Mutex<Option<T>> with background thread) provides the best balance of simplicity, performance, and true latest-message semantics.

Performance Characteristics

Fast Path (Local Send)

The local path uses zero-copy shared memory for maximum performance:
let size = mem::size_of::<T>();
let mut local_sample = unsafe { 
    self.local_publisher.loan_slice_uninit(size)?.assume_init() 
};
unsafe {
    let data_ptr = local_sample.as_mut_ptr() as *mut T;
    data_ptr.write(data);
}
local_sample.send()?;  // Zero-copy via shared memory
The Rust implementation uses iceoryx2 for zero-copy local transport. Python and C++ implementations will use Zenoh for network transport until local zero-copy support is added.
Cost breakdown:
  • Shared memory allocation: ~100-500 ns
  • Memory write: ~10-50 ns
  • Total: < 1 μs

Async Path (Network Send)

The network path adds minimal overhead to the fast path.

Sending side (fast path):
{
    let mut guard = self.latest_message.lock().unwrap();
    *guard = Some(data);  // Replace with latest
}  // Lock released here (~30 ns total)
Cost breakdown (added to fast path):
  • Mutex lock: ~10 ns
  • Memory write: ~10 ns
  • Mutex unlock: ~10 ns
  • Total added to fast path: ~30 ns
Background thread (not on critical path):
let data = {
    let mut guard = latest_message.lock().unwrap();
    guard.take()  // Remove message from mutex
};  // Lock released here (~30 ns total)

if let Some(data) = data {
    // Serialization happens OUTSIDE lock (1-10 μs)
    let bytes = data.to_bytes()?;
    
    // Network send happens OUTSIDE lock (1-10 ms)
    publisher.put(bytes).wait()?;
}
Cost breakdown (background thread):
  • Take message from mutex: ~30 ns (with lock)
  • Serialization: ~1-10 μs (NO LOCK HELD)
  • Network send: ~1-10 ms (NO LOCK HELD)
  • Sleep if no message: 100 μs (prevents busy-wait)

Performance Summary

Operation                     | Latency | Blocking
Local send                    | < 1 μs  | Minimal (memory alloc)
Network send (queue)          | ~30 ns  | Non-blocking
Network send (serialization)  | 1-10 μs | Non-blocking (background thread)
Network send (Zenoh)          | 1-10 ms | Non-blocking (background thread)
Local receive                 | < 1 μs  | Non-blocking
Network receive               | 1-10 ms | Non-blocking
The network path adds only ~30 ns to the fast path. All slow operations (serialization, network I/O) happen on a background thread and don’t block local publishing.
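If you want to sanity-check the "~30 ns" figure on your own hardware, a rough micro-benchmark of the lock-and-overwrite step looks like this. It is only a sketch: single-threaded, uncontended, and heavily dependent on CPU and build settings.
use std::sync::{Arc, Mutex};
use std::time::Instant;

fn main() {
    let latest: Arc<Mutex<Option<u64>>> = Arc::new(Mutex::new(None));
    let iterations: u64 = 1_000_000;

    let start = Instant::now();
    for i in 0..iterations {
        // The only work the fast path adds: lock, overwrite, unlock.
        *latest.lock().unwrap() = Some(i);
    }
    let per_op_ns = start.elapsed().as_nanos() / iterations as u128;

    println!("avg lock + overwrite: ~{} ns per send (uncontended)", per_op_ns);
}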

Thread Lifecycle

Thread Creation

When you create a Publisher, a background thread is automatically spawned to handle network publishing:
// On Publisher::create():
// 1. Spawn background thread
let network_thread = thread::spawn(move || {
    Self::network_thread(topic, rx, shutdown)
});

// 2. Store thread handle
Publisher {
    network_thread: Some(network_thread),
    shutdown: Arc::new(AtomicBool::new(false)),
    // ...
}

Thread Shutdown

When the Publisher is dropped, the background thread is cleanly shut down:
// On Publisher::drop():
impl Drop for Publisher {
    fn drop(&mut self) {
        // 1. Signal shutdown
        self.shutdown.store(true, Ordering::Relaxed);
        let _ = self.network_sender.try_send(NetworkMessage::Shutdown);
        
        // 2. Wait for thread to finish
        if let Some(handle) = self.network_thread.take() {
            let _ = handle.join();
        }
    }
}
Clean shutdown ensures no messages are lost on exit. The background thread processes any remaining messages before terminating.
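The same signal-then-join pattern can be reproduced with only the standard library. The sketch below mirrors the Drop logic above but is not Cerulion Core's actual Publisher; the names and the loop body are illustrative.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

struct Worker {
    shutdown: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    fn new() -> Self {
        let shutdown = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown);

        // Background thread: loop until the shutdown flag is set.
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Relaxed) {
                // ...process pending messages here...
                thread::sleep(Duration::from_micros(100));
            }
            println!("background thread exiting cleanly");
        });

        Worker { shutdown, handle: Some(handle) }
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        // 1. Signal shutdown, 2. wait for the thread to finish.
        self.shutdown.store(true, Ordering::Relaxed);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

fn main() {
    let _worker = Worker::new();
    thread::sleep(Duration::from_millis(1));
} // Drop runs here: the thread is signalled and joined before main returns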

Error Handling

Errors in the network path are logged but do not propagate to the caller. This ensures that network failures never affect local communication:
// In background thread:
match data.to_bytes() {
    Ok(bytes) => {
        if let Err(e) = publisher.put(bytes).wait() {
            eprintln!("Failed to publish to zenoh: {}", e);
            // Continues processing next message
        }
    }
    Err(e) => {
        eprintln!("Failed to serialize data: {}", e);
        // Continues processing next message
    }
}
Rationale: Network failures should not affect local (fast-path) communication. The system prioritizes availability over guaranteed network delivery.
Network errors are logged but don’t propagate. Check logs if network communication isn’t working. Local communication continues even if the network is down.

Trade-offs

Advantages

  • Ultra-low latency for local subscribers
  • Non-blocking send operation
  • Resilient to network failures
  • Simple API - no async/await complexity
  • Automatic thread management
  • Latest-message semantics - network subscribers always get current data

Considerations

  • ⚠️ Eventual consistency: Network subscribers may see messages slightly delayed
  • ⚠️ Message dropping: If network is very slow, some messages may be skipped (only latest kept)
  • ⚠️ No backpressure: Publisher doesn’t slow down for slow network subscribers
Backpressure is when a slow consumer causes the producer to slow down. Cerulion Core doesn’t implement backpressure - it prioritizes local performance over network delivery guarantees.

When to Use This Design

Good fit:
  • High-frequency sensor data (e.g., 1 kHz+)
  • Time-critical local processing
  • Monitoring/telemetry where occasional drops are acceptable
  • Systems where local latency is paramount
Not ideal for:
  • Critical commands that must reach network subscribers
  • Low-frequency high-value messages
  • Systems requiring delivery guarantees
  • Transactions requiring acknowledgments

Implementation Details

Lock Contention

The mutex is held for only ~30 ns per operation, ensuring minimal contention.

Sending side:
  1. Acquire lock (~10 ns)
  2. Replace message (~10 ns)
  3. Release lock (~10 ns)
  4. Total: ~30 ns holding lock
Receiving side:
  1. Acquire lock (~10 ns)
  2. Take message with .take() (~10 ns)
  3. Release lock (~10 ns)
  4. Then serialize outside the lock (1-10 μs, doesn’t block anything)
  5. Then send over network (1-10 ms, doesn’t block anything)
Serialization and network I/O happen outside the lock. This means the fast path is never blocked by slow serialization or network latency.

Background Thread Behavior

The background thread follows this pattern:
  1. Checks for new messages periodically
  2. Takes message from mutex (if available)
  3. Serializes message (outside lock)
  4. Sends over network (outside lock)
  5. Sleeps briefly if no message (prevents busy-wait)
The thread sleeps for 100 μs when no message is available. This prevents busy-waiting while still providing low latency for new messages.
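Putting the pieces together, the background thread's loop looks roughly like the self-contained sketch below. The serialization and Zenoh send are stand-ins: the slot here holds raw bytes for simplicity, whereas the real Publisher stores the typed message and serializes it inside the loop.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn network_thread_loop(latest: Arc<Mutex<Option<Vec<u8>>>>, shutdown: Arc<AtomicBool>) {
    while !shutdown.load(Ordering::Relaxed) {
        // Take the pending message (if any); the lock is held only for the take.
        let pending = latest.lock().unwrap().take();

        match pending {
            Some(data) => {
                // In the real loop, serialization (`data.to_bytes()`) and the Zenoh send
                // (`publisher.put(bytes).wait()`) happen here, OUTSIDE the lock.
                println!("would send {} bytes to the network", data.len());
            }
            // Nothing pending: sleep briefly to avoid busy-waiting.
            None => thread::sleep(Duration::from_micros(100)),
        }
    }
}

fn main() {
    let latest = Arc::new(Mutex::new(None));
    let shutdown = Arc::new(AtomicBool::new(false));

    let worker = {
        let (latest, shutdown) = (Arc::clone(&latest), Arc::clone(&shutdown));
        thread::spawn(move || network_thread_loop(latest, shutdown))
    };

    // "Fast path": overwrite the one-slot queue a few times.
    for i in 0..3u8 {
        *latest.lock().unwrap() = Some(vec![i; 16]);
        thread::sleep(Duration::from_millis(1));
    }

    shutdown.store(true, Ordering::Relaxed);
    worker.join().unwrap();
}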

Next Steps

Ready to learn more? Explore these related topics: