Performance

Cerulion Core is designed for high-performance real-time applications. This page covers performance characteristics, optimization tips, and benchmarking guidance.

Performance Characteristics

Latency Measurements

| Operation | Transport | Latency | Notes |
| --- | --- | --- | --- |
| Local send | iceoryx2 | < 1 μs | Zero-copy shared memory |
| Local receive | iceoryx2 | < 1 μs | Direct memory read |
| Network send (queue) | Background thread | ~30 ns | Added to fast path |
| Network send (serialization) | Background thread | 1-10 μs | Depends on message size |
| Network send (zenoh) | Background thread | 1-10 ms | Network dependent |
| Network receive | Zenoh | 1-10 ms | Network dependent |

Local communication latency is sub-microsecond for small messages. Network communication adds minimal overhead to the fast path (~30 ns) because serialization and network I/O happen on a background thread.

Throughput

Local transport:
  • Limited by memory bandwidth
  • Typical: 10+ GB/s for small messages
  • Scales with message size (larger messages = higher throughput)
Network transport:
  • Limited by network bandwidth
  • Typical: 1-10 Gbps (depends on network)
  • Serialization overhead: ~1-10 μs per message
For high-throughput applications, local transport is ideal. Network transport is suitable for lower-frequency data or distributed systems.
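
As a rough sanity check before choosing a transport, you can estimate the bandwidth a topic needs from its message size and publish rate. The sketch below is illustrative arithmetic only; the 125 MB/s budget is simply 1 Gbps expressed in bytes:
// Back-of-the-envelope bandwidth estimate for a topic.
fn required_bandwidth_bytes_per_sec(message_size: usize, rate_hz: f64) -> f64 {
    message_size as f64 * rate_hz
}

fn main() {
    let size = 1_000;      // 1 KB message
    let rate = 10_000.0;   // 10 kHz publish rate
    let needed = required_bandwidth_bytes_per_sec(size, rate);

    println!("Required bandwidth: {:.1} MB/s", needed / 1e6);
    // Typical budgets from above: ~10 GB/s local, ~1 Gbps (125 MB/s) network.
    println!("Fits on a 1 Gbps network: {}", needed <= 125e6);
}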

Optimization Tips

1. Use Local Transport When Possible

// ✅ Good: Local transport for same-machine communication
let subscriber = Subscriber::<T>::create("topic", None)?;  // Auto-detects local

// ⚠️ Only if needed: Force network for remote communication
let subscriber = Subscriber::<T>::create("topic", Some(true))?;
Local transport provides < 1 μs latency vs 1-10 ms for network. Use local transport whenever possible.

2. Keep Message Types Small

// ✅ Good: Small, focused message type
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct SensorData {
    temperature: f32,  // 4 bytes
    timestamp: u64,    // 8 bytes
}  // Total: 16 bytes (4 + 4 padding + 8; u64 forces 8-byte alignment)

// ⚠️ Consider: Large messages increase serialization time
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct LargeData {
    buffer: [u8; 1_000_000],  // 1MB per message!
}
Small messages (< 1 KB) have minimal serialization overhead. Large messages (> 100 KB) may benefit from compression or chunking.
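
One way to chunk a large payload is to split it into fixed-size Copy messages that carry enough metadata to reassemble the original buffer on the receiving side. The Chunk layout and 4 KB chunk size below are illustrative assumptions, not part of Cerulion Core:
// Illustrative chunk message: fixed-size, Copy, with reassembly metadata.
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct Chunk {
    sequence: u64,     // identifies which large payload this chunk belongs to
    index: u32,        // position of this chunk within the payload
    total: u32,        // number of chunks in the payload
    len: u32,          // valid bytes in `data`
    data: [u8; 4096],  // 4 KB per chunk keeps per-message serialization cheap
}

// Split a large buffer into Copy-able chunks for sending one at a time.
fn chunk_payload(sequence: u64, payload: &[u8]) -> Vec<Chunk> {
    let total = payload.chunks(4096).count() as u32;
    payload
        .chunks(4096)
        .enumerate()
        .map(|(index, part)| {
            let mut data = [0u8; 4096];
            data[..part.len()].copy_from_slice(part);
            Chunk {
                sequence,
                index: index as u32,
                total,
                len: part.len() as u32,
                data,
            }
        })
        .collect()
}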

3. Use Copy Types

// ✅ Good: Copy type (zero-copy local transport)
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct Data {
    value: f32,
}

// ❌ Bad: Non-Copy type (requires serialization even for local)
struct Data {
    value: String,  // Not Copy!
}
Copy types enable zero-copy local transport. Non-Copy types require serialization even for local communication, adding overhead.
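
If a message naturally wants a String, one workaround is to copy the text into a fixed-size byte array so the overall type stays Copy. FixedName and LabeledData below are illustrative sketches, not Cerulion Core APIs:
// Fixed-size, Copy-friendly stand-in for a short String field.
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct FixedName {
    len: u8,
    bytes: [u8; 63],  // anything longer than 63 bytes is truncated
}

impl FixedName {
    fn from_str(s: &str) -> Self {
        let mut bytes = [0u8; 63];
        let n = s.len().min(63);
        bytes[..n].copy_from_slice(&s.as_bytes()[..n]);
        FixedName { len: n as u8, bytes }
    }

    fn as_str(&self) -> &str {
        // Fine for ASCII; real code should truncate on a UTF-8 boundary.
        std::str::from_utf8(&self.bytes[..self.len as usize]).unwrap_or("")
    }
}

// The embedding message type remains Copy and keeps zero-copy local transport.
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct LabeledData {
    value: f32,
    name: FixedName,
}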

4. Batch Messages When Possible

// ✅ Good: Batch multiple values into one message
#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct BatchData {
    values: [f32; 100],  // 100 values in one message
}

// ⚠️ Less efficient: Send 100 separate messages
for value in values {
    publisher.send(SingleValue { value })?;  // 100 sends
}
Batching reduces the number of send operations and can improve throughput, especially for network transport.
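
A minimal batching helper can accumulate values and hand back a full BatchData (the type defined above) only once 100 values have been collected. This is a sketch only; it does not flush partial batches:
// Accumulates values and produces one BatchData per 100 values.
struct Batcher {
    buffer: [f32; 100],
    count: usize,
}

impl Batcher {
    fn new() -> Self {
        Batcher { buffer: [0.0; 100], count: 0 }
    }

    // Returns a full batch to send, or None while still accumulating.
    fn push(&mut self, value: f32) -> Option<BatchData> {
        self.buffer[self.count] = value;
        self.count += 1;
        if self.count == self.buffer.len() {
            self.count = 0;
            Some(BatchData { values: self.buffer })
        } else {
            None
        }
    }
}

// Usage: one send per 100 values instead of 100 separate sends.
// if let Some(batch) = batcher.push(value) {
//     publisher.send(batch)?;
// }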

5. Use TopicManager for Multiple Topics

// ✅ Good: Shared session reduces overhead
let manager = TopicManager::create()?;
let pub1 = manager.register_publisher::<T1>("topic1", false)?;
let pub2 = manager.register_publisher::<T2>("topic2", false)?;

// ⚠️ Less efficient: Each publisher creates its own session
let pub1 = Publisher::<T1>::create("topic1")?;
let pub2 = Publisher::<T2>::create("topic2")?;
TopicManager shares a single Zenoh session across all publishers/subscribers, reducing memory and connection overhead.

Benchmarking

Local Transport Benchmark

Here’s a simple benchmark for local transport:
use cerulion_core::prelude::*;
use std::time::Instant;

#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct TestData {
    value: f32,
    timestamp: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let publisher = Publisher::<TestData>::create("test")?;
    let subscriber = Subscriber::<TestData>::create("test", None)?;
    
    // Warmup
    for _ in 0..100 {
        publisher.send(TestData { value: 1.0, timestamp: 0 })?;
        let _ = subscriber.receive();
    }
    
    // Benchmark
    let iterations = 1_000_000;
    let start = Instant::now();
    
    for i in 0..iterations {
        publisher.send(TestData {
            value: i as f32,
            timestamp: i as u64,
        })?;
        
        if let Ok(Some(_)) = subscriber.receive() {
            // Message received
        }
    }
    
    let elapsed = start.elapsed();
    let latency = elapsed.as_nanos() / iterations as u128;
    
    println!("Average latency: {} ns", latency);
    println!("Throughput: {:.2} M messages/s", 
             iterations as f64 / elapsed.as_secs_f64() / 1_000_000.0);
    
    Ok(())
}
Expected results for local transport (release build): < 1 μs average latency and 10+ M messages/s throughput for small messages.

Network Transport Benchmark

For network transport, measure end-to-end latency:
use cerulion_core::prelude::*;
use std::time::SystemTime;

#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct TimestampedData {
    value: f32,
    send_time: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let publisher = Publisher::<TimestampedData>::create("test")?;
    let subscriber = Subscriber::<TimestampedData>::create("test", Some(true))?;  // Force network
    
    // Wait for connection
    std::thread::sleep(std::time::Duration::from_secs(1));
    
    let iterations = 1000;
    let mut latencies = Vec::new();
    
    for i in 0..iterations {
        let send_time = SystemTime::now()
            .duration_since(SystemTime::UNIX_EPOCH)?
            .as_nanos() as u64;
        
        publisher.send(TimestampedData {
            value: i as f32,
            send_time,
        })?;
        
        if let Ok(Some(data)) = subscriber.receive() {
            let receive_time = SystemTime::now()
                .duration_since(SystemTime::UNIX_EPOCH)?
                .as_nanos() as u64;
            
            let latency = receive_time - data.send_time;
            latencies.push(latency);
        }
        
        std::thread::sleep(std::time::Duration::from_millis(10));
    }
    
    let avg_latency = latencies.iter().sum::<u64>() / latencies.len() as u64;
    println!("Average network latency: {} ns ({:.2} ms)", 
             avg_latency, avg_latency as f64 / 1_000_000.0);
    
    Ok(())
}
Network latency depends on network conditions. Typical values: 1-10 ms for local network, 10-100 ms for WAN.
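
Averages can hide tail latency, so it is often worth reporting percentiles over the same samples. This sketch assumes the latencies vector collected in the benchmark above and at least one recorded sample:
// Nearest-rank percentile over the collected nanosecond samples.
fn percentile(samples: &mut [u64], q: f64) -> u64 {
    samples.sort_unstable();
    samples[((samples.len() - 1) as f64 * q) as usize]
}

// e.g., after computing the average in main():
// println!("p50: {:.2} ms", percentile(&mut latencies, 0.50) as f64 / 1e6);
// println!("p99: {:.2} ms", percentile(&mut latencies, 0.99) as f64 / 1e6);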

Performance Considerations

Message Size Impact

| Message Size | Local Latency | Network Serialization | Network Latency |
| --- | --- | --- | --- |
| 16 bytes | < 1 μs | ~1 μs | 1-10 ms |
| 1 KB | < 1 μs | ~5 μs | 1-10 ms |
| 100 KB | < 10 μs | ~50 μs | 1-10 ms |
| 1 MB | < 100 μs | ~500 μs | 1-10 ms |

Local transport latency is relatively insensitive to message size (memory bandwidth is high). Network serialization time increases with message size, but network latency dominates for larger messages.

CPU Usage

Local transport:
  • Minimal CPU usage (< 1% at a 1 M messages/s rate)
  • CPU usage scales with message rate
Network transport:
  • Background thread handles serialization and network I/O
  • Main thread CPU usage: ~30 ns per send (just queueing)
  • Background thread CPU usage: Depends on message size and rate
Network transport is designed to minimize impact on the main thread. Serialization and network I/O happen on a background thread, so they don’t block your application.
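
To see how much of that cost actually lands on your own thread, you can time nothing but the publisher.send() calls. The sketch below reuses the TestData type from the local benchmark; the topic name is arbitrary:
use cerulion_core::prelude::*;
use std::time::Instant;

#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct TestData {
    value: f32,
    timestamp: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let publisher = Publisher::<TestData>::create("cpu_test")?;

    // Time only the send calls: the work done on the calling thread.
    // Serialization and network I/O run on the background thread and are
    // deliberately excluded from this measurement.
    let iterations: u64 = 100_000;
    let start = Instant::now();
    for i in 0..iterations {
        publisher.send(TestData { value: i as f32, timestamp: i })?;
    }
    let per_send = start.elapsed().as_nanos() / iterations as u128;
    println!("Main-thread cost per send: {} ns", per_send);
    Ok(())
}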

Best Practices Summary

  1. Use local transport when possible (< 1 μs vs 1-10 ms)
  2. Keep messages small (< 1 KB for best performance)
  3. Use Copy types for zero-copy local transport
  4. Batch messages when sending multiple values
  5. Use TopicManager for multiple topics (shared session)
  6. Profile your application to identify bottlenecks
  7. Consider message frequency: high-frequency topics benefit most from local transport

Troubleshooting Performance Issues

"High latency on local transport"

Possible causes:
  • Message type is not Copy
  • Message size is very large
  • System is under heavy load
Solutions:
  • Ensure the message type implements Copy (a compile-time check is sketched below)
  • Reduce message size or batch messages
  • Check system load and CPU usage
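
The Copy requirement can be verified at compile time with a trivial generic function; assert_copy below is an illustrative helper, not a Cerulion Core API:
// Compile-time check: the build fails if the message type stops being Copy.
fn assert_copy<T: Copy>() {}

#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct SensorData {
    temperature: f32,
    timestamp: u64,
}

fn main() {
    assert_copy::<SensorData>();  // compiles
    // assert_copy::<String>();   // would fail to compile: String is not Copy
}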

"Network messages not arriving"

Possible causes:
  • Network transport not enabled
  • Network connectivity issues
  • Subscriber not connected
Solutions:
  • Check that network transport is enabled
  • Verify network connectivity
  • Ensure subscriber is created before publisher sends

"High CPU usage"

Possible causes:
  • Very high message rate
  • Large message sizes
  • Multiple background threads
Solutions:
  • Reduce message rate if possible
  • Reduce message size
  • Use TopicManager to share sessions

Next Steps