Why ROS2 Is Non-Deterministic
Even with identical inputs, ROS2 can produce different execution sequences because the runtime doesn’t control scheduling. Its execution model introduces non-determinism at multiple levels:
OS thread scheduling
ROS2 executors dispatch callbacks across threads managed by the OS. The OS scheduler decides which thread runs next based on system load, other processes, and CPU availability. Two runs of the same graph on the same machine can produce different callback orders.
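The effect can be sketched without ROS2 at all. The snippet below is a minimal illustration in plain Python, not ROS2 executor code: two hypothetical callbacks (`camera_cb` and `lidar_cb`, names invented for this example) run on separate threads and record the order in which they complete.

```python
import threading

def run_once():
    """Run two callbacks on separate threads and return their completion order."""
    order = []
    lock = threading.Lock()

    def callback(name):
        # The OS (and the interpreter) decide when each thread gets to run.
        with lock:
            order.append(name)

    threads = [
        threading.Thread(target=callback, args=("camera_cb",)),
        threading.Thread(target=callback, args=("lidar_cb",)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order

if __name__ == "__main__":
    # Both callbacks always run, but nothing in the program fixes their order.
    print(run_once())
```

Both callbacks always complete, but the program itself places no constraint on which finishes first. That decision belongs to the scheduler.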
Callback queue races
When multiple messages arrive at similar times, the order in which their callbacks fire depends on thread scheduling. A sensor fusion node that receives camera and LiDAR data may process them in different orders on different runs, producing different fusion results.
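A small simulation makes the race concrete. This is a sketch under stated assumptions: the `Msg` type, the `fuse_latest` function, and the "pair each camera frame with the most recently processed LiDAR scan" policy are all hypothetical, chosen only to show how callback order changes the result.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    sensor: str      # "camera" or "lidar"
    stamp_ms: int    # capture timestamp in milliseconds

def fuse_latest(arrivals):
    """Pair each camera frame with whichever LiDAR scan was processed most recently."""
    latest_lidar = None
    pairs = []
    for msg in arrivals:  # list order stands in for callback order
        if msg.sensor == "lidar":
            latest_lidar = msg.stamp_ms
        elif msg.sensor == "camera" and latest_lidar is not None:
            pairs.append((msg.stamp_ms, latest_lidar))
    return pairs

lidar_a, lidar_b = Msg("lidar", 100), Msg("lidar", 200)
camera = Msg("camera", 201)

# Run 1: the LiDAR callback for scan 200 fires before the camera callback.
run1 = fuse_latest([lidar_a, lidar_b, camera])
# Run 2: the same messages, but the camera callback wins the race.
run2 = fuse_latest([lidar_a, camera, lidar_b])

print(run1)  # [(201, 200)]
print(run2)  # [(201, 100)]
```

The two runs receive exactly the same messages, yet the camera frame is fused with a scan that is 1 ms old in one run and 101 ms old in the other.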
Timer drift
ROS2’s wall-clock timers drift under CPU load. A nominally 33 ms periodic timer may fire after 31 ms or 36 ms depending on what else is running, which changes the input data available when nodes execute.
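To see why a few milliseconds of drift matter, consider which message is the newest one available when the timer callback runs. The sketch below uses hypothetical arrival times and a hypothetical helper, `latest_visible`; it is an illustration of the effect, not ROS2 timer code.

```python
from bisect import bisect_right

def latest_visible(arrival_times_ms, fire_time_ms):
    """Return the arrival time of the newest message available when the timer fires."""
    idx = bisect_right(arrival_times_ms, fire_time_ms)
    return arrival_times_ms[idx - 1] if idx > 0 else None

arrivals = [10, 20, 32, 35]  # hypothetical message arrival times (ms), sorted

for fire_time in (31, 33, 36):
    print(fire_time, latest_visible(arrivals, fire_time))
# 31 20
# 33 32
# 36 35
```

With these assumed arrival times, a timer that fires 2 ms early sees the message from 20 ms, an on-time timer sees the one from 32 ms, and a late timer sees the one from 35 ms. The node's code is identical in all three cases; only its input differs.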
How Cerulion Achieves Determinism
Cerulion’s runtime controls three things that ROS2 leaves to chance:
Deterministic scheduling
The scheduler evaluates triggers based on graph topology and input data, not OS thread scheduling. Given the same inputs, nodes always execute in the same order.
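The general idea of deriving execution order from graph topology can be sketched as follows. This is not Cerulion's scheduler or API: it is a minimal topological sort (Kahn's algorithm) over a hypothetical node graph, with ties broken by node name so the result never depends on thread timing. It covers topology only and does not model triggers or input data.

```python
import heapq

def execution_order(graph):
    """Return a deterministic topological order for {node: [downstream nodes]}."""
    indegree = {node: 0 for node in graph}
    for downstream in graph.values():
        for node in downstream:
            indegree[node] = indegree.get(node, 0) + 1

    # Ready nodes live in a heap so ties are always broken by name,
    # never by thread timing or dict insertion order.
    ready = [node for node, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for nxt in graph.get(node, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order

graph = {
    "lidar": ["fusion"],
    "camera": ["fusion"],
    "fusion": ["planner"],
    "planner": [],
}
print(execution_order(graph))  # ['camera', 'lidar', 'fusion', 'planner']
```

Because the order is a pure function of the graph, running it again on the same graph yields the same sequence.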
Zero-copy logging
Zero-copy logging captures the exact input sequence without perturbing the system. Recording doesn’t change timing, so logs faithfully represent what happened.
Deterministic vs Non-Deterministic Execution
Consider a sensor fusion node that combines camera and LiDAR data. In ROS2, a 2 ms difference in arrival time can cause the fusion node to pair a camera frame with the wrong LiDAR scan, and the pairing can change from run to run. In Cerulion, the scheduler’s synchronized triggers guarantee consistent pairing.
What This Means for Safety
Deterministic resim is the foundation of safety validation. When you can replay a production incident and get the exact same behavior, you can: identify the root cause with certainty, verify that your fix actually prevents the incident, and build regression tests from real-world failures.
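A regression test built on replay might look like the sketch below. Everything here is hypothetical: `fusion_step` stands in for a deterministic node, and `recorded_inputs` stands in for an input log captured in production. No Cerulion API is used or implied.

```python
def fusion_step(camera_value, lidar_value):
    """Hypothetical deterministic node: a pure function of its inputs."""
    return round(0.5 * camera_value + 0.5 * lidar_value, 6)

def run_pipeline(inputs):
    """Process (camera, lidar) input pairs in their recorded order."""
    return [fusion_step(c, l) for c, l in inputs]

# A recorded input sequence, standing in for a log captured in production.
recorded_inputs = [(1.0, 3.0), (2.0, 2.5), (4.0, 0.0)]
recorded_outputs = run_pipeline(recorded_inputs)

def test_replay_matches_recording():
    # Replaying the same inputs must reproduce the recorded behavior exactly.
    assert run_pipeline(recorded_inputs) == recorded_outputs
```

The test is only meaningful when execution is deterministic. If replay could produce a different output from the same inputs, a passing run would not show that a fix works, and a failing run would not show that it doesn't.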
Next Steps
Scheduling
How the scheduler controls node execution order
Zero-Copy Logging
Recording without changing behavior
Rust Backend
Memory safety and fearless concurrency
Why Cerulion
Full comparison across 14 dimensions