ROS 2 set the bar for robotics middleware, and Cerulion is built to clear it. Cerulion is a ground-up rethink of the robotics stack, engineered to be dramatically faster and more deterministic where it counts most today: the single-machine, multi-process real-time graph. From there it is expanding outward toward a complete, better-in-every-way replacement. Where ROS 2 carries years of accumulated complexity, Cerulion starts clean and makes the fast, predictable path the default. This page explains which design choices make Cerulion faster, and why. For the mental model of how it works, start with Core concepts.

Where Cerulion pulls ahead

ROS 2’s accumulated complexity takes concrete forms: a DDS networking layer, performance features that are opt-in rather than automatic, and execution order that depends on the executor and thread pool. Cerulion is engineered to beat that head-on, turning the frustrations teams feel most on a single machine into deliberate design advantages.

Zero-copy is the default, not an opt-in

In ROS 2, zero-copy means choosing a compatible RMW, using loaned messages, and tuning configuration, so teams that stay on the default copy-on-receive path never get it. In Cerulion, zero-copy shared memory is the hot path. A message is written once into a shared-memory slot and read in place, so latency stays flat as payloads grow, with no tuning.
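The "written once, read in place" idea can be illustrated with Python's standard `multiprocessing.shared_memory` module. This is only a sketch of the general technique, not Cerulion's implementation or API: the function names `publish`, `checksum_in_place`, and `demo` are hypothetical, and a real reader would normally run in a separate process.

```python
from multiprocessing import shared_memory


def publish(payload: bytes) -> tuple[shared_memory.SharedMemory, int]:
    """Write the payload once into a fresh shared-memory slot (hypothetical helper)."""
    slot = shared_memory.SharedMemory(create=True, size=len(payload))
    slot.buf[:len(payload)] = payload  # the only copy: producer -> slot
    # The OS may round the segment size up, so the true length travels with the slot.
    return slot, len(payload)


def checksum_in_place(slot_name: str, length: int) -> int:
    """Attach to the slot by name and read it through a memoryview, without copying it out."""
    reader = shared_memory.SharedMemory(name=slot_name)
    try:
        view = reader.buf[:length]  # slicing a memoryview yields another view, not a copy
        try:
            return sum(view)
        finally:
            view.release()  # exported views must be released before close()
    finally:
        reader.close()


def demo() -> int:
    slot, length = publish(bytes(range(256)) * 4)
    try:
        return checksum_in_place(slot.name, length)
    finally:
        slot.close()
        slot.unlink()  # free the segment once every reader is done
```

Because the reader works on a view of the same memory, the cost of handing over the message does not depend on its size; only the work the reader chooses to do on the bytes does.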

Deterministic, replayable execution

In ROS 2, execution order depends on the executor, thread pool, and DDS scheduling, which makes reproducing a timing bug notoriously hard. In Cerulion, execution order is derived from the graph file and driven by a simulated clock, so a recorded run replays the same way it ran live. Debug once, reproduce exactly.
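A toy scheduler makes the idea of "order from the graph, time from a simulated clock" concrete. The sketch below is an illustration under stated assumptions, not Cerulion's scheduler or graph format: the `edges` dictionary, the node names, the alphabetical tie-break, and the fixed clock step are all hypothetical.

```python
import heapq


def execution_order(edges: dict[str, list[str]]) -> list[str]:
    """Topological order (Kahn's algorithm) with alphabetical tie-breaking.

    The result depends only on the graph, never on thread timing.
    """
    nodes = set(edges) | {d for downstream in edges.values() for d in downstream}
    indegree = {n: 0 for n in nodes}
    for downstream in edges.values():
        for d in downstream:
            indegree[d] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)  # min-heap gives a stable, name-based tie-break
    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for d in edges.get(node, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                heapq.heappush(ready, d)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


def run(edges: dict[str, list[str]], ticks: int, step_ns: int) -> list[tuple[int, str]]:
    """Replay the graph against a simulated clock and return the (time, node) trace."""
    order = execution_order(edges)
    trace = []
    now_ns = 0  # simulated time: advanced by the loop, never read from the wall clock
    for _ in range(ticks):
        for node in order:
            trace.append((now_ns, node))
        now_ns += step_ns
    return trace
```

With `edges = {"camera": ["detector"], "lidar": ["detector"], "detector": ["planner"]}`, two calls to `run(edges, 2, 1000)` return identical traces, which is the property that makes a recorded run replayable.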

Predictable, low-jitter tails

ROS 2 tail latency varies with executor and DDS behavior. In Cerulion’s internal benchmarks, 99th-percentile latency stays within roughly 10% of the median across payload sizes: the tail tracks the median rather than spiking.
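The "tail tracks the median" claim is a statement about the ratio of the 99th percentile to the median. A minimal sketch of how that ratio could be computed from your own latency samples follows; the function names are hypothetical and the nearest-rank percentile is one common definition among several.

```python
import math
import statistics


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tail_ratio(samples: list[float]) -> float:
    """p99 divided by the median; 1.10 means the tail sits 10% above the median."""
    return percentile(samples, 99) / statistics.median(samples)
```

For example, 100 synthetic samples consisting of 98 values of 2.4, one of 2.6, and one outlier of 9.0 give a ratio of 2.6 / 2.4, about 1.08. These numbers are made up for illustration and are not Cerulion measurements.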

Timing violations are loud, not silent

ROS 2’s DDS DEADLINE QoS exists, but it is easy to misconfigure, and an RMW can silently ignore it. Cerulion tracks input, output, and tick-execution deadlines with miss counters and emits a structured warning the moment a deadline slips, which is critical for real-time control loops.
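The pattern described here, a per-deadline miss counter plus a structured warning on every miss, can be sketched in a few lines. This is a generic illustration, not Cerulion's API: the `DeadlineMonitor` class, its field names, and the JSON warning layout are assumptions made for the example.

```python
import json
import logging
from dataclasses import dataclass, field

log = logging.getLogger("deadline_sketch")


@dataclass
class DeadlineMonitor:
    """Tracks budgets per deadline kind (e.g. "input", "output", "tick") and counts misses."""

    budgets_ns: dict[str, int]
    misses: dict[str, int] = field(default_factory=dict)

    def observe(self, kind: str, elapsed_ns: int) -> bool:
        """Record one measurement; return True and warn if it exceeded its budget."""
        budget_ns = self.budgets_ns[kind]
        if elapsed_ns <= budget_ns:
            return False
        self.misses[kind] = self.misses.get(kind, 0) + 1
        # Structured payload: machine-parseable fields instead of free text.
        log.warning(json.dumps({
            "event": "deadline_miss",
            "kind": kind,
            "elapsed_ns": elapsed_ns,
            "budget_ns": budget_ns,
            "miss_count": self.misses[kind],
        }, sort_keys=True))
        return True
```

The point of the design is that a miss is never dropped: every overrun increments a counter that can be inspected later and produces a log record at the moment it happens.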

One source of truth, less config sprawl

In ROS 2, behavior is spread across code, launch files, parameter YAMLs, and QoS profiles that can silently conflict. In Cerulion, node behavior lives in the code macro and the graph file defines only wiring. No silent overrides, no config drift.

A shallow learning curve

ROS 2 has a steep ramp: DDS QoS matrices, launch files, parameter plumbing. Cerulion asks you to write a Rust struct plus one macro; the CLI scaffolds the workspace, nodes, and graph for you.

A memory-safe Rust foundation

ROS 2’s core client libraries are written in C (rcl) and C++ (rclcpp), which leaves them exposed to whole classes of memory bugs. Cerulion is built in Rust: memory safety without a garbage collector, and predictable performance.

The performance story

Cerulion’s headline advantage is flat latency. Under a saturation (back-to-back) round-trip workload, Cerulion stays in the low-microsecond range (about 2.4 to 2.8 µs) from a 64-byte message all the way to a 16-megabyte message. Standard ROS 2 (CycloneDDS over shared memory), measured at realistic sensor rates, climbs into the milliseconds as payloads grow, because it copies each message on receive. The decisive difference is the shape: Cerulion holds flat exactly where standard ROS 2 falls behind.
Round-trip latency, single machine, single publisher/subscriber pair, from internal benchmarks. The Cerulion column is a saturation test; the standard ROS 2 column is measured at sensor rates. “vs. standard” is the ratio of the ROS 2 figure to the Cerulion figure.
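Payload | Cerulion (saturation test) | ROS 2 standard (sensor-rate test) | vs. standard
--------|----------------------------|-----------------------------------|-------------
64 B    | 2.4 µs                     | 15.3 µs                           | 6.3×
256 B   | 2.7 µs                     | 25.6 µs                           | 9.6×
1 KB    | 2.8 µs                     | 21.3 µs                           | 7.7×
4 KB    | 2.5 µs                     | 29.6 µs                           | 12.0×
16 KB   | 2.5 µs                     | 23.0 µs                           | 9.4×
64 KB   | 2.4 µs                     | 19.2 µs                           | 8.0×
256 KB  | 2.5 µs                     | 228 µs                            | 91×
1 MB    | 2.4 µs                     | 1.28 ms                           | 537×
4 MB    | 2.6 µs                     | 6.00 ms                           | 2,343×
16 MB   | 2.4 µs                     | 26.2 ms                           | 10,807×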
All figures are internal benchmarks, single-machine, single publisher/subscriber pair. The Cerulion column is a saturation (back-to-back) round-trip test; the “standard ROS 2” column is ROS 2 Humble, CycloneDDS over shared memory on the default receive path, measured at sensor rates (Jazzy and Kilted measured within a few percent). ROS 2 also has an optional zero-copy (“loaned-message”) receive path; against that path Cerulion’s lead is steadier. These are not guarantees and carry no error bars.
Reading the table:
  • At control-loop message sizes, Cerulion is roughly 6 to 12× faster than standard ROS 2.
  • At image and lidar scale, the gap grows from hundreds of times to more than ten thousand times faster than standard ROS 2 (about 537× at a 1 MB depth frame and about 10,807× at a 16 MB dense scan), because standard ROS 2 copies each message and Cerulion does not.
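The “vs. standard” column is plain division, and it can be rechecked from the table. The sketch below copies the rounded figures from the table and recomputes each ratio. The recomputed values land within about 3% of the published column (for example 6.4× rather than 6.3× at 64 B), which would be consistent with the published ratios having been computed from unrounded measurements; that explanation is an assumption, not something the benchmark notes state.

```python
# (payload, Cerulion µs, standard ROS 2 µs, published ratio), copied from the table.
ROWS = [
    ("64 B", 2.4, 15.3, 6.3),
    ("256 B", 2.7, 25.6, 9.6),
    ("1 KB", 2.8, 21.3, 7.7),
    ("4 KB", 2.5, 29.6, 12.0),
    ("16 KB", 2.5, 23.0, 9.4),
    ("64 KB", 2.4, 19.2, 8.0),
    ("256 KB", 2.5, 228.0, 91.0),
    ("1 MB", 2.4, 1280.0, 537.0),     # 1.28 ms expressed in µs
    ("4 MB", 2.6, 6000.0, 2343.0),    # 6.00 ms expressed in µs
    ("16 MB", 2.4, 26200.0, 10807.0), # 26.2 ms expressed in µs
]


def recomputed_ratios(rows):
    """ROS 2 latency divided by Cerulion latency, per payload size."""
    return {payload: ros_us / cerulion_us for payload, cerulion_us, ros_us, _ in rows}


def max_relative_gap(rows):
    """Largest relative difference between a recomputed and a published ratio."""
    return max(abs(ros_us / cerulion_us - published) / published
               for _, cerulion_us, ros_us, published in rows)
```

Note that the ratio grows almost entirely because the numerator grows: the Cerulion column stays between 2.4 and 2.8 µs while the ROS 2 column rises from microseconds to tens of milliseconds.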
The path to a fully distributed stack. Cerulion’s vision is a complete, better-in-every-way robotics stack, and cross-machine communication with network-wide topic discovery is the next milestone on that road, landing in the next release. Today Cerulion is purpose-built for the single-machine, multi-process graph — and that same zero-copy, deterministic core is what scales out to the distributed system that comes next.

Next steps

Core concepts

Workspaces, nodes, graphs, topics, and schemas: the mental model.

Quickstart

Go from zero to a running graph you can observe with topic echo.

Define a node

Create a node type, choose a trigger policy, and write tick().

CLI reference

Every command and flag, in one place.