Schema System

Cerulion Core’s schema system generates message types automatically from YAML schema files. Because generation happens at build time, it ensures type safety and cross-language consistency and eliminates hand-written struct definitions.

What is the Schema System?

The schema system is a build-time code generation architecture that transforms YAML schema definitions into type-safe message structs. Instead of writing struct definitions by hand, you define each message type once in YAML, and Cerulion Core generates the code during compilation. Think of it as a compiler for your data types: you write the specification, and the system produces type-safe code that plugs directly into Cerulion Core’s transport and serialization layers.

Design Philosophy

Single Source of Truth

Define types once in YAML, use everywhere. No duplication between languages or components.

Type Safety

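Generated Rust code is type-checked at compile time. The planned Python and C++ generators would add equivalent checks (at compile time for C++, at runtime for Python). Mismatched message definitions are caught before they cause compatibility errors.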

Consistency

The same schema is designed to produce identical binary layouts in every target language, so messages interoperate across languages.

Zero Boilerplate

No manual struct definitions. Code generation happens automatically during build.

Memory Layout Control

All types use repr(C) for predictable memory layout. Enables zero-copy local transport.

Copy Semantics

Generated types implement Copy, enabling efficient zero-copy shared memory transport.

Schema Format

Cerulion Core supports two schema formats: V2 format (recommended) and legacy format (backward compatible).

V2 Format (Recommended)

V2 format uses an inline field syntax inspired by ROS 2 message definitions:

schemas:
  SensorData:
    description: Temperature and pressure sensor readings
    fields:
      float temperature:
        description: Temperature in Celsius
      float pressure:
        description: Pressure in hPa
      uint64 timestamp:
        description: Timestamp in nanoseconds

Field Syntax: type[size] name: (the [size] suffix is optional and marks a fixed-size array)

fields:
  # Simple types
  float temperature:
  uint64 timestamp:
  bool is_active:
  
  # Fixed-size arrays
  float[3] position:
  uint8[16] uuid:
  
  # Fixed-size strings (Copy-compatible)
  string_fixed[32] username:
  
  # Custom types
  Position3D location:
  
  # Arrays of custom types
  Position3D[4] waypoints:

The inline syntax float[20] positions: is much more concise than the legacy format’s separate name, type, and size fields.
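To make the key format concrete, here is a sketch of how a parser might split a V2 field key into its type, optional array size, and name. `FieldSpec` and `parse_field_key` are hypothetical names for illustration, not Cerulion Core's parser, and the sketch assumes the YAML layer has already stripped the trailing colon from the key.

```rust
/// Parsed form of a V2 field key such as `float[3] position`.
/// Hypothetical type for illustration; not Cerulion Core's internal representation.
#[derive(Debug, PartialEq)]
pub struct FieldSpec {
    pub type_name: String,
    pub size: Option<usize>, // None for scalar fields
    pub name: String,
}

/// Parses a key of the form `type name` or `type[size] name`.
/// Returns None if the key does not have that shape.
pub fn parse_field_key(key: &str) -> Option<FieldSpec> {
    // Exactly two whitespace-separated tokens are expected.
    let mut parts = key.split_whitespace();
    let type_part = parts.next()?;
    let name = parts.next()?;
    if parts.next().is_some() {
        return None;
    }

    // Optional `[size]` suffix on the type token.
    let (type_name, size) = match type_part.find('[') {
        Some(open) => {
            let inner = type_part[open + 1..].strip_suffix(']')?;
            let size: usize = inner.parse().ok()?;
            (&type_part[..open], Some(size))
        }
        None => (type_part, None),
    };

    if type_name.is_empty() {
        return None;
    }
    Some(FieldSpec { type_name: type_name.to_string(), size, name: name.to_string() })
}

fn main() {
    for key in ["float temperature", "float[3] position", "string_fixed[32] username", "Position3D[4] waypoints"] {
        println!("{:?}", parse_field_key(key));
    }
}
```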

Key Features

Multi-Schema Files: Define multiple related types in one file.

schemas:
  Position3D:
    fields:
      float x:
      float y:
      float z:
  
  Quaternion:
    fields:
      float w:
      float x:
      float y:
      float z:
  
  Pose:
    fields:
      Position3D position:
      Quaternion orientation:

Shared Index Groups: Define named array indices once and reuse them across multiple fields.

index_groups:
  joint_names:
    shoulder_left: 0
    elbow_left: 1
    wrist_left: 2
    # ... more joints

schemas:
  JointState:
    fields:
      float[20] positions:
        indexes: joint_names  # Reference shared group
      float[20] velocities:
        indexes: joint_names  # Reuse same names
      float[20] effort:
        indexes: joint_names  # No duplication

This generates named accessors for all fields:

// All three arrays have the same named accessors
state.positions.shoulder_left();
state.positions.set_elbow_left(1.5);

state.velocities.shoulder_left();
state.effort.shoulder_left();
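Conceptually, the generator expands one index group into the same accessor pair for every field that references it. The sketch below emits that source text as a string; `emit_accessors` and the `<Field>Array` wrapper naming are assumptions modeled on the generated `PositionsArray` shown under Arrays with Partial Naming, not Cerulion Core's real generator.

```rust
/// Emits Rust source for an array wrapper whose named accessors come from an index group.
/// Hypothetical generator step for illustration only.
fn emit_accessors(wrapper: &str, elem: &str, len: usize, group: &[(&str, usize)]) -> String {
    let mut out = String::new();
    out.push_str("#[derive(Clone, Copy, Debug)]\n#[repr(transparent)]\n");
    out.push_str(&format!("pub struct {wrapper} {{\n    data: [{elem}; {len}],\n}}\n\n"));
    out.push_str(&format!("impl {wrapper} {{\n"));
    for (name, idx) in group {
        // Generation fails if the group names an index outside the array.
        assert!(*idx < len, "index {} for '{}' is out of range", idx, name);
        out.push_str(&format!("    pub fn {name}(&self) -> {elem} {{ self.data[{idx}] }}\n"));
        out.push_str(&format!("    pub fn set_{name}(&mut self, val: {elem}) {{ self.data[{idx}] = val; }}\n"));
    }
    out.push_str("}\n");
    out
}

fn main() {
    // One shared group, referenced by three fields: names are written once, emitted three times.
    let joint_names: [(&str, usize); 3] = [("shoulder_left", 0), ("elbow_left", 1), ("wrist_left", 2)];
    for field in ["Positions", "Velocities", "Effort"] {
        let src = emit_accessors(&format!("{field}Array"), "f32", 20, &joint_names);
        println!("{src}");
    }
}
```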

Imports: Share common definitions across schema files.

# common_types.yaml
schemas:
  Position3D:
    fields:
      float x:
      float y:
      float z:

# robot_state.yaml
imports:
  - common_types.yaml

schemas:
  RobotState:
    fields:
      Position3D base_position:  # Imported type
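Import resolution amounts to following imports: entries transitively so that every type defined in an imported file becomes visible. The sketch below works on an in-memory map from file name to its imports and type names; `SchemaFile`, `visible_types`, and the choice to reject import cycles are assumptions for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Minimal in-memory view of one schema file (hypothetical).
struct SchemaFile {
    imports: Vec<&'static str>,
    types: Vec<&'static str>,
}

/// Collects every type name visible from `root`, following imports transitively.
/// Returns Err if an import cycle or a missing file is found.
fn visible_types(files: &HashMap<&str, SchemaFile>, root: &str) -> Result<BTreeSet<String>, String> {
    fn visit(
        files: &HashMap<&str, SchemaFile>,
        name: &str,
        stack: &mut Vec<String>,     // files currently being resolved (cycle detection)
        done: &mut BTreeSet<String>, // files already fully resolved
        out: &mut BTreeSet<String>,
    ) -> Result<(), String> {
        if done.contains(name) {
            return Ok(());
        }
        if stack.iter().any(|s| s == name) {
            return Err(format!("import cycle at {name}"));
        }
        let file = files.get(name).ok_or_else(|| format!("missing file {name}"))?;
        stack.push(name.to_string());
        for imp in &file.imports {
            visit(files, imp, stack, done, out)?;
        }
        stack.pop();
        out.extend(file.types.iter().map(|t| t.to_string()));
        done.insert(name.to_string());
        Ok(())
    }

    let mut out = BTreeSet::new();
    visit(files, root, &mut Vec::new(), &mut BTreeSet::new(), &mut out)?;
    Ok(out)
}

fn main() {
    let mut files = HashMap::new();
    files.insert("common_types.yaml", SchemaFile { imports: vec![], types: vec!["Position3D"] });
    files.insert("robot_state.yaml", SchemaFile { imports: vec!["common_types.yaml"], types: vec!["RobotState"] });
    println!("{:?}", visible_types(&files, "robot_state.yaml"));
}
```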

Legacy Format

Legacy format is still supported for backward compatibility:

name: SensorData
rust:
  derives: [Clone, Copy, Debug]
  repr: C

fields:
  - name: temperature
    type: float
  - name: pressure
    type: float
  - name: timestamp
    type: uint64

The parser automatically detects the format version. V2 format is recommended for new schemas, but legacy format continues to work.
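One simple way to tell the formats apart is by top-level keys: V2 files define a schemas: mapping, while legacy files have top-level name: and fields: keys. The sketch below applies that heuristic with plain line scanning; the rule itself, `SchemaFormat`, and `detect_format` are assumptions, and a real parser would inspect the parsed YAML document instead.

```rust
#[derive(Debug, PartialEq)]
enum SchemaFormat {
    V2,
    Legacy,
    Unknown,
}

/// Guesses the schema format from top-level (unindented) YAML keys.
/// Hypothetical heuristic for illustration.
fn detect_format(yaml: &str) -> SchemaFormat {
    let top_level_keys: Vec<&str> = yaml
        .lines()
        // Top-level keys start in column 0 and are not comments or list items.
        .filter(|l| !l.starts_with(char::is_whitespace) && !l.starts_with('#') && !l.starts_with('-'))
        .filter_map(|l| l.split_once(':').map(|(k, _)| k.trim()))
        .collect();
    let has = |k: &str| top_level_keys.iter().any(|&x| x == k);

    if has("schemas") {
        SchemaFormat::V2
    } else if has("name") && has("fields") {
        SchemaFormat::Legacy
    } else {
        SchemaFormat::Unknown
    }
}

fn main() {
    let v2 = "schemas:\n  SensorData:\n    fields:\n      float temperature:\n";
    let legacy = "name: SensorData\nfields:\n  - name: temperature\n    type: float\n";
    println!("{:?} {:?}", detect_format(v2), detect_format(legacy));
}
```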

Build Process Architecture

Code generation happens automatically during cargo build:

Build Process Steps

Schema Discovery: build.rs scans the schemas/ directory for YAML files
Parsing: The parser detects the format version (V2 or legacy) and parses each schema
Validation: The type checker validates field types and resolves dependencies between schemas
Code Generation: Rust structs are generated with the appropriate derives and layout
Output: Generated code is written to OUT_DIR (the build output directory)
Inclusion: Source code pulls in the generated code with the include!() macro
Compilation: The Rust compiler type-checks the generated code like any other source

Generated files are written to OUT_DIR, which for a debug build is a directory such as target/debug/build/cerulion-*/out/, and are included via include!(concat!(env!("OUT_DIR"), "/sensor_data_generated.rs")).
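The discovery and output steps can be pictured as a small build.rs. This is a simplified sketch rather than Cerulion Core's build script: `discover_schemas`, `output_file_name`, and the `<stem>_generated.rs` naming rule are hypothetical, and parsing, validation, and generation are stubbed out. Only the OUT_DIR environment variable and the cargo:rerun-if-changed directive are standard Cargo behavior.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the `.yaml`/`.yml` files directly inside `dir`, sorted for deterministic output.
/// A missing directory yields an empty list instead of an error.
fn discover_schemas(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut schemas = Vec::new();
    for entry in entries {
        let path = entry?.path();
        let ext = path.extension().and_then(|e| e.to_str());
        if matches!(ext, Some("yaml") | Some("yml")) {
            schemas.push(path);
        }
    }
    schemas.sort();
    Ok(schemas)
}

/// Maps `schemas/sensor_data.yaml` to `sensor_data_generated.rs` (hypothetical naming rule).
fn output_file_name(schema: &Path) -> Option<String> {
    let stem = schema.file_stem()?.to_str()?;
    Some(format!("{stem}_generated.rs"))
}

fn run() -> io::Result<()> {
    // Ask Cargo to re-run this script when anything under schemas/ changes.
    println!("cargo:rerun-if-changed=schemas");
    // Cargo sets OUT_DIR when it runs a build script; skip output otherwise.
    let out_dir = match std::env::var_os("OUT_DIR") {
        Some(dir) => PathBuf::from(dir),
        None => return Ok(()),
    };
    for schema in discover_schemas(Path::new("schemas"))? {
        if let Some(name) = output_file_name(&schema) {
            // Placeholder for the real parse -> validate -> generate pipeline.
            let generated = format!("// generated from {}\n", schema.display());
            fs::write(out_dir.join(name), generated)?;
        }
    }
    Ok(())
}

fn main() {
    run().expect("schema code generation failed");
}
```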

Generated Code Structure

Simple Message

Schema:

schemas:
  SensorData:
    fields:
      float temperature:
      float pressure:
      uint64 timestamp:

Generated Rust:

#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

Key Characteristics:

Clone, Copy: Values are plain data with no heap pointers, so they can be placed in shared memory for zero-copy transport
Debug: Essential for development and debugging
repr(C): Ensures predictable memory layout across languages
All fields public: Direct access for high-performance code

Arrays with Partial Naming

Schema:

schemas:
  JointState:
    fields:
      float[20] positions:
        indexes:
          shoulder_left: 0
          elbow_left: 1
          wrist_left: 2
      uint64 timestamp:

Generated Rust:

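#[derive(Clone, Copy, Debug)]
#[repr(transparent)]
pub struct PositionsArray {
    data: [f32; 20],
}

impl PositionsArray {
    // Named accessors, one pair per entry in `indexes`
    pub fn shoulder_left(&self) -> f32 { self.data[0] }
    pub fn set_shoulder_left(&mut self, val: f32) { self.data[0] = val; }
    
    pub fn elbow_left(&self) -> f32 { self.data[1] }
    pub fn set_elbow_left(&mut self, val: f32) { self.data[1] = val; }

    pub fn wrist_left(&self) -> f32 { self.data[2] }
    pub fn set_wrist_left(&mut self, val: f32) { self.data[2] = val; }
    
    // Generic, bounds-checked accessors for every index (named or not)
    pub fn get(&self, index: usize) -> Option<f32> { self.data.get(index).copied() }
    pub fn set(&mut self, index: usize, val: f32) -> Result<(), ()> {
        let slot = self.data.get_mut(index).ok_or(())?;
        *slot = val;
        Ok(())
    }
}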

#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct JointState {
    pub positions: PositionsArray,
    pub timestamp: u64,
}

Partial naming lets you name only the array indices you know, while still accessing all elements. This is ideal for robot joints where you know some joint names but not all.
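In use, named and generic accessors address the same storage, so unnamed joints stay reachable by index. The sketch below repeats a trimmed copy of the generated `PositionsArray` so it compiles on its own; the `Default` derive and the exact bodies of `get` and `set` are assumptions about what the generator emits.

```rust
/// Trimmed copy of the generated wrapper (Default derive added for this sketch).
#[derive(Clone, Copy, Debug, Default)]
#[repr(transparent)]
pub struct PositionsArray {
    data: [f32; 20],
}

impl PositionsArray {
    pub fn shoulder_left(&self) -> f32 { self.data[0] }
    pub fn set_shoulder_left(&mut self, val: f32) { self.data[0] = val; }

    /// Bounds-checked read of any element, named or not.
    pub fn get(&self, index: usize) -> Option<f32> { self.data.get(index).copied() }

    /// Bounds-checked write; Err(()) if the index is past the end.
    pub fn set(&mut self, index: usize, val: f32) -> Result<(), ()> {
        let slot = self.data.get_mut(index).ok_or(())?;
        *slot = val;
        Ok(())
    }
}

fn main() {
    let mut positions = PositionsArray::default();
    positions.set_shoulder_left(0.25); // named joint
    positions.set(10, -1.0).unwrap();  // unnamed joint, by index
    assert_eq!(positions.get(0), Some(positions.shoulder_left())); // same storage
    println!("joint 10 = {:?}", positions.get(10));
    println!("joint 20 = {:?}", positions.get(20)); // out of range, so None
}
```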

Nested Custom Types

Schema:

schemas:
  Position3D:
    fields:
      float x:
      float y:
      float z:
  
  Quaternion:
    fields:
      float w:
      float x:
      float y:
      float z:
  
  Pose:
    fields:
      Position3D position:
      Quaternion orientation:

Generated Rust (dependency-ordered):

// Leaf types (Position3D, Quaternion) generated first
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Position3D {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Quaternion {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

// Pose generated after its dependencies
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Pose {
    pub position: Position3D,
    pub orientation: Quaternion,
}
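Emitting types in dependency order is a topological sort over the relation "a field of schema A uses schema B". The sketch below uses Kahn's algorithm; `dependency_order` is a hypothetical function, and reporting a cycle as an error is an assumption that follows from the fact that a struct cannot contain itself by value.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Orders schema types so every type appears after the custom types its fields use.
/// `deps` maps each schema name to the custom (non-primitive) types its fields reference.
/// Returns Err with the unresolved names if the dependencies contain a cycle.
fn dependency_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // in_degree[t] = number of distinct local schema types t still waits on.
    let mut in_degree: BTreeMap<&str, usize> = deps.keys().map(|&k| (k, 0)).collect();
    // users[d] = types whose fields reference d.
    let mut users: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&ty, uses) in deps {
        // Count each dependency once; ignore types defined elsewhere (e.g. imports).
        for &d in uses.iter().collect::<BTreeSet<_>>() {
            if deps.contains_key(d) {
                *in_degree.get_mut(ty).unwrap() += 1;
                users.entry(d).or_default().push(ty);
            }
        }
    }

    // Start with types that depend on nothing local.
    let mut ready: VecDeque<&str> = in_degree.iter().filter(|&(_, &n)| n == 0).map(|(&t, _)| t).collect();
    let mut order = Vec::new();
    while let Some(t) = ready.pop_front() {
        order.push(t.to_string());
        for &u in users.get(t).map(|v| v.as_slice()).unwrap_or(&[]) {
            let n = in_degree.get_mut(u).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(u);
            }
        }
    }

    if order.len() == deps.len() {
        Ok(order)
    } else {
        let stuck: Vec<String> = in_degree.into_iter().filter(|&(_, n)| n > 0).map(|(t, _)| t.to_string()).collect();
        Err(stuck)
    }
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("Pose", vec!["Position3D", "Quaternion"]);
    deps.insert("Position3D", vec![]);
    deps.insert("Quaternion", vec![]);
    println!("{:?}", dependency_order(&deps));
}
```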

Type Mappings

Schema types map to language-specific types with consistent memory layout:

Schema Type	Rust Type	Size (bytes)	Copy?
`bool`	`bool`	1	✅
`int8`	`i8`	1	✅
`int16`	`i16`	2	✅
`int32`	`i32`	4	✅
`int64`	`i64`	8	✅
`uint8`	`u8`	1	✅
`uint16`	`u16`	2	✅
`uint32`	`u32`	4	✅
`uint64`	`u64`	8	✅
`float`	`f32`	4	✅
`double`	`f64`	8	✅
`string_fixed[N]`	`[u8; N]`	N	✅
`T[N]`	`[T; N]`	N×sizeof(T)	✅

All generated types implement Copy, making them compatible with Cerulion Core’s zero-copy local transport and efficient network serialization.
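The mapping is regular enough to express as a lookup. The sketch below resolves a schema type string to its Rust type and size in bytes; `map_type` is hypothetical and covers only the primitives, string_fixed[N], and arrays of primitives from the table above (custom types would need a lookup of already-generated schemas).

```rust
/// Maps a primitive schema type name to (Rust type, size in bytes).
fn primitive(name: &str) -> Option<(&'static str, usize)> {
    Some(match name {
        "bool" => ("bool", 1),
        "int8" => ("i8", 1),
        "int16" => ("i16", 2),
        "int32" => ("i32", 4),
        "int64" => ("i64", 8),
        "uint8" => ("u8", 1),
        "uint16" => ("u16", 2),
        "uint32" => ("u32", 4),
        "uint64" => ("u64", 8),
        "float" => ("f32", 4),
        "double" => ("f64", 8),
        _ => return None,
    })
}

/// Maps a schema type, optionally with a `[N]` suffix, to (Rust type, size in bytes).
fn map_type(schema_type: &str) -> Option<(String, usize)> {
    match schema_type.split_once('[') {
        None => primitive(schema_type).map(|(ty, size)| (ty.to_string(), size)),
        Some((base, rest)) => {
            let n: usize = rest.strip_suffix(']')?.parse().ok()?;
            if base == "string_fixed" {
                // Fixed-size strings are raw byte buffers.
                Some((format!("[u8; {n}]"), n))
            } else {
                let (ty, size) = primitive(base)?;
                Some((format!("[{ty}; {n}]"), n * size))
            }
        }
    }
}

fn main() {
    for t in ["uint64", "float[3]", "string_fixed[32]", "double[4]"] {
        println!("{t} -> {:?}", map_type(t));
    }
}
```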

Multi-Language Support

The schema system is designed for multi-language code generation:

Language	Status	Generator	Binary Compatible
Rust	✅ Working	`build.rs` (automatic)	Native
Python	⏳ Designed	Future script	✅ Yes
C++	⏳ Designed	Future script	✅ Yes

Binary Compatibility

All languages will generate types with identical memory layouts.

Rust:

#[repr(C)]
pub struct SensorData {
    pub temperature: f32,  // offset 0
    pub pressure: f32,     // offset 4
    pub timestamp: u64,    // offset 8
}

Python (future):

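from dataclasses import dataclass
from typing import ClassVar

@dataclass
class SensorData:
    temperature: float  # offset 0 (packed as f32)
    pressure: float     # offset 4 (packed as f32)
    timestamp: int      # offset 8 (packed as uint64)
    
    _FORMAT: ClassVar[str] = '<ffQ'  # Little-endian, 2 floats, 1 uint64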

C++ (future):

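#include <cstdint>

struct SensorData {
    float temperature;       // offset 0
    float pressure;          // offset 4
    std::uint64_t timestamp; // offset 8
};  // natural C alignment, matching repr(C); no packing attribute

static_assert(sizeof(SensorData) == 16, "layout must match the Rust repr(C) struct");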

The repr(C) layout lets every language interpret the same bytes correctly on platforms that share the same endianness and C ABI, which is what makes zero-copy multi-language communication possible.
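The Python format string '<ffQ' (little-endian, standard sizes, no padding) describes the same 16 bytes as the Rust struct. The sketch below writes and reads exactly that layout in Rust regardless of the host's byte order; `encode_le` and `decode_le` are hypothetical helpers, not part of Cerulion Core's serialization API.

```rust
use std::convert::TryInto;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

/// Encodes SensorData as 16 little-endian bytes, matching Python's struct format '<ffQ'.
fn encode_le(d: &SensorData) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&d.temperature.to_le_bytes());
    out[4..8].copy_from_slice(&d.pressure.to_le_bytes());
    out[8..16].copy_from_slice(&d.timestamp.to_le_bytes());
    out
}

/// Decodes the same 16-byte little-endian layout.
fn decode_le(b: &[u8; 16]) -> SensorData {
    SensorData {
        temperature: f32::from_le_bytes(b[0..4].try_into().unwrap()),
        pressure: f32::from_le_bytes(b[4..8].try_into().unwrap()),
        timestamp: u64::from_le_bytes(b[8..16].try_into().unwrap()),
    }
}

fn main() {
    let d = SensorData { temperature: 23.5, pressure: 1013.25, timestamp: 1_234_567_890 };
    let bytes = encode_le(&d);
    assert_eq!(decode_le(&bytes), d);
    println!("{bytes:02x?}");
}
```

On the Python side, struct.unpack('<ffQ', data) would read the same bytes back.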

Integration with Cerulion Core

The schema system integrates with other Cerulion Core components:

With Serialization

Generated types work seamlessly with the serialization system:

include!(concat!(env!("OUT_DIR"), "/sensor_data_generated.rs"));

let data = SensorData {
    temperature: 23.5,
    pressure: 1013.25,
    timestamp: 1234567890,
};

// Serialize for network transport (possible because the type is Copy + repr(C))
let bytes = data.to_bytes()?;  // For network transport

Local transport: Zero-copy via shared memory (no serialization)
Network transport: Automatic byte serialization, enabled by the plain-data (Copy + repr(C)) layout
Type safety: Compile-time guarantees in Rust

See the Serialization section for details.
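Because SensorData is repr(C), Copy, and free of padding, its in-memory bytes already form a complete wire image for a receiver with the same endianness. The sketch below illustrates that idea with plain std; `as_bytes` and `from_bytes` are hypothetical helpers (not the to_bytes() method above), and the unsafe byte view is only sound for types without padding bytes.

```rust
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

// No padding: 4 + 4 + 8 bytes fill the whole 16-byte struct.
const _: () = assert!(size_of::<SensorData>() == 16);

/// Views the struct's memory as bytes (host endianness).
fn as_bytes(data: &SensorData) -> &[u8] {
    // SAFETY: SensorData is repr(C) with no padding, so every byte is initialized,
    // and the slice borrows `data` for its whole lifetime.
    unsafe { std::slice::from_raw_parts(data as *const SensorData as *const u8, size_of::<SensorData>()) }
}

/// Rebuilds a SensorData from bytes produced by `as_bytes` on a same-endian host.
fn from_bytes(bytes: &[u8]) -> Option<SensorData> {
    if bytes.len() != size_of::<SensorData>() {
        return None;
    }
    // SAFETY: length checked above; read_unaligned tolerates any alignment of `bytes`,
    // and every bit pattern is a valid f32 or u64.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const SensorData) })
}

fn main() {
    let data = SensorData { temperature: 23.5, pressure: 1013.25, timestamp: 1_234_567_890 };
    let bytes = as_bytes(&data).to_vec(); // copy out, e.g. into a network buffer
    assert_eq!(from_bytes(&bytes), Some(data));
    println!("{} bytes", bytes.len());
}
```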

With Publisher & Subscriber

Generated types are used directly with Publisher and Subscriber APIs:

include!(concat!(env!("OUT_DIR"), "/sensor_data_generated.rs"));

// Create publisher with generated type
let publisher = Publisher::<SensorData>::create("sensors")?;

// Send generated type
publisher.send(SensorData {
    temperature: 23.5,
    pressure: 1013.25,
    timestamp: 1234567890,
})?;

// Receive on subscriber
let subscriber = Subscriber::<SensorData>::create("sensors")?;
let data = subscriber.receive()?;
println!("Temperature: {}", data.temperature);

The type parameter ensures that publisher and subscriber agree on message types at compile time. See the Publisher & Subscriber section for details.
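The compile-time guarantee comes from the type parameter: a handle typed for SensorData cannot send or yield anything else. The sketch below imitates this with an in-process std::sync::mpsc channel; `TypedPublisher`, `TypedSubscriber`, and `typed_topic` are hypothetical stand-ins for Cerulion Core's Publisher and Subscriber, with no shared memory or network transport involved.

```rust
use std::sync::mpsc::{channel, Receiver, RecvError, SendError, Sender};

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

/// Publisher half, fixed to one message type T (hypothetical stand-in).
pub struct TypedPublisher<T: Copy> {
    tx: Sender<T>,
}

/// Subscriber half, fixed to the same T (hypothetical stand-in).
pub struct TypedSubscriber<T: Copy> {
    rx: Receiver<T>,
}

/// Creates a matched pair; both halves share T, so a type mismatch cannot compile.
pub fn typed_topic<T: Copy>() -> (TypedPublisher<T>, TypedSubscriber<T>) {
    let (tx, rx) = channel();
    (TypedPublisher { tx }, TypedSubscriber { rx })
}

impl<T: Copy> TypedPublisher<T> {
    pub fn send(&self, msg: T) -> Result<(), SendError<T>> {
        self.tx.send(msg)
    }
}

impl<T: Copy> TypedSubscriber<T> {
    pub fn receive(&self) -> Result<T, RecvError> {
        self.rx.recv()
    }
}

fn main() {
    let (publisher, subscriber) = typed_topic::<SensorData>();
    publisher.send(SensorData { temperature: 23.5, pressure: 1013.25, timestamp: 1_234_567_890 }).unwrap();
    let data = subscriber.receive().unwrap();
    println!("Temperature: {}", data.temperature);
    // publisher.send(42u32); // would not compile: expected SensorData, found u32
}
```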

With Topic Manager

Topic Manager validates that topics use consistent message types:

let manager = TopicManager::new()?;

// Register types with topic names
manager.register_publisher::<SensorData>("sensors")?;
manager.register_subscriber::<SensorData>("sensors")?;

// A different type on the same topic is rejected when it registers
// (a runtime check, since topic names are strings)
// manager.register_subscriber::<DifferentType>("sensors")?;  // ❌ Returns an error

See the Topic Manager section for details.
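Because topic names are runtime strings, a consistency check like this has to remember which type each topic carries. One common approach, assumed here, is to store a std::any::TypeId per topic and reject a later registration whose TypeId differs. `TypeRegistry` and `register` are hypothetical and do not mirror the actual TopicManager API.

```rust
use std::any::{type_name, TypeId};
use std::collections::HashMap;

/// Records which message type each topic carries (hypothetical).
#[derive(Default)]
pub struct TypeRegistry {
    topics: HashMap<String, (TypeId, &'static str)>,
}

impl TypeRegistry {
    /// Registers T for `topic`; errors if the topic already carries a different type.
    pub fn register<T: 'static>(&mut self, topic: &str) -> Result<(), String> {
        let new = (TypeId::of::<T>(), type_name::<T>());
        if let Some((existing, name)) = self.topics.get(topic) {
            if *existing != new.0 {
                return Err(format!("topic '{}' carries {}, not {}", topic, name, new.1));
            }
            return Ok(()); // same type registered again (e.g. publisher then subscriber)
        }
        self.topics.insert(topic.to_string(), new);
        Ok(())
    }
}

struct SensorData;
struct DifferentType;

fn main() {
    let mut registry = TypeRegistry::default();
    registry.register::<SensorData>("sensors").unwrap(); // publisher
    registry.register::<SensorData>("sensors").unwrap(); // subscriber, same type
    match registry.register::<DifferentType>("sensors") {
        Ok(()) => println!("unexpectedly accepted"),
        Err(e) => println!("rejected: {e}"),
    }
}
```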

Best Practices

1. Use V2 Format for New Schemas

# ✅ Good: V2 format (concise, modern)
schemas:
  SensorData:
    fields:
      float temperature:
      uint64 timestamp:

# ⚠️ Works but verbose: Legacy format
name: SensorData
fields:
  - name: temperature
    type: float

2. Group Related Types in Multi-Schema Files

# ✅ Good: Related types together
schemas:
  Position3D:
    fields: ...
  Quaternion:
    fields: ...
  Pose:
    fields: ...
    
# ⚠️ Less organized: Separate files
# position3d.yaml, quaternion.yaml, pose.yaml

Benefits:

Clear dependencies between types
Easier to maintain
Reduces file switching
Single import for related types

3. Use Shared Index Groups for Arrays

# ✅ Good: Define once, reuse
index_groups:
  joint_names:
    shoulder_left: 0
    elbow_left: 1

schemas:
  JointState:
    fields:
      float[20] positions:
        indexes: joint_names
      float[20] velocities:
        indexes: joint_names  # Reuse!

# ❌ Bad: Duplicate index definitions
schemas:
  JointState:
    fields:
      float[20] positions:
        indexes:
          shoulder_left: 0
          elbow_left: 1
      float[20] velocities:
        indexes:
          shoulder_left: 0  # Duplication!
          elbow_left: 1

4. Document Your Schemas

# ✅ Good: Documented
schemas:
  SensorData:
    description: Temperature and pressure sensor readings from environmental sensors
    fields:
      float temperature:
        description: Temperature in Celsius (-40 to 85°C range)
      float pressure:
        description: Pressure in hPa (sea level reference)

# ⚠️ Less maintainable: No documentation
schemas:
  SensorData:
    fields:
      float temperature:
      float pressure:

5. Keep Types Small and Focused

# ✅ Good: Small, focused types
schemas:
  Temperature:
    fields:
      float celsius:
      uint64 timestamp:
  
  Pressure:
    fields:
      float hpa:
      uint64 timestamp:

# ⚠️ Less flexible: Large, monolithic type
schemas:
  AllSensorData:
    fields:
      float temperature:
      float pressure:
      float humidity:
      float wind_speed:
      # ... 20 more fields

Benefits of small types:

Easier to version independently
More reusable across different contexts
Lower memory overhead when only a subset of the data is needed
Clearer semantic meaning

Memory Layout and Performance

Memory Alignment

Generated types use repr(C) for predictable memory layout:

#[repr(C)]
pub struct SensorData {
    pub temperature: f32,  // offset: 0, size: 4
    pub pressure: f32,     // offset: 4, size: 4
    pub timestamp: u64,    // offset: 8, size: 8
}
// Total size: 16 bytes

This ensures:

Consistent layout across compiler versions
Binary compatibility with C/C++
Predictable serialization size
Efficient zero-copy shared memory access
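These offsets can be checked rather than taken on trust. The sketch below measures field offsets with pointer arithmetic on a live value (which also works on Rust versions without std::mem::offset_of!) and contrasts SensorData with an assumed generated form of the Temperature schema from the best practices above, whose f32-then-u64 field order leaves padding on typical 64-bit targets.

```rust
use std::mem::{align_of, size_of};

#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

/// Assumed generated form of the `Temperature` schema.
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Temperature {
    pub celsius: f32,
    pub timestamp: u64,
}

/// Byte offset of `field` within `base` (both must refer to the same value).
fn offset<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

fn main() {
    let s = SensorData { temperature: 0.0, pressure: 0.0, timestamp: 0 };
    println!("SensorData: size {}, align {}", size_of::<SensorData>(), align_of::<SensorData>());
    println!("  temperature @ {}", offset(&s, &s.temperature));
    println!("  pressure    @ {}", offset(&s, &s.pressure));
    println!("  timestamp   @ {}", offset(&s, &s.timestamp));

    // Where u64 is 8-byte aligned (typical 64-bit targets), 4 padding bytes follow `celsius`.
    let t = Temperature { celsius: 0.0, timestamp: 0 };
    println!("Temperature: size {}, timestamp @ {}", size_of::<Temperature>(), offset(&t, &t.timestamp));
}
```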

Copy Semantics

All generated types implement Copy:

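let data1 = SensorData { temperature: 23.5, pressure: 1013.25, timestamp: 1234567890 };
let data2 = data1;  // Bitwise copy, not a move
println!("{} {}", data1.temperature, data2.temperature);  // data1 still usable here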

Benefits:

Works with iceoryx2 zero-copy transport
Efficient stack allocation
No heap allocations or reference counting
Predictable performance

Limitation: Only Copy-compatible types are allowed (no Vec, String, etc.)

For variable-length data, use a fixed-size array (float[100]) or fixed-size string (string_fixed[256]) sized for the largest value you expect to send.
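A string_fixed[N] field is a plain [u8; N], so storing text means copying bytes and padding the remainder. The sketch below assumes UTF-8 content, truncation at a character boundary, and zero padding with the first NUL marking the end; `write_fixed` and `read_fixed` are hypothetical helpers, and the convention Cerulion Core actually uses may differ.

```rust
/// Copies `s` into a zero-padded N-byte buffer, truncating at a UTF-8 character boundary.
fn write_fixed<const N: usize>(s: &str) -> [u8; N] {
    let mut buf = [0u8; N];
    // Longest prefix of at most N bytes that ends on a character boundary.
    let mut len = s.len().min(N);
    while !s.is_char_boundary(len) {
        len -= 1;
    }
    buf[..len].copy_from_slice(&s.as_bytes()[..len]);
    buf
}

/// Reads text back up to the first NUL byte (or the whole buffer if there is none).
fn read_fixed(buf: &[u8]) -> Option<&str> {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    std::str::from_utf8(&buf[..end]).ok()
}

fn main() {
    let username: [u8; 32] = write_fixed("alice");
    println!("{:?}", read_fixed(&username));
    // 'é' takes 2 bytes, so "hél" exactly fills a 4-byte buffer.
    let short: [u8; 4] = write_fixed("héllo");
    println!("{:?}", read_fixed(&short));
}
```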

Next Steps

Quick Start

Get started with your first schema and generated code

Serialization

Learn how generated types are serialized

Publisher & Subscriber

Use generated types with Publisher and Subscriber

Components

See how schemas fit into the component architecture
