Schema System

Cerulion Core’s schema system generates types automatically from YAML schema files. This build-time code generation provides type safety and cross-language consistency, and it eliminates manual struct definitions.

What is the Schema System?

The schema system is a build-time code generation architecture that transforms YAML schema definitions into type-safe message structs. Instead of writing struct definitions by hand, you define your message types once in YAML, and Cerulion Core generates the code automatically during compilation. Think of it as a compiler for your data types: you write the specification, and the system produces type-safe code that plugs directly into Cerulion Core’s transport and serialization layers.

Design Philosophy

Single Source of Truth

Define types once in YAML, use everywhere. No duplication between languages or components.

Type Safety

Generated code is type-checked at compile time (Rust, C++) or at runtime (Python). This prevents compatibility errors.

Consistency

Same schema produces identical binary layouts across all languages. Guaranteed interoperability.

Zero Boilerplate

No manual struct definitions. Code generation happens automatically during build.

Memory Layout Control

All types use repr(C) for predictable memory layout. Enables zero-copy local transport.

Copy Semantics

Generated types implement Copy, enabling efficient zero-copy shared memory transport.

Schema Format

Cerulion Core supports two schema formats: V2 format (recommended) and legacy format (backward compatible). V2 format uses inline field syntax inspired by ROS2:
schemas:
  SensorData:
    description: Temperature and pressure sensor readings
    fields:
      float temperature:
        description: Temperature in Celsius
      float pressure:
        description: Pressure in hPa
      uint64 timestamp:
        description: Timestamp in nanoseconds
Field Syntax: type[size] name:
fields:
  # Simple types
  float temperature:
  uint64 timestamp:
  bool is_active:
  
  # Fixed-size arrays
  float[3] position:
  uint8[16] uuid:
  
  # Fixed-size strings (Copy-compatible)
  string_fixed[32] username:
  
  # Custom types
  Position3D location:
  
  # Arrays of custom types
  Position3D[4] waypoints:
The inline syntax float[20] positions: is much more concise than the legacy format’s separate name, type, and size fields.
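To make the `type[size] name:` grammar concrete, here is a minimal sketch of how a field key could be split into its parts. The `FieldKey` struct and `parse_field_key` function are hypothetical names for illustration and are not part of Cerulion Core's parser.

```rust
/// Parsed form of a V2 field key such as `float[3] position:`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct FieldKey {
    pub type_name: String,
    pub array_len: Option<usize>,
    pub field_name: String,
}

/// Parse `type[size] name:`; the `[size]` part and the trailing colon are optional.
pub fn parse_field_key(key: &str) -> Option<FieldKey> {
    let key = key.trim().trim_end_matches(':').trim_end();
    // Split at the first whitespace: left is the type, right is the field name.
    let (type_part, field_name) = key.split_once(char::is_whitespace)?;
    let field_name = field_name.trim();
    if field_name.is_empty() || field_name.contains(char::is_whitespace) {
        return None;
    }
    // An optional `[N]` suffix on the type marks a fixed-size array.
    let (type_name, array_len) = match type_part.split_once('[') {
        Some((base, rest)) => {
            let len = rest.strip_suffix(']')?.parse::<usize>().ok()?;
            (base, Some(len))
        }
        None => (type_part, None),
    };
    if type_name.is_empty() {
        return None;
    }
    Some(FieldKey {
        type_name: type_name.to_string(),
        array_len,
        field_name: field_name.to_string(),
    })
}
```

Under these assumptions, `parse_field_key("Position3D[4] waypoints:")` yields the type `Position3D`, a length of 4, and the field name `waypoints`, while a key with a non-numeric size is rejected.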

Key Features

Multi-Schema Files
Define multiple related types in one file:
schemas:
  Position3D:
    fields:
      float x:
      float y:
      float z:
  
  Quaternion:
    fields:
      float w:
      float x:
      float y:
      float z:
  
  Pose:
    fields:
      Position3D position:
      Quaternion orientation:
Shared Index Groups
Define named array indices once, reuse across multiple fields:
index_groups:
  joint_names:
    shoulder_left: 0
    elbow_left: 1
    wrist_left: 2
    # ... more joints

schemas:
  JointState:
    fields:
      float[20] positions:
        indexes: joint_names  # Reference shared group
      float[20] velocities:
        indexes: joint_names  # Reuse same names
      float[20] effort:
        indexes: joint_names  # No duplication
This generates named accessors for all fields:
// All three arrays have the same named accessors
state.positions.shoulder_left();
state.positions.set_elbow_left(1.5);

state.velocities.shoulder_left();
state.effort.shoulder_left();
Imports
Share common definitions across schema files:
# common_types.yaml
schemas:
  Position3D:
    fields:
      float x:
      float y:
      float z:
# robot_state.yaml
imports:
  - common_types.yaml

schemas:
  RobotState:
    fields:
      Position3D base_position:  # Imported type

Legacy Format

Legacy format is still supported for backward compatibility:
name: SensorData
rust:
  derives: [Clone, Copy, Debug]
  repr: C

fields:
  - name: temperature
    type: float
  - name: pressure
    type: float
  - name: timestamp
    type: uint64
The parser automatically detects format version. V2 format is recommended for new schemas, but legacy format continues to work.

Build Process Architecture

Code generation happens automatically during cargo build:

Build Process Steps

  1. Schema Discovery: build.rs scans schemas/ directory for YAML files
  2. Parsing: Parser detects format version (V2 vs legacy) and parses schemas
  3. Validation: Type checker validates field types and resolves dependencies
  4. Code Generation: Generates Rust structs with appropriate derives and layout
  5. Output: Writes generated code to OUT_DIR (build output directory)
  6. Inclusion: Source code uses include!() macro to include generated code
  7. Compilation: Rust compiler validates generated code with full type checking
Generated files are placed in target/debug/build/cerulion-*/out/ and included via include!(concat!(env!("OUT_DIR"), "/sensor_data_generated.rs")).
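The discovery and output steps above can be sketched with the standard library alone. This is an illustrative outline, not Cerulion Core's actual build.rs: the function names are hypothetical, and the `generate` closure stands in for the parsing, validation, and code generation steps. The `cargo:rerun-if-changed` line is a standard Cargo build-script directive.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every `.yaml`/`.yml` file directly inside `dir`, sorted for determinism.
pub fn discover_schemas(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_yaml = matches!(
            path.extension().and_then(|e| e.to_str()),
            Some("yaml") | Some("yml")
        );
        if path.is_file() && is_yaml {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}

/// Map `schemas/sensor_data.yaml` to `<out_dir>/sensor_data_generated.rs`.
pub fn output_path(out_dir: &Path, schema: &Path) -> Option<PathBuf> {
    let stem = schema.file_stem()?.to_str()?;
    Some(out_dir.join(format!("{stem}_generated.rs")))
}

/// Body of a hypothetical build.rs `main`: discover, generate, write.
pub fn run_codegen(
    schema_dir: &Path,
    out_dir: &Path,
    generate: impl Fn(&str) -> String, // stand-in for parse + validate + emit
) -> io::Result<Vec<PathBuf>> {
    let mut written = Vec::new();
    for schema in discover_schemas(schema_dir)? {
        // Ask Cargo to re-run the build script when this schema changes.
        println!("cargo:rerun-if-changed={}", schema.display());
        let yaml = fs::read_to_string(&schema)?;
        if let Some(dest) = output_path(out_dir, &schema) {
            fs::write(&dest, generate(&yaml))?;
            written.push(dest);
        }
    }
    Ok(written)
}
```

In a real build script, `out_dir` would come from the `OUT_DIR` environment variable that Cargo sets for build scripts, and `schema_dir` would be the `schemas/` directory.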

Generated Code Structure

Simple Message

Schema:
schemas:
  SensorData:
    fields:
      float temperature:
      float pressure:
      uint64 timestamp:
Generated Rust:
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}
Key Characteristics:
  • Clone, Copy: Enables efficient zero-copy transport
  • Debug: Essential for development and debugging
  • repr(C): Ensures predictable memory layout across languages
  • All fields public: Direct access for high-performance code

Arrays with Partial Naming

Schema:
schemas:
  JointState:
    fields:
      float[20] positions:
        indexes:
          shoulder_left: 0
          elbow_left: 1
          wrist_left: 2
      uint64 timestamp:
Generated Rust:
#[derive(Clone, Copy, Debug)]
#[repr(transparent)]
pub struct PositionsArray {
    data: [f32; 20],
}

impl PositionsArray {
    // Named accessors
    pub fn shoulder_left(&self) -> f32 { self.data[0] }
    pub fn set_shoulder_left(&mut self, val: f32) { self.data[0] = val; }
    
    pub fn elbow_left(&self) -> f32 { self.data[1] }
    pub fn set_elbow_left(&mut self, val: f32) { self.data[1] = val; }
    
    // Generic accessors
    pub fn get(&self, index: usize) -> Option<f32> { ... }
    pub fn set(&mut self, index: usize, val: f32) -> Result<(), ()> { ... }
}

#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct JointState {
    pub positions: PositionsArray,
    pub timestamp: u64,
}
Partial naming lets you name only the array indices you know, while still accessing all elements. This is ideal for robot joints where you know some joint names but not all.
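The generic accessors are elided in the listing above. One plausible shape for them, written by hand as a sketch, is a bounds-checked read and write over the backing array. The `new` constructor and the exact bodies are assumptions; the real generated code may differ.

```rust
/// Hand-written stand-in for a generated array wrapper.
#[derive(Clone, Copy, Debug)]
#[repr(transparent)]
pub struct PositionsArray {
    data: [f32; 20],
}

impl PositionsArray {
    /// Assumed constructor: all elements start at zero.
    pub fn new() -> Self {
        Self { data: [0.0; 20] }
    }

    // Named accessors for an index listed under `indexes`.
    pub fn wrist_left(&self) -> f32 { self.data[2] }
    pub fn set_wrist_left(&mut self, val: f32) { self.data[2] = val; }

    /// Bounds-checked read: `None` when `index` is outside the array.
    pub fn get(&self, index: usize) -> Option<f32> {
        self.data.get(index).copied()
    }

    /// Bounds-checked write: `Err(())` when `index` is outside the array.
    pub fn set(&mut self, index: usize, val: f32) -> Result<(), ()> {
        match self.data.get_mut(index) {
            Some(slot) => {
                *slot = val;
                Ok(())
            }
            None => Err(()),
        }
    }
}
```

Named and generic accessors address the same storage, so a value written with `set_wrist_left` is visible through `get(2)`, and unnamed indices such as 7 remain reachable through `get` and `set`.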

Nested Custom Types

Schema:
schemas:
  Position3D:
    fields:
      float x:
      float y:
      float z:
  
  Pose:
    fields:
      Position3D position:
      Quaternion orientation:
Generated Rust (dependency-ordered):
// Position3D generated first
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Position3D {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

// Pose generated after its dependencies (Quaternion is generated the same way; omitted here)
#[derive(Clone, Copy, Debug)]
#[repr(C)]
pub struct Pose {
    pub position: Position3D,
    pub orientation: Quaternion,
}
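
Dependency ordering of this kind is usually a topological sort. The sketch below uses a depth-first traversal and is only an illustration of the technique; the function name and the input shape (a map from schema name to the custom types its fields use) are assumptions, not Cerulion Core's internals.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Order schema names so each appears after every type it depends on.
/// Returns `Err` on a dependency cycle or a reference to an unknown type.
pub fn dependency_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, String> {
    fn visit<'a>(
        name: &'a str,
        deps: &BTreeMap<&'a str, Vec<&'a str>>,
        done: &mut BTreeSet<&'a str>,
        in_progress: &mut BTreeSet<&'a str>,
        out: &mut Vec<String>,
    ) -> Result<(), String> {
        if done.contains(name) {
            return Ok(());
        }
        // Seeing a name that is still being visited means we looped back to it.
        if !in_progress.insert(name) {
            return Err(format!("dependency cycle through {name}"));
        }
        let children = deps
            .get(name)
            .ok_or_else(|| format!("unknown type {name}"))?;
        for &child in children {
            visit(child, deps, done, in_progress, out)?;
        }
        in_progress.remove(name);
        done.insert(name);
        // Emit a type only after all of its dependencies have been emitted.
        out.push(name.to_string());
        Ok(())
    }

    let mut done = BTreeSet::new();
    let mut in_progress = BTreeSet::new();
    let mut out = Vec::new();
    for &name in deps.keys() {
        visit(name, deps, &mut done, &mut in_progress, &mut out)?;
    }
    Ok(out)
}
```

For the `Pose` example, where `Pose` depends on `Position3D` and `Quaternion`, this places both dependencies before `Pose`. A schema that refers to itself through a chain of fields is reported as an error rather than looping forever.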

Type Mappings

Schema types map to language-specific types with consistent memory layout:
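| Schema Type | Rust Type | Size (bytes) | Copy? |
|---|---|---|---|
| bool | bool | 1 | ✅ |
| int8 | i8 | 1 | ✅ |
| int16 | i16 | 2 | ✅ |
| int32 | i32 | 4 | ✅ |
| int64 | i64 | 8 | ✅ |
| uint8 | u8 | 1 | ✅ |
| uint16 | u16 | 2 | ✅ |
| uint32 | u32 | 4 | ✅ |
| uint64 | u64 | 8 | ✅ |
| float | f32 | 4 | ✅ |
| double | f64 | 8 | ✅ |
| string_fixed[N] | [u8; N] | N | ✅ |
| T[N] | [T; N] | N×sizeof(T) | ✅ |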
All generated types implement Copy, making them compatible with Cerulion Core’s zero-copy local transport and efficient network serialization.
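The primitive rows of the table can be expressed as a small lookup, which is roughly what a code generator needs when it emits field types. The function names here are hypothetical; only the mappings themselves come from the table.

```rust
/// Rust type name and size in bytes for a primitive schema type.
pub fn primitive_mapping(schema_type: &str) -> Option<(&'static str, usize)> {
    Some(match schema_type {
        "bool" => ("bool", 1),
        "int8" => ("i8", 1),
        "int16" => ("i16", 2),
        "int32" => ("i32", 4),
        "int64" => ("i64", 8),
        "uint8" => ("u8", 1),
        "uint16" => ("u16", 2),
        "uint32" => ("u32", 4),
        "uint64" => ("u64", 8),
        "float" => ("f32", 4),
        "double" => ("f64", 8),
        // Custom types and string_fixed[N] are not primitives.
        _ => return None,
    })
}

/// Size in bytes of a fixed-size array `T[N]` of primitives: N × sizeof(T).
pub fn array_size(element: &str, len: usize) -> Option<usize> {
    primitive_mapping(element).map(|(_, size)| size * len)
}
```

For example, `float[20]` occupies 20 × 4 = 80 bytes, which matches the size of `[f32; 20]` in Rust.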

Multi-Language Support

The schema system is designed for multi-language code generation:
| Language | Status | Generator | Binary Compatible |
|---|---|---|---|
| Rust | ✅ Working | build.rs (automatic) | Native |
| Python | ⏳ Designed | Future script | ✅ Yes |
| C++ | ⏳ Designed | Future script | ✅ Yes |

Binary Compatibility

All languages will generate types with identical memory layouts:
Rust:
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,  // offset 0
    pub pressure: f32,     // offset 4
    pub timestamp: u64,    // offset 8
}
Python (future):
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class SensorData:
    temperature: float  # offset 0
    pressure: float     # offset 4
    timestamp: int      # offset 8
    
    _FORMAT: ClassVar[str] = '<ffQ'  # Little-endian, 2 floats, 1 uint64
C++ (future):
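#include <cstdint>

struct SensorData {
    float temperature;       // offset 0
    float pressure;          // offset 4
    std::uint64_t timestamp; // offset 8
};  // natural C layout matches Rust's repr(C); a packed attribute would diverge for structs that need padding
static_assert(sizeof(SensorData) == 16, "layout must match the Rust struct");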
The repr(C) layout ensures that the same bytes are interpreted identically in every language, provided both sides share the same endianness and field alignment rules. This is what enables zero-copy communication across languages.
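To see what "the same bytes" means for `SensorData`, the sketch below writes each field at its documented offset in little-endian order, which is the byte sequence the Python format string `<ffQ` describes. The `encode_le` and `decode_le` helpers are illustrative names, not Cerulion Core functions.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

/// Copy `N` bytes starting at `at` into a fixed-size array.
fn take<const N: usize>(buf: &[u8], at: usize) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&buf[at..at + N]);
    out
}

/// Encode fields at offsets 0, 4 and 8 as little-endian bytes.
pub fn encode_le(d: &SensorData) -> [u8; 16] {
    let mut buf = [0u8; 16];
    buf[0..4].copy_from_slice(&d.temperature.to_le_bytes());
    buf[4..8].copy_from_slice(&d.pressure.to_le_bytes());
    buf[8..16].copy_from_slice(&d.timestamp.to_le_bytes());
    buf
}

/// Decode the same 16-byte layout back into a struct.
pub fn decode_le(buf: &[u8; 16]) -> SensorData {
    SensorData {
        temperature: f32::from_le_bytes(take(buf, 0)),
        pressure: f32::from_le_bytes(take(buf, 4)),
        timestamp: u64::from_le_bytes(take(buf, 8)),
    }
}
```

On a little-endian host these 16 bytes are the same as the struct's in-memory representation, which is why no conversion step is needed for shared-memory transport between processes on the same machine.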

Integration with Cerulion Core

The schema system integrates with other Cerulion Core components:

With Serialization

Generated types work seamlessly with the serialization system:
include!(concat!(env!("OUT_DIR"), "/sensor_data_generated.rs"));

let data = SensorData {
    temperature: 23.5,
    pressure: 1013.25,
    timestamp: 1234567890,
};

// Automatic serialization via Copy trait
let bytes = data.to_bytes()?;  // For network transport
  • Local transport: Zero-copy via shared memory (no serialization)
  • Network transport: Automatic byte serialization using Copy trait
  • Type safety: Compile-time guarantees in Rust
See the Serialization section for details.

With Publisher & Subscriber

Generated types are used directly with Publisher and Subscriber APIs:
include!(concat!(env!("OUT_DIR"), "/sensor_data_generated.rs"));

// Create publisher with generated type
let publisher = Publisher::<SensorData>::create("sensors")?;

// Create the subscriber before sending so it is already listening
let subscriber = Subscriber::<SensorData>::create("sensors")?;

// Send generated type
publisher.send(SensorData {
    temperature: 23.5,
    pressure: 1013.25,
    timestamp: 1234567890,
})?;

// Receive on subscriber
let data = subscriber.receive()?;
println!("Temperature: {}", data.temperature);
The type parameter ensures that publisher and subscriber agree on message types at compile time. See the Publisher & Subscriber section for details.
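The way a shared type parameter ties both endpoints to one message type can be shown with the standard library's channel, used here purely as an analogy. This is not the Cerulion Core `Publisher`/`Subscriber` API; `roundtrip` is a hypothetical helper.

```rust
use std::sync::mpsc;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

/// Send one message through a typed channel and receive it again.
pub fn roundtrip(msg: SensorData) -> SensorData {
    // Both endpoints are created for the same `T`, here `SensorData`.
    let (tx, rx) = mpsc::channel::<SensorData>();
    tx.send(msg).expect("receiver is still alive");
    // `tx.send(42u32)` would not compile: the sender only accepts `SensorData`.
    rx.recv().expect("a message was sent")
}
```

The receiving side never has to inspect or cast the payload, because the compiler has already fixed the message type for both ends.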

With Topic Manager

Topic Manager validates that topics use consistent message types:
let manager = TopicManager::new()?;

// Register types with topic names
manager.register_publisher::<SensorData>("sensors")?;
manager.register_subscriber::<SensorData>("sensors")?;

// Type mismatch detected at compile time
// manager.register_subscriber::<DifferentType>("sensors")?;  // ❌ Compile error
See the Topic Manager section for details.
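When topics are identified by strings, one generic way to keep a single message type per topic is to record a type identifier at registration and compare it on later registrations. The sketch below shows that idea as a runtime check using `std::any::TypeId`. `TopicTypes` is a hypothetical stand-in and does not describe how Cerulion Core's TopicManager is implemented.

```rust
use std::any::{type_name, TypeId};
use std::collections::HashMap;

/// Hypothetical registry that remembers which message type each topic carries.
#[derive(Default)]
pub struct TopicTypes {
    by_topic: HashMap<String, (TypeId, &'static str)>,
}

impl TopicTypes {
    /// Record `T` for `topic`, or fail if the topic already carries another type.
    pub fn register<T: 'static>(&mut self, topic: &str) -> Result<(), String> {
        let wanted = (TypeId::of::<T>(), type_name::<T>());
        // The first registration wins; later ones are compared against it.
        let existing = *self.by_topic.entry(topic.to_string()).or_insert(wanted);
        if existing.0 == wanted.0 {
            Ok(())
        } else {
            Err(format!(
                "topic '{topic}' carries {}, not {}",
                existing.1, wanted.1
            ))
        }
    }
}
```

Registering the same type twice for a topic succeeds, while a second registration with a different type returns an error that names both types.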

Best Practices

1. Use V2 Format for New Schemas

# ✅ Good: V2 format (concise, modern)
schemas:
  SensorData:
    fields:
      float temperature:
      uint64 timestamp:

# ⚠️ Works but verbose: Legacy format
name: SensorData
fields:
  - name: temperature
    type: float

2. Group Related Types in One File

# ✅ Good: Related types together
schemas:
  Position3D:
    fields: ...
  Quaternion:
    fields: ...
  Pose:
    fields: ...
    
# ⚠️ Less organized: Separate files
# position3d.yaml, quaternion.yaml, pose.yaml
Benefits:
  • Clear dependencies between types
  • Easier to maintain
  • Reduces file switching
  • Single import for related types

3. Use Shared Index Groups for Arrays

# ✅ Good: Define once, reuse
index_groups:
  joint_names:
    shoulder_left: 0
    elbow_left: 1

schemas:
  JointState:
    fields:
      float[20] positions:
        indexes: joint_names
      float[20] velocities:
        indexes: joint_names  # Reuse!

# ❌ Bad: Duplicate index definitions
schemas:
  JointState:
    fields:
      float[20] positions:
        indexes:
          shoulder_left: 0
          elbow_left: 1
      float[20] velocities:
        indexes:
          shoulder_left: 0  # Duplication!
          elbow_left: 1

4. Document Your Schemas

# ✅ Good: Documented
schemas:
  SensorData:
    description: Temperature and pressure sensor readings from environmental sensors
    fields:
      float temperature:
        description: Temperature in Celsius (-40 to 85°C range)
      float pressure:
        description: Pressure in hPa (sea level reference)

# ⚠️ Less maintainable: No documentation
schemas:
  SensorData:
    fields:
      float temperature:
      float pressure:

5. Keep Types Small and Focused

# ✅ Good: Small, focused types
schemas:
  Temperature:
    fields:
      float celsius:
      uint64 timestamp:
  
  Pressure:
    fields:
      float hpa:
      uint64 timestamp:

# ⚠️ Less flexible: Large, monolithic type
schemas:
  AllSensorData:
    fields:
      float temperature:
      float pressure:
      float humidity:
      float wind_speed:
      # ... 20 more fields
Benefits of small types:
  • Easier to version independently
  • More reusable across different contexts
  • Lower memory overhead when only a subset is needed
  • Clearer semantic meaning

Memory Layout and Performance

Memory Alignment

Generated types use repr(C) for predictable memory layout:
#[repr(C)]
pub struct SensorData {
    pub temperature: f32,  // offset: 0, size: 4
    pub pressure: f32,     // offset: 4, size: 4
    pub timestamp: u64,    // offset: 8, size: 8
}
// Total size: 16 bytes
This ensures:
  • Consistent layout across compiler versions
  • Binary compatibility with C/C++
  • Predictable serialization size
  • Efficient zero-copy shared memory access
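The offsets and total size quoted above can be checked directly. The sketch below measures field offsets with pointer arithmetic and adds a second, hypothetical struct (`Padded`) to show that repr(C) inserts padding when a small field precedes a larger one. The `Padded` size assumes a target where `u64` is 8-byte aligned, as on typical 64-bit platforms.

```rust
use std::mem::size_of;

#[repr(C)]
pub struct SensorData {
    pub temperature: f32,
    pub pressure: f32,
    pub timestamp: u64,
}

/// Hypothetical struct: a 1-byte field followed by an 8-byte field.
#[repr(C)]
pub struct Padded {
    pub flag: u8,
    pub timestamp: u64,
}

/// Byte offset of `field` from the start of `base`.
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// Offsets of (temperature, pressure, timestamp) and the total size in bytes.
pub fn sensor_data_layout() -> ([usize; 3], usize) {
    let d = SensorData { temperature: 0.0, pressure: 0.0, timestamp: 0 };
    let offsets = [
        offset_in(&d, &d.temperature),
        offset_in(&d, &d.pressure),
        offset_in(&d, &d.timestamp),
    ];
    (offsets, size_of::<SensorData>())
}

/// Size of `Padded`: 1 data byte, padding up to the u64 alignment, then 8 bytes.
pub fn padded_size() -> usize {
    size_of::<Padded>()
}
```

`SensorData` has no padding because its two `f32` fields fill the first 8 bytes exactly. Ordering schema fields from largest to smallest is a common way to keep padding, and therefore message size, to a minimum.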

Copy Semantics

All generated types implement Copy:
let data1 = SensorData { ... };
let data2 = data1;  // Copied, not moved
// data1 still usable here
Benefits:
  • Works with iceoryx2 zero-copy transport
  • Efficient stack allocation
  • No heap allocations or reference counting
  • Predictable performance
Limitation: Only Copy-compatible types allowed (no Vec, String, etc.)
For dynamic-sized data, use fixed-size arrays (float[100]) or fixed-size strings (string_fixed[256]) instead.
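Working with `string_fixed[N]` means converting between `&str` and `[u8; N]` at the edges of your code. The helpers below are a sketch under the assumption that unused bytes are zero-filled and that the text ends at the first zero byte; the source does not specify the padding convention, and the function names are hypothetical.

```rust
/// Copy `s` into a zero-padded `[u8; N]`; `None` if `s` does not fit.
pub fn to_fixed<const N: usize>(s: &str) -> Option<[u8; N]> {
    let bytes = s.as_bytes();
    if bytes.len() > N {
        return None;
    }
    let mut out = [0u8; N];
    out[..bytes.len()].copy_from_slice(bytes);
    Some(out)
}

/// Read the text back, stopping at the first zero byte.
/// Returns `None` if the stored bytes are not valid UTF-8.
pub fn from_fixed<const N: usize>(buf: &[u8; N]) -> Option<&str> {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(N);
    std::str::from_utf8(&buf[..end]).ok()
}
```

Rejecting oversized input instead of truncating it avoids cutting a multi-byte UTF-8 character in half; choose the capacity `N` from the longest value you expect to carry.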
