Slow Lane¶
The slow lane is MOSAIC’s durable rendering and telemetry path. It carries every step and episode event from workers through gRPC, a publish-subscribe bus, and into SQLite: feeding both the live UI and persistent storage for replay and analytics. Where the Fast Lane optimises for latency, the slow lane optimises for completeness: every event is persisted.
Pipeline Overview¶
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
W["Worker Process<br/>(JSONL stdout)"]
PROXY["TrainerTelemetryProxy<br/>JSONL → gRPC"]
DAEMON["TrainerService<br/>(gRPC Daemon)"]
BUS["RunBus<br/>pub-sub · queue 2 048"]
LTC["LiveTelemetryController<br/>(background thread)"]
CM["CreditManager<br/>200 credits/stream"]
RSR["RenderingSpeedRegulator<br/>100 ms drain · queue 32"]
LTT["LiveTelemetryTab"]
SINK["TelemetryDBSink<br/>batch 256"]
DB[("SQLite / WAL")]
W --> PROXY --> DAEMON --> BUS
BUS --> LTC --> CM --> RSR --> LTT
BUS --> SINK --> DB
style BUS fill:#fff3e0,stroke:#e65100,color:#333
style DB fill:#e3f2fd,stroke:#1565c0,color:#333
All queue sizes above are governed by
Application Constants: see
constants_telemetry.py and constants_telemetry_bus.py.
Components¶
TrainerTelemetryProxy¶
Lives in gym_gui/services/trainer/trainer_telemetry_proxy.py. Tails the
worker’s JSONL stdout stream and translates each line into a gRPC
PublishRunSteps / PublishRunEpisodes call on the daemon. When
Fast Lane mode is active, the proxy also extracts RGB frames and writes
them to a FastLaneWriter: bridging the two lanes.
RunBus¶
An in-process publish-subscribe event bus (gym_gui/telemetry/run_bus.py)
that fans telemetry events to all subscribers. Key topics:
STEP_APPENDED: a new step arrived.EPISODE_FINALIZED: an episode completed.CONTROL: pause / resume / stop commands.
Default queue size: RUNBUS_DEFAULT_QUEUE_SIZE = 2048
(RUNBUS_UI_PATH_QUEUE_SIZE = 512, RUNBUS_DB_PATH_QUEUE_SIZE = 1024).
LiveTelemetryController¶
A QObject (gym_gui/controllers/live_telemetry_controllers.py) that
subscribes to the RunBus on a background thread and emits Qt signals
on the main thread.
Signals:
Signal |
Description |
|---|---|
|
First event for a new agent → create a Render Tabs
|
|
Aggregate stats changed (steps, episodes, mean reward). |
|
Training run finished: clean up resources. |
Subscription lifecycle:
sequenceDiagram
autonumber
participant C as LiveTelemetryController
participant RB as RunBus
participant CM as CreditManager
participant TAB as LiveTelemetryTab
note over C: Phase 1: Subscribe to a run
C->>RB: subscribe_to_runbus(run_id)
C->>CM: initialize_stream(run_id, "default")
note over C: Phase 2: First step triggers tab creation
RB-->>C: STEP_APPENDED event
C->>C: _process_step_queue()
C-->>TAB: emit run_tab_requested(run_id, agent_id, tab_name)
note over TAB: Phase 3: Tab registers itself
TAB->>C: register_tab(run_id, agent_id, tab)
C->>CM: grant_credits(run_id, agent_id, 200)
C->>TAB: flush buffered steps/episodes
Queue sizes: LIVE_STEP_QUEUE_SIZE = 64,
LIVE_EPISODE_QUEUE_SIZE = 64, LIVE_CONTROL_QUEUE_SIZE = 32.
CreditManager¶
Credit-based backpressure (gym_gui/telemetry/credit_manager.py) prevents
the bus from overwhelming the UI thread.
Method |
Behaviour |
|---|---|
|
Sets initial credits to |
|
Returns |
|
Grants credits up to |
When credits reach zero the producer pauses the UI rendering path: the
Application Constants CreditDefaults
(initial_credits=200, starvation_threshold=10) control the thresholds.
Database writes via TelemetryDBSink are never throttled.
RenderingSpeedRegulator¶
A QObject (gym_gui/telemetry/rendering_speed_regulator.py) that
decouples visual frame rendering from table/telemetry updates.
Maintains a bounded
dequeof render payloads (default maxRENDER_QUEUE_SIZE = 32).Drains at a configurable interval (default
100 ms→ ~10 FPS).Auto-drops oldest payloads when the queue is full: the GUI always shows the freshest available frame.
Emits
payload_ready(dict)when a frame should be painted.Buffers early payloads submitted before
start()is called; an auto-start timer (RENDER_BOOTSTRAP_TIMEOUT_MS = 500 ms) ensures the regulator eventually starts even if no explicitstart()call arrives.
regulator = RenderingSpeedRegulator(render_delay_ms=100)
regulator.payload_ready.connect(tab.render_payload)
regulator.start()
# Workers submit at arbitrary rate:
regulator.submit_payload({"rgb": frame_bytes, "reward": 0.5})
Human-Control Path¶
When a human plays, the slow lane takes a shorter route:
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
HIC["HumanInputController<br/>keyboard shortcuts"]
SC["SessionController"]
TEL["TelemetryService"]
STOR["StorageRecorderService<br/>JSONL ring"]
SQL["TelemetrySQLiteStore"]
DB[("SQLite / WAL")]
HIC --> SC --> TEL
TEL --> STOR
TEL --> SQL --> DB
style HIC fill:#9370db,stroke:#6a0dad,color:#fff
style DB fill:#e3f2fd,stroke:#1565c0,color:#333
HumanInputController captures keyboard events, SessionController emits
StepRecord objects, and TelemetryService fans them to both the JSONL
ring buffer and SQLite for durable persistence. The
Log Constants codes LOG401–LOG407
trace session lifecycle and human input events.
Agent-Control Path¶
Remote agents stream JSONL through trainer_telemetry_proxy.py, which
calls TrainerService.PublishRunSteps / PublishRunEpisodes.
TrainerService fans events onto RunBus; TelemetryDBSink drains
the bus into TelemetrySQLiteStore with batch writes
(DB_SINK_BATCH_SIZE = 256, DB_SINK_CHECKPOINT_INTERVAL = 4096).
Design Principles¶
WAL + batching: SQLite’s WAL mode with large batch writes keeps the slow lane efficient under high-frequency telemetry.
Hot vs cold storage:
LiveTelemetryTabshows hot data fromRunBus; SQLite provides cold storage for replay and post-hoc analysis.The GUI never blocks: all writes are asynchronous. Credit-based backpressure ensures the UI thread stays responsive.
Complementary to the fast lane: the Fast Lane gives real-time visuals while the slow lane guarantees every event is persisted for Architecture Overview analytics (W&B, TensorBoard).
See Also¶
Fast Lane: the zero-serialisation rendering path for live training.
Render Tabs:
LiveTelemetryTabis the slow-lane widget that receives regulated payloads.Rendering Strategies: the
RendererRegistrydecides how each slow-lane frame is painted.Application Constants: all queue sizes, batch sizes, and credit thresholds live in the constants package.