Ray RLlib Worker
The Ray RLlib worker is MOSAIC’s distributed multi-agent RL integration. It wraps Ray RLlib <https://docs.ray.io/en/latest/rllib/index.html>, Ray’s scalable reinforcement learning library, behind the standard shim pattern. The worker provides:

- distributed training across multiple CPUs or GPUs,
- support for PettingZoo multi-agent environments, and
- flexible policy configurations, including self-play and independent learning.
| Property | Details |
|---|---|
| Paradigm | Single-agent, Multi-agent (parameter sharing, independent, self-play, CTDE) |
| Algorithms | PPO, DQN, A2C, IMPALA, APPO |
| Environments | PettingZoo (SISL, Classic, Butterfly, MPE) |
| Execution | Ray cluster (distributed across workers, optionally multi-GPU) |
| GPU required | No (optional CUDA acceleration) |
| Source | |
Architecture
graph TB
subgraph "MOSAIC GUI"
FORM["Training Form<br/>(Advanced Config)"]
DAEMON["Trainer Daemon"]
end
subgraph "Ray Head Process"
CLI["cli.py<br/>entry point"]
CFG["config.py<br/>RayWorkerConfig"]
RT["runtime.py<br/>RayWorkerRuntime"]
FL["fastlane.py<br/>FastLane telemetry"]
SITE["sitecustomize.py"]
AP["algo_params.py<br/>schema-based hyperparams"]
PA["policy_actor.py<br/>inference actors"]
end
subgraph "Ray Workers"
W0["Rollout Worker 0"]
W1["Rollout Worker 1"]
WN["Rollout Worker N"]
end
subgraph "Upstream RLlib"
ALGO["PPO / DQN / IMPALA / APPO<br/>(unmodified RLlib algorithms)"]
end
FORM -->|"config JSON"| DAEMON
DAEMON -->|"spawn"| CLI
CLI --> CFG --> RT
RT --> AP
RT --> ALGO
RT --> PA
ALGO --> W0
ALGO --> W1
ALGO --> WN
FL -.->|"shared-memory frames"| DAEMON
style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
style RT fill:#ff7f50,stroke:#cc5500,color:#fff
style FL fill:#ff7f50,stroke:#cc5500,color:#fff
style SITE fill:#ff7f50,stroke:#cc5500,color:#fff
style AP fill:#ff7f50,stroke:#cc5500,color:#fff
style PA fill:#ff7f50,stroke:#cc5500,color:#fff
style W0 fill:#c8e6c9,stroke:#388e3c
style W1 fill:#c8e6c9,stroke:#388e3c
style WN fill:#c8e6c9,stroke:#388e3c
style ALGO fill:#e8e8e8,stroke:#999
Lifecycle of a training run:
1. The GUI form builds a config JSON and hands it to the Trainer Daemon.
2. The daemon spawns python -m ray_worker.cli --config <path> as a separate process.
3. cli.py loads the config into RayWorkerConfig and delegates to RayWorkerRuntime.
4. RayWorkerRuntime.run() initialises Ray (ray.init()), builds the RLlib AlgorithmConfig, and calls algorithm.train().
5. Ray distributes rollout workers across the available CPUs/GPUs.
6. On completion, the runtime saves a checkpoint and writes an analytics manifest to var/trainer/runs/.
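Steps 2 and 3 can be sketched as a small argparse entry point. This is a minimal illustration that assumes the config file is plain JSON. The names parse_args, load_config and main are hypothetical and do not reproduce the actual contents of cli.py. The hand-off to RayWorkerRuntime is also omitted.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Mirrors the documented invocation: python -m ray_worker.cli --config <path>
    parser = argparse.ArgumentParser(prog="ray_worker.cli")
    parser.add_argument(
        "--config",
        type=Path,
        required=True,
        help="path to the config JSON written by the Trainer Daemon",
    )
    return parser.parse_args(argv)


def load_config(path: Path) -> dict[str, Any]:
    # Step 3: read the JSON handed over by the daemon. The real cli.py turns this
    # into a RayWorkerConfig; this sketch keeps it as a plain dict.
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def main(argv: Optional[Sequence[str]] = None) -> dict[str, Any]:
    args = parse_args(argv)
    raw = load_config(args.config)
    # Step 4 would construct RayWorkerRuntime and call run(); that class is not
    # reproduced here, so the sketch stops after loading the config.
    return raw
```

Keeping argument parsing separate from loading lets the daemon-facing contract (a single --config flag) be tested without starting Ray.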
Supported Algorithms

| Algorithm | Type | Notes |
|---|---|---|
| PPO | On-policy policy gradient | Default choice; works for both discrete and continuous action spaces |
| DQN | Off-policy Q-learning | Discrete action spaces only |
| A2C | On-policy actor-critic | Synchronous variant of A3C |
| IMPALA | Distributed policy gradient | High-throughput asynchronous training |
| APPO | Asynchronous PPO | IMPALA with PPO-style clipping |
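The table implies one configuration constraint worth checking before launching a run: DQN rejects continuous action spaces. The following is a hypothetical pre-flight check, not part of the worker. It assumes that every algorithm other than DQN accepts both discrete and continuous actions. The table only states that explicitly for PPO.

```python
from enum import Enum


class ActionSpaceKind(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


# Hypothetical compatibility table derived from the notes above: only DQN is
# restricted to discrete actions; the others are assumed to support both.
SUPPORTED_ACTION_SPACES: dict[str, frozenset[ActionSpaceKind]] = {
    "PPO": frozenset(ActionSpaceKind),
    "DQN": frozenset({ActionSpaceKind.DISCRETE}),
    "A2C": frozenset(ActionSpaceKind),
    "IMPALA": frozenset(ActionSpaceKind),
    "APPO": frozenset(ActionSpaceKind),
}


def check_algorithm(algorithm: str, action_space: ActionSpaceKind) -> None:
    """Raise ValueError if the algorithm/action-space pair is not supported."""
    allowed = SUPPORTED_ACTION_SPACES.get(algorithm.upper())
    if allowed is None:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    if action_space not in allowed:
        raise ValueError(f"{algorithm} does not support {action_space.value} action spaces")
```

Failing early like this surfaces a configuration mistake in the GUI rather than deep inside an RLlib stack trace.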
Policy Configurations
The worker supports four multi-agent policy configurations, controlled by
PolicyConfiguration in config.py:

| Configuration | Description |
|---|---|
| Parameter Sharing | All agents share one policy. Sample-efficient for cooperative, homogeneous teams. |
| Independent | Each agent has its own policy, with no coordination signal between agents. Equivalent to running N independent single-agent learners (for example, N independent PPO agents). |
| Self-Play | The agent plays against frozen copies of itself. Produces competitive policies without a fixed opponent. Supports population-based training. |
| Shared Value Function | CTDE (centralised training, decentralised execution): separate actors per agent with a shared centralised critic. Conceptually equivalent to MAPPO, but implemented within the RLlib framework. |
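RLlib assigns agents to policies through a policy-mapping function, passed as the policy_mapping_fn argument of AlgorithmConfig.multi_agent(). The sketch below shows how each configuration could map agent IDs to policy IDs. The policy IDs ("shared_policy", "main", "main_frozen") are hypothetical and are not taken from config.py. The positional arguments that RLlib passes to the mapping function have changed across Ray versions, so each function accepts and ignores extra arguments.

```python
from typing import Any, Callable

MappingFn = Callable[..., str]


def parameter_sharing_mapping(agent_id: str, *args: Any, **kwargs: Any) -> str:
    # Every agent routes to the same policy, so all experience updates one network.
    return "shared_policy"


def independent_mapping(agent_id: str, *args: Any, **kwargs: Any) -> str:
    # One policy per agent ID, e.g. "pursuer_0" -> "policy_pursuer_0".
    return f"policy_{agent_id}"


def make_self_play_mapping(learner_agent: str) -> MappingFn:
    # The learner uses the trainable policy; every other agent is mapped to a
    # frozen snapshot that would be refreshed periodically during training.
    def mapping(agent_id: str, *args: Any, **kwargs: Any) -> str:
        return "main" if agent_id == learner_agent else "main_frozen"

    return mapping


# CTDE keeps one actor per agent; the shared critic lives in the model, not in
# the mapping, so (as an assumption) the mapping matches the Independent case.
shared_critic_mapping = independent_mapping
```

In a self-play setup, only "main" would be listed as trainable, and its weights would be copied into "main_frozen" at intervals. That copy step is not shown here.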
Supported Environments
The Ray worker integrates with the following PettingZoo environment families:

| Family | Notes |
|---|---|
| SISL | Cooperative continuous control; multiple agents share a reward |
| Classic | Turn-based board games; AEC API |
| Butterfly | Real-time cooperative/competitive games |
| MPE | Multi-agent particle environments; cooperative and adversarial |
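Each family corresponds to a PettingZoo subpackage. Environment modules such as pettingzoo.sisl.waterworld_v4 expose env() for the AEC API and, for most non-Classic environments, parallel_env() for the Parallel API. The following is a minimal resolver that assumes a PettingZoo release where all four families live under the pettingzoo package. The function names are hypothetical and are not the worker's API.

```python
import importlib
from typing import Any

# Family names as listed in the table above; each maps to a PettingZoo subpackage.
PETTINGZOO_FAMILIES = frozenset({"sisl", "classic", "butterfly", "mpe"})


def env_module_path(family: str, env_id: str) -> str:
    """Return the import path of a PettingZoo environment module."""
    family = family.lower()
    if family not in PETTINGZOO_FAMILIES:
        raise ValueError(f"unsupported family {family!r}; expected one of {sorted(PETTINGZOO_FAMILIES)}")
    return f"pettingzoo.{family}.{env_id}"


def make_env(family: str, env_id: str, parallel: bool = True, **kwargs: Any) -> Any:
    # Imports PettingZoo lazily; Classic (turn-based) environments are AEC-only,
    # so call them with parallel=False.
    module = importlib.import_module(env_module_path(family, env_id))
    factory = module.parallel_env if parallel else module.env
    return factory(**kwargs)
```

Validating the family before importing keeps unsupported families (for example, PettingZoo's Atari environments) from failing with an obscure import error.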
Configuration
The RayWorkerConfig dataclass (config.py) composes several
sub-configs:
from __future__ import annotations  # lets annotations reference classes defined later

from dataclasses import dataclass

# Excerpt: PolicyConfiguration, TrainingConfig, CheckpointConfig and
# PettingZooAPIType are defined elsewhere and not shown here.


@dataclass
class RayWorkerConfig:
    run_id: str
    environment: EnvironmentConfig  # env family + env_id + wrappers
    policy_configuration: PolicyConfiguration  # sharing / independent / self-play / shared value function
    training: TrainingConfig  # algorithm, timesteps, hyperparams
    resources: ResourceConfig  # num_workers, num_gpus, num_cpus
    checkpoint: CheckpointConfig  # save frequency, checkpoint dir


@dataclass
class EnvironmentConfig:
    family: str  # "sisl", "classic", "butterfly", "mpe"
    env_id: str  # e.g. "waterworld_v4"
    api_type: PettingZooAPIType  # AEC or PARALLEL


@dataclass
class ResourceConfig:
    num_workers: int = 2  # number of rollout workers
    num_gpus: float = 0.0  # GPU fraction for the head process
    num_cpus: int = 1  # CPUs per worker
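cli.py loads the JSON config into RayWorkerConfig. The self-contained sketch below shows that step for the two sub-configs defined above. The enum values "aec"/"parallel", the JSON key layout, and the parse_sections helper are all assumptions. The remaining sub-configs are left out.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any


class PettingZooAPIType(Enum):
    # Assumed enum values; the real enum is not shown in this page.
    AEC = "aec"
    PARALLEL = "parallel"


@dataclass
class EnvironmentConfig:
    family: str
    env_id: str
    api_type: PettingZooAPIType


@dataclass
class ResourceConfig:
    num_workers: int = 2
    num_gpus: float = 0.0
    num_cpus: int = 1


def parse_sections(raw: dict[str, Any]) -> tuple[EnvironmentConfig, ResourceConfig]:
    env = raw["environment"]
    environment = EnvironmentConfig(
        family=env["family"],
        env_id=env["env_id"],
        api_type=PettingZooAPIType(env["api_type"]),  # string -> enum member
    )
    # Missing resource keys fall back to the dataclass defaults.
    resources = ResourceConfig(**raw.get("resources", {}))
    return environment, resources


example = json.loads("""
{
  "run_id": "waterworld-ppo-demo",
  "environment": {"family": "sisl", "env_id": "waterworld_v4", "api_type": "parallel"},
  "resources": {"num_workers": 4}
}
""")
environment, resources = parse_sections(example)
```

Here num_cpus is counted per worker, as the ResourceConfig comment states. With that reading, the four rollout workers in this example request 4 * 1 = 4 CPUs in total. num_gpus keeps its default of 0.0.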
Algorithm hyperparameters are schema-driven via algo_params.py.
Each algorithm exposes a versioned JSON schema; the GUI reads the schema
to generate form fields dynamically.
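The schema-to-form idea can be illustrated with a hypothetical schema layout. The keys "params", "type", "min", "max" and "schema_version", the PPO_SCHEMA values, and the helper names are all assumptions. The schemas in algo_params.py may be structured differently. The sketch derives one form field per parameter and validates submitted values against the declared types and bounds.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical schema; values are illustrative, not the worker's defaults.
PPO_SCHEMA: dict[str, Any] = {
    "algorithm": "PPO",
    "schema_version": 1,
    "params": {
        "lr": {"type": "float", "default": 5e-5, "min": 0.0},
        "gamma": {"type": "float", "default": 0.99, "min": 0.0, "max": 1.0},
        "train_batch_size": {"type": "int", "default": 4000, "min": 1},
    },
}

_CASTS = {"float": float, "int": int}


@dataclass
class FormField:
    name: str
    kind: str
    default: Any
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def fields_from_schema(schema: dict[str, Any]) -> list[FormField]:
    # One widget per parameter, in schema order (dicts preserve insertion order).
    return [
        FormField(name, spec["type"], spec["default"], spec.get("min"), spec.get("max"))
        for name, spec in schema["params"].items()
    ]


def validate_params(schema: dict[str, Any], values: dict[str, Any]) -> dict[str, Any]:
    # Fill in defaults, coerce GUI strings to the declared type, and enforce bounds.
    result: dict[str, Any] = {}
    for f in fields_from_schema(schema):
        value = _CASTS[f.kind](values.get(f.name, f.default))
        if f.minimum is not None and value < f.minimum:
            raise ValueError(f"{f.name}={value} is below {f.minimum}")
        if f.maximum is not None and value > f.maximum:
            raise ValueError(f"{f.name}={value} is above {f.maximum}")
        result[f.name] = value
    return result
```

A version field such as schema_version would let the GUI detect a schema it does not understand before rendering a form.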
Policy Actor and Evaluation
The worker ships a dedicated inference layer (policy_actor.py) for
loading trained RLlib checkpoints and running policy evaluation without
starting a full Ray cluster:
from ray_worker import EvaluationConfig, RayPolicyConfig, create_ray_actor, run_evaluation

# Load a trained PPO checkpoint into an inference actor.
actor = create_ray_actor(RayPolicyConfig(
    checkpoint_path="var/trainer/runs/my_run/checkpoint_000100",
    algorithm="PPO",
    env_id="waterworld_v4",
))

# Run 20 evaluation episodes with the loaded policy.
results = run_evaluation(EvaluationConfig(
    actor=actor,
    num_episodes=20,
))
RayPolicyController wraps multiple actors for multi-agent evaluation,
mapping each agent ID to its corresponding policy checkpoint.
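Mapping each agent ID to its own checkpoint amounts to a dictionary dispatch. The sketch below is a hypothetical stand-in for RayPolicyController, not its actual interface. The compute_action method on each actor is an assumed interface.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Protocol


class PolicyActor(Protocol):
    # Assumed interface; actors returned by create_ray_actor may expose a different method.
    def compute_action(self, observation: Any) -> Any: ...


@dataclass
class MultiAgentController:
    """Hypothetical stand-in for RayPolicyController: one actor per agent ID."""

    actors: Mapping[str, PolicyActor]

    def act(self, observations: Mapping[str, Any]) -> dict[str, Any]:
        # Fail loudly if an agent has no checkpoint mapped to it.
        missing = set(observations) - set(self.actors)
        if missing:
            raise KeyError(f"no policy registered for agents: {sorted(missing)}")
        # Route each agent's observation to the actor loaded for that agent.
        return {agent: self.actors[agent].compute_action(obs) for agent, obs in observations.items()}
```

For a parameter-sharing run, every agent ID would simply map to the same actor object, so only one checkpoint is loaded.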
FastLane Telemetry
FastLane streams render frames to the MOSAIC GUI via shared memory.
The Ray worker’s fastlane.py hooks into RLlib’s callback system to
emit frames on each rollout step without modifying the upstream algorithm.
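The shared-memory side of this can be illustrated with the standard library's multiprocessing.shared_memory. This is a minimal sketch under an assumed buffer layout: an 8-byte header holding a frame counter, width and height, followed by raw RGB bytes. FastLane's real layout, its synchronisation, and the RLlib callback that would call write() on each step are not shown. FrameWriter and read_latest are hypothetical names.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: uint32 frame counter, uint16 width, uint16 height, then RGB bytes.
HEADER = struct.Struct("<IHH")


class FrameWriter:
    def __init__(self, name: str, width: int, height: int) -> None:
        self.width, self.height = width, height
        self.frame_bytes = width * height * 3
        self.shm = shared_memory.SharedMemory(name=name, create=True, size=HEADER.size + self.frame_bytes)
        self.counter = 0

    def write(self, rgb: bytes) -> None:
        if len(rgb) != self.frame_bytes:
            raise ValueError("frame size does not match width * height * 3")
        # Payload first, header last, so a new counter signals a fully written frame.
        # A production channel would need stronger synchronisation than this ordering.
        self.shm.buf[HEADER.size:HEADER.size + self.frame_bytes] = rgb
        self.counter += 1
        HEADER.pack_into(self.shm.buf, 0, self.counter, self.width, self.height)

    def close(self) -> None:
        self.shm.close()
        self.shm.unlink()  # the creating side removes the segment


def read_latest(name: str) -> tuple[int, int, int, bytes]:
    # Reader side (e.g. the GUI): attach, copy the latest frame out, detach.
    shm = shared_memory.SharedMemory(name=name)
    try:
        counter, width, height = HEADER.unpack_from(shm.buf, 0)
        payload = bytes(shm.buf[HEADER.size:HEADER.size + width * height * 3])
    finally:
        shm.close()
    return counter, width, height, payload
```

Because the reader only copies the latest frame, a slow GUI drops frames instead of blocking the rollout workers.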
GUI Integration
The Ray RLlib worker is configured via the Advanced Config panel in the MOSAIC training dashboard. Unlike CleanRL and XuanCe, it does not currently have a dedicated form widget; all parameters are passed as a raw JSON config.
Worker Discovery
The worker registers itself via the mosaic.workers entry point in
pyproject.toml:
[project.entry-points."mosaic.workers"]
ray = "ray_worker:get_worker_metadata"
get_worker_metadata() returns a WorkerCapabilities descriptor
advertising support for self-play, population-based training, pause/resume,
and up to 100 agents across the sisl, classic, butterfly, and
mpe environment families.