Worker Examples¶
MOSAIC ships with three production workers. Each demonstrates a different integration strategy: vendored source, pip dependency, and local package.
CleanRL Worker¶
Strategy: Git submodule (vendored source)
The CleanRL worker wraps CleanRL, a collection of single-file RL implementations, as a MOSAIC-compatible subprocess.
pip install -e ".[cleanrl]"
Supported Algorithms: PPO, DQN, SAC, TD3, DDPG, C51
graph TB
subgraph "cleanrl_worker/ (Shim)"
CLI["cli.py<br/>Entry point"]
CFG["config.py<br/>MOSAIC JSON → CLI args"]
RT["runtime.py<br/>Lifecycle management"]
TEL["telemetry.py<br/>JSONL emission"]
FL["fastlane.py<br/>Shared-memory rendering"]
AN["analytics.py<br/>TB/W&B manifests"]
end
subgraph "cleanrl/ (Upstream)"
PPO["ppo.py"]
DQN["dqn.py"]
SAC["sac.py"]
end
CLI --> CFG --> RT
RT --> TEL
RT --> FL
RT --> AN
RT -->|"subprocess"| PPO
RT -->|"subprocess"| DQN
RT -->|"subprocess"| SAC
style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
style RT fill:#ff7f50,stroke:#cc5500,color:#fff
style TEL fill:#ff7f50,stroke:#cc5500,color:#fff
style FL fill:#ff7f50,stroke:#cc5500,color:#fff
style AN fill:#ff7f50,stroke:#cc5500,color:#fff
style PPO fill:#e8e8e8,stroke:#999
style DQN fill:#e8e8e8,stroke:#999
style SAC fill:#e8e8e8,stroke:#999
Directory layout:
3rd_party/workers/cleanrl_worker/
├── cleanrl/ ← Git submodule (upstream)
│ └── cleanrl/
│ ├── ppo.py
│ ├── dqn.py
│ └── sac.py
├── cleanrl_worker/ ← MOSAIC shim
│ ├── __init__.py
│ ├── cli.py
│ ├── config.py
│ ├── runtime.py
│ ├── telemetry.py
│ ├── analytics.py
│ ├── fastlane.py
│ └── algorithms/
│ └── ppo_with_save.py
├── pyproject.toml
└── tests/
How it works:
The Daemon spawns
python -m cleanrl_worker.cli --config worker.jsoncli.pyparses the MOSAIC config and creates aCleanRLWorkerRuntimeruntime.pytranslates config fields to CleanRL CLI argumentsCleanRL runs as a subprocess;
sitecustomize.pyinjects telemetry hooksTraining metrics are emitted as JSONL to stdout
The Telemetry Proxy reads stdout and streams to the Daemon
Configuration example:
config = {
"algorithm": "ppo",
"learning_rate": 3e-4,
"total_timesteps": 1_000_000,
"num_envs": 4,
"capture_video": True,
}
XuanCe Worker¶
Strategy: Local package (3rd_party/workers/xuance_worker/)
The XuanCe worker integrates XuanCe, a comprehensive multi-agent RL library, for cooperative and competitive training.
pip install -e ".[xuance]"
Supported Algorithms: MAPPO, QMIX, MADDPG, VDN, COMA, IPPO, IQL
graph TB
subgraph "xuance_worker/ (Shim)"
CLI2["cli.py"]
CFG2["config.py"]
RT2["runtime.py"]
TEL2["telemetry.py"]
end
subgraph "xuance/ (Upstream)"
MAPPO["MAPPO"]
QMIX["QMIX"]
VDN["VDN"]
end
CLI2 --> CFG2 --> RT2 --> TEL2
RT2 -->|"in-process"| MAPPO
RT2 -->|"in-process"| QMIX
RT2 -->|"in-process"| VDN
style CLI2 fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG2 fill:#ff7f50,stroke:#cc5500,color:#fff
style RT2 fill:#ff7f50,stroke:#cc5500,color:#fff
style TEL2 fill:#ff7f50,stroke:#cc5500,color:#fff
style MAPPO fill:#e8e8e8,stroke:#999
style QMIX fill:#e8e8e8,stroke:#999
style VDN fill:#e8e8e8,stroke:#999
Key difference from CleanRL: XuanCe algorithms are called in-process (imported as Python modules) rather than spawned as subprocesses. This allows tighter integration with MOSAIC’s multi-agent environment adapters.
Directory layout:
3rd_party/workers/xuance_worker/
├── xuance/ ← Upstream XuanCe source
│ └── xuance/
│ ├── learner/
│ ├── agent/
│ └── environment/
├── xuance_worker/ ← MOSAIC shim
│ ├── __init__.py
│ ├── cli.py
│ ├── config.py
│ ├── runtime.py
│ └── telemetry.py
└── pyproject.toml
Configuration example:
config = {
"algorithm": "mappo",
"learning_rate": 5e-4,
"batch_size": 256,
"backend": "torch",
}
RLlib Worker¶
Strategy: Pip dependency (ray[rllib])
The RLlib worker uses Ray RLlib for distributed, GPU-accelerated training at scale.
pip install -e ".[ray-rllib]"
Supported Algorithms: PPO, IMPALA, APPO, SAC, DQN
graph TB
subgraph "ray_worker/ (Shim)"
CLI3["cli.py"]
CFG3["config.py"]
RT3["runtime.py"]
CB["callbacks.py<br/>RLlib callback hooks"]
end
subgraph "Ray Cluster"
HEAD["Ray Head"]
ACT1["Actor 1"]
ACT2["Actor 2"]
ACT3["Actor N"]
HEAD --> ACT1
HEAD --> ACT2
HEAD --> ACT3
end
CLI3 --> CFG3 --> RT3
RT3 --> CB
RT3 -->|"ray.init()"| HEAD
style CLI3 fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG3 fill:#ff7f50,stroke:#cc5500,color:#fff
style RT3 fill:#ff7f50,stroke:#cc5500,color:#fff
style CB fill:#ff7f50,stroke:#cc5500,color:#fff
style HEAD fill:#e8e8e8,stroke:#999
style ACT1 fill:#e8e8e8,stroke:#999
style ACT2 fill:#e8e8e8,stroke:#999
style ACT3 fill:#e8e8e8,stroke:#999
Key difference: RLlib manages its own cluster of Ray actors.
The shim translates MOSAIC configs to RLlib AlgorithmConfig
objects and hooks into RLlib’s callback system to emit MOSAIC
telemetry.
Directory layout:
3rd_party/workers/ray_worker/
├── ray_worker/ ← MOSAIC shim
│ ├── __init__.py
│ ├── cli.py
│ ├── config.py
│ ├── runtime.py
│ ├── telemetry.py
│ ├── callbacks.py
│ └── fastlane.py
└── pyproject.toml
Configuration example:
config = {
"algorithm": "PPO",
"num_workers": 8,
"num_envs_per_worker": 4,
"framework": "torch",
}