Integrated Workers
MOSAIC ships with eight production-ready workers that wrap major RL frameworks, LLM evaluation suites, multi-agent LLM coordination, LLM chess play, human-in-the-loop control, and baseline agents. Each worker follows the shim pattern: upstream libraries are never modified; a thin integration layer translates between MOSAIC and the library.
| Worker | Paradigm | Algorithms / Models | Environments | Execution Model |
|---|---|---|---|---|
| | Multi-Agent LLM | OpenRouter, GPT-4o, Claude 3, Gemini, vLLM | MultiGrid Soccer/Collect, BabyAI, PettingZoo | Subprocess (interactive) |
| | Single-Agent | PPO, DQN, SAC, TD3, DDPG, C51 | Gymnasium, Atari, MiniGrid, BabyAI, Procgen | Subprocess |
| | Multi-Agent | MAPPO, QMIX, MADDPG, VDN, COMA + 40 more | PettingZoo, SMAC, MultiGrid, MPE | In-process |
| | Both | PPO, IMPALA, APPO, DQN, A2C | PettingZoo (SISL, Classic, Butterfly, MPE) | Ray cluster |
| | LLM/VLM Evaluation | GPT-4o, Claude 3, Gemini, vLLM (local) | NetHack, MiniHack, BabyAI, Crafter, TextWorld | Subprocess (parallel) |
| | LLM Chess | GPT-4o, Claude 3, Gemini, vLLM (local) | PettingZoo Chess (chess_v6) | Subprocess (interactive) |
| | Human-in-the-Loop | Human action selection via GUI | MiniGrid, Crafter, PettingZoo, Classic Control | Subprocess (interactive) |
| | Baseline Agent | random, noop, cycling (no training) | All Gymnasium-compatible environments | Subprocess |
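The shim pattern mentioned above can be illustrated with a minimal sketch. Everything here is hypothetical: `UpstreamTrainer` stands in for an unmodified third-party library, and `ShimRequest` and `TrainerShim` are illustrative names, not MOSAIC classes. The point is only that translation happens in the thin layer, while the upstream class is used exactly as published.

```python
from dataclasses import dataclass


class UpstreamTrainer:
    """Stand-in for an unmodified third-party trainer (hypothetical)."""

    def __init__(self, env_id: str, total_timesteps: int) -> None:
        self.env_id = env_id
        self.total_timesteps = total_timesteps

    def learn(self) -> dict:
        # A real library would train here; the stand-in only echoes its inputs.
        return {"env": self.env_id, "steps": self.total_timesteps}


@dataclass
class ShimRequest:
    """Hypothetical platform-side request; the field names are assumptions."""

    run_id: str
    environment: str
    budget_steps: int


class TrainerShim:
    """Thin layer: translate the request, delegate, translate the result."""

    def __init__(self, request: ShimRequest) -> None:
        self.request = request

    def run(self) -> dict:
        # Map platform vocabulary onto the upstream constructor arguments.
        trainer = UpstreamTrainer(
            env_id=self.request.environment,
            total_timesteps=self.request.budget_steps,
        )
        result = trainer.learn()
        # Wrap the upstream result in a platform-style summary.
        return {"run_id": self.request.run_id, "status": "completed", **result}


summary = TrainerShim(ShimRequest("demo-run", "CartPole-v1", 1000)).run()
```

Because the shim owns all of the naming and format translation, upgrading the upstream library only requires touching the shim, never a patched copy of the library.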
Each worker provides:
CLI entry point for subprocess launching by the Trainer Daemon
Configuration dataclass implementing the WorkerConfig protocol
Runtime orchestrator managing the training lifecycle
FastLane telemetry for real-time frame streaming to the GUI
GUI form widgets for visual experiment configuration
Automatic discovery via Python entry points
graph TB
subgraph "MOSAIC GUI"
FORM["Training Form<br/>(per-worker UI)"]
DAEMON["Trainer Daemon"]
end
subgraph "Worker Subprocess"
CLI["cli.py"]
CFG["config.py"]
RT["runtime.py"]
FL["fastlane.py"]
SITE["sitecustomize.py"]
end
subgraph "Upstream Library"
LIB["CleanRL / XuanCe / RLlib<br/>(unmodified)"]
end
FORM -->|"config JSON"| DAEMON
DAEMON -->|"spawn"| CLI
CLI --> CFG --> RT
RT --> FL
RT --> LIB
SITE -.->|"import-time patches"| LIB
style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
style RT fill:#ff7f50,stroke:#cc5500,color:#fff
style FL fill:#ff7f50,stroke:#cc5500,color:#fff
style SITE fill:#ff7f50,stroke:#cc5500,color:#fff
style LIB fill:#e8e8e8,stroke:#999
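The "config JSON" and "spawn" edges in the diagram can be approximated with a small, self-contained sketch. The inline `WORKER_CLI` script, the `--config` flag, and the JSON status line are all assumptions made for illustration; the actual flags and output of each worker's cli.py are not documented in this section.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a worker's cli.py: parse a flag, load the config,
# and report a status line on stdout.
WORKER_CLI = """
import argparse, json
parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)
args = parser.parse_args()
with open(args.config) as fh:
    cfg = json.load(fh)
print(json.dumps({"run_id": cfg["run_id"], "status": "started"}))
"""


def spawn_worker(config: dict) -> dict:
    """Write the config as JSON, launch the worker in a subprocess, parse its reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        path.write_text(json.dumps(config))
        proc = subprocess.run(
            [sys.executable, "-c", WORKER_CLI, "--config", str(path)],
            capture_output=True,
            text=True,
            check=True,  # raise if the worker exits with a non-zero code
        )
    return json.loads(proc.stdout)


status = spawn_worker({"run_id": "demo-run", "env_id": "CartPole-v1"})
```

Running the worker in its own process keeps a crash or a conflicting dependency in one upstream library from taking down the GUI, which is presumably why most workers in the table above use a subprocess execution model.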
GUI Integration
Each worker has dedicated GUI form widgets for experiment configuration:
| Worker | Form Widgets | Purpose |
|---|---|---|
| CleanRL | | Standard training, custom scripts, checkpoint resume, policy evaluation |
| XuanCe | | Standard training (with backend selection), custom scripts |
| Ray RLlib | (Configured via Advanced Config) | Distributed training setup |