Integrated Workers

MOSAIC ships with twelve production-ready workers that wrap major RL frameworks, LLM evaluation suites, VLM multimodal agents, multi-agent LLM coordination, LLM chess play, human-in-the-loop control, and baseline agents. Each worker follows the shim pattern: upstream libraries are never modified; a thin integration layer translates between MOSAIC and the library.

Worker	Paradigm	Algorithms / Models	Environments	Execution Model
MOSAIC LLM	Multi-Agent LLM	OpenRouter, GPT-4o, Claude 3, Gemini, vLLM	MultiGrid Soccer/Collect, Melting Pot, Google Research Football, Minecraft	Subprocess
MOSAIC VLM	Multi-Agent VLM	OpenRouter, GPT-4o, Claude 3, Gemini, vLLM (multimodal)	MultiGrid, Melting Pot, Google Research Football, Minecraft	Subprocess
MOSAIC Human	Human-in-the-Loop	Human action selection via GUI	MiniGrid, Crafter, PettingZoo, Classic Control	Subprocess
MOSAIC Random	Baseline Agent	random (uniform sampling, no training)	All Gymnasium-compatible environments	Subprocess
MOSAIC Passive	Passive Baseline	noop / still (env-aware, no training)	All Gymnasium-compatible environments	Subprocess
CleanRL	Single-Agent	PPO, DQN, SAC, TD3, DDPG, C51	Gymnasium, Atari, MiniGrid, BabyAI, Procgen	Subprocess
XuanCe	Multi-Agent	MAPPO, QMIX, MADDPG, VDN, COMA + 40 more	PettingZoo, SMAC, MultiGrid, MPE, Google Research Football	Subprocess
Ray RLlib	Single-Agent, Multi-Agent	PPO, IMPALA, APPO, DQN, A2C	PettingZoo (SISL, Classic, Butterfly, MPE)	Subprocess
BALROG	Single-Agent, LLM/VLM	GPT-4o, Claude 3, Gemini, vLLM (local)	NetHack, MiniHack, BabyAI, Crafter, TextWorld	Subprocess
Chess LLM	LLM Chess	GPT-4o, Claude 3, Gemini, vLLM (local)	PettingZoo Chess (chess_v6)	Subprocess
Tianshou	Single-Agent, Multi-Agent, MARL, Model-based RL	DQN, C51, Rainbow, IQN, PG, A2C, TRPO, PPO, DDPG, TD3, SAC, REDQ, BCQ, CQL, GAIL + more	Gymnasium, Atari, MuJoCo, Classic Control, Box2D	Subprocess
Jumanji	JAX-based RL environment suite	A2C, PPO (hardware-accelerated via JAX)	BinPack, TSP, CVRP, Knapsack, Game2048, Routing, Cleaner	Subprocess
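
The shim pattern described above can be illustrated with a minimal sketch. It is not taken from the MOSAIC codebase: `upstream_train`, `ShimConfig`, and `run_shim` are hypothetical names, and the upstream function is a stand-in for whatever entry point a real library exposes. The point is only that the translation lives in the shim, while the upstream callable is used as-is.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


def upstream_train(env_id: str, total_timesteps: int, seed: int) -> Dict[str, Any]:
    """Stand-in for an unmodified upstream training function (hypothetical)."""
    return {"env_id": env_id, "steps": total_timesteps, "seed": seed}


@dataclass(frozen=True)
class ShimConfig:
    """MOSAIC-side settings; the field names are illustrative assumptions."""
    environment: str
    max_steps: int
    seed: int = 0


def run_shim(
    config: ShimConfig,
    train_fn: Callable[..., Dict[str, Any]] = upstream_train,
) -> Dict[str, Any]:
    # Translate MOSAIC-side field names into the upstream call signature.
    # The upstream function itself is never edited or subclassed.
    return train_fn(
        env_id=config.environment,
        total_timesteps=config.max_steps,
        seed=config.seed,
    )


result = run_shim(ShimConfig(environment="CartPole-v1", max_steps=1000, seed=7))
print(result)
```

Keeping the mapping in one function means an upstream signature change only touches the shim, not the rest of the worker.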

Each worker provides:

CLI entry point for subprocess launching by the Trainer Daemon
Configuration dataclass implementing the WorkerConfig protocol
Runtime orchestrator managing the training lifecycle
FastLane telemetry for real-time frame streaming to the GUI
GUI form widgets for visual experiment configuration
Automatic discovery via Python entry points
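
As a rough sketch of how three of the pieces listed above might fit together (config dataclass, CLI entry point, entry-point discovery), consider the following. The shape of `WorkerConfig` shown here (a `run_id` attribute and a `to_dict` method), the `--config-json` flag, and the `mosaic.workers` entry-point group are all assumptions made for illustration; the actual protocol, flags, and group name may differ.

```python
import argparse
import json
from dataclasses import asdict, dataclass
from importlib.metadata import entry_points
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class WorkerConfig(Protocol):
    """Assumed shape of the protocol; the real one may differ."""
    run_id: str

    def to_dict(self) -> Dict[str, Any]: ...


@dataclass
class ExampleWorkerConfig:
    """Hypothetical config for an example worker."""
    run_id: str
    env_id: str = "CartPole-v1"
    seed: int = 0

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


def parse_cli(argv: Optional[List[str]] = None) -> ExampleWorkerConfig:
    # CLI entry point: the launcher passes the config as a JSON string
    # (the flag name is an assumption).
    parser = argparse.ArgumentParser(description="Example worker CLI (sketch)")
    parser.add_argument("--config-json", required=True, help="Config as a JSON string")
    args = parser.parse_args(argv)
    return ExampleWorkerConfig(**json.loads(args.config_json))


def discover_workers(group: str = "mosaic.workers") -> Dict[str, Any]:
    # Collect installed entry points registered under `group`.
    # The group name is hypothetical; an unknown group yields an empty dict.
    eps = entry_points()
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep for ep in selected}


config = parse_cli(["--config-json", '{"run_id": "demo-001", "seed": 3}'])
print(isinstance(config, WorkerConfig), config.to_dict())
print(sorted(discover_workers()))
```

Using a structural `Protocol` rather than a base class lets each worker define its own dataclass without importing a shared parent, which fits the thin-shim approach.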

graph TB
    subgraph "MOSAIC GUI"
        FORM["Training Form<br/>(per-worker UI)"]
        DAEMON["Trainer Daemon"]
    end

    subgraph "Worker Subprocess"
        CLI["cli.py"]
        CFG["config.py"]
        RT["runtime.py"]
        FL["fastlane.py"]
        SITE["sitecustomize.py"]
    end

    subgraph "Upstream Library"
        LIB["CleanRL / XuanCe / RLlib<br/>(unmodified)"]
    end

    FORM -->|"config JSON"| DAEMON
    DAEMON -->|"spawn"| CLI
    CLI --> CFG --> RT
    RT --> FL
    RT --> LIB
    SITE -.->|"import-time patches"| LIB

    style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
    style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
    style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
    style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
    style RT fill:#ff7f50,stroke:#cc5500,color:#fff
    style FL fill:#ff7f50,stroke:#cc5500,color:#fff
    style SITE fill:#ff7f50,stroke:#cc5500,color:#fff
    style LIB fill:#e8e8e8,stroke:#999
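
The "config JSON" and "spawn" edges in the diagram can be sketched as follows. This is an illustration of the general subprocess hand-off, not the Trainer Daemon's actual code: `CHILD_SOURCE` is a hypothetical stand-in for a worker's `cli.py`, and passing the config as a command-line argument is an assumption (a real launcher might use a file or stdin instead).

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a worker's cli.py: parse the config and report back.
CHILD_SOURCE = (
    "import json, sys\n"
    "cfg = json.loads(sys.argv[1])\n"
    "print(json.dumps({'status': 'started', 'run_id': cfg['run_id']}))\n"
)


def spawn_worker(config: dict) -> dict:
    # Launch the worker in a separate Python process so that the worker's
    # dependencies and any crash stay isolated from the parent.
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE, json.dumps(config)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return json.loads(completed.stdout)


reply = spawn_worker({"run_id": "demo-001", "worker": "example"})
print(reply)
```

Running each worker in its own process is what allows libraries with conflicting dependencies or global state to coexist under one GUI.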

GUI Integration

Each worker has dedicated GUI form widgets for experiment configuration:

Worker	Form Widgets	Purpose
CleanRL	`cleanrl_train_form.py` `cleanrl_script_form.py` `cleanrl_resume_form.py` `cleanrl_policy_form.py`	Standard training, custom scripts, checkpoint resume, policy evaluation
XuanCe	`xuance_train_form.py` `xuance_script_form.py`	Standard training (with backend selection), custom scripts
Tianshou	`tianshou_train_form.py` `tianshou_script_form.py` `tianshou_resume_form.py` `tianshou_policy_form.py`	Standard training, custom scripts, checkpoint resume, policy evaluation
Ray RLlib	(Configured via Advanced Config)	Distributed training setup