Integrated Workers

MOSAIC ships with twelve production-ready workers that wrap major RL frameworks, LLM evaluation suites, VLM multimodal agents, multi-agent LLM coordination, LLM chess play, human-in-the-loop control, and baseline agents. Each worker follows the shim pattern: upstream libraries are never modified; a thin integration layer translates between MOSAIC and the library.
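As a rough illustration of the shim pattern described above, the sketch below wraps a stand-in "upstream" training loop without touching it and only translates its output into a different record shape. Every name here (`upstream_train`, `TrainingShim`, the `step` and `reward` keys) is hypothetical and chosen for illustration; none of it is taken from MOSAIC or from any of the wrapped libraries.

```python
from typing import Callable, Dict, Iterator


def upstream_train(steps: int) -> Iterator[Dict[str, float]]:
    """Stand-in for a library training loop (hypothetical).

    In a real shim this function would be imported from the upstream
    package and left completely unmodified.
    """
    for step in range(steps):
        yield {"global_step": step, "episodic_return": step * 1.5}


class TrainingShim:
    """Hypothetical integration layer between a host app and a library."""

    def __init__(
        self,
        train_fn: Callable[[int], Iterator[Dict[str, float]]],
        emit: Callable[[Dict[str, float]], None],
    ) -> None:
        self._train_fn = train_fn  # upstream entry point, used as-is
        self._emit = emit          # host-side sink for translated records

    def run(self, steps: int) -> int:
        """Drive the upstream loop and translate each record it yields."""
        count = 0
        for record in self._train_fn(steps):
            # Translation is the shim's only job: rename upstream keys
            # into the (assumed) host-side schema.
            self._emit(
                {"step": record["global_step"], "reward": record["episodic_return"]}
            )
            count += 1
        return count


if __name__ == "__main__":
    events = []
    shim = TrainingShim(upstream_train, events.append)
    shim.run(3)
    print(events)
```

The design point is that all adaptation lives in the wrapper, so upgrading the upstream library should only require revisiting the translation code.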

| Worker | Paradigm | Algorithms / Models | Environments | Execution Model |
|---|---|---|---|---|
| MOSAIC LLM | Multi-Agent LLM | OpenRouter, GPT-4o, Claude 3, Gemini, vLLM | MultiGrid Soccer/Collect, Melting Pot, Google Research Football, Minecraft | Subprocess |
| MOSAIC VLM | Multi-Agent VLM | OpenRouter, GPT-4o, Claude 3, Gemini, vLLM (multimodal) | MultiGrid, Melting Pot, Google Research Football, Minecraft | Subprocess |
| MOSAIC Human | Human-in-the-Loop | Human action selection via GUI | MiniGrid, Crafter, PettingZoo, Classic Control | Subprocess |
| MOSAIC Random | Baseline Agent | random (uniform sampling, no training) | All Gymnasium-compatible environments | Subprocess |
| MOSAIC Passive | Passive Baseline | noop / still (env-aware, no training) | All Gymnasium-compatible environments | Subprocess |
| CleanRL | Single-Agent | PPO, DQN, SAC, TD3, DDPG, C51 | Gymnasium, Atari, MiniGrid, BabyAI, Procgen | Subprocess |
| XuanCe | Multi-Agent | MAPPO, QMIX, MADDPG, VDN, COMA + 40 more | PettingZoo, SMAC, MultiGrid, MPE, Google Research Football | Subprocess |
| Ray RLlib | Both | PPO, IMPALA, APPO, DQN, A2C | PettingZoo (SISL, Classic, Butterfly, MPE) | Subprocess |
| BALROG | Single-Agent, LLM/VLM | GPT-4o, Claude 3, Gemini, vLLM (local) | NetHack, MiniHack, BabyAI, Crafter, TextWorld | Subprocess |
| Chess LLM | LLM Chess | GPT-4o, Claude 3, Gemini, vLLM (local) | PettingZoo Chess (chess_v6) | Subprocess |
| Tianshou | Single-Agent, Multi-Agent, MARL, Model-based RL | DQN, C51, Rainbow, IQN, PG, A2C, TRPO, PPO, DDPG, TD3, SAC, REDQ, BCQ, CQL, GAIL + more | Gymnasium, Atari, MuJoCo, Classic Control, Box2D | Subprocess |
| Jumanji | A suite of scalable reinforcement learning environments written in JAX | A2C, PPO (hardware-accelerated via JAX) | BinPack, TSP, CVRP, Knapsack, Game2048, Routing, Cleaner | Subprocess |

Each worker provides:

  • CLI entry point for subprocess launching by the Trainer Daemon

  • Configuration dataclass implementing the WorkerConfig protocol

  • Runtime orchestrator managing the training lifecycle

  • FastLane telemetry for real-time frame streaming to the GUI

  • GUI form widgets for visual experiment configuration

  • Automatic discovery via Python entry points

graph TB
    subgraph "MOSAIC GUI"
        FORM["Training Form<br/>(per-worker UI)"]
        DAEMON["Trainer Daemon"]
    end

    subgraph "Worker Subprocess"
        CLI["cli.py"]
        CFG["config.py"]
        RT["runtime.py"]
        FL["fastlane.py"]
        SITE["sitecustomize.py"]
    end

    subgraph "Upstream Library"
        LIB["CleanRL / XuanCe / RLlib<br/>(unmodified)"]
    end

    FORM -->|"config JSON"| DAEMON
    DAEMON -->|"spawn"| CLI
    CLI --> CFG --> RT
    RT --> FL
    RT --> LIB
    SITE -.->|"import-time patches"| LIB

    style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
    style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
    style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
    style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
    style RT fill:#ff7f50,stroke:#cc5500,color:#fff
    style FL fill:#ff7f50,stroke:#cc5500,color:#fff
    style SITE fill:#ff7f50,stroke:#cc5500,color:#fff
    style LIB fill:#e8e8e8,stroke:#999
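
The daemon-to-worker hand-off in the diagram above (config JSON passed to a spawned CLI) might look roughly like the sketch below. It is an assumption-laden illustration, not MOSAIC's implementation: the `--config-json` flag, the inline worker script, and the one-line JSON status message are all invented for the example, and a real worker would go on to start its runtime and telemetry instead of exiting.

```python
import json
import subprocess
import sys
from typing import Any, Dict

# Inline stand-in for a worker's cli.py (hypothetical). It parses the
# config handed over by the parent and reports back on stdout.
WORKER_SCRIPT = r"""
import argparse, json
parser = argparse.ArgumentParser()
parser.add_argument("--config-json", required=True)
args = parser.parse_args()
cfg = json.loads(args.config_json)
print(json.dumps({"event": "started", "run_id": cfg["run_id"], "algo": cfg["algo"]}))
"""


def spawn_worker(config: Dict[str, Any]) -> Dict[str, Any]:
    """Launch the worker in a separate process and return its status line."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SCRIPT, "--config-json", json.dumps(config)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker exits non-zero
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(spawn_worker({"run_id": "demo", "algo": "ppo"}))
```

Running each worker in its own process keeps a crash or a conflicting dependency in one library from taking down the GUI, which is one plausible reason every worker in the table uses the Subprocess execution model.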
    

GUI Integration

Each worker has dedicated GUI form widgets for experiment configuration:

| Worker | Form Widgets | Purpose |
|---|---|---|
| CleanRL | cleanrl_train_form.py, cleanrl_script_form.py, cleanrl_resume_form.py, cleanrl_policy_form.py | Standard training, custom scripts, checkpoint resume, policy evaluation |
| XuanCe | xuance_train_form.py, xuance_script_form.py | Standard training (with backend selection), custom scripts |
| Tianshou | tianshou_train_form.py, tianshou_script_form.py, tianshou_resume_form.py, tianshou_policy_form.py | Standard training, custom scripts, checkpoint resume, policy evaluation |
| Ray RLlib | (Configured via Advanced Config) | Distributed training setup |