Workers

A Worker is a process-isolated wrapper around a reinforcement learning library or framework. Workers are the execution layer of the platform: they run the actual training, evaluation, and custom scripts, while MOSAIC handles orchestration, telemetry, and visualization.

Some workers are designed for training reinforcement learning algorithms and often bundle training scripts, evaluation scripts, and benchmark files. Others are designed for evaluation only, such as the LLM benchmarking workers. What they all share is the same simple interface, which lets the GUI interact with them in a consistent way.

graph TB
    subgraph "MOSAIC Core"
        GUI["Qt6 GUI<br/>(Main Process)"]
        Daemon["Trainer Daemon<br/>(AsyncIO)"]
    end

    subgraph "Worker Sub-Processes"
        W1["CleanRL Worker<br/>PPO · DQN · SAC"]
        W2["XuanCe Worker<br/>MAPPO · QMIX"]
        W3["RLlib Worker<br/>PPO · IMPALA"]
        W4["Tianshou Worker<br/>PPO · DQN"]
        W5["Jumanji Worker<br/>JAX Combinatorial"]
        W6["BALROG Worker<br/>Single-Agent LLM"]
        W7["MOSAIC LLM Worker<br/>Multi-Agent LLM"]
    end

    GUI -- "gRPC" --> Daemon
    Daemon -- "spawn + JSONL" --> W1
    Daemon -- "spawn + JSONL" --> W2
    Daemon -- "spawn + JSONL" --> W3
    Daemon -- "spawn + JSONL" --> W4
    Daemon -- "spawn + JSONL" --> W5
    Daemon -- "spawn + JSONL" --> W6
    Daemon -- "spawn + JSONL" --> W7

    style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
    style Daemon fill:#50c878,stroke:#2e8b57,color:#fff
    style W1 fill:#ff7f50,stroke:#cc5500,color:#fff
    style W2 fill:#ff7f50,stroke:#cc5500,color:#fff
    style W3 fill:#ff7f50,stroke:#cc5500,color:#fff
    style W4 fill:#ff7f50,stroke:#cc5500,color:#fff
    style W5 fill:#ff7f50,stroke:#cc5500,color:#fff
    style W6 fill:#ff7f50,stroke:#cc5500,color:#fff
    style W7 fill:#ff7f50,stroke:#cc5500,color:#fff
    

Key Principles

Process Isolation

Each worker runs as a separate OS process. A worker crash never takes down the GUI or other workers.
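The isolation guarantee can be illustrated with a minimal sketch that uses only the Python standard library. The child script, the `run_worker` helper, and the `"event"` field are hypothetical stand-ins, not MOSAIC's actual daemon code. The child emits one JSON line and then exits with a non-zero code to simulate a crash; the parent only observes the exit code and keeps running.

```python
import json
import subprocess
import sys

# Hypothetical worker body: emits one JSON line, then "crashes" on purpose.
CHILD_CODE = """
import json, sys
print(json.dumps({"event": "started"}), flush=True)
sys.exit(3)  # simulate a worker crash
"""


def run_worker(code: str) -> tuple[int, list[dict]]:
    """Run `code` in a separate Python process and collect its JSON lines."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    # Each non-empty stdout line is expected to be one JSON object.
    events = [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
    return proc.returncode, events


returncode, events = run_worker(CHILD_CODE)
# The parent process is unaffected; it only sees the child's exit status.
print(returncode, events)
```

Because the failure is confined to the child process, the parent can decide what to do next (log it, restart the worker, or report it to the user) without any special crash handling inside the worker.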

Zero Modification

Upstream libraries (CleanRL, Ray, XuanCe) are never modified. A thin “shim” layer translates between MOSAIC and the library.
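One way such a shim can work is by translating a MOSAIC-side configuration into the command-line arguments an upstream training script already accepts, so the upstream code stays untouched. The sketch below is an assumption for illustration: `build_argv` and the configuration keys are hypothetical names, and the resulting flags are not checked against any particular library version.

```python
from typing import Any


def build_argv(config: dict[str, Any]) -> list[str]:
    """Turn {"env_id": "X"} style settings into ["--env-id", "X"] style flags."""
    argv: list[str] = []
    for key, value in config.items():
        flag = "--" + key.replace("_", "-")  # snake_case key -> kebab-case flag
        argv.extend([flag, str(value)])
    return argv


# Hypothetical configuration as it might arrive from the orchestration layer.
argv = build_argv({"env_id": "CartPole-v1", "total_timesteps": 1000})
print(argv)
```

The list can then be appended to the command that launches the unmodified upstream script in its own process.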

JSONL Telemetry

Workers emit structured JSON lines to stdout, one JSON object per line. This keeps the output mechanism as simple as possible: no gRPC client code is required inside the worker itself.
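A worker-side emitter can be very small. The following is a minimal sketch, assuming a hypothetical `emit` helper and made-up field names (`event`, `ts`, `episode`, `reward`); the actual MOSAIC telemetry schema is not shown in this section. The example writes to an in-memory buffer so the output can be parsed back; a real worker would leave the stream at its default, stdout.

```python
import io
import json
import sys
import time
from typing import Any, TextIO


def emit(event: str, stream: TextIO = sys.stdout, **fields: Any) -> None:
    """Write one JSON object on a single line and flush immediately."""
    record = {"event": event, "ts": time.time(), **fields}
    stream.write(json.dumps(record) + "\n")
    stream.flush()  # avoid buffering so the reader sees events promptly


# Write to an in-memory buffer and parse the line back, as a reader would.
buf = io.StringIO()
emit("episode_end", stream=buf, episode=1, reward=21.5)
record = json.loads(buf.getvalue())
print(record["event"], record["episode"], record["reward"])
```

Flushing after every record matters when stdout is a pipe, since Python otherwise buffers output and the reader may see events late.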

Automatic Discovery

Workers register via Python entry points ([project.entry-points."mosaic.workers"]). The GUI discovers them at startup.

Protocol-Based

Workers implement Python Protocol classes instead of inheriting from base classes.
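Structural typing with `typing.Protocol` means a class conforms simply by having the right methods. The sketch below is illustrative only: `WorkerProtocol`, its `run` method, and the example classes are hypothetical names, not MOSAIC's actual interface.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class WorkerProtocol(Protocol):
    """Hypothetical interface: anything with a matching `run` method conforms."""

    def run(self, config: dict[str, Any]) -> int: ...


class EchoWorker:
    # No inheritance from WorkerProtocol is needed.
    def run(self, config: dict[str, Any]) -> int:
        return 0  # exit status


class NotAWorker:
    pass


print(isinstance(EchoWorker(), WorkerProtocol))
print(isinstance(NotAWorker(), WorkerProtocol))
```

Note that `runtime_checkable` only verifies that the method exists, not its signature; a static type checker is what validates argument and return types.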

Available Workers

| Worker | Paradigm | Algorithms | Use Case |
|---|---|---|---|
| CleanRL | Single-Agent | PPO, DQN, SAC, TD3, DDPG, C51 | Simple single-file RL training |
| XuanCe | Multi-Agent | MAPPO, QMIX, MADDPG, VDN, COMA | Multi-agent RL research |
| Tianshou | Single-Agent | DQN, C51, Rainbow, IQN, PPO, A2C, TRPO, DDPG, TD3, SAC, REDQ, BCQ, CQL, GAIL + more | Modular PyTorch RL (type-safe, dual API) |
| Jumanji | JAX-based | A2C, PPO (hardware-accelerated via JAX) | Combinatorial & logistics environments (BinPack, TSP, Routing) |
| RLlib | Multi-Agent | PPO, IMPALA, APPO, SAC, DQN | Distributed training at scale |
| BALROG | Evaluation | GPT-4, Claude, Llama (single-agent) | Single-agent LLM benchmarking (MiniGrid, BabyAI) |
| MOSAIC LLM | Evaluation | GPT-4, Claude, Llama (multi-agent) | Multi-agent LLM with coordination strategies and Theory of Mind |

Tip

Install a specific worker with pip install -e ".[<worker>]". For example: pip install -e ".[cleanrl]", pip install -e ".[xuance]", or pip install -e ".[tianshou]".