Operators

An Operator is the agent-level interface of MOSAIC: the unified abstraction through which the GUI assigns a worker to an individual agent or to a group of agents. Workers handle process-level concerns (training, telemetry, GPU isolation), whereas Operators are used strictly for evaluation and interactive play. The worker inside an Operator loads a trained policy (or calls an LLM API, or reads keyboard input) and computes actions step by step. The Operator wraps this and answers the question “given this observation, what action should I take?”

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
    GUI["Qt6 GUI<br/>(Main Process)"]
    LAUNCHER["OperatorLauncher<br/>(Subprocess Manager)"]

    GUI --> LAUNCHER

    LAUNCHER -- "stdin/stdout JSON" --> H_OP
    LAUNCHER -- "stdin/stdout JSON" --> L_OP
    LAUNCHER -- "stdin/stdout JSON" --> R_OP
    LAUNCHER -- "stdin/stdout JSON" --> B_OP

    subgraph H_OP["Human Operator"]
        HW["human_worker<br/>Keyboard Input"]
    end

    subgraph L_OP["LLM Operator"]
        LW1["balrog_worker<br/>Single-Agent"]
        LW2["mosaic_llm_worker<br/>Multi-Agent"]
        LW3["chess_worker<br/>Two-Player"]
    end

    subgraph R_OP["RL Operator"]
        RW1["cleanrl_worker<br/>PPO / DQN"]
        RW2["xuance_worker<br/>MAPPO / QMIX"]
        RW3["ray_worker<br/>PPO / IMPALA"]
    end

    subgraph B_OP["Baseline Operator"]
        BW["operators_worker<br/>Random / Scripted"]
    end

    style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
    style LAUNCHER fill:#50c878,stroke:#2e8b57,color:#fff
    style H_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style L_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style R_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style B_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style HW fill:#ff7f50,stroke:#cc5500,color:#fff
    style LW1 fill:#ff7f50,stroke:#cc5500,color:#fff
    style LW2 fill:#ff7f50,stroke:#cc5500,color:#fff
    style LW3 fill:#ff7f50,stroke:#cc5500,color:#fff
    style RW1 fill:#ff7f50,stroke:#cc5500,color:#fff
    style RW2 fill:#ff7f50,stroke:#cc5500,color:#fff
    style RW3 fill:#ff7f50,stroke:#cc5500,color:#fff
    style BW fill:#ff7f50,stroke:#cc5500,color:#fff
    

Key Principles

Protocol-Based

Operators implement Python Protocol classes, so no base-class inheritance is required. Any object that provides a select_action(obs) method is a valid operator.
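As a minimal sketch of what structural typing means here, the snippet below defines a `typing.Protocol` with a single `select_action(obs)` method and checks two classes against it. The protocol name `Operator`, the example classes, and the action index are assumptions made for illustration; MOSAIC's actual protocol definitions may differ.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Operator(Protocol):
    """Structural type: any object with select_action(obs) qualifies.

    The name and signature are assumptions for this sketch.
    """

    def select_action(self, obs: Any) -> Any:
        ...


class AlwaysForward:
    """Hypothetical operator. Note that it does not inherit from Operator."""

    def select_action(self, obs: Any) -> int:
        return 2  # assumed action index for "move forward"


class NotAnOperator:
    """Has no select_action method, so it does not satisfy the protocol."""

    def act(self, obs: Any) -> int:
        return 0


def step_once(operator: Operator, obs: Any) -> Any:
    """Ask any protocol-conforming object for its next action."""
    return operator.select_action(obs)
```

With `@runtime_checkable`, `isinstance(AlwaysForward(), Operator)` is true even though no inheritance is involved. The runtime check only tests that the method exists, not that its signature matches.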

Category System

Every operator belongs to a category: human, llm, rl, or baseline. The GUI adapts its configuration UI based on category.
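One way to picture category-driven configuration is an enum over the four categories plus a lookup of the settings a configuration panel might show for each. This is only a sketch: the four category strings come from the text above, but `OperatorCategory`, `CONFIG_FIELDS`, and every field name are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class OperatorCategory(str, Enum):
    """The four categories named in the text."""

    HUMAN = "human"
    LLM = "llm"
    RL = "rl"
    BASELINE = "baseline"


# Hypothetical field names; the real GUI's settings are not specified here.
CONFIG_FIELDS: Dict[OperatorCategory, List[str]] = {
    OperatorCategory.HUMAN: ["key_bindings"],
    OperatorCategory.LLM: ["provider", "model"],
    OperatorCategory.RL: ["checkpoint_path"],
    OperatorCategory.BASELINE: ["behavior"],
}


def config_fields_for(category: str) -> List[str]:
    """Return the fields for a category string.

    Raises ValueError if the string is not one of the four categories.
    """
    return CONFIG_FIELDS[OperatorCategory(category)]
```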

Interactive Mode

Operators run as subprocesses started with the --interactive flag, which enables step-by-step JSON commands over stdin/stdout. This keeps the GUI responsive while operators compute.
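The loop on the operator side of such a channel might look roughly like the sketch below: read one JSON object per line, reply with one JSON object per line, and flush after each reply. The message keys (`cmd`, `obs`, `type`, `action`, `error`) and the command names are assumptions for illustration, not MOSAIC's actual schema.

```python
import json
import sys
from typing import Any, Dict, TextIO


class ConstantOperator:
    """Stand-in operator for the sketch; always returns the same action."""

    def __init__(self, action: int) -> None:
        self.action = action

    def select_action(self, obs: Any) -> int:
        return self.action


def handle(operator: Any, line: str) -> Dict[str, Any]:
    """Turn one JSON command line into a reply dict (hypothetical schema)."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError as exc:
        return {"type": "error", "error": str(exc)}
    if not isinstance(msg, dict):
        return {"type": "error", "error": "expected a JSON object"}
    cmd = msg.get("cmd")
    if cmd == "select_action":
        return {"type": "action", "action": operator.select_action(msg.get("obs"))}
    if cmd == "stop":
        return {"type": "stopped"}
    return {"type": "error", "error": f"unknown cmd: {cmd!r}"}


def serve(operator: Any, inp: TextIO, out: TextIO) -> None:
    """Answer line-delimited JSON commands until a stop command or EOF."""
    for line in inp:
        if not line.strip():
            continue  # ignore blank lines
        reply = handle(operator, line)
        out.write(json.dumps(reply) + "\n")
        out.flush()  # flush each reply so the parent process is not left waiting
        if reply["type"] == "stopped":
            break


if __name__ == "__main__":
    serve(ConstantOperator(0), sys.stdin, sys.stdout)
```

Passing the streams as arguments keeps the loop testable with in-memory buffers. A malformed line produces an error reply rather than ending the process.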

Multi-Operator Comparison

Multiple operators can run side-by-side on the same environment with shared seeds for scientific comparison (e.g., LLM vs RL on the same MiniGrid layout).
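The idea of a shared-seed comparison can be sketched without any real environment: every operator is evaluated on the same list of seeds, so each one faces the same sequence of situations. `GuessBitEnv`, `ConstantOperator`, `run_episode`, and `compare` are all hypothetical stand-ins and not part of MOSAIC.

```python
import random
from typing import Any, Dict, Iterable, List, Tuple


class GuessBitEnv:
    """Toy environment: the seed fixes a hidden bit sequence to be guessed."""

    def __init__(self, seed: int, horizon: int = 10) -> None:
        rng = random.Random(seed)  # same seed -> same hidden sequence
        self._targets = [rng.randint(0, 1) for _ in range(horizon)]
        self._t = 0

    def reset(self) -> int:
        self._t = 0
        return self._t  # the observation is just the step index

    def step(self, action: int) -> Tuple[int, float, bool]:
        reward = 1.0 if action == self._targets[self._t] else 0.0
        self._t += 1
        return self._t, reward, self._t >= len(self._targets)


class ConstantOperator:
    """Stand-in operator that always picks the same action."""

    def __init__(self, action: int) -> None:
        self.action = action

    def select_action(self, obs: Any) -> int:
        return self.action


def run_episode(operator: Any, seed: int) -> float:
    """Run one episode on the environment built from this seed."""
    env = GuessBitEnv(seed)
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(operator.select_action(obs))
        total += reward
    return total


def compare(operators: Dict[str, Any], seeds: Iterable[int]) -> Dict[str, List[float]]:
    """Evaluate every operator on the same seeds; returns per-seed returns."""
    seed_list = list(seeds)
    return {
        name: [run_episode(op, s) for s in seed_list]
        for name, op in operators.items()
    }
```

Because both operators see the same hidden sequence for a given seed, the returns of "always 0" and "always 1" add up to the episode length on every seed. That gives a quick sanity check that the seeds are in fact shared.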

Decoupled Execution

Manual mode (click-to-step) and Script mode (automated experiments) are fully independent code paths with separate state machines.

Available Operators

| Operator | Category | Backend | Use Case |
|---|---|---|---|
| Human | human | Keyboard input via GUI | Manual play and debugging |
| BALROG LLM | llm | balrog_worker (vLLM, OpenRouter) | Single-agent LLM benchmarking on MiniGrid/BabyAI |
| MOSAIC LLM | llm | mosaic_llm_worker (vLLM, OpenRouter, OpenAI, Anthropic) | Multi-agent LLM with coordination and Theory of Mind |
| Chess LLM | llm | chess_worker (llm_chess prompting) | LLM chess play with multi-turn dialog |
| CleanRL | rl | cleanrl_worker (PPO, DQN) | Trained single-agent RL policy evaluation |
| XuanCe | rl | xuance_worker (MAPPO, QMIX) | Trained multi-agent RL policy evaluation |
| Ray RLlib | rl | ray_worker (PPO, IMPALA) | Distributed RL policy evaluation |
| Random Baseline | baseline | operators_worker (random action) | Baseline comparison for experiments |

Tip

An Operator wraps one or more Workers. The Operator is the agent-level interface (select_action(obs) -> action) that the GUI interacts with. The Worker is the process-level engine that runs inside the Operator. This separation is what enables heterogeneous teams – e.g., an RL-trained policy and an LLM playing side-by-side in the same multi-agent environment. See What Is an Operator? for the full motivation and diagrams.
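The wrapping relationship and the heterogeneous-team idea can be illustrated with a small sketch in which two operators expose the same `select_action(obs)` method while delegating to very different workers. All class names, the lookup-table "policy", the canned text reply, and the action mapping are hypothetical stand-ins. No real policy is loaded and no LLM API is called.

```python
from typing import Any, Dict


class TablePolicyWorker:
    """Stand-in for a worker holding a trained policy (here a lookup table)."""

    def __init__(self, table: Dict[Any, int]) -> None:
        self.table = table

    def act(self, obs: Any) -> int:
        return self.table.get(obs, 0)


class CannedTextWorker:
    """Stand-in for a worker that would call an LLM API; returns fixed text."""

    def complete(self, prompt: str) -> str:
        return "forward"


class RLOperator:
    """Agent-level wrapper around a policy worker."""

    def __init__(self, worker: TablePolicyWorker) -> None:
        self.worker = worker

    def select_action(self, obs: Any) -> int:
        return self.worker.act(obs)


class LLMOperator:
    """Agent-level wrapper that maps the worker's text reply to an action."""

    ACTIONS = {"left": 0, "right": 1, "forward": 2}  # assumed action mapping

    def __init__(self, worker: CannedTextWorker) -> None:
        self.worker = worker

    def select_action(self, obs: Any) -> int:
        reply = self.worker.complete(f"Observation: {obs}. Next action?")
        return self.ACTIONS.get(reply.strip().lower(), 0)


def joint_action(team: Dict[str, Any], observations: Dict[str, Any]) -> Dict[str, Any]:
    """Ask each agent's operator for an action, whatever backs it."""
    return {
        agent_id: team[agent_id].select_action(obs)
        for agent_id, obs in observations.items()
    }
```

The caller of `joint_action` never needs to know which kind of worker sits behind each agent, which is the property that makes mixed teams possible.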