Operators

An Operator is the agent-level interface of MOSAIC: the unified abstraction that lets the GUI assign a worker to each individual agent or to a group of agents. Workers handle process-level concerns (training, telemetry, GPU isolation), whereas Operators are strictly for evaluation and interactive play. The worker inside an Operator loads a trained policy (or calls an LLM API, or reads keyboard input) and computes actions step by step. The Operator wraps this and answers one question: “given this observation, what action should I take?”

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
    GUI["Qt6 GUI<br/>(Main Process)"]
    LAUNCHER["OperatorLauncher<br/>(Subprocess Manager)"]

    GUI --> LAUNCHER

    LAUNCHER -- "stdin/stdout JSON" --> H_OP
    LAUNCHER -- "stdin/stdout JSON" --> L_OP
    LAUNCHER -- "stdin/stdout JSON" --> V_OP
    LAUNCHER -- "stdin/stdout JSON" --> R_OP
    LAUNCHER -- "stdin/stdout JSON" --> RND_OP
    LAUNCHER -- "stdin/stdout JSON" --> P_OP

    subgraph H_OP["Human Operator"]
        HW["human_worker<br/>Keyboard Input"]
    end

    subgraph L_OP["LLM Operator"]
        LW1["balrog_worker<br/>Single-Agent"]
        LW2["llm_worker<br/>MOSAIC Native"]
        LW3["chess_worker<br/>Two-Player"]
    end

    subgraph V_OP["VLM Operator"]
        VW["vlm_worker<br/>Vision-Language"]
    end

    subgraph R_OP["RL Operator"]
        RW1["cleanrl_worker<br/>PPO / DQN"]
        RW2["xuance_worker<br/>MAPPO / QMIX"]
        RW3["ray_worker<br/>PPO / IMPALA"]
    end

    subgraph RND_OP["Random Operator"]
        RNDW["random_worker<br/>Uniform Random"]
    end

    subgraph P_OP["Passive Operator"]
        PW["passive_worker<br/>NOOP / STILL"]
    end

    style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
    style LAUNCHER fill:#50c878,stroke:#2e8b57,color:#fff
    style H_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style L_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style V_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style R_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style RND_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style P_OP fill:#9370db,stroke:#6a0dad,color:#fff
    style HW fill:#ff7f50,stroke:#cc5500,color:#fff
    style LW1 fill:#ff7f50,stroke:#cc5500,color:#fff
    style LW2 fill:#ff7f50,stroke:#cc5500,color:#fff
    style LW3 fill:#ff7f50,stroke:#cc5500,color:#fff
    style VW fill:#ff7f50,stroke:#cc5500,color:#fff
    style RW1 fill:#ff7f50,stroke:#cc5500,color:#fff
    style RW2 fill:#ff7f50,stroke:#cc5500,color:#fff
    style RW3 fill:#ff7f50,stroke:#cc5500,color:#fff
    style RNDW fill:#ff7f50,stroke:#cc5500,color:#fff
    style PW fill:#ff7f50,stroke:#cc5500,color:#fff
    

Key Principles

Protocol-Based

Operators implement Python Protocol classes – no base class inheritance required. Any object with select_action(obs) is a valid operator.
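The structural-typing idea can be sketched as follows. This is a minimal illustration, not MOSAIC's actual protocol definition: the names `Operator`, `AlwaysForward`, and `step` are hypothetical, and only the `select_action(obs)` method comes from the text above.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Operator(Protocol):
    """Structural interface: anything with select_action(obs) qualifies."""

    def select_action(self, obs: Any) -> Any:
        ...


class AlwaysForward:
    """Hypothetical operator. Note that it does not inherit from Operator."""

    def select_action(self, obs: Any) -> int:
        return 2  # placeholder action id, chosen only for illustration


def step(operator: Operator, obs: Any) -> Any:
    # The caller depends only on the protocol, never on a concrete class.
    return operator.select_action(obs)


print(isinstance(AlwaysForward(), Operator))  # method presence is checked at runtime
print(step(AlwaysForward(), {"image": None}))
```

Because the check is structural, a class written in a third-party package can be used as an operator without importing anything from the host framework. Note that `runtime_checkable` only verifies that the method exists, not that its signature matches.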

Category System

Every operator belongs to a category: human, llm, vlm, rl, random, or passive. The GUI adapts its configuration UI based on category.
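One way to keep the six categories consistent across a codebase is an enumeration. The sketch below is an assumption about how this could be modeled, not MOSAIC's own type: `OperatorCategory` and `parse_category` are hypothetical names, and only the six category strings are taken from the text.

```python
from enum import Enum


class OperatorCategory(str, Enum):
    """The six categories named in the documentation."""

    HUMAN = "human"
    LLM = "llm"
    VLM = "vlm"
    RL = "rl"
    RANDOM = "random"
    PASSIVE = "passive"


def parse_category(raw: str) -> OperatorCategory:
    """Normalize user input and reject anything outside the six categories."""
    try:
        return OperatorCategory(raw.strip().lower())
    except ValueError:
        valid = ", ".join(c.value for c in OperatorCategory)
        raise ValueError(
            f"unknown operator category {raw!r}; expected one of: {valid}"
        ) from None


print(parse_category(" RL ").value)
```

A UI layer could then branch on the enum member instead of comparing raw strings, so a typo in a category name fails early rather than silently selecting the wrong configuration panel.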

Interactive Mode

Operators run as subprocesses with the --interactive flag, which enables step-by-step JSON commands over stdin/stdout. Because each operator computes in its own process, the GUI stays responsive while operators work.
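The request/response loop on the operator side might look like the sketch below. The message schema is an assumption made for illustration: the `cmd`, `obs`, `action`, and `status` keys and the `serve` function are hypothetical and are not MOSAIC's documented wire format. The only parts taken from the text are line-oriented JSON over stdin/stdout and one action per step.

```python
import io
import json
from typing import Any, Callable, TextIO


def serve(select_action: Callable[[Any], Any], stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON command per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        msg = json.loads(line)
        cmd = msg.get("cmd")
        if cmd == "step":
            reply = {"action": select_action(msg["obs"])}
        elif cmd == "stop":
            reply = {"status": "stopped"}
        else:
            reply = {"error": f"unknown cmd: {cmd!r}"}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush so the parent process is never left waiting
        if cmd == "stop":
            break


# Simulate the GUI side with in-memory streams instead of a real subprocess.
fake_in = io.StringIO('{"cmd": "step", "obs": [0, 1]}\n{"cmd": "stop"}\n')
fake_out = io.StringIO()
serve(lambda obs: len(obs), fake_in, fake_out)
replies = [json.loads(reply_line) for reply_line in fake_out.getvalue().splitlines()]
print(replies)
```

In a real deployment the same function would be called with `sys.stdin` and `sys.stdout`, and the parent would write commands to the subprocess pipe. Flushing after every reply matters because pipe output is block-buffered by default.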

Multi-Operator Comparison

Multiple operators can run side-by-side on the same environment with shared seeds for scientific comparison (e.g., LLM vs RL on the same MiniGrid layout).
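The role of the shared seed can be illustrated with a toy environment. Everything here is hypothetical (the corridor task, `run_episode`, and the two policies); the point is only that when all environment randomness is derived from the seed, two operators face an identical layout and any difference in outcome is due to the policy.

```python
import random
from typing import Callable, Dict


def run_episode(policy: Callable[[int, int], int], seed: int, max_steps: int = 20) -> Dict:
    """Toy 1-D corridor: the agent starts at 0 and must reach a seeded goal cell."""
    rng = random.Random(seed)  # all environment randomness comes from the seed
    goal = rng.randint(3, 8)
    pos = 0
    for t in range(1, max_steps + 1):
        pos += policy(pos, goal)  # action is -1, 0, or +1
        if pos == goal:
            return {"goal": goal, "steps": t, "solved": True}
    return {"goal": goal, "steps": max_steps, "solved": False}


def greedy(pos: int, goal: int) -> int:
    """Always move toward the goal."""
    return 1 if pos < goal else -1


def passive(pos: int, goal: int) -> int:
    """Never move, analogous to a NOOP agent."""
    return 0


seed = 42
policies = {"greedy": greedy, "passive": passive}
results = {name: run_episode(policy, seed) for name, policy in policies.items()}
for name, result in results.items():
    print(name, result)
```

Both runs report the same goal cell because they were built from the same seed, which is what makes the comparison of step counts and success rates meaningful.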

Decoupled Execution

Manual mode (click-to-step) and Script mode (automated experiments) are fully independent code paths with separate state machines.

Available Operators

| Operator | Category | Backend | Use Case |
|---|---|---|---|
| Human | human | Keyboard input via GUI | Manual play and debugging |
| BALROG LLM | llm | balrog_worker (vLLM, OpenRouter) | Single-agent LLM benchmarking on MiniGrid/BabyAI |
| MOSAIC LLM | llm | mosaic_llm_worker (vLLM, OpenRouter, OpenAI, Anthropic) | Multi-agent LLM with coordination and Theory of Mind |
| Chess LLM | llm | chess_worker (llm_chess prompting) | LLM chess play with multi-turn dialog |
| CleanRL | rl | cleanrl_worker (PPO, DQN) | Trained single-agent RL policy evaluation |
| XuanCe | rl | xuance_worker (MAPPO, QMIX) | Trained multi-agent RL policy evaluation |
| Ray RLlib | rl | ray_worker (PPO, IMPALA) | Distributed RL policy evaluation |
| MOSAIC Random Worker | random | random_worker (random action) | Random action selection for experiments |
| MOSAIC Passive Worker | passive | passive_worker (NOOP/STILL) | Do-nothing agent for experiments |

Tip

An Operator wraps one or more Workers. The Operator is the agent-level interface (select_action(obs) -> action) that the GUI interacts with. The Worker is the process-level engine that runs inside the Operator. This separation is what enables heterogeneous teams – e.g., an RL-trained policy and an LLM playing side-by-side in the same multi-agent environment. See What Is an Operator? for the full motivation and diagrams.

Note

Policy Mappings for Multi-Agent RL: When deploying RL policies in multi-agent scenarios, MOSAIC supports flexible policy-to-agent mappings through link groups. This enables one-to-one mappings (each agent has its own policy) and one-to-many mappings (multiple agents share a single policy checkpoint). Link groups are essential for MAPPO/IPPO evaluation because these algorithms store all agents’ policies in a single checkpoint file. See PolicyMappingService for complete documentation.
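The difference between one-to-one and one-to-many mappings can be sketched with plain dictionaries. This is not the PolicyMappingService API: `expand_link_groups`, the agent ids, and the checkpoint file names are all hypothetical, chosen only to show the shape of the data.

```python
from typing import Dict, List


def expand_link_groups(link_groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn {checkpoint: [agent ids]} into {agent id: checkpoint}."""
    mapping: Dict[str, str] = {}
    for checkpoint, agents in link_groups.items():
        for agent in agents:
            if agent in mapping:
                # An agent can only be driven by one policy at a time.
                raise ValueError(f"agent {agent!r} is assigned to two checkpoints")
            mapping[agent] = checkpoint
    return mapping


# One-to-one: every agent has its own (hypothetical) checkpoint file.
one_to_one = expand_link_groups({
    "agent_0.pth": ["agent_0"],
    "agent_1.pth": ["agent_1"],
})

# One-to-many: a single shared checkpoint drives a whole group of agents.
one_to_many = expand_link_groups({
    "shared_team.pth": ["agent_0", "agent_1", "agent_2"],
})

print(one_to_one)
print(one_to_many)
```

In the one-to-many case every agent in the group resolves to the same file, which matches the situation described above where a single checkpoint stores the policies for all agents.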