What Is an Operator?

Operators answer a single question: given an observation, what action should the agent take? They are the decision-making layer of MOSAIC, sitting above the process-level Worker abstraction.

The core interface is simple:

observation --> [Operator] --> action

Every decision-maker, whether a human, an LLM, an RL policy, or a random policy, implements the same select_action(obs) -> action protocol. This makes all decision-makers interchangeable: the GUI, the experiment runner, and the telemetry system never need to know what kind of operator they are talking to.
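To make the interchangeability concrete, here is a minimal sketch of such a protocol using typing.Protocol. Only select_action(obs) -> action comes from this page; the class names RandomOperator and ScriptedOperator and the helper run_step are hypothetical and are not MOSAIC's actual classes.

```python
import random
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Operator(Protocol):
    """Anything that can map an observation to an action."""

    def select_action(self, obs: Any) -> Any: ...


class RandomOperator:
    """Hypothetical baseline: picks uniformly from a fixed action list."""

    def __init__(self, actions: Sequence[Any], seed: int = 0) -> None:
        self._actions = list(actions)
        self._rng = random.Random(seed)

    def select_action(self, obs: Any) -> Any:
        return self._rng.choice(self._actions)


class ScriptedOperator:
    """Hypothetical stand-in for a human or policy: always returns one action."""

    def __init__(self, action: Any) -> None:
        self._action = action

    def select_action(self, obs: Any) -> Any:
        return self._action


def run_step(operator: Operator, obs: Any) -> Any:
    # The caller never inspects the operator's concrete type.
    return operator.select_action(obs)


operators = [RandomOperator(["left", "right"]), ScriptedOperator("noop")]
actions = [run_step(op, obs={"step": 0}) for op in operators]
```

Because the protocol is structural, neither class needs to inherit from Operator; having a matching select_action method is enough.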

Operator vs Worker

| Concept | Definition | Examples |
|---|---|---|
| Operator | The agent-level interface. Wraps one or more Worker subprocesses and presents select_action(obs) -> action to the GUI. | LLM Operator, RL Operator, Human Operator, Chess Operator (wraps 2 workers) |
| Worker | A process-level execution unit inside an Operator. Manages library lifecycle, API calls, or scripted behaviors. Lives in 3rd_party/workers/ and communicates via stdin/stdout JSON. | balrog_worker, cleanrl_worker, xuance_worker, ray_worker, llm_worker, vlm_worker, random_worker, passive_worker, human_worker |
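As an illustration of the stdin/stdout JSON channel, the sketch below shows a worker loop that reads one JSON message per line and writes one JSON reply per line. The message fields "obs" and "action" and the function name serve are assumptions made for this example; the actual MOSAIC message schema is not described on this page.

```python
import io
import json
from typing import Any, Callable, TextIO


def serve(policy: Callable[[Any], Any], stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON message per line and answer each with one JSON line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        message = json.loads(line)
        reply = {"action": policy(message["obs"])}  # hypothetical field names
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # the parent process is waiting on this line


# Simulate the parent's pipes with in-memory streams.
fake_stdin = io.StringIO('{"obs": [0, 1]}\n{"obs": [2, 4]}\n')
fake_stdout = io.StringIO()
serve(lambda obs: sum(obs) % 2, fake_stdin, fake_stdout)
replies = [json.loads(text) for text in fake_stdout.getvalue().splitlines()]
```

In a real worker the two streams would be sys.stdin and sys.stdout; passing them in as arguments keeps the loop testable without spawning a subprocess.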

Two Modes of Operation

MOSAIC supports two fundamentally different operator configurations:

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    subgraph "Homogeneous"
        H1["RL"]
        H2["RL"]
        H3["RL"]
    end

    subgraph "Heterogeneous"
        X1["RL"]
        X2["LLM"]
        X3["Human"]
    end

    style H1 fill:#9370db,stroke:#6a0dad,color:#fff
    style H2 fill:#9370db,stroke:#6a0dad,color:#fff
    style H3 fill:#9370db,stroke:#6a0dad,color:#fff
    style X1 fill:#9370db,stroke:#6a0dad,color:#fff
    style X2 fill:#4a90d9,stroke:#2e5a87,color:#fff
    style X3 fill:#ff7f50,stroke:#cc5500,color:#fff
    
Homogeneous Decision-Makers

All agents use the same paradigm (all RL, all LLM, etc.). Covers the Operator Protocol, the operator categories (human, llm, rl, baseline), the single-worker pattern, and GUI adaptation by category.

Heterogeneous Decision-Makers

Agents use different paradigms in the same experiment (e.g., RL + LLM as teammates). Covers the research gap this addresses, the WorkerAssignment system, experimental configurations, deterministic cross-paradigm evaluation, and the research questions this enables.
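The distinction between the two modes can be stated as a one-line check over the agents' paradigms. The helper team_mode below is a hypothetical illustration, not a MOSAIC function.

```python
from typing import Mapping


def team_mode(worker_types: Mapping[str, str]) -> str:
    """Classify a team from its agent -> worker_type mapping."""
    if not worker_types:
        raise ValueError("at least one agent is required")
    distinct = set(worker_types.values())
    return "homogeneous" if len(distinct) == 1 else "heterogeneous"


all_rl = {"agent_0": "rl", "agent_1": "rl", "agent_2": "rl"}
mixed = {"agent_0": "rl", "agent_1": "llm", "agent_2": "human"}
```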

OperatorConfig

Each operator is configured via an OperatorConfig dataclass:

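from __future__ import annotations  # allows the forward reference to WorkerAssignment

from dataclasses import dataclass
from typing import Dict


@dataclass
class OperatorConfig:
    operator_id: str
    display_name: str
    env_name: str
    task: str
    workers: Dict[str, WorkerAssignment]  # one entry per agent slot
    run_id: str | None = None
    # "aec" (Agent-Environment Cycle, turn-based) or "parallel" (simultaneous)
    execution_mode: str = "aec"
    max_steps: int | None = None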

WorkerAssignment

Each agent slot in an operator maps to a WorkerAssignment. For single-agent environments there is one assignment; for multi-agent environments there is one per player:

from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WorkerAssignment:
    worker_id: str    # e.g. "cleanrl_worker", "random_worker"
    worker_type: str  # "llm", "vlm", "rl", "human", "random", "passive"
    settings: Dict[str, Any] = field(default_factory=dict)  # type-specific options
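The following sketch shows how the two dataclasses fit together for a two-player, turn-based game. The dataclasses are re-declared so the snippet runs on its own; in MOSAIC they would be imported (the import path is not given on this page). The agent keys "player_0" and "player_1" and the operator_id value are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WorkerAssignment:
    worker_id: str
    worker_type: str
    settings: Dict[str, Any] = field(default_factory=dict)


@dataclass
class OperatorConfig:
    operator_id: str
    display_name: str
    env_name: str
    task: str
    workers: Dict[str, WorkerAssignment]
    run_id: str | None = None
    execution_mode: str = "aec"
    max_steps: int | None = None


# One operator wrapping two workers, one per player (keys are hypothetical).
chess = OperatorConfig(
    operator_id="op_chess",
    display_name="Human vs Random",
    env_name="pettingzoo",
    task="chess_v6",
    workers={
        "player_0": WorkerAssignment("human_worker", "human"),
        "player_1": WorkerAssignment("random_worker", "random"),
    },
)
```

Using field(default_factory=dict) for settings gives every assignment its own empty dictionary, so editing one agent's settings cannot leak into another's.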

The worker_type controls how the GUI renders configuration fields and how the OperatorLauncher builds the subprocess command. Valid types:

| Type | UI Label | Description |
|---|---|---|
| llm | LLM | Language model agent. Settings include client_name, model_id, api_key, base_url. |
| vlm | VLM | Vision-language model. Same as LLM plus max_image_history=1. |
| rl | RL | Trained RL policy. Settings include policy_path, algorithm. |
| human | Human | Keyboard-driven. The GUI captures input via action buttons. |
| random | Random | Uniformly random action selection. Uses random_worker. |
| passive | Passive | Always selects the do-nothing (NOOP/STILL) action. Uses passive_worker. |

Note

Each decision-maker type maps directly to its own worker_type. For example, selecting “Random” in the Type dropdown sets worker_type="random" and worker_id="random_worker". Selecting “Passive” sets worker_type="passive" and worker_id="passive_worker".
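The mapping in the note can be sketched as a small lookup. The type names, labels, and the two fixed worker IDs come from the table and note above; the names UI_LABELS, FIXED_WORKER, and resolve_worker_id are hypothetical, and the rule that other types need an explicit worker choice is an assumption.

```python
from typing import Optional

# worker_type -> label shown in the Type dropdown (from the table above)
UI_LABELS = {
    "llm": "LLM",
    "vlm": "VLM",
    "rl": "RL",
    "human": "Human",
    "random": "Random",
    "passive": "Passive",
}

# Types whose worker_id is fixed by the type itself (from the note above)
FIXED_WORKER = {"random": "random_worker", "passive": "passive_worker"}


def resolve_worker_id(worker_type: str, chosen: Optional[str] = None) -> str:
    """Return the worker_id for a type, validating the type first."""
    if worker_type not in UI_LABELS:
        raise ValueError(f"unknown worker_type: {worker_type!r}")
    if worker_type in FIXED_WORKER:
        return FIXED_WORKER[worker_type]
    if chosen is None:
        raise ValueError(f"worker_type {worker_type!r} needs an explicit worker_id")
    return chosen
```

For example, resolve_worker_id("passive") yields "passive_worker", while resolve_worker_id("rl", "cleanrl_worker") passes the user's choice through.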

Agent-Level Interface

The agent-level interface is the core abstraction that sits between environments and decision-makers. Every agent slot in a multi-agent environment is assigned to exactly one decision-maker – an RL policy, an LLM, a human, or a random baseline. The interface is uniform: regardless of what runs behind it, the environment only ever calls select_action(obs) -> action.

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    ENV["Environment<br/>(MultiGrid / MeltingPot / PettingZoo)"]

    ENV -- "obs" --> A0
    ENV -- "obs" --> A1
    ENV -- "obs" --> A2

    A0 -- "action" --> ENV
    A1 -- "action" --> ENV
    A2 -- "action" --> ENV

    subgraph AGENTS["Agent-Level Interface (Player Assignments)"]
        A0["agent_0<br/>RL · XuanCe"]
        A1["agent_1<br/>LLM · GPT-4o"]
        A2["agent_2<br/>Random · Baseline"]
    end

    style ENV fill:#4a90d9,stroke:#2e5a87,color:#fff
    style AGENTS fill:#f5f5f5,stroke:#999,color:#333
    style A0 fill:#9370db,stroke:#6a0dad,color:#fff
    style A1 fill:#50c878,stroke:#2e8b57,color:#fff
    style A2 fill:#ff7f50,stroke:#cc5500,color:#fff
    

This is what makes heterogeneous teams possible – each agent slot is independently configured, yet they all plug into the same environment through a single protocol.
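A rough sketch of how per-slot assignment could drive one simultaneous step: each agent's observation is routed to whichever decision-maker owns that slot. The classes ConstantOperator and ParityOperator and the function collect_actions are hypothetical stand-ins, not MOSAIC code.

```python
from typing import Any, Dict, Protocol


class Operator(Protocol):
    def select_action(self, obs: Any) -> Any: ...


class ConstantOperator:
    """Hypothetical decision-maker that always returns the same action."""

    def __init__(self, action: Any) -> None:
        self.action = action

    def select_action(self, obs: Any) -> Any:
        return self.action


class ParityOperator:
    """Hypothetical decision-maker whose action depends on the observation."""

    def select_action(self, obs: int) -> int:
        return obs % 2


def collect_actions(
    assignments: Dict[str, Operator], observations: Dict[str, Any]
) -> Dict[str, Any]:
    # Same call for every slot, whatever runs behind it.
    return {
        agent: assignments[agent].select_action(obs)
        for agent, obs in observations.items()
    }


assignments = {
    "agent_0": ConstantOperator("forward"),
    "agent_1": ParityOperator(),
    "agent_2": ConstantOperator("noop"),
}
observations = {"agent_0": 10, "agent_1": 7, "agent_2": 3}
joint_action = collect_actions(assignments, observations)
```

In a turn-based ("aec") setting the same lookup would be applied to one agent at a time instead of to the whole observation dictionary.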

Policy Mappings for Multi-Agent RL

When deploying RL policies in multi-agent scenarios, MOSAIC supports flexible policy-to-agent mappings through link groups. This is essential because MAPPO/IPPO checkpoints store all agents’ policies in a single file.

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    subgraph ONE["One-to-One (Default)"]
        direction TB
        A0["agent_0<br/>ppo.pth"]
        A1["agent_1<br/>dqn.pth"]
    end

    subgraph MANY["One-to-Many (Link Groups)"]
        direction TB
        CHECKPOINT["mappo_team.pth"]
        B0["agent_0<br/>(Primary)"]
        B1["agent_1<br/>(Linked)"]
        B2["agent_2<br/>(Linked)"]

        CHECKPOINT -->|"Shared"| B0
        CHECKPOINT -->|"Shared"| B1
        CHECKPOINT -->|"Shared"| B2
    end

    style ONE fill:#e8f5e9,stroke:#2e8b57,color:#333
    style MANY fill:#f3e5f5,stroke:#9c27b0,color:#333
    style A0 fill:#50c878,stroke:#2e8b57,color:#fff
    style A1 fill:#4a90d9,stroke:#2e5a87,color:#fff
    style CHECKPOINT fill:#ff7f50,stroke:#cc5500,color:#fff
    style B0 fill:#9370db,stroke:#6a0dad,color:#fff
    style B1 fill:#ba68c8,stroke:#8e24aa,color:#fff
    style B2 fill:#ba68c8,stroke:#8e24aa,color:#fff
    
One-to-one mapping (default):

Each agent has its own independent policy checkpoint. Agents are configured individually with separate policy paths.

One-to-many mapping (via link groups):

Multiple agents share a single policy checkpoint. The primary agent’s policy path is automatically synced to all linked agents.

Link groups prevent manual copy-paste errors, ensure consistency across agents, and enable complex team configurations (e.g., two independent teams with different policies). They are created manually via the “Link Agents” button in the GUI.

Example: All agents trained together

# All 4 agents share the same MAPPO checkpoint
LinkGroup(
    group_id="operator_0_link_0",
    primary_agent="agent_0",
    linked_agents=["agent_1", "agent_2", "agent_3"],
    policy_path="/path/to/checkpoint/final_train_model.pth",
    algorithm="mappo",
)

Example: Two independent teams

# Offense team (agents 0 and 2)
LinkGroup(
    group_id="operator_0_link_0",
    primary_agent="agent_0",
    linked_agents=["agent_2"],
    policy_path="/path/to/offense_mappo.pth",
    algorithm="mappo",
)

# Defense team (agents 1 and 3)
LinkGroup(
    group_id="operator_0_link_1",
    primary_agent="agent_1",
    linked_agents=["agent_3"],
    policy_path="/path/to/defense_mappo.pth",
    algorithm="mappo",
)
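The syncing behavior described above can be sketched as a resolution step from link groups to per-agent policy paths. The LinkGroup fields mirror the examples above; the function resolve_policy_paths, and the choice that a link group overrides an individually configured path, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LinkGroup:
    group_id: str
    primary_agent: str
    linked_agents: List[str]
    policy_path: str
    algorithm: str


def resolve_policy_paths(
    agents: Iterable[str],
    link_groups: Iterable[LinkGroup],
    individual_paths: Dict[str, str],
) -> Dict[str, str]:
    """Return agent -> policy_path, with link groups taking precedence."""
    resolved = dict(individual_paths)
    for group in link_groups:
        # The primary's path is copied to every linked agent.
        for agent in [group.primary_agent, *group.linked_agents]:
            resolved[agent] = group.policy_path
    missing = [agent for agent in agents if agent not in resolved]
    if missing:
        raise ValueError(f"no policy configured for: {missing}")
    return resolved


groups = [
    LinkGroup("operator_0_link_0", "agent_0", ["agent_2"],
              "/path/to/offense_mappo.pth", "mappo"),
    LinkGroup("operator_0_link_1", "agent_1", ["agent_3"],
              "/path/to/defense_mappo.pth", "mappo"),
]
paths = resolve_policy_paths(
    ["agent_0", "agent_1", "agent_2", "agent_3"], groups, individual_paths={}
)
```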

See PolicyMappingService for complete documentation on link groups and policy mappings.

Player Assignment (the GUI for the Agent-Level Interface)

Player Assignment is the GUI realization of the agent-level interface. The PlayerAssignmentPanel in the Configure Operators widget lets the user wire each agent slot to a specific decision-maker by selecting a Type and a Worker.

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
    OCW["OperatorConfigWidget"]

    OCW --> PAP
    OCW --> MGS

    subgraph PAP["PlayerAssignmentPanel"]
        direction TB
        ROW0["PlayerAssignmentRow<br/>agent_0 → RL · XuanCe Worker"]
        ROW1["PlayerAssignmentRow<br/>agent_1 → LLM · GPT-4o"]
    end

    subgraph MGS["Environment-Specific Settings<br/>(MultiGrid / MeltingPot)"]
        direction TB
        OBS["Observation Mode"]
        COORD["Coordination Strategy<br/>(LLM only)"]
        ROLES["Role Assignment<br/>(Level 3 only)"]
    end

    style OCW fill:#4a90d9,stroke:#2e5a87,color:#fff
    style PAP fill:#e8f5e9,stroke:#2e8b57,color:#333
    style ROW0 fill:#ff7f50,stroke:#cc5500,color:#fff
    style ROW1 fill:#ff7f50,stroke:#cc5500,color:#fff
    style MGS fill:#ede7f6,stroke:#6a0dad,color:#333
    style OBS fill:#9370db,stroke:#6a0dad,color:#fff
    style COORD fill:#9370db,stroke:#6a0dad,color:#fff
    style ROLES fill:#9370db,stroke:#6a0dad,color:#fff
    

Each PlayerAssignmentRow exposes:

  • Type dropdown: LLM, RL, Human, or Random. Controls which configuration fields are visible.

  • Worker dropdown: populated based on the selected type. Hidden for Human and Random (single worker each).

  • Type-specific settings: LLM shows provider/model/API-key fields; RL shows policy path and algorithm; Human and Random show nothing extra.

The panel emits an assignments_changed signal whenever any row changes, which the parent widget uses to:

  1. Rebuild the OperatorConfig via get_config().

  2. Update the visibility of the Coordination Strategy selector. This dropdown appears only for MultiGrid and MeltingPot environments, and only when at least one agent uses an LLM worker (it configures the mosaic_llm_worker’s coordination level). When no agent uses an LLM worker, the entire coordination section is hidden.

# How the widget builds a multi-agent config (any multi-agent env)
config = OperatorConfig.multi_agent(
    operator_id="op_0",
    display_name="Heterogeneous Team",
    env_name="<env_family>",        # e.g. mosaic_multigrid, meltingpot, pettingzoo
    task="<env_id>",                # e.g. Soccer-2v2, predator_prey, chess_v6
    player_workers={
        "agent_0": WorkerAssignment(
            worker_id="xuance_worker",
            worker_type="rl",
            settings={"policy_path": "/path/to/final_train_model.pth"},
        ),
        "agent_1": WorkerAssignment(
            worker_id="random_worker",
            worker_type="random",
        ),
    },
    observation_mode="visible_teammates",
    coordination_level=1,
)
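The visibility rule for the Coordination Strategy selector reduces to a single predicate. The sketch below assumes the env_name identifiers "mosaic_multigrid" and "meltingpot" from the example above; the constant and function names are hypothetical, and whether "vlm" assignments also count as LLM workers is not stated on this page, so only "llm" is checked here.

```python
from typing import Iterable

# Environment families that expose coordination settings (assumed identifiers)
COORDINATION_ENVS = frozenset({"mosaic_multigrid", "meltingpot"})


def coordination_selector_visible(env_name: str, worker_types: Iterable[str]) -> bool:
    """True only for supported environments with at least one LLM agent."""
    return env_name in COORDINATION_ENVS and any(t == "llm" for t in worker_types)
```

A handler connected to assignments_changed could call this with the current worker types and show or hide the coordination section accordingly.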

OperatorService

The OperatorService provides a central registry for all available operators:

class OperatorService:
    def register_operator(self, operator, descriptor) -> None: ...
    def set_active_operator(self, operator_id: str) -> None: ...
    def select_action(self, observation: Any) -> Any: ...
    def seed(self, seed: int) -> None: ...

At startup, MOSAIC registers built-in operators and any discovered via entry points. The GUI’s operator dropdown is populated from OperatorService.get_descriptors().
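To illustrate the registry idea, here is a toy version that implements the four methods listed above plus get_descriptors(). It is a sketch under assumptions, not the MOSAIC implementation: the names ToyOperatorService, OperatorDescriptor, and FixedOperator, the descriptor fields, and the behavior of making the first registered operator active are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class OperatorDescriptor:
    operator_id: str
    display_name: str


class FixedOperator:
    """Hypothetical operator that always returns the same action."""

    def __init__(self, action: Any) -> None:
        self._action = action

    def select_action(self, observation: Any) -> Any:
        return self._action


class ToyOperatorService:
    def __init__(self) -> None:
        self._operators: Dict[str, Any] = {}
        self._descriptors: Dict[str, OperatorDescriptor] = {}
        self._active_id: Optional[str] = None

    def register_operator(self, operator: Any, descriptor: OperatorDescriptor) -> None:
        self._operators[descriptor.operator_id] = operator
        self._descriptors[descriptor.operator_id] = descriptor
        if self._active_id is None:  # assumption: first registration becomes active
            self._active_id = descriptor.operator_id

    def set_active_operator(self, operator_id: str) -> None:
        if operator_id not in self._operators:
            raise KeyError(operator_id)
        self._active_id = operator_id

    def get_descriptors(self) -> List[OperatorDescriptor]:
        return list(self._descriptors.values())

    def select_action(self, observation: Any) -> Any:
        if self._active_id is None:
            raise RuntimeError("no operator registered")
        return self._operators[self._active_id].select_action(observation)

    def seed(self, seed: int) -> None:
        # Forward the seed to operators that support seeding.
        for operator in self._operators.values():
            seed_fn = getattr(operator, "seed", None)
            if callable(seed_fn):
                seed_fn(seed)


service = ToyOperatorService()
service.register_operator(FixedOperator("left"), OperatorDescriptor("op_a", "Always Left"))
service.register_operator(FixedOperator("right"), OperatorDescriptor("op_b", "Always Right"))
first_action = service.select_action(observation=None)
service.set_active_operator("op_b")
second_action = service.select_action(observation=None)
```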

Directory Layout

Operator-related code lives in the MOSAIC core (not 3rd_party/):

gym_gui/
    services/
        operator.py                          # Protocol + OperatorService
        operator_launcher.py                 # Subprocess spawning
        operator_script_execution_manager.py # Script mode state machine
    ui/
        widgets/
            operators_tab.py                 # Manual + Script mode tabs
            operator_config_widget.py        # Per-operator config rows
            operator_render_container.py     # Per-operator render view
            multi_operator_render_view.py    # Grid of render containers
            script_experiment_widget.py      # Script mode UI
        panels/
            control_panel_container.py       # Service-to-UI bridge
        config_panels/
            single_agent/                    # Per-game environment configs
            multi_agent/                     # Multi-agent environment configs