What Is an Operator?¶
Operators answer a single question: given an observation, what action should the agent take? They are the decision-making layer of MOSAIC, sitting above the process-level Worker abstraction.
The core interface is simple:
observation --> [Operator] --> action
Every decision-maker, whether a human, an LLM, an RL policy, or a random
policy, implements the same select_action(obs) -> action
protocol. This makes all decision-makers interchangeable: the GUI,
the experiment runner, and the telemetry system never need to know what
kind of operator they are talking to.
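As a rough illustration of that interchangeability, the protocol can be sketched with Python's typing.Protocol. The class names RandomOperator and ScriptedOperator and the collect_actions helper are hypothetical stand-ins for this example, not MOSAIC classes; only the select_action(obs) -> action signature comes from the text above.

```python
import random
from typing import Any, List, Protocol, Sequence


class Operator(Protocol):
    """Anything exposing select_action(obs) -> action satisfies the protocol."""

    def select_action(self, obs: Any) -> Any: ...


class RandomOperator:
    """Hypothetical baseline: ignores the observation and samples an action."""

    def __init__(self, actions: Sequence[int], seed: int = 0) -> None:
        self._actions = list(actions)
        self._rng = random.Random(seed)

    def select_action(self, obs: Any) -> int:
        return self._rng.choice(self._actions)


class ScriptedOperator:
    """Hypothetical stand-in for a human or policy: replays a fixed script."""

    def __init__(self, script: Sequence[int]) -> None:
        self._script = list(script)
        self._step = 0

    def select_action(self, obs: Any) -> int:
        action = self._script[self._step % len(self._script)]
        self._step += 1
        return action


def collect_actions(operator: Operator, observations: Sequence[Any]) -> List[Any]:
    """Caller code depends only on the protocol, never on the operator kind."""
    return [operator.select_action(obs) for obs in observations]
```

Because collect_actions is typed against the protocol, either operator can be passed in without changing the caller, which is the property the GUI, experiment runner, and telemetry system rely on.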
Operator vs Worker¶
| Concept | Definition | Examples |
|---|---|---|
| Operator | The agent-level interface; wraps one or more Worker subprocesses and presents | LLM Operator, RL Operator, Human Operator, Chess Operator (wraps 2 workers) |
| Worker | A process-level execution unit inside an Operator. Manages library lifecycle, API calls, or scripted behaviors. Lives in | |
Two Modes of Operation¶
MOSAIC supports two fundamentally different operator configurations:
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
subgraph "Homogeneous"
H1["RL"]
H2["RL"]
H3["RL"]
end
subgraph "Heterogeneous"
X1["RL"]
X2["LLM"]
X3["Human"]
end
style H1 fill:#9370db,stroke:#6a0dad,color:#fff
style H2 fill:#9370db,stroke:#6a0dad,color:#fff
style H3 fill:#9370db,stroke:#6a0dad,color:#fff
style X1 fill:#9370db,stroke:#6a0dad,color:#fff
style X2 fill:#4a90d9,stroke:#2e5a87,color:#fff
style X3 fill:#ff7f50,stroke:#cc5500,color:#fff
- Homogeneous Decision-Makers
All agents use the same paradigm (all RL, all LLM, etc.). Covers the Operator Protocol, the operator categories (human, llm, rl, baseline), the single-worker pattern, and GUI adaptation by category.
- Heterogeneous Decision-Makers
Agents use different paradigms in the same experiment (e.g., RL + LLM as teammates). Covers the research gap this addresses, the WorkerAssignment system, experimental configurations, deterministic cross-paradigm evaluation, and the research questions this enables.
OperatorConfig¶
Each operator is configured via an OperatorConfig dataclass:
@dataclass
class OperatorConfig:
    operator_id: str
    display_name: str
    env_name: str
    task: str
    workers: Dict[str, WorkerAssignment]
    run_id: str | None = None
    execution_mode: str = "aec"  # "aec" (Agent-Environment Cycle, turn-based) or "parallel" (simultaneous)
    max_steps: int | None = None
WorkerAssignment¶
Each agent slot in an operator maps to a WorkerAssignment. For
single-agent environments there is one assignment; for multi-agent
environments there is one per player:
@dataclass
class WorkerAssignment:
    worker_id: str    # e.g. "cleanrl_worker", "random_worker"
    worker_type: str  # "llm", "vlm", "rl", "human", "random", "passive"
    settings: Dict[str, Any] = field(default_factory=dict)
The worker_type controls how the GUI renders configuration fields
and how the OperatorLauncher builds the subprocess command. Valid
types:
| Type | UI Label | Description |
|---|---|---|
| | LLM | Language model agent. Settings include |
| | VLM | Vision-language model. Same as LLM plus |
| | RL | Trained RL policy. Settings include |
| | Human | Keyboard-driven. The GUI captures input via action buttons. |
| | Random | Uniformly random action selection. Uses |
| | Passive | Always selects the do-nothing (NOOP/STILL) action. Uses |
Note
Each decision-maker type maps directly to its own worker_type.
For example, selecting “Random” in the Type dropdown sets
worker_type="random" and worker_id="random_worker".
Selecting “Passive” sets worker_type="passive" and
worker_id="passive_worker".
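The following sketch shows how assignments for single-agent and multi-agent environments might be built. It redeclares WorkerAssignment so the snippet is self-contained; the __post_init__ check and the VALID_WORKER_TYPES set are assumptions added for illustration (the real class may not validate), and the policy path is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Mirrors the worker_type values listed in the WorkerAssignment comment.
VALID_WORKER_TYPES = {"llm", "vlm", "rl", "human", "random", "passive"}


@dataclass
class WorkerAssignment:
    worker_id: str
    worker_type: str
    settings: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Hypothetical validation, added only for this example.
        if self.worker_type not in VALID_WORKER_TYPES:
            raise ValueError(f"unknown worker_type: {self.worker_type!r}")


# Single-agent environment: exactly one assignment.
single = {"agent_0": WorkerAssignment("random_worker", "random")}

# Multi-agent environment: one assignment per player.
two_player = {
    "agent_0": WorkerAssignment(
        "cleanrl_worker", "rl", settings={"policy_path": "/path/to/model.pth"}
    ),
    "agent_1": WorkerAssignment("passive_worker", "passive"),
}
```

Either dictionary has the shape expected by the workers field of OperatorConfig.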
Agent-Level Interface¶
The agent-level interface is the core abstraction that sits between
environments and decision-makers. Every agent slot in a multi-agent
environment is assigned to exactly one decision-maker – an RL policy,
an LLM, a human, or a random baseline. The interface is uniform:
regardless of what runs behind it, the environment only ever calls
select_action(obs) → action.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
ENV["Environment<br/>(MultiGrid / MeltingPot / PettingZoo)"]
ENV -- "obs" --> A0
ENV -- "obs" --> A1
ENV -- "obs" --> A2
A0 -- "action" --> ENV
A1 -- "action" --> ENV
A2 -- "action" --> ENV
subgraph AGENTS["Agent-Level Interface (Player Assignments)"]
A0["agent_0<br/>RL · XuanCe"]
A1["agent_1<br/>LLM · GPT-4o"]
A2["agent_2<br/>Random · Baseline"]
end
style ENV fill:#4a90d9,stroke:#2e5a87,color:#fff
style AGENTS fill:#f5f5f5,stroke:#999,color:#333
style A0 fill:#9370db,stroke:#6a0dad,color:#fff
style A1 fill:#50c878,stroke:#2e8b57,color:#fff
style A2 fill:#ff7f50,stroke:#cc5500,color:#fff
This is what makes heterogeneous teams possible – each agent slot is independently configured, yet they all plug into the same environment through a single protocol.
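A minimal sketch of this per-slot dispatch follows, assuming each slot is represented by a plain callable with the select_action shape. The three policies and the step_all helper are hypothetical stand-ins for an RL policy, a random baseline, and a passive agent; none of them are MOSAIC functions.

```python
import random
from typing import Any, Callable, Dict

# Each slot holds a callable shaped like select_action(obs) -> action.
SelectAction = Callable[[Any], int]


def make_random_policy(n_actions: int, seed: int) -> SelectAction:
    """Random baseline with its own seeded generator."""
    rng = random.Random(seed)
    return lambda obs: rng.randrange(n_actions)


def greedy_policy(obs: Any) -> int:
    """Stand-in for a trained policy: index of the largest observation value."""
    return max(range(len(obs)), key=lambda i: obs[i])


def passive_policy(obs: Any) -> int:
    """Always returns action 0, treated here as the do-nothing action."""
    return 0


def step_all(
    slots: Dict[str, SelectAction], observations: Dict[str, Any]
) -> Dict[str, int]:
    """Query every agent slot through the same call, whatever sits behind it."""
    return {agent_id: slots[agent_id](obs) for agent_id, obs in observations.items()}


slots = {
    "agent_0": greedy_policy,
    "agent_1": make_random_policy(n_actions=3, seed=0),
    "agent_2": passive_policy,
}
observations = {
    "agent_0": [0.1, 0.7, 0.2],
    "agent_1": [0.0, 0.0, 1.0],
    "agent_2": [0.5, 0.5, 0.0],
}
actions = step_all(slots, observations)
```

Swapping the entry for any one slot changes that agent's paradigm without touching step_all or the other slots.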
Policy Mappings for Multi-Agent RL¶
When deploying RL policies in multi-agent scenarios, MOSAIC supports flexible policy-to-agent mappings through link groups. This is essential because MAPPO/IPPO checkpoints store all agents’ policies in a single file.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
subgraph ONE["One-to-One (Default)"]
direction TB
A0["agent_0<br/>ppo.pth"]
A1["agent_1<br/>dqn.pth"]
end
subgraph MANY["One-to-Many (Link Groups)"]
direction TB
CHECKPOINT["mappo_team.pth"]
B0["agent_0<br/>(Primary)"]
B1["agent_1<br/>(Linked)"]
B2["agent_2<br/>(Linked)"]
CHECKPOINT -->|"Shared"| B0
CHECKPOINT -->|"Shared"| B1
CHECKPOINT -->|"Shared"| B2
end
style ONE fill:#e8f5e9,stroke:#2e8b57,color:#333
style MANY fill:#f3e5f5,stroke:#9c27b0,color:#333
style A0 fill:#50c878,stroke:#2e8b57,color:#fff
style A1 fill:#4a90d9,stroke:#2e5a87,color:#fff
style CHECKPOINT fill:#ff7f50,stroke:#cc5500,color:#fff
style B0 fill:#9370db,stroke:#6a0dad,color:#fff
style B1 fill:#ba68c8,stroke:#8e24aa,color:#fff
style B2 fill:#ba68c8,stroke:#8e24aa,color:#fff
- One-to-one mapping (default):
Each agent has its own independent policy checkpoint. Agents are configured individually with separate policy paths.
- One-to-many mapping (via link groups):
Multiple agents share a single policy checkpoint. The primary agent’s policy path is automatically synced to all linked agents.
Link groups prevent manual copy-paste errors, ensure consistency across agents, and enable complex team configurations (e.g., two independent teams with different policies). They are created manually via the “Link Agents” button in the GUI.
Example: All agents trained together
# All 4 agents share the same MAPPO checkpoint
LinkGroup(
    group_id="operator_0_link_0",
    primary_agent="agent_0",
    linked_agents=["agent_1", "agent_2", "agent_3"],
    policy_path="/path/to/checkpoint/final_train_model.pth",
    algorithm="mappo",
)
Example: Two independent teams
# Offense team (agents 0 and 2)
LinkGroup(
    group_id="operator_0_link_0",
    primary_agent="agent_0",
    linked_agents=["agent_2"],
    policy_path="/path/to/offense_mappo.pth",
    algorithm="mappo",
)
# Defense team (agents 1 and 3)
LinkGroup(
    group_id="operator_0_link_1",
    primary_agent="agent_1",
    linked_agents=["agent_3"],
    policy_path="/path/to/defense_mappo.pth",
    algorithm="mappo",
)
See PolicyMappingService for complete documentation on link groups and policy mappings.
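To make the one-to-many sync concrete, here is a small sketch of how a list of link groups could be flattened into a per-agent policy path. The LinkGroup field names follow the examples above; the field defaults, the resolve_policy_paths helper, and the rule that an agent may belong to only one group are assumptions for illustration, not the documented behavior of PolicyMappingService.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LinkGroup:
    group_id: str
    primary_agent: str
    linked_agents: List[str] = field(default_factory=list)
    policy_path: str = ""
    algorithm: str = ""


def resolve_policy_paths(groups: List[LinkGroup]) -> Dict[str, str]:
    """Map every agent to its group's policy path (hypothetical helper)."""
    paths: Dict[str, str] = {}
    for group in groups:
        # The primary agent's path is applied to itself and all linked agents.
        for agent in [group.primary_agent, *group.linked_agents]:
            if agent in paths:
                raise ValueError(f"{agent} appears in more than one link group")
            paths[agent] = group.policy_path
    return paths


groups = [
    LinkGroup("operator_0_link_0", "agent_0", ["agent_2"],
              "/path/to/offense_mappo.pth", "mappo"),
    LinkGroup("operator_0_link_1", "agent_1", ["agent_3"],
              "/path/to/defense_mappo.pth", "mappo"),
]
paths = resolve_policy_paths(groups)
```

Deriving every agent's path from the group in one place is what removes the manual copy-paste step described above.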
Player Assignment (the GUI for the Agent-Level Interface)¶
Player Assignment is the GUI realization of the agent-level
interface. The PlayerAssignmentPanel in the Configure Operators
widget lets the user wire each agent slot to a specific decision-maker
by selecting a Type and a Worker.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
OCW["OperatorConfigWidget"]
OCW --> PAP
OCW --> MGS
subgraph PAP["PlayerAssignmentPanel"]
direction TB
ROW0["PlayerAssignmentRow<br/>agent_0 → RL · XuanCe Worker"]
ROW1["PlayerAssignmentRow<br/>agent_1 → LLM · GPT-4o"]
end
subgraph MGS["Environment-Specific Settings<br/>(MultiGrid / MeltingPot)"]
direction TB
OBS["Observation Mode"]
COORD["Coordination Strategy<br/>(LLM only)"]
ROLES["Role Assignment<br/>(Level 3 only)"]
end
style OCW fill:#4a90d9,stroke:#2e5a87,color:#fff
style PAP fill:#e8f5e9,stroke:#2e8b57,color:#333
style ROW0 fill:#ff7f50,stroke:#cc5500,color:#fff
style ROW1 fill:#ff7f50,stroke:#cc5500,color:#fff
style MGS fill:#ede7f6,stroke:#6a0dad,color:#333
style OBS fill:#9370db,stroke:#6a0dad,color:#fff
style COORD fill:#9370db,stroke:#6a0dad,color:#fff
style ROLES fill:#9370db,stroke:#6a0dad,color:#fff
Each PlayerAssignmentRow exposes:
- Type dropdown: LLM, RL, Human, or Random. Controls which configuration fields are visible.
- Worker dropdown: populated based on the selected type. Hidden for Human and Random (single worker each).
- Type-specific settings: LLM shows provider/model/API-key fields; RL shows policy path and algorithm; Human and Random show nothing extra.
The panel emits an assignments_changed signal whenever any row
changes, which the parent widget uses to:
- Rebuild the OperatorConfig via get_config().
- Update the visibility of the Coordination Strategy selector. This dropdown appears only for MultiGrid and MeltingPot environments, and only when at least one agent uses an LLM worker (it configures the mosaic_llm_worker's coordination level). When no agent is LLM, the entire coordination section is hidden.
# How the widget builds a multi-agent config (any multi-agent env)
config = OperatorConfig.multi_agent(
    operator_id="op_0",
    display_name="Heterogeneous Team",
    env_name="<env_family>",  # e.g. mosaic_multigrid, meltingpot, pettingzoo
    task="<env_id>",          # e.g. Soccer-2v2, predator_prey, chess_v6
    player_workers={
        "agent_0": WorkerAssignment(
            worker_id="xuance_worker",
            worker_type="rl",
            settings={"policy_path": "/path/to/final_train_model.pth"},
        ),
        "agent_1": WorkerAssignment(
            worker_id="random_worker",
            worker_type="random",
        ),
    },
    observation_mode="visible_teammates",
    coordination_level=1,
)
OperatorService¶
The OperatorService provides a central registry for all available
operators:
class OperatorService:
    def register_operator(self, operator, descriptor) -> None: ...
    def set_active_operator(self, operator_id: str) -> None: ...
    def select_action(self, observation: Any) -> Any: ...
    def seed(self, seed: int) -> None: ...
At startup, MOSAIC registers built-in operators and any discovered via
entry points. The GUI’s operator dropdown is populated from
OperatorService.get_descriptors().
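One way such a registry could behave is sketched below. The method names come from the listing above and from get_descriptors(); the in-memory storage, the dictionary-shaped descriptor with an operator_id key, the seed-forwarding logic, and the CoinFlipOperator class are all assumptions made for this example and may differ from the real service.

```python
import random
from typing import Any, Dict, List, Optional


class InMemoryOperatorService:
    """Hypothetical sketch of a registry using the method names listed above."""

    def __init__(self) -> None:
        self._operators: Dict[str, Any] = {}
        self._descriptors: Dict[str, Dict[str, str]] = {}
        self._active_id: Optional[str] = None

    def register_operator(self, operator: Any, descriptor: Dict[str, str]) -> None:
        # Assumption: the descriptor carries the operator's id.
        operator_id = descriptor["operator_id"]
        self._operators[operator_id] = operator
        self._descriptors[operator_id] = descriptor

    def set_active_operator(self, operator_id: str) -> None:
        if operator_id not in self._operators:
            raise KeyError(f"operator not registered: {operator_id}")
        self._active_id = operator_id

    def get_descriptors(self) -> List[Dict[str, str]]:
        return list(self._descriptors.values())

    def select_action(self, observation: Any) -> Any:
        if self._active_id is None:
            raise RuntimeError("no active operator")
        return self._operators[self._active_id].select_action(observation)

    def seed(self, seed: int) -> None:
        # Forward the seed to every operator that accepts one.
        for operator in self._operators.values():
            if hasattr(operator, "seed"):
                operator.seed(seed)


class CoinFlipOperator:
    """Hypothetical operator used only to exercise the registry."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def seed(self, seed: int) -> None:
        self._rng.seed(seed)

    def select_action(self, observation: Any) -> int:
        return self._rng.randint(0, 1)


service = InMemoryOperatorService()
service.register_operator(
    CoinFlipOperator(), {"operator_id": "coin", "display_name": "Coin Flip"}
)
service.set_active_operator("coin")
service.seed(123)
first_run = [service.select_action(None) for _ in range(5)]
service.seed(123)
second_run = [service.select_action(None) for _ in range(5)]
```

Reseeding with the same value reproduces the same action sequence, which is the kind of behavior a seed method on the service is meant to support.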
Directory Layout¶
Operator-related code lives in the MOSAIC core (not 3rd_party/):
gym_gui/
    services/
        operator.py                           # Protocol + OperatorService
        operator_launcher.py                  # Subprocess spawning
        operator_script_execution_manager.py  # Script mode state machine
    ui/
        widgets/
            operators_tab.py                  # Manual + Script mode tabs
            operator_config_widget.py         # Per-operator config rows
            operator_render_container.py      # Per-operator render view
            multi_operator_render_view.py     # Grid of render containers
            script_experiment_widget.py       # Script mode UI
        panels/
            control_panel_container.py        # Service-to-UI bridge
        config_panels/
            single_agent/                     # Per-game environment configs
            multi_agent/                      # Multi-agent environment configs