What Is an Operator?¶
Operators answer a single question: given an observation, what action should the agent take? They are the decision-making layer of MOSAIC, sitting above the process-level Worker abstraction.
The core interface is simple:
observation --> [Operator] --> action
Every decision-maker whether it may be a human, LLM, RL policy, or Random
policy, implements the same select_action(obs) -> action
protocol. This makes all decision-makers interchangeable: the GUI,
the experiment runner, and the telemetry system never need to know what
kind of operator they are talking to.
Operator vs Worker¶
Concept |
Definition |
Examples |
|---|---|---|
Operator |
The agent-level interface, wraps one or more Worker
subprocesses and presents |
LLM Operator, RL Operator, Human Operator, Chess Operator (wraps 2 workers) |
Worker |
A process-level execution unit inside an Operator.
Manages library lifecycle, API calls, or scripted behaviors.
Lives in |
|
Two Modes of Operation¶
MOSAIC supports two fundamentally different operator configurations:
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
subgraph "Homogeneous"
H1["RL"]
H2["RL"]
H3["RL"]
end
subgraph "Heterogeneous"
X1["RL"]
X2["LLM"]
X3["Human"]
end
style H1 fill:#9370db,stroke:#6a0dad,color:#fff
style H2 fill:#9370db,stroke:#6a0dad,color:#fff
style H3 fill:#9370db,stroke:#6a0dad,color:#fff
style X1 fill:#9370db,stroke:#6a0dad,color:#fff
style X2 fill:#4a90d9,stroke:#2e5a87,color:#fff
style X3 fill:#ff7f50,stroke:#cc5500,color:#fff
- Homogeneous Decision-Makers
All agents use the same paradigm (all RL, all LLM, etc.). Covers the Operator Protocol, the five categories (human, llm, rl, baseline), the single-worker pattern, and GUI adaptation by category.
- Heterogeneous Decision-Maker
Agents use different paradigms in the same experiment (e.g., RL + LLM as teammates). Covers the research gap this addresses, the WorkerAssignment system, experimental configurations, deterministic cross-paradigm evaluation, and the research questions this enables.
OperatorConfig¶
Each operator is configured via an OperatorConfig dataclass:
@dataclass
class OperatorConfig:
operator_id: str
display_name: str
env_name: str
task: str
workers: Dict[str, WorkerAssignment]
run_id: str | None = None
execution_mode: str = "aec" # "aec" (Agent-Environment Cycle, turn-based) or "parallel" (simultaneous)
max_steps: int | None = None
WorkerAssignment¶
Each agent slot in an operator maps to a WorkerAssignment. For
single-agent environments there is one assignment; for multi-agent
environments there is one per player:
@dataclass
class WorkerAssignment:
worker_id: str # e.g. "cleanrl_worker", "operators_worker"
worker_type: str # "llm", "vlm", "rl", "human", "baseline"
settings: Dict[str, Any] = field(default_factory=dict)
The worker_type controls how the GUI renders configuration fields
and how the OperatorLauncher builds the subprocess command. Valid
types:
Type |
UI Label |
Description |
|---|---|---|
|
LLM |
Language model agent. Settings include |
|
VLM |
Vision-language model. Same as LLM plus
|
|
RL |
Trained RL policy. Settings include |
|
Human |
Keyboard-driven. No subprocess; the GUI captures input. |
|
Random |
Simple scripted behavior. Settings include |
Note
In the Configure Operators widget the user selects “Random” from
the Type dropdown. The widget maps this to worker_type="baseline"
and worker_id="operators_worker" with behavior="random"
internally. The distinction between the UI label and the internal
type is intentional: users think in terms of what the agent does
(random actions), not the internal worker category.
Agent-Level Interface¶
The agent-level interface is the core abstraction that sits between
environments and decision-makers. Every agent slot in a multi-agent
environment is assigned to exactly one decision-maker – an RL policy,
an LLM, a human, or a random baseline. The interface is uniform:
regardless of what runs behind it, the environment only ever calls
select_action(obs) → action.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
ENV["Environment<br/>(MultiGrid / MeltingPot / PettingZoo)"]
ENV -- "obs" --> A0
ENV -- "obs" --> A1
ENV -- "obs" --> A2
A0 -- "action" --> ENV
A1 -- "action" --> ENV
A2 -- "action" --> ENV
subgraph AGENTS["Agent-Level Interface (Player Assignments)"]
A0["agent_0<br/>RL · XuanCe"]
A1["agent_1<br/>LLM · GPT-4o"]
A2["agent_2<br/>Random · Baseline"]
end
style ENV fill:#4a90d9,stroke:#2e5a87,color:#fff
style AGENTS fill:#f5f5f5,stroke:#999,color:#333
style A0 fill:#9370db,stroke:#6a0dad,color:#fff
style A1 fill:#50c878,stroke:#2e8b57,color:#fff
style A2 fill:#ff7f50,stroke:#cc5500,color:#fff
This is what makes heterogeneous teams possible – each agent slot is independently configured, yet they all plug into the same environment through a single protocol.
Player Assignment (the GUI for the Agent-Level Interface)¶
Player Assignment is the GUI realization of the agent-level
interface. The PlayerAssignmentPanel in the Configure Operators
widget lets the user wire each agent slot to a specific decision-maker
by selecting a Type and a Worker.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
OCW["OperatorConfigWidget"]
OCW --> PAP
OCW --> MGS
subgraph PAP["PlayerAssignmentPanel"]
direction TB
ROW0["PlayerAssignmentRow<br/>agent_0 → RL · XuanCe Worker"]
ROW1["PlayerAssignmentRow<br/>agent_1 → LLM · GPT-4o"]
end
subgraph MGS["Environment-Specific Settings<br/>(MultiGrid / MeltingPot)"]
direction TB
OBS["Observation Mode"]
COORD["Coordination Strategy<br/>(LLM only)"]
ROLES["Role Assignment<br/>(Level 3 only)"]
end
style OCW fill:#4a90d9,stroke:#2e5a87,color:#fff
style PAP fill:#e8f5e9,stroke:#2e8b57,color:#333
style ROW0 fill:#ff7f50,stroke:#cc5500,color:#fff
style ROW1 fill:#ff7f50,stroke:#cc5500,color:#fff
style MGS fill:#ede7f6,stroke:#6a0dad,color:#333
style OBS fill:#9370db,stroke:#6a0dad,color:#fff
style COORD fill:#9370db,stroke:#6a0dad,color:#fff
style ROLES fill:#9370db,stroke:#6a0dad,color:#fff
Each PlayerAssignmentRow exposes:
Type dropdown –
LLM,RL,Human, orRandom. Controls which configuration fields are visible.Worker dropdown – populated based on the selected type. Hidden for Human and Random (single worker each).
Type-specific settings – LLM shows provider/model/API-key fields; RL shows policy path and algorithm; Human and Random show nothing extra.
The panel emits an assignments_changed signal whenever any row
changes, which the parent widget uses to:
Rebuild the
OperatorConfigviaget_config().Update the visibility of the Coordination Strategy selector – this dropdown appears only for MultiGrid and MeltingPot environments, and only when at least one agent uses an LLM worker (it configures the
mosaic_llm_worker’s coordination level). When no agent is LLM the entire coordination section is hidden.
# How the widget builds a multi-agent config (any multi-agent env)
config = OperatorConfig.multi_agent(
operator_id="op_0",
display_name="Heterogeneous Team",
env_name="<env_family>", # e.g. mosaic_multigrid, meltingpot, pettingzoo
task="<env_id>", # e.g. Soccer-2v2, predator_prey, chess_v6
player_workers={
"agent_0": WorkerAssignment(
worker_id="xuance_worker",
worker_type="rl",
settings={"policy_path": "/path/to/final_train_model.pth"},
),
"agent_1": WorkerAssignment(
worker_id="operators_worker",
worker_type="baseline",
settings={"behavior": "random"},
),
},
observation_mode="visible_teammates",
coordination_level=1,
)
OperatorService¶
The OperatorService provides a central registry for all available
operators:
class OperatorService:
def register_operator(self, operator, descriptor) -> None: ...
def set_active_operator(self, operator_id: str) -> None: ...
def select_action(self, observation: Any) -> Any: ...
def seed(self, seed: int) -> None: ...
At startup, MOSAIC registers built-in operators and any discovered via
entry points. The GUI’s operator dropdown is populated from
OperatorService.get_descriptors().
Directory Layout¶
Operator-related code lives in the MOSAIC core (not 3rd_party/):
gym_gui/
services/
operator.py # Protocol + OperatorService
operator_launcher.py # Subprocess spawning
operator_script_execution_manager.py # Script mode state machine
ui/
widgets/
operators_tab.py # Manual + Script mode tabs
operator_config_widget.py # Per-operator config rows
operator_render_container.py # Per-operator render view
multi_operator_render_view.py # Grid of render containers
script_experiment_widget.py # Script mode UI
panels/
control_panel_container.py # Service-to-UI bridge
config_panels/
single_agent/ # Per-game environment configs
multi_agent/ # Multi-agent environment configs