Operators
An Operator is the agent-level interface of MOSAIC: the unified abstraction that lets the GUI assign a worker to each individual agent or to a group of agents. Whereas Workers handle process-level concerns (training, telemetry, GPU isolation), Operators are strictly for evaluation and interactive play. The worker inside an Operator loads a trained policy (or calls an LLM API, or reads keyboard input) and computes actions step by step. The Operator wraps this worker and answers the question “given this observation, what action should I take?”
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
GUI["Qt6 GUI<br/>(Main Process)"]
LAUNCHER["OperatorLauncher<br/>(Subprocess Manager)"]
GUI --> LAUNCHER
LAUNCHER -- "stdin/stdout JSON" --> H_OP
LAUNCHER -- "stdin/stdout JSON" --> L_OP
LAUNCHER -- "stdin/stdout JSON" --> V_OP
LAUNCHER -- "stdin/stdout JSON" --> R_OP
LAUNCHER -- "stdin/stdout JSON" --> RND_OP
LAUNCHER -- "stdin/stdout JSON" --> P_OP
subgraph H_OP["Human Operator"]
HW["human_worker<br/>Keyboard Input"]
end
subgraph L_OP["LLM Operator"]
LW1["balrog_worker<br/>Single-Agent"]
LW2["llm_worker<br/>MOSAIC Native"]
LW3["chess_worker<br/>Two-Player"]
end
subgraph V_OP["VLM Operator"]
VW["vlm_worker<br/>Vision-Language"]
end
subgraph R_OP["RL Operator"]
RW1["cleanrl_worker<br/>PPO / DQN"]
RW2["xuance_worker<br/>MAPPO / QMIX"]
RW3["ray_worker<br/>PPO / IMPALA"]
end
subgraph RND_OP["Random Operator"]
RNDW["random_worker<br/>Uniform Random"]
end
subgraph P_OP["Passive Operator"]
PW["passive_worker<br/>NOOP / STILL"]
end
style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
style LAUNCHER fill:#50c878,stroke:#2e8b57,color:#fff
style H_OP fill:#9370db,stroke:#6a0dad,color:#fff
style L_OP fill:#9370db,stroke:#6a0dad,color:#fff
style V_OP fill:#9370db,stroke:#6a0dad,color:#fff
style R_OP fill:#9370db,stroke:#6a0dad,color:#fff
style RND_OP fill:#9370db,stroke:#6a0dad,color:#fff
style P_OP fill:#9370db,stroke:#6a0dad,color:#fff
style HW fill:#ff7f50,stroke:#cc5500,color:#fff
style LW1 fill:#ff7f50,stroke:#cc5500,color:#fff
style LW2 fill:#ff7f50,stroke:#cc5500,color:#fff
style LW3 fill:#ff7f50,stroke:#cc5500,color:#fff
style VW fill:#ff7f50,stroke:#cc5500,color:#fff
style RW1 fill:#ff7f50,stroke:#cc5500,color:#fff
style RW2 fill:#ff7f50,stroke:#cc5500,color:#fff
style RW3 fill:#ff7f50,stroke:#cc5500,color:#fff
style RNDW fill:#ff7f50,stroke:#cc5500,color:#fff
style PW fill:#ff7f50,stroke:#cc5500,color:#fff
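The diagram labels every launcher-to-operator link as "stdin/stdout JSON". As a rough illustration of that pattern, the sketch below shows a line-delimited JSON request/reply loop on the operator side. The message fields (`cmd`, `obs`, `type`, `action`) and the `serve` and `demo` functions are hypothetical, chosen only for this example; this section does not specify the actual MOSAIC wire format.

```python
import io
import json
import random
from typing import TextIO


def serve(stdin: TextIO, stdout: TextIO, n_actions: int = 4, seed: int = 0) -> None:
    """Read one JSON request per line and write one JSON reply per line.

    Hypothetical message schema, for illustration only.
    """
    rng = random.Random(seed)
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        msg = json.loads(line)
        cmd = msg.get("cmd")
        if cmd == "select_action":
            # A real operator would feed msg["obs"] to a policy here;
            # this stand-in just samples a uniform random action.
            reply = {"type": "action", "action": rng.randrange(n_actions)}
        elif cmd == "stop":
            stdout.write(json.dumps({"type": "stopped"}) + "\n")
            stdout.flush()
            break
        else:
            reply = {"type": "error", "message": f"unknown cmd: {cmd!r}"}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush so the parent process sees the reply immediately


def demo() -> list[dict]:
    """Drive serve() with in-memory streams instead of a real subprocess."""
    requests = [
        {"cmd": "select_action", "obs": [0, 1, 0]},
        {"cmd": "bogus"},
        {"cmd": "stop"},
    ]
    fake_stdin = io.StringIO("".join(json.dumps(r) + "\n" for r in requests))
    fake_stdout = io.StringIO()
    serve(fake_stdin, fake_stdout)
    return [json.loads(reply) for reply in fake_stdout.getvalue().splitlines()]
```

In a real deployment the same loop would be given `sys.stdin` and `sys.stdout`, and the parent process would write requests to the child's pipe. Flushing after every reply matters because pipe output is otherwise block-buffered.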
Key Principles
| Principle | Description |
|---|---|
| Protocol-Based | Operators implement Python |
| Category System | Every operator belongs to a category: |
| Interactive Mode | Operators run as subprocesses with |
| Multi-Operator Comparison | Multiple operators can run side-by-side on the same environment with shared seeds for scientific comparison (e.g., LLM vs RL on the same MiniGrid layout). |
| Decoupled Execution | Manual mode (click-to-step) and Script mode (automated experiments) are fully independent code paths with separate state machines. |
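To make the shared-seed comparison concrete, here is a minimal sketch in which two decision-makers are evaluated on identical layouts because the environment is reset with the same seed for each. `ToyGridEnv`, `run_episode`, `greedy`, `make_random_policy` and `compare` are all hypothetical stand-ins invented for this example; they are not MOSAIC or MiniGrid APIs.

```python
import random
from typing import Callable

Obs = tuple[int, int]  # (agent position, goal position)


class ToyGridEnv:
    """Hypothetical 1-D corridor whose goal position depends only on the seed."""

    def __init__(self, length: int = 8) -> None:
        self.length = length
        self.pos = 0
        self.goal = 1

    def reset(self, seed: int) -> Obs:
        rng = random.Random(seed)  # same seed -> same layout
        self.goal = rng.randrange(1, self.length)
        self.pos = 0
        return (self.pos, self.goal)

    def step(self, action: int) -> tuple[Obs, bool]:
        # action 1 moves right, any other action moves left (clamped to the corridor)
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        return (self.pos, self.goal), self.pos == self.goal


def run_episode(policy: Callable[[Obs], int], seed: int, max_steps: int = 50) -> int:
    """Return the number of steps taken to reach the goal (capped at max_steps)."""
    env = ToyGridEnv()
    obs = env.reset(seed)
    for t in range(1, max_steps + 1):
        obs, done = env.step(policy(obs))
        if done:
            return t
    return max_steps


def greedy(obs: Obs) -> int:
    pos, goal = obs
    return 1 if goal > pos else 0


def make_random_policy(seed: int) -> Callable[[Obs], int]:
    rng = random.Random(seed)
    return lambda obs: rng.randrange(2)


def compare(seeds: list[int]) -> dict[int, dict[str, int]]:
    """Run both policies on the same seeded layouts and collect step counts."""
    return {
        seed: {
            "greedy": run_episode(greedy, seed),
            "random": run_episode(make_random_policy(seed), seed),
        }
        for seed in seeds
    }
```

Because each policy sees exactly the same goal position for a given seed, any difference in step counts reflects the decision-maker rather than the layout.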
Available Operators
| Operator | Category | Backend | Use Case |
|---|---|---|---|
| Human | human | Keyboard input via GUI | Manual play and debugging |
| BALROG LLM | llm | balrog_worker (vLLM, OpenRouter) | Single-agent LLM benchmarking on MiniGrid/BabyAI |
| MOSAIC LLM | llm | mosaic_llm_worker (vLLM, OpenRouter, OpenAI, Anthropic) | Multi-agent LLM with coordination and Theory of Mind |
| Chess LLM | llm | chess_worker (llm_chess prompting) | LLM chess play with multi-turn dialog |
| CleanRL | rl | cleanrl_worker (PPO, DQN) | Trained single-agent RL policy evaluation |
| XuanCe | rl | xuance_worker (MAPPO, QMIX) | Trained multi-agent RL policy evaluation |
| Ray RLlib | rl | ray_worker (PPO, IMPALA) | Distributed RL policy evaluation |
| MOSAIC Random Worker | random | random_worker (random action) | Random action selection for experiments |
| MOSAIC Passive Worker | passive | passive_worker (NOOP/STILL) | Do-nothing agent for experiments |
Tip
An Operator wraps one or more Workers. The Operator is the
agent-level interface (select_action(obs) -> action) that the
GUI interacts with. The Worker is the process-level engine that
runs inside the Operator. This separation is what enables heterogeneous
teams – e.g., an RL-trained policy and an LLM playing side-by-side
in the same multi-agent environment. See What Is an Operator? for the
full motivation and diagrams.
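One way to express the `select_action(obs) -> action` interface in Python is structural typing with `typing.Protocol`. The sketch below is illustrative only: the `Operator` protocol, the two toy implementations and the `act_all` helper are hypothetical names for this example, and the real MOSAIC interface may carry additional methods or different signatures.

```python
import random
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Operator(Protocol):
    """Hypothetical agent-level interface: observation in, action out."""

    def select_action(self, obs: Any) -> Any: ...


class RandomOperator:
    """Toy stand-in for a random operator: samples a uniform discrete action."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.n_actions = n_actions
        self._rng = random.Random(seed)

    def select_action(self, obs: Any) -> int:
        return self._rng.randrange(self.n_actions)


class PassiveOperator:
    """Toy stand-in for a passive operator: always returns the same no-op action."""

    def __init__(self, noop_action: int = 0) -> None:
        self.noop_action = noop_action

    def select_action(self, obs: Any) -> int:
        return self.noop_action


def act_all(team: dict[str, Operator], observations: dict[str, Any]) -> dict[str, Any]:
    """Query each agent's operator with that agent's own observation."""
    return {
        agent_id: operator.select_action(observations[agent_id])
        for agent_id, operator in team.items()
    }
```

Neither toy class inherits from `Operator`; each satisfies the protocol simply by providing `select_action`. That is what lets different kinds of decision-makers sit in the same `team` dictionary.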
Note
Policy Mappings for Multi-Agent RL: When deploying RL policies in multi-agent scenarios, MOSAIC supports flexible policy-to-agent mappings through link groups. This enables one-to-one mappings (each agent has its own policy) and one-to-many mappings (multiple agents share a single policy checkpoint). Link groups are essential for MAPPO/IPPO evaluation because these algorithms store all agents’ policies in a single checkpoint file. See PolicyMappingService for complete documentation.
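The two mapping shapes can be sketched as plain data. Everything below is an assumption made for illustration: the `LinkGroup` dataclass, its fields, the checkpoint paths and `build_agent_policy_map` are hypothetical and do not reflect the actual PolicyMappingService API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkGroup:
    """Hypothetical link group: one policy checkpoint shared by a set of agents."""

    policy_id: str
    checkpoint: str  # illustrative path, not a real MOSAIC artifact
    agents: tuple[str, ...]


def build_agent_policy_map(groups: list[LinkGroup]) -> dict[str, str]:
    """Flatten link groups into an agent -> policy_id lookup.

    Raises ValueError if an agent is claimed by more than one group.
    """
    mapping: dict[str, str] = {}
    for group in groups:
        for agent in group.agents:
            if agent in mapping:
                raise ValueError(f"agent {agent!r} appears in more than one link group")
            mapping[agent] = group.policy_id
    return mapping


# One-to-many: three agents share a single checkpoint.
shared = [
    LinkGroup("shared", "checkpoints/shared_policy.pt", ("agent_0", "agent_1", "agent_2")),
]

# One-to-one: each agent has its own policy.
independent = [
    LinkGroup("policy_0", "checkpoints/policy_0.pt", ("agent_0",)),
    LinkGroup("policy_1", "checkpoints/policy_1.pt", ("agent_1",)),
]
```

Calling `build_agent_policy_map(shared)` maps all three agents to `"shared"`, while `build_agent_policy_map(independent)` gives each agent a distinct policy id.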
- What Is an Operator?
- Homogeneous Decision-Makers
- Heterogeneous Decision-Maker
- Policy Mappings for Heterogeneous Multi-Agent Systems
- IPC Architecture
- Operator Lifecycle
- Developing an Operator
- Adding an Environment Family
  - Step 1: Add to ENV_FAMILIES
  - Step 2: Update _auto_detect_agent_count
  - Step 3: Update _get_execution_mode
  - Step 4: Update multi-agent guard tuples
  - Step 5: Add environment creation
  - Step 6: Add preview rendering
  - Step 7: Add settings panel (optional)
  - Environment Family Checklist
  - Example: mosaic_multigrid/ini_multigridSplit
- Operator Examples