Operator Examples¶
MOSAIC ships with several operator types, each demonstrating a different decision-making strategy.
Human Operator¶
Category: human
The simplest operator, it returns None for every action,
signalling the GUI to inject keyboard input.
class HumanOperator:
@property
def id(self) -> str:
return "human_keyboard"
@property
def name(self) -> str:
return "Human (Keyboard)"
def select_action(self, observation, legal_actions=None):
return None # GUI injects keyboard action
def reset(self, seed=None):
pass
def on_step_result(self, observation, action, reward,
terminated, truncated):
pass
The Human Operator is always available and registered at startup. No subprocess is spawned, actions come directly from Qt keyboard events.
Best for: Manual exploration, debugging, understanding environments.
BALROG LLM Operator¶
Category: llm
Worker: barlog_worker
Environments: BabyAI, MiniGrid, MiniHack, Crafter, TextWorld
The BALROG operator uses the BALROG benchmark prompting style to evaluate LLMs as agents in grid-world environments.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
GUI["GUI"] -->|"stdin"| BW["barlog_worker<br/>InteractiveRuntime"]
BW --> AGENT["BALROG Agent<br/>(naive/cot/robust)"]
AGENT --> LLM["LLM API<br/>(vLLM / OpenRouter)"]
LLM --> AGENT
AGENT --> ENV["BabyAI / MiniGrid"]
BW -->|"stdout"| GUI
style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
style BW fill:#ff7f50,stroke:#cc5500,color:#fff
style LLM fill:#ffd700,stroke:#b8860b
Prompting style: Single-turn, BALROG-style:
System: You are an agent in a grid world.
Available actions: turn_left, turn_right, forward, ...
User: [observation text/image]
What action do you take?
Assistant: forward
Configuration:
config = OperatorConfig.single_agent(
operator_id="llm_balrog",
operator_type="llm",
worker_id="barlog_worker",
display_name="GPT-4o on BabyAI",
env_name="babyai",
task="BabyAI-GoToRedBall-v0",
settings={
"client_name": "vllm",
"model_id": "Qwen/Qwen2.5-1.5B-Instruct",
"base_url": "http://127.0.0.1:8000/v1",
"agent_type": "naive",
},
)
Supported LLM providers:
Provider |
|
Notes |
|---|---|---|
vLLM (local) |
|
Local GPU inference, fastest |
OpenRouter |
|
Multi-model gateway (GPT-4, Claude, Llama) |
OpenAI |
|
GPT-4o, GPT-4-turbo |
Anthropic |
|
Claude 3.5 Sonnet, Claude 3 Opus |
|
Gemini Pro, Gemini Ultra |
Chess LLM Operator¶
Category: llm
Worker: chess_worker
Environments: PettingZoo chess_v6
A specialized operator that uses the llm_chess prompting style for multi-turn chess play.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
subgraph "Chess Worker"
RT["ChessWorkerRuntime"]
CONV["Multi-turn<br/>Conversation"]
VALID["Move Validator<br/>(regex + legal check)"]
end
LLM["LLM API"] <--> CONV
RT --> CONV --> VALID
VALID -->|"valid UCI move"| OUT["stdout response"]
style RT fill:#ff7f50,stroke:#cc5500,color:#fff
style LLM fill:#ffd700,stroke:#b8860b
Key differences from BALROG:
Feature |
BALROG Operator |
Chess Operator |
|---|---|---|
Prompting |
Single-turn |
Multi-turn dialog |
Actions |
Raw text to action |
|
Validation |
None |
Regex + legal move check |
Retry |
No (falls back to random) |
Yes (up to 3 attempts) |
Multi-turn conversation example:
System: You are a professional chess player playing as white.
Actions: get_current_board, get_legal_moves, make_move <UCI>
User: Current position: [board]. Legal moves: e2e4, d2d4, g1f3...
LLM: get_legal_moves
User: Legal moves: e2e4, d2d4, g1f3, b1c3, ...
LLM: make_move e2e4
Retry on invalid move:
LLM: make_move e5e6 (illegal)
User: Invalid move 'e5e6'. Legal moves are: e2e4, d2d4, ...
LLM: make_move e2e4 (valid)
Configuration:
config = OperatorConfig.single_agent(
operator_id="chess_llm",
operator_type="llm",
worker_id="chess_worker",
display_name="Claude on Chess",
env_name="pettingzoo",
task="chess_v6",
settings={
"client_name": "openrouter",
"model_id": "anthropic/claude-3.5-sonnet",
"base_url": "https://openrouter.ai/api/v1",
"api_key": "sk-or-...",
"temperature": 0.3,
"max_retries": 3,
},
)
RL Operator (CleanRL Interactive)¶
Category: rl
Worker: cleanrl_worker
Environments: Any Gymnasium environment
The RL operator loads a trained CleanRL checkpoint and runs inference step-by-step under GUI control. This enables side-by-side comparison of trained RL policies against LLM agents on the same environment with shared seeds.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
GUI["GUI"] -->|"stdin"| CW["cleanrl_worker<br/>InteractiveRuntime"]
CW --> POLICY["Trained Policy<br/>(PPO / DQN)"]
CW --> ENV["Gymnasium Env"]
POLICY -->|"action"| ENV
CW -->|"stdout"| GUI
style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
style CW fill:#ff7f50,stroke:#cc5500,color:#fff
style POLICY fill:#f0e68c,stroke:#bdb76b
Configuration:
config = OperatorConfig.single_agent(
operator_id="rl_ppo",
operator_type="rl",
worker_id="cleanrl_worker",
display_name="PPO on MiniGrid",
env_name="minigrid",
task="MiniGrid-Empty-8x8-v0",
settings={
"policy_path": "var/runs/ppo_minigrid/model.cleanrl_model",
"algorithm": "ppo",
},
)
Wrapper considerations: The InteractiveRuntime must apply the
same observation wrappers used during training (e.g., ImgObsWrapper
and FlattenObservation for MiniGrid).
MOSAIC MultiGrid Operator (Multi-Agent)¶
Category: llm or rl
Environments: mosaic_multigrid family (Soccer, Collect, Basketball)
The mosaic_multigrid family provides competitive team-based multi-agent
games. Environments are registered via the mosaic-multigrid PyPI
package and created using gym.make().
Configuration (2v2 Soccer with IndAgObs):
config = OperatorConfig.multi_agent(
operator_id="soccer_2v2",
operator_type="rl",
worker_id="cleanrl_worker",
display_name="Soccer 2v2 IndAgObs",
env_name="mosaic_multigrid",
task="MosaicMultiGrid-Soccer-2vs2-IndAgObs-v0",
workers={
"agent_0": WorkerConfig(worker_id="cleanrl_worker", settings={}),
"agent_1": WorkerConfig(worker_id="cleanrl_worker", settings={}),
"agent_2": WorkerConfig(worker_id="cleanrl_worker", settings={}),
"agent_3": WorkerConfig(worker_id="cleanrl_worker", settings={}),
},
)
Available environment tiers:
Tier |
Envs |
Notes |
|---|---|---|
Original (v1.0.2) |
4 |
Deprecated, backward compatible |
IndAgObs (v4.0.0) |
6 |
Individual agent observations, recommended for RL training |
TeamObs (v4.0.0) |
3 |
SMAC-style teammate awareness (2v2+ only) |
Best for: Multi-agent RL research, team coordination, competitive zero-sum games.
Note
The mosaic_multigrid package (v4.4.0+) uses the modern
Gymnasium API. All environments are registered via
gymnasium.register() in mosaic_multigrid.envs. The preview
and runtime use gymnasium.make(task) – never hardcode class
imports.
Random Operator¶
Category: random
Worker: random_worker
Environments: Any
The simplest non-human operator – selects random actions from the environment’s action space. Used as a performance baseline in scripted experiments.
config = OperatorConfig.single_agent(
operator_id="random_1",
operator_type="random",
worker_id="random_worker",
display_name="Random Agent",
env_name="minigrid",
task="MiniGrid-Empty-8x8-v0",
)
Multi-Agent Comparison¶
The most powerful use of operators is side-by-side comparison of different decision-making strategies on the same environment:
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
subgraph "Shared Environment Configuration"
SEED["Shared Seed: 42"]
ENV["MiniGrid-Empty-8x8-v0"]
end
subgraph "Operator 1"
O1["LLM Operator<br/>GPT-4o"]
R1["Render Container 1"]
end
subgraph "Operator 2"
O2["RL Operator<br/>PPO (trained)"]
R2["Render Container 2"]
end
subgraph "Operator 3"
O3["Random Baseline"]
R3["Render Container 3"]
end
SEED --> O1
SEED --> O2
SEED --> O3
ENV --> O1
ENV --> O2
ENV --> O3
style SEED fill:#f0e68c,stroke:#bdb76b
style O1 fill:#9370db,stroke:#6a0dad,color:#fff
style O2 fill:#9370db,stroke:#6a0dad,color:#fff
style O3 fill:#9370db,stroke:#6a0dad,color:#fff
style R1 fill:#4a90d9,stroke:#2e5a87,color:#fff
style R2 fill:#4a90d9,stroke:#2e5a87,color:#fff
style R3 fill:#4a90d9,stroke:#2e5a87,color:#fff
Each operator gets its own:
Environment instance (same seed for reproducibility)
Render container in the GUI
Subprocess with independent state
Telemetry output file
Script for 3-way comparison:
# compare_llm_rl_random.py
operators = [
{
"id": "llm_gpt4o",
"name": "GPT-4o",
"type": "llm",
"worker_id": "barlog_worker",
"env_name": "babyai",
"task": "BabyAI-GoToRedBall-v0",
"settings": {
"client_name": "openrouter",
"model_id": "openai/gpt-4o",
},
},
{
"id": "rl_ppo",
"name": "Trained PPO",
"type": "rl",
"worker_id": "cleanrl_worker",
"env_name": "minigrid",
"task": "BabyAI-GoToRedBall-v0",
"settings": {
"policy_path": "var/runs/ppo_babyai/model.cleanrl_model",
"algorithm": "ppo",
},
},
{
"id": "random_agent",
"name": "Random",
"type": "random",
"worker_id": "random_worker",
"env_name": "minigrid",
"task": "BabyAI-GoToRedBall-v0",
},
]
execution = {
"num_episodes": 50,
"seeds": list(range(1000, 1050)),
"step_delay_ms": 0, # Fastest (no visual pacing)
"env_mode": "procedural", # Different layout per episode
}
Comparison Table¶
Feature |
Human |
BALROG LLM |
Chess LLM |
CleanRL RL |
Random |
MOSAIC LLM |
|---|---|---|---|---|---|---|
Category |
human |
llm |
llm |
rl |
random |
llm |
Subprocess |
No |
Yes |
Yes |
Yes |
Yes |
Yes |
Training |
No |
No |
No |
Yes (offline) |
No |
No |
Multi-agent |
Yes |
No |
Yes |
No |
No |
Yes |
Retry Logic |
N/A |
No |
Yes (3x) |
N/A |
N/A |
No |
GPU Required |
No |
Optional |
Optional |
No |
No |
Optional |