Operator Examples¶

MOSAIC ships with several operator types, each demonstrating a different decision-making strategy.

Human Operator¶

Category: human

The simplest operator, it returns None for every action, signalling the GUI to inject keyboard input.

class HumanOperator:
    @property
    def id(self) -> str:
        return "human_keyboard"

    @property
    def name(self) -> str:
        return "Human (Keyboard)"

    def select_action(self, observation, legal_actions=None):
        return None  # GUI injects keyboard action

    def reset(self, seed=None):
        pass

    def on_step_result(self, observation, action, reward,
                       terminated, truncated):
        pass

The Human Operator is always available and registered at startup. No subprocess is spawned, actions come directly from Qt keyboard events.

Best for: Manual exploration, debugging, understanding environments.

BALROG LLM Operator¶

Category: llm Worker: barlog_worker Environments: BabyAI, MiniGrid, MiniHack, Crafter, TextWorld

The BALROG operator uses the BALROG benchmark prompting style to evaluate LLMs as agents in grid-world environments.

        %%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    GUI["GUI"] -->|"stdin"| BW["barlog_worker<br/>InteractiveRuntime"]
    BW --> AGENT["BALROG Agent<br/>(naive/cot/robust)"]
    AGENT --> LLM["LLM API<br/>(vLLM / OpenRouter)"]
    LLM --> AGENT
    AGENT --> ENV["BabyAI / MiniGrid"]
    BW -->|"stdout"| GUI

    style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
    style BW fill:#ff7f50,stroke:#cc5500,color:#fff
    style LLM fill:#ffd700,stroke:#b8860b

Prompting style: Single-turn, BALROG-style:

System: You are an agent in a grid world.
Available actions: turn_left, turn_right, forward, ...

User: [observation text/image]
What action do you take?

Assistant: forward

Configuration:

config = OperatorConfig.single_agent(
    operator_id="llm_balrog",
    operator_type="llm",
    worker_id="barlog_worker",
    display_name="GPT-4o on BabyAI",
    env_name="babyai",
    task="BabyAI-GoToRedBall-v0",
    settings={
        "client_name": "vllm",
        "model_id": "Qwen/Qwen2.5-1.5B-Instruct",
        "base_url": "http://127.0.0.1:8000/v1",
        "agent_type": "naive",
    },
)

Supported LLM providers:

Provider	`client_name`	Notes
vLLM (local)	`vllm`	Local GPU inference, fastest
OpenRouter	`openrouter`	Multi-model gateway (GPT-4, Claude, Llama)
OpenAI	`openai`	GPT-4o, GPT-4-turbo
Anthropic	`anthropic`	Claude 3.5 Sonnet, Claude 3 Opus
Google	`google`	Gemini Pro, Gemini Ultra

Chess LLM Operator¶

Category: llm Worker: chess_worker Environments: PettingZoo chess_v6

A specialized operator that uses the llm_chess prompting style for multi-turn chess play.

        %%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
    subgraph "Chess Worker"
        RT["ChessWorkerRuntime"]
        CONV["Multi-turn<br/>Conversation"]
        VALID["Move Validator<br/>(regex + legal check)"]
    end

    LLM["LLM API"] <--> CONV
    RT --> CONV --> VALID
    VALID -->|"valid UCI move"| OUT["stdout response"]

    style RT fill:#ff7f50,stroke:#cc5500,color:#fff
    style LLM fill:#ffd700,stroke:#b8860b

Key differences from BALROG:

Feature	BALROG Operator	Chess Operator
Prompting	Single-turn	Multi-turn dialog
Actions	Raw text to action	`get_board`, `get_legal_moves`, `make_move`
Validation	None	Regex + legal move check
Retry	No (falls back to random)	Yes (up to 3 attempts)

Multi-turn conversation example:

System: You are a professional chess player playing as white.
Actions: get_current_board, get_legal_moves, make_move <UCI>

User: Current position: [board]. Legal moves: e2e4, d2d4, g1f3...
LLM:  get_legal_moves
User: Legal moves: e2e4, d2d4, g1f3, b1c3, ...
LLM:  make_move e2e4

Retry on invalid move:

LLM:  make_move e5e6    (illegal)
User: Invalid move 'e5e6'. Legal moves are: e2e4, d2d4, ...
LLM:  make_move e2e4    (valid)

Configuration:

config = OperatorConfig.single_agent(
    operator_id="chess_llm",
    operator_type="llm",
    worker_id="chess_worker",
    display_name="Claude on Chess",
    env_name="pettingzoo",
    task="chess_v6",
    settings={
        "client_name": "openrouter",
        "model_id": "anthropic/claude-3.5-sonnet",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": "sk-or-...",
        "temperature": 0.3,
        "max_retries": 3,
    },
)

RL Operator (CleanRL Interactive)¶

Category: rl Worker: cleanrl_worker Environments: Any Gymnasium environment

The RL operator loads a trained CleanRL checkpoint and runs inference step-by-step under GUI control. This enables side-by-side comparison of trained RL policies against LLM agents on the same environment with shared seeds.

        %%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    GUI["GUI"] -->|"stdin"| CW["cleanrl_worker<br/>InteractiveRuntime"]
    CW --> POLICY["Trained Policy<br/>(PPO / DQN)"]
    CW --> ENV["Gymnasium Env"]
    POLICY -->|"action"| ENV
    CW -->|"stdout"| GUI

    style GUI fill:#4a90d9,stroke:#2e5a87,color:#fff
    style CW fill:#ff7f50,stroke:#cc5500,color:#fff
    style POLICY fill:#f0e68c,stroke:#bdb76b

Configuration:

config = OperatorConfig.single_agent(
    operator_id="rl_ppo",
    operator_type="rl",
    worker_id="cleanrl_worker",
    display_name="PPO on MiniGrid",
    env_name="minigrid",
    task="MiniGrid-Empty-8x8-v0",
    settings={
        "policy_path": "var/runs/ppo_minigrid/model.cleanrl_model",
        "algorithm": "ppo",
    },
)

Wrapper considerations: The InteractiveRuntime must apply the same observation wrappers used during training (e.g., ImgObsWrapper and FlattenObservation for MiniGrid).

MOSAIC MultiGrid Operator (Multi-Agent)¶

Category: llm or rl Environments: mosaic_multigrid family (Soccer, Collect, Basketball)

The mosaic_multigrid family provides competitive team-based multi-agent games. Environments are registered via the mosaic-multigrid PyPI package and created using gym.make().

Configuration (2v2 Soccer with IndAgObs):

config = OperatorConfig.multi_agent(
    operator_id="soccer_2v2",
    operator_type="rl",
    worker_id="cleanrl_worker",
    display_name="Soccer 2v2 IndAgObs",
    env_name="mosaic_multigrid",
    task="MosaicMultiGrid-Soccer-2vs2-IndAgObs-v0",
    workers={
        "agent_0": WorkerConfig(worker_id="cleanrl_worker", settings={}),
        "agent_1": WorkerConfig(worker_id="cleanrl_worker", settings={}),
        "agent_2": WorkerConfig(worker_id="cleanrl_worker", settings={}),
        "agent_3": WorkerConfig(worker_id="cleanrl_worker", settings={}),
    },
)

Available environment tiers:

Tier	Envs	Notes
Original (v1.0.2)	4	Deprecated, backward compatible
IndAgObs (v4.0.0)	6	Individual agent observations, recommended for RL training
TeamObs (v4.0.0)	3	SMAC-style teammate awareness (2v2+ only)

Best for: Multi-agent RL research, team coordination, competitive zero-sum games.

Note

The mosaic_multigrid package (v4.4.0+) uses the modern Gymnasium API. All environments are registered via gymnasium.register() in mosaic_multigrid.envs. The preview and runtime use gymnasium.make(task) – never hardcode class imports.

Random Operator¶

Category: random Worker: random_worker Environments: Any

The simplest non-human operator – selects random actions from the environment’s action space. Used as a performance baseline in scripted experiments.

config = OperatorConfig.single_agent(
    operator_id="random_1",
    operator_type="random",
    worker_id="random_worker",
    display_name="Random Agent",
    env_name="minigrid",
    task="MiniGrid-Empty-8x8-v0",
)

Multi-Agent Comparison¶

The most powerful use of operators is side-by-side comparison of different decision-making strategies on the same environment:

        %%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
    subgraph "Shared Environment Configuration"
        SEED["Shared Seed: 42"]
        ENV["MiniGrid-Empty-8x8-v0"]
    end

    subgraph "Operator 1"
        O1["LLM Operator<br/>GPT-4o"]
        R1["Render Container 1"]
    end

    subgraph "Operator 2"
        O2["RL Operator<br/>PPO (trained)"]
        R2["Render Container 2"]
    end

    subgraph "Operator 3"
        O3["Random Baseline"]
        R3["Render Container 3"]
    end

    SEED --> O1
    SEED --> O2
    SEED --> O3
    ENV --> O1
    ENV --> O2
    ENV --> O3

    style SEED fill:#f0e68c,stroke:#bdb76b
    style O1 fill:#9370db,stroke:#6a0dad,color:#fff
    style O2 fill:#9370db,stroke:#6a0dad,color:#fff
    style O3 fill:#9370db,stroke:#6a0dad,color:#fff
    style R1 fill:#4a90d9,stroke:#2e5a87,color:#fff
    style R2 fill:#4a90d9,stroke:#2e5a87,color:#fff
    style R3 fill:#4a90d9,stroke:#2e5a87,color:#fff

Each operator gets its own:

Environment instance (same seed for reproducibility)
Render container in the GUI
Subprocess with independent state
Telemetry output file

Script for 3-way comparison:

# compare_llm_rl_random.py

operators = [
    {
        "id": "llm_gpt4o",
        "name": "GPT-4o",
        "type": "llm",
        "worker_id": "barlog_worker",
        "env_name": "babyai",
        "task": "BabyAI-GoToRedBall-v0",
        "settings": {
            "client_name": "openrouter",
            "model_id": "openai/gpt-4o",
        },
    },
    {
        "id": "rl_ppo",
        "name": "Trained PPO",
        "type": "rl",
        "worker_id": "cleanrl_worker",
        "env_name": "minigrid",
        "task": "BabyAI-GoToRedBall-v0",
        "settings": {
            "policy_path": "var/runs/ppo_babyai/model.cleanrl_model",
            "algorithm": "ppo",
        },
    },
    {
        "id": "random_agent",
        "name": "Random",
        "type": "random",
        "worker_id": "random_worker",
        "env_name": "minigrid",
        "task": "BabyAI-GoToRedBall-v0",
    },
]

execution = {
    "num_episodes": 50,
    "seeds": list(range(1000, 1050)),
    "step_delay_ms": 0,           # Fastest (no visual pacing)
    "env_mode": "procedural",     # Different layout per episode
}

Comparison Table¶

Feature	Human	BALROG LLM	Chess LLM	CleanRL RL	Random	MOSAIC LLM
Category	human	llm	llm	rl	random	llm
Subprocess	No	Yes	Yes	Yes	Yes	Yes
Training	No	No	No	Yes (offline)	No	No
Multi-agent	Yes	No	Yes	No	No	Yes
Retry Logic	N/A	No	Yes (3x)	N/A	N/A	No
GPU Required	No	Optional	Optional	No	No	Optional