Homogeneous Decision-Makers¶
Homogeneous Teams: Random vs LLM: Two homogeneous teams (all-Random vs all-LLM) competing in the same multi-agent environment. See MOSAIC Random Worker and MOSAIC LLM Worker.
A homogeneous setup is one where every agent in an experiment uses the same decision-making paradigm – all RL, all LLM, all human, or all baseline. This is the simplest and most common configuration in MOSAIC.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
ENV["Environment"]
subgraph "All Same Paradigm"
A0["Agent 0<br/>(RL)"]
A1["Agent 1<br/>(RL)"]
A2["Agent 2<br/>(RL)"]
end
ENV -- "obs" --> A0
ENV -- "obs" --> A1
ENV -- "obs" --> A2
A0 -- "action" --> ENV
A1 -- "action" --> ENV
A2 -- "action" --> ENV
style ENV fill:#4a90d9,stroke:#2e5a87,color:#fff
style A0 fill:#9370db,stroke:#6a0dad,color:#fff
style A1 fill:#9370db,stroke:#6a0dad,color:#fff
style A2 fill:#9370db,stroke:#6a0dad,color:#fff
The Operator Protocol¶
Every decision-maker in MOSAIC implements the same Protocol
(structural subtyping – no base class inheritance required):
class Operator(Protocol):
"""Any entity that selects actions from observations."""
@property
def id(self) -> str: ...
@property
def name(self) -> str: ...
def select_action(
self,
observation: Any,
legal_actions: Optional[list] = None,
) -> Any:
"""Return an action given an observation."""
...
def reset(self, seed: Optional[int] = None) -> None:
"""Reset internal state for a new episode."""
...
def on_step_result(
self,
observation: Any,
action: Any,
reward: float,
terminated: bool,
truncated: bool,
) -> None:
"""Receive feedback from the environment."""
...
Any Python object with these methods is a valid Operator. The GUI,
experiment runner, and telemetry system call select_action(obs) and
receive an action – they never need to know what kind of decision-maker
they are talking to.
Categories¶
Every operator belongs to one of five categories. The category determines how the GUI renders configuration controls, how the subprocess is launched, and what action source is used:
Human¶
The human operator returns None from select_action(). The GUI
intercepts this and injects keyboard input from the user.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
OBS["Observation<br/>(rendered frame)"] --> GUI["GUI<br/>Keyboard Capture"]
GUI --> ACTION["Action<br/>(key mapping)"]
ACTION --> ENV["Environment"]
style OBS fill:#eee,stroke:#999,color:#333
style GUI fill:#9370db,stroke:#6a0dad,color:#fff
style ACTION fill:#eee,stroke:#999,color:#333
style ENV fill:#4a90d9,stroke:#2e5a87,color:#fff
Worker |
|
Configuration |
Key bindings (arrow keys, WASD, custom) |
Use case |
Manual play, debugging, recording demonstrations |
LLM¶
The LLM operator converts the observation into a text prompt, sends it to a language model API, and parses the response into an action.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
OBS["Observation<br/>(grid / board)"] --> PROMPT["Text Prompt<br/>Generation"]
PROMPT --> API["LLM API<br/>vLLM / OpenRouter / OpenAI"]
API --> PARSE["Response<br/>Parsing"]
PARSE --> ACTION["Action"]
style OBS fill:#eee,stroke:#999,color:#333
style PROMPT fill:#ff7f50,stroke:#cc5500,color:#fff
style API fill:#9370db,stroke:#6a0dad,color:#fff
style PARSE fill:#ff7f50,stroke:#cc5500,color:#fff
style ACTION fill:#eee,stroke:#999,color:#333
MOSAIC provides three LLM workers, each serving a different purpose:
Worker |
Scope |
Description |
|---|---|---|
|
Multi-agent |
MOSAIC’s native LLM worker built for multi-agent coordination. Features 3-level coordination strategies (emergent, hint-based, role-based) and Theory of Mind observation modes. Supports MultiGrid, BabyAI, MiniHack, Crafter, TextWorld, BabaIsAI, MeltingPot, and PettingZoo environments. |
|
Single-agent |
Imported BALROG benchmark runner for single-agent LLM evaluation on MiniGrid, BabyAI, MiniHack, and Crafter. |
|
Two-player |
LLM chess play with multi-turn dialog prompting via the llm_chess framework. |
MOSAIC LLM coordination levels:
Level |
Name |
Behavior |
|---|---|---|
1 |
Emergent |
Minimal guidance; LLMs discover coordination naturally |
2 |
Basic Hints |
Cooperation tips without being prescriptive |
3 |
Role-Based |
Explicit roles (e.g., Forward/Defender) with detailed strategies |
- Configuration fields:
Provider, model ID, API key, base URL, temperature, coordination level, observation mode (egocentric / visible teammates), agent strategy (naive, chain-of-thought, few-shot, robust)
RL¶
The RL operator loads a trained neural network checkpoint and runs a forward pass to select actions.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
OBS["Observation<br/>(tensor)"] --> NET["Neural Network<br/>Forward Pass"]
NET --> ACTION["Action<br/>(argmax / sample)"]
style OBS fill:#eee,stroke:#999,color:#333
style NET fill:#9370db,stroke:#6a0dad,color:#fff
style ACTION fill:#eee,stroke:#999,color:#333
Workers |
|
Configuration |
Policy checkpoint path, algorithm name |
Use case |
Evaluating trained RL policies, comparing algorithms, ablation studies |
Baseline¶
Baseline operators provide reference points for comparison. They use simple strategies like random action selection or fixed heuristics.
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
OBS["Observation<br/>(ignored)"] --> RNG["Random / Heuristic<br/>Action Selection"]
RNG --> ACTION["Action"]
style OBS fill:#eee,stroke:#999,color:#333
style RNG fill:#9370db,stroke:#6a0dad,color:#fff
style ACTION fill:#eee,stroke:#999,color:#333
Workers |
|
Configuration |
Max steps, seed |
Use case |
Sanity checks, lower-bound performance reference |
Single-Worker Pattern¶
In homogeneous setups, one Operator wraps one Worker subprocess. This is the most common pattern:
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
OBS["observation"] --> OP
subgraph OP["Operator"]
SA["select_action()"]
WK["Worker subprocess"]
SA --> WK
end
OP --> ACT["action"]
style OP fill:#9370db,stroke:#6a0dad,color:#fff
style WK fill:#ff7f50,stroke:#cc5500,color:#fff
style OBS fill:#eee,stroke:#999,color:#333
style ACT fill:#eee,stroke:#999,color:#333
The Operator translates the protocol call into a JSON command sent over stdin to the Worker subprocess, and reads the response from stdout.
Configuration¶
Single-paradigm operators are created with the single_agent factory
method:
# LLM operator for BabyAI
config = OperatorConfig.single_agent(
operator_id="llm_1",
display_name="GPT-4o on BabyAI",
worker_id="balrog_worker",
worker_type="llm",
env_name="babyai",
task="BabyAI-GoToRedBall-v0",
settings={
"client_name": "vllm",
"model_id": "Qwen/Qwen2.5-1.5B-Instruct",
"base_url": "http://127.0.0.1:8000/v1",
},
)
# RL operator with trained policy
config = OperatorConfig.single_agent(
operator_id="rl_1",
display_name="PPO on CartPole",
worker_id="cleanrl_worker",
worker_type="rl",
env_name="gymnasium",
task="CartPole-v1",
settings={
"algorithm": "ppo",
"checkpoint": "runs/ppo_cartpole/model.pt",
},
)
# Random baseline
config = OperatorConfig.single_agent(
operator_id="baseline_1",
display_name="Random Baseline",
worker_id="operators_worker",
worker_type="baseline",
env_name="gymnasium",
task="CartPole-v1",
settings={},
)
Side-by-Side Comparison¶
Even within the same paradigm, running multiple operators in parallel is useful for comparing algorithms, hyperparameters, or models:
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TB
ENV1["MiniGrid-DoorKey-6x6-v0<br/>seed=42"]
ENV2["MiniGrid-DoorKey-6x6-v0<br/>seed=42"]
ENV3["MiniGrid-DoorKey-6x6-v0<br/>seed=42"]
OP1["LLM Operator<br/>GPT-4o"]
OP2["LLM Operator<br/>Claude 3.5"]
OP3["LLM Operator<br/>Llama 3 70B"]
ENV1 --> OP1
ENV2 --> OP2
ENV3 --> OP3
style ENV1 fill:#4a90d9,stroke:#2e5a87,color:#fff
style ENV2 fill:#4a90d9,stroke:#2e5a87,color:#fff
style ENV3 fill:#4a90d9,stroke:#2e5a87,color:#fff
style OP1 fill:#9370db,stroke:#6a0dad,color:#fff
style OP2 fill:#9370db,stroke:#6a0dad,color:#fff
style OP3 fill:#9370db,stroke:#6a0dad,color:#fff
Each operator gets its own environment instance but shares the same
seed schedule, ensuring identical initial conditions. The
MultiOperatorService manages this:
class MultiOperatorService:
def add_operator(self, config: OperatorConfig) -> None: ...
def remove_operator(self, operator_id: str) -> None: ...
def start_all(self) -> None: ...
def stop_all(self) -> None: ...
GUI Adaptation by Category¶
The GUI automatically adapts its controls based on the operator
category. The OperatorConfigWidget renders different configuration
fields for each type:
Category |
GUI Controls |
Component |
|---|---|---|
|
Key binding display, action buttons |
|
|
Provider dropdown, model selector, API key field, base URL, temperature slider |
|
|
Policy file picker, algorithm dropdown |
|
|
Max steps spinner, seed field |
|
The OperatorRenderContainer also adapts its header to show a
color-coded type badge:
LLM – blue badge
RL – purple badge
Human – orange badge
Baseline – gray badge