MOSAIC LLM Worker

LLM vs LLM Chess: Two LLM agents playing PettingZoo Chess (chess_v6) through the MOSAIC LLM Worker action-selector protocol.

The MOSAIC LLM Worker is MOSAIC’s native multi-agent LLM coordination and adversarial evaluation worker. It drives pre-trained language models through MOSAIC environments with built-in support for LLM coordination (cooperative teammates), LLM adversarial (competitive opponents), Theory of Mind observations, and three coordination-level prompt strategies.

The MOSAIC LLM Worker grew out of the BALROG Worker integration. Wrapping BALROG with a shim for single-agent LLM evaluation made the potential of multi-agent setups clear, in both homogeneous (all-LLM teams) and heterogeneous (LLM + RL + Human) configurations. This motivated extending the BALROG foundation into a purpose-built worker for multi-agent LLM research that supports cooperative coordination, adversarial matchups, and cross-paradigm evaluation within the MOSAIC Operator framework.

Paradigm	Multi-agent LLM coordination and adversarial (also single-agent)
Task Type	LLM coordination (cooperative teams), LLM adversarial (competitive opponents), heterogeneous teams (LLM + RL + Human)
Model Support	OpenRouter (unified), OpenAI, Anthropic, Google Gemini, vLLM (local)
Environments	MultiGrid (Soccer 1v1/2v2, Collect), BabyAI, MiniGrid, MiniHack, Crafter, TextWorld, BabaIsAI, MeltingPot, PettingZoo
Execution	Subprocess (autonomous or interactive step-by-step)
GPU required	No (API-based) / Optional (vLLM local inference)
Source	`3rd_party/mosaic/llm_worker/llm_worker/`
Entry point	`llm-worker` (CLI)

Overview

The MOSAIC LLM Worker bridges pre-trained language models and MOSAIC’s multi-agent environments. It converts raw grid observations into natural language, feeds them to an LLM, and parses the LLM’s text response back into discrete actions.

This enables two complementary research directions:

LLM Coordination: Can LLMs cooperate as teammates? Do they develop emergent strategies? Does Theory of Mind information improve team play?
LLM Adversarial: How do different LLM models perform head-to-head? Can an LLM team compete against RL-trained policies?

Combined with MOSAIC’s heterogeneous decision-maker, the worker enables heterogeneous experiments (e.g., RL + LLM teammates vs RL + Random opponents) that were previously impossible in any single framework.

Key features:

LLM coordination: cooperative multi-agent teams with shared objectives
LLM adversarial: head-to-head matchups between different LLM models or paradigms
Homogeneous and heterogeneous setups: all-LLM teams or mixed LLM + RL + Human teams
Theory of Mind observations: egocentric or teammate-aware text
3 coordination levels: emergent, basic hints, role-based (Forward/Defender)
Pluggable API backends: OpenRouter, OpenAI, Anthropic, Google Gemini, vLLM
6 agent strategies: naive, chain-of-thought, robust variants, few-shot, dummy
Dual runtime modes: autonomous (batch episodes) or interactive (GUI step-by-step)
Action-selector mode: for PettingZoo games where the GUI owns the environment
JSONL telemetry: streamed to the GUI and written to disk

Architecture

The worker follows the standard MOSAIC shim pattern with two runtime modes:

graph TB
    subgraph "MOSAIC GUI"
        FORM["Operator Config<br/>(per-player model)"]
        DAEMON["Operator Launcher"]
    end

    subgraph "LLM Worker Subprocess"
        CLI["cli.py<br/>(llm-worker)"]
        CFG["config.py<br/>(LLMWorkerConfig)"]
        RT["runtime.py<br/>(LLMWorkerRuntime /<br/>InteractiveLLMRuntime)"]
        OBS["observations.py<br/>(grid → text)"]
        PROMPT["prompts.py<br/>(3 coordination levels)"]
        CLIENT["client.py<br/>(OpenAI / Claude / Gemini)"]
    end

    subgraph "LLM API"
        API["OpenRouter / OpenAI<br/>Anthropic / Gemini / vLLM"]
    end

    FORM -->|"config JSON"| DAEMON
    DAEMON -->|"spawn"| CLI
    CLI --> CFG --> RT
    RT --> OBS
    RT --> PROMPT
    RT --> CLIENT
    CLIENT -->|"chat.completions"| API

    style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
    style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
    style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
    style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
    style RT fill:#ff7f50,stroke:#cc5500,color:#fff
    style OBS fill:#dda0dd,stroke:#993399,color:#333
    style PROMPT fill:#dda0dd,stroke:#993399,color:#333
    style CLIENT fill:#ff7f50,stroke:#cc5500,color:#fff
    style API fill:#e8e8e8,stroke:#999
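For a single decision, the data flow in the diagram can be sketched as one function that passes an observation through the three modules in turn. This is a minimal sketch only. The callables `obs_to_text`, `build_prompt`, `complete` and `parse_action`, and the `fallback_action` parameter, are hypothetical stand-ins for `observations.py`, `prompts.py` and `client.py`. They are not the worker's actual API.

```python
from typing import Callable, Optional


def decide(
    raw_obs: object,
    obs_to_text: Callable[[object], str],            # stands in for observations.py
    build_prompt: Callable[[str], str],              # stands in for prompts.py
    complete: Callable[[str], str],                  # stands in for client.py (LLM call)
    parse_action: Callable[[str], Optional[int]],    # reply text -> action index or None
    fallback_action: int = 0,                        # hypothetical default when parsing fails
) -> int:
    """One runtime step: observation -> text -> prompt -> LLM reply -> action index."""
    text = obs_to_text(raw_obs)
    prompt = build_prompt(text)
    reply = complete(prompt)
    action = parse_action(reply)
    # Use the fallback only when the reply could not be mapped to an action.
    return fallback_action if action is None else action
```

Passing the modules in as plain callables makes it possible to unit-test the runtime with a fake LLM that returns canned replies.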

Observation Pipeline

Raw grid observations are converted to natural language before being sent to the LLM. The pipeline handles both single-agent and multi-agent environments:

3x3x3 numpy array  -->  observations.py  -->  Natural language  -->  LLM
                         (type/color/state      "You see:
                          decoding)              - red ball 1 step ahead
                                                 - green goal 2 steps east
                                                 You are facing: EAST
                                                 You are carrying: nothing"
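The decoding step can be sketched as follows. This is a minimal version that assumes MiniGrid's standard index tables for object type and color. It also assumes an egocentric layout in which row 0 is farthest ahead and the agent sits in the bottom-middle cell. The tables, the function names and that layout are assumptions; MOSAIC MultiGrid's real encoding may differ. To stay short, the sketch reports positions relative to the facing direction ("ahead", "left", "right") rather than as compass directions.

```python
from typing import Optional

import numpy as np

# Assumed MiniGrid-style index tables; the worker's real tables may differ.
IDX_TO_OBJECT = {0: "unseen", 1: "empty", 2: "wall", 3: "floor", 4: "door",
                 5: "key", 6: "ball", 7: "box", 8: "goal", 9: "lava", 10: "agent"}
IDX_TO_COLOR = {0: "red", 1: "green", 2: "blue", 3: "purple", 4: "yellow", 5: "grey"}
DIRECTIONS = ["EAST", "SOUTH", "WEST", "NORTH"]  # MiniGrid agent_dir order
BACKGROUND = {"unseen", "empty", "floor"}        # cells not worth describing


def _steps(n: int, word: str) -> str:
    return f"{n} step{'s' if n != 1 else ''} {word}"


def obs_to_text(obs: np.ndarray, direction: int, carrying: Optional[str] = None) -> str:
    """Turn an (H, W, 3) [type, color, state] view into a text description.

    Assumes row 0 is farthest ahead and the agent occupies the bottom-middle cell.
    """
    h, w, _ = obs.shape
    agent_row, agent_col = h - 1, w // 2
    lines = ["You see:"]
    for row in range(h):
        for col in range(w):
            obj = IDX_TO_OBJECT.get(int(obs[row, col, 0]), "unknown")
            if obj in BACKGROUND or (row, col) == (agent_row, agent_col):
                continue  # skip empty space and the agent's own cell
            color = IDX_TO_COLOR.get(int(obs[row, col, 1]))
            name = f"{color} {obj}" if color else obj
            ahead, side = agent_row - row, col - agent_col
            where = []
            if ahead:
                where.append(_steps(ahead, "ahead"))
            if side:
                where.append(_steps(abs(side), "right" if side > 0 else "left"))
            lines.append(f"- {name} {' and '.join(where)}")
    lines.append(f"You are facing: {DIRECTIONS[direction]}")
    lines.append(f"You are carrying: {carrying or 'nothing'}")
    return "\n".join(lines)
```

Skipping empty and unseen cells keeps prompts short. This matters because every token is sent to the LLM on every step.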

Two observation modes (for Theory of Mind research):

Mode	Description	Research Purpose
Egocentric	Agent sees only its own 3x3 local view	Decentralized control baseline
Visible Teammates	Egocentric + teammate positions, directions, carrying status	Theory of Mind: can LLMs reason about teammate intentions?
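The two modes differ only in what is appended to the egocentric text. The sketch below illustrates this under stated assumptions. The `TeammateState` container, the `build_observation` function and the output wording are hypothetical. The fields themselves (position, direction, carrying status) come from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TeammateState:
    """Hypothetical container for what one teammate exposes to the others."""
    agent_id: int
    position: Tuple[int, int]  # (x, y) grid coordinates
    facing: str                # e.g. "EAST"
    carrying: Optional[str] = None


def build_observation(ego_text: str, mode: str, teammates: List[TeammateState]) -> str:
    """Return the egocentric text, optionally extended with teammate state."""
    if mode == "egocentric":
        return ego_text  # decentralized baseline: no teammate information
    if mode != "visible_teammates":
        raise ValueError(f"unknown observation_mode: {mode!r}")
    lines = [ego_text, "Your teammates:"]
    for t in teammates:
        x, y = t.position
        lines.append(
            f"- Agent {t.agent_id} at ({x}, {y}), facing {t.facing}, "
            f"carrying {t.carrying or 'nothing'}"
        )
    return "\n".join(lines)
```

Keeping the egocentric text as a prefix in both modes isolates the variable under study: any change in team performance can be attributed to the added teammate lines.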

Coordination Levels

Three prompt strategies study how explicit guidance affects multi-agent coordination:

Level	Strategy	Description
Level 1	Emergent	Minimal guidance. Tests whether LLMs discover coordination naturally without hints.
Level 2	Basic Hints	Adds cooperation tips (“spread out”, “don’t all chase the ball”). Balances emergence with guidance.
Level 3	Role-Based	Explicit Forward/Defender roles with detailed strategies. Tests whether role division improves team performance.
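The three levels can be sketched as a system prompt that grows with the level. In this sketch, all prompt wording and the `system_prompt` function are illustrative assumptions, not the worker's actual prompts. The sketch also assumes that each level extends the one below it, so Level 3 keeps the Level 2 hints. Only the hint phrases and the Forward/Defender roles come from the table above.

```python
from typing import Optional

# Illustrative prompt fragments; the worker's real prompt text is not reproduced here.
BASE = "You are agent {agent_id} on the {team} team in a soccer game."
HINTS = "Tips: spread out, and don't all chase the ball."
ROLES = {
    "forward": "Your role is Forward: carry the ball toward the opponent goal and score.",
    "defender": "Your role is Defender: stay near your goal and steal the ball from attackers.",
}


def system_prompt(level: int, agent_id: int, team: str, role: Optional[str] = None) -> str:
    """Compose a system prompt for coordination level 1, 2, or 3."""
    if level not in (1, 2, 3):
        raise ValueError("coordination_level must be 1, 2, or 3")
    parts = [BASE.format(agent_id=agent_id, team=team)]
    if level >= 2:
        parts.append(HINTS)  # assumption: Level 3 also keeps the Level 2 hints
    if level == 3:
        if role not in ROLES:
            raise ValueError("Level 3 requires role 'forward' or 'defender'")
        parts.append(ROLES[role])
    return "\n".join(parts)
```

Because each level only adds text to the previous one, results at different levels are directly comparable.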

Supported Environments

Environment	Agents	Actions	Notes
Soccer 1v1	2	8 (Legacy)	Team Green vs Team Red, first to 2 goals
Soccer 2v2	4	8 (Legacy)	16x11 FIFA grid, passing and stealing mechanics
Collect 1v1 / 2v2	2–4	8 (Legacy)	Ball collection race
BabyAI / MiniGrid	1	7	GoTo, Pickup, Open tasks with text descriptions
MeltingPot	2–16	varies	Social dilemmas, cooperation and competition substrates
Crafter	1	17	Open-world survival via BALROG wrapper
PettingZoo	2+	varies	Chess, Connect Four, Go, Tic-Tac-Toe (action-selector mode)

MultiGrid action space (Legacy — Soccer, Collect):

still     - do nothing (wait in place)
left      - turn left 90 degrees
right     - turn right 90 degrees
forward   - move one step in facing direction
pickup    - pick up object or steal from opponent
drop      - drop held object (scores at a goal, or passes to a teammate)
toggle    - interact with object in front
done      - signal completion
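Mapping free-form LLM text back to one of these eight actions can be done with a word-boundary regex. Below is a minimal sketch. It assumes the indices follow the listing order (still=0 through done=7), as in the original gym-multigrid `Actions` enum. `parse_action` is a hypothetical name. Taking the last match is a design choice: chain-of-thought replies usually state their final answer at the end.

```python
import re
from typing import Optional

# Assumed index order, matching the listing above (still=0 ... done=7).
ACTIONS = ["still", "left", "right", "forward", "pickup", "drop", "toggle", "done"]
_PATTERN = re.compile(r"\b(" + "|".join(ACTIONS) + r")\b", re.IGNORECASE)


def parse_action(reply: str) -> Optional[int]:
    """Map an LLM reply to an action index, or None if no action word appears.

    Uses the last match, since chain-of-thought replies usually end with the choice.
    """
    matches = _PATTERN.findall(reply)
    if not matches:
        return None
    return ACTIONS.index(matches[-1].lower())
```

Returning `None` instead of a default action leaves the decision about retries and fallbacks to the agent strategy.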

Supported Models

Backend	Models	Notes
OpenRouter (default)	All major providers via unified API: GPT-4o, Claude 3.5, Gemini, Llama, Mistral, etc.	Requires `OPENROUTER_API_KEY`. Free-tier models available (Nemotron 3 Nano, Arcee Trinity, etc.)
OpenAI	GPT-4o, GPT-4-turbo, GPT-3.5-turbo	Requires `OPENAI_API_KEY`
Anthropic	Claude 3 Opus/Sonnet/Haiku	Requires `ANTHROPIC_API_KEY`
Google Gemini	Gemini 2.0 Flash, Gemini 1.5 Pro	Requires `GOOGLE_API_KEY`
vLLM (local)	Any Hugging Face-format model that vLLM supports	Self-hosted, `--base-url http://localhost:8000/v1`
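Backend selection reduces to choosing a base URL and an API-key variable. The sketch below fails fast when the required key is missing. The OpenRouter and OpenAI base URLs are those providers' public OpenAI-compatible endpoints, and the environment variable names and vLLM URL come from the table above. Several parts are hypothetical: the `BACKENDS` table, the `resolve_backend` function, and the client names other than `openrouter` (the only `--client` value shown on this page). Using `None` to mean "use the provider SDK's default" for Anthropic and Gemini is also an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# Backend -> (OpenAI-compatible base URL, API-key env var). Names other than
# "openrouter" are hypothetical; None as URL means "use the provider SDK default".
BACKENDS = {
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
    "openai": ("https://api.openai.com/v1", "OPENAI_API_KEY"),
    "anthropic": (None, "ANTHROPIC_API_KEY"),
    "google": (None, "GOOGLE_API_KEY"),
    "vllm": ("http://localhost:8000/v1", None),  # self-hosted; no key by default
}


def resolve_backend(
    name: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Return (base_url, api_key) for a backend, failing early if the key is missing."""
    env = os.environ if env is None else env
    if name not in BACKENDS:
        raise ValueError(f"unknown client: {name!r}")
    base_url, key_var = BACKENDS[name]
    if key_var is None:
        return base_url, None
    key = env.get(key_var)
    if not key:
        raise RuntimeError(f"{key_var} is not set")
    return base_url, key
```

Checking the key before any episode starts means a mis-configured player fails at launch instead of partway through a match.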

Agent Strategies

Type	Description
`naive`	Direct observation-to-action mapping. Minimal prompt.
`cot`	Chain-of-thought reasoning before action selection.
`robust_naive`	Naive with retry and fallback on parse failure.
`robust_cot`	Chain-of-thought with retry and fallback.
`few_shot`	In-context learning with example trajectories.
`dummy`	Random actions for baseline comparison.
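The "retry and fallback" behavior of the `robust_*` strategies can be sketched as a loop that re-prompts on an unparseable or illegal reply and then falls back to a random legal action. The random fallback is the same behavior as the `dummy` baseline. This is a minimal sketch. The function name, the retry count, the reminder wording and the choice of a random fallback are all assumptions, not the worker's documented behavior.

```python
import random
from typing import Callable, Optional, Sequence


def robust_select(
    ask: Callable[[str], str],                  # sends a prompt, returns the LLM reply
    parse: Callable[[str], Optional[int]],      # reply -> action index or None
    prompt: str,
    legal_actions: Sequence[int],
    max_retries: int = 2,                       # assumed retry budget
    rng: Optional[random.Random] = None,
) -> int:
    """Query the LLM, re-prompting on unusable replies, then fall back to a random action."""
    rng = rng or random.Random()
    current = prompt
    for _ in range(max_retries + 1):
        action = parse(ask(current))
        if action is not None and action in legal_actions:
            return action
        # Re-prompt once more with an explicit format reminder (wording is illustrative).
        current = prompt + "\nReply with exactly one valid action name."
    return rng.choice(list(legal_actions))  # same as the `dummy` baseline
```

Seeding `rng` makes the fallback reproducible, which helps when comparing strategies across runs.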

Runtime Modes

Autonomous mode (batch episodes):

llm-worker --run-id test123 \
    --env multigrid \
    --task MosaicMultiGrid-Soccer-1vs1-IndAgObs-v0 \
    --client openrouter \
    --model nvidia/nemotron-3-nano-30b-a3b:free \
    --num-episodes 10 --max-steps 200

Interactive mode (GUI step-by-step):

llm-worker --run-id test123 --interactive \
    --env multigrid \
    --task MosaicMultiGrid-Soccer-2vs2-IndAgObs-v0

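Interactive mode reads one JSON command per line from stdin and emits telemetry to stdout. `reset` and `step` drive an environment owned by the worker, while `init_agent` and `select_action` serve action-selector mode, where the GUI owns the PettingZoo environment and only asks the worker for a move: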

{"cmd": "reset", "seed": 42}
{"cmd": "step"}
{"cmd": "init_agent", "game_name": "chess_v6", "player_id": "player_0"}
{"cmd": "select_action", "observation": "...", "player_id": "player_0"}
{"cmd": "stop"}
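A line-oriented command loop like this can be sketched as a dispatcher from the `cmd` field to a handler function. Only the command names come from the examples above. The reply shapes (`{"type": ...}`), the `serve` function and the handler registry are hypothetical. Taking an iterable of lines and an `emit` callback, rather than the raw streams, keeps the loop testable.

```python
import json
from typing import Callable, Dict, Iterable

Handler = Callable[[dict], dict]  # command message -> reply message


def serve(handlers: Dict[str, Handler], lines: Iterable[str], emit: Callable[[dict], None]) -> None:
    """Dispatch one JSON command per line to a handler until {"cmd": "stop"} arrives."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            msg = json.loads(line)
        except json.JSONDecodeError as exc:
            emit({"type": "error", "message": f"invalid JSON: {exc}"})
            continue
        cmd = msg.get("cmd") if isinstance(msg, dict) else None
        if cmd == "stop":
            emit({"type": "stopped"})
            return
        handler = handlers.get(cmd)
        if handler is None:
            emit({"type": "error", "message": f"unknown cmd: {cmd!r}"})
        else:
            emit(handler(msg))
```

To wire it to the process streams, call `serve(handlers, sys.stdin, lambda r: print(json.dumps(r), flush=True))`. Flushing after every reply matters because the GUI reads replies as they arrive, not when the process exits.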

Multi-Agent Configurations

The MOSAIC LLM Worker supports the full range of operator configurations:

LLM Adversarial: different models compete head-to-head:

Soccer 1v1:
  Agent 0 (Green) --> Nemotron 3 Nano (OpenRouter, free)
  Agent 1 (Red)   --> Arcee Trinity Large (OpenRouter, free)

Soccer 2v2:
  Agent 0 (Green) --> Model A     Agent 2 (Red) --> Model B
  Agent 1 (Green) --> Model A     Agent 3 (Red) --> Model B

LLM Coordination: same or different models cooperate as teammates:

Soccer 2v2 (homogeneous):
  Agent 0 (Green) --> GPT-4o      Agent 2 (Red) --> GPT-4o
  Agent 1 (Green) --> GPT-4o      Agent 3 (Red) --> GPT-4o

Heterogeneous (LLM + RL): cross-paradigm teams (see heterogeneous decision-maker):

Soccer 2v2:
  Agent 0 (Green) --> RL (MAPPO)   Agent 2 (Red) --> RL (MAPPO)
  Agent 1 (Green) --> LLM (GPT-4o) Agent 3 (Red) --> Random Baseline

Each agent runs in its own worker subprocess. The GUI collects actions from all workers simultaneously and steps the environment in parallel mode.
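Collecting one action per agent before a single parallel step can be sketched with a thread pool. Threads are a reasonable fit here because each query mostly waits on a subprocess or an HTTP API. In the sketch, each agent's decision-maker is modelled as a plain callable; the `Selector` type and `collect_joint_action` are hypothetical and do not represent the GUI's actual mechanism. The resulting dict has the shape that PettingZoo's `ParallelEnv.step(actions)` expects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

Selector = Callable[[Any], int]  # observation -> action index (LLM, RL, or random)


def collect_joint_action(
    selectors: Dict[str, Selector], observations: Dict[str, Any]
) -> Dict[str, int]:
    """Query every agent's decision-maker concurrently and return {agent: action}."""
    with ThreadPoolExecutor(max_workers=max(1, len(selectors))) as pool:
        futures = {
            agent: pool.submit(select, observations[agent])
            for agent, select in selectors.items()
        }
        # Block until every agent has answered; the env steps only once all are in.
        return {agent: fut.result() for agent, fut in futures.items()}
```

With a PettingZoo parallel environment, the joint action is then applied once:

`observations, rewards, terminations, truncations, infos = env.step(actions)`

The step time is therefore bounded by the slowest decision-maker, not by the sum of all of them.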

Configuration

JSON config (used when launching from the GUI or the CLI):

{
  "run_id": "soccer_llm_vs_llm_001",
  "env_name": "multigrid",
  "task": "MosaicMultiGrid-Soccer-1vs1-IndAgObs-v0",
  "client_name": "openrouter",
  "model_id": "nvidia/nemotron-3-nano-30b-a3b:free",
  "agent_type": "cot",
  "num_episodes": 10,
  "max_steps": 200,
  "temperature": 0.7,
  "observation_mode": "visible_teammates",
  "coordination_level": 3,
  "role": "forward"
}
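Several of these JSON fields correspond to the `llm-worker` flags used in the Runtime Modes examples. The sketch below builds an argument list from a config dict. It maps only fields whose flags appear on this page. The field-to-flag pairing is an assumption based on matching names. The remaining fields (`agent_type`, `temperature`, `observation_mode`, and so on) are left out because their flag names are not documented here. `config_to_argv` is a hypothetical helper.

```python
from typing import Dict, List

# Only flags that appear in this page's CLI examples; pairing by name is an assumption.
FLAG_FOR_FIELD = {
    "run_id": "--run-id",
    "env_name": "--env",
    "task": "--task",
    "client_name": "--client",
    "model_id": "--model",
    "num_episodes": "--num-episodes",
    "max_steps": "--max-steps",
}


def config_to_argv(config: Dict[str, object], interactive: bool = False) -> List[str]:
    """Build an `llm-worker` argument list from the JSON config's CLI-mappable fields."""
    argv = ["llm-worker"]
    for field, flag in FLAG_FOR_FIELD.items():
        if field in config:
            argv += [flag, str(config[field])]
    if interactive:
        argv.append("--interactive")
    return argv
```

You can pass the returned list directly to `subprocess.Popen`. Because no shell is involved, model IDs containing characters such as `:` or `/` need no quoting.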

MultiGrid-specific config fields:

Field	Default	Description
`agent_id`	`0`	Agent index for multi-agent environments (0–3)
`observation_mode`	`visible_teammates`	`"egocentric"` or `"visible_teammates"` (Theory of Mind)
`coordination_level`	`1`	`1` = Emergent, `2` = Basic Hints, `3` = Role-Based
`role`	`None`	Agent role for Level 3: `"forward"` or `"defender"`
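The defaults and allowed values in this table can be enforced when a config is loaded. This is a minimal sketch. The `MultiGridFields` dataclass is hypothetical and is not the worker's `LLMWorkerConfig`. It requires a role at Level 3, as the table describes. It does not reject a role at other levels, since the table does not say how the worker treats one there.

```python
import json
from dataclasses import dataclass
from typing import Optional

FIELDS = ("agent_id", "observation_mode", "coordination_level", "role")


@dataclass
class MultiGridFields:
    """Hypothetical mirror of the MultiGrid-specific fields and defaults above."""
    agent_id: int = 0
    observation_mode: str = "visible_teammates"
    coordination_level: int = 1
    role: Optional[str] = None

    def __post_init__(self) -> None:
        if not 0 <= self.agent_id <= 3:
            raise ValueError("agent_id must be in 0..3")
        if self.observation_mode not in ("egocentric", "visible_teammates"):
            raise ValueError(f"bad observation_mode: {self.observation_mode!r}")
        if self.coordination_level not in (1, 2, 3):
            raise ValueError("coordination_level must be 1, 2, or 3")
        if self.coordination_level == 3 and self.role not in ("forward", "defender"):
            raise ValueError("Level 3 needs role 'forward' or 'defender'")

    @classmethod
    def from_json(cls, text: str) -> "MultiGridFields":
        data = json.loads(text)
        # Keep only the MultiGrid fields; run_id, model_id, etc. are ignored here.
        return cls(**{k: data[k] for k in FIELDS if k in data})
```

Validating in `__post_init__` means an invalid combination is rejected before a subprocess is spawned or any API call is made.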