MOSAIC Human Worker
The MOSAIC Human Worker provides human-in-the-loop control: a person drives an agent from the keyboard in any Gymnasium-compatible environment. It bridges human decision-making with MOSAIC’s multi-agent evaluation framework, allowing researchers to play alongside or against RL, LLM, and random agents.
Important: The human owns the agent for the entire episode. Once an operator is configured with a human worker, that agent slot is controlled exclusively by the human. There is no automatic switching to an AI policy mid-execution. The human remains in control from reset to episode end, ensuring clean and comparable evaluation data.
The worker operates in two modes: interactive mode where the worker owns the environment and the GUI sends human-chosen actions via action buttons, and board-game mode where the GUI owns the environment (PettingZoo games) and the worker handles move selection with legal-move validation.
| Paradigm | Human-in-the-loop (single-agent and multi-agent) |
| Task Type | Human vs AI, human + AI cooperative, human baseline evaluation |
| Modes | Interactive, board-game |
| Environments | MiniGrid, BabyAI, MosaicMultiGrid, Crafter, MiniHack/NetHack, PettingZoo (Chess, Go, Connect Four), Gymnasium Classic Control |
| Execution | Subprocess (interactive step-by-step via GUI) |
| GPU required | No |
| Source | |
| Entry point | |
Overview
The MOSAIC Human Worker turns any MOSAIC environment into a playable game. The GUI displays rendered frames, action buttons with environment-specific labels, and episode statistics. The human clicks an action, the worker steps the environment, and the next frame appears.
This enables several research workflows:
- Human baseline: Establish human-level performance benchmarks for comparison against RL and LLM agents.
- Human-AI teams (Cooperation): Deploy a human teammate alongside RL or LLM agents in cooperative multi-agent environments (Soccer 2v2, Overcooked).
- Human-AI Adversarial (Competition): Deploy human players against trained RL policies or LLM agents in competitive environments.
- Environment debugging: Manually explore environments to understand dynamics, test reward functions, and verify rendering.
Key features:
- Environment-aware action labels: “Turn Left”, “Forward”, “Pickup” for MiniGrid; “Push Left”, “Push Right” for CartPole; “Noop” through “Make Iron Sword” for Crafter (17 actions)
- Legal move validation: For board games (Chess, Go), invalid moves are rejected with feedback
- Custom initial states: MiniGrid environments support JSON-based grid state injection for reproducible scenarios
- Crafter support: Custom Gymnasium wrapper with configurable render resolution (64x64 to 512x512)
- RGB frame rendering: Real-time visualization in the GUI
- Episode telemetry: Step count, reward, success/failure, duration
- Dual runtime modes: Interactive (worker owns env) and board-game (GUI owns env)
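
The episode telemetry fields listed above (step count, reward, success/failure, duration) could be accumulated along the following lines. This is a minimal sketch, not the worker's actual telemetry code: the `EpisodeStats` class, its field names, and the choice to let the caller decide what counts as success are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EpisodeStats:
    """Hypothetical per-episode record: steps, reward, outcome, duration."""

    steps: int = 0
    total_reward: float = 0.0
    success: bool = False
    finished: bool = False
    started_at: float = field(default_factory=time.monotonic)
    duration_seconds: float = 0.0

    def record_step(self, reward, done, success=False):
        """Add one step; freeze the outcome and duration when the episode ends."""
        if self.finished:
            raise RuntimeError("episode already finished")
        self.steps += 1
        self.total_reward += reward
        if done:
            self.finished = True
            self.success = success  # the caller decides what success means
            self.duration_seconds = time.monotonic() - self.started_at


stats = EpisodeStats()
stats.record_step(reward=0.0, done=False)
stats.record_step(reward=1.0, done=True, success=True)
print(stats.steps, stats.total_reward, stats.success)
```

Keeping the success decision outside the record avoids baking an environment-specific rule (for example, "positive final reward") into the telemetry itself.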
Architecture
The worker follows the standard MOSAIC shim pattern with two runtime classes:
graph TB
subgraph "MOSAIC GUI"
RENDER["Render View<br/>(RGB frames)"]
BUTTONS["Action Buttons<br/>(env-specific labels)"]
DAEMON["Operator Launcher"]
end
subgraph "Human Worker Subprocess"
CLI["cli.py<br/>(human-worker)"]
CFG["config.py<br/>(HumanWorkerConfig)"]
IRT["HumanInteractiveRuntime<br/>(env-owning)"]
LRT["HumanWorkerRuntime<br/>(board-game)"]
end
subgraph "Environment"
ENV["Gymnasium / PettingZoo<br/>(MiniGrid, Crafter, Chess...)"]
end
DAEMON -->|"spawn"| CLI
CLI --> CFG
CFG --> IRT
CFG --> LRT
IRT -->|"reset / step"| ENV
IRT -->|"RGB frames"| RENDER
BUTTONS -->|"action click"| IRT
LRT -->|"waiting_for_human"| BUTTONS
style RENDER fill:#4a90d9,stroke:#2e5a87,color:#fff
style BUTTONS fill:#4a90d9,stroke:#2e5a87,color:#fff
style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
style IRT fill:#ff7f50,stroke:#cc5500,color:#fff
style LRT fill:#ff7f50,stroke:#cc5500,color:#fff
style ENV fill:#e8e8e8,stroke:#999
Runtime Modes
Interactive mode (worker owns the environment):
Used for grid-world and continuous environments (MiniGrid, BabyAI, Crafter, Classic Control). The worker creates the gymnasium environment, renders frames, and accepts human-chosen actions.
human-worker --mode interactive --run-id game_001 \
--env-name minigrid --task MiniGrid-DoorKey-8x8-v0 --seed 42
Protocol:
{"cmd": "reset", "seed": 42, "env_name": "minigrid", "task": "MiniGrid-DoorKey-8x8-v0"}
{"cmd": "step", "action": 2}
{"cmd": "stop"}
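
To make the command flow above concrete, the sketch below dispatches the three interactive commands against a toy environment. It is illustrative only: it assumes each command arrives as a single JSON object, `CounterEnv` is a hypothetical stand-in for a real Gymnasium environment, and the response fields (`type`, `observation`, `reward`, `terminated`) are invented for the example rather than taken from the worker's real output format.

```python
import json


class CounterEnv:
    """Hypothetical stand-in environment: the episode ends after 3 steps."""

    def reset(self, seed=None):
        self.steps = 0
        return {"steps": self.steps}

    def step(self, action):
        self.steps += 1
        terminated = self.steps >= 3
        reward = 1.0 if terminated else 0.0
        return {"steps": self.steps}, reward, terminated


def handle_command(env, line, state):
    """Dispatch one JSON command and return a response dict (shape is assumed)."""
    msg = json.loads(line)
    cmd = msg.get("cmd")
    if cmd == "reset":
        obs = env.reset(seed=msg.get("seed"))
        state["ready"] = True
        return {"type": "ready", "observation": obs}
    if cmd == "step":
        if not state.get("ready"):
            return {"type": "error", "message": "step received before reset"}
        obs, reward, terminated = env.step(msg["action"])
        if terminated:
            state["ready"] = False  # a new reset is needed after the episode ends
        return {"type": "step", "observation": obs,
                "reward": reward, "terminated": terminated}
    if cmd == "stop":
        return {"type": "stopped"}
    return {"type": "error", "message": f"unknown command: {cmd!r}"}


env, state = CounterEnv(), {}
commands = [
    '{"cmd": "step", "action": 2}',  # sent too early on purpose
    '{"cmd": "reset", "seed": 42, "env_name": "minigrid", "task": "MiniGrid-DoorKey-8x8-v0"}',
    '{"cmd": "step", "action": 2}',
    '{"cmd": "stop"}',
]
responses = [handle_command(env, line, state) for line in commands]
for response in responses:
    print(json.dumps(response))
```

The first command is deliberately sent before `reset` to show one way a step-without-reset request might be rejected instead of crashing the subprocess.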
Board-game mode (GUI owns the environment):
Used for PettingZoo turn-based games (Chess, Go, Connect Four). The GUI owns the environment and sends observations with legal moves. The worker displays options and waits for human selection.
human-worker --mode board-game --run-id chess_001 \
--player-name "Alice"
Protocol:
{"cmd": "init_agent", "game_name": "chess_v6", "player_id": "player_0"}
{"cmd": "select_action", "observation": "...", "info": {"legal_moves": ["e2e4", "d2d4"]}}
{"cmd": "human_input", "move": "e2e4", "player_id": "player_0"}
{"cmd": "stop"}
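
The legal-move check in this flow can be sketched as a small pure function. This is an assumption-laden illustration rather than the worker's implementation: `validate_move` and its feedback strings are hypothetical, and only the `select_action` message shape is taken from the protocol above.

```python
import json


def validate_move(move, legal_moves):
    """Return (accepted, feedback) for a human-entered move string."""
    move = move.strip()
    if not move:
        return False, "empty move"
    if not legal_moves:
        return False, "no legal moves available"
    if move not in legal_moves:
        options = ", ".join(legal_moves)
        return False, f"illegal move {move!r}; legal moves: {options}"
    return True, "ok"


# select_action request in the shape shown in the protocol above.
request = json.loads(
    '{"cmd": "select_action", "observation": "...", '
    '"info": {"legal_moves": ["e2e4", "d2d4"]}}'
)
legal = request["info"]["legal_moves"]

accepted = validate_move("e2e4", legal)
rejected = validate_move("e2e5", legal)
print(accepted)
print(rejected)
```

Returning a feedback string alongside the boolean lets the GUI tell the player why a move was refused, which matches the "rejected with feedback" behavior described for board games.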
Action Labels
The worker provides environment-specific action labels for the GUI’s action buttons:
| Environment | Actions | Labels |
|---|---|---|
| MiniGrid / BabyAI | 7 | Turn Left, Turn Right, Forward, Pickup, Drop, Toggle, Done |
| MosaicMultiGrid | 8 | Still, Turn Left, Turn Right, Forward, Pickup, Drop, Toggle, Done |
| Crafter | 17 | Noop, Move Left/Right/Up/Down, Do, Sleep, Place Stone/Table/Furnace/Plant, Make Wood/Stone/Iron Pickaxe, Make Wood/Stone/Iron Sword |
| NetHack / NLE | 24 | North, East, South, West, NE, SE, SW, NW, Wait, Kick, Open, Search, … |
| FrozenLake | 4 | Left, Down, Right, Up |
| Taxi | 6 | South, North, East, West, Pickup, Dropoff |
| CartPole | 2 | Push Left, Push Right |
| LunarLander | 4 | Noop, Fire Left, Fire Main, Fire Right |
For unknown environments, generic labels (Action 0, Action 1, …) are
generated automatically.
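
The lookup-with-fallback behavior described above might look like the following. The label lists are copied from the table; the `ACTION_LABELS` dictionary, its lowercase keys, and the `get_action_labels` function are hypothetical names chosen for this sketch, not the worker's actual API.

```python
# Labels copied from the table above; the dictionary keys are hypothetical.
ACTION_LABELS = {
    "minigrid": ["Turn Left", "Turn Right", "Forward", "Pickup", "Drop", "Toggle", "Done"],
    "frozenlake": ["Left", "Down", "Right", "Up"],
    "taxi": ["South", "North", "East", "West", "Pickup", "Dropoff"],
    "cartpole": ["Push Left", "Push Right"],
    "lunarlander": ["Noop", "Fire Left", "Fire Main", "Fire Right"],
}


def get_action_labels(env_name, n_actions):
    """Return known labels for env_name, or generic 'Action i' labels otherwise."""
    labels = ACTION_LABELS.get(env_name.lower())
    # Fall back if the environment is unknown or the action count does not match.
    if labels is not None and len(labels) == n_actions:
        return list(labels)
    return [f"Action {i}" for i in range(n_actions)]


print(get_action_labels("CartPole", 2))       # ['Push Left', 'Push Right']
print(get_action_labels("my-custom-env", 3))  # ['Action 0', 'Action 1', 'Action 2']
```

Falling back when the action count does not match is a design choice of this sketch: it keeps every button mapped to a valid action index even if a variant of a known environment changes its action space.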
Configuration
CLI arguments:
| Argument | Default | Description |
|---|---|---|
| | | |
| | | Unique run identifier (assigned by GUI) |
| | | Display name for the human player |
| | | Environment family (minigrid, babyai, crafter, etc.) |
| | | Gymnasium environment ID |
| | | Random seed for environment |
| | | Render resolution for Crafter (e.g., |
| | | Timeout for human input in seconds (0 = no timeout) |
| | | Highlight legal moves in board-game mode |
| | | Require move confirmation before submitting |
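
As a rough illustration of how such arguments could be parsed, the sketch below builds an `argparse` parser that covers only the flags visible in the example commands under Runtime Modes (`--mode`, `--run-id`, `--player-name`, `--env-name`, `--task`, `--seed`). It is hypothetical throughout: which flags are required, the `choices` for `--mode`, and the defaults (borrowed from the `HumanWorkerConfig` values in this section) are assumptions, and the real `cli.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only flags seen in the example commands."""
    parser = argparse.ArgumentParser(prog="human-worker")
    # Assumption: the mode and run id must always be supplied.
    parser.add_argument("--mode", choices=["interactive", "board-game"], required=True)
    parser.add_argument("--run-id", required=True)
    # Assumption: remaining defaults mirror the HumanWorkerConfig dataclass.
    parser.add_argument("--player-name", default="Human")
    parser.add_argument("--env-name", default="")
    parser.add_argument("--task", default="")
    parser.add_argument("--seed", type=int, default=42)
    return parser


args = build_parser().parse_args(
    ["--mode", "interactive", "--run-id", "game_001",
     "--env-name", "minigrid", "--task", "MiniGrid-DoorKey-8x8-v0"]
)
print(args.mode, args.run_id, args.seed)
```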
HumanWorkerConfig dataclass:
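from dataclasses import dataclass
from typing import Tuple


@dataclass
class HumanWorkerConfig:
    # Run and player identity
    run_id: str = ""
    player_name: str = "Human"
    # Environment selection and rendering
    env_name: str = ""
    task: str = ""
    render_mode: str = "rgb_array"
    seed: int = 42
    game_resolution: Tuple[int, int] = (512, 512)
    # Human input behavior (0.0 disables the timeout)
    timeout_seconds: float = 0.0
    show_legal_moves: bool = True
    confirm_moves: bool = False
    # Telemetry output location
    telemetry_dir: str = "var/telemetry"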
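
The test suite described later mentions `to_dict`/`from_dict` serialization for this config. One way such a round trip could work is sketched below on a hypothetical `SketchConfig` that copies a subset of the fields; the method bodies are assumptions and are not taken from the real `HumanWorkerConfig`.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Tuple


@dataclass
class SketchConfig:
    """Hypothetical subset of HumanWorkerConfig, used only for this example."""

    run_id: str = ""
    player_name: str = "Human"
    seed: int = 42
    game_resolution: Tuple[int, int] = (512, 512)
    timeout_seconds: float = 0.0

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        # Ignore unknown keys so that older or newer payloads still load.
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in data.items() if k in known}
        # JSON has no tuple type, so restore the resolution as a tuple.
        if "game_resolution" in kwargs:
            kwargs["game_resolution"] = tuple(kwargs["game_resolution"])
        return cls(**kwargs)


original = SketchConfig(run_id="game_001", game_resolution=(256, 256))
restored = SketchConfig.from_dict(json.loads(json.dumps(original.to_dict())))
print(restored == original)
```

The tuple conversion matters because a JSON round trip turns `(256, 256)` into `[256, 256]`, which would otherwise make the restored config compare unequal to the original.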
Supported Environments
| Environment Family | Mode | Notes |
|---|---|---|
| MiniGrid | Interactive | All variants; supports custom initial state injection |
| BabyAI | Interactive | Language-grounded instruction following |
| MosaicMultiGrid | Interactive | Soccer, Collect, Basketball (multi-agent via heterogeneous operators) |
| Crafter | Interactive | Custom gymnasium wrapper, configurable render size |
| Gymnasium Classic | Interactive | CartPole, MountainCar, Acrobot, FrozenLake, Taxi, etc. |
| PettingZoo | Board-game | Chess, Connect Four, Go, Tic-Tac-Toe (legal move validation) |
| NetHack / MiniHack | Interactive | Roguelike dungeon crawling |
Test Coverage
The Human Worker has 35 tests across 8 test classes:
| Test Class | Tests | Coverage |
|---|---|---|
| TestHumanWorkerConfig | 5 | Config defaults, custom values, serialization (to_dict/from_dict) |
| TestHumanWorkerRuntime | 4 | Init state, agent init, human input request, move validation |
| TestHumanWorkerRuntimeInteractive | 3 | init_agent command, select_action, human_input processing |
| TestWorkerMetadata | 3 | Metadata values, capabilities (worker_type, paradigms, GPU) |
| TestHumanWorkerEdgeCases | 3 | Empty legal moves, multiple init calls, empty move string |
| TestActionLabels | 5 | MiniGrid, FrozenLake, Taxi, unknown env, label truncation |
| TestHumanWorkerConfigNew | 3 | Environment config fields, serialization with env fields |
| TestHumanInteractiveRuntime(WithEnv) | 9 | Import, initial state, emit, reset with MiniGrid, step, invalid action, step without reset |