MOSAIC Human Worker



The MOSAIC Human Worker enables human-in-the-loop control: a person drives an agent from the keyboard in any Gymnasium-compatible environment. It bridges human decision-making with MOSAIC’s multi-agent evaluation framework, so researchers can play alongside or against RL, LLM, and random agents.

Important: The human owns the agent for the entire episode. Once an operator is configured with a human worker, that agent slot is controlled exclusively by the human. There is no automatic switching to an AI policy mid-execution. The human remains in control from reset to episode end, ensuring clean and comparable evaluation data.

The worker operates in two modes. In interactive mode, the worker owns the environment and the GUI sends human-chosen actions via action buttons. In board-game mode, the GUI owns the environment (PettingZoo games) and the worker handles move selection with legal-move validation.

Paradigm: Human-in-the-loop (single-agent and multi-agent)

Task Type: Human vs AI, human + AI cooperative, human baseline evaluation

Modes: interactive (env-owning), board-game (action-selector)

Environments: MiniGrid, BabyAI, MosaicMultiGrid, Crafter, MiniHack/NetHack, PettingZoo (Chess, Go, Connect Four), Gymnasium Classic Control

Execution: Subprocess (interactive step-by-step via GUI)

GPU required: No

Source: 3rd_party/workers/mosaic/human_worker/human_worker/

Entry point: human-worker (CLI)

Overview

The MOSAIC Human Worker turns any MOSAIC environment into a playable game. The GUI displays rendered frames, action buttons with environment-specific labels, and episode statistics. The human clicks an action, the worker steps the environment, and the next frame appears.

This enables several research workflows:

  • Human baseline: Establish human-level performance benchmarks for comparison against RL and LLM agents.

  • Human-AI teams (cooperation): Deploy a human teammate alongside RL or LLM agents in cooperative multi-agent environments (Soccer 2v2, Overcooked).

  • Human-AI adversarial play (competition): Deploy human players against trained RL policies or LLM agents in competitive environments.

  • Environment debugging: Manually explore environments to understand dynamics, test reward functions, and verify rendering.

Key features:

  • Environment-aware action labels: “Turn Left”, “Forward”, “Pickup” for MiniGrid; “Push Left”, “Push Right” for CartPole; “Noop” through “Make Iron Sword” for Crafter (17 actions)

  • Legal move validation: For board games (Chess, Go), invalid moves are rejected with feedback

  • Custom initial states: MiniGrid environments support JSON-based grid state injection for reproducible scenarios

  • Crafter support: Custom Gymnasium wrapper with configurable render resolution (64x64 to 512x512)

  • RGB frame rendering: Real-time visualization in the GUI

  • Episode telemetry: Step count, reward, success/failure, duration

  • Dual runtime modes: Interactive (worker owns env) and board-game (GUI owns env)

Architecture

The worker follows the standard MOSAIC shim pattern with two runtime classes:

graph TB
    subgraph "MOSAIC GUI"
        RENDER["Render View<br/>(RGB frames)"]
        BUTTONS["Action Buttons<br/>(env-specific labels)"]
        DAEMON["Operator Launcher"]
    end

    subgraph "Human Worker Subprocess"
        CLI["cli.py<br/>(human-worker)"]
        CFG["config.py<br/>(HumanWorkerConfig)"]
        IRT["HumanInteractiveRuntime<br/>(env-owning)"]
        LRT["HumanWorkerRuntime<br/>(board-game)"]
    end

    subgraph "Environment"
        ENV["Gymnasium / PettingZoo<br/>(MiniGrid, Crafter, Chess...)"]
    end

    DAEMON -->|"spawn"| CLI
    CLI --> CFG
    CFG --> IRT
    CFG --> LRT
    IRT -->|"reset / step"| ENV
    IRT -->|"RGB frames"| RENDER
    BUTTONS -->|"action click"| IRT
    LRT -->|"waiting_for_human"| BUTTONS

    style RENDER fill:#4a90d9,stroke:#2e5a87,color:#fff
    style BUTTONS fill:#4a90d9,stroke:#2e5a87,color:#fff
    style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
    style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
    style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
    style IRT fill:#ff7f50,stroke:#cc5500,color:#fff
    style LRT fill:#ff7f50,stroke:#cc5500,color:#fff
    style ENV fill:#e8e8e8,stroke:#999
    

Runtime Modes

Interactive mode (worker owns the environment):

Used for grid-world and continuous environments (MiniGrid, BabyAI, Crafter, Classic Control). The worker creates the Gymnasium environment, renders frames, and applies each human-chosen action.

human-worker --mode interactive --run-id game_001 \
    --env-name minigrid --task MiniGrid-DoorKey-8x8-v0 --seed 42

Protocol:

{"cmd": "reset", "seed": 42, "env_name": "minigrid", "task": "MiniGrid-DoorKey-8x8-v0"}
{"cmd": "step", "action": 2}
{"cmd": "stop"}

Board-game mode (GUI owns the environment):

Used for PettingZoo turn-based games (Chess, Go, Connect Four). The GUI owns the environment and sends observations with legal moves. The worker displays options and waits for human selection.

human-worker --mode board-game --run-id chess_001 \
    --player-name "Alice"

Protocol:

{"cmd": "init_agent", "game_name": "chess_v6", "player_id": "player_0"}
{"cmd": "select_action", "observation": "...", "info": {"legal_moves": ["e2e4", "d2d4"]}}
{"cmd": "human_input", "move": "e2e4", "player_id": "player_0"}
{"cmd": "stop"}

Action Labels

The worker provides environment-specific action labels for the GUI’s action buttons:

| Environment | Actions | Labels |
|---|---|---|
| MiniGrid / BabyAI | 7 | Turn Left, Turn Right, Forward, Pickup, Drop, Toggle, Done |
| MosaicMultiGrid | 8 | Still, Turn Left, Turn Right, Forward, Pickup, Drop, Toggle, Done |
| Crafter | 17 | Noop, Move Left/Right/Up/Down, Do, Sleep, Place Stone/Table/Furnace/Plant, Make Wood/Stone/Iron Pickaxe, Make Wood/Stone/Iron Sword |
| NetHack / NLE | 24 | North, East, South, West, NE, SE, SW, NW, Wait, Kick, Open, Search, … |
| FrozenLake | 4 | Left, Down, Right, Up |
| Taxi | 6 | South, North, East, West, Pickup, Dropoff |
| CartPole | 2 | Push Left, Push Right |
| LunarLander | 4 | Noop, Fire Left, Fire Main, Fire Right |

For unknown environments, generic labels (Action 0, Action 1, …) are generated automatically.
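
The lookup-with-fallback behaviour described above can be sketched as follows. The label lists are copied from the table, while `ACTION_LABELS`, `get_action_labels`, and the lowercase lookup keys are hypothetical names used only for illustration.

```python
# Labels copied from the table above; the dictionary keys are assumptions.
ACTION_LABELS = {
    "minigrid": ["Turn Left", "Turn Right", "Forward", "Pickup", "Drop", "Toggle", "Done"],
    "frozenlake": ["Left", "Down", "Right", "Up"],
    "taxi": ["South", "North", "East", "West", "Pickup", "Dropoff"],
    "cartpole": ["Push Left", "Push Right"],
    "lunarlander": ["Noop", "Fire Left", "Fire Main", "Fire Right"],
}


def get_action_labels(env_key, n_actions):
    """Return known labels for env_key, else generic 'Action i' labels."""
    labels = ACTION_LABELS.get(env_key.lower())
    # Fall back if the env is unknown or the label count does not match.
    if labels is not None and len(labels) == n_actions:
        return list(labels)
    return [f"Action {i}" for i in range(n_actions)]
```

For example, `get_action_labels("CartPole", 2)` returns the two CartPole labels, while an unrecognised key with three actions yields `Action 0`, `Action 1`, `Action 2`.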

Configuration

CLI arguments:

| Argument | Default | Description |
|---|---|---|
| --mode | interactive | interactive (env-owning) or board-game (action-selector) |
| --run-id | "" | Unique run identifier (assigned by GUI) |
| --player-name | "Human" | Display name for the human player |
| --env-name | "" | Environment family (minigrid, babyai, crafter, etc.) |
| --task | "" | Gymnasium environment ID |
| --seed | 42 | Random seed for environment |
| --game-resolution | 512x512 | Render resolution for Crafter (e.g., 64x64, 512x512) |
| --timeout | 0.0 | Timeout for human input in seconds (0 = no timeout) |
| --show-legal-moves | true | Highlight legal moves in board-game mode |
| --confirm-moves | false | Require move confirmation before submitting |

HumanWorkerConfig dataclass:

@dataclass
class HumanWorkerConfig:
    run_id: str = ""
    player_name: str = "Human"
    env_name: str = ""
    task: str = ""
    render_mode: str = "rgb_array"
    seed: int = 42
    game_resolution: Tuple[int, int] = (512, 512)
    timeout_seconds: float = 0.0
    show_legal_moves: bool = True
    confirm_moves: bool = False
    telemetry_dir: str = "var/telemetry"

Supported Environments

| Environment Family | Mode | Notes |
|---|---|---|
| MiniGrid | Interactive | All variants; supports custom initial state injection |
| BabyAI | Interactive | Language-grounded instruction following |
| MosaicMultiGrid | Interactive | Soccer, Collect, Basketball (multi-agent via heterogeneous operators) |
| Crafter | Interactive | Custom Gymnasium wrapper, configurable render size |
| Gymnasium Classic | Interactive | CartPole, MountainCar, Acrobot, FrozenLake, Taxi, etc. |
| PettingZoo | Board-game | Chess, Connect Four, Go, Tic-Tac-Toe (legal move validation) |
| NetHack / MiniHack | Interactive | Roguelike dungeon crawling |

Test Coverage

The Human Worker has 35 tests across 8 test classes:

| Test Class | Tests | Coverage |
|---|---|---|
| TestHumanWorkerConfig | 5 | Config defaults, custom values, serialization (to_dict/from_dict) |
| TestHumanWorkerRuntime | 4 | Init state, agent init, human input request, move validation |
| TestHumanWorkerRuntimeInteractive | 3 | init_agent command, select_action, human_input processing |
| TestWorkerMetadata | 3 | Metadata values, capabilities (worker_type, paradigms, GPU) |
| TestHumanWorkerEdgeCases | 3 | Empty legal moves, multiple init calls, empty move string |
| TestActionLabels | 5 | MiniGrid, FrozenLake, Taxi, unknown env, label truncation |
| TestHumanWorkerConfigNew | 3 | Environment config fields, serialization with env fields |
| TestHumanInteractiveRuntime(WithEnv) | 9 | Import, initial state, emit, reset with MiniGrid, step, invalid action, step without reset |