XuanCe Worker

The XuanCe worker is MOSAIC’s multi-agent and comprehensive RL integration. It wraps the XuanCe library (a unified deep RL library with 46+ algorithms across single-agent, multi-agent, and offline RL) behind the standard shim pattern, adding subprocess isolation, FastLane telemetry, curriculum learning, and GUI configuration.

Paradigm	Single-agent, Multi-agent (parameter sharing, independent)
Algorithms	46+ including PPO, DQN, SAC, MAPPO, QMIX, MADDPG, VDN, COMA
Backends	PyTorch (primary), TensorFlow, MindSpore
Environments	Gymnasium, Atari, MuJoCo, PettingZoo, SMAC, MPE, MultiGrid, MultiGrid Soccer
Execution	In-process (single OS process, vectorized environments)
GPU required	No (optional CUDA acceleration)
Source	`3rd_party/xuance_worker/xuance_worker/`

Note

Known Import Issue: The mpi4py package triggers MPI_Init() at import time, which blocks indefinitely outside an MPI launch environment. The worker sets MPI4PY_RC_INITIALIZE=0 to suppress this. If you see the worker hanging on startup, verify this environment variable is set. Multi-agent runs that require MPI (e.g. SMAC on HPC clusters) must be launched via mpirun.
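The check described in the note can be sketched in a few lines. This is a minimal illustration, not the worker's actual code: the helper names `disable_mpi_auto_init` and `mpi_auto_init_disabled` are hypothetical, and the only facts taken from the note are the variable name and the value `0`. The variable must be set before `mpi4py.MPI` is first imported, since mpi4py reads it at import time.

```python
import os
from collections.abc import Mapping, MutableMapping

MPI_INIT_VAR = "MPI4PY_RC_INITIALIZE"


def disable_mpi_auto_init(environ: MutableMapping[str, str] = os.environ) -> None:
    """Set MPI4PY_RC_INITIALIZE=0 so a later mpi4py import skips MPI_Init()."""
    environ[MPI_INIT_VAR] = "0"


def mpi_auto_init_disabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the variable is present and set to a 'false' value."""
    value = environ.get(MPI_INIT_VAR, "").strip().lower()
    return value in {"0", "false", "no", "off", "n"}


if __name__ == "__main__":
    # Call this before anything imports mpi4py.MPI.
    disable_mpi_auto_init()
    print(MPI_INIT_VAR, "disabled:", mpi_auto_init_disabled())
```

If a worker hangs on startup, running the second helper inside the worker's environment is a quick way to confirm whether the variable reached the process.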

Architecture

graph TB
    subgraph "MOSAIC GUI"
        FORM["Training Form<br/>(XuanCe widgets)"]
        DAEMON["Trainer Daemon"]
    end

    subgraph "Worker Process"
        CLI["cli.py<br/>entry point"]
        CFG["config.py<br/>XuanCeWorkerConfig"]
        RT["runtime.py<br/>XuanCeWorkerRuntime"]
        FL["fastlane.py<br/>FastLane telemetry"]
        SITE["sitecustomize.py<br/>import-time patches"]
        AR["algorithm_registry.py<br/>Backend / Paradigm index"]
        SHIMS["xuance_shims.py<br/>path + dir redirects"]
    end

    subgraph "Upstream XuanCe"
        RUNNER["RunnerDRL / RunnerMARL<br/>RunnerPettingzoo<br/>(unmodified)"]
    end

    FORM -->|"config JSON"| DAEMON
    DAEMON -->|"spawn"| CLI
    CLI --> CFG --> RT
    RT --> AR
    RT --> RUNNER
    SITE -.->|"import-time patches"| RUNNER
    SHIMS -.->|"redirect logs/checkpoints to var/"| RUNNER
    FL -.->|"shared-memory frames"| DAEMON

    style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
    style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
    style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
    style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
    style RT fill:#ff7f50,stroke:#cc5500,color:#fff
    style FL fill:#ff7f50,stroke:#cc5500,color:#fff
    style SITE fill:#ff7f50,stroke:#cc5500,color:#fff
    style AR fill:#ff7f50,stroke:#cc5500,color:#fff
    style SHIMS fill:#ff7f50,stroke:#cc5500,color:#fff
    style RUNNER fill:#e8e8e8,stroke:#999

Lifecycle of a training run:

1. The GUI form builds a config JSON and hands it to the Trainer Daemon.
2. The daemon spawns python -m xuance_worker.cli --config <path>.
3. cli.py loads the config into XuanCeWorkerConfig and delegates to XuanCeWorkerRuntime.
4. xuance_shims.py redirects XuanCe’s hardcoded output paths (logs, checkpoints, TensorBoard) into MOSAIC’s var/ directory.
5. runtime.py calls xuance.get_runner() with the resolved algorithm, environment family, and parser args.
6. XuanCe’s runner (RunnerDRL for single-agent, RunnerMARL for multi-agent) executes the training loop.
7. FastLane telemetry streams render frames to the GUI via shared memory.
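The spawn step in the lifecycle above can be sketched as follows. This is an illustration under stated assumptions, not the Trainer Daemon's actual code: `build_worker_command` and `spawn_worker` are hypothetical helpers, and only the module path `xuance_worker.cli` and the `--config` flag come from the lifecycle description.

```python
import subprocess
import sys
from pathlib import Path


def build_worker_command(config_path: Path, python: str = sys.executable) -> list[str]:
    """Return the argv for launching the worker CLI with a config file."""
    return [python, "-m", "xuance_worker.cli", "--config", str(config_path)]


def spawn_worker(config_path: Path) -> subprocess.Popen:
    """Start the worker in its own OS process (hypothetical daemon-side helper)."""
    return subprocess.Popen(build_worker_command(config_path))


if __name__ == "__main__":
    # Print the command instead of spawning, so the sketch runs anywhere.
    print(build_worker_command(Path("run_config.json")))
```

Running the worker as a separate process is what gives the subprocess isolation mentioned at the top of this page: a crash inside the training process does not take the GUI down with it.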

Supported Algorithms

Algorithms are indexed in algorithm_registry.py by Backend and Paradigm. The table below shows the primary families:

Family	Algorithms	Paradigm	Notes
Policy Gradient	PPO, A2C, A3C, PG, PDPG	Single-agent	Stable on-policy training
Q-Learning	DQN, DDQN, Dueling DQN, NoisyDQN, PerDQN, C51, QRDQN	Single-agent	Discrete action spaces
Actor-Critic (continuous)	SAC, TD3, DDPG	Single-agent	Continuous control
Model-Based	DreamerV3	Single-agent	World-model planning
Cooperative MARL	MAPPO, QMIX, VDN, COMA, MADDPG, MASAC, IDDPG	Multi-agent	CTDE paradigm
Competitive MARL	MAPPO (self-play), MADDPG	Multi-agent	Adversarial training

Runners

XuanCe selects the training runner based on the environment family:

Runner	Environment Family	Use Case
`RunnerDRL`	`classic_control`, `atari`, `mujoco`, `minigrid`	Standard single-agent Gymnasium environments
`RunnerMARL`	`multigrid`, `smac`	Cooperative multi-agent (CTDE algorithms)
`RunnerPettingzoo`	`mpe`, `pettingzoo`	PettingZoo AEC and parallel API environments
`RunnerStarCraft2`	`smac`	StarCraft Multi-Agent Challenge (requires SC2 installation)
`RunnerFootball`	`football`	Google Research Football

Configuration

The XuanCeWorkerConfig dataclass (config.py) is the single source of truth for all run parameters:

from dataclasses import dataclass

@dataclass
class XuanCeWorkerConfig:
    run_id: str           # ULID-format unique run identifier
    method: str           # Algorithm name ("ppo", "mappo", "qmix", ...)
    env: str              # Environment family ("classic_control", "multigrid", ...)
    env_id: str           # Specific env ID ("CartPole-v1", "soccer_1vs1", ...)
    dl_toolbox: str       # Backend: "torch" (default), "tensorflow", "mindspore"
    running_steps: int    # Total training timesteps (default: 1_000_000)
    seed: int | None      # Random seed (None = random)
    device: str           # "cpu" or "cuda:0"
    parallels: int        # Number of parallel environments (default: 8)
    test_mode: bool       # True = evaluation mode (load checkpoint, no training)
    config_path: str | None  # Custom YAML config (None = XuanCe defaults)
    extras: dict          # Algorithm-specific overrides

Key extras fields:

training_mode: "cooperative" or "competitive" (for MARL)
curriculum_schedule: list of {"env_id": ..., "steps": ...} dicts
tensorboard_dir: relative path for TensorBoard logs
checkpoint_dir: relative path for model checkpoints
num_envs: alias for parallels (used by some MARL algorithms)
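To make the shape of a config concrete, the sketch below parses a config JSON into a dataclass that mirrors the fields listed above. It is an assumption-laden illustration rather than the real `config.py`: `ExampleWorkerConfig` and `load_config` are hypothetical names, the defaults for `device`, `test_mode`, `seed`, and `config_path` are guesses, the `run_id` value is a placeholder, and the way `num_envs` overrides `parallels` is one plausible reading of "alias".

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExampleWorkerConfig:
    """Hypothetical mirror of XuanCeWorkerConfig, for illustration only."""
    run_id: str
    method: str
    env: str
    env_id: str
    dl_toolbox: str = "torch"          # documented default
    running_steps: int = 1_000_000     # documented default
    seed: Optional[int] = None
    device: str = "cpu"                # assumed default
    parallels: int = 8                 # documented default
    test_mode: bool = False            # assumed default
    config_path: Optional[str] = None
    extras: dict = field(default_factory=dict)


def load_config(text: str) -> ExampleWorkerConfig:
    """Parse a config JSON string and resolve the num_envs alias."""
    raw = json.loads(text)
    extras = dict(raw.pop("extras", {}))
    cfg = ExampleWorkerConfig(extras=extras, **raw)
    # Assumption: when extras carries num_envs, it takes precedence over parallels.
    if "num_envs" in cfg.extras:
        cfg.parallels = int(cfg.extras["num_envs"])
    return cfg


EXAMPLE_JSON = """
{
  "run_id": "EXAMPLE-RUN-ID",
  "method": "mappo",
  "env": "multigrid",
  "env_id": "soccer_1vs1",
  "extras": {"training_mode": "cooperative", "num_envs": 4}
}
"""

if __name__ == "__main__":
    print(load_config(EXAMPLE_JSON))
```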

Curriculum Training

The XuanCe worker supports single-process curriculum training via multi_agent_curriculum_training.py. Unlike the two-process approach, it hot-swaps environments in memory, preserving the Adam optimizer momentum and learning-rate schedule across phases.

{
  "curriculum_schedule": [
    {"env_id": "collect_1vs1", "steps": 1000000},
    {"env_id": "soccer_1vs1",  "steps": 4000000}
  ]
}

Both environments must share the same observation and action spaces so the network architecture requires no modification between phases.
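A schedule like the one above can be resolved to "which environment is active at step t" with a few lines of arithmetic. The sketch assumes that each `steps` value is the duration of its phase (not a cumulative step count); `phase_boundaries` and `env_for_step` are hypothetical helpers and not part of `multi_agent_curriculum_training.py`.

```python
from bisect import bisect_right
from itertools import accumulate

SCHEDULE = [
    {"env_id": "collect_1vs1", "steps": 1_000_000},
    {"env_id": "soccer_1vs1", "steps": 4_000_000},
]


def phase_boundaries(schedule: list[dict]) -> list[int]:
    """Cumulative step count at which each phase ends."""
    return list(accumulate(phase["steps"] for phase in schedule))


def env_for_step(schedule: list[dict], step: int) -> str:
    """Return the env_id that is active at a global training step."""
    bounds = phase_boundaries(schedule)
    if step < 0 or step >= bounds[-1]:
        raise ValueError(f"step {step} is outside the schedule (0..{bounds[-1] - 1})")
    # bisect_right finds the first phase whose end boundary is greater than step.
    return schedule[bisect_right(bounds, step)]["env_id"]


if __name__ == "__main__":
    print(phase_boundaries(SCHEDULE))
    print(env_for_step(SCHEDULE, 999_999), env_for_step(SCHEDULE, 1_000_000))
```

Under this reading, the example schedule trains for 5,000,000 steps in total, with the swap to `soccer_1vs1` happening at step 1,000,000.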

Multi-Agent Configuration

For MARL algorithms (MAPPO, QMIX, etc.), the choice between parameter sharing and independent networks affects the checkpoint format and deployment:

Parameter Sharing (use_parameter_sharing=True) — see MARL Book ch. 5:

All agents share one policy network. The network input is obs_dim + n_agents (a one-hot agent identity is appended). This is more sample-efficient for symmetric games but creates a dimension dependency on n_agents at inference time.

Independent Networks (use_parameter_sharing=False):

Each agent has its own separate policy. Network input is obs_dim only. Checkpoints are fully self-contained and can be loaded for any agent slot without configuration.

Warning

If you train with parameter sharing on a 1v1 environment (n_agents=2) and then deploy in a 2v2 environment (n_agents=4), the actor’s first linear layer will have an input dimension mismatch (obs+2 vs obs+4). Either train with use_parameter_sharing=False, or bypass agent.action() and construct the one-hot manually at inference.
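The dimension dependency behind this warning is easy to see in isolation. The sketch below uses plain Python lists instead of tensors; `with_agent_id` and `check_input_dim` are hypothetical helpers, and the observation size of 10 is an arbitrary example value.

```python
def with_agent_id(obs: list[float], agent_index: int, n_agents: int) -> list[float]:
    """Append a one-hot agent identity to an observation (parameter sharing)."""
    if not 0 <= agent_index < n_agents:
        raise ValueError(f"agent_index {agent_index} out of range for {n_agents} agents")
    one_hot = [0.0] * n_agents
    one_hot[agent_index] = 1.0
    return list(obs) + one_hot


def check_input_dim(obs_dim: int, n_agents: int, trained_input_dim: int) -> None:
    """Raise if a shared-parameter checkpoint cannot accept this environment's input."""
    expected = obs_dim + n_agents
    if expected != trained_input_dim:
        raise ValueError(
            f"input dim mismatch: checkpoint expects {trained_input_dim}, "
            f"environment provides {expected} (obs_dim={obs_dim}, n_agents={n_agents})"
        )


if __name__ == "__main__":
    obs = [0.0] * 10                       # example observation, obs_dim = 10
    trained = len(with_agent_id(obs, 0, 2))  # trained on 1v1: 10 + 2 = 12
    check_input_dim(10, 2, trained)        # same setting: passes
    try:
        check_input_dim(10, 4, trained)    # 2v2 deployment: 10 + 4 = 14
    except ValueError as err:
        print(err)
```

A check like this at checkpoint-load time turns a shape error deep inside the first linear layer into an immediate, readable message.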

FastLane Telemetry

FastLane streams render frames from the training process to the MOSAIC GUI via shared memory. Environment variables controlling behaviour:

GYM_GUI_FASTLANE_ONLY: 1 to stream, 0 to disable
GYM_GUI_FASTLANE_SLOT: which parallel env index to probe
GYM_GUI_FASTLANE_VIDEO_MODE: "single" or "grid"
GYM_GUI_FASTLANE_GRID_LIMIT: max envs to tile in grid mode
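One way to assemble these variables for a worker process is sketched below. The variable names and the allowed values come from the list above; the helper `fastlane_env`, its defaults (slot 0, grid limit 4), and the idea of merging into a copy of the parent environment are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def fastlane_env(
    stream: bool = True,
    slot: int = 0,
    video_mode: str = "single",
    grid_limit: int = 4,
    base: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with the FastLane variables set."""
    if video_mode not in {"single", "grid"}:
        raise ValueError(f"video_mode must be 'single' or 'grid', got {video_mode!r}")
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "GYM_GUI_FASTLANE_ONLY": "1" if stream else "0",
            "GYM_GUI_FASTLANE_SLOT": str(slot),
            "GYM_GUI_FASTLANE_VIDEO_MODE": video_mode,
            "GYM_GUI_FASTLANE_GRID_LIMIT": str(grid_limit),
        }
    )
    return env


if __name__ == "__main__":
    env = fastlane_env(video_mode="grid", grid_limit=9, base={})
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` when launching the worker.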

GUI Integration

The XuanCe worker provides two form widgets in gym_gui/ui/widgets/:

Form	Purpose
`xuance_train_form.py`	Primary training dialog: algorithm and environment family selection, deep learning backend toggle (PyTorch / TensorFlow / MindSpore), hyperparameter configuration, and FastLane and TensorBoard settings.
`xuance_script_form.py`	Custom shell script launcher for multi-phase curriculum runs. Reads `*.sh` scripts with inline `@description`, `@phases`, and `@total_timesteps` metadata.

Worker Discovery

The worker registers itself via the mosaic.workers entry point in pyproject.toml:

[project.entry-points."mosaic.workers"]
xuance = "xuance_worker:get_worker_metadata"

get_worker_metadata() returns a WorkerCapabilities descriptor advertising support for up to 100 agents, discrete and continuous action spaces, and the multigrid, smac, mpe, and pettingzoo environment families.