XuanCe Worker¶
The XuanCe worker is MOSAIC’s multi-agent and comprehensive RL integration. It wraps the XuanCe library, a unified deep RL library with 46+ algorithms across single-agent, multi-agent, and offline RL, behind the standard shim pattern, adding subprocess isolation, FastLane telemetry, curriculum learning, and GUI configuration.
| Paradigm | Single-agent, Multi-agent (parameter sharing, independent) |
| Algorithms | 46+ including PPO, DQN, SAC, MAPPO, QMIX, MADDPG, VDN, COMA |
| Backends | PyTorch (primary), TensorFlow, MindSpore |
| Environments | Gymnasium, Atari, MuJoCo, PettingZoo, SMAC, MPE, MultiGrid, MultiGrid Soccer |
| Execution | In-process (single OS process, vectorized environments) |
| GPU required | No (optional CUDA acceleration) |
| Source | |
Note
Known Import Issue: The mpi4py package triggers MPI_Init() at
import time, which blocks indefinitely outside an MPI launch environment.
The worker sets MPI4PY_RC_INITIALIZE=0 to suppress this. If you see
the worker hanging on startup, verify this environment variable is set.
Multi-agent runs that require MPI (e.g. SMAC on HPC clusters) must be
launched via mpirun.
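The guard described in this note can be sketched as follows. This is a minimal illustration rather than the worker's actual code: the helper name `disable_mpi_autoinit` is hypothetical, and the only assumption is that the variable has to be in place before mpi4py is imported for the first time.

```python
import os


def disable_mpi_autoinit(environ=None):
    """Set MPI4PY_RC_INITIALIZE=0 unless a value is already present.

    Hypothetical helper. It must run before the first `import mpi4py.MPI`,
    because mpi4py reads this variable at import time.
    """
    env = os.environ if environ is None else environ
    # setdefault keeps an explicit choice made by the launcher (e.g. under mpirun).
    env.setdefault("MPI4PY_RC_INITIALIZE", "0")
    return env["MPI4PY_RC_INITIALIZE"]
```

Using `setdefault` instead of a plain assignment is a design choice of this sketch: it leaves room for an MPI launch script to opt back in by exporting a different value.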
Architecture¶
graph TB
subgraph "MOSAIC GUI"
FORM["Training Form<br/>(XuanCe widgets)"]
DAEMON["Trainer Daemon"]
end
subgraph "Worker Process"
CLI["cli.py<br/>entry point"]
CFG["config.py<br/>XuanCeWorkerConfig"]
RT["runtime.py<br/>XuanCeWorkerRuntime"]
FL["fastlane.py<br/>FastLane telemetry"]
SITE["sitecustomize.py<br/>import-time patches"]
AR["algorithm_registry.py<br/>Backend / Paradigm index"]
SHIMS["xuance_shims.py<br/>path + dir redirects"]
end
subgraph "Upstream XuanCe"
RUNNER["RunnerDRL / RunnerMARL<br/>RunnerPettingzoo<br/>(unmodified)"]
end
FORM -->|"config JSON"| DAEMON
DAEMON -->|"spawn"| CLI
CLI --> CFG --> RT
RT --> AR
RT --> RUNNER
SITE -.->|"import-time patches"| RUNNER
SHIMS -.->|"redirect logs/checkpoints to var/"| RUNNER
FL -.->|"shared-memory frames"| DAEMON
style FORM fill:#4a90d9,stroke:#2e5a87,color:#fff
style DAEMON fill:#50c878,stroke:#2e8b57,color:#fff
style CLI fill:#ff7f50,stroke:#cc5500,color:#fff
style CFG fill:#ff7f50,stroke:#cc5500,color:#fff
style RT fill:#ff7f50,stroke:#cc5500,color:#fff
style FL fill:#ff7f50,stroke:#cc5500,color:#fff
style SITE fill:#ff7f50,stroke:#cc5500,color:#fff
style AR fill:#ff7f50,stroke:#cc5500,color:#fff
style SHIMS fill:#ff7f50,stroke:#cc5500,color:#fff
style RUNNER fill:#e8e8e8,stroke:#999
Lifecycle of a training run:
1. The GUI form builds a config JSON and hands it to the Trainer Daemon.
2. The daemon spawns python -m xuance_worker.cli --config <path>.
3. cli.py loads the config into XuanCeWorkerConfig and delegates to XuanCeWorkerRuntime.
4. xuance_shims.py redirects XuanCe’s hardcoded output paths (logs, checkpoints, TensorBoard) into MOSAIC’s var/ directory.
5. runtime.py calls xuance.get_runner() with the resolved algorithm, environment family, and parser args.
6. XuanCe’s runner (RunnerDRL for single-agent, RunnerMARL for multi-agent) executes the training loop.
7. FastLane telemetry streams render frames to the GUI via shared memory.
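The hand-off from the daemon to the worker CLI can be sketched as below. This is an illustration under assumptions, not MOSAIC's daemon code: the function name `build_worker_command` and the `<run_id>.json` file naming are hypothetical, and the sketch only builds the command line instead of spawning a process.

```python
import json
import sys
import tempfile
from pathlib import Path


def build_worker_command(config: dict, workdir: Path) -> list[str]:
    """Write the config JSON to disk and return the argv a daemon could spawn."""
    # File naming is an assumption made for this example.
    config_path = workdir / f"{config['run_id']}.json"
    config_path.write_text(json.dumps(config, indent=2))
    return [sys.executable, "-m", "xuance_worker.cli", "--config", str(config_path)]


with tempfile.TemporaryDirectory() as tmp:
    command = build_worker_command(
        {
            "run_id": "demo-run",
            "method": "ppo",
            "env": "classic_control",
            "env_id": "CartPole-v1",
        },
        Path(tmp),
    )
    # A real daemon would hand `command` to subprocess.Popen at this point.
```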
Supported Algorithms¶
Algorithms are indexed in algorithm_registry.py by Backend and
Paradigm. The table below shows the primary families:
| Family | Algorithms | Paradigm | Notes |
|---|---|---|---|
| Policy Gradient | PPO, A2C, A3C, PG, PDPG | Single-agent | Stable on-policy training |
| Q-Learning | DQN, DDQN, Dueling DQN, NoisyDQN, PerDQN, C51, QRDQN | Single-agent | Discrete action spaces |
| Actor-Critic (continuous) | SAC, TD3, DDPG, MASAC | Single-agent | Continuous control |
| Model-Based | DreamerV3 | Single-agent | World-model planning |
| Cooperative MARL | | Multi-agent | CTDE paradigm |
| Competitive MARL | MAPPO (self-play), MADDPG | Multi-agent | Adversarial training |
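An index keyed by backend and paradigm, as described for algorithm_registry.py, might look like the sketch below. Every name here (`Backend`, `Paradigm`, `AlgorithmInfo`, `REGISTRY`, `algorithms_for`) is hypothetical, and the backend sets attached to each entry are placeholders, since this page does not show the real registry contents.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    TORCH = "torch"
    TENSORFLOW = "tensorflow"
    MINDSPORE = "mindspore"


class Paradigm(Enum):
    SINGLE_AGENT = "single_agent"
    MULTI_AGENT = "multi_agent"


@dataclass(frozen=True)
class AlgorithmInfo:
    key: str
    family: str
    paradigm: Paradigm
    backends: frozenset


# Illustrative entries only; backend availability is a placeholder assumption.
REGISTRY = [
    AlgorithmInfo("ppo", "Policy Gradient", Paradigm.SINGLE_AGENT, frozenset(Backend)),
    AlgorithmInfo("dqn", "Q-Learning", Paradigm.SINGLE_AGENT, frozenset(Backend)),
    AlgorithmInfo("mappo", "Competitive MARL", Paradigm.MULTI_AGENT,
                  frozenset({Backend.TORCH})),
]


def algorithms_for(backend: Backend, paradigm: Paradigm) -> list:
    """Return the sorted algorithm keys available for a backend/paradigm pair."""
    return sorted(
        info.key
        for info in REGISTRY
        if backend in info.backends and info.paradigm is paradigm
    )
```

A GUI form could call a lookup like `algorithms_for(Backend.TORCH, Paradigm.SINGLE_AGENT)` to populate its algorithm drop-down after the user picks a backend.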
Runners¶
XuanCe selects the training runner based on the environment family:
| Runner | Environment Family | Use Case |
|---|---|---|
| | | Standard single-agent Gymnasium environments |
| | | Cooperative multi-agent (CTDE algorithms) |
| | | PettingZoo AEC and parallel API environments |
| | | StarCraft Multi-Agent Challenge (requires SC2 installation) |
| | | Google Research Football |
Configuration¶
The XuanCeWorkerConfig dataclass (config.py) is the single source
of truth for all run parameters:
from dataclasses import dataclass

@dataclass
class XuanCeWorkerConfig:
    run_id: str               # ULID-format unique run identifier
    method: str               # Algorithm name ("ppo", "mappo", "qmix", ...)
    env: str                  # Environment family ("classic_control", "multigrid", ...)
    env_id: str               # Specific env ID ("CartPole-v1", "soccer_1vs1", ...)
    dl_toolbox: str           # Backend: "torch" (default), "tensorflow", "mindspore"
    running_steps: int        # Total training timesteps (default: 1_000_000)
    seed: int | None          # Random seed (None = random)
    device: str               # "cpu" or "cuda:0"
    parallels: int            # Number of parallel environments (default: 8)
    test_mode: bool           # True = evaluation mode (load checkpoint, no training)
    config_path: str | None   # Custom YAML config (None = XuanCe defaults)
    extras: dict              # Algorithm-specific overrides
Key extras fields:
- training_mode: "cooperative" or "competitive" (for MARL)
- curriculum_schedule: list of {"env_id": ..., "steps": ...} dicts
- tensorboard_dir: relative path for TensorBoard logs
- checkpoint_dir: relative path for model checkpoints
- num_envs: alias for parallels (used by some MARL algorithms)
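To make the shape of a config payload concrete, the sketch below loads a JSON document into a stand-in dataclass. `ExampleWorkerConfig` is hypothetical and only mirrors the documented fields. Defaults are taken from the comments above where they are stated; the defaults for `device` and `test_mode`, and the `training_mode` check, are assumptions of this example.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExampleWorkerConfig:
    """Stand-in for illustration; not the real XuanCeWorkerConfig."""
    run_id: str
    method: str
    env: str
    env_id: str
    dl_toolbox: str = "torch"
    running_steps: int = 1_000_000
    seed: Optional[int] = None
    device: str = "cpu"        # assumed default
    parallels: int = 8
    test_mode: bool = False    # assumed default
    config_path: Optional[str] = None
    extras: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject training modes other than the two documented values.
        mode = self.extras.get("training_mode")
        if mode is not None and mode not in ("cooperative", "competitive"):
            raise ValueError(f"unknown training_mode: {mode!r}")


raw = json.loads("""
{
  "run_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "method": "mappo",
  "env": "multigrid",
  "env_id": "soccer_1vs1",
  "extras": {"training_mode": "competitive", "checkpoint_dir": "checkpoints"}
}
""")
config = ExampleWorkerConfig(**raw)
```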
Curriculum Training¶
The XuanCe worker supports single-process curriculum training via
multi_agent_curriculum_training.py. Unlike the two-process approach,
it hot-swaps environments in memory, preserving the Adam optimizer
momentum and learning-rate schedule across phases.
{
  "curriculum_schedule": [
    {"env_id": "collect_1vs1", "steps": 1000000},
    {"env_id": "soccer_1vs1", "steps": 4000000}
  ]
}
Both environments must share the same observation and action spaces so the network architecture requires no modification between phases.
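As a rough illustration of how such a schedule maps onto a single training run, the sketch below converts it into step ranges and looks up the active environment for a given global step. The helper names are hypothetical, and the sketch assumes that each `steps` value is the duration of its phase rather than a cumulative total.

```python
def phase_boundaries(schedule):
    """Return (env_id, start_step, end_step) for each curriculum phase."""
    boundaries = []
    start = 0
    for phase in schedule:
        steps = int(phase["steps"])
        if steps <= 0:
            raise ValueError(f"phase {phase['env_id']!r} must have positive steps")
        boundaries.append((phase["env_id"], start, start + steps))
        start += steps
    return boundaries


def env_for_step(schedule, global_step):
    """Return the env_id active at global_step, or None once the schedule is done."""
    for env_id, start, end in phase_boundaries(schedule):
        if start <= global_step < end:
            return env_id
    return None


schedule = [
    {"env_id": "collect_1vs1", "steps": 1_000_000},
    {"env_id": "soccer_1vs1", "steps": 4_000_000},
]
```

Under this reading, the environment swap for the example schedule happens at global step 1,000,000 and training ends at step 5,000,000.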
Multi-Agent Configuration¶
For MARL algorithms (MAPPO, QMIX, etc.), two key choices affect the checkpoint format and deployment:
Parameter Sharing (use_parameter_sharing=True) — see MARL Book ch. 5:
All agents share one policy network. The network input is
obs_dim + n_agents (a one-hot agent identity is appended).
This is more sample-efficient for symmetric games but creates a
dimension dependency on n_agents at inference time.
Independent Networks (use_parameter_sharing=False):
Each agent has its own separate policy. Network input is obs_dim
only. Checkpoints are fully self-contained and can be loaded for
any agent slot without configuration.
Warning
If you train with parameter sharing on a 1v1 environment
(n_agents=2) and then deploy in a 2v2 environment
(n_agents=4), the actor’s first linear layer will have an
input dimension mismatch (obs+2 vs obs+4). Either train
with use_parameter_sharing=False, or bypass
agent.action() and construct the one-hot manually at inference.
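The dimension dependency is easy to see in a small sketch. The helper `with_agent_id` is hypothetical and works on plain Python lists; it only illustrates the obs_dim + n_agents input layout described above, not XuanCe's internal implementation.

```python
def with_agent_id(obs, agent_index, n_agents):
    """Append a one-hot agent identity to a flat observation."""
    if not 0 <= agent_index < n_agents:
        raise IndexError("agent_index out of range")
    one_hot = [0.0] * n_agents
    one_hot[agent_index] = 1.0
    return list(obs) + one_hot


obs = [0.1, 0.2, 0.3]                                # obs_dim = 3
trained_input = with_agent_id(obs, 0, n_agents=2)    # 1v1: length 3 + 2
deployed_input = with_agent_id(obs, 0, n_agents=4)   # 2v2: length 3 + 4

# One possible manual workaround (an assumption, not a documented recipe):
# keep the 2-slot one-hot the policy was trained with and map each
# deployed agent slot onto one of the two trained identities.
reused_input = with_agent_id(obs, 3 % 2, n_agents=2)
```

A first linear layer sized for `len(trained_input)` cannot consume `deployed_input`, which is the mismatch the warning describes.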
FastLane Telemetry¶
FastLane streams render frames from the training process to the MOSAIC GUI via shared memory. Environment variables controlling behaviour:
- GYM_GUI_FASTLANE_ONLY: 1 to stream, 0 to disable
- GYM_GUI_FASTLANE_SLOT: which parallel env index to probe
- GYM_GUI_FASTLANE_VIDEO_MODE: "single" or "grid"
- GYM_GUI_FASTLANE_GRID_LIMIT: max envs to tile in grid mode
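A reader for these variables could look like the sketch below. `FastLaneSettings` and `read_fastlane_settings` are hypothetical names, and the fallback values used when a variable is unset are assumptions of this example, not documented defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FastLaneSettings:
    enabled: bool
    slot: int
    video_mode: str
    grid_limit: int


def read_fastlane_settings(environ=None):
    """Parse the FastLane environment variables into a typed settings object."""
    env = os.environ if environ is None else environ
    video_mode = env.get("GYM_GUI_FASTLANE_VIDEO_MODE", "single")  # assumed fallback
    if video_mode not in ("single", "grid"):
        raise ValueError(f"unsupported video mode: {video_mode!r}")
    return FastLaneSettings(
        enabled=env.get("GYM_GUI_FASTLANE_ONLY", "0") == "1",
        slot=int(env.get("GYM_GUI_FASTLANE_SLOT", "0")),              # assumed fallback
        video_mode=video_mode,
        grid_limit=int(env.get("GYM_GUI_FASTLANE_GRID_LIMIT", "4")),  # assumed fallback
    )
```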
GUI Integration¶
The XuanCe worker provides two form widgets in gym_gui/ui/widgets/:
| Form | Purpose |
|---|---|
| | Primary training dialog. Algorithm and environment family selection, deep learning backend toggle (PyTorch / TensorFlow / MindSpore), hyperparameter configuration, FastLane and TensorBoard settings. |
| | Custom shell script launcher for multi-phase curriculum runs. Reads |
Worker Discovery¶
The worker registers itself via the mosaic.workers entry point in
pyproject.toml:
[project.entry-points."mosaic.workers"]
xuance = "xuance_worker:get_worker_metadata"
get_worker_metadata() returns a WorkerCapabilities descriptor
advertising support for up to 100 agents, discrete and continuous action
spaces, and the multigrid, smac, mpe, and pettingzoo
environment families.