Observability¶
MOSAIC supports two external experiment-tracking backends for logging metrics, hyperparameters, and artefacts beyond the built-in structured logging pipeline: TensorBoard and Weights & Biases (W&B).
Both integrations are optional. Training runs work without them; enabling either backend adds richer metric visualisation and, in the case of W&B, cloud-hosted experiment comparison.
| Backend | Mode | Best for |
|---|---|---|
| TensorBoard | Local | Quick scalar plots, lightweight, no account required |
| Weights & Biases | Cloud / local | Experiment comparison, artefact versioning, team sharing |
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
subgraph Workers["MOSAIC Workers"]
CRL["CleanRL Worker"]
XU["XuanCe Worker"]
RLL["RLlib Worker"]
end
subgraph Metrics["Logged Metrics"]
M1["episodic_return"]
M2["policy_loss / value_loss"]
M3["entropy / learning_rate"]
M4["SPS (steps/sec)"]
end
subgraph TB["TensorBoard (Local)"]
TBD["Dashboard :6006<br/>var/trainer/runs/<run_id>/tensorboard/"]
end
subgraph WB["Weights & Biases (Cloud)"]
WBD["Project Dashboard<br/>run_name = MOSAIC ULID<br/>artefacts + system stats"]
end
CRL & XU & RLL --> M1 & M2 & M3 & M4
M1 & M2 & M3 & M4 --> TBD
M1 & M2 & M3 & M4 --> WBD
style Workers fill:#e8f5e9,stroke:#2e7d32,color:#333
style Metrics fill:#fff3e0,stroke:#e65100,color:#333
style TB fill:#e3f2fd,stroke:#1565c0,color:#333
style WB fill:#fce4ec,stroke:#c62828,color:#333
Both backends receive the same metrics from worker training loops. TensorBoard writes summaries to local disk; W&B streams them to a cloud dashboard (or logs offline for later sync).
TensorBoard¶
TensorBoard is a browser-based
dashboard for visualising training scalars (reward, loss, entropy), histograms,
and media. MOSAIC workers write TensorBoard summaries via
torch.utils.tensorboard.SummaryWriter.
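As a minimal sketch of the logging call pattern, the snippet below forwards a dictionary of scalars to any writer exposing `add_scalar(tag, value, global_step)`, which is the method `SummaryWriter` provides. The `log_scalars` helper, the `RecordingWriter` stand-in, and the tag names are illustrative assumptions, not MOSAIC's actual worker code.

```python
from typing import Dict, List, Protocol, Tuple


class ScalarWriter(Protocol):
    """Anything with SummaryWriter's add_scalar(tag, value, global_step)."""

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...


class RecordingWriter:
    """In-memory stand-in so the sketch runs without PyTorch installed."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        self.records.append((tag, float(scalar_value), global_step))


def log_scalars(writer: ScalarWriter, scalars: Dict[str, float], step: int) -> None:
    """Write each scalar under its tag at the given global step."""
    for tag, value in sorted(scalars.items()):
        writer.add_scalar(tag, value, step)


# Hypothetical tags; the real tag names depend on the worker.
writer = RecordingWriter()
log_scalars(writer, {"charts/episodic_return": 21.5, "losses/value_loss": 0.3}, step=1000)
```

With PyTorch installed, a `SummaryWriter(log_dir=...)` instance from `torch.utils.tensorboard` can be passed in place of `RecordingWriter`, since it accepts the same positional arguments.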
Installation¶
TensorBoard is included in MOSAIC’s base dependencies. Verify it is available:
source .venv/bin/activate
python -c "import tensorboard; print(tensorboard.__version__)"
If missing:
pip install tensorboard
Enabling per Worker¶
Each worker writes summaries to a run-specific subdirectory under
var/trainer/runs/<run_id>/tensorboard/. The path is controlled by the
worker config:
CleanRL:
--track --tensorboard-log var/trainer/runs
XuanCe (via YAML or GUI extras):
logger: tensorboard
log_dir: var/trainer/runs
Ray RLlib (via extras in the run config):
{
"tensorboard_dir": "var/trainer/runs"
}
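The run-specific layout described above can be sketched with `pathlib`. The `tensorboard_dir` helper and the example run ID are hypothetical and only illustrate how `var/trainer/runs/<run_id>/tensorboard/` is composed from a root directory and a run ID; the actual path handling lives in each worker.

```python
from pathlib import Path


def tensorboard_dir(run_id: str, root: str = "var/trainer/runs") -> Path:
    """Return <root>/<run_id>/tensorboard for a given run ID."""
    # Reject IDs that would escape the root or name no directory at all.
    if not run_id or "/" in run_id or "\\" in run_id or run_id in {".", ".."}:
        raise ValueError(f"invalid run_id: {run_id!r}")
    return Path(root) / run_id / "tensorboard"


# Hypothetical ULID-style run ID, used only for illustration.
example = tensorboard_dir("01HZX3K5Q8R9T2V4W6Y8A0B1C2")
```

Pointing all three workers at the same root is what lets a single `tensorboard --logdir var/trainer/runs` invocation discover every run.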
Launching the Dashboard¶
source .venv/bin/activate
tensorboard --logdir var/trainer/runs --port 6006
Then open http://localhost:6006 in a browser. All runs under
var/trainer/runs/ appear as separate experiment entries.
For WSL users, bind TensorBoard to all interfaces so that the Windows host can reach the port:
tensorboard --logdir var/trainer/runs --host 0.0.0.0 --port 6006
Then open http://localhost:6006 from the Windows browser.
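If the dashboard is started from a script rather than a shell, the two commands above can be assembled as an argument list. This is a sketch: `tensorboard_argv` is a hypothetical helper, and it only uses the `--logdir`, `--host`, and `--port` flags already shown.

```python
from typing import List


def tensorboard_argv(logdir: str = "var/trainer/runs", port: int = 6006,
                     all_interfaces: bool = False) -> List[str]:
    """Build the tensorboard command line as an argument list."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    argv = ["tensorboard", "--logdir", logdir]
    if all_interfaces:
        # Same as the WSL command: listen on every interface.
        argv += ["--host", "0.0.0.0"]
    argv += ["--port", str(port)]
    return argv


local_cmd = tensorboard_argv()
wsl_cmd = tensorboard_argv(all_interfaces=True)
```

Either list can be handed to `subprocess.Popen` from inside the activated virtual environment.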
Key Metrics Logged¶
| Tag | Description |
|---|---|
| | Mean episode return per rollout |
| | Mean episode length |
| | Policy gradient loss (PPO clip loss) |
| | Value function MSE loss |
| | Policy entropy (exploration signal) |
| | Current learning rate (with schedule) |
| | Environment steps per second |
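Two of these quantities are simple aggregates, and a rough sketch of how they might be computed is shown below. The function names and the choice to average over the episodes that finished during a rollout are assumptions; each worker may aggregate differently.

```python
import statistics
from typing import Dict, Sequence


def steps_per_second(global_step: int, elapsed_seconds: float) -> float:
    """Environment steps per second since training started."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return global_step / elapsed_seconds


def rollout_summary(returns: Sequence[float], lengths: Sequence[int]) -> Dict[str, float]:
    """Mean episode return and length over the episodes finished in a rollout."""
    if not returns or len(returns) != len(lengths):
        raise ValueError("need one return and one length per finished episode")
    return {
        "mean_return": statistics.fmean(returns),
        "mean_length": statistics.fmean(lengths),
    }


summary = rollout_summary(returns=[10.0, 20.0, 30.0], lengths=[100, 200, 300])
sps = steps_per_second(global_step=8192, elapsed_seconds=4.0)
```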
Weights and Biases¶
Weights and Biases (W&B) is a cloud experiment-tracking platform. It stores runs, metrics, system stats, and model artefacts, and supports side-by-side comparison of experiments across machines and team members.
Installation¶
source .venv/bin/activate
pip install wandb
Authenticate once per machine:
wandb login
This stores your API key in ~/.netrc. For offline or air-gapped
environments, see the Offline Mode section below.
Enabling per Worker¶
CleanRL:
--track --wandb-project mosaic --wandb-entity <your-team>
Or set environment variables before launching:
export WANDB_PROJECT=mosaic
export WANDB_ENTITY=<your-team>
XuanCe (via YAML or GUI extras):
logger: wandb
wandb_project: mosaic
wandb_entity: <your-team>
Ray RLlib (via extras):
{
"wandb_project": "mosaic",
"wandb_entity": "<your-team>"
}
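The interaction between explicit settings and the `WANDB_PROJECT` / `WANDB_ENTITY` environment variables can be sketched as follows. The precedence shown here (explicit value first, then environment) is an assumption for illustration, and `resolve_wandb_target` is a hypothetical helper rather than a MOSAIC function.

```python
from typing import Mapping, Optional, Tuple


def resolve_wandb_target(
    environ: Mapping[str, str],
    cli_project: Optional[str] = None,
    cli_entity: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Pick project and entity; explicit arguments win over the environment."""
    # Empty strings are treated the same as unset values.
    project = cli_project or environ.get("WANDB_PROJECT") or None
    entity = cli_entity or environ.get("WANDB_ENTITY") or None
    return project, entity


# "my-team" is a placeholder entity name.
env = {"WANDB_PROJECT": "mosaic", "WANDB_ENTITY": "my-team"}
from_env = resolve_wandb_target(env)
overridden = resolve_wandb_target(env, cli_project="ablation")
```

In a real launcher, `os.environ` would be passed as `environ`.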
Run Naming¶
MOSAIC passes the run ULID as the W&B run name (the name argument to
wandb.init) so that W&B runs map one-to-one to MOSAIC run IDs:
import wandb

wandb.init(
project=config.wandb_project,
name=config.run_id, # MOSAIC ULID, shown as the run name in W&B
config=config.to_dict(), # hyperparameters recorded with the run
)
This makes it straightforward to cross-reference a W&B dashboard entry with
the corresponding checkpoint in var/trainer/runs/<run_id>/.
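Because the one-to-one mapping depends on the run name being a well-formed ULID, a launcher could check the ID before building the `wandb.init` arguments. The sketch below is an assumption about how such a check might look; `is_ulid` and `wandb_init_kwargs` are hypothetical helpers, and only the `project`, `entity`, `name`, and `config` keyword arguments of `wandb.init` are used.

```python
import re
from typing import Any, Dict, Optional

# A ULID is 26 Crockford base32 characters (no I, L, O, U); the first is 0-7.
_ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def is_ulid(value: str) -> bool:
    """True if value has the canonical ULID text form (case-insensitive)."""
    return _ULID_RE.fullmatch(value.upper()) is not None


def wandb_init_kwargs(run_id: str, project: str, hyperparams: Dict[str, Any],
                      entity: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for wandb.init with the run ID as the run name."""
    if not is_ulid(run_id):
        raise ValueError(f"run_id is not a ULID: {run_id!r}")
    kwargs: Dict[str, Any] = {"project": project, "name": run_id, "config": dict(hyperparams)}
    if entity is not None:
        kwargs["entity"] = entity
    return kwargs


# Hypothetical run ID and hyperparameters, for illustration only.
kwargs = wandb_init_kwargs("01ARZ3NDEKTSV4RRFFQ69G5FAV", "mosaic", {"lr": 3e-4})
```

The resulting dictionary would be passed as `wandb.init(**kwargs)`.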
Offline Mode¶
For machines without internet access, W&B can log locally and sync later:
export WANDB_MODE=offline
# After the run completes, sync to the cloud:
wandb sync var/wandb/offline-run-*
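When several offline runs accumulate, the sync step can be scripted one directory at a time. The sketch below assumes offline runs are stored as `offline-run-*` directories under a single W&B directory, as in the command above; `offline_sync_commands` is a hypothetical helper and the directory names are fabricated for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import List


def offline_sync_commands(wandb_dir: Path) -> List[List[str]]:
    """Return one `wandb sync <dir>` argument list per offline run directory."""
    run_dirs = sorted(p for p in wandb_dir.glob("offline-run-*") if p.is_dir())
    return [["wandb", "sync", str(p)] for p in run_dirs]


# Demonstrate on a throwaway directory with two fake offline runs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "offline-run-20240101_120000-abc123").mkdir()
    (root / "offline-run-20240102_090000-def456").mkdir()
    (root / "latest-run").mkdir()  # not an offline run, so it is ignored
    commands = offline_sync_commands(root)
    synced_names = [Path(cmd[2]).name for cmd in commands]
```

Each argument list can then be run with `subprocess.run(cmd, check=True)` once the machine has network access.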
Disabling W&B¶
To suppress all W&B output without removing the flag from the config:
export WANDB_DISABLED=true
Alternatively, pass --no-track on the command line or set logger: tensorboard in the worker YAML.
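How `WANDB_DISABLED` and `WANDB_MODE` combine can be summarised in a small sketch. The precedence and the set of accepted truthy strings below are assumptions made for illustration; W&B's own parsing may differ in detail, and `effective_wandb_mode` is a hypothetical helper.

```python
from typing import Mapping

# Strings treated as "true" in this sketch (an assumption).
_TRUTHY = {"1", "true", "yes", "on"}


def effective_wandb_mode(environ: Mapping[str, str]) -> str:
    """Return 'disabled', 'offline', or 'online' from the two variables."""
    # WANDB_DISABLED takes priority over WANDB_MODE in this sketch.
    if environ.get("WANDB_DISABLED", "").strip().lower() in _TRUTHY:
        return "disabled"
    mode = environ.get("WANDB_MODE", "online").strip().lower()
    # Unknown values fall back to the default online mode.
    return mode if mode in {"online", "offline", "disabled"} else "online"


mode_disabled = effective_wandb_mode({"WANDB_DISABLED": "true", "WANDB_MODE": "offline"})
mode_offline = effective_wandb_mode({"WANDB_MODE": "offline"})
mode_default = effective_wandb_mode({})
```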
Proxy Configuration¶
If your network requires a proxy, set the W&B proxy environment variables before launching a training run:
export WANDB_VPN_HTTPS_PROXY=https://<proxy-host>:<port>
export WANDB_VPN_HTTP_PROXY=http://<proxy-host>:<port>
See Also¶
Structured Logging: MOSAIC’s internal structured log pipeline (LogConstant, filters, rotating file handlers).
Fast Lane: real-time frame streaming from worker processes to the GUI, separate from metric logging.
Application Constants: numeric defaults that govern queue sizes and backpressure thresholds in the rendering subsystem.