Observability¶

In addition to its built-in structured logging pipeline, MOSAIC supports two external experiment-tracking backends for logging metrics, hyperparameters, and artefacts: TensorBoard and Weights & Biases (W&B).

Both integrations are optional. Training runs work without them; enabling either backend adds richer metric visualisation and, in the case of W&B, cloud-hosted experiment comparison.

Backend	Mode	Best for
TensorBoard	Local	Quick scalar plots, lightweight, no account required
Weights & Biases	Cloud / local	Experiment comparison, artefact versioning, team sharing

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    subgraph Workers["MOSAIC Workers"]
        CRL["CleanRL Worker"]
        XU["XuanCe Worker"]
        RLL["RLlib Worker"]
    end

    subgraph Metrics["Logged Metrics"]
        M1["episodic_return"]
        M2["policy_loss / value_loss"]
        M3["entropy / learning_rate"]
        M4["SPS (steps/sec)"]
    end

    subgraph TB["TensorBoard (Local)"]
        TBD["Dashboard :6006<br/>var/trainer/runs/&lt;run_id&gt;/tensorboard/"]
    end

    subgraph WB["Weights & Biases (Cloud)"]
        WBD["Project Dashboard<br/>run_name = MOSAIC ULID<br/>artefacts + system stats"]
    end

    CRL & XU & RLL --> M1 & M2 & M3 & M4
    M1 & M2 & M3 & M4 --> TBD
    M1 & M2 & M3 & M4 --> WBD

    style Workers fill:#e8f5e9,stroke:#2e7d32,color:#333
    style Metrics fill:#fff3e0,stroke:#e65100,color:#333
    style TB fill:#e3f2fd,stroke:#1565c0,color:#333
    style WB fill:#fce4ec,stroke:#c62828,color:#333

Both backends receive the same metrics from worker training loops. TensorBoard writes summaries to local disk; W&B streams them to a cloud dashboard (or logs offline for later sync).

TensorBoard¶

TensorBoard is a browser-based dashboard for visualising training scalars (reward, loss, entropy), histograms, and media. MOSAIC workers write TensorBoard summaries via torch.utils.tensorboard.SummaryWriter.

Installation¶

TensorBoard is included in MOSAIC’s base dependencies. Verify it is available:

source .venv/bin/activate
python -c "import tensorboard; print(tensorboard.__version__)"

If missing:

pip install tensorboard
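
The same availability check can be scripted, for example before queueing a long run. The following is a minimal sketch using only the standard library; the helper name `backend_available` is illustrative and not part of MOSAIC.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report on both optional tracking backends without importing them.
    for name in ("tensorboard", "wandb"):
        status = "available" if backend_available(name) else "missing"
        print(f"{name}: {status}")
```

Because `find_spec` only locates the module, the check stays fast and has no side effects even when the packages are installed.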

Enabling per Worker¶

Each worker writes summaries to a run-specific directory, var/trainer/runs/<run_id>/tensorboard/. The base path is set in the worker config:

CleanRL:

--track --tensorboard-log var/trainer/runs

XuanCe (via YAML or GUI extras):

logger: tensorboard
log_dir: var/trainer/runs

Ray RLlib (via extras in the run config):

{
  "tensorboard_dir": "var/trainer/runs"
}
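
To check which runs have actually produced summaries, the directory layout described above can be walked from Python. This is a sketch under the assumption that every worker follows the var/trainer/runs/<run_id>/tensorboard/ layout; the function names are hypothetical and not part of MOSAIC.

```python
from pathlib import Path

RUNS_ROOT = Path("var/trainer/runs")  # base path used in the worker configs above


def tensorboard_dir(run_id: str, root: Path = RUNS_ROOT) -> Path:
    """Directory where a run's TensorBoard summaries are expected."""
    return root / run_id / "tensorboard"


def runs_with_summaries(root: Path = RUNS_ROOT) -> list[str]:
    """Run IDs under `root` that contain at least one TensorBoard event file."""
    if not root.is_dir():
        return []
    found = []
    for run in sorted(p for p in root.iterdir() if p.is_dir()):
        tb_dir = tensorboard_dir(run.name, root)
        # TensorBoard event files are named events.out.tfevents.*
        if tb_dir.is_dir() and any(tb_dir.glob("events.out.tfevents.*")):
            found.append(run.name)
    return found


if __name__ == "__main__":
    for run_id in runs_with_summaries():
        print(run_id)
```

A run that is missing from this listing has not written any event files yet, so it will not show up in the dashboard either.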

Launching the Dashboard¶

source .venv/bin/activate
tensorboard --logdir var/trainer/runs --port 6006

Then open http://localhost:6006 in a browser. All runs under var/trainer/runs/ appear as separate experiment entries.

For WSL users, bind TensorBoard to all network interfaces so that the dashboard is reachable from Windows:

tensorboard --logdir var/trainer/runs --host 0.0.0.0 --port 6006

Then open http://localhost:6006 from the Windows browser.

Key Metrics Logged¶

Tag	Description
`charts/episodic_return`	Mean episode return per rollout
`charts/episodic_length`	Mean episode length
`losses/policy_loss`	Policy gradient loss (PPO clip loss)
`losses/value_loss`	Value function MSE loss
`losses/entropy`	Policy entropy (exploration signal)
`charts/learning_rate`	Current learning rate (with schedule)
`charts/SPS`	Environment steps per second
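
As an illustration of how these tags might be emitted from a training loop, the sketch below writes one set of scalars through any object that exposes `add_scalar(tag, value, step)`, which is the calling convention of `torch.utils.tensorboard.SummaryWriter`. The function name, its parameters, and the `RecordingWriter` stand-in are assumptions made for this example; the SPS formula (steps divided by elapsed wall-clock time) is likewise an assumption, and the actual worker code may differ.

```python
import time


def log_training_scalars(writer, global_step, *, episodic_return,
                         episodic_length, policy_loss, value_loss,
                         entropy, learning_rate, start_time):
    """Write one set of scalars using the tags from the table above.

    `writer` is any object with add_scalar(tag, value, step), for example
    torch.utils.tensorboard.SummaryWriter.
    """
    elapsed = max(time.time() - start_time, 1e-9)  # avoid division by zero
    scalars = {
        "charts/episodic_return": episodic_return,
        "charts/episodic_length": episodic_length,
        "losses/policy_loss": policy_loss,
        "losses/value_loss": value_loss,
        "losses/entropy": entropy,
        "charts/learning_rate": learning_rate,
        "charts/SPS": global_step / elapsed,  # assumed definition of SPS
    }
    for tag, value in scalars.items():
        writer.add_scalar(tag, value, global_step)
    return scalars


class RecordingWriter:
    """In-memory stand-in so the sketch runs without PyTorch installed."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, value, step):
        self.records.append((tag, value, step))


if __name__ == "__main__":
    writer = RecordingWriter()
    log_training_scalars(
        writer, 2048,
        episodic_return=21.5, episodic_length=200,
        policy_loss=0.01, value_loss=0.5, entropy=0.69,
        learning_rate=2.5e-4, start_time=time.time() - 4.0,
    )
    for tag, value, step in writer.records:
        print(step, tag, value)
```

Swapping `RecordingWriter()` for a `SummaryWriter` pointed at the run's tensorboard/ directory would send the same tags to the dashboard.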

Weights and Biases¶

Weights & Biases (W&B) is a cloud experiment-tracking platform. It stores runs, metrics, system stats, and model artefacts, and it allows side-by-side comparison of experiments across machines and team members.

Installation¶

source .venv/bin/activate
pip install wandb

Authenticate once per machine:

wandb login

This stores an API key in ~/.netrc. For offline or air-gapped environments, see the Offline Mode section below.

Enabling per Worker¶

CleanRL:

--track --wandb-project mosaic --wandb-entity <your-team>

Or set environment variables before launching:

export WANDB_PROJECT=mosaic
export WANDB_ENTITY=<your-team>
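
When a worker is launched from a Python script rather than a shell, the same two variables can be placed in the child process environment. This is a minimal sketch; `wandb_env` is a hypothetical helper, and how MOSAIC itself spawns workers is not assumed here.

```python
import os


def wandb_env(project, entity, base_env=None):
    """Return a copy of the environment with the W&B project and entity set."""
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_PROJECT"] = project
    env["WANDB_ENTITY"] = entity
    return env


if __name__ == "__main__":
    env = wandb_env("mosaic", "<your-team>")
    print(env["WANDB_PROJECT"], env["WANDB_ENTITY"])
```

The returned dictionary can be passed as `env=` to `subprocess.run`, which leaves the parent process environment untouched.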

XuanCe (via YAML or GUI extras):

logger: wandb
wandb_project: mosaic
wandb_entity: <your-team>

Ray RLlib (via extras):

{
  "wandb_project": "mosaic",
  "wandb_entity": "<your-team>"
}

Run Naming¶

MOSAIC passes the run ULID as the W&B run name (the name argument of wandb.init) so that W&B runs map one-to-one to MOSAIC run IDs:

import wandb

# `config` is the worker's run configuration object.
wandb.init(
    project=config.wandb_project,
    name=config.run_id,          # MOSAIC ULID
    config=config.to_dict(),
)

This makes it straightforward to cross-reference a W&B dashboard entry with the corresponding checkpoint in var/trainer/runs/<run_id>/.
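
The reverse lookup, from a W&B run name back to the local run directory, can be sketched as follows. The validation relies on the ULID format (26 characters of Crockford Base32, which excludes I, L, O, and U); the helper names are hypothetical, and the sketch assumes the run directory is named exactly after the run ID.

```python
import re
from pathlib import Path

# 26 Crockford Base32 characters; the first is at most 7 so the value fits
# in 128 bits. Matching is case-insensitive, as the ULID format allows.
_ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}", re.IGNORECASE)


def looks_like_ulid(name: str) -> bool:
    """Return True if `name` has the shape of a ULID."""
    return _ULID_RE.fullmatch(name) is not None


def local_run_dir(wandb_run_name: str,
                  root: Path = Path("var/trainer/runs")) -> Path:
    """Map a W&B run name to the run directory it should correspond to."""
    if not looks_like_ulid(wandb_run_name):
        raise ValueError(f"not a ULID: {wandb_run_name!r}")
    return root / wandb_run_name


if __name__ == "__main__":
    print(local_run_dir("01ARZ3NDEKTSV4RRFFQ69G5FAV"))
```

Rejecting names that are not ULIDs guards against looking up runs that were renamed by hand in the W&B UI.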

Offline Mode¶

For machines without internet access, W&B can log locally and sync later:

export WANDB_MODE=offline

# After the run completes, sync to the cloud:
wandb sync var/wandb/offline-run-*
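
When several offline runs have accumulated, it can help to preview what will be synced before uploading anything. The sketch below only builds the command lists and does not call the W&B CLI; the function name is hypothetical, and the default pattern is the one used in the command above.

```python
import glob


def offline_sync_commands(pattern="var/wandb/offline-run-*"):
    """Build one `wandb sync` command per offline run directory."""
    return [["wandb", "sync", path] for path in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    for cmd in offline_sync_commands():
        # To execute instead of preview: subprocess.run(cmd, check=True)
        print(" ".join(cmd))
```

Syncing one directory per command also means that a failed upload affects only that run.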

Disabling W&B¶

To suppress all W&B output without removing the flag from the config:

export WANDB_DISABLED=true

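Setting WANDB_MODE=disabled has the same effect in recent W&B releases. Alternatively, pass --no-track to CleanRL, or set logger: tensorboard in the XuanCe worker YAML.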

Proxy Configuration¶

If your network requires a proxy, set the W&B proxy environment variables before launching a training run:

export WANDB_VPN_HTTPS_PROXY=https://<proxy-host>:<port>
export WANDB_VPN_HTTP_PROXY=http://<proxy-host>:<port>
