Observability

In addition to its built-in structured logging pipeline, MOSAIC supports two external experiment-tracking backends for logging metrics, hyperparameters, and artefacts: TensorBoard and Weights & Biases (W&B).

Both integrations are optional. Training runs work without them; enabling either backend adds richer metric visualisation and, in the case of W&B, cloud-hosted experiment comparison.

Backend            Mode            Best for
-----------------  --------------  --------------------------------------------------------
TensorBoard        Local           Quick scalar plots, lightweight, no account required
Weights & Biases   Cloud / local   Experiment comparison, artefact versioning, team sharing

%%{init: {"flowchart": {"curve": "linear"}} }%%
graph LR
    subgraph Workers["MOSAIC Workers"]
        CRL["CleanRL Worker"]
        XU["XuanCe Worker"]
        RLL["RLlib Worker"]
    end

    subgraph Metrics["Logged Metrics"]
        M1["episodic_return"]
        M2["policy_loss / value_loss"]
        M3["entropy / learning_rate"]
        M4["SPS (steps/sec)"]
    end

    subgraph TB["TensorBoard (Local)"]
        TBD["Dashboard :6006<br/>var/trainer/runs/&lt;run_id&gt;/tensorboard/"]
    end

    subgraph WB["Weights & Biases (Cloud)"]
        WBD["Project Dashboard<br/>run_name = MOSAIC ULID<br/>artefacts + system stats"]
    end

    CRL & XU & RLL --> M1 & M2 & M3 & M4
    M1 & M2 & M3 & M4 --> TBD
    M1 & M2 & M3 & M4 --> WBD

    style Workers fill:#e8f5e9,stroke:#2e7d32,color:#333
    style Metrics fill:#fff3e0,stroke:#e65100,color:#333
    style TB fill:#e3f2fd,stroke:#1565c0,color:#333
    style WB fill:#fce4ec,stroke:#c62828,color:#333
    

Both backends receive the same metrics from worker training loops. TensorBoard writes summaries to local disk; W&B streams them to a cloud dashboard (or logs offline for later sync).
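
The fan-out shown in the diagram can be pictured with a small dispatcher. This is a minimal sketch, not MOSAIC's implementation: `MetricFanout` and the `Sink` alias are hypothetical names, and the two sinks below only record calls rather than writing to TensorBoard or W&B.

```python
from typing import Callable, Iterable, List, Tuple

# A sink is any callable that accepts (tag, value, step).
Sink = Callable[[str, float, int], None]


class MetricFanout:
    """Forward each scalar to every registered backend sink (hypothetical helper)."""

    def __init__(self, sinks: Iterable[Sink]) -> None:
        self._sinks = list(sinks)

    def log(self, tag: str, value: float, step: int) -> None:
        # Every backend receives exactly the same (tag, value, step) triple.
        for sink in self._sinks:
            sink(tag, value, step)


# Stand-in sinks that record calls instead of touching disk or the network.
tensorboard_calls: List[Tuple[str, float, int]] = []
wandb_calls: List[Tuple[str, float, int]] = []

fanout = MetricFanout([
    lambda tag, value, step: tensorboard_calls.append((tag, value, step)),
    lambda tag, value, step: wandb_calls.append((tag, value, step)),
])
fanout.log("charts/episodic_return", 21.5, step=1000)
```

In a real worker the sinks could wrap `SummaryWriter.add_scalar(tag, value, step)` and `wandb.log({tag: value}, step=step)` respectively.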


TensorBoard

TensorBoard is a browser-based dashboard for visualising training scalars (reward, loss, entropy), histograms, and media. MOSAIC workers write TensorBoard summaries via torch.utils.tensorboard.SummaryWriter.

Installation

TensorBoard is included in MOSAIC’s base dependencies. Verify it is available:

source .venv/bin/activate
python -c "import tensorboard; print(tensorboard.__version__)"

If missing:

pip install tensorboard

Enabling per Worker

Each worker writes summaries to a run-specific subdirectory under var/trainer/runs/<run_id>/tensorboard/. The path is controlled by the worker config:

CleanRL:

--track --tensorboard-log var/trainer/runs

XuanCe (via YAML or GUI extras):

logger: tensorboard
log_dir: var/trainer/runs

Ray RLlib (via extras in the run config):

{
  "tensorboard_dir": "var/trainer/runs"
}
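
The directory layout described above can be expressed as a small path helper. This is a sketch under the assumption that every worker appends `<run_id>/tensorboard` to the configured root; `tensorboard_dir` is a hypothetical function and the run ID is made up for illustration.

```python
from pathlib import Path


def tensorboard_dir(run_id: str, root: str = "var/trainer/runs") -> Path:
    """Return the summary directory for one run: <root>/<run_id>/tensorboard."""
    # Reject empty IDs and anything that would escape the run's own directory.
    if not run_id or "/" in run_id or "\\" in run_id:
        raise ValueError(f"run_id must be a single path component, got {run_id!r}")
    return Path(root) / run_id / "tensorboard"


# Hypothetical run ID, used only for illustration.
example = tensorboard_dir("01HZX3K9Q2V7T8W4N6M5P0R1AB")
print(example.as_posix())
```

A worker could pass the resulting path as `log_dir` when constructing `torch.utils.tensorboard.SummaryWriter`.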

Launching the Dashboard

source .venv/bin/activate
tensorboard --logdir var/trainer/runs --port 6006

Then open http://localhost:6006 in a browser. All runs under var/trainer/runs/ appear as separate experiment entries.

For WSL users, bind TensorBoard to all network interfaces so that the Windows host can reach it:

tensorboard --logdir var/trainer/runs --host 0.0.0.0 --port 6006

Then open http://localhost:6006 from the Windows browser.
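
When the dashboard is started from a script, the two command lines above can be built from one function. The sketch below is illustrative only: `is_wsl` and `tensorboard_command` are hypothetical helpers, and the WSL check is a heuristic based on the kernel release string containing "microsoft".

```python
import platform
from typing import List, Optional


def is_wsl(kernel_release: Optional[str] = None) -> bool:
    """Heuristic: WSL kernels report 'microsoft' in their release string."""
    release = kernel_release if kernel_release is not None else platform.uname().release
    return "microsoft" in release.lower()


def tensorboard_command(logdir: str = "var/trainer/runs", port: int = 6006,
                        bind_all: bool = False) -> List[str]:
    """Build the argv for the TensorBoard CLI invocations shown above."""
    cmd = ["tensorboard", "--logdir", logdir]
    if bind_all:
        # Listen on every interface so the Windows host can connect.
        cmd += ["--host", "0.0.0.0"]
    cmd += ["--port", str(port)]
    return cmd


local_cmd = tensorboard_command()
# Example release string of the kind a WSL2 kernel reports.
wsl_cmd = tensorboard_command(bind_all=is_wsl("5.15.90.1-microsoft-standard-WSL2"))
```

Either list can be passed to `subprocess.run(cmd, check=True)` from inside the activated virtual environment.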

Key Metrics Logged

Tag                      Description
-----------------------  -------------------------------------
charts/episodic_return   Mean episode return per rollout
charts/episodic_length   Mean episode length
losses/policy_loss       Policy gradient loss (PPO clip loss)
losses/value_loss        Value function MSE loss
losses/entropy           Policy entropy (exploration signal)
charts/learning_rate     Current learning rate (with schedule)
charts/SPS               Environment steps per second
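
To show how these tags might reach a writer, here is a minimal sketch. It assumes SPS is a cumulative average (total steps divided by elapsed wall-clock time), which may differ from what each worker computes. `ScalarWriter`, `write_scalars`, and `RecordingWriter` are hypothetical names; the only real API mirrored is `SummaryWriter.add_scalar(tag, scalar_value, global_step)`.

```python
from typing import Dict, List, Protocol, Tuple


class ScalarWriter(Protocol):
    """Anything exposing SummaryWriter's add_scalar(tag, scalar_value, global_step)."""

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...


def steps_per_second(global_step: int, start_time: float, now: float) -> float:
    """Assumed definition: total environment steps / elapsed seconds."""
    elapsed = now - start_time
    return global_step / elapsed if elapsed > 0 else 0.0


def write_scalars(writer: ScalarWriter, scalars: Dict[str, float], global_step: int) -> None:
    """Write every tag/value pair at the same global step."""
    for tag, value in scalars.items():
        writer.add_scalar(tag, value, global_step)


class RecordingWriter:
    """Test double that stores calls instead of writing event files."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        self.rows.append((tag, scalar_value, global_step))


writer = RecordingWriter()
sps = steps_per_second(global_step=2048, start_time=100.0, now=104.0)
write_scalars(writer, {"charts/SPS": sps, "losses/value_loss": 0.25}, global_step=2048)
```

Replacing `RecordingWriter()` with a real `SummaryWriter(log_dir=...)` would write the same tags to disk.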


Weights & Biases

Weights & Biases (W&B) is a cloud experiment-tracking platform. It stores runs, metrics, system stats, and model artefacts, and allows side-by-side comparison of experiments across machines and team members.

Installation

source .venv/bin/activate
pip install wandb

Authenticate once per machine:

wandb login

This stores your W&B API key in ~/.netrc. For offline or air-gapped environments, see the Offline Mode section below.

Enabling per Worker

CleanRL:

--track --wandb-project mosaic --wandb-entity <your-team>

Or set environment variables before launching:

export WANDB_PROJECT=mosaic
export WANDB_ENTITY=<your-team>

XuanCe (via YAML or GUI extras):

logger: wandb
wandb_project: mosaic
wandb_entity: <your-team>

Ray RLlib (via extras):

{
  "wandb_project": "mosaic",
  "wandb_entity": "<your-team>"
}

Run Naming

MOSAIC passes the run ULID as the W&B run name (the name argument of wandb.init) so that W&B runs map one-to-one to MOSAIC run IDs:

import wandb

wandb.init(
    project=config.wandb_project,
    name=config.run_id,          # MOSAIC ULID, shown as the run name in W&B
    config=config.to_dict(),     # hyperparameters recorded alongside the run
)

This makes it straightforward to cross-reference a W&B dashboard entry with the corresponding checkpoint in var/trainer/runs/<run_id>/.
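
The cross-reference can be sketched as a lookup from a W&B run name back to the local run directory. This is an illustration only: `run_dir_from_wandb_name` is a hypothetical helper, the example ID is made up, and the check assumes MOSAIC stores ULIDs in their canonical uppercase form (26 Crockford base32 characters).

```python
import re
from pathlib import Path

# Canonical ULID text form: 26 Crockford base32 characters (no I, L, O, U),
# with a first character of 0-7 so the value fits in 128 bits.
ULID_PATTERN = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def run_dir_from_wandb_name(run_name: str, root: str = "var/trainer/runs") -> Path:
    """Map a W&B run name (expected to be a MOSAIC ULID) to its run directory."""
    if ULID_PATTERN.fullmatch(run_name) is None:
        raise ValueError(f"not a canonical ULID: {run_name!r}")
    return Path(root) / run_name


# Hypothetical run name copied from a W&B dashboard entry.
run_dir = run_dir_from_wandb_name("01HZX3K9Q2V7T8W4N6M5P0R1AB")
print(run_dir.as_posix())
```

Validating the name first guards against looking up runs that were renamed by hand in the W&B UI.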

Offline Mode

For machines without internet access, W&B can log locally and sync later:

export WANDB_MODE=offline

# After the run completes, sync to the cloud:
wandb sync var/wandb/offline-run-*
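
When several offline runs accumulate, the sync step can be scripted. The sketch below only builds the `wandb sync` command lines and does not execute them. `offline_sync_commands` is a hypothetical helper, and the directory names created in the demo are invented to show the `offline-run-*` pattern.

```python
import tempfile
from pathlib import Path
from typing import List


def offline_sync_commands(wandb_dir: str = "var/wandb") -> List[List[str]]:
    """Return one `wandb sync <dir>` argv per offline run directory found."""
    root = Path(wandb_dir)
    run_dirs = sorted(p for p in root.glob("offline-run-*") if p.is_dir())
    return [["wandb", "sync", str(p)] for p in run_dirs]


# Demo against a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "offline-run-20240101_120000-abc123").mkdir()
    (Path(tmp) / "run-20240101_130000-def456").mkdir()  # not offline, so ignored
    commands = offline_sync_commands(tmp)
    names = [Path(cmd[2]).name for cmd in commands]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the machine has network access again.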

Disabling W&B

To suppress all W&B output without removing the flag from the config:

export WANDB_DISABLED=true

Alternatively, pass --no-track instead of --track, or set logger: tensorboard in the worker YAML.

Proxy Configuration

If your network requires a proxy, set the W&B proxy environment variables before launching a training run:

export WANDB_VPN_HTTPS_PROXY=https://<proxy-host>:<port>
export WANDB_VPN_HTTP_PROXY=http://<proxy-host>:<port>
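
If training runs are launched from a script rather than an interactive shell, the environment variables from the sections above can be assembled in one place. This is a sketch under assumptions: `wandb_environment` is a hypothetical helper, the variable names are copied verbatim from this page, and the team and proxy values are placeholders.

```python
import os
from typing import Dict, Mapping, Optional


def wandb_environment(project: str, entity: str,
                      offline: bool = False,
                      https_proxy: Optional[str] = None,
                      http_proxy: Optional[str] = None,
                      base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the W&B variables set."""
    env = dict(os.environ if base is None else base)
    env["WANDB_PROJECT"] = project
    env["WANDB_ENTITY"] = entity
    if offline:
        env["WANDB_MODE"] = "offline"
    # Proxy variables are only set when a value is supplied.
    if https_proxy:
        env["WANDB_VPN_HTTPS_PROXY"] = https_proxy
    if http_proxy:
        env["WANDB_VPN_HTTP_PROXY"] = http_proxy
    return env


# Placeholder values for illustration; start from an empty base for clarity.
env = wandb_environment("mosaic", "my-team", offline=True,
                        https_proxy="https://proxy.example:8443", base={})
```

The resulting dictionary can be passed as `env=` to `subprocess.Popen` when starting a worker, which leaves the parent shell's environment untouched.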

See Also

  • Structured Logging: MOSAIC’s internal structured log pipeline (LogConstant, filters, rotating file handlers).

  • Fast Lane: real-time frame streaming from worker processes to the GUI, separate from metric logging.

  • Application Constants: numeric defaults that govern queue sizes and backpressure thresholds in the rendering subsystem.