Common Errors¶
This section documents errors commonly encountered when installing and running MOSAIC. Errors shared across platforms are listed below. For platform-specific issues, see the sub-pages:
- WSL-Specific Errors
- box2d-py – “ModuleNotFoundError: No module named ‘swig’”
- Broken
.venv(I/O errors on symlinks) - Trainer daemon – “libGL.so.1: cannot open shared object file”
- QRhiGles2 – “Failed to create temporary context”
- GUI launches but text shows as squares or is invisible
- PyQt6 – “Could not load the Qt platform plugin”
- CUDA / GPU not detected on WSL
- Quick Reference: WSL System Dependencies
- Ubuntu-Specific Errors
- Worker-Specific Errors
- Operator-Specific Errors
Virtual Environment¶
python -m env – “No module named env”¶
Error:
/usr/bin/python3: No module named env
Cause: The module is called venv (with a v), not env.
Fix:
python3.11 -m venv .venv
Wrong Python version (< 3.10)¶
Error:
ERROR: Package 'mosaic' requires a different Python: 3.9.13 not in '>=3.10,<3.13'
Cause: MOSAIC requires Python 3.10, 3.11, or 3.12. Your system default
python3 may point to an older version.
Fix:
# Check what you have
python3 --version
# Install Python 3.11 if needed (Ubuntu)
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3.11-dev
# Create venv with the correct version
python3.11 -m venv .venv
System Dependencies (Build Failures)¶
Several MOSAIC optional packages compile native C/C++ code and require
system-level build tools. If you see Failed building wheel for ...,
the fix is usually installing the missing system package.
Install all build dependencies at once:
sudo apt-get update
sudo apt-get install -y \
build-essential \
cmake \
swig \
flex \
bison \
libbz2-dev \
libopenmpi-dev
box2d-py: “command ‘swig’ failed”¶
Error:
swig -python -c++ -IBox2D -small -O -includeall ...
error: command 'swig' failed: No such file or directory
ERROR: Failed building wheel for box2d-py
Cause: The box2d extra (LunarLander, BipedalWalker, CarRacing)
requires box2d-py, which wraps C++ using SWIG. SWIG is not installed
by default.
Fix:
sudo apt-get install -y swig build-essential
pip install -e ".[box2d]"
nle (NetHack): build failure¶
Error:
ERROR: Failed building wheel for nle
Cause: The NetHack Learning Environment (nle) compiles the NetHack
game engine from C source. It requires cmake, flex, bison, and
libbz2-dev.
Fix:
sudo apt-get install -y build-essential cmake flex bison libbz2-dev
pip install -e ".[nethack]"
mpi4py: “Cannot compile MPI programs”¶
Error:
error: Cannot compile MPI programs. Check your configuration!
or:
mpicc: No such file or directory
Cause: The XuanCe worker depends on mpi4py, which compiles against
MPI libraries. The MPI compiler wrapper mpicc is not installed by default.
Fix:
# Ubuntu/Debian/WSL
sudo apt-get install -y libopenmpi-dev
# macOS
brew install open-mpi
# Then retry
pip install mpi4py
Tip
MOSAIC sets MPI4PY_RC_INITIALIZE=0 in the .env file to prevent
MPI_Init() from blocking when XuanCe is imported outside of mpirun.
Make sure your .env file includes this setting.
TensorBoard: “No module named ‘pkg_resources’”¶
Error:
ModuleNotFoundError: No module named 'pkg_resources'
This error may appear when launching TensorBoard (either from the command line or via the embedded TensorBoard viewer in the Render View).
Cause: setuptools version 78 and later removed the pkg_resources
package, which was previously bundled with setuptools. TensorBoard
(and several other packages) still imports pkg_resources at startup.
If your environment has setuptools>=78, the import fails.
Fix: Downgrade setuptools to a version that still includes
pkg_resources:
pip install "setuptools<78"
# Verify
python -c "import pkg_resources; print('pkg_resources OK')"
tensorboard --version
MOSAIC’s requirements/base.txt now pins setuptools<78 to prevent
this on fresh installs.
Dependency Conflicts¶
smac / smacv2: protobuf version conflict¶
Error:
ERROR: Cannot install mosaic[smac] because these package versions have
conflicting dependencies.
The conflict is caused by:
mosaic 0.1.0 depends on protobuf>=4.25.0
smac 1.0.0 depends on protobuf<3.21
Cause: This is a fundamental version conflict between two parts of the stack:
MOSAIC uses gRPC (
grpcio) for trainer <-> GUI communication. The gRPC protocol buffers are compiled usinggrpcio-toolsand requireprotobuf>=4.25. The generated stubs live ingym_gui/services/trainer/proto/and can be regenerated with:bash tools/generate_protos.shSMAC / SMACv2 (StarCraft Multi-Agent Challenge) uses the StarCraft II client protocol (
s2clientprotocol), which declares a dependency onprotobuf<3.21.
These two version ranges (>=4.25 vs <3.21) are mutually exclusive,
so pip refuses to install both.
Fix: Install SMAC with --no-deps to bypass the declared constraint:
# Install SMAC without its dependency constraints
pip install --no-deps "smac @ git+https://github.com/oxwhirl/smac.git"
pip install --no-deps "smacv2 @ git+https://github.com/oxwhirl/smacv2.git"
# Manually install SMAC's other dependencies (excluding protobuf)
pip install pysc2 s2clientprotocol pygame
Note
Why does this work at runtime? The protobuf<3.21 pin in SMAC is
overly conservative. The s2clientprotocol package works with
protobuf>=4.25 in practice – the protobuf library maintains backwards
compatibility for compiled proto messages. Hamid’s development machine
runs both MOSAIC (gRPC) and SMAC successfully with protobuf>=4.25.
Proto regeneration: If you modify trainer.proto or encounter stale
proto stubs, regenerate them:
source .venv/bin/activate
bash tools/generate_protos.sh
This runs grpc_tools.protoc on gym_gui/services/trainer/proto/trainer.proto
and produces:
trainer_pb2.py: message classestrainer_pb2_grpc.py: gRPC service stubstrainer_pb2.pyi: type stubs
grpcio-tools: “grpcio-tools not installed”¶
Error (when running ``tools/generate_protos.sh``):
(.venv) $ bash ./tools/generate_protos.sh
[protos] Root: /path/to/mosaic
[protos] grpcio-tools not installed. Run: pip install grpcio-tools
Cause: The proto generation script requires grpcio-tools to compile
.proto files into Python stubs. This package is listed in MOSAIC’s core
dependencies (grpcio-tools>=1.60.0), but may be missing if you installed
with --no-deps or if the initial pip install -e . was interrupted.
Fix:
pip install grpcio-tools
Then re-run the proto generation:
(.venv) $ bash ./tools/generate_protos.sh
[protos] Root: /path/to/mosaic
[protos] Generating trainer stubs
[protos] Done
Note
Proto stubs are checked into the repository, so you only need to
regenerate them if you modify trainer.proto or if the stubs become
stale after a protobuf version upgrade.
Startup Errors¶
Trainer daemon failed to start: “VersionError: Detected incompatible Protobuf Gencode/Runtime versions”¶
Error (when running ``./run.sh``):
Checking for existing trainer processes...
Starting trainer daemon...
Waiting for trainer daemon...
ERROR: Trainer daemon failed to start. Check var/logs/trainer_daemon.log
Check the log (var/logs/trainer_daemon.log):
File ".../google/protobuf/runtime_version.py", line 100, in ValidateProtobufRuntimeVersion
_ReportVersionError(
File ".../google/protobuf/runtime_version.py", line 50, in _ReportVersionError
raise VersionError(msg)
google.protobuf.runtime_version.VersionError: Detected incompatible Protobuf Gencode/Runtime versions when loading trainer.proto: gencode 6.31.1 runtime 5.29.6. Runtime version cannot be older than the linked gencode version.
Cause: The generated Protobuf files (trainer_pb2.py) checked into the repository were compiled with a newer version of Protobuf (e.g., 6.31.1) than the one installed in your Python environment (e.g., 5.29.6). Protobuf requires that the runtime library be at least as new as the compiler used to generate the code.
Fix: Regenerate the Protobuf files using your local environment’s tools:
source .venv/bin/activate
bash tools/generate_protos.sh
Then restart the application:
./run.sh
Trainer daemon failed to start: “No module named ‘dotenv’”¶
Error (when running ``./run.sh``):
Checking for existing trainer processes...
Starting trainer daemon...
Waiting for trainer daemon...
ERROR: Trainer daemon failed to start. Check var/logs/trainer_daemon.log
Check the log (var/logs/trainer_daemon.log):
google.protobuf.runtime_version.VersionError: Detected incompatible Protobuf
Gencode/Runtime versions when loading trainer.proto: gencode 6.31.1 runtime 5.29.6.
Cause: The generated Protobuf files (trainer_pb2.py) checked into the
repository were compiled with a newer version of Protobuf than the one
installed in your environment.
Fix: Regenerate the Protobuf files using your local environment’s tools:
source .venv/bin/activate
bash tools/generate_protos.sh
./run.sh
Trainer daemon: “No module named ‘dotenv’”¶
Error (in ``var/logs/trainer_daemon.log``):
ModuleNotFoundError: No module named 'dotenv'
Cause: The core MOSAIC package was not installed.
Fix:
source .venv/bin/activate
pip install -e .
./run.sh
Missing .env file¶
Symptom: The GUI launches but uses incorrect defaults, or the trainer daemon fails to connect with unexpected settings.
Cause: MOSAIC reads configuration from a .env file in the project
root. This file is gitignored (never committed) and must be created
locally from the provided template.
Fix:
cp .env.example .env
The .env.example file is fully documented with all available settings.
Key variables you should review:
Variable |
Purpose |
|---|---|
|
Required. Must be |
|
Platform identifier ( |
|
MuJoCo rendering backend ( |
|
Required. Prevents MPI deadlock when XuanCe is imported |
|
Required for SMAC. Allows protobuf 5.x to read SMAC’s 3.x pb2 files |
|
gRPC log level ( |
|
Weights & Biases API key (get from https://wandb.ai/authorize) |
|
OpenRouter LLM API key (get from https://openrouter.ai/keys) |
|
HuggingFace token for gated models |
Warning
The .env file contains API keys and secrets. It is gitignored by
default. Never commit it to version control.
Startup crash: “No module named ‘torch’” / “No module named ‘psutil’”¶
Error:
ModuleNotFoundError: No module named 'torch'
Or:
ModuleNotFoundError: No module named 'psutil'
Cause: app.py imports torch, psutil, and pynvml
unconditionally at startup for GPU detection and system info display.
Fix:
pip install torch psutil pynvml
Note
For CUDA GPU support, install PyTorch with the CUDA index:
pip install torch --index-url https://download.pytorch.org/whl/cu121
Startup crash: “No module named ‘stockfish’” (misleading Qt error)¶
Error:
[gym_gui] Qt bindings not available. Install qtpy and a Qt backend
(PyQt5/PyQt6/PySide2/PySide6): No module named 'stockfish'
Cause: The stockfish Python package was imported at module level.
The missing import was caught by a generic handler, producing a misleading
“Qt bindings not available” message.
Fix: This was a bug (fixed). The import was moved to a lazy location. If you encounter this on an older version:
pip install stockfish
Startup crash: “No module named ‘syllabus’”¶
Error:
ModuleNotFoundError: No module named 'syllabus'
Cause: The CleanRL worker uses Syllabus-RL for curriculum learning. Syllabus is a local 3rd-party package.
Fix:
# If the directory is empty, initialize the submodule
git submodule update --init 3rd_party/tools/Syllabus
# Install in editable mode
pip install -e 3rd_party/tools/Syllabus
Worker load warnings (non-fatal)¶
During startup, MOSAIC discovers workers via entry points. Missing optional workers produce warnings but do not prevent the GUI from launching:
Failed to load worker 'cleanrl' from entry point: No module named 'scipy'
Failed to load worker 'ray' from entry point: No module named 'tree'
These are informational – install the relevant extras to enable each worker:
# CleanRL worker
pip install -e 3rd_party/tools/Syllabus
pip install matplotlib scipy
# Ray/RLlib worker
pip install -e ".[ray-rllib]"
Runtime Errors¶
CleanRL training crash: “No module named ‘tyro’”¶
Error (in ``var/logs/trainer_daemon.log``):
ModuleNotFoundError: No module named 'tyro'
Cause: The CleanRL worker requires tyro, wandb, tensorboard,
tenacity, and moviepy.
Fix: Install the full CleanRL extras:
pip install -e ".[cleanrl]"
pynvml deprecation warning¶
Warning (at startup):
FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead.
Fix: Uninstall the deprecated wrapper:
pip uninstall -y pynvml
pip install nvidia-ml-py
gRPC: “Log level DEBUG is not suitable for production”¶
Warning (when running ``./run.sh``):
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1771284312.764300 84092 log.cc:110] Log level DEBUG is not suitable
for production.
This is harmless. Set GRPC_VERBOSITY=ERROR in .env to suppress it.
Syllabus / CleanRL: missing transitive dependencies¶
Error:
ModuleNotFoundError: No module named 'matplotlib'
or:
Failed to load worker 'cleanrl' from entry point: No module named 'scipy'
Fix:
pip install matplotlib scipy
Stockfish not found (Chess Engine)¶
Error:
stockfish.StockfishException: Stockfish binary not found
Fix:
# Ubuntu/Debian/WSL
sudo apt-get install -y stockfish
# macOS
brew install stockfish
CleanRL PPO crash: “mat1 and mat2 shapes cannot be multiplied” (Discrete obs)¶
Error (when training PPO on FrozenLake-v1, Taxi-v3, CliffWalking-v0, etc.):
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x4 and 1x64)
Cause: Environments with Discrete observation spaces (e.g.
FrozenLake-v1 has Discrete(16)) return a single integer as the
observation. CleanRL’s MLPAgent computes the input dimension as
prod(obs_space.shape). For Discrete(n) the shape is () so
prod(()) == 1, causing a shape mismatch.
Fix: This is now handled automatically. The make_env factory detects
Discrete observation spaces and wraps them with one-hot encoding. For
example, FrozenLake-v1’s Discrete(16) becomes Box(0, 1, (16,),
float32).
Affected environments: Any environment with a Discrete observation
space, including FrozenLake-v1, Taxi-v3, and CliffWalking-v0.
Environments with Box observation spaces (CartPole-v1,
LunarLander-v2, etc.) are unaffected.
Quick Reference: All System Dependencies¶
# Install everything at once
sudo apt-get update
sudo apt-get install -y \
build-essential \
cmake \
swig \
flex \
bison \
libbz2-dev \
libopenmpi-dev \
libegl1 libgl1 libopengl0 libxkbcommon0 \
libnss3 libnspr4 libasound2 \
libxcomposite1 libxdamage1 libxkbfile1 \
libxcb-cursor0 \
fontconfig fonts-dejavu-extra fonts-liberation fonts-noto \
stockfish \
xvfb
System Package |
Required By |
Error Without It |
|---|---|---|
|
Most native extensions |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
PyQt6 (OpenGL/EGL) |
|
|
PyQt6 (keyboard handling) |
|
|
PyQt6-WebEngine (TensorBoard viewer) |
|
|
PyQt6-WebEngine (audio backend) |
|
|
PyQt6-WebEngine (X11 compositing) |
|
|
PyQt6-WebEngine (X11 damage tracking) |
|
|
PyQt6-WebEngine (keyboard layout files) |
|
|
PyQt6 (X11 cursor support, required since Qt 6.5) |
|
|
PyQt6 (font discovery) |
GUI launches but all text is invisible |
|
System fonts for Qt rendering |
GUI launches but all text is invisible |
|
Chess engine |
|
|
Headless display (optional) |
|