Environments Reference
All 13 MalmoEnv missions available in MOSAIC are listed here. Each mission is defined by
an XML file bundled in 3rd_party/environments/malmo/MalmoEnv/missions/.
Action Space
MalmoEnv action spaces are mission-specific — each mission XML defines which
commands are allowed. The default action filter is {move, turn, use, attack}
but individual missions may deny specific commands (e.g. attack).
Three action-space profiles exist across the 13 missions:
Discrete(4) — move + turn only (FindTheGoal, TreasureHunt agent 0):
| Index | Command | Description |
|---|---|---|
| 0 | | Move forward |
| 1 | | Move backward |
| 2 | | Turn right |
| 3 | | Turn left |
Discrete(6) — move + turn + use (Attic, MobChase, Vertical, CliffWalking, CatchTheMob, Eating, Obstacles, TrickyArena):
| Index | Command | Description |
|---|---|---|
| 0 | | Move forward |
| 1 | | Move backward |
| 2 | | Turn right |
| 3 | | Turn left |
| 4 | | Use / interact |
| 5 | | Stop use |
Discrete(8) — move + turn + attack + use (MazeRunner, DefaultFlatWorld, DefaultWorld):
| Index | Command | Description |
|---|---|---|
| 0 | | Move forward |
| 1 | | Move backward |
| 2 | | Turn right |
| 3 | | Turn left |
| 4 | | Attack / break block |
| 5 | | Stop attack |
| 6 | | Use / place |
| 7 | | Stop use |
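The three profiles above share one pattern: each permitted command contributes two adjacent indices (forward/backward, right/left, start/stop), in the order move, turn, attack, use. A minimal sketch of that expansion follows. The helper name build_action_list is hypothetical, and the command strings other than move 1 and turn 1 (the -1 and 0 forms, and the attack/use strings) are assumptions inferred from the table descriptions, not read from the mission XML.

```python
# Hypothetical sketch: expand a set of permitted command verbs into a flat
# list of discrete actions, mirroring the index layout in the tables above.

# Verb order follows the Discrete(8) table: move, turn, attack, use.
VERB_ORDER = ("move", "turn", "attack", "use")

# Assumption: move/turn pair a +1 action with a -1 action (forward/backward,
# right/left), while attack/use pair a start (1) with a stop (0).
SIGNED_VERBS = {"move", "turn"}


def build_action_list(denied=()):
    """Return the command strings for every verb that is not denied."""
    actions = []
    for verb in VERB_ORDER:
        if verb in denied:
            continue
        second = "-1" if verb in SIGNED_VERBS else "0"
        actions.append(f"{verb} 1")
        actions.append(f"{verb} {second}")
    return actions


if __name__ == "__main__":
    for denied in ((), ("attack",), ("attack", "use")):
        actions = build_action_list(denied)
        print(f"deny={denied!r}: Discrete({len(actions)}) {actions}")
```

Denying attack gives the six-action layout, and denying both attack and use gives the four-action layout.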
Observation Space
All missions return an RGB image (H, W, 3) as the observation. The default
resolution is 84 × 84 pixels. The frame is rendered by Minecraft and streamed over TCP
to the Python agent.
Note
The frame dimensions are controlled by the <VideoProducer> element in the mission XML.
You can edit the XML to change the resolution; re-initialize the environment afterwards so the new size takes effect.
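As a rough illustration of the note above, the following sketch rewrites the frame size in a mission XML string using only the standard library. It assumes that <VideoProducer> carries <Width> and <Height> child elements and that the mission uses the Project Malmo XML namespace. The embedded SAMPLE_XML is a stripped-down stand-in, not one of the bundled mission files, and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of Malmo mission files; registering it as the default
# keeps the serialized output free of "ns0:" prefixes.
MALMO_NS = "http://ProjectMalmo.microsoft.com"
ET.register_namespace("", MALMO_NS)

# Stand-in fragment for illustration only (not a complete mission).
SAMPLE_XML = """<Mission xmlns="http://ProjectMalmo.microsoft.com">
  <AgentSection>
    <AgentHandlers>
      <VideoProducer>
        <Width>84</Width>
        <Height>84</Height>
      </VideoProducer>
    </AgentHandlers>
  </AgentSection>
</Mission>"""


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def get_resolution(xml_text):
    """Return (width, height) from the first VideoProducer, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if _local(elem.tag) == "VideoProducer":
            values = {_local(child.tag): child.text for child in elem}
            return int(values["Width"]), int(values["Height"])
    return None


def set_resolution(xml_text, width, height):
    """Return a copy of the XML with every VideoProducer resized."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if _local(elem.tag) != "VideoProducer":
            continue
        for child in elem:
            if _local(child.tag) == "Width":
                child.text = str(width)
            elif _local(child.tag) == "Height":
                child.text = str(height)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    resized = set_resolution(SAMPLE_XML, 160, 120)
    print(get_resolution(SAMPLE_XML), "->", get_resolution(resized))
```

The resized XML string would then be passed to the environment when it is re-initialized.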
Movement Types
Malmo missions use one of two movement command handlers, defined in the mission XML. This fundamentally affects how actions behave:
DiscreteMovementCommands — each action is a one-shot, block-based movement:
- move 1 moves the agent exactly one block forward, then stops.
- turn 1 rotates the agent exactly 90 degrees, then stops.
- No persistence: the agent is stationary between commands.
- Best for: grid-world style reasoning, turn-based evaluation.
ContinuousMovementCommands — each action sets a persistent velocity:
- move 1 sets forward velocity to 1.0; the agent keeps moving until move 0 is sent.
- turn 1 sets rotation velocity; the agent keeps turning until turn 0 is sent.
- Values between -1 and 1 control speed (e.g. move 0.5 = half speed).
- The agent has inertia (velocity is interpolated over ~6 ticks).
- Best for: smooth navigation, FPS-style control, realistic physics.
Note
For RL training, the agent must learn to manage velocity in continuous missions
(e.g. send move 0 to stop). For human play, MOSAIC routes keyboard input
through a native side-channel (TCP port 9001) that handles press/release naturally.
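The difference between the two handlers can be illustrated with a toy one-dimensional model. This is a sketch of the semantics described above, not Malmo's physics: it ignores turning and the inertia interpolation, advances one tick per list entry, and uses None to mean that no command was sent on that tick. The function names are hypothetical.

```python
def run_discrete(commands):
    """Each 'move N' shifts the agent by N blocks once; nothing persists."""
    position = 0
    for command in commands:
        if command is None:
            continue  # no command this tick: the agent stays put
        verb, value = command.split()
        if verb == "move":
            position += int(value)
    return position


def run_continuous(commands):
    """Each 'move V' sets a velocity that persists until it is overwritten."""
    position = 0.0
    velocity = 0.0
    for command in commands:
        if command is not None:
            verb, value = command.split()
            if verb == "move":
                velocity = float(value)
        position += velocity  # the current velocity applies on every tick
    return position


if __name__ == "__main__":
    ticks = ["move 1", None, None, "move 0", None]
    print("discrete:  ", run_discrete(ticks))
    print("continuous:", run_continuous(ticks))
```

With the sequence move 1, idle, idle, move 0, idle, the discrete model ends one block from the start, while the continuous model keeps advancing for three ticks until move 0 arrives. This is why a policy trained on continuous missions has to learn when to send the stop command.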
Missions
MalmoEnv-MobChase-v0
- XML:
mobchase_single_agent.xml
- Movement:
Discrete: one block per move, 90° per turn
- Actions:
Discrete(6): move, turn, use
- Objective:
Chase and reach a mob (pig/cow) in an open flat arena.
- Reward:
Positive reward when the agent reaches within a threshold distance of the mob.
- Termination:
Fixed time limit (in Malmo ticks).
Use case: Testing pursuit / chasing behaviours. The mob moves randomly, providing a moving target.
MalmoEnv-MazeRunner-v0
- XML:
mazerunner.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(8): move, turn, attack, use
- Objective:
Navigate from the start position to the goal block at the exit of a maze.
- Reward:
Positive reward on reaching the goal; small negative step penalty.
- Termination:
Time limit or goal reached.
Use case: Pathfinding and navigation benchmarks in a structured environment.
MalmoEnv-Vertical-v0
- XML:
vertical.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(6): move, turn, use
- Objective:
Climb a vertical tower of blocks placed on a platform over a void.
- Reward:
Reward proportional to height gained.
- Termination:
Agent falls off or time limit expires.
Use case: Training agents to climb and jump in 3-D environments.
MalmoEnv-CliffWalking-v0
- XML:
cliffwalking.xml
- Movement:
Discrete: one block per move, 90° per turn
- Actions:
Discrete(6): move, turn, use
- Objective:
Walk along the top of a cliff from start to goal without falling.
- Reward:
+1 for each step toward the goal; large negative reward for falling.
- Termination:
Agent falls or reaches the goal.
Use case: Safety-aware navigation; dense reward signal for curriculum learning.
MalmoEnv-CatchTheMob-v0
- XML:
catchthemob.xml
- Movement:
Discrete: one block per move, 90° per turn
- Actions:
Discrete(6): move, turn, use
- Objective:
Catch a mob that is enclosed in a small arena.
- Reward:
Reward on contact with the mob.
- Termination:
Time limit.
Use case: Simpler mob-chasing variant with a confined space; easier exploration problem.
MalmoEnv-FindTheGoal-v0
- XML:
findthegoal.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(4): move, turn only
- Objective:
Locate and stand on a gold block hidden somewhere in a large flat world.
- Reward:
Large positive reward on reaching the goal block.
- Termination:
Time limit.
Use case: Sparse-reward exploration. The agent must search a wide area with no intermediate guidance.
MalmoEnv-Attic-v0
- XML:
attic.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(6): move, turn, use
- Objective:
Navigate an indoor “attic” layout (corridors, rooms, furniture).
- Reward:
Positive reward on reaching the designated exit.
- Termination:
Time limit.
Use case: Indoor navigation with obstacles; closer to real-world room layout.
MalmoEnv-DefaultFlatWorld-v0
- XML:
defaultflatworld.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(8): move, turn, attack, use
- Objective:
Open-ended flat creative world — no specific mission goal.
- Reward:
No default reward (extendable via XML).
- Termination:
Time limit.
Use case: Free exploration, block placement experiments, custom reward shaping.
MalmoEnv-DefaultWorld-v0
- XML:
defaultworld.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(8): move, turn, attack, use
- Objective:
Open-ended default Minecraft world generation (survival-style terrain).
- Reward:
No default reward.
- Termination:
Time limit.
Use case: Exploration in a rich procedurally generated landscape; closest to vanilla Minecraft.
MalmoEnv-Eating-v0
- XML:
eating.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(6): move, turn, use
- Objective:
Collect food items (bread, carrots, etc.) placed in the world.
- Reward:
Positive reward for each food item collected.
- Termination:
All items collected or time limit.
Use case: Reward-dense item collection for initial policy bootstrapping.
MalmoEnv-Obstacles-v0
- XML:
obstacles.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(6): move, turn, use
- Objective:
Navigate from start to goal while bypassing a series of obstacles (walls, pits, lava).
- Reward:
Positive reward on reaching the goal.
- Termination:
Agent dies or time limit.
Use case: Multi-hazard navigation; tests the agent’s ability to plan around traps.
MalmoEnv-TrickyArena-v0
- XML:
trickyarena.xml
- Movement:
Continuous: persistent velocity
- Actions:
Discrete(6): move, turn, use
- Objective:
Survive in an arena with pits, moving platforms, and hazards.
- Reward:
Reward for time survived; penalty for falling into pits.
- Termination:
Agent dies or time limit.
Use case: Robustness testing — the agent must be reactive to environmental hazards.
MalmoEnv-TreasureHunt-v0
- XML:
treasurehunt.xml
- Movement:
Discrete (Turn-Based): one block per move, 90° per turn
- Actions:
Discrete(4): move, turn only (Agent 0)
- Agents:
Two agents are required: Agent 0 (deny attack) and Agent 1 (deny use). Single-agent mode is not supported.
- Objective:
Find and collect treasure chests scattered across the world.
- Reward:
Positive reward for each chest opened.
- Termination:
All chests collected or time limit.
Warning
This is a 2-agent mission. The Malmo server waits for both agents to connect
before starting the episode. Running in single-agent mode causes a timeout
(ERROR_TIMED_OUT_WAITING_FOR_EPISODE_START).
Use case: Multi-target collection with sparse rewards and exploration requirement.
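The per-mission details above can be collected into a small lookup table, for example to pick missions by movement type or action-space size. The sketch below is only a convenience structure built from the listings in this section; the MISSIONS dictionary, the MissionSpec type, and the select helper are hypothetical and are not part of MOSAIC or MalmoEnv.

```python
from typing import NamedTuple


class MissionSpec(NamedTuple):
    xml: str        # file name under the bundled missions directory
    movement: str   # "discrete" or "continuous"
    n_actions: int  # size of the Discrete action space


# Values transcribed from the mission listings in this section.
MISSIONS = {
    "MalmoEnv-MobChase-v0": MissionSpec("mobchase_single_agent.xml", "discrete", 6),
    "MalmoEnv-MazeRunner-v0": MissionSpec("mazerunner.xml", "continuous", 8),
    "MalmoEnv-Vertical-v0": MissionSpec("vertical.xml", "continuous", 6),
    "MalmoEnv-CliffWalking-v0": MissionSpec("cliffwalking.xml", "discrete", 6),
    "MalmoEnv-CatchTheMob-v0": MissionSpec("catchthemob.xml", "discrete", 6),
    "MalmoEnv-FindTheGoal-v0": MissionSpec("findthegoal.xml", "continuous", 4),
    "MalmoEnv-Attic-v0": MissionSpec("attic.xml", "continuous", 6),
    "MalmoEnv-DefaultFlatWorld-v0": MissionSpec("defaultflatworld.xml", "continuous", 8),
    "MalmoEnv-DefaultWorld-v0": MissionSpec("defaultworld.xml", "continuous", 8),
    "MalmoEnv-Eating-v0": MissionSpec("eating.xml", "continuous", 6),
    "MalmoEnv-Obstacles-v0": MissionSpec("obstacles.xml", "continuous", 6),
    "MalmoEnv-TrickyArena-v0": MissionSpec("trickyarena.xml", "continuous", 6),
    # Two-agent mission; the action count listed is for Agent 0.
    "MalmoEnv-TreasureHunt-v0": MissionSpec("treasurehunt.xml", "discrete", 4),
}


def select(movement=None, n_actions=None):
    """Return sorted mission IDs matching the given filters (None = any)."""
    return sorted(
        game_id
        for game_id, spec in MISSIONS.items()
        if (movement is None or spec.movement == movement)
        and (n_actions is None or spec.n_actions == n_actions)
    )


if __name__ == "__main__":
    print("discrete movement:", select(movement="discrete"))
    print("Discrete(8):", select(n_actions=8))
```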
MarLo Mission History
These missions were originally part of the MarLo benchmark (the MOSAIC predecessor
used the Go-based mosaic_malmo package). They are now served identically through
MalmoEnv by sending the same mission XML files to Minecraft. The MOSAIC game IDs have
been renamed from MosaicMarLo-*-v0 to MalmoEnv-*-v0 to reflect the new backend.
| Old ID (Go backend, removed) | New ID (MalmoEnv backend) |
|---|---|
| MosaicMarLo-*-v0 | MalmoEnv-*-v0 |
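Because the rename only changes the prefix, old IDs can be translated mechanically. The following is a minimal sketch, assuming every old ID matches the MosaicMarLo-*-v0 pattern stated above. The function name and the example ID are illustrative and are not taken from the MOSAIC codebase.

```python
OLD_PREFIX = "MosaicMarLo-"
NEW_PREFIX = "MalmoEnv-"


def migrate_game_id(game_id):
    """Map an old MosaicMarLo-*-v0 ID to its MalmoEnv-*-v0 equivalent."""
    if game_id.startswith(NEW_PREFIX):
        return game_id  # already uses the new naming
    if not (game_id.startswith(OLD_PREFIX) and game_id.endswith("-v0")):
        raise ValueError(f"not a MosaicMarLo-*-v0 ID: {game_id!r}")
    # Swap the prefix and keep the mission name and version suffix intact.
    return NEW_PREFIX + game_id[len(OLD_PREFIX):]


if __name__ == "__main__":
    # Illustrative ID that follows the documented pattern.
    print(migrate_game_id("MosaicMarLo-MazeRunner-v0"))
```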