Environments Reference

All 13 MalmoEnv missions available in MOSAIC are listed here. Each mission is defined by an XML file bundled in 3rd_party/environments/malmo/MalmoEnv/missions/.

Action Space

MalmoEnv action spaces are mission-specific — each mission XML defines which commands are allowed. The default action filter is {move, turn, use, attack} but individual missions may deny specific commands (e.g. attack).

Three action-space profiles exist across the 13 missions:

Discrete(4) — move + turn only (FindTheGoal, TreasureHunt agent 0):

| Index | Command | Description   |
|-------|---------|---------------|
| 0     | move 1  | Move forward  |
| 1     | move -1 | Move backward |
| 2     | turn 1  | Turn right    |
| 3     | turn -1 | Turn left     |

Discrete(6) — move + turn + use (Attic, MobChase, Vertical, CliffWalking, CatchTheMob, Eating, Obstacles, TrickyArena):

| Index | Command | Description    |
|-------|---------|----------------|
| 0     | move 1  | Move forward   |
| 1     | move -1 | Move backward  |
| 2     | turn 1  | Turn right     |
| 3     | turn -1 | Turn left      |
| 4     | use 1   | Use / interact |
| 5     | use 0   | Stop use       |

Discrete(8) — move + turn + attack + use (MazeRunner, DefaultFlatWorld, DefaultWorld):

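| Index | Command  | Description          |
|-------|----------|----------------------|
| 0     | move 1   | Move forward         |
| 1     | move -1  | Move backward        |
| 2     | turn 1   | Turn right           |
| 3     | turn -1  | Turn left            |
| 4     | attack 1 | Attack / break block |
| 5     | attack 0 | Stop attack          |
| 6     | use 1    | Use / place          |
| 7     | use 0    | Stop use             |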
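The three profiles follow from which command families a mission allows. The sketch below reproduces the tables above from a set of allowed families. The helper name build_action_list and the family ordering are assumptions made for illustration, inferred from the tables; they are not MalmoEnv's API.

```python
# Illustrative only: build_action_list is a hypothetical helper, not part of MalmoEnv.
# move/turn toggle between +1 and -1; attack/use toggle between 1 (start) and 0 (stop).
COMMAND_VALUES = {
    "move": ("1", "-1"),
    "turn": ("1", "-1"),
    "attack": ("1", "0"),
    "use": ("1", "0"),
}

# Family order assumed from the Discrete(8) table.
FAMILY_ORDER = ("move", "turn", "attack", "use")


def build_action_list(allowed):
    """Return the command strings for the allowed families, in index order."""
    unknown = set(allowed) - set(FAMILY_ORDER)
    if unknown:
        raise ValueError(f"unknown command families: {sorted(unknown)}")
    return [
        f"{family} {value}"
        for family in FAMILY_ORDER
        if family in allowed
        for value in COMMAND_VALUES[family]
    ]


default_filter = {"move", "turn", "use", "attack"}
discrete8 = build_action_list(default_filter)               # nothing denied
discrete6 = build_action_list(default_filter - {"attack"})  # attack denied
discrete4 = build_action_list({"move", "turn"})             # move + turn only

print(len(discrete4), len(discrete6), len(discrete8))
print(discrete6[4])  # index 4 in the Discrete(6) profile
```

A policy's integer output can then be turned into a command string by indexing the list for the mission's profile.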

Observation Space

All missions return an RGB image (H, W, 3) as the observation. The default resolution is 84 × 84 pixels. The frame is rendered by Minecraft and streamed over TCP to the Python agent.

Note

The frame dimensions are controlled by the <VideoProducer> element in the mission XML. To change the resolution, edit the XML and then re-initialize the environment so that the new size takes effect.
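As a rough sketch of such an edit, the snippet below rewrites the Width and Height children of every <VideoProducer> element using only the standard library. The XML fragment is a hypothetical, heavily trimmed stand-in for a real mission file, the namespace URI is assumed to be the one Malmo mission files declare, and set_resolution / read_resolution are illustrative helper names rather than MOSAIC functions.

```python
import xml.etree.ElementTree as ET

# Assumed Malmo namespace; registering it keeps the output free of "ns0:" prefixes.
MALMO_NS = "http://ProjectMalmo.microsoft.com"
ET.register_namespace("", MALMO_NS)

# Hypothetical, trimmed fragment; real mission files contain many more elements.
SAMPLE_XML = f"""<Mission xmlns="{MALMO_NS}">
  <AgentSection>
    <AgentHandlers>
      <VideoProducer>
        <Width>84</Width>
        <Height>84</Height>
      </VideoProducer>
    </AgentHandlers>
  </AgentSection>
</Mission>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def set_resolution(xml_text, width, height):
    """Return a copy of the XML with every VideoProducer set to width x height."""
    root = ET.fromstring(xml_text)
    found = False
    for elem in root.iter():
        if local_name(elem.tag) != "VideoProducer":
            continue
        found = True
        for child in elem:
            if local_name(child.tag) == "Width":
                child.text = str(width)
            elif local_name(child.tag) == "Height":
                child.text = str(height)
    if not found:
        raise ValueError("no <VideoProducer> element found")
    return ET.tostring(root, encoding="unicode")


def read_resolution(xml_text):
    """Return a (width, height) tuple for each VideoProducer in the XML."""
    sizes = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) == "VideoProducer":
            values = {local_name(c.tag): int(c.text) for c in elem}
            sizes.append((values["Width"], values["Height"]))
    return sizes


new_xml = set_resolution(SAMPLE_XML, 160, 120)
print(read_resolution(new_xml))
```

With a 160 × 120 producer, the observation shape (H, W, 3) would be (120, 160, 3) once the environment has been re-initialized with the edited XML.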

Movement Types

Malmo missions use one of two movement command handlers, defined in the mission XML. This fundamentally affects how actions behave:

DiscreteMovementCommands — each action is a one-shot, block-based movement:

  • move 1 moves the agent exactly one block forward, then stops.

  • turn 1 rotates the agent exactly 90 degrees, then stops.

  • No persistence — the agent is stationary between commands.

  • Best for: grid-world style reasoning, turn-based evaluation.

ContinuousMovementCommands — each action sets a persistent velocity:

  • move 1 sets forward velocity to 1.0 — the agent keeps moving until move 0 is sent.

  • turn 1 sets rotation velocity — the agent keeps turning until turn 0 is sent.

  • Values between -1 and 1 control speed (e.g. move 0.5 = half speed).

  • The agent has inertia (velocity is interpolated over ~6 ticks).

  • Best for: smooth navigation, FPS-style control, realistic physics.
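The difference between the two handlers can be illustrated with a one-dimensional toy model. This is not Malmo's physics: ToyAgent is a hypothetical class, velocity is measured in arbitrary units per tick, turning is left out, and the inertia mentioned above is deliberately omitted to keep the contrast visible.

```python
class ToyAgent:
    """1-D toy model of the two command handlers (forward axis only)."""

    def __init__(self, continuous):
        self.continuous = continuous
        self.position = 0.0
        self.velocity = 0.0  # only meaningful in continuous mode

    def command(self, text):
        verb, value = text.split()
        if verb != "move":
            raise ValueError("toy model only handles 'move'")
        if self.continuous:
            # Continuous: the value becomes a persistent velocity in [-1, 1].
            self.velocity = max(-1.0, min(1.0, float(value)))
        else:
            # Discrete: a one-shot displacement of exactly one block.
            self.position += int(value)

    def tick(self):
        """Advance one tick; only a continuous agent keeps moving."""
        if self.continuous:
            self.position += self.velocity


discrete = ToyAgent(continuous=False)
discrete.command("move 1")
for _ in range(3):
    discrete.tick()

continuous = ToyAgent(continuous=True)
continuous.command("move 1")
for _ in range(3):
    continuous.tick()
moving_position = continuous.position

continuous.command("move 0")  # explicit stop
for _ in range(3):
    continuous.tick()

print(discrete.position, moving_position, continuous.position)
```

In the toy model the discrete agent ends one block from its start no matter how many ticks pass, while the continuous agent keeps advancing until move 0 is sent.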

Note

For RL training, the agent must learn to manage velocity in continuous missions (e.g. send move 0 to stop). For human play, MOSAIC routes keyboard input through a native side-channel (TCP port 9001) that handles press/release naturally.

Missions

MalmoEnv-MobChase-v0

  • XML: mobchase_single_agent.xml

  • Movement: Discrete (one block per move, 90° per turn)

  • Actions: Discrete(6): move, turn, use

  • Objective: Chase and reach a mob (pig/cow) in an open flat arena.

  • Reward: Positive reward when the agent comes within a threshold distance of the mob.

  • Termination: Fixed time limit (in Malmo ticks).

Use case: Testing pursuit / chasing behaviours. The mob moves randomly, providing a moving target.

MalmoEnv-MazeRunner-v0

  • XML: mazerunner.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(8): move, turn, attack, use

  • Objective: Navigate from the start position to the goal block at the exit of a maze.

  • Reward: Positive reward on reaching the goal; small negative step penalty.

  • Termination: Time limit or goal reached.

Use case: Pathfinding and navigation benchmarks in a structured environment.

MalmoEnv-Vertical-v0

  • XML: vertical.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(6): move, turn, use

  • Objective: Climb a vertical tower of blocks placed on a platform over a void.

  • Reward: Reward proportional to height gained.

  • Termination: Agent falls off or time limit expires.

Use case: Training agents to climb and jump in 3-D environments.

MalmoEnv-CliffWalking-v0

  • XML: cliffwalking.xml

  • Movement: Discrete (one block per move, 90° per turn)

  • Actions: Discrete(6): move, turn, use

  • Objective: Walk along the top of a cliff from start to goal without falling.

  • Reward: +1 for each step toward the goal; large negative reward for falling.

  • Termination: Agent falls or reaches the goal.

Use case: Safety-aware navigation; dense reward signal for curriculum learning.

MalmoEnv-CatchTheMob-v0

  • XML: catchthemob.xml

  • Movement: Discrete (one block per move, 90° per turn)

  • Actions: Discrete(6): move, turn, use

  • Objective: Catch a mob that is enclosed in a small arena.

  • Reward: Reward on contact with the mob.

  • Termination: Time limit.

Use case: Simpler mob-chasing variant with a confined space; easier exploration problem.

MalmoEnv-FindTheGoal-v0

  • XML: findthegoal.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(4): move, turn only

  • Objective: Locate and stand on a gold block hidden somewhere in a large flat world.

  • Reward: Large positive reward on reaching the goal block.

  • Termination: Time limit.

Use case: Sparse-reward exploration. The agent must search a wide area with no intermediate guidance.

MalmoEnv-Attic-v0

  • XML: attic.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(6): move, turn, use

  • Objective: Navigate an indoor “attic” layout (corridors, rooms, furniture).

  • Reward: Positive reward on reaching the designated exit.

  • Termination: Time limit.

Use case: Indoor navigation with obstacles; closer to real-world room layout.

MalmoEnv-DefaultFlatWorld-v0

  • XML: defaultflatworld.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(8): move, turn, attack, use

  • Objective: Open-ended flat creative world with no specific mission goal.

  • Reward: No default reward (extendable via XML).

  • Termination: Time limit.

Use case: Free exploration, block placement experiments, custom reward shaping.

MalmoEnv-DefaultWorld-v0

  • XML: defaultworld.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(8): move, turn, attack, use

  • Objective: Open-ended default Minecraft world generation (survival-style terrain).

  • Reward: No default reward.

  • Termination: Time limit.

Use case: Exploration in a rich procedurally generated landscape; closest to vanilla Minecraft.

MalmoEnv-Eating-v0

  • XML: eating.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(6): move, turn, use

  • Objective: Collect food items (bread, carrots, etc.) placed in the world.

  • Reward: Positive reward for each food item collected.

  • Termination: All items collected or time limit.

Use case: Reward-dense item collection for initial policy bootstrapping.

MalmoEnv-Obstacles-v0

  • XML: obstacles.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(6): move, turn, use

  • Objective: Navigate from start to goal while bypassing a series of obstacles (walls, pits, lava).

  • Reward: Positive reward on reaching the goal.

  • Termination: Agent dies or time limit.

Use case: Multi-hazard navigation; tests the agent’s ability to plan around traps.

MalmoEnv-TrickyArena-v0

  • XML: trickyarena.xml

  • Movement: Continuous (persistent velocity)

  • Actions: Discrete(6): move, turn, use

  • Objective: Survive in an arena with pits, moving platforms, and hazards.

  • Reward: Reward for time survived; penalty for falling into pits.

  • Termination: Agent dies or time limit.

Use case: Robustness testing — the agent must be reactive to environmental hazards.

MalmoEnv-TreasureHunt-v0

  • XML: treasurehunt.xml

  • Movement: Discrete, turn-based (one block per move, 90° per turn)

  • Actions: Discrete(4): move, turn only (Agent 0)

  • Agents: 2 agents required: Agent 0 (deny attack) + Agent 1 (deny use). Single-agent mode not supported.

  • Objective: Find and collect treasure chests scattered across the world.

  • Reward: Positive reward for each chest opened.

  • Termination: All chests collected or time limit.

Warning

This is a 2-agent mission. The Malmo server waits for both agents to connect before starting the episode. Running in single-agent mode causes a timeout (ERROR_TIMED_OUT_WAITING_FOR_EPISODE_START).

Use case: Multi-target collection with sparse rewards and exploration requirement.
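The timeout described in the warning can be caught before launch by checking how many agents a mission XML declares. The sketch below assumes that each agent corresponds to one <AgentSection> element in the mission file; the XML fragment is a hypothetical, trimmed example, and count_agents / check_agent_count are illustrative helpers, not MOSAIC or MalmoEnv functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed two-agent fragment; a real mission file has many more elements.
TWO_AGENT_XML = """<Mission xmlns="http://ProjectMalmo.microsoft.com">
  <AgentSection><Name>Agent0</Name></AgentSection>
  <AgentSection><Name>Agent1</Name></AgentSection>
</Mission>"""


def count_agents(xml_text):
    """Count <AgentSection> elements, ignoring the XML namespace prefix."""
    root = ET.fromstring(xml_text)
    return sum(
        1 for elem in root.iter()
        if elem.tag.rsplit("}", 1)[-1] == "AgentSection"
    )


def check_agent_count(xml_text, agents_launched):
    """Raise early instead of waiting for an episode-start timeout."""
    required = count_agents(xml_text)
    if agents_launched != required:
        raise RuntimeError(
            f"mission needs {required} agent(s), but {agents_launched} would be launched"
        )
    return required


print(count_agents(TWO_AGENT_XML))

try:
    check_agent_count(TWO_AGENT_XML, agents_launched=1)
except RuntimeError as err:
    print(err)
```

Failing fast in the launcher gives a clearer message than waiting for the server-side timeout.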

MarLo Mission History

These missions were originally part of the MarLo benchmark (the MOSAIC predecessor used the Go-based mosaic_malmo package). They are now served identically through MalmoEnv by sending the same mission XML files to Minecraft. The MOSAIC game IDs have been renamed from MosaicMarLo-*-v0 to MalmoEnv-*-v0 to reflect the new backend.

| Old ID (Go backend, removed)    | New ID (MalmoEnv backend)    |
|---------------------------------|------------------------------|
| MosaicMalmo-Navigate-v0         | MalmoEnv-DefaultWorld-v0     |
| MosaicMarLo-Vertical-v0         | MalmoEnv-Vertical-v0         |
| MosaicMarLo-MazeRunner-v0       | MalmoEnv-MazeRunner-v0       |
| MosaicMarLo-CliffWalking-v0     | MalmoEnv-CliffWalking-v0     |
| MosaicMarLo-CatchTheMob-v0      | MalmoEnv-CatchTheMob-v0      |
| MosaicMarLo-FindTheGoal-v0      | MalmoEnv-FindTheGoal-v0      |
| MosaicMarLo-Attic-v0            | MalmoEnv-Attic-v0            |
| MosaicMarLo-DefaultFlatWorld-v0 | MalmoEnv-DefaultFlatWorld-v0 |
| MosaicMarLo-DefaultWorld-v0     | MalmoEnv-DefaultWorld-v0     |
| MosaicMarLo-Eating-v0           | MalmoEnv-Eating-v0           |
| MosaicMarLo-Obstacles-v0        | MalmoEnv-Obstacles-v0        |
| MosaicMarLo-TrickyArena-v0      | MalmoEnv-TrickyArena-v0      |
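For configs or scripts that still use the old IDs, the mapping above can be applied mechanically. This is a small sketch: migrate_env_id is a hypothetical helper, not something MOSAIC ships, and the dictionary simply transcribes the table.

```python
# Transcribed from the rename table; migrate_env_id is a hypothetical helper.
OLD_TO_NEW = {
    "MosaicMalmo-Navigate-v0": "MalmoEnv-DefaultWorld-v0",
    "MosaicMarLo-Vertical-v0": "MalmoEnv-Vertical-v0",
    "MosaicMarLo-MazeRunner-v0": "MalmoEnv-MazeRunner-v0",
    "MosaicMarLo-CliffWalking-v0": "MalmoEnv-CliffWalking-v0",
    "MosaicMarLo-CatchTheMob-v0": "MalmoEnv-CatchTheMob-v0",
    "MosaicMarLo-FindTheGoal-v0": "MalmoEnv-FindTheGoal-v0",
    "MosaicMarLo-Attic-v0": "MalmoEnv-Attic-v0",
    "MosaicMarLo-DefaultFlatWorld-v0": "MalmoEnv-DefaultFlatWorld-v0",
    "MosaicMarLo-DefaultWorld-v0": "MalmoEnv-DefaultWorld-v0",
    "MosaicMarLo-Eating-v0": "MalmoEnv-Eating-v0",
    "MosaicMarLo-Obstacles-v0": "MalmoEnv-Obstacles-v0",
    "MosaicMarLo-TrickyArena-v0": "MalmoEnv-TrickyArena-v0",
}


def migrate_env_id(env_id):
    """Map a removed ID to its MalmoEnv replacement; pass new IDs through unchanged."""
    if env_id.startswith("MalmoEnv-"):
        return env_id
    try:
        return OLD_TO_NEW[env_id]
    except KeyError:
        raise KeyError(f"no MalmoEnv replacement listed for {env_id!r}") from None


print(migrate_env_id("MosaicMarLo-Attic-v0"))
print(migrate_env_id("MalmoEnv-MobChase-v0"))
```

IDs that are neither in the table nor already in the MalmoEnv-*-v0 form raise a KeyError rather than being guessed.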