Actor Concept¶

What is an Actor?¶

In MOSAIC, the term Actor refers to any object that can produce an action given the current environment state. Actors operate entirely inside the GUI process and are invoked synchronously on every environment step.

Actors are intentionally lightweight. They do not own environments, do not spawn subprocesses, and do not perform gradient updates. Their single responsibility is: given a snapshot of the world, return an action.

This separates the concerns of training (handled by Workers) from inference (handled by Actors):

Concern	Handled by	Lives in
Policy training	Worker subprocess (CleanRL, Ray, XuanCe)	Isolated subprocess via gRPC or IPC
Action selection	Actor (`CleanRLWorkerActor`, `HumanKeyboardActor`)	GUI main process
Actor coordination	`ActorService`	GUI main process

The Two Actor Protocols¶

MOSAIC defines two complementary protocols in gym_gui/services/actor.py:

`Actor` - Simple, Single-Agent Protocol¶

The Actor protocol is the original, lightweight interface designed for single-agent environments. Any class that implements these three methods is a valid Actor:

class Actor(Protocol):
    id: str

    def select_action(self, step: StepSnapshot) -> Optional[int]: ...
    def on_step(self, step: StepSnapshot) -> None: ...
    def on_episode_end(self, summary: EpisodeSummary) -> None: ...

`PolicyController` - Paradigm-Aware, Multi-Agent Protocol¶

The PolicyController protocol extends the actor concept for multi-agent environments. It adds:

Agent-specific action selection via select_action(agent_id, obs, info)
Batch action selection via select_actions(observations) for simultaneous (POSG) paradigms where all agents act at the same time
Paradigm declaration via the paradigm property, which signals whether the controller targets AEC (turn-based) or POSG (simultaneous) environments
Per-agent lifecycle hooks via on_step_result and on_episode_end, which carry the agent_id alongside the usual feedback

Data Containers¶

Two frozen dataclasses carry data between the environment loop and actors:

`StepSnapshot`¶

Passed to select_action and on_step on every environment step:

Field	Description
`step_index`	Zero-based step counter within the current episode
`observation`	Raw observation returned by the environment
`reward`	Reward received on the previous step (`0.0` on the first step)
`terminated`	`True` if the episode ended naturally (goal reached, game over)
`truncated`	`True` if the episode was cut short (time limit)
`seed`	Optional seed used to reset this episode
`info`	Auxiliary environment info dict

`EpisodeSummary`¶

Delivered via on_episode_end at the end of every episode:

Field	Description
`episode_index`	Zero-based episode counter for the current session
`total_reward`	Sum of all rewards across the episode
`steps`	Number of steps taken in the episode
`metadata`	Arbitrary key-value pairs (worker-specific diagnostics)

When to use Actors vs Workers¶

You want to	Use	Why
Train a new policy from scratch	Worker (CleanRL, Ray, XuanCe)	Workers own the training loop, checkpointing, and telemetry
Evaluate a trained policy in the GUI	Actor (loads checkpoint, selects actions)	Actors are lightweight and run inside the GUI process
Let a human play an environment	`HumanKeyboardActor`	Forwards keyboard input captured by `HumanInputController`
Track which backend is currently active	`CleanRLWorkerActor` or `RayRLlibWorkerActor`	Placeholder actors represent active training backends in the UI

Actor Concept¶

What is an Actor?¶

The Two Actor Protocols¶

Actor - Simple, Single-Agent Protocol¶

PolicyController - Paradigm-Aware, Multi-Agent Protocol¶