Actor Concept
What is an Actor?
In MOSAIC, the term Actor refers to any object that can produce an action given the current environment state. Actors operate entirely inside the GUI process and are invoked synchronously on every environment step.
Actors are intentionally lightweight. They do not own environments, do not spawn subprocesses, and do not perform gradient updates. Their single responsibility is: given a snapshot of the world, return an action.
This separates the concerns of training (handled by Workers) from inference (handled by Actors):
| Concern | Handled by | Lives in |
|---|---|---|
| Policy training | Worker subprocess (CleanRL, Ray, XuanCe) | Isolated subprocess via gRPC or IPC |
| Action selection | Actor ( | GUI main process |
| Actor coordination | | GUI main process |
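To make the "invoked synchronously on every environment step" point concrete, here is a minimal sketch of a driver loop. It is not MOSAIC's actual loop: `Snapshot`, `AlwaysZeroActor`, `run_episode`, and `make_countdown_env` are hypothetical names, and the snapshot fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for StepSnapshot (field names are assumed)."""
    step_index: int
    observation: Any
    reward: float


class AlwaysZeroActor:
    """Toy actor: owns no environment, spawns nothing, learns nothing."""
    id = "always-zero"

    def select_action(self, step: Snapshot) -> Optional[int]:
        return 0


def run_episode(
    actor: AlwaysZeroActor,
    env_step: Callable[[int], Tuple[Any, float, bool]],
    first_obs: Any,
) -> List[int]:
    """Drive one episode, asking the actor for an action on every step."""
    actions: List[int] = []
    obs, reward, done, index = first_obs, 0.0, False, 0
    while not done:
        # Synchronous call: the loop waits for the actor's answer.
        action = actor.select_action(Snapshot(index, obs, reward))
        actions.append(action)
        obs, reward, done = env_step(action)
        index += 1
    return actions


def make_countdown_env(steps: int) -> Callable[[int], Tuple[Any, float, bool]]:
    """Fake environment that terminates after `steps` transitions."""
    remaining = [steps]

    def env_step(action: int) -> Tuple[Any, float, bool]:
        remaining[0] -= 1
        return remaining[0], 1.0, remaining[0] == 0

    return env_step


actions = run_episode(AlwaysZeroActor(), make_countdown_env(3), first_obs=3)
```

The loop, not the actor, owns the environment here, which mirrors the division of labour described above.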
The Two Actor Protocols
MOSAIC defines two complementary protocols in gym_gui/services/actor.py:
Actor - Simple, Single-Agent Protocol
The Actor protocol is the original, lightweight interface designed for
single-agent environments. Any class that exposes an id attribute and
implements these three methods is a valid Actor:
```python
from typing import Optional, Protocol


class Actor(Protocol):
    id: str

    def select_action(self, step: StepSnapshot) -> Optional[int]: ...
    def on_step(self, step: StepSnapshot) -> None: ...
    def on_episode_end(self, summary: EpisodeSummary) -> None: ...
```
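As an illustration of how little is needed to satisfy this protocol, the sketch below defines a hypothetical `RandomActor`. The `StepSnapshot` and `EpisodeSummary` classes are simplified stand-ins with assumed field names, and the protocol is restated only so the example runs on its own.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional, Protocol


@dataclass(frozen=True)
class StepSnapshot:
    """Stand-in container; these field names are assumptions."""
    step_index: int
    observation: Any
    reward: float


@dataclass(frozen=True)
class EpisodeSummary:
    """Stand-in container; these field names are assumptions."""
    episode_index: int
    total_reward: float
    steps: int


class Actor(Protocol):
    id: str

    def select_action(self, step: StepSnapshot) -> Optional[int]: ...
    def on_step(self, step: StepSnapshot) -> None: ...
    def on_episode_end(self, summary: EpisodeSummary) -> None: ...


class RandomActor:
    """Hypothetical actor that samples a discrete action uniformly."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.id = "random"
        self._n_actions = n_actions
        self._rng = random.Random(seed)
        self.steps_seen = 0
        self.episodes_seen = 0

    def select_action(self, step: StepSnapshot) -> Optional[int]:
        return self._rng.randrange(self._n_actions)

    def on_step(self, step: StepSnapshot) -> None:
        self.steps_seen += 1  # bookkeeping only, no learning

    def on_episode_end(self, summary: EpisodeSummary) -> None:
        self.episodes_seen += 1


# Structural typing: RandomActor never inherits from Actor.
actor: Actor = RandomActor(n_actions=4)
snapshot = StepSnapshot(step_index=0, observation=[0.0, 1.0], reward=0.0)
action = actor.select_action(snapshot)
actor.on_step(snapshot)
actor.on_episode_end(EpisodeSummary(episode_index=0, total_reward=0.0, steps=1))
```

Because `Actor` is a `typing.Protocol`, a static type checker accepts `RandomActor` wherever an `Actor` is expected as long as the attribute and method signatures match.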
PolicyController - Paradigm-Aware, Multi-Agent Protocol
The PolicyController protocol extends the actor concept for multi-agent
environments. It adds:
- Agent-specific action selection via select_action(agent_id, obs, info)
- Batch action selection via select_actions(observations) for simultaneous (POSG) paradigms where all agents act at the same time
- Paradigm declaration via the paradigm property, which signals whether the controller targets AEC (turn-based) or POSG (simultaneous) environments
- Per-agent lifecycle hooks via on_step_result and on_episode_end, which carry the agent_id alongside the usual feedback
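The following sketch shows one way a controller with these four capabilities could look. Only the method and property names come from the list above. The `Paradigm` enum, the `FixedActionController` class, the type annotations, and the parameters of `on_step_result` and `on_episode_end` beyond `agent_id` are assumptions, not MOSAIC's actual signatures.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Paradigm(Enum):
    """Hypothetical enum; the real type of `paradigm` may differ."""
    AEC = "aec"    # turn-based
    POSG = "posg"  # simultaneous


class FixedActionController:
    """Toy controller that maps each agent to one fixed action."""

    def __init__(self, actions: Dict[str, int]) -> None:
        self._actions = dict(actions)
        self.returns: Dict[str, float] = {agent: 0.0 for agent in actions}
        self.finished: Dict[str, bool] = {agent: False for agent in actions}

    @property
    def paradigm(self) -> Paradigm:
        return Paradigm.POSG

    def select_action(
        self, agent_id: str, obs: Any, info: Optional[dict] = None
    ) -> Optional[int]:
        # Agent-specific selection; unknown agents get no action.
        return self._actions.get(agent_id)

    def select_actions(self, observations: Dict[str, Any]) -> Dict[str, Optional[int]]:
        # Batch selection: one action per agent present in the observations.
        return {
            agent_id: self.select_action(agent_id, obs)
            for agent_id, obs in observations.items()
        }

    def on_step_result(self, agent_id: str, reward: float) -> None:
        # Assumed signature: accumulate per-agent return.
        self.returns[agent_id] += reward

    def on_episode_end(self, agent_id: str) -> None:
        # Assumed signature: mark the agent's episode as finished.
        self.finished[agent_id] = True


controller = FixedActionController({"agent_0": 0, "agent_1": 1})
joint_action = controller.select_actions({"agent_0": [0.1], "agent_1": [0.9]})
controller.on_step_result("agent_0", 1.5)
controller.on_episode_end("agent_0")
```

In a turn-based (AEC) setting a caller would use `select_action` for whichever agent is due to move, whereas a simultaneous (POSG) loop would call `select_actions` once per step.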
Data Containers
Two frozen dataclasses carry data between the environment loop and actors:
StepSnapshot
Passed to select_action and on_step on every environment step:
| Field | Description |
|---|---|
| | Zero-based step counter within the current episode |
| | Raw observation returned by the environment |
| | Reward received on the previous step ( |
| | |
| | |
| | Optional seed used to reset this episode |
| | Auxiliary environment info dict |
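A frozen dataclass matching the descriptions above might look like the sketch below. Every field name is an assumption, because only the descriptions are given here. In particular, `terminated` and `truncated` are guesses borrowed from the Gymnasium step convention, not names confirmed by MOSAIC.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class StepSnapshot:
    """Illustrative stand-in; all field names are assumed."""
    step_index: int                  # zero-based step counter
    observation: Any                 # raw observation from the environment
    reward: float = 0.0              # reward received on the previous step
    terminated: bool = False         # guessed field
    truncated: bool = False          # guessed field
    seed: Optional[int] = None       # optional seed used to reset the episode
    info: Dict[str, Any] = field(default_factory=dict)  # auxiliary info


first = StepSnapshot(step_index=0, observation=[0.1, -0.2], seed=42)

# Frozen dataclasses reject attribute assignment after construction.
try:
    first.reward = 1.0
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace builds a modified copy and leaves the original intact.
second = replace(first, step_index=1, reward=1.0)
```

Freezing the container means an actor cannot accidentally modify the snapshot that the environment loop later hands to other consumers.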
EpisodeSummary
Delivered via on_episode_end at the end of every episode:
| Field | Description |
|---|---|
| | Zero-based episode counter for the current session |
| | Sum of all rewards across the episode |
| | Number of steps taken in the episode |
| | Arbitrary key-value pairs (worker-specific diagnostics) |
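The sketch below shows how such a summary could be derived from the rewards collected during an episode. The field names and the `summarize` helper are hypothetical and only mirror the descriptions in the table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class EpisodeSummary:
    """Illustrative stand-in; all field names are assumed."""
    episode_index: int               # zero-based episode counter
    total_reward: float              # sum of all rewards in the episode
    steps: int                       # number of steps taken
    metadata: Dict[str, Any] = field(default_factory=dict)  # diagnostics


def summarize(
    episode_index: int, rewards: Sequence[float], **metadata: Any
) -> EpisodeSummary:
    """Fold a sequence of per-step rewards into one summary object."""
    return EpisodeSummary(
        episode_index=episode_index,
        total_reward=float(sum(rewards)),
        steps=len(rewards),
        metadata=dict(metadata),
    )


summary = summarize(0, [1.0, 0.0, 2.5], backend="demo")
```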
When to use Actors vs Workers
| You want to | Use | Why |
|---|---|---|
| Train a new policy from scratch | Worker (CleanRL, Ray, XuanCe) | Workers own the training loop, checkpointing, and telemetry |
| Evaluate a trained policy in the GUI | Actor (loads checkpoint, selects actions) | Actors are lightweight and run inside the GUI process |
| Let a human play an environment | | Forwards keyboard input captured by |
| Track which backend is currently active | | Placeholder actors represent active training backends in the UI |