Actor Architecture

This page describes the internal structure of the Actor subsystem and how it integrates with the rest of MOSAIC at evaluation time.

High-Level Position in MOSAIC

Actors sit at the boundary between the GUI evaluation loop and agent decision logic. The diagram below shows the five components involved:

```mermaid
graph LR
    SC["SessionController"] --> AS["ActorService"]
    AS --> A["Active Actor"]
    PM["PolicyMappingService"] --> AS
    A -.-> W["Worker Subprocess"]

    style SC fill:#d6eaf8,stroke:#2874a6
    style AS fill:#d6eaf8,stroke:#2874a6
    style PM fill:#d6eaf8,stroke:#2874a6
    style A fill:#eafaf1,stroke:#1e8449
    style W fill:#fef9e7,stroke:#d4ac0d
```

The dashed arrow from the active actor to the worker subprocess marks that actor as a placeholder. The worker subprocess runs its own loop, and the GUI-side actor only reports which backend is active.

ActorService Internals

ActorService maintains two internal maps and a pointer to the active actor:

| Attribute | Purpose |
| --- | --- |
| `_actors` | Maps `actor_id` to `Actor` instance |
| `_descriptors` | Maps `actor_id` to `ActorDescriptor` (display name, policy label, and backend label for the UI widget) |
| `_active_actor_id` | ID of the single actor that currently receives `select_action` calls |

```mermaid
graph TD
    REG["register_actor()"] --> DB[("_actors + _descriptors")]
    ACT["set_active_actor(id)"] --> DB
    DB --> SEL["select_action(snapshot)"]
    DB --> STEP["notify_step(snapshot)"]
    DB --> END["notify_episode_end(summary)"]
    SEED["seed(n)"] --> DB

    style DB fill:#eaf4fb,stroke:#2874a6
```
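The registry and dispatch paths in the diagram can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not MOSAIC's implementation. Only the attribute names and the `register_actor`, `set_active_actor`, `select_action`, `notify_step` and `notify_episode_end` method names come from this page. The rest is hypothetical: the `Actor` protocol, the `on_step`/`on_episode_end` hook names, returning `None` when no actor is active, forwarding notifications to the active actor only, and the toy `ConstantActor`. Seeding is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol


class Actor(Protocol):
    """Minimal actor interface assumed for this sketch."""

    def select_action(self, snapshot: Any) -> Optional[int]: ...


@dataclass(frozen=True)
class ActorDescriptor:
    # Display metadata for the Active Actor widget; select_action never reads it.
    display_name: str
    policy_label: str
    backend_label: str


class ActorService:
    def __init__(self) -> None:
        self._actors: Dict[str, Actor] = {}
        self._descriptors: Dict[str, ActorDescriptor] = {}
        self._active_actor_id: Optional[str] = None

    def register_actor(
        self, actor_id: str, actor: Actor, descriptor: ActorDescriptor
    ) -> None:
        # Registration alone does not activate the actor (assumption).
        self._actors[actor_id] = actor
        self._descriptors[actor_id] = descriptor

    def set_active_actor(self, actor_id: str) -> None:
        if actor_id not in self._actors:
            raise KeyError(f"unknown actor_id: {actor_id!r}")
        # Only the pointer moves; no session state is reset.
        self._active_actor_id = actor_id

    def _active(self) -> Optional[Actor]:
        if self._active_actor_id is None:
            return None
        return self._actors[self._active_actor_id]

    def select_action(self, snapshot: Any) -> Optional[int]:
        actor = self._active()
        # With no active actor, return None (no action) rather than raising.
        return None if actor is None else actor.select_action(snapshot)

    def notify_step(self, snapshot: Any) -> None:
        # Forward to the active actor's optional on_step hook (hypothetical name).
        hook = getattr(self._active(), "on_step", None)
        if callable(hook):
            hook(snapshot)

    def notify_episode_end(self, summary: Any) -> None:
        # Forward to the active actor's optional on_episode_end hook (hypothetical name).
        hook = getattr(self._active(), "on_episode_end", None)
        if callable(hook):
            hook(summary)


class ConstantActor:
    # Toy actor for the usage example: always returns the same action.
    def __init__(self, action: int) -> None:
        self.action = action

    def select_action(self, snapshot: Any) -> int:
        return self.action


service = ActorService()
service.register_actor("a", ConstantActor(1), ActorDescriptor("A", "constant", "local"))
service.register_actor("b", ConstantActor(2), ActorDescriptor("B", "constant", "local"))
service.set_active_actor("a")
print(service.select_action(None))  # 1
service.set_active_actor("b")
print(service.select_action(None))  # 2
```

Switching from `"a"` to `"b"` changes only `_active_actor_id`. Both actors stay registered, which is what lets a session swap actors without restarting.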

Key design decisions:

- Only one actor is active at a time. Multiple actors can be registered (for example, one per training backend), but only the active one receives `select_action` calls. Switching actors does not restart the session.
- Seeding is broadcast to all actors. `ActorService.seed(n)` iterates over every registered actor and calls its `seed` method if it defines one, so every actor, active or not, has deterministic state when a new episode begins.
- Descriptors are UI-only. `ActorDescriptor` carries display metadata for the Active Actor widget and has no effect on action selection.
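The seed broadcast can be illustrated with a small standalone helper. `broadcast_seed`, `RandomActor` and `NoSeedActor` are hypothetical names for this sketch. It assumes only what the text states: `seed` is optional on actors, and every registered actor is seeded, not just the active one.

```python
import random
from typing import Any, Iterable


def broadcast_seed(actors: Iterable[Any], n: int) -> int:
    """Call seed(n) on every actor that defines it; return how many were seeded."""
    seeded = 0
    for actor in actors:
        seed_fn = getattr(actor, "seed", None)
        if callable(seed_fn):  # seed is optional, so actors without it are skipped
            seed_fn(n)
            seeded += 1
    return seeded


class RandomActor:
    # Hypothetical actor with a private RNG; seeding makes its choices reproducible.
    def __init__(self) -> None:
        self._rng = random.Random()

    def seed(self, n: int) -> None:
        self._rng.seed(n)

    def select_action(self, snapshot: Any) -> int:
        return self._rng.randrange(4)


class NoSeedActor:
    # Hypothetical actor without a seed method.
    def select_action(self, snapshot: Any) -> None:
        return None


a, b = RandomActor(), RandomActor()
count = broadcast_seed([a, b, NoSeedActor()], 42)
print(count)  # 2
# Both seeded actors now produce the same action sequence.
print([a.select_action(None) for _ in range(5)] == [b.select_action(None) for _ in range(5)])  # True
```

Seeding inactive actors as well means an actor switched in mid-session still starts from a reproducible RNG state.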

Policy Mapping Integration

In multi-agent environments, `PolicyMappingService` maps each `agent_id` to an `actor_id`. Before calling `select_action`, `SessionController` uses this mapping to activate the correct actor for the current agent:

```mermaid
sequenceDiagram
    participant Env as Environment
    participant SC as SessionController
    participant PM as PolicyMappingService
    participant AS as ActorService
    participant A as Active Actor

    Env->>SC: obs, reward, done, agent_id
    SC->>PM: get_actor_id(agent_id)
    PM-->>SC: actor_id
    SC->>AS: set_active_actor(actor_id)
    SC->>AS: select_action(StepSnapshot)
    AS->>A: select_action(snapshot)
    A-->>AS: action
    AS-->>SC: action
    SC->>Env: env.step(action)
```
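A single turn of the sequence diagram can be sketched as the function below. These are hypothetical stand-ins, not MOSAIC's classes: the `StepSnapshot` fields, the dict-backed `PolicyMappingService`, the stripped-down `MiniActorService`, `controller_step` and `FixedActor`. The sketch assumes real actors that always return an action.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class StepSnapshot:
    # Hypothetical fields; the real StepSnapshot may carry more.
    agent_id: str
    obs: Any


class PolicyMappingService:
    # Sketch: a plain dict from agent_id to actor_id.
    def __init__(self, mapping: Dict[str, str]) -> None:
        self._mapping = dict(mapping)

    def get_actor_id(self, agent_id: str) -> str:
        return self._mapping[agent_id]


class MiniActorService:
    # Stripped-down stand-in for ActorService: registry plus active pointer.
    def __init__(self) -> None:
        self._actors: Dict[str, Any] = {}
        self._active_actor_id: Optional[str] = None

    def register_actor(self, actor_id: str, actor: Any) -> None:
        self._actors[actor_id] = actor

    def set_active_actor(self, actor_id: str) -> None:
        self._active_actor_id = actor_id

    def select_action(self, snapshot: StepSnapshot) -> Any:
        return self._actors[self._active_actor_id].select_action(snapshot)


def controller_step(
    agent_id: str,
    obs: Any,
    policy_map: PolicyMappingService,
    actors: MiniActorService,
    env_step: Callable[[Any], None],
) -> Any:
    """One turn of the sequence diagram, as SessionController might run it."""
    actor_id = policy_map.get_actor_id(agent_id)  # agent_id -> actor_id
    actors.set_active_actor(actor_id)             # activate that agent's actor
    action = actors.select_action(StepSnapshot(agent_id, obs))
    env_step(action)                              # env.step(action)
    return action


class FixedActor:
    # Toy actor that always returns the same action.
    def __init__(self, action: int) -> None:
        self.action = action

    def select_action(self, snapshot: StepSnapshot) -> int:
        return self.action


stepped: List[Any] = []
svc = MiniActorService()
svc.register_actor("keyboard", FixedActor(0))
svc.register_actor("scripted", FixedActor(3))
pm = PolicyMappingService({"agent_0": "keyboard", "agent_1": "scripted"})
for agent in ["agent_0", "agent_1", "agent_0"]:
    controller_step(agent, obs=None, policy_map=pm, actors=svc, env_step=stepped.append)
print(stepped)  # [0, 3, 0]
```

Activation happens on every turn. As a result, agents mapped to different actors are each served by their own actor, even though only one actor is active at any moment.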

Placeholder Actors vs Real Actors

MOSAIC has two categories of actor:

| Actor | Category | How action is produced |
| --- | --- | --- |
| `HumanKeyboardActor` | Real actor | Reads the pending key press set by `HumanInputController` and returns it as an `int` |
| `CleanRLWorkerActor` | Placeholder | Always returns `None`; the CleanRL subprocess manages its own `env.step` |
| `XuanCeWorkerActor` | Placeholder | Always returns `None`; the XuanCe subprocess manages its own training loop |
| `RayRLlibWorkerActor` | Placeholder | Always returns `None`; the Ray cluster manages its own distributed loop |

When a placeholder actor is active, the GUI evaluation loop receives `None` from `select_action` and treats it as a no-op, so it does not call `env.step`. The Active Actor widget still shows which backend is running, but the worker, not the GUI, drives the environment steps.
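The difference between the two categories can be sketched as follows. Several names here are hypothetical:

- `HumanInputController` is a stand-in with a `press` method and a `pending_action` attribute.
- `WorkerPlaceholderActor` is a generic stand-in for the three worker actors in the table.
- `gui_tick` represents one iteration of the GUI evaluation loop.

Two behaviors are also assumptions: the pending key is cleared after it is read, and `HumanKeyboardActor` returns `None` when no key is pending.

```python
from typing import Any, Callable, List, Optional


class HumanInputController:
    # Hypothetical stand-in: stores the most recent key press as an int action.
    def __init__(self) -> None:
        self.pending_action: Optional[int] = None

    def press(self, action: int) -> None:
        self.pending_action = action


class HumanKeyboardActor:
    # Real actor: returns whatever key press is pending.
    def __init__(self, controller: HumanInputController) -> None:
        self._controller = controller

    def select_action(self, snapshot: Any) -> Optional[int]:
        # Consume the pending key press (clearing it is an assumption).
        action = self._controller.pending_action
        self._controller.pending_action = None
        return action


class WorkerPlaceholderActor:
    # Placeholder: the worker subprocess drives env.step, so the GUI gets no action.
    def __init__(self, backend_label: str) -> None:
        self.backend_label = backend_label

    def select_action(self, snapshot: Any) -> None:
        return None


def gui_tick(actor: Any, snapshot: Any, env_step: Callable[[int], None]) -> bool:
    """Return True if the GUI stepped the environment, False for a no-op."""
    action = actor.select_action(snapshot)
    if action is None:
        return False  # no-op: do not call env.step
    env_step(action)
    return True


steps: List[int] = []
controller = HumanInputController()
keyboard = HumanKeyboardActor(controller)
placeholder = WorkerPlaceholderActor("CleanRL")

controller.press(2)
print(gui_tick(keyboard, None, steps.append))     # True: key press forwarded
print(gui_tick(placeholder, None, steps.append))  # False: worker owns env.step
print(steps)                                      # [2]
```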