Actor Lifecycle¶
This page traces the complete lifecycle of an Actor from registration through episode end.
Overview¶
%%{init: {"flowchart": {"curve": "linear"}} }%%
graph TD
A["1. Register Actor<br/>ActorService.register_actor()"]
B["2. Activate Actor<br/>set_active_actor(id)"]
C["3. Seed Broadcast<br/>ActorService.seed(n)"]
D["4. Episode Begin<br/>(env.reset called externally)"]
E["5. Step Loop<br/>select_action → env.step → on_step"]
F{"Episode done?"}
G["6. Episode End<br/>notify_episode_end(summary)"]
H{"Session over?"}
I["7. Actor remains registered<br/>(available for next episode)"]
A --> B --> C --> D --> E --> F
F -->|"No"| E
F -->|"Yes"| G --> H
H -->|"No: next episode"| C
H -->|"Yes"| I
style A fill:#d6eaf8,stroke:#2874a6
style B fill:#d6eaf8,stroke:#2874a6
style C fill:#d5f5e3,stroke:#1e8449
style D fill:#fef9e7,stroke:#d4ac0d
style E fill:#fef9e7,stroke:#d4ac0d
style G fill:#f9ebea,stroke:#c0392b
style I fill:#eaecee,stroke:#717d7e
Phase 1: Registration¶
Actors are registered with ActorService.register_actor() at session
startup. Multiple actors can be registered in a single session:
actor_service = ActorService()
actor_service.register_actor(
HumanKeyboardActor(),
display_name="Human",
description="Keyboard-controlled agent",
)
actor_service.register_actor(
CleanRLWorkerActor(),
display_name="CleanRL PPO",
backend_label="cleanrl_worker",
activate=False, # don't immediately switch to this actor
)
The first actor registered automatically becomes the active actor unless
activate=False is passed.
Phase 2: Activation¶
Only one actor is active at a time. The GUI switches the active actor via
set_active_actor(actor_id) in response to user interaction (e.g.
selecting a different backend in the Active Actor widget) or programmatically
when PolicyMappingService routes a specific agent to a different actor:
# Switch to human control
actor_service.set_active_actor("human_keyboard")
# Switch back to autonomous backend
actor_service.set_active_actor("cleanrl_worker")
Phase 3: Seed Broadcast¶
Before every episode, ActorService.seed(n) broadcasts the episode seed to
all registered actors (not just the active one). This ensures every actor
has a deterministic RNG state, even if it is switched in mid-session:
actor_service.seed(42) # all actors receive seed(42)
If an actor does not implement a seed method the call is silently skipped.
Errors during seeding are caught and logged with LOG_SERVICE_ACTOR_SEED_ERROR
rather than crashing the evaluation loop.
Phase 4: Episode Begin¶
Episode setup (env.reset) is handled by SessionController, not by the
actor directly. Actors are stateless with respect to environment reset; they
simply start receiving StepSnapshot objects once the environment is ready.
Phase 5: Step Loop¶
On each step the evaluation loop:
Builds a
StepSnapshotfrom the environment response.Calls
ActorService.select_action(snapshot)→ dispatched to the active actor.Applies the returned action via
env.step(action).Calls
ActorService.notify_step(snapshot)→ dispatched to the active actor.
%%{init: {"sequenceDiagram": {"mirrorActors": false}} }%%
sequenceDiagram
participant SC as SessionController
participant AS as ActorService
participant A as Active Actor
participant Env as Environment
SC->>AS: select_action(StepSnapshot)
AS->>A: actor.select_action(snapshot)
A-->>AS: action (int or None)
AS-->>SC: action
SC->>Env: env.step(action)
Env-->>SC: obs, reward, terminated, truncated, info
SC->>AS: notify_step(StepSnapshot)
AS->>A: actor.on_step(snapshot)
If select_action returns None the evaluation loop treats this as a
no-op (typically NOOP action 0 or environment-specific default).
Phase 6: Episode End¶
When terminated or truncated is True, SessionController builds an
EpisodeSummary and calls ActorService.notify_episode_end(summary),
which dispatches to the active actor’s on_episode_end hook:
summary = EpisodeSummary(
episode_index=3,
total_reward=18.5,
steps=200,
metadata={"truncated": True},
)
actor_service.notify_episode_end(summary)
Phase 7: Session End¶
Actors are not explicitly destroyed at session end. The ActorService
instance is garbage-collected along with the session. Actors that hold
external resources (e.g. a loaded PyTorch checkpoint) should implement
__del__ or a close() method if cleanup is required.
Error Handling¶
Failure |
Behaviour |
|---|---|
|
Exception propagates to |
|
Exception propagates to |
|
Exception is caught; logged with |
|
|