PolicyMappingService

The PolicyMappingService is the core abstraction for assigning policies to agents in multi-agent environments.

Overview

Unlike single-agent RL, where a single policy controls the only agent, multi-agent environments require mapping each agent to the policy that controls it. MOSAIC’s PolicyMappingService manages this mapping and is aware of the stepping paradigm in use, sequential or simultaneous.

from gym_gui.services import PolicyMappingService

service = PolicyMappingService()

# Configure agents
service.set_agents(["player_0", "player_1"])

# Bind different policies
service.bind_agent_policy("player_0", "human_keyboard")
service.bind_agent_policy("player_1", "cleanrl_ppo", config={"model_path": "..."})

AgentPolicyBinding

Each binding stores the relationship between an agent and its policy:

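from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class AgentPolicyBinding:
    agent_id: str                     # agent this binding applies to
    policy_id: str                    # policy that controls the agent
    worker_id: Optional[str] = None   # optional worker identifier
    config: Dict[str, Any] = field(default_factory=dict)  # per-binding policy options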
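
One consequence of `field(default_factory=dict)` is that every binding gets its own `config` dictionary, so changing one binding's configuration does not leak into another. A minimal sketch, repeating the dataclass above so that it runs standalone (the `"seed"` key is purely illustrative and not a documented option):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AgentPolicyBinding:
    agent_id: str
    policy_id: str
    worker_id: Optional[str] = None
    config: Dict[str, Any] = field(default_factory=dict)


human = AgentPolicyBinding("player_0", "human_keyboard")
trained = AgentPolicyBinding("player_1", "cleanrl_ppo")

# Each instance owns a fresh dict, so this does not affect `human`.
trained.config["seed"] = 0  # hypothetical key, for illustration only

print(human.config)    # {}
print(trained.config)  # {'seed': 0}
```

Had the default been written as a shared mutable value, Python's dataclass machinery would reject it with a `ValueError` at class definition time, which is why the factory form is used.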

Action Selection

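The service supports two action-selection modes: sequential, following the Agent Environment Cycle (AEC) model in which agents act one at a time, and simultaneous, following the partially observable stochastic game (POSG) model in which all agents act at each step: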

Sequential (AEC)

# Get action for current agent
action = service.select_action(agent_id, snapshot)

Simultaneous (POSG)

# Get actions for all agents at once
actions = service.select_actions(observations, snapshots)
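
The difference between the two modes can be illustrated with a toy dispatcher. This is a sketch under stated assumptions, not MOSAIC's implementation: `ToyPolicyMapping`, its `bind` method, the callable-based policy registry, and the dict-shaped observations and snapshots are all hypothetical stand-ins chosen to mirror the method names above.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical policy type: maps a snapshot to an integer action.
Policy = Callable[[Any], int]


class ToyPolicyMapping:
    """Illustrative stand-in; not the real PolicyMappingService."""

    def __init__(self) -> None:
        self._policies: Dict[str, Policy] = {}

    def bind(self, agent_id: str, policy: Policy) -> None:
        self._policies[agent_id] = policy

    def select_action(self, agent_id: str, snapshot: Any) -> Optional[int]:
        # Sequential (AEC): only the agent whose turn it is acts.
        policy = self._policies.get(agent_id)
        return policy(snapshot) if policy is not None else None

    def select_actions(
        self, observations: Dict[str, Any], snapshots: Dict[str, Any]
    ) -> Dict[str, Optional[int]]:
        # Simultaneous (POSG): every agent with an observation acts this step.
        return {
            agent_id: self.select_action(agent_id, snapshots.get(agent_id))
            for agent_id in observations
        }


mapping = ToyPolicyMapping()
mapping.bind("player_0", lambda snapshot: 0)  # always picks action 0
mapping.bind("player_1", lambda snapshot: 1)  # always picks action 1

# Sequential: one agent, one action.
single = mapping.select_action("player_0", snapshot={"step": 0})

# Simultaneous: one action per agent, keyed by agent id.
joint = mapping.select_actions(
    observations={"player_0": [0.0], "player_1": [1.0]},
    snapshots={"player_0": {"step": 0}, "player_1": {"step": 0}},
)
print(single, joint)
```

In this sketch the simultaneous call is just the sequential call applied to each agent. A real service could instead batch observations or dispatch to workers.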

Step Notification

Notify policies of step results and episode ends so that learning policies can update:

# Per-agent notification
service.notify_step(agent_id, snapshot)

# Episode end notification
service.notify_episode_end(agent_id, summary)
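
How a policy consumes these notifications depends on the policy. As a hedged illustration, the sketch below shows a hypothetical learning policy that buffers step snapshots and clears the buffer at episode end. `RecordingPolicy`, its `on_step` and `on_episode_end` hooks, `ToyNotifier`, and the dict-shaped snapshot and summary are assumptions made for this example, not part of the documented API.

```python
from typing import Any, Dict, List


class RecordingPolicy:
    """Hypothetical policy that collects per-step data for later learning."""

    def __init__(self) -> None:
        self.buffer: List[Any] = []
        self.episodes_seen = 0
        self.last_episode_length = 0

    def on_step(self, snapshot: Any) -> None:
        self.buffer.append(snapshot)

    def on_episode_end(self, summary: Any) -> None:
        # Record how many steps were buffered, then reset for the next episode.
        self.last_episode_length = len(self.buffer)
        self.episodes_seen += 1
        self.buffer.clear()


class ToyNotifier:
    """Illustrative stand-in that routes notifications by agent id."""

    def __init__(self, policies: Dict[str, RecordingPolicy]) -> None:
        self._policies = policies

    def notify_step(self, agent_id: str, snapshot: Any) -> None:
        policy = self._policies.get(agent_id)
        if policy is not None:
            policy.on_step(snapshot)

    def notify_episode_end(self, agent_id: str, summary: Any) -> None:
        policy = self._policies.get(agent_id)
        if policy is not None:
            policy.on_episode_end(summary)


learner = RecordingPolicy()
notifier = ToyNotifier({"player_1": learner})

for reward in (0.0, 0.0, 1.0):
    notifier.notify_step("player_1", {"reward": reward})
notifier.notify_step("player_0", {"reward": 0.0})  # unbound agent: ignored

notifier.notify_episode_end("player_1", {"total_reward": 1.0})
print(learner.episodes_seen, learner.last_episode_length, learner.buffer)
```

Routing by agent id means a policy such as a human keyboard controller can simply have no learning hooks, while a trained policy receives only the steps of the agent it controls.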

Integration with SessionController

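The SessionController delegates action selection in the game loop to PolicyMappingService, falling back to the legacy ActorService when no mapping is configured or no agent is active: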

def _select_agent_action(self) -> Optional[int]:
    agent_id = self._get_active_agent()

    # NOTE: `snapshot` is not defined in this excerpt; it is presumably the
    # latest step snapshot held by the controller.
    if self._policy_mapping is not None and agent_id:
        return self._policy_mapping.select_action(agent_id, snapshot)

    # Fallback to legacy ActorService
    return self._actor_service.select_action(snapshot)
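
The fallback logic can be exercised in isolation with stub objects. In this sketch, `StubPolicyMapping`, `StubActorService`, and `ToyController` are hypothetical stand-ins, and passing `snapshot` as an argument is an assumption made to keep the example self-contained; the real controller may obtain it differently.

```python
from typing import Any, Optional


class StubPolicyMapping:
    """Hypothetical mapping that always returns action 1."""

    def select_action(self, agent_id: str, snapshot: Any) -> Optional[int]:
        return 1


class StubActorService:
    """Hypothetical legacy service that always returns action 0."""

    def select_action(self, snapshot: Any) -> Optional[int]:
        return 0


class ToyController:
    def __init__(
        self,
        policy_mapping: Optional[StubPolicyMapping],
        actor_service: StubActorService,
        active_agent: Optional[str],
    ) -> None:
        self._policy_mapping = policy_mapping
        self._actor_service = actor_service
        self._active_agent = active_agent

    def _get_active_agent(self) -> Optional[str]:
        return self._active_agent

    def select_agent_action(self, snapshot: Any) -> Optional[int]:
        agent_id = self._get_active_agent()

        # Prefer the per-agent mapping when it exists and an agent is active.
        if self._policy_mapping is not None and agent_id:
            return self._policy_mapping.select_action(agent_id, snapshot)

        # Otherwise fall back to the single legacy actor.
        return self._actor_service.select_action(snapshot)


with_mapping = ToyController(StubPolicyMapping(), StubActorService(), "player_0")
without_mapping = ToyController(None, StubActorService(), "player_0")
no_active_agent = ToyController(StubPolicyMapping(), StubActorService(), None)

print(with_mapping.select_agent_action({}))     # mapping path
print(without_mapping.select_agent_action({}))  # fallback: no mapping
print(no_active_agent.select_agent_action({}))  # fallback: no active agent
```

Note that the `and agent_id` check also routes an empty-string agent id to the fallback, since an empty string is falsy in Python.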