Finite State Machine Environment

The FiniteStateMachineEnv class maps states in a finite state machine to functions that handle the logic of each state. At the end of each state, agents take observations, and at the start of the next step the agents provide actions based on those observations and their respective policies.

It is possible to restrict which agents take actions and compute rewards for each state with the acting_agents and rewarded_agents properties of the FSMStage class.
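The effect of these two properties can be illustrated without Phantom itself. The following is a minimal standalone sketch, not the library's implementation: `Stage` and `agents_for_stage` are hypothetical names, and the fallback used when rewarded_agents is missing follows the FSMStage description further down (a reward is computed for all acting agents).

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional, Sequence, Tuple


@dataclass
class Stage:
    # Hypothetical stand-in for FSMStage; only the fields needed here.
    stage_id: Hashable
    acting_agents: Sequence[Hashable]
    rewarded_agents: Optional[Sequence[Hashable]] = None


def agents_for_stage(stage: Stage) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (acting, rewarded) agent IDs for a stage."""
    acting = list(stage.acting_agents)
    # Assumption taken from the FSMStage description: when rewarded_agents
    # is not given, every acting agent receives a reward.
    if stage.rewarded_agents is None:
        rewarded = list(acting)
    else:
        rewarded = list(stage.rewarded_agents)
    return acting, rewarded


bidding = Stage("BID", acting_agents=["buyer"])
clearing = Stage("CLEAR", acting_agents=["seller"], rewarded_agents=["buyer", "seller"])

print(agents_for_stage(bidding))   # (['buyer'], ['buyer'])
print(agents_for_stage(clearing))  # (['seller'], ['buyer', 'seller'])
```

Here the "CLEAR" stage lets only the seller act while both agents are rewarded, which is the kind of asymmetry the two properties are meant to express.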

In each handler method the user must take care to resolve the messages on the network by calling a resolve method. This is left to the user to allow full flexibility over both when the messages on the network are resolved and, in advanced cases, which resolve method is called.

There are two methods to define the finite state machine structure. It is possible to use a mix of both methods. The following two examples are equivalent.

The first uses the FSMStage as a decorator directly on the state handler method:

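import phantom as ph


class CustomEnv(ph.FiniteStateMachineEnv):
   def __init__(self):
      agents = [MinimalAgent("agent")]

      network = ph.Network(agents)

      super().__init__(num_steps=10, network=network, initial_stage="A")

   # acting_agents is a required FSMStage argument
   @ph.FSMStage(stage_id="A", acting_agents=["agent"], next_stages=["A"])
   def handle(self):
      # Perform any pre-resolve tasks
      # Resolve the messages on the network
      # Perform any post-resolve tasks
      return "A"  # ID of the next stage (assumed return convention)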

The second defines the states via a list of FSMStage instances passed to the FiniteStateMachineEnv init method. This method is needed when the values of parameters passed to the FSMStage initialisers are only known when the environment class is initialised (e.g. lists of agent IDs).

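class CustomEnv(ph.FiniteStateMachineEnv):
   def __init__(self):
      agents = [MinimalAgent("agent")]

      network = ph.Network(agents)

      # Stages are passed as a list rather than declared with the decorator
      stage = ph.FSMStage(
         stage_id="A", acting_agents=["agent"], next_stages=["A"], handler=self.handle
      )

      super().__init__(
         num_steps=10, network=network, initial_stage="A", stages=[stage]
      )

   def handle(self):
      # Perform any pre-resolve tasks
      # Resolve the messages on the network
      # Perform any post-resolve tasks
      return "A"  # ID of the next stage (assumed return convention)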


class phantom.fsm.FiniteStateMachineEnv(num_steps, network, initial_stage, env_supertype=None, agent_supertypes=None, stages=None)[source]

Base environment class that allows implementation of a finite state machine to handle complex multi-step environment setups. This class should not be used directly and should instead be subclassed. Use the FSMStage decorator on handler methods within subclasses of this class to register stages to the FSM.

A ‘stage’ corresponds to a state in the finite state machine; however, to avoid any confusion with Environment states we refer to them as stages. Stage IDs can be of any hashable type, e.g. strings, ints, enums.
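Because stage IDs only need to be hashable, strings, ints and enum members can all serve as keys. A small standalone sketch (the `Stage` enum and the `handlers` table are hypothetical and not part of Phantom) of a handler lookup table keyed by such IDs:

```python
from enum import Enum
from typing import Callable, Dict, Hashable


class Stage(Enum):
    # Hypothetical stage IDs for illustration.
    BID = "bid"
    CLEAR = "clear"


# Any hashable value can key the lookup table: enum members, strings, ints.
handlers: Dict[Hashable, Callable[[], str]] = {
    Stage.BID: lambda: "handling bid stage",
    "settle": lambda: "handling settle stage",
    3: lambda: "handling stage 3",
}

for stage_id in (Stage.BID, "settle", 3):
    print(stage_id, "->", handlers[stage_id]())

# Unhashable values such as lists cannot be used as stage IDs.
try:
    hash(["bid", "clear"])
except TypeError as err:
    print("not a valid stage ID:", err)
```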

  • num_steps (int) – The maximum number of steps the environment allows per episode.

  • network (Network) – A Network class or derived class describing the connections between the agents in the environment.

  • initial_stage (Hashable) – The initial starting stage of the FSM. When the reset() method is called the environment is initialised into this stage.

  • env_supertype (Optional[Supertype]) – Optional Supertype class instance for the environment. If this is set, it will be sampled from and the env_type property set on the class with every call to reset().

  • agent_supertypes (Optional[Mapping[Hashable, Supertype]]) – Optional mapping of agent IDs to Supertype class instances. If these are set, each supertype will be sampled from and the type property set on the related agent with every call to reset().

  • stages (Optional[Sequence[FSMStage]]) – List of FSM stages. FSM stages can be defined via this list or alternatively via the FSMStage decorator.

class Step(observations, rewards, terminations, truncations, infos)
count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

infos: Dict[Hashable, Any]

Alias for field number 4

observations: Dict[Hashable, Any]

Alias for field number 0

rewards: Dict[Hashable, float]

Alias for field number 1

terminations: Dict[Hashable, bool]

Alias for field number 2

truncations: Dict[Hashable, bool]

Alias for field number 3
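Since Step behaves like a named tuple, each field can be read by name or by position, and the whole object can be unpacked directly. The sketch below uses a stand-in `Step` built with typing.NamedTuple rather than the class nested in the environment; the agent ID and the values are made up for illustration.

```python
from typing import Any, Dict, Hashable, NamedTuple


class Step(NamedTuple):
    # Stand-in with the same field order as the documented Step class.
    observations: Dict[Hashable, Any]
    rewards: Dict[Hashable, float]
    terminations: Dict[Hashable, bool]
    truncations: Dict[Hashable, bool]
    infos: Dict[Hashable, Any]


step = Step(
    observations={"agent": [0.5]},
    rewards={"agent": 1.0},
    terminations={"agent": False},
    truncations={"agent": False},
    infos={"agent": {}},
)

# Access by field name or by the field number listed above.
assert step.rewards is step[1]
assert step.infos is step[4]

# Unpack in field order.
observations, rewards, terminations, truncations, infos = step
print(rewards["agent"])  # 1.0
```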

property agent_ids: List[Hashable]

Return a list of the IDs of the agents in the environment.

property agents: Dict[Hashable, Agent]

Return a mapping of agent IDs to agents in the environment.


close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections.

property current_stage: Hashable

Returns the current stage of the FSM Env.

property current_step: int

Return the current step of the environment.

property initial_stage: Hashable

Returns the initial stage of the FSM Env.


Implements the logic to decide when the episode is terminated.

Return type:



Implements the logic to decide when the episode is truncated.

Return type:


property n_agents: int

Return the number of agents in the environment.

property non_strategic_agent_ids: List[Hashable]

Return a list of the IDs of the agents that do not take actions.

property non_strategic_agents: List[Agent]

Return a list of agents that do not take actions.

property np_random: Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.


Returns:

Instances of np.random.Generator


Perform internal, post-message resolution updates to the environment.

Return type:



Perform internal, pre-message resolution updates to the environment.

Return type:



render()

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Return type:

None


As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • “rgb_array_list” and “ansi_list”: List-based versions of the render modes are possible (except “human”) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.


Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function was changed to no longer accept parameters; these parameters should instead be specified when the environment is initialised, i.e., gymnasium.make("CartPole-v1", render_mode="human")

reset(seed=None, options=None)[source]

Reset the environment and return an initial observation.

This method resets the step count and the network. This includes all the agents in the network.

  • seed (Optional[int]) – An optional seed to use for the new episode.

  • options (Optional[Dict[str, Any]]) – Additional information to specify how the environment is reset.

Return type:

Tuple[Dict[Hashable, Any], Dict[str, Any]]


Returns:

  • A dictionary mapping agent IDs to observations made by the respective agents. It is not required for all agents to make an initial observation.

  • An optional dictionary with auxiliary information, equivalent to the info dictionary in env.step().


step(actions)

Step the simulation forward one step given some set of agent actions.

  • actions (Mapping[Hashable, Any]) – Actions output by the agent policies to be translated into messages and passed throughout the network.

Return type:



A PhantomEnv.Step object containing observations, rewards, terminations, truncations and infos.

property strategic_agent_ids: List[Hashable]

Return a list of the IDs of the agents that take actions.

property strategic_agents: List[StrategicAgent]

Return a list of agents that take actions.

property unwrapped: Env

Returns the base non-wrapped environment (i.e., removes all wrappers).


Returns:

The base non-wrapped gymnasium.Env instance

Return type:



Return an immutable view to the FSM environment’s public state.

Return type:


class phantom.fsm.FSMEnvView(current_step, proportion_time_elapsed, stage)[source]

Extension of the EnvView class that records the current stage that the environment is in.


class phantom.fsm.FSMStage(stage_id, acting_agents, rewarded_agents=None, next_stages=None, handler=None)[source]

Decorator used in the FiniteStateMachineEnv to declare the finite state machine structure and assign handler functions to stages.

A ‘stage’ corresponds to a state in the finite state machine, however to avoid any confusion with Environment states we refer to them as stages.


  • stage_id – The name of this stage.

  • acting_agents – The agents that will take an action at the end of the steps that belong to this stage.

  • rewarded_agents – If provided, only the given agents will calculate and return a reward at the end of the step for this stage. If not provided, a reward will be computed for all acting agents for the current stage.

  • next_stages – The stages that this stage can transition to.

  • handler – Environment class method to be called when the FSM enters this stage.


class phantom.fsm.FSMValidationError[source]

Error raised when validating the FSM when initialising the FiniteStateMachineEnv.


Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class phantom.fsm.FSMRuntimeError[source]

Error raised when validating FSM stage changes when running an episode using the FiniteStateMachineEnv.


Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
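
The difference between the two error classes can be sketched with a small standalone state machine. This is an illustration only: `TinyFSM`, `StageValidationError` and `StageRuntimeError` are hypothetical names, and the specific checks (unknown stages at construction, undeclared transitions while stepping, handlers returning the next stage ID) are assumptions rather than a description of what Phantom checks.

```python
from typing import Callable, Dict, Hashable, Sequence


class StageValidationError(Exception):
    """Hypothetical counterpart of FSMValidationError."""


class StageRuntimeError(Exception):
    """Hypothetical counterpart of FSMRuntimeError."""


class TinyFSM:
    def __init__(
        self,
        initial_stage: Hashable,
        next_stages: Dict[Hashable, Sequence[Hashable]],
        handlers: Dict[Hashable, Callable[[], Hashable]],
    ) -> None:
        # Construction-time check (assumed): every referenced stage exists.
        if initial_stage not in next_stages:
            raise StageValidationError(f"unknown initial stage: {initial_stage!r}")
        for stage_id, targets in next_stages.items():
            for target in targets:
                if target not in next_stages:
                    raise StageValidationError(
                        f"stage {stage_id!r} lists unknown next stage {target!r}"
                    )
        self.current_stage = initial_stage
        self._next_stages = next_stages
        self._handlers = handlers

    def step(self) -> Hashable:
        # Run the handler for the current stage; it returns the next stage ID.
        proposed = self._handlers[self.current_stage]()
        # Episode-time check (assumed): the transition must be declared.
        if proposed not in self._next_stages[self.current_stage]:
            raise StageRuntimeError(
                f"{self.current_stage!r} cannot transition to {proposed!r}"
            )
        self.current_stage = proposed
        return proposed


fsm = TinyFSM(
    initial_stage="A",
    next_stages={"A": ["B"], "B": ["A"]},
    handlers={"A": lambda: "B", "B": lambda: "B"},  # "B" -> "B" is undeclared
)
print(fsm.step())  # B
try:
    fsm.step()
except StageRuntimeError as err:
    print("rejected:", err)
```

Structural mistakes surface once, when the machine is built, while an undeclared transition is only caught during the step in which a handler proposes it.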