Environment

PhantomEnv

This is the Phantom environment class to subclass when defining new environments.

This class generally follows the RLlib MultiAgentEnv class interface, though not exactly. When RLlib is used for training, a wrapper environment provides full compatibility.

class phantom.PhantomEnv(num_steps, network=None, env_supertype=None, agent_supertypes=None)[source]

Base Phantom environment.

Usage:

>>> env = PhantomEnv({ ... })
>>> env.reset()
<Observation: dict>
>>> env.step({ ... })
<Step: 4-tuple>

num_steps: The maximum number of steps the environment allows per episode.

network: A Network class or derived class describing the connections between agents and agents in the environment.

env_supertype: Optional Supertype class instance for the environment. If this is set, it will be sampled from and the env_type property set on the class with every call to reset().

agent_supertypes: Optional mapping of agent IDs to Supertype class instances. If these are set, each supertype will be sampled from and the type property set on the related agent with every call to reset().

class Step(observations, rewards, terminations, truncations, infos)[source]

count(value, /): Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

infos: Dict[Hashable, Any]: Alias for field number 4

observations: Dict[Hashable, Any]: Alias for field number 0

rewards: Dict[Hashable, float]: Alias for field number 1

terminations: Dict[Hashable, bool]: Alias for field number 2

truncations: Dict[Hashable, bool]: Alias for field number 3

property agent_ids: List[Hashable]: Return a list of the IDs of the agents in the environment.

property agents: Dict[Hashable, Agent]: Return a mapping of agent IDs to agents in the environment.

close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections.

property current_step: int: Return the current step of the environment.

is_terminated()[source]

Implements the logic to decide when the episode is terminated.

Return type: bool

is_truncated()[source]

Implements the logic to decide when the episode is truncated.

Return type: bool
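The difference between termination and truncation can be sketched with a stand-in class. The rules below (terminated once every acting agent is done, truncated once the step budget is used up) are assumptions chosen for illustration, and EpisodeStatus and terminated_ids are hypothetical names, not part of Phantom.

```python
from typing import Hashable, Set


class EpisodeStatus:
    """Stand-in holding the bookkeeping an environment might use (hypothetical)."""

    def __init__(self, num_steps: int, strategic_agent_ids: Set[Hashable]) -> None:
        self.num_steps = num_steps
        self.strategic_agent_ids = set(strategic_agent_ids)
        self.current_step = 0
        self.terminated_ids: Set[Hashable] = set()

    def is_terminated(self) -> bool:
        # Assumed rule: the episode ends naturally once every acting agent is done.
        return self.terminated_ids >= self.strategic_agent_ids

    def is_truncated(self) -> bool:
        # Assumed rule: the episode is cut short when the step budget is used up.
        return self.current_step >= self.num_steps


status = EpisodeStatus(num_steps=2, strategic_agent_ids={"a", "b"})
start = (status.is_terminated(), status.is_truncated())

status.terminated_ids.add("a")
one_agent_done = status.is_terminated()

status.terminated_ids.add("b")
all_agents_done = status.is_terminated()

status.current_step = 2
budget_used = status.is_truncated()
```

A subclass that needs a different stopping rule would override the corresponding method and keep the bool return type.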

property n_agents: int: Return the number of agents in the environment.

property non_strategic_agent_ids: List[Hashable]: Return a list of the IDs of the agents that do not take actions.

property non_strategic_agents: List[Agent]: Return a list of agents that do not take actions.

property np_random: Generator

Returns the environment’s internal _np_random generator. If it has not been set, it is initialised with a random seed.

Returns: An instance of np.random.Generator

post_message_resolution()[source]

Perform internal, post-message resolution updates to the environment.

Return type: None

pre_message_resolution()[source]

Perform internal, pre-message resolution updates to the environment.

Return type: None
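The two hooks can be read as extension points placed around message resolution. The sketch below assumes that ordering from the method names alone. HookedEnv, AuditedEnv and resolve_messages are hypothetical stand-ins, not Phantom classes or methods.

```python
from typing import List


class HookedEnv:
    """Stand-in environment with empty hooks around a resolution phase."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def pre_message_resolution(self) -> None:
        """Hook run before messages are resolved; does nothing by default."""

    def resolve_messages(self) -> None:
        # Hypothetical placeholder for delivering messages between agents.
        self.log.append("resolve")

    def post_message_resolution(self) -> None:
        """Hook run after messages are resolved; does nothing by default."""

    def step(self) -> None:
        # Assumed ordering: pre-hook, resolution, post-hook.
        self.pre_message_resolution()
        self.resolve_messages()
        self.post_message_resolution()


class AuditedEnv(HookedEnv):
    """Subclass that overrides both hooks to record when they run."""

    def pre_message_resolution(self) -> None:
        self.log.append("pre")

    def post_message_resolution(self) -> None:
        self.log.append("post")


env = AuditedEnv()
env.step()
```

Because the base hooks are no-ops in this sketch, a subclass only overrides the ones it needs.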

render()[source]

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Return type: None

Note

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List-based versions of the render modes (except “human”) are available through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function no longer accepts parameters. These should instead be specified when the environment is initialised, e.g. gymnasium.make("CartPole-v1", render_mode="human").
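The render conventions above can be sketched without any rendering library. GridRenderer is a hypothetical stand-in that supports only the "ansi" mode. It is not a gymnasium or Phantom class, and it only shows the pattern of declaring modes in metadata, fixing render_mode in __init__ and taking no arguments in render().

```python
from typing import Optional


class GridRenderer:
    """Toy environment-like class with a 5-cell track and one marker."""

    # The supported modes are declared under the "render_modes" key.
    metadata = {"render_modes": ["ansi"]}

    def __init__(self, render_mode: Optional[str] = None) -> None:
        # The mode is fixed at construction time and validated against metadata.
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"Unsupported render_mode: {render_mode!r}")
        self.render_mode = render_mode
        self.position = 0

    def step(self) -> None:
        # Move the marker one cell to the right, wrapping around at the end.
        self.position = (self.position + 1) % 5

    def render(self) -> Optional[str]:
        # render() takes no parameters; it only consults self.render_mode.
        if self.render_mode is None:
            return None
        cells = ["."] * 5
        cells[self.position] = "A"
        return "".join(cells)


silent = GridRenderer()
silent_frame = silent.render()

text_env = GridRenderer(render_mode="ansi")
text_env.step()
text_frame = text_env.render()
```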

reset(seed=None, options=None)[source]

Reset the environment and return an initial observation.

This method resets the step count and the network. This includes all the agents in the network.

Parameters:

seed (Optional[int]) – An optional seed to use for the new episode.
options (Optional[Dict[str, Any]]) – Additional information to specify how the environment is reset.

Return type:

Tuple[Dict[Hashable, Any], Dict[str, Any]]

Returns:

- A dictionary mapping agent IDs to observations made by the respective agents. It is not required for all agents to make an initial observation.
- A dictionary with auxiliary information, equivalent to the info dictionary in env.step().

step(actions)[source]

Step the simulation forward one step given some set of agent actions.

Parameters: actions (Mapping[Hashable, Any]) – Actions output by the agent policies to be translated into messages and passed throughout the network.
Return type: Step
Returns: A PhantomEnv.Step object containing observations, rewards, terminations, truncations and infos.
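A typical episode loop built on reset() and step() might look like the sketch below. CountdownEnv and the local Step tuple are toy stand-ins that only mimic the documented signatures, and the agent ID "a1" and the fixed action are hypothetical. How a real PhantomEnv signals the end of an episode across all agents may differ from the per-agent check used here.

```python
from typing import Any, Dict, Hashable, Mapping, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """Local stand-in using the documented field order."""
    observations: Dict[Hashable, Any]
    rewards: Dict[Hashable, float]
    terminations: Dict[Hashable, bool]
    truncations: Dict[Hashable, bool]
    infos: Dict[Hashable, Any]


class CountdownEnv:
    """Toy single-agent environment that is truncated after num_steps steps."""

    def __init__(self, num_steps: int) -> None:
        self.num_steps = num_steps
        self.current_step = 0

    def reset(
        self, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[Dict[Hashable, Any], Dict[str, Any]]:
        # Returns initial observations and an auxiliary info dictionary.
        self.current_step = 0
        return {"a1": 0}, {}

    def step(self, actions: Mapping[Hashable, Any]) -> Step:
        self.current_step += 1
        truncated = self.current_step >= self.num_steps
        return Step(
            observations={"a1": self.current_step},
            rewards={"a1": float(actions["a1"])},
            terminations={"a1": False},
            truncations={"a1": truncated},
            infos={"a1": {}},
        )


env = CountdownEnv(num_steps=3)
observations, infos = env.reset(seed=0)
total_reward = 0.0
steps_taken = 0

while True:
    # Hypothetical fixed policy: every observing agent plays action 1.
    actions = {agent_id: 1 for agent_id in observations}
    step = env.step(actions)
    total_reward += sum(step.rewards.values())
    steps_taken += 1
    observations = step.observations
    # Stop once every agent has been either terminated or truncated.
    if all(step.terminations[a] or step.truncations[a] for a in step.terminations):
        break
```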

property strategic_agent_ids: List[Hashable]: Return a list of the IDs of the agents that take actions.

property strategic_agents: List[StrategicAgent]: Return a list of agents that take actions.

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns: The base non-wrapped gymnasium.Env instance
Return type: Env

view(agent_views)[source]

Return an immutable view to the environment’s public state.

Return type: EnvView

Step

class phantom.PhantomEnv.Step(observations, rewards, terminations, truncations, infos)

infos: Dict[Hashable, Any]: Alias for field number 4

observations: Dict[Hashable, Any]: Alias for field number 0

rewards: Dict[Hashable, float]: Alias for field number 1

terminations: Dict[Hashable, bool]: Alias for field number 2

truncations: Dict[Hashable, bool]: Alias for field number 3
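Since each field is documented as an alias for a field number, Step behaves like a named tuple. The sketch below rebuilds an equivalent tuple with typing.NamedTuple to show access by name, access by position and unpacking. It is a local stand-in rather than the library class, and the agent ID "trader" and its values are hypothetical.

```python
from typing import Any, Dict, Hashable, NamedTuple


class Step(NamedTuple):
    """Stand-in with the same field order as the documented Step."""
    observations: Dict[Hashable, Any]
    rewards: Dict[Hashable, float]
    terminations: Dict[Hashable, bool]
    truncations: Dict[Hashable, bool]
    infos: Dict[Hashable, Any]


# Hypothetical values for a single agent with ID "trader".
step = Step(
    observations={"trader": [0.1, 0.2]},
    rewards={"trader": 1.5},
    terminations={"trader": False},
    truncations={"trader": True},
    infos={"trader": {}},
)

# Fields can be read by name or by position (rewards is field number 1).
same_object = step.rewards is step[1]

# Positional unpacking follows the documented field order.
observations, rewards, terminations, truncations, infos = step

# An agent's episode is over if it was either terminated or truncated.
done = {
    agent_id: terminations[agent_id] or truncations[agent_id]
    for agent_id in terminations
}
```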