Environment
PhantomEnv
PhantomEnv is the base Phantom environment class; subclass it to define new environments.
This class broadly follows the RLlib MultiAgentEnv class interface, though not exactly. When using RLlib for training, a wrapper environment is used to provide full compatibility.
- class phantom.PhantomEnv(num_steps, network=None, env_supertype=None, agent_supertypes=None)
Base Phantom environment.
- Usage:
>>> env = PhantomEnv({ ... })
>>> env.reset()
<Observation: dict>
>>> env.step({ ... })
<Step: 5-tuple>
- num_steps
The maximum number of steps the environment allows per episode.
- network
A Network class or derived class describing the connections between agents in the environment.
- env_supertype
Optional Supertype class instance for the environment. If this is set, it will be sampled from and the env_type property set on the class with every call to reset().
- agent_supertypes
Optional mapping of agent IDs to Supertype class instances. If these are set, each supertype will be sampled from and the type property set on the related agent with every call to reset().
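The resample-on-reset behaviour described for agent_supertypes can be illustrated with a toy model. This is only a sketch built on the standard library: RiskSupertype, ToyAgent, ToyEnv and the sample() method are hypothetical stand-ins, and nothing here is taken from Phantom's actual Supertype implementation.

```python
import random
from dataclasses import dataclass
from typing import Dict, Hashable, Optional


@dataclass
class RiskSupertype:
    """Hypothetical supertype: a range from which a concrete type is drawn."""

    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Draw one concrete value from the range.
        return rng.uniform(self.low, self.high)


class ToyAgent:
    """Hypothetical agent holding the most recently sampled type."""

    def __init__(self) -> None:
        self.type: Optional[float] = None


class ToyEnv:
    """Toy environment (not PhantomEnv) that resamples types on reset()."""

    def __init__(
        self,
        agents: Dict[Hashable, ToyAgent],
        agent_supertypes: Optional[Dict[Hashable, RiskSupertype]] = None,
        seed: int = 0,
    ) -> None:
        self.agents = agents
        self.agent_supertypes = agent_supertypes or {}
        self.rng = random.Random(seed)

    def reset(self) -> None:
        # Only agents listed in the mapping receive a freshly sampled type.
        for agent_id, supertype in self.agent_supertypes.items():
            self.agents[agent_id].type = supertype.sample(self.rng)


agents = {"buyer": ToyAgent(), "seller": ToyAgent()}
env = ToyEnv(agents, agent_supertypes={"buyer": RiskSupertype(0.1, 0.9)})

env.reset()
first_type = agents["buyer"].type
env.reset()
second_type = agents["buyer"].type
```

In this sketch the agent without a supertype keeps type set to None, mirroring the statement that the mapping is optional.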
- class Step(observations, rewards, terminations, truncations, infos)
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)
Return first index of value.
Raises ValueError if the value is not present.
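The count and index entries listed for Step are the standard tuple methods, which suggests that Step behaves like a named tuple. A minimal sketch under that assumption, using a stand-in built with typing.NamedTuple rather than the real PhantomEnv.Step (the agent ID and field values are made up):

```python
from typing import Any, Dict, NamedTuple


class Step(NamedTuple):
    """Stand-in with the same five fields as PhantomEnv.Step (not the real class)."""

    observations: Dict[str, Any]
    rewards: Dict[str, float]
    terminations: Dict[str, bool]
    truncations: Dict[str, bool]
    infos: Dict[str, Any]


step = Step(
    observations={"agent_a": [0.0, 1.0]},
    rewards={"agent_a": 0.5},
    terminations={"agent_a": False},
    truncations={"agent_a": False},
    infos={"agent_a": {}},
)

# Fields can be read by name or unpacked positionally.
observations, rewards, terminations, truncations, infos = step

# index() and count() are inherited from tuple and compare by equality.
rewards_position = step.index({"agent_a": 0.5})
flag_dict_count = step.count({"agent_a": False})
```

Access by field name (step.rewards) is usually clearer than index() or count(), which mostly exist because the object is a tuple.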
- close()
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections.
- is_terminated()
Implements the logic to decide when the episode is terminated.
- Return type:
- property non_strategic_agent_ids: List[Hashable]
Return a list of the IDs of the agents that do not take actions.
- property np_random: Generator
Returns the environment’s internal _np_random, which is initialised with a random seed if it has not been set.
- Returns:
Instances of np.random.Generator
- post_message_resolution()
Perform internal, post-message resolution updates to the environment.
- Return type:
- pre_message_resolution()
Perform internal, pre-message resolution updates to the environment.
- Return type:
- render()
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Return type:
None
Note
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
- None (default): no render is computed.
- “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
- “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
- “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- “rgb_array_list” and “ansi_list”: List-based versions of render modes are possible (except human) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.
Note
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; rather, these parameters should be specified when the environment is initialised, i.e., gymnasium.make("CartPole-v1", render_mode="human")
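The render_mode convention described for render() can be sketched with a small self-contained class. GridEnv is a hypothetical example that does not subclass gymnasium.Env or PhantomEnv; it only shows the pattern of fixing render_mode in __init__, listing supported modes in metadata, and returning a string for "ansi".

```python
from typing import Optional


class GridEnv:
    """Hypothetical one-dimensional grid world used to illustrate render modes."""

    # Supported modes are advertised under the "render_modes" key.
    metadata = {"render_modes": ["ansi"]}

    def __init__(self, width: int = 5, render_mode: Optional[str] = None) -> None:
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"Unsupported render_mode: {render_mode!r}")
        # The mode is fixed at construction time; render() takes no arguments.
        self.render_mode = render_mode
        self.width = width
        self.position = 0

    def step(self, move: int) -> None:
        # Move the agent and clamp it to the grid.
        self.position = max(0, min(self.width - 1, self.position + move))

    def render(self) -> Optional[str]:
        if self.render_mode is None:
            return None  # default: no render is computed
        cells = ["."] * self.width
        cells[self.position] = "A"
        return "".join(cells)


env = GridEnv(width=5, render_mode="ansi")
env.step(2)
frame = env.render()
```

With the default render_mode of None, render() in this sketch returns None, matching the first item of the convention.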
- reset(seed=None, options=None)
Reset the environment and return an initial observation.
This method resets the step count and the network. This includes all the agents in the network.
- Parameters:
- Return type:
- Returns:
- A dictionary mapping Agent IDs to observations made by the respective agents. It is not required for all agents to make an initial observation.
- A dictionary with auxiliary information, equivalent to the info dictionary in env.step().
- step(actions)
Step the simulation forward one step given some set of agent actions.
- Parameters:
actions (Mapping[Hashable, Any]) – Actions output by the agent policies to be translated into messages and passed throughout the network.
- Return type:
- Returns:
A PhantomEnv.Step object containing observations, rewards, terminations, truncations and infos.
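The reset() and step() contract described above can be exercised with a simple episode loop. The sketch below uses a hypothetical CounterEnv and a stand-in Step tuple instead of the real Phantom classes, and it assumes that reaching num_steps is reported through truncations, which this page does not state.

```python
from typing import Any, Dict, Hashable, Iterable, Mapping, NamedTuple, Tuple


class Step(NamedTuple):
    """Stand-in for PhantomEnv.Step with the same five fields."""

    observations: Dict[Hashable, Any]
    rewards: Dict[Hashable, float]
    terminations: Dict[Hashable, bool]
    truncations: Dict[Hashable, bool]
    infos: Dict[Hashable, Any]


class CounterEnv:
    """Hypothetical environment: each agent observes the current step number."""

    def __init__(self, num_steps: int, agent_ids: Iterable[Hashable]) -> None:
        self.num_steps = num_steps
        self.agent_ids = list(agent_ids)
        self.current_step = 0

    def reset(self, seed=None, options=None) -> Tuple[Dict[Hashable, Any], Dict]:
        self.current_step = 0
        return {aid: 0 for aid in self.agent_ids}, {}

    def step(self, actions: Mapping[Hashable, Any]) -> Step:
        self.current_step += 1
        # Assumption: hitting the step limit is signalled as a truncation.
        out_of_time = self.current_step >= self.num_steps
        return Step(
            observations={aid: self.current_step for aid in actions},
            rewards={aid: float(action) for aid, action in actions.items()},
            terminations={aid: False for aid in actions},
            truncations={aid: out_of_time for aid in actions},
            infos={aid: {} for aid in actions},
        )


env = CounterEnv(num_steps=3, agent_ids=["a", "b"])
observations, infos = env.reset()
total_reward = {aid: 0.0 for aid in observations}

done = False
while not done:
    actions = {aid: 1 for aid in observations}  # constant placeholder policy
    step = env.step(actions)
    for aid, reward in step.rewards.items():
        total_reward[aid] += reward
    observations = step.observations
    done = all(step.terminations[aid] or step.truncations[aid] for aid in actions)
```

The loop only asks for actions from agents that returned an observation, which fits the note that not every agent has to make an initial observation.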
- property strategic_agent_ids: List[Hashable]
Return a list of the IDs of the agents that take actions.
- property strategic_agents: List[StrategicAgent]
Return a list of agents that take actions.
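The strategic and non-strategic properties split the environment's agents into those that take actions and those that do not. A minimal sketch of that partition, where Agent, StrategicAgent and ToyEnv are simplified stand-ins written for this example rather than Phantom's own classes:

```python
from typing import Dict, Hashable, List


class Agent:
    """Stand-in base agent that takes no actions."""

    def __init__(self, agent_id: Hashable) -> None:
        self.id = agent_id


class StrategicAgent(Agent):
    """Stand-in for an agent that takes actions."""


class ToyEnv:
    """Toy container exposing the three agent properties."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents: Dict[Hashable, Agent] = {agent.id: agent for agent in agents}

    @property
    def strategic_agents(self) -> List[StrategicAgent]:
        return [a for a in self.agents.values() if isinstance(a, StrategicAgent)]

    @property
    def strategic_agent_ids(self) -> List[Hashable]:
        return [a.id for a in self.strategic_agents]

    @property
    def non_strategic_agent_ids(self) -> List[Hashable]:
        return [a.id for a in self.agents.values() if not isinstance(a, StrategicAgent)]


env = ToyEnv([StrategicAgent("trader"), Agent("exchange"), StrategicAgent("broker")])
strategic_ids = env.strategic_agent_ids
passive_ids = env.non_strategic_agent_ids
```

In this sketch every agent ID appears in exactly one of the two ID lists.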
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gymnasium.Env instance
- Return type:
Env
Step
- class phantom.PhantomEnv.Step(observations, rewards, terminations, truncations, infos)