Env Wrappers

SingleAgentEnvAdapter

[Figure: single-env-adapter.svg]

In the diagram above we have 4 agents/policies. Agent “A” is the selected agent and the SingleAgentEnvAdapter will expose an environment as seen from the perspective of just that agent. The other agents, with pre-defined policies, will have their actions handled internally by the wrapper.

class phantom.env_wrappers.SingleAgentEnvAdapter(env_class, agent_id, other_policies, env_config=None)[source]

Wraps a PhantomEnv class or sub-class, providing a fully compatible gym.Env interface from the perspective of a single agent.

This can be used to test and experiment with Phantom environments using other, single-agent-only frameworks when only one agent is an active learning agent (see the example sketch after the parameter list below).

Parameters:
  • env_class (Type[PhantomEnv]) – The PhantomEnv class or sub-class to wrap (note: must not be an already initialised class instance)

  • agent_id (Hashable) – The ID of the agent that the wrapper will explicitly control.

  • other_policies (Mapping[Hashable, Tuple[Type[Policy], Mapping[str, Any]]]) – A mapping of all other agent IDs to their policies and policy configs. The policies must be fixed/pre-trained policies.

  • env_config (Optional[Mapping[str, Any]]) – Any config options to pass to the underlying env when initialising.
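As a sketch, constructing the adapter might look like the following. The environment class, policy class, agent IDs, and config values below are illustrative assumptions, not part of the Phantom API:

    from phantom.env_wrappers import SingleAgentEnvAdapter

    # "MyPhantomEnv" and "FixedPolicy" are hypothetical placeholder classes.
    env = SingleAgentEnvAdapter(
        env_class=MyPhantomEnv,            # a PhantomEnv sub-class, not an instance
        agent_id="A",                      # the single agent the adapter exposes
        other_policies={
            "B": (FixedPolicy, {}),        # every other agent: (policy class, policy config)
            "C": (FixedPolicy, {}),
            "D": (FixedPolicy, {}),
        },
        env_config={"num_steps": 100},     # hypothetical option forwarded to MyPhantomEnv
    )

The adapter can then be passed to any single-agent tool that expects a standard gym.Env.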

property action_space: Space

Return the action space of the selected env agent.

property agent_ids: List[Hashable]

Return a list of the IDs of the agents in the environment.

property agents: Dict[Hashable, Agent]

Return a mapping of agent IDs to agents in the environment.

close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections.

property current_step: int

Return the current step of the environment.

property n_agents: int

Return the number of agents in the environment.
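Assuming the adapter constructed in the sketch above, these properties could be inspected as follows (the agent IDs are illustrative):

    print(env.agent_ids)    # e.g. ["A", "B", "C", "D"]
    print(env.n_agents)     # e.g. 4
    print(env.agents["A"])  # the underlying Agent object for the selected agent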

property np_random: Generator

Returns the environment's internal _np_random generator; if it is not set, it will be initialised with a random seed.

Returns:

An instance of np.random.Generator
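A short sketch of reproducible sampling with this generator, assuming the adapter from the earlier example:

    rng = env.np_random          # np.random.Generator, lazily seeded if not already set
    noise = rng.normal(size=3)   # draw values using the environment's own RNG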

property observation_space: Space

Return the observation space of the selected env agent.

render()

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Return type:

Union[TypeVar(RenderFrame), List[TypeVar(RenderFrame)], None]

Note

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • “rgb_array_list” and “ansi_list”: List-based versions of the render modes are possible (except “human”) through the gymnasium.wrappers.RenderCollection wrapper, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function was changed to no longer accept parameters; these parameters should instead be specified when the environment is initialised, e.g. gymnasium.make("CartPole-v1", render_mode="human").
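For illustration, a generic gymnasium sketch (not specific to SingleAgentEnvAdapter) showing that the render mode is fixed at construction time and that render() returns a frame for "rgb_array":

    import gymnasium as gym

    env = gym.make("CartPole-v1", render_mode="rgb_array")
    obs, info = env.reset()
    frame = env.render()   # np.ndarray of shape (x, y, 3)
    env.close()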

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Return type:

Tuple[TypeVar(ObsType), Dict[str, Any]]

Returns:

  • The initial observation.

  • A dictionary with auxiliary information, equivalent to the info dictionary in env.step().
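Assuming the adapter from the earlier sketch, a reset might look like this; the returned observation is for the selected agent only:

    obs, info = env.reset()
    assert env.observation_space.contains(obs)   # observation for the selected agent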

step(action)[source]

Run one timestep of the environment’s dynamics.

When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, terminated, truncated, info).

Parameters:

action (TypeVar(ActType)) – an action provided by the agent

Returns:

  • observation: this will be an element of the environment's observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • reward: The amount of reward returned as a result of taking the action.

  • terminated: Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava in the Sutton and Barto Gridworld. If true, the user needs to call reset().

  • truncated: Whether the truncation condition outside the scope of the MDP is satisfied. Typically this is a timelimit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call reset().

  • info: A dictionary that may contain additional information regarding the reason for a done signal. info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent's performance state, variables that are hidden from observations, information that distinguishes truncation and termination, or individual reward terms that are combined to produce the total reward.
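Under the same assumptions, a minimal interaction loop with the adapter, sampling random actions in place of a trained policy:

    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()   # stand-in for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
    env.close()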

property unwrapped: Env

Returns the base non-wrapped environment (i.e., removes all wrappers).

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env