Env Wrappers
SingleAgentEnvAdapter
In the diagram above we have 4 agents/policies. Agent "A" is the selected agent, and the SingleAgentEnvAdapter will expose the environment as seen from the perspective of just that agent. The other agents, with pre-defined policies, have their actions handled internally by the wrapper.
- class phantom.env_wrappers.SingleAgentEnvAdapter(env_class, agent_id, other_policies, env_config=None)[source]
Wraps a PhantomEnv instance or sub-class, providing a fully compatible gym.Env interface from the perspective of a single agent.
This can be used to test and experiment with Phantom environments using single-agent-only frameworks when only one agent is an active learning agent.
- Parameters:
  - env_class (Type[PhantomEnv]) – The PhantomEnv class or sub-class to wrap (note: must not be an already initialised class instance).
  - agent_id (Hashable) – The ID of the agent that the wrapper will explicitly control.
  - other_policies (Mapping[Hashable, Tuple[Type[Policy], Mapping[str, Any]]]) – A mapping of all other agent IDs to their policies and policy configs. The policies must be fixed/pre-trained policies.
  - env_config (Optional[Mapping[str, Any]]) – Any config options to pass to the underlying env when initialising.
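A minimal sketch of how such an adapter works, using toy stand-in classes rather than the real Phantom types (ToyMultiAgentEnv, FixedPolicy, and the SingleAgentAdapter body are all hypothetical illustrations of the interface described above, not Phantom code):

```python
from typing import Any, Hashable, Mapping, Optional, Tuple, Type


class FixedPolicy:
    """Toy stand-in for a fixed/pre-trained policy class."""

    def __init__(self, **config: Any) -> None:
        self.config = config

    def compute_action(self, observation: int) -> int:
        # A trivial fixed rule: echo the observation back as the action.
        return observation


class ToyMultiAgentEnv:
    """Toy stand-in for a multi-agent env with integer observations."""

    def __init__(self, num_steps: int = 3) -> None:
        self.agent_ids = ["A", "B"]
        self.num_steps = num_steps

    def reset(self) -> Mapping[Hashable, int]:
        self.t = 0
        return {aid: 0 for aid in self.agent_ids}

    def step(self, actions: Mapping[Hashable, int]):
        self.t += 1
        obs = {aid: self.t for aid in self.agent_ids}
        rewards = {aid: float(actions[aid]) for aid in self.agent_ids}
        done = self.t >= self.num_steps
        return obs, rewards, done


class SingleAgentAdapter:
    """Exposes ToyMultiAgentEnv from the perspective of one agent;
    all other agents are driven internally by their fixed policies."""

    def __init__(
        self,
        env_class: Type[ToyMultiAgentEnv],
        agent_id: Hashable,
        other_policies: Mapping[Hashable, Tuple[Type[FixedPolicy], Mapping[str, Any]]],
        env_config: Optional[Mapping[str, Any]] = None,
    ) -> None:
        # Note: the env *class* is passed in, not an instance.
        self.env = env_class(**(env_config or {}))
        self.agent_id = agent_id
        self.policies = {
            aid: cls(**cfg) for aid, (cls, cfg) in other_policies.items()
        }

    def reset(self) -> int:
        self._last_obs = self.env.reset()
        return self._last_obs[self.agent_id]

    def step(self, action: int):
        # Merge the learning agent's action with the fixed agents' actions.
        actions = {
            aid: policy.compute_action(self._last_obs[aid])
            for aid, policy in self.policies.items()
        }
        actions[self.agent_id] = action
        self._last_obs, rewards, done = self.env.step(actions)
        return self._last_obs[self.agent_id], rewards[self.agent_id], done


adapter = SingleAgentAdapter(
    env_class=ToyMultiAgentEnv,
    agent_id="A",
    other_policies={"B": (FixedPolicy, {})},
    env_config={"num_steps": 2},
)
obs = adapter.reset()
obs, reward, done = adapter.step(1)
```

The key design point mirrored from the docs: the caller sees a single-agent reset/step interface, while the adapter fans actions out to the fixed policies internally.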
- property action_space: Space
Return the action space of the selected env agent.
- close()
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections.
- property np_random: Generator
Returns the environment's internal _np_random generator, initialising it with a random seed if it has not already been set.
- Returns:
An instance of np.random.Generator
- property observation_space: Space
Return the observation space of the selected env agent.
- render()
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Return type:
Optional[Union[RenderFrame, List[RenderFrame]]]

Note
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
  - None (default): no render is computed.
  - "human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn't need to be called. Returns None.
  - "rgb_array": Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
  - "ansi": Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
  - "rgb_array_list" and "ansi_list": List-based versions of render modes are possible (except "human") through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.

Note
Make sure that your class's metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function was changed to no longer accept parameters; instead, these parameters should be specified when the environment is initialised, i.e., gymnasium.make("CartPole-v1", render_mode="human")
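These conventions can be illustrated with a toy environment (a hypothetical class, not part of Phantom or Gymnasium; the "rgb_array" mode is omitted to keep the sketch dependency-free, but it would return a np.ndarray of shape (x, y, 3)):

```python
class ToyRenderEnv:
    # The supported modes are declared in the metadata "render_modes" key.
    metadata = {"render_modes": ["human", "ansi"]}

    def __init__(self, render_mode=None):
        # render_mode is fixed in __init__, so any rendering objects
        # (windows, buffers) can be initialised here too, per the convention.
        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode
        self.state = 0

    def step(self, action):
        self.state += action
        if self.render_mode == "human":
            # "human" mode renders continuously during step();
            # the user never needs to call render() explicitly.
            print(f"state = {self.state}")
        return self.state, 0.0, False, False, {}

    def render(self):
        if self.render_mode is None:
            return None  # no render is computed
        if self.render_mode == "ansi":
            # A terminal-style text representation of the current state;
            # may contain newlines and ANSI escape sequences.
            return f"state = {self.state}\n"


env = ToyRenderEnv(render_mode="ansi")
env.step(2)
frame = env.render()
```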
- reset()[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
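A sketch of that seeding contract, using a toy class and Python's stdlib random in place of np.random (hypothetical code; it mirrors the Gymnasium convention where reset(seed=...) re-seeds the generator while a plain reset() does not):

```python
import random


class ToyResetEnv:
    def __init__(self):
        self._np_random = None  # created lazily, on first reset

    def reset(self, seed=None):
        # Only (re-)create the generator when a seed is explicitly passed
        # or none exists yet; a plain reset() keeps the existing generator
        # so successive episodes are independent draws from it.
        if self._np_random is None or seed is not None:
            self._np_random = random.Random(seed)
        # Sample a fresh initial observation for the new episode.
        return self._np_random.randint(0, 10**9)


env = ToyResetEnv()
first = env.reset(seed=42)    # seeds the generator
second = env.reset()          # new episode, generator NOT re-seeded
replayed = env.reset(seed=42) # re-seeding reproduces the first episode
```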
- step(action)[source]
Run one timestep of the environment's dynamics.
When the end of an episode is reached, you are responsible for calling reset() to reset this environment's state.
Accepts an action and returns a tuple (observation, reward, terminated, truncated, info).
- Parameters:
action (ActType) – an action provided by the agent
- Returns:
  - observation: an element of the environment's observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
  - reward: The amount of reward returned as a result of taking the action.
  - terminated: Whether the agent reaches the terminal state (as defined under the MDP of the task), which can be positive or negative. An example is reaching the goal state or moving into the lava in the Sutton and Barto Gridworld. If true, the user needs to call reset().
  - truncated: Whether the truncation condition outside the scope of the MDP is satisfied. Typically this is a time limit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call reset().
  - info: A dictionary that may contain additional information regarding the reason the episode ended. info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent's performance state, variables that are hidden from observations, information that distinguishes truncation and termination, or individual reward terms that are combined to produce the total reward.
- Return type:
(observation, reward, terminated, truncated, info)
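The resulting interaction loop can be sketched with a toy environment (a hypothetical class, not part of Phantom; the 5-tuple matches the terminated/truncated contract described above):

```python
class ToyStepEnv:
    """Counts up towards a goal state, truncating after a time limit."""

    def __init__(self, goal=3, time_limit=10):
        self.goal, self.time_limit = goal, time_limit

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos, {}

    def step(self, action):
        self.t += 1
        self.pos += action
        observation = self.pos
        reward = 1.0 if self.pos == self.goal else 0.0
        terminated = self.pos == self.goal     # MDP terminal state reached
        truncated = self.t >= self.time_limit  # time limit, outside the MDP
        info = {"t": self.t}
        return observation, reward, terminated, truncated, info


env = ToyStepEnv()
obs, info = env.reset()
episode_return = 0.0
while True:
    obs, reward, terminated, truncated, info = env.step(1)
    episode_return += reward
    if terminated or truncated:
        obs, info = env.reset()  # the caller is responsible for reset()
        break
```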
- property unwrapped: Env
Returns the base non-wrapped environment (i.e., removes all wrappers).
- Returns:
The base non-wrapped gymnasium.Env instance
- Return type:
Env
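A sketch of how wrapper peeling typically works (toy classes, hypothetical; the real gymnasium implementation likewise delegates through each wrapper's inner env):

```python
class BaseEnv:
    @property
    def unwrapped(self):
        return self  # a bare env is its own base


class Wrapper:
    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        # Delegate inward until the bare environment is reached.
        return self.env.unwrapped


base = BaseEnv()
wrapped = Wrapper(Wrapper(base))
```

However deep the wrapper stack, wrapped.unwrapped returns the original base environment.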