Phantom Design

This page outlines the key design concepts of Phantom. For a full reference please see the API pages.

A Phantom experiment consists of many independent episodes. Depending on the learning algorithm, it may be possible to run many episodes in parallel. At the end of each episode or group of episodes, the learning policies are updated. Each episode consists of multiple steps, and the number of steps may be fixed or variable.
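The relationship between episodes and policy updates can be sketched in plain Python. This is only an illustration and does not use Phantom's API: `run_experiment`, `episodes_per_update`, `run_episode` and `update_policy` are hypothetical names, and the episodes run sequentially here even though a real learning algorithm may run them in parallel.

```python
def run_experiment(num_episodes, episodes_per_update, run_episode, update_policy):
    """Run episodes and update the policy after each group of episodes.

    All names here are hypothetical; this is not Phantom's API.
    Returns the number of policy updates performed.
    """
    batch = []
    num_updates = 0
    for episode_index in range(num_episodes):
        # Each episode is independent of the others.
        batch.append(run_episode(episode_index))
        if len(batch) == episodes_per_update:
            update_policy(batch)
            num_updates += 1
            batch = []
    # Update once more if a partial group of episodes is left over.
    if batch:
        update_policy(batch)
        num_updates += 1
    return num_updates
```

Setting `episodes_per_update` to 1 corresponds to updating at the end of every episode; larger values correspond to updating after a group of episodes.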

Environment

The Environment is the main element of a Phantom experiment. In the Phantom framework, the environment describes all of its agents and actors, how they interact, and how they progress through the episode at each step.

Phantom provides a PhantomEnv class with sensible defaults for controlling each step and the progression of the actors and agents through the episode (these defaults can be overridden in advanced use cases). It is up to the user to define the actors and agents, how they are connected and how they interact.

Episode Cycle

The diagram below shows the basic flow of an episode. First the entire environment is reset, including all actors, agents and supertypes. This reset provides the initial observations from the agents.

The episode then enters a loop: the policy produces actions from these observations, the step() function applies the actions, which produces new observations, and so on.

This continues until the episode ends, either after a fixed number of steps or once all agents have finished.
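The reset, act and step loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Phantom's implementation: `ToyEnv` is a hypothetical stand-in for an environment (it is not PhantomEnv), and the return values of `reset()` and `step()` are assumed to be simple per-agent dictionaries.

```python
class ToyEnv:
    """Hypothetical stand-in environment; not Phantom's PhantomEnv."""

    def __init__(self, agent_ids, max_steps):
        self.agent_ids = list(agent_ids)
        self.max_steps = max_steps
        self.current_step = 0

    def reset(self):
        # Reset the whole environment and return initial observations.
        self.current_step = 0
        return {agent_id: 0.0 for agent_id in self.agent_ids}

    def step(self, actions):
        # Apply the actions, then produce new observations, rewards and
        # per-agent "finished" flags (assumed return shape).
        self.current_step += 1
        observations = {a: float(self.current_step) for a in actions}
        rewards = {a: float(action) for a, action in actions.items()}
        finished = self.current_step >= self.max_steps
        dones = {a: finished for a in actions}
        return observations, rewards, dones


def run_episode(env, policy):
    """Run one episode and return (number of steps, total reward per agent)."""
    observations = env.reset()
    totals = {agent_id: 0.0 for agent_id in observations}
    while True:
        # The policy maps each agent's observation to an action.
        actions = {a: policy(obs) for a, obs in observations.items()}
        observations, rewards, dones = env.step(actions)
        for agent_id, reward in rewards.items():
            totals[agent_id] += reward
        # The episode ends once all agents have finished.
        if all(dones.values()):
            break
    return env.current_step, totals
```

For example, `run_episode(ToyEnv(["a", "b"], max_steps=3), lambda obs: 1)` runs a three-step episode in which both toy agents always choose the action 1. In this sketch all agents finish together at the step limit; an environment with variable-length episodes would set each agent's flag independently.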

../_images/episode-flow.svg

Step Cycle

../_images/step-flow.svg