Rewards

RewardFunction classes convert observations of an agent's state into a numerical value that informs the policy of the effectiveness of its actions, allowing the policy to learn and improve future decisions.

[Figure: reward-function.svg]

RewardFunction classes are a fully optional feature of Phantom. There is no functional difference between defining a compute_reward() method on an Agent and defining a RewardFunction (whose reward() method performs the same computation) and attaching it to the Agent.
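The equivalence between the two approaches can be sketched with minimal stand-in classes. The class and method names below mirror the API documented on this page, but the stand-ins, the InventoryAgent example, and its holdings field are all illustrative, not part of Phantom itself:

```python
from abc import ABC, abstractmethod


class RewardFunction(ABC):
    """Stand-in for phantom.reward_functions.RewardFunction."""

    def reset(self) -> None:
        pass

    @abstractmethod
    def reward(self, ctx) -> float:
        ...


class Agent:
    """Stand-in for a Phantom Agent that can carry a reward function."""

    def __init__(self, agent_id, reward_function=None):
        self.id = agent_id
        self.reward_function = reward_function


# Option 1: override compute_reward() directly on the agent.
class InventoryAgent(Agent):
    def __init__(self, agent_id):
        super().__init__(agent_id)
        self.holdings = 0

    def compute_reward(self, ctx) -> float:
        # Hypothetical reward: penalise large inventories.
        return -abs(self.holdings)


# Option 2: the same logic as a reusable, attachable RewardFunction
# (assuming the context exposes the agent as ctx.agent).
class InventoryPenalty(RewardFunction):
    def reward(self, ctx) -> float:
        return -abs(ctx.agent.holdings)
```

Option 2 keeps the reward logic separate from the agent, so the same RewardFunction instance can be attached to many agents or swapped out between experiments.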

Base RewardFunction

class phantom.reward_functions.RewardFunction[source]

A trait for types that can compute rewards from a local context.

Note: this trait only supports scalar rewards for the time being.

reset()[source]

Resets the reward function.

abstract reward(ctx)[source]

Compute the reward from context.

Parameters:

ctx (Context) – The local network context.

Return type:

float
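As a sketch of how reset() is intended to be used: a stateful reward function can carry values across steps and clear them between episodes. The WealthDelta class and the ctx.agent.wealth field below are hypothetical illustrations, built on a minimal stand-in for the base class:

```python
from abc import ABC, abstractmethod


class RewardFunction(ABC):
    """Stand-in for phantom.reward_functions.RewardFunction."""

    def reset(self) -> None:
        pass

    @abstractmethod
    def reward(self, ctx) -> float:
        ...


class WealthDelta(RewardFunction):
    """Reward the change in a hypothetical ctx.agent.wealth field
    since the previous step."""

    def __init__(self):
        self.previous = 0.0

    def reset(self) -> None:
        # Clear the stored value so one episode's final wealth does
        # not leak into the next episode's first reward.
        self.previous = 0.0

    def reward(self, ctx) -> float:
        delta = ctx.agent.wealth - self.previous
        self.previous = ctx.agent.wealth
        return delta
```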

Provided Implementations

class phantom.reward_functions.Constant(value=0.0)[source]

A reward function that always returns a given constant.

value

The reward to be returned in any state.

reset()

Resets the reward function.

reward(_)[source]

Returns the constant reward; the context argument is ignored.

Parameters:

_ – The local network context (unused).

Return type:

float
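A Constant reward is useful as a baseline or as a placeholder for agents that should not receive a learning signal. Its documented behaviour can be sketched with a stand-in class (the class below is illustrative, not Phantom's implementation):

```python
class Constant:
    """Stand-in for phantom.reward_functions.Constant: reward()
    ignores its argument and always returns the configured value."""

    def __init__(self, value: float = 0.0):
        self.value = value

    def reset(self) -> None:
        # Stateless, so there is nothing to reset.
        pass

    def reward(self, _) -> float:
        return self.value
```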