Rewards

RewardFunction classes convert observations of an agent's state into a numerical value that informs the policy of the effectiveness of its actions, allowing the policy to learn and improve future decisions.

[Figure: reward-function.svg]

RewardFunction classes are a fully optional feature of Phantom. There is no functional difference between defining a compute_reward() method on an Agent and defining a RewardFunction (whose reward() method performs the same computation) and attaching it to the Agent.
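The equivalence between the two approaches can be sketched with minimal stand-in classes. The class and method names below mirror the API documented on this page, but the stand-ins, the InventoryAgent example, and its holdings field are all illustrative, not part of Phantom itself:

```python
from abc import ABC, abstractmethod


class RewardFunction(ABC):
    """Stand-in for phantom.reward_functions.RewardFunction."""

    def reset(self) -> None:
        pass

    @abstractmethod
    def reward(self, ctx) -> float:
        ...


class Agent:
    """Stand-in for a Phantom Agent that can carry a reward function."""

    def __init__(self, agent_id, reward_function=None):
        self.id = agent_id
        self.reward_function = reward_function


# Option 1: override compute_reward() directly on the agent.
class InventoryAgent(Agent):
    def __init__(self, agent_id):
        super().__init__(agent_id)
        self.holdings = 0

    def compute_reward(self, ctx) -> float:
        # Hypothetical reward: penalise large inventories.
        return -abs(self.holdings)


# Option 2: the same logic as a reusable, attachable RewardFunction
# (assuming the context exposes the agent as ctx.agent).
class InventoryPenalty(RewardFunction):
    def reward(self, ctx) -> float:
        return -abs(ctx.agent.holdings)
```

Option 2 keeps the reward logic separate from the agent, so the same RewardFunction instance can be attached to many agents or swapped out between experiments.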

Base RewardFunction

class phantom.reward_functions.RewardFunction[source]

A trait for types that can compute rewards from a local context.

Note: this trait only supports scalar rewards for the time being.

reset()[source]

Resets the reward function.

abstract reward(ctx)[source]

Compute the reward from context.

Parameters:

ctx (Context) – The local network context.

Return type:

float
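As a sketch of how reset() is intended to be used: a stateful reward function can carry values across steps and clear them between episodes. The WealthDelta class and the ctx.agent.wealth field below are hypothetical illustrations, built on a minimal stand-in for the base class:

```python
from abc import ABC, abstractmethod


class RewardFunction(ABC):
    """Stand-in for phantom.reward_functions.RewardFunction."""

    def reset(self) -> None:
        pass

    @abstractmethod
    def reward(self, ctx) -> float:
        ...


class WealthDelta(RewardFunction):
    """Reward the change in a hypothetical ctx.agent.wealth field
    since the previous step."""

    def __init__(self):
        self.previous = 0.0

    def reset(self) -> None:
        # Clear the stored value so one episode's final wealth does
        # not leak into the next episode's first reward.
        self.previous = 0.0

    def reward(self, ctx) -> float:
        delta = ctx.agent.wealth - self.previous
        self.previous = ctx.agent.wealth
        return delta
```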

Provided Implementations

class phantom.reward_functions.Constant(value=0.0)[source]

A reward function that always returns a given constant.

value

The reward to be returned in any state.

reset()

Resets the reward function.

reward(_)[source]

Returns the constant reward; the context argument is ignored.

Parameters:

_ – The local network context (unused).

Return type:

float
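A Constant reward is useful as a baseline or as a placeholder for agents that should not receive a learning signal. Its documented behaviour can be sketched with a stand-in class (the class below is illustrative, not Phantom's implementation):

```python
class Constant:
    """Stand-in for phantom.reward_functions.Constant: reward()
    ignores its argument and always returns the configured value."""

    def __init__(self, value: float = 0.0):
        self.value = value

    def reset(self) -> None:
        # Stateless, so there is nothing to reset.
        pass

    def reward(self, _) -> float:
        return self.value
```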