Rewards
RewardFunction classes convert observations of an agent's state into a numerical value. This value tells the policy how effective its actions were, so that the policy can learn and make better decisions in the future.
RewardFunction classes are a fully optional feature of Phantom. There is no functional difference between defining a compute_reward() method on an Agent and defining a RewardFunction (whose reward() method performs the same actions) and attaching it to the Agent.
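The equivalence between the two approaches can be illustrated with a minimal, self-contained sketch. It does not import Phantom: the Context, Agent and RewardFunction classes here are simplified, hypothetical stand-ins written only for this example, and the profit field and the reward_function constructor argument are assumptions. Consult the Phantom API reference for the real signatures.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Context:
    """Hypothetical stand-in for the agent's view of its own state."""

    def __init__(self, profit: float) -> None:
        self.profit = profit


class RewardFunction(ABC):
    """Simplified stand-in: maps a context to a scalar reward."""

    @abstractmethod
    def reward(self, ctx: Context) -> float:
        ...


class Agent:
    """Simplified stand-in: delegates to an attached reward function, if any."""

    def __init__(self, reward_function: Optional[RewardFunction] = None) -> None:
        self.reward_function = reward_function

    def compute_reward(self, ctx: Context) -> float:
        if self.reward_function is None:
            raise NotImplementedError("no reward function attached")
        return self.reward_function.reward(ctx)


# Option 1: define the reward logic directly on the agent.
class ProfitAgent(Agent):
    def compute_reward(self, ctx: Context) -> float:
        return ctx.profit


# Option 2: define the same logic once as a reusable reward function
# and attach it to a plain agent.
class ProfitReward(RewardFunction):
    def reward(self, ctx: Context) -> float:
        return ctx.profit


ctx = Context(profit=2.5)
inline_agent = ProfitAgent()
attached_agent = Agent(reward_function=ProfitReward())

# Both agents produce the same reward for the same context.
assert inline_agent.compute_reward(ctx) == attached_agent.compute_reward(ctx)
```

In a design like this, a separate reward class mainly helps when several agent types share the same reward logic, or when the reward needs to be swapped without subclassing the agent.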