Finite State Machine Environment Design Patterns
This page describes the ways shared policies can be implemented in complex Finite State Machine (FSM) based environments, where policies may be shared both across agents and across stages. Simple code examples show how several basic combinations should be implemented.
Standard Environment - No FSM
In both FSM and non-FSM Phantom environments, policies are mapped to agents using the `policies` parameter of the `Trainer.train()` function or the `utils.rllib.train/rollout()` functions. A single policy can be used by multiple agents.
Implementing the environment and agents:
class ExampleAgent(ph.Agent):
    ...

class ExampleEnv(ph.PhantomEnv):
    def __init__(self):
        agents = [
            ExampleAgent("Agent 1"),
            ExampleAgent("Agent 2"),
            ExampleAgent("Agent 3"),
        ]

        network = ph.Network(agents)

        super().__init__(num_steps=100, network=network)
Defining the left side example (no shared policies):
trainer.train(
    ...
    policies={
        "policy_a": ["Agent 1"],
        "policy_b": ["Agent 2"],
        "policy_c": ["Agent 3"],
    },
    ...
)
Defining the right side example (shared policies):
trainer.train(
    ...
    policies={
        "shared_policy": ["Agent 1", "Agent 2"],
        "other_policy": ["Agent 3"],
    },
    ...
)
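To make the mapping semantics concrete, the sketch below shows how a `policies` dictionary of this shape (policy ID to list of agent IDs) can be inverted into the per-agent lookup used when each agent needs to know which policy produces its actions. This is an illustrative stand-alone snippet, not the Phantom internals; the `agent_to_policy` name is hypothetical.

```python
# The shared-policy mapping from the example above:
# each key is a policy ID, each value the agent IDs that use it.
policies = {
    "shared_policy": ["Agent 1", "Agent 2"],
    "other_policy": ["Agent 3"],
}

# Invert it into an agent ID -> policy ID lookup (hypothetical helper,
# shown only to illustrate how a single policy serves multiple agents).
agent_to_policy = {
    agent_id: policy_id
    for policy_id, agent_ids in policies.items()
    for agent_id in agent_ids
}

# "Agent 1" and "Agent 2" both resolve to "shared_policy",
# so experience from both agents trains the same policy.
```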