cambrian.envs.reward_fns¶
Reward fns. These can be used to calculate rewards for agents.
Functions¶
- calc_delta – Calculates the delta position of the agent from a point.
- Calculates the quickness of the agent.
- apply_reward_fn – Applies the reward function to the agent if it is in the for_agents list.
- reward_fn_constant – Returns a constant reward.
- reward_fn_done – Rewards the agent if the episode is done. Termination indicates a successful episode, while truncation indicates an unsuccessful one.
- reward_fn_euclidean_delta_from_init – Rewards the change in distance from the initial position over the previous step.
- reward_fn_euclidean_delta_to_agent – Rewards the change in distance to any enabled agent over the previous step.
- reward_fn_agent_respawned – Rewards the agent if it has been respawned.
- reward_fn_close_to_agent – Rewards the agent if it is close to another agent.
- reward_fn_has_contacts – Rewards the agent if it has contacts.
- reward_fn_action – Rewards the agent based on the action taken.
- reward_combined – Combines multiple reward functions into one.
Module Contents¶
- calc_delta(agent, info, point=np.array([0, 0]))[source]¶
Calculates the delta position of the agent from a point.
- Returns:
np.ndarray – The delta position of the agent from the point (i.e. current - prev).
- apply_reward_fn(env, agent, *, reward_fn, for_agents=None, scale_by_quickness=False, disable=False, disable_on_max_episode_steps=False)[source]¶
Applies the reward function to the agent if it is in the for_agents list.
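For illustration, the documented keyword arguments can be bound ahead of time, e.g. with functools.partial, so that only the positional arguments need to be supplied later. This is a minimal sketch; how the project actually wires reward functions together (for example through its configuration system) may differ, and the agent name used here is a placeholder.

```python
from functools import partial

from cambrian.envs.reward_fns import apply_reward_fn, reward_fn_constant

# Bind the underlying reward function and the documented options up front.
# "agent_0" is a placeholder agent name, not one defined by the library.
wrapped_reward = partial(
    apply_reward_fn,
    reward_fn=partial(reward_fn_constant, reward=0.1),
    for_agents=["agent_0"],             # only apply the reward to these agents
    scale_by_quickness=True,            # scale the reward by the agent's quickness
    disable_on_max_episode_steps=True,  # skip the reward when the time limit is reached
)
```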
- reward_fn_constant(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]¶
Returns a constant reward.
- reward_fn_done(env, agent, terminated, truncated, info, *, termination_reward=0.0, truncation_reward=0.0, **kwargs)[source]¶
Rewards the agent if the episode is done. Termination indicates a successful episode, while truncation indicates an unsuccessful episode. If the time limit is reached, this is considered a termination. Applying a reward in this case can be disabled with the disable_on_max_episode_steps keyword argument.
- Keyword Arguments:
termination_reward (float) – The reward to give the agent if the episode is terminated. Defaults to 0.
truncation_reward (float) – The reward to give the agent if the episode is truncated. Defaults to 0.
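For example, a sketch that rewards successful terminations and penalizes truncations (binding shown with functools.partial; the actual registration mechanism is project-specific):

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_done

# +1 when the episode terminates (success), -1 when it is truncated (failure).
done_reward = partial(reward_fn_done, termination_reward=1.0, truncation_reward=-1.0)
```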
- reward_fn_euclidean_delta_from_init(env, agent, terminated, truncated, info, *, reward=1.0, **kwargs)[source]¶
Rewards the change in distance from the initial position over the previous step.
- reward_fn_euclidean_delta_to_agent(env, agent, terminated, truncated, info, *, reward, to_agents=None, **kwargs)[source]¶
Rewards the change in distance to any enabled agent over the previous step. Convention is that a positive reward indicates getting closer to the agent.
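The sign convention can be shown with a small standalone sketch (this re-implements the documented behavior for illustration only and is not the library's code):

```python
import numpy as np

def delta_to_agent_sketch(prev_pos, curr_pos, other_pos, reward=1.0):
    # Positive when the distance to the other agent decreased over the step.
    prev_dist = np.linalg.norm(prev_pos - other_pos)
    curr_dist = np.linalg.norm(curr_pos - other_pos)
    return reward * (prev_dist - curr_dist)

# Moving from (2, 0) to (1, 0) toward an agent at the origin gives a positive reward.
print(delta_to_agent_sketch(np.array([2.0, 0.0]), np.array([1.0, 0.0]), np.zeros(2)))
```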
- reward_fn_agent_respawned(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]¶
This reward function rewards the agent if it has been respawned.
- reward_fn_close_to_agent(env, agent, terminated, truncated, info, *, reward, distance_threshold, from_agents=None, to_agents=None, **kwargs)[source]¶
This reward function rewards the agent if it is close to another agent.
- Keyword Arguments:
reward (float) – The reward to give the agent if it is close to another agent. Default is 0.
distance_threshold (float) – The distance threshold to check if the agent is close to another agent.
from_agents (Optional[List[str]]) – The names of the agents that the reward should be calculated from. If None, the reward will be calculated from all agents.
to_agents (Optional[List[str]]) – The names of the agents that the reward should be calculated to. If None, the reward will be calculated to all agents.
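For example, rewarding one agent for staying near another might be configured as in the following sketch (the agent names and threshold are placeholders):

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_close_to_agent

# Reward "predator" with 1.0 whenever it is within 0.5 units of any "prey" agent.
close_reward = partial(
    reward_fn_close_to_agent,
    reward=1.0,
    distance_threshold=0.5,
    from_agents=["predator"],
    to_agents=["prey"],
)
```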
- reward_fn_has_contacts(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]¶
Rewards the agent if it has contacts.
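A common use is a contact penalty; a minimal sketch, assuming a negative reward is the intended way to penalize contacts:

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_has_contacts

# Apply -1.0 on any step where the agent has contacts.
contact_penalty = partial(reward_fn_has_contacts, reward=-1.0)
```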
- reward_fn_action(env, agent, terminated, truncated, info, *, reward, index=None, normalize=False, absolute=False, **kwargs)[source]¶
Rewards the agent based on the action taken.
- Keyword Arguments:
reward (float) – The reward to give the agent if the action is taken.
index (Optional[int]) – The index of the action to use for the reward. If None, the sum of the action is used.
normalize (bool) – Whether to normalize the action to be in the range [0, 1).
absolute (bool) – Whether to use the absolute value of the action.
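For example, an action-magnitude penalty might look like the sketch below (assuming the reward is scaled by the selected action value; the coefficient is arbitrary):

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_action

# Penalize the magnitude of the first action component each step.
action_penalty = partial(
    reward_fn_action,
    reward=-0.01,
    index=0,        # use only the first action component; None would sum the action
    absolute=True,  # use the absolute value, so the penalty ignores the action's sign
)
```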
- reward_combined(env, agent, terminated, truncated, info, *, exclusive_fns=[], **reward_fns)[source]¶
Combines multiple reward functions into one.
- Keyword Arguments:
exclusive_fns (Optional[List[str]]) – If provided, these reward functions are treated as exclusive: they are evaluated in order, and the first one to return a non-zero reward is used as the reward on its own.
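A standalone sketch of the exclusive behavior described above (illustrative only; it assumes non-exclusive rewards are summed, which the docstring does not spell out):

```python
def combined_reward_sketch(rewards, exclusive_fns=()):
    """rewards maps a reward-function name to the value it returned this step."""
    # If an exclusive function returned a non-zero reward, use the first such
    # value (in the order given by exclusive_fns) on its own.
    for name in exclusive_fns:
        if rewards.get(name, 0.0) != 0.0:
            return rewards[name]
    # Otherwise combine everything (summed here for illustration).
    return sum(rewards.values())

print(combined_reward_sketch({"done": 0.0, "delta": 0.3}, exclusive_fns=["done"]))  # 0.3
print(combined_reward_sketch({"done": 1.0, "delta": 0.3}, exclusive_fns=["done"]))  # 1.0
```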