cambrian.envs.reward_fns¶
Reward fns. These can be used to calculate rewards for agents.
Functions¶
- calc_delta – Calculates the delta position of the agent from a point.
- Calculates the quickness of the agent.
- apply_reward_fn – Applies the reward function to the agent if it is in the for_agents list.
- reward_fn_constant – Returns a constant reward.
- reward_fn_done – Rewards the agent if the episode is done. Termination indicates a successful episode, while truncation indicates an unsuccessful one.
- reward_fn_euclidean_delta_from_init – Rewards the change in distance from the initial position over the previous step.
- reward_fn_euclidean_delta_to_agent – Rewards the change in distance to any enabled agent over the previous step.
- reward_fn_agent_respawned – Rewards the agent if it has been respawned.
- reward_fn_close_to_agent – Rewards the agent if it is close to another agent.
- reward_fn_has_contacts – Rewards the agent if it has contacts.
- reward_fn_action – Rewards the agent based on the action taken.
- reward_combined – Combines multiple reward functions into one.
Module Contents¶
- calc_delta(agent, info, point=np.array([0, 0]))[source]¶
Calculates the delta position of the agent from a point.
- Returns:
np.ndarray – The delta position of the agent from the point (i.e. current - prev).
- apply_reward_fn(env, agent, *, reward_fn, for_agents=None, scale_by_quickness=False, disable=False, disable_on_max_episode_steps=False)[source]¶
Applies the reward function to the agent if it is in the for_agents list.
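For illustration, the documented keyword arguments can be bound ahead of time, e.g. with functools.partial, so that only the positional arguments need to be supplied later. This is a minimal sketch; how the project actually wires reward functions together (for example through its configuration system) may differ, and the agent name used here is a placeholder.

```python
from functools import partial

from cambrian.envs.reward_fns import apply_reward_fn, reward_fn_constant

# Bind the underlying reward function and the documented options up front.
# "agent_0" is a placeholder agent name, not one defined by the library.
wrapped_reward = partial(
    apply_reward_fn,
    reward_fn=partial(reward_fn_constant, reward=0.1),
    for_agents=["agent_0"],             # only apply the reward to these agents
    scale_by_quickness=True,            # scale the reward by the agent's quickness
    disable_on_max_episode_steps=True,  # skip the reward when the time limit is reached
)
```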
- reward_fn_constant(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]¶
Returns a constant reward.
- reward_fn_done(env, agent, terminated, truncated, info, *, termination_reward=0.0, truncation_reward=0.0, **kwargs)[source]¶
Rewards the agent if the episode is done. Termination indicates a successful episode, while truncation indicates an unsuccessful episode. If the time limit is reached, this is considered a termination. Applying a reward in this case can be disabled with the disable_on_max_episode_steps keyword argument.
- Keyword Arguments:
termination_reward (float) – The reward to give the agent if the episode is terminated. Defaults to 0.
truncation_reward (float) – The reward to give the agent if the episode is truncated. Defaults to 0.
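For example, a sketch that rewards successful terminations and penalizes truncations (binding shown with functools.partial; the actual registration mechanism is project-specific):

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_done

# +1 when the episode terminates (success), -1 when it is truncated (failure).
done_reward = partial(reward_fn_done, termination_reward=1.0, truncation_reward=-1.0)
```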
- reward_fn_euclidean_delta_from_init(env, agent, terminated, truncated, info, *, reward=1.0, **kwargs)[source]¶
Rewards the change in distance from the initial position over the previous step.
- reward_fn_euclidean_delta_to_agent(env, agent, terminated, truncated, info, *, reward, to_agents=None, **kwargs)[source]¶
Rewards the change in distance to any enabled agent over the previous step. Convention is that a positive reward indicates getting closer to the agent.
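The sign convention can be shown with a small standalone sketch (this re-implements the documented behavior for illustration only and is not the library's code):

```python
import numpy as np

def delta_to_agent_sketch(prev_pos, curr_pos, other_pos, reward=1.0):
    # Positive when the distance to the other agent decreased over the step.
    prev_dist = np.linalg.norm(prev_pos - other_pos)
    curr_dist = np.linalg.norm(curr_pos - other_pos)
    return reward * (prev_dist - curr_dist)

# Moving from (2, 0) to (1, 0) toward an agent at the origin gives a positive reward.
print(delta_to_agent_sketch(np.array([2.0, 0.0]), np.array([1.0, 0.0]), np.zeros(2)))
```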
- reward_fn_agent_respawned(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]¶
This reward function rewards the agent if it has been respawned.
- reward_fn_close_to_agent(env, agent, terminated, truncated, info, *, reward, distance_threshold, from_agents=None, to_agents=None, **kwargs)[source]¶
This reward function rewards the agent if it is close to another agent.
- Keyword Arguments:
reward (float) – The reward to give the agent if it is close to another agent. Default is 0.
distance_threshold (float) – The distance threshold to check if the agent is close to another agent.
from_agents (Optional[List[str]]) – The names of the agents that the reward should be calculated from. If None, the reward will be calculated from all agents.
to_agents (Optional[List[str]]) – The names of the agents that the reward should be calculated to. If None, the reward will be calculated to all agents.
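For example, rewarding one agent for staying near another might be configured as in the following sketch (the agent names and threshold are placeholders):

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_close_to_agent

# Reward "predator" with 1.0 whenever it is within 0.5 units of any "prey" agent.
close_reward = partial(
    reward_fn_close_to_agent,
    reward=1.0,
    distance_threshold=0.5,
    from_agents=["predator"],
    to_agents=["prey"],
)
```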
- reward_fn_has_contacts(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]¶
Rewards the agent if it has contacts.
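A common use is a contact penalty; a minimal sketch, assuming a negative reward is the intended way to penalize contacts:

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_has_contacts

# Apply -1.0 on any step where the agent has contacts.
contact_penalty = partial(reward_fn_has_contacts, reward=-1.0)
```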
- reward_fn_action(env, agent, terminated, truncated, info, *, reward, index=None, normalize=False, absolute=False, **kwargs)[source]¶
Rewards the agent based on the action taken.
- Keyword Arguments:
reward (float) – The reward to give the agent if the action is taken.
index (Optional[int]) – The index of the action to use for the reward. If None, the sum of the action is used.
normalize (bool) – Whether to normalize the action to be in the range [0, 1).
absolute (bool) – Whether to use the absolute value of the action.
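For example, an action-magnitude penalty might look like the sketch below (assuming the reward is scaled by the selected action value; the coefficient is arbitrary):

```python
from functools import partial

from cambrian.envs.reward_fns import reward_fn_action

# Penalize the magnitude of the first action component each step.
action_penalty = partial(
    reward_fn_action,
    reward=-0.01,
    index=0,        # use only the first action component; None would sum the action
    absolute=True,  # use the absolute value, so the penalty ignores the action's sign
)
```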
- reward_combined(env, agent, terminated, truncated, info, *, exclusive_fns=[], **reward_fns)[source]¶
Combines multiple reward functions into one.
- Keyword Arguments:
exclusive_fns (Optional[List[str]]) – If provided, these reward functions are treated as exclusive: they are evaluated in order, and the first one to return a non-zero reward is used as the reward on its own.
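A standalone sketch of the exclusive behavior described above (illustrative only; it assumes non-exclusive rewards are summed, which the docstring does not spell out):

```python
def combined_reward_sketch(rewards, exclusive_fns=()):
    """rewards maps a reward-function name to the value it returned this step."""
    # If an exclusive function returned a non-zero reward, use the first such
    # value (in the order given by exclusive_fns) on its own.
    for name in exclusive_fns:
        if rewards.get(name, 0.0) != 0.0:
            return rewards[name]
    # Otherwise combine everything (summed here for illustration).
    return sum(rewards.values())

print(combined_reward_sketch({"done": 0.0, "delta": 0.3}, exclusive_fns=["done"]))  # 0.3
print(combined_reward_sketch({"done": 1.0, "delta": 0.3}, exclusive_fns=["done"]))  # 1.0
```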