cambrian.envs.reward_fns

Reward functions. These can be used to calculate rewards for agents.

Functions

calc_delta(agent, info[, point])

Calculates the delta position of the agent from a point.

calc_quickness(env)

Calculates the quickness of the agent.

apply_reward_fn(env, agent, *, reward_fn[, ...])

Applies the reward function to the agent if it is in the for_agents list.

reward_fn_constant(env, agent, terminated, truncated, ...)

Returns a constant reward.

reward_fn_done(env, agent, terminated, truncated, info, *)

Rewards the agent if the episode is done.

reward_fn_euclidean_delta_from_init(env, agent, ...[, ...])

Rewards the change in distance over the previous step.

reward_fn_euclidean_delta_to_agent(env, agent, ...[, ...])

Rewards the change in distance to any enabled agent over the previous step.

reward_fn_agent_respawned(env, agent, terminated, ...)

This reward function rewards the agent if it has been respawned.

reward_fn_close_to_agent(env, agent, terminated, ...)

This reward function rewards the agent if it is close to another agent.

reward_fn_has_contacts(env, agent, terminated, ...)

Rewards the agent if it has contacts.

reward_fn_action(env, agent, terminated, truncated, ...)

Rewards the agent based on the action taken.

reward_combined(env, agent, terminated, truncated, info, *)

Combines multiple reward functions into one.

Module Contents

calc_delta(agent, info, point=np.array([0, 0]))[source]

Calculates the delta position of the agent from a point.

Returns:
  np.ndarray – The delta position of the agent from the point (i.e. current - prev).
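A minimal sketch of the current - prev convention, using made-up positions; the real calc_delta reads the agent's current and previous positions from the agent and info objects:

    import numpy as np

    point = np.array([0.0, 0.0])
    prev_pos = np.array([3.0, 4.0])      # hypothetical previous position
    current_pos = np.array([2.5, 4.0])   # hypothetical current position

    # Offset of the agent from the point, now and at the previous step.
    current_offset = current_pos - point
    prev_offset = prev_pos - point

    # "current - prev": the change of that offset over the step.
    delta = current_offset - prev_offset

    # The euclidean-delta reward functions below describe the corresponding
    # change in distance; a negative value means the agent moved closer.
    delta_dist = np.linalg.norm(current_offset) - np.linalg.norm(prev_offset)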

calc_quickness(env)[source]

Calculates the quickness of the agent.

apply_reward_fn(env, agent, *, reward_fn, for_agents=None, scale_by_quickness=False, disable=False, disable_on_max_episode_steps=False)[source]

Applies the reward function to the agent if it is in the for_agents list.
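A stand-alone sketch of the gating behaviour implied by these parameters (agent filtering, optional quickness scaling, and the disable flag); it is not the library's implementation, and the names below are hypothetical:

    from typing import List, Optional

    def apply_reward_sketch(
        agent_name: str,
        raw_reward: float,
        *,
        for_agents: Optional[List[str]] = None,
        quickness: float = 1.0,
        scale_by_quickness: bool = False,
        disable: bool = False,
    ) -> float:
        if disable:
            return 0.0
        # Only agents listed in for_agents receive the reward (None means all).
        if for_agents is not None and agent_name not in for_agents:
            return 0.0
        # Optionally scale the reward by a quickness factor (see calc_quickness).
        return raw_reward * quickness if scale_by_quickness else raw_reward

    print(apply_reward_sketch("agent_0", 1.0, for_agents=["agent_1"]))  # 0.0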

reward_fn_constant(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]

Returns a constant reward.

reward_fn_done(env, agent, terminated, truncated, info, *, termination_reward=0.0, truncation_reward=0.0, **kwargs)[source]

Rewards the agent if the episode is done. Termination indicates a successful episode, while truncation indicates an unsuccessful episode. If the time limit is reached, this is considered a termination. Applying a reward in this case can be disabled with the disable_on_max_episode_steps keyword argument.

Keyword Arguments:
  • termination_reward (float) – The reward to give the agent if the episode is terminated. Defaults to 0.

  • truncation_reward (float) – The reward to give the agent if the episode is truncated. Defaults to 0.
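A stand-alone sketch of the termination/truncation selection described above, not the library's implementation:

    def done_reward_sketch(
        terminated: bool,
        truncated: bool,
        *,
        termination_reward: float = 0.0,
        truncation_reward: float = 0.0,
    ) -> float:
        if terminated:
            return termination_reward  # successful episode
        if truncated:
            return truncation_reward  # unsuccessful episode
        return 0.0

    print(done_reward_sketch(True, False, termination_reward=1.0))  # 1.0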

reward_fn_euclidean_delta_from_init(env, agent, terminated, truncated, info, *, reward=1.0, **kwargs)[source]

Rewards the change in distance over the previous step.

reward_fn_euclidean_delta_to_agent(env, agent, terminated, truncated, info, *, reward, to_agents=None, **kwargs)[source]

Rewards the change in distance to any enabled agent over the previous step. Convention is that a positive reward indicates getting closer to the agent.
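A numpy sketch of that sign convention (positive reward when the agent got closer to the target over the last step); the positions and scale below are made-up values:

    import numpy as np

    def delta_to_agent_sketch(prev_pos, current_pos, target_pos, reward_scale=1.0):
        prev_dist = np.linalg.norm(prev_pos - target_pos)
        current_dist = np.linalg.norm(current_pos - target_pos)
        # Distance decreased -> positive reward; increased -> negative reward.
        return reward_scale * (prev_dist - current_dist)

    print(delta_to_agent_sketch(np.array([3.0, 0.0]), np.array([2.0, 0.0]), np.zeros(2)))  # 1.0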

reward_fn_agent_respawned(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]

This reward function rewards the agent if it has been respawned.

reward_fn_close_to_agent(env, agent, terminated, truncated, info, *, reward, distance_threshold, from_agents=None, to_agents=None, **kwargs)[source]

This reward function rewards the agent if it is close to another agent.

Keyword Arguments:
  • reward (float) – The reward to give the agent if it is close to another agent.

  • distance_threshold (float) – The distance threshold to check if the agent is close to another agent.

  • from_agents (Optional[List[str]]) – The names of the agents that the reward should be calculated from. If None, the reward will be calculated from all agents.

  • to_agents (Optional[List[str]]) – The names of the agents that the reward should be calculated to. If None, the reward will be calculated to all agents.
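A stand-alone sketch of the proximity check described by these arguments; the positions dictionary below is a hypothetical stand-in for the environment's agents, not part of the library API:

    from typing import Dict, List, Optional

    import numpy as np

    def close_to_agent_sketch(
        positions: Dict[str, np.ndarray],
        *,
        reward: float,
        distance_threshold: float,
        from_agents: Optional[List[str]] = None,
        to_agents: Optional[List[str]] = None,
    ) -> Dict[str, float]:
        from_agents = from_agents or list(positions)
        to_agents = to_agents or list(positions)
        rewards = {name: 0.0 for name in from_agents}
        for src in from_agents:
            for dst in to_agents:
                if src == dst:
                    continue
                # Reward src if it is within the threshold of dst.
                if np.linalg.norm(positions[src] - positions[dst]) < distance_threshold:
                    rewards[src] = reward
        return rewards

    positions = {"agent_0": np.zeros(2), "agent_1": np.array([0.5, 0.0])}
    print(close_to_agent_sketch(positions, reward=1.0, distance_threshold=1.0))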

reward_fn_has_contacts(env, agent, terminated, truncated, info, *, reward, **kwargs)[source]

Rewards the agent if it has contacts.

reward_fn_action(env, agent, terminated, truncated, info, *, reward, index=None, normalize=False, absolute=False, **kwargs)[source]

Rewards the agent based on the action taken.

Keyword Arguments:
  • reward (float) – The reward to give the agent if the action is taken.

  • index (Optional[int]) – The index of the action to use for the reward. If None, the sum of the action is used.

  • normalize (bool) – Whether to normalize the action to be in the range [0, 1).

  • absolute (bool) – Whether to use the absolute value of the action.
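A sketch of how the action might be reduced to a scalar reward given these arguments; the [-1, 1] action range assumed for normalization is a guess, not taken from the library:

    import numpy as np

    def action_reward_sketch(action, *, reward=1.0, index=None, normalize=False, absolute=False):
        action = np.asarray(action, dtype=float)
        # Use a single component if index is given, otherwise the sum of the action.
        value = action[index] if index is not None else action.sum()
        if absolute:
            value = abs(value)
        if normalize:
            # Assumes the action value lies in [-1, 1]; maps it into [0, 1].
            value = (value + 1.0) / 2.0
        return reward * value

    print(action_reward_sketch([0.5, -0.25], index=0, absolute=True))  # 0.5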

reward_combined(env, agent, terminated, truncated, info, *, exclusive_fns=[], **reward_fns)[source]

Combines multiple reward functions into one.

Keyword Arguments:
  • exclusive_fns (Optional[List[str]]) – If provided, the named reward functions are checked in order and the first one that returns a non-zero reward is used exclusively as the combined reward.
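One possible reading of the combination logic, as a stand-alone sketch; the fallback to summing the non-exclusive functions when every exclusive function returns zero is an assumption, not taken from the library:

    from typing import Callable, Dict, Sequence

    def combined_reward_sketch(
        reward_fns: Dict[str, Callable[[], float]],
        exclusive_fns: Sequence[str] = (),
    ) -> float:
        # Exclusive functions are checked in order; the first non-zero result wins.
        for name in exclusive_fns:
            value = reward_fns[name]()
            if value != 0.0:
                return value
        # Assumption: otherwise, sum the remaining (non-exclusive) reward functions.
        return sum(fn() for name, fn in reward_fns.items() if name not in exclusive_fns)

    fns = {"done": lambda: 0.0, "delta": lambda: 0.5, "contacts": lambda: 0.25}
    print(combined_reward_sketch(fns, exclusive_fns=["done"]))  # 0.75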