Parking¶
A goal-conditioned continuous control task in which the ego-vehicle must park in a given space with the appropriate heading.
Usage¶
import gym
import highway_env

env = gym.make("parking-v0")
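The environment can then be driven through the standard gym loop. A minimal random-policy rollout, as a sketch (the sampled action is random, not a trained agent; in this goal-conditioned setting the observation bundles the current state with the desired goal):
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random continuous action; replace with a trained policy
    obs, reward, done, info = env.step(action)
    env.render()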
Default configuration¶
{
"observation": {
"type": "KinematicsGoal",
"features": ['x', 'y', 'vx', 'vy', 'cos_h', 'sin_h'],
"scales": [100, 100, 5, 5, 1, 1],
"normalize": False
},
"action": {
"type": "ContinuousAction"
},
"simulation_frequency": 15,
"policy_frequency": 5,
"screen_width": 600,
"screen_height": 300,
"centering_position": [0.5, 0.5],
"scaling": 7
"show_trajectories": False,
"render_agent": True,
"offscreen_rendering": False
}
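Any of these keys can be overridden before an episode starts, for example (a sketch; the chosen values are illustrative, and configure() is described in the API below):
env = gym.make("parking-v0")
env.configure({
    "policy_frequency": 10,
    "screen_width": 1200,
})
obs = env.reset()  # the updated configuration takes effect on reset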
More specifically, it is defined in highway_env.envs.parking_env.
API¶
class highway_env.envs.parking_env.ParkingEnv(config: dict = None)[source]¶
A continuous control environment.
It implements a reach-type task, where the agent observes its position and speed and must control its acceleration and steering so as to reach a given goal.
Credits to Munir Jojo-Verge for the idea and initial implementation.
REWARD_WEIGHTS: numpy.ndarray = array([1., 0.3, 0., 0., 0.02, 0.02])¶
SUCCESS_GOAL_REWARD: float = 0.12¶
STEERING_RANGE: float = 0.7853981633974483¶
classmethod default_config() → dict[source]¶
Default environment configuration.
Can be overloaded in environment implementations, or by calling configure().
- Returns
a configuration dict
step(action: numpy.ndarray) → Tuple[numpy.ndarray, float, bool, dict][source]¶
Perform an action and step the environment dynamics.
The action is executed by the ego-vehicle, and all other vehicles on the road perform their default behaviour for several simulation timesteps until the next decision-making step.
- Parameters
action – the action performed by the ego-vehicle
- Returns
a tuple (observation, reward, terminal, info)
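With the default ContinuousAction type, the action is a 2-dimensional array of values in [-1, 1]. A sketch of a single step (the mapping of the two components to acceleration and steering is defined by the action type, so the interpretation in the comment is an assumption):
import numpy as np

obs, reward, done, info = env.step(np.array([0.5, -0.1]))  # assumed: moderate acceleration, slight steering
print(reward, done)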
reset() → numpy.ndarray[source]¶
Reset the environment to its initial configuration.
- Returns
the observation of the reset state
_create_road(spots: int = 15) → None[source]¶
Create a road composed of straight adjacent lanes.
- Parameters
spots – number of spots in the parking lot
_create_vehicles() → None[source]¶
Create some new random vehicles of a given type, and add them to the road.
compute_reward(achieved_goal: numpy.ndarray, desired_goal: numpy.ndarray, info: dict, p: float = 0.5) → float[source]¶
Proximity to the goal is rewarded.
We use a weighted p-norm.
- Parameters
achieved_goal – the goal that was achieved
desired_goal – the goal that was desired
info (dict) – any supplementary information
p – the exponent of the weighted L^p norm used in the reward. Use p < 1 to have high kurtosis for rewards in [0, 1]
- Returns
the corresponding reward
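A worked sketch of such a weighted p-norm goal reward, using the REWARD_WEIGHTS and default p = 0.5 listed above (the exact internal formula may differ slightly; the feature order [x, y, vx, vy, cos_h, sin_h] follows the observation configuration):
import numpy as np

REWARD_WEIGHTS = np.array([1., 0.3, 0., 0., 0.02, 0.02])

def weighted_goal_reward(achieved_goal, desired_goal, p=0.5, weights=REWARD_WEIGHTS):
    # Negative weighted distance between achieved and desired goal features,
    # raised to the power p; the reward is 0 exactly at the goal and negative elsewhere.
    return -np.power(np.dot(np.abs(achieved_goal - desired_goal), weights), p)

achieved = np.array([0.05, 0.02, 0., 0., 1., 0.])  # slightly off the target pose
desired = np.array([0., 0., 0., 0., 1., 0.])
print(weighted_goal_reward(achieved, desired))  # ≈ -0.24
With SUCCESS_GOAL_REWARD = 0.12, the goal can be considered reached once this reward rises above -0.12, i.e. once the weighted distance to the desired pose is small enough (this use of the threshold is an assumption based on the attribute name).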
_reward(action: numpy.ndarray) → float[source]¶
Return the reward associated with performing a given action and ending up in the current state.
- Parameters
action – the last action performed
- Returns
the reward
_is_terminal() → bool[source]¶
The episode is over if the ego-vehicle has crashed or the goal is reached.
__annotations__ = {'REWARD_WEIGHTS': <class 'numpy.ndarray'>, 'STEERING_RANGE': <class 'float'>, 'SUCCESS_GOAL_REWARD': <class 'float'>}¶
__module__ = 'highway_env.envs.parking_env'¶