A link to a video summary of the project's progress, covering our successes, challenges, and future work, can be found here.
We decided to work on the Duckietown project. The goal of our project is to develop an intelligent system for the DuckieBot that enables self-navigation within the “Town” using reinforcement learning. Our focus is on building a simulation-based Duckietown environment and deploying a learning-based solution that trains the DuckieBot to detect and follow lanes, recognize signals, stop when necessary, and avoid collisions with objects such as walls, trees, and buses. Additionally, the system should respond effectively to environmental factors.
To achieve this, we will either purchase or create a DuckieBot equipped with a camera for sensor data collection. The DuckieBot should be capable of driving (accelerating), stopping, and turning. This is an exciting and novel project for us, and we believe it will significantly enhance our technical knowledge and teamwork skills. We are committed to collaborating effectively to ensure the successful completion of this project.
Since we aim to compare the performance of Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), our team is split into two groups: one focusing on PPO and the other on SAC training.
For SAC (not using Stable-Baselines3’s SAC implementation), we use a 5-dimensional state space:
- (x, z): 2D position of the DuckieBot.
- sin(θ), cos(θ): heading direction of the DuckieBot.
- velocity: speed of the DuckieBot.
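As a sketch of how such a state vector might be assembled (the function and argument names here are our own illustration, not the project's actual code):

```python
import math

def build_state(x, z, theta, velocity):
    """Assemble the 5-dimensional SAC state vector.

    Encoding the heading as (sin θ, cos θ) instead of the raw angle
    avoids the discontinuity at the ±π wrap-around.
    """
    return [x, z, math.sin(theta), math.cos(theta), velocity]

# Example: DuckieBot at (1.2, 0.8), facing 90°, moving at 0.4
state = build_state(x=1.2, z=0.8, theta=math.pi / 2, velocity=0.4)
```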
The 2-dimensional action space consists of:
The reward structure is designed as follows:
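The concrete reward terms did not survive into this write-up. As a purely hypothetical illustration (these coefficients and signal names are ours, not the project's), lane-following rewards in Duckietown are commonly shaped from distance to the lane center, heading alignment, and a collision penalty:

```python
def reward(dist_from_center, heading_error, speed, collided):
    """Hypothetical shaped reward (illustration only, not the project's terms).

    Rewards centered, aligned, forward motion; heavily penalizes collisions.
    """
    if collided:
        return -10.0  # large terminal penalty for hitting walls, trees, buses
    return (
        1.0 * speed                    # encourage forward progress
        - 2.0 * abs(dist_from_center)  # penalize drifting off the lane center
        - 1.0 * abs(heading_error)     # penalize misalignment with the lane
    )
```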
For PPO, we follow a CNN-based policy to process image-based observations. The key hyperparameters include:
- Learning rate: 3e-4
- Rollout steps: 1024
- Discount factor (γ): 0.99
- GAE λ: 0.95
- Entropy coefficient: 0.01

Training is conducted for 100,000 timesteps, utilizing a vectorized environment setup with make_vec_env and VecTransposeImage to ensure proper input shape.
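These hyperparameters can be collected into a single config dict; the keyword names below follow Stable-Baselines3's PPO constructor (the names in our actual training script may differ, and the environment id in the usage comment is only an assumption):

```python
# PPO hyperparameters, keyed by Stable-Baselines3 argument names.
ppo_config = {
    "learning_rate": 3e-4,
    "n_steps": 1024,     # rollout length per environment before each update
    "gamma": 0.99,       # discount factor
    "gae_lambda": 0.95,  # GAE smoothing
    "ent_coef": 0.01,    # entropy bonus to encourage exploration
}
TOTAL_TIMESTEPS = 100_000

# Usage sketch (requires stable-baselines3 and gym-duckietown):
# from stable_baselines3 import PPO
# from stable_baselines3.common.env_util import make_vec_env
# from stable_baselines3.common.vec_env import VecTransposeImage
#
# env = VecTransposeImage(make_vec_env("Duckietown-loop_empty-v0", n_envs=1))
# model = PPO("CnnPolicy", env, **ppo_config)
# model.learn(total_timesteps=TOTAL_TIMESTEPS)
```

With 1024-step rollouts, 100,000 timesteps correspond to roughly 97 policy updates per environment.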
We assess performance using key metrics:
- Mean episode reward (ep_rew_mean): the average cumulative reward per episode. Currently, SAC’s training results show poor performance, as the DuckieBot frequently veers off-lane and incurs penalties.
- Mean episode length (ep_len_mean): the average number of timesteps per episode. A decreasing trend suggests increased collisions, which shorten episode duration.
- Frames per second (fps): the FPS rate is below 10, which may hinder performance, particularly for CNN-based policies that rely on visual input.
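The first two metrics can be reproduced directly from raw episode logs; a minimal sketch, assuming each episode is recorded as a (total_reward, num_timesteps) pair (a record format of our own choosing):

```python
def summarize(episodes):
    """Compute ep_rew_mean and ep_len_mean from a list of
    (total_reward, num_timesteps) episode records."""
    rewards = [r for r, _ in episodes]
    lengths = [n for _, n in episodes]
    return {
        "ep_rew_mean": sum(rewards) / len(rewards),
        "ep_len_mean": sum(lengths) / len(lengths),
    }

# Three example episodes: the short, collision-ended run drags both means down.
stats = summarize([(12.5, 300), (-4.0, 80), (3.5, 160)])
```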


Due to these challenges, we plan to: