An autonomous Connect 4 agent. This project features a Dueling Double Deep Q-Network (D3QN) architecture trained through Self-Play, with a Minimax baseline for performance benchmarking.
- Deep RL Architecture: Implements a Dueling Double DQN with Convolutional layers (CNN) to capture spatial patterns in the 6x7 grid.
- Self-Play Training: The agent learns by competing against historical versions of itself, ensuring continuous policy improvement and robustness.
- Vectorized Environment: Custom Gymnasium environment that uses 2D convolutions for fast, fully vectorized win detection and reward shaping (see the sketch after this list).
- Symmetry-Based Data Augmentation: Experience Replay buffer automatically mirrors transitions (horizontal flip) to double training data and improve generalization.
- Minimax Baseline: Includes a Minimax agent with Alpha-Beta pruning for performance validation and benchmarking.
- Interactive Mode: Play against the trained AI directly in the terminal with a colored UI.
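To illustrate the convolution-based win check, here is a minimal sketch (not the project's exact implementation) that slides four line-detector kernels over a 6x7 NumPy board:

```python
import numpy as np
from scipy.signal import convolve2d

# Four line detectors: horizontal, vertical, and both diagonals.
KERNELS = [
    np.ones((1, 4)),       # horizontal
    np.ones((4, 1)),       # vertical
    np.eye(4),             # diagonal \
    np.fliplr(np.eye(4)),  # diagonal /
]

def has_won(board: np.ndarray, player: int) -> bool:
    """Return True if `player` has four connected pieces on a 6x7 board."""
    mask = (board == player).astype(np.int8)
    # A convolution output of 4 means all four cells under the kernel match.
    return any((convolve2d(mask, k, mode="valid") == 4).any() for k in KERNELS)
```

Because the check is a handful of array operations rather than a Python loop over cells, it stays cheap even when called after every move during training.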
The agent uses a Dueling DQN structure to decouple state-value estimation from action-advantage estimation. This is particularly effective in Connect 4, where many actions in a given state do not significantly change the outcome.
The network is structured as follows; the two streams are recombined into Q-values using the dueling aggregation shown after the list:
- Input: 6x7 game board (1x6x7 tensor).
- Backbone: 3 Convolutional layers (64 to 128 filters) with Batch Normalization and LeakyReLU activation.
- Heads: Two separate fully connected streams for State Value ($V$) and Action Advantage ($A$).
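The streams are recombined with the standard mean-centered dueling aggregation:

$$Q(s, a) = V(s) + \left(A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a')\right)$$

A minimal PyTorch sketch of this architecture follows; the hidden sizes and kernel shapes are illustrative assumptions, not the exact values used in training:

```python
import torch
import torch.nn as nn

class DuelingD3QN(nn.Module):
    """CNN backbone with separate value and advantage heads."""

    def __init__(self, n_actions: int = 7):
        super().__init__()
        # Backbone: 3 conv layers (64 -> 128 filters), BatchNorm + LeakyReLU.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.LeakyReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(),
            nn.Flatten(),
        )
        hidden = 128 * 6 * 7  # flattened feature size for a 6x7 board
        self.value = nn.Sequential(nn.Linear(hidden, 256), nn.LeakyReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 256), nn.LeakyReLU(), nn.Linear(256, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, 6, 7)
        h = self.backbone(x)
        v = self.value(h)      # (batch, 1)
        a = self.advantage(h)  # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)  # mean-centered aggregation
```

Subtracting the mean advantage keeps the decomposition identifiable: adding a constant to $A$ and subtracting it from $V$ would otherwise leave $Q$ unchanged.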
To avoid overfitting to a single opponent strategy, the training script implements Self-Play (a minimal sketch follows the list):
- Opponent Selection: In 20% of episodes, the agent faces a randomly selected "past version" of itself stored in the history directory.
- Symmetry Invariance: The Replay Memory stores both the original move and its horizontal mirror, teaching the agent that the game is axially symmetrical.
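The sketch below illustrates both mechanics, assuming a history/ directory of .pt snapshots and NumPy board states; load_agent is a hypothetical loader, and the 20% rate matches the description above:

```python
import glob
import random
import numpy as np

PAST_OPPONENT_RATE = 0.2  # 20% of episodes face a historical snapshot

def pick_opponent(current_agent, history_dir="history"):
    """Choose the sparring partner for the next episode."""
    snapshots = glob.glob(f"{history_dir}/*.pt")
    if snapshots and random.random() < PAST_OPPONENT_RATE:
        return load_agent(random.choice(snapshots))  # hypothetical loader
    return current_agent  # otherwise the agent mirror-matches its live self

def mirrored(state, action, reward, next_state, done, n_cols=7):
    """Horizontal mirror of a transition: flip columns, remap the action."""
    flip = lambda s: np.flip(s, axis=-1).copy()  # works for 6x7 or 1x6x7 arrays
    return flip(state), n_cols - 1 - action, reward, flip(next_state), done
```

Both the original transition and its mirrored counterpart are pushed to the replay buffer, doubling the effective data collected per step.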
To accelerate convergence, the environment provides a composite reward signal:
- Terminal States: Win (+1.0), Loss (-1.0), Draw (0.0).
- Intermediate Heuristics: Small rewards for controlling the center column and creating "3-in-a-row" threats, with penalties for failing to block opponent threats.
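A minimal sketch of how such a composite signal can be assembled; the shaping magnitudes below are assumed for illustration (the actual values belong in config/hyperparameters.yml):

```python
# Terminal rewards from the list above; shaping coefficients are assumptions.
WIN, LOSS, DRAW = 1.0, -1.0, 0.0
CENTER_BONUS = 0.05   # assumed: played the center column
THREAT_BONUS = 0.10   # assumed: per new 3-in-a-row threat created
MISSED_BLOCK = -0.20  # assumed: left an opponent threat unanswered

def shaped_reward(outcome, played_center, threats_created, missed_block):
    """Composite reward: terminal result, else a sum of small heuristics."""
    if outcome is not None:  # "win" / "loss" / "draw"
        return {"win": WIN, "loss": LOSS, "draw": DRAW}[outcome]
    reward = CENTER_BONUS if played_center else 0.0
    reward += THREAT_BONUS * threats_created
    if missed_block:
        reward += MISSED_BLOCK
    return reward
```

Keeping the heuristic terms small relative to the terminal rewards ensures shaping accelerates learning without overriding the true objective of winning the game.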
Install dependencies:
```bash
pip install -r requirements.txt
```
Start the training process using a hyperparameter set defined in config/hyperparameters.yml:
```bash
python train.py 4MasterDuelingD3QN
```

Visualize training metrics (win rate, loss, rewards) in your browser at http://localhost:6006:

```bash
tensorboard --logdir=runs/logs
```

Evaluate the performance of a trained model against the classical Minimax baseline:

```bash
python play_minimax.py runs/4MasterDuelingD3QN.pt -d 2 -n 20
```

Watch a few games between the AI and Minimax to analyze the agent's strategy:

```bash
python play_minimax.py runs/4MasterDuelingD3QN.pt -d 2 -n 2 --render
```

Play against your trained model directly in the terminal (you start first):

```bash
python play_human.py runs/4MasterDuelingD3QN.pt
```

Play against the model where the AI takes the first move:

```bash
python play_human.py runs/4MasterDuelingD3QN.pt --first model
```

Evaluate the performance of a trained model against another model:

```bash
python compare_model.py runs/model_v1.pt runs/model_v2.pt -n 100
```

Watch a few games between two models:

```bash
python compare_model.py runs/model_v1.pt runs/model_v2.pt -n 2 --render
```