Connect 4 AI: Dueling Double DQN with Self-Play

An autonomous Connect 4 agent. This project features a Dueling Double Deep Q-Network (D3QN) architecture trained through Self-Play, with a Minimax baseline for performance benchmarking.


Key Features

  • Deep RL Architecture: Implements a Dueling Double DQN with Convolutional layers (CNN) to capture spatial patterns in the 6x7 grid.
  • Self-Play Training: The agent learns by competing against historical versions of itself, ensuring continuous policy improvement and robustness.
  • Vectorized Environment: Custom Gymnasium environment utilizing 2D convolutions for ultra-fast win detection and reward shaping.
  • Symmetry-Based Data Augmentation: Experience Replay buffer automatically mirrors transitions (horizontal flip) to double training data and improve generalization.
  • Minimax Baseline: Includes a Minimax agent with Alpha-Beta pruning for performance validation and benchmarking.
  • Interactive Mode: Play against the trained AI directly in the terminal with a colored UI.
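The kernel-based win detection mentioned above can be illustrated with a minimal NumPy sketch. This is a simplified, loop-based version for clarity; the repository's environment uses vectorized 2D convolutions, and the function and kernel names here are assumptions, not the project's actual API:

```python
import numpy as np

# One kernel per winning line orientation on the 6x7 board.
KERNELS = [
    np.ones((1, 4)),        # horizontal
    np.ones((4, 1)),        # vertical
    np.eye(4),              # main diagonal
    np.fliplr(np.eye(4)),   # anti-diagonal
]

def has_won(board: np.ndarray, player: int) -> bool:
    """Return True if `player` has four in a row anywhere on the board."""
    mask = (board == player).astype(float)
    for k in KERNELS:
        kh, kw = k.shape
        # Slide the kernel over every valid position (cross-correlation).
        for r in range(mask.shape[0] - kh + 1):
            for c in range(mask.shape[1] - kw + 1):
                if (mask[r:r + kh, c:c + kw] * k).sum() >= 4:
                    return True
    return False
```

A convolution-based variant replaces the explicit loops with a single `conv2d` over the player mask, which is what makes the environment's check fast in practice.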

Technical Architecture

1. The Neural Network (Dueling DQN)

The agent uses a Dueling DQN structure to decouple state-value estimation from action-advantage. This is particularly effective in Connect 4, where many actions in a given state might not significantly change the outcome.

The Q-value is computed using the following aggregation logic:

$$Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right)$$

  • Input: 6x7 game board (1x6x7 tensor).
  • Backbone: 3 Convolutional layers (64 to 128 filters) with Batch Normalization and LeakyReLU activation.
  • Heads: Two separate fully connected streams for State Value ($V$) and Action Advantage ($A$).
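The mean-subtraction aggregation above can be sketched in plain NumPy. This is illustrative only; in the actual model the same arithmetic is applied to the outputs of the two fully connected heads inside the network's forward pass:

```python
import numpy as np

def dueling_q(value: float, advantages: np.ndarray) -> np.ndarray:
    """Combine V(s) and A(s, a) into Q(s, a).

    Subtracting the mean advantage makes the decomposition identifiable:
    shifting all advantages by a constant no longer changes Q.
    """
    return value + (advantages - advantages.mean())
```

For example, with `V(s) = 1.0` and advantages `[0, 2, 4]` over three columns, the mean advantage is 2, giving Q-values `[-1, 1, 3]`.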

2. Training Strategy: Self-Play

To avoid overfitting against a specific strategy, the training script implements Self-Play:

  • Opponent Selection: In 20% of episodes, the agent faces a randomly selected "past version" of itself stored in the history directory.
  • Symmetry Invariance: The Replay Memory stores both the original move and its horizontal mirror, teaching the agent that the game is axially symmetrical.
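The symmetry augmentation can be sketched as follows: flip both boards left-to-right and remap the column index. The function name and tuple layout are assumptions for illustration, not the repository's actual replay-buffer interface:

```python
import numpy as np

NUM_COLS = 7  # Connect 4 board width

def mirrored_transition(state, action, reward, next_state, done):
    """Return the horizontally mirrored copy of a transition.

    Column c maps to column (NUM_COLS - 1 - c); rewards and the done
    flag are unchanged because the game is axially symmetric.
    """
    return (
        np.fliplr(state).copy(),
        NUM_COLS - 1 - action,
        reward,
        np.fliplr(next_state).copy(),
        done,
    )
```

Storing both the original transition and this mirror effectively doubles the replay data at negligible cost.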

3. Reward Shaping

To accelerate convergence, the environment provides a composite reward signal:

  • Terminal States: Win (+1.0), Loss (-1.0), Draw (0.0).
  • Intermediate Heuristics: Small rewards for controlling the center column and creating "3-in-a-row" threats, with penalties for failing to block opponent threats.
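The composite signal above can be sketched as a single function: terminal outcomes dominate, and small heuristic bonuses apply otherwise. The bonus magnitude (0.05) and the function shape are illustrative assumptions; the real environment also scores 3-in-a-row threats and missed blocks from the board state:

```python
CENTER_COL = 3  # middle column of the 7-wide board

def shaped_reward(board, action, terminal_reward=None):
    """Composite reward: terminal result, else small heuristic bonuses.

    `terminal_reward` is +1.0 (win), -1.0 (loss), or 0.0 (draw) when the
    episode ends, and None for intermediate moves.
    """
    if terminal_reward is not None:
        return terminal_reward
    r = 0.0
    if action == CENTER_COL:
        r += 0.05  # illustrative bonus for controlling the center
    # Threat-creation and blocking bonuses would inspect `board` here.
    return r
```

Keeping the heuristic terms an order of magnitude smaller than the terminal reward prevents the agent from farming intermediate bonuses instead of winning.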

Installation

Install dependencies:

pip install -r requirements.txt

Usage

1. Training the Agent

Start the training process using a hyperparameter set defined in config/hyperparameters.yml:

python train.py 4MasterDuelingD3QN

2. Monitoring with TensorBoard

Visualize training metrics (win-rate, loss, rewards) in your browser at http://localhost:6006:

tensorboard --logdir=runs/logs

3. Benchmarking vs Minimax

Evaluate the performance of a trained model against a classical Minimax baseline:

python play_minimax.py runs/4MasterDuelingD3QN.pt -d 2 -n 20

4. AI vs Minimax with Visual Rendering

Watch a few games between the AI and Minimax to analyze the agent's strategy:

python play_minimax.py runs/4MasterDuelingD3QN.pt -d 2 -n 2 --render

5. Interactive Mode (Human vs AI)

Play against your trained model directly in the terminal (you start first):

python play_human.py runs/4MasterDuelingD3QN.pt

6. Interactive Mode (AI Starts)

Play against the model where the AI makes the first move:

python play_human.py runs/4MasterDuelingD3QN.pt --first model

7. Duel Between Two Models

Evaluate the performance of a trained model against another model:

python compare_model.py runs/model_v1.pt runs/model_v2.pt -n 100

8. Duel Between Two Models (Rendered)

Watch a few games between two models:

python compare_model.py runs/model_v1.pt runs/model_v2.pt -n 2 --render
