The objective of this project is to develop a computer vision solution for classifying and segmenting face masks in images. The approach involves using both traditional machine learning techniques with handcrafted features and deep learning models to achieve accurate classification and segmentation.
- Extract handcrafted features from the dataset to represent image characteristics.
- Train and evaluate at least two machine learning classifiers, such as Support Vector Machines (SVM) and Neural Networks, to distinguish between faces "with mask" and "without mask."
- Compare the accuracy and performance of these classifiers.
- Design and train a Convolutional Neural Network (CNN) for binary classification.
- Experiment with hyper-parameter variations, such as learning rate, batch size, optimizer, and activation functions in the classification layer.
- Compare CNN performance with the machine learning classifiers to determine effectiveness.
- Implement a region-based segmentation approach, such as thresholding or edge detection, to segment mask regions for faces identified as "with mask."
- Visualize and evaluate the segmentation results.
- Train a U-Net model for precise segmentation of mask regions in the images.
- Compare the performance of U-Net with traditional segmentation techniques using metrics like Intersection over Union (IoU) or Dice score.
- Sources:
  - Face Mask Detection Dataset: GitHub Repository
  - Masked Face Segmentation Dataset: GitHub Repository
The dataset consists of images categorized into two classes: individuals wearing masks and individuals without masks.
```
.
├── dataset
│   ├── with_mask
│   └── without_mask
```
- with_mask contains images of individuals wearing masks
- without_mask contains images of individuals without masks
- MFSD (Masked Face Segmentation Dataset)
  - Source: Downloaded from Google Drive using the script in dataset/data_download.py
  - Used for: Tasks 3 and 4 (traditional segmentation and U-Net segmentation)
  - Structure:
    - Contains original images of people wearing masks in MFSD_dataset/MSFD/1/face_crop
    - Includes pixel-level ground truth segmentation masks in MFSD_dataset/MSFD/1/face_crop_segmentation
    - Dataset information stored in MFSD_dataset/MSFD/1/dataset.csv
- Dataset Preparation:
- Created a custom ImageDataset class to load images from the local directory.
- Analyzed images and found a mix of grayscale and 4-channel images.
- Converted all images to RGB and resized them to the mean size of 285x285.
- Transformed dataset for consistency.
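The normalization step above (mixed grayscale/4-channel inputs forced to RGB, resized to the mean size) can be sketched as follows; the project's own ImageDataset class wraps this per-image transform, and the helper name here is illustrative:

```python
from PIL import Image

def preprocess(img: Image.Image, size=(285, 285)) -> Image.Image:
    """Normalize one image: force 3-channel RGB (handles both the grayscale
    and the 4-channel inputs found in the dataset) and resize to the mean size."""
    return img.convert("RGB").resize(size)
```

`Image.convert("RGB")` handles both directions in one call: it replicates a single grayscale channel and drops an alpha channel.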
- MLP on Raw Images:
- Trained a Multi-Layer Perceptron (MLP) using raw images as input (flattened vectors).
- Achieved 87% accuracy.
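A minimal sketch of this baseline with scikit-learn's MLPClassifier, using synthetic stand-in vectors in place of the flattened images; the hyperparameters mirror those reported below (Adam, learning rate 0.001, batch size 200), while the hidden-layer size and iteration count are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: in the project, X holds flattened image vectors
# (one row per image) and y the with_mask / without_mask labels.
rng = np.random.default_rng(0)
X = rng.random((400, 64))
y = rng.integers(0, 2, 400)

# Hyperparameters mirror the report: Adam optimizer, lr 0.001, batch size 200.
clf = MLPClassifier(hidden_layer_sizes=(128,), solver="adam",
                    learning_rate_init=0.001, batch_size=200,
                    max_iter=50, random_state=0)
clf.fit(X, y)
print(f"train accuracy: {clf.score(X, y):.2f}")
```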
- Handcrafted Feature Extraction:
- Extracted Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) features.
- HOG: Captures the distribution of gradient orientations to encode shape and texture.
- SIFT: Detects key points and descriptors that are invariant to scale and rotation.
- Used these features to train two classifiers: SVM (SVC) and MLP.
| Classifier | Accuracy |
|---|---|
| SVM (SVC) | 88.52% |
| MLP (Handcrafted Features) | 92.9% |
- MLP: Optimizer - Adam, Learning Rate - 0.001, Batch Size - 200.
- SVM (SVC): Default settings.
- The MLP outperformed the SVC, likely because its non-linear hidden layers make it a more flexible function approximator for these features.
- Handcrafted features improved performance compared to raw image input.
- CNN Architecture:
- Designed a CNN model for classification.
- Initially used average pooling to flatten the feature maps.
- Average pooling discarded spatial information, limiting accuracy to 78%.
- Improved CNN:
- Replaced average pooling with flattening after max pooling.
- Fully connected layers with input of size 24 × 35 × 35.
| Model | Accuracy |
|---|---|
| CNN (Avg Pooling) | 78% |
| CNN (Max Pooling + Flattening) | 95.38% |
- Optimizer: Adam
- Learning Rate: 0.001
- Epochs: 13
- Tried optimizers: Adam and SGD; Adam converged faster.
- Max pooling with flattening preserved information better than average pooling.
- CNN outperformed SVM (88.52%) and MLP (92.9%) in classification with the best accuracy of 95.38%.
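A PyTorch sketch of the improved CNN: three conv blocks with max pooling, then a flatten feeding fully connected layers. The flattened size 24 × 35 × 35 matches the report for 285×285 RGB inputs (285 → 142 → 71 → 35 over three 2×2 pools); the intermediate channel counts and hidden layer width are assumptions:

```python
import torch
import torch.nn as nn

class MaskCNN(nn.Module):
    """Improved CNN sketch: max pooling + flattening instead of average pooling.
    Intermediate channel counts (8, 16) and hidden width (128) are assumptions;
    the 24 * 35 * 35 flatten size follows the report for 285x285 inputs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 285 -> 142
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 142 -> 71
            nn.Conv2d(16, 24, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 71 -> 35
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                       # preserves spatial information
            nn.Linear(24 * 35 * 35, 128), nn.ReLU(),
            nn.Linear(128, 2),                  # with_mask / without_mask logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Flattening after max pooling keeps the per-location activations that global average pooling would have collapsed, which the report identifies as the source of the accuracy gain.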
File: combine_hsv_lab.py
We explored a more complex color-based segmentation approach that combined HSV and LAB color spaces:
- Convert image to HSV and LAB color spaces
- Create masks using:
- Dark regions in HSV (for black mask areas)
- Yellow regions in HSV (for logos)
- L channel thresholding
- B channel thresholding
- Combine multiple color space masks
- Apply morphological operations (closing, opening, dilation)
- Combining multiple color-space masks made the segmentation logic complex
- Increased computational cost
- Results were less consistent than with the single-HSV approach
File: haar_cascade_and_kmeans.py
An alternative approach combining face detection and image segmentation:
- Use Haar cascade classifier for face detection
- Apply K-Means clustering for image segmentation
- Extract lower face regions
- Inconsistent face detection
- K-Means clustering not specifically tailored to mask segmentation
- High computational overhead
File: resize_and_contrast_equalze.py
Explored image preprocessing techniques to improve segmentation:
- Resize images with padding
- Apply Contrast Limited Adaptive Histogram Equalization (CLAHE)
- Maintain aspect ratio during resizing
- Standardized image size
- Enhanced local image contrast
- Improved feature visibility
We developed a mask segmentation method using color-based thresholding and morphological operations:
- Color Space Conversion:
- Converted images to HSV color space for better color-based segmentation
- Used interactive HSV tuner to find optimal color range for mask detection
- Segmentation Techniques:
- Applied color thresholding using custom HSV range
- Used morphological operations (closing and opening) to refine the mask
- Selected the largest contour to create the final mask
- Developed an interactive HSV tuner (hsv_tuner.py) to manually adjust color thresholds
- Final HSV range used:
  - Lower Bound: [85, 20, 30]
  - Upper Bound: [160, 255, 255]
- Preprocess images by resizing
- Convert to HSV color space
- Create binary mask using color thresholding
- Apply morphological operations
- Extract largest contour as final mask
- Save only those masks whose white-pixel percentage falls between 10% and 75%
Included sample images showing:
- Original images
- Extracted masks
- Segmented regions
- Varying mask colors and materials
- Inconsistent lighting conditions
- Handling different mask styles
- Python 3.7+
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/VR_Project1_YourName_YourRollNo.git
  cd VR_Project1_YourName_YourRollNo
  ```

- Create a virtual environment (optional but recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Run the segmentation script:

  ```bash
  python src/segmentation.py
  ```

- Model Architecture:
- Custom U-Net implementation defined in UNet class with:
- Encoder path with 4 blocks: [3→64, 64→128, 128→256, 256→512]
- Double convolution blocks with batch normalization and ReLU activation
- Decoder path with skip connections from encoder
- 2×2 max pooling in encoder and bilinear upsampling in decoder
- Final 1×1 convolution and sigmoid activation for binary mask output
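The double-convolution block and encoder channel progression described above can be sketched in PyTorch (the project's full UNet class also wires in the skip connections and bilinear upsampling, omitted here for brevity):

```python
import torch
import torch.nn as nn

def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """One U-Net block: two 3x3 convolutions, each followed by batch
    normalization and ReLU, as described above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Encoder channel progression from the report: 3 -> 64 -> 128 -> 256 -> 512,
# with 2x2 max pooling between blocks in the full model.
encoder = nn.ModuleList(
    double_conv(i, o) for i, o in [(3, 64), (64, 128), (128, 256), (256, 512)])
```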
- Training Approach:
- Input images and masks resized to 256×256
- Trained model loaded from checkpoint at epoch 10
- Evaluated using IoU and Dice score metrics
- Data loading with SegmentationDataset class for paired image-mask processing
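The IoU and Dice metrics used for evaluation are straightforward to compute on binary masks; a minimal NumPy version:

```python
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """IoU and Dice score for a pair of binary masks (boolean or 0/1 arrays)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / (union + eps)                            # |A∩B| / |A∪B|
    dice = 2 * inter / (pred.sum() + target.sum() + eps)   # 2|A∩B| / (|A|+|B|)
    return iou, dice
```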
- Visualization Results:
U-Net Segmentation Comparison:
The following image demonstrates the superior performance of U-Net segmentation compared to traditional methods. The visualization shows original images (left), ground truth masks (middle), and U-Net predicted masks (right). As these examples show, U-Net consistently produces clean, accurate mask boundaries that closely match the ground truth, even in challenging cases with different mask types, colors, and lighting conditions. The high IoU (93.62%) and Dice (96.44%) scores reflect this qualitative observation.
NOTE: The dataset was uploaded and the code was executed on Kaggle because no local GPU was available.
- Visit the Kaggle repository
- Create a new notebook using the "More options" button
- Go to File and select Import Notebook
- Add the link to the UNet.ipynb file from the GitHub repository
- Run the notebook using the "Run All" button
- Traditional Segmentation:
  - Uses color thresholding (HSV, LAB), edge detection, or clustering (K-Means)
  - Requires manual tuning of parameters for different lighting conditions and backgrounds
  - Performance is inconsistent across diverse datasets
  - Computationally cheaper but lacks generalization
- U-Net Segmentation:
  - Deep learning-based segmentation with an encoder-decoder architecture and skip connections
  - Learns complex spatial and contextual relationships automatically
  - Provides higher accuracy and better boundary preservation than traditional methods
  - Computationally expensive but generalizes well to new data
- CNNs perform well on large datasets.
- More tolerant of variations in image quality.
- Requires more computational resources than traditional classifiers.
- Deep models need more data to avoid overfitting.
- UNet handles complex segmentation tasks effectively.
- Performs well with occlusions and overlapping regions.
- Needs a large annotated dataset for better performance.
- Training takes longer and needs powerful hardware.
This project focused on face mask detection and segmentation using both traditional machine learning and deep learning approaches.
- Handcrafted Features + MLP (92.9%) outperformed SVM (88.52%).
- CNN with max pooling achieved the best accuracy (95.38%).
- The Adam optimizer enabled faster convergence compared to SGD.
- Traditional segmentation (HSV thresholding, LAB space, K-Means) faced challenges with color variations and inconsistencies, achieving an average IoU of 57.19%.
- U-Net significantly outperformed traditional methods, achieving:
- IoU: 93.62%
- Dice Score: 96.44%
- U-Net demonstrated superior ability in boundary detection and mask segmentation.