Medical Image Segmentation with U-Net

Representation Learning · 2022 · Computer Vision Prototype

Overview

This project explored deep-learning methods for medical image segmentation, focusing on CT images with pixel-level segmentation masks and on convolutional encoder-decoder architectures.

The workflow covered the full segmentation pipeline: image-mask matching and preprocessing, U-Net-style model construction, training diagnostics and pixel-level evaluation with Intersection over Union (IoU).

The project should be read as an applied computer-vision prototype, not as a clinical diagnostic system.

Problem

Medical image segmentation requires predicting a class label for every pixel of a 2D image, or every voxel of a 3D volume. In CT imaging, this can be used to delineate lesions or anatomical structures in grayscale images, with corresponding masks serving as ground truth.

The goal was to construct a deep-learning workflow that could learn from paired CT images and segmentation masks, then evaluate predicted masks against held-out ground truth.

Data Preparation

The pipeline matched input CT images with their corresponding segmentation masks, filtered image paths using shared file-name structure and converted images into NumPy arrays suitable for neural-network training.
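A minimal sketch of the path-collection and matching step, assuming (hypothetically) that images are PNG files and that each mask's file name contains the image's stem followed by `_mask`, e.g. `scan_1.png` and `scan_1_mask.png`. The file-name convention and the helper names `collect_paths` and `match_images_to_masks` are illustrative assumptions, not the project's actual code.

```python
from pathlib import Path


def collect_paths(root, pattern="*.png"):
    """Recursively collect file paths from a nested folder tree (hypothetical layout)."""
    return sorted(str(p) for p in Path(root).rglob(pattern))


def match_images_to_masks(image_paths, mask_paths, mask_tag="_mask"):
    """Pair each image with the mask whose file name contains '<image stem><mask_tag>'.

    Assumed convention: 'scan_1.png' pairs with 'scan_1_mask.png'.
    Images with zero or several candidate masks are skipped, so every
    returned pair is unambiguous.
    """
    pairs = []
    for img in sorted(image_paths):
        key = Path(img).stem + mask_tag  # e.g. 'scan_1_mask'
        candidates = [m for m in mask_paths if key in Path(m).name]
        if len(candidates) == 1:
            pairs.append((img, candidates[0]))
    return pairs


images = ["ct/p01/scan_1.png", "ct/p01/scan_10.png", "ct/p02/scan_2.png"]
masks = ["masks/p01/scan_1_mask.png", "masks/p01/scan_10_mask.png"]
pairs = match_images_to_masks(images, masks)
print(pairs)
```

Appending the tag to the stem keeps `scan_1` from matching `scan_10_mask.png`, and images without a mask (here `scan_2.png`) are dropped rather than silently misaligned.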

Grayscale CT images were loaded from nested image folders.
Segmentation masks were loaded from corresponding mask folders.
Input and target lists were matched to keep image-mask correspondence.
Images and masks were resized or loaded at native resolution depending on the experiment.
Pixel values were normalized before model training.
Masks were converted into categorical pixel-level targets.
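The normalization and categorical-target steps can be sketched in NumPy as follows. This assumes, hypothetically, 8-bit grayscale images and masks whose pixel values are already integer class indices; the function name `preprocess` is illustrative.

```python
import numpy as np


def preprocess(image, mask, num_classes):
    """Scale a grayscale image to [0, 1] and one-hot encode its integer mask.

    Assumes 8-bit intensities (0..255) and mask values in 0..num_classes-1.
    """
    x = image.astype(np.float32) / 255.0
    # Add a trailing channel axis so the input has shape (H, W, 1).
    x = x[..., np.newaxis]
    # Index rows of the identity matrix: (H, W) labels -> (H, W, num_classes) one-hot.
    y = np.eye(num_classes, dtype=np.float32)[mask]
    return x, y


image = np.array([[0, 255], [128, 64]], dtype=np.uint8)
mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
x, y = preprocess(image, mask, num_classes=2)
print(x.shape, y.shape)  # (2, 2, 1) (2, 2, 2)
```

Keras also provides `tf.keras.utils.to_categorical` for the one-hot step.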

Model Architecture

The core model was a U-Net-style convolutional encoder-decoder architecture. The contraction path used convolution, dropout and max-pooling layers to extract hierarchical image features. The expansion path used transposed convolutions and skip connections to recover spatial resolution.

CT image
  -> convolution / pooling encoder
  -> latent feature representation
  -> transposed-convolution decoder with skip connections
  -> per-pixel softmax segmentation mask
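The diagram above compresses several shape changes. The following sketch traces tensor shapes through a U-Net of this kind, assuming 'same'-padded convolutions, 2x2 max-pooling, 2x2 stride-2 transposed convolutions, 16 base filters doubling per level and four levels. These hyperparameters and the function name `unet_shape_trace` are illustrative assumptions, not the project's recorded configuration.

```python
def unet_shape_trace(height, width, num_classes, base_filters=16, depth=4):
    """Trace (H, W, C) shapes through a hypothetical U-Net with 'same' padding.

    Encoder level: conv block with C filters, then 2x2 max-pooling (halves H and W).
    Decoder level: 2x2 stride-2 transposed convolution (doubles H and W),
    concatenation with the matching encoder output (skip connection), conv block.
    """
    trace, skips = [], []
    h, w, c = height, width, base_filters
    for level in range(depth):
        trace.append(("enc", level, (h, w, c)))
        skips.append((h, w, c))  # saved for the skip connection
        if h % 2 or w % 2:
            raise ValueError("height and width must be divisible by 2**depth")
        h, w, c = h // 2, w // 2, c * 2  # pooling halves space; next level doubles filters
    trace.append(("bottleneck", depth, (h, w, c)))
    for level in reversed(range(depth)):
        sh, sw, sc = skips[level]
        h, w = h * 2, w * 2  # transposed convolution with stride 2
        assert (h, w) == (sh, sw)  # skip connection needs matching spatial size
        trace.append(("concat", level, (h, w, sc + sc)))  # upsampled + skip channels
        c = sc
        trace.append(("dec", level, (h, w, c)))
    trace.append(("softmax", None, (h, w, num_classes)))  # 1x1 conv + per-pixel softmax
    return trace


for step in unet_shape_trace(128, 128, num_classes=2):
    print(step)
```

The divisibility check reflects a practical constraint of this design: each pooling step must be undone exactly by an upsampling step, or the skip connections cannot be concatenated.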

The output layer produced pixel-level class probabilities through a softmax activation.
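A minimal NumPy sketch of what the softmax output layer computes: a softmax over the class axis at every pixel, followed by an argmax to obtain a hard label mask. The (H, W, K) layout and the toy logits are assumptions for illustration.

```python
import numpy as np


def pixelwise_softmax(logits):
    """Softmax over the last (class) axis of an (H, W, K) logit map."""
    # Subtracting the per-pixel max improves numerical stability without changing the result.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


logits = np.array([[[2.0, 0.0], [0.0, 3.0]],
                   [[1.5, 0.5], [-1.0, 4.0]]])  # H=2, W=2, K=2 classes
probs = pixelwise_softmax(logits)
pred_mask = probs.argmax(axis=-1)  # hard class label per pixel
print(pred_mask)
```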

Training and Evaluation

The model was trained on an image-mask split and evaluated on held-out data. Training diagnostics included loss and accuracy curves, while segmentation quality was assessed using mean Intersection over Union and class-level IoU calculations.
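The IoU metrics can be sketched directly in NumPy. For each class k, IoU is the number of pixels labeled k in both masks divided by the number labeled k in either; mean IoU averages over classes. This sketch assumes integer label masks (for example, the argmax of the softmax output) and skips classes absent from both masks when averaging, which is one common convention; `class_iou` is a hypothetical helper name.

```python
import numpy as np


def class_iou(y_true, y_pred, num_classes):
    """Per-class IoU = |true ∩ pred| / |true ∪ pred| on integer label masks.

    Classes absent from both masks get NaN so they can be excluded from the mean.
    """
    ious = np.full(num_classes, np.nan)
    for k in range(num_classes):
        t, p = (y_true == k), (y_pred == k)
        union = np.logical_or(t, p).sum()
        if union > 0:
            ious[k] = np.logical_and(t, p).sum() / union
    return ious


y_true = np.array([[0, 0, 1], [0, 1, 1], [0, 0, 0]])
y_pred = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1]])
ious = class_iou(y_true, y_pred, num_classes=2)
mean_iou = np.nanmean(ious)  # average over classes that occur
print(ious, mean_iou)
```

Keras also provides a built-in `tf.keras.metrics.MeanIoU` metric, constructed with a `num_classes` argument.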

Train-test split for image and mask arrays.
Categorical cross-entropy for multi-class segmentation masks.
Early stopping based on validation loss.
Training and validation loss curves.
Training and validation accuracy curves.
Mean IoU and class-wise IoU from predicted masks.
Visual comparison of input image, ground-truth mask and predicted mask.
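For one-hot masks and softmax outputs, categorical cross-entropy is the per-pixel negative log-probability of the true class, averaged over all pixels. A NumPy sketch of that objective, with clipping to avoid `log(0)`; the function name and the epsilon value are assumptions.

```python
import numpy as np


def pixelwise_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean categorical cross-entropy over all pixels.

    y_true: one-hot targets, shape (N, H, W, K).
    y_prob: softmax probabilities, same shape.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0) for confidently wrong pixels
    per_pixel = -np.sum(y_true * np.log(y_prob), axis=-1)  # shape (N, H, W)
    return per_pixel.mean()


y_true = np.eye(2)[np.array([[[0, 1], [1, 1]]])]  # labels -> one-hot, shape (1, 2, 2, 2)
y_prob = np.array([[[[0.9, 0.1], [0.2, 0.8]],
                    [[0.5, 0.5], [0.1, 0.9]]]])    # shape (1, 2, 2, 2)
loss = pixelwise_categorical_crossentropy(y_true, y_prob)
print(loss)
```

In Keras the same objective is typically selected with `loss="categorical_crossentropy"` in `model.compile`, and early stopping on validation loss with `tf.keras.callbacks.EarlyStopping(monitor="val_loss")`, combined with a `patience` setting and, optionally, `restore_best_weights=True`.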

Implemented Elements

Image and mask path extraction from nested folder structures.
Image-mask matching based on file-name substrings.
Grayscale image loading and tensor conversion.
Mask preprocessing and categorical target construction.
U-Net-style model implementation in Keras/TensorFlow.
Experiments with dropout and batch normalization.
Training diagnostics and evaluation plots.
Mean IoU and class-wise IoU segmentation metrics.

Evaluation Limits

The project was an educational and applied prototype. It was not validated for clinical use.

Dataset size: segmentation models require careful validation on sufficiently large and diverse datasets.
Class imbalance: lesion masks can be sparse relative to background pixels.
Clinical validity: medical deployment would require expert annotation review and external validation.
Generalization: models trained on one CT dataset may not transfer across scanners, protocols or hospitals.
Metrics: pixel accuracy can be misleading when background pixels dominate, so IoU and class-level metrics are essential.
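A toy example of why pixel accuracy misleads on sparse lesion masks: a hypothetical 100x100 mask containing a 5x5 lesion, scored against a degenerate prediction of background everywhere. The data are synthetic and chosen only to illustrate the point.

```python
import numpy as np

# Synthetic 100x100 mask with a small 5x5 lesion (class 1) on background (class 0).
y_true = np.zeros((100, 100), dtype=int)
y_true[40:45, 40:45] = 1

# A degenerate "model" that predicts background everywhere.
y_pred = np.zeros_like(y_true)

pixel_accuracy = (y_true == y_pred).mean()
lesion_union = np.logical_or(y_true == 1, y_pred == 1).sum()
lesion_iou = np.logical_and(y_true == 1, y_pred == 1).sum() / lesion_union
print(f"pixel accuracy = {pixel_accuracy:.4f}, lesion IoU = {lesion_iou:.1f}")
# pixel accuracy = 0.9975, lesion IoU = 0.0
```

The prediction misses the lesion entirely yet scores 99.75% pixel accuracy; only the class-level IoU exposes the failure.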

Modern Extension

A modern version of this project would strengthen the medical-imaging evaluation protocol and compare more robust segmentation architectures.

Use Dice loss or combined Dice/cross-entropy objectives.
Add data augmentation for robustness.
Use grouped train-validation-test splits by patient or scan source, so that no patient appears in more than one split.
Compare U-Net, U-Net++ and attention U-Net variants.
Report Dice, IoU, precision, recall and calibration metrics.
Track prediction uncertainty and failure cases.
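A NumPy sketch of a soft Dice loss and a combined Dice/cross-entropy objective, assuming one-hot targets and softmax probabilities of shape (N, H, W, K). The 0.5 weighting, the smoothing constants and the function names are illustrative assumptions; in a Keras training loop the same formulas would be written with TensorFlow operations so they can be differentiated.

```python
import numpy as np


def soft_dice_loss(y_true, y_prob, eps=1e-6):
    """1 - mean soft Dice over classes; inputs shaped (N, H, W, K).

    Soft Dice for class k: 2 * sum(t * p) / (sum(t) + sum(p)), summed over N, H, W.
    """
    axes = (0, 1, 2)
    intersection = np.sum(y_true * y_prob, axis=axes)
    denom = np.sum(y_true, axis=axes) + np.sum(y_prob, axis=axes)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)  # eps smooths empty classes
    return 1.0 - dice_per_class.mean()


def dice_ce_loss(y_true, y_prob, ce_weight=0.5, eps=1e-7):
    """Hypothetical weighted sum of pixel-wise cross-entropy and soft Dice loss."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    ce = -np.sum(y_true * np.log(p), axis=-1).mean()
    return ce_weight * ce + (1.0 - ce_weight) * soft_dice_loss(y_true, y_prob)


y_true = np.eye(2)[np.array([[[0, 1], [1, 0]]])]  # shape (1, 2, 2, 2)
perfect = soft_dice_loss(y_true, y_true)
flipped = soft_dice_loss(y_true, 1.0 - y_true)
print(perfect, flipped, dice_ce_loss(y_true, y_true))
```

Because the Dice term is computed per class and then averaged, a small lesion class weighs as much as the background, which is why Dice-based objectives are often preferred for imbalanced masks.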

Technologies and Methods Used

Python for preprocessing and model orchestration.
TensorFlow / Keras for convolutional segmentation models.
OpenCV / PIL for image loading and preprocessing.
NumPy for tensor construction and array manipulation.
U-Net-style CNNs for encoder-decoder segmentation.
Mean IoU for segmentation evaluation.
Matplotlib for training curves and visual diagnostics.

Resources

Code and raw medical-image data are not public.

An anonymized technical note can be prepared upon request.