Research code supporting our work on multi-class brain tumour classification from MRI scans using deep convolutional neural networks.
This repository trains and evaluates five CNN architectures for 4-class brain tumour classification:
| Model | Type |
|---|---|
| Xception | Transfer learning (ImageNet) |
| EfficientNetB4 | Transfer learning (ImageNet) |
| ResNet-50 | Transfer learning (ImageNet) |
| Custom Residual CNN | Trained from scratch, dilated convolutions |
| Dilated CNN | Trained from scratch, sequential dilated convolutions |
Classes: glioma · meningioma · notumor · pituitary
Dataset: Brain Tumor MRI Dataset
Author: Masoud Nickparvar (Kaggle)
License: CC BY-SA 4.0
This repository does not redistribute the dataset.
Users must download it directly from Kaggle and comply with the dataset’s license terms.
Mri-Tumor-Classification-Benchmark/
│
├── notebooks/
│ ├── 00_original_notebook.ipynb # Original monolithic notebook (reference)
│ ├── 01_data_exploration.ipynb # EDA: class distribution, sample images
│ └── 02_train_and_evaluate.ipynb # Model training + evaluation pipeline
│
├── src/ # Importable Python modules
│ ├── __init__.py
│ ├── data_utils.py # Dataset loading & Keras generators
│ ├── models.py # Model factory functions
│ └── visualization.py # Plotting helpers
│
├── configs/
│ └── config.yaml # All hyperparameters & paths (edit here)
│
├── data/
│ └── README.md # Dataset download instructions
│
├── figures/ # Saved training curves & confusion matrices
│
├── outputs/ # Model weights (gitignored, created at runtime)
│
├── environment.yml # Conda environment spec
├── requirements.txt # Pip requirements
├── LICENSE # Apache 2.0
└── CONTRIBUTING.md
git clone https://github.com/CodeNinjaSarthak/Mri-Tumor-Classification-Benchmark.git
cd Mri-Tumor-Classification-Benchmark

# Option A: conda (recommended)
conda env create -f environment.yml
conda activate brain-tumor-mri
# Option B: pip + venv
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Follow the instructions in data/README.md.
TL;DR (Kaggle API):
pip install kaggle
# Place ~/.kaggle/kaggle.json (from kaggle.com → Settings → Create New Token)
kaggle datasets download -d masoudnickparvar/brain-tumor-mri-dataset \
    -p data/ --unzip

Open configs/config.yaml and update the data: section:
data:
  training_dir: "data/brain-tumor-mri-dataset/Training"
  testing_dir: "data/brain-tumor-mri-dataset/Testing"

On Kaggle, the defaults (/kaggle/input/...) are correct as-is.
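Before launching the notebooks, it can help to confirm that the configured directories exist and contain the four class folders. The following is a minimal sketch, not part of the repository: `load_config` and `check_data_dirs` are hypothetical helpers, and the sketch assumes the `data:` keys shown above and one subfolder per class under each split directory.

```python
from pathlib import Path

# Class folder names as listed in this README.
CLASSES = ("glioma", "meningioma", "notumor", "pituitary")


def load_config(path="configs/config.yaml"):
    """Read the YAML config; requires PyYAML (imported lazily)."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def check_data_dirs(config, classes=CLASSES):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in ("training_dir", "testing_dir"):
        root = Path(config["data"][key])
        if not root.is_dir():
            problems.append(f"{key}: {root} is not a directory")
            continue
        # Assumption: each split directory holds one folder per class.
        for name in classes:
            if not (root / name).is_dir():
                problems.append(f"{key}: missing class folder {name}")
    return problems
```

Calling `check_data_dirs(load_config())` from the repository root should return an empty list when both directories are laid out as expected. A non-empty list usually means the paths in the config do not match where the archive was unpacked.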
jupyter notebook

Execute in order:
| Step | Notebook |
|---|---|
| 1 | notebooks/01_data_exploration.ipynb |
| 2 | notebooks/02_train_and_evaluate.ipynb |
All random seeds that TensorFlow exposes are set to 42. Note that
TensorFlow does not guarantee full bit-for-bit reproducibility on GPU
without additional configuration; see the TF determinism guide
for details.
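How the notebooks set their seeds is not shown in this README. One common pattern, sketched here under that assumption, is a single helper that seeds every random number generator available in the environment. `set_global_seed` is a hypothetical name, and NumPy and TensorFlow are seeded only if they are installed.

```python
from __future__ import annotations

import random


def set_global_seed(seed: int = 42) -> list[str]:
    """Seed every importable RNG and return the names that were seeded."""
    random.seed(seed)
    seeded = ["python"]

    try:
        import numpy as np

        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import tensorflow as tf

        tf.random.set_seed(seed)
        seeded.append("tensorflow")
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed

    return seeded
```

Calling the helper once at the top of a notebook, before any data generators or models are built, keeps shuffling and weight initialisation repeatable from run to run on the same hardware and software stack.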
To enable strict determinism, add the following to the top of the training notebook before any other TensorFlow calls:
import os
os.environ["TF_DETERMINISTIC_OPS"] = "1"  # set before TensorFlow is imported

import tensorflow as tf
tf.config.experimental.enable_op_determinism()

| Decision | Rationale |
|---|---|
| All hyperparameters in configs/config.yaml | Single place to change epochs, LR, batch size; avoids notebook sprawl |
| EarlyStopping(restore_best_weights=True) + ReduceLROnPlateau | Prevents overfitting, adapts LR when training plateaus |
| 80 / 20 stratified split via ImageDataGenerator(validation_split=0.2) | Reproducible split with fixed seed=42 |
| Models deleted after evaluation (del model, history) | Frees GPU VRAM between sequential model runs on memory-limited hardware |
| Transfer-learning backbones fully fine-tuned (no frozen layers) | MRI imagery differs sufficiently from ImageNet that full fine-tuning yields better results than feature extraction alone |
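
To illustrate what a stratified 80 / 20 split means, the sketch below splits a list of (filename, label) pairs class by class in plain Python. It illustrates the idea only and is not the Keras implementation; `stratified_split` and the toy filenames are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs so each class keeps the same validation share."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # local RNG, so the split is repeatable
    train, val = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        n_val = int(len(paths) * val_fraction)
        val += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, val


# Toy data: 10 images per class (hypothetical filenames).
samples = [(f"{c}_{i}.jpg", c) for c in ("glioma", "notumor") for i in range(10)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 16 4
```

Because the split is made within each class, the validation set keeps the same class proportions as the training set, which matters when classes are imbalanced.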
The model labelled "U-Net" in earlier versions of this codebase is a sequential dilated CNN, not the encoder-decoder with skip connections described in Ronneberger et al. (2015). It has been renamed to "Dilated CNN" in the refactored notebooks for clarity. The scientific implementation (weights, dilation rates, layer counts) is unchanged.
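
One reason to stack dilated convolutions sequentially is that the receptive field grows quickly without pooling. For stride-1 layers with a shared kernel size, the per-axis receptive field is 1 plus the sum of (kernel_size - 1) × dilation over the layers. The sketch below computes this; the dilation rates used are illustrative assumptions, not the values used in this repository.

```python
def receptive_field(kernel_size, dilation_rates):
    """Per-axis receptive field of stacked stride-1 convolutions."""
    rf = 1
    for rate in dilation_rates:
        rf += (kernel_size - 1) * rate
    return rf


# Hypothetical dilation rates, for illustration only.
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

With the same three 3×3 layers, the dilated stack covers a 15-pixel span per axis instead of 7, at the same parameter count.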
This repository contains no medical data. All experiments were conducted using publicly available datasets.
This code is released under the Apache 2.0 License. The dataset is distributed under CC BY-SA 4.0 — see the Kaggle page for full terms.