Hybrid Deep Learning Framework: Comparative Analysis of CNN and PCA-driven FFNN Classification.
This project implements and benchmarks two strategies for image classification on the Kaggle Fruits and Vegetables dataset: spatial feature extraction with a convolutional neural network (CNN), and statistical dimensionality reduction with principal component analysis followed by a feed-forward neural network (PCA + FFNN).
As an MSc candidate in Computational Biology, I developed this framework to demonstrate how high-dimensional data processing techniques transfer from computer vision to the kind of reasoning required for genomic signal analysis.
The CNN is a deep learning model optimized for hierarchical pattern recognition.
- Core Layers: Conv2D for spatial patterns, MaxPooling2D for downsampling.
- Regularization: L2 weight regularization and Dropout to improve generalization.
- Optimization: Adam Optimizer with Early Stopping to prevent overfitting.
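The components listed above could be assembled roughly as follows. This is a minimal sketch, not the notebook's actual architecture: the input shape, class count, layer widths, L2 strength, dropout rate, and patience are all assumed placeholder values, and inputs are assumed to be already scaled to [0, 1].

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Assumed placeholder values; replace with the real dataset's settings.
IMG_SHAPE = (64, 64, 3)
NUM_CLASSES = 10
L2_STRENGTH = 1e-4


def build_cnn(input_shape=IMG_SHAPE, num_classes=NUM_CLASSES):
    """Two Conv2D/MaxPooling2D stages followed by a regularized dense head."""
    l2 = regularizers.l2(L2_STRENGTH)
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Conv2D learns local spatial patterns; MaxPooling2D halves the resolution.
        layers.Conv2D(32, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.5),  # randomly drops units during training only
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


# Stop when validation loss stops improving and roll back to the best epoch.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

model = build_cnn()
# Training call, assuming train_ds and val_ds are prepared elsewhere:
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])
```

Monitoring `val_loss` rather than training loss is what lets Early Stopping act as an overfitting guard, since training loss typically keeps falling after validation loss has turned.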
The PCA + FFNN pipeline is a hybrid approach focused on latent-space representation, mirroring workflows used in omics data analysis.
- Pre-processing: StandardScaler for feature normalization.
- Dimensionality Reduction: Principal Component Analysis (PCA) to extract the most informative components from raw pixel data.
- Classification: A Dense Neural Network (FFNN) trained on the reduced feature space.
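The three stages above can be sketched as follows. The data here is synthetic random noise standing in for flattened images, and the component count, class count, and layer width are assumed values rather than the project's settings. The scaler and PCA are fitted on the training split only and then reused on the test split, which avoids leaking test statistics into the reduced feature space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

# Assumed placeholder values.
N_COMPONENTS = 20
NUM_CLASSES = 5

# Synthetic stand-in for flattened 16x16 RGB images (one row per image).
rng = np.random.default_rng(0)
X_train = rng.random((200, 16 * 16 * 3))
X_test = rng.random((50, 16 * 16 * 3))

# Standardize each pixel feature, then project onto the leading components.
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=N_COMPONENTS, random_state=0),
)
Z_train = reducer.fit_transform(X_train)  # fit on training data only
Z_test = reducer.transform(X_test)        # reuse the fitted transform

# Fraction of total variance retained by the kept components.
retained = reducer.named_steps["pca"].explained_variance_ratio_.sum()
print(f"Variance retained by {N_COMPONENTS} components: {retained:.2%}")

# Dense classifier operating on the reduced feature space.
ffnn = keras.Sequential([
    keras.Input(shape=(N_COMPONENTS,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
ffnn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# Training call, assuming integer labels y_train exist:
# ffnn.fit(Z_train, y_train, validation_split=0.2, epochs=50)
```

On real images, the retained-variance figure is a practical guide for choosing the number of components.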
- Data & OS: numpy, pandas, os, json
- Deep Learning: tensorflow.keras
- Machine Learning & Stats: sklearn (PCA, StandardScaler, metrics)
- Visualization: matplotlib
The models are benchmarked using:
- Confusion Matrix: To visualize per-class misclassifications.
- Classification Report: Detailed Precision, Recall, and F1-Score metrics.
- Training Logs: Convergence analysis via accuracy/loss curves.
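As an illustration of the first two metrics, the snippet below computes a confusion matrix and a classification report with scikit-learn. The labels and class names are small hypothetical examples, not results from this project. With a trained Keras model, the predictions would typically come from taking the argmax over the predicted class probabilities.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical class names and labels, for illustration only.
class_names = ["apple", "banana", "carrot"]
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
# With a trained model: y_pred = model.predict(X_test).argmax(axis=1)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Per-class precision, recall, and F1-score, plus overall averages.
report = classification_report(
    y_true, y_pred, target_names=class_names, zero_division=0
)
print(report)
```

Off-diagonal entries show which classes are confused with each other: here one apple is predicted as banana and one carrot as apple.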
The models were evaluated on classification accuracy and computational efficiency. The CNN learns spatial patterns directly from the images, while the PCA + FFNN pipeline classifies from a compressed statistical representation. Reported accuracies:
- CNN Accuracy: ~97%
- PCA + FFNN Accuracy: ~99%
Since this project is optimized for cloud environments (Kaggle/Google Colab), you can run the analysis without downloading the dataset locally:
- Access the Notebook: Open the .ipynb file included in this repository.
- Environment Setup: If running locally, install dependencies via:
pip install numpy pandas tensorflow scikit-learn matplotlib