Input image reconstruction from features (small network)
- Video PPT Presentation and Code Demo: https://youtu.be/YzsRktf5dVI?si=K3JNTEeE23hOux2H
- Documentation: https://fau-my.sharepoint.com/:w:/g/personal/mmitayeegiri2024_fau_edu/IQC6R7wd59K5SbGaf26PDk2AAR5ckQtbKM0Hxw2__gFW8pc?e=aUlaMo
This project demonstrates how to reconstruct input images from deep CNN feature maps using a small decoder network.
It also includes a Streamlit-based user interface that allows you to upload or capture an image and view live reconstructions generated by the trained decoder.
Contents:
- Project Goals and Motivation
- Environment Setup
- Download Dataset
- Model Architecture
- Loss Function and Training Objective
- Training the Model
- Interactive UI
- Limitations and Future Work
This project investigates image reconstruction from intermediate convolutional features using a small decoder network. Instead of training a large end-to-end autoencoder, we freeze a lightweight convolutional encoder and learn only a compact decoder that maps feature maps back to the image domain. Concretely, we use a convolutional backbone to extract feature tensors of size $56\times56\times256$ from face images and train a decoder composed of upsampling convolutional stages and residual blocks to reconstruct $224\times224\times3$ RGB images. Training is performed on the CelebA-HQ face dataset (30,000 high-quality face images), a widely used benchmark for generative modeling and face synthesis.
The decoder is optimized with a pixel-space loss dominated by mean absolute error (MAE) and combined with an SSIM term, while evaluation uses a richer set of metrics: PSNR, SSIM, and LPIPS. SSIM provides a perceptually motivated measure of structural fidelity, and LPIPS compares deep feature activations from pretrained networks, which correlates better with human judgments than traditional distortion metrics. We conduct experiments on full-resolution face images and report both numerical scores and qualitative side-by-side reconstructions. Results show that even a relatively small decoder, trained only on frozen mid-level features, can recover the global structure and identity of faces, though fine details, contrast, and high-frequency textures remain challenging.
To make the system more accessible and reproducible, we package the full pipeline into a modular codebase with scripts for feature extraction, decoder training, and evaluation, along with a Streamlit user interface. The UI allows users to upload images or use a webcam, visualize encoder features, and view live reconstructions. We also provide a reproducibility checklist, detailed training logs, and fixed random seeds to ensure that experiments can be replicated under the specified environment. Overall, this project demonstrates a practical trade-off between model size and reconstruction quality, and serves as a compact, end-to-end example of feature-based image reconstruction suitable for course projects and future research extensions.
This project explores image reconstruction from intermediate CNN feature maps using a small, resource-efficient decoder network.
- The encoder is a frozen convolutional feature extractor.
- The decoder is a compact convolutional network trained to reconstruct the original $224\times224$ RGB image from encoder feature maps.
The project provides:
- A training pipeline (`train.py`) using Keras `model.fit`.
- A research-style evaluation script (`evaluate.py`) with MSE, PSNR, SSIM, and plots.
- A test script (`test_model.py`) for quick sanity checks.
- A Streamlit UI (`ui_app.py`) for:
  - Image upload reconstruction.
  - Optional live webcam reconstruction.
The focus is on using a small decoder under realistic compute and memory constraints (Apple Silicon, 16GB RAM), while maintaining reasonable reconstruction quality and providing a reproducible end-to-end workflow.
Project goals:
- Feature-to-Image Reconstruction: given an intermediate feature map from a CNN encoder, learn a decoder that can reconstruct the original $224\times224$ RGB image.
- Small Network Constraint: keep the decoder relatively small (hundreds of thousands of parameters, not tens of millions), to:
  - Run on constrained hardware (e.g., Apple Silicon).
  - Demonstrate that reasonable reconstructions are possible with limited capacity.
- End-to-End Workflow:
  - Robust training scripts (`train.py`).
  - Evaluation scripts (`evaluate.py`) with numerical metrics and visualizations.
  - A UI (`ui_app.py`) for interactive demos.
- Reproducibility and Documentation:
  - Pinned environment.
  - Clear dataset assumptions.
  - Correct handling of dataset cardinality and steps per epoch.
Feature-to-image reconstruction is closely related to interpretability: it provides intuition about how much information intermediate feature maps retain. Small decoders are relevant in:
- Low-resource deployment scenarios.
- Privacy-related questions (how much can be reconstructed from shared features).
- Educational contexts where hardware is limited.
The repository layout is:
CAP6415-Project-ImageReconstruction/
├── main.py # Optional CLI entry-point (train, evaluate)
├── requirements.txt # Pinned environment for macOS + Apple Silicon
├── README.md # Documentation
├── src/
│ ├── encoder.py # Builds frozen encoder (feature extractor)
│ ├── decoder.py # Builds small decoder network
│ ├── dataset.py # CelebA-HQ loader (224×224, [0,1])
│ ├── train.py # Training script (Keras model.fit)
│ ├── test_model.py # Sanity-check reconstruction script
│ └── evaluate.py # Evaluation: MSE/PSNR/SSIM + plots
├── app/
│ └── ui_app.py # Streamlit UI: upload + webcam reconstruction
├── dataset/
│ └── celeba_hq/ # CelebA-HQ images (30,000)
└── outputs/
└── evaluation/ # Metrics & plots from evaluate.py
Additional runtime directories:
- `src/models/decoder_checkpoints/`: stores trained decoder weights (e.g., `decoder_final.h5`).
- `outputs/eval_run*/`: stores evaluation metrics and figures.
The target environment:
- OS: macOS (Apple Silicon).
- CPU/GPU: Apple Silicon (e.g., Apple M5) with Metal acceleration.
- Python: 3.10.
- DL stack: `tensorflow-macos == 2.10.0` + `tensorflow-metal`.
Create and activate a dedicated environment:
conda create -n CV python=3.10 -y
conda activate CV

From the project root:

pip install -r requirements.txt

Key pinned packages:
- tensorflow-macos == 2.10.0
- tensorflow-metal == 0.7.0
- numpy == 1.23.5
- ml-dtypes == 0.2.0
- protobuf == 3.19.6
- opencv-python == 4.8.1.78
- scikit-image == 0.21.0
- matplotlib == 3.7.1
- streamlit == 1.22.0
- streamlit-webrtc
- altair == 4.2.2, vega-datasets == 0.9.0
- tqdm == 4.66.1
These versions avoid common compatibility issues such as NumPy / TensorFlow ABI mismatches and Protobuf descriptor errors.
The project uses CelebA-HQ, a dataset of high-quality face images. Place the images as:
CAP6415-Project-ImageReconstruction/dataset/celeba_hq/
00001.png
00002.png
...
(≈ 30,000 images)
No class subfolders are required; the dataset is treated as a single pool of face images.
The loader uses `tf.keras.utils.image_dataset_from_directory` to:
- Read all images under `dataset/celeba_hq/`.
- Resize each image to $224\times224$.
- Normalize to $[0, 1]$ (float32).
- Batch them (default batch size 8); a loader sketch follows below.
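The repository's `dataset.py` is the authoritative loader; the sketch below shows one way these steps could be implemented, reusing the `load_celeba_hq` name that `train.py` calls. The rescaling and target-pairing details are assumptions for illustration, not the repository's exact code.

```python
import tensorflow as tf

def load_celeba_hq(data_dir="dataset/celeba_hq", batch_size=8):
    """Load CelebA-HQ as an (input, target) dataset for reconstruction."""
    ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        labels=None,            # one flat pool of images, no class labels
        image_size=(224, 224),  # resize on load
        batch_size=batch_size,
    )
    # Scale pixels to [0, 1] and pair each image with itself as the target.
    return ds.map(lambda x: (x / 255.0, x / 255.0),
                  num_parallel_calls=tf.data.AUTOTUNE)
```

Returning (image, image) pairs lets `model.fit` treat reconstruction as ordinary supervised training.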
To verify:
python src/dataset.py

You should see:
- "Found 30000 files belonging to 1 classes."
- A batch shape such as (4, 224, 224, 3).
- A pixel range of 0.0 to 1.0.
Given:
- $N = 30{,}000$ images.
- Batch size $B = 8$.

The number of steps (batches) in one full epoch is

$$\text{steps per epoch} = \frac{N}{B} = \frac{30000}{8} = 3750.$$

Thus, one epoch over the full dataset corresponds to 3750 training steps.
The encoder is a convolutional backbone used as a feature extractor:
- Input: $224\times224\times3$ RGB image.
- Output: feature map, typically $56\times56\times256$.
- During training: `encoder.trainable = False`.

It approximates a precomputed feature extractor for feature-to-image reconstruction.
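As a concrete illustration, a frozen extractor with exactly this output shape can be built by truncating VGG16 at `block3_conv3`, which emits $56\times56\times256$ features for a $224\times224$ input. This backbone choice is an assumption for illustration; `src/encoder.py` defines the actual architecture.

```python
import tensorflow as tf

def build_encoder():
    """Frozen feature extractor: 224x224x3 image -> 56x56x256 feature map."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3)
    )
    # block3_conv3 outputs a 56x56x256 feature map for 224x224 inputs.
    encoder = tf.keras.Model(
        base.input, base.get_layer("block3_conv3").output, name="encoder"
    )
    encoder.trainable = False  # keep all encoder weights frozen
    return encoder
```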
The decoder is a small CNN:
- Input: encoder features, e.g., $56\times56\times256$.
- Output: reconstructed image, $224\times224\times3$.
- Uses upsampling (e.g., `UpSampling2D` + `Conv2D`) and residual blocks.
- Final layer: `Conv2D(3, kernel_size=3, activation="sigmoid")` to map outputs to $[0,1]$.

The total parameter count is kept on the order of a few hundred thousand.
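A minimal sketch of a decoder in this spirit, with two `UpSampling2D` stages and simple residual blocks. The layer widths are illustrative (roughly 260k parameters), not the exact configuration in `src/decoder.py`.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convs with a skip connection (channel counts must match)."""
    skip = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    return layers.Activation("relu")(layers.Add()([x, skip]))

def build_decoder(input_shape=(56, 56, 256)):
    """Small decoder: 56x56x256 features -> 224x224x3 image in [0, 1]."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    x = residual_block(x, 64)
    x = layers.UpSampling2D(2)(x)   # 56 -> 112
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = residual_block(x, 32)
    x = layers.UpSampling2D(2)(x)   # 112 -> 224
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return tf.keras.Model(inp, out, name="decoder")
```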
The autoencoder combines the encoder and decoder:

$$\hat{x} = g(f(x)),$$

where:
- $f$ is the frozen encoder.
- $g$ is the trainable decoder.
- The training objective is to minimize the difference between $x$ and $\hat{x}$.
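Using the sketches above, this composition can be expressed as a single Keras model whose only trainable weights belong to the decoder:

```python
import tensorflow as tf

encoder = build_encoder()  # frozen f (see encoder sketch above)
decoder = build_decoder()  # trainable g (see decoder sketch above)

x = tf.keras.Input(shape=(224, 224, 3))
x_hat = decoder(encoder(x))  # x_hat = g(f(x))
autoencoder = tf.keras.Model(x, x_hat, name="autoencoder")
```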
The project uses a combination of mean absolute error (L1) and structural similarity (SSIM):
import tensorflow as tf

def ssim_l1_loss(y_true, y_pred, alpha=0.8):
    # Ensure consistent dtypes before computing the loss.
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    # Pixel-wise L1 term.
    l1 = tf.reduce_mean(tf.abs(y_true - y_pred))
    # Structural term: 1 - mean SSIM, so higher similarity means lower loss.
    ssim_val = tf.image.ssim(y_true, y_pred, max_val=1.0)
    ssim_loss = 1.0 - tf.reduce_mean(ssim_val)
    return alpha * l1 + (1.0 - alpha) * ssim_loss
- L1 encourages pixel-wise accuracy.
- SSIM focuses on structural similarity.
- $\alpha = 0.8$ gives more weight to L1 while retaining a structure-aware penalty.
The autoencoder is typically trained with the Adam optimizer and a modest learning rate to ensure stable convergence for the small decoder.
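A typical compile step under these assumptions; the exact learning rate is not stated in this README, so the value below is an assumption.

```python
autoencoder.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed value
    loss=ssim_l1_loss,
)
```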
Previous issues occurred when manually setting steps_per_epoch = 1000, which conflicted with the true dataset cardinality and caused "Your input ran out of data" warnings.

In the final configuration:
- One epoch uses the entire dataset (3750 batches).
- `steps_per_epoch` is either inferred or explicitly set to the dataset cardinality.
train_ds = load_celeba_hq(batch_size=batch_size)
# Cache decoded images, shuffle with a fixed seed, and prefetch batches.
train_ds = train_ds.cache().shuffle(1000, seed=SEED).prefetch(tf.data.AUTOTUNE)

history = autoencoder.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[checkpoint_cb],
)
Keras automatically infers steps_per_epoch from the dataset cardinality.

Alternatively, set it explicitly:
num_batches = int(train_ds.cardinality().numpy()) # ≈ 3750
history = autoencoder.fit(
train_ds,
epochs=EPOCHS,
steps_per_epoch=num_batches,
callbacks=[checkpoint_cb],
)
From the project root:
python src/train.py

The training will:
- Build the encoder and decoder.
- Freeze the encoder parameters.
- Compile the autoencoder using `ssim_l1_loss`.
- Load existing weights for fine-tuning, if available.
- Train for the specified number of epochs.
- Save:
  - src/models/decoder_checkpoints/decoder_final.h5
  - loss_history.json
  - loss_curve.png
To evaluate the trained model:
python src/evaluate.py \
    --weights src/models/decoder_checkpoints/decoder_final.h5 \
    --num-samples 300 \
    --batch-size 8 \
    --save-dir outputs/eval_run1

The script:
- Builds the encoder and decoder, and loads the decoder weights.
- Samples images from CelebA-HQ.
- Computes:
  - Mean squared error (MSE).
  - Peak signal-to-noise ratio (PSNR) with data_range=1.0.
  - Structural similarity (SSIM) with data_range=1.0, channel_axis=-1.
- Saves to outputs/eval_run1/:
  - metrics_summary.json: means, standard deviations, and full per-image lists.
  - psnr_histogram.png, ssim_histogram.png.
  - sample_reconstructions.png: a grid of original vs. reconstructed images.
  - training_loss_curve.png (if loss history is present).
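The `data_range` and `channel_axis` arguments match the `scikit-image` 0.21 metric API pinned in `requirements.txt`. A per-image computation in that style might look like the following sketch (the function name is illustrative; arrays are float32 in $[0, 1]$):

```python
import numpy as np
from skimage.metrics import (
    mean_squared_error,
    peak_signal_noise_ratio,
    structural_similarity,
)

def image_metrics(original, reconstructed):
    """Compute MSE, PSNR, and SSIM for one pair of [0, 1] RGB images."""
    mse = mean_squared_error(original, reconstructed)
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=1.0)
    ssim = structural_similarity(
        original, reconstructed, data_range=1.0, channel_axis=-1
    )
    return {"mse": float(mse), "psnr": float(psnr), "ssim": float(ssim)}
```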
For a quick sanity check:

python src/test_model.py

The script:
- Loads the encoder and decoder with trained weights.
- Reconstructs a small batch of images.
- Prints basic metrics and may save a comparison figure.
Start the UI:

python -m streamlit run app/ui_app.py

Open the local URL (e.g., http://localhost:8501).

Image upload reconstruction:
- Upload a face image (JPG/PNG).
- The app resizes it to $224\times224$, normalizes it, and runs the encoder and decoder.
- It displays:
  - Original vs. reconstructed images.
  - MSE, PSNR, and SSIM for the uploaded sample.
Live webcam reconstruction:
- Uses `streamlit-webrtc` and `av`.
- Implements a `VideoProcessorBase` with a `recv()` method (the current API); a sketch follows below.
- Each frame is preprocessed, encoded, decoded, and rendered as a reconstructed stream.
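A condensed sketch of such a processor, reusing the encoder/decoder sketches above; the actual `ui_app.py` wiring and preprocessing details may differ.

```python
import av
import numpy as np
import tensorflow as tf
from streamlit_webrtc import VideoProcessorBase, webrtc_streamer

class ReconstructionProcessor(VideoProcessorBase):
    def __init__(self):
        self.encoder = build_encoder()  # frozen encoder (sketch above)
        self.decoder = build_decoder()  # decoder (sketch above)
        # Trained weights would be loaded here, e.g. from the checkpoint
        # path used by train.py (only valid for the matching architecture):
        # self.decoder.load_weights("src/models/decoder_checkpoints/decoder_final.h5")

    def recv(self, frame: av.VideoFrame) -> av.VideoFrame:
        img = frame.to_ndarray(format="rgb24")
        x = tf.image.resize(img, (224, 224)) / 255.0      # preprocess
        x_hat = self.decoder(self.encoder(x[None, ...]))  # reconstruct
        out = (np.clip(x_hat[0].numpy(), 0.0, 1.0) * 255).astype(np.uint8)
        return av.VideoFrame.from_ndarray(out, format="rgb24")

webrtc_streamer(key="reconstruction",
                video_processor_factory=ReconstructionProcessor)
```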
Environment used for the reported runs:
- Python 3.10.
- Conda environment `CV`.
- Dependencies installed from `requirements.txt`.
In train.py, set seeds:
import random
import numpy as np
import tensorflow as tf
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
Use the same seed for dataset shuffling:
train_ds = train_ds.shuffle(1000, seed=SEED)
Note that full bitwise determinism may not be guaranteed on GPU/Metal, but this significantly stabilizes the training outcome.
To reproduce results:
- Use the same CelebA-HQ dataset in `dataset/celeba_hq/`.
- Do not change the number or identity of images.

Keep fixed:
- `batch_size` (e.g., 8).
- Number of epochs (e.g., 30).
- Encoder and decoder architectures.
- Loss function (`ssim_l1_loss`, same $\alpha$).
- Learning rate and optimizer.

Ensure each epoch uses the full dataset:
- Recommended: do not set `steps_per_epoch`; let Keras infer it from the dataset cardinality.
- Alternatively: set `steps_per_epoch = int(train_ds.cardinality().numpy())` (approximately 3750).

Avoid contradictory manual settings (e.g., a steps_per_epoch that is too small combined with caching), which can cause "input ran out of data" warnings.
- Small Decoder: the constrained decoder limits reconstruction sharpness compared to large decoders or GANs.
- Face-Only Training: the model is trained only on CelebA-HQ faces; generalisation to non-face data is limited.
- Fixed Resolution: only $224\times224$ images are supported in the current configuration.
- No Adversarial Loss: reconstructions may appear over-smoothed compared to GAN-based methods.
Future work:
- A slightly deeper decoder while keeping it "small" overall.
- Additional perceptual loss terms, e.g., a VGG-based perceptual loss.
- Reconstruction from different encoder layers (early vs. deep features).
- Multi-resolution outputs and multi-scale training.
- An enhanced UI:
  - Compare different checkpoints.
  - Visualise error maps.
  - Toggle between different loss configurations.
Quick start:
- Clone the repository:
  git clone https://github.com/GouthamMallavolu/CAP6415-Project-ImageReconstruction
  cd CAP6415-Project-ImageReconstruction
- Set up the environment:
  conda create -n CV python=3.10 -y
  conda activate CV
  pip install -r requirements.txt
- Prepare the dataset:
  - Download CelebA-HQ: https://www.kaggle.com/datasets/lamsimon/celebahq?resource=download-directory&select=celeba_hq
  - Place the CelebA-HQ images under `dataset/celeba_hq/`.
- Verify the dataset loader:
  python src/dataset.py
- Extract features to speed up training:
  python src/extract_features.py
- Train the model:
  python src/train.py
- Test the model:
  python src/test_model.py
- Evaluate the model:
  python src/evaluate.py
- Run the UI:
  python -m streamlit run app/ui_app.py
Following these steps with the newly created conda environment, dataset layout, and scripts will reproduce the training behavior, evaluation metrics, and interactive demo for image reconstruction from CNN features using a small decoder network.
References:
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507.
- Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep Learning Face Attributes in the Wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
- Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive Growing of GANs for Improved Quality, Stability, and Variation. In International Conference on Learning Representations (ICLR).
- Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
- Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Autoencoder. Wikipedia, The Free Encyclopedia. Accessed 2024. (Overview of autoencoders and their use for unsupervised representation learning and reconstruction.)
Authors:
- Goutham Mallavolu - gmallavolu2024@fau.edu
- Maahir Mitayeegiri - mmitayeegiri2024@fau.edu

© 2025. This project was created for the CAP 6415 Computer Vision course at Florida Atlantic University.

