
Project-ImageReconstruction

Input image reconstruction from features (small network)

Image Reconstruction from Deep CNN Features using a Lightweight Decoder

This project demonstrates how to reconstruct input images from deep CNN feature maps using a small decoder network.
It also includes a Streamlit-based user interface that allows you to upload or capture an image and view live reconstructions generated by the trained decoder.


Table of Contents

  1. Abstract
  2. Introduction
  3. Project Goals and Motivation
  4. Repository Structure
  5. Environment Setup
  6. Download Dataset
  7. Model Architecture
  8. Loss Function and Training Objective
  9. Training the Model
  10. Evaluation
  11. Interactive UI
  12. Testing Model Performance
  13. Model Evaluation
  14. Limitations and Future Work
  15. Reproducibility
  16. References
  17. Authors


Abstract

This project investigates image reconstruction from intermediate convolutional features using a small decoder network. Instead of training a large end-to-end autoencoder, we freeze a lightweight convolutional encoder and learn only a compact decoder that maps feature maps back to the image domain. Concretely, we use a convolutional backbone to extract feature tensors of size $56\times56\times256$ from face images, and train a decoder composed of strided transposed convolutions and residual blocks to reconstruct $224\times224\times3$ RGB images. Training is performed on the CelebA-HQ face dataset (30k high-quality face images), a widely used benchmark for generative modeling and face synthesis.

The decoder is optimized with mean absolute error (MAE) in pixel space, while evaluation uses a richer set of metrics: PSNR, SSIM, and LPIPS. SSIM provides a perceptually motivated measure of structural fidelity, and LPIPS compares deep feature activations from pretrained networks, which correlates better with human judgments than traditional distortion metrics. We conduct experiments on full-resolution face images and report both numerical scores and qualitative side-by-side reconstructions. Results show that even a relatively small decoder, trained only on frozen mid-level features, can recover the global structure and identity of faces, though fine details, contrast, and high-frequency textures remain challenging.

To make the system more accessible and reproducible, we package the full pipeline into a modular codebase with scripts for feature extraction, decoder training, and evaluation, along with a Streamlit user interface. The UI allows users to upload images or use a webcam, visualize encoder features, and view live reconstructions. We also provide a reproducibility checklist, detailed training logs, and fixed random seeds to ensure that experiments can be replicated under the specified environment. Overall, this project demonstrates a practical trade-off between model size and reconstruction quality, and serves as a compact, end-to-end example of feature-based image reconstruction suitable for course projects and future research extensions.


Introduction

This project explores image reconstruction from intermediate CNN feature maps using a small, resource-efficient decoder network.

  • The encoder is a frozen convolutional feature extractor.

  • The decoder is a compact convolutional network trained to reconstruct the original $224\times224$ RGB image from encoder feature maps.

The project provides:

  • A training pipeline (train.py) using Keras model.fit.

  • A research-style evaluation script (evaluate.py) with MSE, PSNR, SSIM, and plots.

  • A test script (test_model.py) for quick sanity checks.

  • A Streamlit UI (ui_app.py) for:

    • Image upload reconstruction.

    • Optional live webcam reconstruction.

The focus is on using a small decoder under realistic compute and memory constraints (Apple Silicon, 16GB RAM), while maintaining reasonable reconstruction quality and providing a reproducible end-to-end workflow.


Project Goals and Motivation

Goals

  1. Feature-to-Image Reconstruction
    Given an intermediate feature map from a CNN encoder, learn a decoder that can reconstruct the original $224\times224$ RGB image.

  2. Small Network Constraint
    Keep the decoder relatively small (hundreds of thousands of parameters, not tens of millions), to:

    • Run on constrained hardware (e.g., Apple Silicon).

    • Demonstrate that reasonable reconstructions are possible with limited capacity.

  3. End-to-End Workflow

    • Robust training scripts (train.py).

    • Evaluation scripts (evaluate.py) with numerical metrics and visualizations.

    • A UI (ui_app.py) for interactive demos.

  4. Reproducibility and Documentation

    • Pinned environment.

    • Clear dataset assumptions.

    • Correct handling of dataset cardinality and steps per epoch.

Motivation

Feature-to-image reconstruction is closely related to interpretability: it provides intuition about how much information intermediate feature maps retain. Small decoders are relevant in:

  • Low-resource deployment scenarios.

  • Privacy-related questions (how much can be reconstructed from shared features).

  • Educational contexts where hardware is limited.

Repository Structure

The repository layout is:

CAP6415-Project-ImageReconstruction/
├── main.py                         # Optional CLI entry-point (train, evaluate)
├── requirements.txt                # Pinned environment for macOS + Apple Silicon
├── README.md                       # Documentation
├── src/
│   ├── encoder.py                  # Builds frozen encoder (feature extractor)
│   ├── decoder.py                  # Builds small decoder network
│   ├── dataset.py                  # CelebA-HQ loader (224×224, [0,1])
│   ├── train.py                    # Training script (Keras model.fit)
│   ├── test_model.py               # Sanity-check reconstruction script
│   └── evaluate.py                 # Evaluation: MSE/PSNR/SSIM + plots
├── app/
│   └── ui_app.py                   # Streamlit UI: upload + webcam reconstruction
├── dataset/
│   └── celeba_hq/                  # CelebA-HQ images (30,000)
└── outputs/
    └── evaluation/                 # Metrics & plots from evaluate.py

Additional runtime directories:

  • src/models/decoder_checkpoints/: stores trained decoder weights (e.g., decoder_final.h5).

  • outputs/eval_run*/: stores evaluation metrics and figures.

Environment Setup

Target Platform

  • OS: macOS (Apple Silicon).

  • CPU/GPU: Apple Silicon (e.g., Apple M5) with Metal acceleration.

  • Python: 3.10.

  • DL Stack: tensorflow-macos == 2.10.0 + tensorflow-metal.

Conda Environment

Create and activate a dedicated environment:

conda create -n CV python=3.10 -y
conda activate CV

Python Dependencies

From the project root:

pip install -r requirements.txt

Key pinned packages:

  • tensorflow-macos == 2.10.0

  • tensorflow-metal == 0.7.0

  • numpy == 1.23.5

  • ml-dtypes == 0.2.0

  • protobuf == 3.19.6

  • opencv-python == 4.8.1.78

  • scikit-image == 0.21.0

  • matplotlib == 3.7.1

  • streamlit == 1.22.0

  • streamlit-webrtc

  • altair == 4.2.2, vega-datasets == 0.9.0

  • tqdm == 4.66.1

These versions avoid common compatibility issues such as NumPy / TensorFlow ABI mismatches and Protobuf descriptor errors.

Data Layout

The project uses CelebA-HQ, a dataset of high-quality face images. Place the images as:

CAP6415-Project-ImageReconstruction/dataset/celeba_hq/
    00001.png
    00002.png
    ...
    (≈ 30,000 images)

No class subfolders are required; the dataset is treated as a single pool of face images.

Dataset Loader (dataset.py)

The loader uses tf.keras.utils.image_dataset_from_directory to:

  • Read all images under dataset/celeba_hq/.

  • Resize each image to $224\times224$.

  • Normalize to $[0, 1]$ (float32).

  • Batch them (default batch size is often 8).
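
A minimal sketch of such a loader (the function name load_celeba_hq and its exact arguments are assumptions; the actual src/dataset.py may differ):

import tensorflow as tf

def load_celeba_hq(data_dir="dataset", batch_size=8):
    # label_mode=None yields image-only batches; celeba_hq/ is scanned as a
    # single class folder, matching "Found 30000 files belonging to 1 classes."
    ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        label_mode=None,
        image_size=(224, 224),
        batch_size=batch_size,
    )
    # Normalize pixel values from [0, 255] to float32 in [0, 1].
    return ds.map(lambda x: x / 255.0)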

To verify:

python src/dataset.py

You should see:

  • "Found 30000 files belonging to 1 classes."

  • Batch shape: e.g. (4, 224, 224, 3).

  • Pixel range: 0.0 - 1.0.

Dataset Cardinality

Given:

  • $N = 30,000$ images.

  • Batch size $B = 8$.

Number of steps (batches) in one full epoch:

$$ n_{\text{batches}} = \left\lceil \frac{N}{B} \right\rceil = \left\lceil \frac{30000}{8} \right\rceil = 3750 $$

Thus, one epoch using the full dataset corresponds to 3750 training steps.
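
The same arithmetic as a quick sanity check in Python:

import math

N, B = 30_000, 8
print(math.ceil(N / B))  # 3750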

Model Architecture

Encoder (encoder.py)

The encoder is a convolutional backbone used as a feature extractor:

  • Input: $224\times224\times3$ RGB image.

  • Output: feature map, typically $56\times56\times256$.

  • During training: encoder.trainable = False.

It serves as a fixed, precomputed-style feature extractor for the feature-to-image reconstruction task.
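
The README does not pin down the exact backbone; a minimal sketch with the stated input/output shapes (build_encoder and all layer choices are assumptions):

import tensorflow as tf
from tensorflow.keras import layers

def build_encoder():
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = inputs
    # Two stride-2 stages: 224 -> 112 -> 56 spatial resolution.
    for filters in (64, 128):
        x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    # Final projection to the 56x56x256 feature map.
    outputs = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    encoder = tf.keras.Model(inputs, outputs, name="encoder")
    encoder.trainable = False  # frozen during decoder training
    return encoder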

Decoder (decoder.py)

The decoder is a small CNN:

  • Input: encoder features, e.g., $56\times56\times256$.

  • Output: reconstructed image, $224\times224\times3$.

  • Uses upsampling (e.g., UpSampling2D + Conv2D) and residual blocks.

  • Final layer: Conv2D(3, kernel_size=3, activation="sigmoid") to map to $[0,1]$.

The total parameter count is kept in the order of a few hundred thousand parameters.
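
A minimal sketch consistent with this description (build_decoder and the exact layer widths are assumptions):

import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(input_shape=(56, 56, 256)):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(inputs)
    # Two 2x upsampling stages: 56 -> 112 -> 224.
    for filters in (64, 32):
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        # Simple residual block at this resolution.
        skip = x
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.Add()([x, skip])
        x = layers.Activation("relu")(x)
    # Map to 3 RGB channels in [0, 1].
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="decoder")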

Autoencoder Assembly

The autoencoder combines encoder and decoder:

$$x \in \mathbb{R}^{224\times224\times3} \quad\rightarrow\quad f(x) \in \mathbb{R}^{56\times56\times256} \quad\rightarrow\quad \hat{x} \in \mathbb{R}^{224\times224\times3},$$

where:

  • $f$ is the frozen encoder.

  • The decoder is trainable.

  • The training objective is to minimize the difference between $x$ and $\hat{x}$.
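
Put together, the assembly might look like the following sketch (build_encoder and build_decoder are the hypothetical constructors sketched above):

import tensorflow as tf

encoder = build_encoder()   # 224x224x3 -> 56x56x256
encoder.trainable = False   # freeze the feature extractor
decoder = build_decoder()   # 56x56x256 -> 224x224x3

inputs = tf.keras.Input(shape=(224, 224, 3))
features = encoder(inputs, training=False)
outputs = decoder(features)
autoencoder = tf.keras.Model(inputs, outputs, name="autoencoder")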

Loss Function and Training Objective

Combined SSIM + L1 Loss

The project uses a combination of mean absolute error (L1) and structural similarity (SSIM):

import tensorflow as tf

def ssim_l1_loss(y_true, y_pred, alpha=0.8):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)

    # Pixel-wise mean absolute error (L1).
    l1 = tf.reduce_mean(tf.abs(y_true - y_pred))
    # Per-image SSIM, converted to a loss as 1 - mean(SSIM).
    ssim_val = tf.image.ssim(y_true, y_pred, max_val=1.0)
    ssim_loss = 1.0 - tf.reduce_mean(ssim_val)

    return alpha * l1 + (1.0 - alpha) * ssim_loss

  • L1 encourages pixel-wise accuracy.

  • SSIM focuses on structural similarity.

  • $\alpha = 0.8$ gives more weight to L1 while retaining structure-aware penalties.

Optimization

The autoencoder is typically trained with the Adam optimizer and a modest learning rate to ensure stable convergence for the small decoder.
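
For instance, a minimal compile step (the specific learning rate is an assumption, not confirmed by the source):

import tensorflow as tf

autoencoder.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # "modest" LR; 1e-4 is an assumed value
    loss=ssim_l1_loss,
)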

Training the Model

Using All 30,000 Images per Epoch

Earlier runs manually set steps_per_epoch = 1000, which conflicted with the dataset's true cardinality and triggered "Your input ran out of data" warnings.

In the final configuration:

  • One epoch uses the entire dataset ($3750$ batches).

  • steps_per_epoch is either inferred or explicitly set to the dataset cardinality.

Final Training Pattern

train_ds = load_celeba_hq(batch_size=batch_size)

# Autoencoder targets are the inputs themselves, so each batch is mapped to an
# (input, target) pair before caching and shuffling.
train_ds = (
    train_ds.map(lambda x: (x, x))
            .cache()
            .shuffle(1000, seed=SEED)
            .prefetch(tf.data.AUTOTUNE)
)

history = autoencoder.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[checkpoint_cb],
)

Keras automatically infers:

$$ \text{steps\_per\_epoch} = |\text{train\_ds}| = \left\lceil \frac{30000}{8} \right\rceil = 3750 $$

Alternatively, explicitly:

num_batches = int(train_ds.cardinality().numpy())  # ≈ 3750

history = autoencoder.fit(
    train_ds,
    epochs=EPOCHS,
    steps_per_epoch=num_batches,
    callbacks=[checkpoint_cb],
)
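
The checkpoint_cb referenced above is not defined in this README; one plausible sketch saves the decoder's weights at the end of each epoch (the path comes from the repository layout; the cadence is an assumption):

import tensorflow as tf

checkpoint_cb = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: decoder.save_weights(
        "src/models/decoder_checkpoints/decoder_final.h5"
    )
)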

Training Script Usage

From the project root:

python src/train.py

The training will:

  • Build encoder and decoder.

  • Freeze the encoder parameters.

  • Compile the autoencoder using ssim_l1_loss.

  • Load existing weights for fine-tuning, if available.

  • Train for the specified number of epochs.

  • Save:

    • src/models/decoder_checkpoints/decoder_final.h5

    • loss_history.json

    • loss_curve.png

Evaluation

Evaluation Script (evaluate.py)

To evaluate the trained model:

python src/evaluate.py \
  --weights src/models/decoder_checkpoints/decoder_final.h5 \
  --num-samples 300 \
  --batch-size 8 \
  --save-dir outputs/eval_run1

The script:

  1. Builds encoder and decoder, loads decoder weights.

  2. Samples images from CelebA-HQ.

  3. Computes:

    • Mean squared error (MSE).

    • Peak signal-to-noise ratio (PSNR) with data_range=1.0.

    • Structural similarity (SSIM) with data_range=1.0, channel_axis=-1.

  4. Saves to outputs/eval_run1/:

    • metrics_summary.json: means and standard deviations; full lists.

    • psnr_histogram.png, ssim_histogram.png.

    • sample_reconstructions.png: original vs reconstructed image grid.

    • training_loss_curve.png (if loss history is present).
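
As a sketch, the per-image metrics map onto scikit-image as follows (the helper name image_metrics is hypothetical; the actual script may differ):

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_metrics(original, reconstructed):
    # Both arguments are float32 arrays in [0, 1] with shape (224, 224, 3).
    mse = float(np.mean((original - reconstructed) ** 2))
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=1.0)
    ssim = structural_similarity(original, reconstructed, data_range=1.0, channel_axis=-1)
    return mse, psnr, ssim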

Quick Test (test_model.py)

For a small sanity check:

python src/test_model.py

It:

  • Loads encoder and decoder with trained weights.

  • Reconstructs a small batch of images.

  • Prints basic metrics and may save a comparison figure.

Interactive UI

Streamlit App (ui_app.py)

Start the UI:

python -m streamlit run app/ui_app.py

Open the local URL (e.g., http://localhost:8501).

Features

Image Upload

  • Upload a face image (JPG/PNG).

  • The app resizes to $224\times224$, normalizes, runs encoder+decoder.

  • Displays:

    • Original vs reconstructed images.

    • MSE, PSNR, SSIM for the uploaded sample.

Webcam Reconstruction (Optional)

  • Uses streamlit-webrtc and av.

  • Implements a VideoProcessorBase with a recv() method (current API).

  • Each frame is preprocessed, encoded, decoded, and rendered as a reconstructed stream.
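
A minimal sketch of such a processor (the autoencoder variable, colour handling, and stream key are assumptions):

import av
import cv2
import numpy as np
from streamlit_webrtc import VideoProcessorBase, webrtc_streamer

class ReconstructionProcessor(VideoProcessorBase):
    def recv(self, frame: av.VideoFrame) -> av.VideoFrame:
        img = frame.to_ndarray(format="bgr24")
        # Resize to the model's input size and normalize to [0, 1] (RGB order).
        rgb = cv2.cvtColor(cv2.resize(img, (224, 224)), cv2.COLOR_BGR2RGB)
        x = rgb.astype(np.float32) / 255.0
        # autoencoder is assumed to be built and loaded elsewhere in the app.
        recon = autoencoder.predict(x[None, ...], verbose=0)[0]
        out = cv2.cvtColor((recon * 255.0).astype(np.uint8), cv2.COLOR_RGB2BGR)
        return av.VideoFrame.from_ndarray(out, format="bgr24")

webrtc_streamer(key="reconstruction", video_processor_factory=ReconstructionProcessor)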

Reproducibility

Environment

  • Python 3.10.

  • Conda environment CV.

  • Dependencies installed from requirements.txt.

Random Seeds

In train.py, set seeds:

import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

Use the same seed for dataset shuffling:

train_ds = train_ds.shuffle(1000, seed=SEED)

Note that full bitwise determinism may not be guaranteed on GPU/Metal, but this significantly stabilizes the training outcome.

Dataset Consistency

To reproduce results:

  • Use the same CelebA-HQ dataset in dataset/celeba_hq/.

  • Do not change the number or identity of images.

Training Hyperparameters

Keep fixed:

  • batch_size (e.g., 8).

  • Number of epochs (e.g., 30).

  • Encoder and decoder architectures.

  • Loss function (ssim_l1_loss, same $\alpha$).

  • Learning rate and optimizer.

Steps per Epoch

Ensure each epoch uses the full dataset:

  • Recommended: do not set steps_per_epoch; let Keras infer it from dataset cardinality.

  • Alternatively: set steps_per_epoch = int(train_ds.cardinality().numpy()) (approximately 3750).

Avoid contradictory manual settings (e.g., too small steps_per_epoch combined with caching) that can cause "input ran out of data" warnings.

Testing Model Performance

(Figures: side-by-side original vs. reconstructed test images.)

Model Evaluation

(Figures: PSNR/SSIM histograms, sample reconstructions, and the training loss curve.)

Limitations and Future Work

Current Limitations

  • Small Decoder:

    • The constrained decoder limits reconstruction sharpness compared to large decoders or GANs.

  • Face-Only Training:

    • Trained only on CelebA-HQ faces; generalisation to non-face data is limited.

  • Fixed Resolution:

    • Only supports $224\times224$ images in the current configuration.

  • No Adversarial Loss:

    • Reconstructions may appear over-smoothed compared to GAN-based methods.

Future Extensions

  • Slightly deeper decoder while keeping it "small" overall.

  • Additional perceptual loss terms, e.g., VGG-based perceptual loss.

  • Explore reconstruction from different encoder layers (early vs deep features).

  • Multi-resolution outputs and multi-scale training.

  • Enhanced UI:

    • Compare different checkpoints.

    • Visualise error maps.

    • Toggle between different loss configurations.

End-to-End Usage Summary

  1. Clone the repository

    git clone https://github.com/GouthamMallavolu/CAP6415-Project-ImageReconstruction
    cd CAP6415-Project-ImageReconstruction

  2. Set up the environment

    conda create -n CV python=3.10 -y
    conda activate CV
    pip install -r requirements.txt

  3. Prepare the dataset

    Place CelebA-HQ images under dataset/celeba_hq/.

  4. Verify the dataset loader

    python src/dataset.py

  5. Extract features to make training easier

    python src/extract_features.py

  6. Train the model

    python src/train.py

  7. Test the model

    python src/test_model.py

  8. Evaluate the model

    python src/evaluate.py

  9. Run the UI

    python -m streamlit run app/ui_app.py

Following these steps with the conda environment, dataset layout, and scripts described above will reproduce the training behavior, evaluation metrics, and interactive demo for image reconstruction from CNN features using a small decoder network.


References

  1. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507.

  2. Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep Learning Face Attributes in the Wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).

  3. Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive Growing of GANs for Improved Quality, Stability, and Variation. In International Conference on Learning Representations (ICLR).

  4. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 13(4), 600–612.

  5. Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  6. Autoencoder. Wikipedia, The Free Encyclopedia. Accessed 2024. (Overview of autoencoders and their use for unsupervised representation learning and reconstruction.)


Authors


© 2025. This project was created as part of the CAP 6415 Computer Vision course at Florida Atlantic University.
