Connoisseur

This project was the culmination of my work with Convolutional Neural Networks and Sequence Models, allowing me to apply many of the techniques I had learned through the "Deep Learning Specialization" on Coursera, taught by Andrew Ng.

The application is able to automatically describe the content of images. To do this, it has to be capable of perceiving contextual subtleties in images and relating its observations to both the scene and the real world.

Data Overview

The MS-COCO dataset contains ~400,000 captions and ~80,000 images, i.e., each image has about 5 corresponding captions.
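As an illustration of the one-image-to-many-captions layout, here is a minimal sketch that groups captions by image. The `sample` dictionary is a hand-written stand-in that follows the `annotations` / `image_id` / `caption` keys of the COCO captions annotation files. The `group_captions` helper and the `<start>` / `<end>` markers are assumptions made for this example, not something confirmed by the project.

```python
from collections import defaultdict

# Hand-written stand-in for a COCO captions annotation file (not real data).
sample = {
    "annotations": [
        {"image_id": 1, "caption": "A skier going down a slope."},
        {"image_id": 1, "caption": "A person skiing on snow."},
        {"image_id": 2, "caption": "A group of people posing for a photo."},
    ]
}

def group_captions(annotations):
    """Map each image_id to its captions, wrapped in start/end markers."""
    grouped = defaultdict(list)
    for ann in annotations["annotations"]:
        caption = "<start> " + ann["caption"].strip() + " <end>"
        grouped[ann["image_id"]].append(caption)
    return dict(grouped)

captions_by_image = group_captions(sample)
```

With the full dataset, each key would map to roughly five captions.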

Data Preprocessing:
The images (*.jpeg or *.png) were passed through a fine-tuned InceptionV3 model (pretrained on ImageNet) and converted to 8 by 8 by 2048 NumPy arrays.

The caption text was tokenized using keras.preprocessing.text.Tokenizer.
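To make the tokenization step concrete, the following is a simplified pure-Python stand-in for what a word-level tokenizer does: build a frequency-ordered vocabulary, map words to integer ids, and pad to a fixed length. It is not the Keras implementation. The function names `build_word_index` and `encode`, the whitespace splitting, and the `max_len` value are all assumptions made for illustration.

```python
from collections import Counter

def build_word_index(captions, num_words=None):
    """Assign integer ids by descending frequency; id 0 is reserved for padding."""
    counts = Counter(word for c in captions for word in c.lower().split())
    vocab = [word for word, _ in counts.most_common(num_words)]
    return {word: i + 1 for i, word in enumerate(vocab)}

def encode(caption, word_index, max_len):
    """Convert a caption to ids, skip unknown words, and right-pad with 0."""
    ids = [word_index[w] for w in caption.lower().split() if w in word_index]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))

captions = ["<start> a man skiing <end>", "<start> a man on a slope <end>"]
word_index = build_word_index(captions)
encoded = [encode(c, word_index, max_len=8) for c in captions]
```

Padding every caption to the same length is what lets the captions be batched for the decoder.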

Model

The model architecture was inspired by the Show, Attend and Tell paper by Xu et al. (2015).

The embedded image vector is passed through a CNN encoder.
The RNN decoder then takes the encoded image as input and uses Bahdanau's attention model to generate a context vector, which is fed at every step (along with an embedded input word) into a GRU to produce the sequence of words that describes the image.
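The attention step can be sketched in plain Python as follows. This is a toy version of additive (Bahdanau-style) attention under stated assumptions, not the project's code: the weights `W1`, `W2`, and `v` are random here instead of learned, the dimensions are shrunk, and the 8 by 8 feature grid is assumed to be flattened into 64 locations. For each location the score is `v · tanh(W1 f_i + W2 h)`, the scores are normalized with a softmax, and the context is the weighted sum of the features.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(features, hidden, W1, W2, v):
    """Return (context, weights) for one decoding step.

    features: one feature vector per image location
    hidden:   the decoder's previous hidden state
    """
    projected_hidden = matvec(W2, hidden)
    scores = []
    for feature in features:
        projected_feature = matvec(W1, feature)
        energy = [math.tanh(a + b)
                  for a, b in zip(projected_feature, projected_hidden)]
        scores.append(sum(vi * e for vi, e in zip(v, energy)))
    weights = softmax(scores)
    # The context is the attention-weighted sum of the location features.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(features[0]))]
    return context, weights

# Toy sizes (hypothetical): 64 locations, 4-dim features, 3-dim hidden state.
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

features = rand_matrix(64, 4)
hidden = [rng.uniform(-1, 1) for _ in range(3)]
W1, W2 = rand_matrix(5, 4), rand_matrix(5, 3)
v = [rng.uniform(-1, 1) for _ in range(5)]
context, weights = additive_attention(features, hidden, W1, W2, v)
```

In the decoder, `context` would be combined with the embedding of the previous word before being passed to the GRU, and `weights` can be reshaped back to 8 by 8 to visualize where the model is looking.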

Prediction

The base64-encoded image is first converted to a tensor and passed through the image_features_extract_model() function, i.e., the InceptionV3 model.
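The first part of that conversion, turning the base64 string back into raw image bytes, might look like the sketch below. The helper name `decode_base64_image` and the handling of a `data:image/...;base64,` prefix are assumptions about how the image arrives. The later steps (decoding the bytes into a tensor and resizing it for InceptionV3) are not shown because they depend on the project's own code.

```python
import base64

def decode_base64_image(payload):
    """Return raw image bytes from a base64 string, with or without a data-URL prefix."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)

# Round-trip check with placeholder bytes instead of a real image file.
fake_image = b"\x89PNG\r\n\x1a\n placeholder"
encoded = "data:image/png;base64," + base64.b64encode(fake_image).decode("ascii")
raw_bytes = decode_base64_image(encoded)
```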

The caption is then generated one word at a time using beam search (an optimization of best-first search) with a beam width of three. This significantly improves the accuracy of the output caption.
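A minimal sketch of beam search with a width of three is shown below. It is a generic illustration, not the project's decoder. The `beam_search` function, the `step_fn` interface, and the `TOY_MODEL` probability table are all hypothetical. In the real application the next-word probabilities would come from the GRU decoder.

```python
import math

def beam_search(step_fn, start, end, width=3, max_len=10):
    """Keep the `width` highest-scoring partial captions at every step.

    step_fn(tokens) must return a dict mapping each next token to its probability.
    """
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:width]:
            # Captions that emitted the end token leave the beam.
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams if max_len was hit
    return max(finished, key=lambda c: c[1])[0]

# Hypothetical next-word table: the locally best first word ("a") leads to a
# lower overall probability than the second-best first word ("the").
TOY_MODEL = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.4, "cat": 0.3, "man": 0.3},
    "the": {"skier": 0.9, "man": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
    "man": {"<end>": 1.0},
    "skier": {"<end>": 1.0},
}

def toy_step(tokens):
    return TOY_MODEL[tokens[-1]]

best = beam_search(toy_step, "<start>", "<end>", width=3)
greedy = beam_search(toy_step, "<start>", "<end>", width=1)
```

In this toy table a width of one (greedy decoding) commits to "a" and ends with probability 0.24, while a width of three keeps "the" alive and finds the caption with probability 0.36. This sketch compares raw summed log-probabilities, which favours shorter captions. Length normalization is a common refinement that is left out here.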

The following image provides a visual representation of how the model works.

Future Work/ Improvements:

Add a section for users to submit actual captions (vs. generated ones) and add code for an online iterative learning pipeline.

