
# Image Captioning using VGG for feature extraction

Uses the Flickr8k dataset (~1 GB); each photo comes with 5 reference descriptions.

The code uses Keras with the TensorFlow backend. VGG is used to extract image features.
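The feature-extraction step can be sketched roughly as below, using the modern `tf.keras` API rather than the Keras 1.2.2 release this repo targets (so the exact import paths are an assumption). The idea is to cut VGG16 at its `fc2` layer and use the 4096-dimensional activation as the image feature:

```python
# Hedged sketch of VGG feature extraction; uses tf.keras, not Keras 1.2.2.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

# Build VGG16 and truncate it at the fc2 layer so the model outputs a
# 4096-dim feature vector instead of ImageNet class probabilities.
# weights=None keeps this sketch offline; use weights="imagenet" in practice.
base = VGG16(weights=None)
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

# A dummy 224x224 RGB batch stands in for a preprocessed Flickr8k photo.
image = np.random.rand(1, 224, 224, 3).astype("float32")
features = extractor.predict(image, verbose=0)
print(features.shape)  # (1, 4096)
```

These feature vectors are then fed to the caption decoder in place of the raw images.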

Beam search is not yet implemented; captions are generated greedily, one most-probable word at a time.
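To illustrate the difference: greedy decoding commits to the single most probable word at each step, while beam search keeps the `k` highest-scoring partial captions and picks the best complete one. A minimal NumPy sketch over a toy, hypothetical per-step word distribution (not the repo's actual model):

```python
import numpy as np

# Hypothetical toy setup: probs[step] is the model's softmax over the
# vocabulary at that step (history-independent here, purely for brevity).
vocab = ["a", "dog", "runs", "<end>"]
probs = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.5, 0.3],
])

def greedy(probs):
    """Pick the argmax word at every step."""
    return [int(np.argmax(p)) for p in probs]

def beam_search(probs, k=2):
    """Keep the k best partial sequences, scored by summed log-probability."""
    beams = [([], 0.0)]
    for p in probs:
        candidates = []
        for seq, score in beams:
            for w, pw in enumerate(p):
                candidates.append((seq + [w], score + np.log(pw)))
        # Prune to the k highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]

g = greedy(probs)
b = beam_search(probs)
print([vocab[i] for i in g])  # ['a', 'dog', 'runs']
print([vocab[i] for i in b])  # ['a', 'dog', 'runs']
```

With a real caption model the distributions depend on the words chosen so far, which is where beam search can recover sequences that greedy decoding misses.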

You can download the trained weights here.

## Examples

"epoch1" "epoch7" "epoch12"

## Dependencies

- Keras 1.2.2
- TensorFlow 0.12.1
- numpy
- matplotlib

## References

[1] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[2] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).