AI-powered Image → Main Entity Identification → Open Library Book Retrieval using OpenAI Vision Models and the Open Library API.
Developed by Partha Pratim Ray
📧 parthapratimray1986@gmail.com
📧 ppray@cus.ac.in
This project is an intelligent multimodal retrieval system that accepts an uploaded image and automatically identifies the single most important searchable entity from that image (such as a famous person, book title, monument, historical event, scientific concept, cultural object, or academic subject).
The system then converts that entity into a clean bibliographic search query and retrieves the most relevant books from the Open Library API, presenting results in a structured tabular format along with detailed API timing analysis.
Additionally, all results are automatically saved into a local .csv file for future research analysis, benchmarking, and documentation.
This system is especially useful for:
- Academic demonstrations
- Research prototypes
- Digital library systems
- Educational retrieval systems
- Multimodal search experiments
- AI-assisted bibliographic discovery
Image Upload
↓
Image Encoding (Base64)
↓
AI Vision Analysis (GPT-5.4-mini)
↓
Main Entity Detection
↓
Textual Description Generation
↓
Keyword Extraction (GPT-5.4-nano)
↓
Bibliographic Query Generation
↓
Open Library Search API
↓
Book Retrieval
↓
Result Display + CSV Logging
---
# 🧠 Model Strategy
| Task | Model/API Used |
| -------------- | ---------------- |
| Image Analysis | GPT-5.4-mini |
| Query Cleaning | GPT-5.4-nano |
| Book Retrieval | Open Library API |
| User Interface | Gradio |
| Result Storage | CSV + Pandas |
---
# ✨ Key Features
## 1. Intelligent Image Understanding
Instead of generating generic image descriptions, the system identifies:
* Famous personalities
* Authors
* Historical figures
* Religious figures
* Monuments
* Artworks
* Scientific concepts
* Cultural themes
* Academic subjects
* Book titles
* Recognizable named entities
This significantly improves search quality.
---
## 2. Strict Entity-Based Retrieval
The system avoids vague descriptions such as:
❌ elderly man
❌ black-and-white photo
❌ traditional dress
❌ seated person
and prioritizes:
✅ C. V. Raman
✅ Buddha
✅ White Tiger
✅ Bharatanatyam
✅ Taj Mahal
This improves Open Library matching accuracy.
---
## 3. Structured Open Library Search
Retrieved metadata includes:
* Title
* Author
* First Publish Year
* Edition Count
* Languages
* ISBN
* Open Library Work ID
---
## 4. Automatic CSV Logging
Every processed image is permanently saved in:
```text
image_open_library_results.csv
This supports:
- research reproducibility
- performance benchmarking
- large-scale experiments
- longitudinal evaluation
The system records:
- Image Encoding Time
- Vision API Time
- Keyword API Time
- Open Library API Time
- Internal Processing Delay
- Total End-to-End Processing Time
This enables system-level performance analysis.
Install dependencies using:
pip install openai gradio pandas requests python-dotenv -qThis project is designed for Google Colab.
Store your API key securely using:
from google.colab import userdataAdd your OpenAI key inside:
Colab Secrets → OPENAI_API_KEY
The system automatically loads:
api_key = userdata.get("OPENAI_API_KEY")Open the notebook in Google Colab
Install required packages
pip install openai gradio pandas requests python-dotenv -qAdd your OpenAI API Key to Colab Secrets
OPENAI_API_KEY
Run all cells
Upload an image and click:
🚀 Analyze Image and Find Books
View:
- AI Analysis Result
- Open Library Results
- Timing Summary
- Saved CSV File
Main Entity: C. V. Raman
Confidence: High
Reason: The portrait closely matches the well-known Indian physicist C. V. Raman in formal attire and turban.
C. V. Raman
| Title | Author | Year |
|---|---|---|
| The Life of C. V. Raman | Various | 1985 |
Vision API Time: 1.53 seconds
Keyword API Time: 0.76 seconds
Open Library API Time: 0.41 seconds
Total Processing Time: 2.71 seconds
This work demonstrates a practical bridge between:
and
using lightweight LLM-based semantic reasoning.
It is especially valuable for:
- Digital Humanities
- Smart Libraries
- Educational AI Systems
- Visual Knowledge Retrieval
- AI-powered Cataloguing
- Library Science Automation
The system uses a professionally designed Gradio Interface with:
- sober academic color scheme
- structured workflow display
- wide result tables
- clean result visualization
- CSV download support
- professional research presentation aesthetics
Suitable for:
- conferences
- thesis demonstrations
- congress presentations
- research exhibitions
Possible future extensions include:
- multi-book relevance ranking
- PDF retrieval integration
- Google Books API support
- CrossRef API support
- Semantic Scholar integration
- OCR for scanned documents
- multilingual search support
- domain-specific academic retrieval
- local vector database integration
- RAG-based scholarly retrieval
Assistant Professor (Stage II) Department of Computer Applications School of Physical Sciences Sikkim University
Research Areas:
- Internet of AI Things (IoAT)
- Edge Computing
- Large Language Models
- Multimodal Retrieval Systems
- Generative AI
- Pervasive Biomedical Informatics
Recognitions:
- Stanford Top 2% Scientist
- Fellow of IETE
- Senior Member of INAE
This project is developed for:
Feel free to use, extend, and cite for non-commercial academic purposes.
Please consider:
- starring the repository
- citing the work
- sharing with researchers
- extending the system for your domain
Building intelligent bridges between images and knowledge.