This repository implements a Next-Gen Multimodal AI pipeline designed to automate the extraction of structured feature data from raw advertising creatives.
Leveraging Google's Gemini 2.0 Flash, this system acts as an intelligent ETL layer. It "watches" video ads and "views" static images to generate high-fidelity, structured JSON metadata (e.g., mood, pacing, shot angles, text density) with significantly lower latency and stronger temporal reasoning than previous architectures.
The structured data produced by this pipeline serves as the critical upstream input for downstream performance forecasting models (TCN, XGBoost), solving the "unstructured data" problem in AdTech.
- Native Multimodality: Processes video frames natively (not just sampled images) for deep temporal understanding of ad pacing and narrative flow.
- Ultra-Low Latency: Optimized for speed, making it viable for high-volume production pipelines processing thousands of creatives.
- Structured Output Enforcement: Utilizes native Function Calling to guarantee strictly valid JSON outputs, eliminating hallucinated formatting errors.
- Multimodal Ingestion: Native support for both `.mp4` video and static `.jpg`/`.png` image formats.
- Temporal Video Analysis: Captures time-based features such as pacing (Fast/Slow), audio mood, and call-to-action timing.
- Schema-Constrained Generation: Forces the LLM to adhere to a strict business schema, ensuring 100% compatibility with downstream SQL/Pandas pipelines.
- Production Robustness: Includes polling mechanisms for asynchronous video processing states and automatic rate-limit handling.
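The rate-limit handling mentioned above can be sketched as a simple exponential-backoff wrapper. This is an illustrative sketch, not the repository's actual implementation: a real pipeline would wrap the Gemini request and catch the SDK's specific rate-limit exception rather than the generic `RuntimeError` used here.

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate-limit errors with exponential backoff.

    `call` is any zero-argument function. The generic RuntimeError stands
    in for the SDK's rate-limit exception for illustration purposes.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

Injecting `sleep` keeps the wrapper trivially testable without real delays.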
- Ingestion: Scans the `extraction_visuals/` directory for new creative assets.
- Upload & State Management: Uploads large video files to the Gemini File API and polls for the `ACTIVE` processing state.
- Inference: Sends the asset plus a strict JSON schema definition to Gemini 2.0 Flash.
- Serialization: Saves the extracted metadata as sanitized JSON files in `extraction_results/`.
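The upload-and-poll step can be sketched as follows. This is a minimal illustration: `get_file_state` is an assumed stand-in for the SDK's file-status call, not a helper from this repository.

```python
import time

def wait_until_active(get_file_state, timeout=300, interval=5, sleep=time.sleep):
    """Poll the File API until the uploaded asset reaches ACTIVE.

    `get_file_state` is a zero-argument callable returning the current
    processing state string ("PROCESSING", "ACTIVE", or "FAILED").
    """
    waited = 0
    while waited < timeout:
        state = get_file_state()
        if state == "ACTIVE":
            return True
        if state == "FAILED":
            raise RuntimeError("File processing failed")
        sleep(interval)
        waited += interval
    raise TimeoutError("File never became ACTIVE")
```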
The model is constrained to extract specific dimensions known to impact ad performance:
- Visual Composition: `shot_type` (Close-up, Wide), `color_tone` (Warm, Cool), `camera_angle`.
- Text Analysis: `amount_of_text`, `text_position`, `font_style`.
- Content: `people_presence` (count, emotion), `setting` (Indoor/Outdoor).
- Audio/Pacing: `music_mood`, `voiceover_gender`, `pacing`.
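The business schema can be checked before results are written downstream. The sketch below is illustrative: the field grouping mirrors the example output shown later rather than the repository's exact schema definition.

```python
# Assumed grouping of required fields, based on the example output.
REQUIRED_FIELDS = {
    "creative_format": ["media_type", "shot_type", "text_position"],
    "mood_and_tone": ["pacing", "emotional_appeal", "music_mood"],
}

def validate_extraction(record: dict) -> list[str]:
    """Return dot-paths of missing keys; an empty list means valid."""
    missing = []
    for section, keys in REQUIRED_FIELDS.items():
        block = record.get(section, {})
        missing.extend(f"{section}.{k}" for k in keys if k not in block)
    return missing
```

A guard like this catches any record where the model silently dropped a field, before it reaches SQL/Pandas pipelines.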
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/generative-ad-feature-extraction.git
   cd generative-ad-feature-extraction
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. API Key Setup: Create a `.env` file in the root directory and add your Google Gemini API key:

   ```
   GEMINI_API_KEY=your_api_key_here
   ```

4. Prepare Data: Place your raw video or image files into the `extraction_visuals/` folder.

5. Run the Extractor:

   ```bash
   python main.py
   ```

6. Output: Structured JSON files will appear in `extraction_results/`.

Example Output:

```json
{
  "creative_format": {
    "media_type": "Video",
    "shot_type": "Close-up",
    "text_position": "Center"
  },
  "mood_and_tone": {
    "pacing": "Fast",
    "emotional_appeal": "Excitement",
    "music_mood": "Upbeat/Electronic"
  },
  "source_filename": "nike_summer_campaign.mp4"
}
```
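Downstream forecasting models typically need these nested records flattened into flat columns. A minimal sketch (the dot-delimited column-naming convention is an assumption, not the repository's documented format):

```python
import json

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested JSON into dot-delimited columns for SQL/Pandas."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

example = json.loads("""
{"creative_format": {"media_type": "Video", "shot_type": "Close-up"},
 "source_filename": "nike_summer_campaign.mp4"}
""")
row = flatten(example)
# row == {"creative_format.media_type": "Video",
#         "creative_format.shot_type": "Close-up",
#         "source_filename": "nike_summer_campaign.mp4"}
```

A list of such rows drops straight into `pandas.DataFrame(rows)` for feature engineering.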
This repository is Part 1 of a larger MLOps ecosystem.
- This Repo: Extracts structured features from pixels.
- Forecasting Repos (TCN/XGBoost): Ingest these JSON features to predict the actual Leads/Clicks each ad will generate.
Luciën Tuijp