lucien150/generative-ad-feature-extraction

Generative Ad-Creative Analyzer (Gemini 2.0 Multimodal Pipeline)

📖 Overview

This repository implements a multimodal AI pipeline that automates the extraction of structured feature data from raw advertising creatives.

Built on Google's Gemini 2.0 Flash, the system acts as an ETL layer. It "watches" video ads and "views" static images to generate structured JSON metadata (e.g., mood, pacing, shot angles, text density). Gemini 2.0 Flash was chosen for its low latency and its ability to reason about how a video unfolds over time.

The structured data produced by this pipeline is the upstream input for downstream performance forecasting models (TCN, XGBoost), addressing the "unstructured data" problem in AdTech.

🚀 Why Gemini 2.0 Flash?

  • Native Multimodality: Accepts video files directly, including the audio track, so no manual frame extraction is needed before the model reasons about ad pacing and narrative flow.
  • Ultra-Low Latency: Optimized for speed, making it viable for high-volume production pipelines processing thousands of creatives.
  • Structured Output Enforcement: Uses native Function Calling to constrain responses to the schema, which sharply reduces malformed or off-schema JSON.

⚡ Key Features

  • Multimodal Ingestion: Native support for both .mp4 video and static .jpg/.png image formats.
  • Temporal Video Analysis: Captures time-based features such as pacing (Fast/Slow), audio mood, and call-to-action timing.
  • Schema-Constrained Generation: Forces the LLM to adhere to a strict business schema, so the output loads cleanly into downstream SQL/Pandas pipelines.
  • Production Robustness: Includes polling mechanisms for asynchronous video processing states and automatic rate-limit handling.
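
The rate-limit handling mentioned above can be approached in several ways. The following is a minimal sketch of one common approach, retrying with exponential backoff. It is not taken from main.py: `RateLimitError` and `call_with_backoff` are hypothetical names, and the real code would catch whatever exception the API client raises on a rate-limit response.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error the API client raises when rate limited."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.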

🛠️ Technical Architecture

The Pipeline (main.py)

  1. Ingestion: Scans the extraction_visuals/ directory for new creative assets.
  2. Upload & State Management: Uploads large video files to the Gemini File API and polls until each file reaches the ACTIVE state.
  3. Inference: Sends the asset + a strict JSON schema definition to Gemini 2.0 Flash.
  4. Serialization: Saves the extracted metadata as sanitized JSON files in extraction_results/.
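
Step 2 above, waiting for an uploaded video to become ACTIVE, could look roughly like the sketch below. It is an illustration rather than the code in main.py. `wait_until_active` and its `get_state` argument are hypothetical: `get_state` is assumed to be any zero-argument callable that returns the file's current state as a string, and the "PROCESSING" and "FAILED" state names are assumptions about what the File API reports besides ACTIVE.

```python
import time


def wait_until_active(get_state, poll_interval=2.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "ACTIVE".

    get_state is a hypothetical callable returning the state name as a string,
    assumed here to be one of "PROCESSING", "ACTIVE" or "FAILED".
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("video processing failed on the server side")
        if clock() >= deadline:
            raise TimeoutError(f"file still {state} after {timeout} seconds")
        sleep(poll_interval)
```

Keeping the state lookup behind a callable means the polling logic can be exercised without network access; in the real pipeline the callable would wrap whatever File API lookup main.py uses.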

Extracted Features (Schema)

The model is constrained to extract specific dimensions known to impact ad performance:

  • Visual Composition: shot_type (Close-up, Wide), color_tone (Warm, Cool), camera_angle.
  • Text Analysis: amount_of_text, text_position, font_style.
  • Content: people_presence (count, emotion), setting (Indoor/Outdoor).
  • Audio/Pacing: music_mood, voiceover_gender, pacing.
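
Even with schema-constrained generation, a cheap check on the consumer side can catch drift before it reaches a forecasting model. The sketch below is an assumption-laden illustration: `ALLOWED_VALUES` and `find_schema_violations` are hypothetical names, the record is treated as flat, and only the categories spelled out in the list above are included. The actual schema in main.py may group or name fields differently.

```python
# Allowed values taken from the feature list above; the real schema in main.py
# may use different field names, groupings, or additional categories.
ALLOWED_VALUES = {
    "shot_type": ("Close-up", "Wide"),
    "color_tone": ("Warm", "Cool"),
    "setting": ("Indoor", "Outdoor"),
    "pacing": ("Fast", "Slow"),
}


def find_schema_violations(features):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field not in features:
            problems.append(f"missing field: {field}")
        elif features[field] not in allowed:
            problems.append(f"{field}={features[field]!r} not in {list(allowed)}")
    return problems
```

For example, a record with `"shot_type": "Medium"` and no `pacing` key would yield two problems, one per field.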

📦 Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/generative-ad-feature-extraction.git
    cd generative-ad-feature-extraction

  2. Install dependencies:

    pip install -r requirements.txt

  3. API Key Setup: Create a .env file in the root directory and add your Google Gemini API key:

    GEMINI_API_KEY=your_api_key_here
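
How main.py reads this file is not shown here; it may well rely on a helper package listed in requirements.txt. As a dependency-free illustration, the sketch below parses a simple .env file with the standard library only. `load_env_file` and `get_api_key` are hypothetical helper names, and the choice to prefer an already-exported environment variable over the file is an assumption.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Prefer an already-exported GEMINI_API_KEY, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and os.path.exists(path):
        key = load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or export it")
    return key
```

Failing early with a clear message when the key is missing avoids a less obvious authentication error on the first API call.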

⚙️ Usage

  1. Prepare Data: Place your raw video or image files into the extraction_visuals/ folder.

  2. Run the Extractor:

    python main.py

  3. Output: Structured JSON files will appear in extraction_results/.

    Example Output:

    {
      "creative_format": {
        "media_type": "Video",
        "shot_type": "Close-up",
        "text_position": "Center"
      },
      "mood_and_tone": {
        "pacing": "Fast",
        "emotional_appeal": "Excitement",
        "music_mood": "Upbeat/Electronic"
      },
      "source_filename": "nike_summer_campaign.mp4"
    }
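
The file handling around a run like the one above (find supported creatives, then write one JSON file per asset) might be sketched as follows. This is not the code in main.py. `find_creatives` and `save_result` are hypothetical helpers, the "Image" media type label is an assumption (only "Video" appears in the example), and naming each result after the source file's stem is likewise assumed.

```python
import json
from pathlib import Path

# Extensions named in the feature list; anything else is skipped.
VIDEO_EXTS = {".mp4"}
IMAGE_EXTS = {".jpg", ".png"}


def find_creatives(input_dir):
    """Return (path, media_type) pairs for supported files, sorted by path."""
    found = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        if ext in VIDEO_EXTS:
            found.append((path, "Video"))
        elif ext in IMAGE_EXTS:
            found.append((path, "Image"))  # label is an assumption
    return found


def save_result(features, source_path, output_dir):
    """Write one JSON file per creative, named after the source file's stem."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Attach the source filename so each record can be traced back to its asset.
    record = dict(features, source_filename=Path(source_path).name)
    out_path = out_dir / f"{Path(source_path).stem}.json"
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_path
```

A driver loop would call `find_creatives("extraction_visuals")`, run the model on each path, and pass the returned features to `save_result(..., "extraction_results")`.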

🔗 Integration with Forecasting

This repository is Part 1 of a larger MLOps ecosystem.

  1. This Repo: Extracts structured features from pixels.
  2. Forecasting Repos (TCN/XGBoost): Ingest these JSON features to predict the actual Leads/Clicks each ad will generate.
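
Tabular models such as XGBoost expect flat rows, while the extractor's output is nested JSON. One way to bridge the two, shown here as a sketch rather than the method the forecasting repos actually use, is to flatten each record into dotted column names. `flatten_features` is a hypothetical helper.

```python
def flatten_features(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested groups such as "creative_format".
            flat.update(flatten_features(value, column, sep))
        else:
            flat[column] = value
    return flat
```

Applied to the example output in the Usage section, this yields columns such as `creative_format.shot_type` and `mood_and_tone.pacing`. A list of such rows can be passed to `pandas.DataFrame`, and the categorical columns would still need encoding before training.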

Author

Luciën Tuijp
