jsstein/spec_sheet_shinyapp

PV Spec Sheet Finder and Extractor

A Shiny for Python web application that finds, downloads, and extracts data from photovoltaic (PV) module and inverter specification sheets using AI.

Features

  • Find Spec Sheets: Search the web for manufacturer spec sheets and download PDFs automatically
  • Extract Data: Upload a PDF spec sheet and extract key electrical and physical parameters using AI
  • Multi-Agent Extraction: Enhanced extraction pipeline with document analysis, validation, and error correction
  • Batch Processing: Download multiple spec sheets from a manufacturer and extract data from multiple PDFs
  • Dual LLM Support: Works with both Anthropic Claude and OpenAI-compatible APIs

Installation

# Clone the repository
git clone https://github.com/joshuasstein/spec_sheet_shinyapp.git
cd spec_sheet_shinyapp

# Install dependencies
pip install -r requirements.txt

Configuration

Create a .env file in the project root with your API credentials:

# For Anthropic Claude (recommended)
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

# For OpenAI-compatible API (optional fallback)
OPENAI_API_KEY=your-key-here
OPENAI_BASE_URL=https://your-endpoint/v1

The application tries Anthropic first, then falls back to the OpenAI-compatible API if one is configured.
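
The fallback order described above can be sketched as follows. This is a minimal illustration, not the code in llm_client.py: the function name choose_provider is hypothetical, and it assumes the choice is made purely from which environment variables are set (the real client may also fall back at request time).

```python
import os


def choose_provider(env=None):
    """Pick an LLM provider from the configured credentials.

    Hypothetical helper: prefers Anthropic, then an OpenAI-compatible
    endpoint, and returns None when no API key is configured.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        # OPENAI_BASE_URL is treated as optional here (an assumption).
        return "openai"
    return None
```

Passing a plain dict instead of os.environ makes the selection easy to check without touching real credentials.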

Usage

Start the Shiny app:

shiny run app.py

Then open your browser to the URL shown (typically http://127.0.0.1:8000).

Tabs

  1. Find Spec Sheet: Enter a manufacturer and model to search for and download a spec sheet PDF
  2. Extract Data: Upload a PDF to extract module parameters (power, voltage, current, temperature coefficients, dimensions, etc.)
  3. Batch Find: Download multiple spec sheets from a manufacturer at once
  4. Batch Extract: Process all PDFs in a folder and export results to CSV
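
The Batch Extract flow (every PDF in a folder, one CSV row per file) might look roughly like the sketch below. It is not the code in batch_extractor.py: batch_to_csv, the extract callable, and the source_file column are hypothetical, and only a subset of the extracted parameters is written.

```python
import csv
from pathlib import Path

# Subset of the extracted parameters, used as CSV columns in this sketch.
FIELDS = ["model_name", "power_rating", "Voc_V", "Isc_A", "Vmp_V", "Imp_A"]


def batch_to_csv(pdf_dir, out_csv, extract):
    """Run extract(path) on every PDF in pdf_dir and write the results to out_csv.

    extract is a hypothetical callable that returns a dict of parameters
    for one PDF. Missing keys become empty cells; unknown keys are ignored.
    Returns the number of PDFs processed.
    """
    pdf_paths = sorted(Path(pdf_dir).glob("*.pdf"))
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["source_file"] + FIELDS, extrasaction="ignore"
        )
        writer.writeheader()
        for path in pdf_paths:
            row = extract(path)
            writer.writerow({"source_file": path.name, **row})
    return len(pdf_paths)
```

Keeping the extractor as a parameter lets the same loop drive either the single-agent or the multi-agent pipeline.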

Extracted Parameters

The extractor captures the following data from PV module spec sheets:

Parameter                 Description
model_name                Module model name
power_rating              Rated power (W)
Voc_V                     Open circuit voltage (V)
Isc_A                     Short circuit current (A)
Vmp_V                     Voltage at max power (V)
Imp_A                     Current at max power (A)
power_tempco              Power temperature coefficient
Voc_tempco                Voc temperature coefficient
Isc_tempco                Isc temperature coefficient
module_length_mm          Module length (mm)
module_width_mm           Module width (mm)
weight_kg                 Module weight (kg)
NOCT_degC                 Nominal Operating Cell Temperature (°C)
number_cells_per_module   Cell count
cell_technology           Cell type (Mono, Poly, etc.)

Project Structure

spec_sheet_shinyapp/
├── app.py                          # Main Shiny application
├── llm_client.py                   # Unified LLM client (Anthropic/OpenAI)
├── pv_module_finder.py             # Web search and PDF download
├── pv_inverter_finder.py           # Inverter spec sheet finder
├── pv_module_extractor.py          # Single-agent data extraction
├── pv_module_extractor_multiagent.py  # Multi-agent extraction pipeline
├── batch_finder.py                 # Batch spec sheet download
├── batch_extractor.py              # Batch data extraction
├── requirements.txt                # Python dependencies
└── .env                            # API credentials (not in repo)

Multi-Agent Extraction

The multi-agent extraction pipeline uses specialized AI agents for improved accuracy:

  1. Document Analyzer: Identifies document structure and power classes
  2. Power Class Extractor: Extracts parameters for each power variant
  3. Data Validator: Validates data against physical constraints (Vmp < Voc, etc.)
  4. Error Corrector: Re-extracts values that fail validation
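
As an illustration of the Data Validator step, the sketch below checks one extracted record against a few physical constraints. Only Vmp < Voc is named in this README; the Imp < Isc check, the power consistency check, and its 5% tolerance are assumptions, and validate_module is a hypothetical name rather than a function from pv_module_extractor_multiagent.py.

```python
def validate_module(params, power_tolerance=0.05):
    """Return a list of constraint violations for one record (empty if none).

    Checks are skipped when a value is missing or not numeric, so an
    incomplete record is not reported as physically inconsistent.
    """
    def num(key):
        value = params.get(key)
        return value if isinstance(value, (int, float)) else None

    voc, vmp = num("Voc_V"), num("Vmp_V")
    isc, imp = num("Isc_A"), num("Imp_A")
    power = num("power_rating")

    problems = []
    if voc is not None and vmp is not None and not vmp < voc:
        problems.append("Vmp_V must be less than Voc_V")
    if isc is not None and imp is not None and not imp < isc:
        problems.append("Imp_A must be less than Isc_A")
    # Assumed check: rated power should roughly equal Vmp * Imp.
    if None not in (vmp, imp, power) and abs(vmp * imp - power) > power_tolerance * power:
        problems.append("power_rating is inconsistent with Vmp_V * Imp_A")
    return problems
```

A non-empty result would be what hands the record to the Error Corrector for re-extraction.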

Requirements

  • Python 3.10+
  • See requirements.txt for package dependencies

License

MIT
