A Shiny for Python web application that finds, downloads, and extracts data from photovoltaic (PV) module and inverter specification sheets using AI.
- Find Spec Sheets: Search the web for manufacturer spec sheets and download PDFs automatically
- Extract Data: Upload a PDF spec sheet and extract key electrical and physical parameters using AI
- Multi-Agent Extraction: Enhanced extraction pipeline with document analysis, validation, and error correction
- Batch Processing: Download multiple spec sheets from a manufacturer and extract data from multiple PDFs
- Dual LLM Support: Works with both Anthropic Claude and OpenAI-compatible APIs
# Clone the repository
git clone https://github.com/joshuasstein/spec_sheet_shinyapp.git
cd spec_sheet_shinyapp
# Install dependencies
pip install -r requirements.txt

Create a .env file in the project root with your API credentials:
# For Anthropic Claude (recommended)
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
# For OpenAI-compatible API (optional fallback)
OPENAI_API_KEY=your-key-here
OPENAI_BASE_URL=https://your-endpoint/v1

The application will try Anthropic first, then fall back to the OpenAI-compatible API if one is configured.
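The provider order described here can be illustrated with a minimal sketch. It only checks which credentials are set; the function name `choose_provider` is hypothetical, and the actual logic in llm_client.py may differ (for example, it may also fall back when an Anthropic request fails, or require OPENAI_BASE_URL).

```python
import os


def choose_provider(env=None):
    """Return 'anthropic', 'openai', or None depending on which keys are set.

    Hypothetical helper for illustration only; not taken from llm_client.py.
    Assumes an API key alone is enough to consider a provider configured.
    """
    env = os.environ if env is None else env
    # Anthropic is preferred whenever its key is present.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # Otherwise fall back to the OpenAI-compatible endpoint, if configured.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No credentials found.
    return None


if __name__ == "__main__":
    print(choose_provider())
```

Passing a plain dictionary instead of `os.environ` makes the selection easy to check without touching real credentials.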
Start the Shiny app:
shiny run app.py

Then open your browser to the URL shown (typically http://127.0.0.1:8000).
- Find Spec Sheet: Enter a manufacturer and model to search for and download a spec sheet PDF
- Extract Data: Upload a PDF to extract module parameters (power, voltage, current, temperature coefficients, dimensions, etc.)
- Batch Find: Download multiple spec sheets from a manufacturer at once
- Batch Extract: Process all PDFs in a folder and export results to CSV
The extractor captures the following data from PV module spec sheets:
| Parameter | Description |
|---|---|
| model_name | Module model name |
| power_rating | Rated power (W) |
| Voc_V | Open circuit voltage (V) |
| Isc_A | Short circuit current (A) |
| Vmp_V | Voltage at max power (V) |
| Imp_A | Current at max power (A) |
| power_tempco | Power temperature coefficient |
| Voc_tempco | Voc temperature coefficient |
| Isc_tempco | Isc temperature coefficient |
| module_length_mm | Module length (mm) |
| module_width_mm | Module width (mm) |
| weight_kg | Module weight (kg) |
| NOCT_degC | Nominal Operating Cell Temperature (°C) |
| number_cells_per_module | Cell count |
| cell_technology | Cell type (Mono, Poly, etc.) |
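As a rough illustration of how the parameters in the table could map to CSV columns in a batch export, here is a minimal sketch using only the standard library. The function `records_to_csv` and the sample values are hypothetical, and the column order is an assumption; batch_extractor.py may lay out its output differently.

```python
import csv
import io

# Column names copied from the parameter table; the order is an assumption.
FIELDS = [
    "model_name", "power_rating", "Voc_V", "Isc_A", "Vmp_V", "Imp_A",
    "power_tempco", "Voc_tempco", "Isc_tempco", "module_length_mm",
    "module_width_mm", "weight_kg", "NOCT_degC", "number_cells_per_module",
    "cell_technology",
]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text, one row per module.

    Parameters missing from a record become empty cells; keys that are
    not in FIELDS are silently dropped.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=FIELDS,
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up values for a fictional module, for illustration only.
    sample = {"model_name": "EXAMPLE-400", "power_rating": 400, "Voc_V": 49.5}
    print(records_to_csv([sample]))
```

Leaving missing parameters as empty cells keeps every row the same width, so partially extracted spec sheets still load cleanly in a spreadsheet.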
spec_sheet_shinyapp/
├── app.py # Main Shiny application
├── llm_client.py # Unified LLM client (Anthropic/OpenAI)
├── pv_module_finder.py # Web search and PDF download
├── pv_inverter_finder.py # Inverter spec sheet finder
├── pv_module_extractor.py # Single-agent data extraction
├── pv_module_extractor_multiagent.py # Multi-agent extraction pipeline
├── batch_finder.py # Batch spec sheet download
├── batch_extractor.py # Batch data extraction
├── requirements.txt # Python dependencies
└── .env # API credentials (not in repo)
The multi-agent extraction pipeline uses specialized AI agents for improved accuracy:
- Document Analyzer: Identifies document structure and power classes
- Power Class Extractor: Extracts parameters for each power variant
- Data Validator: Validates data against physical constraints (Vmp < Voc, etc.)
- Error Corrector: Re-extracts values that fail validation
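The kind of check the Data Validator performs can be sketched as follows. Only the Vmp < Voc constraint comes from the description above; the Imp < Isc check, the power consistency check, the 5% tolerance, and the function name `validate_module` are assumptions added for illustration, not the pipeline's actual rules.

```python
def validate_module(rec, tol=0.05):
    """Return a list of problems found in one extracted record.

    An empty list means every check that could be run has passed.
    Checks are skipped when the values they need are missing.
    """
    problems = []
    voc, vmp = rec.get("Voc_V"), rec.get("Vmp_V")
    isc, imp = rec.get("Isc_A"), rec.get("Imp_A")
    power = rec.get("power_rating")

    # From the README: voltage at max power must be below open circuit voltage.
    if voc is not None and vmp is not None and not vmp < voc:
        problems.append("Vmp_V must be less than Voc_V")

    # Assumed analogous check for current.
    if isc is not None and imp is not None and not imp < isc:
        problems.append("Imp_A must be less than Isc_A")

    # Assumed check: rated power should be close to Vmp * Imp.
    if power is not None and vmp is not None and imp is not None:
        if abs(vmp * imp - power) > tol * power:
            problems.append("power_rating is inconsistent with Vmp_V * Imp_A")

    return problems


if __name__ == "__main__":
    # Made-up values for a fictional 400 W module.
    record = {"Voc_V": 49.5, "Vmp_V": 41.0, "Isc_A": 10.3,
              "Imp_A": 9.76, "power_rating": 400}
    print(validate_module(record))
```

In a pipeline like the one described, a non-empty result would presumably be what triggers the Error Corrector to re-extract the affected fields.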
- Python 3.10+
- See requirements.txt for package dependencies
MIT