Replace Azure Blob in data_acquisition.py with Google Drive as the pipeline data source.
- Authenticate via the Google Drive API (service account JSON, stored as a secret)
- List files in the configured Drive folder (supported types: PDF, DOCX, TXT)
- Download them to a temp dir, then pass the local paths to ocr_extraction
- Add GOOGLE_DRIVE_FOLDER_ID, GOOGLE_SERVICE_ACCOUNT_JSON to .env.example
- Handle pagination for folders exceeding 100 files
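A minimal sketch of the steps above. It assumes the google-api-python-client and google-auth packages, Drive API v3, a read-only scope, and that GOOGLE_SERVICE_ACCOUNT_JSON holds the key file's JSON contents rather than a path. All function names (build_drive_service, build_query, list_supported_files, download_files, fetch_drive_documents) are hypothetical, and the MIME-type mapping is an assumption about how "PDF, DOCX, TXT" translates to Drive.

```python
import json
import os
from pathlib import Path

# Assumed mapping of the supported formats to Drive MIME types (hypothetical).
SUPPORTED_MIME_TYPES = {
    "application/pdf": ".pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "text/plain": ".txt",
}
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_drive_service():
    """Authenticate as the service account whose key JSON is in GOOGLE_SERVICE_ACCOUNT_JSON."""
    # Imported lazily so the pure helpers below work without the Google client installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    info = json.loads(os.environ["GOOGLE_SERVICE_ACCOUNT_JSON"])
    creds = service_account.Credentials.from_service_account_info(info, scopes=SCOPES)
    return build("drive", "v3", credentials=creds)


def build_query(folder_id):
    """Drive search query: non-trashed PDF/DOCX/TXT files directly inside folder_id."""
    mime_filter = " or ".join(f"mimeType = '{m}'" for m in SUPPORTED_MIME_TYPES)
    return f"'{folder_id}' in parents and trashed = false and ({mime_filter})"


def list_supported_files(service, folder_id, page_size=100):
    """Return every matching file, following nextPageToken until the last page."""
    files, page_token = [], None
    while True:
        response = service.files().list(
            q=build_query(folder_id),
            fields="nextPageToken, files(id, name, mimeType)",
            pageSize=page_size,
            pageToken=page_token,  # None on the first request
        ).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return files


def download_files(service, files, dest_dir):
    """Download each file into dest_dir and return the local paths."""
    from googleapiclient.http import MediaIoBaseDownload

    paths = []
    for f in files:
        # Name by Drive ID so duplicate display names in the folder cannot collide.
        path = Path(dest_dir) / f"{f['id']}{SUPPORTED_MIME_TYPES[f['mimeType']]}"
        request = service.files().get_media(fileId=f["id"])
        with open(path, "wb") as fh:
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while not done:
                _, done = downloader.next_chunk()
        paths.append(path)
    return paths


def fetch_drive_documents(dest_dir):
    """Entry point: list the configured folder and download its supported files."""
    service = build_drive_service()
    files = list_supported_files(service, os.environ["GOOGLE_DRIVE_FOLDER_ID"])
    return download_files(service, files, dest_dir)
```

A caller could run fetch_drive_documents inside tempfile.TemporaryDirectory() so the downloads are removed once ocr_extraction has consumed the returned paths (how ocr_extraction accepts them is not specified here). Native Google Docs are excluded by the MIME filter, and a folder on a shared drive would also need supportsAllDrives=True (plus includeItemsFromAllDrives=True on the list call).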