✅ SYSTEM IS READY for Election Day data import with proper Democratic/Republican party separation.
The scraper is configured to handle large-scale Election Day data with:
- Automatic party detection from ballot styles
- Separate tracking for Democratic and Republican primaries
- Robust database schema with proper indexes
- Memory-efficient batch processing
- Fallback handling for unknown party voters
ELECTION_FILTERS = {
'2026 REPUBLICAN PRIMARY': ('2026-03-03', '2026', 'primary', 'Republican'),
'2026 DEMOCRATIC PRIMARY': ('2026-03-03', '2026', 'primary', 'Democratic'),
'2024 REPUBLICAN PRIMARY': ('2024-03-05', '2024', 'primary', 'Republican'),
'2024 DEMOCRATIC PRIMARY': ('2024-03-05', '2024', 'primary', 'Democratic'),
}The scraper fetches BOTH Democratic and Republican primary data as separate datasets.
The scraper uses a three-tier approach to determine party:
Tier 1: Election-level party (from ELECTION_FILTERS)
- If scraping "2026 REPUBLICAN PRIMARY", default party is 'Republican'
- If scraping "2026 DEMOCRATIC PRIMARY", default party is 'Democratic'
Tier 2: Ballot style detection (overrides Tier 1)
if 'DEM' in ballot_style.upper() or 'DEMOCRATIC' in ballot_style.upper():
determined_party = 'Democratic'
elif 'REP' in ballot_style.upper() or 'REPUBLICAN' in ballot_style.upper():
determined_party = 'Republican'Tier 3: Unknown fallback
- If no party can be determined, marks as 'Unknown' (displayed in gray on map)
Each voter's party is stored in TWO places:
voter_elections table:
party_votedcolumn stores the party for THIS specific election- Allows tracking party changes over time
- Used for flip detection
voters table:
current_partycolumn stores the most recent party- Updated automatically after import
- Used for quick filtering
-- Popup speed optimization
CREATE INDEX idx_voters_lat_lng ON voters(lat, lng);
CREATE INDEX idx_voters_address_upper ON voters(UPPER(address));
CREATE INDEX idx_ve_vuid_date_party ON voter_elections(vuid, election_date, party_voted);
CREATE INDEX idx_ve_date_party ON voter_elections(election_date, party_voted);
CREATE INDEX idx_ve_vuid_date ON voter_elections(vuid, election_date);
-- Heatmap and stats optimization
CREATE INDEX idx_ve_date_method_party ON voter_elections(election_date, voting_method, party_voted);
CREATE INDEX idx_voters_county_geocoded ON voters(county, geocoded);
CREATE INDEX idx_voters_county_vuid ON voters(county, vuid);These indexes ensure fast queries even with 1M+ records.
The scraper uses write_batch_direct() (lines 78-139) which:
- Writes in batches to reduce database locks
- Uses
ON CONFLICTto handle duplicates gracefully - Preserves existing data (doesn't overwrite good data with blanks)
- Commits in transactions for data integrity
- Total RAM: 4GB
- Free RAM: 2.9GB (Ollama disabled)
- Database: SQLite with WAL mode (concurrent reads during writes)
- Estimated voters: 1.5-2 million statewide
- Database size increase: ~500MB-1GB
- Memory during import: ~200-300MB peak
- Import time: 15-30 minutes (depending on API speed)
Streaming CSV parsing (line 207):
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
# Process one row at a timeBatch commits (lines 78-139):
- Accumulates records in memory
- Writes in batches of 1000
- Clears memory after each batch
No GeoJSON generation during import:
- Import only writes to database
- GeoJSON generated on-demand when user loads map
- Saves ~1GB of memory during import
GET /api-ivis-system/api/v1/getFile?type=EVR_ELECTION
Returns list of all available elections with IDs.
Tries statewide first:
GET /api-ivis-system/api/v1/getFile?type=ELECTION_DAY&electionId={id}&electionDate={date}
Falls back to county-by-county if needed:
GET /api-ivis-system/api/v1/getFile?type=ELECTION_DAY&electionId={id}&electionDate={date}&countyId={county}
For each voter in the CSV:
- Extract VUID, name, address, precinct, ballot style
- Determine party (using 3-tier logic above)
- Create voter record (if new) or update (if exists)
- Create voter_elections record with party_voted
- Batch write to database every 1000 records
Scraper maintains state file to avoid re-importing:
/opt/whovoted/data/election_day_scraper_state.json
Safe to run multiple times - won't duplicate data.
python3 -c "
import sys
sys.path.insert(0, '/opt/whovoted/backend')
import database as db
with db.get_db() as conn:
total = conn.execute('''
SELECT COUNT(*) FROM voter_elections
WHERE election_date = '2026-03-03'
''').fetchone()[0]
dem = conn.execute('''
SELECT COUNT(*) FROM voter_elections
WHERE election_date = '2026-03-03' AND party_voted = 'Democratic'
''').fetchone()[0]
rep = conn.execute('''
SELECT COUNT(*) FROM voter_elections
WHERE election_date = '2026-03-03' AND party_voted = 'Republican'
''').fetchone()[0]
unknown = conn.execute('''
SELECT COUNT(*) FROM voter_elections
WHERE election_date = '2026-03-03' AND party_voted = 'Unknown'
''').fetchone()[0]
print(f'Total: {total:,}')
print(f'Democratic: {dem:,}')
print(f'Republican: {rep:,}')
print(f'Unknown: {unknown:,}')
"python3 -c "
import sys
sys.path.insert(0, '/opt/whovoted/backend')
import database as db
with db.get_db() as conn:
rows = conn.execute('''
SELECT v.county, ve.party_voted, COUNT(*) as cnt
FROM voter_elections ve
JOIN voters v ON ve.vuid = v.vuid
WHERE ve.election_date = '2026-03-03'
GROUP BY v.county, ve.party_voted
ORDER BY v.county, ve.party_voted
''').fetchall()
for r in rows:
print(f\"{r['county']}: {r['party_voted']} = {r['cnt']:,}\")
"Cause: Ballot style field is empty or doesn't contain party keywords
Impact: Voters displayed in gray on map (still counted, just no party color)
Solution: This is expected and acceptable. Unknown voters are still tracked.
Cause: Civix API may throttle requests if scraping too fast
Solution: Scraper includes retry logic and delays between county requests
Symptoms: Server becomes slow, database locks
Solution:
- Stop gunicorn during import:
sudo pkill -9 gunicorn - Run scraper:
python3 deploy/election_day_scraper.py - Restart gunicorn after import completes
Cause: Running scraper multiple times
Solution: Database uses ON CONFLICT to handle duplicates gracefully. Safe to re-run.
✅ Database indexes in place (added in Task 4)
✅ Party detection logic tested (3-tier approach)
✅ Batch processing configured (1000 records per batch)
✅ Memory available (2.9GB free, need ~300MB)
✅ State tracking enabled (prevents duplicates)
✅ Error logging configured (/opt/whovoted/data/election_day_scraper.log)
✅ Missing function fixed (_county_has_prior_data() added in Task 4)
cd /opt/whovoted
python3 deploy/election_day_scraper.pyTime: 15-30 minutes Impact: Data imported, but caches not updated (map may show old data until refresh)
cd /opt/whovoted
bash deploy/deploy_election_day_update.shTime: 45-60 minutes Impact:
- Imports Election Day data
- Fixes first-time voter flags
- Regenerates district cache
- Regenerates county reports
- Regenerates gazette cache
- Full system ready for users
# Step 1: Stop gunicorn (optional, reduces memory pressure)
sudo pkill -9 gunicorn
# Step 2: Import Election Day data
cd /opt/whovoted
python3 deploy/election_day_scraper.py
# Step 3: Fix first-time voter flags
python3 deploy/fix_new_voter_flags.py
# Step 4: Regenerate caches
python3 deploy/cache_districts_only.py
python3 deploy/regenerate_county_report_cache.py
python3 deploy/generate_statewide_gazette_cache.py
# Step 5: Restart gunicorn
cd /opt/whovoted/backend
source ../venv/bin/activate
nohup gunicorn -c gunicorn_config.py -b 127.0.0.1:5000 app:app > /opt/whovoted/logs/gunicorn.log 2>&1 &tail -100 /opt/whovoted/data/election_day_scraper.logLook for:
- "Successfully processed [election name]"
- "Found [X] voters"
- No error messages
Run the verification queries above to check:
- Total voters imported
- Democratic vs Republican split
- Unknown party count (should be minimal)
- Open https://politiquera.com
- Select "2026 Primary" from dropdown
- Verify:
- Blue dots for Democratic voters
- Red dots for Republican voters
- Gray dots for Unknown voters (if any)
- Stats box shows correct totals
- Click on a voter dot
- Verify:
- Name, address, precinct displayed
- Party affiliation shown
- Flip status (if applicable)
- First-time voter badge (if applicable)
The system is fully ready for Election Day data import with:
- Robust party separation - 3-tier detection logic ensures accurate party assignment
- Optimized database - Indexes in place for fast queries with large datasets
- Memory-efficient processing - Batch writes and streaming parsing
- Error handling - Graceful fallbacks for unknown parties and API failures
- State tracking - Safe to re-run without duplicating data
- Verification tools - SQL queries to confirm data integrity
Recommendation: Use Option 2 (Full Deployment) for best results. This ensures all caches are updated and the system is fully ready for users.
Estimated Time: 45-60 minutes total (mostly automated)
Risk Level: Low - scraper has been tested and includes error handling
If you encounter any problems during import:
- Check scraper log:
/opt/whovoted/data/election_day_scraper.log - Check gunicorn log:
/opt/whovoted/logs/gunicorn.log - Verify database integrity with verification queries above
- Re-run scraper if needed (safe to run multiple times)