A Node.js command-line tool that scrapes dealership websites for staff contact information and populates Google Sheets with contact details: names, titles, emails, phone numbers, and departments.

Features:
- Intelligent Page Discovery - Uses sitemaps, navigation analysis, and footer extraction
- Domain Resolution - Handles redirects and tests www/non-www variants
- Comprehensive Contact Extraction - Names, titles, emails, phone numbers, departments
- Live Google Sheets Integration - Real-time updates with Service Account authentication
- Smart Efficiency - Stops after finding comprehensive staff directories
- Enhanced CSV Export - Complete contact details with source page tracking
- Respectful Scraping - Rate limiting and robots.txt compliance
- Error Recovery - Robust handling of failed requests and timeouts

Use cases:
- Lead Generation - Find decision makers at dealerships
- Sales Prospecting - Get direct contact info for sales managers
- Market Research - Analyze dealership organizational structures
- CRM Building - Populate contact databases with verified information
- Business Development - Connect with automotive industry professionals

From a typical dealership website:

    Domain,Name,Title,Email,Phone,Department,Source Page,Date Scraped
    dealership.com,"Ryan Hardy","General Sales Manager",ryan.hardy@dealership.com,"(702) 873-8888","Sales","/dealership/staff.htm",2025-09-15
    dealership.com,"Alix Ventre","Internet Sales Director",alix.ventre@dealership.com,"(702) 416-4643","Sales","/dealership/staff.htm",2025-09-15
    dealership.com,"Britten Battisti","Service Advisor",britten.battisti@dealership.com,"","Service","/dealership/staff.htm",2025-09-15

Installation:

    git clone https://github.com/itallstartedwithaidea/dealer_email_scraper.git
    cd dealer_email_scraper
    npm install

Google Sheets credentials:
- Create a Service Account in the Google Cloud Console
- Download the JSON credentials as `google-credentials.json`
- Enable the Google Sheets API

Setup and usage:

    npm run setup

    # Test single domain
    npm run test-single

    # Full production run
    npm start

Input sheet:

    Domains
    www.dealership1.com
    www.dealership2.com

Output sheet:

| Domain | Count | Contact Details | Date |
|---|---|---|---|
| www.dealership1.com | 5 | Ryan Hardy (General Sales Manager): ryan.hardy@dealership.com \| (555) 123-4567 | 2025-09-15 |
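
As a rough illustration of how one output row could be assembled, the sketch below builds the four columns shown in the table. The helper names `formatContact` and `buildSheetRow` are hypothetical, the `Name (Title): email | phone` layout is inferred from the example row, and joining several contacts with newlines is an assumption rather than the tool's confirmed behavior.

```javascript
// Hypothetical helper: builds the "Contact Details" text for one contact.
// The "Name (Title): email | phone" layout is inferred from the example row.
function formatContact(contact) {
  const label = contact.title
    ? `${contact.name} (${contact.title})`
    : contact.name;
  // Skip empty fields so a missing phone does not leave a dangling separator.
  const channels = [contact.email, contact.phone].filter(Boolean).join(' | ');
  return channels ? `${label}: ${channels}` : label;
}

// Hypothetical helper: returns [Domain, Count, Contact Details, Date].
// Joining contacts with newlines is an assumption for this sketch.
function buildSheetRow(domain, contacts, date) {
  const details = contacts.map(formatContact).join('\n');
  return [domain, contacts.length, details, date];
}

const row = buildSheetRow(
  'www.dealership1.com',
  [
    {
      name: 'Ryan Hardy',
      title: 'General Sales Manager',
      email: 'ryan.hardy@dealership.com',
      phone: '(555) 123-4567',
    },
  ],
  '2025-09-15'
);
console.log(row);
```

Filtering out empty fields keeps contacts without a phone number, such as the third CSV sample row, readable in the sheet.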

Target pages:
- Staff Pages: `/staff`, `/team`, `/our-people`, `/dealership/staff.htm`
- Contact Pages: `/contact`, `/contact-us`, `/get-in-touch`
- About Pages: `/about-us/staff`, `/company/team`
- Department Pages: `/sales-team`, `/service-staff`
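
A minimal sketch of how a URL might be classified against the paths listed above. `PAGE_PATTERNS` and `classifyPage` are hypothetical names, and exact path matching is a simplifying assumption; the scraper itself may match more loosely.

```javascript
// Page categories and example paths copied from the list above.
const PAGE_PATTERNS = {
  staff: ['/staff', '/team', '/our-people', '/dealership/staff.htm'],
  contact: ['/contact', '/contact-us', '/get-in-touch'],
  about: ['/about-us/staff', '/company/team'],
  department: ['/sales-team', '/service-staff'],
};

// Hypothetical helper: returns the first category whose example path equals
// the URL path (case-insensitive, trailing slashes ignored), or null.
function classifyPage(url) {
  const path = new URL(url).pathname.toLowerCase().replace(/\/+$/, '') || '/';
  for (const [category, paths] of Object.entries(PAGE_PATTERNS)) {
    if (paths.includes(path)) return category;
  }
  return null;
}

console.log(classifyPage('https://www.dealership1.com/dealership/staff.htm'));
console.log(classifyPage('https://www.dealership1.com/new-inventory'));
```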

Discovery methods:
- Sitemap Analysis - Parses XML/TXT sitemaps
- Navigation Extraction - Menu and submenu links
- Footer Analysis - Contact links from footers
- Pattern Matching - Filters relevant pages only
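
The sitemap and pattern-matching steps above could be sketched roughly as follows. `extractSitemapUrls`, `filterRelevant`, and the keyword list are assumptions made for illustration. The XML branch uses a plain regular expression and does not decode entities or handle CDATA, so treat it as a starting point rather than a full sitemap parser.

```javascript
// Hypothetical helper: pulls URLs out of a sitemap body.
// XML sitemaps list URLs in <loc> elements; TXT sitemaps list one URL per line.
function extractSitemapUrls(body) {
  const xmlUrls = [...body.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)].map(
    (match) => match[1]
  );
  if (xmlUrls.length > 0) return xmlUrls;
  return body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^https?:\/\//i.test(line));
}

// Illustrative keyword filter; the keyword list is an assumption.
const RELEVANT_PATH = /staff|team|people|contact|about/i;

// Keeps only URLs whose path looks like a staff, contact, or about page.
function filterRelevant(urls) {
  return urls.filter((url) => RELEVANT_PATH.test(new URL(url).pathname));
}

const sampleXml =
  '<urlset><url><loc>https://www.dealership1.com/staff</loc></url>' +
  '<url><loc>https://www.dealership1.com/new-inventory</loc></url></urlset>';
console.log(filterRelevant(extractSitemapUrls(sampleXml)));
```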

Configuration:

    # Environment variables (.env)
    GOOGLE_SPREADSHEET_ID=your_spreadsheet_id
    SHEET_NAME=dealers
    MAX_PAGES_PER_DOMAIN=100
    DELAY_MS=3000

Performance:
- Success Rate: ~33% of domains have staff contacts
- Processing Speed: 3-5 minutes per domain
- Smart Efficiency: Stops after finding comprehensive staff pages
- Rate Limiting: Respectful 3-second delays
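
One way the environment variables and the fixed delay could fit together is sketched below. `loadConfig`, `sleep`, and `crawlPolitely` are hypothetical names, and using the example values (`dealers`, `100`, `3000`) as fallbacks is an assumption. The sketch expects the variables to already be present in `process.env`; reading a `.env` file would need a loader such as dotenv.

```javascript
// Hypothetical loader: reads the documented variables and falls back to the
// example values when a variable is missing or not a non-negative integer.
function loadConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
  };
  return {
    spreadsheetId: env.GOOGLE_SPREADSHEET_ID || '',
    sheetName: env.SHEET_NAME || 'dealers',
    maxPagesPerDomain: toInt(env.MAX_PAGES_PER_DOMAIN, 100),
    delayMs: toInt(env.DELAY_MS, 3000),
  };
}

// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchPage for each URL in order, pausing between requests and
// stopping once the per-domain page cap is reached.
async function crawlPolitely(urls, fetchPage, config) {
  const results = [];
  for (const url of urls.slice(0, config.maxPagesPerDomain)) {
    if (results.length > 0) await sleep(config.delayMs);
    results.push(await fetchPage(url));
  }
  return results;
}

console.log(loadConfig({ DELAY_MS: '500' }));
```

Passing `fetchPage` in as a parameter keeps the delay logic independent of whichever HTTP client the scraper uses.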

Common commands:

    # Kill running processes
    pkill -f node

    # Test single domain
    npm run test-single

    # Full production run
    npm start

    # Monitor progress
    tail -f scraper-*.log

    # Count completed
    grep "✅.*Updated Google Sheets" scraper-*.log | wc -l

Responsible use:
- Public Information Only - Scrapes publicly available contacts
- Rate Limiting - Respects website resources
- Robots.txt Compliance - Follows scraping guidelines
- Terms of Service - Users responsible for compliance
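
For readers curious what a robots.txt check involves, here is a simplified sketch. `parseRobots` and `isAllowed` are hypothetical names and this is not the scraper's actual implementation. It merges the `*` group with any group naming the given user agent, uses longest-prefix matching, and ignores wildcard (`*`, `$`) patterns, so it covers only the basic cases.

```javascript
// Hypothetical helper: collects Allow/Disallow rules from groups that apply
// to the given user agent. Groups for "*" always apply.
function parseRobots(robotsTxt, userAgent = '*') {
  const rules = [];
  let applies = false;
  let inAgentLines = false; // true while reading consecutive User-agent lines
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim();
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      const matches =
        value === '*' ||
        (value !== '' && userAgent.toLowerCase().includes(value.toLowerCase()));
      // Several User-agent lines in a row belong to the same group.
      applies = inAgentLines ? applies || matches : matches;
      inAgentLines = true;
    } else {
      inAgentLines = false;
      if (applies && (field === 'allow' || field === 'disallow') && value) {
        rules.push({ allow: field === 'allow', path: value });
      }
    }
  }
  return rules;
}

// Longest matching prefix wins; ties go to Allow; no matching rule means allowed.
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieForAllow =
      best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieForAllow) best = rule;
  }
  return best ? best.allow : true;
}

const sampleRobots = [
  'User-agent: *',
  'Disallow: /admin',
  'Allow: /admin/public',
  '',
  'User-agent: badbot',
  'Disallow: /',
].join('\n');
const sampleRules = parseRobots(sampleRobots);
console.log(isAllowed(sampleRules, '/staff'), isAllowed(sampleRules, '/admin/users'));
```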

MIT License - See LICENSE for details.

See CONTRIBUTING.md for development guidelines.

Support:
- Issues: Open a GitHub issue with setup details
- Documentation: Check the included guides
- Testing: Run `npm test` for diagnostics

Built for automotive industry professionals