Skip to content

itallstartedwithaidea/dealer_email_scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸš€ Dealership Email Scraper

A powerful Node.js command-line tool that intelligently scrapes dealership websites for staff contact information and automatically populates Google Sheets with comprehensive contact details including names, titles, phone numbers, and departments.

GitHub Node.js Google Sheets

✨ Features

  • πŸ” Intelligent Page Discovery - Uses sitemaps, navigation analysis, and footer extraction
  • 🌐 Domain Resolution - Handles redirects and tests www/non-www variants
  • πŸ“§ Comprehensive Contact Extraction - Names, titles, emails, phone numbers, departments
  • πŸ“Š Live Google Sheets Integration - Real-time updates with Service Account authentication
  • ⚑ Smart Efficiency - Stops after finding comprehensive staff directories
  • πŸ“ Enhanced CSV Export - Complete contact details with source page tracking
  • πŸ›‘οΈ Respectful Scraping - Rate limiting and robots.txt compliance
  • πŸ”„ Error Recovery - Robust handling of failed requests and timeouts

🎯 Perfect For

  • Lead Generation - Find decision makers at dealerships
  • Sales Prospecting - Get direct contact info for sales managers
  • Market Research - Analyze dealership organizational structures
  • CRM Building - Populate contact databases with verified information
  • Business Development - Connect with automotive industry professionals

πŸ“Š Example Results

From a typical dealership website:

Domain,Name,Title,Email,Phone,Department,Source Page,Date Scraped
dealership.com,"Ryan Hardy","General Sales Manager",ryan.hardy@dealership.com,"(702) 873-8888","Sales","/dealership/staff.htm",2025-09-15
dealership.com,"Alix Ventre","Internet Sales Director",alix.ventre@dealership.com,"(702) 416-4643","Sales","/dealership/staff.htm",2025-09-15
dealership.com,"Britten Battisti","Service Advisor",britten.battisti@dealership.com,"","Service","/dealership/staff.htm",2025-09-15

πŸ”§ Quick Setup

1. Install

git clone https://github.com/itallstartedwithaidea/dealer_email_scraper.git
cd dealership-email-scraper
npm install

2. Google Cloud Setup

  1. Create Service Account in Google Cloud Console
  2. Download JSON credentials as google-credentials.json
  3. Enable Google Sheets API

3. Configure

npm run setup

4. Run

# Test single domain
npm run test-single

# Full production run
npm start

πŸ“‹ Input/Output

Input (Google Sheets Column A)

Domains
www.dealership1.com
www.dealership2.com

Output (Google Sheets Columns B, C, D)

Domain Count Contact Details Date
www.dealership1.com 5 Ryan Hardy (General Sales Manager): ryan.hardy@dealership.com | (555) 123-4567 2025-09-15

πŸ” Intelligent Discovery

Page Types Found

  • Staff Pages: /staff, /team, /our-people, /dealership/staff.htm
  • Contact Pages: /contact, /contact-us, /get-in-touch
  • About Pages: /about-us/staff, /company/team
  • Department Pages: /sales-team, /service-staff

Discovery Methods

  1. Sitemap Analysis - Parses XML/TXT sitemaps
  2. Navigation Extraction - Menu and submenu links
  3. Footer Analysis - Contact links from footers
  4. Pattern Matching - Filters relevant pages only

βš™οΈ Configuration

# Environment variables (.env)
GOOGLE_SPREADSHEET_ID=your_spreadsheet_id
SHEET_NAME=dealers
MAX_PAGES_PER_DOMAIN=100
DELAY_MS=3000

πŸ“Š Performance

  • Success Rate: ~33% of domains have staff contacts
  • Processing Speed: 3-5 minutes per domain
  • Smart Efficiency: Stops after finding comprehensive staff pages
  • Rate Limiting: Respectful 3-second delays

πŸ› οΈ Commands

# Kill running processes
pkill -f node

# Test single domain
npm run test-single

# Full production run
npm start

# Monitor progress
tail -f scraper-*.log

# Count completed
grep "βœ….*Updated Google Sheets" scraper-*.log | wc -l

πŸ”’ Legal & Ethics

  • Public Information Only - Scrapes publicly available contacts
  • Rate Limiting - Respects website resources
  • Robots.txt Compliance - Follows scraping guidelines
  • Terms of Service - Users responsible for compliance

πŸ“„ License

MIT License - See LICENSE for details.

🀝 Contributing

See CONTRIBUTING.md for development guidelines.

πŸ†˜ Support

  • Issues: Open GitHub issue with setup details
  • Documentation: Check included guides
  • Testing: Run npm test for diagnostics

Built for automotive industry professionals πŸš€

About

A powerful Node.js command-line tool that intelligently scrapes dealership websites for staff contact information and automatically populates Google Sheets with comprehensive contact details including names, titles, phone numbers, and departments.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors