Automate external asset discovery, intelligently enrich with multi-source data, and leverage ensemble machine learning to predict and prioritize risk. Transform alert fatigue into actionable intelligence.
ThreatSentry AI eliminates security alert fatigue and enables proactive threat hunting through intelligent risk prioritization. It combines automated asset discovery, multi-source data enrichment, and advanced machine learning to transform raw security data into actionable intelligence.
Project Lead: EclipseManic
Development Note: This comprehensive enterprise security platform was architected and developed by a single developer with assistance from AI development tools for code generation, optimization, and documentation, demonstrating the viability of AI-augmented software engineering for complex systems.
Modern security teams face unprecedented challenges:
- Alert Fatigue: Thousands of daily alerts with a signal-to-noise ratio too low for manual triage
- Fragmented Data: Critical context scattered across SIEM, CMDB, patch systems, and network monitoring
- Reactive Posture: Responding to known threats rather than hunting for emerging risks
- Resource Constraints: Limited budgets and personnel in increasingly complex infrastructure
ThreatSentry AI solves these problems through:
- Automated External Asset Discovery via Shodan API for continuous visibility
- Intelligent Multi-Source Enrichment from NVD, internal systems, and behavioral analytics
- ML-Powered Risk Scoring using ensemble models for accurate threat prioritization
- Executive-Ready Dashboards for actionable intelligence and rapid response
- Proactive Alerting that surfaces high-risk assets before incidents occur
- Ensemble Machine Learning Model: Combines Random Forest, Gradient Boosting, and Neural Networks for robust risk classification
- Multi-Factor Analysis: Evaluates 40+ security and business attributes beyond simple vulnerability counts:
  - Temporal Context: Exposure duration, patch lag, incident history
  - Network Position: Critical service status, network segment, firewall protection
  - Behavioral Signals: Authentication failures, traffic anomalies, false positive history
  - Compliance Impact: Data sensitivity, regulatory requirements, connected critical assets
- Confidence Scoring: Each prediction includes a confidence metric (0-1) for analysts to gauge reliability
- Continuous Learning: Model retrains automatically on configurable intervals with feedback integration
- Shodan Integration: Continuous discovery of internet-facing devices with advanced query support
  - Preset queries for common scenarios (SSL certificates, RDP services, ICS/Modbus, etc.)
  - Custom query support for organization-specific asset hunting
- NVD Enrichment: Automatic CVE correlation using intelligent banner parsing
  - Banner service extraction supporting 14+ product types (Apache, Nginx, MySQL, IIS, etc.)
  - CVSS scoring and severity classification
  - Prevents data loss: Only updates CVE metrics when vulnerabilities are found
- Internal Data Integration (Extensible architecture for your environment):
  - CMDB Collector: Asset classification, criticality levels, compliance tags
  - SIEM Collector: Behavioral metrics, authentication patterns, anomaly scores
  - Patch Management Collector: Patch currency, missing updates, patch lag analysis
  - Network Monitoring Collector: Traffic patterns, DDoS detection, anomaly scores
- Unified Database: SQLite with 40+ indexed columns for efficient querying and reporting
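The banner parsing mentioned under NVD Enrichment can be illustrated with a small sketch. This is not the project's parser: the `KNOWN_PRODUCTS` tuple, the regular expression, and the `parse_banner` name are assumptions chosen for illustration, and the real keyword list lives in the project's NVD collector.

```python
import re
from typing import Optional, Tuple

# Hypothetical keyword list; the project's own list covers 14+ product types.
KNOWN_PRODUCTS = ("apache", "nginx", "mysql", "iis")

# Matches "<name>/<version>" or "<name> <version>", e.g. "nginx/1.18.0".
_BANNER_RE = re.compile(r"([A-Za-z][A-Za-z\-]*)[/ ]v?(\d+(?:\.\d+)+)")


def parse_banner(banner: str) -> Optional[Tuple[str, str]]:
    """Return (product, version) for the first known product in a banner."""
    for match in _BANNER_RE.finditer(banner):
        name = match.group(1).lower()
        for known in KNOWN_PRODUCTS:
            if known in name:
                return known, match.group(2)
    return None  # nothing recognizable: the caller should leave CVE fields untouched
```

Returning `None` rather than an empty result lets the caller skip the CVE update entirely, which is one way to honor the "only update CVE metrics when vulnerabilities are found" rule.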
- PyQt5-Based GUI with dark/light theme support:
  - Real-time risk distribution visualization (bar charts with Matplotlib)
  - Sortable, filterable device table with color-coded risk indicators
  - Quick-filter by Organization, Country, Risk Level
  - Auto-search across IP, Org, Country, Risk fields
- Interactive Device Inspection:
  - Detailed vulnerability list with CVSS scores
  - Vulnerability timeline and historical tracking
  - Risk factor breakdown explaining the scoring
- Bulk Operations:
  - Manual scan triggers (Shodan + NVD enrichment)
  - CSV/JSON data import with field validation
  - Model retraining with performance metrics
  - Manual alert sending to validate configurations
- Analytics Panel (Advanced):
  - Risk distribution trends
  - CVE impact analysis
  - Organization-wise vulnerability metrics
- SendGrid Integration: Automatic HTML email notifications for high-risk assets
  - Triggered immediately upon risk label change (Low/Medium → High)
  - Prevents duplicate alerts with "notified" status tracking
- Rich Alert Content:
  - Executive summary with risk score and confidence
  - Detailed vulnerability list (top N by CVSS)
  - Risk factor breakdown for security team context
  - Actionable remediation recommendations
- Flexible Configuration: Define alert recipients, email templates, and trigger conditions
![Automated Security Alert Email Template]
HTML formatted email with risk summary, CVE details, and remediation guidance sent via SendGrid
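The alert trigger described above (fire on a Low/Medium to High transition, suppress duplicates with a "notified" flag) can be sketched as follows. The `DeviceState` class and both function names are hypothetical, and the sketch assumes the flag stays set until it is reset explicitly; the project's actual logic may differ.

```python
from dataclasses import dataclass

HIGH = 2  # risk labels follow the scheme used in this README: 0=Low, 1=Medium, 2=High


@dataclass
class DeviceState:
    """Hypothetical stand-in for the fields the alerting step needs."""
    ip: str
    risk_label: int
    notified: bool = False


def should_alert(device: DeviceState, new_label: int) -> bool:
    """Alert only when the label moves into High and no alert was sent yet."""
    became_high = device.risk_label != HIGH and new_label == HIGH
    return became_high and not device.notified


def apply_prediction(device: DeviceState, new_label: int) -> bool:
    """Store the new label, mark the device as notified if an alert is due."""
    alert = should_alert(device, new_label)
    device.risk_label = new_label
    if alert:
        device.notified = True  # assumption: cleared only by an explicit reset
    return alert
```

Keeping the decision in a pure function makes it easy to unit test without sending any email.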
- Performance Monitoring:
  - Accuracy tracking across training epochs
  - Confusion matrix and classification reports
  - Feature importance analysis to understand model decisions
  - Data drift detection indicators
- Feedback Loop:
  - Manual risk label corrections by analysts
  - True positive/false positive tracking
  - Model weight adjustments based on feedback
  - Automated retraining schedule with metadata logging
- Optimized for Scale:
  - Database indexing on 5+ columns for sub-millisecond queries
  - Result caching for frequently accessed data (28.7x speedup)
  - Pagination with "Load More" for large datasets (50 rows initially, then 50-row increments)
  - Non-blocking UI with worker threads (QThread, ThreadPoolExecutor)
- Robust Error Handling:
  - Exponential backoff retry logic for API failures (Shodan, NVD, SendGrid)
  - Graceful degradation when optional services are unavailable
  - Comprehensive logging with file rotation
  - Thread-safe signal/slot architecture to prevent race conditions
- Memory Efficient:
  - Garbage collection after chart renders
  - Lazy-loaded UI tabs to reduce startup time
  - Session context managers to ensure proper cleanup
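As an illustration of the exponential backoff listed under Robust Error Handling, here is a minimal, generic sketch. The function name, default delays, and jitter amount are assumptions, not the project's values; the `sleep` parameter exists only so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # A little random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

In practice the `except` clause would be narrowed to the transient errors of the API being called, so that configuration mistakes such as an invalid key fail immediately.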
![ThreatSentry AI Dashboard - Main Threat Hunting Interface]
Real-time risk visualization with sortable device table and risk distribution bar chart
![Device Table with Risk Indicators]
Sortable and filterable device listing with color-coded risk levels (Green=Low, Yellow=Medium, Red=High)
![Advanced Filters - CVSS Range, Organization, Country, Risk Level]
Powerful filtering controls with CVSS range, organization, country, and risk level filters + "Clear All" button
- CVSS Range Filter: Adjust minimum and maximum CVSS scores (0-10) to focus on specific severity levels
- Organization Filter: Quick filter by specific organization from a dropdown of all organizations in your database
- Country Filter: Filter by geographical location to identify regional risks
- Risk Level Filter: Display only High, Medium, Low, or All devices for focused analysis
- Clear All Button: One-click reset of all filters to default values to see full dataset again
- Apply Filters: Instantly apply all selected filters with results reflected in real-time
![Search Input - Multi-Column Search Across Entire Database]
Intelligent search that spans entire device database, not just currently displayed rows
- Comprehensive Search: Search across IP addresses, organization names, countries, and risk levels
- Auto-IP Detection: Automatically detects IP address format and searches accordingly
- Full Database Coverage: Search results include ALL devices in database, not limited to first 50 rows
- Real-Time Results: Debounced 300ms delay for responsive search without performance impact
- Pagination Integration: Search results respect pagination system for efficient loading
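The Auto-IP Detection behavior can be approximated with the standard library. This is a sketch under the assumption that a query is treated as an IP search only when it parses as a complete IPv4 or IPv6 address; `classify_query` is a hypothetical name, and the 300 ms debounce itself (a GUI timer concern) is not shown.

```python
import ipaddress


def classify_query(text: str) -> str:
    """Return "ip" when the query parses as an IP address, otherwise "text"."""
    candidate = text.strip()
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return "text"
    return "ip"
```

A "text" result would then fall through to the organization, country, and risk-level columns.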
![Load More Button - Pagination Controls]
Optimized pagination system for handling thousands of devices efficiently
- Default 50-Row Display: Dashboard loads with initial 50 devices for fast rendering
- Progressive Loading: "Load More (50 rows)" button allows incremental loading without timeout
- Memory Efficient: Only displays requested rows; never loads entire dataset into memory
- Performance Optimized: Non-blocking UI prevents freezing when loading large datasets
- Status Indicator: Shows "Showing X of Y devices" for transparency on total available data
- Search Integration: Search filters work seamlessly with pagination for fast results
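The pagination behavior above maps naturally onto `LIMIT`/`OFFSET` queries. The sketch below uses the standard `sqlite3` module rather than the project's SQLAlchemy layer, and the `devices` table name, the column subset, and both function names are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple

PAGE_SIZE = 50  # matches the 50-row initial load and 50-row increments


def fetch_page(
    conn: sqlite3.Connection, offset: int, search: str = ""
) -> Tuple[List[tuple], int]:
    """Return one page of matching devices plus the total match count."""
    pattern = f"%{search}%"
    rows = conn.execute(
        "SELECT ip, org, risk_label FROM devices "
        "WHERE ip LIKE ? OR org LIKE ? "
        "ORDER BY ip LIMIT ? OFFSET ?",
        (pattern, pattern, PAGE_SIZE, offset),
    ).fetchall()
    total = conn.execute(
        "SELECT COUNT(*) FROM devices WHERE ip LIKE ? OR org LIKE ?",
        (pattern, pattern),
    ).fetchone()[0]
    return rows, total


def status_text(shown: int, total: int) -> str:
    """Build the "Showing X of Y devices" indicator."""
    return f"Showing {min(shown, total)} of {total} devices"
```

Because the search filter runs in SQL, results cover the whole table while only one page of rows is ever held in memory. A "Load More" click would call `fetch_page` again with the offset advanced by `PAGE_SIZE`.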
![Analytics Tab - Risk Trends and CVE Analysis]
Historical risk trends, vulnerability analysis, and organization-wise security metrics
![Tools Tab - Model Status and Data Export]
Model training information, performance metrics, data export, and advanced filtering options
![Automated Security Alert Email]
HTML formatted email with executive summary, vulnerability details, and remediation guidance
![Scan Trigger Dialog]
Execute Shodan + NVD enrichment with detailed error reporting and progress indication
| Layer | Technologies |
|---|---|
| Backend | Python 3.9+, SQLAlchemy ORM, APScheduler |
| ML/AI | Scikit-learn (Random Forest, Gradient Boosting, MLP), NumPy, Joblib |
| Frontend | PyQt5, Matplotlib, Custom theme manager |
| Data | Pandas, NumPy, SQLite3 |
| APIs | Shodan, NVDLib, SendGrid, Requests |
| Utilities | Python-dotenv, Logging module, Config management |

    ┌──────────────────────────────────────────────────────────────────┐
    │                         ThreatSentry AI                          │
    ├──────────────────────────────────────────────────────────────────┤
    │                     Scheduler (APScheduler)                      │
    └─────┬──────────┬──────────┬──────────┬──────────┬──────────┬─────┘
          │          │          │          │          │          │
          ▼          ▼          ▼          ▼          ▼          ▼
       Shodan     NVD Lib     CMDB       SIEM       Patch     Network
      Collector  Collector  Collector  Collector    Mgmt      Monitor
          │          │          │          │          │          │
          └──────────┴──────────┴────┬─────┴──────────┴──────────┘
                                     │
                                     ▼
                            ┌─────────────────┐
                            │    SQLite DB    │
                            │  (40+ Columns)  │
                            └────────┬────────┘
                                     │
                  ┌──────────────────┼──────────────────┐
                  ▼                  ▼                  ▼
           Feature Engine     Model Training       Predictions
                  │                  │                  │
                  └──────────────────┼──────────────────┘
                                     ▼
                          ┌─────────────────────┐
                          │   Ensemble Model    │
                          │   (RF + GB + MLP)   │
                          └──────────┬──────────┘
                                     │
                      ┌──────────────┼──────────────┐
                      ▼              ▼              ▼
                  PyQt5 GUI    Email Alerts     Analytics

- Discovery Phase (configurable interval, default 30 min)
  - Shodan scan with configurable queries
  - NVD enrichment with CVE correlation
  - Internal system enrichment for context
- Analysis Phase
  - Feature engineering from 40+ attributes
  - Ensemble model prediction (Random Forest 40% + Gradient Boosting 40% + MLP 20%)
  - Risk label generation (0=Low, 1=Medium, 2=High)
  - Confidence scoring
- Alerting Phase
  - Check for new high-risk assets
  - Generate and send email alerts via SendGrid
  - Update notification status
- Model Retraining (configurable interval, default 60 min)
  - Load all historical data
  - Extract features
  - Train ensemble with balanced class weights
  - Validate performance metrics
  - Save metadata for auditing
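The 40/40/20 weighting in the Analysis Phase amounts to weighted soft voting over the three models' class probabilities. The sketch below shows that arithmetic in plain Python; the dictionary keys and the `combine` function are hypothetical, and taking the winning blended probability as the confidence score is an assumption about how the 0-1 metric is derived.

```python
from typing import Dict, Sequence, Tuple

# Weights as stated above: Random Forest 40%, Gradient Boosting 40%, MLP 20%.
WEIGHTS = {"random_forest": 0.4, "gradient_boosting": 0.4, "mlp": 0.2}
LABELS = {0: "Low", 1: "Medium", 2: "High"}


def combine(probas: Dict[str, Sequence[float]]) -> Tuple[int, float]:
    """Blend per-model class probabilities and return (label, confidence)."""
    n_classes = len(LABELS)
    blended = [0.0] * n_classes
    for name, weight in WEIGHTS.items():
        for cls in range(n_classes):
            blended[cls] += weight * probas[name][cls]
    label = max(range(n_classes), key=blended.__getitem__)
    # Assumption: confidence is the blended probability of the winning class.
    return label, blended[label]


label, confidence = combine({
    "random_forest": [0.1, 0.2, 0.7],
    "gradient_boosting": [0.2, 0.3, 0.5],
    "mlp": [0.6, 0.3, 0.1],
})
print(LABELS[label], round(confidence, 2))
```

In this example the MLP disagrees with the two tree models, so the result is still High but with a confidence of about 0.5. With scikit-learn, `VotingClassifier(voting="soft", weights=[0.4, 0.4, 0.2])` performs the same kind of blending over fitted estimators.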
- Python: 3.9 or later
- API Keys (required):
- Shodan API key (https://www.shodan.io/)
- SendGrid API key (https://sendgrid.com/) - for email alerts
- Optional API Keys (for enhanced enrichment):
- CMDB endpoint and credentials
- SIEM endpoint and credentials
- Patch management system endpoint and credentials
- Network monitoring endpoint and credentials
git clone https://github.com/EclipseManic/ThreatSentry-AI.git
cd ThreatSentry-AI

# On Windows
python -m venv .venv
.venv\Scripts\activate

# On macOS/Linux
python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Create a `.env` file in the project root:

cp .env.example .env # If provided, or create new

Edit `.env` with your credentials:

# Required - Threat Discovery
SHODAN_API_KEY=your_shodan_api_key_here
SHODAN_QUERY= # Leave empty to use presets
SHODAN_QUERY_EMPTY_TO_PRESET=True # Use preset queries when SHODAN_QUERY is empty
# Optional - Email Alerts
SENDGRID_API_KEY=your_sendgrid_key_here
SENDER_EMAIL=alerts@yourcompany.com # Must be verified in SendGrid
ALERT_RECIPIENTS=security@yourcompany.com,ciso@yourcompany.com
# Optional - Internal Enrichment (Implement in collectors/)
CMDB_API_ENDPOINT=https://cmdb.internal/api
CMDB_API_KEY=your_cmdb_key
SIEM_API_ENDPOINT=https://siem.internal/api
SIEM_API_KEY=your_siem_key
# Configuration
SCAN_INTERVAL_MINUTES=30 # How often to scan for new assets
RETRAIN_INTERVAL_MINUTES=60 # How often to retrain the model
MAX_SHODAN_RESULTS=50 # Results per Shodan query
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
# Paths (Optional)
SQLITE_PATH=data/db/threat_sentric_ai.db
MODEL_PATH=data/models/rf_model.pkl
LOG_FILE_PATH=data/logs/threat_sentric_ai.log

python -c "from data import init_db; init_db()"

python run.py

The dashboard will launch with the scheduler running in the background.
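For orientation, the settings above could be read in Python along these lines. This is a sketch, not the project's `config.py`: the helper names and the returned dictionary are hypothetical, while the variable names and defaults are taken from the `.env` listing.

```python
import os
from typing import Any, Dict


def _env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy spellings such as "True", "1" or "yes"."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")


def _env_int(name: str, default: int) -> int:
    """Read an integer setting, falling back to the default when unset or empty."""
    raw = os.getenv(name, "").strip()
    return int(raw) if raw else default


def load_settings() -> Dict[str, Any]:
    """Collect the settings used by the scan and alert steps."""
    recipients = os.getenv("ALERT_RECIPIENTS", "")
    return {
        "shodan_api_key": os.getenv("SHODAN_API_KEY", ""),
        "shodan_query": os.getenv("SHODAN_QUERY", "").strip(),
        "use_presets": _env_bool("SHODAN_QUERY_EMPTY_TO_PRESET", True),
        "scan_interval_minutes": _env_int("SCAN_INTERVAL_MINUTES", 30),
        "retrain_interval_minutes": _env_int("RETRAIN_INTERVAL_MINUTES", 60),
        "max_shodan_results": _env_int("MAX_SHODAN_RESULTS", 50),
        "alert_recipients": [r.strip() for r in recipients.split(",") if r.strip()],
        "log_level": os.getenv("LOG_LEVEL", "INFO").upper(),
    }
```

The sketch assumes the `.env` file has already been loaded into the process environment, for example by python-dotenv's `load_dotenv()`.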
![Dashboard Main Interface with All Tabs] Overview, Analytics, and Tools tabs for comprehensive threat assessment
![Organization and Country Filters with Device List] Sortable device table with color-coded risk indicators (Green=Low, Yellow=Medium, Red=High)
- Color Coding: Instant visual risk assessment
- Sortable/Filterable: Click column headers or use Quick Filter for rapid searching
- Real-time Updates: Dashboard refreshes as new threats detected
![Risk Distribution Bar Chart] Overall security posture visualization with device counts per risk level
- Shows count of devices across all risk categories
- Updates in real-time as model predictions change
- Identifies security hotspots requiring immediate attention
- Scan Now: Manually trigger Shodan + NVD enrichment with detailed error reporting
- Refresh: Update dashboard from latest database state
- Upload Data: Bulk import CSV/JSON with device information
- Train Model: Manually retrain ensemble model with current data
![Analytics Panel - Risk Trends and Metrics]
- Detailed risk trends over time with historical analysis
- Top vulnerable services identification
- Organization-wise risk metrics and comparisons
- CVE impact analysis and vulnerability trending
![Tools Panel - Model Status and Export]
- Model status and training information
SHODAN_QUERIES = {
"default": "product:apache",
"web_apps": "http.title:\"login\" org:\"Your Company\"",
"database": "port:27017 OR port:3306",
"iot": "device:camera OR device:printer",
"rdp": "port:3389",
"vpn": "port:500 OR port:1194"
}

Implement in the `collectors/` directory:
- Copy the template from an existing collector
- Update the `_collect()` method with your API calls
- Return enrichment data
- Register in the scheduler (`core/scheduler.py`)
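The collector steps above might look like the following in outline. Only the `_collect()` method name comes from this README; the `BaseCollector` class, the `enrich()` wrapper, and the in-memory inventory are hypothetical stand-ins, so copy the structure from an existing collector in the repository rather than from this sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseCollector(ABC):
    """Hypothetical base class; the project's real template may differ."""

    name = "base"

    def enrich(self, ip: str) -> Dict[str, Any]:
        """Return enrichment fields, or an empty dict if the source fails."""
        try:
            return self._collect(ip)
        except Exception:
            return {}  # degrade gracefully when an optional source is down

    @abstractmethod
    def _collect(self, ip: str) -> Dict[str, Any]:
        """Query the internal system and map its answer to device fields."""


class StaticCmdbCollector(BaseCollector):
    """Example collector backed by a dict instead of a real CMDB API."""

    name = "cmdb"

    def __init__(self, inventory: Dict[str, Dict[str, Any]]) -> None:
        self._inventory = inventory

    def _collect(self, ip: str) -> Dict[str, Any]:
        record = self._inventory.get(ip)
        if record is None:
            return {}
        return {
            "network_segment": record["segment"],
            "is_critical_service": record["critical"],
            "data_sensitivity_level": record["sensitivity"],
        }
```

A real implementation would replace the dictionary lookup with calls to the CMDB endpoint configured in `.env`.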
Edit alerts/email_alerts.py:
- Modify HTML template for branded emails
- Add custom risk factor descriptions
- Adjust remediation recommendations
Core Identifiers: ip, org, country, first_seen, last_seen
Vulnerability: cve_count, max_cvss, vulnerabilities (rel)
Security Metrics: auth_failures_24h, traffic_anomaly_score, patch_lag_days
Risk Assessment: risk_label, risk_score, confidence_score
Context: network_segment, service_category, is_critical_service
Compliance: compliance_requirements, data_sensitivity_level
Historical: incident_history_count, last_compromise_date, false_positive_count
Alerting: notified, alert_history
Linked to Device: device_id (FK)
CVE Info: cve_id, cvss, summary
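The two field groups above describe a device table and a vulnerability table linked by `device_id`. The sketch below expresses a small subset of those fields with the standard `sqlite3` module; the project itself uses SQLAlchemy, and the table names, index names, and `top_vulnerabilities` helper are assumptions.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS devices (
    id INTEGER PRIMARY KEY,
    ip TEXT NOT NULL UNIQUE,
    org TEXT,
    country TEXT,
    cve_count INTEGER DEFAULT 0,
    max_cvss REAL DEFAULT 0.0,
    risk_label INTEGER,
    confidence_score REAL,
    notified INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS vulnerabilities (
    id INTEGER PRIMARY KEY,
    device_id INTEGER NOT NULL REFERENCES devices(id),
    cve_id TEXT NOT NULL,
    cvss REAL,
    summary TEXT
);
CREATE INDEX IF NOT EXISTS idx_devices_risk ON devices(risk_label);
CREATE INDEX IF NOT EXISTS idx_vulns_device ON vulnerabilities(device_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables and their indexes if they do not exist yet."""
    conn.executescript(SCHEMA)


def top_vulnerabilities(
    conn: sqlite3.Connection, ip: str, limit: int = 5
) -> List[Tuple[str, float]]:
    """Return a device's CVEs ordered by CVSS score, highest first."""
    return conn.execute(
        "SELECT v.cve_id, v.cvss FROM vulnerabilities v "
        "JOIN devices d ON d.id = v.device_id "
        "WHERE d.ip = ? ORDER BY v.cvss DESC LIMIT ?",
        (ip, limit),
    ).fetchall()
```

The "top N by CVSS" list used in alert emails corresponds to a query of this shape.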
- Open dashboard → Review Risk=High devices (red)
- Click device → View detailed CVE list
- Note organization and infrastructure type
- Cross-reference with SIEM for recent suspicious activities
- Prioritize remediation based on criticality and patch lag
- Received alert about new high-risk device
- Dashboard shows vulnerability details and risk factors
- Check Asset Management tab → See if device is known
- Send manual alert to on-call SOC team
- After remediation, dashboard auto-updates when Shodan reflects changes
- Go to Analytics tab
- Export risk distribution and trend charts
- Identify organizational risk hotspots
- Generate remediation roadmap
- Track progress over time with periodic re-exports
- Solution: Verify the Shodan API key in the `.env` file. The dashboard will show an error but continue processing.
- Note: NVD enrichment won't run (prevents data loss on API failures)
- Solution: Check banner extraction. Edit the `collectors/nvd_collector.py` keyword list.
- Prevention: Use manual enrichment via CSV upload to set CVE data
- Solution: Retrain with more labeled data. Provide feedback on misclassified devices
- Prevention: Use the feedback system to improve training data quality
- Solution: Verify the SendGrid API key and confirm that the sender email is verified in SendGrid
- Check: The `ENABLE_EMAIL_ALERTS` environment variable is set to True
- Solution: Ensure indexes are created (`init_db()` does this)
- Check: No duplicate database connections (UI refresh only, not constant writing)
- API Key Management: Use environment variables; never commit `.env` to Git
- Database Security: SQLite is suitable for single-user use; migrate to PostgreSQL for multi-user
- Network Security: Run on trusted network; implement network segmentation if exposing API
- Data Privacy: Configure log rotation to limit disk space. Implement data retention policies
- Audit Logging: All model decisions logged with feature values for auditability
| Issue | Solution |
|---|---|
| Slow dashboard load | Increase pagination size in config |
| High CPU during training | Reduce n_estimators in model/advanced_model.py |
| High memory usage | Enable logging cleanup, reduce chart resolution |
| Slow Shodan scans | Reduce MAX_SHODAN_RESULTS, use more specific queries |
| Slow NVD enrichment | Implement API caching, reduce product keyword extraction |
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Areas for enhancement:
- Additional collector implementations (Nessus, Tenable, Qualys integration)
- Web API for programmatic access
- Multi-user support with role-based access
- Advanced visualization (Grafana/ELK integration)
- Kubernetes deployment support
This project is licensed under the MIT License; see the LICENSE file for details.
- Issues: GitHub Issues for bug reports and feature requests
- Documentation: See docs/ directory for detailed technical documentation
- Email: Maintainer information in CODE_OF_CONDUCT.md
- Shodan: For comprehensive internet-facing device discovery
- NVD/NIST: For authoritative vulnerability data
- Scikit-learn: Robust ML libraries and documentation
- PyQt5: Excellent GUI framework
- Community: All contributors and users providing feedback
Made with ❤️ by EclipseManic | Securing Tomorrow's Infrastructure Today
---
## Configuration
### 1. Environment Variables (`.env` file)
Create a `.env` file in the root directory with the following structure:
```bash
# --- Shodan API Key (Required) ---
SHODAN_API_KEY="YOUR_SHODAN_API_KEY"
# --- SendGrid Email Alerts (Required) ---
SENDGRID_API_KEY="YOUR_SENDGRID_API_KEY"
SENDER_EMAIL="your_verified_sender@example.com"
ALERT_RECIPIENTS="recipient1@example.com,recipient2@example.com"
# --- Database ---
SQLITE_PATH="threat_sentric_ai.db"
# --- Model ---
MODEL_PATH="threatsentry_model.pkl"
# --- Scheduler ---
SCAN_INTERVAL_MINUTES="30"
RETRAIN_INTERVAL_MINUTES="60"
RETRAIN_ON_SCHEDULE="True"
# --- Shodan Query Behavior Control ---
SHODAN_QUERY=""
SHODAN_QUERY_EMPTY_TO_PRESET="True"
# --- Logging ---
LOG_LEVEL="INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
# --- Internal System Credentials (Optional - Add as needed for your collectors) ---
# CMDB_API_ENDPOINT="..."
# CMDB_API_KEY="..."
# SIEM_API_ENDPOINT="..."
# SIEM_API_KEY="..."
# PATCH_API_ENDPOINT="..."
# PATCH_API_KEY="..."
# NETWORK_MONITOR_ENDPOINT="..."
# NETWORK_MONITOR_KEY="..."
```

The app loads environment variables via `os.getenv()` in `config.py`.
Never commit this file to version control.

Modify the SHODAN_QUERIES dictionary to define your custom query presets:
SHODAN_QUERIES = {
"default": "product:apache",
"org": 'org:"Your Company Name"',
"net": 'net:"123.45.67.0/24"',
"ssl": 'ssl:"yourcompany.com"',
"hostname": 'hostname:".yourcompany.com"',
"rdp": 'port:3389 "remote desktop"',
"mongodb": 'port:27017 "mongodb"',
"ics_modbus": 'port:502 "modbus"',
"vuln_example": 'vuln:CVE-2024-12345',
"http_login": 'http.title:"Login" org:"Your Company"'
}

If `SHODAN_QUERY` in `.env` is empty and `SHODAN_QUERY_EMPTY_TO_PRESET=True`, the scheduler will automatically cycle through these presets.

⚠️ Important: The internal collectors (`cmdb_collector.py`, `siem_collector.py`, `patch_collector.py`, `network_monitor_collector.py`) are placeholders.
Replace the placeholder logic with real integrations to your systems.
You'll need to:
- Fetch internal data using APIs, databases, or SDKs.
- Map fetched data to the `Device` model attributes.
- Update the database session with this enriched information.

Without these integrations, the model will lack context for accurate predictions.

python scripts/generate_realistic_training_data.py --count 1000

Creates `scripts/my_training_data.json`.
You can upload this file through the GUI's Upload option to initialize training data.

python run.py

The GUI will open and the scheduler will start scanning, enriching, and predicting automatically.

- Complete installation (steps 1-3).
- Set up your `.env` file (even with placeholder keys).
- Optionally adjust Shodan presets in `config.py`.
- Generate data: `python scripts/generate_realistic_training_data.py --count 500`
- Launch the app: `python run.py`
- In the GUI:
  - Select Upload → Choose `scripts/my_training_data.json`
  - Click Refresh to view populated device data and risk levels.

| Script | Description |
|---|---|
| run.py | Main entry point. Starts DB, scheduler, and GUI. |
| scheduler.py | Handles periodic scanning, enrichment, prediction, and retraining. |
| scripts/generate_realistic_training_data.py | Generates realistic training data for testing or bootstrapping. |
| scripts/clear_db_enhanced.py | Interactively clean database or reset notification flags. |
| scripts/reset_db.py | Completely resets the database. Use with caution. |
# Generate 1000 records
python scripts/generate_realistic_training_data.py --count 1000
# View cleanup options
python scripts/clear_db_enhanced.py --help
# Delete all devices & vulnerabilities
python scripts/clear_db_enhanced.py --delete-devices --delete-vulns
# Reset notified flag
python scripts/clear_db_enhanced.py --reset-notified
# Dangerous full reset
python scripts/reset_db.py

This project was developed by EclipseManic.
While code contributions are currently closed, your feedback and bug reports are highly appreciated.
Please open an Issue to share your thoughts or report a problem.
This project is licensed under the MIT License.
See the LICENSE file for details.