---
layout: default
title: Architecture
nav_order: 2
---
Version: 1.2 Last Updated: 2026-03-26 Status: Production
- Executive Summary
- System Architecture
- Application Structure
- Core Components
- Data Flow
- Technology Stack
- Design Patterns
- Security Architecture
- Deployment Architecture
- Extension Points
- Probe Direction (SPEC-003)
- Neighborhood Scan (SPEC-004)
- Diagnostics Bundle (SPEC-005)
kshark is a command-line diagnostic tool for Apache Kafka connectivity, designed as a "network sniffer" for Kafka infrastructure. It performs comprehensive health checks across all layers of the client-to-broker communication path.
- Modular Architecture - Core logic in `main.go`; probe engine and Connect API in `internal/` packages
- Layered Testing Approach - Systematic validation from L3 (DNS) to L7 (Kafka/MongoDB/PostgreSQL/DB2)
- Report-Centric Architecture - All checks append results to a central Report structure
- Connector Probe Engine - Reads connector configs from Kafka Connect REST API or local JSON files
- External AI Integration - Optional AI-powered analysis via REST APIs with layered reasoning
- Multi-Platform Support - Cross-compiled for Linux, macOS, and Windows
- Probe Direction Control - Configurable bottom-up (fail-fast) or full-scan mode with parallel diagnostics
- Neighborhood Scan - Port-level and protocol-level network restriction detection with automatic classification
- Diagnostics Bundle - Packaged .tar.gz with redacted Terraform state, configs, and system context
| Metric | Value |
|---|---|
| Total Lines of Code | ~10,400 lines (Go) across 21 source files + 23 test files |
| Test Cases | 478 unit tests + 4 fuzz targets |
| Programming Language | Go 1.23.2 |
| Binary Size | ~24MB (statically linked, pure Go) |
| Packages | cmd/kshark (14 files), internal/probe (5 files), internal/connectapi (4 files) |
| Test Coverage | cmd/kshark 41.5%, internal/connectapi 73.6%, internal/probe 52.3%, total 47.8% |
| Dependencies | kafka-go, mongo-driver, pgx (all pure Go, no CGO) |
| License | Apache License 2.0 |
┌─────────────────────────────────────────────────────────────────┐
│ kshark CLI │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Configuration Loading │ │
│ │ • Properties Parser • AI Config • License Checker │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Layered Connectivity Checks │ │
│ │ │ │
│ │ L3 (Network) → DNS Resolution │ │
│ │ L4 (Transport) → TCP Connection │ │
│ │ L5-6 (Security) → TLS Handshake & Certificate Validation│ │
│ │ L7 (Application)→ Kafka Protocol, Schema Registry │ │
│ │ Diagnostics → Traceroute, MTU Check │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Report Generation │ │
│ │ • Console Output (Pretty Print) │ │
│ │ • HTML Report (with Template) │ │
│ │ • JSON Export (Premium) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ AI Analysis (Optional, Premium) │ │
│ │ • OpenAI / Scalytics-Connect Integration │ │
│ │ • Root Cause Analysis │ │
│ │ • Actionable Recommendations │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────┐
│ External Systems │
├──────────────────────┤
│ • Kafka Cluster │
│ • Schema Registry │
│ • REST Proxy │
│ • AI Provider APIs │
└──────────────────────┘
User → CLI Flags → Config Parser → scanConfig → runScan(ctx) → Report Builder → Output
↓ ↑
AI Analyzer SIGINT/SIGTERM
(Optional) cancels context
kshark-core/
├── cmd/kshark/ # CLI application (14 focused source files)
│ ├── main.go # Entry point, CLI flags, scan orchestration
│ ├── ai.go # AI client, prompt building, analysis
│ ├── auth.go # SASL authentication (PLAIN, SCRAM, JAAS fallback)
│ ├── bundle.go # Diagnostics bundle, Terraform redaction, export commands
│ ├── connector.go # Connector probe orchestration
│ ├── diagnostics.go # Traceroute, MTU checks, PMTU correlation
│ ├── httpcheck.go # Schema Registry, REST Proxy HTTP checks
│ ├── kafka.go # Kafka dialer, metadata, produce/consume probes
│ ├── neighborhood.go # Port neighborhood scan, restriction classification
│ ├── properties.go # Properties file loading, presets
│ ├── report.go # Report model, JSON/HTML output, summarize
│ ├── ssrf.go # SSRF two-tier protection (deny/warn model)
│ ├── tls.go # TLS config, certificate validation
│ ├── util.go # Shared helpers (logging, file I/O, redaction)
│ └── *_test.go # 14 test files including 3 fuzz targets (auth, ai, kafka, ssrf, tls, ...)
├── internal/
│ ├── probe/ # Database probing engine
│ │ ├── types.go # ProbeTarget, ProbeStep, Prober interface
│ │ ├── common.go # ProbeDNS, ProbeTCP, ProbeTLS helpers
│ │ ├── common_test.go # Probe helper tests
│ │ ├── mongodb.go # MongoDB prober (mongo-driver)
│ │ ├── mongodb_test.go # MongoDB prober tests
│ │ ├── postgres.go # PostgreSQL prober (pgx)
│ │ ├── postgres_test.go # PostgreSQL prober tests
│ │ ├── db2.go # DB2 DRDA wire protocol prober
│ │ └── db2_test.go # DRDA message construction tests
│ └── connectapi/ # Kafka Connect integration
│ ├── client.go # Connect REST API client (SSRF-protected)
│ ├── client_test.go # Connect API client tests
│ ├── config_parser.go # Connector type detection + extraction
│ ├── config_parser_test.go
│ ├── jdbc_url.go # JDBC URL parser (DB2, PostgreSQL)
│ ├── jdbc_url_test.go
│ ├── jdbc_url_fuzz_test.go # JDBC URL fuzz target
│ ├── redact.go # Credential redaction
│ └── redact_test.go
├── testbed/ # Docker integration testbed
│ ├── docker-compose.yml # 6 services (Kafka, Connect, MongoDB, PG, DB2, kshark)
│ ├── configs/ # Connector config examples
│ ├── init/ # Database initialization scripts
│ └── run-tests.sh # 10 automated integration tests
├── web/templates/ # HTML report template
├── docs/ # Jekyll documentation site
├── Dockerfile # Multi-stage Alpine container
├── go.mod # Go module (kafka-go, mongo-driver, pgx)
└── README.md # Project documentation
The application follows a multi-file modular architecture within the cmd/kshark package, split from the original monolithic main.go into 14 focused source files:
| File | Responsibility |
|---|---|
| main.go (~714 lines) | Entry point, CLI flag parsing, scanConfig struct, runScan() orchestration, checkRESTProxy(), signal handling |
| ai.go | AI client, prompt construction, analysis dispatch, HTML report |
| auth.go | SASL mechanism setup (PLAIN, SCRAM-SHA-256/512), JAAS fallback extraction |
| connector.go | Connector probe orchestration, Connect API + local config fallback |
| bundle.go | Diagnostics bundle creation, Terraform state redaction, system context, export commands |
| diagnostics.go | Traceroute, MTU check, PMTU correlation, hostname validation |
| httpcheck.go | Schema Registry and REST Proxy HTTP health checks |
| kafka.go | Kafka dialer construction, metadata fetch, produce/consume probes |
| properties.go | Properties file parser with os.ExpandEnv(), presets, warnInsecurePermissions() |
| neighborhood.go | Port neighborhood scan, ICMP comparison, restriction classification, summary |
| report.go | Report data model (thread-safe), JSON/HTML output, summarize, pretty-print |
| ssrf.go | Two-tier SSRF protection: DENY loopback/link-local/metadata, WARN RFC1918 |
| tls.go | TLS config builder, certificate chain validation, expiry checks |
| util.go | Shared helpers: slog init, file I/O, SHA256 checksums, redaction |
Internal packages provide reusable logic:
| Package | Responsibility |
|---|---|
| internal/probe | Database prober interface + implementations (MongoDB, PostgreSQL, DB2 DRDA) |
| internal/connectapi | Kafka Connect REST client, connector config parser, JDBC URL parser, credential redaction |
Purpose: Load and parse configuration from multiple sources
Files:
- `client.properties` - Kafka connection parameters (Java properties format)
- `ai_config.json` - AI provider configuration (JSON)
- `license.key` - Premium feature activation (JSON)
Key Functions:
- `loadProperties()` (cmd/kshark/properties.go) - Parse Java-style properties files with `os.ExpandEnv()` for `${VAR}` expansion
- `loadAIConfig()` (cmd/kshark/ai.go) - Load AI configuration
- `applyPreset()` (cmd/kshark/properties.go) - Apply quick configuration presets
Configuration Flow:
CLI Flags → Preset (optional) → Properties File → Default Values → Configuration Map
Supported Presets:
- `confluent-cloud` - Confluent Cloud SASL/SSL defaults
- `bitnami` - Bitnami Kafka stack defaults
- `aws-msk` - AWS MSK IAM authentication defaults
- `plaintext` - Local development (no security)
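The configuration flow above can be sketched as a layered merge. This is an illustrative helper, not kshark's actual API; the exact precedence order (defaults overlaid by preset, then properties file, then CLI flags) is an assumption based on the flow diagram:

```go
package main

import "fmt"

// mergeConfig sketches the documented flow: later layers override
// earlier ones. Hypothetical helper, not kshark's real function.
func mergeConfig(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers { // later layers win
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"security.protocol": "PLAINTEXT", "request.timeout": "60s"}
	preset := map[string]string{"security.protocol": "SASL_SSL", "sasl.mechanism": "PLAIN"}
	props := map[string]string{"sasl.mechanism": "SCRAM-SHA-512"}
	flags := map[string]string{"request.timeout": "30s"}

	cfg := mergeConfig(defaults, preset, props, flags)
	fmt.Println(cfg["security.protocol"], cfg["sasl.mechanism"], cfg["request.timeout"])
}
```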
Purpose: Systematically test all layers of Kafka connectivity
Function: checkDNS() (cmd/kshark/httpcheck.go)
Checks:
- DNS resolution of broker hostnames
- Multiple A/AAAA records detection
- Resolution latency
Success Criteria: Hostname resolves to at least one IP address
Function: checkTCP() (cmd/kshark/httpcheck.go)
Checks:
- TCP connection establishment
- Connection latency measurement
- Timeout handling (default: 60s)
Success Criteria: Successful TCP handshake
Functions (cmd/kshark/tls.go):
- `tlsConfigFromProps()` - Build TLS configuration
- `wrapTLS()` - Perform TLS handshake
- `peerCN()` - Extract peer certificate CN
- `earliestExpiry()` - Find earliest certificate expiry
Checks:
- TLS handshake success
- TLS version verification (minimum TLS 1.2)
- Server certificate validation
- Certificate chain validation
- Certificate expiry (warns if <30 days)
- Server name (CN) extraction
Security Features:
- Enforces TLS 1.2 minimum
- Supports custom CA certificates
- Client certificate authentication
- Certificate expiry monitoring
Functions (cmd/kshark/kafka.go):
- `dialerFromProps()` - Create Kafka dialer with SASL/TLS
- `kafkaConn()` - Establish Kafka connection
- `checkTopic()` - Verify topic visibility via metadata
- `probeProduceConsume()` - End-to-end data flow test
Checks:
- Kafka protocol handshake
- SASL authentication (PLAIN, SCRAM-SHA-256, SCRAM-SHA-512)
- Broker metadata retrieval
- Topic visibility (if specified)
- Produce/consume round-trip (if topic specified)
Authentication Support:
- SASL/PLAIN
- SASL/SCRAM-SHA-256
- SASL/SCRAM-SHA-512
- SASL/GSSAPI (Kerberos) - requires build tag
- Mutual TLS (mTLS)
Functions (cmd/kshark/httpcheck.go, cmd/kshark/main.go):
- `checkSchemaRegistry()` - Test Schema Registry
- `checkRESTProxy()` - Test REST Proxy (extracted to testable function in `main.go`)
- Schema Registry subject listing
Checks:
- HTTP/HTTPS connectivity
- Basic authentication
- Subject enumeration (/subjects endpoint)
- Response latency
Functions (cmd/kshark/diagnostics.go):
- `bestEffortTraceroute()` - Network path tracing
- `mtuCheck()` - Maximum Transmission Unit discovery
- `isValidHostname()` - Hostname sanitization
Checks:
- Network path visualization (traceroute/tracepath/tracert)
- MTU discovery via ping with Don't Fragment bit
- Maximum of 100 lines output per diagnostic
Security: Hostname validation prevents command injection
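A sketch of that guard: external diagnostics arguments are accepted only if they look like a plain hostname or IP. The exact pattern kshark uses may differ; this regex is an illustrative assumption:

```go
package main

import (
	"fmt"
	"regexp"
)

// validHostname accepts alphanumerics, dots, and hyphens only, so
// shell metacharacters can never reach traceroute/ping invocations.
var validHostname = regexp.MustCompile(`^[a-zA-Z0-9]([a-zA-Z0-9.-]{0,252}[a-zA-Z0-9])?$`)

func main() {
	fmt.Println(validHostname.MatchString("broker-1.example.com")) // well-formed → true
	fmt.Println(validHostname.MatchString("host; rm -rf /"))       // injection attempt → false
}
```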
Purpose: Collect, aggregate, and present diagnostic results
Report Structure (cmd/kshark/report.go):
```go
type Report struct {
    Timestamp  string              // Scan timestamp
    Host       string              // Target hostname
    Layers     map[Layer][]Row     // Results grouped by layer
    ConfigEcho map[string]string   // Redacted configuration
    AIAnalysis *AIAnalysisResponse // Optional AI analysis
}
```

Row Structure (cmd/kshark/report.go):
```go
type Row struct {
    Component string      // e.g., "dns", "tcp", "tls"
    Target    string      // Target host/service
    Layer     Layer       // L3, L4, L5-6, L7, DIAG
    Status    CheckStatus // OK, WARN, FAIL, SKIP
    Detail    string      // Technical details
    Hint      string      // Actionable guidance
}
```

1. Console Output (Pretty Print)
- Function: `printPretty()` (cmd/kshark/report.go)
- Features:
  - Color-coded status (green=OK, yellow=WARN, red=FAIL, gray=SKIP)
  - Grouped by layer
  - Summary statistics
  - TTY detection for color support
2. HTML Report
- Function: `writeHTMLReport()` (cmd/kshark/ai.go)
- Template: `web/templates/report_template.html`
- Features:
  - Responsive design
  - Embedded CSS
  - AI analysis section
  - Detailed results table
  - Timestamp and configuration echo
3. JSON Export (Premium)
- Function: `writeJSON()` (cmd/kshark/report.go)
- Features:
  - Machine-readable format
  - Includes full report structure
  - Redacted credentials
  - AI analysis (if available)
Function: summarize() (cmd/kshark/report.go)
Metrics per Layer:
- Total checks
- OK count
- WARN count
- FAIL count
- SKIP count
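The per-layer tally is a simple counting pass. A minimal sketch with simplified types (the real summarize() operates on Row values, not bare strings):

```go
package main

import "fmt"

// summarizeLayer counts rows by status for one layer.
// Status strings mirror the documented values.
func summarizeLayer(statuses []string) map[string]int {
	counts := map[string]int{"total": len(statuses)}
	for _, s := range statuses {
		counts[s]++
	}
	return counts
}

func main() {
	c := summarizeLayer([]string{"OK", "OK", "WARN", "FAIL", "SKIP"})
	fmt.Println(c["total"], c["OK"], c["WARN"], c["FAIL"], c["SKIP"]) // → 5 2 1 1 1
}
```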
Purpose: Provide intelligent root cause analysis and recommendations
Report → AI Client → API Provider → AI Analysis → Enhanced Report
AIClient Structure (cmd/kshark/ai.go):
```go
type AIClient struct {
    config     *AIProviderConfig
    client     *http.Client
    maxRetries int
}
```

Supported Providers:
- OpenAI - GPT-4 models
  - Endpoint: `https://api.openai.com/v1/chat/completions`
  - Model: `gpt-4` (configurable)
- Scalytics-Connect - Custom endpoint
  - Endpoint: Configurable
  - Model: Configurable
1. Report Submission
   - Function: `AnalyzeReport()` (cmd/kshark/ai.go)
   - Serializes report to JSON
   - Constructs analysis prompt
   - Sends to AI provider via REST API
2. Analysis Prompt (cmd/kshark/ai.go)

   ```
   You are a Kafka diagnostics expert. Analyze this connectivity report...
   Focus on:
   1. Failures or warnings (status FAIL/WARN)
   2. Identify problematic OSI layer
   3. Provide root cause analysis
   4. Offer actionable fix suggestions
   ```
3. Response Processing
   - Parses AI provider response
   - Extracts analysis text
   - Handles errors gracefully
   - Falls back to basic report if AI unavailable
4. Output Integration
   - Function: `printIllustrativeAnalysis()` (cmd/kshark/ai.go)
   - Displays analysis with visual separators
   - Markdown formatting support
- API keys stored in `ai_config.json` (environment variables recommended)
- Credentials redacted before sending to AI
- 60-second timeout per request
- HTTPS required for API communication
See: docs/SECURITY.md for comprehensive security analysis
- Input Validation
  - Hostname sanitization (regex validation)
  - Path traversal prevention
  - Command injection protection
- SSRF Protection (cmd/kshark/ssrf.go)
  - Two-tier model: DENY loopback/link-local/metadata IPs, WARN RFC1918
  - Redirect-based SSRF bypass prevention (`CheckRedirect` handler)
  - URL scheme validation (http/https only)
  - Response body size limits (1MB) via `io.LimitReader`
- Credential Protection
  - Redaction function for sensitive fields (passwords, secrets, tokens, bearer, JAAS)
  - Credential scrubbing in database probe error messages (`ScrubCredentials`)
  - Connect API auth via environment variables (`KSHARK_CONNECT_AUTH`, `KSHARK_CONNECT_TOKEN`)
  - Functions in cmd/kshark/util.go and internal/connectapi/redact.go
- Structured Logging (`log/slog`)
  - Configurable format: `--log-format text|json`
  - Security events logged (authentication attempts, API calls, SSRF blocks)
  - Scan log written to file with SHA256 checksum
- TLS Enforcement
  - Minimum TLS 1.2
  - Certificate validation and chain checking
  - Custom CA support
  - Certificate expiry monitoring (<30 days warning)
- Secure Defaults
  - Non-root Docker user
  - Timeouts on all network operations
  - Fail-safe error handling
  - SHA256 checksums for report artifacts

Known limitations:

- Credential Storage: Plain-text configuration files (mitigated with env var support and file permissions)
- Memory Zeroability: Credentials held as `string` rather than `[]byte` (low priority)
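The two-tier SSRF model (DENY loopback/link-local/metadata, WARN RFC1918) can be sketched with net.ParseCIDR. The CIDR lists here are abbreviated; the real implementation in cmd/kshark/ssrf.go carries a longer deny list:

```go
package main

import (
	"fmt"
	"net"
)

// classifyIP returns DENY, WARN, or ALLOW for a candidate target IP.
// Illustrative sketch; list abbreviated relative to ssrf.go.
func classifyIP(ipStr string) string {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return "DENY" // unparseable input is rejected outright
	}
	deny := []string{"127.0.0.0/8", "::1/128", "169.254.0.0/16"}     // loopback, link-local/metadata
	warn := []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"} // RFC1918
	in := func(cidrs []string) bool {
		for _, c := range cidrs {
			_, network, _ := net.ParseCIDR(c)
			if network.Contains(ip) {
				return true
			}
		}
		return false
	}
	switch {
	case in(deny):
		return "DENY"
	case in(warn):
		return "WARN"
	default:
		return "ALLOW"
	}
}

func main() {
	fmt.Println(classifyIP("127.0.0.1"), classifyIP("10.1.2.3"), classifyIP("93.184.216.34")) // → DENY WARN ALLOW
}
```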
┌─────────────────┐
│ User Input │
│ CLI Flags + │
│ Config Files │
└────────┬────────┘
↓
┌─────────────────┐ ┌──────────────────┐
│ Initialization │ │ Signal Handler │
│ • Load config │ │ SIGINT/SIGTERM │
│ • ExpandEnv │ │ → cancel(ctx) │
│ • Warn perms │ └────────┬─────────┘
│ • Build scanCfg │ │ (cancels
└────────┬────────┘ │ context)
↓ ↓
┌─────────────────┐
│ Scan Plan │
│ • Display plan │
│ • User confirm │
└────────┬────────┘
↓
┌──────────────────────────────────────────┐
│ runScan(ctx, report, cfg) │
│ (all checks guarded by ctx.Done()) │
│ │
│ ┌─────────────────┐ │
│ │ L3: DNS Check │ │
│ │ Resolve hostname│ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ L4: TCP Check │ │
│ │ Connect to port │ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ L5-6: TLS Check │ │
│ │ Handshake + Cert│ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ L7: Kafka Check │ │
│ │ • Metadata │ │
│ │ • Topic check │ │
│ │ • Produce/Consume│ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ L7: HTTP Checks │ │
│ │ • Schema Reg │ │
│ │ • Connector Probe│ │
│ │ • REST Proxy │ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ Diagnostics │ │
│ │ • Traceroute │ │
│ │ • MTU Check │ │
│ └─────────────────┘ │
└──────────────────┬───────────────────────┘
↓
┌─────────────────┐
│ Generate Report │
│ • Summarize │
│ • Console output│
│ • HTML/JSON │
└────────┬────────┘
↓
┌─────────────────┐
│ AI Analysis │
│ (if enabled) │
└────────┬────────┘
↓
┌─────────────────┐
│ Final Output │
│ • Reports saved │
│ • Analysis shown│
└─────────────────┘
Properties Map → TLS Config → Kafka Dialer → Connection
↓
Check Results
↓
Row Objects
↓
Report Struct
↓
┌───────────────┴───────────────┐
↓ ↓
AI Analysis Report Output
↓ ↓
Enhanced Report Console/HTML/JSON
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Language | Go | 1.23.2 | Application runtime |
| Kafka Client | segmentio/kafka-go | 0.4.49 | Kafka protocol implementation |
| Compression | klauspost/compress | 1.15.9 | Message compression |
| SCRAM Auth | xdg-go/scram | 1.1.2 | SASL SCRAM authentication |
| Build Tool | Go toolchain | 1.23 | Compilation |
| Release Tool | GoReleaser | latest | Cross-platform builds |
| Container | Docker | - | Containerization |
| CI/CD | GitHub Actions | - | Automation |
```
CGO_ENABLED=0                 # Static linking
-ldflags="-w -s"              # Strip debug info
-X main.version={{.Version}}  # Version injection
```

kshark
├── github.com/segmentio/kafka-go (v0.4.49)
│ ├── github.com/klauspost/compress (v1.16.7)
│ ├── github.com/pierrec/lz4/v4 (v4.1.15)
│ ├── github.com/xdg-go/pbkdf2 (v1.0.0)
│ ├── github.com/xdg-go/scram (v1.1.2)
│ │ └── github.com/xdg-go/stringprep (v1.0.4)
│ │ └── golang.org/x/text (v0.23.0)
│ └── golang.org/x/sync (v0.12.0)
├── go.mongodb.org/mongo-driver/v2 (v2.1.0)
├── github.com/jackc/pgx/v5 (v5.7.4)
│ ├── github.com/jackc/pgpassfile (v1.0.0)
│ └── github.com/jackc/pgservicefile
└── Standard Library
├── crypto/tls
├── encoding/json
├── log/slog
├── net/http
├── html/template
└── ...
Rationale:
- Each file has a single, clear responsibility (~100-714 lines)
- Easier to navigate and maintain than a monolithic file
- Enables focused unit testing per concern
- Simplifies code review (changes scoped to relevant file)
- Still compiles to a single binary for easy deployment
Trade-offs:
- More files to manage (mitigated by consistent naming conventions)
- Cross-file dependencies within the package require care
Usage: Configuration construction
```go
// TLS config builder
func tlsConfigFromProps(p map[string]string) (*tls.Config, error) {
    conf := &tls.Config{MinVersion: tls.VersionTLS12}
    if caFile := p["ssl.ca.location"]; caFile != "" {
        // Build CA pool...
    }
    if certFile := p["ssl.certificate.location"]; certFile != "" {
        // Add client cert...
    }
    return conf, nil
}

// SASL config builder
func saslFromProps(p map[string]string) (sasl.Mechanism, error) {
    mechanism := p["sasl.mechanism"]
    switch mechanism {
    case "PLAIN":
        return plain.Mechanism{...}, nil
    case "SCRAM-SHA-256":
        return scram.Mechanism(...), nil
    // ...
    }
}
```

Usage: Authentication mechanisms
```go
// Different SASL strategies selected at runtime
switch saslMechanism {
case "PLAIN":
    mechanism = plain.Mechanism{Username: user, Password: pass}
case "SCRAM-SHA-256":
    mechanism, _ = scram.Mechanism(scram.SHA256, user, pass)
case "SCRAM-SHA-512":
    mechanism, _ = scram.Mechanism(scram.SHA512, user, pass)
}
```

Usage: HTML report generation
```go
// Template defines structure, data fills in details
tmpl, _ := template.ParseFiles("web/templates/report_template.html")
tmpl.Execute(file, report)
```

Usage: Simplified Kafka connectivity
```go
// Complex Kafka setup hidden behind simple function
func kafkaConn(ctx context.Context, dialer *kafka.Dialer, addr string) (*kafka.Conn, error) {
    // Handles timeout, context, error checking internally
    return dialer.DialContext(ctx, "tcp", addr)
}
```

Usage: Collecting diagnostic results
```go
// Global-like report object accumulates all results
func addRow(r *Report, row Row) {
    layer := row.Layer
    r.Layers[layer] = append(r.Layers[layer], row)
}

// All checks append to the same report
checkDNS(report, ...)
checkTCP(report, ...)
checkTLS(report, ...)
```

Current: PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, (Kerberos with build tag)
Extension Point: saslFromProps() function (cmd/kshark/auth.go)
How to Add:
```go
case "SCRAM-SHA-1": // Add new case
    mechanism, err = scram.Mechanism(scram.SHA1, username, password)
case "OAUTH":
    // Implement OAuth mechanism
```

Current: OpenAI, Scalytics-Connect
Extension Point: ai_config.json + AnalyzeReport() function
How to Add:
```json
{
  "provider": "anthropic",
  "api_key": "sk-ant-...",
  "api_endpoint": "https://api.anthropic.com/v1/messages",
  "model": "claude-3-opus"
}
```

Code modification needed: Update API request/response format in AnalyzeReport()
Current: Console, HTML, JSON
Extension Point: After report generation in cmd/kshark/main.go
How to Add:
```go
if *xmlOut != "" {
    writeXMLReport(report, *xmlOut)
}
```

Current: L3 (DNS), L4 (TCP), L5-6 (TLS), L7 (Kafka/HTTP), DIAG
Extension Point: runScan() function in cmd/kshark/main.go
How to Add:
```go
// Add a new phase inside runScan(), guarded by ctx.Done()
select {
case <-ctx.Done():
    addRow(report, Row{"kshark", "timeout", DIAG, FAIL, "Global timeout reached", ""})
    return
default:
    checkNewService(ctx, report, cfg.props)
}
```

Current: confluent-cloud, bitnami, aws-msk, plaintext
Extension Point: applyPreset() function (cmd/kshark/properties.go)
How to Add:
```go
case "azure-eventhub":
    p["security.protocol"] = "SASL_SSL"
    p["sasl.mechanism"] = "PLAIN"
    p["sasl.username"] = "$ConnectionString"
```

User's Machine
├── kshark binary
├── client.properties (config)
├── ai_config.json (optional)
├── license.key (optional)
└── reports/ (output)
Execution:

```shell
./kshark -props client.properties -topic test-topic --analyze
```

Docker Container (alpine:latest)
├── /usr/local/bin/kshark (binary)
├── /app/web/templates/ (HTML template)
├── /app/client.properties (mounted)
├── /app/ai_config.json (mounted)
├── /app/license.key (mounted)
└── /app/reports/ (mounted volume)
Build:

```shell
docker build -t kshark:latest .
```

Run:

```shell
docker run -v $(pwd):/app -u $(id -u):$(id -g) \
  kshark:latest -props /app/client.properties --analyze
```

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kshark-health-check
spec:
  schedule: "0 * * * *" # Hourly
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: kshark
              image: kshark:latest
              args:
                - "-props"
                - "/config/client.properties"
                - "-topic"
                - "health-check"
                - "--analyze"
                - "-y"
              volumeMounts:
                - name: config
                  mountPath: /config
                - name: reports
                  mountPath: /app/reports
          volumes:
            - name: config
              secret:
                secretName: kshark-config
            - name: reports
              persistentVolumeClaim:
                claimName: kshark-reports
          restartPolicy: OnFailure
```

Developer Push (Tag v*)
↓
GitHub Actions Trigger
↓
Checkout Code (fetch-depth: 0)
↓
Setup Go 1.23
↓
GoReleaser Build
├── Linux (amd64, arm64)
├── macOS (amd64, arm64)
└── Windows (amd64, arm64)
↓
Create Archives
├── tar.gz (Linux, macOS)
└── zip (Windows)
↓
Generate Checksums
↓
Create GitHub Release
├── Upload binaries
├── Upload checksums
└── Generate changelog
Workflow File: .github/workflows/build-and-release.yml
| Check Type | Time Range | Notes |
|---|---|---|
| DNS Resolution | 10-100ms | Depends on DNS server |
| TCP Connection | 50-200ms | Network latency dependent |
| TLS Handshake | 100-500ms | Certificate chain length |
| Kafka Metadata | 100-300ms | Cluster size dependent |
| Produce/Consume | 200-1000ms | Topic partition count |
| Traceroute | 5-30s | Hop count dependent |
| AI Analysis | 5-30s | AI provider latency |
| Resource | Typical | Peak | Notes |
|---|---|---|---|
| Memory | 20MB | 50MB | Includes binary overhead |
| CPU | <5% | 20% | Mostly I/O wait |
| Network | <1KB/s | 100KB/s | During checks |
| Disk | 1-5MB | 10MB | Report output |
- Modularization (done)
  - Split monolithic `main.go` (2,479 lines) into 12 focused files
  - `internal/probe` and `internal/connectapi` packages extracted
  - 23 test files (14 in cmd/kshark including 3 fuzz targets, 9 in internal/) with 478 test cases
- Structured Logging (done)
  - `log/slog` integration with `--log-format text|json` flag
  - Security events logged throughout scan lifecycle
- SSRF Protection (done)
  - Two-tier deny/warn model in cmd/kshark/ssrf.go (14 deny CIDRs + 4 warn CIDRs)
  - Redirect validation (`checkRedirectSSRF`), bounded reads (`io.LimitReader`), scheme validation
- Control Flow Refactor (done)
  - Extracted `runScan()` function replacing `goto endScan` pattern
  - Extracted `checkRESTProxy()` as testable function
  - `scanConfig` struct encapsulates all scan parameters
  - All scan phases guarded by `ctx.Done()` for clean timeout/cancellation
- Signal Handling (done)
  - SIGINT/SIGTERM graceful shutdown cancels scan context
  - Existing `ctx.Done()` checks in `runScan()` handle early exit
- Fuzz Testing (done)
  - 4 fuzz targets: `auth_fuzz_test.go`, `properties_fuzz_test.go`, `ssrf_fuzz_test.go`, `jdbc_url_fuzz_test.go`
  - Covers security-critical parsers (SASL auth, properties, SSRF URL validation, JDBC URLs)
- CI Quality Gates (done)
  - `.golangci.yml` with `gosec` linter enabled
  - Coverage gate in CI pipeline
  - `govulncheck` in weekly security scan workflow

Planned:

- Concurrency
  - Parallel layer checks where possible
  - Worker pool for multiple brokers
  - Async AI analysis
- Caching
  - DNS result caching
  - TLS session resumption
  - Configuration caching
- Observability
  - Prometheus metrics export
  - OpenTelemetry integration
- Persistence
  - Historical report database
  - Trend analysis
  - Alerting on degradation
The `--probe-direction` flag controls how kshark traverses the layer stack:
| Mode | Flag Value | Behavior |
|---|---|---|
| Bottom-up (default) | `up` | L3→L4→L5-6→L7, fail-fast at hard layer boundaries. Skips higher layers on TCP/TLS failure. |
| Full scan | `full` | Attempts all applicable layers for every broker. Hard dependencies (no TCP = no TLS) still apply per-broker, but other brokers and L7 sub-checks continue. |
Within L7, checks now continue past partial failures:
- SASL auth failure → still attempts topic metadata (some clusters allow unauthenticated metadata)
- Topic metadata failure → still attempts produce (different ACL path)
- Produce failure → still attempts consume (different ACL)
Traceroute and MTU checks run in goroutines concurrent with produce/consume, reducing total scan time. A sync.Mutex on the Report struct protects concurrent addRow calls.
After all probes complete, mtuCorrelation() cross-references:
- MTU check result (ICMP-based, OK at what size?)
- Produce/consume result (OK or timeout?)
If MTU is OK but produce timed out → emits a PMTU black hole warning.
Source: cmd/kshark/main.go (runScan, runDiagnosticsParallel), cmd/kshark/diagnostics.go (mtuCorrelation)
When TCP to a Kafka port fails, kshark can probe nearby ports and protocols to classify the restriction.
- Automatic: When `--diag=true` (default) and TCP connect fails
- Explicit: `--neighborhood` forces scan even on success (audit mode)
TCP FAIL on port 9092
│
├─ Port Neighborhood Scan (concurrent)
│ ├─ TCP 80 → OPEN/BLOCKED/REFUSED
│ ├─ TCP 443 → OPEN/BLOCKED/REFUSED
│ ├─ TCP 9092 → (known FAIL)
│ ├─ TCP 9093 → OPEN/BLOCKED/REFUSED
│ ├─ TCP 9094 → OPEN/BLOCKED/REFUSED
│ ├─ TCP 8081 → OPEN/BLOCKED/REFUSED
│ └─ TCP 8083 → OPEN/BLOCKED/REFUSED
│
├─ ICMP Reachability Check
│ └─ ping -c 1 host → OK/FAIL
│
└─ Restriction Classification
├─ selective_port_filtering (Kafka ports blocked, 443 open)
├─ host_unreachable (all blocked, ICMP blocked)
├─ all_tcp_blocked (all TCP blocked, ICMP open)
├─ service_not_listening (connection refused)
├─ mixed_filtering (inconsistent pattern)
└─ no_network_restriction (all open)
Results appear as component=neighborhood rows with restriction classification, confidence level, and suggested action.
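The decision tree above can be approximated by a small classifier. This is a hedged sketch: the function name, input shape, and branch ordering are assumptions, and the real classifier also attaches a confidence level:

```go
package main

import "fmt"

// classifyRestriction maps per-port results ("OPEN"/"BLOCKED"/"REFUSED")
// plus an ICMP reachability flag to a restriction label.
func classifyRestriction(ports map[int]string, icmpOK bool) string {
	open, refused := 0, 0
	for _, st := range ports {
		switch st {
		case "OPEN":
			open++
		case "REFUSED":
			refused++
		}
	}
	switch {
	case open == len(ports):
		return "no_network_restriction"
	case refused > 0 && open == 0:
		return "service_not_listening"
	case open == 0 && !icmpOK:
		return "host_unreachable"
	case open == 0 && icmpOK:
		return "all_tcp_blocked"
	case ports[443] == "OPEN" && ports[9092] == "BLOCKED":
		return "selective_port_filtering"
	default:
		return "mixed_filtering"
	}
}

func main() {
	scan := map[int]string{80: "OPEN", 443: "OPEN", 9092: "BLOCKED", 9093: "BLOCKED"}
	fmt.Println(classifyRestriction(scan, true)) // → selective_port_filtering
}
```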
Source: cmd/kshark/neighborhood.go
The --bundle flag packages all diagnostic artifacts into a portable .tar.gz archive.
kshark-diag-<hostname>-<timestamp>/
├── report.json # Full kshark JSON report
├── kshark.log # Run log
├── config/
│ └── client.properties.redacted # Redacted client properties
├── terraform/
│ ├── terraform.tfstate.redacted # Redacted Terraform state (if --tf-state)
│ └── terraform-plan.txt # Redacted plan output (if --tf-plan)
├── context/
│ ├── os.txt, arch.txt, go.txt # System info
│ ├── hostname.txt
│ ├── resolv.conf.txt # DNS configuration
│ ├── interfaces.txt # Network interfaces
│ └── routes.txt # Routing table
└── MANIFEST.md # SHA256 checksums
The state JSON is walked recursively. Keys matching sensitive patterns (password, secret, api_key, token, private_key, connection_string, and Confluent-specific keys) have their values replaced with [REDACTED]. Both string and non-string scalar values are redacted.
After bundle creation, kshark prints ready-to-use copy commands based on detected environment:
- VM/bare metal: `scp` command
- Docker (detected via `/.dockerenv`): `docker cp` command
- Kubernetes (detected via `KUBERNETES_SERVICE_HOST`): `kubectl cp` command
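The detection logic reduces to two probes. A minimal sketch (checking Kubernetes before Docker, since pods may also carry `/.dockerenv`, is an assumption about ordering):

```go
package main

import (
	"fmt"
	"os"
)

// detectEnv picks the copy tool matching the runtime environment.
func detectEnv() string {
	if os.Getenv("KUBERNETES_SERVICE_HOST") != "" {
		return "kubectl cp"
	}
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return "docker cp"
	}
	return "scp" // VM / bare metal fallback
}

func main() {
	fmt.Println("suggested copy tool:", detectEnv())
}
```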
Source: cmd/kshark/bundle.go
kshark's architecture prioritizes simplicity, reliability, and comprehensive diagnostics. The multi-file modular design within the cmd/kshark package keeps each concern focused while still compiling to a single binary. The internal/ packages (probe, connectapi) provide reusable logic for connector probing and database health checks.
The layered testing approach ensures thorough validation of all connectivity components. The probe direction control, neighborhood scanning, and diagnostics bundle features extend diagnostic depth while maintaining the single-binary deployment model. Optional AI integration adds intelligent analysis capabilities without compromising core functionality.
Document Version: 1.2 Author: kshark Development Team Last Review: 2026-03-26 Next Review: 2026-06-26