Skip to content

Feat: New race strategy#133

Closed
marsteg wants to merge 3 commits intocloudfoundry:masterfrom
marsteg:dns/newRaceStrategy
Closed

Feat: New race strategy#133
marsteg wants to merge 3 commits intocloudfoundry:masterfrom
marsteg:dns/newRaceStrategy

Conversation

@marsteg
Copy link
Copy Markdown

@marsteg marsteg commented Mar 6, 2026

This PR is for adding a new Recursor Selection Option. This option should allow to forward requests to all recursors in parallel and take the fastest, good response. This mimics the behavior or bind and dnsmasq to improve dns resiliency against a set of recursors.

marsteg added 3 commits March 5, 2026 13:36
## Overview

Add a new `"race"` recursor selection strategy that sends DNS queries to all configured recursors in parallel and uses the best/fastest response.

## Motivation

**Benefits:**
- **Lower latency**: Always get the fastest response regardless of which recursor is fastest
- **Better reliability**: Single slow/failing recursor doesn't slow down queries
- **More accurate results**: Can prefer successful responses over failures (SERVFAIL/NXDOMAIN)
- **No manual preference tuning needed**: Automatically adapts to current network conditions

**Tradeoffs:**
- **Higher network bandwidth**: Sends N queries instead of 1-2
- **More load on upstream recursors**: Each query hits all recursors
- **Resource usage**: More goroutines and network connections

## Implementation Details

### 1. Configuration

Add `"race"` as a valid value for `recursor_selection`:

```json
{
  "recursor_selection": "race"  // Valid values: "serial", "smart", "race"
}
```

### 2. Add Constant

Add to `config/config.go`:

```go
const (
	SmartRecursorSelection  = "smart"
	SerialRecursorSelection = "serial"
	RaceRecursorSelection   = "race"   // NEW
	RFCFormatting           = "rfc3339"
)
```

### 3. Core Implementation

Add new `raceRecursorPool` type in `failover_recursor_pool.go`:

```go
type raceRecursorPool struct {
	recursors             []string
	logger                logger.Logger
	logTag                string
	recursorRetrySettings recursorRetrySettings
}

func newRaceRecursorPool(recursors []string, recursorSettings recursorRetrySettings, logger logger.Logger) RecursorPool {
	if recursors == nil {
		recursors = []string{}
	}

	return &raceRecursorPool{
		recursors:             recursors,
		logger:                logger,
		logTag:                "RaceRecursor",
		recursorRetrySettings: recursorSettings,
	}
}
```

### 4. Race Logic with Response Quality Ranking

The key insight: **Not all responses are equal**. We should prefer:
1. ✅ **NOERROR** (successful resolution) - USE IMMEDIATELY
2. ⚠️ **NXDOMAIN** (name doesn't exist) - WAIT for better answer
3. ❌ **SERVFAIL** (server failure) - WAIT for better answer
4. ❌ **Network errors** - WAIT for better answer

```go
func (q *raceRecursorPool) PerformStrategically(work func(string) error) error {
	if len(q.recursors) == 0 {
		return ErrNoRecursorResponse
	}

	type result struct {
		recursor string
		err      error
		priority int // Lower is better: 0=success, 1=NXDOMAIN, 2=SERVFAIL, 3=network error
	}

	// Buffered channel sized to number of recursors - prevents goroutine blocking
	results := make(chan result, len(q.recursors))

	// Start all queries in parallel
	for _, recursor := range q.recursors {
		go func(r string) {
			err := performWithRetryLogic(work, r, q.recursorRetrySettings.maxRetries, q.logTag, q.logger)

			// Classify the error by priority
			priority := q.classifyError(err)

			results <- result{
				recursor: r,
				err:      err,
				priority: priority,
			}
		}(recursor)
	}

	// Collect all results
	var bestResult *result
	receivedCount := 0
	totalRecursors := len(q.recursors)

	for receivedCount < totalRecursors {
		res := <-results
		receivedCount++

		q.logger.Debug(q.logTag, fmt.Sprintf(
			"received response %d/%d from %s (priority=%d, err=%v)",
			receivedCount, totalRecursors, res.recursor, res.priority, res.err))

		// Keep track of best result so far
		if bestResult == nil || res.priority < bestResult.priority {
			bestResult = &res
		}

		// If we got a perfect response (NOERROR), return immediately
		// Don't wait for other responses
		if res.priority == 0 {
			q.logger.Info(q.logTag, fmt.Sprintf(
				"recursor %s returned successful response (received %d/%d responses)",
				res.recursor, receivedCount, totalRecursors))

			// Drain remaining responses to prevent goroutine leaks
			go q.drainResults(results, totalRecursors-receivedCount)

			return nil
		}

		// For non-success responses, continue collecting to see if we get a better one
	}

	// All responses received, return best one
	if bestResult == nil {
		return ErrNoRecursorResponse
	}

	if bestResult.err != nil {
		q.logger.Info(q.logTag, fmt.Sprintf(
			"all %d recursors responded, best result from %s (priority=%d): %s",
			totalRecursors, bestResult.recursor, bestResult.priority, bestResult.err.Error()))
	} else {
		q.logger.Info(q.logTag, fmt.Sprintf(
			"all %d recursors responded, using result from %s",
			totalRecursors, bestResult.recursor))
	}

	return bestResult.err
}

// classifyError assigns priority to errors
// Lower number = better response
func (q *raceRecursorPool) classifyError(err error) int {
	if err == nil {
		return 0 // NOERROR - perfect response
	}

	// Check if it's a DNS error with rcode
	if dnsErr, ok := err.(server.DnsError); ok {
		switch dnsErr.Rcode() {
		case dns.RcodeNameError: // NXDOMAIN
			return 1 // Name doesn't exist - could be correct, but prefer NOERROR
		case dns.RcodeServerFailure: // SERVFAIL
			return 2 // Server failure - prefer NXDOMAIN over this
		default:
			return 2 // Other DNS errors
		}
	}

	// Network errors are worst
	if _, ok := err.(net.Error); ok {
		return 3
	}

	// Unknown errors
	return 3
}

// drainResults reads remaining results to prevent goroutine blocking
func (q *raceRecursorPool) drainResults(results chan result, remaining int) {
	for i := 0; i < remaining; i++ {
		<-results
	}
}
```

### 5. Update Factory Function

Modify `NewFailoverRecursorPool` in `failover_recursor_pool.go`:

```go
func NewFailoverRecursorPool(recursors []string, recursorSelection string, RecursorMaxRetries int, logger logger.Logger) RecursorPool {
	recursorSettings := recursorRetrySettings{
		maxRetries: RecursorMaxRetries,
	}

	switch recursorSelection {
	case config.SmartRecursorSelection:
		return newSmartFailoverRecursorPool(recursors, recursorSettings, logger)
	case config.RaceRecursorSelection:
		return newRaceRecursorPool(recursors, recursorSettings, logger)
	default: // serial
		return newSerialFailoverRecursorPool(recursors, recursorSettings, logger)
	}
}
```

### 6. Response Handling in ForwardHandler

The current implementation in `forward_handler.go` works correctly because:

1. The `work` function (line 67-96) writes the response immediately when called
2. When `PerformStrategically` returns `nil`, it means a response was already written
3. If it returns an error, no response was written and we create one from the error

**For race strategy**:
- First NOERROR causes immediate response write and return
- For NXDOMAIN/SERVFAIL, we wait for all responses, then use the best one
- The response is only written once by the winning recursor's work function

✅ **No changes needed to ForwardHandler**

## Connection Leak Prevention

### Analysis

1. **Goroutines**: Each recursor query runs in its own goroutine
2. **Channel**: Buffered to `len(recursors)` - no goroutine blocks on send
3. **Cleanup paths**:
   - **Fast path** (NOERROR): We spawn a goroutine to drain remaining results
   - **Slow path** (all responses): We read all results in the loop

✅ **No goroutine leaks**: All goroutines complete naturally and all results are consumed

### Why the draining goroutine is safe

When we get a NOERROR and return early:
```go
go q.drainResults(results, totalRecursors-receivedCount)
```

- The draining goroutine reads remaining results from the channel
- The query goroutines complete their DNS exchanges and send to the buffered channel
- Once all results are drained, the draining goroutine exits
- The channel is garbage collected when no references remain

**No resource leaks**: Goroutines terminate, connections close, memory is freed.

## Response Quality Logic

### Scenario Examples

#### Scenario 1: Mixed Responses
- Recursor A: SERVFAIL (server is down)
- Recursor B: NXDOMAIN (name not found) ← **Best result**
- Recursor C: Network timeout

**Result**: Returns NXDOMAIN (priority 1 beats 2 and 3)

#### Scenario 2: One Good Response
- Recursor A: NOERROR ← **Returns immediately**
- Recursor B: (still processing)
- Recursor C: (still processing)

**Result**: Returns NOERROR immediately, doesn't wait for B and C

#### Scenario 3: Split-Brain DNS
- Recursor A: NXDOMAIN (stale cache, thinks name doesn't exist)
- Recursor B: NOERROR ← **Best result**
- Recursor C: SERVFAIL (misconfigured)

**Result**: Returns NOERROR (priority 0 beats everything)

#### Scenario 4: All Fail
- Recursor A: SERVFAIL
- Recursor B: SERVFAIL ← **Best result (all tied)**
- Recursor C: Network timeout

**Result**: Returns SERVFAIL (priority 2 beats 3)

## Testing Considerations

Test scenarios needed:

1. **Performance**:
   - Verify fastest NOERROR wins
   - Measure latency improvement vs serial/smart

2. **Correctness**:
   - One NOERROR + others NXDOMAIN → returns NOERROR
   - All NXDOMAIN → returns NXDOMAIN
   - One NXDOMAIN + others SERVFAIL → returns NXDOMAIN
   - All SERVFAIL → returns SERVFAIL

3. **Resource safety**:
   - No goroutine leaks (use runtime.NumGoroutine())
   - No connection leaks
   - Memory usage under load

4. **Edge cases**:
   - Zero recursors
   - One recursor (should behave like serial)
   - All timeout
   - Artificially delay responses to test early return

## Performance Impact

**Latency**:
- ✅ Best case: ~50-70% reduction with multiple recursors
- ✅ Common case: Returns immediately on first NOERROR
- ⚠️ Worst case: Waits for all responses if no NOERROR (slower than serial)

**Network bandwidth**:
- ❌ Always sends N queries (number of recursors)
- For typical query: 64 bytes × N recursors
- For typical response: 512 bytes × N recursors
- Example with 3 recursors: ~1.7KB per DNS query

**Recursor load**:
- ❌ Every recursor receives every query
- May violate rate limits on public DNS servers
- Not suitable for large-scale deployments without considering upstream capacity

## Documentation Updates

Update `jobs/bosh-dns/spec`:

```yaml
recursor_selection:
  description: |
    Recursor selection strategy. Options:
    - "serial": Try recursors in order, no failure tracking
    - "smart": Track failures, prefer reliable recursors (default, DNS spec compliant)
    - "race": Query all recursors in parallel, use best/fastest response
      * Prefers successful responses (NOERROR) over failures
      * Returns immediately on first successful response
      * Waits for all responses if only errors received
      * Higher network bandwidth (N× queries)
  default: smart
```

## Recommendation

**When to use race strategy:**
- ✅ Low-latency requirements (milliseconds matter)
- ✅ Recursors have varying performance characteristics
- ✅ Bandwidth and recursor load are not concerns
- ✅ Split-brain DNS scenarios need resolution
- ✅ Small to medium scale deployments

**When NOT to use:**
- ❌ Large-scale deployments (bandwidth cost)
- ❌ Rate-limited public DNS servers
- ❌ Bandwidth-constrained networks
- ❌ Metered connections

Implement with the response quality ranking to ensure correct DNS behavior.
@marsteg
Copy link
Copy Markdown
Author

marsteg commented Mar 6, 2026

Adding some basic dnsperf test results to compare performance between "smart" and "race":

Smart:
Statistics:

Queries sent: 504851
Queries completed: 504851 (100.00%)
Queries lost: 0 (0.00%)

Response codes: NOERROR 504851 (100.00%)
Average packet size: request 32, response 63
Run time (s): 30.000126
Queries per second: 16828.295988

Average Latency (s): 0.003507 (min 0.000021, max 2.074524)
Latency StdDev (s): 0.077404

Race:
Statistics:

Queries sent: 889711
Queries completed: 889711 (100.00%)
Queries lost: 0 (0.00%)

Response codes: NOERROR 889711 (100.00%)
Average packet size: request 33, response 65
Run time (s): 30.000800
Queries per second: 29656.242500

Average Latency (s): 0.000707 (min 0.000023, max 0.204968)
Latency StdDev (s): 0.003805

@beyhan
Copy link
Copy Markdown
Member

beyhan commented Mar 6, 2026

Thank you @marsteg for this contribution! IMO this is only relevant for Ubuntu Jammy which is reaching EOL in April 2027 because the switch to Ubuntu Noble introduces a major change how BOSH DNS is invoked. In Noble BOSH DNS will be only responsible for resolving its own domain and the OS resolver will handle everything else. For more details please check the discussions in this pr. With that BOSH DNS won't need to delegate resolutions to recursors and require a selection strategy.

@aramprice aramprice requested review from a team, a-hassanin and ragaskar and removed request for a team March 12, 2026 16:18
@aramprice aramprice moved this from Inbox to Pending Review | Discussion in Foundational Infrastructure Working Group Mar 12, 2026
@beyhan
Copy link
Copy Markdown
Member

beyhan commented Mar 30, 2026

@marsteg Given the feedback in the comment above, is the feature still valuable and will help you being added to bosh-dns?

@marsteg
Copy link
Copy Markdown
Author

marsteg commented Mar 30, 2026

Hi @beyhan,
AFAIK the migration to Noble-Stemcell makes this improvement useless for my Team. I think it might be valuable in the current use-case but not hard required anymore,

@beyhan
Copy link
Copy Markdown
Member

beyhan commented Mar 30, 2026

Hi @marsteg,

Thank you for the update! In that case, we'd prefer to close the PR and avoid adding functionality to BOSH DNS that won't actually be used. Should someone decide to support Jammy for a longer period down the road, they're always welcome to reopen it and we can revisit the discussion then.

@beyhan beyhan closed this Mar 30, 2026
@github-project-automation github-project-automation Bot moved this from Pending Review | Discussion to Done in Foundational Infrastructure Working Group Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

3 participants