## Summary
Add directional bias and directional accuracy metrics to the metrics module. These metrics are particularly valuable for time series forecasting and financial modeling, where the direction of change is often more important than the absolute prediction accuracy.
## Motivation
While traditional regression metrics (MAE, MSE, RMSE) focus on prediction magnitude, they don't capture whether a model correctly predicts the direction of change. This is critical for:
- Financial forecasting: Predicting whether a stock will go up or down is often more important than the exact price
- Trend prediction: Understanding if a model captures directional trends correctly
- Model diagnostics: Identifying systematic over-prediction or under-prediction biases
- Risk assessment: Evaluating if errors are balanced or skewed in one direction
## Background

### Directional Accuracy
Measures the proportion of times a model correctly predicts the direction of change relative to a baseline. For time series, this is typically whether the value increased or decreased from the previous time point.
Formula:

```
DA = (1/n) * Σ I(sign(y_true - baseline) == sign(y_pred - baseline))
```

where I is the indicator function that equals 1 when the condition is true.
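As an illustration of the formula, a minimal NumPy sketch (just the raw computation on a toy series, not the proposed API) using the previous true value as the baseline:

```python
import numpy as np

# Toy series: compute DA with the previous observed value as the baseline.
y_true = np.array([100.0, 102.0, 98.0, 101.0])
y_pred = np.array([100.5, 103.0, 97.0, 102.0])

baseline = y_true[:-1]                     # previous observed value
true_dir = np.sign(y_true[1:] - baseline)  # actual direction of change
pred_dir = np.sign(y_pred[1:] - baseline)  # predicted direction of change
da = np.mean(true_dir == pred_dir)
print(da)  # all three predicted directions match -> 1.0
```

The first sample drops out because it has no previous value to compare against, matching the proposed time-series default.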
### Directional Bias
Measures systematic tendency to over-predict or under-predict. A positive bias indicates over-prediction, negative indicates under-prediction, and zero indicates no systematic bias.
Formula:

```
DB = (n_over - n_under) / n
```

where n_over is the number of over-predictions and n_under is the number of under-predictions.
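As an illustration of the formula, a minimal NumPy sketch of the raw computation (not the proposed API):

```python
import numpy as np

# Toy example: 4 over-predictions and 1 under-prediction.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.2, 1.9, 3.1, 4.2, 5.1])

errors = y_pred - y_true
n_over = np.sum(errors > 0)    # count of over-predictions
n_under = np.sum(errors < 0)   # count of under-predictions
db = (n_over - n_under) / errors.size
print(db)  # (4 - 1) / 5 = 0.6
```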
## Proposed API

### 1. Directional Accuracy Function
```python
def directional_accuracy_score(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    baseline: Optional[np.ndarray] = None,
    sample_weight: Optional[np.ndarray] = None,
    handle_equal: str = 'exclude',
) -> float:
    """Calculate directional accuracy of predictions.

    Measures the proportion of times the model correctly predicts the direction
    of change from a baseline (typically the previous value in a time series).

    :param y_true: array-like of shape (n_samples,). True target values.
    :param y_pred: array-like of shape (n_samples,). Predicted target values.
    :param baseline: array-like of shape (n_samples,), optional. Baseline values
        for comparison. If None, uses the previous true value (y_true[i-1]) for
        time series data.
    :param sample_weight: array-like of shape (n_samples,), default=None. Sample weights.
    :param handle_equal: {'exclude', 'correct', 'incorrect'}, default='exclude'.
        How to handle samples where y_true == baseline:
        - 'exclude': remove these samples from the calculation
        - 'correct': keep them; they count as correct only when y_pred == baseline
        - 'incorrect': keep them but always count them as incorrect
    :return: Directional accuracy score in the range [0, 1]. Higher is better:
        1.0 = perfect directional prediction, 0.5 = random, 0.0 = always wrong.
    :raises ValueError: If baseline is None and y_true has fewer than 2 samples.
    :raises ValueError: If handle_equal is not in {'exclude', 'correct', 'incorrect'}.
    :raises ValueError: If shapes don't match.

    Example:
        >>> # Time series example
        >>> y_true = np.array([100, 102, 98, 101, 99])
        >>> y_pred = np.array([101, 103, 97, 102, 98])
        >>> da = directional_accuracy_score(y_true, y_pred)  # Uses y_true[i-1] as baseline
        >>> print(f"Directional Accuracy: {da:.2%}")
        Directional Accuracy: 100.00%
        >>> # Custom baseline example
        >>> baseline = np.array([100, 100, 100, 100, 100])
        >>> da = directional_accuracy_score(y_true, y_pred, baseline=baseline)

    Notes:
        - For time series (baseline=None), the first sample is excluded because
          it has no prior value.
        - A score of 0.5 indicates performance no better than random guessing.
        - This metric is particularly useful for financial forecasting and
          trend prediction.
    """
```
### 2. Directional Bias Function
```python
def directional_bias_score(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    sample_weight: Optional[np.ndarray] = None,
    handle_equal: str = 'exclude',
) -> float:
    """Calculate directional bias of predictions.

    Measures the systematic tendency to over-predict or under-predict. Returns
    the proportion of over-predictions minus the proportion of under-predictions.

    :param y_true: array-like of shape (n_samples,). True target values.
    :param y_pred: array-like of shape (n_samples,). Predicted target values.
    :param sample_weight: array-like of shape (n_samples,), default=None. Sample weights.
    :param handle_equal: {'exclude', 'neutral'}, default='exclude'.
        How to handle samples where y_pred == y_true:
        - 'exclude': remove these samples from the calculation
        - 'neutral': include them, but count them as neither over nor under
    :return: Directional bias score in the range [-1, 1].
        - Positive values indicate a tendency to over-predict
        - Negative values indicate a tendency to under-predict
        - 0 indicates no systematic bias (balanced errors)
        - ±1 indicates complete bias in one direction
    :raises ValueError: If handle_equal is not in {'exclude', 'neutral'}.
    :raises ValueError: If shapes don't match.

    Example:
        >>> y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        >>> y_pred = np.array([1.2, 2.3, 3.1, 4.2, 5.1])  # Consistently over-predicting
        >>> bias = directional_bias_score(y_true, y_pred)
        >>> print(f"Directional Bias: {bias:.2f}")
        Directional Bias: 1.00
        >>> y_pred_balanced = np.array([0.9, 2.1, 2.9, 4.1, 5.0])  # Balanced errors
        >>> bias = directional_bias_score(y_true, y_pred_balanced)
        >>> print(f"Directional Bias: {bias:.2f}")
        Directional Bias: 0.00

    Notes:
        - This metric complements traditional error metrics by revealing systematic biases.
        - Useful for calibrating models and identifying consistent over/under-estimation.
        - A model with low RMSE but high bias may need recalibration.
    """
```
## Suggested Implementation

### Implementation: directional_accuracy_score
```python
import numpy as np
from typing import Optional


def directional_accuracy_score(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    baseline: Optional[np.ndarray] = None,
    sample_weight: Optional[np.ndarray] = None,
    handle_equal: str = 'exclude',
) -> float:
    """Calculate directional accuracy of predictions."""
    # Validate handle_equal parameter
    if handle_equal not in {'exclude', 'correct', 'incorrect'}:
        raise ValueError(
            f"handle_equal must be 'exclude', 'correct', or 'incorrect', "
            f"got '{handle_equal}'"
        )

    # Convert to 1-D numpy arrays
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()

    # Validate shapes
    if y_true.shape != y_pred.shape:
        raise ValueError(
            f"Shape mismatch: y_true {y_true.shape} and y_pred {y_pred.shape}"
        )

    # Handle baseline
    if baseline is None:
        # Use the previous true value as baseline (time series default)
        if len(y_true) < 2:
            raise ValueError(
                "y_true must have at least 2 samples when baseline is None "
                "(time series mode requires previous values)"
            )
        baseline = y_true[:-1]
        y_true = y_true[1:]
        y_pred = y_pred[1:]
        if sample_weight is not None:
            sample_weight = np.asarray(sample_weight, dtype=float)[1:]
    else:
        baseline = np.asarray(baseline, dtype=float).ravel()
        if baseline.shape != y_true.shape:
            raise ValueError(
                f"Shape mismatch: baseline {baseline.shape} and y_true {y_true.shape}"
            )

    # Calculate directions
    true_direction = np.sign(y_true - baseline)
    pred_direction = np.sign(y_pred - baseline)

    # Handle samples where the true value equals the baseline
    if handle_equal == 'exclude':
        # Drop samples where the true value equals the baseline
        mask = true_direction != 0
        true_direction = true_direction[mask]
        pred_direction = pred_direction[mask]
        if sample_weight is not None:
            sample_weight = np.asarray(sample_weight, dtype=float)[mask]
    elif handle_equal == 'correct':
        # Keep ties: the sign comparison below already counts a tie as
        # correct exactly when the prediction also equals the baseline
        pass
    elif handle_equal == 'incorrect':
        # Force ties to count as incorrect by giving the true direction a
        # sentinel value that no prediction sign (-1, 0, 1) can match
        true_direction = true_direction.copy()
        true_direction[true_direction == 0] = 2

    # Check that samples remain
    if len(true_direction) == 0:
        raise ValueError("No valid samples remain after filtering")

    # Calculate directional accuracy
    correct_direction = true_direction == pred_direction
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight, dtype=float)
        if len(sample_weight) != len(correct_direction):
            raise ValueError(
                f"Sample weight length {len(sample_weight)} doesn't match "
                f"number of samples {len(correct_direction)}"
            )
        # Normalize weights so the weighted accuracy stays in [0, 1]
        sample_weight = sample_weight / sample_weight.sum()
        accuracy = (correct_direction * sample_weight).sum()
    else:
        accuracy = correct_direction.mean()
    return float(accuracy)
```
### Implementation: directional_bias_score
```python
def directional_bias_score(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    sample_weight: Optional[np.ndarray] = None,
    handle_equal: str = 'exclude',
) -> float:
    """Calculate directional bias of predictions."""
    # Validate handle_equal parameter
    if handle_equal not in {'exclude', 'neutral'}:
        raise ValueError(
            f"handle_equal must be 'exclude' or 'neutral', got '{handle_equal}'"
        )

    # Convert to 1-D numpy arrays
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()

    # Validate shapes
    if y_true.shape != y_pred.shape:
        raise ValueError(
            f"Shape mismatch: y_true {y_true.shape} and y_pred {y_pred.shape}"
        )

    # Calculate prediction errors
    errors = y_pred - y_true

    # Handle equal cases
    if handle_equal == 'exclude':
        # Drop samples where the prediction equals the true value
        mask = errors != 0
        errors = errors[mask]
        if sample_weight is not None:
            sample_weight = np.asarray(sample_weight, dtype=float)[mask]
    # With 'neutral', errors of 0 contribute to neither the over nor the
    # under count, pulling the bias toward 0

    # Check that samples remain
    if len(errors) == 0:
        raise ValueError("No valid samples remain after filtering")

    # Calculate over- and under-prediction indicators
    over_predictions = errors > 0
    under_predictions = errors < 0
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight, dtype=float)
        if len(sample_weight) != len(errors):
            raise ValueError(
                f"Sample weight length {len(sample_weight)} doesn't match "
                f"number of samples {len(errors)}"
            )
        # Normalize weights so the bias stays in [-1, 1]
        sample_weight = sample_weight / sample_weight.sum()
        prop_over = (over_predictions * sample_weight).sum()
        prop_under = (under_predictions * sample_weight).sum()
    else:
        prop_over = over_predictions.mean()
        prop_under = under_predictions.mean()

    # Positive = over-prediction tendency, negative = under-prediction
    bias = prop_over - prop_under
    return float(bias)
```
## Testing Requirements

### Unit Tests for Directional Accuracy
```python
import numpy as np
import pytest


def test_directional_accuracy_perfect_prediction():
    """Test that perfect directional predictions give a score of 1.0."""
    y_true = np.array([100, 102, 98, 101, 99, 103])
    y_pred = np.array([100.5, 102.5, 97.5, 101.5, 98.5, 103.5])
    # All predicted changes have the same sign as the true changes
    da = directional_accuracy_score(y_true, y_pred)
    assert da == pytest.approx(1.0)


def test_directional_accuracy_random_prediction():
    """Test directional accuracy with 50% correct predictions."""
    y_true = np.array([100, 102, 98, 101, 99])
    baseline = np.array([100, 100, 100, 100, 100])
    # Sample 0 is a tie (y_true == baseline) and is excluded by default;
    # of the remaining four, two directions are correct and two are wrong
    y_pred = np.array([101, 103, 97, 99, 101])
    da = directional_accuracy_score(y_true, y_pred, baseline=baseline)
    assert da == pytest.approx(0.5)


def test_directional_accuracy_all_wrong():
    """Test that completely wrong directional predictions give a score of 0.0."""
    y_true = np.array([100, 102, 98, 101, 99])
    baseline = np.array([100, 100, 100, 100, 100])
    # Every non-tie prediction goes in the opposite direction
    y_pred = np.array([99, 97, 101, 98, 102])
    da = directional_accuracy_score(y_true, y_pred, baseline=baseline)
    assert da == pytest.approx(0.0)


def test_directional_accuracy_time_series_default():
    """Test directional accuracy with the default time series behavior."""
    y_true = np.array([100, 102, 98, 101, 99])
    # Uses the previous true value as the baseline automatically
    # True changes: +2, -4, +3, -2
    y_pred = np.array([100.5, 103, 97, 102, 98])
    # Predicted changes vs the previous true value: +3, -5, +4, -3
    # All four directions match
    da = directional_accuracy_score(y_true, y_pred)
    assert da == pytest.approx(1.0)


def test_directional_accuracy_insufficient_samples():
    """Test that too few samples raises ValueError."""
    y_true = np.array([100])
    y_pred = np.array([101])
    with pytest.raises(ValueError, match="at least 2 samples"):
        directional_accuracy_score(y_true, y_pred)


def test_directional_accuracy_shape_mismatch():
    """Test that a shape mismatch raises ValueError."""
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.0, 2.0])
    with pytest.raises(ValueError, match="Shape mismatch"):
        directional_accuracy_score(y_true, y_pred)


def test_directional_accuracy_invalid_handle_equal():
    """Test that an invalid handle_equal raises ValueError."""
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.0, 2.0, 3.0])
    baseline = np.array([1.0, 1.0, 1.0])
    with pytest.raises(ValueError, match="handle_equal must be"):
        directional_accuracy_score(y_true, y_pred, baseline=baseline, handle_equal='invalid')


@pytest.mark.parametrize("handle_equal", ['exclude', 'correct', 'incorrect'])
def test_directional_accuracy_handle_equal_modes(handle_equal):
    """Test the different handle_equal modes."""
    y_true = np.array([100, 100, 102, 100, 98])
    baseline = np.array([100, 100, 100, 100, 100])
    y_pred = np.array([100, 101, 103, 99, 97])
    # Ties (y_true == baseline) at positions 0, 1, and 3
    # Positions 2 and 4 are predicted in the correct direction
    da = directional_accuracy_score(y_true, y_pred, baseline=baseline, handle_equal=handle_equal)
    if handle_equal == 'exclude':
        # Only positions 2 and 4 are evaluated: 2/2 = 1.0
        assert da == pytest.approx(1.0)
    elif handle_equal == 'correct':
        # The tie at position 0 also has y_pred == baseline and counts as
        # correct; the ties at positions 1 and 3 do not: 3/5 = 0.6
        assert da == pytest.approx(0.6)
    elif handle_equal == 'incorrect':
        # All three ties count as incorrect: 2/5 = 0.4
        assert da == pytest.approx(0.4)


def test_directional_accuracy_with_weights():
    """Test directional accuracy with sample weights."""
    y_true = np.array([100, 102, 98, 101])
    baseline = np.array([100, 100, 100, 100])
    y_pred = np.array([101, 103, 97, 102])
    # Sample 0 is a tie and is excluded; all remaining directions are correct
    # Equal weights
    da_equal = directional_accuracy_score(y_true, y_pred, baseline=baseline)
    # Weighted (should still be 1.0 since all evaluated samples are correct)
    weights = np.array([2.0, 1.0, 1.0, 1.0])
    da_weighted = directional_accuracy_score(y_true, y_pred, baseline=baseline, sample_weight=weights)
    assert da_equal == pytest.approx(1.0)
    assert da_weighted == pytest.approx(1.0)
```
### Unit Tests for Directional Bias
```python
def test_directional_bias_no_bias():
    """Test that balanced errors give a bias of 0.0."""
    y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y_pred = np.array([1.1, 1.9, 3.1, 3.9, 5.0])
    # 2 over, 2 under, 1 equal -> (2 - 2) / 4 = 0.0 in exclude mode
    bias = directional_bias_score(y_true, y_pred, handle_equal='exclude')
    assert bias == pytest.approx(0.0)


def test_directional_bias_complete_over_prediction():
    """Test that complete over-prediction gives a bias of 1.0."""
    y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y_pred = np.array([1.1, 2.1, 3.1, 4.1, 5.1])
    # All over-predictions
    bias = directional_bias_score(y_true, y_pred)
    assert bias == pytest.approx(1.0)


def test_directional_bias_complete_under_prediction():
    """Test that complete under-prediction gives a bias of -1.0."""
    y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y_pred = np.array([0.9, 1.9, 2.9, 3.9, 4.9])
    # All under-predictions
    bias = directional_bias_score(y_true, y_pred)
    assert bias == pytest.approx(-1.0)


def test_directional_bias_mostly_over():
    """Test partial over-prediction bias."""
    y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y_pred = np.array([1.1, 2.1, 3.1, 3.9, 4.9])
    # 3 over, 2 under -> (3 - 2) / 5 = 0.2
    bias = directional_bias_score(y_true, y_pred)
    assert bias == pytest.approx(0.2)


def test_directional_bias_shape_mismatch():
    """Test that a shape mismatch raises ValueError."""
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.0, 2.0])
    with pytest.raises(ValueError, match="Shape mismatch"):
        directional_bias_score(y_true, y_pred)


def test_directional_bias_invalid_handle_equal():
    """Test that an invalid handle_equal raises ValueError."""
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.0, 2.0, 3.0])
    with pytest.raises(ValueError, match="handle_equal must be"):
        directional_bias_score(y_true, y_pred, handle_equal='invalid')


@pytest.mark.parametrize("handle_equal", ['exclude', 'neutral'])
def test_directional_bias_handle_equal_modes(handle_equal):
    """Test the different handle_equal modes."""
    y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y_pred = np.array([1.1, 2.0, 3.1, 4.0, 5.1])
    # 3 over, 0 under, 2 equal
    bias = directional_bias_score(y_true, y_pred, handle_equal=handle_equal)
    if handle_equal == 'exclude':
        # Only the 3 non-equal samples: (3 - 0) / 3 = 1.0
        assert bias == pytest.approx(1.0)
    elif handle_equal == 'neutral':
        # All 5 samples: (3 - 0) / 5 = 0.6
        assert bias == pytest.approx(0.6)


def test_directional_bias_with_weights():
    """Test directional bias with sample weights."""
    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([1.1, 2.1, 2.9, 3.9])
    # 2 over, 2 under with equal weights -> bias = 0
    bias_equal = directional_bias_score(y_true, y_pred)
    assert bias_equal == pytest.approx(0.0)
    # Weight the over-predictions more heavily
    weights = np.array([2.0, 2.0, 1.0, 1.0])
    bias_weighted = directional_bias_score(y_true, y_pred, sample_weight=weights)
    # Weighted: over = 4/6, under = 2/6 -> bias = 2/6 ≈ 0.333
    assert bias_weighted == pytest.approx(1 / 3)


def test_directional_bias_all_equal():
    """Test that all-equal predictions raise an error in exclude mode."""
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.0, 2.0, 3.0])
    with pytest.raises(ValueError, match="No valid samples"):
        directional_bias_score(y_true, y_pred, handle_equal='exclude')
```
### Integration Tests
```python
def test_directional_metrics_on_real_data():
    """Test directional metrics on realistic time series data."""
    # Simulate stock price predictions
    np.random.seed(42)
    n_samples = 100
    # True prices with trend
    true_prices = 100 + np.cumsum(np.random.randn(n_samples) * 2)
    # Good model: follows the true series but with some error
    good_pred = true_prices + np.random.randn(n_samples) * 5
    # Bad model: an independent random walk
    bad_pred = 100 + np.cumsum(np.random.randn(n_samples) * 2)
    # Calculate directional accuracy
    da_good = directional_accuracy_score(true_prices, good_pred)
    da_bad = directional_accuracy_score(true_prices, bad_pred)
    # The good model should have higher directional accuracy
    assert da_good > da_bad
    assert da_good > 0.5  # Better than random
    # The good model adds symmetric noise, so its bias should be near 0.
    # (No tight bound is asserted for the bad model: an independent random
    # walk can drift far above or below the true series.)
    bias_good = directional_bias_score(true_prices, good_pred)
    assert abs(bias_good) < 0.3


def test_directional_metrics_consistency():
    """Test that the directional metrics are consistent with each other."""
    y_true = np.array([100, 102, 98, 101, 99, 103])
    # Consistently over-predicting model; the offset (+1) is smaller than
    # every true change, so the predicted directions are unaffected
    y_pred_over = y_true + 1
    bias_over = directional_bias_score(y_true, y_pred_over)
    assert bias_over == pytest.approx(1.0)  # Complete over-prediction
    # Directional accuracy stays perfect: the bias shifts the level of the
    # predictions, not the direction of change
    da_over = directional_accuracy_score(y_true, y_pred_over)
    assert da_over == pytest.approx(1.0)
```
## Documentation Requirements
1. Docstring examples: Add comprehensive examples for both functions showing:
   - Time series usage
   - Custom baseline usage
   - Interpretation of results
   - Use with sample weights
2. Module documentation: Update the module-level docstring with:
   - A brief explanation of directional metrics
   - When to use them vs. traditional metrics
   - Links to references
3. README: Add a section on directional metrics with:
   - Use case examples (finance, forecasting)
   - An interpretation guide
   - Code snippets
4. Tutorial notebook (optional): Create a notebook showing:
   - Comparison with traditional metrics
   - A financial forecasting example
   - Model calibration using directional bias
## Additional Notes

### Design Decisions
- handle_equal parameter: Provides flexibility in how to treat edge cases (no change, perfect predictions)
- Time series default: When baseline=None, automatically uses previous value for intuitive time series usage
- Sample weights: Supports weighted metrics for imbalanced or importance-weighted datasets
- Range normalization: Both metrics return values in intuitive ranges ([-1, 1] for bias, [0, 1] for accuracy)
### Comparison with Existing Metrics

| Metric | What it measures | When to use |
|---|---|---|
| MAE/MSE | Magnitude of error | When exact values matter |
| Directional Accuracy | Correct trend prediction | When direction matters more than magnitude |
| Directional Bias | Systematic over/under-prediction | For model calibration and diagnostics |
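To make the contrast concrete, here is a small hypothetical sketch of two models with identical RMSE but very different directional bias (the inline `rmse` and `bias` helpers are stand-ins for illustration, not the proposed API):

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0, 40.0])
pred_biased = y_true + 1.0                                  # always over-predicts
pred_balanced = y_true + np.array([1.0, -1.0, 1.0, -1.0])   # balanced errors

def rmse(y_t, y_p):
    return float(np.sqrt(np.mean((y_p - y_t) ** 2)))

def bias(y_t, y_p):
    errors = y_p - y_t
    return float((np.sum(errors > 0) - np.sum(errors < 0)) / errors.size)

print(rmse(y_true, pred_biased), bias(y_true, pred_biased))      # 1.0 1.0
print(rmse(y_true, pred_balanced), bias(y_true, pred_balanced))  # 1.0 0.0
```

RMSE alone cannot distinguish these two models; the bias metric immediately flags the first as needing recalibration.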
### Use Cases
- Financial Forecasting: Stock price direction is often more valuable than exact price
- Energy Demand: Predicting increase/decrease helps with resource allocation
- Sales Forecasting: Understanding trend direction aids business planning
- Weather Forecasting: Temperature trend prediction for planning
### Performance Considerations
- Both functions use vectorized NumPy operations for efficiency
- O(n) time complexity where n is number of samples
- Minimal memory overhead (only creates mask arrays)
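As a rough illustration of the O(n) claim, a minimal timing sketch (the `bias` helper below is a hypothetical stand-in for the proposed function, reduced to its core vectorized passes):

```python
import numpy as np
import timeit

def bias(y_true, y_pred):
    # Core of the bias computation: a few vectorized passes over the data
    errors = y_pred - y_true
    return float((np.sum(errors > 0) - np.sum(errors < 0)) / errors.size)

# Runtime should grow roughly linearly with the number of samples
for n in (10_000, 100_000, 1_000_000):
    y_true = np.random.randn(n)
    y_pred = y_true + np.random.randn(n)
    t = timeit.timeit(lambda: bias(y_true, y_pred), number=10)
    print(f"n={n:>9,}: {t / 10:.6f} s per call")
```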
## References
- Pesaran, M. H., & Timmermann, A. (1992). A simple nonparametric test of predictive performance. Journal of Business & Economic Statistics, 10(4), 461-465.
- Blaskowitz, O., & Herwartz, H. (2011). On economic evaluation of directional forecasts. International Journal of Forecasting, 27(4), 1058-1065.
- Nyberg, H. (2011). Forecasting the direction of the US stock market with dynamic binary probit models. International Journal of Forecasting, 27(2), 561-578.