feat: Add GuardComponent framework for pluggable input/output safety guards

## Summary

Flock should provide a **generic guard component framework** — an abstract base class `GuardComponent` with a clear contract for scanning inputs and outputs — that allows plugging in any content safety or prompt shielding backend. The first built-in implementation would be an `AzurePromptShieldGuard` for [Azure AI Content Safety Prompt Shields](https://learn.microsoft.com/azure/ai-services/content-safety/concepts/jailbreak-detection), but the framework is designed to be open for other backends.

## Motivation

Flock's blackboard architecture means artifacts published by one agent become input context for downstream agents. A malicious or compromised artifact can contain indirect prompt injection that manipulates a downstream agent's behavior. Guard components that scan both user prompts and context documents are essential for production multi-agent deployments.

Currently Flock has **no built-in content safety capability**. However, the existing `AgentComponent` lifecycle hooks (`on_pre_evaluate` / `on_post_evaluate`) and priority-based execution order already provide the right integration points — no new architectural patterns needed.

## Proposed implementation

### `GuardComponent` — Abstract base class

Extends `AgentComponent`. Concrete implementations only override `scan_input()` and/or `scan_output()`, returning a `GuardVerdict`. The base class handles lifecycle integration, text extraction from artifacts, and verdict routing (block / warn / annotate).

```python
class GuardVerdict:
    safe: bool
    reason: str | None = None
    details: dict[str, Any] = {}
    provider: str = "unknown"

class GuardComponentConfig(AgentComponentConfig):
    on_input_flagged: Literal["block", "warn", "annotate"] = "block"
    on_output_flagged: Literal["block", "warn", "annotate"] = "warn"
    scan_input: bool = True
    scan_output: bool = False
    scan_context_artifacts: bool = True

class GuardComponent(AgentComponent, abc.ABC):
    @abc.abstractmethod
    async def scan_input(self, text: str, documents: list[str] | None = None, **kwargs) -> GuardVerdict: ...

    async def scan_output(self, text: str, **kwargs) -> GuardVerdict:
        return GuardVerdict(safe=True)  # default pass-through

    # on_pre_evaluate / on_post_evaluate wiring handled by base class
```

The `scan_input(text, documents)` signature takes raw strings rather than Flock-specific types, so guard implementations stay backend-focused and don't need to understand Flock internals.

### `AzurePromptShieldGuard` — First built-in implementation

Calls the Prompt Shields REST API (`POST /contentsafety/text:shieldPrompt?api-version=2024-09-01`) to detect direct jailbreak attacks in user prompts and indirect injection in context documents. Supports both API key and Managed Identity authentication.

```python
class AzurePromptShieldGuard(GuardComponent):
    name: str = "azure_prompt_shield"
    config: AzurePromptShieldConfig  # adds endpoint, max_document_length

    async def scan_input(self, text, documents=None, **kwargs) -> GuardVerdict:
        result = await self._call_api(text, documents or [])
        user_attack = result["userPromptAnalysis"]["attackDetected"]
        doc_attacks = [d["attackDetected"] for d in result.get("documentsAnalysis", [])]

        if user_attack or any(doc_attacks):
            return GuardVerdict(safe=False, reason="Prompt attack detected", 
                                details={"user_attack": user_attack, "doc_attacks": doc_attacks},
                                provider=self.name)
        return GuardVerdict(safe=True, provider=self.name)
```

> **Note:** The stable `azure-ai-contentsafety` Python SDK (1.0.0) does not expose a dedicated `shield_prompt` method. The implementation calls the REST API directly via `aiohttp`. Should the SDK add native support, the implementation should be updated accordingly.

### Usage

```python
agent = Agent(
    name="support_agent",
    engine=DSPyEngine(model="azure/gpt-4.1"),
    components=[
        AzurePromptShieldGuard(
            priority=-10,  # run before other components
            config=AzurePromptShieldConfig(
                on_input_flagged="block",
                scan_context_artifacts=True,
            ),
        ),
    ],
)
```

Multiple guards compose naturally as separate components — Flock's priority-based ordering ensures they execute sequentially, and if one blocks, the exception propagates before subsequent guards or the engine run.

## Related

- #377 — Azure Managed Identity authentication (shares `azure-identity` dependency)
- [Azure AI Content Safety — Prompt Shields](https://learn.microsoft.com/azure/ai-services/content-safety/concepts/jailbreak-detection)
- [Prompt Shields REST API reference](https://learn.microsoft.com/azure/ai-services/content-safety/quickstart-jailbreak)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add GuardComponent framework for pluggable input/output safety guards #378

Summary

Motivation

Proposed implementation

`GuardComponent` — Abstract base class

`AzurePromptShieldGuard` — First built-in implementation

Usage

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat: Add GuardComponent framework for pluggable input/output safety guards #378

Description

Summary

Motivation

Proposed implementation

GuardComponent — Abstract base class

AzurePromptShieldGuard — First built-in implementation

Usage

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`GuardComponent` — Abstract base class

`AzurePromptShieldGuard` — First built-in implementation