Summary
Flock should provide a generic guard component framework — an abstract base class GuardComponent with a clear contract for scanning inputs and outputs — that allows plugging in any content safety or prompt shielding backend. The first built-in implementation would be an AzurePromptShieldGuard for Azure AI Content Safety Prompt Shields, but the framework is designed to be open for other backends.
Motivation
Flock's blackboard architecture means artifacts published by one agent become input context for downstream agents. A malicious or compromised artifact can contain indirect prompt injection that manipulates a downstream agent's behavior. Guard components that scan both user prompts and context documents are essential for production multi-agent deployments.
Currently Flock has no built-in content safety capability. However, the existing AgentComponent lifecycle hooks (on_pre_evaluate / on_post_evaluate) and priority-based execution order already provide the right integration points — no new architectural patterns needed.
Proposed implementation
GuardComponent — Abstract base class
Extends AgentComponent. Concrete implementations only override scan_input() and/or scan_output(), returning a GuardVerdict. The base class handles lifecycle integration, text extraction from artifacts, and verdict routing (block / warn / annotate).
class GuardVerdict:
safe: bool
reason: str | None = None
details: dict[str, Any] = {}
provider: str = "unknown"
class GuardComponentConfig(AgentComponentConfig):
on_input_flagged: Literal["block", "warn", "annotate"] = "block"
on_output_flagged: Literal["block", "warn", "annotate"] = "warn"
scan_input: bool = True
scan_output: bool = False
scan_context_artifacts: bool = True
class GuardComponent(AgentComponent, abc.ABC):
@abc.abstractmethod
async def scan_input(self, text: str, documents: list[str] | None = None, **kwargs) -> GuardVerdict: ...
async def scan_output(self, text: str, **kwargs) -> GuardVerdict:
return GuardVerdict(safe=True) # default pass-through
# on_pre_evaluate / on_post_evaluate wiring handled by base class
The scan_input(text, documents) signature takes raw strings rather than Flock-specific types, so guard implementations stay backend-focused and don't need to understand Flock internals.
AzurePromptShieldGuard — First built-in implementation
Calls the Prompt Shields REST API (POST /contentsafety/text:shieldPrompt?api-version=2024-09-01) to detect direct jailbreak attacks in user prompts and indirect injection in context documents. Supports both API key and Managed Identity authentication.
class AzurePromptShieldGuard(GuardComponent):
name: str = "azure_prompt_shield"
config: AzurePromptShieldConfig # adds endpoint, max_document_length
async def scan_input(self, text, documents=None, **kwargs) -> GuardVerdict:
result = await self._call_api(text, documents or [])
user_attack = result["userPromptAnalysis"]["attackDetected"]
doc_attacks = [d["attackDetected"] for d in result.get("documentsAnalysis", [])]
if user_attack or any(doc_attacks):
return GuardVerdict(safe=False, reason="Prompt attack detected",
details={"user_attack": user_attack, "doc_attacks": doc_attacks},
provider=self.name)
return GuardVerdict(safe=True, provider=self.name)
Note: The stable azure-ai-contentsafety Python SDK (1.0.0) does not expose a dedicated shield_prompt method. The implementation calls the REST API directly via aiohttp. Should the SDK add native support, the implementation should be updated accordingly.
Usage
agent = Agent(
name="support_agent",
engine=DSPyEngine(model="azure/gpt-4.1"),
components=[
AzurePromptShieldGuard(
priority=-10, # run before other components
config=AzurePromptShieldConfig(
on_input_flagged="block",
scan_context_artifacts=True,
),
),
],
)
Multiple guards compose naturally as separate components — Flock's priority-based ordering ensures they execute sequentially, and if one blocks, the exception propagates before subsequent guards or the engine run.
Related
Summary
Flock should provide a generic guard component framework — an abstract base class
GuardComponentwith a clear contract for scanning inputs and outputs — that allows plugging in any content safety or prompt shielding backend. The first built-in implementation would be anAzurePromptShieldGuardfor Azure AI Content Safety Prompt Shields, but the framework is designed to be open for other backends.Motivation
Flock's blackboard architecture means artifacts published by one agent become input context for downstream agents. A malicious or compromised artifact can contain indirect prompt injection that manipulates a downstream agent's behavior. Guard components that scan both user prompts and context documents are essential for production multi-agent deployments.
Currently Flock has no built-in content safety capability. However, the existing
AgentComponentlifecycle hooks (on_pre_evaluate/on_post_evaluate) and priority-based execution order already provide the right integration points — no new architectural patterns needed.Proposed implementation
GuardComponent— Abstract base classExtends
AgentComponent. Concrete implementations only overridescan_input()and/orscan_output(), returning aGuardVerdict. The base class handles lifecycle integration, text extraction from artifacts, and verdict routing (block / warn / annotate).The
scan_input(text, documents)signature takes raw strings rather than Flock-specific types, so guard implementations stay backend-focused and don't need to understand Flock internals.AzurePromptShieldGuard— First built-in implementationCalls the Prompt Shields REST API (
POST /contentsafety/text:shieldPrompt?api-version=2024-09-01) to detect direct jailbreak attacks in user prompts and indirect injection in context documents. Supports both API key and Managed Identity authentication.Usage
Multiple guards compose naturally as separate components — Flock's priority-based ordering ensures they execute sequentially, and if one blocks, the exception propagates before subsequent guards or the engine run.
Related
azure-identitydependency)