
Week 3: Fine Control, Guardrails, and Output Reliability

Core Concepts

1. Function Calling / Tool Calling

What It Is

Function calling (also called tool calling) is a mechanism that allows LLMs to:

  • Return structured data instead of free-form text
  • Trigger specific functions with well-defined parameters
  • Bridge the gap between natural language and executable code

Instead of getting: "The user's email is john@example.com and they want 5 items"
You get:

{
  "email": "john@example.com",
  "quantity": 5
}

Why It Improves Reliability

Before function calling:

  • Parse free-form text with regex or heuristics
  • Handle inconsistent formatting ("5" vs "five" vs "5 items")
  • Miss edge cases (what if the LLM decides to be creative?)

With function calling:

  • LLM must conform to a predefined schema
  • Output is machine-readable by design
  • Type safety is enforced at the API level

Real-world example:

# Without function calling (fragile)
response = "Based on the invoice, the total is $1,250.50 and it's due on March 15th"
# Now you need to parse this... good luck with edge cases!

# With function calling (robust)
{
  "total_amount": 1250.50,
  "currency": "USD",
  "due_date": "2025-03-15"
}
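The structure on the right is enforced by declaring the schema up front. A minimal sketch in the OpenAI-style "tools" format (the function name and fields here are illustrative, not taken from any real API):

```python
# Sketch of a tool definition in the OpenAI-style "tools" format.
# The function name and fields are illustrative.
extract_order = {
    "type": "function",
    "function": {
        "name": "extract_order",
        "description": "Extract order details from a customer message.",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["email", "quantity"],
        },
    },
}
```

When the model calls this tool, its arguments must be a JSON object matching `parameters` — which is exactly why the output is machine-readable by design.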

How Structured Outputs Differ from Free-Form Text

| Aspect | Free-Form Text | Structured Output |
| --- | --- | --- |
| Format | Natural language, unpredictable | JSON/XML with schema |
| Parsing | Regex, heuristics, brittle | Direct deserialization |
| Validation | Complex, error-prone | Schema-based, automatic |
| Type Safety | None | Strong typing possible |
| Reliability | Varies widely | Consistent |

Common Failure Modes

  1. Schema Violations

    • LLM returns a field with wrong type
    • Missing required fields
    • Extra unexpected fields
  2. Hallucinated Data

    • LLM "invents" information not in the source
    • Dates in wrong format (even with schema)
    • Numbers with wrong precision
  3. Partial Extraction

    • Some fields extracted, others missed
    • Empty strings vs null vs missing keys
  4. Context Window Limits

    • Input too large → truncation → incomplete extraction
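Failure mode 3 turns on a distinction that is easy to gloss over: in JSON, an empty string, an explicit null, and an absent key are three different states, and your validation has to decide how to treat each one.

```python
import json

# Three different states that look similar but behave differently
d = json.loads('{"a": "", "b": null}')

assert d["a"] == ""    # key present, value is an empty string
assert d["b"] is None  # key present, value explicitly null
assert "c" not in d    # key missing entirely
```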

Mitigation: Validation + retry logic (covered below)


2. Guardrails for LLMs

Guardrails are constraints and validation layers that ensure LLM outputs are safe, correct, and usable.

Schema Enforcement

Define what valid output looks like before the LLM runs:

from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(..., pattern=r"^INV-\d+$")
    total: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

Why this matters:

  • Catches errors immediately
  • No invalid data enters your system
  • Self-documenting code
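A quick sketch of the schema above rejecting bad output before it reaches downstream code (assumes Pydantic v2; the sample values are made up):

```python
from pydantic import BaseModel, Field, ValidationError

class InvoiceData(BaseModel):
    invoice_number: str = Field(..., pattern=r"^INV-\d+$")
    total: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

# Valid output passes straight through
ok = InvoiceData(invoice_number="INV-42", total=1250.50, due_date="2025-03-15")

# Invalid output never enters your system
n_errors = 0
try:
    InvoiceData(invoice_number="42", total=-1, due_date="March 15")
except ValidationError as e:
    n_errors = len(e.errors())  # one error per violated constraint
```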

Input/Output Constraints

Input constraints:

  • Max length (token limits)
  • Content filtering (no PII in prompts)
  • Rate limiting

Output constraints:

  • Schema validation
  • Allowed value ranges
  • Business logic checks (e.g., due_date must be in the future)
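Business-logic checks can live in the schema itself. A sketch of the due-date rule using Pydantic v2's `field_validator` (the model name is illustrative):

```python
from datetime import date
from pydantic import BaseModel, ValidationError, field_validator

class Invoice(BaseModel):
    due_date: date

    @field_validator("due_date")
    @classmethod
    def must_be_in_future(cls, v: date) -> date:
        # A schema-valid date that is in the past is still rejected
        if v <= date.today():
            raise ValueError("due_date must be in the future")
        return v
```

`Invoice(due_date="2999-01-01")` passes; a past date raises `ValidationError` even though it parses as a valid date.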

Determinism vs Creativity Tradeoffs

| Parameter | Value | Use Case |
| --- | --- | --- |
| Temperature | 0–0.2 (deterministic) | Data extraction, classification |
| Temperature | 0.7–1.0 (creative) | Content generation, brainstorming |

For production data extraction:

  • Use temperature=0 or 0.1
  • Output is still not 100% deterministic (LLM APIs exhibit some nondeterminism even at temperature 0)
  • But far more consistent
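In practice this is just a handful of request parameters. A sketch (parameter names follow the OpenAI Chat Completions API; the model name is illustrative):

```python
# Request parameters tuned for production extraction, as a plain dict.
extraction_params = {
    "model": "gpt-4o-mini",                     # illustrative model name
    "temperature": 0,                           # minimize sampling randomness
    "seed": 42,                                 # best-effort determinism where supported
    "response_format": {"type": "json_object"}, # force JSON output
}
```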

Trade-off:

  • Low temp → consistent, but may struggle with ambiguity
  • High temp → flexible, but unreliable for structured tasks

3. Output Validation

JSON Schema Validation

Level 1: Structure

# Does it parse as JSON?
import json

try:
    data = json.loads(response)
except json.JSONDecodeError:
    raise  # Fail fast: the response is not even valid JSON

Level 2: Schema Compliance

# Does it match the expected structure?
from pydantic import BaseModel, ValidationError

class Expected(BaseModel):
    name: str
    age: int

try:
    validated = Expected(**data)
except ValidationError as e:
    ...  # Retry, feeding e.errors() back to the LLM

Level 3: Business Logic

# Does it make sense in context?
if validated.age < 0 or validated.age > 150:
    ...  # Invalid even though it is schema-compliant
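The three levels compose into a single gate. A sketch (the `Person` model and the age bounds are illustrative):

```python
import json
from typing import Optional
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

def validate_response(response: str) -> Optional[Person]:
    # Level 1: does it parse as JSON?
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return None
    # Level 2: does it match the schema?
    try:
        person = Person(**data)
    except ValidationError:
        return None
    # Level 3: does it make sense in context?
    if not (0 <= person.age <= 150):
        return None
    return person
```

Returning None keeps the sketch short; in production you would raise or retry with the error details instead.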

Type Safety

Without Pydantic:

data = {"age": "25"}  # String, not int!
result = data["age"] + 5  # Runtime error or silent bug

With Pydantic:

from pydantic import BaseModel

class Person(BaseModel):
    age: int

person = Person(age="25")  # Auto-converts
person.age + 5  # Works! Type is guaranteed

Partial vs Hard Failures

Partial Failure Strategy:

  • Extract what you can
  • Mark missing fields as None or "UNKNOWN"
  • Log what failed
  • Use case: Nice-to-have data

Hard Failure Strategy:

  • All fields required
  • Fail immediately if anything is missing
  • Retry or abort
  • Use case: Critical business data (invoices, contracts)
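In Pydantic the two strategies are just two ways of declaring fields (the model names are illustrative):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class PartialContact(BaseModel):
    # Partial-failure strategy: missing fields default to None instead of erroring
    name: Optional[str] = None
    phone: Optional[str] = None

class StrictInvoice(BaseModel):
    # Hard-failure strategy: every field required; construction raises on any gap
    invoice_number: str
    total: float

partial = PartialContact(name="Ann")  # ok: phone is simply None

strict_failed = False
try:
    StrictInvoice(invoice_number="INV-1")  # total missing
except ValidationError:
    strict_failed = True
```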

When to Retry vs Fail Fast:

| Scenario | Strategy | Reason |
| --- | --- | --- |
| Schema violation | Retry (1-3 times) | Might be a one-off LLM error |
| Timeout / rate limit | Retry (with backoff) | Transient issue |
| Invalid API key | Fail fast | Won't resolve on its own |
| Consistent schema errors | Fail fast (after N retries) | Schema might be wrong |
| Hallucinated data | Retry with stronger prompt | May need more context |
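The retry rows above reduce to a small helper: retry the listed transient errors with exponential backoff, re-raise everything else immediately. A sketch (the function name and retriable exception set are illustrative):

```python
import time

def call_with_retries(fn, max_retries=3, base_delay=1.0,
                      retriable=(TimeoutError,)):
    """Call fn, retrying only listed transient errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retriable:
            if attempt == max_retries - 1:
                raise  # fail fast after N retries
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

A non-retriable error such as an invalid API key propagates on the first attempt — exactly the fail-fast row in the table.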

How These Concepts Connect to the Project

In our LLM-Powered Data Extractor, you'll see:

  1. Function Calling: The LLM returns JSON conforming to Pydantic schemas
  2. Guardrails: Input validation, schema enforcement, temperature control
  3. Validation: Pydantic models catch errors; retry logic fixes them
  4. Determinism: Low temperature ensures consistent extraction
  5. Failure Handling: Automatic retries with error feedback, graceful degradation

This transforms the LLM from a "smart text generator" into a reliable system component.


Why This Matters in Real Systems

Without these techniques:

  • 80% success rate → 20% of data corrupted
  • Manual cleanup required
  • Can't trust LLM output in pipelines
  • Every edge case needs custom code

With these techniques:

  • 99%+ success rate (with retries)
  • Automatic error recovery
  • LLM output is production-ready
  • One validation layer handles all cases

The goal: Move from "works most of the time" to "works reliably enough to build on."