
Week 3: Fine Control, Guardrails, and Output Reliability

Core Concepts

1. Function Calling / Tool Calling

What It Is

Function calling (also called tool calling) is a mechanism that allows LLMs to:

  • Return structured data instead of free-form text
  • Trigger specific functions with well-defined parameters
  • Bridge the gap between natural language and executable code

Instead of getting: "The user's email is john@example.com and they want 5 items"
You get:

{
  "email": "john@example.com",
  "quantity": 5
}

Why It Improves Reliability

Before function calling:

  • Parse free-form text with regex or heuristics
  • Handle inconsistent formatting ("5" vs "five" vs "5 items")
  • Miss edge cases (what if the LLM decides to be creative?)

With function calling:

  • LLM must conform to a predefined schema
  • Output is machine-readable by design
  • Type safety is enforced at the API level

Real-world example:

# Without function calling (fragile)
response = "Based on the invoice, the total is $1,250.50 and it's due on March 15th"
# Now you need to parse this... good luck with edge cases!

# With function calling (robust)
{
  "total_amount": 1250.50,
  "currency": "USD",
  "due_date": "2025-03-15"
}
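The structure on the right is enforced by declaring the schema up front. A minimal sketch in the OpenAI-style "tools" format (the function name and fields here are illustrative, not taken from any real API):

```python
# Sketch of a tool definition in the OpenAI-style "tools" format.
# The function name and fields are illustrative.
extract_order = {
    "type": "function",
    "function": {
        "name": "extract_order",
        "description": "Extract order details from a customer message.",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["email", "quantity"],
        },
    },
}
```

When the model calls this tool, its arguments must be a JSON object matching `parameters` — which is exactly why the output is machine-readable by design.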

How Structured Outputs Differ from Free-Form Text

| Aspect | Free-Form Text | Structured Output |
| --- | --- | --- |
| Format | Natural language, unpredictable | JSON/XML with schema |
| Parsing | Regex, heuristics, brittle | Direct deserialization |
| Validation | Complex, error-prone | Schema-based, automatic |
| Type Safety | None | Strong typing possible |
| Reliability | Varies widely | Consistent |

Common Failure Modes

  1. Schema Violations

    • LLM returns a field with wrong type
    • Missing required fields
    • Extra unexpected fields
  2. Hallucinated Data

    • LLM "invents" information not in the source
    • Dates in wrong format (even with schema)
    • Numbers with wrong precision
  3. Partial Extraction

    • Some fields extracted, others missed
    • Empty strings vs null vs missing keys
  4. Context Window Limits

    • Input too large → truncation → incomplete extraction
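Failure mode 3 turns on a distinction that is easy to gloss over: in JSON, an empty string, an explicit null, and an absent key are three different states, and your validation has to decide how to treat each one.

```python
import json

# Three different states that look similar but behave differently
d = json.loads('{"a": "", "b": null}')

assert d["a"] == ""    # key present, value is an empty string
assert d["b"] is None  # key present, value explicitly null
assert "c" not in d    # key missing entirely
```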

Mitigation: Validation + retry logic (covered below)


2. Guardrails for LLMs

Guardrails are constraints and validation layers that ensure LLM outputs are safe, correct, and usable.

Schema Enforcement

Define what valid output looks like before the LLM runs:

from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(..., pattern=r"^INV-\d+$")
    total: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

Why this matters:

  • Catches errors immediately
  • No invalid data enters your system
  • Self-documenting code
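A quick sketch of the schema above rejecting bad output before it reaches downstream code (assumes Pydantic v2; the sample values are made up):

```python
from pydantic import BaseModel, Field, ValidationError

class InvoiceData(BaseModel):
    invoice_number: str = Field(..., pattern=r"^INV-\d+$")
    total: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

# Valid output passes straight through
ok = InvoiceData(invoice_number="INV-42", total=1250.50, due_date="2025-03-15")

# Invalid output never enters your system
n_errors = 0
try:
    InvoiceData(invoice_number="42", total=-1, due_date="March 15")
except ValidationError as e:
    n_errors = len(e.errors())  # one error per violated constraint
```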

Input/Output Constraints

Input constraints:

  • Max length (token limits)
  • Content filtering (no PII in prompts)
  • Rate limiting

Output constraints:

  • Schema validation
  • Allowed value ranges
  • Business logic checks (e.g., due_date must be in the future)
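Business-logic checks can live in the schema itself. A sketch of the due-date rule using Pydantic v2's `field_validator` (the model name is illustrative):

```python
from datetime import date
from pydantic import BaseModel, ValidationError, field_validator

class Invoice(BaseModel):
    due_date: date

    @field_validator("due_date")
    @classmethod
    def must_be_in_future(cls, v: date) -> date:
        # A schema-valid date that is in the past is still rejected
        if v <= date.today():
            raise ValueError("due_date must be in the future")
        return v
```

`Invoice(due_date="2999-01-01")` passes; a past date raises `ValidationError` even though it parses as a valid date.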

Determinism vs Creativity Tradeoffs

| Parameter | Value | Use Case |
| --- | --- | --- |
| Temperature | 0–0.2 (deterministic) | Data extraction, classification |
| Temperature | 0.7–1.0 (creative) | Content generation, brainstorming |

For production data extraction:

  • Use temperature=0 or 0.1
  • Output is still not 100% deterministic (LLM APIs exhibit some nondeterminism even at temperature 0)
  • But far more consistent
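In practice this is just a handful of request parameters. A sketch (parameter names follow the OpenAI Chat Completions API; the model name is illustrative):

```python
# Request parameters tuned for production extraction, as a plain dict.
extraction_params = {
    "model": "gpt-4o-mini",                     # illustrative model name
    "temperature": 0,                           # minimize sampling randomness
    "seed": 42,                                 # best-effort determinism where supported
    "response_format": {"type": "json_object"}, # force JSON output
}
```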

Trade-off:

  • Low temp → consistent, but may struggle with ambiguity
  • High temp → flexible, but unreliable for structured tasks

3. Output Validation

JSON Schema Validation

Level 1: Structure

# Does it parse as JSON?
import json

try:
    data = json.loads(response)
except json.JSONDecodeError:
    raise  # Fail fast: the response is not even valid JSON

Level 2: Schema Compliance

# Does it match the expected structure?
from pydantic import BaseModel, ValidationError

class Expected(BaseModel):
    name: str
    age: int

try:
    validated = Expected(**data)
except ValidationError as e:
    ...  # Retry, feeding e.errors() back to the LLM

Level 3: Business Logic

# Does it make sense in context?
if validated.age < 0 or validated.age > 150:
    ...  # Invalid even though it is schema-compliant
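The three levels compose into a single gate. A sketch (the `Person` model and the age bounds are illustrative):

```python
import json
from typing import Optional
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

def validate_response(response: str) -> Optional[Person]:
    # Level 1: does it parse as JSON?
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return None
    # Level 2: does it match the schema?
    try:
        person = Person(**data)
    except ValidationError:
        return None
    # Level 3: does it make sense in context?
    if not (0 <= person.age <= 150):
        return None
    return person
```

Returning None keeps the sketch short; in production you would raise or retry with the error details instead.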

Type Safety

Without Pydantic:

data = {"age": "25"}  # String, not int!
result = data["age"] + 5  # Runtime error or silent bug

With Pydantic:

from pydantic import BaseModel

class Person(BaseModel):
    age: int

person = Person(age="25")  # Auto-converts
person.age + 5  # Works! Type is guaranteed

Partial vs Hard Failures

Partial Failure Strategy:

  • Extract what you can
  • Mark missing fields as None or "UNKNOWN"
  • Log what failed
  • Use case: Nice-to-have data

Hard Failure Strategy:

  • All fields required
  • Fail immediately if anything is missing
  • Retry or abort
  • Use case: Critical business data (invoices, contracts)
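In Pydantic the two strategies are just two ways of declaring fields (the model names are illustrative):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class PartialContact(BaseModel):
    # Partial-failure strategy: missing fields default to None instead of erroring
    name: Optional[str] = None
    phone: Optional[str] = None

class StrictInvoice(BaseModel):
    # Hard-failure strategy: every field required; construction raises on any gap
    invoice_number: str
    total: float

partial = PartialContact(name="Ann")  # ok: phone is simply None

strict_failed = False
try:
    StrictInvoice(invoice_number="INV-1")  # total missing
except ValidationError:
    strict_failed = True
```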

When to Retry vs Fail Fast:

| Scenario | Strategy | Reason |
| --- | --- | --- |
| Schema violation | Retry (1-3 times) | Might be a one-off LLM error |
| Timeout / rate limit | Retry (with backoff) | Transient issue |
| Invalid API key | Fail fast | Won't resolve on its own |
| Consistent schema errors | Fail fast (after N retries) | Schema might be wrong |
| Hallucinated data | Retry with stronger prompt | May need more context |
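The retry rows above reduce to a small helper: retry the listed transient errors with exponential backoff, re-raise everything else immediately. A sketch (the function name and retriable exception set are illustrative):

```python
import time

def call_with_retries(fn, max_retries=3, base_delay=1.0,
                      retriable=(TimeoutError,)):
    """Call fn, retrying only listed transient errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retriable:
            if attempt == max_retries - 1:
                raise  # fail fast after N retries
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

A non-retriable error such as an invalid API key propagates on the first attempt — exactly the fail-fast row in the table.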

How These Concepts Connect to the Project

In our LLM-Powered Data Extractor, you'll see:

  1. Function Calling: The LLM returns JSON conforming to Pydantic schemas
  2. Guardrails: Input validation, schema enforcement, temperature control
  3. Validation: Pydantic models catch errors; retry logic fixes them
  4. Determinism: Low temperature ensures consistent extraction
  5. Failure Handling: Automatic retries with error feedback, graceful degradation

This transforms the LLM from a "smart text generator" into a reliable system component.


Why This Matters in Real Systems

Without these techniques:

  • 80% success rate → 20% of data corrupted
  • Manual cleanup required
  • Can't trust LLM output in pipelines
  • Every edge case needs custom code

With these techniques:

  • 99%+ success rate (with retries)
  • Automatic error recovery
  • LLM output is production-ready
  • One validation layer handles all cases

The goal: Move from "works most of the time" to "works reliably enough to build on."