Function calling (also called tool calling) is a mechanism that allows LLMs to:
- Return structured data instead of free-form text
- Trigger specific functions with well-defined parameters
- Bridge the gap between natural language and executable code
Instead of getting: "The user's email is john@example.com and they want 5 items"
You get:
```json
{
  "email": "john@example.com",
  "quantity": 5
}
```

Before function calling:
- Parse free-form text with regex or heuristics
- Handle inconsistent formatting ("5" vs "five" vs "5 items")
- Miss edge cases (what if the LLM decides to be creative?)
With function calling:
- LLM must conform to a predefined schema
- Output is machine-readable by design
- Type safety is enforced at the API level
Real-world example:
```python
# Without function calling (fragile)
response = "Based on the invoice, the total is $1,250.50 and it's due on March 15th"
# Now you need to parse this... good luck with edge cases!
```

With function calling (robust):

```json
{
  "total_amount": 1250.50,
  "currency": "USD",
  "due_date": "2025-03-15"
}
```

| Aspect | Free-Form Text | Structured Output |
|---|---|---|
| Format | Natural language, unpredictable | JSON/XML with schema |
| Parsing | Regex, heuristics, brittle | Direct deserialization |
| Validation | Complex, error-prone | Schema-based, automatic |
| Type Safety | None | Strong typing possible |
| Reliability | Varies widely | Consistent |
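To make this concrete, here is a minimal sketch of a function-calling request using the OpenAI Python SDK (the `record_order` tool and the model name are illustrative assumptions; other providers expose an equivalent mechanism):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# JSON Schema the model's arguments must conform to
tools = [{
    "type": "function",
    "function": {
        "name": "record_order",  # illustrative tool name
        "description": "Record a customer's order",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["email", "quantity"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user",
               "content": "The user's email is john@example.com and they want 5 items"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "record_order"}},  # force this tool
)

# Arguments come back as a JSON string conforming to the schema
args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(args)  # {'email': 'john@example.com', 'quantity': 5}
```

Forcing the tool via `tool_choice` is what turns the model from a text generator into a structured-output endpoint.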
- **Schema Violations**
  - LLM returns a field with the wrong type
  - Missing required fields
  - Extra unexpected fields
- **Hallucinated Data**
  - LLM "invents" information not in the source
  - Dates in the wrong format (even with a schema)
  - Numbers with the wrong precision
- **Partial Extraction**
  - Some fields extracted, others missed
  - Empty strings vs. null vs. missing keys
- **Context Window Limits**
  - Input too large → truncation → incomplete extraction
Mitigation: Validation + retry logic (covered below)
Guardrails are constraints and validation layers that ensure LLM outputs are safe, correct, and usable.
Define what valid output looks like before the LLM runs:
```python
from pydantic import BaseModel, Field

class InvoiceData(BaseModel):
    invoice_number: str = Field(..., pattern=r"^INV-\d+$")
    total: float = Field(..., gt=0)  # Must be positive
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")
```

Why this matters:
- Catches errors immediately
- No invalid data enters your system
- Self-documenting code
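For instance, an out-of-range value fails at construction time, so it never reaches downstream code (a sketch using the `InvoiceData` model above):

```python
from pydantic import ValidationError

try:
    InvoiceData(invoice_number="INV-001", total=-10.0, due_date="2025-03-15")
except ValidationError as e:
    print(e)  # total fails the gt=0 constraint -> rejected immediately
```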
Input constraints:
- Max length (token limits)
- Content filtering (no PII in prompts)
- Rate limiting
Output constraints:
- Schema validation
- Allowed value ranges
- Business logic checks (e.g., due_date must be in the future)
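As a sketch of an output-side guardrail, a business-logic check like the future-due-date rule can be attached to the schema itself, using Pydantic v2's `field_validator` (fields trimmed to the relevant one):

```python
from datetime import date
from pydantic import BaseModel, Field, field_validator

class InvoiceData(BaseModel):
    due_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")

    @field_validator("due_date")
    @classmethod
    def due_date_in_future(cls, v: str) -> str:
        # Schema-valid dates can still violate business rules
        if date.fromisoformat(v) <= date.today():
            raise ValueError("due_date must be in the future")
        return v
```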
| Temperature | Behavior | Use Case |
|---|---|---|
| 0–0.2 | Deterministic | Data extraction, classification |
| 0.7–1.0 | Creative | Content generation, brainstorming |
For production data extraction:
- Use `temperature=0` or `0.1`
- Still not 100% deterministic (LLM API behavior)
- But far more consistent
Trade-off:
- Low temp → consistent, but may struggle with ambiguity
- High temp → flexible, but unreliable for structured tasks
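As a minimal sketch, pinning the temperature with the OpenAI Python SDK (the model name is an illustrative assumption; other providers take an equivalent parameter):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    temperature=0,        # deterministic-leaning extraction
    messages=[{"role": "user", "content": "Extract the invoice fields as JSON: ..."}],
)
```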
Level 1: Structure
```python
# Does it parse as JSON?
import json

try:
    data = json.loads(response)
except json.JSONDecodeError:
    ...  # Fail fast
```

Level 2: Schema Compliance

```python
# Does it match the expected structure?
from pydantic import BaseModel, ValidationError

class Expected(BaseModel):
    name: str
    age: int

try:
    validated = Expected(**data)
except ValidationError as e:
    ...  # Retry with error details
```

Level 3: Business Logic

```python
# Does it make sense in context?
if validated.age < 0 or validated.age > 150:
    ...  # Invalid even if schema-compliant
```

Without Pydantic:
data = {"age": "25"} # String, not int!
result = data["age"] + 5 # Runtime error or silent bugWith Pydantic:
from pydantic import BaseModel
class Person(BaseModel):
age: int
person = Person(age="25") # Auto-converts
person.age + 5 # Works! Type is guaranteedPartial Failure Strategy:
- Extract what you can
- Mark missing fields as `None` or `"UNKNOWN"`
- Log what failed
- Use case: Nice-to-have data
Hard Failure Strategy:
- All fields required
- Fail immediately if anything is missing
- Retry or abort
- Use case: Critical business data (invoices, contracts)
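One way to encode the two strategies directly in the schema (model and field names are illustrative):

```python
from typing import Optional
from pydantic import BaseModel

# Partial failure: optional fields default to None; callers log what's missing
class EnrichmentData(BaseModel):
    customer_note: Optional[str] = None
    po_number: Optional[str] = None

# Hard failure: every field is required; a missing field raises ValidationError
class CriticalInvoice(BaseModel):
    invoice_number: str
    total: float
    due_date: str
```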
When to Retry vs Fail Fast:
| Scenario | Strategy | Reason |
|---|---|---|
| Schema violation | Retry (1-3 times) | Might be a one-off LLM error |
| Timeout / rate limit | Retry (with backoff) | Transient issue |
| Invalid API key | Fail fast | Won't resolve on its own |
| Consistent schema errors | Fail fast (after N retries) | Schema might be wrong |
| Hallucinated data | Retry with stronger prompt | May need more context |
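Putting the table into code, a minimal retry loop that feeds validation errors back into the prompt might look like this (`call_llm` is a hypothetical wrapper around your LLM client; `InvoiceData` is the model defined earlier):

```python
import json
from pydantic import ValidationError

MAX_RETRIES = 3

def extract_with_retries(prompt: str) -> InvoiceData:
    last_error = ""
    for attempt in range(MAX_RETRIES):
        feedback = f"\nYour previous output was invalid: {last_error}" if last_error else ""
        raw = call_llm(prompt + feedback)  # hypothetical LLM call
        try:
            return InvoiceData(**json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as e:
            last_error = str(e)  # error details go back to the model on the next attempt
    raise RuntimeError(f"Extraction failed after {MAX_RETRIES} attempts: {last_error}")
```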
In our LLM-Powered Data Extractor, you'll see:
- Function Calling: The LLM returns JSON conforming to Pydantic schemas
- Guardrails: Input validation, schema enforcement, temperature control
- Validation: Pydantic models catch errors; retry logic fixes them
- Determinism: Low temperature ensures consistent extraction
- Failure Handling: Automatic retries with error feedback, graceful degradation
This transforms the LLM from a "smart text generator" into a reliable system component.
Without these techniques:
- 80% success rate → 20% of data corrupted
- Manual cleanup required
- Can't trust LLM output in pipelines
- Every edge case needs custom code
With these techniques:
- 99%+ success rate (with retries)
- Automatic error recovery
- LLM output is production-ready
- One validation layer handles all cases
The goal: Move from "works most of the time" to "works reliably enough to build on."