Stop reading about agents. Start building them.
This is the repo for engineers who want to understand what's behind popular agents like Claude Code, Codex, and GitHub Copilot, and then build one themselves. From your first LLM call to a production eval harness.
Building AI agents is engineering, not magic. Master the constraints, not the hype.
How many of you actually can pull out a whiteboard and build me an agent? Can you show me the inferencing loop?
If you don't know this, your career is in jeopardy.
What is a tool call? If you don't know what that is, you need to learn what it is and all these basic fundamentals. I preference candidates if they know what a tool call is, how the inferencing loop works, pull out a whiteboard — the same way we used to say, show me a linked list, reverse me this data structure.
This is now baseline knowledge because we're getting candidates in that can answer this stuff.
— Geoffrey Huntley, creator of Ralph Wiggum
Agent fluency is the new data-structures interview. We teach it from first principles - you build the loop, the tool calls, the memory, and the evals yourself before we ever introduce a framework. No magic. No black boxes. Just the primitives, in the order they were invented.
💡 None of this requires fancy frameworks. Just an LLM API, some tools, and a loop. Build one this weekend. You'll understand agents better than reading 100 blog posts.
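The claim above is literal: an agent is a loop around an LLM call that executes tools until the model stops asking for them. Here is a minimal sketch of that loop with the model stubbed out as a plain function so the control flow is visible; the stub, tool, and message shapes are illustrative, and the tutorials replace the stub with a real Anthropic/OpenAI API call.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool the agent can call."""
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def stub_model(messages):
    """Stand-in for the LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Berlin"}}}
    return {"final": "It's 21°C in Berlin."}

def agent_loop(user_prompt: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = stub_model(messages)
        if "final" in reply:              # model is done: return its answer
            return reply["final"]
        call = reply["tool_call"]         # otherwise: run the requested tool
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})  # feed result back
    raise RuntimeError("agent did not finish within max_steps")

print(agent_loop("What's the weather in Berlin?"))
```

Everything else in this repo — memory, routing, evals — is layered on top of this one loop.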
No prior AI/ML experience required - just Python basics and curiosity about building LLM-powered agents.
- We take production agents apart. The Disassembling AI Agents Substack series reverse-engineers Claude Code, GitHub Copilot, and OpenCode. You read how real agents work, then rebuild the pieces here.
- First principles, no black boxes. You build the agent loop, the tool executor, the memory layer, the eval harness from scratch — before we introduce a single framework. Learn what each abstraction is hiding before you let one hide it for you.
- Runnable in one command. `uv run --directory <tutorial> python <script>.py`. No conda dance. No Jupyter kernel hunt.
```shell
brew install uv  # or: pipx install uv
git clone https://github.com/agenticloops-ai/agentic-ai-engineering.git
cd agentic-ai-engineering
cp .env.example .env  # add your Anthropic and/or OpenAI keys
uv run --directory 01-foundations/01-simple-llm-call python 01_llm_call_anthropic.py
```

That's it. Every tutorial is self-contained and idempotent — you can jump in anywhere. Full setup details in SETUP.md. Or open in Codespaces and skip local setup entirely.
If you find this useful, a ⭐️ star helps us know we're on the right track. Join the 💬 discussion or report an 🐛 issue — your input directly shapes what we build next.
The tutorials teach you to build. Our Substack gives you the mental model first - a foundational primer on how agents actually work, followed by teardowns of real production agents you use every day. Read the post. Open the tutorial. Rebuild the pattern.
How Agents Work: The Patterns Behind the Magic - the core agentic loop from first principles. The four pattern levels (one-shot → single-tool → ReAct → planning), the role of the system prompt as behavioral design, and Ralph Mode as the outer loop. If you read one thing before opening the repo, read this. Pairs with → 01-foundations.
The tutorials are organized into modules (01-foundations, 02-effective-agents) that progress from basics to advanced concepts. Each module contains numbered tutorials that build on previous lessons. Inside each tutorial folder, you'll find:
- Python scripts — Self-contained, runnable examples demonstrating key concepts
- README.md — Detailed explanations, code walkthroughs, and learning objectives
You can explore individual scripts independently or follow the complete learning path from start to finish. Each module ends with a project that combines all concepts from the module into a single, production-style agent.
Your first steps — from a single API call to a fully autonomous agent loop. Build everything from scratch to understand what's really happening under the hood.
- Simple LLM Call — First API call with token tracking
- Prompt Engineering — Guide model behavior
- Chat — Interactive chat with message history
- Tool Use — Enable function calling
- Agent Loop — Autonomous tool-using agents
- Codebase Navigator — The Augmented LLM with RAG, tools, and memory
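Mechanically, the "Tool Use" step above means two things: a JSON Schema the model sees, and a Python function you run when the model emits a matching tool-use block. A sketch, using the Anthropic Messages API tool format (`name`, `description`, `input_schema`, and a `tool_result` echoed back under the same `tool_use` id); the `read_file` tool itself is illustrative:

```python
def read_file(path: str) -> str:
    """Hypothetical tool body. The model never runs this — your code does."""
    with open(path) as f:
        return f.read()

# What the model sees: a name, a description, and a JSON Schema for arguments.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File path"}},
        "required": ["path"],
    },
}

def execute_tool_call(block: dict) -> dict:
    """Dispatch a parsed tool_use block to the matching Python function."""
    registry = {"read_file": read_file}
    output = registry[block["name"]](**block["input"])
    # The result goes back to the model, tagged with the originating call's id.
    return {"type": "tool_result", "tool_use_id": block["id"], "content": output}
```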
Architectural patterns that separate toy demos from real agents. Based on Anthropic's "Building Effective Agents" — learn when to chain, route, parallelize, or delegate.
- Prompt Chaining — Sequential multi-step pipelines
- Routing — Classify input, dispatch to specialized handlers
- Parallelization — Fan-out/fan-in, parallel tool calls
- Orchestrator-Workers — Dynamic task decomposition
- Evaluator-Optimizer — Self-critique, iterative refinement
- Human in the Loop — Approval gates, escalation, feedback
- Content Writer — Full agent composing all agentic workflow patterns
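To make the Routing pattern concrete: classify the input, then dispatch to a specialized handler. In this sketch the classifier is a keyword stub and the handlers are trivial lambdas, both illustrative; in the tutorial the classifier is a cheap LLM call that returns one label, and each handler is a specialized prompt or sub-agent.

```python
def classify(text: str) -> str:
    """Stub classifier — the tutorial replaces this with an LLM call."""
    text = text.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"

# One specialized handler per label. In a real agent these are distinct
# prompts, tools, or sub-agents tuned for their category.
HANDLERS = {
    "billing": lambda q: f"[billing agent] {q}",
    "technical": lambda q: f"[tech agent] {q}",
    "general": lambda q: f"[generalist] {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)
```

The payoff is separation of concerns: each handler's prompt stays small and focused instead of one mega-prompt trying to cover every case.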
Practical engineering problems you'll hit the moment agents leave the prototype stage. Context, cost, memory, multimodality, safety — solved one tutorial at a time.
- Structured Output — JSON mode, schemas, constrained generation
- Streaming — SSE, token-by-token output, streaming tool calls
- Context Engineering — Window strategies, summarization, tool context
- Cost Optimization — Prompt caching, model routing
- Memory — Short-term, long-term, memory inspection
- RAG Techniques — Hybrid search, agentic retrieval
- Multimodal — Vision, image generation, audio
- Guardrails — Input/output filtering, safety patterns
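A taste of the first problem on that list: even when asked for JSON, models often wrap it in a markdown fence or drop a field, so production code parses defensively and validates before trusting the result. A minimal sketch — the field names and schema are illustrative:

```python
import json
import re

REQUIRED = {"title", "sentiment", "tags"}  # hypothetical schema

def parse_structured(raw: str) -> dict:
    """Extract and validate a JSON object from raw model output."""
    # Strip an optional ```json ... ``` fence the model may emit.
    m = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = m.group(1) if m else raw
    data = json.loads(payload)           # raises on malformed JSON
    missing = REQUIRED - data.keys()
    if missing:                          # fail loudly, don't propagate bad data
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

The tutorial covers the stronger options — JSON mode, schema-constrained generation — but a validation layer like this remains the last line of defense.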
Agents are non-deterministic — testing them requires different thinking. Measure quality, catch regressions, and build confidence before shipping.
- Unit Testing Agents — Mocking LLMs, deterministic tests
- Evals — Accuracy, quality, regression benchmarks
- Tracing & Debugging — Observability during development
- Red Teaming & Safety — Adversarial testing, guardrails
- Benchmarking — Comparing models, prompts, architectures head-to-head
- Eval Frameworks — Promptfoo, Braintrust, Langfuse integration
- Eval Harness — Complete eval pipeline combining all techniques
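The core of every eval harness above is the same shape: run a fixed dataset through the agent, score each answer, report aggregate metrics. A minimal sketch with a deterministic stub in place of the agent (the stub and exact-match scoring are simplifications; real evals add LLM-as-judge scoring and regression tracking):

```python
from dataclasses import dataclass

@dataclass
class Case:
    prompt: str
    expected: str

def stub_agent(prompt: str) -> str:
    """Deterministic stand-in for the agent under test."""
    return {"2+2?": "4", "Capital of France?": "Paris"}.get(prompt, "")

def run_evals(agent, cases):
    """Score each case with exact match; return accuracy and failures."""
    results = [(c, agent(c.prompt)) for c in cases]
    passed = sum(1 for c, got in results if got == c.expected)
    return {
        "accuracy": passed / len(cases),
        "failures": [c.prompt for c, got in results if got != c.expected],
    }
```

Because the score is a number, you can gate CI on it — exactly how you catch regressions in a non-deterministic system.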
One agent, nine implementations. Build the same system with each framework and compare the trade-offs firsthand.
- No Framework — Raw SDK baseline
- LangGraph — Graph-based orchestration
- Pydantic AI — Type-safe agents
- Google ADK — Google's Agent Development Kit
- AWS Strands — AWS agent SDK
- CrewAI — Role-based multi-agent collaboration
- AutoGen — Multi-agent conversations
- LlamaIndex — Data-centric agents
- Semantic Kernel — Microsoft AI orchestration
The gap between "works on my laptop" and "runs reliably at scale." Principles, deployment, monitoring, cost control, and security.
- 12-Factor Agents — Principles for production-grade agents
- Deployment Strategies — Containers, serverless, scaling
- Monitoring & Observability — Metrics, logging, tracing in prod
- Cost Optimization — Token budgets, caching, model routing
- Security & Guardrails — Auth, sandboxing, injection defense
- Error Handling & Resilience — Retries, fallbacks, graceful degradation
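The last bullet in one sketch: retry transient failures with exponential backoff, then degrade gracefully to a fallback model when the primary keeps failing. The error type, delays, and model labels here are illustrative:

```python
import time

class TransientError(Exception):
    """Stand-in for a rate limit or timeout from the provider."""

def call_with_resilience(call, primary, fallback, retries=3, base_delay=1.0):
    """Retry `call(primary)` with exponential backoff, then try `fallback` once."""
    for attempt in range(retries):
        try:
            return call(primary)
        except TransientError:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    # Graceful degradation: a cheaper/slower model beats a hard failure.
    return call(fallback)
```

In production you'd add jitter to the delay and only catch error types you know are transient — retrying an auth failure just burns time.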
If you find this project useful, consider supporting us:
Module not found? Run `uv sync` in the lesson directory.
API errors or authentication failures? You need API keys from Anthropic, OpenAI, or both, depending on which examples you run. See SETUP.md for details.
This project is licensed under the MIT License - see the LICENSE file for details.