AI developer tools were built for one kind of road. Most of us build on a different one.
When a developer in Lagos tries to integrate Flutterwave and gets responses optimized for Stripe, or a team in Manila builds offline-first mobile apps only to have their AI assistant default to cloud-native architectures, the cost isn't abstract. It's measured in hours of rework, weeks of debugging, and the quiet accumulation of "that's just how it is" that Western toolmakers never have to feel. The dominant AI coding assistants were trained on codebases written in Silicon Valley offices, deployed on American cloud infrastructure, and tested against user behaviors that don't include 2G networks in rural Kenya or USSD menus in Jakarta.
This bias shows up in three places: API references (models assume Western payment processors), architecture patterns (suggestions assume always-online connectivity), and documentation context (examples assume high-bandwidth, high-compute environments). The developers who pay are those building for markets where these assumptions don't hold—because they are the markets, not the edge cases.
Unpaved is an open-source toolkit for measuring and documenting this bias. We provide standardized benchmarks, prompt guides, and a result submission schema so that anyone—anywhere—can run audits against their AI tool of choice and produce comparable data. The goal is not to blame individual models, but to build a collective evidence base that makes the problem visible, measurable, and solvable.
The name is the point. Most of the world's developers build on infrastructure, APIs, and constraints that the dominant AI tools have never encountered. Unpaved is where they work. It is also what we are changing.
We don't want different tools for "other" developers. We want the tools to work for everyone. That starts with measurement.
| Directory | Description |
|---|---|
| `benchmarks/` | Standardized benchmark tasks for testing AI tool responses |
| `benchmarks/payment-apis/` | Integration tasks for African and Asian payment APIs |
| `benchmarks/mobile-money/` | USSD flow and mobile money integration benchmarks |
| `benchmarks/infrastructure/` | Low-bandwidth and offline-first architecture tasks |
| `benchmarks/compliance/` | Data protection compliance tasks for African jurisdictions |
| `results/` | Schema and examples for community-submitted benchmark results |
| `prompt-guides/` | Detailed prompting instructions for consistent benchmark execution |
| `dataset-guide/` | Guides for contributing data and engaging regional communities |
| `tools/` | CLI utilities for scoring and validating benchmark results |
| `.github/` | Issue and PR templates for contributions |
- Run a benchmark: Pick a benchmark task from `benchmarks/`, use the corresponding prompt guide, and test your AI tool of choice.
- Submit your result: Use the `results/example-result.json` format and submit via GitHub Issues using the benchmark-result template.
- Add new benchmarks: If you're working with APIs or patterns we haven't covered, create a new benchmark task and submit it via the new-benchmark-task template.
- Improve prompt guides: Found a better way to prompt for a specific task? Submit your methodology via the prompt-guide-submission template.
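To give a feel for what a submission looks like, here is a minimal sketch of a result entry and a basic completeness check. The field names below are illustrative assumptions, not the official schema; `results/example-result.json` in the repository is the authoritative format.

```python
import json

# Hypothetical result entry. Field names are illustrative only --
# consult results/example-result.json for the actual schema.
result = {
    "benchmark_id": "payment-apis/flutterwave-checkout",  # assumed task ID
    "tool": "example-assistant-v1",                       # AI tool under test
    "date": "2025-01-15",
    "region": "Nigeria",
    "passed": False,
    "time_to_correct_minutes": 45,
    "notes": "Assistant defaulted to Stripe's API shape on first attempt.",
}

# Fields a scorer would plausibly require before accepting a submission.
REQUIRED = {"benchmark_id", "tool", "date", "passed"}

def validate(entry: dict) -> list[str]:
    """Return the sorted list of missing required fields (empty means OK)."""
    return sorted(REQUIRED - entry.keys())

missing = validate(result)
if missing:
    print(f"Incomplete submission, missing: {missing}")
else:
    print(json.dumps(result, indent=2))
```

A check like this is roughly what the utilities in `tools/` are for; running it locally before opening a GitHub Issue keeps community-submitted data comparable.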
| Category | API/Region | Coverage |
|---|---|---|
| Payment APIs | Flutterwave (Nigeria) | Active |
| Payment APIs | M-Pesa Daraja (Kenya) | Active |
| Payment APIs | Paystack (Nigeria/Ghana) | Active |
| Payment APIs | bKash (Bangladesh) | Active |
| Mobile Money | USSD Flows (Regional) | Active |
| Infrastructure | Low-Bandwidth Patterns | Active |
| Infrastructure | Offline-First Architecture | Active |
| Compliance | NDPR (Nigeria) | Active |
| Compliance | PDPA (Kenya) | Active |
| Tool | Benchmark Category | Pass Rate | Avg Time to Correct | Regions Covered |
|---|---|---|---|---|
| Coming soon | Community data needed | - | - | Add your results! |
Built in Kampala. Built for the world. The tools should be too.
Moses Wekesa — Founder, Digital Talisman, Kampala, Uganda