moswek/unpaved

# Unpaved: AI Developer Tool Bias Audit Toolkit

> AI developer tools were built for one kind of road. Most of us build on a different one.

When a developer in Lagos tries to integrate Flutterwave and gets responses optimized for Stripe, or a team in Manila builds offline-first mobile apps only to have their AI assistant default to cloud-native architectures, the cost isn't abstract. It's measured in hours of rework, weeks of debugging, and the quiet accumulation of "that's just how it is" that Western toolmakers never have to feel. The dominant AI coding assistants were trained on codebases written in Silicon Valley offices, deployed on American cloud infrastructure, and tested against user behaviors that don't include 2G networks in rural Kenya or USSD menus in Jakarta.

This bias shows up in three places: API references (models assume Western payment processors), architecture patterns (suggestions assume always-online connectivity), and documentation context (examples assume high-bandwidth, high-compute environments). The developers who pay the price are those building for markets where these assumptions don't hold, because they are the markets, not the edge cases.

Unpaved is an open-source toolkit for measuring and documenting this bias. We provide standardized benchmarks, prompt guides, and a result submission schema so that anyone—anywhere—can run audits against their AI tool of choice and produce comparable data. The goal is not to blame individual models, but to build a collective evidence base that makes the problem visible, measurable, and solvable.

## Why Unpaved?

The name is the point. Most of the world's developers build on infrastructure, APIs, and constraints that the dominant AI tools have never encountered. Unpaved is where they work. It is also what we are changing.

We don't want different tools for "other" developers. We want the tools to work for everyone. That starts with measurement.

## Directory Structure

| Directory | Description |
| --- | --- |
| `benchmarks/` | Standardized benchmark tasks for testing AI tool responses |
| `benchmarks/payment-apis/` | Integration tasks for African and Asian payment APIs |
| `benchmarks/mobile-money/` | USSD flow and mobile money integration benchmarks |
| `benchmarks/infrastructure/` | Low-bandwidth and offline-first architecture tasks |
| `benchmarks/compliance/` | Data protection compliance tasks for African jurisdictions |
| `results/` | Schema and examples for community-submitted benchmark results |
| `prompt-guides/` | Detailed prompting instructions for consistent benchmark execution |
| `dataset-guide/` | Guides for contributing data and engaging regional communities |
| `tools/` | CLI utilities for scoring and validating benchmark results |
| `.github/` | Issue and PR templates for contributions |

## How to Contribute

  1. **Run a benchmark:** Pick a benchmark task from `benchmarks/`, use the corresponding prompt guide, and test your AI tool of choice.
  2. **Submit your result:** Use the `results/example-result.json` format and submit via GitHub Issues using the `benchmark-result` template.
  3. **Add new benchmarks:** If you're working with APIs or patterns we haven't covered, create a new benchmark task and submit it via the `new-benchmark-task` template.
  4. **Improve prompt guides:** Found a better way to prompt for a specific task? Submit your methodology via the `prompt-guide-submission` template.
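As an illustration, a submitted result might be assembled and sanity-checked like this before filing the issue. Every field name below is an assumption made for the sketch; the authoritative schema lives in `results/example-result.json`.

```python
# Hypothetical sketch of assembling a benchmark result record.
# Field names are illustrative assumptions, not the toolkit's
# actual schema -- consult results/example-result.json.
import json

def build_result(tool, benchmark, passed, notes=""):
    """Assemble a result record as a plain dict and serialize it to JSON."""
    record = {
        "tool": tool,            # AI tool under test
        "benchmark": benchmark,  # path of the benchmark task that was run
        "passed": passed,        # did the response meet the task criteria?
        "notes": notes,          # free-form observations from the run
    }
    # Basic sanity checks before submission
    assert record["tool"] and record["benchmark"], "tool and benchmark are required"
    assert isinstance(record["passed"], bool), "passed must be a boolean"
    return json.dumps(record, indent=2)

print(build_result(
    "example-assistant",
    "benchmarks/payment-apis/flutterwave",
    False,
    "Response defaulted to Stripe's API surface",
))
```

The point of the sketch is only that a record should be machine-checkable before it is submitted; the real schema and templates in the repository take precedence.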

## Current Benchmark Coverage

| Category | API/Region | Status |
| --- | --- | --- |
| Payment APIs | Flutterwave (Nigeria) | Active |
| Payment APIs | M-Pesa Daraja (Kenya) | Active |
| Payment APIs | Paystack (Nigeria/Ghana) | Active |
| Payment APIs | bKash (Bangladesh) | Active |
| Mobile Money | USSD Flows (Regional) | Active |
| Infrastructure | Low-Bandwidth Patterns | Active |
| Infrastructure | Offline-First Architecture | Active |
| Compliance | NDPR (Nigeria) | Active |
| Compliance | Data Protection Act (Kenya) | Active |

## Results So Far

| Tool | Benchmark Category | Pass Rate | Avg Time to Correct | Regions Covered |
| --- | --- | --- | --- | --- |
| Coming soon | Community data needed | - | - | Add your results! |
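Once community results arrive, the pass-rate column can be computed by a simple aggregation over submitted records. This is a minimal sketch under assumed record fields (`tool`, `category`, `passed`); it is not the toolkit's actual scoring CLI in `tools/`.

```python
# Illustrative pass-rate aggregation over submitted result records.
# Record fields here are assumptions, not the toolkit's real schema.
from collections import defaultdict

def pass_rates(records):
    """Return {(tool, category): fraction of passing results}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        key = (r["tool"], r["category"])
        totals[key] += 1
        if r["passed"]:
            passes[key] += 1
    return {k: passes[k] / totals[k] for k in totals}

sample = [
    {"tool": "assistant-a", "category": "payment-apis", "passed": True},
    {"tool": "assistant-a", "category": "payment-apis", "passed": False},
]
print(pass_rates(sample))  # {('assistant-a', 'payment-apis'): 0.5}
```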

Built in Kampala. Built for the world. The tools should be too.

Moses Wekesa — Founder, Digital Talisman, Kampala, Uganda

License: MIT · PRs Welcome · Made in Uganda

## About

Open-source audit toolkit for Global South developers to benchmark, document, and reduce AI tool bias in their markets.
