Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.
soildb provides Python access to:
- Soil Survey Data: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO
- Laboratory Data: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data
- Bulk Downloads: Complete SSURGO/STATSGO datasets from Web Soil Survey
- Multiple Backends: Query data from SDA web service, local SQLite snapshots, or GeoPackage files
Query soil survey data via web service or local database, export to pandas/polars DataFrames, and handle spatial queries.
```
pip install soildb
```

For spatial functionality:

```
pip install soildb[spatial]
```

For all optional features:

```
pip install soildb[all]
```

- Query SSURGO/STATSGO data from the NRCS Soil Data Access web service
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination and chunking
- Export to pandas and polars DataFrames
- Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data
- Query via SDA web service or local SQLite snapshot databases
- Full horizon-level data with lab analyses
- Structured object models for nested pedon data
- Support for flexible column selection
- Download complete SSURGO datasets as ZIP files
- Download STATSGO (general soil map) data
- Concurrent downloads with progress tracking
- Automatic file extraction and organization
- State-wide and custom area selections
- Query from SDA web service (live data)
- Query from local SQLite snapshots (offline analysis)
- Support for GeoPackage files with spatial features
- Unified interface across all backends
- Async I/O for high performance and concurrency
Build and execute custom SQL queries with the fluent interface:
```python
from soildb import Query

query = (
    Query()
    .select("mukey", "muname", "musym")
    .from_("mapunit")
    .inner_join("legend", "mapunit.lkey = legend.lkey")
    .where("areasymbol = 'IA109'")
    .limit(5)
)

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
import asyncio
from soildb import SDAClient

async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
```

```
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
    mukey                                             muname  musym
0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
4  408348          Harpster silt loam, 0 to 2 percent slopes   1595
```
All soildb functions have both async and synchronous versions. For simple scripts and interactive use, the synchronous .sync() variants are usually simpler:
```python
from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

```
Found 80 map units
```
| | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|---|---|---|---|---|---|---|---|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
The .sync methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:
```python
from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()
```

soildb provides high-level functions for common tasks:
```python
from soildb import get_mapunit_by_areasymbol

mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

```
Found 80 map units
```
| | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|---|---|---|---|---|---|---|---|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
If you have suggestions for new convenience functions please file a feature request on GitHub.
Query soil data by location with points, bounding boxes, or polygons:
```python
from soildb import spatial_query

# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
```

```
Point query found 1 results
```
| | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|---|---|---|---|---|---|---|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |
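Spatial queries take WKT (well-known text) geometry strings, like the `POINT(-93.6 42.0)` above; a bounding box is expressed as a WKT polygon. A minimal sketch of building one from corner coordinates (the `bbox_to_wkt` helper is a hypothetical illustration, not part of soildb):

```python
def bbox_to_wkt(xmin, ymin, xmax, ymax):
    """Build a closed WKT POLYGON ring from bounding-box corners.

    Coordinates are longitude/latitude (x y order, as WKT expects).
    The first vertex is repeated at the end to close the ring.
    """
    return (
        f"POLYGON(({xmin} {ymin}, {xmax} {ymin}, "
        f"{xmax} {ymax}, {xmin} {ymax}, {xmin} {ymin}))"
    )

# A small box in central Iowa
wkt = bbox_to_wkt(-93.7, 41.9, -93.5, 42.1)
print(wkt)
# POLYGON((-93.7 41.9, -93.5 41.9, -93.5 42.1, -93.7 42.1, -93.7 41.9))
```

The resulting string can be passed as the `geometry` argument in place of the point above.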
Retrieve large datasets efficiently with automatic pagination and chunking:
```python
from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)
print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys,
    "component",
    key_column="mukey",
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
```

```
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
```
| | mukey | cokey | compname | localphase | comppct_r |
|---|---|---|---|---|---|
| 0 | 408333 | 25562547 | Kingston | <NA> | 2 |
| 1 | 408333 | 25562548 | Okoboji | <NA> | 5 |
| 2 | 408333 | 25562549 | Spicer | <NA> | 90 |
| 3 | 408333 | 25562550 | Madelia | <NA> | 3 |
| 4 | 408334 | 25562837 | Okoboji | <NA> | 5 |
| 5 | 408334 | 25562838 | Glencoe | <NA> | 3 |
| 6 | 408334 | 25562839 | Canisteo | <NA> | 2 |
| 7 | 408334 | 25562840 | Webster | <NA> | 85 |
| 8 | 408334 | 25562841 | Nicollet | <NA> | 5 |
| 9 | 408335 | 25562135 | Biscay | <NA> | 1 |
The component table has a hierarchical relationship:
- mukey (map unit key) is the parent
- cokey (component key) is the child
So when fetching components, you typically want to filter by mukey to get all components for specific map units.
Use the fetch_by_keys() function with "mukey" as the key_column to achieve this; pagination over chunks of 100 rows is automatic (or specify your own chunk_size).
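The chunking itself amounts to plain list slicing; a stdlib-only sketch of how 410 keys split into batches of 100 (illustrative only — soildb handles this internally):

```python
def chunk_keys(keys, chunk_size=100):
    """Split a list of keys into consecutive chunks of at most chunk_size."""
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]

# 410 mukeys -> 5 chunks: four of 100 and one of 10
mukeys = list(range(410))
chunks = chunk_keys(mukeys)
print(len(chunks))               # 5
print([len(c) for c in chunks])  # [100, 100, 100, 100, 10]
```

Each chunk then becomes one `IN (...)` request against the component table, which keeps individual queries within SDA's result-size limits.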
Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:
```python
from soildb import download_wss

# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)
```

Each extracted survey area directory contains:
- tabular/ - Pipe-delimited TXT files with soil data tables
- spatial/ - ESRI shapefiles with map unit polygons and boundaries
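The tabular files are pipe-delimited with quoted fields and no header row (column names come from the SSURGO schema). A minimal stdlib sketch of parsing one — the two-row sample and the three column names here are fabricated for illustration, not the full mapunit schema:

```python
import csv
import io

# A fabricated two-row sample in the pipe-delimited SSURGO tabular layout
sample = (
    '"408333"|"1032"|"Spicer silty clay loam, 0 to 2 percent slopes"\n'
    '"408334"|"107"|"Webster clay loam, 0 to 2 percent slopes"\n'
)

# Illustrative column names; real tables have many more columns
columns = ["mukey", "musym", "muname"]

# quotechar handles quoted fields that contain commas
reader = csv.reader(io.StringIO(sample), delimiter="|", quotechar='"')
rows = [dict(zip(columns, row)) for row in reader]
print(rows[0]["muname"])
# Spicer silty clay loam, 0 to 2 percent slopes
```

In practice you would open the real file (e.g. a table under tabular/) instead of the in-memory sample.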
Use Cases:
- SDA: Live queries, filtered data, programmatic access to current data
- WSS Downloads: Complete offline datasets, bulk data for analysis, static snapshots updated annually
For performance-critical applications, use async functions directly with concurrent requests:
```python
import asyncio
from soildb import fetch_by_keys, get_mukey_by_areasymbol

async def concurrent_example():
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)

    # Fetch components with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run the async function
df = asyncio.run(concurrent_example())
```

For more async patterns, see the Async Programming Guide.
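Running several independent queries at once follows the standard asyncio.gather pattern; a stdlib-only sketch with a stand-in coroutine (`fetch_area` is a placeholder simulating I/O, not a soildb function):

```python
import asyncio

async def fetch_area(areasymbol):
    """Stand-in for an async soildb call; simulates I/O with a short sleep."""
    await asyncio.sleep(0.01)
    return f"results for {areasymbol}"

async def main():
    areas = ["IA109", "IA113", "IA117"]
    # Launch all three requests concurrently and wait for them together;
    # gather preserves the order of the input coroutines
    return await asyncio.gather(*(fetch_area(a) for a in areas))

results = asyncio.run(main())
print(results)
# ['results for IA109', 'results for IA113', 'results for IA117']
```

Because the sleeps overlap, the total wall time is roughly one request's latency rather than three.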
See the examples/ directory and documentation
for detailed usage patterns.
This project is licensed under the MIT License. See the LICENSE file for details.