Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.
soildb provides Python access to:
- Soil Survey Data: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO
- Laboratory Data: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data
- Bulk Downloads: Complete SSURGO/STATSGO datasets from Web Soil Survey
- Multiple Backends: Query data from SDA web service, local SQLite snapshots, or GeoPackage files
Query soil survey data via web service or local database, export to pandas/polars DataFrames, and handle spatial queries.
```
pip install soildb
```

For spatial functionality:

```
pip install soildb[spatial]
```

For all optional features:

```
pip install soildb[all]
```

- Query SSURGO/STATSGO data from the NRCS Soil Data Access web service
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination and chunking
- Export to pandas and polars DataFrames
- Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data
- Query via SDA web service or local SQLite snapshot databases
- Full horizon-level data with lab analyses
- Structured object models for nested pedon data
- Support for flexible column selection
- Download complete SSURGO datasets as ZIP files
- Download STATSGO (general soil map) data
- Concurrent downloads with progress tracking
- Automatic file extraction and organization
- State-wide and custom area selections
- Query from SDA web service (live data)
- Query from local SQLite snapshots (offline analysis)
- Support for GeoPackage files with spatial features
- Unified interface across all backends
- Async I/O for high performance and concurrency
Build and execute custom SQL queries with the fluent interface:
```python
from soildb import Query

query = (
    Query()
    .select("mukey", "muname", "musym")
    .from_("mapunit")
    .inner_join("legend", "mapunit.lkey = legend.lkey")
    .where("areasymbol = 'IA109'")
    .limit(5)
)

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
import asyncio
from soildb import SDAClient

async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
```

```
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
    mukey                                             muname  musym
0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
4  408348          Harpster silt loam, 0 to 2 percent slopes   1595
```
All soildb functions have both async and synchronous versions. For simple scripts and interactive use, the synchronous .sync() variants are usually simpler:
```python
from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

```
Found 80 map units
```
| | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|---|---|---|---|---|---|---|---|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
The .sync methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:
```python
from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()
```

soildb provides high-level functions for common tasks:
```python
from soildb import get_mapunit_by_areasymbol

mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

```
Found 80 map units
```
| | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|---|---|---|---|---|---|---|---|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
If you have suggestions for new convenience functions please file a feature request on GitHub.
Query soil data by location with points, bounding boxes, or polygons:
```python
from soildb import spatial_query

# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
```

```
Point query found 1 results
```
| | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|---|---|---|---|---|---|---|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |
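Spatial queries take WKT (well-known text) geometry strings, like the `POINT(-93.6 42.0)` above; a bounding box is expressed as a WKT polygon. A minimal sketch of building one from corner coordinates (the `bbox_to_wkt` helper is a hypothetical illustration, not part of soildb):

```python
def bbox_to_wkt(xmin, ymin, xmax, ymax):
    """Build a closed WKT POLYGON ring from bounding-box corners.

    Coordinates are longitude/latitude (x y order, as WKT expects).
    The first vertex is repeated at the end to close the ring.
    """
    return (
        f"POLYGON(({xmin} {ymin}, {xmax} {ymin}, "
        f"{xmax} {ymax}, {xmin} {ymax}, {xmin} {ymin}))"
    )

# A small box in central Iowa
wkt = bbox_to_wkt(-93.7, 41.9, -93.5, 42.1)
print(wkt)
# POLYGON((-93.7 41.9, -93.5 41.9, -93.5 42.1, -93.7 42.1, -93.7 41.9))
```

The resulting string can be passed as the `geometry` argument in place of the point above.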
Retrieve large datasets efficiently with automatic pagination and chunking:
```python
from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)
print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys,
    "component",
    key_column="mukey",
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
```

```
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
```
| | mukey | cokey | compname | localphase | comppct_r |
|---|---|---|---|---|---|
| 0 | 408333 | 25562547 | Kingston | <NA> | 2 |
| 1 | 408333 | 25562548 | Okoboji | <NA> | 5 |
| 2 | 408333 | 25562549 | Spicer | <NA> | 90 |
| 3 | 408333 | 25562550 | Madelia | <NA> | 3 |
| 4 | 408334 | 25562837 | Okoboji | <NA> | 5 |
| 5 | 408334 | 25562838 | Glencoe | <NA> | 3 |
| 6 | 408334 | 25562839 | Canisteo | <NA> | 2 |
| 7 | 408334 | 25562840 | Webster | <NA> | 85 |
| 8 | 408334 | 25562841 | Nicollet | <NA> | 5 |
| 9 | 408335 | 25562135 | Biscay | <NA> | 1 |
The component table has a hierarchical relationship:
- mukey (map unit key) is the parent
- cokey (component key) is the child
So when fetching components, you typically want to filter by mukey to get all components for specific map units.
Use the fetch_by_keys() function with "mukey" as the key_column to achieve this; pagination over chunks of 100 rows is automatic (or specify your own chunk_size).
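The chunking itself amounts to plain list slicing; a stdlib-only sketch of how 410 keys split into batches of 100 (illustrative only — soildb handles this internally):

```python
def chunk_keys(keys, chunk_size=100):
    """Split a list of keys into consecutive chunks of at most chunk_size."""
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]

# 410 mukeys -> 5 chunks: four of 100 and one of 10
mukeys = list(range(410))
chunks = chunk_keys(mukeys)
print(len(chunks))               # 5
print([len(c) for c in chunks])  # [100, 100, 100, 100, 10]
```

Each chunk then becomes one `IN (...)` request against the component table, which keeps individual queries within SDA's result-size limits.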
Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:
```python
from soildb import download_wss

# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)
```

Each extracted survey area directory contains:
- tabular/ - Pipe-delimited TXT files with soil data tables
- spatial/ - ESRI shapefiles with map unit polygons and boundaries
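The tabular files are pipe-delimited with quoted fields and no header row (column names come from the SSURGO schema). A minimal stdlib sketch of parsing one — the two-row sample and the three column names here are fabricated for illustration, not the full mapunit schema:

```python
import csv
import io

# A fabricated two-row sample in the pipe-delimited SSURGO tabular layout
sample = (
    '"408333"|"1032"|"Spicer silty clay loam, 0 to 2 percent slopes"\n'
    '"408334"|"107"|"Webster clay loam, 0 to 2 percent slopes"\n'
)

# Illustrative column names; real tables have many more columns
columns = ["mukey", "musym", "muname"]

# quotechar handles quoted fields that contain commas
reader = csv.reader(io.StringIO(sample), delimiter="|", quotechar='"')
rows = [dict(zip(columns, row)) for row in reader]
print(rows[0]["muname"])
# Spicer silty clay loam, 0 to 2 percent slopes
```

In practice you would open the real file (e.g. a table under tabular/) instead of the in-memory sample.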
Use Cases:
- SDA: Live queries, filtered data, programmatic access to current data
- WSS Downloads: Complete offline datasets, bulk data for analysis, static snapshots updated annually
For performance-critical applications, use async functions directly with concurrent requests:
```python
import asyncio
from soildb import fetch_by_keys, get_mukey_by_areasymbol

async def concurrent_example():
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)

    # Fetch components with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run the async function
df = asyncio.run(concurrent_example())
```

For more async patterns, see the Async Programming Guide.
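Running several independent queries at once follows the standard asyncio.gather pattern; a stdlib-only sketch with a stand-in coroutine (`fetch_area` is a placeholder simulating I/O, not a soildb function):

```python
import asyncio

async def fetch_area(areasymbol):
    """Stand-in for an async soildb call; simulates I/O with a short sleep."""
    await asyncio.sleep(0.01)
    return f"results for {areasymbol}"

async def main():
    areas = ["IA109", "IA113", "IA117"]
    # Launch all three requests concurrently and wait for them together;
    # gather preserves the order of the input coroutines
    return await asyncio.gather(*(fetch_area(a) for a in areas))

results = asyncio.run(main())
print(results)
# ['results for IA109', 'results for IA113', 'results for IA117']
```

Because the sleeps overlap, the total wall time is roughly one request's latency rather than three.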
See the examples/ directory and documentation
for detailed usage patterns.
This project is licensed under the MIT License. See the LICENSE file for details.