Skip to content

brownag/py-soildb

Repository files navigation

soildb

PyPI version License: MIT

Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.

Overview

soildb provides Python access to:

  • Soil Survey Data: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO
  • Laboratory Data: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data
  • Bulk Downloads: Complete SSURGO/STATSGO datasets from Web Soil Survey
  • Multiple Backends: Query data from SDA web service, local SQLite snapshots, or GeoPackage files

Query soil survey data via web service or local database, export to pandas/polars DataFrames, and handle spatial queries.

Installation

pip install soildb

For spatial functionality:

pip install soildb[spatial]

For all optional features support:

pip install soildb[all]

Features

Soil Survey Data (SDA)

  • Query SSURGO/STATSGO data from NRCS Soil Data Access web service
  • Build custom SQL queries with fluent interface
  • Spatial queries with points, bounding boxes, and polygons
  • Bulk data fetching with automatic pagination and chunking
  • Export to pandas and polars DataFrames

Laboratory Characterization Data

  • Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data
  • Query via SDA web service or local SQLite snapshot databases
  • Full horizon-level data with lab analyses
  • Structured object models for nested pedon data
  • Support for flexible column selection

Web Soil Survey Downloads

  • Download complete SSURGO datasets as ZIP files
  • Download STATSGO (general soil map) data
  • Concurrent downloads with progress tracking
  • Automatic file extraction and organization
  • State-wide and custom area selections

Multi-Backend Support

  • Query from SDA web service (live data)
  • Query from local SQLite snapshots (offline analysis)
  • Support for GeoPackage files with spatial features
  • Unified interface across all backends
  • Async I/O for high performance and concurrency

Quick Start

Query Builder

Build and execute custom SQL queries with the fluent interface:

from soildb import Query

query = (Query()
        .select("mukey", "muname", "musym")
        .from_("mapunit")
        .inner_join("legend", "mapunit.lkey = legend.lkey")
        .where("areasymbol = 'IA109'")
        .limit(5))

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
import asyncio
from soildb import SDAClient

async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
    mukey                                             muname  musym
0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
4  408348          Harpster silt loam, 0 to 2 percent slopes   1595

Async vs Synchronous Usage

All soildb functions have both async and synchronous versions. For most use cases, the synchronous .sync() version is simpler and easier to use.

Synchronous Usage

For simple scripts and interactive use, soildb provides synchronous versions of all async functions:

from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { vertical-align: right; } </style>
mukey musym muname mukind muacres areasymbol areaname
0 408333 1032 Spicer silty clay loam, 0 to 2 percent slopes Consociation 1834 IA109 Kossuth County, Iowa
1 408334 107 Webster clay loam, 0 to 2 percent slopes Consociation 46882 IA109 Kossuth County, Iowa
2 408335 108 Wadena loam, 0 to 2 percent slopes Consociation 807 IA109 Kossuth County, Iowa
3 408336 108B Wadena loam, 2 to 6 percent slopes Consociation 1103 IA109 Kossuth County, Iowa
4 408337 1133 Colo silty clay loam, channeled, 0 to 2 percen... Consociation 1403 IA109 Kossuth County, Iowa

The .sync methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:

from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()

Convenience Functions

soildb provides high-level functions for common tasks:

from soildb import get_mapunit_by_areasymbol

mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey musym muname mukind muacres areasymbol areaname
0 408333 1032 Spicer silty clay loam, 0 to 2 percent slopes Consociation 1834 IA109 Kossuth County, Iowa
1 408334 107 Webster clay loam, 0 to 2 percent slopes Consociation 46882 IA109 Kossuth County, Iowa
2 408335 108 Wadena loam, 0 to 2 percent slopes Consociation 807 IA109 Kossuth County, Iowa
3 408336 108B Wadena loam, 2 to 6 percent slopes Consociation 1103 IA109 Kossuth County, Iowa
4 408337 1133 Colo silty clay loam, channeled, 0 to 2 percen... Consociation 1403 IA109 Kossuth County, Iowa

If you have suggestions for new convenience functions please file a feature request on GitHub.

Spatial Queries

Query soil data by location with points, bounding boxes, or polygons:

from soildb import spatial_query

# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
Point query found 1 results
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey areasymbol musym nationalmusym muname mukind
0 411278 IA169 1314 fsz1 Hanlon-Spillville complex, channeled, 0 to 2 p... Complex

Bulk Data Fetching

Retrieve large datasets efficiently with automatic pagination and chunking:

from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)

print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys, 
    "component", 
    key_column="mukey", 
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey cokey compname localphase comppct_r
0 408333 25562547 Kingston <NA> 2
1 408333 25562548 Okoboji <NA> 5
2 408333 25562549 Spicer <NA> 90
3 408333 25562550 Madelia <NA> 3
4 408334 25562837 Okoboji <NA> 5
5 408334 25562838 Glencoe <NA> 3
6 408334 25562839 Canisteo <NA> 2
7 408334 25562840 Webster <NA> 85
8 408334 25562841 Nicollet <NA> 5
9 408335 25562135 Biscay <NA> 1

The component table has a hierarchical relationship:

  • mukey (map unit key) is the parent
  • cokey (component key) is the child

So when fetching components, you typically want to filter by mukey to get all components for specific map units.

Use the fetch_by_keys() function with the "mukey" as the key_column to achieve this with automatic pagination over chunks with 100 rows each (or specify your own chunk_size).

Bulk Downloads (Web Soil Survey)

Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:

from soildb import download_wss

# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)

Each extracted survey area directory contains:

  • tabular/ - Pipe-delimited TXT files with soil data tables
  • spatial/ - ESRI shapefiles with map unit polygons and boundaries

Use Cases:

  • SDA: Live queries, filtered data, programmatic access to current data
  • WSS Downloads: Complete offline datasets, bulk data for analysis, static snapshots updated annually

Async Usage

For performance-critical applications, use async functions directly with concurrent requests:

import asyncio
from soildb import fetch_by_keys, get_mukey_by_areasymbol

async def concurrent_example():
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)
    
    # Fetch components concurrently with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run async function
df = asyncio.run(concurrent_example())

For more async patterns, see the Async Programming Guide.

Examples

See the examples/ directory and documentation for detailed usage patterns.

License

This project is licensed under the MIT License. See the LICENSE file for details.