Conversation
hrodmn commented on 2026-03-19T01:38:46Z:

Should we target the Hub for this instead of the ADE?

HarshiniGirish commented on 2026-03-23T15:34:00Z:

When I run the notebook on the Hub it fails due to a Python package compatibility issue in the Hub environment. A dependency fails when the notebook tries to import:

earthaccess <- import("earthaccess")
maap_module <- import("maap.maap", convert = FALSE)
MAAP <- maap_module$MAAP
maap <- MAAP()
hrodmn commented on 2026-03-19T01:38:47Z:

Great use of the

I don't love that we have to download these entire granule files to work on them in R. You could add something like "if your workflow is taking too long due to the download process, consider using the Python workflow" and then link to the NISAR Python notebook.
hrodmn commented on 2026-03-19T01:38:47Z:

This is a really nice snippet, but we should update it to clip out a specific area of interest (in projected coordinates) rather than grid-cell indexes.
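For reference, a projected-coordinate clip along the lines suggested here could look roughly like the following. This is a hedged sketch, not the notebook's actual code: the file name and the UTM extent values are placeholders.

```r
# Hypothetical sketch: clip to an area of interest given in projected
# (map) coordinates instead of grid-cell indexes. All values are placeholders.
library(terra)

r <- rast("nisar_hhhh.tif")  # placeholder raster, e.g. one layer pulled from the GCOV file

# AOI in the raster's projected CRS: xmin, xmax, ymin, ymax (placeholders)
aoi <- ext(580000, 600000, 4140000, 4160000)

# Clamp the AOI to the raster bounds, then crop
aoi <- intersect(aoi, ext(r))
clipped <- crop(r, aoi)
```

Clamping with intersect() before crop() avoids errors when the requested AOI extends outside the raster bounds.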
hrodmn
left a comment
@HarshiniGirish really nice job on this one. It is succinct and to the point, and the formatting is so clean 🫶 .
I have a few change requests:
- Replace the local-download method with some kind of cloud-native data access path. I know support for reading from S3 in R is not great, but I think there is a solution out there.
- Change the subset operation at the end to use projected coordinates instead of grid cell indexes. This might be a bit of work, but it will be what users want to be able to do.
For the cloud-optimized read solution there are a few possibilities:
- Use rhdf5 instead of hdf5r: see https://huber-group-embl.github.io/rhdf5/articles/rhdf5_cloud_reading.html
- Maybe we could use GDAL drivers via the terra package to load the file lazily (without downloading the entire file), but I am not really sure how well terra handles the complex HDF5 data structure.
I tried this:

library(terra)

vsis3_path <- "/vsis3/sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_002_109_D_063_4005_DHDH_A_20251012T182508_20251012T182531_X05010_N_P_J_001/NISAR_L2_PR_GCOV_002_109_D_063_4005_DHDH_A_20251012T182508_20251012T182531_X05010_N_P_J_001.h5"

# got temporary credentials from a Python session
setGDALconfig("AWS_SECRET_ACCESS_KEY", "...")
setGDALconfig("AWS_ACCESS_KEY_ID", "...")
setGDALconfig("AWS_SESSION_TOKEN", "...")
setGDALconfig("AWS_REGION", "us-west-2")

# Enable the virtual file system cache
setGDALconfig("VSI_CACHE", "TRUE")
# Set the size of that cache (e.g., 500 MB);
# this prevents re-downloading the same blocks during analysis
setGDALconfig("VSI_CACHE_SIZE", "500000000")
# Increase the global block cache (the default is usually too small);
# this can be a percentage of RAM or a specific byte value
setGDALconfig("GDAL_CACHEMAX", "20%")

cube <- sds(vsis3_path)
cube
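For the rhdf5 route from the first bullet above, a minimal sketch might look like the following. This is an untested assumption-laden sketch: it assumes an rhdf5 build with the S3 virtual file driver enabled, the URL and dataset path are placeholders, and I am not certain the `s3credentials` list accepts a session token, which may matter for these temporary credentials.

```r
# Hypothetical rhdf5 cloud-read sketch (untested against NISAR data).
library(rhdf5)

creds <- list(
  aws_region        = "us-west-2",
  access_key_id     = "...",  # temporary credentials, e.g. from the s3credentials endpoint
  secret_access_key = "..."
)

# Placeholder HTTPS URL for the granule object
s3_url <- "https://<bucket>.s3.us-west-2.amazonaws.com/<granule>.h5"

# List the file structure without downloading the whole granule
h5ls(s3_url, s3 = TRUE, s3credentials = creds)

# Read a single dataset (the HDF5 path here is a placeholder)
hh <- h5read(s3_url, "/science/LSAR/GCOV/grids/frequencyA/HHHH",
             s3 = TRUE, s3credentials = creds)
```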
@hrodmn thanks for the feedback! I also wanted to mention the main issues I ran into while making these updates. The initial rhdf5 cloud-read approach was not opening the authenticated S3-backed HDF5 file reliably, so I moved to the GDAL /vsis3/ + terra route instead. After that, terra::sds() was able to reach the file, but it produced extent-mismatch warnings because the GCOV HDF5 contains many datasets that do not all behave like one aligned raster stack. I also hit a layer-selection mismatch at one point: the HHHH layer existed, but the matching logic was too strict. Finally, the AOI initially extended outside the raster bounds, so I constrained it to the valid extent.
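The last two fixes described here (looser layer matching and clamping the AOI) could be sketched like this. It is a hedged reconstruction, not the notebook's actual code: the path and extent values are placeholders.

```r
# Hypothetical sketch: tolerant HHHH layer selection plus AOI clamping.
library(terra)

vsis3_path <- "/vsis3/<bucket>/<granule>.h5"  # placeholder; see the earlier snippet
cube <- sds(vsis3_path)

# Match the HHHH layer by substring rather than exact equality, since
# subdataset names carry long HDF5 path prefixes
hh_idx <- which(grepl("HHHH", names(cube), fixed = TRUE))
hh <- cube[[hh_idx[1]]]

# Constrain the AOI (placeholder coordinates) to the raster's valid extent
aoi <- ext(580000, 600000, 4140000, 4160000)
aoi <- intersect(aoi, ext(hh))
hh_aoi <- crop(hh, aoi)
```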
cc: @wildintellect
Since reticulate is having trouble on the Hub, let's use a different approach for EDL authentication and granule search. As @wildintellect mentioned yesterday, there is an R package called earthdatalogin (https://boettiger-lab.github.io/earthdatalogin/index.html) that can be used for EDL purposes. I could not get its built-in functions to fetch S3 credentials for NISAR, but here is what I came up with for a reticulate-free solution:

library(earthdatalogin)
library(httr2)
library(rstac)
# get S3 credentials using httr2::request with EDL token set in auth header
edl_token <- edl_set_token()
resp <- request("https://nisar.asf.earthdatacloud.nasa.gov/s3credentials") |>
req_auth_bearer_token(edl_token) |>
req_perform()
creds <- resp_body_json(resp)
# search the ASF STAC endpoint for a NISAR granule
items <- stac("https://cmr.earthdata.nasa.gov/stac/ASF") |>
stac_search(
collections = "NISAR_L2_GCOV_BETA_V1_1",
limit=1
) |>
get_request() |>
items_next()
item <- items$features[[1]]
# get the S3-prefixed asset href
s3_asset_key <- names(item$assets)[startsWith(names(item$assets), "s3")]
s3_link <- item$assets[[s3_asset_key]]$href
vsis3_path <- sub("^s3://", "/vsis3/", s3_link)

Now I am getting stuck when trying to read the /vsis3 link with terra on the Hub (this was working last week...). I don't know if it is related, but when I check terra's GDAL version I get 3.8.4, which is not the same as what I get when I run
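To confirm whether that version mismatch is real, one quick check (assuming the command-line GDAL tools are on the PATH) is to compare the GDAL that terra is linked against with the system one:

```r
# Compare terra's bundled/linked GDAL with the GDAL on the system PATH
library(terra)

terra::gdal()                  # version of GDAL that terra was built with
system("gdalinfo --version")   # version of the command-line GDAL tools
```

If the two differ, terra and the CLI tools may behave differently against the same /vsis3/ path.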
https://gist.github.com/HarshiniGirish/7f401d4feaa9ff4b0e4df436d06b27d3 (will set up a proper PR once the methodology is finalized)

Thank you @hrodmn. I was able to authenticate with Earthdata, request temporary ASF S3 credentials, query the ASF STAC collection, and correctly resolve the actual .h5 science asset instead of the browse/thumbnail asset. I also confirmed that the /vsis3/ path and HDF5 subdatasets are valid through GDAL.

The main issue is that direct streamed HDF5 reads are not reliable in the current notebook R/terra runtime, even though the remote file itself is accessible. Because of that, I cleaned up the notebook so it now handles authentication, STAC lookup, asset selection, and subdataset path construction in R, and then prints the exact GDAL commands needed to stream only the required variables and create small output rasters for use in the notebook. I chose this approach because it keeps the workflow cloud-based without downloading the full .h5, while avoiding the runtime limitations we kept hitting with direct terra access.

That said, this workflow is not very straightforward in the current notebook environment. I would prefer an easier and more stable approach if possible; looking forward to your feedback.
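As an illustration of the "print the exact GDAL commands" step, a sketch like the following could assemble a gdal_translate call that streams a single subdataset into a small GeoTIFF. The subdataset path, output name, and AOI corners are placeholders; note that -projwin expects upper-left then lower-right corners.

```r
# Hypothetical sketch: build a gdal_translate command that streams one HDF5
# subdataset from S3 and writes a small clipped GeoTIFF. Placeholders throughout.
vsis3_path <- "/vsis3/<bucket>/<granule>.h5"
subdataset <- sprintf('HDF5:"%s"://science/LSAR/GCOV/grids/frequencyA/HHHH', vsis3_path)

# AOI corners in the raster's projected CRS (placeholders)
xmin <- 580000; xmax <- 600000; ymin <- 4140000; ymax <- 4160000

# -projwin takes ulx uly lrx lry
cmd <- sprintf("gdal_translate -projwin %.0f %.0f %.0f %.0f '%s' hhhh_subset.tif",
               xmin, ymax, xmax, ymin, subdataset)
cat(cmd, "\n")
```

Printing the command keeps the notebook cloud-based while letting the heavy read run through the CLI GDAL rather than the R runtime that has been unreliable.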