8Ginette8/gbif.range

gbif.range R package

Auto-Version R-CMD-check Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Species ranges can be estimated from expert maps (for example IUCN and EUFORGEN) or with modelling approaches. Expert data, however, remain unavailable for many species, whereas modelling workflows often require substantial technical expertise and large numbers of occurrence records.

When such data are unavailable, they can often be approximated from the Global Biodiversity Information Facility (GBIF), the largest public repository of georeferenced species observations worldwide (https://www.gbif.org/). Retrieving GBIF records at large scale in R can still be cumbersome, especially if users are unfamiliar with the practical limits of the rgbif package.

gbif.range provides a workflow to retrieve GBIF records, filter them for spatial analyses, generate ecologically informed range maps from bundled or custom ecoregions, and evaluate the resulting products. The package also includes utilities to create GBIF-derived DOIs, inspect GBIF taxonomy, and thin large occurrence datasets.

(source: globe image from the Noun Project adapted by LenaCassie-Studio)

Main functions

  • get_gbif(): improves the accessibility of the rgbif R package (CRAN) in retrieving GBIF observations of a given species (accepted and synonym names). It uses a dynamic moving window if the given geographic extent contains > 100,000 observations and implements 13 post-processing options to flag and clean erroneous records based on custom functions and the CoordinateCleaner R package (CRAN).

  • get_gbif_count(): estimates how many GBIF records are available for a taxon using the same taxonomic matching logic as get_gbif(), which is useful before launching large downloads.

  • get_range(): estimates species ranges based on occurrence data (a get_gbif() output or a set of coordinates) and ecoregion polygons.

  • read_ecoreg(): downloads and reads the available ecoregion files from different URL sources. See also the associated ecoreg_list, get_ecoreg() and check_and_get_ecoreg().

  • get_status(): generates, based on a given species name, its IUCN red list status and a list of all scientific names (accepted, synonyms) found in the GBIF backbone taxonomy. Children and related doubtful names not used to download the data may also be extracted.

  • obs_filter(): accepts a get_gbif() output (one or several species) as input and filters the observations according to a given grid resolution. It can retain one observation per grid pixel and/or remove observations from grid pixels that contain fewer than a specified number of records.

  • make_tiles(): generates a set of SpatialExtent objects and POLYGON() geometry strings for a given geographic extent. It is meant to help users who call the rgbif R package directly and need to supply its geometry parameter, which takes a POLYGON() argument.

  • get_doi(): a small wrapper around derived_dataset() in rgbif that simplifies obtaining a single DOI for a set of several GBIF species datasets.

  • make_ecoreg(): a function to create custom ecoregions based on environmental layers.

  • evaluate_range(): evaluation function to validate the species ranges with distribution information provided by the user.

  • cv_range(): cross-validation function to evaluate a get_range() output based on its occurrence data.

  • make_blocks(): helper used to split observations into approximately balanced random or spatially structured folds, for example in cross-validation workflows.

  • split_gbif_by_species(): streams a large downloaded GBIF table from disk and writes one occurrence file per species or GBIF taxon key without loading the full table into memory.

  • species_csvs_to_ranges(): reads those per-species files sequentially, keeps the minimal occurrence columns needed by get_range(), and saves one range output per species.

  • read_range_rds(): reads back .rds range files created by species_csvs_to_ranges() and restores the saved range output for plotting or further analysis.
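
As a rough illustration of the grid-based thinning described for obs_filter(), the base R sketch below keeps one record per grid cell and can drop cells holding fewer than a minimum number of records. It is not the package's implementation: thin_by_grid() and its arguments are hypothetical names, and the grid is assumed to be a simple regular longitude/latitude lattice.

```r
# Hypothetical helper (not part of gbif.range): thin occurrences on a
# regular lon/lat grid of 'cell_size' degrees.
thin_by_grid <- function(obs, cell_size = 1, min_records = 1) {
  # Identifier of the grid cell containing each record
  cell <- paste(floor(obs$decimalLongitude / cell_size),
                floor(obs$decimalLatitude / cell_size), sep = "_")
  # Number of records sharing each record's cell
  n_in_cell <- as.vector(table(cell)[cell])
  # Keep the first record of every sufficiently populated cell
  keep <- !duplicated(cell) & n_in_cell >= min_records
  obs[keep, , drop = FALSE]
}

obs <- data.frame(
  decimalLongitude = c(7.10, 7.20, 7.90, 8.50, 10.30),
  decimalLatitude  = c(46.10, 46.40, 46.80, 46.20, 47.70)
)

thinned   <- thin_by_grid(obs, cell_size = 1)                   # one record per cell
populated <- thin_by_grid(obs, cell_size = 1, min_records = 2)  # cells with >= 2 records
```

With this toy table, three cells are occupied and only one of them holds two or more records.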

Installation

You can install the development version from GitHub with (make sure the R package remotes is up to date):

remotes::install_github("8Ginette8/gbif.range", build_vignettes = TRUE)
library(gbif.range)

If you install from GitHub or a local source tree without build_vignettes = TRUE, the package will load normally but browseVignettes("gbif.range") will not find the workflow vignettes.

Vignettes

The package now includes three focused workflow vignettes:

  • ecoregion-constrained-range-inference: the core logic of get_range(), packaged versus custom ecoregions, and evaluation workflows.
  • gbif-retrieval-and-taxonomy: synonym-aware GBIF backbone inspection, credential-free retrieval, filtering, tiling, and GBIF-derived DOIs.
  • large-downloaded-gbif-tables: the disk-based workflow for splitting large downloaded GBIF tables and generating one range per species.

After installation, open them with:

browseVignettes("gbif.range")

vignette("ecoregion-constrained-range-inference", package = "gbif.range")
vignette("gbif-retrieval-and-taxonomy", package = "gbif.range")
vignette("large-downloaded-gbif-tables", package = "gbif.range")

If they are not found after a GitHub or local install, reinstall with build_vignettes = TRUE, or install the built source tarball with R CMD INSTALL.
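
A quick way to check whether the vignettes were actually built is to look at what vignette() reports for the installed package. The helper name has_vignettes() is hypothetical; the sketch only relies on utils::vignette() and requireNamespace().

```r
# Hypothetical helper: TRUE when an installed package ships at least one
# built vignette, FALSE when it has none or is not installed.
has_vignettes <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  nrow(utils::vignette(package = pkg)$results) > 0
}

if (!has_vignettes("gbif.range")) {
  message("No vignettes found: reinstall with build_vignettes = TRUE.")
}
```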

Example

Terrestrial species

Let's download the worldwide records of Panthera tigris, keeping only true observations and literature records (the default):

# Download
obs.pt <- get_gbif(sp_name = "Panthera tigris")

# Plot species records
countries <- rnaturalearth::ne_countries(type = "countries", returnclass = "sv")
terra::plot(countries, col = "#bcbddc")
points(obs.pt[, c("decimalLongitude","decimalLatitude")], pch = 20, col = "#99340470", cex = 1.5)

Note that the function did not remove observations that most likely correspond to captive individuals not flagged as such (e.g., in Europe, the U.S. and South Africa); see the CoordinateCleaner R package (CRAN) for improved filtering. We can also retrieve the tiger's IUCN Red List status and the scientific names (accepted and synonyms) from the GBIF backbone taxonomy that were used in the download. If all = TRUE, additional children and related doubtful names may also be extracted (these are not used in get_gbif()):

get_status("Panthera tigris", all = FALSE)

Let's now extract the terrestrial ecoregions of the world (The Nature Conservancy) and generate the range map of Panthera tigris:

# Download ecoregion and read
eco.terra <- read_ecoreg(ecoreg_name = "eco_terra", save_dir = NULL)

# Range
range.tiger <- get_range(occ_coord = obs.pt,
                        ecoreg = eco.terra,
                        ecoreg_name = "ECO_NAME",
                        degrees_outlier = 5,
                        clust_pts_outlier = 4)

Let's plot the result now:

terra::plot(countries, col = "#bcbddc")
terra::plot(range.tiger$rangeOutput, col = "#238b45", add = TRUE, axes = FALSE, legend = FALSE)

Default parameters were used here. clust_pts_outlier (in degrees, ~440 km here) could have been increased to remove larger isolated clusters of observations, and degrees_outlier (~550 km here) to take more widely separated observations into account when building the range. Even so, the defaults removed the obvious tiger observation anomalies in Europe, the U.S. and South Africa.
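
The approximate kilometre values quoted above follow from the length of one degree of arc. The sketch below assumes a spherical Earth with a mean radius of 6371 km; deg_to_km() is a hypothetical helper, not a package function. It also shows that the east-west equivalent of a threshold in degrees shrinks away from the equator.

```r
# Approximate length of one degree of arc on a spherical Earth (km);
# the 6371 km mean radius is an assumption of this sketch.
km_per_degree <- 2 * pi * 6371 / 360   # about 111.2 km

# Hypothetical helper: convert a distance in degrees to kilometres
deg_to_km <- function(deg, lat = 0) {
  # The north-south distance does not depend on latitude;
  # the east-west distance shrinks with cos(latitude)
  c(north_south = deg * km_per_degree,
    east_west   = deg * km_per_degree * cos(lat * pi / 180))
}

deg_to_km(5)            # the 5-degree threshold at the equator
deg_to_km(4)            # the 4-degree value quoted above
deg_to_km(5, lat = 46)  # the same threshold at the latitude of the Alps
```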

Available ecoregions

Any suitable shapefile can be supplied to get_range(), but the package can also download several ecoregion layers directly: eco_terra (for terrestrial species; The Nature Conservancy 2009 adapted from Olson et al. 2001), eco_marine (for marine species, two versions; The Nature Conservancy 2012 adapted from Spalding et al. 2007, 2012), and eco_fresh (for freshwater species; Abell et al. 2008). Each is available at different levels of detail:

  • eco_terra has three different levels: 'ECO_NAME', 'WWF_MHTNAM' and 'WWF_REALM2'.
  • eco_fresh has only one: 'ECOREGION'.
  • eco_marine and eco_hd_marine (the more coastline-precise version) contain three distinct levels: 'ECOREGION', 'PROVINCE' and 'REALM'.
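
Because each layer only accepts its own level names, it can help to validate the layer/level pair before calling get_range(). The lookup below simply restates the list above; check_level() is a hypothetical helper, not a package function.

```r
# Level names as listed above, keyed by ecoregion layer
ecoreg_levels <- list(
  eco_terra     = c("ECO_NAME", "WWF_MHTNAM", "WWF_REALM2"),
  eco_fresh     = "ECOREGION",
  eco_marine    = c("ECOREGION", "PROVINCE", "REALM"),
  eco_hd_marine = c("ECOREGION", "PROVINCE", "REALM")
)

# Hypothetical helper: stop early if a level does not belong to a layer
check_level <- function(layer, level) {
  valid <- ecoreg_levels[[layer]]
  if (is.null(valid)) stop("Unknown layer: ", layer)
  if (!level %in% valid) {
    stop("'", level, "' is not a level of ", layer,
         "; choose one of: ", paste(valid, collapse = ", "))
  }
  invisible(TRUE)
}

check_level("eco_marine", "PROVINCE")   # passes silently
```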

Available ecoregion files that can be downloaded with the package:

# List
ecoreg_list

Custom ecoregions

Additionally, if the bundled ecoregions are too coarse for a given geographic region (e.g., for local studies), or an ecoregion shapefile with finer environmental detail is needed, make_ecoreg() can build one from spatially informed data (e.g., climate, biodiversity) at the desired resolution and extent defining the study area.

Example of 10 ecoregions in the European Alps based on CHELSA bioclimatic layers at 5 × 5 km resolution (Karger et al. 2017), i.e., mean annual air temperature (bio1) and annual precipitation amount (bio12) 1981–2010:

bio <- terra::rast(paste0(system.file(package = "gbif.range"), "/extdata/rst.tif"))
eco.eg <- make_ecoreg(env = bio, nclass = 10)
terra::plot(eco.eg, col = rainbow(10))

Let's further demonstrate how a custom map of ecoregions can be employed in combination with the package's main functions:

# Let's download the observations of Arctostaphylos alpinus in the European Alps:
shp.lonlat <- terra::vect(paste0(system.file(package = "gbif.range"), "/extdata/shp_lonlat.shp"))
obs.arcto <- get_gbif(sp_name = "Arctostaphylos alpinus",
                      geo = shp.lonlat,
                      grain = 1)

# Create an ecoregion layer of 200 classes, based on two environmental spatial layers:
rst <- terra::rast(paste0(system.file(package = "gbif.range"), "/extdata/rst.tif"))
my.eco <- make_ecoreg(env = rst,
                        nclass = 200)

# Create the range map based on our custom ecoregion
# (always set 'EcoRegion' as a name when using a make_ecoreg() output):
range.arcto <- get_range(occ_coord = obs.arcto,
                        ecoreg = my.eco,
                        ecoreg_name = "EcoRegion",
                        degrees_outlier = 5,
                        clust_pts_outlier = 4,
                        res = 0.05)

Unlike in the larger-scale example, we decreased the get_gbif() grain parameter from 100 km to 1 km here, because keeping observations with a precision of 100 km would have been too coarse to infer the approximate range of the species relative to the study extent. degrees_outlier and clust_pts_outlier were kept at their defaults (~550 and ~440 km, respectively), so, relative to the study extent, almost no clustered or overly distant observations were considered outliers.
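
To picture why the grain matters, the sketch below applies a simple precision filter to a toy table. It is one plausible reading of such a filter, not the get_gbif() code: filter_by_grain() is a hypothetical helper, it only looks at the GBIF field coordinateUncertaintyInMeters, and keeping records without a reported uncertainty is an assumption.

```r
# Hypothetical helper: keep records whose reported coordinate uncertainty
# does not exceed the study grain (in km). Records with no reported
# uncertainty are kept, which is an assumption of this sketch.
filter_by_grain <- function(obs, grain_km) {
  unc <- obs$coordinateUncertaintyInMeters
  obs[is.na(unc) | unc <= grain_km * 1000, , drop = FALSE]
}

obs <- data.frame(
  decimalLongitude = c(7.1, 7.6, 8.2, 9.0),
  decimalLatitude  = c(46.1, 46.3, 46.9, 47.2),
  coordinateUncertaintyInMeters = c(30, 5000, 70000, NA)
)

coarse <- filter_by_grain(obs, grain_km = 100)  # continental-scale grain
fine   <- filter_by_grain(obs, grain_km = 1)    # local-scale grain
```

In this toy table the 100 km grain keeps all four records, whereas the 1 km grain keeps only the 30 m record and the one without a reported uncertainty.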

Note also that the resolution parameter (res) can be changed to adjust how fine the spatial output should be. The highest achievable resolution depends only on the precision of the ecoreg object (e.g., a range output can reach the same resolution as the rasters used to create a make_ecoreg() object).

# Plot
alps.shp <- terra::crop(countries,terra::ext(rst))
r.arcto <- terra::mask(range.arcto$rangeOutput,alps.shp)
terra::plot(alps.shp, col = "#bcbddc")
terra::plot(r.arcto, add = TRUE, col = "darkgreen", axes = FALSE, legend = FALSE)
points(obs.arcto[, c("decimalLongitude","decimalLatitude")], pch = 20, col = "#99340470", cex = 1)

Marine species

Let's repeat the same process as for Panthera tigris, but with the marine species Delphinus delphis (> 100,000 observations).

⚠️ Note that the download takes longer here unless the parameter occ_samp is used. Although it gives a less precise picture of the observed distribution, occ_samp extracts a subsample of n GBIF observations per tile created over the study area:

# Here the example is a sample of 1000 observations per geographic tile
obs.dd <- get_gbif("Delphinus delphis", occ_samp = 1000)

# Here the list is longer because 'all = TRUE' includes every name (even doubtful ones)
get_status("Delphinus delphis", all = TRUE)
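
The tiling and per-tile subsampling idea behind occ_samp can be pictured with base R alone. The sketch below is not the package's algorithm (the text above describes a dynamic moving window). It assumes a fixed regular grid of tiles, and both extent_to_wkt_tiles() and sample_per_tile() are hypothetical helpers.

```r
# Hypothetical helper: split an extent c(xmin, xmax, ymin, ymax) into
# nx * ny tiles and return one WKT POLYGON string per tile.
extent_to_wkt_tiles <- function(ext, nx, ny) {
  xs <- seq(ext[1], ext[2], length.out = nx + 1)
  ys <- seq(ext[3], ext[4], length.out = ny + 1)
  tiles <- expand.grid(i = seq_len(nx), j = seq_len(ny))
  mapply(function(i, j) {
    # Counter-clockwise ring, closed on its first vertex
    sprintf("POLYGON((%s %s,%s %s,%s %s,%s %s,%s %s))",
            xs[i], ys[j], xs[i + 1], ys[j], xs[i + 1], ys[j + 1],
            xs[i], ys[j + 1], xs[i], ys[j])
  }, tiles$i, tiles$j)
}

wkt <- extent_to_wkt_tiles(c(-180, 180, -90, 90), nx = 2, ny = 2)

# Hypothetical helper: draw at most n record indices from each tile
sample_per_tile <- function(tile_id, n) {
  idx <- split(seq_along(tile_id), tile_id)
  picked <- lapply(idx, function(k) k[sample.int(length(k), min(n, length(k)))])
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
tile_id <- rep(c("a", "b"), times = c(50, 3))  # one dense and one sparse tile
kept <- sample_per_tile(tile_id, n = 10)       # 10 from "a", all 3 from "b"
```

Capping the number of records per tile shortens the download, but dense tiles are then under-represented relative to sparse ones, which is the loss of precision mentioned above.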

Let's now generate three range maps of Delphinus delphis using eco.marine as the ecoregion shapefile:

# Download ecoregion and read
eco.marine <- read_ecoreg(ecoreg_name = "eco_marine", save_dir = NULL)

# Range from different levels
range.dd1 <- get_range(obs.dd, eco.marine, "ECOREGION")
range.dd2 <- get_range(obs.dd, eco.marine, "PROVINCE")
range.dd3 <- get_range(obs.dd, eco.marine, "REALM")

The three results are fairly similar because most of the observations are near the coast. Let's plot the first, finest result:

terra::plot(countries, col = "#bcbddc")
terra::plot(range.dd1$rangeOutput, col = "#238b45", add = TRUE, axes = FALSE, legend = FALSE)
points(obs.dd[, c("decimalLongitude","decimalLatitude")], pch = 20, col = "#99340470", cex = 1)

Although the resulting map follows the sampling pattern found in GBIF, the dolphin range map might have been improved if more GBIF observations had been extracted. In this case, occ_samp should therefore be increased or removed.

Large downloaded GBIF tables

For very large multi-species GBIF exports already stored on disk, the package also provides a disk-based workflow:

gbif_file <- system.file("extdata", "occ_example_4sps.csv", package = "gbif.range")

split_dir <- file.path(tempdir(), "gbif_split")
range_dir <- file.path(tempdir(), "gbif_ranges")

# Split one downloaded GBIF table into one species file per GBIF key.
split_summary <- split_gbif_by_species(
  input_file = gbif_file,
  outdir = split_dir,
  chunk_size = 100,
  sep_in = "\t",
  sep_out = "\t",
  overwrite = TRUE,
  verbose = FALSE
)

# Build one range per species from those on-disk occurrence files.
range_summary <- species_csvs_to_ranges(
  species_dir = split_dir,
  ecoreg = "eco_terra",
  ecoreg_name = "ECO_NAME",
  outdir = range_dir,
  range_save_as = "rds",
  overwrite = TRUE,
  verbose = FALSE
)

# Read one saved range back from disk.
rg <- read_range_rds(range_summary$range_file[1])
terra::plot(rg$rangeOutput)

This disk-based workflow is described in more detail in the vignette large-downloaded-gbif-tables.
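
To show what streaming a table from disk means in practice, here is a minimal base R sketch that reads a delimited file chunk by chunk and appends each row to a per-key file. It is not the code of split_gbif_by_species(): split_by_key(), its arguments and the .tsv output names are all hypothetical, and the toy table stands in for a real GBIF export.

```r
# Hypothetical helper: stream a delimited file in chunks and append each
# row to a per-key file, so the full table never sits in memory at once.
split_by_key <- function(input_file, outdir, key_col, chunk_size = 1000, sep = "\t") {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  con <- file(input_file, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  key_idx <- match(key_col, strsplit(header, sep, fixed = TRUE)[[1]])
  if (is.na(key_idx)) stop("Column not found: ", key_col)
  repeat {
    lines <- readLines(con, n = chunk_size)   # next chunk of rows
    if (length(lines) == 0) break
    keys <- vapply(strsplit(lines, sep, fixed = TRUE), `[`, character(1), key_idx)
    for (k in unique(keys)) {
      out <- file.path(outdir, paste0(k, ".tsv"))
      # Write the header only when the file is first created
      if (!file.exists(out)) writeLines(header, out)
      out_con <- file(out, open = "a")
      writeLines(lines[keys == k], out_con)
      close(out_con)
    }
  }
  invisible(list.files(outdir, full.names = TRUE))
}

# Toy table with two taxon keys, written to a temporary file
tmp <- tempfile(fileext = ".tsv")
toy <- data.frame(speciesKey = c(1, 2, 1, 2, 1),
                  decimalLongitude = 1:5,
                  decimalLatitude = 6:10)
write.table(toy, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

out_dir <- tempfile("split_demo")   # fresh output directory
files <- split_by_key(tmp, out_dir, "speciesKey", chunk_size = 2)
```

Only chunk_size rows are held in memory at any time, at the cost of reopening the output files for every chunk.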

Citation

Yohann Chauvier, Oskar Hagen, Stefan Pinkert, Camille Albouy, Fabian Fopp, Philipp Brun, Patrice Descombes, Florian Altermatt, Loic Pellissier, Katalin Csilléry. gbif.range: An R package to generate ecologically-informed species range maps from occurrence data with seamless GBIF integration. Authorea. June 30, 2025. doi: 10.22541/au.175130858.83083354/v1

References

Chamberlain, S., Oldoni, D., & Waller, J. (2022). rgbif: interface to the global biodiversity information facility API. doi: 10.5281/zenodo.6023735

Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., & Kessler, M. (2017). Climatologies at high resolution for the earth’s land surface areas. Sci Data 4, 170122. doi: 10.1038/sdata.2017.122

Zizka, A., Silvestro, D., Andermann, T., Azevedo, J., Duarte Ritter, C., Edler, D., ... & Antonelli, A. (2019). CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution, 10(5), 744-751. doi: 10.1111/2041-210X.13152

Hijmans, R. J. (2022). terra: Spatial Data Analysis. R package version 1.6-7. Available on CRAN.

Hagen, O., Vaterlaus, L., Albouy, C., Brown, A., Leugger, F., Onstein, R. E., Novaes de Santana, C., Scotese, C. R., & Pellissier, L. (2019). Mountain building, climate cooling and the richness of cold-adapted plants in the Northern Hemisphere. Journal of Biogeography, 46(8), 1792-1807. doi: 10.1111/jbi.13653

Hagen, O. Species_Range_Mapping. GitHub repository. Available at: https://github.com/ohagen/Species_Range_Mapping

Olson, D. M., Dinerstein, E., Wikramanayake, E. D., Burgess, N. D., Powell, G. V. N., Underwood, E. C., D'Amico, J. A., Itoua, I., Strand, H. E., Morrison, J. C., Loucks, C. J., Allnutt, T. F., Ricketts, T. H., Kura, Y., Lamoreux, J. F., Wettengel, W. W., Hedao, P., Kassem, K. R. 2001. Terrestrial ecoregions of the world: a new map of life on Earth. BioScience 51(11):933-938. doi: 10.1641/0006-3568(2001)051

The Nature Conservancy (2009). Global Ecoregions, Major Habitat Types, Biogeographical Realms and The Nature Conservancy Terrestrial Assessment Units. GIS layers developed by The Nature Conservancy with multiple partners, combined from Olson et al. (2001), Bailey 1995 and Wiken 1986. Cambridge (UK): The Nature Conservancy. Data URL: https://geospatial.tnc.org/datasets/b1636d640ede4d6ca8f5e369f2dc368b/about

Mark D. Spalding, Helen E. Fox, Gerald R. Allen, Nick Davidson, Zach A. Ferdaña, Max Finlayson, Benjamin S. Halpern, Miguel A. Jorge, Al Lombana, Sara A. Lourie, Kirsten D. Martin, Edmund McManus, Jennifer Molnar, Cheri A. Recchia, James Robertson, Marine Ecoregions of the World: A Bioregionalization of Coastal and Shelf Areas, BioScience, Volume 57, Issue 7, July 2007, Pages 573–583. doi: 10.1641/B570707

Spalding, M. D., Agostini, V. N., Rice, J., & Grant, S. M. (2012). Pelagic provinces of the world: a biogeographic classification of the world’s surface pelagic waters. Ocean & Coastal Management, 60, 19-30. doi: 10.1016/j.ocecoaman.2011.12.016

The Nature Conservancy (2012). Marine Ecoregions and Pelagic Provinces of the World. GIS layers developed by The Nature Conservancy with multiple partners, combined from Spalding et al. (2007) and Spalding et al. (2012). Cambridge (UK): The Nature Conservancy. Data URL: http://data.unep-wcmc.org/datasets/38

Robin Abell, Michele L. Thieme, Carmen Revenga, Mark Bryer, Maurice Kottelat, Nina Bogutskaya, Brian Coad, Nick Mandrak, Salvador Contreras Balderas, William Bussing, Melanie L. J. Stiassny, Paul Skelton, Gerald R. Allen, Peter Unmack, Alexander Naseka, Rebecca Ng, Nikolai Sindorf, James Robertson, Eric Armijo, Jonathan V. Higgins, Thomas J. Heibel, Eric Wikramanayake, David Olson, Hugo L. López, Roberto E. Reis, John G. Lundberg, Mark H. Sabaj Pérez, Paulo Petry, Freshwater Ecoregions of the World: A New Map of Biogeographic Units for Freshwater Biodiversity Conservation, BioScience, Volume 58, Issue 5, May 2008, Pages 403–414. doi: 10.1641/B580507
