Species ranges can be estimated from expert maps (for example IUCN and EUFORGEN) or with modelling approaches. Expert data, however, remain unavailable for many species, whereas modelling workflows often require substantial technical expertise and large numbers of occurrence records.
When such data are unavailable, ranges can often be approximated from occurrence records in the Global Biodiversity Information Facility (GBIF), the largest public repository of georeferenced species observations worldwide (https://www.gbif.org/). Retrieving GBIF records at large scale in R can still be cumbersome, especially if users are unfamiliar with the practical limits of the rgbif package.
gbif.range provides a workflow to retrieve GBIF records, filter them for spatial analyses, generate ecologically informed range maps from bundled or custom ecoregions, and evaluate the resulting products. The package also includes utilities to create GBIF-derived DOIs, inspect GBIF taxonomy, and thin large occurrence datasets.
(source: globe image from the Noun Project adapted by LenaCassie-Studio)
- get_gbif(): improves the accessibility of the rgbif R package (CRAN) for retrieving GBIF observations of a given species (accepted and synonym names). It uses a dynamic moving window if the given geographic extent contains > 100,000 observations and implements 13 post-processing options to flag and clean erroneous records, based on custom functions and the CoordinateCleaner R package (CRAN).
- get_gbif_count(): estimates how many GBIF records are available for a taxon using the same taxonomic matching logic as get_gbif(), which is useful before launching large downloads.
- get_range(): estimates species ranges based on occurrence data (a getGBIF output or a set of coordinates) and ecoregion polygons.
- read_ecoreg(): downloads and reads the available ecoregion files from their URL sources. See also the associated ecoreg_list, get_ecoreg() and check_and_get_ecoreg().
- get_status(): returns, for a given species name, its IUCN Red List status and a list of all scientific names (accepted, synonyms) found in the GBIF backbone taxonomy. Children and related doubtful names not used to download the data may also be extracted.
- obs_filter(): accepts a getGBIF output (one or several species) as input and filters the observations according to a given grid resolution. It can retain one observation per grid pixel and/or remove observations from grid pixels that contain fewer than a specified number of records.
- make_tiles(): generates a set of SpatialExtent objects and POLYGON() geometry arguments based on a given geographic extent. This function is meant to help users who want to use the rgbif R package and its geometry parameter, which takes a POLYGON() argument.
- get_doi(): a small wrapper of derived_dataset() in rgbif that simplifies obtaining a general DOI for a set of several GBIF species datasets.
- make_ecoreg(): a function to create custom ecoregions based on environmental layers.
- evaluate_range(): evaluation function to validate the species ranges with distribution information provided by the user.
- cv_range(): cross-validation function to evaluate a getRange output based on its occurrence data.
- make_blocks(): helper used to split observations into approximately balanced random or spatially structured folds, for example in cross-validation workflows.
- split_gbif_by_species(): streams a large downloaded GBIF table from disk and writes one occurrence file per species or GBIF taxon key without loading the full table into memory.
- species_csvs_to_ranges(): reads those per-species files sequentially, keeps the minimal occurrence columns needed by get_range(), and saves one range output per species.
- read_range_rds(): reads back .rds range files created by species_csvs_to_ranges() and restores the saved range output for plotting or further analysis.
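The grid-based filtering described for obs_filter() can be illustrated with a minimal base-R sketch. This is not the package's implementation: thin_to_grid() and its arguments (res, min_records, one_per_cell) are hypothetical names introduced here, and only the decimalLongitude and decimalLatitude column names are taken from the examples in this README.

```r
# Hypothetical helper (not part of gbif.range): thin observations on a regular grid.
thin_to_grid <- function(obs, res = 1, min_records = 1, one_per_cell = TRUE) {
  # Integer cell indices derived from the coordinates (res is in degrees)
  cx <- floor(obs$decimalLongitude / res)
  cy <- floor(obs$decimalLatitude / res)
  cell <- paste(cx, cy, sep = "_")
  # Number of records falling in the cell of each observation
  n_in_cell <- ave(seq_along(cell), cell, FUN = length)
  # Drop cells with too few records, then optionally keep one record per cell
  keep <- n_in_cell >= min_records
  if (one_per_cell) keep <- keep & !duplicated(cell)
  obs[keep, , drop = FALSE]
}

# Toy observations: three in one cell, one alone, two in a third cell
obs <- data.frame(
  decimalLongitude = c(0.1, 0.2, 0.9, 1.5, 2.2, 2.8),
  decimalLatitude  = c(0.1, 0.3, 0.8, 0.5, 0.1, 0.9)
)

one_each   <- thin_to_grid(obs, res = 1)
dense_only <- thin_to_grid(obs, res = 1, min_records = 2, one_per_cell = FALSE)
both       <- thin_to_grid(obs, res = 1, min_records = 2)
```

The two rules are independent, which is why the sketch exposes them as separate arguments: thinning reduces spatial sampling bias, while the minimum-record rule discards weakly supported cells.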
You can install the development version from GitHub with (make sure the R package remotes is up to date):
remotes::install_github("8Ginette8/gbif.range", build_vignettes = TRUE)
library(gbif.range)
If you install from GitHub or a local source tree without build_vignettes = TRUE, the package will load normally but browseVignettes("gbif.range") will not find the workflow vignettes.
The package now includes three focused workflow vignettes:
- ecoregion-constrained-range-inference: the core logic of get_range(), packaged versus custom ecoregions, and evaluation workflows.
- gbif-retrieval-and-taxonomy: synonym-aware GBIF backbone inspection, credential-free retrieval, filtering, tiling, and GBIF-derived DOIs.
- large-downloaded-gbif-tables: the disk-based workflow for splitting large downloaded GBIF tables and generating one range per species.
After installation, open them with:
browseVignettes("gbif.range")
vignette("ecoregion-constrained-range-inference", package = "gbif.range")
vignette("gbif-retrieval-and-taxonomy", package = "gbif.range")
vignette("large-downloaded-gbif-tables", package = "gbif.range")
If they are not found after a GitHub or local install, reinstall with build_vignettes = TRUE, or install the built source tarball with R CMD INSTALL.
Let's download all worldwide records of Panthera tigris, keeping only those based on true observations and literature (the default):
# Download
obs.pt <- get_gbif(sp_name = "Panthera tigris")
# Plot species records
countries <- rnaturalearth::ne_countries(type = "countries", returnclass = "sv")
terra::plot(countries, col = "#bcbddc")
points(obs.pt[, c("decimalLongitude","decimalLatitude")], pch = 20, col = "#99340470", cex = 1.5)
Note that the function could not remove observations that most likely correspond to unreported captive individuals (e.g., in Europe, the U.S. and South Africa); see the CoordinateCleaner R package (CRAN) for improved filtering. We can also retrieve the tiger's IUCN Red List status and the scientific names (accepted and synonyms) that were used in the download with the GBIF backbone taxonomy. If all = TRUE, additional children and related doubtful names may also be extracted (these are not used in get_gbif()):
get_status("Panthera tigris", all = FALSE)
Let's now extract the terrestrial ecoregions of the world (Nature Conservancy) and generate the distributional range map of Panthera tigris:
# Download ecoregion and read
eco.terra <- read_ecoreg(ecoreg_name = "eco_terra", save_dir = NULL)
# Range
range.tiger <- get_range(occ_coord = obs.pt,
                         ecoreg = eco.terra,
                         ecoreg_name = "ECO_NAME",
                         degrees_outlier = 5,
                         clust_pts_outlier = 4)
Let's plot the result now:
terra::plot(countries, col = "#bcbddc")
terra::plot(range.tiger$rangeOutput, col = "#238b45", add = TRUE, axes = FALSE, legend = FALSE)
Default parameters were used here. However, clust_pts_outlier (in degrees, ~440 km here) could have been increased to remove larger isolated clusters of observations, and degrees_outlier (~550 km here) to retain more widely separated observations in the range estimation. Even so, the defaults removed the obvious anomalous tiger observations in Europe, the U.S. and South Africa.
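The kilometre figures quoted for thresholds expressed in degrees can be checked with a short conversion. The sketch below assumes a spherical Earth with a mean radius of 6371 km; deg_to_km() is a hypothetical helper written for this illustration, not a gbif.range function. It also shows that a degree of longitude shrinks with latitude, so the same threshold covers less ground away from the equator.

```r
# Assumption: spherical Earth, mean radius 6371 km
earth_radius_km <- 6371
km_per_degree <- 2 * pi * earth_radius_km / 360  # about 111 km per degree

# Hypothetical helper: convert a distance in degrees to kilometres.
# Along a meridian the conversion is constant; along a parallel it
# scales with the cosine of the latitude.
deg_to_km <- function(deg, lat = 0, along = c("latitude", "longitude")) {
  along <- match.arg(along)
  km <- deg * km_per_degree
  if (along == "longitude") km <- km * cos(lat * pi / 180)
  km
}

d5 <- deg_to_km(5)                                        # threshold of 5 degrees
d4 <- deg_to_km(4)                                        # threshold of 4 degrees
d5_alps <- deg_to_km(5, lat = 46, along = "longitude")    # 5 degrees of longitude at 46 N
```

Under this assumption, 5 and 4 degrees come out close to the ~550 km and ~440 km approximations used in the text, while 5 degrees of longitude at the latitude of the Alps is noticeably shorter.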
Any suitable shapefile can be supplied to get_range(), but the package can also download several ecoregion layers directly: eco_terra (for terrestrial species; The Nature Conservancy 2009 adapted from Olson et al. 2001), eco_marine (for marine species, two versions; The Nature Conservancy 2012 adapted from Spalding et al. 2007, 2012), and eco_fresh (for freshwater species; Abell et al. 2008). Each is available at different levels of detail:
- eco_terra has three different levels: 'ECO_NAME', 'WWF_MHTNAM' and 'WWF_REALM2'.
- eco_fresh has only one: 'ECOREGION'.
- eco_marine and eco_hd_marine (the more coastline-precise version) contain three distinct levels: 'ECOREGION', 'PROVINCE' and 'REALM'.
Available ecoregion files that can be downloaded with the package:
# List
ecoreg_list
Additionally, if the bundled ecoregions are too coarse for a given geographic region (e.g., for local studies), or an ecoregion shapefile with finer environmental detail is needed, make_ecoreg() can be used with spatially informed data (e.g., climate, biodiversity) at the desired resolution and extent of the study area.
Example of 10 ecoregions in the European Alps based on CHELSA bioclimatic layers at 5 × 5 km resolution (Karger et al. 2017), i.e., mean annual air temperature (bio1) and annual precipitation amount (bio12) 1981–2010:
bio <- terra::rast(paste0(system.file(package = "gbif.range"), "/extdata/rst.tif"))
eco.eg <- make_ecoreg(env = bio, nclass = 10)
terra::plot(eco.eg, col = rainbow(10))
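To give an intuition for how environmental layers can be partitioned into a fixed number of classes, here is a minimal sketch that clusters synthetic temperature and precipitation values with k-means. The choice of k-means, the synthetic data and all variable names are assumptions made for illustration; this README does not state which method make_ecoreg() uses internally.

```r
set.seed(42)

# Synthetic stand-ins for two environmental layers on a 30 x 30 grid
n_side <- 30
n_cell <- n_side * n_side
temp <- rnorm(n_cell, mean = 5, sd = 4)        # hypothetical mean annual temperature
prec <- rnorm(n_cell, mean = 1200, sd = 300)   # hypothetical annual precipitation

# Standardise so that both variables contribute equally to the distances
env <- scale(cbind(temp, prec))

# Partition the environmental space into 10 classes (assumed method: k-means)
km <- stats::kmeans(env, centers = 10, nstart = 5, iter.max = 50)

# Map the class of each cell back onto the grid
classes <- matrix(km$cluster, nrow = n_side, ncol = n_side)
table(km$cluster)
```

Standardising first matters because precipitation values are two orders of magnitude larger than temperature values and would otherwise dominate the clustering.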
Let's further demonstrate how a custom map of ecoregions can be employed in combination with the package's main functions:
# Let's download the observations of Arctostaphylos alpinus in the European Alps:
shp.lonlat <- terra::vect(paste0(system.file(package = "gbif.range"), "/extdata/shp_lonlat.shp"))
obs.arcto <- get_gbif(sp_name = "Arctostaphylos alpinus",
                      geo = shp.lonlat,
                      grain = 1)
# Create an ecoregion layer of 200 classes, based on two environmental spatial layers:
rst <- terra::rast(paste0(system.file(package = "gbif.range"), "/extdata/rst.tif"))
my.eco <- make_ecoreg(env = rst,
                      nclass = 200)
# Create the range map based on our custom ecoregion
# (always set 'EcoRegion' as a name when using a make_ecoreg() output):
range.arcto <- get_range(occ_coord = obs.arcto,
                         ecoreg = my.eco,
                         ecoreg_name = "EcoRegion",
                         degrees_outlier = 5,
                         clust_pts_outlier = 4,
                         res = 0.05)
Unlike at larger scales, we have here decreased the get_gbif() grain parameter from 100 km to 1 km, because keeping observations with a precision of 100 km would have been too coarse to infer the approximate range of the species relative to the study extent. degrees_outlier and clust_pts_outlier were kept at their defaults (~550 and ~440 km, respectively), so relative to the study extent, almost no clustered or overly distant observations were considered outliers.
Note also that the resolution parameter (res) can be changed to adjust how fine the spatial output should be. The highest possible resolution depends only on the precision of the ecoreg object (e.g., a range output can reach the same resolution as the rasters used to create a make_ecoreg() object).
# Plot
alps.shp <- terra::crop(countries,terra::ext(rst))
r.arcto <- terra::mask(range.arcto$rangeOutput,alps.shp)
terra::plot(alps.shp, col = "#bcbddc")
terra::plot(r.arcto, add = TRUE, col = "darkgreen", axes = FALSE, legend = FALSE)
points(obs.arcto[, c("decimalLongitude","decimalLatitude")], pch = 20, col = "#99340470", cex = 1)
Let's reapply the same process as for Panthera tigris, but with the marine species Delphinus delphis (> 100,000 observations).
Because of this large number of records, occ_samp is used. Although it gives a less precise picture of the observed distribution, occ_samp extracts a subsample of n GBIF observations per tile created over the study area:
# Here the example is a sample of 1000 observations per geographic tile
obs.dd <- get_gbif("Delphinus delphis", occ_samp = 1000)
# Here the list is longer because 'all = TRUE' includes every name (even doubtful ones)
get_status("Delphinus delphis", all = TRUE)
Let's now generate three range maps of Delphinus delphis using eco.marine as the ecoregion shapefile:
# Download ecoregion and read
eco.marine <- read_ecoreg(ecoreg_name = "eco_marine", save_dir = NULL)
# Range from different levels
range.dd1 <- get_range(obs.dd, eco.marine, "ECOREGION")
range.dd2 <- get_range(obs.dd, eco.marine, "PROVINCE")
range.dd3 <- get_range(obs.dd, eco.marine, "REALM")
The three results are quite similar because most of the observations are near the coast. Let's plot the first, finest result:
terra::plot(countries, col = "#bcbddc")
terra::plot(range.dd1$rangeOutput, col = "#238b45", add = TRUE, axes = FALSE, legend = FALSE)
points(obs.dd[, c("decimalLongitude","decimalLatitude")], pch = 20, col = "#99340470", cex = 1)
Although our result map follows the sampling pattern found in GBIF, the dolphin range map might have been improved if more GBIF observations had been extracted. In this case, occ_samp should therefore be increased or removed.
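The trade-off behind per-tile subsampling can be sketched in base R. The example below is only an illustration of the idea, not what get_gbif() does internally: the 2 x 2 tiling, the synthetic coordinates and the names obs, tile and n_max are all assumptions. Each tile contributes at most n_max records, so densely sampled tiles are cut back while sparse tiles are kept whole.

```r
set.seed(1)

# Synthetic observations over a small extent (hypothetical data)
obs <- data.frame(lon = runif(500, -10, 10), lat = runif(500, 40, 50))

# Assign each observation to one of 2 x 2 tiles
obs$tile <- paste(
  cut(obs$lon, breaks = c(-10, 0, 10), include.lowest = TRUE),
  cut(obs$lat, breaks = c(40, 45, 50), include.lowest = TRUE)
)

# Keep at most n_max observations per tile
n_max <- 50
keep <- unlist(
  lapply(split(seq_len(nrow(obs)), obs$tile), function(idx) {
    if (length(idx) > n_max) sample(idx, n_max) else idx
  }),
  use.names = FALSE
)
sub <- obs[keep, ]

table(sub$tile)
```

A small n_max keeps downloads fast but discards most records in well-sampled tiles, which is why a larger value (or no subsampling) gives a more faithful picture of the observed distribution.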
For very large multi-species GBIF exports already stored on disk, the package also provides a disk-based workflow:
gbif_file <- system.file("extdata", "occ_example_4sps.csv", package = "gbif.range")
split_dir <- file.path(tempdir(), "gbif_split")
range_dir <- file.path(tempdir(), "gbif_ranges")
# Split one downloaded GBIF table into one species file per GBIF key.
split_summary <- split_gbif_by_species(
  input_file = gbif_file,
  outdir = split_dir,
  chunk_size = 100,
  sep_in = "\t",
  sep_out = "\t",
  overwrite = TRUE,
  verbose = FALSE
)
# Build one range per species from those on-disk occurrence files.
range_summary <- species_csvs_to_ranges(
  species_dir = split_dir,
  ecoreg = "eco_terra",
  ecoreg_name = "ECO_NAME",
  outdir = range_dir,
  range_save_as = "rds",
  overwrite = TRUE,
  verbose = FALSE
)
# Read one saved range back from disk.
rg <- read_range_rds(range_summary$range_file[1])
terra::plot(rg$rangeOutput)
This disk-based workflow is described in more detail in the vignette large-downloaded-gbif-tables.
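The idea of splitting a large table by species without loading it fully into memory can be sketched with base R connections. This is a simplified illustration under stated assumptions, not the code of split_gbif_by_species(): split_by_key(), the toy table and the speciesKey column name are hypothetical, and real GBIF exports need more careful handling of quoting and missing values.

```r
# Toy tab-separated table standing in for a downloaded GBIF export
toy <- data.frame(
  speciesKey = rep(c(101, 202, 303), times = c(5, 3, 4)),
  decimalLongitude = seq(5, 16, length.out = 12),
  decimalLatitude = seq(44, 47.3, length.out = 12)
)
toy <- toy[c(1, 6, 9, 2, 7, 10, 3, 8, 11, 4, 12, 5), ]  # interleave the species
in_file <- tempfile(fileext = ".tsv")
write.table(toy, in_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical helper: read the file chunk by chunk and append each
# chunk's rows to one output file per key.
split_by_key <- function(input_file, outdir, key = "speciesKey", chunk_size = 4) {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  con <- file(input_file, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
  written <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_size)   # only chunk_size rows in memory
    if (length(lines) == 0) break
    chunk <- read.table(text = lines, sep = "\t", col.names = header,
                        stringsAsFactors = FALSE)
    for (k in unique(chunk[[key]])) {
      out <- file.path(outdir, paste0(k, ".tsv"))
      first <- !(out %in% written)            # write the header only once per file
      write.table(chunk[chunk[[key]] == k, ], out, sep = "\t", quote = FALSE,
                  row.names = FALSE, col.names = first, append = !first)
      written <- union(written, out)
    }
  }
  written
}

out_dir <- file.path(tempdir(), "split_sketch")
files <- split_by_key(in_file, out_dir)
```

Memory use is bounded by the chunk size rather than by the size of the input file, at the cost of reopening the output files for every chunk.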
Yohann Chauvier, Oskar Hagen, Stefan Pinkert, Camille Albouy, Fabian Fopp, Philipp Brun, Patrice Descombes, Florian Altermatt, Loic Pellissier, Katalin Csilléry. gbif.range: An R package to generate ecologically-informed species range maps from occurrence data with seamless GBIF integration. Authorea. June 30, 2025. doi: 10.22541/au.175130858.83083354/v1
Chamberlain, S., Oldoni, D., & Waller, J. (2022). rgbif: interface to the global biodiversity information facility API. doi: 10.5281/zenodo.6023735
Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., & Kessler, M. (2017). Climatologies at high resolution for the earth’s land surface areas. Sci Data 4, 170122. doi: 10.1038/sdata.2017.122
Zizka, A., Silvestro, D., Andermann, T., Azevedo, J., Duarte Ritter, C., Edler, D., ... & Antonelli, A. (2019). CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution, 10(5), 744-751. doi: 10.1111/2041-210X.13152
Hijmans, Robert J. "terra: Spatial Data Analysis. R Package Version 1.6-7." (2022). Link to package: terra - CRAN
Hagen, O., Vaterlaus, L., Albouy, C., Brown, A., Leugger, F., Onstein, R. E., Novaes de Santana, C., Scotese, C. R., & Pellissier, L. (2019). Mountain building, climate cooling and the richness of cold-adapted plants in the Northern Hemisphere. Journal of Biogeography, 46(8), 1792-1807. doi: 10.1111/jbi.13653
Hagen, O. Species_Range_Mapping. GitHub repository. Available at: https://github.com/ohagen/Species_Range_Mapping
Olson, D. M., Dinerstein, E., Wikramanayake, E. D., Burgess, N. D., Powell, G. V. N., Underwood, E. C., D'Amico, J. A., Itoua, I., Strand, H. E., Morrison, J. C., Loucks, C. J., Allnutt, T. F., Ricketts, T. H., Kura, Y., Lamoreux, J. F., Wettengel, W. W., Hedao, P., Kassem, K. R. 2001. Terrestrial ecoregions of the world: a new map of life on Earth. BioScience 51(11):933-938. doi: 10.1641/0006-3568(2001)051
The Nature Conservancy (2009). Global Ecoregions, Major Habitat Types, Biogeographical Realms and The Nature Conservancy Terrestrial Assessment Units. GIS layers developed by The Nature Conservancy with multiple partners, combined from Olson et al. (2001), Bailey 1995 and Wiken 1986. Cambridge (UK): The Nature Conservancy. Data URL: https://geospatial.tnc.org/datasets/b1636d640ede4d6ca8f5e369f2dc368b/about
Mark D. Spalding, Helen E. Fox, Gerald R. Allen, Nick Davidson, Zach A. Ferdaña, Max Finlayson, Benjamin S. Halpern, Miguel A. Jorge, Al Lombana, Sara A. Lourie, Kirsten D. Martin, Edmund McManus, Jennifer Molnar, Cheri A. Recchia, James Robertson, Marine Ecoregions of the World: A Bioregionalization of Coastal and Shelf Areas, BioScience, Volume 57, Issue 7, July 2007, Pages 573–583. doi: 10.1641/B570707
Spalding, M. D., Agostini, V. N., Rice, J., & Grant, S. M. (2012). Pelagic provinces of the world: a biogeographic classification of the world’s surface pelagic waters. Ocean & Coastal Management, 60, 19-30. doi: 10.1016/j.ocecoaman.2011.12.016
The Nature Conservancy (2012). Marine Ecoregions and Pelagic Provinces of the World. GIS layers developed by The Nature Conservancy with multiple partners, combined from Spalding et al. (2007) and Spalding et al. (2012). Cambridge (UK): The Nature Conservancy. Data URL: http://data.unep-wcmc.org/datasets/38
Robin Abell, Michele L. Thieme, Carmen Revenga, Mark Bryer, Maurice Kottelat, Nina Bogutskaya, Brian Coad, Nick Mandrak, Salvador Contreras Balderas, William Bussing, Melanie L. J. Stiassny, Paul Skelton, Gerald R. Allen, Peter Unmack, Alexander Naseka, Rebecca Ng, Nikolai Sindorf, James Robertson, Eric Armijo, Jonathan V. Higgins, Thomas J. Heibel, Eric Wikramanayake, David Olson, Hugo L. López, Roberto E. Reis, John G. Lundberg, Mark H. Sabaj Pérez, Paulo Petry, Freshwater Ecoregions of the World: A New Map of Biogeographic Units for Freshwater Biodiversity Conservation, BioScience, Volume 58, Issue 5, May 2008, Pages 403–414. doi: 10.1641/B580507



