wcvpmatch

wcvpmatch standardizes scientific plant names and reconciles them against a World Checklist of Vascular Plants (WCVP)-style backbone. It combines parsing, exact and fuzzy matching, accepted-name resolution, and optional distribution retrieval in a single workflow.

The package is built around three main pieces:

- `classify_spnames()` parses and normalizes submitted names.
- `wcvp_matching()` resolves names against a WCVP-like backbone.
- `wcvp_distribution()` retrieves species, genus, or family distribution from WCVP name and distribution tables.
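
To give a rough idea of what "parses and normalizes" can mean for a binomial, here is a minimal base R sketch. The helper `split_binomial()` is hypothetical and is not part of wcvpmatch; the package's own parser may handle authors, infraspecific ranks, and hybrids that this sketch ignores.

```r
# Hypothetical helper, for illustration only; not the wcvpmatch parser.
split_binomial <- function(x) {
  # Trim the ends and collapse runs of whitespace to a single space.
  cleaned <- gsub("\\s+", " ", trimws(x))
  parts <- strsplit(cleaned, " ", fixed = TRUE)

  # First token is taken as the genus, second as the specific epithet.
  genus <- vapply(parts, function(p) p[1], character(1))
  species <- vapply(parts, function(p) p[2], character(1))

  # Capitalize the genus and lower-case the epithet; keep NA as NA.
  genus <- ifelse(
    is.na(genus),
    NA_character_,
    paste0(toupper(substr(genus, 1, 1)), tolower(substring(genus, 2)))
  )

  data.frame(
    cleaned_name = cleaned,
    genus = genus,
    species = tolower(species),
    stringsAsFactors = FALSE
  )
}

parsed <- split_binomial(c("  aniba   HETEROTEPALA", "Veronica vulcanica"))
parsed$genus
#> [1] "Aniba"    "Veronica"
parsed$species
#> [1] "heterotepala" "vulcanica"
```

Splitting into genus and epithet before matching is what makes it possible to compare each part against the corresponding backbone column.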

Installation

Install the development version from GitHub:

# install.packages("pak")
pak::pak("PaulESantos/wcvpmatch")

Install the CRAN release:

# install.packages("pak")
pak::pak("wcvpmatch")

wcvpmatch uses fozziejoin for fuzzy matching. If fozziejoin is installed from source, a working Rust toolchain is needed.

Install Rust from https://rust-lang.org/tools/install/. On Windows, the most practical setup for R + Rtools is:

rustup override set stable-x86_64-pc-windows-gnu

Install fozziejoin:

pak::pak("fozziejoin")

To use the default WCVP backbone automatically, install wcvpdata from r-universe:

install.packages(
  "wcvpdata",
  repos = c("https://paulesantos.r-universe.dev", "https://cloud.r-project.org")
)

Quick example: `wcvp_matching()`

library(wcvpmatch)
library(tibble) # tibble()
library(dplyr)  # select()

matching_backbone <- tibble(
  genus = c("Aniba", "Jaltomata", "Veronica", "Veronica"),
  species = c("heterotepala", "sagastegui", "vulcanica", "spathulata"),
  infraspecific_rank = NA_character_,
  infraspecies = NA_character_,
  plant_name_id = c(1, 2, 10, 200),
  taxon_name = c(
    "Aniba heterotepala",
    "Jaltomata sagastegui",
    "Veronica vulcanica",
    "Veronica spathulata"
  ),
  taxon_authors = c("A.Author", "B.Author", "C.Author", "D.Author"),
  taxon_status = c("Accepted", "Accepted", "Synonym", "Accepted"),
  accepted_plant_name_id = c(1, 2, 200, 200)
)

matching_result <- classify_spnames(
  c("Aniba heterotepala", "Jaltometa sagasteguii", "Veronica vulcanica")
) |>
  wcvp_matching(
    target_df = matching_backbone,
    allow_duplicates = TRUE,
    max_dist = 2,
    method = "osa",
    add_name_distance = TRUE,
    output_name_style = "snake_case"
  ) |>
  select(
    input_name,
    matched_taxon_name,
    accepted_taxon_name,
    taxon_status,
    matched_dist
  )

matching_result
#> # A tibble: 3 × 5
#>   input_name    matched_taxon_name accepted_taxon_name taxon_status matched_dist
#>   <chr>         <chr>              <chr>               <chr>               <dbl>
#> 1 Aniba hetero… Aniba heterotepala Aniba heterotepala  accepted                0
#> 2 Jaltometa sa… Jaltomata sagaste… Jaltomata sagasteg… accepted                2
#> 3 Veronica vul… Veronica vulcanica Veronica spathulata synonym                 0
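
The `matched_dist` of 2 in the second row can be checked by hand. The sketch below uses base R's `utils::adist()`, which computes generalized Levenshtein distance rather than the `"osa"` method passed above; the two measures agree when no adjacent characters are swapped, which is the case for this pair. It only illustrates the distance itself and does not show how the package applies `max_dist` internally.

```r
# Illustrative check with base R; not the code path wcvp_matching() uses.
submitted <- "Jaltometa sagasteguii"
candidate <- "Jaltomata sagastegui"

# One substitution (e -> a) plus one deletion (trailing i) gives distance 2.
name_distance <- drop(utils::adist(submitted, candidate))
name_distance
#> [1] 2
```

With `max_dist = 2`, a candidate at this distance is still within the allowed threshold.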

Quick example: `wcvp_distribution()`

distribution_names <- tibble(
  plant_name_id = c(1, 2, 3, 4, 5, 6),
  accepted_plant_name_id = c(NA, 3, NA, NA, 1, NA),
  taxon_rank = c("Species", "Species", "Species", "Species", "Species", "Species"),
  taxon_status = c("Accepted", "Synonym", "Accepted", "Accepted", "Synonym", "Accepted"),
  family = c("Cactaceae", "Cactaceae", "Cactaceae", "Fagaceae", "Cactaceae", "Cactaceae"),
  genus = c("Opuntia", "Nopalea", "Opuntia", "Quercus", "Opuntia", "Mammillaria"),
  species = c("ficus-indica", "cochenillifera", "cochenillifera", "robur", "tuna", "elongata"),
  taxon_name = c(
    "Opuntia ficus-indica",
    "Nopalea cochenillifera",
    "Opuntia cochenillifera",
    "Quercus robur",
    "Opuntia tuna",
    "Mammillaria elongata"
  )
)

distribution_records <- tibble(
  plant_locality_id = 1:7,
  plant_name_id = c(1, 2, 3, 3, 4, 5, 6),
  continent_code_l1 = c("8", "8", "8", "4", "1", "8", "8"),
  continent = c(
    "SOUTHERN AMERICA", "SOUTHERN AMERICA", "SOUTHERN AMERICA",
    "NORTHERN AMERICA", "EUROPE", "SOUTHERN AMERICA", "SOUTHERN AMERICA"
  ),
  region_code_l2 = c("83", "83", "83", "41", "10", "85", "83"),
  region = c(
    "Western South America", "Western South America", "Western South America",
    "Mexico", "Europe", "Southern South America", "Western South America"
  ),
  area_code_l3 = c("MEX", "PER", "COL", "MEX", "ESP", "GAL", "MEX"),
  area = c("Mexico", "Peru", "Colombia", "Mexico", "Spain", "Galapagos", "Mexico"),
  introduced = c(0, 0, 0, 1, 0, 0, 0),
  extinct = c(0, 0, 0, 0, 0, 0, 0),
  location_doubtful = c(0, 0, 0, 0, 0, 0, 0)
)

distribution_result <- wcvp_distribution(
  c("Nopalea cochenilliferaa", "Taxon inexistente"),
  taxon_rank = "species",
  summarise_by_input = TRUE,
  wcvp_names = distribution_names,
  wcvp_distributions = distribution_records
) |>
  select(
    submited_name,
    accepted_taxon_name,
    distribution_status,
    distribution,
    n_areas
  )

distribution_result
#> # A tibble: 2 × 5
#>   submited_name     accepted_taxon_name distribution_status distribution n_areas
#>   <chr>             <chr>               <chr>               <chr>          <int>
#> 1 Nopalea cochenil… Opuntia cochenilli… distribution_found  Colombia - …       2
#> 2 Taxon inexistente <NA>                no_match            <NA>               0
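
The first row resolves a synonym to its accepted name and then summarises that name's areas. The base R sketch below walks through that idea on a reduced copy of the toy tables. It is not the package's implementation, and it assumes that only records attached to the accepted name are counted and that areas are joined alphabetically with `" - "`.

```r
# Reduced copies of the toy tables above (illustration only).
names_tbl <- data.frame(
  plant_name_id = c(2, 3),
  accepted_plant_name_id = c(3, NA),
  taxon_name = c("Nopalea cochenillifera", "Opuntia cochenillifera"),
  stringsAsFactors = FALSE
)
dist_tbl <- data.frame(
  plant_name_id = c(2, 3, 3),
  area = c("Peru", "Colombia", "Mexico"),
  stringsAsFactors = FALSE
)

# 1. Locate the matched name (spelling already corrected by fuzzy matching).
hit <- names_tbl[names_tbl$taxon_name == "Nopalea cochenillifera", ]

# 2. Follow accepted_plant_name_id; NA is assumed to mean "already accepted".
accepted_id <- ifelse(
  is.na(hit$accepted_plant_name_id),
  hit$plant_name_id,
  hit$accepted_plant_name_id
)
accepted_name <- names_tbl$taxon_name[match(accepted_id, names_tbl$plant_name_id)]

# 3. Collect the distinct areas recorded for the accepted name.
areas <- sort(unique(dist_tbl$area[dist_tbl$plant_name_id == accepted_id]))

accepted_name
#> [1] "Opuntia cochenillifera"
paste(areas, collapse = " - ")
#> [1] "Colombia - Mexico"
length(areas)
#> [1] 2
```

Under these assumptions the sketch is consistent with the truncated row above: two areas, the first of which is Colombia.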

Learn more

The README keeps the examples short. For full guides, see the package vignettes:

vignette("wcvp-matching", package = "wcvpmatch")
vignette("wcvp-distribution", package = "wcvpmatch")

Those articles describe:

- accepted-name resolution and status handling
- staged fuzzy matching and diagnostics
- duplicate handling and profiling
- species, genus, and family distribution retrieval
- occurrence filters and summarised output

Acknowledgement

wcvpmatch builds on ideas used in the treemendous matching workflow and extends them for WCVP-focused reconciliation and reproducible row-level traceability.
