Merged
24 commits
ff1e8ce
rename_variable(): fix documentation of return value
florisvdh Feb 18, 2026
d6ab4ad
rename_variable(): test that return value contains no root path
florisvdh Feb 18, 2026
f486662
rename_variable(): normalize root and file paths *
florisvdh Feb 18, 2026
2bbd7d8
🔖 Bump package version
ThierryO Feb 19, 2026
d1daf37
👷 Remove pr_title GHA
ThierryO Feb 19, 2026
94a23c2
Initial plan
Copilot Feb 25, 2026
87acf64
Implement convert feature for write_vc and read_vc
Copilot Feb 25, 2026
4525416
Add convert change detection and additional tests
Copilot Feb 25, 2026
afadd11
Fix line lengths to comply with linter (80 chars max)
Copilot Feb 25, 2026
b7e19cd
Update NEWS.md with convert feature
Copilot Feb 25, 2026
d06a99a
Fixing the return value of rename_variable() on Windows (#83)
ThierryO Feb 25, 2026
4df805e
Merge branch '0.5.2' into copilot/fix-write-vc-optional-argument
ThierryO Feb 25, 2026
99f2488
📝 Add missing documentation
ThierryO Feb 25, 2026
5f20848
Reduce cyclomatic complexity and fix test warning message
Copilot Feb 25, 2026
e6587d3
Fix metadata hash calculation to include convert info
Copilot Feb 25, 2026
d2a7474
Fix convert comparison in metadata update
Copilot Feb 25, 2026
1deb27d
Add vignettes for convert and data_package features
Copilot Feb 25, 2026
94dabdd
✅ Fix failing unit test
ThierryO Mar 2, 2026
7d453d7
🐛 Fix vignette
ThierryO Mar 2, 2026
63d303a
💚 Add words to dictonary
ThierryO Mar 2, 2026
6f263c2
🚨 Fix linters
ThierryO Mar 3, 2026
62930a2
➕ Add support for air in vscode
ThierryO Mar 3, 2026
70411ea
📝 Improve the vignette on convert
ThierryO Mar 3, 2026
302ac7c
Add optional convert argument to write_vc/read_vc for column transfor…
ThierryO Mar 3, 2026
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -5,9 +5,11 @@
^LICENSE.md$
^Meta$
^README\.Rmd$
^[.]?air[.]toml$
^\.Rproj\.user$
^\.github$
^\.httr-oauth$
^\.vscode$
^\.zenodo\.json$
^_pkgdown.yml$
^checklist.yml$
20 changes: 0 additions & 20 deletions .github/workflows/pr_title.yml

This file was deleted.

5 changes: 5 additions & 0 deletions .vscode/extensions.json
@@ -0,0 +1,5 @@
{
"recommendations": [
"Posit.air-vscode"
]
}
10 changes: 10 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,10 @@
{
"[r]": {
"editor.formatOnSave": true,
"editor.defaultFormatter": "Posit.air-vscode"
},
"[quarto]": {
"editor.formatOnSave": true,
"editor.defaultFormatter": "quarto.quarto"
}
}
2 changes: 1 addition & 1 deletion .zenodo.json
@@ -1,6 +1,6 @@
{
"title": "git2rdata: Store and Retrieve Data.frames in a Git Repository",
- "version": "0.5.1",
+ "version": "0.5.2",
"license": "GPL-3.0",
"upload_type": "software",
"description": "<p>The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata allows to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette(“plain_text”, package = “git2rdata”). 2) Storing metadata also allows smaller row based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette(“version_control”, package = “git2rdata”). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette(“workflow”, package = “git2rdata”) gives a toy example. 4) vignette(“efficiency”, package = “git2rdata”) provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.<\/p>",
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -40,4 +40,4 @@ identifiers:
value: 10.5281/zenodo.1485309
- type: url
value: https://ropensci.github.io/git2rdata/
- version: 0.5.1
+ version: 0.5.2
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: git2rdata
Title: Store and Retrieve Data.frames in a Git Repository
- Version: 0.5.1
+ Version: 0.5.2
Authors@R: c(
person("Thierry", "Onkelinx", , "thierry.onkelinx@inbo.be", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-8804-4216", affiliation = "Research Institute for Nature and Forest (INBO)")),
@@ -11,7 +11,7 @@ Authors@R: c(
person("Els", "Lommelen", , "els.lommelen@inbo.be", role = "ctb",
comment = c(ORCID = "0000-0002-3481-5684", affiliation = "Research Institute for Nature and Forest (INBO)")),
person("Research Institute for Nature and Forest (INBO)", , , "info@inbo.be", role = c("cph", "fnd"),
- comment = c(ROR = "https://ror.org/00j54wy13"))
+ comment = c(ROR = "00j54wy13"))
)
Description: The git2rdata package is an R package for writing and reading
dataframes as plain text files. A metadata file stores important
@@ -66,9 +66,9 @@ Collate:
'datahash.R'
'display_metadata.R'
'git2rdata_package.R'
'is_git2rmeta.R'
'write_vc.R'
'is_git2rdata.R'
'is_git2rmeta.R'
'list_data.R'
'meta.R'
'print.R'
11 changes: 11 additions & 0 deletions NEWS.md
@@ -1,3 +1,14 @@
# git2rdata 0.5.2

* `write_vc()` gains an optional `convert` argument for specifying column
conversions. Conversions are applied before storing and reversed when
reading data back. The convert information is stored in the metadata
and added to the data frame attributes.
* `read_vc()` now applies conversions specified in the metadata and adds
the convert information to the data frame attributes.
* Bugfix in `rename_variable()` thanks to @florisvdh for finding and fixing the
bug.

# git2rdata 0.5.1

* `write_vc()` stores metadata stored in the data frame.
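As a rough illustration of the 0.5.2 entry above, the sketch below writes a data frame with a `convert` specification and reads it back. It assumes git2rdata 0.5.2 or later is installed and skips the round trip otherwise. The shape of `convert` (a named list holding one `c(write = "pkg::fun", read = "pkg::fun")` vector per column) is inferred from the validation code in this pull request. The column names and the choice of `base::toupper` and `base::tolower` are purely illustrative.

```r
# Illustrative specification: one entry per converted column.
convert <- list(
  code = c(write = "base::toupper", read = "base::tolower")
)

# Assumption: git2rdata >= 0.5.2 is available; otherwise nothing is run.
if (
  requireNamespace("git2rdata", quietly = TRUE) &&
    utils::packageVersion("git2rdata") >= "0.5.2"
) {
  root <- tempfile("git2rdata-convert")
  dir.create(root)
  x <- data.frame(id = 1:3, code = c("a", "b", "c"))

  # `code` is passed through base::toupper() before it is stored.
  git2rdata::write_vc(
    x,
    file = "example",
    root = root,
    sorting = "id",
    convert = convert
  )

  # base::tolower() is applied when the data is read back.
  y <- git2rdata::read_vc("example", root = root)

  # The specification travels with the data as an attribute.
  attr(y, "convert")
}
```

The round trip only restores the original values when the `read` function truly inverts the `write` function, so a pair like the one above would not suit a column with mixed-case text.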
11 changes: 11 additions & 0 deletions R/read_vc.R
@@ -176,6 +176,12 @@ read_vc.character <- function(file, root = ".") {
optimize = optimize
)

# Apply read conversions if present
if (has_name(meta_data[["..generic"]], "convert")) {
convert <- meta_data[["..generic"]][["convert"]]
raw_data <- apply_convert(raw_data, convert, direction = "read")
}

names(file) <- c(
meta_data[["..generic"]][["data_hash"]],
meta_data[["..generic"]][["hash"]]
@@ -209,6 +215,11 @@ read_vc.character <- function(file, root = ".") {
attr(raw_data, "optimize") <- meta_data[["..generic"]][["optimize"]]
attr(raw_data, "sorting") <- meta_data[["..generic"]][["sorting"]]

# Add convert to attributes if present
if (has_name(meta_data[["..generic"]], "convert")) {
attr(raw_data, "convert") <- meta_data[["..generic"]][["convert"]]
}

class(raw_data) <- c("git2rdata", class(raw_data))

return(raw_data)
5 changes: 4 additions & 1 deletion R/rename_variable.R
@@ -13,7 +13,8 @@
#' @inheritParams write_vc
#' @param change A named vector with the old names as values and the new names
#' as names.
- #' @return invisible `NULL`.
+ #' @return a named vector with the file paths relative to `root`. The names
+ #' contain the hashes of the files.
#' @export
#' @examples
#'
@@ -106,6 +107,8 @@ rename_variable.character <- function(file, change, root = ".", ...) {
yaml[["..generic"]][["data_hash"]] <- datahash(file["raw_file"])
write_yaml(yaml, file["meta_file"], fileEncoding = "UTF-8")

+ root <- normalizePath(root, winslash = "/", mustWork = TRUE)
+ file <- normalizePath(file, winslash = "/", mustWork = TRUE)
hashes <- remove_root(file = file, root = root)
names(hashes) <-
c(
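The two `normalizePath()` calls in this hunk matter because a root prefix can only be stripped reliably when both paths are in the same form (absolute, forward slashes, no `..` segments). A minimal base R sketch of the idea follows. `strip_root()` is a hypothetical stand-in for the package's internal `remove_root()`, whose implementation is not shown in this diff.

```r
# Hypothetical helper: drop a leading root directory from file paths.
strip_root <- function(file, root) {
  # Bring both paths into the same canonical form first.
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  file <- normalizePath(file, winslash = "/", mustWork = TRUE)
  prefix <- paste0(sub("/+$", "", root), "/")
  has_prefix <- startsWith(file, prefix)
  file[has_prefix] <- substring(file[has_prefix], nchar(prefix) + 1)
  file
}

root <- tempfile("root")
dir.create(file.path(root, "sub"), recursive = TRUE)
target <- file.path(root, "sub", "data.tsv")
file.create(target)

# The input deliberately contains a ".." segment; after normalizing,
# only the part relative to root remains.
messy <- file.path(root, "sub", "..", "sub", "data.tsv")
relative <- strip_root(messy, root)
relative
```

Without the normalization step, a plain prefix comparison could fail on Windows, where the same location may be written with backslashes in one path and forward slashes in the other.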
187 changes: 187 additions & 0 deletions R/utils.R
@@ -11,3 +11,190 @@ display <- function(verbose, message, linefeed = TRUE) {
}
return(invisible(NULL))
}

#' Validate the convert argument
#' @noRd
#' @importFrom assertthat assert_that
validate_convert <- function(convert, colnames_x) {
if (is.null(convert) || length(convert) == 0) {
return(list())
}

validate_convert_structure(convert, colnames_x)

for (col_name in names(convert)) {
convert[[col_name]] <- validate_convert_element(
convert[[col_name]],
col_name
)
}

return(convert)
}

#' Validate convert structure
#' @noRd
#' @importFrom assertthat assert_that
validate_convert_structure <- function(convert, colnames_x) {
assert_that(
is.list(convert),
msg = "convert must be a list"
)

assert_that(
!is.null(names(convert)),
msg = "convert must be a named list"
)

assert_that(
all(names(convert) != ""),
msg = "all elements of convert must be named"
)

assert_that(
all(names(convert) %in% colnames_x),
msg = paste(
"all names in convert must be present in colnames of x.",
"Missing:",
paste(names(convert)[!names(convert) %in% colnames_x], collapse = ", ")
)
)
}

#' Validate a single convert element
#' @noRd
#' @importFrom assertthat assert_that
validate_convert_element <- function(conv, col_name) {
assert_that(
is.character(conv),
msg = sprintf(
"convert[['%s']] must be a character vector",
col_name
)
)
assert_that(
length(conv) == 2,
msg = sprintf(
"convert[['%s']] must have length 2",
col_name
)
)
assert_that(
!is.null(names(conv)),
msg = sprintf(
"convert[['%s']] must be a named vector",
col_name
)
)
assert_that(
all(names(conv) %in% c("write", "read")),
msg = sprintf(
"convert[['%s']] must have names 'write' and 'read'",
col_name
)
)
assert_that(
"write" %in% names(conv) && "read" %in% names(conv),
msg = sprintf(
"convert[['%s']] must have both 'write' and 'read' elements",
col_name
)
)

validate_convert_function(conv[["write"]], col_name, "write")
validate_convert_function(conv[["read"]], col_name, "read")
conv[c("write", "read")]
}

#' Validate a convert function specification
#' @noRd
#' @importFrom assertthat assert_that
validate_convert_function <- function(func_spec, col_name, direction) {
assert_that(
grepl("::", func_spec, fixed = TRUE),
msg = sprintf(
"convert[['%s']][['%s']] must be in 'package::function' format",
col_name,
direction
)
)

parts <- strsplit(func_spec, "::", fixed = TRUE)[[1]]
assert_that(
length(parts) == 2,
msg = sprintf(
"convert[['%s']][['%s']] must have exactly one '::'",
col_name,
direction
)
)

pkg_name <- parts[1]
func_name <- parts[2]

assert_that(
nzchar(pkg_name) && nzchar(func_name),
msg = sprintf(
"convert[['%s']][['%s']] has empty package or function name",
col_name,
direction
)
)

if (!requireNamespace(pkg_name, quietly = TRUE)) {
stop(
sprintf(
paste(
"Package '%s' required for convert[['%s']][['%s']]",
"is not available"
),
pkg_name,
col_name,
direction
),
call. = FALSE
)
}

if (
!exists(
func_name,
where = asNamespace(pkg_name),
mode = "function"
)
) {
stop(
sprintf(
paste(
"Function '%s' not found in package '%s'",
"for convert[['%s']][['%s']]"
),
func_name,
pkg_name,
col_name,
direction
),
call. = FALSE
)
}
}
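
A side note on the existence check in `validate_convert_function()`, offered as an observation rather than a change to the merged code: `exists()` defaults to `inherits = TRUE`, so the lookup continues into the enclosing environments of the namespace (its imports, then base). The sketch below uses `mean`, which lives in base rather than stats, to show the difference. A stricter variant might pass `inherits = FALSE`.

```r
ns <- asNamespace("stats")

# Found through the enclosing environments, although stats does not define it.
loose <- exists("mean", where = ns, mode = "function")

# Restricting the lookup to the namespace itself.
strict <- exists("mean", where = ns, mode = "function", inherits = FALSE)

# A function that stats does define passes the strict check as well.
own <- exists("median", where = ns, mode = "function", inherits = FALSE)

c(loose = loose, strict = strict, own = own)
```

With the default, a specification such as `"stats::mean"` would therefore pass this validation step even though `mean` is not a stats function.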

#' Apply conversion functions to columns
#' @noRd
apply_convert <- function(x, convert, direction = "write") {
if (is.null(convert) || length(convert) == 0) {
return(x)
}

for (col_name in names(convert)) {
func_spec <- convert[[col_name]][[c(write = 1, read = 2)[direction]]]
parts <- strsplit(func_spec, "::", fixed = TRUE)[[1]]
pkg_name <- parts[1]
func_name <- parts[2]

func <- get(func_name, envir = asNamespace(pkg_name), mode = "function")
x[[col_name]] <- func(x[[col_name]])
}

return(x)
}
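
To make the mechanics of `apply_convert()` concrete, here is a self-contained sketch of the same idea in base R: a `"package::function"` string is split, the function is looked up in the package namespace, and it is applied to one column. `resolve_spec()` and the example data are hypothetical and not part of git2rdata.

```r
# Hypothetical helper: turn a "pkg::fun" string into the function object.
resolve_spec <- function(spec) {
  parts <- strsplit(spec, "::", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2, all(nzchar(parts)))
  get(parts[2], envir = asNamespace(parts[1]), mode = "function")
}

convert <- list(
  code = c(write = "base::toupper", read = "base::tolower")
)
x <- data.frame(id = 1:3, code = c("a", "b", "c"))

# Write direction: transform the column before storing it.
stored <- x
for (col in names(convert)) {
  write_fun <- resolve_spec(convert[[col]][["write"]])
  stored[[col]] <- write_fun(stored[[col]])
}

# Read direction: undo the transformation.
restored <- stored
for (col in names(convert)) {
  read_fun <- resolve_spec(convert[[col]][["read"]])
  restored[[col]] <- read_fun(restored[[col]])
}

round_trip_ok <- identical(restored, x)
round_trip_ok
```

The sketch selects the function by name (`[["write"]]`, `[["read"]]`), so it does not depend on the order of the two elements. The merged code selects by position instead, which relies on `validate_convert_element()` returning the elements in `write`, `read` order.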