Conversation
…more user friendly
There was a problem hiding this comment.
Pull request overview
This pull request introduces S4 class-based architecture to the GalaxyR package to enable pipe-friendly workflows. The main changes include:
Changes:
- Introduces a new
GalaxyS4 class with slots for tracking workflow state (history_id, input_dataset_id, invocation_id, output_dataset_ids, state, etc.) - Converts existing functions (galaxy_initialize, galaxy_upload_ftp, galaxy_upload_https, galaxy_start_workflow, galaxy_poll_workflow, galaxy_run_tool, galaxy_download_result) to S4 generics with methods for both character (traditional API) and Galaxy object (pipe-friendly API)
- Adds galaxy_poll_tool as a replacement for galaxy_wait_for_job
- Adds galaxy_get_workflow function to retrieve workflow metadata
- Updates documentation and NAMESPACE accordingly
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| R/2026-01-21_s4_class_methods.R | New file containing S4 class definition, validation, constructor, and all S4 method implementations for pipe-friendly workflow execution |
| R/2025-10-27_JF_R_Galaxy_functions.R | Removes old function implementations that have been converted to S4 methods; adds galaxy_get_workflow function |
| NAMESPACE | Updates exports to use exportMethods for S4 generics; adds galaxy() and galaxy_get_workflow exports |
| man/*.Rd | Updates and creates documentation files for new S4 methods and classes |
| GalaxyR.Rproj | Adds --no-examples flag to package check arguments |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| if (object@state != "new" && !nzchar(object@galaxy_url)) { | ||
| return("galaxy_url must be set once the Galaxy object is initialised.") |
There was a problem hiding this comment.
The error message uses British spelling "initialised" but the function is named "galaxy_initialize" with American spelling. For consistency, change "initialised" to "initialized" in the error message.
| return("galaxy_url must be set once the Galaxy object is initialised.") | |
| return("galaxy_url must be set once the Galaxy object is initialized.") |
| setMethod("galaxy_upload_ftp", "missing", | ||
| function(x, | ||
| input_file, | ||
| galaxy_ftp = "ftp.usegalaxy.eu", | ||
| galaxy_url = "https://usegalaxy.eu", | ||
| ...) | ||
| .galaxy_upload_ftp(input_file = input_file, history_id = x, galaxy_ftp, galaxy_url)) |
There was a problem hiding this comment.
The method signature declares 'missing' but attempts to use 'x' as 'history_id'. When 'x' is missing, this will cause an error. The method should accept a character parameter instead. Consider changing the signature to use 'character' for the first dispatch method or restructure the method to properly handle the missing case.
| dbkey = dbkey) | ||
| }) | ||
|
|
||
| #' S4 Method for galaxy ftp upload |
There was a problem hiding this comment.
The title says "S4 Method for galaxy ftp upload" but this is actually the method for galaxy_upload_https with a Galaxy object, not FTP upload. The title should be corrected to reference HTTPS upload instead of FTP upload.
| #' S4 Method for galaxy ftp upload | |
| #' S4 Method for galaxy https upload with Galaxy object |
| \title{S4 Method for galaxy ftp upload} | ||
| \usage{ | ||
| galaxy_upload_https( | ||
| \S4method{galaxy_upload_https}{Galaxy}( | ||
| x, | ||
| input_file, | ||
| history_id, | ||
| wait = FALSE, | ||
| wait_timeout = 600, | ||
| galaxy_url = "https://usegalaxy.eu", | ||
| file_type = "auto", | ||
| dbkey = "?" | ||
| dbkey = "?", | ||
| ... | ||
| ) | ||
| } | ||
| \arguments{ | ||
| \item{input_file}{Character. Path to the local file to upload.} | ||
|
|
||
| \item{history_id}{Character. ID of the Galaxy history to receive the data set.} | ||
|
|
||
| \item{wait}{Logical. Whether to wait for Galaxy to finish processing} | ||
|
|
||
| \item{wait_timeout}{Integer. Time in seconds until wait times out with an error.} | ||
|
|
||
| \item{galaxy_url}{Character. Base URL of the Galaxy instance | ||
| (for example \code{"https://usegalaxy.eu"}). | ||
| If the environment variable \code{GALAXY_URL} is set, it takes precedence.} | ||
|
|
||
| \item{file_type}{Character. Galaxy datatype identifier | ||
| (for example \code{"auto"}, \code{"fastq"}, \code{"bam"}). Default: \code{"auto"}.} | ||
|
|
||
| \item{dbkey}{Character. Reference genome identifier (for example \code{"?"} or \code{"hg38"}). Default: \code{"?"}.} | ||
| } | ||
| \value{ | ||
| A list describing the newly created dataset(s) as returned by Galaxy. | ||
| } | ||
| \description{ | ||
| Upload a dataset via HTTPS (direct POST) into Galaxy | ||
| } | ||
| \details{ | ||
| This function uses the built-in \code{upload1} tool and performs a | ||
| multipart form POST. Large files may still require FTP depending on | ||
| the Galaxy server's configuration limits. | ||
| } | ||
| \examples{ | ||
| \dontshow{if (galaxy_has_key()) withAutoprint(\{ # examplesIf} | ||
| history_id <- galaxy_initialize("test upload") | ||
| test_file <- tempfile(fileext = ".txt") | ||
| test_text <- "This is an example \ntest file." | ||
| writeLines(test_text,test_file) | ||
|
|
||
| file_id <- galaxy_upload_https(test_file, history_id) | ||
| \dontshow{\}) # examplesIf} | ||
| S4 Method for galaxy ftp upload |
There was a problem hiding this comment.
The title on lines 5 and 19 says "S4 Method for galaxy ftp upload" but this documentation is for the galaxy_upload_https Galaxy method, not FTP. The title should reference HTTPS upload instead.
| job_ids <- job_ids[!sapply(job_ids, is.null)] | ||
| job_ids <- job_ids[nzchar(job_ids)] | ||
| if (!length(job_ids)) { | ||
| message(Sys.time(), " ,No jobs yet, waiting...") |
There was a problem hiding this comment.
When no jobs are found (line 590), the code issues a message and uses next to continue the loop without sleeping. This will cause the function to rapidly poll the API in a tight loop until jobs appear, potentially causing unnecessary load on the server. Consider adding Sys.sleep(poll_interval) before the next statement to maintain the polling interval even when no jobs are present.
| message(Sys.time(), " ,No jobs yet, waiting...") | |
| message(Sys.time(), " ,No jobs yet, waiting...") | |
| Sys.sleep(poll_interval) |
| BuildType: Package | ||
| PackageUseDevtools: Yes | ||
| PackageInstallArgs: --no-multiarch --with-keep.source | ||
| PackageCheckArgs: --no-examples |
There was a problem hiding this comment.
Adding --no-examples to PackageCheckArgs disables running examples during R CMD check. This can hide broken examples and reduce the quality assurance provided by the check process. Unless there's a specific reason to skip example checks (such as requiring unavailable external resources), consider removing this flag or ensuring all examples are properly wrapped in \dontrun{} or \donttest{} where appropriate.
| PackageCheckArgs: --no-examples | |
| PackageCheckArgs: |
| input_file <- tempfile(fileext = ".txt") | ||
| writeLines("Example", input_file) | ||
| hid <- galaxy_initialize("test upload") | ||
| did <- galaxy_upload_ftp(input_file, hid, galaxy_ftp) |
There was a problem hiding this comment.
The example on line 46 shows the call signature as galaxy_upload_ftp(input_file, hid, galaxy_ftp), but the S4 method is defined with signature "missing" for x and attempts to use x as history_id on line 235. This is inconsistent. The method should either be changed to dispatch on "character" (like galaxy_upload_https does), or the example and documentation should be corrected to show x as the first parameter representing the history_id.
| @@ -0,0 +1,1043 @@ | |||
| ########################## | |||
There was a problem hiding this comment.
The filename contains the date '2026-01-21' but the current date is January 23, 2026. While this is not a critical issue, it's unusual to have source file names with dates in the future relative to the current date. Consider using a more conventional naming scheme that doesn't include dates, or ensure dates reflect actual creation or modification times.
| inputs = if (is.null(inputs)) x@inputs else inputs, | ||
| dataset_id = if (is.null(dataset_id)) x@input_dataset_id else dataset_id, | ||
| galaxy_url = x@galaxy_url) | ||
| x@invocation_id <- job_id |
There was a problem hiding this comment.
The invocation_id slot is used to store both workflow invocation IDs (line 560) and tool job IDs (line 937). This overloading means a single Galaxy object cannot track both a workflow invocation and a tool job simultaneously. If a user runs a workflow, then runs a tool, the workflow invocation ID would be overwritten. Consider either documenting this limitation clearly, renaming the slot to something more generic like execution_id, or providing separate slots for workflow_invocation_id and tool_job_id.
| .make_unique_names <- function(names, out_dir, overwrite = FALSE) { | ||
| out <- character(length(names)) | ||
| for (i in seq_along(names)) { | ||
| nm <- names[i] | ||
| if (!nzchar(nm)) nm <- sprintf("dataset_%02d", i) | ||
| ext <- tools::file_ext(nm) | ||
| base <- if (nzchar(ext)) tools::file_path_sans_ext(nm) else nm | ||
| cand <- nm | ||
| idx <- 1L | ||
| while (cand %in% out || | ||
| (!overwrite && file.exists(file.path(out_dir, cand)))) { | ||
| cand <- if (nzchar(ext)) sprintf("%s_%d.%s", base, idx, ext) | ||
| else sprintf("%s_%d", base, idx) | ||
| idx <- idx + 1L | ||
| } | ||
| if (cand != nm) { | ||
| warning("File '", nm, "' exists; using '", cand, "' instead.") | ||
| } | ||
| out[i] <- cand | ||
| } | ||
| out | ||
| } | ||
|
|
||
| #' Helper function for downloading the results of a history | ||
| #' @keywords internal | ||
| #' @noRd | ||
| .galaxy_download_result <- function(output_ids, | ||
| out_dir = ".", | ||
| galaxy_url = "https://usegalaxy.eu", | ||
| overwrite = FALSE) { | ||
| if (is.list(output_ids) && "output_ids" %in% names(output_ids)) { | ||
| output_ids <- output_ids$output_ids | ||
| } | ||
| galaxy_url <- .resolve_galaxy_url(galaxy_url) | ||
| api_key <- Sys.getenv("GALAXY_API_KEY") | ||
| if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE) | ||
|
|
||
| info <- galaxy_get_file_info(output_ids, galaxy_url = galaxy_url) | ||
| targets <- .make_unique_names(info$name, out_dir, overwrite = overwrite) | ||
|
|
||
| mapply(function(fid, fname) { | ||
| dest <- file.path(out_dir, fname) | ||
| httr::GET( | ||
| paste0(galaxy_url, "/api/datasets/", fid, "/display"), | ||
| httr::add_headers(`x-api-key` = api_key), | ||
| httr::write_disk(dest, overwrite = TRUE) # we've ensured uniqueness | ||
| ) |
There was a problem hiding this comment.
The use of dataset name values from Galaxy in local filenames allows a malicious or compromised Galaxy instance to perform path traversal when downloading results. .make_unique_names() passes untrusted info$name values through unchanged, and .galaxy_download_result() then uses file.path(out_dir, fname) with httr::write_disk(dest, overwrite = TRUE), so a dataset name like "../../.ssh/authorized_keys" would cause data to be written outside out_dir and potentially overwrite sensitive files. To prevent this, normalize and strictly sanitize fname (e.g. strip path separators, .., absolute-path patterns) before constructing dest, or reject unsafe names and map them to safe, generated filenames.
|
small bugs need to be fixed especially with documentation |
Integrate S4 class for pipefriendly usage and many more features associated.