humanpred · billdenney · May 21, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -11,6 +11,33 @@ fresh worktree. Update it when you discover a convention worth recording.
 - **License:** MIT (package) + BSD-3-Clause (bundled PDFium binary).
 - **CRAN target:** v0.1.0 ships to CRAN. Every change preserves CRAN-cleanliness.
 
+## Scope — wrap PDFium, don't invent helpers
+
+The package's job is to expose Google's PDFium C API to R idiomatically.
+Every public function should ultimately call into PDFium (perhaps via a
+chain of internal helpers) or be unambiguously tied to PDF-format
+concepts (`pdf_parse_date()` parses the PDF date-string format).
+
+What does **not** belong:
+
+- Filesystem walking (`list.files()` loops over `pdf_doc_*`).
+- Network plumbing beyond what PDFium itself does. `pdf_doc_open(path)`
+  accepting a URL is fine — the URL becomes raw bytes which go straight
+  into PDFium's `FPDF_LoadMemDocument64`. A function whose body is
+  mostly `httr2::request(...)` is not.
+- Bulk / batch wrappers ("apply this PDFium function to every file in
+  a folder"). Users have `lapply` and `purrr` for that.
+- Cross-PDF analysis ("compare these two PDFs"). Out of scope.
+
+When in doubt, ask: *what PDFium symbol does this wrap?* If the answer
+is "none — it's a convenience over base R", the function belongs in
+user code or a separate utility package, not here.
+
+This rule is the justification recorded in `NEWS.md` for the
+`pdf_dir_summary()` / `pdf_doc_open_url()` retraction. Future
+contributors shouldn't re-add functions whose only job is to glue
+base R primitives together around PDFium calls.
+
 ## Layering — never bypass
 
 ```

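A minimal base-R sketch of the URL carve-out in the scope rules above: decide whether a `path =` value is a URL, then read it into a raw vector with `url()` + `readBin()`. `is_url_path()` and `read_url_raw()` are hypothetical names, not functions in this package, and the scheme regex is an assumption about how `pdf_doc_open()` distinguishes URLs from filesystem paths. The sketch stops at the raw vector; inside `pdf_doc_open()` those bytes go to PDFium's `FPDF_LoadMemDocument64`, which is not reproduced here.

```r
# Hypothetical helpers, not part of pdfium's API. The scheme list is an
# assumption: it mirrors the schemes base::url() documents
# (http, https, ftp, file).
is_url_path <- function(path) {
  grepl("^(https?|ftp|file)://", path, ignore.case = TRUE)
}

# Read everything behind a URL into one raw vector with url() + readBin(),
# in fixed-size chunks so the total size never has to be known up front.
read_url_raw <- function(u, chunk_size = 65536L) {
  con <- url(u, open = "rb")
  on.exit(close(con), add = TRUE)
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  # c(raw(), ...) keeps the result a raw vector even when nothing was read.
  c(raw(), unlist(chunks, use.names = FALSE))
}
```

Nothing in this sketch touches PDFium, which is the point of the rule: the fetch is a thin pre-step, so it lives behind `pdf_doc_open()` rather than in its own exported function.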
diff --git a/NAMESPACE b/NAMESPACE
@@ -42,8 +42,14 @@ S3method(print,pdfium_obj_list)
 S3method(print,pdfium_page)
 S3method(print,pdfium_signature)
 S3method(print,pdfium_signature_list)
+S3method(summary,pdfium_annot_list)
+S3method(summary,pdfium_attachment_list)
+S3method(summary,pdfium_bookmark_list)
 S3method(summary,pdfium_doc)
+S3method(summary,pdfium_form_field_list)
+S3method(summary,pdfium_obj_list)
 S3method(summary,pdfium_page)
+S3method(summary,pdfium_signature_list)
 export(as_pdfium_annot_list)
 export(as_pdfium_attachment_list)
 export(as_pdfium_bookmark_list)
@@ -105,7 +111,6 @@ export(pdf_bookmark_title)
 export(pdf_bookmark_uri)
 export(pdf_clip_path_count)
 export(pdf_clip_path_segments)
-export(pdf_dir_summary)
 export(pdf_doc_bookmark_find)
 export(pdf_doc_bookmarks)
 export(pdf_doc_close)
@@ -120,7 +125,6 @@ export(pdf_doc_named_dest_by_name)
 export(pdf_doc_named_dests)
 export(pdf_doc_new)
 export(pdf_doc_open)
-export(pdf_doc_open_url)
 export(pdf_doc_page_mode)
 export(pdf_doc_permissions)
 export(pdf_doc_security)

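Each new `S3method(summary, <class>)` line above is what roxygen generates from the `@method` / `@export` tags on the methods; at load time R records the method in the generic's S3 methods table instead of relying on a visible `summary.<class>` function. A small illustration of that registry mechanism outside any package, using base R's `registerS3method()` with a hypothetical `registered_demo` class and `count_summary()` function:

```r
# A plain summariser for a hypothetical class; deliberately NOT named
# summary.registered_demo, so dispatch can only find it via the registry.
count_summary <- function(object, ...) {
  data.frame(n = length(unclass(object)))
}

# Similar in effect to an S3method(summary, registered_demo) NAMESPACE entry.
registerS3method("summary", "registered_demo", count_summary)

x <- structure(list(1, 2, 3), class = "registered_demo")
summary(x)  # dispatches to count_summary()
```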
diff --git a/NEWS.md b/NEWS.md
@@ -11,10 +11,12 @@ PDFs created with `pdf_doc_new()` are also writable).
 * `pdf_doc_open()` / `pdf_doc_close()`, `pdf_doc_new()`,
   `pdf_save()` / `pdf_save_to_raw()` — open existing PDFs (optionally
   with `readwrite = TRUE`), build new ones in memory, and persist
-  the result. `pdf_doc_open_url(url)` is a convenience wrapper that
-  fetches a `http://` / `https://` / `ftp://` / `file://` URL via
-  `url()` + `readBin()` and loads the bytes through PDFium's
-  in-memory path — no temporary file on disk.
+  the result. The `path =` argument of `pdf_doc_open()` accepts
+  either a local filesystem path or a URL (any scheme `base::url()`
+  recognises — typically `http://` / `https://` / `ftp://` /
+  `file://`); URL input is fetched into raw bytes via `url()` +
+  `readBin()` and loaded through PDFium's `FPDF_LoadMemDocument64`,
+  with no temporary file on disk.
 * `pdf_doc_info()`, `pdf_doc_meta()`, `pdf_doc_text()`,
   `pdf_doc_fonts()`, `pdf_doc_file_id()`, `pdf_doc_page_mode()`,
   `pdf_doc_viewer_preferences()`, `pdf_doc_viewer_preference_by_name()`,
@@ -40,13 +42,29 @@ PDFs created with `pdf_doc_new()` are also writable).
   dispatch to the matching tibble — `summary(page)` adds the
   page-loaded counts (annotation count, page-object count,
   text-run count, link count) since the page is already loaded.
-* `pdf_dir_summary(dir)` — scans a directory for PDF files and
-  returns one row per file in the `pdf_doc_summary()` shape.
-  Recursive scan via `recursive = TRUE`; pattern-matches `.pdf`
-  case-insensitively by default. The `errors` argument selects
-  one of `"warn"` (default — surface broken files but don't
-  abort), `"skip"` (silently drop), or `"stop"` (abort on the
-  first failure).
+* `summary()` S3 methods for every `pdfium_*_list` class:
+  `pdfium_obj_list`, `pdfium_annot_list`, `pdfium_attachment_list`,
+  `pdfium_signature_list`, `pdfium_bookmark_list`, and
+  `pdfium_form_field_list`. Each dispatches to the matching
+  `as_tibble.*` method so `summary(x)` returns the same tibble
+  view `tibble::as_tibble(x)` would — matching the R idiom of
+  `print()` for the one-line summary and `summary()` for the deep
+  dive.
+
+## Scope retraction
+
+Two functions added during 0.1.0 development were retracted before
+release on scope grounds (see `CLAUDE.md` §"Scope"):
+
+* **`pdf_doc_open_url()`** — folded into `pdf_doc_open(path = ...)`.
+  The URL-fetching layer is just `base::url()` + `readBin()` ahead
+  of PDFium's existing in-memory path, so a separate exported
+  symbol added surface for no PDFium-specific behaviour.
+* **`pdf_dir_summary()`** — removed. Its body was `list.files()` plus
+  `lapply(pdf_doc_summary)`; users with bulk-triage needs can write
+  the loop themselves in three lines. Keeping it would have set a
+  precedent for "convenience over a base R loop" creep, which the
+  package's PDFium-wrapper mandate rules out.
 
 ## Page objects, paths, and text
 

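The retraction note leaves bulk triage to user code. One way to write that loop, sketched under assumptions: `summarise_pdf_dir()` is a hypothetical user-side helper, not part of the package, and `summarise` is injected (defaulting to `pdfium::pdf_doc_summary`, assumed to be exported) so the loop can be exercised without real PDFs. Files that fail are warned about and dropped, roughly what the retracted function's `errors = "warn"` default did.

```r
# Hypothetical user-side replacement for the retracted pdf_dir_summary().
# `summarise` defaults to pdfium::pdf_doc_summary (assumed exported); any
# function taking a path and returning a one-row data frame works.
summarise_pdf_dir <- function(dir = ".", pattern = "\\.pdf$",
                              recursive = FALSE,
                              summarise = pdfium::pdf_doc_summary) {
  files <- list.files(dir, pattern = pattern, recursive = recursive,
                      full.names = TRUE, ignore.case = TRUE)
  rows <- lapply(files, function(f) {
    tryCatch(summarise(f), error = function(e) {
      # Surface the failure but keep going, like errors = "warn" did.
      warning(sprintf("skipping '%s': %s", f, conditionMessage(e)),
              call. = FALSE)
      NULL
    })
  })
  # rbind() the successful rows; returns NULL when nothing succeeded.
  do.call(rbind, Filter(Negate(is.null), rows))
}
```

Unlike the retracted version, an empty directory (or one where every file fails) yields `NULL` here rather than a zero-row tibble with `pdf_doc_summary()`'s columns; users who need the fixed schema can add that themselves.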
diff --git a/R/annotations.R b/R/annotations.R
@@ -246,6 +246,20 @@ as_tibble.pdfium_annot_list <- function(x, ...) {
   )
 }
 
+#' Tibble-shaped summary of an annotation list
+#'
+#' `summary()` method for `pdfium_annot_list`. Defers to
+#' [as_tibble.pdfium_annot_list()] for the standard tibble view.
+#'
+#' @param object A `pdfium_annot_list` from [pdf_annotations()].
+#' @param ... Forwarded to [as_tibble.pdfium_annot_list()].
+#' @return The tibble returned by [as_tibble.pdfium_annot_list()].
+#' @method summary pdfium_annot_list
+#' @export
+summary.pdfium_annot_list <- function(object, ...) {
+  tibble::as_tibble(object, ...)
+}
+
 # Internal: zero-row tibble matching as_tibble.pdfium_annot_list's
 # schema. Used when the page has no annotations.
 empty_annot_tibble <- function(src_page) {

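The new `summary()` methods are thin delegations to the existing tabular view, with `print()` kept for the one-line overview. The same shape in a self-contained form: `demo_list` is a hypothetical class standing in for the `pdfium_*_list` classes, and base `as.data.frame()` stands in for `tibble::as_tibble()` so the sketch has no dependencies.

```r
# Hypothetical stand-in for a pdfium_*_list: a classed list of records.
new_demo_list <- function(records) {
  structure(records, class = "demo_list")
}

# The "deep dive" view: one row per record.
as.data.frame.demo_list <- function(x, row.names = NULL, optional = FALSE, ...) {
  data.frame(
    index = seq_along(x),
    subtype = vapply(x, function(r) r$subtype, character(1L)),
    stringsAsFactors = FALSE
  )
}

# summary() adds nothing of its own; it forwards to the tabular view,
# as summary.pdfium_annot_list() forwards to as_tibble().
summary.demo_list <- function(object, ...) {
  as.data.frame(object, ...)
}

# print() stays the one-line overview.
print.demo_list <- function(x, ...) {
  cat(sprintf("<demo_list: %d records>\n", length(x)))
  invisible(x)
}
```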
diff --git a/R/attachments.R b/R/attachments.R
@@ -75,6 +75,22 @@ as_tibble.pdfium_attachment_list <- function(x, ...) {
   )
 }
 
+#' Tibble-shaped summary of an attachment list
+#'
+#' `summary()` method for `pdfium_attachment_list`. Defers to
+#' [as_tibble.pdfium_attachment_list()] for the standard tibble
+#' view, matching the R idiom of `print()` for the one-line summary
+#' and `summary()` for the deep dive.
+#'
+#' @param object A `pdfium_attachment_list` from [pdf_attachments()].
+#' @param ... Forwarded to [as_tibble.pdfium_attachment_list()].
+#' @return The tibble returned by [as_tibble.pdfium_attachment_list()].
+#' @method summary pdfium_attachment_list
+#' @export
+summary.pdfium_attachment_list <- function(object, ...) {
+  tibble::as_tibble(object, ...)
+}
+
 # Internal: zero-row tibble matching as_tibble.pdfium_attachment_list.
 empty_attachment_tibble <- function() {
   tibble::tibble(

diff --git a/R/doc.R b/R/doc.R
@@ -110,6 +110,20 @@ as_tibble.pdfium_bookmark_list <- function(x, ...) {
   )
 }
 
+#' Tibble-shaped summary of a bookmark list
+#'
+#' `summary()` method for `pdfium_bookmark_list`. Defers to
+#' [as_tibble.pdfium_bookmark_list()] for the standard tibble view.
+#'
+#' @param object A `pdfium_bookmark_list` from [pdf_doc_bookmarks()].
+#' @param ... Forwarded to [as_tibble.pdfium_bookmark_list()].
+#' @return The tibble returned by [as_tibble.pdfium_bookmark_list()].
+#' @method summary pdfium_bookmark_list
+#' @export
+summary.pdfium_bookmark_list <- function(object, ...) {
+  tibble::as_tibble(object, ...)
+}
+
 empty_bookmark_tibble <- function() {
   tibble::tibble(
     bookmark_index = integer(),
@@ -642,116 +656,6 @@ summary.pdfium_doc <- function(object, ...) {
   pdf_doc_summary(object)
 }
 
-#' Summarise every PDF in a directory in one call
-#'
-#' Scans a directory for PDF files and returns a tibble whose rows
-#' are the [pdf_doc_summary()] output for each file. The natural
-#' replacement for the standard "loop over a folder of PDFs and
-#' triage" workflow — encrypted-which / has-forms-which /
-#' has-attachments-which.
-#'
-#' Files that fail to open (corrupt, wrong format, password
-#' protected) are handled per the `errors` argument:
-#'
-#' * `"warn"` (default) — a `warning()` per failed file; the file
-#'   is dropped from the result tibble.
-#' * `"skip"` — silently dropped.
-#' * `"stop"` — the first failed file raises an error and the
-#'   function aborts.
-#'
-#' @param dir Character scalar. Path to the directory to scan.
-#' @param pattern Regular expression filtering filenames. Defaults
-#'   to `"\\.pdf$"` (case-insensitive).
-#' @param recursive Logical. When `TRUE`, descend into
-#'   subdirectories. Defaults `FALSE`.
-#' @param password Optional password applied to every file. `NULL`
-#'   (default) tries each file without a password. Useful when all
-#'   files share the same password.
-#' @param errors One of `"warn"`, `"skip"`, `"stop"` — see Details.
-#' @return A tibble with the same columns as [pdf_doc_summary()].
-#'   Zero rows when the directory has no PDFs (or every PDF failed
-#'   to open under `errors = "skip"` / `"warn"`).
-#' @seealso [pdf_doc_summary()] for the single-file companion.
-#' @examples
-#' fixture_dir <- system.file("extdata", "fixtures",
-#'                            package = "pdfium")
-#' if (nzchar(fixture_dir)) {
-#'   pdf_dir_summary(fixture_dir)
-#' }
-#' @export
-pdf_dir_summary <- function(dir = ".", pattern = "\\.pdf$",
-                             recursive = FALSE, password = NULL,
-                             errors = c("warn", "skip", "stop")) {
-  checkmate::assert_directory_exists(dir)
-  checkmate::assert_string(pattern)
-  checkmate::assert_flag(recursive)
-  errors <- match.arg(errors)
-
-  files <- list.files(dir, pattern = pattern, recursive = recursive,
-                       full.names = TRUE, ignore.case = TRUE)
-  if (length(files) == 0L) {
-    return(pdf_doc_summary_empty())
-  }
-
-  rows <- lapply(files, function(f) {
-    tryCatch(
-      pdf_doc_summary(f, password = password),
-      error = function(e) {
-        if (errors == "stop") {
-          stop(sprintf("pdf_dir_summary: failed to read '%s': %s",
-                       f, conditionMessage(e)), call. = FALSE)
-        }
-        if (errors == "warn") {
-          warning(sprintf("pdf_dir_summary: failed to read '%s': %s",
-                          f, conditionMessage(e)), call. = FALSE)
-        }
-        NULL
-      }
-    )
-  })
-  ok <- !vapply(rows, is.null, logical(1L))
-  if (!any(ok)) {
-    return(pdf_doc_summary_empty())
-  }
-  out <- do.call(rbind, rows[ok])
-  tibble::as_tibble(out)
-}
-
-# Internal: zero-row tibble matching pdf_doc_summary's column shape.
-# Used by pdf_dir_summary() when the directory is empty (or every
-# file failed under `errors = "skip"` / `"warn"`).
-pdf_doc_summary_empty <- function() {
-  tibble::tibble(
-    path                 = character(),
-    page_count           = integer(),
-    file_version         = integer(),
-    title                = character(),
-    author               = character(),
-    subject              = character(),
-    keywords             = character(),
-    creator              = character(),
-    producer             = character(),
-    creation_date        = character(),
-    mod_date             = character(),
-    trapped              = character(),
-    creation_date_parsed = as.POSIXct(character(), tz = "UTC"),
-    mod_date_parsed      = as.POSIXct(character(), tz = "UTC"),
-    is_tagged            = logical(),
-    is_encrypted         = logical(),
-    security_revision    = integer(),
-    xref_valid           = logical(),
-    bookmark_count       = integer(),
-    attachment_count     = integer(),
-    signature_count      = integer(),
-    form_field_count     = integer(),
-    javascript_count     = integer(),
-    named_dest_count     = integer(),
-    has_page_labels      = logical(),
-    file_id_permanent    = character(),
-    file_id_changing     = character()
-  )
-}
-
 # Internal: convert pdf_doc_file_id()'s raw return to a hex string,
 # or NA_character_ when empty. Hoisted from pdf_doc_summary so its
 # two branches can be unit-tested without a fixture that carries an
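
The context comment above describes converting a raw file-ID vector to a hex string with an `NA_character_` fallback for the empty case. One base-R way to do that, as a sketch only: `file_id_to_hex()` is a hypothetical name, and lowercase hex output is an assumption, since the package's actual helper is not shown in this diff.

```r
# Hypothetical helper: raw vector -> hex string, NA_character_ when empty.
file_id_to_hex <- function(id) {
  if (length(id) == 0L) return(NA_character_)
  # as.character() on a raw vector yields two-digit lowercase hex per byte.
  paste(as.character(id), collapse = "")
}
```

Both branches (empty and non-empty) can be exercised directly with literal raw vectors, which is the testability the comment is after.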