Support haven_labelled data in rec_with_table() #159

@DougManuel

Problem

rec_with_table() fails with 'list' object cannot be coerced to type 'double' when input data contains haven_labelled columns from SPSS/Stata imports via the haven package.

This issue emerged when using PUMF data from ODESSI, which generates its data files using haven. Previous testing used CSV-sourced data, which does not carry the haven_labelled class.

Reproducible example

library(haven)
library(cchsflow)

# Load SPSS data (creates haven_labelled columns)
pumf <- haven::read_sav("cchs_2015-2016_pumf.sav")

# Check column class
class(pumf$HWTGHTM)
# [1] "haven_labelled" "vctrs_vctr" "double"

# This fails
result <- rec_with_table(pumf, variables = "HWTGHTM", database_name = "cchs2015_2016_p")
# Error in recode_call[x] : 'list' object cannot be coerced to type 'double'

# Compare with package test data (plain numeric) - works fine
data("cchs2001_p", package = "cchsflow")
class(cchs2001_p$HWTGHTM)
# [1] "numeric"

result <- rec_with_table(cchs2001_p, variables = "HWTGHTM", database_name = "cchs2001_p")
# Success

Root cause

haven_labelled columns carry SPSS metadata (value labels, missing-value codes) as attributes. When rec_with_table() performs numeric operations on these columns, R's type coercion fails with the error shown above.

The underlying values are correct - it's the class wrapper that causes issues:

> as.numeric(pumf$HWTGHTM[1:5])
[1] 1.73 1.65 1.80   NA 1.58  # Values are fine
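The class wrapper can be inspected without access to the PUMF file. The following is a minimal sketch that builds a small labelled vector with haven::labelled(); the vector name `height`, the label text, and the code 9.996 are illustrative assumptions, not values taken from the CCHS files.

```r
library(haven)

# Hypothetical stand-in for pumf$HWTGHTM (values and label code are made up)
height <- labelled(
  c(1.73, 1.65, 1.80, NA, 1.58),
  labels = c("Not applicable" = 9.996),
  label = "Height (metres)"
)

# The SPSS-style metadata lives in attributes on top of a double vector
class(height)
names(attributes(height))

# Coercing to numeric drops the wrapper and keeps the underlying values
plain <- as.numeric(height)
class(plain)
# [1] "numeric"
```

If this behaves as expected, `plain` holds the same five values as `height`, including the NA, with no haven_labelled class attached.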

Current workaround

Convert haven_labelled columns to plain numeric before calling rec_with_table():

# Option 1: Manual conversion
df <- as.data.frame(df)
df[] <- lapply(df, function(x) {
  if (inherits(x, "haven_labelled")) as.numeric(x) else x
})

# Option 2: Use haven::zap_labels()
df <- haven::zap_labels(df)

# Now rec_with_table works
result <- rec_with_table(df, variables = "HWTGHTM", database_name = "cchs2015_2016_p")
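One difference between the two options may matter for mixed data: haven::zap_labels() removes the labels but keeps each column's underlying type, whereas Option 1 coerces every labelled column with as.numeric(). A small sketch on a toy data frame (the column `SEX` and all values here are hypothetical, chosen only to include a character-labelled column):

```r
library(haven)

# Toy data frame with one numeric and one character labelled column
df <- data.frame(id = 1:3)
df$HWTGHTM <- labelled(c(1.73, 1.65, NA), labels = c("Not stated" = 9.999))
df$SEX <- labelled(c("M", "F", "F"), labels = c(Male = "M", Female = "F"))

# zap_labels() strips the haven_labelled class from every column
zapped <- zap_labels(df)

class(zapped$HWTGHTM)
# [1] "numeric"
class(zapped$SEX)
# [1] "character"
```

For inputs that contain only numeric labelled columns, as in the height example, both options should give the same result.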

Proposed fix

Add automatic detection and conversion of haven_labelled columns in rec_with_table():

# At start of rec_with_table()
is_labelled <- vapply(data, inherits, logical(1), what = "haven_labelled")
if (any(is_labelled)) {
  message("Converting haven_labelled columns to numeric")
  data <- as.data.frame(data)
  # Only the labelled columns are converted; all others are left unchanged
  data[is_labelled] <- lapply(data[is_labelled], as.numeric)
}
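To make the proposed check testable on its own, it could be pulled into a small helper. The sketch below assumes a hypothetical function name, `strip_haven_labelled()`, which is not part of cchsflow, and uses a toy data frame rather than real PUMF data.

```r
library(haven)

# Hypothetical helper (not an existing cchsflow function)
strip_haven_labelled <- function(data) {
  is_labelled <- vapply(data, inherits, logical(1), what = "haven_labelled")
  if (!any(is_labelled)) {
    return(data)  # nothing to convert, return the input untouched
  }
  message(
    "Converting haven_labelled columns to numeric: ",
    paste(names(data)[is_labelled], collapse = ", ")
  )
  data <- as.data.frame(data)
  data[is_labelled] <- lapply(data[is_labelled], as.numeric)
  data
}

# Toy input: one plain column, one labelled column (values are made up)
toy <- data.frame(id = 1:3)
toy$HWTGHTM <- labelled(c(1.73, NA, 1.58), labels = c("Not stated" = 9.999))

clean <- strip_haven_labelled(toy)
class(clean$HWTGHTM)
# [1] "numeric"
```

Naming the converted columns in the message is a design choice made here so that users can see which variables lost their label metadata; the conversion itself follows the as.numeric() approach proposed above and therefore assumes the labelled columns are numeric.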

Context

  • Identified during validation of PR #157, "Master file physical activity changes" (physical activity variables)
  • As haven becomes the standard for importing SPSS/Stata data in R, this issue will affect more users
  • ODESSI (Statistics Canada data portal) now provides PUMF data via haven-generated files
  • Detailed analysis in ceps/cep-003-physical-activity/haven_labelled_issue.md
