
Update R wrapper for the VM library #490

Merged
cachafla merged 15 commits into main from cachafla/sc-15256/update-r-wrapper-for-the-vm-library
Mar 25, 2026

Conversation

@cachafla
Contributor

Pull Request Description

What and why?

The ValidMind R package (r/validmind/) hadn't been updated to match the current Python API. This PR brings R up to speed:

R Package (r/validmind/)

  • Add document parameter to vm() function to match Python's vm.init(document=...)
  • Set R_HOME automatically so rpy2 can find the R installation
  • Sync R package version with Python version (Makefile version target now updates both)
  • Updated README with prerequisites, rpy2 install notes, and API examples

Python-side fixes for R compatibility

  • validmind/models/r_model.py: Replace deprecated pandas2ri.activate() with localconverter context manager (rpy2 3.6+), convert R FloatVector predict output to numpy arrays, fix stale test_ds attribute reference
  • validmind/client.py: Register R models in the input registry (log_input + input_registry.add were missing from init_r_model), so run_documentation_tests can find them by input_id
  • validmind/template.py: Add plain-text fallback for preview_template() when not in a Jupyter notebook
  • validmind/utils.py: Add plain-text fallback for preview_test_config(); skip HTML display() when not in a notebook
  • validmind/vm_models/html_progress.py: Skip HTML progress bar/label/box rendering when not in a notebook (eliminates <IPython.core.display.HTML object> noise in R)
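
The notebook guards and plain-text fallbacks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `is_notebook()` body, `render_outline()`, and `preview()` are hypothetical stand-ins, and the detection heuristic (checking for a Jupyter kernel shell class) is an assumption about one reasonable way to do it.

```python
import html
import sys


def is_notebook() -> bool:
    """Best-effort check for a Jupyter kernel (hypothetical helper)."""
    # Only consult IPython if something has already imported it; a plain
    # Python session, or one embedded in R, normally has not.
    ipython = sys.modules.get("IPython")
    if ipython is None:
        return False
    shell = ipython.get_ipython()
    # ZMQInteractiveShell is the shell class used by Jupyter kernels.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def render_outline(sections, indent=0):
    """Return plain-text lines for a nested list of (title, children) pairs."""
    lines = []
    for title, children in sections:
        lines.append("  " * indent + "- " + title)
        lines.extend(render_outline(children, indent + 1))
    return lines


def preview(sections) -> str:
    """Show the outline as HTML in a notebook, or as plain text elsewhere."""
    text = "\n".join(render_outline(sections))
    if is_notebook():
        # Rich display is only attempted when a notebook frontend exists.
        from IPython.display import HTML, display

        display(HTML("<pre>" + html.escape(text) + "</pre>"))
    else:
        # Outside a notebook, print text instead of an opaque HTML object repr.
        print(text)
    return text
```

Calling `preview([("Model Overview", []), ("Data Preparation", [("Data Description", [])])])` from a plain Python or R-hosted session prints an indented outline rather than an `<IPython.core.display.HTML object>` line.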

New quickstart notebooks

  • notebooks/code_sharing/r/quickstart_model_documentation.Rmd — End-to-end model documentation with GLM
  • notebooks/code_sharing/r/quickstart_model_validation.Rmd — End-to-end model validation with GLM

How to test

  1. Install R prerequisites: install.packages(c("reticulate", "dplyr", "caTools", "knitr", "glue", "plotly", "htmltools", "rmarkdown", "DT", "base64enc"))
  2. Install rpy2 in Python env: pip install rpy2 (on macOS may need --no-binary :all:)
  3. Install the R package: install.packages("r/validmind", repos = NULL, type = "source")
  4. Open R from repo root, run through quickstart_model_documentation.Rmd chunk by chunk
  5. Verify tests run and results upload to the ValidMind Platform
  6. Verify Python notebooks still work normally (the display/progress changes should be transparent)

What needs special review?

  • The r_model.py changes to predict() — replaced pandas2ri.activate() with context manager and added numpy conversion of R predict output
  • The init_r_model registration fix in client.py — this was a bug where R models weren't added to the input registry
  • The is_notebook() guards in html_progress.py and utils.py — ensure they don't affect normal Jupyter notebook behavior
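
For reviewers, the registration bug can be pictured with a toy registry. This is a sketch under stated assumptions: `InputRegistry`, `init_model_unregistered`, and `init_model_registered` are hypothetical names for illustration and do not mirror the actual classes in validmind/client.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InputRegistry:
    """Hypothetical stand-in for a registry of inputs keyed by input_id."""

    _inputs: Dict[str, Any] = field(default_factory=dict)

    def add(self, input_id: str, obj: Any) -> None:
        self._inputs[input_id] = obj

    def get(self, input_id: str) -> Any:
        try:
            return self._inputs[input_id]
        except KeyError:
            raise KeyError(f"No input registered with input_id={input_id!r}") from None


registry = InputRegistry()


def init_model_unregistered(input_id: str, model: Any) -> dict:
    # Shape of the bug: the wrapper is built and returned but never
    # registered, so a later lookup by input_id cannot find it.
    return {"input_id": input_id, "model": model}


def init_model_registered(input_id: str, model: Any) -> dict:
    # Shape of the fix: register the wrapper before returning it.
    wrapped = {"input_id": input_id, "model": model}
    registry.add(input_id, wrapped)
    return wrapped
```

With the first function, anything that resolves inputs by `input_id` fails with a lookup error; with the second, the same lookup returns the wrapper.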

Dependencies, breaking changes, and deployment notes

  • rpy2 is now required in the Python environment when using init_r_model (was already the case, but previously failed with an unhelpful error)
  • No breaking changes to the Python API
  • The sed -i '' in the Makefile version target uses macOS syntax; may need adjustment for Linux CI
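
If the `sed -i ''` portability issue becomes a problem on Linux CI, one option is to do the DESCRIPTION update from a small script instead. The sketch below is an alternative technique, not what the Makefile in this PR does; `set_description_version` is a hypothetical helper, and it assumes the DESCRIPTION file has exactly one `Version:` line.

```python
import re
from pathlib import Path

# Matches the single "Version: x.y.z" field line of an R DESCRIPTION file.
VERSION_LINE = re.compile(r"^Version:[ \t]*\S+[ \t]*$", re.MULTILINE)


def set_description_version(description_path, new_version: str) -> None:
    """Rewrite the Version field in place; behaves the same on macOS and Linux."""
    path = Path(description_path)
    text = path.read_text(encoding="utf-8")
    # A function replacement avoids any backslash interpretation in new_version.
    updated, count = VERSION_LINE.subn(lambda _m: f"Version: {new_version}", text)
    if count != 1:
        raise ValueError(f"expected exactly one Version line, found {count}")
    path.write_text(updated, encoding="utf-8")
```

A version target could then call this once with the path to r/validmind/DESCRIPTION and the new Python package version.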

Release notes

Updated the ValidMind R package to support the current Python API, including the document parameter for vm.init(), proper model registration, and modern rpy2 compatibility. Added two new R quickstart notebooks for model documentation and validation workflows. Fixed several issues with running ValidMind outside of Jupyter notebooks (plain-text fallbacks for template preview, test config preview, and progress display).

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

🤖 Generated with Claude Code

- Add `document` parameter to R `vm()` function matching Python's `vm.init()`
- Set R_HOME automatically so rpy2 can find the R installation
- Fix deprecated rpy2 pandas2ri.activate() with localconverter context manager
- Convert R FloatVector predict output to numpy arrays
- Fix stale test_ds attribute reference in RModel
- Register R models in input registry so run_documentation_tests can find them
- Add plain-text fallbacks for preview_template() and preview_test_config()
- Skip HTML progress/display rendering when not in a Jupyter notebook
- Add quickstart_model_documentation.Rmd and quickstart_model_validation.Rmd
- Sync R package version with Python (Makefile version target updates both)
- Update R README with prerequisites, rpy2 install notes, and API examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Pull requests must include at least one of the required labels: internal (no release notes required), highlight, enhancement, bug, deprecation, documentation. Except for internal, pull requests must also include a description in the release notes section.

@cachafla cachafla added the enhancement New feature or request label Mar 19, 2026
cachafla and others added 2 commits March 19, 2026 14:12
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cachafla and others added 2 commits March 19, 2026 14:21
Upgrade 2_run_comparison_tests.ipynb from nbformat 4.2 to 4.5 and add
missing cell IDs so the copyright verification passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@nrichers nrichers left a comment

I was able to run the developer R notebook with a setup tweak to sort out an rpy2 issue with reticulate when doing init_r_model, but not consistently. This might need looking at by @cachafla @AnilSorathiya or @juanmleng.

For the validator R notebook, I was never able to run through the full notebook, again because of the same issue with init_r_model. I also ran into some issues with the ClassImbalance test, but those are possibly related to not having any model documentation uploaded first, as I had to switch to a different model on prod when setting the validation template 404'ed for me on dev.

I suggested a few usability tweaks that I can address tomorrow, but the code snippet user experience is not great: you basically need to copy the snippet, paste it into an editor window, grab the right lines, paste those in, and then fix the line indents and whitespace. It works, but it's not elegant.

Info on my env:

> packageVersion("validmind")
[1] ‘2.12.3’
> R.version.string
[1] "R version 4.5.3 (2026-03-11)"

cachafla and others added 4 commits March 20, 2026 11:59
… add python_version guidance

- Put api_host before api_key in notebook vm() calls for easier copy-paste
- Add RPY2_CFFI_MODE=ABI env var for rpy2/reticulate compatibility
- Remove unsupported feature importance tests (PFI, SHAP) from validation notebook
- Add lead-in text explaining how to find python_version path
- Add plain-text fallback for format_dataframe outside notebooks
- Add local source install option to R README
- Revert init_r_model to save-to-file approach (direct model pass doesn't work)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@nrichers nrichers left a comment

LGTM. 🚀

EDIT: I haven't been able to retest fully (was testing when we were asked to delete all our pyenvs locally ...), but if the version mismatch explains the issues I was running into, then let's ship this. The docs page for R will go out shortly.

@cachafla
Contributor Author

> LGTM. 🚀
>
> EDIT: I haven't been able to retest fully (was testing when we were asked to delete all our pyenvs locally ...), but if the version mismatch explains the issues I was running into, then let's ship this. The docs page for R will go out shortly.

Thanks! I'm adding a generic fix to not require manually specifying the location of the Python executable. This should make it easy to test, in combination with an updated JupyterHub deployment.

cachafla and others added 6 commits March 24, 2026 16:43
- vm() now defaults python_version to VALIDMIND_PYTHON env var, falling
  back to system Python. Resolves relative paths against working directory.
- Update all R notebooks to use Sys.getenv("VALIDMIND_PYTHON") pattern
  instead of hardcoded placeholder paths
- Remove python_version from vm() calls in notebooks (now auto-configured)
- Reorder vm() args: api_host first for easier copy-paste from platform
- Remove exposed credentials from r_time_series_model_validation.Rmd
- Document VALIDMIND_PYTHON configuration in R README (env file, .Renviron)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e error messages

- Make .Renviron the recommended first option for VALIDMIND_PYTHON config
- Add .Renviron to .gitignore
- Add required=TRUE to use_python() calls so misconfig fails fast
- Show actual import error in init_r_model instead of generic "install rpy2"
- Load demo data via Python module instead of hardcoded CSV paths
- Remove hardcoded "Exited" references, use customer_churn module constants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

PR Summary

This PR introduces several functional improvements and bug fixes across the project. Key changes include:

  1. R Integration Enhancements:

    • Updated installation instructions and setup documentation in the README files for better clarity on installing and configuring R, the required Python dependencies (e.g., rpy2), and how to resolve issues on macOS.
    • Revised R model initialization in the R package (in files like r/validmind/DESCRIPTION, r/validmind/R/platform.R and notebooks) to support relative Python path resolution, using the VALIDMIND_PYTHON environment variable and fallback logic. This ensures that the correct Python binary is used across different environments (VS Code, RStudio, etc.).
    • Updated test notebooks (e.g., quickstart_model_documentation.Rmd, quickstart_model_validation.Rmd and other R demos) to use a more robust mechanism for specifying the Python interpreter by checking the environment variable and adjusting paths via Sys.getenv and file.path.
  2. Template and Notebook Rendering Improvements:

    • Revised the preview functionality in the template and utils modules to support both Jupyter Notebook and non-notebook environments. Helpers now print a plain-text hierarchy when not in a notebook, ensuring better compatibility.
    • Adjusted the progress bar and label display in the HTML progress module so that displays are gracefully skipped when not running in a notebook environment.
  3. Model Initialization and Prediction Adjustments:

    • In client.py and models/r_model.py, the R model initialization now requires models to be saved in an .RData format rather than older formats. Predictions from R models are now wrapped in NumPy arrays, and the conversion from pandas DataFrame to R objects uses context managers from rpy2 for improved robustness.
    • Added logging and registration of model metadata during R model initialization to improve traceability.
  4. Various Minor Fixes and Improvements:

    • Updated .gitignore to include .Renviron.
    • Updated the Makefile to commit changes to the DESCRIPTION file along with version bumps.
    • Adjusted tests in the Python test module to handle variations in the return types of task and test listings (supporting both Styler and plain DataFrame).

These changes enhance integration between R and Python workflows, improve user documentation and usability across different environments, and ensure that model testing and validation workflows function more reliably.
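
The interpreter resolution described in the summary (VALIDMIND_PYTHON first, then a system Python, with relative paths resolved against the working directory) can be sketched as below. The real logic lives in the R package; this is only a Python rendition of the described behavior, and `resolve_python` is a hypothetical name.

```python
import os
import shutil
import sys


def resolve_python(env=None, cwd=None) -> str:
    """Pick the Python interpreter path following the described precedence."""
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd

    configured = env.get("VALIDMIND_PYTHON", "").strip()
    if configured:
        path = os.path.expanduser(configured)
        # Relative values such as ".venv/bin/python" are anchored at the
        # working directory rather than left for the caller to interpret.
        if not os.path.isabs(path):
            path = os.path.join(cwd, path)
        return os.path.normpath(path)

    # No explicit configuration: fall back to a Python found on PATH, and
    # finally to the interpreter running this script.
    return shutil.which("python3") or shutil.which("python") or sys.executable
```

Passing `env` and `cwd` explicitly keeps the function easy to exercise without touching the real environment, for example `resolve_python({"VALIDMIND_PYTHON": ".venv/bin/python"}, cwd="/work/repo")`.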

Test Suggestions

  • Run the updated R notebooks (documentation and validation) in both Jupyter Notebook and non-notebook environments to ensure that the conditional rendering works properly.
  • Verify that the Python dependency resolution (via VALIDMIND_PYTHON and fallback logic) correctly identifies and uses the intended Python interpreter in various deployment scenarios.
  • Test the new R model initialization with an actual .RData file to validate that predictions are correctly returned as NumPy arrays and that model metadata is correctly logged.
  • Execute the revised progress bar and label display functions in environments without notebook support to confirm that they gracefully skip display operations.
  • Validate that the updated test modules handle both Styler and plain DataFrame outputs without errors.

@cachafla cachafla merged commit e6d2ed7 into main Mar 25, 2026
21 checks passed
@cachafla cachafla deleted the cachafla/sc-15256/update-r-wrapper-for-the-vm-library branch March 25, 2026 00:46

Labels

enhancement New feature or request

3 participants