diff --git a/.github/workflows/validate-docs-site.yaml b/.github/workflows/validate-docs-site.yaml
index da1d3bc9d..e97815495 100644
--- a/.github/workflows/validate-docs-site.yaml
+++ b/.github/workflows/validate-docs-site.yaml
@@ -50,6 +50,7 @@ jobs:
with:
repository: validmind/release-notes
path: site/_source/release-notes
+ ref: nrichers/sc-15270/release-notes-for-26-04
token: ${{ secrets.DOCS_CI_RO_PAT }}
sparse-checkout: |
releases
diff --git a/site/.quartoignore b/site/.quartoignore
new file mode 100644
index 000000000..f91ba27b5
--- /dev/null
+++ b/site/.quartoignore
@@ -0,0 +1 @@
+*.Rmd
\ No newline at end of file
diff --git a/site/notebooks.zip b/site/notebooks.zip
index 3889c7a65..cc7b6172d 100644
Binary files a/site/notebooks.zip and b/site/notebooks.zip differ
diff --git a/site/notebooks/EXECUTED/model_development/1-set_up_validmind.ipynb b/site/notebooks/EXECUTED/model_development/1-set_up_validmind.ipynb
index f82f57eaa..4244924b9 100644
--- a/site/notebooks/EXECUTED/model_development/1-set_up_validmind.ipynb
+++ b/site/notebooks/EXECUTED/model_development/1-set_up_validmind.ipynb
@@ -171,7 +171,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
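The change repeated across these notebooks widens the recommended range from Python 3.8–3.11 to 3.8–3.14. That recommendation amounts to a simple inclusive range check on the interpreter version; a minimal standalone sketch (the `RECOMMENDED_MIN`/`RECOMMENDED_MAX` names are illustrative, not part of the notebooks):

```python
import sys

# Recommended range from the updated notebooks: Python 3.8 <= x <= 3.14
RECOMMENDED_MIN = (3, 8)
RECOMMENDED_MAX = (3, 14)

def python_version_supported(major, minor):
    """True if (major, minor) falls inside the recommended range, inclusive."""
    return RECOMMENDED_MIN <= (major, minor) <= RECOMMENDED_MAX

# Check the running interpreter
print(python_version_supported(sys.version_info.major, sys.version_info.minor))
```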
diff --git a/site/notebooks/EXECUTED/model_validation/1-set_up_validmind_for_validation.ipynb b/site/notebooks/EXECUTED/model_validation/1-set_up_validmind_for_validation.ipynb
index c5dc1fb39..05ad11c2c 100644
--- a/site/notebooks/EXECUTED/model_validation/1-set_up_validmind_for_validation.ipynb
+++ b/site/notebooks/EXECUTED/model_validation/1-set_up_validmind_for_validation.ipynb
@@ -261,7 +261,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/how_to/tests/explore_tests/explore_test_suites.ipynb b/site/notebooks/how_to/tests/explore_tests/explore_test_suites.ipynb
index 6eb1e5ef7..976eaedef 100644
--- a/site/notebooks/how_to/tests/explore_tests/explore_test_suites.ipynb
+++ b/site/notebooks/how_to/tests/explore_tests/explore_test_suites.ipynb
@@ -137,7 +137,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/how_to/tests/explore_tests/explore_tests.ipynb b/site/notebooks/how_to/tests/explore_tests/explore_tests.ipynb
index 56c58b62c..3bfda3032 100644
--- a/site/notebooks/how_to/tests/explore_tests/explore_tests.ipynb
+++ b/site/notebooks/how_to/tests/explore_tests/explore_tests.ipynb
@@ -107,7 +107,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/how_to/tests/run_tests/1_run_dataset_based_tests.ipynb b/site/notebooks/how_to/tests/run_tests/1_run_dataset_based_tests.ipynb
index 9af05b3b3..c4937af21 100644
--- a/site/notebooks/how_to/tests/run_tests/1_run_dataset_based_tests.ipynb
+++ b/site/notebooks/how_to/tests/run_tests/1_run_dataset_based_tests.ipynb
@@ -153,7 +153,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb b/site/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb
index 1ba4627bc..ffcd999fc 100644
--- a/site/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb
+++ b/site/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb
@@ -1,1094 +1,1095 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "1d29276f",
- "metadata": {},
- "source": [
- "# Run comparison tests\n",
- "\n",
- "Learn how to use the ValidMind Library to run comparison tests that take any datasets or models as inputs. Identify comparison tests to run, initialize ValidMind dataset and model objects in preparation for passing them to tests, and then run tests — generating outputs automatically logged to your model's documentation in the ValidMind Platform.\n",
- "\n",
- "We recommend that you first complete our introductory notebook on running tests.\n",
- "\n",
- "Run dataset-based tests"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "61065444",
- "metadata": {},
- "source": [
- "::: {.content-hidden when-format=\"html\"}\n",
- "## Contents \n",
- "- [About ValidMind](#toc1__) \n",
- " - [Before you begin](#toc1_1__) \n",
- " - [New to ValidMind?](#toc1_2__) \n",
- " - [Key concepts](#toc1_3__) \n",
- "- [Setting up](#toc2__) \n",
- " - [Install the ValidMind Library](#toc2_1__) \n",
- " - [Initialize the ValidMind Library](#toc2_2__) \n",
- " - [Register sample model](#toc2_2_1__) \n",
- " - [Apply documentation template](#toc2_2_2__) \n",
- " - [Get your code snippet](#toc2_2_3__) \n",
- " - [Preview the documentation template](#toc2_3__) \n",
- " - [Initialize the Python environment](#toc2_4__) \n",
- "- [Explore a ValidMind test](#toc3__) \n",
- "- [Working with ValidMind datasets](#toc4__) \n",
- " - [Import the sample dataset](#toc4_1__) \n",
- " - [Split the dataset](#toc4_2__) \n",
- " - [Initialize the ValidMind dataset](#toc4_3__) \n",
- "- [Working with ValidMind models](#toc5__) \n",
- " - [Train a sample model](#toc5_1__) \n",
- " - [Initialize the ValidMind model](#toc5_2__) \n",
- " - [Assign predictions](#toc5_3__) \n",
- "- [Running ValidMind tests](#toc6__) \n",
- " - [Run classifier performance test with one model](#toc6_1__) \n",
- " - [Run comparison tests](#toc6_2__) \n",
- " - [Run classifier performance test with multiple models](#toc6_2_1__) \n",
- " - [Run classifier performance test with multiple parameter values](#toc6_2_2__) \n",
- " - [Run comparison test with multiple datasets](#toc6_2_3__) \n",
- "- [Work with test results](#toc7__) \n",
- "- [Next steps](#toc8__) \n",
- " - [Discover more learning resources](#toc8_1__) \n",
- "- [Upgrade ValidMind](#toc9__) \n",
- "\n",
- ":::\n",
- "\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "id": "67a4d9dc",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## About ValidMind\n",
- "\n",
- "ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models. \n",
- "\n",
- "You use the ValidMind Library to automate documentation and validation tests, and then use the ValidMind Platform to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "eeb30df8",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Before you begin\n",
- "\n",
- "This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language. \n",
- "\n",
- "If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html)."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "293c3f98",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### New to ValidMind?\n",
- "\n",
- "If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n",
- "\n",
- "For access to all features available in this notebook, you'll need access to a ValidMind account.\n",
- "\n",
- "Register with ValidMind"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4fc836d0",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Key concepts\n",
- "\n",
- "**Model documentation**: A structured and detailed record pertaining to a model, encompassing key components such as its underlying assumptions, methodologies, data sources, inputs, performance metrics, evaluations, limitations, and intended uses. It serves to ensure transparency, adherence to regulatory requirements, and a clear understanding of potential risks associated with the model’s application.\n",
- "\n",
- "**Documentation template**: Functions as a test suite and lays out the structure of model documentation, segmented into various sections and sub-sections. Documentation templates define the structure of your model documentation, specifying the tests that should be run, and how the results should be displayed.\n",
- "\n",
- "**Tests**: A function contained in the ValidMind Library, designed to run a specific quantitative test on the dataset or model. Tests are the building blocks of ValidMind, used to evaluate and document models and datasets, and can be run individually or as part of a suite defined by your model documentation template.\n",
- "\n",
- "**Metrics**: A subset of tests that do not have thresholds. In the context of this notebook, metrics and tests can be thought of as interchangeable concepts.\n",
- "\n",
- "**Custom metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with the ValidMind Library to be used in the ValidMind Platform.\n",
- "\n",
- "**Inputs**: Objects to be evaluated and documented in the ValidMind Library. They can be any of the following:\n",
- "\n",
- " - **model**: A single model that has been initialized in ValidMind with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model).\n",
- " - **dataset**: Single dataset that has been initialized in ValidMind with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset).\n",
- " - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.\n",
- " - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. (Learn more: [Run tests with multiple datasets](https://docs.validmind.ai/notebooks/how_to/tests/run_tests/configure_tests/run_tests_that_require_multiple_datasets.html))\n",
- "\n",
- "**Parameters**: Additional arguments that can be passed when running a ValidMind test, used to pass additional information to a metric, customize its behavior, or provide additional context.\n",
- "\n",
- "**Outputs**: Custom metrics can return elements like tables or plots. Tables may be a list of dictionaries (each representing a row) or a pandas DataFrame. Plots may be matplotlib or plotly figures.\n",
- "\n",
- "**Test suites**: Collections of tests designed to run together to automate and generate model documentation end-to-end for specific use-cases.\n",
- "\n",
- "Example: the [`classifier_full_suite`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html#ClassifierFullSuite) test suite runs tests from the [`tabular_dataset`](https://docs.validmind.ai/validmind/validmind/test_suites/tabular_datasets.html) and [`classifier`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html) test suites to fully document the data and model sections for binary classification model use-cases."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "8d52b6e0",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Setting up"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e0d2daaf",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Install the ValidMind Library\n",
- "\n",
- "Recommended Python versions\n",
- "\n",
- "Python 3.8 <= x <= 3.11\n",
- "\n",
- "To install the library:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "fc97888f",
- "metadata": {},
- "outputs": [],
- "source": [
- "%pip install -q validmind"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1ff56571",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Initialize the ValidMind Library"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c4d9f164",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "#### Register sample model\n",
- "\n",
- "Let's first register a sample model for use with this notebook.\n",
- "\n",
- "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n",
- "\n",
- "2. In the left sidebar, navigate to **Inventory** and click **+ Register Model**.\n",
- "\n",
- "3. Enter the model details and click **Next >** to continue to assignment of model stakeholders. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))\n",
- "\n",
- "4. Select your own name under the **MODEL OWNER** drop-down.\n",
- "\n",
- "5. Click **Register Model** to add the model to your inventory."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "852392e5",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "#### Apply documentation template\n",
- "\n",
- "Once you've registered your model, let's select a documentation template. A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.\n",
- "\n",
- "1. In the left sidebar that appears for your model, click **Documents** and select **Development**.\n",
- "\n",
- "2. Under **TEMPLATE**, select `Binary classification`.\n",
- "\n",
- "3. Click **Use Template** to apply the template."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6490e991",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "#### Get your code snippet\n",
- "\n",
- "Initialize the ValidMind Library with the *code snippet* unique to each model per document, ensuring your test results are uploaded to the correct model and automatically populated in the right document in the ValidMind Platform when you run this notebook.\n",
- "\n",
- "1. On the left sidebar that appears for your model, select **Getting Started** and select `Development` from the **DOCUMENT** drop-down menu.\n",
- "2. Click **Copy snippet to clipboard**.\n",
- "3. Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "c51ae01c",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Load your model identifier credentials from an `.env` file\n",
- "\n",
- "%load_ext dotenv\n",
- "%dotenv .env\n",
- "\n",
- "# Or replace with your code snippet\n",
- "\n",
- "import validmind as vm\n",
- "\n",
- "vm.init(\n",
- " # api_host=\"...\",\n",
- " # api_key=\"...\",\n",
- " # api_secret=\"...\",\n",
- " # model=\"...\",\n",
- " document=\"documentation\",\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "99e9d14f",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Preview the documentation template\n",
- "\n",
- "Let's verify that you have connected the ValidMind Library to the ValidMind Platform and that the appropriate *template* is selected for your model.\n",
- "\n",
- "You will upload documentation and test results unique to your model based on this template later on. For now, **take a look at the default structure that the template provides with [the `vm.preview_template()` function](https://docs.validmind.ai/validmind/validmind.html#preview_template)** from the ValidMind library and note the empty sections:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "fd332a9d",
- "metadata": {},
- "outputs": [],
- "source": [
- "vm.preview_template()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f805ec38",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Initialize the Python environment\n",
- "\n",
- "Next, let's import the necessary libraries and set up your Python environment for data analysis:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "8e2127cd",
- "metadata": {},
- "outputs": [],
- "source": [
- "import xgboost as xgb\n",
- "\n",
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1783e13c",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Explore a ValidMind test\n",
- "\n",
- "Before we run a test, use [the `vm.tests.list_tests()` function](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to return information on out-of-the-box tests available in the ValidMind Library.\n",
- "\n",
- "Let's assume you want to evaluate *classifier performance* for a model. Classifier performance measures how well a classification model correctly predicts outcomes, using metrics like [precision, recall, and F1 score](https://en.wikipedia.org/wiki/Precision_and_recall).\n",
- "\n",
- "We'll pass in a `filter` to the `list_tests` function to find the test ID for classifier performance:"
- ]
- },
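The cell above points to precision, recall, and F1 as the core classifier-performance metrics. They can be computed from first principles on 0/1 labels; a standalone sketch for intuition (not ValidMind's implementation):

```python
def classifier_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (labels are 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(classifier_metrics([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```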
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "a6a6f715",
- "metadata": {},
- "outputs": [],
- "source": [
- "vm.tests.list_tests(filter=\"ClassifierPerformance\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "96a56e4b",
- "metadata": {},
- "source": [
- "We've identified from the output that the test ID for the classifier performance test is `validmind.model_validation.sklearn.ClassifierPerformance`.\n",
- "\n",
- "Use this ID combined with [the `describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) to retrieve more information about the test, including its **Required Inputs**:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "f8a46c7d",
- "metadata": {},
- "outputs": [],
- "source": [
- "test_id = \"validmind.model_validation.sklearn.ClassifierPerformance\"\n",
- "vm.tests.describe_test(test_id)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "97053f50",
- "metadata": {},
- "source": [
- "Since this test requires a dataset and a model, you can expect it to throw an error when we run it without passing in either as input:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "f853c272",
- "metadata": {},
- "outputs": [],
- "source": [
- "try:\n",
- " vm.tests.run_test(test_id)\n",
- "except Exception as e:\n",
- " print(e)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1a3115ed",
- "metadata": {},
- "source": [
- "Learn more about the individual tests available in the ValidMind Library\n",
- "\n",
- "Check out our Explore tests notebook for more code examples and usage of key functions."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "89da851b",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Working with ValidMind datasets"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "50bfdb1b",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Import the sample dataset\n",
- "\n",
- "Since we need a dataset to run tests, let's import the public [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction) dataset from Kaggle so that we have something to work with.\n",
- "\n",
- "In the example below, note that:\n",
- "\n",
- "- The target column, `Exited`, has a value of `1` when a customer has churned and `0` otherwise.\n",
- "- The ValidMind Library provides a wrapper to automatically load the dataset as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object. A Pandas DataFrame is a two-dimensional tabular data structure that makes use of rows and columns."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "3ef2dfbb",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Import the sample dataset from the library\n",
- "\n",
- "from validmind.datasets.classification import customer_churn\n",
- "\n",
- "print(\n",
- " f\"Loaded demo dataset with: \\n\\n\\t• Target column: '{customer_churn.target_column}' \\n\\t• Class labels: {customer_churn.class_labels}\"\n",
- ")\n",
- "\n",
- "raw_df = customer_churn.load_data()\n",
- "raw_df.head()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a5a8212f",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Split the dataset\n",
- "\n",
- "Let's first split our dataset to help assess how well the model generalizes to unseen data.\n",
- "\n",
- "Use [`preprocess()`](https://docs.validmind.ai/validmind/validmind/datasets/classification/customer_churn.html#preprocess) to split our dataset into three subsets:\n",
- "\n",
- "1. **train_df** — Used to train the model.\n",
- "2. **validation_df** — Used to evaluate the model's performance during training.\n",
- "3. **test_df** — Used later on to assess the model's performance on new, unseen data."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "88c87d4a",
- "metadata": {},
- "outputs": [],
- "source": [
- "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)"
- ]
- },
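The three-way split described above can be pictured with plain Python; an illustrative sketch (the 60/20/20 proportions and `three_way_split` helper are assumptions for illustration, not necessarily what `customer_churn.preprocess()` uses):

```python
import random

def three_way_split(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows, then carve them into train/validation/test partitions.

    The test partition receives whatever remains after train and validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```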
- {
- "cell_type": "markdown",
- "id": "2ae225d7",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Initialize the ValidMind dataset\n",
- "\n",
- "The next step is to connect your data with a ValidMind `Dataset` object. **This step is always necessary every time you want to connect a dataset to documentation and produce test results through ValidMind,** but you only need to do it once per dataset.\n",
- "\n",
- "ValidMind dataset objects provide a wrapper to any type of dataset (NumPy, Pandas, Polars, etc.) so that tests can run transparently regardless of the underlying library.\n",
- "\n",
- "Initialize a ValidMind dataset object using the [`init_dataset` function](https://docs.validmind.ai/validmind/validmind.html#init_dataset) from the ValidMind (`vm`) module. For this example, we'll pass in the following arguments:\n",
- "\n",
- "- **`dataset`** — The raw dataset that you want to provide as input to tests.\n",
- "- **`input_id`** — A unique identifier that allows tracking what inputs are used when running each individual test.\n",
- "- **`target_column`** — A required argument if tests require access to true values. This is the name of the target column in the dataset."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "bf0ec747",
- "metadata": {},
- "outputs": [],
- "source": [
- "vm_train_ds = vm.init_dataset(\n",
- " dataset=train_df,\n",
- " input_id=\"train_dataset\",\n",
- " target_column=customer_churn.target_column,\n",
- ")\n",
- "\n",
- "vm_test_ds = vm.init_dataset(\n",
- " dataset=test_df,\n",
- " input_id=\"test_dataset\",\n",
- " target_column=customer_churn.target_column,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6d26f65b",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Working with ValidMind models"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6d1677f6",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Train a sample model\n",
- "\n",
- "To train the model, we need to provide it with:\n",
- "\n",
- "1. **Inputs** — Features such as customer age, usage, etc.\n",
- "2. **Outputs (Expected answers/labels)** — in our case, we would like to know whether the customer churned or not.\n",
- "\n",
- "Here, we'll use `x_train` and `x_val` to hold the input data (features), and `y_train` and `y_val` to hold the answers (the target we want to predict):"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "39e8c7ea",
- "metadata": {},
- "outputs": [],
- "source": [
- "x_train = train_df.drop(customer_churn.target_column, axis=1)\n",
- "y_train = train_df[customer_churn.target_column]\n",
- "x_val = validation_df.drop(customer_churn.target_column, axis=1)\n",
- "y_val = validation_df[customer_churn.target_column]"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4ac628eb",
- "metadata": {},
- "source": [
- "Next, let's create an *XGBoost classifier model* that will automatically stop training if it doesn't improve after 10 tries. XGBoost is a gradient-boosted tree ensemble that builds trees sequentially, with each tree correcting the errors of the previous ones — typically known for strong predictive performance and built-in regularization to reduce overfitting.\n",
- "\n",
- "Setting an explicit threshold avoids wasting time and helps prevent overfitting by stopping training once improvement stalls. We'll also set three evaluation metrics to get a more complete picture of model performance:\n",
- "\n",
- "1. **error** — Measures how often the model makes incorrect predictions.\n",
- "2. **logloss** — Indicates how confident the predictions are.\n",
- "3. **auc** — Evaluates how well the model distinguishes between churn and not churn."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "255e3583",
- "metadata": {},
- "outputs": [],
- "source": [
- "model = xgb.XGBClassifier(early_stopping_rounds=10)\n",
- "model.set_params(\n",
- " eval_metric=[\"error\", \"logloss\", \"auc\"],\n",
- ")"
- ]
- },
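Of the three evaluation metrics, logloss is the least self-explanatory. A from-scratch sketch of binary log loss for intuition (XGBoost's own implementation differs in details):

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of true 0/1 labels under predicted probabilities.

    Lower is better; confident wrong predictions are penalized heavily."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(round(binary_log_loss([1, 0], [0.5, 0.5]), 4))  # 0.6931 (= ln 2)
```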
- {
- "cell_type": "markdown",
- "id": "f6430312",
- "metadata": {},
- "source": [
- "Finally, our actual training step — where the model learns patterns from the data, so it can make predictions later:\n",
- "\n",
- "- The model is trained on `x_train` and `y_train`, and evaluates its performance using `x_val` and `y_val` to check if it’s learning well.\n",
- "- To turn off printed output while training, we'll set `verbose` to `False`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e3aa3657",
- "metadata": {},
- "outputs": [],
- "source": [
- "model.fit(\n",
- " x_train,\n",
- " y_train,\n",
- " eval_set=[(x_val, y_val)],\n",
- " verbose=False,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c303a046",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Initialize the ValidMind model\n",
- "\n",
- "You'll also need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions for analysis and tests on the data for our model.\n",
- "\n",
- "You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "4b2be11f",
- "metadata": {},
- "outputs": [],
- "source": [
- "vm_model_xgb = vm.init_model(\n",
- " model,\n",
- " input_id=\"xgboost\",\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2fa83857",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Assign predictions\n",
- "\n",
- "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n",
- "\n",
- "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n",
- "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n",
- "\n",
- "If no prediction values are passed, the method will compute predictions automatically:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "229185fd",
- "metadata": {},
- "outputs": [],
- "source": [
- "vm_train_ds.assign_predictions(model=vm_model_xgb)\n",
- "vm_test_ds.assign_predictions(model=vm_model_xgb)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d0b3312e",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Running ValidMind tests\n",
- "\n",
- "Now that we know how to initialize ValidMind `dataset` and `model` objects, we're ready to run some tests!\n",
- "\n",
- "You run individual tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module. For the examples below, we'll pass in the following arguments:\n",
- "\n",
- "- **`test_id`** — The ID of the test to run, as seen in the `ID` column when you run `list_tests`.\n",
- "- **`inputs`** — A dictionary of test inputs, such as `dataset`, `model`, `datasets`, or `models`. These are ValidMind objects initialized with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) or [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model)."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "96c89f32",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Run classifier performance test with one model\n",
- "\n",
- "Run the `validmind.model_validation.sklearn.ClassifierPerformance` test with the testing dataset (`vm_test_ds`) and model (`vm_model_xgb`) as inputs:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "85189af9",
- "metadata": {},
- "outputs": [],
- "source": [
- "result = vm.tests.run_test(\n",
- " \"validmind.model_validation.sklearn.ClassifierPerformance\",\n",
- " inputs={\n",
- " \"dataset\": vm_test_ds,\n",
- " \"model\": vm_model_xgb,\n",
- " },\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "676dff89",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Run comparison tests\n",
- "\n",
- "To evaluate which models might be a better fit for a use case based on their performance on selected criteria, we can run the same test with multiple models. We'll train three additional models and run the classifier performance test for all four models using a single `run_test()` call.\n",
- "\n",
- "ValidMind helps streamline your documentation and testing.\n",
- "\n",
- "You could call run_test() multiple times passing in different inputs, but you can also pass an input_grid object — a dictionary of test input keys and values that allows you to run a single test for a combination of models and datasets.\n",
- "\n",
- "With input_grid, run comparison tests for multiple datasets, or even multiple datasets and models simultaneously — input_grid can be used with run_test() for all possible combinations of inputs, generating a cohesive and comprehensive single output.\n",
- ""
- ]
- },
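The input_grid behavior described above can be pictured as a cartesian product over the per-key input lists; an illustrative sketch of the concept (the `expand_input_grid` helper is hypothetical, not ValidMind's internals):

```python
from itertools import product

def expand_input_grid(input_grid):
    """Expand an input_grid-style dict into one inputs dict per combination."""
    keys = list(input_grid)
    return [dict(zip(keys, combo)) for combo in product(*(input_grid[k] for k in keys))]

grid = {
    "dataset": ["test_dataset"],
    "model": ["xgboost", "random_forest", "logistic_regression", "decision_tree"],
}
for inputs in expand_input_grid(grid):
    print(inputs)  # four combinations, one per model, covered by a single test run
```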
- {
- "cell_type": "markdown",
- "id": "3d9912dc",
- "metadata": {},
- "source": [
- "*Random forest classifier* models use an ensemble method that builds multiple decision trees and averages their predictions. Random forest is robust to overfitting and handles non-linear relations well, but is typically less interpretable than simpler models:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "1976b7e8",
- "metadata": {},
- "outputs": [],
- "source": [
- "from sklearn.ensemble import RandomForestClassifier\n",
- "\n",
- "# Train the random forest classifier model\n",
- "model_rf = RandomForestClassifier()\n",
- "model_rf.fit(x_train, y_train)\n",
- "\n",
- "# Initialize the ValidMind model object for the random forest classifier model\n",
- "vm_model_rf = vm.init_model(\n",
- " model_rf,\n",
- " input_id=\"random_forest\",\n",
- ")\n",
- "\n",
- "# Assign predictions to the test dataset for the random forest classifier model\n",
- "vm_test_ds.assign_predictions(model=vm_model_rf)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a259927c",
- "metadata": {},
- "source": [
- "*Logistic regression* models are linear models that estimate class probabilities via a logistic (sigmoid) function. Logistic regression is highly interpretable with fast training, establishing a strong baseline — however, it struggles when relationships are non-linear, as real-world relationships often are:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "90bbf148",
- "metadata": {},
- "outputs": [],
- "source": [
- "from sklearn.linear_model import LogisticRegression\n",
- "from sklearn.preprocessing import StandardScaler\n",
- "from sklearn.pipeline import Pipeline\n",
- "\n",
- "# Scaling features ensures the lbfgs solver converges reliably\n",
- "model_lr = Pipeline([\n",
- " (\"scaler\", StandardScaler()),\n",
- " (\"lr\", LogisticRegression()),\n",
- "])\n",
- "model_lr.fit(x_train, y_train)\n",
- "\n",
- "# Initialize the ValidMind model object for the logistic regression model\n",
- "vm_model_lr = vm.init_model(\n",
- " model_lr,\n",
- " input_id=\"logistic_regression\",\n",
- ")\n",
- "\n",
- "# Assign predictions to the test dataset for the logistic regression model\n",
- "vm_test_ds.assign_predictions(model=vm_model_lr)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9a666b41",
- "metadata": {},
- "source": [
- "*Decision tree classifier* models use a single tree that splits data on feature thresholds. Useful as an explainability benchmark, decision trees are easy to visualize and interpret — but are prone to overfitting without pruning or ensemble techniques:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "bfa1e17d",
- "metadata": {},
- "outputs": [],
- "source": [
- "from sklearn.tree import DecisionTreeClassifier\n",
- "\n",
- "# Train the decision tree classifier model\n",
- "model_dt = DecisionTreeClassifier()\n",
- "model_dt.fit(x_train, y_train)\n",
- "\n",
- "# Initialize the ValidMind model object for the decision tree classifier model\n",
- "vm_model_dt = vm.init_model(\n",
- " model_dt,\n",
- " input_id=\"decision_tree\",\n",
- ")\n",
- "\n",
- "# Assign predictions to the test dataset for the decision tree classifier model\n",
- "vm_test_ds.assign_predictions(model=vm_model_dt)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2c8f3268",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "#### Run classifier performance test with multiple models\n",
- "\n",
- "Now, we'll use the `input_grid` to run the [`ClassifierPerformance` test](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html) on all four models using the testing dataset (`vm_test_ds`).\n",
- "\n",
- "When running individual tests, you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier to signify that this test was run on `all_models` to differentiate this test run from other runs:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "2e48ce1e",
- "metadata": {},
- "outputs": [],
- "source": [
- "perf_comparison_result = vm.tests.run_test(\n",
- " \"validmind.model_validation.sklearn.ClassifierPerformance:all_models\",\n",
- " input_grid={\n",
- " \"dataset\": [vm_test_ds],\n",
- " \"model\": [vm_model_xgb, vm_model_rf, vm_model_lr, vm_model_dt],\n",
- " },\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "81cbf144",
- "metadata": {},
- "source": [
- "Our output indicates that the XGBoost and random forest classification models provide the strongest overall classification performance, so we'll continue our testing with only those two models as inputs."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "3d3fb6ec",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "#### Run classifier performance test with multiple parameter values\n",
- "\n",
- "Next, let's run the classifier performance test with the `param_grid` object, which runs the same test multiple times with different parameter values. We'll append an identifier to signify that this test was run with our `parameter_grid` configuration:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "d0ad94c9",
- "metadata": {},
- "outputs": [],
- "source": [
- "parameter_comparison_result = vm.tests.run_test(\n",
- " \"validmind.model_validation.sklearn.ClassifierPerformance:parameter_grid\",\n",
- " input_grid={\n",
- " \"dataset\": [vm_test_ds],\n",
- " \"model\": [vm_model_xgb, vm_model_rf]\n",
- " },\n",
- " param_grid={\n",
- " \"average\": [\"macro\", \"micro\"]\n",
- " },\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "508c7546",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "#### Run comparison test with multiple datasets\n",
- "\n",
- "Let's also run the [ROCCurve test](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) using `input_grid` to iterate through multiple datasets, which plots the ROC curves for the training (`vm_train_ds`) and test (`vm_test_ds`) datasets side by side — a common scenario when you want to compare the performance of a model on the training and test datasets and visually assess how much performance is lost in the test dataset.\n",
- "\n",
- "We'll also need to assign predictions to the training dataset for the random forest classifier model, since we didn't do that in our earlier setup:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "96c3b426",
- "metadata": {},
- "outputs": [],
- "source": [
- "vm_train_ds.assign_predictions(model=vm_model_rf)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2be82bae",
- "metadata": {},
- "source": [
- "We'll append an identifier to signify that this test was run with our `train_vs_test` dataset comparison configuration:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "4056aa1e",
- "metadata": {},
- "outputs": [],
- "source": [
- "roc_curve_result = vm.tests.run_test(\n",
- " \"validmind.model_validation.sklearn.ROCCurve:train_vs_test\",\n",
- " input_grid={\n",
- " \"dataset\": [vm_train_ds, vm_test_ds],\n",
- " \"model\": [vm_model_xgb, vm_model_rf],\n",
- " },\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a05570d5",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Work with test results\n",
- "\n",
- "Every test result returned by the `run_test()` function has a [`.log()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#TestResult.log) that can be used to send the test results to the ValidMind Platform. When logging individual test results to the platform, you'll need to manually add those results to the desired section of the model documentation.\n",
- "\n",
- "You can do this through the ValidMind Platform interface after logging your test results ([Learn more ...](https://docs.validmind.ai/developer/model-documentation/work-with-test-results.html)), or directly via the ValidMind Library when calling `.log()` by providing an optional `section_id`. The `section_id` should be a string that matches the title of a section in the documentation template in `snake_case`.\n",
- "\n",
- "Let's log the results of the classifier performance test (`perf_comparison_result`) and the ROCCurve (`roc_curve_result`) test in the `model_evaluation` section of the documentation — present in the template we previewed in the beginning of this notebook:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e119bf1e",
- "metadata": {},
- "outputs": [],
- "source": [
- "perf_comparison_result.log(section_id=\"model_evaluation\")\n",
- "roc_curve_result.log(section_id=\"model_evaluation\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ab5205ee",
- "metadata": {},
- "source": [
- "Finally, let's head to the model we connected to at the beginning of this notebook and view our inserted test results in the updated documentation ([Need more help?](https://docs.validmind.ai/guide/model-documentation/working-with-model-documentation.html)):\n",
- "\n",
- "1. From the **Inventory** in the ValidMind Platform, go to the model you connected to earlier.\n",
- "\n",
- "2. In the left sidebar that appears for your model, click **Development** under Documents.\n",
- "\n",
- "3. Expand the **3.2. Model Evaluation** section.\n",
- "\n",
- "4. Confirm that `perf_comparison_result` and `roc_curve_result` display in this section as expected."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "eb196aac",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Next steps\n",
- "\n",
- "Now that you know how to run comparison tests with the ValidMind Library, you’re ready to take the next step. Extend the functionality of `run_test()` with your own custom test functions that can be incorporated into documentation templates just like any default out-of-the-box ValidMind test.\n",
- "\n",
- "Learn how to implement custom tests with the ValidMind Library.\n",
- "\n",
- "Check out our Implement comparison tests notebook for code examples and usage of key functions."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "083c1d8d",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "### Discover more learning resources\n",
- "\n",
- "We offer many interactive notebooks to help you automate testing, documenting, validating, and more:\n",
- "\n",
- "- [Run tests & test suites](https://docs.validmind.ai/developer/how-to/testing-overview.html)\n",
- "- [Use ValidMind Library features](https://docs.validmind.ai/developer/how-to/feature-overview.html)\n",
- "- [Code samples by use case](https://docs.validmind.ai/guide/samples-jupyter-notebooks.html)\n",
- "\n",
- "Or, visit our [documentation](https://docs.validmind.ai/) to learn more about ValidMind."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "efba0f57",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "## Upgrade ValidMind\n",
- "\n",
- "After installing ValidMind, you’ll want to periodically make sure you are on the latest version to access any new features and other enhancements.\n",
- "\n",
- "Retrieve the information for the currently installed version of ValidMind:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "0d35972c",
- "metadata": {
- "vscode": {
- "languageId": "plaintext"
- }
- },
- "outputs": [],
- "source": [
- "%pip show validmind"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "abcd07ef",
- "metadata": {},
- "source": [
- "If the version returned is lower than the version indicated in our [production open-source code](https://github.com/validmind/validmind-library/blob/prod/validmind/__version__.py), restart your notebook and run:\n",
- "\n",
- "```bash\n",
- "%pip install --upgrade validmind\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5fe70b90",
- "metadata": {},
- "source": [
- "You may need to restart your kernel after upgrading the package for the changes to be applied."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "\n",
- "\n",
- "***\n",
- "\n",
- "Copyright © 2023-2026 ValidMind Inc. All rights reserved.\n",
- "Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.\n",
- "SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "name": "python",
- "version": "3.10"
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "1d29276f",
+ "metadata": {},
+ "source": [
+ "# Run comparison tests\n",
+ "\n",
+ "Learn how to use the ValidMind Library to run comparison tests that take any datasets or models as inputs. Identify comparison tests to run, initialize ValidMind dataset and model objects in preparation for passing them to tests, and then run tests — generating outputs automatically logged to your model's documentation in the ValidMind Platform.\n",
+ "\n",
+ "We recommend that you first complete our introductory notebook on running tests.\n",
+ "\n",
+ "Run dataset-based tests"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "61065444",
+ "metadata": {},
+ "source": [
+ "::: {.content-hidden when-format=\"html\"}\n",
+ "## Contents \n",
+ "- [About ValidMind](#toc1__) \n",
+ " - [Before you begin](#toc1_1__) \n",
+ " - [New to ValidMind?](#toc1_2__) \n",
+ " - [Key concepts](#toc1_3__) \n",
+ "- [Setting up](#toc2__) \n",
+ " - [Install the ValidMind Library](#toc2_1__) \n",
+ " - [Initialize the ValidMind Library](#toc2_2__) \n",
+ " - [Register sample model](#toc2_2_1__) \n",
+ " - [Apply documentation template](#toc2_2_2__) \n",
+ " - [Get your code snippet](#toc2_2_3__) \n",
+ " - [Preview the documentation template](#toc2_3__) \n",
+ " - [Initialize the Python environment](#toc2_4__) \n",
+ "- [Explore a ValidMind test](#toc3__) \n",
+ "- [Working with ValidMind datasets](#toc4__) \n",
+ " - [Import the sample dataset](#toc4_1__) \n",
+ " - [Split the dataset](#toc4_2__) \n",
+ " - [Initialize the ValidMind dataset](#toc4_3__) \n",
+ "- [Working with ValidMind models](#toc5__) \n",
+ " - [Train a sample model](#toc5_1__) \n",
+ " - [Initialize the ValidMind model](#toc5_2__) \n",
+ " - [Assign predictions](#toc5_3__) \n",
+ "- [Running ValidMind tests](#toc6__) \n",
+ " - [Run classifier performance test with one model](#toc6_1__) \n",
+ " - [Run comparison tests](#toc6_2__) \n",
+ " - [Run classifier performance test with multiple models](#toc6_2_1__) \n",
+ " - [Run classifier performance test with multiple parameter values](#toc6_2_2__) \n",
+ " - [Run comparison test with multiple datasets](#toc6_2_3__) \n",
+ "- [Work with test results](#toc7__) \n",
+ "- [Next steps](#toc8__) \n",
+ " - [Discover more learning resources](#toc8_1__) \n",
+ "- [Upgrade ValidMind](#toc9__) \n",
+ "\n",
+ ":::\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "67a4d9dc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## About ValidMind\n",
+ "\n",
+ "ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models. \n",
+ "\n",
+ "You use the ValidMind Library to automate documentation and validation tests, and then use the ValidMind Platform to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eeb30df8",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Before you begin\n",
+ "\n",
+ "This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language. \n",
+ "\n",
+ "If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "293c3f98",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### New to ValidMind?\n",
+ "\n",
+ "If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n",
+ "\n",
+ "For access to all features available in this notebook, you'll need a ValidMind account.\n",
+ "\n",
+ "Register with ValidMind"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4fc836d0",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Key concepts\n",
+ "\n",
+ "**Model documentation**: A structured and detailed record pertaining to a model, encompassing key components such as its underlying assumptions, methodologies, data sources, inputs, performance metrics, evaluations, limitations, and intended uses. It serves to ensure transparency, adherence to regulatory requirements, and a clear understanding of potential risks associated with the model’s application.\n",
+ "\n",
+ "**Documentation template**: Functions as a test suite and lays out the structure of model documentation, segmented into various sections and sub-sections. Documentation templates define the structure of your model documentation, specifying the tests that should be run, and how the results should be displayed.\n",
+ "\n",
+ "**Tests**: A function contained in the ValidMind Library, designed to run a specific quantitative test on the dataset or model. Tests are the building blocks of ValidMind, used to evaluate and document models and datasets, and can be run individually or as part of a suite defined by your model documentation template.\n",
+ "\n",
+ "**Metrics**: A subset of tests that do not have thresholds. In the context of this notebook, metrics and tests can be thought of as interchangeable concepts.\n",
+ "\n",
+ "**Custom metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with the ValidMind Library to be used in the ValidMind Platform.\n",
+ "\n",
+ "**Inputs**: Objects to be evaluated and documented in the ValidMind Library. They can be any of the following:\n",
+ "\n",
+ " - **model**: A single model that has been initialized in ValidMind with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model).\n",
+ " - **dataset**: Single dataset that has been initialized in ValidMind with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset).\n",
+ " - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.\n",
+ " - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. (Learn more: [Run tests with multiple datasets](https://docs.validmind.ai/notebooks/how_to/tests/run_tests/configure_tests/run_tests_that_require_multiple_datasets.html))\n",
+ "\n",
+ "**Parameters**: Additional arguments that can be passed when running a ValidMind test, used to pass additional information to a metric, customize its behavior, or provide additional context.\n",
+ "\n",
+ "**Outputs**: Custom metrics can return elements like tables or plots. Tables may be a list of dictionaries (each representing a row) or a pandas DataFrame. Plots may be matplotlib or plotly figures.\n",
+ "\n",
+ "**Test suites**: Collections of tests designed to run together to automate and generate model documentation end-to-end for specific use-cases.\n",
+ "\n",
+ "Example: the [`classifier_full_suite`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html#ClassifierFullSuite) test suite runs tests from the [`tabular_dataset`](https://docs.validmind.ai/validmind/validmind/test_suites/tabular_datasets.html) and [`classifier`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html) test suites to fully document the data and model sections for binary classification model use-cases."
+ ]
+ },
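+ {
+ "cell_type": "markdown",
+ "id": "4f21a9b0",
+ "metadata": {},
+ "source": [
+ "To make the **Outputs** concept above concrete, here is the shape of a table a custom metric could return, either as a list of dictionaries or as a pandas DataFrame. The metric names and values below are hypothetical, and this sketch is independent of any ValidMind registration:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4f21a9b1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# A table as a list of dictionaries, one per row (hypothetical values)\n",
+ "rows = [\n",
+ " {\"Metric\": \"Accuracy\", \"Value\": 0.91},\n",
+ " {\"Metric\": \"F1\", \"Value\": 0.85},\n",
+ "]\n",
+ "\n",
+ "# The equivalent table as a pandas DataFrame\n",
+ "pd.DataFrame(rows)"
+ ]
+ },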
+ {
+ "cell_type": "markdown",
+ "id": "8d52b6e0",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Setting up"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0d2daaf",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Install the ValidMind Library\n",
+ "\n",
+ "Recommended Python versions\n",
+ "\n",
+ "Python 3.8 <= x <= 3.14\n",
+ "\n",
+ "To install the library:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fc97888f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install -q validmind"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1ff56571",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Initialize the ValidMind Library"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4d9f164",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "#### Register sample model\n",
+ "\n",
+ "Let's first register a sample model for use with this notebook.\n",
+ "\n",
+ "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n",
+ "\n",
+ "2. In the left sidebar, navigate to **Inventory** and click **+ Register Model**.\n",
+ "\n",
+ "3. Enter the model details and click **Next >** to continue to assignment of model stakeholders. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))\n",
+ "\n",
+ "4. Select your own name under the **MODEL OWNER** drop-down.\n",
+ "\n",
+ "5. Click **Register Model** to add the model to your inventory."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "852392e5",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "#### Apply documentation template\n",
+ "\n",
+ "Once you've registered your model, let's select a documentation template. A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.\n",
+ "\n",
+ "1. In the left sidebar that appears for your model, click **Documents** and select **Development**.\n",
+ "\n",
+ "2. Under **TEMPLATE**, select `Binary classification`.\n",
+ "\n",
+ "3. Click **Use Template** to apply the template."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6490e991",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "#### Get your code snippet\n",
+ "\n",
+ "Initialize the ValidMind Library with the *code snippet* unique to each model per document, ensuring your test results are uploaded to the correct model and automatically populated in the right document in the ValidMind Platform when you run this notebook.\n",
+ "\n",
+ "1. On the left sidebar that appears for your model, select **Getting Started** and select `Development` from the **DOCUMENT** drop-down menu.\n",
+ "2. Click **Copy snippet to clipboard**.\n",
+ "3. Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c51ae01c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load your model identifier credentials from an `.env` file\n",
+ "\n",
+ "%load_ext dotenv\n",
+ "%dotenv .env\n",
+ "\n",
+ "# Or replace with your code snippet\n",
+ "\n",
+ "import validmind as vm\n",
+ "\n",
+ "vm.init(\n",
+ " # api_host=\"...\",\n",
+ " # api_key=\"...\",\n",
+ " # api_secret=\"...\",\n",
+ " # model=\"...\",\n",
+ " document=\"documentation\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "99e9d14f",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Preview the documentation template\n",
+ "\n",
+ "Let's verify that you have connected the ValidMind Library to the ValidMind Platform and that the appropriate *template* is selected for your model.\n",
+ "\n",
+ "You will upload documentation and test results unique to your model based on this template later on. For now, **take a look at the default structure that the template provides with [the `vm.preview_template()` function](https://docs.validmind.ai/validmind/validmind.html#preview_template)** from the ValidMind library and note the empty sections:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fd332a9d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vm.preview_template()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f805ec38",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Initialize the Python environment\n",
+ "\n",
+ "Next, let's import the necessary libraries and set up your Python environment for data analysis:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8e2127cd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import xgboost as xgb\n",
+ "\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1783e13c",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Explore a ValidMind test\n",
+ "\n",
+ "Before we run a test, use [the `vm.tests.list_tests()` function](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to return information on out-of-the-box tests available in the ValidMind Library.\n",
+ "\n",
+ "Let's assume you want to evaluate *classifier performance* for a model. Classifier performance measures how well a classification model correctly predicts outcomes, using metrics like [precision, recall, and F1 score](https://en.wikipedia.org/wiki/Precision_and_recall).\n",
+ "\n",
+ "We'll pass in a `filter` to the `list_tests` function to find the test ID for classifier performance:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a6a6f715",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vm.tests.list_tests(filter=\"ClassifierPerformance\")"
+ ]
+ },
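+ {
+ "cell_type": "markdown",
+ "id": "7c5d20e4",
+ "metadata": {},
+ "source": [
+ "As a quick aside, the metrics mentioned above (precision, recall, and F1 score) can be computed directly with scikit-learn. A minimal illustration on toy labels, independent of the ValidMind workflow:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7c5d20e5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.metrics import precision_score, recall_score, f1_score\n",
+ "\n",
+ "y_true_toy = [0, 1, 1, 0, 1]\n",
+ "y_pred_toy = [0, 1, 0, 0, 1]\n",
+ "\n",
+ "print(precision_score(y_true_toy, y_pred_toy)) # 1.0: no false positives\n",
+ "print(recall_score(y_true_toy, y_pred_toy)) # ~0.67: one positive missed\n",
+ "print(f1_score(y_true_toy, y_pred_toy)) # 0.8: harmonic mean of the two"
+ ]
+ },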
+ {
+ "cell_type": "markdown",
+ "id": "96a56e4b",
+ "metadata": {},
+ "source": [
+ "We've identified from the output that the test ID for the classifier performance test is `validmind.model_validation.ClassifierPerformance`.\n",
+ "\n",
+ "Use this ID combined with [the `describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) to retrieve more information about the test, including its **Required Inputs**:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f8a46c7d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "test_id = \"validmind.model_validation.sklearn.ClassifierPerformance\"\n",
+ "vm.tests.describe_test(test_id)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "97053f50",
+ "metadata": {},
+ "source": [
+ "Since this test requires a dataset and a model, you can expect it to throw an error when we run it without passing in either as input:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f853c272",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "try:\n",
+ " vm.tests.run_test(test_id)\n",
+ "except Exception as e:\n",
+ " print(e)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1a3115ed",
+ "metadata": {},
+ "source": [
+ "Learn more about the individual tests available in the ValidMind Library\n",
+ "\n",
+ "Check out our Explore tests notebook for more code examples and usage of key functions."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "89da851b",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Working with ValidMind datasets"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "50bfdb1b",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Import the sample dataset\n",
+ "\n",
+ "Since we need a dataset to run tests, let's import the public [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction) dataset from Kaggle so that we have something to work with.\n",
+ "\n",
+ "In our below example, note that:\n",
+ "\n",
+ "- The target column, `Exited`, has a value of `1` when a customer has churned and `0` otherwise.\n",
+ "- The ValidMind Library provides a wrapper to automatically load the dataset as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object. A Pandas DataFrame is a two-dimensional tabular data structure that makes use of rows and columns."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3ef2dfbb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the sample dataset from the library\n",
+ "\n",
+ "from validmind.datasets.classification import customer_churn\n",
+ "\n",
+ "print(\n",
+ " f\"Loaded demo dataset with: \\n\\n\\t• Target column: '{customer_churn.target_column}' \\n\\t• Class labels: {customer_churn.class_labels}\"\n",
+ ")\n",
+ "\n",
+ "raw_df = customer_churn.load_data()\n",
+ "raw_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a5a8212f",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Split the dataset\n",
+ "\n",
+ "Let's first split our dataset to help assess how well the model generalizes to unseen data.\n",
+ "\n",
+ "Use [`preprocess()`](https://docs.validmind.ai/validmind/validmind/datasets/classification/customer_churn.html#preprocess) to split our dataset into three subsets:\n",
+ "\n",
+ "1. **train_df** — Used to train the model.\n",
+ "2. **validation_df** — Used to evaluate the model's performance during training.\n",
+ "3. **test_df** — Used later on to assess the model's performance on new, unseen data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "88c87d4a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)"
+ ]
+ },
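+ {
+ "cell_type": "markdown",
+ "id": "2b8e41f0",
+ "metadata": {},
+ "source": [
+ "For context, the splitting step that `preprocess()` performs can be sketched with scikit-learn. The 60/20/20 proportions and `random_state` below are illustrative assumptions, not necessarily what `preprocess()` uses, and this sketch omits the helper's feature preprocessing:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2b8e41f1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Illustrative two-stage split: 60% train, 20% validation, 20% test\n",
+ "sketch_train, sketch_rest = train_test_split(raw_df, test_size=0.4, random_state=42)\n",
+ "sketch_val, sketch_test = train_test_split(sketch_rest, test_size=0.5, random_state=42)\n",
+ "\n",
+ "print(len(sketch_train), len(sketch_val), len(sketch_test))"
+ ]
+ },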
+ {
+ "cell_type": "markdown",
+ "id": "2ae225d7",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Initialize the ValidMind dataset\n",
+ "\n",
+ "The next step is to connect your data with a ValidMind `Dataset` object. **This step is always necessary every time you want to connect a dataset to documentation and produce test results through ValidMind,** but you only need to do it once per dataset.\n",
+ "\n",
+ "ValidMind dataset objects provide a wrapper to any type of dataset (NumPy, Pandas, Polars, etc.) so that tests can run transparently regardless of the underlying library.\n",
+ "\n",
+ "Initialize a ValidMind dataset object using the [`init_dataset` function](https://docs.validmind.ai/validmind/validmind.html#init_dataset) from the ValidMind (`vm`) module. For this example, we'll pass in the following arguments:\n",
+ "\n",
+ "- **`dataset`** — The raw dataset that you want to provide as input to tests.\n",
+ "- **`input_id`** — A unique identifier that allows tracking what inputs are used when running each individual test.\n",
+ "- **`target_column`** — A required argument if tests require access to true values. This is the name of the target column in the dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bf0ec747",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vm_train_ds = vm.init_dataset(\n",
+ " dataset=train_df,\n",
+ " input_id=\"train_dataset\",\n",
+ " target_column=customer_churn.target_column,\n",
+ ")\n",
+ "\n",
+ "vm_test_ds = vm.init_dataset(\n",
+ " dataset=test_df,\n",
+ " input_id=\"test_dataset\",\n",
+ " target_column=customer_churn.target_column,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6d26f65b",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Working with ValidMind models"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6d1677f6",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Train a sample model\n",
+ "\n",
+ "To train the model, we need to provide it with:\n",
+ "\n",
+ "1. **Inputs** — Features such as customer age, usage, etc.\n",
+ "2. **Outputs (Expected answers/labels)** — In our case, we would like to know whether the customer churned or not.\n",
+ "\n",
+ "Here, we'll use `x_train` and `x_val` to hold the input data (features), and `y_train` and `y_val` to hold the answers (the target we want to predict):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "39e8c7ea",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x_train = train_df.drop(customer_churn.target_column, axis=1)\n",
+ "y_train = train_df[customer_churn.target_column]\n",
+ "x_val = validation_df.drop(customer_churn.target_column, axis=1)\n",
+ "y_val = validation_df[customer_churn.target_column]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4ac628eb",
+ "metadata": {},
+ "source": [
+ "Next, let's create an *XGBoost classifier model* that will automatically stop training if the evaluation metric doesn't improve for 10 consecutive rounds. XGBoost is a gradient-boosted tree ensemble that builds trees sequentially, with each tree correcting the errors of the previous ones — typically known for strong predictive performance and built-in regularization to reduce overfitting.\n",
+ "\n",
+ "Setting an explicit early-stopping threshold avoids wasted training time and helps prevent overfitting by halting training once performance stops improving. We'll also set three evaluation metrics to get a more complete picture of model performance:\n",
+ "\n",
+ "1. **error** — Measures how often the model makes incorrect predictions.\n",
+ "2. **logloss** — Indicates how confident the predictions are.\n",
+ "3. **auc** — Evaluates how well the model distinguishes between churn and not churn."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "255e3583",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = xgb.XGBClassifier(early_stopping_rounds=10)\n",
+ "model.set_params(\n",
+ " eval_metric=[\"error\", \"logloss\", \"auc\"],\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f6430312",
+ "metadata": {},
+ "source": [
+ "Finally, our actual training step — where the model learns patterns from the data, so it can make predictions later:\n",
+ "\n",
+ "- The model is trained on `x_train` and `y_train`, and evaluates its performance using `x_val` and `y_val` to check if it’s learning well.\n",
+ "- To turn off printed output while training, we'll set `verbose` to `False`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e3aa3657",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.fit(\n",
+ " x_train,\n",
+ " y_train,\n",
+ " eval_set=[(x_val, y_val)],\n",
+ " verbose=False,\n",
+ ")"
+ ]
+ },
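Conceptually, early stopping is just a patience counter: training halts once the validation metric has gone a fixed number of rounds without improving. A minimal pure-Python sketch of that logic (illustrative only, not XGBoost's actual implementation):

```python
def early_stop_round(val_losses, patience=10):
    """Return the round at which training would stop, or None if it never does."""
    best = float("inf")
    best_round = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_round = loss, i
        elif i - best_round >= patience:
            return i  # no improvement for `patience` rounds
    return None

# Validation loss improves, then plateaus from round 3 onward
losses = [0.9, 0.7, 0.6, 0.5] + [0.5] * 12
print(early_stop_round(losses, patience=10))  # 13
```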
+ {
+ "cell_type": "markdown",
+ "id": "c303a046",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Initialize the ValidMind model\n",
+ "\n",
+ "You'll also need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions to run analysis and tests on our model.\n",
+ "\n",
+ "You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4b2be11f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vm_model_xgb = vm.init_model(\n",
+ " model,\n",
+ " input_id=\"xgboost\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2fa83857",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Assign predictions\n",
+ "\n",
+ "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n",
+ "\n",
+ "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n",
+ "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n",
+ "\n",
+ "If no prediction values are passed, the method will compute predictions automatically:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "229185fd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vm_train_ds.assign_predictions(model=vm_model_xgb)\n",
+ "vm_test_ds.assign_predictions(model=vm_model_xgb)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d0b3312e",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Running ValidMind tests\n",
+ "\n",
+ "Now that we know how to initialize ValidMind `dataset` and `model` objects, we're ready to run some tests!\n",
+ "\n",
+ "You run individual tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module. For the examples below, we'll pass in the following arguments:\n",
+ "\n",
+ "- **`test_id`** — The ID of the test to run, as seen in the `ID` column when you run `list_tests`.\n",
+ "- **`inputs`** — A dictionary of test inputs, such as `dataset`, `model`, `datasets`, or `models`. These are ValidMind objects initialized with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) or [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "96c89f32",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Run classifier performance test with one model\n",
+ "\n",
+ "Run the `validmind.model_validation.sklearn.ClassifierPerformance` test with the testing dataset (`vm_test_ds`) and model (`vm_model_xgb`) as inputs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85189af9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "result = vm.tests.run_test(\n",
+ " \"validmind.model_validation.sklearn.ClassifierPerformance\",\n",
+ " inputs={\n",
+ " \"dataset\": vm_test_ds,\n",
+ " \"model\": vm_model_xgb,\n",
+ " },\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "676dff89",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Run comparison tests\n",
+ "\n",
+ "To evaluate which models might be a better fit for a use case based on their performance on selected criteria, we can run the same test with multiple models. We'll train three additional models, then run the classifier performance test for all four models using a single `run_test()` call.\n",
+ "\n",
+ "ValidMind helps streamline your documentation and testing.\n",
+ "\n",
+ "You could call run_test() multiple times passing in different inputs, but you can also pass an input_grid object — a dictionary of test input keys and values that allow you to run a single test for a combination of models and datasets.\n",
+ "\n",
+ "With input_grid, you can run comparison tests across multiple datasets, or across datasets and models simultaneously: run_test() executes every combination of inputs and generates a single, cohesive output.\n",
+ ""
+ ]
+ },
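Conceptually, `input_grid` expands into the cartesian product of its values, with one test run per combination. A pure-Python sketch of that expansion (the grid contents here are illustrative):

```python
from itertools import product

# Hypothetical grid: one dataset, four models (for illustration only)
input_grid = {
    "dataset": ["test_dataset"],
    "model": ["xgboost", "random_forest", "logistic_regression", "decision_tree"],
}

# One combination per element of the cartesian product of the grid's values
keys = list(input_grid)
combinations = [dict(zip(keys, values)) for values in product(*input_grid.values())]

print(len(combinations))  # 1 dataset x 4 models = 4 runs
print(combinations[0])
```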
+ {
+ "cell_type": "markdown",
+ "id": "3d9912dc",
+ "metadata": {},
+ "source": [
+ "*Random forest classifier* models use an ensemble method that builds multiple decision trees and averages their predictions. Random forest is robust to overfitting and handles non-linear relationships well, but is typically less interpretable than simpler models:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1976b7e8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Train the random forest classifier model\n",
+ "model_rf = RandomForestClassifier()\n",
+ "model_rf.fit(x_train, y_train)\n",
+ "\n",
+ "# Initialize the ValidMind model object for the random forest classifier model\n",
+ "vm_model_rf = vm.init_model(\n",
+ " model_rf,\n",
+ " input_id=\"random_forest\",\n",
+ ")\n",
+ "\n",
+ "# Assign predictions to the test dataset for the random forest classifier model\n",
+ "vm_test_ds.assign_predictions(model=vm_model_rf)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a259927c",
+ "metadata": {},
+ "source": [
+ "*Logistic regression* models estimate class probabilities via a logistic (sigmoid) function. Logistic regression is highly interpretable and fast to train, making it a strong baseline. However, it struggles when relationships are non-linear, as real-world relationships often are:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "90bbf148",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.pipeline import Pipeline\n",
+ "\n",
+ "# Scaling features ensures the lbfgs solver converges reliably\n",
+ "model_lr = Pipeline([\n",
+ " (\"scaler\", StandardScaler()),\n",
+ " (\"lr\", LogisticRegression()),\n",
+ "])\n",
+ "model_lr.fit(x_train, y_train)\n",
+ "\n",
+ "# Initialize the ValidMind model object for the logistic regression model\n",
+ "vm_model_lr = vm.init_model(\n",
+ " model_lr,\n",
+ " input_id=\"logistic_regression\",\n",
+ ")\n",
+ "\n",
+ "# Assign predictions to the test dataset for the logistic regression model\n",
+ "vm_test_ds.assign_predictions(model=vm_model_lr)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a666b41",
+ "metadata": {},
+ "source": [
+ "*Decision tree classifier* models consist of a single tree that splits the data on feature thresholds. Useful as an explainability benchmark, decision trees are easy to visualize and interpret, but are prone to overfitting without pruning or ensemble techniques:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bfa1e17d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "# Train the decision tree classifier model\n",
+ "model_dt = DecisionTreeClassifier()\n",
+ "model_dt.fit(x_train, y_train)\n",
+ "\n",
+ "# Initialize the ValidMind model object for the decision tree classifier model\n",
+ "vm_model_dt = vm.init_model(\n",
+ " model_dt,\n",
+ " input_id=\"decision_tree\",\n",
+ ")\n",
+ "\n",
+ "# Assign predictions to the test dataset for the decision tree classifier model\n",
+ "vm_test_ds.assign_predictions(model=vm_model_dt)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2c8f3268",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "#### Run classifier performance test with multiple models\n",
+ "\n",
+ "Now, we'll use the `input_grid` to run the [`ClassifierPerformance` test](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html) on all four models using the testing dataset (`vm_test_ds`).\n",
+ "\n",
+ "When running individual tests, you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier to signify that this test was run on `all_models` to differentiate this test run from other runs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2e48ce1e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "perf_comparison_result = vm.tests.run_test(\n",
+ " \"validmind.model_validation.sklearn.ClassifierPerformance:all_models\",\n",
+ " input_grid={\n",
+ " \"dataset\": [vm_test_ds],\n",
+ " \"model\": [vm_model_xgb, vm_model_rf, vm_model_lr, vm_model_dt],\n",
+ " },\n",
+ ")"
+ ]
+ },
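The `:` separator convention above is mechanical, so it can be sketched with a hypothetical helper (pure Python, not part of the ValidMind API) that splits a tagged test ID back into its base test ID and result ID:

```python
def split_test_id(tagged_id):
    """Split 'module.TestName:result_id' into its parts (sketch of the convention)."""
    test_id, _, result_id = tagged_id.partition(":")
    return test_id, result_id or None

print(split_test_id(
    "validmind.model_validation.sklearn.ClassifierPerformance:all_models"
))
```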
+ {
+ "cell_type": "markdown",
+ "id": "81cbf144",
+ "metadata": {},
+ "source": [
+ "Our output indicates that the XGBoost and random forest classification models provide the strongest overall classification performance, so we'll continue our testing with only those two models as inputs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3d3fb6ec",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "#### Run classifier performance test with multiple parameter values\n",
+ "\n",
+ "Next, let's run the classifier performance test with the `param_grid` object, which runs the same test multiple times with different parameter values. We'll append an identifier to signify that this test was run with our `parameter_grid` configuration:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d0ad94c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "parameter_comparison_result = vm.tests.run_test(\n",
+ " \"validmind.model_validation.sklearn.ClassifierPerformance:parameter_grid\",\n",
+ " input_grid={\n",
+ " \"dataset\": [vm_test_ds],\n",
+ " \"model\": [vm_model_xgb, vm_model_rf]\n",
+ " },\n",
+ " param_grid={\n",
+ " \"average\": [\"macro\", \"micro\"]\n",
+ " },\n",
+ ")"
+ ]
+ },
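The `average` parameter controls how per-class scores are aggregated: `macro` averages the per-class metrics equally, while `micro` pools all predictions before computing the metric once, which matters under class imbalance. A pure-Python sketch of precision under both schemes, with illustrative counts:

```python
# Per-class true-positive and predicted-positive counts (illustrative only)
# Class 0 is common and easy; class 1 is rare and hard.
tp = {0: 90, 1: 2}
predicted = {0: 100, 1: 10}

# Macro precision: average the per-class precisions equally
macro = (tp[0] / predicted[0] + tp[1] / predicted[1]) / 2

# Micro precision: pool all counts, then compute once
micro = (tp[0] + tp[1]) / (predicted[0] + predicted[1])

print(round(macro, 4), round(micro, 4))  # 0.55 0.8364
```

The rare class drags the macro score down, while micro is dominated by the common class, which is why comparing both views is informative.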
+ {
+ "cell_type": "markdown",
+ "id": "508c7546",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "#### Run comparison test with multiple datasets\n",
+ "\n",
+ "Let's also run the [ROCCurve test](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) using `input_grid` to iterate through multiple datasets, which plots the ROC curves for the training (`vm_train_ds`) and test (`vm_test_ds`) datasets side by side — a common scenario when you want to compare the performance of a model on the training and test datasets and visually assess how much performance is lost in the test dataset.\n",
+ "\n",
+ "We'll also need to assign predictions to the training dataset for the random forest classifier model, since we didn't do that in our earlier setup:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "96c3b426",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vm_train_ds.assign_predictions(model=vm_model_rf)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2be82bae",
+ "metadata": {},
+ "source": [
+ "We'll append an identifier to signify that this test was run with our `train_vs_test` dataset comparison configuration:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4056aa1e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "roc_curve_result = vm.tests.run_test(\n",
+ " \"validmind.model_validation.sklearn.ROCCurve:train_vs_test\",\n",
+ " input_grid={\n",
+ " \"dataset\": [vm_train_ds, vm_test_ds],\n",
+ " \"model\": [vm_model_xgb, vm_model_rf],\n",
+ " },\n",
+ ")"
+ ]
+ },
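Under the hood, an ROC curve is the set of (false positive rate, true positive rate) points obtained by sweeping the decision threshold over the model's scores. A pure-Python sketch on toy values (illustrative only):

```python
def roc_points(y_true, y_prob):
    """(FPR, TPR) pairs obtained by sweeping the threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for threshold in sorted(set(y_prob), reverse=True):
        pred = [1 if p >= threshold else 0 for p in y_prob]
        tpr = sum(p and t for p, t in zip(pred, y_true)) / n_pos
        fpr = sum(p and not t for p, t in zip(pred, y_true)) / n_neg
        points.append((fpr, tpr))
    return points

# Toy scores: a perfect ranker reaches TPR 1.0 before any false positive
print(roc_points([0, 1, 1, 0], [0.2, 0.9, 0.7, 0.1]))
```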
+ {
+ "cell_type": "markdown",
+ "id": "a05570d5",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Work with test results\n",
+ "\n",
+ "Every test result returned by the `run_test()` function has a [`.log()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#TestResult.log) that can be used to send the test results to the ValidMind Platform. When logging individual test results to the platform, you'll need to manually add those results to the desired section of the model documentation.\n",
+ "\n",
+ "You can do this through the ValidMind Platform interface after logging your test results ([Learn more ...](https://docs.validmind.ai/developer/model-documentation/work-with-test-results.html)), or directly via the ValidMind Library when calling `.log()` by providing an optional `section_id`. The `section_id` should be a string that matches the title of a section in the documentation template in `snake_case`.\n",
+ "\n",
+ "Let's log the results of the classifier performance test (`perf_comparison_result`) and the ROCCurve test (`roc_curve_result`) in the `model_evaluation` section of the documentation — present in the template we previewed at the beginning of this notebook:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e119bf1e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "perf_comparison_result.log(section_id=\"model_evaluation\")\n",
+ "roc_curve_result.log(section_id=\"model_evaluation\")"
+ ]
+ },
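Deriving a `section_id` from a section title is mechanical: drop the numbering, lowercase the words, and join them with underscores. The helper below is hypothetical (not part of the ValidMind Library) and only sketches the convention; always check your template's actual section IDs:

```python
import re

def to_snake_case(title):
    """Convert a section title like '3.2. Model Evaluation' to 'model_evaluation'."""
    # Keep only alphabetic words, lowercase them, and join with underscores
    words = re.findall(r"[A-Za-z]+", title)
    return "_".join(word.lower() for word in words)

print(to_snake_case("3.2. Model Evaluation"))  # model_evaluation
```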
+ {
+ "cell_type": "markdown",
+ "id": "ab5205ee",
+ "metadata": {},
+ "source": [
+ "Finally, let's head to the model we connected to at the beginning of this notebook and view our inserted test results in the updated documentation ([Need more help?](https://docs.validmind.ai/guide/model-documentation/working-with-model-documentation.html)):\n",
+ "\n",
+ "1. From the **Inventory** in the ValidMind Platform, go to the model you connected to earlier.\n",
+ "\n",
+ "2. In the left sidebar that appears for your model, click **Development** under Documents.\n",
+ "\n",
+ "3. Expand the **3.2. Model Evaluation** section.\n",
+ "\n",
+ "4. Confirm that `perf_comparison_result` and `roc_curve_result` display in this section as expected."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eb196aac",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Next steps\n",
+ "\n",
+ "Now that you know how to run comparison tests with the ValidMind Library, you’re ready to take the next step. Extend the functionality of `run_test()` with your own custom test functions that can be incorporated into documentation templates just like any default out-of-the-box ValidMind test.\n",
+ "\n",
+ "Learn how to implement custom tests with the ValidMind Library.\n",
+ "\n",
+ "Check out our Implement comparison tests notebook for code examples and usage of key functions."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "083c1d8d",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Discover more learning resources\n",
+ "\n",
+ "We offer many interactive notebooks to help you automate testing, documenting, validating, and more:\n",
+ "\n",
+ "- [Run tests & test suites](https://docs.validmind.ai/developer/how-to/testing-overview.html)\n",
+ "- [Use ValidMind Library features](https://docs.validmind.ai/developer/how-to/feature-overview.html)\n",
+ "- [Code samples by use case](https://docs.validmind.ai/guide/samples-jupyter-notebooks.html)\n",
+ "\n",
+ "Or, visit our [documentation](https://docs.validmind.ai/) to learn more about ValidMind."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "efba0f57",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Upgrade ValidMind\n",
+ "\n",
+ "After installing ValidMind, you’ll want to periodically make sure you are on the latest version to access any new features and other enhancements.\n",
+ "\n",
+ "Retrieve the information for the currently installed version of ValidMind:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0d35972c",
+ "metadata": {
+ "vscode": {
+ "languageId": "plaintext"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "%pip show validmind"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "abcd07ef",
+ "metadata": {},
+ "source": [
+ "If the version returned is lower than the version indicated in our [production open-source code](https://github.com/validmind/validmind-library/blob/prod/validmind/__version__.py), restart your notebook and run:\n",
+ "\n",
+ "```bash\n",
+ "%pip install --upgrade validmind\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5fe70b90",
+ "metadata": {},
+ "source": [
+ "You may need to restart your kernel after upgrading the package for the changes to be applied."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "copyright-54faffd51a5a4717a02b6be426d6b441",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "***\n",
+ "\n",
+ "Copyright © 2023-2026 ValidMind Inc. All rights reserved.\n",
+ "Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.\n",
+ "SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
- "nbformat": 4,
- "nbformat_minor": 5
+ "language_info": {
+ "name": "python",
+ "version": "3.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
}
diff --git a/site/notebooks/how_to/tests/run_tests/configure_tests/enable_pii_detection.ipynb b/site/notebooks/how_to/tests/run_tests/configure_tests/enable_pii_detection.ipynb
index e0ae28fff..ddbd734e9 100644
--- a/site/notebooks/how_to/tests/run_tests/configure_tests/enable_pii_detection.ipynb
+++ b/site/notebooks/how_to/tests/run_tests/configure_tests/enable_pii_detection.ipynb
@@ -157,7 +157,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To use PII detection powered by [Microsoft Presidio](https://microsoft.github.io/presidio/), install the library with the explicit `[pii-detection]` extra specifier:"
]
diff --git a/site/notebooks/quickstart/quickstart_model_documentation.Rmd b/site/notebooks/quickstart/quickstart_model_documentation.Rmd
new file mode 100644
index 000000000..fc48d8c44
--- /dev/null
+++ b/site/notebooks/quickstart/quickstart_model_documentation.Rmd
@@ -0,0 +1,227 @@
+---
+title: "Quickstart for Model Documentation (R)"
+author: "ValidMind"
+date: "2026-03-18"
+output: html_document
+---
+
+# Quickstart for Model Documentation
+
+Learn the basics of using ValidMind to document models as part of a model development workflow using R. This notebook uses the ValidMind R package (a `reticulate` wrapper around the Python library) to generate a draft of documentation for a binary classification model.
+
+We will:
+
+1. Import a sample dataset and preprocess it
+2. Split the datasets and initialize them for use with ValidMind
+3. Train a logistic regression (GLM) model and initialize it for use with testing
+4. Run the full suite of documentation tests, sending results to the ValidMind Platform
+
+## Setting up
+
+The Python path is auto-configured via the `VALIDMIND_PYTHON` environment variable.
+If not set, it falls back to the system Python. For local development, create a
+`.Renviron` file in the project root with `VALIDMIND_PYTHON=.venv/bin/python`.
+
+```{r setup, include=FALSE}
+library(reticulate)
+
+python_version <- Sys.getenv("VALIDMIND_PYTHON", Sys.which("python"))
+if (nchar(python_version) > 0 && !startsWith(python_version, "/")) {
+ python_version <- file.path(getwd(), python_version)
+}
+use_python(python_version, required = TRUE)
+
+library(validmind)
+library(dplyr)
+library(caTools)
+library(knitr)
+
+knitr::opts_chunk$set(warning = FALSE, message = FALSE)
+```
+
+## Initialize the ValidMind Library
+
+Log in to the [ValidMind Platform](https://app.prod.validmind.ai) and register a model:
+
+1. Navigate to **Inventory** and click **+ Register Model**.
+2. Under **Documents > Development**, select the `Binary classification` template.
+3. Go to **Getting Started**, select `Development` from the **DOCUMENT** drop-down, and copy the code snippet.
+
+Replace the placeholder values below with your own credentials:
+
+```{r}
+vm_r <- vm(
+ api_host = "https://app.prod.validmind.ai/api/v1/tracking",
+ api_key = "",
+ api_secret = "",
+ model = "",
+ document = "documentation"
+)
+```
+
+## Preview the documentation template
+
+Verify the connection and see the documentation structure:
+
+```{r}
+py_print(vm_r$preview_template())
+```
+
+## Load the demo dataset
+
+We use the Bank Customer Churn dataset for this demonstration:
+
+```{r}
+customer_churn <- reticulate::import(
+ "validmind.datasets.classification.customer_churn"
+)
+
+cat(sprintf(
+ paste0(
+ "Loaded demo dataset with:\n\n\t- Target column: '%s'",
+ "\n\t- Class labels: %s\n"
+ ),
+ customer_churn$target_column,
+ paste(
+ names(customer_churn$class_labels),
+ customer_churn$class_labels,
+ sep = ": ", collapse = ", "
+ )
+))
+
+data <- customer_churn$load_data()
+head(data)
+```
+
+## Initialize the raw dataset
+
+Before running tests, initialize a ValidMind dataset object for the raw data:
+
+```{r}
+vm_raw_dataset <- vm_r$init_dataset(
+ dataset = data,
+ input_id = "raw_dataset",
+ target_column = customer_churn$target_column,
+ class_labels = customer_churn$class_labels
+)
+```
+
+## Preprocess the raw dataset
+
+Handle categorical variables using one-hot encoding and remove unnecessary columns:
+
+```{r}
+# load_data() already drops RowNumber, CustomerId, Surname
+# One-hot encode categorical variables
+geo_dummies <- model.matrix(~ Geography - 1, data = data)
+gender_dummies <- model.matrix(~ Gender - 1, data = data)
+data_processed <- data %>% select(-Geography, -Gender)
+data_processed <- cbind(data_processed, geo_dummies, gender_dummies)
+```
+
+### Split the dataset
+
+Split into training (60%), validation (20%), and test (20%) sets:
+
+```{r}
+set.seed(42)
+
+# First split: 80% train+validation, 20% test
+target_col <- customer_churn$target_column
+split1 <- sample.split(data_processed[[target_col]], SplitRatio = 0.8)
+train_val_data <- subset(data_processed, split1 == TRUE)
+test_data <- subset(data_processed, split1 == FALSE)
+
+# Second split: 75% train, 25% validation (of the 80% = 60/20 overall)
+split2 <- sample.split(train_val_data[[target_col]], SplitRatio = 0.75)
+train_data <- subset(train_val_data, split2 == TRUE)
+validation_data <- subset(train_val_data, split2 == FALSE)
+```
+
+## Train a logistic regression model
+
+Train a GLM with a binomial family (logistic regression):
+
+```{r}
+formula <- as.formula(paste(target_col, "~ ."))
+model <- glm(formula, data = train_data, family = binomial)
+summary(model)
+```
+
+## Initialize the ValidMind datasets
+
+```{r}
+vm_train_ds <- vm_r$init_dataset(
+ dataset = train_data,
+ input_id = "train_dataset",
+ target_column = customer_churn$target_column
+)
+
+vm_test_ds <- vm_r$init_dataset(
+ dataset = test_data,
+ input_id = "test_dataset",
+ target_column = customer_churn$target_column
+)
+```
+
+## Initialize a model object
+
+Save the R model and initialize it with ValidMind:
+
+```{r}
+model_path <- save_model(model)
+
+vm_model <- vm_r$init_r_model(
+ model_path = model_path,
+ input_id = "model"
+)
+```
+
+### Assign predictions
+
+Link model predictions to the training and testing datasets:
+
+```{r}
+vm_train_ds$assign_predictions(model = vm_model)
+vm_test_ds$assign_predictions(model = vm_model)
+```
+
+## Run the full suite of tests
+
+Build the test configuration that maps each test to its required inputs:
+
+```{r}
+# Import the test config helper from the Python customer_churn module
+customer_churn <- reticulate::import(
+ "validmind.datasets.classification.customer_churn"
+)
+test_config <- customer_churn$get_demo_test_config()
+```
+
+Preview the test configuration:
+
+```{r}
+vm_utils <- reticulate::import("validmind.utils")
+py_print(vm_utils$preview_test_config(test_config))
+```
+
+Run the full documentation test suite and upload results to the ValidMind Platform:
+
+```{r}
+full_suite <- vm_r$run_documentation_tests(config = test_config)
+```
+
+## Next steps
+
+Head to the [ValidMind Platform](https://app.prod.validmind.ai) to view the generated documentation:
+
+1. Navigate to **Inventory** and select your model.
+2. Click **Development** under Documents to see the full draft of your model documentation.
+
+From there, you can make qualitative edits, collaborate with validators, and submit for approval.
+
+---
+
+*Copyright © 2023-2026 ValidMind Inc. All rights reserved.*
+*Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.*
+*SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial*
diff --git a/site/notebooks/quickstart/quickstart_model_documentation.ipynb b/site/notebooks/quickstart/quickstart_model_documentation.ipynb
index e7c28cf33..40287aa57 100644
--- a/site/notebooks/quickstart/quickstart_model_documentation.ipynb
+++ b/site/notebooks/quickstart/quickstart_model_documentation.ipynb
@@ -184,7 +184,7 @@
"\n",
"Recommended Python versions\n",
"\n",
- "Python 3.8 <= x <= 3.11\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/quickstart/quickstart_model_validation.Rmd b/site/notebooks/quickstart/quickstart_model_validation.Rmd
new file mode 100644
index 000000000..547aaf29d
--- /dev/null
+++ b/site/notebooks/quickstart/quickstart_model_validation.Rmd
@@ -0,0 +1,335 @@
+---
+title: "Quickstart for Model Validation (R)"
+author: "ValidMind"
+date: "2026-03-18"
+output: html_document
+---
+
+# Quickstart for Model Validation
+
+Learn the basics of using ValidMind to validate models as part of a model validation workflow using R. This notebook uses the ValidMind R package (a `reticulate` wrapper around the Python library) to generate a draft of a validation report for a binary classification model.
+
+We will:
+
+1. Import a sample dataset and preprocess it, then initialize datasets for use with ValidMind
+2. Independently verify data quality tests performed on datasets by model development
+3. Train a champion model for evaluation
+4. Run model evaluation tests with the ValidMind Library
+
+## Setting up
+
+The Python path is auto-configured via the `VALIDMIND_PYTHON` environment variable.
+If not set, it falls back to the system Python. For local development, create a
+`.Renviron` file in the project root with `VALIDMIND_PYTHON=.venv/bin/python`.
+
+```{r setup, include=FALSE}
+library(reticulate)
+
+python_version <- Sys.getenv("VALIDMIND_PYTHON", Sys.which("python"))
+if (nchar(python_version) > 0 && !startsWith(python_version, "/")) {
+ python_version <- file.path(getwd(), python_version)
+}
+use_python(python_version, required = TRUE)
+
+library(validmind)
+library(dplyr)
+library(caTools)
+library(knitr)
+
+knitr::opts_chunk$set(warning = FALSE, message = FALSE)
+```
+
+## Initialize the ValidMind Library
+
+Log in to the [ValidMind Platform](https://app.prod.validmind.ai) and register a model:
+
+1. Navigate to **Inventory** and click **+ Register Model**.
+2. Assign yourself as a **Validator** (remove yourself from Owner and Developer roles).
+3. Under **Documents > Validation**, select the `Generic Validation Report` template.
+4. Go to **Getting Started**, select `Validation` from the **DOCUMENT** drop-down, and copy the code snippet.
+
+Replace the placeholder values below with your own credentials:
+
+```{r}
+vm_r <- vm(
+ api_host = "https://app.prod.validmind.ai/api/v1/tracking",
+ api_key = "",
+ api_secret = "",
+ model = "",
+ document = "validation-report"
+)
+```
+
+## Preview the validation report template
+
+Verify the connection and see the validation report structure:
+
+```{r}
+py_print(vm_r$preview_template())
+```
+
+## Identify available tests
+
+List the tasks and tags available in the ValidMind test library:
+
+```{r}
+vm_r$tests$list_tasks_and_tags()
+```
+
+List all data quality tests for classification:
+
+```{r}
+vm_r$tests$list_tests(tags = list("data_quality"), task = "classification")
+```
+
+## Load the demo dataset
+
+We use the Bank Customer Churn dataset for this demonstration:
+
+```{r}
+customer_churn <- reticulate::import(
+ "validmind.datasets.classification.customer_churn"
+)
+
+cat(sprintf(
+ paste0(
+ "Loaded demo dataset with:\n\n\t- Target column: '%s'",
+ "\n\t- Class labels: %s\n"
+ ),
+ customer_churn$target_column,
+ paste(
+ names(customer_churn$class_labels),
+ customer_churn$class_labels,
+ sep = ": ", collapse = ", "
+ )
+))
+
+data <- customer_churn$load_data()
+head(data)
+```
+
+## Preprocess the raw dataset
+
+Handle categorical variables using one-hot encoding and remove unnecessary columns:
+
+```{r}
+# load_data() already drops RowNumber, CustomerId, Surname
+# One-hot encode categorical variables
+geo_dummies <- model.matrix(~ Geography - 1, data = data)
+gender_dummies <- model.matrix(~ Gender - 1, data = data)
+data_processed <- data %>% select(-Geography, -Gender)
+data_processed <- cbind(data_processed, geo_dummies, gender_dummies)
+```
+
+### Split the dataset
+
+Split into training (60%), validation (20%), and test (20%) sets:
+
+```{r}
+set.seed(42)
+
+# First split: 80% train+validation, 20% test
+target_col <- customer_churn$target_column
+split1 <- sample.split(data_processed[[target_col]], SplitRatio = 0.8)
+train_val_data <- subset(data_processed, split1 == TRUE)
+test_data <- subset(data_processed, split1 == FALSE)
+
+# Second split: 75% train, 25% validation (of the 80% = 60/20 overall)
+split2 <- sample.split(train_val_data[[target_col]], SplitRatio = 0.75)
+train_data <- subset(train_val_data, split2 == TRUE)
+validation_data <- subset(train_val_data, split2 == FALSE)
+```
+
+### Separate features and targets
+
+```{r}
+x_train <- train_data %>% select(-all_of(target_col))
+y_train <- train_data[[target_col]]
+```
+
+## Initialize the ValidMind datasets
+
+```{r}
+vm_raw_dataset <- vm_r$init_dataset(
+ dataset = data,
+ input_id = "raw_dataset",
+ target_column = customer_churn$target_column,
+ class_labels = customer_churn$class_labels
+)
+
+vm_train_ds <- vm_r$init_dataset(
+ dataset = train_data,
+ input_id = "train_dataset",
+ target_column = customer_churn$target_column
+)
+
+vm_validation_ds <- vm_r$init_dataset(
+ dataset = validation_data,
+ input_id = "validation_dataset",
+ target_column = customer_churn$target_column
+)
+
+vm_test_ds <- vm_r$init_dataset(
+ dataset = test_data,
+ input_id = "test_dataset",
+ target_column = customer_churn$target_column
+)
+```
+
+## Run data quality tests
+
+### Run an individual data quality test
+
+Run the ClassImbalance test on the raw dataset and log it to the platform:
+
+```{r}
+vm_r$tests$run_test(
+ test_id = "validmind.data_validation.ClassImbalance",
+ inputs = list(dataset = vm_raw_dataset)
+)$log()
+```
+
+### Run data comparison tests
+
+Compare class imbalance across dataset splits:
+
+```{r}
+comparison_tests <- list(
+ "validmind.data_validation.ClassImbalance:train_vs_validation" = list(
+ input_grid = list(dataset = list("train_dataset", "validation_dataset"))
+ ),
+ "validmind.data_validation.ClassImbalance:train_vs_test" = list(
+ input_grid = list(dataset = list("train_dataset", "test_dataset"))
+ )
+)
+
+for (test_name in names(comparison_tests)) {
+ cat(paste0("Running: ", test_name, "\n"))
+ config <- comparison_tests[[test_name]]
+ tryCatch({
+ vm_r$tests$run_test(
+ test_name,
+ input_grid = config$input_grid
+ )$log()
+ }, error = function(e) {
+ cat(paste0("Error running test ", test_name, ": ", e$message, "\n"))
+ })
+}
+```
+
+## Train the champion model
+
+Train a logistic regression (GLM) to serve as the champion model:
+
+```{r}
+formula <- as.formula(paste(target_col, "~ ."))
+model <- glm(formula, data = train_data, family = binomial)
+summary(model)
+```
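+
+For a quick in-R check of the fit before handing the model to ValidMind, you can threshold the predicted probabilities at 0.5 and compare them against the observed labels (this sketch assumes a binary 0/1 target):
+
+```{r}
+# Training accuracy at a 0.5 cutoff -- an informal check, not a logged test.
+train_probs <- predict(model, newdata = train_data, type = "response")
+train_preds <- as.integer(train_probs > 0.5)
+mean(train_preds == train_data[[target_col]])
+```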
+
+## Initialize the model object
+
+Save the R model and initialize it with ValidMind:
+
+```{r}
+model_path <- save_model(model)
+
+vm_model <- vm_r$init_r_model(
+  model_path = model_path,
+  input_id = "glm_champion"
+)
+```
+
+### Assign predictions
+
+Link model predictions to the training and testing datasets:
+
+```{r}
+vm_train_ds$assign_predictions(model = vm_model)
+vm_test_ds$assign_predictions(model = vm_model)
+```
+
+## Run model evaluation tests
+
+### Run model performance tests
+
+List the available model performance tests:
+
+```{r}
+vm_r$tests$list_tests(tags = list("model_performance"), task = "classification")
+```
+
+Run and log the performance tests:
+
+```{r}
+performance_tests <- c(
+  "validmind.model_validation.sklearn.ClassifierPerformance:glm_champion",
+  "validmind.model_validation.sklearn.ConfusionMatrix:glm_champion",
+  "validmind.model_validation.sklearn.ROCCurve:glm_champion"
+)
+
+for (test in performance_tests) {
+  cat(paste0("Running: ", test, "\n"))
+  vm_r$tests$run_test(
+    test,
+    inputs = list(dataset = vm_test_ds, model = vm_model)
+  )$log()
+}
+```
+
+### Run diagnostic tests
+
+Assess the model for overfitting:
+
+```{r}
+vm_r$tests$run_test(
+  test_id = paste0(
+    "validmind.model_validation.sklearn.OverfitDiagnosis",
+    ":glm_champion"
+  ),
+  input_grid = list(
+    datasets = list(list(vm_train_ds, vm_test_ds)),
+    model = list(vm_model)
+  )
+)$log()
+```
+
+Test robustness to input perturbations:
+
+```{r}
+vm_r$tests$run_test(
+  test_id = paste0(
+    "validmind.model_validation.sklearn.RobustnessDiagnosis",
+    ":glm_champion"
+  ),
+  input_grid = list(
+    datasets = list(list(vm_train_ds, vm_test_ds)),
+    model = list(vm_model)
+  )
+)$log()
+```
+
+### Run feature importance tests
+
+Note: `PermutationFeatureImportance` and `SHAPGlobalImportance` are not supported for R models.
+
+```{r}
+vm_r$tests$run_test(
+  "validmind.model_validation.FeaturesAUC:glm_champion",
+  inputs = list(dataset = vm_test_ds, model = vm_model)
+)$log()
+```
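+
+If you want an independent cross-check of the logged ROC results, the AUC can be computed in base R from the rank statistic (this sketch assumes a binary 0/1 target):
+
+```{r}
+# Mann-Whitney form of the AUC on the test split -- an informal cross-check
+# against the logged ROC and feature-importance results.
+test_probs <- predict(model, newdata = test_data, type = "response")
+labels <- test_data[[target_col]]
+r <- rank(test_probs)
+n_pos <- sum(labels == 1)
+n_neg <- sum(labels == 0)
+(sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
+```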
+
+## Next steps
+
+Head to the [ValidMind Platform](https://app.prod.validmind.ai) to view the validation report:
+
+1. Navigate to **Inventory** and select your model.
+2. Click **Validation** under Documents.
+3. Include your logged test results as evidence, create risk assessment notes, and assess compliance.
+
+---
+
+*Copyright 2023-2026 ValidMind Inc. All rights reserved.*
+*Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.*
+*SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial*
diff --git a/site/notebooks/quickstart/quickstart_model_validation.ipynb b/site/notebooks/quickstart/quickstart_model_validation.ipynb
index 640e64015..63e17f2a8 100644
--- a/site/notebooks/quickstart/quickstart_model_validation.ipynb
+++ b/site/notebooks/quickstart/quickstart_model_validation.ipynb
@@ -259,7 +259,7 @@
"\n",
"Recommended Python versions\n",
"
\n",
- "Python 3.8 <= x <= 3.11
\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/tutorials/model_development/1-set_up_validmind.ipynb b/site/notebooks/tutorials/model_development/1-set_up_validmind.ipynb
index f82f57eaa..4244924b9 100644
--- a/site/notebooks/tutorials/model_development/1-set_up_validmind.ipynb
+++ b/site/notebooks/tutorials/model_development/1-set_up_validmind.ipynb
@@ -171,7 +171,7 @@
"\n",
"Recommended Python versions\n",
"
\n",
- "Python 3.8 <= x <= 3.11
\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/tutorials/model_validation/1-set_up_validmind_for_validation.ipynb b/site/notebooks/tutorials/model_validation/1-set_up_validmind_for_validation.ipynb
index c5dc1fb39..05ad11c2c 100644
--- a/site/notebooks/tutorials/model_validation/1-set_up_validmind_for_validation.ipynb
+++ b/site/notebooks/tutorials/model_validation/1-set_up_validmind_for_validation.ipynb
@@ -261,7 +261,7 @@
"\n",
"Recommended Python versions\n",
"
\n",
- "Python 3.8 <= x <= 3.11
\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/use_cases/agents/document_agentic_ai.ipynb b/site/notebooks/use_cases/agents/document_agentic_ai.ipynb
index 89e815221..3c3b6817b 100644
--- a/site/notebooks/use_cases/agents/document_agentic_ai.ipynb
+++ b/site/notebooks/use_cases/agents/document_agentic_ai.ipynb
@@ -194,7 +194,7 @@
"\n",
"Recommended Python versions\n",
"
\n",
- "Python 3.9 <= x <= 3.11
\n",
+ "Python 3.9 <= x <= 3.14\n",
"\n",
"Let's begin by installing the ValidMind Library with large language model (LLM) support:"
]
diff --git a/site/notebooks/use_cases/model_validation/validate_application_scorecard.ipynb b/site/notebooks/use_cases/model_validation/validate_application_scorecard.ipynb
index f3df8a617..7857d42e0 100644
--- a/site/notebooks/use_cases/model_validation/validate_application_scorecard.ipynb
+++ b/site/notebooks/use_cases/model_validation/validate_application_scorecard.ipynb
@@ -247,7 +247,7 @@
"\n",
"Recommended Python versions\n",
"
\n",
- "Python 3.8 <= x <= 3.11
\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]
diff --git a/site/notebooks/use_cases/nlp_and_llm/rag_benchmark_demo.ipynb b/site/notebooks/use_cases/nlp_and_llm/rag_benchmark_demo.ipynb
index 1e3eb07b6..1b56fa1b0 100644
--- a/site/notebooks/use_cases/nlp_and_llm/rag_benchmark_demo.ipynb
+++ b/site/notebooks/use_cases/nlp_and_llm/rag_benchmark_demo.ipynb
@@ -159,7 +159,7 @@
"\n",
"Recommended Python versions\n",
"
\n",
- "Python 3.8 <= x <= 3.11
\n",
+ "Python 3.8 <= x <= 3.14\n",
"\n",
"To install the library:"
]