RL environment for video recommendation using the KuaiRand dataset, with offline IQL training and gradient-based interpretability.
KuaiRand dataset:
- 1,000 users
- 4.3M videos
- 11.7M interactions (full dataset)
Source: https://github.com/chongminggao/KuaiRand
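The project's data loader reads these interaction logs. A minimal sketch with pandas, using a hypothetical CSV path (substitute the actual KuaiRand-1K log file); the column names follow the KuaiRand log schema (`is_click`, `play_time_ms`, `duration_ms`), but verify them against the release you download:

```python
import pandas as pd

# Hypothetical path: replace with the actual KuaiRand-1K log file.
LOG_PATH = "data/kuairand_1k_log.csv"

def load_interactions(path: str) -> pd.DataFrame:
    """Load the interaction log and keep the columns used for RL training."""
    df = pd.read_csv(path)
    # KuaiRand logs record per-impression feedback such as clicks and play time.
    cols = ["user_id", "video_id", "is_click", "play_time_ms", "duration_ms"]
    df = df[[c for c in cols if c in df.columns]].copy()
    # Watch ratio: fraction of the video actually played, clipped to [0, 1].
    if {"play_time_ms", "duration_ms"} <= set(df.columns):
        df["watch_ratio"] = (
            df["play_time_ms"] / df["duration_ms"].clip(lower=1)
        ).clip(0, 1)
    return df
```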
- Use social media interaction data to understand how people engage with content and what keeps them returning.
- Analyze user behavior to uncover patterns of attention, preference, and emotional response in digital environments.
- Model engagement as a reinforcement process to explore how social media platforms shape user habits.
- Build a framework that can extend to other domains, such as psychometric or behavioral datasets, for broader studies of human decision-making.
- Environment: Based on the KuaiRand dataset, representing realistic user–video interactions on social media platforms.
- Agent: A reinforcement learning agent trained to model and explain user engagement behavior.
- Policy Network: Transformer-based architecture whose attention captures how past interactions influence current content choices.
- Interpretability Layer: Translates model reasoning into human-understandable explanations showing why certain actions or recommendations occur.
- Trainer: Uses multi-objective optimization that balances engagement modeling with interpretability and transparency.
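The attention mechanism in the policy network weights a user's past interactions when scoring the next choice. A minimal single-head scaled dot-product attention sketch in plain NumPy, with illustrative shapes rather than the project's actual architecture; the returned weights are also the signal the interpretability layer can read:

```python
import numpy as np

def attention(history, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over an interaction history.

    history: (T, d) matrix, one embedding per past interaction.
    Returns the attended representation and the (T, T) attention weights,
    which double as an interpretability signal (which past items mattered).
    """
    q, k, v = history @ w_q, history @ w_k, history @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8
hist = rng.normal(size=(5, d))                         # 5 past interactions
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = attention(hist, w_q, w_k, w_v)
```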
- Treats social media interaction as a reinforcement learning process, simulating how users learn and adapt through feedback.
- Embeds interpretability directly into the RL design, allowing insight into both model and human decision processes.
- Uses attention not just for prediction accuracy but as a tool for behavioral interpretation.
- Shifts the focus from maximizing engagement to understanding the cognitive and emotional mechanisms behind it.
- Provides a foundation that can generalize to psychometric and behavioral research beyond social media.
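The gradient-based interpretability idea can be illustrated by measuring how sensitive a value estimate is to each state feature. A toy sketch using a finite-difference gradient of a small value function (the actual script presumably differentiates the trained network with autograd; the quadratic value function here is purely illustrative):

```python
import numpy as np

def saliency(value_fn, state, eps=1e-5):
    """Finite-difference gradient of value_fn at `state`.

    Large |gradient| components mark the state features whose perturbation
    most changes the predicted value -- a simple saliency map.
    """
    grad = np.zeros_like(state)
    for i in range(state.size):
        bump = np.zeros_like(state)
        bump[i] = eps
        grad[i] = (value_fn(state + bump) - value_fn(state - bump)) / (2 * eps)
    return grad

# Toy value function: v(s) = sum_i w_i * s_i^2, so the gradient is 2 * w * s.
w = np.array([1.0, 0.0, 3.0])
v = lambda s: float(w @ (s ** 2))
s = np.array([1.0, 1.0, 1.0])
g = saliency(v, s)   # ~ [2.0, 0.0, 6.0]: the third feature dominates
```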
KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos
The KuaiRand repository includes a figure showing an example user interaction sequence together with the user's rich feedback signals, which are collected from the two main user interfaces (UIs) of the Kuaishou app.
Setup:

```shell
source kuairand_env/bin/activate
python demo.py
```

Training pipeline:

```shell
# 1. Create train/test split (80/20)
python create_train_test_split.py

# 2. Train model with reward normalization
python train_offline_improved.py

# 3. Evaluate on test set
python evaluate_improved_model.py
```

Project layout:

```
src/
├── data_loader.py           # Dataset loader
├── environment.py           # Gymnasium environment
├── iql/                     # IQL implementation
└── training/                # Offline training utilities
create_train_test_split.py   # Data splitting
train_offline_improved.py    # Offline IQL trainer
evaluate_improved_model.py   # Model evaluation
interpret.py                 # Gradient-based interpretability
```
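The offline IQL trainer (`train_offline_improved.py`) fits its value function with an expectile regression loss. A minimal NumPy sketch of that asymmetric loss; the expectile τ = 0.7 is a typical choice from the IQL literature, not a confirmed hyperparameter of this project:

```python
import numpy as np

def expectile_loss(td_err, tau=0.7):
    """IQL's asymmetric L2 loss on the residual u = Q(s, a) - V(s).

    Positive residuals (actions better than the state value) are weighted
    by tau, negative ones by 1 - tau, so V regresses toward an upper
    expectile of Q without ever querying out-of-distribution actions.
    """
    weight = np.where(td_err > 0, tau, 1.0 - tau)
    return float(np.mean(weight * td_err ** 2))

u = np.array([1.0, -1.0])
# With tau = 0.7: mean(0.7 * 1 + 0.3 * 1) = 0.5
loss = expectile_loss(u)
```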
- State: 128-dimensional vector (user embedding + history + context)
- Action: Discrete (video recommendations)
- Reward: 0.5 * click + 0.5 * watch_ratio
- Episode: 10 steps
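The reward above combines both feedback signals with equal weight; as a sketch (the clipping of `watch_ratio` to [0, 1] is an added safeguard, not stated in the spec):

```python
def compute_reward(click: int, watch_ratio: float) -> float:
    """Reward = 0.5 * click + 0.5 * watch_ratio.

    click is binary (0/1) and watch_ratio is the played fraction of the
    video, clipped to [0, 1], so the reward is bounded in [0, 1].
    """
    return 0.5 * click + 0.5 * min(max(watch_ratio, 0.0), 1.0)

r = compute_reward(click=1, watch_ratio=0.8)  # -> 0.9
```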