From 06e6124f43c84a664e137facf8f64d13d7df4f3e Mon Sep 17 00:00:00 2001
From: Akshat Garg <garg.akshat@g.sp.m.is.nagoya-u.ac.jp>
Date: Fri, 27 Feb 2026 15:59:58 +0530
Subject: [PATCH 1/2] Update README with RL-ready datasets information

Added section on RL-ready datasets for DQN and QMIX.
---
 preprocessing/sports/SAR_data/soccer/README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/preprocessing/sports/SAR_data/soccer/README.md b/preprocessing/sports/SAR_data/soccer/README.md
index 241063a..7f07eda 100644
--- a/preprocessing/sports/SAR_data/soccer/README.md
+++ b/preprocessing/sports/SAR_data/soccer/README.md
@@ -25,3 +25,17 @@ Here are some examples of how to download and preprocess data:
 - **StatsBomb and SkillCorner Data:**
   - [Read the Docs Example](https://openstarlab.readthedocs.io/en/latest/Pre_Processing/Sports/SAR_data/Example/Soccer/Example_2/contents.html)
   - [Example Config File](https://github.com/open-starlab/PreProcessing/blob/master/example/config/statsbomb_skillcorner/preprocessing_statsbomb_skillcorner2024.json)
+    
+## RL-ready datasets (DQN / QMIX)
+If you are training RL models such as DQN (single-agent) and QMIX (multi-agent), you can convert the SAR `events.jsonl`
+outputs into padded tensors with consistent action tokenization and train/val/test splits.
+
+This produces a single shared multi-agent dataset with:
+- `observation`: `(B, T, N, O)` (N=10 attackers)
+- `action`: `(B, T, N)` (discrete action ids; default vocab size 16 with `PAD=15`)
+- `reward`, `done`, `mask`: `(B, T)`
+- `onball_mask`: `(B, T, N)` (for masking unavailable actions)
+
+Notes:
+- For DQN, you can flatten the agent dimension `N` into the batch dimension at load time.
+- For QMIX, consume the tensors as-is.

From c36ea251b8883494f3e030420b17e47f8ea172a4 Mon Sep 17 00:00:00 2001
From: Akshat Garg <garg.akshat@g.sp.m.is.nagoya-u.ac.jp>
Date: Wed, 4 Mar 2026 11:41:23 +0530
Subject: [PATCH 2/2] Update README with SAR-to-RL dataset conversion details
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Good catch: “RL-ready” is not a formal pipeline name in this repo.
I used it informally to mean “formatted so that RL models can consume it directly” (fixed tensor shapes, padding, masks, splits, and action ids).
To make this explicit, I changed the wording to “SAR-to-RL Dataset Conversion (DQN / QMIX)” in the docs and the script description, which reflects the actual step: converting SAR `events.jsonl` into model input datasets.
---
 preprocessing/sports/SAR_data/soccer/README.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/preprocessing/sports/SAR_data/soccer/README.md b/preprocessing/sports/SAR_data/soccer/README.md
index 7f07eda..1e65562 100644
--- a/preprocessing/sports/SAR_data/soccer/README.md
+++ b/preprocessing/sports/SAR_data/soccer/README.md
@@ -25,17 +25,14 @@ Here are some examples of how to download and preprocess data:
 - **StatsBomb and SkillCorner Data:**
   - [Read the Docs Example](https://openstarlab.readthedocs.io/en/latest/Pre_Processing/Sports/SAR_data/Example/Soccer/Example_2/contents.html)
   - [Example Config File](https://github.com/open-starlab/PreProcessing/blob/master/example/config/statsbomb_skillcorner/preprocessing_statsbomb_skillcorner2024.json)
-    
-## RL-ready datasets (DQN / QMIX)
-If you are training RL models such as DQN (single-agent) and QMIX (multi-agent), you can convert the SAR `events.jsonl`
-outputs into padded tensors with consistent action tokenization and train/val/test splits.
+
+## SAR-to-RL Dataset Conversion (DQN / QMIX)
+The SAR-to-RL dataset conversion step formats SAR outputs (`events.jsonl`) into padded tensors for DQN and QMIX training.
+It is a preprocessing and data-formatting step, not a training algorithm.
+The conversion script is `soccer_sar_to_rl_dataset.py`.
 
 This produces a single shared multi-agent dataset with:
 - `observation`: `(B, T, N, O)` (N=10 attackers)
 - `action`: `(B, T, N)` (discrete action ids; default vocab size 16 with `PAD=15`)
 - `reward`, `done`, `mask`: `(B, T)`
 - `onball_mask`: `(B, T, N)` (for masking unavailable actions)
-
-Notes:
-- For DQN, you can flatten the agent dimension `N` into the batch dimension at load time.
-- For QMIX, consume the tensors as-is.
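
Review note (not part of the patch): the tensor layout documented above may be easier to follow with a small shape-level example. The sketch below is a minimal illustration in NumPy and does not use `soccer_sar_to_rl_dataset.py` or its real output. The helper names `make_dummy_dataset` and `flatten_agents_for_dqn`, the observation size `O=8`, the random contents, and the choice to fill padded timesteps with `PAD` are all assumptions made for illustration. It shows the documented shapes and the DQN-side flattening of the agent dimension `N` into the batch dimension that the first revision of this section mentioned; a QMIX loader would presumably consume the `(B, T, N, ...)` arrays unchanged.

```python
import numpy as np

PAD = 15  # padding action id, per the documented default vocabulary of 16


def make_dummy_dataset(B=4, T=6, N=10, O=8, seed=0):
    """Random arrays with the documented shapes (hypothetical stand-in data)."""
    rng = np.random.default_rng(seed)
    # mask[b, t] is True for real timesteps and False for padding.
    lengths = rng.integers(1, T + 1, size=B)
    mask = np.arange(T)[None, :] < lengths[:, None]           # (B, T)
    action = rng.integers(0, PAD, size=(B, T, N))              # (B, T, N)
    action[~mask] = PAD  # assumption: padded steps carry the PAD id
    return {
        "observation": rng.normal(size=(B, T, N, O)),          # (B, T, N, O)
        "action": action,
        "reward": rng.normal(size=(B, T)) * mask,              # (B, T)
        "done": np.zeros((B, T), dtype=bool),                  # (B, T)
        "mask": mask,
        "onball_mask": rng.integers(0, 2, size=(B, T, N)).astype(bool),
    }


def flatten_agents_for_dqn(data):
    """Fold the agent axis into the batch axis: (B, T, N, ...) -> (B*N, T, ...)."""
    B, T, N, O = data["observation"].shape
    out = {
        "observation": data["observation"].transpose(0, 2, 1, 3).reshape(B * N, T, O),
        "action": data["action"].transpose(0, 2, 1).reshape(B * N, T),
        "onball_mask": data["onball_mask"].transpose(0, 2, 1).reshape(B * N, T),
    }
    # Assumption: every agent in an episode shares the episode-level signals,
    # so each (B, T) array is repeated N times along the batch axis.
    for key in ("reward", "done", "mask"):
        out[key] = np.repeat(data[key], N, axis=0)
    return out


data = make_dummy_dataset()
flat = flatten_agents_for_dqn(data)
print({k: v.shape for k, v in flat.items()})
```

With this layout, row `b * N + n` of each flattened array corresponds to agent `n` of episode `b`, so per-agent trajectories stay aligned with the shared `reward`, `done`, and `mask` arrays.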