Skip to content

Creating soccer_rl_preprocessing file#79

Closed
akshatgarg06 wants to merge 3 commits intomasterfrom
akshatgarg06-2
Closed

Creating soccer_rl_preprocessing file#79
akshatgarg06 wants to merge 3 commits intomasterfrom
akshatgarg06-2

Conversation

@akshatgarg06
Copy link
Copy Markdown
Collaborator

It takes existing SAR output files (events.jsonl) and converts them into a clean, model-ready RL dataset that can be used by both DQN and QMIX.

What it does

  • Reads SAR sequences from one file or a folder of events.jsonl.
  • Creates deterministic train/val/test splits (seed-based, by game_id or sequence_id).
  • Uses 10 attacking outfield players in fixed order (player_id sorted) for consistent multi-agent tensors.
  • Converts action strings to action IDs (default mapping provided, customizable via JSON).
  • Infers onball_mask (from explicit on-ball action or nearest player-to-ball fallback).
  • Builds per-timestep observation vectors from player + ball position/velocity features.
  • Pads/truncates to fixed sequence length and creates mask and done.

Why this helps
Gives one shared, reproducible dataset format for fair DQN vs QMIX comparison.
Keeps RL tensorization logic separate from existing SAR processing code, so current pipelines are not affected.

@github-actions
Copy link
Copy Markdown

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.10

@github-actions
Copy link
Copy Markdown

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.8

@github-actions
Copy link
Copy Markdown

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.9

Copy link
Copy Markdown
Collaborator

@kenjiro-mk kenjiro-mk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this converter—this script looks like it takes SAR outputs (events.jsonl) and builds an RL-ready dataset. Since SAR itself is already part of the RL preprocessing pipeline, the filename soccer_rl_preprocessing feels a bit ambiguous.

Would you consider renaming it to reflect the SAR → RL-dataset step more explicitly?

Also, since this is a Python script (argparse + __main__), adding the .py extension would help IDEs/tools recognize it properly.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2026

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.10

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2026

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.8

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2026

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.9

@akshatgarg06
Copy link
Copy Markdown
Collaborator Author

Thank you for your valuable feedback!
The file is now changed to soccer_sar_to_rl_dataset.py, and nothing has been changed in the whole code file.

@akshatgarg06 akshatgarg06 requested a review from kenjiro-mk March 4, 2026 05:59
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2026

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.10

@akshatgarg06
Copy link
Copy Markdown
Collaborator Author

@kenjiro-mk
After reviewing the READ.ME feedback, I decided to rename one of the descriptions.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2026

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.8

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2026

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.9

Copy link
Copy Markdown
Collaborator

@kenjiro-mk kenjiro-mk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request:
I’d like preprocessing/sports/SAR_data/soccer/soccer_sar_to_rl_dataset.py to be runnable from SAR_data(...).preprocess_data() in the same way as the existing SAR pipeline. The switch should be done via preprocess_method.

Proposal (implementation):

  • Extend the branching logic in Soccer_SAR_data.preprocess_data() by adding a new case for preprocess_method == "SAR2RL".
    • SAR: keep the current SAR behavior as-is.
    • SAR2RL: call soccer_sar_to_rl_dataset.build_rl_datasets_from_sar_events(...) to generate the RL dataset.

Design points (review focus):

  1. Unified entry point
  • This allows users to run it as SAR_data(..., preprocess_method="SAR2RL").preprocess_data(...), which matches the “same as others” usage pattern and works cleanly after pip installation.
  1. Clarify input/output behavior
  • For SAR2RL, use preprocessed_dir (existing argument / existing defaults) as the input root, and rely on the converter’s **/events.jsonl globbing behavior.
  • Write outputs to a dedicated subdirectory such as output_dir = <preprocessed_dir>/rl_dataset (or an equivalent deterministic subfolder) to avoid collisions with SAR outputs and prevent accidental overwrites.
  1. Minimize imports / avoid pulling heavy dependencies
  • soccer_sar_to_rl_dataset.py is designed to be lightweight (only reading precomputed events.jsonl).
    Therefore, it’s preferable to import it locally inside the SAR2RL branch, so running SAR2RL does not unnecessarily import the full SAR preprocessing stack.
  1. Tests and usage examples
  • Adding a smoke test for preprocess_method="SAR2RL" (a tiny dummy events.jsonl is sufficient) would improve confidence.
  • Adding one usage example in the README showing preprocess_method="SAR2RL" would help users avoid confusion.

@akshatgarg06
Copy link
Copy Markdown
Collaborator Author

@kenjiro-mk
Understood. “Superseded by #82 (includes SAR2RL integration in soccer_SAR_class.py).”
All the requested changes and discussions will be redirected.
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants