BASICParser

This repository contains scripts for decoding Commodore 64 BASIC source files and converting them into lexical tokens in ASCII format. The parser includes a syntax tagger that classifies tokens according to their functional properties in the BASIC language. The tagset used for classification is available in two formats: as a human-readable markdown table and as a machine-readable JSON file.

Features

  • Stateful parsing of tokenized C64 BASIC files
  • PETSCII to ASCII conversion with proper encoding handling
  • Syntax tagging system for token classification
  • Context-aware disambiguation (variables, operators, commands)
  • Token chunking for multi-byte sequences (e.g., <=, variable names)
  • Assembly detection in DATA statements
  • Excel export for token analysis
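
To make the first two features concrete, the following is a minimal sketch of stateful detokenizing. It is not this repository's implementation. It assumes the standard C64 program layout: a 2-byte load address, then for each line a 2-byte pointer to the next line, a 2-byte line number, and the line's bytes terminated by 0x00. The names `KEYWORDS`, `detokenize` and `SAMPLE` are hypothetical, the keyword table is only a small subset of the BASIC V2 tokens, and anything outside the PETSCII range 0x20-0x5A is shown as a placeholder rather than converted properly.

```python
import struct

# Small, incomplete subset of BASIC V2 keyword tokens (illustrative only).
KEYWORDS = {
    0x80: "END", 0x81: "FOR", 0x82: "NEXT", 0x89: "GOTO",
    0x8B: "IF", 0x8F: "REM", 0x99: "PRINT", 0xA7: "THEN",
    0xB2: "=", 0xB3: "<",
}

QUOTE = 0x22
REM = 0x8F


def detokenize(prg: bytes) -> list[str]:
    """Decode a tokenized program into one ASCII string per BASIC line."""
    pos = 2  # skip the 2-byte load address
    lines = []
    # Stop when the data runs out or the next-line pointer is 0x0000.
    while pos + 4 <= len(prg):
        next_ptr, line_no = struct.unpack_from("<HH", prg, pos)
        if next_ptr == 0:
            break
        pos += 4
        parts = []
        in_quotes = False  # inside a string literal, bytes are not keywords
        in_rem = False     # after REM, the rest of the line is a comment
        while pos < len(prg) and prg[pos] != 0:
            byte = prg[pos]
            if byte == QUOTE:
                in_quotes = not in_quotes
            if byte >= 0x80 and not in_quotes and not in_rem:
                # Keyword token; unknown tokens become a placeholder.
                parts.append(KEYWORDS.get(byte, f"{{${byte:02X}}}"))
                if byte == REM:
                    in_rem = True
            elif 0x20 <= byte <= 0x5A:
                # This PETSCII range matches ASCII in uppercase mode.
                parts.append(chr(byte))
            else:
                # Simplification: no real PETSCII conversion here.
                parts.append(f"{{${byte:02X}}}")
            pos += 1
        pos += 1  # skip the 0x00 line terminator
        lines.append(f"{line_no} {''.join(parts)}")
    return lines


# Hand-assembled program: 10 PRINT "HI" / 20 GOTO 10
SAMPLE = bytes([
    0x01, 0x08,                                # load address $0801
    0x0C, 0x08, 0x0A, 0x00,                    # next-line pointer, line 10
    0x99, 0x20, 0x22, 0x48, 0x49, 0x22, 0x00,  # PRINT "HI"
    0x15, 0x08, 0x14, 0x00,                    # next-line pointer, line 20
    0x89, 0x20, 0x31, 0x30, 0x00,              # GOTO 10
    0x00, 0x00,                                # end of program
])

for line in detokenize(SAMPLE):
    print(line)
```

The two flags are what make the loop stateful: the same byte value is a keyword in program text but an ordinary character inside a string literal or after REM, so it cannot be decoded without knowing what came before it on the line.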

Getting Started

Prerequisites

  • Python 3.12+
  • Required packages: pandas, tqdm, openpyxl

Example Use

Run the parser with the main.py script to decode files in the examples directory:

python main.py

This will:

  • Read tokenized BASIC files from examples/encoded/
  • Save decoded ASCII files to examples/decoded/ (with .bas extension)
  • Generate token analysis tables in examples/tables/ (Excel format)

Citation

@software{wagner2025basicparser,
  author = {Wagner, Julian Severin},
  title = {BASICParser},
  version = {1.0},
  year = {2025},
  url = {https://github.com/SojaSurfer/BASICParser},
  note = {Software}
}

About

BASICParser is a Python tool for parsing tokenized Commodore 64 BASIC source files into ASCII format with comprehensive syntax tagging and lexical analysis capabilities.
