This project provides a P-DAT-Bit model for quark–gluon classification. Built upon the Particle Dual Attention Transformer (P-DAT), it replaces the linear layers in all four attention blocks with BitLinear layers to explore quantized weight representations. P-DAT-Bit preserves the original structure of two Particle Attention blocks and two Channel Attention blocks, ensuring both local (particle-level) and global (jet-level) attention mechanisms.
Feature Extractor
- Maps per-particle inputs (e.g., $\log E$, $\log p_T$, PID, etc.) into a higher-dimensional space.
- Employs edge convolutions and Conv2D layers to aggregate local neighbor information for each particle (see the sketch below).
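
A minimal sketch of the edge-convolution idea in PyTorch, assuming k-nearest-neighbor indices (e.g., computed in the $\Delta\eta$–$\Delta\phi$ plane) are provided; the class name `EdgeConvBlock`, the layer sizes, and the max aggregation are illustrative rather than the project's exact implementation.

```python
import torch
import torch.nn as nn

def gather_neighbors(x, knn_idx):
    # x: (B, C, N) per-particle features; knn_idx: (B, N, k) long tensor of neighbor indices
    B, C, N = x.shape
    k = knn_idx.shape[-1]
    idx = knn_idx.reshape(B, 1, N * k).expand(-1, C, -1)   # (B, C, N*k)
    return torch.gather(x, 2, idx).reshape(B, C, N, k)     # (B, C, N, k)

class EdgeConvBlock(nn.Module):
    """Combine each particle's features with its (neighbor - self) differences
    via a shared 1x1 Conv2d, then max-aggregate over the k neighbors."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * in_dim, out_dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(),
        )

    def forward(self, x, knn_idx):
        nbr = gather_neighbors(x, knn_idx)                  # (B, C, N, k)
        center = x.unsqueeze(-1).expand_as(nbr)             # (B, C, N, k)
        edge = torch.cat([center, nbr - center], dim=1)     # (B, 2C, N, k)
        return self.conv(edge).max(dim=-1).values           # (B, out_dim, N)
```
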
Particle Attention Block
- Performs multi-head self-attention across particles.
- Utilizes physics-inspired pairwise particle interaction matrices (e.g., $\Delta R$, $k_T$) as a bias term in the attention module.
- Captures local relationships among constituents (see the sketch below).
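
A minimal sketch of particle attention with an additive pairwise bias, assuming the interaction matrix has already been embedded into one channel per attention head; names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class BiasedParticleAttention(nn.Module):
    """Multi-head self-attention over particles; `bias` carries the
    physics-inspired pairwise terms, one channel per head."""
    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.d = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)   # replaced by BitLinear158b in P-DAT-Bit
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, bias):
        # x: (B, N, dim) particle embeddings; bias: (B, num_heads, N, N)
        B, N, _ = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.h, self.d).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                         # each (B, h, N, d)
        attn = (q @ k.transpose(-2, -1)) / self.d ** 0.5 + bias  # biased attention scores
        out = attn.softmax(dim=-1) @ v                           # (B, h, N, d)
        return self.proj(out.transpose(1, 2).reshape(B, N, self.h * self.d))
```
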
Channel Attention Block
- Treats each feature channel as a “global token” and performs attention among channels.
- Integrates jet-level interaction matrices (ratios of total momentum, energy fractions, etc.) as a bias in the channel attention calculation.
- Captures the global context of the entire jet (see the sketch below).
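
A single-head sketch of the channel-attention idea under the same caveats: the feature map is transposed so that channels act as tokens, and the jet-level matrix enters as an additive bias on the attention scores.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Attention among feature channels: each of the C channels becomes a
    token of length N (number of particles), biased by a jet-level matrix."""
    def __init__(self, num_particles):
        super().__init__()
        self.qkv = nn.Linear(num_particles, 3 * num_particles)  # BitLinear158b in P-DAT-Bit
        self.proj = nn.Linear(num_particles, num_particles)

    def forward(self, x, jet_bias):
        # x: (B, N, C) features; jet_bias: (B, C, C) jet-level interaction bias
        t = x.transpose(1, 2)                                    # channel tokens: (B, C, N)
        q, k, v = self.qkv(t).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5 + jet_bias
        out = attn.softmax(dim=-1) @ v                           # (B, C, N)
        return self.proj(out).transpose(1, 2)                    # back to (B, N, C)
```
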
BitLinear Replacement
- All linear layers inside the Particle and Channel Attention blocks are replaced by BitLinear158b layers.
- Reduces the numeric precision of the weights, striking a balance between efficiency and classification accuracy (see the sketch after this list).
- To revert to the original full-precision model, simply replace all BitLinear158b layers with nn.Linear layers.
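
For reference, a minimal sketch of a 1.58-bit linear layer in the BitNet b1.58 style (ternary weights via absmean scaling, kept trainable with a straight-through estimator); the project's actual BitLinear158b may differ in details such as activation quantization or normalization.

```python
import torch.nn as nn
import torch.nn.functional as F

class BitLinear158b(nn.Linear):
    """nn.Linear whose weights are quantized on the fly to {-1, 0, +1}."""
    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)            # absmean scale
        w_q = (w / scale).round().clamp(-1, 1) * scale    # ternary weights
        w_q = w + (w_q - w).detach()                      # straight-through estimator
        return F.linear(x, w_q, self.bias)
```

Because this sketch subclasses nn.Linear, reverting to full precision is exactly the class swap described above.
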
Classification Head
- After the attention blocks, features are concatenated and passed through a 1D convolution, global pooling, and an MLP (with a final softmax) for binary classification (quark vs. gluon); a sketch follows.
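
A sketch with placeholder layer sizes; in practice one would typically return logits during training and use nn.CrossEntropyLoss rather than applying softmax inside the module.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Conv1d -> global average pooling -> MLP -> softmax over two classes."""
    def __init__(self, in_channels, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, hidden, kernel_size=1)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),                   # quark vs. gluon
        )

    def forward(self, x):
        # x: (B, C, N) concatenated particle/channel attention features
        h = torch.relu(self.conv(x)).mean(dim=-1)   # global average pooling over particles
        return self.mlp(h).softmax(dim=-1)          # class probabilities
```
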
QG Dataset
- Generated by Pythia8, with jets clustered by anti-$k_T$ ($R = 0.4$), and the selection $p_T \in [500, 550]$ GeV, $|y| < 1.7$.
- Official split: 1.6 M training jets, 200 k validation, 200 k test.
Particle-Level Features
- For each jet, keep up to the 100 highest-$p_T$ constituents; zero-pad if there are fewer (see the sketch below).
- Typical features: $\log E$, $\log p_T$, $p_T/E$ fractions, $\Delta\eta$, $\Delta\phi$, $\Delta R$, PID.
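
One way to realize the padding step, assuming a jet's constituents arrive as an (n_particles, n_features) array already sorted by descending $p_T$:

```python
import numpy as np

def pad_constituents(feats, max_particles=100):
    """Keep at most the leading `max_particles` rows; zero-pad shorter jets."""
    out = np.zeros((max_particles, feats.shape[1]), dtype=np.float32)
    n = min(len(feats), max_particles)
    out[:n] = feats[:n]
    return out
```
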
Interaction Matrices
- Particle Interaction Matrix: for any two particles $(a, b)$, compute e.g. $\Delta R$, $m^2$, $z$, often taking logarithms to avoid long tails.
- Jet-Level Matrix: captures global properties (total energy, sum of fractions, primary PID, etc.) with pairwise ratios to highlight large-scale patterns. A sketch of the particle-level computation follows.
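
A sketch of the particle-level pairwise computation for a single jet; the exact set of channels, their ordering, and the clipping used in the project's preprocessing may differ.

```python
import numpy as np

def particle_interaction_matrix(pt, eta, phi, energy, eps=1e-8):
    """Pairwise features log dR, log kT, log z, log m^2 from 1-D constituent arrays."""
    deta = eta[:, None] - eta[None, :]
    dphi = np.angle(np.exp(1j * (phi[:, None] - phi[None, :])))  # wrap to (-pi, pi]
    dr = np.hypot(deta, dphi)
    pt_min = np.minimum(pt[:, None], pt[None, :])
    kt = pt_min * dr                                   # relative transverse momentum
    z = pt_min / (pt[:, None] + pt[None, :] + eps)     # momentum sharing
    # pairwise invariant mass^2 from summed four-vectors (E, px, py, pz)
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
    m2 = ((energy[:, None] + energy[None, :]) ** 2
          - (px[:, None] + px[None, :]) ** 2
          - (py[:, None] + py[None, :]) ** 2
          - (pz[:, None] + pz[None, :]) ** 2)
    feats = np.stack([dr, kt, z, m2], axis=0)          # (4, N, N)
    return np.log(np.clip(feats, eps, None))           # logs tame the long tails
```
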
Chunk Loading
- The full arrays (particle pairs, jet-level interactions) can exceed available memory if loaded at once.
- This project therefore uses a batch-wise (“chunk”) loading process, reading only a portion of the data at each step and releasing memory before proceeding (see the sketch below).
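
One simple way to realize this, assuming the preprocessed arrays are stored as uncompressed .npy files (np.load's mmap_mode does not apply inside .npz archives, so large arrays may need to be extracted first); the paths and chunk size are placeholders.

```python
import numpy as np

def iter_chunks(x_path, y_path, chunk_size=10_000):
    """Yield (features, labels) chunks from memory-mapped .npy arrays,
    keeping only one chunk resident in RAM at a time."""
    X = np.load(x_path, mmap_mode="r")   # lazy: nothing is read into RAM yet
    y = np.load(y_path, mmap_mode="r")
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        # np.array materializes just this slice; it is freed once the
        # consumer drops the reference, before the next chunk is read
        yield np.array(X[start:stop]), np.array(y[start:stop])
```
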
Data Preparation Notebook
- The file quark-gluon_data_preparation.ipynb contains all preprocessing code used to convert the raw .npz files into model-ready files. For full preprocessing details and physics motivations, please refer to the original publication.
Training
```bash
python3 qgtrain.py
```
- Configure hyperparameters (learning rate, batch size, etc.) as needed.
- Trains the model on the QG dataset, periodically checking validation metrics.
Testing
```bash
python3 qgtest.py
```
- Loads the trained weights to evaluate on the test set.

References
- P-DAT: M. He et al., Quark/gluon discrimination and top tagging with dual attention transformer, Eur. Phys. J. C 83:1116 (2023).
- Dataset: P.T. Komiske et al., Energy flow networks: deep sets for particle jets, JHEP 01:121 (2019).