This directory contains the dataset and its processed versions used in the SSH Shell Attack Session project.
README.md: This file explains the structure and usage of the data (1.6 KB)features.txt: List of features used in the dataset (352 B)raw/: Contains the original dataset filesssh_attacks.parquet: Original dataset containing 230,000 SSH shell attack sessions (34.7 MB)
processed/: Contains preprocessed and feature-engineered dataBOW_DATASETS/: Bag-of-Words datasetsssh_attacks_bow.parquet: BOW representation of the dataset (2.2 MB)
TFIDF_DATASETS/: TF-IDF datasetsssh_attacks_tfidf.parquet: TF-IDF representation of the dataset (4.6 MB)
ssh_attacks_decoded.parquet: Decoded dataset (21.9 MB)
- File:
ssh_attacks.parquet- Format: Parquet
- Size: 34.7 MB
- Description: Original dataset containing 230,000 SSH shell attack sessions
- Purpose: Cleaned and feature-engineered datasets used in model training
- Contents:
BOW_DATASETS/: Bag-of-Words representationssh_attacks_bow.parquet: BOW features (2.2 MB)
TFIDF_DATASETS/: Term Frequency-Inverse Document Frequency representationssh_attacks_tfidf.parquet: TF-IDF features (4.6 MB)
ssh_attacks_decoded.parquet: Decoded dataset with processed features (21.9 MB)
- Raw data: 34.7 MB
- Processed data: 28.7 MB
- Total directory size: ~64 MB