"Hidden in Plain Bytes" Source Code
This repository contains code and documentation for processing data exports from major online platforms, used in the research paper:
Julia Nonnenkamp, Naman Gupta, Abhimanyu Dev Gupta, and Rahul Chatterjee. 2025. Hidden in Plain Bytes: Investigating Interpersonal Account Compromise with Data Exports. In Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25), October 13–17, 2025, Taipei, Taiwan. ACM, 1–14. DOI: 10.1145/3719027.3765147.
This repository processes 12 data exports obtained under "right of access" requests (GDPR/CCPA) from 6 major online platforms: Apple, Discord, Facebook, Google, Instagram, and Snapchat. The data was collected to investigate and identify interpersonal account compromise patterns using real-world data exports.
The processed dataset is available on Zenodo with controlled access due to the sensitive nature of Tech-Facilitated Abuse (TFA) research. Access is restricted to prevent misuse.
To access the data:
- Request access via the Zenodo repository
- Download the dataset files: `sam_january_cleaned.zip` and `sam_february_cleaned.zip`
- Extract both ZIP files recursively in your working directory before running the `transform.ipynb` notebook.
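Because the exports nest ZIP archives inside ZIP archives, "extract recursively" means unpacking every inner archive as well. A minimal helper along these lines (not part of this repository) sketches the idea:

```python
import zipfile
from pathlib import Path

def extract_recursively(zip_path: Path, dest: Path) -> None:
    """Extract zip_path into dest, then extract any ZIPs found inside it."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Snapshot the inner ZIPs before recursing, so newly extracted
    # files do not perturb the iteration.
    for inner in list(dest.rglob("*.zip")):
        inner_dest = inner.with_suffix("")  # e.g. foo.zip -> foo/
        if not inner_dest.exists():
            extract_recursively(inner, inner_dest)

# Example:
# extract_recursively(Path("sam_january_cleaned.zip"), Path("sam_january_cleaned"))
```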
- Python >=3.12
- Git
- Clone this repository:

  ```
  git clone https://github.com/WISPR-lab/data-exports-tfa.git
  cd data-exports-tfa
  ```

- Create and activate a virtual environment:

  ```
  python -m venv .venv313
  source .venv313/bin/activate  # on Windows: .venv313\Scripts\activate
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```
- Prepare your data: Ensure the dataset ZIP files are extracted in your working directory
- Open the analysis notebook: Launch `transform.ipynb` in Jupyter or VS Code
- Configure paths: Follow the instructions in the notebook to set your data input and output paths
- Run the analysis: Execute the notebook cells to reproduce the data transformation and analysis
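If you prefer a headless run, the notebook can also be executed end-to-end from the command line with nbconvert (assuming Jupyter is available in your environment; the output filename here is illustrative):

```shell
jupyter nbconvert --to notebook --execute transform.ipynb --output transform_executed.ipynb
```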
The main analysis is contained in `transform.ipynb`, which:
- Parses raw data exports from multiple platforms
- Transforms nested JSON/CSV structures into flattened DataFrames
- Groups similar data elements across platforms according to characteristic features (see Section 3 of the paper)
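The flattening step can be pictured with pandas' `json_normalize`, which turns nested dictionaries into dotted column names. A minimal sketch (the record shape and field names are illustrative, not the actual export schema):

```python
import pandas as pd

# Illustrative nested records, loosely modeled on platform export entries;
# the field names are made up for this sketch.
records = [
    {"event": "login",
     "timestamp": "2024-01-05T10:00:00Z",
     "device": {"type": "mobile", "os": "iOS"}},
    {"event": "login",
     "timestamp": "2024-01-06T11:30:00Z",
     "device": {"type": "desktop", "os": "Windows"}},
]

# json_normalize flattens the nested "device" dict into
# "device.type" and "device.os" columns.
df = pd.json_normalize(records)
print(sorted(df.columns))  # → ['device.os', 'device.type', 'event', 'timestamp']
```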
The `scripts/` directory contains modular utilities:
- `parse.py`: Core parsing and data transformation logic
- `group_utils.py`: Functions for clustering and grouping similar data elements
- `timeutils.py`: Time-related parsing utilities
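Platform exports tend to record timestamps in mixed conventions (epoch seconds, epoch milliseconds, ISO 8601 strings). A normalizer along these lines illustrates the idea; the function name and heuristics are hypothetical, not the actual `timeutils.py` API:

```python
from datetime import datetime, timezone

def to_utc(value):
    """Normalize a timestamp (epoch secs/ms or ISO 8601 string) to aware UTC."""
    if isinstance(value, (int, float)):
        # Heuristic: epoch-second values this large would be far in the
        # future, so treat anything above 1e11 as milliseconds.
        if value > 1e11:
            value /= 1000.0
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # ISO 8601 string; map a trailing "Z" to an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```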
For extended documentation of the parser utilities and data structures, see `appendix.md`.
- Code: Released under the MIT License
- Paper: Distributed under a Creative Commons NonCommercial-ShareAlike license
- Dataset: Available under a CC International license with controlled access
This research was conducted at the University of Wisconsin–Madison.