"Hidden in Plain Bytes" Source Code

This repository contains code and documentation for processing data exports from major online platforms, used in the research paper:

Julia Nonnenkamp, Naman Gupta, Abhimanyu Dev Gupta, and Rahul Chatterjee. 2025. Hidden in Plain Bytes: Investigating Interpersonal Account Compromise with Data Exports. In Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25). ACM, Taipei, Taiwan, 1–14 (October 13–17, 2025). DOI: 10.1145/3719027.3765147.

Overview

The code in this repository processes 12 data exports obtained under "right of access" requests (GDPR/CCPA) from 6 major online platforms: Apple, Discord, Facebook, Google, Instagram, and Snapchat. The exports were collected to investigate patterns of interpersonal account compromise in real-world data.

Dataset Access

The processed dataset is available on Zenodo with controlled access due to the sensitive nature of technology-facilitated abuse (TFA) research; access is restricted to prevent misuse.

To access the data:

  1. Request access via the Zenodo repository
  2. Download the dataset files: sam_january_cleaned.zip and sam_february_cleaned.zip
  3. Extract both ZIP files recursively in your working directory before running the transform.ipynb notebook (a minimal extraction sketch follows this list).
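
A minimal sketch of the recursive extraction step, assuming the two archives named above sit in the current working directory (the extraction layout here is illustrative, not prescribed by the notebook):

    # Extract each archive, then extract any ZIP files found inside it.
    # Assumes the archives listed above are in the current working directory.
    import zipfile
    from pathlib import Path

    def extract_recursively(archive: Path, dest: Path) -> None:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        # Snapshot the nested ZIPs before recursing so nothing is extracted twice.
        for nested in list(dest.rglob("*.zip")):
            extract_recursively(nested, nested.with_suffix(""))

    for name in ["sam_january_cleaned.zip", "sam_february_cleaned.zip"]:
        extract_recursively(Path(name), Path(name).with_suffix(""))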

Environment Setup

Prerequisites

  • Python >=3.12
  • Git

Installation

  1. Clone this repository:

    git clone https://github.com/WISPR-lab/data-exports-tfa.git
    cd data-exports-tfa
  2. Create and activate a virtual environment:

    python -m venv .venv313
    source .venv313/bin/activate  # on Windows: .venv313\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt

Usage

Quick Start

  1. Prepare your data: Ensure the dataset ZIP files are extracted in your working directory
  2. Open the analysis notebook: Launch transform.ipynb in Jupyter or VS Code
  3. Configure paths: Follow the instructions in the notebook to set your data input and output paths (a hypothetical example follows this list)
  4. Run the analysis: Execute the notebook cells to reproduce the data transformation and analysis
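
The exact configuration cells are in the notebook; as a hypothetical illustration (the variable names below are placeholders, not the notebook's actual identifiers), the paths you set typically look like this:

    # Hypothetical path configuration (placeholder names, not the notebook's
    # actual variables): point the input at the extracted export folders and
    # choose where transformed outputs should be written.
    from pathlib import Path

    DATA_DIR = Path("sam_january_cleaned")   # extracted dataset directory
    OUTPUT_DIR = Path("output")              # destination for flattened outputs
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)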

Notebook Structure

The main analysis is contained in transform.ipynb, which:

  • Parses raw data exports from multiple platforms
  • Transforms nested JSON/CSV structures into flattened DataFrames (a rough illustration follows this list)
  • Groups similar data elements across platforms according to characteristic features (see Section 3 of the paper)
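
As a rough illustration of the flattening step (the record structure and field names below are invented, not taken from the dataset or the notebook), nested export records can be turned into flat tables with pandas:

    # Flatten a nested export record into a DataFrame. The structure and
    # field names here are invented for illustration only.
    import json
    import pandas as pd

    raw = json.loads("""
    {"logins": [
      {"timestamp": "2024-01-05T10:32:00Z", "device": {"os": "iOS", "model": "iPhone"}},
      {"timestamp": "2024-01-06T22:11:00Z", "device": {"os": "Windows", "model": "PC"}}
    ]}
    """)

    # json_normalize expands the nested "device" object into dotted columns,
    # e.g. "device.os" and "device.model".
    df = pd.json_normalize(raw["logins"])
    print(df.columns.tolist())  # ['timestamp', 'device.os', 'device.model']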

Supporting Scripts

The scripts/ directory contains modular utilities:

  • parse.py: Core parsing and data transformation logic
  • group_utils.py: Functions for clustering and grouping similar data elements
  • timeutils.py: Time-related parsing utilities (a hypothetical usage sketch follows this list)
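
As a hypothetical illustration of the kind of normalization a time utility performs (the helper below is not the repository's actual API), platform exports mix epoch values and ISO-8601 strings that need to be brought onto one timezone-aware timeline:

    # Hypothetical timestamp normalization helper (not the actual timeutils.py API).
    # Platform exports mix epoch seconds/milliseconds and ISO-8601 strings.
    from datetime import datetime, timezone

    def to_utc(value) -> datetime:
        if isinstance(value, (int, float)):
            if value > 1e12:              # treat very large values as milliseconds
                value = value / 1000
            return datetime.fromtimestamp(value, tz=timezone.utc)
        return datetime.fromisoformat(str(value).replace("Z", "+00:00")).astimezone(timezone.utc)

    print(to_utc(1704451920000))            # epoch milliseconds
    print(to_utc("2024-01-05T10:32:00Z"))   # ISO-8601 with a Zulu suffix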

For extended documentation of the parser utilities and data structures, see appendix.md.

License

  • Code: Released under the MIT License
  • Paper: Protected under a Creative Commons Non-Commercial ShareAlike license
  • Dataset: Available under a Creative Commons International license with controlled access

Acknowledgments

This research was conducted at the University of Wisconsin–Madison.
