This repository was archived by the owner on Dec 22, 2025. It is now read-only.

⚠️ This repository is archived and maintained as a reference implementation.

Evaluate-SCAIL

High-Performance Encrypted Deduplication with Segment Chunks & Index Locality

Evaluate-SCAIL is the full research and engineering implementation of SCAIL and P-SCAIL, two high-throughput encrypted deduplication systems designed to achieve petabyte-scale metadata processing on commodity hardware.
This repository provides the client, server, storage engine, sorted-index implementation, multiprocessing pipeline, and evaluation tools used to produce the published results.

SCAIL and P-SCAIL are detailed in the accompanying research:

  • P-SCAIL – metadata scalability and parallel SCI architecture
  • SCAIL NAS 2022 conference paper – segment-level client interface and metadata reduction
  • PhD Thesis – complete design, algorithms, and evaluation results

Overview

Evaluate-SCAIL provides an end-to-end encrypted deduplication pipeline:

  • Run and compare deduplication schemes: Base, Metadedup, SCAIL, P-SCAIL
  • Load and process real-world datasets (FSL, MS/UBC), Docker layers, VM images, and synthetic traces
  • Execute the full workflow: chunking → encryption → segment formation → client lookup → server-side deduplication → SCI update
  • Benchmark performance, memory, upload volume, and metadata overhead
  • Visualise disk I/O, throughput, and metadata trends
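
The client-side half of that workflow (chunking → encryption → segment formation) can be illustrated with a minimal, standard-library sketch. Everything here is an assumption made for illustration and is not taken from the repository: the Gear-style rolling hash, the chunk-size parameters, the fixed-size segment grouping, and the function names. The "encryption" is a toy deterministic keystream that only demonstrates the message-locked property (identical plaintext gives identical ciphertext); it is not secure, and a real pipeline would use a proper cipher.

```python
import hashlib
import random

# Hypothetical parameters for illustration; not taken from the repository.
MIN_CHUNK = 2 * 1024
AVG_MASK = (1 << 13) - 1          # boundary test passes about once per 8 KiB
MAX_CHUNK = 64 * 1024
SEGMENT_BYTES = 2 * 1024 * 1024   # 2 MiB segments, the size quoted in Key Results

# Fixed pseudo-random table for a Gear-style rolling hash.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes):
    """Yield content-defined chunks: cut where the rolling hash hits the mask."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        if (h & AVG_MASK) == 0 or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


def mle_encrypt(chunk: bytes) -> bytes:
    """Toy message-locked encryption: the key is derived from the chunk itself.

    NOT secure. It only shows that equal plaintexts map to equal ciphertexts.
    """
    key = hashlib.sha256(chunk).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(chunk):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(chunk, stream))


def form_segments(enc_chunks, segment_bytes=SEGMENT_BYTES):
    """Group consecutive encrypted chunks until each group reaches segment_bytes."""
    segments, current, size = [], [], 0
    for chunk in enc_chunks:
        current.append(chunk)
        size += len(chunk)
        if size >= segment_bytes:
            segments.append(current)
            current, size = [], 0
    if current:
        segments.append(current)
    return segments


def segment_fingerprint(segment) -> str:
    """Hash the chunk fingerprints of a segment into one segment-level fingerprint."""
    h = hashlib.sha256()
    for chunk in segment:
        h.update(hashlib.sha256(chunk).digest())
    return h.hexdigest()
```

In this sketch a client would run `form_segments(mle_encrypt(c) for c in cdc_chunks(data))` and send only the `segment_fingerprint` values in its first query. Because both chunk boundaries and ciphertexts depend only on content, two clients holding the same data produce the same segment fingerprints.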

Key Results

  • Memory reduction of up to 94 percent when using 2 MiB segments.
  • Approximately 57 GiB of DRAM required for 1 PB of unique deduplicated data using 2 MiB segments.
  • Parallel throughput (16 processes):
    • LDLS: 273–434 GiB/s
    • SMPS: 6.9–10.0 GiB/s
  • Metadata savings up to 97 percent across long-running workloads.
  • Redundant segment uploads typically under 1 percent per backup for long-running workloads.
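
As a back-of-the-envelope reading of the DRAM figure above, the following sketch converts it into an implied in-memory cost per 2 MiB segment. This is derived arithmetic, not a number reported by the project, and it assumes that the 57 GiB is spread evenly over one entry per segment. Since the text writes "PB", both the binary (2^50 bytes) and decimal (10^15 bytes) readings are computed.

```python
GIB = 2**30
MIB = 2**20


def bytes_per_segment(unique_bytes: int,
                      dram_bytes: int = 57 * GIB,
                      segment_bytes: int = 2 * MIB) -> float:
    """Implied DRAM bytes per segment, assuming one index entry per segment."""
    segments = unique_bytes / segment_bytes
    return dram_bytes / segments


binary_reading = bytes_per_segment(2**50)    # 1 PB read as 2^50 bytes
decimal_reading = bytes_per_segment(10**15)  # 1 PB read as 10^15 bytes

print(f"binary reading:  {binary_reading:.1f} bytes per segment")
print(f"decimal reading: {decimal_reading:.1f} bytes per segment")
```

Under the binary reading there are 2^29 segments, so the quoted figure works out to 114 bytes per segment; under the decimal reading it is roughly 128 bytes per segment.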

System Architecture

High-Level Pipeline

flowchart TD
    subgraph CLIENT1[" CLIENT - Phase 1: Prepare & Query "]
        A[Client Files] --> B[CDC Chunking]
        B --> C[MLE Encrypt Chunks]
        C --> D[Segment Formation]
        D --> E[Generate MFP Query]
    end

    subgraph SERVER1[" SERVER - Phase 1: Lookup "]
        F[Metachunk Lookup]
        G[Return Missing Segments List]
    end

    subgraph CLIENT2[" CLIENT - Phase 2: Upload "]
        H[Upload Missing Segments:<br/>Encrypted Chunks + PEMCs]
    end

    subgraph SERVER2[" SERVER - Phase 2: Deduplication & Storage "]
        I[SCI Pass: Sorted Chunk Index]
        I --> J[Container Allocation]
        J --> K[Index Updates]
    end

    E --> F
    F --> G
    G --> H
    H --> I

    style CLIENT1 fill:#2d3748,stroke:#4299e1,stroke-width:2px
    style SERVER1 fill:#2d3748,stroke:#48bb78,stroke-width:2px
    style CLIENT2 fill:#2d3748,stroke:#4299e1,stroke-width:2px
    style SERVER2 fill:#2d3748,stroke:#48bb78,stroke-width:2px
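
The two-phase exchange in the diagram can be mimicked with a toy in-memory server. This is a sketch under stated assumptions, not the repository's server: the class and method names are hypothetical, segment and chunk fingerprints are plain strings, the "sorted chunk index" is a sorted Python list probed with `bisect`, and a container is just a list of new chunks.

```python
from bisect import bisect_left
from heapq import merge


class ToyServer:
    """In-memory stand-in for the server side; all names are hypothetical."""

    def __init__(self):
        self.segment_index = set()  # segment fingerprints already seen
        self.chunk_index = []       # sorted list of stored chunk fingerprints
        self.containers = []        # each container: list of (fingerprint, bytes)

    def lookup(self, segment_fps):
        """Phase 1: return the segment fingerprints the server does not know."""
        return [fp for fp in segment_fps if fp not in self.segment_index]

    def upload(self, segments):
        """Phase 2: take {segment_fp: [(chunk_fp, chunk_bytes), ...]} and
        store only chunks that are not already in the sorted index."""
        incoming = {}
        for seg_fp, chunks in segments.items():
            self.segment_index.add(seg_fp)
            for fp, blob in chunks:
                incoming.setdefault(fp, blob)

        # Probe the sorted index in fingerprint order so accesses move
        # through the index in one direction.
        new_fps = []
        for fp in sorted(incoming):
            pos = bisect_left(self.chunk_index, fp)
            if pos == len(self.chunk_index) or self.chunk_index[pos] != fp:
                new_fps.append(fp)

        if new_fps:
            # Container allocation, then index update by merging two sorted runs.
            self.containers.append([(fp, incoming[fp]) for fp in new_fps])
            self.chunk_index = list(merge(self.chunk_index, new_fps))
        return len(new_fps)


def backup(server, segments):
    """Client side: query first, then upload only the missing segments."""
    missing = server.lookup(list(segments))
    stored = server.upload({fp: segments[fp] for fp in missing})
    return missing, stored
```

In this toy model a segment can be reported missing even though some of its chunks are already stored; the chunk-level pass in `upload` then drops those duplicates, so only genuinely new chunks reach a container.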

Repository Structure

src/
  client/
  server/
  repo/
  config/
  metrics/
  datasets/
  utilities/
papers/

Getting Started

Requirements

  • Python 3.9+
  • Cython
  • Ray
  • Redis (optional)

Installation

git clone https://github.com/ammons-datalabs/Evaluate-SCAIL.git
cd Evaluate-SCAIL

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python src/sync_cython_files.py

Running Experiments

python src/fsl_dedup.py
python src/ubc_dedup.py
python src/docker_dedup.py
python src/gen_dedup.py
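
To run the four experiments back to back, a small driver along these lines may be convenient. It is a hypothetical helper, not part of the repository: it assumes the scripts are launched from the repository root, take no required arguments, and signal failure with a non-zero exit code.

```python
import subprocess
import sys

# Script paths as listed above; the helper functions are hypothetical.
EXPERIMENTS = [
    "src/fsl_dedup.py",
    "src/ubc_dedup.py",
    "src/docker_dedup.py",
    "src/gen_dedup.py",
]


def build_commands(scripts=EXPERIMENTS, python=sys.executable):
    """Return one argv list per experiment script."""
    return [[python, script] for script in scripts]


def run_all(commands):
    """Run commands in order; stop at the first failure and return its exit code."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_all(build_commands())` inside the activated virtual environment runs each script with the same interpreter and stops at the first one that fails.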

Viewing Results

python src/metrics/plot.py
python src/logs_viewer.py
python src/utilities/build_results/build_results_to_latex.py

Testing

python -m unittest discover -s src/tests

Documentation

  • P-SCAIL: Proposed Journal Paper
  • SCAIL: Conference Paper NAS 2022
  • PhD Thesis

Author

Jaybe Ammons
PhD, Computer Science — Birkbeck, University of London

License

See repository for license details.
