Pre-trained HF Aggregation: Integration testing #9

@jlunder00

Description

Task 7: Integration Testing

Depends on: All previous issues (#3-#8)

Smoke Tests

Run each condition for a few batches (the commands below use 3) to verify the full pipeline works end-to-end.

Test 1: Condition B embedding (simplest, no GNN)

./run_pipeline.sh pretrained_noprop_embedding snli "" --primary-batches 3 --finetune-batches 3

Expected: primary (3 batches) → finetune (3 batches) → eval completes.

Test 2: Condition D embedding (full pipeline)

./run_pipeline.sh pretrained_tree_embedding snli "" --contrastive-batches 3 --primary-batches 3 --finetune-batches 3

Expected: contrastive → primary → finetune → eval completes.

Test 3: Condition E embedding (frozen transformer)

./run_pipeline.sh pretrained_tree_frozen_xfmr_embedding snli "" --contrastive-batches 3 --primary-batches 3 --finetune-batches 3

Expected: Same as D. Verify in logs that "Froze pretrained transformer" appears.
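For reference, the "Froze pretrained transformer" behavior is typically implemented by disabling gradients on the encoder. This is a minimal sketch, not the project's actual code; the function name and the exact log line placement are assumptions:

```python
import torch

def freeze_encoder(encoder: torch.nn.Module) -> int:
    """Disable gradients on every encoder parameter; return how many were frozen."""
    count = 0
    for param in encoder.parameters():
        param.requires_grad_(False)
        count += 1
    print("Froze pretrained transformer")
    return count
```

If the log line is missing, grep train_unified.py for where the condition-E config is applied.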

Test 4: Condition F embedding (frozen GNN)

./run_pipeline.sh pretrained_tree_frozen_gnn_embedding snli contrastive \
  --contrastive-checkpoint /home/jlunder/temp_temp_storage/infonce_wikiqs_20260201_234850/checkpoints/best_model.pt \
  --primary-batches 3 --finetune-batches 3

Expected: Logs show missing/unexpected keys (architecture mismatch), then primary → finetune → eval completes.
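The missing/unexpected keys come from non-strict checkpoint loading, which reports mismatches instead of raising. A sketch of the mechanism (helper name is illustrative, not the project's API):

```python
import torch

def load_partial(model: torch.nn.Module, ckpt_path: str):
    """Load a checkpoint non-strictly and report the mismatched keys."""
    state = torch.load(ckpt_path, map_location="cpu")
    # strict=False returns the mismatches instead of raising on them
    result = model.load_state_dict(state, strict=False)
    print(f"Missing keys: {result.missing_keys}")
    print(f"Unexpected keys: {result.unexpected_keys}")
    return result.missing_keys, result.unexpected_keys
```

Both lists being non-empty in the Test 4 logs is expected here, since the contrastive checkpoint was trained under a different architecture.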

Test 5: Condition A embedding (text mode)

./run_pipeline.sh pretrained_text_embedding snli "" --primary-batches 3 --finetune-batches 3

Expected: Loads HF tokenizer, runs primary → finetune → eval on text data.

Test 6: One matching variant

./run_pipeline.sh pretrained_tree_matching snli "" --contrastive-batches 3 --primary-batches 3 --finetune-batches 3

Expected: Same as D but with matching paradigm.
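The six smoke tests above can be chained with a small harness that stops collecting failures rather than aborting on the first one. A sketch (the harness is an assumption, not part of the repo; fill in the run_pipeline.sh argument lists from the tests above):

```python
import subprocess

def run_smoke_tests(commands: list[list[str]]) -> list[str]:
    """Run each command in sequence; return the ones that exited non-zero."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures.append(" ".join(cmd))
    return failures
```

An empty return value means all conditions completed end-to-end.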

Freeze Verification

After smoke tests, add a quick check that frozen params actually stay frozen:

# Add temporarily to train_unified.py after model creation for Condition E:
if hasattr(model, '_aggregator'):
    agg = model._aggregator
elif hasattr(model, 'gmn'):
    agg = model.gmn._aggregator
else:
    raise AttributeError("model has no _aggregator (Condition E expects one)")
for name, param in agg.named_parameters():
    if name.startswith('encoder.') and param.requires_grad:
        print(f"ERROR: {name} should be frozen!")
        break
else:
    # for/else: this branch runs only if the loop finished without a break
    print("Freeze verification: PASS")

VRAM Check

Monitor GPU memory during Condition D smoke test to confirm the model fits:

watch -n 1 nvidia-smi

Expected: <20GB peak for all-MiniLM-L6-v2 (22M params) + prop_heavy GNN (~14.6M) at batch_size=256.
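If you'd rather have the peak in the training logs than eyeball nvidia-smi, PyTorch tracks peak allocated memory per process. A sketch you could drop into train_unified.py after each stage (helper name is an assumption):

```python
import torch

def log_peak_vram(tag: str) -> float:
    """Print and return peak allocated GPU memory in GiB for this process."""
    if not torch.cuda.is_available():
        return 0.0
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] peak VRAM: {peak_gib:.2f} GiB")
    return peak_gib
```

Note this reports only this process's allocator peak, not total device usage, so it will read lower than nvidia-smi (which includes CUDA context overhead and other processes).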

Important Note

Do NOT run integration tests while other training pipelines are using the GPU. Wait for current runs to complete, or use --primary-batch-size 32 to reduce VRAM.

Also: if you've made code changes, you must pip install -e . from the branch before testing. This will affect any currently running pipeline stages, so only do it when no training is in progress.
