Task 7: Integration Testing
Depends on: All previous issues (#3-#8)
Smoke Tests
Run each condition for 2-3 batches to verify the full pipeline works end-to-end.
Test 1: Condition B embedding (simplest, no GNN)
./run_pipeline.sh pretrained_noprop_embedding snli "" --primary-batches 3 --finetune-batches 3
Expected: primary (3 batches) → finetune (3 batches) → eval completes.
Test 2: Condition D embedding (full pipeline)
./run_pipeline.sh pretrained_tree_embedding snli "" --contrastive-batches 3 --primary-batches 3 --finetune-batches 3
Expected: contrastive → primary → finetune → eval completes.
Test 3: Condition E embedding (frozen transformer)
./run_pipeline.sh pretrained_tree_frozen_xfmr_embedding snli "" --contrastive-batches 3 --primary-batches 3 --finetune-batches 3
Expected: Same as D. Verify in logs that "Froze pretrained transformer" appears.
Test 4: Condition F embedding (frozen GNN)
./run_pipeline.sh pretrained_tree_frozen_gnn_embedding snli contrastive \
--contrastive-checkpoint /home/jlunder/temp_temp_storage/infonce_wikiqs_20260201_234850/checkpoints/best_model.pt \
--primary-batches 3 --finetune-batches 3
Expected: Logs show missing/unexpected keys (architecture mismatch), then primary → finetune → eval completes.
Test 5: Condition A embedding (text mode)
./run_pipeline.sh pretrained_text_embedding snli "" --primary-batches 3 --finetune-batches 3
Expected: Loads HF tokenizer, runs primary → finetune → eval on text data.
Test 6: One matching variant
./run_pipeline.sh pretrained_tree_matching snli "" --contrastive-batches 3 --primary-batches 3 --finetune-batches 3
Expected: Same as D but with matching paradigm.
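The six smoke tests above can be driven by one small script that stops at the first failure. A minimal sketch, assuming `./run_pipeline.sh` is invoked from the repo root with the arguments exactly as listed above (`run_all` and `SMOKE_TESTS` are hypothetical names, not part of the pipeline):

```python
import subprocess

# Arguments copied from Tests 1-6 above. Test 4's --contrastive-checkpoint
# path must exist locally or that entry will fail at load time.
SMOKE_TESTS = [
    ("B embedding", ["pretrained_noprop_embedding", "snli", "",
                     "--primary-batches", "3", "--finetune-batches", "3"]),
    ("D embedding", ["pretrained_tree_embedding", "snli", "",
                     "--contrastive-batches", "3",
                     "--primary-batches", "3", "--finetune-batches", "3"]),
    ("E embedding", ["pretrained_tree_frozen_xfmr_embedding", "snli", "",
                     "--contrastive-batches", "3",
                     "--primary-batches", "3", "--finetune-batches", "3"]),
    ("F embedding", ["pretrained_tree_frozen_gnn_embedding", "snli", "contrastive",
                     "--contrastive-checkpoint",
                     "/home/jlunder/temp_temp_storage/infonce_wikiqs_20260201_234850/checkpoints/best_model.pt",
                     "--primary-batches", "3", "--finetune-batches", "3"]),
    ("A embedding", ["pretrained_text_embedding", "snli", "",
                     "--primary-batches", "3", "--finetune-batches", "3"]),
    ("tree matching", ["pretrained_tree_matching", "snli", "",
                       "--contrastive-batches", "3",
                       "--primary-batches", "3", "--finetune-batches", "3"]),
]

def run_all(tests, runner=None):
    """Run each smoke test in order; return the label of the first
    failure, or None if everything passed."""
    if runner is None:
        runner = lambda args: subprocess.run(["./run_pipeline.sh", *args]).returncode
    for label, args in tests:
        print(f"=== Smoke test: {label} ===")
        if runner(args) != 0:
            print(f"FAILED: {label}")
            return label
    print("All smoke tests passed")
    return None
```

Injecting `runner` keeps the sequencing logic testable without a GPU; in real use, call `run_all(SMOKE_TESTS)` with no runner argument.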
Freeze Verification
After smoke tests, add a quick check that frozen params actually stay frozen:
# Add temporarily to train_unified.py after model creation for Condition E:
if hasattr(model, '_aggregator'):
    agg = model._aggregator
elif hasattr(model, 'gmn'):
    agg = model.gmn._aggregator
else:
    raise AttributeError("Model has no aggregator to check")
for name, param in agg.named_parameters():
    if name.startswith('encoder.') and param.requires_grad:
        print(f"ERROR: {name} should be frozen!")
        break
else:  # for/else: runs only when no frozen param was found trainable
    print("Freeze verification: PASS")
VRAM Check
Monitor GPU memory during the Condition D smoke test to confirm the model fits:
Expected: <20GB peak for all-MiniLM-L6-v2 (22M params) + prop_heavy GNN (~14.6M) at batch_size=256.
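One way to capture the peak is to sample `nvidia-smi` in a loop while the Condition D smoke test runs. A sketch, assuming `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` is available and GPU 0 is the training device (the 20 GB threshold mirrors the expectation above; function names here are illustrative):

```python
import csv
import io
import subprocess
import time

def parse_used_mib(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output: one integer (MiB) per line, one line per GPU."""
    return [int(row[0]) for row in csv.reader(io.StringIO(csv_text)) if row]

def monitor_peak(duration_s=300, interval_s=1.0):
    """Sample GPU 0 memory use for duration_s seconds; report the peak."""
    peak = 0
    deadline = time.time() + duration_s
    while time.time() < deadline:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True).stdout
        peak = max(peak, parse_used_mib(out)[0])
        time.sleep(interval_s)
    print(f"Peak VRAM: {peak} MiB ({'OK' if peak < 20 * 1024 else 'OVER 20GB'})")
    return peak
```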
Important Note
Do NOT run integration tests while other training pipelines are using the GPU. Wait for current runs to complete, or use --primary-batch-size 32 to reduce VRAM.
Also: if you've made code changes, you must run pip install -e . from the branch before testing. This will affect any currently running pipeline stages — only do this when no training is in progress.
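A quick guard before kicking off the tests, assuming `nvidia-smi --query-compute-apps=pid,process_name` is available (`gpu_busy` and `assert_gpu_idle` are illustrative names):

```python
import subprocess

def gpu_busy(compute_apps_csv):
    """True if `nvidia-smi --query-compute-apps=pid,process_name
    --format=csv,noheader` reported any running compute process
    (its output is empty when the GPU is idle)."""
    return any(line.strip() for line in compute_apps_csv.splitlines())

def assert_gpu_idle():
    """Raise if any compute process is currently on the GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True).stdout
    if gpu_busy(out):
        raise RuntimeError("GPU busy -- wait for current training runs to finish")
```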