Skip to content

Latest commit

 

History

History
228 lines (170 loc) · 9.01 KB

File metadata and controls

228 lines (170 loc) · 9.01 KB

ML Classification Feature Assessment

📋 Current Requirements

  1. Mutually Exclusive Options:

    • Cloudiness filter (default) → categorizes pairs by clear/partly cloudy/mostly cloudy
    • ML classification → categorizes pairs by good/not_good
  2. When ML is Selected:

    • Graph in Section 6 should categorize by good/not_good (not cloudiness)
    • User can filter by including/excluding good pairs
    • User can filter by including/excluding not_good pairs
  3. Current Issue: ML option doesn't make a difference - graph still driven by cloudiness filter


🔍 Current Implementation Analysis

What Works

  1. Frontend Mutual Exclusivity (lines 2632-2675 in HTML):

    • ✅ When cloudiness is checked → ML is automatically unchecked
    • ✅ When ML is checked → cloudiness is automatically unchecked
    • ✅ UI correctly shows/hides appropriate parameters
  2. ML Classification Logic (lines 3294-3367 in Python):

    • ✅ ML classification runs when enable_ml_classification is True
    • ✅ Classifies individual images, then determines pair classification
    • ✅ Filter logic exists for include_good and include_not_good
    • ✅ ML classifications are computed on-the-fly during filtering
  3. Legend Updates (lines 3867-3899 in HTML):

    • ✅ Legend correctly shows "Good/Not_Good" when ML is enabled
    • ✅ Legend shows "Clear/Partly Cloudy/Mostly Cloudy" when cloudiness is enabled

What's Broken

Gap 1: Plot Colors Fallback to Cloudiness

Location: iss_speed_html_dashboard_v2_clean.py lines 2757-2796

Problem:

if ml_enabled and ml_class is not None:
    # Use ML classification colors
    ...
else:
    # FALLS BACK TO CLOUDINESS even if ML is enabled!
    if sample_match.get('image1_properties') and sample_match.get('image2_properties'):
        # Uses cloudiness classification

Issue: When ML is enabled but ml_classification is None, the code falls back to cloudiness instead of:

  • Computing ML classification if not done
  • Or showing an error/placeholder
  • Or using a different color scheme

Root Cause: ML classifications are computed during filtering (apply_match_filters), but they might not be present in all matches, or they might be computed but then the data is re-processed.

Gap 2: ML Classifications Not Persisted

Location: iss_speed_html_dashboard_v2_clean.py lines 3294-3367

Problem:

  • ML classifications are computed on-the-fly in apply_match_filters
  • They're added to match objects temporarily: m['ml_classification'] = ml_class1
  • But when filters change or data is refreshed, ML classifications are lost
  • ML classifications are NOT stored back to processed_matches

Impact: Every time filters are refreshed, ML classifications need to be recomputed, and if they're not computed yet when plot data is requested, the plot falls back to cloudiness.

Gap 3: No Backend Mutual Exclusivity Enforcement

Location: iss_speed_html_dashboard_v2_clean.py lines 3164-3374 (apply_match_filters)

Problem:

  • Both cloudiness filter (lines ~3242-3248) and ML filter (lines 3294-3367) can be applied
  • Code doesn't check if both are enabled and prevent it
  • Frontend enforces mutual exclusivity, but backend doesn't validate it

Gap 4: Missing ML Classification During Initial Processing

Location: iss_speed_html_dashboard_v2_clean.py lines 2260-2267

Problem:

# ML classification will be applied later when the filter is enabled
# This avoids the performance hit during initial data loading
match['ml_classification'] = None
match['ml_confidence'] = 0.0

Issue: ML classification is never computed during initial processing, only when filters are applied. This means:

  • If user enables ML filter and immediately requests plot data, ML classifications don't exist yet
  • Plot data generation happens before ML classifications are computed

Gap 5: Plot Data Uses Cloudiness Even When ML Enabled

Location: iss_speed_html_dashboard_v2_clean.py lines 2764-2772

Problem:

if ml_enabled and ml_class is not None:
    # Use ML classification colors
    ...
else:
    # Falls back to cloudiness - WRONG!

Expected Behavior: If ML is enabled, it should ALWAYS use ML, never cloudiness. Should either:

  • Compute ML if not present
  • Use only ML classifications (never cloudiness)
  • Error if ML enabled but no ML data

🎯 Required Fixes

Fix 1: Enforce Backend Mutual Exclusivity

  • In apply_match_filters, check if both enable_ml_classification and enable_cloudiness are True
  • If both are True, prioritize ML (or return error)

Fix 2: Fix Plot Color Logic

  • When ML is enabled, NEVER use cloudiness for colors
  • If ML classification is missing but ML is enabled, compute it on-the-fly
  • Or use a placeholder color scheme indicating ML data is loading

Fix 3: Compute ML Early

  • When ML filter is enabled, compute ML classifications BEFORE plot data is generated
  • Store ML classifications in filtered matches
  • Ensure ML classifications are available when plot colors are determined

Fix 4: Persist ML Classifications

  • Store computed ML classifications back to match objects
  • Cache ML results to avoid recomputation
  • Ensure ML classifications survive filter refreshes

Fix 5: Ensure ML Filter Options Work

  • Verify include_good and include_not_good filter options are properly applied
  • Test that filtering by ML classification actually filters the data
  • Ensure filtered data is used in plot generation

📊 Current Flow vs Required Flow

Current (Broken) Flow:

1. User enables ML filter → Frontend unchecks cloudiness ✅
2. User clicks "Refresh Filters" → Backend receives enable_ml_classification=True
3. apply_match_filters runs → Computes ML classifications ✅
4. Filtered matches returned with ML classifications ✅
5. User requests plot data → /api/plot-data
6. Plot data generation → Checks ml_classification field
7. ❌ PROBLEM: ml_classification is None (or missing) for some matches
8. Falls back to cloudiness → ❌ Shows wrong colors

Required (Fixed) Flow:

1. User enables ML filter → Frontend unchecks cloudiness ✅
2. User clicks "Refresh Filters" → Backend receives enable_ml_classification=True
3. apply_match_filters runs → Computes ML classifications ✅
4. Filtered matches returned with ML classifications ✅
5. User requests plot data → /api/plot-data
6. Plot data generation → Checks ml_classification field
7. ✅ If ML enabled: Use ONLY ML classifications (never cloudiness)
8. ✅ If ML missing: Compute it, or use placeholder
9. ✅ Colors based on good/not_good only

🔧 Implementation Plan

  1. Fix mutual exclusivity in apply_match_filters
  2. Fix plot color logic in /api/plot-data endpoint
  3. Ensure ML classifications are computed before plot generation
  4. Test ML filtering (include_good, include_not_good)
  5. Verify graph shows good/not_good when ML is enabled

Acceptance Criteria

When ML classification is enabled:

  • Graph in Section 6 shows pairs colored by good (green) / not_good (red)
  • No cloudiness colors appear when ML is enabled
  • Filtering by include_good/include_not_good actually filters the data (already implemented)
  • Graph updates correctly when ML filter options change
  • Cloudiness filter is completely disabled when ML is enabled
  • ML and cloudiness are truly mutually exclusive

🔧 Fixes Applied

Fix 1: Enforce Backend Mutual Exclusivity

Location: iss_speed_html_dashboard_v2_clean.py lines 3177-3186

Change: Added check at the start of apply_match_filters to ensure ML and cloudiness cannot both be enabled. If both are enabled, ML takes priority and cloudiness is disabled.

# Enforce mutual exclusivity: ML classification and cloudiness cannot both be enabled
ml_enabled = filters.get('enable_ml_classification', False)
cloudiness_enabled = filters.get('enable_cloudiness', False)

if ml_enabled and cloudiness_enabled:
    # ML takes priority - disable cloudiness if both are enabled
    print("⚠️ Both ML classification and cloudiness filters are enabled. ML takes priority (cloudiness disabled).")
    filters = filters.copy()  # Don't modify the original filters dict
    filters['enable_cloudiness'] = False
    cloudiness_enabled = False

Fix 2: Fix Plot Color Logic

Location: iss_speed_html_dashboard_v2_clean.py lines 2757-2807

Change: Completely rewrote the plot color logic to enforce mutual exclusivity:

  • When ML is enabled → ONLY uses ML classifications, NEVER cloudiness
  • When cloudiness is enabled → uses cloudiness classifications
  • If ML enabled but classification missing → uses gray placeholder (not cloudiness)
  • ML and cloudiness are now truly mutually exclusive in the visualization

Before: Would fall back to cloudiness if ML classification was missing After: Uses ML-only when enabled, cloudiness-only when enabled, never both