Skip to content

feat(mbw-v4): support current_date param for historical follow-up rate computation#199

Open
aliflaming wants to merge 3 commits into
mainfrom
feat/mbw-v4-current-date-param
Open

feat(mbw-v4): support current_date param for historical follow-up rate computation#199
aliflaming wants to merge 3 commits into
mainfrom
feat/mbw-v4-current-date-param

Conversation

@aliflaming
Copy link
Copy Markdown
Collaborator

Product Description

Adds current_date parameter support to the MBW Auditing V4 job handler, enabling the dashboard to compute follow-up rate metrics as of a historical date. This powers the new "baseline at task trigger" feature on Tab 2 (Improvements within Audit): each FLW's follow-up rate is shown with a parenthetical (↑ from X%) / (↓ from X%) / (= X%) indicating change since their task was first triggered.

Technical Summary

One-line change in handle_mbw_auditing_v4_job: when job_config["current_date"] is provided, now is set to that date instead of datetime.now(). This lets the frontend run historical job instances with current_date set to each FLW's task trigger date to get the baseline follow-up rate.

Safety Assurance

  • Backwards compatible: current_date defaults to date.today() when not supplied, identical to previous behaviour
  • No schema changes, no migrations
  • Used only by the Tab 2 baseline job runs in the dashboard render code

🤖 Generated with Claude Code

…e computation

Allows the auditing job to compute metrics as of a past date by overriding
`now` in the follow-up rate calculation. Used by the dashboard to fetch each
FLW's baseline follow-up rate at the time their task was triggered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

🤖 AI Code Review

PR Diff Review

🔵 Suggestions

Date injection pattern

Location: Line 269

The change allows overriding "current date" via job_config["current_date"], which is good for testing. However:

now = date.fromisoformat(job_config["current_date"]) if job_config.get("current_date") else datetime.now(tz=timezone.utc).date()

Suggestions:

  1. Add validation: fromisoformat() will raise ValueError on invalid date strings. Consider wrapping in try-except or validating the format.
  2. Type safety: The config value should be validated as a string before parsing.
  3. Documentation: Add a comment explaining this is for testing/override purposes.

Improved version:

# Allow current_date override for testing
if current_date_str := job_config.get("current_date"):
    try:
        now = date.fromisoformat(current_date_str)
    except (ValueError, TypeError) as e:
        raise ValueError(f"Invalid current_date format: {current_date_str}") from e
else:
    now = datetime.now(tz=timezone.utc).date()

Overall Assessment

Code looks functionally correct for its intended purpose. The change enables date injection for testing time-dependent logic, which is a good practice. The main concern is lack of input validation, but this may be acceptable depending on how job_config is constructed (if it's already validated elsewhere).


Powered by Claude — auto-generated review

- Adds time_start field to pipeline 3050 (form.meta.timeStart)
- Computes minute_per_visit as median(timeEnd - timeStart) per FLW
- Computes travel_time as median gap between end of one visit and
  start of the next for the same FLW (cross-day gaps >8h excluded)
- Both metrics respect task_filters for Tab 2 post-task computation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

🤖 AI Code Review

PR Review: MBW Auditing V4 - Visit Duration & Travel Time Metrics

Summary

This PR adds calculation of visit duration and inter-visit travel time metrics. The implementation is generally solid but has a few issues that need attention.


🔴 Critical Issues

1. Potential KeyError on Missing Fields

Location: Lines 222-223, 235

ts_start = row.get("time_start") or ""
ts_end = row.get("visit_datetime") or ""

Issue: If time_start or visit_datetime fields don't exist in the data source, this will silently create empty strings. However, the append at line 235 happens regardless of parsing success.

Impact: visit_times can accumulate ("", "") tuples that will cause exceptions in the inter-visit gap calculation (lines 247-257) when trying to parse empty strings.

Fix: Only append to visit_times if parsing succeeds:

if 0 < mins < 300:
    visit_durations.setdefault(username, []).append(mins)
    visit_times.setdefault(username, []).append((ts_start, ts_end))

🟡 Warnings

1. Hardcoded Magic Numbers

Location: Lines 229, 254

if 0 < mins < 300:  # 5 hours
if 0 < gap_mins < 480:  # 8 hours

Issue: Magic numbers scattered in code without constants.

Recommendation: Define constants at module level:

MAX_VISIT_DURATION_MINUTES = 300  # 5 hours
MAX_INTER_VISIT_GAP_MINUTES = 480  # 8 hours

2. Assumption About Data Ordering

Location: Lines 234-235, comment

# visits_sorted is chronological so appending here is already ordered per FLW.
visit_times.setdefault(username, []).append((ts_start, ts_end))

Issue: Code assumes visits_sorted iteration order is guaranteed chronological per FLW. If data is only sorted globally (not per-FLW), the inter-visit gap calculation will be incorrect.

Recommendation: Either verify the sort order guarantees per-FLW chronological ordering, or add explicit sorting:

for u, vt_list in visit_times.items():
    vt_sorted = sorted(vt_list, key=lambda x: x[0])  # sort by start time
    for i in range(1, len(vt_sorted)):
        # ... gap calculation

3. Configuration Change Without Migration Path

Location: Line 301

now = date.fromisoformat(job_config["current_date"]) if job_config.get("current_date") else datetime.now(tz=timezone.utc).date()

Issue: Introduces new optional config parameter. If not documented, existing jobs may behave unexpectedly.

Recommendation: Ensure current_date is documented in job config schema/docs.


🔵 Suggestions

1. Median Calculation Duplication

Location: Lines 424-426, 432-434

sorted_dur = sorted(durations)
minute_per_visit = round(sorted_dur[len(sorted_dur) // 2])

Suggestion: Extract to helper function to avoid duplication:

def median_rounded(values: list[float]) -> int | None:
    if not values:
        return None
    sorted_vals = sorted(values)
    return round(sorted_vals[len(sorted_vals) // 2])

2. Type Annotation for visit_times

Location: Line 161

visit_times: dict[str, list[tuple]] = {}  # username → [(time_start, time_end)]

Suggestion: Make tuple type explicit:

visit_times: dict[str, list[tuple[str, str]]] = {}

3. Edge Case: Zero-Length Lists

Location: Lines 424, 432
The median calculation will work correctly even with len(durations) == 1, but worth noting.


✅ Good Practices Observed

  • Proper exception handling with specific exception types
  • Sanity checks on calculated values (duration/gap ranges)
  • Descriptive variable names
  • Helpful comments explaining business logic
  • Consistent code style with existing codebase

Recommendation

Fix the critical issue (#1) before merging to prevent runtime exceptions. Address warnings for production robustness.


Powered by Claude — auto-generated review

…-metric denominators

- Counts visits_completed per FLW (all forms passing task_filters)
- Tracks eligible_mothers_visited (mothers with eligible_full_intervention_bonus=1
  who were visited in the task window)
- Exposes per-metric denominators (followup_rate_denom, ebf_denom, gps_denom,
  duration_denom, travel_time_denom) to support frontend "Not enough data" guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

🤖 AI Code Review

Code Review

Overall, this PR adds new metrics for visit duration and inter-visit travel time calculation. The implementation is generally sound, but there are some issues to address.


🔴 Critical Issues

1. Potential Timezone Mismatch

now = date.fromisoformat(job_config["current_date"]) if job_config.get("current_date") else datetime.now(tz=timezone.utc).date()
  • When parsing current_date from config, no timezone validation occurs, but the fallback uses UTC explicitly
  • If current_date comes from a different timezone context, eligibility calculations could be incorrect
  • Fix: Document the expected timezone for current_date or ensure UTC normalization

🟡 Warnings

1. Inconsistent Variable Naming

flw_summaries.append({
    "username": u,
    "display_name": flw_names.get(u) or flw_names.get(username) or u,
    ...
})
  • Uses flw_names.get(username) as fallback, but username is from outer scope (loop variable)
  • This appears to be a bug - should likely be just flw_names.get(u) or u
  • Fix: Remove the redundant flw_names.get(username) fallback

2. Magic Numbers Without Constants

if 0 < mins < 300:  # sanity: 0–5 hours
if 0 < gap_mins < 480:  # sanity: 0–8 hours
  • Hardcoded thresholds for sanity checks
  • Suggestion: Extract to named constants like MAX_VISIT_DURATION_MINUTES = 300

3. No Validation of Time Ordering

mins = (t1 - t0).total_seconds() / 60
if 0 < mins < 300:
  • Checks if positive, but doesn't handle cases where time_start > visit_datetime explicitly
  • Negative values are silently dropped, which may hide data quality issues
  • Suggestion: Log or count invalid time ranges for monitoring

🔵 Suggestions

1. Potential Performance Issue

eligible_mothers_visited = sum(1 for mid in mothers_visited if mother_eligibility.get(mid, False))
  • Iterates through all mothers for each FLW
  • For large datasets, consider building this incrementally during the main loop
  • Not critical, but could be optimized

2. Median Calculation Readability

minute_per_visit = round(sorted_dur[len(sorted_dur) // 2])
  • Standard median calculation, but consider using statistics.median() for clarity
  • Current implementation doesn't handle even-length lists correctly (should average two middle values)

3. Type Safety

visit_times: dict[str, list[tuple]] = {}
  • list[tuple] is vague; should be list[tuple[str, str]] for clarity

✅ Good Practices Observed

  • Proper exception handling for datetime parsing
  • Sanity checks on computed values (duration/gap limits)
  • Consistent use of setdefault() for dict initialization
  • Clear comments explaining data structures

Summary

The code is functional but has one potential bug (the username fallback) and timezone concerns that should be addressed. The rest are improvements for robustness and maintainability.


Powered by Claude — auto-generated review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant