Demo prep/srecon26#136

Open
Essoz wants to merge 28 commits into main from demo-prep/srecon26

Essoz commented Mar 20, 2026

No description provided.

Essoz and others added 28 commits March 14, 2026 06:02
The _try_merge_hypotheses() function in APIContainRelation was
leaving text_description as the placeholder "TBD merged" on
every hypothesis that went through the merge path. These
descriptions surfaced verbatim in invariants.json and the HTML
checker report, making merged invariants uninterpretable.

Generate the description from the parent API's full name and the
generalized child param, consistent with the non-merged path.
Also assert the parent param type to satisfy the static checker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- usage-guide.md: fix broken link to 5-min-tutorial.md
  (was pointing to ./docs/5-min.md which does not exist)
- technical-doc.md: remove the 'under construction' warning
  that was the first thing visitors saw on the technical docs page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each Relation subclass now implements to_display_name(params) returning
a natural-language string like "Optimizer.zero_grad() changes
Parameter.grad: non-zero → None" instead of raw class names or opaque
text_description fields.

- base_cls.py: add _short_api_name() helper and default to_display_name()
  returning None on the Relation base class
- All 8 relation types implemented:
  APIContainRelation, ConsistencyRelation, FunctionCoverRelation,
  FunctionLeadRelation, DistinctArgumentRelation,
  ConsistentOutputRelation, ConsistentInputOutputRelation, ThresholdRelation
- checker_report._format_invariant_label() now calls to_display_name()
  first, falling back to text_description, then raw params

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
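
The fallback order described in the commit above (to_display_name() first, then text_description, then raw params) can be sketched as follows. This is a minimal illustration, not the project's code: `ToyContainRelation`, `format_invariant_label`, and the four-element param layout are hypothetical stand-ins.

```python
from typing import Optional


class Relation:
    """Stand-in base class: the default display name is None."""

    @staticmethod
    def to_display_name(params: list) -> Optional[str]:
        return None


class ToyContainRelation(Relation):
    """Hypothetical subclass rendering (api, attr, pre, post) params."""

    @staticmethod
    def to_display_name(params: list) -> Optional[str]:
        if len(params) != 4:
            return None  # unexpected shape: let the caller fall back
        api, attr, pre, post = params
        return f"{api}() changes {attr}: {pre} -> {post}"


def format_invariant_label(
    relation: type, params: list, text_description: Optional[str]
) -> str:
    """Try to_display_name(), then text_description, then the raw params."""
    label = relation.to_display_name(params)
    if label:
        return label
    if text_description:
        return text_description
    return repr(params)
```

A relation that cannot describe its params returns None, so older invariants that only carry a text_description still get a label.
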
- checker_report: add _extract_violation_steps() and _build_violation_entry()
  helpers; _count_failed_invariants() now tracks first_step per invariant;
  HTML report shows "first seen at step N · M occurrences" under each item
- reporting/__init__.py: export build_violations_summary()
- checker.py: write violations_summary.json alongside failed.log for each
  trace, containing first_violation_step, distinct_invariants_violated, and
  per-violation entries with display_name, relation_type, first/last step,
  occurrences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
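
As a rough picture of what a violations_summary.json payload with the fields listed above might look like, here is a small sketch. The function names and the exact key names for the per-violation entries (`first_step`, `last_step`, `occurrences`) are assumptions for illustration; only the top-level field names come from the commit message.

```python
import json
from typing import Dict, List, Optional


def build_entry(display_name: str, relation_type: str, steps: List[int]) -> Dict:
    """Summarize one violated invariant from the steps at which it failed."""
    return {
        "display_name": display_name,
        "relation_type": relation_type,
        "first_step": min(steps),
        "last_step": max(steps),
        "occurrences": len(steps),
    }


def build_summary(entries: List[Dict]) -> Dict:
    """Aggregate per-violation entries into a trace-level summary."""
    ordered = sorted(entries, key=lambda e: e["first_step"])
    first: Optional[int] = ordered[0]["first_step"] if ordered else None
    return {
        "first_violation_step": first,
        "distinct_invariants_violated": len(ordered),
        "violations": ordered,
    }


entries = [
    build_entry("step() changes Parameter.data", "APIContainRelation", [40, 41, 42]),
    build_entry("zero_grad() precedes backward()", "FunctionLeadRelation", [7, 9]),
]
summary = build_summary(entries)
summary_json = json.dumps(summary, indent=2)  # what would be written to disk
```
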
- test_display_names.py: one test class per relation type (8 total, 24 tests)
  verifying that key semantic tokens appear in to_display_name() output given
  known param lists — independent of the inference algorithm
- test_violation_summary.py: pure function tests for _extract_violation_steps(),
  _build_violation_entry(), and build_violations_summary() (14 tests)

All 38 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two label quality issues found during live workload testing:

- APIContainRelation.to_display_name: return None for attrs starting with
  _TRAINCHECK_ (internal proxy bookkeeping IDs that are meaningless to users)
- APIContainRelation.to_display_name: normalize non_zero → non-zero in both
  pre and post values via shared _fmt_val() helper
- _format_invariant_label: when to_display_name returns None and params
  include a _TRAINCHECK_ attr, produce "Func() [internal tracking]" instead
  of falling back to the raw text_description containing the ugly internal name
- tests: add test_post_value_non_zero_normalized and
  test_traincheck_internal_attr_hidden to test_display_names.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- checker.py: stream handler now set to WARNING level so per-invariant
  INFO logs stay in the log file; summary lines printed via print()
- cover_relation.py, lead_relation.py: convert remaining print() debug
  calls to logger.debug(); change all leave=True to leave=False on inner
  tqdm bars so they don't persist after completing
- DistinctArgumentRelation.py, consistency_relation.py,
  consistency_transient_vars.py: same leave=False cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
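
The handler split described above (INFO detail goes to the log file, only WARNING and above reach the terminal) can be reproduced with the standard logging module. The sketch below uses in-memory streams in place of a real file and stderr; `configure_checker_logger` is a hypothetical helper name.

```python
import io
import logging
from typing import TextIO


def configure_checker_logger(
    name: str, log_stream: TextIO, console_stream: TextIO
) -> logging.Logger:
    """Send INFO+ to the log stream but only WARNING+ to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.handlers.clear()

    file_like = logging.StreamHandler(log_stream)  # stand-in for FileHandler
    file_like.setLevel(logging.INFO)
    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.WARNING)

    logger.addHandler(file_like)
    logger.addHandler(console)
    return logger


log_buf, console_buf = io.StringIO(), io.StringIO()
demo_logger = configure_checker_logger("traincheck.demo", log_buf, console_buf)
demo_logger.info("invariant 3 passed")       # log file only
demo_logger.warning("invariant 7 violated")  # log file and console
```
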
check_engine() now shows one bar: "{N checked · M left · X violated}
P%|█████| N/total [elapsed<remaining]" — updated after each invariant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add traincheck/progress.py — a thin tqdm wrapper that checks
utils._suppress_inner_progress before creating each bar.

check_engine() sets the flag after opening the single outer checking bar,
so only "N checked · M left · X violated" is visible during a check run.
All relation code and trace-layer code now imports from traincheck.progress
instead of tqdm directly (one-line change per file, no logic changes).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
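
A thin wrapper of the kind described above might look like the sketch below. It is an assumption-laden illustration: the flag lives in this module rather than in `utils`, the `progress` and `set_suppress_inner_progress` names are hypothetical, and the tqdm import is made optional only so the snippet runs where tqdm is not installed.

```python
from typing import Iterable, TypeVar

try:
    from tqdm import tqdm as _tqdm  # third-party; optional in this sketch
except ImportError:
    _tqdm = None

T = TypeVar("T")

# Hypothetical stand-in for utils._suppress_inner_progress.
_suppress_inner_progress = False


def set_suppress_inner_progress(value: bool) -> None:
    """Called by the outer loop once its own bar is open."""
    global _suppress_inner_progress
    _suppress_inner_progress = value


def progress(iterable: Iterable[T], **kwargs) -> Iterable[T]:
    """Return a tqdm bar, or the bare iterable when inner bars are suppressed."""
    if _suppress_inner_progress or _tqdm is None:
        return iterable
    kwargs.setdefault("leave", False)  # inner bars should not persist
    return _tqdm(iterable, **kwargs)
```

Call sites keep the shape of a plain tqdm call, for example `for item in progress(items, desc="Scanning")`, which is why switching imports can be a one-line change per file.
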
…inference time

Previously to_display_name() was only called at HTML render time in
checker_report.py, so failed.log, violations_summary.json, and any
other text_description consumer still showed raw internal strings like
'FunctionCoverRelation between torch.optim... and torch.optim...'.

Now every generate_hypothesis() / infer() site that constructs an
Invariant calls to_display_name(params) directly and uses the result as
text_description, falling back to the old string only when
to_display_name returns None (e.g. unexpected param types).

_format_invariant_label() in checker_report.py already falls through
to text_description, so the HTML report continues to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously lead_relation, DistinctArgumentRelation, and
consistency_transient_vars each had their own ad-hoc filtering logic
("._" substring checks, torch.override HACKs) that didn't respect
ANALYSIS_SKIP_FUNC_NAMES — so the <locals> entry added to config.py
had no effect on those relations.

Now all four relation types use the same ANALYSIS_SKIP_FUNC_NAMES list
as the single source of truth for which function names to skip.
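
One way to read "single source of truth" here is a shared predicate that every relation calls instead of its own filter. The sketch below assumes substring matching and uses an illustrative list; apart from the `<locals>` entry mentioned above, the contents of the real ANALYSIS_SKIP_FUNC_NAMES are not shown in this PR, and `should_skip_func` is a hypothetical name.

```python
from typing import Iterable, List

# Illustrative contents only; "<locals>" is the entry named in the commit.
ANALYSIS_SKIP_FUNC_NAMES: List[str] = ["<locals>", "hypothetical.internal_helper"]


def should_skip_func(func_name: str) -> bool:
    """True if any configured pattern occurs in the function name."""
    return any(pattern in func_name for pattern in ANALYSIS_SKIP_FUNC_NAMES)


def filter_func_names(func_names: Iterable[str]) -> List[str]:
    """Shared filter each relation could apply before generating hypotheses."""
    return [name for name in func_names if not should_skip_func(name)]
```
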
generate_hypothesis():
- Prints "[Trace N/M] Generating hypotheses" header per trace
- After each relation completes, prints "  RelationName: N hypotheses (Xs)"
  via tqdm.write() so it stays above any inner progress bars
- Removes the "Merging Hypotheses" tqdm (pure bookkeeping, not useful)

prune_incorrect_hypos(): prints "N pruned → M remaining" summary line

collect_examples(): silent unless cross-trace work is needed

infer_precondition():
- Single outer tqdm bar: "N done · M failed  P%|████| N/total [elapsed<remaining]"
- Inner bars (Scanning Positive Examples, etc.) suppressed via
  _suppress_inner_progress flag
- Prints final "N invariants · M failed" summary

precondition.py: convert two stray print() calls to logger.debug()
Add outer tqdm bar over active relations in generate_hypothesis() so
users see which relation is currently running (not just summary lines
after each completes). Each completed relation prints elapsed time and
hypothesis count via tqdm.write() so lines stay above the live bar.

Also fix indentation bug: for-hypo merge loop was outside the
for-relation loop, so only the last relation's hypotheses were merged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
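
The indentation bug mentioned above is easy to reproduce in isolation. The sketch below is not the project's code; `merge_buggy` and `merge_fixed` are hypothetical functions that show how a merge loop dedented out of the relation loop only ever sees the last relation's hypotheses.

```python
from collections import defaultdict
from typing import Dict, List


def merge_buggy(per_relation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    merged = defaultdict(list)
    for relation, hypos in per_relation.items():
        pass  # per-relation work happens here
    # Bug: this loop is outside the for-relation loop, so `relation` and
    # `hypos` still hold whatever the last iteration left behind.
    for hypo in hypos:
        merged[relation].append(hypo)
    return dict(merged)


def merge_fixed(per_relation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    merged = defaultdict(list)
    for relation, hypos in per_relation.items():
        for hypo in hypos:  # nested: runs once per relation
            merged[relation].append(hypo)
    return dict(merged)


per_relation = {"RelationA": ["a1", "a2"], "RelationB": ["b1"]}
buggy_result = merge_buggy(per_relation)
fixed_result = merge_fixed(per_relation)
```
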
Eliminates per-step stdout noise from control.py (Warmup/Interval/
Skipping step printed every training step), shutdown messages from
dumper.py, AST loop/model detection messages from source_file.py,
and proxy parameter setup messages from proxy.py.

All demoted to logger.debug() so they remain accessible with -d flag
but don't clutter normal traincheck-collect output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update logging format strings in all three CLI entry points so every
log message is visually identifiable as coming from TrainCheck, not
the user's training script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- collect_trace: use force=True + WARNING default level so basicConfig
  format actually applies and INFO chatter is suppressed at runtime
- source_file: demote all annotate_stage insertion logger.info → debug
- dumper: demote attribute-dump failure logger.warning → debug (torch
  internals routinely fail, not actionable)
- call_graph_parser: convert all print() → logger.debug(); add logger

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
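
The force=True detail matters because logging.basicConfig() does nothing when the root logger already has handlers, which is common once a training script or a library has configured logging first. A minimal sketch follows; the format string and the `configure_cli_logging` name are assumptions, not the project's actual values.

```python
import logging

# Simulate a handler installed earlier by the user's script or a library.
logging.getLogger().addHandler(logging.NullHandler())


def configure_cli_logging(debug: bool = False) -> None:
    """Apply the CLI's format and level even if logging was configured before."""
    logging.basicConfig(
        level=logging.DEBUG if debug else logging.WARNING,
        # Hypothetical prefix so messages are recognizably from the tool.
        format="[TrainCheck] %(levelname)s %(name)s: %(message)s",
        force=True,  # remove existing root handlers first (Python 3.8+)
    )


configure_cli_logging()
logging.getLogger("traincheck.demo").info("suppressed at the default level")
logging.getLogger("traincheck.demo").warning("shown with the prefixed format")
```
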
safe_getattr() iterates all attributes of instrumented objects; some
(e.g. torchvision dataset's .test_data) fire warnings.warn() on
access. Wrap getattr in warnings.catch_warnings() with simplefilter("ignore")
so third-party deprecation warnings don't leak to the user's stderr.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
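
The suppression pattern from the commit above, sketched with the standard warnings module. `quiet_getattr` and `LegacyDataset` are hypothetical names; the real safe_getattr() may do more (for example, exception handling) that is not shown here.

```python
import warnings
from typing import Any


def quiet_getattr(obj: Any, name: str, default: Any = None) -> Any:
    """getattr() that keeps warnings raised by the attribute from escaping."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # applies only inside this block
        return getattr(obj, name, default)


class LegacyDataset:
    """Toy object whose property warns on access, like a deprecated field."""

    @property
    def test_data(self):
        warnings.warn("test_data has been renamed data", DeprecationWarning)
        return [1, 2, 3]


value = quiet_getattr(LegacyDataset(), "test_data")
```

Because catch_warnings() restores the previous filters on exit, warnings raised elsewhere in the user's script are unaffected.
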
Importing torch.distributed and other submodules during the two-pass
instrumentation scan fires deprecation UserWarnings (e.g. reduce_op).
Wrap both passes in warnings.catch_warnings() with simplefilter("ignore") so
these don't leak to the user's terminal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use .get() for args/kwargs (PRE) and return_values/exception (POST) when
  populating pt_map, since functions instrumented without argument capture
  (e.g. Adadelta.step) omit these fields — previously caused KeyError that
  silently dropped the record and prevented APIContainRelation from triggering
- Add FolderCreationHandler that watches each trace folder and dynamically
  attaches a StreamLogHandler for any trace_*/proxy_log.json file created
  after the checker starts, fixing the checker getting stuck when training
  had not yet created trace files at startup time
- Set float('inf') sentinel in read_time_map after _save_initial_content so
  files with no live updates don't block min_read_time indefinitely
- Rename _get_api_args_map_to_check → all_needed_args_api in Checker_data
  and sort_inv_file for clarity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
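
The float('inf') sentinel can be understood with a toy model of the read-time bookkeeping. This sketch assumes the checker only advances to the minimum read time across files; the function names and the example paths are hypothetical.

```python
from typing import Dict


def min_read_time(read_time_map: Dict[str, float]) -> float:
    """Smallest per-file read time; progress is limited by this value."""
    return min(read_time_map.values(), default=float("inf"))


def mark_no_live_updates(read_time_map: Dict[str, float], path: str) -> None:
    """Exclude a file from the minimum until it produces a live record."""
    read_time_map[path] = float("inf")


read_time_map = {
    "trace_0/proxy_log.json": 12.5,  # receiving live updates
    "trace_1/proxy_log.json": 0.0,   # initial content only, never updated
}
stalled = min_read_time(read_time_map)   # held back by the idle file
mark_no_live_updates(read_time_map, "trace_1/proxy_log.json")
advanced = min_read_time(read_time_map)  # now follows the active file
```
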
- consistency_relation: use .get() instead of direct key access in
  online_check to avoid KeyError when a variable lacks a tracked attribute
- contain_relation: use ASCII arrow in to_display_name for terminal safety
- collect_trace: demote InputOutputParam warning to debug to reduce noise

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g progress

checker_online.py:
- Track VIOLATION_DETAILS (step/stage pairs and sample trace per invariant)
- Track TRIGGERED_INV (invariants checked at least once), ALL_INVS,
  CURRENT_STEP and CURRENT_STAGE from each processed trace record
- Remove bare 'raise e' from API invariant exception handler so a single
  bad invariant check no longer crashes the entire checker loop
- Pass new tracking state to build_online_report_data on every report emit

checker_report.py:
- Violations sorted by first violation step (earliest first) instead of count
- Per-violation: first/last step with stage badge, full step list grouped by
  stage (e.g. [train] 1,2,3 · [eval] 100,101), expandable sample trace table
- Stage badges with distinct colors for train/eval/val/test/inference;
  unknown stages get a hash-derived color from a fallback palette
- New Checking Progress panel: stacked bar (passing/failing/not-triggered),
  collapsible list of not-yet-triggered invariants, pass rate card, and
  Current Step card showing latest step with stage badge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
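
Two of the report details above, the step list grouped by stage and the hash-derived fallback color, are small enough to sketch. The color values, palette, and function names below are assumptions for illustration; crc32 is used instead of Python's built-in hash() so the color stays stable across processes.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical colors; the report's actual palette is not shown in this PR.
KNOWN_STAGE_COLORS = {"train": "#2563eb", "eval": "#16a34a", "test": "#d97706"}
FALLBACK_PALETTE = ["#7c3aed", "#db2777", "#0891b2", "#65a30d"]


def format_steps_by_stage(pairs: Iterable[Tuple[int, str]]) -> str:
    """Render (step, stage) pairs as '[stage] s1,s2 · [stage] s3'."""
    grouped: Dict[str, List[int]] = {}
    for step, stage in pairs:
        grouped.setdefault(stage, []).append(step)
    parts = []
    for stage, steps in grouped.items():  # stages in first-seen order
        joined = ",".join(str(s) for s in sorted(set(steps)))
        parts.append(f"[{stage}] {joined}")
    return " · ".join(parts)


def stage_color(stage: str) -> str:
    """Known stages get a fixed color; others a stable palette pick."""
    if stage in KNOWN_STAGE_COLORS:
        return KNOWN_STAGE_COLORS[stage]
    index = zlib.crc32(stage.encode("utf-8")) % len(FALLBACK_PALETTE)
    return FALLBACK_PALETTE[index]
```
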
…_check

When iterating all varids of a given type, not every variable instance
has every tracked attribute (e.g. _TRAINCHECK_grad_ID may be absent if
grad was never observed). Skip varids that don't have the attribute in
varid_map rather than crashing with KeyError.

Also remove the remaining bare 'raise e' in the API-based invariant check
block — the var-based block was fixed earlier but this one was missed,
causing the checker to crash and stop on any API invariant exception.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ne_check

Add Checker_data.attr_map (var_type → attr_name → set[VarInstId]), populated
in _set_var_map when an attribute is first observed for a variable. Replace the
broad type_map iteration in APIContainRelation.online_check and
query_var_changes_within_time_and_process with attr_map lookups that only visit
varids known to carry the attribute. This eliminates the KeyError when frozen
parameters (or any variable lacking a tracked attribute) appear in type_map.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
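
The attr_map index (var_type → attr_name → set of variable IDs) can be pictured with the sketch below. `VarInstId`'s fields, the `CheckerData` class, and its method names are hypothetical simplifications of the structures named in the commit.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, NamedTuple, Set


class VarInstId(NamedTuple):
    """Simplified variable identity; the real fields may differ."""
    process_id: int
    var_name: str
    var_type: str


class CheckerData:
    def __init__(self) -> None:
        # varid -> attr_name -> latest observed value
        self.varid_map: Dict[VarInstId, Dict[str, Any]] = {}
        # var_type -> attr_name -> varids known to carry that attribute
        self.attr_map: DefaultDict[str, DefaultDict[str, Set[VarInstId]]] = (
            defaultdict(lambda: defaultdict(set))
        )

    def set_var_attr(self, varid: VarInstId, attr_name: str, value: Any) -> None:
        """Record an observation and index the variable under its attribute."""
        self.varid_map.setdefault(varid, {})[attr_name] = value
        self.attr_map[varid.var_type][attr_name].add(varid)


data = CheckerData()
weight = VarInstId(0, "fc.weight", "Parameter")
frozen = VarInstId(0, "embed.weight", "Parameter")
data.set_var_attr(weight, "data", "tensor-hash-1")
data.set_var_attr(weight, "grad", "tensor-hash-2")
data.set_var_attr(frozen, "data", "tensor-hash-3")  # frozen: grad never seen

with_grad = data.attr_map["Parameter"]["grad"]
```

Iterating `with_grad` visits only the trainable parameter, so a frozen parameter without a grad entry is never looked up.
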
…t-observable returns

The .get() pattern silently returned empty sets even in cases that would indicate
a population bug. Replace with direct dict access guarded only by explicit
"not yet observable" early returns (no vars of this type/attr have been seen yet
-- the invariant simply cannot be checked and passes vacuously). Inside the
iteration loop, add assertions so any discrepancy between attr_map and varid_map
fails loudly rather than being masked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
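
The difference between masking and failing loudly can be sketched as follows, using plain dicts and a hypothetical `collect_attr_values` function. The early return covers the "not yet observable" case; the assertions catch a bookkeeping mismatch that a .get() with a default would have hidden.

```python
from typing import Any, Dict, List, Set


def collect_attr_values(
    attr_map: Dict[str, Dict[str, Set[str]]],
    varid_map: Dict[str, Dict[str, Any]],
    var_type: str,
    attr_name: str,
) -> List[Any]:
    """Gather attribute values for all variables indexed under (type, attr)."""
    # Not yet observable: nothing of this type/attr has been seen, so the
    # invariant cannot be checked and passes vacuously.
    if var_type not in attr_map or attr_name not in attr_map[var_type]:
        return []

    values = []
    for varid in sorted(attr_map[var_type][attr_name]):
        # Any mismatch between the two maps is a population bug.
        assert varid in varid_map, f"{varid} indexed but never recorded"
        assert attr_name in varid_map[varid], f"{varid} lacks {attr_name}"
        values.append(varid_map[varid][attr_name])
    return values
```
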
…ng attrs

Add _display_attr_name() helper that maps '_TRAINCHECK_grad_ID' -> 'grad' etc.
Use it in APIContainRelation.to_display_name (removing the return-None guard)
and ConsistencyRelation.to_display_name. Remove the now-unnecessary
[internal tracking] fallback from _format_invariant_label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…play names

'_TRAINCHECK_grad_ID' -> 'grad_ID', not 'grad'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
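
The mapping stated above ('_TRAINCHECK_grad_ID' -> 'grad_ID') is consistent with stripping only the prefix. A minimal sketch under that assumption, with a hypothetical function name:

```python
_INTERNAL_PREFIX = "_TRAINCHECK_"


def display_attr_name(attr: str) -> str:
    """Strip the internal prefix; leave ordinary attribute names unchanged."""
    if attr.startswith(_INTERNAL_PREFIX):
        return attr[len(_INTERNAL_PREFIX):]
    return attr
```
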
- _count_failed_invariants now tracks last_step, step_stages (step→stage
  map from all violation traces), and sample_trace (first violation)
- Offline HTML violations panel and per-trace failed-invariants lists now
  use the same expandable table format as the online report: First Step /
  Last Step / Count columns, stage badges, collapsible step timeline and
  sample trace rows
- W&B violations table gains a last_step column; summary gains
  violations/last_step
- MLflow gains violations_last_step metric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _build_violation_steps_map() helper: step → count of distinct
  invariants violated at that step (across all failed CheckerResults)
- Propagate violation_steps_map through build_offline_report_data and
  build_online_report_data so downstream loggers can consume it
- W&B: log traincheck/violations as a metric at each step via
  wandb.log({...}, step=N) so violations appear on the same x-axis as
  training loss; add --wandb-run-id CLI arg to attach to an existing run
- MLflow: log traincheck_violations per step via mlflow.log_metric(step=N);
  switch violations table from log_dict to log_table() for proper UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
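
The step → count map described above can be sketched in plain Python. The input shape (pairs of invariant ID and step) and the function name are assumptions; the tracker calls appear only as comments because they require an active W&B or MLflow run.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_violation_steps_map(
    violations: Iterable[Tuple[Hashable, int]]
) -> Dict[int, int]:
    """Map each step to the number of distinct invariants violated at it."""
    invariants_per_step: Dict[int, Set[Hashable]] = defaultdict(set)
    for invariant_id, step in violations:
        invariants_per_step[step].add(invariant_id)  # set drops repeats
    return {
        step: len(invs) for step, invs in sorted(invariants_per_step.items())
    }


violations = [("inv_a", 10), ("inv_a", 10), ("inv_b", 10), ("inv_a", 20)]
steps_map = build_violation_steps_map(violations)

# A downstream logger could then emit one point per step, for example:
#   wandb.log({"traincheck/violations": count}, step=step)
#   mlflow.log_metric("traincheck_violations", count, step=step)
```
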