Demo prep/srecon26 by Essoz · Pull Request #136 · OrderLab/TrainCheck

Essoz · 2026-03-20T05:39:33Z

No description provided.

The _try_merge_hypotheses() function in APIContainRelation was leaving text_description as the placeholder "TBD merged" on every hypothesis that went through the merge path. These descriptions surfaced verbatim in invariants.json and the HTML checker report, making merged invariants uninterpretable. Generate the description from the parent API's full name and the generalized child param, consistent with the non-merged path. Also assert the parent param type to satisfy the static checker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- usage-guide.md: fix broken link to 5-min-tutorial.md (was pointing to ./docs/5-min.md, which does not exist)
- technical-doc.md: remove the 'under construction' warning that was the first thing visitors saw on the technical docs page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Each Relation subclass now implements to_display_name(params) returning a natural-language string like "Optimizer.zero_grad() changes Parameter.grad: non-zero → None" instead of raw class names or opaque text_description fields.

- base_cls.py: add _short_api_name() helper and default to_display_name() returning None on the Relation base class
- All 8 relation types implemented: APIContainRelation, ConsistencyRelation, FunctionCoverRelation, FunctionLeadRelation, DistinctArgumentRelation, ConsistentOutputRelation, ConsistentInputOutputRelation, ThresholdRelation
- checker_report._format_invariant_label() now calls to_display_name() first, falling back to text_description, then raw params

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
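
The three-level fallback described above can be sketched as follows. This is a minimal illustration, not the code in checker_report.py: `ToyContainRelation`, the two-element param shape, and the exact label strings are assumptions made for the example.

```python
from typing import Optional, Sequence


class Relation:
    """Stand-in for the base class; the real one lives in base_cls.py."""

    @staticmethod
    def to_display_name(params: Sequence[str]) -> Optional[str]:
        # Default behaviour from the commit: no readable name available.
        return None


class ToyContainRelation(Relation):
    """Hypothetical subclass, used only for this illustration."""

    @staticmethod
    def to_display_name(params: Sequence[str]) -> Optional[str]:
        if len(params) != 2:
            return None  # unexpected shape: let the caller fall back
        parent, child = params
        return f"{parent}() changes {child}"


def format_invariant_label(relation, params, text_description=None) -> str:
    # 1) Preferred: the relation's own natural-language name.
    name = relation.to_display_name(params)
    if name:
        return name
    # 2) Fallback: the stored text_description.
    if text_description:
        return text_description
    # 3) Last resort: the raw params.
    return f"{relation.__name__}({', '.join(map(str, params))})"
```

Returning None rather than raising keeps the decision about what to show in one place, the label formatter, instead of in every relation.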

- checker_report: add _extract_violation_steps() and _build_violation_entry() helpers; _count_failed_invariants() now tracks first_step per invariant; HTML report shows "first seen at step N · M occurrences" under each item
- reporting/__init__.py: export build_violations_summary()
- checker.py: write violations_summary.json alongside failed.log for each trace, containing first_violation_step, distinct_invariants_violated, and per-violation entries with display_name, relation_type, first/last step, occurrences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
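
A rough sketch of how such a summary could be assembled is shown below. The top-level field names follow the commit message; the function signatures, the `violations` key, and the `first_step`/`last_step` key spellings are assumptions for illustration and may differ from the real helpers.

```python
import json


def build_violation_entry(display_name, relation_type, steps):
    """One per-invariant entry; `steps` are the steps where it was violated."""
    ordered = sorted(steps)
    return {
        "display_name": display_name,
        "relation_type": relation_type,
        "first_step": ordered[0] if ordered else None,
        "last_step": ordered[-1] if ordered else None,
        "occurrences": len(ordered),
    }


def build_summary(entries):
    """Aggregate entries into the top-level summary structure."""
    first_steps = [e["first_step"] for e in entries if e["first_step"] is not None]
    return {
        "first_violation_step": min(first_steps) if first_steps else None,
        "distinct_invariants_violated": len(entries),
        "violations": entries,  # assumed key name
    }


entries = [
    build_violation_entry("Optimizer.step() changes Parameter.data", "APIContainRelation", [40, 12, 13]),
    build_violation_entry("Parameter.data consistent across ranks", "ConsistencyRelation", [30]),
]
summary_json = json.dumps(build_summary(entries), indent=2)
```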

- test_display_names.py: one test class per relation type (8 total, 24 tests) verifying that key semantic tokens appear in to_display_name() output given known param lists, independent of the inference algorithm
- test_violation_summary.py: pure function tests for _extract_violation_steps(), _build_violation_entry(), and build_violations_summary() (14 tests)

All 38 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two label quality issues found during live workload testing:

- APIContainRelation.to_display_name: return None for attrs starting with _TRAINCHECK_ (internal proxy bookkeeping IDs that are meaningless to users)
- APIContainRelation.to_display_name: normalize non_zero → non-zero in both pre and post values via shared _fmt_val() helper
- _format_invariant_label: when to_display_name returns None and params include a _TRAINCHECK_ attr, produce "Func() [internal tracking]" instead of falling back to the raw text_description containing the ugly internal name
- tests: add test_post_value_non_zero_normalized and test_traincheck_internal_attr_hidden to test_display_names.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- checker.py: stream handler now set to WARNING level so per-invariant INFO logs stay in the log file; summary lines printed via print()
- cover_relation.py, lead_relation.py: convert remaining print() debug calls to logger.debug(); change all leave=True to leave=False on inner tqdm bars so they don't persist after completing
- DistinctArgumentRelation.py, consistency_relation.py, consistency_transient_vars.py: same leave=False cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
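
The handler split in the first bullet can be illustrated with the standard logging module. This is a sketch under assumptions: the function name and logger name are made up, and an in-memory stream stands in for the log file so the example stays self-contained.

```python
import io
import logging


def configure_checker_logging(logger_name, log_stream, console_stream):
    """Send INFO and above to the log, but only WARNING and above to the console."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example isolated from the root logger

    # Stands in for a logging.FileHandler in this sketch.
    log_handler = logging.StreamHandler(log_stream)
    log_handler.setLevel(logging.INFO)

    console_handler = logging.StreamHandler(console_stream)
    console_handler.setLevel(logging.WARNING)

    logger.addHandler(log_handler)
    logger.addHandler(console_handler)
    return logger
```

Filtering at the handler rather than at the logger is what lets the same record reach the file while staying off the terminal.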

check_engine() now shows one bar: "{N checked · M left · X violated} P%|█████| N/total [elapsed<remaining]", updated after each invariant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add traincheck/progress.py, a thin tqdm wrapper that checks utils._suppress_inner_progress before creating each bar. check_engine() sets the flag after opening the single outer checking bar, so only "N checked · M left · X violated" is visible during a check run. All relation code and trace-layer code now imports from traincheck.progress instead of tqdm directly (one-line change per file, no logic changes).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
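
A wrapper of this kind might look roughly like the sketch below. It is not the contents of traincheck/progress.py: the flag is kept in the same module for brevity, `set_suppressed()` and `progress()` are hypothetical names, and the ImportError fallback exists only so the example runs without tqdm installed.

```python
# Module-level flag; in the PR this lives in utils._suppress_inner_progress.
_suppress_inner_progress = False

try:
    from tqdm import tqdm as _tqdm
except ImportError:  # assumption: degrade to no bar if tqdm is unavailable
    _tqdm = None


def set_suppressed(value: bool) -> None:
    """Toggle suppression of inner bars (hypothetical helper)."""
    global _suppress_inner_progress
    _suppress_inner_progress = value


def progress(iterable, **kwargs):
    """Drop-in replacement for tqdm(iterable, ...) at inner call sites."""
    if _suppress_inner_progress or _tqdm is None:
        # No bar is created; the caller just iterates as usual.
        return iterable
    return _tqdm(iterable, **kwargs)
```

Because call sites only swap their import, the decision to show or hide a bar is made in one function rather than at every loop.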

…inference time

Previously to_display_name() was only called at HTML render time in checker_report.py, so failed.log, violations_summary.json, and any other text_description consumer still showed raw internal strings like 'FunctionCoverRelation between torch.optim... and torch.optim...'. Now every generate_hypothesis() / infer() site that constructs an Invariant calls to_display_name(params) directly and uses the result as text_description, falling back to the old string only when to_display_name returns None (e.g. unexpected param types). _format_invariant_label() in checker_report.py already falls through to text_description, so the HTML report continues to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously lead_relation, DistinctArgumentRelation, and consistency_transient_vars each had their own ad-hoc filtering logic ("._" substring checks, torch.override HACKs) that didn't respect ANALYSIS_SKIP_FUNC_NAMES — so the <locals> entry added to config.py had no effect on those relations. Now all four relation types use the same ANALYSIS_SKIP_FUNC_NAMES list as the single source of truth for which function names to skip.

generate_hypothesis():
- Prints "[Trace N/M] Generating hypotheses" header per trace
- After each relation completes, prints " RelationName: N hypotheses (Xs)" via tqdm.write() so it stays above any inner progress bars
- Removes the "Merging Hypotheses" tqdm (pure bookkeeping, not useful)

prune_incorrect_hypos(): prints "N pruned → M remaining" summary line

collect_examples(): silent unless cross-trace work is needed

infer_precondition():
- Single outer tqdm bar: "N done · M failed P%|████| N/total [elapsed<remaining]"
- Inner bars (Scanning Positive Examples, etc.) suppressed via _suppress_inner_progress flag
- Prints final "N invariants · M failed" summary

precondition.py: convert two stray print() calls to logger.debug()

Add outer tqdm bar over active relations in generate_hypothesis() so users see which relation is currently running (not just summary lines after each completes). Each completed relation prints elapsed time and hypothesis count via tqdm.write() so lines stay above the live bar. Also fix indentation bug: for-hypo merge loop was outside the for-relation loop, so only the last relation's hypotheses were merged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Eliminates per-step stdout noise from control.py (Warmup/Interval/Skipping step printed every training step), shutdown messages from dumper.py, AST loop/model detection messages from source_file.py, and proxy parameter setup messages from proxy.py. All demoted to logger.debug() so they remain accessible with -d flag but don't clutter normal traincheck-collect output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update logging format strings in all three CLI entry points so every log message is visually identifiable as coming from TrainCheck, not the user's training script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- collect_trace: use force=True + WARNING default level so basicConfig format actually applies and INFO chatter is suppressed at runtime
- source_file: demote all annotate_stage insertion logger.info → debug
- dumper: demote attribute-dump failure logger.warning → debug (torch internals routinely fail, not actionable)
- call_graph_parser: convert all print() → logger.debug(); add logger

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
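
The first bullet relies on a detail of `logging.basicConfig`: without `force=True` it does nothing if the root logger already has handlers, for example because an imported library configured logging first. A minimal sketch follows; the function name, the `debug` switch, and the exact format string are assumptions, not the strings used in the PR.

```python
import io
import logging


def configure_cli_logging(stream, debug=False):
    """Install a prefixed format on the root logger, replacing earlier handlers."""
    logging.basicConfig(
        stream=stream,
        level=logging.DEBUG if debug else logging.WARNING,
        format="[TrainCheck] %(levelname)s %(message)s",  # assumed format
        force=True,  # remove pre-existing root handlers (Python 3.8+)
    )


buf = io.StringIO()
configure_cli_logging(buf)
demo_logger = logging.getLogger("traincheck.sketch.cli")
demo_logger.info("collecting trace")          # dropped: below WARNING
demo_logger.warning("could not dump attribute")
```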

safe_getattr() iterates all attributes of instrumented objects; some (e.g. torchvision dataset's .test_data) fire warnings.warn() on access. Wrap getattr in warnings.catch_warnings() with simplefilter("ignore") so third-party deprecation warnings don't leak to the user's stderr.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
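
The suppression pattern can be sketched with the standard warnings module. The `safe_getattr` below is a simplified stand-in with an assumed signature, and `LegacyDataset` is a made-up class that mimics an attribute which warns on access.

```python
import warnings


def safe_getattr(obj, name, default=None):
    """getattr that swallows warnings raised while the attribute is accessed."""
    with warnings.catch_warnings():
        # The filter change is scoped to this block and restored on exit.
        warnings.simplefilter("ignore")
        return getattr(obj, name, default)


class LegacyDataset:
    """Hypothetical object whose property warns when read."""

    @property
    def test_data(self):
        warnings.warn("test_data has been renamed", DeprecationWarning)
        return [1, 2, 3]
```

Since `catch_warnings()` restores the previous filters when the block exits, warnings raised by the user's own code outside the call are unaffected.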

Importing torch.distributed and other submodules during the two-pass instrumentation scan fires deprecation UserWarnings (e.g. reduce_op). Wrap both passes in warnings.catch_warnings() with simplefilter("ignore") so these don't leak to the user's terminal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Use .get() for args/kwargs (PRE) and return_values/exception (POST) when populating pt_map, since functions instrumented without argument capture (e.g. Adadelta.step) omit these fields; previously this caused a KeyError that silently dropped the record and prevented APIContainRelation from triggering
- Add FolderCreationHandler that watches each trace folder and dynamically attaches a StreamLogHandler for any trace_*/proxy_log.json file created after the checker starts, fixing the checker getting stuck when training had not yet created trace files at startup time
- Set float('inf') sentinel in read_time_map after _save_initial_content so files with no live updates don't block min_read_time indefinitely
- Rename _get_api_args_map_to_check → all_needed_args_api in Checker_data and sort_inv_file for clarity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
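
Two of these fixes are small enough to sketch in isolation. The code below is illustrative only: the record layout, the function names, and the idea that read_time_map maps a file name to the timestamp of its latest processed record are assumptions, not the checker's actual data structures.

```python
import math


def pre_call_fields(record: dict) -> dict:
    """Read optional PRE fields without raising when they were not captured."""
    return {
        "args": record.get("args"),      # None if argument capture was off
        "kwargs": record.get("kwargs"),
    }


def min_read_time(read_time_map: dict) -> float:
    """Earliest timestamp that every watched file has been read up to."""
    return min(read_time_map.values(), default=math.inf)


# A file that only ever had initial content is given an inf sentinel,
# so it never holds back the minimum across live files.
read_time_map = {
    "trace_a/proxy_log.json": 12.5,
    "trace_b/proxy_log.json": 11.0,
    "static_initial_content.json": math.inf,
}
```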

- consistency_relation: use .get() instead of direct key access in online_check to avoid KeyError when a variable lacks a tracked attribute
- contain_relation: use ASCII arrow in to_display_name for terminal safety
- collect_trace: demote InputOutputParam warning to debug to reduce noise

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…g progress

checker_online.py:
- Track VIOLATION_DETAILS (step/stage pairs and sample trace per invariant)
- Track TRIGGERED_INV (invariants checked at least once), ALL_INVS, CURRENT_STEP and CURRENT_STAGE from each processed trace record
- Remove bare 'raise e' from API invariant exception handler so a single bad invariant check no longer crashes the entire checker loop
- Pass new tracking state to build_online_report_data on every report emit

checker_report.py:
- Violations sorted by first violation step (earliest first) instead of count
- Per-violation: first/last step with stage badge, full step list grouped by stage (e.g. [train] 1,2,3 · [eval] 100,101), expandable sample trace table
- Stage badges with distinct colors for train/eval/val/test/inference; unknown stages get a hash-derived color from a fallback palette
- New Checking Progress panel: stacked bar (passing/failing/not-triggered), collapsible list of not-yet-triggered invariants, pass rate card, and Current Step card showing latest step with stage badge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
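
The stage grouping and the fallback color choice can be sketched as below. The color values, palette, and function names are placeholders invented for the example, and using crc32 as the hash is an assumption; the commit message only says the color is hash-derived.

```python
import zlib
from collections import defaultdict

# Assumed colors; the real report defines its own.
STAGE_COLORS = {"train": "#2563eb", "eval": "#16a34a", "val": "#0d9488",
                "test": "#9333ea", "inference": "#ea580c"}
FALLBACK_PALETTE = ["#64748b", "#b45309", "#be185d", "#4d7c0f"]


def stage_color(stage: str) -> str:
    if stage in STAGE_COLORS:
        return STAGE_COLORS[stage]
    # crc32 is stable across processes, unlike built-in hash() on str.
    return FALLBACK_PALETTE[zlib.crc32(stage.encode("utf-8")) % len(FALLBACK_PALETTE)]


def format_steps_by_stage(pairs) -> str:
    """Turn (step, stage) pairs into '[train] 1,2,3 · [eval] 100,101'."""
    grouped = defaultdict(set)
    order = []  # stages in order of first appearance
    for step, stage in pairs:
        if stage not in grouped:
            order.append(stage)
        grouped[stage].add(step)
    return " · ".join(
        f"[{stage}] {','.join(str(s) for s in sorted(grouped[stage]))}"
        for stage in order
    )
```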

…_check

When iterating all varids of a given type, not every variable instance has every tracked attribute (e.g. _TRAINCHECK_grad_ID may be absent if grad was never observed). Skip varids that don't have the attribute in varid_map rather than crashing with KeyError. Also remove the remaining bare 'raise e' in the API-based invariant check block; the var-based block was fixed earlier but this one was missed, causing the checker to crash and stop on any API invariant exception.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ne_check

Add Checker_data.attr_map (var_type → attr_name → set[VarInstId]), populated in _set_var_map when an attribute is first observed for a variable. Replace the broad type_map iteration in APIContainRelation.online_check and query_var_changes_within_time_and_process with attr_map lookups that only visit varids known to carry the attribute. This eliminates the KeyError when frozen parameters (or any variable lacking a tracked attribute) appear in type_map.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
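
The index described above might be sketched as follows. `CheckerData`, `set_var()`, `varids_with_attr()`, and the fields of `VarInstId` are simplified stand-ins chosen for the example; only the var_type → attr_name → set[VarInstId] shape comes from the commit message.

```python
from collections import defaultdict
from typing import NamedTuple


class VarInstId(NamedTuple):
    """Assumed identity of a variable instance."""
    process_id: int
    var_name: str
    var_type: str


class CheckerData:
    def __init__(self):
        self.varid_map = {}  # VarInstId -> {attr_name: latest value}
        # var_type -> attr_name -> set of VarInstId carrying that attribute
        self.attr_map = defaultdict(lambda: defaultdict(set))

    def set_var(self, varid, attr_name, value):
        attrs = self.varid_map.setdefault(varid, {})
        if attr_name not in attrs:
            # First observation of this attribute: index the variable.
            self.attr_map[varid.var_type][attr_name].add(varid)
        attrs[attr_name] = value

    def varids_with_attr(self, var_type, attr_name):
        # Membership tests do not create keys on a defaultdict.
        if var_type not in self.attr_map or attr_name not in self.attr_map[var_type]:
            return set()  # not yet observable: nothing to check
        return self.attr_map[var_type][attr_name]
```

A frozen parameter that never reports a grad attribute is simply absent from the grad index, so an invariant over grad never visits it.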

…t-observable returns

The .get() pattern silently returned empty sets even in cases that would indicate a population bug. Replace with direct dict access guarded only by explicit "not yet observable" early returns (no vars of this type/attr have been seen yet; the invariant simply cannot be checked and passes vacuously). Inside the iteration loop, add assertions so any discrepancy between attr_map and varid_map fails loudly rather than being masked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ng attrs

Add _display_attr_name() helper that maps '_TRAINCHECK_grad_ID' -> 'grad' etc. Use it in APIContainRelation.to_display_name (removing the return-None guard) and ConsistencyRelation.to_display_name. Remove the now-unnecessary [internal tracking] fallback from _format_invariant_label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…play names

'_TRAINCHECK_grad_ID' -> 'grad_ID', not 'grad'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
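
The behaviour after this fix amounts to stripping only the prefix. A minimal sketch is below; the function body is an assumption about how the helper could be written, not a copy of it.

```python
_INTERNAL_PREFIX = "_TRAINCHECK_"


def display_attr_name(attr: str) -> str:
    """Strip the internal prefix but keep any suffix such as _ID."""
    if attr.startswith(_INTERNAL_PREFIX):
        return attr[len(_INTERNAL_PREFIX):]
    return attr
```

Keeping the suffix distinguishes the tracked identity of a tensor (grad_ID) from the tensor attribute itself (grad) in the rendered label.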

- _count_failed_invariants now tracks last_step, step_stages (step→stage map from all violation traces), and sample_trace (first violation)
- Offline HTML violations panel and per-trace failed-invariants lists now use the same expandable table format as the online report: First Step / Last Step / Count columns, stage badges, collapsible step timeline and sample trace rows
- W&B violations table gains a last_step column; summary gains violations/last_step
- MLflow gains violations_last_step metric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add _build_violation_steps_map() helper: step → count of distinct invariants violated at that step (across all failed CheckerResults)
- Propagate violation_steps_map through build_offline_report_data and build_online_report_data so downstream loggers can consume it
- W&B: log traincheck/violations as a metric at each step via wandb.log({...}, step=N) so violations appear on the same x-axis as training loss; add --wandb-run-id CLI arg to attach to an existing run
- MLflow: log traincheck_violations per step via mlflow.log_metric(step=N); switch violations table from log_dict to log_table() for proper UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
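
The step map in the first bullet can be sketched as a small aggregation. The input shape here, an iterable of (invariant_id, step) pairs, is an assumption made for the example; the real helper works on failed CheckerResults.

```python
from collections import defaultdict


def build_violation_steps_map(violations):
    """Map each step to the number of distinct invariants violated at it."""
    invariants_at_step = defaultdict(set)
    for invariant_id, step in violations:
        if step is None:
            continue  # violation with no recoverable step
        invariants_at_step[step].add(invariant_id)
    # Sorted by step so loggers can emit points in increasing order.
    return {step: len(ids) for step, ids in sorted(invariants_at_step.items())}


steps_map = build_violation_steps_map(
    [("inv_a", 5), ("inv_a", 1), ("inv_b", 1), ("inv_a", 1), ("inv_c", None)]
)
```

Counting distinct invariants rather than raw violation records keeps one noisy invariant from dominating the per-step metric.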

Essoz and others added 28 commits March 14, 2026 06:02

feat: replace multi-bar clutter with single live-stats progress bar

4645aad

fix: prefix all TrainCheck log output with [TrainCheck]

09c9ea7

fix: keep _ID suffix when stripping _TRAINCHECK_ prefix from attr dis…

9d8d04c

Essoz wants to merge 28 commits into main from demo-prep/srecon26
