Protein language models usually score missense variants by how surprising the mutant residue is. We test a complementary idea: some mutations are only moderately surprising at the token level yet still reorganize local hidden-state geometry, and that geometry can be read out through mutation-centered covariance matrices. A TP53 structural case makes the intuition concrete: p.Val272Met falls into a high-geometry, low-likelihood group and shows median local RMSD 1.91 A in matched crystal structures. Motivated by that pattern, we compare wild-type and mutant ESM2 covariance matrices in local windows and evaluate the resulting signal across frozen benchmarks, stronger-baseline audits, a clinical panel derived from a 15,752-gene ClinVar scan, structural analyses, rescue curation, and cross-model checks. On canonical TP53, the released covariance-likelihood pair reaches AUC 0.7498 and survives repeated nested cross-validation. On a repaired TP53 structural subset, covariance aligns with a composite structural-tension target better than scalar likelihood does (Spearman 0.309 versus 0.036), indicating that the covariance signal is not merely a nonlinear rewrite of sequence surprise.
The flagship quantitative result is BRCA2: adding covariance-aware hidden-state geometry to the ESM-1v ensemble improves AUC from 0.6324 to 0.6890, a paired gain of 0.0566 with 95% bootstrap confidence interval [0.0131, 0.1063], and that gain disappears under covariance permutation (empirical p = 0.0010). The manuscript's most distinctive qualitative result is a performance-blind scale-repair failure mode: in a high-covariance, high-disagreement basic-substitution regime, ESM2-150M overstates covariance disruption relative to matched controls, while ESM2-650M largely repairs that error pattern. Across 13 gene-matched pairs, mean Frobenius gap reduction is 0.7250 and mean pair-score gap reduction is 0.3632 (exact sign-flip p = 0.000244 for both), whereas the likelihood channel shows no corresponding repair; a stricter same-position sister-substitution follow-up across 5 genes preserves covariance-specific repair across 8 sister pairs (exact p = 0.007812). Breadth remains bounded but real: a clinical-panel audit improves over the frozen ESM2 scalar baseline in 4 of 4 positive-focus genes, yields 12 prioritized rescue candidates, and condenses into a five-case rescue gallery across three genes plus an explicit BRCA1 anti-case; on the support-ranked feasible top-25 panel, 10 genes are positive, 2 are clearly negative, and 13 remain ambiguous. A stronger-baseline MSH2 follow-up is decisively negative, with performance falling from 0.9233 to 0.8457 and the best alpha collapsing to 0.0, while the cross-model analyses remain supportive rather than nuclear and the final localization pass still falls short of a nuclear rescue regime. Local hidden-state covariance is not a universal upgrade to zero-shot missense prediction. It is a falsifiable, auditable, gene-dependent signal that can improve a strong external baseline, expose a concrete ESM2 failure mode, and track real structural strain better than likelihood alone.