
cmd/seed: synthetic dataset for local development#70

Open
ZukwiZ wants to merge 1 commit into master from feat/dev-seed-synthetic

Conversation

@ZukwiZ
Collaborator

@ZukwiZ ZukwiZ commented May 27, 2026

Summary

Add a deterministic ~6-month synthetic dataset (~9.8k rows, gentle sinusoid with occasional spikes and quiet days) for exercising the dashboard locally without needing a real production export. The generator deliberately spans every period the chart picker offers (7d / 30d / 3m / 6m / 1y), so any range renders meaningful data.

Safety properties

  • Refuses to run unless `Config.Environment == "development"`.
  • `INSERT … ON CONFLICT (id) DO NOTHING`, so re-running is a no-op.
  • Steam IDs use a clearly-synthetic `76561198000000000` prefix; real Steam IDs sit higher up the 64-bit range, so generated rows are easy to spot in the DB.
  • Snowflake IDs encode the same `created_at` + sequence layout as the production generator (`domain/models/snowflake.go`), so synthetic rows sort chronologically alongside any real rows already present.

.gitignore

Adds `internal-docs/` (author scratch space) and `internal/devseed/fixtures/` (room for any future local-only CSV fixtures) so neither leaks into the public repo.

Test plan

  • `go build ./cmd/seed`
  • Against a local dev DB with `Environment=development`: `go run ./cmd/seed` prints `generated …` and `seed complete: N inserted, 0 already present`.
  • Re-run: prints `seed complete: 0 inserted, N already present` (idempotent).
  • With `Environment` set to anything else: exits non-zero with a refusal message.
  • After seeding, the dashboard at `/` shows ~6 months of activity across all period picker ranges.

Made with Cursor


Note

Low Risk
Dev-only CLI with an environment guard and idempotent inserts; not linked to the production server binary.

Overview
Adds a dev-only seed path so local Postgres can hold a deterministic ~6-month reversal history (~9.8k rows) without production exports.

go run ./cmd/seed loads config, exits unless Environment is development, then bulk-inserts generated rows into the public DB via ON CONFLICT (id) DO NOTHING (safe to re-run). internal/devseed builds daily volume with variance, marketplace mix, sources, optional expungements, synthetic Steam IDs, and snowflake IDs aligned with production ordering.

README documents the workflow and dashboard period coverage; .gitignore excludes internal-docs/ and optional internal/devseed/fixtures/.

Reviewed by Cursor Bugbot for commit b067b6b. Bugbot is set up for automated code reviews on this repo. Configure here.

Add a deterministic ~6-month synthetic dataset (~9.8k rows, gentle
sinusoid with occasional spikes and quiet days) for exercising the
dashboard locally without needing real production exports. The
generator deliberately spans every period (7d / 30d / 3m / 6m / 1y)
so the chart UI has data to render at any range.

Safety properties:

- Refuses to run unless Config.Environment == "development".
- INSERT … ON CONFLICT (id) DO NOTHING, so re-running is a no-op.
- Steam IDs use a clearly-synthetic 76561198000000000 prefix.
- Snowflake IDs encode the same created_at + sequence layout as
  the production generator, so synthetic rows sort chronologically
  alongside any real rows already in the DB.

internal-docs/ and internal/devseed/fixtures/ are added to .gitignore
to keep author scratch space and any future local CSV fixtures out of
the public repo.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread cmd/seed/main.go
}
}()

reversals := devseed.GenerateSynthetic(time.Now().UTC())
Seed not idempotent due to time-dependent snowflake IDs

Medium Severity

GenerateSynthetic receives time.Now().UTC(), and all generated snowflake IDs embed createdAt timestamps derived from that value. Since today shifts daily and the nowMs cap changes every millisecond, re-running the seed at a different time produces entirely different snowflake IDs. Because ON CONFLICT (id) DO NOTHING keys on these IDs, a second run on a different day inserts ~9.8k additional rows instead of being a no-op, contradicting the documented idempotency guarantee. Anchoring to a fixed reference time instead of time.Now() would make the output truly deterministic.

Additional Locations (2)


2 participants