feat(geo): add CI workflow for DBSCAN/k-means geo point clustering #4298
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
#4003) Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
@copilot retry
The PR is already complete: all 20 tests are in place (including the AC-9 DBSCAN performance test and AC-10 k-means performance test added in commit
Pull request overview
Adds CI coverage for geo point clustering performance targets by introducing two new performance tests (DBSCAN and k-means) and a dedicated GitHub Actions workflow to run the clustering test suite.
Changes:
- Added DBSCAN (10k points) and k-means (100k points, k=10) performance tests to meet AC-9/AC-10.
- Added a dedicated GitHub Actions workflow to run focused geo clustering tests and upload artifacts.
- Updated test file header metadata and workflow acceptance-criteria comment block.
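The k-means performance test relies on Lloyd's algorithm settling on one cluster per band. A minimal sketch of Lloyd's iteration is shown below. It assumes a hypothetical `LonLat` point type, a hypothetical `SketchKMeansConfig { k, tolerance_m, max_iter }`, a local equirectangular distance in metres, and a deterministic initialisation that takes evenly spaced input indices. The real `kmeansCluster` and its `seed == 0` init may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical types for this sketch; the real GeometryInfo / KMeansConfig differ.
struct LonLat { double lon; double lat; };
struct SketchKMeansConfig { std::size_t k; double tolerance_m; int max_iter; };

// Local equirectangular distance in metres (assumes small extents).
double planarDistM(const LonLat& a, const LonLat& b) {
    constexpr double kPi = 3.14159265358979323846;
    constexpr double kMPerDeg = 6371000.0 * kPi / 180.0;
    const double dx = (b.lon - a.lon) * kMPerDeg * std::cos(0.5 * (a.lat + b.lat) * kPi / 180.0);
    const double dy = (b.lat - a.lat) * kMPerDeg;
    return std::sqrt(dx * dx + dy * dy);
}

// Lloyd's algorithm; returns a cluster index in [0, k) for every point.
std::vector<std::size_t> kmeansSketch(const std::vector<LonLat>& pts,
                                      const SketchKMeansConfig& cfg) {
    if (cfg.k == 0 || cfg.k > pts.size())
        throw std::invalid_argument("k must be in [1, number of points]");
    // Hypothetical deterministic init: evenly spaced input indices.
    std::vector<LonLat> cent(cfg.k);
    for (std::size_t c = 0; c < cfg.k; ++c) cent[c] = pts[c * pts.size() / cfg.k];

    std::vector<std::size_t> assign(pts.size(), 0);
    for (int it = 0; it < cfg.max_iter; ++it) {
        // Assignment step: attach each point to its nearest centroid.
        for (std::size_t i = 0; i < pts.size(); ++i) {
            double best = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < cfg.k; ++c) {
                const double d = planarDistM(pts[i], cent[c]);
                if (d < best) { best = d; assign[i] = c; }
            }
        }
        // Update step: move each centroid to the mean of its members.
        std::vector<LonLat> sum(cfg.k, LonLat{0.0, 0.0});
        std::vector<std::size_t> count(cfg.k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            sum[assign[i]].lon += pts[i].lon;
            sum[assign[i]].lat += pts[i].lat;
            ++count[assign[i]];
        }
        double max_shift_m = 0.0;
        for (std::size_t c = 0; c < cfg.k; ++c) {
            if (count[c] == 0) continue;  // empty cluster keeps its old centroid
            const LonLat next{sum[c].lon / count[c], sum[c].lat / count[c]};
            max_shift_m = std::max(max_shift_m, planarDistM(cent[c], next));
            cent[c] = next;
        }
        // Converged once no centroid moved by tolerance_m or more.
        if (max_shift_m < cfg.tolerance_m) break;
    }
    return assign;
}
```

When points are ordered band by band, as in the test, evenly spaced indices land exactly one per band. That is one way a deterministic init can make an exact cluster-count assertion reliable.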
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/geo/test_geo_clustering.cpp | Adds two performance tests and required includes/header metadata updates. |
| .github/workflows/geo-point-clustering-dbscan-kmeans-ci.yml | Introduces a dedicated workflow to run clustering tests and publish results. |
```cpp
TEST(KMeansCluster, Performance_100kPoints_K10_Under2Seconds) {
  // Generate 100 000 points in 10 well-separated bands (one per cluster),
  // 10 000 points each, offset by 2° in longitude.
  // This is intended to yield exactly 10 stable clusters.
  constexpr int kPointsPerCluster = 10000;
  constexpr int kK = 10;
  std::vector<GeometryInfo> pts;
  pts.reserve(static_cast<std::size_t>(kPointsPerCluster * kK));
  for (int c = 0; c < kK; ++c) {
    const double base_lon = static_cast<double>(c) * 2.0;  // 2° separation
    for (int j = 0; j < kPointsPerCluster; ++j) {
      // Tiny jitter to avoid identical points: 100 discrete steps of 0.0001°
      // (0.0000..0.0099°, ~0..1.1 km).
      const double jitter = (j % 100) * 0.0001;
      pts.push_back(makePoint(base_lon + jitter, 48.0 + jitter));
    }
  }
```
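A quick check of the jitter arithmetic in the excerpt above. It assumes a spherical Earth with radius 6 371 km, an approximation chosen only for this sketch. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Spherical-Earth approximation (mean radius 6 371 km); an assumption for
// this back-of-the-envelope check, not something taken from the PR.
constexpr double kPi = 3.14159265358979323846;
constexpr double kMetresPerDegLat = 6371000.0 * kPi / 180.0;  // ~111.2 km

double metresPerDegLon(double lat_deg) {
    return kMetresPerDegLat * std::cos(lat_deg * kPi / 180.0);
}

// Largest offset produced by the test's jitter formula (j % 100) * 0.0001.
constexpr double kMaxJitterDeg = 99 * 0.0001;  // 0.0099 degrees

// North-south extent of the jitter box, in metres (~1.1 km).
double jitterLatExtentM() { return kMaxJitterDeg * kMetresPerDegLat; }

// East-west extent of the jitter box at lat_deg, in metres (~0.74 km at 48 N).
double jitterLonExtentM(double lat_deg) { return kMaxJitterDeg * metresPerDegLon(lat_deg); }

// Gap between neighbouring bands (2 degrees of longitude) at lat_deg (~149 km at 48 N).
double bandSeparationM(double lat_deg) { return 2.0 * metresPerDegLon(lat_deg); }
```

At 48° N the jitter box is roughly 1.1 km × 0.74 km, and neighbouring bands sit about 149 km apart, so the bands stay well separated. The offset is one-sided and reaches about 1.1 km, which matches the "100 discrete steps of 0.0001° (0.0000..0.0099°, ~0..1.1 km)" wording in the PR description.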
```cpp
  // Performance target: ≤ 5 s single-threaded (design spec, not a hard CI
  // gate, but flagged as a test failure to surface regressions).
  EXPECT_LE(elapsed_s, 5.0)
      << "DBSCAN on 10 000 points exceeded 5 s target (took "
      << elapsed_s << " s)";
```
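The excerpt does not show how `elapsed_s` is measured or how the opt-in gate is checked. The sketch below covers both pieces under stated assumptions. `perfTestsEnabled()` and `timeSeconds()` are hypothetical helpers. The exact-match rule on `"1"` and the use of `std::chrono::steady_clock` are also assumptions, since the PR does not say which clock is used.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <functional>

// Hypothetical helper: perf tests run only when THEMIS_RUN_PERF_TESTS is
// exactly "1" (the PR states the opt-in value; the exact-match rule is assumed).
bool perfTestsEnabled() {
    const char* v = std::getenv("THEMIS_RUN_PERF_TESTS");
    return v != nullptr && std::strcmp(v, "1") == 0;
}

// Wall-clock duration of `work` in seconds, measured with a monotonic clock
// so that system clock adjustments cannot distort the result.
double timeSeconds(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In a GoogleTest body, `if (!perfTestsEnabled()) GTEST_SKIP() << "set THEMIS_RUN_PERF_TESTS=1";` would come first. The test would then compute `const double elapsed_s = timeSeconds([&] { /* run dbscanCluster */ });` before the `EXPECT_LE` check. As a result, shared runners report the test as skipped rather than failed.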
```yaml
# Triggered on every push / PR that touches the clustering sources,
# its test file, or this workflow itself.
# Targets v1.8.0 (roadmap:geo:phase4:clustering:Issue-1747).
```
```yaml
run: |
  set -o pipefail
  cd build
  ./tests/themis_tests \
```
@copilot Implement the review comments
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
All 5 review comments addressed in commit
Description
Adds CI coverage and completes the test suite for the geo point clustering implementation (DBSCAN and k-means) targeting Issue #4003 / roadmap:172:geo:v1.8.0.
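For readers less familiar with DBSCAN, a minimal brute-force sketch of the algorithm on lon/lat points is shown below. The `LonLat` and `SketchDbscanConfig { eps_m, min_pts }` types, the haversine distance, and the `kUnvisited` marker are assumptions for illustration. Only the noise label `-1` mirrors `kDbscanNoise` from the issue. This is not the `dbscanCluster` code in `src/geo/geo_clustering.cpp`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real GeometryInfo / DbscanConfig may differ.
struct LonLat { double lon; double lat; };
struct SketchDbscanConfig { double eps_m; std::size_t min_pts; };
constexpr int kNoise = -1;      // mirrors kDbscanNoise (-1) from the issue
constexpr int kUnvisited = -2;  // internal marker, sketch-only

// Great-circle distance in metres (haversine, spherical Earth).
double haversineM(const LonLat& a, const LonLat& b) {
    constexpr double kR = 6371000.0;
    constexpr double kRad = 3.14159265358979323846 / 180.0;
    const double dlat = (b.lat - a.lat) * kRad;
    const double dlon = (b.lon - a.lon) * kRad;
    const double h = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(a.lat * kRad) * std::cos(b.lat * kRad) *
                         std::sin(dlon / 2) * std::sin(dlon / 2);
    return 2.0 * kR * std::asin(std::sqrt(h));
}

// Brute-force region query (includes i itself): O(n) per call.
std::vector<std::size_t> neighbours(const std::vector<LonLat>& pts, std::size_t i, double eps_m) {
    std::vector<std::size_t> out;
    for (std::size_t j = 0; j < pts.size(); ++j)
        if (haversineM(pts[i], pts[j]) <= eps_m) out.push_back(j);
    return out;
}

// Classic DBSCAN: labels 0..C-1 for clusters, kNoise for outliers.
std::vector<int> dbscanSketch(const std::vector<LonLat>& pts, const SketchDbscanConfig& cfg) {
    std::vector<int> label(pts.size(), kUnvisited);
    int next_cluster = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (label[i] != kUnvisited) continue;
        std::vector<std::size_t> seeds = neighbours(pts, i, cfg.eps_m);
        if (seeds.size() < cfg.min_pts) { label[i] = kNoise; continue; }
        const int c = next_cluster++;
        label[i] = c;
        for (std::size_t k = 0; k < seeds.size(); ++k) {  // seeds may grow inside the loop
            const std::size_t q = seeds[k];
            if (label[q] == kNoise) label[q] = c;  // border point: claim it, do not expand
            if (label[q] != kUnvisited) continue;
            label[q] = c;
            const std::vector<std::size_t> qn = neighbours(pts, q, cfg.eps_m);
            if (qn.size() >= cfg.min_pts) seeds.insert(seeds.end(), qn.begin(), qn.end());
        }
    }
    return label;
}
```

Each region query scans every point, so this sketch is O(n²): about 10⁸ distance evaluations for the 10 000-point performance test. A spatial index such as a grid or tree is the usual way to reduce that cost. The description does not say whether `dbscanCluster` uses one.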
Changes made:
- `DbscanCluster.Performance_10kPoints_Under5Seconds` (AC-9) and `KMeansCluster.Performance_100kPoints_K10_Under2Seconds` (AC-10)
- Performance tests gated behind `THEMIS_RUN_PERF_TESTS=1`, with `GTEST_SKIP()` otherwise, to avoid flakiness on shared CI runners
- `seed=0` deterministic init picks one centroid per cluster, making `EXPECT_EQ(num_clusters, kK)` reliable
- Jitter comment changed from `±0.005°` to "100 discrete steps of 0.0001° (0.0000..0.0099°, ~0..1.1 km)"
- Dedicated workflow (`geo-point-clustering-dbscan-kmeans-ci.yml`) covering all 10 acceptance criteria (AC-1 through AC-10)
- Roadmap reference `Issue-1747` → `Issue-4003` (roadmap:172:geo:v1.8.0:geo-point-clustering-dbscan-and-k-means)
- `timeout 300` added to the unified binary test step to prevent CI stalls
- Note on the `THEMIS_RUN_PERF_TESTS=1` opt-in requirement

Type of Change
Testing
📚 Research & Knowledge (if applicable)
- Created in `/docs/research/`?
- Entered in `/docs/research/implementation_influence/`?

Relevant sources:
Checklist
Original prompt
This section details the original issue you should resolve
<issue_title>Geo Point Clustering: DBSCAN and k-means</issue_title>
<issue_description>### Context
This issue implements the roadmap item 'Geo Point Clustering: DBSCAN and k-means' for the geo domain. It is sourced from the consolidated roadmap under 🟡 Medium Priority — Near-term (v1.5.0 – v1.8.0) and targets milestone v1.8.0.
Primary detail section: Geo Point Clustering: DBSCAN and k-means
Goal
Deliver the scoped changes for Geo Point Clustering: DBSCAN and k-means in src/geo/ and complete the linked detail section in a release-ready state for v1.8.0.
Detailed Scope
Geo Point Clustering: DBSCAN and k-means
Priority: Medium
Target Version: v2.4.0
Status: ✅ Implemented in `include/geo/geo_clustering.h` + `src/geo/geo_clustering.cpp`

What was implemented:
- `dbscanCluster(points, DbscanConfig)`: density-based spatial clustering: … `kDbscanNoise` (-1)
- `kmeansCluster(points, KMeansConfig)`: Lloyd's algorithm: … (`seed == 0`) or k-means++ probabilistic seeding (`seed != 0`, LCG PRNG)
- … spanning < a few hundred kilometres
- … `tolerance_m` computation
- … `std::invalid_argument` when k == 0 or k > valid point count
- `tests/geo/test_geo_clustering.cpp`

Performance Targets (design):
Scientific References:
Acceptance Criteria
- `dbscanCluster(points, DbscanConfig)`: density-based spatial clustering: … `kDbscanNoise` (-1)
- `kmeansCluster(points, KMeansConfig)`: Lloyd's algorithm: … (`seed == 0`) or …
- … `tolerance_m`
- … `std::invalid_argument` when k == 0 or k > valid point count
- `tests/geo/test_geo_clustering.cpp`

Relationships
References
Generated from the consolidated source roadmap. Keep the roadmap and issue in sync wh...
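The issue lists k-means++ probabilistic seeding driven by an LCG PRNG when `seed != 0`. In k-means++ seeding, the first centre is chosen uniformly, and each later centre is chosen with probability proportional to the squared distance to the nearest existing centre. A minimal sketch is shown below. The LCG constants (the common 1664525 / 1013904223 pair), the `LonLat` type, and the planar distance are assumptions, not taken from `src/geo/geo_clustering.cpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

struct LonLat { double lon; double lat; };  // hypothetical point type

// Minimal 32-bit LCG; these constants are an assumption for the sketch.
class Lcg {
public:
    explicit Lcg(std::uint32_t seed) : state_(seed) {}
    std::uint32_t next() { state_ = 1664525u * state_ + 1013904223u; return state_; }
    double uniform01() { return next() / 4294967296.0; }  // in [0, 1)
private:
    std::uint32_t state_;
};

// Squared local equirectangular distance in square metres.
double sqDistM(const LonLat& a, const LonLat& b) {
    constexpr double kPi = 3.14159265358979323846;
    constexpr double kMPerDeg = 6371000.0 * kPi / 180.0;
    const double dx = (b.lon - a.lon) * kMPerDeg * std::cos(0.5 * (a.lat + b.lat) * kPi / 180.0);
    const double dy = (b.lat - a.lat) * kMPerDeg;
    return dx * dx + dy * dy;
}

// k-means++ seeding; returns indices into pts. May return fewer than k seeds
// if every remaining point coincides with an already chosen centre.
std::vector<std::size_t> kmeansPlusPlusSeeds(const std::vector<LonLat>& pts,
                                             std::size_t k, std::uint32_t seed) {
    if (k == 0 || k > pts.size())
        throw std::invalid_argument("k must be in [1, number of points]");
    Lcg rng(seed);
    std::vector<std::size_t> seeds;
    seeds.push_back(static_cast<std::size_t>(rng.uniform01() * pts.size()));  // uniform first centre
    std::vector<double> d2(pts.size(), std::numeric_limits<double>::max());
    while (seeds.size() < k) {
        // Update D(x)^2 against the most recently added centre.
        const LonLat& last = pts[seeds.back()];
        double total = 0.0;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            d2[i] = std::min(d2[i], sqDistM(pts[i], last));
            total += d2[i];
        }
        if (total == 0.0) break;
        // Sample an index with probability d2[i] / total.
        double target = rng.uniform01() * total;
        std::size_t pick = pts.size() - 1;  // fallback for rounding at the tail
        for (std::size_t i = 0; i < pts.size(); ++i) {
            target -= d2[i];
            if (target < 0.0) { pick = i; break; }
        }
        seeds.push_back(pick);
    }
    return seeds;
}
```

Because the PRNG is seeded explicitly, the same `seed` always reproduces the same initial centres, which keeps seeded runs repeatable in tests.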