
Commit 2111985

Use slicks for scanning and discovery (#20)
* Use fixed origin for DATE_BIN alignment

  Ensure DATE_BIN uses a fixed origin (config.start) instead of the per-query t0 so bucket boundaries remain consistent across chunked/split queries. Adds a bin_origin variable and points the third argument of the SQL DATE_BIN(...) call at that origin, preventing misaligned bins when a scan runs in multiple segments. (A short sketch of the aligned query appears after this message.)

* Use slicks for scanning and sensor discovery

  Replace custom InfluxDB SQL scans with the slicks library: add slicks to requirements, delegate data-availability scanning to slicks.scan_data_availability and convert its ScanResult into the flat API dict format, and replace manual sensor SQL queries with slicks.discover_sensors (including fallback behavior). Update the service connectivity check to use slicks.connect_influxdb3 / slicks.get_influx_client. Add docstrings and small refactors (private helper renames) while removing the direct InfluxDBClient3 SQL query logic.

* Slackbot code cleanup

  Handle Slack event errors.

* Revert "Use slicks for scanning and sensor discovery"

  This reverts commit ab09885.

* Sensor scan fallback logic update

  _build_sensor_fallback_range now finds the run with the longest duration and uses its time range for the fallback.

* Use slicks for scanning and discovery

* Update CI

* Add status page, timezone fix, test utilities

  Add a simple HTML status page at / (app.py) that reports InfluxDB connectivity, runs, sensors and scanner status for quick debugging (imports HTMLResponse). Adjust periodic worker retry behavior to wait 60s after failures and keep the regular interval after successful scans (periodic_worker.py). Fix timezone handling in sensor discovery (sql.py) by converting UTC-aware datetimes to naive datetimes to work around slicks 0.1.3 generating invalid SQL for InfluxDB 3; apply the same fix to fallback ranges. Add several debugging/reproduction scripts for inspecting slicks and InfluxDB behavior (installer/data-downloader/testing/*) to help reproduce timestamp/connection issues. Update docker-compose to inject INFLUX_URL, INFLUX_TOKEN and INFLUX_DB into the api and scanner services so slicks child processes can self-configure.

* Add multi-season support to data-downloader

  Introduce multi-season configuration and handling across the data-downloader service and frontend. Key changes:

  - Config: add SeasonConfig and _parse_seasons to read the SEASONS env var; expose seasons in Settings with sensible defaults.
  - Backend services: manage per-season Runs/Sensors repositories (file suffixing), return the seasons list via /api/seasons, accept an optional season parameter for the runs/sensors/note/query endpoints, scan all configured seasons and store results per-season, and default to the newest season when none is provided.
  - Influx/query: fetch_signal_series now accepts a database override, and the service query routes pass season-specific DBs.
  - Scanner/Storage: adjust season start/end boundaries (a season starts in August of the previous year and ends on Jan 1 of the next year); storage filenames include a season suffix.
  - Periodic worker: support scheduling either by interval or at a daily time via the SCAN_DAILY_TIME env var.
  - Frontend: fetch seasons, add a season selector UI, propagate the selected season to all API calls, adapt visuals (accent color), and add a season type to types.
  - Docker compose: add SEASONS and SCAN_DAILY_TIME environment wiring and minor formatting fixes.
  - .gitignore: ignore installer/data-downloader/data.

  Fallbacks and defaults are provided when SEASONS is unset, and scanning continues per-season even if individual season scans fail.

* Data Downloader Flowchart

* Restore lappy_test_image.png

  Update .gitignore.

* Parse schema.table and pass to DB connector

  Handle config.table values that may include a schema (e.g. "iox.WFR25"). server_scanner.scan_runs now splits config.table into schema and table_name and passes them to slicks.connect_influxdb3; it also sets the local table variable to None to rely on the global configuration. sql.fetch_unique_sensors now includes schema and table when connecting as well. These changes ensure the DB connector receives explicit schema/table info and that scanning functions behave correctly when config.table is provided as "schema.table".

* Update requirements.txt
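To make the first item concrete: a minimal sketch of the alignment fix. The names `bin_origin` and `availability_sql` are made up for illustration, and `config.start` is assumed to be the season start; the commit's actual query text is not reproduced in this diff.

```python
from datetime import datetime, timezone

# Stand-in for config.start; any fixed instant works, as long as it
# never changes between query segments.
bin_origin = datetime(2025, 1, 1, tzinfo=timezone.utc)

def availability_sql(t0: datetime, t1: datetime, table: str = "WFR25") -> str:
    """Build a bin-count query whose buckets align to bin_origin, not t0."""
    # DATE_BIN(stride, source, origin): passing the fixed origin as the third
    # argument keeps bucket boundaries identical across chunked/split scans.
    return f"""
        SELECT DATE_BIN(INTERVAL '1 hour', time, TIMESTAMP '{bin_origin.isoformat()}') AS bucket,
               COUNT(*) AS n
        FROM "{table}"
        WHERE time >= TIMESTAMP '{t0.isoformat()}'
          AND time < TIMESTAMP '{t1.isoformat()}'
        GROUP BY bucket
        ORDER BY bucket
    """
```

With the per-query t0 as the origin, two adjacent scan segments each start a fresh bucket grid at their own t0, so the buckets at the seam disagree; a fixed origin removes that drift.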
1 parent 77d6a8a commit 2111985

27 files changed: 830 additions & 337 deletions

.github/workflows/docker-build.yml (12 additions, 2 deletions)

@@ -17,6 +17,13 @@ jobs:
       - name: Checkout code
         uses: actions/checkout@v4

+      - name: Prepare environment file for CI
+        run: |
+          if [ ! -f installer/.env ]; then
+            cp installer/.env.example installer/.env
+            echo "Created .env file from .env.example"
+          fi
+
       - name: Validate docker-compose files
         run: |
           cd installer
@@ -32,7 +39,7 @@ jobs:

     strategy:
       matrix:
-        service: [slackbot, lappy, startup-data-loader, file-uploader]
+        service: [slackbot, lap-detector, startup-data-loader, file-uploader]

     steps:
       - name: Checkout code
@@ -52,7 +59,10 @@ jobs:
       - name: Extract metadata
         id: meta
         run: |
-          echo -e "tags=${{ env.REGISTRY }}/${IMAGE_NAME_LOWER}/${{ matrix.service }}:latest\n${{ env.REGISTRY }}/${IMAGE_NAME_LOWER}/${{ matrix.service }}:${{ github.sha }}" >> $GITHUB_OUTPUT
+          echo "tags<<EOF" >> $GITHUB_OUTPUT
+          echo "${{ env.REGISTRY }}/${IMAGE_NAME_LOWER}/${{ matrix.service }}:latest" >> $GITHUB_OUTPUT
+          echo "${{ env.REGISTRY }}/${IMAGE_NAME_LOWER}/${{ matrix.service }}:${{ github.sha }}" >> $GITHUB_OUTPUT
+          echo "EOF" >> $GITHUB_OUTPUT

       - name: Build and push Docker image
         uses: docker/build-push-action@v5
.github/workflows/stack-smoke-test.yml (9 additions, 0 deletions)

@@ -29,8 +29,17 @@ jobs:
       - name: Ensure CI script is executable
         run: chmod +x ./dev-utils/ci/stack-smoke-test.sh

+      - name: Prepare environment file for CI
+        run: |
+          if [ ! -f installer/.env ]; then
+            cp installer/.env.example installer/.env
+            echo "Created .env file from .env.example"
+          fi
+          ls -la installer/.env
+
       - name: Run installer smoke test
         env:
+          CI: "true"
           DOCKER_BUILDKIT: "1"
           COMPOSE_DOCKER_CLI_BUILD: "1"
         run: ./dev-utils/ci/stack-smoke-test.sh

.gitignore (6 additions, 0 deletions)

@@ -203,7 +203,13 @@ installer/*.dbc
 !installer/example.dbc

 installer/slackbot/logs/*
+
+installer/sandbox/prompt-guide.txt
+
+wfr-telemetry
+/installer/data-downloader/data
 installer/slackbot/*.png
+!installer/slackbot/lappy_test_image.png
 installer/slackbot/*.jpg
 installer/slackbot/*.jpeg


installer/data-downloader/README.md (45 additions, 0 deletions)

@@ -21,6 +21,51 @@ Both JSON files are shared through the `./data` directory so every service (fron
 3. Open http://localhost:3000 to access the web UI, and keep the API running on http://localhost:8000 if you want to call it directly.

 ## Runtime behaviour
+```mermaid
+sequenceDiagram
+    participant Worker as periodic_worker.py
+    participant Service as DataDownloaderService
+    participant Scanner as server_scanner.py
+    participant Slicks as slicks library
+    participant InfluxDB as InfluxDB3
+    participant Storage as JSON Storage
+
+    Worker->>Service: run_full_scan(source="periodic")
+    Service->>Service: Sort seasons by year (newest first)
+
+    loop For each season (WFR25, WFR26)
+        Service->>Scanner: scan_runs(ScannerConfig{<br/>database: season.database,<br/>year: season.year})
+        Scanner->>Slicks: connect_influxdb3(url, token, db)
+        Scanner->>Slicks: scan_data_availability(start, end, table, bin_size)
+
+        loop Adaptive scanning (inside slicks)
+            Slicks->>InfluxDB: Try query_grouped_bins()<br/>(DATE_BIN + COUNT(*))
+            alt Success
+                InfluxDB-->>Slicks: Return bins with counts
+            else Failure (timeout/size)
+                Slicks->>Slicks: Binary subdivision
+                Slicks->>InfluxDB: query_exists_per_bin()<br/>(SELECT 1 LIMIT 1 per bin)
+                InfluxDB-->>Slicks: Return existence flags
+            end
+        end
+
+        Slicks-->>Scanner: ScanResult (windows)
+        Scanner-->>Service: List[dict] (formatted runs)
+
+        Service->>Service: fetch_unique_sensors(season.database)
+        Service->>Storage: runs_repos[season.name].merge_scanned_runs(runs)
+        Storage-->>Storage: Atomic write to runs_WFR25.json
+        Service->>Storage: sensors_repos[season.name].write_sensors(sensors)
+        Storage-->>Storage: Atomic write to sensors_WFR25.json
+
+        alt Season scan failed
+            Service->>Service: Log error, continue to next season
+        end
+    end
+
+    Service->>Storage: status_repo.mark_finish(success)
+    Storage-->>Storage: Update scanner_status.json
+```

 - `frontend` serves the compiled React bundle via nginx and now proxies `/api` requests (including `/api/scan` and `/api/scanner-status`) directly to the FastAPI container. When the UI is loaded from anything other than `localhost`, the client automatically falls back to relative `/api/...` calls so a single origin on a VPS still reaches the backend. Override `VITE_API_BASE_URL` if you want the UI to talk to a different host (for example when running `npm run dev` locally) and keep that host in `ALLOWED_ORIGINS`.
 - `api` runs `uvicorn backend.app:app`, exposing
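The adaptive loop in the mermaid diagram above can be summarised in a few lines. This is an illustrative sketch only: `Window`, `grouped`, and `exists` are invented stand-ins, and the real subdivision logic lives inside the slicks library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

@dataclass
class Window:
    start: datetime
    end: datetime

    def split(self) -> tuple["Window", "Window"]:
        mid = self.start + (self.end - self.start) / 2
        return Window(self.start, mid), Window(mid, self.end)

def scan(window: Window,
         grouped: Callable[[Window], list],
         exists: Callable[[Window], list],
         min_span: timedelta) -> list:
    """Try the cheap grouped query first; subdivide on failure."""
    try:
        return grouped(window)            # one DATE_BIN + COUNT(*) query
    except TimeoutError:
        if window.end - window.start <= min_span:
            return exists(window)         # SELECT 1 LIMIT 1 per bin
        left, right = window.split()      # binary subdivision
        return (scan(left, grouped, exists, min_span)
                + scan(right, grouped, exists, min_span))
```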

installer/data-downloader/backend/app.py (86 additions, 10 deletions)

@@ -4,6 +4,7 @@

 from fastapi import BackgroundTasks, FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import HTMLResponse
 from pydantic import BaseModel

 from backend.config import get_settings
@@ -40,14 +41,19 @@ def healthcheck() -> dict:
     return {"status": "ok"}


+@app.get("/api/seasons")
+def list_seasons() -> List[dict]:
+    return service.get_seasons()
+
+
 @app.get("/api/runs")
-def list_runs() -> dict:
-    return service.get_runs()
+def list_runs(season: str | None = None) -> dict:
+    return service.get_runs(season=season)


 @app.get("/api/sensors")
-def list_sensors() -> dict:
-    return service.get_sensors()
+def list_sensors(season: str | None = None) -> dict:
+    return service.get_sensors(season=season)


 @app.get("/api/scanner-status")
@@ -56,10 +62,10 @@ def scanner_status() -> dict:


 @app.post("/api/runs/{key}/note")
-def save_note(key: str, payload: NotePayload) -> dict:
-    run = service.update_note(key, payload.note.strip())
+def save_note(key: str, payload: NotePayload, season: str | None = None) -> dict:
+    run = service.update_note(key, payload.note.strip(), season=season)
     if not run:
-        raise HTTPException(status_code=404, detail=f"Run {key} not found")
+        raise HTTPException(status_code=404, detail=f"Run {key} not found (season={season})")
     return run


@@ -69,7 +75,77 @@ def trigger_scan(background_tasks: BackgroundTasks) -> dict:
     return {"status": "scheduled"}


-@app.post("/api/data/query")
-def query_data(payload: DataQueryPayload) -> dict:
+@app.post("/api/query")
+def query_signal(payload: DataQueryPayload, season: str | None = None) -> dict:
     limit = None if payload.no_limit else (payload.limit or 2000)
-    return service.query_signal_series(payload.signal, payload.start, payload.end, limit)
+    return service.query_signal_series(
+        payload.signal,
+        payload.start,
+        payload.end,
+        limit,
+        season=season
+    )
+
+
+@app.get("/", response_class=HTMLResponse)
+def index():
+    """Simple status page for debugging."""
+    influx_status = "Unknown"
+    influx_color = "gray"
+    try:
+        service._log_influx_connectivity()
+        influx_status = "Connected"
+        influx_color = "green"
+    except Exception as e:
+        influx_status = f"Error: {e}"
+        influx_color = "red"
+
+    # Default to first season for overview
+    runs = service.get_runs()
+    sensors = service.get_sensors()
+    scanner_status = service.get_scanner_status()
+    seasons_list = service.get_seasons()
+    seasons_html = ", ".join([f"{s['name']} ({s['year']})" for s in seasons_list])
+
+    html = f"""
+    <!DOCTYPE html>
+    <html>
+    <head>
+        <title>DAQ Data Downloader Status</title>
+        <style>
+            body {{ font-family: sans-serif; max-width: 800px; margin: 2rem auto; line-height: 1.6; }}
+            h1 {{ border-bottom: 2px solid #eee; padding-bottom: 0.5rem; }}
+            .card {{ border: 1px solid #ddd; border-radius: 8px; padding: 1.5rem; margin-bottom: 1.5rem; }}
+            .status-ok {{ color: green; font-weight: bold; }}
+            .status-err {{ color: red; font-weight: bold; }}
+            code {{ background: #f4f4f4; padding: 2px 5px; border-radius: 4px; }}
+        </style>
+    </head>
+    <body>
+        <h1>DAQ Data Downloader Status</h1>
+
+        <div class="card">
+            <h2>System Status</h2>
+            <p><strong>InfluxDB Connection:</strong> <span style="color: {influx_color}">{influx_status}</span></p>
+            <p><strong>Scanner Status:</strong> {scanner_status.get('status', 'Unknown')} (Last run: {scanner_status.get('last_run', 'Never')})</p>
+            <p><strong>API Version:</strong> 1.1.0 (Multi-Season Support)</p>
+        </div>
+
+        <div class="card">
+            <h2>Active Config</h2>
+            <p><strong>Seasons Configured:</strong> {seasons_html}</p>
+        </div>
+
+        <div class="card">
+            <h2>Default Season Stats ({seasons_list[0]['name'] if seasons_list else 'None'})</h2>
+            <ul>
+                <li><strong>Runs Found:</strong> {len(runs.get('runs', []))}</li>
+                <li><strong>Sensors Found:</strong> {len(sensors.get('sensors', []))}</li>
+            </ul>
+        </div>
+
+        <p><a href="/docs">API Docs</a> | <a href="/api/seasons">JSON Seasons List</a> | <a href="http://localhost:3000">Frontend</a></p>
+    </body>
+    </html>
+    """
+    return HTMLResponse(content=html)
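A quick usage sketch of the season-aware API above. The signal name and time range are invented for illustration; `season` rides along as a query parameter on every call.

```python
import requests

BASE = "http://localhost:8000"

# Season list, newest first, drives the frontend's season selector.
seasons = requests.get(f"{BASE}/api/seasons").json()
print([s["name"] for s in seasons])

# `season` is optional on every data endpoint; omitting it selects the
# newest configured season.
runs = requests.get(f"{BASE}/api/runs", params={"season": "WFR25"}).json()

# /api/query routes the request at the selected season's database.
resp = requests.post(
    f"{BASE}/api/query",
    params={"season": "WFR25"},
    json={
        "signal": "engine_rpm",  # hypothetical signal name
        "start": "2025-05-01T00:00:00Z",
        "end": "2025-05-01T01:00:00Z",
        "limit": 2000,
        "no_limit": False,
    },
)
print(resp.json()["row_count"])
```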

installer/data-downloader/backend/config.py (53 additions, 3 deletions)

@@ -12,26 +12,76 @@ def _parse_origins(raw: str | None) -> List[str]:
     return [origin.strip() for origin in raw.split(",") if origin.strip()]


+class SeasonConfig(BaseModel):
+    name: str  # e.g. "WFR25"
+    year: int  # e.g. 2025
+    database: str  # e.g. "WFR25"
+    color: str | None = None  # e.g. "222 76 153"
+
+
+def _parse_seasons(raw: str | None) -> List[SeasonConfig]:
+    """Parse SEASONS env var: "WFR25:2025:222 76 153,WFR26:2026:..."."""
+    if not raw:
+        # Default fallback if not set
+        return [SeasonConfig(name="WFR25", year=2025, database="WFR25", color="#DE4C99")]
+
+    seasons = []
+    for part in raw.split(","):
+        part = part.strip()
+        if not part:
+            continue
+        try:
+            # Split into at most 3 parts: Name, Year, Color
+            parts = part.split(":", 2)
+            name = parts[0]
+
+            if len(parts) >= 2:
+                year = int(parts[1])
+            else:
+                # A bare name like "WFR25" splits to ['WFR25'] with no year;
+                # require at least name:year, so skip it.
+                continue
+
+            color = parts[2] if len(parts) > 2 else None
+
+            # Assume DB name matches season name
+            seasons.append(SeasonConfig(name=name, year=year, database=name, color=color))
+        except ValueError:
+            continue
+
+    if not seasons:
+        return [SeasonConfig(name="WFR25", year=2025, database="WFR25")]
+
+    # Sort by year descending (newest first)
+    seasons.sort(key=lambda s: s.year, reverse=True)
+    return seasons
+
+
 class Settings(BaseModel):
     """Centralised configuration pulled from environment variables."""

     data_dir: str = Field(default_factory=lambda: os.getenv("DATA_DIR", "./data"))

     influx_host: str = Field(default_factory=lambda: os.getenv("INFLUX_HOST", "http://localhost:9000"))
     influx_token: str = Field(default_factory=lambda: os.getenv("INFLUX_TOKEN", ""))
-    influx_database: str = Field(default_factory=lambda: os.getenv("INFLUX_DATABASE", "WFR25"))
+
+    # Global/Default Influx settings (used for connectivity check or default fallback)
     influx_schema: str = Field(default_factory=lambda: os.getenv("INFLUX_SCHEMA", "iox"))
     influx_table: str = Field(default_factory=lambda: os.getenv("INFLUX_TABLE", "WFR25"))

-    scanner_year: int = Field(default_factory=lambda: int(os.getenv("SCANNER_YEAR", "2025")))
-    scanner_bin: str = Field(default_factory=lambda: os.getenv("SCANNER_BIN", "hour"))  # hour or day
+    seasons: List[SeasonConfig] = Field(default_factory=lambda: _parse_seasons(os.getenv("SEASONS")))
+
+    # Scanner settings common to all seasons (unless we want per-season granularity later)
+    scanner_bin: str = Field(default_factory=lambda: os.getenv("SCANNER_BIN", "hour"))
     scanner_include_counts: bool = Field(default_factory=lambda: os.getenv("SCANNER_INCLUDE_COUNTS", "true").lower() == "true")
     scanner_initial_chunk_days: int = Field(default_factory=lambda: int(os.getenv("SCANNER_INITIAL_CHUNK_DAYS", "31")))

     sensor_window_days: int = Field(default_factory=lambda: int(os.getenv("SENSOR_WINDOW_DAYS", "7")))
     sensor_lookback_days: int = Field(default_factory=lambda: int(os.getenv("SENSOR_LOOKBACK_DAYS", "30")))

     periodic_interval_seconds: int = Field(default_factory=lambda: int(os.getenv("SCAN_INTERVAL_SECONDS", "3600")))
+    scan_daily_time: str | None = Field(default_factory=lambda: os.getenv("SCAN_DAILY_TIME"))

     allowed_origins: List[str] = Field(default_factory=lambda: _parse_origins(os.getenv("ALLOWED_ORIGINS", "*")))
installer/data-downloader/backend/influx_queries.py

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,14 @@ def _normalize(dt: datetime) -> datetime:
1414
return dt.astimezone(timezone.utc)
1515

1616

17-
def fetch_signal_series(settings: Settings, signal: str, start: datetime, end: datetime, limit: int | None) -> dict:
17+
def fetch_signal_series(
18+
settings: Settings,
19+
signal: str,
20+
start: datetime,
21+
end: datetime,
22+
limit: int | None,
23+
database: str | None = None
24+
) -> dict:
1825
start_dt = _normalize(start)
1926
end_dt = _normalize(end)
2027
if start_dt >= end_dt:
@@ -35,8 +42,11 @@ def fetch_signal_series(settings: Settings, signal: str, start: datetime, end: d
3542
AND time <= TIMESTAMP '{end_dt.isoformat()}'
3643
ORDER BY time{limit_clause}
3744
"""
45+
46+
# Use provided database or fallback to default setting
47+
target_db = database if database else settings.influx_database
3848

39-
with InfluxDBClient3(host=settings.influx_host, token=settings.influx_token, database=settings.influx_database) as client:
49+
with InfluxDBClient3(host=settings.influx_host, token=settings.influx_token, database=target_db) as client:
4050
tbl = client.query(sql)
4151
points = []
4252
for idx in range(tbl.num_rows):
@@ -56,6 +66,7 @@ def fetch_signal_series(settings: Settings, signal: str, start: datetime, end: d
5666
"start": start_dt.isoformat(),
5767
"end": end_dt.isoformat(),
5868
"limit": limit,
69+
"database": target_db,
5970
"row_count": len(points),
6071
"points": points,
6172
"sql": " ".join(line.strip() for line in sql.strip().splitlines()),
