diff --git a/.env.example b/.env.example
index 1474a13..ccd6d8a 100644
--- a/.env.example
+++ b/.env.example
@@ -28,3 +28,51 @@ NEO4J_AUTH=neo4j/neo4jpassword
 # ------------------------------
 NEO4J_HEAP_INIT=1G
 NEO4J_HEAP_MAX=2G
+
+# ------------------------------
+# GraphRAG / ChatSEEK Integration
+# ------------------------------
+# Enable GraphRAG features (set to 1 to enable)
+# REQUIRES: pip install -r requirements-graphrag.txt
+# (installs PyTorch + CUDA stack, 4-8GB download)
+# Lightweight install: pip install -r requirements.txt (chat disabled)
+SCIDK_GRAPHRAG_ENABLED=0
+
+# LLM provider: "local_ollama" (default), "openai", or "azure_openai"
+SCIDK_GRAPHRAG_LLM_PROVIDER=local_ollama
+
+# Ollama model name (default: llama3:8b)
+SCIDK_GRAPHRAG_MODEL=llama3:8b
+
+# Schema cache TTL in seconds (default: 300)
+SCIDK_GRAPHRAG_SCHEMA_CACHE_TTL_SEC=300
+
+# Verbose mode: include extracted entities and Cypher in responses (0 or 1)
+SCIDK_GRAPHRAG_VERBOSE=0
+
+# Query library path for curated Cypher examples (Sprint 1)
+SCIDK_QUERY_LIBRARY_PATH=query_library/scidk_queries.yaml
+
+# Privacy filters (comma-separated, optional)
+SCIDK_GRAPHRAG_ALLOW_LABELS=
+SCIDK_GRAPHRAG_DENY_LABELS=
+SCIDK_GRAPHRAG_EXCLUDE_PROPERTIES=
+
+# ------------------------------
+# Chat Provider Settings (for reasoning path)
+# ------------------------------
+# Chat LLM provider: "ollama" (default), "anthropic", "openai"
+SCIDK_CHAT_LLM_PROVIDER=ollama
+
+# Ollama settings
+SCIDK_CHAT_OLLAMA_ENDPOINT=http://localhost:11434
+SCIDK_CHAT_OLLAMA_MODEL=llama3.1:8b
+
+# Claude settings (optional, for reasoning path)
+SCIDK_CHAT_CLAUDE_API_KEY=
+
+# OpenAI settings (optional, for reasoning path)
+SCIDK_CHAT_OPENAI_API_KEY=
+
+# Anthropic key for entity extraction (optional, falls back to pattern matching)
+SCIDK_ANTHROPIC_API_KEY=
diff --git a/.gitignore b/.gitignore
index 61d705d..a544053 100644
--- a/.gitignore
+++ b/.gitignore
@@ -70,3 +70,8 @@ dev/code-imports/nc3rsEDA/
 
 # Backups are for local work, not the repository
 backups/
+
+# Documentation cleanup - archived and temporary docs
+dev/test-runs/tmp/
+dev/code-imports/
+docs/archive/
diff --git a/DEMO_PROGRESS_INDICATORS.md b/DEMO_PROGRESS_INDICATORS.md
deleted file mode 100644
index 79138a2..0000000
--- a/DEMO_PROGRESS_INDICATORS.md
+++ /dev/null
@@ -1,208 +0,0 @@
-# Demo: Progress Indicators for Long Operations
-
-This document provides demo steps for showcasing the progress indicators feature in SciDK.
-
-## Feature Overview
-
-**What it does**: Provides real-time visual feedback during long-running operations (scans, commits, reconciliations) including:
-- Progress bars with percentage completion
-- Real-time status updates (e.g., "Processing file 50/200...")
-- Estimated time remaining
-- Cancel button to abort operations
-- Responsive UI that doesn't block during operations
-
-## Prerequisites
-
-1. SciDK application running (default: http://localhost:5000)
-2. A directory with multiple files for scanning (20+ files recommended for visible progress)
-
-## Demo Steps
-
-### 1. Demonstrate Background Scan with Progress Tracking
-
-**Goal**: Show progress bar, status updates, and ETA during a scan operation.
-
-**Steps**:
-1. Navigate to the Files page (`/datasets`)
-2. In the "Provider Browser" section:
-   - Select "Filesystem" as the provider
-   - Select or enter a directory path with 20+ files
-   - Click "🔍 Scan This Folder"
-3. Observe the "Scans Summary" section below:
-   - **Progress bar appears** showing completion percentage
-   - **Status message updates** in real-time (e.g., "Processing 50/200 files... (25/s)")
-   - **ETA displays** time remaining (e.g., "~2m remaining")
-   - Progress bar color: blue (running) → green (completed)
-
-**Expected Output**:
-```
-scan running — /path/to/data — 50/200 (25%) — Processing 50/200 files... (25/s) — ~1m remaining [Cancel]
-[Progress bar: ████████░░░░░░░░ 25%]
-```
-
-### 2. Demonstrate Real-Time Status Updates
-
-**Goal**: Show different status messages as the scan progresses.
-
-**Steps**:
-1. Start a scan on a large directory (100+ files)
-2. Watch the status message change through different phases:
-   - "Initializing scan..."
-   - "Counting files..."
-   - "Processing 500 files..."
-   - "Processing 150/500 files... (50/s)"
-
-**What to highlight**:
-- Status messages provide context about what's happening
-- Messages update automatically without page refresh
-- Processing rate (files/second) is calculated and displayed
-
-### 3. Demonstrate Commit Progress
-
-**Goal**: Show progress tracking for Neo4j commit operations.
-
-**Steps**:
-1. Complete a scan first (or use an existing scan)
-2. In the "Scans Summary" section, find your scan
-3. Click "Commit to Graph" button
-4. Observe progress updates:
-   - "Preparing commit..."
-   - "Committing to in-memory graph..."
-   - "Building commit rows..."
-   - "Built commit rows: 200 files, 50 folders"
-   - "Writing to Neo4j..."
-
-**Expected Output**:
-```
-commit running — /path/to/data — 200/201 (99%) — Writing to Neo4j...
-[Progress bar: ███████████████░ 99%]
-```
-
-### 4. Demonstrate Cancel Functionality
-
-**Goal**: Show that long-running operations can be canceled.
-
-**Steps**:
-1. Start a scan on a large directory (500+ files)
-2. While the scan is running, locate the "Cancel" button next to the task
-3. Click "Cancel"
-4. Observe:
-   - Task status changes to "canceled"
-   - Progress bar stops updating
-   - Operation terminates gracefully
-
-**What to highlight**:
-- Cancel button only appears for running tasks
-- Canceled tasks are marked clearly
-- System remains stable after cancellation
-
-### 5. Demonstrate UI Responsiveness
-
-**Goal**: Show that the UI remains interactive during long operations.
-
-**Steps**:
-1. Start a long-running scan (100+ files)
-2. While scan is in progress, try these interactions:
-   - Click the "Refresh" button → Works immediately
-   - Browse to a different folder → Navigation works
-   - Click through tabs → UI remains responsive
-   - Start another scan (up to 2 concurrent tasks) → Works
-
-**What to highlight**:
-- Page doesn't freeze or become unresponsive
-- Background tasks run independently
-- User can continue working while operations complete
-
-### 6. Demonstrate Multiple Concurrent Tasks
-
-**Goal**: Show that multiple operations can run simultaneously with individual progress tracking.
-
-**Steps**:
-1. Start a scan on directory A
-2. Immediately start a scan on directory B
-3. Observe:
-   - Both scans show independent progress bars
-   - Each has its own status message and ETA
-   - Both complete successfully
-
-**System Limits**:
-- Default: Maximum 2 concurrent background tasks
-- Configurable via `SCIDK_MAX_BG_TASKS` environment variable
-
-### 7. Demonstrate Progress History
-
-**Goal**: Show completed tasks remain visible for reference.
-
-**Steps**:
-1. Complete several scan/commit operations
-2. Observe the "Scans Summary" section:
-   - Completed tasks show "completed" status
-   - Progress bars are green
-   - All metadata preserved (file count, duration, path)
-   - Click scan ID or path to view details
-
-## Key Features Demonstrated
-
-✅ **Progress bars** - Visual indication of completion percentage
-✅ **Real-time status updates** - "Processing file 50/200..."
-✅ **Estimated time remaining** - "~2m remaining"
-✅ **UI remains responsive** - No blocking during operations
-✅ **Cancel button** - Ability to abort long operations
-✅ **Processing rate** - Shows files/second throughput
-✅ **Multiple concurrent tasks** - Up to 2 operations simultaneously
-✅ **Graceful completion** - Green progress bar when done
-
-## Technical Details
-
-### Architecture
-- **Backend**: Python threading for background tasks in `/api/tasks` endpoint
-- **Frontend**: JavaScript polling (1-second interval) to fetch task status
-- **Progress Calculation**: `processed / total` for percentage, rate-based ETA
-
-### API Endpoints
-- `POST /api/tasks` - Create background task (scan or commit)
-- `GET /api/tasks` - List all tasks with progress
-- `GET /api/tasks/` - Get specific task details
-- `POST /api/tasks//cancel` - Cancel running task
-
-### Progress Fields
-```json
-{
-  "id": "task_id_here",
-  "type": "scan",
-  "status": "running",
-  "progress": 0.5,
-  "processed": 100,
-  "total": 200,
-  "eta_seconds": 120,
-  "status_message": "Processing 100/200 files... (50/s)",
-  "started": 1234567890.0,
-  "ended": null
-}
-```
-
-## Troubleshooting
-
-**Progress not updating**:
-- Check browser console for errors
-- Verify polling is active (1-second interval)
-- Check backend logs for task worker errors
-
-**ETA not shown**:
-- ETA calculated after processing >10 files
-- Very fast operations may complete before ETA displays
-- This is normal behavior
-
-**Tasks stuck at "running"**:
-- Check backend process isn't hung
-- Verify file permissions for scan directory
-- Check system resources (CPU, memory)
-
-## Future Enhancements (Not in This Release)
-
-- Server-Sent Events (SSE) for more efficient real-time updates
-- WebSocket support for instant progress streaming
-- Estimated time remaining for commit operations
-- Detailed operation logs accessible from UI
-- Resume capability for canceled operations
-- Priority queue for task scheduling
diff --git a/DEMO_SETUP.md b/DEMO_SETUP.md
deleted file mode 100644
index 3aad845..0000000
--- a/DEMO_SETUP.md
+++ /dev/null
@@ -1,583 +0,0 @@
-# SciDK Demo Setup Guide
-
-Quick reference for running and testing the SciDK application.
-
-## Prerequisites
-
-- Python 3.9+ installed
-- Node.js (for E2E tests)
-- Docker (for Neo4j, optional but recommended)
-- Rclone (optional, for remote provider testing)
-
-## Quick Start
-
-### 1. Start Neo4j (Optional but Recommended)
-
-```bash
-# Start Neo4j in Docker
-docker-compose -f docker-compose.neo4j.yml up -d
-
-# Neo4j will be available at:
-# - Browser: http://localhost:7474
-# - Bolt: bolt://localhost:7687
-# - Default credentials: neo4j / your-password-here
-```
-
-### 2. Activate Python Environment
-
-```bash
-# Activate virtual environment
-source .venv/bin/activate
-
-# Or on some systems:
-. .venv/bin/activate
-
-# Verify activation (should show .venv path)
-which python
-```
-
-### 3. Start the Application
-
-```bash
-# RECOMMENDED: Use the scidk-serve command
-scidk-serve
-
-# Alternative: Run as module (also works after the fix)
-python -m scidk
-
-# Server starts at: http://127.0.0.1:5000
-```
-
-**Note**: Use `scidk-serve` or `python -m scidk` (not `python -m scidk.app`) to avoid import path issues with test stubs.
-
-### 4. Access the Application
-
-Open your browser and navigate to: **http://127.0.0.1:5000**
-
-## Page Navigation Quick Reference
-
-| Page | URL | Purpose |
-|------|-----|---------|
-| **Home** | `/` | Landing page, search, filters, quick chat |
-| **Chat** | `/chat` | Full chat interface (multi-user) |
-| **Files** | `/datasets` | Browse files, scans, snapshots, data cleaning |
-| **Map** | `/map` | Graph visualization (Neo4j + local schema) |
-| **Labels** | `/labels` | Graph schema management (3-column layout) |
-| **Links** | `/links` | Link definition wizard |
-| **Extensions** | `/extensions` | Plugin/extension management |
-| **Integrations** | `/integrations` | External service integrations |
-| **Settings** | `/settings` | Neo4j, interpreters, rclone, chat, plugins |
-| **Login** | `/login` | User authentication |
-
-## Creating Test Data
-
-### Option 1: Scan Local Directory
-
-1. Navigate to **Files** page (`/datasets`)
-2. Select "Provider Browser" tab
-3. Choose provider: `filesystem`
-4. Select or enter a directory path (e.g., `/home/user/Documents`)
-5. Check "Recursive" if needed
-6. Click **"Go"** to browse
-7. Click **"Scan"** to index files
-
-### Option 2: Use Test Data Script
-
-```bash
-# Run a test scan on the project itself
-python -c "
-from scidk.core import filesystem
-from scidk.app import create_app
-
-app = create_app()
-with app.app_context():
-    ext = app.extensions['scidk']
-    # Scan the docs folder
-    result = ext['graph'].scan_source(
-        provider='filesystem',
-        root_id='/',
-        path='docs',
-        recursive=True
-    )
-    print(f'Scanned {len(result.get(\"checksums\", []))} files')
-"
-```
-
-### Option 3: Use Existing Test Fixtures
-
-The test suite creates temporary test data. You can reference `tests/conftest.py` for fixture patterns.
-
-## Common Demo Workflows
-
-### Workflow 1: File Discovery & Viewing
-
-1. **Scan** a directory (Files page)
-2. **Browse** snapshot results
-3. **Click** on a file to view details
-4. **View** interpretations (CSV table, JSON tree, etc.)
-5. **Navigate** back to files list
-
-### Workflow 2: Graph Visualization
-
-#### Option A: Using Local Labels
-1. **Navigate** to Labels page (`/labels`)
-2. **Create** a new label (e.g., "Project")
-3. **Add** properties (e.g., name: string, budget: number)
-4. **Define** relationships (e.g., "HAS_FILE" → File)
-5. **Save** the label
-6. **Navigate** to Map page (`/map`)
-7. **Select** "Local Labels" from Source dropdown
-8. **View** schema visualization (nodes appear in red = definition only, no instances)
-9. **Observe** relationships shown as edges
-
-#### Option B: Using Neo4j Schema
-1. **Navigate** to Settings (`/settings`)
-2. **Connect** to Neo4j (configure URI, username, password)
-3. **Test** connection to verify it works
-4. **Navigate** to Labels page (`/labels`)
-5. **Click** "Pull from Neo4j" to sync schema
-6. **Navigate** to Map page (`/map`)
-7. **Select** "Neo4j Schema" from Source dropdown
-8. **View** schema pulled from database (nodes in green)
-
-#### Option C: Combined View (Default)
-1. **Scan** files and commit to Neo4j (Files page)
-2. **Navigate** to Map page (`/map`)
-3. **Source** defaults to "All Sources"
-4. **View** combined graph with color-coded nodes:
-   - **Blue**: In-memory graph (actual scanned data)
-   - **Red**: Local labels (definitions only, no instances)
-   - **Green**: Neo4j schema (pulled from database)
-   - **Orange/Purple/Teal/Yellow**: Mixed sources
-5. **Filter** by labels/relationships (dropdowns populate dynamically)
-6. **Adjust** layout and appearance
-7. **Interact** with nodes (click, drag)
-
-### Workflow 3: Schema Management
-
-1. **Navigate** to Labels page
-2. **Create** a new label (e.g., "Dataset")
-3. **Add** properties (e.g., name: string, size: int)
-4. **Define** relationships (e.g., "HAS_FILE")
-5. **Push** schema to Neo4j
-
-#### Import/Export with Arrows.app
-
-**Import from Arrows.app:**
-1. Design schema at https://arrows.app
-2. Export JSON from Arrows (File → Export → JSON)
-3. In scidk, navigate to Labels page
-4. Click "Import from Arrows.app"
-5. Paste JSON or upload file
-6. Click "Import" to create labels
-
-**Export to Arrows.app:**
-1. Navigate to Labels page
-2. Click "Export to Arrows.app"
-3. Download JSON file
-4. Open https://arrows.app
-5. Import file (File → Import → From JSON)
-6. View/edit schema in Arrows
-
-### Workflow 4: Integration & Link Creation
-
-**Option A: Configure External API Integration**
-1. **Navigate** to Integrations page (`/integrations`)
-2. **Configure** external service (API endpoint, auth)
-3. **Test** connection to verify it works
-4. **Save** integration configuration
-5. **Navigate** to Links page to use the integration
-
-**Option B: Direct Link Creation**
-1. **Navigate** to Links page (`/links`)
-2. **Create** new link definition
-3. **Choose** data source (CSV, API, or Cypher)
-4. **Configure** source and target labels
-5. **Preview** link results
-6. **Execute** link to create relationships
-7. **View** in Map
-
-### Workflow 5: Search & Chat
-
-**Quick Chat (from Home):**
-1. **Home page**: Enter search query OR use quick chat input
-2. **View** results filtered by type
-3. **Get** inline responses without leaving home
-
-**Full Chat Interface:**
-1. **Navigate** to Chat page (`/chat`)
-2. **Login** if using multi-user mode
-3. **Ask** questions about indexed files
-4. **Get** context-aware responses with file references
-5. **View** conversation history (persisted per user)
-
-### Workflow 6: Data Cleaning
-
-1. **Navigate** to Files page (`/datasets`)
-2. **Browse** snapshot or search for files
-3. **Select** files to delete (individual or bulk)
-4. **Click** delete button
-5. **Confirm** deletion
-6. **System** automatically cleans up:
-   - File nodes from graph
-   - Associated relationships
-   - Orphaned link records
-7. **View** updated file list
-
-## Configuration for Demo
-
-### First-Time Setup: User Authentication
-
-1. **Navigate** to Login page (`/login`) - or you'll be redirected on first visit
-2. **Create** an account (if no users exist, first user becomes admin)
-3. **Login** with username/password
-4. **Note**: Multi-user mode supports:
-   - Role-based access control (Admin/User)
-   - Per-user chat history
-   - Session management with auto-lock after inactivity
-
-### Neo4j Connection
-
-1. Navigate to **Settings** page (`/settings`)
-2. Click **"Neo4j"** tab in settings
-3. Enter Neo4j details:
-   - URI: `bolt://localhost:7687`
-   - Username: `neo4j`
-   - Database: `neo4j`
-   - Password: `[your password]`
-4. Click **"Save Settings"**
-5. Click **"Connect"** to test connection
-6. Success message confirms connection
-
-### Interpreter Configuration
-
-1. On **Settings** page, click **"Interpreters"** tab
-2. Enable desired interpreters:
-   - CSV, JSON, YAML (common formats)
-   - Python, Jupyter (code files)
-   - Excel (workbooks)
-3. Configure advanced settings:
-   - Suggest threshold
-   - Batch size
-4. Click **"Save"** to apply changes
-
-### Rclone Mounts (Optional)
-
-1. On **Settings** page, click **"Rclone"** tab
-2. Configure remote:
-   - Remote: `myremote:`
-   - Subpath: `/folder/path`
-   - Name: `MyRemote`
-   - Read-only: checked (recommended for demo)
-3. Click **"Create Mount"**
-4. Click **"Refresh Mounts"** to see updated list
-
-### Chat Backend Configuration
-
-1. On **Settings** page, click **"Chat"** tab
-2. Configure chat backend:
-   - LLM service endpoint
-   - API key (if required)
-   - Context settings
-3. Click **"Save Settings"**
-4. Test by sending a message from Home or Chat page
-
-### External Service Integrations
-
-1. Navigate to **Integrations** page (`/integrations`)
-2. Select an integration to configure
-3. Enter service-specific settings:
-   - API endpoint URL
-   - Authentication credentials (encrypted at rest)
-   - JSONPath extraction (optional)
-   - Target label mapping (optional)
-4. Click **"Test Connection"** to verify
-5. Click **"Save"** to enable integration
-
-**OR** configure in Settings:
-1. On **Settings** page, click **"Integrations"** tab
-2. Scroll to "API Endpoint Mappings"
-3. Configure endpoint:
-   - **Name**: Descriptive name (e.g., "Users API")
-   - **URL**: Full API endpoint (e.g., `https://api.example.com/users`)
-   - **Auth Method**: None, Bearer Token, or API Key
-   - **Auth Value**: Token/key if authentication required
-   - **JSONPath**: Extract specific data (e.g., `$.data[*]`)
-   - **Maps to Label**: Target label for imported data
-4. Click **"Test Connection"** to verify
-5. Click **"Save Endpoint"** to register
-
-**Using Integrations in Links:**
-- Registered endpoints appear in Links wizard
-- Select an endpoint as a data source
-- Field mappings auto-populate from endpoint config
-
-**Security Notes:**
-- Auth tokens encrypted at rest in settings database
-- Set `SCIDK_API_ENCRYPTION_KEY` environment variable for production
-- Without this variable, ephemeral key is generated (not persistent across restarts)
-
-**Example: JSONPlaceholder Test API**
-```
-Name: JSONPlaceholder Users
-URL: https://jsonplaceholder.typicode.com/users
-Auth Method: None
-JSONPath: $[*]
-Maps to Label: User
-```
-
-### Configuration Backup & Restore
-
-1. On **Settings** page, click **"General"** tab
-2. Scroll to "Configuration Management"
-3. **Export** settings:
-   - Click **"Export Settings"**
-   - Download JSON backup file
-4. **Import** settings:
-   - Click **"Import Settings"**
-   - Select JSON backup file
-   - Confirm import
-   - Application restores all configurations
-
-## Troubleshooting
-
-### Application Won't Start
-
-```bash
-# Check if port 5000 is already in use
-lsof -i :5000
-
-# Use a different port
-SCIDK_PORT=5001 scidk-serve
-```
-
-### Neo4j Connection Fails (502 Error)
-
-**If you get a 502 error when connecting to Neo4j:**
-- Make sure you're using `scidk-serve` or `python -m scidk` (not `python -m scidk.app`)
-- The issue is caused by a local test stub shadowing the real neo4j package
-- See "Technical Note: Import Path Issue" below for details
-
-**Other Neo4j issues:**
-- Verify Neo4j is running: `docker ps | grep neo4j`
-- Check credentials match Settings page
-- Ensure bolt port 7687 is accessible
-- Check logs: `docker logs `
-
-### No Files Showing
-
-- Verify scan completed successfully
-- Check database file exists: `ls -la *.db`
-- Check console for errors
-- Try scanning a small directory first
-
-### Interpreter Not Working
-
-- Verify interpreter enabled in Settings
-- Check file extension matches interpreter
-- Review Python console for import errors
-- Ensure required packages installed (see `requirements.txt`)
-
-### Map Page Empty
-
-- Ensure Neo4j connected (Settings page)
-- Verify schema committed (Labels page → Push to Neo4j)
-- Verify files committed (Files page → Commit button)
-- Check Neo4j browser: http://localhost:7474
-
-## Demo Tips
-
-### Before the Demo
-
-- [ ] Start Neo4j before application
-- [ ] Clear test/sensitive data from database
-- [ ] Prepare interesting dataset (variety of file types)
-- [ ] Pre-scan dataset so demo isn't waiting for scan
-- [ ] Test Neo4j connection in Settings
-- [ ] Have 2-3 example questions ready for Chat
-
-### During the Demo
-
-**Suggested Demo Flow:**
-1. **Login**: Show authentication (multi-user support)
-2. **Home Page**:
-   - Demonstrate search with filters
-   - Show summary cards (file count, scan count, extensions)
-   - Try quick chat input (inline responses)
-3. **Files Workflow**:
-   - Browse → Scan → Snapshot → File Detail → Interpretation
-   - Show data cleaning (delete files, auto-cleanup relationships)
-4. **Labels Page**:
-   - Show 3-column layout (list, editor, instance browser)
-   - Create/edit label with properties
-   - Define relationships
-   - Show keyboard navigation (arrow keys, Enter, Escape)
-   - Push schema to Neo4j
-5. **Map Visualization**:
-   - Show combined view (in-memory + local labels + Neo4j schema)
-   - Demonstrate filters (labels, relationships)
-   - Show color-coding (blue/red/green for different sources)
-   - Adjust layout and appearance controls
-6. **Integrations**:
-   - Configure external API endpoint
-   - Test connection
-   - Show encrypted credential storage
-7. **Links Creation**:
-   - Quick wizard walkthrough
-   - Use configured integration as data source
-   - Preview and execute to create relationships
-8. **Chat Interface**:
-   - Ask context-aware questions about indexed files
-   - Show conversation history (persisted per user)
-   - Demonstrate file references in responses
-9. **Settings**:
-   - Show modular settings tabs (Neo4j, Interpreters, Rclone, Chat, etc.)
-   - Demonstrate configuration backup/restore
-
-### Known Limitations (to mention if asked)
-
-- Scans are synchronous (page waits for completion)
-- Very large files (>10MB) may have limited preview
-- Chat requires external LLM service configuration
-- Map rendering slows with 1000+ nodes
-- Rclone features require rclone installed on system
-- Session auto-locks after inactivity (configurable timeout)
-
-## Testing the Application
-
-### Run E2E Tests
-
-```bash
-# Ensure app is running on http://127.0.0.1:5000
-
-# In a separate terminal:
-npm run e2e
-```
-
-### Run Unit Tests
-
-```bash
-# All tests
-pytest tests/
-
-# Specific test file
-pytest tests/test_scan_browse_indexed.py
-
-# With coverage report
-python -m coverage run -m pytest tests/
-python -m coverage report
-```
-
-### Manual Testing
-
-See **`dev/ux-testing-checklist.md`** for comprehensive page-by-page testing guide.
-
-## Stopping the Application
-
-### Stop Flask
-
-Press `Ctrl+C` in the terminal running the Flask app
-
-### Stop Neo4j
-
-```bash
-docker-compose -f docker-compose.neo4j.yml down
-```
-
-### Deactivate Python Environment
-
-```bash
-deactivate
-```
-
-## Environment Files
-
-The application uses `.env` files for configuration:
-
-- `.env` - Default/development settings (in use)
-- `.env.example` - Template with all options
-- `.env.dev`, `.env.beta`, `.env.stable` - Environment-specific
-
-To switch environments:
-```bash
-cp .env.dev .env  # Use dev settings
-```
-
-## Database Files
-
-SciDK uses SQLite databases:
-
-- `scidk_path_index.db` - File index and scan history
-- `scidk_settings.db` - Application settings (Neo4j, interpreters)
-- `data/files.db` - Legacy/alternative file storage (if used)
-
-To reset data:
-```bash
-# Backup first!
-cp scidk_path_index.db scidk_path_index.db.backup
-
-# Remove databases to start fresh
-rm scidk_path_index.db scidk_settings.db
-
-# Restart app (will recreate with schema)
-python -m scidk.app
-```
-
-## Additional Resources
-
-- **Feature Index**: `FEATURE_INDEX.md` (comprehensive feature list by page)
-- **Development Protocols**: `dev/README-planning.md`
-- **UX Testing Checklist**: `dev/ux-testing-checklist.md`
-- **E2E Testing Guide**: `docs/e2e-testing.md`
-- **API Documentation**: `docs/MVP_Architecture_Overview_REVISED.md`
-- **Main README**: `README.md`
-
-## Quick Commands Reference
-
-```bash
-# Start everything for demo
-docker-compose -f docker-compose.neo4j.yml up -d  # Neo4j
-source .venv/bin/activate                         # Python env
-scidk-serve                                       # Flask app (RECOMMENDED)
-
-# Run tests
-npm run e2e    # E2E tests
-pytest tests/  # Unit tests
-
-# Check coverage
-python -m coverage run -m pytest tests/
-python -m coverage report
-python -m coverage html  # HTML report in htmlcov/
-
-# Stop everything
-# Ctrl+C in Flask terminal
-docker-compose -f docker-compose.neo4j.yml down
-deactivate
-```
-
----
-
-## Technical Note: Import Path Issue
-
-**Why use `scidk-serve` instead of `python -m scidk.app`?**
-
-The repository contains a `neo4j/` directory with a test stub (`neo4j/__init__.py`) used for mocking in tests. When you run `python -m scidk.app`, Python adds the current directory to `sys.path[0]`, causing the local stub to shadow the real `neo4j` package from `.venv/lib/python3.x/site-packages/`. This results in:
-
-- **Error**: `type object 'GraphDatabase' has no attribute 'driver'`
-- **HTTP 502** when trying to connect to Neo4j in Settings
-
-**Solutions** (in order of preference):
-1. ✅ **Use `scidk-serve`** - Entry point doesn't add cwd to sys.path
-2. ✅ **Use `python -m scidk`** - Now includes `__main__.py` that removes cwd from path
-3. ❌ **Don't use `python -m scidk.app`** - Adds cwd to sys.path (causes issue)
-
-The fix has been implemented with:
-- `scidk/__main__.py` - Removes cwd from sys.path before importing
-- `pyproject.toml` - Excludes `neo4j*` from package builds
-- `.gitignore` - Documents the stub's purpose
-
-**For developers**: The `neo4j/` stub should remain for test compatibility, but runtime execution should use methods 1 or 2 above.
-
----
-
-**Ready to demo!** Follow the workflows above and refer to `dev/ux-testing-checklist.md` for detailed testing.
diff --git a/FEATURE_INDEX.md b/FEATURE_INDEX.md
deleted file mode 100644
index 51e78ec..0000000
--- a/FEATURE_INDEX.md
+++ /dev/null
@@ -1,647 +0,0 @@
-# SciDK Feature Index
-
-**Purpose**: Current application layout and feature inventory for product planning and demo preparation.
-
-**Last Updated**: 2026-02-09
-
----
-
-## Application Structure
-
-### Navigation & Pages
-
-| Page | Route | Primary Purpose |
-|------|-------|----------------|
-| Home | `/` | Landing page with search, filters, quick chat |
-| Chat | `/chat` | Full chat interface (multi-user, database-persisted) |
-| Files/Datasets | `/datasets` | Browse scans, manage file data, commit to Neo4j |
-| File Detail | `/datasets/` | View file metadata and interpretations |
-| Workbook Viewer | `/datasets//workbook` | Excel sheet preview with navigation |
-| Map | `/map` | Interactive graph visualization (Neo4j + local schema) |
-| Labels | `/labels` | Graph schema management (properties, relationships) |
-| Links | `/links` | Link definition wizard (create relationships) |
-| Extensions | `/extensions` | Plugin/extension management |
-| Integrations | `/integrations` | External service integrations |
-| Settings | `/settings` | Neo4j, interpreters, rclone, chat, plugins, integrations |
-| Login | `/login` | User authentication (multi-user with RBAC) |
-
----
-
-## Feature Groups by Page
-
-### 1. Home Page (`/`)
-
-**Search & Discovery**
-- Full-text file search with query input
-- Filter by file extension
-- Filter by interpreter type
-- Provider/path-based filtering
-- Recursive path toggle
-- Reset filters option
-
-**Dashboard & Summary**
-- File count display
-- Scan count summary
-- Extension breakdown
-- Interpreter type summary
-- Recent scans list
-
-**Quick Actions**
-- Inline chat input (quick queries without leaving home)
-- Direct navigation to all main pages
-
----
-
-### 2. Chat Page (`/chat`)
-
-**Conversation Interface**
-- Full-featured chat UI with message history
-- Context-aware responses (references indexed files/graph)
-- Markdown rendering in responses
-- Timestamped messages
-- Scrollable history
-
-**Multi-User & Security** (Recent: PR #40)
-- User authentication system
-- Role-based access control (RBAC)
-- Database-persisted chat history
-- Per-user conversation isolation
-- Admin role for system management
-
-**Session Management** (Recent: PR #44)
-- Auto-lock after inactivity timeout
-- Configurable timeout settings
-- Session expiration handling
-
----
-
-### 3. Files/Datasets Page (`/datasets`)
-
-**Provider Browser Tab**
-- Provider dropdown (filesystem, rclone remotes)
-- Path selection and manual entry
-- Recursive scan toggle
-- Fast list mode (skip detailed metadata)
-- Max depth control
-- Browse before scan (preview file tree)
-- Initiate scan with progress tracking
-
-**Snapshot Browser Tab**
-- Scan dropdown (view historical scans)
-- Snapshot file list with pagination
-- Path prefix filter
-- Extension/type filter
-- Custom extension input
-- Page size controls
-- Previous/Next pagination
-- "Use Live" switch (latest data)
-
-**Snapshot Search**
-- Query input for snapshot data
-- Extension-based search
-- Prefix-based search
-- Clear and reset options
-
-**Data Management**
-- Commit snapshot to Neo4j
-- Commit progress/status indicators
-- Recent scans management
-- Refresh scans list
-
-**RO-Crate Integration**
-- Open RO-Crate viewer modal
-- Display RO-Crate metadata
-- Export capability
-
-**Data Cleaning Workflow** (Recent: PR #46)
-- Delete individual files from dataset
-- Bulk delete multiple files
-- Bidirectional relationship cleanup (removes orphaned links)
-- Confirmation prompts for destructive actions
-- Real-time UI updates after deletion
-
----
-
-### 4. File Detail Page (`/datasets/`)
-
-**Metadata Display**
-- Filename, full path
-- File size, last modified
-- Checksum/ID
-- Provider information
-
-**Interpretation Viewer**
-- Multiple interpretation tabs (CSV, JSON, YAML, Python, etc.)
-- CSV: Table preview
-- JSON: Formatted/collapsible tree
-- Python: Syntax-highlighted code
-- YAML: Structured display
-- Excel: Sheet selector (links to workbook viewer)
-
-**Actions**
-- Back navigation
-- Copy path/ID to clipboard
-- View raw content
-- Navigate to related files
-
----
-
-### 5. Workbook Viewer (`/datasets//workbook`)
-
-**Sheet Navigation**
-- Sheet selector dropdown
-- Switch between sheets
-- Active sheet indicator
-
-**Table Preview**
-- Rendered table with headers
-- Formatted cell values
-- Horizontal/vertical scrolling
-- Row/column count display
-- Preview limit indicator (first N rows)
-
-**Navigation**
-- Back to file detail
-- Back to files list
-- Breadcrumb navigation
-
----
-
-### 6. Map/Graph Visualization (`/map`)
-
-**Graph Display**
-- Interactive node/edge rendering
-- Auto-layout on load
-- Node labels and colors
-- Relationship edges
-- Color-coded sources:
-  - Blue: In-memory graph (scanned data)
-  - Red: Local labels (definitions only)
-  - Green: Neo4j schema (pulled from database)
-  - Mixed colors: Combined sources
-
-**Data Source Selection**
-- "All Sources" (combined view, default)
-- "In-Memory Graph" (scanned files only)
-- "Local Labels" (schema definitions)
-- "Neo4j Schema" (pulled from database)
-
-**Filtering**
-- Label type filter dropdown
-- Relationship type filter
-- Multiple filter combinations
-- Clear filters option
-
-**Layout Controls**
-- Layout mode selector (force-directed, circular, etc.)
-- Save positions button
-- Load saved positions
-- Re-layout on demand
-
-**Appearance Controls**
-- Node size slider
-- Edge width slider
-- Font size slider
-- High contrast toggle
-- Immediate visual updates
-
-**Interaction**
-- Click and drag nodes
-- Pan graph canvas
-- Zoom in/out (mousewheel)
-- Click nodes for details
-- Click edges for relationship info
-
-**Export & Instance Preview**
-- Download CSV (graph data export)
-- Instance preview selector
-- "Preview Instances" button
-- Formatted instance data display
-
----
-
-### 7. Labels Page (`/labels`)
-
-**Schema Definition** (Recent: PR #38 - Three-column layout with instance browser)
-- Three-column layout:
-  - Left: Label list sidebar (resizable, 200px-50% width)
-  - Center: Label editor/wizard
-  - Right: Instance browser (shows actual nodes for selected label)
-- Create new labels
-- Edit existing labels
-- Define label properties (name, type: string/int/float/etc.)
-- Add/remove properties
-- Property type dropdown
-
-**Relationship Management**
-- Add relationships to labels
-- Define relationship name
-- Select target label
-- Define relationship properties (optional)
-- Remove relationships
-
-**Neo4j Synchronization**
-- Push to Neo4j (local → database)
-- Pull from Neo4j (database → local)
-- Success/failure feedback
-- Sync status indicators
-
-**Arrows.app Integration**
-- Import schema from Arrows.app (JSON)
-- Export schema to Arrows.app
-- Paste JSON or upload file
-- Bidirectional workflow support
-
-**Label Operations**
-- Delete label (with confirmation)
-- Save label changes
-- Validation feedback
-
-**Keyboard Navigation** (Recent: PR #37)
-- Arrow Up/Down: Navigate label list
-- Home/End: Jump to first/last
-- PageUp/PageDown: Navigate 10 items at a time
-- Enter: Open selected label in editor
-- Escape: Return focus to sidebar
-- Visual focus indicators
-- Auto-scroll to focused item
-
-**Instance Browser** (Recent: PR #38)
-- View actual nodes for selected label
-- Instance count display
-- Property values preview
-- Pagination for large instance sets
-- Link to node details
-
-**Resizable Layout** (Recent: PR #38)
-- Draggable divider between sidebar and editor
-- Min/max width constraints (200px - 50%)
-- Resize cursor indicator
-- Persistent layout preferences
-
----
-
-### 8. Links Page (`/links`)
-
-**Link Definition Wizard**
-- Multi-step wizard interface
-- Link name input
-- Data source selection:
-  - CSV data source (paste CSV)
-  - API endpoint source (URL + JSONPath)
-  - Cypher query source (direct Neo4j query)
-- Target label configuration
-- Field mapping (source → target properties)
-- Relationship type definition
-- Relationship property mapping
-- Preview sample links
-- Save definition
-
-**Link Management**
-- List of saved definitions
-- Select/view/edit definitions
-- Delete definition (with confirmation)
-- Duplicate definition names prevented
-
-**Execution**
-- Execute link button (per definition)
-- Execution progress indicator
-- Success message (# relationships created)
-- Error handling and feedback
-
-**Jobs & History**
-- Link execution jobs list
-- Job status (pending, running, completed, failed)
-- View job details (logs, errors)
-- Re-run failed jobs (if supported)
-
-**Keyboard Navigation**
-- Arrow Up/Down: Navigate link definitions
-- Home/End: Jump to first/last
-- PageUp/PageDown: Navigate 10 items at a time
-- Enter: Open selected link in wizard
-- Escape: Return focus to sidebar
-- Visual focus indicators
-- Auto-scroll to focused item
-
-**Resizable Layout**
-- Draggable divider between sidebar and wizard
-- Min/max width constraints (200px - 50%)
-- Matches Labels page structure
-- Resize cursor indicator
-- Highlight during resize
-
----
-
-### 9. Extensions Page (`/extensions`)
-
-**Plugin Management**
-- View installed extensions
-- Enable/disable extensions
-- Extension metadata display
-- Configuration options (per extension)
-
----
-
-### 10. Integrations Page (`/integrations`)
-
-**External Service Configuration**
-- List of available integrations
-- Configure integration settings
-- Test connections
-- Enable/disable integrations
-
----
-
-### 11.
Settings Page (`/settings`) - -**Modular Settings Structure** (Recent: PR #43 - Template partials) -Settings organized into separate template files for maintainability: - -**General Settings** (`_general.html`) -- Application-wide configurations -- Session timeout settings -- UI preferences - -**Neo4j Configuration** (`_neo4j.html`) -- URI input (default: bolt://localhost:7687) -- Username input (default: neo4j) -- Database name input (default: neo4j) -- Password input with show/hide toggle -- Save settings button -- Connect/disconnect buttons -- Connection test with feedback -- Test graph operations button - -**Interpreter Configuration** (`_interpreters.html`) -- List of available interpreters -- Enable/disable toggle per interpreter -- File extension associations display -- Advanced settings: - - Suggest threshold input - - Batch size input -- Save button for interpreter settings - -**Rclone Mounts Configuration** (`_rclone.html`) -- Remote input field -- Subpath input field -- Mount name input -- Read-only checkbox -- Create mount button -- Mount list display -- Refresh mounts button -- Remove mount option - -**Chat Settings** (`_chat.html`) -- Chat backend configuration -- LLM service settings -- Context settings - -**Plugin Settings** (`_plugins.html`) -- Plugin-specific configurations -- Plugin enable/disable controls - -**Integrations Settings** (`_integrations.html`) -- Integration service configurations -- API endpoint mappings: - - Name, URL, Auth Method (None/Bearer/API Key) - - Auth value (encrypted at rest) - - JSONPath extraction - - Maps to Label (optional) - - Test connection button - - Save endpoint button -- Encrypted credential storage -- Test endpoint connections - -**Alerts Settings** (`_alerts.html`) (Recent: task:ops/monitoring/alert-system) -- Alert/notification system for critical events -- SMTP Configuration: - - Host, port, username, password (encrypted) - - From address, TLS toggle - - Test email button - - Save configuration -- Alert 
Definitions: - - Pre-configured alerts: - - Import Failed - - High Discrepancies (threshold: 50) - - Backup Failed - - Neo4j Connection Lost - - Disk Space Critical (threshold: 95%) - - Enable/disable toggles - - Recipient configuration (comma-separated emails) - - Threshold adjustment (where applicable) - - Test alert button (sends test notification) - - Update button -- Alert History: - - Recent alert trigger history - - Success/failure status - - Condition details - - Timestamp tracking -- Backend integration: - - Backup manager triggers backup_failed alerts - - Extensible for scan/import, reconciliation, health checks - - Alert trigger logging and tracking - -**Configuration Backup/Restore** (Recent: PR #41) -- Export all settings to JSON -- Import settings from JSON backup -- Secure authentication for backup operations -- Validation on import -- Success/error feedback - ---- - -### 12. Login Page (`/login`) - -**Authentication** (Recent: PR #40) -- Username/password form -- Session creation -- Redirect to home after login -- Error handling - -**Security Features** -- Password hashing (bcrypt) -- Session management -- CSRF protection -- Role-based permissions check - ---- - -## Cross-Cutting Features - -### Security & Access Control (Recent: PR #40) -- Multi-user authentication system -- Role-based access control (RBAC): - - Admin: Full system access - - User: Standard access to features -- Session-based authentication -- Password encryption (bcrypt) -- Database-persisted user accounts -- Permissions checks on endpoints -- Auto-lock after inactivity (PR #44) - -### Data Cleaning (Recent: PR #46) -- Delete files from datasets (individual or bulk) -- Bidirectional relationship cleanup: - - Remove File nodes - - Remove associated relationships - - Clean up orphaned link records -- Confirmation prompts -- Real-time UI updates -- Error handling and rollback - -### Configuration Management (Recent: PR #41) -- Export/import all settings (JSON format) -- Backup and 
restore workflows -- Secure credential handling (encrypted at rest) -- Validation on import -- Test authentication before backup operations - -### Session Management (Recent: PR #44) -- Configurable inactivity timeout -- Auto-lock and redirect to login -- Session expiration handling -- Persistent session state - -### Template Modularization (Recent: PR #43) -- Settings page broken into template partials: - - `_general.html`, `_neo4j.html`, `_interpreters.html` - - `_rclone.html`, `_chat.html`, `_plugins.html`, `_integrations.html` -- Improved maintainability -- Easier to add new settings sections - ---- - -## Technical Capabilities - -### Data Sources -- Local filesystem scanning -- Rclone remote providers -- API endpoints (with auth: Bearer, API Key) -- CSV/JSON data import -- Direct Neo4j Cypher queries - -### File Interpretation -- CSV (table preview) -- JSON (formatted tree) -- YAML (structured display) -- Python (syntax-highlighted) -- Jupyter notebooks -- Excel workbooks (multi-sheet) -- Generic text files -- Binary file handling (hex preview) - -### Graph Database Integration -- Neo4j connection (Bolt protocol) -- Schema push/pull synchronization -- Node and relationship creation -- Cypher query execution -- Graph visualization -- Instance browsing - -### Search & Indexing -- Full-text search (SQLite FTS) -- Extension-based filtering -- Interpreter-based filtering -- Path-based filtering -- Provider-based filtering -- Recursive/non-recursive scans - -### Export & Integration -- CSV export (graph data) -- RO-Crate metadata export -- Arrows.app schema import/export -- Configuration backup/restore (JSON) -- API endpoint integration - ---- - -## Architecture Notes - -### Database Stack -- **SQLite**: File index, scan history, settings, chat history, user accounts -- **Neo4j**: Graph database (optional, for visualization and relationships) - -### Frontend -- **Flask**: Python web framework -- **Jinja2**: Template engine (modular partials) -- **JavaScript**: 
Interactive UI (graph rendering, drag/drop, keyboard nav) - -### Authentication -- **Flask-Login**: Session management -- **Bcrypt**: Password hashing -- **RBAC**: Role-based permissions - -### Testing -- **Playwright E2E**: TypeScript tests (`e2e/*.spec.ts`) -- **Pytest**: Python unit/integration tests -- **98.3% interactive element coverage** (117/119 elements) - ---- - -## Demo-Ready Features - -### Critical Path Working -✅ Scan a folder (local filesystem) -✅ Browse scanned files -✅ View file interpretations -✅ Commit to Neo4j -✅ Visualize graph in Map -✅ Search files -✅ Chat interface (with multi-user support) - -### Recent Improvements (Feb 2026) -✅ Multi-user authentication with RBAC (PR #40) -✅ Configuration backup/restore (PR #41) -✅ Modular settings templates (PR #43) -✅ Auto-lock after inactivity (PR #44) -✅ Data cleaning with bidirectional relationship management (PR #46) -✅ Three-column Labels layout with instance browser (PR #38) -✅ Comprehensive keyboard navigation (PR #37) - ---- - -## Usage Patterns - -### Common Workflows - -**1. File Discovery & Interpretation** -Home → Files → Scan → Browse Snapshot → File Detail → View Interpretations - -**2. Graph Visualization** -Settings → Connect Neo4j → Labels → Define Schema → Push to Neo4j → Files → Commit → Map → Visualize - -**3. Schema Design with Arrows.app** -Arrows.app → Export JSON → Labels → Import → Edit/Refine → Push to Neo4j → Map - -**4. Link Creation** -Labels → Define Labels → Links → Create Definition → Configure Source/Target → Preview → Execute → Map - -**5. Search & Chat** -Home → Search Query → View Results → Chat → Ask Questions → Get Context-Aware Responses - -**6. Data Cleaning** -Files → Browse Snapshot → Select Files → Delete (individual or bulk) → Confirm → Refresh - -**7. 
Configuration Management** -Settings → Configure All Services → Export Settings → (Later) Import Settings to Restore - ---- - -## Known Limitations - -- Scans are synchronous (page waits for completion) -- Very large files (>10MB) may have limited preview -- Chat requires external LLM service (if not configured) -- Map rendering slows with 1000+ nodes -- Rclone features require rclone installed on system - ---- - -## References - -- **UX Testing Checklist**: `dev/ux-testing-checklist.md` -- **Demo Setup Guide**: `DEMO_SETUP.md` -- **Dev Protocols**: `dev/README-planning.md` -- **E2E Testing Guide**: `docs/e2e-testing.md` -- **Test Coverage Index**: `dev/test-coverage-index.md` diff --git a/README.md b/README.md index d52c2a3..96ba959 100644 --- a/README.md +++ b/README.md @@ -17,8 +17,9 @@ pip install -e . # runtime deps only # or for development (tests, Playwright helpers, etc.) pip install -e .[dev] # alternatively, use the provided requirements files: -# pip install -r requirements.txt # runtime-only -# pip install -r requirements-dev.txt # dev extras (editable) +# pip install -r requirements.txt # runtime-only (lightweight, no ML/GPU) +# pip install -r requirements-graphrag.txt # runtime + GraphRAG/ChatSEEK chat features (PyTorch + CUDA, 4-8GB) +# pip install -r requirements-dev.txt # dev extras (editable) ``` Note: do NOT run `pip install -r pyproject.toml`. pyproject.toml is not a requirements file; it is a build/metadata file. Use one of the commands above instead. diff --git a/analyze_feedback_refactored.py b/analyze_feedback_refactored.py new file mode 100644 index 0000000..f93780d --- /dev/null +++ b/analyze_feedback_refactored.py @@ -0,0 +1,124 @@ +""" +GraphRAG Feedback Analysis Script + +Analyzes user feedback on GraphRAG query results. 
+
+Parameters:
+- analysis_type: Type of analysis to perform (stats, entities, queries, terminology)
+- limit: Maximum number of results to show
+"""
+from scidk.services.graphrag_feedback_service import get_graphrag_feedback_service
+
+
+def run(context):
+    """
+    Analyze GraphRAG feedback with configurable parameters.
+
+    Args:
+        context: Execution context containing parameters
+
+    Returns:
+        Dict with analysis results (wrappable in SciDKData)
+    """
+    # Get parameters from context
+    params = context.get('parameters', {})
+    analysis_type = params.get('analysis_type', 'stats')
+    limit = params.get('limit', 10)
+
+    # Get feedback service
+    service = get_graphrag_feedback_service()
+
+    # Perform analysis based on type
+    try:
+        if analysis_type == 'stats':
+            data = get_stats(service)
+        elif analysis_type == 'entities':
+            data = get_entity_corrections(service, limit)
+        elif analysis_type == 'queries':
+            data = get_query_reformulations(service, limit)
+        elif analysis_type == 'terminology':
+            data = get_terminology_mappings(service)
+        else:
+            return {
+                'status': 'error',
+                'error': f'Unknown analysis type: {analysis_type}',
+                'data': []
+            }
+
+        return {
+            'status': 'success',
+            'analysis_type': analysis_type,
+            'data': data
+        }
+
+    except Exception as e:
+        return {
+            'status': 'error',
+            'error': str(e),
+            'data': []
+        }
+
+
+def get_stats(service):
+    """Get feedback statistics as structured data."""
+    stats = service.get_feedback_stats()
+
+    # Return as list of rows for table display
+    return [
+        {'metric': 'Total feedback entries', 'value': stats['total_feedback_count']},
+        {'metric': 'Answered question', 'value': stats['answered_yes_count']},
+        {'metric': 'Did not answer', 'value': stats['answered_no_count']},
+        {'metric': 'Answer rate', 'value': f"{stats['answer_rate']}%"},
+        {'metric': 'Entity corrections provided', 'value': stats['entity_corrections_count']},
+        {'metric': 'Query reformulations', 'value': stats['query_corrections_count']},
+        {'metric': 'Terminology mappings', 'value': stats['terminology_corrections_count']}
+    ]
+
+
+def get_entity_corrections(service, limit):
+    """Get entity corrections as structured data."""
+    corrections = service.get_entity_corrections(limit=limit)
+
+    # Transform into flat table structure
+    rows = []
+    for corr in corrections:
+        entity_corr = corr['corrections']
+        rows.append({
+            'query': corr['query'],
+            'extracted': corr['extracted'],
+            'removed': entity_corr.get('removed', ''),
+            'added': entity_corr.get('added', '')
+        })
+
+    return rows
+
+
+def get_query_reformulations(service, limit):
+    """Get query reformulations as structured data."""
+    reformulations = service.get_query_reformulations(limit=limit)
+
+    # Transform into flat table structure
+    rows = []
+    for reform in reformulations:
+        rows.append({
+            'original_query': reform['original_query'],
+            'corrected_query': reform['corrected_query'],
+            'entities_extracted': reform['entities_extracted']
+        })
+
+    return rows
+
+
+def get_terminology_mappings(service):
+    """Get terminology mappings as structured data."""
+    mappings = service.get_terminology_mappings()
+
+    # Transform dict into table structure
+    rows = []
+    for user_term, schema_term in mappings.items():
+        rows.append({
+            'user_term': user_term,
+            'schema_term': schema_term
+        })
+
+    return rows
diff --git a/demo_data/Core_Facility_Equipment/README.md b/demo_data/Core_Facility_Equipment/README.md
new file mode 100644
index 0000000..c3fb67e
--- /dev/null
+++ b/demo_data/Core_Facility_Equipment/README.md
@@ -0,0 +1,3 @@
+# Core Facility Equipment
+
+Demo project for SciDK testing.
diff --git a/demo_data/Core_Facility_Equipment/maintenance/service_records.pdf b/demo_data/Core_Facility_Equipment/maintenance/service_records.pdf
new file mode 100644
index 0000000..ba2785a
Binary files /dev/null and b/demo_data/Core_Facility_Equipment/maintenance/service_records.pdf differ
diff --git a/demo_data/Core_Facility_Equipment/training/microscopy_training_slides.pdf b/demo_data/Core_Facility_Equipment/training/microscopy_training_slides.pdf
new file mode 100644
index 0000000..ba2785a
Binary files /dev/null and b/demo_data/Core_Facility_Equipment/training/microscopy_training_slides.pdf differ
diff --git a/demo_data/Project_A_Cancer_Research/README.md b/demo_data/Project_A_Cancer_Research/README.md
new file mode 100644
index 0000000..fefedd6
--- /dev/null
+++ b/demo_data/Project_A_Cancer_Research/README.md
@@ -0,0 +1,3 @@
+# Project A Cancer Research
+
+Demo project for SciDK testing.
diff --git a/demo_data/Project_A_Cancer_Research/protocols/cell_culture_protocol.pdf b/demo_data/Project_A_Cancer_Research/protocols/cell_culture_protocol.pdf
new file mode 100644
index 0000000..ba2785a
Binary files /dev/null and b/demo_data/Project_A_Cancer_Research/protocols/cell_culture_protocol.pdf differ
diff --git a/demo_data/Project_A_Cancer_Research/results/flow_cytometry/analysis_20240115.fcs b/demo_data/Project_A_Cancer_Research/results/flow_cytometry/analysis_20240115.fcs
new file mode 100644
index 0000000..f186044
--- /dev/null
+++ b/demo_data/Project_A_Cancer_Research/results/flow_cytometry/analysis_20240115.fcs
@@ -0,0 +1 @@
+Demo file: analysis_20240115.fcs
diff --git a/demo_data/Project_A_Cancer_Research/results/microscopy/sample_001.tif b/demo_data/Project_A_Cancer_Research/results/microscopy/sample_001.tif
new file mode 100644
index 0000000..24daf1d
--- /dev/null
+++ b/demo_data/Project_A_Cancer_Research/results/microscopy/sample_001.tif
@@ -0,0 +1 @@
+Demo file: sample_001.tif
diff --git a/demo_data/Project_A_Cancer_Research/results/microscopy/sample_002.tif b/demo_data/Project_A_Cancer_Research/results/microscopy/sample_002.tif
new file mode 100644
index 0000000..5994ad0
--- /dev/null
+++ b/demo_data/Project_A_Cancer_Research/results/microscopy/sample_002.tif
@@ -0,0 +1 @@
+Demo file: sample_002.tif
diff --git a/demo_data/Project_B_Proteomics/README.md b/demo_data/Project_B_Proteomics/README.md
new file mode 100644
index 0000000..f8d2c65
--- /dev/null
+++ b/demo_data/Project_B_Proteomics/README.md
@@ -0,0 +1,3 @@
+# Project B Proteomics
+
+Demo project for SciDK testing.
diff --git a/demo_data/Project_B_Proteomics/analysis/go_enrichment.csv b/demo_data/Project_B_Proteomics/analysis/go_enrichment.csv
new file mode 100644
index 0000000..2d7d8b4
--- /dev/null
+++ b/demo_data/Project_B_Proteomics/analysis/go_enrichment.csv
@@ -0,0 +1,4 @@
+Sample,Value
+A,1
+B,2
+C,3
diff --git a/demo_data/Project_B_Proteomics/figures/volcano_plot.png b/demo_data/Project_B_Proteomics/figures/volcano_plot.png
new file mode 100644
index 0000000..5e52fdb
--- /dev/null
+++ b/demo_data/Project_B_Proteomics/figures/volcano_plot.png
@@ -0,0 +1 @@
+Demo file: volcano_plot.png
diff --git a/demo_data/Project_B_Proteomics/raw_data/mass_spec_run001.raw b/demo_data/Project_B_Proteomics/raw_data/mass_spec_run001.raw
new file mode 100644
index 0000000..6ca763f
--- /dev/null
+++ b/demo_data/Project_B_Proteomics/raw_data/mass_spec_run001.raw
@@ -0,0 +1 @@
+Demo file: mass_spec_run001.raw
diff --git a/demo_data/Project_B_Proteomics/raw_data/mass_spec_run002.raw b/demo_data/Project_B_Proteomics/raw_data/mass_spec_run002.raw
new file mode 100644
index 0000000..5b50fdd
--- /dev/null
+++ b/demo_data/Project_B_Proteomics/raw_data/mass_spec_run002.raw
@@ -0,0 +1 @@
+Demo file: mass_spec_run002.raw
diff --git a/demo_data/iLab_Exports/ilab_equipment_sample.csv b/demo_data/iLab_Exports/ilab_equipment_sample.csv
new file mode 100644
index 0000000..22b7cbf
--- /dev/null
+++ b/demo_data/iLab_Exports/ilab_equipment_sample.csv
@@ -0,0 +1,6 @@
+Service Name,Core,PI,Location,Equipment ID,Description
+Confocal Microscope LSM 880,Microscopy Core,Dr. Alice Smith,"Biology Building, Room 101",EQ-001,Advanced confocal imaging with spectral detection
+Flow Cytometer BD FACS Aria III,Flow Cytometry Core,Dr. Bob Jones,"Medical Sciences, Room 205",EQ-002,High-speed cell sorting and multicolor analysis
+Mass Spectrometer Orbitrap Fusion,Proteomics Core,Dr. Carol Williams,"Chemistry Building, Room 310",EQ-003,High-resolution protein mass spectrometry
+Electron Microscope TEM 120kV,Electron Microscopy Core,Dr. David Brown,"Materials Science, Room 150",EQ-004,Transmission electron microscopy for nano-scale imaging
+NMR Spectrometer 600MHz,NMR Core,Dr. Emily Davis,"Chemistry Building, Room 220",EQ-005,High-field NMR for structural analysis
diff --git a/demo_data/iLab_Exports/ilab_pi_directory_sample.csv b/demo_data/iLab_Exports/ilab_pi_directory_sample.csv
new file mode 100644
index 0000000..e5743ce
--- /dev/null
+++ b/demo_data/iLab_Exports/ilab_pi_directory_sample.csv
@@ -0,0 +1,8 @@
+PI Name,Email,Department,Lab,Phone,Office
+Dr. Alice Smith,alice.smith@university.edu,Biology,Smith Lab - Cell Biology,555-0101,Biology 101
+Dr. Bob Jones,bob.jones@university.edu,Molecular Medicine,Jones Lab - Cancer Research,555-0102,Medical Sciences 205
+Dr. Carol Williams,carol.williams@university.edu,Chemistry,Williams Lab - Protein Chemistry,555-0103,Chemistry 310
+Dr. David Brown,david.brown@university.edu,Materials Science,Brown Lab - Nanomaterials,555-0104,Materials Science 150
+Dr. Emily Davis,emily.davis@university.edu,Chemistry,Davis Lab - Structural Chemistry,555-0105,Chemistry 220
+Dr. Frank Miller,frank.miller@university.edu,Neuroscience,Miller Lab - Systems Neuroscience,555-0106,Neuroscience 412
+Dr. Grace Wilson,grace.wilson@university.edu,Immunology,Wilson Lab - Adaptive Immunity,555-0107,Immunology 305
diff --git a/demo_data/iLab_Exports/ilab_services_sample.csv b/demo_data/iLab_Exports/ilab_services_sample.csv
new file mode 100644
index 0000000..8678ca9
--- /dev/null
+++ b/demo_data/iLab_Exports/ilab_services_sample.csv
@@ -0,0 +1,6 @@
+Service Name,Core,Rate Per Hour,Service ID,Active
+Confocal Microscopy Training,Microscopy Core,50,SVC-001,Yes
+Flow Cytometry Analysis,Flow Cytometry Core,75,SVC-002,Yes
+Mass Spectrometry Run,Proteomics Core,100,SVC-003,Yes
+Sample Preparation - Proteomics,Proteomics Core,60,SVC-004,Yes
+NMR Spectroscopy Analysis,NMR Core,85,SVC-005,No
diff --git a/dev b/dev
index df04d4b..2ace260 160000
--- a/dev
+++ b/dev
@@ -1 +1 @@
-Subproject commit df04d4bc7fe1d6bc6c94b5caa900eb7583f0ab7c
+Subproject commit 2ace2601c0ac8590aa27e66a81976574df1a957a
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 9f78199..ab566f2 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -680,5 +680,5 @@ app.register_blueprint(custom_bp)
 - **Operations Manual**: [OPERATIONS.md](OPERATIONS.md)
 - **API Reference**: [API.md](API.md)
 - **Security Guide**: [SECURITY.md](SECURITY.md)
-- **Feature Index**: [FEATURE_INDEX.md](../FEATURE_INDEX.md)
+- **Feature Index**: [FEATURE_INDEX.md](../dev/features/FEATURE_INDEX.md)
 - **Testing Documentation**: [testing.md](testing.md)
diff --git a/DEVELOPMENT.md b/docs/DEVELOPMENT.md
similarity index 100%
rename from DEVELOPMENT.md
rename to docs/DEVELOPMENT.md
diff --git a/docs/E2E_and_Neo4j_Task_Planning_REVISED.md b/docs/E2E_and_Neo4j_Task_Planning_REVISED.md
deleted file mode 100644
index 1622b63..0000000
--- a/docs/E2E_and_Neo4j_Task_Planning_REVISED.md
+++ /dev/null
@@ -1,33 +0,0 @@
-# E2E and Neo4j Task Planning (Revised — Interpreter Terminology)
-
-This plan aligns E2E testing and the Neo4j refactor with the Interpreter Management System and current API contracts.
-
-## Story: E2E Testing & Neo4j Integration
-- ID: story:e2e-testing
-- Objective: Establish reliable E2E scaffolding (pytest + Playwright) to validate SciDK core flows and support Neo4j persistence refactor.
-
-## Phases
-1. Smoke E2E baseline: Validate core flows (Scan, Browse, Interpreters, Map) without Neo4j.
-2. Neo4j refactor: Make Neo4j the live graph store (foundational).
-3. Expanded E2E: Add Neo4j-specific tests, interpreter workflows, and negatives.
-
-## Success Criteria
-- Core MVP flows pass E2E in CI; Neo4j driver integration solid and tested.
-- Interpreter registration and execution validated E2E.
-
-## Tasks
-- task:e2e:01-smoke-baseline — Playwright smoke E2E baseline (MVP flows). RICE 999. Status: Ready.
-- task:e2e:02-neo4j-refactor — Neo4j as live graph store. RICE 998. Status: Ready.
-- task:e2e:03-expanded-e2e — Neo4j-specific E2E + interpreter workflows + negatives. RICE 997. Status: Planned.
-
-## Interpreter Terminology and APIs
-- Use Interpreters (not Enrichers) consistently.
-- Required endpoints: GET/POST /api/interpreters, GET /api/interpreters/, POST /api/interpreters//test, POST /api/scans//interpret.
-
-## E2E Notes
-- Prefer BASE_URL injection; keep smoke tests fast (<5s/spec) and independent of external services.
-- Add data-testid hooks for Settings, Interpreters, Map, Scan flows.
-
-## References
-- MVP_Architecture_Overview_REVISED.md
-- SciDK_Interpreter_Management_System.md
diff --git a/docs/MVP_Architecture_Overview_REVISED.md b/docs/MVP_Architecture_Overview_REVISED.md
deleted file mode 100644
index de2e270..0000000
--- a/docs/MVP_Architecture_Overview_REVISED.md
+++ /dev/null
@@ -1,83 +0,0 @@
-# MVP Architecture Overview (Revised — Interpreter‑centric)
-
-This document aligns the MVP architecture with the Interpreter Management System and current repository terminology. Interpreters are lightweight, read‑only metadata extractors that understand specific file formats.
-
-## Core UI Areas
-- Home / Scan: start scans via POST /api/scan (or background via /api/tasks)
-- Files / Browse: explore scan snapshot via GET /api/scans//fs
-- Interpreters: render per‑file insights (Python, CSV, IPYNB for MVP)
-- Map: view schema and export instances
-- Interpreter Settings: configure interpreter assignments/rules and registration
-- Rclone Mounts (feature‑flagged): manage safe local FUSE mounts
-- Background Tasks: monitor async scan/interpret/commit
-
-## Key APIs (MVP)
-
-### Filesystem providers
-- GET /api/providers
-- GET /api/provider_roots?provider_id=
-- GET /api/browse?provider_id=&root_id=&path=[&recursive=false&max_depth=1&fast_list=false]
-- POST /api/scan
-- GET /api/datasets, GET /api/datasets/
-
-### Scans
-- GET /api/scans//status
-- GET /api/scans//fs
-- POST /api/scans//interpret
-  - Body: { include?, exclude?, max_size_bytes?, after_rowid?, max_files?, overwrite? }
-  - Returns: { status, processed_count, error_count, filtered_by_size, filtered_by_include, filtered_no_interpreter, next_cursor }
-- POST /api/scans//commit
-  - Returns commit summary including optional Neo4j verification fields
-
-### Background tasks
-- POST /api/tasks { type: 'scan' | 'commit' | 'interpret', ... }
-- GET /api/tasks, GET /api/tasks/
-
-### Interpreters: registry and execution
-- GET /api/interpreters → list available interpreters { id, name, runtime, supported_extensions, metadata_schema }
-- GET /api/interpreters/
-- POST /api/interpreters → register new interpreter { name, runtime, extensions, script, metadata_schema, ... }
-- POST /api/interpreters//test → run test on a sample file { file_path } → { status, result, errors, warnings, execution_time_ms }
-
-### Graph: schema and instance exports
-- GET /api/graph/schema
-- GET /api/graph/schema.neo4j (optional; 501 if driver/misconfig)
-- GET /api/graph/schema.apoc (optional; 502 if APOC unavailable)
-- GET /api/graph/instances.csv?label=