Goal: Get from zero to your first RO-Crate in under 30 minutes.
Prerequisites: Python 3.10+, git, and 5 minutes.
# Clone the repository
git clone https://github.com/yourusername/scidk.git
cd scidk
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate # bash/zsh
# or: source .venv/bin/activate.fish # fish shell
# Install SciDK in editable mode
pip install -e .
# Initialize environment (optional but recommended)
source scripts/init_env.sh
Verify installation:
scidk-serve --help
# Should show: usage: scidk-serve ...
# Start SciDK
scidk-serve
# or: python3 -m scidk.app
Server starts at: http://127.0.0.1:5000
Open that URL in your browser; you should see the SciDK home page.
- Navigate to Files page (http://127.0.0.1:5000/datasets)
- Select provider: Local Filesystem
- Enter a path (e.g., /home/user/Documents, or use the repository root)
- Check "Recursive" if you want subdirectories
- Click Scan Files
- Wait for scan to complete (progress shown in Background Tasks)
curl -X POST http://127.0.0.1:5000/api/scan \
-H "Content-Type: application/json" \
-d '{"path": "/path/to/your/data", "recursive": true}'
After scanning completes:
- Files page shows all discovered datasets
- Click any dataset to see details:
  - File metadata (size, type, timestamps)
  - Interpreted content (for Python, CSV, JSON, YAML, IPYNB, XLSX)
  - Import dependencies (for code files)
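The scan request shown with curl above can also be issued from Python. This is a minimal sketch using only the standard library; the endpoint and JSON body are taken from the curl call, while the helper names `build_scan_request` and `start_scan` are hypothetical, and the response is returned as raw text because its schema is not described in this guide.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default server address from this guide


def build_scan_request(path: str, recursive: bool = True) -> urllib.request.Request:
    """Build a POST /api/scan request with the same JSON body as the curl call."""
    body = json.dumps({"path": path, "recursive": recursive}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_scan(path: str, recursive: bool = True) -> str:
    """Send the scan request and return the raw response text.

    The response schema is an unknown here, so nothing is parsed.
    """
    request = build_scan_request(path, recursive)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(start_scan("/path/to/your/data"))` would trigger the same scan as the curl command.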
API alternative:
# List all scanned datasets
curl http://127.0.0.1:5000/api/datasets
# Get specific dataset details
curl http://127.0.0.1:5000/api/datasets/<dataset-id>
Selection is currently manual, via browsing. For programmatic selection:
# Use search to find specific file types
curl "http://127.0.0.1:5000/api/search?q=csv"
# Filter by interpreter
curl "http://127.0.0.1:5000/api/search?q=python_code"
Keep a note of the datasets that interest you; RO-Crate packaging is next.
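If you prefer to script the search and keep your selection in a file rather than in your head, something like the following may help. It is a sketch under assumptions: the `/api/search?q=...` endpoint comes from the commands above, but `search_url`, `fetch_json`, and `save_selection` are hypothetical helpers, and `fetch_json` assumes the endpoint returns JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default server address from this guide


def search_url(query: str) -> str:
    """Build the search URL, URL-encoding the query string."""
    return f"{BASE_URL}/api/search?" + urllib.parse.urlencode({"q": query})


def fetch_json(url: str):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def save_selection(dataset_ids, notes_path: str) -> None:
    """Write the chosen dataset identifiers to a JSON notes file, de-duplicated and sorted."""
    with open(notes_path, "w", encoding="utf-8") as handle:
        json.dump(sorted(set(dataset_ids)), handle, indent=2)
```

For example, `fetch_json(search_url("csv"))` returns whatever the server sends for that query; which field holds a dataset identifier depends on the actual response, so inspect it before calling `save_selection`.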
For a scanned directory, generate a minimal RO-Crate:
# Generate RO-Crate JSON-LD for a directory
curl "http://127.0.0.1:5000/api/rocrate?path=/path/to/scanned/dir" > ro-crate-metadata.json
The RO-Crate will include:
- Root Dataset entity
- File/Folder entities (depth=1 by default)
- Contextual metadata per RO-Crate spec
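To check that a generated ro-crate-metadata.json contains the entities listed above, a short script can summarize its `@graph`. The sketch below relies on the RO-Crate convention that the metadata descriptor (the entity with `@id` `ro-crate-metadata.json`) points to the Root Dataset through `about`; the function name `summarize_crate` is hypothetical, and the exact output of SciDK's endpoint may differ.

```python
from collections import Counter


def as_list(value):
    """Normalize a JSON-LD value that may be missing, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_crate(metadata: dict) -> dict:
    """Report the Root Dataset id and how many entities of each @type are present."""
    graph = metadata.get("@graph", [])
    by_id = {entity.get("@id"): entity for entity in graph}
    descriptor = by_id.get("ro-crate-metadata.json")
    root_id = descriptor.get("about", {}).get("@id") if descriptor else None
    type_counts = Counter(t for entity in graph for t in as_list(entity.get("@type")))
    return {
        "root_id": root_id,
        "has_root": root_id is not None and root_id in by_id,
        "type_counts": dict(type_counts),
    }
```

Load the file with `json.load(open("ro-crate-metadata.json"))` and pass the result to `summarize_crate`; a crate without a resolvable Root Dataset reports `has_root` as `False`.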
- Set environment variable: export SCIDK_FILES_VIEWER=rocrate
- Restart server
- Files page will show "Open in RO-Crate Viewer" button
- Click to view embedded crate metadata
Create a complete RO-Crate package with data files:
# Using demo script (recommended)
./scripts/demo_rocrate_export.sh /path/to/scanned/dir ./my-crate.zip
# Manual steps:
# 1. Generate ro-crate-metadata.json (step 6)
# 2. Copy data files into crate directory
# 3. Zip the complete package
mkdir -p my-crate
curl "http://127.0.0.1:5000/api/rocrate?path=/path/to/dir" > my-crate/ro-crate-metadata.json
cp -r /path/to/dir/* my-crate/
zip -r my-crate.zip my-crate/
Result: my-crate.zip is a valid RO-Crate package containing:
- ro-crate-metadata.json (JSON-LD metadata)
- Data files from your scanned directory
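The manual packaging steps can also be done in Python with the standard `zipfile` module. This is a sketch, not SciDK's own export logic: `package_crate` is a hypothetical helper, and it puts ro-crate-metadata.json at the top level of the archive rather than under a my-crate/ prefix, so adjust it if your tooling expects the nested layout.

```python
import zipfile
from pathlib import Path


def package_crate(data_dir: str, metadata_json: str, zip_path: str) -> list[str]:
    """Write the metadata file plus every file under data_dir into a ZIP archive.

    Returns the archive member names in the order they were written.
    zip_path should live outside data_dir so the archive does not include itself.
    """
    root = Path(data_dir)
    names = ["ro-crate-metadata.json"]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("ro-crate-metadata.json", metadata_json)
        for file in sorted(root.rglob("*")):
            if not file.is_file():
                continue
            arcname = file.relative_to(root).as_posix()
            if arcname == "ro-crate-metadata.json":
                continue  # the freshly generated metadata has already been written
            archive.write(file, arcname)
            names.append(arcname)
    return names
```

Here `metadata_json` is the text returned by the `/api/rocrate` call; unlike `cp -r /path/to/dir/*`, the `rglob` walk also picks up hidden files.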
# List the archive contents and inspect the Dataset entities
unzip -l my-crate.zip
jq '.["@graph"][] | select(.["@type"] == "Dataset")' my-crate/ro-crate-metadata.json
# Validate with ro-crate-py (optional)
pip install rocrate
python3 -c "from rocrate.rocrate import ROCrate; c = ROCrate('my-crate'); print(c.root_dataset)"
# Check what's using port 5000
lsof -i :5000
# Change port
export SCIDK_PORT=5001
scidk-serve
- Verify the path exists and is readable
- Check recursive flag if scanning subdirectories
- Install ncdu for faster scanning: brew install ncdu (macOS) or sudo apt install ncdu (Linux)
- Ensure you're running the latest code from main branch
- Check that the /api/rocrate endpoint is implemented (planned for v0.1.0)
- See dev/features/ui/feature-rocrate-viewer-embedding.md for implementation status
Explore more features:
- Map page (http://127.0.0.1:5000/map): Visualize knowledge graph schema
- Labels & Links: Annotate files with custom labels and relationships
- Providers: Connect remote sources via rclone (S3, Google Drive, etc.)
- Neo4j: Enable persistent graph storage (see README § Neo4j integration)
Documentation:
- Full README: /README.md
- Development workflow: dev/README-planning.md
- RO-Crate feature spec: dev/features/ui/feature-rocrate-viewer-embedding.md
Community:
- Report issues: https://github.com/yourusername/scidk/issues
- Contributing: CONTRIBUTING.md
Total time: ~25 minutes from clone to packaged RO-Crate
You're ready! You've now:
- ✅ Installed SciDK
- ✅ Scanned a directory
- ✅ Browsed files and metadata
- ✅ Generated RO-Crate JSON-LD
- ✅ Exported a complete RO-Crate ZIP package
Happy crate-ing! 🎉