ziyaom2-stack/vision-bio-agent

Vision Bioinformatics Agent (Prototype)

A local multi-agent bioinformatics analysis assistant for ocular biology research. Prototype built in 3 days to explore the architecture proposed in the UCI summer internship project.

Why this prototype

The UCI proposal describes a local, multi-agent system that runs bioinformatics workflows (scRNA-seq, spatial transcriptomics, multi-omics, QC, DE, pathway enrichment, statistical modeling) on institutional infrastructure without sending sensitive data to the cloud. A core feature is persistent workflow memory.

This prototype covers the core architectural components end-to-end:

  • Local LLM (Qwen2.5 via Ollama) — no data leaves the machine
  • Agent that translates natural language into tool calls
  • Multi-agent mode (Supervisor + Executor) for multi-step workflows
  • Persistent workflow memory (SQLite)
  • FastAPI backend with a chatbot-style web UI
  • Docker / Docker Compose for one-command deployment

Bioinformatics tools are mocked so the engineering layers can be validated independently of the domain logic. In a real deployment, the mock functions would wrap scanpy / Seurat / scvi-tools.

Features

  • Local LLM via Ollama (Qwen2.5-3b). No data leaves the machine.
  • Three agent modes:
    • single — one LLM call, one tool call
    • multi — Supervisor decomposes the request, Executor runs each step
    • rag — Retrieval-Augmented: searches similar past workflows before deciding
  • Persistent workflow memory in SQLite, with vector embeddings for semantic search
  • RAG semantic search using nomic-embed-text embeddings + cosine similarity (no external vector DB needed at prototype scale)
  • Both custom and LangGraph implementations of the multi-agent loop, for comparison
  • FastAPI backend with OpenAPI docs
  • Vanilla HTML/JS chat UI with three tabs: Chat / Semantic Search / History
  • Docker Compose for one-command deployment
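
The semantic search item above (cosine similarity over embeddings stored in SQLite, no vector DB) could look roughly like the sketch below. It is an illustration under assumptions, not the code in rag.py: the table name `workflow_embeddings`, its columns, and the JSON encoding of vectors are all hypothetical, and the toy 3-dimensional vectors stand in for real nomic-embed-text embeddings.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(conn, query_vec, top_k=3):
    """Brute-force scan: score every stored embedding against the query vector."""
    rows = conn.execute(
        "SELECT id, request, embedding FROM workflow_embeddings"
    ).fetchall()
    scored = [
        (cosine_similarity(query_vec, json.loads(emb)), row_id, request)
        for row_id, request, emb in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy data: hypothetical schema, hand-made vectors instead of model embeddings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_embeddings "
    "(id INTEGER PRIMARY KEY, request TEXT, embedding TEXT)"
)
conn.executemany(
    "INSERT INTO workflow_embeddings (request, embedding) VALUES (?, ?)",
    [
        ("QC on sample.csv", json.dumps([1.0, 0.0, 0.0])),
        ("DE on retina_data.csv", json.dumps([0.0, 1.0, 0.0])),
        ("pathway enrichment", json.dumps([0.0, 0.0, 1.0])),
    ],
)
results = search_similar(conn, [0.9, 0.1, 0.0], top_k=2)
```

A full scan like this is linear in the number of stored workflows, which is the trade-off behind "no external vector DB needed at prototype scale".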

Architecture

+-------------------+
|  Browser (HTML)   |  chatbot-style UI
+---------+---------+
          |  HTTP /api/chat, /api/history
          v
+-------------------+
|  FastAPI backend  |  app.py
+---------+---------+
          |
          v
+-----------------------------+
|  Agent layer (agent_core)   |
|  - single-agent             |
|  - multi-agent              |
|     supervisor + executor   |
+-----+-------------------+---+
      |                   |
      v                   v
+-----------+      +--------------+
| Local LLM |      |  Tool layer  |  (mock scanpy/Seurat ops)
| (Ollama   |      |  QC, DE,     |
|  Qwen2.5) |      |  pathway,    |
+-----------+      |  spatial,    |
                   |  annotation  |
                   +------+-------+
                          |
                          v
                   +--------------+
                   |  SQLite      |  workflows.db
                   |  workflow    |  persistent memory
                   |  history     |
                   +--------------+

Project structure

LLM/
├── agent_core.py         core agent logic — LLM, tools, memory, RAG-augmented agent
├── agent_langgraph.py    alternative LangGraph implementation of the multi-agent loop
├── rag.py                embedding + cosine similarity over SQLite-stored vectors
├── app.py                FastAPI web server (chat + search + history APIs)
├── mini_agent.py         CLI version (Day 1 demo, kept for reference)
├── static/
│   └── index.html        chat UI with Chat / Search / History tabs
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
├── .gitignore
└── README.md

How to run

Option A — Local Python (fastest for development)

Prereqs: Python 3.11+, Ollama installed and qwen2.5:3b pulled.

pip install -r requirements.txt
uvicorn app:app --reload

Open http://localhost:8000

Option B — Docker Compose (closer to production)

docker compose up --build
# in a second terminal, pull the model inside the ollama container:
docker exec -it vision-agent-ollama ollama pull qwen2.5:3b

Open http://localhost:8000

Option C — CLI only

python mini_agent.py

API endpoints

Method  Path          Purpose
GET     /             Chat UI (HTML)
POST    /api/chat     Send a message. Body: {"message": "...", "mode": "single" or "multi"}
GET     /api/history  Recent workflow history (persistent memory)
GET     /api/tools    List available tools
GET     /api/health   Liveness probe
GET     /docs         Auto-generated OpenAPI docs
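
As a rough illustration of the POST /api/chat endpoint, the sketch below builds the request with the standard library only. The request body follows the endpoint description above; the helper names are hypothetical, and the assumption that the reply is a JSON object is mine, since the response schema is not described here (check /docs for the real one).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address from the "How to run" section


def build_chat_request(message, mode="single"):
    """Build (but do not send) a POST /api/chat request."""
    payload = json.dumps({"message": message, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, mode="single", timeout=120):
    """Send the request and decode the reply; assumes a JSON response body.

    Requires the server to be running, so it is not called in this sketch.
    """
    request = build_chat_request(message, mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_chat_request("Run quality control on sample.csv", mode="multi")
```

With the server up, `send_chat("Run quality control on sample.csv", mode="multi")` would perform the call. The generous timeout is a guess that allows for slow CPU inference.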

Try it

In the chat UI, try:

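  • Run quality control on sample.csv
  • Analyze differential expression in retina_data.csv
  • Run pathway enrichment on the genes TP53, RHO, OPN1MW
  • Multi-step: Run the full pipeline on sample.csv: QC first, then differential expression, then enrichment analysis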

Then open the History tab. Every call is persisted in workflows.db, so the history survives restarts.

Design decisions worth highlighting

Why a Provider abstraction for the LLM? Development runs against local Ollama (slow on CPU, but private). For lab deployment on the HPC3 GPUs, the same code points at a vLLM service. Switching is a configuration change through environment variables (LLM_PROVIDER, OLLAMA_BASE_URL), with no code edits.
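
One way such a provider switch could be structured is sketched below. Only LLM_PROVIDER and OLLAMA_BASE_URL come from this README; the accepted values "ollama" and "vllm", the extra variables VLLM_BASE_URL and LLM_MODEL, the `ProviderConfig` class, and the default URLs are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical container for the settings an LLM client would need."""
    name: str
    base_url: str
    model: str


def load_provider(env=None):
    """Pick a provider from environment variables (a mapping can be injected for tests)."""
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "ollama").lower()
    model = env.get("LLM_MODEL", "qwen2.5:3b")  # LLM_MODEL is an assumed variable
    if name == "ollama":
        # 11434 is Ollama's default port.
        return ProviderConfig(name, env.get("OLLAMA_BASE_URL", "http://localhost:11434"), model)
    if name == "vllm":
        # VLLM_BASE_URL and its default are assumptions, not documented here.
        return ProviderConfig(name, env.get("VLLM_BASE_URL", "http://localhost:8000/v1"), model)
    raise ValueError(f"Unknown LLM_PROVIDER: {name!r}")


default_cfg = load_provider({})
gpu_cfg = load_provider({"LLM_PROVIDER": "vllm", "VLLM_BASE_URL": "http://gpu-node:8000/v1"})
```

Keeping the choice in one function means the agent code only ever sees a `ProviderConfig`, whichever backend is active.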

Why SQLite for memory? Single-machine, low-concurrency, zero-ops. The schema (a workflows table with tool_name, tool_args, result, timestamp, agent_role) is the foundation for the job description's "versioned, reusable, iteratively refinable" requirement. Migration to PostgreSQL is straightforward when the lab scales to multiple users.
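
A minimal sketch of what that table might look like is given below. The five column names come from the paragraph above; the `id` column, the column types, the JSON encoding of arguments and results, and the helper names are assumptions. The sketch uses an in-memory database, whereas the prototype persists to workflows.db.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Column names follow the README; types and the id column are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workflows (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_name  TEXT NOT NULL,
    tool_args  TEXT NOT NULL,   -- JSON-encoded arguments
    result     TEXT,            -- JSON-encoded tool output
    timestamp  TEXT NOT NULL,   -- ISO 8601, UTC
    agent_role TEXT
)
"""


def record_call(conn, tool_name, tool_args, result, agent_role):
    """Persist one tool call so it survives restarts."""
    conn.execute(
        "INSERT INTO workflows (tool_name, tool_args, result, timestamp, agent_role) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            tool_name,
            json.dumps(tool_args),
            json.dumps(result),
            datetime.now(timezone.utc).isoformat(),
            agent_role,
        ),
    )
    conn.commit()


def recent(conn, limit=10):
    """Most recent calls first."""
    return conn.execute(
        "SELECT tool_name, tool_args, agent_role FROM workflows ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_call(conn, "quality_control", {"file": "sample.csv"}, {"status": "ok"}, "executor")
rows = recent(conn)
```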

Why mock the bioinformatics? This prototype is about the engineering scaffolding: agent orchestration, memory, API, deployment. The mock functions have the same signatures the real scanpy wrappers would have, so swapping in the real wrappers later does not change the agent code.
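
The "same signature" idea could be sketched as follows. The function names, parameters, and return shape are hypothetical, not taken from agent_core.py; the point is only that the agent dispatches through a registry, so replacing a mock body with a real scanpy call leaves the dispatch code untouched.

```python
from typing import Any, Callable, Dict


def quality_control(file: str, min_genes: int = 200) -> Dict[str, Any]:
    """Mock QC: returns a canned summary instead of reading the file."""
    return {"tool": "quality_control", "file": file, "min_genes": min_genes, "mock": True}


def differential_expression(file: str, group_by: str = "condition") -> Dict[str, Any]:
    """Mock DE: same idea, a placeholder result with the expected keys."""
    return {"tool": "differential_expression", "file": file, "group_by": group_by, "mock": True}


# The agent only knows tool names and keyword arguments.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "quality_control": quality_control,
    "differential_expression": differential_expression,
}


def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a tool call produced by the LLM."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**args)


result = call_tool("quality_control", {"file": "sample.csv"})
```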

Multi-agent design. Two roles: a Supervisor that decomposes a request into ordered sub-tasks, and an Executor that picks the right tool per sub-task. This mirrors the job description's "minimal agent framework, gradually extended to specialized modular agents". Future agents could be added for validation, report writing, etc.
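
The Supervisor/Executor split can be sketched in a few lines. This is a simplified illustration rather than the prototype's loop: `stub_llm` returns canned JSON so the example runs offline, and the prompt prefixes, tool names, and lambda tools are all made up for the sketch.

```python
import json


def stub_llm(prompt):
    """Stand-in for the local model: canned JSON replies, no network needed."""
    if prompt.startswith("PLAN:"):
        return json.dumps(["run QC on sample.csv", "run DE on sample.csv"])
    if "QC" in prompt:
        return json.dumps({"tool": "quality_control", "args": {"file": "sample.csv"}})
    return json.dumps({"tool": "differential_expression", "args": {"file": "sample.csv"}})


def supervisor(request, llm):
    """Decompose the request into an ordered list of sub-tasks."""
    return json.loads(llm(f"PLAN: {request}"))


def executor(subtask, llm, tools):
    """Pick a tool for one sub-task and run it."""
    call = json.loads(llm(f"STEP: {subtask}"))
    return tools[call["tool"]](**call["args"])


def run_multi(request, llm, tools):
    """Run every sub-task in the order the supervisor produced."""
    return [executor(step, llm, tools) for step in supervisor(request, llm)]


tools = {
    "quality_control": lambda file: f"QC done on {file}",
    "differential_expression": lambda file: f"DE done on {file}",
}
results = run_multi("QC then DE on sample.csv", stub_llm, tools)
```

Because each role is a plain function taking the LLM as an argument, a later validation or report-writing agent could be added as another function in the same chain.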

Reliability of open-source LLM outputs. Mitigated with: structured JSON prompts, few-shot examples, temperature=0, and a parsing layer that tolerates markdown code fences. Production would add JSON-schema validation and retry-with-correction.
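
A parsing layer that tolerates markdown code fences might look like the sketch below. It is one possible approach, not necessarily what the prototype does: the function name is hypothetical, and the fallback of slicing from the first `{` to the last `}` is an extra heuristic added here.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this example
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_llm_json(text):
    """Extract a JSON value from model output that may be wrapped in a code fence."""
    match = _FENCED.search(text)
    candidate = (match.group(1) if match else text).strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fallback heuristic: take the outermost {...} span and try again.
        start, end = candidate.find("{"), candidate.rfind("}")
        if start != -1 and end > start:
            return json.loads(candidate[start:end + 1])
        raise


plain = parse_llm_json('{"tool": "quality_control"}')
fenced = parse_llm_json(FENCE + 'json\n{"tool": "quality_control"}\n' + FENCE)
chatty = parse_llm_json('Sure! {"tool": "quality_control"} Let me know.')
```

If parsing still fails, the `json.JSONDecodeError` propagates, which is where a retry-with-correction step would hook in.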

What's intentionally not done yet

  • Real scanpy / Seurat integration — needs a domain expert to define the parameter space.
  • Slurm submission — would replace the in-process tool call with sbatch for long-running jobs.
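  • Dedicated vector database: the project mentions semantic search over past workflows. The prototype stores embeddings in SQLite and ranks them by cosine similarity, which is enough at this scale; a vector DB (Chroma / Qdrant) would take over "find me past analyses similar to this one" as the history grows.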
  • Authentication — not needed for a single-user prototype.
  • Streaming responses — currently the API blocks until the agent finishes. Streaming would improve UX for long tool chains.
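
For the Slurm item above, replacing the in-process tool call with sbatch could start from something like this sketch. None of it exists in the prototype: the helper names, the script name run_qc.py, and the default time limit are hypothetical. The sbatch flags used (--parsable, --job-name, --time, --wrap) are standard Slurm options.

```python
import shlex
import subprocess


def build_sbatch_command(job_name, command, time_limit="01:00:00"):
    """Build an sbatch invocation that wraps a shell command as a batch job."""
    return [
        "sbatch",
        "--parsable",                      # print only the job id (and cluster, if any)
        f"--job-name={job_name}",
        f"--time={time_limit}",
        f"--wrap={shlex.join(command)}",   # quote arguments safely for the shell
    ]


def submit(job_name, command):
    """Submit the job and return its id. Needs Slurm installed, so it is not called here."""
    completed = subprocess.run(
        build_sbatch_command(job_name, command),
        check=True,
        capture_output=True,
        text=True,
    )
    # With --parsable the output is "jobid" or "jobid;cluster".
    return completed.stdout.strip().split(";")[0]


cmd = build_sbatch_command("qc", ["python", "run_qc.py", "sample.csv"])
```

The tool layer would then store the returned job id in workflow memory and poll for completion instead of blocking on the result.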

Tech stack

  • LLM runtime: Ollama (Qwen2.5-3b for dev, easily swapped to 7B or 32B on lab GPUs)
  • Backend: FastAPI + Uvicorn
  • Agent framework: custom minimal loop by default, which keeps the dependency surface small; an alternative LangGraph implementation of the multi-agent loop (agent_langgraph.py) is included for comparison, and the abstractions map cleanly onto LangGraph nodes/edges
  • Memory: SQLite
  • Frontend: Vanilla HTML/JS (no React build step — keeps the prototype reviewable)
  • Container: Docker + Docker Compose

Notes for the maintainer / reviewer

This prototype was built in roughly 3 days. The goal was not feature completeness — it was to validate that the core architecture from the proposal is buildable end-to-end with the tools available, and to surface the design questions that will matter in the real build (provider abstraction, memory schema, multi-agent decomposition, deployment topology).

I am happy to discuss any of the above and the alternatives I considered.
