A local multi-agent bioinformatics analysis assistant for ocular biology research. Prototype built in 3 days to explore the architecture proposed in the UCI summer internship project.
The UCI proposal describes a local, multi-agent system that runs bioinformatics workflows (scRNA-seq, spatial transcriptomics, multi-omics, QC, DE, pathway enrichment, statistical modeling) on institutional infrastructure without sending sensitive data to the cloud. A core feature is persistent workflow memory.
This prototype covers the core architectural components end-to-end:
- Local LLM (Qwen2.5 via Ollama) — no data leaves the machine
- Agent that translates natural language into tool calls
- Multi-agent mode (Supervisor + Executor) for multi-step workflows
- Persistent workflow memory (SQLite)
- FastAPI backend with a chatbot-style web UI
- Docker / Docker Compose for one-command deployment
Bioinformatics tools are mocked so the engineering layers can be validated independently from domain logic. In a real deployment, the mock functions would wrap scanpy / Seurat / scvi-tools.
- Local LLM via Ollama (Qwen2.5-3b). No data leaves the machine.
- Three agent modes:
single— one LLM call, one tool callmulti— Supervisor decomposes the request, Executor runs each steprag— Retrieval-Augmented: searches similar past workflows before deciding
- Persistent workflow memory in SQLite, with vector embeddings for semantic search
- RAG semantic search using
nomic-embed-textembeddings + cosine similarity (no external vector DB needed at prototype scale) - Both custom and LangGraph implementations of the multi-agent loop, for comparison
- FastAPI backend with OpenAPI docs
- Vanilla HTML/JS chat UI with three tabs: Chat / Semantic Search / History
- Docker Compose for one-command deployment
+-------------------+
| Browser (HTML) | chatbot-style UI
+---------+---------+
| HTTP /api/chat, /api/history
v
+-------------------+
| FastAPI backend | app.py
+---------+---------+
|
v
+-----------------------------+
| Agent layer (agent_core) |
| - single-agent |
| - multi-agent |
| supervisor + executor |
+-----+-------------------+---+
| |
v v
+-----------+ +--------------+
| Local LLM | | Tool layer | (mock scanpy/Seurat ops)
| (Ollama | | QC, DE, |
| Qwen2.5) | | pathway, |
+-----------+ | spatial, |
| annotation |
+------+-------+
|
v
+--------------+
| SQLite | workflows.db
| workflow | persistent memory
| history |
+--------------+
LLM/
├── agent_core.py core agent logic — LLM, tools, memory, RAG-augmented agent
├── agent_langgraph.py alternative LangGraph implementation of the multi-agent loop
├── rag.py embedding + cosine similarity over SQLite-stored vectors
├── app.py FastAPI web server (chat + search + history APIs)
├── mini_agent.py CLI version (Day 1 demo, kept for reference)
├── static/
│ └── index.html chat UI with Chat / Search / History tabs
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
├── .gitignore
└── README.md
Prereqs: Python 3.11+, Ollama installed and qwen2.5:3b pulled.
pip install -r requirements.txt
uvicorn app:app --reload
docker compose up --build
# inside the ollama container, pull the model:
docker exec -it vision-agent-ollama ollama pull qwen2.5:3b
python mini_agent.py
| Method | Path | Purpose |
|---|---|---|
| GET | / | Chat UI (HTML) |
| POST | /api/chat | Send a message. Body: {"message": "...", "mode": "single" or "multi"} |
| GET | /api/history | Recent workflow history (persistent memory) |
| GET | /api/tools | List available tools |
| GET | /api/health | Liveness probe |
| GET | /docs | Auto-generated OpenAPI docs |
In the chat UI, try:
对 sample.csv 做质量控制分析 retina_data.csv 的差异表达对基因 TP53, RHO, OPN1MW 做通路富集- Multi-step:
帮我对 sample.csv 做完整流程:先QC,再差异表达,再富集分析
Then open the Workflow History tab — every call is persisted in workflows.db, so it survives restarts.
Why a Provider abstraction for the LLM? Development runs against local Ollama (CPU is slow but private). For lab deployment on the HPC3 GPUs, the same code points at a vLLM service. Switching is a single environment variable (LLM_PROVIDER, OLLAMA_BASE_URL).
Why SQLite for memory? Single-machine, low-concurrency, zero-ops. The schema (workflows table with tool_name, tool_args, result, timestamp, agent_role) is the foundation for the JD's "versioned, reusable, iteratively refinable" requirement. Migration to PostgreSQL is straightforward when the lab scales to multiple users.
Why mock the bioinformatics? This prototype is about the engineering scaffolding — agent orchestration, memory, API, deployment. The mock functions have the same signatures the real scanpy wrappers would have, so swapping them in later does not change the agent code.
Multi-agent design. Two roles: a Supervisor that decomposes a request into ordered sub-tasks, and an Executor that picks the right tool per sub-task. This mirrors the JD's "minimal agent framework, gradually extended to specialized modular agents". Future agents could be added for validation, report writing, etc.
Reliability of open-source LLM outputs. Mitigated with: structured JSON prompts, few-shot examples, temperature=0, and a parsing layer that tolerates markdown code fences. Production would add JSON-schema validation and retry-with-correction.
- Real
scanpy/Seuratintegration — needs a domain expert to define the parameter space. - Slurm submission — would replace the in-process tool call with
sbatchfor long-running jobs. - Vector memory for semantic search over past workflows — the project mentions this; SQLite handles the metadata, but a vector DB (Chroma / Qdrant) would handle "find me past analyses similar to this one".
- Authentication — not needed for a single-user prototype.
- Streaming responses — currently the API blocks until the agent finishes. Streaming would improve UX for long tool chains.
- LLM runtime: Ollama (Qwen2.5-3b for dev, easily swapped to 7B or 32B on lab GPUs)
- Backend: FastAPI + Uvicorn
- Agent framework: Custom minimal loop (no LangGraph yet — kept dependency surface small for the prototype, but the abstractions map cleanly onto LangGraph nodes/edges if migration is desired)
- Memory: SQLite
- Frontend: Vanilla HTML/JS (no React build step — keeps the prototype reviewable)
- Container: Docker + Docker Compose
This prototype was built in roughly 3 days. The goal was not feature completeness — it was to validate that the core architecture from the proposal is buildable end-to-end with the tools available, and to surface the design questions that will matter in the real build (provider abstraction, memory schema, multi-agent decomposition, deployment topology).
I am happy to discuss any of the above and the alternatives I considered.