goresearch

Generate validated, citation-rich research reports from a single Markdown brief. goresearch plans queries, searches the web (optionally via SearxNG), fetches and extracts readable text, and can call an external OpenAI-compatible LLM to synthesize a clean Markdown report with numbered citations, references, and an evidence-check appendix. Local LLM containers are not provided.

Features
Installation
Quick start
Configuration
Usage
Caching and reproducibility
Verification & manifest guide
Robots, opt-out, and politeness policy
Tests
Roadmap
Contributing
Support
License
Project status
Full CLI reference
Run locally with Docker
Local stack helpers (optional)
Troubleshooting & FAQ

Features

End-to-end pipeline: brief parsing → planning → search → fetch/extract → selection/dedup → budgeting → synthesis → validation → verification → rendering.
Grounded synthesis: strictly uses supplied extracts; numbered inline citations map to a final References section.
Evidence check: a second pass extracts claims, maps them to supporting sources, and flags weakly supported statements.
Deterministic and scriptable: low-temperature prompts, structured logs, and explicit flags and environment variables.
Pluggable search: defaults to self-hosted SearxNG; search adapters can be swapped without changing the rest of the pipeline.
Polite fetching: user agent, timeouts, redirect caps, content-type checks, and optional HTTP cache with conditional requests.
Public web only: blocks localhost and private IP addresses, as well as URLs with embedded credentials, to avoid reaching private services.
Token budgeting: proportional truncation fits excerpts into the model's context window without dropping whole sources.
Reproducibility: embedded manifest and sidecar JSON record URLs and content digests used in synthesis.
Dry run: plan queries and select URLs without calling the model.
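The token-budgeting idea can be illustrated with a small sketch. It is not goresearch's implementation. It makes three simplifying assumptions: characters stand in for tokens, every excerpt is shrunk by the same ratio so each source keeps a share of the budget, and the helper name truncateProportionally and its minChars floor are hypothetical.

```go
package main

import "fmt"

// truncateProportionally fits excerpts into a total character budget by
// scaling every excerpt by the same ratio, so no source is dropped entirely.
// Each excerpt keeps at least minChars (or its full length, if shorter), which
// can push the total slightly over budget. Characters stand in for tokens
// here; this is an illustrative assumption, not goresearch's code.
func truncateProportionally(excerpts []string, budget, minChars int) []string {
	total := 0
	for _, e := range excerpts {
		total += len([]rune(e))
	}
	out := make([]string, len(excerpts))
	if total <= budget {
		copy(out, excerpts) // already fits: keep everything unchanged
		return out
	}
	ratio := float64(budget) / float64(total)
	for i, e := range excerpts {
		r := []rune(e)
		keep := int(float64(len(r)) * ratio)
		if keep < minChars {
			keep = minChars
		}
		if keep > len(r) {
			keep = len(r)
		}
		out[i] = string(r[:keep])
	}
	return out
}

func main() {
	excerpts := []string{"aaaaaaaaaa", "bbbbbbbbbbbbbbbbbbbb", "cccccccccccccccccccccccccccccc"}
	// Lengths 10, 20 and 30 (total 60) with a budget of 30 shrink to 5, 10 and 15.
	for _, e := range truncateProportionally(excerpts, 30, 2) {
		fmt.Println(len(e), e)
	}
}
```

Scaling every source by the same ratio preserves each source's relative weight. A simpler "first come, first served" cut would silently drop the last sources.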

Installation

Prerequisites:

Go 1.23+ (module toolchain go1.24.6)
An OpenAI-compatible LLM server (a local open-source runtime is recommended), its model name, and an API key if the server requires one
Optional: a SearxNG instance URL (and API key if required)

Install the CLI directly:

go install github.com/hyperifyio/goresearch/cmd/goresearch@latest

Or build from source:

git clone https://github.com/hyperifyio/goresearch
cd goresearch
go build -o bin/goresearch ./cmd/goresearch

Quick start

One‑liner (deterministic dry run)

Copy and paste this command. It writes a small “hello research” brief, runs a dry run (no LLM required), and prints the beginning of the resulting Markdown report.

printf "%s\n" "# Hello Research — Brief introduction to goresearch" "" \
  "Audience: Developers and researchers" \
  "Tone: Practical, welcoming" \
  "Target length: 800 words" "" \
  "Key questions: What is goresearch? How does it work? What makes it useful for researchers and developers?" \
  > hello-research.md && \
goresearch -dry-run -input hello-research.md -output hello-research-report.md && \
sed -n '1,24p' hello-research-report.md

Expected output (first lines):

# goresearch (dry run)

Topic: Hello Research — Brief introduction to goresearch
Audience: Developers and researchers
Tone: Practical, welcoming
Target Length (words): 800

Planned queries:
1. Hello Research — Brief introduction to goresearch specification
2. Hello Research — Brief introduction to goresearch documentation
3. Hello Research — Brief introduction to goresearch reference
4. Hello Research — Brief introduction to goresearch tutorial
5. Hello Research — Brief introduction to goresearch best practices
6. Hello Research — Brief introduction to goresearch faq
7. Hello Research — Brief introduction to goresearch examples
8. Hello Research — Brief introduction to goresearch comparison
9. Hello Research — Brief introduction to goresearch limitations
10. Hello Research — Brief introduction to goresearch contrary findings

Selected URLs:
1. Hello Research — Brief introduction to goresearch specification — https://github.com/hyperifyio/goresearch
2. Hello Research — Brief introduction to goresearch reference — https://goresearch.dev/reference
3. Hello Research — Brief introduction to goresearch documentation — https://goresearch.dev/documentation
4. Hello Research — Brief introduction to goresearch tutorial — https://goresearch.dev/tutorial

Tip: remove -dry-run and set LLM_BASE_URL (e.g., https://your-llm.example.com/v1), LLM_MODEL (e.g., your/model-id), and LLM_API_KEY (if your server requires it). SEARX_URL is optional.
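The planned queries in the dry-run output above follow a visible pattern: the brief's topic followed by a fixed facet word. The sketch below only reproduces that observed list. The facet list is copied from the sample output, and the real planner may generate queries differently, especially when an LLM is configured.

```go
package main

import "fmt"

// facets is copied from the sample dry-run output in this README; it is an
// observation, not goresearch's source code.
var facets = []string{
	"specification", "documentation", "reference", "tutorial",
	"best practices", "faq", "examples", "comparison",
	"limitations", "contrary findings",
}

// planQueries appends each facet to the topic, mirroring the sample output.
func planQueries(topic string) []string {
	queries := make([]string, 0, len(facets))
	for _, f := range facets {
		queries = append(queries, topic+" "+f)
	}
	return queries
}

func main() {
	for i, q := range planQueries("Hello Research — Brief introduction to goresearch") {
		fmt.Printf("%d. %s\n", i+1, q)
	}
}
```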

“Hello research” brief and result

Brief used above:

# Hello Research — Brief introduction to goresearch

Audience: Developers and researchers
Tone: Practical, welcoming  
Target length: 800 words

Key questions: What is goresearch? How does it work? What makes it useful for researchers and developers?

The dry-run result is written to hello-research-report.md. For comparison, the repository includes committed samples at hello-research-report.md and reports/hello-research-brief-introduction-to-goresearch/report.md.

Create a minimal request.md with topic and optional hints:

# Cursor MDC format — concise overview for plugin authors
Audience: Senior engineers
Tone: Practical, matter-of-fact
Target length: 1200 words

Key questions: spec, examples, best practices.
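A brief is plain Markdown. The first heading is the topic, and lines such as Audience:, Tone:, Target length: and Key questions: carry optional hints.

The hypothetical parser below shows one way to read those fields, using the labels from the example briefs above. It is a sketch, not goresearch's parser, and it ignores any other content.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Brief holds the hints shown in the example briefs. The field set is an
// assumption based on those examples, not goresearch's data model.
type Brief struct {
	Topic, Audience, Tone, KeyQuestions string
	TargetWords                         int
}

// parseBrief reads the first "# " heading as the topic and "Label: value"
// lines as hints. Unknown lines are ignored.
func parseBrief(md string) Brief {
	var b Brief
	for _, line := range strings.Split(md, "\n") {
		line = strings.TrimSpace(line)
		if b.Topic == "" && strings.HasPrefix(line, "# ") {
			b.Topic = strings.TrimSpace(strings.TrimPrefix(line, "# "))
			continue
		}
		label, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.ToLower(strings.TrimSpace(label)) {
		case "audience":
			b.Audience = value
		case "tone":
			b.Tone = value
		case "key questions":
			b.KeyQuestions = value
		case "target length":
			// "1200 words" -> 1200; a non-numeric value leaves the field at 0.
			if fields := strings.Fields(value); len(fields) > 0 {
				if n, err := strconv.Atoi(fields[0]); err == nil {
					b.TargetWords = n
				}
			}
		}
	}
	return b
}

func main() {
	b := parseBrief("# Cursor MDC format\nAudience: Senior engineers\nTone: Practical\nTarget length: 1200 words\n\nKey questions: spec, examples, best practices.")
	fmt.Printf("%+v\n", b)
}
```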

Run goresearch with your LLM endpoint configured (external OpenAI-compatible API):

export LLM_BASE_URL="https://your-llm.example.com/v1"
export LLM_MODEL="your/model-id"
export LLM_API_KEY="your-api-key"  # only if your server requires a key

goresearch \
  -input request.md \
  -output report.md \
  -llm.base "$LLM_BASE_URL" \
  -llm.model "$LLM_MODEL" \
  -llm.key "$LLM_API_KEY"

Open report.md. You should see a title and date, an executive summary, body sections with bracketed citations like [3], a References list with URLs, an Evidence check appendix, and a reproducibility footer.
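Every bracketed citation should resolve to an entry in References, so a simple consistency check is possible. The sketch below is illustrative only; it is not goresearch's validator. It collects [n] markers from the report body and returns any number that has no matching reference, assuming references are numbered 1 to numRefs.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// citationRe matches bracketed numeric citations such as [3].
var citationRe = regexp.MustCompile(`\[(\d+)\]`)

// missingCitations returns citation numbers used in body that fall outside
// 1..numRefs, in order of first appearance and without duplicates.
func missingCitations(body string, numRefs int) []int {
	seen := map[int]bool{}
	var missing []int
	for _, m := range citationRe.FindAllStringSubmatch(body, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil || seen[n] {
			continue
		}
		seen[n] = true
		if n < 1 || n > numRefs {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	body := "Claim one [1]. Claim two [2][5]. Repeat [1]."
	fmt.Println(missingCitations(body, 3)) // prints [5]
}
```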

Tip: explore without calling the LLM first:

goresearch -input request.md -output report.md -dry-run -searx.url "$SEARX_URL"
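When -searx.url is set, search requests go to a SearxNG instance. SearxNG offers a JSON search API at /search?q=QUERY&format=json, provided the JSON format is enabled in the instance's settings.

The standalone sketch below builds such a request URL and decodes the url, title and content fields of each result. goresearch's own adapter may use different parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// searxResult mirrors the SearxNG JSON result fields used in this sketch.
type searxResult struct {
	URL     string `json:"url"`
	Title   string `json:"title"`
	Content string `json:"content"`
}

// searchURL builds a SearxNG JSON search URL from a base such as
// https://searx.example.com (a trailing slash is tolerated).
func searchURL(base, query string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/search"
	q := url.Values{}
	q.Set("q", query)
	q.Set("format", "json")
	u.RawQuery = q.Encode() // keys are sorted: format=json&q=...
	return u.String(), nil
}

// parseResults decodes the "results" array of a SearxNG JSON response.
func parseResults(body []byte) ([]searxResult, error) {
	var resp struct {
		Results []searxResult `json:"results"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return resp.Results, nil
}

func main() {
	u, _ := searchURL("https://searx.example.com", "goresearch documentation")
	fmt.Println(u)
	res, _ := parseResults([]byte(`{"results":[{"url":"https://example.com","title":"Example","content":"snippet"}]}`))
	fmt.Println(res[0].Title, res[0].URL)
}
```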

Configuration

You can configure via flags or environment variables.

Environment variables:

LLM_BASE_URL: base URL for the OpenAI-compatible server (e.g., http://localhost:1234/v1)
LLM_MODEL: model name
LLM_API_KEY: API key for the server (if it requires one)
SEARX_URL: SearxNG base URL (e.g., https://searx.example.com)
SEARX_KEY (optional): SearxNG API key
TOPIC_HASH (optional): included for traceability in cache scoping
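This README does not spell out precedence when both a flag and its environment variable are set. One common Go pattern, assumed here for illustration only, uses the environment variable as the flag's default, so an explicitly passed flag wins.

The flag and variable names below come from this README. The loader itself and the Config type are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds a subset of the LLM and search settings listed in this README.
type Config struct {
	LLMBase, LLMModel, LLMKey, SearxURL string
}

// loadConfig parses args, using environment variables (read via getenv) as
// flag defaults so that explicitly passed flags take precedence. This is an
// assumed precedence rule, not documented goresearch behavior.
func loadConfig(args []string, getenv func(string) string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("goresearch-sketch", flag.ContinueOnError)
	fs.StringVar(&c.LLMBase, "llm.base", getenv("LLM_BASE_URL"), "OpenAI-compatible base URL")
	fs.StringVar(&c.LLMModel, "llm.model", getenv("LLM_MODEL"), "model name")
	fs.StringVar(&c.LLMKey, "llm.key", getenv("LLM_API_KEY"), "API key")
	fs.StringVar(&c.SearxURL, "searx.url", getenv("SEARX_URL"), "SearxNG base URL")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := loadConfig(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	// The API key is deliberately not printed.
	fmt.Printf("base=%q model=%q searx=%q\n", c.LLMBase, c.LLMModel, c.SearxURL)
}
```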

Primary flags (with defaults):

-input (default: request.md): path to input Markdown research request
-output (default: report.md): path for the final Markdown report
-searx.url: SearxNG base URL
-searx.key: SearxNG API key (optional)
-searx.ua: custom User-Agent for SearxNG requests (default identifies goresearch)
-search.file: path to a JSON file providing offline search results for a file-based provider
-llm.base: OpenAI-compatible base URL
-llm.model: model name
-llm.key: API key
-max.sources (default: 12): total sources cap
-max.perDomain (default: 3): per-domain cap
-max.perSourceChars (default: 12000): per-source character limit for excerpts
-min.snippetChars (default: 0): minimum snippet chars to keep a search result
-lang (default: empty): language hint, e.g. en or fi
-dry-run (default: false): plan/select without calling the LLM
-v (default: false): verbose console output (progress). Detailed logs are controlled via -log.level.
-log.level (default: info): structured log level for the log file: trace|debug|info|warn|error|fatal|panic
-log.file (default: logs/goresearch.log): path to write structured JSON logs
-debug-verbose (default: false): allow logging raw chain-of-thought (CoT) for debugging Harmony/tool-call interplay. Off by default.
-cache.dir (default: .goresearch-cache): cache directory
-cache.maxAge (default: 0): purge cache entries older than this duration (e.g. 24h, 7d); 0 disables
-cache.clear (default: false): clear entire cache before run
-cache.topicHash: optional topic hash to scope cache (accepted for traceability)
-cache.strictPerms (default: false): restrict cache at rest (0700 dirs, 0600 files)
-robots.overrideDomains (default from env ROBOTS_OVERRIDE_DOMAINS): comma-separated list of domains for which robots.txt is ignored; takes effect only together with -robots.overrideConfirm
-robots.overrideConfirm (default: false): second confirmation flag required to activate robots override allowlist
-domains.allow (comma-separated): only allow these hosts/domains; subdomains included
-domains.deny (comma-separated): block these hosts/domains; takes precedence over allow
-tools.enable (default: false): enable the tool-orchestrated chat mode
-tools.dryRun (default: false): do not execute tools; append structured dry-run envelopes
-tools.maxCalls (default: 32): maximum number of tool calls per run
-tools.maxWallClock (default: 0): wall-clock cap for the tool loop (e.g., 30s); 0 disables
-tools.perToolTimeout (default: 10s): per-tool execution timeout
-tools.mode (default: harmony): chat protocol mode: harmony or legacy
-verify / -no-verify (default: -verify): enable or disable the fact-check verification pass and Evidence check appendix
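The flags -domains.allow and -domains.deny are described above as host lists where subdomains are included and deny takes precedence over allow.

A minimal sketch of that rule follows. It is not goresearch's code. It assumes matching is exact or on a dot-delimited suffix, and that an empty allow list permits every host that is not denied.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether host equals domain or is a subdomain of it.
// Requiring a dot boundary keeps "notexample.com" from matching "example.com".
func matches(host, domain string) bool {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	domain = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(domain)), ".")
	return domain != "" && (host == domain || strings.HasSuffix(host, "."+domain))
}

// hostAllowed checks the deny list first (deny wins), then the allow list.
// Assumption: an empty allow list allows every host that is not denied.
func hostAllowed(host string, allow, deny []string) bool {
	for _, d := range deny {
		if matches(host, d) {
			return false
		}
	}
	if len(allow) == 0 {
		return true
	}
	for _, a := range allow {
		if matches(host, a) {
			return true
		}
	}
	return false
}

func main() {
	allow := strings.Split("example.com,go.dev", ",")
	deny := strings.Split("ads.example.com", ",")
	for _, h := range []string{"docs.example.com", "ads.example.com", "notexample.com", "go.dev"} {
		fmt.Println(h, hostAllowed(h, allow, deny))
	}
}
```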

Full CLI reference

For a comprehensive, auto-generated list of all flags and environment variables, see: docs/cli-reference.md.

Run locally with Docker (optional)

Important: Docker is not available inside Apple M2 virtual machines (including the project's development environment) because of nested-virtualization limits. There, use non-Docker alternatives instead, for example SearxNG installed via Homebrew or a Python venv, and a local LLM server. On machines with Docker installed, you can run the full local stack.

Prerequisites

Docker Desktop with Compose v2 (or Docker Engine + docker compose CLI)
Recommended: ≥4 CPUs and ≥8 GB RAM if an LLM server also runs on the same machine
Network access to pull images on first run

goresearch is a CLI that you run on the host. Docker Compose is only used to provide optional dependencies.

Start SearxNG locally (optional for web search):

docker compose -f docker-compose.optional.yml up -d searxng

Other optional services in docker-compose.optional.yml can be started as needed; currently this is only a TLS reverse proxy:

# TLS reverse proxy via Caddy (optional; enabled through the tls profile)
docker compose -f docker-compose.optional.yml --profile tls up -d caddy-tls

Environment variables

Compose will read a local .env file when present and also respects exported shell variables. Useful settings:

LLM_BASE_URL: base URL for your LLM server
LLM_MODEL: model identifier known to your LLM server
LLM_API_KEY: API key if your server requires one (not baked into images)
SEARX_URL: internal URL for SearxNG (default http://searxng:8080)
SSL_VERIFY: enable SSL certificate verification; set to false for self-signed certificates (default true)
APP_UID / APP_GID: host user/group IDs to avoid permission issues on bind mounts (e.g., APP_UID=$(id -u) APP_GID=$(id -g) before make up)

Health checks and readiness

Services declare health checks: searxng probes /status.

Check service status and follow logs with:

docker compose -f docker-compose.optional.yml ps
docker compose -f docker-compose.optional.yml logs -f --tail=100
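Scripts that run goresearch right after docker compose up may need to wait until SearxNG answers. The sketch below polls a health URL until it returns HTTP 200 or the attempts run out.

The URL http://localhost:8080/status is an assumption: it presumes the searxng service publishes port 8080 on the host. Check docker-compose.optional.yml for the actual port mapping.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// waitUntil calls check up to attempts times, sleeping interval between
// tries, and reports whether check ever succeeded.
func waitUntil(check func() bool, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if check() {
			return true
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return false
}

// httpOK returns a check that succeeds when url answers with HTTP 200.
func httpOK(url string) func() bool {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() bool {
		resp, err := client.Get(url)
		if err != nil {
			return false
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK
	}
}

func main() {
	// Assumed host address; adjust to your Compose port mapping.
	url := "http://localhost:8080/status"
	if !waitUntil(httpOK(url), 30, 2*time.Second) {
		fmt.Fprintln(os.Stderr, "SearxNG not ready:", url)
		os.Exit(1)
	}
	fmt.Println("SearxNG ready")
}
```

Separating the retry loop from the HTTP check keeps the loop easy to test and reusable for other services.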

Robots, opt-out, and politeness policy

... (content unchanged from original README)

Tests

Run all tests:

go test ./...

... (remaining sections unchanged from original README)
