CLI tool that automates literature research from research questions to curated, ranked, and exported paper sets with structured reports.
- Generates search facets and academic queries from one or more research questions
- Searches Semantic Scholar for candidate papers
- Screens and analyzes papers with an LLM through LiteLLM
- Ranks papers and exports reports, references, JSON data, and PDFs
- Supports resume via a saved `state.json`
```bash
uv pip install litresearch
```

For local development:

```bash
uv sync
uv run nox
```

- Set an LLM API key for a LiteLLM-supported provider:

  ```bash
  export OPENAI_API_KEY=your_key_here
  # or
  export ANTHROPIC_API_KEY=your_key_here
  ```

- Optionally set a Semantic Scholar key for better rate limits:

  ```bash
  export S2_API_KEY=your_key_here
  ```

- Copy the example config and tune defaults:

  ```bash
  cp litresearch.toml.example litresearch.toml
  ```

- Run the pipeline:

  ```bash
  litresearch run "What is the impact of large language models on software engineering?"
  ```

- Inspect the output directory:
  ```text
  output/
    report.md
    paper_analyses.md
    references.bib
    references.ris
    data.json
    papers/
    state.json
  ```
Run one or more research questions:

```bash
litresearch run \
  "How do large language models affect developer productivity?" \
  "What evidence exists about code quality impacts?"
```

Override settings from the CLI:

```bash
litresearch run \
  "How do LLMs affect software engineering?" \
  --model anthropic/claude-sonnet-4-20250514 \
  --top-n 10 \
  --threshold 50 \
  --output-dir runs/llm-se \
  --overwrite
```

Resume an interrupted run:

```bash
litresearch resume output/state.json
```

Inspect current configuration:
```bash
litresearch config
```

Settings load in this order (see the example after the list):

- CLI flags
- Environment variables
- `litresearch.toml`
- Built-in defaults
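As a rough sketch of how the precedence can play out, assuming the documented `SCREENING_THRESHOLD` variable and `--threshold` flag map to the same `screening_threshold` setting and that sources earlier in the list win:

```bash
# litresearch.toml sets screening_threshold = 60 (see the example config below).
# The environment variable would raise it to 70, but the CLI flag should take
# precedence, so the effective threshold for this run would be 50.
SCREENING_THRESHOLD=70 litresearch run \
  "How do LLMs affect software engineering?" \
  --threshold 50
```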
Supported environment variables:

- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `OPENROUTER_API_KEY`
- `S2_API_KEY`
- `S2_TIMEOUT`
- `S2_REQUESTS_PER_SECOND`
- `SCREENING_SELECTION_MODE`
- `SCREENING_TOP_PERCENT`
- `SCREENING_TOP_K`
- `SCREENING_THRESHOLD`
Example `litresearch.toml`:

```toml
default_model = "openai/gpt-4o-mini"
screening_selection_mode = "top_percent"
screening_top_percent = 0.3
screening_threshold = 60
top_n = 20
max_results_per_query = 20
s2_timeout = 10
s2_requests_per_second = 1.0
pdf_first_pages = 4
pdf_last_pages = 2
output_dir = "output"
```

Screening selection modes (a mode-switching example follows the list):
- `top_percent` (default): deep-analyze the top share of screened papers globally
- `top_k`: deep-analyze the top K screened papers globally
- `threshold`: deep-analyze papers scoring `>= screening_threshold`
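A minimal sketch of switching modes for a single run via the documented environment variables, assuming they map onto the same-named config settings (the values are illustrative):

```bash
# Deep-analyze only the 15 highest-scoring screened papers instead of a share.
export SCREENING_SELECTION_MODE=top_k
export SCREENING_TOP_K=15
litresearch run "How do LLMs affect software engineering?"
```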
Semantic Scholar tuning (see the example after the list):

- `s2_timeout`: request timeout in seconds
- `s2_requests_per_second`: global request rate cap across S2 endpoints
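The same knobs can be set per invocation through the documented environment variables; a minimal sketch, with illustrative values rather than recommendations:

```bash
# Back off to one request every two seconds and tolerate slower responses
# when the S2 API is rate-limiting; values are illustrative only.
export S2_REQUESTS_PER_SECOND=0.5
export S2_TIMEOUT=30
litresearch run "How do LLMs affect software engineering?"
```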
Output files:

- `report.md`: main literature review report with research questions, search summary, top papers, and synthesis
- `paper_analyses.md`: detailed per-paper analysis for all analyzed papers
- `references.bib`: BibTeX for ranked papers when citation data is available
- `references.ris`: RIS export for citation managers
- `data.json`: machine-readable export of the pipeline state (a quick inspection example follows this list)
- `papers/`: downloaded open-access PDFs for ranked papers
- `state.json`: resumable pipeline checkpoint
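Since `data.json` is plain JSON, standard tooling works for a first look; a minimal sketch with `jq`, assuming the root of the export is a JSON object (the exact fields depend on the pipeline state schema and are not assumed here):

```bash
# List the top-level keys of the export, then pretty-print the whole document.
jq 'keys' output/data.json
jq '.' output/data.json
```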
Local development commands:

```bash
uv run nox
uv run litresearch --help
```

This is an MVP-oriented proof of concept intended to answer one question clearly: is the end-to-end literature research workflow useful enough to keep investing in?