Analyze entire SEC filings with questions that require reading the whole document, not just retrieved chunks. Powered by Recursive Language Models by Alex L. Zhang, Tim Kraska, and Omar Khattab.
Standard RAG retrieves a few chunks and hopes the answer is in there. SEC Deep Dive stores the entire filing as a variable in a Python REPL and lets GPT-5 programmatically navigate it: grepping sections, slicing tables, and spawning recursive sub-LM calls over specific parts. It handles 10-Ks that are 200+ pages without truncation.
Good for questions like:
- "List every related-party transaction anywhere in this filing" (full-document scan)
- "Compare revenue recognition across these 3 filings" (cross-document reasoning)
- "What's the ratio of non-performing loans to total assets?" (multi-section financial math)
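The last question type reduces to arithmetic over figures pulled from different sections of the same filing. A minimal sketch of that final step, with made-up numbers (not from any real filing):

```python
# Hypothetical figures, as an RLM run might extract them from two
# different sections of a 10-K (values invented for illustration).
non_performing_loans = 1_250_000_000   # from the loan-quality disclosures
total_assets = 98_400_000_000          # from the balance sheet

ratio = non_performing_loans / total_assets
print(f"NPL / total assets: {ratio:.2%}")
```

The hard part is not the division but locating both numbers, which is what the full-document scan buys you.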
Clone and install in editable mode. There is no PyPI package: everything runs from source.
```bash
git clone https://github.com/siddvoh/secdd.git
cd secdd
pip install -e .
```
That's it. `pip install -e .` installs the dependencies from `pyproject.toml` and wires up the `secdd` CLI.
- Python 3.11+
- An OpenAI API key with access to GPT-5 models (or run fully local with Ollama)
Set the key in your shell or in a .env file (copied from .env.example):
```bash
cp .env.example .env   # then edit .env and paste your key
# or:
export OPENAI_API_KEY=sk-...
```
Your key stays on your machine. `secdd` reads it at runtime and passes it straight to OpenAI.
```bash
# Simple query (single LLM call)
secdd "What are the main risk factors in retail?"

# Analyze local files
secdd filing.pdf "What are the main risk factors?"
secdd 10k_2023.htm 10k_2024.htm "Compare revenue recognition policies"

# RLM mode: deep multi-pass analysis with recursive sub-calls
secdd --rlm filing.pdf "List every related-party transaction"

# Fetch directly from EDGAR by ticker
secdd --ticker AAPL "Summarize the risk factors in the latest 10-K"
secdd --ticker MSFT --form 10-Q "What changed in revenue recognition?"
```
Options:
| Flag | Description |
|---|---|
| `--rlm` | Enable Recursive Language Model mode (deep analysis, 30-90s) |
| `--ticker`, `-t` | Fetch filing from EDGAR by ticker (implies `--rlm`) |
| `--form`, `-f` | SEC form type (default: 10-K) |
| `--reasoning-effort`, `-r` | `none`, `low`, `medium` (default), `high`, `xhigh` |
| `--depth`, `-d` | RLM recursion depth, 1=fast, 5=thorough (default: 2) |
| `--iterations`, `-i` | RLM max iterations, 5=fast, 30=thorough (default: 15) |
| `--environment`, `-e` | REPL sandbox: `docker` (default, safe) or `local` (fast, dev only) |
| `--local` | Use local Ollama model instead of OpenAI |
| `--version`, `-v` | Print version and exit |
```python
import asyncio
from secdd.engine import analyze_filing

result = asyncio.run(analyze_filing(
    query="What are Apple's biggest risk factors?",
    filing_text=open("apple_10k.txt").read(),
    filing_info="Apple Inc (AAPL) | 10-K | Filed: 2024-11-01",
))
print(result.answer)
print(f"Cost: ${result.estimated_cost_usd:.3f}")
print(f"Sub-calls: {result.num_sub_calls}")
```

```python
import asyncio
from secdd.edgar import get_filing

filing = asyncio.run(get_filing(ticker="AAPL", form_type="10-K"))
print(f"{filing.company_name} | {filing.form_type} | {len(filing.text):,} chars")
```

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes (unless `--local`) | Your OpenAI API key |
| `EDGAR_USER_AGENT` | Recommended | Contact info the SEC wants in the User-Agent, e.g. `Jane Doe <jane@example.com>` |
| `RLM_ENVIRONMENT` | No | `docker` (default, sandboxed) or `local` (in-process, faster but unsafe with untrusted inputs) |
| `RLM_MAX_WORKERS` | No | Max parallel RLM completions (default 3). Lower if you see OOM. |
| `LOCAL_MODEL_BASE_URL` | No | Ollama URL (e.g. `http://localhost:11434/v1`) |
| `LOCAL_ROOT_MODEL_NAME` | No | Local root model (e.g. `qwen3:8b`) |
| `LOCAL_SUB_MODEL_NAME` | No | Local sub model (e.g. `qwen3:4b`) |
Set these in a .env file (loaded automatically) or export them in your shell. .env is gitignored: your key never leaves your machine.
Run analysis using local models via Ollama instead of OpenAI. No API key needed.
```bash
# 1. Install and start Ollama
ollama serve
ollama pull qwen3:8b && ollama pull qwen3:4b

# 2. Set environment variables
export LOCAL_MODEL_BASE_URL=http://localhost:11434/v1
export LOCAL_ROOT_MODEL_NAME=qwen3:8b
export LOCAL_SUB_MODEL_NAME=qwen3:4b

# 3. Use --local flag
secdd --local --rlm filing.pdf "Summarize the risk factors"
```
In Python, pass `use_local=True` to `analyze_filing()`.
```
User question + ticker
        |
        v
EDGAR API --> Fetch full filing text (free, no key needed)
        |
        v
RLM Engine (github.com/alexzhang13/rlm)
        |
        |-- Root LM: GPT-5.4
        |     Receives: query + metadata (NOT the filing text)
        |     Writes:   Python code to navigate the filing
        |     Calls:    llm_query() for semantic sub-analysis
        |
        +-- Sub LM: GPT-5.4-nano
              Receives: specific section/chunk from root LM's code
              Returns:  structured analysis back to root LM
        |
        v
FINAL(answer) with section citations
```
The filing text lives in the REPL as a Python variable named `context`. The root LM never holds it in its context window: it writes code like `context[50000:80000]` or `re.findall(r'Item 7', context)` to navigate, then uses `llm_query()` to reason about specific chunks.
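The kind of code the root LM emits can be sketched as ordinary Python over a string. Here `context` and `llm_query` are stand-ins for what the RLM REPL actually provides; the filing text and the stub's return value are invented for illustration:

```python
import re

# Stand-in for the filing text the REPL exposes as `context`.
context = (
    "Item 1A. Risk Factors\nSupply chain disruption could...\n"
    "Item 7. Management's Discussion and Analysis\nRevenue grew 8%...\n"
    "Item 8. Financial Statements\n..."
)

def llm_query(prompt: str) -> str:
    """Stub for the recursive sub-LM call the RLM REPL provides."""
    return f"[sub-LM analysis of {len(prompt):,} chars]"

# Locate a section header instead of reading the whole filing.
match = re.search(r"Item 7\.", context)
section = context[match.start():match.start() + 30_000]  # slice, don't ingest

# Hand only that slice to a sub-LM for semantic analysis.
print(llm_query(f"Summarize the MD&A:\n{section}"))
```

The point of the pattern is that only the sliced section ever enters an LM context window; the root LM reasons over code and metadata.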
A typical single-filing query costs $0.10 to $0.30. Cross-document queries with many sub-calls can hit $0.50 to $1.00.
| Model | Role | Input | Output |
|---|---|---|---|
| GPT-5.4 | Root (orchestration) | $2.50/1M tokens | $15.00/1M tokens |
| GPT-5.4-nano | Sub (bulk processing) | $0.20/1M tokens | $1.25/1M tokens |
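Plugging representative token counts into the table above shows where a typical single-filing query lands in the $0.10-$0.30 range. The token counts below are illustrative assumptions, not measurements:

```python
# Per-million-token prices from the table above.
PRICES = {
    "gpt-5.4":      {"input": 2.50, "output": 15.00},  # root (orchestration)
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},   # sub (bulk processing)
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one model's usage at the prices above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Assumed shape of a run: the root works with small prompts while the
# nano sub-model churns through the bulk of the filing text.
total = cost("gpt-5.4", 20_000, 4_000) + cost("gpt-5.4-nano", 300_000, 20_000)
print(f"${total:.2f}")
```

Under these assumptions the run costs roughly twenty cents, consistent with the quoted range; cross-document queries multiply the sub-model volume.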
```
secdd/
├── secdd/               # Main package (CLI + core library)
│   ├── __init__.py
│   ├── cli.py           # `secdd` command entry point
│   ├── engine.py        # RLM wrapper, model config, cost tracking
│   ├── edgar.py         # EDGAR API (fetch filings by ticker)
│   └── prompts.py       # Finance + government RLM system prompts
├── tests/               # Pytest suite (pure-function tests, no network)
├── pyproject.toml
├── requirements.txt
├── .env.example
└── LICENSE
```
This project is a thin SEC-filing wrapper around the Recursive Language Models (RLM) library by Alex L. Zhang, Tim Kraska, and Omar Khattab. RLM is what makes full-document analysis possible.
Please cite their paper if you use this:
Alex L. Zhang, Tim Kraska, Omar Khattab. "Recursive Language Models." arXiv:2512.24601, 2026.
```bibtex
@misc{zhang2026recursivelanguagemodels,
  title={Recursive Language Models},
  author={Alex L. Zhang and Tim Kraska and Omar Khattab},
  year={2026},
  eprint={2512.24601},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
}
```
- EDGAR rate limit: ~10 requests/second. The code handles this, but bulk fetching is slow.
- Some older SEC filings are in SGML/ASCII format and parse poorly.
- RLM queries take 30 to 90 seconds due to multiple LM calls. Not suitable for autocomplete-style UX.
- Use `RLM_ENVIRONMENT=docker` in production. The `local` REPL uses in-process code execution, which is fine for dev but not safe with untrusted inputs.
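The ~10 requests/second EDGAR limit above is typically enforced with a client-side throttle. A sketch of the general pattern, not secdd's actual implementation:

```python
import time

class Throttle:
    """Space calls at least `1/rate_per_sec` seconds apart."""

    def __init__(self, rate_per_sec: float):
        self.min_interval = 1.0 / rate_per_sec
        self.last_call = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        delay = self.min_interval - (now - self.last_call)
        if delay > 0:
            time.sleep(delay)  # sleep off the remainder of the interval
        self.last_call = time.monotonic()

throttle = Throttle(rate_per_sec=10)
for url in ["https://www.sec.gov/cgi-bin/browse-edgar"] * 3:
    throttle.wait()
    # the actual HTTP fetch would go here; iterations are >= 100 ms apart
```

Bulk fetching is slow precisely because every request pays this 100 ms floor.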
See CONTRIBUTING.md for guidelines.