Skip to content

Feat/e2e pipeline#210

Merged
seyoung4503 merged 26 commits intomasterfrom
feat/e2e-pipeline
Feb 24, 2026
Merged

Feat/e2e pipeline#210
seyoung4503 merged 26 commits intomasterfrom
feat/e2e-pipeline

Conversation

@seyoung4503
Copy link
Collaborator

#️⃣ Issue Number

  • TBD

📝 요약(Summary)

src/lang2sql/ 신규 모듈 코어를 설계하고 자연어 → SQL → 실행 결과까지 이어지는 e2e 파이프라인을
구현했습니다.
RunContext를 제거하고 "define-by-run" 철학(순수 Python 제어흐름)으로 전환했으며, LLM/DB 어댑터, DB
방언별 프롬프트, 샘플 DB 세팅 스크립트, 초보자용 튜토리얼까지 포함합니다.

💬 To Reviewers (선택)

  • run() / _run() 분리 설계를 중점적으로 봐주세요. run()이 hook 발행 + 에러 래핑을 담당하고, 서브클래스는
    _run()만 구현합니다. __call__은 run()으로 위임합니다.
  • db_dialect 파라미터가 BaselineNL2SQL → SQLGenerator → prompts/{dialect}.md 로 전달되는 흐름이 적절한지
    확인 부탁드립니다.
  • core/ 패키지가 외부 의존성 없이 stdlib만 사용하는 제약이 지켜졌는지 확인 부탁드립니다.

PR Checklist

  • BaseComponent / BaseFlow: run() 공개 API (hook 실행), _run() 추상 메서드, call → run() 위임
  • CatalogEntry TypedDict: 스키마 메타데이터 계약
  • KeywordRetriever: BM25 기반 stdlib-only 테이블 검색 (str → list[CatalogEntry])
  • LLMPort / DBPort Protocol: invoke(messages) → str / execute(sql) → list[dict]
  • SQLGenerator: LLM 호출, sql 파싱, db_dialect별 프롬프트 로드
  • SQLExecutor: SQL 실행, 빈 SQL 검증
  • BaselineNL2SQL: KeywordRetriever → SQLGenerator → SQLExecutor e2e 파이프라인
  • AnthropicLLM / OpenAILLM: api_key 파라미터, env var 자동 fallback
  • SQLAlchemyDB: SQLAlchemy 2.x 기반, 모든 SQLAlchemy 지원 DB 연결 가능
  • DB 방언별 프롬프트 파일 (sqlite, postgresql, mysql, bigquery, duckdb)
  • anthropic, sqlalchemy core 의존성으로 격상 (optional extras 제거)
  • scripts/setup_sample_db.py: SQLite/PostgreSQL 샘플 데이터 생성 스크립트
  • quickstart.md: 설치 → API 키 → 샘플 DB → 파이프라인 실행 → 커스터마이징 전체 튜토리얼
  • 테스트: test_components_sql_generator.py, test_components_sql_executor.py, test_flows_nl2sql.py (총
    75개 통과)

…t value pipeline

  - Remove RunComponent Protocol, _apply(), _run_steps()
  - SequentialFlow.run() now takes and returns Any (plain value pipe)
  - BaselineFlow demoted to deprecated alias with DeprecationWarning
…tr → list[CatalogEntry] API

  - run(run: RunContext) → run(query: str) -> list[CatalogEntry]
  - Re-export CatalogEntry from retrieval __init__
  - Remove RunContext contract tests from test_core_base
  - Update SequentialFlow tests to new value-pipe signature
  - Add BaselineFlow deprecation warning assertion
  - Update KeywordRetriever tests to str → list[CatalogEntry]
  - Replace run_query()/RunContext examples with SequentialFlow.run(value)
  - Mark RunContext as legacy utility in RunContext_ko.md
…) the abstract impl

  BaseComponent and BaseFlow now expose run() as the public API (with hook events),
  while _run() becomes the abstract method subclasses implement.
  __call__() is a convenience alias that delegates to run().

  This fixes the design gap where direct .run() calls bypassed the hook system.
  Implements run(query, schemas) -> str with BM25-retrieved schema context building,
  LLMPort.invoke() call,  block extraction, and ComponentError on missing block.
  Implements run(sql) -> list[dict] with empty-sql guard and DBPort.execute() delegation.
  Orchestrates KeywordRetriever → SQLGenerator → SQLExecutor with shared hook injection.
…egrations

  AnthropicLLM: system message extraction, IntegrationMissingError on missing package.
  OpenAILLM: raw openai SDK (not langchain-openai).
  SQLAlchemyDB: SQLAlchemy 2.x text() wrapping and row._mapping conversion.
…roject.toml

  Exports CatalogEntry, LLMPort, DBPort, KeywordRetriever, SQLGenerator, SQLExecutor,
  BaselineNL2SQL, TraceHook, MemoryHook, NullHook, and exception classes.
  Adds [anthropic] and [sqlalchemy] optional dependency groups.
… pipeline

  FakeLLM and FakeDB defined inline (no external mock libraries).
  E2E test verifies 3 component start events via MemoryHook.
…ingError message

  Allow callers to pass an explicit API key to OpenAILLM and AnthropicLLM.
  Remove the optional-extra hint from IntegrationMissingError since both
  packages are now core dependencies.
…ncies

  Remove [project.optional-dependencies] section. Both packages are
  required for the e2e pipeline and should not be optional extras.
  SQLGenerator now accepts db_dialect (sqlite, postgresql, mysql,
  bigquery, duckdb) and loads the matching prompt from
  components/generation/prompts/{dialect}.md. system_prompt takes
  precedence when both are provided.

  BaselineNL2SQL forwards db_dialect to SQLGenerator.
  Covers: sqlite/postgresql prompt loading, unsupported dialect raises
  ValueError, system_prompt overrides db_dialect.
  Creates customers/products/orders/order_items tables with Korean sample
  data for SQLite (default) and PostgreSQL. Fixes postgres default URL to
  match docker-compose-postgres.yml credentials (postgres:postgres).
  Covers installation, API key setup, sample DB creation, SQLAlchemyDB
  connection, catalog definition, BaselineNL2SQL with db_dialect, Hook
  tracing, customization, error handling, and a full feature checklist
  runnable without real API keys using FakeLLM/FakeDB.
…rror message

  Add api_key: str | None = None to AnthropicLLM. Remove extra= from
  IntegrationMissingError since anthropic is now a core dependency.
@seyoung4503 seyoung4503 merged commit 5f4a009 into master Feb 24, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant