…he KG pipeline, and enhance query node extraction with normalization and n-grams.
…tionResult structure
- Removed Normalizer from SLMExtractor, TextRankExtractor, TfidfExtractor, and YakeExtractor.
- Updated ExtractionResult to use 'keywords' instead of 'nodes'.
- Simplified extraction logic in SLMExtractor to handle individual chunks directly.
- Enhanced error logging in extractors for better debugging.
- Introduced OpenRouterClient for handling API requests to OpenRouter.
- Updated llm_extract_keywords.py to utilize environment variables for API keys.
- Removed unused visualizer and normalizer files to streamline the codebase.
- Added new prompt for OpenRouter keyword extraction.
- Refactored pipeline and run_kg_pipeline to improve configuration handling and logging.
…ncies and ensure spaCy model installation
I'm trying to simplify this PR; I will notify you when I'm done.
This was referenced Apr 15, 2026
Knowledge Graph Pipeline for Query Difficulty Estimation
Introduces a fully modular knowledge graph (KG) construction pipeline under src/knowledge_graph/. The KG is intended to support query difficulty estimation, by representing corpus concepts and their relationships as a graph, and graph-based retrieval. See src/knowledge_graph/README.md for detailed information about this module.
PRs structure
Each PR depends on the previous ones.
PR Review — Knowledge Graph: Extraction Cache & Extractor Selection
Key Changes
build.py: load_chunks method for loading chunks and meta pkl generated by index builder.
llm_extract_keywords.py
run_kg_pipeline.py
pipeline.py
extractors (… llm_extract_keywords.py, for example)
linkers
pipeline.py
Other files
prompts.py: The prompts used in the LLM calls.
openrouter_client.py: Client for interacting with OpenRouter; requires an API key.
models.py: Stores the models used in the KG code.
Notable Design Decisions
No merging of partial runs.
--extractor openrouter via the pipeline extracts inline without updating extractions/latest.json. Use llm_extract_keywords.py directly if you want to cache the results first.
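To make the two decisions above concrete, here is a minimal sketch of the difference between inline extraction and a cached run. It is not the code in this PR: the function names (extract_inline, extract_and_cache, load_cached), the Extractor type, and the JSON layout with a top-level "keywords" key are all assumptions made for illustration. It also assumes that "no merging" means a cached run overwrites the cache file rather than combining with earlier results.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical type: an extractor maps one chunk of text to a list of keywords.
Extractor = Callable[[str], List[str]]


def extract_inline(chunks: List[str], extractor: Extractor) -> Dict[str, List[str]]:
    """Run the extractor on every chunk and keep the results in memory only."""
    return {str(i): extractor(chunk) for i, chunk in enumerate(chunks)}


def extract_and_cache(
    chunks: List[str], extractor: Extractor, cache_path: Path
) -> Dict[str, List[str]]:
    """Run the extractor and overwrite the cache file with the full result.

    Assumption: earlier contents of the cache are replaced, never merged.
    """
    result = extract_inline(chunks, extractor)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps({"keywords": result}, indent=2), encoding="utf-8")
    return result


def load_cached(cache_path: Path) -> Dict[str, List[str]]:
    """Read back the keywords written by extract_and_cache."""
    return json.loads(cache_path.read_text(encoding="utf-8"))["keywords"]
```

In this sketch, calling extract_inline corresponds to the inline path (nothing is written to disk), while extract_and_cache followed by load_cached corresponds to caching the results first and reusing them later. A second cached run on a subset of chunks would leave only that subset in the file.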