RAG-RuleSync is an end-to-end NLP/LLM pipeline that extracts structured Design-for-Manufacturability (DFM) rules from unstructured PDFs, manuals, and specifications. It identifies each rule, assigns its logic, and formats it as a CAD-ready mathematical constraint.
For an in-depth, file-by-file breakdown of the system architecture, constraints, and pipelines, please see Project_description.md.
Install the required packages.
pip install -r requirements.txt

Create a .env file in the root directory and add your Groq API key.
GROQ_API_KEY=gsk_your_actual_key_here
GROQ_MODEL=meta-llama/llama-4-scout-17b-16e-instruct

Launch the interactive web interface to upload PDFs and visualize the extraction in real time.
python -m streamlit run simple_streamlit_app.py

Open your browser to http://localhost:8503. Output tables are saved to the output/ directory as {pdf_name}_FINAL_FORMATTED.csv.
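To pick up the saved tables from a script, a minimal sketch along these lines may help. It assumes only the naming convention stated above (a `_FINAL_FORMATTED.csv` suffix inside `output/`). The helper name `find_formatted_tables` is hypothetical and not part of the project.

```python
from pathlib import Path

# Suffix taken from the naming convention described above.
SUFFIX = "_FINAL_FORMATTED.csv"


def find_formatted_tables(output_dir="output"):
    """Map each source PDF name to the path of its formatted CSV.

    Hypothetical helper: it relies only on the file naming convention,
    not on any project code.
    """
    tables = {}
    for path in sorted(Path(output_dir).glob(f"*{SUFFIX}")):
        # Strip the suffix to recover the original PDF name.
        pdf_name = path.name[: -len(SUFFIX)]
        tables[pdf_name] = path
    return tables


if __name__ == "__main__":
    for pdf_name, path in find_formatted_tables().items():
        print(f"{pdf_name}: {path}")
```

A missing or empty output/ directory simply yields an empty mapping, so the sketch is safe to run before any extraction has finished.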
Use the batch script to process an entire folder of PDFs concurrently. Note: Be cautious with strict API rate limits when doing bulk extractions.
python batch_extract_rules.py --input /path/to/pdfs --output output/compiled_rules

The engine can also be called directly from Python:

import asyncio
from core.enhanced_rule_engine import EnhancedConfig, EnhancedRuleEngine
config = EnhancedConfig()
engine = EnhancedRuleEngine(config)
result = asyncio.run(engine.extract_rules_from_text(
'Wall thickness must be at least 0.8mm for injection molding.',
filename='my_document.pdf'
))
print(f"Extraction result: {result}")

Data flows from raw JSON extractions (output/{pdf_name}.json) into the final structured schema handled by the DFM refinement pipeline (output/{pdf_name}_FINAL_FORMATTED.csv). The CSV contains:
RuleCategory: The manufacturing process, e.g., Sheet Metal, Turning, Injection Molding.
Name: Semantically generated rule name.
Operator/ExpName: The CAD-ready mathematical constraint (e.g., SheetMetal.Thickness >= 0.8).
RuleText: Verbatim text from the source document.
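
As an illustration of how a downstream tool might consume these constraints, the sketch below parses an expression of the form shown above and checks a measured value against it. Several things here are assumptions: the constraint is taken to live in a column named `ExpName`, expressions are taken to follow a simple `Feature operator number` pattern, and the sample row is made up for the example rather than produced by the pipeline. The real CSV header and expression grammar may differ.

```python
import csv
import io
import operator
import re

# Comparison operators this sketch understands (assumed, not exhaustive).
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Matches e.g. "SheetMetal.Thickness >= 0.8".
PATTERN = re.compile(
    r"^\s*([A-Za-z_][\w.]*)\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_constraint(expression):
    """Split a constraint string into (feature, operator symbol, limit)."""
    match = PATTERN.match(expression)
    if match is None:
        raise ValueError(f"Unsupported constraint: {expression!r}")
    feature, symbol, limit = match.groups()
    return feature, symbol, float(limit)


def satisfies(expression, measured):
    """Return True if the measured value meets the constraint."""
    _, symbol, limit = parse_constraint(expression)
    return OPERATORS[symbol](measured, limit)


# Illustrative sample data; the column names are assumptions.
SAMPLE_CSV = (
    "RuleCategory,Name,ExpName,RuleText\n"
    "Sheet Metal,Minimum sheet thickness,SheetMetal.Thickness >= 0.8,"
    "Sheet thickness must be at least 0.8 mm.\n"
)

if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        feature, symbol, limit = parse_constraint(row["ExpName"])
        print(f"{row['Name']}: {feature} {symbol} {limit}")
        print("1.0 mm passes:", satisfies(row["ExpName"], 1.0))
```

Using a regular expression plus a lookup table keeps the check free of `eval`, which matters when the expressions originate from LLM output.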