A multi-stage Map-Reduce LLM pipeline, built with LangChain, that extracts and validates action items from long meeting transcripts.
Each extracted action item conforms to this JSON schema:

```json
{
  "task": "",
  "owner": "",
  "deadline": "",
  "confidence": 0.0
}
```

Project structure:

```
Map-Reduce-Chain/
├── src/
│   ├── config.py
│   ├── models.py              # Pydantic schemas
│   ├── document_loader.py     # LangChain Documents + metadata
│   ├── map_chain.py           # MAP chain (Prompt + LLM + Parser)
│   ├── reduce_chain.py        # REDUCE chain
│   ├── confidence_chain.py    # Confidence scoring chain
│   ├── validation.py
│   ├── main.py                # Pipeline orchestration
│   └── prompts/
│       ├── map_prompt.yaml
│       └── reduce_prompt.yaml
├── tests/
├── data/
├── notebooks/
├── .env.example
└── README.md
```
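The schema above maps naturally onto a Pydantic model. A minimal sketch (the class name `ActionItem` and the field descriptions are assumptions, not the exact contents of `models.py`):

```python
from pydantic import BaseModel, Field

class ActionItem(BaseModel):
    """One extracted action item (fields mirror the JSON schema above)."""
    task: str = Field(description="What needs to be done")
    owner: str = Field(description="Who is responsible")
    deadline: str = Field(description="When it is due, as stated in the transcript")
    confidence: float = Field(ge=0.0, le=1.0, description="Heuristic certainty score")

item = ActionItem(task="Send the Q3 report", owner="Alice",
                  deadline="Friday", confidence=0.9)
```

The `ge`/`le` constraints make Pydantic reject out-of-range confidence scores at parse time, which is what lets the retry logic catch malformed LLM output.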
Create and activate a virtual environment:

```shell
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Configure your API key:

```shell
# Copy the example to .env
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-...
```

Verify the setup:

```shell
pytest tests/ -v
```

Project milestones:

- ✅ Define action item schema
- ✅ Transcript ingestion & metadata handling
- ✅ Smart chunking (by speaker turns, 1-2 minutes)
- ✅ MAP prompt + LangChain chain
- ✅ Output validation & retry logic
- ✅ Merge logic definition
- ✅ REDUCE prompt + chain
- ✅ Confidence scoring layer
- ✅ Edge case handling
- ✅ UI/CLI implementation
- ✅ Documentation
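The speaker-turn chunking step can be sketched in plain Python. This is a simplified stand-in for the real splitter in `document_loader.py`; it assumes `Name: utterance` lines:

```python
import re

def split_by_speaker_turns(transcript: str, max_turns: int = 6) -> list[str]:
    """Split a 'Name: utterance' transcript into chunks of consecutive
    speaker turns, so no turn is ever cut in half (simplified sketch)."""
    # A new turn starts at a line beginning with a capitalized name and a colon
    turns = [t.strip()
             for t in re.split(r"\n(?=[A-Z][\w ]*:)", transcript)
             if t.strip()]
    return ["\n".join(turns[i:i + max_turns])
            for i in range(0, len(turns), max_turns)]

raw = "Alice: Let's ship Friday.\nBob: I'll own the release notes.\nAlice: Great."
chunks = split_by_speaker_turns(raw, max_turns=2)
```

Splitting on turn boundaries (rather than a fixed character count) is what preserves the "who said what" context the MAP prompt relies on.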
LangChain concepts used:

- Map-Reduce Chains: Split, process, and consolidate
- PromptTemplate: Reusable prompt patterns
- LLMChain: Chain prompts with LLM calls
- PydanticOutputParser: Structured extraction
- Document Objects: Metadata-aware text processing
- Custom Text Splitters: Preserve speaker context
- Retry & Validation: Reliability patterns
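A `map_prompt.yaml` in LangChain's `load_prompt` serialization format might look like this (hypothetical wording, not the actual file contents):

```yaml
# prompts/map_prompt.yaml — hypothetical sketch
_type: prompt
input_variables: ["chunk", "format_instructions"]
template: |
  You are extracting action items from a meeting transcript excerpt.
  Extract every task with its owner and deadline. If a field is not
  stated, leave it empty. Do not invent information.

  Transcript chunk:
  {chunk}

  {format_instructions}
```

The `format_instructions` slot is where `PydanticOutputParser.get_format_instructions()` output would be injected at runtime.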
Python API:

```python
from src.main import ActionItemExtractor

extractor = ActionItemExtractor()
items = extractor.extract("path/to/transcript.txt", "meeting_001")

for item in items:
    print(f"Task: {item.task}")
    print(f"Owner: {item.owner}")
    print(f"Confidence: {item.confidence}")
```

CLI:

```shell
python src/main.py transcript.txt actions.json
```

Streamlit UI:

```shell
streamlit run src/app.py
```

Edit `src/config.py` to customize:

- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
- `BATCH_SIZE`: Number of chunks to process at once
- `CONFIDENCE_THRESHOLD`: Minimum confidence score (0-1)
- `OPENAI_MODEL`: LLM to use (gpt-4, gpt-3.5-turbo)
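A minimal sketch of how `src/config.py` might wire these settings to environment variables (the defaults here are assumptions, not the project's actual values):

```python
# Sketch of src/config.py — settings fall back to defaults when unset
import os

OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "5"))
CONFIDENCE_THRESHOLD = float(os.getenv("CONFIDENCE_THRESHOLD", "0.6"))

# Fail fast on nonsensical values rather than mid-pipeline
assert 0.0 <= CONFIDENCE_THRESHOLD <= 1.0
assert BATCH_SIZE >= 1
```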
Run all tests:

```shell
pytest tests/ -v
```

Run a specific test:

```shell
pytest tests/ -k map -v
```

With coverage:

```shell
pytest tests/ --cov=src --cov-report=html
```

Why Map-Reduce:

- Scalability: Process transcripts of any length
- Reliability: LLM operates on focused contexts
- Debuggability: Each stage is testable independently
- Flexibility: Easy to add validation layers
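The retry-and-validation layer from the checklist is the simplest example of that flexibility. A minimal sketch with stand-in `call_fn`/`parse_fn` (no LLM involved):

```python
import json

def extract_with_retry(call_fn, parse_fn, max_attempts: int = 3):
    """Call the model, parse its output, and retry on parse failure."""
    last_error = None
    for attempt in range(max_attempts):
        raw = call_fn(attempt)
        try:
            return parse_fn(raw)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error

# Stand-in "model": the first reply is malformed, the retry is valid JSON
outputs = iter(["not-json", '{"task": "ship"}'])
result = extract_with_retry(lambda _attempt: next(outputs), json.loads)
```

In the real pipeline, `parse_fn` would be the `PydanticOutputParser`, so schema violations trigger a retry just like malformed JSON does.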
Pipeline flow:

```
Raw Transcript
      ↓
[Ingestion] → Add metadata, normalize
      ↓
[Chunking] → Speaker turns, preserve context
      ↓
[MAP Phase] → Extract candidates from each chunk
      ↓
[REDUCE Phase] → Deduplicate, fill gaps, normalize
      ↓
[Confidence Scoring] → Rate certainty
      ↓
[Validation] → Handle edge cases
      ↓
Structured Action Items
```
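Stripped of the LLM calls, the MAP and REDUCE phases above reduce to a small loop. The `map_fn`/`reduce_fn` below are illustrative stand-ins for the real chains:

```python
from collections import OrderedDict

def run_pipeline(chunks, map_fn, reduce_fn):
    """MAP each chunk independently, then REDUCE all candidates once."""
    candidates = []
    for chunk in chunks:              # MAP: the LLM sees one focused chunk
        candidates.extend(map_fn(chunk))
    return reduce_fn(candidates)      # REDUCE: dedupe, fill gaps, normalize

# Stand-ins for the LLM chains:
def map_fn(chunk):
    return [{"task": t.strip()} for t in chunk.split(";") if t.strip()]

def reduce_fn(items):
    # Deduplicate by task text, keeping first-seen order
    return list(OrderedDict((i["task"], i) for i in items).values())

result = run_pipeline(["ship release; ship release", "update docs"], map_fn, reduce_fn)
```

Because each MAP call only sees one chunk, the same task mentioned in two chunks shows up twice in `candidates`; deduplication is deliberately deferred to the single REDUCE pass.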
Known limitations:

- Requires clear speaker labels in the transcript
- Performance degrades on very long transcripts (>1hr) without chunking optimization
- Confidence scores are heuristic-based
- LLM hallucinations possible on ambiguous deadlines
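One cheap guard against hallucinated owners or deadlines is to require that extracted values literally appear in the source text. A sketch (the actual checks in `validation.py` may differ):

```python
def grounded(item: dict, transcript: str) -> bool:
    """Reject an item whose owner or deadline never appears in the transcript."""
    text = transcript.lower()
    for field in ("owner", "deadline"):
        value = item.get(field, "")
        if value and value.lower() not in text:
            return False
    return True

transcript = "Bob: I'll send the report by Friday."
ok = grounded({"task": "send report", "owner": "Bob", "deadline": "Friday"}, transcript)
```

This substring check is crude (it misses paraphrases like "end of week" vs. "Friday"), but it cheaply filters the worst fabrications before confidence scoring.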
Contributing:

- Create a feature branch
- Write tests for new features
- Run `black` and `flake8` before committing
- Update README if adding new functionality
Built with: LangChain, OpenAI, Pydantic, Streamlit