NBIM LLM-Powered Dividend Reconciliation System

Overview

This project implements an agent-based reconciliation system powered by Large Language Models (LLMs) to automate dividend booking reconciliation between NBIM and Custody data sources.

It addresses real-world operational challenges by identifying reconciliation breaks, explaining causes, and generating control recommendations. See in agents/README.md for detailed explanation of agents and their prompts.

Project Structure

NBIM_case/
│
├── agents/
│   ├── README.md
│   ├── data_verification_agent.py
│   ├── reconciliation_information_agent.py
│   ├── reconciliation_eval_transaction_agent.py
│   ├── reconciliation_remediation_agent.py
│   └── reconciliation_summary_agent.py
│
├── raw_data/
│   ├── NBIM_Dividend_Bookings.csv
│   └── CUSTODY_Dividend_Bookings.csv
│
├── processed_data/
│   ├── dividend_detailed_join.csv
│   ├── dividend_llm_flagged.csv
│   └── dividend_transaction_summaries.csv
│
├── notebooks/
│   ├── README.md
│   └── data_preprocessing.ipynb
│
├── .env
├── main.py
└── README.md

System Architecture and Workflow

The program operates in two modes, each corresponding to a distinct workflow:

Figure 1 — LLM-driven data verification and reconciliation pipeline.

1. Data Verification Mode

Command:

python3 main.py --verify

This mode validates that the data preprocessing logic is correct before reconciliation begins.

Loads and analyzes the Jupyter notebook in notebooks/data_preprocessing.ipynb.

Uses an LLM to verify that column renaming, data cleaning, and joins follow the expected standards.

Produces a structured verification report to provide data quality.

2. Reconciliation Mode

Command:

python3 main.py --reconciliation

This mode runs the full LLM-driven reconciliation workflow from raw data to summarized results.

Step 1: Data Ingestion and Cleaning

Reads NBIM and Custody CSV files from raw_data/.
Standardizes column names and merges the datasets on event_key and isin.
Produces processed_data/dividend_detailed_join.csv (a detailed outer-joined dataset)

Step 2: Break Identification

Analyzes both datasets to detect common reconciliation break types such as mismatched FX rates, missing events, or incorrect settlement amounts.
Generates a structured dictionary of break types, causes, and control recommendations.

Step 3: Break Evaluation

Evaluates each break type across all transactions using contextual LLM reasoning.
Flags records that exhibit reconciliation inconsistencies.
Produces processed_data/dividend_llm_flagged.csv (a dataset with automated LLM-based flags)

Step 4: Transaction Summarization

Generates short, professional summaries for each transaction explaining what was detected and what action is recommended.
Produces processed_data/dividend_transaction_summaries.csv — a human-readable output ready for audit or review.

Step 5: Reconciliation Remediation

Reviews flagged transactions with the NBIM and Custody records to suggest concrete record-level corrections.
Generates proposed updates and accompanying reasons for each remediation.
Outputs the results to processed_data/dividend_llm_remediated.csv for review and audit.

Data Flow Summary

Input Files

NBIM_Dividend_Bookings.csv
CUSTODY_Dividend_Bookings.csv

Processing Steps

Clean and normalize both datasets.
Merge them by event_key and isin.
Identify and evaluate reconciliation breaks using LLM analysis.
Generate audit-ready summaries.

Output Files

dividend_detailed_join.csv -> Full merged dataset
dividend_llm_flagged.csv -> Transactions with LLM-generated flags
dividend_transaction_summaries.csv -> Summarized reconciliation insights
dividend_llm_remediated -> Suggestions to remediated transactions

Technical Stack

Python 3.10+
LangChain – LLM orchestration framework
Pydantic – Structured validation of model outputs
pandas – Data wrangling and joining
dotenv – Secure environment variable management

Design Goals

Automate reconciliation analysis using LLM reasoning
Reduce manual investigation time
Provide explainable and auditable outputs
Standardize break classification and control documentation

Risk Assessment for target state

The NBIM LLM-Powered Reconciliation System introduces automation into a financial control process.
Key risks and mitigations are summarized below.

Risk	Impact	Likelihood	Mitigation
Incorrect LLM output (hallucination or mislabeling)	High	Medium	Use deterministic pre-checks, strict Pydantic schemas, and low-temperature settings.
Missed or false reconciliation breaks	High	Medium	Combine rule-based checks with LLM reasoning; validate against historical test cases.
Unsafe remediations or data overwrites	High	Low	No direct system writes: all remediations are exported to CSV for human approval.
Data quality or join logic errors	Medium	Medium	Preflight verification agent checks joins, types, and transformations before execution.
Prompt drift / inconsistent behavior	Medium	Medium	Keep a saved version of every LLM prompt and regularly test it on known example cases to make sure it still produces the expected results.
Sensitive data exposure	High	Low	Restrict inputs to necessary columns; store secrets in `.env`.
Auditability and explainability gaps	Medium	Medium	Log all prompts, model versions, and outputs; structured JSON ensures reproducibility.
Model unavailability / vendor outage	Medium	Low	Include fallback “rules-only” mode for continuity.
Untrusted or compromised LLM APIs	High	Low	Use only verified and secure API providers (e.g., OpenAI with authentication keys). Enforce HTTPS, validate responses, and reject unsigned or tampered outputs.

Summary

The system uses an LLM-first approach with guardrails rather than hard deterministic rules. Model outputs are constrained by schema validation, and any proposed remediations are exported to CSV for analyst approval (human-in-the-loop). Fully autonomous financial operations require strong governance and control mechanisms to ensure reliability and trust. At the system level, this includes schema validation, prompt versioning (record of prompt templates), secure API usage, and immutable audit logs for every model decision.
Operationally, all remediation outputs remain human-reviewed before execution, preventing unauthorized or automated postings. Longer-term autonomy should introduce layered controls such as confidence scoring, dual-approval workflows, and rule-based policy checks to contain model risk while maintaining efficiency.

Future Extensions

Add a Streamlit dashboard for reviewing flagged transactions
Integrate retrieval-augmented memory for contextual learning (valuable with previous cases)
Implement confidence scoring on flagging
Enable continuous learning from human feedback (fine-tune analysis based on expert input)
I plan to implement an in-context learning mechanism based on retrieval of similar historical records to support automated remediation. The approach involves clustering records by key attributes (such as break type, instrument, currency, and reconciliation flags), then applying case-based reasoning principles to retrieve and adapt solutions from comparable past cases. This would allow the system to suggest or automatically apply remediation actions based on prior successful resolutions.
Integrate rule-based validation layer (Combine deterministic financial rules with LLM reasoning to ensure control robustness and reduce hallucination risk)
Connect to real-time custody APIs (enables real time reconciliation)

Output

Example output of a reconciliation run:

(venv) anders@Anders-sin-MacBook-Pro NBIM_case % python3 main.py --reconciliation

Running Reconciliation ...

Break Types Dictionary:
 {0: 'Dividend Amount Discrepancy', 1: 'Payment Date Mismatch', 2: 'Tax Rate Variance', 3: 'Currency Conversion Errors', 4: 'Event Type Classification Errors', 5: 'Missing Events', 6: 'Settlement Amount Differences'}

Causes Dictionary:
 {0: 'Differences in dividend calculations between the investment manager and custodian can lead to discrepancies in gross and net amounts reported. This may arise from variations in tax treatment or currency conversion rates applied by each party.', 1: 'Payment dates may differ due to processing times or delays in communication between the investment manager and custodian, leading to mismatches in expected cash flows.', 2: 'Variations in tax rates applied by the investment manager versus the custodian can result in discrepancies in the net amounts received, particularly for international dividends subject to different withholding tax rates.'}

Controls Dictionary:
 {0: 'Implement a standardized process for calculating dividends that includes clear guidelines on tax treatment and currency conversion to ensure consistency across both systems.', 1: "Establish a timeline for payment processing that aligns both parties' expectations and includes regular updates to minimize discrepancies in payment dates.", 2: 'Regularly review and reconcile tax rates applied to dividends, ensuring both parties are aligned on the applicable rates and any changes in tax legislation.'}
Loaded 5 rows from processed_data/dividend_detailed_join.csv

Evaluating break type: Dividend Amount Discrepancy (grouped by 'event_key')
Evaluating Dividend Amount Discrepancy: 100%|██████████████████████████████████████████| 3/3 [00:05<00:00,  1.78s/it]
Completed evaluation for 3 transactions under break type 'Dividend Amount Discrepancy'

Evaluating break type: Payment Date Mismatch (grouped by 'event_key')
Evaluating Payment Date Mismatch: 100%|████████████████████████████████████████████████| 3/3 [00:05<00:00,  1.83s/it]
Completed evaluation for 3 transactions under break type 'Payment Date Mismatch'

Evaluating break type: Tax Rate Variance (grouped by 'event_key')
Evaluating Tax Rate Variance: 100%|████████████████████████████████████████████████████| 3/3 [00:05<00:00,  1.69s/it]
Completed evaluation for 3 transactions under break type 'Tax Rate Variance'

Evaluating break type: Currency Conversion Errors (grouped by 'event_key')
Evaluating Currency Conversion Errors: 100%|███████████████████████████████████████████| 3/3 [00:05<00:00,  1.84s/it]
Completed evaluation for 3 transactions under break type 'Currency Conversion Errors'

Evaluating break type: Event Type Classification Errors (grouped by 'event_key')
Evaluating Event Type Classification Errors: 100%|█████████████████████████████████████| 3/3 [00:05<00:00,  1.86s/it]
Completed evaluation for 3 transactions under break type 'Event Type Classification Errors'

Evaluating break type: Missing Events (grouped by 'event_key')
Evaluating Missing Events: 100%|███████████████████████████████████████████████████████| 3/3 [00:05<00:00,  1.81s/it]
Completed evaluation for 3 transactions under break type 'Missing Events'

Evaluating break type: Settlement Amount Differences (grouped by 'event_key')
Evaluating Settlement Amount Differences: 100%|████████████████████████████████████████| 3/3 [00:05<00:00,  1.83s/it]
Completed evaluation for 3 transactions under break type 'Settlement Amount Differences'

Saved LLM-flagged dataset to processed_data/dividend_llm_flagged.csv

Processed summary for event_key 950123456: The reconciliation of transaction event_key 950123456 has confirmed that all relevant data points align between NBIM and Custody. There are no discrepancies in dividend amounts, payment dates, tax rates, currency conversions, event type classifications, or settlement amounts. All necessary event details are present and consistent, indicating a successful reconciliation with adherence to control guidance for standardized processes and regular reviews.

Processed summary for event_key 960789012: The reconciliation of transaction event_key 960789012 revealed multiple discrepancies, including a dividend amount discrepancy of $342.77 between custody and NBIM records, a payment date mismatch of five days, and a tax rate variance where the custody records indicated a 20% tax rate compared to 25% in NBIM records. Additionally, significant currency conversion errors were identified, with a notable difference in the average FX rate used. The event type was misclassified as 'DVCA' instead of the appropriate classification for a cross-currency dividend payment. To address these issues, it is recommended to implement standardized processes for dividend calculations, align payment processing timelines, and regularly review tax rates to ensure consistency and accuracy across both systems.

Processed summary for event_key 970456789: The reconciliation of transaction event_key 970456789 revealed significant discrepancies in the reported dividend amounts between NBIM and Custody, with NBIM reporting 502,163.9 CHF and Custody reporting a total of 94,605 CHF. Additionally, the analysis identified missing events due to multiple bank accounts associated with the same event_key, suggesting potential unrecorded dividend events. Furthermore, there were settlement amount differences across the rows, indicating inconsistencies in the net amounts settled for the respective nominal bases. No issues were found regarding payment date mismatches, tax rate variances, currency conversion errors, or event type classification errors. It is recommended to implement standardized processes for dividend calculations and regular reconciliations to address these discrepancies.


Running Remediation Agent to propose record-level fixes...

Loaded 3 flagged rows and 5 joined transaction rows.
Generating remediations for 3 transactions...

Remediating: 100%|█████████████████████████████████████████████████████████████████████| 3/3 [00:32<00:00, 10.70s/it]
Overwrote 5 total rows to processed_data/dividend_llm_remediated.csv

Proposed remediation for event_key 950123456: Updated custodian from 'JPMORGAN_CHASE' to 'CUST/JPMORGANUS' to match custody records.

Proposed remediation for event_key 960789012: Updated payment date to match custody record (25.05.2025).

Proposed remediation for event_key 970456789: Updated custodian from 'UBS_SWITZERLAND' to 'CUST/UBSCH' to match custody records for all rows.

Proposed remediation for event_key 970456789: Adjusted avg_fx_rate_quotation_to_portfolio from 12.4567 to 1.0 to reflect custody records for all rows.

Proposed remediation for event_key 970456789: Updated gross_amount_quotation, net_amount_quotation, and net_amount_settlement for the third row to match custody records, correcting discrepancies in the amounts.


Remediation complete. 5 updated rows proposed and written to processed_data/dividend_llm_remediated.csv

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
agents		agents
assets		assets
notebooks		notebooks
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NBIM LLM-Powered Dividend Reconciliation System

Overview

Project Structure

System Architecture and Workflow

1. Data Verification Mode

2. Reconciliation Mode

Step 1: Data Ingestion and Cleaning

Step 2: Break Identification

Step 3: Break Evaluation

Step 4: Transaction Summarization

Step 5: Reconciliation Remediation

Data Flow Summary

Design Goals

Risk Assessment for target state

Summary

Future Extensions

Output

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NBIM LLM-Powered Dividend Reconciliation System

Overview

Project Structure

System Architecture and Workflow

1. Data Verification Mode

2. Reconciliation Mode

Step 1: Data Ingestion and Cleaning

Step 2: Break Identification

Step 3: Break Evaluation

Step 4: Transaction Summarization

Step 5: Reconciliation Remediation

Data Flow Summary

Design Goals

Risk Assessment for target state

Summary

Future Extensions

Output

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages