This directory contains approximately 80,000 historically themed banking transactions, generated from actual events in the Medici Bank's operations between 1390 and 1440. The data represents major financial and political events in Renaissance Italy.
The dataset was expanded from the original 20,000 transactions by running generate_additional_data.py, which:
- Appended 60,000 more legitimate transactions (same historical distribution)
- Injected a hidden embezzlement trail (~230 transactions, ~95,610 florins, 1420–1424) for forensic analysis exercises
- medici_transactions.csv (~11 MB) - Transaction data in CSV format (~80,000 rows)
- medici_transactions.json (~29 MB) - Transaction data in JSON format (~80,000 records)

- generate_historical_data.py - Script to generate the initial 20,000 historical transactions
- generate_additional_data.py - Script to expand to 80,000+ transactions and inject the embezzlement trail
- validate_transactions.py - Script to validate the generated data
Each transaction contains the following fields:
| Field | Type | Description |
|---|---|---|
| id | Integer | Unique transaction identifier |
| date | ISO Date | Transaction date (YYYY-MM-DD) |
| branch | String | Branch location (Florence, Rome, Venice, etc.) |
| type | String | Transaction type (see types below) |
| counterparty | String | The other party in the transaction |
| description | String | Detailed transaction description |
| debit_account | String | Account to debit |
| debit_amount | Decimal | Amount to debit (in florins) |
| credit_account | String | Primary account to credit |
| credit_amount | Decimal | Amount to credit (in florins) |
| credit_account_2 | String | Secondary credit account (optional) |
| credit_amount_2 | Decimal | Secondary credit amount (optional) |
| currency | String | Currency used (typically "florin") |
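As a minimal sketch of how these fields might be read into typed values, the following parses one CSV row into a dataclass. It assumes the CSV header uses exactly the field names listed above and that optional fields are left empty when unused. The `Transaction` class, the `parse_row` helper, and the account names in the sample row are hypothetical and not part of the project's scripts.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Transaction:
    """Hypothetical typed view of one row; field names follow the table above."""
    id: int
    date: date
    branch: str
    type: str
    counterparty: str
    description: str
    debit_account: str
    debit_amount: Decimal
    credit_account: str
    credit_amount: Decimal
    credit_account_2: Optional[str]
    credit_amount_2: Optional[Decimal]
    currency: str


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into a Transaction."""
    second_amount = row.get("credit_amount_2")
    return Transaction(
        id=int(row["id"]),
        date=date.fromisoformat(row["date"]),
        branch=row["branch"],
        type=row["type"],
        counterparty=row["counterparty"],
        description=row["description"],
        debit_account=row["debit_account"],
        debit_amount=Decimal(row["debit_amount"]),
        credit_account=row["credit_account"],
        credit_amount=Decimal(row["credit_amount"]),
        # Optional fields: an empty string is treated as "not present".
        credit_account_2=row.get("credit_account_2") or None,
        credit_amount_2=Decimal(second_amount) if second_amount else None,
        currency=row["currency"],
    )


# Illustrative row; the account names are assumptions, not taken from the data.
SAMPLE = (
    "id,date,branch,type,counterparty,description,debit_account,debit_amount,"
    "credit_account,credit_amount,credit_account_2,credit_amount_2,currency\n"
    '1,1390-01-06,Florence,loan_repayment,Duke of Milan,"Loan repayment with interest",'
    "Cash,829.16,Loans Receivable,720.31,Interest Income,108.85,florin\n"
)

tx = parse_row(next(csv.DictReader(io.StringIO(SAMPLE))))
print(tx.date, tx.debit_amount, tx.credit_amount_2)
```

`Decimal` is used rather than `float` so that amounts compare exactly when checking debits against credits.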
The data includes the following transaction types based on historical banking operations:
- deposit (31.7%) - Customer deposits, especially from papal sources
- operating_expense (13.6%) - Daily operating costs (wages, rent, supplies, etc.)
- war_financing (13.1%) - Loans to Florence, Venice for various wars
- loan_repayment (10.3%) - Loan repayments with interest
- loan_issuance (9.2%) - Loans to merchants and nobles
- bill_of_exchange (8.0%) - International money transfers (Medici innovation)
- withdrawal (7.8%) - Customer withdrawals
- alum_trade (6.3%) - Trade in alum from papal mines
- ransom_payment (<0.1%) - Special: Council of Constance ransom (35,000 florins)
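A distribution like the one above could be recomputed from the `type` column with a simple frequency count. The sketch below shows the arithmetic on a made-up list of type values. The `type_shares` helper is hypothetical and is not claimed to be how validate_transactions.py checks distributions.

```python
from collections import Counter


def type_shares(types):
    """Return {type: percentage of all transactions}, largest share first."""
    counts = Counter(types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}


# Made-up input, only to show the call pattern.
sample_types = ["deposit"] * 3 + ["withdrawal"]
print(type_shares(sample_types))
```

With the real file, the input would be the `type` value of every row, for example collected with `csv.DictReader`.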
The data is based on real historical events:
- In 1410, Pope John XXIII appointed the Medici as papal bankers
- The Rome branch held ~100,000 florins in papal deposits
- This was the bank's most important client relationship
- The data includes the famous 35,000 florin ransom payment (May 29, 1415)
- Giovanni di Bicci de' Medici paid this to secure Pope John XXIII's release
- This was almost half the bank's profits from its first 20 years
- First War (1390-1402): Against Gian Galeazzo Visconti
- Second War (1422-1426): Against Duke Filippo Maria of Milan
- Significant war financing transactions during these periods
- Conflicts between Venice and Milan
- Medici financed Florence's participation
- Data shows increased war financing during this period
- Rome (33.1%) - Papal banking center
- Florence (22.1%) - Home base
- Venice (9.7%) - Major trading partner
- London, Bruges, Avignon, Geneva, Milan - International network
All transactions maintain proper double-entry accounting:
- Total Transactions: ~80,230
- Every transaction is perfectly balanced (debits = credits).
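The balance property can be spot-checked independently. The following is a minimal sketch, assuming the column names from the field table and that an empty `credit_amount_2` means no secondary credit. The `find_unbalanced` helper and the sample ids are hypothetical, and this is not the check implemented in validate_transactions.py.

```python
import csv
import io
from decimal import Decimal


def to_decimal(value):
    """Parse an amount field; empty or missing optional fields count as zero."""
    return Decimal(value) if value not in (None, "") else Decimal("0")


def find_unbalanced(rows, tolerance=Decimal("0")):
    """Return ids of rows whose debit differs from the sum of both credits."""
    unbalanced = []
    for row in rows:
        debit = to_decimal(row.get("debit_amount"))
        credit = to_decimal(row.get("credit_amount")) + to_decimal(row.get("credit_amount_2"))
        if abs(debit - credit) > tolerance:
            unbalanced.append(row["id"])
    return unbalanced


# Tiny in-memory sample (made-up ids) so the sketch runs without the real file.
# Row 3 is deliberately unbalanced.
SAMPLE = """id,debit_amount,credit_amount,credit_amount_2
1,829.16,720.31,108.85
2,100.00,100.00,
3,50.00,40.00,
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(find_unbalanced(sample_rows))
```

To run it against the real data, replace the in-memory sample with `csv.DictReader(open("medici_transactions.csv", newline=""))`. An empty result would be consistent with the claim that every transaction balances.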
# View first 10 transactions in CSV (header row plus 10 data rows)
head -11 medici_transactions.csv
# Count total transactions (the line count includes the header row)
wc -l medici_transactions.csv
# Search for specific events
grep "ransom" medici_transactions.csv
grep "war_financing" medici_transactions.csv

# Generate fresh transaction data
python3 generate_historical_data.py

This will create new medici_transactions.csv and medici_transactions.json files containing the initial 20,000 transactions. Run generate_additional_data.py afterwards to expand the dataset and inject the embezzlement trail.

# Run validation checks
python3 validate_transactions.py

This validates:
- CSV and JSON structure
- Double-entry accounting (debits = credits)
- Date formats
- Transaction distributions
- Historical event coverage
date,branch,type,amount,description
1415-05-29,Constance,ransom_payment,35000.0,"Payment of 35,000 florin ransom for Pope John XXIII"

date,branch,type,amount,description
1390-01-05,Rome,deposit,38176.7,"Deposit from Vatican Treasury to Rome branch"

date,branch,type,amount,description
1390-01-01,Florence,war_financing,82833.66,"Emergency war financing for Florence defense"

date,branch,type,debit,credit,credit2,description
1390-01-06,Florence,loan_repayment,829.16,720.31,108.85,"Loan repayment from Duke of Milan with interest"

The data is designed to reflect:
- ✓ Realistic transaction amounts (exponential distribution)
- ✓ Historical interest rates (8-25% per annum)
- ✓ Branch distribution (Rome dominant for papal banking)
- ✓ Seasonal patterns in business activity
- ✓ War periods showing increased financing
- ✓ Major historical events (Council of Constance ransom)
- ✓ Banking innovations (bills of exchange)
- ✓ Papal monopolies (alum trade)
Hidden Forensic Exercise
The dataset includes a deliberately embedded embezzlement scenario for use in data engineering and forensic analysis exercises. A fictitious supplier named "Ser Benedetto Forniture" appears in the Florence branch operating expense records between 1420 and 1424, representing ~230 fraudulent transactions totalling approximately 95,610 florins.
The fraud is designed to be plausible on casual inspection but detectable through simple statistical analysis, including:
- Benford's Law analysis of first digits
- Vendor concentration analysis
- Round-number clustering
- Payment frequency analysis
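Two of the techniques listed above, first-digit analysis against Benford's Law and vendor concentration, can be sketched in a few lines. This is an illustrative starting point, not the method described in the instructor guide. The helper names are hypothetical, the sample rows are made up, and the code assumes the `counterparty` and `debit_amount` fields from the field table.

```python
import math
from collections import Counter


def first_digit(amount):
    """Return the leading non-zero digit of an amount (rounded to 2 decimals), or None for zero."""
    digits = "".join(ch for ch in f"{abs(amount):.2f}" if ch.isdigit()).lstrip("0")
    return int(digits[0]) if digits else None


def benford_deviation(amounts):
    """Map each digit 1-9 to (observed share - Benford's expected share log10(1 + 1/d))."""
    digits = [d for d in map(first_digit, amounts) if d is not None]
    if not digits:
        return {}
    counts = Counter(digits)
    return {
        d: counts[d] / len(digits) - math.log10(1 + 1 / d)
        for d in range(1, 10)
    }


def vendor_shares(rows):
    """Return each counterparty's share of total debit_amount, largest first."""
    totals = Counter()
    for row in rows:
        totals[row["counterparty"]] += float(row["debit_amount"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: amount / grand_total for name, amount in totals.most_common()}


# Made-up rows, only to show the call pattern.
sample = [
    {"counterparty": "Vendor A", "debit_amount": "500.00"},
    {"counterparty": "Vendor B", "debit_amount": "120.00"},
    {"counterparty": "Vendor A", "debit_amount": "380.00"},
]
amounts = [float(r["debit_amount"]) for r in sample]
print(benford_deviation(amounts))
print(vendor_shares(sample))
```

Large positive deviations for particular leading digits, or a single counterparty holding an unusually large share within one branch and transaction type, are the kinds of signals worth a closer look. Benford's Law is only a rough guide on small samples, so filter to a reasonably large subset before drawing conclusions.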
For instructors: Full details of the scheme, detection methods, discussion questions, and grading rubric are in INSTRUCTOR_EMBEZZLEMENT_GUIDE.md. Do not distribute to students before the exercise.
This data can be imported into the existing medici-banking.py ledger system for demonstration purposes. Each transaction follows the double-entry accounting principles implemented in the main codebase.
The historical context is drawn from:
- Raymond de Roover's "The Rise and Decline of the Medici Bank" (1963)
- Contemporary chronicles and papal records
- Historical research on Renaissance Italian banking
This data is part of the MediciMess educational project and is released under the MIT License.