30 commits
f68768e
Check
mdmuminul Feb 27, 2026
6c41ef9
https://chatgpt.com/s/t_69a2965bcc808191906e28d9aee7299c
Feb 28, 2026
419b9f2
Improve checkout parsing, logging, and basic validation
Feb 28, 2026
e38851e
Add checkout validation tests and scaffold transaction_verification s…
Feb 28, 2026
8491cc9
Add transaction_verification gRPC service and Docker Compose setup
Feb 28, 2026
606456c
Connect orchestrator to transaction_verification via gRPC
Feb 28, 2026
f8c3e61
Add suggestions gRPC service and connect it to orchestrator
Feb 28, 2026
b9c7581
Add fraud detection gRPC check to checkout flow
Feb 28, 2026
0cb5ee2
Add threaded orchestration for backend gRPC services
Feb 28, 2026
4cd2062
Fraud Detection
mdmuminul Feb 28, 2026
702d31f
protobuf version Update
mdmuminul Feb 28, 2026
06c1f45
Fix frontend checkout JSON request and response handling
Feb 28, 2026
ccba99c
Creating Proto file for Transaction Verification
mdmuminul Feb 28, 2026
ca7831f
transaction_verification
mdmuminul Mar 2, 2026
ffbbf84
Suggestions
mdmuminul Mar 2, 2026
6e6d4fa
Merge branch 'Mumin' into anup_dev
Mar 2, 2026
2f82c3f
Revert "Merge branch 'Mumin' into anup_dev"
Mar 2, 2026
c9fadc8
initial commit
Mar 2, 2026
5ad3d74
incremental changes
Mar 2, 2026
30618b2
Merge pull request #1 from anup28kmr/anup_dev
anup28kmr Mar 2, 2026
10d9011
Implement Seminar 5 event ordering with vector clocks
Mar 11, 2026
2b2d913
code refactoring and adding vector clock implementation
Mar 13, 2026
c7fe3bf
Implement Seminar 7 leader election
Sten-Qy-Li Mar 25, 2026
1208b71
Add 3-executor leader election bonus implementation
Sten-Qy-Li Apr 1, 2026
4d11f02
Fix checkpoint 2 clear broadcast and add verification script
Sten-Qy-Li Apr 4, 2026
e4fb60a
Add Checkpoint 2 documentation and diagrams
Sten-Qy-Li Apr 4, 2026
8bf9154
Merge branch 'individual-sten-qy-li' into anup_dev
Apr 5, 2026
49ced61
code refactoring
Apr 5, 2026
b38b29c
code refactoring and updated system models
Apr 6, 2026
b3c9171
system diagram updated
Apr 6, 2026
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
-__pycache__
+__pycache__
+.idea
5 changes: 5 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions .idea/ds-practice-2026.iml

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

7 changes: 7 additions & 0 deletions .idea/misc.xml

8 changes: 8 additions & 0 deletions .idea/modules.xml

6 changes: 6 additions & 0 deletions .idea/vcs.xml

214 changes: 188 additions & 26 deletions README.md
@@ -1,46 +1,208 @@
# Distributed Systems @ University of Tartu
# Distributed Systems Practice Project - Checkpoint 2

This repository contains the initial code for the practice sessions of the Distributed Systems course at the University of Tartu.
## How to demonstrate that this repository works
This section is intentionally first so it can be used as a short live-demo checklist.

## Getting started
1. Start Docker Desktop, then start the full stack from the repository root.

### Overview
```powershell
docker compose up --build -d
docker compose ps
```

The code consists of multiple services. Each service is located in a separate folder. The `frontend` service folder contains a Dockerfile and the code for an example bookstore application. Each backend service folder (e.g. `orchestrator` or `fraud_detection`) contains a Dockerfile, a requirements.txt file and the source code of the service. During the practice sessions, you will implement the missing functionality in these backend services, or extend the backend with new services.
Expected result: all 9 services are up (`frontend`, `orchestrator`, the 3 backend services, `order_queue`, and the 3 executor replicas).

There is also a `utils` folder that contains some helper code or specifications that are used by multiple services. Check the `utils` folder for more information.
2. Run the reusable Checkpoint 2 verification script.

### Running the code with Docker Compose [recommended]
```powershell
.\scripts\checkpoint2-checks.ps1
```

To run the code, you need to clone this repository, make sure you have Docker and Docker Compose installed, and run the following command in the root folder of the repository:

```bash
docker compose up
```

After the first full build, the quicker rerun is:

```powershell
.\scripts\checkpoint2-checks.ps1 -SkipBuild
```

This will start the system with the multiple services. Each service will be restarted automatically when you make changes to the code, so you don't have to restart the system manually while developing. If you want to know how the services are started and configured, check the `docker-compose.yaml` file.
Expected result: all implementation-based checks pass, including valid checkout, rejection cases, vector-clock log checks, queue/executor checks, and leader failover.

The checkpoint evaluations will be done using the code that is started with Docker Compose, so make sure that your code works with Docker Compose.
3. If the teaching assistants want a manual happy-path demo, open the frontend at `http://127.0.0.1:8080` and submit a normal order. The REST API is also available at `http://127.0.0.1:8081`.

If, for some reason, changes to the code are not reflected, try to force rebuilding the Docker images with the following command:

```bash
docker compose up --build
```

4. If they want manual API testing, use the prepared payload files in the repo:

```powershell
Invoke-WebRequest `
-Uri http://127.0.0.1:8081/checkout `
-Method POST `
-ContentType "application/json" `
-Body (Get-Content .\test_checkout.json -Raw)
```

### Run the code locally
Swap in `test_checkout_fraud.json`, `test_checkout_empty_items.json`, or `test_checkout_terms_false.json` to show the rejection paths.

Even though you can run the code locally, it is recommended to use Docker and Docker Compose to run the code. This way you don't have to install any dependencies locally and you can easily run the code on any platform.
5. Show the logs that prove the distributed behavior:

If you want to run the code locally, you need to install the following dependencies:
```powershell
docker compose logs --no-color --tail 200 orchestrator transaction_verification fraud_detection suggestions
docker compose logs --no-color --tail 200 order_queue order_executor_1 order_executor_2 order_executor_3
```

backend services:
- Python 3.8 or newer
- pip
- [grpcio-tools](https://grpc.io/docs/languages/python/quickstart/)
- requirements.txt dependencies from each service
Point out:
- `vc=[...]` in the 3 backend services
- `clear_broadcast_sent final_vc=[...]` in the orchestrator
- `action=enqueue` and `action=dequeue` in the queue
- `executing order=` in exactly one executor replica

6. Show the leader-election bonus by either rerunning the script or doing a quick manual failover:

```powershell
docker compose stop order_executor_3
Invoke-WebRequest `
-Uri http://127.0.0.1:8081/checkout `
-Method POST `
-ContentType "application/json" `
-Body (Get-Content .\test_checkout.json -Raw)
docker compose logs --no-color --since 30s order_queue order_executor_1 order_executor_2 order_executor_3
docker compose up -d order_executor_3
```

frontend service:
- It's a simple static HTML page, you can open `frontend/src/index.html` in your browser.
Expected result: another executor becomes leader after timeout, dequeues the next approved order, and execution still happens exactly once.

7. Stop the stack when the demo is over.

```powershell
docker compose down
```

And then run each service individually.
## Checkpoint 2 deliverables in this repo
This repository now contains the implementation and documentation required for Checkpoint 2:

- vector clocks across `transaction_verification`, `fraud_detection`, and `suggestions`
- order queuing plus 3 replicated order executors
- leader election and mutual exclusion for queue consumption
- logs that expose vector-clock values, queue actions, and executor leadership
- a reusable verification script at `scripts/checkpoint2-checks.ps1`
- the required vector-clocks diagram, leader-election diagram, and system-model write-up

## Vector clocks
The vector clock has 3 positions in the fixed service order `[TV, FD, SUG]`.

Each backend service:
- stores per-order state after `InitOrder`
- merges the incoming vector clock with its local vector clock
- increments its own component before logging and replying
- clears the order only if `local_vc <= final_vc` holds in every component
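
The per-service rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the services: the names `merge`, `tick`, `leq`, and `on_event` are hypothetical, and only the `[TV, FD, SUG]` ordering is taken from this README.

```python
from typing import List

# Fixed component order taken from the README; everything else is illustrative.
SERVICES = ["TV", "FD", "SUG"]


def merge(a: List[int], b: List[int]) -> List[int]:
    """Component-wise maximum of two vector clocks."""
    return [max(x, y) for x, y in zip(a, b)]


def tick(vc: List[int], service: str) -> List[int]:
    """Return a copy of vc with the service's own component incremented."""
    out = list(vc)
    out[SERVICES.index(service)] += 1
    return out


def leq(a: List[int], b: List[int]) -> bool:
    """True when a <= b holds in every component."""
    return all(x <= y for x, y in zip(a, b))


def on_event(local_vc: List[int], incoming_vc: List[int], service: str) -> List[int]:
    """Merge the incoming clock into the local one, then count the local event."""
    return tick(merge(local_vc, incoming_vc), service)
```

For example, `on_event([1, 1, 0], [3, 0, 0], "FD")` yields `[3, 2, 0]`, which matches the `CheckCardFraud` row in the observed sequence.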

The diagram below shows one successful execution captured from the logs of this repository. The orchestrator starts `ValidateItems` and `ValidateUserData` together, so their relative order may swap between runs.

![Vector clocks diagram](./docs/diagrams/vector-clocks.svg)

Observed successful event sequence:

| Step | Service | Event | Vector clock |
| --- | --- | --- | --- |
| 1 | Transaction verification | `ValidateUserData` | `[1, 0, 0]` |
| 2 | Transaction verification | `ValidateItems` | `[2, 0, 0]` |
| 3 | Fraud detection | `CheckUserFraud` | `[1, 1, 0]` |
| 4 | Suggestions | `PrecomputeSuggestions` | `[2, 0, 1]` |
| 5 | Transaction verification | `ValidateCardFormat` | `[3, 0, 0]` |
| 6 | Fraud detection | `CheckCardFraud` | `[3, 2, 0]` |
| 7 | Suggestions | `FinalizeSuggestions` | `[3, 2, 2]` |
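
To read causal relationships off the table, two clocks can be compared component by component. The sketch below uses a hypothetical `compare` helper that is not part of the repository. It only applies the standard vector-clock ordering to values copied from the table.

```python
from typing import List


def compare(a: List[int], b: List[int]) -> str:
    """Classify the relationship between two vector clocks."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither clock dominates the other


# Step 1 (ValidateUserData) vs step 3 (CheckUserFraud)
print(compare([1, 0, 0], [1, 1, 0]))  # before
# Step 3 (CheckUserFraud) vs step 4 (PrecomputeSuggestions)
print(compare([1, 1, 0], [2, 0, 1]))  # concurrent
```

Steps 3 and 4 come out as concurrent, which is consistent with the orchestrator running those branches in parallel.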

Bonus behavior:
- the orchestrator merges all completed event clocks into one `final_vc`
- the orchestrator broadcasts `ClearOrder(final_vc)` to all 3 services
- each service clears only when its local vector clock is not ahead of the final one
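
The clear check can be replayed against the observed sequence. This sketch assumes that each service's local clock equals the last value it logged in the table. The helper names and the `may_clear` dictionary are illustrative only.

```python
from typing import Dict, List


def merge(a: List[int], b: List[int]) -> List[int]:
    """Component-wise maximum of two vector clocks."""
    return [max(x, y) for x, y in zip(a, b)]


def leq(a: List[int], b: List[int]) -> bool:
    """True when a <= b holds in every component."""
    return all(x <= y for x, y in zip(a, b))


# Clocks copied from the observed-sequence table, steps 1 to 7.
event_clocks = [
    [1, 0, 0], [2, 0, 0], [1, 1, 0], [2, 0, 1],
    [3, 0, 0], [3, 2, 0], [3, 2, 2],
]

# The orchestrator folds all completed event clocks into one final clock.
final_vc = [0, 0, 0]
for vc in event_clocks:
    final_vc = merge(final_vc, vc)

# Assumption: the local clock of each service is its last logged value.
local_clocks: Dict[str, List[int]] = {
    "transaction_verification": [3, 0, 0],
    "fraud_detection": [3, 2, 0],
    "suggestions": [3, 2, 2],
}

# A service may clear only when its local clock is not ahead of final_vc.
may_clear = {name: leq(vc, final_vc) for name, vc in local_clocks.items()}
```

With these inputs `final_vc` is `[3, 2, 2]` and every service is allowed to clear. A service that had processed a later event for the same order would hold a larger component and would refuse.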

## Leader election and mutual exclusion
The order execution tier uses 3 replicas: `order_executor_1`, `order_executor_2`, and `order_executor_3`.

The implementation follows a bully-style pattern:
- a replica starts an election only if no healthy leader is known
- a replica contacts only higher-numbered peers during election
- the highest live executor becomes leader and announces itself
- the leader sends heartbeats
- followers start a new election if the leader times out
- only the current leader dequeues from `order_queue`
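
The election rule can be illustrated without any networking. The sketch below is a simplified model, not the executor code: `run_election` is a hypothetical function, replica ids are plain integers, and the gRPC election messages are replaced by a membership check against a set of live replicas.

```python
from typing import List, Set


def run_election(starter: int, all_ids: List[int], alive: Set[int]) -> int:
    """Return the id that ends up leader when `starter` begins an election.

    Each candidate contacts only higher-numbered peers. If a live higher
    peer answers, that peer takes over the election. A candidate with no
    live higher peer announces itself as leader.
    """
    if starter not in alive:
        raise ValueError("a crashed replica cannot start an election")
    current = starter
    while True:
        higher_alive = [p for p in sorted(all_ids) if p > current and p in alive]
        if not higher_alive:
            return current
        current = higher_alive[0]


# All three replicas healthy: the highest id wins.
print(run_election(1, [1, 2, 3], {1, 2, 3}))  # 3
# order_executor_3 stopped: the highest live replica wins.
print(run_election(1, [1, 2, 3], {1, 2}))     # 2
```

The second call mirrors the manual failover in the demo checklist, where `order_executor_3` is stopped and another replica takes over.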

![Leader election diagram](./docs/diagrams/leader-election.svg)

Why this satisfies the checkpoint requirements:
- leader election is visible in logs through `starting election`, `became leader`, and `new leader is ...`
- mutual exclusion is enforced because only the leader calls `Dequeue`
- the failover path is demonstrable with 3 replicas by stopping the current leader and submitting another valid order

## System model
### Architecture
The system is a small distributed online-bookstore workflow:

![Architecture diagram](./docs/diagrams/architecture-diagram.jpg)

- `frontend` serves the browser UI
- `orchestrator` accepts checkout requests over HTTP and coordinates the workflow
- `transaction_verification`, `fraud_detection`, and `suggestions` are gRPC services that participate in the vector-clock event flow
- `order_queue` stores approved orders in FIFO order
- `order_executor_1..3` form a replicated execution tier that elects a leader and consumes approved orders

### System flow
The following diagram shows the end-to-end flow of an order through the system:

![System flow diagram](./docs/diagrams/system-flow-diagram.jpg)

### Communication model
- the browser communicates with the orchestrator over HTTP
- the orchestrator communicates with backend services over synchronous gRPC calls
- executor replicas communicate with each other over gRPC for election, coordinator announcements, and heartbeats
- the order queue is a separate gRPC service used by the orchestrator and the current leader
- all services run in Docker Compose on one virtual network, but they still behave as separate processes with separate local state

### Concurrency and ordering
- the orchestrator starts multiple validation/fraud/suggestion steps in parallel threads
- there is no global clock
- ordering is captured by vector clocks rather than wall-clock timestamps
- approval requires the full dependency chain to complete successfully
- queue consumption is serialized by leadership: only one replica is allowed to dequeue at a time
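
As a rough illustration of the parallel start and the all-must-succeed approval rule, the sketch below runs independent steps on a thread pool. The function names and the stand-in lambdas are hypothetical. The real orchestrator issues gRPC calls and also respects dependencies between steps, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_parallel(steps: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run independent steps in parallel threads and collect their results."""
    with ThreadPoolExecutor(max_workers=max(1, len(steps))) as pool:
        futures = {name: pool.submit(fn) for name, fn in steps.items()}
        # result() blocks until the corresponding step has finished.
        return {name: fut.result() for name, fut in futures.items()}


def is_approved(results: Dict[str, bool]) -> bool:
    """Approve only when every step succeeded."""
    return all(results.values())


# Stand-ins for two checks that the orchestrator starts together.
results = run_parallel({
    "ValidateItems": lambda: True,
    "ValidateUserData": lambda: True,
})
print(is_approved(results))  # True
```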

### Failure assumptions
- the executor layer assumes crash-stop failures, not Byzantine behavior
- a failed leader is detected through missing heartbeats
- after timeout, surviving replicas re-run the election and the highest live replica becomes leader
- backend service state for vector clocks is kept in memory per order, so restarting a container loses that in-memory state
- the queue is also in-memory, so queued orders are not durable across queue restarts
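
Heartbeat-based failure detection can be sketched as a small follower-side monitor. The class name, the `timeout_s` parameter, and the injectable clock are assumptions made for this example. The actual timeout values and message handling live in the executor code.

```python
import time
from typing import Callable, Optional


class LeaderMonitor:
    """Tracks the last heartbeat and reports whether the leader looks alive."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self.leader_id: Optional[int] = None
        self._last_heartbeat: Optional[float] = None

    def on_heartbeat(self, leader_id: int) -> None:
        """Record a heartbeat received from the current leader."""
        self.leader_id = leader_id
        self._last_heartbeat = self._clock()

    def leader_is_healthy(self) -> bool:
        """False if no heartbeat was ever seen or the last one is too old."""
        if self._last_heartbeat is None:
            return False
        return self._clock() - self._last_heartbeat <= self.timeout_s
```

A follower would call `leader_is_healthy()` periodically and start an election when it returns `False`.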

### Safety properties
- every order gets a unique `orderId` from the orchestrator
- vector-clock logs expose causal relationships between backend events
- approved orders are enqueued once by the orchestrator
- only the elected leader dequeues and executes an approved order
- the clear broadcast uses the merged final vector clock so services do not clear too early
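
The leader-only dequeue rule can be shown with an in-memory FIFO queue and a guard on the executor side. Both classes below are hypothetical stand-ins. In the repository the queue is a separate gRPC service and the leader id comes from the election.

```python
from collections import deque
from typing import Deque, Optional


class InMemoryQueue:
    """FIFO queue of approved order ids, held in process memory only."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def enqueue(self, order_id: str) -> None:
        self._items.append(order_id)

    def dequeue(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


class Executor:
    """Replica that touches the queue only while it is the leader."""

    def __init__(self, replica_id: int, queue: InMemoryQueue) -> None:
        self.replica_id = replica_id
        self.queue = queue

    def poll_once(self, leader_id: Optional[int]) -> Optional[str]:
        # Followers never call dequeue, so each order is taken at most once.
        if leader_id != self.replica_id:
            return None
        return self.queue.dequeue()
```

Because the guard sits in the executor, the exclusion holds only as long as replicas agree on who the leader is, which is what the election and heartbeats are meant to provide.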

### Known limitations
- there is no persistent database yet
- the queue and service caches are process-local memory only
- the frontend and orchestrator are single-instance services
- retries and network partitions are not handled beyond the simple crash-stop assumptions needed for this checkpoint

## Logs and verification
The reusable verification script is `scripts/checkpoint2-checks.ps1`.

It checks:
- Docker and Docker Compose availability
- Compose startup
- Python syntax for all backend services
- one valid checkout
- three rejection scenarios
- vector-clock log presence
- queue enqueue and dequeue behavior
- leader failover and executor recovery

Prepared input files:
- `test_checkout.json`
- `test_checkout_fraud.json`
- `test_checkout_empty_items.json`
- `test_checkout_terms_false.json`

The required documentation assets are also available in `docs/`:
- `docs/diagrams/vector-clocks.svg`
- `docs/diagrams/leader-election.svg`
- `docs/README.md`