Contributing to Common Chronicle

Thank you for your interest in contributing to Common Chronicle! This document provides guidelines for contributors to ensure consistency and quality across the codebase.

📋 Table of Contents

Project Overview
System Architecture
Development Environment Setup
Code Standards
Documentation Standards
Contribution Process
Testing Requirements
Commit Guidelines

Project Overview

Common Chronicle is an AI-powered historical research tool that transforms manual research into automated timeline generation. The project consists of:

Backend: FastAPI-based Python application with PostgreSQL database
Frontend: React + TypeScript with Vite build system
AI Integration: Multiple LLM providers (OpenAI, Gemini, Ollama)
Data Sources: Wikipedia, Wikinews, and custom datasets

System Architecture

The following diagram illustrates the high-level architecture of the system, showing the flow from user request to timeline generation.

graph TD
    direction TB

    subgraph "1. User Interaction"
        User_Request("User Submits Research Topic via Frontend")
    end

    subgraph "2. Task Initialization"
        API_Request["Frontend sends API request to create a task"]
        Queue_Job["API creates task in DB & queues background job"]
    end

    subgraph "3. Timeline Generation Pipeline (Async)"
        Orchestrator_Start("Orchestrator starts processing the job")

        subgraph "Pipeline Steps"
            direction TB
            P1["3a. Keyword Extraction"]
            P2["3b. Article Acquisition"]
            P3["3c. Relevance Scoring & Filtering"]
            P4["3d. Event Extraction & Merging"]
        end
        Orchestrator_End("Orchestrator saves final timeline to DB")
    end

    subgraph "4. Displaying Results"
        Notification["Frontend is notified (WebSocket/Poll)"]
        Display("User views the completed timeline")
    end

    subgraph "External Systems & Data"
        LLM["AI Services (LLM Providers)"]
        Data_Sources["Data Sources (Wikipedia, II-Commons)"]
        DB[(Database - PostgreSQL)]
    end

    %% --- Connecting the Flow ---
    User_Request --> API_Request
    API_Request --> Queue_Job
    Queue_Job --> Orchestrator_Start

    Orchestrator_Start --> P1
    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> Orchestrator_End

    Orchestrator_End --> Notification
    Notification --> Display

    %% --- Connecting Dependencies ---
    P1 -->|Calls| LLM
    P2 -->|Queries| Data_Sources
    P3 -->|Calls| LLM
    P4 -->|Calls| LLM

    Queue_Job -->|Writes to| DB
    Orchestrator_End -->|Writes to| DB
    Notification -->|Reads from| DB

    classDef mainflow fill:#d4edff,stroke:#007bff,stroke-width:2px;
    class User_Request,API_Request,Queue_Job,Orchestrator_Start,P1,P2,P3,P4,Orchestrator_End,Notification,Display mainflow;

Architectural Notes: This project utilizes a hybrid task processing model that combines WebSocket communication with database polling for a balance of real-time feedback and reliability.

Task Trigger: Instead of a traditional message queue (like Celery or RQ), background processing for a pending task is initiated directly by a WebSocket connection from the client (/ws/timeline/from_task/{task_id}). This simplifies deployment by avoiding a separate worker process.
Real-time Progress: The Orchestrator pushes live progress updates back to the client through this same WebSocket.
Reliability: The WebSocket connection also polls the database for the task's final status (completed or failed). This ensures that even if the connection is interrupted and re-established, the user will still receive the final result once it's available in the database, providing a robust fault-tolerance mechanism.

Development Environment Setup

Prerequisites

Python 3.12
Node.js 18+
PostgreSQL 12+
Git
uv (recommended) or pip

About uv

uv is a fast Python package installer and resolver, written in Rust. It's significantly faster than pip and provides better dependency resolution. We recommend using uv for development.

Install uv:

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or using pip
pip install uv

Backend Setup

# Clone the repository
git clone https://github.com/Intelligent-Internet/Common_Chronicle.git
cd Common_Chronicle

# Set up Python environment
# Using uv (recommended)
uv venv
source .venv/bin/activate
# Or on Windows: .venv\Scripts\activate

# Or using traditional venv
# python -m venv .venv
# source .venv/bin/activate
# Or on Windows: .venv\Scripts\activate

# Install dependencies
# Using uv (recommended)
uv sync --group dev

# Or using pip
pip install -e ".[all-dev]"

# Set up environment variables
cp config.env.example .env
# Edit .env with your database URL and API keys

# Initialize your local database
# As this project doesn't ship with shared migrations, you need to create an initial one.
# Make sure your database is created, then run:
alembic revision --autogenerate -m "Init local database"

Add new line to your first alembic version file:

import pgvector.sqlalchemy

Then you can continue:

alembic upgrade head

# Set up pre-commit hooks
python scripts/setup-pre-commit.py
# Or on Windows: scripts\setup-pre-commit.bat

# Start the backend server
python main.py

Frontend Setup

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start the development server
npm run dev

Core Data Source Setup (II-Commons)

To enable the powerful local search feature against a full English Wikipedia dataset, you need to run a local vector database instance provided by the II-Commons project. This is optional for general development but required for working on features related to local article acquisition.

1. Download Prebuilt Dataset

First, download the prebuilt "Wikipedia English" dataset from the II-Commons project. Please refer to their repository for the specific download links (often hosted on platforms like Hugging Face). Let's assume you've downloaded and extracted it to a local path like D:\datasets\wikipedia_en.

2. Run the Database with Docker

Use the following command to run a PostgreSQL container that serves the dataset. Make sure to replace the volume path with the actual path to your downloaded dataset.

# Note for Windows users: You might need to adjust the volume path format.
# For example: -v D:/datasets/wikipedia_en:/var/lib/postgresql/data
sudo docker run --rm -it \
  --name postgres-localvector \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres.1234 \
  -e POSTGRES_DB=localvector \
  -e PGDATA=/var/lib/postgresql/data/basebackup \
  -v /path/to/your/downloaded/wikipedia_en:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres-17-parade-vchord

3. Finalize Database Setup

Once the container is running, connect to it using a psql client and run the following commands to optimize it for queries.

-- Connect to the local vector database, for example:
-- psql -h localhost -p 5432 -U postgres -d localvector
-- (The password is "postgres.1234")

-- Inside psql:
ALTER SYSTEM SET vchordrq.probes = 100;

After running ALTER SYSTEM, you must restart the Docker container for the change to take effect.

Your local Wikipedia data source is now ready! Ensure your .env file's DATABASE_URL points to this local database instance to use it.

Code Standards

Python Code Standards

Follow PEP 8 for code style
Use type hints for all function parameters and return values
Import organization: Use isort and black for formatting
Line length: Maximum 88 characters (Black default)
Naming conventions: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants

TypeScript Code Standards

Use TypeScript strict mode
Follow ESLint configuration
Prefer interfaces over types for object shapes
Use meaningful names for variables and functions
Organize imports: External libraries first, then internal modules

General Guidelines

Write self-documenting code with clear names
Keep functions small and focused on single responsibilities
Use consistent error handling patterns
Add logging for debugging and monitoring
Write tests for critical functionality

🔧 Pre-commit Hooks

Pre-commit hooks automatically enforce code quality and consistency before each commit.

Setup

# First, copy the OS-specific configuration file
# On Linux/macOS:
cp .pre-commit-config.yaml.linux .pre-commit-config.yaml
# On Windows:
copy .pre-commit-config.yaml.windows .pre-commit-config.yaml

# Automatic setup
python scripts/setup-pre-commit.py

# Manual verification
pre-commit run --all-files

What Gets Checked

Code Formatting: black, isort, autoflake (Python) + ESLint (TypeScript)
Code Quality: ruff (Python linting) + mypy (type checking)
File Standards: Whitespace, newlines, YAML validation, large file detection

Common Commands

# Run all hooks manually
pre-commit run --all-files

# Skip hooks in emergency (use sparingly)
git commit --no-verify -m "Emergency fix"

# Fix most Python issues automatically
ruff --fix .

Documentation Standards

Core Principles

Minimalist Approach - Add comments only when necessary
Business Logic Priority - Explain "why" rather than "what"
Type System Optimization - Leverage type systems to avoid redundant documentation

Documentation Layers

Module-Level (File Header)

"""
[Module Name] for [primary purpose].

Architecture: [Show relationships with other components]
Key Features: [Brief list of main features]
"""

Class-Level

class ClassName:
    """
    [Concise description].
    """

Function-Level (Complex Logic Only)

# For complex business logic - explain the approach
async def calculate_relevance_score(event: TimelineEvent, viewpoint: str) -> float:
    """Calculate relevance using semantic similarity and temporal proximity."""

Technical Debt Markers

FIXME: Issues requiring urgent attention
TODO: Planned improvements or refactoring
HACK: Temporary workarounds with known limitations
PERF: Performance optimization opportunities

Contribution Process

For New Contributors

Fork the repository and create a feature branch
Set up pre-commit hooks: python scripts/setup-pre-commit.py
Make your changes following the coding standards
Add tests for new functionality
Update documentation as needed
Commit your changes (pre-commit hooks run automatically)
Create a pull request with a clear description

A great way to start is by looking for issues tagged with good first issue or help wanted. These are tasks that we've identified as good entry points into the project.

Pull Request Guidelines

Use descriptive titles that explain what the PR does
Include a detailed description of changes made
Reference related issues using GitHub's linking syntax
Ensure all tests pass and documentation is updated
Request reviews from relevant maintainers

Testing Requirements

Python Testing

Use pytest for unit and integration tests
Maintain test coverage above 80%
Include tests for error conditions and edge cases
Use fixtures for common test data

TypeScript Testing

Use Jest and React Testing Library for frontend tests
Test component behavior, not implementation details
Include accessibility tests where applicable
Mock external dependencies appropriately

Commit Guidelines

Follow the Conventional Commits specification:

<type>[optional scope]: <description>

Types

feat: New feature
fix: Bug fix
docs: Documentation changes
style: Code style changes
refactor: Code refactoring
test: Adding or updating tests
chore: Maintenance tasks

Examples

feat(timeline): add event filtering by date range
fix(api): handle missing viewpoint_id in task creation
docs(readme): update installation instructions

Submitting Issues

Before creating an issue, please check if a similar one already exists. We provide several issue templates to help you structure your report. Please use the appropriate template for bug reports, feature requests, or questions to ensure we have all the necessary information.

Quality Assurance

Automated Checks (Pre-commit)

Code formatting and linting
Type checking
File standards validation
Security checks

Manual Checks

Test coverage >80%
Performance validation
Documentation quality
Integration testing

Getting Help

GitHub Issues: For bug reports and feature requests
GitHub Discussions: For general questions and discussions

Configuration Files

File	Purpose
`.pre-commit-config.yaml`	Pre-commit hooks configuration
`pyproject.toml`	Python project configuration
`frontend/eslint.config.js`	Frontend linting rules
`config.env.example`	Environment variables template

Tool Migration Notes

We've replaced pylint with ruff for better performance and modern Python development.

License

By contributing to Common Chronicle, you agree that your contributions will be licensed under the Apache License 2.0.

Thank you for contributing to Common Chronicle! Your efforts help make historical research more accessible to everyone. 🏛️✨

FilesExpand file tree

CONTRIBUTING.md

Latest commit

History