Thank you for your interest in contributing to Common Chronicle! This document provides guidelines for contributors to ensure consistency and quality across the codebase.
- Project Overview
- System Architecture
- Development Environment Setup
- Code Standards
- Documentation Standards
- Contribution Process
- Testing Requirements
- Commit Guidelines
Common Chronicle is an AI-powered historical research tool that transforms manual research into automated timeline generation. The project consists of:
- Backend: FastAPI-based Python application with PostgreSQL database
- Frontend: React + TypeScript with Vite build system
- AI Integration: Multiple LLM providers (OpenAI, Gemini, Ollama)
- Data Sources: Wikipedia, Wikinews, and custom datasets
The following diagram illustrates the high-level architecture of the system, showing the flow from user request to timeline generation.
```mermaid
graph TD
    direction TB

    subgraph "1. User Interaction"
        User_Request("User Submits Research Topic via Frontend")
    end

    subgraph "2. Task Initialization"
        API_Request["Frontend sends API request to create a task"]
        Queue_Job["API creates task in DB & queues background job"]
    end

    subgraph "3. Timeline Generation Pipeline (Async)"
        Orchestrator_Start("Orchestrator starts processing the job")
        subgraph "Pipeline Steps"
            direction TB
            P1["3a. Keyword Extraction"]
            P2["3b. Article Acquisition"]
            P3["3c. Relevance Scoring & Filtering"]
            P4["3d. Event Extraction & Merging"]
        end
        Orchestrator_End("Orchestrator saves final timeline to DB")
    end

    subgraph "4. Displaying Results"
        Notification["Frontend is notified (WebSocket/Poll)"]
        Display("User views the completed timeline")
    end

    subgraph "External Systems & Data"
        LLM["AI Services (LLM Providers)"]
        Data_Sources["Data Sources (Wikipedia, II-Commons)"]
        DB[(Database - PostgreSQL)]
    end

    %% --- Connecting the Flow ---
    User_Request --> API_Request
    API_Request --> Queue_Job
    Queue_Job --> Orchestrator_Start
    Orchestrator_Start --> P1
    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> Orchestrator_End
    Orchestrator_End --> Notification
    Notification --> Display

    %% --- Connecting Dependencies ---
    P1 -->|Calls| LLM
    P2 -->|Queries| Data_Sources
    P3 -->|Calls| LLM
    P4 -->|Calls| LLM
    Queue_Job -->|Writes to| DB
    Orchestrator_End -->|Writes to| DB
    Notification -->|Reads from| DB

    classDef mainflow fill:#d4edff,stroke:#007bff,stroke-width:2px;
    class User_Request,API_Request,Queue_Job,Orchestrator_Start,P1,P2,P3,P4,Orchestrator_End,Notification,Display mainflow;
```
Architectural Notes: This project uses a hybrid task-processing model that combines WebSocket communication with database polling to balance real-time feedback and reliability.

- Task Trigger: Instead of a traditional message queue (like Celery or RQ), background processing for a pending task is initiated directly by a WebSocket connection from the client (`/ws/timeline/from_task/{task_id}`). This simplifies deployment by avoiding a separate worker process.
- Real-time Progress: The `Orchestrator` pushes live progress updates back to the client through this same WebSocket.
- Reliability: The WebSocket connection also polls the database for the task's final status (`completed` or `failed`). Even if the connection is interrupted and re-established, the user still receives the final result once it is available in the database, providing a robust fault-tolerance mechanism.
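The reliability half of this model reduces to a retry-tolerant poll loop. The sketch below is illustrative only: the `TASKS` dict, `run_pipeline`, and `watch_task` are invented stand-ins for the real database and orchestrator, not the project's actual code.

```python
import asyncio

# Hypothetical in-memory stand-in for the tasks table.
TASKS = {"t1": {"status": "pending"}}

async def run_pipeline(task_id: str) -> None:
    """Simulate the background job that the WebSocket connection triggers."""
    await asyncio.sleep(0.01)  # pretend to run the pipeline steps
    TASKS[task_id]["status"] = "completed"

async def watch_task(task_id: str, interval: float = 0.005) -> str:
    """Poll the store until the task reaches a terminal state.

    Even if a live progress push is missed (e.g. the socket dropped),
    this loop still delivers the final result once it is persisted.
    """
    while True:
        status = TASKS[task_id]["status"]
        if status in ("completed", "failed"):
            return status
        await asyncio.sleep(interval)

async def main() -> str:
    # Fire-and-forget, like the real background job.
    asyncio.create_task(run_pipeline("t1"))
    return await watch_task("t1")

result = asyncio.run(main())
print(result)  # completed
```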
- Python 3.12
- Node.js 18+
- PostgreSQL 12+
- Git
- uv (recommended) or pip
uv is a fast Python package installer and resolver, written in Rust. It's significantly faster than pip and provides better dependency resolution. We recommend using uv for development.
Install uv:

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or using pip
pip install uv
```

Backend setup:

```bash
# Clone the repository
git clone https://github.com/Intelligent-Internet/Common_Chronicle.git
cd Common_Chronicle

# Set up Python environment
# Using uv (recommended)
uv venv
source .venv/bin/activate
# Or on Windows: .venv\Scripts\activate

# Or using traditional venv
# python -m venv .venv
# source .venv/bin/activate
# Or on Windows: .venv\Scripts\activate

# Install dependencies
# Using uv (recommended)
uv sync --group dev

# Or using pip
pip install -e ".[all-dev]"

# Set up environment variables
cp config.env.example .env
# Edit .env with your database URL and API keys

# Initialize your local database
# As this project doesn't ship with shared migrations, you need to create an initial one.
# Make sure your database is created, then run:
alembic revision --autogenerate -m "Init local database"
```

Add a new line to your first Alembic version file:

```python
import pgvector.sqlalchemy
```

Then you can continue:

```bash
alembic upgrade head

# Set up pre-commit hooks
python scripts/setup-pre-commit.py
# Or on Windows: scripts\setup-pre-commit.bat

# Start the backend server
python main.py
```

Frontend setup:

```bash
# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start the development server
npm run dev
```

To enable the powerful local search feature against a full English Wikipedia dataset, you need to run a local vector database instance provided by the II-Commons project. This is optional for general development but required for working on features related to local article acquisition.
1. Download Prebuilt Dataset
First, download the prebuilt "Wikipedia English" dataset from the II-Commons project. Please refer to their repository for the specific download links (often hosted on platforms like Hugging Face). Let's assume you've downloaded and extracted it to a local path like D:\datasets\wikipedia_en.
2. Run the Database with Docker
Use the following command to run a PostgreSQL container that serves the dataset. Make sure to replace the volume path with the actual path to your downloaded dataset.

```bash
# Note for Windows users: You might need to adjust the volume path format.
# For example: -v D:/datasets/wikipedia_en:/var/lib/postgresql/data
sudo docker run --rm -it \
  --name postgres-localvector \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres.1234 \
  -e POSTGRES_DB=localvector \
  -e PGDATA=/var/lib/postgresql/data/basebackup \
  -v /path/to/your/downloaded/wikipedia_en:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres-17-parade-vchord
```

3. Finalize Database Setup
Once the container is running, connect to it using a psql client and run the following commands to optimize it for queries.
```sql
-- Connect to the local vector database, for example:
-- psql -h localhost -p 5432 -U postgres -d localvector
-- (The password is "postgres.1234")

-- Inside psql:
ALTER SYSTEM SET vchordrq.probes = 100;
```

After running `ALTER SYSTEM`, you must restart the Docker container for the change to take effect.
Your local Wikipedia data source is now ready! Ensure your `.env` file's `DATABASE_URL` points to this local database instance to use it.
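For instance, with the Docker settings shown above, the entry might look like the following (the `postgresql://` scheme is an assumption; use whatever driver scheme the project's configuration expects):

```
DATABASE_URL=postgresql://postgres:postgres.1234@localhost:5432/localvector
```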
- Follow PEP 8 for code style
- Use type hints for all function parameters and return values
- Import organization: Use `isort` and `black` for formatting
- Line length: Maximum 88 characters (Black default)
- Naming conventions: `snake_case` for functions/variables, `PascalCase` for classes, `UPPER_CASE` for constants
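A tiny illustrative snippet (names invented, not from the codebase) showing these conventions together:

```python
MAX_RETRIES: int = 3  # UPPER_CASE for constants

class TimelineBuilder:  # PascalCase for classes
    """Collects event titles for a single data source."""

    def __init__(self, source_name: str) -> None:
        self.source_name = source_name  # snake_case for variables

    def count_events(self, titles: list[str]) -> int:  # typed params and return
        return len(titles)

builder = TimelineBuilder("wikipedia")
print(builder.count_events(["Apollo 11", "Apollo 12"]))  # 2
```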
- Use TypeScript strict mode
- Follow ESLint configuration
- Prefer interfaces over types for object shapes
- Use meaningful names for variables and functions
- Organize imports: External libraries first, then internal modules
- Write self-documenting code with clear names
- Keep functions small and focused on single responsibilities
- Use consistent error handling patterns
- Add logging for debugging and monitoring
- Write tests for critical functionality
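One consistent pattern for the error-handling and logging points above is to log at the failure site and raise a typed, domain-specific error; the names below (`ArticleFetchError`, `fetch_article`) are hypothetical, not taken from the codebase.

```python
import logging

logger = logging.getLogger(__name__)

class ArticleFetchError(Exception):
    """Domain error so callers can handle all fetch failures uniformly."""

def fetch_article(title: str) -> str:
    """Return article text for a title, failing loudly on bad input."""
    if not title:
        # Log for debugging/monitoring, then raise a typed error.
        logger.error("fetch_article called with an empty title")
        raise ArticleFetchError("title must be non-empty")
    return f"contents of {title}"

print(fetch_article("Moon landing"))  # contents of Moon landing
```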
Pre-commit hooks automatically enforce code quality and consistency before each commit.
```bash
# First, copy the OS-specific configuration file
# On Linux/macOS:
cp .pre-commit-config.yaml.linux .pre-commit-config.yaml
# On Windows:
copy .pre-commit-config.yaml.windows .pre-commit-config.yaml

# Automatic setup
python scripts/setup-pre-commit.py

# Manual verification
pre-commit run --all-files
```

The hooks cover:

- Code Formatting: `black`, `isort`, `autoflake` (Python) + `ESLint` (TypeScript)
- Code Quality: `ruff` (Python linting) + `mypy` (type checking)
- File Standards: Whitespace, newlines, YAML validation, large file detection

Useful commands:

```bash
# Run all hooks manually
pre-commit run --all-files

# Skip hooks in emergency (use sparingly)
git commit --no-verify -m "Emergency fix"

# Fix most Python issues automatically
ruff check --fix .
```

- Minimalist Approach - Add comments only when necessary
- Business Logic Priority - Explain "why" rather than "what"
- Type System Optimization - Leverage type systems to avoid redundant documentation
"""
[Module Name] for [primary purpose].
Architecture: [Show relationships with other components]
Key Features: [Brief list of main features]
"""class ClassName:
"""
[Concise description].
"""# For complex business logic - explain the approach
async def calculate_relevance_score(event: TimelineEvent, viewpoint: str) -> float:
"""Calculate relevance using semantic similarity and temporal proximity."""FIXME:Issues requiring urgent attentionTODO:Planned improvements or refactoringHACK:Temporary workarounds with known limitationsPERF:Performance optimization opportunities
- Fork the repository and create a feature branch
- Set up pre-commit hooks: `python scripts/setup-pre-commit.py`
- Make your changes following the coding standards
- Add tests for new functionality
- Update documentation as needed
- Commit your changes (pre-commit hooks run automatically)
- Create a pull request with a clear description
A great way to start is by looking for issues tagged with `good first issue` or `help wanted`. These are tasks we've identified as good entry points into the project.
- Use descriptive titles that explain what the PR does
- Include a detailed description of changes made
- Reference related issues using GitHub's linking syntax
- Ensure all tests pass and documentation is updated
- Request reviews from relevant maintainers
- Use `pytest` for unit and integration tests
- Maintain test coverage above 80%
- Include tests for error conditions and edge cases
- Use fixtures for common test data
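A self-contained sketch of these guidelines (function names invented; under pytest the shared data would live in a `@pytest.fixture` and the `test_*` functions would be collected automatically — they are invoked directly here only so the snippet runs on its own):

```python
def filter_by_year(events: list[dict], year: int) -> list[dict]:
    """Unit under test: keep only events from the given year."""
    return [e for e in events if e["year"] == year]

def sample_events() -> list[dict]:
    # With pytest, this would be a @pytest.fixture providing common test data.
    return [{"year": 1969, "title": "Moon landing"}]

def test_filter_matches() -> None:
    assert filter_by_year(sample_events(), 1969) == sample_events()

def test_filter_empty_input() -> None:
    # Edge case: no events at all.
    assert filter_by_year([], 1969) == []

test_filter_matches()
test_filter_empty_input()
```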
- Use `Jest` and `React Testing Library` for frontend tests
- Test component behavior, not implementation details
- Include accessibility tests where applicable
- Mock external dependencies appropriately
Follow the Conventional Commits specification:
```
<type>[optional scope]: <description>
```

Types:

- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes
- `refactor`: Code refactoring
- `test`: Adding or updating tests
- `chore`: Maintenance tasks
Examples:

```
feat(timeline): add event filtering by date range
fix(api): handle missing viewpoint_id in task creation
docs(readme): update installation instructions
```
Before creating an issue, please check if a similar one already exists. We provide several issue templates to help you structure your report. Please use the appropriate template for bug reports, feature requests, or questions to ensure we have all the necessary information.
- Code formatting and linting
- Type checking
- File standards validation
- Security checks
- Test coverage >80%
- Performance validation
- Documentation quality
- Integration testing
- GitHub Issues: For bug reports and feature requests
- GitHub Discussions: For general questions and discussions
| File | Purpose |
|---|---|
| `.pre-commit-config.yaml` | Pre-commit hooks configuration |
| `pyproject.toml` | Python project configuration |
| `frontend/eslint.config.js` | Frontend linting rules |
| `config.env.example` | Environment variables template |
We've replaced `pylint` with `ruff` for better performance and modern Python development.
By contributing to Common Chronicle, you agree that your contributions will be licensed under the Apache License 2.0.
Thank you for contributing to Common Chronicle! Your efforts help make historical research more accessible to everyone. 🏛️✨