Commit 10461fa

refactor: structured project, internationalized code, and relocated data
0 parents  commit 10461fa

152 files changed

Lines changed: 46268 additions & 0 deletions

.env.example

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+# =============================================================================
+# THETA Environment Configuration
+# =============================================================================
+# Copy this file to .env and modify as needed.
+# All paths are relative to PROJECT_ROOT unless specified as absolute paths.
+#
+# Usage:
+# cp .env.example .env
+# # Edit .env with your paths
+# =============================================================================
+
+# =============================================================================
+# Core Directories (usually no need to change)
+# =============================================================================
+
+# Project root directory (auto-detected from script location)
+# Uncomment and set only if you need to override auto-detection
+# PROJECT_ROOT=/path/to/THETA
+
+# Source directory (contains models and embedding modules)
+# SRC_DIR=${PROJECT_ROOT}/src
+
+# Models module directory (topic modeling algorithms)
+# MODELS_DIR=${SRC_DIR}/models
+
+# Embedding module directory
+# EMBEDDING_DIR=${SRC_DIR}/embedding
+
+# Agent module directory
+# AGENT_DIR=${PROJECT_ROOT}/agent
+
+# =============================================================================
+# Data Directories
+# =============================================================================
+
+# Workspace directory for user data
+# WORKSPACE_DIR=${PROJECT_ROOT}/workspace
+
+# Data directory (cleaned datasets)
+# DATA_DIR=${WORKSPACE_DIR}/data
+
+# Raw data directory
+# RAW_DATA_DIR=${DATA_DIR}/raw_data
+
+# =============================================================================
+# Output Directories
+# =============================================================================
+
+# Result directory (model outputs, embeddings, BOW matrices, etc.)
+# RESULT_DIR=${PROJECT_ROOT}/result
+
+# HuggingFace cache directory
+# HF_CACHE_DIR=${PROJECT_ROOT}/hf_cache
+
+# =============================================================================
+# Model Directories
+# =============================================================================
+
+# Base directory for embedding models
+# EMBEDDING_MODELS_DIR=${PROJECT_ROOT}/embedding_models
+
+# Qwen embedding model paths (by size)
+# QWEN_MODEL_0_6B=${EMBEDDING_MODELS_DIR}/qwen3_embedding_0.6B
+# QWEN_MODEL_4B=${EMBEDDING_MODELS_DIR}/qwen3_embedding_4B
+# QWEN_MODEL_8B=${EMBEDDING_MODELS_DIR}/qwen3_embedding_8B
+
+# SBERT model path (for baseline models like CTM)
+# SBERT_MODEL_PATH=${ETM_DIR}/model/baselines/sbert/sentence-transformers/all-MiniLM-L6-v2
+
+# =============================================================================
+# Agent Configuration (for LLM-based analysis)
+# =============================================================================
+
+# OpenAI API configuration (for agent features)
+# OPENAI_API_KEY=your-api-key-here
+# OPENAI_API_BASE=https://api.openai.com/v1
+
+# Agent API server configuration
+# API_HOST=0.0.0.0
+# API_PORT=8000
+
+# =============================================================================
+# GPU Configuration
+# =============================================================================
+
+# Default GPU device ID
+# CUDA_VISIBLE_DEVICES=0
+
+# =============================================================================
+# Logging
+# =============================================================================
+
+# Log level: DEBUG, INFO, WARNING, ERROR
+# LOG_LEVEL=INFO
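
The comments in `.env.example` describe a chain of defaults: each directory derives from `PROJECT_ROOT` unless overridden, and relative values are taken relative to `PROJECT_ROOT`. The commit excerpt does not show the code that reads these variables, so the following is only a minimal sketch of how such resolution might look. The function name `resolve_paths`, its parameters, and the fallback to the current working directory (instead of the script-location auto-detection the file mentions) are assumptions for illustration.

```python
import os
from pathlib import Path


def resolve_paths(env=None, project_root=None):
    """Hypothetical helper: resolve directories using the defaults in .env.example."""
    env = os.environ if env is None else env
    # Assumption: fall back to the caller's hint, then the current directory.
    # The real project is said to auto-detect the root from the script location.
    root = Path(env.get("PROJECT_ROOT") or project_root or Path.cwd())

    def pick(name, default):
        value = env.get(name)
        # Joining with `root` keeps absolute values unchanged and anchors
        # relative values at PROJECT_ROOT, as the file header describes.
        return root / value if value else default

    src = pick("SRC_DIR", root / "src")
    workspace = pick("WORKSPACE_DIR", root / "workspace")
    data = pick("DATA_DIR", workspace / "data")
    return {
        "PROJECT_ROOT": root,
        "SRC_DIR": src,
        "MODELS_DIR": pick("MODELS_DIR", src / "models"),
        "EMBEDDING_DIR": pick("EMBEDDING_DIR", src / "embedding"),
        "WORKSPACE_DIR": workspace,
        "DATA_DIR": data,
        "RAW_DATA_DIR": pick("RAW_DATA_DIR", data / "raw_data"),
        "RESULT_DIR": pick("RESULT_DIR", root / "result"),
    }


if __name__ == "__main__":
    for name, path in resolve_paths().items():
        print(f"{name}={path}")
```

Because dependent defaults are computed from the already-resolved parent, overriding `DATA_DIR` alone also moves the default `RAW_DATA_DIR`, which mirrors the `${DATA_DIR}/raw_data` interpolation in the file.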

.github/workflows/deploy-docs.yml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+name: Deploy Documentation
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'doc/**'
+      - 'mkdocs.yml'
+      - 'mkdocs.zh.yml'
+      - 'docs-requirements.txt'
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.x'
+
+      - name: Cache pip
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/pip
+          key: ${{ runner.os }}-pip-${{ hashFiles('docs-requirements.txt') }}
+
+      - name: Install dependencies
+        run: pip install -r docs-requirements.txt
+
+      - name: Deploy to GitHub Pages
+        run: mkdocs gh-deploy --force

.gitignore

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+.env
+__pycache__/
+*.py[cod]
+*.class
+workspace/*
+!workspace/.gitkeep
+result/*
+!result/.gitkeep
+data/*
+!data/.gitkeep
+*.log
+.DS_Store
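
The `dir/*` plus `!dir/.gitkeep` pairs above ignore everything inside `workspace/`, `result/`, and `data/` while keeping the placeholder file tracked, because the last matching pattern decides the outcome. The sketch below illustrates that rule for this specific pattern list. It is a deliberately simplified approximation built on Python's `fnmatch`, not Git's actual matcher, and the helper name `is_ignored` is hypothetical; `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase

# The patterns added to .gitignore in this commit, in file order.
PATTERNS = [
    ".env", "__pycache__/", "*.py[cod]", "*.class",
    "workspace/*", "!workspace/.gitkeep",
    "result/*", "!result/.gitkeep",
    "data/*", "!data/.gitkeep",
    "*.log", ".DS_Store",
]


def is_ignored(path, patterns):
    """Approximate gitignore matching for a repo-relative file path."""
    parts = path.split("/")
    ignored = False
    for raw in patterns:
        negated = raw.startswith("!")
        pattern = raw[1:] if negated else raw
        if pattern.endswith("/"):
            # Directory-only pattern: check the file's parent directories.
            hit = any(fnmatchcase(p, pattern[:-1]) for p in parts[:-1])
        elif "/" in pattern:
            # A slash in the middle anchors the pattern at the repository root.
            hit = fnmatchcase(path, pattern)
        else:
            # No slash: the pattern may match any path component.
            hit = any(fnmatchcase(p, pattern) for p in parts)
        if hit:
            # Later patterns override earlier ones; "!" re-includes the path.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    for candidate in ["workspace/corpus.csv", "workspace/.gitkeep", ".env", ".env.example"]:
        print(candidate, is_ignored(candidate, PATTERNS))
```

Under this approximation `.env` is ignored while `.env.example` is not, which is consistent with the template file being part of the commit.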

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 CodeSoul.co
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
