Repository Access Authentication

github-actions[bot] edited this page Mar 14, 2026 · 1 revision

Repository Access & Authentication

This page documents how wikigen authenticates with GitHub and accesses repositories using either SSH or Personal Access Token (PAT) authentication. It covers the git clone implementation, token substitution mechanisms, optimization techniques, and security considerations for credential handling.

Authentication Methods

wikigen supports two mutually exclusive authentication methods for cloning repositories: SSH (default) and GitHub Personal Access Token (PAT). The choice of authentication method is configured via the -token CLI flag or GITHUB_TOKEN environment variable.

SSH Authentication (Default)

SSH authentication is the default method, used when no PAT is provided. In this mode, wikigen converts HTTPS GitHub URLs to SSH URLs and relies on a pre-configured SSH key registered with GitHub.

SSH URL Format:

git@github.com:{owner}/{repo}.git

SSH authentication requires:

  • A valid SSH key pair
  • The public key registered in GitHub account settings (Settings → SSH and GPG keys)
  • A running SSH agent with the key loaded, or a key at a default path such as ~/.ssh/id_ed25519 or ~/.ssh/id_rsa

The conversion from HTTPS to SSH occurs in the gitClone function (main.go:160-164). When no token is provided, any HTTPS GitHub URL is replaced with its SSH equivalent, and a .git suffix is appended if missing.

Sources: wikigen/main.go:160-164
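The conversion described above can be sketched in Go as follows. This is a minimal illustration, not the code from main.go: the function name toSSHURL and its exact handling of trailing slashes and non-GitHub URLs are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// toSSHURL is a hypothetical helper: it rewrites an HTTPS GitHub URL
// into the git@github.com:{owner}/{repo}.git form and appends ".git"
// when the suffix is missing. Any other URL is returned unchanged.
func toSSHURL(raw string) string {
	const httpsPrefix = "https://github.com/"
	if !strings.HasPrefix(raw, httpsPrefix) {
		return raw
	}
	path := strings.TrimPrefix(raw, httpsPrefix)
	path = strings.TrimSuffix(path, "/")
	if !strings.HasSuffix(path, ".git") {
		path += ".git"
	}
	return "git@github.com:" + path
}

func main() {
	fmt.Println(toSSHURL("https://github.com/owner/repo"))
	// prints "git@github.com:owner/repo.git"
}
```

URLs that already use the SSH form pass through untouched, so the helper is safe to call on any input.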

PAT Authentication (GitHub Personal Access Token)

PAT authentication is used when the -token flag is provided or GITHUB_TOKEN environment variable is set. This method is useful in environments where SSH key configuration is impractical, such as CI/CD pipelines or containerized environments.

HTTPS URL Format with Token Substitution:

https://{PAT}@github.com/{owner}/{repo}.git

To use PAT authentication:

  1. Create a GitHub Personal Access Token:

    • Navigate to GitHub → Settings → Developer settings → Personal access tokens
    • Click "Generate new token"
    • Assign minimum required scopes: repo (full control of private repositories) or public_repo (for public repositories only)
    • Copy the token value
  2. Provide the token to wikigen:

    # Via command-line flag
    ./wikigen -token ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx owner/repo
    
    # Via environment variable
    export GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
    ./wikigen owner/repo
    
    # Via .env file
    # .env
    GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

The token is substituted into the HTTPS URL at git clone time via string replacement in the gitClone function (main.go:158). The original https:// scheme is replaced with https://{token}@ before the clone operation.

Sources: wikigen/main.go:156-158, .env.example:1-2
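A minimal Go sketch of the substitution follows. The helper name withToken is hypothetical, and the guard clauses (empty token, non-HTTPS URL) are assumptions added for the example; the source only states that https:// is replaced with https://{token}@.

```go
package main

import (
	"fmt"
	"strings"
)

// withToken is a hypothetical helper: it embeds the token into an
// HTTPS clone URL by rewriting the "https://" scheme prefix to
// "https://{token}@". The URL is returned unchanged when the token
// is empty or the URL does not use HTTPS.
func withToken(raw, token string) string {
	const scheme = "https://"
	if token == "" || !strings.HasPrefix(raw, scheme) {
		return raw
	}
	return scheme + token + "@" + strings.TrimPrefix(raw, scheme)
}

func main() {
	// "ghp_example" is a placeholder, not a real token.
	fmt.Println(withToken("https://github.com/owner/repo.git", "ghp_example"))
	// prints "https://ghp_example@github.com/owner/repo.git"
}
```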

Git Clone Implementation

The gitClone function implements the repository cloning logic with support for both authentication methods, incremental updates via git pull, and clone optimization.

Clone Flow

flowchart TD
    A["gitClone called<br/>repo_url, token, dest_dir"] --> B{"Repo exists<br/>at dest_dir?"}
    B -->|Yes| C["git pull --ff-only<br/>Update existing"]
    B -->|No| D{"Token<br/>provided?"}
    D -->|Yes| E["HTTPS Clone<br/>with PAT"]
    D -->|No| F["SSH Clone<br/>Convert URL"]
    E --> G["git clone --depth=1<br/>--single-branch<br/>Clone URL"]
    F --> G
    G --> H["Return"]
    C --> H

Existing Repository Updates

If the destination directory already contains a .git directory, wikigen updates the existing repository using git pull --ff-only instead of re-cloning. This enables incremental updates without full re-download and preserves any local state.

Sources: wikigen/main.go:147-152
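The existence check and the update command can be sketched like this. The names isGitRepo and pullArgs are hypothetical, and using git's -C option to select the directory is an assumption; the actual code may set the working directory another way.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isGitRepo reports whether dir contains a .git directory, which is
// the condition for updating instead of cloning.
func isGitRepo(dir string) bool {
	info, err := os.Stat(filepath.Join(dir, ".git"))
	return err == nil && info.IsDir()
}

// pullArgs returns the git arguments for a fast-forward-only update
// of the checkout in dir. (Assumption: -C is used to pick the
// directory; it is a standard git option.)
func pullArgs(dir string) []string {
	return []string{"-C", dir, "pull", "--ff-only"}
}

func main() {
	dest := "./repo" // illustrative destination directory
	if isGitRepo(dest) {
		fmt.Println("update with: git", pullArgs(dest))
	} else {
		fmt.Println("no existing checkout, a fresh clone is needed")
	}
}
```

With --ff-only, git refuses to create a merge commit, so the update fails rather than silently diverging if the local checkout has been modified.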

Fresh Clone

For new repositories, wikigen performs a fresh shallow clone, selecting the authentication method and applying the optimization flags.

Default Behavior:

  1. If no token is provided, convert HTTPS URL to SSH format (main.go:160-164)
  2. If token is provided, substitute token into HTTPS URL (main.go:156-158)
  3. Execute git clone with optimization flags (main.go:167)

Sources: wikigen/main.go:147-171

Clone Command Optimization

wikigen executes git clone with two optimization flags:

| Flag | Purpose | Effect |
| --- | --- | --- |
| --depth=1 | Shallow clone | Downloads only the latest commit, reducing bandwidth and storage |
| --single-branch | Single-branch clone | Fetches only the default branch (typically main or master), not all branches |

Combined, these flags substantially reduce clone time and disk usage while preserving the complete current source tree needed for documentation generation.

Sources: wikigen/main.go:167
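As an illustration, the clone invocation with both flags could be assembled with os/exec as below. The function name cloneCommand is hypothetical and the argument order is an assumption; only the two flags themselves come from the source.

```go
package main

import (
	"fmt"
	"os/exec"
)

// cloneCommand builds, but does not run, a shallow single-branch
// clone of cloneURL into destDir.
func cloneCommand(cloneURL, destDir string) *exec.Cmd {
	return exec.Command("git", "clone", "--depth=1", "--single-branch", cloneURL, destDir)
}

func main() {
	cmd := cloneCommand("git@github.com:owner/repo.git", "./repo")
	// Print the argument vector rather than executing it.
	fmt.Println(cmd.Args)
}
```

Passing the URL and flags as separate arguments to exec.Command avoids a shell entirely, so no shell quoting or expansion is applied to them.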

Token Handling and Environment Configuration

Token Configuration Precedence

wikigen resolves the GitHub token using the following precedence order (from highest to lowest priority):

  1. CLI flag: -token command-line argument
  2. Environment variable: GITHUB_TOKEN system environment variable
  3. .env file: GITHUB_TOKEN in .env or .env.local in the current working directory
  4. Default: Empty string (falls back to SSH authentication)

The .env file is automatically loaded at startup (main.go:719-739) before CLI flags are parsed.

Sources: wikigen/main.go:908, wikigen/main.go:719-739
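The precedence order can be expressed as a small pure function. This is a sketch under assumptions: resolveToken is a hypothetical name, and the environment and .env contents are passed in as maps so the logic can be shown in isolation. The actual code may instead load .env values into the process environment before parsing flags.

```go
package main

import "fmt"

// resolveToken returns the first non-empty token in precedence order:
// CLI flag, then environment variable, then .env file. An empty
// result means "fall back to SSH authentication".
func resolveToken(flagValue string, env, dotenv map[string]string) string {
	if flagValue != "" {
		return flagValue
	}
	if v := env["GITHUB_TOKEN"]; v != "" {
		return v
	}
	if v := dotenv["GITHUB_TOKEN"]; v != "" {
		return v
	}
	return ""
}

func main() {
	env := map[string]string{"GITHUB_TOKEN": "from-env"}
	dotenv := map[string]string{"GITHUB_TOKEN": "from-dotenv"}

	fmt.Println(resolveToken("from-flag", env, dotenv)) // prints "from-flag"
	fmt.Println(resolveToken("", env, dotenv))          // prints "from-env"
	fmt.Println(resolveToken("", nil, dotenv))          // prints "from-dotenv"
}
```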

Environment Variables

# GitHub Personal Access Token (empty = use SSH)
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Can also be set via CLI
# ./wikigen -token ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx owner/repo

Sources: .env.example:1-2
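For readers unfamiliar with the .env format shown above, the following Go sketch parses KEY=VALUE lines and skips comments and blank lines. The name parseDotenv and its rules (no quote handling, malformed lines ignored) are assumptions for illustration; the loader in main.go may behave differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDotenv extracts KEY=VALUE pairs from .env-style content.
// Blank lines, lines starting with '#', and lines without '=' are
// skipped. Quoted values are not unwrapped in this sketch.
func parseDotenv(content string) map[string]string {
	vars := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		key = strings.TrimSpace(key)
		if key == "" {
			continue
		}
		vars[key] = strings.TrimSpace(value)
	}
	return vars
}

func main() {
	content := "# GitHub Personal Access Token (empty = use SSH)\nGITHUB_TOKEN=ghp_example\n"
	fmt.Println(parseDotenv(content)["GITHUB_TOKEN"]) // prints "ghp_example"
}
```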

Security Considerations

Token Storage and Exposure Risk

Risks with Token Handling:

  • Plaintext Storage: Tokens stored in .env files are in plaintext. Never commit .env files to version control.
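  • Environment Variable Exposure: Environment variables may be visible in process listings or CI/CD logs if not properly masked. A token passed with -token is likewise visible in the process's command line and may be saved in shell history.
  • URL Logging: Git clone URLs containing tokens may appear in logs. By default, git also records the clone URL, including any embedded token, as the origin remote in the clone's .git/config.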

Mitigation Strategies:

  1. CI/CD Environments: Use secrets management provided by your CI/CD platform

    # GitHub Actions example
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

    Sources: README.md:259-265

  2. Local Development: Restrict file permissions

    chmod 600 .env
  3. Token Scope: Grant minimum required scopes when creating PAT

    • public_repo for public repositories only
    • repo for private repositories
  4. Token Rotation: Periodically regenerate tokens

    • GitHub recommends rotating every 90 days
    • GitHub automatically revokes OAuth tokens and personal access tokens that have not been used for one year

    Sources: README.md:278
  5. SSH Preferred: Use SSH authentication in environments where SSH keys can be managed securely

    • SSH keys are not transmitted in URLs
    • No secrets in environment variables or .env files
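One further way to limit the URL logging risk is to strip credentials from a clone URL before it is printed. The sketch below uses Go's net/url package; the function name redactURL is hypothetical, and nothing in the source indicates that wikigen performs this step.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactURL replaces any userinfo (such as an embedded token) in a
// URL with the literal string "REDACTED". Inputs that do not parse
// as URLs, or that carry no userinfo, are returned unchanged.
func redactURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil || u.User == nil {
		return raw
	}
	u.User = url.User("REDACTED")
	return u.String()
}

func main() {
	// "ghp_example" is a placeholder, not a real token.
	fmt.Println(redactURL("https://ghp_example@github.com/owner/repo.git"))
	// prints "https://REDACTED@github.com/owner/repo.git"
}
```

Only the logged string is altered; the command passed to git still needs the real URL.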

Repository Input Validation

wikigen validates all repository inputs before cloning to prevent various attack vectors.

Validation Rules:

  1. Format Validation: Repository must match owner/repo pattern
  2. Path Traversal Prevention: Rejects paths containing ..
  3. Shell Injection Prevention: Rejects special shell characters (;, &, |, `, $, (, ), {, }, [, ], !, ~)

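The validation regex pattern (main.go:79) restricts both the owner and the repository name to alphanumeric characters, dots, underscores, and hyphens. Because dots are allowed, an input such as ../repo still matches the pattern, which is why the separate .. check is required: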

^[a-zA-Z0-9._-]+/[a-zA-Z0-9._-]+$

Sources: wikigen/main.go:79-92
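The three rules can be combined into a single check, sketched below. The regex is the one quoted above; the function name validateRepo, the error messages, and the order of the checks are assumptions for illustration rather than the code in main.go.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// repoPattern is the owner/repo pattern quoted in this page.
var repoPattern = regexp.MustCompile(`^[a-zA-Z0-9._-]+/[a-zA-Z0-9._-]+$`)

// validateRepo applies the format, path traversal, and shell
// character rules in turn and returns the first failure.
func validateRepo(repo string) error {
	if !repoPattern.MatchString(repo) {
		return errors.New("invalid format: expected owner/repo")
	}
	if strings.Contains(repo, "..") {
		return errors.New("path traversal: '..' is not allowed")
	}
	// Defense in depth: the pattern already excludes these characters.
	if strings.ContainsAny(repo, ";&|`$(){}[]!~") {
		return errors.New("shell metacharacters are not allowed")
	}
	return nil
}

func main() {
	for _, input := range []string{"owner/repo", "../repo", "owner/repo; rm -rf /"} {
		fmt.Printf("%q -> %v\n", input, validateRepo(input))
	}
}
```

Note that "../repo" passes the regex and is caught only by the explicit traversal check, while "owner/repo; rm -rf /" is rejected by the regex itself.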

Attack Prevention

flowchart TD
    A["User Input:<br/>owner/repo"] --> B{"Matches<br/>regex?"}
    B -->|No| C["❌ Reject<br/>Invalid format"]
    B -->|Yes| D{"Contains<br/>'..'?"}
    D -->|Yes| E["❌ Reject<br/>Path traversal"]
    D -->|No| F{"Contains<br/>shell chars?"}
    F -->|Yes| G["❌ Reject<br/>Shell injection"]
    F -->|No| H["✅ Accept<br/>Proceed with clone"]
    C --> Z["Return error"]
    E --> Z
    G --> Z
    H --> I["Safe to execute"]

Prevented Attack Vectors:

| Attack Type | Example | Prevention |
| --- | --- | --- |
| Path Traversal | ../../../etc/passwd | Rejects .. in input |
| Shell Injection | owner/repo; rm -rf / | Rejects ;, &, \|, etc. |
| Command Substitution | owner/$(whoami) | Rejects $, (, ) |
| Glob Expansion | owner/repo* | Rejects [ and ]; * and ? fall outside the regex's allowed character set |

Token Exposure in CI/CD

When using wikigen in GitHub Actions, follow secure token handling practices:

# ✅ CORRECT: Use secrets management
env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: ./wikigen -token "$GITHUB_TOKEN" owner/repo

# ❌ WRONG: Token visible in logs
run: ./wikigen -token ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx owner/repo

GitHub Actions automatically masks secrets in job logs, but tokens should not be written to logs explicitly.

Sources: README.md:259-273

SSH Key Setup

For SSH authentication to function, an SSH key must be configured:

Local Development

  1. Generate SSH key (if not already present):

    ssh-keygen -t ed25519 -C "your_email@example.com"
    # Or for older systems: ssh-keygen -t rsa -b 4096
  2. Start SSH agent:

    eval "$(ssh-agent -s)"
    ssh-add ~/.ssh/id_ed25519
  3. Register public key with GitHub:

    • Copy public key: cat ~/.ssh/id_ed25519.pub
    • GitHub → Settings → SSH and GPG keys → New SSH key
    • Paste key and save
  4. Verify SSH access:

    ssh -T git@github.com
    # Expected: "Hi username! You've successfully authenticated..."

CI/CD Environments (GitHub Actions)

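In GitHub Actions, prefer the built-in GITHUB_TOKEN over SSH keys. The built-in token only grants access to the repository running the workflow, so cloning a different private repository requires a PAT stored as a secret: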

env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: ./wikigen -token "$GITHUB_TOKEN" owner/repo

Alternatively, configure SSH in Actions via webfactory/ssh-agent@v0.5.4:

- uses: webfactory/ssh-agent@v0.5.4
  with:
    ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
# Then use SSH authentication (omit -token flag)

Clone Optimization Deep Dive

Depth-1 Shallow Clone

The --depth=1 flag performs a shallow clone, downloading only the most recent commit instead of full history:

Benefits:

  • Reduces bandwidth usage, by an estimated 50-90% for repositories with long histories
  • Reduces disk space usage proportionally
  • Faster clone time

Trade-offs:

  • Cannot access git history before the latest commit
  • Some git operations (e.g., git log --all) are limited

For wikigen's use case (analyzing current source code), this is ideal since documentation generation only requires the latest state.

Impact: A repository with 10,000+ commits clones in seconds instead of minutes.

Sources: wikigen/main.go:167, README.md:18

Single-Branch Clone

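The --single-branch flag downloads only the default branch (typically main or master). Because git's --depth option already implies --single-branch, passing the flag explicitly mainly documents the intent.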

Benefits:

  • Reduces bandwidth by eliminating other branches
  • Reduces disk space
  • Faster clone overall

Trade-offs:

  • Other branches are not available locally
  • Cannot analyze branch-specific code

For multi-branch repositories, fetching a single branch rather than all branches can reduce clone time by an estimated 10-40%, depending on branch count.

Sources: wikigen/main.go:167

Authentication Flow Diagram

The complete authentication and clone flow:

sequenceDiagram
    participant User
    participant CLI as wikigen CLI
    participant Env as Environment
    participant Git as git command
    participant GitHub

    User->>CLI: ./wikigen -token PAT owner/repo
    CLI->>Env: Load .env file
    Env-->>CLI: Configuration loaded
    CLI->>CLI: Validate owner/repo format
    CLI->>CLI: Check: token provided?
    alt Token Provided
        CLI->>CLI: Construct HTTPS URL<br/>https://PAT@github.com/owner/repo.git
        CLI->>Git: git clone --depth=1<br/>--single-branch HTTPS_URL
    else No Token (SSH)
        CLI->>CLI: Convert to SSH URL<br/>git@github.com:owner/repo.git
        CLI->>Git: git clone --depth=1<br/>--single-branch SSH_URL
    end
    Git->>GitHub: Connect with auth
    GitHub-->>Git: Authentication successful
    Git-->>CLI: Clone complete
    CLI-->>User: Repository ready for analysis