
deadcode-walker/forge


Latest release   C++20   MIT License

Linux   Windows

Forge

High-performance data toolkit and text editor for the command line.
Built for people who tell their LLM to do the work. 60+ commands for CSV, TSV, JSON, JSONL, and plain text.

Overview  •  Data Commands  •  Text Commands  •  Pipeline  •  Expressions  •  Performance  •  Install  •  Build  •  License


Overview

Forge is a single-binary CLI tool for structured data processing and plain-text file editing. One tool, 60+ commands, zero runtime dependencies beyond libc.

  • Data processing: filter, sort, join, group, pivot, and deduplicate CSV/TSV/JSON/JSONL at millions of rows per second
  • Text editing: cat, grep, sed, insert, delete, patch: everything an LLM or script needs to read and modify text files
  • Composable pipeline: chain any operations in a single pass with forge pipe
  • Expression engine: arithmetic, comparisons, regex, conditionals, 20+ built-in functions

Built in C++20 with memory-mapped I/O, parallel sort via Intel TBB, and xxhash-based deduplication.
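The deduplication idea fits in a few lines: hash each row, remember what has been seen, and keep only first occurrences. This is a minimal sketch, not Forge's implementation. It assumes whole-row comparison with the first occurrence winning, uses `std::hash<std::string_view>` in place of xxhash so it builds with the standard library alone, and `dedup_rows` is a hypothetical name.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Keep the first occurrence of each distinct row, preserving input order.
// Hypothetical helper: std::hash stands in for xxhash here.
// The returned views point into `rows`, so `rows` must outlive the result.
std::vector<std::string_view> dedup_rows(const std::vector<std::string>& rows) {
    // The set hashes each view and then compares contents, so a hash
    // collision can never drop a distinct row.
    std::unordered_set<std::string_view> seen;
    seen.reserve(rows.size());
    std::vector<std::string_view> out;
    for (const std::string& row : rows) {
        // insert() returns {iterator, bool}; the bool is true only for unseen rows
        if (seen.insert(std::string_view{row}).second) {
            out.push_back(row);
        }
    }
    return out;
}
```

Storing views plus an equality check trades memory for exactness; keeping only 64-bit hashes would use less memory at a small risk of treating two different rows as duplicates.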


Data Commands

Inspection: info, head, tail, schema, count, sample, freq, stats, describe
Column Ops: select, drop, rename, reorder, mergecols, splitcol, addcol
Row Ops: filter, sort, reverse, slice, shuffle, unique, dedup, isolate
Transforms: lower, upper, trim, replace, fill, transform
Analysis: groupby, enumerate, derive
Reshape: pivot, unpivot, coalesce
Multi-File: diff, merge, concat, join, intersect, subtract
Export: export, split, validate
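As a rough illustration of what an aggregation such as `groupby ... -a salary:mean,salary:count` computes, here is a single-pass sketch. It assumes rows are already parsed into string fields and that the aggregated column is numeric; `Row`, `Agg`, `group_by`, and `mean` are hypothetical names, not Forge internals.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical row type: one parsed CSV record as a list of fields.
using Row = std::vector<std::string>;

// Running totals for one group.
struct Agg {
    double sum = 0.0;
    std::size_t count = 0;
};

// Group rows by the key column and accumulate sum and count of a numeric column.
// std::map keeps groups sorted by key; a hash map would also work.
std::map<std::string, Agg> group_by(const std::vector<Row>& rows,
                                    std::size_t key_col, std::size_t value_col) {
    std::map<std::string, Agg> groups;
    for (const Row& row : rows) {
        Agg& agg = groups[row[key_col]];       // creates an empty group on first sight
        agg.sum += std::stod(row[value_col]);  // throws on non-numeric input
        ++agg.count;
    }
    return groups;
}

// The mean is derived from sum and count, so one pass serves sum, count, and mean.
double mean(const Agg& agg) {
    return agg.count ? agg.sum / static_cast<double>(agg.count) : 0.0;
}
```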

Text Commands

Plain-text file operations for LLM toolchains and scripting. Every text command reads from stdin when the file argument is -.
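A minimal sketch of the `-` convention, assuming a C++ reader that consumes a `std::istream`; `open_input` is a hypothetical helper, not part of Forge's documented interface.

```cpp
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

// Return the stream to read from: std::cin for "-", otherwise the opened file.
// The caller owns `file`, so the returned reference stays valid after we return.
std::istream& open_input(const std::string& path, std::ifstream& file) {
    if (path == "-") {
        return std::cin;
    }
    file.open(path);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    return file;
}
```

Because every command then reads from a plain `std::istream&`, the same code path serves files and piped input.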

Inspect: cat, wc, grep
Edit: sed, insert, delete, patch, prepend, append
Transform: lines

# Print file with line numbers
forge cat src/main.cpp --line-numbers --range 10:25

# Search with context
forge grep src/main.cpp -e "TODO|FIXME" -n 2 --line-numbers

# Find and replace
forge sed config.txt -f "localhost" -r "prod.example.com" -o config.txt

# Replace lines 15-20 with new content
forge patch src/main.cpp --range 15:20 -v "    return 0;\n}" -o src/main.cpp

# Remove empty lines, trim whitespace, deduplicate
forge lines notes.txt -o clean.txt --nonempty --trim --unique
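The `patch --range` example reads as "replace lines 15 through 20 with these lines". A minimal sketch of that operation, assuming 1-based inclusive ranges and that the `\n` in the `-v` value has already become a real newline; `split_lines` and `patch_range` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split text on '\n'; a trailing newline does not produce an extra empty line.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string::npos) end = text.size();
        lines.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return lines;
}

// Replace lines [first, last] (1-based, inclusive) with `replacement`.
std::vector<std::string> patch_range(std::vector<std::string> lines,
                                     std::size_t first, std::size_t last,
                                     const std::vector<std::string>& replacement) {
    if (first == 0 || first > last || last > lines.size()) {
        throw std::out_of_range("invalid line range");
    }
    auto begin = lines.begin() + static_cast<std::ptrdiff_t>(first - 1);
    auto end = lines.begin() + static_cast<std::ptrdiff_t>(last);
    // erase() returns the position just after the removed block: the insertion point
    auto at = lines.erase(begin, end);
    lines.insert(at, replacement.begin(), replacement.end());
    return lines;
}
```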

Pipeline

Chain operations in a single pass. No intermediate files.

forge pipe data.csv \
  subtract breach.csv --on email \
  filter "salary > 50000" \
  derive "salary * 12" --name annual \
  select name,email,annual \
  sort annual:desc \
  -o clean.csv

Add --json for a machine-readable pipeline summary:

{"input_rows":50000,"input_columns":5,"output_rows":12340,"output_columns":3,"steps":5}
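A rough sketch of single-pass chaining, assuming each streaming step (such as filter or derive) either rewrites a row or drops it, and that a trailing sort has to collect all surviving rows first. `Row`, `Step`, and `run_pipeline` are hypothetical names; this is not Forge's internal pipeline API.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// One pipeline step: may modify the row in place; returns false to drop it.
using Step = std::function<bool(Row&)>;

// Push every row through all steps in order, so each row is touched once and
// no intermediate table is built. A sort needs every surviving row, so it runs
// after the streaming steps; pass an empty `order` to skip it.
std::vector<Row> run_pipeline(const std::vector<Row>& input,
                              const std::vector<Step>& steps,
                              const std::function<bool(const Row&, const Row&)>& order) {
    std::vector<Row> out;
    for (Row row : input) {  // copy: steps may rewrite the row
        bool keep = true;
        for (const Step& step : steps) {
            if (!step(row)) { keep = false; break; }
        }
        if (keep) out.push_back(std::move(row));
    }
    if (order) std::stable_sort(out.begin(), out.end(), order);
    return out;
}
```

In this shape a filter step is a lambda such as `[](Row& r) { return std::stod(r[1]) > 50000; }`, and a derive step appends a computed field and returns `true`.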

Expressions

The expression engine powers filter, derive, and pipeline steps.

# Filter with conditions
forge filter data.csv -e "salary > 80000 AND department == 'Engineering'" -o out.csv

# Computed columns with arithmetic
forge derive data.csv -e "salary * 12" --name annual -o out.csv
forge derive data.csv -e "round(price * quantity * 1.08, 2)" --name total -o out.csv

# Conditional columns
forge derive data.csv -e "if(salary > 80000, 'senior', 'junior')" --name level -o out.csv

# String functions
forge derive data.csv -e "concat(first, ' ', last)" --name fullname -o out.csv

Arithmetic: + - * / %
Comparison: == != > < >= <=
Pattern: ~ !~ (regex match)
Logic: AND OR NOT
String: len, upper, lower, trim, concat, substr, replace, contains, startswith, endswith
Numeric: abs, round, floor, ceil, min, max
Control: if, empty, notempty
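To show how precedence ties the operators above together, here is a small recursive-descent evaluator for the numeric subset (arithmetic, comparison, AND/OR/NOT). It is a sketch, not Forge's engine. It assumes SQL-like precedence (OR lowest, then AND, NOT, comparisons, + -, * / %, unary minus), treats every column as a number, represents booleans as 1.0 and 0.0, and leaves out strings, regex, and the built-in functions. `Evaluator` is a hypothetical name.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical numeric evaluator. Precedence, lowest to highest:
// OR, AND, NOT, comparison, + -, * / %, unary -, atom. Booleans are 1.0 / 0.0.
class Evaluator {
public:
    Evaluator(std::string src, const std::map<std::string, double>& row)
        : s_(std::move(src)), row_(row) {}

    double run() {
        double v = parse_or();
        skip_ws();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected input");
        return v;
    }

private:
    std::string s_;
    const std::map<std::string, double>& row_;
    std::size_t pos_ = 0;

    static bool is_word(char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }
    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consume `tok` if it comes next; keywords must end at a word boundary.
    bool accept(const std::string& tok) {
        skip_ws();
        if (s_.compare(pos_, tok.size(), tok) != 0) return false;
        std::size_t after = pos_ + tok.size();
        if (is_word(tok[0]) && after < s_.size() && is_word(s_[after])) return false;
        pos_ = after;
        return true;
    }
    double parse_or() {
        double v = parse_and();
        while (accept("OR")) { double r = parse_and(); v = (v != 0 || r != 0); }
        return v;
    }
    double parse_and() {
        double v = parse_not();
        while (accept("AND")) { double r = parse_not(); v = (v != 0 && r != 0); }
        return v;
    }
    double parse_not() { return accept("NOT") ? (parse_not() == 0 ? 1.0 : 0.0) : parse_cmp(); }
    double parse_cmp() {
        double l = parse_sum();
        // Two-character operators are tried first so ">=" is not read as ">".
        if (accept("==")) return l == parse_sum();
        if (accept("!=")) return l != parse_sum();
        if (accept(">=")) return l >= parse_sum();
        if (accept("<=")) return l <= parse_sum();
        if (accept(">")) return l > parse_sum();
        if (accept("<")) return l < parse_sum();
        return l;
    }
    double parse_sum() {
        double v = parse_prod();
        for (;;) {
            if (accept("+")) v += parse_prod();
            else if (accept("-")) v -= parse_prod();
            else return v;
        }
    }
    double parse_prod() {
        double v = parse_unary();
        for (;;) {
            if (accept("*")) v *= parse_unary();
            else if (accept("/")) v /= parse_unary();
            else if (accept("%")) v = std::fmod(v, parse_unary());
            else return v;
        }
    }
    double parse_unary() { return accept("-") ? -parse_unary() : parse_atom(); }
    double parse_atom() {
        if (accept("(")) {  // accept() has already skipped leading whitespace
            double v = parse_or();
            if (!accept(")")) throw std::runtime_error("expected ')'");
            return v;
        }
        if (pos_ < s_.size() && (std::isdigit(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '.')) {
            std::size_t used = 0;
            double v = std::stod(s_.substr(pos_), &used);
            pos_ += used;
            return v;
        }
        std::size_t start = pos_;
        while (pos_ < s_.size() && is_word(s_[pos_])) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number or column");
        auto it = row_.find(s_.substr(start, pos_ - start));
        if (it == row_.end()) throw std::runtime_error("unknown column");
        return it->second;
    }
};
```

Each grammar level calls the next-tighter one, so `a + b * c > d AND e` groups as `((a + (b * c)) > d) AND e` without any explicit precedence table.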

Install

curl -fsSL https://raw.githubusercontent.com/deadcode-walker/forge/main/install.sh | sh

Installs the latest release as forge-cli to /usr/local/bin. Run it again to update.

Install a specific version:

curl -fsSL https://raw.githubusercontent.com/deadcode-walker/forge/main/install.sh | sh -s v1.3.0

Performance

Benchmarked on an AMD Ryzen 9 9950X3D with a 562 MB CSV (13.5M rows, 15 columns):

Operation          Time      Throughput
count              0.07 s    192M rows/s
head 5             0.02 s    instant
info               1.2 s     10.9M rows/s
select 2 cols      1.4 s     9.6M rows/s
filter             2.1 s     6.4M rows/s
lower              2.3 s     5.9M rows/s
export TSV         1.6 s     8.3M rows/s
sort (parallel)    12.9 s    --
unique (xxhash)    6.3 s     2.5M rows/s
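One plausible way to reach `count` throughput in this range is to map the file into memory and scan it for newlines without parsing any fields. The sketch below assumes POSIX `mmap`, counts every line (header included), and ignores quoted fields that contain newlines, so it approximates a CSV row count rather than reproducing Forge's implementation. `count_lines` and `count_file_lines` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Count lines in a buffer; a final line without a trailing '\n' counts too.
// memchr lets the C library use its vectorized byte search.
std::size_t count_lines(std::string_view buf) {
    std::size_t n = 0;
    const char* p = buf.data();
    const char* end = p + buf.size();
    while (p < end) {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        ++n;
        if (!hit) break;  // trailing line without '\n'
        p = static_cast<const char*>(hit) + 1;
    }
    return n;
}

// Map the whole file read-only and scan it in place (POSIX only).
std::size_t count_file_lines(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st {};
    if (::fstat(fd, &st) != 0) { ::close(fd); throw std::runtime_error("fstat failed"); }
    auto size = static_cast<std::size_t>(st.st_size);
    if (size == 0) { ::close(fd); return 0; }  // mmap of length 0 is an error
    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
    std::size_t n = count_lines({static_cast<const char*>(addr), size});
    ::munmap(addr, size);
    return n;
}
```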

Build

Requirements
C++ compiler: GCC 12+, Clang 15+, or MSVC 2022+
CMake: 3.20+
xxhash: latest
TBB: Intel Threading Building Blocks

csv-parser v2.5.0 is vendored and requires no separate install.


Linux
# Arch
sudo pacman -S xxhash tbb

# Ubuntu/Debian
sudo apt install libxxhash-dev libtbb-dev

# Build
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

Windows (MSYS2)
pacman -S mingw-w64-x86_64-xxhash mingw-w64-x86_64-tbb
cmake -B build -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)


Usage

# Inspect
forge info data.csv
forge head data.csv -n 20
forge describe data.csv --json

# Filter and sort
forge filter data.csv -e "salary > 80000" -o filtered.csv
forge sort data.csv -c salary:desc -o sorted.csv

# Computed columns
forge derive data.csv -e "salary * 12" --name annual -o enriched.csv

# Group and aggregate
forge groupby data.csv -c department -a salary:mean,salary:count -o summary.csv

# Join
forge join users.csv orders.csv -c user_id --type left -o joined.csv

# Text editing
forge cat src/main.cpp --line-numbers --range 1:50
forge grep src/ -e "TODO" --count
forge patch config.yaml --range 12:14 -v "port: 8080\nhost: 0.0.0.0" -o config.yaml

# Pipeline: chain everything
forge pipe data.csv \
  filter "status == 'active'" \
  derive "price * qty" --name total \
  groupby region -a total:sum \
  sort total_sum:desc \
  -o report.csv

# Typo? Forge suggests the right command
forge fliter data.csv
# error: unknown command 'fliter' (did you mean 'filter'?)
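The README does not say how Forge picks the suggestion. A common approach is Levenshtein edit distance against the list of known commands; the sketch below assumes that, with a cutoff of 2 edits chosen for illustration. `edit_distance` and `suggest` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Classic dynamic-programming edit distance (insert, delete, substitute each
// cost 1), keeping only one row of the table in memory.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) row[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];  // previous row, column j - 1
        row[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t up = row[j];  // previous row, column j
            std::size_t cost = a[i - 1] == b[j - 1] ? 0 : 1;
            row[j] = std::min({up + 1, row[j - 1] + 1, diag + cost});
            diag = up;
        }
    }
    return row[b.size()];
}

// Return the closest known command if it is within `max_dist` edits.
std::optional<std::string> suggest(const std::string& typed,
                                   const std::vector<std::string>& commands,
                                   std::size_t max_dist = 2) {
    std::optional<std::string> best;
    std::size_t best_dist = max_dist + 1;
    for (const std::string& cmd : commands) {
        std::size_t d = edit_distance(typed, cmd);
        if (d < best_dist) { best_dist = d; best = cmd; }
    }
    return best;
}
```

With plain Levenshtein, swapped neighbours such as `fliter` cost 2 edits; a Damerau variant that counts transpositions as one edit would allow a tighter cutoff.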

License

MIT

About

High-performance CSV/TSV/JSON data toolkit. Unified CLI + GUI binary. Memory-mapped I/O, parallel sort, expression engine, 7-12M rows/sec.
