whatenc

Text encoding type classifier.

whatenc is a command-line tool that identifies the encoding or transformation of a given string or file.

The model is trained on text samples from the English, Greek, Russian, Hebrew, and Arabic Wikipedia corpora, chosen to represent a diverse set of writing systems (Latin, Greek, Cyrillic, Hebrew, and Arabic scripts). Each line is encoded using multiple encoding schemes to generate labeled examples.
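As a rough illustration of how one line of text can yield several labeled examples, the sketch below encodes a single string with Python's standard library. The function name `make_labeled_examples` is hypothetical and not part of whatenc. Reading `gzip64` as "gzip-compress, then base64-encode" is an assumption, and `morse` is left out because it needs a per-script lookup table.

```python
import base64
import gzip
import hashlib
import urllib.parse


def make_labeled_examples(line: str) -> list[tuple[str, str]]:
    """Return (encoded_text, label) pairs for one line of plain text."""
    raw = line.encode("utf-8")
    examples = [
        (line, "plain"),
        (base64.b32encode(raw).decode("ascii"), "base32"),
        (base64.b64encode(raw).decode("ascii"), "base64"),
        (base64.b85encode(raw).decode("ascii"), "base85"),
        (raw.hex(), "hex"),
        (urllib.parse.quote(line, safe=""), "url"),
        # Assumption: "gzip64" means gzip-compress, then base64-encode.
        (base64.b64encode(gzip.compress(raw)).decode("ascii"), "gzip64"),
    ]
    # Hash digests are stored as lowercase hexadecimal strings.
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
        examples.append((hashlib.new(name, raw).hexdigest(), name))
    return examples


if __name__ == "__main__":
    for text, label in make_labeled_examples("encode to base64 format"):
        print(f"{label:<7} {text}")
```

Each input line produces one example per label, so a corpus built this way is balanced across classes by construction.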

How It Works

whatenc uses a character-level 1D convolutional neural network (CNN) trained directly on sequences of character-bigram tokens.

Each training sample is represented as:

- a sequence of character bigrams, padded to a fixed maximum length
- a scalar feature holding the true (unpadded) length, which lets the network learn relative string lengths
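The representation described above can be sketched in plain Python. This is a minimal sketch under several assumptions: bigrams overlap, ids 0 and 1 are reserved for padding and unknown bigrams, and the length feature is the raw character count. The names `to_bigrams`, `build_vocab`, and `featurize` are hypothetical, and the actual whatenc preprocessing may differ.

```python
PAD_ID = 0  # reserved for padding positions
UNK_ID = 1  # reserved for bigrams missing from the vocabulary


def to_bigrams(text: str) -> list[str]:
    """Split text into overlapping character bigrams: 'abcd' -> ['ab', 'bc', 'cd']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_vocab(samples: list[str]) -> dict[str, int]:
    """Assign an integer id to every bigram seen in the samples."""
    vocab: dict[str, int] = {}
    for sample in samples:
        for bigram in to_bigrams(sample):
            # Ids start at 2 because 0 and 1 are reserved.
            vocab.setdefault(bigram, len(vocab) + 2)
    return vocab


def featurize(text: str, vocab: dict[str, int], max_len: int) -> tuple[list[int], int]:
    """Return (padded bigram ids, true length in characters)."""
    ids = [vocab.get(b, UNK_ID) for b in to_bigrams(text)][:max_len]
    padded = ids + [PAD_ID] * (max_len - len(ids))
    return padded, len(text)


if __name__ == "__main__":
    vocab = build_vocab(["abab"])
    print(featurize("abz", vocab, max_len=5))
```

Keeping the true length as a separate feature means the information survives even when long inputs are truncated to `max_len`.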

This neural approach achieves near-perfect classification accuracy after only a few epochs.

Supported Encodings

whatenc currently recognizes the following formats and transformations:

| Category | Encodings |
| --- | --- |
| Base encodings | `base32`, `base64`, `base85`, `hex`, `url` |
| Text ciphers | `morse` |
| Compression | `gzip64` |
| Hash digests | `md5`, `sha1`, `sha224`, `sha256`, `sha384`, `sha512` |
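The hash digests listed above are all hexadecimal strings, so their character sets look alike, but their lengths differ. The sketch below uses `hashlib` to show the hex length of each digest. Whether the model actually relies on length to separate these classes is an assumption, and the helper name `hex_digest_lengths` is hypothetical.

```python
import hashlib


def hex_digest_lengths() -> dict[str, int]:
    """Map each supported hash name to the length of its hex digest."""
    names = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
    return {name: len(hashlib.new(name, b"hello").hexdigest()) for name in names}


if __name__ == "__main__":
    for name, length in hex_digest_lengths().items():
        print(f"{name:<7} {length} hex characters")
```

The lengths are fixed per algorithm regardless of the input, which makes a length feature a plausible signal for telling these classes apart.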

Installation

You can install whatenc using pipx:

pipx install whatenc

Usage

API

from whatenc import Classifier

classifier = Classifier()
print(classifier.predict("hello, world!")) # prints: [('plain', 1.0), ('md5', 7.686760500681856e-26), ('base85', 2.864714171264974e-35)]
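Since `predict` yields a list of `(label, probability)` pairs, a small helper can reduce it to a single answer. The sketch below works on the sample output shown in the API example rather than calling whatenc itself. The helper `top_guess` and its `min_confidence` threshold are hypothetical and not part of the library.

```python
def top_guess(predictions, min_confidence=0.5):
    """Return the most probable label, or None if no label is confident enough."""
    if not predictions:
        return None
    label, score = max(predictions, key=lambda pair: pair[1])
    return label if score >= min_confidence else None


# Sample output copied from the API example.
sample = [
    ("plain", 1.0),
    ("md5", 7.686760500681856e-26),
    ("base85", 2.864714171264974e-35),
]
print(top_guess(sample))  # plain
```

Using `max` instead of taking the first element avoids depending on the order of the returned list.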

CLI

whatenc hello
whatenc samples.txt

Examples

[+] input: ZW5jb2RlIHRvIGJhc2U2NCBmb3JtYXQ=
   [~] top guess   = base64
      [=] base64   = 1.000
      [=] base85   = 0.000
      [=] plain    = 0.000

[+] input: hello
   [~] top guess   = plain
      [=] plain    = 1.000
      [=] md5      = 0.000
      [=] base64   = 0.000

[*] loading model
[+] input: האקדמיה ללשון העברית
   [~] top guess   = plain
      [=] plain    = 1.000
      [=] base64   = 0.000
      [=] base85   = 0.000

[*] loading model
[+] input: bfa99df33b137bc8fb5f5407d7e58da8
   [~] top guess   = md5
      [=] md5      = 0.999
      [=] sha1     = 0.001
      [=] sha224   = 0.000
