go-phileas

A Golang port of the Phileas Java library for deidentifying and redacting PII, PHI, and other sensitive information from text.

Overview

go-phileas scans text for sensitive information and can redact, replace, mask, or otherwise transform what it finds. Policies (defined in JSON or YAML) configure which types of sensitive information to look for and how to handle each match.

Compatibility Notes

Note that this port of Phileas is not 1:1 with the Java version. There are some differences:

  • This project includes support for policies in YAML as well as JSON.
  • This project does not include all redaction strategies present in the Java version.
  • This project includes a CLI.
  • This project does not support PDF documents, which the Java version does.

There is also phileas-python, a Python port of the Java version.

Supported Sensitive Information Types

  • Ages (e.g., "45 years old", "aged 30", "61 y/o")
  • Bank Routing Numbers
  • Bitcoin Addresses
  • Credit Card Numbers (Visa, MasterCard, American Express, Diners Club, Discover, JCB)
  • Custom Dictionaries (inline word lists or file-based)
  • Dates (multiple formats: MM/DD/YYYY, YYYY-MM-DD, Month DD YYYY, etc.)
  • Driver's License Numbers
  • Email Addresses
  • IBAN Codes
  • IP Addresses (IPv4 and IPv6)
  • MAC Addresses
  • Passport Numbers
  • Phone Numbers (US and international)
  • Social Security Numbers (SSN) and Taxpayer Identification Numbers (TIN)
  • Tracking Numbers (UPS, FedEx, USPS)
  • URLs
  • Vehicle Identification Numbers (VINs)
  • ZIP Codes
  • Named Entities (persons, locations, organizations, etc. via the ph-eye NER service)

Installation

go get github.com/philterd/go-phileas

Usage

Filtering with a Policy Struct

package main

import (
    "fmt"
    "github.com/philterd/go-phileas/pkg/policy"
    "github.com/philterd/go-phileas/pkg/services"
)

func main() {
    pol := &policy.Policy{
        Name: "my-policy",
        Identifiers: policy.Identifiers{
            SSN: &policy.SSNFilter{
                SSNFilterStrategies: []policy.FilterStrategy{
                    {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
                },
            },
            EmailAddress: &policy.EmailAddressFilter{},
        },
    }

    svc, err := services.NewFilterService(pol)
    if err != nil {
        panic(err)
    }
    result, err := svc.Filter(pol, "my-context", "My SSN is 123-45-6789 and email is john@example.com.")
    if err != nil {
        panic(err)
    }

    fmt.Println(result.FilteredText)
    // Output: My SSN is {{{REDACTED-ssn}}} and email is {{{REDACTED-email-address}}}.

    for _, span := range result.Spans {
        fmt.Printf("Found %s: %q at position %d-%d\n",
            span.FilterType, span.Text, span.CharacterStart, span.CharacterEnd)
    }
}

The second argument to Filter is the context name. All Filter calls sharing the same context name use the same token→replacement store, ensuring that the same PII value always receives the same replacement within a context (referential integrity). The default in-memory store is created automatically by NewFilterService. To supply a custom store (e.g., Redis for multi-process deployments) use NewFilterServiceWithContext.
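To make the referential-integrity idea concrete, here is a minimal sketch of a context-keyed replacement store. It is not the go-phileas implementation: the `contextStore` type, its `replacementFor` method, and the `PERSON-n` replacement scheme are all hypothetical names invented for illustration. It only shows why the same value maps to the same replacement within one context and to a different one in another.

```go
package main

import (
	"fmt"
	"sync"
)

// contextStore is a hypothetical in-memory store, not the go-phileas type.
// It maps context name -> original value -> replacement.
type contextStore struct {
	mu      sync.Mutex
	entries map[string]map[string]string
}

func newContextStore() *contextStore {
	return &contextStore{entries: make(map[string]map[string]string)}
}

// replacementFor returns the replacement already stored for (context, value).
// If none exists yet, it calls generate once, stores the result, and returns it.
func (s *contextStore) replacementFor(context, value string, generate func() string) string {
	s.mu.Lock()
	defer s.mu.Unlock()

	byValue, ok := s.entries[context]
	if !ok {
		byValue = make(map[string]string)
		s.entries[context] = byValue
	}
	if r, ok := byValue[value]; ok {
		return r
	}
	r := generate()
	byValue[value] = r
	return r
}

func main() {
	store := newContextStore()
	n := 0
	next := func() string { n++; return fmt.Sprintf("PERSON-%d", n) }

	fmt.Println(store.replacementFor("case-1", "John Smith", next)) // PERSON-1
	fmt.Println(store.replacementFor("case-1", "John Smith", next)) // PERSON-1 (reused)
	fmt.Println(store.replacementFor("case-2", "John Smith", next)) // PERSON-2 (new context)
}
```

A shared store such as Redis would play the same role across processes; the lookup-then-generate logic stays the same.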

Filtering from JSON Policy

package main

import (
    "fmt"
    "github.com/philterd/go-phileas/pkg/services"
)

func main() {
    policyJSON := `{
        "identifiers": {
            "age": {
                "ageFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
            }
        }
    }`

    result, err := services.FilterJSON(policyJSON, "context", "The patient is 45 years old.")
    if err != nil {
        panic(err)
    }

    fmt.Println(result.FilteredText)
    // Output: The patient is {{{REDACTED-age}}}.
}

Explaining (inspecting spans without redacting)

Use Explain when you want to see what would be detected without modifying the text:

package main

import (
    "fmt"
    "github.com/philterd/go-phileas/pkg/policy"
    "github.com/philterd/go-phileas/pkg/services"
)

func main() {
    pol := &policy.Policy{
        Name: "my-policy",
        Identifiers: policy.Identifiers{
            SSN:          &policy.SSNFilter{},
            EmailAddress: &policy.EmailAddressFilter{},
        },
    }

    svc, err := services.NewFilterService(pol)
    if err != nil {
        panic(err)
    }
    spans, err := svc.Explain(pol, "my-context", "My SSN is 123-45-6789 and email is john@example.com.")
    if err != nil {
        panic(err)
    }

    for _, span := range spans {
        fmt.Printf("Found %s: %q at position %d-%d\n",
            span.FilterType, span.Text, span.CharacterStart, span.CharacterEnd)
    }
    // Output:
    // Found ssn: "123-45-6789" at position 10-21
    // Found email-address: "john@example.com" at position 35-51
}

Available Filter Strategies

| Strategy | Description |
| --- | --- |
| REDACT | Replace with a redaction placeholder (default). Use %t in redactionFormat for the filter type and %v for the original value. |
| RANDOM_REPLACE | Replace with a randomly generated but realistic value of the same type (deterministic per context+value pair for referential integrity). |
| STATIC_REPLACE | Replace with a fixed static value specified in staticReplacement. |
| CRYPTO_REPLACE | Encrypt the sensitive information (requires crypto configuration in the policy). |
| HASH_SHA256_REPLACE | Replace the sensitive information with its SHA-256 hash. |
| LAST_4 | Keep only the last 4 characters of the sensitive information. |
| MASK | Replace each character with a mask character (default: *). Set maskCharacter to use a different character. |
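As a rough illustration of what four of these strategies do to a matched value, the sketch below reimplements REDACT, MASK, LAST_4, and HASH_SHA256_REPLACE with the standard library only. The function names are hypothetical and this is not the library's code. In particular, the hex encoding of the hash and the handling of values shorter than four characters are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// redact expands %t (filter type) and %v (original value) in a redaction format.
func redact(format, filterType, value string) string {
	return strings.NewReplacer("%t", filterType, "%v", value).Replace(format)
}

// mask replaces every character (rune) of value with maskChar.
func mask(value string, maskChar rune) string {
	return strings.Repeat(string(maskChar), len([]rune(value)))
}

// last4 keeps only the last four characters of value.
// Assumption: shorter values are returned unchanged.
func last4(value string) string {
	r := []rune(value)
	if len(r) <= 4 {
		return value
	}
	return string(r[len(r)-4:])
}

// hashSHA256 returns the SHA-256 digest of value.
// Assumption: the digest is hex-encoded.
func hashSHA256(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:])
}

func main() {
	ssn := "123-45-6789"
	fmt.Println(redact("{{{REDACTED-%t}}}", "ssn", ssn)) // {{{REDACTED-ssn}}}
	fmt.Println(mask(ssn, '*'))                          // ***********
	fmt.Println(last4(ssn))                              // 6789
	fmt.Println(hashSHA256(ssn))                         // 64 hex characters
}
```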

Named Entities via ph-eye

go-phileas integrates with ph-eye, a standalone HTTP microservice that runs AI/NLP models for named-entity recognition (NER). This allows go-phileas to detect and redact named entities such as person names, locations, and organizations — types of sensitive information that cannot be reliably caught by regular expressions alone.

How it works

When a policy contains a pheye identifier, go-phileas sends the input text to the configured ph-eye service endpoint (POST /find). The service returns a list of detected entities with their character offsets, labels (e.g. Person), and confidence scores. go-phileas converts those into spans and applies the configured filter strategy (redact, replace, mask, etc.) just like any other identifier.

Because ph-eye is an external service, you need a running ph-eye instance reachable from your application. The default endpoint is http://localhost:18080. Refer to the ph-eye documentation for setup instructions.
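The step where detected entities become redactions can be sketched as follows. This is an illustration only: the `entity` struct is a hypothetical shape, not the ph-eye response format, and the sketch assumes offsets are rune positions with an exclusive end and that entities do not overlap. It shows one common way to apply offset-based replacements: process spans from right to left so earlier offsets stay valid.

```go
package main

import (
	"fmt"
	"sort"
)

// entity is a hypothetical detection record: character offsets, label, confidence.
type entity struct {
	Start, End int // rune offsets; End is exclusive (assumption)
	Label      string
	Score      float64
}

// redactEntities replaces each entity's text with placeholder.
// Entities are assumed not to overlap; out-of-range entities are skipped.
func redactEntities(text string, entities []entity, placeholder string) string {
	sorted := append([]entity(nil), entities...)
	// Work right to left so a replacement never shifts offsets still to be used.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start > sorted[j].Start })

	r := []rune(text)
	for _, e := range sorted {
		if e.Start < 0 || e.End > len(r) || e.Start >= e.End {
			continue
		}
		out := make([]rune, 0, len(r))
		out = append(out, r[:e.Start]...)
		out = append(out, []rune(placeholder)...)
		out = append(out, r[e.End:]...)
		r = out
	}
	return string(r)
}

func main() {
	text := "George Washington was the first president."
	found := []entity{{Start: 0, End: 17, Label: "Person", Score: 0.98}}
	fmt.Println(redactEntities(text, found, "{{{REDACTED-pheye}}}"))
	// {{{REDACTED-pheye}}} was the first president.
}
```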

Configuration

The phEyeConfiguration object controls the connection to ph-eye:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint | string | http://localhost:18080 | URL of the ph-eye service. |
| timeout | int | 600 | HTTP timeout in seconds. |
| labels | string | Person | Comma-separated entity labels to detect (e.g. "Person", "Person,Location"). |

Additional filter options:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| phEyeFilterStrategies | []FilterStrategy | REDACT | How to handle identified spans. |
| removePunctuation | bool | false | Strip punctuation before sending text to ph-eye. |
| bearerToken | string | | Bearer token for authenticating with ph-eye. |
| ignored | []string | | Terms to skip, compared case-insensitively. |
| enabled | bool | true | Set to false to disable without removing from the policy. |

Usage

Unlike the regex-based identifiers, pheye is a list — you can configure multiple ph-eye instances in one policy (e.g. to target different models or endpoints).

Go struct

pol := &policy.Policy{
    Name: "my-policy",
    Identifiers: policy.Identifiers{
        PhEye: []policy.PhEyeFilter{
            {
                PhEyeConfiguration: policy.PhEyeConfiguration{
                    Endpoint: "http://localhost:18080",
                    Labels:   "Person",
                },
                PhEyeFilterStrategies: []policy.FilterStrategy{
                    {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
                },
            },
        },
    },
}

svc, err := services.NewFilterService(pol)
if err != nil {
    panic(err)
}
result, err := svc.Filter(pol, "context", "George Washington was the first president.")
if err != nil {
    panic(err)
}
fmt.Println(result.FilteredText)
// Output: {{{REDACTED-pheye}}} was the first president.

JSON policy

{
  "identifiers": {
    "pheye": [
      {
        "phEyeConfiguration": {
          "endpoint": "http://localhost:18080",
          "labels": "Person"
        },
        "phEyeFilterStrategies": [
          {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
        ]
      }
    ]
  }
}

For more details, see the identifiers reference.

Custom Dictionary Filter

go-phileas supports custom dictionary filters for redacting specific words or phrases. Terms can be provided inline or loaded from a file (one word per line). A policy can contain multiple dictionary filters, each with its own word list and strategy.

Go struct

pol := &policy.Policy{
    Name: "my-policy",
    Identifiers: policy.Identifiers{
        Dictionaries: []policy.DictionaryFilter{
            {
                Terms: []string{"Alice", "Bob", "Acme Corp"},
                DictionaryFilterStrategies: []policy.FilterStrategy{
                    {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
                },
            },
        },
    },
}

svc, err := services.NewFilterService(pol)
if err != nil {
    panic(err)
}
result, err := svc.Filter(pol, "context", "Alice and Bob work at Acme Corp.")
if err != nil {
    panic(err)
}
fmt.Println(result.FilteredText)
// Output: {{{REDACTED-custom-dictionary}}} and {{{REDACTED-custom-dictionary}}} work at {{{REDACTED-custom-dictionary}}}.

JSON policy

{
  "identifiers": {
    "dictionaries": [
      {
        "terms": ["Alice", "Bob", "Acme Corp"],
        "dictionaryFilterStrategies": [
          {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
        ]
      }
    ]
  }
}

To load words from a file, use the files field:

{
  "identifiers": {
    "dictionaries": [
      {
        "files": ["/etc/phileas/sensitive-names.txt"],
        "dictionaryFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[NAME REMOVED]"}]
      }
    ]
  }
}

For more details, see the identifiers reference.

Policy JSON Format

{
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [{
        "strategy": "REDACT",
        "redactionFormat": "{{{REDACTED-%t}}}"
      }]
    },
    "emailAddress": {
      "emailAddressFilterStrategies": [{
        "strategy": "STATIC_REPLACE",
        "staticReplacement": "[EMAIL REMOVED]"
      }]
    },
    "ipAddress": {},
    "phoneNumber": {},
    "creditCard": {},
    "date": {},
    "age": {},
    "url": {
      "requireHttpWwwPrefix": true
    },
    "zipCode": {
      "requireDelimiter": false
    },
    "dictionaries": [
      {
        "terms": ["Alice", "Bob"],
        "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
      }
    ]
  }
}

CLI

go-phileas includes a command-line tool, phileas, that applies a policy to a text file and prints the redacted result.

Build

make build-cli

Or directly with go build:

go build -o phileas ./cmd/phileas

Usage

phileas --policy <policy.json> --input <input.txt> [--context <context>]
phileas --policy <policy.json> --input <input.txt> --evaluate --spans <spans.json> [--context <context>]

| Flag | Required | Description |
| --- | --- | --- |
| --policy | Yes | Path to the JSON policy file |
| --input | Yes | Path to the input text file to redact |
| --context | No | Context name to associate with the filter operation. If omitted, context checks are skipped. |
| --evaluate | No | Enable evaluation mode: print precision, recall, and F1 instead of redacted text |
| --spans | When --evaluate is set | Path to a JSON file containing ground-truth spans |

The redacted text is written to standard output. On failure, an error message is written to standard error and the process exits with a non-zero status.

Example

Given a policy file policy.json:

{
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
    },
    "emailAddress": {
      "emailAddressFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[EMAIL]"}]
    }
  }
}

And an input file input.txt:

My SSN is 123-45-6789 and my email is john@example.com.

Run:

phileas --policy policy.json --input input.txt

Output:

My SSN is {{{REDACTED-ssn}}} and my email is [EMAIL].

Evaluating performance

Use --evaluate with --spans to measure how well a policy detects sensitive information against a set of labeled ground-truth spans.

The --spans file is a JSON array of span objects with characterStart and characterEnd fields:

[
  {"characterStart": 10, "characterEnd": 21},
  {"characterStart": 38, "characterEnd": 54}
]

Run:

phileas --policy policy.json --input input.txt --evaluate --spans spans.json

Output:

True Positives:  2
False Positives: 0
False Negatives: 0
Precision:       1.0000
Recall:          1.0000
F1:              1.0000

See CLI documentation for full details.
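For reference, the reported metrics follow the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them from two span lists. It is not the CLI's code: the `span` and `metrics` types are hypothetical, and it assumes a predicted span counts as a true positive only when both offsets exactly match a ground-truth span. The CLI's matching rule may be different.

```go
package main

import "fmt"

// span is a hypothetical pair of character offsets.
type span struct{ Start, End int }

type metrics struct {
	TP, FP, FN            int
	Precision, Recall, F1 float64
}

// evaluate compares predicted spans to ground truth using exact offset matching.
func evaluate(predicted, truth []span) metrics {
	want := make(map[span]bool, len(truth))
	for _, s := range truth {
		want[s] = true
	}

	var m metrics
	matched := make(map[span]bool)
	for _, p := range predicted {
		if want[p] && !matched[p] {
			matched[p] = true
			m.TP++
		} else {
			m.FP++ // not in the ground truth, or a duplicate prediction
		}
	}
	m.FN = len(want) - m.TP

	if m.TP+m.FP > 0 {
		m.Precision = float64(m.TP) / float64(m.TP+m.FP)
	}
	if m.TP+m.FN > 0 {
		m.Recall = float64(m.TP) / float64(m.TP+m.FN)
	}
	if m.Precision+m.Recall > 0 {
		m.F1 = 2 * m.Precision * m.Recall / (m.Precision + m.Recall)
	}
	return m
}

func main() {
	truth := []span{{10, 21}, {38, 54}}
	predicted := []span{{10, 21}, {38, 54}}

	m := evaluate(predicted, truth)
	fmt.Printf("Precision: %.4f\nRecall:    %.4f\nF1:        %.4f\n", m.Precision, m.Recall, m.F1)
	// Precision: 1.0000
	// Recall:    1.0000
	// F1:        1.0000
}
```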

Building

go build ./...

Testing

go test ./...

License

Copyright 2026 Philterd, LLC.

Licensed under the Apache License, Version 2.0. See LICENSE for details.

"Phileas" and "Philter" are registered trademarks of Philterd, LLC.

This project is a Go port of Phileas, which is also Apache-2.0 licensed.
