IP Knowledge Layer

Open IP enrichment knowledge layer for cloud infrastructure, crawler networks, Tor, ASN attribution, and VPN-adjacent network intelligence.

The repository publishes normalized machine-readable datasets intended for SIEM pipelines, fraud systems, enrichment services, gateways, analytics stacks, and operational network tooling.

Primary outputs:

ip-knowledge.jsonl
ip-knowledge.csv
cloud-prefixes.csv
asn-signals.csv
cidr-tags.txt

Overview

Most public IP datasets focus on a single domain:

cloud ranges
Tor exits
crawler infrastructure
ASN ownership
VPN signals

IP Knowledge Layer consolidates those signals into a unified enrichment layer with normalized metadata, provider attribution, confidence scoring, and source provenance.

The goal is operational context.

CIDR / ASN
    -> layer
    -> provider
    -> service
    -> tags
    -> confidence
    -> source

Instead of only identifying a prefix, consumers can classify infrastructure characteristics and attach explainable metadata to network events.
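
As a rough illustration of the mapping above, the following Python sketch looks up an IP address against a handful of records and returns the attached metadata. The field names follow the example record in Record Format; the `enrich` helper, the in-memory list, and the linear scan are assumptions made for illustration, not part of the published tooling.

```python
import ipaddress

# Sample record copied from the Record Format example in this README.
RECORDS = [
    {
        "prefix": "104.16.0.0/13",
        "layer": "hosting-cloud",
        "provider": "Cloudflare",
        "service": "edge",
        "tags": ["cdn", "edge", "proxy"],
        "confidence": 0.99,
        "source_id": "cloudflare-v4",
    },
]


def enrich(ip, records):
    """Return every record whose prefix contains `ip` (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for rec in records:
        prefix = rec.get("prefix")
        if not prefix:
            # Assumption: ASN-only records may carry no prefix; skip them here.
            continue
        net = ipaddress.ip_network(prefix, strict=False)
        # Compare address families explicitly before the containment test.
        if net.version == addr.version and addr in net:
            matches.append(rec)
    return matches


if __name__ == "__main__":
    for rec in enrich("104.17.1.1", RECORDS):
        print(rec["provider"], rec["layer"], rec["tags"], rec["confidence"])
```

A linear scan is enough to show the idea. With a dataset of this size, a consumer would more likely load the prefixes into a radix tree or a sorted interval index.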

Current Dataset Snapshot

Metric	Value
Records	113,349
Prefix records	111,419
ASN signals	1,930
Sources	12
Collector errors	0

Layer Distribution

Layer	Records
`hosting-cloud`	97,973
`anonymity`	11,615
`asn-signal`	1,930
`crawler-bot`	1,831
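
The four layer counts above sum to 113,349, which matches the Records total in the snapshot. A consumer who wants to recompute this distribution from a downloaded copy could do so along these lines; this is a minimal sketch that assumes each record is a dict with a `layer` key, and `layer_distribution` is a hypothetical helper name.

```python
from collections import Counter


def layer_distribution(records):
    """Count records per layer; records without a layer are grouped as 'unknown'."""
    return Counter(rec.get("layer", "unknown") for rec in records)


if __name__ == "__main__":
    # Made-up miniature input, only to show the shape of the result.
    sample = [
        {"layer": "hosting-cloud"},
        {"layer": "hosting-cloud"},
        {"layer": "anonymity"},
    ]
    for layer, count in layer_distribution(sample).most_common():
        print(layer, count)
```

Swapping `"layer"` for `"provider"` would give the per-provider counts in the same way.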

Top Providers

Provider	Records
Azure	73,422
AWS	15,675
Tor	11,615
GitHub	6,677
Oracle Cloud	1,078

Architecture

                    Public Sources
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   Cloud Ranges      Crawler Feeds       Tor Signals
        │                  │                  │
        └──────────────┬───┴──────────────────┘
                       ▼
              Normalization Layer
              CIDR + metadata merge
                       ▼
               Attribution Engine
            provider / tags / confidence
                       ▼
                 Export Pipeline
        JSONL / CSV / TXT / summaries
                       ▼
              Operational Consumers
      SIEM / WAF / Fraud / Analytics

Layers

`hosting-cloud`

Official cloud, CDN, edge, and developer-platform infrastructure ranges.

Providers currently include:

AWS
Azure
Google Cloud
Cloudflare
Fastly
GitHub
Oracle Cloud

`crawler-bot`

Crawler, AI bot, monitoring, scanner, SEO, and preview infrastructure derived from:

CrawlerScope

`anonymity`

Tor relay and exit infrastructure derived from:

Tor-Radar

`asn-signal`

ASN-level VPN-adjacent aggregate attribution.

This layer intentionally publishes ASN evidence only, not raw VPN endpoint inventories.

Files

File	Description
`ip-knowledge.jsonl`	Full normalized enrichment layer
`ip-knowledge.csv`	Tabular export for analytics/SIEM tooling
`cloud-prefixes.csv`	Cloud/CDN/developer platform prefixes
`asn-signals.csv`	ASN-level VPN-adjacent signals
`cidr-tags.txt`	Lightweight CIDR-to-tags feed
`summary.json`	Build metadata and aggregate statistics
`source-index.json`	Source inventory and provenance

Download

BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"

curl -fsSLO "$BASE/ip-knowledge.jsonl"
curl -fsSLO "$BASE/cloud-prefixes.csv"
curl -fsSLO "$BASE/asn-signals.csv"
curl -fsSLO "$BASE/cidr-tags.txt"

Record Format

Example JSONL record:

{
  "prefix": "104.16.0.0/13",
  "layer": "hosting-cloud",
  "provider": "Cloudflare",
  "service": "edge",
  "tags": [
    "cdn",
    "edge",
    "proxy"
  ],
  "confidence": 0.99,
  "source_id": "cloudflare-v4"
}
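
To consume the file programmatically, each line can be parsed as an independent JSON object. The sketch below assumes records carry the fields shown above. The `Record` dataclass and `iter_records` helper are hypothetical names, and fields missing from a line fall back to defaults because the full schema is not documented here.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Record:
    # Field names mirror the example record; defaults are an assumption.
    prefix: str = ""
    layer: str = ""
    provider: str = ""
    service: str = ""
    tags: List[str] = field(default_factory=list)
    confidence: float = 0.0
    source_id: str = ""


def iter_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per non-empty JSONL line, ignoring unknown keys."""
    known = set(Record.__dataclass_fields__)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        yield Record(**{k: v for k, v in raw.items() if k in known})
```

Because a file object iterates line by line, `iter_records(open("ip-knowledge.jsonl"))` streams the dataset without loading it all into memory.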

Usage Examples

Extract Cloudflare prefixes

curl -fsSL "$BASE/cloud-prefixes.csv" \
  | awk -F, '$3 == "Cloudflare" { print }'

Extract Tor exits

curl -fsSL "$BASE/ip-knowledge.jsonl" \
  | jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'

Extract AI crawler infrastructure

curl -fsSL "$BASE/ip-knowledge.jsonl" \
  | jq -r 'select(.tags | index("ai-crawler")) | .prefix'

Find ASN signals for a provider

curl -fsSL "$BASE/asn-signals.csv" \
  | awk -F, '$3 == "NordVPN" { print }'
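
The `awk -F,` filters above split on every comma, so a quoted field that itself contains a comma would shift the column positions. A CSV-aware parser avoids that. The sketch below uses Python's `csv` module and assumes the file has a header row with a `provider` column; the column names and the sample rows are assumptions, since the CSV schema is not documented in this README.

```python
import csv
import io


def rows_for_provider(csv_text, provider, column="provider"):
    """Return rows (as dicts) whose `column` equals `provider`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(column) == provider]


# Made-up sample using ASNs reserved for documentation (AS64500, AS64501).
SAMPLE = '''asn,name,provider
AS64500,"Example Networks, Inc.",ExampleVPN
AS64501,Other Net,OtherVPN
'''

if __name__ == "__main__":
    for row in rows_for_provider(SAMPLE, "ExampleVPN"):
        print(row["asn"], row["name"])
```

The quoted name in the first sample row contains a comma, which is the case where a plain comma split would misalign the fields.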

Operational Use Cases

Domain	Usage
Fraud Detection	VPN/Tor/datacenter scoring
SIEM Enrichment	Infrastructure attribution
WAF Pipelines	Cloud and crawler classification
Threat Hunting	Network context correlation
Bot Management	AI crawler visibility
Internal Analytics	Infrastructure intelligence

Local Update

python3 scripts/update.py

Preferred local enrichment sources:

../crawler-scope/data/current/crawlers.json
../tor-radar/data/current/network.json
../release/analysis/data/provider_asn.csv

If local datasets are unavailable, the collector falls back to public upstream sources.

GitHub Actions

Dataset builds run every 6 hours.

.github/workflows/ip-knowledge-layer.yml

Only the current datasets are stored in full. Historical snapshots are kept compact to limit repository growth.

Notes

CIDRs are preserved without full IPv4 expansion
Overlapping provider ranges are intentionally retained
Confidence reflects source reliability, not maliciousness
ASN VPN signals are aggregate indicators, not endpoint dumps
The project avoids mass RDAP/WHOIS crawling during CI builds
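
Because overlapping ranges are retained, a single address can match several records. One way a consumer might handle this is to collect every match and order the results from most to least specific. The sketch below does that under stated assumptions: `most_specific_first` is a hypothetical helper, and the sample records use private address space and invented provider names. The `confidence` field is deliberately not used as a risk score, in line with the note above.

```python
import ipaddress

# Invented overlapping records (private address space, made-up providers).
SAMPLE_RECORDS = [
    {"prefix": "10.0.0.0/8", "provider": "ExampleCloud", "confidence": 0.9},
    {"prefix": "10.1.0.0/16", "provider": "ExampleCDN", "confidence": 0.8},
    {"prefix": "192.168.0.0/16", "provider": "ExampleOther", "confidence": 0.7},
]


def most_specific_first(ip, records):
    """Return all records containing `ip`, longest prefix first."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for rec in records:
        net = ipaddress.ip_network(rec["prefix"], strict=False)
        if net.version == addr.version and addr in net:
            hits.append((net.prefixlen, rec))
    # A longer prefix length means a narrower, more specific range.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [rec for _, rec in hits]


if __name__ == "__main__":
    for rec in most_specific_first("10.1.2.3", SAMPLE_RECORDS):
        print(rec["prefix"], rec["provider"])
```

Keeping every match, rather than only the narrowest one, preserves the overlapping attributions for downstream rules that care about both the broad and the specific owner.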

Roadmap

Planned additions:

ASN rollup datasets
Prefix overlap analysis
Historical diff exports
Provider metadata index
Compressed ASN-to-prefix layers
Confidence weighting improvements

License

CC0-1.0. See LICENSE.

Disclaimer

This repository publishes operational network enrichment data compiled from public sources and from datasets derived from them. Consumers are responsible for validating its suitability for their own environments.
