Background
The story feature currently sends the LLM a list of the top N conversations (sorted by bytes) plus a protocol/category breakdown. The conversation cap (STORY_MAX_CONVERSATIONS=20) was questioned as potentially deceptive — the LLM only sees a slice of traffic and has no sense of what it's missing.
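The selection logic described above can be sketched as follows. This is a minimal illustration, not the actual implementation; the `Conversation` dataclass and its fields are assumptions, while `STORY_MAX_CONVERSATIONS` is the cap named in the text.

```python
from dataclasses import dataclass

STORY_MAX_CONVERSATIONS = 20  # the cap questioned as potentially deceptive

@dataclass
class Conversation:
    src: str
    dst: str
    protocol: str
    bytes: int

def select_for_prompt(conversations):
    """Top N conversations by bytes; everything below the cap is silently dropped."""
    ranked = sorted(conversations, key=lambda c: c.bytes, reverse=True)
    return ranked[:STORY_MAX_CONVERSATIONS]
```

The key property is that the LLM receives no signal about how much traffic falls outside the returned slice.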
A packet-budget approach was considered but rejected: the LLM never sees raw packets anyway (it sees one summary line per conversation), so a packet count is a poor proxy for prompt quality. More importantly, LLMs don't process long repetitive lists well — attention degrades over distance and the model anchors on the first and last entries, treating the middle as noise. Feeding 50 nearly-identical conversation lines may actually produce worse narratives than a well-structured list of 20.
Core question
What prompt structure actually helps an LLM reason well about network traffic?
The hypothesis is that the LLM needs the shape of the traffic — not an enumeration of flows. Specifically:
- What protocols/apps dominate (covered)
- What the outliers are — risky, unusual, or anomalous flows (partially covered)
- Aggregate structure: top destinations, unique external hosts, protocol × risk distribution
- A sense of coverage — what fraction of traffic is represented
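The aggregates listed above could be computed in one pass over the conversation set. This is a hedged sketch of what such a "shape of the traffic" summary might look like; the dict field names (`dst`, `protocol`, `bytes`, `risk`, `external`) are illustrative assumptions, not an existing schema.

```python
from collections import Counter

def traffic_shape(conversations, included):
    """Summarize structure: top destinations, unique external hosts,
    protocol x risk distribution, and the byte-coverage of the included slice."""
    total = sum(c["bytes"] for c in conversations) or 1
    shown = sum(c["bytes"] for c in included)
    top_dsts = Counter()
    proto_risk = Counter()
    external = set()
    for c in conversations:
        top_dsts[c["dst"]] += c["bytes"]
        proto_risk[(c["protocol"], c.get("risk", "normal"))] += 1
        if c.get("external"):
            external.add(c["dst"])
    return {
        "top_destinations": top_dsts.most_common(5),
        "unique_external_hosts": len(external),
        "protocol_risk_distribution": dict(proto_risk),
        "coverage": shown / total,  # fraction of bytes the prompt actually represents
    }
```

A summary like this stays a few lines long regardless of flow count, so it sidesteps the long-list attention problem while directly answering "what is the model missing".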
Research tasks
Goal
Produce a concrete recommendation for how to restructure the story prompt — what to add, what to remove or compress, and what the ideal conversation-list size is given typical LLM attention behaviour.