add proposal/discussion on capacity planning #42

Open
florolf wants to merge 1 commit into transparency-dev:main from florolf:capacity-notes


florolf commented May 11, 2026

Proposal as posted on Matrix + my takeaway from the discussion with @rgdd, who asked me to put this into a PR for posterity.

I only wrote the latter up today and didn't take any notes while it was still fresh in my mind, so I might be misrepresenting stuff or leaving something important out. Sorry about that! Feel free to correct me/propose changes.

Collaborator

rgdd commented May 18, 2026

Thanks for thinking about this and documenting it, @florolf.

The write-up is nice. The main things I'd like to add/comment on wrt. the conclusion:

  • Seems like 100qps might have been an unnecessarily big jump, which, e.g., has
    made it difficult for some (potential) witness operators to configure it.
  • When doing some napkin math, the current 10qps list would likely be able to
    accommodate a lot of the "long tail" / lower-frequency logs; and perhaps one or
    two high-profile ones with higher qps like Go's checksum database.
  • From CT, we're probably expecting something like 10 qps.
  • From MTC, we're probably talking about a qps in the same ballpark (?)
  • We don't have that many other high-qps logs right now, and having something
    like 10qps reserved for that will probably serve us well for some time.
  • So if it increases the number of participating witnesses, then it might be a
    better trade-off to have several 10qps lists (.2, .3) where we basically have
    one which is the "longer tail" one and another which is the "higher-qps" one.
    We expect the "higher-qps" one to fill up a bit quicker, and when it's
    full we will create another one. Or maybe we should even create multiple ones
    right away, and witnesses configure as many as they can even though, e.g., .3
    is not being populated quite yet?
  • Working on defining a tombstone for proper deallocation is worthwhile to do
    soon since CT is interested in taking part (and sharding is frequent there).
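
The napkin math above can be sketched as a simple headroom check. This is only an illustration: `headroom` is a hypothetical helper, and the per-log rates and log names are placeholders, not measurements of any real log.

```python
def headroom(budget_qps, log_qps):
    """Return the qps left on a list after admitting every log in log_qps."""
    return budget_qps - sum(log_qps.values())


# Placeholder figures only: 200 low-frequency logs at roughly one checkpoint
# every 100 seconds each, plus a single higher-qps log, on one 10 qps list.
long_tail = {f"tail-log-{i}": 0.01 for i in range(200)}
example = {**long_tail, "high-profile-log": 5.0}

remaining = headroom(10.0, example)
print(f"remaining budget: {remaining:.1f} qps")
```

With real numbers for the long tail and the higher-qps logs plugged in, the same sum shows how close a single 10qps list is to being full.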

Feel free to copy-paste the above into the PR at the end, imo no need to spend more cycles on polishing as long as we get the notes persisted in a way that can be followed.

WDYT, would these be good steps to discuss with Al and Filippo:

  • How about removing the 100qps list, and instead doing 10qps-1klogs.1, 10qps-1klogs.2, 10qps-1klogs.3? The idea is that we steer CT, MTC, and other higher-qps logs in this direction, while the long tail of low-frequency logs and multiplexed logs is steered to the current 10qps list (which we think will have enough capacity for quite a while).
  • Define tombstone, and use it in the witness network to reliably remove logs without making the maintainers juicy targets for attacks wrt. DoS:ing already configured logs.
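
A rough sketch of how the steering in the first step could work, assuming a first-fit policy. The list names follow the pattern above; the `HIGH_QPS_THRESHOLD` cut-off, the `LONG_TAIL_LIST` label, the `LogList` type, and the `steer` function are all hypothetical and only there to make the idea concrete.

```python
from dataclasses import dataclass, field

LIST_BUDGET_QPS = 10.0          # per-list budget from the proposal
LONG_TAIL_LIST = "10qps"        # hypothetical label for the current list
SHARD_PREFIX = "10qps-1klogs"   # naming pattern from the proposal
HIGH_QPS_THRESHOLD = 1.0        # assumption: where "long tail" ends


@dataclass
class LogList:
    name: str
    logs: dict = field(default_factory=dict)  # origin -> expected qps

    def used(self):
        return sum(self.logs.values())

    def fits(self, qps):
        return self.used() + qps <= LIST_BUDGET_QPS


def steer(requests):
    """Assign (origin, qps) requests to lists, opening a new shard when needed."""
    long_tail = LogList(LONG_TAIL_LIST)
    shards = []
    for origin, qps in requests:
        if qps > LIST_BUDGET_QPS:
            raise ValueError(f"{origin} exceeds the budget of a single list")
        if qps < HIGH_QPS_THRESHOLD:
            # The long-tail budget is not enforced here, to keep the sketch short.
            long_tail.logs[origin] = qps
            continue
        # First fit: reuse the first shard with room, otherwise open the next one.
        target = next((s for s in shards if s.fits(qps)), None)
        if target is None:
            target = LogList(f"{SHARD_PREFIX}.{len(shards) + 1}")
            shards.append(target)
        target.logs[origin] = qps
    return long_tail, shards


long_tail, shards = steer([
    ("example.org/tail", 0.01),
    ("example.org/high-a", 6.0),
    ("example.org/high-b", 6.0),
    ("example.org/high-c", 3.0),
])
print([(s.name, sorted(s.logs)) for s in shards])
```

In this toy run the two 6 qps logs cannot share a list, so a second shard is opened, and the 3 qps log lands in the first shard that still has room.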

Author

florolf commented May 19, 2026

Thanks for doing another pass over this, I've added your notes to the document.

> Define tombstone

I think picking that thread up again would be very much worthwhile now that people are picking up tlog-tiles/tlog-witness in a number of places. Would you say that would be a new spec, or part of tlog-checkpoint or tlog-witness? Happy to discuss this some more, but maybe this PR is not the best place.

> How about removing 100qps list, and instead doing

I'm still not sure what we should do with the things that are currently on that list. It's currently nominally at 11qps. How would you split it up? I'm still not sure if organizing (and perhaps naming) the smaller lists by "area" (CT vs "long tail", for example) is a good thing (because it allows witness operators some choice in what to support with their resources) or a bad thing (for the same reason, given that the WN is supposed to steer capacity).

Collaborator

rgdd commented May 19, 2026

> I think picking that thread up again would be very much worthwhile now that
> people are picking up tlog-tiles/tlog-witness in a number of places. Would you
> say that would be a new spec, or part of tlog-checkpoint or tlog-witness?

I think it depends a bit on what we want the spec to say. But after refreshing
my memory on this topic together with Nisse, I think the sketch is something like:

  • Witness has an HTTP POST /add-tombstone endpoint.
  • Tombstone is defined to have semantics similar to "I will never cosign a
    tree for log origin $ORIGIN with size larger than $SIZE".
  • Tombstone is signed by the log when sending the request.
  • The witness cosigns the tombstone and returns its cosignature on it.
  • We don't recall hashing out distribution of the tombstone, but it might be a
    good idea to serve it in the same place where the log usually publishes its
    checkpoint, so that monitors easily discover that the log is 'done' and that
    witnesses agree that the log is 'done', which makes it safe to stop monitoring it.
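
To make the intended semantics a bit more concrete, here is a toy in-memory model of the witness side. It is only a sketch: the `Witness` class and its method names are hypothetical, verification of the log's signature and the actual cosignature are left out, and keeping the smallest size when several tombstones arrive for the same origin is an assumption rather than something we have agreed on.

```python
class Witness:
    """Toy in-memory witness that tracks tombstones per log origin."""

    def __init__(self):
        self.tombstones = {}  # origin -> final tree size ($SIZE)

    def add_tombstone(self, origin, size):
        """Record a tombstone; stands in for the POST /add-tombstone handler.

        Checking the log's signature and producing the witness cosignature
        are deliberately omitted from this sketch.
        """
        current = self.tombstones.get(origin)
        # Assumption: keep the smallest size seen, so that a promise that was
        # already made is never weakened by a later request.
        if current is None or size < current:
            self.tombstones[origin] = size
        return self.tombstones[origin]

    def may_cosign(self, origin, size):
        """Return True unless a tombstone forbids cosigning a tree this large."""
        limit = self.tombstones.get(origin)
        return limit is None or size <= limit


witness = Witness()
witness.add_tombstone("example.org/log", 100)
print(witness.may_cosign("example.org/log", 100))  # at the tombstone size
print(witness.may_cosign("example.org/log", 101))  # beyond the tombstone size
```

The only state the witness needs per retired log is the final size, which is what should make deallocation from the log lists safe.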

To me this suggests tlog-witness might be a good place to define the API
endpoint. Maybe a separate spec for tlog-tombstone (semantics similar to
tlog-cosignature), or maybe it fits in tlog-cosignature since it is, after all,
about cosignature semantics. The tombstone itself is probably a signed note,
similar to a checkpoint but with slightly different semantics. So maybe
this could also fit in tlog-checkpoint. Or just tlog-tombstone.

Step one is probably to write a proposal so we can get people involved in the
discussion, and once we have something we think is good get it defined in the
appropriate places?

Motivation of the proposal:

  • Can safely get monitors to stop downloading a log that's done
  • Can safely de-allocate witnesses from the witness network's log lists
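
For the monitor side, the stop condition could look roughly like the following. This is a sketch under assumptions: `monitor_may_stop` is a hypothetical function, and requiring a threshold of trusted witnesses to have cosigned the tombstone is a policy choice made up for the example.

```python
def monitor_may_stop(tombstone_size, verified_size, cosigners,
                     trusted_witnesses, threshold):
    """Decide whether a monitor can stop following a log.

    tombstone_size:    $SIZE from the tombstone
    verified_size:     tree size the monitor has downloaded and verified
    cosigners:         witnesses whose cosignature on the tombstone verified
    trusted_witnesses: witnesses this monitor trusts
    threshold:         number of trusted cosigners required (policy assumption)
    """
    agreeing = set(cosigners) & set(trusted_witnesses)
    caught_up = verified_size >= tombstone_size
    return len(agreeing) >= threshold and caught_up


trusted = {"w1", "w2", "w3"}
print(monitor_may_stop(100, 100, {"w1", "w2"}, trusted, threshold=2))
print(monitor_may_stop(100, 90, {"w1", "w2"}, trusted, threshold=2))
```

The second call shows the case where enough witnesses agree the log is done, but the monitor still has entries left to download.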

I asked Nisse if we had any good notes on this, and he said there might be an
issue somewhere in sigsum/project/documentation, and with some grep there might
be some ad hoc meeting minutes. But no proper docs / proposal was typed up, and
it was never fleshed out more than just an 'ah, we think we will need this later'.

> Happy to discuss this some more, but maybe this PR is not the best place.

Agreed -- let's start by fleshing out an initial proposal in a pad?
