Revamp Kafka #831

@lars-t-hansen

At the moment, sonalyze optionally takes a single -kafka argument whose value is a broker address. It ingests data from that broker over an unencrypted channel and then commits the data it has ingested, making them unavailable to other clients. This was fine for initial work, but we may want something better. In particular, we may want to ingest data from multiple brokers, perhaps partitioned by cluster, and we may want to not commit the data when we do so, because there may be other listeners.

As a case in point, we currently have two brokers on naic-monitor, one receiving data for mlx and fox and the other for the sigma2 systems. The mlx/fox data are being ingested into a timeseries db by a different listener and are currently unavailable to sonalyze, because it listens only to the other broker. With the ability to listen to multiple brokers and, more importantly, to not commit the data as they are read, the sonalyze database could continue to serve data for mlx/fox as well, which is helpful in this phase as we migrate to the other backend and dashboard.

I'm envisioning that -kafka can be specified several times and that its value is more complex. Consider:

-kafka localhost:1234+cluster:mlx.hpc.uio.no/nocommit+cluster:fox.educloud.no/nocommit -kafka localhost:2468

where the last value covers all clusters not previously mentioned and implicitly always commits (or we modify the syntax to allow /nocommit there too).
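
To make the proposed syntax concrete, here is a minimal Go sketch of how a repeatable -kafka flag with `+cluster:NAME[/nocommit]` elements could be parsed. It is not taken from the sonalyze source: the names `ClusterRule`, `BrokerSpec`, `BrokerSpecs`, `ParseBrokerSpec` and `ParseArgs` are all hypothetical, and it assumes that a value with no cluster elements is the catch-all broker that always commits. It does not handle /nocommit on the catch-all value.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// ClusterRule is one "+cluster:NAME[/nocommit]" element (hypothetical type).
type ClusterRule struct {
	Cluster  string
	NoCommit bool // true if the element ended in "/nocommit"
}

// BrokerSpec is the parsed value of one -kafka argument (hypothetical type).
type BrokerSpec struct {
	Address  string        // broker address, e.g. "localhost:1234"
	Clusters []ClusterRule // empty: catch-all for clusters not named elsewhere
}

// ParseBrokerSpec parses "ADDR[+cluster:NAME[/nocommit]]...".
func ParseBrokerSpec(arg string) (BrokerSpec, error) {
	parts := strings.Split(arg, "+")
	spec := BrokerSpec{Address: parts[0]}
	if spec.Address == "" {
		return BrokerSpec{}, fmt.Errorf("missing broker address in %q", arg)
	}
	for _, p := range parts[1:] {
		if !strings.HasPrefix(p, "cluster:") {
			return BrokerSpec{}, fmt.Errorf("expected cluster:NAME, got %q", p)
		}
		name := strings.TrimPrefix(p, "cluster:")
		noCommit := strings.HasSuffix(name, "/nocommit")
		name = strings.TrimSuffix(name, "/nocommit")
		if name == "" {
			return BrokerSpec{}, fmt.Errorf("empty cluster name in %q", arg)
		}
		spec.Clusters = append(spec.Clusters,
			ClusterRule{Cluster: name, NoCommit: noCommit})
	}
	return spec, nil
}

// BrokerSpecs implements flag.Value so that -kafka can be given several times;
// each occurrence appends one parsed BrokerSpec.
type BrokerSpecs []BrokerSpec

func (b *BrokerSpecs) String() string {
	if b == nil {
		return ""
	}
	return fmt.Sprint(*b)
}

func (b *BrokerSpecs) Set(value string) error {
	spec, err := ParseBrokerSpec(value)
	if err != nil {
		return err
	}
	*b = append(*b, spec)
	return nil
}

// ParseArgs collects every -kafka occurrence from args, in order.
func ParseArgs(args []string) (BrokerSpecs, error) {
	var specs BrokerSpecs
	fs := flag.NewFlagSet("sonalyze", flag.ContinueOnError)
	fs.Var(&specs, "kafka", "broker spec: ADDR[+cluster:NAME[/nocommit]]...")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return specs, nil
}
```

Calling `ParseArgs` with the two -kafka values from the example above would yield two specs: one for localhost:1234 with two no-commit cluster rules, and one catch-all for localhost:2468. Whether a catch-all must come last, and whether more than one is an error, is left open here as it is in the proposal.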

We don't want to spend a ton of time on this, so if it looks like a boondoggle, don't do it - it's a transitional technology.
