Make async backend queue size configurable #1441

@githoober

Summary

The PHP agent's async backend communication appears to use a fixed internal queue size with no public configuration option to increase it.

In our CLI workload, events are produced faster than the background sender can flush them to APM Server. Once that queue fills up, later events are rejected locally and never arrive at the server. In agent logs we see:

[ERROR] [Backend-Comm] [backend_comm.cpp:1187] [enqueueEventsToSendToApmServer] Already queued events are above max queue
[ERROR] [Backend-Comm] [backend_comm.cpp:1220] [enqueueEventsToSendToApmServer] Exiting...; resultCode: resultFailure (6)
[ERROR] [Ext-API] [elastic_apm_API.cpp:258] [elasticApmSendToServer] Exiting...; resultCode: resultFailure (6)

From reading the source, this seems related to the internal queue size constant ELASTIC_APM_MAX_QUEUE_SIZE_IN_BYTES in backend_comm.cpp.
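To make the failure mode concrete, here is a minimal Python sketch of a byte-capped queue with a producer that outpaces the sender. It is a toy model, not the agent's implementation: the class name, the rejection rule (reject when the new payload would exceed the cap), and the step-based simulation are all assumptions for illustration.

```python
from collections import deque
from typing import Tuple


class ByteBoundedQueue:
    """Toy model of a send queue capped by total payload bytes (hypothetical)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.queued_bytes = 0
        self.items = deque()

    def enqueue(self, payload: bytes) -> bool:
        # Reject the payload if accepting it would exceed the byte cap.
        if self.queued_bytes + len(payload) > self.max_bytes:
            return False
        self.items.append(payload)
        self.queued_bytes += len(payload)
        return True

    def dequeue(self) -> bytes:
        payload = self.items.popleft()
        self.queued_bytes -= len(payload)
        return payload


def simulate(queue: ByteBoundedQueue, payload: bytes, steps: int,
             produce_per_step: int, flush_per_step: int) -> Tuple[int, int]:
    """Return (accepted, rejected) counts after running the producer and sender."""
    accepted = rejected = 0
    for _ in range(steps):
        # Producer: the application emits events during this step.
        for _ in range(produce_per_step):
            if queue.enqueue(payload):
                accepted += 1
            else:
                rejected += 1
        # Sender: the background thread flushes what it can in this step.
        for _ in range(min(flush_per_step, len(queue.items))):
            queue.dequeue()
    return accepted, rejected
```

With a 10-byte cap, 4-byte payloads, 3 events produced and 1 flushed per step, `simulate(ByteBoundedQueue(10), b"xxxx", 3, 3, 1)` returns `(4, 5)`: once the queue is full, most new events are dropped, which matches the pattern we see.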

Environment

  • Ubuntu on WSL2
  • PHP 7.3
  • Elastic APM PHP agent 1.15.1
  • APM Server 8.12.1
  • CLI SAPI
  • async_backend_comm = true
  • transaction_max_spans = 10000
  • Single long-running CLI process handling about 600 transactions in one run

Workload

Each loop iteration creates one transaction with custom spans.

The extension also auto-instruments many DB calls for us, so each transaction generates a relatively large number of events.

The configured span limit is transaction_max_spans = 10000, but the observed span count is much lower: a typical transaction has roughly 1000 spans, and a single CLI process run handles about 600 such transactions.
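As a rough sizing aid, the numbers above can be turned into a back-of-envelope estimate of how much data one run pushes through the queue. The sketch below is only an estimate: the per-event size is a hypothetical placeholder (we have not measured the serialized size), and the function name is made up for this example.

```python
def estimate_event_volume(transactions: int, spans_per_transaction: int,
                          bytes_per_event: int) -> tuple:
    """Return (event_count, total_bytes) for one process run."""
    # Each transaction contributes its spans plus the transaction event itself.
    events = transactions * (spans_per_transaction + 1)
    return events, events * bytes_per_event


# 600 transactions of ~1000 spans each; 500 bytes per event is an assumed figure.
events, total_bytes = estimate_event_volume(600, 1000, 500)
print(events, total_bytes)
```

Even with a modest assumed event size, a single run produces several hundred thousand events, so any fixed queue limit sized for short web requests is likely to be exceeded.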

Observed behavior

  • With async_backend_comm = true, only the earlier transactions in the process are delivered reliably.
  • After the internal queue fills up, later events are rejected and do not reach APM Server.
  • In our tests, only about 40% of transactions arrived when processing a large batch in one process.
  • With async_backend_comm = false, all transactions are delivered, but the total runtime roughly doubles.

Expected behavior

There should be a supported way to tune the async send queue/buffer size for workloads that produce bursts of APM events, especially for CLI jobs and other long-running processes.

Request

Please add a public configuration option for the async backend queue size, for example:

elastic_apm.max_send_queue_size = 100MB

The exact option name and units are up to you. The important part is having a supported way to raise the queue limit when the default is too small for a given workload.

If a configurable byte-based queue is not the preferred design, an equivalent supported setting would still solve the operational problem.
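To illustrate what a value such as 100MB could mean, here is a small Python sketch that converts a size string into bytes. The option does not exist today, so everything here is an assumption: the accepted units (B, KB, MB, GB), the use of binary multiples (1 KB = 1024 bytes), and the function name.

```python
import re

# Assumed unit table using binary multiples; the real option may define units differently.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a string like '100MB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


print(parse_size("100MB"))  # 104857600 under the binary-multiple assumption
```

Rejecting malformed values with an error, rather than silently falling back to the default, would make a misconfigured limit easy to spot.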

Why this matters

Right now the only practical workaround we found is:

elastic_apm.async_backend_comm = false

That avoids the queue entirely, but it also makes the CLI job significantly slower and couples job latency to APM Server responsiveness.

Other Elastic APM agents already expose queue/buffer sizing controls. For example, Java has max_queue_size and Go has ELASTIC_APM_API_BUFFER_SIZE. I could not find an equivalent setting in the PHP agent documentation.

Additional context

  • This seems related to, but not the same as, #836.
  • If useful, I can provide a reduced reproduction script and full debug logs.
