Make async backend queue size configurable
Summary
The PHP agent's async backend communication appears to use a fixed internal queue size with no public configuration option to increase it.
In our CLI workload, events are produced faster than the background sender can flush them to APM Server. Once that queue fills up, later events are rejected locally and never arrive at the server. In agent logs we see:
[ERROR] [Backend-Comm] [backend_comm.cpp:1187] [enqueueEventsToSendToApmServer] Already queued events are above max queue
[ERROR] [Backend-Comm] [backend_comm.cpp:1220] [enqueueEventsToSendToApmServer] Exiting...; resultCode: resultFailure (6)
[ERROR] [Ext-API] [elastic_apm_API.cpp:258] [elasticApmSendToServer] Exiting...; resultCode: resultFailure (6)
From reading the source, this seems related to the internal queue size constant ELASTIC_APM_MAX_QUEUE_SIZE_IN_BYTES in backend_comm.cpp.
Environment
- Ubuntu on WSL2
- PHP 7.3
- Elastic APM PHP agent 1.15.1
- APM Server 8.12.1
- CLI SAPI
- async_backend_comm = true
- transaction_max_spans = 10000
- Single long-running CLI process handling about 600 transactions in one run
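For reference, this is how we set those two options in the INI file. The key names assume the elastic_apm.* prefix used for the other INI settings mentioned later in this report:

```ini
; Agent settings used in the runs described above.
; Key names assume the elastic_apm.* INI prefix.
elastic_apm.async_backend_comm = true
elastic_apm.transaction_max_spans = 10000
```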
Workload
Each loop iteration creates one transaction with custom spans.
The extension also auto-instruments many DB calls for us, so each transaction generates a relatively large number of events.
The configured span limit is transaction_max_spans = 10000, but the observed span counts stay well below that limit: a single transaction typically produces around 1,000 spans, and a single CLI process run handles about 600 such transactions.
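To put the volume in perspective, here is a back-of-envelope estimate of how much data one run pushes through the queue. The ~1 KB serialized size per span event is an assumption for illustration, not a value measured from the agent:

```python
# Back-of-envelope estimate of event volume for one CLI run.
# The per-event serialized size (1 KB) is an assumed ballpark
# figure, not measured from the agent's actual payloads.
transactions_per_run = 600
spans_per_transaction = 1000      # observed average, per the numbers above
assumed_bytes_per_event = 1024    # assumption: ~1 KB of serialized JSON per span

total_events = transactions_per_run * spans_per_transaction
total_bytes = total_events * assumed_bytes_per_event

print(f"events per run: {total_events:,}")                 # 600,000 events
print(f"approx payload: {total_bytes / 2**20:.0f} MiB")    # ~586 MiB
```

Even if the real per-event size is several times smaller, the total is large enough that a fixed-size queue will fill quickly whenever the producer outpaces the background sender.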
Observed behavior
- With async_backend_comm = true, only the earlier transactions in the process are delivered reliably.
- After the internal queue fills up, later events are rejected and do not reach APM Server.
- In our tests, only about 40% of transactions arrived when processing a large batch in one process.
- With async_backend_comm = false, all transactions are delivered, but total runtime is about 2x slower.
Expected behavior
There should be a supported way to tune the async send queue/buffer size for workloads that produce bursts of APM events, especially for CLI jobs and other long-running processes.
Request
Please add a public configuration option for the async backend queue size, for example:
elastic_apm.max_send_queue_size = 100MB
The exact option name and units are up to you. The important part is having a supported way to raise the queue limit when the default is too small for a given workload.
If a configurable byte-based queue is not the preferred design, an equivalent supported setting would still solve the operational problem.
Why this matters
Right now the only practical workaround we found is:
elastic_apm.async_backend_comm = false
That avoids the queue entirely, but it also makes the CLI job significantly slower and couples job latency to APM Server responsiveness.
Other Elastic APM agents already expose queue/buffer sizing controls. For example, Java has max_queue_size and Go has ELASTIC_APM_API_BUFFER_SIZE. I could not find an equivalent setting in the PHP agent documentation.
Additional context
- This seems related to, but not the same as, #836.
- If useful, I can provide a reduced reproduction script and full debug logs.