Conversation
- `required_lead_time_ms`: integer - minimum startup lead time in milliseconds (e.g., codec init, decode warmup, audio backend buffering, DAC latency). Measure this from server transmit time of the start/restart trigger message ([`stream/start`](#server--client-streamstart) or [`stream/clear`](#server--client-streamclear)) to the timestamp of the first subsequent audio chunk.
- `min_buffer_ms`: integer - requested minimum ongoing buffer duration in milliseconds during playback (primarily for live streams), used to absorb network jitter and continuous-playback pipeline delays.
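To make the shape concrete, here is a hedged sketch of a client hello carrying these two fields. The field names come from the diff above; the surrounding message shape and the numeric values are illustrative assumptions, not part of the spec:

```python
# Hypothetical client hello payload. Only required_lead_time_ms and
# min_buffer_ms are taken from the diff above; the enclosing structure
# and values are illustrative assumptions.
hello = {
    "roles": ["player@v1"],
    "player@v1_support": {
        "required_lead_time_ms": 250,  # startup: codec init, warmup, DAC latency
        "min_buffer_ms": 500,          # ongoing buffer to absorb network jitter
    },
}
```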
I'm a bit conflicted about these being statically set in the hello message (I know the PR mentions we make this assumption). In ESPHome, the very first run has at least double the lead time, whereas subsequent lead times are much shorter (though this depends on the exact configuration, so it's tricky). For the min buffer, this could easily change on a cell connection at various points in time depending on signal strength. That one seems a lot harder for the server to update dynamically, even if we had some mechanism to update it.
Should we consider allowing the hello message to be sent again in the middle of a live connection to update values? (This has also come up for devices that want to switch between allowing volume control from the server or not; e.g., you plug a VPE into an amp and want the amp to handle the rest, so you disable volume control on the server.)
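To illustrate the suggestion above, a server could treat a repeated hello as a capability update rather than an error. This is a minimal sketch; the class and method names are hypothetical and not part of the current protocol:

```python
# Illustrative sketch only: a session that accepts repeated hellos
# mid-connection to update mutable capabilities (e.g. a revised
# min_buffer_ms, or toggling server-side volume control).
class Session:
    def __init__(self):
        self.support = {}

    def on_hello(self, msg):
        # First hello registers capabilities; later hellos merge updates
        # over the previously announced values.
        self.support.update(msg.get("player@v1_support", {}))
```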
I've been thinking more about this. I can't come up with a way, at least in the ESPHome implementation, to actually determine these numbers. At best, I can make an educated guess, but some of it will depend on the chip (ESP32 vs ESP32S3) and how many other things are running (microWakeWord/voice assistant stuff), which makes it even harder to get good values that aren't overly conservative.
However, it is easy to determine this stuff empirically if we play audio once or have the connection open. I can just measure directly and add a small margin. The client can also easily estimate the latency from the time messages by considering worst-case RTT (or something like the 95th percentile).
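The percentile-based estimate described above might look something like this. The function name, the nearest-rank percentile method, and the safety margin are all illustrative assumptions:

```python
import math


def estimate_latency_ms(rtt_samples_ms, percentile=95, margin_ms=20):
    """Estimate one-way latency from observed round-trip times.

    Uses the nearest-rank percentile of the RTT samples, halves it as a
    rough one-way figure, and adds a small safety margin. All choices
    here are illustrative, not part of the protocol.
    """
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    ordered = sorted(rtt_samples_ms)
    # nearest-rank percentile: ceil(p/100 * n), converted to a 0-based index
    rank = max(0, math.ceil(percentile / 100 * len(ordered)) - 1)
    return ordered[rank] / 2 + margin_ms
```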
So I don't know how best to do this in the hello message! I could save these details to flash, but I don't know if I trust network latency to stay consistent day to day or across reboots. I also don't know how a phone app could do this well based on historical values; what if you are connected to local Wi-Fi vs. on a cellular network?
Force-pushed e9f9964 to f5d7e32
There were 2 major gaps in the `player@v1` role this PR aims to solve.

Both issues are resolved by adding `required_lead_time_ms` and `min_buffer_ms` to the `player@v1_support` object. The server is then responsible for ensuring that both constraints are respected.

Network latency is factored into `required_lead_time_ms` and `min_buffer_ms` by the player, also allowing for high-latency clients (like a player on a mobile network).

Limitations
Network latency is not measured and remains static throughout the lifetime of the connection.
While this is sufficient for almost all real-world scenarios, including mobile-network players, it is still a minor limitation.