feat(flask): Add span streaming support and request body capture#6264
feat(flask): Add span streaming support and request body capture#6264ericapisani wants to merge 6 commits into
Conversation
Add request body data to the streaming segment span when span streaming is enabled. Update existing tests to cover both streaming and non-streaming code paths, and add new tests for request body capture, size limits, and header scrubbing in streaming mode. Fixes PY-2323 Fixes #6021
|
bugbot run |
|
@sentry review |
Codecov Results 📊✅ 2183 passed | ⏭️ 167 skipped | Total: 2350 | Pass Rate: 92.89% | Execution Time: 4m 36s All tests are passing successfully. ❌ Patch coverage is 5.88%. Project has 13007 uncovered lines. Files with missing lines (3)
Generated by Codecov Action |
| extractor = FlaskRequestExtractor(request) | ||
|
|
||
| if not request_body_within_bounds(client, content_length): | ||
| data = AnnotatedValue.substituted_because_over_size_limit() |
There was a problem hiding this comment.
This general logic mostly follows what happens with the request extractor, but Instead of removing the string entirely, it substitutes it with a value to explain why a value was removed, as opposed to providing only an empty string.
alexander-alderman-webb
left a comment
There was a problem hiding this comment.
I just looked at the request body attribute since I'm solving this in Starlette right now, and the same problem applies here 😅
| client = sentry_sdk.get_client() | ||
| if has_span_streaming_enabled(client.options): | ||
| _set_request_body_data_on_streaming_segment(request, client) |
There was a problem hiding this comment.
There are issues with eagerly consuming the request body when the user also tries to consume the raw stream in their handler. My flask route defined in
from flask import Flask, request
import sentry_sdk
app = Flask(__name__)
sentry_sdk.init(
dsn="...",
debug=True,
traces_sample_rate=1.0,
_experiments = {
"trace_lifecycle": "stream",
}
)
@app.post("/raw-wsgi")
def raw_wsgi():
content_length = request.environ.get("CONTENT_LENGTH")
body = request.environ["wsgi.input"].read()
return {
"size": len(body),
"body": body.decode("utf-8", errors="replace"),
}
if __name__ == "__main__":
app.run(port=5000, debug=True)hangs when I run
curl -X POST http://127.0.0.1:5000/raw-wsgi \
-H 'Content-Type: text/plain' \
--data-binary 'hello from curl'There was a problem hiding this comment.
I've added a few changes in commits c90dc55 and b7c0ec6 which moves the setting of the http.request.body.data attribute to within the request_finished Flask signal (as opposed to the request_started one that the changes were originally in).
Just a heads up though: it looks like the example you gave will hang even without the sentry SDK because there's no "end" of the request defined through either the content length or transfer encoding header that the request.environ["wsgi.input"].read() requires in the curl request (so the read blocks forever on an EOF that never comes).
… of just as the request is getting started
|
bugbot run |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit b7c0ec6. Configure here.
| @@ -88,17 +95,17 @@ def setup_once() -> None: | |||
|
|
|||
| before_render_template.connect(_add_sentry_trace) | |||
| request_started.connect(_request_started) | |||
There was a problem hiding this comment.
Missing Content-Length header bypasses max_request_body_size in streaming span body capture
When Content-Length is absent or zero, request_body_within_bounds(client, 0) returns True for any limit except "never", so _set_request_body_data_on_streaming_segment falls through to call request.get_data() without any size enforcement, potentially reading an unbounded body into memory.
Verification
Line 97 registers _request_finished which calls _set_request_body_data_on_streaming_segment. That function computes content_length = int(request.content_length or 0) — when no Content-Length header is present this is 0. request_body_within_bounds(client, 0) with e.g. bodies='small': not (False or (True and 0 > 1000) or False) = True, so the bounds check passes. The code then checks for cached form data and _cached_data; if neither exists (e.g. route read via wsgi.input directly), it calls request.get_data() at line ~209 with no size cap. Flask's own max_content_length can mitigate this only if explicitly configured. The starlette integration avoids this by short-circuiting with if not content_length: return request_info before attempting a read.
Identified by Warden find-bugs · HMR-ZFF
Add span streaming support to the Flask integration. When span streaming is enabled via
_experiments={"trace_lifecycle": "stream"}, the integration now captures request body data as an attribute on the segment span.The request body handling respects existing size limits and content type rules — oversized bodies get the substitution marker, unparseable raw data gets annotated accordingly (different from what we do today where we remove the value entirely).
Fixes PY-2323
Fixes #6021