Skip to content

feat(flask): Add span streaming support and request body capture#6264

Draft
ericapisani wants to merge 6 commits into
masterfrom
py-2323-migrate-flask
Draft

feat(flask): Add span streaming support and request body capture#6264
ericapisani wants to merge 6 commits into
masterfrom
py-2323-migrate-flask

Conversation

@ericapisani
Copy link
Copy Markdown
Member

@ericapisani ericapisani commented May 13, 2026

Add span streaming support to the Flask integration. When span streaming is enabled via _experiments={"trace_lifecycle": "stream"}, the integration now captures request body data as an attribute on the segment span.

The request body handling respects existing size limits and content type rules — oversized bodies get the substitution marker, unparseable raw data gets annotated accordingly (different from what we do today where we remove the value entirely).

Fixes PY-2323
Fixes #6021

Add request body data to the streaming segment span when span streaming is enabled. Update existing tests to cover both streaming and non-streaming code paths, and add new tests for request body capture, size limits, and header scrubbing in streaming mode.

Fixes PY-2323
Fixes #6021
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 13, 2026

PY-2323

@ericapisani
Copy link
Copy Markdown
Member Author

bugbot run

@ericapisani
Copy link
Copy Markdown
Member Author

@sentry review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

Codecov Results 📊

2183 passed | ⏭️ 167 skipped | Total: 2350 | Pass Rate: 92.89% | Execution Time: 4m 36s

All tests are passing successfully.

❌ Patch coverage is 5.88%. Project has 13007 uncovered lines.

Files with missing lines (3)
File Patch % Lines
starlette.py 5.44% ⚠️ 365 Missing
flask.py 10.53% ⚠️ 153 Missing
_wsgi_common.py 32.12% ⚠️ 93 Missing

Generated by Codecov Action

Comment thread sentry_sdk/integrations/flask.py Outdated
Comment thread sentry_sdk/integrations/flask.py Outdated
extractor = FlaskRequestExtractor(request)

if not request_body_within_bounds(client, content_length):
data = AnnotatedValue.substituted_because_over_size_limit()
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general logic mostly follows what happens with the request extractor, but Instead of removing the string entirely, it substitutes it with a value to explain why a value was removed, as opposed to providing only an empty string.

@ericapisani ericapisani marked this pull request as ready for review May 14, 2026 13:55
@ericapisani ericapisani requested a review from a team as a code owner May 14, 2026 13:55
Copy link
Copy Markdown
Contributor

@alexander-alderman-webb alexander-alderman-webb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at the request body attribute since I'm solving this in Starlette right now, and the same problem applies here 😅

Comment on lines +167 to +169
client = sentry_sdk.get_client()
if has_span_streaming_enabled(client.options):
_set_request_body_data_on_streaming_segment(request, client)
Copy link
Copy Markdown
Contributor

@alexander-alderman-webb alexander-alderman-webb May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are issues with eagerly consuming the request body when the user also tries to consume the raw stream in their handler. My flask route defined in

from flask import Flask, request
import sentry_sdk

app = Flask(__name__)


sentry_sdk.init(
    dsn="...",
    debug=True,
    traces_sample_rate=1.0,
    _experiments = {
        "trace_lifecycle": "stream",
    }
)

@app.post("/raw-wsgi")
def raw_wsgi():
    content_length = request.environ.get("CONTENT_LENGTH")

    body = request.environ["wsgi.input"].read()

    return {
        "size": len(body),
        "body": body.decode("utf-8", errors="replace"),
    }


if __name__ == "__main__":
    app.run(port=5000, debug=True)

hangs when I run

curl -X POST http://127.0.0.1:5000/raw-wsgi \
  -H 'Content-Type: text/plain' \
  --data-binary 'hello from curl'

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few changes in commits c90dc55 and b7c0ec6 which moves the setting of the http.request.body.data attribute to within the request_finished Flask signal (as opposed to the request_started one that the changes were originally in).

Just a heads up though: it looks like the example you gave will hang even without the sentry SDK because there's no "end" of the request defined through either the content length or transfer encoding header that the request.environ["wsgi.input"].read() requires in the curl request (so the read blocks forever on an EOF that never comes).

@ericapisani ericapisani marked this pull request as draft May 15, 2026 13:57
Comment thread sentry_sdk/integrations/flask.py Outdated
@ericapisani
Copy link
Copy Markdown
Member Author

bugbot run
@sentry review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit b7c0ec6. Configure here.

Comment thread sentry_sdk/integrations/_wsgi_common.py
@@ -88,17 +95,17 @@ def setup_once() -> None:

before_render_template.connect(_add_sentry_trace)
request_started.connect(_request_started)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Content-Length header bypasses max_request_body_size in streaming span body capture

When Content-Length is absent or zero, request_body_within_bounds(client, 0) returns True for any limit except "never", so _set_request_body_data_on_streaming_segment falls through to call request.get_data() without any size enforcement, potentially reading an unbounded body into memory.

Verification

Line 97 registers _request_finished which calls _set_request_body_data_on_streaming_segment. That function computes content_length = int(request.content_length or 0) — when no Content-Length header is present this is 0. request_body_within_bounds(client, 0) with e.g. bodies='small': not (False or (True and 0 > 1000) or False) = True, so the bounds check passes. The code then checks for cached form data and _cached_data; if neither exists (e.g. route read via wsgi.input directly), it calls request.get_data() at line ~209 with no size cap. Flask's own max_content_length can mitigate this only if explicitly configured. The starlette integration avoids this by short-circuiting with if not content_length: return request_info before attempting a read.

Identified by Warden find-bugs · HMR-ZFF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate flask to span first

2 participants