Release 1.19 lyft#83

Draft
maheepm-lyft wants to merge 2361 commits into release-1.17-lyft from release-1.19-lyft

Conversation

@maheepm-lyft

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store at job creation time as a persistent artifact
  • Deployments RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

(Please pick one of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4-node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

1996fanrui and others added 30 commits February 20, 2024 22:21
Suspend and cancel reset the ExecutionGraph in a similar way. I move the common logic into its own method to make this more prominent in the code (sketched below).
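A minimal sketch of the refactoring described above (class, method, and state names are hypothetical, not the actual Flink scheduler code): both entry points funnel into one shared reset helper, so the common behavior is visible in a single place.

```java
// Illustrative only: suspend() and cancel() differ in the target state they
// pass, while the ExecutionGraph reset itself lives in one extracted method.
class SchedulerSketch {
    void suspend(Throwable cause) {
        resetExecutionGraph("SUSPENDED", cause);
    }

    void cancel() {
        resetExecutionGraph("CANCELED", null);
    }

    private void resetExecutionGraph(String targetState, Throwable cause) {
        // Transition the graph and release resources exactly once,
        // regardless of which caller initiated the reset.
        System.out.println("Resetting ExecutionGraph to " + targetState
                + (cause != null ? " due to " + cause : ""));
    }
}
```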
…iptorGroup out of the RPC main thread]"

This reverts commit d18a4bf.

(cherry picked from commit 7a709bf)
…used by multiple writes to the same sink table and shared staging directory

This closes apache#24492

* Fix unstable TableSourceITCase#testTableHintWithLogicalTableScanReuse
* Moves the staging dir configuration into builder for easier testing

---------

Co-authored-by: Matthias Pohl <matthias.pohl@aiven.io>
(cherry picked from commit 7d0111d)
…ls.sh` script attempt to retrieve the Java Home.

This closes apache#24527.
autophagy and others added 30 commits May 15, 2025 16:17
…in adaptive scheduler

Also enable this strategy by default via the introduced config option
Co-authored-by: Matthias Pohl <github@mapohl.com>
…Y type by evaluating Unsafe.arrayBaseOffset(byte[].class) in TM rather than in JM (apache#26592)

Fix HashPartitioner codegen for BINARY/VARBINARY type by evaluating BYTE_ARRAY_BASE_OFFSET in TM instead of JM.

The issue: if the JM heap is set above 32 GB while the TM heap is below 32 GB (or vice versa), the JVM treats the larger process as a large-heap JVM and disables compressed oops, which changes Unsafe behavior. For example, UNSAFE.arrayBaseOffset(byte[].class) returns 24 on a large-heap JVM but 16 otherwise (see the probe sketched below).

As a result, tasks running on a TM whose heap size falls on the other side of the 32 GB boundary from the JM read the wrong memory locations when computing the MurmurHash over a byte[].

Signed-off-by: Jiangjie (Becket) Qin <becket.qin@gmail.com>
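A standalone probe (not the Flink codegen itself) that makes the commit's point observable: run it once with a heap below 32 GB and once above (e.g. -Xmx16g vs. -Xmx40g), and the reported offset changes because compressed oops are disabled on the larger heap.

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

// Minimal probe: the base offset of byte[] is a property of the JVM that
// runs this code, so it must be evaluated on the TM at runtime rather than
// baked into generated code as a constant on the JM.
public final class ArrayBaseOffsetProbe {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        // Typically 16 with compressed oops (heap <= ~32 GB), 24 without.
        System.out.println("byte[] base offset on this JVM: "
                + unsafe.arrayBaseOffset(byte[].class));
    }
}
```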
…es in batch mode (apache#27016)

In apache#26433, we removed the EOI marker in the form of Long.MAX_VALUE as the checkpoint id. Since
streaming pipelines can continue to checkpoint even after their respective operators have been shut
down, it is not safe to use a constant as this can lead to duplicate commits.

However, in batch pipelines there is only one commit, on job shutdown, so any checkpoint id suffices in this scenario. Any pending committables are processed by the CommitterOperator when the operator shuts down, and no further checkpoints take place.

There are various connectors that rely on this behavior, and I don't see any drawbacks to keeping it for batch pipelines.
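A conceptual sketch of this single-commit-at-shutdown behavior (the class and method names below are illustrative only, not the actual Flink CommitterOperator API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: in batch mode there is exactly one commit at
// shutdown and no later checkpoints, so attaching an arbitrary fixed
// checkpoint id to the pending committables cannot cause duplicates.
public final class BatchCommitSketch<CommT> {
    private static final long BATCH_CHECKPOINT_ID = 1L; // any fixed id is safe here

    private final List<CommT> pending = new ArrayList<>();

    public void collect(CommT committable) {
        pending.add(committable);
    }

    // Invoked once when the bounded input ends.
    public void endInput() {
        commit(BATCH_CHECKPOINT_ID, pending);
        pending.clear();
    }

    private void commit(long checkpointId, List<CommT> committables) {
        // Hand the committables to the sink's committer, tagged with the fixed id.
        System.out.println("Committing " + committables.size()
                + " committables for checkpoint " + checkpointId);
    }
}
```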
If a resource is lazily created in open(), we can only close it after checking for null; otherwise a failure during initialization triggers secondary failures during cleanup.
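A minimal sketch of this pattern (the holder class and field names are hypothetical, not the actual Flink code):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical example: the stream is created lazily in open(), so it may
// still be null if open() failed part-way. close() must null-check to avoid
// masking the original failure with a secondary NullPointerException.
public final class LazyResourceHolder implements Closeable {
    private InputStream stream; // created lazily in open()

    public void open(String path) throws IOException {
        stream = Files.newInputStream(Paths.get(path)); // may throw before assignment
    }

    @Override
    public void close() throws IOException {
        if (stream != null) { // guard: open() may never have completed
            stream.close();
        }
    }
}
```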
The `requires` list under `[build-system]` in pyproject.toml still had the upstream range <2.49.0, while setup.py had already been relaxed to <2.62.0. This caused macOS sdist builds to fail because no Beam version in the 2.43–2.49 range is available on Artifactory.

This matches the relaxation already applied in setup.py (commit 7c2ca50).