Rendering Performance Improvements #566

Merged
lessthanjacob merged 12 commits into main from js/performance-improvements on Apr 9, 2026

Conversation

@lessthanjacob (Contributor) commented Jan 31, 2026

Background

#562 has surfaced a clear regression in recent versions of the library:

| Metric      | v0.10.0 | v1.2.1 (current) | Change      |
| ----------- | ------- | ---------------- | ----------- |
| Speed (i/s) | 5.16    | 1.92             | 2.7x slower |
| Memory      | 26.7 MB | 112 MB           | 4.2x more   |

This has been compounding for a while, and there are two core contributing factors:

  • The "structure" of a given view is built each time that view is used, and we've been adding more processing in this path over time.
  • During the render process, we're allocating a lot of extra Hash instances as a side effect of managing and passing around local_options.

While there are certainly a lot of opportunities for rework here, this PR aims to improve performance while avoiding a significant refactor (with the understanding that we'd like to put more emphasis on moving to a V2 version of the library in the near future).

Changelog

  • Introduce a frozen-snapshot caching mechanism for ViewCollection to avoid recalculating the structure of each view at render time.
    • On first access, all views are compiled into a single, deeply frozen @cache hash. Subsequent reads are a nil check on a single reference (no lock, no computation).
    • Cache is invalidated on inherit (subclassed blueprints) and when a new view is accessed for the first time (dynamic view creation).
  • Merge view into local_options once per render call, instead of every time a field is extracted. Freeze the merged hash to prevent accidental mutation across iterations.
  • Only allocate a new options_without_default Hash in AssociationExtractor if :default or :default_if keys are present.
  • Memoize one instance of the configured default extractor via Configuration#default_extractor and reuse it for each Field.
  • Memoize one instance of AssociationExtractor per Blueprint and reuse it for each Association.
  • Extend Metrics/MethodLength to 20.
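
To make a couple of the allocation-focused items above concrete, here is a minimal, hypothetical sketch. The names `options_without_default` and `default_extractor` mirror the changelog, but the bodies are illustrative stand-ins, not Blueprinter's actual source:

```ruby
DEFAULT_KEYS = %i[default default_if].freeze

# Only allocate a trimmed Hash when a default-related key is present;
# on the common path, hand back the caller's Hash unchanged.
def options_without_default(options)
  return options unless DEFAULT_KEYS.any? { |k| options.key?(k) }

  options.reject { |k, _| DEFAULT_KEYS.include?(k) }
end

class Configuration
  # One shared extractor instance per configuration instead of one per Field.
  # Object.new stands in for the configured extractor class here.
  def default_extractor
    @default_extractor ||= Object.new
  end
end

opts = { extractor: :custom }
# Fast path returns the very same object, so no allocation occurs.
options_without_default(opts).equal?(opts) # => true

config = Configuration.new
config.default_extractor.equal?(config.default_extractor) # => true
```

The same idea applies to the per-Blueprint AssociationExtractor memoization: construct once, reuse for every Association.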

ViewCollection Caching

There were a few options considered for caching fields/transformers in ViewCollection:

  1. Build the full cache upfront the first time any view is accessed.
  2. Lazily build a per-view cache as needed.
  3. Use TracePoint to build the full cache when a Blueprint class definition ends.

Option 1 struck the best balance between consistency/safety and code simplicity, at the expense of negligible overhead when the first render occurs.

Option 2 incurs less overhead, but requires more complexity for consistent cache handling and thread-safety.

Option 3 puts the cache building in a spot where we wouldn't add any overhead to render calls, but is more complex and harder to reason about.
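
For context on option 3, here is a rough sketch of how TracePoint can observe the end of a class body; this is purely illustrative of the rejected approach, not something the PR implements:

```ruby
compiled = []

# The :end event fires when a class or module definition body finishes;
# tp.self is the class whose body just ended.
trace = TracePoint.new(:end) do |tp|
  compiled << tp.self.name if tp.self.name&.end_with?("Blueprint")
end

trace.enable

class UserBlueprint; end
class Helper; end

trace.disable
compiled # => ["UserBlueprint"]
```

A real implementation would need to cope with reopened classes, anonymous classes, and the global cost of an always-on trace, which is where the "harder to reason about" concern comes from.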

Thread Safety

The cache uses a frozen snapshot pattern: on first access, all view structures are compiled into a single, deeply frozen Hash ({ fields: { ... }.freeze, transformers: { ... }.freeze }.freeze) and assigned to @cache in a single reference write. A Mutex serializes the compilation itself to prevent duplicate work, but once @cache is populated, reads bypass the mutex entirely (a plain `return @cache if @cache`).

This approach is safe across all Ruby engines (MRI, JRuby, TruffleRuby). The earlier implementation used a @finalized boolean flag with a double-checked lock, which is formally unsafe on JRuby: an unlocked read of a boolean ivar has no memory visibility guarantee under the JVM memory model. The frozen snapshot avoids this because object reference assignment is atomic, and the snapshot itself is immutable once published.

Introducing a library like concurrent-ruby could also work here, but given the simplicity of the use case, the current approach felt sufficient.
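
The frozen-snapshot pattern described above can be sketched roughly as follows. This is a simplified, hypothetical ViewCollection; the method and ivar names are illustrative rather than Blueprinter's exact API:

```ruby
class ViewCollection
  def initialize
    @views = {}
    @cache = nil
    @lock = Mutex.new
  end

  def [](view_name)
    @views[view_name] ||= []
  end

  def cache
    # Hot path: a nil check on a single reference, no lock acquisition.
    return @cache if @cache

    @lock.synchronize do
      # Another thread may have compiled while we waited for the lock.
      @cache ||= compile_cache
    end
  end

  # Called on inherit, or when a new view is seen for the first time.
  def invalidate!
    @cache = nil
  end

  private

  # Build the whole snapshot, then publish it as one frozen reference.
  def compile_cache
    fields = @views.transform_values { |v| v.dup.freeze }.freeze
    { fields: fields }.freeze
  end
end
```

Because the snapshot is immutable once published and the @cache assignment is a single reference write, readers either see nil (and take the lock) or see a fully built, frozen Hash; there is no window where a partially built structure is visible.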

Benchmarks

Using the same benchmark in the linked issue:

| Metric      | Before    | After     | Improvement |
| ----------- | --------- | --------- | ----------- |
| Speed       | 5.084 i/s | 8.738 i/s | +71.9%      |
| Memory      | 76.6M     | 15.0M     | -80.4%      |
| Allocations | 929k      | ~100k     | -89.2%      |

Comparisons

Speed (Iterations/Second)

| Rank | Serializer   | Speed    | vs Top       |
| ---- | ------------ | -------- | ------------ |
| 1    | Panko        | 21.4 i/s | --           |
| 2    | as_json      | 19.4 i/s | 1.11x slower |
| 3    | fast_jsonapi | 13.2 i/s | 1.62x slower |
| 4    | Alba         | 9.4 i/s  | 2.29x slower |
| 5    | Blueprinter  | 8.7 i/s  | 2.46x slower |
| 6    | Roar         | 2.8 i/s  | 7.78x slower |
| 7    | Grape Entity | 2.5 i/s  | 8.45x slower |
| 8    | AMS          | 0.5 i/s  | 45.4x slower |

Memory Usage

| Rank | Serializer   | Memory | Objects | vs Top      |
| ---- | ------------ | ------ | ------- | ----------- |
| 1    | Blueprinter  | 15.0M  | 100k    | --          |
| 2    | as_json      | 23.0M  | 270k    | 1.53x more  |
| 3    | Panko        | 23.1M  | 270k    | 1.53x more  |
| 4    | Alba         | 24.6M  | 140k    | 1.64x more  |
| 5    | fast_jsonapi | 25.2M  | 350k    | 1.68x more  |
| 6    | Grape Entity | 100.5M | 1.06M   | 6.69x more  |
| 7    | Roar         | 120.2M | 801k    | 8.00x more  |
| 8    | AMS          | 245.1M | 2.99M   | 16.31x more |

@lessthanjacob requested review from a team and ritikesh as code owners January 31, 2026
@lessthanjacob self-assigned this Jan 31, 2026
@lessthanjacob force-pushed the js/performance-improvements branch from a5ef505 to 4f5168c January 31, 2026
@lessthanjacob force-pushed the js/performance-improvements branch from 7d52d4f to bbb86fa January 31, 2026
jhollinger previously approved these changes Feb 6, 2026
@lessthanjacob changed the title "Rendering Performance Improvements" to "[PERFG-31] Rendering Performance Improvements" Feb 9, 2026
@lessthanjacob changed the title "[PERFG-31] Rendering Performance Improvements" to "Rendering Performance Improvements" Feb 9, 2026
jhollinger previously approved these changes Mar 9, 2026
  Replace three cache ivars (@finalized, @cached_fields_for,
  @cached_transformers) with a single @cache reference that is either nil
  or a deeply frozen Hash. This provides the same lazy compilation
  behavior with two improvements:

  - Thread safety on JRuby/TruffleRuby without relying on boolean
    visibility guarantees (atomic reference assignment + frozen payload)
  - Zero lock overhead on the hot path after first compilation
    (nil check before mutex instead of always acquiring)

  Also freeze local_options_with_view in Rendering#prepare_data to
  prevent accidental mutation of the shared options hash across
  iterations.

jhollinger previously approved these changes Mar 31, 2026

@jhollinger left a comment:

LGTM

@lessthanjacob merged commit 6f89b57 into main Apr 9, 2026
7 checks passed
@lessthanjacob lessthanjacob deleted the js/performance-improvements branch April 9, 2026 14:35